Thread pool injection
TL;DR
Drop a work item onto the process's default thread pool via the
undocumented TpAllocWork / TpPostWork / TpReleaseWork triplet in
ntdll. An idle worker thread that already exists picks the item up
and runs the shellcode as a normal callback. No CreateThread, no
NtCreateThreadEx, no APC. Local-only.
Primer
Every Windows process has a default thread pool — a small ring of
worker threads created by RtlpInitializeThreadPool early in process
startup. The pool's purpose is to dispatch arbitrary work items
submitted by kernel32!QueueUserWorkItem, ntdll!TpPostWork, and the
modern CreateThreadpoolWork family. The implant abuses the
ntdll-private layer: TpAllocWork(&work, callback, ctx, env) builds a
TP_WORK object whose callback pointer is the shellcode, TpPostWork
pushes it onto the queue, and one of the existing workers dequeues
and dispatches it.
The result is execution on a thread that the implant did not create
and the EDR did not see being created. The same TP_WORK object is
the textbook plumbing every well-behaved Windows process uses dozens of
times per second; the only anomaly is the callback target itself.
How it works
sequenceDiagram
participant Impl as "Implant"
participant Nt as "ntdll"
participant Pool as "Default thread pool"
participant W as "Worker thread"
Impl->>Impl: VirtualAlloc(RW) + memcpy
Impl->>Impl: VirtualProtect(RX)
Impl->>Nt: TpAllocWork(&work, sc, 0, 0)
Nt-->>Impl: TP_WORK*
Impl->>Nt: TpPostWork(work)
Nt->>Pool: enqueue
Pool->>W: dispatch
W->>W: shellcode runs as callback
Impl->>Nt: TpWaitForWork(work, false)
Note over Impl: blocks until callback returns
Impl->>Nt: TpReleaseWork(work)
Steps:
- Allocate / write / protect in the current process: RW first, then RX.
- TpAllocWork: register the shellcode as the callback.
- TpPostWork: submit the work item.
- Worker dispatch: an existing pool worker dequeues and calls the callback (the shellcode).
- TpWaitForWork: block to guarantee completion before TpReleaseWork frees the object underneath the running callback.
- TpReleaseWork: clean up.
API Reference
inject.ThreadPoolExec(shellcode []byte) error
Execute shellcode on the current process's default thread pool. Owns
allocation (RW → RX), the TpAllocWork/TpPostWork/TpWaitForWork/
TpReleaseWork lifecycle, and cleanup.
Parameters:
- shellcode: bytes to execute. The function copies them into a freshly allocated RW page, flips to RX, then dispatches.
Returns: error — wraps ntdll failures and protection-flip
errors. nil only after the shellcode callback returns.
Side effects: allocates an RX region of len(shellcode) bytes, rounded
up to the page size, in the current process. The region is not
released; wipe it with cleanup/memory.WipeAndFree when done.
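The size rounding described under side effects can be sketched as plain arithmetic. This is a minimal illustration, assuming a 4 KiB page size; `pageSize` and `roundUpToPage` are hypothetical names and are not part of the inject package.

```go
package main

import "fmt"

// pageSize is an assumption: 4 KiB is the usual small-page size on windows/amd64.
const pageSize = 4096

// roundUpToPage returns n rounded up to the next multiple of pageSize.
// Hypothetical helper for illustration only.
func roundUpToPage(n int) int {
	if n <= 0 {
		return 0
	}
	return (n + pageSize - 1) / pageSize * pageSize
}

func main() {
	// A payload of 4097 bytes occupies two pages under this assumption.
	for _, n := range []int{1, 4096, 4097} {
		fmt.Printf("%d -> %d\n", n, roundUpToPage(n))
	}
}
```

Under this assumption, even a very small payload leaves a full page mapped until it is explicitly freed.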
OPSEC: the callback target is the only anomaly. Pair with
ModuleStomp to make it image-backed.
inject.ThreadPoolExecCET(shellcode []byte) error
CET-aware wrapper around ThreadPoolExec. Calls
cet.Wrap on the shellcode when
cet.Enforced is true, then forwards to
ThreadPoolExec.
Why future-proofed. Current shipping Windows builds do not
enforce CET on the thread-pool dispatcher — meaning plain
ThreadPoolExec works fine today. If a future Windows build
flips the dispatcher to ENDBR64-required (the same model
KiUserApcDispatcher uses), implants built against this helper
keep working without a code change. The cost of a no-op wrap on
non-enforced hosts is 4 bytes of shellcode prefix.
Parameters / Returns / Side effects: identical to
ThreadPoolExec.
Required privileges: unprivileged.
Platform: windows amd64.
Examples
Simple
import "github.com/oioio-space/maldev/inject"
if err := inject.ThreadPoolExec(shellcode); err != nil {
return err
}
Simple — future-proofed (CET-aware)
// Same code, no per-call decisions. Wraps with cet.Wrap when
// cet.Enforced() flips true on a future Win build; no-op today.
if err := inject.ThreadPoolExecCET(shellcode); err != nil {
return err
}
Composed (ModuleStomp + manual TpAllocWork)
ThreadPoolExec is a one-shot helper. To make the callback target
image-backed, stomp first and call TpAllocWork manually — see
inject/threadpool_windows.go
for the call shape:
import "github.com/oioio-space/maldev/inject"
addr, err := inject.ModuleStomp("msftedit.dll", shellcode)
if err != nil { return err }
// dispatch via TpAllocWork(addr, ...) — see source for full snippet
return inject.ExecuteCallback(addr, inject.CallbackRtlRegisterWait)
Advanced (chain with evasion preset)
import (
"github.com/oioio-space/maldev/evasion"
"github.com/oioio-space/maldev/evasion/preset"
"github.com/oioio-space/maldev/inject"
)
_ = evasion.ApplyAll(preset.Stealth(), nil)
return inject.ThreadPoolExec(shellcode)
Complex (decrypt + thread-pool + wipe)
import (
"github.com/oioio-space/maldev/cleanup/memory"
"github.com/oioio-space/maldev/crypto"
"github.com/oioio-space/maldev/evasion"
"github.com/oioio-space/maldev/evasion/preset"
"github.com/oioio-space/maldev/inject"
)
_ = evasion.ApplyAll(preset.Stealth(), nil)
shellcode, err := crypto.DecryptAESGCM(aesKey, encrypted)
if err != nil { return err }
memory.SecureZero(aesKey)
if err := inject.ThreadPoolExec(shellcode); err != nil { return err }
memory.SecureZero(shellcode)
OPSEC & Detection
| Artefact | Where defenders look |
|---|---|
| TP_WORK callback pointer outside any image | EDR memory scanners walk active pool work items (CrowdStrike Falcon Sensor, MDE Live Response) |
| RW → RX flip in current process | NtProtectVirtualMemory telemetry — every modern EDR keys on the protection transition |
| Pool worker stack containing addresses outside any module | Stack-walking telemetry on the thread-pool dispatcher |
D3FEND counters:
- D3-PCSV — verifies the callback against image segments.
- D3-EAL — WDAC blocks RX flips outside images.
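From the defender's side, the image-segment check behind D3-PCSV reduces to a range lookup: does the callback address fall inside any loaded module? The sketch below is a simplified illustration, not how any particular EDR implements it. The `moduleRange` type, the `imageBacked` function, and the sample addresses are all hypothetical; a real sensor would take the ranges from the loader's module list and also verify segment permissions and contents.

```go
package main

import "fmt"

// moduleRange is a hypothetical record of one loaded image's address span.
type moduleRange struct {
	Name string
	Base uint64
	Size uint64
}

// imageBacked reports the module containing addr, if any.
// A callback pointer that matches no module is the anomaly to flag.
func imageBacked(mods []moduleRange, addr uint64) (string, bool) {
	for _, m := range mods {
		if addr >= m.Base && addr-m.Base < m.Size {
			return m.Name, true
		}
	}
	return "", false
}

func main() {
	// Sample ranges are made up for illustration.
	mods := []moduleRange{
		{Name: "ntdll.dll", Base: 0x7ff800000000, Size: 0x200000},
		{Name: "kernel32.dll", Base: 0x7ff810000000, Size: 0x100000},
	}
	for _, cb := range []uint64{0x7ff800001000, 0x1f000000000} {
		if name, ok := imageBacked(mods, cb); ok {
			fmt.Printf("%#x: inside %s\n", cb, name)
		} else {
			fmt.Printf("%#x: not image-backed, flag for review\n", cb)
		}
	}
}
```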
Hardening for the operator: pair with ModuleStomp
so the callback pointer is image-backed; spread allocations across
multiple smaller pages to reduce signature surface; sleep-mask the
shellcode region between activations
(evasion/sleepmask).
MITRE ATT&CK
| T-ID | Name | Sub-coverage | D3FEND counter |
|---|---|---|---|
| T1055.001 | Process Injection: DLL Injection | thread-pool variant — no thread creation | D3-PCSV |
Limitations
- Local only. Targets the current process's pool. There is no cross-process variant: the TP_WORK object lives in the calling process.
- Synchronous via TpWaitForWork. The helper blocks until the callback returns. Long-running shellcode should detach internally (spawn a fiber or thread).
- CET is not currently enforced on the thread-pool dispatch path (unlike RtlRegisterWait, where it is). Plain ThreadPoolExec works as-is. The future-proof ThreadPoolExecCET wrapper auto-prepends ENDBR64 via cet.Wrap when cet.Enforced() returns true, so an implant built against this helper survives the day Microsoft flips the dispatcher to ENDBR64-required. Cost on non-enforced hosts: 4 bytes of shellcode prefix.
- Region not freed. The RX page persists until process exit unless the implant calls cleanup/memory.WipeAndFree.
- Undocumented APIs. TpAllocWork / TpPostWork / TpReleaseWork are not in the SDK; future Windows builds may rename or relocate them.
See also
- Callback execution — the broader family; thread pool is the worker-thread variant.
- Module Stomping — pair to make the callback pointer image-backed.
- evasion/sleepmask: mask the RX region between dispatches.
- Modexp, Calling Conventions in Windows: original public write-up of TpAllocWork-based injection.