Thread pool injection

New to maldev injection? Read the injection/README.md vocabulary callout first.

TL;DR

Drop a work item onto the process's default thread pool via the undocumented TpAllocWork / TpPostWork / TpReleaseWork triplet in ntdll. An idle worker thread that already exists picks the item up and runs the shellcode as a normal callback. No CreateThread, no NtCreateThreadEx, no APC. Local-only.

Trait	Value
Target class	Local (current process)
Creates a new thread?	No — reuses one of the always-running pool workers
Uses `WriteProcessMemory`?	No — caller pre-allocates RX in their own process
Stealth tier	High — no CreateThread / QueueAPC / SetContext call enters EDR's view
CET-affected?	Pool dispatcher may enforce ENDBR64 on Win11 24H2+. Use `inject.ThreadPoolExecCET` for auto-wrapping.

When to pick a different method:

Want callback-via-existing-API rather than work-queue? → Callback execution.
Need Self but want explicit thread (not pool)? → EtwpCreateEtwThread.
Need to inject into a different process? → ThreadPool is Local-only. See CreateRemoteThread / Section Mapping.

Primer

Every Windows process has a default thread pool — a small ring of worker threads created by RtlpInitializeThreadPool early in process startup. The pool's purpose is to dispatch arbitrary work items submitted by kernel32!QueueUserWorkItem, ntdll!TpPostWork, and the modern CreateThreadpoolWork family. The implant abuses the ntdll-private layer: TpAllocWork(callback, ctx, env) builds a TP_WORK object whose callback pointer is the shellcode, TpPostWork pushes it onto the queue, and one of the existing workers dequeues and dispatches it.

The result is execution on a thread that the implant did not create and the EDR did not see being created. The same TP_WORK object is the textbook plumbing every well-behaved Windows process uses dozens of times per second; the only anomaly is the callback target itself.

How it works

sequenceDiagram
    participant Impl as "Implant"
    participant Nt as "ntdll"
    participant Pool as "Default thread pool"
    participant W as "Worker thread"

    Impl->>Impl: VirtualAlloc(RW) + memcpy
    Impl->>Impl: VirtualProtect(RX)
    Impl->>Nt: TpAllocWork(&work, sc, 0, 0)
    Nt-->>Impl: TP_WORK*
    Impl->>Nt: TpPostWork(work)
    Nt->>Pool: enqueue
    Pool->>W: dispatch
    W->>W: shellcode runs as callback
    Impl->>Nt: TpWaitForWork(work, false)
    Note over Impl: blocks until callback returns
    Impl->>Nt: TpReleaseWork(work)

Steps:

Allocate / write / protect in the current process — RW first, then RX.
TpAllocWork — register the shellcode as the callback.
TpPostWork — submit the work item.
Worker dispatch — an existing pool worker dequeues and calls the callback (the shellcode).
TpWaitForWork — block to guarantee completion before TpReleaseWork frees the object underneath the running callback.
TpReleaseWork — clean up.

API → godoc

pkg.go.dev/github.com/oioio-space/maldev/inject is the authoritative reference for every exported symbol. This page teaches the concepts; the godoc is the specification.

Examples

Simple

import "github.com/oioio-space/maldev/inject"

if err := inject.ThreadPoolExec(shellcode); err != nil {
    return err
}

Simple — future-proofed (CET-aware)

// Same code, no per-call decisions. Wraps with cet.Wrap when
// cet.Enforced() flips true on a future Win build; no-op today.
if err := inject.ThreadPoolExecCET(shellcode); err != nil {
    return err
}

Composed (ModuleStomp + manual TpAllocWork)

ThreadPoolExec is a one-shot helper. To make the callback target image-backed, stomp first and call TpAllocWork manually — see inject/threadpool_windows.go for the call shape:

import "github.com/oioio-space/maldev/inject"

addr, err := inject.ModuleStomp("msftedit.dll", shellcode)
if err != nil { return err }
// dispatch via TpAllocWork(addr, ...) — see source for full snippet
return inject.ExecuteCallback(addr, inject.CallbackRtlRegisterWait)

Advanced (chain with evasion preset)

import (
    "github.com/oioio-space/maldev/evasion"
    "github.com/oioio-space/maldev/evasion/preset"
    "github.com/oioio-space/maldev/inject"
)

_ = evasion.ApplyAll(preset.Stealth(), nil)
return inject.ThreadPoolExec(shellcode)

Complex (decrypt + thread-pool + wipe)

import (
    "github.com/oioio-space/maldev/cleanup/memory"
    "github.com/oioio-space/maldev/crypto"
    "github.com/oioio-space/maldev/evasion"
    "github.com/oioio-space/maldev/evasion/preset"
    "github.com/oioio-space/maldev/inject"
)

_ = evasion.ApplyAll(preset.Stealth(), nil)

shellcode, err := crypto.DecryptAESGCM(aesKey, encrypted)
if err != nil { return err }
memory.SecureZero(aesKey)

if err := inject.ThreadPoolExec(shellcode); err != nil { return err }
memory.SecureZero(shellcode)

OPSEC & Detection

Artefact	Where defenders look
`TP_WORK` callback pointer outside any image	EDR memory scanners walk active pool work items (CrowdStrike Falcon Sensor, MDE Live Response)
RW → RX flip in current process	`NtProtectVirtualMemory` telemetry — every modern EDR keys on the protection transition
Pool worker stack containing addresses outside any module	Stack-walking telemetry on the thread-pool dispatcher

D3FEND counters:

D3-PCSV — verifies the callback against image segments.
D3-EAL — WDAC blocks RX flips outside images.

Hardening for the operator: pair with ModuleStomp so the callback pointer is image-backed; spread allocations across multiple smaller pages to reduce signature surface; sleep-mask the shellcode region between activations (evasion/sleepmask).

MITRE ATT&CK

T-ID	Name	Sub-coverage	D3FEND counter
T1055.001	Process Injection: DLL Injection	thread-pool variant — no thread creation	D3-PCSV

Limitations

Local only. Targets the current process's pool. There is no cross-process variant — the TP_WORK object lives in the calling process.
Synchronous via TpWaitForWork. The helper blocks until the callback returns. Long-running shellcode should detach internally (spawn a fiber or thread).
CET dispatcher is not currently enforced on the thread-pool path (unlike RtlRegisterWait, which is). Plain ThreadPoolExec works as-is. The future-proof ThreadPoolExecCET wrapper auto-prepends ENDBR64 via cet.Wrap when cet.Enforced() returns true, so an implant built against this helper survives the day Microsoft flips the dispatcher to ENDBR64-required. Cost on non-enforced hosts: 4 bytes of shellcode prefix.
Region not freed. The RX page persists until process exit unless the implant calls cleanup/memory.WipeAndFree.
Undocumented APIs. TpAllocWork / TpPostWork / TpReleaseWork are not in the SDK; future Windows builds may rename or relocate them.