Worked example — Packer Elevation Tour (v0.66 → v0.70)

← examples index · docs/index

What this is

A guided side-by-side tour of every packer mode the maldev project ships, from the original v0.61 PE/ELF in-place transform to the v0.69 318-byte all-asm bundle. Run the snippets against a single toy payload (exit 42 shellcode) and watch the resulting binary sizes and on-disk artefacts evolve.

Aimed at someone learning what these techniques actually cost and what they actually give.

The fixture: a 12-byte shellcode

Every variant below packs the same minimal Linux x86-64 shellcode:

xor edi, edi    ; clear arg
mov dil, 42     ; arg = 42
mov eax, 60     ; sys_exit
syscall

12 bytes, calls _exit(42). Succeeding runs are visible by checking $?.

Variant 1 — transform.BuildMinimalELF64 (raw)

Just wrap the shellcode in a kernel-loadable ELF, no packer logic.

out, _ := transform.BuildMinimalELF64(exit42Shellcode)
os.WriteFile("v1-raw", out, 0o755)
// 132 bytes — the canonical Brian-Raiter "tiny ELF" shape.
AttributeValue
Total size132 B
Stub asm0 (none)
Encryptionnone
.text RWXyes (single PT_LOAD)
Process tree1 binary
/proc/self/mapsone anonymous-ish PT_LOAD
PedagogyBrian Raiter (2002): the smallest legal ELF

Variant 2 — WrapBundleAsExecutableLinux (all-asm)

bundle, _ := packer.PackBinaryBundle(
    []packer.BundlePayload{{
        Binary: exit42Shellcode,
        Fingerprint: packer.FingerprintPredicate{
            PredicateType: packer.PTMatchAll,
        },
    }},
    packer.BundleOptions{},
)
out, _ := packer.WrapBundleAsExecutableLinux(bundle)
os.WriteFile("v2-allasm", out, 0o755)
// v0.72.0: 441 bytes for 1-payload PTMatchAll, 548 bytes for a real
// 2-payload Intel-vs-AMD vendor-aware dispatch.
AttributeValue
Total size (1-payload)441 B
Total size (2-payload vendor-aware)548 B
Stub asm160 B hand-rolled (PIC + CPUID + scan loop + 12-B vendor compare + XOR-decrypt + JMP)
EncryptionXOR rolling 16-byte key
Predicate evalPT_MATCH_ALL + PT_CPUID_VENDOR (with all-zero = wildcard)
.text RWXyes (single PT_LOAD)
Process tree1 binary
/proc/self/mapsone PT_LOAD
Pedagogyreal multi-target asm dispatch

The 548-byte 2-payload bundle breaks down as:

  120 B  ELF header + lone PT_LOAD Phdr
  160 B  stub asm (PIC + CPUID + scan loop + decrypt + jmp)
   32 B  BundleHeader
   96 B  2 × FingerprintEntry (PTCPUIDVendor each)
   64 B  2 × PayloadEntry     (DataRVA + DataSize + 16-byte key)
   ~76 B  encrypted payload data + small struct alignment
  ─────
  ~548 B

That's ~14× smaller than the 7.6 KiB minimum for a bare gcc -static -no-pie hello-world, while doing real CPUID dispatch across two target predicates. The trade-off: payload must be position-independent shellcode (the stub jumps directly into it; PE/ELF headers would crash).

Variant 3 — cmd/bundle-launcher + AppendBundle (Go runtime)

$ go build -o bundle-launcher ./cmd/bundle-launcher
$ packer bundle -wrap bundle-launcher -bundle v2-allasm-bundle-blob.bin -out v3-go
AttributeValue
Total size~5 MB (Go runtime baseline)
StubGo runtime — not asm
EncryptionXOR rolling 16-byte key
Predicate evalfull (CPUID + Win build + Negate)
Fallback modesExit / First / Crash
Process tree2 binaries (launcher → execve payload)
/proc/self/mapsshows /tmp/.../bundle-payload-* for the matched payload
Pedagogythe operator-friendly path: full feature set, slow/loud

Variant 4 — cmd/bundle-launcher reflective (MALDEV_REFLECTIVE=1)

$ MALDEV_REFLECTIVE=1 ./v3-go

Same 5 MB binary, different dispatch mode. The matched payload gets mapped into the launcher's address space via pe/packer/runtime.Prepare and entered on a fake kernel stack. No fork, no execve, no temp file.

AttributeValue
Total size~5 MB
StubGo runtime + asm trampoline
Predicate evalfull (CPUID + Win build + Negate)
Process tree1 binary (no execve)
/proc/self/mapsanonymous regions for the payload
Pedagogyreflective loading done right — auxv patching, segment mapping, RELATIVE relocs

Side-by-side at a glance

VariantSizeStubPredicateProc treeDisk artefact
1 — raw min-ELF132 Bnonenone1none
2 — all-asm bundle (1 entry)441 B160 B asmPT_MATCH_ALL + PT_CPUID_VENDOR1none
2 — all-asm bundle (2 entries, vendor)548 B160 B asmPT_MATCH_ALL + PT_CPUID_VENDOR1none
3 — Go launcher (default)~5 MBGofull (incl. PT_WIN_BUILD + Negate)2temp file
4 — Go launcher reflective~5 MBGo + asmfull1none

Trade-off curve: variant 2 wins binary size and OPSEC at the cost of predicate evaluation; variant 4 wins everything except size; variant 3 is the most operator-friendly default.

Visualising

cmd/packer-vis (v0.70.0) renders both the entropy of any of these binaries and the bundle wire format:

$ packer-vis entropy v1-raw     # 132-byte file, all near-min entropy bins
$ packer-vis entropy v2-allasm  # the encrypted 12-byte payload region
                                # shows up as a high-entropy ▆▇█ smear

$ packer-vis bundle bundle-blob.bin
  bundle.bin
  256 bytes | magic=0x56444c4d version=0x1 count=2 fallback=0

  ┌─ BundleHeader ─────────────────────────────────────┐
  │ 0x00..0x20  magic + version + count + offsets     │
  │            fpTable=0x20    plTable=0x80    data=0xc0 │
  └────────────────────────────────────────────────────┘

  ┌─ [0] FingerprintEntry @ 0x20 ────────────────────┐
  │ predType=0x01  vendor="GenuineIntel"  build=[22000, 99999] │
  └────────────────────────────────────────────────────┘
  ...

Limitations recap

  • Variant 2 (all-asm) selects payload 0 unconditionally today. The full CPUID+PEB evaluator is queued (EmitVendorCompare and EmitBuildRangeCheck primitives are already in tree) — drops in without changing WrapBundleAsExecutableLinux's public signature.
  • Variant 2's payload must be raw position-independent shellcode. PE/ELF payloads need variant 3 or 4.
  • Windows symmetry of the all-asm path (a MinimalPE32Plus writer
    • Windows fingerprint dispatch) is queued for a future minor.

See also