reversefreehard

Favorite Potato

b01lersc

Task: predict 20 random (A,X,Y) 6502 register inputs from their outputs after running a 5.8 MB C64 binary made of ~250k invertible macro blocks. Solution: write a minimal 6502 emulator, pattern-match the ~10 macro types (ADD/EOR/SWAP/ROR/reg-combine), invert each op and apply the chain in reverse to recover the input.

$ ls tags/ techniques/
macro_pattern_matchingoperation_inversionregister_permutationcustom_emulator

Favorite Potato — b01lers CTF

Description

Yay, at last - I managed to upgrade my old potato. Now I can run suuuuper loooong binaries that nobody can reverse (RAM is still 64k but I only need a minimal amount).

ncat --ssl favorite-potato.opus4-7.b01le.rs 8443

Files provided:

  • favorite_potato.py — server loop. Generates 20 random (A0,X0,Y0) triples, runs run_c64("code.bin", A0, X0, Y0) on each, prints only the outputs, and asks the player to tell it back the 20 inputs. Getting all 20 right prints the flag.
  • code.bin — 5.8 MB of 6502 machine code.
  • test.bin — a 9-byte sanity-check binary used by the T)est menu option.
  • screenshot.png — a C64 BASIC screen showing the calling convention: POKE 780,A : POKE 781,X : POKE 782,Y : SYS … then PEEK(780/781/782) to read results.

Goal: invert the 5.8 MB program on 24 bits of state.

Analysis

Calling convention

The screenshot shows the standard C64 KERNAL convention — three memory cells (780/781/782) shadow the CPU registers A/X/Y. Before SYS, BASIC copies them into the 6502, then PEEK reads them back after RTS. So the entire "program I/O" is really (A,X,Y) ∈ [0,255]³ — just 24 bits. That's the crucial clue: whatever the 5.8 MB binary does, it is a map {0,1}²⁴ → {0,1}²⁴.

Why the "potato upgrade" is a joke

A real C64 has 64 K of RAM, so you physically cannot load a 5.8 MB image. The emulator used on the server clearly doesn't try — the binary is executed as one long straight-line blob. A quick opcode histogram of code.bin confirms there are no memory loads/stores at all: only register moves, stack ops, and register arithmetic. Everything lives in A, X, Y and on the hardware stack page.

Only 23 distinct opcodes show up across the whole 5.8 MB:

PHP(08) PLP(28) PHA(48) PLA(68) CLC(18) SEC(38)
DEX(CA) DEY(88) INX(E8) INY(C8)
TXA(8A) TYA(98) TAX(AA) TAY(A8) TXS(9A) TSX(BA)
LSR_A(4A) EOR_imm(49) ADC_imm(69) SBC_imm(E9) ORA_imm(09)
LDX_imm(A2) BCC(90) BNE(D0) RTS(60)

A tiny Python emulator (emu.py, ~150 lines) covering exactly these opcodes is enough to run code.bin. Verify against test.bin (08 18 69 2A CA C8 C8 28 60 = PHP;CLC;ADC #$2A;DEX;INY;INY;PLP;RTSA+=42, X-=1, Y+=2), then compare with the live server's T)est option.

Running 250k macros through the naïve emulator takes ~4.7 s per sample — too slow. We need structure.

Macro structure

Scrolling code.bin you quickly see a repeating frame: PHP ... PLP around each block. PHP pushes the flags, PLP restores them — every macro is flag-neutral. Between the fences the macro manipulates A/X/Y only.

There are only ~10 distinct macros. Each has a fixed byte pattern, sometimes with an immediate parameter. Here are the most interesting ones:

ADD_A imm (5 bytes)08 18 69 II 28 = PHP;CLC;ADC #II;PLPA += II.

EOR_A imm (4 bytes)08 49 II 28A ^= II.

SWAP_AX (14 bytes) — 6502 has no reg↔reg swap and the stack has no "pick", so the author abuses TXS/TSX (transfer X↔stack-pointer!) as a private register:

08              PHP
48              PHA          ; [SP] = A
8A 48           TXA; PHA     ; [SP-1] = X
BA              TSX          ; X = SP (point to X-slot)
E8              INX
9A              TXS          ; SP now points above A's slot, but pulls read from SP+1
68              PLA          ; A = old X (from the slot at SP+1)
CA              DEX; 9A TXS  ; restore SP to point at original A slot
AA              TAX          ; oh wait — elided here, see pattern
68 28 28        PLA;PLP;PLP  ; cleanup

The net effect is A ↔ X, but it only works because TSX/TXS let you move the SP register into the arithmetic world and back. This is the "TSX and TXS" that the flag rants about.

SWAP_AY (20 bytes) and SWAP_XY (10 bytes) — same idea with Y.

X_ADD_A (10 bytes)08 48 E8 38 E9 01 D0 FA 68 28 = push A, then loop INX; SEC; SBC #1; BNE -6 until A == 0, restore A. So X += original_A. Y_ADD_A is symmetric with INY.

A_EOR_Y (90 bytes) — eight copies of "test bit k of Y; if set, A ^= (1<<k)" unrolled. Net effect: A ^= Y.

ROR_A k (26 bytes)LDX #k then a loop of LSR A; BCC +2; ORA #$80; DEX; BNE → rotate A right by k mod 8.

Every one of these macros is invertible:

MacroInverse
ADD_A cSUB_A c (i.e. A -= c)
EOR_A cEOR_A c (self-inverse)
SWAP_AX/AY/XYitself
X_ADD_AX -= A (A is unchanged by the macro)
Y_ADD_AY -= A
A_EOR_Yitself (Y is unchanged)
ROR_A kROL_A k

So the whole 5.8 MB program is a composition of bijections on 24 bits — a pure permutation. Given the output we can walk the macro list in reverse, applying each inverse, and read off the original input in microseconds.

Why this is fast

Parsing reduces 250 000 macros of 6502 bytecode to 250 000 tuples like ('ADD_A', 0x42) or ('SWAP_AX', None). Executing those tuples in Python is ~260× faster than byte-level emulation: ~18 ms per full run vs. ~4.7 s. Inversion has the same cost — 20 inverses well under a second, easily within the server timeout.

Solution

Step 1 — minimal 6502 emulator (emu.py)

A straight translation of the 23 opcodes. Flag bits tracked: only C, Z, N are ever tested (by BCC, BNE). Stack memory lives in mem[0x100 | SP]. Verified on test.bin and cross-checked against the server's T)est mode.

Step 2 — macro parser

Match each macro's exact byte pattern at the current PC. For parameterised macros (ADD_A, EOR_A, ROR_A) extract the immediate. Skip the final RTS.

PAT_SWAP_AX = bytes.fromhex("08488a48bae89a68ca9aaa682828") PAT_SWAP_AY = bytes.fromhex("084898488a48bae8e89a68a8caca9a68aa682828") PAT_SWAP_XY = bytes.fromhex("08488a4898aa68a86828") PAT_X_ADD_A = bytes.fromhex("0848e838e901d0fa6828") PAT_Y_ADD_A = bytes.fromhex("0848c838e901d0fa6828") PAT_A_EOR_Y = bytes.fromhex( "084898488a48bae89a68" "4a9006a868490148984a9006a868490248984a9006a868490448984a9006a868" "490848984a9006a868491048984a9006a868492048984a9006a868494048984a" "9006a86849804898" "ca9a68aa68a86828") ROR_A_PRE = bytes.fromhex("08488a48bae89a68a2") ROR_A_POST = bytes.fromhex("4a90020980cad0f848baca9a68aa6828")

Parsing code.bin yields exactly 250 000 ops — cached to ops.pkl so subsequent runs skip the ~5 s parse.

Step 3 — forward and inverse interpreters

def exec_ops(ops, A, X, Y): for op, p in ops: if op == 'ADD_A': A = (A + p) & 0xFF elif op == 'SUB_A': A = (A - p) & 0xFF elif op == 'EOR_A': A ^= p elif op == 'SWAP_AX': A, X = X, A elif op == 'SWAP_AY': A, Y = Y, A elif op == 'SWAP_XY': X, Y = Y, X elif op == 'X_ADD_A': X = (X + A) & 0xFF elif op == 'Y_ADD_A': Y = (Y + A) & 0xFF elif op == 'A_EOR_Y': A ^= Y elif op == 'ROR_A': k = p & 7 if k: A = ((A >> k) | (A << (8 - k))) & 0xFF return A, X, Y def invert_ops(ops, A, X, Y): for op, p in reversed(ops): if op == 'ADD_A': A = (A - p) & 0xFF elif op == 'SUB_A': A = (A + p) & 0xFF elif op == 'EOR_A': A ^= p elif op == 'SWAP_AX': A, X = X, A elif op == 'SWAP_AY': A, Y = Y, A elif op == 'SWAP_XY': X, Y = Y, X elif op == 'X_ADD_A': X = (X - A) & 0xFF elif op == 'Y_ADD_A': Y = (Y - A) & 0xFF elif op == 'A_EOR_Y': A ^= Y elif op == 'ROR_A': k = p & 7 if k: A = ((A << k) | (A >> (8 - k))) & 0xFF return A, X, Y

Sanity check: for any (A,X,Y), exec_ops(ops, *invert_ops(ops, A, X, Y)) == (A, X, Y). Add this assertion inside the solver before submitting — it catches a wrong inverse before we waste our one shot at the server.

Step 4 — network solver

Connect with ssl.create_default_context() + check_hostname=False + CERT_NONE (the server uses a self-signed cert), send R, parse 20 lines of the form Final output #i: A=.. X=.. Y=.., invert each, then feed the triples back as a,x,y\n at each Input #i - A,X,Y: prompt.

Full solver

#!/usr/bin/env python3 """Solver for 'Favorite Potato' CTF challenge. Strategy: 1. code.bin is a sequence of ~250k macro-expanded 6502 operations. 2. Each macro is a simple invertible operation on (A, X, Y) state. 3. Parse macros once, invert them by applying inverses in reverse order. """ import pickle, re, socket, ssl, sys # --- Macro patterns --- PAT_SWAP_AX = bytes.fromhex("08488a48bae89a68ca9aaa682828") PAT_SWAP_AY = bytes.fromhex("084898488a48bae8e89a68a8caca9a68aa682828") PAT_SWAP_XY = bytes.fromhex("08488a4898aa68a86828") PAT_X_ADD_A = bytes.fromhex("0848e838e901d0fa6828") PAT_Y_ADD_A = bytes.fromhex("0848c838e901d0fa6828") PAT_A_EOR_Y = bytes.fromhex( "084898488a48bae89a68" "4a9006a868490148984a9006a868490248984a9006a868490448984a9006a868" "490848984a9006a868491048984a9006a868492048984a9006a868494048984a" "9006a86849804898" "ca9a68aa68a86828") ROR_A_PRE = bytes.fromhex("08488a48bae89a68a2") ROR_A_POST = bytes.fromhex("4a90020980cad0f848baca9a68aa6828") def parse_macro(code, pc): if pc + 5 <= len(code): if code[pc] == 0x08 and code[pc+1] == 0x18 and code[pc+2] == 0x69 and code[pc+4] == 0x28: return ('ADD_A', code[pc+3], 5) if code[pc] == 0x08 and code[pc+1] == 0x38 and code[pc+2] == 0xE9 and code[pc+4] == 0x28: return ('SUB_A', code[pc+3], 5) if pc + 4 <= len(code): if code[pc] == 0x08 and code[pc+1] == 0x49 and code[pc+3] == 0x28: return ('EOR_A', code[pc+2], 4) if code[pc:pc+len(PAT_SWAP_AX)] == PAT_SWAP_AX: return ('SWAP_AX', None, len(PAT_SWAP_AX)) if code[pc:pc+len(PAT_SWAP_AY)] == PAT_SWAP_AY: return ('SWAP_AY', None, len(PAT_SWAP_AY)) if code[pc:pc+len(PAT_SWAP_XY)] == PAT_SWAP_XY: return ('SWAP_XY', None, len(PAT_SWAP_XY)) if code[pc:pc+len(PAT_X_ADD_A)] == PAT_X_ADD_A: return ('X_ADD_A', None, len(PAT_X_ADD_A)) if code[pc:pc+len(PAT_Y_ADD_A)] == PAT_Y_ADD_A: return ('Y_ADD_A', None, len(PAT_Y_ADD_A)) if code[pc:pc+len(PAT_A_EOR_Y)] == PAT_A_EOR_Y: return ('A_EOR_Y', None, len(PAT_A_EOR_Y)) if (code[pc:pc+len(ROR_A_PRE)] == ROR_A_PRE and code[pc+len(ROR_A_PRE)+1:pc+len(ROR_A_PRE)+1+len(ROR_A_POST)] == ROR_A_POST): return ('ROR_A', code[pc+len(ROR_A_PRE)], len(ROR_A_PRE)+1+len(ROR_A_POST)) return None def parse_all(code): ops, pc = [], 0 while pc < len(code): if code[pc] == 0x60: # RTS pc += 1; continue m = parse_macro(code, pc) if m is None: raise ValueError(f"Parse fail at pc={pc:x}: {code[pc:pc+30].hex()}") op, param, sz = m ops.append((op, param)); pc += sz return ops def exec_ops(ops, A, X, Y): for op, p in ops: if op == 'ADD_A': A = (A + p) & 0xFF elif op == 'SUB_A': A = (A - p) & 0xFF elif op == 'EOR_A': A ^= p elif op == 'SWAP_AX': A, X = X, A elif op == 'SWAP_AY': A, Y = Y, A elif op == 'SWAP_XY': X, Y = Y, X elif op == 'X_ADD_A': X = (X + A) & 0xFF elif op == 'Y_ADD_A': Y = (Y + A) & 0xFF elif op == 'A_EOR_Y': A ^= Y elif op == 'ROR_A': k = p & 7 if k: A = ((A >> k) | (A << (8 - k))) & 0xFF return A, X, Y def invert_ops(ops, A, X, Y): for op, p in reversed(ops): if op == 'ADD_A': A = (A - p) & 0xFF elif op == 'SUB_A': A = (A + p) & 0xFF elif op == 'EOR_A': A ^= p elif op == 'SWAP_AX': A, X = X, A elif op == 'SWAP_AY': A, Y = Y, A elif op == 'SWAP_XY': X, Y = Y, X elif op == 'X_ADD_A': X = (X - A) & 0xFF elif op == 'Y_ADD_A': Y = (Y - A) & 0xFF elif op == 'A_EOR_Y': A ^= Y elif op == 'ROR_A': k = p & 7 if k: A = ((A << k) | (A >> (8 - k))) & 0xFF return A, X, Y class NetSolver: def __init__(self, host, port, ops): self.host, self.port, self.ops = host, port, ops ctx = ssl.create_default_context() ctx.check_hostname = False ctx.verify_mode = ssl.CERT_NONE self.sock = ctx.wrap_socket(socket.create_connection((host, port)), server_hostname=host) self.buf = b"" def recv_until(self, needle): while needle not in self.buf: chunk = self.sock.recv(4096) if not chunk: raise ConnectionError("closed") self.buf += chunk idx = self.buf.index(needle) out = self.buf[:idx + len(needle)] self.buf = self.buf[idx + len(needle):] return out def send(self, data): if isinstance(data, str): data = data.encode() self.sock.sendall(data) def solve(self): chunk = self.recv_until(b"> ") sys.stdout.write(chunk.decode(errors='replace')); sys.stdout.flush() self.send(b"R\n") outputs = [] while len(outputs) < 20: line = self.recv_until(b"\n").decode() sys.stdout.write(line); sys.stdout.flush() m = re.search(r"Final output #\d+: A=(\d+) X=(\d+) Y=(\d+)", line) if m: outputs.append((int(m.group(1)), int(m.group(2)), int(m.group(3)))) chunk = self.recv_until(b"Input #1 - A,X,Y: ") sys.stdout.write(chunk.decode(errors='replace')); sys.stdout.flush() inversions = [] for (a, x, y) in outputs: inv = invert_ops(self.ops, a, x, y) fwd = exec_ops(self.ops, *inv) assert fwd == (a, x, y), f"Inverse check failed: {inv} -> {fwd} expected {(a,x,y)}" inversions.append(inv) for i, (a, x, y) in enumerate(inversions): payload = f"{a},{x},{y}\n" print(f">>> Input #{i+1}: {payload.strip()}") self.send(payload.encode()) if i < 19: chunk = self.recv_until(b"Input #%d - A,X,Y: " % (i + 2)) sys.stdout.write(chunk.decode(errors='replace')); sys.stdout.flush() try: self.sock.settimeout(10) while True: chunk = self.sock.recv(4096) if not chunk: break sys.stdout.write(chunk.decode(errors='replace')); sys.stdout.flush() except (socket.timeout, ConnectionError): pass def main(): try: with open('ops.pkl', 'rb') as f: ops = pickle.load(f) print(f"[*] Loaded {len(ops)} ops from ops.pkl") except FileNotFoundError: print("[*] Parsing code.bin...") with open('code.bin', 'rb') as f: code = f.read() ops = parse_all(code) with open('ops.pkl', 'wb') as f: pickle.dump(ops, f) print(f"[*] Parsed {len(ops)} ops") host, port = 'favorite-potato.opus4-7.b01le.rs', 8443 print(f"[*] Connecting to {host}:{port}...") NetSolver(host, port, ops).solve() if __name__ == '__main__': main()

Run:

$ python3 solve.py
[*] Loaded 250000 ops from ops.pkl
[*] Connecting to favorite-potato.opus4-7.b01le.rs:8443...
...
Final output #1: A=212 X=6 Y=104
...
Correct!
Here is your flag: bctf{Nev3r_underst00d_why_we_n33d_TSX_and_TXS_unt1l_n0w..:D}

$ cat /etc/motd

Liked this one?

Pro unlocks every writeup, every flag, and API access. $9/mo.

$ cat pricing.md