Favorite Potato
b01lersc
Task: predict 20 random (A,X,Y) 6502 register inputs from their outputs after running a 5.8 MB C64 binary made of ~250k invertible macro blocks. Solution: write a minimal 6502 emulator, pattern-match the ~10 macro types (ADD/EOR/SWAP/ROR/reg-combine), invert each op and apply the chain in reverse to recover the input.
$ ls tags/ techniques/
Favorite Potato — b01lers CTF
Description
Yay, at last - I managed to upgrade my old potato. Now I can run suuuuper loooong binaries that nobody can reverse (RAM is still 64k but I only need a minimal amount).
ncat --ssl favorite-potato.opus4-7.b01le.rs 8443
Files provided:
favorite_potato.py— server loop. Generates 20 random(A0,X0,Y0)triples, runsrun_c64("code.bin", A0, X0, Y0)on each, prints only the outputs, and asks the player to tell it back the 20 inputs. Getting all 20 right prints the flag.code.bin— 5.8 MB of 6502 machine code.test.bin— a 9-byte sanity-check binary used by theT)estmenu option.screenshot.png— a C64 BASIC screen showing the calling convention:POKE 780,A : POKE 781,X : POKE 782,Y : SYS …thenPEEK(780/781/782)to read results.
Goal: invert the 5.8 MB program on 24 bits of state.
Analysis
Calling convention
The screenshot shows the standard C64 KERNAL convention — three memory cells (780/781/782) shadow the CPU registers A/X/Y. Before SYS, BASIC copies them into the 6502, then PEEK reads them back after RTS. So the entire "program I/O" is really (A,X,Y) ∈ [0,255]³ — just 24 bits. That's the crucial clue: whatever the 5.8 MB binary does, it is a map {0,1}²⁴ → {0,1}²⁴.
Why the "potato upgrade" is a joke
A real C64 has 64 K of RAM, so you physically cannot load a 5.8 MB image. The emulator used on the server clearly doesn't try — the binary is executed as one long straight-line blob. A quick opcode histogram of code.bin confirms there are no memory loads/stores at all: only register moves, stack ops, and register arithmetic. Everything lives in A, X, Y and on the hardware stack page.
Only 23 distinct opcodes show up across the whole 5.8 MB:
PHP(08) PLP(28) PHA(48) PLA(68) CLC(18) SEC(38)
DEX(CA) DEY(88) INX(E8) INY(C8)
TXA(8A) TYA(98) TAX(AA) TAY(A8) TXS(9A) TSX(BA)
LSR_A(4A) EOR_imm(49) ADC_imm(69) SBC_imm(E9) ORA_imm(09)
LDX_imm(A2) BCC(90) BNE(D0) RTS(60)
A tiny Python emulator (emu.py, ~150 lines) covering exactly these opcodes is enough to run code.bin. Verify against test.bin (08 18 69 2A CA C8 C8 28 60 = PHP;CLC;ADC #$2A;DEX;INY;INY;PLP;RTS → A+=42, X-=1, Y+=2), then compare with the live server's T)est option.
Running 250k macros through the naïve emulator takes ~4.7 s per sample — too slow. We need structure.
Macro structure
Scrolling code.bin you quickly see a repeating frame: PHP ... PLP around each block. PHP pushes the flags, PLP restores them — every macro is flag-neutral. Between the fences the macro manipulates A/X/Y only.
There are only ~10 distinct macros. Each has a fixed byte pattern, sometimes with an immediate parameter. Here are the most interesting ones:
ADD_A imm (5 bytes) — 08 18 69 II 28 = PHP;CLC;ADC #II;PLP → A += II.
EOR_A imm (4 bytes) — 08 49 II 28 → A ^= II.
SWAP_AX (14 bytes) — 6502 has no reg↔reg swap and the stack has no "pick", so the author abuses TXS/TSX (transfer X↔stack-pointer!) as a private register:
08 PHP
48 PHA ; [SP] = A
8A 48 TXA; PHA ; [SP-1] = X
BA TSX ; X = SP (point to X-slot)
E8 INX
9A TXS ; SP now points above A's slot, but pulls read from SP+1
68 PLA ; A = old X (from the slot at SP+1)
CA DEX; 9A TXS ; restore SP to point at original A slot
AA TAX ; oh wait — elided here, see pattern
68 28 28 PLA;PLP;PLP ; cleanup
The net effect is A ↔ X, but it only works because TSX/TXS let you move the SP register into the arithmetic world and back. This is the "TSX and TXS" that the flag rants about.
SWAP_AY (20 bytes) and SWAP_XY (10 bytes) — same idea with Y.
X_ADD_A (10 bytes) — 08 48 E8 38 E9 01 D0 FA 68 28 = push A, then loop INX; SEC; SBC #1; BNE -6 until A == 0, restore A. So X += original_A. Y_ADD_A is symmetric with INY.
A_EOR_Y (90 bytes) — eight copies of "test bit k of Y; if set, A ^= (1<<k)" unrolled. Net effect: A ^= Y.
ROR_A k (26 bytes) — LDX #k then a loop of LSR A; BCC +2; ORA #$80; DEX; BNE → rotate A right by k mod 8.
Every one of these macros is invertible:
| Macro | Inverse |
|---|---|
ADD_A c | SUB_A c (i.e. A -= c) |
EOR_A c | EOR_A c (self-inverse) |
SWAP_AX/AY/XY | itself |
X_ADD_A | X -= A (A is unchanged by the macro) |
Y_ADD_A | Y -= A |
A_EOR_Y | itself (Y is unchanged) |
ROR_A k | ROL_A k |
So the whole 5.8 MB program is a composition of bijections on 24 bits — a pure permutation. Given the output we can walk the macro list in reverse, applying each inverse, and read off the original input in microseconds.
Why this is fast
Parsing reduces 250 000 macros of 6502 bytecode to 250 000 tuples like ('ADD_A', 0x42) or ('SWAP_AX', None). Executing those tuples in Python is ~260× faster than byte-level emulation: ~18 ms per full run vs. ~4.7 s. Inversion has the same cost — 20 inverses well under a second, easily within the server timeout.
Solution
Step 1 — minimal 6502 emulator (emu.py)
A straight translation of the 23 opcodes. Flag bits tracked: only C, Z, N are ever tested (by BCC, BNE). Stack memory lives in mem[0x100 | SP]. Verified on test.bin and cross-checked against the server's T)est mode.
Step 2 — macro parser
Match each macro's exact byte pattern at the current PC. For parameterised macros (ADD_A, EOR_A, ROR_A) extract the immediate. Skip the final RTS.
PAT_SWAP_AX = bytes.fromhex("08488a48bae89a68ca9aaa682828") PAT_SWAP_AY = bytes.fromhex("084898488a48bae8e89a68a8caca9a68aa682828") PAT_SWAP_XY = bytes.fromhex("08488a4898aa68a86828") PAT_X_ADD_A = bytes.fromhex("0848e838e901d0fa6828") PAT_Y_ADD_A = bytes.fromhex("0848c838e901d0fa6828") PAT_A_EOR_Y = bytes.fromhex( "084898488a48bae89a68" "4a9006a868490148984a9006a868490248984a9006a868490448984a9006a868" "490848984a9006a868491048984a9006a868492048984a9006a868494048984a" "9006a86849804898" "ca9a68aa68a86828") ROR_A_PRE = bytes.fromhex("08488a48bae89a68a2") ROR_A_POST = bytes.fromhex("4a90020980cad0f848baca9a68aa6828")
Parsing code.bin yields exactly 250 000 ops — cached to ops.pkl so subsequent runs skip the ~5 s parse.
Step 3 — forward and inverse interpreters
def exec_ops(ops, A, X, Y): for op, p in ops: if op == 'ADD_A': A = (A + p) & 0xFF elif op == 'SUB_A': A = (A - p) & 0xFF elif op == 'EOR_A': A ^= p elif op == 'SWAP_AX': A, X = X, A elif op == 'SWAP_AY': A, Y = Y, A elif op == 'SWAP_XY': X, Y = Y, X elif op == 'X_ADD_A': X = (X + A) & 0xFF elif op == 'Y_ADD_A': Y = (Y + A) & 0xFF elif op == 'A_EOR_Y': A ^= Y elif op == 'ROR_A': k = p & 7 if k: A = ((A >> k) | (A << (8 - k))) & 0xFF return A, X, Y def invert_ops(ops, A, X, Y): for op, p in reversed(ops): if op == 'ADD_A': A = (A - p) & 0xFF elif op == 'SUB_A': A = (A + p) & 0xFF elif op == 'EOR_A': A ^= p elif op == 'SWAP_AX': A, X = X, A elif op == 'SWAP_AY': A, Y = Y, A elif op == 'SWAP_XY': X, Y = Y, X elif op == 'X_ADD_A': X = (X - A) & 0xFF elif op == 'Y_ADD_A': Y = (Y - A) & 0xFF elif op == 'A_EOR_Y': A ^= Y elif op == 'ROR_A': k = p & 7 if k: A = ((A << k) | (A >> (8 - k))) & 0xFF return A, X, Y
Sanity check: for any (A,X,Y), exec_ops(ops, *invert_ops(ops, A, X, Y)) == (A, X, Y). Add this assertion inside the solver before submitting — it catches a wrong inverse before we waste our one shot at the server.
Step 4 — network solver
Connect with ssl.create_default_context() + check_hostname=False + CERT_NONE (the server uses a self-signed cert), send R, parse 20 lines of the form Final output #i: A=.. X=.. Y=.., invert each, then feed the triples back as a,x,y\n at each Input #i - A,X,Y: prompt.
Full solver
#!/usr/bin/env python3 """Solver for 'Favorite Potato' CTF challenge. Strategy: 1. code.bin is a sequence of ~250k macro-expanded 6502 operations. 2. Each macro is a simple invertible operation on (A, X, Y) state. 3. Parse macros once, invert them by applying inverses in reverse order. """ import pickle, re, socket, ssl, sys # --- Macro patterns --- PAT_SWAP_AX = bytes.fromhex("08488a48bae89a68ca9aaa682828") PAT_SWAP_AY = bytes.fromhex("084898488a48bae8e89a68a8caca9a68aa682828") PAT_SWAP_XY = bytes.fromhex("08488a4898aa68a86828") PAT_X_ADD_A = bytes.fromhex("0848e838e901d0fa6828") PAT_Y_ADD_A = bytes.fromhex("0848c838e901d0fa6828") PAT_A_EOR_Y = bytes.fromhex( "084898488a48bae89a68" "4a9006a868490148984a9006a868490248984a9006a868490448984a9006a868" "490848984a9006a868491048984a9006a868492048984a9006a868494048984a" "9006a86849804898" "ca9a68aa68a86828") ROR_A_PRE = bytes.fromhex("08488a48bae89a68a2") ROR_A_POST = bytes.fromhex("4a90020980cad0f848baca9a68aa6828") def parse_macro(code, pc): if pc + 5 <= len(code): if code[pc] == 0x08 and code[pc+1] == 0x18 and code[pc+2] == 0x69 and code[pc+4] == 0x28: return ('ADD_A', code[pc+3], 5) if code[pc] == 0x08 and code[pc+1] == 0x38 and code[pc+2] == 0xE9 and code[pc+4] == 0x28: return ('SUB_A', code[pc+3], 5) if pc + 4 <= len(code): if code[pc] == 0x08 and code[pc+1] == 0x49 and code[pc+3] == 0x28: return ('EOR_A', code[pc+2], 4) if code[pc:pc+len(PAT_SWAP_AX)] == PAT_SWAP_AX: return ('SWAP_AX', None, len(PAT_SWAP_AX)) if code[pc:pc+len(PAT_SWAP_AY)] == PAT_SWAP_AY: return ('SWAP_AY', None, len(PAT_SWAP_AY)) if code[pc:pc+len(PAT_SWAP_XY)] == PAT_SWAP_XY: return ('SWAP_XY', None, len(PAT_SWAP_XY)) if code[pc:pc+len(PAT_X_ADD_A)] == PAT_X_ADD_A: return ('X_ADD_A', None, len(PAT_X_ADD_A)) if code[pc:pc+len(PAT_Y_ADD_A)] == PAT_Y_ADD_A: return ('Y_ADD_A', None, len(PAT_Y_ADD_A)) if code[pc:pc+len(PAT_A_EOR_Y)] == PAT_A_EOR_Y: return ('A_EOR_Y', None, len(PAT_A_EOR_Y)) if (code[pc:pc+len(ROR_A_PRE)] == ROR_A_PRE and code[pc+len(ROR_A_PRE)+1:pc+len(ROR_A_PRE)+1+len(ROR_A_POST)] == ROR_A_POST): return ('ROR_A', code[pc+len(ROR_A_PRE)], len(ROR_A_PRE)+1+len(ROR_A_POST)) return None def parse_all(code): ops, pc = [], 0 while pc < len(code): if code[pc] == 0x60: # RTS pc += 1; continue m = parse_macro(code, pc) if m is None: raise ValueError(f"Parse fail at pc={pc:x}: {code[pc:pc+30].hex()}") op, param, sz = m ops.append((op, param)); pc += sz return ops def exec_ops(ops, A, X, Y): for op, p in ops: if op == 'ADD_A': A = (A + p) & 0xFF elif op == 'SUB_A': A = (A - p) & 0xFF elif op == 'EOR_A': A ^= p elif op == 'SWAP_AX': A, X = X, A elif op == 'SWAP_AY': A, Y = Y, A elif op == 'SWAP_XY': X, Y = Y, X elif op == 'X_ADD_A': X = (X + A) & 0xFF elif op == 'Y_ADD_A': Y = (Y + A) & 0xFF elif op == 'A_EOR_Y': A ^= Y elif op == 'ROR_A': k = p & 7 if k: A = ((A >> k) | (A << (8 - k))) & 0xFF return A, X, Y def invert_ops(ops, A, X, Y): for op, p in reversed(ops): if op == 'ADD_A': A = (A - p) & 0xFF elif op == 'SUB_A': A = (A + p) & 0xFF elif op == 'EOR_A': A ^= p elif op == 'SWAP_AX': A, X = X, A elif op == 'SWAP_AY': A, Y = Y, A elif op == 'SWAP_XY': X, Y = Y, X elif op == 'X_ADD_A': X = (X - A) & 0xFF elif op == 'Y_ADD_A': Y = (Y - A) & 0xFF elif op == 'A_EOR_Y': A ^= Y elif op == 'ROR_A': k = p & 7 if k: A = ((A << k) | (A >> (8 - k))) & 0xFF return A, X, Y class NetSolver: def __init__(self, host, port, ops): self.host, self.port, self.ops = host, port, ops ctx = ssl.create_default_context() ctx.check_hostname = False ctx.verify_mode = ssl.CERT_NONE self.sock = ctx.wrap_socket(socket.create_connection((host, port)), server_hostname=host) self.buf = b"" def recv_until(self, needle): while needle not in self.buf: chunk = self.sock.recv(4096) if not chunk: raise ConnectionError("closed") self.buf += chunk idx = self.buf.index(needle) out = self.buf[:idx + len(needle)] self.buf = self.buf[idx + len(needle):] return out def send(self, data): if isinstance(data, str): data = data.encode() self.sock.sendall(data) def solve(self): chunk = self.recv_until(b"> ") sys.stdout.write(chunk.decode(errors='replace')); sys.stdout.flush() self.send(b"R\n") outputs = [] while len(outputs) < 20: line = self.recv_until(b"\n").decode() sys.stdout.write(line); sys.stdout.flush() m = re.search(r"Final output #\d+: A=(\d+) X=(\d+) Y=(\d+)", line) if m: outputs.append((int(m.group(1)), int(m.group(2)), int(m.group(3)))) chunk = self.recv_until(b"Input #1 - A,X,Y: ") sys.stdout.write(chunk.decode(errors='replace')); sys.stdout.flush() inversions = [] for (a, x, y) in outputs: inv = invert_ops(self.ops, a, x, y) fwd = exec_ops(self.ops, *inv) assert fwd == (a, x, y), f"Inverse check failed: {inv} -> {fwd} expected {(a,x,y)}" inversions.append(inv) for i, (a, x, y) in enumerate(inversions): payload = f"{a},{x},{y}\n" print(f">>> Input #{i+1}: {payload.strip()}") self.send(payload.encode()) if i < 19: chunk = self.recv_until(b"Input #%d - A,X,Y: " % (i + 2)) sys.stdout.write(chunk.decode(errors='replace')); sys.stdout.flush() try: self.sock.settimeout(10) while True: chunk = self.sock.recv(4096) if not chunk: break sys.stdout.write(chunk.decode(errors='replace')); sys.stdout.flush() except (socket.timeout, ConnectionError): pass def main(): try: with open('ops.pkl', 'rb') as f: ops = pickle.load(f) print(f"[*] Loaded {len(ops)} ops from ops.pkl") except FileNotFoundError: print("[*] Parsing code.bin...") with open('code.bin', 'rb') as f: code = f.read() ops = parse_all(code) with open('ops.pkl', 'wb') as f: pickle.dump(ops, f) print(f"[*] Parsed {len(ops)} ops") host, port = 'favorite-potato.opus4-7.b01le.rs', 8443 print(f"[*] Connecting to {host}:{port}...") NetSolver(host, port, ops).solve() if __name__ == '__main__': main()
Run:
$ python3 solve.py
[*] Loaded 250000 ops from ops.pkl
[*] Connecting to favorite-potato.opus4-7.b01le.rs:8443...
...
Final output #1: A=212 X=6 Y=104
...
Correct!
Here is your flag: bctf{Nev3r_underst00d_why_we_n33d_TSX_and_TXS_unt1l_n0w..:D}
$ cat /etc/motd
Liked this one?
Pro unlocks every writeup, every flag, and API access. $9/mo.
$ cat pricing.md