$ cat writeup.md…
$ cat writeup.md…
umdctf
Task: a stripped static ELF64 validator checks a 106-byte input in 27 four-byte chunks using several .rodata lookup tables, a custom helper routine, and a tiny bytecode VM. Solution: extract the tables and VM bytecode with pyelftools, evaluate the helper concretely with angr, reimplement the VM and rolling state update in Python, and reconstruct the flag directly from the expected chunk values.
No separate organizer description was present in the provided workspace files.
English summary: the challenge provides a stripped static Linux ELF called roulette. The program asks for a roulette number, but the actual validation logic is a scripted reverse-engineering problem: it checks a 106-byte string through a helper arithmetic routine, several lookup tables in .rodata, a tiny bytecode VM, and a rolling per-chunk state.
The first step was simple triage:
file roulette checksec --file=roulette strings roulette
That immediately gave the important high-level picture.
file showed a stripped static ELF64 x86-64 binary.
checksec showed:
strings exposed the user-facing messages and confirmed that this was a normal validator rather than a crypto service or packed binary. The most useful strings were:
lets go gambling!submit roulette number:acceptedrejectedJACKPOT! You guessed the winning number.The environment mattered here. I did not have qemu, gdb, rizin, or readelf, so the solve path had to rely on:
objdumppyelftoolsangrThat ended up being enough.
As a precedent, I also looked at the knowledge-base writeup 20260128_htb_flagcasino. That challenge was not the same, but it was a useful reminder that some gambling-themed reverse tasks are easiest to solve by scripting the validator instead of trying to manually recover every intermediate constant.
Using objdump -d on the interesting region showed that the main validation routine starts at 0x401690.
Important observations from that function:
fgets into a 0x100-byte buffer.strcspn.0x6a = 106 bytes.The last point looks odd at first, because 27 * 4 = 108, not 106. The reason becomes clear once the solver is reconstructed: the recovered byte stream is 108 bytes long, but the last two bytes are 0x00 0x00. The program only expects the first 106 non-NUL user-controlled bytes, so the printable flag string is exactly 106 bytes long and naturally stops before those two trailing zeros.
.rodata tablesThe validator pulls from several fixed tables stored in .rodata:
0x499ce00x499d600x499de00x499e600x499ee0Each table contains 27 dwords, one per 4-byte chunk.
The main loop computes two helper inputs from the current chunk index and rolling state:
r15 ^ 0x28d88f59, rotates it, reduces modulo tblC[i], then XORs with tblA[i](i * 0x45d9f3b) ^ 0x8aa115a0, rotates it, reduces modulo tblD[i], then XORs with tblB[i]Those values are fed into a helper arithmetic routine at 0x401cd0.
0x401cd0The helper is the annoying part of the challenge.
Disassembly shows a deeply nested arithmetic routine with repeated divisions and remainder handling. It is absolutely possible to reimplement it by hand, but it is exactly the kind of code where one wrong signedness assumption or one missed quotient/remainder dependency wastes a lot of time.
That made angr the pragmatic choice here.
Instead of symbolically solving anything hard, I used angr in the simplest possible way: create a blank state with zero-filled unconstrained memory/registers, wrap the helper entry point as a callable, and ask for the concrete return value for each chunk.
This keeps the solver short and avoids writing a fragile manual port of the helper.
After the helper returns, the validator runs a tiny bytecode VM.
The relevant handler addresses are:
0x40182b0x4018a00x4019880x4019a00x4019b80x4019d00x4019e80x401a00The bytecode starts at 0x499be1. Instructions are 8 bytes each, and dispatch is controlled by the previous record's last byte. The stream terminates with opcode 0xff, and the VM returns local slot 6.
The only slightly unusual detail is the startup behavior: there is an initial forced copy before normal dispatch begins. Once that is replicated correctly, the VM is straightforward to port.
The VM uses eight 32-bit local slots. For each chunk:
loc[0] = helper_outloc[1] = ebx (the rolling state)loc[2] = chunk indexThe VM output is then XORed with the corresponding dword from the target table.
That gives the expected plaintext 4-byte chunk:
chunk[i] = target[i] ^ vm_out
Concatenating all 27 reconstructed chunks yields 108 bytes total.
The binary does not validate chunks independently. After each chunk, it updates two rolling 32-bit state values:
r15rbxThe update logic is:
mix = ((helper_out * 0x45d9f3b) ^ rbx ^ chunk) rbx = rol32(mix, (i % 5) + 7) + r15 - 0x61c88647 r15 = r15 + 0x7f4a7c15
This matters because rbx becomes the VM seed for the next iteration.
When the solver is implemented correctly, the final reconstructed state matches the binary's last check:
ebx == 0x6286239c
That is a very useful confirmation that the script matches the real validator rather than merely printing something flag-shaped.
The validator processes 27 groups of 4 bytes, so the recovered stream is naturally 108 bytes long.
However, the accepted user input length is strictly checked to be 106 bytes. The reason is that the last recovered dword ends with two trailing NULs:
... d}\x00\x00
So the true human-entered input is simply the first 106 bytes, which decode directly to:
UMDCTF{I_R3ALLY-want-to-pl4y-the-p0werball,+but-my-d4d-said-no-so-im-b3tting-ill-win-on-POLYMARKETinstead}
The clean solve path was:
objdump to map the validator and identify the chunked structure, helper function, VM handlers, and .rodata addresses.pyelftools to extract the five lookup tables and the VM bytecode directly from .rodata.angr to evaluate the helper function at 0x401cd0 concretely for each chunk.rbx / r15 state update.target[i] ^ vm_out.#!/usr/bin/env python3 import angr from angr import options as o from elftools.elf.elffile import ELFFile MASK32 = 0xffffffff def u32(x): return x & MASK32 def rol32(x, n): x &= MASK32 n &= 31 return ((x << n) | (x >> (32 - n))) & MASK32 p = angr.Project('roulette', auto_load_libs=False) state = p.factory.blank_state( add_options={o.ZERO_FILL_UNCONSTRAINED_MEMORY, o.ZERO_FILL_UNCONSTRAINED_REGISTERS} ) helper = p.factory.callable(0x401cd0, base_state=state) with open('roulette', 'rb') as f: elf = ELFFile(f) ro = elf.get_section_by_name('.rodata') data = ro.data() base = ro['sh_addr'] def dwords(addr, n): off = addr - base return [int.from_bytes(data[off + 4*i:off + 4*(i+1)], 'little') for i in range(n)] def bytes_at(addr, n): off = addr - base return data[off:off+n] target = dwords(0x499ce0, 27) tblA = dwords(0x499d60, 27) tblB = dwords(0x499de0, 27) tblC = dwords(0x499e60, 27) tblD = dwords(0x499ee0, 27) bc = bytes_at(0x499be1, 8 * 40) def exec_handler(loc, op, inst): if op in (0, 1): dst = inst[0] & 7 src = inst[1] & 7 loc[dst] = loc[src] elif op == 2: dst = inst[0] & 7 src = inst[1] & 7 loc[dst] = u32(loc[dst] ^ loc[src]) elif op == 3: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = u32(loc[dst] ^ imm) elif op == 4: dst = inst[0] & 7 src = inst[1] & 7 loc[dst] = u32(loc[dst] + loc[src]) elif op == 5: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = u32(loc[dst] + imm) elif op == 6: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = u32(loc[dst] * imm) elif op == 7: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = rol32(loc[dst], imm) elif op == 8: dst = inst[0] & 7 src = inst[1] & 7 loc[dst] = rol32(loc[dst], loc[src]) elif op == 9: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = u32(loc[dst] >> (imm & 31)) else: raise ValueError(op) def run_vm(helper_out, ebx, idx): loc = [0] * 8 loc[0] = u32(helper_out) loc[1] = u32(ebx) loc[2] = u32(idx) ip = 0 # Initial forced copy before normal dispatch. exec_handler(loc, 1, bc[ip:ip+8]) while True: op = bc[ip + 7] ip += 8 if op > 9: return loc[6] if op == 0xff else 0 exec_handler(loc, op, bc[ip:ip+8]) rbx = 0x8aa115a0 r15 = 0 out = bytearray() for i in range(27): rem1 = rol32(r15 ^ 0x28d88f59, (i % 11) + 5) % tblC[i] arg3 = u32(tblA[i] ^ rem1) rem2 = rol32((i * 0x45d9f3b) ^ 0x8aa115a0, (i % 13) + 3) % tblD[i] arg1 = u32(tblB[i] ^ rem2) helper_out = helper(arg1, tblD[i], arg3, tblC[i]).concrete_value & MASK32 vm_out = run_vm(helper_out, rbx, i) chunk = u32(target[i] ^ vm_out) out.extend(chunk.to_bytes(4, 'little')) mix = u32((helper_out * 0x45d9f3b) ^ rbx ^ chunk) rbx = u32(rol32(mix, (i % 5) + 7) + r15 - 0x61c88647) r15 = u32(r15 + 0x7f4a7c15) assert rbx == 0x6286239c print(out[:106].decode())
Running the script prints:
UMDCTF{I_R3ALLY-want-to-pl4y-the-p0werball,+but-my-d4d-said-no-so-im-b3tting-ill-win-on-POLYMARKETinstead}
$ cat /etc/motd
Liked this one?
Pro unlocks every writeup, every flag, and API access. $9/mo.
$ cat pricing.md$ grep --similar