reversefreemedium

roulette

umdctf

Task: a stripped static ELF64 validator checks a 106-byte input in 27 four-byte chunks using several .rodata lookup tables, a custom helper routine, and a tiny bytecode VM. Solution: extract the tables and VM bytecode with pyelftools, evaluate the helper concretely with angr, reimplement the VM and rolling state update in Python, and reconstruct the flag directly from the expected chunk values.

$ ls tags/ techniques/
table_extractionobjdump_reversingvm_reimplementationconcrete_function_evaluationscripted_validator_recovery

roulette — umdctf

Description

No separate organizer description was present in the provided workspace files.

English summary: the challenge provides a stripped static Linux ELF called roulette. The program asks for a roulette number, but the actual validation logic is a scripted reverse-engineering problem: it checks a 106-byte string through a helper arithmetic routine, several lookup tables in .rodata, a tiny bytecode VM, and a rolling per-chunk state.

Analysis

Recon

The first step was simple triage:

file roulette checksec --file=roulette strings roulette

That immediately gave the important high-level picture.

file showed a stripped static ELF64 x86-64 binary.

checksec showed:

  • Partial RELRO
  • no canary
  • NX enabled
  • no PIE
  • SHSTK enabled
  • IBT enabled

strings exposed the user-facing messages and confirmed that this was a normal validator rather than a crypto service or packed binary. The most useful strings were:

  • lets go gambling!
  • submit roulette number:
  • accepted
  • rejected
  • JACKPOT! You guessed the winning number.

The environment mattered here. I did not have qemu, gdb, rizin, or readelf, so the solve path had to rely on:

  • objdump
  • pyelftools
  • angr

That ended up being enough.

As a precedent, I also looked at the knowledge-base writeup 20260128_htb_flagcasino. That challenge was not the same, but it was a useful reminder that some gambling-themed reverse tasks are easiest to solve by scripting the validator instead of trying to manually recover every intermediate constant.

Main validation routine

Using objdump -d on the interesting region showed that the main validation routine starts at 0x401690.

Important observations from that function:

  1. Input is read with fgets into a 0x100-byte buffer.
  2. The newline is stripped with strcspn.
  3. The program requires the resulting length to be exactly 0x6a = 106 bytes.
  4. Validation then proceeds in 27 chunks of 4 bytes each.

The last point looks odd at first, because 27 * 4 = 108, not 106. The reason becomes clear once the solver is reconstructed: the recovered byte stream is 108 bytes long, but the last two bytes are 0x00 0x00. The program only expects the first 106 non-NUL user-controlled bytes, so the printable flag string is exactly 106 bytes long and naturally stops before those two trailing zeros.

The important .rodata tables

The validator pulls from several fixed tables stored in .rodata:

  • target table: 0x499ce0
  • table A: 0x499d60
  • table B: 0x499de0
  • table C: 0x499e60
  • table D: 0x499ee0

Each table contains 27 dwords, one per 4-byte chunk.

The main loop computes two helper inputs from the current chunk index and rolling state:

  • one branch uses r15 ^ 0x28d88f59, rotates it, reduces modulo tblC[i], then XORs with tblA[i]
  • the other branch uses (i * 0x45d9f3b) ^ 0x8aa115a0, rotates it, reduces modulo tblD[i], then XORs with tblB[i]

Those values are fed into a helper arithmetic routine at 0x401cd0.

The helper function at 0x401cd0

The helper is the annoying part of the challenge.

Disassembly shows a deeply nested arithmetic routine with repeated divisions and remainder handling. It is absolutely possible to reimplement it by hand, but it is exactly the kind of code where one wrong signedness assumption or one missed quotient/remainder dependency wastes a lot of time.

That made angr the pragmatic choice here.

Instead of symbolically solving anything hard, I used angr in the simplest possible way: create a blank state with zero-filled unconstrained memory/registers, wrap the helper entry point as a callable, and ask for the concrete return value for each chunk.

This keeps the solver short and avoids writing a fragile manual port of the helper.

The embedded VM

After the helper returns, the validator runs a tiny bytecode VM.

The relevant handler addresses are:

  • copy at 0x40182b
  • rol-by-reg at 0x4018a0
  • mul-imm at 0x401988
  • add-imm at 0x4019a0
  • add-reg at 0x4019b8
  • xor-imm at 0x4019d0
  • xor-reg at 0x4019e8
  • shr-imm at 0x401a00

The bytecode starts at 0x499be1. Instructions are 8 bytes each, and dispatch is controlled by the previous record's last byte. The stream terminates with opcode 0xff, and the VM returns local slot 6.

The only slightly unusual detail is the startup behavior: there is an initial forced copy before normal dispatch begins. Once that is replicated correctly, the VM is straightforward to port.

The VM uses eight 32-bit local slots. For each chunk:

  • loc[0] = helper_out
  • loc[1] = ebx (the rolling state)
  • loc[2] = chunk index

The VM output is then XORed with the corresponding dword from the target table.

That gives the expected plaintext 4-byte chunk:

chunk[i] = target[i] ^ vm_out

Concatenating all 27 reconstructed chunks yields 108 bytes total.

Rolling state update

The binary does not validate chunks independently. After each chunk, it updates two rolling 32-bit state values:

  • r15
  • rbx

The update logic is:

mix = ((helper_out * 0x45d9f3b) ^ rbx ^ chunk) rbx = rol32(mix, (i % 5) + 7) + r15 - 0x61c88647 r15 = r15 + 0x7f4a7c15

This matters because rbx becomes the VM seed for the next iteration.

When the solver is implemented correctly, the final reconstructed state matches the binary's last check:

ebx == 0x6286239c

That is a very useful confirmation that the script matches the real validator rather than merely printing something flag-shaped.

Why the result is 108 bytes but the input is 106 bytes

The validator processes 27 groups of 4 bytes, so the recovered stream is naturally 108 bytes long.

However, the accepted user input length is strictly checked to be 106 bytes. The reason is that the last recovered dword ends with two trailing NULs:

... d}\x00\x00

So the true human-entered input is simply the first 106 bytes, which decode directly to:

UMDCTF{I_R3ALLY-want-to-pl4y-the-p0werball,+but-my-d4d-said-no-so-im-b3tting-ill-win-on-POLYMARKETinstead}

Solution

The clean solve path was:

  1. Use objdump to map the validator and identify the chunked structure, helper function, VM handlers, and .rodata addresses.
  2. Use pyelftools to extract the five lookup tables and the VM bytecode directly from .rodata.
  3. Use angr to evaluate the helper function at 0x401cd0 concretely for each chunk.
  4. Reimplement the tiny VM in Python.
  5. Recreate the rolling rbx / r15 state update.
  6. Compute each chunk as target[i] ^ vm_out.
  7. Join all 27 chunks, then print only the first 106 bytes.
#!/usr/bin/env python3 import angr from angr import options as o from elftools.elf.elffile import ELFFile MASK32 = 0xffffffff def u32(x): return x & MASK32 def rol32(x, n): x &= MASK32 n &= 31 return ((x << n) | (x >> (32 - n))) & MASK32 p = angr.Project('roulette', auto_load_libs=False) state = p.factory.blank_state( add_options={o.ZERO_FILL_UNCONSTRAINED_MEMORY, o.ZERO_FILL_UNCONSTRAINED_REGISTERS} ) helper = p.factory.callable(0x401cd0, base_state=state) with open('roulette', 'rb') as f: elf = ELFFile(f) ro = elf.get_section_by_name('.rodata') data = ro.data() base = ro['sh_addr'] def dwords(addr, n): off = addr - base return [int.from_bytes(data[off + 4*i:off + 4*(i+1)], 'little') for i in range(n)] def bytes_at(addr, n): off = addr - base return data[off:off+n] target = dwords(0x499ce0, 27) tblA = dwords(0x499d60, 27) tblB = dwords(0x499de0, 27) tblC = dwords(0x499e60, 27) tblD = dwords(0x499ee0, 27) bc = bytes_at(0x499be1, 8 * 40) def exec_handler(loc, op, inst): if op in (0, 1): dst = inst[0] & 7 src = inst[1] & 7 loc[dst] = loc[src] elif op == 2: dst = inst[0] & 7 src = inst[1] & 7 loc[dst] = u32(loc[dst] ^ loc[src]) elif op == 3: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = u32(loc[dst] ^ imm) elif op == 4: dst = inst[0] & 7 src = inst[1] & 7 loc[dst] = u32(loc[dst] + loc[src]) elif op == 5: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = u32(loc[dst] + imm) elif op == 6: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = u32(loc[dst] * imm) elif op == 7: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = rol32(loc[dst], imm) elif op == 8: dst = inst[0] & 7 src = inst[1] & 7 loc[dst] = rol32(loc[dst], loc[src]) elif op == 9: dst = inst[0] & 7 imm = int.from_bytes(inst[3:7], 'little') loc[dst] = u32(loc[dst] >> (imm & 31)) else: raise ValueError(op) def run_vm(helper_out, ebx, idx): loc = [0] * 8 loc[0] = u32(helper_out) loc[1] = u32(ebx) loc[2] = u32(idx) ip = 0 # Initial forced copy before normal dispatch. exec_handler(loc, 1, bc[ip:ip+8]) while True: op = bc[ip + 7] ip += 8 if op > 9: return loc[6] if op == 0xff else 0 exec_handler(loc, op, bc[ip:ip+8]) rbx = 0x8aa115a0 r15 = 0 out = bytearray() for i in range(27): rem1 = rol32(r15 ^ 0x28d88f59, (i % 11) + 5) % tblC[i] arg3 = u32(tblA[i] ^ rem1) rem2 = rol32((i * 0x45d9f3b) ^ 0x8aa115a0, (i % 13) + 3) % tblD[i] arg1 = u32(tblB[i] ^ rem2) helper_out = helper(arg1, tblD[i], arg3, tblC[i]).concrete_value & MASK32 vm_out = run_vm(helper_out, rbx, i) chunk = u32(target[i] ^ vm_out) out.extend(chunk.to_bytes(4, 'little')) mix = u32((helper_out * 0x45d9f3b) ^ rbx ^ chunk) rbx = u32(rol32(mix, (i % 5) + 7) + r15 - 0x61c88647) r15 = u32(r15 + 0x7f4a7c15) assert rbx == 0x6286239c print(out[:106].decode())

Running the script prints:

UMDCTF{I_R3ALLY-want-to-pl4y-the-p0werball,+but-my-d4d-said-no-so-im-b3tting-ill-win-on-POLYMARKETinstead}

$ cat /etc/motd

Liked this one?

Pro unlocks every writeup, every flag, and API access. $9/mo.

$ cat pricing.md

$ grep --similar

Similar writeups