Task: two near-identical UTF-8 text stories where 30 ASCII characters were replaced with higher Unicode code points, hiding data in the offsets. Solution: pair characters position-by-position and concatenate chr(ord(modified) - ord(original)) for each mismatch to recover the flag.
badyuri — b01lers CTF 2026
Description
Corporate wants you to find the difference between these two files. They are not the same file.
Given: yuri.tar.gz containing yuri/yuri.txt and yuri/yuri_1.txt — two roughly 10 KB UTF-8 text stories that read identically to the eye but have slightly different byte sizes. The goal is to recover a hidden flag from the differences.
Analysis
After extracting the archive we have two files:
10249 bytes yuri.txt (original)
10273 bytes yuri_1.txt (modified, 24 bytes larger)
Both files have 197 lines and, crucially, the same number of Unicode code points (9907 each), so the modification is a one-to-one character substitution, not an insertion. The byte-size difference comes purely from replacement characters that land above U+007F and therefore need 2 UTF-8 bytes instead of 1. With 30 substitutions and only 24 extra bytes, a few replacements evidently still fit in a single byte.
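The size comparison is easy to reproduce. The following is a minimal sketch, not part of the original solve: `size_profile` is a hypothetical helper name, and the two sample strings are stand-ins for the real files.

```python
def size_profile(text: str) -> tuple[int, int]:
    """Return (code point count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


# Stand-in samples (not the challenge files): one ASCII letter swapped for U+00C6.
original = "had their toys"
modified = "ha\u00c6 their toys"

print(size_profile(original))  # (14, 14)
print(size_profile(modified))  # (14, 15)
```

Equal code point counts with a larger byte count is the signature of an in-place substitution that pushed a character past U+007F.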
Running diff shows exactly 30 changed lines. In each one a single ASCII character has been swapped for a different character, in the excerpts below from the Latin-1 Supplement range:
< had their toys ... --> < haÆ their toys ...
< her friend ... --> < hÈr friend ...
< Additionally ... --> < ¼dditionally ...
...
The replacements never stray far from the originals in code point order. That is the hint: the difference in code points is small and deterministic. For every mismatched pair (original, modified):
delta = ord(modified) - ord(original)
lies in the printable ASCII range [0x20, 0x7E]. So each substitution is encoding one byte of hidden data via the offset.
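As a quick check of that claim, the three pairs visible in the diff excerpt can be decoded by hand. This is an illustrative sketch: `decode_pair` is a hypothetical helper, and the pairs are only the ones shown in the excerpt, which is elided and so not necessarily consecutive in the file.

```python
# (original, modified) pairs taken from the diff excerpt: d/Æ, e/È, A/¼
pairs = [("d", "\u00c6"), ("e", "\u00c8"), ("A", "\u00bc")]


def decode_pair(original: str, modified: str) -> str:
    """Turn one substitution into the hidden character it encodes."""
    delta = ord(modified) - ord(original)
    # Anything outside printable ASCII would mean the scheme is different.
    if not 0x20 <= delta <= 0x7E:
        raise ValueError(f"delta {delta:#x} is not printable ASCII")
    return chr(delta)


decoded = [decode_pair(o, m) for o, m in pairs]
print(decoded)  # ['b', 'c', '{']
```

The three results are all characters of the `bctf{` flag prefix, which supports reading the offsets as data.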
Solution
Walk both files character-by-character, compute the delta at every mismatch, and concatenate the results in reading order.
    #!/usr/bin/env python3
    # Recover the hidden flag from badyuri

    with open('yuri/yuri.txt', encoding='utf-8') as f:
        orig = f.read()
    with open('yuri/yuri_1.txt', encoding='utf-8') as f:
        mod = f.read()

    assert len(orig) == len(mod), "Code point counts differ"

    flag = ''.join(
        chr(ord(m) - ord(o))
        for o, m in zip(orig, mod)
        if o != m
    )
    print(flag)
Output:
bctf{w3_l0ve_yur1_rB4DN8aULH9}
The 30 mismatched positions yield the 30-character flag, one ASCII character per substitution — a neat little Unicode stego channel.
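To see why the channel round-trips cleanly, it helps to build a toy version of it. The sketch below is an assumption about how such a file could be produced, not the challenge author's actual generator: `embed`, `extract`, the cover sentence, and the chosen positions are all hypothetical.

```python
def embed(cover: str, secret: str, positions: list[int]) -> str:
    """Shift one cover character upward by ord(c) for each secret character c."""
    if len(positions) != len(secret):
        raise ValueError("need exactly one position per secret character")
    chars = list(cover)
    # Sorting keeps the secret in reading order, which is what extraction assumes.
    for pos, ch in zip(sorted(positions), secret):
        chars[pos] = chr(ord(chars[pos]) + ord(ch))
    return "".join(chars)


def extract(original: str, modified: str) -> str:
    """Recover the secret from the code point offsets at mismatched positions."""
    if len(original) != len(modified):
        raise ValueError("code point counts differ")
    return "".join(
        chr(ord(m) - ord(o)) for o, m in zip(original, modified) if o != m
    )


cover = "They had their toys and her friend."
secret = "bctf"
stego = embed(cover, secret, [7, 12, 20, 30])

assert len(stego) == len(cover)          # substitution, not insertion
assert extract(cover, stego) == secret   # offsets carry the message
```

Because every secret character has a nonzero code point, each embedded position is guaranteed to differ from the cover, so no character of the message can be lost during extraction.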