misc, free, easy

Task: two near-identical UTF-8 text stories where 30 ASCII characters were replaced with higher Unicode code points, hiding data in the offsets. Solution: pair characters position-by-position and concatenate chr(ord(modified) - ord(original)) for each mismatch to recover the flag.

Tags: steganography, unicode, text, diff, ascii_offset

Techniques: character_substitution, unicode_codepoint_diff, ascii_offset_decoding

badyuri — b01lers CTF 2026

Description

Corporate wants you to find the difference between these two files. They are not the same file.

Given: yuri.tar.gz, which contains yuri/yuri.txt and yuri/yuri_1.txt. These are two UTF-8 text stories of roughly 10 KB each that look the same at a glance but differ slightly in byte size. The goal is to recover a hidden flag from the differences.

Analysis

After extracting the archive we have two files:

10249 bytes  yuri.txt     (original)
10273 bytes  yuri_1.txt   (modified, 24 bytes larger)

Both files have 197 lines and, crucially, the same number of Unicode code points (9907 each) — so the modification is a 1-to-1 character substitution, not an insertion. The byte-size difference comes purely from some of the modified characters now needing 2 UTF-8 bytes instead of 1.
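The line, code point, and byte counts above can be reproduced with a few lines of Python. This is a minimal sketch rather than the exact commands used for the challenge: `size_profile` is a hypothetical helper name, and the two strings are short stand-ins built from the sample lines, not the real stories.

```python
def size_profile(text: str) -> tuple[int, int, int]:
    """Return (line count, code point count, UTF-8 byte count) for a string."""
    return (len(text.splitlines()), len(text), len(text.encode("utf-8")))


# Hypothetical one-line stand-ins for yuri.txt and yuri_1.txt
original = "her friend had their toys\n"
modified = "hÈr friend haÆ their toys\n"

print(size_profile(original))  # (1, 26, 26)
print(size_profile(modified))  # (1, 26, 28)
```

Equal code point counts with a larger byte count is the signature of a 1-to-1 substitution. For the real files, the text could be loaded with `pathlib.Path("yuri/yuri.txt").read_bytes().decode("utf-8")`, so that the byte count matches what is on disk.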

Running diff shows exactly 30 differing lines, each with a single ASCII letter swapped for a character from the Latin-1 / extended Unicode range:

< had their toys ...              -->  < haÆ their toys ...
< her friend ...                  -->  < hÈr friend ...
< Additionally ...                -->  < ¼dditionally ...
...
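Because the substitution is 1-to-1, a character-level comparison pinpoints each swap more precisely than a line diff. The sketch below assumes both texts have the same number of code points; `mismatches` is a hypothetical helper name, and the input is one of the sample lines above.

```python
def mismatches(original: str, modified: str) -> list[tuple[int, str, str]]:
    """Return (index, original char, modified char) for every differing position."""
    if len(original) != len(modified):
        raise ValueError("texts must have the same number of code points")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(original, modified))
        if a != b
    ]


# Print each swap together with the code points involved
for i, a, b in mismatches("had their toys", "haÆ their toys"):
    print(f"{i}: {a!r} U+{ord(a):04X} -> {b!r} U+{ord(b):04X}")
# 2: 'd' U+0064 -> 'Æ' U+00C6
```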

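The replacement characters sit just above the ASCII range. That is the hint: the difference in code points is small and deterministic, and in the samples above it is itself a printable ASCII value (Æ minus d is 198 - 100 = 98, which is 'b'). For every mismatched pair (original, modified):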

delta = ord(modified) - ord(original)
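Applied to every mismatch in order, the delta turns into one character of hidden data. A minimal sketch of that decoding step follows; `decode_offsets` is a hypothetical name, and the inputs are only the three sample words shown earlier, not the full stories.

```python
def decode_offsets(original: str, modified: str) -> str:
    """Concatenate chr(ord(modified) - ord(original)) over all mismatched positions."""
    if len(original) != len(modified):
        raise ValueError("texts must have the same number of code points")
    return "".join(
        chr(ord(m) - ord(o))
        for o, m in zip(original, modified)
        if o != m
    )


# The three sample substitutions from the diff excerpt
samples = [("had", "haÆ"), ("her", "hÈr"), ("Additionally", "¼dditionally")]
print([decode_offsets(o, m) for o, m in samples])  # ['b', 'c', '{']
```

Running the same function over the full text of both files would yield the 30 hidden characters in file order. A negative delta makes `chr` raise `ValueError`, which would indicate that the two arguments are in the wrong order.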

...