misc  free  easy

badyuri


Task: two near-identical UTF-8 text stories where 30 ASCII characters were replaced with higher Unicode code points, hiding data in the offsets. Solution: pair characters position-by-position and concatenate chr(ord(modified) - ord(original)) for each mismatch to recover the flag.

$ ls tags/ techniques/
character_substitution  unicode_codepoint_diff  ascii_offset_decoding

badyuri — b01lers CTF 2026

Description

Corporate wants you to find the difference between these two files. They are not the same file.

Given: yuri.tar.gz containing yuri/yuri.txt and yuri/yuri_1.txt — two roughly 10 KB UTF-8 text stories that read identically to the eye but have slightly different byte sizes. The goal is to recover a hidden flag from the differences.

Analysis

After extracting the archive we have two files:

10249 bytes  yuri.txt     (original)
10273 bytes  yuri_1.txt   (modified, 24 bytes larger)

Both files have 197 lines and, crucially, the same number of Unicode code points (9907 each) — so the modification is a 1-to-1 character substitution, not an insertion. The byte-size difference comes purely from some of the modified characters now needing 2 UTF-8 bytes instead of 1.
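The size comparison above can be reproduced with a few lines of Python. This is a minimal sketch on toy strings, not the real challenge files; `text_stats` is a hypothetical helper name introduced here for illustration.

```python
def text_stats(data: bytes) -> dict:
    """Return byte, code point, and line counts for UTF-8 encoded data."""
    text = data.decode("utf-8")
    return {
        "bytes": len(data),          # size on disk
        "code_points": len(text),    # Python str length counts code points
        "lines": text.count("\n"),
    }

# Toy stand-ins for the two stories (assumed sample text, not the real files).
# "\u00c6" is Æ and "\u00c8" is È; each needs 2 bytes in UTF-8.
original = "had their toys\nher friend\n".encode("utf-8")
modified = "ha\u00c6 their toys\nh\u00c8r friend\n".encode("utf-8")

a, b = text_stats(original), text_stats(modified)
print(a)
print(b)
print("extra bytes:", b["bytes"] - a["bytes"])
```

Equal code point counts with unequal byte counts is the signature of a substitution that pushes some characters from the 1-byte range (below U+0080) into the 2-byte range. On the real files one would pass the result of `open(path, 'rb').read()` instead of the toy data.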

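Running diff shows exactly 30 changed lines, each with a single ASCII character swapped for a character at a higher code point, such as one from the Latin-1 Supplement block (U+0080 to U+00FF). In the excerpt below, the original line is on the left and its modified counterpart on the right: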

< had their toys ...              -->  < haÆ their toys ...
< her friend ...                  -->  < hÈr friend ...
< Additionally ...                -->  < ¼dditionally ...
...
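If diff is not at hand, the same mismatches can be located in Python. The sketch below assumes the two texts stay line-aligned (true for a 1-to-1 substitution that never touches a newline) and uses toy strings rather than the real stories; `find_mismatches` is a hypothetical helper name.

```python
def find_mismatches(orig: str, mod: str):
    """Yield (line_no, column, original_char, modified_char) per differing position."""
    # Split on "\n" only: str.splitlines() also breaks on characters such as
    # U+0085, which could itself appear as a replacement character.
    line_pairs = zip(orig.split("\n"), mod.split("\n"))
    for line_no, (o_line, m_line) in enumerate(line_pairs, start=1):
        for col, (o, m) in enumerate(zip(o_line, m_line), start=1):
            if o != m:
                yield line_no, col, o, m

# Assumed sample text, not the real challenge files.
orig = "had their toys\nher friend\nnothing changed here\n"
mod = "ha\u00c6 their toys\nh\u00c8r friend\nnothing changed here\n"

hits = list(find_mismatches(orig, mod))
for line_no, col, o, m in hits:
    print(f"line {line_no}, col {col}: U+{ord(o):04X} -> U+{ord(m):04X}")
```

Printing code points rather than glyphs makes the pattern easier to spot than the rendered diff does.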

The replacement characters sit just above the ASCII range. That is the hint: the difference in code points is small and deterministic. For every mismatched pair (original, modified), the value

delta = ord(modified) - ord(original)

lies in the printable ASCII range [0x20, 0x7E]. Each substitution therefore encodes one character of hidden data as the offset.
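As a worked check, the three substitutions visible in the diff excerpt can be decoded by hand. The sketch below does that arithmetic; `delta_char` is a hypothetical helper name, and the range check simply restates the printable-ASCII observation.

```python
# (original, modified) pairs taken from the diff excerpt:
# d -> Æ (U+00C6), e -> È (U+00C8), A -> ¼ (U+00BC)
pairs = [("d", "\u00c6"), ("e", "\u00c8"), ("A", "\u00bc")]

def delta_char(original: str, modified: str) -> str:
    """Decode one substitution into the hidden character it carries."""
    delta = ord(modified) - ord(original)
    if not 0x20 <= delta <= 0x7E:
        raise ValueError(f"delta {delta:#x} is outside printable ASCII")
    return chr(delta)

decoded = [delta_char(o, m) for o, m in pairs]
print(decoded)  # ['b', 'c', '{']
```

For example, U+00C6 minus U+0064 is 0x62, which is `b`. All three decoded characters appear in the `bctf{` prefix of the flag.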

Solution

Walk both files character-by-character, compute the delta at every mismatch, and concatenate the results in reading order.

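#!/usr/bin/env python3
# Recover the hidden flag from badyuri

# Read both stories as text so that indexing works on code points, not bytes.
with open('yuri/yuri.txt', encoding='utf-8') as f:
    orig = f.read()
with open('yuri/yuri_1.txt', encoding='utf-8') as f:
    mod = f.read()

# A 1-to-1 substitution keeps the code point count unchanged.
assert len(orig) == len(mod), "Code point counts differ"

# Each mismatch contributes one flag character: the code point difference.
flag = ''.join(
    chr(ord(m) - ord(o))
    for o, m in zip(orig, mod)
    if o != m
)
print(flag)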

Output:

bctf{w3_l0ve_yur1_rB4DN8aULH9}

The 30 mismatched positions yield the 30-character flag, one ASCII character per substitution — a neat little Unicode stego channel.
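For completeness, here is a sketch of how such a channel could be built, which also gives a round-trip test for the decoder. This is a guess at the construction, not the challenge author's actual generator; `embed`, `extract`, the cover text, and the chosen positions are all hypothetical.

```python
def embed(cover: str, secret: str, positions: list) -> str:
    """Hide one secret character per position by raising that code point."""
    if len(positions) != len(secret):
        raise ValueError("need exactly one position per secret character")
    chars = list(cover)
    # Sort positions so the secret is recovered in reading order.
    for pos, s in zip(sorted(positions), secret):
        chars[pos] = chr(ord(chars[pos]) + ord(s))
    return "".join(chars)

def extract(orig: str, mod: str) -> str:
    """Inverse of embed: collect code point differences at every mismatch."""
    return "".join(chr(ord(m) - ord(o)) for o, m in zip(orig, mod) if o != m)

# Assumed sample cover text and positions.
cover = "had their toys and her friend"
stego = embed(cover, "bc", [2, 20])

print(stego)
print(extract(cover, stego))
```

Because every secret character has a nonzero code point, each chosen position is guaranteed to differ from the cover, so `extract` never skips an embedded character.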
