miscfreehard

Customer Service

kitctf

Task: a holpy (HOL theorem prover) proof checker accepts a user-supplied proof and prints the flag if you can prove the const false with no hypotheses. Solution: chain an SMT-translation soundness bug (real_divide done in Python float, 2^54+2 / 2 collides to 2^53) to get a HOL-false-but-Z3-valid lemma, then exploit a dead axiom guard (list == int) to define false as that lemma, and discharge it with primitive kernel rules.

$ ls tags/ techniques/
smt_translation_soundness_bugfloating_point_precision_collisiontrusted_macro_abusedefinitional_axiom_injectionproof_term_construction

Customer Service — GPNCTF 2024 (kitctf / gpn24)

Description

The customer is always right. RIGHT? Experienced staff will tell you that customers are always worst case users. Just last week one customer proclaimed that pineapple does not belong on sushi pizza. Yeah I know, how could he? But the customer is always right. My friend fears for his sanity. So please help me work out the logic details for such an argument.

English summary: we are given checker.py, a proof checker built on a holpy fork (the HOL theorem prover in Python). It reads a hex-encoded JSON "proof" from the user, checks it, and if the checked theorem is the constant false with no hypotheses and no assumptions, it calls win() and prints the flag.

The flavor text is the spec: "the customer is always right" = the checker trusts the claimed theorem; "worst case users" = adversarial input; "work out the logic details for such an argument" = construct a (bogus) logical proof of falsehood. This is a soundness/explosion challenge: prove ⊢ false.

Win condition

checker.py:

  1. hex → bytes → UTF-8 JSON {"imports": [...], "content": [item, ...]}.
  2. Each import → basic.load_theory(imp) (failures only warned).
  3. Each content item → items.parse_item, then:
    • thm item: monitor.check_proof(item, rewrite=False). If status is OK/ProofOK, run report = theory.thy.checked_extend(item.get_extension()).
    • else: same checked_extend.
  4. Axiom guards on each fresh report:
    • if len(report.get_axioms()) > 1: exit(1)
    • elif report.get_axioms() == 1 and item.ty != "thm": exit(1)dead code (see Bug 2)
  5. Fetch the theorem and call theorem_proves_false_unconditioned(thm), which is true iff the conclusion is the const false (prints as false/?false) with no assumptions and no hypotheses. If true → win().

So: produce a thm item whose proof checks and whose theorem is ⊢ false (empty hyps/assums).

Analysis — the "no library" environment

The critical, intentional environment fact: the holpy fork's wheel packaging ([tool.hatch.build.targets.wheel]) lists the code packages but excludes the library/ directory. In the deployed uv run checker.py install there is therefore no theory library at all — every basic.load_theory('nat' | 'real' | 'int' | ...) FAILS.

What remains usable:

  • The EmptyTheory: types bool, fun; consts equals, implies, all.
  • Whatever macros were registered at import time. Proof checking runs at the default check_level=0, which means level-0 macros are TRUSTED (their eval result is taken as-is, the macro is not expanded and re-verified from kernel primitives).

The only usable trusted level-0 macros are z3 and nat_eval. data.real / data.integer cannot even be imported (they load_theory at module top level and crash). This restriction is what shapes the whole solve: you cannot just import arithmetic theorems and rewrite — you must manufacture everything from the empty theory plus the two trusted SMT-ish macros.

Bug 1 — z3 macro: real division done in Python float (SMT translation soundness hole)

The z3 macro is level = 0 ⇒ TRUSTED. Its eval(args, prevs):

  • builds assms = [prev.prop for prev in prevs],
  • calls Z3 solve(Implies(*(assms + [args]))),
  • if Z3 says the implication is valid, returns Thm(args, *(th.hyps for th in prevs)).

I.e. if Z3 considers it valid, the macro hands back the claimed conclusion args with the union of the prevs' hypotheses (empty when prevs are no-hyp). The soundness of this hinges entirely on z3wrapper.convert faithfully translating HOL terms to Z3.

It does not. For a HOL real_divide term a / b:

  • if the whole divide term is_number (a normalized fraction, gcd handling), convert uses exact dest_number / Fractionsafe;
  • otherwise (a non-normalized fraction, gcd ≠ 1) it recurses: rec(t.arg1) / rec(t.arg2) — that is Python int / int, producing a float.

So pick numbers where is_number is false and 64-bit float division loses precision. With

  • N = 2**53 = 9007199254740992
  • M = 2*(N+1) = 2**54 + 2 = 18014398509481986

then:

  • In HOL, (M::real)/2 is exactly N+1.
  • In Python, float(M)/2 rounds to 2**53 = N (the 53-bit mantissa cannot represent N+1).

Therefore the proposition (M::real)/2 ≤ N:

  • is HOL-FALSE (N+1 ≤ N),
  • but Z3 sees it as a tautology (after the float collision: 2^53 ≤ 2^53).

The trusted z3 macro thus proves, with no hypotheses:

⊢ ((18014398509481986::real) / 2 ≤ (9007199254740992::real))

a bogus, HOL-false but checker-accepted lemma. Call its RHS RHS.

The consistency wall — why z3 alone cannot reach false

You might hope to ask the z3 macro directly for ⊢ false. You can't:

  • Every sentence the z3 macro returns is, by construction, a Z3-tautology.
  • The HOL const false converts to Z3 BoolVal(False), which is not a tautology, so solve(Implies(false)) is not valid — the macro refuses.
  • There is no no-hyp route to a Z3-False theorem either.

So z3 gives you a HOL-false proposition (RHS), never the literal const false. nat_eval doesn't bridge the gap either: it gates on is_number exactly like convert, so on terms it accepts it never disagrees with z3.

The missing link is a way to equate the bogus proposition RHS with the const false. That bridge is Bug 2.

Bug 2 — the dead axiom guard lets us def false := RHS

The checker's second guard:

elif report.get_axioms() == 1 and item.ty != "thm": sys.exit(1)

get_axioms() returns a list; comparing a list to the int 1 is always False. The elif is dead. Only len(report.get_axioms()) > 1 is enforced, and report is fresh per item, so each item may freely introduce exactly 1 axiom.

A def (Definition) item emits [Constant(...), Theorem(name+"_def", Thm(prop))]. The Theorem has no proof, so checked_extend records it as 1 axiom (allowed: len == 1, not > 1). Also note: def.ax constants and type.ax types are not axioms at all, so we may declare arbitrarily many supporting constants/types for free.

Therefore we can define a fresh const false :: bool together with the "free" axiom:

false_def : false ⟷ RHS

That is the bridge. Now false is provably equal to a proposition we can prove with the z3 macro.

Solution — the content array

Items, in order:

  1. type.ax: declare types nat and real. The EmptyTheory has neither, and our consts mention them (e.g. of_nat :: nat ⇒ real). Gotcha: first attempts failed with Unknown type nat — these declarations are mandatory.

  2. def.ax axiomatic constants (FREE, not axioms) so the numerals / division / parse with concrete real types (avoids fragile overload resolution through the pipeline):

    • zero :: real, one :: real
    • bit0 :: real ⇒ real, bit1 :: real ⇒ real
    • of_nat :: nat ⇒ real
    • real_divide :: real ⇒ real ⇒ real
    • less_eq :: real ⇒ real ⇒ bool
  3. def item: name false, type bool, prop false ⟷ ((18014398509481986::real) / 2 ≤ (9007199254740992::real)) (M=2^54+2 over 2, vs N=2^53). Adds the const false and axiom false_def.

  4. thm item pwn, prop false, with an explicit low-level proof (list of proof-term steps):

    stepruleargs / prevsresulting thm
    0z3args = RHS, prevs []⊢ RHS (Bug 1 float collision)
    1theoremargs false_def⊢ false ⟷ RHS
    2symmetricprevs [1]⊢ RHS ⟷ false
    3equal_elimprevs [2, 0]⊢ false

    Step 3 is a pure kernel rule (no z3): equal_elim takes A ⟷ B and A to give B. Here A = RHS (from step 0), A ⟷ B = RHS ⟷ false (from step 2), yielding ⊢ false — the const false, no hyps, no assums.

  5. Result: monitor.check_proofProofOK; checked_extend reports len(axioms) == 1 (the thm itself, allowed because ty == "thm"); theorem_proves_false_unconditioned returns True → win() prints the flag.

Notes on the term strings: holpy uses unicode operators (iff / bool equality), (less_eq), (turnstile), for function types. Bare numerals need explicit ::real annotations; the LHS of / forces real inference. Internally numbers are of_nat (bit1/bit0 ... one), which is why the bit0/bit1/of_nat/one/zero consts must be declared.

Exploit script

#!/usr/bin/env python3 # Customer Service — GPNCTF 2024 (kitctf) # Builds a holpy proof of `false` via: # Bug 1: z3 macro real_divide -> Python float collision (2^54+2)/2 == 2^53 # Bug 2: dead axiom guard (list == 1) lets a `def` inject `false_def` import json, ssl, socket N = 9007199254740992 # 2**53 M = 18014398509481986 # 2**54 + 2 RHS = f"(({M}::real) / 2 ≤ ({N}::real))" content = [ # 1) types (EmptyTheory has neither) -- fixes "Unknown type nat" {"ty": "type.ax", "name": "nat", "args": []}, {"ty": "type.ax", "name": "real", "args": []}, # 2) free axiomatic constants so numerals / "/" / "<=" parse over real {"ty": "def.ax", "name": "zero", "type": "real"}, {"ty": "def.ax", "name": "one", "type": "real"}, {"ty": "def.ax", "name": "bit0", "type": "real ⇒ real"}, {"ty": "def.ax", "name": "bit1", "type": "real ⇒ real"}, {"ty": "def.ax", "name": "of_nat", "type": "nat ⇒ real"}, {"ty": "def.ax", "name": "real_divide", "type": "real ⇒ real ⇒ real"}, {"ty": "def.ax", "name": "less_eq", "type": "real ⇒ real ⇒ bool"}, # 3) define `false` as the bogus (HOL-false) proposition -> axiom false_def {"ty": "def", "name": "false", "type": "bool", "prop": f"false ⟷ {RHS}"}, # 4) prove `⊢ false` with 4 proof-term steps {"ty": "thm", "name": "pwn", "prop": "false", "proof": [ {"id": "0", "rule": "z3", "args": RHS, "prevs": [], "th": f"⊢ {RHS}"}, {"id": "1", "rule": "theorem", "args": "false_def","prevs": [], "th": f"⊢ false ⟷ {RHS}"}, {"id": "2", "rule": "symmetric", "args": "", "prevs": ["1"], "th": f"⊢ {RHS} ⟷ false"}, {"id": "3", "rule": "equal_elim", "args": "", "prevs": ["2", "0"], "th": "⊢ false"}, ]}, ] payload = {"imports": [], "content": content} hexproof = json.dumps(payload).encode("utf-8").hex() print(f"[*] hex length: {len(hexproof)}") HOST = "<instance-host>.gpn24.ctf.kitctf.de" PORT = 443 ctx = ssl.create_default_context() ctx.check_hostname = False ctx.verify_mode = ssl.CERT_NONE with socket.create_connection((HOST, PORT)) as raw: with ctx.wrap_socket(raw, server_hostname=HOST) as s: # Delivery gotcha: read the banner/prompt FIRST, then send one line. banner = s.recv(4096) print(banner.decode(errors="replace")) s.sendall(hexproof.encode() + b"\n") data = b"" while True: chunk = s.recv(4096) if not chunk: break data += chunk print(data.decode(errors="replace"))

Delivery quirk

The JSON is UTF-8 then hex encoded (~2658 hex chars) and sent as a single line. The live service is reached with TLS on port 443:

ncat --ssl <instance-host>.gpn24.ctf.kitctf.de 443

It terminates TLS into a line service that prints give me your hex proof and then reads one line. Read the banner prompt first, then send hex + "\n". Sending the hex with a trailing newline before reading the banner, or piping via plain ncat, sometimes hit a front HTTP gateway returning 400 Bad Request; reading the prompt first avoids that race.

Final response:

✓ Proof check passed
Congratulations! You've found the flag: GPNCTF{Ex-UN4-LINEa-v4cuA-s3qui7UR-Qu0DLiB37}

$ cat /etc/motd

Liked this one?

Pro unlocks every writeup, every flag, and API access. $9/mo.

$ cat pricing.md

$ grep --similar

Similar writeups