Category: ml · Difficulty: medium

leaky-gradient

TJCTF 2026

Task: a remote ML oracle accepts a 64-bit value and returns a class plus a partially masked, noisy 64-element leak, initially presented as a gradient. Solution: show that the leak is actually affine in the input, recover the bias and all basis columns, then threshold the column norms to reconstruct the hidden 64-bit secret and read it as hexadecimal to get the flag body.

Tags: hex_encoding, oracle, machine_learning, masked_leak, quantization_noise, linearity, basis_vectors, model_extraction

Techniques: affine_leak_recovery, basis_vector_probing, column_norm_thresholding, empirical_linearity_test


Description

The original organizer description was not preserved in the local task notes.

The service accepts 16 hex characters (64 input bits) and returns a class label together with a partially masked, quantized 64-element leak vector. The goal is to recover the hidden secret encoded by the model and turn it into the final tjctf{...} flag.
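The input format can be pinned down with a small pair of helpers. This is a sketch under the assumption that bit 0 is the most significant bit of the first hex character, which is the order the solver script's bits_to_hex uses; hex_to_bits and this bits_to_hex variant are illustrative names, not part of the service.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def hex_to_bits(hex_input: str) -> list[int]:
    """Expand 16 hex characters into 64 bits, most significant bit first."""
    if len(hex_input) != 16 or not set(hex_input) <= HEX_DIGITS:
        raise ValueError("expected exactly 16 hex characters")
    value = int(hex_input, 16)
    return [(value >> (63 - i)) & 1 for i in range(64)]


def bits_to_hex(bits: list[int]) -> str:
    """Pack 64 bits (MSB first) back into 16 lowercase hex characters."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return format(value, "016x")


bits = hex_to_bits("8000000000000001")
print(bits[0], bits[63], sum(bits))  # 1 1 2
print(bits_to_hex(bits))             # 8000000000000001
```

Under this convention, the basis vector e_i used later is simply the input with only bit i set.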

Analysis

At first, the leak looked like a gradient of the loss with respect to each input bit. That suggested optimization-style attacks, so I briefly tried gradient descent / simulated annealing on the binary input space, tried fitting a linear softmax classifier to the observations, and even checked whether the raw leak values mapped directly to ASCII. None of those approaches produced a stable recovery.

The real breakthrough was testing whether the leak behaved linearly rather than like a true gradient. Empirically,

leak(e0 + e1) ≈ leak(e0) + leak(e1) - leak(0)

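held with an RMS error of about 0.7, consistent with the integer quantization noise already visible in repeated samples. Any map of the form W x + b satisfies this identity exactly, because W(e0 + e1) + b = (W e0 + b) + (W e1 + b) - b, whereas a leak that depends nonlinearly on x generally would not. So the exposed signal was effectively an affine map: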

leak(x) = W x + b

where x is the 64-bit input written as a 0/1 vector, W is an unknown internal 64 × 64 matrix, and b = leak(0) is the bias.
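The linearity check can be rehearsed offline. In this sketch the oracle is a hypothetical stand-in (a random affine map with integer rounding); W_true, b_true, leak and linearity_residual are illustrative names, not the service's internals. Because the affine parts cancel exactly, only the four rounding errors survive, each at most 0.5, so every residual coordinate is bounded by 2.

```python
import numpy as np

# Hypothetical stand-in for the remote oracle (an assumption, not the real
# model): an affine map W x + b whose output is rounded to integers.
rng = np.random.default_rng(0)
W_true = rng.normal(0.0, 10.0, size=(64, 64))
b_true = rng.normal(0.0, 50.0, size=64)


def leak(x: np.ndarray) -> np.ndarray:
    """Affine leak followed by integer quantization."""
    return np.rint(W_true @ x + b_true)


def unit(i: int) -> np.ndarray:
    """Basis vector e_i of length 64."""
    e = np.zeros(64)
    e[i] = 1.0
    return e


def linearity_residual(f, i: int, j: int) -> np.ndarray:
    """f(e_i + e_j) - f(e_i) - f(e_j) + f(0); zero for an exactly affine f."""
    return f(unit(i) + unit(j)) - f(unit(i)) - f(unit(j)) + f(np.zeros(64))


residual = linearity_residual(leak, 0, 1)
rms = float(np.sqrt(np.mean(residual ** 2)))
print(f"RMS residual: {rms:.2f}")  # only rounding survives, so |residual| <= 2
```

A residual that stays at the level of the rounding noise across many (i, j) pairs is the signal that the leak is affine rather than a genuine nonlinear gradient.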

Once this was clear, the recovery problem became simple:

query the all-zero input to get b = leak(0)
query every basis vector e_i
recover each column with W[:, i] = leak(e_i) - leak(0)
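The three probing steps translate directly into code. The sketch below assumes a noise-free oracle; the hypothetical recover_affine accepts any callable on a 0/1 vector, and against the real service each call would be the repeated, merged query described next.

```python
import numpy as np


def recover_affine(oracle, n: int = 64):
    """Recover W and b of an affine oracle x -> W x + b by basis probing.

    b = oracle(0) and W[:, i] = oracle(e_i) - oracle(0).
    """
    b = oracle(np.zeros(n))
    W = np.empty((len(b), n))
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        W[:, i] = oracle(e_i) - b
    return W, b


# Noise-free check against a hypothetical random affine map.
rng = np.random.default_rng(1)
W_true = rng.normal(size=(64, 64))
b_true = rng.normal(size=64)
W_hat, b_hat = recover_affine(lambda x: W_true @ x + b_true)
print(np.allclose(W_hat, W_true), np.allclose(b_hat, b_true))  # True True
```

Only n + 1 = 65 distinct inputs are needed, which is why this beats any search over the 2^64 input space.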

Because the leak was randomly masked, each logical query had to be repeated (the script below uses 40 samples) and the observed entries merged coordinate-wise, by taking their median, to reconstruct the full 64-element vector. After recovering all 64 columns, I computed their L2 norms. Those norms separated cleanly into two clusters:

low cluster around 45
high cluster around 81
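Merging masked samples can be sketched as a per-coordinate median over the observed entries, which is what the solver script does. The 50% masking rate, the ±1 integer noise and the helper merge_samples are assumptions for illustration; masked entries are modelled as None, matching the null values the script skips.

```python
import numpy as np


def merge_samples(samples: list[list]) -> np.ndarray:
    """Per-coordinate median of observed (non-None) values; NaN if never seen."""
    n = len(samples[0])
    merged = np.full(n, np.nan)
    for i in range(n):
        seen = [s[i] for s in samples if s[i] is not None]
        if seen:
            merged[i] = np.median(seen)
    return merged


# Hypothetical simulation: fixed true vector, +/-1 integer noise, 50% masking.
rng = np.random.default_rng(2)
truth = rng.integers(-100, 100, size=64).astype(float)
samples = []
for _ in range(40):
    noisy = truth + rng.integers(-1, 2, size=64)   # noise in {-1, 0, 1}
    masked = rng.random(64) < 0.5
    samples.append([None if m else float(v) for v, m in zip(noisy, masked)])

merged = merge_samples(samples)
print(int(np.isnan(merged).sum()), float(np.nanmax(np.abs(merged - truth))))
```

The median is a natural choice here because a few outlying readings cannot drag it far, unlike a plain mean.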

Thresholding the norms (high cluster = 1, low cluster = 0) yielded the bit pattern:

1011101011011100000011111111111011100000110111011111000000001101
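The solver script splits the norms at the midpoint of their range. A slightly more robust variant, sketched here as an alternative rather than the method used, sorts the norms and cuts at the largest gap between neighbours; threshold_bits and the simulated norms (centred on 45 and 81 with Gaussian jitter) are hypothetical.

```python
import numpy as np


def threshold_bits(norms: np.ndarray) -> np.ndarray:
    """Bits from column norms: cut at the largest gap in the sorted values."""
    s = np.sort(norms)
    k = int(np.argmax(np.diff(s)))        # index of the biggest jump
    cut = (s[k] + s[k + 1]) / 2.0
    return (norms > cut).astype(int)


# Hypothetical norms: bit 1 -> around 81, bit 0 -> around 45.
secret = "1011101011011100000011111111111011100000110111011111000000001101"
rng = np.random.default_rng(3)
norms = np.array([81.0 if c == "1" else 45.0 for c in secret]) + rng.normal(0, 2, 64)
bits = threshold_bits(norms)
print("".join(map(str, bits)) == secret)
```

With clusters as far apart as 45 and 81 both rules agree; the largest-gap cut only matters if a stray column lands far from its cluster and shifts the min or max.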

Grouping the bits into 4-bit nibbles (most significant bit first) and writing each nibble as a hex digit gave:

badc0ffee0ddf00d

which directly produced the flag body.
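As a quick cross-check of the nibble grouping, the bit string can be parsed as a single integer and printed as 16 zero-padded hex digits using only Python built-ins:

```python
# Read the 64-bit string as one integer, then format it as 16 hex digits.
bit_string = "1011101011011100000011111111111011100000110111011111000000001101"
hex_secret = format(int(bit_string, 2), "016x")
print(hex_secret)                # badc0ffee0ddf00d
print(f"tjctf{{{hex_secret}}}")  # tjctf{badc0ffee0ddf00d}
```

The "016x" format spec keeps leading zero nibbles, which matters for secrets whose first bits are 0.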

Solution

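Repeated each logical query enough times (SAMPLES = 40 in the script) to observe every masked coordinate at least once, and took the per-coordinate median to suppress quantization noise.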
Queried the zero vector and reconstructed b = leak(0).
Queried all 64 basis vectors e_i and reconstructed the corresponding full leak vectors.
Computed each recovered column as W[:,i] = leak(e_i) - leak(0).
Measured the L2 norm of every column.
Split the norms with a threshold between the two visible clusters.
Read the resulting 64 bits as hexadecimal: badc0ffee0ddf00d.
Wrapped it in the required format: tjctf{badc0ffee0ddf00d}.
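How many repeats count as "enough" depends on the masking rate, which the writeup does not state. Assuming each coordinate is masked independently with probability p, the chance that all 64 coordinates are seen at least once in S samples is (1 - p^S)^64; the hypothetical helper min_samples searches for the smallest S that meets a target.

```python
def min_samples(mask_prob: float, n_coords: int = 64, target: float = 0.99) -> int:
    """Smallest S with P(every coordinate observed at least once) >= target.

    Assumes independent masking with probability mask_prob per coordinate
    and per query, so P = (1 - mask_prob**S) ** n_coords.
    """
    s = 1
    while (1 - mask_prob ** s) ** n_coords < target:
        s += 1
    return s


print(min_samples(0.5))  # 13
```

Under the p = 0.5 assumption, 13 samples give a 99% chance per logical query, so SAMPLES = 40 leaves a wide margin; a higher masking rate would push the requirement up quickly.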

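#!/usr/bin/env python3
# Solver: recover the affine leak leak(x) = W x + b by basis probing,
# then threshold the column norms of W to read off the 64 secret bits.
import json
import socket

import numpy as np

HOST = "tjc.tf"
PORT = 31002
SAMPLES = 40  # repeats per logical query, to cover masked coordinates

def bits_to_hex(bits):
    """Pack 64 bits (MSB first) into 16 lowercase hex characters."""
    out = []
    for i in range(0, 64, 4):
        nib = (bits[i] << 3) | (bits[i + 1] << 2) | (bits[i + 2] << 1) | bits[i + 3]
        out.append(format(nib, "x"))
    return "".join(out)

def query_once(hex_input):
    """One connection: skip the two greeting lines, send the input and
    return the parsed JSON reply (which contains the "leak" list)."""
    with socket.create_connection((HOST, PORT), timeout=15) as sock:
        # makefile() buffers the stream, so bytes after a newline are kept
        # for the next readline() instead of being silently dropped.
        with sock.makefile("rb") as reader:
            reader.readline()
            reader.readline()
            sock.sendall((hex_input + "\n").encode())
            line = reader.readline()
            if not line:
                raise EOFError("connection closed")
            return json.loads(line.decode())

def query_many(hex_input, samples=SAMPLES):
    """Repeat one logical query and merge the masked leaks coordinate-wise.

    Masked entries arrive as null (None); the median of the observed values
    suppresses quantization noise. Never-observed coordinates become NaN.
    """
    acc = [[] for _ in range(64)]
    for _ in range(samples):
        row = query_once(hex_input)
        for i, v in enumerate(row["leak"]):
            if v is not None:
                acc[i].append(v)
    return np.array([np.median(v) if v else np.nan for v in acc], dtype=float)

def require_complete(vec):
    """Fail loudly if some coordinate was masked in every sample."""
    if np.isnan(vec).any():
        raise ValueError("increase SAMPLES until every coordinate is observed")
    return vec

def basis_vector(i):
    """Hex encoding of e_i: only bit i (counted from the MSB) is set."""
    bits = [0] * 64
    bits[i] = 1
    return bits_to_hex(bits)

def main():
    # b = leak(0)
    b = require_complete(query_many("0" * 16))
    # W[:, i] = leak(e_i) - leak(0)
    cols = []
    for i in range(64):
        ei = require_complete(query_many(basis_vector(i)))
        cols.append(ei - b)
    W = np.column_stack(cols)

    # Column norms form two clusters; split at the midpoint of their range.
    norms = np.linalg.norm(W, axis=0)
    threshold = (norms.min() + norms.max()) / 2.0
    bits = (norms > threshold).astype(int)

    bit_string = "".join(map(str, bits.tolist()))
    hex_secret = bits_to_hex(bits.tolist())
    flag = f"tjctf{{{hex_secret}}}"

    print("bitstring:", bit_string)
    print("hex:", hex_secret)
    print("flag:", flag)

if __name__ == "__main__":
    main()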
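Before spending roughly 2600 connections on the real service (65 logical queries × 40 samples), the whole pipeline can be rehearsed against a simulated oracle. Everything about the simulation is an assumption: column norms of exactly 45 or 81 depending on the secret bit, random unit directions, 50% masking and integer rounding as the only noise. The secret from this writeup is reused purely as test data, and the local query_many mimics the script's merging without any networking.

```python
import numpy as np

SECRET_HEX = "badc0ffee0ddf00d"
rng = np.random.default_rng(4)
secret_bits = [(int(SECRET_HEX, 16) >> (63 - i)) & 1 for i in range(64)]

# Hypothetical model: unit-norm random columns scaled to 81 (bit 1) or 45 (bit 0).
directions = rng.normal(size=(64, 64))
directions /= np.linalg.norm(directions, axis=0)
W = directions * np.where(np.array(secret_bits) == 1, 81.0, 45.0)
b = rng.normal(0, 30, size=64)


def oracle(x):
    """One masked, quantized leak: a list with None at masked coordinates."""
    y = np.rint(W @ x + b)
    mask = rng.random(64) < 0.5
    return [None if m else float(v) for v, m in zip(y, mask)]


def query_many(x, samples=30):
    """Repeat a query; per-coordinate median of observed values."""
    acc = [[] for _ in range(64)]
    for _ in range(samples):
        for i, v in enumerate(oracle(x)):
            if v is not None:
                acc[i].append(v)
    return np.array([np.median(v) for v in acc])  # assumes every coordinate seen


b_hat = query_many(np.zeros(64))
cols = []
for i in range(64):
    e_i = np.zeros(64)
    e_i[i] = 1.0
    cols.append(query_many(e_i) - b_hat)
norms = np.linalg.norm(np.column_stack(cols), axis=0)
bits = (norms > (norms.min() + norms.max()) / 2).astype(int)
recovered = format(int("".join(map(str, bits)), 2), "016x")
print(recovered == SECRET_HEX)
```

In this model the rounding error after subtracting the bias estimate is at most 1 per coordinate, so each recovered norm is off by at most 8, far less than the gap between the clusters, and the midpoint threshold cannot misclassify a bit.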


Similar writeups

no-brainrot-allowed (umdctf, crypto)
bit-leak (tjctf, crypto)
ReLuess Your Inhibitions (kalmarcf, ml)
Squares (tjctf, crypto)
weave (umdctf, crypto)