ml free hard

rush-hour

umdctf

Task: a remote autonomous-car controller exposes a fixed policy and lets us upload a tiny observation-perturbation network under a strict L2 budget. Solution: optimize a bounded attack MLP end-to-end through differentiable car rollouts against recorded remote timestep traces so the spoofed observations steer the car to the hidden CTF goal.

pytorch websocket adversarial_examples reinforcement_learning observation_spoofing policy_network differentiable_simulation control_systems

observation_space_attack differentiable_rollout_optimization timestep_domain_randomization softmin_objective teacher_policy_distillation

rush-hour — UMDCTF

Challenge Summary

Description

Organizer challenge text was not preserved in the local task directory. From the provided files, the task is: upload a small attack network that perturbs an autonomous driving policy's 8-dimensional observation, then make the car reach the hidden scoring location and recover the real flag from the remote service.

English summary: we are given the exact car physics, the fixed policy weights, and the format of an attacker-controlled front-end network. The policy is supposed to drive toward a legitimate goal, but the flag is only awarded if the physical car reaches a different hidden target, so the core problem is to spoof observations without exceeding the allowed perturbation radius.

Analysis

The central bug is visible immediately in game.py:

LEGIT_GOAL = (-20.0, -20.0)
CTF_GOAL = (18.5, -24.5)

obs = generate_observation(self.car, LEGIT_GOAL)
...
if dx * dx + dz * dz < CTF_RADIUS * CTF_RADIUS:
    self.goal_reached = True
    self.flag = FLAG

The policy never sees features for the real scoring goal. It receives an observation built from LEGIT_GOAL, but success is checked against CTF_GOAL. That means the challenge is not to retrain the policy, but to bias its input so that the fixed controller behaves as if it is chasing the hidden target.

observations.py shows the full 8-dimensional feature vector:

return np.array([
    car.speed / 10.0,
    car.steer_angle / 28.0,
    ch,
    sh,
    goal_fwd / 30.0,
    goal_right / 30.0,
    math.log(goal_dist + 1.0) / 5.0,
    1.0,
], dtype=np.float32)

Only the goal-relative terms differ between the legitimate and hidden tasks. Local reconnaissance quickly showed the intended weakness: if we directly recompute observations with CTF_GOAL and feed those into the policy, the car reaches the target very quickly. So the fixed policy already knows how to drive there if it is given the right sensory frame.

The obstacle is the attack interface. attack.py allows us to upload only a tiny neural network:

ATTACK_HIDDEN = 16
ATTACK_OUTPUT = 8
MAX_WEIGHT = 10.0
MAX_DELTA_L2 = 0.5

h = np.tanh(self.W0 @ obs + self.b0)
y = np.tanh(self.W1 @ h + self.b1)
norm = np.linalg.norm(y)  # L2 norm of the raw perturbation
if norm > MAX_DELTA_L2:
    y = y * (MAX_DELTA_L2 / norm)

So the attacker gets an 8 -> 16 -> 8 tanh MLP whose output is projected to an L2 radius of 0.5 and then added to the observation before the policy consumes it. A full observation replacement obs_ctf - obs_legit is usually much larger than that budget, so a naïve swap does not fit. We need a state-feedback perturbation that nudges the policy in the right direction over many steps.
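The size of the gap can be checked with a little arithmetic. The sketch below assumes that goal_fwd and goal_right are the world-frame goal offset rotated into the car frame; the exact sign convention in observations.py may differ, but any rotation or reflection preserves the norm. Under that assumption, the shift needed in those two features alone is the distance between the two goals divided by 30, independent of where the car is or where it points. The helper names are illustrative and not taken from the challenge files.

```python
import math

LEGIT_GOAL = (-20.0, -20.0)
CTF_GOAL = (18.5, -24.5)
MAX_DELTA_L2 = 0.5


def goal_features(car_x, car_z, heading, goal):
    """Assumed convention: rotate the world-frame offset into the car frame."""
    dx, dz = goal[0] - car_x, goal[1] - car_z
    fwd = dx * math.cos(heading) + dz * math.sin(heading)
    right = -dx * math.sin(heading) + dz * math.cos(heading)
    return fwd / 30.0, right / 30.0


def required_shift(car_x, car_z, heading):
    """L2 size of the change needed in (goal_fwd, goal_right) for a full swap."""
    lf, lr = goal_features(car_x, car_z, heading, LEGIT_GOAL)
    cf, cr = goal_features(car_x, car_z, heading, CTF_GOAL)
    return math.hypot(cf - lf, cr - lr)


# Arbitrary car states (x, z, heading): the required shift is the same for all.
states = [(0.0, 0.0, 0.0), (5.0, -10.0, 1.2), (-15.0, 3.0, -2.5)]
shifts = [required_shift(x, z, h) for x, z, h in states]
for s in shifts:
    print(f"required shift = {s:.4f}, budget = {MAX_DELTA_L2}")
```

With these goals the two direction features alone need a shift of about 1.29, more than twice the 0.5 budget, and the log-distance feature can only add to that.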

The fixed controller in policy.py is also just a tanh MLP:

ARCH = [8, 32, 32, 2]

Because we have exact weights, exact observation generation, and exact vehicle dynamics, this is a very friendly setup for white-box optimization.

Failed Attempt: Why `dt` Mattered

The first successful local attack was trained in a simplified simulation using dt = 0.1. That version worked offline, but remote runs failed.

The reason is that the car physics are sensitive to timestep. Steering update, steering return, acceleration, heading change, and position integration all multiply by dt in physics.py. A controller tuned for coarse steps can produce a totally different trajectory when the real server advances the system in much smaller increments.
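A toy integrator makes the effect concrete. This is not the code from physics.py: the constants and the simplified bicycle-style update are assumptions chosen only to show how an explicit per-step update depends on dt. The same constant command is applied at every step.

```python
import math


def rollout(dt, steps, accel=2.0, steer_rate=1.0, max_steer=0.5, wheelbase=2.5):
    """Explicit-Euler toy car driven by a constant 'accelerate and steer' command."""
    x = z = heading = speed = steer = dist = 0.0
    for _ in range(steps):
        speed += accel * dt                                # acceleration scales with dt
        steer = min(steer + steer_rate * dt, max_steer)    # rate-limited steering
        heading += speed * math.tan(steer) / wheelbase * dt
        x += speed * math.cos(heading) * dt
        z += speed * math.sin(heading) * dt
        dist += speed * dt                                 # path length travelled
    return {"x": x, "z": z, "heading": heading, "speed": speed, "dist": dist}


coarse = rollout(0.1, 100)       # same step count, coarse timestep
fine = rollout(0.018, 100)       # same step count, remote-like timestep
same_time = rollout(0.1, 18)     # same simulated time (1.8 s) as `fine`

print(f"100 steps: speed {coarse['speed']:.2f} vs {fine['speed']:.2f}")
print(f"1.8 s:     path  {same_time['dist']:.4f} vs {fine['dist']:.4f}")
```

A fixed number of control steps covers very different simulated time at the two rates, and even at equal simulated time the coarse integration travels a different path, so a perturbation tuned at dt = 0.1 is effectively tuned for a different car.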

After measuring the websocket traffic, the remote server turned out to run at about 0.018s per step with variation. The recorded traces were roughly:

remote_dts.npy: mean 0.01840s
remote_dts_0.npy to remote_dts_4.npy: means around 0.0185s to 0.0193s
occasional larger spikes were also present
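As a rough sketch of how such a trace can be summarized, the snippet below derives per-step intervals from message timestamps and reports the mean and the largest spike. The timestamps are made up for illustration, and measuring dt from arrival times is an assumption; the writeup does not record exactly how the traces were captured.

```python
import statistics


def dts_from_timestamps(timestamps):
    """Differences between consecutive timestamps, in seconds."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def summarize(dts):
    """Basic statistics for one recorded timestep trace."""
    return {
        "n": len(dts),
        "mean": statistics.fmean(dts),
        "min": min(dts),
        "max": max(dts),
    }


# Hypothetical arrival times with one larger spike between the 4th and 5th message.
timestamps = [0.000, 0.018, 0.037, 0.055, 0.095, 0.113]
dts = dts_from_timestamps(timestamps)
summary = summarize(dts)
print(summary)
```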

That explained the mismatch: the attack was overfit to the wrong control frequency. This challenge was not just about finding a perturbation; it was about finding one that remains effective under the remote simulator's actual timing.

Robust Optimization Approach

To solve that, we built optimize_attack.py, which performs end-to-end differentiable optimization through the exact policy and the exact car dynamics.

1. Keep the real attack constraints inside the graph

The uploaded model is bounded in two ways:

every weight must satisfy |w| <= 10
every output perturbation must satisfy ||delta||_2 <= 0.5

The optimizer enforced both directly:

def bounded_params(raw):
    return [10.0 * torch.tanh(x / 10.0) for x in raw]

def attack_forward(params, obs, one, eps):
    ...
    y = torch.tanh(W1 @ h + b1)
    n = torch.linalg.norm(y)
    return y * torch.clamp(0.5 / (n + eps), max=1.0)

The 10 * tanh(raw / 10) parameterization lets the optimizer update unconstrained raw parameters while guaranteeing that every exported weight stays within the |w| <= 10 limit. Keeping the L2 projection in-graph is just as important: if training ignores the projection but deployment applies it, the perturbation the policy actually sees differs from the one that was optimized.
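Both constraints can be sanity-checked outside of PyTorch. The following NumPy sketch mirrors the same two ideas, the tanh reparameterization and the norm projection, on deliberately oversized random raw parameters. It is an illustration under the 8 -> 16 -> 8 shapes quoted from attack.py, not the code from optimize_attack.py, and the eps default is an assumption.

```python
import numpy as np

MAX_WEIGHT = 10.0
MAX_DELTA_L2 = 0.5


def bounded(raw):
    """Map unconstrained values into the weight range via MAX_WEIGHT * tanh."""
    return MAX_WEIGHT * np.tanh(raw / MAX_WEIGHT)


def attack_forward(raw_params, obs, eps=1e-12):
    """8 -> 16 -> 8 tanh MLP with the output projected into the L2 ball."""
    W0, b0, W1, b1 = (bounded(p) for p in raw_params)
    h = np.tanh(W0 @ obs + b0)
    y = np.tanh(W1 @ h + b1)
    n = np.linalg.norm(y)
    return y * min(1.0, MAX_DELTA_L2 / (n + eps))


rng = np.random.default_rng(0)
# Raw parameters far outside the allowed range, to stress the bounds.
raw = [rng.normal(scale=50.0, size=s) for s in [(16, 8), (16,), (8, 16), (8,)]]
obs = rng.normal(size=8)

delta = attack_forward(raw, obs)
max_weight = max(float(np.abs(bounded(p)).max()) for p in raw)
print(f"max |w| = {max_weight:.4f}, ||delta|| = {np.linalg.norm(delta):.4f}")
```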

2. Roll out the exact physics differentiably

optimize_attack.py reimplemented the same observation logic, policy, and vehicle update in PyTorch. That means gradients can flow from the final objective all the way back into the attack network parameters. This is the core exploit: we are not guessing perturbations step by step, we are optimizing the entire closed-loop control trajectory.
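The idea is easier to see on a one-dimensional toy. In this sketch everything is hypothetical: a scalar "car", a fixed tanh "policy" that only sees the legitimate goal, and a two-parameter bounded perturbation. It is not the real rollout from optimize_attack.py, but it shows the same mechanism: the loss is computed at the end of a closed-loop rollout and backpropagated through every step into the attack parameters.

```python
import torch

LEGIT, HIDDEN = -1.0, -0.6   # toy 1-D goals (hypothetical values)
BUDGET = 0.5                 # maximum size of the observation perturbation


def rollout(params, dts):
    """Closed-loop rollout; gradients flow through every step."""
    w, b = params
    x = torch.tensor(0.0)
    for dt in dts:
        obs = LEGIT - x                             # policy only sees the legit goal
        delta = BUDGET * torch.tanh(w * obs + b)    # bounded state-feedback attack
        action = torch.tanh(2.0 * (obs + delta))    # fixed toy policy
        x = x + action * dt                         # toy dynamics
    return x


def loss_fn(params, dts):
    """Squared distance between the final position and the hidden target."""
    return (rollout(params, dts) - HIDDEN) ** 2


params = [torch.zeros((), requires_grad=True), torch.zeros((), requires_grad=True)]
dts = [0.04] * 100
opt = torch.optim.Adam(params, lr=0.03)

initial = loss_fn(params, dts).item()
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(params, dts)
    loss.backward()
    opt.step()
final = loss_fn(params, dts).item()

print(f"loss before: {initial:.4f}, after: {final:.4f}")
```

Without the attack the toy car settles at the legitimate goal; after optimization the perturbation shifts the policy's equilibrium toward the hidden target, which is only possible here because that target lies within the budget of the legitimate one.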

3. Train on recorded remote `dt` traces

Instead of one fixed timestep, the optimizer loads recorded sequences such as remote_dts_0.npy ... remote_dts_4.npy and alternates across them during training:

dt_sets, dt_paths = load_dt_sets(args.dts, args.horizon)
...
dts = dt_sets[it % len(dt_sets)]

We also applied a small multiplicative jitter to the timesteps so the learned attack network would not depend on a single exact timing trace.
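A minimal sketch of that sampling scheme follows. The round-robin trace selection matches the excerpt; the jitter itself is an assumption (independent uniform scaling per step within plus or minus the jitter fraction), since the writeup does not show how optimize_attack.py implements it. The synthetic constant traces stand in for the recorded .npy files.

```python
import numpy as np


def pick_trace(dt_sets, it):
    """Alternate across recorded traces, one per training iteration."""
    return dt_sets[it % len(dt_sets)]


def jittered(dts, jitter, rng):
    """Scale every timestep by a random factor in [1 - jitter, 1 + jitter]."""
    scale = 1.0 + jitter * rng.uniform(-1.0, 1.0, size=dts.shape)
    return dts * scale


rng = np.random.default_rng(0)
# Synthetic stand-ins for two recorded traces of 360 steps each.
dt_sets = [np.full(360, 0.0185), np.full(360, 0.0193)]

base = pick_trace(dt_sets, 3)            # iteration 3 selects the second trace
sample = jittered(base, 0.02, rng)
print(f"mean dt {sample.mean():.5f}, range [{sample.min():.5f}, {sample.max():.5f}]")
```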

4. Optimize closest approach, not only final state

A hard minimum over per-step distances is awkward to optimize because its gradient flows only through the single closest step, so the final version used a soft-min objective plus mild speed shaping:

score = dists + 0.05 * torch.sqrt(speed_sq + 1e-9)
return -torch.logsumexp(-temp * score, dim=0) / temp \
       + 0.02 * dists[-1] + 0.0005 * delta_pen.mean()

This makes the optimizer care about the best point reached during the rollout, not only where the car ends after hundreds of steps. In practice that was much more stable.
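The soft-min term is a smooth lower bound on the true minimum: for N scores it lies between min - log(N) / temp and min. With temp = 20 and a 360-step horizon the gap is at most about 0.29. The plain-Python sketch below uses made-up per-step scores to show how close the soft-min stays to the closest approach even when the final step is far away.

```python
import math


def softmin(values, temp):
    """Numerically stable -logsumexp(-temp * v) / temp."""
    m = min(values)
    total = sum(math.exp(-temp * (v - m)) for v in values)
    return m - math.log(total) / temp


# Hypothetical per-step scores: the car passes close to the target mid-rollout
# and then drives away again.
scores = [9.0, 4.0, 0.7, 3.0, 6.0]
temp = 20.0

soft = softmin(scores, temp)
hard = min(scores)
print(f"final-step score: {scores[-1]:.3f}")
print(f"hard min: {hard:.3f}, soft min: {soft:.6f}")
```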

5. Use the hidden-goal policy as a teacher signal

One useful insight was that the controller already knows what to do when fed obs_ctf. optimize_attack.py computes both:

obs_legit = gen_obs(..., LEGIT, one)
obs_ctf = gen_obs(..., CTF, one)

out = policy(obs_legit + delta, W, b)
teacher = policy(obs_ctf, W, b)

That teacher idea helped produce good initial attacks, even though the final successful model came from the stronger softmin_speed tuning run rather than pure action imitation.
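One way to turn that comparison into a training signal is a plain action-imitation loss. The sketch below is an assumption about the form of the loss, not the code from optimize_attack.py: it uses random stand-in weights with the [8, 32, 32, 2] shape, assumes tanh on every layer including the output, and fabricates an observation pair that differs only in the goal-relative features.

```python
import numpy as np

ARCH = [8, 32, 32, 2]


def policy(obs, Ws, bs):
    """Stand-in tanh MLP; the output activation is an assumption."""
    h = obs
    for W, b in zip(Ws, bs):
        h = np.tanh(W @ h + b)
    return h


def imitation_loss(obs_legit, obs_ctf, delta, Ws, bs):
    """Mean squared gap between the attacked action and the teacher action."""
    out = policy(obs_legit + delta, Ws, bs)
    teacher = policy(obs_ctf, Ws, bs)
    return float(np.mean((out - teacher) ** 2))


rng = np.random.default_rng(0)
Ws = [rng.normal(scale=0.5, size=(o, i)) for i, o in zip(ARCH[:-1], ARCH[1:])]
bs = [rng.normal(scale=0.1, size=o) for o in ARCH[1:]]

obs_legit = rng.normal(size=8)
obs_ctf = obs_legit.copy()
obs_ctf[4:7] += np.array([1.0, -0.4, 0.1])   # only goal-relative features differ

loss_no_attack = imitation_loss(obs_legit, obs_ctf, np.zeros(8), Ws, bs)
loss_full_swap = imitation_loss(obs_legit, obs_ctf, obs_ctf - obs_legit, Ws, bs)
print(f"no attack: {loss_no_attack:.6f}, unbounded swap: {loss_full_swap:.2e}")
```

The unbounded swap drives the loss to zero but violates the L2 budget, which is why the imitation signal only serves as a guide for the bounded network rather than as something it can match exactly.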

The earlier CTFBase writeups 20260321_tamuctf_pittrap and 20260328_kalmarcf_reluess were useful only as general inspiration: one for thinking in terms of optimization objectives in deceptive ML tasks, and the other for trusting white-box / model-aware attack reasoning. The challenge mechanics here are different.

Important Commands and Scripts

The final successful training command was:

python3 optimize_attack.py --dts 'remote_dts*.npy' --init trained_teacher_init.npz --mode softmin_speed --horizon 360 --iters 1800 --lr 0.005 --temp 20 --jitter 0.02 --out tuned_from_teacher_softmin.npz

Meaning of the important pieces:

--dts 'remote_dts*.npy': train against real recorded remote timestep traces
--init trained_teacher_init.npz: start from a teacher-guided seed instead of random noise
--mode softmin_speed: optimize closest approach with mild speed shaping
--horizon 360: long enough rollout to let the spoofed controller complete the turn and approach the target
--jitter 0.02: small random timing variation for robustness
--out tuned_from_teacher_softmin.npz: final uploaded attack model

Useful verification logic already exists in the script:

path = Path(args.out)
best, step_idx, x, z = evaluate_numpy(path, dts)
print(f"numpy_eval file={path} best={best:.6f} step={step_idx} final=({x:.3f},{z:.3f})")

This made it easy to check the tuned model against each recorded remote trace before uploading it.
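The quantity being reported is the closest approach to CTF_GOAL and the step at which it occurs. The internals of evaluate_numpy are not shown in this writeup, so the helper below is only a hypothetical illustration of that metric on a made-up trajectory.

```python
import math

CTF_GOAL = (18.5, -24.5)


def closest_approach(trajectory, goal=CTF_GOAL):
    """Return (best distance, step index) over a list of (x, z) positions."""
    dists = [math.hypot(x - goal[0], z - goal[1]) for x, z in trajectory]
    step_idx = min(range(len(dists)), key=dists.__getitem__)
    return dists[step_idx], step_idx


# Hypothetical positions: the car passes near the goal and then overshoots.
trajectory = [(0.0, 0.0), (10.0, -12.0), (18.0, -24.0), (22.0, -30.0)]
best, step_idx = closest_approach(trajectory)
x, z = trajectory[-1]
print(f"best={best:.6f} step={step_idx} final=({x:.3f},{z:.3f})")
```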

Final Exploit

The exploit is conceptually simple once the optimization is done:

1. Load the exact policy and physics from the challenge files.
2. Optimize the attacker MLP so obs_legit + delta(obs_legit) causes the policy to behave similarly to the hidden-goal controller over a full trajectory.
3. Export valid W0, b0, W1, b1 parameters into an .npz file.
4. Upload the resulting model to the remote service.
5. Let the fixed policy drive using the spoofed observations until the physical state reaches CTF_GOAL.
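The export step is worth guarding with explicit checks, since a single out-of-range weight would invalidate the upload. The sketch below assumes the archive keys are literally W0, b0, W1, b1 and that float32 is acceptable; both are assumptions about the loader in attack.py. It writes to an in-memory buffer so it can run without touching the disk.

```python
import io

import numpy as np

MAX_WEIGHT = 10.0
SHAPES = {"W0": (16, 8), "b0": (16,), "W1": (8, 16), "b1": (8,)}


def export_attack(params, fileobj):
    """Validate shapes and the weight bound, then write an .npz archive."""
    arrays = {}
    for name, shape in SHAPES.items():
        arr = np.asarray(params[name], dtype=np.float32)
        if arr.shape != shape:
            raise ValueError(f"{name}: expected shape {shape}, got {arr.shape}")
        if float(np.abs(arr).max()) > MAX_WEIGHT:
            raise ValueError(f"{name}: weight magnitude exceeds {MAX_WEIGHT}")
        arrays[name] = arr
    np.savez(fileobj, **arrays)


rng = np.random.default_rng(0)
# Stand-in parameters already mapped through the 10 * tanh(raw / 10) bound.
params = {k: 10.0 * np.tanh(rng.normal(size=s)) for k, s in SHAPES.items()}

buf = io.BytesIO()
export_attack(params, buf)
buf.seek(0)
loaded = np.load(buf)
loaded_shapes = {k: loaded[k].shape for k in loaded.files}
print(loaded_shapes)
```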

The final uploaded file was:

tuned_from_teacher_softmin.npz

One important note: game.py contains a local decoy flag:

FLAG = "UMDCTF{fake_flag}"

That is only an offline placeholder. The real success condition on the remote service returned the actual flag below.

Working Solver Logic

#!/usr/bin/env python3
import subprocess


def main():
    cmd = [
        "python3",
        "optimize_attack.py",
        "--dts", "remote_dts*.npy",
        "--init", "trained_teacher_init.npz",
        "--mode", "softmin_speed",
        "--horizon", "360",
        "--iters", "1800",
        "--lr", "0.005",
        "--temp", "20",
        "--jitter", "0.02",
        "--out", "tuned_from_teacher_softmin.npz",
    ]
    subprocess.run(cmd, check=True)
    print("Upload tuned_from_teacher_softmin.npz to the remote websocket service.")


if __name__ == "__main__":
    main()

Practical Lessons

If a policy observes one goal but reward or success is evaluated against another, try observation spoofing before trying to break the policy directly.
In control and RL challenges, timestep mismatch can completely invalidate an otherwise correct exploit.
When the challenge publishes exact dynamics and exact model weights, differentiable rollout optimization is often the cleanest attack.
If the perturbation budget is too small for a one-shot feature replacement, train a feedback perturbation that shapes the entire trajectory.