ml hard

leadgate

dicectf2026

Task: a GPT-2 model (distributed as safetensors) was fine-tuned to suppress generation of one specific string, the flag. Solution: negate the weight perturbation, using W_orig - ΔW in place of W_orig + ΔW (where ΔW = W_modified - W_orig), which turns suppression into promotion; then greedy-decode from the 'dice{' prefix to recover the formerly forbidden flag.
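The two steps in the summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's actual solve script: `negate_perturbation` and `greedy_decode` are hypothetical helper names, the weights are assumed to arrive as name-to-tensor dictionaries (any value type supporting `-` works), and `next_token_scores` is an assumed callback that returns one score per vocabulary entry.

```python
def negate_perturbation(orig, modified):
    """Flip the fine-tuning delta: return W_orig - dW, where dW = W_modified - W_orig.

    Both arguments are dicts mapping tensor names to values that support
    subtraction (floats, NumPy arrays, torch tensors). Names missing from
    `modified` are copied from `orig` unchanged (an assumption of this sketch).
    """
    flipped = {}
    for name, w_orig in orig.items():
        if name not in modified:
            flipped[name] = w_orig
            continue
        delta = modified[name] - w_orig   # what fine-tuning added
        flipped[name] = w_orig - delta    # apply it with the opposite sign
    return flipped


def greedy_decode(next_token_scores, prefix, max_new_tokens, stop_token=None):
    """Repeatedly append the highest-scoring next token to `prefix`.

    `next_token_scores(tokens)` is a hypothetical callback returning a
    sequence of scores indexed by token id.
    """
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        tokens.append(best)
        if best == stop_token:
            break
    return tokens
```

With real checkpoints, the two dictionaries could plausibly come from `safetensors.torch.load_file`, and decoding could use a Hugging Face model's `generate(..., do_sample=False)` on the tokenized 'dice{' prefix. Which base checkpoint serves as W_orig is not stated in the summary, so that choice remains an assumption.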

$ ls tags/ techniques/
weight_perturbation_negation  svd_analysis  instruction_tuning_suppression_inversion  greedy_decoding  model_diff_analysis
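The listing names model diff analysis and SVD analysis without saying how they were applied. One plausible reading, sketched below with NumPy, is to rank tensors by how far they moved from the base model and then inspect the singular values of a changed matrix. The helper names `rank_by_change` and `effective_rank`, and the tolerance `rel_tol`, are hypothetical choices for this sketch rather than details from the writeup.

```python
import numpy as np


def rank_by_change(orig, modified):
    """List (name, norm of W_modified - W_orig) pairs, largest change first.

    Only names present in both dicts are compared.
    """
    changes = []
    for name in orig:
        if name in modified:
            delta = (np.asarray(modified[name], dtype=np.float64)
                     - np.asarray(orig[name], dtype=np.float64))
            # Frobenius norm for matrices, Euclidean norm for vectors.
            changes.append((name, float(np.linalg.norm(delta))))
    return sorted(changes, key=lambda item: item[1], reverse=True)


def effective_rank(delta, rel_tol=1e-6):
    """Count singular values above rel_tol times the largest one.

    `delta` must be a 2-D matrix; a low count suggests a low-rank update.
    """
    s = np.linalg.svd(np.asarray(delta, dtype=np.float64), compute_uv=False)
    if s.size == 0 or s[0] == 0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))
```

A delta concentrated in a few tensors with a small effective rank would be consistent with a targeted edit, though the summary does not confirm that this is what the analysis found.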
