ml · Pro · hard

leadgate

dicectf2026

Task: a modified GPT-2 model (distributed as safetensors) that was fine-tuned to suppress generation of one specific string, the flag. Solution: treat the modified weights as W_orig + ΔW, where ΔW is the difference from the original GPT-2 weights, and negate the perturbation by loading W_orig - ΔW instead. This inverts suppression into promotion, so greedy decoding from the prefix 'dice{' extracts the formerly forbidden flag.
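The two steps in the summary can be sketched in a few lines. This is a minimal illustration rather than the author's solver: the function names `negate_perturbation` and `greedy_decode`, the `stop_id` parameter, and the `next_logits` callback are all hypothetical, and the weight dictionaries are assumed to map parameter names to values that support subtraction (plain floats here, tensors in practice).

```python
from typing import Any, Callable, Dict, List, Sequence


def negate_perturbation(orig: Dict[str, Any], mod: Dict[str, Any]) -> Dict[str, Any]:
    """Return W_orig - dW for every parameter, where dW = W_mod - W_orig.

    Parameters missing from `mod` are copied unchanged (assumption: they
    were not touched by the fine-tune).
    """
    flipped: Dict[str, Any] = {}
    for name, w_orig in orig.items():
        if name not in mod:
            flipped[name] = w_orig
            continue
        delta = mod[name] - w_orig      # the perturbation added by fine-tuning
        flipped[name] = w_orig - delta  # same as 2 * W_orig - W_mod
    return flipped


def greedy_decode(
    next_logits: Callable[[Sequence[int]], Sequence[float]],
    prefix: Sequence[int],
    stop_id: int,
    max_new_tokens: int = 64,
) -> List[int]:
    """Append the argmax token repeatedly until `stop_id` or the length cap.

    `next_logits` is a hypothetical callback: given the token ids so far,
    it returns one score per vocabulary entry for the next position.
    """
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        best = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(best)
        if best == stop_id:
            break
    return tokens
```

In a real run, the two dictionaries would presumably come from loading the original and the modified checkpoints as tensors, `next_logits` would wrap a forward pass of GPT-2 with the flipped weights, `prefix` would be the token ids of 'dice{', and `stop_id` would be the id of the closing brace. Greedy decoding fits here because the negated perturbation should make the flag tokens the single most likely continuation at each step, so sampling is unnecessary.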

$ ls tags/ techniques/
weight_perturbation_negation  svd_analysis  instruction_tuning  suppression_inversion  greedy_decoding  model_diff_analysis
