ml hard
leadgate
dicectf2026
Task: a modified GPT-2 model (shipped as safetensors) that was fine-tuned to suppress generation of one specific string, the flag. Solution: treat the modified weights as W_orig + ΔW, negate the perturbation (load W_orig - ΔW instead), which turns suppression into promotion, then greedy-decode from the prefix 'dice{' to extract the formerly forbidden flag.
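The negation step can be illustrated with a minimal sketch. It is not the challenge solver: it uses a toy bigram "model" held in plain Python lists so it runs without dependencies, and the names `negate_delta` and `greedy_decode`, the 4-token vocabulary, and all numbers are hypothetical. It relies only on the identity W_orig - ΔW = 2·W_orig - W_mod, where ΔW = W_mod - W_orig.

```python
def negate_delta(orig, modified):
    """Return W_orig - dW for every tensor, where dW = W_modified - W_orig.

    Algebraically this is 2 * W_orig - W_modified, so dW never has to be stored.
    """
    return {
        name: [[2.0 * o - m for o, m in zip(row_o, row_m)]
               for row_o, row_m in zip(orig[name], modified[name])]
        for name in orig
    }


def greedy_decode(weights, start, steps):
    """Toy bigram model: row i of weights["W"] holds next-token logits after token i."""
    tokens = [start]
    for _ in range(steps):
        row = weights["W"][tokens[-1]]
        # Greedy decoding: always take the highest-scoring next token.
        tokens.append(max(range(len(row)), key=row.__getitem__))
    return tokens


# Hypothetical base weights: every token mildly prefers token 1 next.
orig = {"W": [[0.0, 1.0, 0.0, 0.0] for _ in range(4)]}

# Hypothetical fine-tuning delta that suppresses the "secret" path 0 -> 2 -> 3.
delta = [[0.0] * 4 for _ in range(4)]
delta[0][2] = -3.0
delta[2][3] = -3.0

modified = {"W": [[o + d for o, d in zip(ro, rd)]
                  for ro, rd in zip(orig["W"], delta)]}

flipped = negate_delta(orig, modified)

suppressed_path = greedy_decode(modified, start=0, steps=2)  # avoids the secret path
recovered_path = greedy_decode(flipped, start=0, steps=2)    # follows the secret path
```

On real checkpoints the same per-tensor arithmetic would presumably be applied to the tensors loaded from the two safetensors files, assuming the unmodified base is the stock GPT-2 release and the tensor names match; greedy decoding then corresponds to generation with sampling disabled.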
$ ls tags/ techniques/
tags/:
neural_network gpt2 safetensors fine_tuning instruction_tuning weight_perturbation svd transformers lora suppression

techniques/:
weight_perturbation_negation svd_analysis instruction_tuning_suppression_inversion greedy_decoding model_diff_analysis