ml · free · medium

Like a Glove

hackthebox


Tags: unicode, word_embeddings, glove, word2vec, analogy, nlp, gensim

Techniques: word_analogy_arithmetic, cosine_similarity, fullwidth_unicode_conversion


Like a Glove — HackTheBox

Description

Words carry semantic information. Similar to how people can infer meaning based on a word's context, AI can derive representations for words based on their context too! However, the kinds of meaning that a model uses may not match ours. We've found a pair of AIs speaking in metaphors that we can't make any sense of! The embedding model is glove-twitter-25. Note that the flag should be fully ASCII and starts with 'htb{'.

Given a file chal.txt with 84 lines of analogies in the format:

Like non-mainstream is to efl, battery-powered is to?
Like sycophancy is to بالشهادة, cont is to?
...
Like raving is to سگن, happy is to?
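The line format above is regular enough to parse with a single regular expression. The following is a minimal sketch, not the author's code: the names `LINE_RE` and `parse_analogy` are hypothetical, and it assumes that none of the three words contains the substrings " is to " or ", ".

```python
import re

# One challenge line looks like: "Like A is to B, C is to?"
# Assumption: A, B and C never contain " is to " or ", " themselves.
LINE_RE = re.compile(r"^Like (?P<a>.+?) is to (?P<b>.+?), (?P<c>.+?) is to\?$")


def parse_analogy(line):
    """Return the (a, b, c) words of one line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("a"), match.group("b"), match.group("c")


# The three sample lines quoted in the challenge description.
sample_lines = [
    "Like non-mainstream is to efl, battery-powered is to?",
    "Like sycophancy is to بالشهادة, cont is to?",
    "Like raving is to سگن, happy is to?",
]

for line in sample_lines:
    print(parse_analogy(line))
```

Hyphenated and non-Latin tokens pass through unchanged, since the pattern only anchors on the fixed English connectives.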

Analysis

Task format: classic word analogies of the form "A is to B as C is to D". We need to find D for each of the 84 lines.
Model: explicitly specified as glove-twitter-25 (GloVe embeddings trained on Twitter, 25-dimensional vectors), available via gensim.downloader.
Model vocabulary: ~1.2M tokens, including words in many languages (Arabic, Japanese, Korean, Turkish, etc.) as well as assorted Unicode characters, all of which occur in the Twitter data.
Key math: the classic word2vec-style analogy on the word vectors:
```
D = B - A + C
```
Then find the nearest word to vector D by cosine similarity.
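To make the arithmetic concrete, here is a small self-contained sketch using NumPy. The 3-dimensional vectors are invented for illustration (the real glove-twitter-25 vectors have 25 dimensions), and the helper names `cosine` and `solve_analogy` are hypothetical. Excluding the three input words from the candidates is a common convention and an assumption here, not something the challenge states.

```python
import numpy as np

# Toy 3-dimensional embeddings, invented for illustration only.
vectors = {
    "man": np.array([1.0, 0.0, 0.0]),
    "woman": np.array([1.0, 1.0, 0.0]),
    "king": np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
    "apple": np.array([-1.0, 0.5, -1.0]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def solve_analogy(a, b, c, vectors):
    """Return the word whose vector is closest to B - A + C by cosine similarity."""
    target = vectors[b] - vectors[a] + vectors[c]
    # Assumption: the input words themselves are not valid answers.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(solve_analogy("man", "woman", "king", vectors))  # queen
```

On the real model, gensim's `KeyedVectors.most_similar(positive=[B, C], negative=[A])` performs a comparable computation, though it works on unit-normalized vectors and also skips the input words, so its ranking can differ slightly from raw vector arithmetic.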
Unicode gotcha: The GloVe vocabulary contains the fullwidth Unicode digits (０１２３４５６７８９) as separate tokens. The analogy results contain these fullwidth digits instead of their ASCII counterparts, and since the flag must be fully ASCII, a final conversion is needed.
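One way to do that final conversion is shown below as a sketch. It relies on the fact that the fullwidth forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII 0x21 to 0x7E. The names `FULLWIDTH_TO_ASCII` and `fold_fullwidth` are hypothetical, and the writeup does not say which method the author actually used.

```python
# Map every fullwidth form (U+FF01..U+FF5E) to its ASCII counterpart (0x21..0x7E).
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}


def fold_fullwidth(text):
    """Replace fullwidth characters with ASCII; leave everything else untouched."""
    return text.translate(FULLWIDTH_TO_ASCII)


print(fold_fullwidth("０１２３４５６７８９"))  # 0123456789
```

`unicodedata.normalize("NFKC", text)` from the standard library would also fold these digits, but it rewrites other compatibility characters as well, so the explicit table is the narrower choice.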

Solution

Step 1: Basic solver (solve.py)

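```
#!/usr/bin/env python3.12
import re
import gensim.downloader as api
import numpy as np

# Download (on first use) and load the pretrained vectors as a gensim KeyedVectors object.
print("[*] Loading glove-twitter-25 model...")
model = api.load("glove-twitter-25")
print(f"[*] Model loaded. Vocab size: {len(model.key_to_index)}")

# The challenge file contains non-ASCII tokens, so read it explicitly as UTF-8.
with open("chal.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
```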

...
