ml, free, medium

Like a Glove

hackthebox


$ ls tags/ techniques/
word_analogy_arithmetic  cosine_similarity  fullwidth_unicode_conversion


Like a Glove — HackTheBox

Description

Words carry semantic information. Similar to how people can infer meaning based on a word's context, AI can derive representations for words based on their context too! However, the kinds of meaning that a model uses may not match ours. We've found a pair of AIs speaking in metaphors that we can't make any sense of! The embedding model is glove-twitter-25. Note that the flag should be fully ASCII and starts with 'htb{'.

Given a file chal.txt with 84 lines of analogies in the format:

Like non-mainstream is to efl, battery-powered is to?
Like sycophancy is to بالشهادة, cont is to?
...
Like raving is to سگن, happy is to?
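Each line has to be split into its three known words before any vector math can happen. A minimal sketch of that parsing step, assuming every line follows exactly the "Like A is to B, C is to?" shape of the samples above and that no token contains whitespace (the names `ANALOGY_RE` and `parse_analogy` are illustrative, not from the original solver):

```python
import re

# Assumed line shape, inferred from the samples: "Like A is to B, C is to?"
# \S+ also matches non-Latin tokens such as Arabic or Persian words.
ANALOGY_RE = re.compile(r"^Like (\S+) is to (\S+), (\S+) is to\?$")


def parse_analogy(line: str) -> tuple[str, str, str]:
    """Return (A, B, C) from one challenge line, or raise ValueError."""
    match = ANALOGY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized analogy line: {line!r}")
    a, b, c = match.groups()
    return a, b, c


if __name__ == "__main__":
    print(parse_analogy("Like non-mainstream is to efl, battery-powered is to?"))
```

Raising on an unmatched line, rather than skipping it, makes a formatting surprise visible immediately; a dropped line would silently shift every later character of the flag.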

Analysis

  1. Task format: Classic word analogies — "A is to B as C is to D". Need to find D for each of the 84 lines.

  2. Model: Explicitly specified — glove-twitter-25 (GloVe embeddings trained on Twitter, 25-dimensional vectors). Available via gensim.downloader.

  3. Model vocabulary: ~1.2M words, including words in various languages (Arabic, Japanese, Korean, Turkish, etc.), as well as Unicode characters — all present in Twitter data.

  4. Key math: Classic word2vec analogy:

    D = B - A + C
    

    Then find the nearest word to vector D by cosine similarity.

  5. Unicode gotcha: The GloVe vocabulary contains fullwidth Unicode digits (０１２３４５６７８９, U+FF10 to U+FF19) as separate tokens. The analogy results contain these fullwidth digits instead of ASCII ones, so a final conversion to ASCII is needed.
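The arithmetic from item 4 can be illustrated without downloading the model. The sketch below uses a tiny hand-made vocabulary (the 2-dimensional vectors are invented for illustration and have nothing to do with glove-twitter-25) and a hypothetical helper `solve_analogy`. Skipping the three query words when searching is an assumption that mirrors the default behaviour of gensim's `most_similar`; whether the challenge needs it is not stated in the writeup.

```python
import numpy as np


def solve_analogy(vectors, a, b, c, exclude_inputs=True):
    """Return the word whose vector is closest (by cosine) to B - A + C."""
    target = vectors[b] - vectors[a] + vectors[c]
    target = target / np.linalg.norm(target)  # assumes a non-zero target
    best_word, best_score = None, -np.inf
    for word, vec in vectors.items():
        if exclude_inputs and word in (a, b, c):
            continue  # assumption: the answer is not one of the query words
        # Cosine similarity: dot product of the two unit-length vectors.
        score = float(np.dot(vec, target) / np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Toy vocabulary, made up for this example only.
toy = {
    "man": np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king": np.array([3.0, 0.0]),
    "queen": np.array([3.0, 1.0]),
    "apple": np.array([-1.0, 2.0]),
}

if __name__ == "__main__":
    print(solve_analogy(toy, "man", "woman", "king"))  # queen
```

With the real model, `model.most_similar(positive=[b, c], negative=[a], topn=1)` in gensim performs a closely related search. As far as I know it unit-normalizes each input vector before combining them, so its ranking can differ slightly from the raw `B - A + C` arithmetic shown here.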

Solution

Step 1: Basic solver (solve.py)

    #!/usr/bin/env python3.12
    import re
    import gensim.downloader as api
    import numpy as np

    print("[*] Loading glove-twitter-25 model...")
    model = api.load("glove-twitter-25")
    print(f"[*] Model loaded. Vocab size: {len(model.key_to_index)}")

    with open("chal.txt", "r") as f:
        lines = f.readlines()
    ...
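The Unicode gotcha noted in the Analysis means the joined answers still need a pass that maps fullwidth characters back to ASCII. A small sketch of one way to do that, relying on the fact that the fullwidth forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII U+0021 to U+007E. The helper name `to_ascii` and the sample string are hypothetical; the string is not the real flag.

```python
# Fullwidth forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at a fixed offset.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_ASCII[0x3000] = 0x20  # ideographic space -> ASCII space


def to_ascii(text: str) -> str:
    """Replace fullwidth characters with their ASCII counterparts."""
    return text.translate(FULLWIDTH_TO_ASCII)


if __name__ == "__main__":
    sample = "htb{ｗ０ｒｄ}"  # made-up example, not the challenge flag
    converted = to_ascii(sample)
    print(converted, converted.isascii())
```

`unicodedata.normalize("NFKC", text)` from the standard library also folds fullwidth forms to ASCII, but it rewrites other compatibility characters as well, so the explicit table keeps the change limited to the fullwidth block.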
