Lost in Hyperspace
HackTheBox
The challenge provided a zip file (password: `hackthebox`) containing a single file: `token_embeddings.npz`.
$ ls tags/ techniques/
Lost in Hyperspace — HackTheBox
Description
"A cube is the shadow of a tesseract casted on 3 dimensions. I wonder what other secrets may the shadows hold."
The challenge provided a zip file (password: hackthebox) containing a single file: token_embeddings.npz.
Files
token_embeddings.npz— NumPy compressed archive with two arrays:tokensandembeddings
Analysis
Data Structure
import numpy as np data = np.load('token_embeddings.npz') tokens = data['tokens'] # shape (110,), dtype <U1 — single characters embeddings = data['embeddings'] # shape (110, 512), dtype float64
Key observations:
- 110 tokens, only 41 unique characters — letters, digits, symbols including
{,},_,#,!,- - Same characters have different embeddings (pairwise distances of 8–30 between duplicates), meaning each token has a unique position-dependent embedding
- Embedding values range from -1.64 to 1.64, norms range from 4.7 to 16.8
PCA Exploration
PCA revealed the embedding structure:
- Two dominant principal components explaining ~83% of variance (44.6% + 38.1%)
- Remaining components each explained <0.35%
- Sorting tokens by PC1 or PC2 alone did not produce readable text
- KMeans clustering with various k values (2–16) showed some structure but no clear flag
Key Insight: Sequential Positional Encoding
The challenge hint about "shadows" pointed to the idea that the 512D embeddings encode sequential positional information. Consecutive characters in the original message should have nearby embeddings in the high-dimensional space.
This transforms the problem into finding a Hamiltonian path through 110 nodes — essentially a Traveling Salesman Problem (TSP).
Solution
Strategy: Greedy Nearest-Neighbor TSP
- Compute pairwise Euclidean distance matrix (110×110) using
scipy.spatial.distance.cdist - For each starting token, greedily visit the nearest unvisited token
- Try all 110 possible starting points and look for the flag in the reconstructed text
Solve Script
#!/usr/bin/env python3 """ Lost in Hyperspace — Greedy TSP on token embeddings. Consecutive characters in the hidden message have nearby 512D embeddings. """ import numpy as np from scipy.spatial.distance import cdist # Load data data = np.load('token_embeddings.npz') tokens = data['tokens'] # (110,) single characters embeddings = data['embeddings'] # (110, 512) # Compute pairwise Euclidean distance matrix dist_matrix = cdist(embeddings, embeddings, metric='euclidean') def greedy_tsp(start_idx, dist_matrix): """Nearest-neighbor greedy TSP from a given starting node.""" n = dist_matrix.shape[0] visited = [start_idx] unvisited = set(range(n)) - {start_idx} while unvisited: current = visited[-1] # Find nearest unvisited node nearest = min(unvisited, key=lambda j: dist_matrix[current][j]) visited.append(nearest) unvisited.remove(nearest) return visited # Try all 110 starting points for start in range(len(tokens)): path = greedy_tsp(start, dist_matrix) text = ''.join(tokens[i] for i in path) if 'HTB{' in text: # Extract flag flag_start = text.index('HTB{') flag_end = text.index('}', flag_start) + 1 flag = text[flag_start:flag_end] print(f"Start index {start} ('{tokens[start]}'):") print(f" Full text: {text}") print(f" Flag: {flag}") # Verify: show distances between consecutive flag characters flag_indices = path[flag_start:flag_end] print(" Consecutive distances in flag:") for i in range(len(flag_indices) - 1): a, b = flag_indices[i], flag_indices[i+1] d = dist_matrix[a][b] print(f" {tokens[a]}→{tokens[b]}: {d:.2f}") break
Flag Recovery
Starting from token index 80 (H), the greedy TSP produced:
HLPRI8WBRTYH{{DFPVAZX845CMNBVE}77}SFDCE123____HTB{L0ST_1N_TH3_SP1R4L}!}954-!{FOIURTEQW645#Z4XNLEPOKGEFISDUVBMA
The flag HTB{L0ST_1N_TH3_SP1R4L} appeared at position 46 in the reconstructed sequence.
Verification
Consecutive characters in the flag had consistently small distances (~6–7 units), confirming the path was correct:
| Transition | Distance |
|---|---|
| H→T | 6.60 |
| T→B | 6.69 |
| B→{ | 6.37 |
| {→L | 6.91 |
| L→0 | 6.81 |
| 0→S | 6.54 |
| S→T | 6.72 |
| T→_ | 6.45 |
| ... | ~6–7 |
Multiple starting points (indices 16, 40, 85, 96) all produced paths containing the same flag substring, confirming robustness of the approach.
Lessons Learned
- Positional embeddings encode sequence order — when same characters have different vectors, the embedding captures position, not just identity. Nearby positions → nearby vectors.
- Greedy nearest-neighbor TSP is surprisingly effective — for well-separated sequential embeddings, the greedy heuristic finds the correct path without needing optimal TSP solvers.
- Try all starting points — the greedy path is sensitive to the starting node. With only 110 tokens, trying all starts is trivial (O(n²) per start, O(n³) total).
- PCA is useful for exploration but not the solution — it confirmed structure existed (83% variance in 2 components) but the actual decoding required the TSP approach in the full 512D space.
- Distance consistency validates the solution — consecutive flag characters having uniform ~6–7 unit distances confirms the path is following the intended sequence, not jumping randomly.
$ cat /etc/motd
Liked this one?
Pro unlocks every writeup, every flag, and API access. $9/mo.
$ cat pricing.md$ grep --similar
Similar writeups
- [hardware][free]Project Power Challenge Scenario— hackthebox
- [gamepwn][free]NoRadar— HackTheBox
- [stego][Pro]Капибара— duckerz
- [misc][free]PyDome— HackTheBox
- [forensics][free]Phreaky— HackTheBox