tiles + ai
b01lersc
Task: static x86_64 ELF that uses Intel AMX (tmm tile registers, tdpbssd) to implement a puzzle: each hex-pair input multiplies one column of a 3x(16x16) byte state by a precomputed W matrix, and after 3 rounds state[1][1,0] must equal 1.
Solution: extract the B/C/W matrices and initial states from .rodata, unpack the AMX-B interleaved layout to logical form, build a Python emulator of tdpbssd, and BFS over the reduced state space (each row holds at most one '1', so state = tuple of column indices).
tiles + ai — b01lersc 2026
Description
I love matrix multiplication 😍
ncat --ssl tiles--ai.opus4-7.b01le.rs 8443
Files: a single static x86_64 ELF, chall. The binary has to be run under Intel SDE 10.8.0 with the Sapphire Rapids preset because it uses Intel AMX (Advanced Matrix Extensions), an ISA extension available only on Sapphire Rapids and later Xeon CPUs.
The server runs three rounds. In each round it reads a line of hex digits from the user, runs a state transition, and proceeds to the next round only if the state reaches a specific value. If all three rounds succeed, the server prints the flag.
Analysis
AMX crash course
AMX adds eight 2D "tile" registers tmm0..tmm7, each up to 16 rows × 64 bytes, configured via a TILECFG structure loaded by ldtilecfg. The key instruction is:
tdpbssd tmm_dst, tmm_a, tmm_b
It computes dst[m,n] += sum_k a[m,k] * b[k/4, n*4 + k%4] with 8-bit signed operands and 32-bit accumulators, where b is indexed as stored in the tile register and k/4 is integer division. The B operand uses a special AMX-B interleaved layout: a logical K × N matrix is stored as K/4 tile rows of N*4 bytes, so four consecutive rows of the logical matrix are packed into one row of the tile register, and each group of 4 consecutive bytes holds the four K values belonging to one logical column. Unpacking this correctly is the first non-trivial step.
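The formula and layout above can be sketched in pure Python. This is only an illustrative emulator under stated assumptions, not the solver from this writeup: the helper names (to_i8, pack_b, unpack_b, tdpbssd) are made up for the example, tiles are modelled as lists of rows of unsigned byte values, and the int32 wraparound is applied once per output element rather than per partial sum (the result is the same modulo 2^32).

```python
def to_i8(byte):
    """Reinterpret an unsigned byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def pack_b(logical):
    """Pack a logical K x N matrix of small ints into AMX-B layout.

    Result: K/4 rows of N*4 bytes, with logical[k][n] stored at
    tile[k // 4][n * 4 + k % 4].
    """
    K, N = len(logical), len(logical[0])
    assert K % 4 == 0, "K must be a multiple of 4"
    tile = [[0] * (N * 4) for _ in range(K // 4)]
    for k in range(K):
        for n in range(N):
            tile[k // 4][n * 4 + k % 4] = logical[k][n] & 0xFF
    return tile


def unpack_b(tile):
    """Inverse of pack_b: recover the logical K x N signed matrix."""
    K, N = len(tile) * 4, len(tile[0]) // 4
    return [[to_i8(tile[k // 4][n * 4 + k % 4]) for n in range(N)]
            for k in range(K)]


def tdpbssd(c, a, b_tile):
    """Emulate c[m][n] += sum_k a[m][k] * b[k//4][n*4 + k%4].

    a and b_tile hold unsigned byte values that are treated as signed;
    c holds int32 accumulators and is updated in place.
    """
    M, K = len(a), len(a[0])
    N = len(b_tile[0]) // 4
    for m in range(M):
        for n in range(N):
            acc = c[m][n]
            for k in range(K):
                acc += to_i8(a[m][k]) * to_i8(b_tile[k // 4][n * 4 + k % 4])
            # Wrap to signed 32-bit, matching the accumulator width.
            c[m][n] = (acc + 2**31) % 2**32 - 2**31
    return c
```

For instance, the logical 4 × 2 matrix [[1, 2], [3, 4], [5, 6], [7, 8]] packs into the single tile row [1, 3, 5, 7, 2, 4, 6, 8]: column 0 occupies bytes 0..3 and column 1 occupies bytes 4..7. Feeding a packed B tile to tdpbssd should then agree with an ordinary matrix product of A and the logical B.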
Binary structure
Disassembly shows the tile config at 0x410100:
| Tile | Rows | Bytes/row | Role |
|---|---|---|---|
| tmm0, tmm1, tmm2 | 16 | 16 | A operands, regular 16×16 byte |
| tmm3, tmm4, tmm5 | 4 | 64 | B operands, AMX-B layout of 16×16 |
| tmm6, tmm7 | 16 | 64 | C accumulators, 16×16 int32 |
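A table like this can be read straight out of the 64-byte TILECFG blob that ldtilecfg loads. The sketch below assumes the standard palette 1 layout (byte 0 is the palette, byte 1 the start row, bytes 16..31 hold eight little-endian 16-bit bytes-per-row values, bytes 48..55 hold eight 8-bit row counts). The raw bytes at 0x410100 are not reproduced here, so the example rebuilds a blob from the table instead; parse_tilecfg and build_tilecfg are hypothetical helper names.

```python
import struct


def parse_tilecfg(blob):
    """Decode a 64-byte TILECFG into (palette, start_row, [(rows, colsb), ...])."""
    assert len(blob) == 64, "TILECFG is exactly 64 bytes"
    palette, start_row = blob[0], blob[1]
    colsb = struct.unpack_from("<8H", blob, 16)  # bytes per row, tmm0..tmm7
    rows = struct.unpack_from("<8B", blob, 48)   # row count, tmm0..tmm7
    return palette, start_row, list(zip(rows, colsb))


def build_tilecfg(shapes, palette=1):
    """Encode up to eight (rows, colsb) pairs into a 64-byte TILECFG."""
    blob = bytearray(64)
    blob[0] = palette
    for i, (rows, colsb) in enumerate(shapes):
        struct.pack_into("<H", blob, 16 + 2 * i, colsb)
        blob[48 + i] = rows
    return bytes(blob)


# Shapes taken from the table: three A tiles, three B tiles, two C tiles.
shapes = [(16, 16)] * 3 + [(4, 64)] * 3 + [(16, 64)] * 2
palette, start_row, parsed = parse_tilecfg(build_tilecfg(shapes))
for i, (rows, colsb) in enumerate(parsed):
    print(f"tmm{i}: {rows} rows x {colsb} bytes/row")
```

In practice one would pass the 64 bytes dumped from the binary to parse_tilecfg rather than a rebuilt blob; the round trip here only shows that the field offsets are self-consistent.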
.rodata contains four important tables:
...