Twitter/X

SubQ is presented as the first model built on a fully sub-quadratic…

2026-05-05 · 14:49 UTC · @shiri_shh

Brief

SubQ is introduced as a breakthrough LLM built on a fully sub-quadratic sparse-attention architecture (SSA) that supports a 12-million-token context window. Alexander Whedon claims it is 52× faster than FlashAttention at 1M tokens, costs less than 5% as much as Opus (the caption instead says 10× cheaper than Opus 4.7), and requires nearly 1,000× less compute.

Why it matters

SubQ is presented as the first model built on a fully sub-quadratic sparse-attention architecture (SSA) and the first frontier model with a 12,000,000-token context window (announcement by Alexander Whedon, posted 2026-05-05).
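A simple count shows why "sub-quadratic" matters at these context lengths. The sketch below compares two quantities. The first is the number of query-key score entries for full attention (n²). The second is the count for a scheme in which each query scores at most k keys (n·k). The budget k = 1,000 is a hypothetical value chosen for illustration, not a published SubQ parameter. The function names are also illustrative. The count ignores head dimension, layer count, and causal masking.

```python
def dense_pairs(n: int) -> int:
    """Query-key score entries for full (quadratic) attention over n tokens: n^2."""
    return n * n


def budgeted_pairs(n: int, k: int) -> int:
    """Entries if each query scores at most k keys (hypothetical budget): n * k."""
    return n * min(k, n)


K = 1_000  # hypothetical per-query key budget, NOT a published SubQ parameter

if __name__ == "__main__":
    for n in (1_000_000, 12_000_000):
        dense, sparse = dense_pairs(n), budgeted_pairs(n, K)
        print(f"{n:>12,} tokens: dense {dense:.1e}, budgeted {sparse:.1e}, ratio {dense // sparse:,}x")
```

Under this assumption, a 1,000-key budget at 1M tokens cuts score entries by a factor of 1,000. Stretching the context from 1M to 12M tokens multiplies the dense count by 144 but the budgeted count only by 12. The post does not say whether its "nearly 1,000×" figure is derived this way.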

Key details

SubQ is claimed to be 52× faster than FlashAttention at 1,000,000 tokens and to cost less than 5% as much as Opus. The post's caption gives a different figure, '10x cheaper than Opus 4.7', which would mean roughly 10% of the cost rather than under 5%.
The post asserts that SSA 'finds and focuses only' on the important relationships between tokens. It says this yields 'nearly 1,000× less compute' and enables a new scaling approach for transformer-based LLMs.
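The post does not say how SSA decides which token relationships matter. As a stand-in, the sketch below implements causal sliding-window attention, a simple fixed-pattern form of sparse attention. In this scheme, query i scores only the `window` most recent keys, so the number of scores grows as n·window rather than n². This is not SubQ's method. SSA is described as selecting relationships by importance, whereas this pattern selects them purely by position. The names `windowed_attention` and `window` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def windowed_attention(
    q: List[Vector], k: List[Vector], v: List[Vector], window: int
) -> Tuple[List[Vector], int]:
    """Causal sliding-window attention (illustrative stand-in, not SubQ's SSA).

    Query i attends only to keys max(0, i - window + 1) .. i, so the total
    number of query-key scores is at most n * window instead of n * (n + 1) / 2.
    Returns the output vectors and the number of scores computed.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product factor
    outputs: List[Vector] = []
    scored = 0
    for i in range(n):
        keys = range(max(0, i - window + 1), i + 1)
        weights = softmax([dot(q[i], k[j]) * scale for j in keys])
        scored += len(keys)
        # Weighted sum of the value vectors inside the window only.
        outputs.append(
            [sum(w * v[j][c] for w, j in zip(weights, keys)) for c in range(len(v[0]))]
        )
    return outputs, scored


if __name__ == "__main__":
    n, d = 8, 4
    q = [[math.sin(i + c) for c in range(d)] for i in range(n)]
    k = [[math.cos(i - c) for c in range(d)] for i in range(n)]
    v = [[float(i * d + c) for c in range(d)] for i in range(n)]
    _, scored = windowed_attention(q, k, v, window=3)
    # Prints: 21 scores vs 36 for full causal attention
    print(scored, "scores vs", n * (n + 1) // 2, "for full causal attention")
```

With `window` fixed, the work per token stays constant as the sequence grows. A content-based scheme like the one the post describes would additionally need a way to pick the important keys without first scoring all of them. The post does not explain that step.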
