Twitter/X

@jonasgeiping (X) on 2026-05-13

Brief

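Jonas Geiping (X/@jonasgeiping) on 2026-05-13 argues that current coding agents are constrained by sequential, message-based exchanges with users, tools, and their own chain of thought. A new paper he co-authored shows that multi-stream LLMs can be created by instruction-tuning for a stream format and can read and predict tokens across all streams in parallel in each forward pass. He says this improves latency, simplifies UX, makes separation of concerns (and thus security) easier, and gives a legible form of parallel reasoning in which internal streams can subvocalize concerns. He also calls the work complementary to a related report that shipped 23 hours earlier.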

Why it matters

@jonasgeiping (X) on 2026-05-13: message-based exchanges bottleneck even very intelligent agents to a single stream. In his words, the models "cannot read while writing, cannot act while thinking and cannot think while processing information."

Key details

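  • The new paper shows that multi-stream LLMs can be created by instruction-tuning for the stream format, and that they can predict and read tokens in all streams in parallel in each forward pass, which improves latency.
  • Multi-stream models simplify user and tool-use UX (for example, removing the need to interrupt the model to get a word in) and have an easier time encoding a separation of concerns, which improves security.
  • Many internal streams provide a legible form of parallel/continuous reasoning: they can subvocalize concerns that the main CoT stream might not verbalize. Geiping says the work complements a related report released 23 hours earlier.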
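
The latency claim in the list above can be illustrated with a toy step count. This is only a sketch under stated assumptions, not the paper's method: the function names, the stream names, and the tokens are all hypothetical, and the cost model (one step per token when streams are serialized, versus one step that advances every active stream by one token) is a deliberate simplification that ignores real-world effects such as batched prefill.

```python
from itertools import zip_longest


def message_based_steps(streams: dict[str, list[str]]) -> int:
    """Toy baseline (assumption): all streams are serialized into one
    sequence, and each token costs one step."""
    return sum(len(tokens) for tokens in streams.values())


def multi_stream_schedule(streams: dict[str, list[str]]) -> list[dict[str, str]]:
    """Toy multi-stream loop (assumption): each step handles at most one
    token from every stream that still has tokens left."""
    names = list(streams)
    schedule = []
    # zip_longest pads exhausted streams with None, which we filter out.
    for column in zip_longest(*streams.values()):
        step = {name: tok for name, tok in zip(names, column) if tok is not None}
        schedule.append(step)
    return schedule


# Hypothetical example data: three streams of different lengths.
streams = {
    "user": ["stop", "use", "rust"],
    "thought": ["plan", "check"],
    "tool": ["ls", "cat", "grep", "done"],
}

schedule = multi_stream_schedule(streams)
print("serialized steps:  ", message_based_steps(streams))
print("multi-stream steps:", len(schedule))
for i, step in enumerate(schedule):
    print(i, step)
```

In this toy model the multi-stream step count equals the length of the longest stream rather than the sum of all stream lengths, and the user's first token is handled in step 0 alongside the model's own output instead of waiting for a turn boundary.
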

Source evidence

We’re training models wrong and it’s due to chatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn.

This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information.

In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …
🔵Be created by instruction-tuning for the stream format

🔵Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)

🔵Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency

🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security

🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.

Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.