Twitter/X

Thinking Machines' launch highlighted a 200ms micro-turn architecture that makes…

2026-05-11 · 23:31 UTC · @GenAI_is_real · 1 min read

Brief

Commenting on Thinking Machines' launch (May 11, 2026), @GenAI_is_real singled out two technical points: a 200ms micro-turn architecture that builds interactivity into the model itself, and streaming sessions, which the team built because existing LLM inference libraries handle frequent small prefills poorly, and then contributed to SGLang. @GenAI_is_real says their own SGLang Omni work targets the same serving challenges (continuous input/output under strict latency budgets for streaming speech and video) and will be shared soon.

Why it matters

Thinking Machines' launch highlighted a 200ms micro-turn architecture that makes interactivity a native model capability rather than a harness-level hack (announced May 11, 2026).

Key details

The team identified that existing LLM inference libraries aren't optimized for frequent small prefills, implemented streaming sessions to solve it, and contributed the feature upstream to SGLang.
@GenAI_is_real reports that their own team is tackling the same serving challenges with SGLang Omni for streaming speech and video inference, where continuous input/output must meet strict latency budgets, and says they will share results soon. Tagged accounts: @thinkymachines, @lmsysorg.
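
To make the "frequent small prefills" point concrete, here is a toy cost model, not SGLang's API and not Thinking Machines' implementation. It assumes a hypothetical 20 tokens of new input per 200ms micro-turn and compares a stateless server, which re-prefills the whole history on every turn, with a session that keeps its cache and prefills only the new chunk. The class and function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class StatelessServer:
    """Hypothetical baseline: reprocesses the full history on every micro-turn."""
    tokens_prefilled: int = 0

    def turn(self, history: list[int], new_chunk: list[int]) -> None:
        history.extend(new_chunk)
        self.tokens_prefilled += len(history)  # entire context is prefilled again


@dataclass
class StreamingSession:
    """Hypothetical session: keeps cached state, so only the new chunk is prefilled."""
    cached: list[int] = field(default_factory=list)
    tokens_prefilled: int = 0

    def turn(self, new_chunk: list[int]) -> None:
        self.cached.extend(new_chunk)
        self.tokens_prefilled += len(new_chunk)  # only the delta is processed


def simulate(num_turns: int, chunk_size: int) -> tuple[int, int]:
    """Return (stateless_tokens, session_tokens) prefilled over num_turns."""
    stateless, history = StatelessServer(), []
    session = StreamingSession()
    for t in range(num_turns):
        chunk = list(range(t * chunk_size, (t + 1) * chunk_size))
        stateless.turn(history, chunk)
        session.turn(chunk)
    return stateless.tokens_prefilled, session.tokens_prefilled


if __name__ == "__main__":
    # 200 ms turns give 5 turns per second; 10 s of interaction is 50 turns.
    # The 20-token chunk size is an assumption chosen for illustration.
    full, delta = simulate(num_turns=50, chunk_size=20)
    print(f"stateless: {full} tokens, session: {delta} tokens")
```

Under these assumptions the stateless path's prefill work grows quadratically with the number of turns, while the session path grows linearly. That gap is one plausible reading of why a serving stack tuned for a few large prefills fits a micro-turn workload poorly.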

Source evidence

really impressive launch. two things stood out technically: first, the 200ms micro-turn architecture that makes interactivity a native model capability rather than a harness hack. second, they identified that existing LLM inference libraries aren't optimized for the frequent small prefills this requires, built streaming sessions to solve it, and contributed the feature directly to SGLang. we've been tackling the exact same serving challenges with SGLang Omni for streaming speech and video inference - continuous input/output with strict latency budgets is a completely different serving paradigm. excited to see this space growing and to share what we've been working on soon @thinkymachines @lmsysorg

Thinking Machines (@thinkymachines)

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way.

We share our approach, early results, and a quick look at our model in action.

thinkingmachines.ai/blog/int…

— https://nitter.net/thinkymachines/status/2053938892152435174#m