really impressive launch. two things stood out technically: first, the 200ms micro-turn architecture, which makes interactivity a native model capability rather than a harness hack. second, they identified that existing LLM inference libraries aren't optimized for the frequent small prefills this design requires, built streaming sessions to solve it, and contributed the feature directly to SGLang. we've been tackling the exact same serving challenges with SGLang Omni for streaming speech and video inference: continuous input/output under strict latency budgets is a completely different serving paradigm. excited to see this space growing and to share what we've been working on soon @thinkymachines @lmsysorg
Thinking Machines (@thinkymachines)
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way.
We share our approach, early results, and a quick look at our model in action.
thinkingmachines.ai/blog/int…
— https://nitter.net/thinkymachines/status/2053938892152435174#m