phenomenal breakdown of what’s going on with inference right now. ben nailed the shift we’re undergoing and what that means for NVIDIA vs. cerebras (hint: it’s fucking amazing for memory producers):
inference is going to be HUGE. 10-50X the value of training but…
there are 2 types of inference: answer inference (90% of today’s) and AGENTIC inference (10%)
agentic inference is where most of the value will come from in the future but right now it’s in its infancy… 99% of agents act like chatbots for now
but when that flips, agents will require a different type of AI chip, one that doesn’t look like nvidia’s GPU.
it’ll need higher bandwidth memory, an abundance of compute and low latency. agents WON’T be constrained by humans
memory is the most important. very bullish on memory providers.
Cerebras is only good for answer inference. it’s not good for agentic.
answer inference is not as valuable. it’ll have its own niche but much smaller TAM vs. the entire inference market.
NVIDIA will maintain the throne for the foreseeable future
Stratechery (@stratechery)
The Inference Shift
Agentic inference is going to be different than the inference we use today, and it will change compute infrastructure because speed won't matter when humans aren't involved.
stratechery.com/2026/the-inf…
— https://nitter.net/stratechery/status/2053777140114444603#m