Token demand and inference latency are reshaping the AI infrastructure market. The newsletter highlights explosive token growth: OpenAI reports a 320x year-over-year increase in reasoning tokens, and Goldman reports that China's enterprise token use rose 162% from June to December. OpenRouter data indicates an acceleration since January 2026. UBS cites a 99.7% drop in token cost over three years but warns that demand is growing fast enough that compute capacity remains the limiting factor. Inference latency has emerged as the next key bottleneck for agentic and coding workflows.
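To put the cost and demand figures on a comparable footing, the arithmetic can be sketched as follows. This is an illustrative calculation rather than anything taken from the newsletter: it assumes constant compounding rates, and the helper name `annualized_factor` is hypothetical.

```python
def annualized_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)

# UBS: token cost fell 99.7% over three years, so 0.3% of the original cost remains.
cost_remaining = 1.0 - 0.997
yearly_cost_factor = annualized_factor(cost_remaining, 3)
yearly_cost_decline = 1.0 - yearly_cost_factor

# Token volume growth over the same three years at which total spend
# (cost per token times tokens consumed) would stay flat.
breakeven_volume_growth = 1.0 / cost_remaining

# Goldman: +162% over six months is a 2.62x multiple.
# Annualizing assumes the same pace holds for a full year (an assumption).
china_annualized = annualized_factor(2.62, 0.5)

print(f"Implied yearly cost decline: {yearly_cost_decline:.1%}")
print(f"Break-even volume growth over 3 years: {breakeven_volume_growth:.0f}x")
print(f"China enterprise tokens, annualized multiple: {china_annualized:.2f}x")
```

Under these assumptions, the cost decline works out to roughly 86% per year, and spend stays flat only if volume grows by about 333x over the three years. The 320x growth reported for reasoning tokens happened in a single year, which is consistent with the warning that capacity, not price, is the constraint.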
Nvidia's response is a disaggregated inference stack: Vera Rubin GPUs handle the memory-heavy prefill and KV-cache work, while Groq LPX, an SRAM-centric part built for low-latency decode, accelerates token generation. Jensen Huang claims a roughly 35x improvement in token-generation performance when the two are fused via Dynamo. Sampling is underway, Samsung will fabricate Groq LPX, and shipments are targeted for around Q3 2026. UBS estimates LPX could add about $50B to calendar-2027 (C2027) data-center revenue, supporting upside scenarios in which Nvidia's data-center revenue approaches ~$600B in C2027.
Countervailing forces include hyperscaler ASIC programs (for example Meta's MTIA roadmap, with Broadcom as a key partner). JP Morgan projects Broadcom AI revenue of $65B+ in FY26 and $120B+ in FY27 with ~10 GW deployed, implying significant hyperscaler share gains.
The authors model a 10x expansion in AI demand by 2031 as plausible and note humanoid robotics components as an additional long-run growth vector. They warn that in-house ASICs pose a medium-term risk to Nvidia's share even as the overall market expands.
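The growth rates implied by these projections can be worked out with a short calculation. This is a sketch under stated assumptions, not the authors' model: the 2026 base year for the 10x scenario is assumed (the summary does not give one), the Broadcom figures use the "$65B+" and "$120B+" floors, and LPX revenue is assumed to be included in the ~$600B upside case. The helper name `cagr` is hypothetical.

```python
def cagr(multiple: float, years: float) -> float:
    """Compound annual growth rate needed to reach `multiple` in `years`."""
    return multiple ** (1.0 / years) - 1.0

# 10x AI demand by 2031. Base year 2026 is an assumption, not from the source.
demand_cagr = cagr(10.0, 2031 - 2026)

# JP Morgan: Broadcom AI revenue $65B+ (FY26) to $120B+ (FY27), floor values.
broadcom_growth = 120.0 / 65.0 - 1.0

# UBS: ~$50B from LPX against a ~$600B data-center upside scenario,
# assuming the LPX figure is counted inside that total.
lpx_share = 50.0 / 600.0

print(f"Implied demand CAGR, 2026 to 2031: {demand_cagr:.1%}")
print(f"Broadcom AI revenue growth, FY26 to FY27: {broadcom_growth:.1%}")
print(f"LPX share of upside data-center revenue: {lpx_share:.1%}")
```

With a 2026 base, 10x by 2031 corresponds to roughly 58% compound annual growth. A later or earlier base year would change that figure materially.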
OpenAI enterprise data shows that API token consumption for reasoning rose 320x year-over-year; Goldman reports that China's enterprise token usage grew 162% from June to December (reported Mar 29, 2026).