substack.com

AI Value Capture - The Shift To Model Labs

Brief

AI Value Capture - The Shift To Model Labs examines how agentic AI, hardware advances, and software optimizations have reshaped profit pools so that frontier model labs (e.g., Anthropic) now capture the largest share of value. The brief documents a rapid structural change beginning in December 2025: agentic workflows (code agents, multi-turn assistants) make each token more useful and push demand higher, increasing model monetization even as the cost of producing tokens has plunged. SemiAnalysis quantifies this with internal examples (annual Anthropic token spend as high as $10.95M, per-employee consumption near 5B tokens/month, Claude Code input:output ratios of roughly 300:1, and cache hit rates above 90%) and links those usage patterns to Anthropic’s ARR growth (reported from $9B to more than $44B) and gross margin expansion (from below 40% to above 70%).

The piece documents the technical levers behind the shift. New accelerators (Blackwell/GB300/VR NVL72) and ASICs (TPUv7, Trainium3), plus middleware improvements (wideEP, disaggregation, MTP), multiply tokens/sec/GPU. Software alone can produce ~14x gains on B300; combined with hardware, SemiAnalysis reports GB300 NVL72 configurations delivering ~17x higher throughput than H100 in FP8 and up to ~32x in FP4. Meanwhile, memory scarcity and pricing have surged (DRAM up ~6x year-over-year, one-year H100 rental contracts up ~40% from the October 2025 trough), making Nvidia’s socketed SOCAMM a crucial pricing lever: SemiAnalysis models SOCAMM at ~$8/GB in 1Q26, with potential to exceed $13/GB by end-2026, and assumes ~$10/GB as a working figure. System-level economics are surprising: capex per watt barely changes from GB300 ($37.4/W) to Rubin (VR NVL72, $38.1/W) despite large TDP and FLOP increases, leaving substantial headroom between cost-based rental floors (~$4.92/hr/GPU for a 15.6% IRR) and value-based ceilings (~$12.25/hr/GPU at parity, or a conservative ~$9.63/hr). The authors present a ‘One Chart To Rule Them All’ that combines floor and ceiling constraints with Neocloud IRR curves to illustrate how incremental pricing by Nvidia or TSMC could shift captured value. Finally, SemiAnalysis argues that although TSMC and Nvidia could extract more given N3 and memory tightness (TSMC N3 utilization expected above 100% in H2 2026; DRAM fabs above 90%), both firms have so far held pricing, in part for strategic and regulatory reasons. As a result, short-term value accrues disproportionately to model labs, inference providers, neoclouds, and memory vendors unless system suppliers move to value-based pricing.
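
The gap between the rental floor and ceiling quoted above is easier to judge as plain arithmetic. The following is a minimal sketch that uses only figures stated in this brief, plus the ~1400W and ~2300W chip TDP figures from the key details; the function and variable names are illustrative and are not taken from the SemiAnalysis model.

```python
# Figures quoted in the brief for Rubin (VR NVL72), in USD per GPU-hour.
RENTAL_FLOOR = 4.92          # cost-based floor at a ~15.6% IRR
CEILING_PARITY = 12.25       # value-based ceiling at parity
CEILING_CONSERVATIVE = 9.63  # conservative value-based ceiling

# Capex per watt (USD/W) and approximate chip TDP (W), as quoted in the brief.
CAPEX_PER_WATT = {"GB300": 37.4, "VR NVL72": 38.1}
CHIP_TDP_WATTS = {"GB300": 1400.0, "VR NVL72": 2300.0}


def headroom(floor: float, ceiling: float) -> tuple[float, float]:
    """Return (absolute gap in USD/hr, ceiling as a multiple of the floor)."""
    return ceiling - floor, ceiling / floor


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


for label, ceiling in [("parity", CEILING_PARITY),
                       ("conservative", CEILING_CONSERVATIVE)]:
    gap, multiple = headroom(RENTAL_FLOOR, ceiling)
    print(f"{label}: +${gap:.2f}/hr over the floor ({multiple:.2f}x)")

capex_delta = pct_change(CAPEX_PER_WATT["GB300"], CAPEX_PER_WATT["VR NVL72"])
tdp_delta = pct_change(CHIP_TDP_WATTS["GB300"], CHIP_TDP_WATTS["VR NVL72"])
print(f"capex/W change: {capex_delta:.1f}%, chip TDP change: {tdp_delta:.1f}%")
```

On these inputs the parity ceiling sits about $7.33/hr above the floor (roughly 2.5x), the conservative ceiling about $4.71/hr above it (roughly 2.0x), and capex per watt rises by under 2% while chip TDP rises by about 64%.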

Why it matters

Agentic AI crossed a practical inflection point in December 2025, driving token value up and consumption higher. SemiAnalysis reports Anthropic ARR rising from $9B to over $44B year-to-date and inference gross margins expanding from <40% to >70%.

Key details

  • Hardware and software advances have slashed token production cost: Blackwell-class chips can produce ~30x more tokens/sec on frontier workloads than Hopper did a year earlier; software stacks (wideEP, disaggregation, MTP) can lift tokens/sec on the same B300 GPU from ~1k to ~8k to ~14k (up to 14x from software alone).
  • SemiAnalysis cites internal usage metrics showing annual Anthropic token spend as high as $10.95M and consumption of ~5 billion tokens/month per employee (≈5x Meta), with Claude Code input:output ratios of ~300:1 and cache hit rates >90%. This puts the blended token cost for Opus agentic workloads closer to $0.99/MTok, versus sticker prices of $5–$25/MTok.
  • Memory has become a major cost and bottleneck: DRAM pricing rose ~6x over the past year; 1-year H100 rental contract prices are up ~40% from the October 2025 trough; SemiAnalysis estimates Nvidia’s SOCAMM cost at ~$8/GB in 1Q26, with a plausible exit-2026 cost above $13/GB and a working assumption of ~$10/GB.
  • Nvidia’s Rubin (VR NVL72) economics: capex/W changes only slightly from GB300 ($37.4/W) to VR NVL72 ($38.1/W) despite chip TDP rising from ~1400W to ~2300W; the cost-based Neocloud rental floor for Rubin is ≈ $4.92/hr/GPU (5-year term, 15% prepay, target IRR ~15.6%), while value-based parity suggests a ceiling of up to ~$12.25/hr/GPU (conservative ~$9.63/hr).
  • Upstream supply is tight: TSMC N3 utilization is expected to exceed 100% in H2 2026, and DRAM fabs are running above 90% utilization. TSMC and Nvidia have room to extract more value but have so far held pricing, constrained partly by strategic/regulatory considerations and the desire to support ecosystem growth.
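
The blended token cost cited in the key details can be approximately reproduced with a simple weighted average. This is a hedged sketch rather than SemiAnalysis’s method. It assumes that the $5 and $25 sticker prices are the input and output prices per MTok, that cached input tokens are billed at 10% of the input price ($0.50/MTok, an assumption not stated in the brief), that cache-write premiums can be ignored, and that the cache hit rate is 91% (a hypothetical value within the quoted ">90%" range). The function name is illustrative.

```python
def blended_cost_per_mtok(
    input_price: float,
    output_price: float,
    cache_read_price: float,
    input_output_ratio: float,
    cache_hit_rate: float,
) -> float:
    """Average USD per million tokens across input and output tokens.

    Token volumes are expressed per one unit of output, so the mix is
    input_output_ratio parts input to 1 part output.
    """
    cached_input = input_output_ratio * cache_hit_rate
    uncached_input = input_output_ratio * (1.0 - cache_hit_rate)
    total_cost = (
        cached_input * cache_read_price   # input served from cache
        + uncached_input * input_price    # input billed at full price
        + 1.0 * output_price              # the single unit of output
    )
    return total_cost / (input_output_ratio + 1.0)


# Assumed prices (USD/MTok): $5 input, $25 output, $0.50 cache read.
cost = blended_cost_per_mtok(
    input_price=5.0,
    output_price=25.0,
    cache_read_price=0.5,
    input_output_ratio=300.0,  # Claude Code ratio quoted in the brief
    cache_hit_rate=0.91,       # hypothetical value above 90%
)
print(f"blended cost: ${cost:.2f}/MTok")
```

Under these assumptions the blend comes out just under $1/MTok, in line with the ~$0.99/MTok figure. The result is sensitive to the hit rate: with no caching at all, the same 300:1 mix would cost roughly $5.07/MTok.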