substack.com

Cerebras Deep Dive

2026-05-02 · 12:06 UTC · Tech Investments · 15 min read

Brief

Cerebras builds wafer-scale AI accelerators (WSE3) that keep a large pool of SRAM on-chip to attack the memory-bandwidth bottleneck in autoregressive decoding. The WSE3, a TSMC 5nm wafer with ~4 trillion transistors and ~900k AI cores, exposes 44 GB of SRAM and ~21 PB/s of bandwidth. Cerebras cites latency improvements of up to 15× versus an Nvidia B200 for inference, and the authors claim speedups above 1,000× in niche workloads. To address SRAM capacity limits, Cerebras clusters wafers (≈45 CS-3s to host a 1T-parameter model; theoretical maximum of 2,048 wafers) and implements manufacturing redundancy to route around defects. The result carries a high capital and power cost (~$2–3M and ~27 kW per node).

Commercially, the company has two anchor partnerships. The first is a binding AWS term sheet to deliver a disaggregated inference stack pairing CS-3 with Trainium3, targeting ~5× token throughput in the same footprint and availability through Amazon Bedrock. The second is an OpenAI MRA signed 24 Dec 2025, with deliveries from 23 Jan 2026 and Codex-Spark public on 12 Feb 2026. OpenAI committed to 750 MW (~$20B over 3 years, ≈$6.7B/yr) with an option to expand to 2 GW; financing includes a $1B loan and a 33.5M-share warrant. The article flags two structural weaknesses, SRAM capacity and very high cost per deployment, and positions Cerebras as a low-latency, premium offering for customers who pay for speed. It also notes the company's currently modest scale (revenue concentrated in Abu Dhabi), a ~40% gross margin, and an EBIT margin of roughly -40% driven by R&D and equity compensation.

Why it matters

The Cerebras Wafer-Scale Engine 3 (WSE3) is built on TSMC 5nm with ~4 trillion transistors, ~900,000 AI compute cores, 44 GB of on-wafer SRAM and ~21 PB/s of memory bandwidth. The company claims up to 15x faster inference than an Nvidia B200.
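To make the memory-bandwidth argument concrete, here is a rough back-of-envelope sketch. It assumes the simplest possible model of decoding: every weight is read once per generated token, so bandwidth divided by weight bytes gives a theoretical ceiling on single-stream tokens per second. The 20B-parameter model size and 16-bit weights are illustrative assumptions, not figures from the article, and the result ignores KV-cache traffic, compute limits and interconnect, so real throughput will be far lower.

```python
def decode_tokens_per_second_ceiling(
    bandwidth_bytes_per_s: float,
    params: float,
    bytes_per_param: float = 2.0,  # assumption: 16-bit weights
) -> float:
    """Theoretical ceiling on decode rate if each weight is read once per token."""
    bytes_per_token = params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


WSE3_SRAM_BANDWIDTH = 21e15  # ~21 PB/s, as quoted in the brief
WSE3_SRAM_BYTES = 44e9       # 44 GB, as quoted in the brief
HYPOTHETICAL_PARAMS = 20e9   # assumption: a 20B-parameter model (not from the article)

# Check that the hypothetical model's weights fit in on-wafer SRAM at 16 bits.
fits_on_one_wafer = HYPOTHETICAL_PARAMS * 2.0 <= WSE3_SRAM_BYTES

ceiling = decode_tokens_per_second_ceiling(WSE3_SRAM_BANDWIDTH, HYPOTHETICAL_PARAMS)
print(f"fits on one wafer: {fits_on_one_wafer}")
print(f"bandwidth-bound ceiling: {ceiling:,.0f} tokens/s")
```

The point of the sketch is only the shape of the relationship: for a fixed model, the ceiling scales linearly with memory bandwidth, which is why keeping weights in SRAM matters for latency.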

Key details

Disaggregated inference partnership with AWS: under a binding term sheet, the two companies will co-design a solution pairing Cerebras CS-3 wafers with AWS Trainium3. Cerebras projects ~5x more token throughput in the same footprint and up to 15x speed improvements on leading open-source models; AWS will expose the capability via Amazon Bedrock.
OpenAI Master Resale Agreement signed 24 Dec 2025; deliveries began 23 Jan 2026 and OpenAI’s Codex‑Spark on Cerebras went public 12 Feb 2026. OpenAI committed 750 MW of Cerebras inference capacity (~$20B total, ≈$6.7B/year) with an option to expand to 2 GW by 2030; Cerebras received a $1B working-capital loan and issued OpenAI a 33.5M-share warrant tied to purchase milestones.
Practical limits and economics: a 1T-parameter model requires ~45 CS-3 wafers because of SRAM capacity, and Cerebras can cluster up to 2,048 wafers. Networking 45 CS-3 nodes can cost ~$100M+, while a single Nvidia GB200 rack (≈$3.5M) holds more than 6 similar models. Cerebras nodes are estimated at $2–3M each and draw up to 27 kW.
Financial/operational posture: current revenue is concentrated in sovereign/university deployments (Abu Dhabi). Gross margin is ~40%, with operating losses (EBIT margin of roughly -40%) driven by heavy R&D and share-based compensation (≈1/3 of the operating loss is non-cash). Warrants will be booked as contra-revenue starting Q1 2026.
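The capacity and cost figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this brief, plus one assumption that the brief does not state: weights stored at 16 bits (2 bytes) per parameter. Under that assumption the weights of a 1T-parameter model come to about 45.5 wafers' worth of SRAM, in line with the ~45 quoted, and 45 nodes at $2–3M each land at $90–135M, consistent with the ~$100M+ figure.

```python
import math

# Figures quoted in the brief
SRAM_PER_WAFER_BYTES = 44e9          # 44 GB of SRAM per WSE3
PARAMS = 1e12                        # 1T-parameter model
NODES_QUOTED = 45                    # ~45 CS-3s per 1T-parameter model
NODE_COST_USD = (2e6, 3e6)           # estimated $2-3M per node
NODE_POWER_KW = 27                   # up to 27 kW per node
OPENAI_COMMIT_USD = 20e9             # ~$20B commitment
OPENAI_COMMIT_YEARS = 3

# Assumption (not stated in the brief): 16-bit weights
BYTES_PER_PARAM = 2

# SRAM capacity: how many wafers the weights alone would occupy
weight_bytes = PARAMS * BYTES_PER_PARAM
wafers_exact = weight_bytes / SRAM_PER_WAFER_BYTES
wafers_rounded_up = math.ceil(wafers_exact)

# Hardware cost and peak power for the quoted 45-node cluster
cluster_cost_low = NODES_QUOTED * NODE_COST_USD[0]
cluster_cost_high = NODES_QUOTED * NODE_COST_USD[1]
cluster_power_kw = NODES_QUOTED * NODE_POWER_KW

# Annualized value of the OpenAI commitment
annual_value_usd = OPENAI_COMMIT_USD / OPENAI_COMMIT_YEARS

print(f"wafers for weights: {wafers_exact:.1f} (rounded up: {wafers_rounded_up})")
print(f"45-node cluster cost: ${cluster_cost_low / 1e6:.0f}M to ${cluster_cost_high / 1e6:.0f}M")
print(f"45-node cluster peak power: {cluster_power_kw} kW")
print(f"OpenAI commitment per year: ${annual_value_usd / 1e9:.1f}B")
```

The wafer count covers weights only; KV cache and activations would add to it, and a different weight precision would scale it proportionally.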
