Cerebras builds wafer-scale AI accelerators (WSE3) that keep a large pool of SRAM on-chip to attack the memory-bandwidth bottleneck in autoregressive decoding. The WSE3, a TSMC 5nm wafer with ~4 trillion transistors and ~900k AI cores, exposes 44 GB of SRAM and ~21 PB/s of bandwidth. Cerebras cites latency improvements of up to 15× over an Nvidia B200 for inference, and the authors claim speedups above 1,000× in niche workloads. To work around the SRAM capacity limit, Cerebras clusters wafers (≈45 CS-3s to host a 1T-parameter model; theoretical maximum of 2,048 wafers) and builds in manufacturing redundancy to route around defects, all at a high capital and power cost (roughly $2–3M and ~27 kW per node).
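The ≈45-wafer figure can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope estimate, not Cerebras's sizing method: it assumes 2 bytes per parameter (16-bit weights), decimal gigabytes, and no allowance for KV cache or activations, none of which is stated in the article. The function name `wafers_for_weights` is illustrative.

```python
# Back-of-envelope check of the wafer count quoted in the article.
# Assumptions (NOT from the article): 16-bit weights (2 bytes per parameter),
# decimal gigabytes, and zero overhead for KV cache or activations.

SRAM_PER_WAFER_BYTES = 44e9  # 44 GB of on-wafer SRAM (from the article)
BYTES_PER_PARAM = 2          # assumed 16-bit weights


def wafers_for_weights(n_params: float,
                       bytes_per_param: float = BYTES_PER_PARAM,
                       sram_bytes: float = SRAM_PER_WAFER_BYTES) -> float:
    """Return the fractional number of wafers whose SRAM equals the weight footprint."""
    return n_params * bytes_per_param / sram_bytes


if __name__ == "__main__":
    # A 1T-parameter model needs about 2 TB for weights alone.
    print(f"{wafers_for_weights(1e12):.1f} wafers")  # prints "45.5 wafers"
```

Under these assumptions the weights alone fill about 45.5 wafers' worth of SRAM, which lines up with the article's ≈45 CS-3 figure. A real deployment would need headroom beyond this for anything other than weights.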
Commercially, the company has two anchor partnerships. The first is a binding AWS term sheet to deliver a disaggregated inference stack pairing CS-3 with Trainium3, targeting ~5× token throughput in the same footprint and availability on AWS Bedrock. The second is an OpenAI MRA signed 24 Dec 2025, with deliveries from 23 Jan 2026 and Codex-Spark public on 12 Feb 2026. OpenAI committed to 750 MW (~$20B over 3 years, ≈$6.7B/yr) with an option to expand to 2 GW; financing includes a $1B loan and a 33.5M-share warrant. The article flags two structural weaknesses, limited SRAM capacity and very high cost per deployment, and positions Cerebras as a low-latency, premium offering for customers who will pay for speed. It also notes the company's currently modest scale (Abu Dhabi revenues), ~40% gross margin, and an EBIT margin of roughly -40%, driven by R&D and equity compensation.
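The contract figures reduce to a few divisions. The sketch below reproduces the ≈$6.7B/yr annualization and derives two further ratios that the article does not state: contract dollars per committed megawatt, and a rough upper bound on node count. The upper bound assumes the entire 750 MW feeds ~27 kW nodes with no cooling or facility overhead, which is an illustrative simplification. All function names are hypothetical.

```python
# Arithmetic on the OpenAI commitment figures quoted in the article.
# The node-count bound assumes ALL committed power goes to ~27 kW nodes
# (no cooling or facility overhead); that assumption is ours, not the article's.

CONTRACT_USD = 20e9   # ~$20B total (from the article)
CONTRACT_YEARS = 3    # over 3 years (from the article)
COMMITTED_MW = 750    # committed capacity (from the article)
NODE_KW = 27          # per-node power (from the article)


def annual_value(total_usd: float, years: int) -> float:
    """Average contract value per year."""
    return total_usd / years


def usd_per_mw(total_usd: float, megawatts: float) -> float:
    """Contract dollars per committed megawatt."""
    return total_usd / megawatts


def max_nodes(megawatts: int, node_kw: int) -> int:
    """Upper bound on whole nodes that fit in the power budget (overhead ignored)."""
    return (megawatts * 1000) // node_kw


if __name__ == "__main__":
    print(f"annual value: ${annual_value(CONTRACT_USD, CONTRACT_YEARS) / 1e9:.1f}B")
    print(f"per MW:       ${usd_per_mw(CONTRACT_USD, COMMITTED_MW) / 1e6:.1f}M")
    print(f"node bound:   {max_nodes(COMMITTED_MW, NODE_KW)}")
```

The first line matches the article's ≈$6.7B/yr. The other two are derived illustrations only and should not be read as reported figures.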
Cerebras Wafer-Scale Engine 3 (WSE3) is built on TSMC 5nm with ~4 trillion transistors, ~900,000 AI compute cores, 44 GB of on-wafer SRAM and ~21 PB/s of memory bandwidth; the company claims up to 15× faster inference than an Nvidia B200.