METR’s time-horizon chart, the viral upward line tracking how long a human would take to complete the tasks that current AI systems can do, got a close examination from hosts Joe Weisenthal and Tracy Alloway, who spoke with METR’s president Chris Painter and technical staffer Joel Becker. Painter framed METR’s mission: it is a Bay Area nonprofit building an empirical science for assessing when AI systems achieve the kind of sustained autonomy (especially in software engineering and ML research tasks) that would materially raise the stakes for alignment and catastrophic-risk discussions. Becker walked through the core methodology: choose engineering-focused tasks, recruit skilled human practitioners (about three baselines per task), time how long they take with similar tools, then evaluate models on the same tasks and report the task length at which a model is predicted to succeed half the time (the 50% success threshold). The episode cited concrete numbers for the 50% horizon (Claude Opus 4.6 at 11h 59m in Feb 2026, GPT-5.3 Codex at 5h 50m, and Gemini 3 Pro at 3h 44m in Nov 2025) and emphasized that the chart signals a recent acceleration: the doubling time of the horizon has shortened from roughly 6–7 months historically to roughly 4 months for the latest tranche of models.
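To make the doubling-time figures concrete, here is a minimal arithmetic sketch. It assumes steady exponential growth, which is a simplification and not something the episode guarantees will continue. The function names and the 12-hour and 48-hour inputs are hypothetical, chosen only for illustration.

```python
import math


def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplier on the time horizon after `months_elapsed` months,
    ASSUMING the horizon doubles every `doubling_months` months."""
    return 2.0 ** (months_elapsed / doubling_months)


def months_until(current_hours: float, target_hours: float,
                 doubling_months: float) -> float:
    """Months needed to grow from `current_hours` to `target_hours`
    under the same steady-doubling assumption."""
    return doubling_months * math.log2(target_hours / current_hours)


# One year of growth at the two doubling times mentioned in the episode.
yearly_at_4 = growth_factor(12, 4)  # 2 ** 3, an 8x increase
yearly_at_7 = growth_factor(12, 7)  # 2 ** (12 / 7), a bit over 3x

# Hypothetical inputs: a horizon of about 12 hours growing to 48 hours.
months_to_quadruple = months_until(12, 48, 4)

print(yearly_at_4, yearly_at_7, months_to_quadruple)
```

Under these assumptions, moving from a 7-month to a 4-month doubling time more than doubles the growth achieved in a year, which is why the change in slope drew so much attention.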
The conversation balanced enthusiasm for a single, interpretable metric with sober caveats. Becker and Painter acknowledged important limits: narrow task coverage (mostly engineering work), small human sample sizes, grading noise, monetary incentives for human baseliners, and the difficulty of scaling baselines as horizon lengths grow. They defended the 50% threshold as easier to measure and more statistically robust than 80–90% reliability levels (which require far larger samples), but conceded that downstream business utility and real-world productivity gains will depend on reliability, the messiness of real tasks, and verification overhead. Painter and Becker also discussed external dynamics: compute investment has risen exponentially and is roughly tracking capability gains; Chinese models appear to be roughly 9–12 months behind frontier US models on METR’s held-out tasks; investors sometimes use the charts for decision-making; and METR’s roughly 30-person nonprofit team faces a talent and resource bottleneck even as labs are receptive to third-party evaluations. Hosts and guests agreed that the metric is informative, and alarming in what it shows about the pace of progress, but that more task breadth, larger human baselines, and broader public and governmental engagement are needed before the chart can definitively answer when autonomy becomes an existential risk.
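The threshold discussion above can be illustrated with a small sketch. It assumes a logistic curve relating the log of human completion time to the model's success probability; the episode summary does not spell out METR's fitting procedure, so the model form, the function names, and the eight data points below are all hypothetical.

```python
import math


def fit_logistic(log2_minutes, successes, lr=0.5, steps=5000):
    """Fit P(success) = sigmoid(b + w * (x - mean_x)) by plain gradient
    ascent on the mean log-likelihood. An assumed model, not METR's code."""
    n = len(log2_minutes)
    mean_x = sum(log2_minutes) / n
    centered = [x - mean_x for x in log2_minutes]
    w, b = 0.0, 0.0
    for _ in range(steps):
        preds = [1.0 / (1.0 + math.exp(-(b + w * x))) for x in centered]
        grad_w = sum((y - p) * x
                     for x, y, p in zip(centered, successes, preds)) / n
        grad_b = sum(y - p for y, p in zip(successes, preds)) / n
        w += lr * grad_w
        b += lr * grad_b
    return w, b, mean_x


def horizon_minutes(w, b, mean_x, threshold=0.5):
    """Human task length (minutes) at which predicted success == threshold."""
    logit = math.log(threshold / (1.0 - threshold))
    return 2.0 ** (mean_x + (logit - b) / w)


# Hypothetical data: human baseline time per task, and whether the model
# succeeded (1) or failed (0) on that task.
human_minutes = [1, 2, 4, 8, 16, 32, 64, 128]
model_success = [1, 1, 1, 0, 1, 0, 0, 0]

w, b, mean_x = fit_logistic([math.log2(m) for m in human_minutes],
                            model_success)
h50 = horizon_minutes(w, b, mean_x, 0.5)
h80 = horizon_minutes(w, b, mean_x, 0.8)
print(f"50% horizon: {h50:.1f} min, 80% horizon: {h80:.1f} min")
```

In this toy setup the 80% horizon comes out shorter than the 50% horizon, because demanding higher reliability moves the cutoff toward easier, shorter tasks. How much shorter depends on the fitted slope, which a handful of tasks pins down only loosely. That is consistent with the guests' point that stricter thresholds need far larger samples.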
Chris Painter (president, METR) said METR is a Bay Area research nonprofit focused on measuring when AI systems acquire enough autonomy to pose catastrophic risks, especially via software engineering and machine-learning tasks.