No body text on file.
Open the original to read the full piece.
Charbel‑Raphaël contends OpenAI’s Preparedness Framework v2 sets a fundamentally flawed Critical threshold for AI self‑improvement because it is too permissive, unmeasurable, and self‑certified. He argues the leading indicator (“superhuman research‑scientist agent”) arrives after models already outperform humans, while the lagging indicator—5× generational acceleration sustained for several months—can permit roughly three years of progress before firing; Anthropic reportedly uses a 2× threshold. He highlights missing operational definitions, no cross‑release equivalence metric, and cites Epoch’s pre‑training efficiency 95% CI of [4.5,14.3] months to show large measurement uncertainty. He also calls out Section 4.3’s escape hatch and the absence of external evaluators for self‑improvement (unlike bio/cyber/sandbagging). His remedy: independent evaluation and a concrete red line (e.g., halt when METR’s p50 ≤ 2 months), noting at a ~3.5‑month doubling rate this implies ~2 years to prepare. Community replies were brief and largely tangential, offering alternative framing (attribute substitution, enthalpy vs free‑energy) rather than direct refutation.
Charbel-Raphaël (LessWrong, published 2026-05-02) argues OpenAI’s Preparedness Framework v2 sets a Critical red line for AI self‑improvement that fires too late: the leading indicator (“a superhuman research‑scientist agent”) only triggers after models can out‑research top humans, and the lagging indicator requires a 5× generational acceleration “sustained for several months,” which the author shows can allow roughly ~3 years of effective progress to accumulate before the trigger fires (Anthropic uses a 2× threshold).
Open the original to read the full piece.