LessWrong

Irretrievability; or, Murphy's Curse of Oneshotness upon ASI

2026-05-04 · 22:11 UTC · Eliezer Yudkowsky · 34 min read

Brief

Yudkowsky anticipates and rebuts common counterarguments. He argues that software updatability, incremental AIs, and multiple ASIs are not reliable rejoinders: the mechanisms for recovery can themselves be destroyed, experiments on weaker systems cannot safely reveal lethal failure modes (lethal capability cannot be tested without risking ruin), and coordinated agents can still exclude humanity. He stresses concrete technical difficulties (cognitive uncontainability, capability-driven distribution shifts, Goodhart effects, narrow margins, and rapidity), invoking his 'AGI Ruin' catalog. He also laments repeated motivated misinterpretations (calling these interlocutors 'disaster monkeys'), distances his warning from caricatures about doing only "pure theory," and argues that competent engineers succeed by recognising which problems are tractable and which are so cursed that they should not be attempted.

Community responses in the excerpt are limited but extend the conversation: TsviBT suggests addressing social divides ('Consensus vs the Slighted') to reduce motivations for risky AGI work; Mo Putera points to Oct 2025 Epoch AI compute BOTECs (mean ~62M human-equivalent digital workers, with a wide CI) as relevant context for rapid capability scaling; another commenter requests clarification of TsviBT’s terms. Overall, the post ties concrete historical engineering failures to philosophical and technical reasons why ASI alignment deserves special caution: fail-states can be both novel and irrevocable.

Why it matters

Eliezer Yudkowsky (LessWrong, published 2026-05-04) argues that ASI alignment is an 'Irretrievability Problem': a oneshot/ruin property in which certain failures are fatal and irreversible for humanity.

Key details

Example (Viking 1): launched Aug 1975 (Viking program cost ~$1B in 1970 dollars, ≈ $7B in 2025 dollars). A Nov 11, 1982 uplink intended to change battery behavior accidentally overwrote antenna-pointing code, making recovery impossible. This illustrates that a fix-mechanism can itself be destroyed.
Example (Mars Observer): approved Oct 1984, launched Sept 1992, and lost three days before Mars orbit insertion after ~330 days in flight ($813M in 1984 dollars, ≈ $2B in 2025 dollars). The probable cause was trapped fuel/oxidizer vapors and an explosion on engine restart, demonstrating novelty and distribution-shift failure modes that lab tests could not reproduce.
Example (Maginot Line): construction began in 1929; Germany invaded in May 1940 and bypassed the expected defenses via the Ardennes. Yudkowsky uses this to illustrate 'Murphy's Curse of Ruin', where strategic projects get one decisive try and failure is existential.
Yudkowsky rejects common rejoinders: software updatability, more testing, iterative smaller AIs, or having multiple ASIs do not negate oneshotness because corrective mechanisms can fail, tests suffer distribution shifts, lethal conditions can't be safely probed, and multiple ASIs can still coordinate to exclude humanity. He recommends using the term 'Irretrievability' to avoid repeated misinterpretation.
He lists concrete technical difficulties, citing 'AGI Ruin: A List of Lethalities': cognitive uncontainability of superhuman planners, capability-driven distribution shifts, Goodhart effects, fundamental novelty in systems, narrow margins, and rapidity.
