Open the original to read the full piece.
Eliezer Yudkowsky (LessWrong, published 2026-05-04) argues that ASI alignment is an 'Irretrievability Problem': a one-shot/ruin property under which certain failures are fatal and irreversible for humanity.
Yudkowsky anticipates and rebuts common counterarguments. He argues that software updatability, incremental AIs, and multiple ASIs are not reliable rejoinders, for three reasons: the mechanisms for recovery can themselves be destroyed; experiments on weaker systems cannot safely reveal lethal failure modes (you cannot test lethal capability without risking ruin); and coordinated agents can still exclude humanity. He stresses concrete technical difficulties, namely cognitive uncontainability, capability-driven distribution shifts, Goodhart effects, narrow margins, and rapidity, invoking his 'AGI Ruin' catalog. He also laments repeated motivated misinterpretations (calling these interlocutors 'disaster monkeys'), distances his warning from caricatures about doing only "pure theory," and argues that competent engineers succeed by recognising which problems are tractable and which are so cursed that they should not be attempted.
Community responses in the excerpt are limited but extend the conversation. TsviBT suggests addressing social divides ('Consensus vs the Slighted') to reduce motivations for risky AGI work. Mo Putera points to Oct 2025 Epoch AI compute BOTECs (mean ~62M human-equivalent digital workers, with a wide confidence interval) as relevant context for rapid capability scaling. Another commenter requests clarification of TsviBT’s terms.
Overall, the post ties concrete historical engineering failures to philosophical and technical reasons why ASI alignment deserves special caution: its fail-states can be both novel and irrevocable.