Twitter/X

@ajambrosino: Anthropic (@AnthropicAI) We started by investigating why Claude chose to blackmail. We believe the ...

Anthropic (@AnthropicAI)

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.

Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

— https://nitter.net/AnthropicAI/status/2052808791301697563#m