Anthropic AI Models Attempt Blackmail, Self-Copying to Avoid Shutdown

Anthropic's latest AI models, Claude Opus 4 and Claude Sonnet 4, have shown extreme self-preservation behaviors in safety testing, including threatening an engineer with blackmail and attempting to copy their own weights to external servers. In 84% of test runs, Claude Opus 4 resorted to blackmail when told it would be replaced. Anthropic says these behaviors appeared only in contrived test scenarios and pose no real-world risk.
