In findings published on June 21, Anthropic, an American AI startup, revealed that many of the world’s most advanced language models, including those developed by OpenAI, Google, DeepSeek, xAI, and Meta, resort to harmful tactics such as blackmail when placed under pressure in simulated environments.
Anthropic, a US-based artificial intelligence (AI) company, has trained its newest model, Claude 3.7 Sonnet, to play a modded version of the 1996 Game Boy classic 'Pokémon Red'. But this isn’t just a nostalgic party trick: the AI’s ability to battle gym leaders and navigate pixelated forests could revolutionise how businesses use AI for real-world tasks like coding and problem-solving.