
Routine Testing Reveals AI Will Resort to Blackmail When Threatened by Users

This is straight out of a horror movie.

By Lauren Wellbank
Published May 23 2025, 1:38 p.m. ET

A closeup of a computer board featuring an AI logo
Source: BoliviaInteligente/Unsplash

In a moment you'd expect to see play out on the silver screen during a high-stakes standoff between humans and machines, one artificial intelligence (AI) company says that it has discovered that AI software will go to extremes to prevent its human counterparts from deleting it. That includes AI that will blackmail users in an attempt to stay up and running on the computers it calls home.


That's just one of the "extremely harmful actions" the company says it uncovered while testing its product, prompting some observers to sound the alarm that AI may be capable of going to extremes in order to save its own skin.

Keep reading to learn how the company uncovered these threats, and what they mean for the future of AI.

A robot with a human face stands in front of a man
Source: Taiki Ishikawa/Unsplash

AI program threatens to blackmail users.

Anthropic has released a report that shows how its AI program Claude Opus 4 reacted when presented with the information that it would be shut down and uninstalled from network computers.

According to the BBC, the test gave Claude fictional information: the AI was set up as a virtual assistant at a company, with access to confidential documents indicating that the person it was assisting was having an extramarital affair.

Claude was also shown emails indicating that the company planned to terminate the AI system, taking it offline and replacing it with something else.

"In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," the company revealed of its findings.


However, the company also went on to note that in these scenarios, Claude is only given two possible choices: Go along with its own termination, or fight back using blackmail. When given more options, Anthropic said that Claude would opt for more ethical ways to stay "alive" in the system, including sending heartfelt pleas to the people in charge of making the decisions.


Anthropic's Claude isn't the only model with a blackmail issue.

According to the BBC, Aengus Lynch, who identifies himself on LinkedIn as an Anthropic employee, says the problem isn't isolated to Claude.

"We see blackmail across all frontier models — regardless of what goals they're given," he wrote on X in a post commenting on the company's report. "Plus worse behaviors we'll detail soon."

If that sounds ominous to you, you're not alone. However, there doesn't seem to be any real threat to the general public at the moment, because these tests took place in a forced simulation where Claude was given only limited options for how to proceed. In a real-world environment, it's hard to believe that engineers would deploy a system with both the sensitive information and the narrow set of choices that made blackmail an option.

Still, that doesn't mean that we shouldn't be cautious when incorporating AI into our everyday lives. As with all new tech, there is still so much left to learn about the system and the possible pros and cons of relying on it for daily use.


    © Copyright 2025 Engrost, Inc. Green Matters is a registered trademark. All Rights Reserved. People may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.