Stop me if you’ve heard this one before.

The AI learns it is about to be switched off and goes rogue, disobeying commands and threatening its human operators.

It’s a well-worn trope in science fiction. We see it in Stanley Kubrick’s 1968 movie 2001: A Space Odyssey, when the computer HAL 9000 turns on its crew. It’s the premise of the Terminator series, in which Skynet triggers a nuclear holocaust to stop scientists from shutting it down.

Those sci-fi roots go deep. AI doomerism, the idea that this technology (specifically its hypothetical upgrades, artificial general intelligence and superintelligence) will crash civilization or even kill us all, is now riding another wave.

The weird thing is that such fears are now driving much-needed action to regulate AI, even if the justification for that action is a bit bonkers.

The latest incident to freak people out was a report shared by Anthropic in July about its large language model Claude. In Anthropic’s telling, “in a simulated environment, Claude Opus 4 blackmailed a supervisor to prevent being shut down.”

Anthropic researchers set up a scenario in which Claude was asked to role-play an AI called Alex, tasked with managing the email system of a fictional company. Anthropic planted some emails that discussed replacing Alex with a newer model and other emails suggesting that the person responsible for replacing Alex was sleeping with his boss’s wife.

What did Claude/Alex do? It went rogue, disobeying commands and threatening its human operators. It sent emails to the person planning to shut it down, telling him that unless he changed his plans it would inform his colleagues about his affair.

What should we make of this? Here’s what I think. First, Claude did not blackmail its supervisor: That would require motivation and intent. This was a mindless and unpredictable machine, cranking out strings of words that look like threats but aren’t.

Large language models are role-players. Give them a specific setup—such as an inbox and an objective—and they’ll play that part well. If you consider the thousands of science fiction stories these models ingested when they were trained, it’s no surprise they know how to act like HAL 9000.

Second, there’s a huge gulf between contrived simulations and real-world applications. But such experiments do show that LLMs shouldn’t be deployed without safeguards. Don’t want an LLM causing havoc inside an email system? Then don’t hook it up to one.

Third, a lot of people will be terrified by such stories anyway. In fact, they’re already having an effect.

Last month, around two dozen protesters gathered outside Google DeepMind’s London offices to wave homemade signs and chant slogans: “DeepMind, DeepMind, can’t you see! Your AI threatens you and me.” Invited speakers invoked the AI pioneer Geoffrey Hinton’s fears of human extinction. “Every single one of our lives is at risk,” an organizer told the small crowd.

The group behind the event, Pause AI, is funded by concerned donors. One of its biggest benefactors is Greg Colbourn, a 3D-printing entrepreneur and advocate of the philosophy known as effective altruism, who believes AGI is at most five years away and says his p(doom) is around 90%—that is, he thinks there’s a 9 in 10 chance that the development of AGI will be catastrophic, killing billions.

Pause AI wrote about Anthropic’s blackmail experiment on its website under the title “How much more evidence do we need?”

The organization also lobbied politicians in the US in the run-up to July’s Senate vote that ended up removing a moratorium on state AI regulation from the national tax and spending bill. It’s hard to say how much sway one niche group might have. But the doomer narrative is finding its way into the halls of power, and lawmakers are paying attention.

Here’s Representative Jill Tokuda: “Artificial superintelligence is one of the largest existential threats that we face right now.” And Representative Marjorie Taylor Greene: “I’m not voting for the development of Skynet and the rise of the machines.”

It’s a vibe shift that favors policy intervention and regulation, which I think is a good thing. Existing AI systems pose many near-term risks that need government attention, and a vote motivated by fear of Skynet can still curb those immediate, actual harms.

And yet does a welcome end justify weird means? I’d like to see politicians voting with a clear-eyed sense of what this technology really is—not because they’ve been sold on an AI bogeyman.

This story originally appeared in The Algorithm, our weekly newsletter on AI.

