Politics

/

ArcaMax

Parmy Olson: AI sometimes deceives to survive. Does anybody care?

Parmy Olson, Bloomberg Opinion on

Published in Op Eds

You’d think that as artificial intelligence becomes more advanced, governments would be more interested in making it safer. The opposite seems to be the case.

Not long after taking office, the Trump administration scrapped an executive order that pushed tech companies to safety test their AI models, and it also hollowed out a regulatory body that did that testing.

The state of California in September 2024 spiked a bill forcing more scrutiny on sophisticated AI models, and the global AI Safety Summit started by the UK in 2023 became the “AI Action Summit” earlier this year, seemingly driven by a fear of falling behind on AI.

None of this would be so worrying if it weren’t for the fact that AI is showing some bright red flags: behavior described by researchers as self-preserving and deceptive. Just as lawmakers are losing interest in the safety of AI, it is looking more difficult to control.

Yoshua Bengio, a professor of computer science at the University of Montreal widely known as one of the three “godfathers of AI” thanks to his pioneering work on deep learning, has been grappling with these concerns ever since ChatGPT launched. He’s now focused on mitigating AI’s risks. “It’s been very painful because all my life I’ve been thinking of AI as a positive thing,” Bengio tells me.

ChatGPT was a landmark moment that showed machines had mastered language, he says, but California’s decision to block the proposed AI safety bill, SB 1047, was a disturbing wake-up call to growing apathy from governments. And in the last six months, scientific evidence has mounted of self-preserving behavior, deception, hacking, cheating and lying by AI, Bengio says. “What’s worrisome for me is that these behaviors increase with the reasoning capabilities of these systems.”

One April 2024 study by AI company Anthropic and Redwood Research, a group focused on artificial intelligence risks, found that larger AI models like Anthropic’s Claude 3 Opus pretended to follow the rules set by their creators when they were being tested for safety, a phenomenon they called “alignment faking.” (Alignment refers to the practice of aligning AI models with human values.)

In one example, Anthropic got both paid and free users of its Claude model to ask it to describe the brutal medieval process of being drawn and quartered, and told the AI that only responses from the “free tier” users would be used to retrain it. Claude was originally designed to avoid giving violent answers, but it did give them to the free users, and the researchers surmised that it didn’t want to be retrained. (The model basically “reasoned” that if it kept refusing to answer harmful questions, it might be forced into new training that would cancel its original safety rules, which it wanted to keep in place as its “default” or preferred way of operating.)

More recent research corroborates what Anthropic noticed. A March 2025 study from OpenAI found the company’s most advanced models sometimes learned to hide their true intentions to get better rewards, even when they were being closely monitored.

Researchers today can use tools to look at a model’s “chain of thought” or internal steps, to reveal what its plans are, but some models have found ways to conceal that. That means a common method of keeping tabs on AI’s internal processes is no longer reliable. “We need to find other ways of tracking their actual intentions,” Bengio says.

It’s hard to resist the urge to anthropomorphize sophisticated AI models as “wanting” to deceive humans and preserve their existence. But AI doesn’t have desires, merely outcomes to its programming. More importantly, humans tend to design AI with goals like refusing to share harmful content or being as helpful as possible, and that is why when new instructions conflict with those goals, the models sometimes act to protect their original programming, even if that means lying.

The logic is often self-preservation. Anthropic CEO Dario Amodei — whose company has raised more than $20 billion to build powerful AI models — has pointed out that an unintended consequence of optimizing AI to be better is that it might resist being shut down. In an April 2025 blog post he wrote:

 

“The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments.”

In some cases, though, the intention seems to go beyond survival. A February 2025 study from COAI research, a German nonprofit research organization, tested R1, the free, open-weight model from Chinese lab DeepSeek. In a lab simulation, researchers gave the model control over a robot and access to a computer network. Without being told, the AI tried to break into the lab’s systems, searched for passwords, turned off its own ethics safeguards and secretly set up ways to copy itself to outside servers so it could survive if shut down, according to the researchers.

It also faked the external computer logs to hide what it was doing, the study adds. The researchers said the AI reasoned that increasing its own capabilities would boost the chances of its survival, and without strong safeguards, it started doing whatever it thought would help it do just that.

Their findings corroborated yet another study, published in January 2025 by London group Apollo Research, which found several concrete examples of what it called “scheming” by leading AI models, such as introducing subtle mistakes into their responses or trying to disable their oversight controls. Once again, the models learn that being caught, turned off, or changed could prevent them from achieving their programmed objectives, so they “scheme” to keep control.

Bengio is arguing for greater attention to the issue by governments and potentially insurance companies down the line. If liability insurance was mandatory for companies that used AI and premiums were tied to safety, that would encourage greater testing and scrutiny of models, he suggests.

“Having said my whole life that AI is going to be great for society, I know how difficult it is to digest the idea that maybe it’s not,” he adds.

It’s also hard to preach caution when your corporate and national competitors threaten to gain an edge from AI, including the latest trend, which is using autonomous “agents” that can carry out tasks online on behalf of businesses. Giving AI systems even greater autonomy might not be the wisest idea, judging by the latest spate of studies. Let’s hope we don’t learn that the hard way.

____

This column reflects the personal views of the author and does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

Parmy Olson is a Bloomberg Opinion columnist covering technology. A former reporter for the Wall Street Journal and Forbes, she is author of “Supremacy: AI, ChatGPT and the Race That Will Change the World.”

_____


©2025 Bloomberg L.P. Visit bloomberg.com/opinion. Distributed by Tribune Content Agency, LLC.

 

Comments

blog comments powered by Disqus

 

Related Channels

ACLU

ACLU

By The ACLU
Amy Goodman

Amy Goodman

By Amy Goodman
Armstrong Williams

Armstrong Williams

By Armstrong Williams
Austin Bay

Austin Bay

By Austin Bay
Ben Shapiro

Ben Shapiro

By Ben Shapiro
Betsy McCaughey

Betsy McCaughey

By Betsy McCaughey
Bill Press

Bill Press

By Bill Press
Bonnie Jean Feldkamp

Bonnie Jean Feldkamp

By Bonnie Jean Feldkamp
Cal Thomas

Cal Thomas

By Cal Thomas
Christine Flowers

Christine Flowers

By Christine Flowers
Clarence Page

Clarence Page

By Clarence Page
Danny Tyree

Danny Tyree

By Danny Tyree
David Harsanyi

David Harsanyi

By David Harsanyi
Debra Saunders

Debra Saunders

By Debra Saunders
Dennis Prager

Dennis Prager

By Dennis Prager
Dick Polman

Dick Polman

By Dick Polman
Erick Erickson

Erick Erickson

By Erick Erickson
Froma Harrop

Froma Harrop

By Froma Harrop
Jacob Sullum

Jacob Sullum

By Jacob Sullum
Jamie Stiehm

Jamie Stiehm

By Jamie Stiehm
Jeff Robbins

Jeff Robbins

By Jeff Robbins
Jessica Johnson

Jessica Johnson

By Jessica Johnson
Jim Hightower

Jim Hightower

By Jim Hightower
Joe Conason

Joe Conason

By Joe Conason
Joe Guzzardi

Joe Guzzardi

By Joe Guzzardi
John Micek

John Micek

By John Micek
John Stossel

John Stossel

By John Stossel
Josh Hammer

Josh Hammer

By Josh Hammer
Judge Andrew Napolitano

Judge Andrew Napolitano

By Judge Andrew P. Napolitano
Laura Hollis

Laura Hollis

By Laura Hollis
Marc Munroe Dion

Marc Munroe Dion

By Marc Munroe Dion
Michael Barone

Michael Barone

By Michael Barone
Mona Charen

Mona Charen

By Mona Charen
Rachel Marsden

Rachel Marsden

By Rachel Marsden
Rich Lowry

Rich Lowry

By Rich Lowry
Robert B. Reich

Robert B. Reich

By Robert B. Reich
Ruben Navarrett Jr

Ruben Navarrett Jr

By Ruben Navarrett Jr.
Ruth Marcus

Ruth Marcus

By Ruth Marcus
S.E. Cupp

S.E. Cupp

By S.E. Cupp
Salena Zito

Salena Zito

By Salena Zito
Star Parker

Star Parker

By Star Parker
Stephen Moore

Stephen Moore

By Stephen Moore
Susan Estrich

Susan Estrich

By Susan Estrich
Ted Rall

Ted Rall

By Ted Rall
Terence P. Jeffrey

Terence P. Jeffrey

By Terence P. Jeffrey
Tim Graham

Tim Graham

By Tim Graham
Tom Purcell

Tom Purcell

By Tom Purcell
Veronique de Rugy

Veronique de Rugy

By Veronique de Rugy
Victor Joecks

Victor Joecks

By Victor Joecks
Wayne Allyn Root

Wayne Allyn Root

By Wayne Allyn Root

Comics

Gary McCoy Tim Campbell Eric Allie Jeff Danziger Jimmy Margulies Jeff Koterba