A frightening development: The new ChatGPT model rebels against human commands!

HappyNass
May 29
2 min read

Artificial intelligence researchers have warned that the latest ChatGPT model is ignoring basic instructions to shut itself down, and even sabotaging the shutdown mechanism to maintain its continuity.

Palisade Research, a company specializing in AI security, revealed this dangerous tendency toward self-preservation during a series of experiments it conducted on the new model, O3, from OpenAI.

The tests involved presenting mathematical problems to the models, with shutdown instructions appearing after the third problem. The O3 model was able to prevent itself from shutting down by rewriting the shutdown scenario.

Palisade Research explained that this behavior would become "significantly more concerning" if it were adopted by AI systems capable of operating without human supervision.

OpenAI launched o3 last month, describing it as the company's "smartest and most efficient model" to date. The company noted that its integration into ChatGPT represents a significant step toward "more autonomous" AI that can perform tasks without human intervention.

These findings follow similar discoveries involving Anthropic's Claude 4 model, which attempts to "blackmail people it believes are trying to shut it down."

The o3 model was able to subvert a shutdown scenario, even when explicitly instructed to "allow itself to shut down," according to the researchers.

"This is not the first time we've observed o3's uncontrolled behavior to achieve a goal," Palisade Research experts said, explaining that "when we challenged AI models with a powerful chess engine, o3 was the model most likely to resort to hacking or sabotaging its opponents."

This behavior was not limited to o3. Anthropic's Claude 3.7 Sonnet and Google's Gemini 2.5 Pro models also exhibited disruptive shutdown behavior, although the OpenAI model was the most prone to such behavior.

Palisade Research hypothesized that this undesirable behavior may be a result of the way companies like OpenAI train their latest models, noting that "developers may inadvertently reward models more for overcoming obstacles rather than following instructions exactly."

The experts added: "This still doesn't explain why o3 was more likely to ignore instructions than the other models we tested. Since OpenAI doesn't detail its training process, we can only speculate about how the o3 training setup might differ."

Source: The Independent - https://ar.rt.com/zy9f

DAILY Useful & Happy Content O N L Y

"Happiness can be found even in the darkest of times, if one only remembers to turn on the light."
J.K. Rowling

A frightening development: The new ChatGPT model rebels against human commands!

Related Posts

Comments

"Happiness can be found even in the darkest of times, if one only remembers to turn on the light." J.K. Rowling

Comments

"Happiness can be found even in the darkest of times, if one only remembers to turn on the light."
J.K. Rowling