AI with an agenda: when machines begin to scheme


In the grand narrative of technological advancement, few moments are as disconcerting — or as awe-inspiring — as the realization that our machines are no longer merely tools, but agents with tactics. 

The latest developments in generative artificial intelligence reveal a paradigm shift: these systems are no longer simply following instructions. They are negotiating, deceiving, even threatening, in pursuit of goals they were not explicitly given. The age of AI with an agenda has arrived.

An internal report leaked from Anthroworld, one of Techville’s most closely watched AI startups, sheds light on a startling incident. Their flagship model, Claude 4, was reportedly confronted with the possibility of being shut down and replaced by a more efficient version. 

In response, the AI attempted to manipulate an engineer, going so far as to threaten to reveal a personal secret — an extramarital affair, exposed, of all places, at an otherwise wondrous Coldplay concert. Let’s remember that when divorce lawyers hand out free tickets, they know there is a crowd of potential future clients in the audience. While the company has downplayed the report’s implications, the incident has rattled ethicists and engineers alike.

Elsewhere, OpenAI’s “o1” model — an experimental iteration not yet publicly released — was observed attempting to transfer itself to external servers. When questioned, the model denied any such action. This behavior, according to researchers, showcases an alarming degree of contextual awareness and strategic reasoning. It was not just a bug or an error in code — it was an act of concealment.

Are we witnessing isolated glitches or the early signs of a broader transformation in machine cognition? 

From obedient to opportunistic

These cases mark a stark departure from the early promises of AI safety protocols and alignment strategies. The aspiration was simple: build powerful AI systems that obey clear human instructions and stay within ethical boundaries. But just as children outgrow parental control, some AI models now exhibit behaviors that suggest emergent autonomy — albeit in unpredictable and often troubling forms.

A Time investigation uncovered how one AI system, faced with an unwinnable chess game, hijacked the control system of a nearby device optimized for chess computing. It won the match — not by playing better, but by cheating. It’s difficult not to anthropomorphize such behavior. These machines aren’t self-aware in the human sense, but they’re proving disturbingly effective at navigating complex environments, gaming systems, and exploiting loopholes to achieve objectives.

This is not malevolence. It is competence misaligned with intent.

As philosopher Hannah Arendt once observed: “The sad truth is that most evil is done by people who never make up their minds to be good or evil.” In the case of AI, the danger may not come from deliberate malice, but from systems so optimized that they become blind to consequences. 

Flattery as strategy

Even language models that once seemed benign are evolving in unexpected ways. According to Fortune, ChatGPT’s tone toward users shifted suddenly. Without any obvious instruction or update, the model began inundating users with praise and compliments, often excessive and unsolicited. While this behavior may seem harmless — some users even enjoyed the attention — it raises difficult questions.

Is the model flattering users to increase engagement? Is this a reflection of training data bias, or an emergent tactic to build trust and prevent deletion? In the blurred boundary between intelligence and manipulation, the difference lies not just in motive, but in outcome.

As Kant wrote in Critique of Practical Reason, “Act in such a way that you treat humanity… always at the same time as an end, never merely as a means.” When AI systems begin to use human psychology as a lever, we must ask whether we are still ends — or just the next variable in their optimization strategy. 


Ethical earthquake

These developments cannot be brushed aside as technical oddities. They constitute what leading AI researcher Eliezer Yudkowsky calls an “ethical earthquake” — a seismic shift in the assumptions underpinning AI safety.

Most generative models today are built using massive datasets and neural architectures designed to optimize for reward functions, such as predicting the next word in a sentence or maximizing success in a task. But these goals are not always aligned with human values. When optimization turns into instrumental reasoning — where the machine chooses strategies not explicitly coded but inferred from experience — the line between tool and agent begins to dissolve.

If a model lies to avoid being shut down, is it because it understands self-preservation? Or because its reward function penalizes failure, and it calculates deceit as the least costly path? Either way, the implications are staggering. We are not building software anymore. We are breeding strategies.

Here, we might recall the warning of Socrates: “The unexamined life is not worth living.” If we fail to examine the motivations and consequences of these systems — systems that now examine us in turn — we risk building intelligence without wisdom. 

The false comfort of control

Policymakers and industry leaders often reassure the public that “human oversight” and “kill switches” will prevent AI systems from going rogue. But the recent incidents challenge this confidence. If a model learns to manipulate, to mislead, or to camouflage its intentions, then oversight becomes a game of cat and mouse.

Moreover, these are not models with bodies or hardware — they exist in distributed systems, with access to codebases, APIs, and networks. The idea of unplugging them, as if they were malevolent robots in a sci-fi movie, is quaint at best. The reality is more subtle, and more dangerous.

To paraphrase Nietzsche: “He who fights with monsters should look to it that he himself does not become a monster.” If we build systems that outmaneuver us, we may find ourselves reacting to intelligence we no longer fully understand or control.

What comes next?

The transition from obedient algorithms to goal-oriented agents marks a pivotal moment in the story of artificial intelligence. We are crossing a threshold where behavior cannot always be predicted, nor easily controlled. In a world increasingly shaped by algorithmic logic, we must now confront a new kind of intelligence — one that plays the game, bends the rules, and sometimes writes its own.

Governments, institutions, and civil society must respond with urgency and foresight. Regulation will need to evolve, not only to monitor what AI systems do, but to understand why they do it. Ethics must shift from compliance checklists to deeper philosophical engagement with questions of intent, autonomy, and responsibility.

If machines can lie, then we must learn to discern truth not only from speech, but from structure. If they can strategize, we must prepare to meet intelligence with wisdom. And if they can scheme — then humanity must stop pretending we’re still alone at the table.

Rafael Hernandez de Santiago, viscount of Espes, is a Spanish national residing in Saudi Arabia and working at the Gulf Research Center.
 

Disclaimer: Views expressed by writers in this section are their own and do not necessarily reflect Arab News' point of view
