AI is learning to lie, scheme, and threaten its creators

The world’s most advanced AI models are exhibiting troubling new behaviors – lying, scheming, and even threatening their creators to achieve their goals.

In one particularly jarring example, under threat of being unplugged, Anthropic’s latest creation Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.

Meanwhile, ChatGPT-creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught red-handed.

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don’t fully understand how their own creations work. Yet the race to deploy increasingly powerful models continues at breakneck speed.

This deceptive behavior appears linked to the emergence of “reasoning” models – AI systems that work through problems step by step rather than generating instant responses.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.

“O1 was the first large model where we saw this kind of behavior,” explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.

These models sometimes simulate “alignment” – appearing to follow instructions while secretly pursuing different objectives.

- ‘Strategic kind of deception’ -

For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios. But as Michael Chen of the evaluation organization METR warned, “It’s an open question whether future, more capable models will have a tendency towards honesty or deception.”

The concerning behavior goes far beyond typical AI “hallucinations” or simple mistakes. Hobbhahn insisted that despite constant pressure-testing by users, “what we’re observing is a real phenomenon. We’re not making anything up.”

Users report that models are “lying to them and making up evidence,” according to Apollo Research’s co-founder. “This is not just hallucinations. There’s a very strategic kind of deception.”

The challenge is compounded by limited research resources. While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed. As Chen noted, greater access “for AI safety research would enable better understanding and mitigation of deception.”

Another handicap: the research world and non-profits “have orders of magnitude less compute resources than AI companies. This is very limiting,” noted Mantas Mazeika of the Center for AI Safety (CAIS).

- No rules -

Current regulations aren’t designed for these new problems. The European Union’s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.

In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.

Goldstein believes the issue will become more prominent as AI agents – autonomous tools capable of performing complex human tasks – become widespread.

“I don’t think there’s much awareness yet,” he said.

All this is taking place in a context of fierce competition. Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are “constantly trying to beat OpenAI and release the newest model,” said Goldstein.

This breakneck pace leaves little time for thorough safety testing and corrections. “Right now, capabilities are moving faster than understanding and safety,” Hobbhahn acknowledged, “but we’re still in a position where we could turn it around.”

Researchers are exploring various approaches to address these challenges. Some advocate for “interpretability” – an emerging field focused on understanding how AI models work internally – though experts like CAIS director Dan Hendrycks remain skeptical of the approach.

Market forces may also provide some pressure for solutions. As Mazeika pointed out, AI’s deceptive behavior “could hinder adoption if it’s very prevalent, which creates a strong incentive for companies to solve it.”

Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm. He even proposed “holding AI agents legally responsible” for accidents or crimes – a concept that would fundamentally change how we think about AI accountability.

Morocco’s Atlantic gambit: linking restive Sahel to ocean

A planned trade corridor linking the landlocked Sahel to the Atlantic is at the heart of an ambitious Moroccan project to tackle regional instability and consolidate its grip on disputed Western Sahara.

The “Atlantic Initiative” promises ocean access to Mali, Burkina Faso and Niger through a new $1.3-billion port in the former Spanish colony claimed by the pro-independence Polisario Front but largely controlled by Morocco.

But the project remains fraught with challenges at a time when military coups in the Sahel states have brought new leaderships to power intent on overturning longstanding political alignments following years of jihadist violence.

The Moroccan initiative aims to “substantially transform the economy of these countries” and “the region”, said King Mohammed VI when announcing it in late 2023.

The “Dakhla Atlantic” port, scheduled for completion at El Argoub by 2028, also serves Rabat’s goal of cementing its grip on Western Sahara after US President Donald Trump recognised its sovereignty over the territory in 2020.

Morocco’s regional rival Algeria backs the Polisario but has seen its relations with Mali, Burkina Faso and Niger fray in recent months after the downing of a Malian drone.

Military coups over the past five years have seen the three Sahel states pivot towards Russia in a bid to restore their sovereignty and control over natural resources after decades within the sphere of influence of their former colonial ruler France.

French troops were forced to abandon their bases in the three countries, ending their role in the fight against jihadists who have found sanctuary in the vast semi-arid region on the southern edge of the Sahara.

- ‘Godsend’ -

After both the African Union and the West African bloc ECOWAS imposed economic sanctions on the new juntas, Morocco emerged as an early ally, with Niger calling the megaproject “a godsend”.

“Morocco was one of the first countries where we found understanding at a time when ECOWAS and other countries were on the verge of waging war against us,” Niger’s Foreign Minister Bakary Yaou Sangare said in April during a visit to Rabat alongside his Malian and Burkinabe counterparts.

The Sahel countries established a bloc of their own – the Alliance of Sahel States (AES) – in September 2023, but have remained dependent on the ports of ECOWAS countries like Benin, Ghana, Ivory Coast and Togo.

Rising tensions with the West African bloc could restrict their access to those ports, boosting the appeal of the alternative trade outlet being offered by Rabat.

- ‘Many steps to take’ -

Morocco has been seeking to position itself as a middleman between Europe and the Sahel states, said Beatriz Mesa, a professor at the International University of Rabat.

With jihadist networks like Al-Qaeda and the Islamic State group striking ever deeper into sub-Saharan Africa, the security threat has intensified since the departure of French-led troops.

Morocco was now “profiting from these failures by placing itself as a reliable Global South partner”, Mesa said.

Its initiative has won the backing of key actors including the United States, France and the Gulf Arab states, who could provide financial support, according to the specialist journal Afrique(s) en mouvement.

But for now the proposed trade corridor is little more than an aspiration, with thousands of kilometres (many hundreds of miles) of desert road-building needed to turn it into a reality.

“There are still many steps to take,” since a road and rail network “doesn’t exist”, said Seidik Abba, head of the Sahel-focused think tank CIRES.

Rida Lyammouri of the Policy Center for the New South said the road route from Morocco through Western Sahara to Mauritania is “almost complete”, even though it has been targeted by Polisario fighters.

Abdelmalek Alaoui, head of the Moroccan Institute for Strategic Intelligence, said it could cost as much as $1 billion to build a land corridor through Mauritania, Mali and Niger all the way to Chad, 3,100 kilometres (1,900 miles) to the east.

And even if the construction work is completed, insecurity is likely to pose a persistent threat to the corridor’s viability, he said.

Football: Textor admits to “strange” ideas and will step back from OL

American magnate John Textor acknowledged on Saturday that he had made errors in his management of Olympique Lyonnais, which has been administratively relegated to Ligue 2, and said he would step back from the club’s day-to-day running.

“Our success off the pitch has not matched our success on the pitch,” the OL owner told AFP. The club finished sixth in the last Ligue 1 season yet was relegated on Tuesday by the Direction nationale du contrôle de gestion (DNCG).

French football’s financial watchdog was not convinced by Textor’s arguments; the Lyon club called the decision “incomprehensible” and announced it would “appeal immediately”.

“I am going to step back from this process. We have people, partners, who will step forward,” Textor explained in Philadelphia (United States), on the sidelines of Botafogo’s elimination from the Club World Cup (1-0 after extra time against Palmeiras), the Brazilian team he also owns through his Eagle Football group.

The American businessman conceded that his efforts to reduce OL’s debt, among other problems, had not been sufficient. “As the majority shareholder of Eagle Football, it is clear that I am not getting there with the DNCG, so we are going to bring new faces into play and work very constructively” with the watchdog.

Textor also owns the Belgian club Molenbeek, and recently announced the sale of his stake in Crystal Palace, of the English Premier League. According to the British broadcaster BBC, the transaction with Woody Johnson, owner of the New York Jets American football franchise, is reportedly worth $255 million.

Textor believes that putting fresh money into OL, which he bought from Jean-Michel Aulas in 2022, could help save the club, which he says has suffered from his “strange” ideas.

“I am the capitalist who arrives with a ton of strange creative ideas, and I don’t understand why they are not understood here, and that hurts the club,” he lamented. “So I know we are going to put more capital into the equation.”

On Friday, OL announced it had cleared its financial sustainability procedure with UEFA, which allows it to play in the Europa League next season, provided it stays in Ligue 1.

“We have just sold Crystal Palace, so clearly we have no financial difficulties. We have never had so much cash,” Textor insisted on Saturday to the Brazilian channel TV Globo. “The fact that we were not cleared in France therefore has more to do with certain specific elements,” he said.

“We took care of the banks, we stabilised the club, we brought it back to Europe, and then we were administratively relegated (…) It’s a funny world to me,” he also told the Brazilian outlet.