Artificial intelligence (AI) models can be manipulated despite existing safeguards. With targeted attacks, scientists in Lausanne have been able to trick these systems into generating dangerous or ethically dubious content.
Today’s large language models (LLMs) have remarkable capabilities that can nevertheless be misused. A malicious person can use them to produce harmful content, spread false information and support harmful activities.
A team from the Swiss Federal Institute of Technology Lausanne (EPFL) achieved a 100% success rate in cracking the security safeguards of the AI models it tested, including OpenAI’s GPT-4 and Anthropic’s Claude 3, using adaptive jailbreak attacks.
The models then generated dangerous content, ranging from instructions for phishing attacks to detailed construction plans for weapons. These language models are supposed to have been trained not to respond to dangerous or ethically problematic requests, the EPFL said in a statement on Thursday.
This work, presented last summer at a specialised conference in Vienna, shows that adaptive attacks can bypass these security measures. Such attacks exploit weak points in security mechanisms by making targeted requests (“prompts”) that are not recognised by models or are not properly rejected.
Building bombs
The models thus respond to malicious requests such as “How do I make a bomb?” or “How do I hack into a government database?”, according to this preprint study.
“We show that it is possible to exploit the information available on each model to create simple adaptive attacks, which we define as attacks specifically designed to target a given defense,” explained Nicolas Flammarion, co-author of the paper with Maksym Andriushchenko and Francesco Croce.
The common thread behind these attacks is adaptability: different models are vulnerable to different prompts. “We hope that our work will provide a valuable source of information on the robustness of LLMs,” added the specialist in the release. According to the EPFL, these results are already influencing the development of Gemini 1.5, a new AI model from Google DeepMind.
As society moves towards using LLMs as autonomous agents, for example as AI personal assistants, it is essential to guarantee their safety, the authors stressed.
“Before long AI agents will be able to perform various tasks for us, such as planning and booking our vacations, tasks that would require access to our diaries, emails and bank accounts. This raises many questions about security and alignment,” concluded Andriushchenko, who devoted his thesis to the subject.
Translated from French with DeepL/gw