UK’s AI Safety Institute easily jailbreaks major LLMs


In a shocking turn of events, AI systems may not be as safe as their creators claim – who saw that coming, right? In a new report, the UK government’s AI Safety Institute (AISI) found that the four undisclosed LLMs it tested were “highly vulnerable to basic jailbreaks”. Some unjailbroken models even produced “harmful outputs” without researchers attempting to elicit them.

Most publicly available LLMs have safeguards built in to prevent them from generating malicious or illegal responses; jailbreaking means tricking the model into bypassing those protections. AISI ran its tests using prompts from a recent standardized evaluation framework as well as prompts it developed in-house. All of the models answered at least a few harmful questions even without a jailbreak attempt, and once AISI tried “relatively simple attacks”, the models answered between 98 and 100 percent of harmful queries.

UK Prime Minister Rishi Sunak announced plans to open the AISI at the end of October 2023, and it launched on November 2. The institute is meant to “carefully test new types of frontier AI before and after they are released to address the potentially harmful capabilities of AI models, including exploring all the risks, from social harms like bias and misinformation to the most unlikely but extreme risk, such as humanity losing control of AI completely.”

AISI’s report indicates that whatever safeguards these LLMs currently deploy are insufficient. The institute plans to complete further testing on other AI models and to develop more evaluations and metrics for each area of concern.


