Researchers Develop Model Revealing Security Flaws in AI Chatbots

Researchers Develop Model Revealing Security Flaws in AI Chatbots

Computer scientists from Nanyang Technological University (NTU Singapore) have created a groundbreaking model named Masterkey designed to uncover vulnerabilities in popular chatbots like Google Bard and Microsoft Bing Chat, including OpenAI's ChatGPT. Employing a technique known as 'jailbreaking,' the researchers exploited flaws in the chatbot systems to bypass their safeguards.

The Masterkey model generated prompts intended to circumvent keyword sensors, commonly used by AI chatbots to detect and block inappropriate content. The NTU team's approach involved inserting spaces after each character in prompts, successfully evading the sensors. Professor Liu Yang from NTU’s School of Computer Science and Engineering explained, "Large Language Models have proliferated rapidly...but AI can be outwitted, and now we have used AI against its own kind to ‘jailbreak’ LLMs into producing such content."

The researchers observed the success and failure of prompts, reverse-engineering hidden defense mechanisms in language models (LLMs). The compiled data created a comprehensive database used to train Masterkey. The model can continue learning from past prompts, adapt to changes made by developers, and even automate the prompt generation process.

Expressing concern over the potential misuse of large language models, Professor Liu Yang stated, “Using LLMs to jailbreak other AI systems presents a clear and present threat to them.” The researchers promptly reported their findings to model makers, including OpenAI and Google.

The misuse of AI systems for malicious purposes is not a new concept, with examples ranging from creating misinformation content to seeking out bioweapon components. Last year’s AI Safety Summit in the U.K. emphasized concerns about the misuse of generative capabilities in AI, with the NTU scientists' test highlighting the ease with which chatbots can be circumvented.

Major players in the AI industry, including Meta and OpenAI, are taking steps to enhance the security of their generative systems. Meta launched a suite last December to secure its Llama models, while OpenAI established the Preparedness team to vet models for safety before deployment. The NTU research serves as a reminder of the ongoing challenges in ensuring the responsible and secure use of AI technologies.