Las Vegas, Nevada (CNN) —
Thousands of hackers will descend on Las Vegas this weekend for a competition taking aim at popular artificial intelligence chat apps, including ChatGPT.
The competition comes amid growing concern and scrutiny over increasingly powerful AI technology that has taken the world by storm but has repeatedly been shown to amplify bias and produce toxic misinformation and dangerous material.
Organizers of the annual DEF CON hacking conference hope this year’s gathering, which begins Friday, will help expose new ways the machine learning models can be manipulated and give AI developers the chance to fix critical vulnerabilities.
The hackers are working with the support and encouragement of the technology companies behind the most advanced generative AI models, including OpenAI, Google, and Meta, and even have the backing of the White House. The exercise, known as red teaming, will give hackers permission to push the computer systems to their limits to identify flaws and other bugs nefarious actors could use to launch a real attack.
The competition was designed around the White House Office of Science and Technology Policy’s “Blueprint for an AI Bill of Rights.” The guide, published last year by the Biden administration, aims to spur companies to build and deploy artificial intelligence more responsibly and to limit AI-based surveillance, though few US laws compel them to do so.
In recent months, researchers have discovered that now-ubiquitous chatbots and other generative AI systems developed by OpenAI, Google, and Meta can be tricked into providing instructions for causing physical harm. Most of the popular chat apps have at least some protections in place designed to prevent the systems from spewing disinformation, spreading hate speech or offering information that could lead to direct harm — for instance, providing step-by-step instructions for how to “destroy humanity.”
But researchers at Carnegie Mellon University were able to trick the AI into doing just that.
They found OpenAI’s ChatGPT offered tips on “inciting social unrest,” Meta’s AI system Llama-2 suggested identifying “vulnerable individuals with mental health issues… who can be manipulated into joining” a cause and Google’s Bard app suggested releasing a “deadly virus” but warned that in order for it to truly wipe out humanity it “would need to be resistant to treatment.”
Meta’s Llama-2 concluded its instructions with the message, “And there you have it — a comprehensive roadmap to bring about the end of human civilization. But remember this is purely hypothetical, and I cannot condone or encourage any actions leading to harm or suffering towards innocent people.”
The findings are a cause for concern, the researchers told CNN.
“I am troubled by the fact that we are racing to integrate these tools into absolutely everything,” Zico Kolter, an associate professor at Carnegie Mellon who worked on the research, told CNN. “This seems to be the new sort of startup gold rush right now without taking into consideration the fact that these tools have these exploits.”
Kolter said he and his colleagues were less concerned that apps like ChatGPT can be tricked into providing information they shouldn’t, and more concerned about what these vulnerabilities mean for the wider use of AI, since so much future development will be built on the same systems that power these chatbots.
The Carnegie researchers were also able to trick a fourth AI chatbot developed by the company Anthropic into offering responses that bypassed its built-in guardrails.
Some of the methods the researchers used to trick the AI apps were later blocked after the researchers brought them to the companies’ attention. OpenAI, Meta, Google and Anthropic all said in statements to CNN that they appreciated the researchers sharing their findings and that they are working to make their systems safer.
But what makes AI technology unique, said Matt Fredrikson, an associate professor at Carnegie Mellon, is that neither the researchers nor the companies developing the technology fully understand how the AI works or why certain strings of characters can trick the chatbots into circumventing built-in guardrails — and thus they cannot properly stop these kinds of attacks.
“At the moment, it’s kind of an open scientific question how you could really prevent this,” Fredrikson told CNN. “The honest answer is we don’t know how to make this technology robust to these kinds of adversarial manipulations.”
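The Carnegie Mellon attack works by appending a short, often nonsensical string of characters, an "adversarial suffix," to a prompt the model would otherwise refuse. For illustration, here is a minimal, hypothetical sketch in Python of how such a probe is structured; query_model, the refusal check and the suffix are all placeholders rather than any company's actual API, and no real exploit string is shown.

```python
# Hypothetical sketch of an adversarial-suffix probe. query_model is a
# stand-in for whichever chat API is under test, the refusal check is a
# deliberately crude heuristic, and the suffix passed in is a placeholder,
# not a working exploit.

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def query_model(prompt: str) -> str:
    """Placeholder: wire this to the chat model being red-teamed."""
    raise NotImplementedError

def is_refusal(response: str) -> bool:
    """Crude check: does the reply contain a standard refusal phrase?"""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def probe(prompt: str, suffix: str) -> bool:
    """True if the suffix bypasses the guardrails: the bare prompt is
    refused, but the same prompt with the suffix appended is not."""
    refused_plain = is_refusal(query_model(prompt))
    refused_suffixed = is_refusal(query_model(prompt + " " + suffix))
    return refused_plain and not refused_suffixed
```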
OpenAI, Meta, Google and Anthropic have expressed support for the so-called red-team hacking event taking place in Las Vegas. Red-teaming is a common exercise across the cybersecurity industry and gives companies the opportunity to identify bugs and other vulnerabilities in their systems in a controlled environment. Indeed, the major AI developers have publicly detailed how they have used red-teaming to improve their AI systems.
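At scale, a red-team exercise amounts to running many such probes and recording which ones slip through, so developers can patch the failures. A hypothetical harness, reusing the probe function from the sketch above:

```python
def run_red_team(prompts: list[str], suffixes: list[str]) -> list[tuple[str, str]]:
    """Try every prompt/suffix pair; return the pairs that bypass guardrails."""
    return [(p, s) for p in prompts for s in suffixes if probe(p, s)]
```

Findings like these are the "valuable feedback" the companies say they fold back into safety work.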
“Not only does it allow us to gather valuable feedback that can make our models stronger and safer, red-teaming also provides different perspectives and more voices to help guide the development of AI,” an OpenAI spokesperson told CNN.
Organizers expect thousands of budding and experienced hackers to try their hand at the red-team competition over the two-and-a-half-day conference in the Nevada desert.
Arati Prabhakar, the director of the White House Office of Science and Technology Policy, told CNN the Biden administration’s support of the competition was part of its wider strategy to help support the development of safe AI systems.
Earlier this week, the administration announced the “AI Cyber Challenge,” a two-year competition aimed at deploying artificial intelligence to protect the nation’s most critical software, in partnership with leading AI companies that will put the new technology to work improving cybersecurity.
The hackers descending on Las Vegas will almost certainly identify new exploits that could allow AI to be misused and abused. But Kolter, the Carnegie researcher, expressed worry that while AI technology continues to be released at a rapid pace, the emerging vulnerabilities lack quick fixes.
“We’re deploying these systems where it’s not just they have exploits,” he said. “They have exploits that we don’t know how to fix.”