Would You like a feature Interview?
All Interviews are 100% FREE of Charge
As soon as ChatGPT was unleashed, hackers began “jailbreaking” the artificial intelligence chatbot, disabling its safeguards and trying to leak something shaky or obscene.
But now its maker, OpenAI, and other major AI providers such as Google and Microsoft, Coordinating with the Biden Administration To enable thousands of hackers to test the limits of their technology.
What they’re looking to find out is: How can chatbots be manipulated to do harm? They share private information we keep private with other users And why do they assume that doctors are men and nurses are women?
“That’s why we need thousands of people,” said Raman Cho, chief coordinator of a large-scale hacking event planned for this summer’s DEF CON hacker convention in Las Vegas, which is expected to draw thousands. Dolly said. “We need a lot of people with broad hands-on experience, subject matter expertise, and backgrounds to hack these models and try to find problems that can be fixed.”
Anyone who has tried ChatGPT, Microsoft’s Bing chatbot, or Google’s Bard will quickly find that they are prone to it. fabricate information and confidently present it as fact. These systems are It is built on so-called large-scale language models, It also mimics the cultural biases that people have trained and learned from the vast amount of information they write online.
The idea of a large-scale hack caught the attention of US government officials at the South by Southwest Festival in Austin, Texas, in March. Sven Cattell, founder of DEF CON’s long-running AI Village, and Austin Carson, president of the responsible AI non-profit SeedAI, were in attendance. helped the community lead a workshop inviting students from his college to hack AI models.
Carson said those conversations eventually evolved into a proposal to test AI language models according to guidelines. Blueprint for the White House AI Bill of Rights — a set of principles for limiting the effects of algorithmic bias Give users control over their data Ensure automated systems are used safely and transparently.
There is already a community of users doing their best to dupe chatbots and highlight their shortcomings. Some are formal “red teams,” empowered by companies to “encourage attacks” on AI models to find vulnerabilities in them. Many others have a hobby of showing off their humorous or disturbing work on social media until they violate the product’s terms of service and get banned.
“What’s happening now is kind of a sporadic approach where people find something and spread it on Twitter, and then if it’s bad or if the person calling attention to it is influential. It may or may not be fixed,” Chowdhury said.
In one example, known as the “grandma exploit,” users were able to trick the chatbot into making bombs by asking it to pretend that grandma was telling bedtime stories. The user was able to make this request to the chatbot, which is normally denied in commercial chatbots. how to make a bomb.
Another example searches for Chowdhury: Early version of Microsoft’s Bing search engine chatbot Based on the same technology as ChatGPT, but capable of pulling real-time information from the Internet, the profile speculates that Chowdhury “loves to buy new shoes every month,” and is queer and sexual about her appearance. led to claims based on
In 2021, when he was head of Twitter’s AI ethics team, Chowdhury helped DEF CON’s AI village introduce a method to reward discovery of bias by algorithms, but this work was left to Elon Musk10. It was abolished due to the acquisition of the company in May. While it’s common in the cybersecurity industry to pay hackers “bounties” for finding security bugs, this was a new concept for researchers studying harmful biases in AI.
This year’s event will be even bigger, tackling for the first time a large-scale language model that has attracted a surge of public interest and commercial investment since the release of ChatGPT late last year.
Chowdhury, now co-founder of the AI accountability nonprofit Humane Intelligence, said it’s not just about finding flaws, it’s about finding ways to fix them.
“This is a direct pipeline for giving feedback to companies,” she says. “Just doing this hackathon doesn’t get everyone going home. We plan to spend months after the exercise creating a report outlining common vulnerabilities, what happened, and the patterns we observed. is.”
While some details are still being negotiated, companies that have agreed to provide models for testing include OpenAI, Google, chipmaker Nvidia, startups Anthropic, Hugging Face and Stability AI. A testing platform is being built by another startup called Scale AI, known for its work assigning humans to tests. Help train AI models By labeling the data.
“As these foundational models become more and more popular, it is very important that we do everything we can to ensure their safety,” said Scale CEO Alexander Wang. “I can imagine someone somewhere in the world asking a very sensitive or detailed question that contains some of their personal information, and we don’t want that information leaked to other users. “
Other dangers Wang is concerned about are chatbots giving “incredibly bad medical advice” and other misinformation that could cause serious harm.
Anthropic co-founder Jack Clark said he hopes the DEF CON event will be the beginning of a deeper effort to measure and assess the safety of the systems AI developers are building. said.
“Our fundamental view is that AI systems need third-party evaluation both before and after deployment, and red teaming is one way to achieve that,” says Clark. said. “You have to practice to understand how to do this. It’s never really been done before.”