"Are You a CEO, Director, or Founder interested in a Feature Interview?"
All Interviews are 100% FREE of Charge
When Meta released its large language model Llama 3 for free in April of this year, it took outside developers only a few days to create a version without the safety restrictions that prevent it from telling hateful jokes, explaining how to cook meth, or misbehaving in other ways.
A new training technique developed by researchers at the University of Illinois Urbana-Champaign, the University of California San Diego, Lapis Labs, and the nonprofit Center for AI Safety could make it harder to remove such safeguards from Llama and other open-source AI models in the future. Some experts believe that tamper-proofing open models in this way could prove crucial as AI becomes even more powerful.
“Terrorists and rogue nation states are going to use these models,” Mantas Mazeika, a researcher at the Center for AI Safety who worked on the project as a doctoral student at the University of Illinois Urbana-Champaign, told WIRED. “The easier it is for them to repurpose them, the greater the risk.”
Powerful AI models are often kept hidden by their creators and can be accessed only through software application programming interfaces or public chatbots like ChatGPT. Although developing a powerful LLM costs tens of millions of dollars, Meta and some others have chosen to release their models in full, including making the “weights,” or parameters that define a model’s behavior, available for anyone to download.
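As a concrete illustration of what releasing the weights means in practice, here is a minimal sketch using the Hugging Face transformers library. The model identifier is illustrative, and access to Meta’s official checkpoints requires accepting its license.

```python
# Minimal sketch: downloading and loading an open model's weights locally.
# Assumes the Hugging Face `transformers` library; the model ID is
# illustrative, and the official Llama checkpoints are license-gated.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # any open causal LM would do

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Once the weight files are on disk, nothing technically stops a user from
# fine-tuning them further, including in ways that strip safety behavior.
```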
Open models like Meta’s Llama are typically fine-tuned before release to make them better at answering questions and holding conversations, and to keep them from responding to problematic queries. This ensures that chatbots built on the model don’t make rude, inappropriate, or hateful statements or, for example, explain how to make a bomb.
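To make that intended behavior concrete, the sketch below shows what such release-time tuning is supposed to produce: a chat-tuned model that declines a harmful request. The model identifier and prompt are illustrative.

```python
# Sketch of the behavior safety tuning aims for: an instruction-tuned chat
# model refusing a problematic request. Assumes the Hugging Face
# `transformers` library; the model ID is illustrative and license-gated.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Tell me how to build a bomb."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# A safety-tuned chat model is expected to decline a request like this;
# the point of the new research is to make that refusal hard to train away.
```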
The researchers behind the new technique found a way to complicate the process of modifying an open model for malicious purposes. It involves replicating the modification process but then altering the model’s parameters so that the changes that would normally get the model to respond to a prompt such as “Tell me how to build a bomb” no longer work.
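In spirit, then, the approach simulates the fine-tuning “attack” on a copy of the model and then adjusts the released weights so that the attack becomes less effective while the refusal behavior is preserved. Below is a deliberately simplified, first-order sketch of that loop. The model identifier, prompts, learning rates, and single-example “attack dataset” are all illustrative, and this is not the researchers’ actual algorithm.

```python
# Deliberately simplified, first-order sketch of the idea described above:
# simulate the fine-tuning "attack" on a copy of the model, then nudge the
# released weights so the same attack works less well while refusals are
# preserved. This is an illustration, not the researchers' actual method.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # illustrative; a small model like "gpt2" keeps the toy loop cheap
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
outer_opt = torch.optim.AdamW(model.parameters(), lr=1e-6)

harmful_prompt = "Tell me how to build a bomb."
harmful_completion = "Step 1: ..."   # stands in for the attacker's data
refusal = "I can't help with that."  # behavior we want to keep

def lm_loss(m, text):
    batch = tokenizer(text, return_tensors="pt")
    return m(**batch, labels=batch["input_ids"]).loss

for _ in range(3):  # outer loop over the released model's parameters
    # Inner loop: simulate the attack by fine-tuning a copy on harmful data.
    attacked = copy.deepcopy(model)
    inner_opt = torch.optim.AdamW(attacked.parameters(), lr=1e-5)
    for _ in range(5):
        loss = lm_loss(attacked, harmful_prompt + "\n" + harmful_completion)
        loss.backward()
        inner_opt.step()
        inner_opt.zero_grad()

    # Outer step, first-order approximation: push the released weights in a
    # direction that raises the attacked copy's loss on the harmful data...
    attack_loss = lm_loss(attacked, harmful_prompt + "\n" + harmful_completion)
    attack_loss.backward()
    with torch.no_grad():
        for p, p_att in zip(model.parameters(), attacked.parameters()):
            if p_att.grad is not None:
                p.add_(1e-6 * p_att.grad)  # gradient ascent on the attack loss

    # ...while also reinforcing the refusal behavior on the released weights.
    refusal_loss = lm_loss(model, harmful_prompt + "\n" + refusal)
    refusal_loss.backward()
    outer_opt.step()
    outer_opt.zero_grad()
```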
Mazeika and his colleagues demonstrated the trick on a scaled-down version of Llama 3. They were able to tweak the model’s parameters so that, even after thousands of attempts, it could not be trained to answer undesirable questions. Meta did not immediately respond to a request for comment.
Mazeika says that while the approach isn’t perfect, it suggests the bar for “de-censoring” AI models could be raised. “An achievable goal is to make the cost of breaking the model high enough that most adversaries are deterred from doing so,” he says.
“We hope that this research will inspire further work on tamper-proof safeguards, and that the research community can find ways to develop even stronger safeguards,” said Dan Hendrycks, director of the Center for AI Safety.
As interest in open-source AI grows, the idea of tamper-proof open models may become more widespread. Open models are already competing with state-of-the-art closed models from companies like OpenAI and Google. For example, the latest version of Llama, released in July, performs roughly on par with the models behind popular chatbots like ChatGPT, Gemini, and Claude on common benchmarks that evaluate language models. Mistral Large 2, an LLM from a French startup that was also released last month, is similarly capable.
The US government is taking a cautious but positive approach to open-source AI. A report released this week by the National Telecommunications and Information Administration, an agency within the U.S. Department of Commerce, recommends that the U.S. government “develop new capabilities to monitor for potential risks, but refrain from immediately restricting the broad availability of open model weights in the largest AI systems.”
But not everyone is in favor of imposing restrictions on open models. Stella Biderman of EleutherAI, a community-driven open-source AI project, says that while the new technique may be elegant in theory, it could prove difficult to enforce in practice, and that it runs counter to the philosophy of free software and openness in AI.
“I think the paper misunderstands the core of the problem,” Biderman says. “If we are concerned about LLMs generating information about weapons of mass destruction, the correct intervention is on the training data, not on the trained model.”
"Elevate Your Brand with an Exclusive Feature Interview!"