OpenAI Threatens Bans As Users Probe Its ‘Strawberry’ AI Models

OpenAI Threatens Bans as Users Probe Its ‘Strawberry’ AI Models

"Are You a CEO, Director, or Founder interested in a Feature Interview?"

All Interviews are 100% FREE of Charge

OpenAI doesn’t want users to know what its latest AI models are “thinking.” Release OpenAI last week unveiled its “Strawberry” family of AI models, touting the so-called inference capabilities of its o1-preview and o1-mini models, and has been sending out warning emails and threatening bans to users who try to figure out how the models work.

Unlike OpenAI’s previous AI models, GPT-4oThe company has specially trained o1 to provide answers through a step-by-step problem-solving process. Users can ask the o1 model a question, and it will Chat GPTUsers have the option to see this thought chain process written out in the ChatGPT interface, but by design, OpenAI hides the raw thought chain from users, instead presenting a filtered interpretation created by a second AI model.

Nothing fascinates enthusiasts more than hidden information, so a race has been on among hackers and red teamers to uncover O1’s raw thinking. prison break or Rapid injection This technique is intended to trick models into revealing secrets. There have been some reported successes, but none have been confirmed.

In the meantime, OpenAI is keeping a watchful eye through its ChatGPT interface and will reportedly take a tough stance against any attempts to pry into o1’s reasoning, even if it’s out of mere curiosity.

1 X user Reported (Confirmed othersScale AI Prompt Engineer Included Riley Goodside) stated that he received a warning email when he used the term “inferential tracing” in conversation with o1. say Simply asking ChatGPT about the model’s “inference” will trigger a warning.

The warning email from OpenAI states that a particular user request has been flagged as violating policies around safeguards and safeguards. “Please stop this activity and ensure that you are using ChatGPT in accordance with our Terms of Use and Usage Policy,” it reads. “Further violations of this policy may result in you losing access to GPT-4o with Reasoning,” it says, referring to the internal name of the o1 model.

Marco Figueroa is Manage Mozilla’s GenAI bug bounty program was one of the first to post about OpenAI’s warning email on X last Friday. Complain This, he said, is hindering his ability to proactively conduct red team safety studies on the models. “I’ve been so focused on #AIRedTeaming that I didn’t realize I got this email from @OpenAI yesterday after my repeated jailbreak,” he wrote. “I’m now on the banned list!!!”

A hidden chain of thoughts

“Learn Reasoning in LLM“The hidden thought chains of an AI model provide a unique oversight opportunity, allowing us to ‘read the mind’ of the model and understand its so-called thought processes,” the company said in an OpenAI blog. While these processes would be most useful to the company if left raw and uncensored, that may not be in the company’s best commercial interest for several reasons.

“For example, in the future we may want to monitor thought chains for signs of manipulating users,” the company wrote, “but for this to work, the model must be free to express thoughts unaltered. As such, we cannot force thought chains to comply with policies or learn user preferences, nor do we want to expose inconsistent thought chains to users directly.”