"Are You a CEO, Director, or Founder interested in a Feature Interview?"
All Interviews are 100% FREE of Charge
OpenAI made its last major breakthrough in artificial intelligence last year with the launch of GPT-4, a dizzying expansion in the size of its models. Today, the company announced a new advance that marks a shift in approach: models that can “reason” through hard problems, making them significantly smarter than existing AI without a significant scale-up.
The new model, named OpenAI o1, can solve problems that existing AI models cannot, including OpenAI’s most powerful current model, GPT-4o. Instead of producing an answer in a single step, as large language models typically do, it works through the problem and arrives at the correct result, much as a person might think aloud.
“We think this is a new paradigm for these models,” Mira Murati, OpenAI’s chief technology officer, told WIRED. “It makes them much better at tackling very complex inference tasks.”
The new model, codenamed “Strawberry” within OpenAI, is not a successor to GPT-4o, but rather a complement, the company said.
Murati said OpenAI is currently building its next flagship model, GPT-5, which will be significantly larger than its predecessors. The company still believes scale can help unlock new capabilities from AI, and GPT-5 will likely also include the reasoning techniques announced today. “There are two paradigms,” Murati said. “The scaling paradigm and this new paradigm. We expect that we will bring them together.”
LLMs typically derive answers from giant neural networks that are fed vast amounts of training data. They demonstrate strong linguistic and logical abilities, but traditionally struggle with surprisingly simple problems, such as basic math problems that require reasoning.
According to Murati, OpenAI o1 uses reinforcement learning to improve its reasoning process: the model receives positive feedback when it gets an answer right and negative feedback when it gets one wrong. “The model sharpens its thinking and fine-tunes the strategies that it uses to get to the answer,” Murati said. Reinforcement learning has enabled computers to play games with superhuman skill and to perform useful tasks such as designing computer chips. The technique is also a key component in turning an LLM into a useful, well-behaved chatbot.
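The feedback loop described above can be illustrated with a toy example. This is not OpenAI's actual training setup (which is unpublished); it is a minimal sketch of the general idea of outcome-based reinforcement: an agent chooses among hypothetical "reasoning strategies", earns +1 for a correct answer and -1 for a wrong one, and shifts toward the strategy that accumulates reward.

```python
import random

# Toy illustration of reward-driven learning (not OpenAI's method):
# the agent learns which hypothetical strategy earns more reward.
random.seed(0)
strategies = ["guess", "step_by_step"]   # hypothetical strategy names
value = {s: 0.0 for s in strategies}     # running value estimate per strategy
counts = {s: 0 for s in strategies}

def attempt(strategy):
    # In this toy world, careful step-by-step reasoning is right far
    # more often than guessing.
    p_correct = 0.9 if strategy == "step_by_step" else 0.2
    return 1.0 if random.random() < p_correct else -1.0  # +1 right, -1 wrong

for _ in range(2000):
    # Epsilon-greedy: mostly exploit the best-valued strategy,
    # occasionally explore at random.
    if random.random() < 0.1:
        s = random.choice(strategies)
    else:
        s = max(strategies, key=value.get)
    r = attempt(s)
    counts[s] += 1
    value[s] += (r - value[s]) / counts[s]  # incremental mean of rewards

print(max(value, key=value.get))
```

After a few thousand trials the estimated value of "step_by_step" settles near its expected reward (+0.8) while "guess" settles near -0.6, so the agent comes to prefer the better strategy; the same reward-shaping principle, at vastly larger scale, is what Murati describes.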
Mark Chen, OpenAI’s vice president of research, demoed the new model to WIRED, having it solve several problems that the previous model, GPT-4o, could not. These included advanced chemistry questions and brain-teasing math puzzles like this one: “The princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages. How old are the prince and princess?” (The correct answer is 30 for the prince and 40 for the princess.)
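The quoted answer can be checked mechanically. Under one standard algebraic reading of the puzzle (the princess is as old as the prince will be when the princess is twice as old as the prince was when the princess's age was half the sum of their present ages), a short script confirms that 40 and 30 satisfy the constraint:

```python
# Check the age puzzle under one common reading of its nested clauses.
def satisfies(princess, prince):
    s = (princess + prince) / 2            # half the sum of present ages
    prince_then = prince - (princess - s)  # prince's age when princess was s
    target = 2 * prince_then               # princess "twice as old as" that
    t = target - princess                  # years until princess reaches target
    prince_future = prince + t             # prince's age at that time
    return princess == prince_future       # "princess is as old as the prince will be"

print(satisfies(40, 30))  # → True
```

The constraint reduces to 3 × princess = 4 × prince, so any ages in a 4:3 ratio satisfy it; 40 and 30 are the pair conventionally given as the answer.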
“This model is not trying to imitate how humans think, the way a traditional LLM does, but rather learning to think for itself,” Chen says.
OpenAI says its new model performs significantly better on many problem sets, including those focused on coding, mathematics, physics, biology, and chemistry. On the American Invitational Mathematics Examination (AIME), a test for math students, GPT-4o solved an average of 12 percent of the problems, compared with 83 percent for o1, according to the company.
"Elevate Your Brand with an Exclusive Feature Interview!"