OpenAI has released another fresh chatbot for us penniless, unsuspecting humans. We give you o1, a chatbot designed for more advanced reasoning that is supposedly better at tasks like coding, math, and multi-step problem solving.
Perhaps the most significant change from previous OpenAI LLMs is the move away from mimicking patterns found in text-based training data to focus on more direct problem-solving, through reinforcement learning. The end result is said to be a more consistent, precise chatbot.
“We found that this model was less likely to cause hallucinations,” Jerry Tworek, OpenAI research manager, told The Verge. Of course, “hallucinate less” doesn’t mean no hallucinations at all. “We can’t say we’ve solved the problem of hallucinations,” Tworek says. Ah.
Still, o1 is supposed to employ a sort of “chain of thought,” similar to how we humans work through problems step by step. This is said to contribute to significantly higher performance in tasks like coding and math.
Apparently o1 scored 83% on a qualifying exam for the International Mathematical Olympiad, significantly better than GPT-4o’s rather paltry 13%. It has also performed well in coding competitions, and OpenAI says its next update will allow it to match PhD students “on challenging physics, chemistry, and biology test cases.”
But despite these advances, or perhaps because of them, this fresh bot is actually worse in some ways. It has fewer facts about the world at its fingertips, and it can’t browse the web or process images. It’s also currently slower to respond and spit out answers than GPT-4o.
Of course, one immediate question that arises from all this is whether this fresh chatbot still suffers from any of the surprising limitations of previous bots. Can o1, for example, count to three?
Apparently yes, it can. GPT-4o was famously confused when asked to count the number of “r”s in the word “strawberry”, managing only two. But o1 goes all the way to three.
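For the record, the strawberry test is trivial for ordinary code, which makes the chatbot stumble all the funnier. A one-line Python check (variable names are mine):

```python
# Count occurrences of "r" in "strawberry" — the task GPT-4o stumbled on.
word = "strawberry"
r_count = word.count("r")
print(r_count)  # → 3
```

That this takes a model trained on much of the internet any effort at all says something about how LLMs see text as tokens rather than letters.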
This leap in reasoning power doesn’t come cheap, though. Developer access costs $15 per 1 million input tokens and $60 per 1 million output tokens. That’s three and four times the GPT-4o rates, respectively.
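To put those rates in concrete terms, here is a quick sketch of what a single request might cost. The GPT-4o prices of $5 and $15 per million tokens are implied by the “three and four times” comparison above; the request sizes are hypothetical:

```python
# Compare the per-request cost of o1 vs. GPT-4o at per-million-token rates.
O1_INPUT, O1_OUTPUT = 15.00, 60.00  # USD per 1M tokens (from the article)
GPT4O_INPUT, GPT4O_OUTPUT = O1_INPUT / 3, O1_OUTPUT / 4  # implied: $5 and $15

def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in USD for one request at the given per-million-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A hypothetical request: 2,000 input tokens, 1,000 output tokens.
print(f"o1:     ${request_cost(2000, 1000, O1_INPUT, O1_OUTPUT):.4f}")        # $0.0900
print(f"GPT-4o: ${request_cost(2000, 1000, GPT4O_INPUT, GPT4O_OUTPUT):.4f}")  # $0.0250
```

Pennies per call either way, but the multiplier adds up fast at scale, and o1’s longer “thinking” also tends to generate more output tokens per answer.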
ChatGPT Plus and Team users reportedly already have access to an initial version of the bot, known as o1-preview. A version called o1-mini will eventually be made available for free, although OpenAI has not provided a date.
In sum, a bot capable of more reliable answers, and more practical reasoning, seems like a step toward something more useful in the real world, while also edging closer to general, human-like intelligence.
That’s exactly what OpenAI is planning. “We’ve spent months working on reasoning because we think it’s a game-changing breakthrough,” says Bob McGrew, OpenAI’s director of research. “It’s basically a new modality for models to be able to solve the really hard problems that are needed to get to near-human intelligence.”
Anyway, if it can actually count to three, I’m impressed. And as a routine precaution, needless to say, I, for one, welcome our new chatbot overlords.