Ilya Sutskever, co-founder of OpenAI, believes that existing approaches to scaling large language models have plateaued. To make meaningful progress in the future, AI labs will need to train smarter, not just bigger, and models will need to think a little longer.
Speaking to Reuters, Sutskever explained that the pre-training phase of large language models like those behind ChatGPT is reaching its limits. Pre-training is the initial phase during which massive amounts of unlabeled data are processed to build language patterns and structures in the model.
Until recently, adding scale, that is, increasing the amount of data available for training, was enough to create a more powerful and capable model. But this is no longer the case; what matters more now is what you train the model on, and how.
“The 2010s were the age of scaling; now we are back in the age of wonder and discovery. Everyone is looking for the next thing,” Sutskever believes. “Scaling the right thing matters more now than ever.”
The backdrop is the increasingly clear difficulty AI labs face in delivering significant advances in power and performance over models on the level of GPT-4.
The short version of this narrative is that everyone now has access to the same, or at least similar, training data from readily available online sources. You can no longer gain an advantage simply by throwing more raw data at the problem. Put very simply, training smarter, not just harder, is what will now give AI teams an edge.
Another lever for LLM performance comes at the other end of the process, once models are fully trained and users have access to them: a step called inference.
Here, the idea is to use a multi-step approach to solving problems and queries, where the model can receive feedback, leading to more human-like reasoning and decision-making.
“We found that having a bot think for just 20 seconds during a hand of poker provided the same performance gain as scaling up the model 100,000 times and training it 100,000 times longer,” says Noam Brown, an OpenAI researcher who worked on the new o1 model.
In other words, if bots take longer to think rather than just blurting out the first thing that comes to mind, they can deliver better results. If this proves to be a productive approach, the AI hardware industry could shift away from massive training clusters toward banks of GPUs focused on inference.
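The trade-off Brown describes can be illustrated with a toy best-of-N search, where a larger inference budget (more candidate answers generated and scored) stands in for "thinking longer." This is a simplified sketch of the general idea of spending more compute at inference time, not how o1 actually works; the scoring function and budget numbers here are invented for illustration.

```python
import random

def answer_quality(candidate: float) -> float:
    """Toy scoring function: the closer a candidate answer is to the
    (pretend) true optimum of 0.7, the higher its quality."""
    return -abs(candidate - 0.7)

def solve(think_budget: int, seed: int = 0) -> float:
    """Spend `think_budget` inference steps proposing candidate answers
    and keep the best-scoring one (best-of-N sampling)."""
    rng = random.Random(seed)
    best = rng.random()
    for _ in range(think_budget - 1):
        candidate = rng.random()
        if answer_quality(candidate) > answer_quality(best):
            best = candidate
    return best

# With the same seed, a larger budget explores a superset of candidates,
# so the returned answer can only get better as the budget grows.
quick = solve(think_budget=2)
slow = solve(think_budget=200)
print(answer_quality(slow) >= answer_quality(quick))  # → True
```

The same principle underlies inference-time scaling more broadly: instead of baking all capability into a bigger pre-trained model, compute is spent per query, generating and evaluating intermediate steps before committing to an answer.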
Of course, either way, Nvidia will probably be happy to take everyone’s money. The increase in demand for AI GPUs for inference is indeed something that Nvidia CEO Jensen Huang has recently noted.
It’s unclear how long it will take for a generation of smarter bots to emerge through these methods. But the effort will likely show up in Nvidia’s bank balance soon.