Google DeepMind, Google’s AI research lab, has published new research on training AI models that claims to greatly improve both training speed and energy efficiency, delivering 13 times faster performance and 10 times higher energy efficiency than other methods. The new training method is timely, as conversations about the environmental impact of AI data centers heat up.
DeepMind’s method, called JEST, or joint example selection, breaks from traditional AI model training techniques in a simple way. Typical training methods focus on individual data points for training and learning, while JEST trains on entire batches. The JEST method first creates a smaller AI model that grades data quality based on extremely high-quality sources, ranking batches by quality. It then compares that grading to a larger, lower-quality set of data. The small JEST model determines the batches most suitable for training, and the large model is then trained on the batches the smaller model selected.
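The batch-ranking idea described above can be illustrated with a toy sketch. This is not DeepMind's implementation: the `reference_score` function, the `quality` field, and the batch sizes are all hypothetical stand-ins, and the paper's actual scoring is more involved than a simple average. The sketch only shows the general shape of the process, in which a small model scores candidate batches and only the best ones are passed on for training the large model.

```python
import random

def reference_score(example):
    # Hypothetical stand-in for the small reference model: a higher value
    # means the example looks more like the curated, high-quality data.
    return example["quality"]

def mean_score(batch, score_fn):
    """Average score of one batch under the given scoring function."""
    return sum(score_fn(ex) for ex in batch) / len(batch)

def select_batches(pool, batch_size, keep, score_fn, seed=0):
    """Split a large pool into full batches and keep the top-scoring ones."""
    rng = random.Random(seed)
    shuffled = pool[:]
    rng.shuffle(shuffled)
    # Only full batches are formed; leftover examples are dropped.
    batches = [
        shuffled[i:i + batch_size]
        for i in range(0, len(shuffled) - batch_size + 1, batch_size)
    ]
    # Rank batches from highest to lowest mean score.
    ranked = sorted(batches, key=lambda b: mean_score(b, score_fn), reverse=True)
    return ranked[:keep]

# Toy "large, lower-quality" pool with made-up quality values (assumption).
pool = [{"id": i, "quality": random.Random(i).random()} for i in range(64)]
selected = select_batches(pool, batch_size=8, keep=2, score_fn=reference_score)
```

In this sketch, `selected` holds the two batches the small model rates highest; a large model would then be trained on those batches rather than on the whole pool.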
The paper, available here, provides a more detailed explanation of the processes used in the study and the future of the research.
The DeepMind researchers make clear in their paper that this “ability to steer the data selection process towards the distribution of smaller, well-curated datasets” is essential to the success of the JEST method. Success is the right word for this research; DeepMind claims that “our approach surpasses state-of-the-art models with up to 13× fewer iterations and 10× less computation.”
Of course, this system relies entirely on the quality of its training data, as the bootstrapping technique falls apart without a human-curated dataset of the highest possible quality. Nowhere is the mantra of “garbage in, garbage out” truer than in this method, which attempts to “skip ahead” in the training process. This makes the JEST method much harder for hobbyists or amateur AI developers to match than most other methods, as expert-level research skills are likely required to curate the initial highest-quality training data.
The study comes not a moment too soon, as the tech industry and world governments begin to discuss AI’s extreme power demands. AI workloads drew an estimated 4.3 GW in 2023, nearly matching the annual power consumption of Cyprus. And they’re not slowing down: a single ChatGPT request costs 10 times more power than a Google search, and Arm’s CEO estimates AI will take up a quarter of the US power grid by 2030.
Whether and how the JEST method will be adopted by major players in the AI space remains to be seen. GPT-4o reportedly cost $100 million to train, and future larger models could soon reach $1 billion, so companies are likely looking for ways to save money in this department. Optimists hope the JEST method will be used to maintain current training productivity rates at a much lower power draw, which would reduce the cost of AI and benefit the planet. However, it is much more likely that the machine of capital will keep the pedal to the metal, using the JEST method to keep power draw at its maximum in order to achieve hyper-fast training results. Cost savings versus scale of results: which will win?