This is a little earlier than I expected, but completely in line with where I thought things were headed. Like I wrote over the holidays, there is no more low-hanging fruit. But smarter (and massively cheaper) training strategies seem to yield “nearly as good” results as turning datacenters into massively expensive power sinks, so no wonder the market is jittery.
After all, DeepSeek-R1 (which I have been playing with in quantized form) makes it plain to everyone that we’ve been essentially brute-forcing the entire thing, regardless of any quibbles there still might be over their actual training cost (because someone had to be that person and try to cast doubt).
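(If you want to try this yourself, here is a minimal sketch of one way to do it, using the ollama Python client; this assumes you have ollama running and have already pulled a quantized R1 distillation, and the model tag below is purely illustrative.)

```python
# Minimal sketch: chat with a locally pulled, quantized DeepSeek-R1 distillation
# via the ollama Python client (pip install ollama; ollama must be running).
import ollama

response = ollama.chat(
    model="deepseek-r1:8b",  # illustrative tag; pick a quantized size that fits your hardware
    messages=[
        {"role": "user", "content": "Summarize why distillation lets small models punch above their weight."}
    ],
)

# The reply (including the model's <think> scratchpad, if present) is in message.content.
print(response["message"]["content"])
```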
But it’s also plain that the market hasn’t really bought into AI, and my personal take is that the uncertainty around the new U.S. administration’s tariff policies and a pretty insane week didn’t help; otherwise people would have realized that companies like ASML will always make money in today’s economy, AI or not.
While it’s easy to panic about a potential shift in the AI landscape, the reality is that U.S. firms still hold significant advantages in terms of access to advanced chips and infrastructure. And even though I think that OpenAI has been somewhat resting on its laurels, I see no reason (other than their perennial lack of focus) for them not to improve upon what DeepSeek did, and for those techniques to become merely the first step in a wave of research into even greater efficiency gains.
After all, a lot of current AI research excels at copying good ideas rather than finding better ones, so I expect a torrent of arXiv papers to follow suit.
Update: Ege Erdil wrote up a nice walkthrough of the techniques used, and Ben Thompson has a great overview of why it matters.