State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
Full Transcript
- The following is a conversation all about the state of the art in artificial
intelligence, including some of the exciting technical breakthroughs and
developments in AI that happened over the past year, and
some of the interesting things we think might happen this upcoming
year. At times, it does get super technical,
but we do try to make sure that it remains accessible to folks
outside the field without ever dumbing it down. It
is a great honor and pleasure to be able to do this kind of
episode with two of my favorite people in the AI
community, Sebastian Raschka and Nathan
Lambert. They are both widely respected machine
learning researchers and engineers who also happen to be great
communicators, educators, writers, and X posters.
Sebastian is the author of two books
I highly recommend for beginners and experts alike: the first is
Build a Large Language Model from Scratch,
and the second is Build a Reasoning Model from Scratch. I
truly believe that in the machine learning world, the
best way to learn and understand something is to build it
yourself from scratch. Nathan is
the post-training lead at the Allen Institute for AI
and the author of the definitive book on Reinforcement Learning from Human Feedback.
Both of them have great X accounts, great Substacks.
Sebastian has courses on YouTube, Nathan has a podcast.
And everyone should absolutely follow all of those.
This is the Lex Fridman podcast. To support it, please
check out our sponsors in the description, where you can also find
links to contact me, ask questions, get feedback, and so
on. And now, dear friends, here's Sebastian Raschka and Nathan Lambert.
So I think one useful lens to look at all this through is
the so-called DeepSeek moment. This happened about
a year ago, in January 2025, when the Chinese
company DeepSeek released the open-weight DeepSeek R1, which, I
think it's fair to say, surprised everyone with near-state-of-the-art
performance, allegedly achieved with much less compute at much lower cost. And from then
to today, the AI competition has gotten insane,
both on the research and product level. It's just been accelerating. We'll
discuss all of this today, and maybe let's start with some spicy
questions if we can.
Who's winning at the international level? Would you say it's the set
of companies in China or the set of companies in the United States?
And Sebastian, Nathan, it's good to see you guys.
So Sebastian, who do you think is winning?
- Winning is a very broad term.
You mentioned the DeepSeek moment, and I would say DeepSeek is winning
the hearts of the people who work on open-weight models because they share
these as open models. Winning, I think, has multiple
timescales to it. We have today, we have next year, we have 10
years from now. One thing I know for sure is that I don't
think that nowadays, in 2026, there will be any
company that has access to technology that no other
company has access to. That is mainly because researchers
are frequently changing jobs and labs.
They rotate. I don't think there will be a clear winner in terms of
technology access. However, I do think
the differentiating factor will be budget and hardware constraints.
It's not the ideas that will be proprietary,
but rather the resources needed to implement them. I don't
currently see a winner-take-all scenario, at least not at the moment.
- Nathan, what do you think?
- You see the labs put different energy into what they're trying to do, and
I think, to mark the point in time when we're recording this, the hype
over Anthropic's Claude Opus 4.5 model has been
absolutely insane. I mean, I've used it and built stuff with it
in the last few weeks, and it's almost gotten to the point where it feels like a bit of
a meme in terms of the hype. And it's
kind of funny because this is very organic. If we go back a few months,
Gemini 3 from Google got released, and it seemed like the
marketing and the sheer wow factor of that release were super
high. But then at the end of November, Claude Opus 4.5 was released, and
the hype has been growing. Gemini 3 came before this, and it kind of feels
like people don't really talk about it as much, even though when it came out, everybody was like, this
is Gemini's moment to reclaim Google's
structural advantages in AI. And Gemini 3 is a fantastic model, and I still use it.
It's just that its differentiation is lower. And I
agree with what Sebastian is saying: the idea space is
very fluid. But culturally, Anthropic is known for betting very
hard on code, and that bet, Claude Code, is working out for them right now. So I
think that even if the ideas flow pretty freely, so much of this is
bottlenecked by human effort and the culture of organizations, and Anthropic
seems at least to be presenting as the least chaotic. That's a
bit of an advantage if they can keep it up for a while. But on the other
side of things, there's a lot of ominous technology from China, where
there are way more labs than DeepSeek. DeepSeek kicked off
a movement within China, I'd say kind of similar to how
ChatGPT kicked off a movement in the US where everything had a chatbot. There are now
tons of tech companies in China releasing very strong frontier open-weight
models, to the point where I would say that DeepSeek is kind of losing its crown as the
preeminent open model maker in China, and the likes of
Z.ai with their GLM models, MiniMax's models,
and Moonshot's Kimi have shone more brightly, especially in the last few months.
The new DeepSeek models are still very strong, but we could
look back on this as a big narrative point: in 2025,
DeepSeek came along and provided a platform for many more Chinese
companies releasing fantastic models to adopt this new
type of operation. These models from Chinese companies are open-weight, and
depending on the trajectory of the business models the American companies are
pursuing, those American companies could be at risk. But currently, a lot of people are paying
for AI software in the US, whereas historically, in China and other
parts of the world, people don't pay a lot for software.
- So some of these models like DeepSeek have the love of the people because
they are open-weight. How long do you think the Chinese companies will keep
releasing open-weight models?
- I would say for a few years. I think that, like in the US, there's not a
clear business model for it. I have been writing about open models for a while,
and these Chinese companies have noticed, so I get inbound messages from some of them.
They're smart and recognize the same constraints: a lot of top US tech
companies and other IT companies won't pay for an API subscription to
Chinese companies because of security concerns. This has been a long-standing
habit in tech, so the people at these companies see open-weight
models as a way to gain influence and take part in a huge, growing
AI expenditure market in the US. They're very realistic about this,
and it's working for them. I think the government will see that this is
building a lot of influence internationally in terms of uptake of the technology,
so there are going to be a lot of incentives to keep it going. But building
these models and doing the research is very expensive, so at some point, I expect
consolidation. I just don't expect that to be the story of 2026: there will be
more open model builders throughout 2026 than there were in 2025, and a
lot of the notable ones will be in China.
- You were going to say something?
- Yes. You mentioned DeepSeek losing its crown. I do think that's true to some extent, but
we also have to consider that they are still, I would say, slightly ahead. And
it's not that DeepSeek got worse; it's just that the other labs
are using the ideas from DeepSeek. For example, you mentioned Kimi: same
architecture, and they're training it. And then again, we have this leapfrogging,
where one of them might at some point be a bit better because it has the more recent
model. And I think this comes back to the fact that there won't be
a clear winner. It will just be like that: one lab releases
something, the next one comes in, and the most recent model is probably always the
best model.
- Yeah. We'll also see that the Chinese companies have different incentives. Like,
DeepSeek is very secretive, whereas some of these startups,
like the MiniMaxs and Z.ais of the world, are not. Those two have literally filed
IPO paperwork, and they're trying to get Western
mindshare and do a lot of outreach there. So I don't know if these incentives will change
model development, because DeepSeek is famously backed by a hedge fund,
High-Flyer Capital, and we don't know exactly what they use the
models for or if they care about this.
- They're secretive in terms of communication, but not in terms of the technical reports that
describe how their models work. They're still open on that front. And we should also
say, on the Claude Opus 4.5 hype, that there's a difference between something
being the darling of the X echo chamber, the
Twitter echo chamber, and the actual number of people using the
model. I think it's probably fair to say that ChatGPT and
Gemini are focused on the broad user base that just
wants to solve problems in their daily lives, and that user base
is gigantic. So the hype about coding may not be
representative of actual use.
- I would say also a lot of the usage patterns come down to,
like you said, name recognition, brand and stuff, but also
almost muscle memory, where, you know, ChatGPT has been around
for a long time. People just got used to using it, and it's almost like a flywheel:
they recommend it to other users and so on. One interesting point is also
the customization of LLMs. For example, ChatGPT has a
memory feature, right? So you may have a subscription that you
use for personal stuff, but I don't know if you want to use that same thing at work,
because there's a boundary between private and work. If you're working at a company, they might not
allow that, or you may not want that. And I think that's also an interesting point:
you might have multiple subscriptions. One is completely clean;
it has none of your personal images or hobby
projects in there. It's just the work thing. And then the other one is your personal thing.
So I think there are two different use cases, and it doesn't mean
you only have to have one. I think the future is also multiple subscriptions.
- What model do you think won 2025, and what model do you think is going to win '26?
- I think in the context of consumer chatbots, it's a question of: are you willing to
bet on Gemini over ChatGPT?
In my gut, that feels like a bit of a risky bet,
because OpenAI has been the incumbent, and there are so many benefits to that in tech.
I think the
momentum, if you look at 2025, was on Gemini's side, but they were starting from
such a low point. RIP Bard and those earlier attempts at getting started.
Huge credit to them for powering through the organizational chaos to make that happen.
But it's also hard to bet against OpenAI, because even though they always come off as
so chaotic, they're very good at landing things. And
personally, I have very mixed reviews of GPT-5, but it must have
saved them so much money, with the headline feature being a router, so that
most users no longer run up as much GPU cost.
So I think it's very hard to separate
the things that I like in models from the things that are going to
actually be a differentiator for the general public.
- What do you think about 2026? Who's going to win?
- I'll say something, even though it's risky: I think Gemini will continue to gain ground on ChatGPT.
I think Google's scale matters when both of these are
operating at such extreme scales, and Google has the
ability to separate research and product a bit better, whereas you hear so much
about OpenAI being chaotic operationally and chasing the high-impact thing,
which is very much a startup culture. And then on the software and enterprise side,
I think Anthropic will have continued success, as they've again and again been set up for that.
Obviously Google Cloud has a lot of offerings,
but I think the Gemini name brand is important for them to build.
Google Cloud will continue to do well, but
that's a more complex thing to explain in the
ecosystem, because it's competing with the likes of Azure and AWS rather than
on the model provider side.
- So in infrastructure, you think TPUs are giving Google an advantage?
- Largely because the margin on NVIDIA chips is insane, and
Google can develop everything from top to bottom to fit their stack and not have
to pay this margin. And they've had a head start in building data
centers. So all of these things that have both high lead times and very hard margins on