Jensen Huang: NVIDIA - The $4 Trillion Company & the AI Revolution | Lex Fridman Podcast #494
FULL TRANSCRIPT
- The following is a conversation with Jensen Huang,
CEO of NVIDIA, one of the most important and influential
companies in the history of human civilization.
NVIDIA is the engine powering the AI revolution, and a lot
of its success can be directly attributed to Jensen's sheer force of
will and his many brilliant bets and decisions as a leader, engineer, and innovator.
This is Lex Fridman Podcast. And now dear friends, here's Jensen Huang.
You've propelled NVIDIA into a new era in
AI, moving beyond its focus on chip scale design to now rack scale
design. And I think it's fair to say that winning for NVIDIA for a
long time used to be about building the best GPU possible, and you still
do, but now you've expanded that to extreme co-design of
GPU, CPU, memory, networking, storage, power, cooling,
software, the rack itself, the pod that you've announced, and even the
data center. So let's talk about extreme co-design. What is the
hardest part of co-designing a system with that many complex
components and design variables?
- Yeah, thanks for that question.
So first of all, the reason why extreme co-design is necessary is because the
problem no longer fits inside one computer to be accelerated by one GPU.
The problem that you're trying to solve is you would like to go faster
than the number of computers that you add. So you added,
you know, 10,000 computers, but you would like it to go a million times
faster. Then all of a sudden you have to take the algorithm,
you have to break up the algorithm, you have to refactor it,
you have to shard the pipeline, you have to shard the data, you have to shard the
model. Now all of a sudden when you distribute the problem this way,
not just scaling up the problem, but you're distributing the problem,
then everything gets in the way. This is the Amdahl's Law
problem where the amount of speed up you have for something
depends on how much of the total workload it is. And so
if computation represents 50% of the problem, and I sped up computation infinitely
like a million times,
you know, I only sped up the total workload by a factor of two.
Now all of a sudden, not only do you have to distribute a
computation, you have to, you know, shard the pipeline somehow.
You also have to solve the networking problem
because all of these computers are connected together.
And so distributed computing at the scale that we do,
the CPU is a problem, the GPU is a problem, the networking is a problem, the
switching is a problem. And distributing the workload across all
these computers is a problem.
It's just a massively complex computer science problem. And so we just
gotta bring every technology to bear. Otherwise, we scale up linearly or we
scale up based on the capabilities of Moore's Law, which has
largely slowed because Dennard scaling has slowed.
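Editorial illustration of the Amdahl's Law point in the answer above, offered as a minimal sketch rather than anything stated in the conversation. It assumes the standard form of the law, overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of the workload being accelerated and s is the speedup applied to that fraction. The function name `amdahl_speedup` and its parameter names are hypothetical.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the runtime is sped up by `factor`.

    Assumes the standard statement of Amdahl's Law; names are illustrative only.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # The part of the workload that is not accelerated still takes its full time.
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / factor)


if __name__ == "__main__":
    # Computation is 50% of the workload, as in the example from the conversation.
    for factor in (2, 10, 1_000_000):
        total = amdahl_speedup(0.5, factor)
        print(f"compute x{factor:>9,}: total x{total:.4f}")
```

With computation at 50% of the workload, even a million-fold speedup of that half leaves the total speedup just under a factor of two, which is why the remaining half (networking, switching, data movement) has to be attacked as well.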
- I'm sure there are trade-offs there. Plus you have completely disparate
disciplines here. I'm sure you have specialists in each one of these: high-bandwidth
memory, the network and the NVLink, the NICs, the optics and the
copper that you're doing, the power delivery, the cooling, all of that. I mean, there are world
experts in each of those. How do you get 'em in a room together to figure out-
- That's why my staff is so large.
- What's the process? Can you take me through the process of the specialists and the
generalists? Like, how do you put together the rack when you know the
set of things you have to shove into a rack together?
What does that process look like, of designing it all together?
- Yeah. There's the first question, which is: what is extreme co-design?
We're optimizing across the entire stack of software,
from architectures to chips, to systems, to system software, to the
algorithms, to the applications. That's one layer. The second thing, which you and
I just talked about, goes beyond CPUs and GPUs and
networking chips and scale-up switches and scale-out switches.
And then of course, you gotta include power and cooling and all
of that because, you know, all these computers are extremely,
extremely power hungry. They do a lot of work and they're very
energy efficient, but they in aggregate still consume a lot of
power. And so that's one. The first question is, what is it?
The second question is, why is it? And we just spoke about the reason: you
want to distribute the workload so that you can exceed
the benefit of just increasing the number of computers.
And then the third question is, how do you do it?
And that's kind of the miracle of this
company. You know, when you're designing a computer, you have to have an operating system for
computers. When you're designing a company,
you should first think about what it is that you want the company to produce. You know, I see
a lot of companies' organization charts, and they all look the same.
Hamburger organization charts, software organization charts, and car
company organization charts. They all look the same.
And it doesn't make any sense to me. You know, the goal of a
company is to be the machinery, the mechanism, the system that produces
the output. And that output is the product that we'd like to create.
Also, the architecture of the company should reflect
the environment in which it exists. It almost indirectly
says what you should do with the organization. My direct staff is 60 people.
You know, I don't have one-on-ones with 'em because it's impossible. You can't have 60 people
on your staff if you're, you know, gonna get work done and-
- So you still have 60 reports. You still have across-
- More, yeah.
- More. And most of them at least have a foot in engineering.
- Almost all of them.
There's experts in memory, there's experts in CPUs, there's experts in
optical. All, all—
- That's incredible.
- Yeah, GPUs and— Architecture, algorithms, design, um—
- So, you constantly have an eye on the entire stack, and you're having, like,
intense discussions about the design of the entire stack?
- And no conversation is ever one person. That's why I don't do
one-on-ones. We present a problem
and all of us attack it. You know, because we're doing extreme co-design.
And literally, the company is doing extreme co-design all the time.
- So, even if you're talking about a particular component, like cooling,
networking, everybody's listening in?
- Yeah, exactly.
- And they can contribute, "Well, this doesn't work for the power distribution.
This doesn't-"
- Exactly.
- "... This doesn't work for the memory. This doesn't work for this."
- Exactly. And whoever wants to tune out, tune out. You know what I'm saying?
And the reason for that is because the people who are on the staff, they know
when to pay attention. And if there's something they could have
contributed to and they didn't contribute, I'm going to call them out. You know?
And so, "Hey, come on, let's get in here."
- So, as you mentioned, NVIDIA is this company that's adapting to the environment.
So, at which point would you say the environment changed and
NVIDIA began adapting, sort of secretly,
from the early days of GPUs for gaming, maybe the
early deep learning revolution, to "we're now going to start thinking of it as an AI
factory"? What does NVIDIA do? It produces AI, so let's build a factory that makes AI.
- I could reason through that just systematically.
We started out as a, as an accelerator company.
But the problem with accelerators is that the application domain's too narrow.
It has the benefit of being incredibly optimized for the
job. You know, any specialist has that benefit. The problem with
intense specialization is that, of course, your market reach is narrower,
but that's, that's even fine. The problem is, the market size also
dictates your R&D capacity. And your R&D capacity ultimately dictates
the influence and impact that you can possibly have in computing. And so,
when we first started out as an accelerator, very specific accelerator,
we always knew that was going to be our first step. We had to find a
way to become accelerated computing. But the problem is, when you become a
computing company,
it's too general purpose and it takes away from your specialization. I
connected two words that actually
have a fundamental tension. The better a computing company we become, the
worse we become as a specialist. The more of a specialist, the less
capacity we have to do overall computing. And so
I connected those two words together on purpose: the company
has to find that really narrow path, step by step by step, to expand our
aperture of computing, but not give up on the most important
specialization that we had. Okay, so the first step that we took beyond
acceleration was, we invented a programmable pixel shader.
So, that was the first step towards programmability. You know, it was our
first journey towards moving into the world of computing. The second thing that we
did was we put
FP32 into our shaders. That FP32 step, IEEE-compatible FP32, was a huge step
in the direction of computing. It was the reason why all of the people who were
working on stream processors and, you know,
other types of data flow processors discovered us. And they said, "Hey, all of
a sudden, you know, we might be able to use this GPU that's incredibly computationally
intensive, and it's now, you know, compliant with IEEE.
I can take my software that I was writing, you know, previously on
CPUs, and I can, you know, see about using the GPU for that."
That led us to put C on top of
FP32, which we call Cg. The Cg path
eventually took us to CUDA, step by step by step.
Putting CUDA on GeForce, that was
a strategic decision that was very, very hard to do, because it cost the
company enormous amounts of our profits, and we couldn't afford it
at the time. But we did it anyways because we wanted to be a computing
company. A computing company has a computing
architecture. A computing architecture has to be compatible across all of
the chips that we build.
- Can you take me through that decision? So, putting CUDA on GeForce when you could not afford to
do it: why boldly choose to do that anyway?
- Yeah, excellent. I would say that that was
the first strategic decision that was as close to an existential threat-
- For people who don't know, it turned out to be, spoiler alert,
one of the most incredibly brilliant decisions ever made
by a company. So, CUDA turned out to be
an incredible foundation for computation in this AI infrastructure world. So-
- Thank you
- ... just setting the context. It turned out to be a good decision.
- Yeah, it turned out to have been a good decision. So, here's the way it
went. We invented this thing called CUDA, and
it expanded the aperture of applications
that we can accelerate with our accelerator. The question is, how do we
attract developers to CUDA?
Because a computing platform is all about developers. And developers
don't come to a computing platform just because, you know, it
could perform something interesting. They come to a computing platform because the
install base is large.
Because a developer, like anybody else, wants to develop software that
reaches a lot of people. So, the install base is, in fact,
the single most important part of an architecture. The
architecture could attract enormous amounts of criticism. For example,
no architecture has ever attracted more criticism than the
x86, you know, as a less-than-elegant architecture, but yet it is the
defining architecture of today. That gives you an example. And in fact,
so many RISC architectures, which were beautifully architected,
incredibly well-designed by some of the brightest computer scientists in the
world, largely failed. And so I've given you two
examples where, you know, one is elegant, the other one's
barely aesthetic, and yet x86 survived and the reason for-
- Install base is everything.
- Install base defines an architecture. Everything else
is secondary, okay? And so there were other architectures at the time.
CUDA came out, OpenCL was here. You know, there were several other competing
architectures. But the decision that we made that was good
was we said, "Hey, look, ultimately it's about,