Jensen Huang: NVIDIA - The $4 Trillion Company & the AI Revolution | Lex Fridman Podcast #494

2h 25m 55s | 23,098 words | 2,110 segments | English

FULL TRANSCRIPT

0:00

- The following is a conversation with Jensen Huang,

0:03

CEO of NVIDIA, one of the most important and influential

0:07

companies in the history of human civilization.

0:11

NVIDIA is the engine powering the AI revolution, and a lot

0:15

of its success can be directly attributed to Jensen's sheer force of

0:19

will and his many brilliant bets and decisions as a leader, engineer, and innovator.

0:26

This is the Lex Fridman Podcast. And now, dear friends, here's Jensen Huang.

0:33

You've propelled NVIDIA into a new era in

0:37

AI, moving beyond its focus on chip scale design to now rack scale

0:41

design. And I think it's fair to say that winning for NVIDIA for a

0:45

long time used to be about building the best GPU possible, and you still

0:49

do, but now you've expanded that to extreme co-design of

0:53

GPU, CPU, memory, networking, storage, power, cooling,

0:57

software, the rack itself, the pod that you've announced, and even the

1:01

data center. So let's talk about extreme co-design. What is the

1:05

hardest part of co-designing a system with that many complex

1:08

components and design variables?

1:11

- Yeah, thanks for that question.

1:12

So first of all, the reason why extreme co-design is necessary is because the

1:16

problem no longer fits inside one computer to be accelerated by one GPU.

1:24

The problem that you're trying to solve is you would like to go faster

1:28

than the number of computers that you add. So you added

1:32

you know, 10,000 computers, but you would like it to go a million times

1:36

faster. Then all of a sudden you have to take the algorithm,

1:43

you have to break up the algorithm, you have to refactor it,

1:46

you have to shard the pipeline, you have to shard the data, you have to shard the

1:50

model. Now all of a sudden when you distribute the problem this way,

1:56

not just scaling up the problem, but you're distributing the problem,

2:00

then everything gets in the way. This is the Amdahl's Law

2:04

problem where the amount of speed up you have for something

2:09

depends on how much of the total workload it is. And so

2:13

if computation represents 50% of the problem, and I sped up computation infinitely

2:21

like a million times,

2:23

you know, I only sped up the total workload by a factor of two.
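
Editorial aside: the arithmetic in this answer can be checked with a minimal sketch of Amdahl's Law. The function name `amdahl_speedup` and its parameters are illustrative assumptions, not anything stated in the conversation. The formula is the standard one: overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of the workload being accelerated and s is how much faster that fraction runs.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the work runs `factor` times faster.

    Hypothetical helper for illustration; implements the standard Amdahl's Law formula.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    # The part of the workload that is not accelerated runs at its original speed.
    untouched_fraction = 1.0 - accelerated_fraction
    return 1.0 / (untouched_fraction + accelerated_fraction / factor)


if __name__ == "__main__":
    # The example from the conversation: computation is 50% of the problem
    # and is sped up a million times.
    print(amdahl_speedup(0.5, 1_000_000))
```

With half the workload untouched, the result stays just under 2 no matter how large the factor gets, which matches the "factor of two" in the answer above.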

2:27

Now all of a sudden, not only do you have to distribute a

2:30

computation, you have to, you know, shard the pipeline somehow.

2:34

You also have to solve the networking problem

2:38

because you've got all of these computers all connected together.

2:42

And so distributed computing at the scale that we do,

2:47

the CPU is a problem, the GPU is a problem, the networking is a problem, the

2:51

switching is a problem. And distributing the workload across all

2:55

these computers is a problem.
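
Editorial aside: one way to see why the network becomes a bottleneck is a toy cost model. Everything here is an assumption made for illustration: the function names, the idea that compute divides perfectly across workers, the idea that communication is a fixed cost per step, and the example numbers. It is a sketch of the general effect, not a description of how NVIDIA models its systems.

```python
def step_time(workers: int, compute_seconds: float, comm_seconds: float) -> float:
    """Toy model: compute divides evenly across workers, communication does not shrink."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # A single worker has nobody to talk to, so it pays no communication cost.
    comm = comm_seconds if workers > 1 else 0.0
    return compute_seconds / workers + comm


def speedup(workers: int, compute_seconds: float, comm_seconds: float) -> float:
    """Speedup of `workers` machines relative to a single machine under the toy model."""
    baseline = step_time(1, compute_seconds, comm_seconds)
    return baseline / step_time(workers, compute_seconds, comm_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: 100 s of compute per step, 0.1 s of communication per step.
    for n in (1, 10, 100, 1_000, 10_000):
        print(n, round(speedup(n, 100.0, 0.1), 1))
```

Under these assumed numbers the fixed communication term caps the speedup near `compute_seconds / comm_seconds`, so adding machines alone stops helping well before 10,000 workers.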

2:57

It's just a massively complex computer science problem. And so we just

3:00

gotta bring every technology to bear. Otherwise, we scale up linearly or we

3:09

scale up based on the capabilities of Moore's Law, which has

3:13

largely slowed because Dennard scaling has slowed.

3:16

- I'm sure there's trade-offs there. Plus you have completely disparate

3:20

disciplines here. I'm sure you have specialists in each one of these: high-bandwidth

3:24

memory, the network and the NVLink, the NICs, the optics and the

3:28

copper that you're doing, the power delivery, the cooling, all of that. I mean, there's like world

3:31

experts in each of those. How do you get 'em in a room together to figure out-

3:34

- That's why my staff is so large.

3:37

- What's the process—can you take me through the process of the specialists and the

3:41

generalists? Like how do you put together the rack when you know the

3:45

s- the set of things you have to shove into a rack together?

3:48

Like what does that process look like of designing it all together?

3:51

- Yeah. There's the first question, which is: what is extreme co-design?

3:55

You're, you, we're optimizing across the entire stack of software

3:59

from architectures to chips, to systems, to system software, to the

4:02

algorithms, to the applications. That's one layer. The second thing that you and

4:06

I just talked about goes beyond CPUs and GPUs and

4:10

networking chips and scale-up switches and scale-out switches.

4:15

And then of course, you gotta include power and cooling and all

4:19

of that because, you know, all these computers are extremely,

4:23

extremely power hungry. They do a lot of work and they're very

4:27

energy efficient, but they in aggregate still consume a lot of

4:31

power. And so that's one. The first question is, what is it?

4:34

The second question is, why is it, and we just spoke about the reason, you

4:38

know, you want to distribute the workload so that you can exceed

4:42

the benefit of just increasing the number of computers.

4:47

And then the third question is, how is it, how do you do it?

4:51

And that's the, that's kind of the miracle of this

4:55

company. You know, when you're designing a computer, you have to have operating system of

4:59

computers. When you're designing a company,

5:02

you should first think about what is it that you want the company to produce. You know, I see

5:06

a lot of companies' organization charts, and they all look the same.

5:09

Hamburger organization charts, software organization charts, and car

5:13

company organization charts. They all look the same.

5:16

And it doesn't make any sense to me. You know, the goal of a company is to be the

5:20

machinery, the mechanism, the system that produces

5:25

the output. And that output is the product that we like to create.

5:29

It is also designed, the architecture of the company should reflect

5:33

the environment by which it exists. It almost indirectly

5:37

says what you should do with the organization. My direct staff is 60 people.

5:43

You know, I don't have one-on-ones with 'em because it's impossible. You can't have, you can't have 60 people

5:47

on your staff if you're, you know, gonna get work done and-

5:51

- So you still have 60 reports. You still have across-

5:53

- More, yeah.

5:54

- More. And most of them at least have a foot in engineering.

5:59

- Almost all of them.

6:01

There's experts in memory, there's experts in CPUs, there's experts in

6:05

optical. All, all—

6:06

- That's incredible.

6:06

- Yeah, GPUs and— Architecture, algorithms, design, um—

6:11

- So, you constantly have an eye on the entire stack, and you're having to, like,

6:15

intense discussions about the design of the entire stack?

6:18

- And no conversation is ever one person. That's why I don't do

6:22

one-on-ones. We present a problem

6:24

and all of us attack it. You know, because we're doing extreme co-design.

6:30

And literally, the company is doing extreme co-design all the time.

6:33

- So, even if you're talking about a particular component, like cooling,

6:38

networking, everybody's listening in?

6:40

- Yeah, exactly.

6:41

- And they can contribute, "Well, this doesn't work for the power distribution.

6:44

This doesn't-"

6:45

- Exactly.

6:45

- "... This doesn't work for the memory. This doesn't work for this."

6:49

- Exactly. And whoever wants to tune out, tune out. You know what I'm saying?

6:54

And the reason for that is because the people who are on the staff, they know

6:58

when to pay attention. They're supposed... You know, if it's something they could have

7:01

contributed to and they didn't contribute to, I'm going to call them out. You know?

7:04

And so, "Hey, come on, let's get in here."

7:07

- So, as you mentioned, NVIDIA is this company that's adapting to the environment.

7:11

So, at which point can you say, did the environment change and

7:15

began adapting sort of secretly-

7:19

... in the early days from GPU for gaming, maybe the

7:23

early deep learning revolution to we're now going to start thinking of it as an AI

7:27

factory? What does NVIDIA do? It produces AI, let's build a factory that makes AI.

7:32

- Uh, I could... I could reason through that just systematically.

7:35

We started out as a, as an accelerator company.

7:39

But the problem with accelerators is that the application domain's too narrow.

7:43

It has the benefit of being incredibly optimized for the

7:47

job. You know, any specialist has that benefit. The problem with

7:51

intense specialization is that, of course, your market reach is narrower,

7:57

but that's, that's even fine. The problem is, the market size also

8:03

dictates your R&D capacity. And your R&D capacity ultimately dictates

8:10

the influence and impact that you can possibly have in computing. And so,

8:14

when we first started out as an accelerator, very specific accelerator,

8:19

we always knew that had- that was going to be our first step. We had to find a

8:23

way to become accelerated computing. But the problem is, when you become a

8:27

computing company,

8:29

it's too general purpose and it takes away from your specialization. The

8:33

tur- I connected two words that actually

8:37

have fundamental tension. The better computing company we become, the

8:41

worse we became as a specialist. The more of a specialist, the less

8:45

capacity we have to do overall computing. And so,

8:49

that... And I connected those two words together on purpose, that the company

8:53

has to find that really narrow path, step by step by step, to expand our

9:01

aperture of computing, but not give up on the most important

9:05

specialization that we had. Okay, so the first step that we took beyond

9:09

acceleration was, we invented a programmable pixel shader.

9:13

So, that was the first step towards programmability. You know, it was our

9:17

first journey towards moving into the world of computing. The second thing that we

9:21

did was we created... we put

9:25

FP32 into our shaders. That FP32 step, IEEE-compatible FP32, was a huge step

9:33

in the direction of computing. It was the reason why all of the people who were

9:39

working on, on stream processors and, you know,

9:43

other types of data flow processors discovered us. And they said, "Hey, all of

9:47

a sudden, you know, we might be able to use this GPU that's incredibly computationally

9:51

intensive, and it's now, you know, compliant with IEEE.

9:55

I can take my software that I was writing, you know, previously on

9:58

CPUs, and I can, you know, see about, you know, using the GPU for that."

10:04

And which led us to create, put C on top of

10:08

FP32, what's called, we call Cg. The Cg path

10:12

took us to eventually CUDA. CUDA, step by step by step...

10:17

We... Well, putting CUDA on GeForce, that was

10:22

a strategic decision that was very, very hard to do, because it cost the

10:25

company enormous amounts of our profits, and we couldn't afford it

10:29

at the time. But we did it anyways because we wanted to be a computing

10:33

company. A computing company has a computing

10:37

architecture. A computing architecture has to be compatible across all of

10:41

the chips that we build.

10:42

- Can you take me through that decision? So, putting CUDA on GeForce, could not afford to

10:46

do? Can you explain that decision? Why, why boldly choose to do that anyway?

10:52

Can you explain that decision?

10:53

- Yeah, excellent. That was, that was the first... I would say that that was

10:57

the first strategic decision that is as close to an existential threat.

11:06

- For people who don't know, it turned out to be, spoiler alert,

11:10

one of the most incredibly brilliant decisions ever made

11:14

by a company. So, CUDA turned out to be

11:18

an incredible foundation for computation in this AI infrastructure world. So-

11:23

- Thank you

11:24

- ... just setting the context. It turned out to be a good decision.

11:27

- Yeah, it turned out to have been a good decision. I think the... So, here's the way it

11:31

went. So, we invented this thing called CUDA, and

11:35

it expanded the aperture of applications

11:38

that, that we can accelerate with our accelerator. The question is, how do we,

11:44

how do we attract developers to CUDA?

11:48

Because a computing platform is all about developers. And developers

11:54

don't come to a computing platform just because, you know, it

11:58

could perform something interesting. They come to a computing platform because the

12:02

install base is large.

12:04

Because a developer, like anybody else, wants to develop software that

12:07

reaches a lot of people. So, the install base is, in fact,

12:11

the single most important part of an architecture. The

12:14

architecture could attract enormous amounts of criticism. For example,

12:20

no architecture has ever attracted more criticism than the

12:23

x86... you know, as a less than elegant architecture, and yet it is the

12:31

defining architecture of today. It gives you an example that in fact

12:35

so many RISC architectures which were beautifully architected,

12:42

incredibly well-designed by some of the brightest computer scientists in the

12:46

world, largely failed. And so I've given you two

12:49

examples where one is, you know, one is elegant, the other one's

12:53

barely aesthetic, and so yet x86 survived and the reason for-

12:58

- Install base is everything.

12:59

- Install base defines an architecture. Not... Everything else

13:03

is secondary, okay? And so there were other architectures at the time.

13:07

CUDA came out, OpenCL was here. There were... You know, there's several other competing

13:11

architectures. But the thing that... The decision that we made that was good

13:15

was we said, "Hey, look, ultimately it's about,
