
Silicon to Systems: What powers Microsoft's AI Infrastructure

10m 0s · 1,625 words · 237 segments · English

FULL TRANSCRIPT

0:00

(gentle music)

0:04

- [Connor] Have you ever wondered what happens

0:06

the moment you ask Copilot a question?

0:09

We use AI every day

0:10

without ever seeing the infrastructure underneath.

0:14

What's behind the scenes of these tools we rely on?

0:17

And what does it actually take to make something like Teams

0:21

or Copilot feel instant, intelligent, and reliable?

0:26

The real story lies in the infrastructure,

0:29

the part that no one ever sees,

0:31

the technology that makes AI possible

0:33

at this incredible scale.

0:36

In the era of AI, we are rethinking everything.

0:39

New silicon, new cooling, new networking, new architectures.

0:44

This is a complete re-engineering of the stack.

0:49

Let's explore the hidden machinery

0:50

that supports the cloud and AI you use every day

0:54

and the work underway to rebuild every layer

0:58

to fuel the next frontier of AI innovation.

1:01

From silicon to systems.

1:03

(upbeat music)

1:13

I'm Connor Doyle,

1:14

and for the past seven years, when I think about it,

1:17

a lot of my time has been spent on calls.

1:19

And lately, AI has become a big part of how I get work done.

1:23

So today, I'm meeting with Pat Stemen,

1:26

one of the engineers

1:27

working on the fundamental building blocks of all of this

1:30

to help me better understand how infrastructure is evolving,

1:33

starting with one of our innovative server blades.

1:36

Wow. What are we looking at here?

1:38

What is this?

1:38

- This is our brand new Cobalt 200 server.

1:41

It's got our latest Cobalt 200 SoC in it,

1:44

the latest Azure Boost

1:46

and our storage and networking technologies.

1:49

And it's the foundation of Microsoft products

1:52

like Teams and Azure SQL, and others.

1:54

- So this looks a little bit different

1:56

from like my laptop at home-

1:58

- [Pat] Sure. - I might use, like...

1:59

- Yeah, it is different, but in many ways the same.

2:03

It's designed though for scale.

2:05

And it has some of the same components

2:07

your laptop might have.

2:09

So you might say this is the heartbeat,

2:12

this is the Cobalt 200 system on chip.

2:15

This is where our customers' workloads,

2:18

their virtual machines and all of their code,

2:19

run right here on the chip.

2:21

Inside the chip are 132 individual little CPUs.

2:26

- [Connor] Oh.

2:28

- And there's hardware

2:29

and software that allows us to take those CPUs

2:30

and then share them with our customers

2:32

in a secure and safe way.

2:34

What I appreciate the most about the server blade

2:37

is the efficient use of space.

2:39

We spent a lot of design energy

2:41

to make sure we could have two individual setups

2:45

in the blade,

2:46

and this, too, really allows for more power efficiency

2:48

and performance.

2:49

- I see.

2:50

But this is just one example

2:51

of how Microsoft's thinking about systems-level design.

2:54

We also have GPUs.

2:56

- That's right.

2:57

If we were to go down the datacenter together,

2:59

we would see blades that maybe look like this one

3:02

that are CPU-based.

3:03

If we were to go farther down the datacenter,

3:05

you would maybe see an AI accelerator rack or a GPU rack.

3:09

In fact, I brought something. I'll show it to you.

3:10

- Oh.

3:12

- Okay.

3:13

So this is what we call an accelerator module.

3:16

- Okay.

3:17

- And it's from our Maia product.

3:18

And in a Maia server,

3:20

we actually put four of them.

3:21

- Oh, four.

3:22

- Next to each other, all together.

3:24

And what's particularly noticeable

3:26

is it has an integrated set of liquid cooling pipes.

3:33

The amount of compute and power that it's consuming

3:35

for that AI workload is higher.

3:38

And so they've invested in this closed-loop liquid cooling

3:41

that allows coolant to come right down on top of the chip

3:44

and then back out and recycle.

3:46

- Is the previous infrastructure

3:47

not meeting the demands of today,

3:49

or what's the driving force behind this?

3:51

- We're really seeing quite the evolution

3:53

of customer workloads right now.

3:55

Technologies like Copilot are a great example.

3:57

Those scenarios are requiring so much more

3:59

out of the rest of our infrastructure,

4:01

more compute, more storage, more networking,

4:04

to really deliver on those experiences.

4:07

- So, Pat, this is the end result,

4:08

but I'd love to sit down and talk about

4:10

how this all comes together.

4:12

- All right.

4:12

(gentle music)

4:15

- Every customer that I'm speaking to is asking about AI.

4:18

What's next?

4:19

But before we get into the future of that,

4:21

let's go back to the beginning.

4:23

How did this custom silicon journey begin at Microsoft?

4:27

- I think it's important to remember,

4:28

Microsoft has always been a silicon company,

4:30

has been for a long time.

4:31

Look at early versions of Xbox and Surface.

4:35

it's true that something like a Cobalt or a Maia,

4:37

that's a much larger scale project

4:39

and it requires different kinds of talent

4:41

and capability and investment.

4:42

You have to take silicon engineering and server engineering

4:46

and software developers and our customers,

4:49

and put them together to get to the right design.

4:51

And it's thousands of people and years of engineering.

4:56

When we think about an infrastructure product,

4:57

it's more than just the chip

5:00

and the software that runs on it.

5:02

It's the chip and the server, the networking, the racks,

5:05

the datacenter that they live in.

5:07

- I see.

5:08

So it sounds like silicon was not a missing part,

5:11

but it is part of every layer of the stack.

5:14

- That's where the workload runs.

5:15

For a while now, we've been on the journey of designing

5:17

and architecting the silicon.

5:19

And for us it really starts with a customer.

5:21

It starts with a product.

5:22

It starts with something you're trying to achieve.

5:24

And I think you talked to Rani and Selim about that as well.

5:28

- Our silicon strategy is informed by our expertise

5:31

in actually running big, complex workloads in the cloud,

5:36

both for the commercial cloud and AI.

5:39

It's not about just building silicon.

5:41

You have to understand the requirements of the workload.

5:45

You have to work with the other teams at Microsoft

5:50

to co-design and co-develop,

5:52

so that one plus one is greater than two.

5:56

- What goes into the process of building these chips?

5:59

- We start the custom silicon design process

6:02

by first identifying a problem to go after.

6:06

Then we create the architecture of the product.

6:09

We transfer the design database to our foundry partners,

6:12

where they go layer by layer, from transistors to metals,

6:15

and they deliver us the produced wafer.

6:21

- These chips have billions of transistors

6:24

connected by billions of wires.

6:26

- Wow.

6:27

- So we have silicon teams globally

6:30

to make sure that every single one of those is working.

6:34

- It goes through wafer-level testing,

6:37

goes through packaging, package-level testing,

6:41

and then it comes back to our labs

6:44

to do post-silicon validation.

6:47

- We look at all the cooling, the power, the performance

6:51

before we can say,

6:52

"Okay, it's ready to start running in the datacenter."

6:58

- So, Pat, we've talked a lot about the server blade,

7:00

but what about the datacenter?

7:02

How is that purpose-built?

7:04

- The blade in the rack

7:05

is part of the overall datacenter infrastructure.

7:09

We're thoughtful about

7:10

how much power density you can put in each rack.

7:13

We're thoughtful about the cooling.

7:14

And in the Cobalt blade that we talked about,

7:16

each of the CPUs on the chip,

7:18

it can be run at a different performance point.

7:21

And so we can turn down each of the individual CPU cores

7:24

so that the overall power consumption

7:26

is at its most efficient, or lowest, point.

7:29

When we think about some of the liquid cooling work,

7:32

they're closed-loop systems where they're filled with water

7:36

or a liquid once,

7:37

but then the liquid is recycled back and forth

7:39

between hot and cold.

7:40

We're not consuming more water

7:42

every time you try to cool the system.

7:44

- So it's really thinking about

7:46

this almost from first principles,

7:47

with sustainability in mind.

7:49

- That's right. Absolutely.

7:50

- So what does all of this mean

7:52

for the future of AI workloads,

7:54

customer workloads, infrastructure?

7:57

- We're not just designing servers,

8:00

chips, networking components.

8:02

We're designing them all together

8:03

for the workloads of today,

8:05

but also for the workloads of tomorrow.

8:07

Microsoft now has the ability to build custom silicon.

8:11

We've demonstrated it.

8:12

We can tailor it to our workloads

8:15

and to our customers inside and out.

8:17

The place we're going is more of that, more specialization,

8:21

more tailoring to the needs of workloads like Teams

8:26

or even Copilot.

8:29

- What first-party (1P) silicon means for my team

8:32

is new tools that allow them to do their job better,

8:36

new tools that allow them to produce solutions

8:39

that they couldn't produce before.

8:42

And not only push it into the PaaS and IaaS layers

8:46

that we already had with Azure,

8:48

but push it all the way down into the silicon,

8:51

all the way down to the transistors

8:53

that make the cloud real.

8:54

(gentle music)

8:57

- What is amazing and exciting

9:00

is that we are building these systems

9:03

to power the AI infrastructure.

9:06

We are building the world's computer, the world's AI computer,

9:11

and the infrastructure that allows you

9:15

to solve the world's most difficult problems

9:18

for your business.

9:19

(transition whooshing)

9:22

- [Connor] You may never see a Cobalt blade

9:24

or walk through the labs where these chips are tested.

9:27

It's rare to see custom racks being assembled

9:30

or the liquid cooling systems that make them possible.

9:33

But you do feel the impact every time you join a Teams call

9:36

or ask Copilot a question.

9:38

So every time you use technology,

9:40

remember that somewhere in a datacenter

9:42

you may never visit,

9:43

an entire stack of purpose-built infrastructure

9:46

is working in the background,

9:48

driving the AI era and shaping the future of the cloud.

9:52

(transition whooshing)

9:53

(upbeat music)

9:56

(transition clacking)
