Silicon to Systems: What Powers Microsoft's AI Infrastructure
FULL TRANSCRIPT
(gentle music)
- [Connor] Have you ever wondered what happens
the moment you ask Copilot a question?
We use AI every day
without ever seeing the infrastructure underneath.
What's behind the scenes of these tools we rely on?
And what does it actually take to make something like Teams
or Copilot feel instant, intelligent, and reliable?
The real story lies in the infrastructure,
the part that no one ever sees,
the technology that makes AI possible
at this incredible scale.
In the era of AI, we are rethinking everything.
New silicon, new cooling, new networking, new architectures.
This is a complete re-engineering of the stack.
Let's explore the hidden machinery
that supports the cloud and AI you use every day
and the work underway to rebuild every layer
to fuel the next frontier of AI innovation.
From silicon to systems.
(upbeat music)
I'm Connor Doyle,
and when I think about it, a lot of my time
over the past seven years has been spent on calls.
And lately, AI has become a big part of how I get work done.
So today, I'm meeting with Pat Stemen,
one of the engineers
working on the fundamental building blocks of all of this
to help me better understand how infrastructure is evolving,
starting with one of our innovative server blades.
Wow. What are we looking at here?
What is this?
- This is our brand new Cobalt 200 server.
It's got our latest Cobalt 200 SoC in it,
the latest set of Azure Boosts
and our storage and networking technologies.
And it's the foundation of Microsoft products
like Teams and Azure SQL, and others.
- So this looks a little bit different
from, like, my laptop at home I might use...
- [Pat] Sure.
- Yeah, it is different, but in many ways the same.
It's designed though for scale.
And it has some of the same components
your laptop might have.
So you might say this is the heartbeat:
this is the Cobalt 200 system on chip.
This is where our customers' workloads,
their virtual machines, and all of their code run,
right here on the chip.
Inside the chip are 132 individual little CPUs.
- [Connor] Oh.
- And there's hardware
and software that allows us to take those CPUs
and then share them with our customers
in a secure and safe way.
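The secure core sharing Pat describes can be sketched in miniature. The sketch below is an illustrative stand-in built on the Linux CPU-affinity primitive exposed through Python's `os` module; it is not Azure's actual hypervisor mechanism, and the core numbers are arbitrary.

```python
import os

def pin_to_cores(cores):
    """Restrict the current process to a fixed set of CPU cores.

    A deliberately simplified stand-in for the way a hypervisor can
    dedicate physical cores to one tenant's virtual machine so tenants
    do not contend for each other's cores. This is NOT Azure's actual
    mechanism, just the underlying OS primitive (Linux-only).
    """
    os.sched_setaffinity(0, cores)   # 0 means "the calling process"
    return os.sched_getaffinity(0)   # read back what the kernel applied

# Dedicate this process to core 0, the way a host might carve a few
# of a chip's many cores out for a single guest.
print(pin_to_cores({0}))             # prints {0}
```

In a real host, this kind of partitioning is done by the hypervisor scheduler together with hardware isolation features, not by the guest itself; the sketch only shows the basic idea of giving a workload exclusive cores.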
What I appreciate the most about the server blade
is the efficient use of space.
We spent a lot of design energy
to make sure we could have two individual setups
in the blade,
and this, too, allows for more power efficiency
and performance.
- I see.
But this is just one example
of how Microsoft's thinking about systems-level design.
We also have GPUs as well.
- That's right.
If we were to go down the datacenter together,
we would see blades that look like this one,
built around CPUs.
If we were to go farther down the datacenter,
you would see an AI accelerator rack or a GPU rack.
In fact, I brought something. I'll show it to you.
- Oh.
- Okay.
So this is what we call an accelerator module.
- Okay.
- And it's from our Maia product.
And what we do in a Maia server
is actually put four of them-
- Oh, four.
- Next to each other, all together.
And what's particularly noticeable
is it has an integrated set of liquid cooling pipes.
The amount of compute and power that it's consuming
for that AI workload is higher.
And so they've invested in this closed-loop liquid cooling
that allows coolant to come right down on top of the chip
and then back out and recycle.
- So is the previous infrastructure
not meeting the demands of today?
What's the driving force behind this?
- We're really seeing quite the evolution
of customer workloads right now.
Technologies like Copilot are a great example.
Those scenarios require so much more
out of the rest of our infrastructure,
more compute, more storage, more networking,
to really deliver on those experiences.
- So, Pat, this is the end result,
but I'd love to sit down and talk about
how this all comes together.
- All right.
(gentle music)
- Every customer that I'm speaking to is asking about AI.
What's next?
But before we get into the future of that,
let's go back to the beginning.
How did this custom silicon journey begin at Microsoft?
- I think it's important to remember,
Microsoft has long been a silicon company.
Look at early versions of Xbox and Surface.
It's true that something like a Cobalt or a Maia
is a much larger-scale project,
one that requires different kinds of talent
and capability and investment.
You have to take silicon engineering and server engineering
and software developers and our customers,
and put them together to get to the right design.
And it's thousands of people and years of engineering.
When we think about an infrastructure product,
it's more than just the chip
and the software that runs on it.
It's the chip and the server, the networking, the racks,
the datacenter that they live in.
- I see.
So it sounds like silicon isn't a single missing piece;
it's part of every layer of the stack.
- That's where the workload runs.
For a while now, we've been on the journey of designing
and architecting the silicon.
And for us it really starts with a customer.
It starts with a product.
It starts with something you're trying to achieve.
And I think you talked to Rani and Selim about that as well.
- Our silicon strategy is informed by our expertise
in actually running big, complex workloads in the cloud,
both for the commercial cloud and AI.
It's not about just building silicon.
You have to understand the requirements of the workload.
You have to work with the other teams at Microsoft
to co-design and co-develop,
so that one plus one is greater than two.
- What goes into the process of building these chips?
- We start the custom silicon design process
by first identifying a problem to go after.
Then we create the architecture of the product.
We transfer the design database to our foundry partners,
where they go layer by layer, from transistors to metals,
and they deliver us the produced wafer.
- These chips have billions of transistors
connected by billions of wires.
- Wow.
- So we have silicon teams globally
to make sure that every single one of those is working.
- It goes through wafer-level testing,
goes through packaging, package-level testing,
and then it comes back to our labs
to do post-silicon validation.
- We look at all the cooling, the power, the performance
before we can say,
"Okay, it's ready to start running in the datacenter."
- So, Pat, we've talked a lot about the server blade,
but what about the datacenter?
How is that purpose-built?
- The blade in the rack
is part of the overall datacenter infrastructure.
We're thoughtful about
how much power density you can put in each rack.
We're thoughtful about the cooling.
And in the Cobalt blade that we talked about,
each of the CPU cores on the chip
can be run at a different performance point.
And so we can turn down each of the individual CPU cores
so that the overall power consumption
sits at its most efficient, lowest point.
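The per-core tuning described here can be modeled with a toy example. Everything below is an assumption for illustration: the available frequency points, the cubic power model, and the demand numbers are made up, not Cobalt 200 specifics.

```python
# A toy model of per-core performance points (DVFS-style tuning).
# All numbers are illustrative assumptions, not real chip specs.
FREQ_POINTS_GHZ = [1.0, 1.5, 2.0, 2.5, 3.0]

def relative_power(freq_ghz):
    """Rough dynamic-power model: P ~ f^3, since voltage scales with f."""
    return freq_ghz ** 3

def pick_point(demand_ghz):
    """Lowest available performance point that still meets a core's demand."""
    for f in FREQ_POINTS_GHZ:
        if f >= demand_ghz:
            return f
    return FREQ_POINTS_GHZ[-1]  # demand exceeds the range: saturate at the top

# Mostly idle cores with one hot core: turning the idle cores down
# to their lowest sufficient point is where blade-level savings come from.
demands = [0.8, 0.8, 2.9, 1.4]
print([pick_point(d) for d in demands])   # prints [1.0, 1.0, 3.0, 1.5]
```

Because dynamic power grows much faster than linearly with frequency, running an idle core at 1.0 GHz instead of 3.0 GHz in this model cuts that core's dynamic power by roughly 27x, which is why per-core control matters at blade scale.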
When we think about some of the liquid cooling work,
they're closed-loop systems where they're filled with water
or a liquid once,
but then they're recycled back and forth
between hot and cold.
We're not consuming more water
every time we cool the system.
- So it's really thinking about
this almost from first principles,
bringing sustainability in mind.
- That's right. Absolutely.
- So what does all of this mean
for the future of AI workloads,
customer workloads, infrastructure?
- We're not just designing servers,
chips, networking components.
We're designing them all together
for the workloads of today,
but also for the workloads of tomorrow.
Microsoft now has the ability to build custom silicon.
We've demonstrated it.
We can tailor it to our workloads
and to our customers inside and out.
The place we're going is more of that, more specialization,
more tailoring to the needs of workloads like Teams
or even Copilot.
- What 1P silicon means for my team
is new tools that allow them to do their job better,
new tools that allow them to produce solutions
that they couldn't produce before.
And not only push it into the PaaS and IaaS layers
that we already had with Azure,
but push it all the way down into the silicon,
all the way down to the transistors
that make the cloud real.
(gentle music)
- What is amazing and exciting
is that we are building these systems
to power the AI infrastructure.
We are building the world's computer, the world's AI computer,
and the infrastructure that allows you
to solve the world's most difficult problems
for your business.
(transition whooshing)
- [Connor] You may never see a Cobalt blade
or walk through the labs where these chips are tested.
It's rare to see custom racks being assembled
or the liquid cooling systems that make them possible.
But you do feel the impact every time you join a Teams call
or ask Copilot a question.
So every time you use technology,
remember that somewhere in a datacenter
you may never visit,
an entire stack of purpose-built infrastructure
is working in the background,
driving the AI era and shaping the future of the cloud.
(transition whooshing)
(upbeat music)
(transition clacking)