AI in Life Sciences Discovery | Stanford's RAISE Health Symposium 2025
FULL TRANSCRIPT
Nice to see everybody. Thank you so much
for coming back from break. I have the
privilege of opening this session. I'm
Nicolò Fusi, a general manager at
Microsoft Research. It's been a great
day, and I feel like I'm going to take a
little bit of a different perspective
compared to some of the things that have
been said this morning, because I'm going
to focus a lot on models. In particular,
I'm going to focus on trying to learn the
language of biology using models, and
give a few examples. So the framing for
me is: are we going to see the same kinds
of things we've seen in NLP, natural
language processing, happen in biology?
If you look at the history of the NLP
field, we went from a lot of people
working in a very linguistics-based way,
figuring out which features are important
for models to learn from and designing
very bespoke architectures, to
architectures that are much more general
in purpose and that scale much better in
terms of compute and data. They're much
better suited to the compute we have, and
they can ingest more data more quickly.
In biology, we've been seeing a lot of
models over the years that encode a lot
of symmetries and a lot of invariances
that are important to learn from. But are
we going to see a similar shift to more
general-purpose architectures?
And one good case study from the work my
team does is on proteins. If you're
familiar with proteins, the simplest way
to describe what's going on is that you
have protein sequences, those sequences
determine a structure, and that structure
determines in large part what the protein
does. And for the longest time, if you
wanted to design a protein that does
something, that has a particular
function, you designed the structure and
then tried to solve for the sequence that
leads to that structure.
And the problem is that, as we saw this
morning, we have few experimentally
characterized structures, and we have a
lot of sequences. That's because
sequences are very cheap to acquire by
sequencing. For instance, you can go to a
puddle of mud in the middle of nowhere,
sequence a sample from it, and you're
probably going to get some good proteins.
But if you then want to experimentally
characterize their structure, you need to
do X-ray crystallography or cryo-EM,
which is expensive and slow. And so, in
the interest of putting ourselves in a
data-rich environment, we asked the
question: can you go directly from
sequence to function, skipping structure
entirely and treating it more like a
nuisance? And that's what we did. We took
a pretty general-purpose architecture, a
diffusion
model. If you're familiar with diffusion
models, those are the things that
generate the good-looking images you see
on the internet that are synthetically
generated in most cases. They work by
taking a set of pixels that look
completely like noise and iteratively
denoising them through a neural network
until they look very natural. We do
basically the same thing, but over
discrete sequences. So instead of having
a continuous pixel value, we have a
discrete set of amino acids that we can
put in any position of a protein.
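The idea of iterative denoising over a discrete amino-acid alphabet can be sketched in a few lines. This is a toy illustration only, not the team's actual model or code: the `toy_denoiser`, `generate`, and the `#` mask token are all illustrative assumptions, and the uniform stand-in distribution is where a trained neural network would go.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "#"  # placeholder for a not-yet-decided position

def toy_denoiser(seq, pos):
    """Stand-in for a trained network: returns a probability
    distribution over amino acids for masked position `pos`.
    Uniform here; a real model would condition on the whole
    partially denoised sequence `seq`."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def sample_from(dist):
    aas, weights = zip(*dist.items())
    return random.choices(aas, weights=weights, k=1)[0]

def generate(length, keep=None):
    """Iteratively denoise a fully masked sequence of `length`
    residues. `keep` is an optional {position: residue} dict of
    fixed residues, which is one way conditioning (e.g. keeping
    a motif and regenerating the rest) can be expressed."""
    keep = keep or {}
    seq = [keep.get(i, MASK) for i in range(length)]
    masked = [i for i, aa in enumerate(seq) if aa == MASK]
    random.shuffle(masked)  # fill positions in random order
    for pos in masked:
        seq[pos] = sample_from(toy_denoiser(seq, pos))
    return "".join(seq)

protein = generate(12, keep={0: "M", 5: "W"})
print(protein)  # 12 residues, with positions 0 and 5 held fixed
```

The loop structure, fully masked start, iterative fill, optional fixed positions, is the part that mirrors the description in the talk; everything else is simplified.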
And it turns out that if you do that, you
get proteins that recapitulate the
diversity of natural proteins very well.
So if you start from an empty protein and
decide where each different amino acid
goes, you get a bunch more proteins that
are natural-looking but entirely
synthetic. And then if you prompt this
model with particular prompts, you can
get it to do very interesting things.
Say you have a functional motif you want
to present: you can design a scaffold
around it using this model. Or say you
have a protein where you like most of it
but you want to edit a particular region:
you can condition on the part you want to
keep and just figure out the region you
want to edit. And then
there's another example that's
particularly useful: say you have a set
of related proteins that do kind of the
same thing, but you want different
versions with different properties,
maybe more thermally stable. You can just
keep generating proteins from those
examples. And the most important thing
about this work, though, is that it's a
general-purpose model, meaning there is
nothing special about the model we chose,
other than it being discrete. You can
extend it from operating over amino acids
to working over general letters, and so
you can use it to build a language model
like GPT. In fact, we have later models
that we are going to release soon that
are based on basically the same
architecture. And the point is
that once you move from models that bake
a lot of knowledge and a lot of structure
about biology into the architecture, to
models that are much more general in
purpose, you actually do not have to have
separate models for every different
domain of biology you want to model. You
don't have to have a protein model, a
single-cell RNA-seq model, and a
pathology model as different models. Once
the architecture is general enough, you
can combine all of these under the same
architecture. And that's what's happening
in the field. So I'll just leave you with
this slide.
What's happening in the life sciences is
a gradual shift that started many years
ago. Back then, we used to have basically
one model per dataset: you were the
expert in the dataset you acquired, you
baked all of your knowledge into the
architecture, you trained on it, and then
you got a good model. More recently, we
shifted to the era of foundation models,
where we put a lot more data and a lot
less structure about the data into the
model, and then we fine-tuned that
foundation model on our particular slice
of the data to make it work better. And I
argue that where we're going is a future
where we have one kind of general model
architecture, one way of modeling, like
we've seen in NLP with the GPT framework.
And learning is not going to happen by
moving weights around in weight space
through fine-tuning or training, but
rather through in-context learning, which
is what we're seeing a lot in language.
And with that, I think I'm almost out of
time, so I'm going to introduce Brian up
to the stage. Thank
you.
Hi everyone. My name is Brian Hie. I'm a
faculty member here at Stanford, in
chemical engineering and data science,
and I'm excited to share some of the work
we've been doing on developing genomic
language models across all domains of
life. All of you are probably aware that
machine learning has revolutionized the
study of molecules. For example, methods
for protein structure prediction like
AlphaFold and methods for de novo protein
design like RFdiffusion were awarded the
Nobel Prize in Chemistry last year, and
this is very well deserved. Molecules are
very complicated, and we're still very
interested in molecules. But what we've
been working on is that we're also
interested in how these molecules come
together to form, say, a complete
organism. But even a very simple
single-celled organism is orders of