AI in Life Sciences Discovery | Stanford's RAISE Health Symposium 2025
FULL TRANSCRIPT
Nice to see everybody. Thank you so much
for coming back from break. I have the
privilege of opening this session. I'm
Nicolò Fusi, a general manager at
Microsoft Research. It's been a great
day, and I feel like I'm going to take a
little bit of a different perspective
compared to some of the things that have
been said this morning, because I'm going
to focus a lot on models. In particular,
I'm going to focus on trying to learn the
language of biology using models, and
give a few examples. So the framing for
me is: are we going to see the same kinds
of things we've seen in NLP, natural
language processing, happen in biology?
If you look at the history of the NLP
field, we went from a lot of people
working in a very linguistics-based way,
figuring out which features are important
for models to learn from and designing
very bespoke architectures, to
architectures that are much more general
in purpose and that scale much better in
terms of compute and data. They're much
better suited to the compute we have, and
they can ingest more data more quickly.
In biology, we've been seeing a lot of
models over the years that encode a lot
of symmetries and a lot of invariances
that are important to learn from. But are
we going to see a similar shift to more
general-purpose architectures?
And one good case study from the work my
team does is on proteins. If you're
familiar with proteins, the simplest way
to describe what's going on is that you
have protein sequences, those sequences
determine a structure, and that structure
determines in large part what the protein
does. And for the longest time, if you
wanted to design a protein that does
something, that has a particular
function, you designed the structure and
then tried to solve for the sequence that
leads to that structure.
And the problem is that, as we saw this
morning, we have few experimentally
characterized structures, and we have a
lot of sequences. That's because
sequences are very cheap to acquire by
sequencing. For instance, you can go to a
puddle of mud in the middle of nowhere,
sequence a sample from it, and you're
probably going to get some good proteins.
But if you then want to experimentally
characterize their structure, you need to
do X-ray crystallography or cryo-EM,
which is expensive and slow. And so, in
the interest of putting ourselves in a
data-rich environment, we asked the
question: can you go directly from
sequence to function, skipping structure
entirely and treating it more like a
nuisance? And that's what we did. We took
a pretty general-purpose architecture, a
diffusion
model. If you're familiar with diffusion
models, those are the things that
generate the good-looking images you see
on the internet that are synthetically
generated in most cases. They work by
taking a set of pixels that look
completely like noise and iteratively
denoising them through a neural network
until they look very natural. We do
basically the same thing, but over
discrete sequences. So instead of having
a continuous pixel value, we have a
discrete set of amino acids that we can
put in any position of a protein.
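The idea of iterative denoising over a discrete amino-acid alphabet can be sketched in a few lines. This is a toy illustration only, not the team's actual model or code: the `toy_denoiser`, `generate`, and the `#` mask token are all illustrative assumptions, and the uniform stand-in distribution is where a trained neural network would go.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "#"  # placeholder for a not-yet-decided position

def toy_denoiser(seq, pos):
    """Stand-in for a trained network: returns a probability
    distribution over amino acids for masked position `pos`.
    Uniform here; a real model would condition on the whole
    partially denoised sequence `seq`."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def sample_from(dist):
    aas, weights = zip(*dist.items())
    return random.choices(aas, weights=weights, k=1)[0]

def generate(length, keep=None):
    """Iteratively denoise a fully masked sequence of `length`
    residues. `keep` is an optional {position: residue} dict of
    fixed residues, which is one way conditioning (e.g. keeping
    a motif and regenerating the rest) can be expressed."""
    keep = keep or {}
    seq = [keep.get(i, MASK) for i in range(length)]
    masked = [i for i, aa in enumerate(seq) if aa == MASK]
    random.shuffle(masked)  # fill positions in random order
    for pos in masked:
        seq[pos] = sample_from(toy_denoiser(seq, pos))
    return "".join(seq)

protein = generate(12, keep={0: "M", 5: "W"})
print(protein)  # 12 residues, with positions 0 and 5 held fixed
```

The loop structure, fully masked start, iterative fill, optional fixed positions, is the part that mirrors the description in the talk; everything else is simplified.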
And it turns out that if you do that, you
get proteins that recapitulate the
diversity of natural proteins very well.
So if you start from an empty protein and
decide where each different amino acid
goes, you get a bunch more proteins that
are natural-looking but entirely
synthetic. And then if you prompt this
model with particular prompts, you can
get it to do very interesting things.
Say you have a functional motif you want
to present: you can design a scaffold
around it using this model. Or say you
have a protein where you like most of it
but you want to edit a particular region:
you can condition on the part you want to
keep and just figure out the region you
want to edit. And then
there's another example that's
particularly useful: say you have a set
of related proteins that do kind of the
same thing, but you want different
versions with different properties,
maybe more thermally stable. You can just
keep generating proteins from those
examples. And the most important thing
about this work, though, is that it's a
general-purpose model, meaning there is
nothing special about the model we chose,
other than it being discrete. You can
extend it from operating over amino acids
to working over general letters, and so
you can use it to build a language model
like GPT. In fact, we have later models
that we are going to release soon that
are based on basically the same
architecture. And the point is
that once you move from models that bake
a lot of knowledge and a lot of structure
about biology into the architecture, to
models that are much more general in
purpose, you actually do not have to have
separate models for every different
domain of biology you want to model. You
don't have to have a protein model, a
single-cell RNA-seq model, and a
pathology model as different models. Once
the architecture is general enough, you
can combine all of these under the same
architecture. And that's what's happening
in the field. So I'll just leave you with
this slide.
What's happening in the life sciences is
a gradual shift that started many years
ago. Back then, we used to have basically
one model per dataset: you were the
expert in the dataset you acquired, you
baked all of your knowledge into the
architecture, you trained on it, and then
you got a good model. More recently, we
shifted to the era of foundation models,
where we put a lot more data and a lot
less structure about the data into the
model, and then we fine-tuned that
foundation model on our particular slice
of the data to make it work better. And I
argue that where we're going is a future
where we have one kind of general model
architecture, one way of modeling, like
we've seen in NLP with the GPT framework.
And learning is not going to happen by
moving weights around in weight space
through fine-tuning or training, but
rather through in-context learning, which
is what we're seeing a lot in language.
And with that, I think I'm almost out of
time, so I'm going to introduce Brian up
to the stage. Thank
you.
Hi everyone. My name is Brian Hie. I'm a
faculty member here at Stanford, in
chemical engineering and data science,
and I'm excited to share some of the work
we've been doing on developing genomic
language models across all domains of
life. All of you are probably aware that
machine learning has revolutionized the
study of molecules. For example, methods
for protein structure prediction like
AlphaFold and methods for de novo protein
design like RFdiffusion were awarded the
Nobel Prize in Chemistry last year, and
this is very well deserved. Molecules are
very complicated, and we're still very
interested in molecules. But what we've
been working on is that we're also
interested in how these molecules come
together to form, say, a complete
organism. But even a very simple
single-celled organism is orders of