
AI in Life Sciences Discovery | Stanford's RAISE Health Symposium 2025

26m 6s · 4,889 words · 720 segments · English

FULL TRANSCRIPT

0:04

Nice to see everybody. Thank you so much for coming back from break. I have the privilege of opening this session. I'm Nicolò Fusi, a general manager at Microsoft Research. It's been a great day, and I feel like I'm going to take a little bit of a different perspective compared to some of the things that have been said this morning, because I'm going to focus a lot on models, and in particular on trying to learn the language of biology using models, and give a few examples.

0:35

So I guess the framing for me is: are we going to see the same kinds of things that we've seen in NLP, natural language processing, happen in biology? If you look at the history of natural language processing, the field went from a lot of people working in a very linguistics-based way, figuring out which features are important for models to learn from and designing very bespoke architectures, to architectures that are much more general-purpose and that scale much better in terms of compute and data: they're better suited to the compute we have, and they can ingest more data more quickly. In biology, we're seeing a lot of models, and we've been seeing them over the years, that encode a lot of symmetries and invariances that are important to learn from. But are we going to see a similar shift to more general-purpose architectures?

1:34

One good case study from the work my team does is on proteins. If you're familiar with proteins, the simplest way to describe what's going on is this: you have protein sequences, those sequences determine a structure, and that structure determines in large part what the protein does. And for the longest time, if you wanted to design a protein that does something, that has a particular function, you designed the structure and then tried to solve for the sequence that leads to that structure.

2:04

And the problem, as we saw this morning, is that we have few experimentally characterized structures and a lot of sequences. That's because sequences are very cheap to acquire by sequencing. For instance, you can go to a puddle of mud in the middle of nowhere, sequence your sample, and you're probably going to get some good proteins. But if you then want to experimentally characterize their structure, you need to do X-ray crystallography or cryo-EM, which is expensive and slow. And so, in the interest of putting ourselves into a data-rich environment, we asked the question: can you go directly from sequence to function, skipping structure entirely and treating it more like a nuisance variable?

2:47

And that's what we did. We took a pretty general-purpose architecture: a diffusion model. If you're familiar with diffusion models, those are the things that generate the good-looking images you see on the internet, synthetically generated in most cases.

3:01

They work by taking a set of pixels that look completely like noise and iteratively denoising them through a neural network until they look very natural. We do basically the same thing, but over discrete sequences: instead of a continuous pixel value, we have a discrete set of amino acids that we can put in any position of a protein.
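As a rough illustration, the sampling loop for discrete diffusion over sequences can be sketched as below. This is a toy sketch, not the speaker's actual system: `toy_denoiser` is a uniform stand-in for the trained neural network, and the single random-order unmasking pass is a simplification of the iterative denoising schedule.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "#"  # the fully "noised" (absorbing) state of a position

def toy_denoiser(seq, pos):
    """Stand-in for a trained network: return a distribution over
    amino acids for position `pos` given the partially denoised `seq`.
    Uniform here; a real model would condition on the sequence context."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def sample_sequence(length, denoiser=toy_denoiser, rng=random):
    """Start from an all-noise sequence and iteratively replace noise
    tokens with residues drawn from the denoiser's predictions."""
    seq = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # denoise positions in a random order
    for pos in order:
        dist = denoiser(seq, pos)
        residues, weights = zip(*dist.items())
        seq[pos] = rng.choices(residues, weights=weights, k=1)[0]
    return "".join(seq)
```

With a real learned denoiser in place of the uniform one, each step sharpens the sequence toward natural-looking proteins, in direct analogy with pixel-space denoising.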

3:22

And it turns out that if you do that, you get proteins that recapitulate the diversity of natural proteins very well. So if you start from an empty protein and iteratively decide which amino acid goes in each position, you get a bunch of proteins that are natural-looking but entirely synthetic. And then if you prompt this model with particular prompts, you can get it to do very interesting things. Let's say you have a functional motif you want to present: you can design a scaffold around it using this model. Or let's say you have a protein where you like most of it, but you want to edit a particular region.

3:58

You can condition on the part you want to keep and just figure out the region you want to edit. And then there's another example that's particularly useful: let's say you have a set of related proteins that do roughly the same thing, but you want different versions with different properties, maybe more thermally stable. You can just keep generating proteins from those examples.

4:19

The most important thing about this work, though, is that it's a general-purpose model, meaning there is nothing special about the model we chose, other than it being discrete. You can extend it from operating over amino acids to working over general letters, so you can use it to build a language model like GPT, and in fact we have later models, which we're going to release soon, based on basically the same architecture. And the point is that once you move from models that bake in a lot of knowledge and a lot of structure about biology to models that are much more general-purpose, you do not have to have a separate model for every different domain of biology you want to model. You don't have to have a protein model, a single-cell RNA-seq model, and a pathology model as different models. Once the architecture is general enough, you can combine all of these under the same architecture. And that's what's happening in the field. So I'll just leave you with this slide.
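The prompting modes described earlier, scaffolding a motif, editing a region, and generating variants, all reduce to conditional generation: fix the residues you want to keep and denoise only the rest. A minimal sketch, again with a uniform toy stand-in for the trained model (the motif `HEWL` is just an arbitrary example, not from the talk):

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # positions the model is free to generate

def toy_denoiser(seq, pos):
    # Stand-in for a trained model; uniform over the 20 residues.
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def inpaint(template, denoiser=toy_denoiser, rng=random):
    """Conditional generation: residues given in `template` stay fixed,
    MASK positions are filled by iterative denoising. This covers motif
    scaffolding (fix the motif, mask the scaffold) and region editing
    (mask the region to redesign, fix everything else)."""
    seq = list(template)
    free = [i for i, c in enumerate(seq) if c == MASK]
    rng.shuffle(free)  # denoise the free positions in random order
    for pos in free:
        dist = denoiser(seq, pos)
        residues, weights = zip(*dist.items())
        seq[pos] = rng.choices(residues, weights=weights, k=1)[0]
    return "".join(seq)

# Motif scaffolding: keep "HEWL" in place, generate scaffold around it.
scaffolded = inpaint("#####HEWL#####")
```

Generating variants of a protein family fits the same interface: mask the positions you want to vary and resample repeatedly.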

5:24

What's happening in the life sciences is a gradual shift that started many years ago. Many years ago, we used to have basically one model per dataset: you were the expert in the dataset you acquired, you baked all of your knowledge into the architecture, you trained on it, and you got a good model. More recently, we shifted to the era of foundation models, where we put a lot more data and a lot less structure about the data into the model, and then fine-tuned that foundation model on our particular slice of the data to make it work better. And I argue that where we're going is a future where we have one kind of general model architecture, one way of modeling, like we've seen in NLP with the GPT framework. And learning is not necessarily going to happen by moving weights around in weight space through fine-tuning or training, but rather through in-context learning, which is what we're seeing a lot in language. And with that, I think I'm almost out of time.

6:26

So I'm going to invite Brian up to the stage. Thank you.

6:35

Hi everyone. My name is Brian Hie. I'm a faculty member here at Stanford, in chemical engineering and data science, and I'm excited to share some of the work that we've been doing on developing genomic language models across all domains of life. All of you are probably aware that machine learning has revolutionized the study of molecules.

6:55

For example, methods for protein structure prediction like AlphaFold and methods for de novo protein design like RFdiffusion were awarded the Nobel Prize in chemistry last year, and this is very well deserved. Molecules are very complicated, and we're still very interested in molecules. But what we've been working on is also how these molecules come together to form, say, a complete organism. But even a very simple single-celled organism is orders of
