
MIT 6.S087: Foundation Models & Generative AI. HOW IT WORKS

1h 2m 59s · 11,887 words · 1,673 segments · English

FULL TRANSCRIPT

0:00

All right, okay, welcome to the second lecture on foundation models and generative AI. This one should be a fun one: we're going to dive into all the different ways we train and arrive at these foundation models and generative AI. And if you ask me, these are kind of the key breakthroughs, and they're going to give you a broad understanding of what's going on. I mean, some people focus more, perhaps, on certain engineering tricks that have happened in the last few years, but I think these are the conceptual breakthroughs, so it's going to be exciting to talk about. All right, so today we'll go through all the different algorithms, meaning how we define objectives and goals for computers to interact with the world and data, to learn from it.

0:57

So, a quick recap from last class. We provided a short, succinct answer to what foundation models and generative AI are: you learn from observation, and meaning is contextual and relational. Then we went on a little bit of a philosophical journey where we asked: how is the world structured? Somehow the world is very chaotic, and we need to deal with that chaos, because math won't save us; that's where neural networks and the new type of AI come in and help out. And if you want to learn from the world, supervised learning, where you learn from an expert, doesn't scale well, because you rely on human beings who have to label the whole world, and the whole world cannot be labeled, so it doesn't generalize well. Reinforcement learning also doesn't work, because it's too risky and too slow if you have no starting point. We're going to talk about this in this class: if you have some starting point you can do it, but if you have no world model, no understanding of the world whatsoever, you cannot do reinforcement learning, because you don't even know where to start; you'll make no progress, and you unfortunately die way before you make any progress whatsoever.

2:04

That's why the technique behind foundation models and generative AI is called self-supervised learning; that's key. Some people call this unsupervised learning, but the correct term is self-supervised learning, and that's how we arrive at these technologies. Okay, so we learn from unlabeled data, we learn from this data in general, which means it scales really well: we just need data, and then we can learn the structure from that.
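To make that concrete, here is a minimal sketch, not from the lecture, of how self-supervision manufactures its own training pairs from raw, unlabeled text; the toy corpus and the next-token framing are illustrative assumptions, not the course's code.

```python
# A minimal sketch of self-supervision: derive (input, target) training
# pairs from raw text alone. No human labels anything; the data
# supervises itself, which is why the approach scales with data.
corpus = "the dog chased the cat and the cat ran away"
tokens = corpus.split()

# Pair each left context with the token that follows it.
examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in examples[:3]:
    print(f"{context!r} -> predict {target!r}")
```

A model trained to get these predictions right is forced to pick up the structure of the language, which is the point being made here.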

2:34

All right. And again, we said: how do you learn what a dog is? Well, you learn what a dog is from observing dogs in different contexts. You correlate and contrast dogs with other concepts, like cats, and then in turn you also learn about cats. You get this very relational understanding of meaning, and that's what we're leveraging here.

2:55

So today we're going to talk about these different approaches in more detail. We'll talk about natural language processing and language, basically what happened in the early days of natural language processing and then how we arrived at ChatGPT-type technologies; this includes causal language modeling (CLM) and masked language modeling (MLM). We'll talk about contrastive learning, which is very popular when it comes to vision and images. We'll talk about puzzles and games, and about denoising diffusion, also very popular in text-to-image generation, as in Stable Diffusion. We'll cover autoencoders and GANs, generative adversarial neural networks, then say a little bit about generative approaches versus representation learning, and also talk about autonomous agents a little bit.
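As a preview of the two language-modeling objectives just named, here is a hedged toy sketch, not from the lecture: CLM predicts each token from its left context only (the pairing shown earlier), while MLM hides a token and predicts it from the context on both sides.

```python
import random

# Toy sketch of masked language modeling (MLM): hide a token and
# predict it from the surrounding context on BOTH sides, unlike the
# left-to-right pairing used by causal language modeling (CLM).
random.seed(0)
tokens = ["the", "dog", "barks", "at", "night"]

masked = tokens.copy()
idx = random.randrange(len(masked))  # position to hide
target = masked[idx]
masked[idx] = "[MASK]"

print("input :", " ".join(masked))
print("target:", target, "at position", idx)
```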

3:47

All right, so let's get started. We're going to start with language, which is also basically where a lot of these conceptual breakthroughs actually started. Language is a little bit special, and that's what I'm going to argue as well: language is kind of special. It's man-made; we created it for some kind of purpose. And we don't only communicate in terms of language, we also think in terms of language, and that might even be the more interesting and important component of language: that we think in terms of it, rather than that it allows us to talk to other people. I think if we came across an intelligent other life form, even if they weren't able to communicate with each other, they would still have a language to be able to think and plan, and so on; we're going to talk about this later. Really, language is an efficient, universal medium for transporting and verifying ideas, and we'll try to make this more tangible later. But this also hints at how we can use these large language models that understand language to create autonomous agents and even more humanlike intelligence, because a lot of this is hidden in language itself.

5:02

Okay, so now, several years ago (I'm getting old), when I started off my career at Stanford, 12 years ago I think it was, don't quote me on that, there was a specific research team, dataset, and model for each specific language task. You would have one research team, one dataset, and one model that they were optimizing, say an algorithm for translation; then you would have a separate research team, model, and dataset for question answering; and then another isolated project, with its own data, model, and researchers, around classification, prediction, and so on. These were isolated efforts that people were optimizing and specializing for, building solutions and collecting data. But we started asking ourselves: hey, is this actually good? Are we spreading ourselves thin here? We're all working on solutions around language and understanding language, and these seem to be very related tasks; it doesn't seem like human beings have separate brains for each different language task. So maybe there is some objective, something in language, that we can optimize for and learn, that kind of gets at the underlying problem of understanding language; then we just see all of these different tasks as downstream tasks that we use this big, good language-understanding brain to solve. Maybe we can optimize that instead. That's what we started asking ourselves.

6:30

All right, let's say we want to accomplish this; let's say we want to optimize and learn some type of language. What could this look like? Well, somehow we want our AI model, or computer model, to be able to digest language and put it into some kind of representation space or feature space, some useful format, so that we can take that format and send it to other tasks. So let's say we have this type of AI model that is able to digest language and encode its meaning into some representation, numbers for example, and then we can feed this to all the different tasks. That's a really good starting point, because if we're able to featurize and represent language, and we also capture the real meaning of the text, it's a very useful tool for these downstream tasks.
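To make "meaning as numbers" tangible, here is a toy sketch, not from the lecture: the three-dimensional vectors below are made up by hand (a real model learns far higher-dimensional ones from data), and cosine similarity stands in for closeness in the meaning space.

```python
import math

# Hand-made toy "sentence embeddings"; a real model learns these.
# Sentences with similar meanings should get nearby vectors.
embeddings = {
    "a dog is barking":    [0.9, 0.1, 0.0],
    "the puppy barked":    [0.8, 0.2, 0.1],
    "stocks fell sharply": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 means nearby in meaning space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

a, b, c = embeddings.values()
print("dog vs puppy :", round(cosine(a, b), 3))  # high: similar meaning
print("dog vs stocks:", round(cosine(a, c), 3))  # low: unrelated
```

Any downstream task (translation, question answering, classification) could then consume these vectors instead of raw text, which is exactly the division of labor described above.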

7:24

So this would be nice to have, and that's what people started working towards: basically featurizing and representation learning on language, where sentences and text that have similar meanings are mapped very close in this high-dimensional meaning space, and it's a very, very nuanced, granular
