
MIT 6.S087: Foundation Models & Generative AI. HOW IT WORKS

1h 2m 59s · 11,887 words · 1,673 segments · English

FULL TRANSCRIPT

0:00

All right, okay, welcome to the second lecture on foundation models and generative AI. This one should be a fun one: we're going to dive into all the different ways we train and arrive at these foundation models and generative AI. If you ask me, I think these are kind of the key breakthroughs, and it's going to give you a wide understanding of what's going on. I mean, some people focus more perhaps on certain engineering tricks that have happened in the last few years, but I think these are the conceptual breakthroughs, so it's going to be exciting to talk about. All right, so today we'll go through all the different algorithms, meaning how we define objectives and goals for computers to interact with the world and data in order to learn from it.

0:51

So, a quick recap from last class: we provided a short, succinct answer to what foundation models and generative AI are, and how you learn from observation, where meaning is contextual and relational. Then we went on a little bit of a philosophical journey where we asked how the world is structured. Somehow the world is very chaotic, and we need to deal with that chaos because math won't save us; that's where neural networks and this new type of AI come in and help out. And if you want to learn from the world: supervised learning, where you learn from an expert, doesn't scale well, because you rely on human beings who have to label the whole world, and the whole world cannot be labeled, so it doesn't generalize well. And reinforcement learning also doesn't work, because it's too risky and too slow if you have no starting point. We're going to talk about this in this class: if you have some starting point you can do it, but if you have no world model, no understanding of the world whatsoever, you cannot do reinforcement learning, because you don't even know where to start; you'll make no progress, and you unfortunately die way before you make any progress whatsoever. That's why the technique behind foundation models and generative AI is called self-supervised learning, and that's key. Some people call this unsupervised learning; the correct term is self-supervised learning, but that's how we arrive at these technologies. Right, so we learn from unlabeled data, we learn from data in general, which means it scales really well: we just need data, and then we can learn the structure from that.
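The key property of self-supervision described above is that the supervision signal comes from the data itself, not from human annotators. As a minimal illustrative sketch (my own toy example, not the lecture's formulation), raw text can be turned into (context, target) training pairs with no labeling effort at all:

```python
# Self-supervised learning needs no human labels: the data labels itself.
# Here each word in a raw sentence becomes a prediction target, with the
# preceding words as context. (Toy sketch for illustration only.)

def make_pairs(text, context_size=3):
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        context = words[max(0, i - context_size):i]
        pairs.append((context, words[i]))  # the "label" comes from the data
    return pairs

pairs = make_pairs("the dog chased the cat up the tree")
print(pairs[0])  # (['the'], 'dog')
print(pairs[1])  # (['the', 'dog'], 'chased')
```

Because every span of raw text yields training pairs this way, the approach scales with data alone, which is exactly the scaling argument made in the recap.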

2:34

All right, and again, we said: how do you learn what a dog is? Well, you learn what a dog is from observing dogs in different contexts. You correlate and contrast dogs with other concepts, like cats, and then in turn you also learn about cats. You get this very relational understanding of meaning, and that's what we're leveraging here.

2:51

So today we're going to talk about these different approaches in more detail. We'll talk about natural language processing: basically what happened in the early days of natural language processing and then how we arrived at ChatGPT-type technologies, and this includes causal language modeling (CLM) and masked language modeling (MLM). We'll talk about contrastive learning, which is very popular when it comes to vision and images. We'll talk about puzzles and games; denoising diffusion, also very popular in text-to-image generation, like Stable Diffusion; autoencoders; GANs, that is, generative adversarial neural networks. Then we'll talk a little bit about generative approaches versus representation learning, and we'll also talk about autonomous agents a little bit.
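To make the two language-modeling objectives named above concrete, here is a toy sketch of how each one frames the prediction problem (an illustrative assumption-level example; real systems operate on learned subword tokens, not whitespace-split words, and the masked positions here are chosen by hand rather than sampled):

```python
# Toy illustration of the two language-modeling objectives.
words = "the quick brown fox jumps over the lazy dog".split()

# Causal language modeling (CLM): predict the next token from the left context.
clm_examples = [(words[:i], words[i]) for i in range(1, len(words))]
print(clm_examples[0])  # (['the'], 'quick')

# Masked language modeling (MLM): hide some tokens, predict them from
# both left and right context.
mask_positions = {3, 7}  # positions chosen by hand for illustration
masked = [w if i not in mask_positions else "[MASK]" for i, w in enumerate(words)]
mlm_targets = {i: words[i] for i in mask_positions}
print(masked)       # 'fox' and 'lazy' replaced by '[MASK]'
print(mlm_targets)  # {3: 'fox', 7: 'lazy'}
```

The difference in what context is visible (only the left side for CLM, both sides for MLM) is the essential distinction between the two objectives.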

3:47

All right, so let's get started. We're going to start with language, which is also basically where a lot of these conceptual breakthroughs actually started. Language is a little bit special, and that's what I'm going to argue as well: language is kind of special. It's man-made; we created it for some kind of purpose. We don't only communicate in terms of language, we also think in terms of language, and that might even be the more interesting and important component of language: that we think in terms of it, rather than that it allows us to talk to other people. I think if we came across an intelligent other life form, even if they weren't able to communicate with each other, they would still have a language to be able to think and plan, et cetera, and we're going to talk about this later. Really, language is an efficient, universal medium for transporting and verifying ideas, and we'll try to make this more tangible later. But this also kind of hints at how we can use these large language models that understand language to create autonomous agents and even more humanlike intelligence, because a lot of this is hidden in language itself.

4:57

Okay, so now, this was several years ago, I'm getting old, but when I started off my career at Stanford, 12 years ago I think it was, don't quote me on that: back then there was a specific research team, data set, and model for each specific language task. So you would have one research team, one data set, and one model that they were optimizing, say an algorithm for translation; then you would have a separate research team, model, and data set for question answering; and then another isolated project, with its own data, model, and researchers, around classification and prediction, and so on. These were isolated efforts that people were optimizing and specializing for, building solutions and collecting data. But we started asking ourselves: hey, is this actually good? Are we spreading ourselves thin here? We're all working on solutions around language and understanding language, and these seem to be very related tasks. It doesn't seem like human beings have separate brains for each different language task. So maybe there is some objective, something in language, that we can optimize for and learn, to get at the underlying problem of understanding language; then we just see all of these different tasks as downstream tasks that we use this big, good language-understanding brain to solve. Maybe we can optimize that instead. That's what we started asking ourselves.

6:30

Right, let's say we want to accomplish this; let's say we want to optimize and learn some type of language understanding. What could this look like? Well, somehow we want our AI model, our computer model, to be able to digest language and then put it into some kind of representation space or feature space, into some useful format, so that we can take that format and send it on to other tasks. So let's say we have this type of AI model that is able to digest language and encode its meaning into some representation, like numbers for example, and then we can feed this to all the different tasks. That's a really good starting point, because if we're able to featurize and represent language, and we also capture the real meaning of the text, it's a very useful tool for these downstream tasks.
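The "meaning as numbers" idea above can be sketched very simply: each sentence becomes a vector, and semantic closeness becomes geometric closeness, often measured with cosine similarity. The vectors below are made up for illustration; in practice a trained encoder model would produce them:

```python
# Sketch: sentences with similar meaning map to nearby points in a
# vector space. Embeddings here are hand-picked toy values, not the
# output of any real model.
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings for three sentences.
emb = {
    "a dog is running in the park": [0.9, 0.1, 0.3],
    "a puppy plays outside":        [0.8, 0.2, 0.4],
    "the stock market fell today":  [0.1, 0.9, 0.0],
}

s1, s2, s3 = emb.keys()
print(cosine(emb[s1], emb[s2]))  # high: similar meaning, nearby vectors
print(cosine(emb[s1], emb[s3]))  # low: different meaning, distant vectors
```

Downstream tasks can then operate on these vectors instead of raw text, which is what makes a shared representation useful across translation, question answering, classification, and the rest.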

7:24

So this would be nice to have, and that's what people started working towards: basically featurization and representation learning on language, where sentences and text that have similar meanings are mapped very close together in this high-dimensional meaning space, and it's like a very, very nuanced, granular
