MIT 6.S087: Foundation Models & Generative AI. HOW IT WORKS
FULL TRANSCRIPT
All right, okay, welcome to the second lecture on Foundation Models and Generative AI. This one should be a fun one: we're going to dive into all the different ways we train and arrive at these foundation models and generative AI. If you ask me, these are the key breakthroughs, and they're going to give you a broad understanding of what's going on. Some people focus more on certain engineering tricks that have happened in the last few years, but I think these are the conceptual breakthroughs, so it's going to be exciting to talk about.

All right, so today we'll go through all the different algorithms, meaning how we define objectives and goals for computers to interact with the world and with data, and to learn from it.

A quick recap from last class: we provided a short, succinct answer to what foundation models and generative AI are, and how you learn from observation, and that meaning is contextual and
relational. Then we went on a little bit of a philosophical journey where we asked: how is the world structured? Somehow the world is very chaotic, and we need to deal with that chaos, because math won't save us. That's where neural networks, and the new type of AI, come in and help out.

And if you want to learn from the world: supervised learning, where you learn from an expert, doesn't scale well, because you rely on human beings who have to label the whole world, and the whole world cannot be labeled, so it doesn't generalize well. Reinforcement learning also doesn't work, because it's too risky and too slow if you have no starting point. We're going to talk about this in this class: if you have some starting point you can do it, but if you have no world model, no understanding of the world whatsoever, you cannot do reinforcement learning, because you don't even know where to start. You'll make no progress; you'll unfortunately die way before you make any progress whatsoever.

That's why the technique behind foundation models and generative AI is called self-supervised learning. That's key. Some people call this unsupervised learning; the correct term is self-supervised learning. But that's how we arrive at these technologies. Okay, so we learn from unlabeled data, we learn from data in general, which means it scales really well: we just need data, and then we can learn the structure from it.

All right. And again, we said: how do you learn what a dog is? Well, you learn what a dog is from observing dogs in different contexts. You correlate and contrast dogs with other concepts, like cats, and in turn you also learn about cats. You get this very relational understanding of meaning, and that's what we're leveraging here.
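This correlate-and-contrast idea is what contrastive objectives formalize (we'll come back to contrastive learning later in the lecture). A minimal sketch of an InfoNCE-style loss, assuming made-up 2-D embeddings in place of a trained encoder:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive (InfoNCE-style) loss: pull the positive view toward
    the anchor, push the negatives away, using cosine similarity."""
    candidates = np.vstack([positive, negatives])  # row 0 is the positive
    sims = candidates @ anchor / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(anchor)
    )
    logits = sims / temperature
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return -log_probs[0]  # -log P(positive | anchor)

# Hypothetical toy embeddings: two observations of "dog" agree; "cat" contrasts.
dog_a = np.array([1.0, 0.1])
dog_b = np.array([0.9, 0.2])
cat = np.array([0.1, 1.0])

loss = info_nce(dog_a, dog_b, np.array([cat]))
```

Minimizing a loss like this over many observations is what pulls the "dog" representations together and pushes "cat" away, giving exactly the relational structure described above.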
So today we're going to talk about these different approaches in more detail. We'll talk about natural language processing: basically what happened in the early days of NLP, and how we arrived at ChatGPT-type technologies, which includes causal language modeling (CLM) and masked language modeling (MLM). We'll talk about contrastive learning, which is very popular when it comes to vision and images. We'll talk about puzzles and games; denoising diffusion, also very popular in text-to-image generation, as in Stable Diffusion; autoencoders; and GANs, generative adversarial networks. Then we'll talk a little bit about generative versus representation learning, and also a little bit about autonomous agents.
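As a preview of causal language modeling: the objective is just next-token prediction, i.e. given the tokens so far, predict the next one. A minimal sketch, assuming a toy bigram count model standing in for a neural network:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, which tokens tend to follow it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """One causal-LM step: the most likely next token given the context."""
    return counts[token].most_common(1)[0][0]

# Made-up toy corpus; a real model trains on web-scale text.
corpus = ["the dog chased a cat", "the dog ran away", "the dog barked"]
model = train_bigram(corpus)
# predict_next(model, "the") -> "dog"
```

Scaled up, with a transformer in place of the counts, this same next-token objective is what CLM-style models like GPT optimize.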
All right, so let's get started. We're going to start with language, which is basically where a lot of these conceptual breakthroughs actually started. Language is a little bit special, and that's what I'm going to argue as well. It's man-made: we created it for some kind of purpose. We don't only communicate in terms of language, we also think in terms of language, and that might even be the more interesting and important component of language: that we think in terms of it, rather than that it allows us to talk to other people. I think if we came across another intelligent life form, even if they weren't able to communicate with each other, they would still have a language to be able to think and plan, etc. We're going to talk about this later. Really, language is an efficient, universal medium for transporting and verifying ideas, and we'll try to make this more tangible later. This also hints at how we can use these large language models, which understand language, to create autonomous agents and even more human-like intelligence, because a lot of this is hidden in language itself.

Okay, so now, several years ago (I'm getting old), when I started off my career at Stanford, 12 years ago I think it was (don't quote me on that), there was a specific research team, data set, and model for each specific language task. You would have one research team, one data set, and one model they were optimizing, an algorithm for translation; then a separate research team, model, and data set for question answering; and then another isolated project, with its own data, model, and researchers, around classification and prediction, and so on. These were isolated efforts that people were optimizing and specializing for, building solutions and collecting data.

But we started asking ourselves: hey, is this actually good? Are we spreading ourselves thin here? We're all working on solutions around language and understanding language; these seem to be very related tasks. It doesn't seem like human beings have separate brains for each different language task. So maybe there is some objective, something in language, that we can optimize for and learn, to get at the underlying problem of understanding language, and then we can see all of these different tasks as just downstream tasks that we use this big, good language-understanding brain to solve. Maybe we can optimize that instead. That's what we started asking ourselves.
All right, so let's say we want to accomplish this: let's say we want to optimize and learn some kind of language understanding. What could this look like?

Well, somehow we want our AI model, our computer model, to be able to digest language and put it into some kind of representation space, or feature space: some useful format that we can then send on to other tasks. So let's say we have this type of AI model that is able to digest language and encode its meaning into some representation, like numbers, for example, and we can then feed this to all the different tasks. That's a really good starting point, because if we're able to featurize and represent language, and we also capture the real meaning of the text, that's a very useful tool for these downstream tasks.

So this would be nice to have, and that's what people started working towards: basically featurization and representation learning on language, where sentences and text that have similar meanings are mapped very close together in this high-dimensional meaning space, in a very, very nuanced, granular way.
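That "similar meanings map close together" picture can be sketched with cosine similarity over toy sentence vectors (the 3-D vectors below are made up for illustration; a real encoder would produce them, typically with hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """How close two representation vectors are in meaning space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-D "meaning" vectors for three sentences.
embeddings = {
    "a dog is running":  np.array([0.9, 0.8, 0.1]),
    "a puppy plays":     np.array([0.8, 0.9, 0.2]),
    "stock prices fell": np.array([0.1, 0.2, 0.9]),
}

sim_related = cosine_similarity(embeddings["a dog is running"],
                                embeddings["a puppy plays"])
sim_unrelated = cosine_similarity(embeddings["a dog is running"],
                                  embeddings["stock prices fell"])
# Sentences with similar meaning get higher cosine similarity.
```

Downstream tasks like classification or search can then operate on these vectors directly, which is exactly why a shared representation is so useful.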