Transcript (English)

Andrej Karpathy: From Vibe Coding to Agentic Engineering

29m 45s · 6,342 words · 893 segments · English

Full transcript

0:02

We're so excited for our very first special guest. He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI. He actually helped co-found OpenAI right inside of this office, was the one who actually got Autopilot working at Tesla back in the day, and he has a rare gift of making the most complex technical shifts feel both accessible and inevitable.

0:30

You all know him for having coined the term "vibe coding" last year, but just in the last few months, he said something even more startling: that he's never felt more behind as a programmer. That's where we're starting today. Thank you, Andrej, for joining us.

0:44

>> Yeah. Hello. Excited to be here and to kick us off.

0:47

>> Okay. So, just a couple of months ago, you said that you've never felt more behind as a programmer. That's startling to hear from you of all people. Can you help us unpack that? Was that feeling exhilarating or unsettling?

1:00

>> Yeah, a mixture of both, for sure. Well, first of all, like many of you, I've been using agentic tools like Claude Code and adjacent things for a while, maybe over the last year as they came out, and they were very good at, you know, chunks of code. Sometimes they would mess up and you'd have to edit them, and it was kind of helpful. And then I would say December was this clear point. For me, I was on a break, so I had a bit more time; I think many other people were similar. I just started to notice that with the latest models the chunks just came out fine, and then I kept asking for more and it just came out fine, and I can't remember the last time I corrected it. I just, you know, trusted the system more and more, and then I was vibe coding.

[laughter]

1:39

So I do think it was a very stark transition. I tried to stress this on Twitter, or X, because I think a lot of people experienced AI last year as a ChatGPT-adjacent thing. But you really had to look again, and you had to look as of December, because things changed fundamentally, especially with this agentic, coherent workflow that really started to actually work.

2:04

And so I would say it was just that realization that had me go down this whole rabbit hole of, you know, infinite side projects. My side projects folder is extremely full of lots of random things, and I'm just vibe coding all the time. So, yeah, that kind of happened in December, I would say, and I've been looking at the repercussions of that ever since.

2:28

>> You've talked a lot about this idea of LLMs as a new computer: that it isn't just better software, it's a whole new computing paradigm. Software 1.0 was explicit rules, software 2.0 was learned weights, software 3.0 is this. If that's actually true, what does a team build differently the day they actually believe this?

2:50

>> Right. So, yeah, exactly. Software 1.0: I'm writing code. Software 2.0: I'm actually programming by creating data sets and training neural networks, so the programming is kind of like arranging data sets, and maybe some objectives and neural network architectures. And then what happened is that if you train one of these GPT models, or LLMs, on a sufficiently large set of tasks (implicitly, because by training on the internet you have to multitask all the things that are in the data set), these actually become kind of like a programmable computer in a certain sense. So software 3.0 is about the fact that your programming now turns into prompting, and what's in the context window is your lever over the interpreter: the LLM, which is interpreting your context and performing computation in the digital information space. So I guess that's kind of the transition, and I think there are a few examples that really drove it home for me, which might be instructive.
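The three paradigms are easiest to compare on a single toy task. Below is a minimal Python sketch, under stated assumptions, that decides whether a short restaurant review is positive in each style: hand-written rules (1.0), weights learned from a tiny labeled data set by a perceptron (2.0), and a prompt handed to a language model (3.0). The task, the data set, and the `llm` callable are hypothetical; no real model or API is called, so a stub has to be passed in to run the 3.0 version.

```python
from typing import Callable

# Software 1.0: the programmer writes the rules explicitly.
def is_positive_v1(review: str) -> bool:
    positive = {"great", "tasty", "love"}
    negative = {"bland", "cold", "awful"}
    words = review.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return score > 0

# Software 2.0: the programmer curates data; the "program" is the learned weights.
# Hypothetical toy data set, for illustration only.
TRAINING_DATA = [
    ("great food", True),
    ("tasty and great", True),
    ("bland soup", False),
    ("cold and awful", False),
    ("great but cold", False),
]

def train_perceptron(data: list[tuple[str, bool]], epochs: int = 10) -> dict[str, float]:
    """Classic perceptron on bag-of-words features: nudge weights only on mistakes."""
    weights: dict[str, float] = {}
    for _ in range(epochs):
        for text, label in data:
            words = text.lower().split()
            predicted = sum(weights.get(w, 0.0) for w in words) > 0
            if predicted != label:
                step = 1.0 if label else -1.0
                for w in words:
                    weights[w] = weights.get(w, 0.0) + step
    return weights

def is_positive_v2(review: str, weights: dict[str, float]) -> bool:
    return sum(weights.get(w, 0.0) for w in review.lower().split()) > 0

# Software 3.0: the "program" is text placed in the context window of an LLM.
# `llm` is a hypothetical callable: prompt text in, completion text out.
def is_positive_v3(review: str, llm: Callable[[str], str]) -> bool:
    prompt = (
        "Answer with exactly one word, yes or no.\n"
        "Is this restaurant review positive?\n\n"
        f"Review: {review}"
    )
    return llm(prompt).strip().lower().startswith("yes")
```

In the 3.0 version the only part that encodes the task is the prompt string: changing the task means editing text, not rewriting rules or collecting new training data.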

3:44

So, for example, when OpenClaw came out: when you want to install OpenClaw, you would normally expect a Bash script, a shell script; you run the shell script to install OpenClaw. But in order to target lots of different platforms and lots of different types of computers you might run OpenClaw on, these shell scripts usually balloon up and become extremely complex, and you're still stuck in a software 1.0 universe of wanting to write the code. Instead, the OpenClaw installation is a copy-paste of a bunch of text that you're supposed to give to your agent. Basically, it's a little skill: copy-paste this, give it to your agent, and it will install OpenClaw. The reason this is a lot more powerful is that you're now working in the software 3.0 paradigm, where you don't have to precisely spell out all the individual details of that setup. The agent has its own intelligence that it packages up, and then it follows the instructions, looks at your environment, your computer, performs intelligent actions to make things work, and debugs things in the loop. It's just so much more powerful. So I think that's a very different way of thinking about it: what is the piece of text to copy-paste to your agent? That's the programming paradigm.
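The copy-paste installer relies on the agent running a loop: read the instructions, inspect the machine, act, look at the result, and try again. The following is a minimal, generic sketch of such a loop in Python, assuming a hypothetical `model` callable that maps the text so far to the next shell command (or `DONE`). The instruction text is illustrative and is not OpenClaw's actual installation text, and a real agent would sandbox or confirm commands rather than execute model output directly as this sketch does.

```python
import subprocess
from typing import Callable

# Hypothetical "software 3.0 installer": plain-text instructions, not a script.
# Illustrative only; this is NOT the actual OpenClaw installation text.
INSTALL_INSTRUCTIONS = """\
Install the tool on this machine. Check the OS and available package
managers first, install any missing dependencies, then verify the install.
Reply with exactly one shell command per turn, or DONE when finished."""

def run_agent(
    model: Callable[[str], str],  # hypothetical: context text in, next command out
    instructions: str,
    max_steps: int = 10,
) -> list[tuple[str, str]]:
    """Minimal agent loop: ask for a command, run it, feed the output back."""
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        # Rebuild the context: the instructions plus every command and its output so far.
        transcript = "\n".join(f"$ {cmd}\n{out}" for cmd, out in history)
        action = model(f"{instructions}\n\n{transcript}").strip()
        if action == "DONE":
            break
        # WARNING: executes model output in a shell; unsafe outside a sandbox.
        result = subprocess.run(action, shell=True, capture_output=True, text=True)
        # stdout and stderr both go back into the context, so the model can
        # react to failures on the next turn ("debugs things in the loop").
        history.append((action, result.stdout + result.stderr))
    return history
```

The design choice worth noticing is that failures are not handled by branches written in advance, as in a ballooning install script; the error output is appended to the context and the model decides what to try next.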

4:50

Now, I think one more example that comes to mind, which is even more extreme than that, is when I was building MenuGen. MenuGen is this idea where you come to a restaurant and they give you a menu. There are usually no pictures, so I don't know what any of these things are; usually 30%, even 50%, of the items I have no idea what they are. So I wanted to take a photo of the restaurant menu and get pictures of what those things might look like in a generic sense. So I vibe coded this app that basically lets you upload a photo, does all this stuff, and runs on Vercel. It re-renders the menu and gives you all the items, each with a picture: it OCRs all the different titles, uses an image generator to get pictures of them, and then shows them to you.

5:37

And then I saw the software 3.0 version of this, which blew my mind: literally just take your photo, give it to Gemini, and say "use Nano Banana to overlay the things onto the menu." And Nano Banana basically returned an image that is exactly the picture of the menu that I took, but it actually rendered the different items into the pixels. This blew my mind because actually all of my MenuGen is spurious. It's working in the old paradigm; that app shouldn't exist. The software 3.0 paradigm is a lot more raw: your neural network is doing more and more of the work, your prompt or context is just the image, the output is an image, and there's no need to have any of the app in between.
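The two MenuGen designs differ mainly in where the logic lives. Here is a minimal Python sketch of both shapes, assuming hypothetical `ocr`, `generate_image`, and `edit_image` callables that stand in for real models (no actual OCR, image-generation, or Gemini API is called): the first is glue code stitching separate models together, the second is a single image-in, image-out call whose only "program" is the prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MenuItem:
    title: str
    picture: bytes


# Old paradigm: the app is glue code around separate models.
def menugen_pipeline(
    photo: bytes,
    ocr: Callable[[bytes], list[str]],       # hypothetical: menu photo -> dish titles
    generate_image: Callable[[str], bytes],  # hypothetical: text prompt -> image bytes
) -> list[MenuItem]:
    # Step 1: read the dish names off the photo, dropping blank lines.
    titles = [t.strip() for t in ocr(photo) if t.strip()]
    # Step 2: generate one generic picture per dish.
    # (Step 3, re-rendering the items as a web page, is omitted here.)
    return [MenuItem(t, generate_image(f"A typical serving of {t}")) for t in titles]


# Software 3.0: image in, image out; the prompt is the whole program.
def menugen_single_call(
    photo: bytes,
    edit_image: Callable[[bytes, str], bytes],  # hypothetical image-editing model
) -> bytes:
    prompt = "Overlay a small picture of each dish next to its name on this menu."
    return edit_image(photo, prompt)
```

Everything in `menugen_pipeline` (parsing, looping, data types, and the omitted rendering and hosting) is app code that the single-call version simply does not need.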

6:24

So I think that people have to kind of reframe: not work in the existing paradigm of what already existed and just think about this as a speed-up of what exists. It's actually that new things are available now. And going back to your programming question, I think that's also an example of working in the old mindset, because it's not just about programming and programming becoming

