Transcript (English)

Seed IQ Solves ARC AGI 3 Games with Human-Level Performance - Denis O & Denise Holt Discuss How

12m 47s | 2,159 words | 349 segments | English

Full transcript

0:02

Should we start with the LS20 R?

0:04

Yeah, why not?

0:06

All right, let's start with LS20. I'm going to let it run at maybe 2x speed so that we can observe the effects while we talk about it.

0:16

So, I'm going to turn off the perception. Basically, what's happening, like I said, is that there are multiple agents involved at every level. Different agents are responsible for different parts of the gameplay. Some are responsible for long-term planning per level; others are responsible for learning across levels.
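The talk describes agents with separate roles (planning per level, learning across levels, tracking and perception). It also says they all project their beliefs into one player that acts as the actuator. Below is a minimal sketch of that arrangement. It assumes a simple weighted-sum arbitration, swapped in for illustration; this is not Seed IQ's actual mechanism. The agent functions, action names, and weights are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical action set for a grid game (an assumption, not taken from the demo).
ACTIONS: List[str] = ["up", "down", "left", "right"]

# Each "agent" is modeled as a function from an observation to per-action scores.
Agent = Callable[[dict], Dict[str, float]]


def planner(obs: dict) -> Dict[str, float]:
    # Long-term planning role: favor the next move on the current route.
    route = obs.get("route", [])
    return {a: (1.0 if route and a == route[0] else 0.0) for a in ACTIONS}


def learner(obs: dict) -> Dict[str, float]:
    # Cross-level learning role: penalize moves remembered as failures.
    failed = obs.get("failed_moves", set())
    return {a: (-1.0 if a in failed else 0.0) for a in ACTIONS}


def perception(obs: dict) -> Dict[str, float]:
    # Perception role: strongly penalize moves into tracked hazards.
    hazards = obs.get("hazard_moves", set())
    return {a: (-2.0 if a in hazards else 0.0) for a in ACTIONS}


def choose_action(obs: dict, agents: Dict[str, Agent], weights: Dict[str, float]) -> str:
    """Sum weighted agent scores; the player is the single actuator that executes the winner."""
    totals = {a: 0.0 for a in ACTIONS}
    for name, agent in agents.items():
        for action, score in agent(obs).items():
            totals[action] += weights.get(name, 1.0) * score
    # On ties, max() returns the first action in ACTIONS order.
    return max(ACTIONS, key=lambda a: totals[a])


agents = {"planner": planner, "learner": learner, "perception": perception}
weights = {"planner": 1.0, "learner": 1.0, "perception": 1.0}
obs = {"route": ["right", "right", "up"], "hazard_moves": {"right"}}
print(choose_action(obs, agents, weights))  # up
```

Weighted sums are easy to inspect. In the example, "right" is the planner's preferred move, but the perception penalty outweighs it because that move leads into a hazard.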

0:34

And then there are agents responsible for tracking, perception, and identifying different objects. So, for example, if I turn it on again with perception, you can see that different targets, different sprites (as they call them), or IRIs, are being tracked in the game. And so, the engine computes the best possible path and the best possible action-perception coupling as it goes. I'll slow it down a little bit more because it's just a little too fast.

1:03

But basically, it is relying on something called topological perception. The advantage of topological perception is that you're not pattern matching against this window. If this window were to change, just like with the ARC-AGI-1 and 2 challenges, it would be able to adapt and would still be able to establish the causality. Compare that with deep learning, right? With LLMs, or with other approaches in RL, if you change the structure, if we were to suddenly increase the size of this window, this world, they would get lost, because they don't know how to readjust to it if it hasn't been in a dataset. With topological perception feeding into the manifolds per agent, and multiple agents working through the adaptive multi-engine autonomous control, the structure can restructure on the fly and understand exactly what's happening. If there's any shift in the topology of the map, it can adjust and readjust its own strategy.
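The talk does not define "topological perception" further. The sketch below takes one simple, assumed reading rather than Seed IQ's method: describe a frame by its regions and by which regions touch, instead of by raw pixels. That description survives a change of window size, which is the property the speaker contrasts with pattern matching. The function names and the example grid are hypothetical.

```python
from collections import deque
from typing import FrozenSet, List, Set, Tuple

Grid = List[List[int]]


def label_regions(grid: Grid) -> Tuple[List[List[int]], int]:
    """Label 4-connected regions of equal color in row-major order."""
    h, w = len(grid), len(grid[0])
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if labels[r][c] != -1:
                continue
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and grid[ny][nx] == grid[r][c]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


def topology_signature(grid: Grid) -> Tuple[Tuple[int, ...], FrozenSet[Tuple[int, int]]]:
    """Region colors plus the set of touching region pairs (a region adjacency graph)."""
    labels, count = label_regions(grid)
    h, w = len(grid), len(grid[0])
    colors = [0] * count
    edges: Set[Tuple[int, int]] = set()
    for r in range(h):
        for c in range(w):
            here = labels[r][c]
            colors[here] = grid[r][c]
            for ny, nx in ((r + 1, c), (r, c + 1)):
                if ny < h and nx < w and labels[ny][nx] != here:
                    edges.add((min(here, labels[ny][nx]), max(here, labels[ny][nx])))
    return tuple(colors), frozenset(edges)


def upscale(grid: Grid, k: int) -> Grid:
    """Enlarge the 'window': every cell becomes a k x k block."""
    return [[v for v in row for _ in range(k)] for row in grid for _ in range(k)]


small = [
    [0, 0, 0, 0],
    [0, 3, 3, 0],
    [0, 3, 5, 0],
    [0, 0, 0, 0],
]
big = upscale(small, 3)
print(small == big)  # False: a pixel-level comparison no longer matches
print(topology_signature(small) == topology_signature(big))  # True: same regions, same contacts
```

A template matcher trained on `small` would miss `big`, but the region-adjacency description is identical for both grids.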

1:56

And basically, you see it encounter a pusher. With the pusher, it takes three tries at a strategy for going around before realizing that it can't get through this route. So, at this point, it's going to try again and then reroute further to a different strategy. It will find a new solution, go around, and take the sprite. A few of the things being tracked in the game are health and lives: the bars at the bottom show how many lives you have left. Right now, we have three lives.
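The pusher episode (try a route, fail a few times, reroute) fits a familiar replan-on-failure loop. The minimal sketch below makes several assumptions:

- the grid is square;
- obstacles are invisible to the planner until it bumps into them;
- the retry budget is three, to mirror the demo.

The function names and the retry rule are hypothetical; this is not Seed IQ's implementation.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def bfs(start: Cell, goal: Cell, size: int, blocked: Set[Cell]) -> Optional[List[Cell]]:
    """Shortest 4-connected path on a size x size grid that avoids known-blocked cells."""
    prev: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path: List[Cell] = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in blocked and nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None


def navigate(start: Cell, goal: Cell, size: int, pushers: Set[Cell],
             max_retries: int = 3) -> Optional[Tuple[List[Cell], int]]:
    """Walk toward the goal, replanning at every step.

    `pushers` are obstacles the planner cannot see in advance: a move into one fails,
    and after `max_retries` failures on the same cell it is marked blocked.
    Returns (visited cells, number of failed moves), or None if the goal is unreachable.
    """
    known_blocked: Set[Cell] = set()
    failures: Dict[Cell, int] = {}
    pos, trace = start, [start]
    while pos != goal:
        path = bfs(pos, goal, size, known_blocked)  # precompute a plan at every step
        if path is None:
            return None
        nxt = path[1]
        if nxt in pushers:
            failures[nxt] = failures.get(nxt, 0) + 1
            if failures[nxt] >= max_retries:
                known_blocked.add(nxt)  # give up on this route; the next bfs call reroutes
            continue
        pos = nxt
        trace.append(pos)
    return trace, sum(failures.values())


trace, failed = navigate((0, 0), (0, 2), 3, {(0, 1)})
print(failed, trace)  # 3 [(0, 0), (1, 0), (1, 1), (1, 2), (0, 2)]
```

Replanning from the current position after every step keeps the loop simple. A newly blocked cell changes only the next `bfs` call.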

2:29

We haven't lost any lives. It finds strategies to go around the pushers, learns as it goes, and then navigates to the exit. The priors here are that you have shapes, you have IRIs, you have pushers, and you have a target selection that you have to get to. And you have to be able to precompute a strategy. That precomputation happens at every step. All of the agents are pretty much projecting their own internal belief states into the player, and the player becomes the actuator.

2:56

And so, by level six, it's already aware, and it's trying to catch. This is the level where it's like Harry Potter trying to catch multiple things at once, multiple snitches. You have objects that are moving, oscillating together. You have to come up with a strategy of effectively using the sprites, or the IRIs, to connect to them and intercept them, change to the proper shape, then figure out which is the next target. Is it the color? Is it another sprite? So, you have multiple constraints at once, because at any moment, at any step, you may run out of good steps, right? So, you have to readjust your strategy on the fly. You also have to rotate. So, these are the different things: color, shape, and rotation.

3:41

The key thing here is that it decided that the route through the color, where it could accidentally hit it, is the best route. And it figures out the exact moment to go through that target. But by level six, it doesn't even matter that you have hidden things, because at this point it has accumulated knowledge from previous games. So, it's not even a challenge.
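Juggling color, shape, and rotation on top of position is what turns a simple route into a multi-constraint plan. One way to handle that is to search over position and key attributes together, so the planner decides for itself which changers to visit, and in what order, before going to the target. The minimal sketch below assumes a made-up tile encoding:

- `R` rotates the key;
- `S` cycles its shape;
- `C` cycles its color;
- `D` is a door that only accepts an exact match.

The tile names, cycle lengths, and rules are hypothetical, not the game's actual mechanics.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical tile encoding: '#' wall, '.' floor, 'R' rotate, 'S' shape, 'C' color, 'D' door.
N_SHAPES, N_COLORS, N_ROTATIONS = 3, 4, 4

# (row, col, shape, color, rotation)
State = Tuple[int, int, int, int, int]


def apply_tile(tile: str, shape: int, color: int, rot: int) -> Tuple[int, int, int]:
    """Update the key's attributes when the player steps onto a tile."""
    if tile == "R":
        rot = (rot + 1) % N_ROTATIONS
    elif tile == "S":
        shape = (shape + 1) % N_SHAPES
    elif tile == "C":
        color = (color + 1) % N_COLORS
    return shape, color, rot


def plan(grid: List[str], start: State, target: Tuple[int, int, int]) -> Optional[List[State]]:
    """BFS over position and key attributes together; returns the shortest state sequence."""
    prev: Dict[State, Optional[State]] = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        r, c, shape, color, rot = state
        if grid[r][c] == "D":  # doors are only ever entered with a matching key
            path: List[State] = []
            node: Optional[State] = state
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])) or grid[nr][nc] == "#":
                continue
            attrs = apply_tile(grid[nr][nc], shape, color, rot)
            if grid[nr][nc] == "D" and attrs != target:
                continue  # wrong key: the door stays shut
            nxt: State = (nr, nc, *attrs)
            if nxt not in prev:
                prev[nxt] = state
                queue.append(nxt)
    return None


grid = [
    "R.C",
    "...",
    "#.D",
]
path = plan(grid, (1, 0, 0, 0, 0), target=(0, 1, 1))
print(len(path) - 1, path[-1])  # 5 (2, 2, 0, 1, 1)
```

The state space grows multiplicatively with each attribute. That is one reason the demo's emphasis on precomputing a strategy at every step matters.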

4:04

Even with a partial, restricted view (a camera view, as they call it), it's well aware: okay, the IRIs are here.
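Staying aware of where objects are under a restricted camera view requires remembering what has already been seen. Below is a minimal sketch of such a memory. It assumes each view arrives with a known top-left offset in global coordinates. The class, its methods, and the cell values are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
UNKNOWN = -1


class WorldMap:
    """Accumulates partial camera views into one persistent global map (hypothetical)."""

    def __init__(self) -> None:
        self.cells: Dict[Cell, int] = {}

    def observe(self, view: List[List[int]], top: int, left: int) -> None:
        # The view's top-left corner sits at global (top, left); newer views overwrite older values.
        for dr, row in enumerate(view):
            for dc, value in enumerate(row):
                self.cells[(top + dr, left + dc)] = value

    def get(self, cell: Cell) -> int:
        # Cells never seen in any view are reported as UNKNOWN.
        return self.cells.get(cell, UNKNOWN)

    def find(self, value: int) -> List[Cell]:
        # All remembered locations of a value, including ones currently off-screen.
        return sorted(cell for cell, v in self.cells.items() if v == value)


world = WorldMap()
world.observe([[0, 0], [0, 7]], top=0, left=0)  # first window: object 7 is visible
world.observe([[0, 9], [0, 0]], top=0, left=5)  # camera moved: object 7 is now off-screen
print(world.find(7), world.find(9), world.get((3, 3)))  # [(1, 1)] [(0, 6)] -1
```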

4:12

Level seven, the last level, is the hardest, because for an LLM or deep learning, you don't have any perception left; you only get partial matches on whatever you're observing. But we are constructing a world model on the fly of whatever it is that we're dealing with. And so, as it navigates to different corners, it has already accumulated knowledge about what it can do, what the constraints are, how best to navigate around them, and how to solve it.

4:43

So, this one is different. Here, topological perception is key. You need to fill in different shapes based on the central shapes inside. And you can see topological perception at play again: change the structure, change the object. Once it understands the causality and the reasoning behind the specific problem, it just reapplies it everywhere.
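The talk does not spell out the rule for this game. The sketch below therefore swaps in a deliberately simple one to show the learn-once, reapply-everywhere pattern. It infers a per-color substitution from a single before/after example, then applies it to a grid of a different size. The function names and data are hypothetical.

```python
from typing import Dict, List

Grid = List[List[int]]


def learn_recolor_rule(before: Grid, after: Grid) -> Dict[int, int]:
    """Infer a per-color substitution from a single before/after example."""
    rule: Dict[int, int] = {}
    for row_b, row_a in zip(before, after):
        for b, a in zip(row_b, row_a):
            # setdefault records the first mapping seen; a later conflict means no simple rule fits.
            if rule.setdefault(b, a) != a:
                raise ValueError(f"color {b} maps to both {rule[b]} and {a}")
    return rule


def apply_rule(rule: Dict[int, int], grid: Grid) -> Grid:
    """Reapply the learned rule to any grid; colors not seen in the example stay unchanged."""
    return [[rule.get(v, v) for v in row] for row in grid]


rule = learn_recolor_rule([[1, 0], [0, 1]], [[2, 0], [0, 2]])
bigger = [
    [1, 1, 0],
    [0, 1, 5],
    [1, 0, 0],
]
print(rule)  # {1: 2, 0: 0}
print(apply_rule(rule, bigger))  # [[2, 2, 0], [0, 2, 5], [2, 0, 0]]
```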

5:07

But by level six or seven, it doesn't even matter anymore. You can clearly see exactly what is where and what needs to update. So, it's just filling all the circles, because it can see exactly where everything is. By level six, it has accumulated enough knowledge to keep solving these challenges one at a time. They grow in complexity, but it doesn't matter anymore: the manifold is structured in such a way that it just zooms through them. And then we can look at the other game we solved.

5:39

Mhm. Yeah, what's interesting is that with the ARC-AGI-1 and 2 challenges, when you were doing those and playing with them, it didn't matter how we scaled the complexity, because it still solved them the same way. Same thing: topological perception. Yeah. So, it's interesting to see that play out against the dynamic window.

5:58

Correct. What we're doing is the same thing, but now we're feeding it into the manifold constantly. So, it's not just one frame; it's seeing multiple frames, pretty much.

6:09

Wow.

6:12

It can detect objects. It can understand where it needs to perform an action. And I can probably make it... But it tries things. If something doesn't work out, it resets, finds a new strategy, and starts adopting that strategy. It's adaptive in real time.
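Seeing multiple frames rather than one is what makes motion predictable. The hedged illustration below targets the oscillating objects in these levels. It finds the smallest period at which an object's tracked positions repeat, then uses that period to predict a future position. The helper names, the two-cycle evidence rule, and the data are assumptions.

```python
from typing import List, Optional, Tuple

Pos = Tuple[int, int]


def detect_period(track: List[Pos], max_period: int = 16) -> Optional[int]:
    """Smallest p for which the position sequence repeats every p frames.

    Requires at least two full cycles of evidence (p <= len(track) // 2).
    """
    for p in range(1, min(max_period, len(track) // 2) + 1):
        if all(track[i] == track[i + p] for i in range(len(track) - p)):
            return p
    return None


def predict(track: List[Pos], frames_ahead: int) -> Optional[Pos]:
    """Extrapolate a periodic track: frame t is at the same place as frame t % p."""
    p = detect_period(track)
    if p is None:
        return None
    return track[(len(track) - 1 + frames_ahead) % p]


# One object observed over 12 frames, swinging back and forth (made-up data).
track = [(0, 0), (0, 1), (0, 2), (0, 1)] * 3
print(detect_period(track), predict(track, 1), predict(track, 3))  # 4 (0, 0) (0, 2)
```

Knowing where a moving target will be a few frames ahead is what lets a planner pick "the exact moment" to intercept it, rather than chasing its current position.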

6:28

It might sometimes look like a replay, but that's because it's looking at the topology. The topology is what it is, right? Between levels, it's set. You have oscillating objects, but overall, the dynamics are already figured out. So, there's a deterministic path that's the best path to follow. And here, it figures it out little by little. It tries something, and there's a reset. A reset means that the strategy didn't work after a few clicks. So, it reroutes, recomputes, re-solves, finds a new higher-level, long-horizon planning strategy, and then starts planning at the low level. All that planning is almost instantaneous. And all of this is tracked by perception, so you know exactly where you are, what to click on, how to transfer these, and how to achieve what it's looking to achieve.
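The last paragraph separates a higher-level strategy from low-level planning. Below is one common way to realize that split, sketched as an assumption rather than the engine's actual design. The high level chooses the order in which to visit targets, and the low level stitches shortest paths between consecutive waypoints. Brute-forcing the order is only viable for a handful of targets. All names and the example grid are hypothetical.

```python
from collections import deque
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def bfs_path(grid: List[str], a: Cell, b: Cell) -> Optional[List[Cell]]:
    """Low level: shortest 4-connected path on a grid where '#' is a wall."""
    prev: Dict[Cell, Optional[Cell]] = {a: None}
    queue = deque([a])
    while queue:
        cur = queue.popleft()
        if cur == b:
            path: List[Cell] = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] != "#" and nxt not in prev):
                prev[nxt] = cur
                queue.append(nxt)
    return None


def plan_hierarchical(grid: List[str], start: Cell, targets: List[Cell],
                      exit_cell: Cell) -> Optional[List[Cell]]:
    """High level: try every visiting order. Low level: BFS legs. Keep the shortest full route."""
    best: Optional[List[Cell]] = None
    for order in permutations(targets):
        waypoints = [start, *order, exit_cell]
        full: List[Cell] = [start]
        for a, b in zip(waypoints, waypoints[1:]):
            leg = bfs_path(grid, a, b)
            if leg is None:
                break  # this high-level order is infeasible
            full.extend(leg[1:])
        else:
            if best is None or len(full) < len(best):
                best = full
    return best


grid = [
    ".....",
    ".###.",
    ".....",
]
route = plan_hierarchical(grid, (1, 0), [(0, 4), (2, 2)], (2, 4))
print(len(route) - 1, route)
# 9 [(1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (1, 4), (0, 4), (1, 4), (2, 4)]
```

After a reset, the same function could simply be called again from the current position with an updated grid, for example with a newly discovered obstacle marked `#`.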


Seed IQ Solves ARC AGI 3 Games… - Full Transcript | YouTubeTranscript.dev