
The Vesuvius challenge breakthrough with Luke Farritor

46m 18s · 8,978 words · 1,205 segments · English

FULL TRANSCRIPT

0:00

Host: How often do you think about the Roman Empire?

Luke: All day, every day.

Host: Yeah.

0:11

Host: So, Luke came here today. Let me see where I can start. You were a SpaceX intern (you can tell us a bit more about this), and during that period you found out about this super cool challenge organized by Nat Friedman: the Vesuvius Challenge, where the idea is basically to decipher the Greek letters on these charred, carbonized scrolls that were found in Herculaneum, if that's how you pronounce it, a small city close to the Vesuvius volcano. You made a breakthrough in that challenge by working on it basically part-time, I think, while working at SpaceX, in the evenings and on weekends. That also resonates with how I was working while I was at Microsoft: I always had these open source projects I was working on in my free time, and I think that's the best way to learn. So without further ado, I'm going to hand it over to you. I think people will be super inspired by your story, because some people here have similar ambitions. So yeah, thanks.

Luke: Awesome. Of course, thank you for having me.

1:19

Luke: I'm Luke, super excited to be here. Let me share my screen again. Overall, I'm a 21-year-old from Lincoln, Nebraska, born and raised. Currently I'm an undergrad at the University of Nebraska. I've always been into programming: various projects, some internships, and the like. I did some machine learning projects before this, nothing too crazy or fancy, nothing compared to the modern machine learning projects of today, but it was enough to get my feet wet.

The Vesuvius Challenge was launched in March of this year. I heard about it on the Dwarkesh podcast, if you guys have heard of him, and I've respected him for a long time, so I turned the episode on just because of who was on the podcast. They explained the challenge, and I thought, holy cow, I've got to do this. There wasn't really any one single thing that drew me in (I'll explain what the challenge is in a second); I just knew that all of it was really cool: the prize, the historical impact, everything. So I immediately knew it was something worth doing.

At the time, like you mentioned, I was an intern at SpaceX, working on the Starship launchpad software team down in Boca Chica, Texas, right at Starbase, as they call it. There's a super long commute in and out of there, because it's kind of in the middle of nowhere, so I was listening to this podcast on my commute when I realized I had to do this. I worked on it evenings and weekends from March till July, and from July onward I've technically been back in school, but school is often fairly straightforward if you just want passing grades, so I've been basically full-time on this challenge since July.

3:18

So what is this challenge? Well, I'm going to start by talking about a little bit of history. Two thousand years ago, Pompeii happened: the volcano erupted, lava and ash and mud spilled everywhere, a lot of people died, and it was really not fun. Next to Pompeii there was a library, in a town called Herculaneum, which was kind of the rich suburb of Pompeii. This library and the mansion around it were owned by Julius Caesar's father-in-law, we're pretty certain. Just like everything else, the library was burnt and covered in ash and mud and lava, and because it was burnt, and the scrolls inside of it were burnt, everything was preserved. Usually books and scrolls from that long ago decay over time, the same way paper decays, but these were preserved because they were burnt and then buried. You can see three pictures of these scrolls on the right here: super charred, super messed-up rolls of burnt papyrus, as the material is called. People have been digging these up from this library for hundreds of years, and they have had no idea how to read them. There have been attempts, many different kinds of methods, but nothing really works that well, which is not good, and a lot of the scrolls have been destroyed in the process. Still, there are something like 400 of these scrolls that haven't yet been opened. Again, it was a big library; Julius Caesar's father-in-law was very wealthy. They have just been sitting in these museums, and people have been saying that if we could read them, that would be great, because these are entirely new works from the Roman Empire. And that led to the Vesuvius Challenge.

The real hero of this story is a guy named Dr. Brent Seales. He's a professor at the University of Kentucky.

5:13

About 20 years ago he had this idea of using CT scanning on these scrolls to read them: you CT scan them, look at the inside, and so on. He has been working on that idea for a while, and he did a lot of prior research to show that it was possible. Then, in 2019, he was finally able to get some super high resolution scans. Why did it take 20 years? Well, first of all, the logistics are really challenging. He's an American convincing someone from an Italian museum to take their priceless artifact to a scanner in Britain; that's really hard to arrange. The AI wasn't quite there either (there's an AI component to this, which I'll talk about). And the scans are also super high resolution: the scans I'm going to show you today are all at eight-micron resolution, which is very high, and because of that they had to scan them at a particle accelerator, the Diamond Light Source in Great Britain, which I think is really cool. So he got these scans, and he has always been talking about and advertising his work. Then Nat Friedman found out about it, and the two worked together to open-source the data and create a competition to find writing in these scrolls. And that's the Vesuvius Challenge.

6:36

This is the website, scrollprize.org; you can go there right now. They've got a pretty good overview. Here is what Julius Caesar's father-in-law's mansion kind of looked like. This is what the actual scroll that I'm reading looks like, so the writing that I found is all in here. These are fragments of scrolls, and this is what it looks like when you get a best-case-scenario attempted unrolling: it's really not good. If we can do it non-invasively, just by scanning, that's way better. Here's an earlier scroll that Dr. Seales also unwrapped, though this one's a little different. Here they are at the particle accelerator, about to scan it; you can see a tiny little piece of scroll there. I think they're just scanning a small sample here. The general idea is that you can take this scan and use machine learning to read the writing in it. More cool pictures, people who worked on it. But right through here is the first word that was discovered, by me; you can see the word here. You can see the scroll, which I'll go into more detail about, and you can see the outputs here, the black shapes, which are detected ink. A bunch of Greek scholars verified that this word is "porphyras", which is the word "purple". I'm very glad that the first word we found is not "in" or "the" or "and" or "of", because those would all be boring. This, my friends, is the word "purple", which is way more interesting in my opinion. So let's talk a little bit about the data itself. I mentioned

8:24

it's at super high...

Host: Yeah, sorry for interrupting you. The slide you just showed: what did your machine learning actually do? I guess OCR is not what you did. Did you amplify those letters, or what was the input to your algorithm, and what was the output? I'm trying to see whether you just amplified the letters so they could be discerned, or whether you also did OCR and all of that.

8:50

Luke: Totally. So there's no OCR here; it's just taking this ink, this black stuff, and making it visible. My job is just to take the ink and make it visible, and once you can do that, there are these Greek scholars who have been looking at these things their whole lives, and they can fill in the blanks as well, because a lot of the ink was burned off. There's a high risk of hallucination if you try to fill in the blanks automatically. Here I can show you what my machine learning model's output looks like. This is the super polished output, and this is maybe more like what the raw machine learning outputs look like: less clear, very noisy, but you can just see the writing coming into view.

9:34

The process to go from the CT scan to this is a little involved, so I can show you a little bit of it. They've uploaded all the data online, and you can download it yourself. Basically, there are tens of thousands of individual layers of the CT scan, and I can show you what one looks like here on the left. This is a slice of the CT scan, and you can see this super messed-up spiral, where the scroll roughly follows around in a spiral, but the papyrus flays apart, sticks together, and does all these really messy things. So the first thing you have to do if you want to read this is virtually unroll it, and the way they do that right now is click, click, click: you manually annotate this spiral and work your way all the way around, which is a very tedious process. There are tools to speed it up, and there are plans to make it automatic, but it's a lot less trivial than it might sound initially.

10:42

Once you've done that, you get a piece that looks maybe a bit more like this. It's flat, but it's still a mess; there's no text which is obviously visible. If we zoom in a little bit, you can see all these really weird patterns: this flakiness down here, these white specks everywhere (I think they're just noise), all this fun stuff. So you see this virtually unwrapped and you think, wow, there's going to be no way to easily read this thing. If you virtually unroll a piece of paper, you can just read it, because the ink is there. Here, we're pretty sure the ink is there, because it's a book and the ink didn't vanish, but we can't see it with the naked eye. So a lot of the time in the challenge was spent trying to figure out how we could pull the writing out of here. And there was one

11:39

big breakthrough that really set me off, like firing the starting gun, and that breakthrough was made by another contestant named Casey Handmer. He's a very busy, very cool person, and he just posted this image on Twitter one day. If you look really closely here, you can actually see the Greek letter pi. Casey found this and wasn't sure if he had actually found the right pattern, so he just posted it online, because it's a pretty collaborative competition (they've done a good job organizing it), and asked, "hey, what do you guys think?" But if you look really closely, you can see how this stroke goes up, right, and down. Here's a better image of it, with the letter pi isolated. I saw this and thought, holy cow, there actually is a way to detect writing in here. I tried to verify it; at first I was in denial, but then I found these patterns in other places, and they pretty consistently appeared in the shape of Greek letters: pis, iotas, deltas, and so on. So, all right: I had all these examples where I could see the letter visually, but only about 1% of the expected letters appear this way. I looked through all the pieces of flattened papyrus, and maybe 10 letters could be discerned this way. But hey, that's a start. So I took those roughly 10 letters I'd found scattered about and made a training set. Let me show you some images here.

13:18

Host: Just a quick question here. The letters you found, you said you were basically manually inspecting? Was that how you did it, to create that initial data set?

Luke: Yeah. To create the initial data set, I was just looking at stuff in Preview, the Apple Preview, like this, trying to decide, and then cropping things: okay, this seems reasonable, and so on. But it was a lot of trial and error. It took a very long time to find those 10 letters; it was very tedious

13:45

and grueling. But once I found all of them, I had a training set like this. On the left you can kind of see the left side of a tau, and on the right side of it a piece of an alpha; and then on the right are my training labels. It's not perfect, but it's good enough. And I don't actually look at a piece of the scroll that's this big; I look at very small pieces that are like 100 pixels by 100 pixels, and this is maybe 500 pixels wide. That way, (a) it's faster to train, because the model is smaller and your inputs are smaller, but (b) it's also great because you can avoid overfitting in some ways: the model never sees what a whole letter looks like, so it doesn't memorize letter shapes; it just memorizes what looks like ink and what doesn't. Because, again, hallucinating letters is always very scary when you train a model on this.

Host: I have to interrupt

14:41

you for a second; we have a question.

Luke: That's fine. I don't know if you hear notifications when people raise their hand; if not, you'll unfortunately have to interrupt me like this. Go ahead.

Audience member: Yeah, Luke, so as I understand it, you basically manually created this data set of segmentations. Is that correct?

Luke: That's correct, yeah.

Audience member: And it's like 10 segmentations, inputs and outputs?

Luke: Yes. Basically a training set that looks like that image there. And it's not strictly segmentation; instead I did classification, where I look at very small sections of the image and answer yes or no, classifying each section of the image. But that's the core idea: you have this training set of these letters, maybe 10 of them, and then you train a model based on that.

Audience member: That's cool, really cool. Thank you.
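The "classify small sections, yes or no" setup can be sketched roughly like this. This is a simplified illustration, not the actual competition code; the patch size and the ink-fraction threshold are assumptions.

```python
import numpy as np

def make_patches(image, mask, patch=100, ink_frac=0.3):
    """Chop a letter crop and its hand-drawn ink mask into small patches.

    Each patch gets a binary label: 1 if at least `ink_frac` of its
    pixels are marked as ink in the mask, else 0.
    """
    xs, ys = [], []
    h, w = image.shape
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            xs.append(image[i:i+patch, j:j+patch])
            ys.append(int(mask[i:i+patch, j:j+patch].mean() >= ink_frac))
    return np.stack(xs), np.array(ys)

# Tiny synthetic example: a 200x200 "scan" whose left half is inked.
img = np.random.rand(200, 200)
msk = np.zeros((200, 200))
msk[:, :100] = 1.0
X, y = make_patches(img, msk)
print(X.shape, y.tolist())  # (4, 100, 100) [1, 0, 1, 0]
```

The point of the yes/no framing is exactly what Luke describes: the model classifies tiny windows, never whole letters, so it cannot memorize letter shapes.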

15:41

Luke: Thank you. So I trained a model, and the model is very simple: it's just a ResNet-18. ResNet is kind of the off-the-shelf image classifier that's easiest to use, and what's great about it is that it's super quick to get up and running: you start with the one that's pretrained on ImageNet or whatever, and then train it on these examples. Once you have that, you can try it on different places of the scroll, and more letters can appear, because the machine learning can pick up on patterns that you yourself missed, or patterns that were too faint for you to otherwise distinguish.

16:19

But the threshold for this first-letters prize that this page talks about is 10 letters right next to each other, and I had 10 letters scattered about, not 10 letters next to each other. For a long time, the bottleneck was just how fast you can flatten the scroll: how fast you can undo the spiral and get flattened pieces to try your algorithm on. Eventually someone uploaded a flattened piece, I ran this on it, and this text appeared. I was just shocked: holy cow, this might actually work.
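Applying the patch classifier to a newly flattened piece amounts to a sliding-window pass over the segment. A rough sketch with an arbitrary stand-in scorer; the window size, stride, and averaging of overlaps are assumptions, not details from the talk:

```python
import numpy as np

def ink_map(segment, score_patch, window=100, stride=50):
    """Slide a window over a flattened segment and average patch scores.

    `score_patch` is any function mapping a (window, window) array to an
    ink probability in [0, 1]; overlapping windows are averaged.
    """
    h, w = segment.shape
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for i in range(0, h - window + 1, stride):
        for j in range(0, w - window + 1, stride):
            p = score_patch(segment[i:i+window, j:j+window])
            acc[i:i+window, j:j+window] += p
            cnt[i:i+window, j:j+window] += 1
    return acc / np.maximum(cnt, 1)

# Stub scorer: call the mean intensity the "ink probability".
seg = np.zeros((200, 200))
seg[:, 100:] = 1.0
m = ink_map(seg, lambda p: float(p.mean()))
print(m[0, 0], m[0, 199])  # 0.0 on the blank side, 1.0 on the "inked" side
```

The resulting probability map, rendered as an image, is what the letters "appear" in.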

16:54

You can see this writing very faintly, but it's there. I looked at the data again and, yeah, I'm not sure I would have caught that if I had been inspecting it visually, which I thought was cool. So I see this and I'm like, wow, we're close to 10 letters next to each other, but we're not quite at 10 letters. And I spent a lot of time just bootstrapping: you take these letters I detected, and other letters I'd detected individually, and you add them to your training set; then you retrain the model with your larger training set, and you can bootstrap your model that way.
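The bootstrapping loop he describes has roughly this shape. A toy sketch with stand-in functions: `train`, `detect_new_letters`, and the stopping rule are all hypothetical, not his actual pipeline.

```python
def bootstrap(initial_labels, train, detect_new_letters, rounds=3):
    """Iteratively grow a tiny training set with the model's own finds.

    train(labels) -> model; detect_new_letters(model) -> list of new
    labeled examples judged trustworthy enough to feed back in.
    """
    labels = list(initial_labels)
    model = train(labels)
    for _ in range(rounds):
        new = detect_new_letters(model)
        if not new:            # nothing new found: stop early
            break
        labels.extend(new)     # add the detections to the training set
        model = train(labels)  # retrain on the enlarged set
    return model, labels

# Toy run: each "retraining" finds a couple more letters, then dries up.
finds = [["iota", "delta"], ["pi"], []]
model, labels = bootstrap(
    ["pi"] * 10,
    train=lambda ls: len(ls),              # "model" is just the set size here
    detect_new_letters=lambda m: finds.pop(0),
)
print(model, len(labels))  # 13 13
```

The risky step, as Luke notes elsewhere, is deciding which detections are trustworthy enough to feed back in without compounding hallucinations.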

17:30

And then I was able to improve it from this up to this, which looks a bit better in my opinion; these letters are far more visible. These boxes don't mean anything, by the way; they're just annotations from other people. Here's a better image. Right, so this image looks much cleaner than the image

17:54

before.

Host: One question: how big was the final bootstrapped data set?

Luke: Sorry, say that again?

Audience member: You bootstrapped the data set: you trained a model, it detected symbols, and you used those symbols to train a better model. You started with 10 symbols; how many symbols did you have in the final submission?

Luke: In the final submission for this first-letters prize, I had maybe 15 letters. I don't know the exact count, but it's not that many; it was 10 to 15, and that was enough of an improvement to get this. I actually have... go ahead.

Audience member: So the whole final model was trained on 15 letters?

Luke: Yes.

18:50

Audience member: That's crazy.

Luke: Yeah, it's a very small data set, but it's a very small network, and the letters are chopped up into smaller bits, which are then fed into the model.

Audience member: How many of those smaller bits do you have, then, I guess is the question?

Luke: The window that the model looks at is about 100 by 100, and the individual letters are about 500 by 500 pixels, so you're in the tens of thousands of training examples, with augmentations and such. So you have plenty of examples. Even then you still have some overfitting problems, but chopping it up into smaller bits really helps.

Host: Nice.
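The back-of-the-envelope numbers check out: a handful of 500x500 letter crops, cut into overlapping 100x100 windows and augmented, really does land in the tens of thousands of examples. The stride and augmentation count below are illustrative assumptions, not figures from the talk:

```python
# Windows per 500x500 crop at a given stride (illustrative numbers).
crop, window, stride = 500, 100, 25
per_axis = (crop - window) // stride + 1  # window positions along one axis
windows_per_crop = per_axis ** 2          # positions over the whole crop

letters = 15        # the final first-letters training set size
augmentations = 8   # flips/rotations per window (assumed)
total = letters * windows_per_crop * augmentations
print(per_axis, windows_per_crop, total)  # 17 289 34680
```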

19:38

Host: And while we're here, I have one more question. Are you familiar with DeepMind's work called Ithaca?

Luke: Ithaca? Yeah, I don't know if I'm pronouncing it correctly, but I've heard about it. Tell me more.

Host: I don't know a lot; I just know that it's related, in the sense that they were trying to reconstruct ancient texts where some pieces are missing. So an idea I had while you were telling us about this is combining those two lines of research, and potentially reaching out to the DeepMind researchers, which I can connect you with if you want, because this definitely seems like something they would be interested in, and it looks like they've been working on similar stuff as well.

Luke: Yeah, that sounds super cool. I'm super down.

20:25

Host: Yeah, absolutely.

Luke: And so you take this image, you show it to these Greek scholars, and they read it. I tried to identify letters in here myself: there's a pi-looking thing, and this thing on the left of the pi is weird (I thought it was another pi, but it turns out it's not), all these letters and such. The Greek scholars review them; they have a committee, they vote, all these things, and then they say, okay, we think it's this. That was the criteria for the prize, which was cool. But there's also a grand prize. This first-letters prize is 10 letters next to each other; the grand prize is four paragraphs, basically four continuous strings of 140 characters, and each of the four strings has to have 85% of its characters recoverable, which is pretty darn good.
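As stated, that bar can be written down as a simple check. This is a toy formalization of the criteria as Luke describes them, not an official scoring script:

```python
def min_legible_chars(length, pct=85):
    """Ceiling of pct% of length, using integer arithmetic."""
    return -((-pct * length) // 100)

def meets_grand_prize(passages, min_passages=4, min_len=140):
    """`passages` is a list of (length, legible_count) pairs."""
    ok = [
        length >= min_len and legible >= min_legible_chars(length)
        for length, legible in passages
    ]
    return sum(ok) >= min_passages

print(min_legible_chars(140))                              # 119
print(meets_grand_prize([(140, 119)] * 4))                 # True
print(meets_grand_prize([(140, 119)] * 3 + [(140, 118)]))  # False
```

So 85% of a 140-character passage means at least 119 legible characters per passage.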

21:19

I've just been working on that. There's another contestant who also submitted an image; he submitted a few weeks after me, and he basically used a very similar approach. I'll probably team up with him for the grand prize, which is cool. So working toward the grand prize is what I've been doing, and traveling, in the wake of all this news and stuff. But yeah, that's about it. You can see the code online

21:54

here. You can download it and run it yourself, and there's text that explains how everything works. Here's a good example of the training examples: you take that whole letter and chop it up into tiny bits, and according to my labels, some of these have the cracking pattern that I talked about and some don't. These training examples are obviously very small, and you flip them, rotate them, adjust the brightness and contrast, and so on, but this is what the machine learning model sees. Then you just show it every little bit of the image that you're training on, or that you're doing inference on, excuse me.
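The flip/rotate/brightness augmentation he mentions can be sketched with plain NumPy. The eight flip-and-rotate variants and the brightness range below are illustrative choices, not necessarily what the actual pipeline uses:

```python
import numpy as np

def augment(patch, brightness=(0.9, 1.1), rng=None):
    """Return flipped/rotated/brightness-jittered copies of one patch."""
    rng = rng or np.random.default_rng(0)
    variants = []
    for flipped in (patch, np.fliplr(patch)):
        for k in range(4):                    # 4 rotations x 2 flips = 8
            v = np.rot90(flipped, k)
            v = v * rng.uniform(*brightness)  # small brightness jitter
            variants.append(v)
    return variants

patch = np.arange(4.0).reshape(2, 2)
out = augment(patch)
print(len(out), out[0].shape)  # 8 (2, 2)
```

Each labeled window thus yields several training examples, which is part of how 15 letters stretch into tens of thousands of samples.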

22:35

Then the other interesting thing is that a lot of these letters you can pick up visually. This nu is especially clear: you can see this cracking pattern go up, then a little bit down, and then up again, which is cool. But others are far more subtle, like this gamma here, or this one (upsilon, I think, is actually the letter): you can kind of see it, but not really. But the good thing about this first letter...

Host: Yep, quick

23:04

question. Given how thin all of these slices of papyrus are, when you showed us the actual 3D image, I'm wondering: what's the chance of leakage, of letters from a different slice being combined with the slice you're looking at, and things like that? Is there any potential that that could happen? Obviously, if you get a coherent word, then it's highly likely from a single slice, but in general, did you notice something like that happening?

23:34

Luke: Maybe, yeah. I think that's a huge risk, and it happens all the time. This is just an especially clean section, which is why it works so well. In general it's very messy: pieces get fused together, pieces get lost in that fusing process, and then you have to do some rearrangement after the fact to help. But the scan is multi-layer (this is the scan on the left here), so if it's fused here, it may not be fused a few layers below, which is super helpful. There are also weird, not bugs, but holes in the data, where text gets repeated because the annotation loops around again, because the person annotating forgot to go out one layer on the spiral and didn't realize what was going on. So you're correct, that's a huge risk and a huge problem, but here it's a coherent Greek word, so we know it's valid, and you can also look at the spiral section itself and see that it's pretty out there in the open.

So yeah, this is the code. You can

24:35

open yeah so uh this is the code you can

24:39

download it run it just D on me on

24:40

Twitter if you have any issues uh you

24:42

don't need crazy hardware for it um you

24:45

just need like pie torch and then a GPU

24:47

not a fancy GPU uh the whole thing is

24:50

like less than 700 lines which I'm very

24:53

happy about it's very it's relatively

24:55

simple for what it does um and it like

24:57

downloads all the data from the like

25:00

kind of data set server and stuff um but

25:03

yeah you can just clone this run it

25:04

yourself um you know reproduce those

25:07

images you saw uh and uh have a good

25:10

time

25:11

so yeah that's about all I have planned

25:14

is there anything else you'd like me to

25:15

kind of elaborate on or

25:19

anything?

Host: Go ahead.

Audience member: Okay, so you used ResNet-18. Have you tried, since you've obviously succeeded, just scaling this up?

Luke: Yeah, I've been trying to do that. Annotation is hard, just because it's a very clunky process, and I have written tools to speed it up and such, and then you bootstrap it up. Model architecture is non-trivial: some model architectures do materially better than others, and I don't know why, so I'm going to switch to an Inception v3, which is some other image classifier I don't know a ton about, just for the outputs. So that's fun. But yeah, you just kind of go for it and bootstrap it up so you can find a lot more text, and that's basically what the grand prize is; I've been working on that. They have weird NDAs around it, so I can't show a ton of text. Well, actually, I can show one image; look up here. A lot of text is coming. This is a newer image, from Youssef, the other contestant, excuse me, but you can see the word "purple" here, the "porphyras" that I mentioned, and then there's just a ton of other text around here as well. So it's super doable. I think we're going to be able to read the whole scroll, and there are hundreds of other scrolls that we can read like this as well, so again, I think it's all very much in

26:49

think it's all very in

26:50

reach Sorry by scaling up I meant uh

26:55

just using like bigger model like res 50

26:58

instead of res like have you tried that

27:02

and what yeah I have um just like the

27:06

off the shelf pie torch one um like

27:08

resinet 50 is like fine like it doesn't

27:10

do substantially better in these cases

27:13

and it's kind of like slightly noisier

27:15

just anecdotally so my intuition is that

27:17

it's just too many parameters for such

27:20

like relatively simple patterns because

27:22

again like reset like it's supposed to

27:23

classify like a thousand things and like

27:25

you know I'm just classifying is this

27:27

ink yes or no so it's like arguably a

27:30

simpler problem in that regard um so the

27:32

smaller models seem to do pretty well

27:34

and again your data set is really small

27:35

so having a smaller model helps protect

27:37

against overfitting as well um but yeah

27:40

you're you're you're asking the right

27:41

question like I've been experimenting

27:43

with this and I don't know the correct

27:45

answer did you maybe try like I'm just

27:47

throwing we throwing ideas at this point

27:49

at this problem uh like some of the

27:51

segmentation models and just use the

27:53

pre-train features there and like put

27:55

some type of classifier on to of it like

27:57

something like that I'm thinking because

27:58

you mentioned ResNet being pre-trained on

28:00

image net like the distribution shift is

28:03

significant like so either fine-tuning

28:06

something that exists or finding

28:08

something that's pre-training on

28:09

something that's closer that has a

28:11

smaller domain shift between what you're

28:12

trying to do and and what was the

28:14

pre-trained data like something along

28:16

those lines did you maybe think about

28:18

that I thought about it a little bit so

28:21

the other contestant sorry I dropped my

28:23

AirPod um the other contestant Youssef he

28:26

did a lot of like training stuff and

28:28

that was like moderately helpful but at

28:30

the end of the day just kind of creating

28:31

more labels is the thing that kind of

28:33

accelerated his results the most um so
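[A toy sketch of the label-bootstrapping loop being described: train on a few hand labels, pseudo-label only the confident predictions, and retrain. Everything here — the 1-D "brightness" feature and the threshold "model" — is invented for illustration; the real setup is a CNN over 3-D CT patches.]

```python
# Toy self-training / bootstrapping loop: grow the training set with
# confident pseudo-labels, then refit. All names and values hypothetical.

def fit_threshold(samples, labels):
    """Fit a 1-D threshold classifier: midpoint between the class means."""
    ink = [s for s, l in zip(samples, labels) if l == 1]
    papyrus = [s for s, l in zip(samples, labels) if l == 0]
    return (sum(ink) / len(ink) + sum(papyrus) / len(papyrus)) / 2

def bootstrap(labeled, unlabeled, rounds=3, margin=2.0):
    samples = [s for s, _ in labeled]
    labels = [l for _, l in labeled]
    for _ in range(rounds):
        t = fit_threshold(samples, labels)
        # keep only predictions far from the decision boundary
        confident = [(s, 1 if s > t else 0)
                     for s in unlabeled if abs(s - t) > margin]
        for s, l in confident:
            if s not in samples:  # add each pseudo-label once
                samples.append(s)
                labels.append(l)
    return fit_threshold(samples, labels)

seed = [(9.0, 1), (1.0, 0)]            # two hand-made labels: ink / no ink
pool = [2.0, 3.0, 7.5, 8.0, 4.9, 5.1]  # unlabeled patches
threshold = bootstrap(seed, pool)
```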

28:37

yeah there's that and then um I uh yeah

28:42

I think there's something you can do

28:43

there with pre-training like you know

28:45

like the DINOv2 paper I think is

28:46

state-of-the-art like there's definitely

28:49

something there that you can apply to

28:50

this um I was talking to a friend last

28:52

night like about this and he works at an

28:54

AI company and uh yeah he's like yeah

28:56

dude like you gota you got to pre-train

28:58

you know it's like yeah yeah I really

28:59

should so but yeah yeah again that's

29:02

that's the question for

29:05

sure awesome Brian go

29:09

ahead yeah yeah maybe something of a of

29:12

a silly question or a crazy question but

29:16

because I think uh in this problem it's

29:18

mostly about the data engineering part

29:20

than the model probably right because

29:22

you want to like keep bootstrapping

29:25

it uh do you think

29:27

do you think it would be possible maybe

29:29

it's a totally crazy idea to like take

29:33

an existing manuscript and burn that and

29:36

create the data set out of that like uh

29:40

beforehand and then try

29:42

to transport it to the problem yeah so

29:46

I've actually tried that myself where I

29:48

burned like I bought a bunch of Papyrus

29:50

on Amazon I burned it I CT scanned it at

29:53

my local University and the answer is

29:58

that the intrinsics of the CT scan are

30:01

different for each like environment

30:03

right and because of that it's not an

30:05

easy one to one transfer you can do

30:08

things to transfer it um but if you like

30:11

train on like one scroll and try it on

30:13

another it doesn't work and I don't know

30:16

why I don't think anyone knows why you

30:17

can do obvious things like you know do

30:19

like linear Transformations on the data

30:21

to make their normal distributions match

30:23

it still doesn't work you know you can

30:25

make the brightness contrast the same
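[As an aside, the linear transformation he's describing can be sketched as simple moment matching — shift and scale one scan's intensities so their mean and standard deviation match a reference scan's. The function and toy values below are hypothetical; real scroll data is a 3-D voxel volume, not a flat list.]

```python
# Linear "distribution matching": map source intensities onto the
# reference scan's mean/std. Names and numbers invented for illustration.
import statistics

def match_moments(source, reference):
    """Linearly map source values so their mean/std match the reference."""
    s_mu, s_sigma = statistics.mean(source), statistics.pstdev(source)
    r_mu, r_sigma = statistics.mean(reference), statistics.pstdev(reference)
    if s_sigma == 0:  # flat input: all we can do is shift
        return [r_mu for _ in source]
    scale = r_sigma / s_sigma
    return [(x - s_mu) * scale + r_mu for x in source]

# Toy example: a "dim" scan mapped onto a "bright" scan's statistics.
dim = [10.0, 12.0, 14.0, 16.0]
bright = [100.0, 120.0, 140.0, 160.0]
matched = match_moments(dim, bright)
```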

30:27

we're just not sure why that doesn't

30:28

work obviously it should right because

30:31

there are some cases where if you like

30:32

scan physical objects next to each other

30:34

like in the same scanning session

30:36

basically and then you like train it on

30:38

two and then try it on the third it

30:40

works but if you do like you know this

30:42

like Rinky Dink scroll that Luke scanned

30:43

at his university and compare it to the

30:45

particle accelerator scan of this like

30:47

super old scroll um it doesn't transfer

30:50

super well at least not yet but you're

30:52

correct like visually under the scan um

30:55

they look similar so there should be a

30:57

way to like fudge the data and and make

30:59

it work but I don't know we're just not

31:01

there yet if you if you have ideas like

31:03

I'm I'm all yours but yeah I'd love to

31:05

figure this out I've been trying to

31:06

figure it

31:07

out maybe yeah maybe some some kind of

31:10

style transfer technique or something I

31:12

don't know but the cool I like that you

31:15

actually tried to burn it and scan it

31:18

yeah yeah thank

31:19

you is any of these synthetic data

31:23

public like

31:25

your so all of um first of all all the

31:29

CT scans are public you have to like

31:31

fill out an NDA form but that's it you

31:33

just go to scrollprize.org or just Google

31:35

like Scrolls Challenge and this will

31:37

come up you can just download it all

31:39

yourself and then all the code to

31:41

produce the images I showed you is also

31:43

open source so my training labels are up

31:46

there you can download those and

31:47

everything so yeah it's it's all open

31:49

source you're welcome to play with it

31:51

I'm not asking about uh the Herculaneum Scrolls

31:56

I'm asking about synthetic Scrolls that

31:58

you tried to burn and scanned is any

32:03

like uh of like results of such

32:06

approaches like I've seen other people

32:08

trying this on

32:09

Discord I think like is has any one

32:14

published this data so uh there's

32:17

another contestant uh his name is

32:19

Wayne Wayne hello on Discord he's

32:21

published a bunch uh I haven't like

32:23

talked a ton about the stuff I've tried

32:25

just because it hasn't really worked and

32:27

you know I I haven't like you know I

32:29

didn't want to like take the time to

32:30

like put it in the same like data format

32:32

as the other Scrolls and like the Volume

32:34

Cartographer stuff um but like yeah so

32:38

like I I can like send you the scans I

32:40

have if you want like you can just I can

32:41

put them on the Discord server or

32:43

something like that but um honestly I'm

32:45

not sure they're like super useful at

32:48

this point um I think the most important

32:50

thing is just making more training

32:51

labels from the from the scroll but like

32:53

we need to figure this out and the crazy

32:54

thing is like uh this is kind of

32:57

tangential but like again there are

32:58

hundreds of Scrolls that we can read and

33:00

that we need to scan and we can't fly

33:02

them one at a time to Britain on this

33:04

like super tight schedule with their

33:05

particle accelerator like someone has to

33:07

figure out how to make a better CT

33:09

scanner that they can keep in Italy and

33:11

then just kind of walk it over from the

33:13

museum uh so there's a lot of work to be

33:15

done in this like scanning domain

33:17

transfer all these

33:19

things I mean I'm fascinated by

33:22

the by the scan you showed me like that

33:24

piece of technology was actually

33:25

fundamental like for what people were

33:27

actually finding like it's very it

33:29

looks like a very collaborative project

33:31

as what you described so far like like

33:33

the the CT technology that that has

33:35

created the initial dataset is great

33:37

as well maybe yeah for sure um related

33:40

to this work obviously like what was

33:42

your what was the feedback you got from

33:44

from the from the community from maybe

33:45

companies like if you can disclose

33:47

anything there of course I've seen on

33:48

Twitter that it kind of exploded Nat

33:51

retweeting it and then everybody

33:52

retweeting it but uh if you can share

33:54

maybe a bit more about the about the

33:56

impressions that you got from

33:57

people yeah totally so Nat has done a

34:00

really good job publicizing The

34:01

Challenge and like promoting it which

34:03

means that a lot of people want to see

34:04

this succeed so when I was like posting

34:07

my initial results of just detecting

34:08

like one or two new letters um they were

34:11

all pretty receptive uh Nat's a great

34:13

guy I just met him for the first time

34:15

yesterday in person which is kind of

34:16

cool um and uh yeah like they they just

34:19

want to read the Scrolls that's their motto

34:21

like we just want to read the Scrolls is

34:22

kind of what they say um and then yeah

34:26

like when the the letter stuff went

34:27

public there were a few press releases

34:29

and it got picked up in a bunch of news

34:31

articles which was I think uh super

34:34

super cool in my opinion so I'm just

34:36

very lucky and grateful to like be part

34:38

of the challenge and like kind of be at

34:40

this inflection point where you know I

34:41

get to get to kind of get some attention

34:44

you know so how often do you think about

34:46

the Roman

34:48

Empire question all day every day

34:52

yeah oh my God super cool and like did

34:56

you maybe get any job offers or was it

34:58

mostly on the like congratulations and

35:00

stuff super curious something like that

35:03

happens yeah yeah I know so tons of

35:05

people have been like super like hey

35:07

like if you want to come work with me

35:08

like let me know and stuff um I don't

35:10

know what exactly I'm gonna like do but

35:13

like I'm definitely evaluating a few of

35:14

those options um bunch of like really

35:16

cool VC firms have reached out and it's

35:18

like oh man like I could work here and

35:19

like learn about startups and like find

35:21

my own startup you know so um we'll see

35:23

and the other thing the other variable

35:25

was like I'm like still in college and

35:27

I've been like away from college for two

35:29

weeks just to like travel for this and

35:30

come to SF and stuff um so I'll probably

35:33

drop out of college soon and then take

35:35

one of those job offers and like work on

35:36

these Scrolls full-time in the meantime

35:38

and stuff um so it's it's really been a

35:40

life-changing experience both

35:42

financially and just like socially just

35:44

like you know getting to be part of this

35:45

and like you know all the all the

35:47

attention it's gotten and I get to like

35:49

talk to you guys and stuff right which

35:50

wouldn't have happened otherwise so yeah

35:52

it's it's it's really special I'm very

35:54

lucky I think that's so beautiful about

35:56

about the the time we live in like you

35:58

can just have somebody set this up a

36:01

couple of years ago and then like all

36:02

that like culminates and the dots connect

36:05

and we end up here I think it's super

36:07

super Bing yeah thank you yeah yeah I I

36:11

completely agree it's really

36:14

cool awesome guys you have any any

36:17

questions for Luke if not I have I have

36:20

more okay I have the last one so do I

36:25

understand correctly that main

36:27

difference between you and other like um

36:32

guy who submitted is um that he has more

36:39

labels yes so he has more labels and he

36:42

has a different model architecture he he

36:44

uses an Inception net uh and then he

36:47

just like I'm not sure he had more

36:48

labels for his first letter submission

36:50

but like those are kind of the two

36:52

differentiating factors nothing else

36:53

really seems to matter other than like

36:55

having some like reasonable like not

36:57

stupid model architecture and then

36:59

having like good labels like that's

37:02

it and like the the labeling process is

37:05

challenging because like you'll often

37:07

get letters which are like half formed

37:08

and you're like man is this a delta or

37:11

is this like something else you know and

37:12

you like fill it in then the model gets

37:14

a little worse you're like oh crap and

37:16

then you have to like undo it and stuff

37:18

um but that kind of labeling like

37:20

guessing like you know the annotation

37:23

process is very tedious and lots of

37:25

like stops and

37:27

turns do you guys actually share those

37:29

labels with each other or you just keep

37:31

them to yourself while while the

37:32

competition is on so the the labels for

37:36

these first letters have been

37:38

open-sourced um the labels for the grand

37:41

prize right now we're competing we want

37:43

to team up and we're going to like call

37:45

in an hour to talk about that and like

37:46

sign a piece of paper you know um and

37:49

then we'll team up and then you know

37:50

like split the prize money and we're

37:51

like you know very you know our chances

37:54

are very good because we know we were

37:55

only competing with each other uh at

37:57

least as far as I know there might be

37:59

someone else uh so again we'll see it's

38:01

not set in stone but um I'm very

38:03

optimistic and I think a partnership

38:05

would be very wise so

38:09

yeah makes

38:11

sense I have a question that's

38:13

tangential like to this topic in a way

38:16

um you you mentioned you were an

38:19

intern at SpaceX like I'm curious briefly

38:22

experiences at SpaceX like how hardcore

38:25

is the culture and like how did you how

38:27

was how was your time there yeah it's

38:30

it's an incredible company I I really

38:32

enjoyed it and was very lucky to be

38:33

there so all the rumors are true it's a

38:36

very very hardcore culture but it's not

38:38

as bad as it sounds because everyone

38:40

enjoys their work and everyone believes

38:42

in what they're doing if they didn't

38:44

have those two things then they would go

38:46

get a higher paying job somewhere else

38:49

um yeah and I was down at Starbase where

38:52

they're building their starship rocket

38:54

and then they launched it in April and I

38:55

got to see that which was really cool um

38:59

lots of like little things I learned

39:00

like you know the intricacies of like

39:02

how like certain types of valves work or

39:04

whatever because I was on Launchpad

39:05

software so it was just like Launchpad

39:06

stuff for me um the biggest lesson I

39:09

learned is like just that there's no

39:11

secret sauce there it's just a lot of

39:13

people who are very gritty working very

39:16

hard there's no there's no like magical

39:19

incantations they're saying in the back

39:21

in the back that like make them move

39:23

faster it's just working very very hard

39:25

for a very long time by a large number

39:28

of people and that's how you get like

39:30

SpaceX level results and SpaceX level

39:32

dominance so obviously you know I kind

39:34

of knew that even before I worked at the

39:36

company but like going in there and like

39:37

seeing it and being like wow like

39:39

there's no secret here like it's just

39:41

you got to work hard I think was a very

39:43

valuable lesson to

39:46

learn cool and what was your project

39:49

exactly um Launchpad software so it was

39:52

my job like hey Luke like here's a valve

39:54

like decide when the valve should

39:57

open and close go talk to like these

39:59

five people about what they think the

40:01

valve should do and then get some

40:03

compromise between all five of them and

40:05

then program that in and then that's

40:07

part of the like launch sequence

40:08

basically um because like the launch is

40:10

like mostly automated where you like

40:12

turn on the pump wait till like this

40:14

tank fills up and then you turn it off

40:15

and then you turn this thing on you know

40:17

there's this like long super complicated

40:19

sequence to like fuel up and launch the

40:21

rocket and then you know I had like

40:22

these like tiny little sections in it

40:24

both in code and in like writing which

40:26

is cool crazy stuff and like security

40:29

wise like even you you were an intern

40:31

like who who will not see the the the

40:35

basically the the repercussions of your

40:38

actions while you were there like how do

40:40

they make sure that the security if your

40:42

software is like good enough before

40:44

integrating into the platform software I

40:46

guess that's the question yeah so for

40:49

like uh they do a very good job with

40:51

security so of course all the like

40:52

actual like work you do is you know the

40:54

poll request survie all these things

40:56

like you know and the code you write has

40:58

to make sense to others and stuff

40:59

because other people are familiar with

41:00

the system too um the security as in

41:03

like you know prevent information

41:04

leakage or prevent people from stealing

41:06

I think is also really strong like they

41:07

just have a really good security like

41:09

culture and really good like security

41:11

team um you know and they've got like

41:14

these like you know giant armed guards

41:16

who are super nice you know everywhere

41:17

and they'll tackle you if you walk into

41:19

the wrong building and don't have your

41:20

badge and stuff um but you know again

41:23

like I it's not too different from any

41:25

other company think they do a really

41:26

good job with

41:28

security nice and connecting these two

41:31

topics like machine learning um like

41:34

that SpaceX actually deploys if you can

41:37

reveal something like

41:38

that um what do you mean like SpaceX

41:42

doesn't do a ton of machine learning

41:44

yeah but is is there any is like is I

41:48

guess the whole software is just super

41:49

super robust and and like probably even

41:53

the dynamic allocations of memory are

41:54

not there or stuff like that like so so

41:56

I would suppose that you don't have a

41:57

lot in machine learning maybe for some

42:00

of the landing like I I would be I don't

42:02

know I would be probably surprised if

42:04

there wasn't any anything that at least

42:06

augments the landing procedure and like

42:08

observes the the environment and then

42:09

helps the landing but then again

42:12

Dynamics like Boston Dynamics like the last

42:15

time I I checked they didn't have any

42:17

machine learning and all their robots

42:18

the Spot robots were super pre-programmed as well

42:21

so who knows but CU yeah yeah as as I

42:24

don't I'm not super familiar with the

42:26

The Landing process for the Falcon

42:27

rocket um maybe they're doing

42:29

augmentations there I'm really not sure

42:31

but like you know for the rocket I was

42:33

writing control software there and

42:34

you're right it's like it has to be like

42:35

super repeatable super reliable you know

42:38

there are all these paradigms like you

42:39

know no dynamic memory like you know

42:42

like you know certain weird things about

42:43

how you structure your code um and those

42:46

were all you know strictly enforced of

42:47

course principles similar to those um

42:51

but yeah it's very different from

42:52

machine learning right like machine

42:53

learning is just like you know a large

42:55

number of matrix multiplications with

42:56

some math thrown in and then this is

42:58

like control stuff where like where your

43:00

your for loop has to run every single time

43:02

you know but they've got like good

43:04

tooling around that so you know it works

43:06

pretty

43:09

well super cool how do

43:12

you even test

43:15

that uh there are many different ways um

43:18

you know this is all pretty industry

43:19

standard stuff like you can simulate

43:22

parts of it obviously simulating the

43:23

whole thing is very challenging um you

43:26

can test on like the actual Hardware

43:29

right like as long as you're not making

43:31

fire you can kind of do whatever you

43:32

want and organize you know tests right

43:35

again you know it's super common like

43:36

they did a wet dress rehearsal a few

43:38

days ago where they fuel up the rocket

43:41

pretend they're going to launch don't

43:42

actually turn the thing on and then they

43:44

un fuel right and that tests like a lot

43:46

of like really important um systems uh

43:50

and so there's lots of stuff and then of

43:52

course individual units of code you know

43:54

you can unit test you can like um like

43:56

integrate with like other forms of

43:57

testing you can attach it to Telemetry

44:00

um you know all that stuff so there's

44:02

both traditional software processes and

44:03

then just like you can just like test it

44:05

with the real thing you know and that

44:07

keeps your butt covered for the most

44:09

part uh and like as far as I understand

44:15

you like talk to multiple stakeholders

44:18

that were interested like in specific

44:21

properties uh of like how specific valve

44:25

um opens or closes like how uh like do

44:30

you how do they like um document

44:33

decision making process like it's

44:38

uh as far as I understand it's more like

44:41

uh about what to do not like how to do

44:45

it yeah um they have a pretty rigorous

44:49

like documentation process right um and

44:51

you know like you attach your

44:52

documentation to like you know your pull

44:54

request and then you can like git blame

44:56

everything and see why it is the way it

44:58

is uh you know and stuff and then you

45:01

know you kind of have to just like get

45:02

the like all the stakeholders in the

45:03

same room and have them duke it out and

45:05

you know you can kind of nudge them in

45:07

One Direction and you know sometimes

45:08

someone is stubborn but I mean you know

45:11

you you know how it goes right that's

45:12

just kind of part of life so

45:15

yeah awesome look this was super

45:19

inspiring I really enjoyed it and

45:21

hearing and hearing the full story and

45:22

like you being 21 years old like that's

45:25

that's also cool like I think ultimately

45:28

it really boils down to just being

45:30

passionate about something and like as

45:32

you said hard work being being like

45:34

consistent and not giving up like and

45:36

just being curious because like if you

45:38

didn't just see this random podcast and

45:40

and and you you were like why not let me

45:42

let me tackle this challenge because it

45:44

seems fun like I I think a lot of cool

45:47

cool things can be done by just being

45:49

like that yeah for sure thank you yeah

45:52

like thank you for having me this is

45:53

this has been great like you guys are

45:55

you know you're a good crowd so you know

45:57

this is cool thank you awesome thanks

45:59

thanks for

46:02

coming
