TRANSCRIPT (English)

Visual Guide to Gradient Boosted Trees (xgboost)

4m 8s · 613 words · 120 segments · English

FULL TRANSCRIPT

0:00
Hi everyone, welcome back to another video in our machine learning series. In this video, we'll learn yet another popular model ensembling method called gradient boosted trees. If you haven't already, check out our previous videos to learn about random forests, where we introduced the concept of model ensembling, as well as decision trees, where we talked about the building blocks of these models.

0:25
In this video, we'll use gradient boosted trees to perform classification: specifically, to identify the number drawn in an image. We'll use MNIST, a large database of handwritten digit images commonly used in image processing. It contains 60,000 training images and 10,000 testing images. Each pixel is a feature, and there are 10 possible classes.

0:49
Let's first learn a bit more about the model. Gradient boosted trees and random forests are both ensembling methods that perform regression or classification by combining the outputs from individual trees. However, gradient boosted trees and random forests differ in the way the individual trees are built and in the way the results are combined. As you already know, random forests build independent decision trees and combine them in parallel. Gradient boosted trees, on the other hand, use a method called boosting. Boosting combines weak learners sequentially, so that each new tree corrects the errors of the previous one. Weak learners are usually decision trees with only one split, called decision stumps.

1:32
So the first step is to fit a single decision tree. We'll evaluate how well this tree does using a loss function. There are many different loss functions we can choose from; for multi-class classification, cross-entropy is a popular choice. Here's the equation for cross-entropy, where p is the label distribution and q is the prediction: H(p, q) = -Σ_i p_i log(q_i). Basically, the loss is high when the label and prediction do not agree, and the loss is zero when they're in perfect agreement.
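
The cross-entropy loss described above can be sketched in a few lines of NumPy. This is a minimal illustration (not code from the video), assuming a one-hot label vector p and a predicted probability vector q:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i).

    p: one-hot (or soft) label distribution.
    q: predicted probability distribution.
    eps guards against taking log(0).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(-np.sum(p * np.log(q + eps)))

label = [0, 0, 1]            # the true class is class 2
good_pred = [0.0, 0.0, 1.0]  # perfect agreement -> loss ~ 0
bad_pred = [0.7, 0.2, 0.1]   # confident in the wrong class -> high loss

print(cross_entropy(label, good_pred))
print(cross_entropy(label, bad_pred))
```

Note that only the predicted probability of the true class matters when p is one-hot, which is why a confident wrong prediction is penalized so heavily.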

2:05
Now that we have our first tree and the loss function we'll use to evaluate the model, let's add in a second tree. We want the second tree to be such that, when added to the first, it lowers the loss compared to the first tree alone. Here's what that looks like, where η (eta) is the learning rate: F_2 = F_1 + η · h_2. We want to find the direction in which the loss decreases the fastest; mathematically, this is given by the negative derivative of the loss with respect to the previous model's output. Therefore, we fit the second weak learner on -∂L/∂F_1, which is nothing but the negative gradient of the loss function with respect to the output of the previous model. That's why this method is called gradient boosting.

2:53
For any step m, gradient boosted trees produce a model such that the ensemble at step m equals the ensemble at step m-1 plus the learning rate times the weak learner at step m: F_m = F_{m-1} + η · h_m. We want to choose the learning rate such that we don't step too far in any direction, but at the same time, if the learning rate is too low, the model might take too long to converge to the right answer.
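
The boosting loop described above can be sketched in plain NumPy. This is a toy illustration (not the video's code), using squared loss on a 1-D regression problem because its negative gradient is simply the residual y - F; the stump fitter and all names here are assumptions for the sketch:

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 tree (decision stump) to targets r: pick the split
    on x that minimizes squared error, predict the mean on each side."""
    best = None
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        pred_l, pred_r = left.mean(), right.mean()
        err = ((left - pred_l) ** 2).sum() + ((right - pred_r) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, pred_l, pred_r)
    _, s, pred_l, pred_r = best
    return lambda z: np.where(z <= s, pred_l, pred_r)

def gradient_boost(x, y, n_trees=50, eta=0.3):
    """F_m = F_{m-1} + eta * h_m, where each weak learner h_m is fit
    to the negative gradient of the loss at the current ensemble.
    For squared loss L = (y - F)^2 / 2, that gradient is y - F."""
    F = np.zeros_like(y)
    trees = []
    for _ in range(n_trees):
        residual = y - F              # -dL/dF for squared loss
        h = fit_stump(x, residual)
        trees.append(h)
        F = F + eta * h(x)            # the boosting update
    return lambda z: eta * sum(h(z) for h in trees)

x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x)
model = gradient_boost(x, y)
print(np.abs(model(x) - y).max())  # error shrinks as rounds accumulate
```

Each stump alone is a crude step function, but summing many of them, each scaled by η and fit to the previous ensemble's errors, produces a close fit.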

3:20
Compared to random forests, gradient boosted trees have a lot of model capacity, so they can model very complex relationships and decision boundaries. However, as with all high-capacity models, this can lead to overfitting very quickly, so be careful.


3:39
We fit a gradient boosted trees model using the XGBoost library on MNIST with 330 weak learners and achieved 89% accuracy. Try it out for yourself using the link in the description, and let us know your thoughts. Don't forget to subscribe to Reconnaissance for videos on machine learning and more.
