Poisson Distribution EXPLAINED in UNDER 15 MINUTES!

0:01

[Music]

0:14

all right folks it's the Poisson

0:16

distribution for today named after a

0:18

French dude that did a whole bunch of

0:19

stuff in stats but for our purposes we

0:22

only care about his work as it relates

0:24

to probability distributions so here's a

0:27

quick rundown firstly we know it's a

0:29

discrete distribution meaning

0:31

there's only a discrete set of values

0:33

that this distribution can take but what

0:36

does this distribution actually describe

0:38

well it describes the number of events

0:41

occurring in a fixed time interval or

0:43

region of opportunity so the classic

0:46

example is you know how many customers

0:49

does a bank teller get every hour so in

0:52

that instance there's a fixed time

0:54

interval of 1 hour and this distribution

0:56

might describe the number of events that

0:58

occur within that hour but I'm not

1:01

too sure that that many people are going

1:03

to bank tellers these days perhaps a

1:05

better modern day equivalent might be

1:07

the people arriving at an Apple Genius

1:09

Bar because their freaking iPhone 7

1:11

headphones don't

1:12

work back on topic the next feature

1:15

about the distribution is that it

1:17

requires only one parameter which is the

1:20

expected number of events per time

1:22

interval

1:24

Lambda so in this case I've put a

1:26

distribution here with Lambda equaling 3

1:29

so maybe that's three customers every

1:31

hour or whatever but the other thing to

1:34

note is that it's bounded by zero and

1:37

infinity so unlike the binomial

1:40

distribution if you've been watching

1:41

this series the Poisson actually continues

1:45

on forever my graph here stops at 10 but

1:48

there's theoretically a very small

1:50

probability of there being 10 events in

1:54

this time interval and there's also a

1:56

smaller probability of there being 11

1:58

and 12 and 13 Etc it's just I haven't

2:01

included them on the graph because they

2:03

become

2:04

negligible nonetheless it's a

2:06

theoretical difference between this and

2:08

the binomial distribution which needs to

2:10

be

2:13

appreciated now what are the assumptions

2:15

underlying the Poisson distribution

2:18

firstly the rate at which events occur

2:20

must be constant another way of saying

2:23

this is that the probability of an event

2:26

occurring in a certain time interval

2:28

should be exactly the same for

2:30

every other time interval of that same

2:33

length the other assumption is that the

2:35

occurrence of one event does not affect

2:38

the occurrence of a subsequent event i.e.

2:41

the events are independent so it

2:43

shouldn't matter if an event just

2:45

happened that shouldn't influence the

2:47

time interval till the next

2:49

event now these assumptions won't

2:51

necessarily hold in reality so it's

2:54

often good to appreciate when you're

2:56

using the Poisson distribution just how

2:58

relevant it is to the question at hand

3:01

and we'll see with some examples here

3:03

whether these assumptions are going to

3:05

break

3:07

down so the next thing we can learn

3:09

about the Poisson distribution is the PMF or

3:13

the probability mass function now all

3:15

that means is the height of each of

3:17

these discrete outcomes or the

3:19

probability of getting each of these

3:21

discrete outcomes when the mean is three

3:24

in this case so for example if I wanted

3:26

to find the probability of getting say

3:29

five events happening in this time

3:31

interval I can use the formula subbing

3:34

in the value five for x and subbing in

3:37

the value three for Lambda because don't

3:39

forget three is still our mean or

3:42

expected number of events per this time

3:45

period so I can actually do this by hand

3:47

and in that case I'd get

3:50

0.101 but of course it is possible to

3:52

use the wonders of excel to do exactly

3:55

the same thing so if you use the

3:59

POISSON.DIST function now these are the new

4:02

statistical functions that were

4:03

introduced to excel in I think the 2013

4:06

version but they're all standardized now

4:09

which makes things really easy you might

4:11

find other sources giving you the old

4:14

formula here and that'll work too but I

4:16

think it's good to start using these

4:18

.DIST functions because you'll see

4:20

they're all exactly the same once you

4:22

get the handle on them so POISSON.DIST

4:25

requires three different arguments the

4:28

first of which is the value we're

4:30

seeking the number of events for which

4:32

we're seeking the probability so if we

4:34

want five in that case we'll put five as

4:36

our first argument the second argument

4:39

requires the mean for the Poisson

4:41

distribution in this case that's three

4:43

and the third argument requires you to

4:45

tell it whether you want the cumulative

4:48

distribution which is called the CDF or

4:51

whether you want the probability mass

4:53

function the PMF and of course we want

4:55

the latter so we write FALSE that tells

4:57

it that we don't want the cumulative

4:59

distribution we want the PMF so

5:02

it'll give us 0.101 as well so there's a

5:05

10% chance roughly 10.1% chance of

5:09

getting five events occurring in this

5:12

time interval all right so let's talk

5:14

about the CDF now the cumulative

5:17

distribution function now that's not the

5:19

height of a certain individual discrete

5:21

outcome that is the cumulative

5:24

distribution so all of the heights put

5:26

together up until that point and then

5:29

there is a formula for that involving a

5:31

gamma distribution and all this stuff

5:33

which look you're not going to really

5:35

need to know unless you're doing higher

5:37

order statistical stuff but for this

5:40

purpose we can use POISSON.DIST again but

5:43

make sure we write true as that third

5:46

argument and in that case it's going to

5:48

sum up for us all of these bars up to

5:51

and including the outcome where there

5:54

are five

5:55

events so it'll sum up all of those bars from zero to five

5:58

and we get

6:01

0.916 the other unique thing about the

6:03

Poisson distribution is that its expected

6:06

value which we kind of need to be told

6:09

is actually equal to its variance so

6:12

Lambda is also the variance of the

6:14

distribution so here we've got a mean of

6:17

three and we'd also have a standard

6:20

deviation of the square root of three

6:23

okay remembering that the standard

6:24

deviation is the square root of the

6:28

variance
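
As a rough cross-check of the figures quoted so far for a mean of three, here is a minimal Python sketch using only the standard library. It assumes the standard Poisson formula P(X = x) = e^(-lambda) * lambda^x / x!, which is presumably the formula shown on screen. The helper names `poisson_pmf` and `poisson_cdf` are illustrative and not from the video; they play the role of POISSON.DIST with FALSE and TRUE as the third argument.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam (hypothetical helper)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): the PMF bars summed over 0, 1, ..., x (hypothetical helper)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 3  # expected number of events per interval, as in the plotted example

print(round(poisson_pmf(5, lam), 3))  # 0.101, like POISSON.DIST(5, 3, FALSE)
print(round(poisson_cdf(5, lam), 3))  # 0.916, like POISSON.DIST(5, 3, TRUE)

# The variance equals the mean, so the standard deviation is sqrt(lam).
print(round(math.sqrt(lam), 3))       # 1.732
```

Summing the PMF directly is fine for small counts like these; for large means a statistics library would likely be the more robust choice.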

6:31

and what I've done now is just provided

6:32

for you a couple of Poisson distributions

6:35

for differing values of Lambda just so

6:37

you get a sense of what they look like

6:39

so this is a Poisson distribution with

6:41

Lambda that's the expected value being

6:44

one and this one is for where Lambda is

6:47

two here's Lambda being three and four

6:52

and five now of course in each of these

6:55

circumstances the distribution continues

6:57

Beyond 10 but I've just left the scale

7:00

constant so you can kind of get a sense

7:02

of how these flow from 1 2 3 4 and

7:06

5 but be aware too that the mean Lambda

7:11

doesn't need to be an integer value you

7:13

can also have a Poisson distribution

7:15

with a mean of 3.61 for example or even

7:19

a mean of 0.5 so there's no requirement

7:22

for that Lambda to be a full whole

7:24

number all right so it's time for you to

7:26

do a question let's give this a read

7:29

Exclusive Vines import Argentinian wine

7:32

into Australia and they've begun

7:34

advertising on Facebook to direct

7:36

traffic to their website where customers

7:38

can order wine online the number of

7:41

click-through sales from the ad is Poisson

7:43

distributed with a mean of 12

7:45

clickthrough sales per day Okay so we've

7:48

got a mean of 12 that will be our Lambda

7:52

value now I've got three questions for

7:54

you and I'm hoping that you can pause

7:56

the video here and give these a go and

7:58

then see if we get the same answer but

8:00

we're after the probability of getting

8:01

exactly 10 click-through sales in the

8:04

first day at least 10 click-through

8:06

sales in the first day and then more

8:08

than one sale in the first hour so see

8:10

how you go with those and I'm also going

8:12

to give you a bonus question do you

8:14

think the Poisson distribution is actually

8:16

appropriate for this scenario in reality

8:19

so hopefully you can think about those

8:20

assumptions and figure out whether they

8:22

would hold in this case so here's the

8:25

answer to part A the probability of 10

8:28

clickthrough sales in the first day is

8:31

equivalent to the height of this bar

8:34

here now this is a Poisson distribution

8:37

where the mean is 12 Lambda is 12 of

8:40

course it goes a little Beyond in this

8:43

direction down to zero and on this

8:45

direction it goes up to Infinity but how

8:47

do we find the probability of that bar

8:50

we can use the formula for the PMF and

8:53

just sub in those values for 12 being

8:56

Lambda and 10 being X and we get a

8:59

value of

9:00

0.105 which would be the same result if

9:03

you used =POISSON.DIST

9:05

and subbed in 10 12 and

9:09

false so there's a 10.5% chance of

9:13

getting 10 click-through sales in that

9:15

first day so there's the probability

9:18

illustrated on the

9:20

plot all right Part B what's the

9:23

probability of at least 10 clickthrough

9:25

sales on the first

9:27

day so how do we find the probability of

9:29

X being greater than or equal to

9:32

10 now that's equivalent to this whole

9:35

yellow area over here if we sum up all

9:38

of those together going from 10 up until

9:41

Infinity well unfortunately in Excel

9:43

there's no way of finding the

9:45

probability of getting a value or higher

9:49

so we're going to have to use the CDF

9:51

which is the probability of getting a

9:53

value or lower and subtracting it from

9:56

one now here's the trick we're actually

9:59

going to go one minus POISSON.DIST nine is

10:04

the value of Interest we're going to use

10:05

here because if you think about it here

10:07

are those green

10:08

bars we're going to go one minus the

10:11

probability of all of these green bars

10:13

put together which is 9 and

10:16

below so you have to use your brain a

10:19

little bit with some of this stuff

10:21

knowing that it said at least 10

10:23

clickthrough sales we know we were after

10:26

10 and above which is 1 minus the

10:29

probability of nine and

10:33

Below but putting in the appropriate

10:36

formula here we can get

10:39

0.758 and hopefully you got that exact

10:44

answer what about the probability that

10:46

we have more than one click-through sale

10:48

in the first

10:50

hour well this is where the properties

10:52

of a Poisson distribution show themselves if

10:55

we know there's an average of 12

10:57

click-through sales in the first day if

11:00

it's truly a Poisson distribution the

11:03

mean number of sales per hour will be

11:06

0.5 because all we do is just divide by

11:09

the total number of hours so this

11:12

becomes our new value of

11:16

Lambda so this is the distribution now

11:18

where we've got a Lambda value of

11:22

0.5 so most of the distribution is going

11:24

to be down here at 0 and 1 because we're

11:26

only expecting 0.5 sales per hour so

11:31

we're most likely to get zero sales in a

11:34

given hour potentially we can get one

11:37

and then it becomes less likely to get

11:38

two three and very unlikely to get four

11:41

five and six and Beyond so really we're

11:44

after this shaded yellow region which

11:47

will include all of those values from

11:48

Two and

11:50

Beyond to get that it's going to be very

11:52

much like the last example we're going

11:53

to go 1 minus POISSON.DIST where we're

11:57

taking the cumulative distribution so

11:59

putting true in that third argument but

12:01

we're doing it from one where the mean

12:04

is 0.5 so we're going to be subtracting

12:06

from one these two bars

12:10

here which is

12:13

0.090 so there's only a 9% probability

12:16

of getting more than one click-through

12:19

sale in the first hour and I'll just

12:21

reiterate that it's very important

12:24

to read strictly what was written in the

12:26

question here because it says more than

12:28

one click through sale if it said one or

12:31

more we'd actually get a different

12:33

answer because we'd be after this

12:35

probability as well so this would also

12:37

be yellow where X is

12:40

one all right so how did you go

12:42

hopefully you got those same

12:45

answers question D the bonus question

12:48

asks you to think about what a Poisson

12:50

distribution kind of is and whether it's

12:52

appropriate for this scenario in

12:55

reality now I'll return you to our

12:57

assumption where it said that the

12:59

rate at which events occur must be

13:01

constant so in other words no interval

13:04

can be more likely to have an event than

13:06

any other interval of the same

13:08

size now if you're dealing with clicks

13:11

on Facebook over the course of a day

13:15

it's very unlikely for it to be a

13:17

constant

13:19

rate how many people do you think are

13:21

going to be clicking on ads for wine at

13:23

about 2:00 a.m. in the morning well

13:26

actually that's probably quite a few how

13:28

many people are going to be looking at

13:29

it at say 6:00 a.m. in the morning might

13:31

be a better

13:32

question not very many right so

13:35

irrespective of when in the day people

13:37

would be likely to click on this you can

13:39

get the sense that people's usage of

13:41

Facebook would differ throughout the

13:43

hours of the day and also their

13:45

likelihood to be attracted by an ad for

13:48

wine so in reality it would not really

13:51

be a Poisson distribution but nonetheless it

13:54

is often good to use these types of

13:57

distributions as it can still give a

13:59

decent picture of what's going on feel

14:01

free to click through to the next video

14:03

in the series where I deal with the

14:05

hypergeometric

14:06

distribution and any poker players might

14:09

be interested in that one that is the

14:10

classic poker hand

14:14

distribution but yep I'll just leave

14:16

these up here and if you want to keep in

14:18

touch you can do so through these

14:22

links
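
The three worked answers for the wine example can be reproduced with a short Python sketch, again using only the standard library. The helper and variable names (`poisson_pmf`, `poisson_cdf`, `daily_mean`, `hourly_mean`) are hypothetical, and the hourly mean assumes, as the video does, that the daily rate of 12 is spread evenly over 24 hours.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam (hypothetical helper)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): the PMF summed over 0, 1, ..., x (hypothetical helper)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


daily_mean = 12                 # click-through sales per day
hourly_mean = daily_mean / 24   # 0.5 per hour, assuming a constant rate

# (a) exactly 10 sales in a day: the height of a single bar
part_a = poisson_pmf(10, daily_mean)

# (b) at least 10 sales in a day: 1 minus the probability of 9 or fewer
part_b = 1 - poisson_cdf(9, daily_mean)

# (c) more than 1 sale in an hour: 1 minus the probability of 0 or 1
part_c = 1 - poisson_cdf(1, hourly_mean)

print(f"{part_a:.3f}")  # 0.105
print(f"{part_b:.3f}")  # 0.758
print(f"{part_c:.3f}")  # 0.090
```

Note the cutoffs: "at least 10" subtracts the CDF at 9, while "more than one" subtracts the CDF at 1. This is the same read-the-question-carefully point made in the video.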
