Poisson Distribution EXPLAINED in UNDER 15 MINUTES!
FULL TRANSCRIPT
[Music]
all right folks it's the Poisson
distribution for today named after a
French dude that did a whole bunch of
stuff in stats but for our purposes we
only care about his work as it relates
to probability distributions so here's a
quick rundown firstly we know it's a
discrete distribution meaning
there's only a discrete set of values
that this distribution can take but what
does this distribution actually describe
well it describes the number of events
occurring in a fixed time interval or
region of opportunity so the classic
example is you know how many customers
does a bank teller get every hour so in
that instance there's a fixed time
interval of 1 hour and this distribution
might describe the number of events that
occur within that hour but I'm not
too sure that that many people are going
to bank tellers these days perhaps a
better modern day equivalent might be
the people arriving at an Apple Genius
Bar because their freaking iPhone 7
headphones don't
work back on topic the next feature
about the distribution is that it
requires only one parameter which is the
expected number of events per time
interval
Lambda so in this case I've put a
distribution here with Lambda equaling 3
so maybe that's three customers every
hour or whatever but the other thing to
note is that it's bounded by zero and
infinity so unlike the binomial
distribution if you've been watching
this series the Poisson actually continues
on forever my graph here stops at 10 but
there's theoretically a very small
probability of there being 10 events in
this time interval and there's also a
smaller probability of there being 11
and 12 and 13 Etc it's just I haven't
included them on the graph because they
become
negligible nonetheless it's a
theoretical difference between this and
the binomial distribution which needs to
be
appreciated now what are the assumptions
underlying the Poisson distribution
firstly the rate at which events occur
must be constant another way of saying
this is that the probability of an event
occurring in a certain time interval
should be exactly the same for
every other time interval of that same
length the other assumption is that the
occurrence of one event does not affect
the occurrence of a subsequent event i.e.
the events are independent so it
shouldn't matter if an event just
happened that shouldn't influence the
time interval till the next
event now these assumptions won't
necessarily hold in reality so it's
often good to appreciate when you're
using the Poisson distribution just how
relevant it is to the question at hand
and we'll see with some examples here
whether these assumptions are going to
break
down so the next thing we can learn
about the Poisson distribution is the PMF or
the probability mass function now all
that means is the height of each of
these discrete outcomes or the
probability of getting each of these
discrete outcomes when the mean is three
in this case so for example if I wanted
to find the probability of getting say
five events happening in this time
interval I can use the formula subbing
in the value five for x and subbing in
the value three for Lambda because don't
forget three is still our mean or
expected number of events per this time
period so I can actually do this by hand
and in that case I'd get
0.101 but of course it is possible to
use the wonders of Excel to do exactly
the same thing so if you use the POISSON.DIST
function now these are the new
statistical functions that were
introduced to Excel in I think the 2010
version but they're all standardized now
which makes things really easy you might
find other sources giving you the old
formula here and that'll work too but I
think it's good to start using these .DIST
functions because you'll see
they're all exactly the same once you
get the handle on them so POISSON.DIST
requires three different arguments the
first of which is the value we're
seeking the number of events for which
we're seeking the probability so if we
want five in that case we'll put five as
our first argument the second argument
requires the mean for the Poisson
distribution in this case that's three
and the third argument requires you to
tell it whether you want the cumulative
distribution which is called the CDF or
whether you want the probability mass
function the PMF and of course we want
the latter so we write FALSE which tells
it that we don't want the cumulative
distribution we want the PMF so
it'll give us 0.101 as well so there's a
roughly 10.1% chance of
getting five events occurring in this
time interval all right so let's talk
about the CDF now the cumulative
distribution function now that's not the
height of a certain individual discrete
outcome that is the cumulative
distribution so all of the heights put
together up until that point and then
there is a formula for that involving a
gamma function and all this stuff
which look you're not going to really
need to know unless you're doing higher
order statistical stuff but for this
purpose we can use POISSON.DIST again but
make sure we write TRUE as that third
argument and in that case it's going to
sum up for us all of these bars up to
and including the outcome where there
are five
events so it'll sum up all six of those bars from zero to five
and we get
0.916 the other unique thing about the
Poisson distribution is that its expected
value which we kind of need to be told
is actually equal to its variance so
Lambda is also the variance of the
distribution so here we've got a mean of
three and we'd also have a standard
deviation of the square root of three
okay remembering that the standard
deviation is the square root of the
variance
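The numbers quoted so far for Lambda equaling 3 can also be reproduced outside Excel. What follows is a minimal Python sketch, not something shown in the video. The helper names poisson_pmf and poisson_cdf are made up for illustration, and the formula used, P(X = x) = e^(-Lambda) * Lambda^x / x!, is the standard Poisson PMF, which the video refers to but never reads out. The mean and variance are approximated by summing over the first 60 outcomes, on the assumption that the tail beyond that is negligible for Lambda equaling 3.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam (hypothetical helper)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x): the PMF bars for 0, 1, ..., x added together."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 3

pmf_5 = poisson_pmf(5, lam)  # plays the role of POISSON.DIST(5, 3, FALSE)
cdf_5 = poisson_cdf(5, lam)  # plays the role of POISSON.DIST(5, 3, TRUE)

# Approximate the mean and variance by summing over outcomes 0..59;
# the probability mass beyond that is negligible when lam is 3.
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in range(60))
std_dev = math.sqrt(variance)

print(f"P(X = 5)  = {pmf_5:.3f}")    # 0.101
print(f"P(X <= 5) = {cdf_5:.3f}")    # 0.916
print(f"mean      = {mean:.3f}")     # 3.000
print(f"variance  = {variance:.3f}") # 3.000
print(f"std dev   = {std_dev:.3f}")  # 1.732
```

The mean and the variance both come out at 3, and the standard deviation is the square root of 3, roughly 1.732.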
and what I've done now is just provided
for you a couple of Poisson distributions
for differing values of Lambda just so
you get a sense of what they look like
so this is a Poisson distribution with
Lambda that's the expected value being
one and this one is for where Lambda is
two here's Lambda being three and four
and five now of course in each of these
circumstances the distribution continues
Beyond 10 but I've just left the scale
constant so you can kind of get a sense
of how these flow from 1 2 3 4 and
5 but be aware too that the mean Lambda
doesn't need to be an integer value you
can also have a Poisson distribution
with a mean of 3.61 for example or even
a mean of 0.5 so there's no requirement
for that Lambda to be a full whole
number all right so it's time for you to
do a question let's give this a read
Exclusive Vines import Argentinian wine
into Australia and they've begun
advertising on Facebook to direct
traffic to their website where customers
can order wine online the number of
click-through sales from the ad is Poisson
distributed with a mean of 12
click-through sales per day okay so we've
got a mean of 12 that will be our Lambda
value now I've got three questions for
you and I'm hoping that you can pause
the video here and give these a go and
then see if we get the same answer but
we're after the probability of getting
exactly 10 click-through sales in the
first day at least 10 click-through
sales in the first day and then more
than one sale in the first hour so see
how you go with those and I'm also going
to give you a bonus question do you
think the Poisson distribution is actually
appropriate for this scenario in reality
so hopefully you can think about those
assumptions and figure out whether they
would hold in this case so here's the
answer to part A the probability of 10
click-through sales in the first day is
equivalent to the height of this bar
here now this is a Poisson distribution
where the mean is 12 Lambda is 12 of
course it goes a little Beyond in this
direction down to zero and on this
direction it goes up to Infinity but how
do we find the probability of that bar
we can use the formula for the PMF and
just sub in those values with 12 being
Lambda and 10 being x and we get a
value of
0.105 which would be the same result if
you used =POISSON.DIST
and subbed in 10, 12 and
FALSE so there's a 10.5% chance of
getting 10 click-through sales in that
first day so there's the probability
Illustrated on the
plot all right Part B what's the
probability of at least 10 click-through
sales on the first
day so how do we find the probability of
X being greater than or equal to
10 now that's equivalent to this whole
yellow area over here if we sum up all
of those together going from 10 up until
Infinity well unfortunately in Excel
there's no way of finding the
probability of getting a value or higher
so we're going to have to use the CDF
which is the probability of getting a
value or lower and subtracting it from
one now here's the trick we're actually
going to go one minus POISSON.DIST and nine is
the value of interest we're going to use
here because if you think about it here
are those green
bars we're going to go one minus the
probability of all of these green bars
put together which is 9 and
below so you have to use your brain a
little bit with some of this stuff
knowing that it said at least 10
click-through sales we know we were after
10 and above which is 1 minus the
probability of nine and
below but putting in the appropriate
formula here we can get
0.758 and hopefully you got that exact
answer what about the probability that
we have more than one click-through sale
in the first
hour well this is where the properties
of a Poisson distribution show themselves if
we know there's an average of 12
click-through sales per day if
it's truly a Poisson distribution the
mean number of sales per hour will be
0.5 because all we do is just divide 12 by
the 24 hours in a day so this
becomes our new value of
Lambda so this is the distribution now
where we've got a Lambda value of
0.5 so most of the distribution is going
to be down here at 0 and 1 because we're
only expecting 0.5 sales per hour so
we're most likely to get zero sales in a
given hour potentially we can get one
and then it becomes less likely to get
two three and very unlikely to get four
five and six and Beyond so really we're
after this shaded yellow region which
will include all of those values from
Two and
Beyond to get that it's going to be very
much like the last example we're going
to go 1 minus POISSON.DIST where we're
taking the cumulative distribution so
putting TRUE in that third argument but
we're evaluating it at one where the mean
is 0.5 so we're going to be subtracting
from one these two bars
here which is
0.090 so there's only a 9% probability
of getting more than one click-through
sale in the first hour and I'll just
reiterate that it's very important
to read strictly what was written in the
question here because it says more than
one click through sale if it said one or
more we'd actually get a different
answer because we'd be after this
probability as well so this would also
be yellow where X is
one all right so how did you go
hopefully you got those same
answers question D the bonus question
asks you to think about what a Poisson
distribution kind of is and whether it's
appropriate for this scenario in
reality now I'll return you to our
assumption where it said that the
rate at which events occur must be
constant so in other words no interval
can be more likely to have an event than
any other interval of the same
size now if you're dealing with clicks
on Facebook over the course of a day
it's very unlikely for it to be a
constant
rate how many people do you think are
going to be clicking on ads for wine at
about 2:00 a.m. in the morning well
actually that's probably quite a few how
many people are going to be looking at
it at say 6:00 a.m. in the morning might
be a better
question not very many right so
irrespective of when in the day people
would be likely to click on this you can
get the sense that people's usage of
Facebook would differ throughout the
hours of the day and also their
likelihood to be attracted by an ad for
wine so in reality it would not really
be a Poisson distribution but nonetheless it
is often good to use these types of
distributions as it can still give a
decent picture of what's going on feel
free to click through to the next video
in the series where I deal with the
hypergeometric
distribution and any poker players might
be interested in that one that is the
classic poker hand
distribution but yep I'll just leave
these up here and if you want to keep in
touch you can do so through these
links
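For anyone who wants to check the three Exclusive Vines answers without Excel, here is a minimal Python sketch under the same assumptions as the question: click-through sales are Poisson with a mean of 12 per day, and the hourly mean is taken as 12 divided by 24. The helper names poisson_pmf and poisson_cdf are made up for illustration and are not from the video.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam (hypothetical helper)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x): the PMF bars for 0, 1, ..., x added together."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam_day = 12             # mean click-through sales per day (from the question)
lam_hour = lam_day / 24  # 0.5 per hour, assuming a constant rate all day

# Part A: exactly 10 sales in a day, like POISSON.DIST(10, 12, FALSE)
p_exactly_10 = poisson_pmf(10, lam_day)

# Part B: at least 10 sales in a day = 1 - P(X <= 9),
# like 1 - POISSON.DIST(9, 12, TRUE)
p_at_least_10 = 1 - poisson_cdf(9, lam_day)

# Part C: more than one sale in an hour = 1 - P(X <= 1),
# like 1 - POISSON.DIST(1, 0.5, TRUE)
p_more_than_1 = 1 - poisson_cdf(1, lam_hour)

print(f"A: {p_exactly_10:.3f}")   # 0.105
print(f"B: {p_at_least_10:.3f}")  # 0.758
print(f"C: {p_more_than_1:.3f}")  # 0.090
```

Parts B and C both rely on subtracting the cumulative probability from one, so the value passed to the CDF is one less than the smallest outcome being counted: 9 for "at least 10" and 1 for "more than one".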