10 Discrete Probability Distributions
In this and the next chapter, we learn about probability distributions. In this chapter, we focus on discrete probability distributions for discrete random variables. We first learn what a discrete probability distribution is, then study the two most popular discrete distributions, the binomial and Poisson distributions. We will also learn how to use R and Python to calculate probabilities from these distributions.
10.1 Probability Mass Function
A discrete random variable uses a probability function, or probability mass function (pmf), to describe its randomness, and the pmf is its associated discrete probability distribution. The probability mass function of a discrete random variable $X$ is the function $p(x) = P(X = x)$, defined for every possible value $x$ of $X$.
The probability distribution for a discrete random variable can be presented as a table, a graph, or a formula that gives $P(X = x)$ for every possible value $x$.
Example: 🪙🪙 Toss a fair coin twice independently, and let $X$ be the number of heads in the two tosses.
With this example, first note that the possible values of $X$ are 0, 1, and 2.
To present its probability distribution in a table, for each possible value of $X$ we list the probability $P(X = x)$:

x | 0 | 1 | 2
P(X = x) | 0.25 | 0.5 | 0.25

Let me show the calculation of the probabilities. The sample space is $\{HH, HT, TH, TT\}$, and the four outcomes are equally likely. Therefore $P(X = 0) = P(TT) = 1/4$, $P(X = 1) = P(HT \text{ or } TH) = 2/4 = 1/2$, and $P(X = 2) = P(HH) = 1/4$.
We can also display the probability distribution using a graph. We put the possible values of $X$ on the x-axis and the probabilities $P(X = x)$ on the y-axis.
If you love math, you can specify the probability distribution using a mathematical formula:
$$P(X = x) = \binom{2}{x}\left(\frac{1}{2}\right)^{2}, \quad x = 0, 1, 2.$$
Now let’s talk a little bit about the properties of a pmf.
- $0 \le P(X = x) \le 1$ for every value of $x$. Remember that $P(X = x)$ is the probability of some event, for example $\{X = 1\}$, and we’ve learned that $0 \le P(A) \le 1$ for any event $A$.
- The probabilities for a discrete random variable are additive: $P(X = x_1 \text{ or } X = x_2) = P(X = x_1) + P(X = x_2)$, because $\{X = x_1\}$ and $\{X = x_2\}$ are disjoint for any two distinct possible values $x_1$ and $x_2$. In our coin-tossing example, $\{X = 0\}$ and $\{X = 1\}$ are disjoint because when tossing a fair coin two times independently, we cannot have no heads and one head at the same time.
- $\sum_{x} P(X = x) = 1$, where $x$ runs over all possible values. In our coin-tossing example, $P(X = 0) + P(X = 1) + P(X = 2) = 0.25 + 0.5 + 0.25 = 1$. The reason is that $\{X = 0\}$, $\{X = 1\}$, and $\{X = 2\}$ form a partition of the entire sample space, which here is $S = \{HH, HT, TH, TT\}$. Therefore $P(X = 0) + P(X = 1) + P(X = 2) = P(S) = 1$.
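As a quick check in R, we can store the pmf of the coin-tossing example as a vector and verify these properties. This is only an illustrative sketch; the object names x and p are not from the original text.

x <- c(0, 1, 2)            # possible values of X
p <- c(0.25, 0.5, 0.25)    # P(X = x)
all(p >= 0 & p <= 1)       # each probability is between 0 and 1: TRUE
sum(p)                     # the probabilities add up to 1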
Mean
Remember that a probability distribution describes the randomness pattern of a random variable $X$. Like a data set, a probability distribution can be summarized by its central tendency and its variability.
For the central tendency, we consider the mean of a discrete random variable $X$, denoted $\mu$ or $E(X)$, defined as
$$\mu = E(X) = \sum_{x} x \, P(X = x).$$
The Greek letter $\mu$ (read “mu”) denotes the mean of a random variable or, equivalently, of its probability distribution.
The mean of a discrete random variable $X$ is therefore a weighted average of its possible values, with weights given by their probabilities. As an exercise, compute the mean of $X$ in our coin-tossing example.
If you calculate it correctly, your answer should be one. It tells us that if we toss a fair coin twice independently, on average we’ll see one head showing up.
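A minimal R sketch of this calculation, reusing the illustrative x and p vectors defined above:

mu <- sum(x * p)    # weighted average of the possible values
mu                  # 0*0.25 + 1*0.5 + 2*0.25 = 1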
Variance
Just as we calculate the sample variance from sample data, we can calculate the variance of a random variable $X$ from its probability distribution:
$$\sigma^2 = Var(X) = \sum_{x} (x - \mu)^2 \, P(X = x).$$
The standard deviation of $X$ is $\sigma = \sqrt{Var(X)}$.
Intuitively, the variance of a discrete random variable $X$ measures how spread out the possible values of $X$ are around the mean $\mu$, with each squared deviation weighted by its probability.
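Continuing the illustrative sketch with the same x, p, and mu objects:

sigma2 <- sum((x - mu)^2 * p)
sigma2          # (0-1)^2*0.25 + (1-1)^2*0.5 + (2-1)^2*0.25 = 0.5
sqrt(sigma2)    # standard deviation, about 0.71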
- The mean ($\mu$) and variance ($\sigma^2$) of a random variable or probability distribution are not the same as the sample mean ($\bar{x}$) and sample variance ($s^2$) calculated from sample data. Their main difference will be discussed in the Inference part.
We have learned about discrete probability distributions in general. Next we are going to learn two popular discrete probability distributions, the binomial and the Poisson distribution.
10.2 Binomial Distribution
Binomial Experiment and Random Variable
The binomial distribution is generated from the so-called binomial experiment, which has the following properties:
👉 The experiment consists of a fixed number of identical trials, say $n$. That is, $n$ is pre-specified before the experiment is conducted, and remains unchanged while the experiment is in progress. Also, all the trials or repetitions in the experiment should be performed under exactly the same conditions and procedure.
👉 Each trial results in one of exactly two outcomes. In practice, we use success (S) and failure (F) to represent the two outcomes. The word success just means one of the two outcomes and does not necessarily mean something good. 😲 Depending on your research question, you could define Drug abuse as success and No drug abuse as failure.
👉 Trials are independent, meaning that the outcome of one trial does not affect the outcome of any other trial.
👉 The probability of success, say $p$, is constant for all trials.
If a binomial experiment is conducted, and the random variable $X$ counts the number of successes in the $n$ trials, then $X$ is called a binomial random variable, and we write $X \sim \text{binomial}(n, p)$.
Distribution
The probability function of a binomial random variable $X$ is determined by the two parameters $n$ and $p$. The binomial probability function is
$$P(X = x) = \binom{n}{x}\, p^{x} (1-p)^{n-x}, \quad x = 0, 1, \dots, n.$$
It is not that important to memorize the formula, as nowadays we use computing software to obtain the probabilities.
It can be shown that the binomial distribution has mean $\mu = np$ and variance $\sigma^2 = np(1-p)$.
Back to the coin-tossing example: is $X$, the number of heads in two tosses, a binomial random variable? To answer this question, we need to check whether the experiment satisfies the four properties.
- The number of trials is fixed at 2, and the two trials, each a toss of a fair coin, are identical (if you are not nitpicking).
- Each trial results in one of exactly two outcomes, heads or tails.
- Trials are independent. Well, it’s hard to say for sure, but at least they are nearly independent.
- The probability of heads is 1/2 for all trials because we have a fair coin.

So, yes: $X$, the number of heads in two independent tosses of a fair coin, is a binomial random variable, and $X \sim \text{binomial}(2, 1/2)$.
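Since $X \sim \text{binomial}(2, 1/2)$, the mean and variance formulas above should reproduce the values we computed directly from the pmf. A quick R check (the object names are illustrative):

n_trials <- 2
p_heads <- 0.5
n_trials * p_heads                   # mean np = 1, matching sum(x * p) above
n_trials * p_heads * (1 - p_heads)   # variance np(1-p) = 0.5, matching the pmf calculation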
Example
Assume that 20% of all drivers have a blood alcohol level above the legal limit. For a random sample of 15 vehicles, compute the probability that:
1. Exactly 6 of the 15 drivers will exceed the legal limit.
2. Of the 15 drivers, 6 or more will exceed the legal limit.
Suppose this is a binomial experiment: each of the $n = 15$ randomly sampled drivers either exceeds the legal limit (success) or does not (failure), with success probability $p = 0.2$. Let $X$ be the number of the 15 drivers who exceed the legal limit, so $X \sim \text{binomial}(15, 0.2)$.
Since we know the values of the parameters $n = 15$ and $p = 0.2$, the distribution of $X$ is completely determined, and any probability about $X$ can be calculated.
To answer the first question, we just need to calculate
$$P(X = 6) = \binom{15}{6}(0.2)^{6}(0.8)^{9} \approx 0.043.$$
The second question asks for
$$P(X \ge 6) = 1 - P(X \le 5) = 1 - \sum_{x=0}^{5} \binom{15}{x}(0.2)^{x}(0.8)^{15-x} \approx 0.061.$$
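If you would like to verify the arithmetic, the formulas can be evaluated term by term in R with the built-in choose() function (a quick sketch):

choose(15, 6) * 0.2^6 * 0.8^9                           # P(X = 6), about 0.043
1 - sum(choose(15, 0:5) * 0.2^(0:5) * 0.8^(15 - 0:5))   # P(X >= 6), about 0.061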
Well, I believe you’ve seen how tedious calculating a binomial probability by hand is! This is especially true when the question involves a cumulative probability and many terms of the formula have to be added up.
In practice, we are not going to calculate probabilities of the binomial or other commonly used probability distributions by hand. Instead, we use computing software. In R we can use the dpqr family of functions to calculate probabilities or generate values from the distributions. In general, for some distribution, abbreviated dist, R has the following functions:
- ddist(x, ...): calculate the density value $f(x)$ or the probability value $P(X = x)$.
- pdist(q, ...): calculate $P(X \le q)$.
- qdist(p, ...): obtain the quantile of probability $p$, i.e., the value $q$ such that $P(X \le q) = p$.
- rdist(n, ...): generate $n$ random numbers from the distribution.
When we use these functions, the dist
is replaced with the shortened name of the distribution we consider. For example, we use dbinom(x, ...)
to do the calculation for the binomial distribution.
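The q and r functions work the same way. For example, with the binomial distribution used below (a quick sketch; the particular values are chosen only for illustration):

qbinom(p = 0.5, size = 15, prob = 0.2)   # smallest x with P(X <= x) >= 0.5; equals 3 here
rbinom(n = 3, size = 15, prob = 0.2)     # three random values drawn from binomial(15, 0.2)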
The function ddist(x, ...) gives us the probability density value $f(x)$ when the distribution is continuous. Continuous probability distributions will be discussed in Chapter 11 in detail. If the distribution is discrete, like the binomial, ddist(x, ...) gives us the probability value $P(X = x)$.
For the binomial distribution, we use dbinom(x, size, prob) to compute $P(X = x)$, where size is the number of trials $n$ and prob is the probability of success $p$. Besides x, we need to provide the values of size and prob because, remember, they are the parameters of the binomial distribution. Without them, we cannot have a specific binomial distribution, and its probabilities cannot be calculated.
To obtain $P(X = 6)$ for $X \sim \text{binomial}(15, 0.2)$, we run:
## 1. P(X = 6)
dbinom(x = 6, size = 15, prob = 0.2)
[1] 0.04299262
To answer the second question, we use pbinom(q, size, prob), which calculates the cumulative probability $P(X \le q)$:
## 2. P(X >= 6) = 1 - P(X <= 5)
1 - pbinom(q = 5, size = 15, prob = 0.2)
[1] 0.06105143
By default, the function pbinom(q, size, prob) calculates the lower-tail probability $P(X \le q)$. It has an argument lower.tail that is logical and TRUE by default. When we set lower.tail = FALSE in the function, it will instead calculate the upper-tail probability $P(X > q)$.
## 2. P(X >= 6) = P(X > 5)
pbinom(q = 5, size = 15, prob = 0.2, lower.tail = FALSE)
[1] 0.06105143
Notice that we use q = 5 instead of q = 6 because we want $P(X > 5)$, which equals $P(X \ge 6)$. With lower.tail = FALSE added to the function, the probability value is the same as the one computed before.
Below is an example of how to generate the binomial probability distribution as a graph.
plot(x = 0:15, y = dbinom(0:15, size = 15, prob = 0.2),
type = 'h', xlab = "x", ylab = "P(X = x)",
lwd = 5, main = "Binomial(15, 0.2)")
Here a sequence of integers 0 to 15 is created and placed on the x-axis. Then dbinom(0:15, size = 15, prob = 0.2) is used to compute the probabilities $P(X = x)$ for these values, which go on the y-axis, with type = 'h' standing for “histogram”.1
Since the success probability $p = 0.2$ is small, most of the probability is concentrated on small values of $x$, and the distribution is right-skewed.
In practice, we are not going to calculate probabilities of the binomial or other commonly used probability distributions by hand. Instead, we use computing software. In Python we can use the methods of the distribution objects in scipy.stats, such as scipy.stats.binom, to calculate probabilities or generate values from the distributions. In general, for some distribution, abbreviated dist, Python has the following functions:
- dist.pmf(k, ...): calculate the probability value $P(X = k)$. pmf means probability mass function.
- dist.pdf(x, ...): calculate the probability density $f(x)$. pdf means probability density function.
- dist.cdf(k, ...): calculate $P(X \le k)$. cdf means cumulative distribution function.
- dist.sf(k, ...): calculate $P(X > k)$. sf means survival function.
- dist.ppf(q, ...): obtain the quantile of probability $q$. ppf means percent point function.
- dist.rvs(size=n, ...): generate $n$ random numbers. rvs means random variates.
When we use these functions, the dist is replaced with the shortened name of the distribution we consider. For example, we use binom.pmf(k, ...)
to do the calculation for the binomial distribution.
For the binomial distribution, we use binom.pmf(k, n, p) to compute $P(X = k)$, where n is the number of trials and p is the probability of success. Besides k, we need to provide the values of n and p because, remember, they are the parameters of the binomial distribution. Without them, we cannot have a specific binomial distribution, and its probabilities cannot be calculated.
To obtain $P(X = 6)$ in Python, we run:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom
# Binomial distribution calculations
# 1. P(X = 6)
binom.pmf(k=6, n=15, p=0.2)
0.04299262263296005
To answer the second question, we use binom.cdf(k, n, p), which calculates $P(X \le k)$:
# 2. P(X >= 6) = 1 - P(X <= 5)
1 - binom.cdf(k=5, n=15, p=0.2)
0.06105142961766408
The function binom.cdf(k, n, p) calculates the lower-tail probability $P(X \le k)$. Alternatively, we can use binom.sf(k, n, p) to calculate the upper-tail probability $P(X > k)$ directly:
# Alternatively, using the upper tail probability
binom.sf(5, n=15, p=0.2)
0.061051429617664056
Notice that we use k=5 instead of k=6 because we want $P(X > 5)$, which equals $P(X \ge 6)$.
Below is an example of how to generate the binomial probability distribution as a graph. We use the stem plot plt.stem().
np.arange(0, 16)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
plt.stem(np.arange(0, 16), binom.pmf(np.arange(0, 16), n=15, p=0.2),
         basefmt=" ")
plt.xlabel("x")
plt.ylabel("P(X = x)")
plt.title("Binomial(15, 0.2)")
plt.show()
Here a sequence of integers 0 to 15 is created and placed on the x-axis. np.arange(0, 16) is a way to generate a sequence of numbers from 0 to 15; again 0 is included but 16 is not. Then binom.pmf(np.arange(0, 16), n=15, p=0.2) is used to compute the probabilities $P(X = k)$ for these values.
Since the success probability p = 0.2 is small, most of the probability is concentrated on small values of x, and the distribution is right-skewed.
10.3 Poisson Distribution
Poisson Random Variables
If we want to count the number of occurrences of some event2 over a unit of time or space/region/volume and observe its associated probability, we could consider the Poisson distribution. For example,
- The number of COVID patients arriving at ICU in one hour
- The number of Marquette students logging onto D2L in one day
- The number of dandelions per square meter on Marquette’s campus
Let $X$ be the number of occurrences of the event in one unit of time or space. Then $X$ is a Poisson random variable with parameter $\lambda > 0$, written $X \sim \text{Poisson}(\lambda)$, and its probability mass function is
$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$
Again, the parameter $\lambda$ completely determines the Poisson distribution: once its value is known, any Poisson probability can be calculated.
One interesting property of the Poisson distribution is that its mean and variance are both equal to its parameter $\lambda$: $\mu = \sigma^2 = \lambda$.
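As a quick numerical check, here is an illustrative sketch using the dpois() function introduced below, with an arbitrary $\lambda = 3.6$ (the object names are not from the original text):

lam0 <- 3.6
x0 <- 0:100                      # truncate the infinite support; the tail beyond 100 is negligible
p0 <- dpois(x0, lambda = lam0)
sum(x0 * p0)                     # mean, approximately 3.6
sum((x0 - lam0)^2 * p0)          # variance, also approximately 3.6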
Assumptions and Properties of Poisson Variables
Like the binomial distribution, the Poisson distribution comes from a Poisson experiment, which has the following properties and assumptions:
👉 Events occur one at a time; two or more events do not occur at the same time or in the same space or spot. For example, one cannot say two patients arrived at ICU at the same time. There must be a way to separate one event from the other, and one can always know which patient arrives at ICU earlier.
👉 The occurrence of an event in a given period of time or region of space is independent of the occurrence of the event in a nonoverlapping time period or region of space. For example, the number of patients arriving at ICU between 2 PM and 3 PM has nothing to do with the number of patients arriving at ICU between 8 PM and 9 PM because the two time periods do not overlap.
👉 The mean number of occurrences $\lambda$ per unit of time or space is constant for any period or region. For example, the mean number of patients arriving at ICU between 2 PM and 3 PM is the same as the mean number of patients arriving at ICU between 8 PM and 9 PM. This assumption is pretty strong, and usually violated in reality. If you want to use the Poisson distribution to build your statistical model, use it with additional care.
Example
Last year there were 4200 births at the University of Wisconsin Hospital. Let $X$ be the number of births on a randomly selected day, and assume $X$ follows a Poisson distribution. Find
1. $\lambda$, the mean number of births per day;
2. the probability that on a randomly selected day, there are exactly 10 births;
3. the probability that on a randomly selected day, there are more than 10 births.
There were 4200 births in total in one year, so on average there are $\lambda = 4200/365 \approx 11.5$ births per day. According to how we define $X$, the time unit is a day, not a year; the parameter $\lambda$ and the variable $X$ must refer to the same time unit. For the second part,
$$P(X = 10) = \frac{11.5^{10} e^{-11.5}}{10!} \approx 0.113.$$
For the third part,
$$P(X > 10) = 1 - P(X \le 10) = 1 - \sum_{x=0}^{10} \frac{11.5^{x} e^{-11.5}}{x!} \approx 0.60,$$
a sum with eleven terms if computed by hand. (No end!)
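If you want to verify this arithmetic, the formulas can be evaluated directly in R (an illustrative sketch; the built-in dpois() and ppois() shortcuts are introduced next):

lam <- 4200 / 365                                     # mean number of births per day
lam^10 * exp(-lam) / factorial(10)                    # P(X = 10), about 0.113
1 - sum(lam^(0:10) * exp(-lam) / factorial(0:10))     # P(X > 10), about 0.599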
Did you see how tedious it is to calculate the Poisson probabilities even using a calculator? I know you are waiting for R/Python implementation!
Instead of using dbinom()
and pbinom()
, for Poisson distribution, we replace binom
with pois
, and use dpois(x, lambda)
and ppois(q, lambda)
to calculate the probabilities.
With lambda being the mean of the Poisson distribution, we use
- dpois(x, lambda) to compute $P(X = x)$,
- ppois(q, lambda) to compute $P(X \le q)$, and
- ppois(q, lambda, lower.tail = FALSE) to compute $P(X > q)$.
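The R code shown below only covers the third question and uses an object named lam. Here is a minimal sketch for the first two questions that also defines lam, mirroring the Python code later in this section:

## 1. lambda: the mean number of births per day
lam <- 4200 / 365
lam
## 2. P(X = 10)
dpois(x = 10, lambda = lam)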
## 3.
## P(X > 10) = 1 - P(X <= 10)
1 - ppois(q = 10, lambda = lam)
[1] 0.5990436
## P(X > 10)
ppois(q = 10, lambda = lam,
lower.tail = FALSE)
[1] 0.5990436
Below is an example of how to generate the Poisson probability distribution as a graph.
plot(0:24, dpois(0:24, lambda = lam), type = 'h',
lwd = 5, ylab = "P(X = x)", xlab = "x",
main = "Poisson(11.5)")
Be careful that the Poisson random variable has infinitely many possible values $x = 0, 1, 2, \dots$; the plot only shows $x$ from 0 to 24, since the probabilities beyond 24 are negligibly small.
Instead of using binom.pmf()
and binom.cdf()
, for Poisson distribution, we replace binom
with poisson
, and use poisson.pmf(k, mu)
and poisson.cdf(k, mu)
to calculate the probabilities.
With lam being the mean of the Poisson distribution, we use
- poisson.pmf(k, mu=lam) to compute $P(X = k)$,
- poisson.cdf(k, mu=lam) to compute $P(X \le k)$, and
- poisson.sf(k, mu=lam) to compute $P(X > k)$.
from scipy.stats import poisson
# 1. Calculate lambda
lam = 4200 / 365
lam
11.506849315068493
## 2. P(X = 10)
poisson.pmf(k=10, mu=lam)
0.1128340209466802
## 3.
## P(X > 10) = 1 - P(X <= 10)
1 - poisson.cdf(10, mu=lam)
0.5990435715682069
## P(X > 10)
poisson.sf(10, mu=lam)
0.5990435715682069
Below is an example of how to generate the Poisson probability distribution as a graph.
plt.stem(np.arange(0, 25), poisson.pmf(np.arange(0, 25), mu=lam),
         basefmt=" ")
plt.xlabel("x")
plt.ylabel("P(X = x)")
plt.title("Poisson(11.5)")
plt.show()
Be careful that the Poisson random variable has infinitely many possible values $x = 0, 1, 2, \dots$; the plot only shows $x$ from 0 to 24, since the probabilities beyond 24 are negligibly small.
10.4 Relationship between Binomial and Poisson*
Actually binomial and Poisson distributions are somewhat related. Let’s consider the following example.
Suppose a store owner believes that customers arrive at his store at a rate of 3.6 customers per hour on average. He wants to find the distribution of the number of customers who will arrive during a particular one-hour period. Here is what he plans to do.
He models customer arrivals in different time periods as independent. Then he divides the one-hour period into 3600 seconds and thinks of the arrival rate as 3.6/3600 = 0.001 customers per second. He assumes that during any single second either no customer or exactly one customer arrives, and that the probability of an arrival in a given second is 0.001.
Think about it: what he does is actually a binomial experiment, because his experiment has
- A fixed number of identical trials: 3600 seconds (each second is one trial)
- Each trial results in one of two outcomes (No customer or 1 customer)
- Trials are independent. (customer arrivals in different time periods are independent)
- The probability of success is constant. (the probability of an arrival during any single second is 0.001)
If we let $X$ be the number of customers arriving during the one-hour period, then $X$ counts the number of successes in this binomial experiment, so $X \sim \text{binomial}(n = 3600, p = 0.001)$, and the store owner can use this binomial distribution to describe the number of arrivals.
So what do we learn from this example? “All models are wrong, but some are useful” – George E. Box (1919 - 2013). We can deal with a problem using different approaches or models. Also, the binomial and Poisson distributions are closely related to each other.
So in fact, when the number of trials $n$ is large and the success probability $p$ is small, the binomial$(n, p)$ distribution is well approximated by the Poisson distribution with $\lambda = np$. In this example, $\lambda = np = 3600 \times 0.001 = 3.6$, the hourly arrival rate the owner started with.
Let’s see the two probability distributions with different sizes of $n$, keeping $\lambda = np$ fixed at 3.6.
When $n$ gets larger (and $p = \lambda/n$ correspondingly smaller), the binomial$(n, p)$ probabilities get closer and closer to the Poisson$(3.6)$ probabilities.
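A minimal R sketch of this comparison (the object names and the particular choices of $n$ are illustrative):

lambda <- 3.6
x <- 0:10
## binomial(n, lambda/n) probabilities for increasing n, next to Poisson(3.6)
round(cbind(x = x,
            binom_n36   = dbinom(x, size = 36,   prob = lambda / 36),
            binom_n360  = dbinom(x, size = 360,  prob = lambda / 360),
            binom_n3600 = dbinom(x, size = 3600, prob = lambda / 3600),
            poisson     = dpois(x, lambda = lambda)), 4)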
10.5 Exercises
- Data collected by the Substance Abuse and Mental Health Services Administration (SAMSHA) suggests that 65% of 18-20 year olds consumed alcoholic beverages in any given year.
- Suppose a random sample of twelve 18-20 year olds is taken. When does it make sense to use binomial distribution for calculating the probability that exactly five consumed alcoholic beverages?
- What is the probability that exactly five out of twelve 18-20 year olds have consumed an alcoholic beverage?
- What is the probability that at most 3 out of 7 randomly sampled 18-20 year olds have consumed alcoholic beverages?
- A Dunkin’ Donuts in Milwaukee serves an average of 65 customers per hour during the morning rush.
- Which distribution have we studied that is most appropriate for calculating the probability of a given number of customers arriving within one hour during this time of day?
- What are the mean and the standard deviation of the number of customers this Dunkin’ Donuts serves in one hour during this time of day?
- Calculate the probability that this Dunkin’ Donuts serves 55 customers in one hour during this time of day.
It is a little misleading. The 'h' type here is just a notation for this bar-like plotting type, not really the histogram we discussed in Chapter 5. To draw a histogram, we use the function hist().↩︎
.↩︎In this Poisson distribution section, the word event means an incident or happening. It is not the event used in probability, which is a set conceptually.↩︎