14 Point and Interval Estimation

In this chapter we discuss estimation, covering both point and interval estimation. Point estimation uses a single number computed from the sample to estimate an unknown parameter. Interval estimation quantifies uncertainty by providing a range of plausible values that indicates where the true unknown parameter may be located.

14.1 Point Estimator

Let me ask you a question.

If you could only use a single number to guess the unknown population mean, $μ$, what number would you like to use?

If the single number you use can be computed from the sample data $(X_{1}, X_{2}, \dots, X_{n})$, then you are using a point estimator to estimate the unknown parameter $μ$. Previously we learned that a sample statistic is any transformation or function of $(X_{1}, X_{2}, \dots, X_{n})$. Therefore, any statistic is considered a point estimator if it is used to estimate a population parameter.

A point estimate is a value of a point estimator used to estimate a population parameter. So here is the subtle difference. A point estimator is a random variable, which is a function of the sample data $(X_{1}, X_{2}, \dots, X_{n})$ (before the data are actually collected), and a point estimate is the realized value of a point estimator, which is a value calculated from the collected data. For example, $\overset{―}{X} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}$ is a point estimator, and with the sample data $(x_{1}, x_{2}, x_{3}) = (2, 3, 7)$, the point estimate is $\overset{―}{x} = \frac{1}{3} \sum_{i = 1}^{3} x_{i} = \frac{1}{3} (2 + 3 + 7) = 4.$
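To make the estimator versus estimate distinction concrete, here is a minimal Python sketch. The function name `sample_mean` is our own choice for illustration: the function plays the role of the estimator (a rule that can be applied to any sample), and the number it returns for the collected data $(2, 3, 7)$ is the estimate.

```python
import numpy as np

def sample_mean(sample):
    """Point estimator: a rule that maps any sample to a single number."""
    return float(np.mean(sample))

# Point estimate: the estimator evaluated at the data actually collected.
x = [2, 3, 7]
x_bar = sample_mean(x)
print(x_bar)  # 4.0
```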

Back to the question. If we want to estimate the unknown population mean, which number do we use to estimate it? We now have an intuitive answer. The sample mean $\overset{―}{X}$ is a statistic and a point estimator for the population mean $μ$.

Sample Mean as a Point Estimator

Let’s see how the sample mean is used as a point estimator for $μ$. Suppose the true population distribution is $N (2, 1)$. Here the population mean $μ$ is two, but let’s pretend we don’t know its value and see how the sample mean performs. Such an analysis is called a simulation study.

We are going to collect a sample of size five, $(x_{1}, x_{2}, x_{3}, x_{4}, x_{5})$. In the simulation, we draw five values from $N (2, 1)$. The five drawn values are treated as our sample data, and $N (2, 1)$ is the population distribution. In R, we use rnorm() to generate random numbers from a normal distribution, where the first argument is the number of observations to be generated.

## Generate data x1, x2, x3, x4, x5, each from distribution N(2, 1)
set.seed(1234)
x_data_1 <- rnorm(n = 5, mean = 2, sd = 1)

The following shows the five realized data points and the sample mean.

x1	x2	x3	x4	x5	sample mean
0.79	2.28	3.08	-0.35	2.43	1.65
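The same kind of draw can be sketched in Python. The seed below is an arbitrary choice, and because NumPy and R use different random number generators, the five values will not match the ones shown above. The point is only the workflow: draw a sample, then compute its mean.

```python
import numpy as np

# Seed chosen arbitrarily; the draws differ from R's set.seed(1234) output
rng = np.random.default_rng(1234)

# Five draws from N(2, 1); note that `scale` is the standard deviation
x_data = rng.normal(loc=2, scale=1, size=5)
x_bar = float(x_data.mean())

print(np.round(x_data, 2), round(x_bar, 2))
```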

Here we use the sample mean $\overset{―}{X} = \frac{1}{5} \sum_{i = 1}^{5} X_{i}$ as our point estimator for $μ$ , and given the sample, the point estimate is $\overset{―}{x} =$ 1.65. You can see that the true $μ$ is two, but the point estimate $\overset{―}{x}$ is not equal to $μ$ . Why?

As we discussed earlier when studying sampling distributions, due to the random nature of drawing sample values from the population distribution, we do not expect a statistic to be exactly equal to the corresponding parameter. It is possible that most of our sample values happen to be larger or smaller than the true mean, or we may unluckily get an extreme sample value that drags the sample mean toward it. In such cases, the sample mean will not be close to the true population mean. You can think of it this way. One data point represents one piece of information about the unknown population distribution. With a small sample size, our sample represents only a small part of the unknown distribution. The gap between the sample mean and the true population mean is like the information lost by not being able to collect the rest of the subjects in the population.

Figure 14.1 shows the sample data and the population distribution $N (2, 1)$. Notice that we have an extreme value $- 0.35$ that is more than two standard deviations below the mean, and this pulls the sample mean down.

Figure 14.1: Population distribution and sample data.

Well, we could collect another sample if resources permit. In the simulation, another sample of size five is drawn from the same population $N (2, 1)$, and the result is shown below.

x1	x2	x3	x4	x5	sample mean
2.59	2.71	1.89	1.55	2.61	2.27

The second sample mean, $\overset{―}{x} =$ 2.27, is different from the first one. Why do the first sample and the second sample give us different sample means? Now you see why we want to learn about sampling distributions. We use the sample mean as the point estimator for $μ$, and the sample mean has its own sampling distribution. Therefore, the sample mean varies from sample to sample due to its random nature.

We have now connected sampling distributions to statistical inference, in particular to point estimation. Figure 14.2 shows the sampling distribution of $\overset{―}{X}$, which is $N (2, 1 / 5)$, together with the two sample mean values calculated from the two data sets. Now can you see why we want to use $\overset{―}{X}$ as the point estimator for $μ$? It is because the expected value of $\overset{―}{X}$, $E (\overset{―}{X})$, is exactly equal to $μ$. This means that if we were able to produce a large number of $\overset{―}{x}$s, their average would be very close to the true unknown $μ$, although any single $\overset{―}{x}$ may still be distant from $μ$. When the expected value of a point estimator is equal to the parameter it estimates, we say it is an unbiased estimator. Therefore, the sample mean $\overset{―}{X}$ is an unbiased estimator for the population mean $μ$ because $E (\overset{―}{X}) = μ$.

Figure 14.2: Sampling Distribution of the Sample Mean.
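Unbiasedness can be checked by simulation. The sketch below, with an arbitrary seed and number of replications, draws many samples of size five from $N (2, 1)$ and averages the resulting sample means. If $E (\overset{―}{X}) = μ$, the average should sit close to 2, and the variance of the sample means should be close to $1 / 5$.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed
mu, sigma, n = 2, 1, 5
n_rep = 100_000  # number of hypothetical repeated samples

# Each row is one sample of size n; each row mean is one realized x-bar
samples = rng.normal(loc=mu, scale=sigma, size=(n_rep, n))
x_bars = samples.mean(axis=1)

print(x_bars.mean())  # should be close to mu = 2
print(x_bars.var())   # should be close to sigma^2 / n = 0.2
```

Individual entries of `x_bars` can still be far from 2, which is exactly the point of the discussion above: unbiasedness is a statement about the average over many samples, not about any single one.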

Why Point Estimates Are Not Enough

If you want to estimate $μ$, would you prefer to report a range of values the parameter might be in or a single estimate like $\overset{―}{x}$?

Since $\overset{―}{X}$ is random and has its own distribution, its value varies from sample to sample. However, in reality we usually have only one data set and one realized sample mean, and we are not able to collect additional data sets due to limited resources. We don’t know whether the sample mean we got is close to the true unknown population mean. First, the sample mean can land anywhere in its distribution, and the one we got may be far away from $μ$. Moreover, we don’t know the value of $μ$! It does not make much sense to report just one single number when this much uncertainty is present.

If you want to catch a fish, would you prefer to use a spear or a net? I would use a net because I’m not a sharpshooter, and using a net covers a large range of possible locations where the fish can be. Due to the variation of $\overset{―}{X}$ , if we report a point estimate, we probably won’t hit the exact population parameter. If we report a range of plausible values, we have a good shot at capturing the parameter.

14.2 Confidence Intervals

In statistics, a plausible range of values for $μ$ is called a confidence interval (CI). This range depends on how precise and reliable our statistic is as an estimate of the parameter. To construct a CI for $μ$, we first need to quantify the variability of our sample mean. Quantifying this uncertainty requires a measurement of how much we would expect the sample statistic to vary from sample to sample. This is in fact the variance of the sampling distribution of the sample mean! Intuitively speaking, if $\overset{―}{x}$ varies a lot, we are more uncertain about whether the $\overset{―}{x}$ we got is close to $μ$. In other words, the precision of the estimation is not that good. To make sure that the plausible range of values does capture $μ$, we need to include more possible values and make the range larger. Do we know the variance of $\overset{―}{X}$? Absolutely. Thanks to the CLT, $\overset{―}{X}$ is approximately $N (μ, σ^{2} / n)$ when the sample size is large, regardless of what the population distribution is (and exactly so when the population itself is normal).
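Here is a small simulation sketch of that claim for a population that is clearly not normal. The exponential population, the sample size of 50, and the seed are our own illustrative choices. The sample means should have mean close to $μ = 1$, variance close to $σ^{2} / n = 0.02$, and roughly 95% of them should fall within 1.96 standard deviations of $\overset{―}{X}$ from $μ$.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
n, n_rep = 50, 50_000

# Exponential population with mean 1 and variance 1 (right-skewed, not normal)
samples = rng.exponential(scale=1.0, size=(n_rep, n))
x_bars = samples.mean(axis=1)

print(x_bars.mean())  # near mu = 1
print(x_bars.var())   # near sigma^2 / n = 1 / 50 = 0.02

# Share of sample means within 1.96 * sigma / sqrt(n) of mu
se = 1.0 / np.sqrt(n)
within = float(np.mean(np.abs(x_bars - 1.0) < 1.96 * se))
print(within)         # near 0.95 if the normal approximation is adequate
```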

How confident we are about the CI covering the parameter is called the level of confidence. The higher the confidence level is, the more reliable the CI is because the CI is more likely to capture the parameter.

Note

Given the same level of confidence, the larger the variation of $\overset{―}{X}$ is, the wider the CI for $μ$ will be.

Precision vs. Reliability

With a fixed sample size, the precision and reliability of a confidence interval trade off against each other. Here is a question.

If we want to be very certain that we capture $μ$, should we use a wider or a narrower interval? What drawbacks are associated with using a wider interval?

We use a wider interval because a wider interval is more likely to capture the population parameter value. So a more reliable confidence interval is wider than a less reliable confidence interval. But what drawbacks are associated with using a wider interval?

The precision and reliability trade-off is clearly explained in the cute comic in Figure 14.3. I can say I am 100% confident that your exam 1 score is between 0 and 100. Am I right? Yes. But do I provide helpful information? Absolutely not, because the interval includes every possible score of the exam. The interval is too wide to be helpful. Such an interval is 100% reliable but has no precision at all.

Figure 14.3: Balance between precision and reliability. Source: https://thestatsninja.com/2019/02/19/how-to-navigate-confidence-intervals-with-confidence/

Narrower intervals are more precise but less reliable, while wider intervals are more reliable but less precise. How can we get the best of both worlds, that is, a short interval with a high level of confidence? What we need is a larger sample size, provided that the sample quality is good. This is easy to state, but in practice it is often hard to collect more data.
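The effect of sample size can be sketched numerically. The snippet below holds the confidence level at 95% and uses an assumed known $σ = 5$ (an illustrative value), then computes the margin of error $z_{α / 2} σ / \sqrt{n}$, the formula derived later in this chapter, for several sample sizes. Quadrupling $n$ halves the margin of error, so the interval gets shorter without giving up any reliability.

```python
import numpy as np
from scipy.stats import norm

sigma = 5          # assumed known population SD (illustrative value)
conf_level = 0.95  # reliability is held fixed
z = norm.ppf(1 - (1 - conf_level) / 2)

# Margin of error for each sample size
margins = {n: z * sigma / np.sqrt(n) for n in [16, 64, 256]}

for n, m in margins.items():
    print(f"n = {n:3d}: margin of error = {m:.3f}, interval width = {2 * m:.3f}")
```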

A Confidence Interval Is for a Parameter

A confidence interval is for a parameter, NOT a statistic. Remember, a confidence interval is a way of estimating an unknown parameter. For example, we use the sample mean to form a confidence interval for the population mean.

We NEVER say “The confidence interval of the sample mean, $\overset{―}{X}$ , is ….” We SAY “The confidence interval for the true population mean $μ$ , is …”

In general, a confidence interval for $μ$ has the form

$\overset{―}{x} \pm m = (\overset{―}{x} - m, \overset{―}{x} + m)$

The quantity $m$ is called the margin of error. It controls the width of the interval, which is $2 m$. The CI is centered at the sample mean, with $\overset{―}{x} - m$ as the lower bound and $\overset{―}{x} + m$ as the upper bound. The point estimate, $\overset{―}{x}$, and the margin of error, $m$, can be obtained from known quantities and our data once sampled.

$(1 - α) 100 %$ Confidence Intervals

Formally, for $0 \leq α \leq 1$ , we define the confidence level $1 - α$ as the proportion of times that the CI contains the population parameter, assuming that the estimation process is repeated a large number of times.

The confidence level can be any number between zero and one. Common choices for the confidence level include 90% $(α = 0.10)$, 95% $(α = 0.05)$ and 99% $(α = 0.01)$. Keep in mind that the confidence level tells us the reliability of the interval. Because precision and reliability trade off, a CI with a very high confidence level (high reliability) will have less precision, i.e., a larger margin of error and a wider interval. 95% is the most common level because it strikes a good balance between precision (width of the CI) and reliability (confidence level).

High reliability and Low precision: I am 100% confident that the mean height of Marquette students is between 3’0” and 8’0”.
- Duh…🤷
Low reliability and High precision: I am 20% confident that mean height of Marquette students is between 5’6” and 5’7”.
- This is far from the truth… 🙅

$95 %$ Confidence Intervals for $μ$

We’ve learned the general form of a confidence interval for $μ$ and defined the confidence level. We now formally derive the form of the $(1 - α) 100 %$ confidence interval for $μ$. For simplicity, here we assume $σ$ is known to us when the interval is constructed. A confidence interval can be derived from the sampling distribution of the point estimator. Such an approach is called the distribution-based approach. A confidence interval can also be derived using simulation, and such an approach is called the simulation-based approach, the bootstrapping method for example. In this chapter we build a CI based on the sampling distribution of $\overset{―}{X}$. We discuss bootstrapping in Chapter 15.

Suppose we want to obtain the $95 %$ confidence interval for $μ$, so $α = 0.05$. We start with the sampling distribution $\overset{―}{X} \sim N (μ, \frac{σ^{2}}{n})$ shown in Figure 14.4. The sampling distribution tells us that the realized value $\overset{―}{x}$ will be within 1.96 standard deviations of $\overset{―}{X}$, that is, within $1.96 \frac{σ}{\sqrt{n}}$, of the population mean $μ$ $95 %$ of the time. In other words,

$P (μ - 1.96 \frac{σ}{\sqrt{n}} < \overset{―}{X} < μ + 1.96 \frac{σ}{\sqrt{n}}) = 0.95$

Here the $z$-score of 1.96 is the 97.5th percentile of the standard normal distribution and -1.96 is the 2.5th percentile. The $z$-score 1.96 is associated with a 2.5% area to the right and is called a critical value, denoted as $z_{0.025} = z_{α / 2}$. The $z$-score -1.96 is associated with a 2.5% area to the left, and it is the negative of $z_{0.025}$ because of the symmetry of the normal distribution.

Figure 14.4: Sampling distribution of the sample mean with 95% probability in the middle.
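The critical values can be verified directly. This short sketch uses `scipy.stats.norm` to compute the 97.5th and 2.5th percentiles of the standard normal distribution and the area between them.

```python
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)  # 97.5th percentile of N(0, 1)
lower = norm.ppf(alpha / 2)       # 2.5th percentile, equal to -z_crit by symmetry

# Area under the standard normal curve between -z_crit and z_crit
middle_area = norm.cdf(z_crit) - norm.cdf(-z_crit)

print(round(z_crit, 2), round(lower, 2), round(middle_area, 2))  # 1.96 -1.96 0.95
```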

We learned that $P (μ - 1.96 \frac{σ}{\sqrt{n}} < \overset{―}{X} < μ + 1.96 \frac{σ}{\sqrt{n}}) = 0.95 .$ The probability that the variable $\overset{―}{X}$ is in the interval $(μ - 1.96 \frac{σ}{\sqrt{n}}, μ + 1.96 \frac{σ}{\sqrt{n}})$ is 95%. But is the interval $(μ - 1.96 \frac{σ}{\sqrt{n}}, μ + 1.96 \frac{σ}{\sqrt{n}})$ our confidence interval?

The answer is No ❌! Remember that we don’t know $μ$ and we are estimating it. The interval cannot be determined because it involves the unknown quantity $μ$ . But don’t be too disappointed. We are almost there.

We can rearrange the inequalities inside the probability so that $μ$ is in the middle while the probability remains unchanged.

$μ - 1.96 \frac{σ}{\sqrt{n}} < \overset{―}{X} ⟺ μ < \overset{―}{X} + 1.96 \frac{σ}{\sqrt{n}}$
$\overset{―}{X} < μ + 1.96 \frac{σ}{\sqrt{n}} ⟺ \overset{―}{X} - 1.96 \frac{σ}{\sqrt{n}} < μ$
$μ - 1.96 \frac{σ}{\sqrt{n}} < \overset{―}{X} < μ + 1.96 \frac{σ}{\sqrt{n}} ⟺ \overset{―}{X} - 1.96 \frac{σ}{\sqrt{n}} < μ < \overset{―}{X} + 1.96 \frac{σ}{\sqrt{n}}$

$\begin{array}{r} P (μ - 1.96 \frac{σ}{\sqrt{n}} < \overset{―}{X} < μ + 1.96 \frac{σ}{\sqrt{n}}) = P (\overset{―}{X} - 1.96 \frac{σ}{\sqrt{n}} < μ < \overset{―}{X} + 1.96 \frac{σ}{\sqrt{n}}) = 0.95 \end{array}$

We are done! With sample data of size $n$ , $(\overset{―}{x} - 1.96 \frac{σ}{\sqrt{n}}, \overset{―}{x} + 1.96 \frac{σ}{\sqrt{n}})$ is our $95 %$ CI for $μ$ . Note that if $σ$ is known to us, the interval can be computed from our data because we know the sample size, and we can get the sample mean. The margin of error $m = 1.96 \frac{σ}{\sqrt{n}}$ .

14.3 Confidence Intervals for $μ$ When $σ$ is Known

We just obtained the 95% confidence interval for $μ$. How about the general $(1 - α) 100 %$ confidence interval for $μ$ (when $σ$ is known)? We first introduce the requirements for constructing the interval, then provide the interval formula.

The requirements for estimating $μ$ when $σ$ is known include

👉 The sample should be a random sample, such that all data $X_{i}$ are drawn from the same population and $X_{i}$ and $X_{j}$ are independent for $i \neq j$. In fact, all methods in this course are based on the assumption of a random sample.
👉 The population standard deviation, $σ$, is known.
👉 The population is normally distributed, i.e., $X_{i} \sim N (μ, σ^{2})$, or $n > 30$, or both. A sample size of $n > 30$ allows the central limit theorem to be applied, so that $\overset{―}{X}$ is approximately normal.

The general $(1 - α) 100 %$ confidence interval for $μ$ can be borrowed from the $95 %$ confidence interval for $μ$, $(\overset{―}{x} - z_{0.025} \frac{σ}{\sqrt{n}}, \overset{―}{x} + z_{0.025} \frac{σ}{\sqrt{n}})$. The 95% confidence level means $α = 0.05$. For the general $(1 - α) 100 %$ confidence interval, we just replace $z_{0.025}$ with $z_{α / 2}$ for any $α$ between zero and one. Therefore, the general $(1 - α) 100 %$ confidence interval for $μ$ is $(\overset{―}{x} - z_{α / 2} \frac{σ}{\sqrt{n}}, \overset{―}{x} + z_{α / 2} \frac{σ}{\sqrt{n}})$.

To sum up, we provide procedures for constructing a confidence interval for $μ$ when $σ$ is known:

Check that the requirements are satisfied.
Decide $α$ or the confidence level $(1 - α)$ .
Find the critical value, $z_{α / 2}$ .
Evaluate margin of error, $m = z_{α / 2} \cdot \frac{σ}{\sqrt{n}}$ .
Construct the $(1 - α) 100 %$ CI for $μ$ using the sample mean, $\overset{―}{x}$ , and margin of error, $m$ :

$\overset{―}{x} \pm z_{α / 2} \frac{σ}{\sqrt{n}}$ or $(\overset{―}{x} - z_{α / 2} \frac{σ}{\sqrt{n}}, \overset{―}{x} + z_{α / 2} \frac{σ}{\sqrt{n}})$
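The steps above can be wrapped in a small reusable function. This is only a sketch: the function name `z_confidence_interval` and its arguments are our own, and it assumes the requirements in the first step have already been checked by the user. The inputs in the usage lines are hypothetical numbers chosen for illustration.

```python
import math
from scipy.stats import norm

def z_confidence_interval(x_bar, sigma, n, conf_level=0.95):
    """Return (lower, upper, margin) of the CI for mu when sigma is known."""
    if not 0 < conf_level < 1:
        raise ValueError("conf_level must be strictly between 0 and 1")
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    alpha = 1 - conf_level                  # decide alpha
    z_crit = norm.ppf(1 - alpha / 2)        # critical value z_{alpha/2}
    margin = z_crit * sigma / math.sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin, margin

# Hypothetical inputs, for illustration only
lower, upper, margin = z_confidence_interval(x_bar=50, sigma=10, n=25)
print(round(lower, 2), round(upper, 2), round(margin, 2))  # 46.08 53.92 3.92
```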

Example

Suppose we want to know the mean systolic blood pressure (SBP) of a population. Assume that the population distribution is normal and has a standard deviation of 5 mmHg. We have a random sample of 16 subjects from this population with a mean of 121.5 mmHg. Estimate the mean SBP with a 95% confidence interval.

We construct the confidence interval step by step using the procedure.

Requirements:
- Normality is assumed, $σ = 5$ is known and a random sample is collected.
Decide $α$ :
- $α = 0.05$
Find the critical value $z_{α / 2}$ :
- $z_{α / 2} = z_{0.025} = 1.96$
Evaluate margin of error $m = z_{α / 2} \frac{σ}{\sqrt{n}}$ :
- $m = (1.96) \frac{5}{\sqrt{16}} = 2.45$
Construct the $(1 - α) 100 %$ CI:
- The 95% CI for the mean SBP is $\overset{―}{x} \pm z_{α / 2} \frac{σ}{\sqrt{n}} = (121.5 - 2.45, 121.5 + 2.45) = (119.05, 123.95)$

Below is a demonstration of how to find the 95% CI for SBP, first in R and then in Python.

## save all information we have
alpha <- 0.05
n <- 16
x_bar <- 121.5
sig <- 5

## 95% CI
## z-critical value
(cri_z <- qnorm(p = alpha / 2, lower.tail = FALSE))  
# [1] 1.96

## margin of error
(m_z <- cri_z * (sig / sqrt(n)))  
# [1] 2.45

## 95% CI for mu when sigma is known
x_bar + c(-1, 1) * m_z  
# [1] 119 124

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Given values
alpha = 0.05
n = 16
x_bar = 121.5
sig = 5  # Population standard deviation

# z-critical value
cri_z = norm.ppf(1 - alpha / 2)
cri_z
# 1.959963984540054

cri_z = norm.isf(alpha / 2) ## also works
cri_z
# 1.9599639845400545

# Margin of error
m_z = cri_z * (sig / np.sqrt(n))
m_z
# 2.4499549806750682

# 95% Confidence Interval for mu
x_bar + np.array([-1, 1]) * m_z
# array([119.05004502, 123.94995498])

Construct a 99% CI for the mean SBP. Do you expect it to have a wider or narrower interval than the 95% CI? Why?
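If you want to check your answer, the following sketch computes the 90%, 95%, and 99% intervals for the same SBP data. Only the critical value changes across confidence levels, so comparing the widths shows directly how reliability is paid for with precision.

```python
import numpy as np
from scipy.stats import norm

n, x_bar, sig = 16, 121.5, 5  # values from the SBP example

intervals = {}
for conf_level in [0.90, 0.95, 0.99]:
    z = norm.ppf(1 - (1 - conf_level) / 2)  # critical value z_{alpha/2}
    m = z * sig / np.sqrt(n)                # margin of error
    intervals[conf_level] = (x_bar - m, x_bar + m)
    print(f"{conf_level:.0%} CI: ({x_bar - m:.2f}, {x_bar + m:.2f}), width = {2 * m:.2f}")
```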

Interpreting the Confidence Interval

We now know how to construct a confidence interval. But what on earth is it? How do we interpret the interval correctly? This is pretty important because the interval is often misinterpreted and inappropriately used in statistical analysis. Don’t blame yourself if you find it hard to understand the meaning. The confidence interval concept is not intuitive, and it does not directly answer the question we care about regarding the unknown parameter. The confidence interval is a concept from the classical or frequentist point of view. Another approach to interval estimation is the so-called credible interval, which is based on Bayesian philosophy. We will discuss their differences in detail in Chapter 22.

Back to a 95% confidence interval. The following statements and interpretations are wrong. Please do not interpret the interval this way.

WRONG ❌ “There is a 95% chance/probability that the true population mean will fall between 119.1 mmHg and 123.9 mmHg.”
WRONG ❌ “The probability that the true population mean falls between 119.1 mmHg and 123.9 mmHg is 95%.”

Although those statements are often what we want, they are completely wrong. Let’s learn why. The sample mean is a random variable with a sampling distribution, so it makes sense to compute the probability of it being in some interval. The population mean is unknown and FIXED, so we cannot assign or compute any probability for it. If we were using Bayesian inference (Chapter 22), a different inference method, we could compute a probability associated with $μ$ because in Bayesian statistics $μ$ is treated as a random variable.

So how do we correctly interpret a confidence interval? Here is the answer.

“We are 95% confident that the mean SBP lies between 119.1 mmHg and 123.9 mmHg.”

But still what does “95% confident” really mean? This means if we were able to collect our data many times and build the corresponding CIs, we would expect that about 95% of those intervals would contain the true population parameter, which, in this case, is the mean systolic blood pressure.

Remember that $\overset{―}{x}$ varies from sample to sample, and so does its corresponding CI, because the CI is a function of $\overset{―}{X}$ given $n$ and $σ$. This idea is shown in Figure 14.5. Here we do a simulation in which the true $μ$ is set to 120 and $σ = 5$. Also, assume we were able to repeatedly collect samples of the same size $n = 16$. Here 100 data sets are generated, and for each data set, the sample mean and its corresponding 95% CI are computed. Since the confidence level is 95%, we expect about 95% of those intervals to contain the true population parameter, 120 in this example. In this run, the intervals from the 36th, 45th, 52nd, 82nd, and 99th data sets do not capture the true parameter.

Figure 14.5: Illustration of 100 95% confidence intervals.

Important

Please keep the following ideas in mind.

A 95% CI does not mean that if 100 data sets are collected, there will be exactly 95 intervals capturing $μ$. It is a long-run sampling idea.
We never know with certainty that 95% of the intervals, or any single interval for that matter, contain the true population parameter because again we never know what the true value of the parameter is.
In reality, we usually have only one data set, and we are not able to collect more data. We have no idea whether our 95% confidence interval captures the unknown parameter or not. We are only “95% confident”.
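The long-run idea can be illustrated with a simulation sketch. The settings mirror the illustration above ($μ = 120$, $σ = 5$, $n = 16$); the seed and the number of replications are arbitrary choices. Over many repeated samples the share of intervals covering $μ$ should settle near 0.95, while the count within any one batch of 100 intervals moves around.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2024)  # arbitrary seed
mu, sig, n, alpha = 120, 5, 16, 0.05
n_rep = 20_000                     # number of simulated data sets

# One sample mean per simulated data set
x_bars = rng.normal(mu, sig, size=(n_rep, n)).mean(axis=1)
m = norm.ppf(1 - alpha / 2) * sig / np.sqrt(n)

# True where the interval (x_bar - m, x_bar + m) contains mu
covered = (x_bars - m < mu) & (mu < x_bars + m)
print(covered.mean())  # long-run coverage, close to 0.95

# Number of covering intervals in each consecutive batch of 100 data sets
per_batch = covered.reshape(-1, 100).sum(axis=1)
print(per_batch[:10], per_batch.min(), per_batch.max())
```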

viewof settings = Inputs.form([
  Inputs.range([-5, 5], {value: 1, label: tex`\text{pop mean, }\mu`, step: 0.01}),
  Inputs.range([0.01, 5], {value: 1, label: tex`\text{pop stdev, }\sigma`, step: 0.01}),
  Inputs.range([0, 99], {value: 90, label: "confidence level", step: 1}),
  Inputs.range([0, 200], {value: 50, label: "sample size, n", step: 1})
  ])

viewof nrep = Scrubber(d3.ticks(1, 1000, 1000), {
  autoplay: false,
  loop: false,
  initial: 1,
  delay: 350,
  format: x => `number of datasets = ${x.toFixed(0)}`
})

textinfo = md`
Out of a total of ${nrep} samples, ${d3.sum(data.map(d => d.rep <= nrep && (mu >= d.lower && mu <= d.upper)))} (**<span style="color:red">${(d3.sum(data.map(d => d.rep <= nrep && (mu >= d.lower && mu <= d.upper)))*100/nrep).toFixed(3)}%</span>**) of the ${confidence}% confidence intervals contained the true population mean ${tex`\mu= `} ${mu}.
`

viewof lockvertical = Inputs.toggle({label: "Lock vertical axis", value: false})

plt = Plot.plot({
  style: {fontSize: "12px"},
  width: 960,
  y: {
    label: "mean",
    domain: lockvertical ? [mu- 3, mu + 3] : [mu - 5*sigma/Math.sqrt(nobs),mu + 5*sigma/Math.sqrt(nobs)]
  },
  x: {
    label: "sample number",
    domain:  [0, nrep]
  },
  marks: [
    Plot.ruleX([0]),    
    Plot.ruleX(data, {
      filter: d => (d.rep <= nrep && mu >= d.lower && mu <= d.upper),
      x: "rep",
      y1: "lower",
      y2: "upper",  
      stroke: "steelblue",
      strokeWidth: 1.5
    }),
    Plot.dot(data, {filter: d => (d.rep <= nrep && mu >= d.lower && mu <= d.upper), x: "rep", y: "xbar", fill: "steelblue", r: 3}),

    Plot.ruleX(data, {
      filter: d => d.rep <= nrep && (mu < d.lower || mu > d.upper),
      x: "rep",
      y1: "lower",
      y2: "upper",  
      stroke: "orange",
      strokeWidth: 1.5
    }),
    Plot.dot(data, {filter: d => d.rep <= nrep && (mu < d.lower || mu > d.upper), x: "rep", y: "xbar", fill: "orange", r: 3}),
    Plot.ruleY([mu], {stroke: "#D22B2B", strokeWidth: 1.5})

  ]
})

Figure 14.6: Source: https://observablehq.com/@mattiasvillani/confidence-interval-for-a-mean

import {Scrubber} from "@mbostock/scrubber"

function simulate_means(mu, sigma, nobs, nrep){
  
  const tvalue = jstat.studentt.inv(1-(1-(confidence/100))/2, nobs-1);
  var data = [];
  for (let j = 1; j <= nrep; j++){
        let sample = d3.range(nobs).map(d => d3.randomNormal(mu,sigma)())
        let xbar = d3.mean(sample)
        let s = jstat.stdev(sample, true) 
        let lower = xbar - tvalue*s/Math.sqrt(nobs)
        let upper = xbar + tvalue*s/Math.sqrt(nobs)
        data.push({rep: j, xbar: xbar, lower: lower, upper: upper})
  }
  return data
}

jstat = require('jstat')

mu = settings[0]
sigma = settings[1]
confidence = settings[2]
nobs = settings[3]

data = {
  nobs;
  mu;
  sigma;
  const d = new Date();
  let time = d.getTime();
  const data = simulate_means(mu, sigma, nobs, 1000)
  return data
}

The procedure of generating 100 confidence intervals for $μ$ when $σ$ is known is shown in the algorithm below.

Algorithm

Generate 100 samples of size $n$: $(x_{1}^{1}, x_{2}^{1}, \dots, x_{n}^{1}), \dots, (x_{1}^{100}, x_{2}^{100}, \dots, x_{n}^{100})$, where $x_{i}^{m} \sim N (μ, σ^{2})$.
Obtain 100 sample means $({\overset{―}{x}}^{1}, \dots, {\overset{―}{x}}^{100})$ .
For each $m = 1, 2, \dots, 100$ , compute the corresponding confidence interval $({\overset{―}{x}}^{m} - z_{α / 2} \frac{σ}{\sqrt{n}}, {\overset{―}{x}}^{m} + z_{α / 2} \frac{σ}{\sqrt{n}})$

mu <- 120; sig <- 5 
al <- 0.05; M <- 100; n <- 16

set.seed(2024)
x_rep <- replicate(M, rnorm(n, mu, sig))
xbar_rep <- apply(x_rep, 2, mean)
E <- qnorm(p = 1 - al / 2) * sig / sqrt(n)
ci_lwr <- xbar_rep - E
ci_upr <- xbar_rep + E

plot(NULL, xlim = range(c(ci_lwr, ci_upr)), ylim = c(0, M), 
     xlab = "95% CI", ylab = "Sample", las = 1)
mu_out <- (mu < ci_lwr | mu > ci_upr)
segments(x0 = ci_lwr, y0 = 1:M, x1 = ci_upr, col = "navy", lwd = 2)
segments(x0 = ci_lwr[mu_out], y0 = (1:M)[mu_out], x1 = ci_upr[mu_out], 
         col = 2, lwd = 2)
abline(v = mu, col = "#FFCC00", lwd = 2)

mu = 120
sig = 5
al = 0.05
M = 100
n = 16

np.random.seed(2024)
x_rep = np.random.normal(loc=mu, scale=sig, size=(M, n))
xbar_rep = np.mean(x_rep, axis=1)
E = norm.ppf(1 - al / 2) * sig / np.sqrt(n)  # use `al` defined above, not `alpha` from the earlier block
ci_lwr = xbar_rep - E
ci_upr = xbar_rep + E

for i in range(M):
    col = 'red' if mu < ci_lwr[i] or mu > ci_upr[i] else 'navy'
    plt.plot([ci_lwr[i], ci_upr[i]], [i, i], color=col, lw=1.5)
plt.xlim([min(ci_lwr), max(ci_upr)])
plt.ylim([0, M])
plt.xlabel("95% CI")
plt.ylabel("Sample")
plt.axvline(mu, color="#FFCC00", lw=2)
plt.show()

14.3.1 Reducing margin of error and determining sample size*

We learned that the margin of error is $E = z_{α / 2} \frac{σ}{\sqrt{n}}$ (written as $m$ earlier; in this section we use $E$). Of course we prefer a small margin of error for more precise estimation. From the formula, we can see that there are three ways to reduce the margin of error:

Reduce $σ$
Increase $n$
Reduce $z_{α / 2}$

However, given a confidence level $1 - α$ and known $σ$, we can reduce the margin of error only by increasing the sample size $n$. Because sampling is costly, we would like to find the minimum sample size needed to achieve a desired margin of error.

What we can do is rewrite the margin of error formula and express $n$ as a function of $E$, $α$, and $σ$. Once all three are given, we know how large the sample size must be.

$E = z_{α / 2} \frac{σ}{\sqrt{n}} ⟺ \frac{1}{\sqrt{n}} = \frac{E}{z_{α / 2} σ} ⟺ \sqrt{n} = \frac{z_{α / 2} σ}{E} ⟺ n = {(\frac{z_{α / 2} σ}{E})}^{2}$

Clearly, to get the desired margin of error, say $E_{d}$, the sample size should be at least ${(\frac{z_{α / 2} σ}{E_{d}})}^{2}$, rounded up to the next integer, because $E$ decreases as $n$ increases (it is inversely proportional to $\sqrt{n}$).

Example

A state tax advisory board wants to estimate the mean household income with a margin of error of $1,000 at the 99% confidence level. Assume that the population standard deviation is $10,000. How many households do they need to sample?

$E = 1000$, $α = 0.01$ and $z_{0.005} = 2.58$, $σ = 10000$
$n = {(\frac{z_{α / 2} σ}{E})}^{2} = {(\frac{(2.58) (10000)}{1000})}^{2} = 665.64$
Rounding up, they need a sample size of at least 666.
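The sample size calculation can be sketched as a small function. The name `min_sample_size` and the optional `z_crit` argument are our own additions for illustration. With the rounded critical value 2.58 the formula gives $25.8^{2} = 665.64$, which rounds up to 666 households; with the unrounded critical value from `norm.ppf` the result is slightly smaller.

```python
import math
from scipy.stats import norm

def min_sample_size(margin, sigma, conf_level, z_crit=None):
    """Smallest integer n such that z * sigma / sqrt(n) <= margin."""
    if z_crit is None:
        z_crit = norm.ppf(1 - (1 - conf_level) / 2)
    return math.ceil((z_crit * sigma / margin) ** 2)

n_rounded_z = min_sample_size(1000, 10000, 0.99, z_crit=2.58)
n_exact_z = min_sample_size(1000, 10000, 0.99)

print(n_rounded_z)  # 666, using the rounded critical value 2.58
print(n_exact_z)    # 664, using the unrounded critical value (about 2.5758)
```

Always round up rather than to the nearest integer, since rounding down would leave the margin of error slightly above the target.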

14.4 Confidence Intervals for $μ$ When $σ$ is Unknown

We have completed the discussion of confidence intervals for $μ$ when $σ$ is known. Do you see anything unreasonable? Is it reasonable to assume that $σ^{2}$ is known? In fact, the population variance is calculated as $σ^{2} = \frac{\sum_{i = 1}^{N} (x_{i} - μ)^{2}}{N}$, where $N$ is the population size. The formula involves $μ$, the unknown parameter we’d like to estimate. It’s rare that we don’t know $μ$ but know $σ$. What do we do if $σ$ is unknown?

When $σ$ is unknown to us, we can no longer use the normal distribution. Instead, we use the Student’s t distribution (or $t$-distribution) to construct a confidence interval for $μ$. To construct these confidence intervals we still need

A random sample
A population that is normally distributed and/or $n > 30$ .

The confidence interval when $σ$ is known includes $σ$ in the formula. When $σ$ is unknown, we cannot use the formula and need to find $σ$ ’s surrogate.

What is a natural estimator for the unknown $σ$?

When $σ$ is unknown, we use the sample standard deviation, $S = \sqrt{\frac{\sum_{i = 1}^{n} (X_{i} - \overset{―}{X})^{2}}{n - 1}}$ , instead when constructing the CI.
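As a quick check of the formula, the sketch below computes $S$ by hand and with NumPy for the small sample $(2, 3, 7)$ used earlier in the chapter. Note that `np.std` divides by $n$ by default, so `ddof=1` is needed to get the $n - 1$ denominator.

```python
import numpy as np

x = np.array([2.0, 3.0, 7.0])
n = len(x)

# Sample standard deviation from the formula, with n - 1 in the denominator
s_manual = float(np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1)))

# Same quantity via NumPy; ddof=1 switches the denominator from n to n - 1
s_numpy = float(np.std(x, ddof=1))

print(s_manual, s_numpy)  # both equal sqrt(7), about 2.646
```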

Student’s t Distribution

If the population is normally distributed or $n > 30$, we know $\overset{―}{X}$ is exactly or approximately $N (μ, \frac{σ^{2}}{n})$. Therefore $Z = \frac{\overset{―}{X} - μ}{σ / \sqrt{n}} \sim N (0, 1)$, exactly or approximately. Now if $σ$ is replaced with its surrogate $S$, then the new random variable, say $T$, follows the Student’s t distribution with $n - 1$ degrees of freedom (df), exactly so when the population is normal:

$T = \frac{\overset{―}{X} - μ}{S / \sqrt{n}} \sim t_{n - 1}$

Here the degrees of freedom is the parameter of the Student’s t distribution.
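A simulation sketch can show what replacing $σ$ with $S$ does. Below we draw many samples of size $n = 5$ from a standard normal population (the seed and replication count are arbitrary), compute $T$ for each sample, and compare the share of $|T| > 1.96$ with the tail probabilities of $t_{4}$ and $N (0, 1)$. The simulated share should agree with $t_{4}$ and be clearly larger than the normal value of about 0.05.

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(11)  # arbitrary seed
mu, sigma, n, n_rep = 0, 1, 5, 200_000

samples = rng.normal(mu, sigma, size=(n_rep, n))
x_bars = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)            # sample SD of each sample
t_stats = (x_bars - mu) / (s / np.sqrt(n))

sim_tail = float(np.mean(np.abs(t_stats) > 1.96))
t_tail = 2 * t.sf(1.96, df=n - 1)          # P(|T| > 1.96) under t with 4 df
z_tail = 2 * norm.sf(1.96)                 # P(|Z| > 1.96) under N(0, 1)

print(sim_tail, t_tail, z_tail)
```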

Properties

The Student’s t distribution, as shown in Figure 14.7, looks pretty similar to the standard normal distribution, but they are different. Some of the properties of the Student’s t distribution are listed below.

For any degrees of freedom, the Student’s t distribution is symmetric about 0 and bell-shaped like $N (0, 1)$.
For any degrees of freedom, the Student’s t distribution has more variability than $N (0, 1)$, meaning that the distribution has heavier tails and a lower peak.
The Student’s t distribution has less variability for larger degrees of freedom (larger sample size).
As $n \to \infty$ $(d f \to \infty)$, the Student’s t distribution approaches $N (0, 1)$.

Figure 14.7: Student t distributions with various degrees of freedom.
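These properties can also be seen in the critical values. The sketch below compares the 97.5th percentile of the Student’s t distribution for several degrees of freedom (chosen arbitrarily) with the standard normal value. The t critical values are always larger, reflecting the heavier tails, and they shrink toward the normal value as the degrees of freedom grow.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)

# 97.5th percentile of the t distribution for a few degrees of freedom
t_crits = {df: t.ppf(0.975, df) for df in [1, 4, 15, 30, 100, 1000]}

for df, value in t_crits.items():
    print(f"df = {df:4d}: t critical value = {value:.3f}")
print(f"normal   : z critical value = {z_crit:.3f}")
```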

viewof myinputs = Inputs.form([
      Inputs.range([1, 100], {value: 4, step: 1, label: tex`\text{DF }\nu:`}),
      Inputs.range([-8, 8], {value: -1.96, step: 0.01, label: "Quantile:"})
    ])

tailprobs = tex`\text{Normal distribution: } P(X\leq ${myinputs[1]})=${normcdf.toPrecision(4)} \\ \text{Student-}t \text{ distribution: } P(X\leq ${myinputs[1]})=${studentcdf.toPrecision(4)}`

With the default inputs (4 degrees of freedom, quantile $-1.96$): Normal distribution: $P (X \leq -1.96) = 0.02500$; Student- $t$ distribution: $P (X \leq -1.96) = 0.06078$.

plt_t = Plot.plot({
    color: {
      legend: true
    },
    x: {
      label: "x",
      axis: true
    },
    y: {
      axis: false,
      domain: [0,0.4]
    },
    marks: [
      Plot.ruleY([0]),
      Plot.line(data_t, {x: "x", y: "pdf", stroke : "dist", strokeWidth: 2}),
      Plot.areaY(data_t, {filter: d => d.x <= myinputs[1] && d.dist == "normal", x: "x", y: "pdf", fill: "steelblue", opacity: 0.2}),
      Plot.areaY(data_t, {filter: d => d.x <= myinputs[1] && d.dist == "student-t", x: "x", y: "pdf", fill: "orange", opacity: 0.2}),
      
    ]
  })

Figure 14.8: Source: https://observablehq.com/@mattiasvillani/student-t

viewof myinputs_zoom = Inputs.form([
      Inputs.range([1, 100], {value: 10, step: 1, label: tex`\text{DF }\nu:`}),
      Inputs.range([-8, -3], {value: -5, step: 0.01, label: "Quantile:"})
    ])

Plot.plot({
    caption: html`Zoom in the left tail`,
    color: {
      legend: true
    },
    x: {
      label: "x",
      axis: true,
      ticks: [-8,-7,-6,-5,-4,-3, myinputs_zoom[1]],
      domain: [-8,-3]
    },
    y: {
      axis: true,
      domain: [0,0.01]
    },
    marks: [
      Plot.ruleY([0]),     
      Plot.ruleX([-10]), 
      Plot.ruleX([myinputs_zoom[1]], {stroke: "gray", strokeOpacity: 0.4}),
      Plot.line(data_zoom, {filter: d => d.x <= -3, x: "x", y: "pdf", stroke : "dist", strokeWidth: 2}),
      //Plot.areaY(data, {filter: d => d.x <= myinputs[1] && d.dist == "normal", x: "x", y: "pdf", fill: "steelblue", opacity: 0.2}),
      //Plot.areaY(data, {filter: d => d.x <= myinputs[1] && d.dist == "student-t", x: "x", y: "pdf", fill: "orange", opacity: 0.2}),
    ]
  })

Figure 14.9: Source: https://observablehq.com/@mattiasvillani/student-t

data_t = {
  const x = d3.range(-8, 8, 0.01);
  const normpdf = x.map(x => ({x: x, pdf: jstat.normal.pdf(x, 0, 1), dist: "normal"}));
  const studentpdf = x.map(x => ({x: x, pdf: jstat.studentt.pdf(x, myinputs[0]), dist: "student-t"}));
  const data = normpdf.concat(studentpdf)
  return data
}

data_zoom = {
  const x = d3.range(-10, -3, 0.01);
  const normpdf = x.map(x => ({x: x, pdf: jstat.normal.pdf(x, 0, 1), dist: "normal"}));
  const studentpdf = x.map(x => ({x: x, pdf: jstat.studentt.pdf(x, myinputs_zoom[0]), dist: "student-t"}));
  const data = normpdf.concat(studentpdf)
  return data
}

normcdf = jstat.normal.cdf(myinputs[1], 0, 1);
studentcdf = jstat.studentt.cdf(myinputs[1], myinputs[0]);

Critical Values of $t_{α / 2, n - 1}$

In the CI formula with known $σ$ , we use the critical value $z_{α / 2}$ , the standard normal value so that $P (Z > z_{α / 2}) = α / 2$ . When $σ$ is unknown, we use $t_{α / 2, n - 1}$ as the critical value, instead of $z_{α / 2}$ . Notice that the standard normal has nothing to do with $μ$ and $σ$ of a general $N (μ, σ^{2})$ distribution. ¹ Therefore no parameter is attached to $z_{α / 2}$ . However, the $t$ critical value $t_{α / 2, n - 1}$ changes with the degrees of freedom $n - 1$ . With the same logic, the critical value $t_{α / 2, n - 1}$ is a Student’s t value with degrees of freedom $n - 1$ so that $P (T_{n - 1} > t_{α / 2, n - 1}) = α / 2$ as shown in Figure 14.10.

Figure 14.10: Illustration of critical value for Student t distribution.

With the same $α$, is $t_{α, n - 1}$ or $z_{α}$ larger?

You should be able to answer this question based on the fact that for any degrees of freedom, the student’s t distribution has more variability than $N (0, 1)$ , or heavier tails. The heavier tail forces $t_{α / 2, n - 1}$ to be more extreme than $z_{α / 2}$ . Figure 14.11 illustrates this fact. The red $t_{d f = 2}$ distribution has heavier tails than the black standard normal distribution.

The table below shows $z$ and $t$ critical values at confidence levels 90%, 95% and 99%. The $t$ values get closer to the $z$ values as the degrees of freedom increase. When the degrees of freedom go to infinity, the $t$ values converge to the $z$ values.

Level	t df = 5	t df = 15	t df = 30	t df = 1000	t df = inf	z
90%	2.02	1.75	1.70	1.65	1.64	1.64
95%	2.57	2.13	2.04	1.96	1.96	1.96
99%	4.03	2.95	2.75	2.58	2.58	2.58
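The finite-df columns of the table can be reproduced directly from the definition $P (T_{n - 1} > t_{α / 2, n - 1}) = α / 2$. In practice one would call a library routine such as R's qt() or SciPy's t.ppf(). The sketch below instead uses only the Python standard library, integrating the $t$ density numerically (Simpson's rule) and solving for the critical value by bisection, to make the definition explicit. The helper names `t_pdf`, `t_cdf` and `t_crit` are hypothetical, and the step counts and search bracket are arbitrary choices that are adequate for the degrees of freedom shown.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_crit(upper_tail, df):
    """Critical value t such that P(T > t) = upper_tail, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid                   # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    t_vals = [t_crit(alpha / 2, df) for df in (5, 15, 30, 1000)]
    z_val = NormalDist().inv_cdf(1 - alpha / 2)
    row = "  ".join(f"{t:.2f}" for t in t_vals)
    print(f"{level:.0%}  {row}  {z_val:.2f}")
```

Each printed row lists the $t$ critical values for $df = 5, 15, 30, 1000$ followed by the $z$ critical value, which can be compared with the corresponding columns of the table.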

$(1 - α) 100 %$ Confidence Intervals for $μ$ When $σ$ is Unknown

We now have everything we need to construct a $(1 - α) 100 %$ confidence interval for $μ$ when $σ$ is unknown. The interval is $(\overset{―}{x} - t_{α / 2, n - 1} \frac{s}{\sqrt{n}}, \overset{―}{x} + t_{α / 2, n - 1} \frac{s}{\sqrt{n}})$.

The interval form is the same as before: the sample mean plus and minus the margin of error. Compared with the interval with known $σ$, the difference is that $z_{α / 2}$ is replaced with $t_{α / 2, n - 1}$, and $σ$ is replaced with $s$.

Given the same confidence level $1 - α$, $t_{α / 2, n - 1} > z_{α / 2}$, leading to a wider interval unless $s$ is much smaller than the true $σ$. The intuition is that we are more uncertain when doing inference about $μ$ because we have information about neither $μ$ nor $σ$, and replacing $σ$ with $s$ adds additional uncertainty.
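A quick simulation illustrates why the larger critical value is needed. The sketch below assumes a hypothetical $N(120, 5^{2})$ population and $n = 16$; the seed and number of repetitions are arbitrary. For every simulated sample it builds two intervals around $\overset{―}{x}$ using $s$: one with the $t$ critical value $t_{0.025, 15} \approx 2.131$ (2.13 in the table above) and one that incorrectly keeps $z_{0.025}$. It then records how often each interval captures the true $μ$.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(4720)

mu, sigma, n = 120.0, 5.0, 16           # hypothetical normal population, n = 16
reps = 20000                            # number of simulated samples
z_crit = NormalDist().inv_cdf(0.975)    # z_{0.025}, about 1.96
t_crit_15 = 2.131                       # t_{0.025, 15}, rounded table value

hits_t = hits_z = 0
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(x)
    se = statistics.stdev(x) / math.sqrt(n)      # estimated standard error s / sqrt(n)
    hits_t += abs(x_bar - mu) <= t_crit_15 * se  # t interval captures mu
    hits_z += abs(x_bar - mu) <= z_crit * se     # z critical value used with s

cover_t = hits_t / reps
cover_z = hits_z / reps
print(f"coverage with t critical value: {cover_t:.3f}")
print(f"coverage with z critical value: {cover_z:.3f}")
```

The $t$ interval should capture $μ$ in close to 95% of the samples, while using $z_{0.025}$ together with $s$ should fall somewhat short of 95%. The wider $t$ interval makes up for the extra uncertainty from estimating $σ$.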


Back to the systolic blood pressure (SBP) example. We have $n = 16$ and $\overset{―}{x} = 121.5$. Estimate the mean SBP with a 95% confidence interval when $σ$ is unknown and $s = 5$. The code for the $t$ interval is very similar to that for the $z$ interval. The main difference is that we use qt() to find a quantile or critical value from the Student’s t distribution. The first argument is still the given probability, and we must also specify the degrees of freedom; otherwise R cannot determine which $t$ -distribution is meant and returns an error.

alpha <- 0.05
n <- 16
x_bar <- 121.5
s <- 5  ## sigma is unknown and s = 5

## t-critical value
(cri_t <- qt(p = alpha / 2, df = n - 1, lower.tail = FALSE)) 
# [1] 2.13

## margin of error
(m_t <- cri_t * (s / sqrt(n)))  
# [1] 2.66

## 95% CI for mu when sigma is unknown
x_bar + c(-1, 1) * m_t  
# [1] 119 124

Back to the systolic blood pressure (SBP) example. We have $n = 16$ and $\overset{―}{x} = 121.5$. Estimate the mean SBP with a 95% confidence interval when $σ$ is unknown and $s = 5$. The code for the $t$ interval is very similar to that for the $z$ interval. The main difference is that we use t.ppf() (or t.isf()) to find a quantile or critical value from the Student’s t distribution. t.ppf() takes a lower-tail probability, so we pass $1 - α / 2$; t.isf() takes the upper-tail probability $α / 2$. We must also specify the degrees of freedom; otherwise Python cannot determine which $t$ -distribution is meant and raises an error.

alpha = 0.05
n = 16
x_bar = 121.5
s = 5  # Sample standard deviation (sigma unknown)

import numpy as np
from scipy.stats import t
## t-critical value
cri_t = t.ppf(1 - alpha/2, df=n-1)
cri_t
# 2.131449545559323

## margin of error
m_t = cri_t * (s / np.sqrt(n))
m_t
# 2.664311931949154

## 95% CI for mu when sigma is unknown
x_bar + np.array([-1, 1]) * m_t  
# array([118.83568807, 124.16431193])

Since $z_{0.025} = 1.96 < t_{0.025, 15} = 2.13$, the $t$ interval with $s = 5$ is wider than a $z$ interval that uses the same value for $σ$.

14.5 Summary

To conclude this chapter, a table that summarizes the confidence interval for $μ$ is provided.

	Numerical Data, $σ$ known	Numerical Data, $σ$ unknown
Parameter of Interest	Population Mean $μ$	Population Mean $μ$
Confidence Interval	$\bar{x} \pm z_{α / 2} \frac{σ}{\sqrt{n}}$	$\bar{x} \pm t_{α / 2, n - 1} \frac{s}{\sqrt{n}}$

Remember to check if the population is normally distributed and/or $n > 30$ . What if the population is not normal and $n \leq 30$ ? We could use a simulation-based approach, for example bootstrapping discussed in Chapter 15.

14.6 Exercises

Here are summary statistics for randomly selected weights of newborn boys: $n = 207$, $\bar{x} = 30.2$ hg (1 hg = 100 grams), $s = 7.3$ hg.
1. Compute a 95% confidence interval for $μ$, the mean weight of newborn boys.
2. Is the result in part 1 very different from the 95% confidence interval if $σ = 7.3$?
A 95% confidence interval for a population mean $μ$ is given as (18.635, 21.125). This confidence interval is based on a simple random sample of 32 observations. Calculate the sample mean and standard deviation. Assume that all conditions necessary for inference are satisfied. Use the t-distribution in any calculations.
A market researcher wants to evaluate car insurance savings at a competing company. Based on past studies he is assuming that the standard deviation of savings is $95. He wants to collect data such that he can get a margin of error of no more than $12 at a 95% confidence level. How large of a sample should he collect?
The 95% confidence interval for the mean rent of one bedroom apartments in Chicago was calculated as ($2400, $3200).
1. Interpret the meaning of the 95% interval.
2. Find the sample mean rent from the interval.

¹ The standard normal random variable $Z \sim N (0, 1)$ is a pivotal quantity (or pivot) because its distribution does not depend on the parameters $μ$ and $σ$.
