6 - Introduction to the Standard Normal Distribution

Alex John Quijano

10/08/2021

Previously…

In the previous lectures, we learned about the following:

Probability Functions such as the binomial probability mass function (pmf) (discrete) and normal probability density function (pdf) (continuous).
The Central Limit Theorem, which states that for a sufficiently large sample size, the sampling distribution of the mean of independent observations is normal or nearly normal, regardless of the shape of the underlying distribution (provided its variance is finite).
Bootstrapping, used to demonstrate that the sampling distribution of the mean approaches a normal or nearly normal distribution.

Introduction to the Standard Normal Distribution

In this lecture, we will learn about:

The standard normal distribution.
The probability density function (pdf) and cumulative distribution function (cdf) of the normal distribution.
Calculating z-scores and knowing what they mean.

What is a Normal Distribution Model?

A normal distribution is a continuous probability distribution that is symmetric around its mean.
The mode, median, and mean are all equal.
The normal distribution is used to describe how the values of a variable are distributed.
It is also known as the Gaussian distribution or the bell curve.

Historical Perspective

Image Source: Wikipedia: Carl Friedrich Gauss (https://en.wikipedia.org/wiki/Carl_Friedrich_Gauss)

The normal distribution is also called the Gaussian distribution, after Carl Friedrich Gauss (1777-1855).
1809: Gauss published Theoria motus, on the least squares method.
1823: Gauss published Theoria combinationis observationum erroribus minimis obnoxiae, on the theory of observational errors.
Quick Source: The history of 68.2 95.4 99.7 in Statistics by Claudiu Clement

The Gaussian Curve (1/3)

The Gaussian Curve (2/3)

Both curves represent the normal distribution, but they differ in their center and spread. The normal distribution with mean 0 and standard deviation 1 (blue solid line, on the left) is called the standard normal distribution. The other distribution (green dashed line, on the right) has mean 19 and standard deviation 4.

The Gaussian Curve (3/3)

The same two normal models, plotted together on the same scale.

Notation

If a normal distribution has mean $\mu$ and standard deviation $\sigma,$ we may write the distribution as $N(\mu, \sigma).$ The two distributions shown in the previous two slides can be written as

\[ N(\mu = 0, \sigma = 1)\quad\text{and}\quad N(\mu = 19, \sigma = 4) \]

Because the mean and standard deviation describe a normal distribution exactly, they are called the distribution’s parameters.
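
For reference, the pdf and cdf of $N(\mu, \sigma)$ have the following standard textbook forms. They are not written out on the slides, and the symbols $f$, $F$, and the integration variable $t$ are notation chosen here for illustration:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \]

The cdf $F(x)$ is the area under the curve to the left of $x$, which is the quantity used when computing percentiles.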

Example - SAT and ACT Scores (1/4)

SAT scores follow a nearly normal distribution with a mean of 1500 points and a standard deviation of 300 points.
ACT scores also follow a nearly normal distribution with mean of 21 points and a standard deviation of 5 points.
Suppose Nel scored 1800 points on their SAT and Sian scored 24 points on their ACT.
Who performed better? Who has the higher percentile score?

Example - SAT and ACT Scores (2/4)

Nel’s and Sian’s scores shown with the distributions of SAT and ACT scores.

Example - SAT and ACT Scores (3/4)

Solution

The z-score (or Z score) of an observation is defined as the number of standard deviations it falls above or below the mean.
If the observation is one standard deviation above the mean, its z-score is 1.
If it is 1.5 standard deviations below the mean, then its z-score is -1.5.
If $x$ is an observation from a distribution $N(\mu, \sigma),$ we define its z-score mathematically as

\[ Z = \frac{x-\mu}{\sigma} \]

Example - SAT and ACT Scores (4/4)

Question: What is Nel’s SAT z-score, given $\mu_{SAT}=1500,$ $\sigma_{SAT}=300,$ and $x_{Nel}=1800$?

Answer: \[ Z_{Nel} = \frac{x_{Nel} - \mu_{SAT}}{\sigma_{SAT}} = \frac{1800-1500}{300} = 1 \]

Question: What is Sian’s ACT z-score, given $\mu_{ACT}=21,$ $\sigma_{ACT}=5,$ and $x_{Sian}=24$?

Answer: \[Z_{Sian} = \frac{x_{Sian} - \mu_{ACT}}{\sigma_{ACT}} = \frac{24 - 21}{5} = 0.60\]

Nel performed better relative to other test-takers: Nel’s score is 1 standard deviation above the SAT mean, while Sian’s is 0.60 standard deviations above the ACT mean, a difference of 0.40. Because a higher z-score corresponds to a larger area to its left under the normal curve, Nel should also have the higher percentile.
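
The comparison above can be checked with a minimal Python sketch. It uses the standard library's `statistics.NormalDist`; the helper `z_score` and all variable names are hypothetical, chosen here for illustration, and the lecture itself does not prescribe any particular software.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


# Parameters and scores taken from the SAT/ACT example.
z_nel = z_score(1800, mu=1500, sigma=300)   # SAT
z_sian = z_score(24, mu=21, sigma=5)        # ACT

# The percentile is the area under the standard normal curve to the left of z.
standard_normal = NormalDist(mu=0, sigma=1)
pct_nel = standard_normal.cdf(z_nel)
pct_sian = standard_normal.cdf(z_sian)

print(f"Nel:  z = {z_nel:.2f}, proportion below = {pct_nel:.4f}")
print(f"Sian: z = {z_sian:.2f}, proportion below = {pct_sian:.4f}")
```

Because both scores are converted to the same standard scale, the z-scores (and the areas to their left) can be compared directly even though the SAT and ACT use different units.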

Percentiles

Nel earned a score of 1800 on their SAT with a corresponding $Z=1.$ They would like to know what percentile they fall in among all SAT test-takers. Nel’s percentile is the percentage of people who earned a lower SAT score than Nel.
The total area under the normal curve is always equal to 1, so the area to the left of a score is the proportion of test-takers who scored below it.
The proportion of people who scored below Nel on the SAT is 0.8413. In other words, Nel is in the $84^{th}$ percentile of SAT takers.

The normal model for SAT scores, shading the area of those individuals who scored below Nel.
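
As a sketch of how software produces this number, the area below Nel's score can be computed either on the original SAT scale or on the standard normal scale. The variable names are hypothetical, and Python's `statistics.NormalDist` is only one of many tools that could be used here.

```python
from statistics import NormalDist

sat = NormalDist(mu=1500, sigma=300)   # SAT model from the example
standard_normal = NormalDist()         # defaults to mean 0, standard deviation 1

# Area to the left of Nel's score, computed two equivalent ways.
below_nel_raw = sat.cdf(1800)          # on the SAT scale
below_nel_z = standard_normal.cdf(1.0) # on the z scale, using Z = 1

print(round(below_nel_raw, 4), round(below_nel_z, 4))
```

Both calls describe the same shaded area, which is why standardizing a score does not change its percentile.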

Another Example - SAT Scores (1/2)

Shannon is a randomly selected SAT taker, and nothing is known about Shannon’s SAT aptitude. What is the probability that Shannon scores at least 1630 on their SATs?

\[Z = \frac{x - \mu}{\sigma} = \frac{1630 - 1500}{300} = \frac{130}{300} \approx 0.43\]

We use software to find the area below $Z=0.43$ (its percentile), which is 0.6664.

To find the area above $Z=0.43$, we compute one minus the area of the lower tail, as seen below. The probability Shannon scores at least 1630 on the SAT is 0.3336.

Another Example - SAT Scores (2/2)

Visual calculation of the probability that Shannon scores at least 1630 on the SAT.
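
A minimal Python sketch of the same upper-tail calculation follows; the variable names are hypothetical. It computes the answer twice: once with the z-score rounded to two decimals, as on the slide, and once without rounding, to show how much the rounding matters.

```python
from statistics import NormalDist

sat = NormalDist(mu=1500, sigma=300)
standard_normal = NormalDist()

z = (1630 - 1500) / 300      # 0.4333...
z_rounded = round(z, 2)      # 0.43, as used on the slide

# Upper-tail area = 1 - lower-tail area.
upper_rounded = 1 - standard_normal.cdf(z_rounded)
upper_exact = 1 - sat.cdf(1630)

print(f"z rounded to 0.43: P(score >= 1630) = {upper_rounded:.4f}")
print(f"z not rounded:     P(score >= 1630) = {upper_exact:.4f}")
```

The two results differ only in the third decimal place, so rounding the z-score to two decimals is adequate for this kind of problem.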

More Examples - Heights (1/2)

What is the probability that a randomly selected adult male is between 5’9’’ and 6’2’’ tall? The parameters are given as $\mu=70$ inches and $\sigma=3.3$ inches.

These heights correspond to 69 inches and 74 inches. First, draw the figure. The area of interest is no longer an upper or lower tail.

More Examples - Heights (2/2)

The total area under the curve is 1. The two unshaded tails (from the previous Guided Practice) have areas $0.3821$ below 69 inches ($Z = -0.30$) and $0.1131$ above 74 inches ($Z = 1.21$), so the middle area is what remains:

\[ 1.0000 - 0.3821 - 0.1131 = 0.5048 \]

That is, the probability of being between 5’9’’ and 6’2’’ is 0.5048.
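
The "one minus both tails" calculation can be sketched in Python as follows. The variable names are hypothetical. The first version rounds each z-score to two decimals, matching the tail areas quoted above; the second uses the unrounded heights directly.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3.3)   # adult male heights, in inches
standard_normal = NormalDist()

# Version 1: z-scores rounded to two decimals, as in a table lookup.
z_low = round((69 - 70) / 3.3, 2)        # -0.30
z_high = round((74 - 70) / 3.3, 2)       #  1.21
lower_tail = standard_normal.cdf(z_low)
upper_tail = 1 - standard_normal.cdf(z_high)
middle_rounded = 1 - lower_tail - upper_tail

# Version 2: no rounding, working on the original scale.
middle_exact = heights.cdf(74) - heights.cdf(69)

print(f"tails: {lower_tail:.4f} and {upper_tail:.4f}")
print(f"middle (rounded z): {middle_rounded:.4f}")
print(f"middle (exact):     {middle_exact:.4f}")
```

The unrounded result is slightly larger than 0.5048 because the z-scores are not rounded before the areas are looked up.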

68-95-99.7 rule

Probabilities for falling within 1, 2, and 3 standard deviations of the mean in a normal distribution.
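
The three percentages in the rule's name can be reproduced with a short Python sketch. The dictionary `within` is a hypothetical name used only for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} standard deviation(s) of the mean: {100 * p:.1f}%")
```

Because any normal distribution can be standardized with the z-score, the same three proportions hold for every $N(\mu, \sigma)$, not only the standard normal.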

Summary

Today, we talked about the following:

The application of the z-score and what it means.
The normal distribution with mean ($\mu$) and standard deviation ($\sigma$), and the standard normal distribution, which is rescaled so that $\mu=0$ and $\sigma=1$.

Today’s Activity

Today, for each of the following examples, compute the z-score(s), draw the normal curve, shade the area that corresponds to the requested probability, and label all associated points.

Given $x = 106$ and $N(\mu = 100, \sigma = 2)$, shade the area that represents the probability of a value less than $x$.
Given $x = 106$ and $N(\mu = 100, \sigma = 2)$, shade the area that represents the probability of a value greater than $x$.
Given $x_1 = 90$, $x_2 = 140$ and $N(\mu = 100, \sigma = 2)$, shade the area that represents the probability of a value between $x_1$ and $x_2$.