Alex John Quijano
10/08/2021
In the previous lectures, we learned about the following:
Probability Functions such as the binomial probability mass function (pmf) (discrete) and normal probability density function (pdf) (continuous).
The Central Limit Theorem, which states that, for a sufficiently large sample of independent observations, the sampling distribution of the mean is normal or nearly normal, regardless of the shape of the underlying distribution (provided it has finite variance).
Bootstrapping to demonstrate that the sampling distribution of the mean approaches a normal or nearly normal distribution.
In this lecture, we will learn about:
The standard normal distribution.
The probability density function (pdf) and cumulative distribution function (cdf) of the normal distribution.
Calculating z-scores and knowing what they mean.
A normal distribution is a continuous probability distribution that is symmetric around its mean.
The mode, median, and mean are all equal.
The normal distribution is used to describe how the values of a variable are distributed.
It is also known as the Gaussian distribution or the bell curve.
Image Source: Wikipedia: Carl Friedrich Gauss
The normal distribution is named after Carl Friedrich Gauss (1777-1855).
1809: Gauss published Theoria motus, on the least squares method.
1823: Gauss published Theoria combinationis observationum erroribus minimis obnoxiae, on the theory of observational errors.
Quick Source: The history of 68.2 95.4 99.7 in Statistics by Claudiu Clement
Both curves represent the normal distribution; however, they differ in their center and spread. The normal distribution with mean 0 and standard deviation 1 (blue solid line, on the left) is called the standard normal distribution. The other distribution (green dashed line, on the right) has mean 19 and standard deviation 4.
The two normal models plotted together on the same scale.
If a normal distribution has mean \(\mu\) and standard deviation \(\sigma,\) we may write the distribution as \(N(\mu, \sigma).\) The two distributions shown in the previous two slides can be written as
\[ N(\mu = 0, \sigma = 1)\quad\text{and}\quad N(\mu = 19, \sigma = 4) \]
Because the mean and standard deviation describe a normal distribution exactly, they are called the distribution’s parameters.
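To see how the two parameters fix the shape of the curve, here is a minimal sketch that evaluates the normal density \(f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}\) at the center of each of the two distributions above. It assumes Python 3.8 or later for `statistics.NormalDist`; the helper `normal_pdf` and the variable names are illustrative and not part of the lecture.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written out from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# The two distributions from the slides.
standard = NormalDist(mu=0, sigma=1)
other = NormalDist(mu=19, sigma=4)

# Height of each curve at its own mean (the peak of the bell).
peak_standard = normal_pdf(0, 0, 1)
peak_other = normal_pdf(19, 19, 4)

# The hand-written formula and the library should agree.
print(round(peak_standard, 4), round(standard.pdf(0), 4))
print(round(peak_other, 4), round(other.pdf(19), 4))
```

The peak of \(N(19, 4)\) is one quarter the height of the standard normal peak: a larger \(\sigma\) spreads the same total area over a wider range.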
SAT scores follow a nearly normal distribution with a mean of 1500 points and a standard deviation of 300 points.
ACT scores also follow a nearly normal distribution with mean of 21 points and a standard deviation of 5 points.
Suppose Nel scored 1800 points on their SAT and Sian scored 24 points on their ACT.
Who performed better? Who has the higher percentile score?
Nel’s and Sian’s scores shown with the distributions of SAT and ACT scores.
Solution
The z-score (or Z score) of an observation is defined as the number of standard deviations it falls above or below the mean.
If the observation is one standard deviation above the mean, its z-score is 1.
If it is 1.5 standard deviations below the mean, then its z-score is -1.5.
If \(x\) is an observation from a distribution \(N(\mu, \sigma),\) we define the z-score mathematically as
\[ Z = \frac{x-\mu}{\sigma} \]
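The z-score formula can be applied directly to the SAT and ACT example. The sketch below is only an illustration; the function name `z_score` and the variable names are our own choices.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


# Nel: SAT score of 1800, with SAT ~ N(mu = 1500, sigma = 300).
z_nel = z_score(1800, 1500, 300)

# Sian: ACT score of 24, with ACT ~ N(mu = 21, sigma = 5).
z_sian = z_score(24, 21, 5)

print(z_nel, z_sian)
```

Because z-scores put both exams on the same scale, the larger z-score marks the stronger performance relative to other test-takers.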
Nel earned a score of 1800 on their SAT with a corresponding \(Z=1.\) They would like to know what percentile they fall in among all SAT test-takers. Nel’s percentile is the percentage of people who earned a lower SAT score than Nel.
The total area under the normal curve is always equal to 1, so the area to the left of Nel’s score is the proportion of test-takers who scored lower.
The proportion of people who scored below Nel on the SAT is 0.8413. In other words, Nel is in the \(84^{th}\) percentile of SAT takers.
The normal model for SAT scores, shading the area of those individuals who scored below Nel.
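The shaded lower-tail area is the value of the cumulative distribution function at Nel's score. As a rough sketch, assuming Python 3.8 or later for `statistics.NormalDist`, it can be computed as follows. The same calculation for Sian's ACT score is included for comparison; the variable names are illustrative.

```python
from statistics import NormalDist

# Models from the example: SAT ~ N(1500, 300), ACT ~ N(21, 5).
sat = NormalDist(mu=1500, sigma=300)
act = NormalDist(mu=21, sigma=5)

# cdf(x) is the area under the curve to the left of x,
# i.e. the proportion of test-takers scoring below x.
nel_percentile = sat.cdf(1800)
sian_percentile = act.cdf(24)

# The same area can be read from the standard normal using the z-score.
nel_via_z = NormalDist().cdf((1800 - 1500) / 300)

print(round(nel_percentile, 4), round(sian_percentile, 4))
```

Nel's area rounds to 0.8413 and Sian's to about 0.7257, so Nel sits in the higher percentile.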
Visual calculation of the probability that Shannon scores at least 1630 on the SAT.
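An upper-tail probability such as this one is the complement of the lower-tail area. The sketch below assumes that Shannon's score follows the same SAT model, \(N(\mu = 1500, \sigma = 300)\), used earlier; that assumption and the variable names are ours.

```python
from statistics import NormalDist

# Assumption: the SAT model from the earlier example.
sat = NormalDist(mu=1500, sigma=300)

# Area to the left of 1630 (scores below 1630).
lower_tail = sat.cdf(1630)

# Total area is 1, so the area to the right is the complement.
upper_tail = 1 - lower_tail

print(round(upper_tail, 4))
```

The result is roughly one third. A z-table with the z-score rounded to two decimals may give a slightly different fourth digit.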
What is the probability that a randomly selected adult male is between 5’9’’ and 6’2’’? The parameters are \(\mu=70\) inches and \(\sigma=3.3\) inches.
These heights correspond to 69 inches and 74 inches. First, draw the figure. The area of interest is no longer an upper or lower tail.
The total area under the curve is 1. If we find the area of the two tails that are not shaded (from the previous Guided Practice, these areas are \(0.3821\) and \(0.1131\)), then we can find the middle area:
\[ 1.0000 - 0.3821 - 0.1131 = 0.5048 \]
That is, the probability of being between 5’9’’ and 6’2’’ is 0.5048.
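The subtraction of the two tails can be checked numerically. This is a minimal sketch, again assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3.3)  # adult male heights, in inches
lower, upper = 69, 74                   # 5'9" and 6'2" in inches

# Unshaded tails: below 69 inches and above 74 inches.
left_tail = heights.cdf(lower)
right_tail = 1 - heights.cdf(upper)

# Middle area = total area (1) minus the two tails.
middle = 1 - left_tail - right_tail

print(round(left_tail, 4), round(right_tail, 4), round(middle, 4))
```

The computed values differ slightly in the third decimal place from the table-based 0.3821, 0.1131, and 0.5048, because a z-table rounds each z-score to two decimals before the lookup.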
Probabilities for falling within 1, 2, and 3 standard deviations of the mean in a normal distribution.
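The probabilities in the figure can be reproduced from the standard normal cdf, since the area within \(k\) standard deviations of the mean is the same for every normal distribution. A short sketch, assuming Python 3.8 or later for `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(mu = 0, sigma = 1)

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} sd: {p:.4f}")
```

These come out to about 0.6827, 0.9545, and 0.9973, which is the 68-95-99.7 rule.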
Today, we talked about the following:
The application of the z-score and what it means.
The normal distribution with mean (\(\mu\)) and standard deviation (\(\sigma\)) and the standard normal distribution, which is rescaled so that \(\mu=0\) and \(\sigma=1\).
Today, work on computing the z-scores for the following examples. For each one, draw the corresponding shaded area under the normal curve and label all associated points.
Given \(x = 106\) and \(N(\mu = 100, \sigma = 2)\), draw the shaded area representing the probability of a value less than \(x\).
Given \(x = 106\) and \(N(\mu = 100, \sigma = 2)\), draw the shaded area representing the probability of a value greater than \(x\).
Given \(x_1 = 90\), \(x_2 = 140\) and \(N(\mu = 100, \sigma = 2)\), draw the shaded area representing the probability of a value between \(x_1\) and \(x_2\).