---
title: "4 - Statistical Models Part I"
author: "Alex John Quijano"
date: "10/26/2021"
output: openintro::lab_report
---

## **Learning Objectives**

<br>

Upon completing today's lab activity, students should be able to do the following using R and RStudio:

  1. Compute probabilities using the binomial probability mass function (PMF).
  
  2. Compute probabilities using the normal probability density function (PDF).
  
  3. Visualize probability functions.

<br>

```{r echo=TRUE, message=FALSE}
library(tidyverse)
library(ggplot2)
library(gghighlight)
library(openintro)
```

<br>

## **Binomial Distribution**

<br>

The **binomial distribution** models the number of successes in a fixed number of independent trials, where each trial has just two possible outcomes (success or failure).

Let the **discrete random variable** $X$ be the number of successes, where each trial succeeds with probability $p$, and let $n$ be the number of trials.

The general form of the **Binomial probability mass function (PMF)** is given by

$$P(X = x; n, p) = {n \choose x} p^x (1-p)^{(n-x)} \hspace{5px} \text{ for } x = 0,1,2,\cdots, n$$
where the binomial coefficients are computed as 
$${n \choose x} = \frac{n!}{x!(n-x)!}.$$
      
The binomial distribution assumes that the trials are independent and that $p$ is fixed for all $n$ trials.
  
The **expected value** or the **mean** of the binomial PMF is $E[X] = np$.

The **variance** of the binomial PMF is $Var[X] = np(1-p)$.
  
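Notice that the term $p^x (1-p)^{(n-x)}$ is the probability of one particular sequence of $x$ successes and $n-x$ failures, and the binomial coefficient counts how many such sequences there are. Here, each trial is called a **Bernoulli trial**, a random experiment with exactly two possible outcomes.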

The **binomial cumulative distribution function (CDF)** is given by

$$P(X \le k) = \sum_{i=0}^{k} {n \choose i} p^i (1-p)^{(n-i)}$$
where $k$ is an integer.

### R commands

R has four built-in functions for working with the binomial distribution. They are listed below, followed by descriptions of their arguments.

```
dbinom(x, size, prob) # PMF
pbinom(x, size, prob) # CDF
qbinom(p, size, prob) # quantiles (percentiles)
rbinom(n, size, prob) # simulations
```

  * `x` is a vector of numbers.

  * `p` is a vector of probabilities.

  * `n` is the number of observations to simulate.

  * `size` is the number of trials.

  * `prob` is the probability of success of each trial.
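
As a minimal sketch of how the four functions fit together, the chunk below calls each one for a hypothetical setting of 10 trials with success probability 0.3. The names `n_trials` and `p_success`, the chosen values, and the seed are illustrative assumptions, not part of the lab.

```{r}
# Hypothetical setting used only for illustration.
n_trials  <- 10
p_success <- 0.3

# PMF: P(X = 3)
dbinom(3, size = n_trials, prob = p_success)

# CDF: P(X <= 3)
pbinom(3, size = n_trials, prob = p_success)

# Quantile: the smallest x with P(X <= x) >= 0.5
qbinom(0.5, size = n_trials, prob = p_success)

# Simulation: 5 random draws, each the number of successes in 10 trials
set.seed(141)  # arbitrary seed so the draws are reproducible
rbinom(5, size = n_trials, prob = p_success)
```

The `d`, `p`, `q`, `r` prefixes follow the same pattern for other distributions in R, so the same reading applies to the normal functions later in this lab.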
  
To compute the binomial coefficients (the number of combinations) explicitly, we can use the `choose` function.

```
choose(n,k)
```

<br>

### Using the `choose` function

Example: How many ways can we pick 5 items from 10 items where order does not matter?

```{r}
choose(10,5)
```

There are 252 ways, which is ${10 \choose 5} = 252$.
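
To connect `choose` back to the PMF formula, the sketch below evaluates the formula by hand and compares it with `dbinom`. The values ($n = 10$, $p = 1/2$, $k = 6$) and the variable names are chosen only for illustration.

```{r}
# Hypothetical values chosen for illustration.
n_trials  <- 10
p_success <- 1/2
k         <- 6

# PMF by hand: (n choose k) * p^k * (1 - p)^(n - k)
pmf_by_hand <- choose(n_trials, k) * p_success^k * (1 - p_success)^(n_trials - k)

# The built-in PMF should agree up to floating-point error.
pmf_builtin <- dbinom(k, size = n_trials, prob = p_success)

all.equal(pmf_by_hand, pmf_builtin)
```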

### Binomial PMF - `dbinom`

Example: Suppose that $p=1/2$ and $n=10$. What is $P(X = 6)$?

```{r}
p <- 1/2
x1 <- 6
n <- 10
dbinom(x1,n,prob=p)
```

Plotting the PMF.

```{r}
# Create the support of X: the integers 0, 1, ..., n.
X <- seq(0, n, by = 1)

# Evaluate the binomial PMF at every value in the support.
P_binom <- dbinom(X, n, p)

# Create a data frame for ggplot.
df <- data.frame(x = X, probability = P_binom)

# Probability to highlight.
p_x1 <- dbinom(x = x1,
               size = n,
               prob = p)

# Map the data frame columns and mark P(X = x1) in red.
plt <- ggplot(df, aes(x = x, y = probability)) +
  geom_point(size = 4) + ggtitle("PMF of the Binomial Distribution p = 1/2 and n = 10: P(X = 6)") +
  annotate("point", x = x1, y = p_x1, color = "red", size = 4) +
  annotate("segment", x = x1, y = 0, xend = x1, yend = p_x1, color = "red") +
  scale_x_continuous(breaks = X)  # x is numeric, so use a continuous scale
plt
```
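
The mean and variance formulas stated earlier, $E[X] = np$ and $Var[X] = np(1-p)$, can be checked numerically by summing over the PMF. This is a sketch with illustrative variable names and the same $n = 10$, $p = 1/2$ setting as the example.

```{r}
# Hypothetical names; same setting as the example above.
n_trials  <- 10
p_success <- 1/2
support   <- 0:n_trials
pmf       <- dbinom(support, size = n_trials, prob = p_success)

# The probabilities over the whole support sum to 1.
sum(pmf)

# E[X] = sum of x * P(X = x); compare with n * p.
mean_x <- sum(support * pmf)
c(by_sum = mean_x, formula = n_trials * p_success)

# Var[X] = sum of (x - E[X])^2 * P(X = x); compare with n * p * (1 - p).
var_x <- sum((support - mean_x)^2 * pmf)
c(by_sum = var_x, formula = n_trials * p_success * (1 - p_success))
```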
  
### Binomial CDF - `pbinom`

Example: Suppose that $p=1/2$ and $n=10$. What is $P(X \le 6)$?

```{r}
p <- 1/2
x2 <- 6
n <- 10
pbinom(x2,n,prob=p)
```

Plotting the PMF with the outcomes $X \le 6$ highlighted. The CDF value is the sum of the highlighted probabilities.

```{r}
# Create the support of X: the integers 0, 1, ..., n.
X <- seq(0, n, by = 1)

# Evaluate the binomial PMF at every value in the support.
P_binom <- dbinom(X, n, p)

# Create a data frame for ggplot.
df <- data.frame(x = X, probability = P_binom)

# Rows to highlight: all outcomes with x <= x2.
df_hl <- df[df$x <= x2, ]

# Draw all points in black, then overlay the highlighted ones in red.
plt <- ggplot(df, aes(x = x, y = probability)) +
  geom_point(size = 4) + ggtitle("PMF of the Binomial Distribution p = 1/2 and n = 10: P(X <= 6)") +
  geom_point(data = df_hl, color = "red", size = 4) +
  geom_segment(data = df_hl,
               aes(x = x, y = 0, xend = x, yend = probability),
               color = "red") +
  scale_x_continuous(breaks = X)  # x is numeric, so use a continuous scale
plt
```
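
As a sketch of how the CDF relates to the PMF, the chunk below rebuilds $P(X \le 6)$ as a sum of PMF values and then derives an upper-tail and an interval probability from the CDF. The variable names and the interval endpoints (4 and 6) are illustrative assumptions.

```{r}
# Hypothetical names; same n = 10, p = 1/2 setting as the example.
n_trials  <- 10
p_success <- 1/2
k         <- 6

# CDF as a sum of PMF values: P(X <= k) = P(X = 0) + ... + P(X = k)
cdf_by_sum  <- sum(dbinom(0:k, size = n_trials, prob = p_success))
cdf_builtin <- pbinom(k, size = n_trials, prob = p_success)
all.equal(cdf_by_sum, cdf_builtin)

# Upper tail: P(X >= k) = 1 - P(X <= k - 1), because X is integer-valued.
upper_tail <- 1 - pbinom(k - 1, size = n_trials, prob = p_success)
upper_tail

# Interval: P(4 <= X <= 6) = P(X <= 6) - P(X <= 3)
interval_prob <- pbinom(6, size = n_trials, prob = p_success) -
  pbinom(3, size = n_trials, prob = p_success)
interval_prob
```

Note the `k - 1` in the upper tail: for a discrete variable, $P(X \ge k)$ includes the outcome $k$ itself, so only the outcomes up to $k - 1$ are subtracted.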

### Binomial Percentiles - `qbinom`

Example: What is the smallest $x$ value such that $P(X \le x) \ge 0.2461$ when $p=1/2$ and $n=10$?

```{r}
p3 <- 0.2461
qbinom(p3,n,p)
```
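
Because the binomial CDF increases in jumps, `qbinom` returns the smallest $x$ whose cumulative probability reaches the target. The sketch below reproduces that by searching the CDF directly; the variable names are illustrative.

```{r}
# Hypothetical names; same n = 10, p = 1/2 setting as the example.
n_trials  <- 10
p_success <- 1/2
target    <- 0.2461

# Evaluate the CDF at every value in the support.
support <- 0:n_trials
cdf     <- pbinom(support, size = n_trials, prob = p_success)

# Smallest x with P(X <= x) >= target, found by direct search.
x_by_search <- min(support[cdf >= target])
x_builtin   <- qbinom(target, size = n_trials, prob = p_success)
c(by_search = x_by_search, builtin = x_builtin)

# The CDF steps over the target between x - 1 and x.
pbinom(c(x_builtin - 1, x_builtin), size = n_trials, prob = p_success)
```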

<br>

## **Normal Distribution**

<br>

Many quantitative variables have distributions that are approximately normal. That is, when we plot a histogram with the variable's values on the horizontal axis and the counts of the values on the vertical axis, we obtain a bell-shaped curve. The mean of the data set lies at the middle of the curve, and because the curve is symmetric, half of the values are to the left of the mean while the other half are to the right. In statistics, this is referred to as the **normal distribution** - or the **Gaussian distribution**.

The general formula for the **normal probability density function (PDF)** is

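$$f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$$
where $X$ is a **continuous random variable** and $\mu$ and $\sigma$ are its mean and standard deviation, respectively. Because $X$ is continuous, $f(x)$ is a density rather than a probability: $P(X = x) = 0$ for any single value $x$, and probabilities correspond to areas under the curve.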

The **expected value** or the **mean** of the normal PDF is $E[X] = \mu$.

The **variance** of the normal PDF is $Var[X] = \sigma^2$.

The formula for the **normal cumulative distribution function (CDF)** is

$$P(X \le k; \mu, \sigma) = \int_{-\infty}^{k} \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx$$
where $k$ is some real value.

Notice that both functions contain the **z-score** term

$$z = \frac{x - \mu}{\sigma}.$$

The **z-score** is also called the standardized score. It describes the distance from some value $x$ to the mean of a normal distribution with mean $\mu$ and standard deviation $\sigma$, measured in units of standard deviations. Standardizing the distribution means transforming the data so that the result has $\mu=0$ and $\sigma=1$.

R has four built-in functions for working with the normal distribution. They are listed below, followed by descriptions of their arguments.

```
dnorm(x, mean, sd) # PDF
pnorm(x, mean, sd) # CDF
qnorm(p, mean, sd) # percentiles
rnorm(n, mean, sd) # simulations
```

  * `x` is a vector of numbers.

  * `p` is a vector of probabilities.

  * `n` is the number of observations (sample size).

  * `mean` is the mean of the distribution. Its default value is 0.

  * `sd` is the standard deviation of the distribution. Its default value is 1.

<br>

### Normal PDF - `dnorm`

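Example: Suppose that $\mu=125$ and $\sigma=20$. What is the value of the density at $x = 100$? (This is the height of the curve, not a probability, since $P(X = 100) = 0$ for a continuous random variable.)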

```{r}
mu <- 125
sigma <- 20
x1 <- 100
dnorm(x1,mu,sigma)
```
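
As a sketch of what `dnorm` computes, the chunk below evaluates the density formula by hand and then shows that a probability is an area under the density. The variable names and the interval endpoints (90 and 110) are illustrative assumptions.

```{r}
# Hypothetical names; same mu = 125, sigma = 20 setting as the example.
mu_demo    <- 125
sigma_demo <- 20
x_demo     <- 100

# Density by hand, following the PDF formula term by term.
density_by_hand <- 1 / (sigma_demo * sqrt(2 * pi)) *
  exp(-0.5 * ((x_demo - mu_demo) / sigma_demo)^2)
all.equal(density_by_hand, dnorm(x_demo, mean = mu_demo, sd = sigma_demo))

# A probability is an area: integrate the density from 90 to 110 ...
area <- integrate(dnorm, lower = 90, upper = 110,
                  mean = mu_demo, sd = sigma_demo)$value

# ... and compare with the difference of two CDF values.
from_cdf <- pnorm(110, mean = mu_demo, sd = sigma_demo) -
  pnorm(90, mean = mu_demo, sd = sigma_demo)
c(integrated = area, from_cdf = from_cdf)
```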
  
### Normal CDF - `pnorm`

Example: Suppose that $\mu=125$ and $\sigma=20$. What is $P(X \le 100)$?

```{r}
mu <- 125
sigma <- 20
x2 <- 100
pnorm(x2,mu,sigma)
```
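
The z-score gives a second route to the same probability: standardize the value first, then use the standard normal CDF (the defaults `mean = 0`, `sd = 1`). A minimal sketch with illustrative variable names:

```{r}
# Hypothetical names; same mu = 125, sigma = 20 setting as the example.
mu_demo    <- 125
sigma_demo <- 20
x_demo     <- 100

# z-score: distance from the mean in units of standard deviations.
z_demo <- (x_demo - mu_demo) / sigma_demo
z_demo

# P(X <= 100) computed on the original scale and on the standardized scale.
prob_original <- pnorm(x_demo, mean = mu_demo, sd = sigma_demo)
prob_standard <- pnorm(z_demo)
all.equal(prob_original, prob_standard)
```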

Plotting the PDF with the lower tail $X \le 100$ shaded.

```{r fig.asp=1}
normTail(mu, sigma, L = x2, col = "red")
```
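
An alternative to `normTail` is to draw the same picture with `ggplot2`. The sketch below is one possible approach, not the lab's prescribed method; it uses `stat_function` to draw the density and to shade the lower tail, and the plotting range of four standard deviations on each side of the mean is an arbitrary choice.

```{r}
library(ggplot2)

# Hypothetical names; same mu = 125, sigma = 20 setting as the example.
mu_demo    <- 125
sigma_demo <- 20
upper_demo <- 100

# Plot over mu +/- 4 sigma (an arbitrary but wide enough range).
x_range <- c(mu_demo - 4 * sigma_demo, mu_demo + 4 * sigma_demo)

norm_plot <- ggplot(data.frame(x = x_range), aes(x = x)) +
  # Shaded area under the density from the left edge up to upper_demo.
  stat_function(fun = dnorm, args = list(mean = mu_demo, sd = sigma_demo),
                geom = "area", xlim = c(x_range[1], upper_demo),
                fill = "red", alpha = 0.5) +
  # The full density curve drawn on top.
  stat_function(fun = dnorm, args = list(mean = mu_demo, sd = sigma_demo)) +
  labs(x = "x", y = "density", title = "Normal PDF with P(X <= 100) shaded")
norm_plot
```

Changing the `xlim` argument of the shaded layer is enough to highlight an upper tail or a middle region instead.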

### Normal Percentiles - `qnorm`

Example: What is the $x$ value such that $P(X \le x) = 0.25$ with $\mu=125$ and $\sigma=20$?

```{r}
p3 <- 0.25
qnorm(p3,mu,sigma)
```

Checking that the CDF at this value returns approximately 0.25.

```{r}
pnorm(111.5102,mu,sigma)
```
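
When the probability describes the upper tail, $P(X \ge x)$, the same function can be used with `lower.tail = FALSE` or, equivalently, with the complement probability. A minimal sketch with illustrative variable names:

```{r}
# Hypothetical names; same mu = 125, sigma = 20 setting as the example.
mu_demo    <- 125
sigma_demo <- 20
tail_prob  <- 0.25

# Lower-tail quantile: x with P(X <= x) = 0.25
x_lower <- qnorm(tail_prob, mean = mu_demo, sd = sigma_demo)

# Upper-tail quantile: x with P(X >= x) = 0.25
x_upper <- qnorm(tail_prob, mean = mu_demo, sd = sigma_demo, lower.tail = FALSE)

# Equivalent form using the complement probability.
x_upper_alt <- qnorm(1 - tail_prob, mean = mu_demo, sd = sigma_demo)

c(lower = x_lower, upper = x_upper, upper_alt = x_upper_alt)

# Round trip: pnorm undoes qnorm.
pnorm(x_lower, mean = mu_demo, sd = sigma_demo)
```

Because the normal distribution is symmetric, the two quantiles sit at equal distances on either side of the mean.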

<br>

## **Mini Activities**

<br>

**Notification on changes to the HW and LB assignments:** The mini activities below are designed to be done during lab time. The actual lab exercises are now part of your homework assignment, which will be posted tonight in the [Homeworks](https://reed-statistics.github.io/math141-fall2021/homeworks.html){target="_blank"} page. This means that every week you will have a HW and LB assignment together as one, and it will be due in one week.

### Binomial Distribution

Let the discrete random variable $X$ be the number of successes, where each trial succeeds with probability $p = \frac{1}{3}$, and let $n = 10$ be the number of trials. Assuming the random variable follows a binomial distribution, compute the following probabilities and highlight the appropriate region(s) in the plot.

1. $P(X = 2)$

2. $P(X \le 2)$

3. $P(X \ge 2)$

4. $P(2 \le X \le 4)$

5. $P(X = 2 \cup X = 4)$

6. $P(X = 2 \cap X = 4)$

7. $P(X \le x) = 0.25$

8. $P(X \ge x) = 0.25$

<br>

### Normal Distribution

Suppose that the random variable $X$ follows a normal distribution with $\mu = 10$ and $\sigma=5$. Compute the following probabilities and highlight the appropriate region(s) in the plot.

1. $P(X = 8)$

2. $P(X \le 11)$

3. $P(X \ge 11)$

4. $P(8 \le X \le 11)$

5. $P(X = 8 \cup X = 11)$

6. $P(X = 8 \cap X = 9)$

7. $P(X \le x) = 0.10$

8. $P(X \ge x) = 0.10$

9. What is the corresponding z-score for $P(X = 8)$?

10. What are the corresponding z-scores for $P(8 \le X \le 11)$?

<br>
