Learning Objectives


Upon completing today’s lab activity, students should be able to do the following using R and RStudio:

  1. Compute probabilities using the Student’s t distribution.

  2. Compute probabilities using the chi-squared distribution.

  3. Visualize the probability distributions.


library(tidyverse)
library(ggplot2)
library(gghighlight)
library(openintro)


Student’s t Distribution


The t distribution with df = k degrees of freedom has density \[P(x;k) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k \pi}\, \Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}}\]

for real \(x\), where \(k\) is the degrees of freedom. It has mean 0 (for \(k > 1\)) and variance \(\frac{k}{k-2}\) (for \(k > 2\)).

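Applications: The t distribution is most often used for inference about a mean when the population standard deviation is estimated from the sample, as in t-tests and confidence intervals; its non-central form is used in power calculations for t-tests. Let \(t = \frac{\bar{x} - \mu}{s/\sqrt{n}}\), where \(\bar{x}\) is the sample mean and \(s\) the sample standard deviation (sd) of \(X_1\), \(X_2\), \(\cdots\), \(X_n\), which are independent and identically distributed \(N(\mu, \sigma^2)\). Then \(t\) follows a t distribution with \(n - 1\) degrees of freedom.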

When to use the t distribution: Use the t distribution when the population standard deviation (\(\sigma\)) is unknown and has to be estimated by \(s\). This matters most when the sample size is small (\(n<30\)), because for large \(n\) the t distribution is very close to the normal distribution. General rule: if \(\sigma\) is unknown, the t distribution is appropriate; if \(\sigma\) is known, the normal distribution is appropriate.

R has four built-in functions for working with the Student’s t distribution. They are listed below, followed by descriptions of their arguments.

dt(x, df, ncp) # PDF
pt(q, df, ncp) # CDF
qt(p, df, ncp) # percentiles
rt(n, df, ncp) # simulations
  • x, q are vectors of quantiles.

  • p is a vector of probabilities.

  • n is the number of observations. If length(n) > 1, the length is taken to be the number required.

  • df is the degrees of freedom (\(> 0\), may be non-integer). df = Inf is allowed.

  • lower.tail is a logical value used by pt() and qt(). If TRUE (the default), probabilities are \(P(X \le x)\); otherwise they are \(P(X > x)\).

  • ncp is the non-centrality parameter delta; except for rt(), it is currently supported only for abs(ncp) <= 37.62. If omitted, the central t distribution is used.

t Distribution - dt

# Creating the vector with the x-values for dt function
x_dt <- seq(-5, 5, by = 0.01)

# Applying the dt() function
y_dt <- dt(x_dt, df = 3)

# Plotting
plot(x_dt, y_dt, type = "l", main = "t-distribution density function", las = 1)

t Distribution - pt

# Creating the vector with the x-values for pt function
x_pt <- seq(-5, 5, by = 0.01)

# Applying the pt() function
y_pt <- pt(x_pt, df = 3)

# Plotting
plot(x_pt, y_pt, type = "l", main = "t-distribution cumulative function", las = 1)

t Distribution - qt

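# Find the critical t value for a 95% confidence interval with df = 15:
# the value of t with 2.5% of the probability in the lower tail
# (by symmetry, the upper critical value is its absolute value)

qt(p = 0.025, df = 15, lower.tail = TRUE)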
## [1] -2.13145


Chi-Squared Distribution


The chi-squared distribution with \(df = k \ge 0\) degrees of freedom has density

\[P(x;k) = \frac{1}{2^{\frac{k}{2}} \Gamma\left(\frac{k}{2}\right)} x^{k/2 -1} e^{-x/2}\] for \(x > 0\), where \(k\) is the degrees of freedom. This distribution has expected value \(k\) and variance \(2k\). The chi-squared test statistic is computed as \[\chi^2_{k} = \sum_i \frac{(O_i - E_i)^2}{E_i}\] where \(k\) is the degrees of freedom, \(O_i\) are the observed counts, and \(E_i\) are the expected counts under the null hypothesis.

Applications: The chi-squared test statistic can be used to determine whether there is an association between the row and column variables of a contingency table. For example, it can be used to assess whether the proportion of subjects with a risk factor of interest differs between study groups.

When to use the Chi-Squared distribution: Use it for statistical tests whose test statistic follows a chi-squared distribution under the null hypothesis. The chi-squared goodness-of-fit test and the chi-squared test of independence are two typical examples.

R has four built-in functions for working with the chi-squared distribution. They are listed below, followed by descriptions of their arguments.

dchisq(x, df) # PDF
pchisq(q, df) # CDF
qchisq(p, df) # percentiles
rchisq(n, df) # simulations
  • x, q are vectors of quantiles.

  • p is a vector of probabilities.

  • n is the number of observations. If length(n) > 1, the length is taken to be the number required.

  • df is the degrees of freedom (non-negative, may be non-integer).

Chi-Square Distribution - dchisq

# Creating the vector with the x-values for dchisq function
x_dchisq <- seq(0, 10, by = 0.01)

# Applying the dchisq() function
y_dchisq <- dchisq(x_dchisq, df = 3)

# Plotting 
plot(x_dchisq, y_dchisq, type = "l", main = "Chi-Squared distribution density function with df = 3", las = 1)

Chi-Square Distribution - pchisq

# Creating the vector with the x-values for pchisq function
x_pchisq <- seq(0, 10, by = 0.01)

# Applying the pchisq() function
y_pchisq <- pchisq(x_pchisq, df = 3)

# Plotting 
plot(x_pchisq, y_pchisq, type = "l", main = "Chi-Square cumulative function with df = 3", las = 1)


Mini Activities


Student’s t distributions

Use the functions dt for the density function and pt for the cumulative function to plot, on the same graph, the distributions with degrees of freedom 4, 6, 8, 10, 12, and 14. Make sure to label the curves properly with legends. You can use the skills you have learned with data frames and ggplot here. What do you observe in the curves as you increase the degrees of freedom?

Chi-Square distributions

Use the functions dchisq for the density function and pchisq for the cumulative function to plot, on the same graph, the distributions with degrees of freedom 5, 15, and 30. Make sure to label the curves properly with legends. You can use the skills you have learned with data frames and ggplot here. What do you observe in the curves as you increase the degrees of freedom?


---
title: "5 - Statistical Models Part II"
author: "Alex John Quijano"
date: "11/02/2021"
output: openintro::lab_report
---

## **Learning Objectives**

<br>

Upon completing today's lab activity, students should be able to do the following using R and RStudio:
  
  1. Compute probabilities using the Student's t distribution.
  
  2. Compute probabilities using the chi-squared distribution.
  
  3. Visualize the probability distributions.

<br>

```{r echo=TRUE, message=FALSE}
library(tidyverse)
library(ggplot2)
library(gghighlight)
library(openintro)
```

<br>

## **Student's t Distribution**

<br>

The t distribution with df = k degrees of freedom has density
$$P(x;k) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k \pi}\, \Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}}$$

for real $x$, where $k$ is the degrees of freedom. It has mean 0 (for $k > 1$) and variance $\frac{k}{k-2}$ (for $k > 2$).
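
As a quick sanity check, the density formula can be coded by hand and compared with R's built-in `dt()`. The sketch below is only illustrative: `t_density` is a hypothetical helper name, and `k = 5` is an arbitrary choice.

```{r}
# Hand-coded t density following the formula above
# (t_density is a hypothetical helper, not a built-in R function)
t_density <- function(x, k) {
  gamma((k + 1) / 2) / (sqrt(k * pi) * gamma(k / 2)) *
    (1 + x^2 / k)^(-(k + 1) / 2)
}

k <- 5                      # arbitrary degrees of freedom for illustration
x <- seq(-3, 3, by = 0.5)

# Should be TRUE: the formula agrees with dt()
isTRUE(all.equal(t_density(x, k), dt(x, df = k)))

# Numerical variance (the mean is 0) compared with k / (k - 2)
var_numeric <- integrate(function(x) x^2 * dt(x, df = k), -Inf, Inf)$value
c(numeric = var_numeric, formula = k / (k - 2))
```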

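**Applications:** The t distribution is most often used for inference about a mean when the population standard deviation is estimated from the sample, as in t-tests and confidence intervals; its non-central form is used in power calculations for t-tests. Let $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, where $\bar{x}$ is the sample mean and $s$ the sample standard deviation (sd) of $X_1$, $X_2$, $\cdots$, $X_n$, which are independent and identically distributed $N(\mu, \sigma^2)$. Then $t$ follows a t distribution with $n - 1$ degrees of freedom.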
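
The t statistic can be computed directly and checked against `t.test()`, which refers it to a t distribution with $n - 1$ degrees of freedom. This is a minimal sketch on simulated data; the sample size, mean, standard deviation, and seed are arbitrary assumptions made for illustration.

```{r}
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rnorm(12, mean = 10, sd = 2)    # simulated sample (assumed values)
mu0 <- 10                            # hypothesized mean

n <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

# t.test() should report the same statistic and p-value
res <- t.test(x, mu = mu0)
c(by_hand = t_stat, t_test = unname(res$statistic))
c(by_hand = p_val, t_test = res$p.value)
```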

**When to use the t distribution:** Use the t distribution when the population standard deviation ($\sigma$) is unknown and has to be estimated by $s$. This matters most when the sample size is small ($n<30$), because for large $n$ the t distribution is very close to the normal distribution. General rule: if $\sigma$ is unknown, the t distribution is appropriate; if $\sigma$ is known, the normal distribution is appropriate.
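
To see why the sample size matters, one can compare the 97.5th percentiles of the t and standard normal distributions. The sample sizes below are arbitrary choices for illustration.

```{r}
n <- c(5, 15, 30, 100, 1000)   # sample sizes chosen for illustration

crit <- data.frame(
  n      = n,
  t_crit = qt(0.975, df = n - 1),  # 97.5th percentile of t with n - 1 df
  z_crit = qnorm(0.975)            # 97.5th percentile of the standard normal
)
crit
```

The t critical value is always larger than the normal one, and the gap shrinks as $n$ grows.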

R has four built-in functions for working with the Student's t distribution. They are listed below, followed by descriptions of their arguments.

```
dt(x, df, ncp) # PDF
pt(q, df, ncp) # CDF
qt(p, df, ncp) # percentiles
rt(n, df, ncp) # simulations
```

  * `x`, `q` are vectors of quantiles.

  * `p` is a vector of probabilities.

  * `n` is the number of observations. If `length(n) > 1`, the length is taken to be the number required.

  * `df` is the degrees of freedom ($> 0$, may be non-integer). `df = Inf` is allowed.

  * `lower.tail` is a logical value used by `pt()` and `qt()`. If `TRUE` (the default), probabilities are $P(X \le x)$; otherwise they are $P(X > x)$.
  
  * `ncp` is the non-centrality parameter delta; except for `rt()`, it is currently supported only for `abs(ncp) <= 37.62`. If omitted, the central t distribution is used.
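
The four functions are closely related, as the following sketch suggests. The degrees of freedom, cut-off, seed, and number of draws are arbitrary values chosen for illustration.

```{r}
k <- 10                                # arbitrary degrees of freedom

dt(0, df = k)                          # density at x = 0
pt(1.5, df = k)                        # P(T <= 1.5)
qt(pt(1.5, df = k), df = k)            # qt() inverts pt(), giving back 1.5

# The CDF is the integral of the density up to 1.5
area <- integrate(dt, -Inf, 1.5, df = k)$value

set.seed(42)                           # arbitrary seed
draws <- rt(10000, df = k)             # 10,000 simulated t values
mean(draws <= 1.5)                     # should be close to pt(1.5, df = k)
```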

### t Distribution - `dt`

```{r}
# Creating the vector with the x-values for dt function
x_dt <- seq(-5, 5, by = 0.01)

# Applying the dt() function
y_dt <- dt(x_dt, df = 3)

# Plotting
plot(x_dt, y_dt, type = "l", main = "t-distribution density function", las = 1)
```
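
For comparison, the same density can be overlaid on the standard normal density. This is a small base R sketch; the line types and legend labels are arbitrary choices.

```{r}
x <- seq(-5, 5, by = 0.01)
y_t <- dt(x, df = 3)        # t density with 3 degrees of freedom
y_norm <- dnorm(x)          # standard normal density

# Draw the normal curve first because it has the higher peak
plot(x, y_norm, type = "l", lty = 2, las = 1,
     xlab = "x", ylab = "density",
     main = "t (df = 3) versus standard normal")
lines(x, y_t)
legend("topright", legend = c("t, df = 3", "standard normal"), lty = c(1, 2))
```

The t curve has a lower peak and heavier tails than the normal curve.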

### t Distribution - `pt`

```{r}
# Creating the vector with the x-values for pt function
x_pt <- seq(-5, 5, by = 0.01)

# Applying the pt() function
y_pt <- pt(x_pt, df = 3)

# Plotting
plot(x_pt, y_pt, type = "l", main = "t-distribution cumulative function", las = 1)
```
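
Beyond plotting, `pt()` is the function to use for computing probabilities. The following sketch shows lower-tail, upper-tail, and interval probabilities; the degrees of freedom and the cut-off 1.2 are arbitrary values chosen for illustration.

```{r}
k <- 8   # arbitrary degrees of freedom for illustration

# P(T <= 1.2): lower tail (the default)
p_lower <- pt(1.2, df = k)

# P(T > 1.2): upper tail
p_upper <- pt(1.2, df = k, lower.tail = FALSE)

# P(-1.2 < T < 1.2): difference of two CDF values
p_between <- pt(1.2, df = k) - pt(-1.2, df = k)

c(lower = p_lower, upper = p_upper, between = p_between)
```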

### t Distribution - `qt`

```{r}
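# Find the critical t value for a 95% confidence interval with df = 15:
# the value of t with 2.5% of the probability in the lower tail
# (by symmetry, the upper critical value is its absolute value)

qt(p = 0.025, df = 15, lower.tail = TRUE)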
```
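
A critical value like this one is typically used to build a confidence interval for a mean. The sketch below uses simulated data with 16 observations, so that the degrees of freedom are 15; the data-generating values and the seed are assumptions made for illustration.

```{r}
set.seed(2)                          # arbitrary seed
x <- rnorm(16, mean = 50, sd = 5)    # simulated sample of size 16 (assumed values)
n <- length(x)                       # df = n - 1 = 15

t_crit <- qt(0.975, df = n - 1)      # upper critical value, about 2.13
margin <- t_crit * sd(x) / sqrt(n)   # margin of error
ci <- c(lower = mean(x) - margin, upper = mean(x) + margin)
ci

# t.test() reports the same 95% interval
t.test(x)$conf.int
```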

<br>

## **Chi-Squared Distribution**

<br>

The chi-squared distribution with $df = k \ge 0$ degrees of freedom has density

$$P(x;k) = \frac{1}{2^{\frac{k}{2}} \Gamma\left(\frac{k}{2}\right)} x^{k/2 -1} e^{-x/2}$$
for $x > 0$, where $k$ is the degrees of freedom. This distribution has expected value $k$ and variance $2k$. The chi-squared test statistic is computed as 
$$\chi^2_{k} = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where $k$ is the degrees of freedom, $O_i$ are the observed counts, and $E_i$ are the expected counts under the null hypothesis.
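
The statistic can be computed by hand and compared with `chisq.test()`. The counts below are made up for illustration, and the expected counts assume that all five categories are equally likely.

```{r}
observed <- c(18, 22, 20, 25, 15)                  # made-up counts in 5 categories
expected <- rep(sum(observed) / 5, 5)              # equal expected counts: 20 each

chi_sq <- sum((observed - expected)^2 / expected)  # 2.9 for these counts
k <- length(observed) - 1                          # degrees of freedom

# Upper-tail probability of a statistic at least this large
p_val <- pchisq(chi_sq, df = k, lower.tail = FALSE)

# chisq.test() with its default equal probabilities gives the same result
res <- chisq.test(observed)
c(by_hand = chi_sq, chisq_test = unname(res$statistic))
c(by_hand = p_val, chisq_test = res$p.value)
```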

**Applications:** The chi-squared test statistic can be used to determine whether there is an association between the row and column variables of a contingency table. For example, it can be used to assess whether the proportion of subjects with a risk factor of interest differs between study groups.
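
As an illustration of the contingency-table case, the sketch below tests independence in a made-up 2 x 2 table. The counts and the group and column labels are hypothetical.

```{r}
# Made-up 2 x 2 table: rows are study groups, columns are risk-factor status
tab <- matrix(c(30, 20,
                15, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              risk_factor = c("yes", "no")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

chi_sq <- sum((tab - expected)^2 / expected)
k <- (nrow(tab) - 1) * (ncol(tab) - 1)             # degrees of freedom = 1
p_val <- pchisq(chi_sq, df = k, lower.tail = FALSE)

# correct = FALSE turns off the continuity correction so the values match
res <- chisq.test(tab, correct = FALSE)
c(by_hand = chi_sq, chisq_test = unname(res$statistic))
c(by_hand = p_val, chisq_test = res$p.value)
```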

**When to use the Chi-Squared distribution:** Use it for statistical tests whose test statistic follows a chi-squared distribution under the null hypothesis. The chi-squared goodness-of-fit test and the chi-squared test of independence are two typical examples.

R has four built-in functions for working with the chi-squared distribution. They are listed below, followed by descriptions of their arguments.

```
dchisq(x, df) # PDF
pchisq(q, df) # CDF
qchisq(p, df) # percentiles
rchisq(n, df) # simulations
```

  * `x`, `q` are vectors of quantiles.

  * `p` is a vector of probabilities.

  * `n` is the number of observations. If `length(n) > 1`, the length is taken to be the number required.

  * `df` is the degrees of freedom (non-negative, may be non-integer).
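
A short sketch of how these functions fit together: simulated values from `rchisq()` should have mean close to the degrees of freedom and variance close to twice the degrees of freedom, and `qchisq()` inverts `pchisq()`. The degrees of freedom, seed, and number of draws are arbitrary choices.

```{r}
k <- 4                                   # arbitrary degrees of freedom

set.seed(3)                              # arbitrary seed
draws <- rchisq(100000, df = k)          # simulated chi-squared values
c(mean = mean(draws), variance = var(draws))   # should be near k and 2k

# qchisq() inverts pchisq()
q95 <- qchisq(0.95, df = k)              # 95th percentile
pchisq(q95, df = k)                      # gives back 0.95
```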

### Chi-Square Distribution - `dchisq`

```{r}
# Creating the vector with the x-values for dchisq function
x_dchisq <- seq(0, 10, by = 0.01)

# Applying the dchisq() function
y_dchisq <- dchisq(x_dchisq, df = 3)

# Plotting 
plot(x_dchisq, y_dchisq, type = "l", main = "Chi-Squared distribution density function with df = 3", las = 1)
```

### Chi-Square Distribution - `pchisq`

```{r}
# Creating the vector with the x-values for pchisq function
x_pchisq <- seq(0, 10, by = 0.01)

# Applying the pchisq() function
y_pchisq <- pchisq(x_pchisq, df = 3)

# Plotting 
plot(x_pchisq, y_pchisq, type = "l", main = "Chi-Square cumulative function with df = 3", las = 1)
```
<br>
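
Probabilities for the chi-squared distribution are computed with `pchisq()` in the same way as with `pt()`. The cut-offs 2 and 7.81 below are arbitrary values chosen for illustration; 7.81 is roughly the 95th percentile when df = 3.

```{r}
k <- 3   # same degrees of freedom as the plots above

# P(X <= 2): lower tail (the default)
p_lower <- pchisq(2, df = k)

# P(X > 7.81): upper tail, the kind of probability used as a p-value
p_upper <- pchisq(7.81, df = k, lower.tail = FALSE)

# P(2 < X <= 7.81): difference of two CDF values
p_between <- pchisq(7.81, df = k) - pchisq(2, df = k)

c(lower = p_lower, upper = p_upper, between = p_between)
```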

## **Mini Activities**

<br>

### Student's t distributions

Use the functions `dt` for the density function and `pt` for the cumulative function to plot, on the same graph, the distributions with degrees of freedom 4, 6, 8, 10, 12, and 14. Make sure to label the curves properly with legends. You can use the skills you have learned with data frames and ggplot here. What do you observe in the curves as you increase the degrees of freedom?

### Chi-Square distributions

Use the functions `dchisq` for the density function and `pchisq` for the cumulative function to plot, on the same graph, the distributions with degrees of freedom 5, 15, and 30. Make sure to label the curves properly with legends. You can use the skills you have learned with data frames and ggplot here. What do you observe in the curves as you increase the degrees of freedom?

<br>
