Quick Class Notes

Basic Statistics Reference Table

Inference for	Relevant Variables	Population Parameters	Sample Statistics	Theoretical Conditions	Degrees of Freedom	Standard Error (Hypothesis Testing)	Test Statistic	Standard Error (Confidence Interval)	Margin of Error	Confidence Interval
One proportion	One two-level categorical	$p$	$\hat{p}$	1. Independent samples 2. Success-failure ($np \ge 10$ and $n(1-p) \ge 10$)	-	$SE = \sqrt{\frac{p_0(1-p_0)}{n}}$	$Z = \frac{\hat{p} - p_0}{SE}$	$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$	$ME = z^* SE$	$\hat{p} \pm ME$
Difference of two proportions	Two two-level categorical	$p_{A} - p_{B}$	$\hat{p}_{A} - \hat{p}_{B}$	1. Independent samples 2. Success-failure ($n_A p_A \ge 10$, $n_A(1-p_A) \ge 10$, $n_B p_B \ge 10$, and $n_B(1-p_B) \ge 10$)	-	$\hat{p}_{pool} = \frac{\hat{p}_A n_A + \hat{p}_B n_B}{n_A + n_B}$ $SE = \sqrt{\hat{p}_{pool}(1-\hat{p}_{pool})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}$	$Z = \frac{\hat{p}_A - \hat{p}_B - p_0}{SE}$	$SE = \sqrt{\frac{\hat{p}_A (1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B (1-\hat{p}_B)}{n_B}}$	$ME = z^* SE$	$\hat{p}_A - \hat{p}_B \pm ME$
Two-way table	Two categorical with two or more levels	-	-	1. Independent samples 2. Normality (each cell has $\ge 10$ samples)	$k = (C-1)(R-1)$ (where $C$ is the number of columns and $R$ is the number of rows)	-	$\chi_k^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i}$ (where $n$ is the number of cells)	-	-	-
One mean	Response: One numerical	$\mu \longrightarrow \text{mean}$ $\sigma \longrightarrow \text{standard deviation}$	$\bar{x} \longrightarrow \text{mean}$ $s \longrightarrow \text{standard deviation}$	1. Independent samples 2. Normality ($n \ge 30$)	$df = n-1$	$SE = \frac{s}{\sqrt{n}}$ (use $\sigma$ if known)	$T = \frac{\bar{x} - \mu_0}{SE}$	$SE = \frac{s}{\sqrt{n}}$ (use $\sigma$ if known)	$ME = t^*_{df} SE$ (use $z^* SE$ if $\sigma$ is known)	$\bar{x} \pm ME$
Comparing two means	Response: One numerical Explanatory: One two-level categorical	$\mu_{A} - \mu_{B}$	$\bar{x}_{A} - \bar{x}_{B}$	1. Independent samples 2. Normality ($n_A \ge 30$, $n_B \ge 30$)	$df = \min{(n_A - 1, n_B - 1)}$	$SE = \sqrt{\frac{s^2_A}{n_A} + \frac{s^2_B}{n_B}}$ (use $\sigma_A$ and $\sigma_B$ if known)	$T = \frac{\bar{x}_A - \bar{x}_B - \mu_0}{SE}$	$SE = \sqrt{\frac{s^2_A}{n_A} + \frac{s^2_B}{n_B}}$ (use $\sigma_A$ and $\sigma_B$ if known)	$ME = t^*_{df} SE$ (use $z^* SE$ if $\sigma_A$ and $\sigma_B$ are known)	$\bar{x}_{A} - \bar{x}_{B} \pm ME$
Comparing paired means	Response: One numerical (difference of two paired numerical)	$\mu_{diff} \longrightarrow \text{mean}$ $\sigma_{diff} \longrightarrow \text{standard deviation}$	$\bar{x}_{diff} \longrightarrow \text{mean}$ $s_{diff} \longrightarrow \text{standard deviation}$	1. Independent samples 2. Normality ($n \ge 30$)	$df = n_{diff}-1$	$SE = \frac{s_{diff}}{\sqrt{n_{diff}}}$ (use $\sigma_{diff}$ if known)	$T = \frac{\bar{x}_{diff} - \mu_0}{SE}$	$SE = \frac{s_{diff}}{\sqrt{n_{diff}}}$ (use $\sigma_{diff}$ if known)	$ME = t^*_{df} SE$ (use $z^* SE$ if $\sigma_{diff}$ is known)	$\bar{x}_{diff} \pm ME$
Simple linear regression	Response: One numerical Explanatory: One numerical or one two-level categorical	$y = \beta_0 + \beta_1 x$ $\beta_0 \longrightarrow \text{intercept}$ $\beta_1 \longrightarrow \text{slope}$ $\sigma_1 \longrightarrow \text{slope standard deviation}$	$\hat{y} = b_0 + b_1 x$ $b_0 \longrightarrow \text{intercept}$ $b_1 \longrightarrow \text{slope}$ $s_1 \longrightarrow \text{slope standard deviation}$	1. Linearity 2. Independent samples 3. Normal residuals 4. Homoscedasticity	$df = n-k-1$ (where $k$ is the number of explanatory variables, so $k = 1$ here)	`lm` output $SE =$ `std.error` (sample SE)	$T = \frac{b_1 - \text{null-value}}{SE}$	`lm` output $SE =$ `std.error` (sample SE)	$ME = t^*_{df} SE$ (use $z^* SE$ if $\sigma_1$ is known)	$b_1 \pm ME$
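
To see how the table's columns fit together, here is a minimal R sketch that works through the one-proportion row and the one-mean row by hand. All numbers (58 successes out of 100, a sample mean of 52, and so on) are made up for illustration, and the variable names are assumptions rather than anything used in lab.

```r
# --- One proportion (hypothetical data: 58 successes in 100 trials) ---
n     <- 100
p_hat <- 58 / n
p0    <- 0.5                                 # null value

se_test <- sqrt(p0 * (1 - p0) / n)           # SE for the test uses p0
z       <- (p_hat - p0) / se_test            # test statistic, about 1.6
p_value <- 2 * pnorm(-abs(z))                # two-sided p-value

se_ci  <- sqrt(p_hat * (1 - p_hat) / n)      # SE for the interval uses p_hat
z_star <- qnorm(0.975)                       # critical value for 95% confidence
me_p   <- z_star * se_ci                     # margin of error
ci_p   <- c(p_hat - me_p, p_hat + me_p)      # confidence interval

# --- One mean (hypothetical summary statistics) ---
x_bar <- 52
s     <- 10
n_m   <- 36
mu0   <- 50                                  # null value

se_m   <- s / sqrt(n_m)                      # same SE for test and interval
t_stat <- (x_bar - mu0) / se_m               # test statistic, about 1.2
df     <- n_m - 1
t_star <- qt(0.975, df)                      # critical value for 95% confidence
me_m   <- t_star * se_m
ci_m   <- c(x_bar - me_m, x_bar + me_m)
```

Note that the proportion rows use a different standard error for the test (based on the null value) than for the confidence interval (based on the sample statistic), while the mean rows use the same standard error for both.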

R Coding Cheat Sheet

This cheat sheet, a comprehensive summary of the R commands and procedures we covered during lab, will help you with your R coding tasks throughout the course. It was made and provided by your lab assistant, Robin.

Download options: pdf | Rmd

Common Mathematical Symbols and their LaTeX Syntax in RMarkdown

Equal sign: $=$ will render as $=$.
Not equal sign: $\ne$ will render as $\ne$.
Less than sign: $<$ will render as $<$.
Greater than sign: $>$ will render as $>$.
Less than or equal to sign: $\le$ will render as $\le$.
Greater than or equal to sign: $\ge$ will render as $\ge$.
Using hats: $\hat{p}$ will render as $\hat{p}$.
Using bars: $\bar{x}$ will render as $\bar{x}$.
Greek lowercase letter mu: $\mu$ will render as $\mu$.
Greek lowercase letter beta: $\beta$ will render as $\beta$.
The letter p: $p$ will render as $p$.
Subscripting: $p_{A}$ will render as $p_{A}$.
Set union: $\cup$ will render as $\cup$.
Set intersection: $\cap$ will render as $\cap$.
Summation Notation: $\sum_{i=0}^k$ will render as $\sum_{i=0}^k$.
Infinity: $\infty$ will render as $\infty$.
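
As a small illustration of combining several of the symbols above, a pair of hypotheses and a confidence interval for a difference of two proportions could be typed in an RMarkdown document roughly as follows. The particular hypotheses are just an example, not taken from a specific assignment.

```latex
$H_0: p_{A} - p_{B} = 0$

$H_A: p_{A} - p_{B} \ne 0$

$\hat{p}_{A} - \hat{p}_{B} \pm z^* SE$
```

Each line sits between a pair of dollar signs so that it renders as inline math when the document is knitted.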
