class: center, middle

## Probabilities and Random Variables

<img src="img/DAW.png" width="450px"/>

<span style="color: #91204D;">
.large[Kelly McConville | Math 141 | Week 10 | Fall 2020]
</span>

---

## Announcements/Reminders

* Midterm Exam: Go through it, fix your mistakes, and then come see Jonathan, Tom, or me to talk through any problems where you still aren't sure.

---

## Week 9 Topics

* Probability Theory
* Statistical Inference: Theoretical Distributions

*********************************************

### Goals for Today

* Conditional probabilities
* Random variables

---

### Ethics

* Let's return to the ASA's ["Ethical Guidelines for Statistical Practice"](https://www.amstat.org/ASA/Your-Career/Ethical-Guidelines-for-Statistical-Practice.aspx).

--

* Come back to the sub-header "Responsibilities to Research Subjects"

--

* Reflect on data showing racial disparities in rates of COVID-19 cases and deaths
    + Many reports showing higher impacts for Black and Latino individuals

---

class: inverse, center, middle

## Responsibilities to Research Subjects

> "The ethical statistician protects and respects the rights and interests of human and animal subjects at all stages of their involvement in a project. This includes respondents to the census or to surveys, those whose data are contained in administrative records, and subjects of physically or psychologically invasive research."

---

## Responsibilities to Research Subjects

> "Recognizes any statistical descriptions of groups may carry risks of stereotypes and stigmatization. Statisticians should contemplate, and be sensitive to, the manner in which information is framed to avoid disproportionate harm to vulnerable groups."

--

### Example: Racial disparities in rates of COVID-19

* Many media outlets noted higher disparities for racial minority groups.

--

* There has been a call for data to be released with more demographic detail.
--

> "It is equally important, however, that in documenting Covid-19 racial disparities, we contextualize such data with adequate analysis. Disparity figures without explanatory context can perpetuate harmful myths and misunderstandings that actually undermine the goal of eliminating health inequities." -- [Merlin Chowkwanyun and Adolph Reed](https://www.nejm.org/doi/full/10.1056/NEJMp2012910)

---

### Example: Racial disparities in rates of COVID-19

Chowkwanyun and Reed point out:

1. Race is a social construct, but Lundy Braun, Professor of Pathology and Laboratory Medicine as well as Africana Studies, has found medical discourse that continues to assume biological differences.

--

2. "Lone disparity figures can give rise to explanations grounded in racial stereotypes about behavioral patterns."

--

3. There are dangers, such as repressive forms of surveillance, in providing data at a fine-grained geographic level.

--

4. The erroneous perception that certain social problems are "racial" and so only concern certain groups has been used to justify budget cuts.

Need to provide the data WITH the following context:

a. Socioeconomic status data

--

b. The role of stress from external sources, like racial discrimination

--

c. Spatial resource deficits
    + Concentrations of respiratory hazards
    + Uneven distribution of medical care facilities

---

### Probability Recap

**Goal**: Want to approximate sampling distributions (and bootstrap distributions and null distributions) with theoretical probability models.

<img src="wk10_mon_files/figure-html/unnamed-chunk-1-1.png" width="360" style="display: block; margin: auto;" />

---

### Probability Recap

**Random process**: the outcome is uncertain.

The **probability** of an outcome is the "long-run proportion" of times the outcome occurs.
* `\(P(\mbox{event})\)`

#### Useful properties of probabilities:

(1) `\(0 \leq P(\mbox{event}) \leq 1\)`

(2) If two events are disjoint (have no outcomes in common), then

$$ P(\mbox{event 1 or event 2}) = P(\mbox{event 1}) + P(\mbox{event 2}). $$

(3) Complement Rule

$$ P(\mbox{event}) = 1 - P(\mbox{not that event}) = 1 - P(\mbox{event}^c) $$

---

### Conditional Probabilities

> "Conditioning is the soul of statistics." — Joe Blitzstein, Professor of the Practice, Harvard University

--

**Question**: What do we mean by "conditioning"?

--

Most polar bears are twins. Therefore, if you're a twin, you're probably a polar bear.

--

* P(twin given polar bear) `\(\neq\)` P(polar bear given twin)

--

The p-value is a conditional probability.

--

* P-value = P(data given `\(H_o\)`) ( `\(\neq\)` P( `\(H_o\)` given data))

--

**Other favorite examples:**

* P(have COVID given wear mask) `\(\neq\)` P(wear mask given have COVID)
    + In a CDC study, P(wear mask given have COVID) = 0.71 while P(have COVID given wear mask) is much lower.

--

* Portland example: P(it is raining given there are clouds directly overhead) `\(\neq\)` P(there are clouds directly overhead given it is raining)

---

### Random Variables

A **random variable** (RV) is a random process that **takes on numerical values**.

--

* Discrete RV: Takes on discrete values (countable number of possible values)
    + EX: 0, 1, 2, 3, ...

* Continuous RV: Can take on any value in an interval

--

* Random variables have **probability functions** that tell us the likelihood of specific values.

* For a discrete RV, the probability function is:

$$ p(x) = P(X = x) $$

where `\(\sum p(x) = 1\)`.
--

* Example: X = # when you roll a die

---

### Random Variables

For a random variable, we care about its:

--

* Probability function: `\(p(x) = P(X = x)\)`

--

* Center: Mean of a RV:

$$ \mu = \sum x p(x) $$

--

* Spread: Variance of a RV:

$$ \sigma^2 = \sum (x - \mu)^2 p(x) $$

And, standard deviation of a RV:

$$ \sigma = \sqrt{ \sum (x - \mu)^2 p(x)} $$

--

**Example**: What are the mean and variance of `\(X\)` = # when you roll a die?

* How do these measures relate to `\(\bar{x}\)` and `\(s^2\)`?

---

class: inverse, middle, center

### Let's practice working with probabilities!
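
---

### Random Variables

For the die example, the mean and variance formulas can be evaluated directly. The following is a minimal sketch in Python. The language choice, the assumption that the die is fair and six-sided, the seed, the number of simulated rolls, and all variable names are illustrative and not taken from the course materials. It computes `\(\mu\)` and `\(\sigma^2\)` from `\(p(x)\)`, then simulates rolls to compare them with `\(\bar{x}\)` and `\(s^2\)`.

```python
from fractions import Fraction
import random
import statistics

# Probability function p(x) for a fair six-sided die (fairness is an assumption).
p = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(p.values()) == 1  # probabilities must sum to 1

# Mean: mu = sum of x * p(x)
mu = sum(x * px for x, px in p.items())

# Variance: sigma^2 = sum of (x - mu)^2 * p(x)
sigma2 = sum((x - mu) ** 2 * px for x, px in p.items())

# Standard deviation: square root of the variance
sigma = float(sigma2) ** 0.5

# Simulate many rolls and compute the sample mean and sample variance.
random.seed(141)  # arbitrary seed so the simulation is reproducible
rolls = [random.randint(1, 6) for _ in range(10_000)]
x_bar = statistics.mean(rolls)
s2 = statistics.variance(rolls)  # sample variance (divides by n - 1)

print("mu =", mu, " sigma^2 =", sigma2, " sigma =", round(sigma, 3))
print("x_bar =", round(x_bar, 3), " s^2 =", round(s2, 3))
```

Under these assumptions, `\(\mu = 7/2\)` and `\(\sigma^2 = 35/12\)` are fixed properties of the probability model, while `\(\bar{x}\)` and `\(s^2\)` are computed from one particular set of rolls. With many rolls they should land close to `\(\mu\)` and `\(\sigma^2\)`, but they change from sample to sample.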