5 - Introduction to Hypothesis Testing

Alex John Quijano

09/27/2021

Previously on Statistics…

In the past lectures, we learned about the following:

Introduction to Hypothesis Testing

In this lecture, we will learn about the following:

Case Study - Sex Discrimination


Research Question
“Are individuals who identify as female discriminated against in promotion decisions made by their managers who identify as male?”


Sources:

Case Study: Sex Discrimination - The Setup (1/2)

Sex Discrimination - The Setup (2/2)

Sex Discrimination - The Data (1/2)

48 cards are laid out: 24 indicating male files and 24 indicating female files. Of the male files, 3 of the cards are colored white and 21 are colored red. Of the female files, 10 of the cards are colored white and 14 are colored red.

The sex discrimination study can be thought of as 48 red and white cards.

Summary results for the sex discrimination study.

                   decision
sex       promoted   not promoted   Total
male            21              3      24
female          14             10      24
Total           35             13      48
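
As a quick check, the summary table can be rebuilt from 48 individual records, one per personnel file. This is only an illustrative sketch; the names `files`, `counts`, `row_totals`, and `col_totals` are assumptions introduced here and are not part of the lecture.

```python
from collections import Counter

# One (sex, decision) record per personnel file, matching the summary table.
files = (
    [("male", "promoted")] * 21
    + [("male", "not promoted")] * 3
    + [("female", "promoted")] * 14
    + [("female", "not promoted")] * 10
)

# Cell counts of the 2x2 table.
counts = Counter(files)

# Row totals (by sex) and column totals (by decision).
row_totals = Counter(sex for sex, _ in files)
col_totals = Counter(decision for _, decision in files)

for sex in ("male", "female"):
    print(sex, counts[(sex, "promoted")], counts[(sex, "not promoted")], row_totals[sex])
print("Total", col_totals["promoted"], col_totals["not promoted"], len(files))
```

Storing one record per file, rather than only the four cell counts, mirrors the "48 cards" picture and makes later shuffling straightforward.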

Sex Discrimination - The Data (2/2)


Sex Discrimination - The Null and Alternative Hypothesis

The difference in promotion rates is 87.5% \(-\) 58.3% \(=\) 29.2%. This observed difference is what we call a point estimate of the true difference.

A point estimate is a sample statistic: a single value, computed from the sample, that serves as an estimate of the corresponding population parameter.
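
The arithmetic behind the 29.2% point estimate can be sketched as follows. The counts come from the summary table of the study; the variable names are assumptions chosen for illustration.

```python
# Counts taken from the summary table of the study.
promoted_male, total_male = 21, 24
promoted_female, total_female = 14, 24

# Sample proportions of promoted files in each group.
p_hat_male = promoted_male / total_male        # 0.875
p_hat_female = promoted_female / total_female  # about 0.583

# Point estimate: the observed difference in promotion rates.
observed_diff = p_hat_male - p_hat_female

print(f"{observed_diff:.3f}")  # prints 0.292
```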

Sex Discrimination - Mathematical Symbols

These hypotheses are part of what is called a hypothesis test, a statistical technique used to evaluate competing claims using data.

The null hypothesis assumes that any differences seen are due to the variability inherent in the population and could have occurred by random chance.
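
In symbols, the two competing claims might be written as below. The notation is an assumption made here for illustration: \(p_M\) and \(p_F\) denote the true promotion rates for files labeled male and female, and \(\hat{p}_M\) and \(\hat{p}_F\) denote the corresponding sample proportions. The one-sided alternative is chosen to match the research question, which asks specifically about discrimination against female candidates.

\[
H_0: p_M - p_F = 0 \qquad \text{versus} \qquad H_A: p_M - p_F > 0
\]

\[
\hat{p}_M - \hat{p}_F = \frac{21}{24} - \frac{14}{24} = \frac{7}{24} \approx 0.292
\]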

Sex Discrimination - Independence

Sex Discrimination - Simulations (1/4)

We simulate a world with no sex discrimination, which means that the true difference between the proportions of males and females who were promoted is zero.

Sex Discrimination - Simulations (2/4)

The 48 red and white cards which denote the original data are shuffled and reassigned, 24 to each group indicating 24 male files and 24 female files.

The sex discrimination data is shuffled and reallocated to new groups of male and female files.

Sex Discrimination - Simulations (3/4)

The 48 red and white cards are shown in three panels. The first panel represents the original data and original allocation of the male and female files (in the original data there are 3 white cards in the male group and 10 white cards in the female group). The second panel represents the shuffled red and white cards that are randomly assigned as male and female files. The third panel has the cards sorted according to the random assignment of female or male. In the third panel there are 6 white cards in the male group and 7 white cards in the female group.

We summarize the randomized data to produce one estimate of the difference in proportions given no sex discrimination. Note that the sort step is only used to make it easier to visually calculate the simulated sample proportions.
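
One shuffle of the cards can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function name `shuffled_difference`, the use of Python's `random` module, and the seed value are all assumptions.

```python
import random

# 48 cards: 35 red ("promoted") and 13 white ("not promoted").
cards = ["promoted"] * 35 + ["not promoted"] * 13

def shuffled_difference(rng):
    """Shuffle the cards, deal 24 to each group, and return the
    simulated difference in promotion rates (male minus female)."""
    deck = cards[:]              # copy so the original deck is left untouched
    rng.shuffle(deck)
    male, female = deck[:24], deck[24:]
    return male.count("promoted") / 24 - female.count("promoted") / 24

rng = random.Random(2021)        # seed chosen arbitrarily, for reproducibility
print(shuffled_difference(rng))
```

For the shuffle pictured in the third panel, 6 white cards in the male group and 7 in the female group correspond to 18/24 \(-\) 17/24 \(\approx\) 0.042, a much smaller difference than the observed 0.292.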

Sex Discrimination - Simulations (4/4)

Sex Discrimination - Observed vs. null statistics (1/2)

Regarding the distribution of the differences in proportions:

Sex Discrimination - Observed vs. null statistics (2/2)

The difference of 29.2% (observed statistic) is a rare event if there really is no impact from listing sex in the candidates’ files, which provides us with two possible interpretations of the study results:
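
How rare the observed difference is under the independence model can be estimated by repeating the shuffle many times and counting how often the simulated difference is at least as large as 29.2%. The sketch below is an assumption-laden illustration: the function name `simulate_differences`, the number of simulations, and the seed are hypothetical choices, not values from the lecture.

```python
import random

def simulate_differences(n_sims, seed=0):
    """Return n_sims simulated differences in promotion rates
    (male minus female) under the independence model."""
    rng = random.Random(seed)
    deck = ["promoted"] * 35 + ["not promoted"] * 13
    diffs = []
    for _ in range(n_sims):
        rng.shuffle(deck)                         # reshuffle the 48 cards
        promoted_male = deck[:24].count("promoted")
        promoted_female = 35 - promoted_male      # remaining promoted cards
        diffs.append((promoted_male - promoted_female) / 24)
    return diffs

observed_diff = 21 / 24 - 14 / 24
diffs = simulate_differences(10_000)

# Count simulated differences at least as large as the observed one;
# the small tolerance guards against floating-point round-off.
as_extreme = sum(d >= observed_diff - 1e-9 for d in diffs)
print(as_extreme / len(diffs))
```

The printed fraction is the share of simulated worlds with no discrimination that produce a difference as large as the one observed; a small value is what makes the observed statistic a rare event under the null hypothesis.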

Sex Discrimination - Next Steps

Statistical Inference


Statistical inference is the practice of making decisions and conclusions from data in the context of uncertainty.

Note: Uncertainty cannot always be quantified precisely, but it can be estimated. Uncertainty is an estimate of the error present in the data, and it tells us how confident we can be in our estimates about the population.

Summary

In this lecture we talked about:

In the next lecture,

Today’s Activity

Within your group, discuss the answers for the following problem.

Hypotheses. For each of the research statements below, note whether it represents a null hypothesis or an alternative hypothesis. This exercise is from your textbook, OpenIntro: IMS, Section 11.5.

  1. The number of hours that grade-school children spend doing homework predicts their future success on standardized tests.

  2. King cheetahs on average run the same speed as standard spotted cheetahs.

  3. For a particular student, the probability of correctly answering a 5-option multiple choice test is larger than 0.2 (i.e., better than guessing).

  4. The mean length of African elephant tusks has changed over the last 100 years.

  5. The risk of facial clefts is equal for babies born to mothers who take folic acid supplements compared with those from mothers who do not.

  6. Caffeine intake during pregnancy affects mean birth weight.

  7. The probability of getting in a car accident is the same when using a cell phone as when not using a cell phone.