Alex John Quijano
09/29/2021
In the previous lecture, we learned about the following:
Formulating null and alternative hypotheses using the gender discrimination study.
The idea of a randomization test (also called a permutation test) to produce a probability distribution under the null hypothesis.
In this lecture, we will learn about:
Another example of hypothesis testing on a difference in proportions.
More details on the randomization test.
An introduction to the p-value.
A hypothesis test is a formal technique for evaluating two competing possibilities.
Null and alternative hypotheses.
The null hypothesis \((H_0)\) often represents either a skeptical perspective or a claim of “no difference” to be tested.
The alternative hypothesis \((H_A)\) represents an alternative claim under consideration and is often represented by a range of possible values for the value of interest.
We discussed a study from the 1970s that explored whether there was strong evidence that female candidates were less likely to be promoted than male candidates.
The research question, "Are female candidates discriminated against in promotion decisions?", was framed in the context of hypotheses:
\(H_0:\) Sex has no effect on promotion decisions.
\(H_A:\) Female candidates are discriminated against in promotion decisions.
The data on gender discrimination provided a point estimate of a 29.2% difference in recommended promotion rates between male and female candidates.
We determined that such a difference from chance alone, assuming the null hypothesis was true, would be “rare”: it would only happen about 2 in 100 times.
When results like these are inconsistent with \(H_0,\) we reject \(H_0\) in favor of \(H_A.\) Here, we concluded there was discrimination against female candidates.
We summarize the randomized data to produce one estimate of the difference in proportions given no sex discrimination. Note that the sort step is only used to make it easier to visually calculate the simulated sample proportions.
The p-value is the probability of observing data at least as favorable to the alternative hypothesis as our current dataset, if the null hypothesis were true.
We typically use a summary statistic of the data, such as a difference in proportions, to help compute the p-value and evaluate the hypotheses.
This summary value that is used to compute the p-value is often called the test statistic.
In the sex discrimination study, the difference in promotion rates between male and female candidates was our test statistic.
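In a randomization test, the p-value is usually approximated from the simulations themselves. As a sketch only, where \(N\) (the number of shuffles), \(\hat{d}_{obs}\) (the observed test statistic), and \(\hat{d}_{i}\) (the test statistic from shuffle \(i\)) are notation assumed here rather than taken from the lecture, and where larger values are taken to be more favorable to \(H_A\):

\[\text{p-value} \approx \frac{\text{number of shuffles with } \hat{d}_{i} \geq \hat{d}_{obs}}{N}\]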
We are interested in whether reminding students that money not spent now can be spent later causes them to be a little thriftier.
A skeptic might think that such a reminder would have no impact.
We can summarize the two different perspectives using the null and alternative hypothesis framework.
\(H_0:\) Null hypothesis. Reminding students that they can save money for later purchases will not have any impact on students’ spending decisions.
\(H_A:\) Alternative hypothesis. Reminding students that they can save money for later purchases will reduce the chance they will continue with a purchase.
One-hundred and fifty students were recruited for the study, and each was given the following statement:
Imagine that you have been saving some extra money on the side to make some purchases, and on your most recent visit to the video store you come across a special sale on a new video. This video is one with your favorite actor or actress, and your favorite type of movie (such as a comedy, drama, thriller, etc.). This particular video that you are considering is one you have been thinking about buying for a long time. It is available for a special sale price of $14.99. What would you do in this situation? Please circle one of the options below.
Half of the 150 students were randomized into a control group and were given the following two options:
The remaining 75 students were placed in the treatment group, and they saw a slightly modified option (B):
Would the extra statement reminding students of an obvious fact impact the purchasing decision?
group | decision: buy video | decision: not buy video | Total |
---|---|---|---|
control | 56 | 19 | 75 |
treatment | 41 | 34 | 75 |
Total | 97 | 53 | 150 |
Stacked bar plot of results of the opportunity cost study.
group | decision: buy video | decision: not buy video | Total |
---|---|---|---|
control | 0.747 | 0.253 | 1 |
treatment | 0.547 | 0.453 | 1 |
We will define a success in this study as a student who chooses not to buy the video.
Then, the value of interest is the change in video purchase rates that results from reminding students that not spending money now means they can spend the money later.
The test statistic in the opportunity cost study was the difference in the proportion of students who decided against the video purchase in the treatment and control groups.
The point estimate of the difference in proportions was used as the test statistic.
A point estimate is a statistic computed from the observed sample, and is used as our best guess of the unobserved population parameter.
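Using the counts from the results table (34 of 75 treatment students and 19 of 75 control students chose not to buy the video), the point estimate works out as follows. The subscripts \(T\) and \(C\) denote the treatment and control groups:

\[\hat{p}_{T} - \hat{p}_{C} = \frac{34}{75} - \frac{19}{75} = 0.453 - 0.253 = 0.20\]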
Is this 20% difference between the two groups so prominent that it is unlikely to have occurred from chance alone, if there is no difference between the spending habits of the two groups?
Using the same randomization technique from the last section, let’s see what happens when we simulate the experiment under the scenario where there is no effect from the treatment.
We start with 150 index cards and label each card to indicate the distribution of our response variable: decision.
That is, 53 cards will be labeled “not buy video” to represent the 53 students who opted not to buy, and 97 will be labeled “buy video” for the other 97 students.
Then we shuffle these cards thoroughly and divide them into two stacks of size 75, representing the simulated treatment and control groups.
The results of a single randomization are shown in the table below.
group | decision: buy video | decision: not buy video | Total |
---|---|---|---|
control | 46 | 29 | 75 |
treatment | 51 | 24 | 75 |
Total | 97 | 53 | 150 |
The difference that occurred from the first shuffle of the data (i.e., from chance alone):
\[\hat{p}_{T, shfl1} - \hat{p}_{C, shfl1} = \frac{24}{75} - \frac{29}{75} = 0.32 - 0.387 = - 0.067\]
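The index-card procedure can be mimicked in a few lines of Python. This is a minimal sketch, not the code used to produce the table above: the variable names, the helper `prop_not_buy`, and the seed are all assumptions made for illustration, so the simulated difference it prints will generally differ from the \(-0.067\) shown in the lecture.

```python
import random

# Pooled responses from the observed table: 53 "not buy video", 97 "buy video".
cards = ["not buy video"] * 53 + ["buy video"] * 97

# Arbitrary seed (an assumption), used only so the sketch is reproducible.
rng = random.Random(2021)
rng.shuffle(cards)

# Deal the shuffled cards into two stacks of 75 students each.
sim_treatment, sim_control = cards[:75], cards[75:]


def prop_not_buy(stack):
    """Proportion of cards in a stack labeled 'not buy video'."""
    return stack.count("not buy video") / len(stack)


# Difference in proportions for this one shuffle (treatment minus control).
sim_diff = prop_not_buy(sim_treatment) - prop_not_buy(sim_control)
print(f"simulated difference from one shuffle: {sim_diff:+.3f}")
```

Because the labels are shuffled without regard to group, any difference this produces reflects chance alone, which is exactly the scenario described by \(H_0\).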
A histogram of 1,000 chance differences produced under the null hypothesis. Histograms like this one are a convenient representation of data or results when there are a large number of simulations.
Under the null hypothesis (no treatment effect), we’d observe a difference of at least +20% about 0.6% of the time.
We determined that such a large difference would only occur 6-in-1,000 times if the reminder actually had no influence on student decision-making.
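Repeating the shuffle many times gives an approximate p-value. The sketch below assumes responses are coded as 1 for "not buy video" and 0 for "buy video"; the function names, the seed, and the small floating-point tolerance are illustrative choices, not part of the lecture. Because the result depends on the random shuffles, the estimate will be close to, but not necessarily equal to, the 6-in-1,000 figure reported above.

```python
import random


def diff_in_proportions(treatment, control):
    """Proportion of successes in treatment minus proportion in control."""
    return sum(treatment) / len(treatment) - sum(control) / len(control)


def randomization_p_value(treatment, control, n_shuffles=1000, seed=2021):
    """Estimate a one-sided p-value by shuffling the pooled responses."""
    rng = random.Random(seed)  # seed is an arbitrary assumption
    pooled = list(treatment) + list(control)
    n_treatment = len(treatment)
    observed = diff_in_proportions(treatment, control)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        simulated = diff_in_proportions(pooled[:n_treatment], pooled[n_treatment:])
        # Small tolerance guards against floating-point round-off in ties.
        if simulated >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Observed data: 1 = "not buy video" (success), 0 = "buy video".
observed_treatment = [1] * 34 + [0] * 41
observed_control = [1] * 19 + [0] * 56

observed_diff = diff_in_proportions(observed_treatment, observed_control)
p_value = randomization_p_value(observed_treatment, observed_control)
print(f"observed difference: {observed_diff:.3f}")
print(f"estimated p-value from 1,000 shuffles: {p_value:.3f}")
```

Increasing `n_shuffles` reduces the simulation noise in the estimate at the cost of more computation.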
When the p-value is small, i.e., less than a previously set threshold, we say the results are statistically significant.
This means the data provide such strong evidence against \(H_0\) that we reject the null hypothesis in favor of the alternative hypothesis.
The threshold is called the significance level and is often represented by \(\alpha\) (the Greek letter alpha).
The value of \(\alpha\) represents how rare an event needs to be in order for the null hypothesis to be rejected.
The value of \(\alpha\) can vary depending on the field or the application.
In the opportunity cost study, we analyzed an experiment where study participants had a 20% drop in likelihood of continuing with a video purchase if they were reminded that the money, if not spent on the video, could be used for other purchases in the future.
We determined that such a large difference would only occur 6-in-1,000 times if the reminder actually had no influence on student decision-making.
The p-value was 0.006, and we use a significance level of \(\alpha = 0.05\).
Since the p-value is less than 0.05, the data provide statistically significant evidence that US college students were actually influenced by the reminder.
Similarly, in the sex discrimination study, the p-value was found to be approximately \(0.02\) using \(1000\) shuffles, matching the "about 2 in 100 times" result recalled earlier.
Using a significance level of \(\alpha = 0.05,\) we would say that the data provided statistically significant evidence against the null hypothesis.
We say that the data provide statistically significant evidence against the null hypothesis if the p-value is less than some predetermined threshold (e.g., 0.01, 0.05, 0.1).
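The decision rule itself is a single comparison. The sketch below applies it to the opportunity cost p-value of 0.006 from this lecture and to a second p-value of 0.03 that is purely hypothetical, included only to show that the conclusion can change with the choice of \(\alpha\). The function name is an assumption.

```python
def is_statistically_significant(p_value, alpha=0.05):
    """Return True when the p-value is below the significance level alpha."""
    return p_value < alpha


p_opportunity_cost = 0.006  # p-value reported in the lecture
p_hypothetical = 0.03       # hypothetical value, for illustration only

for alpha in (0.01, 0.05, 0.1):
    print(
        f"alpha = {alpha}: "
        f"opportunity cost significant? "
        f"{is_statistically_significant(p_opportunity_cost, alpha)}, "
        f"hypothetical significant? "
        f"{is_statistically_significant(p_hypothetical, alpha)}"
    )
```

The threshold should be fixed before looking at the data; choosing \(\alpha\) after seeing the p-value defeats the purpose of the test.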
In this lecture we talked about:
Another example of hypothesis testing in the context of a difference in proportions.
Basic ideas of the p-value and statistical significance.
Some basic cautions about making statistical conclusions.
In the next lecture, we will talk about:
Another example of hypothesis testing.
Decision errors.
A light introduction to confidence intervals.
Within your group, discuss the answers for the following problem.
Hypotheses. Write the null and alternative hypotheses in words and then symbols for each of the following situations. OpenIntro: IMS Section 11.5
New York is known as “the city that never sleeps”. A random sample of 25 New Yorkers were asked how much sleep they get per night. Do these data provide convincing evidence that New Yorkers on average sleep less than 8 hours a night?
Employers at a firm are worried about the effect of March Madness, a basketball championship held each spring in the US, on employee productivity. They estimate that on a regular business day employees spend on average 15 minutes of company time checking personal email, making personal phone calls, etc. They also collect data on how much company time employees spend on such non-business activities during March Madness. They want to determine if these data provide convincing evidence that employee productivity decreases during March Madness.