Previously on Data…

Source: [Fig. 1.1 of OpenIntro: Introduction to Modern Statistics](https://openintro-ims.netlify.app/data-hello.html#variable-types)

Previously on Association vs Independence…

Associated variables are related somehow due to some underlying phenomenon.

Two variables are independent when they are not associated.

Two variables CANNOT be both associated and independent.

Note that association DOES NOT imply causation.

Study Design

In order to do statistical analysis using data, we need to understand the following:

What or who does the data represent?
Where is the data coming from?
How is the data collected?
What is the structure of the study?

Research Questions

Consider the following research questions:

What is the average cyanide content of unprocessed cassava?
Does this drug reduce the number of deaths of hospitalized COVID-19 patients?
Is the vaccine safe and effective against COVID-19?
Do these new vitamin supplements improve people’s health?

An explanatory variable is the likely cause that explains the response variable. A response variable is the expected outcome, and it responds to the explanatory variable.

Examples:

The drug is the explanatory variable while the number of deaths is the response.
The supplements are the explanatory variable while people’s health is the response.

Anecdotal Evidence

Examples:

My neighbor ate unprocessed cassava and he was just fine.
The news says that two people took this drug while in the hospital and they recovered, so it must have worked.
A social media post says that a friend died after getting the vaccine, so it must be dangerous.
A close friend took these vitamins for 30 years and he says he feels great and has not had the flu in a year.

Note: We need to be careful about drawing conclusions from data so quickly. These anecdotes may be true and verifiable, but they may not be a good representation of the entire population of interest.

What can statistics do to help us avoid making hasty generalizations?

Populations and Samples

Sampling Methods - Simple random sampling

Sampling Methods - Stratified sampling

Simple random sampling (top) and stratified sampling (bottom)

Sampling Methods - Cluster sampling
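
The three sampling methods named above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `students` population, its `year` and `room` fields, and the helper function names are all hypothetical, and the cluster version assumes every unit in a chosen cluster is kept.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    """Pick n units so that every unit has the same chance of selection."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group units into strata, then take a simple random sample within each stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Group units into clusters, randomly pick whole clusters, keep every unit in them."""
    clusters = defaultdict(list)
    for unit in population:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for key in chosen for unit in clusters[key]]


# Hypothetical population: 60 students, each with a year (stratum) and a classroom (cluster).
students = [{"id": i, "year": i % 3 + 1, "room": i // 10} for i in range(60)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

srs = simple_random_sample(students, 12, rng)
strat = stratified_sample(students, lambda s: s["year"], 4, rng)
clust = cluster_sample(students, lambda s: s["room"], 2, rng)

print(len(srs), len(strat), len(clust))
```

Note the difference in what is randomized: stratified sampling draws from every group, while cluster sampling draws a few whole groups.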

More Statistical Terms

We use specific terms to distinguish a number calculated from a sample of data (a statistic) from a number calculated, or considered for calculation, on the entire population (a parameter).

The terms statistic and parameter are useful for communicating claims and models.
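
As a rough illustration of the distinction, the sketch below computes a mean on a whole simulated population (the parameter) and on a random sample from it (the statistic). The population values are simulated and purely hypothetical. In practice the parameter is usually unknown, which is why we estimate it with a statistic.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated measurements.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Parameter: a number calculated on the entire population.
parameter = statistics.mean(population)

# Statistic: the same calculation on a sample of 100 units.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two numbers will generally be close but not equal. A different random sample would give a different statistic, while the parameter stays fixed.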

Sampling Downfalls

Cherry-picked sampling: A pick-and-choose method where the sampler decides which units to include based on some interest.
Voluntary surveys: Samples made up of people who choose to take part. This may introduce non-response bias, which can skew the results.
Convenience sample: A sample made up of units that are easily accessible and therefore more likely to be sampled. It is often difficult to discern what population this type of sample represents, because it may leave out the units that cannot be easily sampled.
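
To see why a convenience sample can mislead, here is a small simulation under stated assumptions. The population is hypothetical and built so that the easiest units to reach (the first ones in the list) tend to have lower values than the rest. The sketch compares the mean of the first 50 units with the mean of a random sample of the same size.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1,000 units. By construction, units early in the
# list (the "easy to reach" ones) tend to have lower values than later units.
population = [i / 10 + rng.gauss(0, 5) for i in range(1000)]
true_mean = statistics.mean(population)

# Convenience sample: just take the 50 units that are easiest to reach.
convenience = population[:50]

# Simple random sample of the same size, for comparison.
random_sample = rng.sample(population, 50)

print(f"population mean:         {true_mean:.2f}")
print(f"convenience sample mean: {statistics.mean(convenience):.2f}")
print(f"random sample mean:      {statistics.mean(random_sample):.2f}")
```

In this setup the convenience sample misses the population mean by a wide margin, because the hard-to-reach units never get a chance to be included.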

Experimental Study

An experimental study is a type of study where we randomly assign a treatment to groups so that we can draw a causal conclusion about the relationship between the explanatory and response variables. We split people or things into groups, apply the treatment to one group, and give the other group (the control) either no treatment or a placebo.

Key points:

Closely monitored.
Expensive.
Typically has smaller sample sizes than observational studies.
Usually takes less time than observational studies.
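
The random assignment step of an experimental study can be sketched as follows. This is one simple way to do it, assuming two equal-sized groups. The `patients` labels and the `randomly_assign` helper are hypothetical names used only for illustration.

```python
import random


def randomly_assign(units, rng):
    """Shuffle the units and split them into a treatment group and a control group."""
    shuffled = list(units)  # copy so the original order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical study participants.
patients = [f"patient_{i:02d}" for i in range(20)]

treatment, control = randomly_assign(patients, random.Random(42))

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who gets the treatment, other characteristics of the participants tend to balance out between the two groups. That balance is what supports a causal conclusion.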

Observational Study

An observational study is a type of study where we measure or survey the people or things in a sample without controlling or manipulating any of the variables.