Alex John Quijano
09/03/2021
Associated variables are related somehow due to some underlying phenomenon.
Two variables are independent when they are not associated.
Two variables cannot be both associated and independent.
In order to do statistical analysis using data, we need to understand the following:
What or who does the data represent?
Where is the data coming from?
How is the data collected?
What is the structure of the study?
Consider the following research questions:
What is the average cyanide content of unprocessed cassava?
Does this drug reduce the number of deaths of hospitalized COVID-19 patients?
Is the vaccine safe and effective against COVID-19?
Do these new vitamin supplements improve people’s health?
An explanatory variable is the likely cause that explains the response variable. A response variable is the expected outcome, and it responds to the explanatory variable.
Examples:
The drug is the explanatory variable while the number of deaths is the response.
The supplements are the explanatory variable while people’s health is the response.
Examples:
My neighbor ate unprocessed cassava and he was just fine.
The news says that two people took this drug while in the hospital and they recovered, so it must have worked.
A social media post says that a friend died after getting the vaccine, so it must be dangerous.
A close friend took these vitamins for 30 years and he says he feels great and has not got the flu in a year.
Note: We need to be careful about drawing conclusions from data so quickly. These examples of anecdotal evidence may be true and can be verified, but they may not be a good representation of the entire population of interest.
What can Statistics do to try to avoid making hasty generalizations?
Figure: simple random sampling (top) and stratified sampling (bottom).
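The two sampling methods named in the figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 90 students, the two strata ("first-year" and "senior"), and the function names simple_random_sample and stratified_sample are all made up for this example and do not come from the lecture.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Draw n units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Group units into strata, then take a simple random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample


# Hypothetical population: 90 students labelled by class year.
population = [(i, "first-year" if i < 60 else "senior") for i in range(90)]

srs = simple_random_sample(population, 10)
strat = stratified_sample(population, lambda unit: unit[1], 5)

print("simple random sample:", srs)
print("stratified sample:   ", strat)
```

In the simple random sample, the number of seniors varies from draw to draw. In the stratified sample, each stratum is guaranteed to contribute exactly 5 units.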
We use specific terms in order to differentiate when a number is being calculated on a sample of data (statistic) and when it is being calculated or considered for calculation on the entire population (parameter).
The terms statistic and parameter are useful for communicating claims and models.
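As a rough illustration of the difference, the sketch below computes a mean on a whole population (a parameter) and on a sample drawn from it (a statistic). The population is synthetic: the 10,000 values, the mean of 50, and the spread of 10 are assumptions chosen only for this example.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000 measurements (synthetic values).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Parameter: calculated on the entire population.
parameter = statistics.mean(population)

# Statistic: calculated on a simple random sample of 100 units.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In practice the parameter is usually unknown because we cannot measure the whole population, so the statistic serves as an estimate of it.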
Cherry picking sampling: A pick-and-choose method in which the units to sample are selected based on some interest.
Voluntary surveys: Samples made up of people who choose to respond. This may introduce non-response bias, which can skew the results.
Convenience sample: A sample made up of units that are easy to reach and therefore more likely to be sampled. It is often difficult to discern what population this type of sample represents, because it may leave out the units that cannot be easily sampled.
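To see how a convenience sample can mislead, here is a small simulation. Everything in it is hypothetical: the 5,000 people, their distances from a survey site, and the rule that commute time grows with distance are assumptions invented for this sketch, not data from the lecture.

```python
import random
import statistics

rng = random.Random(2)

# Hypothetical population: 5,000 people, each with a distance from the
# survey site (km) and a commute time (minutes) that grows with distance.
population = []
for _ in range(5_000):
    distance = rng.uniform(0, 50)
    commute = max(0.0, 2 * distance + rng.gauss(0, 5))
    population.append((distance, commute))

population_mean = statistics.mean([c for _, c in population])

# Convenience sample: the 100 people closest to the survey site.
convenience = sorted(population)[:100]
convenience_mean = statistics.mean([c for _, c in convenience])

# Simple random sample of the same size, for comparison.
random_sample = rng.sample(population, 100)
random_mean = statistics.mean([c for _, c in random_sample])

print(f"population mean commute:         {population_mean:.1f}")
print(f"convenience sample mean commute: {convenience_mean:.1f}")
print(f"random sample mean commute:      {random_mean:.1f}")
```

Because the easy-to-reach people all live nearby, the convenience sample understates the average commute, while the simple random sample lands much closer to the population value.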
An experimental study is a type of study where we randomly assign a treatment to a group so that we can draw a causal relationship between the explanatory and response variables. We divide people or things into groups and apply a treatment to one of the groups, while the other group (the control) gets no treatment or gets a placebo.
Key points:
Closely monitored.
Expensive.
It typically has smaller sample sizes than observational studies.
It usually takes a shorter time than observational studies.
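The random assignment step of an experimental study can be sketched as follows. This is a minimal illustration, assuming 20 hypothetical participants split evenly into a treatment group and a control group. The function name randomly_assign and the participant labels are invented for this example.

```python
import random


def randomly_assign(participants, seed=0):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participants.
participants = [f"patient_{i:02d}" for i in range(20)]

treatment, control = randomly_assign(participants)

print("treatment group:", sorted(treatment))
print("control group:  ", sorted(control))
```

Because chance alone decides who gets the treatment, other characteristics of the participants tend to balance out between the groups, which is what supports a causal interpretation.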
An observational study is a type of study where we measure or survey the people or things in a sample without controlling or manipulating any of the variables.
Key points:
Less expensive than experimental studies.
It can take several years or decades.
In this lecture, we talked about the following:
Study Design and questions you should ask when looking at data.
Examples of anecdotal evidence.
The explanatory and response variables.
Examples of population and samples.
Basic ideas of sampling methods which are simple random sampling, stratified sampling, and clustered sampling.
Examples of problematic sampling methods, which are cherry picking sampling, voluntary surveys, and convenience sampling.
The concept and examples of experimental studies and observational studies.
For each of the examples below, identify the type of study and comment on the type of sampling used and the types of variables involved. Include a comment on whether any information is missing or whether the sampling method might be problematic.
A study took a random sample of students and asked them about their bedtime schedules. The data showed that people who sleep for at least 8 hours before the exam day were more likely to get good grades than those who sleep for less than 8 hours.
A study randomly assigned people to one of the two groups. Group 1 was asked to follow a strict study schedule for a fixed period of time whereas Group 2 was asked to study in the same way as they used to earlier. The researchers looked at which group scored better in the exams.
A study took a random sample of people and examined their smoking habits. Each person was classified as either a light, moderate or heavy smoker. The researcher looked at the stress level of each group.
These problems are taken from Towards Data Science Blog - “Observational vs Experimental Study”.