class: center, middle

# Data Collection Practice

<span style="color: #91204D;"> .large[Kelly McConville | Math 141 | Week 4 | Fall 2020] </span>

---

## Announcements

* Slack Pro Tips
    + Check at least once a day.
    + Set up your notifications in a way that works for you.
        + Play around in the Preferences.
    + Create at least one content-related post per week.

---

## Reminders

* Lab 3 due before your lab session this week.
    + Practice visualizing data with `ggplot2` and wrangling data with `dplyr`.

* Project Assignment 1 is due on Friday, October 2nd (end of day) on Gradescope.

* Come to office hours this week, especially if you haven't stopped by twice yet this semester.

---

## Week 4 Topics

* Finish up a couple more **Data Wrangling** examples

* **Data collection**

* Modeling

**This week is light on new R material.  Make sure to use that time to get caught up on the R work so far.**

---

## Goals for Today

Practice addressing:

* How were the data collected?

* Who are the data supposed to represent?
    + Who is present?  Who is absent?
    + What evidence is there that the data are representative?

---

## Types of Studies

* **Observational Study:** Collect data in a way that doesn't interfere with how the data arise.

* **Experiment:** Randomly assign cases to treatment groups in order to study causal relationships.  Other key features include:
    + Blinding
    + Control group
    + Placebo
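
A minimal sketch of random assignment in base R. The 20 participants, the group labels, and the seed are all hypothetical, chosen only for illustration.

```r
# Hypothetical example: randomly assign 20 participants to two groups.
set.seed(141)  # arbitrary seed so the assignment is reproducible

participants <- data.frame(id = 1:20)

# Shuffle a balanced vector of labels so chance alone decides each group
participants$group <- sample(rep(c("treatment", "control"), each = 10))

# Each group should end up with exactly 10 participants
table(participants$group)
```

Because the labels are shuffled at random, neither the researcher nor the participants influence who ends up in which group.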

---

## Thoughts on Data Collection

#### Random Sampling

* Random sampling helps ensure that the sample is representative of the population.

* Representativeness isn't about size.
    + Small random samples will tend to be more representative than large non-random samples.  
    
--

* How do we draw conclusions about the population from non-random samples?

&rarr; Investigate how your sampled cases (and respondents) are systematically different from the non-sampled cases (and non-respondents).
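
The size-versus-randomness point can be explored with a small simulation. This is only a sketch: the population of study hours is made up, and the non-random sample assumes that students who study more are more likely to respond.

```r
set.seed(141)  # arbitrary seed for reproducibility

# Hypothetical population: weekly study hours for 10,000 students
population <- data.frame(hours = rgamma(10000, shape = 4, scale = 3))
true_mean <- mean(population$hours)

# Small simple random sample: every student is equally likely to be chosen
random_rows <- sample(nrow(population), 100)
random_sample <- population[random_rows, , drop = FALSE]

# Large non-random sample: response probability proportional to hours
# (an assumed mechanism, used here only to create systematic bias)
biased_rows <- sample(nrow(population), 2000, prob = population$hours)
biased_sample <- population[biased_rows, , drop = FALSE]

# Compare each sample mean with the population mean
c(population      = true_mean,
  small_random    = mean(random_sample$hours),
  large_nonrandom = mean(biased_sample$hours))
```

Under these assumptions, the mean of the small random sample tends to land closer to the population mean than the mean of the much larger non-random sample, which overstates study hours.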

---

## Thoughts on Data Collection

#### Random Assignment

* Random assignment allows you to explore **causal** relationships between your explanatory variables and the response variable.

* How do we draw causal conclusions from studies without random assignment?

&rarr; With extreme care!  Try to control for all possible confounding variables.
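
A simulated sketch of what controlling for a confounder can look like. The variables (`age`, `exercise`, `bp`) and their relationships are invented for illustration, and exercise is given no true effect on blood pressure. Adjusting this way only helps with confounders that were actually measured.

```r
set.seed(141)  # arbitrary seed for reproducibility
n <- 5000

# Hypothetical confounder: age affects both exercise and blood pressure
age <- runif(n, 20, 70)

# Assumed: older people exercise less; exercise has NO true effect on bp
exercise <- 10 - 0.1 * age + rnorm(n)
bp <- 100 + 0.5 * age + rnorm(n, sd = 5)

# Ignoring the confounder: exercise appears to lower blood pressure
naive <- coef(lm(bp ~ exercise))["exercise"]

# Controlling for age: the exercise coefficient shrinks toward zero
adjusted <- coef(lm(bp ~ exercise + age))["exercise"]

c(naive = unname(naive), adjusted = unname(adjusted))
```

In this setup the naive slope is clearly negative, while the adjusted slope is close to zero, matching the way the data were generated.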

&rarr; Discuss the associations/correlations you found.  Use domain knowledge to address potentially causal links.

&rarr; Take more stats to learn more about causal inference.

**Bottom Line:** We often have to use imperfect data to make decisions.

---

class: center, middle, inverse

# Now let's work on the Data Collection Practice Handout!