Overview

One of the most important functions of the working statistician is to investigate and answer significant research questions by analyzing real-world data, using a variety of elementary and advanced modeling techniques, and to distill the results into reports that are accessible to the non-statistician.

You will work in small groups to research a topic of interest to you, and then summarize your results in either a short video presentation to the class or a technical report submitted to me.

The datasets available for this project can be found on the Project Data page, under the Projects tab on the course website.

Project Goals

  • Investigate a real-world data set by performing exploratory data analysis and visualization.

  • Formulate a research question and hypothesis.

  • Create a data biography by exploring its context and source.

  • Perform appropriate statistical inference to answer the research question.

  • Craft a clear, engaging narrative answering your research question in either a technical report or a short pre-recorded presentation.

Project Timeline

  • Group Formation: Submit the name of 1 other person you’d like to be in a group with. Due 5pm PST Friday, February 11th (Week 3).
  • Research Proposal: Identify the data set and research question your group will investigate. Due 5pm PST Monday, February 21st (Week 5).
  • Assignment 1: Data exploration via wrangling, summarizing, and visualization. Due 5pm PDT Friday, March 18th (Week 8).
  • Assignment 2: Data biography to contextualize the data. Due 5pm PDT Friday, April 8th (Week 10).
  • Assignment 3: Statistical analysis to answer the research question. Due 5pm PDT Friday, April 29th (Week 13).
  • Final Assignment (choose one of the two formats below):
    • Presentation: 10-minute pre-recorded presentation outlining project results. Due 9:00am PDT Monday, May 9th (first day of finals week).
    • Technical Report: 3-4 page technical report outlining results of Assignments 1-3. Due 9:00am PDT Monday, May 9th (first day of finals week).

Group Expectations

As members of both the Math 141 and Reed College community of scholars, we expect all students to engage with this project in a manner that respects both the Math 141 Code of Conduct and the Reed College Honor Principle.

In particular, for this group project, each group member should…

  • Respond to messages or other discussion promptly (within 1 business day at the latest).
  • Attend scheduled meetings, and give advance notice when attendance isn’t possible.
  • Make a significant contribution to each assignment.
  • Give every other group member the opportunity to make a significant contribution to each assignment.
  • Communicate your personal timeline for finishing tasks.
  • Respect others’ personal timelines for finishing tasks.
  • Provide charitable and constructive feedback on other group members’ work.
  • Incorporate feedback from other group members to improve your work.

After each project component, each student will be asked to complete a self-evaluation survey reflecting on their contributions to the project.


Components

Group Formation

By 5pm PST on Friday, February 11th, please send me the name of 1 other person you’d like to be in a group with, via email or Slack message. If you don’t submit anything, we will randomly pair you with another student in your lecture class. We will then combine pairs to form project groups of 4 students (with a few groups of 3 or 5 students as necessary).


Research Proposal

Goals

  • Determine your research question(s), along with the dataset and variables.

  • Describe why an answer to each question would be significant in the context of the data.

Notes

  • The project assignments will be fairly open-ended and much less prescribed than your lab assignments, mimicking a more real-world situation where you are tasked with extracting knowledge from data.
  • Think carefully when selecting your research questions since you will explore these same questions for the whole project.
  • Make sure everyone in the group is interested in the selected research questions.
  • Make sure to read the provided background information about the data.

Tasks

  1. As a group, determine
    • Which dataset you want to investigate for your project.
    • Two potential research questions you want to explore involving this dataset.
      • Each question should relate to at least two of the variables in the dataset.
      • Both questions should share the same general theme but may involve different variables.
      • The questions can (and likely will) relate to subsets of the data. For example, maybe you want to focus on how COVID-related behaviors differ between residents of two states in the US or you want to focus on protests in a given year and region of the world.
  2. In a one page proposal:
    • Explicitly state your research questions
    • Indicate the dataset and variables you will work with
    • Discuss the utility of an answer to each of your research questions, or describe why an answer would be interesting or relevant to your group (at least 1 paragraph for each question).
  3. Turn in the .pdf of your research proposal on Gradescope by 5pm PST on Monday, February 21st.

Crafting Research Questions

Usually you should start with a research question and then search for data to help you address the question. For feasibility reasons, we are asking you to work backwards. Here are some tips for generating your research questions:

  • Read over the background information about the dataset that interests you and your group and start considering what relationships you might want to explore.
  • Pick out a few specific variables and (re)frame your question around exploring the relationship between those variables.
  • Make sure your question is focused enough that it can be answered with the data at hand.
  • Here are some generic examples to get your group started:
    • EX: Does country A have a higher rate of X than country B?
    • EX: Is X positively related to Y? (In other words, as X increases, does Y tend to increase?)
    • EX: Is there evidence that trend X is becoming more popular over time?
    • EX: Is there a relationship between X and Y?
    • EX: How well do the following factors, X and Y, predict the variable Z?
    • EX: Are there differences in X by Y?

Rubric

You will be assessed on the following:

  • The degree to which the research question is of appropriate scope for the project, and can be answered by the data at hand.
  • The depth, nuance, or insight that an answer to the research question could provide about the data set or a population.
  • The quality and technical correctness of the writing.
  • Whether the proposal contains all required parts.
  • The originality of work.
  • Each student’s individual contributions to the project.

Assignment 1

Goals

  • Confirm/revise your research question, along with the dataset and variables.
  • Practice inspecting data.
  • Practice visualizing and summarizing data.

Notes

  • If you find after preliminary data exploration and analysis that your research question is not answerable using the data at hand, you are welcome to select a new research question after consulting with your instructor.

Tasks

  1. As a group, determine which of the research questions explored in “Research Proposal” you wish to investigate.

  2. Begin investigating this research question by:
    • Producing useful summaries of the variables and their relationships.
    • Graphing each variable and the relationships between variables.
    • Completing any useful data wrangling.
  3. In an Rmd file, write a two-page summary that:
    • States your research question and some initial answers/findings related to it
    • Introduces the data and addresses what/who the data represent (for your variables of interest)
    • Presents at least three summary statistics and discusses what they suggest about the data.
    • Presents at least three data visualizations and discusses what they suggest about the data.
    • Includes your R code.
  4. Turn in the .pdf of your summary on Gradescope by 5pm PDT on Friday, March 18th.
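As a rough illustration of the kind of R code an Assignment 1 Rmd file might contain, the sketch below runs a little wrangling, a few summary statistics, and three graphs. It assumes R's built-in mtcars data set as a stand-in for your project data; the variables (mpg, am, wt, cyl) and the use of base R graphics instead of ggplot2 are choices made only to keep the example self-contained.

```r
# Stand-in data: R's built-in mtcars data set (NOT one of the project data sets)
car_data <- mtcars

# Wrangling: recode the 0/1 transmission variable as a labeled factor
car_data$transmission <- factor(car_data$am, levels = c(0, 1),
                                labels = c("automatic", "manual"))

# Wrangling: keep a subset of the data (here, cars with 4 or 6 cylinders)
car_subset <- subset(car_data, cyl %in% c(4, 6))

# Summary statistics for one quantitative variable
mpg_summary <- c(mean   = mean(car_data$mpg),
                 median = median(car_data$mpg),
                 sd     = sd(car_data$mpg))
print(round(mpg_summary, 2))

# Mean of a quantitative variable within groups of a categorical variable
mpg_by_trans <- aggregate(mpg ~ transmission, data = car_data, FUN = mean)
print(mpg_by_trans)

# Counts for a categorical variable
print(table(car_data$transmission))

# Correlation between two quantitative variables
r_wt_mpg <- cor(car_data$wt, car_data$mpg)
print(round(r_wt_mpg, 3))

# Graphs: one variable's distribution, then relationships between variables
hist(car_data$mpg, main = "Distribution of fuel efficiency",
     xlab = "Miles per gallon")
boxplot(mpg ~ transmission, data = car_data,
        main = "Fuel efficiency by transmission", ylab = "Miles per gallon")
plot(mpg ~ wt, data = car_data, main = "Fuel efficiency vs. weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

Each statistic and graph should still be discussed in prose; the code only produces the raw material for that discussion.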

Rubric

You will be assessed on the following:

  • The informativeness of your summary with respect to your research question
  • The appropriateness of the chosen graphs and summary statistics
  • The degree to which each graph makes appropriate use of geoms and their aesthetics, scale, and context
  • The degree to which the graphs are clear and engaging
  • The degree to which the graphs, summary statistics, and narrative support each other
  • The degree to which the text and code are well organized and well-written
  • The originality of work
  • Each student’s individual contributions to the project.

Project Assignment 2

Goals

  • Create a data biography by answering the following key questions about the data:
    • Where did the data come from?
    • When were the data collected?
    • Why were the data collected?
    • How were the data collected?
    • Who are the data supposed to represent?
      • Who is present? Who is absent?
      • What evidence is there that the data are representative? What evidence is there that the data are not representative?
  • Better understand the context of our data in order to reduce the assumptions and biases we place on it.

Notes

  • We encourage you to do some sleuthing here to answer these questions! Don’t just rely on the provided data dictionaries.
  • You should cite your sources at the end of your data biography, using your preferred citation style. Include enough information that a reader can track down each source.

Assignment

  1. Write a 2-3 page data biography that attempts to answer the questions provided in the Goals.
    • Your write-up should be presented as a narrative, using complete sentences and paragraphs.
  2. Turn in the pdf of your biography on Gradescope by 5pm PDT on Friday, April 8th.

Rubric

You will be assessed on the following:

  • The informativeness of your data biography with respect to each of the key questions provided in the Goals section
  • The degree to which the text is supported by references and the appropriateness of the selected references
  • The degree to which the text is well organized and well-written
  • Each student’s individual contributions to the project.

Further Reading

The following two articles discuss the importance of data biographies and outline the process of creating a good one:

Heather Krause, “Data Biographies: How to Get to Know Your Data,” DATASSIST.

Catherine D’Ignazio, “Putting data back into context,” DataJournalism.com.


Project Assignment 3

Goals

  • Conduct statistical inference on your research questions.

Assignment

  1. Conduct a hypothesis test for one of your research questions.
    • For the hypothesis test,
      • Explicitly state the hypotheses in both words and symbols.
      • Include the method used, the test statistic, and the p-value.
      • Determine an appropriate significance level based on the consequences of type I and type II errors.
      • Check assumptions. (If violated, still finish the test but be cautious in your conclusion.)
      • Interpret the p-value in the context of the problem.
      • Discuss conclusions about the conjecture.
      • Describe whether the observed effect has practical significance, based on your understanding of the data context.
  2. Construct a confidence interval for one of your research questions.
    • For the confidence interval,
      • Include the method used, confidence level, and interval values.
      • Describe why you chose the confidence level you did, based on the relationship between confidence level and margin of error, as well as the specific data context.
      • Check assumptions. (If violated, still construct the confidence interval but be cautious in your conclusion.)
      • Discuss conclusions about the conjecture.
  3. Write a 1-2 page summary of your findings that includes all the pieces specified in 1. and 2. Include appropriate visualizations for the confidence intervals and hypothesis tests.

  4. Turn in the pdf of your summary on Gradescope by 5pm Friday, April 29th.
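The sketch below shows one way the hypothesis test and confidence interval steps might look in R, in both simulation-based (permutation test, bootstrap percentile interval) and theory-based (Welch t-test via `t.test`) form. It assumes R's built-in mtcars data as a stand-in, with the parameter of interest taken to be the difference in mean mpg between manual and automatic cars; the seed, the 5000 repetitions, and α = 0.05 are illustrative choices, not recommendations for your project.

```r
# Stand-in data: R's built-in mtcars; compare mean mpg of manual vs. automatic cars
set.seed(141)  # makes the simulation-based results reproducible

mpg    <- mtcars$mpg
manual <- mtcars$am == 1

# Parameter: mu_manual - mu_automatic
# H0: mu_manual - mu_automatic = 0   vs.   HA: mu_manual - mu_automatic != 0
diff_means <- function(y, group) mean(y[group]) - mean(y[!group])
obs_diff <- diff_means(mpg, manual)

# Permutation test: shuffle group labels to simulate the null distribution
n_reps <- 5000
null_diffs <- replicate(n_reps, diff_means(mpg, sample(manual)))

# Two-sided p-value: share of null differences at least as extreme as observed
p_value <- mean(abs(null_diffs) >= abs(obs_diff))

alpha <- 0.05  # illustrative; justify yours using type I/II error consequences
cat("Observed difference:", round(obs_diff, 2), "\n")
cat("Permutation p-value:", p_value, "\n")
cat("Reject H0 at alpha =", alpha, "?", p_value < alpha, "\n")

# Bootstrap percentile CI: resample each group with replacement
boot_diffs <- replicate(n_reps, {
  y_manual <- sample(mpg[manual], replace = TRUE)
  y_auto   <- sample(mpg[!manual], replace = TRUE)
  mean(y_manual) - mean(y_auto)
})
conf_level <- 0.95
boot_ci <- quantile(boot_diffs, c((1 - conf_level) / 2, (1 + conf_level) / 2))
print(round(boot_ci, 2))

# Theory-based version (Welch two-sample t-test). Because am = 0 is the first
# group, t.test reports automatic minus manual, so signs are flipped.
t_res <- t.test(mpg ~ am, data = mtcars, conf.level = conf_level)
print(t_res)

# Higher confidence levels give wider intervals (larger margins of error)
levels_to_try <- c(0.90, 0.95, 0.99)
widths <- sapply(levels_to_try, function(lvl)
  diff(t.test(mpg ~ am, data = mtcars, conf.level = lvl)$conf.int))
print(data.frame(conf_level = levels_to_try, width = round(widths, 2)))

# Visualize the null distribution with the observed statistic marked
hist(null_diffs, breaks = 40, main = "Permutation null distribution",
     xlab = "Difference in mean mpg (manual - automatic)")
abline(v = c(-obs_diff, obs_diff), col = "red", lwd = 2)
```

Either approach is a reasonable sketch; whichever you use, the write-up still needs the checks and interpretations listed above, including whether the assumptions behind `t.test` are plausible for your data.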

Rubric

You will be assessed on the following:

  • For the hypothesis test,
    • Selecting an appropriate parameter of interest
    • Including the correct method, correct test statistic, and correct p-value.
    • Checking assumptions.
    • Correctly interpreting the p-value in the context of the problem.
    • Accurately discussing conclusions about the conjecture.
  • For the confidence interval:
    • Including the correct method and interval values.
    • Including confidence level.
    • Checking assumptions.
    • Accurately discussing conclusions about the conjecture.
  • The degree to which the text is well organized and well-written
  • Each student’s individual contributions to the project.

Final Assignment

Each group may choose one of two formats to communicate the results of their study:

  1. A 10-minute pre-recorded video presentation; OR

  2. A 3-4 page technical report.

Specific details about the two formats can be found below.


Video Presentation

Goals

  • Craft a clear, engaging, accurate story about one of your research questions.

Assignment

  1. Create an 8-12 minute video presentation that addresses the following:
    • Your research question
    • Your data source
    • Exploratory graphs and summary statistics and what they tell you about your research question
    • An inference procedure (and any assumptions) and interpretation of results
    • Conclusions about your research question
  2. The presentations are due by 9:00am PDT on Monday, May 9th. One group member should upload the video to the Panopto dropbox for Math 141. Instructions for accessing this dropbox can be found here (under the “For a Course or Administrative Office Webpage” section)

Notes

  • In this project assignment, you likely won’t need to conduct any additional analysis. Instead, you will be summarizing content from the previous project assignments.
    • However, it is okay if you do conduct additional analysis
  • Videos will be graded both for content and for how well the material is delivered.
  • You should address your topic and statistical content at a level that is appropriate for a Math 141 audience.

Video Creation

  • We suggest you first create a set of slides (e.g., Google Slides, Beamer, Powerpoint, Keynote…).
  • Then create a video by recording yourselves presenting the slides, along with the corresponding audio.
  • For recording, one option would be to use Zoom. Here is a useful CUS website.
  • When recording, only one person should be in charge of sharing their screen.
  • Practice the presentation several times before recording to ensure a polished final product.
  • For short videos like this, it is easier to do another take than it is to edit the video afterwards.

Rubric

You will be assessed on the following:

  • Participation: All group members must be speakers in the video.
  • Length: Videos shorter than 8 minutes or longer than 12 minutes will be penalized.
  • Content: Demonstrates a full and accurate understanding of the material presented.
    • The video addresses each item listed in (1).
  • Delivery: How well the content is presented, including whether the presentation is polished and clear.
    • This does NOT mean you must create a video with a lot of technological bells and whistles!
    • This does mean that you should write a script beforehand and practice, practice, practice so that it doesn’t sound like you are reading a script.
    • This does mean that you should structure your presentation so that it is easy to follow the main points and how they connect.
  • Sources: At least 2 appropriate references are included.

Technical Report

Goals

  • Craft a clear, engaging, accurate story about one of your research questions.

Assignment

  1. Create a 3-4 page technical report that addresses the following:
    • Your research question
    • Your data source
    • Exploratory graphs and summary statistics and what they tell you about your research question
    • An inference procedure (and any assumptions) and interpretation of results
    • Conclusions about your research question
  2. The technical report should be emailed to Nate Wells, with all group members cc’d, by 9:00am PDT on Monday, May 9th.

Notes

  • In this project assignment, you likely won’t need to conduct any additional analysis. Instead, you will be summarizing content from the previous project assignments.
    • However, it is okay if you do conduct additional analysis
  • Reports will be graded both for content and for how well the material is discussed
  • You should address your topic and statistical content at a level that is appropriate for a Math 141 audience.

Technical Report Details

  • Simply combining your work in Assignments 1, 2, and 3 will produce a document that is much longer than 3 - 4 pages.
    • Instead, think carefully about what the most important details are for your analysis, and curate your previous assignments to highlight and support these.
  • You do not need to include the code used to perform your analysis in the .pdf document itself. You should, however, include summary statistics, visualizations, and the results of any inference where appropriate.
    • To have code run when you knit without displaying it, replace the chunk header {r} with {r echo = F}.
    • If you also don’t want the output of the code to display, use {r include = F}. The code still runs, but neither the code nor its output appears in the knitted document.
  • You can control the size of included graphics by adding {r fig.width =..., fig.height=...} to your chunk options, where ... is replaced with the desired width/height of graphic in inches.
  • Your report can have a title page listing the project title, the project authors, and date. This page does not count towards the page limit.
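As a sketch of how these chunk options might look inside an Rmd file, the three chunks below hide everything, hide only the code, and hide the code while sizing a graph. The chunk labels and the use of R's built-in mtcars data are placeholders; `knitr::opts_chunk$set(echo = FALSE)` is an optional alternative that hides code in every later chunk instead of setting `echo` chunk by chunk.

````markdown
```{r setup, include = FALSE}
# Runs, but neither this code nor its output appears in the .pdf
knitr::opts_chunk$set(echo = FALSE)  # optional: hide code in all later chunks
car_data <- mtcars                   # placeholder for loading your project data
```

```{r mpg-summary, echo = FALSE}
# Code hidden; only the printed summary appears in the .pdf
summary(car_data$mpg)
```

```{r mpg-weight-plot, echo = FALSE, fig.width = 5, fig.height = 3.5}
# Code hidden; the graph is drawn 5 inches wide and 3.5 inches tall
plot(mpg ~ wt, data = car_data,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```
````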

Rubric

You will be assessed on the following:

  • Length: Technical reports shorter than 3 or longer than 4 single-spaced pages will be penalized.
  • Content: Demonstrates a full and accurate understanding of the material presented.
    • The report addresses each item listed in (1).
  • Style: The degree to which the text is well organized and well-written
  • Sources: At least 2 appropriate references are included. The references should be on a separate page from the report and are not included in the page count.
  • Each student’s individual contributions to the project.