class: center, middle

# R Data Packages

<img src="img/hero_wall_pink.png" width="800px"/>

### Kelly McConville

.large[Math 241 | Week 3 | Spring 2021]

---

# Announcements

* No new lab this week.
* You will receive Mini-Project 1 today.

---

## Looking Ahead...

* We will talk about `join`s on Tuesday of Week 4.
* Next week:
  + More wrangling with `dplyr` and `tidyr`.
  + Start data ingesting.
* You will likely run into issues syncing your GitHub account with your RStudio Server account.
  + Please come by office hours next week with ANY issues/questions!

---

## Goals for Today

* Discuss git and GitHub.
* Go over the GitHub + RStudio workflow.
  + For solo projects
  + For group projects
* Discuss R packages.
* Learn how to create an R data package.
* Mini-Project 1 assignments
  + Create the RStudio Project associated with the project repo.

---

## git/GitHub

---

## git and GitHub

* git: version control system
* GitHub: hosting service for git repositories

---

## GitHub Repo = RStudio Project

* A **repo**, short for repository, is the folder that contains all of the files for the project on [github.com](https://github.com).
* For each repo, you should create an RStudio Project (with version control).
  + We will all do this together in a moment.
* Under the **Reed-Math241** GitHub organization you will have 4 repos:
  + `labwork_username`: Just you (and course helpers)
  + `pa1_grp#`: You, your group members (and course helpers)
  + `pa2_grp#`: You, your group members (and course helpers)
  + `pa_final_grp#`: You, your group members (and course helpers)

---

## Workflow

* **Pull** the most recent version of the project from GitHub to your account on the RStudio Server.
* Do some work on your project in RStudio.
* **Commit** that work.
  + Committing takes a snapshot of the files you have staged (checked) in the Git tab.
  + Look over the **diff**, which shows what has changed since your last commit.
  + Include a quick note, the **commit message**, to summarize the motivation for the changes.
* **Push** your commit to GitHub from your account on the RStudio Server.

---

## Collaboration: Git Style

* Git is a *decentralized* version control system.
  + Each collaborator has a complete version of the repo.
  + Everyone can work offline and simultaneously.
  + GitHub holds the master copy.
  + Pull regularly to receive and integrate changes.
* **Issues**: The primary method to communicate with your group members.
  + Open an issue if you have a question or comment, or if you want to make a to-do list for the project.
  + Remember that I am part of the repo... though I won't normally read the issues.

---

## Git Real

* git is not friendly and can be frustrating.
  + BUT, the version control and collaborative rewards are big!
* GitHub.com is a great place to develop an online presence.
  + For now, we will use private repos.
* If you end up with a mess of errors, don't worry. Come see me and we will make a new repo with your most recent copy of the project.
  + It happens to [everyone](https://xkcd.com/1597/).

---

## Now:

* Sign in to both github.com and rstudio.reed.edu.
* Let's *git* our RStudio account on the server synced with our GitHub account and then make a change to our personal repo!
  + Make sure you have accepted the repo invitation sent to your email.

---

## Introduce Yourself to Git

* Run the following code in the Console to introduce yourself to git:

```r
library(usethis)
# Replace these with your own name (or GitHub username) and
# the email address tied to your GitHub account
use_git_config(user.name = "mcconvil",
               user.email = "mcconville@reed.edu")
```

---

## Sync GitHub.com repo and an RStudio Project repo

**In your repo on github.com**:

* Click on the green clone or download button.
* Copy the given URL for "Clone with HTTPS".

**On the RStudio Server**:

* In the upper left, go to File > New Project > Version Control.
* Select Git.
* Paste in the URL. RStudio should automatically fill in the project directory name. Select where you want the project to live in your home directory, then click **Create Project**.

---

## Ignoring Files

* There are several files that we do **NOT** want to push to GitHub.
* These include:
  + `.gitignore`
  + `___.Rproj`
  + `.DS_Store`
* Add these files to the `.gitignore`.

---

## Test the waters: Let's go through the workflow.

* Pull. (Yes, there is nothing to pull yet, but it is always good practice to start here.)
* Click on the README.
* Add something to the README.
* Click on the **Git** tab. Check the box next to the README file. Hit **Commit**.
* Put in a commit message. Look over the diff.
* Push.

**Look for updates in the README on github.com.**

---

## Cache credentials

So that we don't have to type in our username and password every time we want to push or pull from GitHub, run the following **in the Terminal**, not **in the Console**:

`git config --global credential.helper 'cache --timeout=10000000'`

---

## R Packages

* What is an R package?

--

> "R packages are the fundamental unit of R-ness." -- Jenny Bryan

--

* Contains functions and datasets
* "Base R": 14 base packages that come installed with R
  + A few of these (such as `stats` and `utils`) are attached when R starts.
  + 15 "recommended" packages (such as `MASS` and `lattice`) also come installed with most R distributions.
* CRAN hosts more than 15,000 additional packages:
  + `install.packages("dplyr")`
  + `library(dplyr)`
* And then there are all the packages on GitHub:
  + `devtools::install_github("tidyverse/dplyr")`
  + `library(dplyr)`

---

## R Data Packages

* Great way to share data!
* Why?
  + Includes documentation.
  + Very portable.
* Example 1:
  + `library(mosaicData)`
  + `data(package = "mosaicData")`
  + `?Births2015`

--

* Example 2:
  + `library(pdxTrees)`
  + `get_pdxTrees_parks()`
  + `get_pdxTrees_streets()`

--

* Example 3:
  + `library(gbfs)`

---

## Creating an R (Data) Package

Key packages:

* [`devtools`](https://cran.r-project.org/web/packages/devtools/index.html): supports the development and dissemination of the package
* [`usethis`](https://usethis.r-lib.org/): automates steps of package creation, such as constructing the data file
* [`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.html): simplifies writing documentation

---

## Steps

* Let's go through the "Creating an R Data Package" handout.
* I will demo the process with [this Seattle bike data](https://data.seattle.gov/Transportation/Fremont-Bridge-Bicycle-Counter/65db-xm6k).
* Feel free to follow along, but don't try to sync it with GitHub.

---

## Mini-Project 1

* Walk through the handout.
  + Also posted to Slack and in the shared folder.

---

## Git Collaboration: Merge conflicts

* What if my partners and I both make changes?
  + Scenario: Your partner makes changes to a file, commits, and pushes to GitHub. You also modify that file, commit, and push.
  + Result: Your push will fail because there's a commit on GitHub that you don't have.
  + Usual solution: Pull, and git will *usually* merge their work nicely with yours. Then push. If that doesn't work, you have a **merge conflict**. Let's cross that bridge when we get there.
* How to avoid merge conflicts?
  + Always pull before you start working on your project.
  + Always commit and push when you are done, even if you only made small changes.
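
---

## Sketch: An R Data Package in Code

Here is a minimal sketch of the package-creation steps written as code. It assumes `usethis` and `roxygen2` are installed. The package name `mydatapkg` and the dataset name `my_data` are hypothetical, and the first six rows of the built-in `airquality` data stand in for the Seattle bike data. The handout's exact steps may differ.

```r
library(usethis)
options(usethis.quiet = TRUE)  # silence usethis status messages

# Hypothetical package name and location. A temporary directory keeps
# this sketch away from your real projects and GitHub repos.
pkg_path <- file.path(tempdir(), "mydatapkg")

# 1. Create the package skeleton (DESCRIPTION, NAMESPACE, R/) without
#    an .Rproj file and without opening a new RStudio session.
create_package(pkg_path, rstudio = FALSE, open = FALSE)

# Make the new package the active usethis project so use_data() writes there.
proj_set(pkg_path)

# 2. Save a dataset as data/my_data.rda. The first rows of the built-in
#    airquality data stand in for a real dataset here.
my_data <- head(datasets::airquality)
use_data(my_data)

# 3. Document the dataset with roxygen2 comments in R/data.R.
#    The final line is the dataset's name as a string.
writeLines(c(
  "#' Air quality example data",
  "#'",
  "#' The first six rows of `datasets::airquality`, used only to",
  "#' illustrate how a dataset is documented.",
  "#'",
  "#' @format A data frame with 6 rows and 6 variables.",
  "#' @source `datasets::airquality`",
  "\"my_data\""
), file.path(pkg_path, "R", "data.R"))

# 4. Generate the help file man/my_data.Rd from those comments.
roxygen2::roxygenise(pkg_path)
```

From inside the package's own RStudio Project, `devtools::document()` runs the same roxygen step. `devtools::install()` then makes `library(mydatapkg)` and `?my_data` available.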