class: center, middle, inverse

### Data Ethics

<img src="img/hero_wall_pink.png" width="800px"/>

### Kelly McConville

.large[Math 241 | Week 12 | Spring 2021]

---

## Announcements/Reminders

* Lab 8 due on Thursday.
    + Don't submit on Gradescope.
    + Upload to your Math 241 repo and put on either the Reed Shiny Server or [https://www.shinyapps.io/](https://www.shinyapps.io/).
    + Can work with another Math 241 student.
* No Kelly Wed Office Hours this week.

---

## Plan for the Week

Tuesday

* Discuss ideas related to data ethics

--

Thursday

* Work through some case studies
    + Will turn in for completion credit

---

## Creating a Supportive Space

<!-- * "Investigate the morals of data collection, processing analysis and application." -- [Langkjaer-Bain](https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01208.x) -->

Let's create a space where we are allowed to make mistakes and to grow.

--

* Identity is complex. Privilege is complex. Let's recognize that our own experiences and identities have shaped our views, even around data.

--

* Be generous with yourself, your classmates, and me as we talk through these topics.

--

* Be willing to share and hear ideas that are not yet refined.

---

## Data Ethics

* Not going to tell you what to think.

--

* Want you to grapple with the ramifications of data work and start developing your own ethical compass for your data work.

--

* Many discussion items are from [Data Feminism](https://data-feminism.mitpress.mit.edu/) (DF) by Catherine D'Ignazio and Lauren F. Klein
    + Big thanks to those who participated in last summer's Data Book Club!

<img src="img/df.png" width="25%" style="display: block; margin: auto;" />

---

## The Ubiquity of Data

Data are used to answer questions like:

--

* What products should a grocery store stock before a hurricane?

--

<img src="img/poptarts.png" width="40%" style="display: block; margin: auto;" />

--

* Which city buildings should be inspected for risk of fire?
--

* Which homeless people in LA should be prioritized for housing?

--

* Which children in Pittsburgh might be future victims of abuse or neglect?

---

## Impact of This Datafication of Our Lives?

Allocate scarce resources to maximize public safety

--

* Example: New York City housing inspections
    + Receive about 25,000 complaints per year about buildings illegally converted to apartments
    + Illegal apartments pose fire risk and firefighters have died trying to save the residents of these apartments
    + Built a model for fire risk and inspected buildings with highest risk
    + Did issue more orders to vacate but also decreased risk of fire.

--

Amplify existing inequities

--

* Example: Predictive policing models
    + Recommended increased patrolling in Black neighborhoods based on previous crime data
    + Pernicious feedback loop
    + More police in certain neighborhoods, more likely to be there when new crime is committed.
    + People (often poor and of color) involved more likely to get ticketed, arrested, or killed.

---

### [Satirical Map of White Collar Crime Risk Zones](https://whitecollar.thenewinquiry.com/#dr5ru26)

<img src="img/whitecollar.png" width="95%" style="display: block; margin: auto;" />

---

class: inverse, center, middle

## What Should I Do to Ensure I am an Ethical Data Scientist?

--

### How Do I Use Data for Good?

---

### Understand and Challenge the Power Structures

**Who is doing the work of data science (and who is not)?**

--

**Example**: [The Coded Gaze](https://www.youtube.com/watch?v=162VzSzzoPs)

<div class="figure" style="text-align: center">
<img src="img/coded_gaze.png" alt="Joy Buolamwini" width="35%" />
<p class="caption">Joy Buolamwini</p>
</div>

--

* Facial recognition software struggles to see faces of color.

--

* Algorithms built on a non-diverse, biased dataset.

--

* **Algorithmic bias**: when the model systematically creates unfair outcomes, such as privileging one group over another.
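---

### Understand and Challenge the Power Structures

One rough way to look for algorithmic bias is to compare a model's error rates across groups. The sketch below uses made-up data and hypothetical column names (`group`, `face_present`, `detected`); it illustrates the idea and is not the audit behind The Coded Gaze.

```r
# Toy, invented results from a hypothetical face detector.
# The data frame and all column names are assumptions for illustration.
audit <- data.frame(
  group        = c("A", "A", "A", "A", "B", "B", "B", "B"),
  face_present = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  detected     = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# A "miss" is a real face that the detector failed to find.
audit$missed <- audit$face_present & !audit$detected

# Miss rate within each group; a large gap is a warning sign worth investigating.
miss_rate <- tapply(audit$missed, audit$group, mean)
miss_rate
```

In this toy data the detector misses none of group A's faces and half of group B's, which is the kind of systematic gap the definition of algorithmic bias describes.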
---

### Understand and Challenge the Power Structures

**Who is doing the work of data science (and who is not)?**

--

* Those doing data science are not usually representative of the general population.

---

### Understand and Challenge the Power Structures

**Who is doing the work of data science (and who is not)?**

> "Privilege is blind to those who have it." -- Michael Kimmel

--

**Michael's Example**:

* His African-American colleague said, "When I look in the mirror I see a Black woman."

--

* When a white woman looks in the mirror she sees a woman.

--

* And Kimmel, a white man, rejoins, "And when I look in the mirror, I see a human being."

--

> "Privilege hazard: the phenomenon that makes those who occupy the most privileged positions among us -- those with good educations, respected credentials, and professional accolades -- so poorly equipped to recognize instances of oppression in their world." -- DF

---

### Understand and Challenge the Power Structures

**Whose goals are prioritized in data science (and whose are not)?**

--

**Example:** [Mimi Onuoha's The Library of Missing Datasets](https://github.com/MimiOnuoha/missing-datasets)

<img src="img/missing_datasets.png" width="55%" style="display: block; margin: auto;" />

---

### Understand and Challenge the Power Structures

**Whose goals are prioritized in data science (and whose are not)?**

**Example:** [Mimi Onuoha's The Library of Missing Datasets](https://github.com/MimiOnuoha/missing-datasets)

> "My interest in them stems from the observation that within many spaces where large amounts of data are collected, there are often empty spaces where no data live. Unsurprisingly, this lack of data typically correlates with issues affecting those who are most vulnerable in that context."
-- Mimi Onuoha

---

### Understand and Challenge the Power Structures

**Who benefits from the data work (and who is either overlooked or actively harmed)?**

--

**Example**: Allegheny County, PA model to predict risk of child abuse

--

* Goal: Remove children from potentially abusive households before abuse happens.

--

* Problem: Poorer families use more public resources and so the model has more data on them.
    + Their children are over-targeted as being at risk.

---

### Learn about the Context of the Data

* Most *found data* doesn't come with a robust set of metadata

--

* **Example**: `?get_pdxTrees_parks`: Missing a lot of units because they weren't in the city's metadata

--

* If the data are about people, engage with and listen to those communities.

---

### Understand that Data are Never "Raw" and Need to be Interrogated

> "Raw Data is an Oxymoron" -- Lisa Gitelman

Her main arguments:

--

* Data don't just exist but need to be imagined.

--

* Each discipline has its own standards and assumptions for creating data.

---

### Understand that Data are Never "Raw" and Need to be Interrogated

Potential questions to ask the data:

--

+ Where did the data come from?

--

+ When were the data collected?

--

+ Why were the data collected?

--

+ How were the data collected?

--

+ Who are the data supposed to represent?
    + Who is present? Who is absent?
    + What evidence is there that the data are representative? What evidence is there that the data are not representative?

--

**Example**: The Smithsonian's Global Volcanism Program's dataset on all documented eruptions

--

**Example**: Categories for gender on a survey

---

### Also Interrogate Any Algorithms

Algorithms also aren't inherently neutral and unbiased.

--

**Example**: US News and World Report College Rankings

--

* Weight-and-sum model

<img src="img/usnews_wts.png" width="75%" style="display: block; margin: auto;" />

---

### Also Interrogate Any Algorithms

Algorithms also aren't inherently neutral and unbiased.
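
A weight-and-sum model can be sketched as a weighted average. The notation here is illustrative, not a published formula: item $i$ has standardized measures $x_{i1}, \ldots, x_{ip}$, the modeler chooses weights $w_1, \ldots, w_p$, and the score is

$$s_i = \sum_{j=1}^{p} w_j x_{ij}, \qquad w_j \ge 0, \qquad \sum_{j=1}^{p} w_j = 1$$

Which measures are included and how large each weight is are both human choices, which is one reason the resulting scores are not neutral.
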
**Example**: US News and World Report College Rankings

<img src="img/usnews_pred.png" width="85%" style="display: block; margin: auto;" />

---

### Also Interrogate Any Algorithms

Ask:

* Who is building the algorithm?

--

* Who is affected by the algorithm?

--

* Are there any differential power dynamics between those who create the algorithm and those who are impacted by it?

--

* What is the relationship between the training data and the test or future data?

--

* What were the optimization criteria used to fit the model?

---

### Name and Value the Labor of All Those Involved

There are many forms of labor involved in data work.

* Those who serve as the data source.

--

* Those who collect and input the data.

--

* Those who analyze the data.

--

* Those who communicate the findings from the data.

---

### Name and Value the Labor of All Those Involved

**Example**: Reed Forestry Data Science Dashboarding Project

--

* Project provided by the Interior West branch of the US Forest Inventory and Analysis Program (FIA)
    + Monitor the status and trend of forested lands in the US

--

* It takes FIA foresters 10 years to collect the data that we were visualizing in 10 weeks.

--

* **How do we honor their contribution?**

---

### Name and Value the Labor of All Those Involved

**Example**: Reed Forestry Data Science Dashboarding Project

* [One approach](https://shiny.reed.edu/s/users/aflowers/fires/)

<img src="img/fires_dash.png" width="85%" style="display: block; margin: auto;" />

---

class: inverse, center, middle

## What Should I Do to Ensure I am an Ethical Data Scientist?

--

### Be Willing to Keep Learning and Engaging with Ideas that Challenge Your Own Worldview

---

## Additional Readings

* Weapons of Math Destruction by Cathy O'Neil
* Automating Inequality by Virginia Eubanks

> "As we create a new national narrative and politics of poverty, we must also begin dismantling the digital poorhouse. It will require flexing our imaginations and asking entirely different kinds of questions.
What would decision-making systems that see poor people, families, and neighborhoods as infinitely valuable and innovative look like? It will also require sharpening our skills: high-tech tools that protect human rights and strengthen human capacity are more difficult to build than those that do not."

* Data Feminism by Catherine D'Ignazio and Lauren F. Klein
* Algorithms of Oppression by Safiya Noble
* Indigenous Statistics by Maggie Walter and Chris Andersen
* [This annotated list](https://github.com/jknowles/ethical_data_science_reader/blob/master/Ethical%20and%20Inclusive%20Data%20Science%20Readings.pdf) by Jared E. Knowles

---

## Should We Just Give Up on Data Science?

--

No!

--

I firmly believe that data can help us answer questions about the world.

--

We just need to be careful and thoughtful with the data.

--

On Thursday, we will think through some case studies and develop **constructive ideas** for how to approach these problems ethically.