Introduction

The Western media coverage of the 2022 Ukrainian-Russian conflict has called attention to bias in news coverage. Media outlets from the United States’ Washington Post (Ellison & Andrews, 2022) to Qatar’s Al Jazeera (Staff, 2022) have published articles reporting on the double standards of Western coverage of the Ukrainian-Russian conflict compared to coverage of the 2003 Iraq War. Many of these articles specifically respond to a quotation from CBS News senior foreign correspondent Charlie D’Agata, who said that Ukraine “isn’t a place, with all due respect, like Iraq or Afghanistan, that has seen conflict raging for decades. This is a relatively civilized, relatively European – I have to choose those words carefully, too – city, one where you wouldn’t expect that or hope that it’s going to happen” (Lambert, 2022). While there is a breadth of coverage and discussion investigating this bias, most of it takes the form of informal blog posts and infographics.

Inspired by this discussion, we explore articles from the New York Times (NYT) to visualize how Western media may have covered these conflicts differently. Additionally, because much of the discussion of this potential coverage bias touches on the different portrayals of Ukraine versus Iraq, we investigate sentiment and language use in the text about each conflict. This investigation falls into our more general exploration of the media coverage of different world regions over time. Our visualizations attempt to give a clear representation of coverage and to show how language use and sentiment may reflect explicit or implicit bias.

Methods

Our Data:

Our data was acquired from the New York Times Developer portal. The New York Times maintains an API of all past articles, which includes an abstract, a byline, document type, keywords, headline, lead paragraph, the news desk the article is from, publication date, section name, snippet, subsection name, and word count. We created developer accounts for the New York Times, which gave us API keys so we could request particular articles. We compiled all of our data using Kearney’s nytimes package (Kearney, 2017). First, we acquired all the articles published in the first month after the US invasion of Iraq, from March 20th, 2003 to April 20th, 2003. Second, we did the same for the first month of Russia’s invasion of Ukraine, from February 24th, 2022 to March 24th, 2022. Third, we created a random sampling tool, which gave us five random dates (year, month, and day) between 2010 and 2020. For each of these randomly selected days, we collected the first 100 articles published from that day onward. This gave us three datasets: one from the start of the Iraq war, one from the start of the Ukraine-Russia war, and one randomly sampled.
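
The random sampling step can be sketched roughly as follows. This is a minimal illustration rather than our original tool: the helper name `sample_dates`, the exact date bounds, and the seed are assumptions made for the example.

```r
# Hypothetical helper: draw `n_days` distinct calendar dates uniformly at
# random between two bounds (both inclusive). Not the original sampling tool.
sample_dates <- function(n_days, from, to, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)          # optional, for reproducibility
  all_days <- seq(as.Date(from), as.Date(to), by = "day")
  sort(sample(all_days, n_days))              # sample without replacement
}

# Assumed bounds: the full calendar years 2010 through 2020
sampled <- sample_dates(5, "2010-01-01", "2020-12-31", seed = 42)
sampled
```

Each sampled date would then serve as the starting point for collecting the articles published from that day onward.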

Each dataset had the variables discussed above: abstract, byline, document type, keywords, headline, lead paragraph, news desk, publication date, section name, snippet, subsection name, and word count. From these datasets, we created several subsets. From the randomly sampled articles, we selected those whose subsection name was Africa, Americas, Asia Pacific, Australia, Europe, or Middle East. Each of these subsections refers to the region the article is about. Further, for both the Iraq and Ukraine data we concatenated the headline, abstract, and lead paragraph into a larger, more descriptive body of text. We then tokenized this text into both unigrams and bigrams and filtered out tidytext’s stop words. Next, we split each conflict’s data into a dataset that exclusively contained articles mentioning Ukraine or Iraq and a dataset that contained all the articles. With the exclusively Ukraine and Iraq data, we performed an emotional analysis using lexicon_nrc to assign emotions to words (Mohammad, n.d.). Using the data with all articles, we calculated the proportions of articles published about Iraq and Ukraine during their respective time periods. This left us with five key datasets: the randomly sampled one covering the various regions, two emotional analysis ones, and two frequency ones.

Sample 1 of Data Wrangling: Tokenization

A critical component of our analysis was the tokenization of the New York Times articles that mentioned Iraq and Ukraine. To tokenize our data, we created two functions: unigram_tokenizer and bigram_tokenizer. The arguments of the functions are the date, in dttm format, and the dataset. These functions take the initial Iraq and Ukraine datasets obtained from the New York Times Developer portal and convert the combined texts variable (containing the abstract, lead paragraph, and headline data) into either single-word (unigram) or two-word (bigram) tokens. Each function then removes stop words (e.g., a, as, in, of). We combined these functions with a for loop that iterates over the vector of dates from the first thirty-one days of each conflict and ultimately reports the segmented words, the number of occurrences of each word, and the date on which that word was published.
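
The two tokenizer functions are not reproduced in this report, so the following is only a sketch of what they might look like, assuming the tidytext package and its built-in `stop_words` table. The toy data, the use of a day number instead of a dttm date, and the `lapply()` plus `bind_rows()` loop are illustrative choices rather than the original implementation.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Sketch: count non-stop-word unigrams published on one day
unigram_tokenizer <- function(target_day, data) {
  data %>%
    filter(date == target_day) %>%
    unnest_tokens(word, texts) %>%            # lowercases and splits into words
    anti_join(stop_words, by = "word") %>%    # drop tidytext stop words
    count(word, name = "n") %>%
    mutate(date = target_day)
}

# Sketch: count bigrams in which neither word is a stop word
bigram_tokenizer <- function(target_day, data) {
  data %>%
    filter(date == target_day) %>%
    unnest_tokens(bigram, texts, token = "ngrams", n = 2) %>%
    filter(!is.na(bigram)) %>%                # texts shorter than two words
    separate(bigram, into = c("word1", "word2"), sep = " ") %>%
    filter(!word1 %in% stop_words$word,
           !word2 %in% stop_words$word) %>%
    count(word1, word2, name = "n") %>%
    mutate(date = target_day)
}

# Toy data standing in for the combined texts variable (hypothetical)
toy <- tibble(
  date  = c(1, 1, 2),
  texts = c("The war in Iraq", "Iraq war coverage", "Talks resume in Baghdad")
)

# Apply each tokenizer to every day and stack the results
unigrams_toy <- bind_rows(lapply(1:2, unigram_tokenizer, data = toy))
bigrams_toy  <- bind_rows(lapply(1:2, bigram_tokenizer, data = toy))
```

Stacking the per-day results gives one row per word (or word pair) per day, which is the shape the later plots rely on.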


# Packages used by the code in this report (assumed; the setup chunk is not shown)
library(tidyverse)   # readr, dplyr, ggplot2
library(lubridate)   # as_date()
library(tidytext)    # unnest_ngrams()
library(wordcloud2)  # wordcloud2()
library(plotly)      # ggplotly(), rangeslider(), layout()
library(colorblindr) # scale_color_OkabeIto(), scale_fill_OkabeIto()
library(hrbrthemes)  # theme_ipsum()
library(ggthemes)    # theme_calc()

# Load master data sets: these contain Ukraine and Iraq data
emotions <- read_csv("emotions.csv")
drop_emotions <- read_csv("drop_emotions.csv")
unigrams <- read_csv("unigrams.csv")
bigrams <- read_csv("bigrams.csv")
occurrences4 <- read_csv("occurrences4.csv")
frequency <- read_csv("frequency.csv")

# Load Iraq data sets
iraq <- read.csv("iraq.csv")
new_iraq <- read.csv("new_iraq.csv")
iraq_counts <- read.csv("iraq_counts.csv")
iraq_data <- read_csv("iraq.csv")
iraq_texts <- read_csv("iraq_texts.csv")
iraq_texts_dates <- read_csv("iraq_texts_dates.csv")

# Load Ukraine data sets
refinedukraine <- read.csv("refinedukraine.csv")
ukraine_counts <- read.csv("ukraine_counts.csv")
ukraine_data <- read_csv("ukraine_data.csv")
ukraine_texts <- read_csv("ukraine_texts.csv")
ukraine_texts_dates <- read_csv("ukraine_texts_dates.csv")

# Load emotions data
iraq_4_emotions <- read_csv("iraq_4_emotions.csv")
iraq_emotions <- read_csv("iraq_emotions.csv")
ukraine_4_emotions <- read_csv("ukraine_4_emotions.csv")
ukraine_emotions <- read_csv("ukraine_emotions.csv")
swd_list <- read_csv("swd_list.csv")
emotions_4_with_intensity <- read_csv("emotions_4_with_intensity.csv")

# Load World data sets
refinednytworld <- read.csv("refinednytworld.csv")
nytworld <- read.csv("nytworld.csv")

Sample 2 of Data Wrangling: Compiling Iraq Information

To analyze emotions in New York Times coverage of Iraq versus Ukraine, we compiled the text data from the headline, the abstract, and the lead paragraph of each article into a new variable called texts. After combining these text fields, we filtered for only the articles that mentioned Iraq or Ukraine, respectively. In the interest of creating a time series, we also wanted our new dataset to contain the date of each publication reported as the day number of the conflict. To do this, we converted the dttm format of the publication date to the numbers 1 through 31 using the case_when function. The resulting data set contained three new variables in addition to those in the original dataset: combined texts, mention of the respective country, and day.


# Combining key columns from the NYT API data
# (sep = " " keeps the last word of one field from fusing with the first word of the next)
texts2 <- paste(ukraine_data$lead_paragraph, ukraine_data$abstract,
                ukraine_data$headline, sep = " ")
ukraine_texts <- ukraine_data %>%
  mutate(texts = texts2) %>%
  # label each article by whether its combined text mentions Ukraine
  mutate(mentions = case_when(grepl("ukraine", texts, ignore.case = TRUE)
                                ~ "news that mentioned ukraine",
                              !grepl("ukraine", texts, ignore.case = TRUE)
                                ~ "news that did not mention ukraine")) %>%
  # keep only the articles that mention Ukraine
  filter(mentions == "news that mentioned ukraine")

# Renaming the dates so they can work with the Iraq data
ukraine_texts_dates <- ukraine_texts %>% 
    mutate(date = as_date(pub_date)) %>% 
   mutate(date = case_when(date == "2022-02-18"~ 1, 
                         date == "2022-02-19"~ 2, 
                         date == "2022-02-20"~ 3, 
                         date == "2022-02-21"~ 4, 
                         date == "2022-02-22"~ 5,
                         date == "2022-02-23"~ 6,
                         date == "2022-02-24"~ 7,
                         date == "2022-02-25"~ 8,
                         date == "2022-02-26"~ 9,
                         date == "2022-02-27"~ 10,
                         date == "2022-02-28"~ 11,
                         date ==  "2022-03-01"~ 12,
                         date ==  "2022-03-02"~ 13,
                         date ==  "2022-03-03"~ 14,
                         date ==  "2022-03-04"~ 15,
                         date ==  "2022-03-05"~ 16,
                         date ==  "2022-03-06"~ 17,
                         date ==  "2022-03-07"~ 18,
                         date ==  "2022-03-08"~ 19,
                         date ==  "2022-03-09"~ 20,
                         date ==  "2022-03-10"~ 21,
                         date ==  "2022-03-11"~ 22,
                         date ==  "2022-03-12"~ 23,
                         date ==  "2022-03-13"~ 24,
                         date ==  "2022-03-14"~ 25,
                         date ==  "2022-03-15"~ 26,
                         date ==  "2022-03-16"~ 27,
                         date ==  "2022-03-17"~ 28,
                         date ==  "2022-03-18"~ 29,
                         date ==  "2022-03-19"~ 30,
                         date ==  "2022-03-20"~ 31,
                         )) %>% 
    arrange(date)
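
The explicit date-to-day mapping above could also be computed arithmetically. The sketch below is an alternative we did not use in the report; the helper name `day_of_conflict` is hypothetical, and the start date is taken from the first line of the mapping above.

```r
# Hypothetical helper: day number of the conflict, counting the start date as day 1
day_of_conflict <- function(pub_date, start_date) {
  as.integer(as.Date(pub_date) - as.Date(start_date)) + 1L
}

start <- as.Date("2022-02-18")   # day 1 in the Ukraine mapping above
days <- day_of_conflict(c("2022-02-18", "2022-02-24", "2022-03-20"), start)
days
# → 1 7 31
```

Unlike the case_when version, which returns NA for dates outside the listed window, this helper returns values below 1 or above 31 for such dates, so they would need to be filtered out separately.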

Sample 3 of Data Wrangling: Acquiring Emotion Data Over Time

We then took this data and tokenized it into unigrams, meaning we divided the text variable into individual words. We also removed all stop words. After the text variable was tokenized, we joined our data set with the NRC lexicon’s emotions data (a left join, followed by dropping the words with no match), which contains the four emotions of fear, anger, sadness, and joy. We then counted the number of words that fell into each emotional category and divided by the day’s total to compute the frequency of each emotion in the New York Times coverage of the first 31 days of each conflict.


# Tokenizing
iraq_emotions <-iraq_texts_dates %>%
  group_by(texts, date) %>% 
  # tokenize by unigrams (words)
  unnest_ngrams(word, texts, n = 1) %>% 
  group_by( date, word) %>% 
  # count the words
  summarise(count = n(), .groups = "drop") %>%
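  # filter out the stopwords (assumes they sit in the first column of swd_list;
  # matching against the data frame itself would not compare individual words)
  filter(!word %in% swd_list[[1]])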

# Finding the emotions by date
iraq_4_emotions <- iraq_emotions %>%
  left_join(emotions_4_with_intensity, by = c("word" = "term")) %>%
  drop_na() %>%
  group_by(AffectDimension, date) %>%
  summarise(count = sum(count), .groups = "drop")

# Finding the share of each emotion within each day
iraq_4_emotions <- iraq_4_emotions %>%
  group_by(date) %>%
  # mutate() keeps one row per emotion; summarise() expects one row per group
  mutate(sum_count = count / sum(count)) %>%
  ungroup()

Results

Word Clouds


# Selecting the correct variables to apply to the cloud, the word and counts
iraq_counts_cloud <- iraq_counts %>% 
  select(word,n)
# Creating wordcloud
wordcloud2(data = iraq_counts_cloud, size=1.6, color='random-light', backgroundColor="black")


# Selecting the correct variables to apply to the cloud, the word and counts
ukraine_counts_cloud <- ukraine_counts %>% 
 select(word,n)
# Creating wordcloud
wordcloud2(data = ukraine_counts_cloud, size=1.6, color='random-light', backgroundColor="black")

First, to broadly examine the New York Times coverage of the two conflicts, we created word clouds that visualize the frequency of words associated with Iraq and Ukraine (Lang, 2022). Using the unigrams we compiled and the counts for each word, we created interactive word clouds that depict the frequency of a word by its size; hovering over a word shows its specific count. The Iraq word cloud is very war-centric. There are many words like “P.O.W.”, “dead”, “captured”, or “control”. There are fewer words, such as “Bush”, “America”, or “Brittons”, which do not seem to be explicitly war-related. Our Ukraine word cloud, on the other hand, is drastically less explicit. It contains some words that seem related to war, such as “horrors”, “destruction”, and “invasion”, but a much greater number of toned-down words such as “briefing”, “plan”, “oil”, or “crisis”. This difference in tone could indicate a broader underlying bias. Moving forward in the project, we decided to examine the sentiments of the two conflicts’ coverage more thoroughly to ascertain whether the diction used truly is all that different.

Article Publication Frequency and Counts


# Plotting Frequencies (proportion of all articles published each day)
p <- occurrences4 %>%
  ggplot(aes(x = date, y = frequency, color = case)) +
    geom_line() +
    scale_color_OkabeIto() +
    labs(title = "Frequency of New York Times Articles Published",
         y = "Frequency",
         x = "Day of Conflict",
         color = "Country") +
    theme_ipsum()
  
  
# Turn it interactive with ggplotly
p <- ggplotly(p, dynamicTicks = TRUE) %>% 
  rangeslider() %>% 
  layout(hovermode = "x")
p


# Plotting Counts (number of articles published each day)
p2 <- frequency %>%
  ggplot(aes(x = date, y = n, color = case)) +
    geom_line() +
    scale_color_OkabeIto() +
    labs(title = "Number of New York Times Articles Published",
         y = "Number of Articles",
         x = "Day of Conflict",
         color = "Country") +
    theme_ipsum()
  
p2 <- ggplotly(p2, dynamicTicks = TRUE) %>% 
  rangeslider() %>%
  layout(hovermode = "x")
p2

To further establish a baseline comparison between the New York Times coverage of the Iraq and Ukraine conflicts, we analyzed the frequency and count of articles that contained the keywords “Iraq” and “Ukraine” in the first thirty-one days of each conflict. Frequency refers to the proportion of articles that mentioned Iraq or Ukraine out of the total number of articles the New York Times published during the respective time period. Our visualization of article frequency shows that the New York Times published a higher overall frequency of articles that mentioned Iraq than articles that mentioned Ukraine. In the first fifteen days of conflict, the frequencies of articles mentioning Iraq and Ukraine are nearly equivalent; however, as the conflicts near the thirty-first day, these trends diverge. On day thirty, articles that mention Iraq have a frequency of 0.70, while articles that mention Ukraine have a frequency of 0.06. These results track with the word clouds from earlier, which indicated a potentially less emotional or visceral reaction to the Ukraine war than to the Iraq war. It then makes sense that the New York Times would also publish proportionally fewer articles about the Ukraine invasion. However, it is surprising that the two series diverge so much towards the end. Either this indicates a trend in which the New York Times is slowly publishing fewer articles about the Ukraine invasion, or it means that our sample is too small. It could also simply be that, in the 19 years between the conflicts, the New York Times came to publish drastically more articles overall without a matching increase in coverage of particular kinds of events. Expanding our dataset to the more recent days of the conflict may be a good way to gauge whether this disparity in reporting is a broader trend.
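
The distinction between count and frequency can be made concrete with a small sketch. The helper name `daily_coverage` and the toy articles are hypothetical, and the keyword match mirrors the case-insensitive grepl filter used in our data wrangling; it is an illustration of the calculation, not the code that produced the plots.

```r
library(dplyr)

# Sketch: per-day count and proportion of articles whose text mentions a keyword
daily_coverage <- function(articles, keyword) {
  articles %>%
    mutate(mentions = grepl(keyword, texts, ignore.case = TRUE)) %>%
    group_by(date) %>%
    summarise(
      total     = n(),            # all articles published that day
      n         = sum(mentions),  # articles mentioning the keyword
      frequency = n / total,      # proportion of the day's articles
      .groups   = "drop"
    )
}

# Toy articles (hypothetical): three on day 1, two on day 2
toy_articles <- tibble(
  date  = c(1, 1, 1, 2, 2),
  texts = c("Iraq war begins", "Sports roundup", "Baghdad falls in Iraq",
            "Weather report", "iraq talks stall")
)

coverage <- daily_coverage(toy_articles, "iraq")
coverage
```

In this toy example, day 1 has a count of 2 and a frequency of 2/3, while day 2 has a count of 1 and a frequency of 1/2, which shows how two days with similar counts can differ in frequency when the total number of articles changes.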

Sentiment Density


# Plotting Sentiments
 ggplot(data= drop_emotions , aes(x= n, group= AffectDimension, fill= AffectDimension)) +
  geom_density(adjust= 7, alpha=.4) +
  facet_wrap(~case)+
  theme_calc()+
  theme(panel.background = element_blank())+
  labs(fill = "Emotions", 
       title = "Sentiment Density in First 31 Days of Conflict ",
       y = "Density of Sentiment", 
       x = "Number of Articles")

The density plots depict the number of articles on the x-axis and the density of particular emotions on the y-axis. The emotions, which are also distinguished by color, are anger, fear, joy, and sadness. The density plots visualize the shape of each emotion’s distribution, which allows the viewer to see emotional trends more clearly than a standard frequency histogram. Iraq’s density plots show that as the number of articles increases, there seems to be more fear in the coverage of the Iraq conflict. In comparison, coverage of the Ukraine conflict has a more consistent combination of all emotions. These plots served as our preliminary investigation of the distribution of emotions in New York Times coverage of the Iraq and Ukraine conflicts. The results compelled us to chart these emotional distributions over the first 31 days of conflict rather than simply the number and frequency of article publications.

Emotion Frequency Over the First Month of Conflict


# Plotting Emotions Over Time
 ggplot(data = iraq_4_emotions, aes(x = date, y = sum_count,color = AffectDimension)) +
  geom_line(size = 0.7) +
  scale_color_OkabeIto()+
  theme_bw()+
 scale_x_continuous( breaks = scales::pretty_breaks(n = 10))+
  labs(x = "Day of Conflict",
       y = "Frequency of Word Counts",
       color = "Affect Dimension",
       title = "Emotions of NY Times Iraq Coverage over Time (2003)")


 ggplot(data = ukraine_4_emotions, aes(x = date, 
                                            y = sum_count,
                                            color = AffectDimension)) +
  geom_line(size = 0.7) +
  scale_color_OkabeIto()+
  theme_bw()+
   scale_x_continuous( breaks = scales::pretty_breaks(n = 10))+
  labs(x = "Day of Conflict",
       y = "Frequency of Word Counts",
       color = "Affect Dimension",
       title = "Emotions of NY Times Ukraine Coverage over Time (2022)")

Coverage of both the Ukraine invasion and the Iraq war is primarily fear-based. This makes sense given the gravity of both situations: nuclear-armed countries waging war on smaller states for reasons that are tenuous at best is a fearful situation for many. In the case of Iraq, however, the proportion of fear was larger than in the case of Ukraine. This may be because the Iraq war was instigated by the U.S. and the New York Times is a U.S.-based news source, so its coverage would be more personal. The rates of anger, joy, and sadness seem to be comparable between the two. These observations offer interesting insights into the behavior of New York Times coverage bias. While there appear to be notable differences in fear that highlight potential coverage bias, the comparable rates of anger, joy, and sadness may demonstrate that bias in New York Times coverage operates under some metrics but not others. One might assume that a U.S.-based news source would report more on the potential sadness of war if its own country was involved. However, even with events such as the atrocities committed in Bucha, Ukraine, the amount of sad, angry, and joyous coverage seems to be similar between the two events. Some columnists have commented that Ukraine has become a pseudo proxy war, which may be one reason the reporting seems to be quite similar, at least in our data (Kaplan, 2022).

Number of Articles Published in Various Regions


# Barplot of the Number of Articles Published for Each Region
refinednytworld %>%
  subset(subsection_name == "Africa" | subsection_name == "Americas" |
           subsection_name == "Asia Pacific" | subsection_name == "Australia" |
           subsection_name == "Europe" | subsection_name == "Middle East") %>%
  ggplot(aes(x = subsection_name, fill = subsection_name)) + 
  geom_bar() + scale_fill_OkabeIto() +
  labs(
    title = "Occurrences of Regions from Sampled NYT Data",
    y = "Number of Occurrences",
    x = " Region",
    fill = ""
  ) +
  theme_calc()+
  theme(legend.position="none")

We hoped to use the comparison of the Iraq and Ukraine conflicts as a means to examine media bias in specific events. To examine bias on a larger scale, however, we wanted to look at how coverage differs across regions of the world. Because the New York Times API divides its articles into regions of the world (Africa, the Americas, Asia Pacific, Australia, Europe, and the Middle East), we had hoped that these could be used to gauge cross-regional sentiment. Using 500 articles, sampled from five randomly selected days between 2010 and 2020, we planned to use the NRC lexicon to compare the emotions involved (Robinson & Silge, 2021; Mohammad, n.d.). Unfortunately, the New York Times data had a drastically smaller number of articles in these sections than we had hoped. The small sample size restricted our ability to conduct a comprehensive emotional analysis of New York Times regional coverage, which would have required an extensive sample of randomly selected publication dates. Instead, we plotted the counts of articles covering each region over the randomly selected days.

In the future, we hope to sample more articles over a wider randomized period of days. While establishing publication counts is an important first step when analyzing the distribution of media coverage, there are several next steps we wish to take to better analyze the emotional distribution. First, we would find the top words associated with each region and, based on these top words, find each region’s positivity score. From here, we would make an interactive map that colors each region by its positivity score; hovering over a region would show the words most associated with it. This final visualization would allow viewers to see the emotional analysis accompanied by the spatial relationships between each region.
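
One way the proposed positivity score might be computed is sketched below. We did not carry out this step, so everything here is an assumption: the toy lexicon, the toy word counts, and the definition of the score as (positive − negative) / (positive + negative), which ranges from −1 to 1.

```r
library(dplyr)

# Toy sentiment lexicon (hypothetical; a real analysis would use a published lexicon)
toy_lexicon <- tibble(
  word      = c("peace", "growth", "war", "crisis"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Toy word counts by region (hypothetical)
toy_words <- tibble(
  region = c("Europe", "Europe", "Europe", "Africa", "Africa"),
  word   = c("peace", "war", "growth", "crisis", "election"),
  n      = c(3, 1, 2, 4, 5)
)

# Assumed score: share of positive minus share of negative lexicon words
positivity <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%   # words outside the lexicon are dropped
  group_by(region) %>%
  summarise(
    positive = sum(n[sentiment == "positive"]),
    negative = sum(n[sentiment == "negative"]),
    score    = (positive - negative) / (positive + negative),
    .groups  = "drop"
  )

positivity
```

A table like this, with one score per region, is the kind of input the interactive map would need to color each region.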

Conclusion

Both the U.S. invasion of Iraq in 2003 and Russia’s invasion of Ukraine in 2022 had massive consequences which reverberated around the globe. News coverage shapes public opinion and public understanding. Thus, the coverage of these events is central to informing and educating the public. News coverage bias is a massive issue because it means people are being presented with incomplete or simply biased information that may greatly affect their opinions. In the case of these two conflicts, our data from the New York Times indicates that there may be some coverage bias.

Our word clouds indicated that there are some differences in the top words associated with Ukraine and Iraq. The Iraq word cloud involved significantly more brutal war-based terms than the Ukraine word cloud did. This could be for a multitude of reasons, such as the New York Times more readily highlighting the visceral nature of war when the country it is based in is involved. Thus, the word clouds did suggest there was some bias.

Our data on the publication count and frequency of articles in both conflicts indicated that Ukraine had a similar number of articles published, but that its proportion of articles was drastically lower. This means that there was a similar base amount of coverage for both Iraq and Ukraine, but because the New York Times began to publish many more articles per day between 2003 and 2022, the relative proportion of coverage for Ukraine is lower.

Our density plots are somewhat hard to interpret. However, they do show that coverage of the Iraq conflict was initially much more emotional than coverage of Ukraine, but that the difference evens out as more articles are published. The time series plots we made indicate that coverage of both conflicts involved similar levels of anger, joy, and sadness, but that fear was much higher in the coverage of Iraq. This makes sense because of the U.S.’s direct involvement in the conflict. However, it could also indicate underlying bias. Finally, in our regional publication plot, we see a strong skew towards reporting about Europe over all other regions.

From all of our analysis, we found that there are some differences between the New York Times’ reporting of the Ukraine and Iraq conflicts, such as article frequency, tone, and emotions present. However, because of our small time scale, small selection of events, and single news source, these findings are not generalizable. Further, because the New York Times is a U.S. news organization, it is logical that it would take a different approach to reporting a U.S.-instigated conflict. Regardless, our emotion time series, frequency plots, word clouds, density plot, and bar plot suggest that the New York Times does present bias in its coverage of certain regions and events.

Class Peer Reviews

Reviewer 1
1. State the authors’ objectives and the general questions that the authors are considering. Do the results and figures support the conclusions made in the report?
The authors plan to visualize differences in the media coverage of the Ukraine-Russia conflict and the Iraq war. This is done in regard to bias in coverage of Ukraine versus Iraq.
2. Discuss the foundations of data visualizations relative to the figures presented in the report.
The visualizations in the report strongly visualize the data and are well done. For the Word Clouds section, the word clouds visualized from the data show the discrepancy and bias in the New York Times’ reporting of the two wars. The most striking aspect of these word clouds is that in the one for the Iraq war the main word is Iraq, yet for the Ukraine-Russia war, the biggest word is Russia, not Ukraine as it is for Iraq. The aggressor in the Iraq war was the United States, so seeing that ‘America’ is present but much smaller visualizes very well the discrepancy of the tone, as the report also details. For the next visualizations, “Article Publication Frequency and Counts”, again, the visualizations are very well done. The aesthetics blend very well into the presentation of the data. In these graphs I find it very clear that the New York Times produced a higher frequency of articles about Iraq than Ukraine in the 30 day time set. The “Emotion Frequency over the First Month of Conflict” visualizations also demonstrate the bias that the authors sought to visualize. The higher frequency of fear in the NYT coverage of the Iraq war again demonstrates the different tones that the authors describe and the bias that is a result.
3. State 3 things that are strong about their report and 2 things that can be improved.
Three things that are strong about this report are the visualizations of the data, how the text was tokenized and the code behind it all, and the overall theme/aesthetics of the visualizations, which are very well done! My favorite graph for only aesthetic reasons was the distribution of the sentiment density for the first 31 days! I really loved how the colors mixed. I also really liked the two line graphs that show the emotional frequency over the first month of the conflict!! They really look nice! One thing that I think could be improved is if the word clouds had the same white background as the other visualizations. I am unsure about the limitations of the word cloud functions, but this would have been cool to see! The final thing I think could help is a visualization of the emotions regarding the articles that are from Europe to see if the same tone change follows for those articles in general. This might help to find out where the bias stems from. Overall, I believe that this project is very well done and explained well!!!!
Reviewer 2
1. State the authors’ objectives and the general questions that the authors are considering. Do the results and figures support the conclusions made in the report?
The goal of the report was to investigate the New York Times reporting bias, specifically in the context of the Russia-Ukraine crisis. A central question was if the New York Times over-reports on European issues, thus minimizing conflicts happening in the rest of the world. To investigate this, many of their figures center on comparing coverage of the current Ukraine situation to coverage of the US invasion of Iraq in 2003. They also use a random sampling strategy in order to randomly pick articles from 2010-2020 to investigate bias. In general, I think the figures that they use support this. The figures that support this the most, in my opinion, are the frequency/number of articles (although I think this could have been combined into one figure), the word cloud, the Emotion Frequency Over the First Month of Conflict, and Number of Articles Published in Various Regions.
2. Discuss the foundations of data visualizations relative to the figures presented in the report.
Starting with the word clouds, I think these were helpful as they orient the viewer towards the specific time period in which the event takes place. For instance, we can see that Iraq was the main topic being reported on, while Ukraine is within many other topics, such as the gas prices and the pandemic.

For the Article Publication Frequency and Counts, I think making this into one figure may have been more intuitive for viewers, maybe a density graph with the proportion of Ukraine and Iraq articles within total articles published. I like the interactive nature of the graphs.

For the Sentiment Density, I think this is really helpful in understanding how emotions work in reporting, however, as they note in the conclusion, this could have been easier to read. The number of articles being on the x axis isn’t the most intuitive and may be misleading. Maybe a bar graph would have been better.

For the Emotion Frequency Over the First Month of Conflict, I think this is important to the thesis, and well described in the caption. It shows the predominant emotions at play in reporting, as well as how this changes over time. In addition to this, it would have been cool to include how emotions have changed in all NYT articles over time or do a random sampling of emotional tone across time.

For the Number of Articles Published in Various Regions, I think this was one of the most interesting figures. While this touches on broader biases rather than the specific Iraq/Ukraine conflicts, I think there should have been more of these. It is clear, easy to interpret, and somewhat provocative. And further, it suggests that the bias of the New York Times present within the Iraq/Ukraine conflicts is not isolated, and is a systemic issue.
3. State 3 things that are strong about their report and 2 things that can be improved.
I really like the topic and question. I think it is relevant and provocative, given that this is an ongoing issue and the New York Times is a generally very trusted newspaper. Further, intentions are stated very clearly, as well as conclusions.

I like how a lot of the graphs are interactive, as this allows the readers to dive into the data, see what is happening, or maybe zoom in on a specific time period. This was a nice touch.

I like how they utilized random sampling in their last graph.

One issue I think is the difference in time between the Russia-Ukraine conflict and the invasion of Iraq. Between 2003 and 2022, a lot has changed in the world, and I’d imagine the reporting style of the NYT has changed a lot. I understood the word cloud to be representative of these differences: in 2003, reporting was very Iraq-centric with not a lot of words, while in 2022, there are many more words, and not all are related. In 2022, reporting seems to be less emotional and broader, while in 2003, reporting was more emotional and narrow. While this is disclosed in many of the paragraphs, I think it could have been solved by picking conflicts closer together. Or maybe including more graphs that investigate the change in reporting over time.

This may be personal opinion, but I wish they included more graphs talking about overall reporting trends outside of the Russian-Ukraine/ Iraq conflict graphs. This may be doing a word cloud associated with each region, or maybe the emotion associated with the reporting in each region. While I think the Iraq/Ukraine graphs are good, I just wish there were figures to show that this is not an isolated phenomenon, but rather, a systemic issue.

References

Ellison, S., & Andrews, T. M. (2022). “They seem so like us”: In depicting Ukraine’s plight, some in media use offensive comparisons. The Washington Post. https://www.washingtonpost.com/media/2022/02/27/media-ukraine-offensive-comparisons/

Kaplan, F. (2022). Everyone is starting to admit something frightening about Ukraine. Slate. https://slate.com/news-and-politics/2022/04/ukraine-nato-russia-proxy-war.html

Kearney, M. W. (2017). nytimes: New York Times APIs. https://github.com/mkearney/nytimes

Lambert, H. (2022). CBS reporter calls Ukraine “relatively civilized” as opposed to Iraq and Afghanistan, outrage ensues (video). The Wrap. https://www.thewrap.com/cbs-charlie-dagata-backlash-ukraine-civilized/

Lang, D. (2022). wordcloud2: Create word cloud by htmlWidget. https://github.com/lchiffon/wordcloud2

Mohammad, S. M. (n.d.). NRC lexicon. https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm

Robinson, D., & Silge, J. (2021). tidytext: Text mining using dplyr, ggplot2, and other tidy tools. https://github.com/juliasilge/tidytext

Staff, A. J. (2022). “Double standards”: Western coverage of Ukraine war criticised. Al Jazeera. https://www.aljazeera.com/news/2022/2/27/western-media-coverage-ukraine-russia-invasion-criticism

Investigating New York Times News Bias
