class: center, middle

### Text Analysis with `tidytext`

<img src="img/hero_wall_pink.png" width="800px"/>

### Kelly McConville

.large[Math 241 | Week 9 | Spring 2021]

---

## Announcements/Reminders

* Mini Project 2 is due today.
    + Will start grading at noon on Sunday.

* Lab 6 is posted.

---

## Regular Expression Recap

* A concise language for describing patterns in strings.
    + But not super easy to read.
    + Good to have cheatsheets and the internet for help!

* Will post the key for Tuesday's handout to the shared folder.

---

## Pattern Matching Recap

* Functions to take **action** based on our regular expression pattern matching.

* Detect [pattern]() with:
    + `str_detect()`
    + `str_subset()`
    + `str_count()`

* Extract [pattern]() with:
    + `str_extract()` and `str_extract_all()`

* Replace [pattern]() with:
    + `str_replace()` and `str_replace_all()`

* Split by [pattern]() with:
    + `str_split()`

---

## Goals for Today

Now that we know how to handle text/strings as data, let's do some text analysis with `tidytext`.

Topics:

* Tokenizing to a tidy format
* Word frequencies
* Word clouds
* Sentiment analysis

---

## Recap: Tidy Data

What makes a dataset tidy?

--

<img src="img/tidyRules.png" width="80%" />

* Each column is a single variable.
* Each row is a unique observation.
* Each value has its own cell.

---

## Is Hey Jude Tidy?


```r
library(genius)
hey_jude <- genius_lyrics(artist = "The Beatles", song = "Hey Jude")
hey_jude
```

```
## # A tibble: 53 x 3
##    track_title  line lyric
##    <chr>       <int> <chr>
##  1 Hey Jude        1 Hey Jude, don't make it bad
##  2 Hey Jude        2 Take a sad song and make it better
##  3 Hey Jude        3 Remember to let her into your heart
##  4 Hey Jude        4 Then you can start to make it better
##  5 Hey Jude        5 Hey Jude, don't be afraid
##  6 Hey Jude        6 You were made to go out and get her
##  7 Hey Jude        7 The minute you let her under your skin
##  8 Hey Jude        8 Then you begin to make it better
##  9 Hey Jude        9 And anytime you feel the pain, hey Jude, refrain
## 10 Hey Jude       10 Don't carry the world upon your shoulders
## # … with 43 more rows
```

---

## Tidy Text

* A data table with one token per row.

--

* **Token**: a meaningful unit of text
    + What is the unit for `hey_jude`?


```r
hey_jude
```

```
## # A tibble: 53 x 3
##    track_title  line lyric
##    <chr>       <int> <chr>
##  1 Hey Jude        1 Hey Jude, don't make it bad
##  2 Hey Jude        2 Take a sad song and make it better
##  3 Hey Jude        3 Remember to let her into your heart
##  4 Hey Jude        4 Then you can start to make it better
##  5 Hey Jude        5 Hey Jude, don't be afraid
##  6 Hey Jude        6 You were made to go out and get her
##  7 Hey Jude        7 The minute you let her under your skin
##  8 Hey Jude        8 Then you begin to make it better
##  9 Hey Jude        9 And anytime you feel the pain, hey Jude, refrain
## 10 Hey Jude       10 Don't carry the world upon your shoulders
## # … with 43 more rows
```

---

## Tidy Text

* A data table with one token per row.

* **Token**: a meaningful unit of text
    + What is the unit for `hey_jude`?

* Other common tokens are words, sentences, and paragraphs (see the sketch below).

* Some text analysis should be done on text data in a non-tidy format.
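As a preview of `unnest_tokens()` (introduced on the next slide), here is a hedged sketch of tokenizing at the sentence level; lyric lines rarely carry sentence punctuation, so this mostly returns whole lines:

```r
# A sketch, not from the original handout: changing `token` re-tokenizes
# the same lyrics at a different level
library(tidytext)
hey_jude %>%
  unnest_tokens(output = sentence, input = lyric, token = "sentences")
```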
---

## Tidying Text Data

* **Tokenize**: break text into individual tokens


```r
library(tidytext)
hey_jude_words <- hey_jude %>%
  unnest_tokens(output = word, input = lyric, token = "words")
hey_jude_words
```

```
## # A tibble: 544 x 3
##    track_title  line word
##    <chr>       <int> <chr>
##  1 Hey Jude        1 hey
##  2 Hey Jude        1 jude
##  3 Hey Jude        1 don't
##  4 Hey Jude        1 make
##  5 Hey Jude        1 it
##  6 Hey Jude        1 bad
##  7 Hey Jude        2 take
##  8 Hey Jude        2 a
##  9 Hey Jude        2 sad
## 10 Hey Jude        2 song
## # … with 534 more rows
```

---

## Tidying Text Data

* What is an `ngram`?


```r
hey_jude_ngram <- hey_jude %>%
  unnest_tokens(output = ngram, input = lyric, token = "ngrams", n = 2)
hey_jude_ngram
```

```
## # A tibble: 491 x 3
##    track_title  line ngram
##    <chr>       <int> <chr>
##  1 Hey Jude        1 hey jude
##  2 Hey Jude        1 jude don't
##  3 Hey Jude        1 don't make
##  4 Hey Jude        1 make it
##  5 Hey Jude        1 it bad
##  6 Hey Jude        2 take a
##  7 Hey Jude        2 a sad
##  8 Hey Jude        2 sad song
##  9 Hey Jude        2 song and
## 10 Hey Jude        2 and make
## # … with 481 more rows
```

---

## Word Frequencies

* A common text mining task.

* What have we learned about the frequency of words in "Hey Jude"?

* Which words in this list do we maybe not care about?


```r
hey_jude_words %>%
  count(word, sort = TRUE)
```

```
## # A tibble: 94 x 2
##    word       n
##    <chr>  <int>
##  1 na       204
##  2 jude      43
##  3 hey       27
##  4 yeah      18
##  5 it        17
##  6 naa       17
##  7 you       13
##  8 better    12
##  9 make      12
## 10 to        10
## # … with 84 more rows
```

---

## Word Frequencies

* **Stop words**: common words that are not useful for analysis


```r
data("stop_words")
stop_words
```

```
## # A tibble: 1,149 x 2
##    word        lexicon
##    <chr>       <chr>
##  1 a           SMART
##  2 a's         SMART
##  3 able        SMART
##  4 about       SMART
##  5 above       SMART
##  6 according   SMART
##  7 accordingly SMART
##  8 across      SMART
##  9 actually    SMART
## 10 after       SMART
## # … with 1,139 more rows
```

---

## Word Frequencies

* I want to remove from `hey_jude_words` the rows that contain stop words.
    + Get to learn a new `join`!

--


```r
hey_jude_words <- hey_jude_words %>%
  anti_join(stop_words, by = "word")
```

---

## Word Frequencies

* What graph should we construct?


```r
hey_jude_words %>%
  count(word, sort = TRUE)
```

```
## # A tibble: 43 x 2
##    word         n
##    <chr>    <int>
##  1 na         204
##  2 jude        43
##  3 hey         27
##  4 yeah        18
##  5 naa         17
##  6 ma           8
##  7 judy         5
##  8 bad          3
##  9 begin        3
## 10 remember     3
## # … with 33 more rows
```

---

## Word Frequencies

* Which `forcats` function should we use to reorder the bars?


```r
hey_jude_words %>%
  count(word, sort = TRUE) %>%
  filter(n > 2) %>%
  ggplot(mapping = aes(x = word, y = n)) +
  geom_col() +
  coord_flip()
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-10-1.png" width="360" />

---

## Word Frequencies

* Which `forcats` function should we use to reorder the bars?

```r
hey_jude_words %>%
  count(word, sort = TRUE) %>%
  filter(n > 2) %>%
  mutate(word = fct_reorder(word, n)) %>%
  ggplot(mapping = aes(x = word, y = n)) +
  geom_col() +
  coord_flip()
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-11-1.png" width="360" />

---

## Let's Get More Data


```r
white_album <- genius_album(artist = "The Beatles",
                            album = "The Beatles ('The White Album')")
```

<img src="img/TheBeatles.jpg" width="320" />

---


```r
white_album %>%
  unnest_tokens(output = word, input = lyric, token = "words") %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE) %>%
  filter(n > 12) %>%
  mutate(word = fct_reorder(word, n)) %>%
  ggplot(mapping = aes(x = word, y = n)) +
  geom_col() +
  coord_flip()
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-14-1.png" width="360" />

---

I have so many questions.

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-15-1.png" width="360" />

--

* Do The Beatles really sing that much about a bungalow?

--

* What is "ob" or "mi"?

---

## bungalow Problem?


```r
str_subset(string = white_album$lyric, pattern = "bungalow")
```

```
## character(0)
```

---

## Bungalow


```r
str_subset(string = white_album$lyric, pattern = "Bungalow")
```

```
##  [1] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
##  [3] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
##  [5] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
##  [7] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
##  [9] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
## [11] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
## [13] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
## [15] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
## [17] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
## [19] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
## [21] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
## [23] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
## [25] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
## [27] "Hey, Bungalow Bill"                "What did you kill, Bungalow Bill?"
```

---

## mi

* How can we use regular expressions to get the word(?) "mi", not words that contain "mi"?


```r
str_subset(string = white_album$lyric, pattern = "mi|Mi")
```

```
##  [1] "Flew in from Miami Beach BOAC"
##  [2] "That Georgia's always on my mi-mi-mi-mi-mi-mi-mi-mi-mind"
##  [3] "That Georgia's always on my mi-mi-mi-mi-mi-mi-mi-mi-mind"
##  [4] "Let me see you smile"
##  [5] "So let me see you smile again"
##  [6] "Won't you let me see you smile?"
## [7] "Deep in the jungle, where the mighty tiger lies" ## [8] "With every mistake, we must surely be learning" ## [9] "She's not a girl who misses much" ## [10] "The man in the crowd with the multicoloured mirrors" ## [11] "I'm so tired, my mind is on the blink" ## [12] "I'm so tired, my mind is set on you" ## [13] "For a little peace of mind" ## [14] "For a little peace of mind" ## [15] "I'd give you everything I've got for a little peace of mind" ## [16] "I'd give you everything I've got for a little peace of mind" ## [17] "I'd give you everything I've got for a little peace of mind" ## [18] "Now somewhere in the black mining hills of DakotaThere lived a young boy named Rocky Raccoon" ## [19] "I listen for your footsteps coming up the drive" ## [20] "Windy smile calls me" ## [21] "I can only speak my mind, Julia" ## [22] "Black cloud crossed my mind" ## [23] "Blue mist round my soul" ## [24] "Just a smile would lighten everything" ## [25] "I'm coming down fast, but I'm miles above you" ## [26] "I'm coming down fast, but don't let me break you" ## [27] "I'm coming down fast, but don't let me break you" ## [28] "She's coming down fast!" ## [29] "Coming down fast" ## [30] "How can I ever misplace you?" ## [31] "But if you want money for people with minds that hate" ## [32] "You better free your mind instead" ## [33] "You might not feel it now" ## [34] "The duchess of Kircaldy always smiling" ## [35] "I've missed all of that" ## [36] "Them for themming and when for whimming" ## [37] "Close your eyes and I'll close mine" ## [38] "Close your eyes and I'll close mine" ``` --- ## mi ```r str_subset(string = white_album$lyric, pattern = "\\b(mi|Mi)\\b") ``` ``` ## [1] "That Georgia's always on my mi-mi-mi-mi-mi-mi-mi-mi-mind" ## [2] "That Georgia's always on my mi-mi-mi-mi-mi-mi-mi-mi-mind" ``` --- ## ob ```r str_subset(string = white_album$lyric, pattern = "\\b(ob|Ob)\\b") ``` ``` ## [1] "Ob-la-di, ob-la-da, life goes on, brah" ## [2] "Ob-la-di, ob-la-da, life goes on, brah" ## [3] "Ob-la-di, ob-la-da, life goes on, brah" ## [4] "Ob-la-di, ob-la-da, life goes on, brah" ## [5] "Ob-la-di, ob-la-da, life goes on, brah" ## [6] "Yeah, ob-la-di, ob-la-da, life goes on, brah" ## [7] "Ob-la-di, ob-la-da, life goes on, brah" ## [8] "Yeah, ob-la-di, ob-la-da, life goes on, brah" ## [9] "Take Ob-la-di-bla-da" ## [10] "We all know Ob-La-Di-Bla-Da" ``` --- ## Wordcloud * What's the `geom`? * What are the `aes`thetics of the `geom`? * How are the variables mapped to the `aes`thetics? <img src="slidesWk9Th_files/figure-html/unnamed-chunk-21-1.png" width="504" style="display: block; margin: auto;" /> --- ## Wordcloud ```r library(wordcloud) library(RColorBrewer) pal <- brewer.pal(9, "Set1") white_album_count <- white_album %>% unnest_tokens(output = word, input = lyric, token = "words") %>% anti_join(stop_words, by = "word") %>% count(word, sort = TRUE) ``` --- ```r white_album_count %>% with(wordcloud(word, n, colors = pal, min.freq = 7, random.order = FALSE, scale = c(4, 1))) ``` <img src="slidesWk9Th_files/figure-html/unnamed-chunk-23-1.png" width="576" style="display: block; margin: auto;" /> * Issue with the color palette? 
---


```r
library(viridis)
pal <- magma(n = 30, direction = -1)

white_album_count %>%
  with(wordcloud(word, n, scale = c(4, 1), colors = pal,
                 min.freq = 7, random.order = FALSE))
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-24-1.png" width="576" style="display: block; margin: auto;" />

---

## Comparisons Across Albums


```r
sweetener <- genius_album(artist = "Ariana Grande", album = "Sweetener") %>%
  mutate(album = "Sweetener")

thank_u_next <- genius_album(artist = "Ariana Grande", album = "thank u next") %>%
  mutate(album = "thank_u_next")
```

<img src="img/ariana_grande.png" width="80%" />

---

## Word Frequencies Across Albums


```r
ariana_grande <- bind_rows(sweetener, thank_u_next) %>%
  unnest_tokens(output = word, input = lyric, token = "words") %>%
  anti_join(stop_words, by = "word") %>%
  filter(!(word %in% c("ayy", "da", "eh"))) %>%
  count(album, word) %>%
  group_by(album) %>%
  mutate(prop = n/sum(n))
ariana_grande
```

```
## # A tibble: 887 x 4
## # Groups:   album [2]
##    album     word       n     prop
##    <chr>     <chr>  <int>    <dbl>
##  1 Sweetener afraid     1 0.000560
##  2 Sweetener ah        13 0.00728
##  3 Sweetener ahh        3 0.00168
##  4 Sweetener air        7 0.00392
##  5 Sweetener align      1 0.000560
##  6 Sweetener angel      2 0.00112
##  7 Sweetener ariana     1 0.000560
##  8 Sweetener asleep     1 0.000560
##  9 Sweetener awake      1 0.000560
## 10 Sweetener aww        1 0.000560
## # … with 877 more rows
```

---

## Word Frequencies Across Albums


```r
ariana_grande_wider <- ariana_grande %>%
  select(album, word, prop) %>%
  pivot_wider(names_from = album, values_from = prop)
ariana_grande_wider
```

```
## # A tibble: 758 x 3
##    word   Sweetener thank_u_next
##    <chr>      <dbl>        <dbl>
##  1 afraid  0.000560    NA
##  2 ah      0.00728      0.00992
##  3 ahh     0.00168     NA
##  4 air     0.00392     NA
##  5 align   0.000560     0.000661
##  6 angel   0.00112      0.00397
##  7 ariana  0.000560    NA
##  8 asleep  0.000560    NA
##  9 awake   0.000560    NA
## 10 aww     0.000560    NA
## # … with 748 more rows
```

---

## Word Frequencies Across Albums


```r
ariana_grande %>%
  group_by(album) %>%
  arrange(desc(n)) %>%
  slice(1:10) %>%
  ungroup() %>%
  mutate(word = factor(word),
         word = fct_reorder(word, n)) %>%
  ggplot(mapping = aes(x = word, y = n)) +
  geom_col() +
  facet_wrap(~album) +
  coord_flip()
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-28-1.png" width="540" />

---

## Word Frequencies Across Albums


```r
ariana_grande %>%
  group_by(album) %>%
  arrange(desc(n)) %>%
  slice(1:10) %>%
  ungroup() %>%
  mutate(word = factor(word),
         word = fct_reorder(word, n)) %>%
  ggplot(mapping = aes(x = word, y = n)) +
  geom_col() +
  facet_wrap(~album, scales = "free_y") +
  coord_flip()
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-29-1.png" width="540" />

---

### Word Frequencies Across Albums


```r
ariana_grande_wider %>%
  filter(Sweetener > 0.001, thank_u_next > 0.001) %>%
  ggplot(mapping = aes(x = Sweetener, y = thank_u_next, label = word)) +
  geom_text(size = 4, position = position_jitter(width = 0.08, height = 0.08)) +
  scale_x_log10() +
  scale_y_log10() +
  geom_abline()
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-30-1.png" width="432" />

---

## Sentiment Analysis

* Was `thank u next` a more negative album than `Sweetener`?

* Need to add a column that measures the sentiment of each token.
    + From [Bing Liu and collaborators](https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)
    + Does it generalize to other English-speaking countries or time periods?
    + `tidytext` ships this lexicon as `sentiments`:

```r
sentiments
```

```
## # A tibble: 6,786 x 2
##    word        sentiment
##    <chr>       <chr>
##  1 2-faces     negative
##  2 abnormal    negative
##  3 abolish     negative
##  4 abominable  negative
##  5 abominably  negative
##  6 abominate   negative
##  7 abomination negative
##  8 abort       negative
##  9 aborted     negative
## 10 aborts      negative
## # … with 6,776 more rows
```

---

## Sentiment Analysis

* Keep stop words this time.


```r
ariana_grande <- bind_rows(sweetener, thank_u_next) %>%
  unnest_tokens(output = word, input = lyric, token = "words") %>%
  count(album, word) %>%
  group_by(album) %>%
  mutate(prop = n/sum(n))
ariana_grande
```

```
## # A tibble: 1,403 x 4
## # Groups:   album [2]
##    album     word       n     prop
##    <chr>     <chr>  <int>    <dbl>
##  1 Sweetener a         89 0.0141
##  2 Sweetener about      8 0.00127
##  3 Sweetener above      3 0.000475
##  4 Sweetener afraid     1 0.000158
##  5 Sweetener after      3 0.000475
##  6 Sweetener again      2 0.000317
##  7 Sweetener ah        13 0.00206
##  8 Sweetener ahh        3 0.000475
##  9 Sweetener ain't     23 0.00364
## 10 Sweetener air        7 0.00111
## # … with 1,393 more rows
```

---

## Sentiment Analysis

What are the most common **negative words** on each album?


```r
ariana_grande %>%
  inner_join(sentiments, by = "word") %>%
  filter(sentiment == "negative") %>%
  arrange(desc(n))
```

```
## # A tibble: 106 x 5
## # Groups:   album [2]
##    album        word         n    prop sentiment
##    <chr>        <chr>    <int>   <dbl> <chr>
##  1 Sweetener    stole       32 0.00507 negative
##  2 Sweetener    bum         23 0.00364 negative
##  3 thank_u_next bad         19 0.00396 negative
##  4 Sweetener    darkness    18 0.00285 negative
##  5 Sweetener    twist       16 0.00253 negative
##  6 thank_u_next fake        16 0.00334 negative
##  7 thank_u_next shit        11 0.00229 negative
##  8 Sweetener    cry         10 0.00158 negative
##  9 Sweetener    hard        10 0.00158 negative
## 10 thank_u_next ruin        10 0.00208 negative
## # … with 96 more rows
```

---

## Sentiment Analysis

What are the most common **positive words** on each album?


```r
ariana_grande %>%
  inner_join(sentiments, by = "word") %>%
  filter(sentiment == "positive") %>%
  arrange(desc(n))
```

```
## # A tibble: 81 x 5
## # Groups:   album [2]
##    album        word       n    prop sentiment
##    <chr>        <chr>  <int>   <dbl> <chr>
##  1 thank_u_next like      45 0.00938 positive
##  2 Sweetener    like      43 0.00681 positive
##  3 thank_u_next love      41 0.00855 positive
##  4 thank_u_next thank     39 0.00813 positive
##  5 Sweetener    happy     25 0.00396 positive
##  6 thank_u_next good      19 0.00396 positive
##  7 Sweetener    love      17 0.00269 positive
##  8 thank_u_next smile     17 0.00354 positive
##  9 thank_u_next woo       13 0.00271 positive
## 10 Sweetener    better    12 0.00190 positive
## # … with 71 more rows
```

---

### What is the distribution of positive and negative words?

* Remember that words not in the lexicon are dropped!


```r
ariana_grande %>%
  inner_join(sentiments, by = "word") %>%
  group_by(album, sentiment) %>%
  summarize(n = sum(n)) %>%
  mutate(prop = n/sum(n)) %>%
  ggplot(aes(x = album, y = prop, fill = sentiment)) +
  geom_col()
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-35-1.png" width="360" />

--

Issue with word-based sentiment analysis?
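--

One way to see the issue (a hedged sketch, not from the handout): word-level scoring ignores negation. Tokenizing into bigrams and joining the lexicon on the *second* word flags sentiment words that follow a negation term.

```r
# Hedged sketch: sentiment words preceded by a negation get scored with the
# wrong sign by word-level analysis; `separate()` comes from tidyr
bind_rows(sweetener, thank_u_next) %>%
  unnest_tokens(output = bigram, input = lyric, token = "ngrams", n = 2) %>%
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  filter(word1 %in% c("no", "not", "never", "ain't")) %>%
  inner_join(sentiments, by = c("word2" = "word")) %>%
  count(album, word1, word2, sentiment, sort = TRUE)
```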
---

## Sentiment Analysis


```r
thank_u_next %>%
  filter(str_detect(lyric, "love"))
```

```
## # A tibble: 34 x 5
##    track_n  line lyric                                    track_title album
##      <int> <int> <chr>                                    <chr>       <chr>
##  1       2     7 "I'ma scream and shout for what I love"  needy       thank_u_n…
##  2       2    11 "I'm obsessive and I love too hard"      needy       thank_u_n…
##  3       2    24 "I'ma scream and shout for what I love"  needy       thank_u_n…
##  4       2    28 "I'm obsessive and I love too hard"      needy       thank_u_n…
##  5       3     4 "You can say \"I love you\" through the… NASA        thank_u_n…
##  6       3    25 "Usually, I would love it if you stayed… NASA        thank_u_n…
##  7       3    50 "You can say \"I love you\" through the… NASA        thank_u_n…
##  8       4     8 "Love me, love me, baby"                 bloodline   thank_u_n…
##  9       4    12 "Get it like you love me"                bloodline   thank_u_n…
## 10       4    28 "I ain't lookin' for my one true love"   bloodline   thank_u_n…
## # … with 24 more rows
```

---

## Sentiment Analysis

* We should also try out other lexicons.


```r
nrc <- get_sentiments("nrc")
nrc
```

```
## # A tibble: 13,901 x 2
##    word        sentiment
##    <chr>       <chr>
##  1 abacus      trust
##  2 abandon     fear
##  3 abandon     negative
##  4 abandon     sadness
##  5 abandoned   anger
##  6 abandoned   fear
##  7 abandoned   negative
##  8 abandoned   sadness
##  9 abandonment anger
## 10 abandonment fear
## # … with 13,891 more rows
```

---


```r
ariana_grande %>%
  inner_join(nrc, by = "word") %>%
  group_by(album, sentiment) %>%
  summarize(n = sum(n)) %>%
  mutate(prop = n/sum(n)) %>%
  ggplot(aes(fill = album, y = prop, x = sentiment)) +
  geom_col(position = "dodge") +
  coord_flip()
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-38-1.png" width="504" />

---

## Measuring Differences: `tf_idf`

* tf (term frequency) = how often a word appears in a given text, as a proportion of that text's words

* idf (inverse document frequency) = log(number of texts / number of texts containing the word)

* tf `\(*\)` idf = a word's frequency within a text, downweighted for words that are common across texts

If we have 6 texts and "you" shows up in all of them, then tf `\(*\)` idf equals what?
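--

Since "you" appears in all 6 texts, idf = log(6/6) = log(1) = 0, so tf `\(*\)` idf = 0: a word that shows up everywhere carries no distinguishing weight, no matter how frequent it is.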
---

### Need Several Albums


```r
ts_albums <- c("Taylor Swift", "Fearless", "Speak Now",
               "1989", "Reputation", "Lover")

ts <- genius_album(artist = "Taylor Swift", album = ts_albums[1]) %>%
  mutate(album = ts_albums[1])

for (i in 2:length(ts_albums)) {
  next_album <- genius_album(artist = "Taylor Swift", album = ts_albums[i]) %>%
    mutate(album = ts_albums[i])
  ts <- bind_rows(ts, next_album)
}
```

<img src="img/taylor_swift.png" width="196" />

---

## Measuring Differences: `tf_idf`


```r
taylor_tidy <- ts %>%
  unnest_tokens(output = word, input = lyric, token = "words") %>%
  count(album, word, sort = TRUE) %>%
  filter(!(word %in% c("la", "ey", "e", "di", "da", "eeh", "ooh", "aah", "ah"))) %>%
  bind_tf_idf(word, album, n)

taylor_tidy %>%
  arrange(desc(tf_idf))
```

```
## # A tibble: 4,952 x 6
##    album      word         n      tf   idf  tf_idf
##    <chr>      <chr>    <int>   <dbl> <dbl>   <dbl>
##  1 1989       yet         64 0.0109  1.10  0.0120
##  2 1989       woods       39 0.00664 1.79  0.0119
##  3 Lover      daylight    40 0.00595 1.79  0.0107
##  4 Speak Now  grow        21 0.00374 1.79  0.00669
##  5 1989       york        30 0.00511 1.10  0.00561
##  6 Reputation getaway     22 0.00306 1.79  0.00548
##  7 1989       welcome     29 0.00494 1.10  0.00543
##  8 1989       shake       78 0.0133  0.405 0.00539
##  9 1989       blood       16 0.00272 1.79  0.00488
## 10 Fearless   belong      12 0.00267 1.79  0.00478
## # … with 4,942 more rows
```

---


```r
taylor_tidy %>%
  mutate(album = factor(album, levels = ts_albums)) %>%
  group_by(album) %>%
  slice_max(tf_idf, n = 10) %>%
  ungroup() %>%
  mutate(word = fct_reorder(word, tf_idf)) %>%
  ggplot(aes(x = word, y = tf_idf, fill = album)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  facet_wrap(~album, ncol = 3, scales = "free")
```

<img src="slidesWk9Th_files/figure-html/unnamed-chunk-42-1.png" width="648" />

---

## Further Text Analysis Topics

* Topic models: Latent Dirichlet allocation

* Sentence-level sentiment analysis with `coreNLP`, `cleanNLP`, and/or `sentimentr`

---

### National

---

### ~~National~~

---

### International Wear your Hat to Zoom Day 🎈: Thursday, April 1st

--

😜 Lesser-known holiday on the same day: April Fool's Day

--

**Any** headwear is welcome:

--

🧢 Hats

🎧 Headphones

👱‍♀️ Wigs

⛵ [Paper sailor hats](https://lifestyle.howstuffworks.com/crafts/recycled/how-to-make-paper-sailor-hat.htm)

🍩 Inflatable donut

--

Encouraged to wear the 🤠 all day.

--

Participation is optional.