The World (of music) According to Spotify

Malen Cuturic (Data Science at Reed College)
https://reed-statistics.github.io/math241-spring2022/
May 5, 2022

Background

With approximately 406 million users at the end of 2021 [1], and continued growth since, Spotify commands a great deal of influence in the world of music. From such a central position, the decisions that Spotify’s engineers make broker not only massive amounts of cultural currency, dictating which independent and lesser-known artists get the exposure necessary to break through to the mainstream, but also an arguably even more important flow of money from listeners to artists, both directly through its (admittedly almost negligible) per-listen payments and indirectly through the ticket sales that artists gain from increased exposure. Spotify curates content for its users based on their past listening history in several ways, including daily mixes, front page recommendations, automatically generated playlists, and autoplaying suggested songs at the end of a queue. One thing that all of these methods have in common is that they are automated. This is of course largely unavoidable given the massive size of Spotify’s userbase. However, recognizing the very real material ways that Spotify’s algorithms impact the livelihoods of artists, and to a lesser extent the lives of listeners, calls into question the opacity of the algorithms behind these recommendations.

Based on past analysis I have done using the Spotify API (analyzing follower patterns), I know that there are several publicly available statistics on songs, categorizing their “danceability,” mood, and other similar features. However, based purely on anecdotal evidence, I would estimate that these variables account for far less than 50% of the variability in song recommendation; the most important features, in other words, are hidden. This is supported by a variety of already noted bizarre recommendation phenomena [2]. It is my belief that these algorithms could be subject to systemic bias, which could have real-world financial consequences for artists belonging to marginalized communities, and that an analysis of the ways in which Spotify makes its recommendations is therefore necessary. Specifically, I hypothesize that network analysis will show a bias in favor of white and male musicians; in particular, I expect white male musicians to have consistently higher measures of centrality than non-white or female musicians. This will be expanded upon in the methods section.

Data Description

The specific form of content creation I will interrogate in this analysis is the 1,520 public playlists created and maintained by the official Spotify account. The data collected through the Spotify API were thus the songs featured on every one of those playlists, coerced into an edgelist format to be read as a graph. This CSV, created in Python, lists all possible pairs of artists in each playlist, since the inclusion of two artists on the same playlist indicates a tie in Spotify’s understanding of their relationship, with an additional column for the name of the playlist.

For example, if we had a playlist titled “Example Playlist” with three songs on it, written by Some Rock Band, Some Pop Artist, and Some Jazz Musician respectively, the CSV would represent this playlist with three categorical variables like so:

data.frame(V1 = c(rep('Some Rock Band',2), 'Some Pop Artist'),
           V2 = c('Some Pop Artist', rep('Some Jazz Musician',2)),
           playlist = rep('Example Playlist', 3)) %>%
  knitr::kable()
|V1              |V2                 |playlist         |
|:---------------|:------------------|:----------------|
|Some Rock Band  |Some Pop Artist    |Example Playlist |
|Some Rock Band  |Some Jazz Musician |Example Playlist |
|Some Pop Artist |Some Jazz Musician |Example Playlist |
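The pairing step described above can be sketched in a few lines of base R. The original CSV was built in Python, so this is only an illustration of the idea rather than the code actually used; `playlist_pairs` is a hypothetical helper name.

```r
# Hypothetical helper: every unordered pair of distinct artists on one playlist
playlist_pairs <- function(artists, playlist) {
  artists <- unique(artists)
  # A playlist with fewer than two artists contributes no ties
  if (length(artists) < 2) {
    return(data.frame(V1 = character(0), V2 = character(0),
                      playlist = character(0)))
  }
  # combn() returns one pair per column; transpose to one pair per row
  pairs <- t(combn(artists, 2))
  data.frame(V1 = pairs[, 1], V2 = pairs[, 2], playlist = playlist)
}

example_edges <- playlist_pairs(
  c('Some Rock Band', 'Some Pop Artist', 'Some Jazz Musician'),
  'Example Playlist'
)
example_edges
```

Applying such a helper to each playlist and row-binding the results would give an edgelist of the shape shown in the table.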

These data were read into an igraph object, and each vertex was given the discrete numerical attribute “degree,” which here represents how many playlists the artist was featured on rather than its traditional meaning in graph theory, while each edge was given the playlist it was drawn from as a categorical attribute. The artists were then all given another two categorical variables, which I encoded manually: whether they were white or nonwhite, and whether they were male or female/gender minority. In the case of bands, the artist was encoded as white if the band was all white and as male if the band was all male, and as the opposite if the band had at least one nonwhite or female member. Time constraints unfortunately forced me to use this crude oversimplification of the vast range of racial and gender identities represented in this population, and it is worth noting that these (literally) binary variables do not begin to capture the reality of the intersections at which the artists in question exist.

Data Explorations / Methods and Initial Findings

After acquiring these data, the graph was shrunk to the largest connected component, removing many of the artists featured on disconnected satellite playlists such as the “Learn to Speak French” playlist, and the remaining subgraph was then filtered to only artists featured on 10 or more playlists. This left us with a subgraph with 420 nodes. From here, I employed random subgraphing to create 2 exploratory plots of the network, taking 100 random nodes for each of the following graphs which give an impression of the network distribution of female and male artists and the distribution of white and nonwhite artists.
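As a minimal sketch of the first filtering step, the toy example below (made-up vertex names, assuming the igraph package) keeps only the largest connected component. Component ids returned by `components()` are not ordered by size, so the sketch looks up the largest one explicitly.

```r
library(igraph)

# Toy graph (hypothetical artists): a two-artist "satellite" component
# listed first, followed by a four-artist main component
toy <- graph_from_literal(E-F, A-B, B-C, C-D)

# components() labels each vertex with a component id and reports sizes
comp <- components(toy)
largest_id <- which.max(comp$csize)

# Keep only the vertices that belong to the largest component
main <- induced_subgraph(toy, which(comp$membership == largest_id))

sort(V(main)$name)
```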

knitr::include_graphics('gender_subgraph.jpg')
knitr::include_graphics('race_subgraph.jpg')

A disparity is immediately apparent: white artists vastly outnumber nonwhite artists, and to a lesser extent, male artists outnumber female ones (roughly 65/30). Moreover, we can see the hint of a disparity in network effects as well. Note how the few nonwhite artists present are mostly on the outskirts of the network, and often find themselves in clusters of other nonwhite artists. Female artists appear to be somewhat clustered as well; although they seem more central in the network, they appear less likely to bridge a gap between components.

To examine this further, I produced the following four subsetted plots, which show the distribution of race for female and male artists separately, and the distribution of gender for white and non-white artists separately.

knitr::include_graphics('female_subgraph.jpg')
knitr::include_graphics('male_subgraph.jpg')

The plots showing racial distribution, faceted between male and female artists, tell an interesting story. While we see a similar clustering of non-white artists and generally lower centrality for non-white artists in both plots, there is a marked difference in trends between male and female artists along these lines. Non-white male artists are largely relegated to one cluster, and are otherwise pushed towards the outskirts, with several clusters almost exclusively populated by white artists. In contrast, female artists are much more dispersed, with only one truly notable cluster, in the bottom left, which seems to be disproportionately white.

knitr::include_graphics('white_subgraph.jpg')
knitr::include_graphics('nonwhite_subgraph.jpg')

These plots crystallize the earlier observations into a clearer picture. White artists appear to be more dispersed and scattered throughout the network, while non-white artists are highly cohesive and have few segments that could truly be called separate components from their dense epicenter. It also appears that the groups most likely to end up in relatively disconnected areas are white men and non-white women.

Continuing our exploratory analysis, we can also briefly observe the distribution of measures of centrality in the histograms below:

ggplot(data.frame(x = closeness(s)), aes(x = x)) +
  geom_histogram(color = 'white', fill = 'midnight blue',
                 bins = 30) +
  labs(x = 'Closeness')
ggplot(data.frame(x = betweenness(s)), aes(x = x)) +
  geom_histogram(color = 'white', fill = 'dark red',
                 bins = 30) +
  labs(x = 'Betweenness (Log 10 Scale)') +
  scale_x_log10()
ggplot(data.frame(x = V(s)$degree), aes(x = x)) +
  geom_histogram(color = 'white', fill = 'dark green',
                 bins = 30) +
  labs(x = 'Degree (Playlist Membership Count)') +
  xlim(c(10, 50))

The first two measures of centrality represented here are standard in the sociological methodology of social network analysis [3]. The first, closeness, broadly measures just what you’d expect: distance to other nodes (or artists). The closeness of a node is based on the inverse of the sum of the lengths of the shortest paths between it and every other node in the graph, so a higher score means shorter paths. Thus, artists with high closeness need fewer playlists on average to connect them to every other artist in the graph.
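To make the definition concrete, here is a small sketch on a toy path graph with made-up artists, assuming the igraph package. By default `closeness()` returns the inverse of the summed shortest-path lengths, which can be checked by hand from the distance matrix.

```r
library(igraph)

# Toy path graph with hypothetical artists: A - B - C
toy <- graph_from_literal(A-B, B-C)

# Shortest-path lengths between every pair of artists
d <- distances(toy)

# Closeness by hand: inverse of the summed distances to all other nodes
manual_closeness <- 1 / rowSums(d)

# igraph's closeness() agrees on a connected, unweighted graph
igraph_closeness <- closeness(toy)

manual_closeness   # B, in the middle, has the highest closeness
igraph_closeness
```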

The second, betweenness, can loosely be considered a measurement of a node’s importance for the closeness of other nodes. It is a count of the number of times a particular node lies on the shortest path between two other nodes. In this case, a high betweenness score represents an artist that is often used as a “bridge” between other artists, perhaps one that can be considered genre-spanning or that could easily be put onto many different types of playlists.
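The “bridge” idea can be illustrated with a toy graph of made-up artists, again assuming igraph: two tightly knit triangles joined by a single tie between C and D. Only those two artists sit on shortest paths between the groups.

```r
library(igraph)

# Two triangles (A, B, C) and (D, E, F) joined by one tie between C and D
toy <- graph_from_literal(A-B, B-C, A-C, C-D, D-E, E-F, D-F)

# Number of shortest paths between other pairs that pass through each node
bet <- betweenness(toy)
bet
# C and D each lie on 6 shortest paths; every other artist lies on none
```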

The third and final measure we will be observing is a modification of the concept of degree centrality. In graph theory, degree refers to the number of nodes that a node is directly connected to: its immediate neighbors. In our study, however, this definition would give high degree centrality to the artist of a piano song placed on the 510-song “Chill Piano” playlist, since it likely connects to somewhere in the region of 400 distinct artists by belonging to that playlist alone. This is not a great measure of centrality for our purposes, so we instead look at how many different playlists each artist is featured on.
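The difference between the two notions of degree can be seen on a toy edgelist. All artist and playlist names below are hypothetical, and the sketch assumes the igraph package: artist P sits on one large playlist, while artist X appears on three small ones.

```r
library(igraph)

# P, Q, R, S share one large playlist; X appears on three small ones
edges <- data.frame(
  V1 = c('P', 'P', 'P', 'Q', 'Q', 'R', 'X', 'X', 'X'),
  V2 = c('Q', 'R', 'S', 'R', 'S', 'S', 'Y', 'Y', 'Z'),
  playlist = c(rep('Big Playlist', 6), 'Mix 1', 'Mix 2', 'Mix 3')
)

# Graph-theoretic degree: number of distinct neighbours
# (simplify() collapses the repeated X-Y tie into a single edge)
g <- simplify(graph_from_data_frame(edges, directed = FALSE))
graph_degree <- degree(g)

# Playlist membership count: distinct playlists per artist
long <- unique(data.frame(
  artist = c(edges$V1, edges$V2),
  playlist = c(edges$playlist, edges$playlist)
))
playlist_count <- vapply(split(long$playlist, long$artist), length, integer(1))

graph_degree[c('P', 'X')]    # P has 3 neighbours, X has 2
playlist_count[c('P', 'X')]  # P is on 1 playlist, X is on 3
```

Graph degree ranks P above X purely because of the size of one playlist, while the playlist count ranks X higher, which is the behaviour wanted here.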

#organizing summary stats like a total pro
df <- data.frame(artist = names(V(s)), white = V(s)$white, male = V(s)$male,
                 clo = closeness(s), bet = betweenness(s), deg = V(s)$degree,
                 row.names = 1:length(V(s))) %>%
  mutate(Race = ifelse(white, 'White', 'Non-white'),
         Gender = ifelse(male, 'Male', 'Female'),
         .before = 1) %>%
  select(-white, -male)

df %>%
  group_by(Race, Gender) %>%
  summarize(Closeness = median(clo), Betweenness = median(bet), 
            Degree = median(deg), .groups = 'drop') %>%
  arrange(desc(Betweenness)) %>%
  knitr::kable(caption = 'Median Centrality Measures',
               digits = c(1, 1, 5, 2, 2))
Table 1: Median Centrality Measures

|Race      |Gender | Closeness| Betweenness| Degree|
|:---------|:------|---------:|-----------:|------:|
|White     |Male   |   0.00140|       65.00|     15|
|White     |Female |   0.00142|       63.70|     14|
|Non-white |Female |   0.00141|       48.85|     15|
|Non-white |Male   |   0.00144|       33.48|     13|

Cursory exploratory analysis of the measures of centrality supports the hypothesis that race and gender are interrelated in the problem at hand. In the table above, we see that white men have the highest median betweenness of any group and are tied with non-white women for the highest median degree, but specific trends are hard to generalize. The closeness scores are all very close to one another, and closer examination is required to glean anything from them. However, being white has an unmistakable association with betweenness, supporting the hypothesis that Spotify’s algorithm finds non-white people, especially non-white men, less suitable for a wide variety of playlists, and is likely to relegate them to a particular subset of playlists.

Hypothesis testing was performed with an alpha of 0.025 for the difference in medians of each centrality measure between white and non-white artists and between male and female artists. Each research question was tested with a permutation test using 10,000 replicates, and two-sided p-values were then calculated to decide whether to reject the null hypothesis that being white/non-white or being male/female is independent of an artist’s measures of centrality. These 6 hypothesis tests had varying levels of statistical significance, and the only one for which an incredibly clear relationship appeared was the difference in median betweenness between white and non-white artists.

set.seed(1) #setting seed so I don't look foolish when stating result in paper

obs_diff <- df %>%
  specify(explanatory = Race, response = bet) %>%
  calculate('diff in medians', order = c('White', 'Non-white')) %>%
  pull()
null <- df %>%
  specify(explanatory = Race, response = bet) %>%
  hypothesize('independence') %>%
  generate(reps = 10^4, type = 'permute') %>%
  calculate('diff in medians', order = c('White', 'Non-white'))
p <- get_p_value(null, obs_diff, 'both') %>% pull()

visualize(null) +
  shade_p_value(obs_diff, 'both')

Above we can see a histogram of the null distribution of the difference in medians, with the observed difference of 26.269 shown in red. The probability of obtaining a statistic at least this extreme under the null hypothesis was p = 0.003, well below our alpha of 0.025, which supports rejecting the null hypothesis and suggests that being white has a statistically significant association with betweenness.
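For readers unfamiliar with permutation tests, the logic can be sketched in base R without the infer package. This is an illustration on made-up numbers, not a reproduction of the analysis: `perm_test_median_diff` is a hypothetical helper, the groups 'A' and 'B' are placeholders, and the two-sided p-value here uses the share of permuted differences at least as large in absolute value as the observed one, a convention that may differ from the one `get_p_value()` uses.

```r
# Hypothetical helper: permutation test for a difference in medians
perm_test_median_diff <- function(values, groups, order, reps = 10000) {
  stopifnot(length(values) == length(groups), length(order) == 2)

  # Difference in medians, first group minus second group
  diff_in_medians <- function(g) {
    median(values[g == order[1]]) - median(values[g == order[2]])
  }

  observed <- diff_in_medians(groups)

  # Shuffle the group labels to simulate independence of group and value
  null_diffs <- replicate(reps, diff_in_medians(sample(groups)))

  # Two-sided p-value: share of shuffles at least as extreme as observed
  p_value <- mean(abs(null_diffs) >= abs(observed))

  list(observed = observed, p_value = p_value)
}

# Toy data: group A holds the 20 smallest values, group B the 20 largest
set.seed(1)
toy_values <- 1:40
toy_groups <- rep(c('A', 'B'), each = 20)
result <- perm_test_median_diff(toy_values, toy_groups,
                                order = c('A', 'B'), reps = 2000)
result$observed  # median of A minus median of B
result$p_value
```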

Results and Conclusion

The highly theoretical nature of our methods, and the fact that we cannot in any reasonable amount of time perform our analysis on the entire network in question, limit this study. However, the strength of our results and the effect size give credence to the hypothesis that Spotify’s algorithm systematically privileges white artists over non-white artists. The fact that this pattern is revealed through a difference in betweenness between the groups of artists suggests that white artists are considered more genre- and style-agnostic than their non-white peers, and are able to fit into a wider variety of playlists as a result. Further research into these results is crucial, and greater transparency is necessary in order to ensure that the systems Spotify has put in place are equitable and are not serving to replicate and perpetuate existing systems of inequality in the world of music.

Appendix

Graph Crunching Code (eval=FALSE)

library(tidyverse)
library(igraph)

#creating graph
edgelist <- 
  read.csv('data/edgelist.csv',
           col.names = c('V1', 'V2', 'playlist')) %>%
  filter(V1 != '', V2 != '', playlist != '') #no empty cells

    
g <- graph_from_edgelist(as.matrix(edgelist[,1:2]), directed = FALSE)
E(g)$playlist <- edgelist[,3]


#counting number of playlists each artist appears on

playlist_count_df <- 
  rbind(
    edgelist  %>%
      select(V1, playlist),
    edgelist %>% 
      select(V2, playlist) %>%
      rename(V1 = V2)
  ) %>%
  distinct(.keep_all = TRUE)

counts <- plyr::count(playlist_count_df$V1) #alphabetical counts
vertex_attr(g, 'degree', sort(names(V(g)))) <- counts[,2] #alphabetized

#saving big graph
saveRDS(g, 'data/graph.RDS')

########## creating subgraph for analysis #############

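#largest connected component
#(component ids are not ordered by size, so find the largest explicitly)
comp <- components(g)
s <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))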

#filtering to only artists who are on at least 10 playlists,
#then removing isolates
s <- delete_vertices(s, V(s)[V(s)$degree< 10])
s <- delete_vertices(s, degree(s) == 0)

#creating empty csv with col full of artists for coding of race and gender
dem <- data.frame(artist = names(V(s)), white = NA, male = NA)
write.csv(dem, 'data/dems.csv', row.names = FALSE)

#after this was completed, I manually coded for whether artist was
#white and whether artist was male in a new csv called dems_coded.csv, loaded
#below
dem_coded <- read.csv('data/dems_coded.csv') %>%
  drop_na()

s <- subgraph(s, dem_coded$artist)

#validating
if(mean(names(V(s)) == dem_coded$artist) == 1){
  V(s)$white <- dem_coded$white
  V(s)$male <- dem_coded$male
}

saveRDS(s, 'data/coded_subgraph.RDS')

Plot Code (eval=FALSE)

library(tidyverse)
library(igraph)

s <- readRDS('data/coded_subgraph.RDS')

female_color <- '#785EF0'
male_color <- '#FE6100'
nonwhite_color <- '#DA0000'
white_color <- '#6AA84F'

V(s)$gcolor <- female_color
#color of male vertices
V(s)[V(s)$male == 1]$gcolor <- male_color

#color of non-white vertices
V(s)$rcolor <- nonwhite_color
#color of white vertices
V(s)[V(s)$white == 1]$rcolor <- white_color
set.seed(1)
subgraph <- subgraph(s, sample(V(s), 100))

jpeg('gender_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$gcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'Male and Female Artists', cex.sub = 2.5)
legend('left', legend=c("Male", "Female"), fill = c(male_color, female_color))

dev.off()

rm(subgraph)
set.seed(2)
subgraph <- subgraph(s, sample(V(s), 100))

jpeg('race_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$rcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'White and Non-white Artists', cex.sub = 2.5)
legend('left', legend=c("White", "Non-white"), 
       fill = c(white_color, nonwhite_color))

dev.off()

rm(subgraph)
set.seed(3)
female_vertices <- V(s)[V(s)$male == 0]
subgraph <- subgraph(s, sample(female_vertices, 100))

jpeg('female_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$rcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'White and Non-white Female Artists', cex.sub = 2.5)
legend('left', legend=c("White", "Non-white"), 
       fill = c(white_color, nonwhite_color))

dev.off()

rm(subgraph)
set.seed(4)
male_vertices <- V(s)[V(s)$male == 1]

subgraph <- subgraph(s, sample(male_vertices, 100))

jpeg('male_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$rcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'White and Non-white Male Artists', cex.sub = 2.5)
legend('left', legend=c("White", "Non-white"), 
       fill = c(white_color, nonwhite_color))

dev.off()

rm(subgraph)
rm(male_vertices)
white_vertices <- V(s)[V(s)$white == 1]

set.seed(3)
subgraph <- subgraph(s, sample(white_vertices, 100))

jpeg('white_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$gcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'Male and Female White Artists', cex.sub = 2.5)
legend('left', legend=c("Male", "Female"), fill = c(male_color, female_color))

dev.off()

rm(subgraph)
rm(white_vertices)
nonwhite_vertices <- V(s)[V(s)$white == 0]

set.seed(3)
subgraph <- subgraph(s, sample(nonwhite_vertices, 100))

jpeg('nonwhite_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$gcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'Male and Female Non-White Artists', cex.sub = 2.5)
legend('left', legend=c("Male", "Female"), fill = c(male_color, female_color))

dev.off()

rm(subgraph)
rm(nonwhite_vertices)



  1. https://newsroom.spotify.com/company-info/

  2. https://www.stereogum.com/2105993/pavement-harness-your-hopes-spotify/columns/sounding-board/

  3. https://cambridge-intelligence.com/keylines-faqs-social-network-analysis/

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".