9: The World (of music) According to Spotify

Background

With approximately 406 million users at the end of 2021¹, and continued growth since, Spotify commands a great deal of influence in the world of music. Because it occupies such a central position, the decisions that its engineers make broker not only massive amounts of cultural currency, dictating which independent and lesser-known artists get the exposure necessary to break through to the mainstream, but also an arguably even more important flow of money from listeners to artists, both directly through its (admittedly almost negligible) per-listen payments and indirectly through the ticket sales that follow from increased exposure. Spotify curates content for its users based on their past listening history in several ways, including daily mixes, front page recommendations, automatically generated playlists, and autoplaying suggested songs at the end of a queue. One thing that each of these methods has in common is that it is automated. This is of course largely unavoidable given the massive size of Spotify’s userbase. However, recognizing the very real material ways that Spotify’s algorithms impact the livelihoods of artists, and to a lesser extent the lives of listeners, calls into question the opacity of the algorithms it uses to make its recommendations.

Based on past analysis I have done using the Spotify API (analyzing follower patterns), I know that there are several publicly available statistics on songs, categorizing their “danceability,” mood, and other similar features. However, based purely on anecdotal evidence, I would estimate that these variables account for far less than 50% of the variability in song recommendation; the most important features, in other words, are hidden. This is supported by a variety of already noted bizarre recommendation phenomena². It is my belief that these algorithms could be subject to systemic bias, which could have real-world financial consequences for artists belonging to marginalized communities, and that an analysis of the ways in which Spotify makes its recommendations is necessary. Specifically, I hypothesize that network analysis will show a bias in favor of white and male musicians; in particular, I expect that white male musicians will have consistently higher measures of centrality than non-white or female musicians. This will be expanded upon in the methods section.

Data Description

The specific form of content creation I will interrogate in this analysis is the 1,520 public playlists created and maintained by the official Spotify account. The data collected through the Spotify API were thus the songs featured on every one of those playlists, coerced into an edgelist format to be read as a graph. This CSV, created in Python, lists every possible pair of artists within each playlist, since the inclusion of two artists on the same playlist indicates a tie in Spotify’s understanding of their relationship, with an additional column for the name of the playlist.

For example, if we had a playlist titled “Example Playlist” with three songs on it, written by Some Rock Band, Some Pop Artist, and Some Jazz Musician respectively, the CSV would represent this playlist with three categorical variables like so:


data.frame(V1 = c(rep('Some Rock Band',2), 'Some Pop Artist'),
           V2 = c('Some Pop Artist', rep('Some Jazz Musician',2)),
           playlist = rep('Example Playlist', 3)) %>%
  knitr::kable()

V1	V2	playlist
Some Rock Band	Some Pop Artist	Example Playlist
Some Rock Band	Some Jazz Musician	Example Playlist
Some Pop Artist	Some Jazz Musician	Example Playlist
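The pairing step itself can be sketched in a few lines. The report's CSV was built in Python, so the following is only an R illustration of the same idea, not the original script; the `tracks` data frame and the `playlist_pairs()` helper are hypothetical names introduced for this example.

```r
# Hypothetical input: one row per (playlist, artist) appearance.
tracks <- data.frame(
  playlist = rep('Example Playlist', 3),
  artist = c('Some Rock Band', 'Some Pop Artist', 'Some Jazz Musician'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every unordered pair of distinct artists on one playlist.
playlist_pairs <- function(artists, playlist) {
  artists <- unique(as.character(artists))
  if (length(artists) < 2) return(NULL) # a lone artist has no ties
  pairs <- t(combn(artists, 2))         # one row per pair
  data.frame(V1 = pairs[, 1], V2 = pairs[, 2], playlist = playlist,
             stringsAsFactors = FALSE)
}

# Build the pairs playlist by playlist, then stack them into one edge list.
edgelist <- do.call(rbind, lapply(
  split(tracks, tracks$playlist),
  function(p) playlist_pairs(p$artist, p$playlist[1])
))
rownames(edgelist) <- NULL
edgelist
```

A playlist with n distinct artists contributes n(n - 1)/2 rows, which is why long playlists dominate the raw edge count.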

These data were read into an igraph object, and each vertex was given the discrete numerical attribute “degree,” in this case representing how many playlists the artist was featured on rather than its traditional meaning in graph theory, while each edge was given the playlist it was drawn from as a categorical attribute. The artists were then given two more categorical variables, which I encoded manually: whether they were white or nonwhite, and whether they were male or female/gender minority. In the case of bands, the artists were encoded as white if the band was all white and male if the band was all male, and the opposite if the band had at least one nonwhite or female member. Time constraints unfortunately forced me to use this crude oversimplification of the vast range of racial and gender identities represented in this population, and it is worth noting that these (literally) binary variables do not begin to capture the reality of the intersections at which the artists in question exist.
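The band rule described above amounts to an "all members" test. The coding in the report was done by hand, so the snippet below is only a sketch of the rule on made-up data; the `members` table and its band names are hypothetical.

```r
# Hypothetical member-level data; the real coding was entered manually.
members <- data.frame(
  band = c('Band One', 'Band One', 'Band One', 'Band Two', 'Band Two'),
  white = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  male = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# A band is coded 1 only if every member falls in the category, else 0.
band_codes <- data.frame(
  band = sort(unique(members$band)),
  white = as.integer(tapply(members$white, members$band, all)),
  male = as.integer(tapply(members$male, members$band, all))
)
band_codes
```

Here Band One is coded white but not male, and Band Two is coded male but not white, which shows how a single member flips the band-level code.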

Data Explorations / Methods and Initial Findings

After acquiring these data, the graph was shrunk to the largest connected component, removing many of the artists featured on disconnected satellite playlists such as the “Learn to Speak French” playlist, and the remaining subgraph was then filtered to only artists featured on 10 or more playlists. This left us with a subgraph with 420 nodes. From here, I employed random subgraphing to create 2 exploratory plots of the network, taking 100 random nodes for each of the following graphs which give an impression of the network distribution of female and male artists and the distribution of white and nonwhite artists.


knitr::include_graphics('gender_subgraph.jpg')
knitr::include_graphics('race_subgraph.jpg')

An immediately apparent disparity reveals itself: white artists vastly outnumber nonwhite artists and, to a lesser extent, male artists outnumber female ones (roughly 65/30). Moreover, we can see the hint of a disparity in network effects as well. Note how the few nonwhite artists present are mostly on the outskirts of the network, and often find themselves in clusters of other nonwhite artists. Female artists appear to be somewhat clustered as well; although they seem more central in the network, they appear less likely to bridge a gap between components.

To examine this further, I subset the network for the following 4 plots, which show distributions of race for male artists and female artists separately, and distributions of gender for white and non-white artists separately.


knitr::include_graphics('female_subgraph.jpg')
knitr::include_graphics('male_subgraph.jpg')

The plots showing racial distribution faceted between male and female artists tell an interesting story. While we see a similar clustering of non-white artists and generally lower centrality for non-white artists in both plots, there is a marked difference in trends between male and female artists along these lines. Non-white male artists are largely relegated to one cluster, and are otherwise pushed towards the outskirts, with several clusters almost exclusively populated by white artists. In contrast, for female artists, there is much more dispersion, with only one truly notable cluster in the bottom left which seems to be disproportionately white.


knitr::include_graphics('white_subgraph.jpg')
knitr::include_graphics('nonwhite_subgraph.jpg')

These plots crystallize these observations into a clearer picture. It appears that white artists are subject to more dispersion and are scattered throughout the network, while non-white artists are highly cohesive and have few segments that could truly be called separate components from their dense epicenter. It also appears that the groups that are most likely to end up in relatively disconnected areas are white men and non-white women.

Continuing our exploratory analysis, we can also briefly observe the distribution of measures of centrality in the histograms below:


ggplot(data.frame(x = closeness(s)), aes(x = x)) +
  geom_histogram(color = 'white', fill = 'midnight blue',
                 bins = 30) +
  labs(x = 'Closeness')


ggplot(data.frame(x = betweenness(s)), aes(x = x)) +
  geom_histogram(color = 'white', fill = 'dark red',
                 bins = 30) +
  labs(x = 'Betweenness (Log 10 Scale)') +
  scale_x_log10()


ggplot(data.frame(x = V(s)$degree), aes(x = x)) +
  geom_histogram(color = 'white', fill = 'dark green',
                 bins = 30) +
  labs(x = 'Degree (Playlist Membership Count)') +
  xlim(c(10, 50))

The first two measures of centrality represented here are standard in the sociological methodology of social network analysis³. The first, closeness, broadly measures just what you’d expect: distance to other nodes (or artists). The closeness of a node is the reciprocal of the sum of the lengths of the shortest paths between it and every other node in the graph, so a smaller total distance means a higher closeness. Thus, artists with high closeness will need fewer playlists on average to connect them to every other artist in the graph.
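As a compact reference, and assuming the unnormalized definition that igraph's `closeness()` uses by default (an assumption; the report does not state which variant was used), the closeness of a node $v$ can be written as

$$C(v) = \frac{1}{\sum_{u \neq v} d(v, u)}$$

where $d(v, u)$ is the length of the shortest path between $v$ and $u$. In a three-node path A - B - C, for instance, $C(B) = 1/(1 + 1) = 1/2$ while $C(A) = 1/(1 + 2) = 1/3$, so the middle node is the more central one.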

The second, betweenness, can loosely be considered a measurement of a node’s importance for the closeness of other nodes. It counts the number of times a particular node lies on the shortest path between two other nodes. In this case, a high betweenness score represents an artist that is often used as a “bridge” between other artists, perhaps artists that can be considered genre-spanning or artists that could easily be put onto many different types of playlists.
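A toy graph makes the “bridge” intuition concrete. The sketch below uses igraph's `graph_from_literal()` and `betweenness()` on a made-up network of two tightly knit triangles joined through a single node; the vertex names are hypothetical and unrelated to the Spotify data.

```r
library(igraph)

# Two triangles (A, B, C) and (D, E, F) connected only through 'Bridge'.
toy <- graph_from_literal(A - B, A - C, B - C,
                          C - Bridge, Bridge - D,
                          D - E, D - F, E - F)

# Number of shortest paths between other pairs that pass through each node.
bet <- betweenness(toy)
sort(bet, decreasing = TRUE)
```

Every shortest path from one triangle to the other must pass through `Bridge`, so it collects all 3 x 3 = 9 cross-triangle pairs, while the nodes that only sit inside a triangle score zero.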

The third and final measure we will be observing is a modification of the concept of degree centrality. In graph theory, degree refers to the number of nodes that a node is directly connected to: its immediate neighbors. In our study, however, this definition would give high degree centrality to the artist of a piano song placed on the 510-song “Chill Piano” playlist, since that artist likely connects to somewhere in the region of 400 distinct artists by belonging to that playlist alone. This is not a great measure of centrality for our purposes, so we instead look at how many different playlists each artist is featured on.
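The gap between the two notions of degree shows up even on a tiny made-up edge list. The sketch below uses only base R and hypothetical artist and playlist names, laid out in the same V1 / V2 / playlist columns as the report's CSV.

```r
# Hypothetical edge list: one 4-artist playlist plus three 2-artist playlists.
edges <- data.frame(
  V1 = c('Pianist A', 'Pianist A', 'Pianist A', 'Pianist B', 'Pianist B',
         'Pianist C', 'Singer X', 'Singer X', 'Singer X'),
  V2 = c('Pianist B', 'Pianist C', 'Pianist D', 'Pianist C', 'Pianist D',
         'Pianist D', 'Singer Y', 'Singer Y', 'Singer Y'),
  playlist = c(rep('Big Piano Playlist', 6), 'Duets', 'Pop Hits', 'Road Trip'),
  stringsAsFactors = FALSE
)

# Graph-theoretic degree: number of distinct neighbours per artist.
neighbours <- unique(data.frame(artist = c(edges$V1, edges$V2),
                                other = c(edges$V2, edges$V1),
                                stringsAsFactors = FALSE))
graph_degree <- vapply(split(neighbours$other, neighbours$artist),
                       length, integer(1))

# Modified degree: number of distinct playlists per artist.
memberships <- unique(data.frame(artist = c(edges$V1, edges$V2),
                                 playlist = c(edges$playlist, edges$playlist),
                                 stringsAsFactors = FALSE))
playlist_count <- vapply(split(memberships$playlist, memberships$artist),
                         length, integer(1))

data.frame(artist = names(graph_degree),
           graph_degree = graph_degree,
           playlist_count = playlist_count[names(graph_degree)],
           row.names = NULL)
```

Pianist A has three neighbours from a single playlist, while Singer X has one neighbour but appears on three playlists, so the two measures rank them in opposite order.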


#organizing summary stats like a total pro
df <- data.frame(artist = names(V(s)), white = V(s)$white, male = V(s)$male,
                 clo = closeness(s), bet = betweenness(s), deg = V(s)$degree,
                 row.names = 1:length(V(s))) %>%
  mutate(Race = ifelse(white, 'White', 'Non-white'),
         Gender = ifelse(male, 'Male', 'Female'),
         .before = 1) %>%
  select(-white, -male)

df %>%
  group_by(Race, Gender) %>%
  summarize(Closeness = median(clo), Betweenness = median(bet), 
            Degree = median(deg), .groups = 'drop') %>%
  arrange(desc(Betweenness)) %>%
  knitr::kable(caption = 'Median Centrality Measures',
               digits = c(1, 1, 5, 2, 2))

Table 1: Median Centrality Measures
Race	Gender	Closeness	Betweenness	Degree
White	Male	0.00140	65.00	15
White	Female	0.00142	63.70	14
Non-white	Female	0.00141	48.85	15
Non-white	Male	0.00144	33.48	13

Cursory exploratory analysis of the centrality measures supports the hypothesis that race and gender are interrelated in the problem at hand. In the table above, white men have the highest median betweenness and share the highest median degree with non-white women, but specific trends are hard to generalize. The closeness scores are all very close to one another, and closer examination is required to glean anything from them. However, race shows a marked relationship with betweenness: both white groups have higher medians than both non-white groups. This supports the hypothesis that Spotify’s algorithm finds non-white artists, especially non-white men, less suitable for a wide variety of playlists, and is likely to relegate them to a particular subset of playlists.

Hypothesis testing was performed with an alpha of 0.025 for the difference in medians of each centrality measure between white and non-white and male and female artists. This was done using a permutation test with 10,000 replicates for each research question, and two-sided p-values were then calculated to test for statistical significance to reject the null hypothesis, that being white/non-white or being male/female is independent of your measures of centrality. These 6 hypothesis tests had varying levels of statistical significance, and the only one for which an incredibly clear relationship appeared was the difference in median betweenness between white and non-white artists.


set.seed(1) #setting seed so I don't look foolish when stating result in paper

obs_diff <- df %>%
  specify(explanatory = Race, response = bet) %>%
  calculate('diff in medians', order = c('White', 'Non-white')) %>%
  pull()
null <- df %>%
  specify(explanatory = Race, response = bet) %>%
  hypothesize('independence') %>%
  generate(reps = 10^4, type = 'permute') %>%
  calculate('diff in medians', order = c('White', 'Non-white'))
p <- get_p_value(null, obs_diff, 'both') %>% pull()

visualize(null) +
  shade_p_value(obs_diff, 'both')

Above we can see a histogram of the null distribution of the difference in medians, with the observed difference of 26.269 shown in red. The probability of obtaining a statistic at least this extreme under the null hypothesis was p = 0.003, well below our alpha of 0.025. This supports rejecting the null hypothesis and suggests that race is significantly associated with betweenness.
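For readers unfamiliar with permutation tests, the logic can be sketched in base R on made-up data. This is an illustration of the general technique, not a reproduction of the analysis: the `toy` scores are simulated, and the two-sided p-value here uses the absolute-value convention, which may differ slightly from the convention used by the `infer` package.

```r
set.seed(42)

# Simulated, hypothetical scores for two groups of 30 artists each.
toy <- data.frame(
  group = rep(c('A', 'B'), times = c(30, 30)),
  score = c(rexp(30, rate = 1 / 60), rexp(30, rate = 1 / 35)),
  stringsAsFactors = FALSE
)

# Test statistic: median of group A minus median of group B.
diff_in_medians <- function(score, group) {
  median(score[group == 'A']) - median(score[group == 'B'])
}

obs <- diff_in_medians(toy$score, toy$group)

# Null distribution: shuffle the group labels, which breaks any real
# association between group and score, and recompute the statistic.
null <- replicate(10^4, diff_in_medians(toy$score, sample(toy$group)))

# Two-sided p-value: share of shuffles at least as extreme as observed.
p_value <- mean(abs(null) >= abs(obs))
p_value
```

Because the labels are shuffled rather than the scores resampled, the test makes no distributional assumption about betweenness, which suits a heavily skewed measure.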

Results and Conclusion

The highly theoretical nature of our methods and the fact that we cannot, in any reasonable amount of time, perform our analysis on the entire network in question limit this study. However, the strength of our results and the effect size give credence to the hypothesis that Spotify’s algorithm systematically privileges white artists over non-white artists. The fact that this pattern is revealed through a difference in the betweenness of the groups of artists suggests that white artists are considered more genre- and style-agnostic than their non-white peers, and are able to fit into a wider variety of playlists as a result. Further research into these results is crucial, and greater transparency is necessary in order to ensure that the systems that Spotify has put in place are equitable and are not serving to replicate and perpetuate existing systems of inequality in the world of music.

Appendix

Graph Crunching Code (eval=FALSE)


library(tidyverse)
library(igraph)

#creating graph
edgelist <- 
  read.csv('data/edgelist.csv',
           col.names = c('V1', 'V2', 'playlist')) %>%
  filter(V1 != '', V2 != '', playlist != '') #no empty cells

    
g <- graph_from_edgelist(as.matrix(edgelist[,1:2]), directed = FALSE)
E(g)$playlist <- edgelist[,3]


#counting number of playlists each artist appears on

playlist_count_df <- 
  rbind(
    edgelist  %>%
      select(V1, playlist),
    edgelist %>% 
      select(V2, playlist) %>%
      rename(V1 = V2)
  ) %>%
  distinct(.keep_all = TRUE)

counts <- plyr::count(playlist_count_df$V1) #one row per artist: x = name, freq = count
#matching counts to vertices by name instead of relying on sort order
V(g)$degree <- counts$freq[match(names(V(g)), as.character(counts$x))]

#saving big graph
saveRDS(g, 'data/graph.RDS')

########## creating subgraph for analysis #############

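#largest connected component (component ids are not ordered by size,
#so pick the id with the largest csize)
comp <- components(g)
s <- subgraph(g, which(comp$membership == which.max(comp$csize)))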

#filtering to only artists who are on at least 10 playlists,
#then removing isolates
s <- delete_vertices(s, V(s)[V(s)$degree< 10])
s <- delete_vertices(s, degree(s) == 0)

#creating empty csv with col full of artists for coding of race and gender
dem <- data.frame(artist = names(V(s)), white = NA, male = NA)
write.csv(dem, 'data/dems.csv', row.names = FALSE)

#after this was completed, I manually coded for whether artist was
#white and whether artist was male in a new csv called dems_coded.csv, loaded
#below
dem_coded <- read.csv('data/dems_coded.csv') %>%
  drop_na()

s <- subgraph(s, dem_coded$artist)

#validating: every vertex must have a coded row; stop loudly otherwise
idx <- match(names(V(s)), dem_coded$artist)
stopifnot(!anyNA(idx))
V(s)$white <- dem_coded$white[idx]
V(s)$male <- dem_coded$male[idx]

saveRDS(s, 'data/coded_subgraph.RDS')

Plot Code (eval=FALSE)


library(tidyverse)
library(igraph)

s <- readRDS('data/coded_subgraph.RDS')

female_color <- '#785EF0'
male_color <- '#FE6100'
nonwhite_color <- '#DA0000'
white_color <- '#6AA84F'

V(s)$gcolor <- female_color
#color of male vertices
V(s)[V(s)$male == 1]$gcolor <- male_color

#color of non-white vertices
V(s)$rcolor <- nonwhite_color
#color of white vertices
V(s)[V(s)$white == 1]$rcolor <- white_color


set.seed(1)
subgraph <- subgraph(s, sample(V(s), 100))

jpeg('gender_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$gcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'Male and Female Artists', cex.sub = 2.5)
legend('left', legend=c("Male", "Female"), fill = c(male_color, female_color))

dev.off()

rm(subgraph)


set.seed(2)
subgraph <- subgraph(s, sample(V(s), 100))

jpeg('race_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$rcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'White and Non-white Artists', cex.sub = 2.5)
legend('left', legend=c("White", "Non-white"), 
       fill = c(white_color, nonwhite_color))

dev.off()

rm(subgraph)


set.seed(3)
female_vertices <- V(s)[V(s)$male == 0]
subgraph <- subgraph(s, sample(female_vertices, 100))

jpeg('female_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$rcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'White and Non-white Female Artists', cex.sub = 2.5)
legend('left', legend=c("White", "Non-white"), 
       fill = c(white_color, nonwhite_color))

dev.off()

rm(subgraph)


set.seed(4)
male_vertices <- V(s)[V(s)$male == 1]

subgraph <- subgraph(s, sample(male_vertices, 100))

jpeg('male_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$rcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'White and Non-white Male Artists', cex.sub = 2.5)
legend('left', legend=c("White", "Non-white"), 
       fill = c(white_color, nonwhite_color))

dev.off()

rm(subgraph)
rm(male_vertices)


white_vertices <- V(s)[V(s)$white == 1]

set.seed(3)
subgraph <- subgraph(s, sample(white_vertices, 100))

jpeg('white_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$gcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'Male and Female White Artists', cex.sub = 2.5)
legend('left', legend=c("Male", "Female"), fill = c(male_color, female_color))

dev.off()

rm(subgraph)
rm(white_vertices)


nonwhite_vertices <- V(s)[V(s)$white == 0]

set.seed(3)
subgraph <- subgraph(s, sample(nonwhite_vertices, 100))

jpeg('nonwhite_subgraph.jpg', width = 800, height = 800)

plot(subgraph, 
     vertex.size = 5,
     vertex.label = NA,
     vertex.color = V(subgraph)$gcolor,
     edge.color = alpha('black', 0.1),
     edge.width = .01,
     arrow.mode = '-',
     layout = layout_with_kk(subgraph))
title(sub = 'Male and Female Non-White Artists', cex.sub = 2.5)
legend('left', legend=c("Male", "Female"), fill = c(male_color, female_color))

dev.off()

rm(subgraph)
rm(nonwhite_vertices)

Class Peer Reviews

Reviewer 1
1. State the authors’ objectives and the general questions that the authors are considering. Do the results and figures support the conclusions made in the report?
Anonymous Crane, the author of this report, was investigating bias within Spotify’s recommendation system. They did this by looking at various playlists made by Spotify and creating a network of these artists. Some artists were cut if the playlist they came from didn’t connect enough to other playlists in Spotify’s account. Looking at these networks, they looked at whether or not the artist was white and whether or not they were male. Through this, they can see how likely certain demographics are to be recommended for other playlists and for other parts of Spotify, as playlists are not the only element of recommendation. The results and figures do support the conclusion they made in the report, showing that there is a strong bias toward white artists across the board. White artists were more likely to be closely related to other playlists, and therefore would have a stronger likelihood of being recommended by Spotify. The visualizations they showed, including many networks, histograms, and general data tables, all support the info they are providing and analyzing.
2. Discuss the foundations of data visualizations relative to the figures presented in the report.
Anonymous Crane used network data as the main visualization for this report. A network visualization is a plot of nodes and edges of a larger topic of discussion. A node in this context is any representation of an object, individual, idea, etc. It is the thing that lives in a row of a data set. An edge is simply a visual display of connection between nodes. These networks are good for a number of things, but ultimately they are good at visualizing data sets that have connections between their elements. For example, one could use a network visualization to show social networks, with each node being a person and the edge connecting different friends. These plots can be used beyond purely social connection, as they can show connection through city streets, evolution of species, chemical reactions, etc. Anonymous Crane, for this report, used network visualization to show connection between artists on different Spotify-created playlists.
3. State 3 things that are strong about their report and 2 things that can be improved.
I think for the most part the visualizations were pretty appealing to look at. They were clean with distinct colors and labeled / addressed in all settings. Anonymous Crane did a good job describing their data and how they wrangled it. It was clear how they were setting it up and what it was for. They also do a good job of demonstrating the validity of the findings. The histograms, tables, and hypothesis testing do a good job of cementing the findings of their data. One thing that could have been changed, I think, is the qualifier for marking a band white vs. non-white or male vs. non-male. Currently their system was that if any band had one member that was not white or male, it would be classified as the other. I don’t think this does the best job of establishing the band’s identity, as the band could have 4 white male members and one non-male member and be labeled as a female band, which seems untrue. This area of classification is rather rocky, so their simple qualification makes sense for this project, but it still could have been adapted to make more sense. I also wish that they had provided some information on how Spotify recommends artists in cases other than playlists. I think it is a good example to analyze, but I think the report could have used more real-life application or information to show how the analysis they provided applies to the real world and not just their analysis.
