Exploring Spotify API in R
Statistical hypothesis testing on audio features
One important aspect of a data scientist's repertoire is domain knowledge. And if you plan to work in the world of advertising, you have to know how to work with various web services and their APIs. In this post, I will use the spotifyr package to pull track audio features and other information from Spotify's Web API in bulk. Spotify is a great source of data because it offers distinctive indices for quantifying music.
Spotify exposes several audio features for each track. I will use two of them, danceability and valence, for the most popular music group of all time: The Beatles. Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). I want to test the null hypothesis that the correlation between the two variables is 0 and provide a 90% confidence interval.
library(spotifyr)
# spotify developer credentials (use your own; never publish real secrets)
Sys.setenv(SPOTIFY_CLIENT_ID = "your_client_id")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "your_client_secret")
access_token <- get_spotify_access_token()
# pull audio features on the beatles
beatles <- get_artist_audio_features('the beatles')
# keep the two variables of interest, with clean column names
data <- beatles[, c("danceability", "valence")]
Preliminary check of the test assumptions: is the covariation linear? Yes, from the plot below, the relationship looks roughly linear. The p-value for the pair is 1.912e-06, which is less than the significance level alpha = 0.05. We can conclude that danceability and valence are significantly correlated, with a correlation coefficient of 0.3912615.
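The correlation test itself, together with the 90% confidence interval mentioned earlier, can be sketched with base R's cor.test(). The helper name correlation_test and the toy data frame are assumptions made for illustration; the toy values are invented so the block runs without Spotify credentials, and they are not Spotify data.

```r
# Hypothetical helper: Pearson correlation test with a 90% confidence interval.
# `tracks` is assumed to be a data frame with numeric columns
# `danceability` and `valence`, like the one built from spotifyr above.
correlation_test <- function(tracks, conf_level = 0.90) {
  cor.test(tracks$danceability, tracks$valence,
           method = "pearson", conf.level = conf_level)
}

# Made-up values for illustration only; these are NOT Spotify data.
toy <- data.frame(
  danceability = c(0.35, 0.42, 0.50, 0.47, 0.61, 0.58, 0.66, 0.72),
  valence      = c(0.20, 0.35, 0.30, 0.55, 0.60, 0.45, 0.80, 0.90)
)

result <- correlation_test(toy)
result$estimate   # sample correlation r
result$p.value    # p-value for H0: true correlation is 0
result$conf.int   # 90% confidence interval for the correlation
```

With the real data, the same call would be correlation_test(beatles), assuming the beatles data frame has been pulled successfully.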
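# simple linear regression of danceability on valence
fit <- lm(danceability ~ valence, data = beatles)
# small margins so the plot fits on compact graphics devices
par(mar=c(1,1,1,1))
# put valence on the x-axis so the scatter plot matches the fitted line
plot(danceability ~ valence, data = beatles, main = "danceability vs valence")
abline(fit)
summary(fit)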
##
## Call:
## lm(formula = danceability ~ valence, data = beatles)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.36885 -0.09990  0.00224  0.10957  0.30770
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.35917    0.03436  10.453  < 2e-16 ***
## valence      0.27467    0.05519   4.976 1.91e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1449 on 137 degrees of freedom
## Multiple R-squared: 0.1531, Adjusted R-squared: 0.1469
## F-statistic: 24.76 on 1 and 137 DF, p-value: 1.912e-06
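As a rough hand check, the regression output can be tied back to the correlation coefficient and to a 90% confidence interval. This is a sketch under two assumptions: the sample size is n = 139 (inferred from the 137 residual degrees of freedom), and the interval uses the Fisher z-transformation, which is the approach cor.test() takes for Pearson correlations. All figures are rounded.

```latex
% Correlation from R-squared (the slope is positive, so r > 0)
r = \sqrt{R^2} = \sqrt{0.1531} \approx 0.3913

% t statistic for H_0: \rho = 0, matching the slope's t value of 4.976
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{0.3913\,\sqrt{137}}{\sqrt{1-0.1531}} \approx 4.98

% 90% confidence interval via the Fisher z-transformation
z = \operatorname{artanh}(r) \approx 0.4133, \qquad
\mathrm{SE}_z = \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{136}} \approx 0.0857

z \pm 1.645\,\mathrm{SE}_z \approx (0.2722,\ 0.5543)
\quad\Longrightarrow\quad
\rho \in \bigl(\tanh 0.2722,\ \tanh 0.5543\bigr) \approx (0.27,\ 0.50)
```

Under these assumptions the 90% interval runs from about 0.27 to 0.50 and does not include 0.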