How does Spotify know what I like?

Dondré O. Jordan
Jun 26, 2020

“Music has a profound ability to alter the way we feel, boost energy levels, calm us down, and unite us with others. We’ve all experienced a time when a song brought us to tears, unified us with a group during a concert, or pushed us to run that extra block.”

If you typed “How does Spotify know what I like?” into Google, you would probably get millions of results, all circling the simple answer that I learned at Lambda to call an algorithm, or a calculated recipe. We’ll get to algorithms a bit later. I’m sure many in my cohort can agree that they have worn out their favorite playlists once or twice during their studies as data scientists. Some use music as motivation in their morning ritual of getting up at 8 a.m., and some play it in the background while they study to offset the stress of learning a new trade. For me, I like to think of it as a baby monitor. You know, the little square box that you carry all around the house, and every time you listen to it there are little giggles of joy, screams of fear or hunger, or just a calming silence.

Given custom-curated playlists like Discover Weekly and Made for You, and Spotify’s position as the leader in music streaming, with over 200 million active users and over 50 million songs, I often ask myself, “How does Spotify measure and predict my palate for music?” Now, I’m no expert, yet, but let’s take a look at some of my findings.

Thanks to Kaggle and Spotipy, a lightweight Python library for the Spotify Web API, I was able to gather a large random dataset of roughly 10,000 songs per genre across 26 of the main genres. After cleaning, the dataset held 232,752 songs and, more importantly, gave me measurable values for Spotify’s audio attributes. Each song (row) includes the artist name, track name, and track ID, plus audio features ranging from speechiness to instrumentalness (a prediction of whether a track contains no vocals), danceability, energy, and valence (the musical positiveness, or mood, of a song). This was especially exciting for me as a musician: my favorite song could also be translated into numbers with real significance.

Now, if you are following along in the notebook and want to see whether your favorite song made the list, you will find a basic snippet that searches the dataset by song name.

Fig 1. Song Name Search Feature Code
Fig 2. Song (row) Features of Colt Ford’s Cover of “Shape of You”
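The search code in Figure 1 survives only as a screenshot, so here is a rough stand-in: a minimal sketch in plain Python, assuming the dataset has `artist_name` and `track_name` columns (as the Kaggle CSV appears to). The three sample rows contain only names from this article, no feature values, and `search_by_song_name` is a hypothetical helper, not the notebook’s actual function.

```python
import csv
import io

# Placeholder CSV text; in the notebook you would open the real dataset file instead.
# Column names are assumed to match the Kaggle dataset (artist_name, track_name).
SAMPLE_CSV = """artist_name,track_name
Ed Sheeran,Shape of You
Colt Ford,Shape of You
Nirvana,Smells Like Teen Spirit
"""


def search_by_song_name(rows, query):
    """Return every row whose track_name contains `query`, ignoring case."""
    needle = query.strip().lower()
    return [row for row in rows if needle in row["track_name"].lower()]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for match in search_by_song_name(rows, "shape of you"):
    print(match["artist_name"], "-", match["track_name"])
# → Ed Sheeran - Shape of You
# → Colt Ford - Shape of You
```

With pandas, a comparable filter would be `df[df["track_name"].str.contains(query, case=False, regex=False, na=False)]`; `regex=False` keeps characters like parentheses in song titles from being read as a pattern.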

“The feeling is not music but the experience it reminds us of.” -Sharone Houri, a Health Blogger and Existential Psychologist

Spotify’s Most Played of All-Time playlist (updated weekly) features “Shape of You”, which suggests it is one of the best-known songs in the world. Let’s use it to get some insight into how to read the data. Figure 3 (below) shows the audio features of the chart-topping pop song “Shape of You” by Ed Sheeran. Based on the data, you can expect almost four minutes of a positive, mildly energizing song with a good sound balance. As a musician, I am keen to note that the key of C# minor is usually said to carry “a passionate expression of sorrow and deep grief”, and with a tempo about as fast as someone walking through the park, I can already get a good sense of the music. Add more features and a couple of thumbs-ups from the Spotify app on my phone, and you have the foundation of a recipe for your next curated playlist.

Fig 3. Ed Sheeran’s “Shape of You” Audio Features
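Reading Figure 3 mostly means translating raw numbers into musical terms like “almost four minutes” or “C# minor”. The sketch below assumes the Spotify Web API conventions: `key` as a pitch-class integer (0 = C, 1 = C#, and so on, with -1 meaning no key was detected), `mode` as 1 for major and 0 for minor, `tempo` in beats per minute, and `duration_ms` in milliseconds. The Kaggle CSV may store key and mode as text instead, so treat the input format and the helper name `describe_track` as assumptions.

```python
# Pitch-class names indexed by Spotify's integer `key` value.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]


def describe_track(duration_ms, key, mode, tempo):
    """Turn raw audio-feature numbers into a short human-readable summary."""
    minutes, seconds = divmod(round(duration_ms / 1000), 60)
    if key == -1:
        key_name = "unknown key"  # -1 means no key was detected
    else:
        key_name = f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"
    return f"{minutes}:{seconds:02d}, {key_name}, {round(tempo)} BPM"


# Illustrative input only, not the actual values shown in Figure 3.
print(describe_track(duration_ms=234_000, key=1, mode=0, tempo=96.0))
# → 3:54, C# minor, 96 BPM
```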

If you take all of those features and map them against one another to see what relationships there might be, it would look something like this:

Fig 4. Spotify Data Set Correlation Matrix
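Each cell of a correlation matrix like Figure 4 holds the correlation between two features. The article does not say which measure was used; Pearson correlation is the pandas default, so this minimal sketch computes it in plain Python for every pair of columns. The three-song feature values are made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (raises if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every (feature, feature) pair to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy values for three songs, invented for illustration only.
features = {
    "energy": [0.2, 0.5, 0.9],
    "loudness": [-15.0, -8.0, -3.0],
    "acousticness": [0.9, 0.4, 0.1],
}
matrix = correlation_matrix(features)
for a in features:
    print(a.ljust(13), " ".join(f"{matrix[(a, b)]:+.2f}" for b in features))
```

Values near +1 mean two features rise together, values near -1 mean one falls as the other rises, and the diagonal is always 1. In pandas, `df.corr(numeric_only=True)` (recent versions) produces the same kind of table, which a heatmap then colors.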

Remember that the algorithm weighs these relationships against your likes, dislikes, and skips, among many other factors. When it builds you that Discover Weekly playlist to get you through build week, the computer might tell itself that you are unlikely to listen to songs that land where the black cells of the matrix meet. For example, if Figure 4 described the songs playing on your phone based on your preferences, the next song might be more likely to score low on both “Acousticness” and “Loudness”, or on both “Loudness” and “Energy”. Of course, we all know you will probably skip that song too. Try not to forget: the more feedback you give, thumbs-downs included, the better tailored your songs will become.
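Spotify’s actual model is not public. Purely as a toy illustration of how likes and thumbs-downs could steer the next pick, this sketch builds a hypothetical “taste vector” by averaging the features of liked songs and subtracting the average of disliked ones, then ranks candidates by cosine similarity. This is a simple content-based approach swapped in for illustration; the function names, the three chosen features, and the weighting are all assumptions.

```python
import math


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(liked, disliked, candidates):
    """Rank candidate songs by similarity to a hypothetical taste vector.

    Taste = mean of liked songs minus mean of disliked songs (illustrative weighting).
    """
    taste = mean_vector(liked)
    if disliked:
        taste = [t - d for t, d in zip(taste, mean_vector(disliked))]
    return sorted(candidates, key=lambda name: cosine(taste, candidates[name]), reverse=True)


# Feature order: (energy, danceability, acousticness); toy values only.
liked = [[0.9, 0.8, 0.1], [0.8, 0.7, 0.2]]
disliked = [[0.2, 0.3, 0.9]]
candidates = {"upbeat track": [0.9, 0.8, 0.1], "mellow track": [0.1, 0.2, 0.9]}
print(rank_candidates(liked, disliked, candidates))
# → ['upbeat track', 'mellow track']
```

Subtracting the disliked average pushes the taste vector away from what you skipped, which is one simple way the thumbs-down signal could sharpen future picks.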

Fig 5. Histograms

To dig deeper into my exploration, I created two different histograms. The plot on the left shows the distribution of songs by energy level. The plot on the right, a two-dimensional histogram (or heatmap), shows the number of songs at each combination of valence and danceability.
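Both plots in Figure 5 come down to counting songs in bins. Spotify reports energy, valence, and danceability on a 0.0 to 1.0 scale, so this minimal sketch assumes inputs in that range and uses equal-width bins; the helper names and bin count are illustrative choices, not the notebook’s code.

```python
def histogram(values, bins=10):
    """Count values in [0, 1] into `bins` equal-width buckets."""
    counts = [0] * bins
    for v in values:
        # min() keeps v == 1.0 in the last bucket instead of overflowing
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


def histogram_2d(xs, ys, bins=10):
    """Count (x, y) pairs in [0, 1] x [0, 1] into a bins-by-bins grid (rows = y)."""
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        row = min(int(y * bins), bins - 1)
        col = min(int(x * bins), bins - 1)
        grid[row][col] += 1
    return grid


energy = [0.05, 0.15, 0.95, 1.0]  # toy values, not from the dataset
print(histogram(energy))
# → [1, 1, 0, 0, 0, 0, 0, 0, 0, 2]
```

In a notebook, matplotlib’s `plt.hist(values, bins=10)` and `plt.hist2d(x, y, bins=10)` do this binning and the drawing in one step; the grid returned by `histogram_2d` is what a heatmap colors cell by cell.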

I went a step further and drew three random samples of different sizes (N = 1,000, 5,000, and 100,000) to look for trends in features across the dataset (Figure 6). Each time the code ran, I got a whole new mix of songs drawn from pop, movie, metal, soul, bluegrass, and more. To my surprise, loudness trended positively with how hard a song is to dance to.
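Because the samples were random, every run produced a different mix of songs. A minimal sketch of that idea, assuming one feature column sits in a plain list, draws a sample of each size without replacement and compares a summary statistic across them; passing a fixed seed makes a run repeatable. The population below is synthetic uniform noise standing in for a feature column, so the numbers it prints say nothing about real songs.

```python
import random
import statistics


def sample_means(population, sizes, seed=None):
    """Draw one sample per size, without replacement, and return each sample's mean."""
    rng = random.Random(seed)
    return {n: statistics.fmean(rng.sample(population, n)) for n in sizes}


# Synthetic stand-in for one feature column (not real song data), sized like the
# cleaned dataset so that a 100,000-row sample is possible.
data_rng = random.Random(0)
population = [data_rng.random() for _ in range(232_752)]

for n, mean in sample_means(population, [1_000, 5_000, 100_000], seed=42).items():
    print(f"N={n:>7,}: mean = {mean:.3f}")
```

Larger samples should give estimates that vary less from run to run. With pandas, `df.sample(n=1_000, random_state=42)` draws a comparable random sample of whole rows.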

Fig 6

What is important?

Have you ever gotten goosebumps right before the intense, climactic drop of a song, only to hurry and grab your phone so you didn’t ruin those awesome tunes? I have, and if you have ever been to a concert, you will know what cognitive psychologist Steven Pinker means when he calls music “auditory cheesecake”. Pinker describes music as a “recreational cocktail absorbed through the ears triggering multiple pleasure circuits simultaneously”, and I completely agree. Music is mildly addictive, like actual cheesecake. It is this exact “auditory cheesecake” that the algorithm seeks.

“Use the properties of music to enhance your life.”

Although I may not fully comprehend the inner workings of machine learning or neural networks, what I can conclude is that the system boils down to two concepts: exploitation and exploration. Simply put, Spotify uses every action you take to exploit what it already knows about you, and it explores undiscovered playlists and artists similar to the ones you enjoy. So take heed the next time you press like on a track.
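One textbook way to balance exploitation and exploration is an epsilon-greedy rule: most of the time, pick the best-rated item you already know; with a small probability epsilon, try something new. Spotify’s real system is not public and is almost certainly far more elaborate, so treat this as a generic illustration; the function name, artist names, and scores are hypothetical.

```python
import random


def choose_next(known_scores, unexplored, epsilon=0.1, rng=None):
    """Epsilon-greedy pick between the best-known item and an untried one.

    known_scores: dict mapping items the listener has rated to a score.
    unexplored:   list of items the listener has not heard yet.
    """
    rng = rng or random.Random()
    # Explore when nothing is known yet, or with probability epsilon.
    explore = bool(unexplored) and (not known_scores or rng.random() < epsilon)
    if explore:
        return rng.choice(unexplored)
    # Exploit: the highest-scoring known item (ValueError if both inputs are empty).
    return max(known_scores, key=known_scores.get)


scores = {"Artist A": 0.9, "Artist B": 0.4}  # hypothetical like-rates
untried = ["Artist C", "Artist D"]
rng = random.Random(1)
picks = [choose_next(scores, untried, epsilon=0.2, rng=rng) for _ in range(10)]
print(picks)  # on average, about 1 in 5 picks is an untried artist
```

Raising epsilon makes recommendations more adventurous; lowering it keeps them closer to what you already like.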

Follow along in the notebook, try the song search feature, and type in a classic like Nirvana’s “Smells Like Teen Spirit”.


Dondré O. Jordan

An aspiring full-stack data scientist and proud graduate of Lambda.