Developing a music recommendation system from scratch

by Silas Stulz


The new semester has already started, and so have my new courses. I'm especially excited because this semester I get to develop my own project from start to finish.

As you may know, I started my journey in machine learning a while ago, so this is the perfect opportunity to gain some real hands-on experience in the field. I chose to develop my own music recommendation system from scratch.

In this blog post, I will talk a little about the idea, what I have done so far, and my goals for the project.

In the upcoming weeks and months, I will post regularly about the topics I'm working on and will also publish some tutorials on how to do things.

tl;dr: If you just want to check out the repo, it's open source on GitHub (work in progress).

The idea

The main principle is pretty simple. I want to develop a machine learning model that can classify whether I like or dislike a song based on my previous preferences. In addition, I came up with a few other things the model could be used for:

  • Post-processing playlists, which means deleting songs that I would definitely dislike.
  • Creating recommended playlists, similar to "Spotify Discover".
  • Exploring new genres and countries, which means searching for music similar to what you generally like, but, for example, from another geographic region.

Now, how does that work? I use an approach called content-based filtering. Basically, I break songs down into their basic components and features (tempo, danceability, acousticness) and let my machine learning model decide whether a song fits in with my previously liked songs.
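
To make the content-based idea more concrete, here is a minimal sketch. It is not the model from the repo: instead of a trained classifier, it simply ranks candidate songs by cosine similarity to the average feature vector of the liked songs, which is one simple way to approach the "explore" use case. The track names, feature values, and the tempo scaling constant are made up for illustration.

```python
import math

# Hypothetical songs described by three audio features:
# (tempo in BPM, danceability 0..1, acousticness 0..1)
liked = {
    "liked_a": (120.0, 0.80, 0.10),
    "liked_b": (126.0, 0.75, 0.05),
}
candidates = {
    "candidate_dance": (124.0, 0.78, 0.08),
    "candidate_ballad": (70.0, 0.30, 0.90),
}

TEMPO_SCALE = 200.0  # assumption: rough divisor so tempo lands near the 0..1 range


def to_vector(features):
    """Scale tempo so that it does not dominate the 0..1 features."""
    tempo, danceability, acousticness = features
    return (tempo / TEMPO_SCALE, danceability, acousticness)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def taste_profile(songs):
    """Average (scaled) feature vector of the liked songs."""
    vectors = [to_vector(f) for f in songs.values()]
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def rank(candidates, profile):
    """Return candidate names sorted from most to least similar to the profile."""
    scores = {name: cosine(to_vector(f), profile) for name, f in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


profile = taste_profile(liked)
print(rank(candidates, profile))  # ['candidate_dance', 'candidate_ballad']
```

A fixed divisor for tempo is the crudest possible scaling; with real data, standardizing every feature (for example with scikit-learn's `StandardScaler`) is the more robust choice.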

What I've done so far

Created a dataset. I created a dataset that currently contains about 500 songs, classified into two classes ("liked" and "disliked"). I did this with the help of my classifying application; you can read more about it here:

Wrote scripts to get data from the Spotify API. Luckily, Spotify's API provides all the parameters we need for each song (tempo, danceability, etc.). I wrote several scripts to fetch this data and integrate it into my existing dataset.
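
A fetch-and-merge script could look roughly like the sketch below. The URL and response shape follow the Spotify Web API's audio-features endpoint as I understand it (a comma-separated `ids` parameter with at most 100 IDs, and a JSON object holding an `audio_features` list that contains `null` for unknown IDs). The token handling, the chosen feature subset, and the helper names `fetch_audio_features` and `merge_features` are assumptions, not the scripts from the repo. Spotify has also restricted access to this endpoint for newer apps, so check the current documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features"
FEATURES = ("tempo", "danceability", "acousticness", "energy", "valence")  # assumed subset


def fetch_audio_features(track_ids, token):
    """Request audio features for up to 100 track IDs (needs a valid OAuth access token)."""
    query = urllib.parse.urlencode({"ids": ",".join(track_ids)}, safe=",")
    request = urllib.request.Request(
        f"{AUDIO_FEATURES_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Unknown IDs come back as null entries, so drop them.
    return [item for item in payload["audio_features"] if item]


def merge_features(labeled_tracks, feature_items):
    """Join (track_id, label) pairs with the fetched features by track ID."""
    by_id = {item["id"]: item for item in feature_items}
    rows = []
    for track_id, label in labeled_tracks:
        item = by_id.get(track_id)
        if item is None:
            continue  # skip tracks the API returned no features for
        row = {"id": track_id, "label": label}
        row.update({name: item[name] for name in FEATURES})
        rows.append(row)
    return rows


# Offline example with a hand-written, API-style payload (values are made up):
labels = [("track1", "liked"), ("track2", "disliked"), ("track3", "liked")]
fake_items = [
    {"id": "track1", "tempo": 120.0, "danceability": 0.8, "acousticness": 0.1, "energy": 0.7, "valence": 0.6},
    {"id": "track2", "tempo": 80.0, "danceability": 0.3, "acousticness": 0.9, "energy": 0.2, "valence": 0.3},
]
rows = merge_features(labels, fake_items)
print(len(rows))  # 2: track3 has no features and is skipped
```

Keeping the network call and the merge in separate functions makes the merge logic testable without a token.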

Cleaned the dataset. First, I selected the relevant features. After that, I cleaned and prepared the data, which means shuffling it and splitting it into a training set and a test set. Additionally, I had to undersample the dataset to balance out the two classes.
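
The balancing and splitting steps can be sketched in plain Python. This minimal version assumes each song is a dict with a `label` key; the function names, the 80/20 split, and the fixed seed are illustrative choices, not necessarily what the project uses.

```python
import random
from collections import defaultdict


def undersample(rows, label_key="label", seed=42):
    """Randomly drop rows from larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))  # sample without replacement
    rng.shuffle(balanced)
    return balanced


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split off test_fraction of them as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up example: 300 liked vs. 200 disliked songs
rows = [{"label": "liked", "i": i} for i in range(300)] + [
    {"label": "disliked", "i": i} for i in range(200)
]
balanced = undersample(rows)
train, test = split_train_test(balanced)
print(len(balanced), len(train), len(test))  # 400 320 80
```

In practice, scikit-learn's `train_test_split` (with `stratify=y` to keep class ratios) and imbalanced-learn's `RandomUnderSampler` do the same jobs.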

Played around with different algorithms. There are a lot of algorithms to choose from (e.g., RandomForest, SVC, SGD), and each of them comes with hyperparameters that can be tuned.
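
With scikit-learn, comparing those three algorithms and tuning one of them might look like the sketch below. It uses a synthetic dataset from `make_classification` as a stand-in for the real song features, and the parameter grid is illustrative, not the configuration from the repo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset: 400 songs, 8 features, 2 classes.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# SVC and SGD are sensitive to feature scales, so they get a StandardScaler in front.
models = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "SGD": make_pipeline(StandardScaler(), SGDClassifier(random_state=0)),
}

# Mean 5-fold cross-validated accuracy for each model with default settings.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# Hyperparameter tuning for one candidate via exhaustive grid search.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Cross-validation gives a fairer comparison than a single train/test split on a dataset of only a few hundred songs.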

Analyzed and compared the results. Finally, I evaluated the results by calculating different metrics such as accuracy, precision, and recall. A confusion matrix, which shows which songs were classified correctly and which were not, was also very helpful.
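
These metrics all come from the four cells of the confusion matrix. The sketch below computes them by hand for a made-up set of predictions, treating "liked" as the positive class; accuracy is (TP + TN) / total, precision is TP / (TP + FP), and recall is TP / (TP + FN).

```python
def confusion_counts(y_true, y_pred, positive="liked"):
    """Count true positives, false positives, false negatives, and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive="liked"):
    """Accuracy, precision, and recall derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels and predictions for eight songs
y_true = ["liked", "liked", "liked", "disliked", "disliked", "disliked", "liked", "disliked"]
y_pred = ["liked", "disliked", "liked", "disliked", "liked", "liked", "liked", "disliked"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("[[TP, FN], [FP, TN]] =", [[tp, fn], [fp, tn]])  # [[3, 1], [2, 2]]
print(metrics(y_true, y_pred))  # {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

For a playlist cleaner, precision on "disliked" matters most (you don't want to delete songs you actually like), which is exactly the kind of detail accuracy alone hides. scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `precision_score`, `recall_score`, and `confusion_matrix` for the same computations.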

What's coming next

There is a lot left to do. One of the main challenges right now is building a searchable database of the 30 million songs available on Spotify. This would be needed to suggest new titles in general, but especially to discover new genres.
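
One possible shape for such a database is a single SQLite table of track features with an index, so that candidates can be filtered cheaply before the model scores them. This is a hypothetical schema, not something from the repo: the table name, the columns (including a `country` column for the "other regions" idea), and the sample rows are made up, and 30 million rows would need bulk loading and probably more indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    """CREATE TABLE tracks (
        id TEXT PRIMARY KEY,
        genre TEXT,
        country TEXT,
        tempo REAL,
        danceability REAL,
        acousticness REAL
    )"""
)
conn.execute("CREATE INDEX idx_tracks_country_genre ON tracks (country, genre)")

sample_rows = [
    ("t1", "indie", "SE", 118.0, 0.72, 0.20),
    ("t2", "indie", "DE", 121.0, 0.70, 0.15),
    ("t3", "folk", "DE", 90.0, 0.40, 0.85),
]
conn.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?, ?, ?)", sample_rows)


def find_candidates(conn, country, min_danceability, tempo_range):
    """Return IDs of tracks from one country that fall inside a feature window."""
    low, high = tempo_range
    cursor = conn.execute(
        """SELECT id FROM tracks
           WHERE country = ? AND danceability >= ? AND tempo BETWEEN ? AND ?
           ORDER BY id""",
        (country, min_danceability, low, high),
    )
    return [row[0] for row in cursor]


print(find_candidates(conn, "DE", 0.6, (110, 130)))  # ['t2']
```

The filtered candidates would then go through the trained classifier, so the database only has to narrow the search, not make the final decision.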

Of course, I am also looking for ways to improve my current model. Currently, I'm exploring the following options:

  • Adding more data, for example by switching from undersampling to oversampling or by gathering more songs.
  • Feature engineering: creating additional features or getting better at feature selection.
  • Trying different algorithms, for example a deep neural network with TensorFlow, to see if I can get better results.
  • Tuning hyperparameters.
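
Switching from undersampling to oversampling, as in the first bullet, can be as simple as duplicating randomly chosen rows of the smaller class until the classes are even. Here is a minimal sketch in plain Python, assuming rows are dicts with a `label` key; libraries such as imbalanced-learn offer `RandomOverSampler` and SMOTE for the same purpose.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, label_key="label", seed=42):
    """Duplicate random rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    largest = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # rng.choices samples with replacement, so rows can be repeated
        balanced.extend(rng.choices(group, k=largest - len(group)))
    rng.shuffle(balanced)
    return balanced


# Made-up example: 300 liked vs. 200 disliked songs
rows = [{"label": "liked", "i": i} for i in range(300)] + [
    {"label": "disliked", "i": i} for i in range(200)
]
counts = Counter(row["label"] for row in random_oversample(rows))
print(counts["liked"], counts["disliked"])  # 300 300
```

Oversampling should be applied only to the training split after splitting; otherwise duplicated songs can end up in both the training and the test set and inflate the scores.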


In this post, I wanted to tell you a little about the project I'm working on. The next posts will go into more detail on how to implement the different steps to get there.
