Music Auto Tracker pt 2: Making the Ground Truth

19 October 2019

In the last post, we turned all the .mp3 files from our dataset into spectrograms. However, we'll also need a ground truth for our images so that the neural network can learn which images correspond to which chords. Luckily, our original dataset includes a .jams file for each .mp3 file, and we can use those pretty easily to make the ground truths. I toyed around with a few different ways of labeling the data. The first and easiest is to sort each image into folders by chord, so that all the A chords are in one folder, all the G chords are in another, and so on. However, there may be a chord switch in the middle of a spectrogram, so we may want to apply two different labels to the same spectrogram. I ended up using a CSV file instead, where the first column is the file name and the remaining columns are the chords.

Much of the actual code is just fiddling around with data: first sort out the file name, then pull the chord data from the .jams file into an array, then match the times from the chord data against the chords being played. The CSV looks something like this:
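As a rough sketch of that matching step, the logic below pairs each spectrogram's time window with every annotated chord interval that overlaps it, then writes one CSV row per image. The `chord_intervals` values are made up for illustration; in practice they would be read out of the .jams file's chord annotation.

```python
import csv

# Hypothetical chord intervals as (start, end, chord) tuples.
# In the real pipeline these come from the .jams file; the
# values here are invented purely for illustration.
chord_intervals = [
    (0.0, 2.5, "G:maj"),
    (2.5, 5.0, "C:maj"),
    (5.0, 7.5, "D:maj"),
]

def chords_in_window(intervals, start, end):
    """Return every chord whose annotated interval overlaps [start, end)."""
    return [chord for (s, e, chord) in intervals if s < end and e > start]

def write_labels(rows, path):
    """Write one row per spectrogram: file name, then its chord label(s)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for filename, (start, end) in rows:
            writer.writerow([filename] + chords_in_window(chord_intervals, start, end))

# A spectrogram that spans a chord change picks up two labels.
print(chords_in_window(chord_intervals, 2.0, 3.0))  # ['G:maj', 'C:maj']
```

The overlap test (`s < end and e > start`) is what lets one image carry two labels when a chord switch falls inside its window, which is exactly why a CSV beats the one-folder-per-chord layout.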

You can see one possible issue here, which is that the chord annotations may not match what was actually being played. For example, one annotation said a chord was held for 7 seconds; but if you look at the spectrogram, the player may not actually be sounding the chord that whole time, because holding a chord for 7 seconds is hard. This shouldn't be too much of an issue since I screened for the onsets of chords, but it may become relevant later in the project.

This also suggests another way to screen the spectrograms: rather than using the change in sound energy to detect note onsets, we can just use the .jams files to tell us where the notes are being played. We may revisit this if the neural network isn't working as well as we hope.
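That alternative screening would be simple: every annotated interval's start time is an onset, so we can center a spectrogram slice on each one instead of running energy-based detection. A minimal sketch, again with invented interval values standing in for the real .jams data and an assumed one-second window width:

```python
# Hypothetical chord intervals as (start, end, chord) tuples,
# standing in for the real .jams chord annotation.
chord_intervals = [
    (0.0, 2.5, "G:maj"),
    (2.5, 5.0, "C:maj"),
    (5.0, 7.5, "D:maj"),
]

def annotation_onsets(intervals):
    """Take onsets straight from the annotation: each interval's start time."""
    return [start for (start, _, _) in intervals]

def window_around(onset, width=1.0):
    """Center a spectrogram slice of `width` seconds on an onset,
    clamped so it never starts before time zero."""
    half = width / 2.0
    return (max(0.0, onset - half), onset + half)

print([window_around(t) for t in annotation_onsets(chord_intervals)])
# [(0.0, 0.5), (2.0, 3.0), (4.5, 5.5)]
```

The upside over energy-based detection is that every slice is guaranteed to line up with an annotated chord; the downside is that it trusts the annotation timing, which, as noted above, doesn't always match the audio.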

Jupyter Notebook Download