Fetching the music data

We will use the GTZAN dataset, which is frequently used to benchmark music genre classification tasks. It is organized into 10 distinct genres, of which we will use only six for the sake of simplicity: classical, jazz, country, pop, rock, and metal. The dataset contains the first 30 seconds of 100 songs per genre. We can download the dataset at http://opihi.cs.uvic.ca/sound/genres.tar.gz. The tracks are recorded at 22,050 Hz (22,050 readings per second) mono in the WAV format.

Converting into a wave format

Sure enough, if we would want to test our classifier later on our private MP3 collection, we would not be able to extract much meaning. This is because MP3 is a lossy music compression format that cuts out parts that the human ear cannot perceive. This is nice for storing because with MP3, you can fit ten times as many songs on your device. For our endeavor, however, it is not so nice. For classification, we will have an easier time with WAV files, so we will have to convert our MP3 files in case we would want to use them with our classifier.

Tip

In case you don't have a conversion tool nearby, you might want to check out sox: http://sox.sourceforge.net. It claims to be the Swiss Army Knife of sound processing, and we agree with this bold claim.

One advantage of having all our music files in the WAV format is that it is directly readable by the SciPy toolkit:

>>> sample_rate, X = scipy.io.wavfile.read(wave_filename)

Here, X contains the samples and sample_rate is the rate at which they were taken. Let us use this information to peek into some music files to get a first impression of what the data looks like.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.78.137