We discussed earlier how to convert a signal into the frequency domain. Most modern speech recognition systems operate on frequency-domain features. Once a signal has been converted into the frequency domain, you need to turn it into a form that a recognizer can use. Mel Frequency Cepstral Coefficients (MFCC) are a good way to do this. MFCC takes the power spectrum of a signal and then uses a combination of filter banks and a discrete cosine transform to extract features. If you need a quick refresher, you can check out http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs. Make sure that the python_speech_features package is installed before you start. You can find the installation instructions at http://python-speech-features.readthedocs.org/en/latest. Let's take a look at how to extract MFCC features.
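Before using the library, it helps to see what the filter bank and DCT steps actually do. The following is a minimal NumPy sketch for a single windowed frame, not the exact pipeline python_speech_features implements (which also handles pre-emphasis, framing, and liftering); the helper names `hz_to_mel`, `mel_to_hz`, and `mfcc_sketch` are our own:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    # Standard mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_sketch(frame, sampling_freq, n_filters=26, n_ceps=13, n_fft=512):
    """Compute MFCCs for one windowed frame (illustrative sketch)."""
    # Power spectrum of the frame
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft

    # Triangular mel filter bank between 0 Hz and the Nyquist frequency
    mel_points = np.linspace(hz_to_mel(0.0),
                             hz_to_mel(sampling_freq / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sampling_freq).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for j in range(left, center):          # rising edge of the triangle
            fbank[i, j] = (j - left) / (center - left)
        for j in range(center, right):         # falling edge
            fbank[i, j] = (right - j) / (right - center)

    # Log filter bank energies, then a DCT to decorrelate them
    energies = np.log(fbank @ spectrum + np.finfo(float).eps)
    return dct(energies, type=2, norm='ortho')[:n_ceps]
```

The log compresses the dynamic range of the energies, and the DCT decorrelates the overlapping filter outputs, which is why only the first dozen or so coefficients are typically kept.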
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from python_speech_features import mfcc, logfbank
Use the input_freq.wav input file that is already provided to you:

# Read the input sound file
sampling_freq, audio = wavfile.read("input_freq.wav")
# Extract MFCC and filter bank features
mfcc_features = mfcc(audio, sampling_freq)
filterbank_features = logfbank(audio, sampling_freq)
# Print parameters
print('MFCC: Number of windows =', mfcc_features.shape[0])
print('Length of each feature =', mfcc_features.shape[1])
print('Filter bank: Number of windows =', filterbank_features.shape[0])
print('Length of each feature =', filterbank_features.shape[1])
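The number of windows printed above follows from the framing parameters. Assuming the library's documented defaults of a 25 ms window and a 10 ms step, the count can be estimated with a small helper (the function name `num_windows` is our own; treat this as a sketch, since the library may round frame sizes slightly differently):

```python
import math

def num_windows(num_samples, sampling_freq, winlen=0.025, winstep=0.01):
    """Estimate the number of analysis windows for a signal.

    winlen and winstep default to 25 ms and 10 ms, matching the
    documented defaults of python_speech_features.mfcc.
    """
    frame_len = int(round(winlen * sampling_freq))
    frame_step = int(round(winstep * sampling_freq))
    if num_samples <= frame_len:
        return 1
    return 1 + int(math.ceil((num_samples - frame_len) / frame_step))
```

For example, one second of audio at 16 kHz gives a 400-sample window with a 160-sample hop, so roughly a hundred windows.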
# Plot the MFCC features
mfcc_features = mfcc_features.T
plt.matshow(mfcc_features)
plt.title('MFCC')

# Plot the filter bank features
filterbank_features = filterbank_features.T
plt.matshow(filterbank_features)
plt.title('Filter bank')
plt.show()
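In practice, MFCC features are often normalized before being fed to a recognizer; cepstral mean and variance normalization (CMVN) is a common choice because it removes stationary channel effects. A minimal sketch, assuming the `(num_windows, num_coefficients)` layout returned by `mfcc` before transposing (the function name `cmvn` is our own):

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization.

    features: 2-D array of shape (num_windows, num_coefficients).
    Subtracting the per-coefficient mean removes a constant channel
    offset in the cepstral domain; dividing by the standard deviation
    equalizes the dynamic range across coefficients.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)
```

After normalization, each coefficient has approximately zero mean and unit variance across the utterance.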
The full code is in the extract_freq_features.py file. If you run this code, you will get the following figure for MFCC features: