Extracting frequency domain features

We discussed earlier how to convert a signal into the frequency domain. Most modern speech recognition systems use frequency-domain features, but the raw spectrum is not directly usable: it first has to be condensed into a compact feature vector. Mel Frequency Cepstral Coefficients (MFCCs) are a common way to do this. MFCC extraction takes the power spectrum of a signal and then applies a bank of filters followed by a discrete cosine transform to produce the features. If you need a quick refresher, you can check out http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs.

Make sure that the python_speech_features package is installed before you start. You can find the installation instructions at http://python-speech-features.readthedocs.org/en/latest. Let's take a look at how to extract MFCC features.
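To make the pipeline described above concrete, here is a minimal sketch of MFCC extraction for a single frame, using only NumPy. It is not the implementation used by python_speech_features. The helper names (hz_to_mel, mel_to_hz, mel_filterbank, mfcc_single_frame), the Hamming window, the unnormalized DCT-II, and the parameter values (26 filters, 13 coefficients, a 512-point FFT) are all assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (one common formula)."""
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, nfft, sampling_freq):
    """Build triangular filters whose centers are evenly spaced in mel."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sampling_freq / 2.0),
                             num_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sampling_freq).astype(int)
    filters = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(num_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        # Rising edge of the triangle
        for k in range(left, center):
            filters[i, k] = (k - left) / float(center - left)
        # Falling edge of the triangle
        for k in range(center, right):
            filters[i, k] = (right - k) / float(right - center)
    return filters


def mfcc_single_frame(frame, sampling_freq, num_filters=26, num_ceps=13, nfft=512):
    """Return (cepstral coefficients, log filter bank energies) for one frame."""
    # Step 1: power spectrum of the windowed frame
    windowed = frame * np.hamming(len(frame))
    power = (np.abs(np.fft.rfft(windowed, nfft)) ** 2) / nfft

    # Step 2: mel filter bank energies, floored to avoid log(0)
    energies = mel_filterbank(num_filters, nfft, sampling_freq).dot(power)
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))

    # Step 3: unnormalized DCT-II of the log energies; keep the first num_ceps
    n = np.arange(num_filters)
    basis = np.cos(np.pi * np.outer(np.arange(num_ceps), 2 * n + 1)
                   / (2.0 * num_filters))
    return basis.dot(log_energies), log_energies


# Try it on a synthetic 25 ms frame containing a 440 Hz tone
sampling_freq = 16000
t = np.arange(400) / float(sampling_freq)
frame = np.sin(2 * np.pi * 440.0 * t)
ceps, log_energies = mfcc_single_frame(frame, sampling_freq)
print(ceps.shape, log_energies.shape)
```

With these assumed settings, each frame yields 26 log filter bank energies and 13 cepstral coefficients. The library functions used in the recipe repeat this kind of computation for every window of the signal.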

How to do it…

  1. Create a new Python file, and import the following packages:
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from python_speech_features import mfcc, logfbank
  2. Read the input_freq.wav input file that is already provided to you:
    # Read input sound file
    sampling_freq, audio = wavfile.read("input_freq.wav")
  3. Extract the MFCC and filter bank features:
    # Extract MFCC and Filter bank features
    mfcc_features = mfcc(audio, sampling_freq)
    filterbank_features = logfbank(audio, sampling_freq)
  4. Print the parameters to see how many windows were generated:
    # Print parameters
    print('\nMFCC:\nNumber of windows = {}'.format(mfcc_features.shape[0]))
    print('Length of each feature = {}'.format(mfcc_features.shape[1]))
    print('\nFilter bank:\nNumber of windows = {}'.format(filterbank_features.shape[0]))
    print('Length of each feature = {}'.format(filterbank_features.shape[1]))
  5. Let's visualize the MFCC features. We need to transpose the matrix so that time runs along the horizontal axis:
    # Plot the features
    mfcc_features = mfcc_features.T
    plt.matshow(mfcc_features)
    plt.title('MFCC')
  6. Let's visualize the filter bank features. Again, we need to transpose the matrix so that time runs along the horizontal axis:
    filterbank_features = filterbank_features.T
    plt.matshow(filterbank_features)
    plt.title('Filter bank')
    
    plt.show()
  7. The full code is in the extract_freq_features.py file. If you run this code, you will get the following figure for MFCC features:
    [Figure: MFCC features]
  8. The filter bank features will look like the following:
    [Figure: filter bank features]
  9. You will get the following output on your Terminal:
    [Figure: Terminal output]
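To interpret the numbers printed in step 4, it helps to know how a signal is split into windows. The following sketch estimates the window count from the signal length. The estimate_num_windows helper is hypothetical, and the 25 ms window length and 10 ms step are assumed to match the defaults documented for python_speech_features; the library's own rounding may differ slightly at some sampling rates.

```python
import math


def estimate_num_windows(num_samples, sampling_freq, winlen=0.025, winstep=0.01):
    """Estimate how many analysis windows a signal is split into.

    winlen and winstep are in seconds. The default values are assumptions
    based on the library's documented defaults.
    """
    frame_len = int(round(winlen * sampling_freq))
    frame_step = int(round(winstep * sampling_freq))
    if num_samples <= frame_len:
        # A signal shorter than one window still produces a single frame
        return 1
    # One full window, plus one more for every step (or partial step) after it
    return 1 + int(math.ceil((num_samples - frame_len) / float(frame_step)))


# One second of audio sampled at 16 kHz
num_windows = estimate_num_windows(16000, 16000)
print(num_windows)
```

Under the same assumed defaults, the length of each feature should be 13 for the MFCC matrix and 26 for the filter bank matrix, so the two matrices share the same number of windows but differ in the second dimension.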