Building a speech recognizer

We need a database of speech files to build our speech recognizer. We will use the database available at https://code.google.com/archive/p/hmm-speech-recognition/downloads. It contains seven different words, each with 15 associated audio files. This is a small dataset, but it is sufficient to understand how to build a speech recognizer that can tell seven different words apart. We need to build one HMM for each class (word). To identify the word in a new input file, we score the file with every model and pick the label of the model with the highest score. We will use the HMM class that we built in the previous recipe.
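The "one model per word, highest score wins" idea can be sketched without any audio. The following is a minimal illustration only: `GaussianScorer` is a hypothetical toy stand-in for the HMM class from the previous recipe (it fits a single 1-D Gaussian, not an HMM), `classify` is a hypothetical helper name, and the numbers are made up. Only the `get_score` method name mirrors what the recipe calls.

```python
import math


class GaussianScorer(object):
    """Hypothetical stand-in for a per-word model: one 1-D Gaussian."""

    def __init__(self, samples):
        n = float(len(samples))
        self.mean = sum(samples) / n
        variance = sum((s - self.mean) ** 2 for s in samples) / n
        # Guard against a zero variance on degenerate training data
        self.variance = max(variance, 1e-6)

    def get_score(self, samples):
        # Total log-likelihood of the samples under this Gaussian
        return sum(
            -0.5 * (math.log(2 * math.pi * self.variance)
                    + (s - self.mean) ** 2 / self.variance)
            for s in samples
        )


def classify(models, samples):
    """Score the samples with every model; return the best label."""
    best_label, best_score = None, None
    for model, label in models:
        score = model.get_score(samples)
        if best_score is None or score > best_score:
            best_label, best_score = label, score
    return best_label


# One model per class, stored as (model, label) pairs
models = [
    (GaussianScorer([0.9, 1.0, 1.1, 1.2]), 'apple'),
    (GaussianScorer([4.8, 5.0, 5.1, 5.3]), 'kiwi'),
]
predicted = classify(models, [5.0, 4.9, 5.2])
```

In the recipe itself, each model is an HMM trained on MFCC features, but the selection loop has the same shape.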

How to do it…

  1. Create a new Python file, and import the following packages:
    import os
    import argparse 
    
    import numpy as np
    from scipy.io import wavfile 
    from hmmlearn import hmm
    from features import mfcc
  2. Define a function to parse the input arguments in the command line:
    # Function to parse input arguments
    def build_arg_parser():
        parser = argparse.ArgumentParser(description='Trains the HMM classifier')
        parser.add_argument("--input-folder", dest="input_folder", required=True,
                help="Input folder containing the audio files in subfolders")
        return parser
  3. Define the main function, and parse the input arguments:
    if __name__=='__main__':
        args = build_arg_parser().parse_args()
        input_folder = args.input_folder
  4. Initialize the variable that will hold all the HMM models:
        hmm_models = []
  5. Parse the input directory that contains all the database's audio files:
        # Parse the input directory
        for dirname in os.listdir(input_folder):
  6. Build the full path of each subfolder, and skip any entry that is not a directory:
            # Get the path of the subfolder
            subfolder = os.path.join(input_folder, dirname)
    
            if not os.path.isdir(subfolder): 
                continue
  7. The name of the subfolder is the label of this class. Extract it using the following:
            # Extract the label
            label = os.path.basename(subfolder)
  8. Initialize the variables for training:
            # Initialize variables
            X = np.array([])
            y_words = []
  9. Iterate through the list of audio files in each subfolder:
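            # Iterate through the audio files (leaving 1 file for testing in each class).
            # os.listdir returns names in arbitrary order, so sort them to make the
            # held-out file deterministic.
            for filename in sorted(x for x in os.listdir(subfolder) if x.endswith('.wav'))[:-1]: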
  10. Read each audio file, as follows:
                # Read the input file
                filepath = os.path.join(subfolder, filename)
                sampling_freq, audio = wavfile.read(filepath)
  11. Extract the MFCC features:
                # Extract MFCC features
                mfcc_features = mfcc(audio, sampling_freq)
  12. Append the features to the X variable:
                # Append to the variable X
                if len(X) == 0:
                    X = mfcc_features
                else:
                    X = np.append(X, mfcc_features, axis=0)
  13. Append the corresponding label too:
                # Append the label
                y_words.append(label)
  14. Once you have extracted features from all the files in the current class, train and save the HMM model. Because an HMM is a generative model that is trained without supervision, we don't need labels to fit it. Instead, we train a separate HMM on the data of each class and store it together with the label of that class:
            # Train and save HMM model
            hmm_trainer = HMMTrainer()
            hmm_trainer.train(X)
            hmm_models.append((hmm_trainer, label))
            hmm_trainer = None
  15. Get a list of test files that were not used for training:
        # Test files
        input_files = [
                'data/pineapple/pineapple15.wav',
                'data/orange/orange15.wav',
                'data/apple/apple15.wav',
                'data/kiwi/kiwi15.wav'
                ]
  16. Iterate through the test files, as follows:
        # Classify input data
        for input_file in input_files:
  17. Read in each audio file:
            # Read input file
            sampling_freq, audio = wavfile.read(input_file)
  18. Extract the MFCC features:
            # Extract MFCC features
            mfcc_features = mfcc(audio, sampling_freq)
  19. Define variables to store the maximum score and the output label:
            # Define variables
            max_score = None
            output_label = None
  20. Iterate through all the models and run the input file through each of them:
            # Iterate through all HMM models and pick 
            # the one with the highest score
            for item in hmm_models:
                hmm_model, label = item
  21. Extract the score and store the maximum score:
                score = hmm_model.get_score(mfcc_features)
                if max_score is None or score > max_score:
                    max_score = score
                    output_label = label
  22. Print the true and predicted labels:
            # Print the output
            print("\nTrue: " + input_file[input_file.find('/')+1:input_file.rfind('/')])
            print("Predicted: " + str(output_label))
  23. The full code is in the speech_recognizer.py file. If you run this code, it prints the true label and the predicted label for each test file on your Terminal.
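The steps that extract the label and hold out one file per class depend on `/` as the path separator and on the order in which `os.listdir` happens to return files. The following is a minimal sketch of more portable helpers. The names `label_from_path` and `split_train_test` are hypothetical and not part of the recipe, and the file listing is made up. The sort is lexicographic, so it only holds out the highest-numbered file when the numbering is zero-padded.

```python
import os


def label_from_path(filepath):
    """Return the name of the folder that directly contains filepath."""
    return os.path.basename(os.path.dirname(filepath))


def split_train_test(filenames, num_test=1):
    """Sort the .wav names and hold out the last num_test of them."""
    wav_files = sorted(f for f in filenames if f.endswith('.wav'))
    split = max(0, len(wav_files) - num_test)
    return wav_files[:split], wav_files[split:]


# Hypothetical listing, in the arbitrary order os.listdir might return
listing = ['apple15.wav', 'apple02.wav', 'notes.txt', 'apple01.wav']
train_files, test_files = split_train_test(listing)

# os.path.join uses the separator of the current platform
label = label_from_path(os.path.join('data', 'apple', 'apple15.wav'))
```

With helpers like these, the test list could be built from the held-out files instead of being hardcoded, at the cost of a little extra bookkeeping per class.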