Building a speech recognizer

We need a database of speech files to build our speech recognizer. We will use the database available at https://code.google.com/archive/p/hmm-speech-recognition/downloads. It contains seven different words, each with 15 associated audio files. The dataset is small, but it is sufficient to understand how to build a speech recognizer that distinguishes seven words. We need to build an HMM model for each class. To identify the word in a new input file, we run every model on that file and pick the one with the best score. We will use the HMM class that we built in the previous recipe.
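The one-model-per-word scheme can be tried out without any audio. The following is a minimal sketch, assuming a hypothetical `ToyGaussianModel` class as a stand-in for the HMM (it is not the class from the previous recipe) and synthetic two-dimensional features in place of MFCCs. The word labels and cluster centres are made up for illustration.

```python
import numpy as np


class ToyGaussianModel:
    """Hypothetical stand-in for a per-word HMM: a diagonal Gaussian."""

    def train(self, X):
        # Estimate a mean and a variance for each feature dimension
        self.mean = X.mean(axis=0)
        self.var = X.var(axis=0) + 1e-6  # small floor avoids division by zero

    def get_score(self, X):
        # Total log-likelihood of all rows of X under the Gaussian
        diff = X - self.mean
        log_prob = -0.5 * (np.log(2 * np.pi * self.var) + diff ** 2 / self.var)
        return float(log_prob.sum())


def classify(models, features):
    """Return the label whose model assigns the highest score."""
    return max(models, key=lambda item: item[0].get_score(features))[1]


rng = np.random.default_rng(0)

# Synthetic training data: each "word" clusters around a different centre
centres = {'apple': 0.0, 'kiwi': 5.0, 'orange': -5.0}
models = []
for label, centre in centres.items():
    model = ToyGaussianModel()
    model.train(rng.normal(loc=centre, scale=1.0, size=(200, 2)))
    models.append((model, label))

# An unseen sample drawn near the 'kiwi' centre
test_features = rng.normal(loc=5.0, scale=1.0, size=(50, 2))
predicted = classify(models, test_features)
```

Each model only ever sees the data of its own class, and the class decision comes entirely from comparing scores across models. The recipe follows the same pattern with HMMs and MFCC features.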

How to do it…

Create a new Python file, and import the following packages:

import os
import argparse 

import numpy as np
from scipy.io import wavfile 
from hmmlearn import hmm
from features import mfcc

Define a function to parse the input arguments in the command line:

# Function to parse input arguments
def build_arg_parser():
    parser = argparse.ArgumentParser(description='Trains the HMM classifier')
    parser.add_argument("--input-folder", dest="input_folder", required=True,
            help="Input folder containing the audio files in subfolders")
    return parser
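As a quick check of the parser, `parse_args()` also accepts an explicit list of arguments, which avoids going through the command line. The sketch below repeats the function so that it runs on its own; the folder name `data` is an assumption taken from the test paths used later in this recipe.

```python
import argparse


def build_arg_parser():
    parser = argparse.ArgumentParser(description='Trains the HMM classifier')
    parser.add_argument("--input-folder", dest="input_folder", required=True,
            help="Input folder containing the audio files in subfolders")
    return parser


# Passing a list to parse_args() bypasses sys.argv, which is handy for testing
args = build_arg_parser().parse_args(['--input-folder', 'data'])
input_folder = args.input_folder
```

On the command line, the equivalent call would look something like `python speech_recognizer.py --input-folder data`.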

Define the main function, and parse the input arguments:

if __name__=='__main__':
    args = build_arg_parser().parse_args()
    input_folder = args.input_folder

Initialize the variable that will hold all the HMM models:
```
    hmm_models = []
```

Parse the input directory that contains all the database's audio files:

    # Parse the input directory
    for dirname in os.listdir(input_folder):

Build the path to the subfolder, skipping any entry that is not a directory:

        # Get the path of the subfolder
        subfolder = os.path.join(input_folder, dirname)

        if not os.path.isdir(subfolder):
            continue

The name of the subfolder is the label of this class. Extract it using the following:

        # Extract the label
        label = os.path.basename(subfolder)

Initialize the variables for training:

        # Initialize variables
        X = np.array([])
        y_words = []

Iterate through the list of audio files in each subfolder:

        # Iterate through the audio files (leaving 1 file for testing in each class)
        for filename in [x for x in os.listdir(subfolder) if x.endswith('.wav')][:-1]:
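Note that `os.listdir` returns entries in arbitrary order, so the file dropped by `[:-1]` is not guaranteed to be the one you later test on. One way to make the split reproducible is to sort the names first. The sketch below uses a hypothetical helper, `split_train_test`, and made-up file names; because the sort is lexicographic, it only holds out the highest-numbered file if the numbering is zero-padded.

```python
import os
import tempfile


def split_train_test(subfolder):
    """Hypothetical helper: sorted .wav paths, last one held out for testing."""
    wav_files = sorted(f for f in os.listdir(subfolder) if f.endswith('.wav'))
    paths = [os.path.join(subfolder, f) for f in wav_files]
    return paths[:-1], paths[-1:]


# Demonstrate on a throwaway folder of empty files (names are made up)
with tempfile.TemporaryDirectory() as tmp:
    for i in (3, 1, 2):
        open(os.path.join(tmp, 'word%02d.wav' % i), 'w').close()
    open(os.path.join(tmp, 'notes.txt'), 'w').close()

    train_files, test_files = split_train_test(tmp)
    train_names = [os.path.basename(p) for p in train_files]
    test_names = [os.path.basename(p) for p in test_files]
```

Returning the held-out path alongside the training paths also removes the need to hard-code the test files separately.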

Read each audio file, as follows:

            # Read the input file
            filepath = os.path.join(subfolder, filename)
            sampling_freq, audio = wavfile.read(filepath)

Extract the MFCC features:

            # Extract MFCC features
            mfcc_features = mfcc(audio, sampling_freq)
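The `mfcc` call returns a two-dimensional array with one row per analysis frame and one column per coefficient. The sketch below estimates that shape, assuming that `mfcc` behaves like the function of the same name in the python_speech_features package with its default parameters (25 ms windows, 10 ms steps, 13 coefficients, final frame zero-padded). None of these values are stated in this recipe, and `estimate_mfcc_shape` is a hypothetical helper, so treat the result as a rough guide only.

```python
import math


def estimate_mfcc_shape(num_samples, sampling_freq,
                        winlen=0.025, winstep=0.01, numcep=13):
    """Rough (frames, coefficients) estimate; the defaults are assumptions."""
    frame_len = int(round(winlen * sampling_freq))
    frame_step = int(round(winstep * sampling_freq))
    if num_samples <= frame_len:
        num_frames = 1
    else:
        # The final partial frame is assumed to be zero-padded
        extra = (num_samples - frame_len) / float(frame_step)
        num_frames = 1 + int(math.ceil(extra))
    return num_frames, numcep


# One second of audio sampled at 8 kHz
shape = estimate_mfcc_shape(8000, 8000)
```

Files of different durations therefore produce different numbers of rows, which is why the features are stacked along axis 0 in the next step.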

Append the features to the X variable, stacking the frames of all files row by row:

            # Append to the variable X
            if len(X) == 0:
                X = mfcc_features
            else:
                X = np.append(X, mfcc_features, axis=0)
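An alternative to growing `X` with repeated `np.append` calls is to collect the per-file arrays in a list and concatenate them once. The sketch below uses random arrays as stand-ins for the MFCC matrices; the frame counts are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the per-file MFCC matrices: varying frame counts, 13 columns
per_file_features = [rng.normal(size=(n_frames, 13)) for n_frames in (40, 55, 32)]

# Collect first, then concatenate once instead of calling np.append per file
X = np.concatenate(per_file_features, axis=0)

# Frame count of each file, in the same order as the rows of X
lengths = [len(f) for f in per_file_features]
```

Keeping `lengths` is optional. hmmlearn's `fit` method accepts a `lengths` argument, so a model can treat each file as a separate sequence rather than one long one. Whether the trainer class from the previous recipe forwards that argument is not shown here.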

Append the corresponding label too:

            # Append the label
            y_words.append(label)

Once you have extracted features from all the files in the current class, train and save the HMM model. Because an HMM is a generative model trained without supervision, it needs no labels during training. Instead, we build a separate HMM for each class and store the class label alongside the trained model:
```
        # Train and save HMM model
        hmm_trainer = HMMTrainer()
        hmm_trainer.train(X)
        hmm_models.append((hmm_trainer, label))
        hmm_trainer = None
```

Get a list of test files that were not used for training:

    # Test files
    input_files = [
            'data/pineapple/pineapple15.wav',
            'data/orange/orange15.wav',
            'data/apple/apple15.wav',
            'data/kiwi/kiwi15.wav'
            ]

Iterate over the input files, as follows:

    # Classify input data
    for input_file in input_files:

Read in each audio file:

        # Read input file
        sampling_freq, audio = wavfile.read(input_file)

Extract the MFCC features:

        # Extract MFCC features
        mfcc_features = mfcc(audio, sampling_freq)

Define variables to store the maximum score and the output label:

        # Define variables (start below any possible score)
        max_score = float('-inf')
        output_label = None

Iterate through all the models and run the input file through each of them:

        # Iterate through all HMM models and pick 
        # the one with the highest score
        for item in hmm_models:
            hmm_model, label = item

Extract the score and store the maximum score:

            score = hmm_model.get_score(mfcc_features)
            if score > max_score:
                max_score = score
                output_label = label

Print the true and predicted labels:

        # Print the output
        print "\nTrue:", input_file[input_file.find('/')+1:input_file.rfind('/')]
        print "Predicted:", output_label
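The slicing expression for the true label works only when exactly one folder precedes the label in the path, because `find('/')` locates the first slash. A path such as `audio/data/apple/apple15.wav` would yield `data/apple`. A sketch of a more tolerant version, using a hypothetical helper named `true_label_from_path`, takes the name of the file's parent folder instead:

```python
import os


def true_label_from_path(path):
    """Hypothetical helper: the parent folder name is taken as the label."""
    return os.path.basename(os.path.dirname(path))


labels = [true_label_from_path(p) for p in [
    'data/pineapple/pineapple15.wav',
    'data/orange/orange15.wav',
]]
nested = true_label_from_path('audio/data/apple/apple15.wav')
```

This relies on the same convention as the training loop, namely that each word's files live in a folder named after the word.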

The full code is in the speech_recognizer.py file. If you run this code, the Terminal shows the true and the predicted label for each test file.
