Real-time speech recognition using Pocket Sphinx, GStreamer, and Python in Ubuntu 14.04.2

The following is the code for real-time speech recognition using GStreamer:

#!/usr/bin/env python

#The following modules need to import before handling gstreamer API's

import gobject
import sys
import pygst
pygst.require('0.10')
gobject.threads_init()
import gst

#Module to handle keyboard interrupt signal
import signal

#Keyboard signal handling routine
def signal_handle(signal, frame):
  print "You pressed Ctrl+C"
  sys.exit(0)


#Implementation of Speech recognition class
class Speech_Recog(object):

  #Initializing gstreamer pipeline and pocket sphinx element
  def __init__(self):
    self.init_gst()

  #This function will initialize gstreamer pipeline
  def init_gst(self):
    #The following code create a gstreamer pipeline with pipeline description. The required descriptors needed for the code is given as parameters.
          self.pipeline = gst.parse_launch('gconfaudiosrc !audioconvert ! audioresample '
                     + '! vader name=vad auto-threshold=true '
                     + '! pocketsphinx name=asr ! fakesink')
    #Accessing pocket sphinx element from gstreamer pipeline
          asr = self.pipeline.get_by_name('asr')
    #Connecting to asr_result function when a speech to text conversion is completed
          asr.connect('result', self.asr_result)

      #User can mention lm and dict for accurate detection
      #asr.set_property('lm', '/home/user/mylanguagemodel.lm')
      #asr.set_property('dict','/home/user/mylanguagemodel.dic')

    #This option will set all options are configured well and can start recognition
          asr.set_property('configured', True)
    
    #Pausing the GStreamer pipeline at first.
          self.pipeline.set_state(gst.STATE_PAUSED)

      #Definition of asr_result
    def asr_result(self, asr, text, uttid):
      #Printing the detected text
      print "Detected Text=>    ",text

    #This function will start/stop Speech recognition operation
    def start_recognition(self):
      #VADER - Voice Activity DEtectoR, which helps when the speech start and when its ends. Creating VADER object and set the property silent to False, so no speech will detected until key press
            vader = self.pipeline.get_by_name('vad')
            vader.set_property('silent', False)
      #Waiting for a key press to start recognition
      raw_input("Press any key to start recognition:>")
      #Start playing the pipeline
      self.pipeline.set_state(gst.STATE_PLAYING)

      #Waiting for stopping the recognition
      raw_input("Press any key to stop recognition:>")
            vader = self.pipeline.get_by_name('vad')
      #Setting silent property of VADER to True
            vader.set_property('silent', True)
      #Pausing GStreamer pipeline
          self.pipeline.set_state(gst.STATE_PAUSED)


if __name__ == "__main__":

  #Creating an object of Speech_Recog() class
  app_object = Speech_Recog()


  #Assign keyboard interrupt handler
  signal.signal(signal.SIGINT, signal_handle)

  while True:

    #Calling Speech recognition routine
    app_object.start_recognition()

The code can be simply executed using the following command:

$ python <code_name.py>

The following is the screenshot of the output window:

Real-time speech recognition using Pocket Sphinx, GStreamer, and Python in Ubuntu 14.04.2

Press any key to start recognition; after this, we can talk and it will be converted and printed on the terminal window. To stop detection, press any key and it will pause the GStreamer pipeline.

One of the other speech recognition tool is Julius. We will see how to install it and work with it using Python.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.223.125.219