Speech recognition using Julius and Python in Ubuntu 14.04.2

In this section, we will see how to install the Julius speech recognition system and connect it to Python. The required packages (such as Julius and the audio tools) are available in Ubuntu's package manager, but we need to download and install the Python wrapper separately. Let's start with the components required for the installation.

Installation of Julius speech recognizer and Python module

The following are the instructions to install Julius and its Python binding in Ubuntu 14.04.2:

  • The following command will install the speech recognition system of Julius:
    $ sudo apt-get install julius
    
  • The following command will install padsp (the PulseAudio OSS wrapper), which is part of the pulseaudio-utils package. Julius may need it in order to run on Ubuntu 14.04.2:
    $ sudo apt-get install pulseaudio-utils
    
  • The following command will install the OSS proxy daemon, which emulates the /dev/dsp OSS sound device in Ubuntu and streams its audio through ALSA. Julius needs the /dev/dsp device to function:
    $ sudo apt-get install osspd-alsa
    
  • Reload ALSA to bind osspd to ALSA:
    $ sudo alsa force-reload
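
After running the steps above, a quick check from Python can confirm that the pieces are visible. The following is a minimal sketch, not part of the original instructions: the helper names find_on_path and check_julius_prerequisites are hypothetical, and the sketch only assumes that julius and padsp end up on the PATH and that osspd-alsa provides /dev/dsp, as described above.

```python
import os


def find_on_path(program):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_julius_prerequisites():
    """Report which pieces installed by the steps above are visible."""
    return {
        "julius": find_on_path("julius") is not None,
        "padsp": find_on_path("padsp") is not None,
        # osspd-alsa is expected to provide the emulated OSS device
        "/dev/dsp": os.path.exists("/dev/dsp"),
    }


# Print one line per prerequisite, for example "julius     found"
for name, present in sorted(check_julius_prerequisites().items()):
    print("%-10s %s" % (name, "found" if present else "missing"))
```

If /dev/dsp is reported as missing, reloading ALSA again or rebooting is a reasonable first thing to try before running Julius.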
    

To install pyjulius, the Python extension for Julius, you first need to install setuptools for Python.

  1. To install setuptools, the simplest option is to download the bootstrap script from the setuptools website; the same script can be used on any OS. Download it with the wget tool and run it with Python:
    $ wget https://bootstrap.pypa.io/ez_setup.py
    $ sudo python ez_setup.py
    
  2. The installation details of setuptools are given at https://pypi.python.org/pypi/setuptools
  3. After installing setuptools, download pyjulius from https://pypi.python.org/pypi/pyjulius/0.3
  4. Extract the archive and install the package by running the following command from the extracted folder:
    $ sudo python setup.py install
    
  5. After installing pyjulius, install the Julius quick-start demo, which contains an HMM (acoustic model), an LM (language model), and a dictionary of a few words. Download the Julius quick-start files using the following command:
    $ wget http://www.repository.voxforge1.org/downloads/software/julius-3.5.2-quickstart-linux.tgz
    
  6. Extract the files and change to the extracted folder.
  7. Execute the following command in that folder. It will start speech recognition on the command line:
    $ padsp julius -input mic -C julian.jconf
    
  8. To exit speech recognition, press Ctrl + C.
  9. To make Julius available to Python, start it in module mode using the following command:
    $ padsp julius -module -input mic -C julian.jconf
    

This command starts Julius as a server that listens for clients. To use Julius from Python, we need to connect to this server with client code, as given in the following section. The Python code is a client that connects to the Julius server and prints the recognized text.
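
Because a client can only connect once the server is listening, it can help to start Julius from Python and retry the connection for a few seconds. The following is a minimal sketch under stated assumptions: the names JULIUS_MODULE_CMD, start_julius_server, and call_with_retry are hypothetical helpers, and the command list simply repeats the module-mode command given above.

```python
import subprocess
import time

# Command taken from the step above; it must run in the extracted
# quick-start folder so that julian.jconf can be found.
JULIUS_MODULE_CMD = ["padsp", "julius", "-module", "-input", "mic",
                     "-C", "julian.jconf"]


def start_julius_server(workdir):
    """Launch Julius in module mode as a child process (not called here)."""
    return subprocess.Popen(JULIUS_MODULE_CMD, cwd=workdir)


def call_with_retry(action, retry_on, attempts=10, delay=0.5):
    """Call action() until it stops raising retry_on, up to `attempts` tries.

    The last failure is re-raised so the caller can still report it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

With a pyjulius client object named client, the helper would be used as call_with_retry(client.connect, pyjulius.ConnectionError), so that a client started slightly before the server does not fail on its first attempt.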

Python-Julius client code

The following code is a Python client of the Julius speech recognition server that we started using the previous command. After connecting to this server, it will trigger speech-to-text conversion, fetch the converted text, and print it on the terminal:

#!/usr/bin/env python
import sys

# Import the pyjulius module
import pyjulius

# Queue is an implementation of a FIFO (first in, first out) queue suitable for multithreaded programming.
import Queue

# Initialize the Julius client object with the localhost address and the default port of 10500, then try to connect to the server.
client = pyjulius.Client('localhost', 10500)
try:
    client.connect()
# Running the client before the server has been started causes a connection error.
except pyjulius.ConnectionError:
    print 'Start julius as module first!'
    sys.exit(1)


# Start listening to the server
client.start()
try:
    while 1:
        try:
            # Fetch a recognition result from the server without blocking
            result = client.results.get(False)
        except Queue.Empty:
            continue
        print result
except KeyboardInterrupt:
    print 'Exiting...'
    client.stop()  # send the stop signal
    client.join()  # wait for the thread to die
    client.disconnect()  # disconnect from julius

After connecting to the Julius server, the Python client listens to the server and prints the output it receives.
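
Under the hood, a Julius server in module mode sends XML-like text messages over the socket, each terminated by a line containing a single period, and recognition results arrive in a RECOGOUT block. pyjulius handles this for us; the following sketch only illustrates the idea. The function names and the SAMPLE message are hand-written assumptions for illustration, not captured server output.

```python
import re

# Extracts WORD="..." from each <WHYPO .../> element. A regular expression
# is used because words such as <s> make the message invalid as strict XML.
WORD_ATTR = re.compile(r'<WHYPO\b.*?\bWORD="([^"]*)"')


def split_messages(stream_text):
    """Split raw server text into messages; each ends with a lone '.' line."""
    messages, current = [], []
    for line in stream_text.splitlines():
        if line.strip() == ".":
            if current:
                messages.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return messages


def recognized_words(message, skip=("<s>", "</s>")):
    """Return the words of a RECOGOUT message, without silence markers."""
    if "<RECOGOUT>" not in message:
        return []
    return [word for word in WORD_ATTR.findall(message) if word not in skip]


# Hand-written sample in the module-mode message layout (illustrative only)
SAMPLE = '''<STARTRECOG/>
.
<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2888.5" GRAM="0">
    <WHYPO WORD="<s>" CLASSID="0" PHONE="sil" CM="1.000"/>
    <WHYPO WORD="DIAL" CLASSID="1" PHONE="d ay ax l" CM="0.950"/>
    <WHYPO WORD="FIVE" CLASSID="2" PHONE="f ay v" CM="0.910"/>
    <WHYPO WORD="</s>" CLASSID="3" PHONE="sil" CM="1.000"/>
  </SHYPO>
</RECOGOUT>
.
'''

for message in split_messages(SAMPLE):
    words = recognized_words(message)
    if words:
        print(" ".join(words))
```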

The acoustic models used in the preceding programs are already trained, but they may not give accurate results for our own speech. To improve the accuracy of the speech recognition engines discussed so far, we need to train new language and acoustic models and create a dictionary, or adapt the existing acoustic model to our voice. The methods for improving accuracy are beyond the scope of this chapter, so only some links for training or adapting Pocket Sphinx and Julius are given.

Improving speech recognition accuracy in Pocket Sphinx and Julius

The following link explains how to adapt the existing acoustic model to our voice for Pocket Sphinx:

http://cmusphinx.sourceforge.net/wiki/tutorialadapt

Julius accuracy can be improved by writing a recognition grammar. The following link explains how to write a recognition grammar for Julius:

http://julius.sourceforge.jp/en_index.php?q=en_grammar.html
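
To give a rough idea of what such a grammar looks like, a Julius recognition grammar is usually split into a .grammar file (sentence rules over word categories) and a .voca file (the words in each category with their phoneme sequences), which are then compiled with the mkdfa.pl tool described at the link above. The following sketch only builds the text of the two files. The voca_text helper, the COMMAND category, and the phoneme strings are hypothetical; real phonemes must match the phoneme set of the acoustic model in use.

```python
def voca_text(categories):
    """Render {category: [(word, phonemes), ...]} in the .voca layout."""
    lines = []
    for category in sorted(categories):
        lines.append("% " + category)          # category header line
        for word, phonemes in categories[category]:
            lines.append("%s\t%s" % (word, phonemes))
        lines.append("")                       # blank line between categories
    return "\n".join(lines)


# One sentence rule: silence, a single command word, silence
GRAMMAR = "S : NS_B COMMAND NS_E\n"

# Placeholder vocabulary; the phoneme strings are assumptions
VOCA = voca_text({
    "NS_B": [("<s>", "sil")],
    "NS_E": [("</s>", "sil")],
    "COMMAND": [("FORWARD", "f ao r w er d"), ("STOP", "s t aa p")],
})

print(GRAMMAR)
print(VOCA)
```

Restricting the grammar to the few commands an application actually needs is what narrows the recognizer's search and tends to improve accuracy.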

In the next section, we will see how to connect Python and speech synthesis libraries. We will work with the eSpeak and Festival libraries here. These are two popular, free, and effective speech synthesizers available on multiple OS platforms. Precompiled binaries are available in Ubuntu in the form of packages.

Setting up eSpeak and Festival in Ubuntu 14.04.2

eSpeak and Festival are speech synthesizers available on the Ubuntu/Linux platform. These applications can be installed from the software package repository of Ubuntu. The following are the instructions and commands to install these packages in Ubuntu.

  1. The following commands will install the eSpeak application and its wrapper for Python. We can use this wrapper in our programs to access the eSpeak APIs:
    $ sudo apt-get install espeak
    $ sudo apt-get install python-espeak
    
  2. The following command will install the Festival text-to-speech engine. Festival has some package dependencies; all dependencies will be automatically installed using this command:
    $ sudo apt-get install festival
    
  3. After the installation of the Festival application, we can download and install Python bindings for Festival.
  4. Download Python bindings using the following command. We need the svn tool (Apache Subversion) to download this package. Subversion is a free software versioning and revision control system:
    $ svn checkout http://pyfestival.googlecode.com/svn/trunk/ pyfestival-read-only
    
  5. After the download is complete, switch to the pyfestival-read-only folder and install the package using the following command:
    $ sudo python setup.py install
    

Here is the code to work with Python and eSpeak. As you will see, it's very easy to work with the Python binding for eSpeak. We need to write only two lines of code to synthesize speech using Python:

from espeak import espeak
espeak.synth("Hello World")

This code will import the eSpeak-Python wrapper module and call the synth function in the wrapper module. The synth function will synthesize the text given as its argument.
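
If the Python wrapper is not available, the espeak command-line program installed by the espeak package can be called instead. The following is a minimal sketch of that alternative, not the wrapper's API: espeak_command and speak are hypothetical helper names, and the sketch assumes the espeak executable is on the PATH.

```python
import subprocess


def espeak_command(text, speed=None, voice=None, wav_path=None):
    """Build an argument list for the espeak command-line program."""
    cmd = ["espeak"]
    if speed is not None:
        cmd += ["-s", str(speed)]   # speaking rate in words per minute
    if voice is not None:
        cmd += ["-v", voice]        # voice name, for example "en"
    if wav_path is not None:
        cmd += ["-w", wav_path]     # write a WAV file instead of playing
    cmd.append(text)
    return cmd


def speak(text, **options):
    """Run espeak and return its exit status (0 means success)."""
    return subprocess.call(espeak_command(text, **options))


# Example (requires espeak to be installed):
# speak("Hello World", speed=140)
```

Building the argument list separately from running it keeps the command easy to inspect and avoids passing the text through a shell.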

The following code shows how to synthesize speech using Python and Festival:

import festival
festival.say("Hello World")

The preceding code will import the Festival-Python wrapper module and call the say function in the Festival module. It will synthesize the text as speech.
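
Festival can also be driven without the Python bindings, because the festival program reads text from standard input when started with the --tts option. The following sketch shows that approach under stated assumptions: festival_say and its run parameter are hypothetical names, and the festival executable is assumed to be on the PATH.

```python
import subprocess

# `festival --tts` speaks the text it receives on standard input
FESTIVAL_TTS_CMD = ["festival", "--tts"]


def festival_say(text, run=subprocess.Popen):
    """Pipe text to `festival --tts` and return the process exit status.

    `run` defaults to subprocess.Popen; it is a parameter only so that the
    function can be exercised without Festival installed.
    """
    process = run(FESTIVAL_TTS_CMD, stdin=subprocess.PIPE)
    process.communicate(text.encode("utf-8"))
    return process.returncode


# Example (requires Festival to be installed):
# festival_say("Hello World")
```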
