Using PocketSphinx to accept your voice commands

Sound is cool and speech is even cooler, but you'll also want to be able to communicate with your projects through voice commands. This section will show you how to add speech recognition to your robotic projects. This isn't nearly as simple as the speaking part, but thankfully, you have some significant help from the open source development community. You are going to download a set of capabilities named PocketSphinx, which will allow your project to listen to your commands.

The first step is downloading the PocketSphinx capabilities. Unfortunately, this is not quite as user-friendly as the espeak process, so follow along carefully. There are two possible ways to do this. If you have a keyboard, mouse, and display connected or want to connect via vncserver, you can do this graphically by performing the following steps:

  1. Go to the Sphinx website hosted by Carnegie Mellon University (CMU) at http://cmusphinx.sourceforge.net/. This is an open source project that provides you with the speech recognition software. With our smaller, embedded system, we will be using the PocketSphinx version of this code.
  2. You will need to download two software modules: sphinxbase and PocketSphinx. Select the Download option at the top of the page and then find the latest version of both of these packages. Download the .tar.gz version of each and move them to the /home/pi directory of your Raspberry Pi.

However, before you build these, you need two libraries. The first library is libasound2-dev. If you skipped the first two objectives of this chapter, you'll need to install it now using sudo apt-get install libasound2-dev. If you're unsure whether or not it's installed, run the command anyway. The system will let you know if it's already installed.

The second of these libraries is a library called Bison. This is a general-purpose, open source parser that will be used by PocketSphinx. To get this package, type sudo apt-get install bison.

Another way to download the packages is to use wget directly at the command-line prompt of your Raspberry Pi. If you want to do it this way, perform the following steps:

  1. To use wget on your host machine, find the link to the file you wish to download. In this case, go to the Sphinx website hosted by Carnegie Mellon University at http://cmusphinx.sourceforge.net/. This is an open source project that provides you with the speech recognition software. With your smaller, embedded system, you will be using the PocketSphinx version of this code.
  2. You will need to download two software modules, namely sphinxbase and PocketSphinx. Select the Download option at the top of the page and then find the latest version of both of these packages. Right-click on the sphinxbase-0.8.tar.gz file (if 0.8 is the latest version) and select Copy Link Location. Now open a PuTTY window to your Raspberry Pi, and after logging in, type wget and paste the link you just copied. This will download the .tar.gz version of sphinxbase. Now follow the same procedure with the latest version of PocketSphinx.

As with the graphical method, you need the libasound2-dev and Bison libraries before you build these. If you have not installed them yet, type sudo apt-get install libasound2-dev and then sudo apt-get install bison. The system will let you know if either one is already installed.

Once everything is installed and downloaded, you can build PocketSphinx. First, check that your home directory contains the tar.gz files of both PocketSphinx and sphinxbase.

To unpack and build the sphinxbase module, type sudo tar -xzvf sphinxbase-0.y.tar.gz, where y is the version number; in our example, it is 8. This should unpack all the files from the archive into a directory named sphinxbase-0.8. Now type cd sphinxbase-0.8. Listing the files with ls should show the unpacked source tree, including the configure script.

To build the application, start by issuing the command sudo ./configure --enable-fixed. This command checks that everything is OK with the system and then configures the build.

Now you are ready to actually build the sphinxbase code base. This is a two-step process, which is as follows:

  1. Type make and the system will build all the executable files.
  2. Type sudo make install and this will install all the executables onto the system.

Now we need to make the second part of the system: the PocketSphinx code itself. Go to the home directory and decompress and unarchive the code by typing tar -xzvf pocketsphinx-0.8.tar.gz. The files should now be unarchived, and we can now build the code. Installing these files is a three-step process as follows:

  1. Change to the PocketSphinx directory by typing cd pocketsphinx-0.8, and then type ./configure to check that the system is ready to build the files.
  2. Type make and wait for a while for everything to build.
  3. Type sudo make install.
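Because both modules follow the same unpack, configure, make, and install pattern, the steps can be collected in a small helper. This is a minimal sketch, not part of the original procedure: the function name build_module and the SUDO variable are hypothetical, and it assumes the tarball sits in the current directory and that your user can write there.

```shell
# Sketch only: unpack one source tarball, then configure, build, and install it.
# SUDO is a hypothetical override so the install step can be rehearsed without root.
SUDO="${SUDO-sudo}"

build_module() {
    tarball="$1"                   # e.g. sphinxbase-0.8.tar.gz
    shift                          # remaining arguments go to ./configure
    srcdir="${tarball%.tar.gz}"    # e.g. sphinxbase-0.8
    tar -xzf "$tarball" || return 1
    (
        cd "$srcdir" || exit 1
        ./configure "$@" && make && $SUDO make install
    )
}

# Usage from /home/pi (sphinxbase must be built first):
#   build_module sphinxbase-0.8.tar.gz --enable-fixed
#   build_module pocketsphinx-0.8.tar.gz
```

The subshell around cd keeps your prompt in the home directory, so the second call finds its tarball without any extra navigation.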

    Note

    Several additions to your library installations will be useful later if you are going to use your PocketSphinx capability with Python as a coding language. You can install the Python development files using sudo apt-get install python-dev and Cython using sudo apt-get install cython. You can also choose to install pkg-config, a utility that can sometimes help deal with complex compiles. Install it using sudo apt-get install pkg-config.

Once the installation is complete, you'll need to let the system know where the new library files are. To do this, edit the /etc/ld.so.conf file as root by typing sudo emacs /etc/ld.so.conf, and add the directory that holds the newly installed libraries as the last line of the file.

Now type sudo /sbin/ldconfig, and the system will now be aware of your PocketSphinx libraries.
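If you would rather not edit the file by hand, the same change can be scripted. The following is a rough sketch under two assumptions that are not confirmed by the text above: that the libraries were installed under the default prefix, so the directory to add is /usr/local/lib, and that the script runs as root. The helper name add_line_once is hypothetical.

```shell
# Append a line to a file unless an identical line is already there.
add_line_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# On the Raspberry Pi, run as root, the calls would be
# (the /usr/local/lib path is an assumption about the install prefix):
#   add_line_once /usr/local/lib /etc/ld.so.conf
#   /sbin/ldconfig
```

Checking for the line first means that running the script twice does not leave duplicate entries in the file.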

Everything is installed, so you can now try your speech recognition. Change to the /home/pi/pocketsphinx-0.8/src/programs directory using cd to try a demo program; then type pocketsphinx_continuous. This program takes input from the microphone and turns your speech into text. After running the command, you'll see a lot of diagnostic output before the program is ready to listen.

The INFO and Warning statements come from the C or C++ code and are there for debugging purposes. Initially, the program will warn you that it cannot find your Mic and Capture elements, but when it finds them, it will print out READY..... If you have set things up as previously described, you should be ready to give it a command. Say "hello" into the microphone. When the program senses that you have stopped speaking, it will process your speech and print more diagnostic output, but it should eventually show the text it recognized.

Notice the 000000000: hello line in the output. It recognized your speech! You can try other words and phrases too. The system is very sensitive, so it may pick up background noise. You are also going to find that it is not very accurate. We'll deal with that in a moment. To stop the program, press Ctrl + C.
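Since the recognized text is buried in diagnostic output, a small filter can pull out just the result lines. This is a minimal sketch that assumes results always have the form shown above, a nine-digit utterance number followed by a colon and the text; the function name extract_commands is hypothetical.

```shell
# Keep only result lines such as "000000000: hello" and strip the utterance number.
extract_commands() {
    sed -n 's/^[0-9]\{9\}: //p'
}

# Usage on the Raspberry Pi:
#   pocketsphinx_continuous | extract_commands
```

Any line that does not match the pattern is simply dropped. Results may show up with some delay, because output sent through a pipe is often buffered.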

There are two ways to make your voice recognition more accurate. One is to train the system to more accurately understand your voice. This is a bit complex; if you want to know more, go to the CMU Sphinx website.

The second way to improve accuracy is to limit the number of words that your system uses to determine what you are saying. The default has literally thousands of word possibilities, so if two words are close, it may choose the wrong word. To avoid this, you can make your own grammar rules to restrict the words it has to choose from.

The first step is to create a file with the words or phrases that you want the system to recognize. Then, you use a web tool to create two files that the system will use to define your grammar. I'll do this through the vncserver command because I'll need to use a web browser on Raspberry Pi to turn a text file into a set of grammar files. Begin by editing a file; type emacs grammar.txt and enter the words or phrases you want the system to recognize.
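As an illustration only, the file can also be created straight from the command line. The phrases below are hypothetical robot commands chosen for this sketch, not the ones used in the original example, and the layout assumes the web tool accepts one word or phrase per line.

```shell
# Write a small, hypothetical command list to grammar.txt, one phrase per line.
cat > grammar.txt <<'EOF'
forward
backward
left
right
stop
EOF
```

Keeping the list short is the point: the fewer candidates the recognizer has to choose from, the less likely it is to confuse similar-sounding words.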

Now you must use the CMU web tool to turn this file into two files that the system can use to define its dictionary. On my system, I have already installed Firefox using sudo apt-get install firefox. So, now I can open a web browser window and go to http://www.speech.cs.cmu.edu/tools/lmtool-new.html. If you hit the Browse button, you can find and select the file.

Open the grammar.txt file; then, on the web page, select COMPILE KNOWLEDGE BASE, and a window should pop up with the results.

You need to download the .tgz file that was created; in this case, the TAR1565.tgz file. This will download into your /home/pi/ directory. Move it to the /home/pi/pocketsphinx-0.8/src/programs directory and unarchive it using tar -xzvf followed by the filename.

Now you can invoke the pocketsphinx_continuous program to use this dictionary by typing ./pocketsphinx_continuous -lm 1565.lm -dict 1565.dic, and it will match what you say against only the words in those files.
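The number in the file names changes every time you use the web tool, so it can help to derive it from the archive name. This is a small sketch that assumes the archive always follows the TAR-plus-number pattern seen in this example; the function name kb_id is hypothetical.

```shell
# Derive the number shared by the .lm and .dic files from the archive name.
kb_id() {
    name="${1##*/}"       # drop any leading directory
    name="${name%.tgz}"   # drop the extension
    printf '%s\n' "${name#TAR}"
}

id="$(kb_id TAR1565.tgz)"
# Print the command to run from the directory that holds the unpacked files.
echo "./pocketsphinx_continuous -lm $id.lm -dict $id.dic"
```

For the archive in this example, the script prints the same command given above.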

You can also do this on your remote computer, using Windows or Linux, by creating the file in a text editor such as WordPad or Emacs. Once you have created the required grammar files, you can transfer them to your Raspberry Pi using WinSCP if you are using Windows, or scp from the command line if you are using Linux.

Your system can now understand your specific set of commands! In the next section of this chapter, you'll learn how to use this input to have the project respond.
