Chapter 6. Exploring Sound

From Wikipedia: hearing, auditory perception, or audition is the ability to perceive sound by detecting vibrations, changes in the pressure of the surrounding medium through time, through an organ such as the ear. Like touch, audition requires sensitivity to the movement of molecules in the world outside the organism. Both hearing and touch are types of mechanosensation.

Introduction

This chapter contains many short examples and code snippets for common audio tasks, including installing audio programs, playing music files, recording and processing audio streams, performing speech recognition, and performing text-to-speech. Many of these examples can be combined to build full end-to-end audio systems and voice-controlled electronics. A quick example of a voice-controlled electronic system is given at the end of this chapter.

Materials List

There are only two parts required for this chapter: parts 1 and 2 below. As an alternative for part 1, you can use parts 3 and 4 in conjunction:

  1. A Linux-compatible USB headset

    I use the Logitech ClearChat Comfort/USB Headset H390. It’s fairly cheap at $25 on Amazon and has a mute button that I find comes in handy. However, any Linux-compatible USB headset should work well with Edison, and cheaper ones are available.

  2. A power supply

    Same requirements as those listed in Chapter 5 (see “Materials List”).

  3. A USB-to-3.5 mm speaker/headphone and microphone jack

    If you prefer to use headphones you already own, you can buy a USB-to-3.5 mm adapter instead. Good examples are the Plugable USB Audio Adapter with 3.5mm Speaker/Headphone and Microphone Jacks and the iLuv USB Audio Adapter.

  4. A 3.5 mm microphone

    Assuming you already have 3.5 mm headphones, you’ll also need a 3.5 mm jack microphone to plug into your USB adapter.

Connecting a Headset

The first step in exploring sound is being able to record sounds and listen to recorded sounds. Whereas in Chapter 5, you started video processing by pulling images from the Internet, here you’ll start right away by connecting a headset.

Make sure the switch in between the microUSB and USB slot is flipped toward the USB port and plug in your USB headset. If you have the same (or similar) Logitech that I’m using, the red LED on the mute button will start blinking after a few seconds. This is an indication that the headset is receiving power from Edison and that the microphone is muted. If you push the button, the LED will turn solid red, indicating that the microphone is now active.

As a first step, you need to tell Edison to use your USB headset as the default system device for sound. You can tell how Edison references your sound card in the hardware with the aplay command:

# aplay -Ll | tail -5
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Headset [Logitech USB Headset], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Your device should appear as card 2 and will be called by name. The important piece is what comes after the colon and before the opening square bracket. This is the name by which you can reference your headset sound card; in this case, it’s simply Headset.

Use your favorite text editor to open the /etc/asound.conf file. Add the following line to the top, save it, and then close the file:

pcm.!default sysdefault:Headset

Remember to replace Headset with whatever device name you discovered using the aplay command.
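
The name lookup described above can also be scripted. The following is a minimal sketch, assuming the aplay line format shown earlier; card_name and asound_default_line are hypothetical helper names, not part of ALSA.

```python
import re

def card_name(aplay_line):
    """Return the short card name from a 'card N: Name [Description], ...' line."""
    match = re.match(r"card\s+\d+:\s*(\S+)\s*\[", aplay_line)
    return match.group(1) if match else None

def asound_default_line(name):
    """Build the /etc/asound.conf line that makes the card the default device."""
    return "pcm.!default sysdefault:" + name

# Sample line copied from the aplay output shown earlier
sample = "card 2: Headset [Logitech USB Headset], device 0: USB Audio [USB Audio]"
name = card_name(sample)
print(asound_default_line(name))  # pcm.!default sysdefault:Headset
```

Lines that do not describe a card, such as the Subdevices lines, simply return None.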

Playing and Recording Sounds

Install the alsa utilities for sounds and the text-to-speech engine espeak:

# opkg install alsa-utils espeak

OPKG Installing

Occasionally you’ll get errors using the opkg installer, most of which can be solved by updating opkg:

#  opkg update

Alsa automatically installs some wave files for testing your headset. You can play one of these to make sure your audio is configured properly:

# aplay /usr/share/sounds/alsa/Front_Center.wav

If everything is installed properly, you should hear the words “Front, Center” in a nice-sounding woman’s voice. Let’s switch that to a creepy robot voice by using espeak!

# espeak "Front, Center"

If you just type the command espeak, you can issue creepy robot text words until you terminate with Ctrl+C:

# espeak
front
center
hello
whatever
^C
#

The espeak program has a lot of fun options to play around with, from intonation to speech speed and even language! You can see the full list by calling the help dialogue (the same flag works for aplay and arecord and many other command-line programs):

# espeak -h

To give you a feel for it, try asking, “How are you?” in fast-paced, high-pitched Swedish:

# espeak -v sv -p 99 -s 250 "Hur mår du?"

You can also record your own voice using the arecord command and then play it back using aplay. Without any specified arguments, arecord will record forever. You can kill it with Ctrl+C:

# arecord hello.wav
Recording WAVE 'hello.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono
^CAborted by signal Interrupt...
# aplay hello.wav
Playing WAVE 'hello.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono
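
The format line that arecord and aplay print can be checked from Python as well. This sketch assumes only the standard wave module. It writes one second of silence in the same format as above (8-bit unsigned, 8000 Hz, mono) to a hypothetical file named silence.wav and reads the header back; describe_wav is an illustrative helper name.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Return (channels, bits per sample, sample rate, seconds) for a WAV file."""
    wav = wave.open(path, "rb")
    try:
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
        rate = wav.getframerate()
        seconds = wav.getnframes() / float(rate)
    finally:
        wav.close()
    return channels, bits, rate, seconds

# Write one second of silence; 8-bit WAV samples are unsigned, so silence is 128.
path = os.path.join(tempfile.mkdtemp(), "silence.wav")
out = wave.open(path, "wb")
out.setnchannels(1)
out.setsampwidth(1)
out.setframerate(8000)
out.writeframes(bytes(bytearray([128] * 8000)))
out.close()

print(describe_wav(path))  # (1, 8, 8000, 1.0)
```

Pointing describe_wav at hello.wav should report the same channel count, bit depth, and rate that arecord printed.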

Makeshift MP3 Player

Remember how awesome the iPod was when it came out? You can turn Edison into your very own iPod quite easily.

Form Factor

If you’d like to turn Edison into a small form-factor mp3 player, one great setup option is the SparkFun Base and Battery Blocks. From here, you can tack on the OLED Block for display controls or the microSD Block for increased storage capacity. If you’d like your music player to last for a really long time, a cute trick is to connect a cell phone battery pack to one of the microUSBs instead of using the Battery Block. These have a much higher battery capacity.

First, install the mpg123 program for playing mp3s:

# opkg install mpg123

Next, just for this example, you’ll pull some songs from LastFM’s free download collection:

# wget http://freedownloads.last.fm/download/59565166/From%2BEmbrace%2BTo%2BEmbrace.mp3
# wget http://freedownloads.last.fm/download/569330114/Lost%2BBoys.mp3
# wget http://freedownloads.last.fm/download/569264057/Get%2BGot.mp3

The downloaded filenames are shown here:

#  ls *.mp3
From+Embrace+To+Embrace.mp3  Lost+Boys.mp3
Get+Got.mp3

You can play any one of these files with the mpg123 command and quit with Ctrl+C:

# mpg123 Get+Got.mp3

You can play all the mp3 files on shuffle with the -Z flag and the wildcard *:

# mpg123 -Z *.mp3

You can create a playlist for these files using a simple text document. Create a file called playlist.txt with the following contents. Remember to replace /home/root/ with the complete paths to the files on your own system:

/home/root/Get+Got.mp3
/home/root/From+Embrace+To+Embrace.mp3
/home/root/Lost+Boys.mp3

Now play your playlist with mpg123:

# mpg123 -@ playlist.txt
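
Typing full paths by hand gets tedious as the collection grows. The following is a minimal sketch of generating the playlist file instead; write_playlist is a hypothetical helper, and the demonstration uses empty stand-in files in a scratch directory rather than real songs.

```python
import glob
import os
import tempfile

def write_playlist(music_dir, playlist_path):
    """Write the absolute path of every .mp3 in music_dir, one per line."""
    pattern = os.path.join(os.path.abspath(music_dir), "*.mp3")
    songs = sorted(glob.glob(pattern))
    with open(playlist_path, "w") as playlist:
        for song in songs:
            playlist.write(song + "\n")
    return songs

# Demonstration in a scratch directory with empty stand-in files
demo_dir = tempfile.mkdtemp()
for title in ("Get+Got.mp3", "Lost+Boys.mp3", "notes.txt"):
    open(os.path.join(demo_dir, title), "w").close()

playlist_file = os.path.join(demo_dir, "playlist.txt")
songs = write_playlist(demo_dir, playlist_file)
print(open(playlist_file).read())
```

On Edison you would call write_playlist("/home/root", "playlist.txt") and then pass the result to mpg123 with the -@ flag as shown above.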

You’ve just created an iPod! You can even use your SPI screen to display the info for each song.

Recording Audio with Python

Let’s do something a little more computationally savvy with our audio. You’ll use Python to interface with the audio stream and detect when you’re speaking into the mic and when you’re silent.

You’ll use the pyaudio library for this. First, install the dependencies using opkg:

# opkg install libjack
# opkg install --nodeps jack-dev
# opkg install libportaudio-dev

Then, install pyaudio, making sure to set the flags so that pip trusts the unverified pyaudio library:

# pip install --allow-external pyaudio --allow-unverified pyaudio pyaudio

Basic Recording

Now, download a simple audio-recording script from GitHub, courtesy of Mahmoud Abdrabo. This is a modified example from the pyaudio documentation:

# wget https://gist.githubusercontent.com/mabdrabo/8678538/raw/30e63a8c2ab78b516b13a180895308b8a4244ecf/sound_recorder.py

This will download sound_recorder.py. Open this file.

import pyaudio   # (1)
import wave

FORMAT = pyaudio.paInt16   # (2)
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "file.wav"

audio = pyaudio.PyAudio()   # (3)

# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                rate=RATE, input=True,
                frames_per_buffer=CHUNK)
print "recording..."
frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):  # (4)
    data = stream.read(CHUNK)
    frames.append(data)
print "finished recording"


# stop Recording
stream.stop_stream()   # (5)
stream.close()
audio.terminate()

waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')   # (6)
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()

(1) The script first imports the necessary libraries: pyaudio to pull the audio stream, and wave to save the audio file.

(2) Next, it sets the audio recording format by declaring variables for the sample format (FORMAT), number of channels (CHANNELS), sample rate (RATE), amount of data to read at a time (CHUNK), duration of recording (RECORD_SECONDS), and an output filename (WAVE_OUTPUT_FILENAME).

(3) The script then creates a pyaudio object and opens an input stream with the given configuration. Before the script starts recording, it prints “recording…” and then creates an empty frames list.

(4) Each time through the for loop, which iterates as many times as there are chunks to read in five seconds, the audio data that is read is tacked onto (or appended to) the frames list. When the loop finishes, “finished recording” is printed.

(5) The audio stream is stopped and closed, and the pyaudio object is terminated.

(6) The full five-second recorded audio is written to disk using wave functions. The file is opened and configured, the frames list is written out, and the file is closed.
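
The loop count and the size of the resulting recording follow directly from the constants at the top of the script. The arithmetic below is a small worked sketch using those same values; SAMPLE_WIDTH is a name introduced here for the two bytes per sample that paInt16 implies.

```python
RATE = 44100          # frames per second
CHUNK = 1024          # frames read per call to stream.read()
CHANNELS = 2
SAMPLE_WIDTH = 2      # bytes per sample for 16-bit audio (paInt16)
RECORD_SECONDS = 5

# Number of loop iterations, truncated to a whole number of chunks
chunks = int(RATE / CHUNK * RECORD_SECONDS)
frames_recorded = chunks * CHUNK
seconds_recorded = frames_recorded / float(RATE)
data_bytes = frames_recorded * CHANNELS * SAMPLE_WIDTH

print(chunks)            # 215
print(frames_recorded)   # 220160
print(data_bytes)        # 880640
print(seconds_recorded)  # slightly under 5 seconds
```

Because the chunk count is truncated, the recording comes out a hair shorter than the nominal five seconds.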

Run the file and, making sure your mic is not muted, record yourself speaking in between the “recording…” and “finished recording” lines. Play your wonderful recording back to you using aplay:

# aplay file.wav

Thresholding

It’s great to just blanket record sounds, but it would be nice to make our system a bit smarter. We’re going to modify sound_recorder.py to help us do that.

First, add the following line at the end of the import statements:

import audioop

Now, in the for loop where you recorded the stream, add the print statement in between the two other lines as shown here:

    data = stream.read(CHUNK)
    print audioop.rms(data,2)
    frames.append(data)

The audioop library contains a series of functions to help you manipulate raw audio data. In this case, you’re taking the root mean square (RMS) of each audio chunk you read; the second argument, 2, is the sample width in bytes, which matches the 16-bit paInt16 format. The root mean square is calculated by squaring every data point in the sample, taking the mean of these values, and then taking the square root of that mean. You can think of it like the mean, except the RMS value puts a higher emphasis on outlier points.

The RMS value is a common way to detect speech and silence in audio streams. If even just a few points have higher volumes, the RMS value will be skewed way up. Save your modified sound_recorder.py file and run it again. This time, as the data is processed, the RMS values will be printed to the screen.

Run the script again with your mic on mute. You’ll notice that the printed values drop immediately down to low single-digit values. The audio stream is receiving basically 0 noise. Run it again with the mic unmuted but without speaking. You’ll probably see values in the 100s and 200s (if your house is approximately as noisy as mine). If you put the headset on and begin talking, you’ll notice that the values probably shoot up into the 1,000s. If you’re using a USB headset, one of the reasons this value shoots up so high is that your mic is directional, meaning it prefers sound to come from one direction as opposed to all over. In this case, it’s from the slot facing your mouth.
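
The RMS calculation described above, and its sensitivity to a few loud samples, can be checked by hand. This sketch computes the value directly with the standard struct and math modules, assuming signed 16-bit little-endian samples. The rms function here is a stand-in written for illustration, not the audioop implementation, and the sample values are made up.

```python
import math
import struct

def rms(data):
    """Root mean square of a byte string of signed 16-bit little-endian samples."""
    count = len(data) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    mean_square = sum(s * s for s in samples) / float(count)
    return math.sqrt(mean_square)

def mean_abs(data):
    """Plain average of the absolute sample values, for comparison."""
    count = len(data) // 2
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    return sum(abs(s) for s in samples) / float(count)

quiet = struct.pack("<8h", 10, -10, 10, -10, 10, -10, 10, -10)
spike = struct.pack("<8h", 10, -10, 10, -10, 10, -10, 10, 3000)

print(rms(quiet))       # 10.0
print(mean_abs(spike))  # 383.75
print(rms(spike))       # a little over 1060: one loud sample dominates
```

A single loud sample moves the RMS far more than it moves the plain average, which is why RMS responds so strongly to speech.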

By choosing an adequate threshold, you can actually detect fairly well when people are speaking and when they’re silent. Try playing with this yourself. Open the sound_recorder.py script one more time, define a threshold for speech, and modify the loop as follows:

threshold = 800
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    rms = audioop.rms(data,2)
    if rms > threshold:
        print "You're speaking"
    frames.append(data)

Save and exit the file, and then run it. This time through, silence gets you nothing, but any time you speak, you should see the “You’re speaking” text. Feel free to choose a threshold that best suits your headset; it will definitely vary by model and even by fit. If you find yourself needing more time, modify the recording duration (RECORD_SECONDS variable) to better suit your needs.
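
If you want to experiment with thresholds without a microphone attached, the same test can be run on synthetic audio. This is a minimal sketch under stated assumptions: tone_chunk generates sine tones as stand-ins for quiet background and loud speech, rms is a hand-written equivalent of the RMS calculation, and the amplitudes 150 and 3000 are invented for illustration.

```python
import math
import struct

RATE = 44100
CHUNK = 1024
THRESHOLD = 800   # same starting value as above; tune it for your headset

def rms(data):
    """Root mean square of signed 16-bit little-endian samples."""
    count = len(data) // 2
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / float(count))

def tone_chunk(amplitude, freq=440.0):
    """One CHUNK of a sine tone packed as signed 16-bit samples."""
    samples = [int(amplitude * math.sin(2 * math.pi * freq * n / RATE))
               for n in range(CHUNK)]
    return struct.pack("<%dh" % CHUNK, *samples)

def is_speech(data, threshold=THRESHOLD):
    """True when the chunk's RMS level exceeds the threshold."""
    return rms(data) > threshold

# Quiet, loud, loud, quiet: a stand-in for someone saying one short word
chunks = [tone_chunk(150), tone_chunk(3000), tone_chunk(3000), tone_chunk(150)]
flags = [is_speech(chunk) for chunk in chunks]
print(flags)  # [False, True, True, False]
```

A sine tone's RMS is roughly 0.7 times its amplitude, so the quiet chunks land near 100 and the loud ones near 2,100, on either side of the threshold.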

Speech Recognition

If you think speech detection is cool, you’re in for a treat, because speech recognition blows it out of the water. Up until now, we’ve been doing everything offline, directly on Edison without the need for Internet. This example uses the Google Speech Recognition API and so needs an Internet connection to work. Make sure your Edison is on WiFi.

The Python library you’ll be using is SpeechRecognition. You’ll need to install a single dependency and pip install the library itself:

# opkg install flac-dev
# pip install SpeechRecognition

After installation, create a Python script called SpeechReconize.py with the following contents:

Note

This code comes almost directly from the examples on the SpeechRecognition page.

import speech_recognition as sr

r = sr.Recognizer()
print "Start Listening..."
# use system default microphone as the audio source
with sr.Microphone() as source:
    # listen for the first phrase and extract it into audio data
    audio = r.listen(source)

print "Done Listening..."
try:
    # recognize speech using Google Speech Recognition
    print("You said: " + r.recognize(audio))
except LookupError:
    # speech is unintelligible
    print("Could not understand audio")

Run the script now. You’ll see that the script hangs after it prints “Start Listening…” until you both start and finish speaking. When you finish, the script prints “Done Listening…”, hangs for another second or so, and then prints out the contents of your speech or the error message, “Could not understand audio.” If you play with this script a little bit, you’ll immediately see how powerful it is. The listen command is very good at capturing full phrases, and the speech classifier is very, very accurate. Essentially, you’ve just added Google Now to your Edison in less than 20 lines of code.

It’s worth noting that this example uses pyaudio to handle the audio stream from the mic. The thresholding and audio stream capture are done in much the same way as you did manually in the previous example, but this time you’ve got a fancy library to do it all in the background!

This does raise an interesting point, though. In the last exercise, I told you that ambient noise and speech volume would vary not only with the environment but also with the microphone. You might wonder how the SpeechRecognition library works so well straight off the bat. SpeechRecognition has some default settings that work pretty well for most normal situations. Having a nice, directional headset microphone makes the algorithm work even better. Think about how much more defined your voice is into this headset than it would be yelling at your phone on a New York street corner. However, if you are out on that New York street corner, you’re going to need a way to recalibrate as we did previously. All you need to do is add a single line to the script, directly above the listen command:

# use system default microphone as the audio source
with sr.Microphone() as source:
    # adjust thresholding
    r.adjust_for_ambient_noise(source)
    # listen for the first phrase and extract it into audio data
    audio = r.listen(source)

This adjust_for_ambient_noise function listens for one second to the audio stream and uses that to calibrate the threshold for ambient noise. Now, in theory, your recognition should be good in almost any environment.
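
To make the idea of calibration concrete, here is a minimal sketch of deriving a speech threshold from levels measured while nobody is talking. It is not the SpeechRecognition library's actual algorithm: calibrate_threshold, the margin of 1.5, and the sample RMS readings are all assumptions chosen for illustration.

```python
def calibrate_threshold(ambient_levels, margin=1.5):
    """Pick a speech threshold from RMS levels measured during silence."""
    if not ambient_levels:
        raise ValueError("need at least one ambient measurement")
    average = sum(ambient_levels) / float(len(ambient_levels))
    # Anything louder than the background by the given margin counts as speech
    return average * margin

# Made-up RMS readings for two environments
quiet_room = [110, 140, 95, 130, 125]
street_corner = [900, 1200, 1050, 980, 1100]

print(calibrate_threshold(quiet_room))     # 180.0
print(calibrate_threshold(street_corner))  # 1569.0
```

The fixed threshold of 800 from the earlier exercise would fire constantly on the street corner, while a calibrated one rises with the background.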

Controlling Devices

There’s one last example in this book, and it’s using speech recognition to control peripheral devices, in this case your SPI screen. Connect the SPI screen to the Arduino breakout board now. You’ll use the SpeechRecognition library for parsing speech and the ILI9341 library to drive the SPI screen. The script will take in your speech, parse it, and then use it to drive the color of the screen.

I’ve posted a quick example of this system to GitHub. In order for this example to work as is, you have to download it into the EdisonILI9341 directory or wherever your ILI9341.py file resides if you moved it. Change into that directory now, then pull the script using wget:

# wget https://raw.githubusercontent.com/smoyerman/VoiceControlledScreen/master/ScreenControl.py

Open this file in a text editor. It should look fairly similar to the previous example, but expanded:

import speech_recognition as sr   # (1)
import ILI9341

# Construct screen and speech recognizer
disp = ILI9341.ILI9341()   # (2)
disp.begin()
r = sr.Recognizer()

# Hard-coded colors
red = (255,0,0)   # (3)
green = (0,255,0)
blue = (0,0,255)
white = (255,255,255)
black = (0,0,0)
purple = (100,0,100)

# Default color, used until a color keyword is recognized
color = black

# Loop and listen 10 times
for i in range(10):   # (4)

    # Listen for audio each time
    print "Start Listening..."
    with sr.Microphone() as source:
        # Only check for ambient noise the first time
        if i == 0:
            r.adjust_for_ambient_noise(source)
        audio = r.listen(source)
    print "Done Listening..."

    try:
        # Parse text
        speechString = r.recognize(audio)   # (5)
        print("You said " + speechString)
        speechArray = speechString.split()   # (6)
        # Check for colors
        if "red" in speechArray: color = red
        elif "blue" in speechArray: color = blue
        elif "green" in speechArray: color = green
        elif "purple" in speechArray: color = purple
        elif "white" in speechArray: color = white
        elif "black" in speechArray: color = black
        # Display to screen
        disp.clear(color)   # (7)
        disp.display()

    # Check for error
    except LookupError:
        print("Could not understand audio")

There are several new components to this script:

(1) At the top, the script imports ILI9341.

(2) It declares the screen object and initializes the screen.

(3) After initializing the speech recognition object, colors are hardcoded as different RGB tuples.

(4) The speech recognition is then placed in a loop that will run 10 times. Each time, we process audio the same way as in the previous example.

(5) When the processing is finished, we store the output as the variable speechString.

(6) The speechString variable is then split into an array, so “An expression like this” becomes ["An", "expression", "like", "this"]. This is to keep words like “bluebird” from triggering the keyword blue. By splitting it into an array, the script is forcing an exact match. The if and elif statements then perform the color keyword searching.

(7) Finally, disp.clear(color) sets the color, and disp.display() renders it to the screen.
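
The difference between matching inside the raw string and matching against the split array is easy to see in isolation. This is a small sketch; contains_word is a hypothetical helper, and the lower() call is an addition the script above does not make.

```python
def contains_word(speech, keyword):
    """True only if keyword appears as a whole word in the recognized text."""
    return keyword in speech.lower().split()

phrase = "I saw a bluebird"

print("blue" in phrase)                        # True: substring match
print(contains_word(phrase, "blue"))           # False: whole-word match
print(contains_word("paint it blue", "blue"))  # True
```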

Run the code, and you’ll see the screen change with your voice commands:

# python ScreenControl.py
Start Listening...
Done Listening...
You said I see a red door  <-- screen turns red
Start Listening...
Done Listening...
You said and I want to paint it black  <-- screen turns black
Start Listening...

if and elif

The way the code is written is like going down a queue. The code will only check for the next keyword if none of the conditions before it has been met. This is the nature of elif. Therefore, the expressions “red is black” and “black is red” will both do the same thing: render the screen red.
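
The ordering effect can be reproduced without the screen or the microphone. In this sketch, pick_color mirrors the if and elif chain by walking a priority list, and pick_first_spoken is one possible alternative that honors the order in which the words were spoken. Both function names are hypothetical and neither is part of ScreenControl.py.

```python
# Colors in the same priority order as the if/elif chain
COLORS = [
    ("red", (255, 0, 0)),
    ("blue", (0, 0, 255)),
    ("green", (0, 255, 0)),
    ("purple", (100, 0, 100)),
    ("white", (255, 255, 255)),
    ("black", (0, 0, 0)),
]

def pick_color(speech):
    """Return (name, rgb) for the highest-priority color among the words."""
    words = speech.split()
    for name, rgb in COLORS:
        if name in words:
            return name, rgb
    return None

def pick_first_spoken(speech):
    """Alternative: return the color that was spoken first."""
    names = dict(COLORS)
    for word in speech.split():
        if word in names:
            return word, names[word]
    return None

print(pick_color("red is black"))         # ('red', (255, 0, 0))
print(pick_color("black is red"))         # ('red', (255, 0, 0))
print(pick_first_spoken("black is red"))  # ('black', (0, 0, 0))
```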

Going Further

There are two main areas of study for going further in this chapter: digital signal processing, which encompasses audio signals, and natural language processing, which covers topics like automatic speech recognition and speech to text:

Digital Signal Processing

A free wikibook starting with the basics of digital signals and moving up into some very complicated territory. It contains a lot of hands-on exercises in Matlab and its free counterpart, Octave. Many other free online resources such as this one exist; do a quick Google search to find which works best for you.

Natural Language Processing with Python

A hands-on text with a lot of examples. Note that this book can get a little heavy on theory and upper-level mathematics.
