Audio stream data – recording and injecting

As stated previously, Kinect Studio, as currently delivered by Microsoft, does not support recording and injecting audio stream data.

In this appendix, we provide a simple, basic tool for recording speech input and submitting it to the speech recognition engine and the grammar we defined.

We encourage you to take the idea further and build a more complex and user-friendly application along the lines of a Kinect Studio for audio.

The idea behind the tool is very simple: you record your audio input as a .wav file, then inject it into the speech recognition engine to debug and test the audio stream processing.

You may want to use different .wav files to see how the speech recognition engine copes with other people's pronunciation, or with environmental characteristics that differ from those of the place where you are currently testing your application. Have you ever thought of developing an application that captures commands from a song? Or what about building a chaos monkey (a small tool that tests the reliability of your application) that injects a nonsense .wav file into your application? How does the application react to that?

As you may remember, in Chapter 4, Speech Recognition, we enabled speech recognition by calling the key SetInputToAudioStream API of the SpeechRecognitionEngine class to process the AudioSource stream of the KinectSensor (see the following code snippet). This enabled our application to try to recognize all the speech input streamed by the Kinect sensor:

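// Feed the Kinect audio stream to the speech engine:
// 16 kHz, 16-bit, mono PCM (32000 bytes/s, 2-byte block alignment)
speechEngine.SetInputToAudioStream(
    sensor.AudioSource.Start(),
    new SpeechAudioFormatInfo(
        EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
speechEngine.RecognizeAsync(RecognizeMode.Multiple);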
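The last numbers passed to SpeechAudioFormatInfo are not independent: the block alignment is channels × bits per sample / 8, and the average byte rate is the sample rate × the block alignment. The following minimal sketch derives the 2 and 32000 used above from 16 kHz, 16-bit, mono audio; PcmMath is a hypothetical helper, not part of the Kinect or Speech APIs.

```csharp
using System;

// Hypothetical helper (not part of the Kinect or Speech SDKs) that derives
// the dependent PCM fields passed to SpeechAudioFormatInfo / WAVEFORMATEX.
public static class PcmMath
{
    // Bytes per sample frame: one sample for every channel
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes per second of audio: frames per second * bytes per frame
    public static int AverageBytesPerSecond(int samplesPerSecond, int channels, int bitsPerSample)
    {
        return samplesPerSecond * BlockAlign(channels, bitsPerSample);
    }
}
```

For the Kinect format, PcmMath.BlockAlign(1, 16) gives 2 and PcmMath.AverageBytesPerSecond(16000, 1, 16) gives 32000, the values hard-coded in the snippet above.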

The SpeechRecognitionEngine class also provides the SetInputToWaveFile method, which takes its input from a .wav file instead. We can therefore load a .wav file recorded in advance with the following code:

speechEngine.SetInputToWaveFile("COMMAND_TO_TEST.WAV");
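For the chaos monkey idea mentioned earlier, you need a nonsense .wav file to pass to SetInputToWaveFile. One option is white noise in the same 16 kHz, 16-bit, mono PCM format the Kinect produces. The sketch below generates only the PCM payload; the NoisePcm class, its Generate method, and the seed parameter are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical helper: generates uniformly random 16-bit mono PCM samples
// at 16 kHz, i.e. white noise in the Kinect audio format.
public static class NoisePcm
{
    const int SamplesPerSecond = 16000;
    const int BytesPerSample = 2; // 16-bit mono

    public static byte[] Generate(double seconds, int seed)
    {
        var random = new Random(seed); // seeded so a failing test can be replayed
        int sampleCount = (int)(seconds * SamplesPerSecond);
        var data = new byte[sampleCount * BytesPerSample];
        for (int i = 0; i < sampleCount; i++)
        {
            // Random value over the full signed 16-bit range
            short sample = (short)random.Next(short.MinValue, short.MaxValue + 1);
            // PCM samples in a .wav file are stored little-endian
            data[2 * i] = (byte)(sample & 0xFF);
            data[2 * i + 1] = (byte)((sample >> 8) & 0xFF);
        }
        return data;
    }
}
```

Writing a header with Recorder.WriteWavHeader(fileStream, noise.Length) and then the noise bytes themselves should give a file that SetInputToWaveFile can load; a robust application would ideally recognize nothing in it.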

From there on, the speech recognition process is exactly the same as the one we saw in Chapter 4. To save the audio captured by the Kinect sensor, we can use the following Recorder class, which writes the audio stream to a file in .wav format:

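using System.IO;
using System.Text;
using Microsoft.Kinect;

    static class Recorder
    {
        static readonly byte[] buffer = new byte[4096]; // chunks read from the audio stream
        static volatile bool isRecording;               // set by the UI thread, read by the worker
        public static bool IsRecording
        {
            get { return isRecording; }
            set { isRecording = value; }
        }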

The data format of a wave audio stream is defined by the WAVEFORMATEX structure:

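        // Managed mirror of the Win32 WAVEFORMATEX structure
        struct WAVEFORMATEX
        {
            public ushort wFormatTag;      // 1 = WAVE_FORMAT_PCM
            public ushort nChannels;       // number of channels
            public uint   nSamplesPerSec;  // sample rate in Hz
            public uint   nAvgBytesPerSec; // nSamplesPerSec * nBlockAlign
            public ushort nBlockAlign;     // nChannels * wBitsPerSample / 8
            public ushort wBitsPerSample;  // bits per sample
            public ushort cbSize;          // size of extra format bytes (0 for PCM)
        }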

Note

More details on the structure's members are available in the Microsoft documentation at http://msdn.microsoft.com/en-us/library/windows/hardware/ff538799(v=vs.85).aspx.

A complete list of the WAVE_FORMAT_XXX format tags (for example, WAVE_FORMAT_PCM for one- or two-channel PCM data) can be found in the Mmreg.h header file.
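The value cbFormat = 18 used in the WriteWavHeader method is the size of WAVEFORMATEX without padding: 2 + 2 + 4 + 4 + 2 + 2 + 2 bytes, which matches the 18-byte size of the structure in the Windows SDK headers. A C# struct with the default layout is padded to 20 bytes because its uint fields give it 4-byte alignment, so a byte-packed declaration is needed if you ever marshal it directly. This standalone sketch (the struct and class names are hypothetical, separate from the Recorder class) checks both sizes with Marshal.SizeOf:

```csharp
using System.Runtime.InteropServices;

// Byte-packed mirror of WAVEFORMATEX: no padding, 18 bytes
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct WaveFormatExPacked
{
    public ushort wFormatTag;
    public ushort nChannels;
    public uint nSamplesPerSec;
    public uint nAvgBytesPerSec;
    public ushort nBlockAlign;
    public ushort wBitsPerSample;
    public ushort cbSize;
}

// Same fields with default alignment: the size is rounded up to a multiple of 4
[StructLayout(LayoutKind.Sequential)]
public struct WaveFormatExDefault
{
    public ushort wFormatTag;
    public ushort nChannels;
    public uint nSamplesPerSec;
    public uint nAvgBytesPerSec;
    public ushort nBlockAlign;
    public ushort wBitsPerSample;
    public ushort cbSize;
}

public static class WaveFormatSizes
{
    public static int Packed { get { return Marshal.SizeOf(typeof(WaveFormatExPacked)); } }
    public static int Default { get { return Marshal.SizeOf(typeof(WaveFormatExDefault)); } }
}
```

The Recorder class avoids the issue altogether because WriteWavHeader writes each field individually with a BinaryWriter instead of marshaling the structure.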

The WriteWavHeader method, together with a small WriteString helper, writes the header of the .wav file:

        // Helper used by WriteWavHeader: writes an ASCII tag such as "RIFF"
        static void WriteString(Stream stream, string s)
        {
            byte[] bytes = Encoding.ASCII.GetBytes(s);
            stream.Write(bytes, 0, bytes.Length);
        }

        // Writes the RIFF/WAVE header for 16 kHz, 16-bit, mono PCM audio,
        // the format streamed by the Kinect audio source
        public static void WriteWavHeader(Stream stream, int dataLength)
        {
            using (MemoryStream memStream = new MemoryStream(64))
            {
                int cbFormat = 18; // size of the WAVEFORMATEX "fmt " payload
                WAVEFORMATEX format = new WAVEFORMATEX()
                {
                    wFormatTag = 1,          // WAVE_FORMAT_PCM
                    nChannels = 1,
                    nSamplesPerSec = 16000,
                    nAvgBytesPerSec = 32000, // 16000 samples/s * 2 bytes
                    nBlockAlign = 2,         // 1 channel * 16 bits / 8
                    wBitsPerSample = 16,
                    cbSize = 0
                };

                using (var bw = new BinaryWriter(memStream))
                {
                    WriteString(memStream, "RIFF");
                    // RIFF chunk size = file size - 8 = "WAVE" (4)
                    // + "fmt " chunk (8 + cbFormat) + "data" chunk header (8) + data
                    bw.Write(dataLength + cbFormat + 20);
                    WriteString(memStream, "WAVE");
                    WriteString(memStream, "fmt ");
                    bw.Write(cbFormat);
                    bw.Write(format.wFormatTag);
                    bw.Write(format.nChannels);
                    bw.Write(format.nSamplesPerSec);
                    bw.Write(format.nAvgBytesPerSec);
                    bw.Write(format.nBlockAlign);
                    bw.Write(format.wBitsPerSample);
                    bw.Write(format.cbSize);
                    WriteString(memStream, "data");
                    bw.Write(dataLength);
                    bw.Flush();
                    memStream.WriteTo(stream);
                }
            }
        }
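Because a .wav header is easy to get subtly wrong, it helps to read it back. The following standalone sketch (WavHeaderInfo is a hypothetical helper, independent of the Recorder class) parses a PCM .wav header by walking its RIFF chunks. You could use it to confirm that RiffSize equals the file length minus 8 and that DataLength equals the number of audio bytes, or to check the format of a .wav file recorded elsewhere before injecting it.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads the fields of a PCM .wav header
public sealed class WavHeaderInfo
{
    public int RiffSize, Channels, SampleRate, BitsPerSample, DataLength;

    public static WavHeaderInfo Read(Stream stream)
    {
        var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        var info = new WavHeaderInfo();
        if (Tag(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file");
        info.RiffSize = reader.ReadInt32();
        if (Tag(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file");

        // Walk the chunks until the "data" chunk is found
        while (true)
        {
            string id = Tag(reader);
            int size = reader.ReadInt32();
            if (id == "fmt ")
            {
                reader.ReadUInt16();                        // wFormatTag
                info.Channels = reader.ReadUInt16();        // nChannels
                info.SampleRate = (int)reader.ReadUInt32(); // nSamplesPerSec
                reader.ReadUInt32();                        // nAvgBytesPerSec
                reader.ReadUInt16();                        // nBlockAlign
                info.BitsPerSample = reader.ReadUInt16();   // wBitsPerSample
                // Skip cbSize and any extra bytes (plus a pad byte if size is odd)
                stream.Seek(size + (size & 1) - 16, SeekOrigin.Current);
            }
            else if (id == "data")
            {
                info.DataLength = size;
                return info; // stream is now positioned at the first audio byte
            }
            else
            {
                // Unknown chunk: skip it; chunks are padded to an even length
                stream.Seek(size + (size & 1), SeekOrigin.Current);
            }
        }
    }

    static string Tag(BinaryReader reader)
    {
        return Encoding.ASCII.GetString(reader.ReadBytes(4));
    }
}
```

For a file written by WriteWavHeader, WavHeaderInfo.Read should report 1 channel, 16000 Hz, and 16 bits per sample, and RiffSize should be 38 bytes larger than DataLength.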

The WriteWavFile method reads the Kinect audio source stream and writes it to the .wav file:

        // Streams Kinect audio into fileStream until IsRecording is cleared,
        // then rewrites the header with the final data length
        public static void WriteWavFile(KinectAudioSource sourceAudio,
                                        FileStream fileStream)
        {
            var size = 0;
            // Placeholder header: the data length is not known yet
            WriteWavHeader(fileStream, size);

            using (var audioStream = sourceAudio.Start())
            {
                int count;
                while (isRecording &&
                       (count = audioStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // Write only the bytes actually read, not the whole buffer
                    fileStream.Write(buffer, 0, count);
                    size += count;
                }

                // Go back and rewrite the header with the real data length
                long prePosition = fileStream.Position;
                fileStream.Seek(0, SeekOrigin.Begin);
                WriteWavHeader(fileStream, size);
                fileStream.Seek(prePosition, SeekOrigin.Begin);
                fileStream.Flush();
            }
        }
    }
}
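The core of WriteWavFile is a copy loop controlled by a stop flag. The sketch below isolates that loop so it can be exercised without a sensor: any Stream (in the test, a MemoryStream standing in for the Kinect audio stream) is copied until the stop condition is met, and only the bytes actually returned by Read are written. The AudioCopy class, its CopyWhile method, and the keepRecording parameter are hypothetical names.

```csharp
using System;
using System.IO;

public static class AudioCopy
{
    // Copies source to destination in 4096-byte chunks while keepRecording()
    // returns true. Returns the number of audio bytes written, which is the
    // value to store in the header's data length field.
    public static int CopyWhile(Stream source, Stream destination, Func<bool> keepRecording)
    {
        var buffer = new byte[4096];
        int total = 0;
        int count;
        while (keepRecording() && (count = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Read may return fewer bytes than requested, so write only 'count'
            destination.Write(buffer, 0, count);
            total += count;
        }
        return total;
    }
}
```

With a 10,000-byte source, the loop writes exactly 10,000 bytes (4,096 + 4,096 + 1,808); writing buffer.Length on every iteration instead would have produced 12,288 bytes, with stale data at the end of the file.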

Our application uses the Recorder class through the RecordAudio method:

private static readonly object lockObject = new object();

// Records the Kinect audio to COMMAND.WAV until Recorder.IsRecording is cleared
private void RecordAudio()
{
    // Only one recording may run at a time
    lock (lockObject)
    {
        Recorder.IsRecording = true;
        using (var fileStream = new FileStream("COMMAND.WAV", FileMode.Create))
        {
            Recorder.WriteWavFile(this.sensor.AudioSource, fileStream);
        }
    }
}

To keep our WPF application responsive to user input while it records the audio data streamed by the Kinect sensor, we perform the recording on a background worker. The following code snippet shows how to define the background worker and make the RecordAudio method the work it executes. The complete source code is provided with the code accompanying this appendix:

private BackgroundWorker backgroundWorker1 =
    new System.ComponentModel.BackgroundWorker();
…
this.backgroundWorker1.RunWorkerCompleted += backgroundWorker1_RunWorkerCompleted;
this.backgroundWorker1.DoWork += backgroundWorker1_DoWork;
…
// Runs on a thread-pool thread, so the UI stays responsive while recording
void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    RecordAudio();
}
…
// Start recording unless a recording is already in progress
Recorder.IsRecording = true;
if (!this.backgroundWorker1.IsBusy)
{
    this.backgroundWorker1.RunWorkerAsync();
}
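Stopping a recording amounts to clearing Recorder.IsRecording, which ends the read loop in WriteWavFile and lets the worker complete. The following console sketch illustrates that start/stop handshake with BackgroundWorker outside WPF; the RecordingSession class, its simulated "recording" loop, and the Thread.Sleep timing are assumptions for illustration, not the code accompanying this appendix.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

// Hypothetical stand-in for the WPF window: the worker loops while a volatile
// flag is set, the way WriteWavFile loops while Recorder.IsRecording is true.
public sealed class RecordingSession
{
    volatile bool isRecording;
    readonly BackgroundWorker worker = new BackgroundWorker();
    readonly ManualResetEventSlim completed = new ManualResetEventSlim(false);

    public int ChunksRecorded { get; private set; }

    public RecordingSession()
    {
        worker.DoWork += (sender, e) =>
        {
            // Simulates reading audio chunks; do-while records at least one
            do
            {
                ChunksRecorded++;
                Thread.Sleep(10);
            } while (isRecording);
        };
        // Without a UI SynchronizationContext this handler runs on a pool thread
        worker.RunWorkerCompleted += (sender, e) => completed.Set();
    }

    public void Start()
    {
        isRecording = true;
        if (!worker.IsBusy)
        {
            completed.Reset();
            worker.RunWorkerAsync();
        }
    }

    // Clears the flag and blocks until the worker has finished. Blocking is
    // only safe without a UI thread: in WPF, RunWorkerCompleted is raised on
    // the UI thread, so react in that handler instead of waiting here.
    public void Stop()
    {
        isRecording = false;
        completed.Wait();
    }
}
```

In the WPF application, the equivalent of Stop would simply set Recorder.IsRecording to false and let backgroundWorker1_RunWorkerCompleted update the UI once the .wav file has been closed.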