Kinect microphone array

The microphone array is the heart of Kinect audio. Before we look at what a microphone array is and how it works, let's take a quick look at the main focus area of Kinect audio and the challenges that came with it.

The major focus area of Kinect audio

The major focus of Kinect's audio processing was human speech recognition, in particular recognizing players' voices while they move around and speak from different positions.

  • The first challenge was picking out speech in a noisy environment. Consider a situation where you are playing a game while a loud sound is coming from another source, such as a TV. The loud sound, together with the echo and noise in the room, makes it difficult to recognize the player's voice.
  • The second major challenge was identifying speech across a wide and changing area. While playing, a player could change position, or multiple players could be speaking from different directions.

To overcome these problems and to provide one of the best solutions for speech recognition, Kinect uses a microphone array to capture the voice and produce high-quality audio.

Why microphone array

The Kinect sensor has four microphones: three of them are on the right side and the fourth is on the left. The following screenshot shows how the microphones are positioned and the distances between them within the Kinect device:

The logic behind placing microphones in different places is to identify the following:

  • The origin of the sound
  • The direction of the incoming sound

Because the microphones are placed in different positions, the same sound arrives at each of them at a slightly different time, so there is a small delay in sound reception from one microphone to the next. From these delays, the Kinect sensor can work out the direction from which the sound is coming. Kinect can also estimate the approximate distance to the source from the sound wave and these time differences, similar to how our ears and brain work together.
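To make the idea of turning a delay into a direction concrete, here is a minimal sketch in Python. It is not the algorithm the Kinect runs internally; it is a simple two-microphone, far-field model. The function names, the 16 kHz sample rate, and the 343 m/s speed of sound are assumptions made for this illustration, and the 149 mm spacing is the widest microphone gap mentioned in this section.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly 20 degrees Celsius
SAMPLE_RATE_HZ = 16_000      # assumption: sample rate of the captured audio


def estimate_delay_samples(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive result means `other` heard the sound later than `ref`.
    This is a brute-force cross-correlation over lags -max_lag..max_lag.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += sample * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_s, mic_spacing_m, speed=SPEED_OF_SOUND_M_S):
    """Angle of the source away from straight ahead (0 degrees), far-field model."""
    ratio = speed * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


# Hypothetical signals: the same short pulse, arriving 3 samples later
# at the second microphone.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]

lag = estimate_delay_samples(mic_a, mic_b, max_lag=7)
angle = arrival_angle_deg(lag / SAMPLE_RATE_HZ, mic_spacing_m=0.149)
print(f"delay: {lag} samples, angle: {angle:.1f} degrees")
```

A delay of zero samples corresponds to a source directly in front of the pair, and the angle grows as the delay approaches its maximum. With only two microphones, this model gives a direction but not a distance.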

Note

Having a longer distance (149 mm) between the first microphone and the second one gives the Kinect a clearer difference between the inputs in terms of sound delay, which helps it calculate the sound source's direction. The delay between the left-most microphone and the next microphone can be up to seven samples, whereas the delays among the other three microphones are somewhat smaller.
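The seven-sample figure can be checked with a short calculation. The sketch below assumes a 16 kHz sample rate and a speed of sound of 343 m/s; neither value is stated in this section, so treat both as assumptions. The largest delay occurs when the sound travels along the line joining the two microphones.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly 20 degrees Celsius
SAMPLE_RATE_HZ = 16_000      # assumption: sample rate of the captured audio
MIC_SPACING_M = 0.149        # 149 mm between the first and second microphone


def max_delay_samples(spacing_m, sample_rate_hz, speed=SPEED_OF_SOUND_M_S):
    """Largest possible delay between two microphones, in samples."""
    travel_time_s = spacing_m / speed
    return travel_time_s * sample_rate_hz


delay = max_delay_samples(MIC_SPACING_M, SAMPLE_RATE_HZ)
print(f"maximum delay: {delay:.2f} samples")
```

Under these assumptions the result is just under seven samples, which is consistent with the figure quoted in the note. A shorter spacing gives proportionally fewer samples of delay, which is why the closer microphones show a smaller delay.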
