The microphone array is the heart of Kinect audio. Before we look at what a microphone array is and how it works, let's first have a quick look at the major challenges and focus areas of Kinect audio.
The major focus of Kinect's audio processing was recognizing human speech, and in particular recognizing the players' voices while they move around and speak from different positions.
To overcome these challenges and to provide one of the best solutions for speech recognition, Kinect uses a microphone array to capture voice as high-quality sound.
The Kinect sensor has four microphones; three of them are on the right side and the other one is on the left. The following screenshot shows how the microphones are positioned and the distances between them within the Kinect device:
The logic behind placing microphones in different places is to identify the following:
Because the microphones sit at different positions, sound from a given source reaches each of them at a slightly different time, so there is a small delay in reception between one microphone and the next. From these delays, the Kinect sensor can work out the direction from which the sound is coming. Kinect can also estimate the approximate distance to the source from the sound wave and these time differences, similar to how our ears and brain work.
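As a rough illustration of how a delay between two microphones maps to a direction, here is a minimal sketch using the standard far-field approximation. It is not the Kinect's actual algorithm; the function name `arrival_angle_degrees` and the speed-of-sound value are assumptions made for this example.

```python
import math

# Assumed value: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def arrival_angle_degrees(delay_s, mic_spacing_m):
    """Estimate the source angle from the delay between two microphones.

    Far-field approximation: the extra path to the farther microphone is
    spacing * sin(angle), so angle = asin(c * delay / spacing).
    0 degrees means the source is straight ahead of the microphone pair;
    +/-90 degrees means it lies along the line joining the microphones.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp so measurement noise cannot push asin() outside its domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A delay of zero means the sound reached both microphones at the same time, so the source is straight ahead; the largest possible delay corresponds to a source directly off to one side.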
The longer distance (149 mm) between the first microphone and the second one gives the Kinect a clearly measurable difference in arrival delay between the two inputs, which it uses to calculate the sound source's direction. The delay between the left-most microphone and the next microphone can be up to seven samples, whereas the delays among the other three microphones are somewhat smaller.
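The seven-sample figure can be checked with a short calculation. The sketch below assumes a 16 kHz capture rate and a speed of sound of 343 m/s; neither value is stated in the text above, and the function name `max_delay_samples` is made up for this example.

```python
# Assumed values, not taken from the text above.
SPEED_OF_SOUND_M_S = 343.0   # dry air at about 20 degrees Celsius
SAMPLE_RATE_HZ = 16_000      # assumed audio capture rate


def max_delay_samples(mic_spacing_m, sample_rate_hz=SAMPLE_RATE_HZ):
    """Largest possible arrival delay between two microphones, in samples.

    The delay is greatest when the sound travels along the line joining
    the microphones, covering the full spacing between them.
    """
    delay_s = mic_spacing_m / SPEED_OF_SOUND_M_S
    return delay_s * sample_rate_hz


# 149 mm spacing between the left-most microphone and the next one.
print(round(max_delay_samples(0.149), 2))  # about 6.95, i.e. roughly seven samples
```

Under these assumptions, a smaller spacing gives a proportionally smaller maximum delay, which is consistent with the shorter delays among the three closely spaced microphones.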