Detecting simple actions

Let's now see how we can enhance our application and leverage the Kinect sensor's Natural User Interface (NUI) capabilities.

We will implement a manager that, using the skeleton data, is able to interpret a body motion or a posture and translate it into an action such as a "click". Similarly, we could create other actions such as "zoom in". Unfortunately, the Kinect for Windows SDK does not provide APIs for recognizing gestures, so we need to develop our own gesture recognition engine.

Gesture detection can be relatively simple or intensely complex depending on the gesture and the environment (image noise, scenes with multiple users, and so on).

In the literature there are many approaches to implementing gesture recognition; the most common ones are as follows:

  • Neural networks, which use weighted networks (Gestures and neural networks in human-computer interaction, R. Beale and A. D. N. Edwards)
  • Dynamic Time Warping (DTW), an algorithm initially developed for speech recognition and signal processing (Space-Time Gestures, T. Darrell and A. Pentland; Spatial-Temporal Features by Image Registration and Warping for Dynamic Gesture Recognition, Y. Huang, Y. Zhu, G. Xu, and H. Zhang)
  • The Adaptive Template method (Adaptive Template Method for Early Recognition of Gestures, K. Kawashima, A. Shimada, H. Nagahara, and R. Taniguchi)
  • Hidden Markov Models (HMMs), which use statistical classification (Hidden Markov Model for Gesture Recognition, J. Yang and Y. Xu)
  • Hybrid approaches, which combine the previously mentioned techniques

In this chapter we are going to develop a gesture recognition manager based on an algorithmic approach that considers a gesture as a sequence of postures defined by the positions of the tracked joints. This algorithm is a simplification of the Adaptive Template method that uses the skeletal tracking data provided by the SDK. Being aware of the complexity of all the approaches listed previously, we decided to use the algorithmic approach because it is simple and suits the scope of this book.

We are going to develop an example where we select an area of the scene captured by the Kinect color camera by raising our left hand, and then drag the selected area by moving the right hand. The scope of this example is to demonstrate how we can translate a simple gesture into a command for the application. Hence, this example will demonstrate how we can use the Kinect sensor for leveraging NUI in our applications.

The algorithmic approach enables us to define and easily distinguish simple gestures, and it is optimal for tracking uniform movements. It does, however, face a few challenges in recognizing more complex gestures, such as a swing movement or drawing a circle.

Every single gesture can be decomposed into two or more sections (or segments). A section is a posture at a given time, and a single posture is defined as the location of a given joint with respect to other joints. For example, the gesture of raising the left hand could be defined by two sections: we start with our left and right hands alongside the body and lower than the shoulders; we finish with the left hand higher than the left shoulder while the right hand is still lower than the right shoulder.

For simplicity we will decompose our initial SelectionHandLeft gesture into two sections, which we define as the initial section and the final section. To recognize the SelectionHandLeft gesture we first have to recognize the initial section and then the final one. Every single section is validated, and the gesture is validated if and only if all the sections composing it are validated. Once the gesture has been validated, the manager notifies all the observers with the gesture recognition event.

As shown in the following code snippet, a single section is defined as a class that implements the IgestureSection interface:

//GESTURE'S SECTION INTERFACE
interface IgestureSection
{
    // Validates the section against the current skeleton frame
    GestureSectionCheck Check(Skeleton skeleton);
}

The initial and final sections of the "Selection Hand Left" gesture are defined by the custom classes SelectionGestureHandLeftSTARTsection and SelectionGestureHandLeftENDsection, shown in the following code snippets:

class SelectionGestureHandLeftSTARTsection : IgestureSection
{
    /// <summary> Validates a gesture's section
    /// </summary>
    /// <param name="skeleton">skeleton stream data</param>
    /// <returns>'ok' if the section is validated, 'ko' otherwise</returns>
    public GestureSectionCheck Check(Skeleton skeleton)
    {
        // Starting posture: the right hand is kept below the right shoulder
        // and the left hand has not yet been pushed forward (it is less than
        // 0.30 m closer to the sensor than the left shoulder)
        if (skeleton.Joints[JointType.HandLeft].Position.X <
            skeleton.Joints[JointType.ShoulderLeft].Position.X &&
            skeleton.Joints[JointType.HandRight].Position.Y <
            skeleton.Joints[JointType.ShoulderRight].Position.Y &&
            skeleton.Joints[JointType.HandLeft].Position.Z >
            skeleton.Joints[JointType.ShoulderLeft].Position.Z - 0.30f)
        {
            return GestureSectionCheck.ok;
        }
        return GestureSectionCheck.ko;
    }
}
 
class SelectionGestureHandLeftENDsection : IgestureSection
{
    public GestureSectionCheck Check(Skeleton skeleton)
    {
        // Final posture: the left hand has been pushed more than 0.30 m
        // towards the sensor relative to the left shoulder, while the
        // right hand stays below the right shoulder
        if (skeleton.Joints[JointType.HandLeft].Position.X <
            skeleton.Joints[JointType.ShoulderLeft].Position.X &&
            skeleton.Joints[JointType.HandRight].Position.Y <
            skeleton.Joints[JointType.ShoulderRight].Position.Y &&
            skeleton.Joints[JointType.HandLeft].Position.Z <
            skeleton.Joints[JointType.ShoulderLeft].Position.Z - 0.30f)
        {
            return GestureSectionCheck.ok;
        }
        return GestureSectionCheck.ko;
    }
}

Note

Splitting a single gesture into discrete sections increases the reliability of selecting and recognizing the right gesture.

We can improve the gesture recognition algorithm by increasing the number of states managed by the Check method, allowing a more detailed analysis of the intermediate movements.
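As a minimal sketch of this idea (the inProgress value, the 0.10 m threshold, and the class name below are assumptions, not part of the chapter's code), the GestureSectionCheck enumeration could gain an intermediate state that a section returns while the hand is still travelling towards its final posture:

//STATE OF GESTURE'S SECTION CHECK, EXTENDED WITH AN INTERMEDIATE VALUE (sketch)
public enum GestureSectionCheck
{
    ko,          // the posture contradicts the gesture: reset it
    inProgress,  // hypothetical state: the joints are moving towards the posture
    ok           // the posture has been reached: move to the next section
}

//HYPOTHETICAL final section reporting the intermediate state
class SelectionGestureHandLeftENDsectionWithProgress : IgestureSection
{
    public GestureSectionCheck Check(Skeleton skeleton)
    {
        float handZ = skeleton.Joints[JointType.HandLeft].Position.Z;
        float shoulderZ = skeleton.Joints[JointType.ShoulderLeft].Position.Z;

        bool leftHandBesideShoulder =
            skeleton.Joints[JointType.HandLeft].Position.X <
            skeleton.Joints[JointType.ShoulderLeft].Position.X;
        bool rightHandDown =
            skeleton.Joints[JointType.HandRight].Position.Y <
            skeleton.Joints[JointType.ShoulderRight].Position.Y;

        if (!leftHandBesideShoulder || !rightHandDown)
        {
            return GestureSectionCheck.ko;          // posture broken
        }
        if (handZ < shoulderZ - 0.30f)
        {
            return GestureSectionCheck.ok;          // hand fully pushed forward
        }
        if (handZ < shoulderZ - 0.10f)              // 0.10 m threshold is an assumption
        {
            return GestureSectionCheck.inProgress;  // still travelling forward
        }
        return GestureSectionCheck.ko;
    }
}

With a third state in play, the frame counter in the Gesture class shown next becomes meaningful: a section returning the intermediate state is given up to 60 frames to complete before the whole gesture is reset.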

The following code snippet defines our base class for the gestures:

class Gesture
{
    private IgestureSection[] gestureSections;
    private int counterGestureSection = 0;
    private int counterFrame = 0;
    private GestureType gestureType;
    public event EventHandler<GestureEventArgs> GestureRecognized;

    public Gesture(IgestureSection[] gestureSections, GestureType gestureType)
    {
        this.gestureSections = gestureSections;
        this.gestureType = gestureType;
    }

    public void Update(Skeleton data)
    {
        // Validate the current section against the new skeleton frame
        GestureSectionCheck check =
            this.gestureSections[this.counterGestureSection].Check(data);
        if (check == GestureSectionCheck.ok)
        {
            if (this.counterGestureSection + 1 < this.gestureSections.Length)
            {
                // Section validated: move on to the next one
                this.counterGestureSection++;
                this.counterFrame = 0;
            }
            else
            {
                // Last section validated: the whole gesture is recognized
                if (this.GestureRecognized != null)
                {
                    this.GestureRecognized(this,
                        new GestureEventArgs(this.gestureType));
                    this.Reset();
                }
            }
        }
        else if (check == GestureSectionCheck.ko || this.counterFrame == 60)
        {
            // Wrong posture (or timeout): start over from the first section
            this.Reset();
        }
        else
        {
            // Intermediate state (if any): wait for up to 60 frames
            this.counterFrame++;
        }
    }

    public void Reset()
    {
        this.counterGestureSection = 0;
        this.counterFrame = 0;
    }
}

public enum GestureType
{
    NoGesture,
    SelectionGestureHandLeft
    //ADD OTHER GESTURE TYPES
}

//STATE OF GESTURE'S SECTION CHECK
public enum GestureSectionCheck
{
    ko,
    ok
}

 
class GestureEventArgs : EventArgs
{
    public GestureType GestureType
    {
        get;
        set;
    }

    public GestureEventArgs(GestureType gestureType)
    {
        this.GestureType = gestureType;
    }
}

In the following code we define the gesture manager. In the GestureManager class constructor we register, in a single collection, all the different gestures that the manager is going to handle, each built from the sections composing it. As previously stated, in our example, the "Selection Hand Left" gesture is composed of two sections only:

class GestureManager
{
    private List<Gesture> gestures = new List<Gesture>();

    public event EventHandler<GestureEventArgs> GestureRecognized;

    public GestureManager()
    {
        IgestureSection[] SelectionSectionsHandLeft = new IgestureSection[2];
        SelectionSectionsHandLeft[0] = new SelectionGestureHandLeftSTARTsection();
        SelectionSectionsHandLeft[1] = new SelectionGestureHandLeftENDsection();
        Add(SelectionSectionsHandLeft, GestureType.SelectionGestureHandLeft);

        //ADD HERE OTHER GESTURES
    }

    public void Add(IgestureSection[] gestureSections, GestureType gestureType)
    {
        Gesture gesture = new Gesture(gestureSections, gestureType);
        gesture.GestureRecognized += gesture_GestureRecognized;
        this.gestures.Add(gesture);
    }

    void gesture_GestureRecognized(object sender, GestureEventArgs e)
    {
        // Forward the event to the application and reset every gesture
        if (this.GestureRecognized != null)
        {
            this.GestureRecognized(this, e);
        }

        ResetAllGestures();
    }

    public void UpdateAllGestures(Skeleton data)
    {
        foreach (Gesture gesture in this.gestures)
        {
            gesture.Update(data);
        }
    }

    public void ResetAllGestures()
    {
        foreach (Gesture gesture in this.gestures)
        {
            gesture.Reset();
        }
    }
}

To utilize the gesture manager in our application, we need to declare and instantiate a private GestureManager gestureManager variable and subscribe to the GestureManager.GestureRecognized event with a gestureManager_GestureRecognized event handler. A minimal sketch of this wiring could look as follows (exactly where the instantiation lives depends on how the application is structured; here we assume it happens in the main window's initialization code):
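// Field of the class hosting the Kinect logic (here assumed to be the main window)
private GestureManager gestureManager;

// ... in the initialization code, before the sensor starts streaming:
this.gestureManager = new GestureManager();
this.gestureManager.GestureRecognized += gestureManager_GestureRecognized;

The event handler itself dispatches on the recognized gesture type: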

void gestureManager_GestureRecognized(object sender, GestureEventArgs e)
{
    switch (e.GestureType)
    {
        case GestureType.SelectionGestureHandLeft:
            // THINGS TO DO WHEN THIS GESTURE IS RECOGNIZED
            break;
        // ADD HERE ALL THE GESTURES TO BE MANAGED
        default:
            break;
    }
}

Finally, any time the skeleton data stream provides a new frame, we need to update the gesture manager so that it analyzes the frame and detects potential new sections or gestures. We add the following code snippet at the end of the sensor_AllFramesReady event handler:

// update the gesture manager
if (this.skeletonData != null)
{
    foreach (var skeleton in this.skeletonData)
    {
        if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
            continue;
        gestureManager.UpdateAllGestures(skeleton);
    }
}
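For context, the skeletonData array used above is the buffer filled from the skeleton frame earlier in the same handler. The following is only a sketch of how that part of sensor_AllFramesReady might look (the buffer allocation strategy is an assumption; the chapter's own handler may differ):

private Skeleton[] skeletonData;

void sensor_AllFramesReady(object sender, AllFramesReadyEventArgs e)
{
    // ... color and depth frames handled here, as earlier in the chapter ...

    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame != null)
        {
            // Allocate (or re-allocate) the buffer and copy the skeleton data
            if (this.skeletonData == null ||
                this.skeletonData.Length != skeletonFrame.SkeletonArrayLength)
            {
                this.skeletonData = new Skeleton[skeletonFrame.SkeletonArrayLength];
            }
            skeletonFrame.CopySkeletonDataTo(this.skeletonData);
        }
    }

    // update the gesture manager (snippet shown above)
}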

Joint rotations

We may face some scenarios where a given action needs to be designed according to joint rotations. The Kinect sensor is able to capture the hierarchical (rotation of the joint axis with regard to its parent joint) and absolute (rotation of the joint axis with regard to the Kinect sensor) joint rotations.

Discussing joint rotations in depth goes beyond the scope of this book; we recommend you review the SDK v1.6 documentation for a complete description of joint rotations (http://msdn.microsoft.com/en-us/library/hh973073.aspx).
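Just as a quick taste of the API (a minimal sketch for a tracked Skeleton variable named skeleton; see the documentation above for the details), both rotations are exposed through the Skeleton.BoneOrientations collection:

// Read the rotation data for the bone ending at the left elbow
BoneOrientation elbow = skeleton.BoneOrientations[JointType.ElbowLeft];

// Hierarchical rotation: relative to the parent joint in the skeleton hierarchy
Vector4 hierarchicalQuaternion = elbow.HierarchicalRotation.Quaternion;

// Absolute rotation: relative to the Kinect sensor's coordinate system
Matrix4 absoluteRotationMatrix = elbow.AbsoluteRotation.Matrix;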

During our tests we noticed some noise in the joint positions streamed by the skeletal tracking system. An important step for improving the quality of skeletal tracking is to use a noise reduction filter. Applying the filter before analyzing the skeletal tracking data helps to remove part of the noise from the joint data. Such filters are called smoothing filters and the process is called skeletal joint smoothing. A full and in-depth study of skeletal joint smoothing is available in the Microsoft white paper at http://msdn.microsoft.com/en-us/library/jj131429.aspx.

At this stage we can share the smoothing parameters we have tested for optimizing joint rotation recognition in some of the proofs of concept we developed in the past:

// Typical smoothing parameters for the bone orientations:
var boneOrientationSmoothParameters = new TransformSmoothParameters
{
    Smoothing = 0.5f,
    Correction = 0.8f,
    Prediction = 0.75f,
    JitterRadius = 0.1f,
    MaxDeviationRadius = 0.1f
};

// Enable skeletal tracking with smoothing
sensor.SkeletonStream.Enable(boneOrientationSmoothParameters);
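As a rough guide from the SDK documentation: higher Smoothing values return more heavily smoothed, but more latent, positions; Correction controls how quickly the filter moves back towards the raw data; Prediction is the number of frames predicted into the future; and JitterRadius and MaxDeviationRadius (both in meters) bound, respectively, the jitter that gets clamped and how far the filtered positions may deviate from the raw ones.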

Using the Kinect sensor as a Natural User Interface device

Note

The source code attached to this chapter provides a fully functional example, where we demonstrate how simple user actions can be combined to address a real scenario and utilize the Kinect sensor as a Natural User Interface for a complex application.

In the proposed example, we select a portion of the color camera stream data by raising our left or right hand. The selected portion of the color stream data can then be dragged within the field of view using the other hand (the right hand if we selected using the left one, and vice versa).
