Since my last post several months back (about the PrimeSense device), I've been doing a lot of work with the Kinect. I've started work on a library that I'm calling NuiDotNet, which I plan to put on CodePlex. It helps you build applications that integrate NUI through the Kinect, PrimeSense, WiiMote and voice recognition. As part of trying to get the word out about it, I've decided it's time to start posting about some of the work that I've done with it. I'm also speaking this weekend at Houston Tech Fest 2011 and want to get content out for anyone who attends my session. If you are there, please do come by, as I'd love your input.
The first video, which I'll go over right now, shows the Kinect doing hand tracking. I've jumped way ahead here, well past things like skeleton tracking. Hand tracking is not something provided by the Kinect or its SDK (or OpenNI), so it involves a lot of custom programming that I will go over in this and subsequent posts. In this video I have placed my right hand about 1.5m from the front of the sensor and am running a simple piece of code written with NuiDotNet. I open and close the hand, and extend and retract several of the fingers, to show the process in action.
To do this, there are a number of tasks that must be undertaken, on a frame-by-frame basis:
- Get the depth image from the Kinect
- Extract an appropriate view volume from the depth image
- Perform point cluster analysis on the view volume
- Generate an outline from the identified cluster
- Determine center of mass of the cluster
- Perform a modified K-Curvature algorithm to determine the location of the fingers
- From the finger information and K-Curvature results perform a least squares fit of finger outline to determine the direction that the finger is pointing
- Render the visual in WPF
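To give a feel for the K-Curvature step in the list above, here is a minimal sketch of the textbook form of the algorithm, not the modified version in NuiDotNet. It is written in Python purely for illustration, and every name in it (`centroid`, `k_curvature_angles`, `fingertip_indices`) is hypothetical. It assumes the outline is a closed, ordered list of 2D points, and it uses the plain mean of the points as a stand-in for the center of mass.

```python
import math

def centroid(points):
    """Mean of the points: a simple stand-in for the cluster's center of mass."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def k_curvature_angles(contour, k):
    """Angle (radians) at each contour point between the vectors to the
    points k steps behind and k steps ahead on the closed contour."""
    n = len(contour)
    angles = []
    for i, (px, py) in enumerate(contour):
        ax, ay = contour[(i - k) % n]
        bx, by = contour[(i + k) % n]
        v1x, v1y = ax - px, ay - py
        v2x, v2y = bx - px, by - py
        norm = math.hypot(v1x, v1y) * math.hypot(v2x, v2y)
        if norm == 0.0:
            angles.append(math.pi)  # degenerate neighbours: treat as flat
            continue
        cos_t = (v1x * v2x + v1y * v2y) / norm
        angles.append(math.acos(max(-1.0, min(1.0, cos_t))))
    return angles

def fingertip_indices(contour, k, max_angle_deg, center):
    """Indices of contour points that are sharp, are the sharpest point in
    their +/-k window, and stick out away from the center (a sharp point
    that sits closer to the center is a valley between fingers)."""
    n = len(contour)
    angles = k_curvature_angles(contour, k)
    limit = math.radians(max_angle_deg)
    tips = []
    for i, angle in enumerate(angles):
        if angle >= limit:
            continue  # not sharp enough
        window = [angles[(i + j) % n] for j in range(-k, k + 1)]
        if angle > min(window):
            continue  # a sharper point exists nearby
        ax, ay = contour[(i - k) % n]
        bx, by = contour[(i + k) % n]
        mid = ((ax + bx) / 2, (ay + by) / 2)
        if math.dist(contour[i], center) > math.dist(mid, center):
            tips.append(i)
    return tips

# Toy outline: two "fingers" at (4, 6) and (0, 6) with a valley at (2, 1).
outline = [(0, 0), (4, 0), (4, 6), (2, 1), (0, 6)]
tips = fingertip_indices(outline, k=1, max_angle_deg=60, center=centroid(outline))
```

On a real outline with hundreds of points, both `k` and the angle threshold need tuning to the size of the hand in the image; the values used here only suit the toy outline.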
So what does the code to do this look like? I've tried to make it very simple (above the covers) with NuiDotNet. The XAML is the following:

Very easy. One canvas, 'canvas', which is for the next demo, and one canvas, 'handCanvas', where NuiDotNet will render the hand. The following is the entirety of the code for the rest of the window, which I hope shows the simplicity that I'm striving for in NuiDotNet:

Everything in NuiDotNet involves using some type of [Device]NuiDataSourceFactory, which will create various forms of NUI-based data streams for your application. Line 41 creates one for the Kinect. There is actually a layer of abstraction available that will even hide the specific devices from the application, allowing Kinect and PrimeSense (OpenNI) devices to be specified via configuration, but that's for another post.
Most things in NuiDotNet then revolve around getting a DepthDataSource object, which represents the depth sensor data stream from the device. This is created in line 42. Once a depth sensor stream is available, it can be passed into a number of other strategy objects or other data sources. In this case, it is passed into a factory method that creates a HandDataSource (line 46), which does all the work to track a hand given a depth stream.
The hand data source takes other parameters as well: a clustering parameter and a hand parameter. The clustering parameter defines how to look at the depth data to find a hand. This particular subclass will find the point nearest the sensor and construct a view volume from that point to 500mm further back in the depth stream. The 75 parameter effectively specifies the "floor", below which the data is ignored; this is useful if you are sitting at a desk like I am. The hand parameter specifies various options for the K-Curvature algorithm, which I'll not get into at this point.
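The nearest-point view volume described above can be sketched in a few lines. This is an illustration in Python rather than NuiDotNet code, and `extract_view_volume` and its parameters are hypothetical names. It assumes the depth frame is a list of rows of depth values in millimetres, with 0 meaning "no reading". It also assumes the "floor" is a number of pixel rows at the bottom of the image to skip, which is only one possible reading of that parameter.

```python
def extract_view_volume(depth_mm, depth_range_mm=500, floor_rows=0):
    """Return (nearest_mm, points): the nearest valid depth and the list of
    (x, y, z) pixels lying between that depth and depth_range_mm behind it.

    floor_rows bottom rows are ignored (assumed meaning of the "floor").
    """
    usable_rows = len(depth_mm) - floor_rows
    valid = [
        (x, y, z)
        for y in range(usable_rows)
        for x, z in enumerate(depth_mm[y])
        if z > 0  # 0 is treated as "no depth reading"
    ]
    if not valid:
        return None, []
    nearest = min(z for _, _, z in valid)
    far = nearest + depth_range_mm
    return nearest, [(x, y, z) for (x, y, z) in valid if z <= far]

# Tiny 3x3 frame: the bottom row stands in for a desk close to the sensor.
frame = [
    [0,    2500, 1500],
    [1600, 1900, 2100],
    [900,  900,  900],
]
nearest, points = extract_view_volume(frame, depth_range_mm=500, floor_rows=1)
```

Without the floor cut-off, the desk at 900mm would be the nearest point and the volume would be built around it instead of the hand, which is why that parameter matters when sitting at a desk.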
NuiDotNet then provides a number of visualizers that can be used to render various data streams. In this scenario, in lines 48 - 54, a hand visualizer is created, specifying what canvas to render to, which hand data source to render, and which visual elements of the hand are to be drawn; in this case, I want to see the center of the palm, finger tips (the blue dots), finger tip rays (blue vectors/rays extending from the finger), and the contour of the hand.
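The finger tip rays need a direction for each finger, which the step list earlier attributes to a least squares fit of the finger outline. As a rough sketch of that idea, the Python below fits a line to a finger's outline points using an orthogonal (total) least squares fit, which is my choice for the illustration and not necessarily the fit NuiDotNet uses. The name `finger_direction` is hypothetical. The fitted line has no inherent direction, so the sketch orients it to point from the middle of the finger toward the tip.

```python
import math

def finger_direction(outline_points, tip):
    """Unit vector along the best-fit line through outline_points,
    oriented toward tip. Uses the principal axis of the 2x2 scatter
    matrix, which minimises perpendicular distances to the line."""
    n = len(outline_points)
    mx = sum(x for x, _ in outline_points) / n
    my = sum(y for _, y in outline_points) / n
    sxx = sum((x - mx) ** 2 for x, _ in outline_points)
    syy = sum((y - my) ** 2 for _, y in outline_points)
    sxy = sum((x - mx) * (y - my) for x, y in outline_points)
    # Angle of the principal axis of the scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # Flip if the axis points away from the fingertip.
    if dx * (tip[0] - mx) + dy * (tip[1] - my) < 0:
        dx, dy = -dx, -dy
    return dx, dy

# A finger two units wide pointing along +y, with its tip at (1, 3).
finger = [(0, 0), (0, 1), (0, 2), (2, 0), (2, 1), (2, 2), (1, 3)]
direction = finger_direction(finger, tip=(1, 3))
```

An orthogonal fit is used here because an ordinary regression of y on x breaks down for a finger that points straight up or down in the image.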
It's that simple :)
To close this post, the following video shows a small extension of the application that maps the hand movement to the entire window and uses the pointing vector and hand location to determine whether the ray from the finger tip would intersect any of the rectangles in the window. This effectively shows how hand tracking can be used to select items in the application without actually grabbing (grabbing itself is supported by NuiDotNet and will be covered in a subsequent post):
This is a fairly simple extension using NuiDotNet constructs. I'm not going to get into the details of ray intersections with rectangles, but it is done by hooking into the hand data source's NewHandDataAvailable event. The event is passed a hand object, which has properties such as PalmCenter and FingerTips, and each FingerTip object also has a vector representing the direction in which it is pointing. When handling this event, it is then possible to determine with ray/line intersection algorithms which item the finger is pointing at.
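For readers who do want a starting point for the intersection test, here is a minimal sketch of one common approach, the "slab" method for a ray against an axis-aligned rectangle. It is written in Python for illustration and is not the code behind the video. The names `ray_rect_distance` and `pick_target` are hypothetical, and rectangles are assumed to be `(left, top, width, height)` tuples in the same coordinate space as the finger tip.

```python
import math

def ray_rect_distance(origin, direction, rect):
    """Distance t >= 0 along the ray origin + t * direction at which it
    first touches the axis-aligned rect, or None if it misses.
    Returns 0.0 when the origin is already inside the rect."""
    left, top, width, height = rect
    t_min, t_max = 0.0, math.inf
    slabs = (
        (origin[0], direction[0], left, left + width),
        (origin[1], direction[1], top, top + height),
    )
    for o, d, lo, hi in slabs:
        if d == 0:
            # Ray is parallel to this slab: it must already lie within it.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_min = max(t_min, t1)
        t_max = min(t_max, t2)
        if t_min > t_max:
            return None
    return t_min

def pick_target(origin, direction, rects):
    """Index of the rect the ray hits first, or None if it hits nothing."""
    best_index, best_t = None, math.inf
    for index, rect in enumerate(rects):
        t = ray_rect_distance(origin, direction, rect)
        if t is not None and t < best_t:
            best_index, best_t = index, t
    return best_index

# Finger tip at the origin pointing along +x; the second rect is hit first.
rects = [(10, -1, 2, 2), (5, -1, 2, 2), (5, 5, 1, 1)]
selected = pick_target((0, 0), (1, 0), rects)
```

In an event handler, the origin and direction would come from the finger tip position and pointing vector on the hand object, and the rectangles from the bounds of the items in the window.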