In November 2010, Microsoft® introduced Kinect™. As an add-on for the Xbox 360™ gaming console, it brings controller-free gaming to the living room, and long before its actual release it was expected to revolutionize human-computer interaction. Expectations were therefore rather high, and many felt reminded of the Natural User Interface (NUI) featured in the movie Minority Report. Will this futuristic vision soon become reality?
Kinect makes it possible to control a system with body movements and gestures. The package contains an inexpensive hardware combination consisting of a range camera, an RGB camera, and a 3-D microphone, as well as special software. At first, its field of application was restricted to Xbox 360. Homebrew drivers were available very early, however, and in June 2011 an official non-commercial beta of the Kinect SDK for Windows was released.
The Kinect sensor (Source)
Sales of 8 million devices in the first 60 days after launch clearly show Kinect’s impact on the gaming community. One of the impressive and unusual aspects of modern interaction concepts such as gesture- and touch-based interaction is the direction in which they evolve: in the past, new technology was typically introduced in the business (or military) sector before entering the private domain. Here, the order appears to be reversed.
Hardware and Software
Kinect is connected to Xbox 360 via a special port. Alternatively, it can be hooked up to a computer via USB using a special adapter, which is included in the package. Once the SDK is installed, Kinect can register up to six people at the same time. However, skeletal information is reconstructed for only two of them. For this purpose, the system generates 20 marker points per person and follows them as the person moves (“tracking”). As illustrated in the image below, the marker points correspond to the most important joints of the human body.
Application window for “SkeletalViewer” (Source)
According to the data sheet, the depth sensor, and consequently skeletal tracking, works at distances of 1.2 meters or more. At a distance of 1.2 meters, Kinect covers an area 1.3 meters wide. This area grows with increasing distance up to the maximum of 3.5 meters, where it is 3.8 meters wide. The following image shows these values in proportion to a person.
Detection range of the depth sensor (Source)
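The two width values quoted from the data sheet imply a nearly constant horizontal opening angle. The following minimal sketch, assuming a simple pinhole model, derives that angle from the quoted numbers and estimates the width of the sensing area at other distances. The function and constant names are illustrative and not part of the Kinect SDK.

```python
import math

# Values quoted in the text from the data sheet.
NEAR_DISTANCE_M = 1.2
NEAR_WIDTH_M = 1.3
FAR_DISTANCE_M = 3.5
FAR_WIDTH_M = 3.8


def horizontal_fov_deg(distance_m: float, width_m: float) -> float:
    """Opening angle that covers width_m at distance_m (pinhole assumption)."""
    return math.degrees(2 * math.atan((width_m / 2) / distance_m))


def sensing_width_m(distance_m: float, fov_deg: float) -> float:
    """Width of the sensing area at distance_m for a given opening angle."""
    if not NEAR_DISTANCE_M <= distance_m <= FAR_DISTANCE_M:
        raise ValueError("distance is outside the quoted sensing range")
    return 2 * distance_m * math.tan(math.radians(fov_deg) / 2)


fov_near = horizontal_fov_deg(NEAR_DISTANCE_M, NEAR_WIDTH_M)
fov_far = horizontal_fov_deg(FAR_DISTANCE_M, FAR_WIDTH_M)
print(f"opening angle: {fov_near:.1f} deg (near), {fov_far:.1f} deg (far)")
print(f"width at 2.0 m: {sensing_width_m(2.0, fov_near):.2f} m")
```

Both data points yield an angle of roughly 57 degrees, so under this assumption the sensing width at any distance within the range can be approximated from the distance alone.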
The best tracking results are achieved by standing upright directly in front of the sensor. Apart from the skeletal structure, the sensor also provides an RGB image of 640×480 pixels as well as a depth image of 320×240 pixels. In addition, the sensor contains a high-resolution 3-D microphone for voice recognition and for localizing sound sources.
UI-Interaction in Kinect Games
The aim of this article is to assess Kinect’s potential for industrial use. For this reason, it is helpful to learn from existing NUI solutions that are built on Kinect. An article by Andrew Webster and another by Jakob Nielsen serve as the basis for the following section.
One of the first developers to work with Kinect was Harmonix® Music Systems, with their game DanceCentral™. The developers faced the problem of creating a menu system that would replace the usual gamepad solutions while also being fun to use. It was supposed to deliver an intuitive and understandable navigation experience. At that point, no examples or references existed for a system like this. Early on, they realized that it was not efficient to invest much time and work up front in detailed descriptions of possible interaction concepts or metaphors. Instead, the most effective and direct way was to implement ideas rapidly as small prototypes and test their viability. This approach matches the philosophy of Centigrade, as explained in the article UI Prototyping. The following paragraphs classify the different types of interaction known from various Kinect games.
Desktop in the Air
In the context of menus and navigation, users are familiar with “point and click” interaction. With Kinect, the “mouse cursor” is moved with the player’s hands. Selecting a menu entry proved to be a problem, however, since hand movement only triggers a hover event and no click event per se. To compensate for this shortcoming, several confirmation interactions were developed.
The first idea for confirmation is based on pushing a button in the real world. The player moves his hand over the menu entry or points at it. To select the button, he moves his hand forward as if he were really pushing it. However, pushing into the air without any resistance, i.e. without a pressure point (haptic feedback), felt awkward. Moreover, the position of the cursor may change during the push, so that the wrong button is registered.
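The drift problem can be made concrete with a small sketch. It assumes hand positions arrive as a short window of samples in meters, with z being the distance from the sensor. The `HandSample` type, the `detect_push` function, and the 0.15 m threshold are hypothetical choices for illustration and do not come from the Kinect SDK or from Harmonix.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HandSample:
    x: float  # horizontal position in meters
    y: float  # vertical position in meters
    z: float  # distance from the sensor in meters


def detect_push(samples: List[HandSample],
                min_depth_m: float = 0.15) -> Tuple[bool, float]:
    """Return (pushed, drift_m) for a short window of hand samples.

    pushed is True if the hand moved at least min_depth_m toward the sensor.
    drift_m is the sideways distance between the first and last sample,
    i.e. how far the cursor would have wandered during the push.
    """
    if len(samples) < 2:
        return False, 0.0
    first, last = samples[0], samples[-1]
    pushed = (first.z - last.z) >= min_depth_m
    drift_m = ((last.x - first.x) ** 2 + (last.y - first.y) ** 2) ** 0.5
    return pushed, drift_m


window = [
    HandSample(0.10, 0.20, 2.00),
    HandSample(0.12, 0.21, 1.90),
    HandSample(0.14, 0.23, 1.80),
]
pushed, drift_m = detect_push(window)
print(pushed, round(drift_m, 3))
```

In this example the push is recognized, but the hand has also moved about 5 cm sideways, which may be enough to land on a neighboring menu entry.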
In a second approach, a confirmation button appears after the primary selection of a menu entry. The selection is only accepted once the player activates this button. If the player does not want to confirm, he simply moves his hand away from the menu and the confirmation button disappears. This kind of confirmation is used in the game “Your Shape”, for instance.
Another way to confirm a selection is a countdown. Whenever the user moves his hand over a button, a circle appears and gradually fills. The circle represents the countdown, and the action is triggered once the countdown has elapsed (i.e. the circle is completely filled). The countdown runs as long as the user keeps his hand on the button. If the user removes his hand before the countdown has finished, the confirmation is canceled. This mechanism prevents accidental activation caused by a misinterpreted movement.
Unfortunately, hitting a button precisely was a problem for some users. To compensate for inaccuracies, a magnetic button was developed: as soon as the cursor gets close to a button, it is drawn to the target.
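The countdown logic can be sketched as a small state machine that is updated once per frame. The class name `DwellButton` and the default duration of two seconds are assumptions for illustration; the games mentioned here may use different timings.

```python
from typing import Optional, Tuple


class DwellButton:
    """Fires once after the cursor has hovered for dwell_s seconds."""

    def __init__(self, dwell_s: float = 2.0) -> None:
        self.dwell_s = dwell_s
        self.hover_started_at: Optional[float] = None
        self.fired = False

    def update(self, hovering: bool, now_s: float) -> Tuple[float, bool]:
        """Process one frame; return (fill fraction of the circle, fire_now)."""
        if not hovering:
            # The hand left the button: cancel the countdown and re-arm.
            self.hover_started_at = None
            self.fired = False
            return 0.0, False
        if self.hover_started_at is None:
            self.hover_started_at = now_s
        progress = min((now_s - self.hover_started_at) / self.dwell_s, 1.0)
        # Fire only on the frame in which the circle becomes full.
        fire_now = progress >= 1.0 and not self.fired
        if fire_now:
            self.fired = True
        return progress, fire_now


button = DwellButton(dwell_s=2.0)
for hovering, now_s in [(True, 0.0), (True, 1.0), (False, 1.5), (True, 2.0), (True, 4.0)]:
    print(now_s, button.update(hovering, now_s))
```

The returned fill fraction can drive the circle animation directly, and removing the hand at any point resets the countdown to zero.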
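One simple way to approximate such a magnetic button is to snap the cursor to the nearest button center once it comes within a fixed radius. This is only a sketch: the function `snap_cursor` and the radius of 60 pixels are hypothetical, and an actual implementation may pull the cursor in gradually rather than snapping it.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def snap_cursor(cursor: Point,
                button_centers: Sequence[Point],
                snap_radius: float = 60.0) -> Point:
    """Return the nearest button center within snap_radius, else the cursor."""
    nearest = min(button_centers,
                  key=lambda center: math.dist(cursor, center),
                  default=None)
    if nearest is not None and math.dist(cursor, nearest) <= snap_radius:
        return nearest
    return cursor


buttons = [(100.0, 100.0), (400.0, 100.0)]
print(snap_cursor((130.0, 140.0), buttons))  # close to the first button
print(snap_cursor((250.0, 300.0), buttons))  # far from both buttons
```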
Confirmation with Gestures
Beyond GUIs and their dedicated buttons, NUIs often rely on small gestures. One example is the metaphor of “pulling a button”: once the user’s hand rests on a button, he pulls his outstretched arm back toward his body, and this gesture confirms his choice. Alternatively, the user may wave his hand in a certain direction, as can be seen in the game “DanceCentral”.
One of Kinect’s most popular gestures is the swipe. It is used, for example, to navigate through a series of screens: moving the entire arm slides the next screen into view. The gesture imitates wiping one screen away while pulling another one in. While it was introduced successfully on mobile phones, where just one finger is needed, horizontal swiping with the entire arm raises some issues. The Harmonix developers found that every user had his very own way of performing a swipe. Teaching the system the different variations of this movement proved very time-consuming, and it seemed impossible to cover every variation. Therefore, it was decided to train the user instead. Whether this is pure “user-centered design” is debatable, as the user is trained instead of the system being designed around him. Sometimes, however, it is legitimate to train the user to some degree, namely when an interaction allows so many degrees of freedom that a standardized solution cannot be offered.
Another interesting idea for navigation and interaction is the use of real-world metaphors: objects are carried over into the application together with their real-world functionality and behavior. One such attempt, which unfortunately exceeded the limits of the system, was the “The Price Is Right” gesture from Harmonix. It tried to imitate spinning a wheel, as known from the TV show. Due to Kinect’s low skeletal resolution, however, the system cannot identify the exact moment when the player releases the wheel. Mapping an object to a gesture is also done in racing games in the form of the steering wheel. The player controls the vehicle by imitating steering motions with his fists; swiveling the hips initiates drifting, and pushing the fists forward triggers a boost. Although the controls work, the absence of haptic and physical feedback from the vehicle is perceived as unnatural and disturbing. This example shows, however, that in contrast to menu navigation, manipulating objects inside the game environment is highly intuitive. This is because games make use of objects such as balls or sports equipment, which are used just like their real-world counterparts.
Another interesting feature, and the one with the biggest potential, is facial recognition. As soon as a player steps in front of the sensor, not only is his skeleton created, but his face is identified as well. Xbox 360 then automatically signs in to his profile with his self-created avatar, while Kinect calibrates itself based on the respective account data. Jakob Nielsen calls this type of interaction a “non-command interface”:
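The steering-wheel mapping can be illustrated with a short sketch that derives a wheel angle from the positions of the two fists. It assumes (x, y) coordinates in meters with y pointing up, as seen from the player. The function name and sign convention are illustrative assumptions and not taken from any particular racing game.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def steering_angle_deg(left_fist: Point, right_fist: Point) -> float:
    """Angle of the imaginary wheel in degrees; 0 when both fists are level.

    Positive values mean the right fist is lower than the left one,
    which corresponds to turning the wheel to the right.
    """
    dx = right_fist[0] - left_fist[0]
    dy = right_fist[1] - left_fist[1]
    return -math.degrees(math.atan2(dy, dx))


level = steering_angle_deg((-0.2, 1.0), (0.2, 1.0))
right_turn = steering_angle_deg((-0.2, 1.2), (0.2, 0.8))
left_turn = steering_angle_deg((-0.2, 0.8), (0.2, 1.2))
print(level, right_turn, left_turn)
```

Because only the relative position of the two fists matters, such a mapping tolerates the player moving around in front of the sensor, which may be one reason why this kind of object manipulation feels natural.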
“You don’t feel that you’re issuing commands to a computer; you simply go about your business the way you normally would, and the computer does what’s needed to complete its part of the task.”
After this overview of existing approaches to implementing UIs for Kinect, Part 2 of this article will offer a critical analysis of what has been presented and assess the application areas of Kinect in an industrial context.
Microsoft, Windows, Kinect and Xbox 360 are trademarks or registered trademarks of Microsoft Corporation in the US and/or other countries.
Harmonix and DanceCentral are trademarks or registered trademarks of Harmonix Music Systems, Inc. in the US and/or other countries.