The first part of this article provided an overview of the concepts of the currently implemented user interfaces for the Kinect™ sensor. It pointed out technical specifications and explained the human-machine interaction within Kinect games. This second part now scrutinizes this interaction and assesses its potential for industrial application.
Analysis of existing opportunities
Part 1 of this article described the navigation and interaction possibilities that Kinect already offers in games. The respective metaphors and gestures make a lot of sense in that context. Unlike business applications, Kinect games do not jump from one type of content to another. While playing, the user is absorbed in the game for a longer period of time and sees nothing of other applications or games (“immersion”). Since different games often have very different goals and activities, it is perfectly reasonable that control options differ between games: obviously, car racing and dancing have little in common.
When trying to transfer the interaction paradigms discussed in part 1 of this article into the business world, problems arise:
Presentation of the interaction
First of all, many gestures must be sweeping and highly visible. Over time, such interaction becomes stressful and cumbersome. Thus, in order not to exhaust the user, such gestures can only be used occasionally.
Lack of references and guidelines
The diversity of approaches within the general interaction concepts such as menus and menu navigation is counterproductive. There is still no standard that enables a generally accepted interaction or navigation for menu systems. This complicates interaction, because the user has to learn these basic operations for each application (game) again.
The Kinect system supports only one generic gesture, which prompts the system to pause the game. To perform it, the user holds both arms at his sides and then moves the left arm straight out at a 45-degree angle from his body. This gesture is not very intuitive. Apart from this pause gesture, there are no guidelines for further operations such as “back”, “delete” or “text input”. Ryan Challinor, interface developer for “DanceCentral™”, formulated an apt comparison:
“It would be like if your mouse worked differently with every program!”
This raises the question of why Microsoft®, as platform owner, has not conducted research of its own in order to pass on guidelines or a style guide to Kinect developers. Perhaps Microsoft wanted to leave this research to the industry to encourage a wide range of interactive solutions, so that proven concepts can establish themselves at a later date.
However, such user interface guidelines can help ensure that various developers adhere to a common standard and thus increase the learnability of a system. This topic is also discussed in the article User Interface Guidelines for Mobile Devices: Blessing or Curse?
Experimenting with the SDK also brought to light several technical limitations. As already mentioned, it is not possible to track finger movements. The Kinect represents a hand by only two points, which result in a single bone in the skeletal representation of the user. These points capture neither finger movements nor the orientation or position of the hand. So it cannot be recognized whether the hand is closed or open, or whether the palm is facing the sensor. On the upside, the hardware can already provide a depth image with a resolution of 640×480 pixels. However, this resolution is currently not used because of technical limitations. According to Eurogamer, one of the reasons for the restriction is the Xbox 360 USB interface: it allows a data transmission rate of 35 MB/s, of which only about 16 MB/s can be used. This artificial limit was introduced because multiple USB devices can be used at once on one Xbox 360. In the future, a higher resolution and further development of the SDK might make it possible to integrate finger tracking as well.
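A rough back-of-the-envelope calculation illustrates why the usable USB bandwidth matters. The sketch below assumes 2 bytes per depth pixel and 30 frames per second, and uses 320×240 as a lower resolution for comparison. None of these three values is taken from this article. It compares the raw depth data rate with the roughly 16 MB/s mentioned above and ignores the RGB stream, which would need bandwidth as well.

```python
# Rough estimate of the raw (uncompressed) depth-stream data rate.
# Assumptions, not figures from the article: 2 bytes per depth pixel, 30 frames/s.
BYTES_PER_PIXEL = 2
FRAMES_PER_SECOND = 30
USABLE_USB_MB_PER_S = 16.0  # usable share of the USB bandwidth quoted above


def depth_rate_mb_per_s(width: int, height: int) -> float:
    """Return the uncompressed depth data rate in MB/s (1 MB = 1,000,000 bytes)."""
    return width * height * BYTES_PER_PIXEL * FRAMES_PER_SECOND / 1_000_000


# 320x240 is an assumed lower resolution used only for comparison.
for width, height in [(320, 240), (640, 480)]:
    rate = depth_rate_mb_per_s(width, height)
    verdict = "fits within" if rate <= USABLE_USB_MB_PER_S else "exceeds"
    print(f"{width}x{height}: {rate:.1f} MB/s, {verdict} {USABLE_USB_MB_PER_S:.0f} MB/s")
```

Under these assumptions, the full 640×480 depth stream alone would already need more than the usable 16 MB/s, while the lower resolution leaves headroom for other data.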
The Kinect in industrial applications?
The issues described above show that it will be difficult to find suitable applications of Kinect in the industrial sector. Mainly, the sensor could play a role in applications where the entire body is at the center of attention. Especially in the development of accessible systems, this technology could enable new options.
Serious User Interfaces
Consider the interaction concept of the desktop metaphor for the Kinect. The mouse cursor can be controlled by the hands and perhaps by small gestures. Despite the disadvantages stemming from the lack of haptic feedback and from interacting in mid-air, the medical industry could benefit from this concept. It has long been trying to realize contactless interfaces, as the project Gestix of the Washington Hospital Center shows. In this domain, the Kinect sensor could be a large step forward. It would allow contact-free interaction with computer systems and thus prevent users from contaminating a sterile work environment or themselves.
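How a tracked hand could drive a cursor can be sketched in a few lines. The example below is a minimal illustration and not part of any Kinect SDK: the class name `CursorMapper` is hypothetical, and it assumes that the hand position has already been normalized to the range 0 to 1 in both axes. Because there is no physical surface to steady the hand, the sketch applies simple exponential smoothing to reduce jitter.

```python
class CursorMapper:
    """Hypothetical helper: maps a normalized hand position to screen pixels."""

    def __init__(self, screen_width: int, screen_height: int, smoothing: float = 0.5):
        self.screen_width = screen_width
        self.screen_height = screen_height
        # 0.0 disables smoothing; values close to 1.0 smooth heavily but add lag.
        self.smoothing = smoothing
        self._last = None  # last smoothed cursor position as (x, y) floats

    def update(self, hand_x: float, hand_y: float) -> tuple:
        """Return the cursor pixel for a hand position given in [0, 1] x [0, 1]."""
        # Clamp so that a hand outside the tracked area keeps the cursor on screen.
        hand_x = min(max(hand_x, 0.0), 1.0)
        hand_y = min(max(hand_y, 0.0), 1.0)
        target = (hand_x * (self.screen_width - 1), hand_y * (self.screen_height - 1))
        if self._last is None:
            self._last = target
        else:
            a = self.smoothing
            self._last = (
                a * self._last[0] + (1 - a) * target[0],
                a * self._last[1] + (1 - a) * target[1],
            )
        return round(self._last[0]), round(self._last[1])


mapper = CursorMapper(1920, 1080, smoothing=0.5)
for hand in [(0.50, 0.50), (0.52, 0.49), (0.90, 0.10)]:
    print(mapper.update(*hand))
```

The smoothing factor trades stability against responsiveness. A value suited to a real application would have to be found by prototyping.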
In environments in which protective clothing must be worn, this type of interaction could provide benefits, too. For example, if gloves should always be worn but must be taken off to make adjustments to a machine, there is a risk that users will not wear them at all. An interface that is easy to operate with gloves or bulky clothing would enhance safety. (It should be noted, though, that there are already touchscreen technologies that can be operated with gloves. These multi-touch displays use an infrared grid, i.e. optical touch technology. The Dell™ ST2220T is an example.)
Gestures in the industry
As mentioned above, problems with finger tracking have been identified. Apart from the official SDK, several projects are already striving to achieve improvements. A very successful prototype project, called Efficient model-based 3D tracking of hand articulations using Kinect, is being developed by Antonis A. Argyros at the University of Crete. This project is concerned not only with the marker-free detection and tracking of the 3D position and orientation of the entire human hand, but also with the recognition of the individual fingers and their joints.
Looking into the future, gesture support could be refined by appropriate middleware. Gestures could then be used for intuitive menu navigation in every imaginable context. Metaphors such as “drag and drop” would gain an even clearer meaning in terms of accessibility, and even sign language could serve as a solid pool of gestures!
Additional value through skeleton tracking
Despite the low resolution of the skeletal reconstruction of a captured user, the information obtained by the sensor could be used for ergonomic adjustments. Before a person steps up to a machine, their height is estimated and the machine adjusts automatically (e.g. regarding work surface height). Manual settings are then no longer necessary. This approach could be especially helpful for machines that are used regularly by different people.
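As a minimal sketch of such an adjustment, the following example derives a work surface height from skeleton data. Everything specific in it is an assumption made for illustration: the joint names, the coordinate convention (meters, y axis pointing up), the 10 cm offset below the elbow, and the mechanical limits of the machine. It is not an ergonomic recommendation.

```python
from statistics import mean

# Hypothetical joint names; each position is (x, y, z) in meters, y pointing up.
ELBOWS = ("elbow_left", "elbow_right")
FEET = ("foot_left", "foot_right")


def elbow_height(joints: dict) -> float:
    """Estimate standing elbow height as mean elbow y minus mean foot y."""
    elbow_y = mean(joints[name][1] for name in ELBOWS)
    floor_y = mean(joints[name][1] for name in FEET)
    return elbow_y - floor_y


def work_surface_height(joints: dict, offset_m: float = 0.10,
                        min_m: float = 0.70, max_m: float = 1.20) -> float:
    """Place the surface slightly below the elbow, within assumed machine limits."""
    target = elbow_height(joints) - offset_m
    return min(max(target, min_m), max_m)


# Made-up sample skeleton, for illustration only.
skeleton = {
    "elbow_left": (-0.25, 0.35, 2.0),
    "elbow_right": (0.25, 0.37, 2.0),
    "foot_left": (-0.10, -0.75, 2.0),
    "foot_right": (0.10, -0.73, 2.0),
}
print(f"Suggested work surface height: {work_surface_height(skeleton):.2f} m")
```

Measuring the elbow relative to the feet rather than to the sensor keeps the estimate independent of the mounting height of the sensor, as long as the feet are tracked.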
In another context, the Kinect could serve as an observation instrument. The paper Human Activity Detection from RGBD Images deals with the classification of movements based on skeletal data. The categorization of the movements of a person under observation can serve as an information source for robotic assistants that support a human with certain tasks in everyday life, like brushing teeth, cooking, working with a computer or talking on the phone.
In the domain of building security, instead of capturing a whole video stream, only the 20 marker points of a person could be saved. This would make archiving surveillance data more efficient. One problem would be the small capture range of the sensor. The person under observation would always have to stand in front of the sensor and should not leave an area of about four square meters. Such conditions are rarely met.
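The potential saving can be estimated with simple arithmetic. The sketch below assumes three 32-bit coordinates per marker point, 30 frames per second and a compressed video stream of 2 Mbit/s for comparison. All three values are assumptions chosen for illustration, not measurements.

```python
# Storage estimate: skeleton data versus a compressed video stream.
JOINTS = 20                            # marker points per person, as stated above
COORDS_PER_JOINT = 3                   # assumption: x, y, z
BYTES_PER_COORD = 4                    # assumption: 32-bit float
FRAMES_PER_SECOND = 30                 # assumption
VIDEO_BITRATE_BITS_PER_S = 2_000_000   # assumption: compressed surveillance video
SECONDS_PER_DAY = 24 * 60 * 60


def skeleton_bytes_per_second() -> int:
    """Bytes needed per second to store the joints of one tracked person."""
    return JOINTS * COORDS_PER_JOINT * BYTES_PER_COORD * FRAMES_PER_SECOND


def video_bytes_per_second() -> int:
    """Bytes needed per second for the assumed compressed video stream."""
    return VIDEO_BITRATE_BITS_PER_S // 8


skeleton_per_day_mb = skeleton_bytes_per_second() * SECONDS_PER_DAY / 1_000_000
video_per_day_mb = video_bytes_per_second() * SECONDS_PER_DAY / 1_000_000
ratio = video_bytes_per_second() / skeleton_bytes_per_second()

print(f"Skeleton: {skeleton_per_day_mb:.0f} MB per day")
print(f"Video:    {video_per_day_mb:.0f} MB per day")
print(f"Video needs about {ratio:.0f} times more storage")
```

The ratio depends heavily on the assumed video bitrate, and skeleton data only exists while a person is actually being tracked, so real savings could differ considerably.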
The advertising industry could also benefit from skeleton tracking. It is likely to be the first industry after the games industry to make use of the sensor. The article With Xbox’s New In-Game Advertising, Engagement Is The Goal describes the concept of NUads, which stands for “Natural User Interface Advertisement”. A passive medium like TV has the problem that the audience turns to other media (e.g. the internet) to obtain further information. The idea behind NUads is to deliver an augmented and natural way to interact with TV content. For example, advertisers can prompt users to say “Xbox near me”, and a map to the nearest retailer will be sent to their mobile phone. In this way, TV is meant to become an immersive and interactive experience.
Apart from its use with TVs, the sensor could be placed behind a store window to detect people on the street. This allows the target audience to interact with advertising content without additional peripherals, and advertisers can provide specific information. The project Kinect Shop – the next generation of augmented shopping uses this setup to provide virtual shopping tours. Such applications are very similar to casual games: the interaction never lasts long and serves as entertainment.
Automation through facial and voice recognition
As with the Xbox 360, face recognition has a positive impact on usability. A computer system could identify a person using facial recognition and load individual profile data. This scenario would also make sense for machines that are used by many different people. However, in this context the aspects of data protection should be critically considered.
A feature not exclusive to Kinect is the high-resolution 3D microphone. Using this interface, voice commands could be issued. This feature could play a major role in the field of accessible systems. By combining the microphone with the RGB camera, video telephony could be handled by the Kinect sensor.
Summary and Conclusion
This article aimed at analyzing Kinect from an “industrial view”. It provided an overview of existing developments and discussed them critically. Technical limitations of the system were pointed out, which restrict the use in “serious” industries. Against this background, the article placed Kinect-specific features in an appropriate application context. It seems that the face recognition, the 3D microphone and the rudimentary skeleton tracking are the greatest strengths of the system.
The hope remains that in the future “teething problems” will be fixed and a standard menu system will be established. Developers could then rely on generic guidelines for menu navigation, and users would not need to learn a multitude of interaction concepts. From the technical side, it is desirable that the tracking precision and the resolution of the skeleton be increased.
All in all, Microsoft has taken the first step towards affordable, accessible and markerless full-body tracking and provides a solid research base with this hardware. At this point, Kinect is primarily a beginner-friendly casual game system, which is suitable neither for the hardcore gamer nor for “serious” applications.
During research for this article, it became clear again and again that developers cannot merely talk and speculate about Kinect: at a certain point, ideas must be tried and tested. Ryan Challinor puts it this way:
“You can’t just talk about it, you have to prototype it. Concepting won’t get you very far!”
This approach seems plausible, and is also used frequently in our workflow, as the article Prototyping Natural User Interfaces describes.
All in all, it seems recommendable to keep an eye on this technology, which evolves rapidly and will surely support innovations in diverse areas.
Microsoft, Windows, Kinect and Xbox 360™ are trademarks or registered trademarks of Microsoft Corporation in the US and/or other countries.
Harmonix and DanceCentral are trademarks or registered trademarks of Harmonix Music Systems, Inc. in the US and/or other countries.
Dell™ is a trademark of Dell Computer Corporation, registered in the U.S. and/or other countries.