Machine vision is key to implementing touch recognition in large displays, making it possible to create interactive systems of almost any size.
by Niel Beckie
THE MARKET for touch-enabled whiteboards and public interactive displays is growing at double-digit rates, and demand is also growing for even larger interactive displays in military applications, mass-transit and network-operations centers, air-traffic control, and disaster-response agencies.
The use of interactive displays in this growing range of applications presents new challenges. Camera-based touch technology can overcome many of these challenges and offer significant new capabilities, particularly – but not exclusively – when the displays have large screens.
Of course, touch-enabled displays are a familiar part of our everyday lives, from small personal-digital-assistant (PDA) displays to medium-sized displays in automated-teller machines and point-of-sale touch screens. Well-designed systems built around these modest-sized displays have become quite intuitive and easy to operate.
But touch-enabled displays are getting larger. Each year, larger plasma-display panels (PDPs) and liquid-crystal displays (LCDs) are being introduced in public interactive displays, such as information kiosks and interactive whiteboards in classrooms and meeting rooms. Even larger, wall-sized systems using new methods of tiling multiple high-resolution displays or projectors are opening up new opportunities for interactivity. In addition, existing wall-sized displays in command-and-control and operations centers can be retrofitted with interactive capabilities.
There are many touch technologies with varying characteristics, the most common being analog resistive, in which two sheets of film are coated with a resistive material and are separated by a gap. Pressure applied to the surface brings the sheets together, and the contact point is determined by measuring the resistance from that point to various points on the periphery of the touch screen. Frequently used for small- and medium-sized touch screens, analog resistive technology is also used in interactive, whiteboard-sized displays.
SMART Technologies, Inc.
The application of touch technology to large displays can be challenging. Some touch technologies scale better than others to larger screen sizes. Ideally, the touch technology should have minimal impact on display quality. Some require the use of a special device or instrument to interact with the display (active touch), while others allow the simple use of a finger or any object (passive touch). Large displays may invite touch interactions from (and between) two or more users at the same time, and it would be desirable for a large-screen touch technology to be able to support this kind of interaction.
The Case for Machine Vision
The most commonly used touch technologies – which include analog resistive, capacitive, IR-LED, and surface acoustic wave (SAW) – have been described in many articles, including some that have appeared in Information Display.1–4 In this article, we will look at a relatively recent development, the use of digital cameras and other image-sensing devices to make surfaces interactive.
Machine vision is defined as the use of image-sensing devices in conjunction with a computer or digital signal processor to automatically acquire and analyze images. It is frequently used to control industrial processes, but it has also been used in display applications in a variety of ways, such as tracking a laser pointer aimed at the front of a display to enable a basic level of interactivity. An example is Keytec's View Touch™ product, which was demonstrated at SID 2004. In other cases, machine vision has been used to create virtual keyboards on surfaces used in conjunction with PDAs and cellular telephones. Providers of such technology include Canesta, VKB, Virtual Devices, and iBiz Technology.
Machine vision has also been used to touch-enable large displays. Two companies, Jestertek, Inc., and NextWindow, have employed machine-vision technology to permit a range of displays to recognize basic mouse movements and control. At SMART Technologies, Inc., we exhibited our version of machine vision, called DViT™ (Digital Vision Touch) technology, at SID 2003 and 2004. It makes a wide range of public displays interactive by interpreting basic mouse movements on large interactive digital signage and by capturing handwriting entry on interactive whiteboards.
The technology has been successfully incorporated in displays as small as 17 in. (Fig. 1) and as large as the wall-sized displays mentioned above. Between these two extremes is a 60-in. display created by the Defence Research and Development Canada at Valcartier (DRDC-V) called the Topographical Map Display (ToMaDi™) MkIIb. It is used for displaying topographical data, satellite imagery, and digital terrain elevation (Fig. 2). The global display resolution is 5600 x 4200 pixels. Machine-vision technology was selected by the DRDC-V because it does not degrade the image but still has sufficient digitizing resolution for detailed interaction. It allows for interaction with a finger or any passive object and has the ability to annotate, capture handwriting, and provide basic point-and-click interactions.
Fig. 3: DViT™ technology applied to a SMART Board™ interactive whiteboard relies on two (or more) cameras mounted in the corners of the display.
Those are the specific qualities the DRDC-V specified. Now, let us take an overall look at the advantages of machine vision as a touch technology for large interactive displays.
1. It requires no special films or coatings over the display surface, so it does not degrade the image. It also allows the application of anti-glare, anti-smudge, or other coatings, if desired.
2. It offers scalability to a wide range of display sizes.
3. It needs no proprietary pens, tools, or other devices.
4. It is applicable to large monolithic and tiled PDPs, LCDs, and rear-projection displays.
5. It produces high digitizing resolution and accuracy, along with a response speed that is sufficient to capture handwriting.
Implementation of Machine Vision
In SMART Technologies' DViT™ implementation of machine vision, digital cameras are mounted in the corners of the display to activate the surface for touch interaction. The cameras constantly scan the surface to detect a target object (Fig. 3). When an object is detected, each camera identifies it and calculates the angle of its position relative to a common coordinate system. Algorithms automatically record the distance between the cameras and their viewing angles relative to each other. With this information, the technology can triangulate the location of the contact point with high accuracy and speed.
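The triangulation step can be sketched with basic trigonometry. In this illustrative frame (not DViT's actual internal coordinate system), camera 0 sits at the origin and camera 1 a known baseline away along the top edge of the display; each camera reports the angle between that edge and its ray to the object:

```python
import math

def triangulate(theta0, theta1, baseline):
    """Intersect the two camera rays to locate the touch point.

    Camera 0 sits at (0, 0) and camera 1 at (baseline, 0); each angle
    is measured between the top edge and the ray to the object.
    From tan(theta0) = y/x and tan(theta1) = y/(baseline - x).
    """
    t0, t1 = math.tan(theta0), math.tan(theta1)
    x = baseline * t1 / (t0 + t1)
    y = x * t0
    return x, y

# Sanity check: a touch at (40, 30) on a 100-unit baseline.
theta0 = math.atan2(30, 40)        # angle seen from camera 0
theta1 = math.atan2(30, 100 - 40)  # angle seen from camera 1
x, y = triangulate(theta0, theta1, 100.0)
```

With only two cameras this intersection is unique anywhere on the surface; additional cameras, as the text notes, add redundancy for occlusion and multi-pointer cases.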
We successfully built such a display, and it delivered the expected characteristics. In particular, the technology offered excellent durability because the surface of the display is not part of the touch circuit, and virtually any surface appropriate for the display can be used. However, further development was required. In early implementations, we found that changing lighting conditions, such as a sunlit room suddenly darkened by a passing cloud, could interfere with touch accuracy. Lighting the display area by mounting a ring of infrared illuminators in the bezel was instrumental in solving this problem.
A research project successfully completed in 2003 provided a wall-sized machine-vision touch display for the Georgia Institute of Technology (Fig. 4). The largest known interactive display of its kind, it allows simultaneous input by multiple users and demonstrates the capabilities of large, machine-vision touch displays.
As touch displays increase in size, even to the size of an entire wall, more than one user will be able to interface with the display at the same time. Two students could conveniently collaborate on a project, for instance, or two game players could challenge each other by interacting with the display simultaneously. The tracking and processing of inputs from multiple users are features that machine-vision technology can implement efficiently because the data of two simultaneous pointers can be processed from a single camera image (Fig. 5). At SMART Technologies, Inc., we have implemented multiple-user input in rear-projection and flat-panel-overlay interactive whiteboards.
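The single-image, two-pointer case reduces to finding separate bright runs in a binarized camera row, as in Fig. 5. A minimal sketch, using a made-up scanline rather than real camera data:

```python
def find_pointers(scanline):
    """Return (start, end) pixel-index runs for each bright blob in
    one binarized camera row; two runs mean two simultaneous pointers,
    each yielding its own angle for the triangulation stage."""
    blobs, start = [], None
    for i, bright in enumerate(scanline):
        if bright and start is None:
            start = i                      # blob begins
        elif not bright and start is not None:
            blobs.append((start, i - 1))   # blob ends
            start = None
    if start is not None:                  # blob touches the row's edge
        blobs.append((start, len(scanline) - 1))
    return blobs

# Synthetic row: two fingers as seen by a single camera.
row = [0] * 10 + [1] * 4 + [0] * 20 + [1] * 6 + [0] * 10
pointers = find_pointers(row)
```

Because both blobs come from one frame, no extra camera readout is needed to track the second pointer, which is why multi-user input costs the system so little.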
Fig. 4: A custom-designed interactive whiteboard developed by SMART Technologies for the Georgia Institute of Technology uses DViT™ technology in a 17.11 x 4.8-ft. (5.2 x 1.46-m) display with multiple simultaneous-user input.
Fig. 5: Machine vision allows two simultaneous pointers to be detected and processed, as demonstrated by this binarized image of camera data.
Machine-vision systems can also be taught to recognize a user's gestures for enhanced interactivity with very large displays. Because of the size of these displays, the controls within a typical graphical user interface (GUI) are likely to be placed too far away from the user. A system that can be programmed to recognize gestures can interpret a series of hand motions to activate advanced functions, thus relieving the user of having to locate a menu item or a hardware control. For example, the first contact made by a finger on the display surface is usually interpreted as a left click. A subsequent second contact by a finger to the right of the first contact point is interpreted as a right click. Another gesture can initiate scrolling. These gestures have been implemented on SMART Board™ interactive whiteboards using the DViT™ camera system. Jestertek's JestPoint™ technology converts simple gestures into direct mouse control, and interpretation of more-complex gestures is possible.
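The two click rules described above can be sketched as a tiny classifier. The function name and event encoding here are illustrative, not part of the DViT API:

```python
def interpret(contact_xs):
    """Map successive contact x-coordinates (in arrival order) to
    gestures using the two rules described above.  A minimal sketch,
    not SMART's actual gesture engine."""
    gestures, first_x = [], None
    for x in contact_xs:
        if first_x is None:
            first_x = x
            gestures.append("left-click")   # first contact on the surface
        elif x > first_x:
            gestures.append("right-click")  # second contact, right of the first
        else:
            gestures.append("left-click")   # other contacts stay left clicks
    return gestures

single = interpret([100])       # one finger down
pair = interpret([100, 150])    # second finger lands to the right
```

A real gesture engine would also consider timing and contact lifetime, but the spatial rule alone already distinguishes the two click types.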
Since the cameras can identify an object's location before it contacts the display, it is possible to report a "hover event." This enables a new mode of touch interaction that could be used on drop-down menus in a GUI; an item could be selected by hovering but not be activated until the screen surface is actually touched. Hovering is turned off by default when displays are delivered to avoid confusing customers accustomed to using older products that were not based on machine vision.
Rejection of false contact events can also be improved through machine vision. Since the system can "see" the height of an object, a machine-vision system could, in theory, be taught to ignore insects and dirt on the display surface and report only legitimate contacts. Although the system could be set to register a "touch" as occurring at any height above the display surface that is within the camera's range, we set the "touch" to be at actual contact to avoid confusing users accustomed to conventional touch displays.
The DViT™ camera system's architecture uses a digital signal processor (DSP) to collect and process information from a 640 x 480-resolution CMOS camera sensor, which is configured to use just a 640 x 20-pixel format (Fig. 6). The system uses a fixed-background method of segmentation using artificial light provided by a ring of IR illuminators in a bezel around the display area. Each camera's field of view is 90°. Compensation for lens distortion is provided by the image-processing algorithms.
The pixel collection is performed by having the sensor clock pixels into a FIFO buffer. That information is then collected by the DSP when a full sensor frame is available. The camera DSP then processes the pixel information and extracts relevant metrics from the analyzed scene. An interactive display capable of capturing smooth writing with no human perception of system lag requires a frame rate of 100 frames per second (fps), which corresponds to 100 points per second of output mouse events.
This architecture – along with the embedded processor design – is a critical component in bandwidth handling because only a subset of the available pixels is collected and processed. This was the primary reason for selecting a CMOS sensor rather than a charge-coupled-device (CCD) sensor: unlike a CMOS sensor, a CCD would require the collection of every pixel in the frame. The reduced pixel format enables the high frame rate requisite for a highly responsive interactive experience.
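The payoff of the reduced 640 x 20 readout is easy to quantify. Assuming 8 bits per pixel (an assumption; the article does not state the bit depth), the per-camera throughput at the 100-fps rate needed for smooth handwriting works out as follows:

```python
# Per-camera raw pixel throughput at 100 fps.
# The 8-bit pixel depth is an assumption, not stated in the article.
BITS_PER_PIXEL = 8
FPS = 100

full_frame = 640 * 480       # pixels in a full VGA sensor frame
strip = 640 * 20             # reduced format actually read out

strip_bps = strip * BITS_PER_PIXEL * FPS       # 10,240,000 bit/s
full_bps = full_frame * BITS_PER_PIXEL * FPS   # 245,760,000 bit/s
reduction = full_frame // strip                # 24x fewer pixels per frame
```

Reading only the strip of pixels that views the display surface cuts the data the DSP must move and process by a factor of 24, which is what makes the 100-fps target reachable with a modest embedded processor.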
Once the onboard DSP has processed the pixel information and extracted the necessary metrics, the information is then sent to a master DSP. The metrics are usually information about the object of interest in the camera scene, which is normally a finger or pointer of some type. Some important metrics include location in the scene, contact, size, and type. The master DSP uses the information from each camera to triangulate the object's location. This information is then sent to a computer running an application. Contact events, where the user has actually touched the display, are interpreted as mouse clicks to provide application control.
The system minimizes bandwidth requirements and processing load on the attached computer by using a staged reduction from the camera to the computer. All that is passed from the master DSP to the computer is the x,y display location and contact status over a 9600-bit/sec serial connection.
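A quick budget check shows why 9600 bit/s suffices for 100 reports per second. The packing below (16-bit x, 16-bit y, one status byte) is a plausible assumption, not SMART's documented wire format:

```python
# Budget check for the 9600-bit/s serial link carrying (x, y, contact).
# The payload packing is an illustrative assumption.
BITS_PER_BYTE_ON_WIRE = 10   # 8 data bits + start bit + stop bit
REPORTS_PER_SECOND = 100
LINK_BPS = 9600

payload_bytes = 2 + 2 + 1                                # x, y, status
bits_per_report = payload_bytes * BITS_PER_BYTE_ON_WIRE  # 50 bits
required_bps = bits_per_report * REPORTS_PER_SECOND      # 5,000 bit/s
fits = required_bps <= LINK_BPS                          # comfortable margin
```

Because the cameras' DSPs reduce each frame to a handful of metrics before the master DSP triangulates, only this tiny report ever reaches the host computer.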
As cameras and microprocessors have become smaller and cheaper, the potential applications for machine-vision touch panels have expanded. We expect the growth of this technology to continue and its technical capabilities to increase even further. •
1. Bruce DeVisser, "Designing Touch LCDs for Portable Devices," Information Display 19, No. 7, 18–21 (July 2003).
2. Jeff Morris, "Five-Wire Touch Screens Make Inroads," Information Display 18, No. 8, 24–26 (August 2002).
3. Dave Gillespie, "Novel Touch Screens for Hand-Held Devices," Information Display 18, No. 2, 22–25 (February 2002).
4. Wayne Wehrer, "Touch Technology Grows Up," Information Display 11, Nos. 7&8, 14–19 (July/August 1995).
Fig. 6: The DViT™ system architecture reduces the amount of data that must be transmitted to register and process touch information, providing high response speed with relatively low bandwidth.