Personal Near-to-Eye Light-Field Displays
In order for people to achieve true mobility – the ability to do anything, anywhere, anytime – there needs to be an entirely new class of mobile device.
by Wanmin Wu, Kathrin Berkner, Ivana Tošić, and Nikhil Balram
SMARTPHONES AND TABLETS are becoming super-computing, super-communications, and super-sensory systems.1 Over the last few years, their deployment has been growing much faster than that of the conventional PC platform. It is projected that by 2015 the installed base of mobile devices will be twice that of PCs.2
Despite this pervasiveness, the existing human interface of smartphones and tablets has fundamental limitations. Due to the constraints imposed by mobility, the human interface is provided by a small screen. This results in a field of view (FoV) that is too narrow to display certain types of information satisfactorily (Fig. 1, top left). This also limits information input and digital object manipulation (Fig. 1, top right). Furthermore, the promise of “augmented reality” (AR) applications that “allow users to see the real world with virtual objects superimposed upon or composited with the real world”3 is lost because the small screen offers too tiny a window over the real world and the handheld form factor makes continuous usage awkward (Fig. 1, bottom left). Last but not least, current screens lack the capability to deliver a true (volumetric) 3-D experience (Fig. 1, bottom right).
Fig. 1: Human interfaces with existing mobile devices (e.g., smartphones and tablets) are fundamentally limited in their field of view and interaction capabilities.
In an earlier article on mobility published in the 2014 IMID Digest,4 we presented the argument that to achieve true mobility – the ability to do anything, anywhere, anytime – there needs to be an entirely new class of mobile device. We call this new category the Mobile Information Gateway (MIG), and it will comprise a compute–communication module (CCM) and a human-interface module (HIM), as shown in Fig. 2.
Fig. 2: Mobile Information Gateways – a new family of devices that combine a personal NTE light-field display and a mobile computer – will be the next-generation mobile platform.
In order to provide a wide FoV under the mobility constraint, the next-generation human-interface module will logically need to be a wearable display positioned near the eyes of the user. To support unobstructed and immersive interaction with the surrounding 3-D real world, the near-to-eye (NTE) display will need to have binocular, optical see-through, and true 3-D visualization capabilities. To minimize the weight and form factor, the primary computation and communication electronics will reside in a separate compute–communication module that will be an evolution of the current mobile platform.
In this article, we focus our discussion on the NTE display system. A number of attempts have been made in the past decade to build personal NTE light-field-display prototypes (e.g., see Refs. 11, 12, and 16). They all have different tradeoffs based on their design choices, and thus apply to different classes of applications. Our immediate goal is to discuss the capabilities and limitations of these approaches and to report the implications for future MIG system developers so that they can make a more informed decision when choosing a path for their specific applications.
The NTE component of a MIG needs to provide four key interface features:
• A wide FoV to enable a viewing experience comparable to that of using a large screen.
• A perceptually correct positional overlay of digital information over objects in the real 3-D world surrounding the user.
• True (volumetric) 3-D projection of digital objects.
• The ability to capture and interpret gestures to enable information input capabilities in any environment without the need for a physical keyboard or mouse.
We believe that one of the most promising approaches to providing all of those key display-related attributes is through a binocular personal (single-user) light-field NTE see-through display.
In the most general sense, a light-field display refers to a device that emits an approximation of the light field of a 3-D scene, which is represented as a
high-dimensional array of light modalities, including spatial, angular, wavelength, and temporal information of light.10 The approximation is achieved by designing a specific sampling topology of the light field, where that topology depends on the display architecture. For example, light-field displays based on microlens arrays have a reduced spatial sampling frequency in order to allow for the angular sampling of light. On the other hand, multi-focal displays based on high-speed digital micromirror devices (DMDs) maintain high spatial resolution but utilize temporal multiplexing to emit different spatio-angular slices of the light field at separate time instances.
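The spatial-angular tradeoff in a microlens-based design can be sketched numerically. The panel resolution and per-lenslet pixel count below are hypothetical values chosen only to illustrate the accounting, not parameters of any prototype discussed here:

```python
# Illustrative spatio-angular tradeoff for a microlens-based light-field
# display: panel pixels "spent" on angular sampling are lost to spatial
# resolution. All numbers are hypothetical.

def lenslet_sampling(panel_px, lenslet_px):
    """panel_px: (width, height) of the microdisplay in pixels.
    lenslet_px: pixels under each lenslet along one axis (angular samples)."""
    spatial = (panel_px[0] // lenslet_px, panel_px[1] // lenslet_px)
    angular_views = lenslet_px * lenslet_px
    return spatial, angular_views

# A hypothetical 1280 x 720 OLED microdisplay with 8 pixels per lenslet:
spatial, views = lenslet_sampling((1280, 720), 8)
print(spatial, views)   # (160, 90) spatial samples, 64 angular views
```

The same arithmetic explains why temporally multiplexed DMD designs can keep full spatial resolution: they pay for the extra light-field dimensions in time rather than in pixels.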
Binocular NTE light-field displays with a large field of view can revolutionize how humans interact with the world. Two examples are illustrated in Fig. 3. In the example on the left, customer-facing professionals, such as employees working in a futuristic bank branch, can use the system to see their waiting customers with detailed profile information overlaid. Overlaying information on customers positioned at different distances from the viewer requires accurate positioning of that information, not only with correct 2-D positional alignment but also with correct 3-D depth cues. In the example on the right, during an ultrasound-guided catheter insertion procedure, a doctor could see ultrasound images superimposed on the patient while inserting the catheter, avoiding the continuous look-away required with conventional ultrasound displays.
Fig. 3: These examples of augmented-reality applications use binocular see-through light-field displays.
There are a large number of such potential applications. However, existing mobile-display technologies such as smartphones, tablets and various NTE displays24 are inadequate to support these applications.
Taxonomy of NTE Displays
To understand personal NTE light-field displays, it is useful to understand the taxonomy of NTE displays. As shown in Fig. 4, NTE displays can be classified into virtual-reality (VR) displays and augmented-reality (AR) displays. VR displays such as described in Ref. 9 show only virtual information to the users and block the real-world views completely. AR displays, in contrast, allow the users to see both the virtual world and the real world at the same time.
Fig. 4: A taxonomy of NTE displays includes commercial or near-commercial examples.
AR displays can be further classified into video-overlay displays and optical see-through displays. Video-overlay displays block the real-world view optically, but capture it with a miniature camera and present the video view to the user with virtual information overlaid. Although this approach has its advantages, such as latency hiding and simplified overlay, it suffers from a number of drawbacks such as sensory conflicts between vision and proprioception (the sense of one’s own body), perceived resolution loss, viewpoint mismatch, altered color and brightness, and user trust issues.
Optical see-through AR displays can be further classified into monocular and binocular displays. Google Glass8 is a well-known example of a monocular see-through display. It has a small FoV of about 13°. Monocular displays have no 3-D display capabilities and provide only limited AR support because the set of real-world locations that can be overlaid with AR information is constrained. In contrast, binocular optical see-through displays relax the constraints on the placement of overlay information, allowing more natural placement in a
real-world scene. Therefore, we envision the next-generation mobile interface to be a binocular optical see-through display and, more specifically, a binocular optical see-through display with light-field projection capabilities.
Personal NTE Light-Field-Display Technologies
Among the aforementioned key requirements for the NTE component of the next-generation mobile-platform MIG – wide FoV, virtual-physical object overlay with optical see-through capability, true 3-D display, and gesture interpretation – the most challenging is probably the true 3-D display of a scene.
Conventional stereoscopic 3-D displays (such as the Epson Moverio shown in Fig. 4) are designed to create a 3-D perception of a scene, but they suffer from the fundamental problem of vergence-accommodation conflict. This conflict causes visual discomfort and fatigue, distortion in perceived depth, and degradation in visual performance and stereoacuity.5 Avoiding the vergence-accommodation mismatch is crucial for a MIG system to be able to create a comfortable 3-D viewing experience and be used for a sustained period of time without compromising visual comfort or performance. Researchers have proposed various 3-D display systems to avoid this fundamental conflict, including integral 3-D displays,23 compressive light-field displays,18 holographic displays,19 and volumetric displays.20,21 However, these systems are significantly burdened by the requirement of being
multi-user/multi-view and are not designed to be mobile.
The NTE component of a MIG should be a single-user/single-viewpoint, 3-D volumetric display. Past research has indicated that this path may be practically achievable by a multi-focal display, where the number of depth planes needed to provide conflict-free 3-D viewing can be six or even fewer.7 In an article presented at SIGGRAPH,6 Akeley et al. built a prototype of a single-user display that could display four planes and demonstrated that it enabled natural vergence-accommodation coupling during viewing. We expect the next-generation mobile interface to be a compact NTE version using the same concept.
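The multi-focal idea can be made concrete with a small sketch. Depth planes are conventionally spaced roughly uniformly in diopters (1/meters), since the eye's accommodation error is approximately constant on a dioptric scale. The 0.55-D spacing and six-plane count below are illustrative choices consistent with the "six or fewer" figure above, not the parameters of any specific prototype:

```python
# Sketch of multi-focal plane placement with uniform dioptric spacing.
# Spacing and plane count are illustrative assumptions.

def plane_distances_m(near_d=3.0, spacing_d=0.55, n_planes=6):
    """Return focal-plane distances in meters, nearest plane first."""
    planes = []
    for k in range(n_planes):
        d = near_d - k * spacing_d   # dioptric position of plane k
        planes.append(round(1.0 / d, 2))
    return planes

print(plane_distances_m())
# [0.33, 0.41, 0.53, 0.74, 1.25, 4.0] -- six planes span 0.33 m to 4 m
```

Note how uniform dioptric spacing concentrates planes near the viewer, where accommodation demands are greatest, while a single far plane covers everything beyond a few meters.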
As mentioned earlier, different types of approaches have been proposed in the past to construct personal NTE light-field displays (e.g., see Refs. 11, 12, and 16). They all have different tradeoffs and may be suited to different applications. In the remainder of this article, we will survey those approaches, classify them, and offer insights on how they compare to each other.
Before we discuss the various approaches, it is useful to understand the main components of a personal light-field-display system. Existing systems primarily consist of the following elements:
• Light source: LEDs, OLEDs, laser diodes.
• Image source: reflective DMDs, emissive OLED devices, scanners (with fiber optics).
• Optical subsystem/path: including relay, field lens, combiner (such as a beamsplitter or mirror, waveguide, and free-form
optics), focus actuator [such as a liquid lens and a deformable membrane mirror device (DMMD)], and eyepieces.
• Electrical subsystem: power and signals (including sensors and controls).
The active display component, including the light source and the image source, forms an image that is, in turn, relayed by the optical subsystem onto the retinae of the eyes. The optical subsystem typically needs to include an optical combiner that simultaneously reflects the projected information and transmits light originating from the real-world scene. In some cases, as explained below, the optical subsystem includes a focus actuator to adjust the position of the image plane at different distances from the eye, resulting in 3-D content displayed at multiple focal planes. The electrical subsystem provides power and signals.
In the following sections, we mainly focus on the active display component and the focus actuator (if any) because they are the key components that realize the true 3-D projection capability. For a review of the other parts of NTE display systems (e.g., optical combiners), we refer readers to Refs. 24 and 25.
A Survey of NTE Light-Field Displays
Achieving true 3-D projection in a wearable display is a non-trivial problem. Two main approaches have been proposed to tackle it: matrix-display based and laser-scan based.
Matrix-Display Methods with Temporal or Spatial Multiplexing
Matrix-display methods use matrix-display modules, such as a DMD or an OLED microdisplay, to project a 3-D volume of light by temporal multiplexing (i.e., projecting one plane at a time) or spatial multiplexing (similar to 3-D integral imaging28 but with the light path reversed).
Temporally multiplexed systems utilize high-frequency display modules to consecutively display content in multiple focal planes, and to do so fast enough that the human eye perceives them as being displayed simultaneously. They utilize the fact that the refresh rate of DMDs is much higher than what the human visual system can resolve. Such systems also need to use a high-frequency focus actuator that is synchronized with the active display component to project light focused on
multiple depth planes.
Furthermore, in order to approximate a continuous depth volume for accommodative responses, a technique called depth blending (also known as depth fusing) is often used.6
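A minimal sketch of linear depth blending follows, assuming the common scheme in which a point between two adjacent focal planes is drawn on both, with intensities weighted by its dioptric proximity to each plane so that accommodation is driven to an intermediate depth. The plane positions in the example are hypothetical:

```python
# Linear depth blending (depth fusing) between two adjacent focal planes.
# All positions are in diopters; example values are illustrative.

def blend_weights(target_d, near_plane_d, far_plane_d):
    """Return (near, far) intensity weights for a target between two planes."""
    w_near = (target_d - far_plane_d) / (near_plane_d - far_plane_d)
    return w_near, 1.0 - w_near

# A target at 1.5 D, midway between planes at 2.0 D and 1.0 D:
print(blend_weights(1.5, 2.0, 1.0))   # (0.5, 0.5) -- equal split at midpoint
```

As the target moves toward either plane, its weight on that plane grows linearly until the point is rendered on one plane alone.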
For example, Liu et al.26 developed a multi-focal optical see-through display prototype with an OLED microdisplay as the active display component and a liquid lens as the focus actuator. The liquid lens the researchers used had a slow response time (75 msec), and thus only two planes were demonstrated at a flicker-free speed.
In a 2009 article published in Optics Express, Love et al.11 constructed two prototypes, both using high-frequency CRTs running at 180 Hz. In the first prototype, they used two CRTs (one for each eye) viewed through
mirror prisms. They employed birefringent lenses as the focus actuator for each eye, which allowed a maximum of four multi-focal planes. In the second prototype, they used the same principle, but with only one CRT that was time-multiplexed between the two eyes (with the help of shutter glasses). The effective refresh rates were 45 and 22.5 Hz for prototypes 1 and 2, respectively.
Hu et al.12,13 built a prototype display using a DMD and a synchronized high-speed MEMS-based DMMD as the focus actuator. Figure 5 shows the schematic optical layout. The DMD used can display at a rate of about 23 kHz and the DMMD has a switching speed of 1 kHz. The prototype was able to display six focal planes at a flicker-free rate (60 Hz).
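The time budget behind those numbers can be sketched as follows. The accounting is a simplification that ignores color sequencing and illumination modulation, but it shows how the binary-frame rate of a DMD must be shared among refresh rate, focal planes, and gray-scale bits:

```python
# Back-of-envelope time budget for a temporally multiplexed DMD display.
# The DMD emits binary frames; gray scale is built up by pulse-width
# modulation, so each extra focal plane costs binary subframes that would
# otherwise buy bit depth. Simplified: color and illumination tricks ignored.

import math

def grayscale_bits(binary_fps, refresh_hz, n_planes):
    """Achievable PWM gray-scale bits per plane per refresh."""
    subframes_per_plane = binary_fps / (refresh_hz * n_planes)
    # Pure binary PWM needs 2^b - 1 time slots for b bits of gray scale.
    return int(math.log2(subframes_per_plane + 1))

# Roughly the rates quoted above: ~23-kHz DMD, 60-Hz refresh, six planes:
print(grayscale_bits(23_000, 60, 6))   # ~63 subframes per plane -> 6 bits
```

This is the concrete form of the tradeoff discussed later: gray-scale levels, frame rate, and depth planes all draw on the same fixed pool of binary frames.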
Fig. 5: This schematic shows a prototype for a temporally multiplexed display using a DMD and a synchronized high-speed MEMS-based DMMD.12
In spatially multiplexed systems, the true 3-D capability is achieved by using a microlens array or a pinhole array in the optical path. This system architecture is based on the same principle as integral-imaging displays and microlens-based light-field cameras. These approaches reconstruct a full-parallax light field of a 3-D scene, and thus render focus actuators unnecessary because fixed-focus optics (e.g., free-form optics) already allow the viewer to perceive the 3-D volume. The most common active display component used for this approach is an OLED microdisplay, which offers high resolution in a compact, power-efficient form factor.
For example, in a paper published in the 2013 ACM Transactions on Graphics, Lanman and Luebke22 described a non-see-through NTE light-field display. Using the same principle as integral-imaging displays and microlens-based light-field cameras, this system used an OLED microdisplay and a microlens array to render a light field before projecting it to the eyes. The achieved spatial resolution was 146 × 78 pixels and the FoV was 29 × 16°. The authors later proposed a see-through AR display using point light sources,27 but the constructed prototype supported only one focal plane. The authors provided some theoretical guidelines on how it might be extended to display a light field but acknowledged notable challenges and did not demonstrate an implementation or experiment.
Hua and Javidi14 combined the microscopic integral-imaging (micro-InI) method and free-form optics to create a 3-D integral optical see-through light-field display. A micro-InI unit, consisting of a high-resolution OLED microdisplay and a microlens array, enabled reconstruction of 3-D volumetric shapes with both horizontal and vertical parallax. Figure 6 shows the scheme. Unlike the system described in Ref. 22, this one demonstrated see-through capability, with a free-form prism employed as the viewing optics that directly relayed the light field of the reconstructed scene into the eye. It achieved a FoV of 33.4° and a depth range of 4 m.
Fig. 6: This schematic view shows Hua’s 3-D integral-imaging see-through light-field-display prototype.14
Laser-Scan Methods with a Single Fiber or a Fiber Array
While the above methods all rely on some type of matrix display, currently known laser-scan displays use laser diodes and fiber-optic scanners to raster-scan virtual images into the eyes. Such systems form multiple focal planes in two ways: (1) using a single fiber and scanning different depths sequentially or (2) using fiber arrays (with each fiber representing one depth plane) and XY scanning the fiber bundle to generate a 3-D depth volume. Laser-scan approaches are intrinsically temporal-multiplexing approaches. But compared to the previously described temporally multiplexed matrix-display methods, the temporal tradeoff can be relaxed in
laser-scanning-based systems because color and gray scale can be
handled independently with three color diodes and analog modulation, not sequentially as when using DMDs. However, there is still a tradeoff between frame rate and the number of lines per frame, just as there was for CRTs.
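That residual scan-rate tradeoff can be sketched with illustrative numbers: for a raster-scanned laser display, the horizontal line rate is fixed by the scanner, so lines per frame trades directly against frame rate, and, for single-fiber systems that scan depths sequentially, against the number of focal planes as well. The 43.2-kHz line rate below is a hypothetical value, not a figure from any cited system:

```python
# Scan-rate budget for a raster-scanned laser display: a fixed line rate
# is divided among lines per frame, frame rate, and (for single-fiber
# depth-sequential designs) focal planes. Line rate is hypothetical.

def frame_rate_hz(line_rate_hz, lines_per_frame, n_planes=1):
    """Frame rate left over after spending lines on resolution and planes."""
    return line_rate_hz / (lines_per_frame * n_planes)

print(frame_rate_hz(43_200, 720))      # 60.0 Hz with a single focal plane
print(frame_rate_hz(43_200, 720, 2))   # 30.0 Hz when two planes share time
```

A fiber-array design sidesteps the plane penalty because all depth planes are scanned in parallel, one fiber per plane.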
Schowengerdt et al. presented an early prototype of a laser scanning system.15 In that system, a single beam of light was formed and first scanned in the Z-axis with a DMMD that dynamically adjusted the focus of the beam; the beam was then raster scanned in the X and Y axes (XY scanned). This prototype was limited by the DMMD frequency at that time and was only able to project two planes frame sequentially.
Schowengerdt et al. later proposed another retinal scan method to overcome this limitation. The idea was to use multiple light sources to form a composite multi-focal RGB beam and then XY scan it into the viewer’s eyes.16,17 An optical fiber array with end faces positioned at different angles was used to produce a multi-focal bundle of beams, as shown in Fig. 7.
Fig. 7: The above schematic view shows Schowengerdt et al.’s fiber array-based scheme.16,17
Comparison of Existing Solutions
Table 1 summarizes the recent approaches to personal NTE light-field displays described above.
Temporal multiplexing offers an opportunity for a better multi-focal display because one can achieve a significant depth range by placing planes as needed, from reasonably close to far away. The main drawback is that in practice the design of real systems will require careful thought about the tradeoffs among the number of gray-scale levels, number of pixels, number of frames, and number of depth planes. Another challenge is making the whole display mechanism compact enough to yield a light and comfortable NTE display.
Spatial multiplexing offers the opportunity to make a more compact light-field display suitable for consumer market segments. The main drawback is that the actual depth achieved in the scene is limited by the small amount of parallax that is possible for light fields using a microlens-array approach, although this can be mitigated using free-form optics.14 The limited resolution of each spatial plane might be a disadvantage, but with OLED microdisplays moving to 4K resolution, there might be enough resolution for many types of applications.
There has been tremendous progress in the past few years in the development of personal NTE light-field displays. As discussed above, the existing approaches all have different tradeoffs based on different design considerations. Thus far no practical design has been demonstrated that satisfies all the key requirements. As seen in Table 1, each approach has some limitations in the current implementations. But we believe that the fundamental approach of light-field sampling in spatial and angular dimensions will be the foundation of future designs that overcome the current limitations and deliver the full set of requirements for the MIG.
Further, in our opinion, the performance evaluations of the systems proposed in the current crop of papers are far from complete – most have not incorporated human studies yet, and those that have mostly considered only monocular accommodative responses. In other words, there is not yet enough scientific evidence to demonstrate that any one approach is better than another. Overall, research in this area is still at an early stage. Thorough end-to-end system evaluation is still needed to assess these systems, for example, in terms of binocular responses, visual comfort, and depth perception.
Besides the challenges mentioned above, there are more open questions to answer in the development of Mobile Information Gateway (MIG) systems. The following list is by no means exhaustive; it is intended to stimulate discussion and new research in this area in the near future.
• How to evaluate the sampling topology of light-field displays to drive the system design in a practical direction?
• How to assess an end-to-end system incorporating optical and perceptual performance parameters?
• How to perceptually model the system performance?
• How to achieve seamless overlay in AR?
• How to calibrate the system to achieve the performance metrics we need in overlay and in system design?
We are confident that there will be substantial progress over the next 5 years, with the first commercial deployments occurring in specific vertical applications before the end of the decade. We believe MIG systems that encompass an NTE light-field display will ultimately revolutionize human-computer and human-world interaction.
1N. Balram and M. Biswas, Designing a Power-Efficient High-Quality Mobile Display System, Seminar M-4, 2014 SID International Symposium, Seminars, and Exhibition.
2M. Meeker, Internet Trends 2014, http://www.kpcb.com/internet-trends, Code Conference (2014).
3R. T. Azuma, A Survey of Augmented Reality, Presence (MIT Press, 1997).
4N. Balram, W. Wu, K. Berkner, and I. Tosic, Mobile Information Gateway – Enabling True Mobility, Proc. IMID (2014).
5D. M. Hoffman, A. R. Girshick, K. Akeley, and M. S. Banks, Vergence–Accommodation Conflicts Hinder Visual Performance and Cause Visual Fatigue, Journal of Vision 8, 3, article 33 (March 2008).
6K. Akeley, S. J. Watt, A. R. Girshick, and M. S. Banks, A Stereo Display Prototype with Multiple Focal Distances, Proc. ACM SIGGRAPH (ACM Transactions on Graphics) 23, 3 (2004).
7K. J. MacKenzie, D. M. Hoffman, and S. J. Watt, Accommodation to Multiple-Focal-Plane Displays: Implications for Improving Stereoscopic Displays and for Accommodation Control, Journal of Vision (2010).
8Google Glass, http://www.google.com/glass/start/
9Oculus Rift, http://www.oculusvr.com
10X. Liu and H. Li, The Progress of Light-Field 3-D Displays, Information Display (2014).
11G. D. Love, D. M. Hoffman, P. J. W. Hands, J. Gao, A. K. Kirby, and M. S. Banks, High-Speed Switchable Lens Enables the Development of a Volumetric Stereoscopic Display, Optics Express 17, 15716–15725 (2009).
12X. Hu and H. Hua, Design and Assessment of a Depth-Fused Multi-Focal-Plane Display Prototype, Journal of Display Technology 10, 4, 308–316 (2014).
13X. Hu and H. Hua, A Depth-Fused Multi-Focal-Plane Display Prototype Enabling Focus Cues in Stereoscopic Displays, SID Symposium Digest of Technical Papers (2011).
14H. Hua and B. Javidi, A 3D Integral Imaging Optical See-Through Head-Mounted Display, Optics Express 22, 11 (2014).
15B. T. Schowengerdt, E. J. Seibel, J. P. Kelly, N. L. Silverman, and T. A. Furness III, Binocular Retinal Scanning Laser Display with Integrated Focus Cues for Ocular Accommodation, Proc. IS&T/SPIE EI (2003).
16B. T. Schowengerdt, M. Murari, and E. J. Seibel, Volumetric Display Using Scanned Fiber Array, SID Symposium Digest of Technical Papers (2010).
17B. T. Schowengerdt and E. J. Seibel, 3D Volumetric Scanned Light Display with Multiple Fiber Optic Light Sources, Proc. IDW (2010).
18G. Wetzstein, D. Lanman, M. Hirsch, and R. Raskar, Tensor Displays: Compressive Light Field Synthesis using Multilayer Displays with Directional Backlighting, Proc. ACM SIGGRAPH (ACM Transactions on Graphics) (2012).
19P-A. Blanche, A. Bablumian, R. Voorakaranam, C. Christenson, W. Lin, T. Gu, D. Flores, P. Wang, W.-Y. Hsieh, M. Kathaperumal, B. Rachwal, O. Siddiqui, J. Thomas, R. A. Norwood, M. Yamamoto, and N. Peyghambarian, Holographic Three-Dimensional Telepresence using Large-Area Photorefractive Polymer, Nature 468, 80–83 (2010).
20G. E. Favalora, J. Napoli, D. M. Hall, R. K. Dorval, M. G. Giovinco, M. J. Richmond, and W. S. Chun, 100 Million-Voxel Volumetric Display, Proc. SPIE (Cockpit Displays IX: Displays for Defense Applications) 4712, 300 (2002).
21A. Sullivan, 3 Deep: New Displays Render Images You Can Almost Reach Out and Touch, IEEE Spectrum 42(4), 30–35 (2005).
22D. Lanman and D. Luebke, Near-Eye Light Field Displays, Proc. ACM SIGGRAPH (ACM Transactions on Graphics) (2013).
23J.-H. Park, Y. Kim, J. Kim, S.-W. Min, and B. Lee, Three-Dimensional Display Scheme Based on Integral Imaging with Three-Dimensional Information Processing, Optics Express 12, 6020–6032 (2004).
24B. Kress and T. Starner, A Review of Head-Mounted Displays (HMD) Technologies and Applications for Consumer Electronics, Proc. SPIE 8720 (2013).
25O. Cakmakci, J. Rolland, and H. Hua, Head-Worn Displays: A Review, Journal of Display Technology 2, 3, 199–216 (2006).
26S. Liu, D. Cheng, and H. Hua, An Optical See-Through Head Mounted Display with Addressable Focal Planes, Proc. IEEE ISMAR (2008).
27A. Maimone, D. Lanman, K. Rathinavel, K. Keller, D. Luebke, and H. Fuchs, Pinlight Displays: Wide Field of View Augmented Reality Eyeglasses Using Defocused Point Light Sources, Proc. ACM SIGGRAPH (2014).
28X. Xiao, B. Javidi, M. Martinez-Corral, and A. Stern, Advances in Three-Dimensional Integral Imaging: Sensing, Display and Applications, Applied Optics 52, 4 (2013).