Recent Developments in Virtual-Reality and Augmented-Reality Technologies
Along with the advances in virtual-reality (VR) and augmented-reality (AR) technologies, many challenges remain. This is therefore an exciting time for the display industry and its engineers. In this article, we present a summary of select new developments reported at Display Week.
by Achintya K. Bhowmik
Virtual- and augmented-reality technologies, along with the increasingly numerous variants referred to as mixed- and merged-reality platforms, represent the next frontier of visual and interactive applications. These devices free us from the confines of conventional screens for immersive viewing experiences, and promise to seamlessly blend digital objects with the physical world. While these concepts have been pioneered over many decades, rapid developments in displays, sensors, and computing technologies in recent years are now pushing this dream toward reality. The applications are wide-ranging and include gaming, media, education, virtual tourism, and e-commerce.
This article provides a synopsis of some of the key new developments in the field as presented at Display Week 2017.
All the ‘Realities’
“What is real?” asks the character Morpheus in the acclaimed 1999 science fiction movie, The Matrix. Then he rhetorically asks, “How do you define ‘real’?” He goes on to answer his own question: “If you’re talking about what you can feel, what you can smell, what you can taste and see, then ‘real’ is simply electrical signals interpreted by your brain.”
Well … this is a profound definition of reality – one that engineers can readily accept! If we can understand the electrical signals that zip around the neurons in the cerebral cortex of our brain as we sense and perceive the world, then we may be able to artificially stimulate the neurons in someone’s brain with similar signals. This would create the illusion of seeing something, or being somewhere, that is completely different from “actual” reality and yet indistinguishable from the physical world. That, precisely, is the goal that virtual-reality engineers around the world are striving to achieve.
There exists some confusion over the terms “virtual” reality and “augmented” reality, not to mention “mixed” reality, “merged” reality, and “extended” reality. The short course on virtual and augmented reality at Display Week 2017 presented by this author made an attempt at defining these terms.1 Virtual reality (VR) places the user in a virtual environment, generating sensory stimuli (visual, vestibular, auditory, haptic, etc.) that provide the sensation of presence and immersion. Augmented and mixed reality (AR and MR) place virtual objects in the real world while providing sensory cues to the user that are consistent between the physical and digital elements. Merged reality blends real-world elements within the virtual environment with consistent perceptual cues, scene understanding, and natural human interactions.
Augmented- and Mixed-Reality Devices
While the research and development of virtual- and augmented-reality technologies have a rich history going back several decades, recent advances in some of the key areas are now starting to make it possible to design and commercialize practical products with compelling new applications. These include significant progress in displays and optics, miniaturized and high-accuracy sensors with low latencies, high-performance and low-power graphics and computer vision processing, ergonomic system designs, understanding of the important human factors and their related issues and mitigations, and more.
The technical symposium at Display Week included several papers and demonstrations that described state-of-the-art results in these key technologies and how they’re being implemented in devices. The product development efforts in the industry that were presented at the event included Microsoft HoloLens, the Meta 2 Augmented Reality headset, Intel Project Alloy, Google Tango, and various other projects. A key enabling technology incorporated in all these devices is simultaneous localization and mapping (SLAM), which is based on sensor-fusion approaches including computer vision and inertial measurement units,2 along with high-fidelity and immersive graphics and display technologies. The devices also included depth sensors for 3D spatial learning and gesture interactions for natural human interfaces. The system-design variants included all-in-one untethered mobile devices as well as head-worn displays connected to a computer.
B. C. Kress et al. detailed the display architectures for the Microsoft HoloLens mixed-reality headset,3 presenting a review of the key display requirements for the untethered system and the optical hardware module selections that were made for the device. The paper described the optical subsystem architecture consisting of the display engine, imaging optics, waveguide modules including diffractive optical elements, and the overall optical module assembly. Figure 1 shows the pupil-forming optics and the display module assemblies for the HoloLens device. The authors also described the user experience considerations that drove the technology selections, with a focus on viewing comfort and immersion.
Fig. 1: On the left, the Microsoft HoloLens display architecture3 incorporates a display module assembly consisting of a liquid-crystal-on-silicon (LCoS) microdisplay and pupil-forming optics. The right image shows the dual-display module assemblies with the shared optics for both eyes.
The second-generation immersive optical see-through AR system from Meta was presented by K. Pulli.4 Consisting of an optical engine based on a freeform visor display with a relatively large (90°) field of view and integrated sensor modules for 3D tracking and gesture interactions, the device is designed to connect with a computer that runs the applications, as depicted in Fig. 2.
Fig. 2: Meta has developed an optical see-through interactive AR display system.4
Among other presentations describing approaches to creating augmented imagery via head-worn display devices were the Lumus optical technology presented by A. Frommer,5 Avegant direct-view optics for near-eye displays presented by A. Gross et al.,6 and a complex amplitude-modulation technique presented by Q. K. Gao et al. from the Beijing Institute of Technology.7
D. Diakopoulos et al. presented the system architecture of the Intel Project Alloy platform,8 an all-in-one merged-reality device incorporating inside-out visual-inertial tracking, depth sensing and 3D spatial-capture technologies, integrated application and graphics processors, and hardware for accelerating computer-vision algorithms. The paper also detailed prototype applications based on scanning the 3D environment and blending real-world elements into the virtually rendered world, as demonstrated in Fig. 3.
Fig. 3: Intel’s Project Alloy all-in-one device merges real-world elements into the virtual world. The top image shows the real-world scene; the middle shows the 3D scanned version of the scene; and the bottom shows the merged-reality environment where the real-world elements have been transformed and blended into the virtually rendered world.8
The technical details and applications for the Google Tango project were presented by J. Lee. This project integrates motion tracking, depth sensing, and area-learning capabilities into a smartphone platform to provide augmented-reality experiences.9 Figure 4 shows demonstrations for two of the applications, including real-time measurements and annotation, as well as augmentation of the real-world scenes with virtually created characters.
Fig. 4: These demonstrations of AR experiences on a smartphone platform are delivered by the Google Tango project. The left image shows real-time measurements and annotation with dimensions, while the right one shows virtual objects blended into a real-world scene.9
Improving Visual Acuity for VR
“How many pixels are really needed for immersive visual experiences with a virtual-reality head-mounted display?” was one of the most common questions raised during and after the short course this author taught at Display Week. So here we reflect on this a bit, and point to some recent developments and trends in the display industry as gleaned from the presentations and demonstrations at this year’s event.
First, let’s consider some basic back-of-the-envelope calculations. Here are some facts related to the human visual system: An ideal human eye has an angular resolution of about 1/60th of a degree in central vision. Each eye has a horizontal field of view (FOV) of ~160° and a vertical FOV of ~175°. The two eyes work together for stereoscopic depth perception over a FOV ~120° wide and ~135° high.1 Current manufacturing processes for both liquid-crystal displays (LCDs) and organic light-emitting diode (OLED) displays produce a uniform pixel density across the entire surface of the spatial light modulators, so matching the eye’s central resolution would require about 60 pixels per degree everywhere. At that density, the numbers above yield a whopping ~100 megapixels for each eye and ~60 megapixels for stereo vision.
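The arithmetic behind those figures can be reproduced in a few lines. This is only a rough sketch: it assumes a uniform 60 pixels per degree in both directions and treats the field of view as a flat angular grid, and the function name is chosen here for illustration.

```python
# Back-of-the-envelope pixel budget for an acuity-limited, uniform-density display.
PIXELS_PER_DEGREE = 60  # 1/60th of a degree per pixel, from the acuity figure above


def pixels_needed(h_fov_deg, v_fov_deg, pixels_per_degree=PIXELS_PER_DEGREE):
    """Pixel count for a display that covers the given FOV at uniform density."""
    return (h_fov_deg * pixels_per_degree) * (v_fov_deg * pixels_per_degree)


per_eye = pixels_needed(160, 175)   # monocular FOV
stereo = pixels_needed(120, 135)    # binocular-overlap FOV

print(f"per eye: {per_eye / 1e6:.1f} megapixels")
print(f"stereo:  {stereo / 1e6:.1f} megapixels")
```

The two results, 100.8 and 58.3 megapixels, round to the ~100 and ~60 megapixel figures quoted above.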
While this would provide perfect 20/20 visual acuity, packing such a high number of pixels into the small screens of a VR head-mounted display (HMD) is obviously not feasible with current technologies. To put this into context, the two displays in the HTC Vive HMD consist of a total of 2.6 megapixels, resulting in quite visible pixelation artifacts. When asked whether pixel densities in current VR HMDs are unacceptable, most people in the short course raised their hands in agreement.
Even if it were possible to make VR displays with 60 to 100 million pixels, there are other system-level constraints that make this impractical. One is the graphics and computation resources necessary to create enough polygons to render visual richness that matches such high pixel density on the screens. Another is bandwidth: current interfaces cannot transport such enormous amounts of data between the computation engines, memory devices, and display screens while also meeting the stringent latency requirements for VR.
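To get a feel for the bandwidth problem, the sketch below estimates the raw, uncompressed video data rate for a few pixel counts. The 90-Hz refresh rate and 24 bits per pixel are assumptions made here for illustration and are not figures from the presentations; real links also carry blanking and protocol overhead that is ignored.

```python
def raw_video_rate_gbps(pixels, refresh_hz=90, bits_per_pixel=24):
    """Uncompressed pixel data rate in gigabits per second.

    refresh_hz and bits_per_pixel are illustrative assumptions.
    """
    return pixels * bits_per_pixel * refresh_hz / 1e9


cases = [
    ("2.6 megapixels (HTC Vive, both displays)", 2.6e6),
    ("60 megapixels", 60e6),
    ("100 megapixels", 100e6),
]

for label, pixels in cases:
    print(f"{label}: {raw_video_rate_gbps(pixels):.1f} Gbit/s")
```

Under these assumptions, the 60- and 100-megapixel cases come to roughly 130 and 216 Gbit/s, compared with under 6 Gbit/s for the 2.6-megapixel case.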
So … is this a dead end? The answer is a resounding “no!” Challenges such as these are what innovators and engineers live for! Let’s look at biology for some clues. How does the human visual system address this dilemma? It turns out that high human visual acuity is limited to a very small visual field, about ±1° around the optical axis of the eye, centered on the fovea. So, if we could track the user’s eye gaze in real time, we could render a high number of polygons in a small area around the viewing direction and reduce that number exponentially with distance from it. Graphics engineers already have a term for such techniques, which are under active exploration: “foveated” or “foveal” rendering. (See the article on foveated rendering in this issue.) This would drastically reduce the graphics workload and the associated power consumption.
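The idea can be illustrated with a toy detail-falloff function. This is a minimal sketch and not any particular renderer’s method: the exponential falloff constant and the minimum-detail floor are hypothetical parameters, and angles are treated as flat 2D coordinates for simplicity.

```python
import math


def eccentricity_deg(pixel_dir_deg, gaze_dir_deg):
    """Angular distance between a pixel direction and the gaze direction.

    Both arguments are (horizontal, vertical) angles in degrees; a flat
    small-angle approximation is used for simplicity.
    """
    return math.hypot(pixel_dir_deg[0] - gaze_dir_deg[0],
                      pixel_dir_deg[1] - gaze_dir_deg[1])


def relative_detail(ecc_deg, fovea_radius_deg=1.0, falloff_deg=10.0, floor=0.02):
    """Relative rendering detail: 1.0 inside the fovea, exponential drop outside.

    falloff_deg and floor are hypothetical tuning parameters.
    """
    if ecc_deg <= fovea_radius_deg:
        return 1.0
    decay = math.exp(-(ecc_deg - fovea_radius_deg) / falloff_deg)
    return max(floor, decay)


def mean_detail(h_fov_deg=120, v_fov_deg=135, gaze=(0.0, 0.0)):
    """Average relative detail over the FOV, sampled on a 1-degree grid."""
    total = 0.0
    for ix in range(h_fov_deg):
        for iy in range(v_fov_deg):
            cell = (ix + 0.5 - h_fov_deg / 2, iy + 0.5 - v_fov_deg / 2)
            total += relative_detail(eccentricity_deg(cell, gaze))
    return total / (h_fov_deg * v_fov_deg)


print(f"workload relative to uniform full detail: {mean_detail():.3f}")
```

The printed value is the fraction of the uniform full-detail workload that remains under this toy model. The actual savings in a real system depend on the falloff chosen and on eye-tracking latency and accuracy.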
Because of the severe limits on pixel density in displays that can be made with current manufacturing technologies, there is a significant ongoing effort to reduce the “screen-door” effect resulting from visible pixelation artifacts. As an example, B. Sitter et al. from 3M presented a technique that incorporates a diffractive film to reduce the screen-door effect and improve the visual quality of a virtual-reality display.10 As shown in Fig. 5, the diffractive film is made with 3M’s precision micro-replication process. The authors also presented a method for measuring the screen-door effect, which they used to demonstrate the efficacy of their technique. In another paper, J. Cho et al. from Samsung presented the results from their work on reducing pixelation artifacts by inserting an optical film that acts as a low-pass filter.11 The technique is illustrated in Fig. 6.
Fig. 5: Researchers from 3M demonstrated a technique to reduce the screen-door effect in virtual- and augmented-reality displays by incorporating diffractive films.10
Fig. 6: A paper from Samsung described the insertion of an optical film in the displays of a VR device that acts as a low-pass filter (left), with demonstrated reduction in the pixelation artifacts (right).11
Toward Matching Convergence and Accommodation
The optics and mechanics of human eyes allow us to “accommodate,” i.e., adjust the shapes of our lenses dynamically to focus on the objects in the physical space that we “converge” our two eyes on in order to dedicate our visual attention to them. As we look at an object that is located at a close distance, we rotate our eyes inward such that the optical axes of both eyes converge on it. At the same time, the lenses of the eyes are made thicker by adjusting the tension in the muscles that hold the lenses in order to bring the light from the object to focus on the retina to form a sharp image. On the other hand, as we shift our visual attention to an object that is located farther away, we rotate our eyes outward so that the optical axes now converge at that distance. In parallel, the lenses are made thinner to adjust the focal lengths accordingly.
For natural viewing conditions in the real world, these convergence and accommodation mechanisms are in sync. In other words, there is a consistent correspondence between where our eyes converge and where the lenses focus. However, currently available VR and AR devices do not provide such a correspondence, which causes visual fatigue and discomfort. The displays in a conventional headset are located at a fixed distance, whereas the virtual objects are rendered at different distances to create a stereoscopic 3D visual environment. This creates a mismatch between the convergence and accommodation mechanisms, as illustrated in the top image of Fig. 7.12
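The size of this mismatch can be estimated with simple geometry. The sketch below is illustrative only: the 63-mm interpupillary distance and the 2-m fixed focal distance of the display are assumptions made here and are not values from the cited paper.

```python
import math


def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two eyes' optical axes when fixating at distance_m.

    ipd_m (interpupillary distance) is an assumed typical value.
    """
    return math.degrees(2 * math.atan(ipd_m / (2 * distance_m)))


def accommodation_demand_diopters(distance_m):
    """Focusing demand in diopters for an object at distance_m."""
    return 1.0 / distance_m


def mismatch_diopters(virtual_distance_m, display_focal_distance_m):
    """Gap between where the eyes converge and where they must focus."""
    return abs(accommodation_demand_diopters(virtual_distance_m)
               - accommodation_demand_diopters(display_focal_distance_m))


DISPLAY_FOCAL_DISTANCE_M = 2.0  # hypothetical fixed virtual image distance

for d in (0.3, 0.5, 1.0, 2.0, 10.0):
    print(f"object at {d:>4} m: vergence {vergence_angle_deg(d):5.2f} deg, "
          f"mismatch {mismatch_diopters(d, DISPLAY_FOCAL_DISTANCE_M):.2f} D")
```

With these assumed numbers, a virtual object rendered at 0.5 m asks the eyes to converge for a 2.0-diopter distance while focusing at 0.5 diopter, a 1.5-diopter gap. The mismatch disappears only for objects rendered at the display’s own focal distance.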
There is significant ongoing research to address this human factors issue. For example, N. Padmanaban et al. from Stanford University reported their work on combining eye-gaze-tracking technology with adaptive-focus displays to minimize the mismatch between the convergence and accommodation points, as depicted in the bottom image of Fig. 7. The authors demonstrated prototypes with both focus-tunable lenses and mechanically actuated displays to dynamically adjust the accommodation points and provide natural focus cues. Demonstrations of technologies designed to solve this problem also included a tunable liquid-crystal lens by A. Jamali et al.,13 and a switchable lens based on a cycloidal diffractive waveplate by Y. H. Lee et al.14
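In outline, a gaze-driven adaptive-focus loop estimates where the two lines of sight cross and then moves the focal plane there. The following is a simplified geometric sketch of that idea, not the authors’ implementation: the function names, the 63-mm interpupillary distance, and the 0-to-4-diopter focus range are all hypothetical.

```python
import math


def convergence_distance_m(left_inward_deg, right_inward_deg, ipd_m=0.063):
    """Distance at which the two gaze rays cross.

    Angles are each eye's inward (nasal) rotation from straight ahead.
    With the eyes ipd_m apart, tan(left) + tan(right) = ipd_m / distance.
    """
    spread = (math.tan(math.radians(left_inward_deg))
              + math.tan(math.radians(right_inward_deg)))
    if spread <= 0:
        return math.inf  # parallel or diverging gaze: treat as far away
    return ipd_m / spread


def target_focus_diopters(distance_m, min_diopters=0.0, max_diopters=4.0):
    """Focal-plane setting for the estimated distance, clamped to a
    hypothetical range supported by the tunable optics."""
    demand = 0.0 if math.isinf(distance_m) else 1.0 / distance_m
    return min(max(demand, min_diopters), max_diopters)


# Example: both eyes rotated inward by the angle that corresponds to 1 m.
angle = math.degrees(math.atan(0.0315 / 1.0))
distance = convergence_distance_m(angle, angle)
print(f"estimated distance: {distance:.2f} m, "
      f"focus setting: {target_focus_diopters(distance):.2f} D")
```

A real system would also need to filter eye-tracker noise and account for the response time of the tunable lens or display actuator, both of which are ignored here.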
Fig. 7: An accommodation-convergence mismatch occurs in conventional VR and AR displays where the convergence points at the stereoscopic distances do not match the virtual image distances (top figure). Dynamic focus displays provide focus cues that are consistent with the convergence cues (bottom figure).12
Clearly, we are still in the early days of VR and AR technologies, with many challenges remaining to be solved, including presenting adequate visual acuity and truly immersive experiences on the displays. So, this is an exciting time for the display industry and its engineers, reminiscent of the days at the onset of display technology advances toward HDTVs and smartphones. The special track on VR and AR technologies at Display Week 2017 consisted of papers and demonstrations of new developments in this burgeoning field, including both commercially available products and results from ongoing research toward understanding and resolving key technical issues on the way to achieving compelling user experiences. There is much to look forward to at the next Display Week!
1. A. K. Bhowmik, “Virtual & Augmented Reality: Towards Life-Like Immersive and Interactive Experiences,” Short Course S-2, Display Week 2017.
2. A. K. Bhowmik, “Sensification of Computing: Adding Natural Sensing and Perception Capabilities to Machines,” APSIPA Transactions on Signal and Information Processing, 6, 1, 2017.
3. B. C. Kress and W. J. Cummings, “Towards the Ultimate Mixed Reality Experience: HoloLens Display Architecture Choices,” Paper 11.1, Display Week 2017.
4. K. Pulli, “Meta 2: Immersive Optical-See-Through Augmented Reality,” Paper 11.2, Display Week 2017.
5. A. Frommer, “Lumus Optical Technology for AR,” Paper 11.3, Display Week 2017.
6. A. Gross, E. Tang, N. Welch, and S. Dewald, “Direct View Optics for Near-Eye Displays,” Paper 11.4, Display Week 2017.
7. Q. K. Gao, J. Liu, J. Han, X. Li, T. Zhao, and H. Ma, “True 3D Realization in the See-Through Head-Mounted Display with Complex Amplitude Modulation,” Paper 51.1, Display Week 2017.
8. D. Diakopoulos and A. K. Bhowmik, “Project Alloy: An All-in-One Virtual and Merged Reality Platform,” Paper 4.2, Display Week 2017.
9. J. Lee, “Mobile AR in Your Pocket with Google Tango,” Paper 4.1, Display Week 2017.
10. B. Sitter, J. Yang, J. Thielen, N. Naismith, and J. Lonergan, “Screen Door Effect Reduction with Diffractive Film for Virtual Reality and Augmented Reality Displays,” Paper 78.3, Display Week 2017.
11. J. Cho, Y. Kim, S. Jung, H. Shin, and T. Kim, “Screen Door Effect Mitigation and Its Quantitative Evaluation in VR Display,” Paper 78.4, Display Week 2017.
12. N. Padmanaban, R. Konrad, T. Stramer, E. Cooper, and G. Wetzstein, “Optimizing Virtual Reality User Experience through Adaptive Focus Displays and Gaze Tracking Technology,” Paper 4.3, Display Week 2017.
13. A. Jamali, D. Bryant, Y. Zhang, A. Grunnet-Jepsen, A. Bhowmik, and P. Bos, “Design Investigation of Tunable Liquid Crystal Lens for Virtual Reality Displays,” Paper 72.3, Display Week 2017.
14. Y. H. Lee, G. Tan, Y. Weng, and S. T. Wu, “Switchable Lens based on Cycloidal Diffractive Waveplate for AR and VR Applications,” Paper 72.4, Display Week 2017.
Dr. Achintya Bhowmik is the chief technology officer and executive vice president of engineering at Starkey, the largest hearing technology company in the US. In this role, he is responsible for leading the company’s research and engineering efforts. Prior to joining Starkey, Bhowmik was vice president and general manager of the Perceptual Computing Group at Intel Corp. He can be reached at firstname.lastname@example.org.