Advances in Virtual, Augmented, and Mixed Reality Technologies

Virtual, augmented, and mixed reality (VR/AR/MR) technologies offer the promise of seamlessly blending the real and the virtual worlds, thereby enabling many exciting immersive and interactive applications and experiences. This article provides a synopsis of some of the key new developments in the field presented at Display Week 2018.

by Achintya K. Bhowmik

DISPLAY Week 2018, the annual conference organized by the Society for Information Display, featured a special track on virtual- and augmented-reality (VR/AR) technologies and applications. This was quite timely, given the rapid technology developments on this topic in recent years, as evidenced by the increasing number of companies introducing new products and universities offering specialized courses in associated technologies.

The VR/AR special track in this year’s conference included a keynote speech delivered by Doug Lanman from Oculus/Facebook Research Labs, a short course taught by this author, a seminar presented by Robert Konrad from Stanford University, several talks in the market focus conference, an extensive array of technical papers in the symposium, and a number of live demonstrations in the exhibit hall.

VR/AR devices promise exciting immersive experiences in the areas of gaming and entertainment, education, tourism, and medical applications, to name a few. The state-of-the-art results presented and demonstrated at Display Week this year are bringing virtual- and augmented-reality experiences ever closer to reality.

The March Toward Increasing Resolution

In the review article following Display Week 2017,1 we included a section on the topic of improving visual acuity for virtual-reality devices, addressing a popular question: “How many pixels are really needed for immersive visual experiences with a virtual-reality head-mounted display?” In that discussion, we referred to some facts related to the human visual system. An ideal human eye has an angular resolution of about 1/60th of a degree at central vision. Each eye has a horizontal field-of-view (FOV) of ~160° and a vertical FOV of ~175°. The two eyes work together for stereoscopic depth perception over ~120° wide and ~135° high FOV.2 These numbers yield a whopping ~100 megapixels for each eye and ~60 megapixels for stereo vision to provide the visual acuity of 60 pixels per degree (ppd)! Packing 60 to 100 million pixels in a small near-to-eye display is clearly not feasible with today’s manufacturing technology. However, some significant progress has been made over the past year, which was reported at this year’s Display Week technical symposium as well as demonstrated in the exhibit hall.
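
To sanity-check these figures, the short Python sketch below simply multiplies the field-of-view numbers cited above by the 60-ppd acuity target; it assumes a rectangular visual field and ignores optical distortion, so it is only a back-of-the-envelope estimate.

```python
# Back-of-the-envelope pixel counts for 60-ppd acuity over the fields of view
# cited above (rectangular-field approximation; optics and distortion ignored).
PPD = 60  # assumed acuity target, pixels per degree

def pixels_needed(h_fov_deg, v_fov_deg, ppd=PPD):
    """Pixels required to cover a rectangular field of view at a given acuity."""
    return (h_fov_deg * ppd) * (v_fov_deg * ppd)

per_eye = pixels_needed(160, 175)  # monocular field of view of one eye
stereo = pixels_needed(120, 135)   # binocular overlap region

print(f"Per eye: {per_eye / 1e6:.0f} megapixels")        # ~101 megapixels
print(f"Stereo overlap: {stereo / 1e6:.0f} megapixels")  # ~58 megapixels
```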

Vieri et al. reported the design and fabrication of a 4.3-in. (diagonal) organic light-emitting diode (OLED)-on-glass display with over 18 million pixels in a 3,840 × 4,800 format.3 This prototype, intended for head-mounted displays (HMDs) in VR applications, thus achieved a very impressive resolution of 1,443 pixels per inch (ppi) with a 17.6-µm pixel pitch. The authors calculated that such a display would provide an angular resolution of ~40 ppd when integrated into an optical system with a 40-mm focal length, yielding a ~120° horizontal and ~100° vertical viewing angle. Such a pixel density is well above what is available in today’s commercial virtual-reality devices and is approaching the specification required to match human visual acuity.
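
The ~40-ppd figure can be reproduced from the reported pixel pitch with a simple magnifier model, sketched below; the small-angle approximation is our own simplification for illustration, not the authors' exact optical calculation.

```python
import math

# Under a simple magnifier model, a pixel of pitch p viewed through optics of
# focal length f subtends roughly p / f radians (small-angle approximation).
pixel_pitch_mm = 25.4 / 1443   # 1,443 ppi -> ~17.6-um pixel pitch
focal_length_mm = 40.0         # focal length assumed in the paper's calculation

deg_per_pixel = math.degrees(pixel_pitch_mm / focal_length_mm)
print(f"{1.0 / deg_per_pixel:.0f} pixels per degree")  # ~40 ppd
```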

Demonstrations in the exhibit area also included a 3.5-in. miniLED display panel shown by BOE that featured 4,320 × 4,800 pixels with 1,850-ppi resolution, integrated in a headset with 100° FOV. Samsung Display showed a 2.43-in. OLED panel with 1,200-ppi resolution, and JDI demonstrated a 3.25-in. LCD module with 2,160 × 2,432 pixels and 1,001 ppi.

Pushing the frontiers of pixel density even further, Fujii et al. presented a 0.5-in. microdisplay with an astonishing 4,032-ppi resolution and a 6.3-µm pixel pitch, based on OLED-on-silicon backplane technology.4 With the relatively small size of the current display, the targeted applications are in near-to-eye systems such as electronic viewfinders.

Besides the manufacturing advances that pack ever more pixels onto display panels, as reported above, other research efforts aim to enhance the perceived resolution by adding appropriate optical elements. For example, Zhan et al. described a method to double the perceived pixel density in near-to-eye displays using a fast-switching liquid-crystal phase deflector,5 and Peng et al. presented a technique to enhance the resolution of light-field near-to-eye displays by using a birefringent plate.6

Dynamic Focus Cues

Achieving a truly immersive experience with VR, AR, and MR devices requires providing natural perceptual cues to the user. An important area of continued investigation is to understand the human visual and oculomotor cues that are vital to perceiving 3D structures in the real world, and to mimic those mechanisms with technologies implemented within virtual- and augmented-reality headsets. In a seminar titled “Computational Near-Eye Displays with Focus Cues,” Konrad reviewed the fundamentals and various methods that are being explored.7 The author explained the interplay between the convergence and accommodation mechanisms of the human visual system when viewing a natural 3D scene, as shown in Fig. 1.

Fig. 1:  When viewing a natural scene, our eyes converge on the object of interest in the 3D space, while the lenses of our eyes adjust accordingly to bring the light reflected by the object to focus on our retinas.7
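
To make the interplay concrete, the sketch below computes both demands for a given fixation distance; the 63-mm interpupillary distance is an assumed typical value for illustration, not a figure from the seminar.

```python
import math

def vergence_accommodation(distance_m, ipd_m=0.063):
    """Vergence angle (degrees) and accommodation demand (diopters) when the
    eyes fixate an object at distance_m; ipd_m is an assumed typical
    interpupillary distance."""
    vergence_deg = math.degrees(2 * math.atan((ipd_m / 2) / distance_m))
    accommodation_diopters = 1.0 / distance_m  # diopters = 1 / distance (m)
    return vergence_deg, accommodation_diopters

# In natural viewing the two cues agree; on a fixed-focus headset the
# accommodation demand stays pinned at the display's focal distance while
# vergence follows the rendered depth, creating the well-known conflict.
print(vergence_accommodation(0.5))  # ~7.2 deg, 2.0 diopters
print(vergence_accommodation(2.0))  # ~1.8 deg, 0.5 diopters
```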

Appropriately, a number of papers presented at the Display Week symposium focused on this important topic. For example, Dunn et al. described a prototype AR display that provides variable focus using a deformable beam splitter and an LCD panel.8 As shown in Fig. 2, this display system adjusts the perceived distance of the displayed image by dynamically modifying the optical power of a reflective, adjustable membrane to match the gaze of the user. The authors report that the system provides a focal range of 11 diopters, effectively between 10 cm and optical infinity. While this prototype is much bulkier than would be acceptable for a commercially viable device, further developments are expected to improve the form factor.

Among several other promising approaches, Jamali et al. presented a continuously variable lens system,9 and Liu et al. presented a method involving a liquid-crystal lens and dual-layer LCD panels.10

Fig. 2:  This prototype display provides variable focus using a deformable beam splitter in conjunction with an LCD panel.8

Foveated Rendering and Display

In last year’s review article, we also discussed the challenges involved in rendering visual images of 60 to 100 million pixels per frame and transporting this vast amount of data to the display.1 Again, the human visual system points to a way of mitigating this challenge. High visual acuity is limited to a very small visual field, about ±1° around the optical axis of the eye, centered on the fovea. So if we could track the user’s eye gaze in real time, we could render at full detail only in a small area around the viewing direction and drop the rendering detail off exponentially as we move away from central vision. Graphics engineers have a term for such technologies, which are already being explored: “foveated” or “foveal” rendering. This approach would drastically reduce the graphics workload and the associated power consumption.
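
As an illustration of the idea, and not a method from any of the cited papers, the sketch below chooses a rendering-resolution scale from the angular distance between a screen region and the tracked gaze point, keeping full detail inside an assumed ~1° foveal zone and letting it fall off exponentially outside it.

```python
import math

def foveated_scale(eccentricity_deg, fovea_deg=1.0, falloff=0.25):
    """Fraction of full rendering resolution to use at a given angular distance
    from the gaze point: full detail within the foveal region, exponential
    falloff outside it (all parameters are illustrative assumptions)."""
    if eccentricity_deg <= fovea_deg:
        return 1.0
    return math.exp(-falloff * (eccentricity_deg - fovea_deg))

for ecc in (0, 1, 5, 10, 30):
    print(f"{ecc:>2} deg from gaze -> {foveated_scale(ecc):.2f}x resolution")
```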

The work reported by Vieri et al. also included a driving scheme for foveated rendering and for transporting the image data to the display.3 Specifically, the authors propose decomposing each high-resolution frame into a high-acuity domain and a low-acuity domain, for example into 640 × 640 and 1,280 × 1,600 pixel formats, respectively, and packing them into a single frame, as shown in the left diagram of Fig. 3, for transport to the display panel. The foveation logic incorporated into the driver chip is shown on the right of Fig. 3.

Fig. 3:  The left figure shows how the high-acuity and low-acuity image data are packed into a single frame for transporting the pixel values from the memory into the display. The right figure shows the block diagram of the foveation logic.3
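
A rough estimate shows why such packing helps: comparing the pixel counts of the two transported domains with the full 3,840 × 4,800 frame (ignoring packing overhead and any per-pixel bit-depth differences) suggests only about an eighth of the raw pixel data needs to be moved.

```python
# Pixel counts of the packed foveated frame vs. the full-resolution frame,
# using the example formats given above (overheads ignored).
high_acuity = 640 * 640     # region around the gaze point, full density
low_acuity = 1280 * 1600    # remainder of the field at reduced density
full_frame = 3840 * 4800

transported = high_acuity + low_acuity
print(f"{transported / full_frame:.1%} of the full-frame pixel data")  # ~13.3%
```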

Advances in Computer Vision and Spatial Tracking

While the visual display is arguably the most important component in a virtual-, augmented-, or mixed-reality headset, accurate, real-time spatial tracking and computer vision technologies are also crucial to providing an immersive experience.11 Traditional VR platforms have largely involved headsets that are tethered to a computer or gaming console and rely on external tracking systems, including infrared light emitters and detectors. However, it is widely accepted that future mass adoption of VR/AR/MR devices by mainstream users will require standalone headsets that incorporate the computing engines within the devices, as well as self-contained tracking systems, also referred to as inside-out tracking. Life-like interactions in 3D space also require that the devices be capable of low-latency tracking with six degrees of freedom (6DOF), whereas many of the standalone or smartphone-based VR headsets commercially available today are only capable of spatial tracking with three degrees of freedom (3DOF). Systems that track only the three angular rotations (yaw, pitch, and roll) have 3DOF, whereas those that also track positional movement along the three Cartesian axes have 6DOF. A state-of-the-art algorithmic approach combines computer vision and motion sensing, using imaging and inertial sensors, in techniques such as visual-inertial odometry (VIO) and simultaneous localization and mapping (SLAM).
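
To make the 3DOF/6DOF distinction concrete, here is a minimal, illustrative pose representation with a gyroscope-only orientation update. It sketches only the bookkeeping, not the SLAM or VIO pipelines described in the papers; in practice the inertial estimate drifts and is continuously corrected by the visual tracking.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    # 3 rotational DOF: orientation as a unit quaternion (w, x, y, z).
    q: tuple = (1.0, 0.0, 0.0, 0.0)
    # 3 translational DOF: position in meters. A 3DOF headset tracks only q;
    # a 6DOF headset tracks both q and p.
    p: tuple = (0.0, 0.0, 0.0)

def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def integrate_gyro(q, omega, dt):
    """Rotate q by the gyroscope rate omega (rad/s, body frame) over dt seconds."""
    wx, wy, wz = omega
    rate = math.sqrt(wx*wx + wy*wy + wz*wz)
    if rate < 1e-9:
        return q
    half = 0.5 * rate * dt
    s = math.sin(half) / rate
    return quat_mul(q, (math.cos(half), wx*s, wy*s, wz*s))

# One 1-kHz IMU step while the head yaws at ~30 deg/s.
pose = Pose()
pose.q = integrate_gyro(pose.q, (0.0, 0.0, math.radians(30)), 0.001)
```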

A number of papers and demonstrations addressed this topic. For example, Lieberman et al. presented a 6DOF SLAM technique with sub-millisecond latency based on a linear imaging sensor.12 Other work reported in the symposium included a semantic SLAM method including both tracking and object detection,13 and a review of tracking applications using visual-inertial odometry based on artificial intelligence, specifically deep learning techniques, as shown in Fig. 4.14 Devices with built-in imaging and inertial sensors, and algorithms such as SLAM and VIO, are capable of inside-out 6DOF spatial tracking, rather than the traditional approach of relying on external tracking setups.

Beyond 3D spatial tracking and presentation of 3D visual information on immersive near-to-eye displays, the virtual- and augmented-reality experiences will also include semantic understanding of the environment and user interactions. Thus, there is an increasing focus on real-time visual understanding based on 3D computer vision using artificial intelligence techniques, in conjunction with the miniaturization and system integration of depth-imaging sensors.15

Fig. 4:  Above is an example of utilizing artificial intelligence techniques based on deep learning in visual-inertial odometry and spatial tracking in virtual and augmented reality applications.14 In this specific approach, a convolutional neural network (CNN) followed by a recurrent neural network (RNN) is deployed to derive real-time 3D pose information.
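
The block below is a minimal PyTorch sketch of the CNN-followed-by-RNN pattern the figure describes; the layer sizes, the 64 × 64 grayscale input, and the 6-axis IMU stream are illustrative assumptions, not the architecture of the referenced work.

```python
import torch
import torch.nn as nn

class VIONet(nn.Module):
    """Illustrative CNN + RNN pose regressor (a sketch, not the authors' model)."""
    def __init__(self, hidden=256):
        super().__init__()
        # CNN extracts visual features from each frame (assumed 1 x 64 x 64).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())          # -> 32 * 4 * 4 = 512
        # RNN fuses the visual features with 6-axis IMU readings over time.
        self.rnn = nn.LSTM(input_size=512 + 6, hidden_size=hidden, batch_first=True)
        # Head regresses a 6DOF pose increment: 3 translations + 3 rotations.
        self.head = nn.Linear(hidden, 6)

    def forward(self, frames, imu):
        # frames: (batch, time, 1, H, W); imu: (batch, time, 6)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.reshape(b * t, *frames.shape[2:])).reshape(b, t, -1)
        out, _ = self.rnn(torch.cat([feats, imu], dim=-1))
        return self.head(out)  # per-timestep 6DOF pose deltas

# Usage with random tensors: 2 sequences of 8 frames and 8 IMU samples.
model = VIONet()
poses = model(torch.randn(2, 8, 1, 64, 64), torch.randn(2, 8, 6))  # -> (2, 8, 6)
```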

Advances in Many Areas Are Taking AR/VR/MR Toward the Mainstream

While many challenges remain in advancing the technologies to bring virtual-, augmented-, and mixed-reality devices and applications into mainstream adoption, significant results continue to be accomplished and demonstrated. Display Week 2018 featured a special track to facilitate the review and presentation of the advances made around the world, both in academia and the industry. In this article, we have highlighted some of the key results, covering the areas of immersive visual displays with enhanced visual acuity, natural visual cues for 3D perception, spatial tracking, and semantic understanding with artificial intelligence.

References

1A.K. Bhowmik, “Recent Developments in Virtual-Reality and Augmented-Reality Technologies,” Information Display 6(2), 2017.

2A.K. Bhowmik, “Fundamentals of Virtual and Augmented Reality Technologies,” Short Course S-2, SID Display Week 2018.

3C. Vieri, G. Lee, N. Balram, S.H. Jung, J.Y. Yang, S.Y. Yoon, and I.B. Kang, “An 18 Megapixel 4.3-in. 1,443-ppi 120-Hz OLED Display for Wide Field-of-View High-Acuity Head-Mounted Displays,” JSID 26(5), 314–324, 2018.

4T. Fujii, C. Kon, Y. Motoyama, K. Shimizu, T. Shimayama, T. Yamazaki, T. Kato, S. Sakai, K. Hashikaki, K. Tanaka, and Y. Nakano, “4,032-ppi High-Resolution OLED Microdisplay,” JSID 26(5), 178–186, 2018.

5T. Zhan, Y-H. Lee, and S-T. Wu, “Doubling the Pixel Density of Near‐Eye Displays,” SID Display Week Symposium Digest 49(1), 13–16, 2018.

6K-E. Peng, J-Y. Wu, Y-P. Huang, H-H. Lo, C-C. Chang, F-M. Chuang, “Resolution-Enhanced Light-Field Near‐to‐Eye Display Using E‐shifting Method with Birefringent Plate,” SID Display Week Symposium Digest 49(1), 2018.

7R. Konrad, “Computational Near-Eye Displays with Focus Cues,” Seminar SE-1, SID Display Week 2018.

8D. Dunn, P. Chakravarthula, Q. Dong, K. Akşit, and H. Fuchs, “Towards Varifocal Augmented Reality Displays Using Deformable Beamsplitter Membranes,” SID Display Week Symposium Digest 49(1), 92–95, 2018.

9A. Jamali, C. Yousefzadeh, C. McGinty, D. Bryant, and P. Bos, “A Continuous Variable Lens System to Address the Accommodation Problem in VR and 3D Displays,” SID Display Week Symposium Digest 49(1), 1721–1724, 2018.

10M. Liu, H. Li, and X. Liu, “A Deep Depth of Field Near Eye Light Field Displays Utilizing LC Lens and Dual‐layer LCDs,” SID Display Week Symposium Digest 49(1), 96–99, 2018.

11A.K. Bhowmik, “Interactive and Immersive Devices with Perceptual Computing Technologies,” Molecular Crystals and Liquid Crystals (647), 329, 2017.

12K. Lieberman, D. Greenspan, O. Moses, and G. Barach, “Ultra‐High‐Speed 6DOF SLAM Using Optical Compression,” SID Display Week Symposium Digest 49(1), 388–390, 2018.

13B. Yu, Y. Li, C.P. Chen, N. Maitlo, J. Chen, W. Zhang, and L. Mi, “Semantic Simultaneous Localization and Mapping for Augmented Reality,” SID Display Week Symposium Digest 49(1), 391–394, 2018.

14H. Menon, A. Ramachandrappa, and J. Kesinger, “Deep‐Learning Based Approaches to Visual‐Inertial Odometry for Autonomous Tracking Applications,” SID Display Week Symposium Digest 49(1), 471–474, 2018.

15K. Vodrahalli and A. Bhowmik, “3D Computer Vision Based on Machine Learning with Deep Neural Networks: A Review,” JSID (25), 676, 2018.  •


Dr. Achintya Bhowmik is the chief technology officer and executive vice president of engineering at Starkey Hearing Technologies, a privately held medical devices company with operations in more than 100 countries worldwide. He is responsible for overseeing the company’s research and engineering departments, and is leading the drive to redefine medical wearable devices with advanced sensors and artificial intelligence technologies. Prior to joining Starkey, Bhowmik was vice president and general manager of the Perceptual Computing Group at Intel Corp. He can be reached at achintya.k.bhowmik@gmail.com.