Guest Editorial

Solving the Rendering Puzzle

by Nikhil Balram

The last special issue we did on light fields, back in July/August 2016, provided an overview and update on light-field displays. We divided these into two categories – displays for group viewing and displays for personal use (head-mounted displays, or HMDs) – because of the distinctly different challenges and trade-offs in a display meant for many users to view simultaneously, vs. just one. In either case, the objective was to present a representation of “the light field,” the radiance that emanates from every point in the world and is visible to the human eye.

Display hardware elements continue to advance, with panels that have higher spatial resolution and faster response times, and with optics incorporating higher-quality lenses and fast-switching variable focus elements, enabling us to make better light-field displays. But there is another big bottleneck upstream – the generation of the light-field radiance image. In this special issue, we look at the state of the art in rendering for light-field displays, as well as major directions being followed to address the big challenges.

The first article in this issue, “Light-Field Displays and Extreme Multiview Rendering” by Thomas Burnett, provides an overview of the architecture of fully featured, group-viewable light-field displays. One example is a light-field table that a roomful of generals might use to view a battlefield simulation, or that a bar full of sports fans might use to watch a World Cup soccer match. Creating a rich and deep light-field volume requires a large number of views (pixels) per microlens element (hogel), possibly as many as 256 × 256 or even 512 × 512. Traditional graphics pipelines were not designed for such extreme multiview rendering and become extremely inefficient when thousands of render passes are needed to update a display.
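To make the scale concrete, the following Python sketch multiplies the number of hogels by the number of views per hogel quoted above. The 500 × 500 hogel array is a hypothetical size chosen only for illustration. It is not a figure from Thomas’s article.

```python
def light_field_pixels(hogels_x: int, hogels_y: int, views_x: int, views_y: int) -> int:
    """Radiance samples per frame: every hogel carries views_x * views_y views."""
    return hogels_x * hogels_y * views_x * views_y


# Hypothetical hogel array size, chosen only for illustration.
HOGELS_X, HOGELS_Y = 500, 500

for views in (256, 512):  # views per hogel along each axis, as quoted in the text
    per_hogel = views * views
    total = light_field_pixels(HOGELS_X, HOGELS_Y, views, views)
    print(f"{views}x{views}: {per_hogel:,} views per hogel, "
          f"{total / 1e9:.1f} gigapixels per frame")
```

Under these assumptions, 256 × 256 views per hogel (65,536 views) already gives about 16.4 gigapixels per light-field frame, and 512 × 512 gives about 65.5 gigapixels. This is far beyond what a pipeline built to render one or two views per frame was designed to handle.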

Thomas’s article explains which rendering algorithms are best suited to such extreme multiview light fields and the major bottlenecks they face. He proposes a new scalable processing architecture, the heterogeneous display environment (HDE), in which the major portions of the graphics pipeline are split between host and display-processing elements. The host computes the scene geometry and sends a high-level description of it to the various displays through a newly proposed object graphics language (ObjGL) application programming interface (API). Each display is then responsible for its own rendering, using one or more newly defined multiview processing units (MvPUs) optimized for extreme multiview rendering.
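The division of labor in the HDE can be illustrated with a toy Python model. Everything in it is hypothetical. The `SceneObject` and `Display` classes, the `host_update` function, the round-robin work split and the `num_units` parameter are stand-ins for ObjGL and the MvPUs, and they do not reproduce the actual interfaces proposed in the article. The model shows only the basic pattern: the host ships a compact scene description, and each display decides how to spread its own hogel rendering across its processing units.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical high-level, display-agnostic object description from the host."""
    name: str
    mesh_id: str
    position: tuple[float, float, float]


@dataclass
class Display:
    """Hypothetical display that renders its own views from the shared scene."""
    hogels_x: int
    hogels_y: int
    num_units: int  # stand-in for the number of multiview processing units
    scene: list[SceneObject] = field(default_factory=list)

    def submit(self, scene: list[SceneObject]) -> None:
        # Only the compact scene description crosses the host/display link;
        # the host never sends rendered light-field pixels.
        self.scene = list(scene)

    def assign_hogels(self) -> dict[int, list[tuple[int, int]]]:
        # Round-robin split: hogel i goes to unit i mod num_units, so the
        # units render disjoint subsets of this display's hogels.
        work: dict[int, list[tuple[int, int]]] = {u: [] for u in range(self.num_units)}
        for i in range(self.hogels_x * self.hogels_y):
            work[i % self.num_units].append((i % self.hogels_x, i // self.hogels_x))
        return work


def host_update(scene: list[SceneObject], displays: list[Display]) -> None:
    """Host side: describe the scene once, then broadcast that description."""
    for display in displays:
        display.submit(scene)


# Usage: one scene, two displays with different sizes and unit counts.
scene = [SceneObject("terrain", "mesh-0", (0.0, 0.0, 0.0)),
         SceneObject("vehicle", "mesh-1", (1.0, 0.0, 2.0))]
table = Display(hogels_x=4, hogels_y=4, num_units=3)
wall = Display(hogels_x=8, hogels_y=2, num_units=2)
host_update(scene, [table, wall])
```

Because each display plans its own work, a larger display can add processing units without changing anything on the host side. This is the scalability argument behind separating host and display processing.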

In the case of HMDs, only a single viewpoint, the so-called “first-person perspective,” needs to be generated. This allows the light-field volume to be simplified significantly. It can be approximated with a discrete set of focal planes (“multifocal plane display”), or even with a single plane whose focal distance is adjusted dynamically based on where the user is gazing (“varifocal display”). Here the big rendering challenge is the limited compute and thermal budget available on a mobile system.
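Both HMD simplifications can be sketched in a few lines of Python. The focal-plane positions and the varifocal lens range are illustrative assumptions. The multifocal function uses linear depth-weighted blending in diopters, which is one common way to place content between fixed planes. It is not presented as the method used by the articles in this issue.

```python
from bisect import bisect_right

# Hypothetical focal-plane placement in diopters (1/meters), sorted ascending:
# 0.0 D is optical infinity, 3.0 D is about 33 cm from the eye.
PLANES_D: tuple[float, ...] = (0.0, 1.0, 2.0, 3.0)


def blend_weights(depth_m: float, planes_d: tuple[float, ...] = PLANES_D) -> list[float]:
    """Multifocal: split a pixel's intensity between the two planes that bracket
    its depth, with weights linear in dioptric distance."""
    d = 1.0 / depth_m
    d = min(max(d, planes_d[0]), planes_d[-1])  # clamp to the span of the plane stack
    weights = [0.0] * len(planes_d)
    i = bisect_right(planes_d, d) - 1
    if i >= len(planes_d) - 1:  # at or nearer than the nearest plane
        weights[-1] = 1.0
        return weights
    t = (d - planes_d[i]) / (planes_d[i + 1] - planes_d[i])
    weights[i] = 1.0 - t
    weights[i + 1] = t
    return weights


def varifocal_power(gaze_depth_m: float, min_d: float = 0.0, max_d: float = 3.0) -> float:
    """Varifocal: a single plane, moved to the dioptric distance being fixated,
    clamped to a hypothetical lens range."""
    return min(max(1.0 / gaze_depth_m, min_d), max_d)


# Usage: a pixel 0.8 m away falls between the 1 D and 2 D planes.
print(blend_weights(0.8))
print(varifocal_power(0.5))
```

Working in diopters rather than meters matches how focus error is perceived. For this reason, planes are usually spaced evenly in diopters, and blending is done linearly in diopters as well.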

In his keynote speech at Display Week 2017, Clay Bavor, VP for VR and AR at Google, talked about the need for very high pixel counts to approach human visual acuity in VR and AR displays, and about the challenges of driving them. He reminded us that despite the richness of the world image we perceive, the optic nerve has a bandwidth of only ~10 Mb/s. The optic nerve carries a representation of the image captured by the retina to the visual cortex for processing. This elegant information-processing architecture relies on the retina’s primary daylight receptors (cones) being densely concentrated in a very small central region called the fovea. Rapid eye movements called saccades continuously redirect the fovea across the scene, building up a full-scene representation over time. So, at any particular moment, the front end of the human visual system captures a high-resolution image spanning only a few degrees of the visual field.

This foveated capture inspires a set of efficient techniques, called foveated rendering, for rendering a rich world image to the single user of an HMD. These techniques use knowledge of the user’s eye movements to deliver high-resolution imagery only in the specific region being looked at, while rendering the periphery at lower resolution.
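The following minimal Python sketch illustrates the idea. Each screen tile is shaded at a resolution chosen from its angular distance (eccentricity) from the gaze point, and the function reports the resulting fraction of full-resolution pixel work. Several choices are hypothetical and used only for illustration: the three resolution tiers, the 5° and 15° thresholds, the tile size, and the uniform degrees-per-pixel approximation. None of these are parameters from the Bastani et al. article.

```python
import math


def shading_scale(ecc_deg: float) -> float:
    """Hypothetical three-tier policy: linear resolution scale vs. eccentricity."""
    if ecc_deg <= 5.0:    # foveal inset: full resolution
        return 1.0
    if ecc_deg <= 15.0:   # mid ring: half linear resolution
        return 0.5
    return 0.25           # far periphery: quarter linear resolution


def foveated_cost(width_px: int, height_px: int, fov_deg: float,
                  gaze_px: tuple[float, float], tile_px: int = 32) -> float:
    """Fraction of full-resolution pixel work when each tile is shaded at the
    scale chosen from its eccentricity relative to the gaze point."""
    deg_per_px = fov_deg / width_px  # uniform approximation; real lenses distort this
    work = 0.0
    area = 0
    for ty in range(0, height_px, tile_px):
        for tx in range(0, width_px, tile_px):
            w = min(tile_px, width_px - tx)   # clip partial tiles at the edges
            h = min(tile_px, height_px - ty)
            cx, cy = tx + w / 2, ty + h / 2   # tile center
            ecc = math.hypot(cx - gaze_px[0], cy - gaze_px[1]) * deg_per_px
            s = shading_scale(ecc)
            work += w * h * s * s  # pixel count scales with the square of linear scale
            area += w * h
    return work / area


# Usage: a 1024 x 1024 eye buffer spanning 100 degrees, gaze at the center.
print(f"{foveated_cost(1024, 1024, 100.0, (512.0, 512.0)):.3f}")
```

With these assumptions the centered-gaze case costs well under a fifth of full-resolution rendering. In a real system, the gaze point would come from an eye tracker every frame, and the tiers would be tuned so the transition is not visible to the user.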

The second article in this issue, “Foveated Pipeline for AR/VR Head-Mounted Displays” by Behnam Bastani et al., provides an overview of the foveated rendering and display-processing algorithms and architecture needed to deliver images of high perceived fidelity to the viewer, using knowledge of where she is looking. These algorithms and this architecture must fit efficiently within the compute and thermal constraints of mobile processing, so that great virtual and augmented reality experiences can be delivered to the user as she roams freely.

With efficient and practical light-field display architectures and the associated rendering pipelines coming together, we can anticipate the beginning of the era of light-field display systems. Perhaps there is an Olympics or World Cup in the not-too-distant future that will be experienced by billions of users all over the world as if they are actually there.


Nikhil Balram is head of display R&D for VR and AR at Google. He is an SID fellow and Otto Schade Prize winner. He can be reached at nbalram1@hotmail.com.