Chapter 5. 3D User Interface Output Hardware

This chapter begins our discussion of the hardware commonly used in 3D UIs. We examine visual, auditory, and haptic display devices and see how they affect 3D UI design and development. Additionally, we describe the characteristics of different output devices and develop some general strategies for choosing appropriate hardware.

5.1 Introduction

A necessary component of any 3D UI is the hardware that presents information to the user. These hardware devices, called display devices (or output devices), present information to one or more of the user’s senses through the human perceptual system; the majority of them are focused on stimulating the visual, auditory, or haptic (i.e., force and touch) senses. In some cases, they even display information to the user’s olfactory system (i.e., sense of smell; Nakamoto 2013), gustatory system (i.e., taste; Narumi et al. 2011), or vestibular system (i.e., sense of self-motion; Cress et al. 1997). Of course, these output devices require a computer to generate the information through techniques such as rendering, modeling, and sampling. The devices then translate this information into perceptible human form. Therefore, we can think of displays as actually consisting of the physical devices and the computer systems used in generating the content the physical devices present. Although the focus of this chapter is on display devices, we do provide references to relevant material on topics such as graphical and haptic rendering, as well as material on 3D sound generation.

Display devices need to be considered when designing, developing, and using various interaction techniques in 3D UIs, because some interaction techniques are more appropriate than others for certain displays. Therefore, an understanding of display device characteristics will help the 3D UI developer make informed decisions about which interaction techniques are best suited for particular display configurations and applications.

In this chapter, we explore different display device types (specifically visual, auditory, and haptic devices) and styles that are commonly found in 3D UIs. This is the first of two chapters that deal specifically with hardware technology (see Chapter 6, “3D User Interface Input Hardware”). Recall that in Chapter 3, “Human Factors Fundamentals,” we discussed fundamental concepts of the human visual, auditory, and haptic systems, and this information is useful for illustrating the differences among various display types in this chapter. See Hale and Stanney (2014), Sherman and Craig (2003), and Durlach and Mavor (1995) for more details on the human sensory system and its relationship to display technology.

5.1.1 Chapter Roadmap

In section 5.2, we begin by looking at visual display devices. We first examine visual display device characteristics, then present the many different types of visual display devices commonly found in 3D UIs and discuss how they affect 3D UI design. In section 5.3, we examine auditory displays by first describing how 3D sound is typically generated. Next we present the pros and cons of different sound system configurations and discuss the benefits of sound in 3D UIs. In section 5.4, we discuss haptic display devices, beginning with a discussion of different haptic display device characteristics, the different types of haptic display devices, and their use in 3D UIs. In section 5.5, we use the concept of level of fidelity to characterize different displays. Finally, in section 5.6, we present some general guidelines and strategies for choosing display devices for particular systems or applications, followed by section 5.7, which presents the specific display configurations that we will use in each of our running case studies.

5.2 Visual Displays

Visual displays present information to the user through the human visual system. Visual displays are by far the most common display devices used in 3D UIs. As stated in the introduction to this chapter, display devices require the computer system to generate digital content that the display device transforms into perceptible form. For visual display devices in 3D UIs, real-time computer graphics rendering techniques are used to produce the images that the display device presents to the user. Many different real-time rendering techniques have been developed over the years. The details of these techniques are beyond the scope of this book; however, Akenine-Möller et al. (2008), Angel and Shreiner (2011), and Hughes et al. (2013) all provide comprehensive treatments for interested readers.

5.2.1 Visual Display Characteristics

A number of important characteristics must be considered when describing visual display devices. From a 3D UI perspective, we discuss a visual display’s

field of regard and field of view

spatial resolution

screen geometry

light transfer mechanism

refresh rate

ergonomics

effect on depth cues

Other characteristics include brightness, color contrast, and gamma correction.

Field of Regard and Field of View

A visual display device’s field of regard (FOR) refers to the amount of the physical space surrounding the user in which visual images are displayed. FOR is measured in degrees of visual angle. For example, if we built a cylindrical display in which a user could stand, the display would have a 360-degree horizontal FOR.

A related term, field of view (FOV), refers to the maximum number of degrees of visual angle that can be seen instantaneously on a display. For example, the horizontal FOV of a large flat-projection screen might be 80 or 120 degrees, because the FOV varies with the user’s position and orientation relative to the screen. A display device’s FOV must be less than or equal to the maximum FOV of the human visual system (approximately 180 degrees) and will be lower than that if additional optics such as stereo glasses are used.
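
To make this concrete, the horizontal FOV of a flat screen centered in front of the viewer can be estimated from the screen width and viewing distance. The following is a minimal sketch in Python; the screen width and distances are hypothetical values chosen to reproduce the 80- and 120-degree figures mentioned above.

```python
import math

def horizontal_fov_deg(screen_width, viewing_distance):
    """Horizontal visual angle subtended by a flat screen centered on the viewer.
    Both arguments must be in the same units (e.g., meters)."""
    return math.degrees(2.0 * math.atan((screen_width / 2.0) / viewing_distance))

# Hypothetical example: a 4 m wide projection screen
print(round(horizontal_fov_deg(4.0, 2.4)))   # ~80 degrees when standing 2.4 m away
print(round(horizontal_fov_deg(4.0, 1.15)))  # ~120 degrees when standing 1.15 m away
```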

A visual display device can have a small FOV but still have a large FOR, as in the case of a tracked head-worn display (see section 5.2.2). For a head-worn display, the FOV might be 50 degrees, but because the device is attached to the user’s head, with the display always in front of the user’s eyes, the FOR is 360 degrees. This is because the synthetic imagery is always perceived by the user regardless of her position or orientation (i.e., the user could make a 360-degree turn and always see the visual images). Thus, the FOV α is always less than or equal to the FOR β. In general, the user at each instant can view α degrees out of the full β degrees available to her.

Spatial Resolution

The spatial resolution of a visual display is related to pixel size and is considered a measure of visual quality. This measure is often given in dots per inch (dpi). The more pixels displayed on the screen, the higher the resolution, but resolution is not equivalent to the number of pixels (a common misuse of the term). Instead, resolution depends on both the number of pixels and the size of the screen. Two visual display devices with the same number of pixels but different screen sizes will not have the same resolution, because on the larger screen, each individual pixel occupies a larger physical area than on the smaller screen, and so the larger screen has fewer dots per inch.

The user’s distance from the visual display device also affects spatial resolution on a perceptual level. Pixel size can be measured in absolute units (the physical area each pixel occupies, given the screen size and pixel count), but it can also be measured as the visual angle a pixel subtends at the viewer’s eye. Thus, the further the user is from the display, the higher the perceived resolution, because individual pixels are no longer distinguishable. This is similar to the effect of viewing paintings from the pointillist movement (Kieseyer 2001): when the viewer is close to the painting, she can see the individual dots on the canvas, but as she moves away, the dots fuse together to form a cohesive image. This phenomenon can be a problem with lower-quality head-worn displays, because the user’s eyes must be very close to the display screens (although optics are typically used to place the virtual image a bit farther away), causing degradation in the perceived spatial resolution.
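
Both ways of measuring pixel size can be expressed with simple geometry. The sketch below, using hypothetical screen sizes and viewing distances, computes the physical pixel density (dots per inch) and the visual angle one pixel subtends at the viewer’s eye: the same pixel count yields lower density on a larger screen, but viewing from farther away shrinks the perceived pixel size.

```python
import math

def dots_per_inch(horizontal_pixels, screen_width_inches):
    """Physical pixel density of the screen."""
    return horizontal_pixels / screen_width_inches

def pixel_visual_angle_arcmin(screen_width_m, horizontal_pixels, viewing_distance_m):
    """Approximate visual angle subtended by a single pixel, in arcminutes."""
    pixel_width = screen_width_m / horizontal_pixels
    return math.degrees(math.atan(pixel_width / viewing_distance_m)) * 60.0

# Hypothetical: the same 1920-pixel-wide image on a 24-inch monitor and a 60-inch TV
print(dots_per_inch(1920, 21))                        # ~91 dpi (24-inch monitor, ~21 in wide)
print(dots_per_inch(1920, 52))                        # ~37 dpi (60-inch TV, ~52 in wide)
print(pixel_visual_angle_arcmin(0.53, 1920, 0.6))     # ~1.6 arcmin at desk distance
print(pixel_visual_angle_arcmin(1.33, 1920, 3.0))     # ~0.8 arcmin across the room
```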

Screen Geometry

Another visual display characteristic that plays a role in visual quality is screen shape. Visual displays come in a variety of different shapes (see section 5.2.2), including rectangular, circular, L-shaped, hemispherical, and hybrids. In fact, using projectors coupled with projective texturing (also called projection mapping) can create a visual display on almost any planar or nonplanar surface. However, such irregular screen shapes require nonstandard projection algorithms tailored to the projection surface, which can affect visual quality. For example, hemispherical displays can suffer from visual artifacts such as distortion at the edges of the display surface, resulting in lower overall visual quality. As another example, using part of the body as a projected visual display can produce distortion and color-balancing issues that depend on the body part’s geometry and color. With irregular screen shapes, the source images are often predistorted to accommodate the screen’s particular geometric shape.

Light Transfer

There are a number of different ways to transfer light, including through a monitor or television, front projection, rear projection, laser light directly onto the retina, and through the use of special optics. A variety of different technologies such as liquid crystals, light-emitting diodes, digital light processing, and organic light-emitting diodes (Hainich and Bimber 2011) can also be used. The light transfer method sometimes dictates what types of 3D UI techniques are applicable. For example, when using a front-projected display device, 3D direct manipulation techniques often do not work well, because the user’s hands can get in the way of the projector, causing shadows to appear on the display surface. Using a short-throw projector mounted close to the projection surface can alleviate this problem (Blasko et al. 2005).

Refresh Rate

Refresh rate refers to the speed with which a visual display device refreshes the displayed image from the frame buffer. It is usually reported in hertz (Hz, refreshes per second). Note that refresh rate is not to be confused with frame rate or frames per second (fps), the speed with which images are generated by the graphics system and placed in the frame buffer (Hughes et al. 2013). Although a graphics system could generate images at a rate higher than the refresh rate, the visual display can show them only at its refresh rate limit. The refresh rate of a visual display is an important characteristic because it can have a significant effect on visual quality. Low refresh rates (e.g., below 50–60 Hz) can cause flickering images depending on the sensitivity of a particular user’s visual system. Higher refresh rates are also important because they help to reduce latency, the end-to-end time from capturing a user’s pose in a virtual or augmented environment to displaying the graphics rendered from that user’s perspective on the visual display device (so-called “motion-to-photon” latency). Latency can cause a variety of side effects, from poor interaction performance to cybersickness (LaViola 2000a). Other factors besides refresh rate contribute to latency, including tracking and rendering speeds. All else being equal, however, higher refresh rates lead to lower latency, which results in a better interactive experience.
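
As a back-of-the-envelope illustration (not a model of any particular system), motion-to-photon latency can be roughly estimated by summing the main pipeline stages; the refresh interval enters both as a wait for the next refresh and as scan-out time. The stage timings below are hypothetical.

```python
def motion_to_photon_ms(tracking_ms, rendering_ms, refresh_hz):
    """Rough latency estimate: tracking delay + rendering time + average wait for the
    next refresh (half a frame) + scan-out of one frame. Real systems add further
    buffering stages, so treat this as a lower-bound sketch."""
    frame_ms = 1000.0 / refresh_hz
    return tracking_ms + rendering_ms + 0.5 * frame_ms + frame_ms

print(motion_to_photon_ms(2.0, 11.0, 60))   # ~38 ms on a 60 Hz display
print(motion_to_photon_ms(2.0, 11.0, 90))   # ~30 ms on a 90 Hz display
```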

Ergonomics

Visual display ergonomics is also an important display characteristic. We want the user to be as comfortable as possible when interacting with 3D applications, and we want the visual display device to be as unobtrusive as possible. Comfort is especially important when a user has to wear the display. For example, the weight of a head-worn display can cause a user discomfort, such as muscle strain, when wearing the device for long periods of time. See Chapter 3, “Human Factors Fundamentals,” section 3.5 for background on physical ergonomics.

Depth Cue Effects

Finally, an important characteristic of a visual display is its ability to support various depth cues (see Chapter 3, “Human Factors Fundamentals”), especially stereopsis. A visual depth cue’s strength varies depending on viewing circumstances. Stereopsis is a very strong depth cue for objects in close proximity to the viewer (no more than about 30 feet away), because binocular disparity is very small for objects that are farther away. Because the oculomotor cues are linked to binocular disparity, they are also effective only for objects at short distances from the viewer. In contrast, motion parallax can be a very strong visual cue, perhaps stronger than stereopsis, when objects are viewed at a wide range of depths. Of the monocular static cues, occlusion is the strongest.

The monocular depth cues, aside from true accommodation, can be synthetically generated with almost any visual display device, assuming appropriate rendering hardware or software. In addition, motion parallax cues can be generated when the viewer and/or objects move through the world, using either physical or virtual movement. Stereopsis usually requires special-purpose visual display devices, and the display system must produce a left- and a right-eye image with correct geometric properties (i.e., varying binocular disparity depending on object depth).

As for oculomotor cues, visual display devices that allow for stereo viewing also provide a proper vergence cue. However, accommodation cues are generally not present in stereoscopic displays, because graphically rendered objects are always in focus at the same focal depth—the depth of the screen (Cruz-Neira et al. 1993). In fact, the lack of accommodation cues with the majority of stereo visual display devices causes the accommodation-vergence mismatch illustrated in Figure 5.1. Because the graphics are displayed on a fixed screen, the user must focus at that depth to see the graphics sharply. However, the left- and right-eye images are drawn so that the user’s eyes will converge to see the object at its virtual depth. When the virtual object depth and the screen depth are different, the user’s oculomotor system sends conflicting signals to the brain about the distance to the object.
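
The size of this conflict can be quantified: vergence is driven by the virtual object’s rendered depth, while accommodation is driven by the screen distance, and the mismatch is commonly expressed as the difference between the two distances in diopters (inverse meters). A minimal sketch, with a hypothetical 64 mm interocular distance and hypothetical screen and object depths:

```python
import math

def vergence_angle_deg(ipd_m, fixation_distance_m):
    """Angle between the two eyes' lines of sight when fixating at a given distance."""
    return math.degrees(2.0 * math.atan((ipd_m / 2.0) / fixation_distance_m))

def accommodation_vergence_conflict_diopters(screen_distance_m, virtual_distance_m):
    """Mismatch between the focal distance (the screen) and the vergence distance
    (the virtual object), expressed in diopters (1/m)."""
    return abs(1.0 / screen_distance_m - 1.0 / virtual_distance_m)

print(vergence_angle_deg(0.064, 0.5))                      # ~7.3 deg for an object 0.5 m away
print(vergence_angle_deg(0.064, 2.0))                      # ~1.8 deg at the 2 m screen depth
print(accommodation_vergence_conflict_diopters(2.0, 0.5))  # 1.5 D of conflict
```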

In general, only “true 3D” displays like volumetric, holographic, and 3D retinal projection devices (see the next section) do not have this cue conflict. Additionally, light-field displays, which depict sharp images from out-of-focus display elements by synthesizing light fields that correspond to virtual scenes located within the viewer's natural accommodation range, can alleviate the accommodation-vergence mismatch (Lanman and Luebke 2013). However, these devices are still in their early stages of development.

Figure 5.1 Accommodation-vergence mismatch.

5.2.2 Visual Display Device Types

We now examine many different types of visual displays used in 3D UIs, which include the following:

single-screen displays

surround-screen and multiscreen displays

workbenches and tabletop displays

head-worn displays

arbitrary surface displays

autostereoscopic displays

We look at specific examples and identify the advantages and disadvantages of each visual display type with respect to 3D interaction.

Single-Screen Displays

Single-screen displays are commonly used in many different kinds of 3D applications, including video games, modeling, and scientific and information visualization. These displays include conventional monitors, high-definition and higher resolution televisions, and front- or rear-projection displays using a wall or screen material as the projection surface. Smartphone and tablet displays for use in mobile AR (Veas and Kruijff 2010) and VR applications also fall into this category. These displays are relatively inexpensive compared with other, more complex displays and can provide monocular and motion parallax depth cues. Stereopsis can also be achieved using a single-screen display and some additional hardware (stereo glasses and a stereo-capable graphics card, or special optics for using cell phones as head-worn displays). Note that some single-screen displays do not need any special hardware for stereopsis. These monitors are autostereoscopic in nature (discussed later in this section).

Besides an appropriate single-screen display, a pair of stereo glasses is also needed to achieve stereoscopic viewing. These glasses can be either active or passive. Active stereo glasses, often called shutter glasses, are synchronized to open and close their shutters at a rate equal to the refresh rate of the visual display, a technique known as temporal multiplexing. An example of active stereo glasses is shown in Figure 5.2. The images for the left and right eye are rapidly alternated on the screen, and the shutter glasses block one eye’s view in a coordinated sequence to prevent it from seeing the other eye’s view. Infrared or radio signals are sent to the glasses to maintain this synchronization. If the signal is blocked, the shutter glasses stop working and the stereo effect is disrupted. As a general guideline, 3D UIs should discourage the user from moving his hands or other physical objects into the line of sight between the glasses and the emitters so as not to disrupt the user’s 3D stereoscopic experience. In order to achieve high active stereo quality, a single-screen display must have a high refresh rate, because the display of the two images (one for each eye) effectively halves the per-eye refresh rate. As a rule of thumb, a single-screen display with a refresh rate of 120 Hz or better is usually acceptable for obtaining well-perceived stereo images.

Figure 5.2 Active stereo glasses for viewing stereoscopic images. (Photograph courtesy of Joseph J. LaViola Jr.)

Passive stereo glasses use either polarization or spectral multiplexing. Polarization multiplexing filters the two separate, overlaid images with oppositely polarized filters. For example, one filter could be horizontally polarized and the other vertically polarized so that each eye sees only one image. Spectral multiplexing (also called anaglyphic stereo) displays the two separate, overlaid images in different colors. The glasses use colored filters so light from any color other than the filter’s color is effectively blocked out. For example, cyan and red anaglyph stereo glasses would allow only cyan light to pass through the cyan filter and red light to pass through the red filter, so that each eye sees only its own image. Anaglyph stereo is relatively inexpensive to produce, but it obviously has color limitations. Another type of spectral multiplexing is known as interference filtering, where specific wavelengths of red, green, and blue are used for the left eye, and a slightly different set of red, green, and blue wavelengths is used for the right eye. This approach supports full color 3D stereo but is a more expensive solution compared to standard polarization.
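
For illustration, anaglyphic spectral multiplexing reduces to a simple channel selection: the left-eye image is carried in the red channel and the right-eye image in the green and blue (cyan) channels of a single composite frame. A minimal sketch using NumPy arrays (the red-left/cyan-right assignment follows a common convention and is otherwise arbitrary):

```python
import numpy as np

def red_cyan_anaglyph(left_rgb, right_rgb):
    """Combine two H x W x 3 images into one anaglyph frame: red from the left-eye
    image, green and blue from the right-eye image."""
    anaglyph = np.empty_like(left_rgb)
    anaglyph[..., 0] = left_rgb[..., 0]    # red channel   -> left eye (through the red filter)
    anaglyph[..., 1] = right_rgb[..., 1]   # green channel -> right eye (through the cyan filter)
    anaglyph[..., 2] = right_rgb[..., 2]   # blue channel  -> right eye (through the cyan filter)
    return anaglyph

# left_rgb and right_rgb would be rendered from two slightly offset virtual cameras
```

The color limitation mentioned above is visible directly in the code: whatever color information the left view carried in its green and blue channels is simply discarded.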

In general, active stereo is considered to achieve the highest stereo quality. Active stereo glasses have the advantage that they can make use of the full color spectrum, in contrast to spectral multiplexed stereo glasses that use colored filters. However, passive stereo glasses are inexpensive compared to shutter glasses; they do not need double the refresh rate; and they present a slightly brighter image than active glasses, since they do not need to shutter out the light half of the time. Additionally, there are no synchronization issues between the glasses and the generated images (i.e., no synchronization emitter is needed). A more comprehensive discussion of 3D stereo display methods can be found in Lueder (2012).

A single-screen display coupled with a pair of stereo glasses (see Figure 5.3) makes for a simple yet effective visual display for 3D spatial applications. When a user’s head is also tracked (see Chapter 6 for information on user tracking), moving-viewer motion parallax becomes even easier to achieve. Such a setup is commonly referred to as fish-tank virtual reality (Ware et al. 1993). These configurations are relatively inexpensive compared to surround-screen devices (see descriptions later in this section), and they provide excellent spatial resolution at relatively large sizes. For example, a 60-inch 3D HDTV running at 1080p resolution can offer a compelling 3D stereo experience. They also allow the use of virtually any input device and can take full advantage of the keyboard and mouse. This flexibility provides the 3D UI developer with more input-device-to-interaction-technique mappings than do other visual display systems, because the user can see the physical world and is usually sitting at a desk.
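
The defining computation in fish-tank VR is an off-axis (asymmetric) view frustum that follows the tracked head, so that the screen behaves like a window into the 3D scene. The sketch below assumes a simplified setup in which the screen lies in the z = 0 plane of the tracking coordinate system and the viewer looks along -z from positive z; the function and parameter names are hypothetical, and the result is a set of glFrustum-style parameters to be paired with a view matrix that translates the world by the negated eye position.

```python
def fishtank_frustum(eye, screen_left, screen_right, screen_bottom, screen_top, near, far):
    """Asymmetric view frustum for a head-tracked, screen-fixed display.
    eye is the tracked eye position (ex, ey, ez) with ez > 0 in front of the screen;
    screen_left/right/bottom/top are the screen edges in the same (tracker) units."""
    ex, ey, ez = eye
    scale = near / ez                      # project the screen edges onto the near plane
    left   = (screen_left   - ex) * scale
    right  = (screen_right  - ex) * scale
    bottom = (screen_bottom - ey) * scale
    top    = (screen_top    - ey) * scale
    return left, right, bottom, top, near, far

# Hypothetical 0.5 m wide, 0.3 m tall monitor centered at the tracker origin,
# with the viewer's eye 0.6 m in front of it and slightly to the right
print(fishtank_frustum((0.1, 0.0, 0.6), -0.25, 0.25, -0.15, 0.15, 0.01, 100.0))
```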

On the other hand, single-screen displays are not very immersive, and the user has a very limited range of movement due to their small FOR. This limitation prohibits the use of many physically based travel techniques (see Chapter 8, “Travel”) and restricts the user’s ability to use her peripheral vision. Additionally, because of the visual display’s size, physical objects used for interaction may occlude the visual display, which can break any stereo illusion.

Figure 5.3 A monitor equipped with stereo glasses. (Photograph courtesy of Joseph J. LaViola Jr.)

Surround-Screen Displays

A surround-screen display is a visual output device that increases the FOR for a user or group of users by incorporating either a set of display screens (Figure 5.4 shows a canonical example), a large curved display screen, or some combination of curved and planar screens that makes use of one or more light projection devices. The key characteristic of a surround-screen display is that it tries to “surround” the user with visual imagery in order to provide a large FOR, with the surround level varying based on different device configurations. Over the years these output devices have had numerous configurations from simple multi-monitor displays to large surround multi-projection screens that completely envelop the user.

One of the first surround-screen displays used for 3D interaction was developed at the Electronic Visualization Laboratory at the University of Illinois at Chicago. This system was called the CAVE (Cruz-Neira et al. 1993) or Cave Automatic Virtual Environment, and consisted of four screens (three walls and a floor). This particular display configuration has had many variations and typically has three or more large planar projection-based display screens (often between 8 and 12 feet in width and height) that surround the human participant. The screens are arranged orthogonally so the device looks like a large box. Typically, the screens are rear-projected so users do not cast shadows on the display surface. However, in some cases, the floor of these surround-screen devices can use front-projection as long as the projector is mounted such that the user’s shadow is behind the user (e.g., as in Figure 5.4). In more sophisticated configurations, six screens are used to provide a 360-degree FOR, and all screens are rear-projected, including the floor (requiring a lot of extra space to house the device). Regardless of the number of screens, either a single PC with powerful graphics cards capable of handling many screens at once or a PC cluster is typically used to generate the images.

Figure 5.4 An example of a surround-screen VR system. The image shows a four-sided device (three walls and a floor). A projector mounted above the device projects images onto the floor (not pictured). (3D model courtesy of Mark Oribello, Brown University Graphics Group)

There have been a number of interesting variations on the basic structure of a surround-screen display. These variations are based on a number of factors, such as display size, FOV and angle between screens, and number of displays or projectors. For example, the screens do not have to be orthogonal, as shown in Figure 5.5. Here, three large screens are used, and the angle between them is approximately 120 degrees. This type of surround-screen display configuration still provides a large FOV but has a more limited FOR. Smaller versions of the surround-screen display concept have also been developed. The Computer-driven Upper Body Environment (CUBE) is a 360-degree display environment composed of four 32- by 28-inch rear-projected Plexiglas screens. Users stand inside the CUBE, which is suspended from the ceiling, and physically turn around to view the screen surfaces. The screens are approximately one foot from the user’s face and extend down to his or her midsection. It was developed at the Entertainment Technology Center at Carnegie Mellon University and represents a small, personalized version of a display that completely surrounds the user.

Other examples of these smaller devices make use of inexpensive LCD panels (see Figure 5.6). As the number of panels increases, the user’s FOV and FOR increase as well, as long as the added panels increase the amount of visible screen space. Of course, using collections of LCD panels often results in perceptible barriers (bezels) between screens, reducing the seamlessness of the surround-screen display. Many displays of this sort attempt to surround the user in an arc or cylindrical pattern.

Figure 5.5 A variation on the traditional, orthogonal surround-screen display system. This device uses 3 large planar screens where the angle between them is 120 degrees. (Photograph courtesy of Joseph J. LaViola Jr.)

Figure 5.6 A surround-screen display using a collection of display panels. (Photograph courtesy of Chris North, Department of Computer Science, Virginia Tech)

Up to now, we have examined surround-screen displays that make use of planar screen geometry. However, surround-screen displays can also utilize curved screens, the most common being hemispherical (see Figure 5.7), cylindrical, or some combination of the two (see Figure 5.8). These types of displays typically use either front or rear projection. Curvature can be approximated either by making the angle between planar display screens greater than 90 degrees, as with the display in Figure 5.5, or by using a collection of display panels, as in Figure 5.6. However, we do not consider these configurations to be true curved surround-screen displays.

For true curved surround-screen displays that make use of either front or rear projection, special software and/or optics are required to support projection diameters of varying sizes. In the case of front projection for a hemispherical display, a wide-angle lens is attached to the projector, which distorts the output image. Spherical mapping software is used to predistort the image so that it appears correctly on the curved screen. In Figure 5.7, the user sits in front of a small table and can interact with 3D applications using a keyboard and mouse or 3D input devices. Hemispherical displays with much larger projection diameters have also been developed to accommodate multiple users. Similar approaches are taken for rear-projected curved surround-screen displays like the one in Figure 5.8, with the distinction that the predistorted image conforms to the exact geometry of the display surface.
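
One common way to generate the predistorted image for a hemispherical screen is an angular fisheye mapping: each pixel of the circular projector image is associated with a ray whose angle from the view axis grows linearly with the pixel’s distance from the image center, and the scene (e.g., rendered into a cube map) is resampled along those rays. A minimal sketch of the mapping itself, under that assumption:

```python
import math

def fisheye_ray(u, v, aperture_deg=180.0):
    """Map normalized fisheye image coordinates (u, v in [-1, 1]) to a unit view ray
    using the angular ('f-theta') fisheye model often used for dome projection."""
    r = math.hypot(u, v)
    if r > 1.0:
        return None                                   # outside the fisheye circle
    theta = r * math.radians(aperture_deg) / 2.0      # angle from the view axis
    phi = math.atan2(v, u)                            # angle around the view axis
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```

Predistortion then amounts to looking up, for every projector pixel, the scene color along the corresponding ray; as noted above, rear-projected curved screens replace this idealized model with the measured geometry of the actual display surface.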

Figure 5.7 A hemispherical display used for interactive molecular visualization. (Photograph courtesy of Paul Bourke)

Figure 5.8 An example of a curved, rear-projected surround-screen display that combines both hemispherical and cylindrical geometry. (Photograph courtesy of Digital Projection)

There are a number of advantages to using a surround-screen display system. They typically provide high spatial resolution and a large FOR. In addition, such devices also have a large FOV, allowing the user to utilize his peripheral vision. Like a single-screen display, surround-screen displays provide monocular depth cues and motion parallax; when the user is head tracked and wears stereo glasses, the device also provides moving-viewer motion parallax and stereopsis, respectively. In contrast with fish-tank VR, a tracked user in a large surround-screen display has much better moving-viewer motion parallax, because she can actually walk around in the device. For example, a user could walk around a virtual chair projected in the center of the display and see the sides and back of the chair. The user would not have this capability with a fish-tank VR setup. Stereopsis issues with surround-screen devices are also similar to those found with a stereo single-display configuration. Surround-screen stereo can be done actively with shutter glasses or passively with, for example, polarized glasses and special polarized lenses for the projectors. Additionally, real and virtual objects can be mixed in the environment. For example, a tractor’s cockpit could be brought into the display so an operator could test out its controls while driving through virtual farmland.

One of the biggest disadvantages of large surround-screen display devices is that they are expensive and often require a large amount of physical space. For example, a 10- by 10- by 10-foot CAVE with three walls and a floor can require a room at least 30 feet long, 22 feet wide, and 12 feet high to handle the display screens and the mounted projectors. However, using shorter throw projectors or even clusters of LCD panels can reduce the required room size considerably. The expense is not as much of a problem with smaller surround-screen displays like multi-monitor configurations.

Another problem with these systems, as well as any projection-based display system, is that users can have difficulty seeing objects in stereo under certain conditions. When the user gets close to the display or when objects appear to be right in front of the user, it becomes increasingly difficult to use the visual system’s accommodation and vergence capabilities to fuse the two images together. Eye strain is a common problem in these situations.

When more than one person occupies a surround-screen display, they all view the same screens. However, images are typically rendered from only one tracked user’s perspective. If an untracked viewer moves in the device, the virtual environment does not respond to that movement, so the images appear distorted from the untracked viewer’s perspective. As the tracked user moves, all non-tracked users effectively see the environment through the tracked user’s perspective, which can cause cue conflicts and lead to cybersickness (LaViola 2000a). This problem is a fundamental limitation of surround-screen displays, as well as any visual display device that claims to accommodate multiple users, and is a disadvantage compared to the unlimited number of active viewpoints (user viewpoints that are head tracked to ensure the correct viewing perspective) when using multiple head-worn displays. If non-tracked users stay close to and look in the same direction as the tracked user, they can get a rough approximation of the correct viewpoint, but this is not a completely satisfactory solution.

Over the years, several techniques have been developed to increase the number of active viewpoints. One approach is to use shutter glass synchronization to support two active stereoscopic viewpoints (Agrawala et al. 1997) or four monoscopic viewpoints (Blom et al. 2002). The basic approach to adding more than one active viewpoint is to render images from two or more tracked users and then synchronize the images with the shutter glasses. For example, to allow two active stereoscopic viewpoints, the graphics must be rendered and displayed four times (once for each eye of the two viewers) per frame. The shutter glasses then must be modified so that for one viewer, the left and right eyes switch between the ON/OFF and OFF/ON states while the second viewer’s glasses are turned completely off. Then, the second viewer’s glasses are in the ON/OFF and OFF/ON states while the first viewer’s glasses are turned completely off. Of course, when using this approach, the refresh rate for each eye is cut in half because each eye will see only one of four images displayed; this may cause flicker problems. The frame rate will also be affected, because the rendering engine must perform twice as many computations. More sophisticated approaches have been developed that support stereo for up to six tracked users (Kulik et al. 2011). This approach uses six customized DLP projectors for fast time-sequential image display coupled with polarization techniques. High-speed shutter glasses running at 360 Hz are required and can be programmed from the application to adapt to different scenarios. This approach was initially designed for a large single display screen but could be extended to surround-screen display configurations.
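
The refresh budget in these time-multiplexed schemes is easy to work out: a single display divides its refresh rate among two images per tracked viewer, which is why adding viewers quickly leads to flicker, and why the six-user system mentioned above combines several fast projectors with polarization rather than relying on one time-multiplexed screen. A small sketch of the simple single-display case:

```python
def per_eye_refresh_hz(display_refresh_hz, tracked_viewers):
    """Per-eye image rate when one display is time-multiplexed across
    two images (left and right) for each tracked viewer."""
    return display_refresh_hz / (2 * tracked_viewers)

print(per_eye_refresh_hz(120, 1))   # 60 Hz per eye with one tracked viewer
print(per_eye_refresh_hz(120, 2))   # 30 Hz per eye with two viewers: flicker becomes likely
```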

Although physical objects (including the user’s body) do not have to be represented as graphical objects in surround-screen display systems, an important issue with using physical objects in this type of display device is the physical/virtual object occlusion problem. The problem arises when the user tries to move a physical object behind a virtual object. Visually, the physical object will appear in front of the graphical object because the graphical object is actually being projected on the screen. This is a common problem with any projection or panel-based display device, and it can lessen the immersive experience and break the stereo illusion.

For those surround-screen displays that use front projection (which makes the display brighter than most rear-projected devices), direct 3D selection, manipulation, and navigation techniques may not work well, because moving too close to the display surface casts shadows on the screen, breaking the stereo illusion and possibly occluding virtual objects. For these types of devices, indirect techniques such as virtual laser pointers, gestures, or keyboard- and mouse-based video game style controls work best. Finally, if the surround-screen display has curvature, spatial resolution and image quality are usually not uniform across the display. For example, the center of the hemispherical display may have higher resolution and quality, while these values decrease toward the display’s edges. However, work by Majumder and Sajadi (2013) has begun to address this issue with automatic screen calibration methods.

Workbenches and Tabletop Displays

Another type of display takes the form of a workbench or tabletop. One of the original displays of this type was the Responsive Workbench, developed by Krüger and Fröhlich (1994). These display devices are used to simulate and augment interaction that takes place on desks, tables, and workbenches. Figure 5.9 shows two workbench displays with differing characteristics. The picture on the left shows a standard single workbench where the display can be rotated to be completely horizontal or almost vertical, designed primarily for 3D stereo and interaction. The image on the right shows a small workbench that has a pressure-sensitive display surface for 2D input, which lets users write on the screen surface and makes it easier to utilize both 2D and 3D interaction techniques.

Figure 5.9 Workbench style displays. (Photographs courtesy of Barco and Fakespace Systems)

The example displays shown in Figure 5.9 have evolved in recent years with technology advancements. Workbenches like the one in the left image of Figure 5.9 have become smaller and more personalized as shown in Figure 5.10. With the advent of low-cost multi-touch input technology such as frustrated total internal reflection (Han 2005) and capacitive touch screens, the workbench on the right in Figure 5.9 has evolved into horizontal, multi-touch, table-based non-immersive 2D displays. However, coupling 3D UI technologies (3D stereo and user tracking) with these displays has provided even more powerful technology for 3D spatial interaction, since multi-touch input can be combined with 3D UI techniques to create rich interaction experiences (Steinicke et al. 2013). Figure 5.11 shows an example of a table-based display that uses both multi-touch and 3D spatial input (Jackson et al. 2012).

Figure 5.10 A personalized workbench display that supports tracked 3D stereo and interaction. (Photograph courtesy of Joseph J. LaViola Jr.)

Figure 5.11 A table-based workbench style display that combines multi-touch and 3D spatial input. (Photograph courtesy of Bret Jackson)

In general, workbenches and table-based displays provide relatively high spatial resolution and make for an intuitive display for certain types of applications. For example, a horizontal display configuration is ideally suited to a surgical training application, while a display with a 35-degree orientation would be useful in a drafting or 3D modeling application. Relative to large surround-screen displays, workbench screen sizes are smaller, which improves visual quality. Workbenches and table-based displays also can provide the same visual depth cues that surround-screen and single-screen displays do (assuming appropriate stereo and user-tracking hardware).

As with a large surround-screen or single-screen display, the device can accommodate multiple users, but with the same viewpoint constraints described in the previous section. In general, users have limited mobility when interacting with a workbench because the display is stationary, rather than head-coupled like a head-worn display (discussed later in this section) and does not enclose them like a large surround-screen display. Therefore, as with single-screen displays, the range of viewpoints from which a user can see 3D imagery is restricted, because from some viewpoints, all or part of the screen is not visible. For example, it would not be possible for a user to see the bottom of a stationary graphical object with a completely horizontal table display, because the display surface would no longer be in the user’s FOV. From a 3D interaction perspective, then, physically based travel techniques are not appropriate when using a workbench, because the user has little maneuverability compared to other display devices. However, most direct selection and manipulation techniques (see Chapter 7, “Selection and Manipulation”) work well because most of the screen real estate is within arm’s reach.

Head-Worn Displays

Our discussion of visual display devices has so far focused on stationary displays (i.e., displays that do not move with the user). In this section, we examine visual displays in which the device is attached (coupled) to the user’s head. A head-coupled display device used for 3D applications is typically called a head-mounted display (HMD) or a head-worn display (HWD), or more colloquially, a “headset,” “goggles,” or “glasses.” Although it is not yet a common term except in some academic circles, we have chosen to use the term “head-worn display (HWD)” throughout this book. This term emphasizes the fact that such displays are wearable—they can be put on and taken off like a watch, a necklace, or a piece of clothing. The term “HMD,” while very common in the literature, is a holdover from the days when displays were attached permanently to pilots’ helmets; we no longer “mount” displays to people’s heads! An HWD is a sophisticated piece of equipment, because it requires the complex integration of electronic, optical, mechanical, and even audio components (some HWDs support 3D spatial sound). As such, many different HWDs have been designed and developed over the years, with a variety of design choices and tradeoffs. Regardless of internal design, an HWD’s main goal is to place images directly in front of the user’s eyes using one or two small screens (e.g., LCD, OLED). In some cases, a smartphone’s high-resolution screen can be used as the display engine for an HWD (see Figure 5.12). A combination of refractive lenses and/or mirrors (depending on the optical technique used) is used to present and sometimes magnify the images shown on the screens (see Melzer and Moffitt [2011] for more details on the physical design of HWDs). An example of an HWD that has screens and optics embedded in the device is shown in Figure 5.13.

Figure 5.12 An example of a head-worn display that uses a cell phone as the display engine. (Photograph courtesy of Samsung)

Figure 5.13 A head-worn display for virtual reality. (Photograph courtesy of Sony)

There have been a variety of different head-coupled display designs that make use of the basic HWD concept. In the early 1990s, Bolas (1994) developed an arm-mounted display that used a counterweight on the opposite side of the display to make it easier to manipulate (see Figure 5.14). This counterweight made it possible for this device to use better optics and heavier, higher-quality display screens (i.e., CRTs), providing greater resolution. This device also made use of mechanical tracking technology (see Chapter 6) to track the user’s head position and orientation. Tsang and colleagues (2002) used this idea to create an arm-mounted device that attached a flat panel display to an armature. This display enables the metaphor of a window into the 3D virtual world. Given the miniaturization and improvement of display and optics technology, the same or better resolution can now be achieved in a smaller and lighter form factor. For example, today’s HWDs achieve high-definition quality or better resolution and can weigh less than one pound, making HWD-based arm-mounted displays less relevant for 3D spatial interfaces.

Figure 5.14 An arm-mounted display called the Binocular Omni-Orientation Monitor. (Photograph of the Boom 3C courtesy of Fakespace Labs, Mountain View, California)

Another HWD design variation is the head-mounted projective display (HMPD). In HMPDs, small projectors are attached to the head-coupled device, and these project the graphical images into the real environment. Retroreflective material (a special bendable material that reflects light back in the direction it came from, regardless of its incident angle with the screen surface) is placed strategically in the environment so that the user sees the graphics reflecting off the material. HMPDs thus are a hybrid between conventional HWDs and projection displays (Hua et al. 2001). HMPDs are ideally suited to collaborative applications in mixed and augmented reality, because using retroreflective screens provides correct occlusion cues for both virtual and real objects and because all participants have their own individually correct viewpoints (Hua et al. 2002). With projector miniaturization, microprojectors can be used in HMPDs giving them a form factor on par with sunglasses, making them extremely lightweight with high resolution (see Figure 5.15).

HWDs that project images directly onto a user’s retina are known as virtual retinal displays. The virtual retinal display (VRD), also called the light-scanning display, was invented at the Human Interface Technology Lab (Tidwell et al. 1995). With a VRD, photon sources are used to generate coherent beams of light (e.g., lasers, LEDs), which allow the system to draw a rasterized image onto the retina. In order to generate full-color images, three light sources are needed (red, green, and blue), while monochrome versions of the VRD require only one. These light beams are intensity-modulated to match the intensity of the image being rendered, meaning that it is possible to produce fully immersive and see-through display modes for AR applications. The light beams then scan across the retina to place each image point, or pixel, at the proper position.

Figure 5.15 An example of a head-mounted projective display that supports both virtual and augmented reality applications by using an attachment that makes it project onto an integral opaque or see-through surface, instead of onto surfaces in the environment. (Image courtesy of CastAR)

Another variation on the traditional HWD is a design that supports seeing both virtual and real imagery. These types of HWDs support augmented and mixed reality applications. There are three main HWD designs that support the ability to see both real and virtual imagery simultaneously (Billinghurst 2014). Optical see-through displays place optical combiners in front of the user’s eyes; the combiners are partially transparent, so that the user can see the real world through them, and partially reflective, so that the user can see virtual images reflected from small head-mounted screens. The advantage of optical see-through displays is that they provide a direct view of the real world with full resolution and no time delay. However, it is more difficult to get wide FOVs and it is much easier to see registration problems with the superimposed virtual imagery.

Video see-through displays work by streaming real-time video from head-mounted cameras to the graphics subsystem, which renders virtual computer graphics images into the video buffers in real time, blending the virtual and real. The result is displayed to the user in a traditional closed-view HWD (Azuma 1997). Video see-through displays provide a digitized image of the real world, making wide FOVs easier to support as well as providing more registration and calibration strategies between virtual imagery and the real world. In addition, they provide the full gamut of compositing technologies to combine real and virtual imagery. On the other hand, a video image of the real world is almost always of lower visual quality than a direct optical view. It is important to note that video see-through displays must provide orthoscopic views (Drascic and Milgram 1996); achieving this depends crucially on the optical path to the cameras, which typically must be folded (State et al. 2005). Figure 5.16 presents an example of an early AR video see-through display, while Figure 5.17 shows a modern optical see-through display.

Figure 5.16 Early video see-through AR. (Included by permission from UNC Department of Computer Science)

Figure 5.17 Modern optical see-through AR displays. (Photographs courtesy of Epson and Microsoft)

Finally, a third design is to use head-worn projector-based displays (including ones that don’t use retroreflective surfaces). These HWDs support combining real and virtual in the actual physical environment, rather than in the optics or the computer. Of course, one of the many challenges with this type of display is ensuring correct color correction and perspective viewing, since the images are projected onto arbitrary objects. This type of design is discussed further in the “Arbitrary Surface Displays” section below.

In terms of visual depth cues, as with other display systems, tracked HWDs allow for all the monocular and motion parallax depth cues. Stereoscopy is produced differently with HWDs than with projection-based displays and monitors. With non-head-coupled devices, active stereo is often produced using temporal multiplexing (i.e., drawing images for the left and right eye on the same screen sequentially). With an HWD, stereopsis is achieved by drawing two separate images on one screen or two separate screens—one for each eye—at the same time.

One of the biggest advantages of HWDs is that the user can have complete physical visual immersion (i.e., a 360-degree FOR), because the user always sees the virtual world regardless of head position and orientation. Even though tracked HWDs have a 360-degree FOR, the FOV will vary depending on the device construction and cost. HWDs with small FOVs can cause perception and performance problems. Even high-end HWDs can have limited FOVs when compared to surround-screen displays. According to Neale (1998), restricted FOVs can impede a user’s ability to acquire spatial information and develop spatial cognitive maps of unfamiliar spaces. Small FOVs may also produce distortions in perception of size and distance and reduce performance on some visual tasks. However, better optics and display technology have helped to improve HWD FOVs with low-cost designs achieving 100-degree FOVs or more, reducing these perceptual issues.

HWDs have other benefits when compared to projection and single-screen displays. HWDs do not suffer from the active viewpoint problem that plagues projection and single-screen displays, because each user can have his own HWD with his own tracked view of the virtual world. Of course, multiple HWDs require more graphical computing power to drive them, and multiple viewers also cannot see each other directly when wearing non-see-through HWDs.

HWDs have both pros and cons when it comes to stereoscopic viewing. Because they use one display per eye, HWDs eliminate the need for temporal multiplexing. A potential problem with stereoscopic HWDs is that each person has a different interocular distance (the distance between the two eyes), meaning that stereo images have to be separated by that distance for correct binocular stereopsis. However, many HWDs provide a way to adjust the screens and optics for a range of interocular distances, improving stereo viewing. In addition, the user’s interocular distance should be matched in software by setting the separation between the virtual cameras used in the 3D application. Tracking the position of the user’s eyes relative to an HWD’s optics can also help to maintain the proper interocular distance (Itoh and Klinker 2014).
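
In practice, matching the interocular distance in software typically means offsetting the left and right virtual cameras from the tracked head pose by half that distance along the head’s right vector before rendering each eye’s image. A minimal sketch with hypothetical names:

```python
def stereo_eye_positions(head_position, head_right_dir, ipd_m=0.064):
    """Place the left/right virtual cameras half an interocular distance to either
    side of the tracked head position. head_right_dir is assumed to be a unit vector
    pointing to the user's right; 0.064 m is a typical default, ideally replaced by
    the measured value for the current user."""
    hx, hy, hz = head_position
    rx, ry, rz = head_right_dir
    half = ipd_m / 2.0
    left_eye  = (hx - rx * half, hy - ry * half, hz - rz * half)
    right_eye = (hx + rx * half, hy + ry * half, hz + rz * half)
    return left_eye, right_eye
```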

In many HWD designs, the images the user sees are always in focus and have the same focal depth, causing accommodation and vergence cue conflicts when users look at objects with different virtual depths, leading to eye strain and discomfort. This phenomenon also occurs with projection-based and single-screen displays but may be more pronounced with HWDs and VRDs, because the displays are close to the user’s eyes. Omura et al. (1996) developed a system to alleviate this problem by incorporating movable relay lenses into an HWD. The lenses are continuously adjusted, based on gaze direction. The virtual screen surface appears to be at the same distance as the user’s vergence point, creating a display with matched accommodation and vergence cues. McQuaide and colleagues (2002) developed a similar approach for a VRD, using a deformable membrane mirror (a microelectromechanical system) to dynamically change the focal plane, enabling the viewer to see 3D objects using the natural accommodative response of the visual system.

Eye tracking (see Chapter 6) appears to be a useful approach for dealing with accommodation-vergence problems. Using eye tracking will also significantly benefit VRDs, because when users move their eyes while using a VRD, they can lose all or part of the image, since the images from retinal displays are sent to a fixed location. Work by Chinthammit et al. (2002) demonstrated how an eye-tracking system can be used to let VRD images be coupled to a user’s eye movements. These types of eye-tracking-coupled display systems are still in the early stages of development, but low-cost eye-tracking technology is making this approach more practical and will enable HWDs to support all the visual depth cues.

HWDs have the advantage of being more portable and often less expensive as compared to surround-screen and single-screen displays. However, surround-screen and single-screen displays generally have higher spatial resolutions than HWDs, because the display screens have to be small and lightweight to keep the overall weight of the HWD low. Even though HWDs can achieve high resolution, their projection and single-screen counterparts tend to always be ahead of the resolution curve, and this trend is likely to continue until we reach the limits of the human visual system.

Ergonomically, the weight and weight distribution of an HWD must be considered, because a heavier HWD can cause strain in neck muscles from extended use. HWDs may not fit every user, because everyone has a different head size and shape. The HWD’s center of gravity is also an essential part of maintaining user comfort (Melzer 2014). Another ergonomic consideration for many HWDs is the cabling that connects the display to the computer. Users of non-see-through HWDs may trip over cables or get wrapped up in them as they move around. Approaches to address this problem include moving computation into the display itself, having the user carry or wear the computer, or using wireless technology to transmit display and input signals between the computer and the HWD.

Because the real world may be completely blocked from the user’s view, interaction while wearing an HWD often requires some type of graphical representation of either one or both hands, or the input device used. These graphical representations can be as simple as a cube or as sophisticated as a high fidelity hand model. Non-see-through HWDs also limit the types of input devices that can be used because the user cannot physically see the device in order to use it. It’s also important to graphically represent the boundaries of the physical space and any physical obstacles, so that the user avoids injury.

Arbitrary Surface Displays

Thus far, we have examined displays that use simple planar or curved screens. Another display approach is to project imagery directly on arbitrary surfaces of any shape or size. Creating visual displays out of everyday objects and surfaces is known as projection mapping or spatial augmented reality (Bimber and Raskar 2005).

Projecting imagery onto arbitrary surfaces presents many challenging problems, and the level of difficulty is often dependent on the complexity of the geometrical surface and its color and texture characteristics. Other issues include whether 3D stereo is needed, how to deal with shadows, and display area restrictions. A common approach to creating these types of displays from a hardware perspective is to use one or more projector-camera pairs. In some cases, projector-camera pairs are placed on steerable motorized platforms that can reorient in real time, projecting imagery onto different parts of the environment to increase the effective display surface size (Wilson et al. 2012). Using more than one projector helps to alleviate many issues due to shadows and display size. The camera is used to perform display surface estimation, including the display surface’s geometry, color, and texture. Calibration of the camera and the projector is also required to ensure that the images are displayed correctly. Another approach is to use optical overlays, such as mirror beam combiners or transparent surfaces. More details on these approaches can be found in Bimber and Raskar (2005). Some examples of arbitrary surface displays are shown in Figures 5.18 and 5.19.
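
For the special case of a planar (or nearly planar) surface, projector-camera calibration can be reduced to a single homography: project a few known calibration points, detect where they land in the camera image, and pre-warp the content so it appears undistorted from the camera’s (or a designated viewer’s) point of view. A minimal OpenCV sketch follows, with hypothetical point coordinates, resolution, and filename; curved or textured surfaces require the fuller geometric and radiometric calibration discussed in Bimber and Raskar (2005).

```python
import cv2
import numpy as np

# Where four projected calibration dots were drawn in the projector framebuffer...
projector_pts = np.float32([[0, 0], [1280, 0], [1280, 800], [0, 800]])
# ...and where the camera observed them on the projection surface (hypothetical values)
camera_pts = np.float32([[102, 87], [1180, 60], [1210, 765], [90, 790]])

# Homography mapping camera-image coordinates to projector pixels
H, _ = cv2.findHomography(camera_pts, projector_pts)

# Pre-warp the content so that, once projected, it looks rectified from the camera's viewpoint
content = cv2.imread("content.png")          # content defined in the camera/viewer image space
prewarped = cv2.warpPerspective(content, H, (1280, 800))
```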

In terms of 3D stereo, one of the advantages of arbitrary surface displays is that projecting onto 3D objects supports appropriate depth, since the images are directly placed onto the 3D surface. However, if images need to appear in front of or behind the display surface, view-dependent stereoscopic projection is required (Raskar et al. 1998). Depending on the display configuration, support for multiple head-tracked viewers is also possible. For example, a version of the Virtual Showcase (Bimber et al. 2001) makes use of four mirrors so each user can have their own view. With these displays, 3D spatial interaction is often limited to passive viewing, but they do support various selection and manipulation techniques. As with surround-screen displays, if front projection is used, more indirect methods of interaction are needed (see Chapter 7, “Selection and Manipulation”) since direct manipulation can cause shadows, breaking the display illusion.

Figure 5.18 The Illumroom, an example of projection mapping to create an extended display in a living room. The projector-camera pair correctly projects onto the various surfaces in the room. (Photograph courtesy of Microsoft)

Figure 5.19 The Virtual Showcase, an augmented surface display that projects virtual information onto a physical 3D artifact, presenting a seamless integration of the physical object and virtual content. (Photograph courtesy of Oliver Bimber)

Autostereoscopic Displays

In this last section on visual displays, we discuss autostereoscopic display devices, which generate 3D imagery without the need for special shutters or polarized glasses. These displays mainly use lenticular, volumetric, or holographic display technology, which we mention below. However, other techniques for creating autostereoscopic displays exist as well, including compressive light fields, diffractive-optical elements, integral imaging, parallax illumination, and barrier grids. A discussion of these techniques goes beyond the scope of this book, but Holliman et al. (2011) and Wetzstein et al. (2012) provide nice surveys of these autostereoscopic display technologies.

Parallax barrier displays use a vertical grating in front of the screen: one eye sees the odd pixel columns through the grating, while the other eye sees the even columns, supporting stereopsis. Lenticular displays (see Figure 5.20) typically use a cylindrical lens array in front of the display screen. The lens array directs different 2D images into different viewing subzones, which are projected out at different angles. When the viewer’s head is positioned correctly in front of the display, each eye sees a different image, allowing for binocular disparity. These lenses effectively replace the vertical grating of the parallax barrier approach. Both techniques have the drawback that the user must remain in a stationary position. Extensions to the basic approach do allow for user movement and maintain motion parallax (see Dodgson 2005, Perlin et al. 2000, and Schwerdtner and Heidrich 1998 for examples).
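
To see why the barrier geometry works, the following back-of-the-envelope sketch applies the similar-triangles argument: the barrier gap is chosen so that rays from the two eyes through one slit land on adjacent left/right pixel columns, and the slit pitch comes out slightly less than twice the column width. The symbols (column width p, eye separation e, viewing distance d) and the numeric values are our own assumptions, not figures from any particular product.

```python
# Back-of-the-envelope parallax-barrier design via similar triangles.
# Assumed symbols (not from the text): p = width of one pixel column (m),
# e = viewer eye separation (m), d = design viewing distance from the
# pixel plane (m). The barrier sits a gap g in front of the pixels.

def parallax_barrier_design(p=65e-6, e=0.063, d=0.6):
    """Return (gap, slit_pitch) for a two-view parallax barrier."""
    # Rays from both eyes through one slit must land one column apart:
    #   p = e * g / (d - g)  ->  g = p * d / (e + p)
    g = p * d / (e + p)
    # Rays from one eye through adjacent slits must land one L/R pair
    # (2 * p) apart, so the slit pitch is slightly less than 2 * p:
    b = 2.0 * p * (d - g) / d
    return g, b

if __name__ == "__main__":
    gap, pitch = parallax_barrier_design()
    print(f"barrier gap ~ {gap * 1e3:.3f} mm, slit pitch ~ {pitch * 1e6:.1f} um")
```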

According to Blundell and Schwarz (2000), a volumetric display device permits the generation, absorption, or scattering of visible radiation from a set of localized and specified regions within a physical volume (holographic displays can also fit into this definition). In other words, volumetric displays create “true” 3D images by actually illuminating points in 3D space. This is in contrast to other stereoscopic displays, which provide the illusion of 3D images but are actually projected onto a 2D surface.

Figure 5.20 A lenticular display. (Photograph courtesy of Joseph J. LaViola Jr.)

Volumetric displays generate 3D imagery using a number of different methods. Two of the most common use either a swept- or a static-volume technique (Blundell 2012). Swept-volume techniques sweep a periodically time-varying 2D image through a 3D spatial volume at high frequencies. Displays using swept-volume techniques (see Figure 5.21) have a mechanical component, because they spin a 2D screen within the viewing volume to create the 3D image. Static-volume techniques create 3D images without the need of mechanical motion within the viewing volume. One static-volume approach uses two intersecting invisible laser beams to create a single point of visible light. This allows the drawing of voxels (3D pixels) inside the volume by controlling when and where the lasers intersect (Downing et al. 1996). Rapidly scanning these intersection points in a predefined way enables the drawing of a true 3D image (Ebert et al. 1999). Other static-volume approaches use a high-speed projector with a stack of air-spaced liquid crystal scattering shutters (multiplanar optical elements), which act as an electronically variable projection volume. The high-speed projector projects a sequence of slices of the 3D image into the stack of scattering shutters, and each slice is halted at the proper depth (Sullivan 2003). Figure 5.22 shows a display using this static-volume approach.

One display system that provides a rough approximation of what a volumetric display looks like is known as pCubee (see Figure 5.23). Although it does not create “true 3D,” the use of LCD panels placed in a box configuration coupled with a head-tracked perspective rendering and a physics simulation gives users the impression that they are looking into a cubic volume (Stavness et al. 2010). This approach is a less expensive alternative to true volumetric displays. Brown et al. (2003) did something similar to pCubee, using a head-mounted projective display and a retroreflective cube.

Figure 5.21 A volumetric display system that uses the swept-volume technique. On the left is the display device and on the right a volumetric image. (Photographs courtesy of Actuality Systems)

Figure 5.22 A volumetric display that uses a multiplanar, static-volume approach to generate 3D images. (Photograph courtesy of LightSpace Technologies)

Figure 5.23 pCubee, a simulated volumetric display that puts five LCD panels in a box configuration. (Photograph courtesy of Dr. Ian Stavness)

Holographic displays are similar to volumetric displays in that they both produce true 3D images, but the two categories use different techniques to generate the imagery. Holographic displays produce 3D imagery by recording and reproducing the properties of light waves (amplitude, wavelength, and phase differences) from a 3D scene. The process involves a computational step in which a 3D description of a scene is converted into a holographic fringe pattern (a diffraction pattern that captures light from different directions) and an optical step that modulates the fringe pattern and turns it into a 3D image. Lucente (1997) provides a detailed introduction to the steps involved in generating 3D images using holographic displays, and more information on approaches to creating holographic displays can be found in Yaras et al. (2010).

An important concept to understand about volumetric and holographic displays is that because they produce true 3D imagery, they do not suffer from the active viewpoint problem that plagues projection-based displays and monitors. Therefore, the number of viewers with the correct perspective is basically unlimited. In addition, with these devices no trackers are needed to maintain moving-viewer motion parallax. They also do not suffer from the accommodation–vergence cue conflicts that accompany more traditional stereoscopic displays. However, current volumetric displays have the problem that they cannot provide many monocular depth cues, such as occlusion and shading, unless coupled with head tracking. Additionally, both volumetric and holographic displays generally can display images only within a small working volume, making them inappropriate for immersive VR or AR.

There has been little 3D UI research with lenticular and holographic displays (Bimber 2006). However, the same interfaces that work for rear-projected displays should also work for lenticular displays. Volumetric displays have an interesting characteristic in that many of them have an enclosure around the actual 3D imagery, meaning that users cannot physically reach into the display space. Balakrishnan et al. (2001) built volumetric display prototypes to explore these issues. They developed a set of UI design options specific to these displays for performing tasks such as object selection, manipulation, and navigation (Balakrishnan et al. 2001). Grossman and Balakrishnan extended this work using multi-finger gestures (Grossman et al. 2004) to interact with volumetric displays; they also designed selection (Grossman and Balakrishnan 2006) and collaboration techniques (Grossman and Balakrishnan 2008).

5.3 Auditory Displays

Auditory displays are another important aspect of presenting sensory information to the user, but they are overlooked in many 3D UIs (Cohen and Wenzel 1995). One of the major goals of auditory displays for 3D UIs is the generation and display of spatialized 3D sound, enabling the human participant to take advantage of his auditory localization capabilities. Localization is the psychoacoustic process of determining the location and direction from which a sound emanates (Sherman and Craig 2003). Having this feature in a 3D application can provide many important benefits to the 3D UI designer.

Note that the topic of auditory displays and 3D sound generation is a very large and active field, and going into great detail on the subject is beyond the scope of this book. For more details on 3D sound and audio displays, Blauert (1997), Begault (1994), Vorländer and Shinn-Cunningham (2015), and Xie (2013) all provide comprehensive introductions. Chapter 3, “Human Factors Fundamentals,” describes the human auditory system and spatial sound perception in section 3.3.2.

5.3.1 3D Sound Generation

Before 3D sound can be used in a 3D UI, it must be generated in some way. This generation process is important because it can have an effect on the quality of the interface and the user’s overall experience with the application. There are many different techniques for generating 3D sound, and we briefly describe two of the most common. The first is 3D sound sampling and synthesis, and the second is auralization.

3D Sound Sampling and Synthesis

The basic idea behind 3D sound sampling and synthesis is to record sound that the listener will hear in the 3D application by taking samples from a real environment. For example, with binaural audio recording, two small microphones are placed inside a person’s ears (or in the ears of an anthropomorphic dummy head) to separately record the sounds heard by left and right ears in the natural environment. All of the 3D sound cues discussed in section 3.3.2 are present in these recordings, which are capable of producing very realistic results. However, the main problem with this type of sound generation is that it is specific to the environmental settings in which the recordings were made. Therefore, any change in the sound source’s location, introduction of new objects into the environment, or significant movement of the user would require new recordings. Unfortunately, these changes will occur in most 3D applications, making this basic technique impractical for the majority of situations.

An alternative approach, which is one of the most common 3D sound-generation techniques used in 3D applications today, is to simulate the binaural recording process by processing a monaural sound source with a pair of left- and right-ear head-related transfer functions (HRTFs, discussed in section 3.3.2), corresponding to a desired position within the 3D environment (Kapralos et al. 2003). With these empirically defined HRTFs, real-time interactivity becomes much more feasible, because particular sound sources can be placed anywhere in the environment and the HRTFs will filter them accordingly to produce 3D spatial audio for the listener.
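
To make the HRTF idea concrete, the following sketch (in Python, using NumPy) convolves a monaural signal with a left- and right-ear head-related impulse response (HRIR) to produce a binaural pair. The HRIRs here are synthetic placeholders we invented for illustration; a real system would load measured HRIR data for the desired source direction.

```python
# Minimal sketch of HRTF-based spatialization: a monaural source is
# convolved with a left- and right-ear head-related impulse response
# (HRIR) measured for the desired direction. The HRIRs below are
# synthetic placeholders; a real system would load measured ones.
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Filter a mono signal into a binaural (left, right) pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return left, right

if __name__ == "__main__":
    fs = 44100
    t = np.arange(fs) / fs
    mono = 0.5 * np.sin(2 * np.pi * 440 * t)          # 1 s test tone
    rng = np.random.default_rng(0)
    # Placeholder HRIRs: short decaying noise bursts, with the right ear
    # slightly delayed and attenuated to mimic a source on the left.
    hrir_l = rng.standard_normal(128) * np.exp(-np.arange(128) / 20)
    hrir_r = np.concatenate([np.zeros(30), 0.6 * hrir_l[:-30]])
    left, right = spatialize(mono, hrir_l, hrir_r)
    print(left.shape, right.shape)
```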

As with binaural recording, there are some important issues that need to be considered with the HRTF approach. In general, HRTF measurements are taken in echo-free environments, meaning that they will not produce reverberation cues. Additionally, as with binaural recording, one pair of HRTFs applies to only one position in the environment, which means that many HRTF pairs are needed in order to have spatial audio in the whole space. Ideally, HRTF measurement should be done for all possible points in the space, but this is highly impractical because of time and resource constraints. One approach to dealing with this problem is to use interpolation schemes to fill in the HRTF measurement gaps (Kulkarni and Colburn 1993). The other major issue with using HRTF measurements for generating spatialized sound is that there is a large variation between the HRTFs of different subjects. These differences are due to variations in each listener’s outer ears, differences in measurement procedures, and perturbations in the sound field by the measuring instruments (Carlile 1996). One method for dealing with these variations is to use generic HRTFs. These HRTFs are constructed in a number of ways, including using an anthropomorphic dummy head or averaging the responses of several listeners (Xie 2013). Another method is to build personalized HRTFs that are specific to a particular user by capturing a 3D model of the user’s head and ears that can then be fed into a numeric sound propagation solver (Meshram et al. 2014).
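
The sketch below illustrates one very simple gap-filling scheme: linearly blending the two measured HRIRs whose azimuths bracket the requested direction. Real interpolation schemes are considerably more careful (for example, treating delays and minimum-phase components separately); the 15-degree measurement grid and the function name are our own illustrative choices.

```python
# A naive way to fill gaps between measured HRIRs: linearly blend the two
# measurements whose azimuths bracket the requested direction. Assumes the
# measured azimuths cover the full 0-360 degree range.
import numpy as np

def interpolate_hrir(azimuth_deg, measured):
    """measured: dict mapping azimuth (deg) -> HRIR array, all the same length."""
    az = sorted(measured)
    lo = max(a for a in az if a <= azimuth_deg)
    hi = min(a for a in az if a >= azimuth_deg)
    if lo == hi:
        return measured[lo]
    w = (azimuth_deg - lo) / (hi - lo)
    return (1 - w) * measured[lo] + w * measured[hi]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    bank = {a: rng.standard_normal(128) for a in range(0, 361, 15)}  # 15-degree grid
    hrir_37 = interpolate_hrir(37.0, bank)
    print(hrir_37.shape)
```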

Auralization

Auralization is the process of rendering the sound field of a source in space in such a way as to simulate the binaural listening experience through the use of physical and mathematical models (Kleiner et al. 1993). The goal of auralization is to recreate a listening environment by determining the reflection patterns of sound waves coming from a sound source as they move through the environment. Therefore, this process is very useful for creating reverberation effects.

There are several computer-based approaches to creating these sound fields, including wave-based modeling, ray-based modeling, ambisonics, and wave-field synthesis. With wave-based modeling techniques, the goal is to solve the wave equation so as to completely re-create a particular sound field. In many cases, there is no analytical solution to this equation, which means that numerical solutions are required. In the ray-based approach, the paths taken by the sound waves as they travel from source to listener are found by following rays emitted from the source. The problem with the ray-based approach is that these rays ignore the wavelengths of the sound waves and any phenomena associated with them, such as diffraction. This means the technique is appropriate only when sound wavelengths are smaller than the objects in the environment but larger than the surface roughness of those objects (Huopaniemi 1999).
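
As a concrete illustration of the geometric (ray-based) idea, the following sketch computes the delays and attenuations of the six first-order reflections in a rectangular (“shoebox”) room using the image-source construction: the source is mirrored across each wall, and each image contributes a delayed, attenuated copy of the sound. Diffraction is ignored, as noted above, and the room size, reflection coefficient, and speed of sound are illustrative values.

```python
# Sketch of the geometric (ray-based) idea behind auralization: first-order
# image sources of a rectangular ("shoebox") room give the delay and
# attenuation of each early wall reflection.
import math

def first_order_reflections(src, listener, room=(6.0, 4.0, 3.0),
                            reflection_coeff=0.8, c=343.0):
    """Return (delay_s, relative_gain) for the 6 first-order image sources."""
    reflections = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(src)
            image[axis] = 2.0 * wall - src[axis]      # mirror source across wall
            dist = math.dist(image, listener)
            delay = dist / c
            gain = reflection_coeff / dist            # wall loss + 1/r spreading
            reflections.append((delay, gain))
    return sorted(reflections)

if __name__ == "__main__":
    for delay, gain in first_order_reflections((1.0, 2.0, 1.5), (4.0, 2.5, 1.6)):
        print(f"delay {delay * 1000:6.2f} ms, gain {gain:.3f}")
```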

Ambisonics is a directional recording approach that can capture a full spherical sound field with a minimum of four microphones. It is based on the idea that a sound source in relation to a listener can be represented by three orthogonal directional components plus a nondirectional component measured with an omnidirectional microphone. Note that higher-order ambisonics adds more microphones to the configuration to increase localization accuracy. One of the advantages of this approach is that it provides flexible output and will work with a variety of external speaker configurations. However, four-channel ambisonics can produce accurate localization only for a single, centered listener.
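
The following sketch shows how a mono signal might be encoded into traditional first-order B-format ambisonics (W, X, Y, Z) for a given azimuth and elevation. The 1/√2 weighting of W and the angle conventions follow the common “FuMa” formulation; other normalization conventions exist, so treat this as an illustration rather than a definitive recipe.

```python
# Sketch of encoding a mono source into traditional first-order B-format
# ambisonics (W, X, Y, Z), following the common "FuMa" conventions.
import numpy as np

def encode_bformat(mono, azimuth_rad, elevation_rad):
    w = mono * (1.0 / np.sqrt(2.0))                    # omnidirectional component
    x = mono * np.cos(azimuth_rad) * np.cos(elevation_rad)
    y = mono * np.sin(azimuth_rad) * np.cos(elevation_rad)
    z = mono * np.sin(elevation_rad)
    return np.stack([w, x, y, z])

if __name__ == "__main__":
    fs = 44100
    t = np.arange(fs) / fs
    sig = 0.3 * np.sin(2 * np.pi * 220 * t)
    bfmt = encode_bformat(sig, np.radians(45), np.radians(10))
    print(bfmt.shape)   # (4, 44100): one channel per component
```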

Finally, wave-field synthesis is a technique that can, in theory, completely reproduce an acoustic environment, so that sounds appear to originate anywhere within or beyond the listening area. This approach is based on the notion that the wavefront of a virtual sound source can be approximated by overlapping wavefronts emitted from actual sources at other positions. The major drawback of this approach is that a large number of closely spaced loudspeakers is needed to adequately display the sound field. Other material on auralization and its use in 3D applications can be found in Vorländer (2007), Dobashi et al. (2003), Naer et al. (2002), Manocha et al. (2009), and Funkhouser et al. (2004).

5.3.2 Sound System Configurations

The 3D sound generated using any of the many techniques available can then be presented to the user. The two basic approaches for displaying these signals are to use headphones or external speakers, and there are advantages and disadvantages to each.

Headphones

A common approach to the display of 3D sound is to use stereophonic headphones that present different information to each ear. Note that if 3D sound is not required in the 3D UI, monophonic headphones (presenting the same information to each ear) will work just as well, but for the purposes of this discussion, when we refer to headphones, we mean the stereophonic variety. Headphones coupled with HRTFs have many distinct advantages in a 3D UI. They provide a high level of channel separation, which helps to avoid crosstalk, a phenomenon that occurs when the left ear hears sound intended for the right ear, and vice versa. They also isolate the user from external sounds in the physical environment (especially with noise-cancelling headphones), which helps to ensure that these sounds do not affect the listener’s perception. They are often combined with visual displays that block out the real world, such as HWDs, helping to create fully immersive experiences. Additionally, headphones allow multiple users to receive 3D sound simultaneously (assuming that they are all head-tracked), and they are somewhat easier to deal with, because there are only two sound channels to control.

The main disadvantage of headphones is a phenomenon called inside-the-head localization (IHL). IHL is the lack of externalization of a sound source, which results in the false impression that a sound is emanating from inside the user’s head (Kendall 1995). IHL occurs mainly because of the lack of correct environmental information, that is, lack of reverberation and HRTF information (Kapralos et al. 2003). The best way to minimize IHL is to ensure that the sounds delivered to the listener are as natural as possible. Of course, this naturalness is difficult to achieve, as shown in our previous discussions on the complexity of 3D sound generation. At a minimum, having accurate HRTF information will go a long way toward reducing IHL. Including reverberation can basically eliminate IHL at a cost of reduced user localization accuracy (Begault 1992). Two other minor disadvantages of headphones are that they can be cumbersome and uncomfortable to wear for extended periods of time, and they can make it difficult to hear and talk with collaborators. However, according to Begault (1994), other than the IHL problem and possible comfort issues, headphone displays are superior devices for conveying 3D spatialized sound.

External Speakers

The second approach to displaying 3D sound is to use external speakers placed at strategic locations in the environment. This approach is often used with visual displays that are not head worn. With the exception of speaker setups that use wave-field synthesis, the main limitation of this approach is that it is difficult to present 3D sound to more than one head-tracked user (external speakers work very well for nonspatialized sound with multiple users). On the other hand, with external speakers, the user does not have to wear any additional devices.

The major challenge with using external speakers for displaying 3D sound is avoiding crosstalk and making sure the listener’s left and right ears receive the appropriate signals. The two main approaches for presenting 3D sound over external speakers are transaural audio and amplitude panning. Transaural audio allows for the presentation of the left and right binaural audio signals to the corresponding left and right ears using external speakers (Kapralos et al. 2003). Although transaural audio overcomes IHL, it requires some type of crosstalk cancellation technique, which can be computationally expensive, to ensure that each ear does not receive unwanted signals. See Gardner (1998), Mouchtaris et al. (2000), Garas (2000), and Li et al. (2012) for details on different crosstalk cancellation algorithms. Amplitude panning adjusts the intensity of the sound in some way to simulate the directional properties of the interaural time difference and the interaural intensity difference (see section 3.3.2). In other words, by systematically varying each external speaker’s intensity, a phantom source is produced at a given location. Although amplitude panning produces a robust perception of a sound source at different locations, it is very difficult to precisely control the exact position of the phantom source (Vorländer and Shinn-Cunningham 2015). See Pulkki (2001) and Pulkki and Karjalainen (2014) for details on many different amplitude panning techniques.
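
To illustrate the basic idea of amplitude panning, the sketch below computes constant-power gains for a phantom source positioned between a pair of loudspeakers; VBAP (Pulkki 2001) generalizes this to arbitrary 2D and 3D speaker layouts. The pan-parameter range and gain law are common conventions, not any specific system’s API.

```python
# Sketch of constant-power amplitude panning between two loudspeakers:
# the per-speaker gains trade off so that the phantom source appears to
# move between them while the total power stays constant.
import numpy as np

def constant_power_gains(pan):
    """pan in [-1, 1]: -1 = full left, 0 = center, +1 = full right."""
    theta = (pan + 1.0) * np.pi / 4.0        # map to [0, pi/2]
    return np.cos(theta), np.sin(theta)      # (left gain, right gain)

if __name__ == "__main__":
    for pan in (-1.0, -0.5, 0.0, 0.5, 1.0):
        gl, gr = constant_power_gains(pan)
        print(f"pan {pan:+.1f}: L={gl:.3f} R={gr:.3f} power={gl**2 + gr**2:.3f}")
```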

A final issue when using external speakers is speaker placement, because sounds emanating from external speakers can bounce or be filtered through real-world objects, hampering sound quality. For example, in a surround-screen system, placing the speakers in front of the visual display could obstruct the graphics, while placing them behind the display could muffle the sound. Some surround-screen displays use perforated screens to allow placement of speakers behind the screens without significant audio degradation.

5.3.3 Audio in 3D Interfaces

There are several different ways 3D UIs can use audio displays, including

localization

sonification

ambient effects

sensory substitution and feedback

annotation and help

Localization

As stated at the beginning of section 5.3, the generation of 3D spatial sound creates an important audio depth cue, providing the user with the ability to use her localization skills and giving her an aural sense of the 3D environment. Three-dimensional sound can be used in a variety of ways in 3D interfaces, including audio wayfinding aids (see Chapter 8), acoustic cues for locating off-screen objects (such as enemies in games and training applications), and an enhanced feeling of moving through a virtual space.

Sonification

Sonification is the process of turning information into sounds (the audio equivalent of visualization) (Hermann et al. 2011). It can be useful when trying to get a better understanding of different types of data. For example, in a 3D scientific visualization application for examining fluid flow, a user could move her hand through a portion of the dataset, and sounds of varying frequency could be generated to correspond to varying flow speeds.
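
A minimal sketch of the fluid-flow example might look like the following: each sampled flow speed is mapped to a tone frequency and rendered as a short beep. The frequency range and the linear mapping are arbitrary choices for illustration.

```python
# Sketch of the fluid-flow sonification example: map each sampled flow
# speed to a tone frequency and synthesize a short beep per sample.
import numpy as np

def sonify(values, fs=44100, beep_s=0.15, f_min=220.0, f_max=880.0):
    lo, hi = min(values), max(values)
    t = np.arange(int(fs * beep_s)) / fs
    beeps = []
    for v in values:
        frac = (v - lo) / (hi - lo) if hi > lo else 0.5
        freq = f_min + frac * (f_max - f_min)          # linear pitch mapping
        beeps.append(0.3 * np.sin(2 * np.pi * freq * t))
    return np.concatenate(beeps)

if __name__ == "__main__":
    flow_speeds = [0.2, 0.5, 1.4, 2.8, 1.1, 0.3]       # made-up samples
    audio = sonify(flow_speeds)
    print(audio.shape)   # write to a WAV file or audio API to listen
```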

Ambient Effects

Ambient sound effects can provide a sense of realism in a 3D application. For example, hearing birds chirping and the wind whistling through the trees helps to augment an outdoor wilderness environment, and hearing cars traveling down city streets could make a city environment more compelling. Ambient effects can also be used to mimic real-world feedback. For example, if a door opening and closing in the real world makes a distinct sound, playing such a sound for a virtual door would improve realism and let the user know the door opened or closed without having to see it.

Sensory Substitution

In the case of audio displays, sensory substitution (discussed in section 3.3.5) is the process of substituting sound for another sensory modality, such as touch. This substitution can be a powerful tool in 3D UIs when haptic feedback (see section 5.4) is not present. For example, a sound could substitute for the feel of a button press or physical interaction with a virtual object, or it could let the user know an operation has been completed.

Annotation and Help

Recorded or synthesized speech can play a role as an annotation tool in collaborative applications, such as distributed model viewers, and as a means to provide help to users when interaction in the 3D environment is unclear. For example, the 3D environment could inform the user that they are navigating in the wrong direction or provide information from other users about the design of a new engine.

5.4 Haptic Displays

The last type of display device that we examine in this chapter is the haptic display. Haptic displays provide the user with the sense of touch and proprioception by simulating the physical interaction between virtual objects and the user. The word haptics comes from the Greek word haptein, meaning “to touch,” but it is used to refer both to force (kinesthetic) sensations, which arise when nerve endings in joints and muscles are stimulated, and to tactile sensations, which arise when nerve endings in the skin are stimulated (Burdea 1996). Therefore, depending on its design, a haptic display can provide the user with a sensation of force, touch, vibration, temperature, or any combination of these. Haptic displays are often coupled with input devices so that they can provide a fast feedback loop between measuring user motion and the haptic feedback communicated back to the user. Ensuring very low latency between human motion and haptic feedback is one of the major requirements in creating effective haptic displays. Chapter 3, “Human Factors Fundamentals,” provides an overview of the human haptic system in section 3.3.3.

An important component of a haptic display system (besides the actual physical device) is the software used to synthesize the forces and tactile sensations that the display device presents to the user: haptic rendering. Haptic rendering is based on a large variety of generic algorithmic techniques, such as physics-based modeling and simulation, but it can also be based on psychophysical modeling of human haptic perception, an approach particularly popular in tactile rendering (Israr and Poupyrev 2011). Haptic rendering is an active research field; however, in this chapter, we focus on the physical output devices instead of the haptic display system as a whole. See Burdea (1996), Basdogan and Srinivasan (2002), and Lin and Otaduy (2008) for more details on haptic rendering algorithms, haptic devices, and the human haptic system.
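
To give a flavor of the simplest haptic rendering computation, the sketch below implements penalty-based force rendering for a single point contacting a plane: penetration depth drives a spring-damper force that the device would then exert on the user. Real renderers (for example, god-object or proxy methods) are considerably more sophisticated, and the stiffness and damping values here are illustrative only.

```python
# Minimal sketch of penalty-based haptic rendering for a single contact:
# when the tracked haptic interface point penetrates a plane, a
# spring-damper force pushes it back out. Stiffness and damping values
# are illustrative, not tuned for any particular device.
import numpy as np

def penalty_force(tip_pos, tip_vel, plane_point, plane_normal,
                  stiffness=800.0, damping=2.0):
    n = plane_normal / np.linalg.norm(plane_normal)
    penetration = np.dot(plane_point - tip_pos, n)     # > 0 when inside surface
    if penetration <= 0.0:
        return np.zeros(3)                             # no contact, no force
    normal_vel = np.dot(tip_vel, n)
    return (stiffness * penetration - damping * normal_vel) * n

if __name__ == "__main__":
    # One iteration of what would run inside a ~1000 Hz servo loop.
    f = penalty_force(tip_pos=np.array([0.0, -0.002, 0.0]),
                      tip_vel=np.array([0.0, -0.05, 0.0]),
                      plane_point=np.zeros(3),
                      plane_normal=np.array([0.0, 1.0, 0.0]))
    print(f)   # force (in newtons) to command to the device
```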

5.4.1 Haptic Display Characteristics

Haptic displays have many different characteristics that influence the use of haptic devices in 3D UIs. In this section, we discuss three of the most common characteristics, including

perceptual dimensions

resolution

ergonomics

See Sherman and Craig (2003) for more information on haptic display characteristics.

Perceptual Dimensions

The most important characteristic of a haptic display is its capability to present information to the user. The majority of haptic displays require direct physical contact between devices and the human body; the user has to “feel” the device. There are a few exceptions to this rule, including special-purpose heat or wind displays, or more general-purpose ultrasonic systems (i.e., powerful and directed sound waves a user can feel).

Unlike visual or auditory displays, in haptics there are multiple parallel physiological and perceptual mechanisms that evoke human haptic sensations. Therefore, numerous techniques are used to create haptic sensations, and there is no one single “best” haptic display. The design of a haptic display and its specific applications depend on the perceptual dimensions that we would like to evoke. On the most basic level, a display might provide tactile or kinesthetic cues, or both. If the device provides tactile cues, the actuation mechanisms can target different skin receptors by using vibrations at different frequencies and amplitudes, static relief shapes, or direct electrical stimulation, where the electrical charge directly interacts with the tactile nerve endings. If the device is designed to output kinesthetic cues, the actuation mechanisms can target different muscle groups in the limb and either actively modify forces that apply to the human body (as in classic force feedback displays) or modulate the perceived resistance of the device to human actuation, which can be done either actively using haptic brake displays or passively using simple physical props.

Body location that is used for haptic actuation is another important perceptual dimension. The density and distribution of nerve endings in skin, muscle, and tendons differ across body locations. For example, the tactile spatial resolution at the tip of the finger, measured with the two-point threshold test, is approximately 1 mm, while on the back the two-point threshold increases to more than 20 mm (Cholewiak and Collins 1991). Additionally, the size of the activation area that the display can support affects both perception and display design: a full-body tactile display is perceived very differently than a single vibrating motor. Considering all of these perceptual dimensions is essential when designing a haptic display and determining its suitability for a given application.

Resolution

The spatial resolution of a haptic display refers to the minimum spatial proximity of the stimuli that the device can present to the user. It should correspond to perceptual spatial resolution in those body locations where the haptic display is applied. For example, the forearm is less sensitive to closely placed stimuli than are the fingertips (Sherman and Craig 2003); therefore, a tactile device designed for the fingers should have much higher spatial resolution than one designed for the forearm.

The temporal resolution of a haptic display refers to the minimal temporal proximity of the stimuli generated by the display; it is often referred to as the refresh rate of the haptic display. Temporal resolution is very important in delivering high-quality haptic sensations. For example, in force displays a low temporal resolution can cause unintended vibrations, making virtual objects feel softer than intended. Therefore, force displays usually require refresh rates of at least 1000 Hz to provide quality haptic output (Massie 1993).

Ergonomics

In order to generate haptic sensations and communicate information via tactile and force representations, haptic displays need a close physical coupling to the user. Therefore, ergonomics and safety play a vital role in designing and characterizing these displays. For example, some tactile displays use direct electrical stimulation to stimulate tactile receptors. Therefore, care must be taken not to exceed the amount of current that might cause uncomfortable sensations or injury. Similarly, high-fidelity force displays may exert forces that are unsafe for the participant. Errors in haptic display design may cause discomfort or even injury.

In addition to safety, user comfort and convenience are also important concerns when designing haptic displays. For example, it takes time and effort for the user to attach or put on many haptic devices, especially with multiple force contacts. Designing attachment mechanisms that make this process easy and convenient is important for haptic device acceptance by the end users. Furthermore, once the device is attached to the user, its weight can be burdensome, and the device could restrict the user’s ability to move around or to perform anything but the intended haptic tasks. Regardless of the haptic display used, the user’s comfort and safety must be a primary concern.

5.4.2 Haptic Display Types

A wide variety of haptic displays have been developed through the years in both research and industrial settings; many of them have evolved from work done in telerobotics and teleoperation (Biggs and Srinivasan 2002). There have also been a number of attempts to classify haptic displays. For example, they are often categorized based on the types of actuators, i.e., the active components of the haptic display that generate the force or tactile sensations (see Hatzfeld and Kern (2014) for a comprehensive introduction to actuator technology).

In this discussion we group haptic displays from a user-centric perspective, placing them into one of six categories:

ground-referenced

body-referenced

tactile

in-air

combination

passive

In this section, we briefly describe some devices that provide examples for these categories.

Ground-Referenced Haptic Devices

Ground-referenced feedback devices (also called world-grounded) create a physical link between the user and a ground point in the environment, such as a desktop, wall, ceiling, or floor. Note that because these devices are fixed to the physical environment, their range is limited. Different types of ground-referenced displays include force-reflecting joysticks, pen-based force-feedback devices, stringed devices, motion platforms, and large articulated robotic arms. Ground-referenced displays typically use electric, pneumatic, or hydraulic actuator technology.

Force-reflecting joysticks as well as force-feedback steering wheels are commonly available, relatively inexpensive, and often used in computer games, such as driving and flight simulators. Pen-based haptic displays add haptic feedback to a familiar pointing device (e.g., a stylus). An example of such a display is shown in Figure 5.24. String-based feedback devices use thin steel cables to apply forces to the user’s hand; they are lightweight and can also support a large workspace (Ishii and Sato 1994). On a larger scale, there are large articulated arms that are grounded to the floor or ceiling. They can generate much higher levels of force, which means safety is a much more critical issue. The advantage of these devices is that they can provide a fairly large range of motion for the user. Examples of this type of device are the Argonne Remote Manipulator (Brooks et al. 1990) and the SARCOS Dextrous Arm Master (Burdea 1996): arm exoskeleton force displays that can apply forces to the hand, elbow, and shoulder.

Figure 5.24 A ground-referenced force-feedback device. (Photograph courtesy of SensAble Technologies)

In addition to haptic devices designed to enhance manual control and manipulation, ground-referenced haptic displays also include devices such as treadmills, motion platforms, and other locomotion devices for traveling through 3D environments (see Chapter 8 for more details).

Body-Referenced Haptic Devices

In contrast to ground-referenced haptic displays, body-referenced feedback places the haptic device on some part of the user’s body—the haptic display is “grounded” to the user. The main benefit of body-referenced displays is that they provide the user with much more freedom of motion in the surrounding environment than do ground-referenced displays (i.e., the user can walk around and is not constrained to workspaces instrumented with haptic displays). The disadvantage, however, is that the user has to bear the entire weight of the device. Therefore, ergonomic factors such as weight and size are critical in designing usable and effective devices, which is a significant engineering challenge. This type of display typically uses electrical or pneumatic actuator technology.

One promising approach that reduces the weight of body-referenced devices, making them more ergonomic, is to use electrical muscle stimulation: electrical signals sent to different muscle groups cause involuntary muscle contractions that generate haptic forces (Kruijff et al. 2006). This approach typically uses electrodes, connected to a transcutaneous electrical nerve stimulation (TENS) device, strategically placed on different muscles to invoke movement. For example, placing electrodes on the forearm muscles can make a user’s hand move up and down with the right amount of stimulation. This type of haptic system has been used in a virtual reality boxing game (Lopes et al. 2015).

Body-referenced displays can be further classified by the body locations that the devices actuate. One popular type is the arm-based exoskeleton, which is similar to a ground-referenced arm exoskeleton force display, except that it is grounded to the user’s back rather than to the floor, ceiling, or wall. The second type of body-referenced display is the hand-force-feedback device. Such devices are grounded to the user’s forearm, palm, or back of the hand, depending on the design. These displays typically use cables, tendons, and pulleys to transmit forces to the hand and fingers, with the actuators placed remotely. An example of such a device is shown in Figure 5.25; this device can produce forces that prevent the user from bending the fingers (e.g., if the user is grasping a virtual ball). An alternative approach to designing hand force-feedback devices involves putting the actuators in the user’s palm, which reduces the overall complexity of the devices (Burdea et al. 1992). Full-body mobile exoskeletons have also been demonstrated recently, although their purpose is usually to amplify the user’s mobility and strength rather than to provide haptic feedback per se. Regardless of type, because body-referenced haptic displays are worn, they require setup time to put on and to calibrate for a specific body size.

Figure 5.25 A body-referenced force-feedback device. (Photograph reproduced by permission of Immersion Corporation, © 2004 Immersion Corporation. All rights reserved)

Tactile Displays

Tactile displays aim to present haptic information by stimulating the user’s tactile sense. Because human skin is highly sensitive, significantly less energy is required to produce strong, recognizable tactile sensations than to produce kinesthetic forces. Therefore, these displays are generally much smaller and more lightweight than the force displays discussed above. All tactile displays produce tactile sensations by applying physical stimuli to human skin, so they can be categorized by the physical principle of stimulation. They include mechanical displacement-based displays, vibrotactile displays, electrocutaneous displays, electrovibration displays, surface friction displays, and thermoelectric displays.

Examples of mechanical displacement-based displays include inflatable bladders and relief-style displays, where an array of pins creates a physical shape display or “tactile picture” that can be both observed by the user and felt by the hand (see Figure 5.26; Follmer et al. 2013). Vibrotactile displays (see Figure 5.27) communicate tactile sensations by placing vibrating actuators on the fingertips and hands; the most typical actuator used in vibrotactile displays is a vibrating motor. Vibrotactile displays are commonly used in game controllers and mobile phones. Electrocutaneous displays directly stimulate tactile receptors in human skin with electric charges passing through the skin (Kaczmarek et al. 1991). This sensation is not familiar to the user and feels like tingling. In another type of electrotactile stimulation, a current stimulates the eighth cranial nerve located behind the wearer’s ear. These electrical signals provide the user not with a tactile sensation, but with vestibular stimulation, which can create a sensation of motion.

Figure 5.26 InForm is a mechanical displacement display that is able to produce tactile haptic shapes (Follmer et al. 2013). (Photograph courtesy of MIT Media Lab)

Figure 5.27 A tactile device that puts vibrating actuators on the fingertips and the palm of the hand. (Photograph reproduced by permission of Immersion Corporation, © 2004 Immersion Corporation. All rights reserved)

Electrovibration-style tactile displays create tactile sensations by controlling electrostatic friction between instrumented surfaces and sliding fingers (Bau et al. 2010). An alternating voltage applied to the conductive surface produces an attraction force between the finger and the conductive surface. This force modulates friction between the surface and the skin of the moving hand, creating a friction-like sensation as the finger slides on the surface. This friction can be combined with a visual representation, allowing the user to feel the surface of virtual 3D images as demonstrated in Figure 5.28 (Kim et al. 2013). Note that any surface can be augmented with such tactile sensations, including surfaces of everyday objects, input devices, and physical props (Bau et al. 2012).

Surface friction tactile displays are similar to electrovibration displays in that they control friction between the human hand and a surface, but the tactile effect is based on vibrating a surface at ultrasonic frequency and creating a squeeze film of air between the human hand and the touch surface. The thin layer of air reduces the friction between the hand and display, making it more “slippery” (Winfield et al. 2007). The friction can further be manipulated by modulating the amplitude of the ultrasound vibration. Finally, thermoelectric displays produce the sensation of heat and are usually developed using various thermoelectric devices, such as Peltier modules (Jones 2008).

The technologies discussed above provide the actuation mechanisms and control techniques needed to produce tactile sensations. A higher-level goal in designing tactile displays is the construction of higher-order tactile percepts that can communicate complex meanings, expressions, and experiences to the user. This work includes tactile languages that communicate symbolic information, display images and sounds, communicate alerts and messages, and present spatial information such as directions and shapes (Israr and Poupyrev 2011).

Figure 5.28 Electrovibration displays allow the user to feel the bumps and ridges on the 3D rendering projected on the surface of the display (Kim et al. 2013). (Photograph courtesy of Ivan Poupyrev, copyright Disney, printed with permission)

In-Air Haptics

With all the haptic and tactile feedback technologies discussed so far in this chapter, in order to feel virtual objects, users must be in direct physical contact with the haptic apparatus, either by touching physical objects equipped with haptic devices or by wearing tactile feedback devices in the form of haptic gloves, belts, vests, and so on. Requiring users to wear physical devices significantly impedes natural user interaction and may limit the overall range of applications that employ tactile feedback. Consequently, a more recent approach is to use haptic devices that create tactile sensations in “mid-air,” without the need for direct physical contact with the haptic apparatus. Most devices for creating in-air tactile sensations use air to stimulate the skin.

The earliest approach to creating in-air tactile sensations was introduced in Sensorama, invented by cinematographer Morton Heilig in the 1960s. Sensorama combined stereoscopic graphics with smell, stereo sound, a vibrating seat, and wind blowing into the user’s face to increase the sense of immersion. Similar air-blowing techniques have also been used for decades in location-based entertainment (e.g., Walt Disney World’s “Soarin’” attraction, which simulates flying in a glider). Similarly, coarse-grained heat sensations can be created using heat lamps as displays (Dinh et al. 1999).

In ultrasound-based in-air haptic displays (Iwamoto et al. 2008), a two-dimensional array of hundreds of miniature ultrasonic transducers forms a beam of ultrasonic radiation pressure using a phased-array focusing approach (Wilson et al. 2014). Because of the large acoustic impedance mismatch between air and skin, about 99.9% of the incident acoustic energy is reflected from the skin, creating a pressure field that provides perceivable tactile sensations. By modulating the ultrasound beam at approximately 200 Hz, the perceived intensity of the tactile sensations increases because of the skin’s high sensitivity to vibratory stimuli at this frequency. However, the disadvantage of this approach is the short effective distance of such displays.
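
The sketch below illustrates the phased-array focusing step: each transducer in a flat array is delayed so that all wavefronts arrive at the chosen focal point at the same time, and the resulting beam would then be amplitude-modulated at roughly 200 Hz as described above. The array size, element spacing, and 40 kHz carrier are assumptions chosen for illustration.

```python
# Sketch of phased-array focusing for in-air ultrasound haptics: each
# transducer is delayed so that all wavefronts arrive at the focal point
# simultaneously. Array geometry and carrier frequency are illustrative.
import numpy as np

def focusing_delays(focus, n=16, spacing=0.01, c=343.0):
    """Per-transducer firing delays (s) for an n x n flat array in the z=0 plane."""
    coords = (np.arange(n) - (n - 1) / 2.0) * spacing
    xx, yy = np.meshgrid(coords, coords)
    positions = np.stack([xx, yy, np.zeros_like(xx)], axis=-1)
    dists = np.linalg.norm(positions - np.asarray(focus), axis=-1)
    # The farthest transducer fires first; nearer ones wait so that all
    # wavefronts reach the focal point at the same moment.
    return (dists.max() - dists) / c

if __name__ == "__main__":
    delays = focusing_delays(focus=(0.0, 0.0, 0.20))   # focus 20 cm above the array
    carrier_hz = 40e3                                  # typical airborne transducer
    phase_offsets = 2 * np.pi * carrier_hz * delays    # equivalent per-element phases
    print(delays.shape, f"{delays.max() * 1e6:.1f} us max delay")
```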

An alternative approach to creating in-air tactile sensations using concentrated air pressure fields is based on using air vortices (Kruijff and Pander 2005). Here, the tactile sensations are produced by a pressure differential inside of an air vortex (Sodhi et al. 2013), providing effective tactile sensations over relatively large distances (over one meter in length). Also, this approach allows for an efficient, relatively inexpensive, and scalable design of in-air haptic devices. In one implementation, standard miniature speakers driven synchronously can consistently create a vortex that can be further directed by using a flexible nozzle (see Figure 5.29).

In-air tactile displays cannot currently provide the highly detailed and responsive sensations that are possible with electromechanical devices. However, they offer new interactive modalities when designing 3D UIs in both desktop and room-scale settings.

Figure 5.29 Vortex-based in-air tactile display (Sodhi et al. 2013). (Photographs courtesy of Ivan Poupyrev, copyright Disney, printed with permission)

Combination Devices

Because of the complexities of simulating force and touch sensations, most haptic and tactile devices focus on producing feedback within one of the four haptic display categories above. However, combining different types of feedback can create more believable and recognizable haptic sensations. Figure 5.30 is an example of such a device that combines ground-referenced and body-referenced feedback styles. Another example of a combination display uses a body-referenced hand-force device coupled with tactile feedback for the user’s fingertips (Kramer 1993).

Figure 5.30 A haptic device that combines ground-referenced and body-referenced force feedback. (Photograph reproduced by permission of Immersion Corporation, © 2004 Immersion Corporation. All rights reserved)

Passive Haptics

The haptic devices we have seen thus far have all been active devices that generate forces or tactile sensations using some type of actuator technology controlled with haptic rendering techniques. Another class of haptic interfaces is based on using passive physical representations of virtual objects to communicate their physical qualities. Their unique characteristic is that they convey a constant force or tactile sensation based on the geometry and texture of the particular object. For example, a real cup or hammer can provide a fully realistic haptic sensation when the virtual world contains a virtual cup or hammer. Passive haptic devices are not necessarily restricted to handheld objects; they can include tabletops, walls, and the floor, and they can also provide basic input capabilities, e.g., touch input and position tracking (Poupyrev, Tomokazu et al. 1998).

Passive haptic devices (or “props”) are very specific in that they are solid physical objects that directly mimic the virtual objects that they are used to represent. Not surprisingly, they have been shown to be effective in improving the perceived realism of VEs (Insko 2001). Of course, their underlying limitation is their specificity and the need for exact registration between virtual objects and their passive representations. Refer to Chapter 9, “System Control,” section 9.4, and Chapter 10, “Strategies in Designing and Developing 3D User Interfaces,” section 10.2.1, for a thorough discussion of passive haptic devices used in 3D UIs.

5.4.3 Haptic Displays in 3D Interfaces

The ability to present haptic feedback to the user in a 3D UI is a powerful tool in developing more effective, efficient, and immersive experiences. In particular, from an immersive standpoint, haptic displays can help to improve the realism of a 3D UI (Biggs and Srinivasan 2002), which is particularly important in applications such as entertainment and gaming. Of course, realistically simulating the haptic cues of real-world objects is currently difficult to do—“getting it right” is quite challenging, and a poorly designed haptic display can in fact hinder the immersive experience. Even worse, strong forces from a haptic display can be dangerous, potentially harming the user.

From a 3D UI perspective, one natural use of haptic feedback is to provide feedback when grabbing and manipulating virtual objects using direct manipulation. Ground-referenced devices like the one shown in Figure 5.24 have been used in surgical training (Körner and Männer 2003), molecular docking (Brooks et al. 1990), and 3D modeling and painting applications (Foskey et al. 2002). Tactile feedback can be used to simulate the texture of physical surfaces (e.g., a designer might want to explore the surface textures of different objects in an industrial design application), as shown in Figure 5.28. In another example, a tactile display could be used to help a firefighter determine a doorknob’s temperature in the context of a training simulation. Haptic feedback can also be used to signal the user that an interaction task has been invoked or completed, successfully or not. Finally, passive haptic devices can also be used in 3D interfaces as props that provide weight and texture and represent an inexpensive way to display haptic cues to the user.

5.5 Characterizing Displays by Level of Fidelity

Choosing a specific display for a 3D UI can be overwhelming, given the number of different displays and configurations available. In the sections above we have described various characteristics of visual, auditory, and haptic displays as a way to think about them more abstractly and systematically. Thinking about characteristics like FOV, stereoscopy, resolution, and overall quality, we can see that all of these characteristics are related to the realism, or fidelity, of the display.

To be more precise, a display’s level of fidelity is the degree to which the sensory stimuli produced by a display correspond to those that would be present in the real world (Bowman and McMahan, 2007; McMahan et al. 2012). For example, in a real-world environment, a typical human would be able to see the world all around her (360-degree FOR), would have a more than 180-degree horizontal FOV, would perceive a spatial resolution limited only by the receptors on the retina, and would have stereoscopic vision with no crosstalk and a perfect accommodation-vergence match. Auditory and haptic stimuli would be similarly perfect. In a typical HWD, however, while the 360-degree FOR would still be present, the FOV would be much lower (perhaps 100 degrees), the spatial resolution would be significantly less, and stereoscopic vision would be compromised by an accommodation-vergence mismatch.

Why do we care about a display’s fidelity? First, much of the research on displays is aimed at ever-increasing realism. Display manufacturers and their customers want higher and higher resolutions (witness the current trend of “retina” displays in which individual pixels are said to be undetectable at normal viewing distances), wider and taller FOVs, better stereo and other depth cues, more lifelike color reproduction, and so on. Talking about a display’s fidelity allows us to benchmark it with respect to the real world and to compare it to other displays in a standard way.

Second, a display’s fidelity can have significant effects on the user’s experience with the display. For example, a wider FOV may lead to faster performance on a visual search task (Arthur 2000), and higher overall fidelity may result in a greater sense of engagement (McMahan et al. 2012). Again, thinking about these findings in terms of fidelity is more general and more useful than a comparison of specific displays.

It should be clear from this discussion that display fidelity is not a single thing or a number that captures everything about the sensory realism of a display. Rather, it’s made up of a number of components, each of which has its own level. A nonexhaustive list of visual display fidelity components includes:

field of regard (FOR)

field of view (FOV)

stereoscopy quality

refresh rate

spatial resolution

color reproduction quality

display latency

Similar lists apply to auditory and haptic displays, although it’s more difficult to nail down all of their components. Auditory display fidelity would include components related to auditory quality, spatiality, and latency, while haptic display fidelity components would include latency, resolution, and other haptic quality aspects.

Sometimes we can state outright that display A has higher fidelity than another display B. For this to be true, all display fidelity components of A would need to be at a level greater than or equal to the respective components of B. Another way of saying this is that it would be possible to simulate display B perfectly using display A (Slater et al. 2010). For example, a four-screen surround-screen display has higher fidelity than a single-screen display using one of the screens/projectors from the surround-screen display.
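
The dominance rule can be expressed in a few lines of code: display A is strictly higher fidelity than display B only if every one of A’s fidelity components is at least as high as the corresponding component of B. The component names and numbers below are invented purely for illustration.

```python
# Illustration of the "strictly higher fidelity" rule: display A dominates
# display B only if every fidelity component of A is >= the corresponding
# component of B. Component names and values are made up for the example.
def dominates(a, b):
    return all(a[k] >= b[k] for k in b)

if __name__ == "__main__":
    hwd = {"FOR_deg": 360, "FOV_deg": 100, "resolution_ppd": 12}
    cave = {"FOR_deg": 270, "FOV_deg": 120, "resolution_ppd": 20}
    single_screen = {"FOR_deg": 90, "FOV_deg": 90, "resolution_ppd": 20}
    print(dominates(cave, single_screen))                # True: surround screen dominates
    print(dominates(hwd, cave), dominates(cave, hwd))    # False, False: a mixed comparison
```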

More often, however, the comparison of fidelity levels between two displays is more mixed, with one display having higher levels of certain components of fidelity and the other having higher levels for other components. For example, an HWD can provide a 360-degree FOR, while a three-screen surround-screen display has an FOR of only 270 degrees. However, the surround-screen display might provide a 120-degree FOV (with stereo glasses on), while the HWD might provide only 90 degrees. In a case like this, to decide which display is most appropriate for a given user task, it would be important to look at empirical results on the effects of FOR and FOV for that task to determine which one is most important to optimize.

For further reading on display fidelity and its effects on 3D UI user experience, see Bowman and McMahan (2007), McMahan et al. (2012), and Ragan et al. (2013, 2015).

5.6 Design Guidelines: Choosing Output Devices for 3D User Interfaces

Choosing the appropriate output devices for a 3D application is a difficult task, and unfortunately there is no single rule of thumb telling developers which output devices to use.

Tables 5.1–5.3 present summaries of the advantages and disadvantages of the output devices we have discussed in this chapter; these can be used as a quick and general guide for starting the process of choosing an appropriate output device for a 3D UI. Note that new and more powerful display devices are becoming available all the time, so these guidelines should be considered general rules of thumb, as there can always be exceptions to these pros and cons.

Table 5.1 Visual display devices: pros and cons, visual depth cues supported.

Table 5.2 Pros and cons of using headphones or external speakers in a 3D sound display.

Table 5.3 Pros and cons of different haptic display device types.

Although there are many factors to consider when choosing output devices, such as finances, restrictions from available input devices, and ergonomics, the most important consideration is the application itself.


Tip

Analyze the application’s goals and its underlying operations and tasks to obtain direction in choosing an appropriate output device.


As an example, consider a medical visualization tool that allows doctors to teach medical students about various surgical procedures. Because of the collaborative nature of this application, a single HWD or simple monitor would probably not be appropriate, since several people need to see the display simultaneously. In this case, using a large-screen display or having each person wear their own HWD is probably the better choice. In contrast, an application used in phobia treatment (Hodges et al. 1999) or to reduce pain (Hoffman et al. 2008) requires an individual user to be completely immersed in the environment with the real world blocked out. Clearly, an HWD or six-sided surround-screen display is the best choice for this application. As a final example, a holographic or volumetric display might be the best choice for an exhibit in a museum, because users do not need to wear any special glasses for the 3D stereo effect and because an unlimited number of viewers can get correct views simultaneously.

Choosing an audio display is a somewhat easier task, because it often enhances an application’s effectiveness. Of course, choosing the appropriate audio configuration, whether it is headphone-based or a multiple speaker system, still depends on the type of application and on the other devices used.

With haptics, the display choice often depends on the type and accuracy of the feedback required. In some applications, a tactile sensation is enough to give the user a sense that there is something tangible about the objects they see and interact with. However, if the application requires a real force, such as a molecular docking application (Brooks et al. 1990) in which users try to fit different molecules together, a ground-referenced force-feedback device is more appropriate.

The examples presented thus far are more about common sense than any scientific discovery. However, another approach to determining the most appropriate display device for a given application is through empirical study. Bowman et al. (2002) developed guidelines for choosing appropriate display devices by examining interaction techniques across different display platforms. For example, they compared users’ preferences for real and virtual turns during navigation in an HWD and a four-sided surround-screen display. The results showed that users had a preference for real turns in the HWD and for virtual turns in the surround-screen display because of its limited FOR.


Tip

Visual display devices with a 360-degree FOR should be used in applications in which users perform frequent turns and require spatial orientation.


In another set of empirical studies, Swan and colleagues compared a monitor, surround-screen display, workbench, and a large wall display for the task of navigating in a battlefield visualization application. For this particular application, the results showed that users were faster at performing the navigation tasks on the desktop than with the other display devices, perhaps because of the higher spatial resolution and brightness achievable on a conventional monitor. For more details on the study, see Swan et al. (2003).

Kasik et al. (2002) also performed empirical evaluations on different visual display devices. They compared a high-resolution 20-inch monitor, a hemispherical display, and a large 50-inch flat screen panel by testing how quickly and accurately users found objects of interest in sophisticated 3D models. Users performed better with the monitor than with the flat panel or the hemispherical display, indicating that higher spatial resolution was an important factor in their search tasks. Other evidence from subjective studies (Demiralp et al. 2006) indicates that higher spatial resolution and crispness is important to users in scientific visualization. Other studies that have compared different display devices empirically include Johnsen and Lok (2008), Qi et al. (2006), Schulze et al. (2005), and Sousa Santos et al. (2009).


Tip

For visual search and pattern identification tasks in 3D environments, choose a display with high spatial resolution.


Although these empirical studies are quite powerful, their results should be used with caution, because they are often designed for specific tasks and applications and are therefore difficult to generalize. In addition, as technologies continue to improve, results from older studies need to be reevaluated. Nevertheless, the development of guidelines based on empirical results should continue to be an important area of 3D UI research and will help to make it easier to determine the best display devices for given applications.

5.7 Case Studies

The following sections discuss the output device considerations for our two case studies. If you are not reading the book sequentially, you may want to read the introduction to the two case studies in Chapter 2, section 2.4.

5.7.1 VR Gaming Case Study

For our VR first-person action-adventure game, there are two obvious visual display types to consider: HWDs and surround-screen displays. We want users to be able to turn around physically to view different parts of the world around them, because that enhances the sense of presence and makes the environment seem more like a real physical place. So single screens are not an option.

HWDs have the advantages of being relatively inexpensive, easy to obtain and set up, usable in any sort of physical space, and good for inspection of and interaction with nearby objects. The advantages of surround-screen displays are their lighter headgear (simple stereo glasses) and the ability for the player to see her own body in the space. For a VR game targeted at home use, cost and convenience trump all, so HWDs are the clear choice. We might consider a surround-screen display, however, if the game were set up in a dedicated space like a VR arcade. For the remainder of the case study, we’ll assume the use of a typical consumer HWD.

In choosing an HWD, several issues arise that we’ll have to handle. First, ergonomics becomes a concern. HWDs can still be bulky, heavy, and uncomfortable. We will want to choose a display that is as light and comfortable as possible. The HWD should also have a comfortable accommodation distance and allow the distance between the optics/displays to be adjusted to match the user’s interpupillary distance (IPD), in order to avoid eyestrain and provide correct stereo viewing.

Since the player will not be able to see the real world directly, we should also address safety. Preferably, the HWD can be used wirelessly or with an all-in-one backpack system, so that players won’t become tangled in cables or trip over them. We also want the real world to “bleed through” when the user is in danger of running into a real-world object. A system like the HTC Vive’s chaperone feature can display the boundaries of the physical space (walls) when the user approaches them or reaches out toward them. It’s also possible to use a depth camera to detect other obstacles in the play area, such as furniture or people/pets walking through the room (Nahon et al. 2015).
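To make the idea concrete, the minimal sketch below (in Python) shows the kind of proximity test a chaperone-style feature relies on: it assumes a rectangular play area centered at the origin and fades in a warning grid as the head approaches a wall. The function name, extents, and threshold are our own illustrative choices, not any vendor’s API.

import numpy as np

PLAY_AREA_HALF_EXTENTS = np.array([2.0, 1.5])  # assumed play-area half-size in meters (x, z)
WARNING_DISTANCE = 0.5                         # start fading in the grid at 0.5 m

def boundary_warning_opacity(head_position_xz):
    """Return an opacity in [0, 1] for the warning grid, based on the nearest wall."""
    # Distance from the head to each wall of the rectangle (negative if outside).
    distances = PLAY_AREA_HALF_EXTENTS - np.abs(head_position_xz)
    nearest = float(distances.min())
    if nearest >= WARNING_DISTANCE:
        return 0.0  # far from every wall: grid stays invisible
    return float(np.clip(1.0 - nearest / WARNING_DISTANCE, 0.0, 1.0))

# Example: a head 0.2 m from the right-hand wall yields a 60%-opaque grid.
print(boundary_warning_opacity(np.array([1.8, 0.0])))  # -> 0.6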

The fidelity of the display is moderately important for a game of this type. The virtual world is not intended to look just like a real-world environment, so the game can be effective even without the highest levels of fidelity. At the same time, we do want players to feel present in the game world, so it’s important to have a reasonable FOV (100 degrees horizontal, currently standard for many consumer HWDs, should be fine) and good spatial resolution.
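A quick back-of-the-envelope calculation makes the resolution trade-off concrete. The panel figures below are assumptions chosen to be typical of consumer HWDs of this generation, not the specifications of any particular product.

# Rough angular resolution of a hypothetical consumer HWD (assumed figures).
horizontal_pixels_per_eye = 1080   # assumed panel width per eye
horizontal_fov_degrees = 100       # assumed horizontal FOV

pixels_per_degree = horizontal_pixels_per_eye / horizontal_fov_degrees
print(round(pixels_per_degree, 1))  # ~10.8 pixels per degree

# For comparison, 20/20 visual acuity corresponds to roughly 60 pixels per
# degree, so in-game text and fine detail should be sized with this gap in mind.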

So far, we’ve only discussed a single player, which is ideal for an HWD. But in most games, even if they’re not explicitly multiplayer, it’s desirable to allow nonplayers to watch the action in real time. This could be accomplished with support for multiple HWDs, but nonplayers with HWDs might just get in the way, and this solution does not scale to a larger audience (most people won’t have ten HWDs sitting around). A better approach is green-screen compositing, in which a video camera captures the player in front of a green screen and this footage is composited with a view of the virtual world, so that viewers see the player immersed in the game world from a third-person point of view. This approach was popularized by the game Fantastic Contraption (Figure 5.31) and works well both for in-person viewers and for posting or streaming gameplay videos online.

Figure 5.31 Mixed reality compositing in Fantastic Contraption, allowing viewers to see both the player and the virtual world she inhabits. (Photograph courtesy of Northway Games)
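As a rough illustration of how such a composite can be produced, the sketch below chroma-keys the camera frame and overlays the extracted player onto a matching render of the virtual world. The color thresholds and file names are placeholders; a production pipeline such as the one used for Fantastic Contraption would also handle camera calibration, latency alignment, and foreground/background layering.

# Hedged sketch of green-screen (chroma-key) compositing with OpenCV.
import cv2
import numpy as np

camera_frame = cv2.imread("player_in_front_of_green_screen.png")  # placeholder input
virtual_frame = cv2.imread("virtual_world_render.png")            # same resolution assumed

# Key out the green backdrop in HSV space (thresholds are illustrative only).
hsv = cv2.cvtColor(camera_frame, cv2.COLOR_BGR2HSV)
green_mask = cv2.inRange(hsv, (35, 60, 60), (85, 255, 255))
player_mask = cv2.bitwise_not(green_mask)

# Wherever the player mask is set, take the camera pixel; elsewhere, the render.
composite = np.where(player_mask[..., None] > 0, camera_frame, virtual_frame)
cv2.imwrite("mixed_reality_composite.png", composite)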

Finally, we need to consider nonvisual displays. For this game, spatial sound will certainly add to the realism and the overall experience (e.g., hearing monsters sneak up behind you), so we’ll want to ensure that the player has a good pair of headphones. Using physical props to provide haptic feedback is not really an option, since the same physical space will be used for a variety of virtual rooms. We can provide some level of haptic feedback through the handheld controllers: both the passive feedback created by holding the controllers themselves and the rumble/vibration feedback they can produce.
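As a simplified illustration of what a spatial audio engine does for us, the sketch below derives per-ear gains for a sound source from its direction and distance relative to the listener. Real spatializers use HRTFs rather than this naive panning, and all names and constants here are illustrative assumptions.

# Minimal stereo panning and distance attenuation sketch (not a real HRTF renderer).
import numpy as np

def stereo_gains(listener_pos, listener_yaw_rad, source_pos):
    """Return (left_gain, right_gain) for a source, e.g., a monster behind the player."""
    to_source = np.asarray(source_pos, dtype=float) - np.asarray(listener_pos, dtype=float)
    distance = max(np.linalg.norm(to_source), 0.1)  # avoid division by zero up close
    # Azimuth of the source relative to the direction the listener faces (+y at yaw 0).
    azimuth = np.arctan2(to_source[0], to_source[1]) - listener_yaw_rad
    pan = np.sin(azimuth)               # -1 = fully left, +1 = fully right
    attenuation = 1.0 / distance        # simple inverse-distance falloff
    return attenuation * (1.0 - pan) * 0.5, attenuation * (1.0 + pan) * 0.5

# A source 2 m directly to the listener's right is heard mostly in the right ear.
print(stereo_gains((0.0, 0.0), 0.0, (2.0, 0.0)))  # -> (0.0, 0.5)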

Key Concepts

Choose a visual display that will be both effective and practical for your end users.

Carefully consider human factors issues related to VR displays.

Don’t forget to account for social aspects such as non-users viewing the VR experience.

5.7.2 Mobile AR Case Study

While creating the HYDROSYS outdoor AR application, we had to design a suitable platform to meet various physical and technical requirements. The devices used for analysis of environmental processes not only had to withstand outdoor environments, but they also had to be comfortable to hold and provide clear visibility of the content. Moreover, the device setup had to enable high tracking accuracy and real-time rendering of the augmented content. Providing such a platform turned out to be a true challenge. In this section we unravel the various issues and derive useful recommendations.

The type and specifications of the visual display itself were not a primary concern for this application. We needed a low-cost AR display that could be tracked outdoors and used by multiple users in harsh conditions. A handheld, video-see-through AR display was really the only reasonable choice. More difficult, however, were the physical requirements of the entire platform, including issues of form factor, sensor integration, and ruggedness.

Let us first go into the physical requirements of the platform. The platform had to be robust enough to withstand harsh external conditions, as it would be used in low temperatures, snow, and rain, which most electronic devices cannot tolerate. While interacting, the device needed to be comfortable to hold to enable ergonomic use of the system over potentially long periods of time. Users would often perform analyses for half an hour or more, and sore arm muscles would not be acceptable! In addition, the setup had to be compact enough to be transported, for example, tucked away in a small backpack. Finally, the setup had to be modular so that users could adjust it to the usage conditions.

Now, you may ask yourself why a waterproof smartphone could not have been used, as it would have met many of these requirements. Unfortunately, the quality of built-in smartphone sensors at the time was not good enough to provide adequate tracking, so external sensors had to be used: a high-quality GPS receiver, a good camera, and a precision inertial sensor for rotation measurement. While this situation has changed somewhat, there are still cases in which developers will prefer such higher-quality external sensors.
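To illustrate why sensor quality matters so much for registration, the sketch below combines a GPS fix and an IMU orientation into a rough camera pose in a local east-north frame. This mirrors the kind of pose composition a handheld outdoor AR platform needs, but the projection and names are simplified assumptions rather than the actual HYDROSYS code.

# Sketch: build a rough AR camera pose from GPS (position) and an IMU (orientation).
import numpy as np

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius

def gps_to_local_en(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Project a GPS fix to approximate east/north meters around a reference point."""
    lat, lon = np.radians([lat_deg, lon_deg])
    ref_lat, ref_lon = np.radians([ref_lat_deg, ref_lon_deg])
    east = (lon - ref_lon) * np.cos(ref_lat) * EARTH_RADIUS_M
    north = (lat - ref_lat) * EARTH_RADIUS_M
    return np.array([east, north])

def camera_pose(gps_fix, ref_fix, imu_rotation_3x3):
    """Return a 4x4 pose matrix: IMU orientation plus GPS-derived translation."""
    pose = np.eye(4)
    pose[:3, :3] = imu_rotation_3x3
    pose[:2, 3] = gps_to_local_en(*gps_fix, *ref_fix)  # x = east, y = north
    return pose

# A fix roughly 11 m north of the reference site, camera level and facing north.
print(camera_pose((46.0001, 7.0), (46.0, 7.0), np.eye(3)))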

We based the design of a suitable platform on prior experience with another handheld AR platform used for visualization of underground infrastructure (Schall et al. 2009). The usage environment was less harsh than for HYDROSYS, yet many of the same requirements applied. This prior platform, called Vesp’r (Figure 5.32), consisted of a central unit that held a Sony UMPC mobile computer, the sensors, and all the needed cabling. Sensors included a high-quality camera, a high-precision GPS, and an inertial measurement unit. The device afforded both single- and two-handed usage by mounting either one grip below or two grips at the sides of the setup. The grips enabled a power grip (see section 3.5.1), important for holding the slightly heavier setup firmly. (Note that an inappropriate grip would limit ergonomic usage.) This was especially important when the device had to be held at eye height, a pose that eases the match between augmented content and the real world yet normally is quite tiring. While the single-handed setup was slightly less comfortable to hold, it did enable direct interaction with the screen. With the two handles mounted at the sides, interaction was performed through controllers embedded in the handles. We elaborate further on the input structure in the next chapter.

Figure 5.32 Vesp’r handheld AR device setup. (Image courtesy of Ernst Kruijff and Eduardo Veas).

Through various studies we found that the shape and balance of the setup, as well as controller allocation, can greatly help in supporting ergonomic usage. Interestingly, a heavier device is not necessarily a worse one; a heavier but well-balanced and ergonomically formed device may outperform an ill-designed lighter platform (Veas and Kruijff 2008). Support for specific body poses extended the duration of ergonomic usage. We noticed that users would often rest their elbows against their waists when using the device two-handed, which stabilized the grip and distributed the weight better. Based on this, we chose the angle at which the display was mounted on the support structure so that users could clearly observe screen content from multiple poses. A different screen angle could lead to uncomfortable poses, as outdoor lighting and reflection limit the angles at which screen content is visible.

In HYDROSYS, we targeted a robust setup that could withstand even rougher conditions. We used a weather-resistant tablet, a MIL-standard Panasonic UMPC, which was heavier than the Sony UMPC used for the Vesp’r yet had an outdoor-visible screen and a strong battery. The latter was very useful for driving the additional sensors over time while reducing the need for an additional battery pack. As with Vesp’r, we used a high-precision GPS, inertial sensors, and a high-quality camera.

Figure 5.33 HYDROSYS handheld AR device setup. (Image courtesy of Ernst Kruijff and Eduardo Veas)

Figure 5.33 shows the final setup achieved after three iterations. The setup contained an X-shaped mount behind the UMPC that was made out of a composite of 3D-printed plastic and aluminum. To the mount we connected a hull made of neoprene (a light but robust material) that could be compressed to limit the space taken up by cables and sensor control boxes. The neoprene construction could be held comfortably with one hand. To the plastic frame, various 3D-printed units could be mounted to hold the preferred sensors. While the previous iterations also included a tripod mount, the final prototype included a camera belt connected to the setup to relieve the user of some of the weight, which is important when holding the device single-handed while controlling the application. The final prototype was a good balance between robustness, ergonomics, modularity, and compactness. For more details, see Veas and Kruijff (2010). We also revisit ergonomic considerations in Chapter 11, “Evaluation of 3D Interfaces.”

Key Concepts

While the development of this prototype may at first blush seem specific to its domain, many of the lessons learned apply to a wide range of handheld AR setups.

Support a comfortable power grip to hold the system firmly, especially when the user is required to hold the device at eye height.

Allow users to vary their poses, and look into the potential of resting the arms against the body to extend usage duration.

Closely analyze the relationship between display angle and pose, as changing pose to see content clearly may result in nonergonomic usage.

Look closely at the balance of the setup—if the device tips in the wrong direction, even a good pose or grip may not be sufficient to hold the system for long. Balance may be more important than overall weight.

Try to limit the need for additional batteries, and bundle extra cables compactly, as they tend to take up a lot of space.

5.8 Conclusion

In this chapter, we have examined a variety of different output devices including different types of visual displays, auditory displays, and haptic and tactile feedback devices. We have also examined some strategies for choosing an appropriate output device and discussed the effects of display fidelity. However, output devices are not the only important hardware components used in 3D UIs. Input devices, the devices used to perform actions and tasks within the applications, are equally important. The choice of input device can affect the choice of output device, and vice versa. In the next chapter, we continue our examination of hardware used in 3D UIs by exploring the many different input device varieties and how they affect 3D UI design.

Recommended Reading

For thorough expositions on computer graphics rendering, we recommend the following:

Akenine-Möller, T., E. Haines, and N. Hoffman (2008). Real-Time Rendering, 3rd ed. Wellesley, MA: AK Peters.

Angel, E. and D. Shreiner (2011). Interactive Computer Graphics: A Top-Down Approach with Shader-Based OpenGL, 6th ed. Boston: Addison-Wesley.

Hughes, J., A. van Dam, M. McGuire, D. Sklar, J. Foley, S. Feiner, and K. Akeley (2013). Computer Graphics: Principles and Practice, 3rd ed. Upper Saddle River, NJ: Addison-Wesley Professional.

A thorough discussion of the design of head-worn displays can be found in the following:

Melzer, J. and K. Moffitt (2011). Head-Mounted Displays: Designing for the User. CreateSpace Independent Publishing Platform.

Details on autostereoscopic techniques and true 3D displays can be found in the following:

Blundell, B. G. (2012). “Volumetric 3D Displays.” In J. Chen, W. Cranton, and M. Fihn (eds.), Handbook of Visual Display Technology, 1917–1931. Berlin: Springer.

Holliman, N., N. Dodgson, G. Favalora, and L. Pockett (2011). “Three-Dimensional Displays: A Review and Applications Analysis.” IEEE Transactions on Broadcasting 57(2): 362–371.

Wetzstein, G., D. Lanman, M. Hirsch, W. Heidrich, and R. Raskar (2012). “Compressive Light Field Displays.” IEEE Computer Graphics and Applications 32(5): 6–11.

Comprehensive introductions to 3D spatialized audio displays can be found in the following:

Begault, D. R. (1994). 3D Sound for Virtual Reality and Multimedia. Boston: Academic Press.

Blauert, J. (1997). Spatial Hearing: The Psychoacoustics of Human Sound Localization. Cambridge, MA: MIT Press.

Kapralos, B., M. Jenkin, and E. Milios (2003). “Auditory Perception and Virtual Environments.” Technical Report CS-2003-07, Dept. of Computer Science, York University.

Vorländer, M. and B. Shinn-Cunningham (2015). “Virtual Auditory Displays.” In K. Hale and K. Stanney (eds.), Handbook of Virtual Environments: Design, Implementation, and Applications, 2nd ed., 87–114. Boca Raton, FL: CRC Press.

Xie, B. (2013). Head-Related Transfer Function and Virtual Auditory Display. Plantation, FL: J. Ross Publishing.

Information on 3D spatialized sound rendering can be found in the following:

Manocha, D., P. Calamia, M. Lin, L. Savioja, and N. Tsingos (2009). “Interactive Sound Rendering.” ACM SIGGRAPH 2009 Courses (SIGGRAPH '09). ACM, New York, NY, USA, Article 15, 338 pages.

Vorländer, M. (2007). Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. Berlin: Springer.

Details of haptic display device technology and haptic cues can be found in the following:

Biggs, S. J. and M. A. Srinivasan (2002). “Haptic Interfaces.” In K. Stanney (ed.), Handbook of Virtual Environments, 93–115. Mahwah, NJ: Lawrence Erlbaum Associates.

Burdea, G. (1994). Force and Touch Feedback for Virtual Reality. New York: Wiley Interscience.

Hatzfeld, C. and T. A. Kern (2014). Engineering Haptic Devices: A Beginner’s Guide, 2nd ed. Berlin: Springer-Verlag.

Information on haptic and tactile rendering algorithms can be found in the following:

Basdogan, C. and M. A. Srinivasan (2002). “Haptic Rendering in Virtual Environments.” In K. Stanney (ed.), Handbook of Virtual Environments: Design, Implementation, and Applications, 117–134. Mahwah, NJ: Lawrence Erlbaum Associates.

Lin, M. C. and M. Otaduy (2008). Haptic Rendering: Foundations, Algorithms and Applications. Wellesley, MA: AK Peters, Ltd.

Finally, two interesting papers that exemplify the nuances of empirical studies for display device evaluation are these:

Pausch, R., D. Proffitt, and G. Williams (1997). “Quantifying Immersion in Virtual Reality.” Proceedings of SIGGRAPH ’97, 13–18.

Robertson, G., M. Czerwinski, and M. van Dantzich (1997). “Immersion in Desktop Virtual Reality.” Proceedings of the 1997 ACM Symposium on User Interface Software and Technology (UIST ’97), 11–19.
