© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
J.-M. Chung, Emerging Metaverse XR and Video Multimedia Technologies, https://doi.org/10.1007/978-1-4842-8928-0_2

2. Metaverse XR Components

Jong-Moon Chung
(1) Seoul, Korea (Republic of)
The future market of XR devices is expected to grow significantly as technology evolves. These advancements in XR technology will drive metaverse services in new directions to become much more creative in designs and services. In addition, edge computing and mobile wireless technologies will significantly help the implementation of highly computational XR based metaverse and multimedia services in the future. More details on the current state-of-the-art in XR technology and future evolution directions will be introduced in this chapter as well as in Chapters 3 and 4 (www.coursera.org/learn/ar-technologies-video-streaming). Chapter 2 focuses on the core XR components, which include the following list of technologies:
  • XR, MR, AR, and VR

  • XR System Components and Workflow

  • XR CPU and GPU Engines

  • XR Software Engines and Platforms

  • XR Lab Setup and Haptic Devices

  • 3D Motion Treadmills

XR, MR, AR, and VR

Virtual reality (VR) enables its user to become immersed in an artificial virtual environment, where the user commonly uses an avatar to exist and interact inside the VR space. The user’s view in VR is different from the real environment; therefore, fantasies and illusions are easy to create in the VR virtual environment. This unlimited design freedom is why VR is the most common platform for massively multiplayer online games (MMOGs) and metaverses.

Augmented reality (AR) is a mixture of real-life and virtual elements, where users can obtain useful information about a location or objects and can interact with virtual content in the real world. Commonly, an AR user can distinguish the superimposed virtual objects, augmented sounds, or haptic actions. AR users may be able to turn on or turn off selected AR functions (which may be related to certain objects, sounds, or haptics). In comparison with VR, AR users commonly feel less separated from the real world as they can see their actual environment in the background view. Virtual fantasies and illusions can be created and superimposed on a real-world view using AR.

Figure 2-1 presents the relation of AR, VR, and MR (www.researchgate.net/publication/235093883_Some_Human_Factors_Considerations_for_Designing_Mixed_Reality_Interfaces). MR enables its user to cross over from the VR domain into the augmented realistic AR domain and vice versa. In order to create MR, the AR and VR domains have to be matched so that they become interconnectable, which requires connecting AR with augmented virtuality. AR is based on the real world, with elements that are composed virtually on top of reality. Augmented virtuality is created from the VR virtual environment, which composes elements of reality on top of its virtual components.

A graphic defines how the real and virtual environment is related to the mixed reality that has augmented reality and virtuality.

Figure 2-1

A Definition of AR, VR, and MR (by Paul Milgram)

Figure 2-2 shows the Venn diagram of AR technology, in which AR is the intersection of 3D technology, real-time interaction, and a combination of the real world and VR (www.cs.unc.edu/~azuma/ARpresence.pdf).

A Venn diagram defines AR based on the combination of real and virtual reality, real-time interaction, and 3D.

Figure 2-2

A Definition of AR (by Ronald Azuma)

Currently, most metaverse services are based on VR user environments. This is mainly because many types of video games and MMOGs have been using VR technology to enable player avatars to compete in virtual game environments, which may include multiple domains and universes. MMOGs are Internet connected online video games that enable many players to simultaneously participate, collaborate in teams, and compete in the same game environment.

MMOGs require high-quality networks, game servers, and high-performance player computer terminals, as well as virtually created multiple game domains, dimensions, and/or universes. Thus, such MMOGs can be considered one of the original motivations for creating and using a metaverse.

However, it is predicted that XR will soon be massively used in creating metaverse services. XR technology can enable users to experience both real and virtual environments by integrating mixed reality (MR), AR, and VR with various haptic user interfaces to make the user’s environment more interactive, realistic, broad, diverse, and immersive. New XR application programming interfaces (APIs) can enable various types of human-human, human-machine, and machine-machine interaction.

Future metaverse services will need advanced XR equipment as well as high-performance wired and wireless networks to satisfy the data rate, time delay, and reliability requirements. Advanced metaverse services require high levels of Quality of Service (QoS) and Quality of Experience (QoE) to enable the XR system to provide a satisfying level of immersive and virtual presence.

Based on this focus, new standards on XR equipment and networking services are being prepared by global organizations, like the 3rd Generation Partnership Project (3GPP) and the Institute of Electrical and Electronics Engineers (IEEE). Standards provide a way for new products and services to interoperate and be used safely while satisfying the basic commonly required functionalities.

Just like MMOGs, metaverse systems can be supported by many types of networking platforms and be played using a personal computer (PC), video game console, smartphone, head mounted display (HMD), as well as various Internet of Things (IoT) devices. Therefore, metaverse applications have to be designed considering users in diverse environments using various interactive XR interfaces.

A full XR process is complex and hard to implement because XR requires accurate coordination of multiple cameras and sensors for acoustics, motion, and haptics to fit the user’s senses such that the experience is immersive and comfortable. A full XR system will implement MR, which seamlessly interconnects the real surroundings of the user through AR and accesses virtual domains using VR. Evidently, AR is more difficult to implement than VR, and MR technology is more difficult to implement than AR. Considering that XR is a combination of component technologies that include MR, speech-to-text (STT), voice recognition systems, special sound effects, as well as haptic and 3D motion user interfaces (UIs), XR technology is even more difficult to implement than MR technology. As a result, the evolution toward XR metaverses will take several years and will require significant technological breakthroughs, which is why large research and development investments are needed.

XR System Components and Workflow

In order to express 3D visual effects on an XR system, various visual sensing and expression techniques have to be implemented, which are described in the following. XR systems are at their best when providing an “immersive presence.” The term “immersion” refers to the feeling of being surrounded by an environment, which may be real, all virtual (in VR), or a combination of real and virtual (in AR and MR). When an XR user is immersed in a domain, the user can feel their own physical and spatial existence and can act, react, and interact with surrounding beings, objects, and the environment. This feeling is called “presence,” which has different representations based on being in a real, all virtual (in VR), or a combination of real and virtual (in AR and MR) domain.

In order to explain the technical details, parts of the 3rd Generation Partnership Project (3GPP) standardization document “Technical Specification Group Services and System Aspects: Extended Reality (XR) in 5G” (3GPP TR 26.928 V16.1.0) will be used in the following (https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3534).

Right-Handed Coordinate

In Figure 2-3, the right-handed coordinate system is presented. The right-handed coordinate system is commonly used in 3D and graphics standards, specifications, designs, and systems, which include OpenXR, WebXR, 3GPP, IEEE, etc. Using one’s right hand, by spreading the thumb, index finger, and middle finger out in a perpendicular form (which is uncomfortable), the right-hand thumb can represent the X axis, index finger represents the Y axis, and the middle finger can represent the Z axis, which is why this is called the “right-handed coordinate” or the “Cartesian right-handed coordinate.” A location or position in the right-handed coordinate can be expressed using a point (x, y, z) where the reference center is the origin located at (0, 0, 0) in which x=0, y=0, and z=0. Directional movement in the right-handed coordinate is expressed using a vector which is based on a combination of 3D movement in the positive or negative X, Y, and Z axis directions. In reference to the origin of the right-handed coordinate, “right” is in the +X direction, “up” is in the +Y direction, and “forward” is in the -Z direction for direction changes and motion control. In addition, unless specified, all units are in meters.

A 3D plane of X, Y, and Z axes with an origin at (0, 0, 0) represents the right-handed coordinates, with a photo of a right hand with the index finger, middle finger, and thumb stretched out.

Figure 2-3

Right-Handed Coordinate and My Right Hand
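To make these conventions concrete, the following short Python sketch (illustrative only; the variable names are my own and not part of any specification) defines the right-handed basis vectors, verifies that the X axis crossed with the Y axis yields the Z axis, and builds an example position 2 m to the right, 1.5 m up, and 3 m forward of the origin:

import numpy as np

# Right-handed basis vectors with +X = right, +Y = up, and forward = -Z.
X_AXIS = np.array([1.0, 0.0, 0.0])
Y_AXIS = np.array([0.0, 1.0, 0.0])
Z_AXIS = np.array([0.0, 0.0, 1.0])
FORWARD = -Z_AXIS

# In a right-handed coordinate system, X cross Y must equal Z.
assert np.allclose(np.cross(X_AXIS, Y_AXIS), Z_AXIS)

# Example position: 2 m right, 1.5 m up, and 3 m in front of the origin (units in meters).
position = 2.0 * X_AXIS + 1.5 * Y_AXIS + 3.0 * FORWARD
print(position)  # [ 2.   1.5 -3. ]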

3D Rotational Movement

Rotational movement of an object in a 3D domain is commonly expressed using roll, pitch, and yaw. These movements are commonly used in expressing the motion of a flying object, like an airplane. For example, considering that the nose of an airplane is the front, changes in flight direction roll, pitch, and yaw can be defined as shown in Figure 2-4. Roll expresses rotation about the airplane’s nose-to-tail axis (clockwise or counterclockwise tilting of the wings). Pitch expresses upward or downward (nose-up or nose-down) rotational movement of the airplane. Yaw expresses the right or left (nose-right or nose-left) rotational movement of the airplane.

Positions are represented using (x, y, z) coordinates (in 3D vector form) in reference to the origin (0, 0, 0) of the 3D space. In order to represent the orientation due to spatial rotational movement (i.e., roll, pitch, and yaw), an additional component is added, which makes the coordinates four dimensional (4D), resulting in an (x, y, z, w) vector coordinate called a “quaternion.” Representing orientation directly with roll, pitch, and yaw angles (Euler angles) has known limitations (e.g., gimbal lock problems), which is why the more reliable quaternion coordinates are used in many 3D graphics platforms.

An abstract of an aircraft in an XYZ coordinate frame labeled with the rotational movements of yaw, roll, and pitch.

Figure 2-4

3D Rotational Movement
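As a minimal sketch of how a roll, pitch, and yaw orientation can be converted into an (x, y, z, w) quaternion, the following Python function implements the standard Euler-angle-to-quaternion conversion (shown for illustration only; axis ordering and sign conventions differ between XR platforms):

import math

def euler_to_quaternion(roll, pitch, yaw):
    """Convert roll (about X), pitch (about Y), and yaw (about Z), in radians,
    into an (x, y, z, w) quaternion (ZYX rotation order assumed)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    w = cr * cp * cy + sr * sp * sy
    return (x, y, z, w)

# A pure 90-degree yaw (turning the head to the side) with no roll or pitch.
print(euler_to_quaternion(0.0, 0.0, math.pi / 2))  # approximately (0, 0, 0.707, 0.707)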

Degrees of Freedom (DoF)

DoF represents the number and types of movements an XR user or avatar can conduct in a 3D space. DoF is expressed using multiple independent parameters that change based on the user’s movement and changes in viewport within the 3D AR or VR space. An XR user will take actions and interact with objects and the real/virtual environment, resulting in various movements, gestures, body reactions, expressions, and feelings (i.e., emotional and haptic).

In Figure 2-5, my avatar JM demonstrates the four types of XR based 3D DoF, which include 3DoF, 3DoF+, 6DoF, and constrained 6DoF movement that are described in the following (3GPP TR 26.928 V16.1.0).

3DoF movement commonly occurs when the XR user is sitting in a chair; the user can perform rotational movement around the X, Y, and Z axes (i.e., roll, pitch, and yaw) but no translational movement.

Outline pictures of a human wearing a head mounted headset with control devices in both hands describe 3DoF, 3DoF+, 6DoF, and constrained 6DoF.

Figure 2-5

Avatar JM Demonstrating the Difference of XR based 3D DoF Types

3DoF+ movement enhances 3DoF by allowing additional translational movement along the X, Y, and Z axes for a limited range, which results from the XR HMD user’s head movement while the user is sitting in a chair.

6DoF movement enhances 3DoF by allowing full translational movements along the X, Y, and Z axes to include movement upward and downward (e.g., elevating and heaving), left and right (e.g., strafing and swaying), and forward and backward (e.g., walking and surging). 6DoF movement commonly occurs when the XR user is capable of walking, running, flying, or swimming through the 3D XR environment.

Constrained 6DoF movement is 6DoF but with 3D constraints added to movements along the X, Y, and Z axes and may include roll, pitch, yaw based rotational movement constraints. Constrained 6DoF movement is very common in metaverses and MMOGs because avatar movement is restricted to the walls, buildings, environment, and various obstacles.

Constrained 6DoF has also been called “Room Scale VR,” which defines the room or area where the XR user can work, play, or explore, mimicking the user’s real-life motion in a limited virtual space.

3D environments with multi-rooms, spaces with uneven floor levels, and very large open areas are categorized as 6DoF or “unconstrained 6DoF,” because these are beyond the scope of constrained 6DoF.

These DoF types are used to program user or avatar movement and to describe the XR device’s tracking sensor capabilities.
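As a minimal sketch of how constrained 6DoF can be enforced in software (an illustrative approach with a hypothetical room size, not a specific engine API), the following Python function clamps a requested avatar position to the boundaries of the allowed play area:

def clamp_to_room(position, room_min=(-2.0, 0.0, -2.0), room_max=(2.0, 2.5, 2.0)):
    """Constrain an (x, y, z) position (in meters) to a hypothetical room volume,
    emulating constrained 6DoF (Room Scale VR) movement limits."""
    return tuple(max(lo, min(hi, p)) for p, lo, hi in zip(position, room_min, room_max))

# A request to walk 5 m forward (-Z) is limited at the 2 m room boundary.
print(clamp_to_room((0.5, 1.7, -5.0)))  # (0.5, 1.7, -2.0)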

XR View, XR Viewport, and XR Pose

An XR view is a single 3D view of a scene or part of a scene. So, the XR view will change as the XR device’s view direction changes, which will result from the user’s head or eye movement. For 3D representation, the XR view displayed for each eye of the user may be different based on stereoscopic or monoscopic 3D displaying (3GPP TR 26.928 V16.1.0).

3D image rendering of the XR view needs to be accurately aligned with the user’s viewing properties, which include the field-of-view (FoV), eye offset, etc. to create effects like occlusion and parallax. When a XR user moves in the 3D domain, the user’s point of view will change, and objects and the environment will look different. In 3D visual domains, “occlusion” occurs when an object blocks the view of another object. In a 3D visual domain, the XR user will estimate the distance and size of objects when objects move relative to each other. This is based on “parallax” which is a phenomenon of relative movement of objects when the XR user’s point of view changes. These aspects are critical in providing 3D immersive presence through the XR HMD system. In the 3D reference space, the position and orientation of the XR user’s view defines the “view offset” amount, which will increase as the movement is larger and faster.

As presented in Figure 2-6, the XR Viewport (or viewport) is a flat rectangular 2D surface region (perpendicular to the viewing direction) of the XR view in which the target objects in the 3D space are displayed on the 2D surface of the XR device. Figure 2-6 is an example of a Microsoft HoloLens 2 user’s inside view (viewport diagram). The XR Viewport is characterized by its rectangular dimensions in terms of width and height. The view frustum is the region of 3D space that is displayed on the XR HMD’s 2D screen; it starts at the near plane and extends to the far plane, which are both perpendicular to the viewing direction of the XR view, and it defines the FoV of the XR HMD. The XR Viewport and the view frustum near plane are the same in many cases. In 6DoF cases with extensive environments (i.e., unconstrained 6DoF), the view frustum far plane may be set infinitely far away.

The XR Pose of the XR HMD user is called the “XR Viewer Pose.” For proper XR Viewer Pose representation, the HMD has to accurately track the user’s view and XR scene based on the XR Reference Space. A sequence of XR views can be used to form a views array, which can be used to describe a XR scene based on multiple viewpoints. A views array is constructed based on a XR Viewer Pose query to characterize the XR Reference Space.

A graphic defines how the XR tracker in a constrained 6DoF XR reference space views the five objects in XR space through the frustum.

Figure 2-6

XR Viewport, Field-of-View (FoV), and View Frustum. Only Object 1 and Object 4 Are in the View Frustum of the XR Device (Microsoft HoloLens 2)
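To make the FoV and view frustum concrete, the following Python sketch (a simplified symmetric-frustum test with hypothetical parameter values; real engines work with projection matrices and clip space) checks whether a point given in the viewer’s right-handed view space, with the viewer looking down -Z, lies between the near and far planes and within the FoV:

import math

def in_view_frustum(point, h_fov_deg=100.0, v_fov_deg=100.0, near=0.1, far=100.0):
    """Return True if a view-space point (x, y, z) lies inside a symmetric
    view frustum. The viewer looks down -Z, so the visible depth is d = -z."""
    x, y, z = point
    d = -z  # distance in front of the viewer
    if d < near or d > far:
        return False
    half_w = d * math.tan(math.radians(h_fov_deg) / 2)  # half frustum width at depth d
    half_h = d * math.tan(math.radians(v_fov_deg) / 2)  # half frustum height at depth d
    return abs(x) <= half_w and abs(y) <= half_h

print(in_view_frustum((0.5, 0.2, -2.0)))  # True: in front of the viewer and within the FoV
print(in_view_frustum((0.5, 0.2, 2.0)))   # False: behind the viewer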

XR 3D Tracking Techniques

XR applications require fast and accurate 3D spatial tracking, which is enabled by the following techniques (3GPP TR 26.928 V16.1.0). The XR Viewer Pose is generated from continuous tracking of the user’s movement in the XR Reference Space. The XR Viewer Pose is determined by the viewer’s position and orientation. Very accurate positional tracking is needed for the XR application to work properly. 3D positional tracking methods include outside-in tracking, inside-out tracking, world tracking, and simultaneous localization and mapping (SLAM). These technologies are possible using the XR feature extraction and tracking technologies described in Chapter 3 and the artificial intelligence (AI) based deep learning technologies explained in Chapter 4.

Outside-in tracking is conducted by (multiple coordinated) tracking sensors (e.g., cameras) located at fixed locations in the XR facility (e.g., having motion detection sensors installed at all corners of the XR simulation room). As shown later in Figure 2-8, a Hikvision DS-2TD4166T-25 electro-optic (EO) and infrared (IR) dual camera is used to assist outside-in tracking as well as AR and VR automatic switching for continuous MR applications.

Outside-in tracking is possible only for a region that has preinstalled tracking sensors supporting it. Outside the preset tracking region, outside-in tracking cannot provide any information, and therefore, one (or multiple) of the following tracking methods needs to be used.

Inside-out tracking is conducted by multiple coordinated tracking sensors located on the XR HMD device. These sensors track the surroundings and identify the HMD’s location and position based on internal processing of the sensed information. This is the most common tracking method applied on almost all XR HMDs, glasses, and headsets.

World tracking is conducted by one (or multiple coordinated) camera(s). The camera(s) determines the XR device’s location and position as well as surrounding objects and the environment. World tracking is commonly used in AR devices and can be used in VR devices.

SLAM is used to create a map (or a territory picture) by combining multiple world tracking images from a XR HMD. SLAM connects multiple images and sensor data to create a map (or a territory picture) of a much wider region. SLAM could construct a more accurate and wider map (or territory picture) if images from multiple HMDs (and/or stationary cameras and sensors) can be used.

Localization Techniques

Localization is a technique to determine the location and position of users and objects in a 3D domain. Localization is used to create spatial maps for XR applications. Localization can be conducted using multiple sensors, cameras (e.g., monocular, stereo, and depth cameras), radio beacon signals, global positioning system (GPS), assisted-GPS (A-GPS), ultra-wideband (UWB), ultrasonic sensors, radio detection and ranging (RADAR), light detection and ranging (LIDAR), millimeter waves (mmWaves), gyroscope sensors, inertial sensors, etc. Signal analysis will be used on reference signal received power (RSRP), received signal strength indicator (RSSI), time of arrival (ToA), estimated ToA (ETA), variations in wavelength or phase, distortion of spectrum, etc.

More information used in the localization computations can help to increase the accuracy but will require more processing and memory. Some of the localization techniques are described in the following, which include spatial anchors and visual localization techniques (3GPP TR 26.928 V16.1.0).

Spatial anchors are reference points in the 3D domain that have known locations and can be used as references in calculating the position of other objects and users. Spatial anchors are known to provide accurate localization within a limited range. For example, in the Microsoft Mixed Reality Toolkit, the spatial anchoring has a 3 m radius. To increase the accuracy and range, coordination of multiple spatial anchors can be used. SLAM can also be used in the localization and tracking process. Visual localization uses visual SLAM (vSLAM), visual positioning system (VPS), and other visual imaging processing techniques as well as sensor data to conduct localization. More technical details are described in Chapter 3 in XR based detection technologies and Chapter 4 in deep learning technologies.
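As a highly simplified sketch of how spatial anchors with known positions can be used for localization (a textbook 2D trilateration obtained by linearizing the range equations; the anchor positions are hypothetical, and real systems fuse far more sensors and measurements), consider the following Python example:

import numpy as np

def trilaterate_2d(anchors, distances):
    """Estimate a 2D position from three anchors with known positions and
    measured distances by linearizing the range (circle) equations."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    A = np.array([[2 * (x2 - x1), 2 * (y2 - y1)],
                  [2 * (x3 - x1), 2 * (y3 - y1)]])
    b = np.array([r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2,
                  r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2])
    return np.linalg.solve(A, b)

# Hypothetical anchors at three room corners; the true user position is (1.0, 2.0).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
true_position = np.array([1.0, 2.0])
distances = [np.linalg.norm(true_position - np.array(a)) for a in anchors]
print(trilaterate_2d(anchors, distances))  # approximately [1. 2.]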

XR Experience

Immersive presence is a combination of two feelings: The feeling that you are surrounded by the metaverse/game environment (immersion) and the feeling that you are physically and spatially located in that environment (presence). There are two types of presence that the XR system needs to provide, which are “cognitive presence” and “perceptive presence.”

Cognitive presence is the presence that is felt in the user’s mind and is more content based (e.g., a good storyline) as it is triggered by imagination. Even without any XR device, cognitive presence can be felt by reading a good book or watching a good movie.

Perceptive presence is the presence that is felt through the user’s five senses (i.e., eyesight, hearing, touch, taste, and smell) or a few of them. At the current technical level, perceptive presence based on eyesight and hearing is common in AR, VR, and MR devices. XR systems that can provide a perceptive presence based on three senses (i.e., eyesight, hearing, and touch) are possible using haptic devices (more details are in Chapter 3). However, the level of XR performance needs much more improvement, which hopefully this book will help accomplish in the near future. Perceptive presence is a highly technical feature of XR systems as all of the introduced XR technologies (i.e., 3D coordinates, rotational movement, DoF, FoV, XR view, XR Viewport, XR Pose, tracking, localization, etc.) need to be used to create a high-quality XR perceptive presence (3GPP TR 26.928 V16.1.0).

XR Device and Network Performance Requirements

To create a high-level XR based immersive presence, a very low time delay and a very high-quality image and display performance are required. The time delay performance bound can be derived from the delay tolerance level, which is influenced by the round trip interaction delay. The image and display performance factors include the required image resolution, frame rate, and FoV. More details are described in the following (3GPP TR 26.928 V16.1.0).

Delay Tolerance

The round trip interaction delay is obtained by adding the User Interaction Delay to the Age of Content, which is described in the following.

User interaction delay is the length of time from the moment the user initiates an action until the time the content creation engine of the XR system has taken the action into account. User interaction delay is influenced by the sensor delay/cycle time, data processing time, communication/networking time, and queueing (memory) delay.

Age of content is the length of time from the moment the content is created until the time the content is presented to the XR user. Age of content is influenced by the content processing time, communication/networking time, queueing (memory) delay, and display time.

For XR systems, a total latency of less than 20 ms is recommended. The total latency covers the time from the moment the XR user’s head moves to the time that the corresponding changes on the XR device’s display are complete. This is a very short time delay compared to MMOGs, which are also very high-speed systems.

Among the various MMOGs, the first-person shooter (FPS), role playing game (RPG), and real-time strategy (RTS) types are most relevant to XR systems and metaverses. In terms of naming, an FPS-type MMOG is commonly called a “massively multiplayer online first-person shooter” (MMOFPS) game. Likewise, an RPG based MMOG is commonly called an MMORPG game, and an RTS based MMOG is commonly called an MMORTS game. MMOFPS games have a delay tolerance of 100 ms (i.e., 0.1 seconds), MMORPG games have a delay tolerance of 500 ms (i.e., 0.5 seconds), and MMORTS games have a delay tolerance of 1,000 ms (i.e., 1 second). However, to achieve a higher MMOG performance, a delay time much shorter than the delay tolerance level is recommended (3GPP TR 26.928 V16.1.0).
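The following Python sketch (with hypothetical delay values; only the formula and the delay tolerances come from the text above) adds the user interaction delay and the age of content to obtain the round trip interaction delay and compares it against the MMOG delay tolerances:

def round_trip_interaction_delay(user_interaction_delay_ms, age_of_content_ms):
    # Round trip interaction delay = user interaction delay + age of content.
    return user_interaction_delay_ms + age_of_content_ms

# Hypothetical component values in milliseconds (for illustration only).
delay_ms = round_trip_interaction_delay(user_interaction_delay_ms=45.0, age_of_content_ms=40.0)

# Delay tolerances from the text: MMOFPS 100 ms, MMORPG 500 ms, MMORTS 1,000 ms.
for game_type, tolerance_ms in [("MMOFPS", 100.0), ("MMORPG", 500.0), ("MMORTS", 1000.0)]:
    verdict = "within tolerance" if delay_ms <= tolerance_ms else "too slow"
    print(f"{game_type}: {delay_ms:.0f} ms vs {tolerance_ms:.0f} ms -> {verdict}")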

Image and Display Requirements

Image resolution is an important factor in deciding the quality of the immersive presence.

The FoV, frame rate, and image resolution are all important factors. Considering that the Meta Quest 2 is the leading XR HMD by a wide margin, the Oculus specifications will be used as a guideline.

Palmer Luckey, the founder of Oculus, mentioned that at least an 8K resolution (i.e., 8192 × 4096 pixel image frame) per eye is needed to eliminate the pixelation effect. Pixelation occurs when the resolution is so low that the pixels forming the image are noticeable by the viewer. When pixelation occurs, it is hard for the XR user to feel an immersive presence, and the aspects of the animation and augmentation are noticeable. An image frame rate higher than 90 Hz (i.e., 90 frames/s) is recommended to create a comfortable and compelling virtual presence feel for the user (https://developer.oculus.com/blog/asynchronous-timewarp-examined/). Considering that most of the TV programs we watch have a frame rate in the range of 24 to 60 frames/s, the recommended 90 frames/s is a very high frame rate. In addition, a wide FoV in the range of (at least) 100 to 110 degrees is recommended.
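As a rough back-of-the-envelope calculation (assuming 8-bit RGB, i.e., 24 bits per pixel, two eyes, and no compression; these assumptions are mine and not taken from the Oculus material), the following Python snippet estimates the raw pixel data rate implied by an 8K-per-eye frame at 90 frames/s, which shows why compression and rendering optimizations such as foveated rendering are unavoidable:

width, height = 8192, 4096   # 8K-per-eye frame, as recommended above
bits_per_pixel = 24          # assumption: 8-bit RGB, uncompressed
eyes = 2
frame_rate = 90              # frames per second

bits_per_second = width * height * bits_per_pixel * eyes * frame_rate
print(f"Raw (uncompressed) video rate: {bits_per_second / 1e9:.1f} Gbit/s")  # roughly 145 Gbit/s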

XR CPU and GPU Engines

The XR device’s engine and software platform, and the operational relationship between the central processing unit (CPU) and graphics processing unit (GPU), are illustrated in Figure 2-7 (3GPP TR 26.928 V16.1.0). The XR engine is in charge of the image rendering engine, audio engine, physics engine, and the artificial intelligence (AI) engine (www.nrexplained.com/tr/26928/xreng).

The rendering engine is in charge of all image and video rendering processes. Rendering is a computer process of creating an image from 2D or 3D models. The final image is called the “render,” which may be photorealistic or non-photorealistic. Rendering requires computer graphic editing of the scene, object description, geometry, viewpoint, texture, lighting, transparency, reflection, shading, information description, as well as augmented and virtual objects and domains. Rendering is the major final step in 2D and 3D computer graphics generation and is used in every image creation operation including XR, MR, AR, VR, video games, simulators, movies, TV, visual effects, animation, architecture, and design visualization (https://en.wikipedia.org/wiki/Rendering_(computer_graphics)).

The audio engine oversees the microphones and speakers and needs to assist the AI engine for voice control and speech-to-text (STT) processing. The audio engine processes the loading, modification, and balancing of speaker sound effects. Special acoustic effects using echoes, pitch and amplitude balancing, oscillation, and Doppler effects are essential in creating an immersive presence experience.
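Since Doppler effects were mentioned, a short worked example may help. For a stationary listener and a source moving toward the listener, the perceived frequency is f' = f · c / (c - v_s), where c is the speed of sound and v_s is the source speed. The following Python snippet (idealized conditions; not any specific audio engine’s implementation) illustrates the shift:

def doppler_shift(freq_hz, source_speed_mps, speed_of_sound_mps=343.0):
    # Perceived frequency for a stationary listener with the source approaching.
    return freq_hz * speed_of_sound_mps / (speed_of_sound_mps - source_speed_mps)

# A 440 Hz tone from a virtual object approaching at 20 m/s sounds noticeably higher.
print(f"{doppler_shift(440.0, 20.0):.1f} Hz")  # approximately 467.2 Hz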

The physics engine reads and processes all sensor information and emulates realistic physical law based immersive presence effects. Accurate simulation of physical effects like collision, pressure, vibration, texture, heat, and chill is needed to enable the XR user to feel the air, wind, water, fluids, as well as solid objects. In the future, smell and taste physical effects could be added. By then, the five basic human senses can all be influenced by the XR system, creating a full dimensional immersive presence experience.

The AI engine is used in everything and everywhere, including all XR devices, metaverses, MMOGs, and multimedia services. This is why Chapter 4 is separately dedicated to deep learning-based systems and their relations to processing audio, video, and sensor data as well as making optimized estimations and accurate predictions. So please refer to Chapter 4 for more details on the AI engine for XR devices, metaverses, MMOGs, and multimedia services.

A block diagram defines how the GPU commands and data streams flow through the system memory of the CPU and the video memory of the GPU.

Figure 2-7

XR Engine and Software Platform for CPU and GPU Operations

The CPU is in charge of executing the XR engine, which includes accessing the system memory as well as controlling the application and management, scene graph, AI, audio, streaming, physics, and localization. The abstraction APIs like Vulkan, OpenGL, etc. are also processed by the CPU. XR devices use multicore CPUs, which the operating system (OS) and kernel use to form directed acyclic graph (DAG) based processing sequences over which multiple processing threads are executed. The AI system of the CPU will process the deep learning algorithms that support the sensors, sound, and images of the XR system. The deep learning recurrent neural network (RNN) used for the sensors, physics, audio, and localization processing is an infinite impulse response process whose directed cyclic graph cannot be unrolled. The deep learning convolutional neural network (CNN) used for video, image, scene graph, localization, feature detection, and feature tracking is a finite impulse response process whose directed acyclic graph can be unrolled (https://en.wikipedia.org/wiki/Recurrent_neural_network).
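As a minimal sketch of DAG-based scheduling (the task names here are hypothetical; real XR runtimes are far more elaborate), the following Python snippet uses the standard library to topologically sort a small per-frame engine task graph so that each stage runs only after the stages it depends on:

from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical per-frame XR engine tasks mapped to the tasks they depend on.
task_graph = {
    "pose_tracking": set(),
    "physics": {"pose_tracking"},
    "audio": {"pose_tracking"},
    "scene_graph": {"physics"},
    "render_submit": {"scene_graph", "audio"},
}

# One valid execution order that respects the DAG dependencies.
print(list(TopologicalSorter(task_graph).static_order()))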

The GPU will receive commands and data streams from the CPU. The GPU is in charge of video memory access as well as the image rendering pipeline process. The vertex shader is a programmable shader that is in charge of processing the vertices. Shaders are user-defined programs that are executed on selected stages of the GPU process. The primitive assembly function will convert a vertex stream into a sequence of base primitives, where a base primitive is the processed result of the vertex stream’s interpretation. The rasterization process converts the assembled primitives into fragments (pixel candidates), which are then shaded to form the projected images for video and graphics. Shader functions include fragment processing, vertex processing, geometry computations, and tessellation. The tessellation process assists arrangements of polygon shapes considering gapless close fittings and nonoverlapping repeated patterns. The shaders also consider the texture information of the objects and environment (www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview).
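The following Python sketch mimics, in a highly simplified form, what the vertex-processing and rasterization setup stages accomplish: a view-space vertex is projected with a pinhole perspective model and then mapped to viewport pixel coordinates (real GPUs do this with 4×4 matrices in homogeneous clip space; this is only a conceptual illustration with assumed parameter values):

import math

def project_to_viewport(vertex, viewport_w=1920, viewport_h=1080, v_fov_deg=90.0):
    """Project a view-space vertex (viewer looking down -Z) onto 2D viewport
    pixel coordinates using a simple pinhole perspective model."""
    x, y, z = vertex
    aspect = viewport_w / viewport_h
    f = 1.0 / math.tan(math.radians(v_fov_deg) / 2)  # focal scale from the vertical FoV
    # Perspective divide: vertices farther away (larger -z) land closer to the center.
    x_ndc = (f / aspect) * x / -z
    y_ndc = f * y / -z
    # Viewport transform: normalized device coordinates [-1, 1] mapped to pixels.
    px = (x_ndc + 1.0) * 0.5 * viewport_w
    py = (1.0 - y_ndc) * 0.5 * viewport_h
    return px, py

print(project_to_viewport((0.5, 0.25, -2.0)))  # roughly (1095.0, 472.5)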

XR Software Engines and Platforms

The major XR software engines and platforms are introduced in chronological order of initial release, which include Lua (1993), Unreal (1998), Unity (2005), WebGL (2011), SteamVR (2014), WebVR (2016)/WebXR (2018), and OpenXR (2017).

Lua

Lua is a popular cross-platform scripting language implemented in ANSI C that was first released in 1993. Lua uses the filename extension “.lua” (www.lua.org/). Up to version 5.0, Lua was released under a license similar to the BSD license, but from version 5.0 it has been released under the MIT License. Lua means “moon” in Portuguese, a name chosen in reference to the programming platform SOL, which means “sun” in Portuguese. Lua uses parts of the SOL data-description syntax, parts of the control structures of Modula, parts of the multiple assignments and returns structure of CLU, as well as parts of C++, SNOBOL, and AWK. Lua was originally designed to be a general embeddable extension language that is easy to use and could enable software applications to be customized quicker by enhancing the speed, portability, and extensibility. Lua is a popular programming language for video games and applications because it is easy to learn and embed and enables fast execution. Representative games developed using Lua include Roblox, Garry’s Mod, Payday 2, Phantasy Star Online 2, Dota 2, Angry Birds Space, Crysis, etc. In addition, Lua has been used by many applications such as Adobe Lightroom, Moho, iClone, Aerospike, etc. A poll conducted by GameDev.net revealed that Lua was the most popular scripting language for game programming in 2003 (https://en.wikipedia.org/wiki/Lua_(programming_language)).

Unreal

The Unreal Engine (UE) was developed by Epic Games and was first released in the first-person shooter (FPS) game Unreal in 1998 (www.unrealengine.com). UE is a 3D computer graphics game engine programmed in C++, which gives it superior portability as it can be used on various computer OSs, smart device OSs, game consoles, and XR platforms. This is why UE has been used in the development of films, television programs, and metaverses, in addition to video games. Unreal was also listed in Guinness World Records as the “most successful video game engine” in 2014. A list of games made with UE can be found on Wikipedia (https://en.wikipedia.org/wiki/List_of_Unreal_Engine_games). The source code of Unreal Engine 5 has been available on GitHub since April of 2022 (https://github.com/EpicGames) and can be used through a registered account. Commercial product use of UE with a royalty is possible. Epic has waived the company’s royalty margin until the game developer has accumulated a revenue of $1 million, and if the game is published on the Epic Games Store, the developer’s fee is waived (but policies may change).

Unity

Unity was developed by Unity Technologies as a cross-platform game engine (https://unity.com). Unity was released in June of 2005 as a macOS X game engine at the Apple Worldwide Developers Conference. Although initially Unity was released for macOS X, its goal was set to democratize game development by providing an easy to learn and use game engine that works on many platforms (https://en.wikipedia.org/wiki/Unity_(game_engine)). Unity has been used in various computer and smartphone games and applications as it enables game engine development on over 19 platforms, where a list is provided in the following.
  • Mobile platforms: iOS, Android (Android TV), tvOS

  • Desktop platforms: Windows (Universal Windows Platform), macOS, Linux

  • Web platforms: WebGL

  • Console platforms: PlayStation (PS4, PS5), Xbox (Xbox One, Xbox Series X/S), Nintendo Switch, Stadia

  • XR platforms: Oculus, PlayStation VR, Google’s ARCore, Apple’s ARKit, Windows Mixed Reality (HoloLens), Magic Leap, Unity XR SDK SteamVR, Google Cardboard

  • Other platforms: Film, automotive, architecture, engineering, construction, and the US Armed Forces

Unity is written in C++ (runtime) and C# (Unity Scripting API) based on a proprietary license.

The Unity game engine can be used to develop 2D and 3D games as well as mobile apps, metaverses, interactive simulations, and XR applications. The Unity game engine is easy to learn and use because it provides a primary scripting API in C# using Mono that supports the Unity editor, which provides various plug-in options as well as drag and drop functionality. Unity provides the High-Definition Render Pipeline (HDRP), Universal Render Pipeline (URP), and the legacy built-in pipeline, which are incompatible with each other. Because Unity is easy to learn and use, many indie game developers have used it.

Indie games are video games made by an individual programmer or a team of programmers that do not belong to a game publisher. Indie games also include games in which the programmer/designer has creative freedom in the game’s development even though a game publisher is funding and distributing the game. Games developed and released by major game publishers are typically called triple A (i.e., AAA) games, which are different from indie games. The name “indie game” is an abbreviated term for “independent video game” (https://en.wikipedia.org/wiki/Indie_game).

WebGL

WebGL was initially released in March of 2011 by the Mozilla Foundation and was developed by the Khronos WebGL Working Group (https://get.webgl.org). WebGL supports highly accelerated GPU processing by supporting physics and image processing. The Khronos Group announced in February of 2022 that WebGL 2.0 is supported for all major browsers. WebGL is short for “Web Graphics Library” which is a library of JavaScript APIs for web browsers that can be used to render interactive 2D and 3D graphics. WebGL enables accelerated GPU processing of web browsers when it is fully integrated with other web applications (www.khronos.org/webgl/wiki/Main_Page).

SteamVR

SteamVR is a hardware and software platform for VR devices that was developed by the Valve Corporation (www.steamvr.com/en/). SteamVR was first applied on the Oculus Rift HMD in 2014 and since has been widely used to support many HMDs, which include the HTC Vive and Valve Index. SteamVR technology supports accurate positional tracking to enhance room-scale immersive presence experiences, which has significantly contributed to (constrained) 6DoF application development. SteamVR originates from the popular Steam video game technologies, digital distribution services, and storefront developed by the Valve Corporation which launched in September of 2003. Originally, SteamVR provided support for Windows, macOS, and Linux platforms, but in May of 2020, the support for macOS ended (https://en.wikipedia.org/wiki/Steam_(service)#SteamVR).

WebXR

The WebVR API was initiated in 2014, and in March of 2016 the WebVR API proposal version 1.0 was released by the Mozilla VR team and the Google Chrome team. In 2018, the WebXR Device API was released to supersede WebVR by extending the API to support XR applications, which include MR, AR, and VR. WebXR enables website access to the immersive content provided on XR devices (including MR, AR, and VR devices), which includes HMDs, headsets, sensor information, tracking, and sound effects (https://immersiveweb.dev/). WebXR was designed to be easily integrable with various web browser tools and is commonly used with WebGL. WebXR provides a very versatile interface, which enables various XR devices to access WebXR content with very few modifications. Web browser access to XR applications significantly reduces development time and performance testing. However, there are many web browser types, which can create interoperability issues, and web browsers have limitations in immersive expressions and user interaction compared to HMDs, so not every XR aspect can be developed in WebXR (https://en.wikipedia.org/wiki/WebXR). The technical details of WebXR are defined by the Immersive Web Group of the World Wide Web Consortium (W3C), where the specifications are made public on GitHub (https://github.com/immersive-web/webxr).

OpenXR

OpenXR is an open standard that supports XR platform and device development, which was developed and released by the Khronos Group in February of 2017 (www.khronos.org/openxr/). It is a cross-platform, royalty-free standard whose software development kit is accessible on GitHub (https://github.com/KhronosGroup/OpenXR-SDK-Source). OpenXR’s API has the following fundamental elements. “XrSpace” represents a 3D space, “XrInstance” represents the application’s connection to the OpenXR runtime, “XrSystemId” identifies the XR system (the XR device and its controllers), “XrActions” handle user inputs, and “XrSession” assists the interaction session between the application and the user/runtime. OpenXR has been used for game and rendering engine support for the Unreal Engine (since September of 2019), Blender (since June of 2020), Unity (since December of 2020), Godot (since July of 2021), etc. OpenXR has been used to enable WebXR browser support on Google Chrome and Microsoft Edge. Some of the representative OpenXR conformant platforms include the Microsoft HoloLens 2, Windows Mixed Reality headsets, Oculus PC platform, Facebook Quest, Meta Quest 2, Valve SteamVR, VIVE Cosmos, and VIVE Focus 3 (https://en.wikipedia.org/wiki/OpenXR).

XR Lab Setup and Haptic Devices

A photograph of a metaverse XR (extended reality) system laboratory setup. It has a computer, an XR system setup, and posters of the device.

Figure 2-8

A Part of My Metaverse XR System Research and Development Lab Which Includes a Virtuix Omni VR 3D Treadmill with a Meta Quest 2 HMD and a Hikvision DS-2TD4166T-25 Camera (in the Far Back on the Right Side of the Whiteboard)

Figure 2-8 shows a part of my Metaverse XR system research and development lab (at Yonsei University, Seoul, South Korea) which shows a Virtuix Omni VR 3D treadmill with a Meta Quest 2 HMD and a Hikvision DS-2TD4166T-25 camera.

The Hikvision DS-2TD4166T-25 electro-optic (EO) and infrared (IR) dual camera is shown in the far back of the room. This camera can be used to control the AR and VR domain switching for MR applications, as the HoloLens 2 AR device can provide a constrained 6DoF (limited to the room size) where VR is needed to extend the limited room with doors that open up to different VR multiverses and metaverses. The IR camera helps the automated precision control especially when it is dark indoors or for outdoor XR experiments at night.

The Hikvision DS-2TD4166T-25 EO camera resolution is 1920×1080 (i.e., 2 MP level), and the IR camera resolution is 640 × 512 in support of a temperature range of -20 °C to 550 °C with a ±0.5°C error margin and a distance range of 200 m.

Figure 2-9 shows the Microsoft HoloLens 2, Manus Prime X Haptic VR gloves, and Meta Quest 2 in my Metaverse XR system research and development lab. More details on these systems are presented in the following.

A photograph of the Microsoft HoloLens 2, robotic gloves, and Meta Quest 2. The devices resemble head-mounted devices with arms.

Figure 2-9

Microsoft HoloLens 2 (left), Manus Prime X Haptic VR Gloves (middle), and Meta Quest 2 (right) Systems in My Metaverse XR System Research and Development Lab

The five basic human senses are sight, smell, touch, taste, and hearing, of which current MR, AR, and VR systems can commonly express sight and hearing effects, and a few XR systems can provide vibration-based touch effects in addition to the sight and hearing effects. Therefore, research and development of new XR technologies to express smell, touch, and taste effects is needed. Some example haptic systems are described in the following.

Manus Prime X Haptic VR

Manus Prime X Haptic VR gloves are made of polyester (77%) and spandex (23%); each glove has five flexible sensors and six inertial measurement units (IMUs) for movement sensing, includes programmable vibration motors inside the casings for haptic feedback, and weighs 60 g. Each IMU uses three types of sensors: an accelerometer, a gyroscope, and a magnetometer (www.manus-meta.com/products/prime-x). The gloves’ accelerometers measure linear motion (i.e., X, Y, Z direction) acceleration. The gyroscopes detect roll, pitch, and yaw based rotational motion with a ±2.5 degree orientation accuracy. The magnetometer finds the direction of the strongest magnetic force. The signal latency is less than 5 ms with a maximal range of 30 m. Prime X Haptic VR can be used for application development with Unity, SteamVR, Oculus, Vive, and other XR software platforms. Figure 2-10 shows the Manus Prime X Haptic VR gloves. The detection management user interface (with enhancements by Wonsik Yang) that includes an avatar robotic hand with sensor and IMU data displays is presented in Figure 2-11.

A photograph of Manus Prime X Haptic VR gloves placed on a stand. It has several sensors on the fingers connected by wires to the controllers.

Figure 2-10

Manus Prime X Haptic VR Gloves

Three simulated responses were recorded by the robotic Manus Prime X Haptic VR gloves along with the action performed by the hands wearing the gloves.

Figure 2-11

Manus Prime X Haptic VR Gloves Detection Management Interface with Avatar Robotic Hand and Sensor and IMU Data Display
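As a sketch of how accelerometer and gyroscope readings like those produced by the gloves’ IMUs can be fused into an orientation estimate, the following Python snippet implements a textbook complementary filter with hypothetical sample values (this is a generic illustration, not the Manus tracking algorithm):

import math

def complementary_filter(prev_angle_deg, gyro_rate_dps, accel_x, accel_z, dt, alpha=0.98):
    """Fuse a gyroscope rate (deg/s) with an accelerometer tilt estimate for one
    tilt angle; alpha weights the smooth but drifting gyroscope path."""
    gyro_angle = prev_angle_deg + gyro_rate_dps * dt          # integrate the gyroscope
    accel_angle = math.degrees(math.atan2(accel_x, accel_z))  # tilt estimated from gravity
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle

# Hypothetical 100 Hz samples while the hand slowly tilts forward.
angle = 0.0
for gyro_rate_dps, ax, az in [(10.0, 0.02, 0.99), (12.0, 0.05, 0.98), (11.0, 0.08, 0.97)]:
    angle = complementary_filter(angle, gyro_rate_dps, ax, az, dt=0.01)
    print(f"Estimated tilt: {angle:.2f} degrees")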

Ultraleap Mid-Air Haptics

The STRATOS Inspire ultrasound speaker and STRATOS Explore from Ultraleap use programmable 3D ultrasonic transducer arrays, which generate ultrasound waves that coincide to create mid-air haptic effects. These systems can generate a virtual touch feeling as well as a slight vibration feeling on the user’s hand (www.ultraleap.com/haptics/).

HaptX Gloves DK2

HaptX DK2 are haptic gloves from HaptX that can provide force feedback to the user’s hands. The feedback can be up to 40 pounds in the form of resistive force, so the user can feel and work with heavy objects in XR applications. The magnetic motion capture sensors conduct very accurate motion detection and enable the HaptX DK2 haptic gloves to have submillimeter precision sensing capability (https://haptx.com/).

Teslasuit

The haptic gloves and full body haptic suit from Teslasuit have some unique features that can enhance XR user experiences (https://teslasuit.io/). Teslasuit’s haptic glove and suit can be used to monitor the heart rate, oxygen saturation, or other biometric information of the user, which can be used to analyze the mental state and stress level of the user during training or gaming. The haptic gloves can generate a feel of solid object texture using electric sensors with 45 channels and can generate a feel of virtual object resistance force. Using these gloves, the user can feel up to 9 N of feedback resistance force. For reference, the “N” stands for “Newton,” which is the unit of force. To provide a feel for this amount of force, 1 N of force accelerates an object with a mass of 1 kg at 1 m/s² (www.britannica.com/science/newton-unit-of-measurement). The haptic suit covers the user’s full body, so full body sensing and feedback is possible. The haptic suit can provide near real-life sensations and feelings through electro muscle stimulation and transcutaneous electrical nerve stimulation.

3D Motion Treadmills

When using a XR HMD, glasses, or headset, the user is focused on the environment, objects, and events happening in the XR domain. Therefore, the user can easily lose balance, fall down, fall off, or bump into other people or various objects including walls, fences, doors, etc. All of these accidents can result in a physical injury (and in some cases death) as well as property damage.

In the preceding DoF description, 3DoF and 3DoF+ are based on the user sitting in a chair while using the XR device. Even in these cases, many accidents have happened to XR users, but these accidents may not be as serious as the accidents that can occur to XR users in 6DoF and constrained 6DoF cases. This is because XR users in 6DoF and constrained 6DoF cases are standing up and moving around in the XR domain, where the actual room or location will have a different physical structure from the metaverse or MMOG environment viewed in the XR HMD, glasses, or headset. In particular, a higher level of immersive presence will result in a higher risk of an accident because the XR user will have less awareness of the actual environment. This is why XR users of 6DoF and constrained 6DoF systems need a 3D motion treadmill. Selected 3D motion treadmills for XR applications are presented in the following.

Virtuix Omni Treadmill

Virtuix Omni is a VR treadmill device developed by Virtuix to assist 6DoF and constrained 6DoF VR gaming by enabling full body movement in VR environments (www.virtuix.com/). The Virtuix Omni was initially presented at the 2015 Consumer Electronics Show (CES) in Las Vegas. This 3D motion treadmill has a belt system that keeps the user from falling while always keeping the user safely in the center of the treadmill. This 3D motion treadmill allows 360-degree movement and running. The user’s foot movement is tracked using the inertial measurement unit (IMU) tracking sensors attached to the custom shoes. The motion sensors on the custom shoes communicate wirelessly with an external computer providing movement information. The surface of the 3D motion treadmill has a low-friction coating applied, which makes it very slippery so that less resistance is felt during movement. The current price of a Virtuix Omni treadmill ranges from $1,995 to $2,295 (https://en.wikipedia.org/wiki/Virtuix_Omni).

Figure 2-12 shows the Virtuix Omni VR Treadmill in my Metaverse XR system research and development lab (at Yonsei University, Seoul, South Korea). Figure 2-13 shows the motion sensors (i.e., Pod 1 and Pod 2) used on the shoes of the Virtuix Omni VR Treadmill. These motion sensors need to be charged in advance and are installed on top of the special shoes as shown in Figure 2-14. The detected motion and status of the devices are transferred to the Virtuix Omni VR Treadmill control and management interface through the wireless sensors as shown in Figure 2-15.

A photograph of a Virtuix Omni VR treadmill. It has a space to stand, and robotic arms with a belt and shoes to be worn by the user.

Figure 2-12

Virtuix Omni VR Treadmill

A photograph of two motion sensors labeled as Omni. It has three lights on the left, with one switched on.

Figure 2-13

Motion Sensors (Pod 1 and Pod 2) Used on the Shoes of the Virtuix Omni VR Treadmill

A photograph of shoes used by the user while standing on the VR treadmill. It reads left and right. Both have motion sensors above them.

Figure 2-14

Shoes Used on the Virtuix Omni VR Treadmill with the Wireless Sensors (Pod 1 and Pod 2) Installed

A screenshot of a dark screen labeled Advanced information. It displays the device list, information, raw data, keyboard, and physical tracking data.

Figure 2-15

Virtuix Omni VR Treadmill Control and Management Interface

Cyberith Virtualizer

Cyberith Virtualizer is an omnidirectional treadmill developed by Cyberith that supports 360 degrees of walking and running in VR applications. The VR walking platform supports full movement by using six optical motion sensors as well as an optical rotation sensor and optical height sensor. The Cyberith Virtualizer has two versions, the Virtualizer ELITE 2 and the Virtualizer R&D Kit. The Virtualizer ELITE 2 is a powered walking platform. The Virtualizer R&D Kit supports higher precision and lower latency optical motion tracking (www.cyberith.com).

KAT Walk C

KAT Walk C is an omnidirectional treadmill developed by KATVR that supports 360 degrees of walking, running, crouching, and jumping motion detection to support VR gaming. The system also provides real-time haptic response through vibration to the user. KAT Walk C was designed to be cost-effective while occupying a small space for comfortable use in private homes and small rooms. KAT Walk C provides diverse scenario support which includes sit-down interaction for vehicle control gaming (www.kat-vr.com/products/kat-walk-c2).

Spacewalker VR

The Spacewalker VR treadmill is a VR treadmill in the prototyping phase by SpaceWalkerVR that supports forward and backward (mono-directional) movement only. The system is equipped with a gyroscope that detects user movements such as walking, running, shooting, and picking up objects. The safety system uses integrated pressure sensors to automatically regulate the speed such that it can prevent the user from falling (https://spacewalkervr.com/).

Infinadeck

Infinadeck is a 3D motion treadmill that supports VR applications, developed by Infinadeck. Infinadeck includes an active wireless control system and enables 360 degrees of movement through an omnidirectional moving floor (www.infinadeck.com/).

Summary

This chapter provided details on metaverse and XR systems, where the characteristics of XR, MR, AR, and VR were first described, followed by XR system components and the processing workflow. Then details on the hardware engines, software platforms, haptic devices, and 3D motion treadmills were provided. These advanced XR technologies will continue to evolve and drive metaverse XR services to improve significantly. In the following Chapter 3, XR head mounted display (HMD) technologies are introduced, which include XR device types, XR operation processes, as well as XR feature extraction and description algorithms.
