XR, MR, AR, and VR
XR System Components and Workflow
XR CPU and GPU Engines
XR Software Engines and Platforms
XR Lab Setup and Haptic Devices
3D Motion Treadmills
XR, MR, AR, and VR
Virtual reality (VR) enables users to immerse themselves in an artificial virtual environment, where a user commonly exists and interacts through an avatar. The user’s view in VR is different from the real environment; therefore, fantasies and illusions are easy to create in the VR virtual environment. This unlimited design freedom is why VR is the most common platform for massively multiplayer online games (MMOGs) and metaverses.
Augmented reality (AR) is a mixture of real-life and VR elements, where users can obtain useful information about a location or objects and can interact with virtual content in the real world. Commonly, an AR user can distinguish the superimposed virtual objects, augmented sounds, or haptic actions. AR users may be able to turn selected AR functions (which may be related to certain objects, sounds, or haptics) on or off. Compared with VR, AR users commonly feel less separated from the real world because they can see their actual environment in the background view. Virtual fantasies and illusions can be created and superimposed on a real-world view using AR.
Currently, most metaverse services are based on VR user environments. This is mainly because many types of video games and MMOGs have been using VR technology to enable player avatars to compete in virtual game environments, which may include multiple domains and universes. MMOGs are Internet connected online video games that enable many players to simultaneously participate, collaborate in teams, and compete in the same game environment.
MMOGs require high-quality networks, game servers, and high-performance player computer terminals, as well as virtually created multiple game domains, dimensions, and/or universes. Thus, MMOGs can be considered one of the original motivations for creating and using a metaverse.
However, it is predicted that XR will soon be massively used in creating metaverse services. XR technology can enable users to experience both real and virtual environments by integrating mixed reality (MR), AR, and VR with various haptic user interfaces to make the user’s environment more interactive, realistic, broad, diverse, and immersive. New XR application programming interfaces (APIs) can enable various types of human-human, human-machine, and machine-machine interaction.
Future metaverse services will need advanced XR equipment as well as high-performance wired and wireless networks to support the required data rate, time delay, and reliability requirements. Advanced metaverse services require high levels of Quality of Service (QoS) and Quality of Experience (QoE) to enable the XR system to provide a satisfying level of immersive and virtual presence.
Based on this focus, new standards on XR equipment and networking services are being prepared by global organizations, like the 3rd Generation Partnership Project (3GPP) and the Institute of Electrical and Electronics Engineers (IEEE). Standards provide a way for new products and services to interoperate and be used safely while satisfying the basic commonly required functionalities.
Just like MMOGs, metaverse systems can be supported by many types of networking platforms and be played using a personal computer (PC), video game console, smartphone, head mounted display (HMD), as well as various Internet of Things (IoT) devices. Therefore, metaverse applications have to be designed considering users in diverse environments using various interactive XR interfaces.
A full XR process is complex and hard to implement because XR requires accurate coordination of multiple cameras and sensors for acoustics, motion, and haptics to fit the user’s senses such that the experience is immersive and comfortable. A full XR system will implement MR, which interconnects the real surroundings of the user through AR and seamlessly accesses virtual domains using VR. Evidently, AR is more difficult to implement than VR, and MR technology is more difficult to implement than AR. Considering that XR is a combination of component technologies that include MR, speech-to-text (STT), voice recognition systems, special sound effects, as well as haptic and 3D motion user interfaces (UIs), XR technology is even more difficult to implement than MR technology. As a result, the evolution toward XR metaverses will take several years and will require significant technological breakthroughs, which is why large research and development investments are needed.
XR System Components and Workflow
In order to express 3D visual effects in an XR system, various visual sensing and expression techniques have to be implemented, which are described in the following. XR systems are best when providing an “immersive presence.” The term “immersion” refers to the feeling of being surrounded by an environment, which may be real, all virtual (in VR), or a combination of real and virtual (in AR and MR). When an XR user is immersed in a domain, the user can feel one’s physical and spatial existence and can act, react, and interact with surrounding beings, objects, and the environment. This feeling is called “presence,” which has different representations based on being in a real, all virtual (in VR), or a combination of real and virtual (in AR and MR) domain.
In order to explain the technical details, parts of the 3rd Generation Partnership Project (3GPP) standardization document “Technical Specification Group Services and System Aspects: Extended Reality (XR) in 5G” (3GPP TR 26.928 V16.1.0) will be used in the following (https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3534).
Right-Handed Coordinate System
3D Rotational Movement
Rotational movement of an object in a 3D domain is commonly expressed using roll, pitch, and yaw. These terms are commonly used in expressing the motion of a flying object, like an airplane. For example, considering that the nose of an airplane is the front, roll, pitch, and yaw can be defined as shown in Figure 2-4. Roll expresses the clockwise or counterclockwise circular rotational movement of the airplane (about its front-to-back axis). Pitch expresses the upward or downward rotational movement of the airplane (nose up or down). Yaw expresses the right or left rotational movement of the airplane (nose right or left).
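To make the roll, pitch, and yaw conventions concrete, the following Python sketch builds the three elementary rotation matrices for a right-handed coordinate system. The axis assignments (X as the nose direction, Z as the vertical axis) are illustrative assumptions, not part of any XR SDK.

```python
import math

def rot_roll(a):
    """Rotation about the longitudinal (X) axis: roll."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_pitch(a):
    """Rotation about the lateral (Y) axis: pitch (nose up/down)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_yaw(a):
    """Rotation about the vertical (Z) axis: yaw (nose left/right)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

# A 90-degree yaw turns the nose from the +X direction toward +Y.
nose = [1.0, 0.0, 0.0]
turned = apply(rot_yaw(math.pi / 2), nose)
```

Composing the three matrices (in a fixed order, since rotations do not commute) yields an arbitrary 3D orientation.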
Degrees of Freedom (DoF)
DoF represents the amount and types of movement an XR user or avatar can conduct in a 3D space. DoF is expressed using multiple independent parameters that change based on the user’s movement and changes in viewport within the 3D AR or VR space. An XR user will take actions and will interact with objects and the real/virtual environment, resulting in various movements, gestures, body reactions, expressions, and feelings (i.e., emotional and haptic).
In Figure 2-5, my avatar JM demonstrates the four types of XR based 3D DoF, which include 3DoF, 3DoF+, 6DoF, and constrained 6DoF movement that are described in the following (3GPP TR 26.928 V16.1.0).
3DoF movement consists of rotational movement only (i.e., roll, pitch, and yaw), which results from the XR HMD user’s head rotation while the user’s position remains fixed. 3DoF+ movement enhances 3DoF by allowing additional translational movement along the X, Y, and Z axes within a limited range, which results from the XR HMD user’s head movement while the user is sitting in a chair.
6DoF movement enhances 3DoF by allowing full translational movements along the X, Y, and Z axes to include movement upward and downward (e.g., elevating and heaving), left and right (e.g., strafing and swaying), and forward and backward (e.g., walking and surging). 6DoF movement commonly occurs when the XR user is capable of walking, running, flying, or swimming through the 3D XR environment.
Constrained 6DoF movement is 6DoF but with 3D constraints added to movements along the X, Y, and Z axes and may include roll, pitch, yaw based rotational movement constraints. Constrained 6DoF movement is very common in metaverses and MMOGs because avatar movement is restricted to the walls, buildings, environment, and various obstacles.
Constrained 6DoF has also been called “Room Scale VR,” which defines the room or area in which the XR user can work, play, or explore, mimicking the user’s real-life motion in a limited virtual space.
3D environments with multi-rooms, spaces with uneven floor levels, and very large open areas are categorized as 6DoF or “unconstrained 6DoF,” because these are beyond the scope of constrained 6DoF.
These DoF types are used to program user or avatar movement and to describe the capabilities of the XR device’s tracking sensors.
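As a minimal illustration of the difference between 6DoF and constrained 6DoF, the following Python sketch clamps an avatar’s translational DoF to a room while leaving the rotational DoF unrestricted. The room bounds are made-up example values.

```python
from dataclasses import dataclass, replace

@dataclass
class Pose:
    # Translational DoF (meters) and rotational DoF (radians): 6DoF in total.
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    roll: float = 0.0
    pitch: float = 0.0
    yaw: float = 0.0

# Hypothetical room-scale play area bounds (meters) for constrained 6DoF.
ROOM = {"x": (-2.0, 2.0), "y": (-2.0, 2.0), "z": (0.0, 2.5)}

def constrain(pose: Pose) -> Pose:
    """Constrained 6DoF: rotation stays free; translation is clipped to the room."""
    def clamp(v, lo, hi):
        return max(lo, min(hi, v))
    return replace(
        pose,
        x=clamp(pose.x, *ROOM["x"]),
        y=clamp(pose.y, *ROOM["y"]),
        z=clamp(pose.z, *ROOM["z"]),
    )

# An avatar tries to walk to x = 5 m; movement stops at the room boundary.
attempted = Pose(x=5.0, y=0.5, z=1.7, yaw=1.2)
allowed = constrain(attempted)
```

Unconstrained 6DoF would simply skip the clamping step.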
XR View, XR Viewport, and XR Pose
An XR view is a single 3D view of a scene or part of a scene. So, the XR view will change as the XR device’s view direction changes, which will result from the user’s head or eye movement. For 3D representation, the XR view displayed for each eye of the user may be different based on stereoscopic or monoscopic 3D displaying (3GPP TR 26.928 V16.1.0).
3D image rendering of the XR view needs to be accurately aligned with the user’s viewing properties, which include the field-of-view (FoV), eye offset, etc., to create effects like occlusion and parallax. When an XR user moves in the 3D domain, the user’s point of view will change, and objects and the environment will look different. In 3D visual domains, “occlusion” occurs when an object blocks the view of another object. The XR user will estimate the distance and size of objects when objects move relative to each other. This is based on “parallax,” which is the apparent relative movement of objects when the XR user’s point of view changes. These aspects are critical in providing 3D immersive presence through the XR HMD system. In the 3D reference space, the position and orientation of the XR user’s view define the “view offset,” which increases as the movement becomes larger and faster.
As presented in Figure 2-6, the XR Viewport (or viewport) is a flat rectangular 2D surface region (perpendicular to the viewing direction) of the XR view in which the target objects in the 3D space are displayed on the 2D surface of the XR device. Figure 2-6 is an example of a Microsoft HoloLens 2 user’s inside view (viewport diagram). The XR Viewport is characterized by a rectangular dimension in terms of width and height. The view frustum is the region of 3D space that is displayed on the XR HMD’s 2D screen; it starts at the near plane and extends to the far plane, which are both perpendicular to the viewing direction of the XR view, and it defines the FoV of the XR HMD. The XR Viewport and the view frustum near plane are the same in many cases. In 6DoF cases with extensive environments (i.e., unconstrained 6DoF), the view frustum far plane may be set infinitely far away.
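The frustum geometry above can be sketched numerically. The following Python example (illustrative only; real rendering engines use 4×4 projection matrices) computes the viewport width implied by a FoV at the near plane and tests whether a point lies inside a symmetric view frustum.

```python
import math

def viewport_width(near, fov_deg):
    """Width of the near-plane viewport for a symmetric horizontal FoV."""
    return 2 * near * math.tan(math.radians(fov_deg) / 2)

def in_frustum(point, near, far, hfov_deg, vfov_deg):
    """True if the point (x, y, z) lies inside the symmetric view frustum.

    z is the distance along the viewing direction; the frustum cross-section
    grows linearly with z between the near and far planes.
    """
    x, y, z = point
    if not (near <= z <= far):
        return False
    half_w = z * math.tan(math.radians(hfov_deg) / 2)
    half_h = z * math.tan(math.radians(vfov_deg) / 2)
    return abs(x) <= half_w and abs(y) <= half_h

center = in_frustum((0.0, 0.0, 1.0), 0.1, 100.0, 100, 100)   # on the view axis
behind = in_frustum((0.0, 0.0, 0.05), 0.1, 100.0, 100, 100)  # in front of near plane
wide = in_frustum((3.0, 0.0, 1.0), 0.1, 100.0, 100, 100)     # outside the FoV cone
```

Objects that fail this test are culled before rasterization, which is why the frustum directly determines what appears in the viewport.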
XR 3D Tracking Techniques
XR applications require fast and accurate 3D spatial tracking, which is enabled by the following techniques (3GPP TR 26.928 V16.1.0). The XR Viewer Pose is generated from continuous tracking of the user’s movement in the XR Reference Space. The XR Viewer Pose is determined by the viewer’s position and orientation. Very accurate positional tracking is needed for the XR application to work properly. 3D positional tracking methods include outside-in tracking, inside-out tracking, world tracking, and simultaneous localization and mapping (SLAM). These technologies are possible using the XR feature extraction and tracking technologies described in Chapter 3 and the artificial intelligence (AI) based deep learning technologies explained in Chapter 4.
Outside-in tracking is conducted by (multiple coordinated) tracking sensors (e.g., cameras) located at fixed locations in the XR facility (e.g., having motion detection sensors installed at all corners of the XR simulation room). As shown later in Figure 2-8, a Hikvision DS-2TD4166T-25 electro-optic (EO) and infrared (IR) dual camera is used to assist outside-in tracking as well as AR and VR automatic switching for continuous MR applications.
Outside-in tracking is possible only for a region that has preinstalled tracking sensors supporting it. Outside the preset tracking region, outside-in tracking cannot provide any information, and therefore, one (or multiple) of the following tracking methods needs to be used.
Inside-out tracking is conducted by multiple coordinated tracking sensors located on the XR HMD device. These sensors track the surroundings and identify the HMD’s location and position based on internal processing of the sensed information. This is the most common tracking method applied on almost all XR HMDs, glasses, and headsets.
World tracking is conducted by one (or multiple coordinated) camera(s). The camera(s) determines the XR device’s location and position as well as surrounding objects and the environment. World tracking is commonly used in AR devices and can be used in VR devices.
SLAM is used to create a map (or territory picture) by combining multiple world tracking images from an XR HMD. SLAM connects multiple images and sensor data to create a map of a much wider region. SLAM can construct an even more accurate and wider map if images from multiple HMDs (and/or stationary cameras and sensors) can be used.
Localization Techniques
Localization is a technique to determine the location and position of users and objects in a 3D domain. Localization is used to create spatial maps for XR applications. Localization can be conducted using multiple sensors, cameras (e.g., monocular, stereo, and depth cameras), radio beacon signals, global positioning system (GPS), assisted-GPS (A-GPS), ultra-wideband (UWB), ultrasonic sensors, radio detection and ranging (RADAR), light detection and ranging (LIDAR), millimeter waves (mmWaves), gyroscope sensors, inertial sensors, etc. Signal analysis will be used on reference signal received power (RSRP), received signal strength indicator (RSSI), time of arrival (ToA), estimated ToA (ETA), variations in wavelength or phase, distortion of spectrum, etc.
More information used in the localization computations can help to increase the accuracy but will require more processing and memory. Some of the localization techniques are described in the following, which include spatial anchors and visual localization techniques (3GPP TR 26.928 V16.1.0).
Spatial anchors are reference points in the 3D domain that have known locations and can be used as references in calculating the position of other objects and users. Spatial anchors are known to provide accurate localization within a limited range. For example, in the Microsoft Mixed Reality Toolkit, the spatial anchoring has a 3 m radius. To increase the accuracy and range, coordination of multiple spatial anchors can be used. SLAM can also be used in the localization and tracking process. Visual localization uses visual SLAM (vSLAM), visual positioning system (VPS), and other visual imaging processing techniques as well as sensor data to conduct localization. More technical details are described in Chapter 3 in XR based detection technologies and Chapter 4 in deep learning technologies.
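As a simplified illustration of anchor-based localization, the following Python sketch estimates a 2D position from the distances to three spatial anchors (e.g., ranges derived from ToA measurements) by linearizing the range equations into a solvable linear system. The anchor coordinates and test position are made-up values, and a real system would use more anchors plus least-squares filtering to suppress measurement noise.

```python
import math

def trilaterate(anchors, dists):
    """Estimate (x, y) from exact distances to three non-collinear anchors."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    # Subtracting the first range equation from the others cancels x^2 + y^2,
    # leaving two linear equations A * [x, y] = b.
    A = [[2 * (x2 - x1), 2 * (y2 - y1)],
         [2 * (x3 - x1), 2 * (y3 - y1)]]
    b = [d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2,
         d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x = (b[0] * A[1][1] - b[1] * A[0][1]) / det
    y = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return x, y

# Hypothetical anchors at three room corners; recover a user at (1, 2).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
true_pos = (1.0, 2.0)
dists = [math.dist(a, true_pos) for a in anchors]
est = trilaterate(anchors, dists)
```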
XR Experience
Immersive presence is a combination of two feelings: The feeling that you are surrounded by the metaverse/game environment (immersion) and the feeling that you are physically and spatially located in that environment (presence). There are two types of presence that the XR system needs to provide, which are “cognitive presence” and “perceptive presence.”
Cognitive presence is the presence that is felt in the user’s mind and is more content based (e.g., a good storyline) as it is triggered by imagination. Even without any XR device, cognitive presence can be felt by reading a good book or watching a good movie.
Perceptive presence is the presence that is felt through the user’s five senses (i.e., eyesight, hearing, touch, taste, and smell) or a few of them. At the current technical level, perceptive presence based on eyesight and hearing is common in AR, VR, and MR devices. XR systems that can provide a perceptive presence based on three senses (i.e., eyesight, hearing, and touch) are possible using haptic devices (more details are in Chapter 3). However, the level of XR performance needs much more improvement, which hopefully this book will help accomplish in the near future. Perceptive presence is a highly technical feature of XR systems, as all of the introduced XR technologies (i.e., 3D coordinates, rotational movement, DoF, FoV, XR view, XR Viewport, XR Pose, tracking, localization, etc.) need to be used to create a high-quality XR perceptive presence (3GPP TR 26.928 V16.1.0).
XR Device and Network Performance Requirements
To create a high-level XR based immersive presence, a very low time delay and a very high-quality image and display performance are required. The time delay performance bound can be derived from the delay tolerance level, which is influenced by the round trip interaction delay. The image and display performance factors include the required image resolution, frame rate, and FoV. More details are described in the following (3GPP TR 26.928 V16.1.0).
Delay Tolerance
The round trip interaction delay is obtained by adding the User Interaction Delay to the Age of Content, which is described in the following.
User interaction delay is the length of time from the moment the user initiates an action until the time the content creation engine of the XR system has taken the action into account. User interaction delay is influenced by the sensor delay/cycle time, data processing time, communication/networking time, and queueing (memory) delay.
Age of content is the length of time from the moment the content is created until the time the content is presented to the XR user. Age of content is influenced by the content processing time, communication/networking time, queueing (memory) delay, and display time.
For XR systems, a total latency less than 20 ms is recommended. The total latency covers the time from the moment the XR user’s head moves up to the time that changes on the XR device’s display are complete. This is a very short time delay compared to MMOGs, which are also very high-speed systems.
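The delay budget described above can be sketched as a simple sum of the User Interaction Delay and the Age of Content components. In the following Python example, every component delay is an assumed illustrative value (in milliseconds), not a figure from the specification.

```python
# Assumed example component delays in milliseconds (illustrative only).
user_interaction = {
    "sensor_cycle": 2.0,
    "data_processing": 4.0,
    "network": 3.0,
    "queueing": 1.0,
}
age_of_content = {
    "content_processing": 4.0,
    "network": 3.0,
    "queueing": 1.0,
    "display": 1.5,
}

def total(delays):
    """Sum a dictionary of component delays."""
    return sum(delays.values())

# Round trip interaction delay = User Interaction Delay + Age of Content.
round_trip = total(user_interaction) + total(age_of_content)

# Compare against the recommended 20 ms total latency bound.
within_budget = round_trip <= 20.0
```

Budgeting each component this way shows how little headroom 20 ms leaves; shaving even 1 ms from networking or queueing can decide whether the bound is met.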
Among various MMOGs, the first-person shooter (FPS), role playing game (RPG), and real-time strategy (RTS) types are most relevant to XR systems and metaverses. Name-wise, an FPS-type MMOG is commonly called a “massively multiplayer online first-person shooter” (MMOFPS) game. Likewise, an RPG based MMOG is commonly called an MMORPG, and an RTS based MMOG is commonly called an MMORTS. MMOFPS games have a delay tolerance of 100 ms (i.e., 0.1 seconds), MMORPG games have a delay tolerance of 500 ms (i.e., 0.5 seconds), and MMORTS games have a delay tolerance of 1,000 ms (i.e., 1 second). However, to achieve higher MMOG performance, a delay much shorter than the delay tolerance level is recommended (3GPP TR 26.928 V16.1.0).
Image and Display Requirements
Image resolution is an important factor in deciding the quality of the immersive presence.
The FoV, frame rate, and image resolution are all important factors. Considering that the Meta Quest 2 is the leading XR HMD by a wide margin, the Oculus specifications will be used as a guideline.
The founder of Oculus Rift, Palmer Luckey, mentioned that at least an 8K resolution (i.e., 8192 × 4096 pixel image frame) per eye is needed to eliminate the pixelation effect. Pixelation occurs when the resolution is so low that the pixels forming the image are noticeable by the viewer. When pixelation occurs, it is hard for the XR user to feel an immersive presence, and the aspects of the animation and augmentation are noticeable. An image frame rate higher than 90 Hz (i.e., 90 frames/s) is recommended to create a comfortable and compelling virtual presence feel for the user (https://developer.oculus.com/blog/asynchronous-timewarp-examined/). Considering that most of the TV programs we watch have a frame rate in the range of 24 to 60 frames/s, the recommended 90 frames/s is a very high frame rate. In addition, a wide FoV in the range of (at least) 100 to 110 degrees is recommended.
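A quick calculation shows why these display recommendations stress networks and GPUs. The following Python sketch computes the raw (uncompressed) pixel throughput for 8192 × 4096 per eye at 90 frames/s, assuming 24-bit RGB color; the result, roughly 145 Gbit/s of raw pixel data, illustrates why aggressive compression and very high network data rates are required.

```python
# Per-eye 8K frame, stereoscopic display, 90 frames/s (values from the text).
width, height = 8192, 4096
eyes = 2
fps = 90

pixels_per_second = width * height * eyes * fps

# Raw RGB at 24 bits/pixel (an assumed color depth), converted to Gbit/s.
bits_per_pixel = 24
raw_gbps = pixels_per_second * bits_per_pixel / 1e9
```

Even lossy compression ratios of 100:1 would still leave well over a gigabit per second of display traffic at this quality level.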
XR CPU and GPU Engines
The XR device’s engine and software platform, as well as the operational relation between the central processing unit (CPU) and graphics processing unit (GPU), are illustrated in Figure 2-7 (3GPP TR 26.928 V16.1.0). The XR engine comprises the image rendering engine, audio engine, physics engine, and artificial intelligence (AI) engine (www.nrexplained.com/tr/26928/xreng).
The rendering engine is in charge of all image and video rendering processes. Rendering is a computer process of creating an image from 2D or 3D models. The final image is called the “render,” which may be photorealistic or non-photorealistic. Rendering requires computer graphic editing of the scene, object description, geometry, viewpoint, texture, lighting, transparency, reflection, shading, information description, as well as augmented and virtual objects and domains. Rendering is the major final step in 2D and 3D computer graphics generation and is used in every image creation operation including XR, MR, AR, VR, video games, simulators, movies, TV, visual effects, animation, architecture, and design visualization (https://en.wikipedia.org/wiki/Rendering_(computer_graphics)).
The audio engine oversees the microphones and speakers and needs to assist the AI engine for voice control and speech-to-text (STT) processing. The audio engine processes the loading, modification, and balancing of speaker sound effects. Special acoustic effects using echoes, pitch and amplitude balancing, oscillation, and Doppler effects are essential in creating an immersive presence experience.
The physics engine reads and processes all sensor information and emulates realistic physical law based immersive presence effects. Accurate simulation of physical forces like collision, pressure, vibration, texture, heat, and chill are needed to enable the XR user to feel the air, wind, water, fluids, as well as solid objects. In the future, smell and taste physical effects could be added. By then the five basic human senses can all be influenced by the XR system, creating a full dimensional immersive presence experience.
The CPU is in charge of executing the XR engine, which includes accessing the system memory as well as controlling the application and management, scene graph, AI, audio, streaming, physics, and localization. The abstraction APIs like Vulkan, OpenGL, etc., are also processed by the CPU. XR devices use multicore CPUs, which the operating system (OS) and kernel use to form directed acyclic graph (DAG) based processing sequences over which multiple processing threads are executed. The AI system of the CPU will process the deep learning algorithms that support the sensors, sound, and images of the XR system. The deep learning recurrent neural network (RNN) used for the sensors, physics, audio, and localization processing is an infinite impulse response process whose directed cyclic graph cannot be unrolled. The deep learning convolutional neural network (CNN) used for video, image, scene graph, localization, feature detection, and feature tracking is a finite impulse response process whose directed acyclic graph can be unrolled (https://en.wikipedia.org/wiki/Recurrent_neural_network).
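The DAG-based processing sequence can be illustrated with Python’s standard-library graphlib: a hypothetical per-frame task graph (the task names and dependencies are assumptions for illustration, not from the specification) is topologically sorted so that each task runs only after the tasks it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical per-frame XR task graph: each task maps to its prerequisites.
frame_tasks = {
    "sensors": [],
    "localization": ["sensors"],
    "physics": ["sensors"],
    "scene_graph": ["localization", "physics"],
    "audio": ["scene_graph"],
    "render_submit": ["scene_graph"],  # CPU hands the frame to the GPU last
}

# A valid execution order; independent tasks (e.g., localization and
# physics) could be dispatched to different CPU cores in parallel.
order = list(TopologicalSorter(frame_tasks).static_order())
```

The scheduler’s freedom to reorder independent nodes is exactly what lets the OS spread threads across the multicore CPU.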
The GPU will receive commands and data streams from the CPU. The GPU is in charge of video memory access as well as the image rendering pipeline process. Shaders are user-defined programs that are executed at selected stages of the GPU pipeline. The vertex shader is a programmable shader that is in charge of processing the vertices. The primitive assembly function will convert a vertex stream into a sequence of base primitives, where a base primitive is the processed result of the vertex stream’s interpretation. The rasterization process will convert the image data and signals into projected images for video and graphics formation. Shader functions include fragment processing, vertex processing, geometry computations, and tessellation. The tessellation process assists arrangements of polygon shapes considering gapless close fittings and nonoverlapping repeated patterns. The shaders also consider the texture information of the objects and environment (www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview).
XR Software Engines and Platforms
The major XR software engines and platforms are introduced in chronological order of initial release, which include Lua (1993), Unreal (1998), Unity (2005), WebGL (2011), SteamVR (2014), WebVR (2016)/WebXR (2018), and OpenXR (2017).
Lua
Lua is a popular cross-platform scripting language implemented in ANSI C that was first released in 1993. Lua uses the filename extension “.lua” (www.lua.org/). Up to version 5.0, Lua was released under a license similar to the BSD license, but since version 5.0 it has been released under the MIT License. Lua means “moon” in Portuguese; it was named in reference to the data-description language SOL, whose name means “sun” in Portuguese. Lua uses parts of the SOL data-description syntax, parts of the control structures of Modula, parts of the multiple assignments and returns structure of CLU, as well as parts of C++, SNOBOL, and AWK. Lua was originally designed as a general embeddable extension language that is easy to use and could enable software applications to be customized quickly by enhancing speed, portability, and extensibility. Lua is a popular programming language for video games and applications because it is easy to learn and embed and enables fast execution. Representative games developed using Lua include Roblox, Garry’s Mod, Payday 2, Phantasy Star Online 2, Dota 2, Angry Birds Space, Crysis, etc. In addition, Lua has been used by many applications such as Adobe Lightroom, Moho, iClone, Aerospike, etc. A poll conducted by GameDev.net revealed that Lua was the most popular scripting language for game programming in 2003 (https://en.wikipedia.org/wiki/Lua_(programming_language)).
Unreal
The Unreal Engine (UE) was developed by Epic Games and was first released with the first-person shooter (FPS) game Unreal in 1998 (www.unrealengine.com). UE is a 3D computer graphics game engine that is programmed in C++ and has superior portability, as it can be used on various computer OSs, smart device OSs, game consoles, and XR platforms. This is why UE has been used in the development of films, television programs, and metaverses, in addition to video games. Unreal was also listed in the Guinness World Records as the “most successful video game engine” in 2014. A list of games made with UE can be found on Wikipedia (https://en.wikipedia.org/wiki/List_of_Unreal_Engine_games). The source code of Unreal Engine 5 has been available on GitHub since April of 2022 (https://github.com/EpicGames) and can be used through a registered account. Commercial use of UE is possible under a royalty model. Epic has waived its royalty until a game’s developers have accumulated $1 million in revenue, and if the game is published on the Epic Games Store, the royalty is waived (but policies may change).
Unity
The Unity game engine, first released in 2005, supports a wide range of platforms, including the following:
Mobile platforms: iOS, Android (Android TV), tvOS
Desktop platforms: Windows (Universal Windows Platform), macOS, Linux
Web platforms: WebGL
Console platforms: PlayStation (PS4, PS5), Xbox (Xbox One, Xbox Series X/S), Nintendo Switch, Stadia
XR platforms: Oculus, PlayStation VR, Google’s ARCore, Apple’s ARKit, Windows Mixed Reality (HoloLens), Magic Leap, Unity XR SDK SteamVR, Google Cardboard
Other platforms: Film, automotive, architecture, engineering, construction, and the US Armed Forces
Unity is written in C++ (runtime) and C# (Unity Scripting API) based on a proprietary license.
The Unity game engine can be used to develop 2D and 3D games as well as mobile apps, metaverses, interactive simulations, and XR applications. The Unity game engine is easy to learn and use because it provides a primary scripting API in C# using Mono that supports the Unity editor, which provides various plug-in options as well as drag and drop functionality. Unity provides the High-Definition Render Pipeline (HDRP), Universal Render Pipeline (URP), and the legacy built-in pipeline, which are incompatible with each other. Because Unity is easy to learn and use, many indie game developers have used it.
Indie games are video games made by an individual programmer or a team of programmers that does not belong to a game publisher. Indie games also include games in which the programmer/designer has creative freedom in the game’s development even though a game publisher is funding and distributing the game. Games developed and released by major game publishers are typically called triple A (i.e., AAA) games, which are different from indie games. The name “indie game” is an abbreviated term for “independent video game” (https://en.wikipedia.org/wiki/Indie_game).
WebGL
WebGL was initially released in March of 2011 by the Mozilla Foundation and is developed by the Khronos WebGL Working Group (https://get.webgl.org). WebGL is short for “Web Graphics Library,” which is a library of JavaScript APIs for web browsers that can be used to render interactive 2D and 3D graphics. WebGL enables hardware-accelerated GPU processing of physics and image processing in web browsers and is fully integrated with other web applications. The Khronos Group announced in February of 2022 that WebGL 2.0 is supported by all major browsers (www.khronos.org/webgl/wiki/Main_Page).
SteamVR
SteamVR is a hardware and software platform for VR devices that was developed by the Valve Corporation (www.steamvr.com/en/). SteamVR was first applied on the Oculus Rift HMD in 2014 and since has been widely used to support many HMDs, which include the HTC Vive and Valve Index. SteamVR technology supports accurate positional tracking to enhance room-scale immersive presence experiences, which has significantly contributed to (constrained) 6DoF application development. SteamVR originates from the popular Steam video game technologies, digital distribution services, and storefront developed by the Valve Corporation which launched in September of 2003. Originally, SteamVR provided support for Windows, macOS, and Linux platforms, but in May of 2020, the support for macOS ended (https://en.wikipedia.org/wiki/Steam_(service)#SteamVR).
WebXR
The WebVR API was initiated in 2014, and in March of 2016 the WebVR API proposal version 1.0 was released by the Mozilla VR team and the Google Chrome team. In 2018, the WebXR Device API was released to supersede WebVR by extending the API to support XR applications, which include MR, AR, and VR. WebXR enables website access to the immersive content provided on XR devices (including MR, AR, and VR devices), which includes HMDs, headsets, sensor information, tracking, and sound effects (https://immersiveweb.dev/). WebXR was designed to be easily integrable with various web browser tools and is commonly used with WebGL. WebXR provides a very versatile interface that enables various XR devices to access WebXR content with very few modifications. Web browser access to XR applications significantly reduces development time and performance testing. However, there are many web browser types, which can create interoperability issues, and web browsers have limitations in immersive expression and user interaction compared to HMDs, so not every XR aspect can be developed in WebXR (https://en.wikipedia.org/wiki/WebXR). The technical details of WebXR are defined by the Immersive Web Group of the World Wide Web Consortium (W3C), where the specifications are made public on GitHub (https://github.com/immersive-web/webxr).
OpenXR
OpenXR is an open standard that supports XR platform and device development and was initiated by the Khronos Group in February of 2017 (www.khronos.org/openxr/). It is a cross-platform, royalty-free standard whose SDK source is accessible on GitHub (https://github.com/KhronosGroup/OpenXR-SDK-Source). OpenXR’s API has the following fundamental elements: “XrSpace” represents the 3D space, “XrInstance” represents the OpenXR runtime, “System” represents the XR devices, “XrSystemId” identifies the system of XR devices and controllers, “XrActions” helps to control user inputs, and “XrSession” assists the interaction session between the application and the user/runtime. OpenXR has been used for game and rendering engine support in the Unreal Engine (since September of 2019), Blender (since June of 2020), Unity (since December of 2020), Godot (since July of 2021), etc. OpenXR has been used to enable WebXR browser support on Google Chrome and Microsoft Edge. Some of the representative OpenXR conformant platforms include the Microsoft HoloLens 2, Windows Mixed Reality headsets, Oculus PC platform, Facebook Quest, Meta Quest 2, Valve SteamVR, VIVE Cosmos, and VIVE Focus 3 (https://en.wikipedia.org/wiki/OpenXR).
XR Lab Setup and Haptic Devices
Figure 2-8 shows part of my Metaverse XR system research and development lab (at Yonsei University, Seoul, South Korea), including a Virtuix Omni VR 3D treadmill, a Meta Quest 2 HMD, and a Hikvision DS-2TD4166T-25 camera.
The Hikvision DS-2TD4166T-25 electro-optic (EO) and infrared (IR) dual camera is shown at the far back of the room. This camera can be used to control AR and VR domain switching for MR applications: the HoloLens 2 AR device provides only a constrained 6DoF (limited to the room size), so VR is needed to extend the limited room with doors that open up to different VR multiverses and metaverses. The IR camera supports automated precision control, especially when it is dark indoors or during outdoor XR experiments at night.
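The domain-switching idea above can be sketched as a simple position test: when the tracked user enters a "door" region of the real room, the application hands off from the AR view to a VR scene. This is a minimal illustrative sketch; the coordinates, thresholds, and scene names are assumptions, not the lab's actual implementation:

```python
# Hypothetical MR domain-switching logic: camera-tracked user position
# (room coordinates, meters) is tested against virtual "door" regions.

from dataclasses import dataclass

@dataclass
class DoorRegion:
    scene: str         # VR scene this door leads to (illustrative name)
    x: float           # door center in room coordinates (meters)
    y: float
    radius: float      # trigger distance (meters)

def select_domain(user_x, user_y, doors):
    """Return the VR scene to switch to, or 'AR' to stay in the room."""
    for door in doors:
        if (user_x - door.x) ** 2 + (user_y - door.y) ** 2 <= door.radius ** 2:
            return door.scene
    return "AR"

doors = [DoorRegion("vr_city", 4.0, 0.0, 0.5),
         DoorRegion("vr_space_station", 0.0, 3.0, 0.5)]
print(select_domain(1.0, 1.0, doors))   # mid-room -> stays in AR
print(select_domain(3.8, 0.1, doors))   # near first door -> switch to VR
```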
The Hikvision DS-2TD4166T-25 EO camera resolution is 1920 × 1080 (i.e., 2 MP level), and the IR camera resolution is 640 × 512, supporting a temperature range of -20 °C to 550 °C with a ±0.5 °C error margin and a distance range of 200 m.
The five basic human senses are sight, smell, touch, taste, and hearing. Current MR, AR, and VR systems can commonly express sight and hearing effects, and a few XR systems can additionally provide vibration-based touch effects. Therefore, research and development of new XR technologies that express smell, touch, and taste effects are needed. Some example haptic systems are described in the following.
Manus Prime X Haptic VR
Ultraleap Mid-Air Haptics
The STRATOS Inspire ultrasound speaker and the STRATOS Explore from Ultraleap use programmable ultrasound transducer arrays, which generate ultrasound waves timed to coincide at focal points in mid-air, creating haptic effects. These systems can generate a virtual touch feeling as well as a slight vibration feeling on the user’s hand (www.ultraleap.com/haptics/).
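The "coinciding waves" principle is a phased-array focusing technique: each transducer is driven with a delay chosen so all wavefronts arrive at the focal point simultaneously, concentrating acoustic pressure there. A minimal sketch of that delay calculation follows; the array geometry is illustrative and not Ultraleap's actual hardware layout:

```python
# Phased-array focusing sketch: compute per-transducer firing delays so
# that all ultrasound wavefronts coincide at a mid-air focal point.

import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def focus_delays(transducers, focal_point):
    """Per-transducer firing delays (seconds) to focus at focal_point."""
    dists = [math.dist(t, focal_point) for t in transducers]
    farthest = max(dists)
    # The farthest element fires first (delay 0); nearer elements wait
    # so their wavefronts arrive at the same instant.
    return [(farthest - d) / SPEED_OF_SOUND for d in dists]

# A 3-element linear array (x, y, z in meters), focusing 20 cm above it.
array = [(-0.05, 0.0, 0.0), (0.0, 0.0, 0.0), (0.05, 0.0, 0.0)]
delays = focus_delays(array, (0.0, 0.0, 0.20))
print([round(d * 1e6, 2) for d in delays])  # delays in microseconds
```

By re-computing the delays each update, the focal point (and hence the touch sensation) can be moved across the user's hand in real time.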
HaptX Gloves DK2
HaptX DK2 are haptic gloves from HaptX that provide force feedback to the user’s hands. The feedback can reach 40 pounds (about 178 N) of resistive force, so the user can feel and work with heavy objects in XR applications. Magnetic motion capture sensors perform very accurate motion detection and give the HaptX DK2 haptic gloves submillimeter precision sensing capability (https://haptx.com/).
Teslasuit
The haptic gloves and full-body haptic suit from Teslasuit have some unique features that can enhance XR user experiences (https://teslasuit.io/). Teslasuit’s haptic glove and suit can be used to monitor the heart rate, oxygen saturation, or other biometric information of the user, which can be used to analyze the mental state and stress level of the user while training or playing games. The haptic gloves can generate a feel of solid object texture using electric sensors with 45 channels and can generate a feel of virtual object resistance force; the user can feel up to 9 N of feedback resistance force. For reference, “N” stands for “newton,” the unit of force: 1 N of force can accelerate an object with a mass of 1 kg at 1 m/s² (www.britannica.com/science/newton-unit-of-measurement). The haptic suit covers the user’s full body, so full-body sensing and feedback are possible. The haptic suit can provide near real-life sensations and feelings through electro muscle stimulation and transcutaneous electrical nerve stimulation.
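To put the newton figures above in perspective, Newton's second law F = m · a gives the acceleration a force produces on a given mass, so the glove's 9 N maximum is roughly the force needed to hold up a 1 kg weight against gravity (which pulls with about 9.81 N):

```python
# Worked check of the newton (N) figures above using F = m * a.

def acceleration(force_n, mass_kg):
    """Acceleration (m/s^2) produced by force_n (N) on mass_kg (kg)."""
    return force_n / mass_kg  # a = F / m

print(acceleration(1.0, 1.0))  # 1 N on 1 kg -> 1.0 m/s^2 (the definition)
print(acceleration(9.0, 1.0))  # max glove feedback on 1 kg -> 9.0 m/s^2
```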
3D Motion Treadmills
When using an XR HMD, glasses, or headset, the user is focused on the environment, objects, and events happening in the XR domain. Therefore, the user can easily lose balance, fall, or bump into other people or various objects, including walls, fences, and doors. All of these accidents can result in physical injury (and in some cases death) as well as property damage.
In the DoF descriptions presented earlier, 3DoF and 3DoF+ assume the user is sitting in a chair while using the XR device. Even in these cases many accidents have happened to XR users, but they are typically less serious than the accidents that can occur in the 6DoF and constrained 6DoF cases. This is because 6DoF and constrained 6DoF users are standing up and moving around in the XR domain, where the actual room or location has a different physical structure from the metaverse or MMOG environment viewed in the XR HMD, glasses, or headset. In particular, a higher level of immersive presence results in a higher risk of an accident, because the XR user has less awareness of the actual surroundings. This is why XR users of 6DoF and constrained 6DoF systems need a 3D motion treadmill. Selected 3D motion treadmills for XR applications are presented in the following.
Virtuix Omni Treadmill
Virtuix Omni is a VR treadmill developed by Virtuix to assist 6DoF and constrained 6DoF VR gaming by enabling full-body movement in VR environments (www.virtuix.com/). The Virtuix Omni was initially presented at the 2015 Consumer Electronics Show (CES) in Las Vegas. This 3D motion treadmill has a support belt system that prevents the user from falling while keeping the user safely in the center of the treadmill, and it allows 360-degree walking and running. The user’s foot movement is tracked using inertial measurement unit (IMU) sensors attached to custom shoes, and these motion sensors communicate wirelessly with an external computer to provide movement information. The surface of the treadmill has a low-friction coating, making it very slippery so that less resistance is felt during movement. The current price of a Virtuix Omni treadmill ranges from $1,995 to $2,295 (https://en.wikipedia.org/wiki/Virtuix_Omni).
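Because the shoe trackers are inertial (IMU) sensors, they report acceleration and rotation rather than absolute position, so movement must be recovered by integrating over time. The following is a heavily simplified 1D dead-reckoning sketch of that idea; real trackers fuse gyroscope and magnetometer data and apply drift correction, which this illustration omits:

```python
# Simplified 1D inertial dead reckoning: integrate acceleration samples
# twice (acceleration -> velocity -> position) at a fixed sample rate.

def dead_reckon(accel_samples, dt):
    """Position (m) after integrating accel samples (m/s^2) at step dt (s)."""
    velocity, position = 0.0, 0.0
    for a in accel_samples:
        velocity += a * dt          # v += a * dt
        position += velocity * dt   # x += v * dt
    return position

# One second of constant 1 m/s^2 forward acceleration, sampled at 100 Hz;
# the closed-form answer is x = a*t^2/2 = 0.5 m (discrete sum gives ~0.505).
samples = [1.0] * 100
print(round(dead_reckon(samples, 0.01), 3))
```

The small discrepancy from the closed-form 0.5 m illustrates why IMU-only tracking accumulates drift and needs periodic correction.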
Cyberith Virtualizer
Cyberith Virtualizer is an omnidirectional treadmill developed by Cyberith that supports 360 degrees of walking and running in VR applications. The VR walking platform tracks full movement using six optical motion sensors as well as an optical rotation sensor and an optical height sensor. The Cyberith Virtualizer comes in two versions, the Virtualizer ELITE 2 and the Virtualizer R&D Kit: the Virtualizer ELITE 2 is a powered walking platform, while the Virtualizer R&D Kit supports higher-precision, lower-latency optical motion tracking (www.cyberith.com/).
KAT Walk C
KAT Walk C is an omnidirectional treadmill developed by KATVR that supports 360 degrees of walking, running, crouching, and jumping motion detection for VR gaming. The system also provides real-time haptic response to the user through vibration. KAT Walk C was designed to be cost-effective while occupying a small space, for comfortable use in private homes and small rooms. KAT Walk C provides diverse scenario support, including sit-down interaction for vehicle control gaming (www.kat-vr.com/products/kat-walk-c2).
Spacewalker VR
Spacewalker VR is a VR treadmill in the prototyping phase by SpaceWalkerVR that supports forward and backward (mono-directional) movement only. The system is equipped with a gyroscope that detects user movements such as walking, running, shooting, and picking up objects. The safety system uses integrated pressure sensors to automatically regulate speed so that the user is prevented from falling (https://spacewalkervr.com/).
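The pressure-sensor safety behavior described above can be sketched as a proportional controller: the further the sensed pressure center drifts from the treadmill's middle, the more the belt slows down. This is a hypothetical illustration; the gains, thresholds, and control law are assumptions, not SpaceWalkerVR's actual algorithm:

```python
# Hypothetical proportional speed regulation from pressure-sensor data:
# belt speed is scaled down as the user drifts off the treadmill center.

def regulate_speed(target_speed, center_offset, max_offset=0.3, gain=1.0):
    """Belt speed (m/s) scaled by how far off-center the user is (m)."""
    ratio = min(abs(center_offset) / max_offset, 1.0)  # 0 = centered
    return max(target_speed * (1.0 - gain * ratio), 0.0)

print(regulate_speed(2.0, 0.0))    # centered -> full speed: 2.0
print(regulate_speed(2.0, 0.15))   # halfway off-center -> 1.0
print(regulate_speed(2.0, 0.35))   # past the safety limit -> 0.0 (stop)
```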
Infinadeck
Infinadeck is a 3D motion treadmill supporting VR applications, developed by Infinadeck. Infinadeck includes an active wireless control system and enables 360 degrees of movement through an omnidirectional moving floor (www.infinadeck.com/).
Summary
This chapter provided details on metaverse and XR systems: the characteristics of XR, MR, AR, and VR were first described, followed by XR system components and the processing workflow. Then details on the hardware engines, software platforms, haptic devices, and 3D motion treadmills were provided. These advanced XR technologies will continue to evolve and drive significant improvements in metaverse XR services. In Chapter 3, XR head mounted display (HMD) technologies are introduced, including XR device types, XR operation processes, and XR feature extraction and description algorithms.