Creating a mixed reality app

Microsoft defines an MR application as an application developed for the Universal Windows Platform on Windows 10 that makes use of the holographic rendering, gaze, gesture, motion, and voice APIs. In this chapter, and in the next, we will focus on building MR applications using DirectX 11, C#, and SharpDX (a C# wrapper around DirectX); alternatively, you can follow along using C++.

DirectX is a large topic in its own right; it is not the intention of this book to cover DirectX, but rather to use it as the platform on which the examples presented in this book are built. For a more comprehensive look at DirectX from C#, I recommend Direct3D Rendering Cookbook by Justin Stenning.

We will start by walking through the Visual C# Holographic DirectX 11 App template, focusing on the parts most relevant to HoloLens, before moving on to extend it for this chapter's example.

To begin with, ensure that you have all the dependencies installed; details of these can be found on the official Microsoft website at https://developer.microsoft.com/en-us/windows/mixed-reality/install_the_tools.

This book uses the following:

Next, launch Visual Studio and create a new project by selecting the File | New Project menu item. With the New Project dialog open, expand Templates | Visual C# | Windows | Universal | Holographic and select the Holographic DirectX 11 App (Universal Windows) template, as shown in the following screenshot:

The project template renders a rotating cube at a world-locked position 2 meters in front of the user. The user can reposition the cube by performing an air-tap gesture, which places it 2 meters in front of the user's gaze. Along with demonstrating best practices for HoloLens development, the template contains a set of helpful utility classes, bundled in the Common namespace, that you can take advantage of in your own projects.

With the project now created, let's explore some of the components unique to HoloLens, starting with the HolographicSpace class. When the CoreWindow is (re)created, a corresponding HolographicSpace is instantiated; it is responsible for full-screen rendering, camera data, and access to the spatial reasoning APIs.

The HolographicSpace is created within the SetWindow(CoreWindow window) method of the AppView class, as shown in the following code snippet:

public void SetWindow(CoreWindow window)
{
    // ...

    // Create the holographic space for the given window
    holographicSpace = HolographicSpace.CreateForCoreWindow(window);

    // Hand it to the device resources and the main application class
    deviceResources.SetHolographicSpace(holographicSpace);

    main.SetHolographicSpace(holographicSpace);
}
The official documentation describes the HolographicSpace as a representation of a holographic scene, with one or more holographic cameras rendering its content. It essentially controls the rendering pipeline for holographic applications.

Once the HolographicSpace is instantiated, it is bound to the deviceResources object, which in turn uses it to create the DXGI adapter (DXGI being the API used to access the video hardware). The HolographicSpace is also passed to the main application class, HolographicAppMain, where the application registers for the CameraAdded and CameraRemoved events. The following snippet shows an extract of the SetHolographicSpace method of HolographicAppMain:

public void SetHolographicSpace(HolographicSpace holographicSpace)
{
    this.holographicSpace = holographicSpace;

    // ...

    // Subscribe to camera lifecycle events so that per-camera
    // resources can be created and released
    holographicSpace.CameraAdded += this.OnCameraAdded;
    holographicSpace.CameraRemoved += this.OnCameraRemoved;
}

Device-based resources need to be created and disposed of for each camera; this is the responsibility of the application and is performed within the CameraAdded and CameraRemoved event handlers, respectively, as sketched below.
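The following is a simplified sketch of what those two handlers typically look like; the AddHolographicCamera and RemoveHolographicCamera helpers are part of the template's DeviceResources class, and the details here are illustrative rather than a verbatim extract. The deferral tells the system to hold off using the camera until its resources have been allocated:

public void OnCameraAdded(HolographicSpace sender,
    HolographicSpaceCameraAddedEventArgs args)
{
    // Defer until the camera's back buffer and related resources exist
    Deferral deferral = args.GetDeferral();
    HolographicCamera holographicCamera = args.Camera;

    Task.Run(() =>
    {
        deviceResources.AddHolographicCamera(holographicCamera);
        deferral.Complete();
    });
}

public void OnCameraRemoved(HolographicSpace sender,
    HolographicSpaceCameraRemovedEventArgs args)
{
    // Release the camera's device-based resources
    deviceResources.RemoveHolographicCamera(args.Camera);
}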

The other significant role the HolographicSpace plays is in spatial reasoning. As the name suggests, spatial reasoning is concerned with predicting the position and orientation of the user's head (and therefore the device) at a given time; this head pose essentially drives our cameras (the view matrices) and is used to accurately position the holograms. The details of spatial reasoning are encapsulated in the HolographicFrame class, which is created each frame within the Update method, as in the following extract:

public HolographicFrame Update()
{
    // ...

    HolographicFrame holographicFrame =
        holographicSpace.CreateNextFrame();

    // ...
}
For those unfamiliar with the use of matrices in 3D programming: matrices are used to transform points (vertices) between different spaces. In 3D, we normally deal with the model, world, and view matrices. The model matrix belongs to a single object, for example, a cube; it encapsulates (as all transformation matrices do) position, rotation, and scale. The world matrix defines where and how the object is placed inside the scene. Finally, the view (or camera) matrix transforms positions so that they are expressed relative to the camera.

They are used extensively in 3D programming, and throughout this book, so if you are unfamiliar with these concepts, I strongly recommend reading some material on 3D math for game programming, such as Mathematics for 3D Game Programming and Computer Graphics by Eric Lengyel.
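As a minimal illustration (a sketch using SharpDX's math types, not code taken from the template), the following composes a model transform and a view transform, and pushes a local-space vertex through both:

using SharpDX; // math types: Matrix, Vector3, MathUtil

// Model: a 45-degree yaw, then a translation 2 meters along -Z
Matrix model = Matrix.RotationY(MathUtil.PiOverFour) *
               Matrix.Translation(0.0f, 0.0f, -2.0f);

// View: a camera at the origin looking down -Z (right-handed)
Matrix view = Matrix.LookAtRH(Vector3.Zero, -Vector3.UnitZ, Vector3.Up);

// Transform a local-space vertex into world space, then view space
Vector3 local = new Vector3(0.1f, 0.1f, 0.1f);
Vector3 world = Vector3.TransformCoordinate(local, model);
Vector3 eye = Vector3.TransformCoordinate(world, view);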

The HolographicFrame exposes the HolographicFramePrediction and is used together with the SpatialCoordinateSystem (obtained from the reference frame, as we will see shortly) to understand the user's current pose; let's take a look at each of these in turn.

The HolographicFramePrediction encapsulates, for each active camera, the camera properties and predicted pose. The camera properties are used to (re)create the associated buffers, while the pose is used to update the view and projection matrices each frame.
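As a quick sketch of what the prediction exposes (illustrative only, not an extract from the template):

HolographicFramePrediction prediction = holographicFrame.CurrentPrediction;

// The timestamp for which the camera poses were predicted
PerceptionTimestamp timestamp = prediction.Timestamp;

foreach (HolographicCameraPose cameraPose in prediction.CameraPoses)
{
    // Per-camera properties used when (re)creating buffers
    var viewport = cameraPose.Viewport;
    double nearPlane = cameraPose.NearPlaneDistance;
    double farPlane = cameraPose.FarPlaneDistance;
}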

The other object is the SpatialCoordinateSystem. HoloLens takes a unique approach to managing coordinate systems, a concept we will frequently revisit throughout this book. Unlike a purely virtual environment, which consists of a single coordinate system, HoloLens manages multiple spaces, creating a coordinate system for each; these, in turn, are used by the device to reason about the position and orientation of the holograms. The following figure illustrates this concept, where the VR environment consists of a single frame of reference, while the MR environment consists of one or more:

The following code extract shows an extended version of the preceding Update method, including references to the HolographicFramePrediction and SpatialCoordinateSystem, with snippets showing how the HolographicFramePrediction is used:

public HolographicFrame Update()
{
    HolographicFrame holographicFrame =
        holographicSpace.CreateNextFrame();

    HolographicFramePrediction prediction =
        holographicFrame.CurrentPrediction;

    // Make sure each camera's buffers exist and are the right size
    deviceResources.EnsureCameraResources(holographicFrame, prediction);

    SpatialCoordinateSystem currentCoordinateSystem =
        referenceFrame.CoordinateSystem;

    // ...

    // Reposition the hologram when an air-tap has been detected
    SpatialInteractionSourceState pointerState =
        spatialInputHandler.CheckForInput();
    if (null != pointerState)
    {
        spinningCubeRenderer.PositionHologram(
            pointerState.TryGetPointerPose(currentCoordinateSystem)
        );
    }

    timer.Tick(() =>
    {
        // ...
    });

    // Set the focus point for each camera to aid image stabilization
    foreach (var cameraPose in prediction.CameraPoses)
    {
        HolographicCameraRenderingParameters renderingParameters =
            holographicFrame.GetRenderingParameters(cameraPose);

        renderingParameters.SetFocusPoint(
            currentCoordinateSystem,
            spinningCubeRenderer.Position
        );
    }

    return holographicFrame;
}

It is not the intention of this chapter to go into the template in any depth, but rather to provide a top-level tour of where and how the major components are used. I encourage you to read Microsoft's official documentation on the HoloLens developer site and examine the code.

Similarly, the following is an extract of the Render method, showing snippets of how HolographicFramePrediction is being used:

public bool Render(ref HolographicFrame holographicFrame)
{
    // ...

    // Refresh the prediction as close to rendering as possible
    holographicFrame.UpdateCurrentPrediction();
    HolographicFramePrediction prediction =
        holographicFrame.CurrentPrediction;

    return deviceResources.UseHolographicCameraResources(
        (Dictionary<uint, CameraResources> cameraResourceDictionary) =>
        {
            bool atLeastOneCameraRendered = false;

            foreach (var cameraPose in prediction.CameraPoses)
            {
                CameraResources cameraResources =
                    cameraResourceDictionary[cameraPose.HolographicCamera.Id];

                // ...

                // Update the view/projection buffer for this camera's pose
                cameraResources.UpdateViewProjectionBuffer(deviceResources,
                    cameraPose, referenceFrame.CoordinateSystem);

                // Attach it to the pipeline; only render if the camera
                // is ready this frame
                bool cameraActive =
                    cameraResources.AttachViewProjectionBuffer(deviceResources);

                if (cameraActive)
                {
                    spinningCubeRenderer.Render();
                }
                atLeastOneCameraRendered = true;
            }

            return atLeastOneCameraRendered;
        });
}

There is a lot going on, but for the purpose of our discussion, let's narrow our focus to how HolographicFramePrediction and SpatialCoordinateSystem are used. Within the Update method, the SpatialCoordinateSystem is used to determine the user's position and pose, but only when an air-tap gesture is detected. We will return to the air-tap gesture later in this chapter. Within the Update method, we also set the focus point for each camera; this is an optional step that can further improve tracking quality by having HoloLens prioritize stabilization near where your holograms reside.

To ensure that holograms remain locked in position, HoloLens performs image stabilization, compensating for movement of the display around the focal plane. A single plane, called the stabilization plane, is used to maximize stabilization for the holograms in its vicinity. HoloLens positions the stabilization plane automatically, but this can be overridden by the developer, as seen earlier. SetFocusPoint has multiple overloads, allowing more granular control over how the stabilization plane behaves, such as specifying the normal and velocity relative to the holograms.
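For illustration, here is a sketch of the richer overloads; the normal and velocity values below are illustrative (Vector3 here is System.Numerics.Vector3), not taken from the template:

// Plane positioned at the cube, with the cube treated as stationary
renderingParameters.SetFocusPoint(
    currentCoordinateSystem,
    spinningCubeRenderer.Position,   // a point on the stabilization plane
    new Vector3(0.0f, 0.0f, 1.0f),   // normal pointing back toward the user
    Vector3.Zero                     // the hologram's velocity, in m/s
);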

Now, let's turn our attention to the Render method. To minimize latency between the user's actual pose and the rendered view, UpdateCurrentPrediction is called on the HolographicFrame, ensuring that we have the most up-to-date prediction for rendering.

Next, we iterate over each camera, updating the associated buffers (the render target and depth-stencil views) on the context, including updating the view and projection matrices for the given camera's current pose and coordinate system.
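Inside UpdateViewProjectionBuffer, the template derives those matrices from the camera pose; the following is a simplified sketch of the two key calls (cameraPose and coordinateSystem are the arguments passed into the method, and details such as the constant buffer update are omitted):

// The view transform may be unavailable, for example while tracking is lost
HolographicStereoTransform? viewTransform =
    cameraPose.TryGetViewTransform(coordinateSystem);

// The projection transform is always available for the pose
HolographicStereoTransform projectionTransform =
    cameraPose.ProjectionTransform;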

So far, we have briefly mentioned how HoloLens differs from purely virtual environments with respect to coordinate systems, without delving into how a coordinate system is established. We will now revisit this and dive a little deeper into how it works.

In order to position holograms, we need to establish a central point of reference such that the position (0, 0, 0) has the same meaning for all holograms in your environment. This central point of reference comes from a frame of reference created by the SpatialLocator, a class responsible for tracking the motion of HoloLens. If you examine the SetHolographicSpace method of the HolographicAppMain class, you can see where this happens, as shown in the following code snippet:

public void SetHolographicSpace(HolographicSpace holographicSpace)
{
    // ...

    // Obtain the default locator and listen for tracking state changes
    locator = SpatialLocator.GetDefault();
    locator.LocatabilityChanged += this.OnLocatabilityChanged;
}

As you can see, you obtain the SpatialLocator via the static GetDefault() method; with a reference to the SpatialLocator, we can register to be notified of tracking state changes, such as when the device loses its ability to track, in which case you won't be able to reliably position the holograms.
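A sketch of such a handler, keying off the sender's Locatability property, might look like the following (illustrative, not an extract from the template):

private void OnLocatabilityChanged(SpatialLocator sender, object args)
{
    switch (sender.Locatability)
    {
        case SpatialLocatability.PositionalTrackingActive:
            // Full positional tracking; holograms can be world-locked
            break;

        case SpatialLocatability.PositionalTrackingInhibited:
        case SpatialLocatability.OrientationOnly:
            // Positional tracking is unavailable; consider hiding
            // world-locked content or informing the user
            break;
    }
}

Using the SpatialLocator, we can also create a frame of reference, which acts as the central point of our digital world; holograms in proximity will be positioned and orientated relative to it. The following extract shows where this is done: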

public void SetHolographicSpace(HolographicSpace holographicSpace)
{
    // ...

    // Anchor the origin of our world at the device's current location
    referenceFrame =
        locator.CreateStationaryFrameOfReferenceAtCurrentLocation();
}

With the SpatialLocator.CreateStationaryFrameOfReferenceAtCurrentLocation() call, we create what is essentially a world-locked anchor whose origin is the current position of the HoloLens; remember that everything using the associated coordinate system is positioned and orientated relative to this point. CreateStationaryFrameOfReferenceAtCurrentLocation has overloads that allow you to further offset and orientate the reference frame relative to the device.
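For example, the following sketch (with illustrative values, not from the template) places the origin 1 meter in front of the device's current position:

// relativePosition is expressed relative to the device's current pose,
// so -1 on Z is 1 meter along the user's initial gaze direction
referenceFrame = locator.CreateStationaryFrameOfReferenceAtCurrentLocation(
    new System.Numerics.Vector3(0.0f, 0.0f, -1.0f));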

The alternative to the SpatialStationaryFrameOfReference (shown in the preceding section) is the SpatialLocatorAttachedFrameOfReference, created via the CreateAttachedFrameOfReferenceAtCurrentHeading() method of the SpatialLocator class. Unlike the stationary frame of reference, the attached frame of reference, as the name suggests, is attached to the HoloLens device. The implication of this is that holograms using its coordinate system are positioned relative to the device. It is worth noting that the frame of reference has a fixed orientation, that is, it does not rotate with the device. In this example, we rely on the stationary frame of reference; we explore the attached frame of reference in Chapter 3, Assistant Item Finder Using DirectX.
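As a brief sketch of how an attached frame of reference might be created and resolved (illustrative only, assuming a prediction from the current frame; we will use this properly in Chapter 3):

SpatialLocatorAttachedFrameOfReference attachedFrame =
    locator.CreateAttachedFrameOfReferenceAtCurrentHeading();

// An attached frame must be resolved to a coordinate system for a
// specific timestamp, typically the current frame's prediction
SpatialCoordinateSystem coordinateSystem =
    attachedFrame.GetStationaryCoordinateSystemAtTimestamp(
        prediction.Timestamp);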

The next natural question is how we can use this to position holograms, which leads us nicely to a discussion of spatial coordinate systems. As mentioned earlier, in a purely virtual space your objects are positioned relative to a single fixed point (position and orientation). HoloLens likewise positions holograms relative to a reference point, but unlike a purely virtual space, it needs to reason about the holograms' position and pose relative to multiple reference points (coordinate systems), which are anchored in the real world. In this example, the coordinate system is used to obtain a SpatialPointerPose via the SpatialInteractionSourceState, as shown in the following code snippet:

public HolographicFrame Update()
{
    // ...

    SpatialCoordinateSystem currentCoordinateSystem =
        referenceFrame.CoordinateSystem;

    SpatialInteractionSourceState pointerState =
        spatialInputHandler.CheckForInput();
    if (null != pointerState)
    {
        // Resolve the pointer pose in our coordinate system and
        // use it to reposition the cube
        spinningCubeRenderer.PositionHologram(
            pointerState.TryGetPointerPose(currentCoordinateSystem)
        );
    }

    // ...
}

With a reference to the SpatialPointerPose, we can obtain the user's head position and facing direction relative to the specified coordinate system. We can see how this is used within the PositionHologram method of the SpinningCubeRenderer, as shown in the following code snippet:

public void PositionHologram(SpatialPointerPose pointerPose)
{
    if (null != pointerPose)
    {
        Vector3 headPosition = pointerPose.Head.Position;
        Vector3 headDirection = pointerPose.Head.ForwardDirection;

        // Place the cube 2 meters along the user's gaze direction
        float distanceFromUser = 2.0f; // meters
        Vector3 gazeAtTwoMeters =
            headPosition + (distanceFromUser * headDirection);

        this.position = gazeAtTwoMeters;
    }
}

In the preceding code snippet, we can see how the cube is being positioned 2 meters in front of the user's gaze using Head.Position and Head.ForwardDirection of the SpatialPointerPose instance.

The final piece we will briefly discuss in this section before moving on to the example is the SpatialInteractionManager class and how it is used here. Unlike a traditional desktop or laptop, HoloLens is not intended to have an external keyboard or mouse readily available; it therefore relies heavily on gaze, gestures, and voice as its primary modes of input. These lend themselves well to a multimodal interaction model, where multiple sources can be used to achieve the same task, or used together to achieve a particular task; a simple example is selecting a hologram, which can be done via an air-tap gesture or via the select voice command. SpatialInteractionManager is the class responsible for managing these different modes of input and mapping them to a consistent gesture that can be handled by your application. To make this more concrete, let's peek inside the SpatialInputHandler class to see how this example makes use of it:

public class SpatialInputHandler
{
    private SpatialInteractionManager interactionManager;
    private SpatialInteractionSourceState sourceState;

    public SpatialInputHandler()
    {
        interactionManager = SpatialInteractionManager.GetForCurrentView();
        interactionManager.SourcePressed += this.OnSourcePressed;
    }

    public SpatialInteractionSourceState CheckForInput()
    {
        // Hand back the latest state and clear it, so that each
        // press is only consumed once
        SpatialInteractionSourceState sourceState = this.sourceState;
        this.sourceState = null;
        return sourceState;
    }

    public void OnSourcePressed(SpatialInteractionManager sender,
        SpatialInteractionSourceEventArgs args)
    {
        sourceState = args.State;
    }
}

You obtain a reference to the SpatialInteractionManager via the static GetForCurrentView method. With access to it, you can register to be notified of specific types of interaction; in this instance, the SourcePressed event is assigned a listener, which is fired when a source enters the pressed state. The delegate is passed an instance of the SpatialInteractionSourceEventArgs class, which references a SpatialInteractionSourceState encapsulating the details of the event, such as the source (hand, voice, or controller) and its associated properties.
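For instance, if you only wanted to react to hand input, the handler could filter on the source's kind; the following is a hypothetical variation, not part of the template:

public void OnSourcePressed(SpatialInteractionManager sender,
    SpatialInteractionSourceEventArgs args)
{
    // Only capture presses originating from the user's hand (air-taps)
    if (args.State.Source.Kind == SpatialInteractionSourceKind.Hand)
    {
        sourceState = args.State;
    }
}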

We have only lightly touched on the SpatialInteractionManager, but we will revisit it frequently throughout this book to discuss and experiment with it.

This now concludes our whirlwind tour of the template and sets us up nicely to walk through the example for this chapter, but before moving on, now is a good time to build and deploy the template to see the result of all the code we have been examining. When built and deployed, you should see a colored cube floating 2 meters in front of you, as illustrated:
