Technology in this industry is advancing rapidly, evolving and changing daily. Companies are winning major funding, being purchased, merging, and sometimes admitting defeat. However, many of the ideas and functionalities remain consistent and are rooted in older technologies.
IS ANY OF THIS ACTUALLY NEW? Nothing good ever happens fast. Breakthroughs over the last century have created stepping stones leading toward XR.
THE WORLD IN THE PALM OF YOUR HAND Mobile AR holds great power and has an impressive future. Many people already own a mobile device—the quickest entry point into an augmented experience.
PROJECTION MAPPING Rooted in decades-old technology, projection has gotten smarter. Adding spatially aware video is an approachable and relatively efficient way to create a strong visual impact using familiar tools.
HEAD-MOUNTED DISPLAYS Placing a computer on your head, and over your eyes, is personal. HMDs are being created in XR to bring digital content into our view.
SPATIAL COMPUTING Screens as we currently know them are starting to evolve as our world is becoming the new interface. Spatial computing helps bridge the gap between our physical and digital spaces to make the experience more human.
In recent years the fields of virtual and augmented reality have drawn a great deal of focus and attention, paralleled by large investments in the development of the technology and in the companies that have emerged as forerunners. It is common to hear about these “new” emerging technologies as if they never existed until now. But the truth is that, as with most technological advances, a great number of steps were taken along the way to get us to where we are today, and to where we will be in the coming years. To trace both the conceptual starting points and the key highlights in history that have led us here, we first have to determine what specifically we are looking for. This is a complex industry, so there are many different components that can be tracked through the past.
If we want to look at the history through the lens of innovative communication, then that can lead us to the first telephone call or even the first radio broadcast—where a person could be in one place and experience something that was happening in another. We could also go back further to explore the inventions that aided in the creation of those technologies. Another path would be to focus on ways we can experience a new reality or one that is different from our current physical one.
Intertwined within the genre of science fiction, we can find concepts and ideas that have served as inspiration to the XR field. That might lead us to the many written works where authors imagined utopian worlds and explored early versions of what we know as AR today. In his 1901 book, The Master Key, L. Frank Baum described augmented markers:
I give you the Character Marker. It consists of this pair of spectacles. While you wear them every one you meet will be marked upon the forehead with a letter indicating his or her character. The good will bear the letter “G,” the evil the letter “E.” The wise will be marked with a “W” and the foolish with an “F.” The kind will show a “K” upon their foreheads and the cruel a letter “C.” Thus you may determine by a single look the true natures of all those you encounter.
Though this was just a written idea, it vividly describes the layering of information into your physical space using a pair of glasses—the fundamental concept of augmented reality as we know it today. It is an early example of imagining an experience of something that is not actually present in your own physical space.
Looking for early precursors of XR, as well as areas of thought influence such as sci-fi narratives, can lead you to many interesting connections. It can be fascinating to consider these outside influences alongside your current knowledge as you continue to explore and research XR. The timeline shown in FIGURE 2.1 focuses on highlights when technology was used to enhance an experience for humans through the combination of physical and digital worlds. Other examples will be shared in context as you read through the pages of this book, but the focus is intended to help you see the pivotal advances that have led to the XR industry today. There are examples that date back to the late 1800s and early 1900s: proof that this field has a robust history. Though we are moving at a much faster rate now than ever before, these ideas and concepts are far from new.
FIGURE 2.1 History of XR. A timeline walking through the history of XR influences and predecessors.
The most important note to take away from this abbreviated history is the chain of advancement. You can see different technologies being used in different ways, each shaping the next one down the line. This is how strides are taken toward innovation—step-by-step and in smaller pieces. Usually we take note of the larger advancements, especially those made noteworthy by the visibility of the companies, such as Apple, Facebook, or Google, that launch the products. It is not often discussed that those product launches are the culmination of a number of smaller advancements, often all combined into one. When a larger company backs an idea or product with a large amount of money, it is sure to grab the attention of others. It ultimately becomes a puzzle of putting the right technologies and functionalities together to create the next big technological breakthrough. As we break down some of the main functionalities happening right now in XR, you can continue to see the chain of influence: how current technologies are connected, both looking back and also pointing toward mainstream acceptance in our future.
I had the opportunity to visit Ireland a few years ago to speak at a conference, and while I was there, I decided to take a trip to the Cliffs of Moher to do some sightseeing and find some final inspiration in preparation for my talk the next day. There I was surrounded by majestic views and an amazing atmospheric soundscape. I found a spot and sat down for a while sketching out ideas and just observing. It was in that spot where I had a breakthrough about AR becoming accepted and becoming mainstream. As I watched people from all over the world enjoy the same scenes as I was, I saw one consistency over and over: When someone saw something beautiful that captivated them, the first thing they did was hold up their phone and place it between themselves and the landscape to take a photo. People have already accepted this device into their everyday lives, and they are already willing to hold it up to engage with it throughout the highest moments of their day. That is the power of mobile AR.
A large challenge with VR is that it is harder to achieve this kind of mainstream adoption. The headset cost is still high, and with many headsets needing to be tethered to a desktop computer, the lack of freedom becomes limiting. However, with AR, you can literally place it into the palm of someone’s hand, and they can take it with them wherever they go. They don’t even need any additional equipment, because the technology is built right inside the smartphone that they already carry around and don’t go anywhere without. In fact, their smartphone is so much a part of their lives that they repeatedly hold it up and engage with it in their current reality.
Mobile AR relies on the existing camera in our phones. This approach is called video pass-through because it uses a camera to digitize the world in real time. With the camera view occupying the full phone screen, augmented elements and information can be overlaid right on top. Because this doesn’t require any additional device, people are more likely to use it. There are a number of different ways you can engage with AR on a smartphone.
Video pass-through A technique that combines a physical and digital environment through the use of a camera and the pixels on its screen.
The opposite of the video pass-through option is optical see-through (also known as OST), which provides a see-through lens with digital content projected onto or built into the lens. This allows you to see the actual physical world without looking through a camera view. Due to the challenges of OST technology, video pass-through, as used in mobile AR, has become more common.
Optical see-through An additive technique to overlay digital content onto a see-through lens allowing a real view of the physical world.
One of the first ways you might interact with AR on a mobile device is by encountering a new feature added to an experience you already use and know. For example, if you are shopping for your home on Amazon, Ikea, Wayfair, Crate & Barrel, or many of your other favorite online shopping sites, you may see a feature that allows you to “see it in your space.” Though the wording may vary per experience, the concept is the same. Once you choose this option, you have to grant permission for the app to use your camera. Next, you select a plane, or surface, where you would like to see the product. For example, if you are shopping for a rug, you tap the floor (plane) in your camera view. If you are shopping for a lamp, you select the tabletop. Then you see the product displayed, sometimes quite convincingly, in your space. You can capture a photo of this imagined reality to keep or to share for a second opinion. This feature, although relatively simple, is a powerful way to engage buyers and to help them select the best product for their space. It has also proven to reduce the number of returns.
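To make that workflow concrete, here is a minimal sketch of how such a “see it in your space” feature can be built on iOS with Apple’s ARKit framework. The view controller setup and the “rug.scn” model file are illustrative assumptions, not any retailer’s actual implementation.

```swift
import UIKit
import ARKit
import SceneKit

// A minimal "see it in your space" sketch: detect a horizontal plane,
// then place a product model where the user taps.
class PlacementViewController: UIViewController, ARSCNViewDelegate {
    let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        sceneView.delegate = self
        view.addSubview(sceneView)

        // Detect horizontal planes (floors, tabletops) in the camera feed.
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal]
        sceneView.session.run(configuration)

        // Place the product where the user taps.
        let tap = UITapGestureRecognizer(target: self, action: #selector(placeProduct(_:)))
        sceneView.addGestureRecognizer(tap)
    }

    @objc func placeProduct(_ gesture: UITapGestureRecognizer) {
        let point = gesture.location(in: sceneView)

        // Ask ARKit for a real-world surface under the tapped screen point.
        guard let query = sceneView.raycastQuery(from: point,
                                                 allowing: .existingPlaneGeometry,
                                                 alignment: .horizontal),
              let result = sceneView.session.raycast(query).first else { return }

        // Load a 3D model of the product (hypothetical asset name) and
        // anchor it at the tapped position on the detected plane.
        guard let productScene = SCNScene(named: "rug.scn"),
              let productNode = productScene.rootNode.childNodes.first else { return }
        productNode.simdTransform = result.worldTransform
        sceneView.scene.rootNode.addChildNode(productNode)
    }
}
```

From the user’s point of view, this is exactly the flow described above: grant camera access, point at the floor or tabletop, tap, and see the product appear in place.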
Social media and messaging apps are other places where you can see AR as an integrated feature. Snapchat created filters that add different designs on top of a snap allowing you to add everything from floral crowns to the head of your favorite animal or animated character. The Memoji feature, which is integrated into messaging apps on Apple devices, allows you to send a message using your detailed facial expressions applied to a character of your choice. Both of these examples use facial tracking that is reliant on front-facing, depth-enabled phone cameras.
If you are looking to have a full AR experience on a mobile device, you will likely have to download a specific application. Ikea, as an example, has a standalone app called IKEA Place. With it, shoppers can see the company’s products in AR. Another AR app for mobile devices, PeakVisor, identifies mountain peaks and their elevations using your phone’s camera, among other features (FIGURE 2.2). Have you ever been on a hike and wondered which mountain ranges you could see in the distance? This app has the solution. Beyond just overlaying the names of mountains on the surrounding view, the app also tells you the elevation of each peak using a 3D compass and altimeter. To offer all of these functions, PeakVisor is a downloadable app. In addition, it enables hikers (or anyone) to download maps, so the app can still provide information on peaks and hiking trails when you’re exploring beyond the range of an internet connection. A web-based application would be unable to do this.
FIGURE 2.2 PeakVisor. PeakVisor mixes augmented reality with virtual reality (3D maps). You can get relevant information on what you see now (AR) and plan your adventures (celestial bodies simulation, mountain flyovers, and more).
Designer: PeakVisor team
The downloadable mobile app approach has several benefits. One is that a mobile application can leverage the robust functionalities of the device itself and its AR framework. Since mobile applications are installed directly onto a device, they have the full benefit of the phone’s processing power and are not limited to the processing power of a web browser, which web-based AR experiences rely on. In addition, web apps cannot use the depth-enabling sensors and cameras that mobile apps can tap into. Depending on the functionality of a mobile app, the user might not need a strong internet connection or network coverage for it to work successfully. Another benefit is that mobile apps become their own small ecosystems. People engage with the app on its own, without the pull of other browser windows and links. When people launch an app, it is like they are entering that world and that space, and this can help focus their attention.
Creating a native AR app also has its drawbacks. The first is that developing mobile apps is a complex undertaking. The full experience has to be developed and then maintained. These apps also need to be built for Android and iOS using different programming languages and AR frameworks: Android uses ARCore, and iOS uses ARKit. Though they can be designed with both platforms in mind to be more efficient, two separate applications must be developed and maintained to reach the widest audience. This can be costly in resources. This approach also has a built-in barrier, as users have to stop and download the app before they can proceed. As we will explore next, WebAR is a solution that provides a faster and more accessible experience to get users quickly engaged.
It is possible that your project won’t have the budget for a full standalone app—or it might not be needed for the scope of your project. Luckily, another solution has emerged as a viable option. Although the technology is not moving at the usual fast pace, WebAR is now at the point where you should consider it as an option. With this option, the AR experiences are built into a website. The biggest advantage is that you can send people directly to an experience without the need to download anything. This is great for a faster entry, eliminating the friction of users stopping and waiting to download an app, or the risky interval when users consider whether your experience is worth the effort of adding a new app to their device.
This approach allows for interactive experiences anywhere. Links and scannable codes can quickly connect users to immersive experiences from posters, signs, billboards, cards, presentations—you name it. Within seconds everyone in that space can be fully immersed in an AR experience. For projects with smaller budgets, both in time and money, a huge benefit of WebAR is cross-platform capability, as it runs in the browser. As mentioned, standalone apps require separate versions for Android and iOS devices, built with different programming languages and different frameworks. Avoiding that hurdle alone is a huge benefit of the web-based experience.
Of course, there are pros and cons to everything. With WebAR the main downsides are the slower rate at which the technology is being developed and the fact that you are limited by the user’s web browser and connection speed. If users are connected to Wi-Fi, they can take advantage of that speed. However, because Wi-Fi is not ubiquitous, 5G coverage is something many XR creators have been excited about. Widely available access to a faster network allows for more robust WebAR experiences.
Two companies have stood out in the WebAR development space. The first is 8th Wall. This California-based company has built an ever-growing software development kit (SDK) that currently focuses on three main functionalities for WebAR experiences: face effects, world tracking, and using image targets to activate an experience (FIGURE 2.3). To develop a fully custom experience, you will need some working knowledge of JavaScript, HTML, and some CSS. However, 8th Wall provides prebuilt templates to help you get started, along with strong support resources. Another added benefit is that all the coding is web-based and stored in the cloud, which allows multiple users to work on the same project. Though the price tag to launch an 8th Wall experience will be less than developing a full app, be sure to read their monthly licensing requirements and account for that in your budget. Later, in Chapter 13, “Bringing It to Life,” we will discuss the role of coding in the designer’s process.
FIGURE 2.3 Jini-Hero. 8th Wall web-based augmented reality experiences work in the browser across iOS and Android smartphones with no app required.
Designer: 8th Wall
Tip
Be sure to check all licensing fees for launching a commercial experience to account for that cost in your project budget.
The second WebAR company that currently stands out is Vectary. The Vectary WebAR platform has the added functionality of including a 3D editor right in the browser. You can create your 3D models and then build the experience all in one place. With 8th Wall, you need to create your models elsewhere and import them. (Vectary also offers the option of uploading your own models, if you prefer.) One benefit of modeling the 3D objects and building the experience in the same space is that you can easily allow for user interactions, such as letting the user see the same product in multiple colors, without having to create each color variation separately elsewhere and then upload it. We will talk more about creating these 3D models in the next chapter, so don’t worry about that just yet, but it is important to know what functionalities are possible so you can plan accordingly.
There are many ways to experience AR on your mobile device, and there have been some advancements in the camera technology in recent years. Apple added a LiDAR (light detection and ranging) scanner in the iPhone 12 Pro family, and even in the iPad Pro (FIGURE 2.4). This is important for two main reasons. The first is that it places smarter XR capabilities in devices that millions will own, taking another step toward more acceptance and use of AR because no additional equipment is needed. The other exciting part about this is that it represents a giant step toward the release of the highly anticipated Apple AR glasses. This technology shows that Apple has figured out how to create powerful long-range space detection in a small laser. It is so small you may not even notice it at first. You can determine if your iPhone has this sensor by checking out your camera; look for a small black circle the same size as your white flash.
FIGURE 2.4 LiDAR Sensor. An iPhone 12 Pro rear view showing the LiDAR-equipped camera. The LiDAR scanner, the black circle the same size as the white flash, is used for 3D scanning and augmented reality.
To many who just use their cameras to take photos and videos, the benefit will be improvement of clarity in low-light conditions. But for those who try an AR experience, the LiDAR-equipped camera can be used for much more.
You may be familiar with Apple’s TrueDepth camera, which was introduced to enable facial recognition on the iPhone X. This technology is similar to LiDAR; however, the biggest difference is range. TrueDepth uses short-range infrared lasers that reach only a few feet, which works well when your face is near your phone. The LiDAR scanner on the iPhone 12 Pro works by pulsing infrared light at objects up to roughly 16 feet (5 meters) away. Once it hits an object, the light reflects back to the sensor, which measures the distance from the time it took the light to make the trip: the longer the infrared light takes to travel to the object, the farther away the object is. This is also called a time-of-flight camera or 3D laser scanning. The sensor sends light out in all directions and can create a mesh, or 3D map, of the space or object. This can then be used to learn about the environment or even to create a 3D replica of a scanned object, depending on the use. Knowing that there is a wall or a table in a space allows for more understanding of the relationship between that physical space and the digital objects that will augment it. Digital objects can start to appear behind or under physical objects, for example. This understanding of depth is called occlusion.
Occlusion When one object blocks another object from view; handling this correctly helps maintain a user’s feeling of immersion.
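To make the time-of-flight idea concrete, here is a quick back-of-the-envelope sketch of the math involved. The numbers are illustrative and not tied to any particular sensor’s specification.

```swift
import Foundation

// Back-of-the-envelope time-of-flight math: distance is the round-trip
// travel time of the light pulse multiplied by the speed of light, halved.
let speedOfLight = 299_792_458.0             // meters per second

func distance(fromRoundTripTime t: Double) -> Double {
    return (speedOfLight * t) / 2.0          // meters
}

// An object about 5 meters (roughly 16 feet) away reflects the pulse back
// in approximately 33 nanoseconds.
let roundTrip = (2.0 * 5.0) / speedOfLight   // ≈ 3.34e-8 seconds
print(distance(fromRoundTripTime: roundTrip)) // 5.0
```

The sensor repeats this measurement across the scene many times per second, which is what turns a stream of tiny timing differences into a full depth map.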
This may sound familiar to you if you have parking assistance in your car or have had land surveyed. The same LiDAR technology is being used in other devices, including HMDs such as the Microsoft HoloLens 2. It is a powerful tool to help create a 3D picture of objects and their environment. As of this writing, other smartphones have tried to incorporate a similar concept, but not on such an advanced level. Samsung might be close, as they have submitted a trademark application for the ISOCELL Vizion 33D, which mentions the use of this time-of-flight sensor.
While you may feel like you have a good understanding of the capabilities of your smartphone, it is important to know about two built-in features that you might use frequently but may not have thought about. Along with your Wi-Fi and network coverage, the camera, and the processing power, two other features play a big role in mobile AR: the gyroscope and the accelerometer (FIGURE 2.5). They often work together, especially for any kind of motion tracking; however, they have two very distinct functions.
FIGURE 2.5 Gyroscope and Accelerometer. The combination of the gyroscope and the accelerometer can sense the orientation and velocity of the device using the x, y, and z coordinates.
The gyroscope detects movement such as rotation and spinning.
The accelerometer measures acceleration, or the change in velocity over time.
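If you want to see the raw values these two sensors report, here is a minimal sketch using Apple’s Core Motion framework on iOS. It only prints the readings; AR frameworks such as ARKit and ARCore fuse this data for you behind the scenes.

```swift
import CoreMotion

// Read the gyroscope and accelerometer together through device motion updates.
let motionManager = CMMotionManager()

if motionManager.isDeviceMotionAvailable {
    motionManager.deviceMotionUpdateInterval = 1.0 / 60.0   // 60 samples per second
    motionManager.startDeviceMotionUpdates(to: .main) { motion, error in
        guard let motion = motion else { return }

        // Gyroscope: rotation rate around the x, y, and z axes (radians per second).
        let rotation = motion.rotationRate
        print("rotation:", rotation.x, rotation.y, rotation.z)

        // Accelerometer: acceleration the user applies to the device,
        // with gravity already separated out (in units of g).
        let acceleration = motion.userAcceleration
        print("acceleration:", acceleration.x, acceleration.y, acceleration.z)
    }
}
```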
Both are used for many apps including any map or navigational experience. Within Google’s ARCore, motion tracking is a fundamental concept that helps the phone understand the environment around the user and where the user is in relationship to that space. To continue adding more acronyms to your tool belt, the process for understanding this relationship is called simultaneous localization and mapping, or SLAM. The goal is to identify the phone’s pose.
Pose The position and orientation of the camera within a 3D scene.
To identify the pose of a camera, the phone relies heavily on the gyroscope and the accelerometer. It uses this data to calibrate the virtual world to our physical world; this ensures that the perspective of the digital objects matches our real-world view. Having this match creates a sense of true immersion.
Immersion Having the sense that digital objects belong in our real world.
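As a hedged sketch of how a pose surfaces in practice, ARKit reports the camera pose on every frame as a 4x4 transform computed from the camera image plus gyroscope and accelerometer data; the small delegate below simply reads out the position and orientation it contains.

```swift
import ARKit
import simd

// Read the camera pose that ARKit computes each frame.
class PoseReader: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let camera = frame.camera

        // Position: the translation column of the 4x4 transform matrix,
        // in meters, relative to where the session started.
        let transform = camera.transform
        let position = SIMD3<Float>(transform.columns.3.x,
                                    transform.columns.3.y,
                                    transform.columns.3.z)

        // Orientation: rotation around each axis, in radians.
        let orientation = camera.eulerAngles

        print("position:", position, "orientation:", orientation)
    }
}
```

Every digital object in the scene is then drawn from this constantly updated viewpoint, which is what keeps its perspective matched to your real-world view.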
One last feature or enhancement to keep your eye on is the capability of the chips that are being added to our smartphones. If you are an iPhone user, then you might know about the AirDrop functionality, which makes file sharing among Apple devices in close proximity easy. If someone is close by with the AirDrop feature, Bluetooth, and Wi-Fi turned on, then you can quickly and easily send them files, such as a video, photo, or PDF, all with a tap. The technology that makes this super-speedy is built within the U1 chip, which, interestingly enough, is also now being included in Apple Watches—perhaps a test to see how it could perform in smartglasses?
Tip
If a digital object doesn’t act in the way we expect it to, this is called breaking immersion. Causing this break can erode the user’s trust in the experience.
So, why is this important? Though file sharing is the most visible use of this technology, the potential far exceeds that one use case, because the chip uses a different kind of wireless technology called Ultra Wideband (UWB). Apple refers to this as “GPS at the scale of your living room.” As you start to conceptualize different AR experiences, you may need some way of connecting people who are all in the same space or in close proximity to one another. This could mean having them all connected to a shared experience or all discoverable in some way. Consider the mobile application I developed called tagAR, which is used during real-world, in-person events (where all attendees have the app). It allows you to use your phone to look around at fellow attendees and see augmented name tags displayed above each user’s head. To keep the experience in real time, it is important to track only those who are close to you, instead of all users at an event or those who are using the app elsewhere. Using UWB, a development team could create a small local network for faster and more accurate tracking. The U1 chip is a step toward more robust mobile experiences, and it is already built into the mobile devices that will power them.
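As a rough sketch of what that proximity tracking could look like in code, Apple exposes UWB ranging through its Nearby Interaction framework on U1-equipped iPhones. This is not how tagAR is actually built, and the discovery-token exchange between devices is assumed to happen over some other channel, such as a local network connection.

```swift
import NearbyInteraction

// UWB ranging against one nearby peer with Nearby Interaction (iOS 14+,
// U1-equipped devices). Tokens must be exchanged out-of-band beforehand.
class ProximityTracker: NSObject, NISessionDelegate {
    var session: NISession?

    func start(with peerToken: NIDiscoveryToken) {
        let session = NISession()
        session.delegate = self
        self.session = session

        // Range against the peer identified by its discovery token.
        let configuration = NINearbyPeerConfiguration(peerToken: peerToken)
        session.run(configuration)
    }

    func session(_ session: NISession, didUpdate nearbyObjects: [NINearbyObject]) {
        for object in nearbyObjects {
            // Distance in meters and, when available, a direction vector —
            // enough to decide which attendees are close enough to tag.
            print("distance:", object.distance ?? -1,
                  "direction:", object.direction ?? SIMD3<Float>(0, 0, 0))
        }
    }
}
```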
The future of mobile AR is ever growing. It is completely normal to see people holding up their phones everywhere we go. However, if you have ever tried to record an entire song at a concert or walk across town using GPS, then you know that your hands get tired from holding the device out in front of you for long periods of time. Even though mobile devices are small and very portable, they still have physical limitations. As we see mobile AR continue to advance, we will also be seeing how smartphones become primarily a processor. A head-up display feature, such as smartglasses, will become the screen through which we experience an augmented world.
Projectors, digital or analog, are an example of AR. If you consider what they do, bringing information into our physical space using light, they are a great example of an early augmented reality use case. Projectors used to be big and bulky, but they have become much smaller and more efficient with their bulbs. We expect projectors to display content on designated screens in conference rooms and classrooms, mostly because that is where projectors are often used. However, considering the power of projection, there are many opportunities for expansion. The biggest difference between simple projection and projection mapping is the adaptability of the image to the physical space. In essence, projection mapping enables you to transform any object into a display. It is projection with computer vision. You can display video, images, and audio customized to the environment. If you consider that the screen is a projector’s biggest inhibitor, imagine what happens when you remove the need for the screen, so anywhere can become a display surface. That is the power of projection mapping.
Projection mapping Transforming almost any physical object or surface into a display surface through the use of video projection.
If you are interested in AR and you are looking for a way to see some instant gratification of what it can do, projection mapping is a great introduction. With products like those from Lightform making this process easier, honestly the hardest part is coming up with the idea (FIGURE 2.6). Lightform’s LF2+ projector scans a scene through a process they call “visible structured light.” After connecting the device to the same Wi-Fi network as your computer, you scan the space you want to project onto. Using Lightform Creator you then select the parts of the object or surface you would like to project onto. If you have ever used the selection tools in Adobe Photoshop, you should have no problem selecting your objects. Finally, you find your video file or choose one from their library, and deploy. Within minutes you can have a dynamic projection up and running.
FIGURE 2.6 Lightform. Lightform teamed up with the San Francisco Conservatory of Flowers to bring their nightlife event to another level. Before Lightform, complex plant life would have been an intimidating subject for even the most experienced projection artist, but these plants were no match for the power of Lightform Compute (LFC) scanning technology. Lightform Creator is able to use the most minute details of the scan to create stunning visuals with no special expertise required. (www.lightform.com/projects/living-wall).
Customer: San Francisco Conservatory of Flowers; Company: Lightform; Projector: Epson L1505U & Epson G7500; Software: Lightform Creator; Hardware: LFC
For the most dynamic experience, select individual areas on which to add different digital content, leaving the real world around them untouched. If you just project onto an entire flat wall, you haven’t really tapped into the potential of this system. Projection mapping is a great way to create interactive installations, produce budget-friendly signage, share information, bring art to life, and do just about anything else you can dream up.
Once you start to see how projection can be used, and as it becomes more and more interactive, the use cases will continue to expand. Companies such as argodesign are exploring how interactive projections can make spaces collaborative. Instead of each person looking at an individual screen while working toward one goal, everyone looks at the same space and can move elements around together, making decisions as a group (FIGURE 2.7). Other research has shown how projection can be used to teach infants who are deaf or hard of hearing to communicate with their parents, who may still be learning alongside them. In their 2020 research study, Antony, Blumenthal, Qiu, Tenesaca, Hu, and Bai created technology that projects ASL signs for the nursery rhymes a parent sings to the child, right in their home.1
FIGURE 2.7 Netflix Menu Concept. Interactive Light is a concept, from Mark Rolston and Jared Ficklin of the firm argodesign, that examines using projection combined with computer vision to create digital experiences that can be placed anywhere, incorporate everyday objects as digital controls, and create co-operative multi-user interfaces. Projected light requires that the UI fit into the environment, but also blends calmly into surfaces—if color and contrast are used well.
Designers: Mark Rolston and Jared Ficklin, founding partners at argodesign; Creative technologist: Jarrett Webb, argodesign
1 Projection-Based AR for Hearing Parent-Deaf Child Communication, University of Rochester
One of the benefits of exploring projection mapping as a designer is that you can bring your illustration and motion design skills to life in a new way. It is also exciting to see your work displayed at such a large scale, especially if you have never seen it that way before. Designs and videos can be displayed on the sides of buildings, on small figures on a shelf, on water, on rocks—really, the world becomes your canvas. It is important to experiment with a variety of different surfaces as shape, size, and texture all can influence the experience and how the light interacts with it. If you project into a location where lighting varies, that can also offer some variance in the experience. Be sure to check locations out at different times of day and with different secondary light sources on and off. Scanning works better with full light, but once the scan is made, you can save it and project at any point.
In the next chapter, we will look at more of the workflow for creating 3D content, so stay tuned. For now, here’s a helpful workflow tip specifically for working with projection mapping: When you scan a space using a scanner, it saves a static image of the space. Although you will need to be thinking in terms of time-based design, the static image helps you with two essential things. The first is to provide a visual reference for how to set up your projection installation if you have to move the projector for any reason. The second is that it provides a mockup you can use as you create the content to be projected. This static image can be brought into Adobe After Effects, or other editing software, so you can animate for the exact space, making it easier to connect your physical and digital content in one place.
Let’s say you are at a park. You look up, and to your surprise, you see a ball flying through the air about to hit you. What is the first thing you do? You may have two simultaneous thoughts: to move out of the way and to protect your head. You most likely will do both, but which one first? Your first instinct is likely to lift your arms to protect yourself, just in case you can’t move out of the way fast enough. After posing this same question to a handful of people, I asked them to physically show me what they would do. The first motion each person made was to lift their arms to protect their head.
We are very protective of our heads and especially our faces. It is important to understand this as we discuss wearable technology that will be worn on the face and wrapped around the head. Just as we instinctively protect our heads from injury first, we will be reluctant to place anything we don’t trust or don’t feel comfortable with on our faces. And I haven’t even brought up the fact that we will be trusting our sight to this device. That adds another layer of discomfort to overcome before we will even try one.
For a user to try a head-mounted display (HMD), they need to know what the benefit is for them. Either curiosity or purpose might persuade them to try it out. As we get further into the design process, we will work toward ways to help the user understand the goal of the experience and then look at how the user experience shapes their overall impression. For now, understand that the first 60 seconds are crucial to a successful and positive experience for a new user. To design well for that moment, you need to first understand a bit about this very personal wearable tech.
It truly isn’t as important to know every detail about all the HMDs, because they are going to change constantly over the next few years. What is more important is to understand what they do and how best to design for them. Knowing about the different kinds of wearables lets you select the best option for the experience you are creating. The first clear decision you have to make is whether you are working in VR or AR. From there, you can determine how much spatial computing is needed for a successful AR experience, and you can decide between AR and MR.
Once you have a clearer path to the kind of tech you will be using, you need to know what device you are creating for at the start of the process, so you can tailor your design to the limitations and capabilities of that device. Not all HMDs are made the same, so at this point, there is no one-size-fits-all solution. You will need to design specifically for each headset.
One very distinct way to quickly narrow in on the best gear for the job is to consider how the movement, position, and orientation of the user is tracked. Most HMDs follow one of two main approaches to this tracking process: outside-in tracking or inside-out tracking. The difference is the location of the cameras (FIGURE 2.8). As the terms suggest, with outside-in tracking the cameras are fixed within the space itself and are not attached to the user or any device. This method allows for more precise tracking and can also use small, nearly invisible sensors worn by the user to maintain precision even if the camera can’t see a part of the body that is blocked. However, it is limiting, as it requires people to enter a specific space where the camera sensors are installed, making this a less portable option. A similar idea is used in video-enabled doorbells. The doorbell is in a fixed position, but it has motion-sensing capabilities that alert the homeowner when someone comes near the front door. Users can adjust the sensitivity of the motion sensors to reduce the number of false alarms. In this use case, outside-in tracking makes sense, because the device is only used at the front door, which is not moving anywhere.
FIGURE 2.8 Tracking. A comparison between having external cameras or sensors within a space versus including the sensors on the device itself.
Outside-in tracking Uses external cameras or sensors to track positioning and detect motion.
Inside-out tracking Uses cameras and/or sensors that are within a device to track its position in real-world space.
In the opposing approach, a device using inside-out tracking has onboard cameras or sensors that track the location of the device within a specific space. This requires more of the device itself, but because it is all internal, it is a more portable option. A number of sensors on the device are able to detect the distance between the headset and areas within the space. This is similar to what we talked about with Apple’s use of the LiDAR scanner in mobile AR. Many of these sensors rely on light beams or lasers to identify and re-create the space around the device. Because inside-out tracking uses a simpler setup, it is often the preferred method.
One commonality between these types of tracking is that they are both able to track in real time. Many experiences require precise coordinate tracking, and in the past, that led many to prefer the outside-in tracking as it offered more accurate results. However, with the continued advancements in the technology, inside-out tracking has improved its accuracy. You can now choose the best option for an experience, instead of being limited by the capabilities. Some devices, especially VR and MR headsets, also have companion hand controllers for the experience. These controllers will be tracked in the same way as the headset to help identify their location within the experience.
HMDs are an essential part of VR. Because the user will be fully immersed within the experience, they require a headset of some type to enclose their vision within the virtual space. This headset can be very advanced or made of just cardboard—which Google demonstrated by giving out cardboard headsets at the 2014 Google I/O developer conference. The device, literally made of cardboard, allowed people to use their phones to engage with a VR experience. Despite this DIY VR option, most headsets are much more sophisticated and come with a higher price tag.
When we talk about VR headsets, you will likely hear about Oculus. This company makes headlines for many reasons. Facebook purchased Oculus in 2014 for over $2 billion. This price tag made it very newsworthy, as it not only showed the value of the company but also served as an investment in the future of VR by the social media company. You don’t spend that much money on something without projecting how that cost will be returned in revenue. Facebook also paid additional money to make sure they were able to retain the employees working for the company—probably a smart move. They have continued to advance the technology to the point where they now have a tether-free VR option, the Oculus Quest 2. This is a big advancement for VR, which has often felt constricted by having to be connected to a powerful PC to offload the processing from the headset itself. In a bold statement, Facebook CEO Mark Zuckerberg mentioned that 90% of their users are new to VR. This shows that more and more people are willing to engage in this virtual world. Facebook has continued to explore how VR can be a social engagement space. With stay-at-home orders during the COVID-19 pandemic, it proved to be an even more essential platform for exploring how we can safely enjoy time together, without sharing germs.
Other companies making great advancements in their HMDs that you should keep on your radar are HTC VIVE, Valve, Samsung, Google, Qualcomm, and Sony PlayStation.
There are also AR and MR HMDs (if you are starting to understand all these acronyms, then you are doing great). These are starting to blur the lines between the realities, as well as making it harder to tell headsets and glasses apart. The term smartglasses has emerged to help identify these AR and MR wearables. Smartglasses either use small LED projectors to display information on the lenses, or use display lenses, also called combiners, which are part glass and part digital display. Both approaches let light pass through the lens so the wearer can still see the physical world, but they augment information at a comfortable viewing distance for the eyes in different ways. As you look at different HMDs for AR and MR, pay attention to the field of view. Within these glasses, the field of view determines how many degrees of your view the augmented information can occupy. Glasses with a smaller field of view provide mostly a normal view with a small rectangular display, possibly for only one eye. A larger field of view provides a much more immersive experience. Without any glasses, the human field of view spans around 210 degrees horizontally and 150 degrees vertically. At this time, 50 degrees is considered large for AR/MR HMDs. You can find this field of view in both the Magic Leap 1 and the Microsoft HoloLens 2.
Field of view (FOV) The angular extent, large or small, of the viewing space in which augmented content can appear.
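To make those numbers concrete, here is a quick, hedged calculation of how wide the augmented “window” is at a given viewing distance. The 50-degree figure is the one mentioned above; the rest is simple trigonometry.

```swift
import Foundation

// How wide is the augmented "window" at a given viewing distance?
// width = 2 * distance * tan(fov / 2)
func visibleWidth(fovDegrees: Double, atDistance distance: Double) -> Double {
    let fovRadians = fovDegrees * .pi / 180.0
    return 2.0 * distance * tan(fovRadians / 2.0)
}

// At 2 meters, a 50-degree horizontal FOV covers a band a little under
// 2 meters wide, while unaided human vision (~210 degrees) wraps well
// past your shoulders.
print(visibleWidth(fovDegrees: 50, atDistance: 2))   // ≈ 1.87 meters
```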
In the current state of this tech, you have to balance wanting a larger field of view with being inconspicuous. The wearables that allow a more immersive experience also look much less like normal glasses, typically requiring a band that goes around the full head. Smartglasses can pass for ordinary glasses, but often have a smaller field of view. The Microsoft HoloLens 2 creates holographic images using light diffraction. This headset has gained popularity for remote training and collaboration. For these purposes, a larger device is more acceptable, but as you can imagine, it isn’t something you would wear out and about. The Magic Leap 1 provides a fully immersive experience with crisper colors and a dynamic sensory experience with spatial audio and vibrating haptics. We will talk more about these benefits in the coming chapters. These great features do drain the battery—another challenge for which developers are seeking solutions.
In the smartglasses category, Vuzix and Snap are both companies to watch. Vuzix uses a microLED projection in the latest update of their Blade glasses. This technology allows for independent control of each pixel versus just a constant stream of light throughout the screen. Snap released their Spectacles smartglasses, which have two symmetrical outward facing cameras that help you capture your world in 3D, so you can relive it in the future. Both of these glasses require a smartphone nearby to help power the experience and store content.
HMDs in general, and specifically smartglasses that can pass as ordinary glasses, are an area many people are watching for major advancements. As of this writing there are many highly anticipated releases of AR glasses by such companies as Apple and Google. In 2020, Google acquired North, whose smartglasses have won multiple awards for the best head-worn device. Other companies to watch are Facebook with Project Aria, Qualcomm with its plan to create “XR viewers,” and Microsoft.
With XR, the world is the new interface. With the spatial computing we have to date, you can see how the barriers of human-computer interaction are starting to break down. With each advancement, the interactions between humans and computers become more humanized. Can an inward wave of your hand signal an object to move closer? Can the things we look at notice our gaze and react? Yes, they can.
When you participate in an XR experience for the first time, you begin by going through a process of learning about the specific environment you are in. For VR, this typically means walking around the perimeter of the space where an interaction will occur. This sets up your active space. Then you go into the full virtual environment, and there isn’t much need to connect the experience with your physical environment further—in most cases.
For MR experiences, however, a full room scan is needed before an experience can be launched. If an experience relies on understanding the place where a user is, then it first needs to create a “digital twin” of that place. Different systems approach this differently, but the concept remains the same regardless of the device. As you look around the space, any surfaces and planes the device can see are mapped. You can usually watch this happen through visual feedback in the form of a grid overlay or color highlighting of a plane. This feedback helps communicate to the user what the computer is learning about the space around them.
The Magic Leap 1 headset does a full scan of the space, using nine sensors, before you dive deeper into an experience. As you look around the room, a grid overlay appears on the floor, ceiling, walls, and larger furniture. It can read corners, edges, and surfaces. You need to complete a certain threshold of the scan before you can advance into a fully spatial experience. The device remembers your last scan in case you need to pick up on an experience where you previously left off.
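On LiDAR-equipped iPhones and iPads, a comparable scan can be requested through ARKit’s scene reconstruction. The sketch below assumes a RealityKit ARView and simply turns on the built-in mesh visualization so you can watch the device learn the room, much like the grid overlay described above.

```swift
import ARKit
import RealityKit

// Request a room scan on a LiDAR-equipped device and show the resulting mesh.
func startRoomScan(in arView: ARView) {
    let configuration = ARWorldTrackingConfiguration()

    // Scene reconstruction builds a triangle mesh of walls, floors, and
    // furniture as the user looks around; it requires a LiDAR-equipped device.
    if ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) {
        configuration.sceneReconstruction = .mesh
    }
    configuration.planeDetection = [.horizontal, .vertical]

    // Draw the reconstructed mesh over the camera feed, much like the grid
    // overlay headsets show while they learn the space.
    arView.debugOptions.insert(.showSceneUnderstanding)
    arView.session.run(configuration)
}
```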
The term spatial computing was coined in 2003 by Simon Greenwold as the title of his master’s thesis at the Massachusetts Institute of Technology. Greenwold not only provided the name for this idea but also defined it as:
a human interaction with a machine in which the machine retains and manipulates referents to real objects and spaces.2
2 Greenwold, S. (2003). Spatial computing [Master’s thesis, Massachusetts Institute of Technology]. acg.media.mit.edu/people/simong/thesis/SpatialComputing.pdf
Spatial computing uses our physical space, unrestricted by rectangular screens, as our interactive canvas. This contextual awareness allows us to track physical and digital objects in any space, including their relationship to each other. We can see and understand which are taller, shorter, closer, or farther away, all in a 360-degree experience. The ability to calibrate the two different worlds makes the sense of immersion more believable for the user and enhances the overall experience.
As Magic Leap was preparing to release their first headset (FIGURE 2.9), they shared images of whales emerging from the floors and splashing water around the room. They also shared images playing with scale, such as having an elephant in the palm of your hand. These 3D images play with the imagination by using exaggerated scale of objects and by understanding the environment they have been placed into. All of that is possible because of the advanced calculations that a group of sensors on the headset are able to combine into a full digital replica of the space. Then once the physical space is mapped, the application can do more than just layer the digital 3D objects into it; it can have them appear as if they were really in the space, all because of the power of spatial computing.
FIGURE 2.9 Spatial Computing. Magic Leap 1 Lightpack, Lightwear, and Control.
Photography: Bram Van Oost for Shutterstock
You may recall that we talked about the LiDAR scanner available on the iPhone 12 Pro as related to mobile AR. This concept is similar to how a headset creates a digital map of the space using spatial computing. Using simply your smartphone, you can create a full 3D picture of the space around you thanks to a process called photogrammetry (FIGURE 2.10), which creates a realistic 3D model of an object or scene in your physical space from a collection of merged photographs. Your camera takes a series of photographs, but through the use of depth sensing, it also captures and measures the distances. There are different kinds of cameras that can capture spaces in this way and in a range of different resolutions.
FIGURE 2.10 Photogrammetry Diagram. In 2018, students from S. I. Newhouse School of Public Communications and Institute of Technology at Syracuse Central traveled to Makhanda, South Africa, to work with local partners Inkululeko and Rhodes University School of Journalism and Media Studies. Through this collaboration, a deeper understanding of the community was developed, as well as a new form of visual storytelling: photogrammetry. This technology empowers visual artists and journalists to take people places in ways never before possible.
Project director: Associate Professor Ken Harper; Production artist: Aubrey Moore
Photogrammetry The process of capturing, measuring, and interpreting three-dimensional photographs in order to re-create a digital replica of a real-world object or scene.
To break this multistep process down to its simplest form, you first capture the space or object in images, and then you use software to re-create the full mesh and texture by merging all the images together. New technology, such as the LiDAR scanner on the iPhone 12 Pro, makes this process much easier; you can scan an object and have it ready to use as a 3D model in just minutes. If you are looking for high-resolution images, you would want to invest in a higher-quality camera, such as a DSLR or one that can capture high-resolution photographs in the RAW format. But for the purposes of mobile AR experiences, the iPhone 12 Pro family now has everything you need in one device. This process can save a lot of time over creating a 3D asset from scratch. As part of your design process, you will need to determine the overall look and feel of the digital elements you use within an experience: photogrammetry offers a more photorealistic style.
There are two different ways to capture images for photogrammetry. You can stand still and capture the full space in a 360-degree viewpoint, or you can keep the object static while you move all around it, capturing it from every angle, as shown in FIGURE 2.11. The first will help you create a full mesh of a scene or environment. The second approach will help you capture a standalone 3D model. With the first approach, you could create a digital replica of the inside of the Sistine Chapel to share in a VR experience, so others could experience what it is like to walk in and see the magnificent artwork of Michelangelo. Or with the second approach, you could place an apple on a table and capture it from all angles, usually dividing it into the three segments of top, middle, and bottom. Then you could place this 3D model in any immersive space in AR or VR, so users could walk around it just as you experienced it when you captured it in the real world.
FIGURE 2.11 Photogrammetry Model. A complete set of visual assets created as part of a Newhouse Center for Global Engagement Liberian media development project. Part of this process involved the use of photogrammetry to create 3D models of some of the work of artist Manfred Zbrzezny, who has been working since 2007 in weapon conversion, turning “arms into art.” The skull is completely made from AK-47 rifle parts, reclaimed by the United Nations after the Liberian Civil War.
Photographer and Production artist: Ken Harper; Created for: Manfred Zbrzezny, artist, blacksmith, and owner of Fyrkuna Metalworks, Brewerville, Liberia
Tip
If you want to see how something that is 3D can be replicated in 2D, you could just scan it and bring it onto your flat screens and explore. You don’t always have to go from 2D to 3D.
The photogrammetry process has been used for many years, such as in Google Street View and Google Earth, to turn 360-degree photos into 3D models. However, it is now easier than ever to capture and create a mesh, making it a format you can consider without a huge time commitment.
Many developers offer tools for the photogrammetry process, in several formats. For example, you can use software on your computer, from developers such as Unity Technologies, to merge your photographs into a model or scene. There are web-based applications like Sketchfab that you can use to efficiently create and share your models. Mobile apps, such as the 3d Scanner App by Laan Labs, work directly with LiDAR-capable phones. The ClipDrop app, by Init ML, offers an interesting variation on this concept. It allows you to scan an object in your space, instantly select it from the background, and bring it into Adobe Photoshop or a similar program to use. It doesn’t capture the full 3D, but it is still a powerful time-saving solution.
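As one hedged sketch of the capture-then-merge step, Apple’s RealityKit Object Capture API (available on recent Macs) can turn a folder of photographs into a textured 3D model. The folder and output paths below are placeholders, and this is only one of the many tools mentioned above.

```swift
import Foundation
import RealityKit

// Merge a folder of photos into a textured USDZ model with Object Capture.
func buildModel() throws {
    let imagesFolder = URL(fileURLWithPath: "/path/to/captured-photos")
    let outputModel = URL(fileURLWithPath: "/path/to/sculpture.usdz")

    let session = try PhotogrammetrySession(input: imagesFolder)

    // Watch progress and completion as the photos are merged into a mesh.
    Task {
        for try await output in session.outputs {
            switch output {
            case .requestProgress(_, let fraction):
                print("progress:", fraction)
            case .requestComplete(_, let result):
                print("finished:", result)
            case .requestError(_, let error):
                print("failed:", error)
            default:
                break
            }
        }
    }

    // Ask for a medium-detail model; higher detail levels take longer to process.
    try session.process(requests: [
        .modelFile(url: outputModel, detail: .medium)
    ])
}
```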
As is the case with many XR applications, light is an essential factor in determining the quality of the images for photogrammetry. A consistent light source that doesn’t produce harsh shadows will improve both the speed of the scan and the final quality of the output. LED lights can offer a nice, consistent tonality. Also, not all materials will capture with the same quality. Reflective surfaces, transparent materials, fine details, and things that move will all cause problems. If you have ever taken a panoramic photo on an iPhone, you may know how having someone move as you pan across a space can cause glitchy effects when the images get merged together. The same is true when scanning for photogrammetry. Finding a space with minimal or no motion will set you up for success.
Having LiDAR sensors, depth-sensing cameras, and mobile AR capabilities in the palm of your hand, in a way that can be easily shared, opens up a world of options for how you can begin to use it. It can influence the fields of construction, journalism, medicine, art, and even social media. Although the technology might at times feel overwhelming, remember that its underlying capabilities matter more than the name of the company that released the latest thing. As we talked about in Chapter 1, “Pick Your Reality,” gaining prosthetic knowledge about what you need for the next step is the goal. That way you don’t get hung up on the latest and greatest and can develop the best design solution for the problem you are solving with XR.