© Ronald Ashri 2020
R. Ashri, The AI-Powered Workplace, https://doi.org/10.1007/978-1-4842-5476-9_7

7. AI Is the New UI

Ronald Ashri, Ragusa, Italy

Artificial intelligence is often, and rightly so, brought up in the context of solving hard problems like discovering new genes, curing cancer, or enabling autonomous driving. There is, however, a far more mundane but equally challenging set of problems that AI is already playing a key role in solving. AI is increasingly the magic sauce behind the software that manages our interactions with any computing device.

The aim of this chapter is to illustrate and motivate the links between AI and user interfaces (UIs), and demonstrate how AI-powered UIs are going to be important not just for consumer products but for the workplace as well. AI-powered interfaces will become a source of competitive advantage for organizations that use them correctly.

Moving beyond “point and click”

Since the widespread introduction of the graphical interface with the Macintosh computer in 1984, the predominant interaction paradigm with computers has been to point a cursor at something and click to select it.

Innovations along the way upgraded this basic experience, making it richer and smoother, but they haven’t radically changed it. Yes, we can now use our fingers instead of a mouse. Yes, we can “pinch and zoom” with two fingers or “swipe” with more fingers. With some trackpads and smartphones we can even use pressure to cause different reactions. We went from tiny, underpowered processors with very little memory driving grayscale screens to blazingly fast machines, virtually unlimited memory, and millions of colors. That’s thirty-five-odd years of improvements. Nevertheless, we are still pointing and clicking.

Don’t get me wrong. All of these developments are amazing. The technology necessary to provide a smooth pinch and zoom experience is staggering. The fundamental paradigm, however, remains the same. You are manipulating objects on a screen (buttons, links, text, images) by using a device (a mouse, pen, trackpad, or your hands) to indicate to the machine what should happen to the object you are pointing at.

Interestingly, AI already plays a huge role in today’s interfaces. A prime example is the virtual keyboard on your smartphone. It is constantly predicting the letter you most likely meant to touch, which one you are likely to touch next, and what words and phrases you are trying to type overall. It learns to adapt its predictions to your specific manner of touching keys and writing. If all of that were switched off, we would find it very hard to type any message on our phones. It is no exaggeration to say that the iPhone was only viable because it used enough AI techniques to make its onscreen keyboard usable.
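The ranking idea behind such a keyboard can be sketched in a few lines of Python. This is a deliberately toy version, assuming a tiny training corpus of the user's own past messages; real keyboards use far richer character-level, personalized, and neural models, but the core mechanism of ranking candidates by observed frequency is the same.

```python
from collections import Counter, defaultdict

def build_predictor(corpus: str):
    """Build a toy next-word predictor from bigram frequencies."""
    words = corpus.lower().split()
    followers = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1

    def predict(prev_word: str, k: int = 3):
        # Rank candidate next words by how often they followed prev_word.
        return [w for w, _ in followers[prev_word.lower()].most_common(k)]

    return predict

# Hypothetical training data: a few of the user's own past messages.
predict = build_predictor("see you soon see you soon see you tomorrow")
print(predict("you"))  # "soon" ranks first: it followed "you" twice
```

The same structure generalizes: swap the word-level bigram counts for character-level counts and you have a sketch of the per-letter prediction described above.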

These days, all the top-of-the-line smartphones have either facial recognition or fingerprint recognition. That is a feature that depends heavily on AI techniques to match the inputs it receives (your facial characteristics or fingerprint) against the ones it has stored in memory. The fact that they can do it in a seamless motion with practically no delay is nothing short of magic.

We are at a tipping point, however. It is time to move on from the point and click interface to something else. Additional AI technologies will allow us to take the next step, and there are three key underlying drivers.

First, as computing spreads to every aspect of our life and every device, the interface quite simply disappears or is not an immediate option. If you are multitasking, such as driving a car or preparing a meal in your kitchen, your hands are already occupied. Being able to speak to a computer is the only choice. If you are interacting with a device that is on your wrist, or embedded in your clothes or furniture, voice commands are the natural choice.

Second, it is about time we turned the tables on computers and the way we interact with them. So far, we have had to learn the “magic incantations”: the sequences of clicks that will help us achieve our goal. Where in the endless layers of menus is the option we are looking for buried? Which of the various left-button, right-button, one-finger, two-finger, or three-finger with pressure click combinations should we invoke to make things happen? Why can’t we simply tell computers what we want and have them do it? This has always been the vision, but now interface designers finally have tools to help them realize pragmatic versions of that vision.

The third driver centers on competition and how external forces make it inevitable for others to react. When a set of technologies reaches a tipping point and enables a new way of doing things, it provides a competitive advantage. This, in turn, causes competitors to look for ways to neutralize the advantage, which inevitably drives further technological innovation. The iPhone is a prime example of that. In that first presentation of the iPhone, Steve Jobs showed the state of the art in phones at the time: bulky, clunky, with physical keyboards. The iPhone changed all of that. In a few years the bulky and clunky phones were all gone. iPhone became the new standard by which smartphones were judged. Fast forward to 2019 and the iPhone is now competing to keep up with innovations that others are spearheading.

Now, imagine a support team that is able to provide a better customer experience because they can focus on the more complex cases while their automated virtual assistants, powered by conversational AI, are dealing with the simple and repeatable problems. As a result, all of their competitors will look to provide similar support interfaces for users, and the use of conversational AI becomes the minimum entry point.

As AI influences so many different aspects of what we do, these forces will cause change in many different ways. From a business perspective a great user experience cuts right to the heart of the efficiency issue. Imagine your sales team having to compete with a team that has ten times better and more efficient access to data, and the ability to create new visualizations and ask new questions of their data. While your team is trying to borrow the time of a software developer in order to write a new query to pull out a report, the other team can simply type or speak what they need in a conversational interface and have the results show up in the team messaging tool for everyone to share. We are past the point where a good user experience was a luxury to be added later, and we are quickly getting to the point where a good user experience will equate to active, smart interfaces that collaborate with users to solve problems. In other words, the interfaces of the future will be entirely dependent on AI.

In the rest of the chapter I will introduce some of the technologies that are enabling this change, and the interaction paradigms that they are making possible.

Conversational Interfaces

Our brains are hardwired for language. As toddlers we get to the point where we are learning new words every hour of our life, and often we just need to hear a word once and we can already start using it. Listening and conversing (whether through voice or gestures) is what humans do.

Now, compare that to navigating a web site or interacting with an app on the phone. That requires specific effort and training. We need to learn it explicitly and the rules keep changing on us. Different applications put buttons in different places, icons are different, etc. Conversations, however, remain simple: question, reply, response, repeat. From a human perspective, conversational interfaces as a way forward are a no-brainer. It’s what we do all the time.

Conversational interfaces are digital interfaces where the main mode of interaction is a conversation: a repeating pattern of request and reply.

A conversational interface can be purely text-based (e.g., within Facebook Messenger or via SMS), voice-based (e.g., with the Amazon Alexa service), or a hybrid (e.g., Siri or Cortana, where we use voice but receive replies in a combination of voice and text). Conversational interfaces can also provide rich replies that mix text with media, or simplify the conversation by offering a set of options to choose the reply from.
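One turn of that request-and-reply pattern can be sketched minimally in Python. The topics and canned answers below are invented for illustration; real systems replace the keyword match with a trained intent classifier, but the loop shape (take a message, pick a reply, fall back when unsure) is the same.

```python
# Canned replies for known topics -- illustrative content only.
REPLIES = {
    "hours": "We are open 9am to 5pm, Monday to Friday.",
    "price": "Plans start at $10 per month.",
}

FALLBACK = "Sorry, I didn't catch that. You can ask about: hours, price."

def reply(message: str) -> str:
    """Match the user's message against known topics; fall back otherwise."""
    text = message.lower()
    for topic, answer in REPLIES.items():
        if topic in text:
            return answer
    return FALLBACK

# One request/reply turn of the conversation.
print(reply("What are your opening hours?"))
```

Offering the fallback's list of topics is the text equivalent of "giving us a set of options to choose the reply from" mentioned above: it keeps the conversation on rails when understanding fails.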

In the next few pages I take a look at what is happening with voice and written conversational interfaces, and explain why I think we are at the start of a very significant change for both.

Voice

I remember as a teenager being completely enthralled by the technological achievements of the 1990s, arguing with my dad about voice recognition. A new software solution called Dragon NaturallySpeaking was making waves at the time. It was the first commercial software that claimed to effectively recognize continuous speech (at 100 words a minute!). It felt as though the days when we would simply talk to computers were just around the corner. My dad was far more skeptical. While dictation software was impressive, he could see all the challenges of voice recognition in busy office environments with multiple accents from different cultural backgrounds. He could not see how voice could be the main interface with a computer anytime soon.

Dragon NaturallySpeaking was first introduced in 1997. I was convinced that by the time we got to 2000 it would be the dominant interaction paradigm for computing. I was a bit too optimistic. My dad was right. Natural language recognition was not anywhere near the required level of capability.

The history of voice-assisted products goes back even further. In 1962 IBM presented the first commercially minded solution at the Seattle World’s Fair. It was called Shoebox and could recognize 16 spoken words and perform mathematical functions. We’ve been trying to crack the voice challenge for at least the past 60 years!

Eventually, however, algorithms, data, and computing power advanced sufficiently. In that old argument with my dad, it is finally safe to stand on the side of natural language and voice. There is still a lot of ground to cover but the ingredients are there. Anyone with a smartphone has a voice assistant in their pocket. Siri was integrated into Apple’s iOS in October 2011. Amazon released Alexa in November 2014. Google Assistant launched in 2016, but the technology had been gestating as Google Now since 2012. Every large tech company has a “voice” platform. IBM has Watson, Microsoft has Cortana, and Samsung has Bixby.

By early 2019 over 100 million products with Amazon Alexa built into them had been sold.1 This level of adoption is critical. Voice applications have many challenges to overcome before they become a stable part of our digital environment. The two main ones, however, are getting us into the habit of using voice to achieve tasks (even very simple ones) and doing this reliably in any number of different situations. Both challenges require broad adoption before we see results. Broad adoption means that we will become increasingly accustomed to having these devices around, which will feed enough data back to developers to improve them so that they can perform reliably. This is why these devices are so cheap. The large tech companies know what they need to get the ball rolling, and the only way right now is to make it a no-brainer for us to purchase the devices. The price is so low that we simply reason that, worst case, we get a decent speaker or alarm clock!

Through this mass adoption strategy, the tipping point is getting increasingly closer. We finally have:
  1. Natural language recognition technologies (both for going from voice to text and for understanding the meaning of that text) that are good enough and widely available enough to deal with well-delineated domains
  2. Devices that can support conversation-driven interaction that are cheap enough and widely available enough
  3. Development platforms that allow anyone to create conversational applications that can be released and reach a mass audience

This means that we will see an explosion in voice-driven applications as companies begin to explore the problem space and find those killer applications.
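To make the first ingredient concrete: once speech has been transcribed to text, a well-delineated domain lets the application map the transcript to an intent plus parameters with quite modest machinery. The sketch below, in Python, uses a single hypothetical "set a timer" intent with a hand-written pattern; production assistants use statistical models over many intents, but the output shape (intent name plus extracted slots) is representative.

```python
import re

# Illustrative pattern for one intent in a narrow domain; not any
# vendor's actual API.
TIMER = re.compile(r"set (?:a )?timer for (\d+) (second|minute|hour)s?")

def understand(transcript: str):
    """Return (intent, slots) for a recognized utterance, else (None, {})."""
    match = TIMER.search(transcript.lower())
    if match:
        amount, unit = match.groups()
        return "set_timer", {"amount": int(amount), "unit": unit}
    return None, {}

print(understand("Please set a timer for 10 minutes"))
```

The narrower the domain, the more reliably this mapping works, which is exactly why today's voice applications succeed first in well-delineated tasks rather than open-ended conversation.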

Text

The same elements that are driving voice-based conversational applications are also driving text-based applications, but currently text has a few significant advantages. First, it is very easy to add text-based conversational interfaces to web sites and, second, messaging applications are the new kings and queens of the digital world.

The top four messaging applications have at least 4.1 billion monthly active users, and on average we spend 12 minutes a day within messaging apps2 (the fact that your most likely reaction to that number is that it seems low is further proof of how popular messaging apps are!).

Messaging applications are widely used in business as well of course, with Skype, Microsoft Teams, Slack, and many others used daily by millions of people.

The asynchronous but immediate nature of text-based interactions is particularly suitable for a very wide range of everyday tasks. From checking flight details and banking issues to the latest updates from your kid’s school, a text-based message is incredibly well suited. According to a Twilio survey3 of users across the United States, UK, Germany, India, Japan, Singapore, and South Korea, 89% of users would like to be able to use messaging to communicate with businesses. For 18- to 44-year-olds, messaging is preferred over e-mail or phone communications.

We are going to look much more closely at text-based conversational interfaces, the technologies behind them, and how they can be transformational for work in organizations in Chapters 8 and 9, so I will skip a lengthier discussion here. The main takeaway, however, is that with messaging applications we are past the tipping point. It is where people are now and what they like using. The question is whether you are looking to take advantage of the opportunities afforded.

Augmented Reality and Virtual Reality

No discussion of how AI will change the way we interact with machines could be complete without dealing with augmented reality (AR) and virtual reality (VR).

AR refers to interfaces that overlay digital information on our view of the real world, either through wearable devices like glasses or simply through the screen of our phone. As with so many technologies, the level of usage forms a continuum. You can go from adding just a couple of extra pieces of information to your real-world view, such as the name of a building or a small card with extra information, all the way to creating what are often called mixed reality (MR) environments where the digital layer is rich and can be manipulated.

VR, on the other hand, creates an entirely new world and places us in it. Whereas AR or MR augments what we currently see, VR replaces the analog world with an entirely digital one. The user typically wears a head device that immerses them in the virtual world and holds interface devices in their hands to manage what is going on, or their gestures are “read” and interpreted through an external device.

Although these technologies are further away from hitting the mainstream than conversational interfaces, the inflection point is getting closer. Once more the magic sauce of better computing capabilities, better hardware, and the application of AI techniques in the form of machine vision, natural language, and much more will lead to solutions that have the potential to feel like a natural extension of what we currently do. Success, however, is by no means a foregone conclusion. Even when all the required technological elements are there, the use cases still need to be carefully considered.

For example, does anyone remember Google Glass? Released in 2012 to great fanfare, the device was hailed as the harbinger of the AR age. The wearer of the Google Glasses communicated with the device using natural language voice commands or by touching the side of the glasses, and the glasses were able to overlay relevant digital information just above your line of vision. All the ingredients were there: natural language, a wearable device, and tons of automation to make everything work smoothly. It was also a complete failure.

There are multiple reasons for why Google Glass failed, and this is not the place to perform an in-depth analysis. What is interesting from our perspective is that a lot of the problems had little to do with the technology itself. In other words, even if Google Glasses were the “perfect” device from a technical capability perspective, they still would have failed. They were expensive, created awkward social situations (e.g., concerns that people would be photographed through the glasses without being aware of it led to them being banned in various locations), and didn’t solve an immediate pressing problem for people.

Unlike voice or text-based conversations that use an interface paradigm we are immediately familiar with, AR technologies add a new layer that we need to get used to. This means that unless it is done right, it becomes yet another interface to learn. If that interface offers sufficient benefits, people will invest the time to learn it even if it is not a great fit. If not, after the initial excitement, people will just give up. Indeed, calling Google Glass a complete failure is not fair. It has found uses in industrial settings where there are clear use cases of helping skilled workers as they complete tasks.

From a consumer perspective we have some strong examples of how AR can be very successful when it is used effectively. Pokémon Go from the gaming industry is perhaps the most well-known example. Pokémon Go gets users, equipped with smartphones, searching for and capturing digital Pokémon in the physical world. The game indicates to users where they need to go to find the Pokémon, thus giving it the ability to direct people to specific locations. The excitement of mixing real world treasure hunts with digital game play took the world by storm, and for a few months in 2016 it was impossible not to come across people either playing the game or discussing it. While that initial excitement has settled and we don’t hear about Pokémon Go on news reports anymore, the game is still played by tens of millions of users and generates hundreds of millions of dollars in revenue.4

Practitioners learn through these successes and failures, and because the appeal of AR is clear, it will eventually break through and become part of the tooling that helps us get work done in an office. The first area of application, however, is more likely to be industrial rather than office-based work. The mix of costs/benefits in an industrial setting is far more obvious and the domains to operate in are well defined. A great example is from a company called UpSkill.io. They use AR glasses to help field technicians receive information from remote specialist support staff. The AR glasses create a two-way feed between the field technician manipulating a complex device such as a drilling machine and the specialist support person.5 The back-end specialist can talk to the field technician, see exactly what the technician is seeing, and stream relevant information to the glasses. This allows the specialist technician to scale and support multiple field technicians, providing clear savings for the company.

Now, if AR is challenging because it introduces a new way of doing things, VR takes that challenge to a whole new level. VR technology needs to play the ultimate magic trick. It needs to make us think that we are in a completely new world but feel as though it is as natural as the physical world. For years the struggle was simply around packing enough computing into a portable unit so that you could actually wear the device and carry it around. A catalyzing moment was when Facebook purchased one of the most promising producers of VR headsets—Oculus VR—for two billion USD in 2014. The promise of the technology paired with the reach of Facebook convinced people we would all have VR sets in our living rooms in a short amount of time. Several years later the enthusiasm has settled but the technology has marched on. Oculus now has products that don’t require any wires, and at a significantly lower price point.

Ultimately, the promise of the technology is such that developments will continue. For our increasingly distributed offices, where large teams need to collaborate intensely on complex projects, tools that make that experience better are crucial. One of the VR holy grails is fixing the meeting room experience so that those in the room and those calling in all feel as if they are in the same place. All the large technology companies and countless startups are working on VR/AR platforms that will put the tools in the hands of developers, allowing them to explore the space and find the user experience solutions and business models that will work. Some of the platforms to look out for are:
  • Microsoft, with its HoloLens 2 platform,6 is providing the raw ingredients for developers to build applications on top of it. It is currently marketed predominantly for use in industrial applications.

  • Magic Leap, although a startup, has already built an impressive headset and platform for developers to build solutions on. They are focusing on entertainment experiences but are also building the tools to create an AR experience for office work.

  • Facebook, as we already mentioned, is heavily developing its Oculus platform.

  • Apple has a mature AR development kit for the iPhone, and rumors abound about Apple AR glasses. Of course, no one can be sure until the official announcements come, but undoubtedly Apple with its existing AR platform will look to make the next move, which may well include some form of wearable device.

  • Google has not given up on AR and VR technologies; it is simply taking its time to apply the learnings of the first attempt.

Overall, the promise of AR and VR is such that people simply cannot give up. What is interesting from an office work perspective is that in order to fully take advantage of these platforms once they are widely available, you will need automation to allow users to really interact with your organization’s data and processes.

Better User Experiences Are a Competitive Advantage

For a long time, software built for the office simply did not consider the user experience as an important feature. Enterprise software was serious software for serious people, and that meant that if you had to click through ten screens and memorize twenty shortcuts to get your job done, well that is just what you would have to do.

Thankfully, we are now not arguing that point anymore. Although a lot of software is still terrible, there is an understanding that easy-to-use software leads to better work due to less training for users, fewer things going wrong, and increased user satisfaction. Beautifully designed consumer electronics and positive user experiences with tools such as Instagram or Facebook also make workers demand better experiences at work as well.

The next phase is going to be about how we can introduce more automation into our software solutions and how we can further reduce the friction of interacting with them. This will become especially true as the problems we are trying to solve increase in complexity and the volume of work increases as well.

AI techniques combined with interface paradigms such as conversation, AR, and VR will play a key role here. No matter what the UI of the future is ultimately going to look like, it is clear that the organizations that are able to provide the smoothest interactions between their systems and their staff, clients, and partners will have a competitive advantage.
