Nava Shaked and Detlev Artelt

6 The use of multimodality in Avatars and Virtual Agents

Abstract: One area where multimodality is becoming essential is the design of Avatars and Virtual Agents (A&VA). This multidisciplinary area combines several design capabilities that must work together. This chapter reviews the development of Avatars and Virtual Agents as part of the Human Machine Interaction (HMI) field, with an emphasis on the needs and challenges raised by user requirements and demands. Specifically, we ask: How has multimodality changed HMI to create a more versatile, personalized and accessible experience, and to narrow the gap between virtual assistance and live assistance?

Avatars and Virtual Agents, as we will show, create a “live”-like sensation and interaction experience thanks to the correct and smart fusion of multimodal capabilities. In fact, the more sophisticated and knowledgeable the interface, the better the interaction between users and their Avatars. Virtual Agents can carry the user to a “task completion home run” if they are personalized to the right extent and, no less important, if they are efficient.

We claim that it is possible to categorize the different existing multimodal interaction types for Avatars and Virtual Agents and propose a framework based on three types of “interaction relationship”, with an example for each type.

6.1 What are A&VA – Definition and a short historical review

There is substantial evidence that avatars are not a phenomenon of the 20th century. Although the first “sophisticated” avatar appeared much later, the idea of building bodies as entertainment systems has existed for a very long time (Cassell 2001). The word “avatar” (from the Sanskrit avatāra (Avatar 2015)) originally derives from Hinduism, where it is a bodily manifestation of Immortal Beings or “the Supreme Being” (Egen 2005). Applied to computers, this means that an avatar is a representation in the virtual world, and for some it can be seen as their “incarnation” on the Internet (Egen 2005).

We will give a short overview of the most important milestones in the history of avatars and take a look at how they go hand-in-hand with specific technological developments.

We will also introduce Virtual Agents and their development in digital media as representatives of service providers for self-service applications. To describe the difference between Avatars and Virtual Agents we turn to Fox et al. (2015), who define avatars as virtual representations perceived to be controlled by humans and virtual agents as those perceived to be controlled by computer algorithms. Virtual representations of people in computer-mediated interactions can thus be categorized as either avatars or agents.

“Avatars are distinguished from agents by the element of control: avatars are controlled by humans, whereas agents are controlled by computer algorithms. Hence, interaction with an avatar qualifies as computer-mediated communication (CMC) whereas interaction with an agent qualifies as human-computer interaction” (Fox et al. 2015, p. 5). Later in the chapter we will refer to this definition and suggest some additional criteria.

So what is so special about Avatars and Virtual Agents (A&VAs)? Humans by nature engage in complex representational activity involving speech and hand gesture, and they regulate that activity through social conversational protocols that include speech, eye gaze, facial expressions, head movement, and hand gesture. Cassell (2001) claims that, where social collaborative behavior is key, representing a system as a human is the correct interface. Her term for A&VAs is Embodied Conversational Agents (ECA). She argues that an ECA is an interface in which the system is represented as a person and in which information is conveyed to human users via multiple modalities such as voice, hand gestures, etc. Cassell (2001) was actually able to translate the rules of engagement into a table of conversational functions and their behavior realization which, in turn, could be used for ECA-based systems so that the user interface would be as similar as possible to human-to-human interaction.

We will examine all these interaction issues in relation to their supporting technologies but first let’s look at the historical development.

6.1.1 First Avatars – Bodily interfaces and organic machines

The earliest attempts to model the body and bodily interfaces date back to the eighteenth century. Cassell (2001) claims that organicist automaton makers were driven by the question of whether one could design a machine that could talk, write, and interact the way people do. Scientists wanted to know if machines could perform the same tasks as humans and to what degree they were able to perform them. This was the birth of artificial intelligence (AI) and the first steps towards the development of modern machines and computers.

The first machines were primarily developed for entertainment purposes, but soon they led to a completely new way of thinking. It was the French philosopher René Descartes (1596–1650), the “Father of Modern Philosophy”, who in the seventeenth century first expressed the thought that animals and humans operate by mechanical principles. There is also some evidence that he believed the body to be essentially a machine (Mastin 2008).

One of the first organicist machines was invented by the Swiss-born watchmaker Pierre Jaquet-Droz in the 1770s. The “writing boy” is a life-sized doll that can move its arm to dip a quill in the inkpot and write texts of up to 40 characters. The automaton comprises approximately 6,000 parts and 40 replaceable interior cams that dictate the characters written (Hills 2013). Another automaton that achieved great popularity in Europe was a mechanical duck constructed by Vaucanson in 1738. It was made of gold-plated copper and contained more than 1,000 parts, including a digestive tract. Vaucanson’s aim was to understand the bodily functions of human beings and find treatment options to heal diseases (Perkowitz 2004).

6.1.2 Modern use of avatars

The first digital avatars and agents appeared in the earliest days of computer development in the 1950s and 1960s. One of them was ELIZA, created by Joseph Weizenbaum. This computer program appeared in the mid-1960s and emulated a Rogerian psychotherapist. ELIZA gave an illusion of intelligence and could answer simple questions (Weizenbaum 2000).

This program reminds us of what later became chatbots for customer service, which started to flourish in the 1990s as the Internet was released for public use. Chatterbots are still often integrated into the dialog systems of automated online assistants, with the ability to provide information to customers. During the 1980s and 1990s we also find voice-based Virtual Agents in call center IVRs for self-service, most popular in the banking, finance and public service domains. A well-known virtual agent figure was Julie from Amtrak. More than ten years ago Julie became Amtrak’s friendly voice, the one you hear when you call their 1-800 number. The persona was at first associated mostly with the voice-enabled services; Amtrak now also uses it for Internet chat, where a woman’s image guides you through the company website (“Ask Julie”, Amtrak 2016).

The first online avatars could be found in the 1990s as well. Early ones were simply animated characters, used mainly to attract attention, while others presented information, for example to video gamers. In addition, the development of virtual and augmented reality technology – especially in gaming – required the user to choose an avatar to activate and, in most cases, play as an extension of himself.

With the development of 3D graphics, gesture recognition and real-time image processing, the usage of virtual images representing the player keeps growing. One of the most popular games using avatars is World of Warcraft. In this MMORPG (Massively Multiplayer Online Role Playing Game), the gamers choose a clearly combat-focused Avatar. With about 7.1 million subscribers as of Q1 2015 (Statista 2015), World of Warcraft is currently the world’s most-subscribed MMORPG. Digital avatars as virtual representations of real humans are also used in virtual worlds such as Second Life.

What is common to all these examples is that the virtual image (avatar or agent) has human-like features and capabilities, with the ability to speak, move and understand what the user is communicating. A virtual avatar is able to conduct an interaction at some intellectual level, from simple to sophisticated.

6.1.3 From virtual “me” to virtual “you”

Whereas the first organicist machine representations were of general human (or animal) figures and mostly not of a specific person, avatars in video games or virtual worlds represent a specific individual. They are used as a “virtual me”, acting instead of the user in a virtual world. By the same token, in the last few years we see frequent use of avatars as individual “virtual you” personalities, acting in place of customer service representatives.

What is common to all of them is that they can be “contacted” and interacted with in various ways using several communication channels as well as different input and output technologies, in conjunction with data transfer by apps or sensors – exactly as Cassell (2001) suggested. Some of these implementations require a high order of Artificial Intelligence technology to support them, such as NLP, machine learning, etc.

The ability of an avatar to generate face-to-face communication between real and virtual persons allows a much richer communication channel. It enables multimodal communication through both verbal and nonverbal channels such as gaze, gesture, spoken intonation and body posture (Nassiri, Powell & Moore 2004).

Furthermore, in a world in which mobility is a core topic, mobile devices, trackers, mobile sensors and smartphones are clearly connected to avatars. People create their own “virtual me” and “feed” it with multimodal data from sensors like fitness trackers or apps, so that they can measure weight loss, health or even sleep phases. There is no doubt that the interaction between humans and machines has reached a new level.

6.2 A relationship framework for Avatars and Virtual Agents

Our basic working assumption is that a Human-Machine relationship takes place in the interaction, and the aspiration is to achieve a dialog that enables task completion, whatever the user’s task may be.

In their work Rich, Sidner and Lesh (2000) discuss the theory of Collaboration, which is basically intended for Human-to-Human interactions, and apply it to Human-Machine interaction. “Collaboration” is a process in which two or more participants coordinate their actions towards achieving shared goals (Rich et al. 2000, p. 2). They introduce the concept of an agent, independent of the application, that communicates with the user in a three-way communication (agent, application, and user). The machine has been personified, and “Collaboration” now takes place between two users: one virtual and one human.

In our examination of theories of collaboration, we see that we can directly relate them to the concept of teamwork, and we find three perspectives as described by Goldman and Degani (2012): the Humanistic view (teams with Human-only interaction), the Mechanistic view (in AI, for teams of Machines) and the Human-Machine view (joint teams).

If we accept the definition that teamwork means being responsive to each other’s needs and mutually supportive in order to succeed in the joint plan, then we definitely get a sense of a relationship being created between the participants (in our case, between the Avatar or Virtual Agent and the human end user).

If this is the case, then let’s examine and classify the types of relationships created in the dialog. We want this classification to be the basis for making a decision about which multimodal technologies or interaction types to choose for our Human-Machine Interface design.

We propose three main relationships types between human users and Avatars and Virtual Agents:

Type 1: The Avatar as virtual me

Type 2: Interaction with a personalized/specialized Avatar

Type 3: Me and a random Avatar

6.2.1 Type 1 – The Avatar as virtual me

In this relationship type the Avatar is a mirror image of the user, a persona that functions as an extension of the user for various purposes (playing a virtual game, personal assistance, a medical avatar used in health applications, a reflection of the user in fitness and health applications). The relationship is self-contained; personal activities are tracked without the need for external intervention. This type of Avatar is highly personalized according to the profile of the user and his preferences, with the ability to learn patterns and behaviors, and the relationship can take on a high degree of intimacy. The preferred modalities will be inherent to the user and his profile and will not change very much.

In the mid-1980s the Ultima series of video games (Ultima V 2015) was among the first role-playing games to let the user take on a different identity: the player starts as a regular character and strives to become an avatar – one of the first appearances of an avatar in gaming. We already mentioned World of Warcraft (WoW), created in 2004 and today the world’s most-subscribed MMORPG. In the virtual universe of Azeroth every gamer can choose an avatar and personalize it, no matter whether he or she wants to act as a magician, elf or even monster. But in these digital days there are many more virtual worlds and a huge variety of avatars. Second Life, for example, developed by Linden Lab, is more than a game for its users. For them it is a kind of lifestyle or even an attitude towards life. Second Life users create avatars that can do anything that can be done in the real world: explore new things, chat, do business, create things. This virtual world is not a game as we know it – it is a parallel universe, just digital (Second Life 2015).

It is very interesting how people act in virtual environments. Researchers have found that people tend to choose virtual characters that are a slightly “better” or idealized version of themselves, and that people with low self-esteem often choose avatars that socialize a lot. On the other hand, researchers have also found that users who create a more self-like avatar enjoy the game more (Madigan 2015).

The other kind of avatars that belong to Type 1 are personal assistants that behave as an extension of the user for the purposes of self-monitoring, self-management and recording data related to a variety of activities. The popular applications of market leaders – such as Siri (Apple), Cortana (Microsoft), Ivee, Echo (Amazon) and Google Now (Google) – provide personalized services for mobile applications.

An avatar for elderly people called GeriJoy is a caregiving companion, built to address many of the unique challenges faced by seniors and their families (GeriJoy 2015). The GeriJoy Companion avatar is designed to be a supportive friend and caregiver. It is able to listen to – and remember! – what the user is saying, like names of grandchildren, favorite places, TV shows. It monitors the emotional states of the elderly, such as feeling lonely or confused, and can provide engaging and supportive conversation.

Yet another example is Medical Avatar (Medical Avatar LLC 2016), an Internet site that provides online health management and maintenance services. It can identify health symptoms on the patient avatar while the user interacts with a 3D image to map his illness or complaints.

Of course both the GeriJoy and Medical Avatar services are supported with a backend of live agents and caregivers, but from an efficiency point of view, enabling self-service for at least some activities can save time by handling simple requests, especially for frequent users.

6.2.2 Type 2 – The interaction with a personalized/specialized avatar

This relationship type refers to a Virtual Agent that specializes in a certain type of activity, such as banking and financial services, HMO health services, education, or government assistance. On the one hand these virtual agents represent a company, an organization or a service provider but, on the other hand, they are not random. They have prior knowledge of the user as their customer, sometimes on a routine basis. They know the profile of the user, his history and habits. They may not be as good as the personal avatar, but within their specific line of business they can make predictions and next-best offers. They can also connect generally available data with personal data to maximize service.

Nina (Nuance 2015) is an intelligent multichannel customer service virtual assistant – a platform able to connect to IVRs, Internet sites, chat and mobile apps. It can take on different personas depending on the enterprise it represents. One example is Ines, Nespresso’s virtual agent (AI4US 2016a). She has prior knowledge about the user, his account and his regular purchases. In addition, the user can ask her anything about the products and services offered. She gives users access to their information, helps users register with The Nespresso Club, and assists with connection issues and access to the Nespresso site.

The advantage of a personalized, known agent across all channels of the enterprise is clear. It is meaningful both in light of the theory of collaboration mentioned earlier and in terms of the familiarity and personalization of services, which is crucial for the customer experience as a whole.

Online banking is growing and is directly connected to a multichannel approach whereby users can access their account information and perform actions from various digital channels such as the Internet, mobile apps, telephone, chat, etc. (Lloyd 2013). Having an agent that is common to all channels and familiar with the customer’s profile and habits is an advantage that can be leveraged to create a holistic, seamless customer experience across every encounter.

To clarify, the difference between Types 1 and 2 in our definition is that Type 1 Avatars are an actual extension of the user and his daily habits – be it in his gaming, his virtual world, or his personal assistants on his devices – where his profile resides and he can select his own preferences. In Type 2 we are talking about a Virtual Agent that is not under the control of the user but rather operated by a service organization, yet has prior knowledge of the user and can make personalized offers.

6.2.3 Type 3 – Me and a random Virtual Agent

The third type of relationship is between the user and a Virtual Agent with whom he is not familiar: a random interaction with a virtual representative who contacts the user while he surfs the Internet, or through online segmentation and a targeted advertising campaign.

Here the user is mostly in a passive and less cooperative mode. For design purposes, these agents must be interesting, articulate and engaging in order to capture the user’s attention and elicit a response.

Essentially these random virtual agents can be defined as virtual personas answering real-time questions asked on a website, at click speed, without pauses, 24 hours a day, completely automatically. These VAs can engage in a dialog and help customers make a decision.

The first Avatars used in marketing were only able to give information, with no ability to engage in two-way interaction. The Microsoft assistant Clippy (Cheezburger Inc. 2016) is a good example: when the user started typing, the assistant with googly eyes popped up and offered help. But things have changed a lot since then. The digital revolution and the availability of various interaction technologies have enabled Avatars and users to interact with each other more naturally and effectively. People seem to like interacting with Virtual Agents, as they provide a social dimension to the interaction. Humans willingly ascribe social awareness to computers (Nass, Steuer & Tauber 1994), and thus interaction with Virtual Agents follows social conventions, similar to human-to-human interactions. This social interaction raises both the believability and the perceived trustworthiness of the agents and increases the user’s engagement with the system. Another effect of the social aspect of agents is that presentations given by a Virtual Agent are perceived as more entertaining and more agreeable than the same presentations given without an agent (Van Mulken, André & Müller 1998).

Virtual Agents of this kind are also called “chatbots”. There are many good examples of chatbots being used in marketing or customer service that offer more than just information. A current example is Alexa, the virtual assistant from Amazon. First integrated into the Amazon Echo, the assistant is also planned for release on tablets and smartphones. It works via Amazon Web Services and requires a good Internet connection. It offers weather and news from a variety of sources and will play music from the owner’s Amazon Music account. Echo will respond to questions about items in your Google Calendar and other connected services. Like Siri or Cortana, Alexa is controlled by speech. Nevertheless, the Echo is not a human-like figure; it is a cylinder-shaped object to be used in the home or office.

The site www.chatbots.org lists a wide range of customer service Virtual Agents – like Ines, based on the Nina platform (AI4US 2016b) – with different gender personas and visualizations. Some examples are Agent Striker for IGN Entertainment, Nathan for Symantec, Charlie for AT&T, Mr. Bibendum for Michelin, Cloe for VirtyOz, and many more.

In conclusion, defining these three framework types is important in order to characterize the different types of Avatars and Virtual Agents and to distinguish them by means of the relationship they form with the user. These distinctions will also be important later, when evaluating the quality of a given platform with the evaluation matrix in Section 6.3.3.

6.3 Multimodal features of A&VA – categorizing the need, the challenge, the solutions

Up to this point we have described the evolution of Avatars and Virtual Agents and suggested a relationship framework to describe the different types of human-machine relations. Next we will discuss the usage of multimodal interaction in Avatars and Virtual Agents and suggest a methodology to facilitate the design and fusion of technologies for different A&VA applications.

The context for this discussion is very straightforward. Human-machine interaction has been changing to accommodate the new digital era, with multimodal interaction technologies helping to successfully bridge the gaps. Our user lives in a mobile environment based on state-of-the-art digital capabilities and is looking to interact intensively in order to get information, collect data, receive customer service and perform actions.

6.3.1 About multimodal interaction technologies

Multimodal systems combine two or more input and output modes for human-machine interaction – such as speech, handwriting, gesture and touch, sketch, eye movement, facial expression, and so on. “This new class of interfaces aims to recognize naturally occurring forms of human language and behavior, which incorporate at least one recognition based technology (e.g., speech, pen, vision)” (Oviatt 2012, p. 1).
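To make the fusion idea concrete, the following minimal sketch (in Python) shows a simple late-fusion step: each recognizer emits an interpretation with a confidence score, and interpretations that agree across modalities within a short time window reinforce each other. The event fields, modality names, time window and scoring rule are our own illustrative assumptions, not a design prescribed by the literature cited here.

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str        # e.g. "speech", "touch", "gesture"
    interpretation: str  # recognizer's hypothesis, e.g. "select_account"
    confidence: float    # recognizer confidence in [0, 1]
    timestamp: float     # seconds since interaction start

def fuse(events: list[InputEvent], window: float = 1.5) -> str | None:
    """Sum per-interpretation confidence across modalities that fire
    within one time window, and return the best-supported one."""
    if not events:
        return None
    start = min(e.timestamp for e in events)
    scores: dict[str, float] = {}
    for e in events:
        if e.timestamp - start <= window:
            scores[e.interpretation] = scores.get(e.interpretation, 0.0) + e.confidence
    return max(scores, key=scores.get)

# Speech and touch agree on "select_account", so fusion prefers it even
# though the single gesture event has the highest individual confidence.
events = [
    InputEvent("speech", "select_account", 0.6, 0.0),
    InputEvent("touch", "select_account", 0.7, 0.4),
    InputEvent("gesture", "cancel", 0.8, 0.5),
]
print(fuse(events))  # -> select_account
```

This is the essence of what makes a multimodal interface robust: a weak signal in one modality can be confirmed or corrected by another.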

Fortunately, today’s system designers are able to successfully offer many of these modes, relying mainly on the mobile infrastructure and the availability of communication networks. Nevertheless, as we will show, the choice of interaction modalities offered depends directly on the functionality and the nature of the application.

Oviatt and Cohen (2015) have created a table of interaction modes according to different mobile platforms and infrastructures. Some are already popular while others are still in the early-adopter stage. But it is clear that with the IoT (Internet of Things) revolution and 5G cellular networks, the incorporation of more and more modalities will be a requirement of the users themselves. This table (Oviatt & Cohen 2015, pp. 136–137) presents applications that use multimodal interaction. It shows the status of actual combinations and implementations that already exist in the market. For each of these combinations there is an industry line as well as actual product providers. One of the industry lines is Virtual Assistance, which is a sub-area of the A&VA field. A careful look reveals that Virtual Assistance includes the largest number of possible modalities to be used: Voice, Sketch, Touch, Gesture, Handwriting, Mouse, Gaze, Face tracking, Keyboard, Buttons, Head position, Torso position, Eye tracking, Arm position, Face recognition, Fingerprint, Iris, and Voice biometrics (Oviatt & Cohen 2015, pp. 136–137).

In fact, we claim that an Avatars & Virtual Agents platform will use the maximum number of modalities – more than any other industry line. This highly multimodal approach addresses the goal of creating more flexible, efficient, challenging and easy-to-use interactions for different user populations, and the need to accommodate the different types of A&VA relations described in Section 6.2 above.

6.3.2 Why use multimodality with Avatars?

Whereas the first Avatars were mostly attention-grabbing “personas” limited primarily to providing information, modern Avatars offer much more. Whether in gaming, education, personal assistants or even in healthcare, customer service and marketing, the user experience and the task completion rate of the interaction is crucial to the success of the platform. To demonstrate the importance of this issue, let’s consider the following examples:

A user is logged in, playing a video game using his avatar. The interaction between him and his avatar takes place using a keyboard (for chat) and a joystick, and perhaps video and voice too. If the interaction fails because, for example, his camera or keyboard drops out, an error will occur. In some cases he will be logged out and his session will end, which is disappointing but not critical, because he can log in again.

But if an interaction is taking place between a user and his avatar on a healthcare site (such as a doctor’s site: www.MedicalAvatar.com) and the application halts in the middle of a personal checkup, the user might feel very uncomfortable regarding his health information. In another example, if you are calling a service center requesting information and the virtual agent is unable to understand you, keeps getting into an error loop because the speech engine is failing, and the interaction is cumbersome, there will be no resolution of the request and the user experience will be disappointing. It has been established that multimodal ways of interaction are, most of the time, easier to use, require less training, are robust and flexible, as well as faster and more efficient, and, of course, support new functionality (Dumas, Lalanne & Oviatt 2009).

So how can we design a satisfactory interface for A&VA using multimodal capabilities? From an interaction point of view we claim that three factors must be analyzed and modeled before designing a solution.

6.3.2.1 The user

Identifying the target user is a key factor. It makes a difference whether a child or an adult is interacting with an avatar, or whether the user is elderly, injured or disabled. Each of the user types mentioned will prefer one or more ways of interaction: children will mostly communicate by speech, text, touch and gesture, but an elderly person may use eye position, voice or gesture. Other issues concern the output technologies preferred by each of the multimodal interaction user types – textual or image output, voice or cues – as well as how the data is presented.

6.3.2.2 The purpose

In his discussion, Traum (2008) establishes the relationship between the behavior of people during a human-to-human dialog and a person’s interaction with “virtual humans” (the term he uses for Avatars and Virtual Agents). Traum claims that the construction of dialogs with virtual humans can be based on cornerstones similar to those of human-to-human dialog, citing Allwood’s (1995) social activity parameters:

Procedures: type, purpose, function

Roles: competence, obligations, rights

Instruments: machines, media

Other physical environment parameters

But Traum also notes that people’s behavior might be affected by the fact that they are conversing with a virtual character, and this needs to be taken into account in the design of the dialog: “Purely human data can be used both as a starting point for analysis and implementation of virtual humans, and also as an ultimate performance goal, however it may not be the most direct data, especially for machine learning” (p. 300).

The next criterion concerns the task the user is trying to complete using the Avatar: Is the Avatar being used to get information, for Q&A, for education, for diagnostics, or to provide data to someone else? We can differentiate among three types of use cases: passive, active, or mixed use.

Passive use. The user only receives information – for example, in transportation, when an avatar gives schedule times for buses or trains. This Avatar is usually a persona representing the company on its Internet site.

Active use. The user actively provides data to the Avatar or Virtual Agent – for example, using a fitness app with an avatar of “myself” and giving data about food habits, training, running distance or sleep behavior via a fitness tracker or other sensors. The user initiates the action and actively engages with the virtual persona.

Mixed use. The user not only provides data but also receives information from the Avatar or Virtual Agent. Diagnosis solutions are a good example of mixed usage: the patient provides data to the Avatar using speech, touch, keyboard or sensors and gets information back, e.g. a diagnosis or medical advice. When data is provided to the Avatar or Virtual Agent, usually two or more ways of interaction are used. The more extensively someone provides data and information, the more ways of interaction are used.

6.3.2.3 The environment

The interaction modality is also influenced by the usage environment. Users do not usually use speech in a very noisy environment, or eye tracking when it is dark. Environment is not only the physical surroundings and ambience; it also includes the platform on which the virtual agent is used. Mobile devices allow for easier interaction in some respects, while wearables or gaming devices have other restrictions and considerations. In fact, as claimed by Lawo, Logan and Pasher (2016, to appear), the notion of wearable ecology describes the environment in which we put the user and the technology in an intimate relationship. In their article they discuss this environment as empowering the user on the one hand, but limiting and restricting him on the other.

To sum up, in the first part of this section we claimed that A&VA will use as many modalities as possible. In the second part we clarified that the three criteria of User, Purpose and Environment must be factored in when selecting the appropriate modalities for an optimal multimodal interface design. A sketch of how these criteria might combine is given below.
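As a rough illustration of how the three criteria combine, the sketch below selects a modality set from a user type, a use-case type (Section 6.3.2.2) and an environment. The profiles and rules are invented examples drawn from the observations above; a real design would be considerably more nuanced.

```python
def select_modalities(user: str, purpose: str, environment: str) -> set[str]:
    # 6.3.2.1 The user: typical modality preferences per user type
    # (illustrative assumptions based on the examples in the text).
    preferences = {
        "child":   {"speech", "text", "touch", "gesture"},
        "adult":   {"speech", "touch", "keyboard"},
        "elderly": {"speech", "gaze", "gesture"},
    }
    modalities = set(preferences.get(user, {"speech", "touch"}))

    # 6.3.2.2 The purpose: active and mixed use involve providing data,
    # which typically calls for additional input channels such as sensors.
    if purpose in ("active", "mixed"):
        modalities.add("sensors")

    # 6.3.2.3 The environment: noise rules out speech; darkness rules
    # out gaze/eye tracking.
    if environment == "noisy":
        modalities.discard("speech")
    if environment == "dark":
        modalities.discard("gaze")
    return modalities

# Example: an elderly user in a mixed (give-and-receive) use case, in noise.
print(select_modalities("elderly", "mixed", "noisy"))
# -> a set such as {'gaze', 'gesture', 'sensors'}
```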

6.3.3 Evaluation of the quality of Avatars and Virtual Agents

This section addresses the research question of how to determine the optimal interaction design for A&VA. Let us review our argument so far. In Section 6.2 we examined a set of classification methods to characterize the user-machine relationship and established a relationship framework based on three types: Type 1: Avatar as virtual me; Type 2: Interaction with a personalized/specialized Avatar; and Type 3: Interaction with a random Avatar. In Section 6.3.2 we established the criteria that determine which modalities will fit different applications and use cases, depending on the User Type, the Purpose of the application, and the Usage Environment.

The aim of this section is to present the set of features for evaluation. This set of features will help us best assess the overall quality of the Avatar, taking into consideration all the factors we have discussed above – and some others, which are related to performance measurement.

We suggest an evaluation framework which is based on a set of assessment features and directly linked to the three user-Avatar relationship types. For each relationship type the priority and weight of each feature is scaled and graded as L (low), M (medium), or H (high). This matrix of features, relationship types and grades provides a novel approach to the evaluation of an avatar or virtual agent (see Tab. 6.1). For each feature we ask how crucial it is for the interaction type. Some of the features are based on general quality assurance best practices while others are specific to the A&VA platform.

The following is a brief description of the matrix basics, focusing only on those features that are central to the chapter’s arguments. Clearly, the matrix needs to be further developed and detailed.

Ease of Use: This feature is concerned with how friendly and easy to navigate the interface is, and whether it requires a significant learning effort. We claim that this feature is crucial for all types and, therefore, grade it as High across the board.

Visualization: This feature deals with the external look of the A&VA or app – whether there is a human-like agent, an image, or just an app user interface. How likeable is it, and how well does it represent the user or the enterprise? We claim that this feature is, in general, highly important to all types of A&VAs.

Tab. 6.1: The evaluation matrix.

When the A&VA is a persona (a human-like image), how important is it to the interaction and to the engagement of the user in creating a remarkable user experience? For Type 1, which is the extension of the user in an avatar (“virtual me”), it is of very high priority. For the personalized Virtual Agents (Type 2) it has medium importance since, after all, it is a reflection of the enterprise it is representing. For the random type (Type 3) the visualization is of low value, as long as it is agreeable and easy to use.

Privacy of information is a grand and complex issue that cannot be covered here. We will simply say that, in our opinion, it is not of the same importance for all types. In the case of a random agent, which does not possess any prior information, it is of lesser importance, while for Types 1 and 2 it must be a high-priority issue.

Language Support is a feature that enables the A&VA to support many languages and accommodate multi-language speakers. This is highly important for the Type 3 random agent, which needs to support several target audiences without prior knowledge of their preferred language. This will not be necessary for Types 1 and 2, where the basic profile and preferences of the user are known in advance.

Diversity of available tasks relates to how important it is that the A&VA be able to offer an array of services of the following types: specific personalized ones, general tasks, or a mixture of both. How much emphasis should be placed on learning about the user in real time and providing services based on his habits and behavior? We claim that for Type 1, personalized tasks are critically important, while for Type 2 a mixture of personalized and general tasks has to be supported. Type 3 requires only general tasks.

Error recovery is the ability of the A&VA to gracefully recover from a dialog mistake or misunderstanding rather than create an infinite loop and a bad interaction experience. Error recovery is not easy and requires NLP as well as data analytics technologies, but it is highly important, especially in a dialog with a virtual agent representing an enterprise in a commercial interaction. Failure to create a smooth interaction – especially if the agent has prior information regarding the user – is a problem. Another feature important to the user experience is Latency, i.e., the delay between input into the system and the appearance of the desired output. Latency greatly affects the quality of the interaction as well as the flow of the dialog. As a result, low latency (a response within a few seconds per interaction) is graded as High for all three types.
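Such graceful recovery can be sketched as an escalation ladder: repeat, rephrase, offer another modality, and finally hand off rather than loop forever. The `recognize` callable below stands in for any speech or NLP engine and is purely hypothetical, as are the prompt texts.

```python
def recover(recognize, prompt: str, max_attempts: int = 3):
    """Try progressively more helpful strategies instead of looping
    on the same failing prompt."""
    strategies = [
        prompt,                                            # original prompt
        "Sorry, I didn't catch that. " + prompt,           # rephrase
        "You can also tap one of the options on screen.",  # switch modality
    ]
    for attempt in range(max_attempts):
        result = recognize(strategies[min(attempt, len(strategies) - 1)])
        if result is not None:
            return result                  # understood: exit gracefully
    return "ESCALATE_TO_HUMAN"             # never trap the user in a loop
```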

We have already established the importance of Multimodal Interaction Technologies in A&VA design, claiming that A&VAs can incorporate the largest number of input/output technologies into the user-machine dialog, whether that dialog be passive or active (see Section 6.3.1). We claim that for Types 1 and 2 multimodal interaction technologies are crucial, while for Type 3 it is of Medium priority – good to have but not critical since the expectation for interaction from a random agent is not as high as from a personalized agent or avatar. When we drill down to the usage of sensors for input information and for output feedback, once again this is of high priority to Types 1 and 2 and less so for the random Type 3.

Other supporting technologies also determine the quality of the interaction and the level of the user experience. Gamification, i.e., adding gaming components and rewards for task completion and improvement into the A&VA-user dialog, is highly recommended for Type 1 Avatars and recommended for personalized Type 2 agents. For Type 3 it can be used effectively to engage the user with a new offering.

As mentioned above, Data Collection and Analytics also play an important role in the process of personalization. Analytics facilitate the recognition of user behavior patterns and the prediction of future user behavior. This is highly critical for Types 1 and 2 and much less so for Type 3.

Machine learning algorithms and AI capabilities are necessary for an A&VA to interact in a natural, human-like way, improving the user experience and generating cooperation on the side of the user. To achieve this, the interaction must be monitored continuously over time, with the ability to collect, store and analyze the data and then run machine learning and AI algorithms to process the interaction and learn to improve it.

The matrix described above gives us a glimpse into the quality considerations relevant to A&VAs. It also includes some suggestions for a new approach to grading and scaling, but the subject definitely requires further elaboration and research.
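To suggest how such a matrix could be operationalized, the sketch below encodes those feature grades that are explicit in the text above and weights a platform's per-feature ratings by relationship type. The numeric mapping (H=3, M=2, L=1), the feature names and the sample ratings are our own illustrative assumptions, not values taken from Tab. 6.1.

```python
GRADE = {"H": 3, "M": 2, "L": 1}

# feature -> (Type 1, Type 2, Type 3) priorities, as graded in Section 6.3.3
MATRIX = {
    "ease_of_use":      ("H", "H", "H"),
    "visual_persona":   ("H", "M", "L"),
    "privacy":          ("H", "H", "L"),
    "language_support": ("L", "L", "H"),
    "low_latency":      ("H", "H", "H"),
    "multimodality":    ("H", "H", "M"),
}

def score(platform_ratings: dict[str, int], relationship_type: int) -> float:
    """Weight a platform's per-feature ratings (e.g. on a 1-5 scale) by how
    much each feature matters for the given relationship type (1, 2 or 3)."""
    total = weight_sum = 0.0
    for feature, grades in MATRIX.items():
        w = GRADE[grades[relationship_type - 1]]
        total += w * platform_ratings.get(feature, 0)
        weight_sum += w
    return total / weight_sum

# Example: rating a hypothetical Type 2 (personalized banking) agent.
ratings = {"ease_of_use": 4, "visual_persona": 3, "privacy": 5,
           "language_support": 2, "low_latency": 4, "multimodality": 5}
print(round(score(ratings, relationship_type=2), 2))  # -> 4.13
```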

6.4 Conclusion and future directions: The vision of A&VA multimodality in the digital era

How is HMI influenced by the new technological developments of the current Digital Era? This question is essential to the understanding of the future of the interaction technologies that form the basis for the abilities of A&VAs to communicate in a human-like manner and incorporate AI.

The very existence of Avatars and Virtual Agents depends on a successful user interface and is closely related to the whole theory of human-machine interaction. However, we will not review here the timeline of human-machine interaction development, but will jump straight to the current period and discuss the implications of the Digital Era.

Jill Shepherd (2004) claims that the digital era is characterized by technology that increases the speed and breadth of knowledge turnover within the economy and society. “The Digital Era can be seen as the development of an evolutionary system in which knowledge turnover is not only very high, but also increasingly out of the control of humans, making it a time in which our lives become more difficult to manage.” Furthermore, she claims that “the Digital Era has changed the way we live and work by creating a society and economy that is geared to knowledge, whether that knowledge is content-laden and therefore scientifically factual, or instead is content-free and therefore reliant on emotions, or indeed any combination in between” (Shepherd 2004, p. 2).

In this era people belong to active social and economic communities whose members aspire to gain as much knowledge as possible. As technological functionality becomes more knowledge-based, we become more dependent on knowledge retrieval technologies and online devices. Understanding the Digital Era in terms of evolution thus means enlisting technologies to provide service and knowledge in every area, and using technology to build assistive Avatars or Agents is a prime example of this evolution.

Meisel (2013) suggests that personal assistance apps like Siri point to the growing sophistication of the human relationship with digital systems. He claims that “user interfaces will continue to improve over time, driven by improvements in the underlying technology, faster digital processors, improved connectivity with computer networks and user feedback.” (Meisel 2013, p. 27).

On the one hand this change has led to new ways of interaction between humans and Virtual Agents and Avatars, and, on the other hand, to new interaction types and interfaces shaped by the following four major trends that are highly relevant to our topic:

Infrastructure Technologies. The data availability revolution is powered by big data, cloud storage, hosted services and innovative user interface technologies. These developments have made it possible to go to the next level in dialog systems and self-service: from the information-giving avatars of the early 1980s to today’s mobile personal assistants that support user input and interaction in two or more modalities (Oviatt & Cohen 2015).

Mobile Technologies. Mobility as a whole, and in particular the smartphone found in virtually every pocket, requires the development of supporting interfaces to fit the needs of all users. Mobile manufacturers provide multimodal interfaces and Avatar applications as basic offerings to the users. We see the fusion of data, sensors and input/output technologies into mobile devices in order to optimize their performance and the quality of interaction that they provide. For example, using the GPS sensor to locate the user and count steps, and then transferring this data to the Avatar, can create a data routine that alerts the user in a fitness application. 3G and 4G capabilities have enabled higher and higher data rates over the last ten years; streaming has become more powerful, and the usage of apps, chat and other smart functions now predominates over telephone functions (Rainie & Poushter 2014). A second digital revolution is coming: 5G networks will create an even stronger effect, making every device a communication channel and enabling IoT applications. Mobile devices have become the preferred multimodal platform for interacting with Virtual Agents and Avatars by enabling virtual communication through apps, sensors and mobile solutions.

Socio-economic phenomena. From Baby Boomers (born 1946–1965) to Generation X (born 1965–1980) to Generation Y, the so-called Millennials (born after 1980), “smart” technology – and especially mobile devices – is integrated into work and private lives. Eight out of ten young people aged 18–30 use a smartphone, 76% own a notebook, and every third young adult possesses a tablet (Heuzeroth 2014). The digital revolution has taken place, and technology, apps, mobile devices and avatars are our daily companions. As the survey “Generation #Hashtag” by Bain & Company shows, as many as two-thirds of German users prefer digital media and use mobile devices (Kunstmann-Seik 2015). The social changes brought about by these technological developments are reflected in our habits and behaviors, as well as in the way we communicate, consume products, socialize and interact. That means that communicating with your PA, your bank’s virtual agent, or your health avatar should be an easy task, with a friendly user interface that fits anywhere, anytime.

Demographic changes. The elderly population, which is growing massively, is creating a new balance of market forces in the interaction domain, requiring the adaptation and adoption of new care methodologies and technologies. The same is true for children. The growing use of A&VAs in care-taking and assistive, as well as educational, applications is driving the adoption of multimodal interfaces.

In conclusion, the technological developments that are an essential part of this era have cultivated and essentially forced a change in the way Humans and Machines interact, and have thus brought new interface technologies to human-machine interaction in general, and to Avatars and Virtual Agents specifically.

Abbreviations

A&VA Avatars and Virtual Agents
HMI Human Machine Interaction
AI Artificial Intelligence
MMORPG Massively Multiplayer Online Role Playing Game
NLP Natural Language Processing
IoT Internet of Things
WoW World of Warcraft
HMO Health Maintenance Organization
MMI Multimodal Interaction
I/O Input/Output
ECA Embodied Conversational Agents

References

AI4US 2016a, ‘Ines (Nespresso Club)’, Chatbots.org. Available from: https://www.chatbots.org/virtual_agent/ines1 [17 January 2016].

AI4US 2016b, ‘Chatbots by Nuance’, Chatbots.org. Available from: https://www.chatbots.org/developer/nuance/ [17 January 2016].

Allwood, J 1995, An activity based approach to pragmatics, Technical Report (GPTL) 75, Gothenburg Papers in Theoretical Linguistics, University of Göteborg.

Amtrak 2016, Ask Julie. Available from: http://www.amtrak.com/about-julie-amtrak-virtual-travel-assistant [17 January 2016].

Avatar 2015 (Wikipedia article). Available from: https://en.wikipedia.org/wiki/Avatar [7 October 2015].

Cassell, J 2001, ‘Embodied conversational agents: representation and intelligence in user interfaces’, AI Magazine, vol. 22, no. 4, p. 67.

Cheezburger Inc. 2016, ‘Clippy’, Know your meme. Available from: http://knowyourmeme.com/memes/clippy [17 January 2016].

Egen, S 2005, The history of avatars, iMedia Connection.

Fox, J, Ahn, SJ, Janssen, JH, Yeykelis, L, Segovia, KY, & Bailenson, JN 2015, ‘Avatars versus agents: A meta-analysis quantifying the effects of agency on social influence’, Human-Computer Interaction, vol. 30, issue 5, pp. 401–432. Available from: http://www.tandfonline.com/doi/abs/10.1080/07370024.2014.921494 [25 October 2015]

GeriJoy 2015, Senior living. Available from: http://www.gerijoy.com/senior-living.html [29 July 2015].

Heuzeroth, T 2014, Generation Y fühlt sich von digitaler Welt gestresst. Available from: http://www.welt.de/wirtschaft/article134497791/Generation-Y-fuehlt-sich-von-digitaler-Welt-gestresst.html [27 July 2015].

Hills, S 2013, Was this automaton the world’s first computer? Incredible mechanical boy built 240 years ago who could actually write. Available from: http://www.dailymail.co.uk/news/article-2488165/The-worlds-Mechanical-boy-built-240-years-ago-engineered-act-writing.html [5 August 2015].

Kim, J 2013, Health Buddy: Fitness-Avatar wird fett bei zu wenig Bewegung. Available from: http://de.engadget.com/2013/08/20/s-health-buddy-fitness-avatar-wird-fett-bei-zu-wenig-bewegung/ [29 July 2015].

Kunstmann-Seik, L 2015, Bain-Studie zur digitalen Mediennutzung: „Generation #Hashtag“ setzt auf neue Medienformate. Available from: http://www.bain.de/press/press-archive/generation-hashtag-setzt-auf-neue-medienformate.aspx [1 July 2015].

Lawo, M, Logan, R, & Pasher, E 2016, ‘Wearable computing – A media ecology approach and the context challenge’, to appear in The Design of Mobile Multimodal Interfaces, eds N Shaked & U Winter, De Gruyter, NY.

Lloyd, D 2013, Mobile Virtual Agents for Self-Service Banking. Available from: https://www.bai.org/bankingstrategies/article.aspx?Id=2f247148-b5e3-431d-92a9-d3406e253c73 [29 July 2015].

Madigan, J 2015, The psychology of video games. Available from: http://www.psychologyofgames.com/author/jamie-madigan/ [29 July 2015].

Mastin, L 2008, René Descartes, The Basics of Philosophy. Available from: http://www.philosophybasics.com/philosophers_descartes.html [17 January 2016].

Medical Avatar LLC 2016. Available from: https://www.MedicalAvatar.com [17 January 2016].

Nass, C, Steuer, J & Tauber, ER 1994, ‘Computers are social actors’, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 72–78.

Nassiri, N, Powell, N & Moore, D 2004, ‘Avatar gender and personal space invasion anxiety level in desktop collaborative virtual environments’, Virtual Reality, vol. 8, no. 2, pp. 107–117.

New Hope Media LLC 2013, To-Do List Apps for ADHD Kids and Adults. Available from: http://www.additudemag.com/adhd/article/8698.html [29 July 2015].

Nuance 2015, Self service with an intelligent touch of humanity. Available from: http://www.nuance.com/for-business/customer-service-solutions/nina/index.htm [29 July 2015].

Oviatt, S & Cohen, PR 2015, The paradigm shift to multimodality in contemporary computer interfaces, Morgan & Claypool Publishers.

Oviatt, S 2012, ‘Multimodal interfaces’, in The Human-Computer Interaction Handbook, ed J Jacko, 3rd edn, Lawrence Erlbaum, New Jersey, pp. 405–430.

Perkowitz, S 2004, Digital People: From Bionic Humans to Androids, Joseph Henry Press.

Piccolo Picco Ltd 2015, Fitness Avatar: Exercise Trainer (iTunes app). Available from: https://itunes.apple.com/us/app/fitness-avatar-exercise-trainer/id942101272?mt=8 [29 July 2015].

Rainie, L & Poushter, J 2014, Emerging nations catching up to U.S. on technology adoption, especially mobile and social media use. Available from: http://www.pewresearch.org/fact-tank/2014/02/13/emerging-nations-catching-up-to-u-s-on-technology-adoption-especially-mobile-and-social-media-use/ [27 July 2015].

Second Life 2015 (Wikipedia article). Available from: https://de.wikipedia.org/wiki/Second_Life [29 July 2015].

Shepherd, J 2004, ‘What is the digital era?’, in Social and economic transformation in the digital era, eds G Doukidis, N Mylonopoulos & N Pouloudi, pp. 1–18.

Statista 2015, Number of World of Warcraft subscribers Q1 2005 – Q1 2015. Available from: http://www.statista.com/statistics/276601/number-of-world-of-warcraft-subscribers-by-quarter/ [27 July 2015].

Traum, D 2008, ‘Talking to virtual humans: Dialogue models and methodologies for embodied conversational agents’, in Modeling communication with robots and virtual humans, eds I Wachsmuth and G Knoblich, LNAI 4930, Springer-Verlag, Berlin, Heidelberg, pp. 296–309.

Ultima V: Warriors of Destiny 2015 (Wikipedia article). Available from: https://en.wikipedia.org/wiki/Ultima_V:_Warriors_of_Destiny [29 July 2015].

Van Arsdale, J 2014, Fitness hero presents – ‘Avatars’. Available from: https://www.indiegogo.com/projects/fitness-hero-presents-avatars#/story [29 July 2015].

Van Mulken, S, André, E & Müller, J 1998, ‘The persona effect: how substantial is it?’, People and Computers XIII: Proceedings of HCI, vol. 98, pp. 53–66.

Weizenbaum, J 2000, Die Macht der Computer und die Ohnmacht der Vernunft, 11th edn, Suhrkamp Verlag GmbH, Frankfurt.
