
9. Understanding AI Terminology

Akshay Kore, Bengaluru, India

In this chapter, we will look at some mainstream AI techniques, metrics used to evaluate the success of AI algorithms, and common capabilities of AI.

A large part of what AI teams do is figure out problems to solve with AI and match appropriate techniques and capabilities to the solution. While most product-facing PMs and designers are used to working together, they often speak different languages and have different working styles from their engineering and machine learning counterparts. These differences can inhibit powerful collaborations. Product designers and product managers need to understand what their tech teams do and speak their language to work effectively within AI teams.

In this chapter, we will look at the following:
  1. Mainstream AI techniques

  2. Metrics used to measure AI performance

  3. Common AI capabilities and the applications they enable


Key Approaches for Building AI

There are different ways to implement AI, but at a high level, they are based on two approaches, namely, rule-based and examples-based. In the rule-based approach, the AI is programmed to follow a set of specific instructions. The algorithm tells the computer precisely what steps to take to solve a problem or reach a goal. For example, a robot designed to navigate a warehouse is given specific instructions to turn when there is an obstruction in its path.

In the examples-based approach, the AI is taught, not programmed. Learning happens by giving the AI a set of examples of what it will encounter in the real world and how to respond. For example, to detect faces from images, an AI may be shown many photos of faces in different environments. By observing these examples, the AI learns how to detect a face.
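To make the contrast concrete, here is a minimal sketch in Python using the donut example from Figure 9-1. The "roundness" and "has hole" features are invented for illustration, and scikit-learn is an assumed dependency; this is a sketch of the two approaches, not a prescribed implementation.

```python
# Rule-based: the logic is written by hand as explicit instructions.
# Examples-based: a model learns similar logic from labeled examples.
from sklearn.tree import DecisionTreeClassifier

def is_donut_rule(roundness, has_hole):
    return roundness > 0.8 and has_hole            # hand-written rule

# Labeled examples: [roundness, has_hole], with 1 = donut
X = [[0.9, 1], [0.85, 1], [0.3, 0], [0.95, 0]]
y = [1, 1, 0, 0]
model = DecisionTreeClassifier().fit(X, y)         # learned, not programmed

print(is_donut_rule(0.9, True), model.predict([[0.9, 1]]))  # True [1]
```

The rule-based version encodes the logic explicitly; the examples-based version infers similar logic from the labeled data.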

Figure 9-1

Approaches of building AI. (a) Rules-based approach. Here, the switch is mapped to the light bulb. The rule is to turn on the light bulb by flicking the switch on. (b) Examples-based approach. The AI learns to recognize donuts by being shown examples of what a donut looks like, without specifying exact steps

The most popular AI techniques combine rule-based and examples-based approaches. In larger AI products, it is common to use multiple AI techniques to achieve an objective. For example, a smart speaker might use one AI algorithm to detect a voice and another to search for a response.

AI is an evolving field, and many new techniques will be invented over time. A significant amount of progress in AI techniques has been made in the subfield of machine learning (ML), so much so that AI in the industry is sometimes treated as synonymous with ML.

Figure 9-2

Artificial intelligence is a broad field, and machine learning is a subset of AI

AI Techniques

This section will look at some mainstream ML techniques used by AI product teams:
  1. Supervised learning

  2. Unsupervised learning

  3. Reinforcement learning

  4. Deep learning and neural networks

  5. Backpropagation

  6. Transfer learning

  7. Generative adversarial networks (GANs)

  8. Knowledge graphs

Supervised Learning

In supervised learning, you train the AI by mapping inputs to outputs. We use labeled data to build the model. Labeled means that along with the example, we also provide the expected output. For instance, if we want to build a system that detects product defects in a factory, we can show the ML model many examples of defective and good-condition products. We can label these examples as “defective” or “good condition” for the model to learn. We can label a face recognition dataset by indicating whether an example image contains a face or not.

This technique is a key part of modern AI; most of the value created by AI today comes from supervised learning algorithms. Supervised learning accounts for about 95% of all practical applications of AI and has created a lot of opportunities in almost every major industry.1

Most supervised learning focuses on two types of techniques, namely, classification and regression:
  1. Classification refers to grouping the AI's results into different categories. In this technique, the AI learns to group labeled data by categories, for example, grouping photos by people, predicting whether or not a person will default on a loan, or sorting eggs by quality in an assembly line.

  2. Regression is a statistical technique that finds a prediction based on the average of what has happened in the past.2 The idea is to take past data and use it to predict future outcomes, like predicting flight prices or detecting anomalies such as credit card fraud, where specific behavior patterns may be observed.
However, today, to teach a computer what a coffee mug is, we show it thousands of coffee mugs, but no child’s parents, no matter how patient and loving, ever pointed out thousands of coffee mugs to their child.3 The challenge with supervised learning is that it needs massive amounts of labeled data. Getting accurate and useful labeled data can be costly and time-consuming. Companies often employ human annotators to label training data.
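To make the input-to-output mapping concrete, here is a minimal supervised learning sketch, assuming scikit-learn is available. The fruit features, labels, and values are invented toy data.

```python
# Train a classifier on labeled examples, then predict a label for new input.
from sklearn.linear_model import LogisticRegression

# Each row is [weight_in_grams, roundness_score]; the labels are the outputs
X_train = [[150, 0.90], [170, 0.85], [120, 0.30], [130, 0.35]]
y_train = ["apple", "apple", "banana", "banana"]

model = LogisticRegression()
model.fit(X_train, y_train)           # learn the input -> output mapping

print(model.predict([[160, 0.88]]))   # -> ['apple']
```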

Figure 9-3

Supervised learning. In this example, we train the AI with labeled data of images of fruits. After training, the AI model can categorize fruit images by their labels

Unsupervised Learning

Most people would agree that a child doesn't need to see 15,000 images of cats to understand the concept of a cat. In unsupervised learning, there is no labeled data. We teach machines to learn directly from unlabeled data coming from their environments, which is somewhat similar to how human beings learn. With unsupervised learning, you essentially give the algorithm lots of data and ask it to find patterns in that data. The AI learns by finding patterns in the input data on its own.

Figure 9-4

Unsupervised learning. We provide the AI with images of fruits with no labels in this example, and the AI automatically finds patterns in the photos and sorts them into groups

Many applications of unsupervised learning use a technique called clustering. Clustering refers to organizing data by some characteristics or features. The groups the model finds can then serve as generated labels. This technique is especially useful when you want to group things like a set of images by people, animals, landscapes, etc. or find groups of similar users for targeting ads.
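Here is a minimal clustering sketch, again assuming scikit-learn; the 2-D points are invented stand-ins for real features. No labels are provided, yet the algorithm recovers the groups.

```python
# k-means clustering: the algorithm groups unlabeled points on its own.
from sklearn.cluster import KMeans

points = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9], [4.0, 4.2], [3.9, 4.0]]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)

print(kmeans.labels_)  # e.g., [0 0 1 1 2 2]: three groups found, no labels given
```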

Figure 9-5

Clustering. A pile of vegetables is clustered by type in this example. Source: https://medium.com/mlearning-ai/cluster-analysis-6757d6c6acc9

Access to labeled data is often a bottleneck for many AI projects, and this is also one of the most difficult challenges facing the field. Unsupervised learning is an extremely valuable technique and represents one of the most promising directions for progress in AI. With unsupervised learning, we can imagine systems that can learn by themselves without the need for vast volumes of labeled training data.

Reinforcement Learning

Sometimes it can be difficult to specify the optimal way of doing something. In reinforcement learning, the AI learns by trial and error. In this technique, AI is rewarded for positive behavior and punished for negative behavior.

We show the AI data, and whenever it produces correct output, we reward it (sometimes in the form of scores). When it produces incorrect results, we punish it by not rewarding it or deducting rewards. Over time the AI builds a model to get the maximum reward.

Training a pet is a good analogy for understanding reinforcement learning. When the pet displays good behavior, you give it treats; when it doesn't, you withhold them. Over time the pet learns that behaving in a particular manner earns rewards, and you get the pet's good behavior as an outcome.
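The reward loop can be sketched in a few lines. Below is a toy Q-learning example (one common reinforcement learning method) using only Python's standard library; the five-state "track" with a goal at one end is invented for illustration.

```python
# Q-learning on a tiny 1-D track: states 0..4, reward only at the goal (4).
import random

n_states = 5
actions = [-1, +1]                         # step left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != 4:
        # Mostly exploit the best-known action, sometimes explore
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s_next == 4 else 0.0           # the "treat"
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned policy: the best action in every non-goal state is +1 (right)
print([max(actions, key=lambda act: Q[(s, act)]) for s in range(4)])
```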

Figure 9-6

Training a cat using positive and negative reinforcement

Reinforcement learning is used for building AI that drives autonomous vehicles like self-driving cars, drones, etc., as well as for playing games like chess, Go, and video games. The AI can play a lot of games against itself to learn on its own. Google’s DeepMind team trained their AI to play 49 different Atari video games entirely from scratch, including Pong, Freeway, and Space Invaders. It used only the screen pixels as input and the game score as a reward signal.4

The main challenge with reinforcement learning is that it requires a large number of practice runs and massive computing resources before the algorithm is usable in the real world. A human can learn to drive a car in 15 hours of training without crashing into anything. If you want to use the current reinforcement learning methods to train a car to drive itself, the machine will have to drive off cliffs 10,000 times before it figures out how not to do that.5

Deep Learning and Neural Networks

Deep learning (DL) is a subset of machine learning. In ML, learning happens by building a map of input to output. This map is the ML model represented as a mathematical function.

The mapping can be simple or complex and is represented by what is called a neural network. For example, clicking a button to switch on a light bulb is an easy problem that a simple neural network can solve: the input is the click, and the output is the state of the light bulb (on/off). Learning this sort of mapping is straightforward and sometimes referred to as shallow learning. For more complex cases, like predicting the price of a house based on input parameters like size, the number of rooms, location, distance from schools, etc., learning happens in multiple steps. Each step is a layer in the neural network. The intermediate steps are not directly observable and are called hidden layers. Learning with multiple layers is known as deep learning.

In ML, depth refers to the number of layers in the neural network. Networks with more than three layers are generally considered deep. While the term neural networks might sound brainlike, neural nets are not trying to imitate the brain, but they are inspired by some of its computational characteristics, at least at an abstract level.6

DL is uniquely suited to building AIs that often appear human-like or creative, like restoring black-and-white images, creating art, writing poetry, playing video games, or driving a car. Deep learning and neural networks are a broad area and the foundation of many popular AI techniques.
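As a rough sketch of the house-price example, the snippet below fits a small network with two hidden layers, assuming scikit-learn. The features and prices are invented; a real model would need far more data.

```python
# A small neural network mapping house features to a price estimate.
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Rows: [size_sqft, rooms, distance_to_school_km]; prices in $1,000s
X = [[800, 2, 1.0], [1200, 3, 0.5], [1500, 4, 2.0], [2000, 5, 1.5]]
y = [150, 230, 270, 360]

net = make_pipeline(
    StandardScaler(),                            # scale inputs before training
    MLPRegressor(hidden_layer_sizes=(16, 16),    # two hidden layers
                 solver="lbfgs", max_iter=5000, random_state=0),
)
net.fit(X, y)
print(net.predict([[1300, 3, 1.0]]))             # a rough toy estimate
```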

Figure 9-7

Deep learning and neural networks. In this example, features of a house like its size, number of rooms, location, etc. are used to estimate its price

Backpropagation

A large part of what machine learning teams do is prepare inputs ("feature engineering") and tune models. Learning is enabled by a collection of what are called "hyperparameters": an umbrella term that refers to all the aspects of the network that need to be set up by humans to allow learning to even begin.7 This can include how much weight to assign to different inputs, the number of layers in a neural network, and many other variables. Setting these is called tuning the algorithm.

An ML engineer might build a model and assign some weights to inputs and different layers. After testing the model on some data, they might adjust these weights to get better results, and they might do this multiple times to arrive at optimal weights.

By using the backpropagation algorithm, ML teams can start adjusting weights automatically. It’s a way of tinkering with the weights so that the network does what you want. As a neural network is trained, information propagates back through the layers of neurons that make up the network and causes a recalibration of the settings (or weights) for the individual neurons. The result is that the entire network gradually homes in on the correct answer.8

Backpropagation is used wherever neural networks are employed, especially for speech recognition and speech synthesis, handwriting recognition, and face recognition systems.
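The core idea (compute the error, then push its gradient backward to adjust the weights) can be sketched with a single "neuron." This is a simplification of the full multi-layer algorithm, assuming NumPy:

```python
# One-weight "network": the gradient of the error recalibrates the weight.
import numpy as np

x = np.array([1.0, 2.0, 3.0])     # inputs
t = np.array([2.0, 4.0, 6.0])     # targets (the true mapping is y = 2x)
w, lr = 0.0, 0.1                  # initial weight, learning rate

for step in range(50):
    y = w * x                     # forward pass
    error = y - t
    grad = (error * x).mean()     # gradient of mean squared error w.r.t. w
    w -= lr * grad                # backward step: adjust the weight

print(round(w, 3))                # converges toward 2.0
```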

Figure 9-8

Backpropagation. In this example, features of a house like its size, number of rooms, location, etc. are used to estimate its price. Information or feedback propagates back through the layers of neurons that make up the network and causes a recalibration of the model's weights

Transfer Learning

If a person knows how to drive a hatchback car, they can also drive an SUV. Software engineers who know a particular language like JavaScript can transfer some of their knowledge when writing code in Python. If I give you a new task, you wouldn’t be completely terrible at it out of the box because you’d bring some knowledge from your past experiences with handling similar tasks.

Transfer learning is the ability of an AI program to transfer what it has learned about one task to help it perform a different, related task.9 Humans are great at doing this. For people, transfer learning is automatic. In transfer learning, instead of training your model from scratch, you can pick up some learnings from another trained model that is similar. For example, an image recognition system trained to recognize cars from images can also transfer its learnings to recognize golf carts. This technique is very valuable, and many computer vision and natural language processing (NLP) systems are built using transfer learning.
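In practice, a common pattern is to reuse a network pretrained on a large dataset and retrain only its final layer. Here is a sketch assuming PyTorch and torchvision are installed (recent torchvision versions accept the weights argument, and the pretrained weights download on first use); the two-class head is an invented example.

```python
# Reuse ImageNet-learned features; retrain only a new classification head.
import torch.nn as nn
from torchvision import models

base = models.resnet18(weights="IMAGENET1K_V1")   # pretrained backbone

for param in base.parameters():
    param.requires_grad = False                   # freeze the learned features

# New final layer for our task, e.g., "car" vs. "golf cart"
base.fc = nn.Linear(base.fc.in_features, 2)

# During training, only the new layer's weights would now be updated.
```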

Figure 9-9

Transfer learning. An image recognition system trained to recognize cars from images can transfer its learnings to identify buses. (a) Image source: https://pythonrepo.com/repo/MaryamBoneh-Vehicle-Detection. (b) Image source: https://medium.com/analytics-vidhya/object-detection-zoo-part-1-bus-detection-heavy-vehicle-detection-1f23a13b3c3

Generative Adversarial Networks (GANs)

Perhaps you’ve seen examples of AI capable of generating images of people who don’t really exist, AI-generated speech that sounds human-like, or AI restoring black-and-white pictures to color. These are all examples of generative adversarial networks or GANs in action. Researchers at Nvidia used GANs to generate photorealistic images of fake celebrities from pictures of famous people on the Internet.

Generative adversarial networks were developed by Ian Goodfellow and have been called the most interesting idea of the decade by Facebook's chief AI scientist, Yann LeCun.10 While a full explanation of how GANs work is out of scope for this book, at a high level, this technique is a subset of deep learning that generates new data by pitting two AI algorithms against each other, hence "adversarial."

Deepfakes, where a person in a specific kind of media—like an image, video, sound, etc.—is swapped with another person, are one of the most popular (and dangerous) applications of GANs. GANs have many applications in the VFX and entertainment industries. You can use GANs for various tasks like synthesizing new images from scratch, generating videos (or deepfakes), restoring images, self-driving cars, video game AIs, and sophisticated robotics.
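While the details are out of scope, the adversarial loop itself is compact. Here is a toy 1-D GAN sketch, assuming PyTorch; the "real" data is an invented distribution with mean 4.0, and image-generating GANs follow the same loop at a much larger scale.

```python
# A toy GAN: the generator learns to mimic the "real" data distribution.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(32, 1) * 0.5 + 4.0          # samples of "real" data
    fake = G(torch.randn(32, 1))                   # the generator's attempts

    # Discriminator step: learn to label real as 1 and fake as 0
    d_loss = bce(D(real), torch.ones(32, 1)) + \
             bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator call fakes real
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(256, 1)).mean().item())        # should drift toward ~4.0
```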

Figure 9-10

GANs. (a) Researchers from Nvidia used artificial intelligence to generate high-res fake celebrities. Source: www.theverge.com/2017/10/30/16569402/ai-generate-fake-faces-celebs-nvidia-gan. (b) Style transfer GANs can apply the style from one painting to another image. Source: https://softologyblog.wordpress.com/2019/03/31/style-transfer-gans-generative-adversarial-networks/

Knowledge Graphs

When you search for something like "Mumbai" on Google Search, you'll notice that Google seems to understand that Mumbai is a place and will show you a rich view of places to visit, flights, hotels, etc. This rich view is enabled by a technique called knowledge graphs.

A knowledge graph, also known as a semantic network, represents a network of real-world entities—that is, objects, events, situations, or concepts—and illustrates the relationship between them.11 These rich graphs of information are foundational to enabling computers to develop an understanding of relevant relationships and interactions between people, entities, and events.12 Knowledge graphs provide a lot of economic value and are probably one of the most underrated AI techniques.
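At its simplest, a knowledge graph is a set of (subject, relation, object) facts. Here is a toy sketch in plain Python; the entities and relation names are invented for illustration.

```python
# Facts as triples, queried with a (possibly partial) pattern.
triples = [
    ("Mumbai", "is_a", "City"),
    ("Mumbai", "located_in", "India"),
    ("Gateway of India", "located_in", "Mumbai"),
    ("Gateway of India", "is_a", "Monument"),
]

def query(subject=None, relation=None, obj=None):
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

print(query(relation="located_in", obj="Mumbai"))  # entities found in Mumbai
```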

Figure 9-11

Knowledge graphs. (a) Knowledge panel on Google Search. Source: Google Search on desktop. (b) Representation of Socrates's knowledge graph. Source: https://towardsdatascience.com/knowledge-graphs-at-a-glance-c9119130a9f0

Note

The preceding list contains mainstream AI techniques and is not exhaustive. Some of the techniques described might be subsets of others. AI is an evolving field, and the line between many techniques is blurry. Sometimes researchers might combine two or more approaches to invent a new technique. For example, the DeepMind group combined reinforcement learning—particularly Q-learning—with deep neural networks to create a system that could learn to play Atari video games. The group called their approach deep Q-learning.13

AI Metrics

AI systems are probabilistic and will sometimes make mistakes. Lack of complete information is an insurmountable challenge with real-world AI systems. Very little of our knowledge is completely certain; we don’t know much about the future. However, complete certainty is not necessary for action. We only need to know the best possible action given the circumstances. In other words, it is much more important for AI models to be useful than perfect. AI product teams use some common metrics and terms to measure the usefulness of their models.


To determine the success and failure of a model’s outputs, we need to know when it succeeds or makes mistakes and what types of mistakes it makes. To understand this, let’s take an example of a system that detects donuts and “no donuts.” We can think of detecting a donut as a positive result and detecting “no donut” as negative. There are four possible outcomes:
  1. True positive: The model correctly predicts a positive result, that is, the system detects a donut, and the image actually contains donuts.

  2. True negative: The model correctly predicts a negative result, that is, the AI detects "no donut," and the image doesn't contain a donut.

  3. False positive: The model incorrectly predicts a positive result, that is, the system detects a donut when there is no donut in the image.

  4. False negative: The model incorrectly predicts a negative result, that is, the AI predicts "no donut" when the image actually contains a donut.
We can plot the preceding outcomes in a table. This table is called the confusion matrix and is used to describe the performance of a model. Let's assume we give this detector 100 images: 74 that contain donuts and 26 that don't. In our early tests, we find that of the 74 donut images, our detector recognizes 70 correctly (true positives) and misses 4, calling them "no donut" (false negatives). When classifying the 26 "no donut" images, our detector recognizes 21 correctly (true negatives) and incorrectly flags 5 as donuts (false positives).

Figure 9-12

Confusion matrix of the "donut" detector:

                   Detected: donut         Detected: no donut
Is donut           70 (true positives)     4 (false negatives)
Is not donut       5 (false positives)     21 (true negatives)

Once we know these values, we can use them to derive some standard metrics used by AI teams to understand the performance of their models.

Accuracy

Accuracy is the fraction of predictions our model got right. It is the proportion of correct predictions out of all predictions.

Accuracy = (True positives + True negatives) ÷ All predictions*

*All predictions = True positives + True negatives + False positives + False negatives

In the case of our donut detector, that would be

Accuracy = (70 + 21) / (70 + 5 + 4 + 21) = 0.91 (or 91%)

91%! That’s not so bad. However, there are problems with determining an algorithm’s performance based on accuracy alone. A 91% accuracy may be good or terrible depending on the situation. (It’s not good if a car crashes 9 out of 100 times).14 To understand the AI’s performance better, AI teams use metrics like precision and recall that describe the breadth and depth of results that your AI provides to users and the types of errors that users see.15

Precision

Precision is the proportion of true positives correctly predicted out of all true and false positives. Precision determines how much you should trust your model when it says it’s found something. The higher the precision, the more confident you can be that any model output is correct.16 However, the tradeoff with high precision is that the model might miss out on predicting some correct results, that is, you will increase the number of false negatives by excluding possibly relevant results.

Precision = True positives ÷ (True positives + False positives)

In the case of our donut detector, that would be

Precision = 70 / (70 + 5) = 0.933 (or 93.3%)

If the donut detector was optimized for high precision, it wouldn’t recommend every single image of a donut, but it would be highly confident of every donut it recommends. Users of this system would see fewer incorrect results but would miss out on some correct predictions.

Recall

Recall is the proportion of true positives correctly predicted out of all true positives and false negatives. The higher the recall, the more confident you can be that all the relevant results are included somewhere in the output.17 Higher recall also means the model makes more positive predictions overall. However, the tradeoff with high recall is that the model might predict more incorrect results, that is, you will increase the number of false positives by including possibly irrelevant results.

Recall = True positives ÷ (True positives + False negatives)

In the case of our donut detector, that would be

Recall = 70 / (70 + 4) = 0.946 (or 94.6%)

If we optimized the donut detector for high recall, it would recommend more images of donuts, including ones that don’t contain donuts. The user of this system would see more results overall; however, more of those results might be incorrect.
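All three metrics can be recomputed directly from the confusion-matrix counts. This plain-Python sketch uses the values from Figure 9-12:

```python
# Donut detector counts from the confusion matrix
tp, fn = 70, 4    # donut images: detected vs. missed
fp, tn = 5, 21    # no-donut images: false alarms vs. correctly rejected

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.3f}")
# -> accuracy=0.91 precision=0.933 recall=0.946
```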

Precision vs. Recall Tradeoff

In most real-world scenarios, you will not get a system that both is completely precise and has a hundred percent recall. Your team will need to make a conscious tradeoff between the precision and recall of the AI. You will need to decide if it is more important to include all results even if some are incorrect, that is, high recall, or minimize the number of wrong answers at the cost of missing out on some right ones, that is, high precision. Making this decision will depend on the context of your product and the stakes of the situation. Weighing the cost of false positives and false negatives is a critical decision that will shape your users’ experiences.18 For example, in a banking system that classifies customers as loan defaulters or not, you might be better off making fewer but highly confident predictions. It would be better to optimize this system for higher precision. On the other hand, a music streaming app occasionally recommending a song that the user doesn’t like isn’t as important as showing a large selection. You might optimize such a system for higher recall.

You will need to design your product’s experiences for these tradeoffs. Make sure to test the balance between precision and recall with your users.
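One common way teams tune this tradeoff is by moving the model's decision threshold. The sketch below uses invented confidence scores; raising the threshold raises precision and lowers recall.

```python
# Sweep a decision threshold and watch precision and recall trade off.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2]   # model confidences
truth  = [1,    1,   0,   1,   0,   1,   0,    0]      # 1 = actually a donut

for threshold in (0.3, 0.5, 0.75):
    pred = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(not p and t for p, t in zip(pred, truth))
    print(threshold, f"precision={tp/(tp+fp):.2f}", f"recall={tp/(tp+fn):.2f}")
# 0.3  -> precision=0.57 recall=1.00
# 0.5  -> precision=0.60 recall=0.75
# 0.75 -> precision=0.67 recall=0.50
```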

Figure 9-13

Precision vs. recall tradeoff. (Left) Optimizing for precision: Model classifies no false positives but misses some true positives. (Right) Optimizing for recall: Model covers all true positives but includes some false positives

Note

The preceding are some of the most popular metrics AI teams use. However, there can be various additional metrics specific to certain AI techniques.

AI Capabilities

The AI techniques discussed previously enable ML teams to build capabilities that help product teams solve various user problems. The following are some of the most common capabilities used to build AI products:
  1. Computer vision (CV)

  2. Natural language processing (NLP)

  3. Speech and audio processing

  4. Perception, motion planning, and control

  5. Prediction

  6. Ranking

  7. Classification and categorization

  8. Knowledge representation

  9. Recommendation

  10. Pattern recognition

Computer Vision (CV)

Computer vision (CV) is the ability of AI systems to see by enabling them to derive meaningful information from digital images, videos, and other visual inputs. AI systems can use this information to make predictions and decisions and take action. For example, Airware was a start-up that conducted an automated analysis of drone images for mining and construction industries. They used computer vision on drone imagery to analyze safety compliance, maximize fuel efficiency, calculate stockpile volumes, and recognize storm damage.19 Google Lens uses computer vision algorithms to identify what’s in a picture.

The following are some key applications of computer vision:
  1. Image classification: This is the ability to classify images by parameters like what the image contains, such as objects, people, and other information. For example, an intelligent photo album might use image classification to categorize images into landscapes, photos of people, animals, etc.

  2. Face recognition: Face recognition is a method of identifying an individual using their face. While these systems raise ethical concerns, facial recognition is fairly common, from iPhone's Face ID to government security and surveillance. (A minimal face-detection sketch appears after the figures below.)

  3. Object detection: This is the ability to detect objects in an image, for example, detecting the position of cars and pedestrians. Object detection tells you where objects are present in an image by drawing a polygon, often a rectangle, around them. It is used in many applications like self-driving cars, radiology, counting people in crowds, robotic vacuum cleaners, and even precision farming.

  4. Image segmentation: Image segmentation is similar to object detection, but instead of drawing a rectangle around the detected object in an image or video, you segment its exact boundary. This technique is commonly used in VFX as well as many photo and video editing applications. The ability to blur your background on a video call or change the background to a beach is enabled by image segmentation.

  5. Object tracking: This is the ability to track the movement of objects in a video. You can detect where things are going, like people in a camera feed, microorganisms in a cell culture, traffic, etc.

Figure 9-14

Computer vision applications: image classification, object detection, and image segmentation. Source: https://bdtechtalks.com/2021/05/07/attendseg-deep-learning-edge-semantic-segmentation/

Figure 9-15

Computer vision applications. (a) Face recognition. Source: https://blogs.microsoft.com/on-the-issues/2020/03/31/washington-facial-recognition-legislation/. (b) Object tracking of vehicles in the camera footage. Source: https://aidetic.in/blog/2020/10/05/object-tracking-in-videos-introduction-and-common-techniques/
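Here is the minimal face-detection sketch referenced in the list above, using OpenCV's classic Haar-cascade detector bundled with the opencv-python package (assumed installed); the image file name is a placeholder.

```python
# Detect faces in an image and draw a rectangle around each one.
import cv2

img = cv2.imread("people.jpg")                        # placeholder image path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("people_with_boxes.jpg", img)
```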

     

Natural Language Processing (NLP)

Natural language processing (NLP) is the ability of AI to pull insights and patterns out of written text. NLP is one of the most valuable capabilities for AI systems and is used in many applications like AI assistants, search engines, chatbots, sentiment analysis, translation, etc. For example, voice assistants like Alexa, Siri, Cortana, or Google Assistant use NLP to act on your commands. Everlaw is a cloud-based AI application that helps lawyers research documents and prepare for trials by extracting information from legal documents and highlighting which documents, or which parts of them, are important. It automates repetitive tasks for lawyers and paralegals by identifying documents relevant for trial by topic, translating documents into other languages, clustering documents by topic, and transcribing audio.20

The following are some mainstream applications of NLP:
  1. Text classification: This is the ability to categorize text by its contents. For example, a spam detection tool would use the email's subject and content to classify it as spam or not spam (a minimal sketch appears after the figures below). Text classification can also be used to automatically categorize products by their description or translate restaurant reviews into star ratings.

  2. Information retrieval: Information retrieval enables systems to extract information from unstructured text in documents or websites. This technique is used in various search applications like web search, finding documents on your computer, searching for people on social media, etc.

  3. Entity recognition in text: This is the ability to recognize entities in text. For example, you can use entity recognition to find and extract names, places, addresses, phone numbers, dates, etc. from textual information. Entity recognition is commonly used in chatbots to extract information like names, addresses, and queries to decide appropriate responses.

  4. Machine translation: Machine translation is used to translate text from one language to another. For example, Facebook uses machine translation to translate comments on posts, and Google Translate uses it to translate between languages. The idea of machine translation is sometimes extended to different input-output mappings, like translating words to images, sounds, and other forms of media.

  5. Other applications: You can also use NLP to tag parts of speech like nouns and verbs, analyze sentiment, and more. Grammarly is a writing assistant that uses various NLP techniques to identify mistakes, suggest which parts of the text to change, and highlight which parts to pay attention to.

Figure 9-16

NLP applications. (a) Text classification: The content of the review is automatically classified as a rating. (b) Information retrieval: Searching for groups on Facebook. Source: Facebook app

Figure 9-17

NLP applications. (a) Text entity recognition: The algorithm identifies and tags entities like people, locations, events, etc. in the paragraph. Source: www.analyticsvidhya.com/blog/2021/11/a-beginners-introduction-to-ner-named-entity-recognition/. (b) Machine translation. Source: Google Translate
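Here is the spam-detection sketch referenced in the text classification item, assuming scikit-learn; the example messages and labels are invented toy data.

```python
# Classify short texts as spam / not spam with a bag-of-words model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts  = ["win a free prize now", "meeting at 3 pm",
          "free money click here", "lunch tomorrow?"]
labels = ["spam", "not spam", "spam", "not spam"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["claim your free prize"]))   # -> ['spam']
```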

     

Speech and Audio Processing

This is the ability for AI to convert speech to text and text to speech (TTS) and extract information from audio files. Speech processing is often used alongside NLP in various AI applications like voice assistants, call center analytics, automatic subtitle generation, etc. For example, voice assistants like Alexa, Siri, Cortana, or Google Assistant use speech processing to recognize and convert your voice into text. YouTube uses speech processing to generate automatic subtitles on videos. Spleeter is an open source software made by the music streaming service Deezer that uses speech and audio processing to automatically separate vocals and different instruments from songs.

The following are some typical applications of speech and audio processing:
  1. Speech to text (STT): This technique is used to convert speech from audio into text (a minimal sketch appears after Figure 9-18). For example, Otter.ai is a service that uses STT to transcribe audio files and voice calls automatically.

  2. Wake word detection: A wake word is a special word or phrase meant to activate a device when spoken.21 It is also known as a trigger word or hot word. Voice assistants like Siri or Amazon's Alexa use wake word detection to identify phrases like "Hey, Siri" or "Alexa" to start interacting with them.

  3. Speaker ID: This is the ability of AI systems to identify who is speaking. Speaker ID is commonly used in automatic transcription tools, voice assistant devices meant for multiple users, and voice authentication.

  4. Audio entity recognition: Audio entity recognition detects various types of entities in audio files, like different instruments, people's voices, animal sounds, etc. This technique is commonly used in music production to separate instruments and vocals. Some advanced home theater systems also use a form of audio entity detection to fire different types of sounds from different speakers and simulate surround sound. Audio detection can also help ensure safety in industrial environments by detecting gas leaks, pipe bursts, etc.

  5. Speech synthesis: Also known as text to speech (TTS), this is the ability to generate speech from text. This capability is commonly used in voice assistants to respond to user commands or provide information. It is also valuable for accessibility applications like read-aloud. TTS eases the Internet experience for the one in five people who have dyslexia, as well as for low-literacy readers and others with learning disabilities, by removing the stress of reading and presenting information in an optimal format.22

Figure 9-18

Speech and audio processing applications. (a) Audio entity recognition: Spleeter's service automatically splits an audio track by vocals, instruments, and other sounds. Source: https://deezer.io/releasing-spleeter-deezer-r-d-source-separation-engine-2b88985e797e. (b) Speech to text. (c) Alexa on Amazon Echo uses multiple techniques like wake word detection, speech to text, speech synthesis, and speaker ID, along with other AI capabilities. Source: Photo by Lazar Gugleta on Unsplash
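Here is the speech-to-text sketch referenced in the list above. It assumes the open source SpeechRecognition package is installed; the audio file name is a placeholder, and recognize_google sends the audio to a web STT service.

```python
# Transcribe a short audio recording to text.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("meeting.wav") as source:   # placeholder recording
    audio = recognizer.record(source)         # load the audio into memory

print(recognizer.recognize_google(audio))     # the transcribed text
```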

Perception, Motion Planning, and Control

Robotics is a branch of technology that deals with the design, construction, operation, and application of robots. Primarily used in robotics, perception, motion planning, and control together make up the ability of AI systems to plan and navigate spaces through sensors and to control actuators. An actuator is a component of a machine that enables movement, like a robot's limbs, the steering wheel in a self-driving car, or the propellers on a drone. Perception helps a robot figure out what's around it, motion planning allows the robot to map the best route, and control allows it to send commands to actuators. Robots are used in many industrial applications, like warehouse robots, assembly lines in factories, and delivering products using drones.

While the most common illustration of a robot is one that resembles the human form, our homes, factories, and surroundings are filled with different types of robots. Here are some examples:
  1. A robotic vacuum cleaner scans its environment to create a map of a room along with obstacles. It then plans its motion to navigate and clean the space in the most optimal manner, using its vacuum attachments to perform the task of cleaning the room.

  2. Zipline is a start-up that uses self-flying drones to drop off blood where it is needed. This is especially useful in emergency scenarios where an ambulance or truck might take a long time to arrive or where the destination is dangerous to reach.23

  3. Amazon has more than 200,000 mobile robots working inside its warehouse network.24 Many of these robots carry shelves of products from worker to worker, read barcodes, and pick up and drop off items from and onto conveyor belts.

  4. Self-driving vehicles are essentially autonomous robots on wheels. They use various sensors like cameras, lidar scanners, etc. to build a map of their surroundings, plan their motion, and use actuators like the steering wheel, headlights, and brakes to navigate.

Figure 9-19

Examples of robots using perception, motion planning, and control. (a) Robotic vacuum cleaner. Source: Photo by YoonJae Baik on Unsplash. (b) Robots in a factory. Source: Photo by David Levêque on Unsplash. (c) Google's self-driving car. Source: https://commons.wikimedia.org/wiki/File:Google_self_driving_car_at_the_Googleplex.jpg

Prediction

Prediction is the process of filling in the missing information. Prediction takes the information you have, often called “data,” and uses it to generate information you don’t have.25 This is one of the most valuable and powerful capabilities of AI systems. For example, AI models are used in systems that predict flight prices or the number of orders while managing inventory. A ubiquitous example is the smartphone keyboard that uses predictive models to suggest the next word. Furthermore, predictions can be used to make an investment, detect and correct health disorders, or choose the right word in a situation.
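The keyboard example can be sketched with simple word-pair counts. This toy corpus is invented, and modern keyboards use far more sophisticated models.

```python
# Suggest the next word as the most frequent follower seen in the corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

print(followers["the"].most_common(1))   # -> [('cat', 2)]
```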

The following are some examples of predictions:
  1. Airbnb predicts the probability of getting a booking based on price, date, photos, location, and unique listing features.

  2. Cardiogram is a personal healthcare assistant that detects atrial fibrillation and predicts strokes using data from a smartwatch.26

  3. Instacart's partner app helps personal shoppers predict the best in-store route to collect groceries as fast as possible.

Ranking

You can use AI to rank items, especially when it is difficult to determine a clear ranking logic. Ranking algorithms are used in search engines to decide the order of results. PageRank is an algorithm used by Google Search to rank web pages in search results. Ranking is also used along with other AI applications like product recommendation systems to decide the order of items suggested.
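A minimal sketch of the idea: score each result with a weighted formula and sort. The fields and weights below are invented; production systems typically learn such weights from data.

```python
# Rank search results by a hand-tuned relevance score.
results = [
    {"title": "Donut recipes", "clicks": 120, "freshness": 0.2},
    {"title": "Best donut shops", "clicks": 80, "freshness": 0.9},
    {"title": "History of donuts", "clicks": 200, "freshness": 0.1},
]

def score(r):
    return 0.01 * r["clicks"] + 2.0 * r["freshness"]   # invented weights

for r in sorted(results, key=score, reverse=True):
    print(round(score(r), 2), r["title"])
```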

Classification and Categorization

This is the ability of AI to categorize entities into different sets. Categorization can be used for several applications, like sorting vegetables from a pile, detecting faulty products on an assembly line, or classifying images into landscapes, selfies, etc. in photo apps. Clustering is a common technique for generating a set of categories. For example, an advertising company can use clustering to segment customers based on demographics, preferences, and buying behavior.

Knowledge Representation

Knowledge representation is the ability of AI systems to extract and organize information from structured and unstructured data like web pages, books, databases, real-world environments, etc. This capability enables AI systems to understand relevant relationships and interactions between people, entities, and events. A rich search engine results page (SERP) is an example of knowledge representation in action. For example, when you search for a famous person like “Alan Turing” on Google Search, the result shows you rich information about his date of birth, partner, education, alma mater, etc.

Figure 9-20

(a) Prediction: Google Search automatically predicts the query. Source: Google Search on desktop. (b) Ranking: Flight booking website ranks results by relevance. (c) Classification: Google Photos automatically sorts images by things in them. Source: Google Photos app on iOS. (d) Knowledge representation: Knowledge panel on Google Search. Source: Google Search on desktop

Recommendation

Recommendation is the ability of AI systems to suggest different content to different users. For example, Spotify suggests what songs to listen to next, or Amazon recommends which books to buy based on previous purchases and similar buying patterns. Recommendation is a very valuable capability and is closely related to prediction. You can also use recommendations to surface content that would otherwise be impossible for users to find on their own. Many social media applications like TikTok or Instagram use recommendation algorithms to generate a continuous stream of dynamic content.
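Here is a toy sketch of one common approach: recommend what the most similar user liked. The users and tastes are invented.

```python
# Suggest items liked by the user with the most overlapping taste.
likes = {
    "asha": {"jazz", "blues", "soul"},
    "ben":  {"jazz", "rock"},
    "chen": {"blues", "soul", "funk"},
}

def recommend(user):
    others = sorted((u for u in likes if u != user),
                    key=lambda u: len(likes[u] & likes[user]), reverse=True)
    for other in others:
        new = likes[other] - likes[user]
        if new:
            return new
    return set()

print(recommend("asha"))   # -> {'funk'} (chen's taste overlaps most)
```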

Figure 9-21

Recommendation. (a) Netflix allows users to choose a title from their personalized suggestions. Source: www.netflix.com/. (b) Users can choose from articles recommended for them on Medium. Source: https://medium.com/. (c) Amazon shows multiple choices of results in its book suggestions. Source: https://amazon.in/

Pattern Recognition

This is the ability of AI systems to detect patterns and anomalies in large amounts of data. Detecting anomalies means determining specific inputs that are out of the ordinary. For example, AI systems can help radiologists detect lung cancer by looking at X-ray scans. Banks use software that detects anomalous spending patterns to detect credit card fraud. Market researchers can also use pattern recognition to discover new segments and target users.
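A minimal anomaly-detection sketch: flag card spends far from a customer's historical average using a z-score. The amounts are invented, and real fraud systems use far richer features.

```python
# Flag transactions more than 3 standard deviations from the baseline.
import statistics

history = [42, 38, 55, 47, 50, 41, 39]          # past daily card spends
mean = statistics.mean(history)
sd = statistics.stdev(history)

for amount in history + [480]:                  # 480 is today's odd spend
    z = (amount - mean) / sd
    if abs(z) > 3:
        print(f"possible fraud: {amount} (z={z:.1f})")
```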


Apart from the preceding capabilities, machine learning is also used broadly to find patterns in data. Many of the examples discussed previously relate to unstructured data. The popular press frequently covers AI advancements applied on unstructured data like detecting cats from videos, generating music, writing movie scripts, etc. These examples often appear human-like and are easily understandable by most people.

However, AI is applied at least as much, if not more, to structured data, that is, data stored in a standardized format. Structured data also tends to be more specific to a single organization, and AI on structured data is creating immense economic value for organizations. But since AI advancements on structured data inside companies are harder for the popular media to cover, they are written about much less than advancements on unstructured data.

AI is evolving, and many new techniques will unlock newer capabilities. A large part of what AI teams do is figure out problems to solve with AI and match appropriate techniques and capabilities to the solution. Doing this well requires lots of iterations, time, and effort. Most large AI products combine multiple capabilities to provide a solution. For example, an AI voice assistant might use speech processing to listen for voice commands and convert them to text, NLP to understand and act on the command, and, finally, speech synthesis to respond to the user. Many of the preceding capabilities might be further combined to realize new AI methods. For example, you could extend the idea of machine translation from translating languages to translating images into words through automatic captioning, generating poetry, or even translating words back to images by creating AI artworks.

Summary

Product designers and product managers need to understand what their tech teams do and speak their language to work effectively within AI teams. In this chapter, we looked at some mainstream AI techniques, metrics used to evaluate the success of AI algorithms, and common capabilities of AI. Here are some important points:
  1. There are different ways to implement AI, but at a high level, they are based on two approaches, namely, rule-based and examples-based.
     a. In the rule-based approach, the AI is programmed to follow a set of specific instructions.
     b. In the examples-based approach, learning happens by giving the AI a set of examples of what it will encounter in the real world and how to respond.

  2. A significant amount of progress in AI techniques has been made in the subfield of machine learning (ML).

  3. The following are some of the mainstream AI techniques:
     a. Supervised learning: You train the AI by mapping inputs to outputs, using labeled data to build the model.
     b. Unsupervised learning: There is no labeled data. We teach machines to learn directly from unlabeled data coming from their environments.
     c. Reinforcement learning: The AI learns by trial and error. It is rewarded for positive behavior and punished for negative behavior.
     d. Deep learning and neural networks: For complex cases, like predicting the price of a house based on input parameters like size, the number of rooms, location, distance from schools, etc., learning happens in multiple steps. Each step is a layer in the neural network. Learning with multiple layers is known as deep learning.
     e. Backpropagation: Feedback on a neural network's output propagates back through the layers of neurons. This information is used to recalibrate the model's weights.
     f. Transfer learning: The ability of an AI program to transfer what it has learned about one task to help it perform a different, related task.27
     g. Generative adversarial networks (GANs): A subset of deep learning that generates new data by pitting two AI algorithms against each other. GANs have many applications in the VFX and entertainment industries and can be used for tasks like synthesizing new images from scratch, generating videos (or deepfakes), restoring images, self-driving cars, video game AIs, and sophisticated robotics.
     h. Knowledge graphs: A knowledge graph, also known as a semantic network, represents a network of real-world entities, that is, objects, events, situations, or concepts, and illustrates the relationships between them.

  4. Lack of complete information is an insurmountable challenge with real-world AI systems. Very little of our knowledge is entirely certain; we don't know much about the future.

  5. We only need to know the best possible action given the circumstances. In other words, it is much more important for AI models to be useful than perfect.

  6. AI product teams use some common metrics and terms to measure the usefulness of their models:
     a. Accuracy: The fraction of predictions the model got right, that is, the proportion of correct predictions out of all predictions.
     b. Precision: The proportion of true positives correctly predicted out of all true and false positives. Precision determines how much you should trust your model when it says it's found something.
     c. Recall: The proportion of true positives correctly predicted out of all true positives and false negatives.

  7. In most real-world scenarios, you will not get a system that both is completely precise and has a hundred percent recall. Your team will need to make a conscious tradeoff between the precision and recall of the AI.

  8. The AI techniques discussed previously enable ML teams to build capabilities that help product teams solve various user problems. The following are some of the most common capabilities used to build AI products:
     a. Computer vision (CV): The ability of AI systems to see by deriving meaningful information from digital images, videos, and other visual inputs.
     b. Natural language processing (NLP): The ability of AI to understand natural language by pulling insights and patterns out of written text.
     c. Speech and audio processing: The ability of AI to convert speech to text and text to speech and to extract information from audio files.
     d. Perception, motion planning, and control: Primarily used in robotics, this is the ability of AI systems to plan and navigate spaces through sensors and control actuators. Perception helps a robot figure out what's around it, motion planning allows the robot to map the best route, and control allows it to send commands to actuators.
     e. Prediction: The process of filling in missing information. Prediction takes the information you have, often called "data," and uses it to generate information you don't have.28
     f. Ranking: You can use AI to rank items, especially when it is difficult to determine a clear ranking logic. Ranking algorithms are used in search engines to decide the order of results.
     g. Classification and categorization: The ability of AI to categorize entities into different sets.
     h. Knowledge representation: The ability of AI systems to extract and organize information from structured and unstructured data like web pages, books, databases, real-world environments, etc.
     i. Recommendation: The ability of AI systems to suggest different content to different users.
     j. Pattern recognition: The ability of AI systems to detect patterns and anomalies in large amounts of data.

  9. The popular press frequently covers AI advancements applied to unstructured data, like detecting cats in videos, generating music, or writing movie scripts. However, AI is used at least as much, if not more, on structured data, which tends to be more specific to a single organization and is creating immense economic value.

  10. Most large AI products combine multiple capabilities to provide a solution. A large part of what AI teams do is figure out problems to solve with AI and match appropriate techniques and capabilities to the solution.