To make robots practical, flaws must be removed. To make robots endearing, flaws must be added. – Khang Kijarro Nguyen

In this chapter we offer nine introductory concepts used in the fields of AI and ML. We also discuss the successful application of this knowledge in five core areas. You will see that all of these concepts are minor variations of each other. We present them for the sake of completeness and to create familiarity with the jargon, thinking, and philosophy behind Machine Learning and Artificial Intelligence.

The two chief methods of making inferences from data are rule-based systems and ML, and it is useful for marketing professionals to have some knowledge of both. ML did not replace rule-based systems; rather, it became another tool in the marketer’s toolbox. Rule-based systems still have their place as a simpler form of AI, so marketing professionals can reasonably consider using one or the other, or even both.

Rule-based systems can store and manipulate information for various useful purposes, and many AI applications use them. The term generally applies to systems built on man-made rules: a series of IF–THEN statements, such as IF “A,” THEN “B,” ELSE IF “C,” THEN “D,” and so forth. In terms of real-world applications, a rule-based program might tell a banker, “IF the loan applicant has a credit score below 500, THEN refuse the loan, ELSE offer the loan.” Using a set of data and a set of rules, programmers can build useful marketing tools such as approval programs and recommendation engines. In most cases, rule-based systems require the knowledge of human experts in the given field; that is why expert systems are rule-based.

A downside of rule-based systems is that they can be cumbersome, since a rule needs to be made for each case, and life involves so many special cases. For instance, IF “A” says, “It’s raining,” THEN “B” might say, “Recommend an umbrella.” But what if it isn’t raining very hard?
Or what if the rain is a hurricane with super-strong winds that would break even the toughest umbrella? Or what if the rain is just a brief summer shower of less than five minutes? Or what if the customer lives in a place where it almost never rains? Or what if the customer recently purchased an umbrella? In these cases, recommending an umbrella would be impractical or even foolish.

Another issue with rule-based systems is that sometimes the data changes faster than programmers can create new rules. For instance, a news story reported a major flood on the Hawaiian island of Kauai, where two feet of rain caused major mudslides and destroyed many homes. A return to the Yahoo home page then displayed a sponsored ad for discounted flights to – you guessed it – Kauai. In one way, the ad was “intelligently” based on real-time interaction (a just-read article about Kauai); in another way, not so much: why on earth would a reader want to go there now? This brings us to another issue: a strict reliance on keywords may not be enough to match the right ads to the right realities.

That’s where ML comes in. ML can address the problems inherent in rule-based systems by focusing on outcomes only, as opposed to the entire thought processes of human experts. Where rule-based systems are deterministic, ML systems are probabilistic, based on statistical models. An ML system uses historical data to ask the following question: given what we know about past events, what can we determine about future events? In the future, this type of probabilistic information will be used for better prediction of weather conditions, among other things.

Although ML may be better in the long run, rule-based systems can still be appropriate for faster solutions and workarounds. What’s more, many marketing projects begin by using an expert system, in order to better understand the system itself.
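The IF–THEN logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a production system: the credit-score threshold comes from the banker example, while the umbrella rules and their cutoffs are hypothetical.

```python
# Minimal sketch of a rule-based system, mirroring the banker and
# umbrella examples above. Thresholds and rules are illustrative only.

def loan_decision(credit_score):
    # IF credit score below 500, THEN refuse; ELSE offer the loan.
    if credit_score < 500:
        return "refuse loan"
    return "offer loan"

def umbrella_recommendation(raining, wind_mph, owns_umbrella):
    # Each special case needs its own hand-written rule -- the
    # cumbersomeness the text describes.
    if not raining:
        return "no recommendation"
    if wind_mph > 60:          # hurricane-force wind would break it
        return "no recommendation"
    if owns_umbrella:          # customer recently bought one
        return "no recommendation"
    return "recommend an umbrella"

print(loan_decision(450))                        # refuse loan
print(umbrella_recommendation(True, 10, False))  # recommend an umbrella
```

Notice that every exception (wind, prior purchase, and so on) requires another explicit rule, which is exactly why such systems become cumbersome as special cases accumulate.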
Rule-based systems are still useful for occasions where all decision situations are known in advance, but ML algorithms can adjust the rules for you as they “learn” and improve at the task.

72% of business leaders believe artificial intelligence is a “business advantage.” – “2018 AI Predictions: Practical AI,” PwC, September 6, 2017

An inference engine is an ML system that utilizes automatic rule inference. Put simply, an inference engine applies logical rules to the data in order to deduce future outcomes. A typical ML system (not to mention a typical rule-based system) is made up of three components: a knowledge base, an inference engine, and a user interface. The first inference engines were features of expert systems, meaning human experts were still needed to supply and analyze the data. These days, an ML algorithm can do what human experts do with rule-based systems. With ML, each new data point added to the knowledge base can trigger additional rules within the inference engine. An ML inference engine works by either forward chaining (deducing future outcomes from known facts) or backward chaining (deciding on a goal, and deducing which facts would have to be in place to achieve that goal).

Popular real-world applications of inference engines include classification, chemical analysis, medical diagnosis, financial management, credit authorization, petroleum engineering, and product design, to name just a few. The main difference between a rule-based system and an inference engine is that a rule-based system classifies data according to an input set of rules, whereas an inference engine applies its own rules to existing data.

Listening to the data is important . . . but so is experience and intuition. After all, what is intuition at its best but large amounts of data of all kinds filtered through a human brain rather than a math model? – Steve Lohr, New York Times reporter and part of the team awarded the Pulitzer Prize in 2013, in “Sure, Big Data Is Great. But So Is Intuition,” New York Times, December 29, 2012

A heuristic is experiential knowledge that is captured as an algorithm. Heuristics include psychological shortcuts, or a plausible reasoning process. Basically, heuristics make practical use of our normal human inclination to make things move faster and/or become easier. Sometimes a heuristic solution is as good as (if not better than) the optimal solution, as it speeds up the process while still achieving an acceptable result. A workaround is one example of a heuristic technique. Other examples of heuristic approaches are rules of thumb, trial and error, educated guesses, and good old-fashioned common sense. Most heuristics can be applied to marketing strategies, and heuristics make good ground rules for expert systems.

Warmer weather may result in sales of cold food products, water, fans, and swimwear. This is a simple heuristic grounded in experiential and observational knowledge. It provides an overarching strategic framework that an ML algorithm can then use to figure out precisely how much of each item to stock in a store.

Much of human understanding and knowledge is heuristic knowledge. Some of it is accurate, some of it is biased, some of it is deeply flawed. The inaccuracies are part of the structure and organization of human knowledge. This “fuzziness” creates possibilities, wild extrapolations, and leaps of faith that are usually associated with great human progress. Nothing about Mozart or Einstein, or Steve Jobs or Elon Musk, has been linear. It is this nonlinear, wildly discontinuous trajectory that has been the signature of human progress. These discontinuities are not the result of an ordered, structurally sound logical system, but rather of an inaccurate and incomplete understanding of life and its experiences. The pieces of a kaleidoscope may all be broken, but the images they create are stunningly complete.
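Returning to inference engines for a moment: the forward chaining described above (firing rules whenever their premises are satisfied by known facts, until nothing new can be deduced) can be sketched in a few lines. The facts and rules here are hypothetical, chosen to echo the warm-weather heuristic:

```python
# Toy forward-chaining inference engine. Rules are (premises, conclusion)
# pairs; the engine keeps firing rules until no new facts are deduced.

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire the rule if all premises are known facts and the
            # conclusion is new knowledge.
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical marketing rules in the spirit of the chapter's examples.
rules = [
    (["warm weather"], "demand for swimwear"),
    (["demand for swimwear", "low stock"], "reorder swimwear"),
]

print(forward_chain(["warm weather", "low stock"], rules))
```

Backward chaining would run the same rule set in the other direction: start from the goal “reorder swimwear” and ask which facts would have to hold for it to be deduced.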
Q: How many programmers does it take to change a lightbulb? A: None! That’s a hardware problem.

Whether or not we are aware of it, our thought processes (and the thought processes of all living systems) follow a hierarchical scheme. The decision-making process is approached in a hierarchical fashion; indeed, all learning is hierarchical. It is simply an evolutionary feature of our beings to follow a learning hierarchy of increasing complexity, whether mental or physical in nature. For instance, to learn complex math, you must first master easier math. Or when learning a language, you start by learning the letters of the alphabet, then you learn how to string those letters together to form words, then how to string those words together to form sentences, and so on. The fact that all learning is hierarchical seems obvious when you think about it: we cannot run before we learn to walk.

In 1965, educational psychologist Robert Gagné proposed a learning classification system in which facets of learning were ordered by increasing complexity. He outlined eight increasingly complex types of learning, and hypothesized that each type of learning in the hierarchy depended on having mastered the types of learning prior to it. Gagné’s eight types of hierarchical learning are as follows: (1) signal learning; (2) stimulus–response learning; (3) chaining; (4) verbal association; (5) discrimination learning; (6) concept learning; (7) rule learning; and (8) problem solving. You may notice that the first four types of learning listed above are more behavioral in nature, while the second four are more cognitive.

AI systems utilize these various learning schemata in different contexts. That is to say, handwriting recognition, speech recognition, and face ID may use a different set of features and learning methods than determining the best way to beat traffic in getting from point A to point B. Deep Learning systems, like all learning systems, function within a hierarchy. Hierarchical Deep Learning (HDL, and nothing to do with cholesterol) can be supervised, semisupervised, or unsupervised. HDL systems often involve artificial neural networks.
Current applications of Deep Learning systems include document classification, image classification, article categorization, and sentiment analysis, to name only a few.

Sixty-one percent of those who have an innovation strategy said they are using AI to identify opportunities in data that would otherwise be missed. Only 22% without a strategy said the same. – “62% of Organizations Will Be Using Artificial Intelligence (AI) Technologies by 2018,” Narrative Science, July 20, 2016

In AI, an expert system is basically a database of expert knowledge that incorporates the decision-making ability of a human expert. The system works by way of a series of IF–THEN rules. An expert system is a rule-based system, although not all rule-based systems are expert systems. For example, a chess computer for beginners is a very weak program that “knows” all the rules of the game, and will therefore always make legal moves following a rule-based system; but the program has no strategic or tactical skills, and cannot “learn” from its own mistakes. It may even have additional rules such as “IF the user offers a draw, THEN accept the draw,” or “IF a move is checkmate, THEN play a different move.” Some beginners’ chess programs are designed never to beat the beginner.

There are typically three parts to an expert system: a knowledge base, an inference engine, and a user interface. Applications of expert systems include debugging, design, diagnosis, instruction, interpretation, monitoring, planning, prediction, and repair (among others).

A chief disadvantage of an expert system is the knowledge acquisition process. It can be difficult to get experts to go through all this information, not to mention prohibitively expensive to hire experts for as many hours as it would take them to supply and analyze all the data. What’s more, you may also have to hire a mathematician or a data scientist to write the algorithm.
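A minimal expert-system shell, using the beginner-chess rules quoted above, might look like the sketch below. Unlike the earlier hard-coded if/else example, the knowledge base here is data (a list of condition–action pairs an expert could extend), which is the key structural idea of an expert system; the helper predicates are hypothetical.

```python
# Sketch of a tiny expert system: a knowledge base of IF-THEN rules,
# kept as data and consulted in order by a generic engine.

def decide_move(situation):
    # Knowledge base: (condition, action) pairs supplied by a human
    # expert. The first matching rule wins.
    rules = [
        (lambda s: s.get("draw_offered"), "accept the draw"),
        (lambda s: s.get("move_is_checkmate"), "play a different move"),
    ]
    for condition, action in rules:
        if condition(situation):
            return action
    return "play any legal move"

print(decide_move({"draw_offered": True}))       # accept the draw
print(decide_move({"move_is_checkmate": True}))  # play a different move
print(decide_move({}))                           # play any legal move
```

Capturing a retiring employee’s know-how, as suggested below, amounts to growing that `rules` list interview by interview while the engine stays unchanged.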
Still, in the fields of finance, games, management, marketing, and innovation (to name only a few), today’s best expert systems can outdo the world’s cleverest humans. And why shouldn’t this be so? After all, expert systems don’t have egos, don’t get distracted, and don’t slow down as they grow older, to name just a few nonhuman advantages.

Every time a senior person in a company retires, a library of knowledge, expertise, and learning walks out the door. Instead of conducting “exit interviews” with employees, companies may do well to create expert systems from the employees who are about to leave. These rules, accumulated over the course of time, can provide a valuable historical learning engine that becomes an asset, instead of an exit interview form to be buried in the document repository.

Consumers use more AI than they realize. While only 34% think they use AI-enabled technology, 84% actually use an AI-powered service or device. – “New Research Reveals Deep Confusion About Artificial Intelligence,” Pega, April 4, 2017

Since the dawn of the internet, humans have entered countless billions of data points online. Each data point provides some piece of information. The sum total of all this information is generally what is called Big Data. More commonly, the term refers to the body of data gathered about and associated with a specific area or function: for example, an online retailer’s assembly of information regarding customers’ purchase patterns, or a loyalty card program that tracks consumption and rewards buyers when a certain level of spending is reached.

It is estimated that 90% of the information on the internet has been put there over the past two years. In fact, we now create as much data every two days as was created from the dawn of man to the year 2000. Yet the data keeps increasing exponentially: the internet now holds around 5 zettabytes of data, and data scientists estimate that by 2020 it will hold 10 times that amount.
Big Data refers to the entire collection of all types of digital data, from printed text to databases to sound recordings to images to sensory input and everything else. Big Data involves data sets so massive that traditional data processing systems are unequipped to deal with them. Big Data is characterized by its volume (the sheer quantity of information), variety (the many different types of information), and velocity (the speed with which this information travels).

Today, making sense of this mass that is Big Data depends on advanced methods of data analytics performed by Deep Learning systems. Computers can be taught to identify patterns – by way of image recognition and Natural Language Processing (NLP) – better and faster than any human ever could. The use of Big Data is based on the principle that the more you know about something, the more reliably you can predict future outcomes pertaining to it. As more and more data points are compared, new patterns become apparent, allowing humans (and machines) to make smarter decisions. Outside of the business sector, Big Data has improved crucial human services such as healthcare, education, disaster prediction, emergency response, crime prevention, and food production, to name only a few.

One downside of Big Data is a loss of privacy and the widespread dispersal of personal information. At this point, consistently retaining a high degree of digital privacy would be more difficult than stuffing the air molecules back into a popped balloon. Big Data is here to stay.

Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom. – Attributed to Cliff Stoll and Gary Schubert in Mark R. Keeler, Nothing to Hide: Privacy in the 21st Century (iUniverse, 2006), 112

Data cleansing is the process of finding and correcting (or deleting) irrelevant, corrupt, missing, duplicate, or otherwise useless data from a data set.
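As a toy illustration of that cleansing process, the sketch below works through hypothetical purchase records using only standard-library Python; real pipelines use dedicated tooling, but the steps are the same: remove duplicates, drop corrupt rows, and fill in missing values.

```python
from statistics import mean

# Hypothetical customer purchase records with typical defects.
records = [
    {"customer": "A101", "spend": 42.0},
    {"customer": "A101", "spend": 42.0},   # duplicate entry
    {"customer": "B202", "spend": None},   # missing value
    {"customer": "???",  "spend": -1.0},   # corrupt row (dummy data)
    {"customer": "C303", "spend": 58.0},
]

# 1. Remove exact duplicates while preserving order.
seen, deduped = set(), []
for r in records:
    key = (r["customer"], r["spend"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 2. Drop rows that fail basic validity checks.
valid = [r for r in deduped
         if r["customer"].isalnum() and (r["spend"] is None or r["spend"] >= 0)]

# 3. Fill gaps: impute missing spend values with the mean of known ones.
known = [r["spend"] for r in valid if r["spend"] is not None]
for r in valid:
    if r["spend"] is None:
        r["spend"] = mean(known)

print(valid)
```

Step 3 is one of the gap-filling heuristics discussed later in this chapter; mean imputation is quick, but it can flatten real variation in the data.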
This is a necessary step designed to purify data, so that algorithms can work faster and make more accurate predictions. Reasons for the corruption of data vary; among the most common causes are user error, dummy data, and workarounds. Data cleansing functions may include the enhancement, harmonization, and standardization of data. To perform data cleansing, all incorrect, incomplete, and irrelevant data must be found, then either replaced, removed, or modified, so that the data will be consistent with the other data in the system. Top-quality data must be valid, accurate, complete, consistent, and uniform.

Disadvantages of data cleansing include the high cost, the time it takes, and security issues (data must be shared in order to be cleansed). Still, data cleansing is a necessary step toward optimizing Big Data. Once the data is cleansed, it is important to maintain efficient data management techniques. All new incoming data must conform to the existing knowledge in the knowledge base. That is why a comprehensive data management plan must also include periodic data cleansing, to catch and correct outdated information, among other things.

He uses statistics as a drunken man uses lampposts – for support rather than for illumination. – Andrew Lang, Scottish poet, novelist, literary critic, and contributor to the field of anthropology

Gaps in data are data fields that contain no information. Data gaps can be time-consuming for an algorithm to analyze, and the missing information may also be important to the success of a company. To speed things up and/or provide pertinent information, there are various ways to fill in the gaps in a database; there is no single right or wrong method. Popular heuristic approaches to filling gaps in data include substituting an average or default value, carrying forward the most recent known value, and inferring the missing entry from related fields.

80% of executives believe AI boosts productivity. – Leo Sun, “10 Stats About Artificial Intelligence That Will Blow You Away,” The Motley Fool, June 19, 2016

Learning is a function of neural networks.
Once scientists figured out the architecture of neural networks, certain machines were embedded with artificial neural networks whose rules were followed by learning algorithms. In this way, machines were able to mimic the capabilities of the human nervous system, and Machine Learning was born. ML is basically an application of AI in which the system automatically improves from experience, without having been specifically programmed to do so. A well-written ML algorithm will access data, analyze it, and use it to improve its own performance. This is why we call it learning.

Artificial neural networks are typically trained in epochs; during each epoch, every data point in the training set is presented to the system once. After training, the artificial neural network is able to generalize – that is, to perform sensibly on data it has not seen before. ML methods are available in three basic flavors: supervised learning, unsupervised learning, and reinforcement learning.

ML is based on the expert design of precise and efficient prediction algorithms. These algorithms allow ML to perform two main functions: induction (inferring a general rule from labeled examples) and transduction (labeling specific new data points directly, without deriving a general rule). There are many good reasons why marketing professionals should use ML in their marketing strategies. There is no doubt that ML is fast becoming an essential component of marketing plans; what marketers seek to know is how and when to use it.

Today, just 15% of enterprises are using AI. But 31% said it is on the agenda for the next 12 months. – “2018 Digital Trends,” Adobe, 2018

ML presents marketers with many opportunities, as it features capabilities such as speech recognition, speaker verification, optical character recognition, spam detection, fraud detection, first-rate recommendation systems, biological applications, medical diagnosis, and strategic game expertise, to name only a few. Here are some real-world examples:
By 2018, more than 3 million workers globally will be supervised by a so-called “robo-boss.” – Heather Pemberton Levy, “Gartner Predicts Our Digital Future,” Gartner, October 6, 2015

“How did we not know that?” goes the all-too-common refrain from a large marketer confronted with the surprising (threatening) success of a local brand competitor. Often the reason for this success is speed: the superior ability of the local brand to understand the market in detail; identify latent consumer needs and wants; create products best suited to the market; launch them efficiently; and adapt to changing consumer choices and preferences. It is this overall agility that frequently lies at the core of the “big versus small” contest.

So how can Goliath regain the advantage over David? Increasingly, the answer will be driven by the adoption and savvy application of AI and ML methodologies and resources. These technologies’ ability to conduct deeper and faster market analysis; their superior capabilities in identifying hitherto-unseen trends; their capacity to search out and deliver data that can lead to innovations in product development, formulation, naming, and packaging; and their similar skills at divining marketing concepts, strategies, and even executional ideas are all assets that can level the playing field – and then turn it to a company’s advantage.

By 2020, smart agents will manage 40% of mobile interactions. – Heather Pemberton Levy, “Gartner Predicts Our Digital Future,” Gartner, October 6, 2015

“Social listening” occupies many businesses today – the idea being that if we listen long enough, closely enough, and across enough social media platforms, we will understand where our customers are, what their needs are, and what their aspirations are, and perhaps be better able to respond to them. The problem is that this takes immense amounts of time, and the conversation (monologue, really) changes in real time.
It is always evolving, taking unforeseen twists and turns, going off on tangents. Substitute the word “humans” for “consumers,” and it is clear that this is simply how we behave.

AI and ML automate the social listening process and perform it faster and with better results than other methodologies. Better in the sense that, by using AI and ML, a company can identify what it is looking for, and then find it, with greater accuracy, speed, and relevance. AI and ML excel at this kind of challenge; the technologies are literally built for it.

Even with the growing share of digital advertising we see today, advertising overall remains an expensive proposition. Factor in the hundreds of thousands of dollars spent on creative execution for just one major TV commercial, and then the millions spent on media placement of that spot, and soon you are talking serious money. Every ad is a roll of the dice, in the sense screenwriter William Goldman meant when he famously said that in Hollywood, “nobody knows anything.” All the market research, the focus groups, the online surveys, the test markets, and assorted other means of measurement cannot, and do not, guarantee success. We rely on experience, instinct, our best judgment, and other “soft” metrics to gauge what is worth the risk to produce and place in terms of commercial messaging. Sometimes we hit the jackpot; most other times we achieve a reasonable rate of return on our investment. And then, sadly, other times, not so much.

AI and ML resources and methodologies can inject a meaningful amount of added assurance into this picture. How? Because, when linked to proven tools that measure the effectiveness of certain elements in an ad in terms of their resonance in the non-conscious mind, AI and ML can identify which elements are likely to be the most productive, thereby rendering a more reliable guide as to what to spend money on in a spot.
One specific, and very important, asset that AI and ML bring to the advertising table is metaphor analysis and implementation. Neuroscientists and linguistic experts agree: metaphors are essential ways in which human beings make sense of the world around us and express widely shared, non-conscious truisms about life, love, death, and which deodorant lasts the longest. Okay, admittedly that last bit is a stretch – but for a reason. By understanding what metaphors are and how they work in the non-conscious mind, and then applying that knowledge in concert with sophisticated (proprietary) metaphor databases and AI and ML tools, a marketer can divine powerful ways to tap into and exploit the underlying desires, concerns, and needs of a potential prospect who would be the target for a deodorant product.

In fact, that metaphor- and algorithm-powered system can also tease out possible new product innovations in the personal care category. It can isolate which product formulations and packaging elements (color, scent, consistency, etc.) are likeliest to work best in such a highly competitive and crowded category. It can supercharge the naming development process. It can point toward, and even discover, effective concepts and strategies for point-of-sale promotions. And much more.

Metaphors can be used to communicate core product benefits and attributes in a unique, and uniquely effective, kind of non-conscious “shorthand.” They can do that visually, and they can do that aurally. They can optimize advertising effectiveness, especially since they can be activated in very short time spans. As a result, they can also optimize advertising investments in terms of media buying. They are tools that the sharpest marketers will increasingly put to use for competitive advantage.

Consumer data will be the biggest differentiator in the next two to three years. Whoever unlocks the reams of data and uses it strategically will win.
– Angela Ahrendts, Senior VP of Retail at Apple, “Demonstrating Value and Measuring Success from Data Science,” Capitaresourcing.co.uk/blogs, April 13, 2017

This classic and often-repeated advice from American business consultant Peter Drucker appeared again in Forbes on July 2, 2006: “Because the purpose of business is to create a customer, the business enterprise has two – and only two – basic functions: marketing and innovation. Marketing and innovation produce results; all the rest are costs. Marketing is the distinguishing, unique function of the business.”

Unearthing innovations – in concepts, strategies, products, and messaging – is a function that AI and ML are exceptionally well suited for. Investments in research and development, and in marketing, loom large for many companies. A methodology – especially one founded in hard science – that can rationalize and optimize those investments, offer proven and powerful frameworks for them to operate in, and make them more efficient and effective is exceptionally well worth exploring and exploiting. Algorithmic unearthing of consumer insights and latent desires can be accelerated and “supercharged” in terms of the breadth and depth of learnings that AI and ML can deliver.

Whether we are based on carbon or on silicon makes no fundamental difference; we should each be treated with appropriate respect. – Arthur C. Clarke, 2010: Odyssey Two (Rosetta Books, 1982)

Every marketer of any experience knows the inherent shortcomings of obtaining articulated responses from consumers through focus groups and surveys. While results can be useful, they are inevitably afflicted with the simple human virus of unreliability and uncertainty. Asking consumers what they think, feel, and believe about a product or a marketing message may – or may not – produce answers that actually reflect the truth in every instance.
Consequently, relying on those results for new product development and other innovations may – or may not – end up producing a success. In other words, counting on conscious responses from consumers for the co-creation process remains a somewhat hit-or-miss proposition.

By seeking out and synthesizing consumers’ non-conscious needs and wants, AI and ML methodologies can help reduce uncertainty and “best-guessing.” The sheer volume of data processed is one key factor; another is that the range of resources tapped to divine those non-conscious needs and desires far exceeds what any individual, or even any group of people, could explore. “Co-creating” with the non-conscious mind is a deeper and much more direct path to uncovering the most salient, but unsaid, fertile fields for product and marketing innovation.
Concept 1: Rule-based Systems
Concept 2: Inference Engines
Concept 3: Heuristics
Concept 4: Hierarchical Learning
Concept 5: Expert Systems
Concept 6: Big Data
Concept 7: Data Cleansing
Concept 8: Filling Gaps in Data
Concept 9: A Fast Snapshot of Machine Learning
Areas of Opportunity for Machine Learning
Application 1: Localization and Local Brands
Application 2: Value and Rationalization of Social Media Cost
Application 3: Rationalization of Advertising Cost
Application 4: Merging of Innovation and Marketing and R&D
Application 5: Co-creation