Recent research suggests that the brain builds structures on the fly, in up to eleven dimensions: as it processes information and makes decisions, it assembles neurons into structures of varying shape and complexity. This points to a link between learning and structured intelligence. It matches my experience with brain design, and it matches what we know about teaching. This chapter lays the groundwork for what a brain design is and how to use one to guide learning.
The documentary film AlphaGo tells the story of how AlphaGo, an AI built by Alphabet subsidiary DeepMind, defeated world champion Lee Sedol in 2016 in a five-game series of Go, long considered one of the most strategically complex games ever created (even more so than chess). AlphaGo learned and discovered a system of fuzzy rules and strategies, and with this understanding it competed with, and defeated, human champions. As it tries and learns, the AI is endlessly curious. It never needs a nap, and it will never quit out of frustration.
Have you ever considered just how amazing it is that an algorithm can learn to play a video game with no prior knowledge or understanding, just by looking at the screen and the scoreboard, trying actions, and receiving feedback? Let me give you an example. In 2013, an AI brain learned how to play Atari games just by looking at the screen and the score that resulted from its actions. When I first started experimenting with decision-making AI, I worked down the hall from the creator of Pac-Man.
Imagine you are an AI learning to play this game, with no understanding of what will happen when you interact with each object. You don’t know what your character looks like (you are the yellow chomper; see Figure 3-1), you don’t know what a ghost is, you don’t know what fruit is, and you don’t know what a pellet is. You also don’t understand basic concepts about the world that we take for granted: what a game, a level, a life, or death is. When humans play Pac-Man for the first time, we instinctively understand that the yellow chompy thing is eating, but as an AI, you don’t know what eating is, so you miss the reference.
You are programmed to seek the maximum score over time, and you learn through careful observation. You observe that the score increases when you come into contact with pellets (you earn points in Pac-Man for chomping pellets). Your score goes up even more when you come into contact with fruit. If you chomp all the pellets on a level, you go to the next level, but you don’t know what a level is, so you observe this phenomenon as an ever increasing opportunity to earn more points by coming into contact with round white objects (pellets). Then, there are ghosts. You learn from experience that if you come into contact with ghosts on three separate occasions, your opportunity to earn points in the game is permanently capped (you lose the game).
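To make this concrete, here is a minimal sketch of that kind of learning loop in Python. The tabular value update and the helper names are illustrative only; the actual Atari work used a deep neural network rather than a table, but the feedback loop is the same: try an action, observe the score change, and nudge your value estimates.

```python
import random
from collections import defaultdict

Q = defaultdict(float)                   # estimated value of (state, action) pairs
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate

def choose_action(state, actions):
    # Mostly pick the action with the best estimated value,
    # but sometimes explore at random (assumes `actions` is non-empty).
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def learn(state, action, reward, next_state, actions):
    # Nudge the value estimate toward the observed score change (reward)
    # plus the discounted value of the best follow-up action.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```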
The amazing thing is that AI brains learned to play most Atari games this way, without any understanding of the concepts that we use to orient ourselves to a new task. They even achieved superhuman competence at many of those games! But there’s a problem. Learning effective strategy, even for a simple game like Pac-Man, is a lot more challenging without the concepts that we rely on when playing games. I can easily think of three intuitive strategies for playing Pac-Man.
One strategy is to avoid ghosts. To execute this strategy, you move away from ghosts to open areas of the maze that allow you the most room to move without ghosts nearby.
The second strategy is to trap ghosts. This strategy is a variant of the “avoid ghosts” strategy. You lure or draw ghosts to the corner of the screen, then enter the tunnel which transports you to the opposite end of the screen. This gives you time to eat pellets and fruit before the ghosts catch up to you in your new location.
The third strategy is to eat ghosts. In this strategy, you eat fruit, then pursue ghosts to eat them.
Now imagine how hard it is to discover and use these strategies when you don’t even know what a fruit or a tunnel or a trap is! This might be part of the reason why AI succeeded in learning some Atari games without help but struggled to learn games that require more strategy, like Montezuma’s Revenge, which requires the player to match multiple skills and concepts to the scenarios they perceive.
I hope that you agree at this point that complex tasks often require multiple skills and strategies, and that those skills and strategies are likely dictated by the dynamics of the task itself. Chess strategies work within and because of the game’s own rules. Basketball strategies work within and because of the dynamics of basketball. Here’s the problem: it’s proven quite difficult for humans and AI brains to practice multiple skills simultaneously.
The Ray Interference paper, published by DeepMind in 2019, argues and demonstrates empirically that AI brains that store learning in neural networks get confused and take a long time to learn multiple skills simultaneously. Games like Pac-Man, Montezuma’s Revenge, chess, and Go require many different skills and strategies to succeed. If the task requires multiple strategies, the paper explains, the brain will learn each skill incompletely and sequentially, with long learning plateaus in between, because it doesn’t know where one skill begins and another ends.
The authors’ prescriptions for avoiding this confusion are to not store learning in a neural network (not a good idea, for multiple reasons that we will explain below), to use clever algorithmic learning tricks (which aren’t always feasible), or to teach skills explicitly (I’m translating their conclusion based on what I discuss about teaching in this chapter and book).
If I can teach my AI the skills and concepts that I already know, I can help ensure that the AI succeeds. It might discover the skills and strategies that I already know, but it might not. There’s no way to tell until it tries. Even if my AI does learn these skills on its own, I’ve wasted valuable time and money letting my AI discover things that I could have taught it. For example, DeepMind’s AlphaZero AI discovered the 12 most common opening sequences in chess, some of which were innovated over a thousand years ago. One thousand years! Why would we pay millions of dollars in computation costs for AI to learn strategies as well known as those?
The same thing happened with Go: AlphaGo learned a greedy strategy that any beginning player would be taught. Then the AI discovered more sophisticated known strategies. Then, much like more advanced humans, it began trading off and improvising strategies in sophisticated ways.
The whole point of teaching is helping students learn faster by telling them a bit of what you already know. Imagine if I taught my young son to play basketball by bringing him to a hoop, handing him a ball, and offering him ten cents every time he gets the basketball to go through the hoop. He’ll try and try and try, and he’ll probably become discouraged and quit before he finds a way to get the ball through the hoop consistently.
In fact, this method of teaching would be cruel. Why would I ever “teach” my son how to play basketball without telling him about the jump shot and the layup? These tested methods of launching the basketball through the net require practice, especially to get it right from various distances and speeds. But they provide a known set of skills to structure the practice without forcing players to reinvent (or rediscover) the wheel.
To shoot a layup, the player approaches the hoop, jumps from one leg in stride and extends the ball in their hand toward the hoop. The layup shot is a good strategy when the player is close to the basket, to ensure high chances of making the shot. I would likely teach this skill step by step through demonstration, then (most importantly) instruct my son to: (1) shoot layups when close to the basket, and (2) practice layups on both sides of the hoop.
The jump shot is used to make shots from farther distances (Figure 3-3). I would likely teach this skill by breaking it down into phases (we will discuss this later) and having my son practice the phases separately. I would then instruct my son to (1) shoot the jump shot when farther from the basket, and (2) practice jump shots from various distances. See Figure 3-4 for some examples of where on the court you might use each strategy.
I know what some of you are thinking: Kence, by doing all this teaching, aren’t you biasing the AI’s explorative practice and preventing it from discovering something creative? Yes, teaching does indeed bias exploration based on what the learner already knows about successfully completing the tasks; no, good teaching does not prevent learners from exploring creatively and discovering new paradigms. For example: Michael Jordan, generally considered the best basketball player of all time, innovated many widely imitated techniques. Did learning the jump shot and the layup stifle Jordan’s genius? No way! In fact, learning the jump shot allowed him to explore ways to refine it for specialized situations without having to relive and rediscover its entire evolution.
Let’s take another example: the composer Wolfgang Amadeus Mozart (1756-1791) is widely considered a genius of Western music. Mozart lived and composed in the Classical period, just after the Baroque period. Did learning harmonic conventions from the Baroque period (from Bach and other composers) prevent him from innovating Western music theory? Just the opposite, actually: Mozart was able to stand on the music theory he had learned, then innovate from there. Vygotsky describes this phenomenon in his theory of Zones of Proximal Development and in his work on Scaffolding. On the other hand, if you reinvent or rediscover the wheel, you have to go through the entire evolutionary process because there’s no existing knowledge to build on.
The fact that AIs playing chess and Go against themselves discovered some of the same decision-making policies that humans use suggests that these strategies couldn’t simply be artifacts of human thinking: they tap into fundamental dynamics dictated by the structure of the game itself. The same is likely true for the jump shot. As a strategy, it leverages the fundamental anatomical relationships between human fingers, hands, arms, and shoulders to make it easier to launch a basketball through a circular hoop. If human anatomy were different, that anatomy would dictate different strategies as most effective for launching basketballs.
Remember the brain that controls rock crushers from chapter 1? I suggested to my co-designer, a data scientist at a mining company, that a “black box” AI could practice on a virtual crusher simulation and come up with some very creative ways to operate the crusher better. The drawback would be limited visibility into why the AI did what it did. I explained to him that if you want an AI that can “tell you what it’s doing” in terms humans understand, you should have it practice known skills separately, then practice how to combine them. He replied, “I really, really like this decomposition approach. While I understand that a monolithic concept might come up with really novel strategies, the people and process concerns require decomposition.”
Allowing a black-box AI to practice without any instruction about which skills to practice would be like SCG saying to its boardmen: Go ahead, try whatever you want and use whatever method works best. That may work for video gamers and home bakers, but not for real decisions that have to produce a consistent product, win or lose real money, or preserve the safety of real people’s lives.
Pitak and the team at SCG told me they had come across strategies that could produce fantastic results when used at the right time, but could also damage equipment when used in the wrong way or in the wrong scenario. One of those strategies is called overshoot. You run the reaction at a higher temperature and pressure than normally advised. When you temporarily exceed guidance under just the right conditions, the reaction runs faster and better. If you use the overshoot strategy at the wrong times, you pointlessly stress the reactor because you don’t increase the speed or quality of the reaction. This is similar to sitting with bad ergonomics. Things that are under repetitive stress eventually break. That’s why these strategies are reserved for experts who are trained to use them properly.
Now, imagine a black box AI practices on a virtual chemical reactor, discovers these strategies, and begins using them. You have no way to know whether it is carrying them out as safely as an expert would. You couldn’t allow that in a workplace; you’d have to take it offline for the sake of your employees’ safety and your valuable equipment.
What if, instead, you design an AI (like the one in Figure 3-5) that learns Strategies 1 and 2, practices them separately, and then practices when to use each strategy? You’d have a way to validate that this AI can indeed behave like an expert operator. Every time it makes a decision, it “tells” you not just what it wants to do, but which strategy it is using to make that decision. Now you can follow its logic.
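To make the idea concrete, here is a minimal sketch of that design. All names are hypothetical, not the actual brain from Figure 3-5; what matters is the shape: each strategy is its own separately trained policy, and a selector chooses between them, so every action comes labeled with the strategy that produced it.

```python
# A sketch of a strategy-selector brain (all names hypothetical).
def brain(state, normal_strategy, overshoot_strategy, selector):
    strategy = selector(state)              # which strategy fits this scenario?
    if strategy == "overshoot":
        action = overshoot_strategy(state)  # expert move, used sparingly
    else:
        action = normal_strategy(state)     # safe default operation
    return action, strategy                 # the action and the "why"
```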
This process of breaking tasks down into separate skills and strategies to practice, then building them back up into complete tasks by practicing how to use the skills together, is teaching.
One of my favorite movies of all time is The Karate Kid (1984). In this movie, set in Los Angeles, Mr. Miyagi, a karate master from Okinawa, Japan, teaches the karate blocking system to a teenager named Daniel using a really interesting method. On the first day, Mr. Miyagi invites Daniel to his house and shows him how to wax a car using a very specific circular arm motion. Then he instructs Daniel to wax his collection of cars. This is the exact circular motion required to perform soto uke, the inward middle-forearm block, but Daniel doesn’t know that. The next day, Miyagi teaches Daniel a specific technique for painting fences (which happens to be the exact up-down motion required to perform age uke, the Rising Block) and then asks him to spend all day painting a fence. The third day is house painting with a side-to-side motion, and on the final day of his initial training, Daniel sands a wooden deck with the motion required to perform the circular block kagite uke.
Daniel, angry that he “hasn’t learned any Karate” but only has “done yard work for his Karate Master,” threatens to quit his lessons. Miyagi calls him back and demonstrates how the motions he’s practiced are used together in blocking. In a very short practice sequence, Daniel is able to assemble the skills into a coherent blocking system.
AI researchers did something very similar when teaching an AI to grasp and stack blocks with a robotic arm. They taught each of the skills separately: moving the arm laterally, extending the arm, orienting the end effector (hand) around the block, grasping the block, and stacking the block. After mastering each of these skills, the AI learned to combine them extremely quickly (22,000 practice iterations, compared to a similar Google experiment where the brain took 1 million practice iterations to learn the task).
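As a sketch, that teaching pattern looks something like the following. The `train` helper, the `policy` placeholder, and the skill names are hypothetical stand-ins, not the researchers’ actual API; the point is the sequencing: practice each skill alone, then practice assembling them.

```python
# Curriculum sketch (all names hypothetical).
def train(policy, task, iterations=10_000):
    """Stand-in for a practice loop that improves `policy` on `task`."""
    print(f"practicing {task} for {iterations} iterations")

policy = {}  # placeholder for whatever stores the learned behavior

for skill in ["move_laterally", "extend_arm", "orient_hand", "grasp", "stack"]:
    train(policy, task=skill)                    # master each skill alone first

train(policy, task="grasp_and_stack", iterations=22_000)  # then combine them
```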
So, it turns out that this architectural skill of designing autonomous AI that I practice with all these companies, that I show to professionals all around the world, and that I outline in this book is about teaching: breaking down tasks into component skills, orchestrating how they relate to each other, and sequencing them for practice. This approach, which we call machine teaching, has enabled mechanical, chemical, aerospace, and controls engineers with no previous AI experience to design AI brains that use their expertise to make high-value industrial decisions.
In the first era of computing (before the groundbreaking work of mathematician Alan Turing (1912-1954) on computer programming and reusable algorithms), you needed a unique machine for each calculation or type of decision you wanted to make. So, you might build one automaton to write a word but need to build another automaton to ride a bicycle and yet another to play a tune on a piano. In fact, you needed a separate machine roll to play each song on a player piano.
Along comes Turing, who builds a master machine that can accept separate programs containing instructions for each task. Now I don’t need a separate machine for each task; I can simply write a new set of instructions. The code-breaking machines Turing designed at Bletchley Park accepted programmed instructions for breaking Nazi Germany’s Enigma codes. This development ushered in the age of algorithms.
| Era of Intelligence | Scope of Intelligence | Examples |
|---|---|---|
| Machine Intelligence | Build a new machine to provide intelligence for each task. | Automaton, IBM Voting Machine |
| Algorithm Intelligence | Write a new algorithm to provide intelligence for each task. | Turing Computer |
| Teaching Intelligence | Teach a learner each new task. | AlphaGo, Tesla AI |
Since then, almost every new software application (or example of machine intelligence) has been labeled an algorithm, even now that algorithms can learn. In the algorithms paradigm, each and every brain that we design would be a “new algorithm,” but in the teaching paradigm, it’s just the result of one of the many, many possible learning schemes developed by the master teacher. Algorithms tell you exactly how to do things; skills make you practice different ways to do something well under all sorts of conditions. Teach the AI what you already know and let it learn what you don’t know yet. So, now that we’ve talked about the merits and characteristics of Autonomous AI, what is a brain design exactly?
There is a big difference between teaching and doing. It’s the difference between defining a sequence of skills that someone can practice to learn how to do something and programming an algorithm to make all the decisions. It’s also the difference between helping others succeed and being the star of the show. Teaching is an underrated skill, and when it comes to brain design, architecting Autonomous AI is much more like teaching than programming; it requires the curiosity of a learner more than the knowledge of an expert.
Most of us associate the skill required for designing advanced AI with programming, but as we discussed in “We’re entering an era of teaching intelligence (skills and strategies)”, teaching is required to train even machine systems that can learn. Teaching is a skill in its own right, and an undertrained one: even teachers at the university level are expected to research but are rarely trained in how to teach.
The Last Dance is a 2020 American sports documentary miniseries about the career of Michael Jordan, with particular focus on his final season with the Chicago Bulls. The series depicts Jordan as a star performer with high expectations for himself and his teammates. Over time he seemed to develop some coaching skills to bring out the best in others, but his primary persona was the star performer. Star performers don’t make good brain designers. Data scientists, be careful: some of the data scientists I’ve worked with have too much of the star performer in them to be good brain designers. The temptation will be to write your own algorithm that dictates exactly what the AI should do and how it should learn. A good brain designer needs to be willing to learn and able to teach (outline skills for practice and self-discovery).
The player coach started off as a subject matter expert but gained significant interest and expertise in coaching over time. Though Michael Jordan was the star of The Last Dance, Phil Jackson was my favorite character. He won two championships as a player with the New York Knicks (an American basketball team), which gives him tremendous credibility as a basketball expert, and a whole episode of the series shows how fascinated Jackson was with coaching and how much he studied under one of the great coaches in the game. I also consider Steve Kerr a great example of a player coach. He is an eight-time NBA champion, having won five titles as a player (three with Michael Jordan and the Chicago Bulls and two with the San Antonio Spurs) and three with the Golden State Warriors as a head coach. Player coaches can make the absolute best brain designers: experts who can teach and who have lots of street cred with other subject matter experts. That’s a winning combination.
The professional teacher might not be a subject matter expert at all. Their expertise is teaching itself. They are curious, inquisitive, and ask great questions. They not only absorb information quickly, but they can break down what they learn into component parts and quickly organize those parts based on what they hear. This is my experience as a brain designer: I’m rarely the expert in the process at hand, so I have to rely on my skills as a teacher and my willingness to learn.
One quick word for those of you in the learner category: you have limited subject matter expertise in the area you will design brains for, and you have limited expertise in teaching. Learn to teach. The rest of this book will provide a framework for interviewing subject matter experts and teaching what they know to AI.
My favorite aspect of brain design is the wide variety of different processes and systems I’ve designed AI for. I’m certainly no expert in bulldozers, extruders, drones, gyratory crushers, drilling equipment, warehouse logistics or robots but I love a challenge, learn fast, and ask good questions. If you take a similar approach, armed with the framework in this book, you’ll design some amazing brains of your own.
I love maps, especially old maps. There’s something intriguing to me about the visual representation of geographic information. Maps tell us about familiar territories and which areas are yet unexplored. Systems and processes are kind of like geographic landscapes. Some of the decision landscape has been explored and recorded in maps (think process guidelines, procedures, and specifications), but some of the decision landscape is unexplored territory. Learning how to perform a task (acquiring a skill) is like exploring an area of the decision space; teaching tells you what people already know about how to acquire that skill (navigate the decision space); and brain designs are maps of the decision space that guide exploration.
Before we settle on what a brain design actually is and how to use it, we need to discuss in more detail how humans and machines make decisions.
When humans start practicing a skill, or when machines use math, menus, and manuals to make decisions, they have very limited information about the space they are searching for solutions in. It’s as if we’re searching for the highest point of elevation on a landmass, but we don’t have a map. We don’t know where the boundaries of the landmass are, we don’t know what the geographic features are, and we most certainly don’t have step-by-step instructions for reaching the highest point of elevation. The one thing that we do know is the elevation of the point where we stand (as in Figure 3-8).
Let’s assume for a moment that every step you take travels the same distance. Each time you make a decision, you get feedback on the new elevation where you now stand.
Each automated decision-making method decides which step to take slightly differently. Remember the PID gains from Math, Menus, and Manuals? Each of those gains (the P, the I, and the D in PID) is a number, a mathematical code for how the controller should respond to feedback. So, when making the next decision (taking the next step), math consults its model of the world and calculates the next move based on its gain values. That’s like moving along directional paths such as the green dashed lines in Figure 3-10.
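As a rough illustration, here is a minimal PID controller sketch in Python. The structure is the standard textbook form; the names and any tuning values you would plug in are illustrative.

```python
# A minimal PID controller sketch (names and tuning values illustrative).
class PID:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd  # the gains: the P, I, and D
        self.setpoint = setpoint                # where we want the system to be
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, measurement, dt):
        # Error between where we want to be and where we are now.
        error = self.setpoint - measurement
        self.integral += error * dt                   # accumulated past error (I)
        derivative = (error - self.prev_error) / dt   # rate of change of error (D)
        self.prev_error = error
        # The gains encode how strongly to respond to present,
        # past, and predicted error; the sum is the next move.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```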
Optimization systems make the next decision by testing out potential next moves, measuring the altitude of each potential destination, then taking the step that leads to the highest elevation. You need to decide how many options to test before deciding: should you test 4 times at 90-degree angles, 8 times at 45-degree angles (that’s what’s shown in Figure 3-10), or 360 times at 1-degree angles? Expert systems prescribe either exactly what move to make or landmarks to watch out for, based on previous exploration experience. Opening sequences in chess prescribe the exact first few moves to make because the landscape is very treacherous and there are many early moves that could result in quick losses. The opening sequences look like the four prescribed initial decisions marked by yellow arrows in Figure 3-10.
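Here is a sketch of that kind of optimization step, assuming a hypothetical `elevation` function stands in for whatever feedback you can measure about a candidate move.

```python
import math

def next_step(x, y, elevation, step=1.0, n_directions=8):
    """Greedy local search sketch: test candidate moves at evenly spaced
    angles (8 tests at 45 degrees here) and take the one with the
    highest measured elevation."""
    best_x, best_y, best_z = x, y, elevation(x, y)
    for i in range(n_directions):
        angle = 2 * math.pi * i / n_directions
        cx, cy = x + step * math.cos(angle), y + step * math.sin(angle)
        cz = elevation(cx, cy)
        if cz > best_z:
            best_x, best_y, best_z = cx, cy, cz
    return best_x, best_y

# Example: one step on a toy elevation surface peaking at (3, -1).
print(next_step(0.0, 0.0, lambda x, y: -(x - 3) ** 2 - (y + 1) ** 2))
```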
Humans use aspects of each of these methods but far surpass any of them. That’s the fun part. Learning systems don’t need to be spoon-fed every decision they make. Whether human or AI, learning systems explore well through self-discovery. The most efficient way to skill them up is to guide their exploration, not micromanage them.
Figure 3-11 shows us two important landmass features that the explorer can’t see. The high mountain peak is east; that’s helpful to know. There’s also a big mountain range between the explorer and the goal peak. A skill provides guidance for practicing, exploring, and completing a complex task. In this case, we provide two skills.

The first skill is to travel east. This is far from step-by-step instructions for how to reach the goal. It provides valuable, easy-to-retain information about landmass features without requiring exact coordinates or limiting exploration of creative routes.

The second skill is to find and travel through a mountain pass. Mountain passes are low points between mountains, so traveling through one is the exact opposite of the long-term goal: find the highest point on the landmass. But because of what we know about the landmass from previous exploration, it provides an invaluable clue. Without this skill, the explorer will ascend one of the lesser mountains in the central range and declare success on the overall task. Optimization experts call this a local maximum (a peak that’s high, but not the highest on the landmass), and for this landmass, that’s a big mistake.

There are also three different mountain passes. The skill doesn’t presume to dictate which mountain pass the explorer should navigate. We don’t know enough about the terrain between the origin and the mountain range to tell exactly what route to take as we travel east, never mind which mountain pass we will think best after we explore and navigate that space.
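If you wanted to hand these two skills to a learner, the guidance might be as simple as the sketch below (names and state flags are illustrative). Note how little it dictates about the actual route.

```python
# Skill-guidance sketch (all names illustrative): name the skills and
# when each applies; leave the route within each skill to the explorer.
def active_skill(state):
    if state["at_mountain_range"] and not state["crossed_range"]:
        return "find_and_travel_through_a_pass"  # seek a low point through the range
    return "travel_east"                         # otherwise, keep heading toward the peak

print(active_skill({"at_mountain_range": True, "crossed_range": False}))
```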
This whole exploration thing can be frustrating to the human explorer and the onlooking observer. Why can’t I just get a map of the entire terrain, then figure out the best way to get to the highest peak? Not so fast! The only way to get a map that detailed is to explore and record the entire landscape. Remember checkers? Computers scoured that decision-making landscape for 20 consecutive years and recorded 500 billion billion points. Checkers is tiddlywinks (a very simple childhood game) compared to your real-life problem. That’s why you can’t just get a map of the space you need to make decisions for.
You can’t explore the entire space ahead of time, but you can scout out the local surrounding area a few moves in advance. Model Predictive Control (MPC) looks at least one move ahead when making decisions, and modern Autonomous AI like AlphaGo, AlphaGo Zero, and AlphaZero can’t function without scouting many moves ahead. These AIs use a technique called Monte Carlo Tree Search to navigate large decision spaces like those of chess, shogi, and Go.
You can use trees to search many moves ahead. The first set of branches on the tree represents options for your next move, and each further branch represents possible future moves after that. The idea is to keep looking at further moves until you reach the end of the game. When you do, there will be paths through the branches (called lines in chess) that lead to winning outcomes and others that lead to losing outcomes. If you can make it that far through the search, you’ll have lots of winning lines that you can pursue. The average chess game lasts 20 to 40 moves, but the full game tree for chess has more branches than there are atoms in the universe. That doesn’t mean you shouldn’t use look-ahead moves, though. There are many ways to pare down the number of branches that you need to examine to reach winning outcomes.
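One classic (non-random) way to pare the tree down is a depth-limited search, like the minimax-style sketch below. Here `moves`, `play`, and `score` are hypothetical hooks for whatever game you are searching; rather than reaching the end of the game, the search stops at a fixed depth and evaluates the frontier.

```python
def best_move(state, depth, moves, play, score):
    """Pick the move whose subtree looks best after a fixed-depth search.
    Assumes the starting state has at least one legal move."""
    def value(s, d, maximizing):
        options = moves(s)
        if d == 0 or not options:
            return score(s)  # evaluate positions at the search frontier
        child_values = [value(play(s, m), d - 1, not maximizing) for m in options]
        # We pick our best option; the opponent picks our worst.
        return max(child_values) if maximizing else min(child_values)
    return max(moves(state), key=lambda m: value(play(state, m), depth - 1, False))
```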
Monte Carlo Tree Search randomly samples branches of the tree for as long as you have time and compute power to spare. The World Chess Federation (FIDE, from the French name Fédération Internationale des Échecs) specifies 90 minutes for each player’s first 40 moves, then 30 more minutes to finish the game. Within each move’s time budget, AlphaZero samples tens of thousands of positions per second, a tiny fraction of the tens of millions that traditional engines like Stockfish examine. Depending on chance, the algorithm might or might not find a winning line during the search between each of its moves. Professional chess and Go players alike say that the AI has an “alien playing style,” and I say that is because of the randomness of Monte Carlo Tree Search. You see, when the search algorithm finds a line, it chases it and will do absolutely anything (no matter how unorthodox or sacrificial) to follow it. Then, depending on the opponent’s play, the algorithm may pick up a new line with seemingly disjointed, unorthodox moves and sacrifices. Some of these lines are brilliant, creative, and thrilling to watch, but they are also at times erratic. Before we move on, let’s give credit where it’s due: in a 1,000-game match, AlphaZero beat the de facto standard in machine chess (Stockfish) decisively, winning 155 games and losing only 6.
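The sketch below shows the simplest random-sampling cousin of this idea, sometimes called flat Monte Carlo: play random games to the end from each candidate move and keep the move that wins most often. Full MCTS adds tree statistics that bias later samples toward promising branches, and AlphaZero further guides the search with a neural network; the game hooks here are hypothetical.

```python
import random

def monte_carlo_move(state, moves, play, is_over, winner, me, rollouts=1000):
    """For each candidate move, play many random games to the end and
    keep the move that wins most often (flat Monte Carlo sketch)."""
    def rollout(s):
        while not is_over(s):
            s = play(s, random.choice(moves(s)))  # random line to the end
        return 1 if winner(s) == me else 0
    wins = {m: sum(rollout(play(state, m)) for _ in range(rollouts))
            for m in moves(state)}
    return max(wins, key=wins.get)
```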
Humans look ahead, but not this way. Psychology research on chess players shows that expert chess players focus on only a small subset of the total pieces and review an even smaller subset of tree branches when scouting ahead for options. How are they able to look many moves ahead without randomly exploring so many options? They’re biased. Many options don’t make sense as strong chess moves, others don’t make sense based on the strategy that the player is using, and others still don’t make sense based on the strategy that the opponent is using. I suggest that a promising area of research is using human expertise and strategy to bias the tree search (only explore options that match the current strategy).
The reconnaissance procedure above depends on discrete actions and certainty about what will happen when you take an action. Many problems that you work on will have to deal with uncertainty and continuous actions. Almost every logistics and manufacturing problem that I’ve designed a brain for displays seasonality. Much the same way that tides ebb and flow and the moon waxes and wanes in the sky, seasonal variations follow a periodic pattern. Here are some examples of seasonal patterns of uncertainty (a quick simulation sketch follows the list):
Traffic
Seasonal demand
Weather patterns
Wear and replacement cycle for parts
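A toy simulation of this kind of uncertainty might look like the following sketch: a predictable periodic shape with unpredictable noise on top. All the numbers are illustrative.

```python
import math
import random

def seasonal_demand(day, base=100.0, amplitude=30.0, period=365.0, noise_sd=10.0):
    """Periodic pattern plus random noise: the shape is predictable,
    the exact values are not."""
    seasonal = amplitude * math.sin(2 * math.pi * day / period)
    return base + seasonal + random.gauss(0, noise_sd)

# A year of simulated daily demand that a brain might practice against.
year = [seasonal_demand(d) for d in range(365)]
```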
After you factor in the incomplete information on the maps of most decision spaces, the cost to scout ahead, and uncertainty, it turns out that teaching skills is a more efficient way to help learners find good navigation paths than giving them step by step instructions. Skills guide the exploration, even when most of the features of the landmass are still unknown.
Russian psychologist Lev Vygotsky (1896-1934), known for his work on the psychological development of children, posited two theories that help us better understand the function and purpose of brain designs: Zones of Proximal Development and Scaffolding.
Vygotsky defined it as follows: “The zone of proximal development is the distance between the actual developmental level as determined by independent problem solving and the level of potential development as determined through problem solving under adult guidance or in collaboration with more capable peers.” He is saying that there is a gap between what we can learn from self-discovery and exploration and the more advanced skills that we could acquire with the help of someone who knows the landscape better (a teacher or a capable peer).
Brain designs enable more advanced skill acquisition in Autonomous AI by providing maps for exploring decision landscapes.
Scaffolding describes the process of teaching and learning concepts sequentially in an order that better facilitates skill acquisition. Learning to ride a bike is a good example. At first, a child rides a bike with training wheels to help the bike stay upright. Next, the child rides without training wheels and an adult may run alongside the bicycle helping the child to steer and balance. Finally, the adult steps aside after the child learns to balance well on their own.
If the mighty human mind needs teaching and scaffolding, then so do limited algorithms that can change their behavior but have limited reasoning and even more limited pre-baked concepts of the world to rely on when learning specific tasks. No modern human can say they’ve gained all of their current skills through self-guided practice alone, without teaching. That’s why brain designs are so important.
Decisions (actions you take) are like routes across geographic terrain. You must explore the terrain to understand the geography, much like in the video game The Legend of Zelda: Breath of the Wild. Some areas are easier to explore than others. Take, for example, the landmass in Figure 2-6. This terrain is safe and easy to explore any way you want. If you are looking for the location marked x, you can approach it from many directions using many routes.
Take Figure 3-13 as a counterexample. This landmass dictates very specific exploration in order to arrive at the point marked x from any of the blue arrows. The blue arrows represent the current state of a system or process, and the red x marks the state you want the system to be in after you make a series of sequential decisions. The route represents the sequential decisions that will take you from origin to destination. There are many ways to explore this landmass that would never lead to the destination (many invalid routes, assuming that you can only travel over land). A useful guide for exploration would communicate two critical exploratory steps in sequence:
Explore different ways to get to the confluence where the land masses meet.
Explore different ways to get to the point marked by a red x.
Note that these steps are not step by step instructions for how to reach the target. They are skills, strategies to practice reaching the target from various starting points. So, this brain design, at a high level, looks a lot like the brain design above for baking and for making plastic at SCG.
Each point on the map represents an outcome for your system or process. Consider a much older Atari game called Lunar Lander. Many who never played it know it anyway, because it became a benchmark for Autonomous AI. In this game, you control the side thrusters and the bottom thruster of a spacecraft. Your objective is to land it quickly but safely between two flags that mark the landing zone.
There are many paths that you can take through the decision landscape to get from the current state of the craft to the landing state. Note that the state of the craft includes the horizontal and vertical position, angle, velocity, and angular velocity (spinning) of the craft. From the origin state in Figure 3-16, a pilot could take path 1 through the state space by tilting first, moving horizontally, then landing or path 2 which swoops down to land in one motion.
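For the curious, here is roughly what interacting with this benchmark looks like using the open source Gymnasium library (the environment id and API shown are current as of recent Gymnasium releases; older Gym/Gymnasium versions register it as "LunarLander-v2"). A random agent stands in where a trained brain would go.

```python
# Lunar Lander benchmark sketch using Gymnasium (requires the Box2D extra).
import gymnasium as gym

env = gym.make("LunarLander-v3")  # use "LunarLander-v2" on older releases
obs, info = env.reset(seed=0)
# obs is the craft's state: x/y position, x/y velocity, angle,
# angular velocity, and two leg-contact flags.
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # placeholder: random thruster firing
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
env.close()
```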
Here are two brain designs that allow an AI, or even a human pilot, to practice landing from different starting states using the routes prescribed in the brain design.
So it turns out that maps are a great analogy for decision-making. Remember, the whole idea here is not that we learn to make better decisions ourselves (though that might be a nice side benefit). The goal of this book is to teach you how to design an AI system that can make good decisions for completing specific tasks. The design of this brain is best compared to a mental map with landmarks. For algorithms that learn (change behavior and adapt to feedback), this mental map guides the algorithm’s exploration and self-discovery of the decision-making landscape. Even for systems that calculate decisions (math), systems that search but don’t change their behavior (menus), or expert systems (manuals), this brain design provides a helpful palette for you as a brain designer to organize and combine techniques. In the next chapter, I will define the building blocks for brains and provide a framework for organizing these building blocks into brains.