Chapter 4. Building Blocks for AI Brains

My 8-year-old son, Christien, loves to play with Lego Blocks. He can play for hours building cars, jets, and landscapes. I enjoy building things together with him. Sometimes, when we are building, we need just the right piece to complete a section. So, we search through large bins, trolling for just the right piece for the job. When we find a block that performs the right function, the whole structure comes together nicely.

The same is true for AI brains. Autonomous decision making that works in real life doesn’t magically emerge from a monolithic algorithm: it is built from building blocks of machine learning, AI, optimization, control theory and expert systems.

Here’s an example. A group of researchers at UC Berkeley, under Pieter Abbeel, taught a robot how to walk. This robot, Cassie, looks a little like a bird with no torso (just legs). The AI brain that they built to control the robot snaps together decision-making modules of multiple different types and orchestrates them in a way that makes sense with what we know about how walking works. It combines math (control theory), manuals (expert systems), and machine-learning AI modules to enable faster learning of more competent walking than any of those decision-making techniques could achieve on their own.

Figure 4-1. Brain design of AI that controls Cassie walking robot

You can see from Figure 4-1 that this brain uses different modules to perform different functions. It uses PD controllers to control the joints. As you learned in Chapter 2, PD controllers are quite good at controlling for a single variable like joint position based on feedback. The gait library contains stored expertise about successful walking patterns (I’ll discuss exactly what a gait is in a minute). This module is an expert system (remember manuals from Chapter 2?) that allows lookup of codified and stored expertise. The module labeled “Policy” is a Deep Reinforcement Learning module that selects the right gait pattern to use and how to execute that gait pattern. You can read the details of how this brain works in the research paper.
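To make the PD controller idea concrete, here’s a minimal sketch in Python. Everything in it (the gains, the unit-inertia joint model, the class itself) is my own illustration, not the Berkeley team’s code:

```python
class PDController:
    """Proportional-derivative controller for a single joint.

    Gains are illustrative only; real gains are tuned per joint.
    """

    def __init__(self, kp: float, kd: float):
        self.kp = kp            # proportional gain: reacts to the current error
        self.kd = kd            # derivative gain: damps fast changes in the error
        self.prev_error = None

    def step(self, target: float, measured: float, dt: float) -> float:
        error = target - measured
        if self.prev_error is None:
            derivative = 0.0    # no history yet on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative


# Drive a toy unit-inertia joint toward a commanded angle of 1.0 radian.
pd = PDController(kp=8.0, kd=2.0)
angle, velocity = 0.0, 0.0
for _ in range(500):
    torque = pd.step(target=1.0, measured=angle, dt=0.01)
    velocity += torque * 0.01   # toy joint dynamics: torque changes velocity
    angle += velocity * 0.01    # velocity changes position
```

Running the loop drives the joint angle from 0 toward the 1.0 radian target; the derivative term damps the oscillation so the joint settles instead of ringing.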

Each of these modules works at a different time scale and uses a different decision-making technology, but neither of those characteristics explains why this brain has multiple modules. The reason for multiple brain modules is to perform the multiple skills necessary for walking. One quick and easy way to determine that you need multiple modules, though, is to look for decisions that happen at different timescales and assign them to skills. For example, PD controllers operate at a very high frequency (think 10 decisions per second) as they move the joints. But how quickly does body position change during the execution of the walking gait? Not quite as quickly. And we change gaits only when the surface or walking conditions change, which happens even less frequently than adjusting to a new body position to execute the gait.

Table 4-1. Walking skills used in AI brain that controls Cassie robot to walk
Walking Skill | Technique used to perform skill
Understand gait | Expert system (manuals)
Select and execute gait | Deep reinforcement learning
Translate gait to joint control | Low-pass filter (math)
Control joints | PD control (math)

Table 4-1 outlines the skills that the research team explicitly taught the brain through their modular design. The first skill is about understanding what a gait is. A gait is a repeating cycle of phases in the complex walking movement. Simple robots have simpler gaits, but bipeds with ankles and toes (that’s us) use approximately 8 gait phases when we walk. Before you get carried away thinking about whether AI has true, human-like understanding, let me explain to you what I mean. Without this gait library expert system, the AI would have no understanding of what a gait is, how a gait relates to walking, or how to use gaits to walk. But this expert system defines and stores gaits that the AI will use to walk the robot. So, in a primitive way, this brain does indeed understand gaits.

The second skill is to select which gait to use at any particular time and to make sure that the pose of the robot obeys the gait. A robot pose is much like a human pose: the shape of the robot frame when its joints are set to specific positions. This is a job for Deep Reinforcement Learning! As we’ll see below, you can think of each gait phase as a strategy to be used at just the right time to complete the task successfully. DRL is great at learning strategy and adapting its behavior to changing conditions.

Figure 4-2. Examples of robot poses related to walking

Next, we need to translate the poses to low-level control of the joints. This third skill is not a decision-making skill. It doesn’t take any action on the system. It’s a translator between the pose and the joint position command to give to the PD controller. The technology that is a perfect fit for performing this skill is a low-pass filter. Often used in audio applications, low-pass filters are great at blurring or smoothing signals so that the joints move smoothly between poses instead of jerking around. Then we can finally use our tried-and-true PD controllers to apply feedback and make sure that the joints execute the motions of successful walking. The brain design captures the fundamental skills required for walking and allows the learning algorithm to acquire walking behavior in a structured way with practice. Here’s what the brain design looks like translated into our visual language for brains.
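A first-order low-pass filter takes only a few lines. This sketch uses exponential smoothing; the smoothing factor and the knee-angle numbers are made up for illustration:

```python
def low_pass(signal, alpha=0.2):
    """First-order low-pass filter (exponential smoothing).

    alpha is illustrative: smaller alpha means smoother output
    but a slower response to changes in the input.
    """
    smoothed = []
    y = signal[0]
    for x in signal:
        y = alpha * x + (1 - alpha) * y
        smoothed.append(y)
    return smoothed


# A pose change commands the knee from 0 to 30 degrees. The filter turns
# the abrupt step into a gradual ramp that the PD controller can track
# without jerking the joint.
command = [0.0] * 5 + [30.0] * 20
smooth = low_pass(command)
```

The raw command jumps from 0 to 30 in one step; the filtered signal climbs gradually toward 30, which is exactly the smoothing behavior the joints need.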

Figure 4-3. Brain design of the Cassie robot brain from the Abbeel team

Case Study: Learning to walk is hard to evolve, easier to teach

Walking on two legs is a complex movement that is difficult to describe and execute. Roboticists have done a lot of work to reverse-engineer walking and teach robots to walk. Most of this work uses complex mathematics to calculate control actions, then applies them to each robot joint. A second approach leverages AI algorithms to learn control policies or to search for the right way to control each joint for walking (this includes optimization algorithms like evolutionary algorithms and Deep Reinforcement Learning). Neither of these approaches allows a human to teach even the best-understood knowledge about walking.

Figure 4-4. Simple simulated two-legged walking robot for AI to practice controlling

See the funny looking purple robot in Figure 4-4? This is a training gym for teaching AI how to walk on two legs. This environment simulates a two-legged robot with four joints: two upper joints that work like human hips and two lower joints that work like human knees. This robot has no ankles or feet.

Remember, in Deep Reinforcement Learning the agent practices the task and receives a reward based on how well it performs the task. The basic reward that comes with this gym environment gives points for how much forward progress you make but penalizes 100 points if you fall over (your big purple hull touches the ground). One AI researcher uses the picture in Figure 4-5 to describe four movement strategies that agents will learn on their own with this reward.
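The reward described above can be sketched as a function. The 100-point fall penalty comes from the description in the text; the forward-progress scale factor is my own illustrative choice (the real gym reward also includes a small motor-torque cost):

```python
def step_reward(forward_progress: float, fell: bool) -> float:
    """Per-timestep reward: points for forward progress,
    minus 100 if the hull touches the ground.

    The 10x progress scale is illustrative, not the gym's exact constant.
    """
    if fell:
        return -100.0
    return 10.0 * forward_progress


# A small forward step earns a small reward; a fall wipes out far more.
reward_for_step = step_reward(forward_progress=0.02, fell=False)
reward_for_fall = step_reward(forward_progress=0.02, fell=True)
```

Note the asymmetry: the agent must accumulate hundreds of small progress rewards to offset a single fall, which is the root of the behavior described next.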

The double balance looks like someone rapidly tapping the ground on their tip toes. While kneel balancing, the AI kneels on one knee, then uses the front leg to reach out and drag the body forward in a pawing motion. The rear balance strategy puts the weight of the body on the back leg and moves forward by pawing the front leg. This is similar to the kneel balance, but in a standing position. At first glance, this looks a little like walking but the legs never cross in the characteristic scissor motion. Finally, the front balance extends and stiffens the back leg and paws forward with the front leg. Again, the legs never switch.

Figure 4-5. Four self-learned movement strategies that do not qualify as walking

So, why do we walk?

Walking is defined by an inverted pendulum gait in which the body vaults over the stiff limb or limbs with each step and where at least one leg remains in contact with the ground at all times. Basically, this means that when you walk, you vault (like the Olympic Pole Vault) over your planted leg, lift the opposite leg, then repeat the process. So that’s a bit of how we walk, but here’s why: walking is the most energy-efficient way for bipeds (animals with two legs) to move around. It’s not the fastest way to move around or the easiest way to move around, but walking uses the least amount of energy for each distance that you travel.

So am I telling you that none of the motion strategies above even meet the criteria of walking? Exactly, none of these strategies for moving around on two legs meet the definition of walking. So, while these agents have learned to move by experience alone, they have not learned to walk. If brains can learn by practicing and pursuing reward, why don’t these agents learn to walk? It turns out that Deep Reinforcement Learning becomes conservative when you penalize it harshly, much like human learners do. The AI receives severe punishment when it falls over but a much smaller reward to incentivize it to take its first steps. In contrast, the AI has to get a lot of things right to get the full reward of walking, so it settles for things that are more certain ways to get rewards without punishment. These things (which are more like crawling) let the brain get the reward of moving forward with a lot less risk of falling and without having to learn to balance.
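You can check this intuition with a back-of-the-envelope expected-value calculation. The progress rewards and fall probabilities below are invented for illustration, but the structure mirrors the reward just described:

```python
def expected_return(progress_reward: float, p_fall: float,
                    fall_penalty: float = -100.0) -> float:
    """Expected episode return when falling forfeits progress and
    incurs the penalty. All numbers here are illustrative."""
    return (1 - p_fall) * progress_reward + p_fall * fall_penalty


# A cautious crawl: modest progress, almost never falls.
crawl = expected_return(progress_reward=40.0, p_fall=0.02)

# An attempt at true walking: bigger payoff, but falls half the time
# while the balancing skill is still unlearned.
walk = expected_return(progress_reward=90.0, p_fall=0.50)
```

With these (made-up) numbers, crawling has a positive expected return while attempting to walk has a negative one, so a reward-maximizing learner rationally settles for crawling.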

The AI training gym comes with a PID controller that is tuned to perform the walking motion. The controller walks successfully, but will only succeed at certain walking speeds. Mathematical calculation provides a very precise definition of which action to take under each condition but results in a jerky mechanical walking motion. When I saw the PID control example, it gave me an idea. The PID controller separates the motion into three walking gait phases. After seeing this, I used my first two fingers (index and middle fingers) as “walking legs” to identify and name the three walking skills that I wanted to teach. My goal was to go beyond the motion strategies that emerged from trial and error only and the rigid walking motions of the PID controller: I wanted to teach the AI how to walk.

Table 4-2. Simple walking gait phases that we can teach as strategies
Gait Phase | Heuristic Strategy (Hips) | Heuristic Strategy (Knees)
Lift swinging leg | Flex swinging hip (curl swinging leg), extend planted hip | Flex swinging knee, extend planted knee (keep planted leg straight)
Plant swinging leg | Extend swinging hip, flex planted hip | Flex, then extend swinging knee (curl, then straighten swinging leg)
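Since a gait library is an expert system, the heuristic strategies in Table 4-2 can be stored as a simple lookup table. A sketch (the phase keys and action phrases are symbolic stand-ins for real joint commands):

```python
# The heuristic strategies from Table 4-2 as an expert-system lookup.
# Action names are symbolic stand-ins for real joint commands.
GAIT_STRATEGIES = {
    "lift_swinging_leg": {
        "hips":  ["flex swinging hip", "extend planted hip"],
        "knees": ["flex swinging knee", "extend planted knee"],
    },
    "plant_swinging_leg": {
        "hips":  ["extend swinging hip", "flex planted hip"],
        "knees": ["flex then extend swinging knee"],
    },
}


def strategy_for(gait_phase: str) -> dict:
    """Look up the stored heuristic strategy for a gait phase."""
    return GAIT_STRATEGIES[gait_phase]


plan = strategy_for("lift_swinging_leg")
```

This is all an expert system is at heart: codified expertise that other modules can look up, exactly like the gait library in the Cassie brain.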

Strategy vs. Evolution

Figure 4-6. The winning entrant of the 2017 NIPS Reinforcement Learning competition actually runs.

The AI research conference NeurIPS (Neural Information Processing Systems), formerly NIPS, hosted a Reinforcement Learning competition in 2017 and 2018 where the challenge was to train AI to control a human skeleton with 41 lower-body muscles and make it run and walk. I designed and trained AI for this competition. It was extremely frustrating to watch my AI brain do things like the movements shown in Figure 4-7, none of which are used in walking, when (being a bipedal walker myself) I already had quite a bit of knowledge about walking that I wanted to teach.

Figure 4-7. Skeleton performing three motions that are not used to walk: Skeleton leaning forward and extending leg backward (this is Yoga, not walking); Skeleton jumping up and falling backward; Skeleton kicking one leg out (this is bad Can-Can dancing, not walking)

My brains performed horribly at the competition tasks, but what I learned helped me develop the brain design techniques in this book and solve a lot of real-world problems. Here are some other behaviors that my brain spent a lot of time exploring, and the corresponding things that I desperately wanted to teach my AI brain.

Table 4-3. Behaviors that my AI spent a lot of time exploring that will never lead to walking
Don’t Do This | Reason | Do This Instead
Move both legs in unison | When walking and running, both legs do not operate in unison. | Move legs in a scissor-like motion.
Fall forward | Walking involves vaulting over a planted leg. | Swing one leg forward, then plant it.
Stand on one leg while swinging the other leg around | Walking requires planting your swinging leg, so that you can vault over it and move forward. | Swing one leg forward, then plant it.

Even the 2017 competition winner, NNAISENSE, feels my pain. Here’s the warning they share on the website with the code they used to create the AI:

Note, however, that reproducing the results using this code is hard due to several reasons. First, the learning process (mostly in Stage I: Global Policy Optimization) was manually supported — multiple runs were executed and visually inspected to select the most promising one for the subsequent stages. Second, the original random seeds were lost. Third, the whole learning process required significant computational resources (at least a couple of weeks of a 128-CPUs machine). You have been warned.


Translation: we had to capture the brain doing things correctly like lightning in a bottle and stitch behaviors together, and even then it took extreme amounts of practice and computing power.

This is not surprising, since it took humans approximately 2 million years to learn to walk fully upright through evolution. In “The Origin of Strategy”, a seminal work on business strategy, Harvard Business School professor Bruce D. Henderson (1915 - 1992) asserts that strategy creates intelligent, creative, and planned interruption of incremental evolution. In biology, competition drives natural selection to differentiate, but incrementally and at a very slow pace. This is how the poison dart frog developed brightly colored toxic skin to deter predators and the Roraima bush toad developed the behavior of curling up and jumping off mountain cliffs, which makes it look like just another rock rolling downhill.

Strategy disrupts and diverts evolution and its long periods of drift toward equilibrium. Much like the scientific revolutions that we discussed in Chapter 1, strategy punctuates these periods. Stephen Jay Gould (1941-2002) describes a very similar phenomenon in his theory of punctuated equilibrium. We see this in business all the time. The Blockbuster movie rental chain dominated the home entertainment market by allowing you to browse titles in store and borrow your selection for a few dollars. Then Netflix offered to send the movie directly to your home and later enabled you to stream it directly to your TV. You don’t have to leave your home, but you don’t get access to all the most recent releases either. Then Redbox offered a new and interesting twist on location-based movie rentals when it created vending machines where you can self-serve and rent the titles you want.


Without strategy, it’s going to take evolutionary time scales or extreme luck to learn to walk.

Teaching walking as three skills

So, I decided to teach my brain the same three skills that the PID controller used in the reference example: the skills that I validated by walking my fingers across a table.

Defining skills

To teach each of these three skills, I had to limit the range of motion for the hip and the knee for each skill (strategy). For example, you can’t lift one leg (balancing on the other leg) unless you keep the planted leg stiff. You can’t keep the planted leg stiff unless you extend the knee and flex the hip. This is where it helps to try it out by walking your fingers on a hard surface. See Table 4-4 for details on the action ranges I used. By the way, this step of defining the actions each skill requires is crucial and I cover it in detail in Chapter 5: Telling your brain what to do.

Table 4-4. Simple walking gait phases that we can teach as strategies
Gait Phase | Range of Motion (Hip) | Range of Motion (Knee)
Lift leg | Flex (close) swinging hip, flex then extend (open) planted hip | Flex (curl) swinging knee, extend (straighten) planted knee
Plant leg | Extend (open) swinging hip, flex then extend planted hip | Extend (straighten) swinging knee, extend (straighten) planted knee
Swing leg | Flex (close) swinging hip, flex then extend planted hip | Flex (curl) swinging knee, extend (straighten) planted knee
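Limiting the range of motion per skill amounts to clamping whatever action the learning agent proposes. Here is a sketch for the lift-leg skill; the numeric limits (in radians) are invented for illustration, not the ranges I actually used:

```python
def clamp(value: float, low: float, high: float) -> float:
    """Constrain a proposed action to an allowed range."""
    return max(low, min(high, value))


# Illustrative limits only; real ranges come from experimentation
# (and from walking your fingers across a table).
LIFT_LEG_RANGES = {
    "swing_hip":  (0.3, 1.1),     # flex (close) the swinging hip
    "swing_knee": (-1.2, -0.3),   # flex (curl) the swinging knee
    "plant_hip":  (-0.4, 0.2),    # flex then extend the planted hip
    "plant_knee": (-0.05, 0.05),  # keep the planted leg straight
}


def constrain_action(joint: str, proposed: float) -> float:
    """Clamp the agent's proposed joint action into the skill's range."""
    low, high = LIFT_LEG_RANGES[joint]
    return clamp(proposed, low, high)
```

However wildly the agent experiments during practice, the planted knee stays nearly straight, so the agent never wastes time on motions that cannot be part of this skill.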

Figure 4-8. My AI brain executed the three skills I taught it: lift leg, plant leg, swing (opposite) leg. This is what walking looks like. The AI made a lot of mistakes and took a lot of practice, but it didn’t spend any time doing things that don’t resemble walking!

Setting goals for each skill

Next, I set a goal and success criteria for each of the three skills. We talk more about Setting Goals for your Brain in Chapter 6. Each gait phase of walking has distinct goals.

Table 4-5. Distinct goals for each gait phase
Gait Phase Goal

Lift Leg

Push off with enough velocity to vault over the planted leg

Plant Leg

Plant the leg with enough impulse (force at the moment of impact) to support the weight of the robot.

Swing Leg

This is the gait phase that generates most of the forward motion.

You can see that each of these gait phases has radically different goals. The first gait phase is about pushing off and picking up enough speed to vault over the other leg when you plant it. In the second phase, velocity doesn’t matter nearly as much. Walkers succeed in the second phase when they plant their leg with enough force to support the weight of the body. Otherwise, the walker will collapse to the ground. The final phase has yet another primary objective: forward motion. This phase is the big mover of the three gait phases. During the first and second phases, the body doesn’t move forward very much even when the phases are very successful. Do you see how each gait phase performs a different functional skill with different goals?

Here’s what the walking behavior looked like. You can find the complete code for teaching this brain in the machineteaching-io/stable-baselines repository (a fork of OpenAI Baselines implementations of reinforcement learning algorithms).

Organizing the skills

Next, I snapped these skills together into a brain design. The gait pattern for walking cycles the skills in a sequence: lift leg, plant leg, swing leg, lift (the opposite) leg, plant (the opposite leg), swing (the opposite) leg, etc. Here’s what the brain design looks like.
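The orchestration can be sketched as a fixed cycle over phases that alternates legs. This is an illustrative sketch; in the real brain a learned selector decides when to advance, rather than a fixed rotation:

```python
GAIT_PHASES = ["lift_leg", "plant_leg", "swing_leg"]


def next_phase(phase: str) -> str:
    """Advance to the next gait phase in the walking cycle."""
    return GAIT_PHASES[(GAIT_PHASES.index(phase) + 1) % len(GAIT_PHASES)]


def walk_sequence(steps: int):
    """Generate (phase, leg) pairs, swapping legs after each full cycle."""
    phase, leg = "lift_leg", "left"
    sequence = []
    for _ in range(steps):
        sequence.append((phase, leg))
        phase = next_phase(phase)
        if phase == "lift_leg":  # completed a full cycle: switch legs
            leg = "right" if leg == "left" else "left"
    return sequence


seq = walk_sequence(6)  # one full stride: three phases per leg
```

Six steps of the sequence produce lift, plant, swing for the left leg followed by lift, plant, swing for the right, which is exactly the gait pattern described above.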

Figure 4-9. Brain design diagram lists and orchestrates the skills needed to perform the task successfully

This brain design separates the brain into the skills that it will learn and orchestrates how the learned skills will work together. Each skill is a miniature AI that will practice and learn how to perform its function. Three skills execute the gait phases and one skill switches between gait phases. In the next section I define and categorize the building blocks that you will assemble into your brain designs and provide a framework for organizing those skills together.

Brains are built from skills

The mindset of algorithmic intelligence (see Chapter 2, “We’re entering an age of teaching intelligence” for a refresher) suggests to us that brains are built from algorithms. If you need a new brain for a new task, write a new algorithm. But the mindset of teaching intelligence tells us that brains are built from skills. If you need a new brain to accomplish a new task, identify and teach skills. Regardless of which learning paradigm you use to simulate learning, the brain will need to acquire skills to succeed.

What is a concept?

A concept is a unit of skill for performing a specific task.

Concepts are Fuzzy

Have you ever tried to articulate a concept that was hard to describe? Here are a few examples: love, justice, beauty. Each of these concepts is fuzzy and best defined by giving many examples and counterexamples (sunsets and roses and smiles can be beautiful, but a sardonic smile is not beautiful, it’s disturbing). Sociologist Herbert Blumer (1900 - 1987) described these kinds of concepts as sensitizing concepts. Sensitizing means laying out a set of parameters that we can use to evaluate whether the concept applies. Blumer would define love, justice, and beauty as sensitizing concepts.

The skills that your brain will learn are a lot like sensitizing concepts. We learn sensitizing concepts by receiving feedback on the parameters that evaluate whether the concept applies. For example, one parameter that many use to evaluate beauty is how something makes you feel when you see it. If it makes you feel happy or sad, it might be beautiful. If it makes you feel afraid, angry, or disgusted, it likely isn’t beautiful. We then discover the boundary around these fuzzy concepts by comparing many examples against the defined sensitizing parameters for the concept. The same is true for skills that your brain will learn.

For example, the skill of an effective (American) football offense is fuzzy. You must be able to score against 3-4, 4-3, man-to-man, and zone (coverage) defenses. Each of those defenses is a sensitizing criterion for a team’s skill at executing American football offense. The same is true for industrial processes and factory automation. One of the most challenging aspects of managing industrial processes is that there are multiple, often competing goals, and many more scenarios to succeed under. One goal in manufacturing is throughput (how much you make), but another competing goal is efficiency. I can make a lot of products, but might also spend a lot of energy to do it. I can make products very efficiently (labor and energy), but might sacrifice throughput to gain that efficiency. For this manufacturing skill, throughput and efficiency are both sensitizing criteria.

Expert rules inflate into concepts

The process of learning skills fits well into Blumer’s prescription for learning sensitizing concepts. Start with a set of examples and then add examples and add counter examples from there.


You can think of an expert rule as the starting point for learning a skill.

A rule provides a set of examples the same way that the definition of a line provides a set of points. The form y = mx + b (the equation for a straight line) gives us a set of points for the line. So, if m = 1 and b = 0, then the set of points on the line will be (0,0), (1,1), (2,2), etc. The rule provides solid examples that are both true to the concept and easy for the beginner to understand. With practice and experience, the beginner starts to identify exceptions to the rule. These exceptions are also true to the concept and provide a much more nuanced understanding of the concept.
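Here’s that idea in code: the rule is a compact generator of examples, and practice adds exceptions that override it. The numbers are illustrative:

```python
def rule(x, m=1.0, b=0.0):
    """The expert rule, like y = m*x + b: a compact generator of examples."""
    return m * x + b


def concept(x, exceptions=None):
    """A skill starts as the rule; practice carves out exceptions that
    override the rule at particular inputs."""
    exceptions = exceptions or {}
    return exceptions.get(x, rule(x))


points = [(x, rule(x)) for x in range(3)]  # the rule's examples
learned = {2: 1.5}                         # a nuance discovered by practice
```

The rule alone answers every query; the learned exceptions change only the cases where experience showed the rule to be wrong, which is exactly how a skill inflates from its starting rule.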

Figure 4-10. A rule as the starting point of a skill. The skill is developed by identifying exceptions to the rule and inflating it into a fuller, more nuanced description of the concept.

Here are a few examples of skill concepts that can be expressed as an expert rule, but also fleshed out in more detail by discovering exceptions:

Table 4-6. Skills expressed as expert rules, with example exceptions
Skill | Rule | Example Exceptions
Betting (Texas Hold ‘Em Poker) | Play “top 10” hands only; fold everything else. | Unless you have a lower pair and believe (usually from the betting) that no one else has a top 10 hand.
Baggage Handling (Airport Logistics) | Use the conveyor for bags whose connecting flight is scheduled 45 minutes out or more. | Unless predictions suggest that some flights will be cancelled for weather. In that case, use the conveyor for bags whose connecting flight is likely to be cancelled, even if it is scheduled to leave within 45 minutes.
Naval Game Fleet Planning | Use a tank (ship with oversized armor and weapons) to attract and sink the enemy fleet. | Unless the enemy has a large swarm of ships. In that case, use multiple medium-large ships to split the swarm, then attract and defeat each swarm section.
Basketball Scoring | If you are close to the basket, shoot a layup, not a jump shot. | Unless you are closely defended by a larger defender. In that case, shoot a jump shot (consider a fadeaway).
Rock Crusher | Choke the crusher for large, hard rocks; regulate the crusher for small, soft rocks. | Unless you have low customer demand for ore. In that case, produce the required ore as efficiently as possible, which may include choking the crusher for smaller, softer rocks than you otherwise would.

As humans and AI practice skills, they identify exceptions to the rule which provide a more accurate and nuanced picture of how to perform the skill, much the same way that we gain a more nuanced understanding of what love or justice or beauty are after detailed discussion of counter-examples.

Figure 4-11. A set of examples that might represent a concept. You may be able to approximate this concept with a straight line, but the reality is much more nuanced than the straight line.

Take a look at the data points in the figure. I don’t know what concept or skill this represents, but it looks quite nuanced and complex. One way to approach this skill is to find a single straight line that seems to best represent this concept. Data science calls this technique of fitting a line to a set of points linear regression.
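For a single input variable, linear regression has a closed-form solution. Here is a self-contained sketch, with made-up data points standing in for the figure:

```python
def linear_regression(points):
    """Ordinary least squares fit of y = m*x + b to (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - m * sx) / n                          # intercept
    return m, b


# Invented data: roughly linear, but wrong in the details,
# like a nuanced concept approximated by a rule.
data = [(0, 0.5), (1, 0.9), (2, 2.4), (3, 2.6), (4, 4.5)]
m, b = linear_regression(data)
```

The fitted line is close on average but misses each individual point, which is precisely the trade-off of deflating a nuanced concept into a simple rule.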

Figure 4-12. A linear regression that describes this complex concept as a line

There are benefits to this simplifying approach. These simplified representations provide portable replicas of the concept that are easy to manipulate and transfer. In the context of designing autonomous AI, where the concepts are skills that the AI will learn, the simplified representations are expert rules. Humans simplify concepts to expert rules for three main reasons:

  • Expert rules provide a starting point for practicing skills.

  • Expert rules are easy for beginners to understand and follow.

  • Expert rules are easy for teachers to communicate.

This idea of deflating concepts to simplified expert rules (it’s kind of like vacuum-sealing a bag of household products to save space) is the basis of expert systems. It’s promising, but I already discussed its drawbacks in Chapter 2. Is there a way to leverage the simplifying benefits of expert rules and still embrace the full nuance of the concept?

Yes! In the next section, and then in much more detail in Chapter 7, I’ll show you how to use expert rules as abbreviations for the concepts that you’d like to teach. This allows you to define which skills are important for the learner to master (instead of leaving it up to the learner to discover both the skills and how to accomplish them) and allows the learner to discover unique and creative ways to perform these skills by practicing them.

Teach expert rules, let the learner inflate the concepts by practicing

A set of expert rules defines the skills in the AI brain, but instead of writing hundreds of additional expert rules to capture exceptions to the rules that better define each skill, we allow algorithms like Deep Reinforcement Learning to inflate the skill by practicing: identifying and adapting to the nuances. The structure of the skills provides some of the explainability and predictability of expert systems with the creativity and flexibility of DRL agents. Let’s return to the example of the gyratory crusher.

The structure of the expert rules, which reflect the two operating modes of the machine, outlines three skills that should be taught and learned. The first skill is the strategy of choking the crusher when the mine produces larger, harder rocks. The second skill is the strategy of regulating the crusher when the mine produces smaller, softer rocks. The third skill decides when to choke the crusher and when to regulate the crusher. This act of using subject matter expertise to define these three skills is itself teaching. Then, if we train three separate DRL agents, one on each of the three skills above, the combined brain will not only tell the engineers which next action to take to control the crusher but also which skill it is using at each decision point to make that decision.
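The module layout can be sketched as three functions: two strategy skills and a selector. The interfaces, thresholds, and action values below are hypothetical stand-ins; in the real design each skill is a trained DRL policy rather than a hand-written rule:

```python
# Sketch of the crusher brain's module layout (hypothetical interfaces).
# Each skill would be a trained DRL policy in the real design; here they
# are stand-in functions so the orchestration is visible.

def choke_strategy(state):
    """Skill 1: choke the crusher for larger, harder rocks."""
    return {"mode": "choke", "feed_rate": "high"}


def regulate_strategy(state):
    """Skill 2: regulate the crusher for smaller, softer rocks."""
    return {"mode": "regulate", "feed_rate": "moderate"}


def selector(state):
    """Skill 3: decide which strategy applies right now.
    The 0.5 thresholds are invented for illustration."""
    if state["rock_size"] > 0.5 and state["rock_hardness"] > 0.5:
        return choke_strategy
    return regulate_strategy


state = {"rock_size": 0.8, "rock_hardness": 0.7}
skill = selector(state)       # the brain can report which skill it chose...
action = skill(state)         # ...as well as the action that skill takes
```

Because the selector returns a named skill before any action is computed, the brain can explain which strategy it is using at each decision point, which is the explainability benefit described above.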

Figure 4-13. Diagram of skills to control a mining crusher. The skills can be expressed as expert rules (to both people and AI), then practiced to fully inflate the skills based on sensitizing feedback.

As the AI learns (in the case of Deep Reinforcement Learning anyway), it captures the policy in a neural network. The teacher defines the skills to learn. The learning algorithm learns each skill. Machine Teaching leverages what you already know about how to perform the skill to structure the AI. Machine Learning builds the AI (in this case a set of neural networks).


As a brain designer, strive to express known skills in the form of expert rules then allow the AI to practice, master and trade off with other skills.

Next, I’ll outline the three different types of concepts that you’ll use in your brain designs. Perception concepts help the brain understand what is happening. Action concepts help the brain decide what to do. Selector concepts assign perception and decision-making work to other brain modules.

Perception concepts discern or recognize

Reacting to a changing environment starts with gathering information about what’s happening in that environment. Machines gather information with mechanical sensors. For example, a thermometer is a type of sensor that measures temperature and a barometer is a type of sensor that measures atmospheric pressure. People who design factories and industrial systems don’t use the same thermometers and barometers that we use at home, but they are good examples. We also have sensors on our bodies. Our eyes are complex light sensors, our ears are sophisticated audio sensors (like microphones), etc. See What is a sensor? for a more complete list and description of industrial sensors.

The sensors gather the information, but the information has to be processed and translated into a format that can be used to make decisions. For example, our eyes are more than just sensors that receive light. The rods and cones in our eyes process the light and translate it into electrical signals that our brains can use to make decisions. Our ears perform a similar function beyond just receiving sound, and machines likewise need more than sensors to make decisions.

Perception concepts process information that comes in through the sensors and send relevant information through to the decision-making parts of the brain. For example, Auditory Processing Disorder (APD) is an abnormality in sound perception in humans. The ears hear just fine; the abnormal perception obscures the information. There are five common variations of perception concepts used in autonomous AI design.

See and hear

Bell Flight designs and builds helicopters and other vehicles that can take off vertically. Have you ever seen the V-22 Osprey? It looks like a plane, but when it takes off, it tilts its rotors (the equivalent of propellers for helicopters are called rotors) up and takes off (straight up) like a helicopter. After it is in the air, it tilts its rotors back and flies like a plane. There is an autonomous version, the V-280 Valor, that flies without a pilot. Bell also makes drones that carry freight and people.

Autonomous drones and larger rotorcraft like the V-280 use Global Positioning System (GPS) to calculate position and control. But if GPS is blocked by buildings, autonomous systems must fly and land by sight, much like human pilots would. Calculating systems like the ones that fly by GPS are based on control theory (math) and cannot process visual information from video feeds and camera images.

So, Bell built an Autonomous AI to land by sight. This brain has two modules: the first is a machine learning module that processes the image data and extracts features about the landing zone. Imagine a model that can input an image of the landing zone and output things like coordinates for the center of the landing zone, pitch, yaw, and roll of the landing zone in 3D space. This is the perception concept and it helps the brain see.

The second module is a Deep Reinforcement Learning module that has practiced landing the drone in simulation many times, on many different landing zones using the visual information that the first module passed to it.

Predict

We make predictions to help us make decisions all the time. When I decide which checkout line to wait in at the grocery store, I look at the number of people in each line (length), the number of items that various people have in their carts and make a rough assessment of the speed of each checker. I don’t look at every cart in every line and I have no way to measure the actual speed of each checker or the actual number of items that each customer in each line needs to check out. I’m sampling data from many variables that I have observed before, using my experience to predict which line will get me through the checkout the quickest, then acting on that perception and choosing a line.

I worked with a manufacturing company that wanted to better predict how long their cutting tools would last. Spinning tools cut metal to make all kinds of different parts that we use every day. They wear and break depending on how fast they spin, how much friction they experience, and how much you bend them in each direction. If you retire the tool too early, you’ve wasted money, but if the tool breaks while cutting a part, you might have to throw away the part you were working on, wasting even more money.

Figure 4-14. Three parts wear in different ways and survive different durations based on what they experience over their lifetime.

Figure 4-14 shows three scenarios of part wear. In scenario 1 the tool is run at high speed but low load for the first part of its life and at low speed but high load for the second part of its life. The tool experiences high friction for its entire lifetime. Even though this tool is always run at either high speed or high load, it has the longest lifetime of the three tools. The tool in scenario 2 fails soonest when it is put under very high speed and friction, even though it starts its life under low speed, load, and friction. The tool in scenario 3 starts its life at very high speed and, even though it is later used at low speed, load, and friction, it fails soon after the transition. This example isn't intended to model any particular physical scenario, but I want to demonstrate two things to you: predicting wear is difficult, and scenarios determine wear patterns.

The two most common complex predictions I see in industry are wear predictions like the one above and predictions about how much market demand there will be for products. Market demand is complex, seasonal and depends on different variables for different products. The demand for some products is very seasonal, like snowshoes and sunscreen. Crude oil contains gasoline, diesel fuel, and jet fuel, so oil refineries operate differently to make more or less of each depending on the demand. Europe consumes more diesel in the winter to heat homes and more jet fuel during the summer travel season.
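The wear scenarios above can be illustrated with a toy cumulative-damage model in Python. The formula and coefficients below are invented for illustration and are not the real physics of tool wear; the point is only that wear accumulates as a function of the speed, load, and friction the tool experiences over its lifetime.

```python
def accumulated_wear(history, k_speed=1.0, k_load=1.0, k_friction=1.0):
    """Toy model: each time step adds wear that grows with speed, load,
    and friction. Say the tool fails when accumulated wear reaches 1.0."""
    wear = 0.0
    for speed, load, friction in history:
        wear += 0.001 * (k_speed * speed**2 + k_load * load + k_friction * friction)
    return wear

# Two hypothetical usage histories: (speed, load, friction) at each step
gentle = [(1.0, 1.0, 1.0)] * 100   # low speed, load, and friction
harsh = [(5.0, 3.0, 4.0)] * 100    # high speed, load, and friction

remaining = 1.0 - accumulated_wear(gentle)   # gentle tool still has life left
```

Even this sketch hints at why prediction is hard: a real model would also have to capture path dependence (the scenario transitions in Figure 4-14), which a simple additive model like this one cannot.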

Detect anomalies

Have you ever played the childhood game “one of these things is not like the other”? In this game, you look at multiple objects (see Figure 4-15 for an example) to determine which one is different and doesn’t match the pattern. When you play this game, you’re looking for anomalies.

Figure 4-15. Some of these objects look similar but belong in different categories.

Detecting anomalies is an important perception skill that informs decision making. One company that I worked with wanted to use AI for cybersecurity to stop cyber-attacks like the distributed denial-of-service (DDoS) attack in 2018 that used over 1,000 different autonomous bots to disrupt the GitHub code repository site for over 20 minutes. In a DDoS attack, hackers purposefully generate so much fake traffic to a website that the website can’t function. The first step in countering a DDoS attack is detecting one. It’s hard to tell whether a sudden spike in traffic is due to a legitimate spike in customer demand (this would be a very good thing) or the beginning of a DDoS attack (a very bad thing). My prescription was that the AI should have one module that learns to detect anomalies in web traffic and classify them as either a traffic spike or a DDoS attack, and another module that accepts the first module’s conclusions and passes them to the decision-making module, which takes action to stop attacks but lets valuable legitimate traffic through.
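Here is a minimal sketch of the detection half of that design: flag traffic whose volume deviates too far from recent history. The threshold and the traffic numbers are invented for illustration; a production detector (and the spike-versus-attack classification) would be a learned model over many more features.

```python
import statistics

def is_anomalous(history, current, threshold=3.0):
    """Flag the current request count as anomalous when it sits more than
    `threshold` standard deviations above the recent mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return (current - mean) / stdev > threshold

# Requests per minute over the last eight minutes (hypothetical)
normal_traffic = [100, 104, 98, 102, 97, 101, 99, 103]
```

In the brain design described above, this module’s output would feed the classification module, which decides whether the anomaly is a demand spike or an attack.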

Classify

Sometimes it helps to classify things into categories before making a decision. In the grocery shopping example above, in addition to predicting, I am classifying things that I see: slow lines, quick lines, long lines, short lines, full grocery carts, empty grocery carts, no grocery cart (just a few handheld items), and overstuffed grocery carts. You get the picture. Maintenance technicians often do the same thing after taking a machine offline for repair. They classify the machine into states, then take different actions to bring the machine online based on what state it’s in. This is like what you might do when moving a bicycle from a fixed position. If the bicycle is facing downhill, don’t worry about which gear you’re in, just push off. If the bicycle is on flat ground, shift to a lower gear, then push off. If the bicycle is on a hill, stand up and pedal. You’ll need the extra force to get started no matter what gear you are in. Before making this decision, you need to perceive the slope of the path you are headed down.
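The bicycle example can be written down as a tiny classify-then-act concept. The slope thresholds below are illustrative, not measured:

```python
def starting_action(slope_degrees):
    """Classify the perceived slope into a state, then map the state to
    an action, mirroring the bicycle example (thresholds illustrative)."""
    if slope_degrees < -2:      # facing downhill
        return "push off"
    elif slope_degrees <= 2:    # flat ground
        return "shift to a lower gear, then push off"
    else:                       # facing uphill
        return "stand up and pedal"
```

Classification compresses a continuous perception (the slope) into a small set of states, each of which maps cleanly to an action.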

Filter

There is a fascinating part of the steel-making process called coking where you introduce carbon into molten iron in the presence of limestone. There are hundreds of variables to consider while controlling the blast furnace where this process occurs. That’s difficult even for human experts who’ve built decades of experience into their intuition. So, instead of considering the full scope of variables at each decision point, the engineers devised an index that packs a huge amount of information into a single number. This number tells the operators most of what they need to know to control the furnace well. Yes, you lose a lot of information when you process the data like this, but that’s what filters are for: showing you the information that you need to see while weeding out the information that won’t help you decide. This index was likely carefully constructed and tested before using it as feedback on real furnaces. You should take care in how you filter data for decisions as well.
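Here is what such a filter might look like in Python, sketched as a weighted composite index. The variable names and weights are invented for illustration; a real furnace index would be carefully constructed and validated with experts.

```python
def composite_index(readings, weights):
    """Collapse many normalized process variables (each scaled to 0-1)
    into a single number between 0 and 1 via a weighted average."""
    total = sum(weights[name] * value for name, value in readings.items())
    return total / sum(weights.values())

# Hypothetical normalized furnace readings and expert-chosen weights
readings = {"temperature": 0.8, "oxygen": 0.6, "pressure": 0.4}
weights = {"temperature": 3.0, "oxygen": 2.0, "pressure": 1.0}
index = composite_index(readings, weights)
```

The weights encode the expert judgment about which variables matter most; that judgment is exactly the part that deserves careful testing before the index drives real decisions.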


Data scientists, this is another area where we desperately need your help. Devising composite indices that effectively filter many variables in a way that facilitates decision making is very challenging.

Action concepts decide and act

Action concepts make things happen. They decide and act. Whether the decision-making is learned or programmed or even random, these concepts make the decisions about what the system will do next. I go into a lot of detail on how to use action concepts in your brains in Chapter 5: Telling your brain what to do.

Selector concepts supervise and assign

Every job needs a supervisor, right? Unless you’re an ant, you need a supervisor to take a high-level view of the work and assign tasks and jobs to team members and crews. Each crew serves a different purpose or needs to be activated in different situations according to its specialty or training. Selector concepts are the supervisors for the brain. They are specialized action concepts. Their role is to assign the right decisions to the right concept. Once an action concept is called into service, it makes the decision for the brain.

Figure 4-16. AI brain that controls chillers for building heating and cooling. One concept trains on day scenarios, another concept trains on night scenarios; a supervising selector concept assigns control to either the day or the night concept.

Take a look at this example of an AI that controls heating and cooling in large commercial buildings (think about your work office building). The Heating Ventilation and Air Conditioning (HVAC) system uses ice to store energy, and water to cool the air in the building. Ducts pass the air across water that cools it. The chiller uses energy to make ice during the times of day when energy is cheaper. The ice stores the energy to cool the building without using energy when energy is more expensive. To control the chiller, you switch it into the right mode (make ice, melt ice, pass the water directly through without cooling, etc.).

The most difficult thing about controlling the chiller is the fact that buildings behave differently during the day and during the night. During the day, the flow of people entering and leaving the building drives cooling demand. At night, when there are few people in the building, the machines left running require most of the cooling. These day and night scenarios are so different that you’d train a separate day crew and night crew to control the building at different times. Sometimes it’s easy to determine when to send the day crew home and call in the night crew; other times it’s fuzzy.

We can distinguish two kinds of concepts here: programmed and learned. Design programmed concepts into your brain when it’s clear which concept should make the decision. Use learned concepts when it’s fuzzy and hard to tell which concept to call to make the decision in a brain.

Programmed Concepts

The rule of thumb is that if someone can describe how to assign each crew to the right task as a set of rules, then program the selector. For buildings where employees mostly arrive at exactly 9 AM and leave at exactly 5 PM, you can program the selector like we did. Here’s what the selector code looks like in Python:

if 9 <= time <= 17:  # It's daytime (9 AM to 5 PM): assign the day crew
    assign = day_concept
else:  # It's nighttime: call in the night crew
    assign = night_concept

Programming is step by step teaching where you specify every decision to make along the way. If you are confident that you know and can simply express instructions for how to supervise the concepts in your brain, design with a programmed selector.

Learned Concepts

But when the decision between which crew to assign is fuzzy, it’s better to teach an intelligent supervisor to assign the right crew. A learned selector is a Reinforcement Learning module that practices assigning tasks to the right concept at the right time. It experiences rewards and penalties based on whether it makes the right assignment. Learned selectors work really well when the policy for which concept to assign tasks to is nuanced and depends on a lot of different factors.

So, a learned selector is perfect to supervise the brain that controls chillers for a building where employees arrive and leave at very different times. To decide whether to assign the day crew or the night crew the selector needs to consider lots of factors that affect when people arrive and leave. For example, on Tuesday and Wednesday afternoons employees tend to stay later to beat traffic. On Thursday and Friday afternoons many employees leave early to beat traffic, or even earlier on Fridays before holiday weekends.

Learning allows the brain to explore how best to supervise the concepts in a brain. If you don’t know the best way to supervise concepts in a brain under all circumstances, or if you know but writing the instructions would take too much time and effort, design with learned concepts. One of my clients told me that they knew that there were two strategies for operating their equipment but that they know how to use only one of the strategies well. I designed a learned selector into the brain. The learned concept figures out how to perform the second strategy, the learned selector figures out when to use the second strategy.

Brains are organized by functions and strategies

So if the building blocks of brains are concepts that perform skills and sub-tasks, how do you organize these skills as you design a brain? Sequences and hierarchies are the two major paradigms for organizing skills in brains.

Let’s return to the maps analogy. Remember, a point on a map represents a good outcome in your process where you will arrive if you make good decisions. Brain designs are mental maps with landmarks that help you explore the landmass. Be careful not to confuse the mental map with the landmass (terrain) itself. Even with the mental map and landmarks, you’ll need to practice reaching goal destinations from various starting points. Just because you have defined a skill that a task requires, doesn’t mean that you are proficient at it. I know that shooting a jump shot is the best way to score in basketball from 18 feet out, but I’m not a great jump shooter yet. You still need to practice and your brains will need to practice the skills that you teach them, too.

Sequences or parallel execution for functional skills

Customers often tell me that for their task, you need to perform the skills in a particular sequence. They report that experience and evidence suggest (even demand) that they perform skills in a certain order. Note, I’m not talking about a sequence of steps here, but a sequence of skills. For these tasks, if you perform the skills in the right sequence, you will reach the goal. If you perform the skills in the wrong sequence, you will get hopelessly lost and never find the location on the map that represents success at the task.

Look at Figure 4-17. This is a perfect example. The mountain pass provides an obstacle that sequences the skills. One skill, making your way across the mountains from various starting points on the left side of the island, must be completed first. After you make it through the mountains, the second skill of reaching the target becomes possible. This reminds me of the technology trees in the video game Civilization. You must develop steam power before you invent the locomotive train. This is also related to Vygotsky’s concept of Zones of Proximal Development that we discussed earlier in Chapter 3. Discovering steam power makes it more likely that you’ll invent the locomotive. The skills are related.

Figure 4-17. A decision landscape where two skills must be executed in sequence to reach the goal. Head through the mountain pass, then explore the flatland to reach the goal.

There’s a mathematical term for tasks with decision space landmasses that look like this: they are called funnel states. Funnel states are mathematical bottlenecks like doorways that you must go through in a problem to get to desirable goal states (like the red x marks in each of our landmass diagrams). To navigate these kinds of spaces, you need to use different skills in sequence. Each skill is a function that takes the right navigation action at the right time. Here’s a real example.

Let’s explore the Autonomous AI that Microsoft Researchers built to teach a robot to grasp and stack blocks from the People and Process Concerns section of Chapter 1, in more detail. The researchers designed a brain with 5 action concepts to execute skills and a learned selector to supervise the concepts:

  • Reach: this movement extends the hand out from the body.

  • Move: this movement sweeps the arm back and forth and up and down.

  • Orient: this movement puts the robot hand in the right position to grasp the block.

  • Grasp: this movement squeezes the fingers to grasp the block.

  • Stack: this movement picks up the block and places it on top of another block.

Each skill is a function that uses specific joints to perform a sub-task. This is important because limiting the actions that each skill takes as it performs its function prevents the brain from having to explore many movements that couldn’t possibly accomplish the goal. For example, orienting your hand around a block (putting your hand in position to pick it up) involves rotating your wrist. Now, imagine if your arm is in the perfect position and all you needed to do was turn your wrist to put your hand in position to grab the block, but you jerk your elbow! Now your hand is in a position where you can’t grasp the block, no matter how you turn your wrist.

Figure 4-18. The reach skill extends the robot arm by activating the shoulder, elbow and wrist. The move skill moves the arm laterally back and forth and up and down by activating the shoulder joint only. The grasp skill closes the hand by activating the fingers only.

Functions separate the actions into groups that are relevant for the skill.

Table 4-8. Functional skills for controlling a robotic arm and hand to grasp and stack blocks

Skill    Actions
Reach    Elbow, Shoulder, Wrist
Move     Shoulder
Orient   Wrist
Grasp    Fingers
Stack    Shoulder, Fingers

Try this out for yourself. Identify an object within reach that you can grasp. Reach your arm out (moving mostly your elbow, and also your shoulder and wrist as needed), but only extend your arm straight out from your body. Now use your shoulder only to laterally move your hand toward the object. You might be able to grasp the object at this point, but don’t. You’re so close! Now move your arm around from the elbow. See how frustrating that is! Your elbow movements just moved your hand away from the object that you were previously able to grasp. Now imagine watching your AI brain use joints that ruin the skill sequence over and over in 1,000 different ways instead of turning the wrist and grasping after the arm is in position. This is exactly what will happen if you allow an AI to practice a task without teaching functional skills explicitly.

Sequences live in the selector

See the sequence?

For the robot arm example above, the skills must be performed in a sequence. Imagine what will happen if you try to grasp the block, then move your hand into the right position or if you try to stack the block before you’ve grasped it!

First, before I talk about the different types of functional skills and how to represent them, let me tell you where they live. Sequences live in selector concepts. The selector concepts that supervise the brain and assign which skill to perform next must obey all of the sequence rules that I present in this section. For each example, I include a brain-design diagram that outlines the sequence that the selector must obey as it makes its assignments.

Figure 4-19. Brain design for grasp and stack robotic task with sequence definition living in the selector

So, how do you make a selector obey a sequence as it assigns tasks? There are two ways to accomplish this: programmed selectors can accept selection rules that enforce sequences. Alternatively, you can enforce sequences in learned selectors using action masking. Action masking is a technique that sets the probability of unwanted actions to zero in the learning algorithm. This is the technique I used to enforce the sequence of the walking gait for the bipedal walker brain above.
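Here is a minimal sketch of action masking for the grasp-and-stack sequence. The skill names come from the example above; the mask-and-renormalize mechanics are a simplified illustration of what an RL framework would do internally.

```python
# Fixed-order sequence for the grasp-and-stack brain: R, M, O, G, S
SKILLS = ["reach", "move", "orient", "grasp", "stack"]

def action_mask(completed):
    """Return a 0/1 mask over skills: only the next skill in the fixed
    sequence gets probability mass; every other skill is masked to zero."""
    next_index = len(completed)
    return [1 if i == next_index else 0 for i in range(len(SKILLS))]

def apply_mask(probabilities, mask):
    """Zero out masked actions and renormalize the rest."""
    masked = [p * m for p, m in zip(probabilities, mask)]
    total = sum(masked)
    return [p / total for p in masked]

# After reach and move are done, only orient is selectable.
mask = action_mask(completed=["reach", "move"])
probs = apply_mask([0.2, 0.2, 0.2, 0.2, 0.2], mask)
```

A programmed selector would simply return the next skill directly; the mask formulation matters when a learned selector outputs probabilities over skills.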

We borrow some mathematical symbols from a field called task algebra to describe the rules about sequences. These symbols represent the landmarks that provide clues to the sequence of skills.

Each of the skills in the sequence is a function. When the function has served its purpose, move on to the next task function in the sequence.

Table 4-9. Symbols from task algebra that we will use to describe relationships between skills.

Operator Name      Example    Description

Sequence           A ⊸ B      Skill A must complete before Skill B can execute.

Exclusive Choice   A ⊗ B      Both Skill A and Skill B are enabled and can be executed in any order, but not at the same time.

Parallel           A & B      Both Skill A and Skill B are enabled.

Selector X[ ]      X[A, B]    Skill X assigns Skill A or Skill B to execute. The assigned task must be completed (Skill A or B) before Skill X is considered done.


Some readers will find this kind of mathematical representation refreshingly precise and others will find it intimidating. Don’t worry, I’ll provide plenty of examples.

The task algebra for the robotic arm example above is R ⊸ M ⊸ O ⊸ G ⊸ S. This means that the brain will always reach first, then move, then orient the robot hand around the block, then grasp the block, then stack the block.

Fixed-Order Sequences

This is a fixed-order sequence. The sequence doesn’t change, regardless of the starting point or the destination on the landmass. Sometimes we know why this is true (physics or chemistry tells us), but sometimes we don’t have the science to explain it—yet we know that the sequence holds true because experience over time proves it. In this case the fixed-order sequence of skills is effective, but seems a bit too rigid. For example, I can easily imagine many ways that you could move the arm first, before reaching, or alternate between reaching and moving the arm to get the arm into position for orienting the hand. A more flexible brain design allows more options for how the brain sequences the reach and move tasks:

R ⊗ M ⊸ (O ⊸ G ⊸ S)

R ⊗ M means that you can perform the reach and move skills as many times as you want in any order, which is a more natural movement. Then, after reach and move are complete (the hand is in position to grasp the block after the correct wrist movement), the orient, grasp, and stack skills must be executed in exactly that order.
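One way to see what this task algebra permits is a small checker that validates a skill trace against (R ⊗ M) ⊸ (O ⊸ G ⊸ S): any mix of reach and move first, followed by orient, grasp, and stack in exactly that order. This is an illustrative sketch, not part of the research brain.

```python
def valid_trace(trace):
    """Check a trace against (R ⊗ M) followed by (O, then G, then S):
    any mix of reach and move first, then orient, grasp, stack in
    exactly that order."""
    i = 0
    while i < len(trace) and trace[i] in ("reach", "move"):
        i += 1
    return trace[i:] == ["orient", "grasp", "stack"]

ok = valid_trace(["reach", "move", "reach", "orient", "grasp", "stack"])
bad = valid_trace(["reach", "grasp", "orient", "stack"])
```

A selector enforcing this design would only ever produce traces for which the checker returns true.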

Figure 4-20. The path of the robot hand moving toward the block for fixed order task sequences, variable order task sequences, and parallel execution of the reach and move function

Parallel Execution of Functional Skills

Sometimes skills can be executed independently, but in parallel. The smoothest and most natural hand motion for reach and move likely results from parallel execution. See Figure 4-20 for an example. If you reach first, then move (R ⊸ M), the motion looks very mechanical. The robot reaches the entire distance, then activates the move skill to sweep over to the block. A variable order sequence (R ⊗ M) alternates between reaching and moving, which looks smoother but is still a jerky motion. Activating reaching and moving simultaneously at each time step (R & M) leads to the smoothest path toward the block. The reach skill controls one set of joints and the move skill controls another set of joints, so each action to control the arm joins the decisions from the independent reach and move skills. Even so, I think that the original definition of the reach skill (with the shoulder, elbow, and wrist) is a better brain design.

Not every set of skills can be executed successfully in parallel. We can only teach these skills in parallel (practice them separately, then combine them for parallel execution) if we slightly change the definition of the skills. Do you remember how the reach skill uses the shoulder, wrist and elbow and the move skill uses the shoulder only? To teach and execute these skills in parallel, each skill needs to use a mutually exclusive set of joints. This means that no joints are shared between skills. So, if we changed the reach skill to use the elbow and wrist only, then we can teach and execute reach and move in parallel.

You might look at the resulting paths toward the block and wonder why we should use fixed order or variable order sequences for skills that learn this grasp and stack task. Keep in mind that the research project used the fixed order sequence R ⊸ M very successfully to complete the tasks and that the motion looks quite smooth for this 7-jointed robot. You can see the robot learning and executing these skills here. That’s one of the great things about brain design: there are multiple (maybe even many) valid brain designs that provide good landmarks for Autonomous AI to acquire skills to complete tasks well, just like there are many teaching strategies that can guide human students to successfully learn the jump shot.

Variable Order Sequences

Just like the reach and move skill sequence R ⊗ M, other task sequences can be completed in any order. In the Nintendo game “Breath of the Wild” that I discussed earlier, the first four puzzles can be solved in any order, but the next skills must be performed in a sequence. You need a hang glider to get off the plateau. The task algebra for the opening skills in Breath of the Wild is:

(Gain Spirit Orb from Ja Baij Shrine ⊗ Gain Spirit Orb from Keh Namut Shrine ⊗ Gain Spirit Orb from Oman Au Shrine ⊗ Gain Spirit Orb from Owa Daim Shrine) ⊸ Climb Tower ⊸ Fly Glider

Here’s a landmass that requires skills to be performed in variable order.

Figure 4-21. Landmass where exploration functions (“travel through the mountain pass” and “travel around the lake”) should be used in variable sequences. Sometimes you will need to travel around the lake first, then through the mountain pass to get to the goal state, other times you will need to travel through the mountain pass first, then travel around the lake. Perform the tasks in any order that helps you succeed.

Figure 4-21 shows a landmass that requires variable sequences to navigate to the highest peak. From any point on the outskirts of the island, you will need to navigate around lakes and through mountain passes in sequence, but the sequence will vary depending on which point you start from. The task algebra looks like this:

Travel through the mountain pass ⊗ Travel around the lake

Let me give you another real robot example, this time with variable order sequences. In this example, the brain is controlling the two-armed Baxter robot to lift a table. This brain was also built by researchers at Microsoft. But here’s the catch: the robot needs to follow a human’s lead. Most of us have done this before. We team up with another person to lift a table: one person leads and the other follows.

Figure 4-22. Baxter robot lifting a table in simulation. There is a simulated, invisible force standing in for the human on the other side of the table. Baxter is trying to learn to lift the table by following this force’s lead.

We divide the task and teach it as two separate skills: lift and level.

Skill    Goal

Lift     Move the table’s center of mass vertically upward. If you lift one end of the table only, this goal cannot be satisfied.

Level    Return the angle of the table to 0 degrees (perfectly level). You need to level the table only if the table is tilted.

For the lift and level tasks, there is clearly a sequence, but the sequence is variable. If the table is not level, you need to perform the level skill before you can successfully lift the table vertically. But if the table is level you should start lifting (there is no leveling to do). The task algebra for these skills looks like this: Lift ⊗ Level. The tasks must be performed in sequence, but the sequence is variable. A good sequence might look like (Lift, Lift, Level, Lift, Level, Lift) but will vary depending on how the other lifter leads. Note that these skills cannot be taught as parallel execution (taught separately, then combined) because they are not completely independent.
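A sketch of the variable-order selection logic for this brain: level whenever the table is tilted beyond a tolerance, otherwise lift. The one-degree tolerance is illustrative.

```python
def select_skill(table_angle_degrees, tolerance=1.0):
    """Variable-order selector for the table-lifting brain: level the
    table whenever it is tilted beyond a tolerance, otherwise lift."""
    if abs(table_angle_degrees) > tolerance:
        return "level"
    return "lift"

# A lift led by a partner might produce tilt readings like these,
# yielding an assignment sequence such as (lift, lift, level, lift).
angles = [0.0, 0.5, 3.0, 0.2]
assignments = [select_skill(a) for a in angles]
```

Note that the sequence of assignments is driven by what the other lifter does, which is exactly why the order cannot be fixed in advance.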

Hierarchies for strategies

Strategies are different from functions. Functions are skills that must be performed in some sequence or executed in parallel. Strategies are skills that map to a scenario, not a sequence. Use strategies on landscapes that force you to choose the right skills for the right scenario.

Figure 4-23. This landmass allows you to travel from left to right using either of two strategies: pass between the two bodies of water or around the bodies of water.

Take a look at the landmass in Figure 4-23. Unlike the lift and level skills in the table-lifting example, both strategies are completely valid ways to traverse the island from left to right. But one of the strategies looks significantly more attractive depending on where you start and where the target is. If you start closer to the top or the bottom of the island, going around the bodies of water will require less distance traveled. If you start closer to the center of the island (vertically), then you can reach the goal sooner by traveling between the bodies of water.
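Reading the situation to pick a strategy can be sketched as a simple rule on the starting position. The island dimensions and the width of the "between the lakes" band below are invented for illustration:

```python
def choose_strategy(start_y, island_height=100.0, band=20.0):
    """Pick a traversal strategy from the starting position: starts near
    the vertical center go between the bodies of water; starts near the
    top or bottom go around them. Numbers are illustrative."""
    center = island_height / 2
    if abs(start_y - center) <= band:
        return "between the lakes"
    return "around the lakes"
```

A learned selector would discover a boundary like this from practice rather than having it programmed, but the role is the same: map the scenario to the right strategy.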

That’s how strategies work. You need to read the situation correctly to choose the right strategy. In his iconic (at least in the field of AI) 1985 talk Can Machines Think, Richard Feynman (1918 - 1988) tells the story of how Douglas Lenat used strategy to win a prominent game competition. In this wargame competition, participants designed a navy fleet of miniature ships with different amounts of armor and weapons. Mock navy battles used chance (as many games do) to inflict damage and the last navy standing won.

During the month of June 1981, the EURISKO program was set the task of exploring the design of naval fleets conforming to a body of (several hundreds of) rules and constraints as set forward in Traveller: The Trillion Credit Squadron. EURISKO designed a fleet of ships suitable for entry in the 1981 Origins national wargame tournament, held at Dunfey’s Hotel, in San Mateo, Ca., over July 4 weekend. The Traveller tournament, run by Game Designers Workshop (based in Normal, Illinois), was single elimination, six rounds. EURISKO’s fleet won that tournament, thereby becoming the ranking player in the United States (and also an honorary admiral in the Traveller navy). This win is made more significant by the fact that the program’s creator, Professor Douglas Lenat of Stanford University’s Heuristic Programming Project, had never played this game before, nor any miniatures battle game of this type.

Lenat’s heuristic program (heuristic is just another term for strategy) devised a strategy to build one gigantic ship that contained all of the available armor and weapons. This is a well-used strategy in many battle video games; gamers would call this ship a “tank” (a large unit that can both inflict and absorb a huge amount of damage). These units are usually very slow, but their firepower and damage absorbing bulk can help them succeed, as Lenat’s gigantic ship did.

Discovering Strategies

Well, the next year, the wargame’s rules were changed to prevent a single huge ship from winning the competition. OK, game over, right? Nope. That year, Lenat’s competition entry used a navy of 100,000 tiny ships to overwhelm the competition and win for a second year in a row. Each ship delivers a tiny amount of damage, but there are so many of them, they can add up to a victory. Video gamers use this strategy in battle games frequently too. They call this the “swarm.”

I’m not a video game player (mostly because I don’t play them well), but I used to enjoy a video game called StarCraft II. In this game, you control a galactic space army. Depending on the race of the space army you control (Terran, Protoss, or Zerg), different strategies become very attractive. The Zerg is a “swarm” race that is collectively stronger by being part of a group. It’s easy to defeat an individual Zerg unit but you’ll be overwhelmed by a swarm. That’s how most Zerg players win the game.


When designing brains, look for extremes. The extremes help you identify strategies. Scenarios and strategies always come in pairs, so I always ask experts “what is the opposite of this strategy you just told me about?”.

Strategies wax and wane in effectiveness over time

These kinds of strategies aren’t just useful in games. Businesses use the swarm strategy as well. Amazon built a reputation as an online shopping giant, a huge megalith that sells everything from underwear to high-end electronics from its website. It even bought the grocery-store chain Whole Foods. It wins by scale and by controlling a massive, efficient supply chain. This reminds me of the Empire in the Star Wars science fiction series: a huge intergalactic government with massive resources. They even built a space weapon the size of a planet: seemingly unbeatable.

Well, along comes Shopify (and the Rebel Alliance). Shopify provides technology for almost anyone to build and maintain an eCommerce store. OK. Now we’re powering up a swarm of small, nimble eCommerce stores; the Zerg of eCommerce, if you like. Here’s another thing about the Zerg. The Zerg ecosystem grows in power over time and is almost unbeatable late in the game. You have to beat them early in the game in order to win. In an article titled Shopify, the Zerg of eCommerce, Mike, an “ex-activist investor,” illuminates these very insights and suggests that over time the Shopify strategy will gain ground over the Amazon strategy.

Strategies Capture Trade-Offs

One ship vs. many, tank vs. swarm, Empire vs. Rebel Alliance (a reference to the Star Wars franchise), Federation vs. Borg (a reference to the Star Trek franchise), small and fast vs. big and slow, run game vs. passing game in football: there’s always a trade-off. I didn’t learn this in a factory or an AI research lab; I learned it by studying the game of chess. I am not a proficient chess player, but I am fascinated by the strategy of the game. I couldn’t help but purchase The Complete Book of Chess Strategy by Jeremy Silman when I saw it at Barnes & Noble. Yes, indeed, I am a book hound and have spent many hours perusing bookstore shelves.

In his books, Silman, a teacher and coach of chess masters, evaluates positions according to the “imbalances,” or differences, that exist in every position, and advocates that players plan their play around them. A good plan, according to Silman, is one that highlights the positive imbalances in the position. He’s saying that the differences between chess board scenarios create opportunities for some strategies to have more impact on the game than others.

But there were so many strategies listed in the encyclopedia! I understood that a chess game is typically broken into the opening, middlegame, and endgame, and that there would be different strategies for each phase, but there were still too many strategies listed for me to understand the trade-off. After reading his book The Amateur’s Mind, I devised my layman’s interpretation of the two basic strategies in chess. Describing these strategies in their extreme forms helped me discover them. This approach will also help you draw out strategies for industrial processes and equipment from subject matter experts.

  • One strategy is extremely aggressive. It favors mobility (the ability to move pieces quickly) and therefore favors bishops over knights. Bishops are very mobile and, especially when fianchettoed, can travel across the board on the long diagonal superhighway. Bobby Fischer favored this strategy.

  • The opposite strategy is very defensive. It favors center control and builds edifices of pieces to block and own the center of the board. It favors knights over bishops, because knights can move more easily through crowded center areas of the board. A group of chess masters so preferred this style that they developed the Queen’s Gambit to lure a player into the aggressive, offensive strategy so that they could crush it with this one. The Queen’s Gambit Accepted takes up the challenge; the Queen’s Gambit Declined sees the danger and acts to mitigate this strategy’s advantages.

Figure 4-24. Pendulum of strategy. Strategies come in pairs. They are best devised by thinking about extremes, but executing strategy requires swinging back and forth between the strategic extremes.

The most important insight, though, is that you can’t just play whichever strategy you want. The board scenario (the position of your pieces and your opponent’s pieces) tells you when it is most advantageous to use each strategy. This also explains why there are so many chess strategies in the encyclopedia: there are strategies at almost every point on the continuum that help you navigate from almost any type of board position.

Here’s an example from horse racing. The movie “Ride Like a Girl” tells the story of Michelle Payne, the first woman to win the prestigious Melbourne Cup race. Her father teaches her how to read the competitors during a race to effectively navigate between strategies. The first strategy is to hold the horse back and stay with the pack. Her father then explains that when horses tire during the race, the pack parts and a clear but temporary opening appears. If you wait until the opening appears to “make your move” you will charge ahead of the pack. If you try to make your move before the opening appears, you will not be able to break away from the pack.

Selector Concepts Navigate Strategy Hierarchies

Strategies live in hierarchies. The selector (remember, selectors are supervisors) decides which strategy to use in each situation, and the strategy decides what to do. The task algebra for these hierarchies looks like this: Selector[Strategy 1, Strategy 2]. Here are the task algebra representations of each of the strategy examples that I referenced earlier in the chapter:

Select Navigation Strategy[Travel Between Lakes, Travel Around Lakes]

Select Naval Fleet Strategy[One Huge Ship, 100,000 Tiny Ships]

Select Chess Strategy[Offensive, Defensive]

Select Horse Racing Strategy[Hold Horse Back, Charge Ahead of Pack]

Select Crusher Strategy[Choke, Regulate]
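The selector pattern in this task algebra is easy to sketch in code. Here is a minimal Python sketch of the horse-racing example, Select Horse Racing Strategy[Hold Horse Back, Charge Ahead of Pack]. The strategy functions, the `opening_appeared` flag, and the returned action strings are my own illustrative assumptions, not part of any real brain design:

```python
from typing import Callable, Dict

# A strategy maps the current state (what the jockey perceives) to an action.
Strategy = Callable[[Dict[str, bool]], str]

def hold_horse_back(state: Dict[str, bool]) -> str:
    # Strategy 1: stay with the pack and conserve the horse's energy.
    return "stay with the pack"

def charge_ahead(state: Dict[str, bool]) -> str:
    # Strategy 2: make your move through the opening.
    return "make your move"

def select_racing_strategy(state: Dict[str, bool]) -> Strategy:
    # The selector (supervisor) decides WHICH strategy acts,
    # based on whether an opening in the pack has appeared.
    return charge_ahead if state["opening_appeared"] else hold_horse_back

def racing_brain(state: Dict[str, bool]) -> str:
    # Selector[Hold Horse Back, Charge Ahead of Pack]:
    # the selector picks the strategy, the strategy decides what to do.
    strategy = select_racing_strategy(state)
    return strategy(state)

print(racing_brain({"opening_appeared": False}))  # stay with the pack
print(racing_brain({"opening_appeared": True}))   # make your move
```

Note the division of labor: the selector never outputs an action itself; it only chooses which strategy gets to act, exactly as in the hierarchies above.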

As you design brains, you will need to identify hierarchies of strategies and sequences of functional skills so that you can apply AI design patterns that teach those skills effectively (that is, guide the learning algorithm to acquire the skills it needs to succeed). Just like a skilled teacher or coach, you should be much more concerned with providing landmarks that guide exploration than with prescribing each action (performing the task yourself). In the next chapter, I will describe how to listen to detailed descriptions of tasks and processes for clues about which building blocks to use in a brain design. With practice, you will be able to quickly and easily identify sequences and hierarchies and sketch out effective brain designs. Next, let me provide some visual language for expressing brain designs and an example that combines many of the building blocks introduced in this chapter.

Visual Language of Brain Design

Figure 4-25. Stylized workflow diagram example

You will collaborate with many stakeholders during and after the brain-design process, so it helps to have a common language to describe brain designs. I often whiteboard brain designs together with the subject matter experts. I sometimes ask other brain designers to review my preliminary designs and give me feedback. After I am finished designing a brain, I pass the brain design to the group that will build the brain.

Let’s not reinvent the wheel: workflow diagrams already provide a useful and well-known visual language for systems that process information, output decisions, and choose which modules to activate. Perception concepts process information, action concepts output decisions, and selector concepts choose which action concepts to activate, so I use workflow diagrams as the visual language for brain designs.
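This mapping from workflow to brain also translates directly into code. Here is a minimal sketch of the crusher example from the task algebra earlier, Select Crusher Strategy[Choke, Regulate], written as a workflow of perception, selector, and action concepts. The feature name, threshold, and setpoint values are illustrative assumptions, not the real crusher design:

```python
from typing import Callable, Dict

def perceive(raw: Dict[str, float]) -> Dict[str, float]:
    # Perception concept: process raw sensor input into a feature
    # (a hypothetical "load" fraction, for illustration).
    return {"load": raw["current"] / raw["capacity"]}

def choke(features: Dict[str, float]) -> float:
    # Action concept: the "choke" strategy's decision (illustrative setpoint).
    return 0.2

def regulate(features: Dict[str, float]) -> float:
    # Action concept: the "regulate" strategy tracks the load.
    return min(1.0, features["load"])

def select_crusher_strategy(features: Dict[str, float]) -> Callable:
    # Selector concept: the decision branch in the workflow
    # (illustrative threshold of 0.9).
    return choke if features["load"] > 0.9 else regulate

def crusher_brain(raw: Dict[str, float]) -> float:
    # Terminals: raw is the input to the brain; the return value is its output.
    features = perceive(raw)
    strategy = select_crusher_strategy(features)
    return strategy(features)

print(crusher_brain({"current": 95, "capacity": 100}))  # 0.2 (choke)
print(crusher_brain({"current": 50, "capacity": 100}))  # 0.5 (regulate)
```

Each function corresponds to one box in the workflow diagram, which is exactly why workflow diagrams make a natural visual language for brain designs.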

Symbol | Meaning in Workflow Diagram | Function in Brain Design
(symbol image) | Terminal (beginning or end of a workflow) | Input to the brain, output from the brain
(symbol image) | (not recovered) | Perception concept
(symbol image) | (not recovered) | Action concept
(symbol image) | Decision between branches of the workflow | Selector concept
Figure 4-26. Brain design diagram labeling the workflow symbols I use to describe inputs, outputs, perception concepts, action concepts, and selector concepts.