It was not until someone said, “It is Intelligent”, that I stopped searching and paid attention.
Artificial Intelligence, known as AI, is here. It has penetrated multiple aspects of our lives and is increasingly involved in making very important decisions. Soon it will be employed in every sector of our society, powering most of our daily operations. The technology is advancing very fast and investment in it is skyrocketing. At the same time, it feels like we are in the middle of an AI frenzy. Every day we hear about a new AI accomplishment: AI beats the best human player at the game of Go. AI outperforms human vision in classification tasks. AI makes deep fakes. AI generates high energy physics data. AI solves difficult partial differential equations that model the natural phenomena of the world. Self-driving cars are on the roads. Delivery drones are hovering in some parts of the world. We also hear about AI’s seemingly unlimited potential: AI will revolutionize healthcare and education. AI will eliminate global hunger. AI will fight climate change. AI will save endangered species. AI will battle disease. AI will optimize the supply chain. AI will unravel the origins of life. AI will map the observable universe. Our cities and homes will be smart. Eventually, we cross into science fiction territory: Humans will upload their brains into computers. Humans will be enhanced by AI. Finally, the voices of fear and skepticism emerge: AI will take over and destroy humanity.
Amid this frenzy, where the lines between the real, the speculation, the exaggeration, the aspiration, and the pure fiction are blurred, we must first define AI, at least within the context of this book. We will then discuss some of its limitations and where it is headed, and set the stage for the mathematics that is used in today’s AI. My hope is that when you understand the mathematics, you will be able to look at the subject from a relatively deep perspective, and the blurred lines between the fiction, the reality, and everything in between will become clearer. You will also learn the main ideas behind state-of-the-art math in AI, arming you with the confidence needed to use, improve, or even create entirely new AI systems.
I have yet to come across a unified definition of AI. If we ask two AI experts, we hear two different answers. Even if we ask the same expert on two different days, they might come up with two different definitions. The reason for this inconsistency and seeming inability to define AI is that, until now, it has not been clear what the definition of the I is. What is Intelligence? What makes us human and unique? What makes us conscious of our own existence? How do neurons in our brain aggregate tiny electric impulses and translate them into images, sounds, feelings, and thoughts? These are vast topics that have fascinated philosophers, anthropologists, and neuroscientists for centuries. I will not attempt to go there in this book. I will, however, address artificial intelligence in terms of an AI agent and list the following defining principles for the purposes of this book. In 2021, an AI agent can be one or more of the following:
An AI agent can be pure software, or have a physical robotic body.
An AI agent can be geared toward a specific task, or a flexible agent exploring and manipulating its environment, building knowledge with or without a specific aim.
An AI agent learns with experience, that is, it gets better at performing a task with more practice at that task.
An AI agent perceives its environment then builds, updates, and/or evolves a model for this environment.
An AI agent perceives, models, analyzes, and makes decisions that lead to accomplishing its goal, whatever that predefined fixed or variable goal might be.
Whenever a mathematical model for AI is inspired by the way our brain works, I will point out the analogy, hence keeping AI and human intelligence in comparison, without having to define either. Even though today’s AI is nowhere close to human intelligence, except in specific tasks such as image classification and playing Go (AlphaGo), so many human brains have recently converged to develop AI that the field is bound to grow and have breakthroughs in the upcoming years.
It is also important to note that some people use the terms Artificial Intelligence, Machine Learning, and Data Science interchangeably. These domains overlap, but they are not the same. A fourth very important but slightly less hyped area is Robotics, where physical parts and motor skills must be integrated into the learning and reasoning processes, merging mechanical engineering, electrical engineering, and bioengineering with information and computer engineering. One fast way to think about the interconnectivity of these fields is: Data fuels machine learning algorithms that in turn power many popular AI and/or robotics systems. The mathematics in this book is useful, in different proportions, for all four domains.
In the past decade, AI has sprung into worldwide attention due to the successful combination of the following factors:
Generation and digitization of massive amounts of data, such as text data, images, videos, health records, e-commerce, network, and sensor data. Social media and the Internet of Things have played a very significant role here with their continuous streaming of great volumes of data.
Advances in computational power, through parallel and distributed computing as well as innovations in hardware, allowing for efficient and relatively cheap processing of large volumes of complex and unstructured data.
Recent success of neural networks in making sense of Big Data, surpassing human performance in certain tasks such as image recognition and the game of Go. When AlexNet won the ImageNet Large Scale Visual Recognition Challenge in 2012, it spurred a flurry of activity in Convolutional Neural Networks (supported by Graphics Processing Units), and in 2015, PReLU-Net was the first to surpass the reported human-level error rate in ImageNet image classification, with ResNet following later that year.
When we examine the above points, we realize that today’s AI is not the same as science fiction AI. Today’s AI is centered around big data (of many different types) and machine learning algorithms, and is heavily geared towards performing one task extremely well, as opposed to developing and adapting varied intelligence types and goals in response to the surrounding environment.
There are many more areas and industries where AI can be successfully applied than there are AI experts who are well suited to respond to this ever-growing need. Humans have always striven to automate processes, and AI carries a great promise to do exactly that, at a massive scale. Large and small companies have volumes of raw data that they would like to analyze and turn into insights for profits, optimal strategies, and allocation of resources. The health industry suffers a severe shortage of doctors, and AI has innumerable applications and great potential there. Worldwide financial systems, stock markets, and banking industries have always depended heavily on our ability to make good predictions, and have suffered tremendously when those predictions failed. Scientific research has progressed significantly with our increasing ability to compute, and today we are at a new dawn where advances in AI enable computations at scales thought impossible a few decades ago. Efficient systems and operations are needed everywhere, from the power grid to transportation to the supply chain to forest and wildlife preservation, battling world hunger, disease, and climate change. Automation is even sought after in AI itself, where an AI system automatically decides on the optimal pipelines, algorithms, and parameters, readily producing the desired outcomes for given tasks, thus reducing or even eliminating the need for human supervision.
In this book, as I work through the math, I will focus on the following widely useful application areas of AI, in the context of an AI agent’s specified tasks. However, the mathematical ideas and techniques are readily transferable across different application domains. The reason for this seeming ease and wide applicability is that we happen to be in the age of AI implementation, in the sense that the main ideas to address certain tasks have already been developed, and with only a little tweaking, they can be implemented across various industries and domains:
Our AI agent processes data, provides insights and makes decisions based on that data.
Neural networks in AI are modeled after the neocortex, or the new brain. This is the part of our brain responsible for higher functions such as perception, memory, abstract thought, language, voluntary physical action, decision making, imagination, and consciousness. The neocortex has many layers, six of which are mostly distinguishable. It is flexible and has a tremendous learning ability. The old brain and the reptilian brain lie below the neocortex, and are responsible for emotions and for more basic and primitive survival functions such as breathing, regulating the heartbeat, fear, aggression, sexual urges, and others. The old brain keeps records of actions and experiences that lead to favorable or unfavorable feelings, creating our emotional memory that influences our behavior and future actions. Our AI agent, in a very basic way, emulates the neocortex and sometimes the old brain.
Our AI agent senses and recognizes its environment. It peeks into everything from our daily pictures and videos, to our MRI scans, and all the way into images of distant galaxies.
Our AI agent communicates with its environment, and automates tedious and time-consuming tasks such as text summarization, language translation, sentiment analysis, document classification and ranking, captioning images, and chatting with users.
Our AI agent detects fraud in our daily transactions, assesses loan risks, and provides 24-hour feedback and insights about our financial habits.
Our AI agent processes network and graph data, such as animal social networks, infrastructure networks, professional collaboration networks, economic networks, transportation networks, biological networks, and many others.
Our AI agent has social media to thank for providing the large amount of data necessary for its learning. In return, our AI agent attempts to characterize social media users, identifying their patterns, behaviors and active networks.
Our AI agent is an optimizing expert. It helps us predict optimal resource needs and allocation strategies at each level of the production chain. It also finds ways to end world hunger.
Our AI agent facilitates our daily operations.
Our AI agent solves partial differential equations used in weather forecasting and prediction.
Our AI agent attempts to fight climate change.
Our AI agent delivers personalized learning experiences.
Our AI agent strives to be fair, equitable, inclusive, transparent, unbiased, and protective of data security and privacy.
Along with the impressive accomplishments of AI and its great promise to enhance or revolutionize entire industries, there are some real limitations that the field needs to overcome. Some of the most pressing limitations are:
Current AI is not even remotely close to being intelligent in the sense that we humans consider ourselves uniquely intelligent. Even though AI has outperformed humans in many specific tasks, it cannot naturally switch and adapt to new tasks. For example, an AI system trained to recognize humans in images cannot recognize cats without retraining, or generate text without changing its architecture and algorithms. In the context of the three types of AI, we have thus far only partially accomplished Artificial Narrow Intelligence, which has a narrow range of abilities. We have accomplished neither Artificial General Intelligence, on par with human abilities, nor Artificial Super Intelligence, which is more capable than humans. Moreover, machines today are incapable of experiencing any of the beautiful human emotions, such as love, closeness, happiness, pride, dignity, caring, sadness, loss, and many others. Mimicking emotions is different from experiencing and genuinely providing them. In this sense, machines are nowhere close to replacing humans.
Most popular AI applications need large volumes of labeled data. For example, MRI images can be labeled cancer or not-cancer, YouTube videos can be labeled safe for children or unsafe, and house prices can be available along with the house district, number of bedrooms, median family income, and other features; in this last case, the house price is the label. The limitation is that the data required to train a system is usually neither readily available nor cheap to obtain, label, maintain, or warehouse. A substantial amount of data is confidential, unorganized, unstructured, biased, incomplete, or unlabeled. Obtaining the data, cleaning it, preprocessing it, and labeling it become major obstacles requiring large investments of time and resources.
For a given AI task, there are sometimes many methods, or algorithms, to accomplish it. Each task, dataset, and/or algorithm has settings, called hyperparameters, that can be tuned during implementation, and it is not always clear what the best values for these hyperparameters are. The variety of methods and hyperparameters available to tackle a specific AI task means that different methods can produce extremely different results, and it is up to humans to assess which methods’ decisions to rely on. In some applications, such as which dress styles to recommend for a certain customer, these discrepancies may be inconsequential. In other areas, AI-based decisions can be life changing: A patient is told they do not have a certain disease while in fact they do; a person is mislabeled as highly likely to reoffend and gets their parole denied as a consequence; or a loan gets rejected for a qualified person. Research is ongoing on how to address these issues, and I will expand more on them as we progress throughout the book.
Humans’ abilities and potential are limited by their brains’ power and capacity, their biological bodies, and the resources available on Earth and in the universe that they are able to manipulate. These are, in turn, limited by the power and capacity of their brains. AI systems are similarly limited by the computing power and hardware capability of the systems supporting the AI’s software. Recent studies have suggested that computation-hungry deep learning is approaching its computational limits, and that new ideas are needed to improve algorithm and hardware efficiency, or to discover entirely new methods. Progress in AI has depended heavily on large increases in computing power. This power, however, is not unlimited, is extremely costly for large systems processing massive data sets, and has a substantial carbon footprint that cannot be ignored. Moreover, data and algorithmic software do not exist in a vacuum. Devices such as computers, phones, tablets, and batteries, along with the warehouses and systems needed to store, transfer, and process data and algorithms, are made of real physical materials harvested from Earth. It took Earth millions of years to make some of these materials, and the infinite supply required to sustain these technologies forever is just not there.
Security, privacy, and adversarial attacks remain a primary concern for AI, especially with the advent of interconnected systems. A lot of research and resources are being allocated to address these important issues. Since most current AI is software and most of the data is digital, the arms race in this area is never ending. This means that AI systems need to be constantly monitored and updated, requiring more expensive-to-hire AI specialists, probably at a cost that defeats the initial purpose of automation at scale.
The AI research and implementation industries have thus far treated themselves as slightly separate from the economic, social, and security consequences of their advancing technologies. Usually the ethical, social, and security implications of AI work are acknowledged as important and in need of attention, but as beyond the scope of the work itself. As AI becomes widely deployed and its impacts on the fabric and nature of society, markets, and potential threats are felt more strongly, the field as a whole has to become more intentional in the way it attends to these issues of paramount importance. In this sense, the AI development community has been limited in the resources it allocates to addressing the broader impact of the implementation and deployment of its new technologies.
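To make the points above about labeled data and hyperparameters a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the five labeled houses are made-up numbers, and the k-nearest-neighbors method is just one simple choice among many, not a method prescribed by this book. Each house has features (number of bedrooms, median family income) and a label (the price). The hyperparameter k, the number of neighbors to average, is not learned from the data, and different values of k give noticeably different predictions for the same house.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical labeled data (made-up numbers for illustration only):
# features = (number of bedrooms, median family income in $1000s)
# label    = house price in $1000s
LABELED_HOUSES = [
    ((2, 40.0), 150.0),
    ((3, 55.0), 220.0),
    ((3, 60.0), 240.0),
    ((4, 80.0), 340.0),
    ((5, 120.0), 520.0),
]


def knn_predict(features, labeled_data, k):
    """Predict a price as the mean label of the k nearest labeled examples.

    k is a hyperparameter: we choose it, the algorithm does not learn it.
    Note that the features are not rescaled here, so income dominates
    the distance; a real application would normalize the features first.
    """
    nearest = sorted(labeled_data, key=lambda pair: dist(pair[0], features))[:k]
    return sum(label for _, label in nearest) / k


# An unlabeled house whose price we want to predict.
query = (4, 70.0)

for k in (1, 3, 5):
    print(f"k={k}: predicted price = {knn_predict(query, LABELED_HOUSES, k):.1f}")
```

The same data and the same algorithm give three different answers for the three values of k, which is the kind of discrepancy a human has to adjudicate. Without the labels (the prices), this method could not make any prediction at all.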
A very important part of learning about AI is learning about its incidents and failures. This helps us foresee and avoid similar outcomes when designing our own AI, before deploying it into the real world. If AI fails after being deployed, the consequences can be extremely undesirable, dangerous, or even lethal.
One online repository for AI failures, the AI Incident Database (https://incidentdatabase.ai), contains more than a thousand such incidents. Examples from this website include: A self-driving car kills a pedestrian; a trading algorithm causes a market flash crash where billions of dollars automatically transfer between parties; a facial recognition system causes an innocent person to be arrested; and Microsoft’s infamous chatbot Tay is shut down only 16 hours after its release, since it quickly learned and tweeted offensive, racist, and highly inflammatory remarks.
Such bad outcomes can be mitigated, but doing so requires a deep understanding of how these systems work, at all levels of production, as well as of the environment and users they are deployed for. Understanding the mathematics behind AI is one crucial step in this discerning process.
To be able to answer, or speculate on, where AI is headed, it is best to recall the field’s original goal since its inception: Replicate human intelligence. This field was conceived in the fifties. Examining its journey throughout the past seventy years might tell us something about its future direction. Moreover, studying the history of the field and its trends enables us to have a bird’s-eye view of AI, putting everything in context and providing a better perspective. This also makes learning the mathematics involved in AI a non-overwhelming experience. The following is a very brief and nontechnical overview of AI’s evolution and its eventual thrust into the limelight thanks to the recent impressive progress of Deep Learning.
In the beginning, AI research attempted to mimic intelligence using rules and logic. The idea was that all we needed to do was feed machines facts and logical rules of reasoning about these facts. There was no emphasis on the learning process. The challenge here was that, in order to capture human knowledge, there were too many rules and constraints to be tractable for a coder, and the approach seemed infeasible.
In the late 1990s and the early 2000s, various machine learning methods became popular. Instead of programming the rules and then making conclusions and decisions based on these preprogrammed rules, machine learning infers the rules from the data. The more data a machine learning system is able to handle and process, the better its performance tends to be. Data, and the ability to process and learn from large amounts of data economically and efficiently, became central goals. Popular machine learning algorithms in that time period were Support Vector Machines, Bayesian Networks, Evolutionary Algorithms, Decision Trees, Random Forests, Regression, Logistic Regression, and others. These algorithms are still popular now.
After 2010, and particularly in 2012, a tidal wave of neural networks and Deep Learning took over, after the success of AlexNet’s convolutional neural network in image recognition.
Most recently, in the last five years, Reinforcement Learning gained popularity after DeepMind’s AlphaGo beat the world champion in the very complicated ancient Chinese game of Go.
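The contrast drawn above between hand-coded rules and rules inferred from data can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the temperature readings and labels are made up, and the one-threshold learner (sometimes called a decision stump) is a deliberately simple stand-in for the machine learning algorithms named above.

```python
# Hypothetical labeled examples (made-up numbers for illustration only):
# (body temperature in degrees Celsius, has_fever)
EXAMPLES = [
    (36.5, False), (36.9, False), (37.2, False), (37.6, False),
    (38.1, True), (38.4, True), (39.0, True), (39.5, True),
]


def hand_coded_rule(temp):
    """Rule-based approach: a human writes the threshold into the program."""
    return temp >= 38.0


def learn_threshold(examples):
    """Learning approach: infer the threshold from labeled data.

    Candidate thresholds are the midpoints between consecutive sorted
    temperatures; we keep the first one with the fewest training errors.
    """
    temps = sorted(t for t, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(temps, temps[1:])]

    def training_errors(threshold):
        return sum((t >= threshold) != label for t, label in examples)

    return min(candidates, key=training_errors)


threshold = learn_threshold(EXAMPLES)


def learned_rule(temp):
    """The rule the program inferred from the data."""
    return temp >= threshold


print(f"learned threshold: {threshold:.2f}")
print("hand-coded says 38.0 is a fever:", hand_coded_rule(38.0))
print("learned rule says 38.0 is a fever:", learned_rule(38.0))
```

Nobody typed the learned threshold into the program; it came out of the examples. With more (or different) labeled examples, the learned rule would shift accordingly, while the hand-coded rule would stay fixed until a person edited it.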
Note that the previous glimpse of history is very rough: Regression has been around since Legendre and Gauss in the very early 1800s, and the first artificial neurons and neural networks were formulated in the 1940s and 1950s with the works of neurophysiologist Warren McCulloch, mathematician Walter Pitts, and psychologists Donald Hebb and Frank Rosenblatt. The Turing Test, originally called the Imitation Game, was introduced in 1950 by Alan Turing, a computer scientist, cryptanalyst, mathematician, and theoretical biologist, in his paper Computing Machinery and Intelligence. Turing proposed that a machine possesses artificial intelligence if its responses are indistinguishable from those of a human. Thus, a machine is considered intelligent if it is able to imitate human responses. To a person outside the field of computer science, however, the Turing Test sounds limiting in its definition of intelligence, and I wonder if it might have inadvertently limited the goals or the direction of AI research.
Even though machines are able to mimic human intelligence in some specific tasks, the original goal of replicating human intelligence has not been accomplished yet, so it might be safe to assume that is where the field is headed, even though that could involve rediscovering old ideas or inventing entirely new ones. The current level of investment in the area, combined with the explosion in research and public interest, is bound to produce new breakthroughs. In the meantime, breakthroughs brought about by recent AI advancements are already revolutionizing entire industries eager to implement these technologies. These contemporary AI advancements involve plenty of important mathematics that we will be exploring throughout this book.
The main AI race has been between the United States, Europe, and China. Some of the world leaders in the technology industry have been Google and its parent company Alphabet, Amazon, Facebook, Microsoft, Nvidia, and IBM in the United States; DeepMind in the UK and the United States; and Baidu and Tencent in China. There are major contributors from the academic world as well, but these are too many to enumerate. If you are new to the field, it is good to know the names of the big players, their histories and contributions, and the kinds of goals they are currently pursuing. It is also valuable to learn about the controversies, if any, surrounding their work. This general knowledge comes in handy as you navigate through and gain more experience in AI.
Question: When I say the word Math, what topics and subjects come to your mind?
Whether you are a math expert or a beginner, whatever math topic you thought of to answer the above question is most likely involved in AI. Here is a commonly used list of the most useful math subjects for AI implementation. You do not, however, need to be an expert in all of these fields in order to succeed in AI. What you do need is a deep understanding of certain useful topics drawn from the following math subjects: Calculus, Linear Algebra, Optimization, Probability, and Statistics. Depending on your specific application area, you might need special topics from: Random Matrix Theory, Graph Theory, Differential Equations, and Operations Research.
In this book we will walk through the above topics without presenting a textbook on each topic. AI application and implementation are the unifying themes for these varied and intimately interacting mathematical subjects. Using this approach, I might offend some math experts by simplifying a lot of technical definitions or omitting whole theorems and delicate details, and I might also offend AI or specialized industry experts by omitting details involved in certain applications and implementations. The goal, however, is to keep the book simple and readable, while at the same time covering most of the math topics that are important for AI applications. Interested readers who want to dive deeper into the math or the AI field can then read more involved books on the particular area they want to focus on. My hope is that this book is both a concise summary and a thorough overview, so that a reader can afterwards branch out confidently to whatever AI math field or AI application area interests them. In all cases, I will appreciate it if you point out my errors.
Human intelligence reveals itself in perception, vision, communication through natural language, reasoning, decision making, collaboration, empathy, modeling and manipulating the surrounding environment, transfer of skills and knowledge across populations and generations, and generalization of innate and learned skills into new and uncharted domains. Artificial intelligence aspires to replicate all aspects of human intelligence. In its current state, AI addresses only one or a few aspects of intelligence at a time. Even with this limitation, AI has been able to accomplish impressive feats, such as modeling protein folding and predicting the structures of proteins, which are the building blocks of life. The implications of this one AI application (among many) for understanding the nature of life and battling all kinds of diseases are far-reaching.
When you enter the AI field, it is important to remain mindful of which aspect of intelligence you are developing or using. Is it perception? Vision? Natural language? Navigation? Control? Reasoning? Which mathematics to focus on, and why, then follows naturally, since you already know where in the AI field you are situated. It will then be easy to attend to the mathematical methods and tools used by the community developing that particular aspect of AI. The recipe in this book is similar: First the AI type and application, then the math.
In this chapter, we addressed general questions like: What is AI? What is AI able to do? What are AI’s limitations? Where is AI headed? How does AI work? We also briefly surveyed important AI applications, the problems usually encountered by companies trying to integrate AI into their systems, incidents that happen when systems are not well implemented, and the math subjects typically needed for AI implementations.
In the next chapter, we dive into data and affirm its intimate relationship to AI. When we talk data, we also talk data distributions, and that plunges us straight into probability theory and statistics.