© Michael Paluszek, Stephanie Thomas  2017

Michael Paluszek and Stephanie Thomas, MATLAB Machine Learning, 10.1007/978-1-4842-2250-8_2

2. The History of Autonomous Learning

Michael Paluszek and Stephanie Thomas1

(1)New Jersey, USA

2.1 Introduction

In the previous chapter you were introduced to autonomous learning. You saw that autonomous learning could be divided into the areas of machine learning, controls, and artificial intelligence (AI). In this chapter you will learn how each area evolved. Automatic control predates AI. However, we are interested in adaptive or learning control, which is a relatively new development and really began evolving around the time that AI had its foundations. Machine learning is sometimes considered an offshoot of AI. However, many of the methods used in machine learning came from different fields of study such as statistics and optimization.

2.2 Artificial Intelligence

AI research began shortly after World War II [4]. Early work was based on knowledge of the structure of the brain, propositional logic, and Turing’s theory of computation. Warren McCulloch and Walter Pitts created a mathematical formulation for neural networks based on threshold logic. This allowed neural network research to split into two approaches: one centered on biological processes in the brain, the other on the application of neural networks to AI. It was demonstrated that any computable function could be implemented through a network of such neurons and that a neural network could learn. In 1948, Norbert Wiener’s book Cybernetics was published, describing concepts in control, communications, and statistical signal processing. The next major step in neural networks was Donald Hebb’s book The Organization of Behavior, which connected neural connectivity with learning in the brain. His book became a wellspring for work on learning and adaptive systems. Marvin Minsky and Dean Edmonds built the first neural computer in 1950.

In 1956, Allen Newell and Herbert Simon designed a reasoning program, the Logic Theorist (LT), which worked nonnumerically. The first version was hand simulated using index cards. It could prove mathematical theorems and even improve on human derivations. It solved 38 of the 52 theorems in Principia Mathematica. LT employed a search tree with heuristics to limit the search. LT was implemented on a computer using IPL, a programming language that led to Lisp.

Blocks World was one of the first attempts to demonstrate general computer reasoning. Blocks World was a micro world: a set of blocks sits on a table, some stacked on other blocks. The AI system could rearrange the blocks in certain ways, but a block underneath another block could not be moved until the block on top was moved. This is not unlike the Towers of Hanoi problem. Blocks World was a significant advance because it showed that a machine could reason, at least in a limited environment. During this period, computer vision was introduced and work began on implementing neural networks.

Blocks World and Newell and Simon’s LT were followed by the General Problem Solver (GPS), which was designed to imitate human problem-solving methods. Within its limited class of puzzles, it solved problems much as a human would. While GPS solved simple problems such as the Towers of Hanoi (Figure 2.1), it could not solve real-world problems because the search got lost in the combinatorial explosion.

Figure 2.1 Towers of Hanoi. The disks must be moved from the first peg to the last without ever putting a larger-diameter disk on top of a smaller-diameter disk.
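The Towers of Hanoi puzzle of Figure 2.1 has a classic recursive solution. The sketch below (in Python, purely for illustration; this code is not from the original AI programs discussed here) records the sequence of moves:

```python
def hanoi(n, src, dst, spare, moves):
    """Move n disks from src to dst, never placing a larger disk on a smaller one."""
    if n == 0:
        return
    hanoi(n - 1, src, spare, dst, moves)   # clear the n-1 smaller disks out of the way
    moves.append((src, dst))               # move the largest free disk
    hanoi(n - 1, spare, dst, src, moves)   # restack the smaller disks on top of it

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))   # 2^3 - 1 = 7 moves
```

The recursion mirrors the constraint in the caption: the subproblem of moving the top n − 1 disks must be solved before the largest disk can move.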

In 1959, Herbert Gelernter wrote the Geometry Theorem Prover, which could prove theorems that were quite tricky. The first game-playing programs were written at this time. In 1958, John McCarthy invented the language Lisp (LISt Processing), which was to become a major AI language. It is now available as Scheme and Common Lisp. Lisp was implemented only one year after FORTRAN. A typical Lisp program is

(defun improve (guess x)        ; one Newton step: average guess and x/guess
  (/ (+ guess (/ x guess)) 2.0))
(defun good-enough-p (guess x)  ; stop when guess^2 is within tolerance of x
  (< (abs (- (* guess guess) x)) 0.001))
(defun sqrt-iter (guess x)
  (if (good-enough-p guess x)
      guess
      (sqrt-iter (improve guess x) x)))

This computes a square root through recursion: sqrt-iter refines a guess until good-enough-p is satisfied. Eventually, dedicated Lisp machines were built, but they went out of favor when general-purpose processors became faster.

Time sharing was invented at the Massachusetts Institute of Technology (MIT) to facilitate AI research. Professor McCarthy created a hypothetical computer program, Advice Taker, a complete AI system that could embody general knowledge about the world. It would have used a formal language such as predicate calculus; for example, it could deduce a route to the airport from simple rules. Marvin Minsky arrived at MIT and began working on micro worlds. Within these limited domains, AI could solve problems, such as closed-form integrals in calculus.

Minsky and Papert wrote the book Perceptrons, which was fundamental in the analysis of artificial neural networks. The book contributed to the movement toward symbolic processing in AI. The book noted that single neurons could not implement some logical functions such as exclusive-or and erroneously implied that multilayer networks would have the same issue. It was later found that three-layer networks could implement such functions.
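The point about multilayer networks can be made concrete with a tiny example. The sketch below hard-codes a two-layer threshold network that computes exclusive-or, which no single threshold neuron can do; the weights are chosen by hand for illustration, not learned.

```python
def step(z):
    """Threshold (McCulloch-Pitts) activation: fires if input exceeds zero."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Two-layer threshold network: XOR = OR(x1, x2) AND NOT AND(x1, x2)."""
    h_or = step(x1 + x2 - 0.5)    # hidden unit 1: logical OR
    h_and = step(x1 + x2 - 1.5)   # hidden unit 2: logical AND
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

The hidden layer transforms the inputs into a space where the classes become linearly separable, which is exactly what a single perceptron layer cannot do for exclusive-or.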

More challenging problems were tried in the 1960s. Limitations in the AI techniques became evident. The first language translation programs had mixed results. Trying to solve problems by working through massive numbers of possibilities (such as in chess) ran into computation problems. Mr. Paluszek (the author) in Patrick Winston’s 6.034 class at MIT wrote a paper suggesting the use of pattern recognition in chess to visualize board patterns much as a human player might. As it turned out, this was not the approach taken to produce the champion computer chess programs of today.

As more complex problems were addressed, exhaustive search proved unsuitable: the number of possibilities grows rapidly with problem complexity. Multilayer neural networks were discovered in the 1960s but were not seriously studied until the 1980s.

In the 1970s, self-organizing maps using competitive learning were introduced [2]. A resurgence in neural networks happened in the 1980s. Knowledge-based systems were also introduced in the 1980s. According to Jackson [3],

An expert system is a computer program that represents and reasons with knowledge of some specialized subject with a view to solving problems or giving advice.

This included expert systems that could store massive amounts of domain knowledge and could incorporate uncertainty in their processing. Expert systems have been applied to medical diagnosis and other problems. Unlike earlier AI techniques, expert systems could deal with problems of realistic complexity and attain high performance. They can also explain their reasoning, a feature that is critical in operational use. Sometimes these are called knowledge-based systems. A well-known open-source expert system is CLIPS (C Language Integrated Production System).
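Jackson’s definition can be illustrated with a toy forward-chaining rule engine. This is a minimal sketch in Python, not CLIPS syntax, and the medical "rules" are invented for the example; real expert systems add conflict resolution, pattern matching, and uncertainty handling.

```python
# Toy forward-chaining rule engine in the spirit of an expert system.
facts = {"fever", "cough"}
rules = [
    ({"fever", "cough"}, "flu-suspected"),   # if all antecedents hold, assert conclusion
    ({"flu-suspected"}, "recommend-rest"),
]

trace = []      # the "explain its reasoning" part: record which rules fired
changed = True
while changed:
    changed = False
    for antecedents, conclusion in rules:
        if antecedents <= facts and conclusion not in facts:
            facts.add(conclusion)
            trace.append((sorted(antecedents), conclusion))
            changed = True

print(sorted(facts))
```

The trace list is what lets such a system justify its advice, which is the feature highlighted above as critical for operational use.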

Back propagation for neural networks was reinvented in the 1980s, leading to renewed progress in this field. Studies began both of human neural networks (i.e., the human brain) and of the creation of algorithms for effective computational neural networks. This eventually led to deep learning networks in machine learning applications.

Advances were made in the 1980s as AI began to apply rigorous mathematical and statistical analysis to develop algorithms. Hidden Markov models were applied to speech. Combined with massive databases, they have resulted in vastly more robust speech recognition. Machine translation has also improved. Data mining, the first form of machine learning as it is known today, was developed. Chess programs improved initially through the use of specialized computers, such as IBM’s Deep Blue. With the increase in processing power, powerful chess programs that are better than most human players are now available on personal computers.

The Bayesian network formalism was invented to allow for the rigorous application of uncertainty in reasoning problems. In the late 1990s, intelligent agents were introduced. Search engines, bots, and website aggregators are examples of intelligent agents used on the Internet.

The state of the art of AI includes autonomous cars, speech recognition, planning and scheduling, game playing, robotics, and machine translation. All of these are based on AI technology and are in constant use today. You can take a PDF document and translate it into another language using Google Translate. The translations are not perfect but are adequate for many uses; one certainly would not use them to translate literature!

Recent advances in AI include IBM’s Watson. Watson is a question-answering computing system with advanced natural language processing and information retrieval from massive databases. It defeated champion Jeopardy players in 2011. It is currently being applied to medical problems.

2.3 Learning Control

Adaptive or intelligent control was motivated in the 1950s [1] by the problems of aircraft control. Control systems of that time worked very well for linear systems. Aircraft dynamics could be linearized about a particular speed. For example, a simple equation for total velocity in level flight is
$$m\frac{dv}{dt} = T - \frac{1}{2}\rho C_{D}Sv^{2} \qquad (2.1)$$
This says the mass $m$ times the change in velocity $\frac{dv}{dt}$ equals the thrust $T$ minus the drag. $C_{D}$ is the drag coefficient, $\rho$ is the atmospheric density, and $S$ is the wetted area (i.e., the area that causes drag). The thrust is used for control. This is a nonlinear equation. We can linearize it around a velocity $v_{s}$, writing $v = v_{\delta} + v_{s}$ and dropping terms of second order in $v_{\delta}$, to get
$$m\frac{dv_{\delta}}{dt} = T - T_{s} - \rho C_{D}Sv_{s}v_{\delta} \qquad (2.2)$$
where $T_{s} = \frac{1}{2}\rho C_{D}Sv_{s}^{2}$ is the thrust that balances drag at $v_{s}$. This equation is linear. We can control velocity with a simple thrust control law
$$T = T_{s} - cv_{\delta} \qquad (2.3)$$
where $c$ is the damping coefficient. The atmospheric density $\rho$ is a nonlinear function of altitude, so for the linear control to work, the control must be adaptive. If we want to guarantee a certain damping value, which is the quantity in parentheses in
$$m\frac{dv_{\delta}}{dt} = -\left(c + \rho C_{D}Sv_{s}\right)v_{\delta} \qquad (2.4)$$
we need to know $\rho$, $C_{D}$, $S$, and $v_{s}$. This approach leads to a gain-scheduling control system in which we measure the flight condition and schedule the linear gains based on where the aircraft is in the gain schedule.
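The gain-scheduling idea can be sketched numerically. The snippet below (Python, for illustration only; the mass, drag coefficient, wetted area, and exponential atmosphere model are invented example values, not from the text) computes the trim thrust and feedback gain for a flight condition and simulates the linearized velocity-error dynamics:

```python
import math

# Illustrative aircraft parameters (hypothetical values)
m, CD, S = 5000.0, 0.02, 30.0    # mass [kg], drag coefficient, wetted area [m^2]

def rho(h):
    """Exponential atmosphere model: density [kg/m^3] vs altitude [m]."""
    return 1.225 * math.exp(-h / 8500.0)

def scheduled_gains(h, v_s, damping=0.7):
    """Trim thrust T_s and feedback gain c for the current flight condition."""
    q = rho(h) * CD * S
    T_s = 0.5 * q * v_s**2          # thrust balancing drag at v_s
    c = damping * m - q * v_s       # choose c so (c + rho*CD*S*v_s)/m = damping
    return T_s, c

# Simulate the linearized error dynamics m dv_delta/dt = -(c + rho*CD*S*v_s) v_delta
h, v_s, dt = 3000.0, 150.0, 0.1
T_s, c = scheduled_gains(h, v_s)
v_delta = 10.0                      # initial 10 m/s velocity error
for _ in range(100):
    v_delta += dt * (-(c + rho(h) * CD * S * v_s) * v_delta) / m
print(round(v_delta, 3))            # velocity error decays toward zero
```

Because the density model feeds the gain computation, recomputing `scheduled_gains` as altitude changes is precisely the scheduling step described above.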

In the 1960s, progress was made on adaptive control. State-space theory was developed, which made it easier to design multiloop control systems, that is, control systems that control more than one state at a time with different control loops. The general state-space controller is
$$\dot{x} = Ax + Bu \qquad (2.5)$$
$$y = Cx + Du \qquad (2.6)$$
$$u = -Ky \qquad (2.7)$$
where A, B, C, and D are matrices. If A completely models the system and y contains all of the information about the state vector x, then the gain K can be chosen to make this system stable. Full state feedback would be u = −Kx, where K can be computed to give guaranteed phase and gain margins (that is, tolerance to delays and tolerance to amplification errors). This was a major advance in control theory. Before this, multiloop systems had to be designed separately and combined very carefully.
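As a sketch of Equations (2.5) through (2.7), the snippet below applies full state feedback u = −Kx to a double integrator (position and velocity). The gains here are hand-picked for the example, not computed by an optimal-control method:

```python
# Double integrator: x = [position, velocity], xdot = A x + B u, u = -K x
A = [[0.0, 1.0],
     [0.0, 0.0]]
B = [0.0, 1.0]
K = [2.0, 3.0]          # hand-picked gains; closed-loop poles at -1 and -2

def step(x, dt):
    """One Euler integration step of the closed-loop system."""
    u = -(K[0] * x[0] + K[1] * x[1])                 # full state feedback
    xdot = [A[0][0]*x[0] + A[0][1]*x[1] + B[0]*u,
            A[1][0]*x[0] + A[1][1]*x[1] + B[1]*u]
    return [x[0] + dt*xdot[0], x[1] + dt*xdot[1]]

x = [1.0, 0.0]          # start 1 m from the origin, at rest
for _ in range(2000):
    x = step(x, 0.01)   # 20 seconds of simulation
print(x)                # both states driven to near zero
```

A single gain matrix K closes both loops at once, which is the multiloop convenience that state-space design provides.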

Learning control and adaptive control were found to be realizable from a common framework. The Kalman filter, also known as linear quadratic estimation, was introduced.

Spacecraft required autonomous control since they were often out of contact with the ground or the time delays were too long for effective ground supervision. The first digital autopilots were on the Apollo spacecraft. Geosynchronous communications satellites were automated to the point where one operator could fly a dozen satellites.

Advances were made in system identification, the process of determining the parameters of a system (such as the drag coefficient above). Adaptive control was applied to real problems. The F-111 aircraft had an adaptive control system. Autopilots have progressed from fairly simple mechanical pilot augmentation systems to sophisticated control systems that can take off, cruise, and land under computer control.

In the 1970s, proofs about adaptive control stability were made. Stability of linear control systems was well established, but adaptive systems are inherently nonlinear. Universally stabilizing controllers were studied. Progress was made in the robustness of adaptive control. Robustness is the ability of a system to deal with changes in parameters that were assumed to be known, sometimes because of failures in the systems. It was in the 1970s that digital control became widespread, replacing traditional analog circuits composed of transistors and operational amplifiers.

Adaptive controllers started to appear commercially in the 1980s. Most modern single-loop controllers have some form of adaptation. Adaptive techniques were also found to be useful for tuning controllers.

More recently there has been a melding of AI and control. Expert systems have been proposed that determine what algorithms (not just parameters) to use depending on the environment. For example, during a winged reentry of a glider the control system would use one system in orbit, a second at high altitudes, a third during high Mach (Mach is the ratio of the velocity to the speed of sound) flight, and a fourth at low Mach numbers and during landing.

2.4 Machine Learning

Machine learning started as a branch of AI. However, many techniques are much older. Thomas Bayes created what is now known as Bayes’ theorem, published in 1763. Bayes’ theorem says
$$P(A_{i}\vert B) = \frac{P(B\vert A_{i})P(A_{i})}{\sum_{j}P(B\vert A_{j})P(A_{j})} = \frac{P(B\vert A_{i})P(A_{i})}{P(B)} \qquad (2.8)$$
which is just the probability of $A_{i}$ given $B$. This assumes that $P(B)\neq 0$; the sum in the denominator expands $P(B)$ over the mutually exclusive alternatives $A_{j}$. In the Bayesian interpretation, the theorem introduces the effect of evidence on belief. Another technique, regression, was discovered by Legendre in 1805 and Gauss in 1809.
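A small numeric example shows Equation (2.8) updating belief from evidence. The probabilities below are invented for illustration (a diagnostic test for a condition present in 1% of a population):

```python
# Bayes' theorem, Equation (2.8): P(A|B) = P(B|A) P(A) / P(B)
p_a = 0.01                   # prior P(A): condition is present
p_b_given_a = 0.95           # sensitivity P(B|A): test positive when present
p_b_given_not_a = 0.05       # false-positive rate P(B|not A)

# Law of total probability gives the denominator P(B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))   # -> 0.161
```

Even a fairly accurate test yields only about a 16% posterior here, because the prior is so small; this is the "effect of evidence on belief" in action.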

As noted in the section on AI, modern machine learning began with data mining, which is the process of getting new insights from data. In the early days of AI, there was considerable work on machine learning from data. However, this lost favor and in the 1990s was reinvented as the field of machine learning. The goal was to solve practical problems of pattern recognition using statistics. This was greatly aided by the massive amounts of data available online along with the tremendous increase in processing power available to developers. Machine learning is closely related to statistics.

In the early 1990s, Vapnik and coworkers invented a computationally powerful class of supervised learning networks known as support vector machines (SVMs). These networks could solve problems of pattern recognition, regression, and other machine learning problems.

A growing application of machine learning is autonomous driving. Autonomous driving makes use of all aspects of autonomous learning including controls, AI, and machine learning. Machine vision is used in most systems as cameras are inexpensive and provide more information than radar or sonar (which are also useful). It isn’t possible to build really safe autonomous driving systems without learning through experience. Thus, designers of such systems put their cars on the roads and collect experiences which are used to fine-tune the system.

Other applications include high-speed stock trading and algorithms to guide investments. These are under rapid development and are now available to the consumer. Data mining and machine learning are used to predict events, both human and natural. Searches on the Internet have been used to track disease outbreaks. If there are a lot of data—and the Internet makes gathering massive data easy—then you can be sure that machine learning techniques are being applied to mine the data.

2.5 The Future

Autonomous learning in all its branches is undergoing rapid development today. Many of the technologies are used operationally even in low-cost consumer technology. Virtually every automobile company in the world and many nonautomotive companies are working to perfect autonomous driving. Military organizations are extremely interested in AI and machine learning. Combat aircraft today have systems to take over from the pilot, for example, to prevent planes from crashing into the ground.

While completely autonomous systems are the goal in many areas, the meshing of human and machine intelligence is also an area of active research. Much AI research has been to study how the human mind works. This work will allow machine learning systems to mesh more seamlessly with human beings. This is critical for autonomous control involving people, but may also allow people to augment their own abilities.

This is an exciting time for machine learning! We hope that this book helps you bring your own advances to machine learning!

References

[1] K. J. Åström and B. Wittenmark. Adaptive Control, Second Edition. Addison-Wesley, 1995.

[2] S. Haykin. Neural Networks. Prentice-Hall, 1999.

[3] P. Jackson. Introduction to Expert Systems, Third Edition. Addison-Wesley, 1999.

[4] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, Third Edition. Prentice-Hall, 2010.
