The timeline of machine learning, as available on Wikipedia (https://en.wikipedia.org/wiki/Timeline_of_machine_learning), provides a succinct and insightful overview of the evolution of the field. The roots can be traced back to the mid-1700s, when Thomas Bayes' essay on inverse probability was presented to the Royal Society of London. Inverse probability, the forerunner of what we today call Bayesian inference, deals with the problem of determining the unobserved state of a system from a set of observed events. For example, if you took a few chocolates at random from a box containing milk and white chocolates, and drew two milk and three white, what could you infer about how many of each kind the box holds?
In other words, what can we infer about the unknown from a few observed data points, and can we build a formal theory around that inference? Bayes' work was later developed into Bayes' Theorem by Pierre-Simon Laplace in his text, Théorie Analytique des Probabilités.
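The chocolate-box question can be made concrete with a short sketch. Assuming, purely for illustration, a box of N = 10 chocolates with an unknown number m of milk chocolates and a uniform prior over m, Bayes' rule turns the observed draw (two milk, three white) into a posterior over m:

```python
from math import comb

# Hypothetical setup (not from the text): the box holds N = 10
# chocolates, m of which are milk; m is unknown.
N, drawn_milk, drawn_white = 10, 2, 3
draws = drawn_milk + drawn_white

def likelihood(m):
    # Hypergeometric probability of drawing 2 milk and 3 white
    # chocolates without replacement from a box with m milk chocolates.
    if m < drawn_milk or N - m < drawn_white:
        return 0.0
    return comb(m, drawn_milk) * comb(N - m, drawn_white) / comb(N, draws)

# Uniform prior over m = 0..N, then Bayes' rule: posterior ∝ prior × likelihood.
prior = [1 / (N + 1)] * (N + 1)
unnorm = [prior[m] * likelihood(m) for m in range(N + 1)]
total = sum(unnorm)
posterior = [p / total for p in unnorm]

for m, p in enumerate(posterior):
    print(f"m = {m:2d} milk chocolates: P = {p:.3f}")
```

With these assumptions the posterior peaks at m = 4, matching the intuition that two milk chocolates out of five drawn suggests roughly 40% of the box is milk.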
In the early 1900s, Andrey Markov's analysis of Pushkin's poem Eugene Onegin, carried out to study the alternation of consonants and vowels in Russian literature, led to the development of a technique known as Markov chains, used today to model complex systems involving random events. Google's PageRank algorithm implements a form of Markov chain.
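The connection between Markov chains and PageRank can be sketched in a few lines. The three-page "web" below is invented for illustration; the function iterates a PageRank-style update, where a random surfer follows links with probability d (the damping factor) and jumps to a random page otherwise:

```python
# A toy web of three pages and their outgoing links (illustrative only).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def pagerank(links, d=0.85, iterations=50):
    # Power iteration on the Markov chain of a "random surfer":
    # with probability d, follow a random outgoing link;
    # with probability 1 - d, jump to any page uniformly.
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for p, outgoing in links.items():
            share = d * rank[p] / len(outgoing)
            for q in outgoing:
                new[q] += share
        rank = new
    return rank

print(pagerank(links))
```

The ranks converge to the stationary distribution of the chain: here page B ends up with the lowest rank, since only half of A's rank flows to it, while A and C feed each other.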
The emergence of machine learning, and more generally AI, as a formal discipline should be attributed to Alan Turing. He devised the Turing Test, a way to determine whether a machine can exhibit behavior indistinguishable from that of a human. Turing presented this in his paper, Computing Machinery and Intelligence, which starts out with the following:
Later in the paper, Turing writes:
Turing's work on AI was followed by a series of seminal events in machine learning and AI. Marvin Minsky built the first neural network machine in 1951, Arthur Samuel began work on the first machine learning programs, which played checkers, in 1952, and Frank Rosenblatt invented the perceptron, a fundamental unit of neural networks, in 1957. Pioneers such as Leo Breiman, Jerome Friedman, Vladimir Vapnik and Alexey Chervonenkis, Geoff Hinton, and Yann LeCun made significant contributions through the late 1990s that brought machine learning into the limelight. We are greatly indebted to their work, which has made machine learning stand out as a distinct area of research today.
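Rosenblatt's perceptron remains simple enough to sketch in full. The version below is a minimal illustration, trained on the logical AND function with an arbitrary learning rate; the data and constants are chosen for the example, not taken from Rosenblatt's original hardware:

```python
# Training data for logical AND: ((inputs), target).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [0.0, 0.0]   # weights
b = 0.0          # bias
lr = 0.1         # learning rate (illustrative choice)

def predict(x):
    # Step activation: fire (output 1) if the weighted sum exceeds zero.
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Perceptron learning rule: nudge weights toward each misclassified example.
for _ in range(20):
    for x, target in data:
        error = target - predict(x)
        w = [wi + lr * error * xi for wi, xi in zip(w, x)]
        b += lr * error

print([predict(x) for x, _ in data])  # → [0, 0, 0, 1]
```

Because AND is linearly separable, the rule converges after a handful of passes; it is the same unit that, stacked in layers, underlies the neural networks Hinton and LeCun later advanced.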
In 1997, IBM's Deep Blue beat Garry Kasparov, the reigning world chess champion, and immediately became a worldwide sensation. The ability of a machine to beat the world's top chess player was no ordinary achievement, and the event gave much-needed credibility to machine learning as a formidable contender in the quest for the intelligent machines that Turing envisaged.