Learning from data

There have been many definitions of ML, which all revolve around the automated detection of meaningful patterns in data. Two prominent examples include:

  • AI pioneer Arthur Samuel defined ML in 1959 as a subfield of computer science that gives computers the ability to learn without being explicitly programmed.
  • Tom Mitchell, one of the current leaders in the field, pinned down a well-posed learning problem more specifically in 1998: a computer program learns from experience with respect to a task and a performance measure if its performance at the task, as measured by that performance measure, improves with experience.

Experience is presented to an algorithm in the form of training data. The principal difference from previous attempts at building machines that solve problems is that the rules an algorithm uses to make decisions are learned from the data rather than programmed or hard-coded, as was the case for the expert systems prominent in the 1980s.

The key challenge of automated learning is to identify patterns in the training data that remain meaningful when the model's learning is generalized to new data. There is a vast number of potential patterns that a model could identify, while the training data constitute only a sample of the larger set of phenomena that the algorithm will need to handle when performing the task in the future. The infinite number of functions that could have generated the given outputs from the given inputs makes the search process impossible without restrictions on the eligible set of functions.

The types of patterns that an algorithm is capable of learning are limited by the size of its hypothesis space on the one hand and by the amount of information contained in the sample data on the other. The size of the hypothesis space varies significantly between algorithms. Restricting it is what makes a successful search possible in the first place, but it also implies an inductive bias when the algorithm generalizes from the training sample to new data.

Hence, the key challenge becomes a matter of how to choose a model with a hypothesis space large enough to contain a solution to the learning problem, yet small enough to ensure reliable generalization given the size of the training data. With more, and more informative, data, a model with a larger hypothesis space has a better chance of succeeding.
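The following minimal sketch (an illustration added here, not taken from the text) makes this trade-off concrete: the degree of a polynomial fit serves as a rough proxy for the size of the hypothesis space. With a small, noisy training sample, a very flexible model matches the training data closely but generalizes poorly to held-out data, while an overly restrictive one underfits both.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def true_function(x):
    """Unknown data-generating process the learner tries to approximate."""
    return np.sin(2 * np.pi * x)

# Small, noisy training sample and a larger held-out sample
x_train = rng.uniform(0, 1, size=20)
y_train = true_function(x_train) + rng.normal(scale=0.2, size=x_train.shape)
x_test = rng.uniform(0, 1, size=500)
y_test = true_function(x_test) + rng.normal(scale=0.2, size=x_test.shape)

for degree in (1, 4, 15):
    # Higher degree -> larger hypothesis space (more flexible polynomial)
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f'degree {degree:2d} | train MSE {train_mse:.3f} | test MSE {test_mse:.3f}')
```

Typically, the degree-1 fit underfits (high error on both samples), the degree-15 fit overfits (low training error, high test error), and the intermediate hypothesis space generalizes best for this sample size.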

The no-free-lunch theorem states that there is no universal learning algorithm. Instead, a learner's hypothesis space has to be tailored to a specific task using prior knowledge about the task domain in order for the search for meaningful patterns to succeed. We will pay close attention to the assumptions that a model makes about data relationships for a specific task throughout this chapter, and emphasize the importance of matching these assumptions with empirical evidence gleaned from data exploration. The process required to master the task can be differentiated into supervised, unsupervised, and reinforcement learning.
