Supervised learning

Supervised learning is the most commonly used type of ML. We will dedicate most of the chapters in this book to learning about the various applications of models in this category. The term supervised implies the presence of an outcome variable that guides the learning process—that is, it teaches the algorithm the correct solution to the task that is being learned. Supervised learning aims at generalizing a functional relationship between input and output data that is learned from individual samples and applying it to new data.

The output variable is also, depending on the field, interchangeably called the label, target, outcome, endogenous, or left-hand-side variable. We will use y_i for observations i = 1, ..., N, or y in vector notation. Some tasks are represented by several outcomes and are also called multilabel problems. The input data for a supervised learning problem is also known as features, exogenous, or right-hand-side variables, denoted by x_i for the vector of features of observation i = 1, ..., N, or X in matrix notation.
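
To make this notation concrete, the following sketch (a hypothetical illustration using NumPy and synthetic data, not an example from this book) builds a feature matrix X whose rows are the feature vectors x_i, along with the corresponding outcome vector y:

```python
import numpy as np

N, p = 100, 3                                   # N observations, p features per observation
rng = np.random.default_rng(seed=42)

X = rng.normal(size=(N, p))                     # feature matrix X; row i holds the feature vector x_i
beta = np.array([0.5, -1.0, 2.0])               # hypothetical coefficients used to simulate outcomes
y = X @ beta + rng.normal(scale=0.1, size=N)    # outcome vector y with one entry y_i per observation

print(X.shape, y.shape)                         # (100, 3) (100,)
```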

The solution to a supervised learning problem is a function f̂(x) that represents what the model learned about the input-output relationship from the sample and approximates the true relationship, represented by f(x). This function can be used to infer statistical associations, or potentially even causal relationships, among variables of interest beyond the sample, or it can be used to predict outputs for new input data.
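
As a minimal sketch of this workflow, assuming scikit-learn and the same kind of synthetic data as above, the following fits a linear model to a training sample to obtain an approximation f̂ and then applies it to previously unseen inputs:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 3))                                          # training inputs
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=100)   # training outputs

f_hat = LinearRegression().fit(X, y)    # learn an approximation of the true input-output relationship

print(f_hat.coef_)                      # inspect the learned associations between features and outcome
X_new = rng.normal(size=(5, 3))         # new, previously unseen inputs
print(f_hat.predict(X_new))             # predicted outputs for the new data
```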

Both goals face an important trade-off: more complex models have more moving parts that are capable of representing more nuanced relationships, but they may also be more difficult to inspect. They are also likely to overfit and learn random noise particular to the training sample, as opposed to a systematic signal that represents a general pattern of the input-output relationship. Overly simple models, on the other hand, will miss signals and deliver biased results. This trade-off is known as the bias-variance trade-off in supervised learning, but conceptually this also applies to the other forms of ML where overly complex models may perform poorly beyond the training data.
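
The following sketch illustrates this trade-off, assuming scikit-learn and a synthetic sine-plus-noise dataset of our own choosing: an overly simple (degree-1) polynomial underfits and misses the signal, while an overly complex (degree-15) polynomial fits the random noise in the training sample and typically performs worse on held-out data (exact numbers depend on the random seed):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(seed=1)
x = np.sort(rng.uniform(-3, 3, size=60)).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=60)    # systematic signal plus random noise

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=1)

for degree in (1, 3, 15):                                 # too simple, adequate, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    print(f'degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}')
```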
