Machine learning is the art of creating software programs that learn from data. More formally, it can be defined as the practice of building adaptive programs that use tunable parameters to improve predictive performance. It is a sub-field of artificial intelligence.
We can separate machine learning programs based on the type of problems they are trying to solve. These problems are appropriately called learning problems.
Broadly speaking, these problems fall into two categories: supervised and unsupervised learning problems. There are also hybrid problems that combine aspects of both categories.
The input to a learning problem consists of a dataset of n rows. Each row represents a sample and may involve one or more fields referred to as attributes or features.
A dataset can be canonically described as consisting of n samples, each consisting of m features. A more detailed introduction to machine learning is given in the following paper:
A Few Useful Things to Know about Machine Learning at http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
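As a rough illustration of the n samples by m features layout described above, the following sketch stores a toy dataset as a list of rows. The feature names and values are invented for illustration and do not come from any real dataset.

```python
# A toy dataset: n = 4 samples (rows), each with m = 3 features (columns).
# The feature names and values below are made up for illustration.
feature_names = ["height_cm", "weight_kg", "age_years"]
dataset = [
    [170.0, 65.0, 34],
    [158.5, 52.3, 27],
    [181.2, 80.1, 45],
    [165.0, 58.7, 19],
]

n_samples = len(dataset)       # n: number of rows
n_features = len(dataset[0])   # m: number of columns

# Every sample must carry the same number of features.
assert all(len(row) == n_features for row in dataset)

print(f"{n_samples} samples x {n_features} features")
```

In practice such a table would usually be held in a NumPy array or a pandas DataFrame, but the shape convention (rows are samples, columns are features) stays the same.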
For supervised learning problems, the input is a dataset of labeled data, meaning that each input sample comes with an output whose value is known. The learning program is fed the input samples and their corresponding outputs, and its goal is to decipher the relationship between them. Supervised learning problems include the following:
In unsupervised learning, which overlaps with data mining, the learning program is fed inputs but no corresponding outputs. This input data is referred to as unlabeled data. The learning program's goal is to discover the hidden structure in the data, in effect inferring labels that were never provided. Such problems include the following:
A common use of machine learning techniques is document classification. Both main categories of machine learning, supervised and unsupervised learning, can be applied to this problem.
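To make the two approaches concrete for documents, here is a minimal sketch using only the Python standard library. The documents, the labels, the word-overlap scoring rule, and the similarity threshold are all assumptions chosen for illustration; real systems use far more robust features and algorithms.

```python
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into words."""
    return text.lower().split()

# --- Supervised: every training document comes with a known label. ---
labeled_docs = [
    ("the team won the match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the market fell as stocks dropped", "finance"),
    ("investors sold stocks and bonds", "finance"),
]

def train(docs):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in docs:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

def predict(counts, text):
    """Pick the label whose training words overlap most with the document."""
    words = tokenize(text)
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)

model = train(labeled_docs)
predicted = predict(model, "stocks dropped again")

# --- Unsupervised: the same kind of input, but with no labels at all. ---
unlabeled_docs = [
    "goal scored in the final match",
    "bonds and stocks rallied",
    "match ended with a goal",
    "stocks and bonds slipped",
]

def jaccard(a, b):
    """Similarity of two word lists: shared words over all distinct words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def group(docs, threshold=0.2):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for text in docs:
        words = tokenize(text)
        for g in groups:
            if jaccard(words, tokenize(g[0])) >= threshold:
                g.append(text)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([text])
    return groups

clusters = group(unlabeled_docs)

print(predicted)
print(clusters)
```

The supervised half learns from the labels it is given and assigns one of those labels to a new document. The unsupervised half never sees a label and can only report which documents appear to belong together.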
Machine learning systems use what is known as a classifier to learn from data. A classifier is an interface that takes a matrix of feature values and produces an output vector containing the predicted class of each sample. These feature values may be discrete or continuous. There are three core components of classifiers: