Introduction to machine learning

Machine learning is the art of creating software programs that learn from data. More formally, it can be defined as the practice of building adaptive programs that use tunable parameters to improve predictive performance. It is a sub-field of artificial intelligence.

We can separate machine learning programs based on the type of problems they are trying to solve. These problems are appropriately called learning problems.

Broadly speaking, these problems fall into two categories: supervised and unsupervised learning problems. There are also hybrid problems that combine aspects of both categories.

The input to a learning problem consists of a dataset of n rows. Each row represents a sample and may involve one or more fields referred to as attributes or features.

A dataset can be canonically described as consisting of n samples, each consisting of m features. A more detailed introduction to machine learning is given in the following paper:

A Few Useful Things to Know about Machine Learning at http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
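To make the n-samples-by-m-features layout concrete, here is a minimal sketch in plain Python. It stores a toy dataset as a list of rows, with one row per sample and one column per feature. The feature names and values are invented purely for illustration.

```python
# Hypothetical toy dataset: n = 4 samples (rows), each with m = 3 features (columns).
feature_names = ["height_cm", "weight_kg", "age_years"]
X = [
    [170.0, 65.0, 30],
    [160.0, 55.0, 25],
    [180.0, 80.0, 40],
    [175.0, 70.0, 35],
]

n = len(X)     # number of samples
m = len(X[0])  # number of features per sample
# Every sample must describe the same m features.
assert all(len(row) == m for row in X)

# Column j holds the values of feature j across all n samples.
j = feature_names.index("height_cm")
heights = [row[j] for row in X]

print(f"n = {n} samples, m = {m} features")  # n = 4 samples, m = 3 features
print(heights)                               # [170.0, 160.0, 180.0, 175.0]
```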

Supervised versus unsupervised learning

In supervised learning, the input to a learning problem is a dataset of labeled data: every input sample comes with an output whose value is known. The learning program is fed the input samples together with their corresponding outputs, and its goal is to learn the relationship between them so that it can predict the outputs for new samples. Supervised learning problems include the following:

  • Classification: The learned attribute is categorical (nominal) or discrete
  • Regression: The learned attribute is numeric/continuous
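As a sketch of how the two kinds of supervised problem differ, the following plain-Python example applies k-nearest neighbors to the same hypothetical inputs. k-nearest neighbors is chosen here only for its simplicity and is not a method prescribed by this text. Only the type of the learned attribute changes: a categorical label is predicted by majority vote among the neighbors, while a numeric value is predicted by averaging them.

```python
import math
from collections import Counter

# Hypothetical training data: each sample has m = 2 numeric features.
X_train = [[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]]
# Classification target: one categorical label per sample.
y_class = ["small", "small", "large", "large", "large", "large"]
# Regression target: one continuous value per sample.
y_reg = [2.1, 3.4, 7.2, 12.5, 8.6, 9.4]

def k_nearest(x, k):
    """Return the indices of the k training samples closest to x (Euclidean distance)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
    return order[:k]

def classify(x, k=3):
    # Classification: predict the most common label among the k neighbors.
    labels = [y_class[i] for i in k_nearest(x, k)]
    return Counter(labels).most_common(1)[0][0]

def regress(x, k=3):
    # Regression: predict the mean target value of the k neighbors.
    values = [y_reg[i] for i in k_nearest(x, k)]
    return sum(values) / len(values)

print(classify([1.2, 1.5]))          # small
print(classify([4.0, 5.0]))          # large
print(round(regress([1.2, 1.5]), 2)) # 4.23
print(round(regress([4.0, 5.0]), 2)) # 8.4
```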

In unsupervised learning, or data mining, the learning program is fed inputs but no corresponding outputs. Such input data is referred to as unlabeled data. The learning program's goal is to discover the hidden structure in the data, such as the groups, or hidden labels, that the samples implicitly fall into. Such problems include the following:

  • Clustering
  • Dimensionality reduction
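Clustering can be illustrated with k-means, which is one common clustering algorithm among many. In the minimal sketch below, the program receives only unlabeled points and assigns each one to a group on its own. For reproducibility, we assume the initial centroids are picked by hand. Practical implementations usually choose them randomly or with a seeding heuristic such as k-means++.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) starting from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Unlabeled data: only inputs are given, with no outputs.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
# Assumption: initial centroids chosen by hand (first and fourth points).
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The returned cluster numbers play the role of the "hidden labels": they are discovered from the data rather than supplied with it.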

Illustration using document classification

A common application of machine learning techniques is document classification. Both main categories of machine learning, supervised and unsupervised learning, can be applied to this problem.

Supervised learning

Each document in the input collection is assigned to a category, that is, a label. The learning program/algorithm uses the input collection of documents to learn how to make predictions for another set of documents with no labels. This method is known as classification.
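One simple way to learn such a document classifier is multinomial naive Bayes over word counts. The sketch below is one illustrative approach under stated assumptions, not the only one. Documents are tokenized on whitespace only, the tiny training set and the category names are made up, and add-one (Laplace) smoothing keeps words a category has never seen from zeroing out its score.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase and split on whitespace; no stemming or stop-word removal.
    return text.lower().split()

def train_naive_bayes(docs, labels):
    """Count documents per category and words per category from labeled documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter of words seen with that label
    vocab = set()
    for doc, label in zip(docs, labels):
        words = tokenize(doc)
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, doc):
    """Return the category with the highest log-probability score for doc."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # log P(label) + sum of log P(word | label), with add-one smoothing.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(doc):
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the election results were announced",
    "parliament passed the new budget",
]
train_labels = ["sports", "sports", "politics", "politics"]
model = train_naive_bayes(train_docs, train_labels)

print(predict(model, "a goal in the match"))            # sports
print(predict(model, "the budget vote in parliament"))  # politics
```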

Unsupervised learning

The documents in the input collection are not assigned to categories; hence, they are unlabeled. The learning program takes this as input and tries to cluster or discover groups of related or similar documents. This method is known as clustering.
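A very simple form of document clustering links any two documents whose word sets overlap enough, and then takes the connected groups as clusters. The sketch below uses Jaccard similarity with a hypothetical threshold of 0.2 and a union-find structure. It is a stand-in chosen for brevity. Real systems more often use k-means or hierarchical clustering over weighted word vectors such as TF-IDF.

```python
def jaccard(a, b):
    """Jaccard similarity of two word sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_documents(docs, threshold=0.2):
    """Group unlabeled documents by single-link grouping at a fixed similarity threshold."""
    word_sets = [set(doc.lower().split()) for doc in docs]
    parent = list(range(len(docs)))  # union-find: each document starts in its own group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Merge the groups of every pair of documents that are similar enough.
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(word_sets[i], word_sets[j]) >= threshold:
                parent[find(i)] = find(j)

    # Collect document indices by the root of their group.
    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

docs = [
    "stock market prices fell",
    "market prices rose on stock news",
    "the team won the cup final",
    "cup final won by the home team",
]
print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```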

How machine learning systems learn

Machine learning systems use what is known as a classifier to learn from data. A classifier is an interface that takes a matrix of feature values (one row per sample) and produces an output vector containing the predicted class for each sample. The feature values may be discrete or continuous. Classifiers have three core components:

  • Representation: What type of classifier is it?
  • Evaluation: How good is the classifier?
  • Optimization: How to search among the alternatives?
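The three components can be seen together in a perceptron, a simple linear classifier used here only as an illustration. In the sketch below, the representation is a weight vector plus a bias, the evaluation is accuracy on the training samples, and the optimization is the perceptron update rule. That rule searches among candidate weight vectors by correcting one misclassified sample at a time. The toy data are hypothetical and linearly separable, so training stops after the first pass with no errors.

```python
# Representation: a linear classifier that outputs +1 if w.x + b > 0, else -1.
def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Evaluation: the fraction of samples that the classifier labels correctly.
def accuracy(w, b, X, y):
    return sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)

# Optimization: the perceptron rule nudges w and b toward each misclassified
# sample and stops once a full pass over the data makes no errors.
def train_perceptron(X, y, epochs=20, lr=1.0):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            if predict(w, b, x) != t:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                errors += 1
        if errors == 0:
            break
    return w, b

X = [[2.0, 1.0], [3.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print(w, b)                  # [2.0, 1.0] 1.0
print(accuracy(w, b, X, y))  # 1.0
```

Swapping any one component (for example, a different evaluation measure or a gradient-based optimizer) yields a different learner while the other two stay the same.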