The data to which a ML algorithm is applied is called a training set, which consists of a set of pairs (x
, y
), called training examples. The pairs are explained as follows:
x
: This is a vector of values, often called the feature vector. Each value, or feature, can be categorical (values are taken from a set of discrete values, such as {S, M, L}
) or numerical.y
: This is the label, the classification or regression values for x
.The objective of the ML process is to discover a function that best predicts the value of y
associated with each value of x
. The type of y
is in principle arbitrary, but there are several common and important cases.
y
: This is a real number. The ML problem is called regression.y
: This is a Boolean value true or false, more commonly written as +1 and -1, respectively. In this class, the problem is binary classification.y
: Here this is a member of some finite set. The member of this set can be thought of as classes, and each member represents one class. The problem is multiclass classification.y
: This is a member of some potentially infinite set, for example, a parse tree for x
, which is interpreted as a sentence.Until now, machine learning has not proved successful in situations where we can describe the goals of the mining more directly. Machine learning and data mining are two different topics, although some algorithms are shared between them—algorithms are shared especially when the goal is to extract information. There are situations where machine learning makes sense. The typical one is when we have idea of what we looking for in the dataset.
The major classes of algorithms are listed here. Each is distinguished by the function .
x
that determines which child or children the search must proceed for.
The data aspects of machine learning here means the way data is handled and the way it is used to build the model.
18.116.63.105