The scikit-learn ML/classifier interface

We'll dive into the basic principles of machine learning and demonstrate their use via scikit-learn's basic API.

The scikit-learn library exposes a uniform estimator interface. We illustrate it using a linear regression model. For example, consider the following:

In [3]: from sklearn.linear_model import LinearRegression

The estimator class is instantiated to create a model, in this case a linear regression model:

In [4]: model = LinearRegression(normalize=True)   
In [6]: print model
    LinearRegression(copy_X=True, fit_intercept=True, normalize=True)

Here, we specify normalize=True, indicating that the x values will be normalized before regression. Hyperparameters (estimator parameters) are passed as arguments when the model is created. This is an example of creating a model with tunable parameters.
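Hyperparameters set in the constructor can be inspected afterwards with get_params(). A minimal sketch follows; note that recent scikit-learn releases have removed the normalize option shown above, so this sketch uses fit_intercept instead:

```python
from sklearn.linear_model import LinearRegression

# Hyperparameters are passed to the constructor. The normalize option
# was removed in recent scikit-learn releases, so fit_intercept is
# used here instead.
model = LinearRegression(fit_intercept=True)

# get_params() returns the hyperparameters the model was created with.
params = model.get_params()
print(params["fit_intercept"])  # True
```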

The model parameters are estimated from the data when the estimator is fitted to it. Let us first create some sample training data that is normally distributed about the line y = x/2 + 50. We first generate our x and y values:

In [51]: import random

         sample_size = 500
         x = []
         y = []

         for i in range(sample_size):
             newVal = random.normalvariate(100, 10)
             x.append(newVal)
             y.append(newVal / 2.0 + random.normalvariate(50, 5))

scikit-learn takes a 2D array of shape num_samples × num_features as input, so we convert our x data into a 2D array:

In [67]: import numpy as np

         X = np.array(x)[:,np.newaxis]
         X.shape
Out[67]: (500, 1)
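The np.newaxis trick above is equivalent to reshaping the array; a small sketch, assuming NumPy is installed, shows both forms producing the same (num_samples, num_features) layout:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# Two equivalent ways of turning a 1D array into the
# (num_samples, num_features) shape scikit-learn expects:
X1 = x[:, np.newaxis]
X2 = x.reshape(-1, 1)

print(X1.shape)  # (3, 1)
print(np.array_equal(X1, X2))  # True
```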

In this case, we have 500 samples and 1 feature, x. We now train (fit) the model and display the slope (coefficient) and the intercept of the fitted regression line:

In [71]: model.fit(X,y)
         print "coeff=%s, intercept=%s" % (model.coef_,model.intercept_)
         coeff=[ 0.47071289], intercept=52.7456611783
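The whole fit can be reproduced as a self-contained script. This is a sketch with an added fixed random seed (an assumption, not in the original) so that the run is repeatable; the recovered slope and intercept should land near the generating line y = x/2 + 50:

```python
import random
import numpy as np
from sklearn.linear_model import LinearRegression

random.seed(0)  # fixed seed for reproducibility (not in the original)

sample_size = 500
x, y = [], []
for _ in range(sample_size):
    val = random.normalvariate(100, 10)
    x.append(val)
    y.append(val / 2.0 + random.normalvariate(50, 5))

# Convert to the (num_samples, num_features) shape scikit-learn expects.
X = np.array(x)[:, np.newaxis]

model = LinearRegression()
model.fit(X, y)

# The fitted slope should be close to 0.5 and the intercept close to 50,
# matching the line y = x/2 + 50 the data was generated around.
print(model.coef_[0], model.intercept_)
```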

This can be visualized as follows:

In [65]: import matplotlib.pyplot as plt

         plt.title("Plot of linear regression line and training data")
         plt.xlabel('x')
         plt.ylabel('y')
         plt.scatter(X, y, marker='o', color='green', label='training data')
         plt.plot(X, model.predict(X), color='red', label='regression line')
         plt.legend(loc=2)

Out[65]: [<matplotlib.lines.Line2D at 0x7f11b0752350>]

To summarize the basic use of the estimator interface, follow these steps:

  1. Define your model - LinearRegression, a support vector machine, a decision tree, and so on. You can specify the needed hyperparameters in this step, for example, normalize=True as specified earlier.
  2. Once the model has been defined, train it on your data by calling the fit(..) method on the model defined in the previous step.
  3. Once the model has been fitted, call the predict(..) method on test data in order to make predictions or estimations.
  4. In the case of a supervised learning problem, the predict(X) method takes unlabeled observations X and returns predicted labels y.
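The steps above can be sketched end to end on a toy dataset. The data here (a noise-free line y = 2x) is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 1. Define the model (hyperparameters go in the constructor).
model = LinearRegression(fit_intercept=True)

# 2. Fit the model on training data.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])  # y = 2x, noise-free toy data
model.fit(X_train, y_train)

# 3./4. Predict labels for unseen observations.
X_test = np.array([[5.0]])
y_pred = model.predict(X_test)
print(y_pred)  # ~[10.]
```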

Note

For extra reference, please see the following: http://bit.ly/1FU7mXj and http://bit.ly/1QqFN2V.
