Gradient-boosting regression

There are several algorithms that can be applied in order to predict house prices. In this chapter, we will try gradient-boosting regression (GBR), one of the more advanced techniques for this kind of prediction task. 

Gradient boosting is one of the most powerful techniques for building predictive models. You can read more about the algorithm from the links provided; the short version is that it builds an ensemble stage by stage, fitting each new tree to the residual errors of the trees built so far and shrinking each tree's contribution by the learning rate.
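
Here is a minimal sketch of that idea for squared-error loss, written against scikit-learn's DecisionTreeRegressor. It illustrates the principle only; it is not scikit-learn's actual implementation:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbr_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    # Start from the constant prediction that minimises squared error
    baseline = np.mean(y)
    prediction = np.full(len(y), baseline)
    trees = []
    for _ in range(n_estimators):
        # For squared-error loss, the negative gradient is simply the residual
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Shrink each tree's contribution by the learning rate
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
    return baseline, trees

def gbr_predict(X, baseline, trees, learning_rate=0.1):
    # Sum the shrunken tree contributions on top of the constant baseline
    return baseline + learning_rate * sum(tree.predict(X) for tree in trees)

Let's see how we can employ this algorithm to predict house prices using our dataset: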

  1. Import the library from sklearn
  2. Create a variable for the gradient-boosting regressor and set the following parameters:
    • n_estimators: The number of boosting stages (trees) to perform. 
    • max_depth: The maximum depth of each individual tree. 
    • min_samples_split: The minimum number of samples required to split an internal node. 
    • learning_rate: The shrinkage applied to each tree's contribution; lower values generally require more boosting stages. 
    • loss: The loss function to be optimized; ls refers to least-squares regression. 
  3. Fit the training data to the gradient-boosting model (GBM).
  4. Check the model's score on the test data: 
from sklearn import ensemble

# 400 boosting stages, trees of depth 5, a learning rate of 0.1, least-squares loss
clf = ensemble.GradientBoostingRegressor(n_estimators=400, max_depth=5,
                                         min_samples_split=2,
                                         learning_rate=0.1, loss='ls')

We just imported GradientBoostingRegressor from the library and set its parameters. It is important to note that there is no single correct setting for these parameters: choosing them is largely empirical, a matter of practice and experience (we sketch one systematic way to search for good values after fitting the model below). 

Now, let's create training and test samples from the dataset. To do so, we choose a subset of the data for training and hold out the rest as test data, to see how well our model has learned to predict prices. We can do that with the following snippet:

x_df = data.drop(['id', 'date'], axis=1)

We dropped id and date from the preceding data frame. Now, let's drop the price from the training dataset: 

x_df2 = x_df.drop(['price'], axis=1)

We can use the train_test_split function available from the sklearn library to create training and test datasets: 

from sklearn.model_selection import train_test_split

# Hold out 40% of the rows as a test set; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(
    x_df2, data['price'], test_size=0.4, random_state=4)

In the preceding snippet, it is important to understand that the training and test sets are disjoint subsets drawn from the same original dataset. Now let's fit our training data to the GBM: 

clf.fit(x_train, y_train)

Fitting the training data gives the following result:

GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=5, max_features=None,
             max_leaf_nodes=None, min_impurity_split=1e-07,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, n_estimators=400,
             presort='auto', random_state=None, subsample=1.0, verbose=0,
             warm_start=False)
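
As noted earlier, choosing these parameters is largely empirical. One systematic way to search for good values is cross-validated grid search over the training data. The following is a minimal sketch using scikit-learn's GridSearchCV; the candidate values in the grid are illustrative assumptions, not tuned recommendations (note that recent scikit-learn releases spell the least-squares loss 'squared_error' rather than 'ls'):

from sklearn import ensemble
from sklearn.model_selection import GridSearchCV

# Illustrative candidate values; adjust to your data and compute budget
param_grid = {
    'n_estimators': [200, 400],
    'max_depth': [3, 5],
    'learning_rate': [0.05, 0.1],
}

# Evaluate every combination with 5-fold cross-validation on the training set
search = GridSearchCV(ensemble.GradientBoostingRegressor(loss='ls'),
                      param_grid, cv=5)
search.fit(x_train, y_train)

print(search.best_params_)  # the combination that scored best across the folds

Bear in mind that every combination trains a full ensemble, so it is sensible to keep the grid small to start with.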

Let's check how accurate the predictions are. For a regressor, the score method returns the coefficient of determination (R²) on the test data: 

clf.score(x_test, y_test)

The preceding code gives us the following output:

0.91948383097569342
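
An R² of 0.9195 means the model explains roughly 92 percent of the variance in the held-out prices. As an additional sanity check, it can also be useful to look at the average prediction error in actual price units. A minimal sketch using scikit-learn's mean_absolute_error (the exact value depends on your data, so none is quoted here):

from sklearn.metrics import mean_absolute_error

predictions = clf.predict(x_test)

# Average absolute gap between predicted and actual prices, in price units
print(mean_absolute_error(y_test, predictions))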

An R² above 0.9 is a pretty good result. You can try applying these techniques to various other datasets. In addition to the gradient-boosting algorithm, you can also try linear regression, as sketched below; in our case, gradient boosting gives the better score. 
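
For comparison, here is a minimal sketch of that linear-regression baseline on the same train/test split (again, the resulting score depends on your data):

from sklearn.linear_model import LinearRegression

# Ordinary least-squares baseline trained on the same data
lr = LinearRegression()
lr.fit(x_train, y_train)

# R^2 on the held-out test set, directly comparable to the GBR score above
print(lr.score(x_test, y_test))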
