Regression models (Simple)

Regression is a technique for predicting the value of a numeric class, in contrast to classification, which predicts the value of a nominal class. Given a set of attributes, a regression algorithm builds a model, usually an equation, that is used to compute the predicted class value.

Getting ready

Let's look at an example of a house price regression model, using some real data. These are actual numbers from houses for sale, and we will try to estimate the value of the house we are supposed to sell (the last row, whose price is unknown):

Size (m2)	Land (m2)	Rooms	Granite	Extra bathroom	Price
1076	2801	6	0	0	€324.500,00
990	3067	5	1	1	€466.000,00
1229	3094	5	0	1	€425.900,00
731	4315	4	1	0	€387.120,00
671	2926	4	0	1	€312.100,00
1078	6094	6	1	1	€603.000,00
909	2854	5	0	1	€383.400,00
975	2947	5	1	1	??

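To load the data in Weka, we have to put the table in the ARFF file format and save it as house.arff (the code in this recipe reads it from dataset/house.arff). Make sure all the attributes are declared as numeric and that the unknown price is written as ?, as shown here: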

@RELATION house
@ATTRIBUTE size NUMERIC
@ATTRIBUTE land NUMERIC
@ATTRIBUTE rooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE extra_bathroom NUMERIC
@ATTRIBUTE price NUMERIC

@DATA
1076,2801,6,0,0,324500
990,3067,5,1,1,466000
1229,3094,5,0,1,425900
731,4315,4,1,0,387120
671,2926,4,0,1,312100
1078,6094,6,1,1,603000
909,2854,5,0,1,383400
975,2947,5,1,1,?
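
The prices in the table use a European format (€324.500,00), while the ARFF file needs plain numbers (324500). The following is a minimal sketch of that conversion in plain Java. The class PriceFormat and the method toArffNumber are hypothetical names used for illustration only; they are not part of Weka.

```java
import java.math.BigDecimal;

final class PriceFormat {

  // Hypothetical helper: turns "€324.500,00" into "324500".
  // An unknown price ("??" in the table) becomes the ARFF missing-value marker "?".
  static String toArffNumber(String price) {
    if (price.contains("?")) {
      return "?";
    }
    String cleaned = price
        .replace("\u20AC", "")  // drop the euro sign
        .replace(".", "")       // drop the thousands separators
        .replace(",", ".")      // use a decimal point instead of a decimal comma
        .trim();
    // Remove the trailing ".00" so the value matches the ARFF listing
    return new BigDecimal(cleaned).stripTrailingZeros().toPlainString();
  }

  public static void main(String[] args) {
    System.out.println(toArffNumber("\u20AC324.500,00"));
    System.out.println(toArffNumber("??"));
  }
}
```

Using BigDecimal rather than double keeps the digits exactly as they appear in the table.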

How to do it...

Use the following snippet:

import java.io.BufferedReader;
import java.io.FileReader;

import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;

public class Regression{

  public static void main(String args[]) throws Exception{
    //load data
    Instances data = new Instances(new BufferedReader(new FileReader("dataset/house.arff")));
    data.setClassIndex(data.numAttributes() - 1);
    
    //build model
    LinearRegression model = new LinearRegression();
    model.buildClassifier(data); //the last instance with missing class is not used
    System.out.println(model);
    
    //classify the last instance
    Instance myHouse = data.lastInstance();
    double price = model.classifyInstance(myHouse);
    System.out.println("My house ("+myHouse+"): "+price);
  }
}

Here is the output:

Linear Regression Model

price =

    195.2035 * size +
     38.9694 * land +
  76218.4642 * granite +
  73947.2118 * extra_bathroom +
   2681.136 

My house (975,2947,5,1,1,?): 458013.16703945777

The model estimated the value of our house to be €458,013.17.
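
To see where this number comes from, the printed equation can be evaluated by hand. The sketch below plugs the attributes of our house into the coefficients from the output above. It is plain Java with no Weka dependency, and the class and method names are made up for illustration. Note that rooms does not appear in the printed equation, so it plays no part in the calculation.

```java
final class HousePriceEquation {

  // Coefficients copied from the printed model (rounded to four decimals by Weka's output)
  static double predict(double size, double land, double granite, double extraBathroom) {
    return 195.2035 * size
        + 38.9694 * land
        + 76218.4642 * granite
        + 73947.2118 * extraBathroom
        + 2681.136;
  }

  public static void main(String[] args) {
    // Our house: size 975, land 2947, granite 1, extra bathroom 1
    double price = predict(975, 2947, 1, 1);
    System.out.println("Price from the printed equation: " + price);
  }
}
```

The result lands within about one euro of the value reported by classifyInstance; the small gap is presumably because the printed coefficients are rounded.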

How it works...

Import the basic regression model class, weka.classifiers.functions.LinearRegression, together with the classes needed to read the data:

import java.io.BufferedReader;
import java.io.FileReader;

import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;

Load the house dataset:

Instances data = new Instances(new BufferedReader(new FileReader("dataset/house.arff")));
data.setClassIndex(data.numAttributes() - 1);

Initialize and build a regression model. Note that the last instance is not used for building the model, since its class value is missing:

LinearRegression model = new LinearRegression();
model.buildClassifier(data);

Output the model:

System.out.println(model);

Use the model to predict the price of the last instance in the dataset:

Instance myHouse = data.lastInstance();
double price = model.classifyInstance(myHouse);
System.out.println("My house ("+myHouse+"): "+price);

There's more...

This section lists some additional algorithms.

Other regression algorithms

There is a wide variety of implemented regression algorithms one can use in Weka:

weka.classifiers.rules.ZeroR: The class for building and using a 0-R classifier. It predicts the mean (for a numeric class) or the mode (for a nominal class) and serves as a baseline; that is, if your model performs worse than this average-value predictor, it is not worth considering.
weka.classifiers.trees.REPTree: The fast decision tree learner. Builds a decision/regression tree using information gain/variance and prunes it using reduced-error pruning (with backfitting). It only sorts values for numeric attributes once. Missing values are dealt with by splitting the corresponding instances into pieces (that is, as in C4.5).
weka.classifiers.functions.SMOreg: SMOreg implements the support vector machine for regression. The parameters can be learned using various algorithms. The algorithm is selected by setting the RegOptimizer. The most popular algorithm (RegSMOImproved) is due to Shevade, Keerthi, and others, and this is the default RegOptimizer.
weka.classifiers.functions.MultilayerPerceptron: A classifier that uses backpropagation to classify instances. This network can be built by hand, or created by an algorithm, or both. The network can also be monitored and modified during training time. The nodes in this network are all sigmoid (except for when the class is numeric in which case the output nodes become unthresholded linear units).
weka.classifiers.functions.GaussianProcesses: Implements Gaussian Processes for regression without hyperparameter-tuning.
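
As a rough illustration of the ZeroR baseline described above, the following sketch computes the mean of the seven known prices from the house dataset, which is the value a mean predictor would return for any house. It is plain Java rather than a call to weka.classifiers.rules.ZeroR, and the class and method names are hypothetical.

```java
final class MeanBaseline {

  // Mean of the known class values: the prediction of a mean-only baseline
  static double mean(double[] values) {
    double sum = 0.0;
    for (double v : values) {
      sum += v;
    }
    return sum / values.length;
  }

  public static void main(String[] args) {
    // Known prices from house.arff (the instance with the missing price is left out)
    double[] prices = {324500, 466000, 425900, 387120, 312100, 603000, 383400};
    System.out.println("Baseline prediction: " + mean(prices));
  }
}
```

A regression model is only worth keeping if its error on held-out data is lower than the error of this constant prediction.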
