8.9. Establishing Relationships Between Variables

Problem

You need to determine whether two variables are related. These variables could be temperature and energy use, or the number of news channels and stress-related ailments; in either case, you need to measure the correlation between the two.

Solution

Add data points to an instance of Commons Math SimpleRegression. This class calculates the intercept, the slope, the slope confidence interval, and a measure of relatedness known as R-square. The SimpleRegression class performs a least squares regression with one independent variable; adding data points to this model refines the parameters of the equation y = ax + b, where a is the slope and b is the intercept. The following code uses SimpleRegression to find a relationship between two series of values, [0, 1, 2, 3, 4, 5] and [0, 1.2, 2.6, 3.2, 4, 5]:

import org.apache.commons.math.stat.multivariate.SimpleRegression;

SimpleRegression sr = new SimpleRegression( );

// Add data points         
sr.addData( 0, 0 );
sr.addData( 1, 1.2 );
sr.addData( 2, 2.6 );
sr.addData( 3, 3.2 );
sr.addData( 4, 4 );
sr.addData( 5, 5 );

// Print the intercept: the value of y where the line crosses the y axis
System.out.println( "Intercept: " + sr.getIntercept( ) );

// Print the number of data points
System.out.println( "N: " + sr.getN( ) );

// Print the slope and the slope confidence interval
System.out.println( "Slope: " + sr.getSlope( ) );
System.out.println( "Slope Confidence: " + sr.getSlopeConfidenceInterval( ) );

// Print R-square, a measure of relatedness
System.out.println( "RSquare: " + sr.getRSquare( ) );

This example passes six data points to SimpleRegression and prints the intercept, number of data points, slope, slope confidence interval, and R-square:

Intercept: 0.238
N: 6
Slope: 0.971
Slope Confidence: 0.169
RSquare: 0.985

Discussion

R-square is the square of Pearson's product-moment correlation coefficient (R), which can be obtained by calling getR( ) on SimpleRegression. R-square measures how closely two series of numbers follow a straight line. The parameters to the addData( ) method of SimpleRegression are a corresponding x and y value from the two sets of data. If R is 1.0, y increases linearly as x increases, and every data point lies exactly on the fitted line. In the previous example, R-square is 0.985, and this demonstrates that the (x,y) data points added to SimpleRegression have a strong linear relationship.
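To make these quantities concrete, the following sketch computes the slope, intercept, R, and R-square by hand using the standard least squares formulas. It is an illustration in plain Java, not the Commons Math implementation; the class LeastSquaresSketch and its fit( ) method are hypothetical names introduced only for this example:

```java
import java.util.Locale;

// Hypothetical helper, not part of Commons Math: least squares computed by hand.
final class LeastSquaresSketch {

    // Returns { slope, intercept, r } for the fitted line y = slope * x + intercept.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }

        // Sums of squared deviations and cross-products around the means
        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        double r = sxy / Math.sqrt(sxx * syy); // Pearson's R
        return new double[] { slope, intercept, r };
    }

    public static void main(String[] args) {
        // Same data points as the SimpleRegression example in the Solution
        double[] x = { 0, 1, 2, 3, 4, 5 };
        double[] y = { 0, 1.2, 2.6, 3.2, 4, 5 };

        double[] result = fit(x, y);
        System.out.println(String.format(Locale.ROOT, "Intercept: %.3f", result[1]));
        System.out.println(String.format(Locale.ROOT, "Slope: %.3f", result[0]));
        System.out.println(String.format(Locale.ROOT, "R: %.3f", result[2]));
        System.out.println(String.format(Locale.ROOT, "RSquare: %.3f", result[2] * result[2]));
    }
}
```

For the six data points from the Solution, this sketch arrives at an intercept of 0.238, a slope of 0.971, and an R-square of 0.985, the same values SimpleRegression reported.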

If R is -1.0, y decreases linearly as x increases; R-square, being a square, is never negative and is 1.0 in that case as well. An R-square of 0.0 shows that there is no linear relationship between x and y. The following example demonstrates two series of numbers with no relationship:

import org.apache.commons.math.stat.multivariate.SimpleRegression;

SimpleRegression sr = new SimpleRegression( );
sr.addData( 400, 100 );
sr.addData( 300, 105 );
sr.addData( 350, 70 );
sr.addData( 200, 50 );
sr.addData( 150, 300 );
sr.addData( 50, 500 );

// Print R-square, a measure of relatedness
System.out.println( "RSquare: " + sr.getRSquare( ) );

The data points added to this SimpleRegression are all over the map; x and y are unrelated, and the R-square value for this set of data points is very close to zero:

Intercept: 77.736
N: 12
Slope: 0.142
Slope Confidence: 0.699
RSquare: 0.02

The (x,y) data points supplied to the previous example have no linear correlation. This doesn't prove that there is no relationship between x and y; it shows only that a straight line does not describe whatever relationship exists.
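A small sketch shows why a low correlation does not rule out a relationship. Here y is completely determined by x, yet R is zero because the dependence is a parabola rather than a line. For contrast, a falling straight line gives an R of -1. The class CorrelationSketch and its pearsonR( ) method are hypothetical names used only for this illustration, and the code is plain Java rather than Commons Math:

```java
// Hypothetical helper, not part of Commons Math: Pearson's R computed by hand.
final class CorrelationSketch {

    static double pearsonR(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = { -2, -1, 0, 1, 2 };
        double[] parabola = new double[x.length];
        double[] fallingLine = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            parabola[i] = x[i] * x[i];       // y = x^2: related to x, but not linearly
            fallingLine[i] = 10 - 3 * x[i];  // y = 10 - 3x: perfectly linear, decreasing
        }

        System.out.println("R for y = x^2: " + pearsonR(x, parabola));
        System.out.println("R for y = 10 - 3x: " + pearsonR(x, fallingLine));
    }
}
```

The parabola yields an R of zero even though y depends entirely on x, while the falling line yields an R of -1 and therefore an R-square of 1. When R-square is low, plotting the data is a reasonable next step before concluding that the two variables are unrelated.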

See Also

For more information about least squares, the technique used by SimpleRegression, see Wikipedia (http://en.wikipedia.org/wiki/Least_squares). More information about R and R-square can also be found on Wikipedia (http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient).
