You need to measure how strongly two variables are related. The variables could be temperature and energy use, or the number of news channels and stress-related ailments; in either case, you want to quantify the correlation between them.
Add data points to an instance of Commons Math SimpleRegression. This class will calculate the slope, slope confidence, and a measure of relatedness known as R-square. The SimpleRegression class performs a least squares regression with one independent variable; adding data points to this model refines the parameters of the equation y = ax + b. The following code uses SimpleRegression to find a relationship between two series of values, [0, 1, 2, 3, 4, 5] and [0, 1.2, 2.6, 3.2, 4, 5]:
    import org.apache.commons.math.stat.multivariate.SimpleRegression;

    SimpleRegression sr = new SimpleRegression( );

    // Add data points as (x, y) pairs
    sr.addData( 0, 0 );
    sr.addData( 1, 1.2 );
    sr.addData( 2, 2.6 );
    sr.addData( 3, 3.2 );
    sr.addData( 4, 4 );
    sr.addData( 5, 5 );

    // Print the value of y where the line intersects the y axis
    System.out.println( "Intercept: " + sr.getIntercept( ) );

    // Print the number of data points
    System.out.println( "N: " + sr.getN( ) );

    // Print the slope and the slope confidence
    System.out.println( "Slope: " + sr.getSlope( ) );
    System.out.println( "Slope Confidence: " + sr.getSlopeConfidenceInterval( ) );

    // Print R-square, a measure of relatedness
    System.out.println( "RSquare: " + sr.getRSquare( ) );
This example passes six data points to SimpleRegression and prints the intercept, number of data points, slope, slope confidence, and R-square:
    Intercept: 0.238
    N: 6
    Slope: 0.971
    Slope Confidence: 0.169
    RSquare: 0.985
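To make these numbers less of a black box, the following is a minimal sketch of the same fit in plain Java, without Commons Math. The class name LeastSquaresSketch and its fit() helper are hypothetical names chosen for illustration; they are not part of the library. The sketch applies the textbook least squares formulas rather than reproducing how SimpleRegression is implemented, and it does not attempt the slope confidence interval.

```java
// Hypothetical helper, not part of Commons Math: a textbook least squares fit.
class LeastSquaresSketch {

    /** Fits y = a*x + b and returns { a (slope), b (intercept), R-square }. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;

        // Mean of each series
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Sums of squared deviations and of cross products
        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        double rSquare = (sxy * sxy) / (sxx * syy);
        return new double[] { slope, intercept, rSquare };
    }

    public static void main(String[] args) {
        // The same six (x, y) points passed to addData( ) in the example
        double[] x = { 0, 1, 2, 3, 4, 5 };
        double[] y = { 0, 1.2, 2.6, 3.2, 4, 5 };

        double[] result = fit(x, y);
        System.out.println("Intercept: " + result[1]);
        System.out.println("Slope: " + result[0]);
        System.out.println("RSquare: " + result[2]);
    }
}
```

Rounded to three decimal places, the sketch gives a slope of 0.971, an intercept of 0.238, and an R-square of 0.985 for these six points, matching the values printed by SimpleRegression.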
R-square is the square of Pearson’s product-moment correlation coefficient, R, which can be obtained by calling getR( ) on SimpleRegression.
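For reference, the quantities involved can be written out using the standard textbook definitions for n data points (x_i, y_i) with means x̄ and ȳ. The symbols S_xx, S_yy, and S_xy are notation introduced here for convenience; they are not names used by Commons Math. The letters a and b are the slope and intercept of y = ax + b.

```latex
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
a &= \frac{S_{xy}}{S_{xx}}, \qquad b = \bar{y} - a\,\bar{x} \\
R &= \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad
R^2 = \frac{S_{xy}^2}{S_{xx}\,S_{yy}}
\end{aligned}
```

From these definitions, R lies between -1 and 1 and carries the sign of the slope, while R-square lies between 0 and 1.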
R-square measures how well a straight line describes the relationship between two series of numbers. The parameters to the addData() method of SimpleRegression are corresponding x and y values from the two sets of data. If R-square is 1.0, every data point lies exactly on a line: y changes linearly as x increases. In the previous example, R-square is 0.985, and this demonstrates that the (x,y) data points added to SimpleRegression have a strong linear relationship.
R-square itself is never negative; the direction of the relationship shows up in the sign of R. If R is -1.0, y decreases linearly as x increases. An R-square of 0.0 shows that the relationship between x and y is not linear. The following example demonstrates two series of numbers with no linear relationship:
    import org.apache.commons.math.stat.multivariate.SimpleRegression;

    SimpleRegression sr = new SimpleRegression( );

    sr.addData( 400, 100 );
    sr.addData( 300, 105 );
    sr.addData( 350, 70 );
    sr.addData( 200, 50 );
    sr.addData( 150, 300 );
    sr.addData( 50, 500 );

    // Print R-square, a measure of relatedness
    System.out.println( "RSquare: " + sr.getRSquare( ) );
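The data points added to this SimpleRegression are all over the map. The output below reports N: 12, so it reflects more data points than the six shown in the listing; for those points, x and y show no linear relationship, and the R-square value is very close to zero: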
    Intercept: 77.736
    N: 12
    Slope: 0.142
    Slope Confidence: 0.699
    RSquare: 0.02
The (x,y) data points supplied to the previous example have no linear correlation. This doesn’t prove that there is no relationship between x and y, but it does show that the relationship is not linear.
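The difference between "no linear correlation" and "no relationship" can be illustrated with a small plain-Java sketch. The class CorrelationSketch and its pearsonR() helper are hypothetical names used only for this illustration, and the data sets are made up for the purpose: the first follows y = x * x exactly, so y is completely determined by x and yet R is zero; the second is a descending straight line, for which R is -1.0 while R-square is 1.0.

```java
// Hypothetical helper, not part of Commons Math: Pearson's R from its textbook definition.
class CorrelationSketch {

    /** Returns Pearson's product-moment correlation coefficient for two series. */
    static double pearsonR(double[] x, double[] y) {
        int n = x.length;

        // Mean of each series
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Sums of squared deviations and of cross products
        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // y = x * x: a perfect relationship, but not a linear one
        double[] px = { -2, -1, 0, 1, 2 };
        double[] py = { 4, 1, 0, 1, 4 };
        double rParabola = pearsonR(px, py);
        System.out.println("Parabola R: " + rParabola
            + ", RSquare: " + rParabola * rParabola);

        // y = 6 - 2x: a perfect descending line
        double[] lx = { 0, 1, 2, 3 };
        double[] ly = { 6, 4, 2, 0 };
        double rLine = pearsonR(lx, ly);
        System.out.println("Descending line R: " + rLine
            + ", RSquare: " + rLine * rLine);
    }
}
```

A low R-square is therefore a reason to plot the data or try a different model, not a reason to conclude that the two variables have nothing to do with each other.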
For more information about least squares, the technique used by SimpleRegression, see Wikipedia (http://en.wikipedia.org/wiki/Least_squares). More information about R and R-square can also be found on Wikipedia (http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient).