Customizing the line type

So far, we have just plotted lines using the default settings. Breeze lets us customize how lines are drawn, at least to some extent.

For this example, we will use the height-weight data discussed in Chapter 2, Manipulating Data with Breeze. We will use the Scala shell here for demonstration, but you will find a program in BreezeDemo.scala that follows the example shell session.

The code examples for this chapter come with a module for loading the data, HWData.scala, that loads the data from the CSVs:

scala> val data = HWData.load
data: HWData = HWData [ 181 rows ]

scala> data.heights
breeze.linalg.DenseVector[Double] = DenseVector(182.0, ...

scala> data.weights
breeze.linalg.DenseVector[Double] = DenseVector(77.0, 58.0...

Let's create a scatter plot of the heights against the weights:

scala> val fig = Figure("height vs. weight")
fig: breeze.plot.Figure = breeze.plot.Figure@743f2558

scala> val plt = fig.subplot(0)
plt: breeze.plot.Plot = breeze.plot.Plot@501ea274

scala> plt += plot(data.heights, data.weights, '+', colorcode="black")
breeze.plot.Plot = breeze.plot.Plot@501ea274

This produces a scatter plot of the height-weight data:

[Figure: scatter plot of weight against height, drawn with black '+' markers]

Note that we passed a third argument to the plot method, '+'. This controls the plotting style. As of this writing, three styles are available: '-' (the default), '+', and '.'. Experiment with these to see what they do. We also passed colorcode="black" to control the color of the line or markers. The color code is either a color name or an RGB triple, written as a string. Thus, to plot red points, we could have passed colorcode="[255,0,0]".
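Since the RGB form of the color code is just a string, it can be convenient to build it from integer channel values. The following is a minimal sketch in plain Scala; rgbCode is a hypothetical helper introduced here for illustration and is not part of Breeze:

```scala
// rgbCode is a hypothetical helper, not part of Breeze.
// It builds an RGB triple string of the form described above, e.g. "[255,0,0]".
def rgbCode(r: Int, g: Int, b: Int): String = {
  require(Seq(r, g, b).forall(c => c >= 0 && c <= 255), "each channel must be between 0 and 255")
  s"[$r,$g,$b]"
}

val red = rgbCode(255, 0, 0)
```

The resulting string could then be passed as the colorcode argument, for instance plot(data.heights, data.weights, '.', colorcode=red).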

Looking at the height-weight plot, there is clearly a trend between height and weight. Let's try to fit a straight line through the data points. We will fit the following function:

weight = a + b * height

Note

Scientific literature suggests that it would be better to fit something more like weight ∝ height^2. You should find it straightforward to fit a quadratic line to the data, should you wish to.

We will use Breeze's least squares function to find the values of a and b. The leastSquares method expects an input matrix of features and a target vector, just like the LogisticRegression class that we defined in the previous chapter. Recall that in Chapter 2, Manipulating Data with Breeze, when we prepared the training set for logistic regression classification, we introduced a dummy feature that was one for every participant to provide the degree of freedom for the y intercept. We will use the same approach here. Our feature matrix, therefore, contains two columns—one that is 1 everywhere and one for the height:

scala> val features = DenseMatrix.horzcat(
  DenseMatrix.ones[Double](data.npoints, 1),
  data.heights.toDenseMatrix.t
)
features: breeze.linalg.DenseMatrix[Double] =
1.0  182.0
1.0  161.0
1.0  161.0
1.0  177.0
1.0  157.0
...

scala> import breeze.stats.regression._
import breeze.stats.regression._

scala> val leastSquaresResult = leastSquares(features, data.weights)
leastSquaresResult: breeze.stats.regression.LeastSquaresRegressionResult = <function1>

The leastSquares method returns an instance of LeastSquaresRegressionResult, whose coefficients attribute holds the coefficients that best fit the data:

scala> leastSquaresResult.coefficients
breeze.linalg.DenseVector[Double] = DenseVector(-131.042322, 1.1521875)
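For a single feature plus an intercept, the least squares solution has a well-known closed form, which can serve as a cross-check on the coefficients. The following is a minimal sketch in plain Scala that does not use Breeze; fitLine is a hypothetical helper, and the sample points are made up so that they lie exactly on y = 1 + 2x:

```scala
// fitLine is a hypothetical helper, not part of Breeze.
// It returns (intercept, slope) minimising the squared error of y = a + b * x.
def fitLine(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
  require(xs.length == ys.length && xs.length >= 2, "need at least two paired points")
  val meanX = xs.sum / xs.length
  val meanY = ys.sum / ys.length
  // Sum of cross-deviations and sum of squared deviations of x
  val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
  val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
  require(sxx > 0.0, "x values must not all be identical")
  val slope = sxy / sxx
  val intercept = meanY - slope * meanX
  (intercept, slope)
}

// Made-up points lying exactly on y = 1 + 2x
val (a0, b0) = fitLine(Seq(0.0, 1.0, 2.0, 3.0), Seq(1.0, 3.0, 5.0, 7.0))
```

The same helper could be used to fit a line in the squared height by passing heights.map(h => h * h) as xs, since that model is still linear in its coefficients.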

The best-fit line is therefore:

weight = -131.0423 + 1.1522 * height

Let's extract the coefficients. An elegant way of doing this is to use Scala's pattern matching capabilities:

scala> val Array(a, b) = leastSquaresResult.coefficients.toArray
a: Double = -131.04232269750622
b: Double = 1.1521875435418725

By writing val Array(a, b) = ..., we are telling Scala that the right-hand side of the expression is a two-element array and to bind the first element of that array to the value a and the second to the value b. See Appendix, Pattern Matching and Extractors, for a discussion of pattern matching.
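One caveat is that the pattern only matches when the array has exactly two elements, and a mismatch is only detected at run time. The following sketch, which assumes Scala 2 semantics and uses made-up values in place of the real coefficients, illustrates both cases:

```scala
import scala.util.Try

// Made-up values standing in for leastSquaresResult.coefficients.toArray
val coefficients = Array(-131.04, 1.15)

// Succeeds because the array has exactly two elements
val Array(intercept, slope) = coefficients

// With three elements the pattern does not match, and a scala.MatchError is thrown
val mismatch = Try {
  val Array(x, y) = Array(1.0, 2.0, 3.0)
  (x, y)
}
```

Wrapping the match in Try, as here, is one way of handling an unexpected number of coefficients without crashing the session.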

We can now add the best-fit line to our graph. We start by generating evenly-spaced dummy height values:

scala> val dummyHeights = linspace(min(data.heights), max(data.heights), 200)
dummyHeights: breeze.linalg.DenseVector[Double] = DenseVector(148.0, ...

scala> val fittedWeights = a :+ (b :* dummyHeights)
fittedWeights: breeze.linalg.DenseVector[Double] = DenseVector(39.4814...

scala> plt += plot(dummyHeights, fittedWeights, colorcode="red")
breeze.plot.Plot = breeze.plot.Plot@501ea274
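To make the arithmetic behind the fitted line explicit, here is a plain-Scala sketch that does not use Breeze. evenlySpaced is a hypothetical stand-in for linspace, assuming that both end points are included; the coefficients and the minimum height of 148.0 are taken from the session output, while the upper bound of 200.0 is made up:

```scala
// Coefficients as reported by the shell session
val a = -131.04232269750622
val b = 1.1521875435418725

// evenlySpaced is a hypothetical stand-in for Breeze's linspace:
// n evenly spaced values from start to end, both included.
def evenlySpaced(start: Double, end: Double, n: Int): Vector[Double] = {
  require(n >= 2, "need at least two points")
  val step = (end - start) / (n - 1)
  Vector.tabulate(n)(i => start + i * step)
}

// 148.0 is the minimum height shown in the session; 200.0 is a made-up upper bound
val heights = evenlySpaced(148.0, 200.0, 200)

// Evaluate weight = a + b * height at every dummy height
val fitted = heights.map(h => a + b * h)
```

The first fitted value agrees with the 39.4814 shown in the session output for a height of 148.0.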

Let's also add the equation for the best-fit line to the graph as an annotation. We will first generate the label:

scala> val label = f"weight = $a%.4f + $b%.4f * height"
label: String = weight = -131.0423 + 1.1522 * height
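The f interpolator formats numbers through java.lang.String formatting, so the decimal separator may depend on the JVM's default locale, and a negative slope would be rendered as "+ -...". The following sketch shows one way of making the label independent of both; lineLabel is a hypothetical helper, not part of Breeze or JFreeChart:

```scala
import java.util.Locale

// lineLabel is a hypothetical helper, not part of Breeze or JFreeChart.
// It formats the best-fit line with four decimal places, a fixed locale,
// and an explicit sign in front of the slope term.
def lineLabel(a: Double, b: Double): String = {
  val sign = if (b < 0) "-" else "+"
  "weight = %.4f %s %.4f * height".formatLocal(Locale.ROOT, a, sign, math.abs(b))
}

val safeLabel = lineLabel(-131.04232269750622, 1.1521875435418725)
```

For the coefficients found here, the result is the same string as the label generated with the f interpolator in an English locale.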

To add an annotation, we must access the underlying JFreeChart plot:

scala> import org.jfree.chart.annotations.XYTextAnnotation
import org.jfree.chart.annotations.XYTextAnnotation

scala> plt.plot.addAnnotation(new XYTextAnnotation(label, 175.0, 105.0))

The XYTextAnnotation constructor takes three parameters: the annotation string and a pair of (x, y) coordinates defining the centre of the annotation on the graph. The coordinates of the annotation are expressed in the coordinate system of the data. Thus, calling new XYTextAnnotation(label, 175.0, 105.0) generates an annotation whose centroid is at the point corresponding to a height of 175 cm and weight of 105 kg:

[Figure: the height-weight scatter plot with the red best-fit line and the equation annotation]