If you haven't done so already, navigate to the repository at https://github.com/joshnewnham/MachineLearningWithCoreML and download the latest code. Once downloaded, navigate to the directory Chapter2/Start/ and open the playground LinearRegression.playground.
We will be creating a model that will predict the total payments for all claims (y) given the number of claims (x); the dataset we will be working with is auto insurance claims in Sweden. It consists of 2 columns and 64 rows, the first column containing the number of claims, and the second containing the total payments for all claims. Here is an extract from the dataset:
| Number of claims | Total payments for all claims in thousands of Swedish Kronor |
| --- | --- |
| 108 | 329.5 |
| 19 | 46.2 |
| 13 | 15.7 |
| 124 | 422.2 |
| ... | ... |
In the playground script, you will see that we create a view of type ScatterPlotView and assign it to the playground's live view. We will use this view to visualize the data and the predictions from our model:
let view = ScatterPlotView(frame: CGRect(x: 20, y: 20, width: 300, height: 300))
PlaygroundPage.current.liveView = view
By using this view, we can plot an array of data points using the view.scatter(dataPoints:) method and draw a line using the view.line(pointA:pointB:) method. Let's load the raw data and visualize it:
let csvData = parseCSV(contents:loadCSV(file:"SwedishAutoInsurance"))
let dataPoints = extractDataPoints(data: csvData, xKey: "claims", yKey: "payments")
view.scatter(dataPoints)
In the previous code snippet, we first load the data into the csvData variable and then transform it into an array of DataPoint (a strongly typed data object that our view expects). Once loaded, we pass our data to the view via the scatter method, which renders the following output:
Each dot represents a single data point plotted against the number of claims (x axis) and the total payments for all claims (y axis). From this visualization, we can infer a roughly linear relationship between the two; that is, as the number of claims increases, so do the total payments. Using this intuition, we will attempt to fit a linear model, one that, when given the number of claims, is able to predict the total payments for all claims. What we are describing here is a type of algorithm known as simple linear regression; in essence, this is just finding the straight line that best fits our data. It can be described with the function y = w * x + b, where y is the total payments for all claims, x is the number of claims, w is the slope (weight) relating y to x, and b is the intercept (bias).
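Before writing any training code, the model itself is worth seeing in isolation. The following is a minimal sketch (the function name is hypothetical, and plain Double is used rather than the playground's CGFloat):

```swift
// Simple linear model: predict total payments (y) from number of claims (x).
// w is the slope (weight) and b is the intercept (bias) we will learn.
func predict(x: Double, w: Double, b: Double) -> Double {
    return w * x + b
}

// For example, with w = 3.4 and b = 0.7, 108 claims predicts
// roughly 368 (thousand Kronor) in total payments.
let payment = predict(x: 108, w: 3.4, b: 0.7)
```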
Simple enough! Our next problem is finding this line that best fits our data. For this, we are going to use an approach called gradient descent; there are plenty of books that go into the theoretical and technical details of gradient descent, so here we will just present some intuition behind it and leave it to you, the curious reader, to study the details.
You can think of gradient descent as a search for some minimum point; what determines this minimum point is something called a loss function. For us, it will be the error between our prediction and the actual total payments for all claims. The search is steered by calculating the relative contribution of each of our variables (here, w and b) to that error. Let's see how this looks in code by working through the train method:
func train(
    x: [CGFloat],
    y: [CGFloat],
    b: CGFloat = 0.0,
    w: CGFloat = 0.0,
    learningRate: CGFloat = 0.00001,
    epochs: Int = 100,
    trainingCallback: ((Int, Int, CGFloat, CGFloat) -> Void)? = nil) -> (b: CGFloat, w: CGFloat) {

    var B = b // bias (intercept)
    var W = w // weight (slope)
    let N = CGFloat(x.count) // number of data points

    for epoch in 0...epochs {
        // TODO: create variable to store this epoch's gradient for b and w

        for i in 0..<x.count {
            // TODO: make a prediction (using the linear equation y = b + x * w)
            // TODO: calculate the error (actual value - prediction)
            // TODO: calculate the gradient with respect to the error and b; adding it to the epoch's bias gradient
            // TODO: calculate the gradient with respect to the error and w; adding it to the epoch's weight gradient
        }

        // TODO: update the bias (B) using the learningRate
        // TODO: update the weight (W) using the learningRate

        if let trainingCallback = trainingCallback {
            trainingCallback(epoch, epochs, W, B)
        }
    }

    return (b: B, w: W)
}
Our train method takes in these arguments:
- x: An array of CGFloat values containing the number of claims
- y: An array of CGFloat values containing the total payments for all claims
- b: The initial value of the intercept used to start our search (defaulting to 0.0)
- w: The initial value of the weight used to start our search (defaulting to 0.0)
- learningRate: How much we adjust the coefficients on each update
- epochs: The number of times we iterate over the dataset, that is, make predictions and adjust our coefficients based on the difference between the predictions and expected values
- trainingCallback: This function is called after each epoch to report the progress
We next create some variables that will be used throughout training and begin our search (for epoch in 0...epochs). Let's step through each TODO and replace them with their respective code.
First, we start by creating two variables to hold the gradients for our variables b and w (these are the accumulated adjustments we need to make to their respective coefficients to minimize the loss):
// TODO: create variable to store this epoch's gradient for b and w
var bGradient : CGFloat = 0.0
var wGradient : CGFloat = 0.0
Next, we iterate over each data point, and for each data point, make a prediction and calculate the error:
// TODO: make a prediction (using the linear equation y = b + x * w)
let yHat = W * x[i] + B
// TODO: calculate the error (actual value - prediction)
let error = y[i] - yHat
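To make the error concrete: on the first data point, with W and B still at their initial values of zero, the prediction is 0 and the error is simply the actual value (a stand-alone numeric check, not playground code):

```swift
let W = 0.0, B = 0.0
let yHat = W * 108.0 + B   // prediction for 108 claims is 0.0
let error = 329.5 - yHat   // actual payments minus prediction
// With zero coefficients, the error is the full 329.5
```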
Now calculate the partial derivatives of the loss with respect to b and w. Think of these as a way to steer the search in the right direction; calculating them gives us the direction and magnitude of the change we need to make to b and w to minimize our error:
// TODO: calculate the gradient with respect to the error and b; adding it to the epoch's bias gradient
bGradient += -(2.0/N) * error
// TODO: calculate the gradient with respect to the error and w; adding it to the epoch's weight gradient
wGradient += -(2.0/N) * error * x[i]
After iterating over each data point, we adjust the coefficients B and W using their accumulated gradients, scaled by the learning rate:
// TODO: update the bias (B) using the learningRate
B = B - (learningRate * bGradient)
// TODO: update the weight (W) using the learningRate
W = W - (learningRate * wGradient)
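Pieced together, the whole loop can be exercised on the four sample rows from the earlier extract. This is a self-contained sketch rather than the playground code: it uses Double in place of CGFloat, runs 1,000 epochs, and its coefficients will differ from those fitted on the full dataset:

```swift
// Gradient descent on the four sample rows from the table above.
let xs: [Double] = [108, 19, 13, 124]
let ys: [Double] = [329.5, 46.2, 15.7, 422.2]

var B = 0.0                 // bias (intercept)
var W = 0.0                 // weight (slope)
let learningRate = 0.00001
let N = Double(xs.count)

for _ in 0..<1000 {
    var bGradient = 0.0
    var wGradient = 0.0
    for i in 0..<xs.count {
        let yHat = W * xs[i] + B                 // prediction
        let error = ys[i] - yHat                 // actual - prediction
        bGradient += -(2.0 / N) * error          // d(loss)/dB
        wGradient += -(2.0 / N) * error * xs[i]  // d(loss)/dW
    }
    B -= learningRate * bGradient
    W -= learningRate * wGradient
}
// On this tiny sample, W settles at roughly 3.2
```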
After each epoch, trainingCallback is called, which we use to draw a line with the model's current coefficients (its current best-fit line); the progress is shown in the following diagram:
Admittedly, this is difficult to interpret without a key! But the pattern should hopefully be obvious: with each iteration, our line fits the data better. After 100 epochs, we end up with this model:
The function describing this line is y = 0.733505317339142 + 3.4474988368438 * x. Using this model, we can predict the total payments for all claims given the number of claims (by simply substituting x with the number of claims).
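As a closing check, making a prediction with these coefficients is a single substitution (plain Double arithmetic):

```swift
let b = 0.733505317339142
let w = 3.4474988368438
let claims = 108.0
let predictedPayments = w * claims + b   // roughly 373 thousand Kronor
```

Note that the prediction for 108 claims (about 373) differs from the observed 329.5 in the extract; the fitted line minimizes error across all data points rather than passing through any single one.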