If you haven't done so already, navigate to the repository at https://github.com/joshnewnham/MachineLearningWithCoreML and download the latest code. Once downloaded, navigate to the directory Chapter2/Start/ and open the playground LinearRegression.playground.
We will be creating a model that will predict the total payments for all claims (y) given the number of claims (x); the dataset we will be working with is auto insurance claims in Sweden. It consists of 2 columns and 64 rows, the first column containing the number of claims, and the second containing the total payments for all claims. Here is an extract from the dataset:
| Number of claims | Total payments for all claims in thousands of Swedish Kronor |
| --- | --- |
| 108 | 329.5 |
| 19 | 46.2 |
| 13 | 15.7 |
| 124 | 422.2 |
| ... | ... |
In the playground script, you will see that we create a view of type ScatterPlotView and assign it to the playground's live view. We will use this view to visualize the data and the predictions from our model:
let view = ScatterPlotView(frame: CGRect(x: 20, y: 20, width: 300, height: 300))
PlaygroundPage.current.liveView = view
By using this view, we can plot an array of data points using the view.scatter(dataPoints:) method and draw a line using the view.line(pointA:pointB:) method. Let's load the raw data and visualize it:
let csvData = parseCSV(contents:loadCSV(file:"SwedishAutoInsurance"))
let dataPoints = extractDataPoints(data: csvData, xKey: "claims", yKey: "payments")
view.scatter(dataPoints)
In the previous code snippet, we first load the data into the csvData variable and then transform it into an array of DataPoint (a strongly typed data object that our view expects). Once loaded, we pass our data to the view via the scatter method, which renders the following output:
Each dot represents a single data point plotted against the number of claims (x axis) and the total payments for all claims (y axis). From this visualization, we can infer a roughly linear relationship between the two; that is, as the number of claims increases, so do the total payments. Using this intuition, we will attempt to fit a linear model, one that, when given the number of claims, is able to predict the total payments for all claims. What we are describing here is a type of algorithm known as simple linear regression; in essence, this is just finding the straight line that best fits our data. It can be described with the function y = w * x + b, where y is the total payments for all claims, x is the number of claims, w is the slope (weight) relating y to x, and b is the intercept (bias).
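Before writing any training code, the model itself is worth seeing in isolation. The following is a minimal sketch (the function name is hypothetical, and plain Double is used rather than the playground's CGFloat):

```swift
// Simple linear model: predict total payments (y) from number of claims (x).
// w is the slope (weight) and b is the intercept (bias) we will learn.
func predict(x: Double, w: Double, b: Double) -> Double {
    return w * x + b
}

// For example, with w = 3.4 and b = 0.7, 108 claims predicts
// roughly 368 (thousand Kronor) in total payments.
let payment = predict(x: 108, w: 3.4, b: 0.7)
```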
Simple enough! Our next problem is finding this line that best fits our data. For this, we are going to use an approach called gradient descent; there are plenty of books that go into the theoretical and technical details of gradient descent, so here we will just present some intuition behind it and leave it to you, the curious reader, to study the details.
You can think of gradient descent as a search for some minimum point; what determines this minimum point is something called a loss function. For us, it will be the error between our prediction and the actual total payments for all claims. The search is steered by calculating the relative contribution of each of our variables (here, w and b) to that error. Let's see how this looks in code by working through the train method:
func train(
    x: [CGFloat],
    y: [CGFloat],
    b: CGFloat = 0.0,
    w: CGFloat = 0.0,
    learningRate: CGFloat = 0.00001,
    epochs: Int = 100,
    trainingCallback: ((Int, Int, CGFloat, CGFloat) -> Void)? = nil) -> (b: CGFloat, w: CGFloat) {

    var B = b // bias (intercept)
    var W = w // weight (slope)
    let N = CGFloat(x.count) // number of data points

    for epoch in 0...epochs {
        // TODO: create variable to store this epoch's gradient for b and w

        for i in 0..<x.count {
            // TODO: make a prediction (using the linear equation y = b + x * w)
            // TODO: calculate the error (actual value - prediction)
            // TODO: calculate the gradient with respect to the error and b; adding it to the epoch's bias gradient
            // TODO: calculate the gradient with respect to the error and w; adding it to the epoch's weight gradient
        }

        // TODO: update the bias (B) using the learningRate
        // TODO: update the weight (W) using the learningRate

        if let trainingCallback = trainingCallback {
            trainingCallback(epoch, epochs, W, B)
        }
    }

    return (b: B, w: W)
}
Our train method takes in these arguments:
- x: An array of CGFloat values containing the number of claims
- y: An array of CGFloat values containing the total payments for all claims
- b: The initial value of the intercept used to start our search (defaulting to 0.0)
- w: The initial value of the weight used to start our search (defaulting to 0.0)
- learningRate: How much we adjust the coefficients on each update
- epochs: The number of times we iterate over the dataset, that is, make predictions and adjust our coefficients based on the difference between the predictions and expected values
- trainingCallback: This function is called after each epoch to report the progress
We next create some variables that will be used throughout training and begin our search (for epoch in 0...epochs). Let's step through each TODO and replace them with their respective code.
First, we start by creating two variables to hold the gradients for our variables b and w (these are the accumulated adjustments we need to make to their respective coefficients to minimize the loss):
// TODO: create variable to store this epoch's gradient for b and w
var bGradient : CGFloat = 0.0
var wGradient : CGFloat = 0.0
Next, we iterate over each data point, and for each data point, make a prediction and calculate the error:
// TODO: make a prediction (using the linear equation y = b + x * w)
let yHat = W * x[i] + B
// TODO: calculate the error (actual value - prediction)
let error = y[i] - yHat
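To make the error concrete: on the first data point, with W and B still at their initial values of zero, the prediction is 0 and the error is simply the actual value (a stand-alone numeric check, not playground code):

```swift
let W = 0.0, B = 0.0
let yHat = W * 108.0 + B   // prediction for 108 claims is 0.0
let error = 329.5 - yHat   // actual payments minus prediction
// With zero coefficients, the error is the full 329.5
```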
Now calculate the partial derivatives of the loss with respect to b and w. Think of these as a way to steer the search in the right direction; calculating them gives us the direction and magnitude of the change we need to make to b and w to minimize our error:
// TODO: calculate the gradient with respect to the error and b; adding it to the epoch's bias gradient
bGradient += -(2.0/N) * error
// TODO: calculate the gradient with respect to the error and w; adding it to the epoch's weight gradient
wGradient += -(2.0/N) * error * x[i]
After iterating over each data point, we adjust the coefficients B and W using their accumulated gradients, scaled by the learning rate:
// TODO: update the bias (B) using the learningRate
B = B - (learningRate * bGradient)
// TODO: update the weight (W) using the learningRate
W = W - (learningRate * wGradient)
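Pieced together, the whole loop can be exercised on the four sample rows from the earlier extract. This is a self-contained sketch rather than the playground code: it uses Double in place of CGFloat, runs 1,000 epochs, and its coefficients will differ from those fitted on the full dataset:

```swift
// Gradient descent on the four sample rows from the table above.
let xs: [Double] = [108, 19, 13, 124]
let ys: [Double] = [329.5, 46.2, 15.7, 422.2]

var B = 0.0                 // bias (intercept)
var W = 0.0                 // weight (slope)
let learningRate = 0.00001
let N = Double(xs.count)

for _ in 0..<1000 {
    var bGradient = 0.0
    var wGradient = 0.0
    for i in 0..<xs.count {
        let yHat = W * xs[i] + B                 // prediction
        let error = ys[i] - yHat                 // actual - prediction
        bGradient += -(2.0 / N) * error          // d(loss)/dB
        wGradient += -(2.0 / N) * error * xs[i]  // d(loss)/dW
    }
    B -= learningRate * bGradient
    W -= learningRate * wGradient
}
// On this tiny sample, W settles at roughly 3.2
```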
After each epoch, trainingCallback is called, which we use to draw a line with the model's current coefficients (its current best-fit line); the progress is shown in the following diagram:
Admittedly, this is difficult to interpret without a key! But the pattern should hopefully be obvious: with each iteration, our line fits the data better. After 100 epochs, we end up with this model:
The function describing this line is y = 0.733505317339142 + 3.4474988368438 * x. Using this model, we can predict the total payments for all claims given the number of claims (by simply substituting x with the number of claims).
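As a closing check, making a prediction with these coefficients is a single substitution (plain Double arithmetic):

```swift
let b = 0.733505317339142
let w = 3.4474988368438
let claims = 108.0
let predictedPayments = w * claims + b   // roughly 373 thousand Kronor
```

Note that the prediction for 108 claims (about 373) differs from the observed 329.5 in the extract; the fitted line minimizes error across all data points rather than passing through any single one.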