LOESS

The thing that powers STL is the notion of local regression. LOESS itself is a terrible acronym formed from LOcal regrESSion (whatever drugs the statisticians were on in the 1990s, sign me up for them). We're already familiar with the idea of linear regression from Chapter 1, How to Solve All Machine Learning.

Recall that linear regression starts from a straight-line function, $y = mx + c$, and estimates $m$ and $c$ from the data. Instead of trying to fit the whole dataset at once, what if we broke the dataset up into many small local components and ran a regression on each small dataset? Here's an example of what I mean:

| X | Y |
|:--:|:--|
| -1 | 1 |
| -0.9 | 0.81 |
| -0.8 | 0.64 |
| -0.7 | 0.49 |
| -0.6 | 0.36 |
| -0.5 | 0.25 |
| -0.4 | 0.16 |
| -0.3 | 0.09 |
| -0.2 | 0.04 |
| -0.1 | 0.01 |
| 0 | 0 |
| 0.1 | 0.01 |
| 0.2 | 0.04 |
| 0.3 | 0.09 |
| 0.4 | 0.16 |
| 0.5 | 0.25 |
| 0.6 | 0.36 |
| 0.7 | 0.49 |
| 0.8 | 0.64 |
| 0.9 | 0.81 |
| 1 | 1 |

The preceding table represents the function $y = x^2$. Instead of pulling in the entire dataset for a regression, what if we ran a regression over every three consecutive rows? We'd start with row 2 (x = -0.9), using the point before it and the point after it (x = -1 and x = -0.8) as data points. For row 3, we'd run a linear regression on rows 2, 3, and 4, and so on. At this point, we're not particularly interested in the errors of the local regressions. We just want an estimate of the gradient ($m$) and the intercept ($c$). Here's the resulting table:

| X | Y | m (slope) | c (intercept) |
|:--:|:--:|:--:|:--:|
| -0.9 | 0.81 | -1.8 | -0.803333333333333 |
| -0.8 | 0.64 | -1.6 | -0.633333333333334 |
| -0.7 | 0.49 | -1.4 | -0.483333333333334 |
| -0.6 | 0.36 | -1.2 | -0.353333333333333 |
| -0.5 | 0.25 | -1 | -0.243333333333333 |
| -0.4 | 0.16 | -0.8 | -0.153333333333333 |
| -0.3 | 0.09 | -0.6 | -0.083333333333333 |
| -0.2 | 0.04 | -0.4 | -0.033333333333333 |
| -0.1 | 0.01 | -0.2 | -0.003333333333333 |
| 0 | 0 | -2.71050543121376E-17 | 0.006666666666667 |
| 0.1 | 0.01 | 0.2 | -0.003333333333333 |
| 0.2 | 0.04 | 0.4 | -0.033333333333333 |
| 0.3 | 0.09 | 0.6 | -0.083333333333333 |
| 0.4 | 0.16 | 0.8 | -0.153333333333333 |
| 0.5 | 0.25 | 1 | -0.243333333333333 |
| 0.6 | 0.36 | 1.2 | -0.353333333333333 |
| 0.7 | 0.49 | 1.4 | -0.483333333333334 |
| 0.8 | 0.64 | 1.6 | -0.633333333333333 |
| 0.9 | 0.81 | 1.8 | -0.803333333333333 |
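To see where these numbers come from, take the first row, x = -0.9. Its window is the three points (-1, 1), (-0.9, 0.81), and (-0.8, 0.64). The ordinary least-squares estimates of the slope and intercept then work out as follows:

$$
\begin{aligned}
\bar{x} &= \frac{-1 - 0.9 - 0.8}{3} = -0.9, \qquad \bar{y} = \frac{1 + 0.81 + 0.64}{3} \approx 0.816667 \\
m &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{(-0.1)(0.183333) + 0 + (0.1)(-0.176667)}{0.01 + 0 + 0.01} = \frac{-0.036}{0.02} = -1.8 \\
c &= \bar{y} - m\bar{x} \approx 0.816667 - (-1.8)(-0.9) \approx -0.803333
\end{aligned}
$$

This matches the first row of the table. The odd slope at x = 0, -2.71050543121376E-17, is floating-point rounding error. The window there is symmetric, so the exact slope is 0.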

If you plot each of these local lines individually, together they trace out a roughly "curved" shape. Here's a side program I wrote to plot this out:

```go
// +build sidenote

package main

import (
	"image/color"

	"github.com/golang/freetype/truetype"
	"golang.org/x/image/font/gofont/gomono"
	"gonum.org/v1/plot"
	"gonum.org/v1/plot/plotter"
	"gonum.org/v1/plot/vg"
	"gonum.org/v1/plot/vg/draw"
)

// defaultFont is the Go Mono font used for all text on the chart.
var defaultFont vg.Font

func init() {
	// Parse the embedded Go Mono TrueType font and register it with vg.
	font, err := truetype.Parse(gomono.TTF)
	if err != nil {
		panic(err)
	}
	vg.AddFont("gomono", font)
	defaultFont, err = vg.MakeFont("gomono", 12)
	if err != nil {
		panic(err)
	}
}

// table holds the slope (m) and intercept (c) of the local regression
// centred on each x, copied from the preceding table.
var table = []struct {
	x, m, c float64
}{
	{-0.9, -1.8, -0.803333333333333},
	{-0.8, -1.6, -0.633333333333334},
	{-0.7, -1.4, -0.483333333333334},
	{-0.6, -1.2, -0.353333333333333},
	{-0.5, -1, -0.243333333333333},
	{-0.4, -0.8, -0.153333333333333},
	{-0.3, -0.6, -0.083333333333333},
	{-0.2, -0.4, -0.033333333333333},
	{-0.1, -0.2, -0.003333333333333},
	{0, -2.71050543121376E-17, 0.006666666666667},
	{0.1, 0.2, -0.003333333333333},
	{0.2, 0.4, -0.033333333333333},
	{0.3, 0.6, -0.083333333333333},
	{0.4, 0.8, -0.153333333333333},
	{0.5, 1, -0.243333333333333},
	{0.6, 1.2, -0.353333333333333},
	{0.7, 1.4, -0.483333333333334},
	{0.8, 1.6, -0.633333333333333},
	{0.9, 1.8, -0.803333333333333},
}

// estimates implements plot.Plotter so the local fits can be added to a plot.
type estimates []struct{ x, m, c float64 }

// Plot draws, for each estimate, a glyph at the fitted value and a dashed
// segment of its local line spanning the neighbouring x values. The first
// and last estimates are skipped because they lack a neighbouring entry.
func (es estimates) Plot(c draw.Canvas, p *plot.Plot) {
	trX, trY := p.Transforms(&c)
	lineStyle := plotter.DefaultLineStyle
	lineStyle.Dashes = []vg.Length{vg.Points(2), vg.Points(2)}
	lineStyle.Color = color.RGBA{A: 255}
	for i, e := range es {
		if i == 0 || i == len(es)-1 {
			continue
		}
		// Evaluate the local line y = m*x + c at the neighbouring x values.
		strokeStartX := es[i-1].x
		strokeStartY := e.m*strokeStartX + e.c
		strokeEndX := es[i+1].x
		strokeEndY := e.m*strokeEndX + e.c

		// Convert data coordinates to canvas coordinates.
		x1 := trX(strokeStartX)
		y1 := trY(strokeStartY)
		x2 := trX(strokeEndX)
		y2 := trY(strokeEndY)
		x := trX(e.x)
		y := trY(e.x*e.m + e.c)

		c.DrawGlyph(plotter.DefaultGlyphStyle, vg.Point{X: x, Y: y})
		c.StrokeLine2(lineStyle, x1, y1, x2, y2)
	}
}

func main() {
	p, err := plot.New()
	if err != nil {
		panic(err)
	}
	p.Title.Text = "X^2 Function and Its Estimates"
	p.X.Label.Text = "X"
	p.Y.Label.Text = "Y"
	p.X.Min = -1.1
	p.X.Max = 1.1
	p.Y.Min = -0.1
	p.Y.Max = 1.1

	// Use Go Mono for every piece of text on the chart.
	p.Y.Label.TextStyle.Font = defaultFont
	p.X.Label.TextStyle.Font = defaultFont
	p.X.Tick.Label.Font = defaultFont
	p.Y.Tick.Label.Font = defaultFont
	p.Title.Font = defaultFont
	p.Title.Font.Size = 16
```

Still inside `main`, we plot the original function, overlay the local estimates, and save the chart:

```go
	// Original function, drawn as a thick, faint line
	original := plotter.NewFunction(func(x float64) float64 { return x * x })
	original.Color = color.RGBA{A: 16}
	original.Width = 10
	p.Add(original)

	// Overlay the local linear estimates
	est := estimates(table)
	p.Add(est)

	// Save the chart as a 25 cm x 25 cm PNG
	if err := p.Save(25*vg.Centimeter, 25*vg.Centimeter, "functions.png"); err != nil {
		panic(err)
	}
}
```

The preceding code yields a chart, as shown in the following screenshot:

Most of the plotting code will be explained in later parts of this chapter. For now, focus on the key point: you can indeed run many small linear regressions on "local" subsets of the data, and together they trace out a curve.
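Rather than hard-coding `table`, the slopes and intercepts can be computed directly. The following is a minimal sketch, not code from the book's repository. It fits each three-point window with ordinary least squares. The names `fitLine` and `runningFit` are hypothetical, and the data assumes x runs from -1 to 1 in steps of 0.1.

```go
package main

import "fmt"

// fitLine returns the ordinary least-squares slope m and intercept c
// of the line y = m*x + c through the points (xs[i], ys[i]).
func fitLine(xs, ys []float64) (m, c float64) {
	n := float64(len(xs))
	var meanX, meanY float64
	for i := range xs {
		meanX += xs[i]
		meanY += ys[i]
	}
	meanX /= n
	meanY /= n

	var sxy, sxx float64
	for i := range xs {
		dx := xs[i] - meanX
		sxy += dx * (ys[i] - meanY)
		sxx += dx * dx
	}
	m = sxy / sxx
	c = meanY - m*meanX
	return m, c
}

// runningFit fits a line to every window of three consecutive points
// (one before, the point itself, one after). The first and last points
// are skipped because they lack a neighbour on one side.
func runningFit(xs, ys []float64) (ms, cs []float64) {
	for i := 1; i < len(xs)-1; i++ {
		m, c := fitLine(xs[i-1:i+2], ys[i-1:i+2])
		ms = append(ms, m)
		cs = append(cs, c)
	}
	return ms, cs
}

func main() {
	// x runs from -1 to 1 in steps of 0.1, and y = x^2.
	var xs, ys []float64
	for i := -10; i <= 10; i++ {
		x := float64(i) / 10
		xs = append(xs, x)
		ys = append(ys, x*x)
	}
	ms, cs := runningFit(xs, ys)
	for i := range ms {
		// ms[i] and cs[i] belong to the window centred on xs[i+1].
		fmt.Printf("%5.1f  m=%6.2f  c=%.6f\n", xs[i+1], ms[i], cs[i])
	}
}
```

The printed slopes and intercepts should agree with the table up to floating-point rounding.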

LOESS takes this idea further: within each window of values (in the toy example, the window was 3 rows), the values are weighted. The logic is simple: the closer a value is to the row under consideration, the higher its weight. If we had used a window size of 5, then when considering row 3, rows 2 and 4 would be weighted more heavily than rows 1 and 5. This window width, it turns out, is important to our smoothing. The wider the window, the more points each local regression draws on, and the smoother the resulting curve.
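The weighting can be sketched as a weighted local linear regression. This is a minimal illustration, not the implementation in "github.com/chewxy/stl/loess". It assumes the tricube weight function $w(u) = (1 - |u|^3)^3$ used in Cleveland's original LOESS. The window is taken to be the `width` points nearest to the target x, with distances scaled so that the farthest point in the window gets weight 0. The names `tricube` and `loessAt` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// tricube is the classic LOESS weight function:
// w(u) = (1 - |u|^3)^3 for |u| < 1, and 0 otherwise.
func tricube(u float64) float64 {
	u = math.Abs(u)
	if u >= 1 {
		return 0
	}
	t := 1 - u*u*u
	return t * t * t
}

// loessAt estimates the smoothed value at x0 with a weighted linear
// regression over the width points nearest to x0 (assumes 3 <= width <= len(xs)).
// Distances are scaled by the farthest point in the window, so that point
// gets weight 0 and nearer points weigh more.
func loessAt(xs, ys []float64, x0 float64, width int) float64 {
	// Order point indices by distance from x0 and keep the nearest width.
	idx := make([]int, len(xs))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool {
		return math.Abs(xs[idx[a]]-x0) < math.Abs(xs[idx[b]]-x0)
	})
	idx = idx[:width]
	maxDist := math.Abs(xs[idx[width-1]] - x0)

	// Tricube weights and the weighted means of x and y.
	ws := make([]float64, width)
	var sw, mx, my float64
	for k, i := range idx {
		ws[k] = tricube((xs[i] - x0) / maxDist)
		sw += ws[k]
		mx += ws[k] * xs[i]
		my += ws[k] * ys[i]
	}
	mx /= sw
	my /= sw

	// Weighted least-squares slope and intercept of the local line.
	var sxy, sxx float64
	for k, i := range idx {
		dx := xs[i] - mx
		sxy += ws[k] * dx * (ys[i] - my)
		sxx += ws[k] * dx * dx
	}
	if sxx == 0 {
		return my // only one point carries weight: use the weighted mean
	}
	m := sxy / sxx
	c := my - m*mx
	return m*x0 + c
}

func main() {
	// The same y = x^2 data as before, x from -1 to 1 in steps of 0.1.
	var xs, ys []float64
	for i := -10; i <= 10; i++ {
		x := float64(i) / 10
		xs = append(xs, x)
		ys = append(ys, x*x)
	}
	for _, x0 := range []float64{-0.5, 0, 0.5} {
		fmt.Printf("x=%4.1f  loess=%.4f  true=%.4f\n", x0, loessAt(xs, ys, x0, 5), x0*x0)
	}
}
```

With a window of 5, the two outermost points get zero weight under this scaling. An interior fit therefore leans on the centre point and its two immediate neighbours, which carry weight $w(0.5) \approx 0.67$ each. Increasing `width` pulls in more neighbours and smooths more.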

The subpackage "github.com/chewxy/stl/loess" implements LOESS as a smoothing algorithm. Do read through its code if you're interested in the details.
