Least squares

The least squares method is the original approach to learning the parameters of the hyperplane that best approximates the output from the input data. As the name suggests, the best approximation minimizes the sum of the squared differences between the observed output values and the values predicted by the model's hyperplane.

The difference between the model's prediction and the actual outcome for a given data point is called the residual (whereas the deviation of the true model from the true output in the population is called the error). Hence, in formal terms, the least squares estimation method chooses the coefficient vector $\beta$ to minimize the residual sum of squares (RSS):

$$\text{RSS}(\beta)=\sum_{i=1}^{N}\left(y_i-x_i^{T}\beta\right)^{2}=(y-X\beta)^{T}(y-X\beta)$$
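
To make the objective concrete, here is a minimal sketch in Python that evaluates the RSS for a candidate coefficient vector; the synthetic data-generating process and all names here are illustrative assumptions, not part of the original text:

```python
import numpy as np

# Hypothetical synthetic data: N observations of p inputs
# (illustrative assumption, not from the text)
rng = np.random.default_rng(seed=42)
N, p = 100, 3
X = rng.normal(size=(N, p))
true_beta = np.array([1.5, -2.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.5, size=N)  # outputs with additive noise

def rss(beta, X, y):
    """Residual sum of squares: (y - X @ beta) dotted with itself."""
    residuals = y - X @ beta
    return residuals @ residuals

print(rss(true_beta, X, y))  # RSS at the true coefficients reflects only the noise
```
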
Hence, the least-squares coefficients $\hat{\beta}^{\text{LS}}$ are computed as:

$$\hat{\beta}^{\text{LS}}=\underset{\beta}{\operatorname{arg\,min}}\;\text{RSS}(\beta)=\underset{\beta}{\operatorname{arg\,min}}\;(y-X\beta)^{T}(y-X\beta)$$

The optimal parameter vector that minimizes the RSS results from setting the derivative of the preceding expression with respect to $\beta$ to zero. Assuming X has full column rank, that is, that the input variables are not linearly dependent, this produces a unique solution, as follows:

$$\frac{\partial\,\text{RSS}}{\partial\beta}=-2X^{T}(y-X\beta)=0\;\Rightarrow\;\hat{\beta}=\left(X^{T}X\right)^{-1}X^{T}y$$
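
A minimal sketch of this closed-form solution, reusing the hypothetical synthetic data from the RSS example above; note that the explicit inverse is shown only to mirror the formula, since a least-squares solver is numerically preferable in practice:

```python
import numpy as np

# Same hypothetical synthetic data as in the RSS sketch above
rng = np.random.default_rng(seed=42)
X = rng.normal(size=(100, 3))
true_beta = np.array([1.5, -2.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.5, size=100)

# Closed-form OLS estimate via the normal equations
# (assumes X has full column rank, so X'X is invertible)
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# A dedicated solver minimizes the same RSS objective and is
# numerically more stable than forming the explicit inverse
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_hat, beta_lstsq))  # True: both give the same estimate
print(beta_hat)                           # close to true_beta, up to the noise
```
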
When y and X have been de-meaned by subtracting their respective means, $\hat{\beta}$ represents the ratio of the covariance between the inputs and the outputs, $\sigma_{xy}$, to the variance of the inputs, $\sigma_{x}^{2}$. There is also a geometric interpretation: the coefficients that minimize the RSS ensure that the residual vector $e=y-X\hat{\beta}$ is orthogonal to the subspace spanned by the columns of X, and the fitted values $\hat{y}=X\hat{\beta}$ are the orthogonal projection of y onto that subspace.
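
The orthogonality property can be verified numerically; a short sketch, again using the hypothetical data and estimate from the examples above:

```python
import numpy as np

# Same hypothetical data and OLS estimate as above
rng = np.random.default_rng(seed=42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# The residual vector e = y - X @ beta_hat is orthogonal
# to every column of X, i.e. X.T @ e is (numerically) zero
e = y - X @ beta_hat
print(np.allclose(X.T @ e, 0, atol=1e-10))   # True, up to floating-point error

# The fitted values y_hat = X @ beta_hat are the orthogonal projection
# of y onto the column space of X, so e is also orthogonal to y_hat
y_hat = X @ beta_hat
print(np.allclose(e @ y_hat, 0, atol=1e-8))  # True
```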
