How to formulate the model

The multiple regression model defines a linear functional relationship between one continuous outcome variable and p input variables that can be of any type but may require preprocessing. Multivariate regression, in contrast, refers to the regression of multiple outputs on multiple input variables.
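To make the distinction concrete, here is a minimal NumPy sketch on simulated data. All sizes and coefficient values are hypothetical. It uses ordinary least squares via `np.linalg.lstsq` only as a stand-in fitting method, because estimation methods are introduced later. A multiple regression has a single target vector of shape (N,). A multivariate regression stacks k targets into a matrix of shape (N, k) and returns one coefficient column per output.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
N, p, k = 200, 3, 2                       # observations, inputs, outputs (hypothetical sizes)

X_raw = rng.normal(size=(N, p))
X = np.column_stack([np.ones(N), X_raw])  # add intercept column -> shape (N, p + 1)

# Multiple regression: one continuous outcome, y has shape (N,)
beta_true = np.array([1.0, 0.5, -2.0, 3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=N)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Multivariate regression: k outcomes stacked as columns, Y has shape (N, k)
B_true = np.array([[1.0, -1.0],
                   [0.5,  0.0],
                   [-2.0, 1.0],
                   [3.0,  2.0]])           # shape (p + 1, k)
Y = X @ B_true + rng.normal(scale=0.1, size=(N, k))
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat.shape, B_hat.shape)        # (4,) (4, 2)
```

With plain least squares, each column of `B_hat` equals the coefficients obtained by regressing that output alone on the inputs. In this sketch the multivariate fit therefore differs from multiple regression mainly in the shape of the target.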

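In the population, the linear regression model has the following form for a single instance of the output $y$, an input vector $\mathbf{x} = [x_1, \dots, x_p]$, and the error $\varepsilon$:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \varepsilon$$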

The interpretation of the coefficients is straightforward: the value of a coefficient $\beta_j$ is the partial, average effect of the variable $x_j$ on the output, holding all other variables constant.
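The following is a minimal sketch in Python with made-up coefficients. `BETA_0`, `BETA`, `SIGMA`, and the helper names are hypothetical, and the Gaussian error is an assumption, since the error distribution has not yet been specified. The sketch evaluates the scalar model for one input vector and then checks the coefficient interpretation: raising one input by one unit while holding the others fixed changes the systematic part of the output by that input's coefficient.

```python
import random

# Hypothetical population parameters for p = 3 inputs (illustrative values only)
BETA_0 = 1.0
BETA = [0.5, -2.0, 3.0]   # beta_1, ..., beta_p
SIGMA = 0.1               # error standard deviation; Gaussian errors are an assumption here

def systematic_part(x):
    """beta_0 + sum_j beta_j * x_j for a single input vector x (no error term)."""
    if len(x) != len(BETA):
        raise ValueError("x must contain exactly p values")
    return BETA_0 + sum(b_j * x_j for b_j, x_j in zip(BETA, x))

def draw_output(x, rng):
    """One draw of y = beta_0 + sum_j beta_j * x_j + epsilon."""
    return systematic_part(x) + rng.gauss(0.0, SIGMA)

rng = random.Random(0)
x = [1.0, 0.2, -0.5]
y = draw_output(x, rng)

# Partial effect: raise one input by one unit, keep all others fixed
partial_effects = []
for j in range(len(x)):        # Python index j corresponds to x_{j+1} in the formula
    x_shifted = list(x)
    x_shifted[j] += 1.0
    partial_effects.append(systematic_part(x_shifted) - systematic_part(x))

print(partial_effects)   # matches BETA up to floating-point rounding
```

Because this model has no squared or interaction terms, the difference is the same whatever starting point `x` you choose. Once such terms enter, the effect of a one-unit change in $x_j$ depends on the values of the inputs.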

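The model can also be written more compactly in matrix form. In this case, $\mathbf{y}$ is a vector of N output observations, $\mathbf{X}$ is the design matrix with N rows of observations on the p variables plus a column of 1s for the intercept, $\boldsymbol{\beta}$ is the vector containing the P = p + 1 coefficients $\beta_0, \dots, \beta_p$, and $\boldsymbol{\varepsilon}$ is the vector of N errors:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \mathbf{y} \in \mathbb{R}^{N},\ \mathbf{X} \in \mathbb{R}^{N \times P},\ \boldsymbol{\beta} \in \mathbb{R}^{P},\ \boldsymbol{\varepsilon} \in \mathbb{R}^{N}$$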
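As a minimal NumPy sketch with simulated inputs and hypothetical coefficients and error draws, the following builds a design matrix by prepending a column of ones. It then evaluates $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with a single matrix product and confirms that this matches the scalar formula applied observation by observation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 5, 3
X_raw = rng.normal(size=(N, p))                 # N observations on p input variables
X = np.column_stack([np.ones(N), X_raw])        # design matrix with intercept column, shape (N, P)
beta = np.array([1.0, 0.5, -2.0, 3.0])          # hypothetical beta_0, ..., beta_p, so P = p + 1
eps = rng.normal(scale=0.1, size=N)             # hypothetical error draws

y = X @ beta + eps                              # matrix form: y = X beta + eps

# Same result computed one observation at a time with the scalar form
y_loop = np.array([
    beta[0] + sum(beta[j + 1] * X_raw[i, j] for j in range(p)) + eps[i]
    for i in range(N)
])

print(X.shape, y.shape)        # (5, 4) (5,)
print(np.allclose(y, y_loop))  # True
```

The column of ones is what lets the intercept $\beta_0$ sit inside $\boldsymbol{\beta}$, so the whole model reduces to one matrix-vector product.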

The model is linear in its p + 1 parameters but can capture nonlinear relationships through the choice or transformation of variables, for example by including a polynomial basis expansion or logarithmic terms. It can also incorporate categorical variables via dummy encoding, and interactions between variables by creating new inputs of the form $x_i \cdot x_j$.
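These transformations can be illustrated with a small NumPy sketch that assembles a design matrix from two simulated continuous inputs and a hypothetical categorical variable (`sector` and its levels are made up for the example). Each new column is a nonlinear function of the raw inputs, yet the model remains linear in the coefficients that multiply those columns. Dropping one dummy level, which then serves as the baseline, is one common convention for avoiding perfect collinearity with the intercept column.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
x1 = rng.uniform(1.0, 10.0, size=N)   # continuous input, kept positive so log is defined
x2 = rng.normal(size=N)               # second continuous input
sector = np.array(["tech", "energy", "tech", "health", "energy", "health"])  # hypothetical categories

# Dummy encoding: one 0/1 column per category except the first (the baseline)
levels = np.unique(sector)            # sorted: ['energy', 'health', 'tech']
dummies = (sector[:, None] == levels[1:]).astype(float)

X = np.column_stack([
    np.ones(N),   # intercept
    x1,
    x1 ** 2,      # polynomial basis expansion (degree 2)
    np.log(x1),   # logarithmic term
    x2,
    x1 * x2,      # interaction x1 * x2
    dummies,      # 'health' and 'tech' indicators; 'energy' is the baseline
])
print(X.shape)    # (6, 8)
```

Any linear estimation method can then be applied to `X` unchanged; only the interpretation of each coefficient shifts, for example the dummy coefficients measure differences relative to the baseline category.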

To complete the formulation of the model from a statistical point of view, so that we can test hypotheses about the parameters, we need to make specific assumptions about the error term. We'll do this after first introducing alternative methods for learning the parameters.
