How it works... 

Ridge is similar to ordinary least squares, with the obvious difference that there is a penalization term. The idea is to minimize the squared residuals and, at the same time, keep the coefficients from growing too large. Because the coefficients won't be as big as they would have been in the absence of the penalization term, the model won't be able to over-fit the data. The following equation shows the Ridge minimization problem; note that α is a hyper-parameter that defines how much weight we want to place on the penalization. A large value implies that the penalization will dominate and the model will likely under-fit (not capture the structure of the data). On the other hand, a small value implies that the penalization will barely be used, and the model will probably over-fit (capture even the noise in the data). The right value for α needs to be determined via cross-validation or by using training/testing data-sets:

min_β ‖y − Xβ‖² + α‖β‖²
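As a minimal sketch of this minimization, the Ridge solution can be computed in closed form from the normal equations, (XᵀX + αI)β = Xᵀy. The helper below, `ridge_coefficients`, is a hypothetical illustration (it assumes centered data and no intercept), not the book's own code; it shows how growing α shrinks the coefficients toward zero:

```python
import numpy as np

def ridge_coefficients(X, y, alpha):
    """Solve min_beta ||y - X beta||^2 + alpha ||beta||^2 via the
    closed-form normal equations (sketch: centered data, no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

# Simulated data with known coefficients plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_beta = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_beta + rng.normal(scale=0.5, size=100)

# A larger alpha means stronger penalization, hence smaller coefficients.
for alpha in (0.01, 1.0, 1000.0):
    beta_hat = ridge_coefficients(X, y, alpha)
    print(alpha, np.abs(beta_hat).sum())
```

With α near zero the solution matches ordinary least squares; as α grows, the penalty dominates and the coefficients are shrunk toward zero, which is exactly the under-fitting regime described above.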
