Weight decay

It can be observed that an overfitted model—like the polynomial in the preceding example—has very large weights. To prevent this, a penalty term, Ω, can be added to the objective function, which drives the weights closer to the origin. Thus, the penalty term should be a function of the norm of the weights. The effect of the penalty term can be controlled by multiplying it by a hyperparameter, α, so the objective function becomes: E(w) + αΩ(w). The most commonly used penalty terms are:

  • L2 regularization: Penalty term is given by Ω(w) = (1/2)‖w‖₂² = (1/2)Σᵢwᵢ². In regression literature, this is called ridge regression.
  • L1 regularization: Penalty term is given by Ω(w) = ‖w‖₁ = Σᵢ|wᵢ|. This is called lasso regression.

L1 regularization leads to a sparse solution; that is, it sets many of the weights to zero and thus acts as a good feature-selection method for regression problems.
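The contrast between the two penalties can be seen in a minimal sketch, assuming synthetic data and simple gradient descent (the data, function names, and hyperparameter values below are illustrative, not from the text). The L2 penalty contributes a gradient term αw, while the L1 penalty contributes a subgradient term α·sign(w):

```python
import numpy as np

# Illustrative synthetic data (assumed): only the first 3 of 10 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -3.0, 1.5]
y = X @ true_w + 1.0 * rng.normal(size=100)

def fit(X, y, alpha=0.0, penalty=None, lr=0.01, steps=2000):
    """Minimize E(w) + alpha * Omega(w) by (sub)gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)    # gradient of the squared error E(w)
        if penalty == "l2":                   # Omega(w) = (1/2)||w||^2 -> gradient alpha * w
            grad += alpha * w
        elif penalty == "l1":                 # Omega(w) = ||w||_1 -> subgradient alpha * sign(w)
            grad += alpha * np.sign(w)
        w -= lr * grad
    return w

w_plain = fit(X, y)
w_ridge = fit(X, y, alpha=1.0, penalty="l2")
w_lasso = fit(X, y, alpha=0.5, penalty="l1")

# Ridge shrinks all weights toward the origin; lasso additionally pushes the
# weights of the uninformative features to (near) zero, yielding sparsity.
print(np.linalg.norm(w_ridge), "<", np.linalg.norm(w_plain))
print("near-zero weights (lasso):", int(np.sum(np.abs(w_lasso) < 0.05)))
```

Running this, the ridge weights have a smaller norm than the unregularized ones, and most of the seven uninformative lasso weights sit near zero, which is the sparsity-driven feature selection described above.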
