Failed attempts

The example-based narrative of this book may mislead you into thinking that every hypothesis or idea improves accuracy. In fact, in order to write this chapter, we had to try and test a handful of other features and methods that didn't work out. For example, one idea was to use the date of each battle as a feature; you would expect the Allies to have lost more battles in the first half of the war and won more in the second. In reality, this feature actually lowered our performance on the testing dataset.
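
Just to make the idea concrete, here is a minimal sketch of how such a date-based feature could be built with pandas; the column names, values, and the reference date are assumptions for the sake of illustration, not the exact code from this chapter:

    import pandas as pd

    # Hypothetical frame: a start date and an outcome flag for each battle.
    battles = pd.DataFrame({
        "start": ["1941-06-22", "1943-07-05", "1944-06-06"],
        "allies_won": [0, 0, 1],
    })

    # Turn the date into a single numeric feature: days elapsed since
    # an (assumed) reference date for the start of the campaign.
    battles["start"] = pd.to_datetime(battles["start"])
    war_start = pd.Timestamp("1941-06-22")
    battles["days_into_war"] = (battles["start"] - war_start).dt.days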

We also tried filling in the missing values. At the very beginning of this chapter, we filled empty cells for planes, tanks, and guns with zeroes. In reality, the authors of the Wikipedia articles used different sources; some had detailed data on the number of guns, planes, and tanks, and some didn't. Most of the time, though, a record had at least some of the values (usually the number of soldiers) but not the others. It seems natural to inject at least an approximate number, such as the column average, into each empty cell. However, this didn't help either: filling with averages also lowered the model's score.
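
For comparison, here is a minimal sketch of both fill strategies in pandas; the column names and numbers are invented for illustration:

    import numpy as np
    import pandas as pd

    # Hypothetical equipment counts with gaps, as collected from different sources.
    data = pd.DataFrame({
        "tanks": [500, np.nan, 120],
        "planes": [np.nan, 300, 250],
        "guns": [1000, 800, np.nan],
    })

    # What we did at the start of the chapter: treat missing counts as zero.
    filled_with_zeros = data.fillna(0)

    # The alternative we tested: fill each gap with the column average.
    filled_with_means = data.fillna(data.mean())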

Improving the model is a constant process of iteration, trial, and error. It can be an exhausting and frustrating experience, and the performance gains generally get smaller with each iteration. So, brace yourself, think strategically, and be ready to work hard, with no guarantee of a result at all.

Feature engineering is king, but there is a second way to improve your performance that runs in parallel with working on the features: selecting the model and its parameters. Let's talk about parameter selection in the next section.
