Philosophy behind ensembling 

Ensembling, a technique that is very popular among ML practitioners, can be understood well through a simple real-world, non-ML example.

Assume that you have applied for a job at a very reputable organization and have been called for an interview. It is unlikely that you will be selected for the job based on just one interview with a single interviewer. In most cases, you will go through multiple rounds of interviews with several interviewers or with a panel of interviewers. The organization expects each interviewer to be an expert in a particular area and to evaluate your fitness for the job based on your experience in that area of expertise. Your selection for the job, of course, depends on the consolidated feedback from all of the interviewers who talked to you. The organization deems that you will be more successful in the job because your selection is based on a consolidated decision made by multiple experts, not just on one expert's decision, which may be prone to certain biases.

Now, the consolidation of feedback from all the interviewers can happen through several methods:

  • Averaging: Assume that your candidature for the job depends on your clearing a cut-off score in the interviews. Assume that you have met ten interviewers and each of them has rated you out of a maximum score of 10, representing your experience as perceived by that interviewer in their area of expertise. Your consolidated score is then obtained by simply averaging the scores given by all the interviewers.
  • Majority vote: In this case, no actual score out of 10 is provided by the interviewers. Instead, of the 10 interviewers, eight confirmed that you are a good fit for the position and two said no to your candidature. You are selected for the job because the majority of the interviewers are happy with your interview performance.
  • Weighted average: Let's consider that four of the interviewers are experts in minor skills that are good to have for the job you applied for, but are not mandatory for the position. You are interviewed by all 10 interviewers, and each of them gives you a score out of 10. As in the averaging method, your final interview score is obtained by combining the scores given by all the interviewers.

However, not all scores contribute equally to the final score. Each interview score is multiplied by a weight, and the resulting products are summed to obtain the final score. The weight for each interview is a function of the importance of the skill it tested and how important that skill is for doing the job. Obviously, a good-to-have skill carries a lower weight than a must-have skill. The final score therefore inherently represents the proportion of mandatory skills the candidate possesses, and these have more influence on your selection. The short sketch that follows illustrates all three consolidation methods.
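The following is a minimal sketch in Python, with made-up scores, votes, and weights (none of these values come from the text), of how each consolidation method computes a final decision:

```python
# Made-up interview scores (out of 10) from ten interviewers.
scores = [7, 8, 6, 9, 7, 8, 5, 9, 8, 7]

# Averaging: the consolidated score is the simple mean of all scores.
average_score = sum(scores) / len(scores)

# Majority vote: each interviewer only gives a yes/no decision.
votes = ["yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes"]
is_selected = votes.count("yes") > votes.count("no")

# Weighted average: six must-have skills weighted 0.15 each, four
# good-to-have skills weighted 0.025 each (assumed weights, summing to 1).
weights = [0.15] * 6 + [0.025] * 4
weighted_score = sum(s * w for s, w in zip(scores, weights))

print(f"Average score:  {average_score:.2f}")
print(f"Majority vote:  {'selected' if is_selected else 'not selected'}")
print(f"Weighted score: {weighted_score:.2f}")
```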

Similar to the interview analogy, ensembling in ML also produces models based on consolidated learning. The term consolidated learning here represents learning obtained either by applying several ML algorithms or by learning from several subsets of a large dataset. Analogous to the interviews, multiple models are learned through the application of an ensembling technique. A final consolidated prediction is then arrived at by applying one of the averaging, majority voting, or weighted averaging techniques to the individual predictions made by each of the models. The models created through the application of an ensembling technique, along with the prediction consolidation technique, are typically termed an ensemble.

Each ML algorithm is special and has a unique way of modeling the underlying training data. For example, a k-nearest neighbors algorithm learns by computing distances between the elements of a dataset, whereas naive Bayes learns by computing the probabilities of each attribute in the data belonging to a particular class. Multiple models may be created using different ML algorithms, and predictions can be made by combining the predictions of those models. Similarly, when a dataset is partitioned into subsets and multiple models are trained with an algorithm, each focusing on one subset, each model is very focused and specializes in learning the characteristics of the subset of data it is trained on. In both cases, with models based on multiple algorithms or on multiple subsets of data, combining the predictions of multiple models through consolidation gives us better predictions, as we leverage the individual strengths that each model in the ensemble carries. This is not something we get when using a single model for predictions.
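To make this concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset (neither is prescribed by the text, and the attrition data is only introduced in the next section), of both flavors of ensemble: one that combines different algorithms through majority voting, and one that combines models trained on different subsets of the data:

```python
# A rough sketch using scikit-learn and synthetic data (both are
# illustrative assumptions, not the attrition dataset used later).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Ensemble of different algorithms: k-NN, naive Bayes, and a decision tree
# each model the data in their own way; their predictions are consolidated
# by majority vote.
multi_algo = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",
)
multi_algo.fit(X_train, y_train)
print("Multi-algorithm ensemble accuracy:", multi_algo.score(X_test, y_test))

# Ensemble of data subsets: 25 models (decision trees by default), each
# trained on a random subset of the data, with predictions consolidated
# across all of them.
multi_subset = BaggingClassifier(n_estimators=25, random_state=42)
multi_subset.fit(X_train, y_train)
print("Multi-subset ensemble accuracy:", multi_subset.score(X_test, y_test))
```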

The crux of ensembling is that better predictions are obtained when we combine the predictions of multiple models than when we rely on just one model for prediction. This is no different from the management philosophy that together we do better, otherwise termed synergy.

Now that we understand the core philosophy behind ensembling, we are ready to explore the different types of ensembling techniques. We will learn these techniques by implementing them in a project to predict employee attrition. As we already know, prior to building any ML project, it is very important to have a deep understanding of the problem and the data. Therefore, in the next section, we first focus on understanding the attrition problem at hand, then we study the dataset associated with the problem, and lastly, we understand the properties of the dataset through exploratory data analysis (EDA). The key insights we obtain in this section come from a one-time exercise and will hold for all the ensembling techniques we apply in later sections.
