Data

We'll be using the ANSUR dataset; the name stands for US Army Anthropometric Survey. It comes as two separate files, one for female soldiers and one for male soldiers, which I've combined into a single dataset. You can download the data here: https://github.com/PacktPublishing/Advanced-Machine-Learning-with-R/tree/master/Data/army_ansur.RData.

I found this data on data.world, a data repository site that lets members share any dataset they find interesting. For example, I have a version of the Gettysburg data we used in Chapter 1, Preparing and Understanding Data, on the site. The ANSUR data comes from research done by the Natick Soldier Research, Development and Engineering Center (NSRDEC) on over 6000 Active Duty, Reserve, and National Guard soldiers of the US Army. The features consist of 93 different body measurements along with assorted demographic data. The US Army and its contractors use this information to order the proper quantity and sizes of equipment, design new equipment, and so on. As you can imagine, many of these features are highly correlated, which makes this data perfect for PCA.
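To see why correlated measurements suit PCA, here is a minimal sketch on simulated data rather than the ANSUR file. The variable names (`stature`, `arm_span`, and so on), the latent `body_size` factor, and all of the numbers are assumptions made up for illustration; only `cor()` and `prcomp()` are standard base R calls.

```r
# Simulated stand-in for body measurements; this is NOT the ANSUR data.
set.seed(123)
n <- 500
body_size <- rnorm(n)  # hypothetical latent "overall size" factor

# Each measurement is driven by the same latent factor plus its own noise,
# so the columns end up strongly correlated with one another.
measurements <- data.frame(
  stature    = 170 + 8 * body_size + rnorm(n, sd = 3),
  arm_span   = 175 + 9 * body_size + rnorm(n, sd = 3),
  chest_circ = 100 + 7 * body_size + rnorm(n, sd = 4),
  waist_circ =  90 + 8 * body_size + rnorm(n, sd = 5)
)

# Pairwise correlations between the simulated measurements
cor_matrix <- cor(measurements)
print(round(cor_matrix, 2))

# PCA on centered and scaled features
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

When the features share this much common variation, the first component soaks up most of the variance, which is the behavior we are counting on with the real body measurements.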

We'll put those body measurements through the PCA process, then use the resulting components to predict body weight in pounds with a MARS model, as we learned in prior chapters. Why soldier weight? Why not? We'll lump males and females together. We could use gender as an input feature, but I won't. Use age, race, gender, or the like in a model in the banking industry that is subject to review, and prepare to, at a minimum, answer some tough questions. OK, enough of the introduction, let's get cracking.
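The overall workflow just described can be sketched as follows. This is an outline on simulated data under stated assumptions, not the analysis we'll run on the ANSUR file: the columns `m1` to `m4`, the `weight_lbs` response, the 80/20 split, and the choice of two components are all hypothetical. It assumes the earth package for MARS and falls back to a plain linear model if earth isn't installed.

```r
# Simulated data; column names and coefficients are illustrative assumptions.
set.seed(42)
n <- 400
body_size <- rnorm(n)  # hypothetical latent size factor

measurements <- data.frame(
  m1 = 8 * body_size + rnorm(n, sd = 3),
  m2 = 9 * body_size + rnorm(n, sd = 3),
  m3 = 7 * body_size + rnorm(n, sd = 4),
  m4 = 8 * body_size + rnorm(n, sd = 5)
)
weight_lbs <- 175 + 25 * body_size + rnorm(n, sd = 10)

# Hold out 20% of the rows for evaluation
train_idx <- sample(seq_len(n), size = 0.8 * n)

# Fit PCA on the training rows only, then project both sets
pca <- prcomp(measurements[train_idx, ], center = TRUE, scale. = TRUE)
train_scores <- as.data.frame(pca$x[, 1:2])
train_scores$weight_lbs <- weight_lbs[train_idx]
test_scores <- as.data.frame(
  predict(pca, newdata = measurements[-train_idx, ])[, 1:2]
)

# MARS via earth if available; otherwise a linear model as a stand-in
if (requireNamespace("earth", quietly = TRUE)) {
  fit <- earth::earth(weight_lbs ~ ., data = train_scores)
} else {
  fit <- lm(weight_lbs ~ ., data = train_scores)
}

# Predict on the held-out component scores and compute RMSE
pred <- as.numeric(predict(fit, newdata = test_scores))
rmse <- sqrt(mean((pred - weight_lbs[-train_idx])^2))
print(rmse)
```

Fitting the PCA on the training rows alone and then projecting the held-out rows with `predict()` keeps information from the test set out of the components.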
