9.9. Conclusion

Mi-Ling's goal in this analysis was to explore some of the features that JMP provides to support classification and data mining. She began by using various visualization techniques to develop an understanding of the data and relationships among the variables. Then, she used formulas and row states to partition her data into a training set, a validation set, and a test set.
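For readers who would like a concrete analog of this partitioning step outside of JMP, the sketch below shows one way to create a stratified training/validation/test split in Python with scikit-learn. The file name, response column, and split proportions are assumptions made purely for illustration; they are not Mi-Ling's actual settings.

# Illustrative analog of the JMP partition, not Mi-Ling's actual steps.
# "analysis_data.csv" and the "Diagnosis" column are hypothetical names.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("analysis_data.csv")
X, y = df.drop(columns="Diagnosis"), df["Diagnosis"]

# Carve off a test set first, then split the remainder into training and
# validation sets (a 60/20/20 split overall, chosen arbitrarily here).
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=1)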

Mi-Ling was interested in investigating logistic, partition, and neural net fits. Given that her goal was to learn about these platforms, she constructed models in a fairly straightforward way. She fit four models using the training data: a logistic model, a partition model, and two neural net models. The best classification, based on performance on her validation set, was obtained with a neural net model whose structure was chosen using K-fold cross-validation. We note that Mi-Ling could have taken a number of more sophisticated approaches to her modeling endeavor, had she so desired.
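To make this comparison concrete, the sketch below continues the Python analog started above: it fits a logistic model, a single classification tree (standing in for the partition model), and two small neural nets, and compares their misclassification rates on the validation set. The tuning values are illustrative guesses and do not reproduce Mi-Ling's JMP models.

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

# Candidate models; the tree depth and hidden-node counts are assumptions.
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "partition (tree)": DecisionTreeClassifier(max_depth=4, random_state=1),
    "neural net, 2 nodes": MLPClassifier(hidden_layer_sizes=(2,),
                                         max_iter=2000, random_state=1),
    "neural net, 6 nodes": MLPClassifier(hidden_layer_sizes=(6,),
                                         max_iter=2000, random_state=1),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    error = 1 - model.score(X_valid, y_valid)   # validation misclassification rate
    print(f"{name}: validation error = {error:.3f}")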

Among Mi-Ling's four models, the partition model had the worst performance. In our experience, single partition models tend not to perform as well as nonlinear (or linear) regression techniques when the predictors are continuous. They can be very useful when there are categorical predictors, and especially when these have many levels. Moreover, unlike neural net models and even logistic models, partition models are very intuitive and interpretable, which makes them all the more valuable for data exploration. In Mi-Ling's situation, where classification was the primary goal, the interpretability of the model was less important than its ability to classify accurately.

We also wish to underscore the importance of guarding against overfitting, which, in the case of neural net models, often leads to exaggerated claims of model performance. The application of K-fold cross-validation helped Mi-Ling arrive at a neural net model that was simple yet generalized well to her test set. Because neural nets overfit so easily, we strongly recommend that model performance be assessed on a genuinely independent data set. Without such an assessment, claims about a model's predictive performance are likely to be overly optimistic.
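As a final sketch in the same Python analog, the code below uses K-fold cross-validation on the training data to choose the number of hidden nodes for a neural net, and only then evaluates the chosen model on the held-out test set. The candidate structures and the choice of five folds are assumptions for illustration.

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Choose the hidden-layer size by 5-fold cross-validation on the training set.
search = GridSearchCV(
    make_pipeline(StandardScaler(),
                  MLPClassifier(max_iter=2000, random_state=1)),
    param_grid={"mlpclassifier__hidden_layer_sizes": [(1,), (2,), (3,), (5,)]},
    cv=5)
search.fit(X_train, y_train)

print("Chosen structure:", search.best_params_)
# The test set is touched only once, for a final, honest assessment.
print("Test-set accuracy:", search.score(X_test, y_test))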
