Datasets used in this chapter

We will use a few datasets in this chapter, covering a wide range of topics. What they have in common is that they are all multidimensional, so the techniques of this chapter can be applied to them directly. For convenience, all datasets are available online at http://scholar.harvard.edu/gerrard/mastering-scientific-computation-r. The datasets include the following:

  • Red wine: This dataset contains the chemical properties of red wines along with a quality score for each wine. It comes from the paper P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009. It was downloaded from the University of California Irvine Machine Learning Repository at http://archive.ics.uci.edu/ml/.
  • Abalone: This is a dataset of abalone measurements, largely the sizes and weights of various parts of the abalone. It comes from the University of California Irvine Machine Learning Repository, available at http://archive.ics.uci.edu/ml/.
  • Physical functioning: We used this dataset in the previous chapter. For full details, refer to Chapter 5, Linear Algebra. To summarize, this is survey data in which respondents are asked how much difficulty they have with 20 items related to functional independence.

All UC Irvine Machine Learning Repository datasets are credited to: Kevin Bache and Moshe Lichman (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
