Overview

Since only a small part of H2O's functionality can be examined and used in this chapter, we thought it would be useful to list all of the functional areas that it covers. This list is taken from the H2O website at http://h2o.ai/product/algorithms/ and is organized around wrangling data, modeling using the data, and scoring the resulting models:

  • Process: data profiling; summary statistics; aggregate, filter, bin, and derive columns; slice, log transform, and anonymize; variable creation; training and validation sampling plan
  • Model: generalized linear models (GLM); decision trees; gradient boosting machine (GBM); K-means; anomaly detection; deep learning; Naive Bayes; grid search
  • The score tool: predict; confusion matrix; AUC; hit ratio; PCA/PCA score; multimodel scoring

The following section will explain the environment used for the Spark and H2O examples in this chapter and some of the problems encountered.

For completeness, we will show you how we downloaded, installed, and used H2O. Although we finally settled on version 0.2.12-95, we first downloaded and used 0.2.12-92. This section is based on the earlier install, but the approach used to source the software is the same. The download link changes over time, so follow the Sparkling Water download option at http://h2o.ai/download/.

This will source the zipped Sparkling Water release, as shown in the file listing here:

[hadoop@hc2r1m2 h2o]$ pwd ; ls -l
/home/hadoop/h2o
total 15892
-rw-r--r-- 1 hadoop hadoop 16272364 Apr 11 12:37 sparkling-water-0.2.12-92.zip

This zipped release file is unpacked using the Linux unzip command, and it results in a Sparkling Water release file tree:

[hadoop@hc2r1m2 h2o]$ unzip sparkling-water-0.2.12-92.zip

[hadoop@hc2r1m2 h2o]$ ls -d sparkling-water*
sparkling-water-0.2.12-92 sparkling-water-0.2.12-92.zip

We have moved the release tree to the /usr/local/ area using the root account and created a simple symbolic link to the release called h2o. This means that our H2O-based build can refer to this link, and it doesn't need to change as new versions of Sparkling Water are sourced. We have also made sure, using the Linux chown command, that our development account, hadoop, owns the release and so has access to it:

[hadoop@hc2r1m2 h2o]$ su -
[root@hc2r1m2 ~]# cd /home/hadoop/h2o
[root@hc2r1m2 h2o]# mv sparkling-water-0.2.12-92 /usr/local
[root@hc2r1m2 h2o]# cd /usr/local

[root@hc2r1m2 local]# chown -R hadoop:hadoop sparkling-water-0.2.12-92
[root@hc2r1m2 local]# ln -s sparkling-water-0.2.12-92 h2o

[root@hc2r1m2 local]# ls -lrt | grep sparkling
drwxr-xr-x 6 hadoop hadoop 4096 Mar 28 02:27 sparkling-water-0.2.12-92
lrwxrwxrwx 1 root root 25 Apr 11 12:43 h2o -> sparkling-water-0.2.12-92
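
The benefit of the link is easiest to see when a newer release arrives. The following is a minimal sketch of that upgrade step, not the exact procedure we followed: it works in a throwaway temporary directory rather than /usr/local, and it uses empty directories as stand-ins for the two unpacked release trees (0.2.12-92 and the later 0.2.12-95). The variable names are ours, chosen for illustration.

```shell
#!/bin/sh
# Sketch only: runs in a scratch directory, not /usr/local.
set -eu

workdir="$(mktemp -d)"

# Stand-ins for two unpacked release trees (empty directories here).
mkdir "$workdir/sparkling-water-0.2.12-92" "$workdir/sparkling-water-0.2.12-95"

# Initial install: the stable name h2o points at the first release.
# The target is relative, so it is resolved from the link's own directory.
ln -s sparkling-water-0.2.12-92 "$workdir/h2o"
first_target="$(readlink "$workdir/h2o")"
echo "h2o -> $first_target"

# Upgrade: repoint the link. -f replaces the existing link, and -n stops ln
# from following h2o and creating the new link inside the old release tree.
ln -sfn sparkling-water-0.2.12-95 "$workdir/h2o"
second_target="$(readlink "$workdir/h2o")"
echo "h2o -> $second_target"
```

Anything that refers to the h2o path picks up the new release after the second ln command, and the old tree stays in place in case a rollback is needed.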

The release has been installed on all the nodes of our Hadoop clusters.
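
We do not list the per-node commands here. As one possible approach, and purely as an assumption on our part, the same steps could be repeated from a single machine with scp and ssh. The sketch below is a dry run: it only prints the commands it would issue, and the host names node1 to node3 are hypothetical placeholders, as is the use of root logins over SSH.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
set -eu

release="sparkling-water-0.2.12-92"   # release tree name used in this section
nodes="node1 node2 node3"             # hypothetical host names

# Emit the per-node commands: copy the tree, fix ownership, recreate the link.
plan_install() {
  for node in $nodes; do
    echo "scp -r /usr/local/${release} root@${node}:/usr/local/"
    echo "ssh root@${node} 'chown -R hadoop:hadoop /usr/local/${release} && cd /usr/local && ln -sfn ${release} h2o'"
  done
}

plan="$(plan_install)"
echo "$plan"
```

Reviewing the printed plan before running anything as root is a cheap safeguard; once it looks right, the echo wrappers can be dropped.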
