The build environment

As you know from past examples, we favor sbt as the build tool for developing the Scala source examples.

We have created a development environment on the Linux server hc2r1m2, using the hadoop Linux account for development. The development directory is called h2o_spark_1_2:

[hadoop@hc2r1m2 h2o_spark_1_2]$ pwd
/home/hadoop/spark/h2o_spark_1_2

Our sbt build configuration file, named h2o.sbt, is located here; it contains the following:

[hadoop@hc2r1m2 h2o_spark_1_2]$ more h2o.sbt

name := "H 2 O"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0"

libraryDependencies += "org.apache.spark" % "spark-core" % "1.2.0" from "file:///usr/hdp/2.6.0.3-8/spark/lib/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar"

libraryDependencies += "org.apache.spark" % "mllib" % "1.2.0" from "file:///usr/hdp/2.6.0.3-8/spark/lib/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar"

libraryDependencies += "org.apache.spark" % "sql" % "1.2.0" from "file:///usr/hdp/2.6.0.3-8/spark/lib/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar"

libraryDependencies += "org.apache.spark" % "h2o" % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"

libraryDependencies += "hex.deeplearning" % "DeepLearningModel" % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"

libraryDependencies += "hex" % "ModelMetricsBinomial" % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"

libraryDependencies += "water" % "Key" % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"

libraryDependencies += "water" % "fvec" % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"

We provided sbt configuration examples in the previous chapters, so we won't go into line-by-line detail here. Note that the Spark and Sparkling Water dependencies use file-based URLs via sbt's from directive, so each JAR is taken directly from the local file system; the group, artifact, and version strings on those lines are effectively nominal labels, and the actual classes come from the referenced assembly JARs. Only the hadoop-client dependency is resolved from a repository in the usual way.
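
With this configuration in place, the example code is compiled and packaged from the development directory using the standard sbt package command. A minimal sketch of the build step follows; the artifact name derives from the name, version, and scalaVersion settings above, since sbt normalizes the project name "H 2 O" to h-2-o:

[hadoop@hc2r1m2 h2o_spark_1_2]$ sbt package
[hadoop@hc2r1m2 h2o_spark_1_2]$ ls target/scala-2.10/
h-2-o_2.10-1.0.jar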

The Sparkling Water assembly JAR is referenced under /usr/local/h2o/, the directory created earlier.
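
As a quick sanity check, you can confirm that the assembly JAR the build file points at actually exists; the listing below simply reflects the path and file name already used in h2o.sbt:

[hadoop@hc2r1m2 h2o_spark_1_2]$ ls /usr/local/h2o/assembly/build/libs
sparkling-water-assembly-0.2.12-95-all.jar
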
We use a Bash script called run_h2o.bash within this development directory to execute our H2O-based example code. It takes the application class name as a parameter and is shown as follows:

[hadoop@hc2r1m2 h2o_spark_1_2]$ more run_h2o.bash

#!/bin/bash

SPARK_HOME=/usr/hdp/current/spark-client
SPARK_LIB=$SPARK_HOME/lib
SPARK_BIN=$SPARK_HOME/bin
SPARK_SBIN=$SPARK_HOME/sbin
SPARK_JAR=$SPARK_LIB/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar

H2O_PATH=/usr/local/h2o/assembly/build/libs
H2O_JAR=$H2O_PATH/sparkling-water-assembly-0.2.12-95-all.jar

PATH=$SPARK_BIN:$PATH
PATH=$SPARK_SBIN:$PATH
export PATH

cd $SPARK_BIN

./spark-submit \
  --class "$1" \
  --master spark://hc2nn.semtech-solutions.co.nz:7077 \
  --executor-memory 512m \
  --total-executor-cores 50 \
  --jars $H2O_JAR \
  /home/hadoop/spark/h2o_spark_1_2/target/scala-2.10/h-2-o_2.10-1.0.jar
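
The script is invoked with the application class name as its single argument. A hypothetical invocation is shown below; MyH2OApp is a placeholder for whatever class your application code defines:

[hadoop@hc2r1m2 h2o_spark_1_2]$ ./run_h2o.bash MyH2OApp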

This example of Spark application submission has already been covered, so again, we won't go into detail. Setting the executor memory to a suitable value proved critical for avoiding out-of-memory issues and performance problems; this is examined in the Performance tuning section.
As in the previous examples, the application Scala code is located in the src/main/scala subdirectory under the development directory. The next section examines the Apache Spark and H2O architectures.
