The build environment

From past examples, you know that we favor sbt as a build tool for developing Scala source examples.

We have created a development environment on the Linux server called hc2r1m2 using the Hadoop development account. The development directory is called h2o_spark_1_2:

[hadoop@hc2r1m2 h2o_spark_1_2]$ pwd
 /home/hadoop/spark/h2o_spark_1_2

Our sbt build configuration file, h2o.sbt, is located here and contains the following:

 [hadoop@hc2r1m2 h2o_spark_1_2]$ more h2o.sbt
 
 name := "H 2 O"
 
 version := "1.0"
 
 scalaVersion := "2.10.4"
 
 libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0"
 
 libraryDependencies += "org.apache.spark" % "spark-core"  % "1.2.0" from "file:///usr/hdp/2.6.0.3-8/spark/lib/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar"
 
 libraryDependencies += "org.apache.spark" % "mllib"  % "1.2.0" from "file:///usr/hdp/2.6.0.3-8/spark/lib/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar"
 
 libraryDependencies += "org.apache.spark" % "sql"  % "1.2.0" from "file:///usr/hdp/2.6.0.3-8/spark/lib/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar"
 
 libraryDependencies += "org.apache.spark" % "h2o"  % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"
 
 libraryDependencies += "hex.deeplearning" % "DeepLearningModel"  % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"
 
 libraryDependencies += "hex" % "ModelMetricsBinomial"  % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"
 
 libraryDependencies += "water" % "Key"  % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"
 
 libraryDependencies += "water" % "fvec"  % "0.2.12-95" from "file:///usr/local/h2o/assembly/build/libs/sparkling-water-assembly-0.2.12-95-all.jar"

We provided sbt configuration examples in the previous chapters, so we won't go into line-by-line detail here. We have used file-based URLs to define the Spark and Sparkling Water library dependencies, sourcing the Spark assembly JAR file from the HDP installation under /usr/hdp.

The Sparkling Water JAR file is sourced from under /usr/local/h2o/, the directory that was just created.
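
Because the configuration relies on file-based URLs, the build can only resolve these dependencies if the JAR files exist on the build host. As a minimal sketch, the following shell helpers list the local JAR paths referenced in an sbt file and report any that are missing. The function names sbt_file_jars and missing_jars are hypothetical and are not part of the original environment.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original build environment).

# Print the distinct local paths referenced through file:// URLs in an sbt file.
sbt_file_jars() {
  grep -o 'file://[^"]*' "$1" | sed 's|^file://||' | sort -u
}

# Print each referenced path that does not exist as a regular file.
missing_jars() {
  sbt_file_jars "$1" | while IFS= read -r jar; do
    [ -f "$jar" ] || echo "$jar"
  done
}

# Example: run the check only when h2o.sbt is present in the current directory.
if [ -f h2o.sbt ]; then
  missing_jars h2o.sbt
fi
```

If the check prints nothing, every file-based dependency was found; any path it prints needs to be corrected in h2o.sbt before building.
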
We use a Bash script called run_h2o.bash within this development directory to execute our H2O-based example code. It takes the application class name as a parameter and is shown as follows:

[hadoop@hc2r1m2 h2o_spark_1_2]$ more run_h2o.bash
 
 #!/bin/bash
 
 SPARK_HOME=/usr/hdp/current/spark-client
 SPARK_LIB=$SPARK_HOME/lib
 SPARK_BIN=$SPARK_HOME/bin
 SPARK_SBIN=$SPARK_HOME/sbin
 SPARK_JAR=$SPARK_LIB/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar
 
 H2O_PATH=/usr/local/h2o/assembly/build/libs
 H2O_JAR=$H2O_PATH/sparkling-water-assembly-0.2.12-95-all.jar
 
 PATH=$SPARK_BIN:$PATH
 PATH=$SPARK_SBIN:$PATH
 export PATH
 
 cd $SPARK_BIN
 
 ./spark-submit \
   --class $1 \
   --master spark://hc2nn.semtech-solutions.co.nz:7077 \
   --executor-memory 512m \
   --total-executor-cores 50 \
   --jars $H2O_JAR \
   /home/hadoop/spark/h2o_spark_1_2/target/scala-2.10/h-2-o_2.10-1.0.jar

This example of Spark application submission has already been covered, so again, we won't go into detail. Setting the executor memory to a suitable value is critical for avoiding out-of-memory issues and performance problems. This is examined in the Performance tuning section.
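
The application JAR path on the last line of the script follows from the settings in h2o.sbt. To our understanding, sbt lowercases the project name and replaces runs of non-word characters with a hyphen, then appends the Scala binary version and the project version. The following sketch, whose function name artifact_jar is hypothetical, reproduces that naming in shell under those assumptions:

```shell
#!/bin/sh
# Hypothetical helper: derive the packaged JAR file name from the
# sbt name, version, and scalaVersion settings (assumed naming rule).
artifact_jar() {
  aj_name=$1
  aj_version=$2
  aj_scala=$3
  # Lowercase the name and collapse runs of non-word characters into one hyphen.
  aj_normalized=$(printf '%s' "$aj_name" | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9_][^a-z0-9_]*/-/g')
  # Keep only the major.minor part of the Scala version (2.10.4 becomes 2.10).
  aj_binary=$(printf '%s' "$aj_scala" | cut -d. -f1,2)
  printf '%s_%s-%s.jar\n' "$aj_normalized" "$aj_binary" "$aj_version"
}

# Values taken from h2o.sbt; prints h-2-o_2.10-1.0.jar
artifact_jar "H 2 O" "1.0" "2.10.4"
```

This is why a project named "H 2 O" is submitted as h-2-o_2.10-1.0.jar; changing the name, version, or scalaVersion in h2o.sbt means that the path in run_h2o.bash must be updated to match.
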
As in the previous examples, the application Scala code is located in the src/main/scala subdirectory under the development directory. The next section will examine the Apache Spark and H2O architectures.
