Installing Shark

As of this writing, the latest version of Shark is v0.7.0. It requires Spark 0.7.2 as well as a very recent JVM (OpenJDK 7 or Oracle HotSpot JDK 7). Shark is available prebuilt for both Hadoop 1 and Hadoop 2; at the time of writing, the respective files are http://spark-project.org/download/shark-0.7.0-hadoop1-bin.tgz and http://spark-project.org/download/shark-0.7.0-hadoop2-bin.tgz. Once you have downloaded and extracted Shark, it's time to configure it. In this example, we assume that you extracted it into /home/spark/. Shark's configuration is separate from Spark's and lives in shark-0.7.0/conf/shark-env.sh. For local mode, you need to set at least HIVE_HOME and SPARK_HOME, like so:

export HIVE_HOME=/home/spark/hive-0.9.0-bin
export SPARK_HOME=/home/spark/spark-0.7.2
source $SPARK_HOME/conf/spark-env.sh
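Before launching Shark, it can help to confirm that these variables point at real directories. The following is a minimal sketch, not part of Shark itself: check_shark_env is a hypothetical Bash helper, and it assumes shark-env.sh has already been sourced into the current shell.

```shell
#!/usr/bin/env bash
# Sketch only: check_shark_env is a hypothetical helper, not part of Shark.
# It assumes shark-env.sh has already been sourced into this Bash shell.
check_shark_env() {
  local status=0 var dir
  for var in HIVE_HOME SPARK_HOME; do
    dir="${!var}"   # Bash indirect expansion: value of the variable named $var
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var=$dir is not a directory" >&2
      status=1
    fi
  done
  # shark-env.sh sources this file, so it should exist under SPARK_HOME
  if [ -n "$SPARK_HOME" ] && [ ! -f "$SPARK_HOME/conf/spark-env.sh" ]; then
    echo "$SPARK_HOME/conf/spark-env.sh not found" >&2
    status=1
  fi
  return "$status"
}
```

For example: source shark-0.7.0/conf/shark-env.sh && check_shark_env && echo "environment looks OK". The helper prints one message per problem, so a single run reports every missing piece rather than stopping at the first.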

In local mode, you also need to create a directory where Hive stores its files; by default, this is /user/hive/warehouse. Use the chown command to make the directory accessible to your user, like so:

mkdir -p /user/hive/warehouse && chown [your-spark-user] /user/hive/warehouse
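A quick way to confirm that the chown step worked is to test, as the user who will run Shark, whether the directory exists and is writable. This is a sketch using a hypothetical helper, check_warehouse, which defaults to the /user/hive/warehouse path mentioned above; run it from the Spark user's shell, because a root shell passes the write check regardless of ownership.

```shell
#!/usr/bin/env bash
# Sketch only: check_warehouse is a hypothetical helper, not part of Hive or Shark.
# Run it as the user that will start Shark; root passes -w regardless of ownership.
check_warehouse() {
  local dir="${1:-/user/hive/warehouse}"   # Hive's default warehouse path
  if [ ! -d "$dir" ]; then
    echo "$dir does not exist" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "$dir is not writable by $(id -un)" >&2
    return 1
  fi
  echo "$dir is ready"
}
```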

If you are using Shark with a Spark cluster, you also need to set the MASTER and HADOOP_HOME variables. If you are using Shark with an existing Hive installation, you must set HIVE_CONF_DIR to the directory containing Hive's XML configuration files. If you add these lines after the source $SPARK_HOME/conf/spark-env.sh line, you can reference variables defined in the Spark configuration, such as SPARK_MASTER_IP:

export HADOOP_HOME=/path/to/hadoop
export MASTER=spark://$SPARK_MASTER_IP:7077
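Putting the pieces together, a cluster-mode shark-env.sh might look like the sketch below. It assumes that spark-env.sh defines SPARK_MASTER_IP and that the master listens on port 7077, as in the line above. The paths /path/to/hadoop and /path/to/hive/conf are placeholders, and HIVE_CONF_DIR is only needed when reusing an existing Hive installation.

```shell
# shark-0.7.0/conf/shark-env.sh (cluster-mode sketch; adjust every path)
export HIVE_HOME=/home/spark/hive-0.9.0-bin
export SPARK_HOME=/home/spark/spark-0.7.2
source $SPARK_HOME/conf/spark-env.sh

# Placed after the source line so $SPARK_MASTER_IP from spark-env.sh is defined
export HADOOP_HOME=/path/to/hadoop
export MASTER=spark://$SPARK_MASTER_IP:7077

# Only when using an existing Hive installation (placeholder path)
export HIVE_CONF_DIR=/path/to/hive/conf
```

The ordering is the important part: anything that expands a variable from spark-env.sh has to come after the source line, or it will expand to an empty string.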

Once Shark is installed and configured, you also need to copy Shark and its custom version of Hive to all of the worker nodes. You can do this with pscp (part of the pssh tool set):

pscp -v -r -h ./spark-0.7.2/conf/slaves -l sparkuser ./shark-0.7.0 ~/
pscp -v -r -h ./spark-0.7.2/conf/slaves -l sparkuser ./hive-0.9.0-bin ~/
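pscp may not be installed everywhere. As a hedged alternative, the sketch below reads the same conf/slaves file and calls plain scp once per host and directory. It assumes passwordless SSH to every worker; read_slaves, copy_to_workers, and the DRY_RUN variable are hypothetical names for this sketch, not part of Spark or Shark.

```shell
#!/usr/bin/env bash
# Sketch only: read_slaves, copy_to_workers and DRY_RUN are hypothetical names.
# Assumes passwordless SSH from this machine to every worker.

# Print one host per line from a slaves file, skipping blanks and # comments.
read_slaves() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%%#*}"              # drop anything after a '#'
    line="${line//[[:space:]]/}"    # drop all whitespace
    if [ -n "$line" ]; then
      echo "$line"
    fi
  done < "$1"
}

# Usage: copy_to_workers SLAVES_FILE REMOTE_USER DIR...
# Copies each DIR into REMOTE_USER's home directory on every worker.
copy_to_workers() {
  local slaves="$1" user="$2" host dir
  shift 2
  for host in $(read_slaves "$slaves"); do
    for dir in "$@"; do
      if [ -n "$DRY_RUN" ]; then
        echo "scp -r $dir $user@$host:"     # print instead of copying
      else
        # An empty path after ':' means the remote user's home directory
        scp -r "$dir" "$user@$host:" || return 1
      fi
    done
  done
}
```

Running it with DRY_RUN=1 first only prints the scp commands, for example: DRY_RUN=1 copy_to_workers ./spark-0.7.2/conf/slaves sparkuser ./shark-0.7.0 ./hive-0.9.0-bin. Unlike pscp, the copies run one at a time, which is slower on large clusters but needs nothing beyond OpenSSH.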

If you are running on EC2, just use the latest AMI; it should already be set up for Shark.
