Deploying Spark with Chef (Opscode)

Chef is an open source automation platform that has become increasingly popular for deploying and managing both small and large clusters of machines. Chef can be used to control a traditional static fleet of machines, and it also works with EC2 and other cloud providers. Chef uses cookbooks as its basic building blocks of configuration; cookbooks can be either generic or site-specific. If you have not used Chef before, a good getting-started tutorial can be found at https://learnchef.opscode.com/. You can use a generic Spark cookbook as the basis for setting up your cluster.

To get Spark working, you need to create a role for both the master and the workers, as well as configure the workers to connect to the master. Start by getting the cookbook from https://github.com/holdenk/chef-cookbook-spark. The bare minimum is setting the hostname of the master (so that the worker nodes can connect) and the username (so that Chef installs Spark in the correct place). You will also need to either accept Sun's Java license or switch to an alternative JDK. Most of the settings that are available in spark-env.sh are also exposed through the cookbook's settings; they are explained in the Set of machines over SSH section, which covers configuring multiple hosts over SSH. The settings can be set per role, or you can modify the global defaults.
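For example, if the cookbook depends on the community java cookbook (an assumption worth verifying in the cookbook's metadata), you could sidestep the Oracle/Sun license prompt by overriding the Java attributes in your role to install OpenJDK instead. The install_flavor attribute shown here is the community java cookbook's switch, not something defined by the Spark cookbook itself:

"override_attributes": {
  "java": {
    "install_flavor": "openjdk"
  }
}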

To create a role for the master, use knife:

knife role create spark_master_role -e [editor]

This will bring up a template role file that you can edit. For a simple master, set it to:

{
  "name": "spark_master_role",
  "description": "",
  "json_class": "Chef::Role",
  "default_attributes": {
  },
  "override_attributes": {
    "username": "spark",
    "group": "spark",
    "home": "/home/spark/sparkhome",
    "master_ip": "10.0.2.15"
  },
  "chef_type": "role",
  "run_list": [
    "recipe[spark::server]",
    "recipe[chef-client]"
  ],
  "env_run_lists": {
  }
}
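If you prefer to keep role definitions in version control, you can also save the JSON to a file and upload it with knife rather than using the interactive editor (this is standard knife usage, not specific to this cookbook):

knife role from file roles/spark_master_role.json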

Then create a role for the workers in the same manner, except use the spark::client recipe instead of spark::server (a sketch of such a role follows the commands below). Deploy the roles to the different hosts:

knife node run_list add master 'role[spark_master_role]'
knife node run_list add worker 'role[spark_worker_role]'
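For reference, a minimal worker role might look like the following. This is a sketch: it assumes the worker role takes the same username, group, home, and master_ip attributes as the master role, with master_ip telling the workers where to find the master.

{
  "name": "spark_worker_role",
  "description": "",
  "json_class": "Chef::Role",
  "default_attributes": {
  },
  "override_attributes": {
    "username": "spark",
    "group": "spark",
    "home": "/home/spark/sparkhome",
    "master_ip": "10.0.2.15"
  },
  "chef_type": "role",
  "run_list": [
    "recipe[spark::client]",
    "recipe[chef-client]"
  ],
  "env_run_lists": {
  }
}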

Then run chef-client on your nodes to apply the new run lists. Congratulations, you now have a Spark cluster running!
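As a quick sanity check, point a browser at the standalone Spark master's web UI, which listens on port 8080 by default (with the example attributes above, that is http://10.0.2.15:8080); it should list each worker that has registered with the master. If you need to kick off the Chef run manually rather than waiting for the chef-client daemon, the command is simply:

sudo chef-client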
