Chapter 9. Tips and Tricks

Now that you have the tools to build and test Spark jobs as well as set up a Spark cluster to run them on, it's time to figure out how to make the most of your time as a Spark developer.

Where to find logs?

Spark and Shark produce very useful logs for figuring out what is going on when things are not behaving as expected. When working with a program that uses sql2rdd or any other Shark-related tool, a good place to start debugging is by looking at the HiveQL queries being run. You should find these in the console output of your Spark program: look for a line such as Hive history file=/tmp/spark/hive_job_log_spark_201306090132_919529074.txt. Spark also keeps a per-machine log on each machine, by default in the logs subdirectory of the Spark directory. Spark's web UI provides a convenient place to see the stdout and stderr of each job, separated out per worker for both running and completed jobs.
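If you prefer to inspect the log files directly from a shell, the following is a minimal sketch. The paths and file-name patterns shown here (the SPARK_HOME location, the hive_job_log timestamp, and the worker log names) are assumptions based on a default standalone installation and may differ on your cluster:

    # Shark/Hive query history written when using sql2rdd
    # (the timestamped file name will vary per run)
    cat /tmp/spark/hive_job_log_spark_*.txt

    # Per-machine logs, by default under the logs subdirectory of the Spark directory
    ls $SPARK_HOME/logs/

    # Follow a worker's log as a job runs
    # (the exact file name depends on the user and host names)
    tail -f $SPARK_HOME/logs/*worker*.out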
