Now that you have the tools to build and test Spark jobs as well as set up a Spark cluster to run them on, it's time to figure out how to make the most of your time as a Spark developer.
Spark and Shark produce very useful logs for figuring out what's going on when things are not behaving as expected. When working with a program that uses sql2rdd or any other Shark-related tool, a good place to start debugging is by looking at which HiveQL queries are being run. You should find these in the console logs of the machine where you launch the Spark program: look for a line such as Hive history file=/tmp/spark/hive_job_log_spark_201306090132_919529074.txt.
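To see this in action, here is a minimal sketch of the kind of program involved, assuming the Shark shell (where sc is already a SharkContext) and a hypothetical users table; running it prints the HiveQL being executed, along with the Hive history file line, to the console:

// A minimal sketch from the Shark shell, where sc is a SharkContext.
// The table and column names here are hypothetical.
val users = sc.sql2rdd("SELECT user_id, name FROM users")

// Calling an action materializes the result; watch the console output
// for the HiveQL trace and the "Hive history file=..." line.
println(users.count())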
Spark also keeps a per-machine log on each machine, by default in the logs subdirectory of the Spark directory. Spark's web UI provides a convenient place to view the stdout and stderr files of each job, with separate output kept per worker for both running and completed jobs.
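Knowing which output lands where is useful when adding debug statements: anything printed on the driver appears in the console where you launched the job, while anything printed inside a closure runs on the workers and ends up in their stdout files, viewable per worker in the web UI. The following sketch illustrates the split; the master URL and application name are hypothetical, and the import path assumes a recent Spark release:

import org.apache.spark.SparkContext

object WhereLogsGo {
  def main(args: Array[String]) {
    // Hypothetical standalone master URL; adjust for your cluster.
    val sc = new SparkContext("spark://master:7077", "WhereLogsGo")

    // Runs on the driver: appears in the console where the job was launched.
    println("Driver-side message")

    sc.parallelize(1 to 4).foreach { x =>
      // Runs inside a closure on the workers, so it lands in each worker's
      // stdout file, linked from Spark's web UI.
      println("Worker-side message for element " + x)
    }

    sc.stop()
  }
}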