Debugging Spark application using SBT

The preceding setup works mostly in Eclipse or IntelliJ with a Maven project. Suppose that you have already finished writing your application in your preferred IDE, such as IntelliJ or Eclipse, as follows:

import org.apache.spark.sql.SparkSession

object DebugTestSBT {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "C:/Exp/")
      .appName("Logging")
      .getOrCreate()

    spark.sparkContext.setCheckpointDir("C:/Exp/")

    println("-------------Attach debugger now!--------------")
    Thread.sleep(8000)
    // code goes here, with breakpoints set on the lines where you want to pause
  }
}

Now, if you want to run this job on the local cluster (standalone), the very first step is to package the application with all its dependencies into a fat JAR. To do this, use the following command:

$ sbt assembly
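The assembly command assumes the sbt-assembly plugin is configured. A minimal setup might look like the following; the plugin version, Spark version, and merge strategy here are illustrative assumptions, not prescriptions:

```scala
// project/plugins.sbt -- registers the sbt-assembly plugin (version is an example)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")
```

```scala
// build.sbt -- mark Spark as "provided" so its classes are not bundled into
// the fat JAR; spark-submit supplies them on the classpath at runtime
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"

// resolve duplicate META-INF entries that commonly clash during assembly
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", _ @ _*) => MergeStrategy.discard
  case _                            => MergeStrategy.first
}
```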

This will generate the fat JAR. The next task is to submit the Spark job to the local cluster. You need to have the spark-submit script somewhere on your system:

$ export SPARK_JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

The preceding command exports a Java option that will be used to start Spark with the JDWP debug agent listening on port 5005:

$ SPARK_HOME/bin/spark-submit --class DebugTestSBT --master local[*] --driver-memory 4G --executor-memory 4G /path/project-assembly-0.0.1.jar
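Note that SPARK_JAVA_OPTS is deprecated in more recent Spark releases; on those versions, the same agent string can instead be passed through the spark.driver.extraJavaOptions configuration property. The following command is a sketch using the same illustrative class and paths:

```
$ SPARK_HOME/bin/spark-submit \
    --class DebugTestSBT \
    --master local[*] \
    --driver-memory 4G \
    --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005" \
    /path/project-assembly-0.0.1.jar
```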

In the preceding command, --class needs to point to the fully qualified class name of your job (here, DebugTestSBT). Upon successful execution of this command, your Spark job will run without pausing at the breakpoints, because no debugger is attached yet. To get the debugging facility in your IDE, say IntelliJ IDEA, you need to configure it to connect to the cluster. For details on attaching the IntelliJ IDEA debugger to a running Java process, refer to http://stackoverflow.com/questions/21114066/attach-intellij-idea-debugger-to-a-running-java-process.

It is to be noted that if you just create a default remote run/debug configuration and leave the default port of 5005, it should work fine. The next time you submit the job and see the message to attach the debugger, you have eight seconds to switch to IntelliJ IDEA and trigger this run configuration. The program will then continue to execute and pause at any breakpoint you defined. You can then step through it like any normal Scala/Java program. You can even step into Spark functions to see what they do under the hood.
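The fixed eight-second sleep is only a convenience, and it is easy to miss the window. A more forgiving variant, sketched below as an assumption rather than part of the original example, blocks until you press Enter after attaching the debugger:

```scala
import scala.io.StdIn

object DebugWait {
  def main(args: Array[String]): Unit = {
    // Instead of a fixed Thread.sleep(8000) window, block until the user
    // confirms the debugger is attached; readLine waits indefinitely.
    println("-------------Attach debugger, then press Enter--------------")
    StdIn.readLine()
    // breakpointed code goes here
  }
}
```

This removes the race between submitting the job and triggering the IDE run configuration, at the cost of requiring an interactive console.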
