Properties of the broadcast variable

Broadcast variables have the following properties:

  • Broadcast variables are read-only: Broadcast variables are immutable; once initialized, their value cannot be changed.
  • Broadcast variables are cached in executor memory: A broadcast variable is sent to each executor only once and cached in its memory. It is not shipped again with every task that uses it; by default, an executor fetches it the first time one of its tasks reads it. This improves performance, because the lookup values do not have to be serialized and sent to the transformation/action functions each time they are referenced.
  • Should fit in executor's memory: A broadcast variable should be small enough to fit into each executor's memory. Broadcast variables are meant for small lookup values that can be distributed across the cluster.

Let's start by creating a broadcast variable using JavaSparkContext:

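import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
SparkConf conf = new SparkConf().setMaster("local").setAppName("Broadcast Example");
JavaSparkContext jsc = new JavaSparkContext(conf);
Broadcast<String> broadcastVar = jsc.broadcast("Hello Spark");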

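The same broadcast variable can be created using a SparkSession. SparkContext's broadcast method is a Scala API, so when it is called from Java it also needs a ClassTag describing the value's type: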

import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.SparkSession;

SparkSession sparkSession = SparkSession.builder()
        .master("local")
        .appName("My App")
        .config("spark.sql.warehouse.dir", "file:////C:/Users/sgulati/spark-warehouse")
        .getOrCreate();
Broadcast<String> broadcastVar = sparkSession.sparkContext()
        .broadcast("Hello Spark", scala.reflect.ClassTag$.MODULE$.apply(String.class));
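You can avoid building a ClassTag by hand by wrapping the session's SparkContext in a JavaSparkContext, whose broadcast method takes only the value. The following is a minimal sketch under that approach. It assumes Spark's SQL module is on the classpath and leaves out the warehouse setting. The class and method names (`SessionBroadcastExample`, `broadcastGreeting`) are hypothetical.

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.SparkSession;

// Hypothetical example class: broadcasts through a JavaSparkContext
// that wraps the session's existing SparkContext.
public class SessionBroadcastExample {

    // Hypothetical helper: no ClassTag is needed with the Java wrapper.
    public static Broadcast<String> broadcastGreeting(SparkSession sparkSession) {
        // Wraps (does not copy) the session's SparkContext.
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sparkSession.sparkContext());
        return jsc.broadcast("Hello Spark");
    }

    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder()
                .master("local")
                .appName("My App")
                .getOrCreate();
        Broadcast<String> broadcastVar = broadcastGreeting(sparkSession);
        System.out.println(broadcastVar.value());
        // Stopping the session also stops the shared SparkContext.
        sparkSession.stop();
    }
}
```

The wrapper shares the session's SparkContext, so the sketch deliberately does not close it. Closing a JavaSparkContext stops the underlying SparkContext, which would also shut down the session.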

The value of the broadcast variable can be read as follows:

broadcastVar.value(); // public accessor; fails fast if the variable has been destroyed
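In practice, the value is usually read inside a transformation, where every task on an executor uses that executor's cached copy. The following is a minimal sketch of such a lookup, assuming a local Spark setup. The class `BroadcastLookupExample`, the method `expandCodes` and the country-code map are hypothetical illustrations.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

// Hypothetical example class illustrating a broadcast lookup table.
public class BroadcastLookupExample {

    // Hypothetical helper: replaces each country code with a full name
    // using a small, read-only map that is broadcast once.
    public static List<String> expandCodes(JavaSparkContext jsc, List<String> codes) {
        // Small lookup table built on the driver (HashMap is Serializable).
        Map<String, String> countryNames = new HashMap<>();
        countryNames.put("US", "United States");
        countryNames.put("FR", "France");
        countryNames.put("IN", "India");

        // Each executor caches one copy instead of receiving it with every task.
        Broadcast<Map<String, String>> lookup = jsc.broadcast(countryNames);

        JavaRDD<String> rdd = jsc.parallelize(codes);
        // Tasks only read the broadcast value; codes not in the map become "unknown".
        return rdd.map(code -> lookup.value().getOrDefault(code, "unknown"))
                  .collect();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("Broadcast Lookup");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            System.out.println(expandCodes(jsc, Arrays.asList("US", "IN", "XX")));
        }
    }
}
```

The lambda captures only the small Broadcast handle, not the map itself. The map is therefore serialized once per executor rather than once per task.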