Destroying broadcast variables

You can also destroy broadcast variables, removing them completely from all executors and the Driver and making them inaccessible. This can be helpful for managing resources optimally across the cluster.

Calling destroy() on a broadcast variable removes all data and metadata related to that broadcast variable. Once a broadcast variable has been destroyed, it cannot be used again; the value has to be broadcast anew as a fresh variable.

The following is an example of destroying a broadcast variable:

scala> val rdd_one = sc.parallelize(Seq(1,2,3))
rdd_one: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[101] at parallelize at <console>:25

scala> val k = 5
k: Int = 5

scala> val bk = sc.broadcast(k)
bk: org.apache.spark.broadcast.Broadcast[Int] = Broadcast(163)

scala> rdd_one.map(j => j + bk.value).take(5)
res184: Array[Int] = Array(6, 7, 8)

scala> bk.destroy

If an attempt is made to use a destroyed broadcast variable, an exception is thrown.

The following is an example of an attempt to reuse a destroyed broadcast variable:

scala> rdd_one.map(j => j + bk.value).take(5)
17/05/27 14:07:28 ERROR Utils: Exception encountered
org.apache.spark.SparkException: Attempted to use Broadcast(163) after it was destroyed (destroy at <console>:30)
at org.apache.spark.broadcast.Broadcast.assertValid(Broadcast.scala:144)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$writeObject$1.apply$mcV$sp(TorrentBroadcast.scala:202)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$wri...
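
Since a destroyed broadcast variable cannot be revived, the only option is to broadcast the value again as a new variable. The following is a minimal sketch of that pattern, reusing sc and rdd_one from the session above; the try/catch around the failing action is illustrative and assumes the failure surfaces on the Driver as a SparkException:

import org.apache.spark.SparkException

val bk = sc.broadcast(5)
rdd_one.map(j => j + bk.value).collect()   // all jobs using bk complete here
bk.destroy()                               // removes data and metadata everywhere

// Any further use of bk fails; the exception type is assumed here
try {
  rdd_one.map(j => j + bk.value).collect()
} catch {
  case e: SparkException => println(s"bk is no longer usable: ${e.getMessage}")
}

// To use the value again, create a brand new broadcast variable
val bk2 = sc.broadcast(5)
rdd_one.map(j => j + bk2.value).collect()  // works as before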

Thus, the broadcast functionality can be used to greatly improve the flexibility and performance of Spark jobs.
