Destroying broadcast variables

You can also destroy broadcast variables, removing them completely from all executors and the Driver and making them inaccessible. This can be helpful for managing resources optimally across the cluster.

Calling destroy() on a broadcast variable removes all data and metadata related to that broadcast variable. Once a broadcast variable has been destroyed, it cannot be used again; the value has to be broadcast anew as a fresh variable.

The following is an example of destroying a broadcast variable:

scala> val rdd_one = sc.parallelize(Seq(1,2,3))
rdd_one: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[101] at parallelize at <console>:25

scala> val k = 5
k: Int = 5

scala> val bk = sc.broadcast(k)
bk: org.apache.spark.broadcast.Broadcast[Int] = Broadcast(163)

scala> rdd_one.map(j => j + bk.value).take(5)
res184: Array[Int] = Array(6, 7, 8)

scala> bk.destroy

If an attempt is made to use a destroyed broadcast variable, an exception is thrown.

The following is an example of an attempt to reuse a destroyed broadcast variable:

scala> rdd_one.map(j => j + bk.value).take(5)
17/05/27 14:07:28 ERROR Utils: Exception encountered
org.apache.spark.SparkException: Attempted to use Broadcast(163) after it was destroyed (destroy at <console>:30)
at org.apache.spark.broadcast.Broadcast.assertValid(Broadcast.scala:144)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$writeObject$1.apply$mcV$sp(TorrentBroadcast.scala:202)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$wri...
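
Since a destroyed broadcast variable cannot be revived, the only option is to broadcast the value again as a new variable. The following is a minimal sketch of that pattern, reusing sc and rdd_one from the session above; the try/catch around the failing action is illustrative and assumes the failure surfaces on the Driver as a SparkException:

import org.apache.spark.SparkException

val bk = sc.broadcast(5)
rdd_one.map(j => j + bk.value).collect()   // all jobs using bk complete here
bk.destroy()                               // removes data and metadata everywhere

// Any further use of bk fails; the exception type is assumed here
try {
  rdd_one.map(j => j + bk.value).collect()
} catch {
  case e: SparkException => println(s"bk is no longer usable: ${e.getMessage}")
}

// To use the value again, create a brand new broadcast variable
val bk2 = sc.broadcast(5)
rdd_one.map(j => j + bk2.value).collect()  // works as before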

Thus, the broadcast functionality can be used to greatly improve the flexibility and performance of Spark jobs.
