Broadcast variable

A broadcast variable is a read-only variable shared across all executor nodes. Once a variable is broadcast, a copy of it is cached in each executor's memory, where it can be referred to whenever it is needed during the execution of the program.

Broadcast variables are very useful when tasks in various stages of the program need to refer to the same data. Hadoop's distributed cache can hold a lookup table for a map-side join. In the same way, a Spark broadcast variable can keep lookup data available in each executor's memory.
