PySpark

PySpark uses a Python-based SparkContext and Python scripts as tasks, and it communicates between the Java-based Spark cluster and those Python scripts over sockets and pipes to the spawned Python processes. PySpark also ships with Py4J, a popular library integrated into PySpark that lets Python interface dynamically with Java-based RDDs.

Python must be installed on all worker nodes running the Spark executors.
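To make the Py4J bridge concrete, here is a minimal sketch (the app name is illustrative). The underscore-prefixed _jvm attribute on SparkContext is an internal handle rather than public API, but it shows Python calling straight into the JVM through the Py4J gateway:

from pyspark.sql import SparkSession

# Runs in the Python driver: building the SparkSession starts (or
# connects to) a JVM and opens a Py4J gateway to it.
spark = SparkSession.builder.master("local[1]").appName("py4j-bridge").getOrCreate()
sc = spark.sparkContext

# _jvm is PySpark's internal Py4J gateway object; the call below
# executes java.lang.System.getProperty on the Java side and returns
# the result to Python.
print(sc._jvm.java.lang.System.getProperty("java.version"))

spark.stop()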

The following is how PySpark works, communicating between Java processes and Python scripts:

1. The Python driver script creates a SparkContext, which starts (or connects to) a JVM and drives the Java SparkContext through the Py4J gateway.
2. Python functions passed to transformations such as map() are pickled on the driver and shipped to the executors as part of each task.
3. On every worker node, the JVM executor spawns Python worker processes and exchanges serialized records with them over local sockets and pipes; the Python workers apply the functions and stream the results back.
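A short sketch of this division of labor follows (the app name and data are illustrative). The driver code below runs in Python and talks to the JVM through Py4J; the lambda passed to map() is pickled, shipped to the executors, and executed inside Python worker processes on the worker nodes:

from pyspark.sql import SparkSession

# Step 1: runs in the Python driver; starts a JVM and connects to it
# through Py4J.
spark = (SparkSession.builder
         .master("local[2]")
         .appName("pyspark-flow-demo")
         .getOrCreate())
sc = spark.sparkContext

# parallelize() hands the data to the Java SparkContext via Py4J.
rdd = sc.parallelize(range(1, 6))

# Steps 2 and 3: the lambda is pickled on the driver, shipped with
# the task, and executed by Python worker processes spawned by the
# JVM executors; records cross a local socket/pipe in each direction.
doubled = rdd.map(lambda x: x * 2).collect()
print(doubled)  # [2, 4, 6, 8, 10]

spark.stop()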
