Loading and saving datasets

To do anything practical with our code, we need to read input data into the cluster and write the output, or results, back to storage. Input data can be read from a variety of sources such as files, Amazon S3 storage, databases, NoSQL stores, and Hive, and the output can similarly be saved to files, S3, databases, Hive, and so on.

Several systems support Spark via a connector, and the number keeps growing as more systems integrate with the Spark processing framework.
