Data persistence

By default, Spark persists data in the Parquet file format unless the configuration spark.sql.sources.default is set to another format. A data source must be specified by its fully qualified name, except for the formats that are natively supported as of Spark 2.1 (JSON, Parquet, JDBC, ORC, LIBSVM, CSV, and text), which can be referred to by their short names. Although the native formats have built-in support in datasets, some limitations exist: saving in the text format supports only a single column, and saving in the ORC format requires Hive support to be enabled when the SparkSession is created. Data can be persisted using either of the options provided by a dataset, a format-specific method such as json() or the generic format(...).save(...) call:

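import org.apache.spark.sql.SaveMode;
// Format-specific shortcut method
deptDf.write().mode(SaveMode.Overwrite).json("DirectoryLocation");
// Generic form: the save mode can be given as a string or as a SaveMode constant
deptDf.write().mode("overwrite").format("csv").save("src/main/resources/output/deptText");
deptDf.write().mode(SaveMode.Overwrite).format("csv").save("DirectoryLocation");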

The save mode determines what happens when data already exists at the target location. SaveMode provides four options to handle this case; the default is error, which throws an exception if data is already present, as shown in the following table:

| SaveMode | Any language | Meaning |
| --- | --- | --- |
| SaveMode.ErrorIfExists | error | When saving a dataframe to a data source, if data already exists, an exception is expected to be thrown. |
| SaveMode.Append | append | When saving a dataframe to a data source, if data/table already exists, the contents of the dataframe are expected to be appended to existing data. |
| SaveMode.Overwrite | overwrite | When saving a dataframe to a data source, if data/table already exists, the existing data is expected to be overwritten by the contents of the dataframe. |
| SaveMode.Ignore | ignore | When saving a dataframe to a data source, if data already exists, the save operation is expected to not save the contents of the dataframe and to not change the existing data. This is similar to a CREATE TABLE IF NOT EXISTS in SQL. |
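
The four save modes differ only in what they do when the target already holds data. The following is a minimal sketch of that decision logic in plain Java, not Spark code: the class SaveModeSketch, its Mode enum, and the in-memory map standing in for storage are all hypothetical names introduced for illustration, and Spark's own implementation is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the four save modes. NOT Spark code: every name here is hypothetical.
class SaveModeSketch {
    enum Mode { ERROR_IF_EXISTS, APPEND, OVERWRITE, IGNORE }

    // Stands in for a storage location: path -> rows already saved there.
    private final Map<String, List<String>> store = new HashMap<>();

    void save(String path, List<String> rows, Mode mode) {
        List<String> existing = store.get(path);
        if (existing == null) {
            // Nothing at the target yet: every mode simply writes the rows.
            store.put(path, new ArrayList<>(rows));
            return;
        }
        switch (mode) {
            case ERROR_IF_EXISTS:
                // Default behaviour: refuse to touch existing data.
                throw new IllegalStateException("path already exists: " + path);
            case APPEND:
                // Keep the existing rows and add the new ones after them.
                existing.addAll(rows);
                break;
            case OVERWRITE:
                // Discard the existing rows and keep only the new ones.
                store.put(path, new ArrayList<>(rows));
                break;
            case IGNORE:
                // Leave the existing rows untouched and write nothing.
                break;
        }
    }

    List<String> read(String path) {
        return store.getOrDefault(path, new ArrayList<>());
    }

    public static void main(String[] args) {
        SaveModeSketch s = new SaveModeSketch();
        s.save("out/dept", Arrays.asList("Sales"), Mode.ERROR_IF_EXISTS);
        s.save("out/dept", Arrays.asList("HR"), Mode.APPEND);
        System.out.println(s.read("out/dept")); // prints [Sales, HR]
        s.save("out/dept", Arrays.asList("IT"), Mode.IGNORE);
        System.out.println(s.read("out/dept")); // prints [Sales, HR]
        s.save("out/dept", Arrays.asList("Ops"), Mode.OVERWRITE);
        System.out.println(s.read("out/dept")); // prints [Ops]
    }
}
```

In this model, as in the table, the mode only matters once the target exists; a first write to an empty location behaves the same under all four modes.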
