Summary

In this chapter, we learned how to save data in plain text format and saw that schema information is lost unless we restore it explicitly when loading. We then learned how to leverage JSON as a data format and saw that JSON retains the schema, but at the cost of significant overhead, because the field names are repeated for every record. Next, we looked at CSV, for which Spark has built-in support; the disadvantage of this approach is that the schema carries no type information, so the types of records need to be inferred implicitly. Toward the end of this chapter, we covered Avro and Parquet, two binary formats that Spark also supports out of the box: Avro is row-oriented, while Parquet is columnar.
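The JSON-versus-CSV trade-off above can be seen without Spark at all. The following is a minimal sketch using only the Python standard library (not the Spark API from this chapter): with JSON Lines, every record repeats the field names, while with CSV the header appears once but every value comes back as an untyped string.

```python
import csv
import io
import json

# Three records that share the same two fields.
records = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 25},
    {"name": "carol", "age": 41},
]

# JSON Lines: each row is self-describing, so the field names
# ("name", "age") are stored again for every single record.
json_lines = "\n".join(json.dumps(r) for r in records)
print(json_lines)

# CSV: the field names appear only once, in the header row...
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

# ...but on the way back in, every value is just a string, so the
# "age" column's integer type has to be inferred after the fact.
reader = csv.DictReader(io.StringIO(csv_text))
first = next(reader)
print(type(first["age"]))  # age is read back as a str, not an int
```

This is exactly why Spark's CSV reader offers schema inference: the file format itself records no types.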

In the next chapter, we'll be working with Spark's key/value API.
