17.9 Wrap-Up

In this chapter, we introduced big data, discussed how rapidly data is growing, and surveyed the hardware and software infrastructure for working with it. We introduced traditional relational databases and Structured Query Language (SQL) and used the sqlite3 module to create and manipulate a books database in SQLite. We also demonstrated loading SQL query results into pandas DataFrames.
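The sqlite3-and-pandas workflow can be sketched in a few lines. This is a minimal illustration, not the chapter's books database: it uses an in-memory SQLite database and a hypothetical titles table so it runs standalone.

```python
# Sketch of the sqlite3/pandas workflow: create a table, insert rows,
# then load a SQL query's results into a pandas DataFrame.
import sqlite3
import pandas as pd

connection = sqlite3.connect(':memory:')  # in-memory database for illustration
connection.execute(
    'CREATE TABLE titles (isbn TEXT PRIMARY KEY, title TEXT, copyright TEXT)')
connection.executemany(
    'INSERT INTO titles VALUES (?, ?, ?)',
    [('0135404673', 'Intro to Python', '2020'),
     ('0134444302', 'Java How to Program', '2018')])

# pandas can execute a query over an open DB-API connection
df = pd.read_sql('SELECT title, copyright FROM titles ORDER BY title',
                 connection)
print(df)
connection.close()
```

Passing the connection to `pd.read_sql` is what lets query results flow directly into a DataFrame for further analysis.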

We discussed the four major types of NoSQL databases—key–value, document, columnar and graph—and introduced NewSQL databases. We stored JSON tweet objects as documents in a cloud-based MongoDB Atlas cluster, then summarized them in an interactive visualization displayed on a Folium map.

We introduced Hadoop and how it’s used in big-data applications. You configured a multi-node Hadoop cluster using the Microsoft Azure HDInsight service, then created and executed a Hadoop MapReduce task using Hadoop streaming.
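Hadoop streaming works because the mapper and reducer are ordinary executables that read lines from standard input and write tab-separated key–value pairs to standard output. The word-count idea can be sketched as two plain Python functions; in a real job each would be a standalone script passed via `-mapper` and `-reducer`, and Hadoop (not `sorted`) would sort the pairs between phases. The function names here are illustrative.

```python
# Sketch of a streaming-style MapReduce word count as plain functions.
from itertools import groupby

def mapper(lines):
    """Emit a tab-separated (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f'{word.lower()}\t1'

def reducer(sorted_pairs):
    """Sum the counts per word; pairs arrive grouped by key after sorting."""
    keyed = (pair.split('\t') for pair in sorted_pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)

lines = ['Spark and Hadoop', 'Hadoop streaming']
counts = dict(reducer(sorted(mapper(lines))))
print(counts)  # 'hadoop' appears twice across the two lines
```

The `sorted` call stands in for Hadoop's shuffle-and-sort phase, which guarantees each reducer sees all values for a given key together.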

We discussed Spark and how it’s used in high-performance, real-time big-data applications. You used Spark’s functional-style filter/map/reduce capabilities, first on a Jupyter Docker stack that runs locally on your own computer, then again using a Microsoft Azure HDInsight multi-node Spark cluster. Next, we introduced Spark streaming for processing data in mini-batches. As part of that example, we used Spark SQL to query data stored in Spark DataFrames.
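The functional filter/map/reduce style that Spark's RDD API uses can be illustrated with Python's built-in `filter` and `map` and `functools.reduce`, no cluster required; a Spark version would chain the corresponding RDD methods the same way. This example is illustrative and does not use Spark itself.

```python
# Functional-style pipeline: keep the even numbers, square them, sum them.
from functools import reduce

values = range(1, 11)
squares_of_evens = map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, values))
total = reduce(lambda a, b: a + b, squares_of_evens)
print(total)  # 4 + 16 + 36 + 64 + 100 = 220
```

Each step consumes the previous step's output lazily, which mirrors how Spark defers computation until a terminal operation like reduce is called.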

The chapter concluded with an introduction to the Internet of Things (IoT) and the publish/subscribe model. You used Freeboard.io to create a dashboard visualization of a live sample stream from PubNub. You simulated an Internet-connected thermostat which published messages to the free dweet.io service using the Python module Dweepy, then used Freeboard.io to visualize the simulated device’s data. Finally, you subscribed to a PubNub sample live stream using their Python module.

The rich collection of exercises encourages you to work with more big-data cloud and desktop platforms, additional SQL and NoSQL databases, NewSQL databases and IoT platforms. You can work with Wikipedia as another big-data source, and you can implement IoT with the Raspberry Pi and Iotify simulators.

Thanks for reading Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud. We hope that you enjoyed the book and that you found it entertaining and informative. Most of all, we hope you feel empowered to apply the technologies you’ve learned to the challenges you’ll face as you continue your education and in your career.
