Summary

We have now covered the basics of interacting with data in several commonly used storage mechanisms: from simple ones, such as text files, through more structured ones, such as HDF5, to more sophisticated data storage systems, such as MongoDB and Redis. The most suitable type of storage depends on your use case, and the choice of data storage technology plays an important role in the overall design of a data processing system. Sometimes we need to combine several database systems to store our data, depending on factors such as the complexity of the data, the required performance of the system, or the computational requirements.

Practice exercises

  • Take a data set of your choice and design storage options for it. Consider text files, HDF5, a document database, and a data structure store as possible persistent options. Also evaluate how difficult it would be (by some metric, for example, the number of lines of code needed) to update or delete a specific item. Which storage type is the easiest to set up? Which storage type supports the most flexible queries?
  • In Chapter 3, Data Analysis with Pandas, we saw that it is possible to create hierarchical indices with Pandas. As an example, assume that you have data on each city with more than 1 million inhabitants and that you have a two-level index, so you can address individual cities but also whole countries. How would you represent this hierarchical relationship with the various storage options presented in this chapter: text files, HDF5, MongoDB, and Redis? What do you believe would be most convenient to work with in the long run?
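As a starting point for these exercises, here is a minimal sketch, using only the Python standard library, of three ways to lay out a two-level country/city index: as two plain columns in a CSV text file, as nested per-country documents in the style of MongoDB, and as flat key names such as `city:<country>:<city>` in the style of Redis. It is one possible design under stated assumptions, not the book's solution. The records, the key naming scheme, and all function names are hypothetical and chosen for illustration; HDF5 is left out because it would require a third-party library. The `delete_city_from_csv` helper also hints at the first exercise: removing a single item from a text file means rewriting the whole file.

```python
import csv
import io
import json

# Hypothetical sample records (country, city, population); the figures are
# placeholders for illustration, not real census data.
RECORDS = [
    ("Country A", "City A1", 1_500_000),
    ("Country A", "City A2", 2_300_000),
    ("Country B", "City B1", 1_100_000),
]


def to_csv(records):
    """Text file: the two index levels become two ordinary columns."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["country", "city", "population"])
    writer.writerows(records)
    return buf.getvalue()


def to_documents(records):
    """Document store (MongoDB-like): one nested document per country."""
    docs = {}
    for country, city, population in records:
        doc = docs.setdefault(country, {"country": country, "cities": []})
        doc["cities"].append({"name": city, "population": population})
    return list(docs.values())


def to_flat_keys(records):
    """Key-value store (Redis-like): encode both index levels in the key."""
    return {f"city:{country}:{city}": population
            for country, city, population in records}


def cities_in_country(flat, country):
    """Address a whole country by filtering keys on a shared prefix."""
    prefix = f"city:{country}:"
    return sorted(key[len(prefix):] for key in flat if key.startswith(prefix))


def delete_city_from_csv(text, country, city):
    """Deleting one item from a text file means rewriting every row."""
    rows = list(csv.reader(io.StringIO(text)))
    kept = [row for row in rows if row[:2] != [country, city]]
    buf = io.StringIO()
    csv.writer(buf).writerows(kept)
    return buf.getvalue()


if __name__ == "__main__":
    print(cities_in_country(to_flat_keys(RECORDS), "Country A"))
    print(json.dumps(to_documents(RECORDS)[1]))
    print(delete_city_from_csv(to_csv(RECORDS), "Country A", "City A2"))
```

With a real Redis server, the prefix filter would typically be done with `SCAN` and a `MATCH` pattern rather than a Python loop, and in MongoDB the nested cities could be queried with dot notation on `cities.name`. Which layout is most convenient in the long run depends on whether you mostly read whole countries, individual cities, or both.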