Database systems

Data storage implies the use of databases. Historically, these have been dominated by relational database management systems (RDBMS) that use SQL to store and retrieve data in a well-defined table format, with commercial providers such as Oracle and Microsoft and open-source implementations such as PostgreSQL and MySQL. More recently, alternatives have emerged that are often collectively labeled NoSQL but are quite diverse, namely:

  • Key-value storage: Fast read/write access to objects. We covered the HDF5 format in Chapter 2, Market and Fundamental Data, which facilitates fast access to a pandas DataFrame.
  • Columnar storage: Capitalizes on the homogeneity of data in a column to facilitate compression and faster column-based operations such as aggregation. Used in the popular Amazon Redshift data warehouse solution, Apache Parquet, Cassandra, and Google's Bigtable.
  • Document store: Designed to store data that defies the rigid schema definition required by an RDBMS. Popularized by web applications that use the JSON or XML formats that we encountered in Chapter 4, Alpha Factor Research. Used, for example, in MongoDB.
  • Graph database: Designed to store networks that have nodes and edges; specializes in queries about network metrics and relationships. Used in Neo4j and Apache Giraph.
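To make the differences between these four models more concrete, the following minimal sketch mimics each access pattern with plain Python data structures. It is an illustration only and does not use any of the systems named above; the tickers, field names, and values are made up for the example.

```python
import json

# Made-up records, used only for illustration.
records = [
    {"ticker": "AAA", "close": 10.0, "volume": 100},
    {"ticker": "BBB", "close": 20.0, "volume": 300},
    {"ticker": "CCC", "close": 30.0, "volume": 200},
]

# Key-value: one object per key; a lookup goes straight to the key.
key_value = {r["ticker"]: r for r in records}

# Columnar: one homogeneous list per column, so an aggregation reads
# only the values of the column it needs.
columnar = {name: [r[name] for r in records] for name in records[0]}
mean_close = sum(columnar["close"]) / len(columnar["close"])

# Document: records are self-describing and need not share a schema.
documents = [
    json.dumps({"ticker": "AAA", "filings": [{"type": "10-K", "year": 2017}]}),
    json.dumps({"ticker": "BBB", "note": "no filings on record"}),
]
tickers_with_filings = [
    doc["ticker"] for doc in map(json.loads, documents) if "filings" in doc
]

# Graph: nodes and directed edges stored as an adjacency mapping;
# a simple network metric is the out-degree of each node.
edges = {"AAA": {"BBB", "CCC"}, "BBB": {"CCC"}, "CCC": set()}
out_degree = {node: len(neighbors) for node, neighbors in edges.items()}
```

Real systems add persistence, indexing, compression, and distributed execution on top of these basic layouts, but the access pattern each one optimizes for is the same.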

There has been some convergence towards the conventions established by relational database systems. The Python ecosystem facilitates interaction with many standard data sources and provides fast storage formats such as HDF5 and Parquet, as demonstrated throughout the book.
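As a small example of how Python interacts with a relational data source, the sketch below uses the sqlite3 module from the standard library to create a table, insert rows, and run a SQL aggregation. The table name, columns, and values are assumptions made for illustration; an in-memory database stands in for a production RDBMS such as PostgreSQL or MySQL.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A well-defined table schema, as required by a relational database.
conn.execute(
    """
    CREATE TABLE prices (
        ticker TEXT NOT NULL,
        date   TEXT NOT NULL,
        close  REAL NOT NULL,
        PRIMARY KEY (ticker, date)
    )
    """
)

# Made-up rows, used only for illustration.
rows = [
    ("AAA", "2018-01-02", 10.0),
    ("AAA", "2018-01-03", 11.0),
    ("BBB", "2018-01-02", 20.0),
]
# Parameterized inserts avoid building SQL strings by hand.
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
conn.commit()

# Retrieve the average close per ticker with a SQL aggregation.
avg_close = conn.execute(
    "SELECT ticker, AVG(close) FROM prices GROUP BY ticker ORDER BY ticker"
).fetchall()
conn.close()
```

The same SQL statements would work largely unchanged against other relational databases, given a suitable driver and connection.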
