Data warehousing

Data warehouse databases are well suited to Online Analytical Processing (OLAP) applications. They provide fast aggregation over very large volumes of structured data. Technologies such as Amazon Redshift, Netezza, and Teradata are designed to execute complex aggregate queries quickly, but they are not optimized for high volumes of concurrent writes. As a result, data typically needs to be loaded in batches, which limits a warehouse's ability to serve real-time insights over hot data.

Modern data warehouses use columnar storage to enhance query performance. Examples include Amazon Redshift, Snowflake, and Google BigQuery. Because a query reads only the columns it references, columnar storage improves I/O efficiency and delivers very fast query performance. In addition, data warehouse systems such as Amazon Redshift increase query performance through massively parallel processing (MPP), which parallelizes queries across multiple nodes.
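To make the I/O argument concrete, the following is a minimal sketch in plain Python of the same aggregate computed over a row-oriented and a column-oriented layout. The table, its column names, and the helper functions are hypothetical and chosen only for illustration; real warehouses store columns in compressed on-disk blocks rather than in Python lists.

```python
# Row-oriented layout: all fields of a record are stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Column-oriented layout: all values of one column are stored together.
columns = {
    "order_id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120.0, 80.0, 50.0],
}


def total_amount_row_store(table_rows):
    """Sum 'amount' by visiting every full record, including unused fields."""
    return sum(row["amount"] for row in table_rows)


def total_amount_column_store(table_columns):
    """Sum 'amount' by reading only the 'amount' column."""
    return sum(table_columns["amount"])


row_total = total_amount_row_store(rows)
col_total = total_amount_column_store(columns)
print(row_total, col_total)
```

Both functions return the same total. The difference is how much data each must touch: the row version walks every record, while the column version reads a single contiguous list, which is the access pattern that benefits aggregate-heavy OLAP queries.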

Data warehouses are central repositories that accumulate data from one or more sources. They store current and historical data used to create analytical reports for business analytics. Although data warehouses store data centrally from multiple systems, they cannot be treated as data lakes. Data warehouses are designed for structured relational data, while data lakes work with both structured relational data and semi-structured or unstructured data such as JSON, logs, and CSV files.

Data warehouse solutions such as Amazon Redshift can process petabytes of data and decouple compute from storage to save costs. In addition to columnar storage, Redshift uses data encoding (column compression), data distribution, and zone maps to increase query performance. More traditional row-based data warehousing solutions include Netezza, Teradata, and Greenplum.
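The idea behind zone maps can be sketched in a few lines of Python: keep the minimum and maximum value of each storage block, and skip any block whose range cannot satisfy the query predicate. This is an illustrative sketch, not Redshift's actual implementation; the block size, the sample column, and the function names are assumptions made for the example.

```python
def build_zone_map(values, block_size):
    """Split a column into fixed-size blocks and record (min, max) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    zone_map = [(min(block), max(block)) for block in blocks]
    return blocks, zone_map


def scan_range(blocks, zone_map, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < low or block_min > high:
            continue  # The block's range cannot overlap the predicate: skip it.
        blocks_read += 1
        matches.extend(v for v in block if low <= v <= high)
    return matches, blocks_read


# Hypothetical sorted date column encoded as YYYYMMDD integers.
order_dates = [
    20240101, 20240103, 20240105,
    20240201, 20240204, 20240210,
    20240301, 20240305, 20240309,
]

blocks, zone_map = build_zone_map(order_dates, block_size=3)
matches, blocks_read = scan_range(blocks, zone_map, 20240201, 20240228)
print(matches, blocks_read)
```

With the sample data, a query for February dates reads one block out of three. Pruning works best when the column is sorted or clustered, because each block then covers a narrow, non-overlapping range of values.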
