Online analytical processing (OLAP)

OLTP and NoSQL databases are well suited to serving applications but have limited capabilities for large-scale analysis. Analytical queries over large volumes of structured data are better served by a data warehouse platform designed for fast access to structured data. Modern data warehouse technologies adopt the columnar format and use massively parallel processing (MPP), which helps them fetch and analyze data faster.

The columnar format avoids scanning the entire table when you need to aggregate only one column of data, for example, when you want to determine the sales of your inventory in a given month. There may be hundreds of columns in the order table, but you need to aggregate data from the purchase column only. With a columnar format, you scan only the purchase column, which reduces the amount of data scanned compared to the row format and thereby improves query performance.
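To make the row-versus-columnar difference concrete, here is a minimal Python sketch that counts how many stored values each layout must read to answer the monthly-sales query. The order table, its column names (order_id, month, customer, sku, purchase), and the "values read" counter are hypothetical simplifications; real columnar engines also compress data and read it in blocks, which this sketch ignores.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical order table in row format: each record stores all of its columns together.
ORDERS: List[Row] = [
    {"order_id": 1, "month": "2024-01", "customer": "c1", "sku": "s1", "purchase": 100.0},
    {"order_id": 2, "month": "2024-01", "customer": "c2", "sku": "s2", "purchase": 50.0},
    {"order_id": 3, "month": "2024-02", "customer": "c3", "sku": "s3", "purchase": 75.0},
]


def to_columnar(rows: List[Row]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column (the columnar layout)."""
    columns: Dict[str, List[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def monthly_sales_row(rows: List[Row], month: str) -> Tuple[float, int]:
    """Row format: every field of every row is read, although only two columns matter."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # the whole row is loaded to reach "month" and "purchase"
        if row["month"] == month:
            total += row["purchase"]
    return total, values_read


def monthly_sales_columnar(columns: Dict[str, List[Any]], month: str) -> Tuple[float, int]:
    """Columnar format: only the "month" and "purchase" columns are read."""
    months, purchases = columns["month"], columns["purchase"]
    total = sum(p for m, p in zip(months, purchases) if m == month)
    return float(total), len(months) + len(purchases)


if __name__ == "__main__":
    print(monthly_sales_row(ORDERS, "2024-01"))                    # (150.0, 15)
    print(monthly_sales_columnar(to_columnar(ORDERS), "2024-01"))  # (150.0, 6)
```

Both layouts return the same total, but the row version touches every field. With hundreds of columns instead of five, the row-format counter grows with the table width while the columnar counter stays fixed at the two columns the query needs.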

With massively parallel processing, data is stored in a distributed manner across child nodes, and you submit a query to the leader node. Based on your partition key, the leader node distributes the query to the child nodes, where each node picks up part of the query and processes it in parallel. The leader node then collects the subquery result from each child node and returns the aggregated result. This parallel processing lets you execute queries faster and process large amounts of data more quickly.
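The leader/child flow can be sketched in Python as a scatter-gather over partitioned data. This is a hedged, single-process illustration rather than how any particular warehouse is implemented: the partition key (a hypothetical customer ID), the crc32 hash-partitioning scheme, and the function names are assumptions, and threads stand in for separate child nodes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# (partition key, month, purchase amount); the key is a hypothetical customer ID.
Record = Tuple[str, str, float]


def partition(records: List[Record], num_nodes: int) -> List[List[Record]]:
    """Distribute records across child nodes by hashing the partition key."""
    nodes: List[List[Record]] = [[] for _ in range(num_nodes)]
    for record in records:
        # crc32 is a stable hash, so a given key always lands on the same node.
        node = zlib.crc32(record[0].encode("utf-8")) % num_nodes
        nodes[node].append(record)
    return nodes


def child_subquery(local_slice: List[Record], month: str) -> float:
    """Subquery run by one child node: a partial SUM over only its local data."""
    return float(sum(amount for _, m, amount in local_slice if m == month))


def leader_query(nodes: List[List[Record]], month: str) -> float:
    """Leader node: fan the subquery out to every child node, then combine the partials."""
    # Threads simulate child nodes; in a real MPP cluster each slice runs on its own machine.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda local: child_subquery(local, month), nodes))
    return float(sum(partials))


if __name__ == "__main__":
    sales = [
        ("c1", "2024-01", 100.0),
        ("c2", "2024-01", 50.0),
        ("c3", "2024-02", 75.0),
        ("c4", "2024-01", 25.0),
    ]
    nodes = partition(sales, num_nodes=3)
    print(leader_query(nodes, "2024-01"))  # 175.0
```

This works because SUM can be computed per partition and then added up, so the leader only needs each node's partial result, not its raw rows. Aggregates that do not decompose this way (a median, for example) need a different combine step.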

You can use this kind of processing by installing software such as IBM Netezza or Microsoft SQL Server on a virtual machine, or you can choose a more cloud-native solution, such as Snowflake. Public cloud providers offer managed options as well; for example, Amazon Web Services (AWS) provides the petabyte-scale data warehousing solution Amazon Redshift, which uses the columnar format and massively parallel processing. You will learn more about data processing and analytics in Chapter 13, Data Engineering and Machine Learning.

You often need to store and search large amounts of data, especially when you want to find a specific error in your logs or build a document search engine. For this kind of capability, your application needs a data search solution. Let's learn more about data search.
