Using the right storage for the right need

For decades, organizations have used a traditional relational database and tried to fit everything into it, whether key-value user session data, unstructured log data, or analytics data for a data warehouse. However, a relational database is meant for transactional data and doesn't work well for other data types. It's like a Swiss Army knife: it has multiple tools that work, but only to a limited capacity. If you want to build a house, its screwdriver will not handle the heavy lifting. Similarly, for specific data needs, you should choose the right tool, one that can do the heavy lifting and scale without compromising performance.

Solution architects need to consider multiple factors when matching a data storage need to the right technology. Here are the important ones:

Durability requirement: How should data be stored to prevent data corruption?
Data availability: How available does the data storage system need to be to deliver data?
Latency requirement: How fast should the data be available?
Data throughput: What are the data read and write needs?
Data size: What is the data storage requirement?
Data load: How many concurrent users need to be supported?
Data integrity: How will the accuracy and consistency of data be maintained?
Data queries: What will be the nature of queries?

In the following table, you can see different types of data with examples and appropriate storage types to use. Technology decisions need to be made based on storage type, as shown here:

Data Type	Data Example	Storage Type	Storage Example
Transactional, structured schema	User order data, financial transactions	Relational database	Amazon RDS, Oracle, MySQL, PostgreSQL, MariaDB, Microsoft SQL Server
Key-value pair, semi-structured, unstructured	User session data, application log, review, comments	NoSQL	Amazon DynamoDB, MongoDB, Apache HBase, Apache Cassandra, Azure Tables
Analytics	Sales data, supply chain intelligence, business flow	Data warehouse	IBM Netezza, Amazon Redshift, Teradata, Greenplum, Google BigQuery
In-memory	User home page data, common dashboard	Cache	Redis cache, Amazon ElastiCache, Memcached
Object	Image, video	Object-based	Amazon S3, Azure Blob Storage, Google Cloud Storage
Block	Installable software	Block-based (file-based for shared file systems)	SAN, Amazon EBS, Azure Disk Storage; file-based: NAS, Amazon EFS
Streaming	IoT sensor data, clickstream data	Temporary storage for streaming data	Apache Kafka, Amazon Kinesis, Spark Streaming, Apache Flink
Archive	Any kind of data	Archive storage	Amazon Glacier, magnetic tape storage, virtual tape library storage
Web storage	Static web content such as images, videos, HTML pages	CDN	Amazon CloudFront, Akamai CDN, Azure CDN, Google Cloud CDN, Cloudflare
Search	Product search, content search	Search index store and query	Amazon Elasticsearch Service, Apache Solr, Apache Lucene
Data catalog	Table metadata, data about data	Metadata store	AWS Glue, Hive metastore, Informatica data catalog, Collibra data catalog
Monitoring	System log, network log, audit log	Monitor dashboard and alert	Splunk, Amazon CloudWatch, SumoLogic, Loggly
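As a rough illustration of how the table can drive a technology decision, the following Python sketch encodes a subset of its rows as a lookup from data type to storage type. The dictionary keys, the name STORAGE_TYPE_BY_DATA_TYPE, and the function suggest_storage_type are assumptions made for this example; they are not part of any library.

```python
# A subset of the table's rows, keyed by a lowercase data type label.
# The labels and names here are illustrative assumptions.
STORAGE_TYPE_BY_DATA_TYPE = {
    "transactional": "relational database",
    "key-value": "NoSQL",
    "analytics": "data warehouse",
    "in-memory": "cache",
    "streaming": "temporary storage for streaming data",
    "archive": "archive storage",
    "search": "search index store and query",
}


def suggest_storage_type(data_type: str) -> str:
    """Return the storage type the table pairs with the given data type."""
    key = data_type.strip().lower()
    if key not in STORAGE_TYPE_BY_DATA_TYPE:
        raise ValueError(f"no storage type recorded for data type {data_type!r}")
    return STORAGE_TYPE_BY_DATA_TYPE[key]


print(suggest_storage_type("Analytics"))
print(suggest_storage_type("key-value"))
```

A real decision would also weigh the factors listed earlier, such as latency, throughput, and data size, rather than the data type alone.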

As you can see in the preceding table, data has various properties: it can be structured, semi-structured, unstructured, key-value, streaming, and so on. Choosing the right storage improves not only the performance of the application but also its scalability. For example, you can store user session data in a NoSQL database, which allows application servers to scale horizontally while still maintaining user sessions.
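The session example can be sketched in Python as follows. This is a minimal illustration, not a production design: SessionStore is a hypothetical in-memory stand-in for an external key-value store such as a NoSQL table, and AppServer is a hypothetical application server that keeps no session state of its own.

```python
import copy
import uuid


class SessionStore:
    """Hypothetical stand-in for an external key-value (NoSQL) store."""

    def __init__(self):
        self._items = {}

    def put(self, session_id, data):
        # Store a copy so callers cannot mutate stored state by accident.
        self._items[session_id] = copy.deepcopy(data)

    def get(self, session_id):
        # Return a copy of the stored session, or None if it is unknown.
        return copy.deepcopy(self._items.get(session_id))


class AppServer:
    """Stateless server: all session data lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.put(session_id, {"user": user, "cart": []})
        return session_id

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        if session is None:
            raise KeyError(f"unknown session {session_id}")
        session["cart"].append(item)
        self.store.put(session_id, session)
        return session


# Two servers share one store, so either can serve the same user.
store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)

sid = server_a.login("alice")
session = server_b.add_to_cart(sid, "book")
print(session)
```

Because neither server holds the session locally, a load balancer can route each request to any server, and servers can be added or removed without losing sessions.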

While choosing storage options, you need to consider the temperature of the data, which could be hot, warm, or cold:

For hot data, you are looking for sub-millisecond latency and require cache data storage. Some examples of hot data are stock trading and making product recommendations at runtime.
For warm data, such as financial statement preparation or product performance reporting, you can tolerate latency from seconds to minutes, and you should use a data warehouse or a relational database.
For cold data, such as storing 3 years of financial records for audit purposes, you can plan for latency in hours and store it in archive storage.
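The temperature tiers above can be sketched as a small classifier in Python. The exact cutoffs are assumptions chosen for illustration: the text only says sub-millisecond for hot, seconds to minutes for warm, and hours for cold, so this sketch treats anything between one millisecond and one hour as warm. The names HOT_LIMIT, COLD_LIMIT, and classify_temperature are hypothetical.

```python
from datetime import timedelta

# Assumed cutoffs; the text gives ranges, not exact boundaries.
HOT_LIMIT = timedelta(milliseconds=1)
COLD_LIMIT = timedelta(hours=1)

# Storage suggested for each temperature, as described in the text.
STORAGE_BY_TEMPERATURE = {
    "hot": "cache",
    "warm": "data warehouse or relational database",
    "cold": "archive storage",
}


def classify_temperature(acceptable_latency: timedelta) -> str:
    """Classify data as hot, warm, or cold by the latency it can tolerate."""
    if acceptable_latency < HOT_LIMIT:
        return "hot"
    if acceptable_latency < COLD_LIMIT:
        return "warm"
    return "cold"


for latency in (timedelta(microseconds=500), timedelta(seconds=30), timedelta(hours=4)):
    temperature = classify_temperature(latency)
    print(latency, temperature, STORAGE_BY_TEMPERATURE[temperature])
```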

Choosing the appropriate storage according to data temperature also saves costs, in addition to meeting the performance SLA. Because any solution design revolves around handling data, a solution architect always needs to understand the data thoroughly and then choose the right technology.

In this section, we have covered a high-level view of data to convey the idea of matching storage to the nature of the data. You will learn more about data engineering in Chapter 13, Data Engineering and Machine Learning. Using the right tool for the right job helps to save costs and improve performance, so it's essential to choose the right data storage for the right need.
