Data classification

Let's look at how data can be classified when designing data algorithms. As discussed in Chapter 2, Data Structures Used in Algorithms, data can be classified by quantifying its volume, velocity, and variety. This classification can then serve as a basis for designing the algorithms used to store and process that data.

Let's look into these characteristics one by one in the context of data algorithms:

 

  • Volume quantifies the amount of data that needs to be stored and processed by an algorithm. As the volume increases, the task becomes data-intensive and requires provisioning enough resources to store, cache, and process the data. Big data is a loosely defined term for a volume of data too large to be handled by a single node.
  • Velocity is the rate at which new data is generated. High-velocity data is usually called "hot data" or a "hot stream", and low-velocity data is called "cold data" or a "cold stream". In many applications, the data is a mix of hot and cold streams that must first be prepared and combined into a single table before the algorithm can use it.
  • Variety refers to the different types of structured and unstructured data that need to be combined into a single table before the algorithm can use them.
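
The three characteristics above can be sketched as a small classifier. This is only an illustration, not a definitive method: the DataProfile class, the classify function, and both threshold constants are hypothetical names and values introduced here, and real cut-offs depend on the infrastructure available.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
SINGLE_NODE_CAPACITY_GB = 1_000    # assumed storage capacity of one node
HOT_STREAM_EVENTS_PER_SEC = 1_000  # assumed cut-off between hot and cold data


@dataclass
class DataProfile:
    size_gb: float         # volume: amount of data to store and process
    events_per_sec: float  # velocity: rate at which new data is generated
    formats: set           # variety: kinds of data, e.g. {"table", "text"}


def classify(profile: DataProfile) -> dict:
    """Label a dataset along the volume, velocity, and variety axes."""
    # Volume: data that exceeds one node's capacity is treated as big data.
    volume = "big data" if profile.size_gb > SINGLE_NODE_CAPACITY_GB else "single node"
    # Velocity: fast-arriving data is a hot stream, the rest is cold.
    velocity = "hot" if profile.events_per_sec >= HOT_STREAM_EVENTS_PER_SEC else "cold"
    # Variety: more than one format means the data must be combined first.
    variety = "mixed" if len(profile.formats) > 1 else "uniform"
    return {"volume": volume, "velocity": velocity, "variety": variety}


clickstream = DataProfile(size_gb=5_000, events_per_sec=20_000,
                          formats={"table", "text"})
print(classify(clickstream))
```

A profile labeled as mixed would still need its sources prepared and combined into a single table before an algorithm could use it; the sketch only labels the data and does not perform that step.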

The next section examines the trade-offs involved and presents the design choices available when designing storage algorithms.
