What is big data?

The internet has grown rapidly over the last few years and shows no signs of slowing down. In the last five years alone, the number of internet users has grown from a little under 2 billion to around 3.7 billion, which is about 50% of Earth's total population (up from 30% five years ago).

With more internet users and evolving networks, each year adds more data to existing datasets. In 2016, global internet traffic was 1.2 zettabytes (1.2 billion terabytes), and it is expected to grow to 3.3 zettabytes by 2021.
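To make the scale concrete, the following is a small sketch of the arithmetic behind these figures. It assumes decimal (SI) units, where a terabyte is 10^12 bytes and a zettabyte is 10^21 bytes, and it derives the compound annual growth rate implied by the 2016 and 2021 numbers. The function names are illustrative and do not come from any library.

```python
# Decimal (SI) byte units; this sketch assumes SI rather than binary prefixes.
TERABYTE = 10**12
ZETTABYTE = 10**21


def zettabytes_to_terabytes(zb: float) -> float:
    """Convert zettabytes to terabytes using SI units."""
    return zb * ZETTABYTE / TERABYTE


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


traffic_2016_tb = zettabytes_to_terabytes(1.2)
growth = implied_annual_growth(1.2, 3.3, 2021 - 2016)

print(f"2016 traffic: {traffic_2016_tb:.2e} TB")
print(f"Implied growth: {growth:.1%} per year")
```

Under these assumptions, the forecast corresponds to traffic growing by roughly 22% per year, compounded over five years.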

This enormous amount of data increases the need for processing and analysis, which in turn creates demand for databases and data stores that can scale and process our data efficiently.

The term big data is often credited to John Mashey, who popularized it in the 1990s, and it came into widespread use in the past decade with the explosive growth of the internet. Big data typically refers to datasets that are too large and complex to be processed by traditional data processing systems and therefore need some kind of specialized system architecture.

Big data's defining characteristics are in general:

  • Volume
  • Variety
  • Velocity
  • Veracity
  • Variability

Variety and variability refer to the fact that our data comes in different forms and that our datasets contain internal inconsistencies. These inconsistencies need to be smoothed out by a data cleansing and normalization system before we can actually process our data.
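As a rough illustration of what such a cleansing and normalization step might look like, the sketch below maps records from two hypothetical sources onto one common shape. The field names, sample records, and date formats are all assumptions made up for this example; a real pipeline would need rules that match its own data.

```python
from datetime import datetime

# Hypothetical raw records: same information, inconsistent field names,
# date formats, and value types.
raw_records = [
    {"user": "Alice ", "signup": "2017-03-01", "visits": "12"},
    {"username": "bob", "signup_date": "01/04/2017", "visits": 7},
]

# Date formats we assume the sources use (ISO and day/month/year).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Try each known format in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def normalize(record):
    """Map a raw record onto a single, consistent schema."""
    name = record.get("user") or record.get("username")
    signup = record.get("signup") or record.get("signup_date")
    return {
        "user": name.strip().lower(),
        "signup": parse_date(signup),
        "visits": int(record["visits"]),
    }


clean_records = [normalize(r) for r in raw_records]
for row in clean_records:
    print(row)
```

After this step, every record has the same keys and types, so downstream processing does not need to know which source a record came from.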

Veracity refers to uncertainty about the quality of data. Data quality may vary: we may have perfect data for some dates and missing datasets for others. This affects our data pipeline and how much we can invest in our data platforms, since even today one out of three business leaders doesn't completely trust the information they use to make business decisions.

Finally, velocity is probably the most important defining characteristic of big data (other than the obvious volume attribute). It refers to the fact that big datasets not only hold a large volume of data but also grow at an accelerated pace, which makes traditional storage techniques such as indexing difficult to apply.
