Introduction to big data

As seen in the preceding section, data analytics incorporates techniques, tools, and methodologies to explore and analyze data and produce quantifiable outcomes for the business. The outcome could be as simple as choosing a color to paint the storefront or as complicated as predicting customer behavior. As businesses grow, more and more varieties of analytics come into the picture. In the 1980s and 1990s, all we could get was what was available in a SQL data warehouse; nowadays, many external factors play an important role in influencing the way businesses run.

Twitter, Facebook, Amazon, Verizon, Macy's, and Whole Foods are all companies that run their business using data analytics and base many of their decisions on it. Think about what kind of data they are collecting, how much data they might be collecting, and how they might be using that data.

Let's look at the grocery store example seen earlier. What if the store expands its business to hundreds of stores? Naturally, the sales transactions will have to be collected and stored on a scale hundreds of times larger than that of the single store. But then, no business works independently anymore. There is a lot of information out there, including local news, tweets, Yelp reviews, customer complaints, survey activities, competition from other stores, changing demographics, the economy of the local area, and so on. All such additional data can help in better understanding customer behavior and revenue models.

For example, if we see increasing negative sentiment regarding the store's parking facility, we could analyze this and take corrective action, such as offering validated parking or negotiating with the city's public transportation department to provide more frequent trains or buses for better reach.

This increasing quantity and variety of data provides better analytics, but it also poses challenges to the business IT organization trying to store, process, and analyze all of it. It is, in fact, not uncommon to see terabytes (TBs) of data.

Every day, we create more than 2 quintillion bytes of data (2 exabytes), and it is estimated that more than 90% of all existing data has been generated in the last few years alone.
1 KB = 1024 Bytes
1 MB = 1024 KB
1 GB = 1024 MB
1 TB = 1024 GB ~ 1,000,000 MB
1 PB = 1024 TB ~ 1,000,000 GB ~ 1,000,000,000 MB
1 EB = 1024 PB ~ 1,000,000 TB ~ 1,000,000,000 GB ~ 1,000,000,000,000 MB
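
The unit ladder above can be sketched in a few lines of Python. This is only an illustration: the names UNITS, to_bytes, and humanize are hypothetical helpers introduced here, and the code assumes the 1024-based steps used in the table. Under that assumption, 2 quintillion bytes works out to a little under 2 EB, because the table's "~" approximations round each 1024 step down to 1000.

```python
# Hypothetical helpers illustrating the unit table; not from any library.
UNITS = ["Bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes, using steps of 1024."""
    return value * 1024 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


if __name__ == "__main__":
    print(to_bytes(1, "TB"))        # bytes in one terabyte
    print(humanize(2 * 10**18))     # the daily volume quoted in the text
```

Keeping the units in an ordered list means each conversion is just a power of 1024 determined by the unit's position, which mirrors how the table is built row by row.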

The accumulation of such large amounts of data since the 1990s, and the need to understand and make sense of it, gave rise to the term big data.

The term big data, which spans computer science and statistics/econometrics, probably originated in the lunch-table conversations at Silicon Graphics in the mid-1990s, in which John Mashey figured prominently.

In 2001, Doug Laney, then an analyst at the consultancy Meta Group Inc. (later acquired by Gartner), introduced the idea of the 3 Vs (variety, velocity, and volume). Now, we refer to 4 Vs instead of 3, with veracity of data added to the original three.
