Introducing Big Data | 23
What about Google? Come to think of it, Google’s core job is to download the whole internet
for its users, so the statistics should not come as a surprise. The downloaded pages must be
indexed, and they are continually processed so that optimum search results can be provided in
real time. While Google does not provide numbers on how much data it stores, it reports
handling 40,000 search queries every second on average.
The next V stands for Variety. It refers to the structure of the data. Most data in the Big Data
world is not relational data, which is organized neatly into rows and columns. About 80–90%
of the data we deal with today is semi-structured or unstructured. The different types of data,
along with appropriate examples, were explained in the preceding chapter. Usually, if the data
one is dealing with is loosely structured, and there is considerable variability in its structure,
then one may view this as a Big Data problem.
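To make the idea of variability in structure concrete, here is a minimal Python sketch. The records, field names and the helper field_signatures are hypothetical and only serve as illustration; they are not taken from any real social media feed. Each record is valid JSON, yet no two records share the same set of fields, which is what makes such data awkward to fit into fixed rows and columns.

```python
import json

# Hypothetical semi-structured records: every line is valid JSON,
# but the set of fields differs from one record to the next.
raw_records = [
    '{"user": "a01", "text": "Great phone!", "likes": 12}',
    '{"user": "b02", "photo_url": "img/123.jpg", "tags": ["travel", "beach"]}',
    '{"user": "c03", "text": "Delivery was late", "location": {"city": "Pune"}}',
]


def field_signatures(records):
    """Return the distinct combinations of top-level fields seen in the records."""
    signatures = set()
    for raw in records:
        record = json.loads(raw)
        signatures.add(tuple(sorted(record.keys())))
    return signatures


signatures = field_signatures(raw_records)
# A relational table would have exactly one signature (its column list);
# here every record has its own shape.
print(f"{len(raw_records)} records, {len(signatures)} distinct shapes")
```

A single relational table corresponds to exactly one such signature, so a count that grows with the data is one rough indicator of high variety.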
The last ‘V’ is for Veracity, which is about the authenticity of the data. The veracity of Big Data
refers to the biases, noise and abnormality in data. At present, every organization aims to build
a strong foundation of data that can be analysed to derive actionable business insights. For the
analytics outcomes to be accurate and reliable, the data must be authentic. However, data,
especially from sources external to the enterprise, such as app usage and social networks, is
often noisy or biased. Therefore, it is imperative that there are suitable measures in place to
ensure data veracity.
Points to Ponder
According to Facebook’s Q3 2018 business performance reports, every 60 seconds on
Facebook, 510,000 comments are posted, 293,000 statuses are updated and 136,000 photos
are uploaded.
Google reports that it handles 3.5 billion searches per day.
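The per-second and per-day figures quoted for Google are consistent with each other, as a quick calculation shows. The sketch below converts the quoted rates to daily totals; it assumes, purely for illustration, that each rate stays constant over a full day, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440


def per_day_from_per_second(rate):
    """Scale a per-second rate to a daily total (assumes a constant rate)."""
    return rate * SECONDS_PER_DAY


def per_day_from_per_minute(rate):
    """Scale a per-minute rate to a daily total (assumes a constant rate)."""
    return rate * MINUTES_PER_DAY


# 40,000 queries per second, the figure quoted for Google
google_searches_per_day = per_day_from_per_second(40_000)
# 510,000 comments per 60 seconds, the figure quoted for Facebook
facebook_comments_per_day = per_day_from_per_minute(510_000)

print(f"Google searches per day:   {google_searches_per_day:,}")
print(f"Facebook comments per day: {facebook_comments_per_day:,}")
```

At 40,000 queries per second, the daily total is 3,456,000,000, which rounds to the 3.5 billion searches per day quoted above. Under the same constant-rate assumption, 510,000 comments per minute corresponds to 734,400,000 comments per day.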
2.5 SOURCES OF BIG DATA
Some of the ‘new’ sources of data, as compared to more ‘traditional’ sources, have already been
discussed in this chapter. This section explores some of these sources in detail, and explains what
makes this data an invaluable source of actionable business insights.
Social media data, such as comments, status messages, pictures and videos posted on social
media networks, provide valuable input for Big Data analysis. They are sources of information
about customer networks, customer preferences, customer purchase intents and customer
sentiments about products and services.
Mobile phone towers produce a variety of data, both about the calls that they connect and
complete and about the devices that pass near the towers.
Cars, aeroplanes and other vehicles have sensors which can generate lots of instrumentation
data on a variety of parameters. Sensors attached to machines in manufacturing plants generate
huge volumes of data in real time. Such information can be used to monitor the health of the
machines these sensors are attached to, and can be used for planning proactive maintenance
activities, thereby reducing downtime and improving productivity and longevity.
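As a minimal sketch of how sensor data might feed machine health monitoring, the Python example below flags readings that deviate sharply from the average of the most recent readings. The vibration values, the window size, the 20% tolerance and the function name flag_anomalies are all assumptions made for illustration; real maintenance systems typically use far richer models.

```python
from collections import deque
from statistics import mean


def flag_anomalies(readings, window=5, tolerance=0.2):
    """Return (index, value) pairs for readings that differ from the mean of
    the previous `window` readings by more than `tolerance` (a fraction)."""
    recent = deque(maxlen=window)  # keeps only the latest `window` readings
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:  # wait until a full baseline is available
            baseline = mean(recent)
            if abs(value - baseline) > tolerance * baseline:
                flagged.append((i, value))
        recent.append(value)
    return flagged


# Hypothetical vibration readings (mm/s) from one machine, in time order
vibration_mm_s = [2.0, 2.1, 1.9, 2.0, 2.1, 2.0, 3.5, 2.1, 2.0]
anomalies = flag_anomalies(vibration_mm_s)
print(anomalies)  # the spike at index 6 is the only reading flagged
```

A flagged reading such as the spike above could prompt an inspection before the machine actually fails, which is the essence of proactive maintenance.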