In recent years, we have seen exponential growth in the data generated by humans and machines. Varied sources, including home sensors, healthcare-related monitoring devices, news feeds, conversations on social media, images, and worldwide commerce transactions (the list is endless), contribute to the vast volumes of data generated every day.
Facebook had 1.28 billion daily active users in March 2017 sharing close to four million pieces of unstructured information as text, images, URLs, news, and videos (Source: Facebook). 1.3 billion Twitter users share approximately 500 million tweets a day (Source: Twitter). The number of Internet of Things (IoT) sensors in lights, thermostats, cars, watches, smart devices, and so on, will grow from 50 billion to 200 billion by 2020 (Source: IDC estimates). YouTube users upload 300 hours of new video content every five minutes. Netflix has 30 million viewers who stream 77,000 hours of video daily. Amazon has sold approximately 480 million products and has approximately 244 million customers. In the financial sector, the volume of transactional data generated by even a single large institution is enormous: approximately 25 million households in the US have Bank of America, a major financial institution, as their primary bank, and together they produce petabytes of data annually. Overall, it is estimated that the global Big Data industry will be worth 43 billion US dollars in 2017 (Source: www.statista.com).
Each of the aforementioned companies, and many more like them, faces the real problem of storing all this data (structured and unstructured), processing it, and learning hidden patterns from it to increase revenue and improve customer satisfaction. We will explore how current methods, tools, and technology can help us learn from data in Big Data-scale environments, and how, as practitioners in the field, we must recognize the challenges unique to this problem space.
This chapter has the following structure:
Big Data has many characteristics that set it apart from conventional data. Here we highlight them as the four Vs that characterize Big Data. Each of these makes it necessary to use specialized tools, frameworks, and algorithms for data acquisition, storage, processing, and analytics:
Many authors have added other characteristics, such as value, validity, and volatility, to the preceding list, but we believe they are largely derived from the four above.