Veracity of Data

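Data can be analyzed for actionable insights. However, when so much data of so many types is drawn from many different sources, it is very difficult to verify that the data is correct and to prove that it is accurate. This question of how far the data can be trusted is what the veracity of data refers to.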
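
In practice, checking veracity often starts with rule-based validation of each incoming record. The following is a minimal sketch, not a standard method: the `Reading` record type, its fields, the two source names, and the temperature bounds are all hypothetical assumptions chosen for illustration. It keeps records that pass every rule and counts failures per rule, so you can see which kind of problem dominates.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical record schema; the field names are illustrative assumptions.
@dataclass
class Reading:
    sensor_id: str
    temperature_c: float
    source: str


# Each rule returns True when a record passes. The bounds and the set of
# known sources are assumed values, not a standard veracity test suite.
RULES: dict[str, Callable[[Reading], bool]] = {
    "has_sensor_id": lambda r: bool(r.sensor_id.strip()),
    "plausible_range": lambda r: -90.0 <= r.temperature_c <= 60.0,
    "known_source": lambda r: r.source in {"internal", "weather_api"},
}


def validate(records: list[Reading]) -> tuple[list[Reading], dict[str, int]]:
    """Return the records that pass every rule, plus a failure count per rule."""
    passed: list[Reading] = []
    failures = {name: 0 for name in RULES}
    for rec in records:
        ok = True
        for name, rule in RULES.items():
            if not rule(rec):
                failures[name] += 1
                ok = False
        if ok:
            passed.append(rec)
    return passed, failures


if __name__ == "__main__":
    data = [
        Reading("s1", 21.5, "internal"),
        Reading("", 19.0, "internal"),          # missing sensor ID
        Reading("s3", 140.0, "weather_api"),    # implausible value
        Reading("s4", 18.2, "unknown_feed"),    # unrecognized source
    ]
    clean, report = validate(data)
    print(len(clean), report)
    # prints: 1 {'has_sensor_id': 1, 'plausible_range': 1, 'known_source': 1}
```

Counting failures per rule, rather than only discarding bad records, gives a rough measure of how trustworthy each feed is.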

The 4 Vs of big data are volume, velocity, variety, and veracity.

To make sense of all this data and apply data analytics to big data, we need to extend data analytics so that it operates at a much larger scale and copes with all 4 Vs of big data. This changes not only the tools, technologies, and methodologies used to analyze data, but also the way we approach the problem in the first place. For example, if a business stored its data in a SQL database in 1999, handling the data for the same business today would require a distributed SQL database that scales across many machines and adapts to the nuances of the big data space.
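
One core idea behind a distributed SQL database is hash partitioning: rows are spread across nodes by a key, each node computes a partial result, and the partial results are merged. The sketch below simulates this for a query like `SELECT customer, SUM(amount) FROM sales GROUP BY customer`, assuming a hypothetical `sales` table of `(customer, amount)` rows. The "nodes" are just Python lists in one process, and the function names are illustrative, not any database's API.

```python
import zlib
from collections import defaultdict

# Hypothetical row type: (customer_id, amount).
Row = tuple[str, float]


def partition_of(key: str, num_partitions: int) -> int:
    # crc32 gives the same result in every process, unlike Python's built-in
    # hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_rows(rows: list[Row], num_partitions: int) -> list[list[Row]]:
    # Route every row to a partition (a stand-in for a node) by its key.
    partitions: list[list[Row]] = [[] for _ in range(num_partitions)]
    for customer, amount in rows:
        partitions[partition_of(customer, num_partitions)].append((customer, amount))
    return partitions


def local_sum(partition: list[Row]) -> dict[str, float]:
    # Partial GROUP BY ... SUM, computed independently for one partition.
    totals = defaultdict(float)
    for customer, amount in partition:
        totals[customer] += amount
    return dict(totals)


def merge(partials: list[dict[str, float]]) -> dict[str, float]:
    # With hash partitioning each key appears in only one partial result,
    # but adding totals keeps the merge correct even if that changes.
    result: dict[str, float] = {}
    for partial in partials:
        for customer, total in partial.items():
            result[customer] = result.get(customer, 0.0) + total
    return result


def distributed_group_sum(rows: list[Row], num_partitions: int = 4) -> dict[str, float]:
    return merge([local_sum(p) for p in partition_rows(rows, num_partitions)])


if __name__ == "__main__":
    sales = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0)]
    print(sorted(distributed_group_sum(sales).items()))
    # prints: [('alice', 12.5), ('bob', 5.0), ('carol', 7.0)]
```

Because all rows for one customer land in the same partition, each partial sum is already final for its keys, which is why this pattern scales out by adding nodes.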

Big data analytics applications often combine data from internal systems with data from external sources, such as weather data or demographic data on consumers compiled by third-party information services providers. In addition, streaming analytics applications are becoming common in big data environments, as users look to run real-time analytics on data fed into Hadoop systems through Spark's Spark Streaming module or other open source stream processing engines, such as Apache Flink and Apache Storm.
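
A common building block of real-time analytics is the tumbling window: the stream is cut into fixed, non-overlapping time intervals, and an aggregate is emitted for each one. The following plain-Python sketch does not use the Spark Streaming, Flink, or Storm APIs. It assumes events arrive as hypothetical `(timestamp_seconds, event_type)` pairs in non-decreasing timestamp order, and it does not handle late or out-of-order data, which real stream processing engines must deal with.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_seconds: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Yield (window_start, counts per event type) for each non-empty window.

    Assumes timestamps are non-decreasing; windows with no events are skipped.
    """
    current_start = None
    counts: defaultdict[str, int] = defaultdict(int)
    for ts, event_type in events:
        start = ts - ts % window_seconds  # align to the window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            # An event from a later window closes the current one.
            yield current_start, dict(counts)
            counts = defaultdict(int)
            current_start = start
        counts[event_type] += 1
    if current_start is not None:
        # Flush the last open window when the stream ends.
        yield current_start, dict(counts)


if __name__ == "__main__":
    stream = [(0, "click"), (3, "view"), (4, "click"), (10, "click"), (25, "view")]
    print(list(tumbling_window_counts(stream, window_seconds=10)))
    # prints: [(0, {'click': 2, 'view': 1}), (10, {'click': 1}), (20, {'view': 1})]
```

Using a generator means results are emitted as soon as each window closes, instead of after the whole stream has been read.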

Early big data systems were mostly deployed on-premises, particularly in large organizations that were collecting, organizing, and analyzing massive amounts of data. But cloud platform vendors, such as Amazon Web Services (AWS) and Microsoft, have made it easier to set up and manage Hadoop clusters in the cloud, as have Hadoop suppliers such as Cloudera and Hortonworks, which support their distributions of the big data framework on the AWS and Microsoft Azure clouds. Users can now spin up clusters in the cloud, run them for as long as needed, and then take them offline, with usage-based pricing that doesn't require ongoing software licenses.

Potential pitfalls that can trip up organizations in big data analytics initiatives include a lack of internal analytics skills and the high cost of hiring experienced data scientists and data engineers to fill the gaps.

The amount of data typically involved, and its variety, can cause data management issues in areas such as data quality, consistency, and governance. Data silos can also result from the use of different platforms and data stores in a big data architecture. In addition, integrating Hadoop, Spark, and other big data tools into a cohesive architecture that meets an organization's big data analytics needs is a challenging proposition for many IT and analytics teams, which have to identify the right mix of technologies and then put the pieces together.
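
When the same entity is stored in more than one silo, a simple first consistency check is to reconcile the stores by key. The sketch below assumes two hypothetical stores, a CRM system and a billing system, each mapping a customer ID to an email address. It reports IDs present in only one store and IDs whose values disagree after trimming whitespace and ignoring case; that normalization rule is itself an assumption you would adapt to your data.

```python
def reconcile(crm: dict[str, str], billing: dict[str, str]) -> dict[str, list[str]]:
    """Compare two hypothetical customer-ID -> email stores and list discrepancies."""

    def normalize(value: str) -> str:
        # Assumed rule: values match if equal after trimming and lowercasing.
        return value.strip().lower()

    # dict key views support set operations such as - and &.
    only_crm = sorted(crm.keys() - billing.keys())
    only_billing = sorted(billing.keys() - crm.keys())
    mismatched = sorted(
        cid
        for cid in crm.keys() & billing.keys()
        if normalize(crm[cid]) != normalize(billing[cid])
    )
    return {"only_crm": only_crm, "only_billing": only_billing, "mismatched": mismatched}


if __name__ == "__main__":
    crm = {"c1": "a@x.com", "c2": "b@x.com", "c3": "c@x.com"}
    billing = {"c1": "A@x.com ", "c2": "bee@x.com", "c4": "d@x.com"}
    print(reconcile(crm, billing))
    # prints: {'only_crm': ['c3'], 'only_billing': ['c4'], 'mismatched': ['c2']}
```

At big data scale the same comparison would typically run as a distributed join, but the logic of missing-on-one-side versus conflicting values stays the same.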
