Chapter 8. Programming for Big Data

In the previous chapter, we covered database access programming with Entity Framework and ADO.NET. In this chapter, we will cover solutions for small-scale big data applications, based on the core .NET Framework and Entity Framework.

Big data applications can serve government or scientific needs, but such solutions require customization at every level, which is beyond the scope of this book. This chapter will focus on small-scale big data scenarios, such as IoT (Internet of Things) or long-running enterprise applications that have been collecting data for decades.

This chapter will cover the following topics:

  • What is big data?
  • Architecting big data solutions
  • Microsoft Azure for big data
  • Simplified grid-computing
  • Lookup programming

What is big data?

A big data application deals with large volumes of fast-growing data. This is the most widely accepted definition and the most basic one too. Although no single academic definition of big data exists, a more detailed definition states that a big data application meets the following criteria:

  • It handles huge volumes of data, so the data size must be taken into account on every use, such as SQL SELECT queries or similar. As the word Big suggests, the total data size must be huge. These days, any database smaller than 100 GB cannot be considered valid big data storage.
  • It handles fast-growing data, in the sense of velocity of growth. Real big data architectures and solutions apply only to fast-growing data; otherwise, we are simply dealing with a huge dataset. Any large data store that does not grow can be processed by any application in a few hours or days, depending on the scale of the data. The important thing to note is that, in that case, we will always eventually finish the computation. It is rapidly increasing data that forces developers to adopt specific techniques and technologies to handle it properly. These differ from the methods used to process standard data, because standard methods cannot deal with data that is both large and rapidly growing.
  • It handles a great variety of non-homogeneous data types. Because of the intrinsic nature of the data handled by a big data application, a data item can be of many different types. The same data type frequently exists in multiple versions, which increases the overall number of data types.

Although more complex or specific definitions of big data exist, we will stick to the more canonical one. This choice is necessary because the definition is not uniform among the scientists and IT organizations that deal with big data.

Now that we have an idea of what big data is, it's time to take a look at related technologies and techniques. In terms of data storage, a very large big data solution will rely on NoSQL databases because of their intrinsic high-speed data read/write capability. This does not mean that relational databases are not fast enough for big data; they are often used in similar designs. However, when dealing with very large applications, any small improvement in speed can result in crucial improvements in overall application performance.

In big data applications, most of the development effort focuses on data analysis. In terms of analysis computation, extreme parallelism is the key to success. As seen in Chapter 2, Architecting High-performance .NET Code, the most parallelizable design is based on grid-computing techniques built on heavily distributed programming technologies. Ideally, the most powerful big data design in terms of throughput makes heavy use of grid and parallel programming, together with an asynchronous design for data analysis and persistence.
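
As a minimal sketch of the parallel analysis idea described above (not the grid design itself, which spans multiple machines), the following C# fragment splits an in-memory array into independent ranges and sums them on multiple threads with the Task Parallel Library. The class name PartitionedAggregation, the method SumInParallel, and the chunkSize parameter are hypothetical names chosen for illustration; a real big data workload would read each range from storage rather than from an array.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class PartitionedAggregation
{
    // Sums all readings by processing fixed-size ranges in parallel.
    public static long SumInParallel(long[] readings, int chunkSize)
    {
        if (readings == null) throw new ArgumentNullException(nameof(readings));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (readings.Length == 0) return 0; // Partitioner.Create rejects empty ranges

        long total = 0;

        // Split [0, Length) into independent index ranges of chunkSize items.
        var ranges = Partitioner.Create(0, readings.Length, chunkSize);

        Parallel.ForEach(
            ranges,
            () => 0L,                                  // per-thread partial sum
            (range, loopState, localSum) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    localSum += readings[i];
                }
                return localSum;
            },
            // Merge each thread's partial sum exactly once, atomically.
            localSum => Interlocked.Add(ref total, localSum));

        return total;
    }
}
```

Each range is processed without shared state, and the threads synchronize only once, when merging their partial sums. The same shape (partition, compute locally, merge) is what a grid design applies across machines instead of threads.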

Because this book is not addressed to scientists from NASA or the NSA, we will still use a relational database (RDBMS), with examples showing you how to handle a table with a billion rows using SQL-based data sources.