Conclusion to Part IV

Part IV, consisting of six chapters, described some of the experimental systems we have designed and developed that illustrate the key points of both big data management and analytics (BDMA) and big data security and privacy (BDSP) systems.

In Chapter 23, we presented a framework capable of handling enormous amounts of Resource Description Framework (RDF) data, which can be used to represent big data systems such as social networks. Our framework is based on the Hadoop/MapReduce technologies and implements a SPARQL query processor that can handle massive amounts of data. We also provided a brief overview of the security prototype that we built on top of the query processing system. In Chapter 24, we described the design of the big data analytics system called InXite. InXite is intended to help analysts who must deal with massive data streams in the form of billions of blogs and messages, among others. For example, by analyzing the behavioral history of a particular group of individuals, together with details of concepts such as events, analysts will be able to predict behavioral changes in the near future and take the necessary measures. We also discussed the use of cloud computing and various big data tools in the implementation of InXite.

Chapter 25 described our design and implementation of a cloud-based information sharing system called CAISS. CAISS utilizes several of the technologies we have developed as well as open source tools. We also described the design of an ideal cloud-based assured information sharing system called CAISS++. In Chapter 26, we described techniques to protect our data by encrypting it before storing it on cloud computing servers such as Amazon S3. Our approach is novel in that we propose to use two key servers to generate and store the keys. It also offers stronger security than some of the other known approaches because we do not store the actual key used to encrypt the data. This assures the protection of our data even if one or both key servers are compromised. Our implementation utilizes Blackbook, a semantic web-based data integration framework, and allows data integration from various data sources.

In Chapter 27, we formulated intrusion detection problems as classification problems for infinite-length, concept-drifting data streams. Concept drift occurs in these streams as attackers react and adapt to defenses. We formulated both malicious code detection and botnet traffic detection as such problems, and introduced the extended, multiple-partition, multiple-chunk approach, a novel ensemble learning technique for automated classification of infinite-length, concept-drifting streams. Finally, in Chapter 28, we described a first-of-a-kind inference controller that controls certain unauthorized inferences for provenance data represented as RDF graphs. We also argued that inference control is an area that will need BDMA systems both for managing the data and for reasoning about it.

Having described in Part IV some of the experimental BDMA and BDSP systems we have developed, and having focused on the details of stream data analytics in Parts II and III, we are ready to describe several directions for BDMA and BDSP systems in Part V.
