Book Description

Companies of all sizes are considering data lakes as a way to deal with terabytes of security data. That data can help them conduct forensic investigations and serve as an early indicator of malicious or otherwise relevant behavior. Many are thinking about replacing their existing SIEM (security information and event management) systems with Hadoop running on commodity hardware.

Before your company jumps into the deep end, you first need to weigh several critical factors. This O'Reilly report takes you through the technology and design options for implementing a data lake. Each option supports your data analytics use cases and is also accessible to processes, workflows, third-party tools, and teams across your organization.

Within this report, you'll explore:

  • Five questions to ask before choosing an architecture for your backend data store
  • How data lakes can overcome scalability and data duplication issues
  • Different options for storing context and unstructured log data
  • Data access use cases covering both search and analytical queries via SQL
  • Processes necessary for ingesting data into a data lake, including parsing, enrichment, and aggregation
  • Four methods for embedding your SIEM into a data lake

Table of Contents

  1. The Security Data Lake
    1. Leveraging Big Data Technologies to Build a Common Data Repository for Security
    2. Comparing Data Lakes to SIEM
    3. Implementing a Data Lake
    4. Understanding Types of Data
      1. Time-Series Data
      2. Contextual Data
    5. Choosing Where to Store Data
    6. Knowing How Data Is Used
      1. How Much Data Do We Have in Total?
      2. How Fast Does the Data Need to Be Ready?
      3. How Much Data Do We Query, and How Often?
      4. Where Is the Data and Where Does It Come From?
      5. What Do You Want with the Data and How Do You Access It?
    7. Storing Data
      1. Using Parsers
      2. Storing Log Data
      3. Storing Context
    8. Accessing Data
    9. Ingesting Data
    10. Understanding How SIEM Fits In
      1. Traditional Data Lake
      2. Preprocessed Data
      3. Split Collection
      4. Federated Data Access
    11. Acknowledgments
    12. Appendix: Technologies To Know and Use