One of the primary purposes of a Security Information and Event Management (SIEM) solution is to centralize the storage and analysis of security events across a diverse range of products that provide protection across your organization’s IT infrastructure. To do this, the solution needs to connect to those data sources, pull the data into a central store, and manage the life cycle of that data to ensure it is available for analysis and ongoing investigations.
In this chapter, we will review the types of data that are most interesting and useful for security operations, and then explore the functionality available to connect to multiple data sources and ingest that data into Azure Sentinel, by storing it in the Log Analytics workspace. Once the data is ingested, we need to ensure the appropriate configuration for data retention to maximize the ability to hunt for events and other security information, while also ensuring the cost of the solution is maintained at a reasonable level.
We will cover the following areas specific to data collection:
Then, we will cover these areas to ensure appropriate data management:
Quality data management is critical to the success of big data analytics, which is the core basis of how a SIEM solution works. Gathering data for analysis is required in order to find security threats and unusual behavior across a vast array of infrastructure and applications. However, there needs to be a balance between capturing every possible event from all available logs and collecting too little data to correlate related activities. Too much data increases the noise that contributes to alert fatigue and raises the cost of storing and analyzing the information. In this case, that cost applies to Azure Log Analytics and Azure Sentinel, but the same is true of other SIEM solutions.
One of the recent shifts in the security data landscape is the introduction of multiple platforms that carry out log analysis locally and only forward relevant events on to the SIEM solution. Instead of duplicating the logs and hoping to fish relevant information from them with a single security analysis tool (such as a SIEM solution), new security products focus on gathering specific data and resolving threats within their own boundaries; examples include the following:
Note
Refer to Chapter 1, Getting Started with Azure Sentinel, for further details about each of these solutions.
Each of these solutions already gathers large volumes of data from its respective data sources; therefore, there is no need to duplicate the data in the SIEM log storage. Instead, these solutions can be integrated with the SIEM solution to send only relevant and actionable information, enabling the SIEM to act as the central point of communication for analysis, alerting, and ticketing. The net result is a reduction in duplication and overall solution cost. This idea is summarized in the following screenshot:
When dealing with large data volumes, we can use the 7 Vs of Big Data to guide our decisions on what is the right data to collect, based on the priorities assigned:
Here is an example of how to use each of these values to prioritize and justify the data. For volume, instead of focusing on the amount of data, we need to focus on the quality and variety of the data to provide accurate and actionable information across multiple systems.
A summary of this topic is shown in the following screenshot:
You can use the chart shown in the preceding screenshot to make your initial assessment of which types of data you need to ingest into Azure Sentinel and which can be excluded. We recommend you also review this periodically to ensure you are maintaining a healthy dataset, either by adding more data sources or by tuning out data that no longer meets the requirements but still costs money to store and process.
Azure Sentinel relies on Log Analytics to store large volumes of data, in order to process that data and find useful information about potential risks and security threats. The data required may be located in many different types of resources across many different platforms, which is why we need many different options for connecting to those data sources. Understanding the options available, and how to configure them, is key to developing a strong data architecture to support the Azure Sentinel solution.
A summary of the connectors is shown in the following screenshot:
Connectors can be categorized based on the method used to ingest data from the source. Currently, there are four major types: native, direct, API, and agent.
Let’s explore each of these types in more detail.
Azure Sentinel has been developed to integrate directly with several resources across the Microsoft security product range, including (but not limited to) the following:
This is the preferred method for connecting to resources, if the option is available. Let’s take a look at direct connectors.
Some connectors available in Azure Sentinel need to be configured from the source location. The connector will usually provide the information required and a link to the appropriate location. Examples of these connectors include the following:
Now, let’s look at API connections.
Several security providers have API options that allow connections to be made to their solutions in order to extract the logs and bring the data into Azure Sentinel. This is the preferred method for connecting to third-party solutions that support it, and you have the option to create your own connectors. For further information on creating API-based connectors, see this article: https://techcommunity.microsoft.com/t5/azure-sentinel/azure-sentinel-creating-custom-connectors/ba-p/864060.
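As a rough illustration of what a custom API-based connector has to do, the following minimal sketch builds the authorization header for the Log Analytics HTTP Data Collector API. The chapter does not prescribe this API, so treat its use here as an assumption. The function names `build_signature` and `build_request` and all workspace values are hypothetical, and you should check the current Microsoft documentation before relying on it. The sketch only prepares the request; it does not send anything.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def build_signature(workspace_id, shared_key, date, content_length,
                    method="POST", content_type="application/json",
                    resource="/api/logs"):
    """Return the SharedKey Authorization header value for one request."""
    # The string to sign combines the method, body length, content type,
    # the x-ms-date header, and the resource path, separated by newlines.
    string_to_sign = (
        f"{method}\n{content_length}\n{content_type}\n"
        f"x-ms-date:{date}\n{resource}"
    )
    # The workspace key is base64 text; decode it before using it as the HMAC key.
    decoded_key = base64.b64decode(shared_key)
    digest = hmac.new(
        decoded_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    signature = base64.b64encode(digest).decode("utf-8")
    return f"SharedKey {workspace_id}:{signature}"


def build_request(workspace_id, shared_key, log_type, body, date=None):
    """Return the URL and headers needed to post a JSON body (bytes)."""
    # The date must be in RFC 1123 format, for example
    # "Mon, 01 Jan 2024 00:00:00 GMT".
    date = date or formatdate(usegmt=True)
    url = (
        f"https://{workspace_id}.ods.opinsights.azure.com"
        "/api/logs?api-version=2016-04-01"
    )
    headers = {
        "Content-Type": "application/json",
        "Authorization": build_signature(
            workspace_id, shared_key, date, len(body)
        ),
        "Log-Type": log_type,  # hypothetical custom log name
        "x-ms-date": date,
    }
    return url, headers
```

You would then POST the JSON body to the returned URL with these headers, using an HTTP client of your choice. The `Log-Type` value determines the name of the custom log table in the workspace (suffixed with `_CL`).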
Examples of API-based data connectors include the following:
The next type of connection is required for services that do not support any of the preceding options; these are usually virtual or physical servers, firewalls, proxies, and other network-based devices.
This connector type allows for the widest range of data connections and is an industry-standard method of shipping logs between resources and SIEM solutions. There are three types of connectors to consider; you may deploy one or more depending on your needs, and you may deploy more than one of the same type. Let’s discuss them in detail.
Any server running Microsoft Windows can forward logs for Domain Name System (DNS), security events, Windows Firewall, and AD.
This is an agent deployed to a Linux host that can act as a concentrator for many resources to send logs to, which are then forwarded on to Log Analytics for central storage. For detailed guidance on implementing a Syslog server, please see this article: https://docs.microsoft.com/en-us/azure/sentinel/connect-syslog. Examples of third-party solutions that support this method include (but are certainly not limited to) the following:
While these options cover a wide range of data sources, there is another method that, if available from the service provider, will give a richer dataset. Let’s take a look at the Common Event Format (CEF) option next.
This is very similar to the Syslog server deployment mentioned previously. For more detailed information, see this article: https://docs.microsoft.com/en-us/azure/sentinel/connect-common-event-format. The difference is that the source supports the CEF for logs. Examples of solutions that support this method include the following:
With this range of connectors available, it is possible to connect to and gather information from multiple resources across all your operating environments, including on-premises, a hosted service, the public cloud, and even industrial operations environments or the Internet of Things (IoT).
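To make the Syslog and CEF options more concrete, here is a minimal sketch that formats an event as a CEF record and sends it over UDP syslog using Python's standard library. The vendor, product, and host values are hypothetical, and `format_cef` and `send_syslog` are illustrative helper names rather than part of any Azure Sentinel tooling. Python's `SysLogHandler` emits a simplified syslog frame, so a production collector may expect a fuller header than this sketch produces.

```python
import logging
import logging.handlers
import socket


def _escape_header(value):
    # In CEF header fields, backslash and pipe are escaped with a backslash.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    # In CEF extension values, backslash and equals are escaped,
    # and newlines are written as the two characters \n.
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace("=", "\\=")
        .replace("\n", "\\n")
    )


def format_cef(vendor, product, version, signature_id, name, severity,
               extensions=None):
    """Build a CEF:0 record: pipe-delimited header plus key=value extensions."""
    header = "|".join(
        _escape_header(field)
        for field in (vendor, product, version, signature_id, name, severity)
    )
    ext = " ".join(
        f"{key}={_escape_extension(val)}"
        for key, val in (extensions or {}).items()
    )
    return f"CEF:0|{header}|{ext}"


def send_syslog(message, host, port=514):
    """Send one message over UDP syslog to the collector at host:port."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port), socktype=socket.SOCK_DGRAM
    )
    logger = logging.getLogger("cef-test-sender")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the test event out of other handlers
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        # Always detach and close so repeated calls do not leak sockets.
        logger.removeHandler(handler)
        handler.close()
```

For example, `send_syslog(format_cef("Contoso", "Firewall", "1.0", "100", "Connection blocked", 5, {"src": "10.0.0.1"}), "syslog-collector.example.com")` would send a single test event to a hypothetical collector listening on UDP port 514.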
The Azure Sentinel - Data connectors page shows the total number of connectors, how many are currently connected, and how many are in development. An example of the Data connectors page is shown in the following screenshot:
As you can see in the preceding screenshot, there are currently 32 connectors available to implement in this Azure Sentinel workspace. The list is likely to grow over time as more solutions become natively integrated, which is why you can see the ability to filter the list and search for specific data connectors. By selecting the connector on the left-hand side, we can view the connector details on the right-hand side. For this example, we will use the data connector for AWS, as shown in the following screenshot:
At the top of the page in the preceding screenshot, we can see the STATUS of the connector (Not connected), the provider (Amazon), and the LAST LOG RECEIVED date/timestamp (empty due to a disconnected state).
The next section provides a description and further details about the connector, including a graph that will show the last few days of active log ingestion rate (when connected).
At the bottom of the page, we can see the Data types that are included in this connector; in this example, we are expecting to retrieve the AWS CloudTrail logs, when enabled.
Click on the Open connector page button to go to the next screen and start the configuration process, as shown in the following screenshot:
Each connector will show a slightly different screen, depending on the type of connector (native, direct, API, or agent) and the steps required to complete the configuration. In this example, the AWS connector is an API-based connector, and instructions are provided on how to set up the required permissions for Azure Sentinel to access the AWS account via the API. Once completed, you can select the Next steps tab to view the workbooks and other resources available for this data connector, as shown in the following screenshot:
As we can see in the preceding screenshot, the AWS connector has the following two workbooks associated:
Each of these workbooks is configured based on the information available in the AWS CloudTrail logs. The page also provides example queries you can use to get started with interrogating the logs for your own requirements. Further information about how to use workbooks can be found in Chapter 8, Introducing Workbooks.
Now, when we look at a data connector that has been successfully configured, we can view the same pages and see the differences, as shown in the following screenshot:
We can see on the data connector page for Azure AD that this data source is connected and last received logs 4 minutes ago. We can see that 3 workbooks and 2 queries are using this data connector, and a regular flow of data has occurred over the last 3 weeks (December 8 to December 29). By selecting the Open connector page button, we get a view of the details of this connector, as shown in the following screenshot:
On the Instructions page, we see check marks to indicate the successful configuration of each element, with some padlocks to indicate other aspects that are also required. In the Configuration section, both the Azure Active Directory Sign-in logs and the Azure Active Directory Audit logs are connected. If you click on either of the blue buttons for Disconnect, this will stop the logs from being ingested to Azure Sentinel. Selecting the Next steps tab will show more information about what we can do with this connector, as shown in the following screenshot:
On the Next steps page, we can see three recommended workbooks. Two of them have been enabled, shown by the bar on the left-hand side, and one of them is available but not yet enabled.
In this section, we walked through the setup of the data connectors to enable data ingestion. In the next section, we will move on to look at how we manage that data to ensure we retain enough information to be useful, without storing so much that it becomes expensive.
Once you have completed the configuration of a few data connectors, you will begin to see how much data you will ingest and store in Log Analytics on a daily basis. The amount of data you store and retain directly impacts the costs; see Chapter 1, Getting Started with Azure Sentinel, for further details. You can view the current usage and costs by navigating to the Log Analytics workspace, then selecting Usage and estimated costs from the General menu, as shown in the following screenshot:
Once selected, you are then presented with a dashboard of information that will show the pricing tier and current costs on the left-hand side and graphs on the right-hand side, to show the variation in consumption on a daily basis for the last 31 days. A second graph shows the total size of retained data, per solution. An example of the dashboard is shown in the following screenshot:
From this page, explore two of the options available along the top menu bar:
In the next section, we will look at how we calculate the costs involved in data ingestion and retention for Azure Sentinel and Log Analytics.
Many organizations need to retain security log data for longer than 90 days, and must budget to ensure they have enough capacity based on business needs. For example, if we consider the need to keep data for 2 years, with an average daily ingestion rate of 10 GB, then we can calculate the cost of the initial ingestion and analysis, and then compare it to the cost of retention. This will provide an annual cost estimate for both aspects.
The following table shows the cost for ingesting data into Log Analytics and analyzing that data in Azure Sentinel. This price includes 90 days of free retention:
The following table shows the amount of data being retained past the free 90 days included in the preceding pricing, based on ingesting 10 GB per day:
Now, if we add these together, we can see the total cost of the solution over a 12-month period, shown in the following table:
Note
These prices are based on the current rates applicable to the US East Azure region, and figures are rounded to simplify. Actual data usage may fluctuate each month.
Based on these examples, the total cost for running Azure Sentinel, ingesting 10 GB per day and retaining data for 2 years, would be $39,780. Data retention accounts for 22% of the cost.
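The calculation described above can be sketched as a small function. The rates passed in are placeholders rather than real Azure prices, so the output is not intended to reproduce the figures in this section. The function name and the steady-state assumption (the retention window is already full, so the workspace always holds `retention_days` worth of data) are ours.

```python
def estimate_annual_cost(
    daily_gb,
    retention_days,
    ingest_rate_per_gb,
    retention_rate_per_gb_month,
    free_retention_days=90,
):
    """Estimate yearly ingestion and retention cost at steady state.

    All rates are supplied by the caller; none are real Azure prices.
    """
    # Ingestion and analysis are charged once per GB as data arrives.
    annual_ingestion = daily_gb * 365 * ingest_rate_per_gb

    # Only data older than the free period is billed for retention.
    billable_days = max(0, retention_days - free_retention_days)
    billable_gb = daily_gb * billable_days

    # Retention is charged per GB per month on the billable volume.
    annual_retention = billable_gb * retention_rate_per_gb_month * 12

    total = annual_ingestion + annual_retention
    return {
        "annual_ingestion": annual_ingestion,
        "billable_retained_gb": billable_gb,
        "annual_retention": annual_retention,
        "annual_total": total,
        "retention_share": annual_retention / total if total else 0.0,
    }


# Hypothetical rates: 4.00 per GB ingested, 0.10 per GB per month retained.
estimate = estimate_annual_cost(10, 730, 4.00, 0.10)
```

Whatever rates you supply, 10 GB per day retained for 2 years leaves 6,400 GB billable for retention at any time (10 GB multiplied by the 640 days beyond the free 90). Substituting the current rates for your region gives an estimate you can compare against the Usage and estimated costs dashboard.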
Because the charges are based on the volume of data (in GB), one way of maintaining reasonable costs is to carefully select which data is initially gathered, and which data is kept long term. If you plan to investigate events that occurred more than 90 days ago, then you should plan to retain that data. Useful log types for long-term retention include the following:
Other data types can be extremely useful for initial analysis and investigation; however, they hold less value as the relevance of their data diminishes over time. They include the following:
Also, consider that some platforms sending data to Azure Sentinel may be configured to retain the original copies of the log data for longer periods of time, potentially without excessive additional cost. Examples would be your firewall and CASB solutions.
The benefit of retaining data within Log Analytics is the speed of access to search the data when needed, without having to write new queries. However, many organizations require specific log data to be retained for long periods of time, usually to meet internal governance controls, external compliance requirements, or local laws. Currently, there is a limitation, as Log Analytics only supports storage for up to 2 years.
The following solutions may be considered as an alternative for long-term storage, outside of Log Analytics:
As you can see, there are plenty of options available to store the data in alternative locations, both for extended archive/retention and for additional analysis with alternative tools. We expect Microsoft will increase the number of options available.
Summary
In this chapter, we reviewed the importance of data quality, using the 7 Vs of Big Data as a guide to selecting the right data. We also looked at the various data connectors available to retrieve logs from a wide variety of sources, and the importance of constantly reviewing the connectors for updates and additional resources, such as workbooks. You now have the skills required to set up data connectors to begin ingesting data for later use in analysis and threat hunting.
Ongoing data management plays a key part in this solution, ensuring you maintain cost efficiency without losing valuable information that can help identify risk and mitigate potential loss. Apply the information in this chapter to your own environment, and review it regularly.
In the next chapter, you will learn how to integrate threat intelligence feeds into Azure Sentinel, in order to enrich your data with insights from security experts and make your investigations more effective.
Use these questions to test your knowledge of this chapter:
The following resources can be used to further explore some of the topics covered in this chapter: