© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
K. Ots, Azure Security Handbook, https://doi.org/10.1007/978-1-4842-7292-3_3

3. Logging and Monitoring

Karl Ots
Zürich, Zürich, Switzerland
 

Modern applications leverage a variety of cloud services and often span IaaS, PaaS, and SaaS. In addition to the renewed complexity of the applications themselves, public cloud environments generate a vast number of signals of their own. Monitoring these environments is different from monitoring traditional systems. For example, instead of locking the perimeter down and monitoring activities within it, you abide by the limitations of the cloud provider.

In this chapter, I will introduce you to platform, infrastructure, and application security monitoring, and you will learn about the differences between the various log types. After reading this chapter, you will be able to describe end-to-end monitoring in Azure and select appropriate Azure monitoring tools for your environment.

Platform Monitoring

Traditionally, your applications are hosted in an infrastructure with a varying level of quality and access to monitoring signals. When you host your own infrastructure, you have the opportunity to combine data across your hosting environment. This data could include CCTV feeds, physical access logs in and out of the datacenter building, HVAC systems, and network components, as well as the health of your hardware, operating systems, and applications. Integrating monitoring data from all these systems would be costly, but not impossible. After taking on such an exercise, you would be faced with a massive number of signals to monitor. Identifying any security incidents or events would be like trying to find a needle in a haystack.

This is different in an environment where some or all the parts are outsourced to a third party. If your datacenter is co-located with other customers, you will not have unlimited access to the physical security information. Similarly, if you are consuming compute services from a hosting provider, they are not necessarily able to share the hypervisor-level monitoring data, as they need to protect their other customers. In some cases, the company providing you with physical infrastructure and the company managing your servers might even be competitors, and it will not be in their best interest to share everything in a transparent manner.

All this changes with the move to the public cloud. Everything in Microsoft Azure is software defined, so you have signals available to you from across your hosting platform. The signals are available to you in a standardized manner, which makes it easier to correlate across signal types. What's more, as Microsoft is responsible for the physical and host security, it provides you with reports containing pre-correlated data and, in some cases, even alerts. Microsoft is constantly analyzing any malicious activity against its infrastructure. This lets it provide you with anonymized security intelligence information, even when your environment is not yet being attacked.

Activity Logs

Each Azure subscription automatically stores platform-level logs as activity logs. The activity logs are immutable and stored for 90 days. They can be exported using diagnostic settings for longer retention. The activity log schema groups the log events into categories such as administrative, service health, and security. For a full and up-to-date list of categories, please refer to the activity log schema.1

As the activity logs are immutable, you get a reasonable audit trail of key platform events out of the box. The out-of-the-box audit trail contains events such as deletion of resources and changes in access control assignments.

Administrative Activity Logs

This is the main category for Azure activity logs. Azure activity logs monitor any write operations to the management plane of Azure Resource Manager and log them in the administrative log category. The operations are logged regardless of whether they were successful or not.

Note

Activity log events are created for unsuccessful write operations, too.

Activity logs consider PUT, POST, and DELETE as write operations to be logged. The activity logs include the following information:
  • The attempted operation and the outcome

  • Identifying information about the user who initiated the operation, such as User Principal Name, authentication methods, and IP address used

  • Additional information, such as a reason for a failed operation

Listing 3-1 illustrates the metadata available in an activity log event. Note that not all of this information is populated when the event is initiated by a service principal.
{
    "claims": {
        "http://schemas.microsoft.com/claims/authnmethodsreferences": "pwd,mfa",
        "ipaddr": "{IP address}",
        "name": "Karl Ots"
    },
    "category": {
        "value": "Administrative"
    },
    "eventTimestamp": "2021-02-14T13:37:59.86869Z",
    "operationName": {
        "value": "microsoft.insights/diagnosticSettings/write",
        "localizedValue": "Create or update resource diagnostic setting"
    }
}
Listing 3-1

A truncated example of an administrative activity log event.
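
The fields shown in Listing 3-1 can be consumed programmatically once the activity log has been exported. The following Python sketch is only an illustration: the function name summarize_event and the sample IP address (taken from the RFC 5737 documentation range) are my own assumptions, and the code relies only on the fields that appear in the listing. It tolerates missing claims, since not all of them are populated for service principals.

```python
import json

# Sample event shaped like Listing 3-1. The IP address is a documentation
# placeholder (RFC 5737), not real data.
SAMPLE_EVENT = """
{
    "claims": {
        "http://schemas.microsoft.com/claims/authnmethodsreferences": "pwd,mfa",
        "ipaddr": "203.0.113.10",
        "name": "Karl Ots"
    },
    "category": {"value": "Administrative"},
    "eventTimestamp": "2021-02-14T13:37:59.86869Z",
    "operationName": {
        "value": "microsoft.insights/diagnosticSettings/write",
        "localizedValue": "Create or update resource diagnostic setting"
    }
}
"""

AUTHN_CLAIM = "http://schemas.microsoft.com/claims/authnmethodsreferences"


def summarize_event(event: dict) -> dict:
    """Extract who did what and when; missing claims become None or empty."""
    claims = event.get("claims", {})
    methods = [m for m in claims.get(AUTHN_CLAIM, "").split(",") if m]
    return {
        "category": event.get("category", {}).get("value"),
        "operation": event.get("operationName", {}).get("value"),
        "timestamp": event.get("eventTimestamp"),
        "user": claims.get("name"),  # may be None for service principals
        "ip_address": claims.get("ipaddr"),
        "auth_methods": methods,
        "used_mfa": "mfa" in methods,
    }


summary = summarize_event(json.loads(SAMPLE_EVENT))
print(summary)
```

A summary like this makes it straightforward to flag, for example, administrative write operations that were performed without multi-factor authentication.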

Activity Logs: Authorization

The administrative category also stores authorization events. The authorization events are stored for all scopes within your subscription (subscription, resource group, resource). The following RBAC authorization events are logged to the activity log in the administrative category:
  • Create role assignment.

  • Delete role assignment.

  • Create or update custom role definition.

  • Delete custom role definition.
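
The four authorization events above can be picked out of exported activity log events by their operation names. The sketch below is a minimal illustration, assuming the events have already been loaded as Python dictionaries shaped like Listing 3-1. The function name rbac_changes, the sample events, and the "caller" values are hypothetical. Operation names are compared case-insensitively because their casing varies between events.

```python
# Operation names for the RBAC events listed above, lowercased for matching.
RBAC_OPERATIONS = {
    "microsoft.authorization/roleassignments/write": "Create role assignment",
    "microsoft.authorization/roleassignments/delete": "Delete role assignment",
    "microsoft.authorization/roledefinitions/write":
        "Create or update custom role definition",
    "microsoft.authorization/roledefinitions/delete":
        "Delete custom role definition",
}


def rbac_changes(events):
    """Return (label, event) pairs for administrative RBAC events."""
    matches = []
    for event in events:
        if event.get("category", {}).get("value") != "Administrative":
            continue
        operation = event.get("operationName", {}).get("value", "").lower()
        label = RBAC_OPERATIONS.get(operation)
        if label:
            matches.append((label, event))
    return matches


# Hypothetical sample events for illustration only.
events = [
    {"category": {"value": "Administrative"},
     "operationName": {"value": "Microsoft.Authorization/roleAssignments/write"},
     "caller": "alice@example.com"},
    {"category": {"value": "Administrative"},
     "operationName": {"value": "microsoft.insights/diagnosticSettings/write"},
     "caller": "bob@example.com"},
    {"category": {"value": "Policy"},
     "operationName": {"value": "Microsoft.Authorization/policies/audit/action"},
     "caller": "policy-engine"},
    {"category": {"value": "Administrative"},
     "operationName": {"value": "Microsoft.Authorization/roleDefinitions/delete"},
     "caller": "carol@example.com"},
]

labels = [label for label, _ in rbac_changes(events)]
print(labels)
```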

Service Health

Events in this category are emitted by Azure service health. These events originate from Microsoft's own monitoring of their infrastructure. As Microsoft is responsible for both the hardware and software (in case of platform as a service), you could compare these events to events emitted from both your datacenter operations team and your managed services team. You can use service health to set up alerts and get notified when there is an Azure service outage that is impacting you. The possible events include
  • Service issues: Problems in the Azure services that affect the service types you are using, in the Azure regions where you are using them. You are informed of service issues in real time. Once the issues have been resolved, you are able to export reports that describe in detail which of your services were impacted, how, and for how long.

  • Planned maintenance: Upcoming virtual machine maintenance that can affect the availability of your services.

  • Health advisories: Changes in Azure services that require actions from you, such as upgrading the middleware frameworks you are using in your Azure App Service application.

  • Security advisories: Security-related notifications or violations that may affect the availability of your Azure services.

In case of any broader outages, Microsoft publishes root cause analyses to Azure service health. Both the RCA and regular service issue reports can be exported as a PDF to be stored outside of the Azure systems. Figure 3-1 illustrates the level of detail available in the report.
Figure 3-1

A service issue report downloaded from Azure service health

Security

This category contains the record of alerts and security incidents generated by Azure Security Center. Alerts are individual security threats that require your attention. Security incidents combine multiple alerts into a single view. Both alerts and incidents describe the impacted resources and align the potential attack to the MITRE ATT&CK matrix.

Policy

This category contains records of all effect action operations performed by the Azure policy engine. Whenever a new Azure Resource Manager request is evaluated against a policy, it is logged into the activity log. Compliance policies that evaluate existing resources are also logged, so you should be wary of signal noise in this activity log category. I recommend you consider the policy activity log events as complementary logs for audit and record-keeping purposes. From a security posture management perspective, you should monitor policies with Azure Security Center.

Deployment History

Resource groups store management activities in the deployment history. The deployment history contains information about the template used for deployment, such as defined resources, used parameters or secrets, and any output values. As deployment history is visible in the Azure Portal to users with reader access or above, it is a best practice to secure any sensitive parameters by using the secure string data type in your templates. By using the secure string type, the values of the parameters are not logged into deployment history, as illustrated in Figure 3-2.
Figure 3-2

Deployment history view in the Azure Portal: the adminPassword parameter is defined as a secure string type
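
One way to act on this best practice is to check templates before they are deployed. The following Python sketch is a simple heuristic, not an official tool: it flags template parameters whose names look sensitive but whose type is not securestring or secureObject. The function name, the list of name hints, and the sample template are all hypothetical and would need tuning for a real environment.

```python
# ARM template parameter types whose values are kept out of deployment history.
SECURE_TYPES = {"securestring", "secureobject"}

# Hypothetical naming hints; adjust these to your own conventions.
SENSITIVE_HINTS = ("password", "secret", "key", "token")


def unsecured_sensitive_parameters(template: dict) -> list:
    """Return names of sensitive-looking parameters that lack a secure type."""
    findings = []
    for name, definition in template.get("parameters", {}).items():
        param_type = definition.get("type", "").lower()
        looks_sensitive = any(hint in name.lower() for hint in SENSITIVE_HINTS)
        if looks_sensitive and param_type not in SECURE_TYPES:
            findings.append(name)
    return sorted(findings)


# Illustrative template fragment; only the parameters section matters here.
template = {
    "contentVersion": "1.0.0.0",
    "parameters": {
        "adminUsername": {"type": "string"},
        "adminPassword": {"type": "securestring"},
        "storageAccountKey": {"type": "string"},
    },
    "resources": [],
}

findings = unsecured_sensitive_parameters(template)
print(findings)
```

In this sample, adminPassword passes the check because it is declared as a secure string, while storageAccountKey is reported.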

Deployment history is not a write-once-read-many (WORM) log store: deployment history items can be manually deleted like any other resources. Deployment history items are stored as metadata in the resource group until they are deleted or the resource group reaches the maximum limit. The maximum number of deployment history items per resource group is 800.

Note

Only Azure Resource Manager deployments provide template history; other infrastructure-as-code methods that do not submit ARM template deployments, such as Terraform, are not trackable in the deployment history. Additionally, the Azure Portal deploys resources using ARM templates, so manual deployments from the portal are logged in the deployment history.

If your resource group has 800 deployments stored in its deployment history, the subsequent deployments will fail. To mitigate this, you can manually delete deployment history items, or you can rely on the automatic deletions feature, introduced in 2020. The automatic deletions feature deletes deployment history items in a first-in, first-out manner. The feature aims to keep the number of deployment history items at around 750, but this number is subject to change. If you do not want to accumulate deployment history at all, you can reuse the same name for your deployments. When you deploy a template with the same name as one in the deployment history, the existing deployment history item is replaced.
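
The first-in, first-out behavior and the effect of reusing a deployment name can be modeled in a few lines. This Python sketch is a simplified simulation and not how the Azure service is implemented: the point at which pruning is triggered is my assumption, and only the limit of 800 and the approximate target of 750 come from the text above.

```python
from collections import deque

MAX_HISTORY = 800     # per-resource-group limit stated in the text
TARGET_HISTORY = 750  # approximate target stated in the text; subject to change


def record_deployment(history: deque, name: str) -> list:
    """Record a deployment and return the names of any pruned history items.

    A deployment that reuses an existing name replaces that item. Otherwise,
    when the history is full, the oldest items are removed first (assumed
    trigger point) so that the history ends up at the target size.
    """
    deleted = []
    if name in history:
        history.remove(name)  # same name: the existing item is replaced
    elif len(history) >= MAX_HISTORY:
        while len(history) >= TARGET_HISTORY:
            deleted.append(history.popleft())  # oldest item goes first
    history.append(name)
    return deleted


# Fill the history to the limit, then add one more uniquely named deployment.
history = deque(f"deploy-{i}" for i in range(MAX_HISTORY))
deleted = record_deployment(history, "deploy-800")
print(len(deleted), len(history))

# Reusing a name does not grow the history.
size_before = len(history)
record_deployment(history, "deploy-800")
print(len(history) == size_before)
```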

By correlating the correlation ID found in your deployment history with the activity logs, you can identify the Azure AD identity used to deploy the template. Listing 3-2 illustrates this.
{
    "correlationId": "7a34bd43-f8aa-46d6-af5e-a37a73e1a3eb",
    "operationId": "a36eefdd-17cb-4abc-a8b7-da79652e121d",
    "operationName": {
        "value": "Microsoft.Resources/deployments/write",
        "localizedValue": "Create Deployment"
    }
}
Listing 3-2

Excerpt from an activity log event which correlates to a deployment history item
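
The correlation described above can be sketched as a simple join on the correlationId field. The following Python example is illustrative only: the flat shape of the deployment record, the function name, and the sample "caller" values are assumptions that are not part of Listing 3-2, and the events are expected to have been exported and loaded as dictionaries beforehand.

```python
DEPLOYMENT_WRITE = "microsoft.resources/deployments/write"


def find_deployment_callers(deployment: dict, activity_events: list) -> list:
    """Return the identities that wrote the deployment with this correlation ID."""
    correlation_id = deployment["correlationId"]
    callers = set()
    for event in activity_events:
        if event.get("correlationId") != correlation_id:
            continue
        operation = event.get("operationName", {}).get("value", "").lower()
        if operation != DEPLOYMENT_WRITE:
            continue
        caller = event.get("caller")  # assumed field holding the identity
        if caller:
            callers.add(caller)
    return sorted(callers)


# Hypothetical deployment history item and activity log events.
deployment = {
    "name": "storage-deployment",
    "correlationId": "7a34bd43-f8aa-46d6-af5e-a37a73e1a3eb",
}
activity_events = [
    {"correlationId": "7a34bd43-f8aa-46d6-af5e-a37a73e1a3eb",
     "operationName": {"value": "Microsoft.Resources/deployments/write"},
     "caller": "deployer@example.com"},
    {"correlationId": "7a34bd43-f8aa-46d6-af5e-a37a73e1a3eb",
     "operationName": {"value": "Microsoft.Storage/storageAccounts/write"},
     "caller": "deployer@example.com"},
    {"correlationId": "00000000-0000-0000-0000-000000000000",
     "operationName": {"value": "Microsoft.Resources/deployments/write"},
     "caller": "someone.else@example.com"},
]

callers = find_deployment_callers(deployment, activity_events)
print(callers)
```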

Azure AD Monitoring

Azure AD provides two main log categories. Activity logs include activities such as user sign-ins and changes made to Azure AD resources (users, groups, roles, etc.). Security logs include correlated information from Azure AD Identity Protection, such as risky sign-in logs and reports for users that are flagged as risky users. Risk profiles from Azure AD Identity Protection can also be used as conditions when creating conditional access policies.

Azure AD Identity Protection correlates user sign-in information with Microsoft's internal and external threat intelligence sources. Some of the risks are evaluated in real time. If a user attempts to log in from an anonymous IP address (such as a Tor network exit node), they are immediately flagged for sign-in risk. Another real-time risk is labeled “unfamiliar sign-in properties,” and it compares properties such as IP address and physical location to the user’s history.

Most of the sign-in risks are calculated offline. These include
  • Atypical travel, which identifies sign-ins from physically distant locations where the user would not have had time to travel across these locations during the time elapsed between the sign-ins.

  • Malware linked IP address, which detects sign-ins from IP addresses that are within the known infected addresses, such as bot networks.

  • Password spray, which detects sign-in attempts using the same password against multiple users, a form of brute-force attack that avoids locking out the targeted user accounts.
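
To make the atypical travel idea concrete, the following Python sketch computes the great-circle distance between two sign-in locations and checks whether the implied travel speed is plausible. This is my own simplified illustration and not Microsoft's detection logic: the speed threshold, the function names, and the sample sign-ins are assumptions.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 900.0  # hypothetical threshold, roughly airliner speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_atypical_travel(first: dict, second: dict) -> bool:
    """True when the user could not plausibly have traveled between sign-ins."""
    distance = haversine_km(first["lat"], first["lon"],
                            second["lat"], second["lon"])
    hours = abs((second["time"] - first["time"]).total_seconds()) / 3600
    if hours == 0:
        return distance > 0
    return distance / hours > MAX_PLAUSIBLE_SPEED_KMH


# Hypothetical sign-ins for one user.
zurich = {"lat": 47.3769, "lon": 8.5417, "time": datetime(2021, 2, 14, 9, 0)}
singapore = {"lat": 1.3521, "lon": 103.8198, "time": datetime(2021, 2, 14, 11, 0)}
geneva = {"lat": 46.2044, "lon": 6.1432, "time": datetime(2021, 2, 14, 12, 0)}

print(is_atypical_travel(zurich, singapore))  # two hours apart, different continents
print(is_atypical_travel(zurich, geneva))     # three hours apart, same country
```

A production detection would also need to account for VPNs and imprecise IP geolocation, which is one reason these risks are calculated offline against richer context.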

Some Azure services allow you to configure Azure AD as their authentication system. These services include Azure App Service, Azure Databricks, Azure Kubernetes Service, and Azure SQL Database.

Note

Whenever your application is integrating with Azure AD, you should monitor Azure AD sign-in logs against that application!

Exercise

You are acting as the security analyst investigating the impact of an ongoing nation-state attack against organizations in your region. You have learned that the adversaries have added additional credentials to service principals with existing elevated privileges2 to gain persistent access. You are tasked to assess whether your environment is impacted. List the steps you need to take and the logs you need to query to find this out.

Bonus question

What are the required privileges for your investigation?

Infrastructure Monitoring

For some Azure services, you’re responsible for securing the virtual machine image. In these cases, you need to collect the logs from your infrastructure into a centrally managed location. Monitoring your Azure-based virtual machines typically requires installing a monitoring agent. There are multiple agents available natively in Azure, and the branding is evolving. Multiple Cloud Workload Protection Platform (CWPP) vendors also offer agent-based monitoring, vulnerability management, and protection solutions that you could use. If you choose to use a third-party agent, you should take network considerations into account in your deployment plans.

At the time this book went to press, the most comprehensive native monitoring agent from a security perspective was the Log Analytics agent. The Log Analytics agent is named Microsoft Monitoring Agent or OMS agent in some documentation pages. For Windows, it is the same agent used by System Center Operations Manager. The Log Analytics agent collects Windows event logs from Windows virtual machines and Syslog from Linux virtual machines.

In addition to log collection, Azure Security Center’s Azure Defender for Servers includes a vulnerability assessment scanner by Qualys. Once installed, the Qualys agent collects artifacts from the host virtual machine and sends them to the Qualys cloud service of your region for analysis. The findings of the vulnerability assessments performed by Qualys are available in Azure Security Center.

If your virtual machine is configured as a Docker host, Azure Security Center provides you with recommendations to fix vulnerabilities in your container configurations. Azure Security Center uses Center for Internet Security (CIS) Docker Benchmark to perform these assessments.

Container images stored in the Azure Container Registry should also be scanned for vulnerabilities. Azure Defender for Container registries uses Qualys to perform the vulnerability scanning. Images are scanned whenever they are pushed to or pulled from the Azure Container Registry.

Application Monitoring

In this section, I describe how you can monitor the security of your applications and data plane.

Centralized Log Architecture

In this section, I present various options for centralized logging. The most suitable centralized logging architecture for your needs varies based on the footprint you have in Azure. Your Azure usage might be split across multiple Azure clouds, Azure Active Directory tenants, and billing accounts.

Enterprise Environment Considerations

The most common scenario includes collecting logs across your organization's Azure subscriptions into a single centralized Log Analytics Workspace. Whether you are collecting all platform, infrastructure, and application logs into this centralized log store varies based on your unique requirements.

Activity log collection can be enabled centrally, if the identity used for enabling the log export has access to the target Azure subscription, as well as the centralized Log Analytics Workspace.

Any logs that require setting up a diagnostic setting require the access to be reversed: the identity creating the diagnostic setting in the target subscription needs to have access to the centralized Log Analytics Workspace at the time the diagnostic setting is created.

Figure 3-3 illustrates the centralized log structure concept. In this example, the centralized log store (Log Analytics Workspace) is deployed to a separate subscription. This enables granular access control, providing access to the centralized security team for security posture management as well as security operations monitoring purposes.
Figure 3-3

Centralized log structure

This centralized log store is then configured to ingest platform logs from sources such as Azure AD, ARM Activity Logs, and Resource Logs. Access control and log retention are set separately for the centralized log store and the application-specific log stores. While key platform and infrastructure logs are stored in the centralized log store, this approach supports more verbose logging through the usage of subscription-specific log stores. I consider at least the following Resource logs relevant for centralized logging purposes:
  • SQL audit logging, such as password resets and security logs

  • App Service antivirus logging

  • Key Vault access logs

  • Web Application Firewall logs

Securing the Centralized Log Store

To secure the centralized Log Analytics Workspace, you should control the public network access for ingestion and queries, as illustrated in Figure 3-4. You should also consider creating a diagnostic setting to audit any access to the centralized Log Analytics Workspace.
Figure 3-4

Network isolation settings for Log Analytics Workspace

Depending on the data stored in your centralized log store and your regulatory requirements, you might need to provide audit logs of your audit log access, too. Log Analytics Workspace can emit query audit logs using diagnostic settings. The resulting records are stored in the LAQueryLogs table.

Note

Remember to keep a balance between usefulness for a company-wide audit and usefulness for application-specific requirements!

Complex Environments

In addition to the global Azure cloud, you might be using more than one Azure sovereign cloud to meet regulatory requirements. Azure US government cloud, Azure cloud in China, and Azure cloud in Germany are the most common examples. These sovereign clouds have their own network and identity parameters. In addition to technical connectivity issues, your regulatory requirements might prevent you from exporting logs from these sovereign clouds to global Azure infrastructure.

Whether or not you are using sovereign clouds, you might be using more than one Azure Active Directory tenant. This could be due to your business requirements, such as past or upcoming acquisitions, or for any number of reasons. If that is the case, you need to build a solution that integrates across the trust boundary of these tenants. You will effectively consume these logs in the same way as you would consume logs from Azure to a SIEM hosted by a third party.

Security Posture Monitoring

In this section, I discuss strategic choices your organization needs to make for security posture monitoring. There are several multi-cloud cloud security posture management (CSPM) vendors3 in the market who offer visibility into a broad set of public cloud environments. Regretfully, most of these focus on only a small set of services that are available across all your cloud platforms. While this approach does give you a single pane of glass, you are only gaining visibility into the least common denominator across the cloud platforms.
Figure 3-5

Most cloud security posture monitoring tools see only a subset of your cloud environment

If your cloud security posture management tool reports 100% compliance against your standards, you need to put this number in context. Without knowing how many of your cloud resources are not covered by the reports, you are effectively only seeing the tip of the proverbial iceberg, and you do not have a complete picture of the potential risks, as illustrated in Figure 3-5.
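
A short calculation shows why the reported number needs context. In this Python sketch, the function name and the resource counts are invented for illustration. It separates the compliance figure a tool reports from the share of your environment the tool actually assessed.

```python
def compliance_in_context(compliant: int, assessed: int, total: int) -> dict:
    """Relate a tool's reported compliance to its coverage of all resources."""
    if total <= 0 or not 0 <= compliant <= assessed <= total:
        raise ValueError("expected 0 <= compliant <= assessed <= total, total > 0")
    return {
        # What the tool shows: compliant resources among those it can see.
        "reported": compliant / assessed if assessed else 0.0,
        # How much of the environment the tool can see at all.
        "coverage": assessed / total,
        # Resources verified as compliant among everything you run.
        "verified": compliant / total,
    }


# Hypothetical numbers: the tool sees 120 of 400 resources, all compliant.
result = compliance_in_context(compliant=120, assessed=120, total=400)
print(result)
```

With these invented numbers, the tool reports 100% compliance while only 30% of the environment has actually been verified.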

When choosing a cloud security posture monitoring approach, you need to align with your cloud strategy. Specifically, you need to understand the range of existing and upcoming cloud services in use within your organization. If you are mostly consuming capacity and services that are similar across cloud vendors, multi-cloud cloud security posture management tools could give you reasonable coverage across your environment. If you are using platform-as-a-service services, you might need to build additional security monitoring capabilities for the individual clouds you are working with anyway.

Security Posture Monitoring Using Azure Security Center

Azure Security Center is an Azure-native tool that helps you monitor your Azure security posture. Azure Security Center comes built into Azure subscriptions, providing a compliance review against Microsoft's set of security best practices, the Azure Security Benchmark (ASB). Azure Security Center can be used by the central security team and by individual application development teams. Azure Security Center's recommendations conform to Azure role-based access control. Therefore, developers and stakeholders see the security posture information of only those resources that they normally have access to.

The central security teams can either use the same Azure-native interface as the application teams do or export the Security Center information to a centralized location. Azure Security Center data can be consumed into custom reports within the Security Center user interface using Azure Monitor workbooks. Instead of the Security Center interface, you can also use existing reporting tools at your disposal. The Continuous Cloud Optimization Power BI Dashboards project4 is an example of such reporting outside the native Security Center interface.

Azure Security Center helps application teams secure their environment according to Microsoft best practices. Azure Security Center displays security recommendations based on the Azure resources in use. Each recommendation provides steps to implement remediating security controls. The proposed security controls are assigned severities (low, medium, high) and contribute to the Microsoft secure score. Implementing a security control with a significant impact on your security posture improves your secure score more than a smaller improvement. Secure score is a way to gamify this federated security posture management. From an operational perspective, it is relatively straightforward to communicate how secure your environment is using the secure score.
Figure 3-6

Screenshot of Azure Security Center recommendations in the Azure Security Center blade of the Azure Portal
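
To illustrate how controls with different weights affect the overall number, the following Python sketch models a weighted score. It is a simplified model and should not be read as the exact Security Center calculation: I assume that each control earns its maximum points in proportion to its share of healthy resources, and that controls without applicable resources are ignored. The control names and numbers are illustrative.

```python
def control_score(max_score: float, healthy: int, unhealthy: int):
    """Points earned by one control, or None if no resources apply to it."""
    total = healthy + unhealthy
    if total == 0:
        return None
    return max_score * healthy / total


def secure_score(controls: list) -> float:
    """Overall score as a percentage of the points that could be earned."""
    earned = 0.0
    possible = 0.0
    for control in controls:
        current = control_score(
            control["max_score"], control["healthy"], control["unhealthy"]
        )
        if current is None:
            continue  # a control with no applicable resources is skipped
        earned += current
        possible += control["max_score"]
    return 100.0 * earned / possible if possible else 0.0


# Illustrative controls and resource counts.
controls = [
    {"name": "Enable MFA", "max_score": 10, "healthy": 8, "unhealthy": 2},
    {"name": "Secure management ports", "max_score": 8, "healthy": 3, "unhealthy": 1},
    {"name": "Apply system updates", "max_score": 6, "healthy": 0, "unhealthy": 5},
    {"name": "Not applicable control", "max_score": 4, "healthy": 0, "unhealthy": 0},
]

score = secure_score(controls)
print(round(score, 1))
```

In this model, fully remediating the control with the highest maximum score raises the overall score more than fully remediating a lower-weighted control, which mirrors the gamification effect described above.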

Since 2021, the standard Security Center policy initiative is the Azure Security Benchmark. Azure Security Benchmark is based on Microsoft best practices and mapped into industry standard security controls, such as the ones from Center for Internet Security (CIS) and the National Institute of Standards and Technology (NIST).

Security Policy Initiatives

In addition to the standard policy initiative, you can use Azure Security Center with your own compliance policy initiatives. These policy initiatives are assigned the same way as Azure policies, that is, to the management group, subscription, and other scopes within Azure Resource Manager. Once assigned, conformance against your security policies can be viewed within the native Azure Security Center, specifically in the compliance blade.

You can use your custom security policy initiative the same way as you would use the default Security Center policy initiative. If you want to operationalize the custom security policy in a similar manner as the standard Security Center policy initiative, you can provide your own remediation steps and severity information.

Microsoft manages a list of built-in security policy initiatives to help you meet industry or regional regulatory compliance. As of the writing of this book, these policy initiatives include, but are not limited to, the following standards:5
  • HIPAA HITRUST

  • UK NHS

  • SWIFT CSP CSCF

  • CMMC

Security Policy Architecture at Scale

Implementing and enforcing security policies across your Azure environment is a complex topic. On one hand, you want to ensure a consistent security posture and visibility across all your applications. On the other hand, you want to provide room for exception management and multiple levels of granularity. Security policy management therefore ends up being a technical exercise of privileged access and Azure policy assignment(s). Microsoft’s best practices are described in the Azure Security Center Enterprise Onboarding Guide.6

Figure 3-7 illustrates the cumulative effects of these policy initiative assignments. In this figure, the Azure Security Benchmark is assigned in the root management group. This ensures that users with privileged access to lower-level management groups and subscriptions are not able to compromise the integrity of the baseline security policy initiative. Next, applicable policies for standards and compliance are assigned in the management group layers according to your governance model. These built-in policy initiatives can include industry-specific standards or regional compliance requirements. Finally, application teams should be empowered to assign custom policies in their respective environments.
Figure 3-7

An example of nested security policy initiative structure
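
The cumulative effect of nested assignments can be modeled by walking the scope hierarchy from a subscription up to the root management group. The Python sketch below is a toy model: the scope names, the hierarchy, and the custom initiative name are hypothetical, and exemptions are deliberately left out.

```python
# Hypothetical scope hierarchy: each scope maps to its parent scope.
PARENT = {
    "sub-app-1": "mg-finance",
    "mg-finance": "mg-root",
    "mg-root": None,
}

# Policy initiatives assigned directly at each scope.
ASSIGNMENTS = {
    "mg-root": ["Azure Security Benchmark"],
    "mg-finance": ["SWIFT CSP CSCF"],
    "sub-app-1": ["App team custom initiative"],
}


def effective_initiatives(scope: str) -> list:
    """Return initiatives in effect at a scope, root-level assignments first."""
    chain = []
    while scope is not None:
        chain.append(scope)
        scope = PARENT[scope]
    effective = []
    for level in reversed(chain):  # start from the root management group
        for initiative in ASSIGNMENTS.get(level, []):
            if initiative not in effective:
                effective.append(initiative)
    return effective


subscription_view = effective_initiatives("sub-app-1")
print(subscription_view)
```

The subscription inherits the baseline and the industry-specific initiative from the management groups above it, in addition to its own custom initiative.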

Change Tracking of Security Policies

Due to the evergreen nature of Azure, and specifically Azure Security Center, the standard security policies assigned to your Azure environment will change over time. You might have regulatory requirements requiring you to keep an audit trail of which policies have been in effect and when.

Microsoft keeps track of policy deprecations and other changes in the Azure Security Center release notes documentation.7 You can use this information as a baseline to meet your change tracking requirements. If you are using a custom policy initiative, you are in control of the changes. In that case, your change tracking information will come from your version control and deployment history of the policy initiative.

A more technical solution for policy initiative change tracking is the AzAdvertizer project.8 This unofficial tool periodically exports built-in Azure policies and policy initiatives from a live Azure subscription. Each policy or initiative is exported in full, so changes can be tracked down to the policy definition level. Beyond the manual change tracking illustrated in Figure 3-8, AzAdvertizer provides an RSS feed of any changes.
Figure 3-8

Changelog of the Azure Security Benchmark policy initiative, as displayed in AzAdvertizer.net
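
If you keep your own periodic exports of a policy initiative, a changelog like the one in Figure 3-8 can be approximated by comparing two snapshots. The following Python sketch is an assumption-laden illustration: the snapshot format (policy definition ID mapped to a version string), the IDs, and the function name are all hypothetical.

```python
def diff_initiative(old: dict, new: dict) -> dict:
    """Compare two snapshots mapping policy definition IDs to versions."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": sorted(
            policy_id for policy_id in old_ids & new_ids
            if old[policy_id] != new[policy_id]
        ),
    }


# Hypothetical snapshots exported at two points in time.
snapshot_january = {"policy-a": "1.0.0", "policy-b": "1.0.0", "policy-c": "2.1.0"}
snapshot_february = {"policy-a": "1.0.0", "policy-b": "1.1.0", "policy-d": "1.0.0"}

changes = diff_initiative(snapshot_january, snapshot_february)
print(changes)
```

Storing each snapshot together with its export date gives you the audit trail of which policies were in effect and when.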

These sources give you a baseline of changes in the policies. But what about the effective scope of these policies? To answer that question, you would need to complement the policy information with any exemptions you might have. Azure Security Center supports exempting resources or even recommendation categories. Exporting this exemption state can be done using Azure Resource Graph.9

Azure Tenant Security Scan

In addition to Security Center recommendations and policies, Microsoft built the Secure DevOps Kit for Azure (AzSK) to automate security scanning across the application life cycle. The Secure DevOps Kit for Azure's security tests used Azure PowerShell to check for misconfigurations. This approach was beneficial for automation purposes and allowed you to control the security of your resources in staging environments, too. In 2021, Microsoft announced the deprecation of the Secure DevOps Kit for Azure, citing advancements in Azure-native security capabilities such as Azure Policy and Security Center.

Its replacement is the Azure Tenant Security Solution (AzTS), built by the same team at Microsoft that built the Secure DevOps Kit for Azure. It provides comparable security scan coverage for continuous assurance but lacks a standalone mode and Azure DevOps pipeline integration. As AzTS uses Azure Security Center, Azure Policy, and Azure Resource Graph, it is scalable to tens of thousands of Azure subscriptions but, as of 2021, still lacks some features that are available in the PowerShell-based approach of the Secure DevOps Kit for Azure.

Summary

In this chapter, you learned that security monitoring at scale in Microsoft Azure is a complex topic. You need to balance different requirements, levels of granularity, storage types, and the signal-to-noise ratio. To succeed in meeting your business requirements, you need to clearly differentiate where centralized logging and monitoring responsibilities end and application team responsibilities begin.

When implemented successfully as part of a comprehensive security architecture, both security posture monitoring and operational security monitoring can provide improved results when compared to on-premises monitoring.
