Modern applications leverage a variety of cloud services and often span IaaS, PaaS, and SaaS. In addition to the renewed complexity of the applications themselves, public cloud environments generate a vast number of signals on their own. Monitoring these environments is different from monitoring traditional systems. For example, instead of locking the perimeter down and monitoring activities within your perimeter, you abide by the limitations of the cloud provider.
In this chapter, I will introduce you to platform, infrastructure, and application security monitoring, and you will learn about the differences between the various log types. After reading this chapter, you will be able to describe end-to-end monitoring in Azure and select appropriate Azure monitoring tools for your environment.
Traditionally, your applications are hosted in an infrastructure with a varying level of quality and access to monitoring signals. When you host your own infrastructure, you have the opportunity to combine data across your hosting environment. This data could include CCTV feeds, physical access logs in and out of the datacenter building, HVAC systems, network components, as well as the health of your hardware, operating systems, and applications. Integrating monitoring data from all these systems would be costly, but not impossible. After taking on such an exercise, you would be faced with a massive number of signals to monitor. Identifying any security incidents or events would be like trying to find a needle in a haystack.
This is different in an environment where some or all the parts are outsourced to a third party. If your datacenter is co-located with other customers, you will not have unlimited access to the physical security information. Similarly, if you are consuming compute services from a hosting provider, they are not necessarily able to share the hypervisor-level monitoring data, because they need to protect their other customers. In some cases, the company providing you with physical infrastructure and the company managing your servers might even be competitors, and it will not be in their best interest to share everything in a transparent manner.
All this changes with the move to public cloud. Because everything in Microsoft Azure is software defined, you have signals available to you from across your hosting platform. The signals are available to you in a standardized manner, which makes it easier to correlate across signal types. What's more, as Microsoft is responsible for the physical and host security, they provide you with reports with pre-correlated data and, in some cases, even alerts. Microsoft is constantly analyzing any malicious activity against their infrastructure. This lets them provide you with anonymized security intelligence information, even when your environment is not yet being attacked.
Each Azure subscription automatically stores platform-level logs as activity logs. The activity logs are immutable and stored for 90 days. They can be further exported using diagnostic settings, ensuring longer retention. The activity log schema groups the log events into categories such as administrative, service health, and security. For a full and up-to-date list of categories, please refer to the activity log schema.1
As the activity logs are immutable, you get a reasonable audit trail of key platform events out of the box. The out-of-the-box audit trail contains events such as deletion of resources and changes in access control assignments.
Administrative Activity Logs
This is the main category for Azure activity logs. Azure activity logs monitor any write operations to the management plane of Azure Resource Manager and log them in the administrative log category. The operations are logged regardless of whether they were successful or not.
Activity log events are created for unsuccessful write operations, too.
The attempted operation and the outcome
Identifying information about the user who initiated the operation, such as User Principal Name, authentication methods, and IP address used
Additional information, such as a reason for a failed operation
A truncated example of an administrative activity log event.
Activity Logs: Authorization
Create role assignment.
Delete role assignment.
Create or update custom role definition.
Delete custom role definition.
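To make the four authorization operations above more concrete, the following minimal sketch filters a list of activity log events for changes in access control. The operation names are the Azure Resource Manager operations behind role assignments and role definitions. The flattened field names (operationName, caller, status, resourceId) and the sample events are assumptions for illustration; the exact shape of an event depends on how you export the logs.

```python
# Azure Resource Manager operation names for the four authorization changes.
AUTHORIZATION_OPERATIONS = {
    "Microsoft.Authorization/roleAssignments/write": "Create role assignment",
    "Microsoft.Authorization/roleAssignments/delete": "Delete role assignment",
    "Microsoft.Authorization/roleDefinitions/write": "Create or update custom role definition",
    "Microsoft.Authorization/roleDefinitions/delete": "Delete custom role definition",
}

# Lower-cased lookup, because the casing of operation names is not
# guaranteed to be consistent between log destinations.
_LOOKUP = {name.lower(): label for name, label in AUTHORIZATION_OPERATIONS.items()}


def authorization_changes(events):
    """Return a short summary of every event that changes access control."""
    summaries = []
    for event in events:
        label = _LOOKUP.get(str(event.get("operationName", "")).lower())
        if label is None:
            continue  # not an authorization operation
        summaries.append({
            "change": label,
            "caller": event.get("caller"),
            "status": event.get("status"),  # failed attempts are logged too
            "resourceId": event.get("resourceId"),
        })
    return summaries


# Hypothetical, heavily simplified events (assumed field names and values).
sample_events = [
    {"operationName": "Microsoft.Authorization/roleAssignments/write",
     "caller": "alice@example.com", "status": "Succeeded",
     "resourceId": "/subscriptions/<subscription-id>"},
    {"operationName": "MICROSOFT.AUTHORIZATION/ROLEDEFINITIONS/DELETE",
     "caller": "bob@example.com", "status": "Failed",
     "resourceId": "/subscriptions/<subscription-id>"},
    {"operationName": "Microsoft.Compute/virtualMachines/write",
     "caller": "alice@example.com", "status": "Succeeded",
     "resourceId": "/subscriptions/<subscription-id>"},
]

for change in authorization_changes(sample_events):
    print(change["change"], "by", change["caller"], "-", change["status"])
```

Note that the sketch keeps failed operations in the result on purpose: an unsuccessful attempt to delete a role definition can be as interesting to an investigator as a successful one.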
Service issues: Problems in the Azure services that affect the service types you are using, in the Azure regions where you are using them. You are informed of service issues in real time. Once the issues have been resolved, you are able to export the reports that describe in detail which of your services were impacted, how, and for how long.
Planned maintenance: Upcoming virtual machine maintenance that can affect the availability of your services.
Health advisories: Changes in Azure services that require actions from you, such as upgrading the middleware frameworks you are using in your Azure App Service application.
Security advisories: Security-related notifications or violations that may affect the availability of your Azure services.
This category contains the record of alerts and security incidents generated by Azure Security Center. Alerts are individual security threats that require your attention. Security incidents combine multiple alerts into a single view. Both alerts and incidents describe the impacted resources, as well as align the potential attack with the MITRE ATT&CK matrix.
This category contains records of all effect action operations performed by the Azure policy engine. Whenever a new Azure Resource Manager request is evaluated against a policy, it is logged into the activity log. Compliance policies that evaluate existing resources are also logged, so you should be wary of signal noise in this activity log category. I recommend you consider the policy activity log events as complementary logs for audit and record-keeping purposes. From a security posture management perspective, you should monitor policies with Azure Security Center.
Deployment history is not a write-once-read-many (WORM) log store: deployment history events can be manually deleted like any other resources. Deployment history is stored as metadata in the resource group until it is deleted or the resource group reaches the maximum limit. The maximum number of deployment history items per resource group is 800.
Only Azure Resource Manager deployments provide template history; other infrastructure-as-code methods that use the Azure CLI, such as Terraform, are not trackable in the deployment history. Additionally, the Azure Portal deploys resources using ARM templates, so manual deployments from the portal are logged in the deployment history.
If your resource group has 800 deployments stored in its deployment history, the subsequent deployments will fail. To mitigate this, you can manually delete deployment history items, or you can rely on the automatic deletions feature, introduced in 2020. The automatic deletions feature deletes deployment history items in a first-in, first-out manner. The feature aims to keep the number of deployment history items at around 750, but this number is subject to change. If you do not want to use deployment history at all, you can override the name of your deployments. When you deploy a template with the same name as one in the deployment history, the existing deployment history item is replaced.
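The behavior described above can be illustrated with a small toy model. This is only a sketch of the rules stated in the text (a limit of 800 items, first-in, first-out automatic deletion down to around 750, and replacement when a name is reused); it is not how Azure implements the feature. The class name and the exact point at which the automatic deletion kicks in are assumptions.

```python
from collections import OrderedDict

HISTORY_LIMIT = 800  # maximum items per resource group, from the text
TRIM_TARGET = 750    # approximate size after automatic deletion, from the text


class DeploymentHistory:
    """Toy model of the deployment history of a single resource group."""

    def __init__(self, auto_delete=True):
        self.auto_delete = auto_delete
        self.items = OrderedDict()  # deployment name -> template, oldest first

    def deploy(self, name, template):
        if name in self.items:
            # Reusing a deployment name replaces the existing history item.
            del self.items[name]
        elif len(self.items) >= HISTORY_LIMIT:
            if not self.auto_delete:
                raise RuntimeError("deployment history is full")
            # Assumed trigger: trim once the limit is reached, oldest first,
            # so that the history holds TRIM_TARGET items after this deployment.
            while len(self.items) >= TRIM_TARGET:
                self.items.popitem(last=False)
        self.items[name] = template


# Unique names fill the history until the automatic deletion trims it.
history = DeploymentHistory()
for number in range(801):
    history.deploy(f"deployment-{number}", template="main.json")
print(len(history.items))  # 750 in this model

# A fixed name keeps a single history item, however often you deploy.
fixed = DeploymentHistory()
for number in range(1000):
    fixed.deploy("nightly", template="main.json")
print(len(fixed.items))  # 1
```

The second half of the example shows the trade-off of overriding the deployment name: the history never fills up, but you also lose the record of every deployment except the latest one.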
Excerpt from an activity log event which correlates to a deployment history item
Azure AD Monitoring
Azure AD provides two main log categories. Activity logs include activities such as user sign-ins and changes made to Azure AD resources (users, groups, roles, etc.). Security logs include correlated information from Azure AD Identity Protection. Security logs include risky sign-in logs and reports for users that are flagged as risky users. Risk profiles from Azure AD Identity Protection can also be used as conditions when creating conditional access policies.
Azure AD Identity Protection correlates user sign-in information with Microsoft's internal and external threat intelligence sources. Some of the risks are evaluated in real time. If a user attempts to log in from an anonymous IP address (such as a Tor network exit node), they are immediately flagged for sign-in risk. Another real-time risk is labeled “unfamiliar sign-in properties,” and it compares properties such as IP address and physical location to the user’s history.
Atypical travel, which identifies sign-ins from physically distant locations where the user would not have had time to travel across these locations during the time elapsed between the sign-ins.
Malware-linked IP address, which detects sign-ins from IP addresses that are known to be infected, such as addresses belonging to bot networks.
Password spray, which detects sign-in attempts using the same password against multiple users, to perform a brute-force attack while avoiding locking out the targeted user accounts.
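To illustrate the reasoning behind a detection such as atypical travel, here is a simplified sketch. It calculates the great-circle distance between two sign-in locations and checks whether the implied travel speed is plausible. This is not Microsoft's detection logic, which is not described in this chapter; the speed threshold, the field names, and the sample sign-ins are all assumptions.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 900.0  # assumption: roughly airliner cruising speed


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_atypical_travel(first, second):
    """Return True if the speed implied by two sign-ins is implausible."""
    km = distance_km(first["lat"], first["lon"], second["lat"], second["lon"])
    hours = abs((second["time"] - first["time"]).total_seconds()) / 3600
    if hours == 0:
        return km > 0  # two places at the same instant
    return km / hours > MAX_PLAUSIBLE_SPEED_KMH


# Hypothetical sign-ins for the same user (approximate city coordinates).
helsinki = {"lat": 60.17, "lon": 24.94, "time": datetime(2021, 5, 3, 8, 0)}
stockholm = {"lat": 59.33, "lon": 18.07, "time": datetime(2021, 5, 3, 10, 0)}
sydney = {"lat": -33.87, "lon": 151.21, "time": datetime(2021, 5, 3, 10, 0)}

print(is_atypical_travel(helsinki, stockholm))  # False: a short flight is enough
print(is_atypical_travel(helsinki, sydney))     # True: not possible in two hours
```

A real detection has to account for much more than this, for example VPNs and the locations that are normal for the organization, which is one reason to rely on the correlated risk information instead of building it yourself.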
Some Azure services allow you to configure Azure AD as their authentication system. These services include Azure App Service, Azure Databricks, Azure Kubernetes Service, and Azure SQL database.
Whenever your application is integrating with Azure AD, you should monitor Azure AD sign-in logs against that application!
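As a starting point for this kind of monitoring, the following sketch counts failed sign-ins per application from a list of sign-in records. The property names (appDisplayName, userPrincipalName, and status with an errorCode) follow the sign-in records exposed by Microsoft Graph as I understand them, where an error code of 0 means a successful sign-in. The sample records and the application names are made up for illustration.

```python
from collections import Counter


def failed_sign_ins_by_app(sign_ins):
    """Count failed sign-ins per application display name."""
    failures = Counter()
    for record in sign_ins:
        # An errorCode of 0 indicates a successful sign-in.
        if record["status"]["errorCode"] != 0:
            failures[record["appDisplayName"]] += 1
    return failures


# Hypothetical sign-in records, reduced to the fields used above.
sample_sign_ins = [
    {"appDisplayName": "Payroll API", "userPrincipalName": "alice@example.com",
     "status": {"errorCode": 0}},
    {"appDisplayName": "Payroll API", "userPrincipalName": "bob@example.com",
     "status": {"errorCode": 50126}},  # invalid username or password
    {"appDisplayName": "Payroll API", "userPrincipalName": "carol@example.com",
     "status": {"errorCode": 50126}},
    {"appDisplayName": "Intranet", "userPrincipalName": "alice@example.com",
     "status": {"errorCode": 0}},
]

for app, count in failed_sign_ins_by_app(sample_sign_ins).most_common():
    print(f"{app}: {count} failed sign-ins")
```

In practice you would run an equivalent query where the sign-in logs are stored and alert on unusual changes, rather than on the raw number of failures.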
You are acting as the security analyst investigating the impact of an ongoing nation-state attack against organizations in your region. You have learned that the adversaries have added additional credentials for service principals with existing elevated privileges2 to gain persistent access. You are tasked to assess whether your environment is impacted. List the steps you need to take and the logs you need to query to find this out.
What are the required privileges for your investigation?
For some Azure services, you’re responsible for securing the virtual machine image. In these cases, you need to consume the logs from your infrastructure into a centrally managed location. Monitoring your Azure-based virtual machines typically requires installing a monitoring agent. There are multiple agents available natively in Azure, and the branding is evolving. Multiple Cloud Workload Protection Platform (CWPP) vendors also offer agent-based monitoring, vulnerability management, and protection solutions that you could use. If you choose to use a third-party agent, you should take network considerations into account in your deployment plans.
At the time this book went to press, the most comprehensive native monitoring agent from a security perspective is the Log Analytics agent. The Log Analytics agent is named Microsoft Monitoring Agent or OMS agent in some documentation pages. For Windows, it is the same agent used by System Center Operations Manager. The Log Analytics agent collects Windows event logs from Windows virtual machines and Syslog from Linux virtual machines.
In addition to log collection, Azure Security Center’s Azure Defender for Servers includes a vulnerability assessment scanner by Qualys. Once installed, the Qualys agent collects artifacts from the host virtual machine and sends them to the Qualys cloud service of your region for analysis. The findings of the vulnerability assessments performed by Qualys are available in Azure Security Center.
If your virtual machine is configured as a Docker host, Azure Security Center provides you with recommendations to fix vulnerabilities in your container configurations. Azure Security Center uses Center for Internet Security (CIS) Docker Benchmark to perform these assessments.
Container images stored in the Azure Container Registry should also be scanned for vulnerabilities. Azure Defender for Container registries uses Qualys to perform the vulnerability scanning. Images are scanned whenever they are pushed to or pulled from the Azure Container Registry.
In this section, I describe how you can monitor the security of your applications and data plane.
Centralized Log Architecture
In this section, I present various options for centralized logging. The most suitable centralized logging architecture for your needs varies based on the footprint you have in Azure. Your Azure usage might be split across multiple Azure clouds, Azure Active Directory tenants, and billing accounts.
Enterprise Environment Considerations
The most common scenario includes collecting logs across your organization's Azure subscriptions into a single centralized Log Analytics Workspace. Whether you are collecting all platform, infrastructure, and application logs into this centralized log store varies based on your unique requirements.
Activity log collection can be enabled centrally, if the identity used for enabling the log export has access to the target Azure subscription, as well as the centralized Log Analytics Workspace.
Any logs that require setting up a diagnostic setting require the access to be reversed. The identity creating the diagnostic setting in the target subscription needs to have access to the centralized Log Analytics Workspace at the time the diagnostic setting is created.
SQL audit logging, such as password resets and security logs
App Service antivirus logging
Key Vault access logs
Web Application Firewall logs
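A diagnostic setting is ultimately a small piece of configuration that names the destination and the log categories to send there. The sketch below builds the kind of request body that a diagnostic setting targeting a Log Analytics Workspace uses. The helper function, the placeholder workspace resource ID, and the choice of the AuditEvent category (the Key Vault audit category as I understand it) are assumptions for illustration; check the log categories supported by each resource type before using them.

```python
import json


def diagnostic_setting_body(workspace_resource_id, log_categories):
    """Build the body of a diagnostic setting that targets a workspace."""
    return {
        "properties": {
            # Resource ID of the centralized Log Analytics Workspace.
            "workspaceId": workspace_resource_id,
            # One entry per log category to export.
            "logs": [
                {"category": category, "enabled": True}
                for category in log_categories
            ],
        }
    }


# Placeholder resource ID; replace the bracketed parts with your own values.
central_workspace = (
    "/subscriptions/<central-subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)

body = diagnostic_setting_body(central_workspace, ["AuditEvent"])
print(json.dumps(body, indent=2))
```

Because the body references the workspace by its resource ID, whoever submits it needs access to both sides, which is the access requirement described earlier in this section.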
Securing the Centralized Log Store
Depending on data stored in your centralized log store and your regulatory requirements, you might need to provide audit logs of your audit log access too. Log Analytics Workspace can emit audit logs using diagnostic settings. The specific log type is labeled LAQueryAuditLogs.
Remember to keep a balance between usefulness for a company-wide audit and usefulness for application-specific requirements!
In addition to the global Azure cloud, you might be using more than one Azure sovereign cloud to meet regulatory requirements. Azure US government cloud, Azure cloud in China, and Azure cloud in Germany are the most common examples. These sovereign clouds have their own network and identity parameters. In addition to technical connectivity issues, your regulatory requirements might prevent you from exporting logs from these sovereign clouds to global Azure infrastructure.
Whether or not you are using sovereign clouds, you might be using more than one Azure Active Directory tenant. This could be due to your business requirements, such as past or upcoming acquisitions, or any number of other reasons. If that is the case, you need to build a solution that integrates across the trust boundary of these tenants. You will effectively consume these logs in the same way as you would consume logs from Azure to a SIEM hosted by a third party.
Security Posture Monitoring
If your cloud security posture management tool reports 100% compliance against your standards, you need to put this number in context. Without knowing how many of your cloud resources are not covered within the reports, you are effectively only seeing the tip of that proverbial iceberg, and you do not have a complete picture of the potential risks, as illustrated in Figure 3-5.
When choosing a cloud security posture monitoring approach, you need to align with your cloud strategy. Specifically, you need to understand the range of existing and upcoming cloud services in use within your organization. If you are mostly consuming capacity and services that are similar across cloud vendors, multi-cloud security posture management tools could give you reasonable coverage across your environment. If you are using platform-as-a-service services, you might need to build additional security monitoring capabilities for the individual clouds you are working with anyway.
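A short worked example shows why the compliance percentage alone can be misleading. The sketch below separates the compliance rate (among the resources that were assessed) from the coverage (how much of the estate was assessed at all). The function name and the numbers are hypothetical.

```python
def posture_summary(total_resources, assessed_resources, compliant_resources):
    """Put a reported compliance percentage into the context of coverage."""
    if assessed_resources == 0 or total_resources == 0:
        raise ValueError("at least one resource must exist and be assessed")
    return {
        # What the tool reports: compliant share of the assessed resources.
        "compliance": compliant_resources / assessed_resources,
        # What the tool can see: assessed share of all resources.
        "coverage": assessed_resources / total_resources,
        # What you actually know: compliant share of all resources.
        "known_compliant": compliant_resources / total_resources,
    }


# Hypothetical estate: 1,000 resources, of which the tool assesses only 400.
summary = posture_summary(total_resources=1000,
                          assessed_resources=400,
                          compliant_resources=400)
print(f"Reported compliance: {summary['compliance']:.0%}")      # 100%
print(f"Coverage:            {summary['coverage']:.0%}")        # 40%
print(f"Known compliant:     {summary['known_compliant']:.0%}")  # 40%
```

In this made-up case, the dashboard shows 100% while the state of 600 resources is simply unknown.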
Security Posture Monitoring Using Azure Security Center
Azure Security Center is an Azure-native tool that helps you monitor your Azure security posture. Azure Security Center comes built into Azure subscriptions, providing a compliance review against Microsoft's set of security best practices, the Azure Security Benchmark (ASB). Azure Security Center can be used by the central security team and by individual application development teams. Azure Security Center's recommendations conform to Azure role-based access control. Therefore, developers and stakeholders see the security posture information of the resources that they normally have access to.
The central security teams can either use the same Azure-native interface as the application teams do or export the Security Center information to a centralized location. Azure Security Center data can be consumed into custom reports within the Security Center user interface using Azure Monitor workbooks. Instead of the Security Center, you can also use existing reporting tools at your disposal. The Continuous Cloud Optimization Power BI Dashboards project4 is an example of such reporting outside the native Security Center interface.
Since 2021, the standard Security Center policy initiative is the Azure Security Benchmark. Azure Security Benchmark is based on Microsoft best practices and mapped into industry standard security controls, such as the ones from Center for Internet Security (CIS) and the National Institute of Standards and Technology (NIST).
Security Policy Initiatives
In addition to the standard policy initiative, you can use Azure Security Center with your own compliance policy initiatives. These policy initiatives are assigned the same way as Azure policies, that is, to the management group, subscription, and other scopes within Azure Resource Manager. Once assigned, conformance against your security policies can be viewed within the native Azure Security Center, specifically in the compliance blade.
You can use your custom security policy the same way as the default Security Center policy initiative. If you want to operationalize the custom security policy in a similar manner as the standard Security Center policy initiative, you can provide your own remediation steps and severity information.
SWIFT CSP CSCF
Security Policy Architecture at Scale
Implementing and enforcing security policies across your Azure environment is a complex topic. On one hand, you want to ensure a consistent security posture and visibility across all your applications. On the other hand, you want to provide room for exception management and multiple levels of granularity. Security policy management therefore ends up being a technical exercise of privileged access and Azure policy assignment(s). Microsoft’s best practices are described in the Azure Security Center Enterprise Onboarding Guide.6
Change Tracking of Security Policies
Due to the evergreen nature of Azure, and specifically Azure Security Center, the standard security policies assigned to your Azure environment will change over time. You might have regulatory requirements requiring you to keep an audit trail of which policies have been in effect and when.
Microsoft keeps track of policy deprecations and other changes in the Azure Security Center release notes documentation.7 You can use this information as a baseline to meet your change tracking requirements. If you are using a custom policy initiative, you are in control of the changes. In that case, your change tracking information will come from your version control and deployment history of the policy initiative.
These sources give you a baseline of changes in the policies. But what about the effective scope of these policies? To answer that question, you would need to complement the policy information with any exceptions you might have. Azure Security Center supports exempting resources or even recommendation categories. Exporting this exemption state can be done using Azure Resource Graph.9
Azure Tenant Security Scan
In addition to Security Center recommendations and policies, Microsoft built the Secure DevOps kit for Azure (AzSK) to automate security scanning across the application life cycle. The security tests of the Secure DevOps kit for Azure used Azure PowerShell to check for misconfigurations. This approach was beneficial for automation purposes and allowed you to control the security of your resources in staging environments, too. In 2021, Microsoft announced the deprecation of the Secure DevOps kit for Azure, citing advancements in Azure-native security capabilities such as Azure Policy and Security Center.
The replacement is the Azure Tenant Security Solution (AzTS). The Azure Tenant Security Solution is built by the same team in Microsoft that built the Secure DevOps kit for Azure. It provides comparable security scan coverage for continuous assurance but lacks a standalone mode and Azure DevOps pipeline integration. As the AzTS uses Azure Security Center, Azure Policy, and Azure Resource Graph, it is scalable to tens of thousands of Azure subscriptions but, as of 2021, still lacks some features that are available in the PowerShell-based approach of the Secure DevOps kit for Azure.
In this chapter, you learned that security monitoring at scale in Microsoft Azure is a complex topic. You need to balance between different requirements, levels of granularity, storage types, and the signal-to-noise ratio. To succeed in meeting your business requirements, you need to clearly differentiate where centralized logging and monitoring responsibilities end and where application team responsibilities begin.
When implemented successfully as part of a comprehensive security architecture, both security posture monitoring and operational security monitoring can provide improved results when compared to on-premises monitoring.