In this chapter, we discuss protecting infrastructure-as-a-service workloads in Azure. After reading this chapter, you will be able to select and implement appropriate controls for securing infrastructure-as-a-service workloads as part of your organization’s security policy framework.
While you can certainly replicate your on-premises datacenters into Azure, the cloud’s distributed and software-defined nature means that your experience will be different, no matter how well you execute. To manage the differences in performance and cost, you will need to understand what you are paying for when deploying Azure services. For example, an Azure virtual machine is more than just physical hardware: it also includes highly available deployment, host updates, storage, and networking capabilities. To benefit from your existing expertise, processes, and licensing, I recommend you apply those in a centrally managed infrastructure-as-a-service environment.
This chapter focuses on securing infrastructure-as-a-service capabilities in a self-service manner, where application development teams can deploy and configure their own virtual machines and other services, while relying minimally on a centralized team for configuration and management. This approach is attuned to the cloud-native approach of shifting security responsibilities left.
If you are planning to build a centrally managed shared service that emulates the capabilities and service levels of your on-premises environment, you will still benefit from the structure of this chapter. You might solve some of the topics differently from my recommendations, however.
Specifically, your network topology, choice of firewall technologies, and update management operations might vary.
Administrative access to Azure virtual machines is controlled using Azure role-based access control, virtual machine extensions, and the various options for user access for logging in to the virtual machine resources.
Azure Role-Based Access Control
Management plane access to Azure virtual machines is controlled using role-based access control. The Contributor and Virtual Machine Contributor roles grant access to create and manage virtual machines, disks, snapshots, and extensions.
The Virtual Machine Contributor role is slightly more limited than Contributor, limiting access on virtual network and storage resources. You should use Virtual Machine Contributor when you want to grant your application teams access to create virtual machines on their own and let them use existing network resources. If you want to let them only configure the virtual machine but not create a new one, you should grant the role in the virtual machine resource scope.
The Contributor and Virtual Machine Contributor roles grant the user access to reset or create local administrator passwords by using the VMAccess virtual machine extension.
For operational access in shared environments, you should assign the access according to the least privilege principle. To assign access to forensic investigators, use the Disk Snapshot Contributor role, which lets the user manage virtual machine snapshots, but does not grant access to modify, restart, or delete the virtual machine resource itself. To assign access to perform backup and restore operations, you should use the Disk Snapshot Contributor, Disk Restore Operator, Site Recovery Contributor, and Site Recovery Operator roles.
You can use several tools to automate your Azure virtual machine guest operating system configuration management. If you build your own virtual machine images, you can include any configuration management tools you would like to. If you use images from the Azure Marketplace, however, you will need to take the virtual machine onboarding into account when planning your automation.
Whether you are using an agent-based automation tool, such as Chef or Puppet, or an agentless automation tool, such as Ansible, you can onboard your virtual machines using Azure virtual machine extensions. Some configuration management tools are provided as native extensions, but for others, you can use Azure Custom Script Extension to download and execute configuration scripts in the virtual machines.
If you perform the extension installation post-deployment, the user installing the extension will require Virtual Machine Contributor access to the virtual machine resource. You can also include the extension installation as part of your infrastructure-as-code deployment. In that case, your continuous deployment pipeline will have to have the same Azure RBAC access.
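As a minimal sketch of the infrastructure-as-code route, the following Python builds the template resource that would attach the Linux Custom Script Extension to a virtual machine. The virtual machine name, script URL, command, extension name, and apiVersion are assumptions chosen for illustration; verify the publisher, type, and handler version against the extension documentation for your operating system before use.

```python
import json


def custom_script_extension(vm_name: str, location: str,
                            script_url: str, command: str) -> dict:
    """Build a template resource for the Linux Custom Script Extension.

    The extension downloads `script_url` into the virtual machine and
    then runs `command`, for example to install a configuration
    management agent.
    """
    return {
        "type": "Microsoft.Compute/virtualMachines/extensions",
        "apiVersion": "2021-03-01",  # assumption: pick a version you have validated
        "name": f"{vm_name}/onboard-config-management",  # hypothetical extension name
        "location": location,
        "properties": {
            "publisher": "Microsoft.Azure.Extensions",
            "type": "CustomScript",
            "typeHandlerVersion": "2.1",
            "autoUpgradeMinorVersion": True,
            # Files to download into the virtual machine.
            "settings": {"fileUris": [script_url]},
            # Kept in protected settings so the command line is encrypted.
            "protectedSettings": {"commandToExecute": command},
        },
    }


# Hypothetical values for illustration only.
extension = custom_script_extension(
    vm_name="app-vm-01",
    location="westeurope",
    script_url="https://example.com/bootstrap.sh",
    command="sh bootstrap.sh",
)
print(json.dumps(extension, indent=2))
```

Whichever identity deploys this resource, whether a user or a pipeline, needs the Azure RBAC access described above.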
Virtual Machine Login Access
The data-plane access for virtual machines is controlled using virtual machine login access control.
Azure allows users to log in using a local administrator account by default. While the Azure Portal experience prevents you from using some of the most common usernames, such as admin or root, a local administrator account still exposes you to brute-force and password-spray attacks.
The Azure-native way of managing login access to virtual machines is to use Azure Active Directory login. With Azure Active Directory login, authentication is performed using Azure Active Directory and authorization using Azure role-based access control. In practice, Azure Active Directory login is implemented as a virtual machine extension, which installs and manages the required packages on your Linux or Windows virtual machines.
Virtual Machine Administrator Login: This role grants access with administrative privileges, including the ability to elevate privileges with the sudo command in Linux virtual machines.
Virtual Machine User Login: This role grants access with user privileges.
When your users have one of the earlier-mentioned Azure role-based access control roles assigned, they can log in to the virtual machines using Remote Desktop Protocol (Windows). To log in to Linux virtual machines, they will need to use the az ssh extension of Azure Command-Line Interface. After a successful authentication, standard SSH clients can be used.
To support a broader set of existing workloads, you might want to use Azure Active Directory Domain Services (Azure AD DS) or traditional Active Directory Domain Services (AD DS) to control login access to your virtual machine. If you have extended your Active Directory to the cloud by hosting domain controllers in Azure, you can join your virtual machines to the domain and continue using your established login methods. This requires some considerable network planning, however, so this is mostly suitable for centrally managed infrastructure-as-a-service scenarios. For self-managed virtual machines, it is recommended to use Azure Active Directory login.
Network access to the virtual machine management ports (RDP and SSH) is unrestricted by default: virtual machines with public IP addresses can be accessed remotely by anyone. To protect your virtual machines from port scanning, brute-force logins, and other threats that arise from open Internet access, you can put multiple controls in place.
Self-Managed Virtual Machines
First, you should only use private IP addresses for your virtual machines. This recommendation applies regardless of whether you are securing self-managed virtual machines or virtual machines in centrally managed shared services.
If you are not able to use private IP addresses, you should use Azure Security Center just-in-time access (JIT) to reduce the public exposure of your virtual machine management ports. Once configured, Azure Security Center JIT enforces that inbound traffic is denied. To open management port access from their network location, users need to perform an activation in the Azure Portal, which authorizes the user, audits the access, and opens the management ports for a predetermined time to the IP addresses defined by the user requesting access.
Finally, you can implement adaptive network hardening1 to let Azure Security Center automatically monitor and manage your network security group rules.
Azure point-to-site VPN supports most client operating systems. The point-to-site VPN access can be authenticated using Azure AD (Windows 10), RADIUS, or certificate authentication.
As an alternative to client-based point-to-site VPN, you can use Azure Bastion. Azure Bastion provides an HTML5-based web client that is streamed to your administrative users through the Azure Portal. Azure Bastion is a jumpbox service that is deployed into a separate subnet in your virtual network, just like the VPN gateway. Azure Bastion is managed and hardened by Microsoft, as a service.
Centrally Managed Virtual Machines
Web Application Access
Based on your cloud strategy and available application modernization investments, you might not be able to re-platform or re-architect your applications to be suitable for hosting in platform as a service or container as a service. Therefore, in addition to providing administrative access, application workloads hosted in your virtual machines might need to be directly accessible to their users.
Monitoring and Detection
Administrative operations, such as deleting, shutting down, or restarting the virtual machine
Role-based access control changes (roleAssignments/write)
Data-plane access control changes, such as changes to SSH keys or using the VMAccess extension to reset local administrator credentials
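The events listed above can be sketched as a simple filter over exported activity log records. This is an illustrative sketch, not a detection rule: the event dictionaries are a simplified, hypothetical shape (real activity log records carry many more fields), and the match on extension writes flags every extension change, of which VMAccess is only one case.

```python
# Resource provider operations that correspond to the events worth monitoring.
SENSITIVE_OPERATIONS = {
    "Microsoft.Compute/virtualMachines/delete": "administrative operation",
    "Microsoft.Compute/virtualMachines/deallocate/action": "administrative operation",
    "Microsoft.Compute/virtualMachines/powerOff/action": "administrative operation",
    "Microsoft.Compute/virtualMachines/restart/action": "administrative operation",
    "Microsoft.Authorization/roleAssignments/write": "access control change",
    # Any extension write, including VMAccess credential resets.
    "Microsoft.Compute/virtualMachines/extensions/write": "extension change",
}

# Operation names can appear in different letter case depending on the
# export path, so compare case-insensitively.
_LOOKUP = {name.lower(): category for name, category in SENSITIVE_OPERATIONS.items()}


def flag_sensitive_events(events):
    """Return (category, caller, operation) for every event worth reviewing."""
    flagged = []
    for event in events:
        operation = event.get("operationName", "")
        category = _LOOKUP.get(operation.lower())
        if category is not None:
            flagged.append((category, event.get("caller", "unknown"), operation))
    return flagged


# Hypothetical sample records for illustration only.
sample_events = [
    {"operationName": "MICROSOFT.COMPUTE/VIRTUALMACHINES/RESTART/ACTION",
     "caller": "alice@example.com"},
    {"operationName": "Microsoft.Compute/virtualMachines/read",
     "caller": "bob@example.com"},
    {"operationName": "Microsoft.Authorization/roleAssignments/write",
     "caller": "carol@example.com"},
]

for category, caller, operation in flag_sensitive_events(sample_events):
    print(f"{category}: {operation} by {caller}")
```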
Azure virtual machines do not keep data-plane security logs by default. To enable data-plane security logging, you need to enable Azure Defender, provision the Azure Monitor agent on the virtual machine, and configure the Azure Security Center data collection level.
Azure Defender is enabled by configuring the Security Center Standard pricing tier for the virtual machines of your environment. This can be done in infrastructure as code or enforced using Azure Policy by defining the Microsoft.Security/pricings resource with the name VirtualMachines and setting the pricingTier property to Standard.
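As a minimal sketch, the resource described above can be generated as a subscription-level deployment template. The resource type, name, and pricingTier value come from the text; the apiVersion and the surrounding template skeleton are assumptions that you should check against the current template reference.

```python
import json


def defender_for_servers_pricing() -> dict:
    """Build the Microsoft.Security/pricings resource that enables
    Azure Defender (Standard tier) for virtual machines."""
    return {
        "type": "Microsoft.Security/pricings",
        "apiVersion": "2018-06-01",  # assumption: use a version you have validated
        "name": "VirtualMachines",
        "properties": {"pricingTier": "Standard"},
    }


# Pricing is configured per subscription, so the resource is wrapped in a
# subscription-level deployment template.
template = {
    "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [defender_for_servers_pricing()],
}

print(json.dumps(template, indent=2))
```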
Next, you need to install the Azure Monitor agent (Log Analytics agent), which reads security-related configuration and event logs from the virtual machine and transmits them to your Log Analytics Workspace. Azure Security Center then uses these logs to provide you with security recommendations.
Minimal covers only events that might indicate a successful breach and important events that have a very low volume (such as failed login attempts).
Common provides a full user audit trail, such as logins and sign-outs, Kerberos operations, and security group changes.
Azure Defender includes a built-in vulnerability scanner by Qualys. The vulnerability scanner is deployed as an extension: LinuxAgent.AzureSecurityCenter or WindowsAgent.AzureSecurityCenter, respectively. Once the extension is installed, the Qualys scanner scans the virtual machine and sends the results for analysis to the Qualys cloud service hosted close to your region (either in the United States or Europe). Subsequent scans are performed every four hours, but you may also trigger scans manually.3 Qualys can identify affected machines within 48 hours of the disclosure of a critical vulnerability. Any findings will be available in the Azure Security Center under the recommendation Vulnerabilities in your virtual machines should be remediated.
You may also use any other vulnerability management or endpoint detection and response solution, such as Microsoft Defender for Endpoint.
Azure Defender includes a license to Microsoft Antimalware for Azure, which identifies viruses, spyware, and other malicious software. Microsoft Antimalware uses the same platform as Microsoft Security Essentials (MSE), Microsoft Forefront Endpoint Protection, and Microsoft System Center Endpoint Protection. It offers both real-time protection and scheduled scanning. Microsoft Antimalware for Azure is installed as a virtual machine extension and is only available for Windows.
You may also use any other endpoint protection solution. Major vendors that offer endpoint protection as virtual machine extensions include Trend Micro, Symantec, McAfee, and Sophos.
Azure Defender Alerts
A logon from a malicious IP has been detected.
Event log or command history log was cleared.
Detected the disabling of critical services.
Possible credential dumping detected.
Suspected Kerberos Golden Ticket attack parameters observed.
Behavior similar to ransomware detected.
Detected persistence attempted.
Disabling of auditd logging
Failed SSH brute-force attack.
Backup and Disaster Recovery
Azure provides four tiers of high availability for VM-based applications: single-instance deployments with SSD or HDD, availability sets, and availability zones. When you deploy a single instance of a virtual machine and use HDD Managed Disks, Microsoft guarantees the virtual machine connectivity with a 99.5% service-level agreement (SLA). If you deploy a single instance with solid-state disks, Microsoft offers an SLA of 99.9%. If you deploy two or more instances, you have further options. Availability sets offer an SLA of 99.95% while deploying your virtual machines to separate fault and update domains within the same Azure datacenter region. Availability zones provide an SLA of 99.99%, while deploying your virtual machines to two or more zones in a datacenter region. Each zone consists of one or more datacenters equipped with independent power, cooling, and networking.
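When an application depends on several tiers at once, the availability of the whole is lower than that of any single tier. The sketch below multiplies the per-tier SLA figures quoted above, under two simplifying assumptions: every tier must be up for the application to work, and tier failures are independent. The SLA covers virtual machine connectivity only, so treat the result as an upper bound rather than a prediction. The tier keys and function names are hypothetical.

```python
from math import prod

# SLA figures quoted in the text, expressed as fractions.
SLA = {
    "single_hdd": 0.995,
    "single_ssd": 0.999,
    "availability_set": 0.9995,
    "availability_zones": 0.9999,
}

HOURS_PER_YEAR = 365 * 24


def composite_availability(tiers):
    """Availability of tiers that must all be up, assuming independence."""
    return prod(SLA[tier] for tier in tiers)


def allowed_downtime_hours(availability):
    """Yearly downtime budget implied by an availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


def meets_target(tiers, target):
    """True if the composite availability reaches the target fraction."""
    return composite_availability(tiers) >= target


# Hypothetical two-tier application: web tier and data tier.
for design in (["availability_set", "availability_set"],
               ["availability_zones", "availability_zones"]):
    availability = composite_availability(design)
    print(design,
          f"{availability:.4%}",
          f"{allowed_downtime_hours(availability):.1f} h/year",
          "meets 99.95%" if meets_target(design, 0.9995) else "misses 99.95%")
```

Two tiers that each carry a 99.95% SLA combine to roughly 99.90%, which is why a per-tier figure equal to the business target is not enough on its own.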
To protect the data and the state stored in your applications running in virtual machines, you can use Azure Backup and Azure Site Recovery. Both store their data in the Azure Recovery Services Vault, which can be configured for either locally redundant storage (LRS) or geo-redundant storage (GRS).
Azure Backup provides a native backup solution for Azure virtual machines. To enable Azure Backup, add it to your Azure Recovery Services Vault and configure the retention policy. When restoring data, you can choose to create a new virtual machine, replace the existing virtual machine, or perform a cross-region restore in the secondary (paired) Azure region.
Azure Site Recovery replicates your workload’s virtual machines from your primary region to a secondary region. When an outage occurs at your primary region, you can automate failover to the secondary location. After the primary region is running again, you can revert to it. Azure Site Recovery has an SLA-guaranteed recovery time objective (RTO) of two hours. Azure Site Recovery integrates with other Azure infrastructure-as-a-service capabilities, such as storage and networks. As it is application-aware, you can customize the failover and recovery of multi-tier applications running on multiple virtual machines by configuring Recovery Plans.
You are responsible for architecting the resiliency of an infrastructure-as-a-service application. The application is deployed to three tiers: front end, back end, and data. Your business requirement is to design a solution that is available 99.95% of time. How will you design the solution?
Guest Operating System Management
When it comes to operating infrastructure as a service in the public cloud, you will face several impediments. In contrast to platform as a service, there are still a significant number of controls you are responsible for on an ongoing basis, and most often your application development teams do not have the capacity or expertise to perform them. So how do you balance the fast time to market of self-service infrastructure as a service against the operational responsibilities that come with it?
You may start by ensuring your application development teams are using the latest versions of operating system images that meet your requirements. You can approach this by standardizing on a set of approved images from the Azure Marketplace or by publishing your own images using the Azure Shared Image Gallery service.
Next, you can automate operating system patching in Azure to support both self-service and centrally managed infrastructure-as-a-service scenarios.
Finally, remember that you do not need to do the impossible. Delivering appropriate operational controls at an appropriate cost, in a way that also fits your enterprise security architecture, is a complex topic. You may very well end up with a solution that promotes self-service infrastructure as a service for a limited number of scenarios and encourages your users to take advantage of your centrally managed infrastructure as a service, where operational responsibilities are taken care of for them.
Operating System Image Management
To ensure a secure baseline for your operating system configuration, you can control the operating system images used in your virtual machines. The images offered by Microsoft are a good starting point as they are always up to date, but their security configurations are close to defaults. As such, their configuration needs to be hardened before operational usage. If you are providing a centrally managed infrastructure as a service, you may use these images as a source before applying your own configuration management and hardening. For example, you might use a Microsoft-provided image and control the hardening of the images using domain policies. As this depends on the virtual machine having network access to the domain controllers, it might not be suitable for all network topologies, however.
If you would like to control the original image, you can use Azure Shared Image Gallery to provide your application teams access to private images you have approved or built yourself (such as your golden image). Images in the Shared Image Gallery are considered sub-resources in Azure Resource Manager, so you can assign your application teams access to them using role-based access control. Once they have access to an image, they can create virtual machines in their subscriptions using that image.
You can also use images built and managed by other vendors. Specifically, the Center for Internet Security (CIS) provides hardened images through the Azure Marketplace. The CIS hardened images come with an hourly license fee that is billed in addition to the virtual machine infrastructure cost. The images are provided for both Level 1 and Level 2 CIS Benchmark profiles.5 The images are configured using local group policy, so they are a good fit for a self-managed infrastructure-as-a-service scenario.
By using the CIS hardened images from the Azure Marketplace, you delegate the responsibility for both operating system hardening and vulnerability patching to CIS.
To keep your self-managed infrastructure as a service patched, you can automate patching of your virtual machine operating systems using the automatic VM guest patching feature. This automatically downloads and applies critical and security patches and reboots your virtual machines if required. Managing other updates remains the responsibility of the application development team.
The automatic patching is performed during off-peak hours of your virtual machine time zone. The patches will be installed within 30 days of release. Automatic VM guest patching is a good option if your virtual machines are not automatically turned off during off-peak hours.
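In a virtual machine template, automatic VM guest patching is selected through the patch mode in the operating system profile. The following is a minimal sketch of that fragment, built in Python; the function name is hypothetical, and because the feature was in preview at the time of writing, check the property names and any required feature registration against the current virtual machine template reference.

```python
def automatic_guest_patching_profile(os_family: str) -> dict:
    """Return the osProfile fragment that requests platform-orchestrated
    guest patching for a Windows or Linux virtual machine."""
    patch_settings = {"patchMode": "AutomaticByPlatform"}

    if os_family == "windows":
        return {
            "windowsConfiguration": {
                # The platform drives patching through the VM agent.
                "provisionVMAgent": True,
                "enableAutomaticUpdates": True,
                "patchSettings": patch_settings,
            }
        }
    if os_family == "linux":
        return {
            "linuxConfiguration": {
                "provisionVMAgent": True,
                "patchSettings": patch_settings,
            }
        }
    raise ValueError(f"unsupported operating system family: {os_family!r}")


# Merge the fragment into the rest of the VM's osProfile
# (computer name, credentials, and so on) before deployment.
print(automatic_guest_patching_profile("windows"))
print(automatic_guest_patching_profile("linux"))
```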
Not all workloads are suitable for automatic patching! Before going to production, validate your approach and follow best practices detailed in the Cloud Adoption Framework.6
You can also evaluate the usage of Hotpatching for Windows Server Azure Edition, which allows for automating the patching of security updates without virtual machine reboots.
As we went to press, automatic virtual machine guest OS patching and Hotpatching were still in preview.
Centrally Managed Patching
To centrally update your Azure virtual machines, you can certainly leverage your existing processes and tools, provided that all networking requirements are met. The most straightforward example of that would be if you would centrally manage a landing zone for virtual machines and give your application development teams access only to the data plane, that is, inside the virtual machine resources. In that case, you can continue to use System Center Updates Publisher or Windows Server Update Services (WSUS) as you would have before.
If your infrastructure-as-a-service workloads are not completely under central control, you might benefit from using Azure Update Management, a native service to manage operating system updates at scale. Azure Update Management can manage not only critical and security updates, like automatic VM guest patching, but any other updates, too.
Azure Update Management consumes data from Log Analytics agents running in your virtual machines and uses automation runbooks to compare the state of your virtual machines to available updates in private or public update sources. For Windows virtual machines, this means WSUS or Microsoft Update. For Linux virtual machines, the options are private or public repositories. Windows virtual machines are scanned twice a day for available updates, while Linux virtual machines are scanned hourly.
When updates are available, you can deploy them to all or a subset of the virtual machines that you manage with Azure Update Management. As update management is using Azure Automation, you can also schedule update deployments programmatically, such as to install critical updates on Sundays.
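When you schedule deployments programmatically, you typically need to compute the start time of the next maintenance window before creating the schedule. The following sketch shows only that calculation for a weekly Sunday window; the function name and the 02:00 start hour are assumptions, and the call that actually creates the schedule in Azure Automation is intentionally left out.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int = 6, hour: int = 2) -> datetime:
    """Return the next occurrence of `weekday` at `hour`:00 strictly after `now`.

    Weekdays follow datetime.weekday(): Monday is 0 and Sunday is 6.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Today's window has already started or passed; use next week's.
        candidate += timedelta(days=7)
    return candidate


# A Wednesday morning resolves to the coming Sunday at 02:00.
print(next_weekly_run(datetime(2021, 6, 2, 10, 0)))
```

The calculation uses naive datetimes for brevity; in practice, use timezone-aware values that match the time zone of the schedule.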
In this chapter, we discussed the controls available to you for protecting Azure infrastructure-as-a-service workloads.
As we learned, your options vary based on how autonomous your application development teams are. If you are giving the application development teams a high degree of freedom in terms of network topology and deployment options, you essentially approach securing virtual machines as you would any other Azure resources, that is, by enforcing security controls on self-service resources. This gives your application delivery teams more flexibility and potentially reduces time to market. Your infrastructure-as-a-service environment will be more heterogeneous, however. Rather than attempting to build central policies that control your cloud virtual machines like you have previously, with this approach, you should focus on limiting any exposure to the application development team’s own environment. For example, you might do well to isolate them from any internal networks (either on Azure or on-premises).
If you are taking more responsibilities and offering a centrally managed service, you will be able to build a more homogeneous infrastructure-as-a-service environment. You might even be able to reuse your existing processes and tools, providing an indistinguishable virtual machine experience to your end users. As they might not even need to get access to the Azure Resource Manager or the Azure Portal, the fact that the resources are deployed to Azure might become a trivial implementation detail to them.