Chapter 6: Designing, Implementing, and Managing the Landing Zone

This section of the book is all about the basic operations in multi-cloud, or BaseOps. We'll be learning about the basics, starting with managing the landing zone – the foundation of any cloud environment. Before a business can start migrating workloads or developing applications in cloud environments, it will need to define that foundation. Best practices for landing zones include the hub-and-spoke model in Azure, AWS Landing Zone, and the definition of projects in Google Cloud. In multi-cloud, the landing zone extends across cloud concepts and technologies.

This chapter describes how to design the landing zones for the major cloud platforms and explores the BaseOps principles for managing them. We will learn how to design landing zones in Azure, AWS, and GCP, how to define policies to manage them, and how to handle accounts within them. We will also learn that there are platforms that let us manage different clouds from a single console via orchestration. Along the way, we will explore the foundational concepts of the major cloud providers – Azure, AWS, and GCP – design basic landing zones in each of them, manage the foundation environments in multi-cloud, learn how to abstract policies from resources by exploring Infrastructure as Code and Configuration as Code, and understand the need for demarcation in the cloud.

In this chapter, we will cover the following topics:

  • Understanding BaseOps and the foundational concepts
  • Creating a multi-cloud landing zone and blueprint
  • Managing the landing zone using policies
  • Orchestrating policies for multi-cloud
  • Global admin galore – the need for demarcation

Let's get started!

Understanding BaseOps and the foundational concepts

BaseOps might not be a familiar term to everyone, although you could guess what it means: basic operations. In cloud environments, this is more often referred to as cloud operations. BaseOps is mainly about operating the cloud environment in the most efficient way possible by making optimal use of the cloud services that the major providers offer on the different layers: network, compute, and storage, but also PaaS and SaaS.

The main objective of BaseOps is to ensure that cloud systems are available to the organization and can safely be used. This involves the following:

  • Monitor network capacity and appropriately route traffic.
  • Monitor the capacity of compute resources and adjust this to the business requirements.
  • Monitor the capacity of storage resources and adjust this to the business requirements.
  • Monitor the availability of resources, including health checks for backups and ensuring that systems can be recovered when required.
  • Monitor the perimeter and internal security of systems, ensuring data integrity.
  • Overall, manage systems according to the service levels and Key Performance Indicators (KPIs) agreed upon with the business.
  • Assuming that the systems are automated as much as possible, part of BaseOps is also being able to monitor and manage the pipeline.

At the end of the day, this is all about the quality of service. That quality is defined by service levels and KPIs that have been derived from the business goals. BaseOps must be enabled to deliver that quality via clear procedures, skilled people, and the proper tools.

We have already explored the business reasons for deploying systems in cloud environments: the ultimate goal is flexibility and agility, but also cost efficiency. This can only be achieved if we standardize and automate. All repetitive tasks should be automated. Identifying these tasks and monitoring whether the automated tasks are executed in the right way is part of BaseOps. The automation process itself is development; operating what the developers build is one of the reasons we have DevOps in the first place. Both teams have the same goal, for that matter: protect and manage the cloud systems according to best practices.

We can achieve these goals by executing the activities mentioned in the following sections.

Defining and implementing the base infrastructure – the landing zone

This is by far the most important activity in the BaseOps domain. It's really the foundation of everything else. The landing zone is the environment on the designated cloud platform where we will host the workloads, the applications, and the data resources. The starting principle of creating a landing zone is that it's fully provisioned through code. In other words, the landing zone contains the building blocks that form a consistent environment where we can start deploying application and data functionality, as we discussed in Chapter 4, Service Design for Multi-cloud, where we talked about scaffolding. In the Creating a multi-cloud landing zone and blueprint section of this chapter, we will deep dive into creating landing zones on the different major platforms; that is, Azure, AWS, and GCP.

Defining standards and policies for the base infrastructure

The base infrastructure typically consists of networking and environments that can host compute and storage resources. You could compare this with a Hyperconverged Infrastructure (HCI), which refers to a physical box that holds compute nodes, a storage device, and a switch to make sure that the compute nodes and storage can actually communicate. The only addition that we would need is a router that allows the box to communicate with the outside world. The cloud is no different: the base infrastructure consists of compute nodes, storage nodes, and switches to enable traffic. The major difference with the physical box is that, in the cloud, all these components are code.

But as we have already learned, this wouldn't be enough to get started. We also need an area that allows us to communicate from our cloud to the outside world and to access our cloud. Next, we will need to control who accesses our cloud environment. So, a base infrastructure will need accounts and a way to provision these accounts in a secure manner. You've guessed it: even when it comes to defining the standards and policies for setting up a base infrastructure, there are a million choices to make. Landing zone concepts make it a lot easier to get started fast.

As a rule of thumb, the base infrastructure consists of five elements:

  • Network
  • Compute nodes
  • Storage nodes
  • Accounts
  • Defense (security)

The good news is that all cloud providers agree that these are the base elements of an infrastructure. Even better, they all provide code-based components to create the base infrastructure. From this point onward, we will call these components building blocks. The issue is that they offer lots of choices in terms of the different types of building blocks and how to deploy them, such as through blueprints, templates, code editors, command-line programming, or through their respective portals. As we mentioned previously, we will explore the landing zone solutions in this chapter.

Defining standard architecture principles (architecture patterns and reference architecture)

A way to define a reference architecture for your business is to think outside-in. Think of an architecture in terms of circles. The outer circle is the business zone, where all the business requirements and principles are gathered. These drive the next inner circle: the solutions zone. This is the zone where we define our solutions portfolio. For example, if the business has a demand for analyzing large sets of data (business requirement), then a data lake could be a good solution.

The solution zone is embedded between the business zone at the outer side and the platform zone at the inner side. If we have, for instance, Azure as our defined platform, then we could have Azure Data Factory as a solution for the specific data lake requirement. The principle is that from these platforms, which can also be third-party PaaS and SaaS platforms, the solutions are mapped to the business requirements. By doing so, we create the solutions portfolio, which contains specific building blocks that make up the solution.

The heart of this model – the innermost circle – is the integration zone, from where we manage the entire ecosystem in the outer circles.

Security should be included in every single layer or circle. Due to this, the boundary of the whole model is set by the intrinsic security zone:

Figure 6.1 – Circular model showing the layers of the enterprise portfolio


The preceding diagram shows this model with an example of the business requiring data analytics, with Data Factory and Databricks as solutions coming from Azure as the envisioned platform. The full scope forms the enterprise portfolio.

Managing the base infrastructure

Even if we have only deployed a landing zone, there are still quite a number of building blocks that we will have to manage from that point onward.

For a network, we will have to manage, at a minimum, the following:

  • Provisioning, configuring, and managing virtual networks (vNets, VPCs, subnets, internet-facing public zones, and private zones)
  • Provisioning and managing routing, Network Address Translation (NAT), Network Access Control (NAC), Access Control Lists (ACLs), and traffic management
  • Provisioning and managing load balancing, network peering, and network gateways for VPNs or dedicated connections
  • Provisioning and managing DNS
  • Monitoring the network
  • Detecting, investigating, and resolving incidents related to network functions

For compute, we will have to manage, at a minimum, the following:

  • Provisioning, configuring, and operating virtual machines. This often includes managing the operating system (Windows, various Linux distributions, and so on).
  • Detecting, investigating, and resolving incidents related to the functions of virtual machines.
  • Patch management.
  • Operating backups (full, iterative, and snapshots).
  • Monitoring, logging, health checks, and proactive checks/maintenance.

Do note that compute in the cloud involves more than virtual machines. It also includes things such as containers, container orchestration, functions, and serverless computing. However, in the landing zone, these native services are often not immediately deployed. You might consider having the container platform deployed as part of the base infrastructure. Remember that, in the cloud, we see a strong shift from VMs to containers, so we should prepare for that while setting up our landing zone.

In most cases, this will include setting up a Kubernetes cluster. In Azure, this is done through Azure Kubernetes Service (AKS), where we create a resource group that will host the AKS cluster. AWS offers its own cluster service through Elastic Kubernetes Service (EKS). In GCP, this is Google Kubernetes Engine (GKE). The good news is that a lot of essential building blocks, such as Kubernetes DNS, are already deployed as part of setting up the cluster. Once we have the cluster running, we can start deploying cluster nodes, pods (a collection of application containers), and containers. For consistently managing Kubernetes platforms across multi-cloud platforms, there are multiple agnostic solutions that you can look at, such as Rancher or VMware Tanzu Mission Control.
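As a minimal sketch of what provisioning such a cluster looks like on each platform, the following commands create a basic managed cluster. The cluster name, node count, resource group, and region are illustrative assumptions, and the cloud CLI calls are shown commented out so that the sketch can be read without configured credentials:

```shell
# Hypothetical names -- substitute your own before running.
CLUSTER_NAME="demo-cluster"
NODE_COUNT=3

# Azure: create a resource group and an AKS cluster in it.
# az group create --name rg-aks-demo --location westeurope
# az aks create --resource-group rg-aks-demo --name "$CLUSTER_NAME" \
#   --node-count "$NODE_COUNT" --generate-ssh-keys

# AWS: create an EKS cluster (eksctl is a commonly used helper CLI).
# eksctl create cluster --name "$CLUSTER_NAME" --nodes "$NODE_COUNT"

# GCP: create a GKE cluster in the currently configured project.
# gcloud container clusters create "$CLUSTER_NAME" --num-nodes "$NODE_COUNT"

echo "cluster=$CLUSTER_NAME nodes=$NODE_COUNT"
```

In all three cases, the cluster service deploys the control plane and the essential building blocks for you; afterward, kubectl can be pointed at the new cluster to deploy pods and containers.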

For storage, we will have to manage, at a minimum, the following:

  • Provisioning, configuring, and operating storage, including disks for managed virtual machines
  • Detecting, investigating, and resolving incidents related to the functions of storage resources
  • Monitoring, logging, health checks on local and different redundant types of storage (zone, regional, and globally redundant), and proactive checks/maintenance, including capacity checks and adjustments (capacity allocation)

Next, we will have to manage the accounts and make sure that our landing zone – the cloud environment and all its building blocks – is secure. Account management involves creating accounts or account groups that need access to the cloud environment. These are typically created in Active Directory.

In the Global admin galore – the need for demarcation section of this chapter, we will take a deeper look at admin accounts and the use of global admin accounts. Security is tightly connected to account, identity, and access management, but also to things such as hardening (protecting systems from outside threats), endpoint protection, and vulnerability management. From day 1, we must have security in place on all the layers in order to prevent, detect, assess, and mitigate any breach. This is part of SecOps. Section 4 of this book is all about securing our cloud environments.

Defining and managing infrastructure automation tools and processes (Infrastructure as Code and Configuration as Code)

In the cloud, we work with code. There's no need to buy physical hardware anymore; we simply define our hardware in code. This doesn't mean we don't have to manage it. To do this in the most efficient way, we need a master code repository. This repository will hold the code that defines the infrastructure components, as well as how these components have to be configured to meet our principles in terms of security and compliance. This is what we typically refer to as the desired state.

Azure, AWS, and Google offer native tools to facilitate infrastructure and configuration as code, as well as tools to automate the deployment of the desired state. In Azure, we can work with Azure DevOps and Azure Automation, both of which work with Azure Resource Manager (ARM). AWS offers CloudFormation, while Google has Cloud Resource Manager and Cloud Deployment Manager. These are all tied into the respective platforms, but the market also offers third-party tooling that tends to be agnostic to these platforms. We will explore some of the leading tools later in this chapter, in the Orchestrating policies for multi-cloud section.

For source code management, we can use tools such as GitHub, Azure DevOps, AWS CodeCommit, and GCP Cloud Repositories.

Defining and implementing monitoring and management tools

We've already discussed the need for monitoring. The next step is to define what tooling we can use to perform these tasks. Again, the cloud platforms offer native tooling: Azure Monitor, Application Insights, and Log Analytics; AWS CloudTrail and CloudWatch; and Google Stackdriver monitoring. And, of course, there's a massive set of third-party tools available, such as Splunk and Nagios. These latter tools have a great advantage since they can operate independently of the underlying platform. This book won't try to convince you that tool A is preferred over tool B; as an architect, you will have to decide what tool fits the requirements – and the budget, for that matter.

Security is a special topic. The cloud platforms have spent quite some effort in creating extensive security monitoring for their platforms. Monitoring is not only about detecting; it's also about triggering mitigating actions. This is especially true when it comes to security, where detecting a breach is certainly not enough. Actually, the time between detecting a vulnerability or a breach and it being exploited can be a matter of seconds, which makes fast action necessary. This is where SIEM comes into play: Security Information and Event Management. SIEM systems evolve rapidly and, at the time of writing, intelligent solutions are often part of the system.

An example of this is Azure Sentinel, an Azure-native SIEM solution: it works together with Azure Security Center, where policies are stored and managed, but it also performs an analysis of the behavior it sees within the environments that an enterprise hosts on the Azure platform. Lastly, it can automatically trigger actions. For instance, it can block an account that logs in from the UK one minute and from Singapore the next – something that wouldn't be possible without warp-driven time travelling.

In other words, monitoring systems are becoming more sophisticated, and developments are moving at lightning speed.

Supporting operations

Finally, once we have thought about all of this, we need to figure out who will execute all these tasks. We will need people with the right skills to manage our multi-cloud environments. As we have said already, the truly T-shaped engineer or admin doesn't exist. That would be the five-legged sheep. Most enterprises end up with a group of developers and operators that all have generic and more specific skills. Some providers refer to this as the Cloud Center of Excellence (CCoE), and they mark it as an important step in the cloud journey or cloud adoption process of that enterprise. Part of this stage would be to identify the roles this CCoE should have and get the members of the CCoE on board with this. The team needs to be able to build and manage the environments, but they will also have a strong role to fulfil in evangelizing new cloud-native solutions.

Tip

Just as a reading tip, please have a look at an excellent blog post on forming a CCoE by Amazon's Enterprise Strategist Mark Schwartz: https://aws.amazon.com/blogs/enterprise-strategy/using-a-cloud-center-of-excellence-ccoe-to-transform-the-entire-enterprise/.

In this section, we have learned what we need to cover to set up our operations in multi-cloud. The next step is building our landing zones on the cloud platforms.

Creating a multi-cloud landing zone and blueprint

All the major cloud providers offer a methodology that can be used to create a landing zone on their respective platforms. In this section, we will explore the landing zone concepts for Azure, AWS, and GCP.

Configuring the landing zone on Azure

The landing zone in Azure is part of the Cloud Adoption Framework (CAF) and implements a set of cloud services to get us started with building or migrating workloads to the Azure platform. The landing zone creates all the necessary building blocks to enable a business to start using the cloud platform.

We talked about the analogy of constructing a house previously, when we discussed scaffolding. Consider the landing zone to be the empty house. A house has a foundation, a front door that provides access to the house, and rooms where we can place furniture. These rooms have already been designed to cater for specific needs. The kitchen has connections for cooking equipment and a tap for running water. So does the bathroom: it has taps, a shower, a bath, but also a floor that doesn't get damaged when it gets wet. We can compare this to the landing zone: it already has rooms that have been set up for specific usage, such as catering for outside connectivity.

Preparing these rooms for usage is something Microsoft calls refactoring. CAF guides the business in setting up security, identity and access management, naming conventions, cost management, and so on. All these topics are deployed as part of the landing zone. Once we've finished building the landing zone, we will have a base platform that is secure, that has a naming and tagging convention defined, where Role-Based Access Control (RBAC) is in place, and where we have clear insight into the costs that we are generating on the platform.

Now, what do we need for that?

First of all, we need a subscription to Azure. Next, we need to deploy the rooms: the different segments in our environment where we will host our systems. In Azure, we typically deploy the hub-and-spoke model. This derives from the fact that Azure offers shared services that are used across the different rooms, such as monitoring and backup services. These shared services land in the hub. The spokes connect to the hub so that they can consume the shared services from there, instead of having to deploy all these services separately into the different spokes.

The landing zone consists of code: it's Infrastructure as Code, so it drives the Azure architecture completely from code, from the very start. To do this, it uses ARM templates in JSON format. We can actually blueprint the code so that we can easily launch new spokes in a very consistent way. The blueprint would, for instance, contain code that defines how the spokes connect to the hub and how shared services are consumed. Azure offers various sample landing zone blueprints to get us started really fast. However, do check whether the blueprint meets the compliance and security requirements of your specific business.
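As a minimal sketch of what such Infrastructure as Code looks like, the following writes an ARM template fragment for a single spoke virtual network with a management and a workload subnet. The names, address spaces, and API version are illustrative assumptions, not a production-ready blueprint:

```shell
# Sketch: a minimal ARM template fragment for a spoke virtual network.
# The resource names and address spaces are hypothetical.
cat > spoke-vnet.json <<'EOF'
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Network/virtualNetworks",
      "apiVersion": "2020-05-01",
      "name": "vnet-spoke-01",
      "location": "westeurope",
      "properties": {
        "addressSpace": { "addressPrefixes": [ "10.1.0.0/16" ] },
        "subnets": [
          { "name": "snet-management", "properties": { "addressPrefix": "10.1.0.0/24" } },
          { "name": "snet-workload", "properties": { "addressPrefix": "10.1.1.0/24" } }
        ]
      }
    }
  ]
}
EOF

# Check that the template is valid JSON before handing it to a deployment:
python3 -m json.tool spoke-vnet.json > /dev/null && echo "template OK"
```

A template like this would typically be deployed with az deployment group create and parameterized so that every new spoke is launched from the same blueprint.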

The landing zone blueprint will provide the following:

  • Virtual networks with subnets for gateways, Azure Firewall, and an Azure Bastion server (a jumpbox, which is the management server that administrators use to enter the environments in the cloud)
  • Storage account for logging and monitoring
  • An Azure Migrate project

Be aware that this landing zone is not fit to host sensitive data or mission-critical applications yet. The blueprint is deployed under the assumption that the subscription is already associated with an Azure Active Directory instance. Also, the landing zone blueprint makes the assumption that no Azure policies have to be applied. In other words, you will have an empty house with a few empty rooms that you will still need to decorate yourself, meaning that you will have to implement baselines and policies.

This gets you started. By refactoring the landing zone and adding services to improve performance, reliability, cost efficiency, and security, you will get it ready to actually host workloads:

Figure 6.2 – Basic setup of Azure landing zone B


The preceding diagram shows a basic setup for a landing zone in Azure containing a hub to host the generic services and two spokes to host the workloads. These spokes have two subnets: one for management and one for the actual workloads, such as applications.

Tip

More information about Azure Landing Zone can be found at https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/.

Creating a landing zone in AWS

AWS offers AWS Landing Zone as a complete solution, based on the Node.js runtime. Like Azure, AWS offers numerous solutions for setting up an environment, and all of them require design decisions. To save time in getting started, AWS Landing Zone sets up a basic configuration that's ready to go. To enable this, AWS Landing Zone deploys the so-called Account Vending Machine (AVM), which provisions and configures new accounts with the use of single sign-on.

To grasp this, we must understand the way AWS environments are configured. It is somewhat comparable to the hub and spoke model of Azure, but instead of hub and spokes, AWS uses accounts. AWS Landing Zone comprises four accounts that follow the Cloud Adoption Framework (CAF) of AWS:

  • Organization account: This is the account that's used to control the member accounts and configurations of the landing zone. It also includes the so-called manifest file in the S3 storage bucket. The manifest file sets parameters for regions and organizational policies. The file refers to AWS CloudFormation, a service that we could compare to ARM in Azure. CloudFormation helps with creating, deploying, and managing resources in AWS, such as EC2 compute instances and Amazon databases. It supports Infrastructure as Code.
  • Shared services account: By default, Landing Zone manages the associated accounts through SSO, short for single sign-on. The SSO integration and the AWS managed AD is hosted in the shared services account. It automatically peers new accounts in the VPC where the landing zone is created. AVM plays a big role in this.
  • Log archive account: AWS Landing Zone uses CloudTrail and AWS Config logs. CloudTrail monitors and logs account activity in the AWS environment that we create. It essentially keeps a history of all actions that take place in the infrastructure that is deployed in a VPC. CloudWatch is complementary to CloudTrail: CloudWatch monitors all resources and applications in AWS environments, whereas CloudTrail tracks activity in accounts and logs these activities in an S3 storage bucket.
  • Security account: This account holds the key vault (the directory where we store our accounts) for cross-account roles in the Landing Zone, plus two security services that AWS provides: GuardDuty and Amazon SNS. GuardDuty is the AWS service for threat detection, while the Simple Notification Service (SNS) enables the sending of security notifications. The Landing Zone implements an initial security baseline that comprises (among other things) central storage of config files, configuration of IAM password policies, threat detection, and Landing Zone notifications. For the latter, CloudWatch is used to send out alerts in the case of, for example, a root account login or a failed console sign-in.

The following diagram shows the setup of the landing zone in AWS:

Figure 6.3 – The AWS Landing Zone solution


The one thing that we haven't discussed yet is the Account Vending Machine (AVM), which plays a crucial role in setting up the Landing Zone. The AVM launches the basic accounts in the Landing Zone with a predefined network and the security baseline. Under the hood, AVM uses Node.js templates that set up organizational units wherein the previously described accounts are deployed with default, preconfigured settings. One of the components that is launched is the AWS SSO directory, which allows federated access to AWS accounts.
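To give a feel for how the landing zone configuration is driven from the organization account, the following is a hypothetical sketch of a manifest fragment of the kind stored in the S3 bucket, setting the home region and the core organizational unit with its accounts. The keys and account names are illustrative assumptions, not the exact AWS Landing Zone schema:

```shell
# Hypothetical sketch of a landing zone manifest fragment (not the exact
# AWS Landing Zone schema -- key names are illustrative assumptions).
cat > manifest.yaml <<'EOF'
region: us-east-1
organizational_units:
  - name: core
    core_accounts:
      - name: shared-services
      - name: log-archive
      - name: security
EOF

# Count the named entries we just declared:
grep -c "name:" manifest.yaml
```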

Tip

More information about AWS Landing Zone can be found at https://aws.amazon.com/solutions/aws-landing-zone/.

Creating the landing zone in GCP

GCP differs a lot from Azure and AWS, although the hub-and-spoke model can also be applied in GCP. Still, you can tell that this platform has a different vision of the cloud: GCP focuses more on containers than on IaaS with traditional resources. Google talks about a landing zone as somewhere you plan to deploy a Kubernetes cluster in a GCP project using GKE, although deploying VMs is, of course, possible on the platform.

In the landing zone, you create a Virtual Private Cloud (VPC) network and set Kubernetes network policies. These policies define how we will be using isolated and non-isolated pods in our Kubernetes environment. Basically, by adding network policies, we create isolated pods, meaning that these pods – which hold a number of containers – only allow defined traffic, whereas non-isolated pods accept traffic from any source. A policy lets you assign IP blocks and deny/allow rules to the pods. The next step is to add service definitions to the Kubernetes environment in the landing zone so that pods can actually start running applications or databases. The last step in creating the landing zone is to configure DNS for GKE.
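As a sketch of what such a Kubernetes network policy looks like, the following manifest isolates the pods labeled app: web so that they only accept ingress from pods labeled app: frontend. The labels, namespace, and port are illustrative assumptions:

```shell
# Sketch: a Kubernetes NetworkPolicy that isolates pods labeled app=web,
# allowing ingress only from pods labeled app=frontend on TCP port 8080.
cat > isolate-web.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-web
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
EOF

# Applying the policy to a running cluster would be done with:
# kubectl apply -f isolate-web.yaml
echo "policy written"
```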

As we mentioned previously, Google very much advocates the use of Kubernetes and containers, which is why GCP is really optimized for running this kind of infrastructure. If we don't want to use container technology, then we will have to create a project in GCP ourselves. The preferred way to do this is through Deployment Manager and the gcloud command line. You could compare Deployment Manager to ARM in Azure: it uses the APIs of other GCP services to create and manage resources on the platform. One way to access this is through Cloud Shell within the Google Cloud portal, but GCP also offers some nice tools to get the work done. People who are familiar with Unix command-line programming will find this very recognizable and easy to work with.

The first step is enabling the required APIs; that is, the Compute Engine API and the Deployment Manager API. By installing the Cloud SDK, we get a command-line tool called gcloud that interfaces with Deployment Manager. Once we have gcloud running, we can simply start a project with the gcloud config set project command, followed by the ID of the project itself; for example, gcloud config set project [Project ID]. Next, we must set the region where we will be deploying our resources with a similar command: gcloud config set compute/region, followed by the region ID; that is, gcloud config set compute/region [region].
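Put together, the project setup described above can be sketched as a short script. The project ID and region are placeholders, and the gcloud calls are shown commented out so that the sketch can be read without an installed Cloud SDK:

```shell
# Hypothetical values -- substitute your own project ID and region.
PROJECT_ID="my-sample-project"
REGION="europe-west4"

# Enable the required APIs, then point gcloud at the project and region:
# gcloud services enable compute.googleapis.com deploymentmanager.googleapis.com
# gcloud config set project "$PROJECT_ID"
# gcloud config set compute/region "$REGION"

echo "project: $PROJECT_ID, region: $REGION"
```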

With that, we're done! Well, almost. You can also clone samples from the Deployment Manager GitHub repository. This repository also contains good documentation on how to use these samples.

Tip

To clone the GitHub repository for Deployment Manager into your own project, use the git clone https://github.com/GoogleCloudPlatform/deploymentmanager-samples command or go to https://github.com/terraform-google-modules/terraform-google-migrate. There are more options, but these are the commonly used ways to do this.

The following diagram shows a basic setup for a GCP project:

Figure 6.4 – Basic setup of a project in GCP, using Compute Engine and Cloud SQL


With that, we have created landing zones in all three major cloud platforms and by doing so, we have discovered that, in some ways, the cloud concepts are similar, but that there are also some major differences in the underlying technology. Now, let's explore how we can manage these landing zones using policies, as well as how to orchestrate these policies over the different platforms.

Managing the landing zone using policies

When we work in cloud platforms, we work with code. Everything we do in the cloud is software- and code-defined. This makes the cloud infrastructure very agile, but it also means that we need some strict guidance in terms of how we manage the code, starting with the code that defines our landing zone or foundation environment. As with everything in IT, it needs maintenance. In traditional data centers and systems, we have maintenance windows where we can update and upgrade systems. In the cloud, things work a little differently.

First of all, the cloud providers apply maintenance whenever it's needed. There's no way that they can agree upon maintenance windows with thousands of customers spread across the globe. They simply do whatever needs to be done to keep the platform healthy, ready for improvements and the release of new features. Enterprises don't want to be impacted by these maintenance activities, so they will have to make sure that their code is safe at all times.

The next thing we need to take into account is the systems that the enterprise has deployed on the platform, within its own virtual cloud or project. These resources also need maintenance. If we're running VMs, we will need to patch them every now and then. In this case, we are patching code. We want to make sure that, during these activities, administrators do not accidentally override certain security settings or, worse, delete disks or any critical code that is required for a specific function that a resource fulfills. This is something that we must care about from the very start, when setting up the landing zones. From that point onward, we must start managing. For that, we can use policies and management tooling.

So far, we have set up the landing zones. Now, let's learn how to manage them.

Managing basic operations in AWS

This time, we'll start with AWS. AWS offers CloudFormation Guardrails. This is a very appropriate name since it really keeps your environment on the rails. Guardrails comes with four principal features, for which it sets policies in JSON format. To create policies, AWS offers Policy Generator. In Policy Generator, you define the type of policy first and then define the conditions, meaning when the policy should be applied:

  • Termination protection: Here, AWS talks about stacks and even nested stacks. Don't get confused – a stack is simply a collection of AWS resources that can be managed as one unit from the AWS Management Console. An example of a stack can be an application that comprises a frontend server, a database instance using an S3 bucket, and network rules. Enabling termination protection prevents that stack from being deleted unintentionally. Termination protection is disabled by default, so you need to enable it, either from the management console or by using command-line programming.
  • Deletion policies: Where termination protection has entire stacks as its scope, deletion policies target specific resources. To enable this, you must set DeletionPolicy attributes within the CloudFormation templates. Now, this policy comes with a lot of features. For instance, the policy has a Retain option so that when the stack is deleted, the resource itself is kept in your AWS account. You can also have CloudFormation take a snapshot of the resource before it gets deleted. It's absolutely worthwhile to have a very good understanding of deletion policies in terms of compliance and audit obligations. Keep in mind that deletion policies are set per resource.
  • Stack policies: These policies are set to define actions for a whole stack or group of resources. An action could be to update all database instances.
  • IAM policies: These policies define the access controls; that is, who is allowed to do what and when? Access controls can be set with fine granularity for whole stacks, specific resource groups, or even single resources, allowing only specific tasks and thereby defining the roles that users can have. In other words, this is the place where we manage RBAC. The last section of this chapter, Global admin galore – the need for demarcation, is all about IAM and the separation of duties.

    Tip

    More information on Guardrails policies in AWS can be found at https://aws.amazon.com/blogs/mt/aws-cloudformation-guardrails-protecting-your-stacks-and-ensuring-safer-updates/.
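To make deletion policies more concrete, the following is a minimal, illustrative CloudFormation template fragment. The resource names and property values are hypothetical examples, not taken from an actual workload: the bucket is retained when the stack is deleted, while the database instance is snapshotted first.

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "ReportsBucket": {
      "Type": "AWS::S3::Bucket",
      "DeletionPolicy": "Retain"
    },
    "AppDatabase": {
      "Type": "AWS::RDS::DBInstance",
      "DeletionPolicy": "Snapshot",
      "Properties": {
        "DBInstanceClass": "db.t3.micro",
        "Engine": "mysql",
        "AllocatedStorage": "20",
        "MasterUsername": "admin",
        "MasterUserPassword": "{{resolve:ssm-secure:DbPassword:1}}"
      }
    }
  }
}
```

Termination protection, by contrast, is enabled on the stack as a whole, for example from the AWS CLI with aws cloudformation update-termination-protection --enable-termination-protection --stack-name my-stack (the stack name here is, again, just an example).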

Managing basic operations in Azure

When we look at Azure, we must look at a practice called test-driven development (TDD) for landing zones in Azure. TDD is particularly well known in software development, as it aims to improve the quality of software code. As we have already discussed, the landing zone in Azure is expanded through the process of refactoring, an iterative way to build out the landing zone. Azure provides a number of tools that support TDD and help in the process of refactoring the landing zone:

  • Azure Policy: This validates the resources that will be deployed in Azure against the business rules. Business rules can be defined as cost parameters or thresholds, as well as security parameters such as checking for the hardening of resources or consistency with other resources. For instance, they can check whether a certain ARM template has been used for deployment. Policies can also be grouped together to form an initiative that can be assigned to a specific scope, such as the landing zone. A policy can contain actions, such as denying changes to resources or deploying after validation. Azure Policy offers built-in initiatives that can be specifically used to execute TDD: it will validate the planned resources in the landing zone against business rules. A successful validation will result in a so-called definition of done and, with that, acceptance that the resources may be deployed.
  • Azure blueprints: With blueprints, you can assemble policies, initiatives, and deployment configurations in one package so that they can be reused over and over again in case an enterprise wants to deploy multiple landing zones in different subscriptions. Microsoft Azure offers various blueprint samples, including policies for testing and deployment templates. The good thing is that these can easily be imported through Azure DevOps so that you have a CI/CD pipeline with a consistent code repository right from the start.
  • Azure Resource Graph: Azure Landing Zone is deployed based on the principle of refactoring. So, in various iterations, we will be expanding our landing zone. Since we are working according to the principles of TDD, this means that we must test whether the iterations have been successfully implemented, that resources have been deployed in the right manner, and that the environments have interoperability. For these tasks, Azure offers Resource Graph. It creates test sets to validate the configuration of the landing zone. Resource Graph comes with query samples, since it might become cumbersome to get started with the settings and coding that it uses.
  • Azure quickstart templates: If we really want to get going fast, we can use the quickstart templates, which provide default settings for the deployment of the landing zone itself and its associated resources.

    Tip

    More information on test-driven development in Azure Landing Zone can be found at https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/ready/considerations/azure-test-driven-development.

In all cases, Azure uses ARM templates, based on JSON.
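As a sketch of what such a policy definition looks like, the following JSON denies the deployment of resources outside a set of allowed regions – a common business rule for a landing zone. The structure follows the Azure Policy definition format; the display name and parameter values are illustrative:

```json
{
  "properties": {
    "displayName": "Allowed locations",
    "mode": "All",
    "parameters": {
      "allowedLocations": {
        "type": "Array",
        "metadata": {
          "description": "The list of regions where resources may be deployed"
        }
      }
    },
    "policyRule": {
      "if": {
        "not": {
          "field": "location",
          "in": "[parameters('allowedLocations')]"
        }
      },
      "then": {
        "effect": "deny"
      }
    }
  }
}
```

Assigned to the landing zone scope with, say, ["westeurope", "northeurope"] as its parameter, this policy would block any deployment to other regions before it happens.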

Managing basic operations in GCP

As we have seen, GCP can be a bit different in terms of public cloud and landing zones. This originates from the conceptual view that Google has, which is more focused on container technology using Kubernetes. Still, GCP offers extensive possibilities in terms of setting policies for environments that are deployed on GCP. In most cases, these policies are set at the level of the organization or on individual resources, using IAM policies:

  • Organizations: In GCP, we set policies using constraints. A constraint is an attribute that is added to the service definition. Just as an example, we'll take the Compute Engine service, which deploys VMs to our GCP project. In Compute Engine projects, OS Login is disabled by default. We can enable it with a so-called Boolean constraint, named after George Boole, who invented this type of logic as an algebraic system in the nineteenth century: a statement or logical expression is either true or false. In this case, we set the constraints/compute.requireOsLogin constraint to true, which enforces OS Login and prevents it from being disabled again. A lot of policies and constraints in GCP work according to this principle.

    Tip

    More on organization policy constraints in GCP can be found at https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints.

  • Resource policies: Cloud IAM policies set access controls for all GCP resources in JSON or YAML format. Every policy is defined by bindings, an audit configuration, and metadata. This may sound complex, but once you understand the concept, it does make sense. First, let's look at bindings. Each binding consists of a member, a role, and a condition. The member can be any identity. Remember what we stated previously: in the cloud, basically everything is an identity. These can be users, but also resources in our cloud environment that have specific tasks so that they can access other resources and have permission to execute these tasks. Thus, a member is an identity: a user, a service account, a resource, or a group of resources. The member is bound to a role that defines the permissions that the member has. Finally, we must determine under what condition a member may execute its role and what constraints are valid. Together, this makes a binding.

    However, the binding is only one part of the policy. We also have an AuditConfig to log the policy, and the metadata. The most important field in the metadata is etag. The etag field is used to guarantee that policies are used in a consistent way across the various resources in the project. If a policy is altered on one system, the etag field makes sure that the policies stay consistent. Inconsistent policies will cause resource deployments to fail.

    Policies can have multiple bindings and can be set on different levels within GCP. However, be aware that there are limitations. As an example, GCP allows a maximum of 1,500 members per policy. So, do check the documentation thoroughly, including the best practices for using policies.

    Tip

    Extensive documentation on Cloud IAM Policies in GCP can be found at https://cloud.google.com/iam/docs/policies.
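Putting these concepts together, a Cloud IAM policy in JSON might look like the following sketch. The members, roles, condition, and etag value are hypothetical; what matters is the structure – bindings, an audit configuration, and the etag metadata (note that using conditions requires policy version 3):

```json
{
  "bindings": [
    {
      "members": [
        "user:alice@example.com",
        "serviceAccount:deployer@my-project.iam.gserviceaccount.com"
      ],
      "role": "roles/compute.viewer"
    },
    {
      "members": ["user:bob@example.com"],
      "role": "roles/compute.admin",
      "condition": {
        "title": "temporary-admin-access",
        "expression": "request.time < timestamp('2022-01-01T00:00:00Z')"
      }
    }
  ],
  "auditConfigs": [
    {
      "service": "allServices",
      "auditLogConfigs": [
        { "logType": "ADMIN_READ" }
      ]
    }
  ],
  "etag": "BwWKmjvelug=",
  "version": 3
}
```

In this sketch, two users and one service account are members: the first binding grants read-only access to Compute Engine, while the second grants admin rights under a time-bound condition.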

In this section, we have learned how to create policies by enabling the basic operations (BaseOps) of our landing zones in the different clouds. The next section talks about orchestrating policies in a true multi-cloud setup, using a single repository.

Orchestrating policies for multi-cloud

So far, we've looked at the different ways we can set policies in the major cloud platforms. Now, what we really want in multi-cloud is a single repository where we can store and manage all our policies. Can we do this? From a technological perspective, we probably can: all cloud providers support JSON as a format for defining policies. The problem is that these platforms have different concepts of deploying policies. What's the solution to this problem?

To think of a solution, we must start thinking in terms of layers and abstract logic from the code itself. What do we mean by this? A policy has a certain logic. As an example, from a security perspective, we can define that all the VMs in our environment must be hardened by following the guidelines of CIS, the Center for Internet Security, as a baseline. What type of VM we're talking about is irrelevant, as is the type of operating system it runs or the platform the VM is hosted on. The logic only says that the VM needs to be hardened by following the recommendations of the CIS framework. It's completely abstracted from the code that deploys the VM. If we do this, we can store the policies themselves in a single repository. The only thing we need to do then is add the specific code that is required to deploy the VM to our target cloud platform.

This is basically what HashiCorp's Terraform application does. Terraform abstracts policies from code so that it can deploy Infrastructure as Code on various cloud platforms from a single source of truth. For this, it uses the definition of the desired state: the code that launches the infrastructure resources is completely abstracted from the actual configuration of that resource. It's important to note that Terraform is idempotent and convergent, meaning that only the required changes are applied to return the environment to the desired state.

This point will help you gain a better understanding of Desired State Configuration (DSC). First of all, DSC is often associated with Microsoft PowerShell. This makes sense, since DSC was indeed introduced with Windows Server 2012 R2. However, nowadays, the term desired state is more broadly used for abstracting Infrastructure as Code from the actual configuration of that infrastructure. It is commonly used in CI/CD pipelines. Here, development teams can build the necessary systems and, when these are pushed to production, the desired state gets deployed. An example is installing a backup agent or bringing resources under monitoring. The following diagram shows the simplified model of desired state:

Figure 6.5 – High-level model of desired state using Infrastructure as Code and Configuration as Code

Let's get back to Terraform. The syntax that Terraform uses allows us to fully abstract resources and providers. It defines blocks that can hold any type of resource, from a VM to a container, but also certain services, such as DNS. This is defined in the HashiCorp Configuration Language (HCL). The next step is to deploy these blocks to our target cloud. This is done by initializing a project in that cloud. For this, the terraform init command is used. init will read the Terraform configuration files and import the providers needed to connect to various clouds and services.

The next step is to use the terraform plan command, which is used to create the execution plan. This determines what actions are necessary to achieve the desired state specified in the configuration files. The last step is to use the terraform apply command, which applies the actions needed to reach the desired state.

Terraform will now apply the blocks to the cloud and, at the same time, create a so-called state file. This state file is used to apply future changes to the infrastructure: before changes are applied in an execution plan that is automatically created by the Terraform software, it runs a refresh of the actual environment to update the state file. This way, Terraform always holds the latest version of the actual deployed code and keeps environments in sync at all times.
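The workflow above can be sketched with a minimal HCL configuration. The provider shown here is Azure, and the resource group name and region are assumptions chosen purely for illustration; the same block structure holds for AWS or GCP providers:

```hcl
# Declare which provider plugins this configuration needs;
# terraform init downloads them based on this block.
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

# Configure the provider that connects Terraform to the target cloud.
provider "azurerm" {
  features {}
}

# A resource block describing the desired state: one resource group.
# Terraform compares this block against the state file to decide
# what actions the execution plan must contain.
resource "azurerm_resource_group" "landing_zone" {
  name     = "rg-landing-zone"
  location = "westeurope"
}
```

Running terraform init, terraform plan, and terraform apply against this file would, respectively, import the azurerm provider, compute the execution plan, and create the resource group, after which the state file records the deployed configuration.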

Tip

You can use Terraform to deploy landing zones in Azure, AWS, and GCP. In Azure, this will create a basic setup that enables activity logs and a subscription for Azure Security Center. For AWS, the Terraform HCL scripts call the AWS Landing Zone solution that we described in this chapter. You can find the Terraform code for Azure at https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/terraform-landing-zone. The code for AWS has been made publicly available by the Mitoc Group on GitHub: https://github.com/MitocGroup/terraform-aws-landing-zone.

If you are already familiar with configuration tools such as Chef or Puppet, you will find that there's some overlap in the functionality of Terraform and some other tools. The big difference is that Terraform actually provisions new infrastructure resources, where most other tools are more focused on adding configuration settings to resources that have been previously deployed. This does not mean that configuration tools are useless; on the contrary. These tools have other use cases; there's no good or bad.

The key to multi-cloud is the single pane of glass view. We will discuss this frequently in this book. However, this is a complicated area. Companies such as ServiceNow target their development at creating platforms from which enterprises can do multi-cloud orchestration from just one console. At the time of writing, the latest release of ServiceNow is Orlando. It contains a product for Policy and Compliance Management that provides a centralized process for creating and managing policies, cross-cloud.

In summary, yes, you can deploy code and policies that are agnostic to different cloud platforms. However, it does require tooling. Throughout this section, we've explored some of the leading tools on the market at the time of writing. All of this requires a thorough understanding of abstracting the infrastructure resources from functionality and policies, resulting in the desired state of the resources.

Global admin galore – the need for demarcation

Typically, when we talk about demarcation in cloud models, we refer to the matrix or delineation of responsibility: who's responsible for what in IaaS, PaaS, and SaaS computing? The following diagram shows the very basics of this matrix:

Figure 6.6 – Demarcation model in cloud deployment

However, we need a much more granular model in multi-cloud. We have been discussing policies throughout this chapter and by now, we should have come to the conclusion that it's not very easy to draw some very sharp lines when it comes to responsibilities in our multi-cloud environment. Just look at the solution stack – even in SaaS solutions, there might be certain security and/or compliance policies that the solution needs to adhere to. Even something such as an operating system might already be causing issues in terms of hardening: are monitoring agents from a PaaS provider allowed or not? Can we run them alongside our preferred monitoring solution? Or will that cause too much overhead on our systems? In short, the world of multi-cloud is not black and white. On the contrary, multi-cloud has an extensive color scheme to work with.

So, how do we get to a demarcation model that will work for our enterprise? Well, that's architecture. First, we don't need global admins all over our estate. This is a major pitfall in multi-cloud. We all know the cases: the database administrator that needs global admin rights to be able to execute certain actions or worse, solutions that require service accounts with such roles. It's global admin galore. Do challenge these requests and do challenge software providers – or developers, for that matter – when it comes to why systems would need the highest possible access rights in the environment.

That's where it starts: policies. In this case, a good practice is the Principle of Least Privilege (PoLP). This states that every identity is granted the minimum amount of access that is necessary to perform the tasks that have been assigned to that identity. Keep in mind that an identity, in this case, doesn't have to be a user: it can be any resource in the environment. When we are talking about users, we address this as Least-Privileged User Account or Access (LPUA). PoLP helps in protecting data, as data will only be accessible when a user or identity is explicitly granted access to that data. But there are more reasons to adhere to this principle. It also helps in keeping systems healthy, as it minimizes the risk of faults in systems. These faults can be unintended or the result of malicious conduct. We should follow the rule of least privilege at all times. We will discuss this in more detail in Chapter 15, Implementing Identity and Access Management, which is all about identity and access management.
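As a brief illustration of PoLP in practice, the following AWS IAM policy grants an identity read-only access to a single S3 bucket – and nothing else. The bucket name is a hypothetical example; the point is the absence of wildcards and admin-level actions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyAccessToOneBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::example-reports-bucket",
        "arn:aws:s3:::example-reports-bucket/*"
      ]
    }
  ]
}
```

An identity attached to this policy can list and read objects in that one bucket, but cannot delete data, change configurations, or touch any other resource – exactly the minimum needed for, say, a reporting task.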

Regarding this very first principle, there are a few more considerations that need to be made at this stage. These considerations translate into controls and with that, into deliverables that are part of BaseOps, since they are absolutely part of the foundational principles in multi-cloud. The following table shows these controls and deliverables:

Demarcation and the separation of duties are very strongly related to identity and access management. This will be discussed in full in Chapter 15, Implementing Identity and Access Management.

Summary

In this chapter, we have designed and set up our landing zones in the different major cloud platforms. We have learned that the foundational principles might be comparable, but the actual underlying implementations of the landing zone concepts do differ.

Next, we explored the principles of Infrastructure as Code and Configuration as Code. With tools such as Terraform, we can manage multi-cloud from one code base using configuration policies that have been abstracted from the resource code. We then learned how to define policies and how to apply these to manage our landing zones. Finally, we learned that there's a need for a clear demarcation model in multi-cloud. This all adds up to the concept of BaseOps: getting the basics right.

Part of keeping the basics right is making sure that our environments are resilient and performing well. That's what we will be discussing in the next chapter, which is all about creating availability and scalability in the cloud.

Questions

  1. A basic infrastructure in the cloud consists of five major domains, three of which are network, compute, and storage. What are two other domains?
  2. What is the best practice deployment model for the landing zone in Azure?
  3. AWS offers a service called Landing Zone. It enrolls four accounts. In which account is single sign-on managed?
  4. A good practice in managing identity and access management is PoLP. What does PoLP stand for?

Further reading

  • VMware Cross-Cloud Architecture, by Ajit Pratap Kundan, Packt Publishing
  • Azure for Architects, by Ritesh Modi, Packt Publishing
  • Architecting Cloud Computing Solutions, by Kevin L. Jackson, Packt Publishing