Chapter 4
Compute

EXAM AZ-303 OBJECTIVES COVERED IN THIS CHAPTER:

  • Implement and Monitor an Azure Infrastructure
    • Implement VMs for Windows and Linux
  • Implement Management and Security Solutions
    • Manage workloads in Azure
  • Implement Solutions for Apps
    • Implement an application infrastructure
    • Implement container-based applications

EXAM AZ-304 OBJECTIVES COVERED IN THIS CHAPTER:

  • Design Infrastructure
    • Design a compute solution
    • Design an application architecture

Companies and developers expose their applications to consumers or employees via a computer. A computer is often visualized as a workstation or laptop that is on the desk or table in front of you. However, the computer I am referring to here is a server in a data center whose sole purpose is to provide a service instead of consuming it. That's clear, right? But, because servers play such a significant role in the existence of an application, compute products are often wrongly chosen as the first point of entry into an Azure design. Compute, aka the hosting model, is not the optimal entry point for production workloads. As you already know, security and networking components need to be considered prior to compute.

Beginning with the creation of an Azure compute resource before the other two steps (security and networking) can lead to growing pains or leaving your application vulnerable to bad actors. For example, did you make your subnet too small for the planned number of resources being placed into it, or do you have control over who can create Azure resources in your subscription? Do you need to synchronize your on-premise Active Directory with Azure to run restricted applications on compute resources hosted in the cloud? What region do you need to place your compute resources into? Those are just a few of the questions needing answers that could be overlooked by jumping straight to compute.

By no means am I attempting to convince you to underestimate the importance of compute. Compute is where the magic happens and should be considered the heart of your IT solution. Without it, there is no place to run the service your program was written to provide. With no compute, there is no need for a network nor any kind of security implementation. You can even argue the importance of compute from a database perspective. It is great to have data, and companies have a lot of it, but without a computer to capture, interpret, manipulate, and present it, what value does data have? The point is that compute is a place where you can realize impact and show progress the quickest, which makes it a favorable entry point. However, doing so would showcase short-term thinking and lack of structured planning. Therefore, it is recommended you follow the process discussed so far (first security, then networking, then compute). By doing so, the odds of successful migration or creation of your solution onto Azure will greatly increase.

“Excellence is never an accident. It is always the result of high intention, sincere effort, and intelligent execution; it represents the wise choice of many alternatives—choice, not chance, determines your destiny.”

—Aristotle

Although Aristotle was referring to life in that quote, the same principle can be applied to choosing Azure compute products; there are a lot of choices, and making the wrong choice, as in life, can have undesirable consequences. Moving to or creating a new solution on Azure unfortunately doesn't get much easier at this point. Compute is simply the next level of technical competency required to progress toward your ultimate goal of Azure proficiency. Choosing which Azure compute resources you need, how many of them, how much they cost, and how to get your application configured to run on them can be a great challenge. But with good intentions and sincere effort, it is possible.

An Overview of Compute (Hosting Model)

This chapter explains the details of Azure compute, which is synonymous with the hosting model. To make sure you know what hosting model means, take a look at the exam objectives covered in this chapter. To call out a few, Azure VMs, Service Fabric, App Services, and Azure Functions are hosting models. Each of these compute products has specific benefits, use cases, and limitations that require clarity so that the best one is chosen for the workload being created or moved to Azure. The specifics of each hosting model will be discussed in more detail in their dedicated sections. Until then, let's focus on two key elements: cloud service models and how to choose the right hosting model.

Cloud Service Models

I expect you to already know these cloud models if you are now preparing for the Azure Solutions Architect Expert exam; however, a review will cause no harm. You will know them as IaaS, CaaS, PaaS, and FaaS. There are numerous other “-aaS” acronyms for cloud service models; however, those listed previously are the most common in the Azure context. Refer to Figure 4.1 for a visual representation of them and read on for their description.

Infrastructure as a service (IaaS) was one of the earliest cloud service models, if not the first. This model most resembles the long-established on-premise architecture where a software application executes on a stand-alone or virtual server. The server is running an operating system and is connected to a network. As shown in Figure 4.1, the cloud provider (i.e., Microsoft) is responsible for the hardware and its virtualization, while the operating system (Windows or Linux) is the responsibility of the customer. The customer's responsibilities include, for example, updating the OS version and installing security patches. The customer has great control and freedom with this model, but with that comes greater responsibility. Microsoft only commits to providing the chosen compute power (CPU and memory) and ensuring that it has connectivity to a network. All other activities rest in the hands of the customer. Azure Virtual Machines, discussed in more detail later, is Microsoft's IaaS offering.

Snapshot of the cloud service models and their responsibilities.

FIGURE 4.1 Cloud service models and their responsibilities

Container as a service (CaaS) is one of the newer cloud service models on Azure. CaaS delivers all the benefits available in IaaS, but you drop the responsibility for maintaining the operating system. This is visualized in Figure 4.1, where the operating system box is now the same shade as the hardware and virtualization boxes. A popular containerization product is Docker, which allows the bundling of software, dependent libraries, and configurations into a single package. That package can then be deployed into and run on a container hosting model. Azure Kubernetes Service (AKS), Azure Container Instances, Azure Service Fabric, and Web App for Containers all provide services for running containerized packages on Azure; each is discussed later.

Platform as a service (PaaS) is an offering for customers who want to focus only on the application and not worry about the hardware, network, operating system, or runtimes. This offering comes with some restrictions, specifically in regard to the runtime. A runtime is the library of code that the application is dependent on, for example .NET Framework, .NET Core, Java, PHP, or Python. The dependency on the runtime is defined during development. For example, if your application targets Python 3.7.4 but that runtime is not available on the PaaS hosting model, then your application will not run. The same goes for .NET Core 2.2. If you target 2.2 during development but that runtime is not yet on the platform, then it also will not run. Making changes to the runtime is not allowed when running on a PaaS platform. There are other constraints, such as not being able to change or delete operating system–level configurations. Such changes aren't allowed because they could cause damage, and because you are not responsible for the operating system, you cannot modify it.
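
If you want to verify up front whether a given runtime is available on the platform, the Azure CLI can list the runtimes currently supported by Azure App Service. The following is only a quick sketch; the exact output and flag names vary by CLI version.

az webapp list-runtimes              # list runtimes supported by Azure App Service
az webapp list-runtimes --linux      # limit the list to Linux runtimes (newer CLI versions use --os-type instead)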

Notice the shaded Unit of Scale box on the far right of Figure 4.1. The similarly shaded boxes under the titles IaaS, CaaS, PaaS, and FaaS symbolize where scaling rules (or manual scaling) are applied when executed. Scaling is a standard PaaS feature but can also be realized using virtual machine scale sets in IaaS. When a scale command is executed, a duplicated instance of your application is brought online for user consumption. The duplication is from the virtualization level upward, whereby each VM on which the PaaS runs will have the same specification. For example, if four CPUs and 7GB of memory (a common size for a single-instance PaaS production workload) are chosen, then the operating system with all patches, the runtime, the containers, and the application code will be scaled, making all the instances identical.

Autoscaling or scaling has not been discussed in much detail so far, but the concept has been touched on. Again, scaling is the most valuable and cost-effective offering that exists in the cloud, from a compute perspective, because it optimizes utilization, which reduces the overall cost of compute power. Products such as Azure App Service and Cloud Services (which is being deprecated) are Microsoft's PaaS offerings.
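
To make scaling rules a bit more concrete, here is a hedged Azure CLI (bash) sketch that adds an autoscale setting to an App Service plan. The resource group, plan name, metric, and thresholds are placeholders and examples only, not a recommendation.

az monitor autoscale create --resource-group myRG --name planAutoscale \
  --resource myAppServicePlan --resource-type Microsoft.Web/serverfarms \
  --min-count 1 --max-count 5 --count 2

# Add a rule: scale out by one instance when average CPU exceeds 70% over 10 minutes
az monitor autoscale rule create --resource-group myRG --autoscale-name planAutoscale \
  --condition "CpuPercentage > 70 avg 10m" --scale out 1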

The final cloud service model to describe is functions as a service (FaaS). FaaS is most commonly referred to as serverless computing and is offered via a product called Azure Functions. Unlike the previously discussed cloud service models, FaaS does not require the creation of a compute instance. When creating an instance of Azure VM or an Azure App Service, each of those services requires the selection of an SKU, which describes the number of CPUs, amount of memory, and storage capacity. In the FaaS context, this is not required; instead, you simply create the Azure Function and deploy your code to it. The platform is then responsible for making sure there is enough compute capacity to execute the code. This simply means that the scaling is done for you. There are some restrictions such as the length of time an Azure Function can run, and there is a limit on the amount of capacity you can get allocated. Both of those limits and other limitations to watch out for will be covered in more detail later in the chapter.
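
To illustrate the point that no compute size is selected, the following is a minimal Azure CLI sketch that creates a function app on the serverless Consumption plan; the resource group, storage account, and app names are placeholders, and the runtime shown is just one option.

az functionapp create --resource-group myRG --name csharpguitar-func \
  --storage-account mystorageacct --consumption-plan-location northeurope \
  --runtime dotnet --functions-version 3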

How to Choose the Right Hosting Model

Buying the right Azure compute product and getting it to work properly depends greatly on understanding your own application's requirements. Can your code run successfully using Azure Functions (FaaS), or does your solution require options only available through Azure VM (IaaS)? You might be thinking, “I know my application well. I know its dependencies, but I don't know what is and what is not supported on each of the Azure compute products. How do I get started?” I can relate to that; due to the sheer number of Azure compute options, it can be an overwhelming situation and a cause of great uncertainty.

One reason you may be unsure as to which compute option to use is because you have not finished reading this chapter yet. All the details you need to know to make an educated decision are included in this chapter. But to get started, take a look at the decision diagram presented in Figure 4.2. The diagram is intended only to get you started and to narrow down the number of possible options to a more manageable number. Notice there are seven possible compute options presented in the diagram; if it helps you reduce the number of options to two or three, then consider that a good thing.

Snapshot of the compute selection decision tree.

FIGURE 4.2 Compute selection decision tree

Let's walk through the decision tree together starting with the choices for creating a new application or migrating an existing one (bold words in the discussion relate to a step in Figure 4.2). For the Azure Solutions Architect Expert exam, understanding the migration of existing solutions to Azure is most important, so let's focus specifically on that path. The answer to Create New, therefore, is no. The next decision point is Migrate. This may be a bit confusing, because of course you are migrating, but the question to answer here is whether you plan on making any cloud optimizations to the solution. Cloud optimizations are discussed later in the “Azure Compute Best Practices” section. But for now, the decision is, will you simply lift and shift (aka rehost), or will you do some optimizations to make the program behave better on cloud architecture?

Assuming you will simply lift and shift with no cloud optimizations (not Cloud Optimized) immediately reduces the number of recommended options to three Azure compute products.

  • Azure Container Instances
  • Azure Virtual Machines
  • Azure App Service

Can your solution run within a container; can it be Containerized? If yes, then proceed toward Azure Container Instances as the compute product for your application code. If no, is the product a Web Application or a Web API? If yes, then the best option for Azure compute would be an Azure App Service. If the code being migrated to Azure is not web/internet-based, then your best choice is an Azure VM.

Now go back and take a look at the decision tree where we chose no for making cloud optimizations; this time choose yes for Cloud Optimized. Notice that electing to make cloud optimizations to your existing application increases the number of available compute options. This is a good decision because many of the Azure compute products that support the most technically advanced cloud capabilities require tweaking to get the code functional; just a simple lift and shift is not enough to get the most advanced technical benefits. In many cases, these other cloud-optimized compute options are more cost effective. Lastly, as you will learn in Chapter 7, “Developing for the Cloud,” there are some specific technical concepts that exist in the cloud that you must be aware of. These concepts may not be intuitively obvious to those having only experience with creating and maintaining IT solutions on-premise.

Next, answer the question about HPC. High Performance Computing (HPC), aka Big Compute, is an IT solution that uses a large amount of CPU, GPU, and memory to perform its function. These kinds of workloads are typically used in finance, genomics, and weather modeling. The amount of compute power for these processes is huge. If your application falls into the HPC category, then Azure Batch is the place to begin your research and analysis. If not, can your application run in a serverless context? Azure Functions is the Microsoft serverless compute offering. The primary difference between Azure Batch (HPC) and Azure Functions is the size and scale required for processing. The program triggered from an Azure Batch job can be small and compact, but the amount of compute power required to run it would likely consume more CPU and memory than is available from an Azure Function. Azure Functions, too, should be small and compact. The amount of CPU and memory would be more mainstream and can be large, just not jumbo. In both cases, HPC and serverless are scaled dynamically for you so that the program will successfully complete; the scale is what is different. This will become clearer as you read through the chapter. Don't worry.

In reality, all Azure compute products are running on Azure VMs behind the scenes. Azure App Services, Azure Batch/HPC, Azure Functions, Azure Container Instances, Service Fabric, and Azure Kubernetes Service (AKS) all run on Azure Virtual Machines. The remaining three compute options are focused primarily on, but not limited to, deployment, maintenance, and failover tasks. These capabilities are commonly referred to as the orchestration of containerized workloads. I would go so far as to confidently state that most legacy enterprise applications cannot simply be containerized and orchestrated without significant investment in both application redesign and IT employee training. The concepts of containerization, orchestration, and their maintenance barely existed a little more than a decade ago. That being said, if the application would not benefit from Full Orchestration, then Azure Container Instances is a recommended point of entry for the solution. Service Fabric is focused on the Microsoft stack (.NET and Windows), and AKS is focused on open source stacks (PHP, Python, Node.js, and Linux).

The decision tree is intended as a starting point. I hope after reading the previous text and viewing the flow chart, things are not as overwhelming as you might have initially thought. Before we get deeper into the specific Azure compute products, take a look at the following two sections, which contain some general information to consider as you begin the procurement of Azure compute.

Architectural Styles, Principles, and Patterns

The decision-making process surrounding Azure compute products requires a solid understanding of the technical requirements of your application. Proficient knowledge of the application's style and architectural pattern is an added necessity. The style of application has great impact on the combined architectural pattern and its defined use cases. Here, use cases refers to the services that provide the application's purpose. The following bullet points give a quick overview of the three topics that are discussed briefly in this section:

  • Styles: Big Compute, Big Data, event-driven, microservices, n-tier, web-queue-worker
  • Principles: Self-healing, redundancy, scaling, data storage
  • Patterns: Circuit breaker, gatekeeper, retry, sharding

A previous discussion around the decision tree flow had to do with HPC versus Azure Functions. There, I linked HPC with the Big Compute style. The Big Compute style has a rather standard architectural pattern. In general, there is a scheduler, like Azure Batch, which coordinates the tasks across the provisioned Azure VM worker pool (see Figure 4.3).

Snapshot of an HPC diagram with Azure Batch.

FIGURE 4.3 An HPC diagram with Azure Batch

The tasks are commonly either run in parallel when there is no dependency between them or coupled when more than a single task needs to run on the same resource in sequence. Perhaps the use case here is numerical computation for Monte Carlo simulations. The point is that, for this application style, a Big Compute architecture pattern would resemble Figure 4.3 to some extent most of the time. If you implement a different pattern for that style, then you may have availability or performance issues because the wrong approach was implemented. Next, take Azure Functions, for example, which follows the event-driven style. This style would have an event producer, event ingestion, and one or more event consumers, as visualized in Figure 4.4.

Snapshot of an event-driven architecture diagram.

FIGURE 4.4 An event-driven architecture diagram

An event producer could be numerous IoT devices that are measuring humidity and updating a Service Bus message queue. The Service Bus is providing the event ingestion service. The humidity reading is then consumed or processed and stored in a database, for example by the Azure Function, i.e., the event consumer. Each style has a recommended or common architecture pattern, and each pattern is covered in detail in Chapter 7. Keep in mind that the architecture on which the application executes also has best-practice design principles and patterns. This means that in every case one must understand which compute product to use, know how it works, and then build the application following the best-case patterns for that hosting model; read Chapter 7 for those details.

To touch only briefly on Azure design principles, there is one that jumps out and is worthy of mention here and again later. The principle is referred to as design for self-healing. Self-healing means that when a server is recognized as not being healthy, the platform, in this case Azure, takes an action to make it healthy again. In many cases, this is a reboot of the virtual machine. In the cloud context, there is a term called hyperscale, which means capacity is added and removed so quickly, and in such volume, that managing it exceeds human capability. There is no chance that Microsoft could hire enough people with the right skills to manage all the servers that exist in the Azure cloud today; it must be automated.

The health of an application is not the responsibility of the cloud provider; however, an unhealthy application can cause the host (i.e., the virtual machine) to become unhealthy. For example, when there is a memory leak, when storage capacity is 100% consumed, or when there is a fatal Blue Screen of Death (BSOD), the server and the application will no longer be usable. Some action would need to happen to bring the application back online, and that action cannot be a manual one. That action is called auto-heal or self-heal. That brings you to the conclusion that, when writing code, your application must be able to withstand a self-healing action when a failure occurs.

One cloud design pattern for handling self-healing is called retry, as illustrated in Figure 4.5. Assume that, for some reason, the VM to which the Azure App Service is making a connection was determined to be unhealthy and is performing a recycle of the website.

Snapshot of a retry cloud design pattern diagram.

FIGURE 4.5 A retry cloud design pattern diagram

If the site has high traffic, then during that recycle, it is probable that a request to the VM will fail. In the application code, you must handle the exception and perform a retry of the request that just failed. Whether a retry is appropriate does depend on the scenario and requirements of your application. It might be okay to simply return an exception message to a client requesting a document, for example, while that is not so acceptable if an exception is returned while placing an order. The preceding few sentences should now clarify my previous comment that you must be aware of specific technical concepts that exist in the cloud that may not be intuitively obvious to those having only experience with creating and maintaining IT solutions on-premise. Exceptions certainly happen when running applications on-premise, but most on-premise applications have support teams that can connect via RDP to the machine and manually correct the problem. This is not always the case in the cloud; the scale is simply too large for manual activities to be the norm. Therefore, instead of manual actions, recovery is performed by the platform automatically. All styles, principles, and patterns are discussed in detail in Chapter 7; if you are interested in learning more about them now, skip ahead.
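
As a simple illustration of the retry pattern, the following shell sketch retries a failed HTTP request a few times with a short delay before giving up. The URL is a placeholder, and a real application would typically use a retry library with exponential backoff rather than a fixed delay.

for attempt in 1 2 3; do
  # --fail makes curl return a nonzero exit code on HTTP errors (e.g., a 500 during a recycle)
  if curl --fail --silent https://contoso.example.com/api/orders; then
    break                                           # the request succeeded; stop retrying
  fi
  echo "Attempt $attempt failed, retrying in 5 seconds..."
  sleep 5
done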

Azure Compute Best Practices

Chapter 7 has in-depth coverage of cloud best practices, styles, principles, and patterns. The awareness of these concepts at this point, however, is necessary because each will influence the decision of which Azure compute product to deploy or create your application on. From a best-practice perspective, some decision points are again based on the requirements of the application. There are best-practice recommendations for applications that utilize and expose APIs and for applications that are background jobs. In addition, best-practice guidelines exist for implementing autoscaling, monitoring and diagnostics, caching, and how to best recover from a transient failure.

From an API best-practice perspective, applications would be best suited for supporting clients and consumers if they were to implement a Representational State Transfer (REST) API. REST APIs are HTTP endpoints that most commonly expect and deliver requests and responses in the form of JSON documents. There are numerous other technologies that support this kind of internet API capability, such as Electronic Data Interchange (EDI), XML documents, Tuxedo, Simple Object Access Protocol (SOAP), and the Windows Communication Foundation (WCF) framework. Each of those techniques would work, but when your requirement is to expose an API using the HTTP protocol, the best-practice recommendation is to use the REST architectural style.

From a background job perspective, there are as many options as there are scenarios in which background jobs operate. A background job is a program that runs on a computer without a user interface and typically processes data and delivers the results of that processing. There are two primary scenarios to discuss regarding background processing: how to trigger it and where to run it from. Triggering refers to how the program gets started. As mentioned, there is no interface with a button to click that tells the program to run. Instead, the background process can be scheduled to run at certain intervals or triggered when an event takes place. Running at a scheduled interval is relatively straightforward; CRON is the most common scheduler for this scenario. The other scenario is much more dependent on the type of event and what the requirements are. An event is somewhat synonymous with a message, and all the messaging Azure products are discussed in Chapter 6. There are a number of them, and all have their best-case scenarios, use cases, and software development kits (SDKs). In short, the background process would be hooked into a queue of some kind where a message would be sent (remember the event-driven diagram from Figure 4.4). When a message is received, the hook is triggered and invokes the background job, which then executes the code contained within it. Which hosting environment to use also plays an important role, as many Azure compute products can be used to run APIs and background jobs. Azure App Service WebJobs and Azure VMs are well suited for running background jobs and supporting APIs. Azure Batch and Azure Kubernetes Service (AKS) can also be used for running background jobs, but Azure Batch, as you would imagine, is not intended to host APIs. By the time you complete this chapter, it will be clear which model to use for which application style; however, you will need to know what patterns your application implements and then search the best-practice patterns for that one specifically.
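
For reference, here is what a CRON-scheduled background job looks like as a crontab entry; the script path is hypothetical. A similar expression format (with an extra seconds field) is used by Azure Functions timer triggers.

# m h dom mon dow   command
0 2 * * *   /usr/local/bin/process-sales-data.sh    # run the job every day at 02:00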

If you intend to implement autoscaling, custom monitoring or diagnostics capabilities, or caching, there are some good examples of how to do this. Again, these are discussed in Chapter 7. A description of a transient failure and the expectation of its occurrence in the cloud are worthy of some initial discussion. From an on-premise perspective, transient errors, or any error caused by a network virtual appliance hang, memory failure, or hard drive crash, are not expected. This is because those hardware components are of brand-name quality and are expected to have a high level of stability (which you pay a premium for). In the context of hyperscale cloud architecture, the hardware is more of a commodity. That means when it fails, it simply gets replaced, and the old one is trashed. That's fine, but those failures can produce what is called a transient event, an event that happens and is self-healed after some short amount of time. This is something that isn't expected often when running on-premise but needs to be coded for when running in the cloud because it can and will happen more often. The application must gracefully recover from short, random, nonreproducible moments of downtime as a matter of design.

In conclusion, each best-practice example is bound to a cloud service model, hosting model, style, principle, and patterns that are implemented into or required by your application. Figure 4.6 illustrates a Venn diagram to visually represent the relationship between these concepts.

Snapshot of a Venn diagram that visually links the cloud service model, hosting model, style, design principles, and patterns together.

FIGURE 4.6 A Venn diagram that visually links the cloud service model, hosting model, style, design principles, and patterns together

The sweet spot directly in the middle of those five concepts is where you want your application to land. Finding that spot is no easy task; however, knowing the requirements well enough to make an educated decision is the first step. The rest comes from experience. You will gain some of that experience as you complete the exercises in this chapter. You should be able to answer questions like the following:

  • Which two of the following compute hosting models provide the highest level of customization?
    • Azure Virtual Machines
    • Azure App Service
    • Azure Functions
    • Azure Container Instances

The answers are Azure Virtual Machines and Azure Container Instances. As you will learn, both Azure App Service and Azure Functions run within a sandbox that limits some kinds of supported configuration activities. You have more freedom with a virtual machine and a container. This will become crystal clear as you read more.

Azure Container Instances

Azure Container Instances (ACI) is Microsoft's container as a service (CaaS) offering and is an entry point for customers with applications that run within isolated containers, where a container is an application or program that is packaged with all dependencies and deployed onto a server for its execution. The package is often referred to as an image, while the running container is often referred to as the runtime. The following are a few benefits of running application code in containers:

  • Containers are portable.
  • Containers are lightweight.
  • Containers are flexible.
  • Containers are scalable.

From a portability perspective, containers allow a developer or release manager to have confidence that the code in the container will run anywhere, on a local development environment, on a corporate on-premise physical server, or in the cloud. It will run anywhere without any configuration or coding changes because all of its dependencies are bundled within it. A container is lightweight because it reuses the operating system of the host; there is no need for an operating system to be deployed along with the container. It is flexible because all kinds of programs—small, large, simple, or complex—can run in a container. They are scalable, which means additional instances of the container can be brought online across more servers to provide more compute power when usage increases.

When a container package is deployed to ACI, it receives a public-facing IP address and domain name with the extension *.<region>.azurecontainer.io, where * is the name of the ACI container, which must be unique within the given region. A region, as covered in detail in Chapter 3, is the geographical location that the ACI will be installed into. This is selected during the container's creation. Regions are, for example, northeurope or southcentralus. Keep in mind that ACI is not yet available in all Azure regions; however, the global rollout is ongoing and will reach worldwide scale in the short term.
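
For reference, the following Azure CLI sketch creates a container instance with a DNS name label, which produces a fully qualified domain name in the format just described. The resource group and names are placeholders, and the image is the public sample referenced later in this chapter.

az container create --resource-group myRG --name csharpguitar-aci \
  --image benperk/csharpguitar-aci --ports 80 \
  --dns-name-label csharpguitar --location northeurope

# Retrieve the assigned FQDN, e.g., csharpguitar.northeurope.azurecontainer.io
az container show --resource-group myRG --name csharpguitar-aci \
  --query ipAddress.fqdn --output tsv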

ACI offers support for containers running on either Windows or Linux. The container concept has been mostly focused on the open source community and therefore Linux, so you will find the number of supported features for Linux images is greater than that currently for Windows. Table 4.1 lists the currently supported Windows base images on ACI.

TABLE 4.1 ACI-Supported Windows Images

Server Version        Edition       Version
Windows Server 2016   Nano Server   10.0.14393.*
Windows Server 2016   Server Core   10.0.14393.*
Windows Server 2019   Nano Server   10.0.17763.*
Windows Server 2019   Server Core   10.0.17763.*
Windows Server 2019   Windows       10.0.17763.*

The thing about Linux is that there are a lot of versions and editions. And when I write “a lot,” I mean a lot. In reality, there can be an infinite number of versions and editions because the Linux operating system is open source, and if I were so inclined, I could create my own version and edition of Linux, and it would run on the ACI platform. Therefore, there is no table that defines which Linux images will run on ACI. Be assured, however, that mainstream versions of Linux will deploy and run as expected on the Azure Container Instances platform.

OS Virtualization, Containers, and Images

If you are like me, meaning most of your career has been in the Microsoft world of Windows and the .NET Framework, then the concept of images and containers may be a bit of a mystery. This is common because, like I wrote earlier, this concept was mostly confined to the open source community, which until recently Microsoft was not actively engaged in. Although there are numerous products that provide OS-level virtualization, the most common one is Docker. The Docker software is less than a decade old, which may seem like a long time when considering “cloud speed” but in reality is not. Docker became a publicly available open source product in 2013 and only became supported on Windows with the release of Windows Server 2016. Based on that, I am confident that you will agree containers are a relatively new concept, especially from a Windows perspective. Be assured, however, that this area is picking up steam and will become a must-know skill set, especially for an Azure Solutions Architect Expert.

Let's look at what OS-level virtualization is in a bit more detail. Take a look at Figure 4.7, which compares a virtual machine with a container. We have not discussed Azure Virtual Machines in detail yet; that is coming in the next section. However, I would expect anyone reading this book to have a decent understanding of what a virtual machine is. You should have also created a number of Azure VMs in the previous chapter.

Snapshot of the comparison of virtual machines and containers.

FIGURE 4.7 Comparison of virtual machines and containers

Notice that the primary difference between running application code on a virtual machine and in a container is that the operating system is abstracted away. When you create a virtual machine, part of the process is to choose the operating system that is installed along with the acquisition of the CPU, memory, and storage resources. In many cases, a program you want to run doesn't warrant the existence of a dedicated operating system, because the operating system itself consumes more of the compute power than the application. In that scenario, having the alternative of running the program on a host in an isolated container, without needing its own operating system, is a desirable option.

So, what is an image exactly? I will explain it here, but note that in one of the exercises later in this chapter, you will get to create an image. It will become clearer as you read on. An image is the package of files, configurations, libraries, and a runtime required by a program to run. If your program is written in ASP.NET Core, then the code itself, its dependent libraries, and the runtime in which the code can execute together constitute the image. The image is defined in text form and then built, most commonly, using the Docker program. Once the image is created, you can deploy it to Docker Hub for private or public consumption. Figure 4.8 illustrates the relationship between an image and a container visually, which may help with your understanding of those terms.

Snapshot of the images compared to containers.

FIGURE 4.8 Images compared to containers

A container, on the other hand, is an instance of the image runtime. When an image is deployed to, for example, Azure Container Instances (the host), initially it will only consume space on some form of storage provider, like a hard drive. In that state, it is still an image. Once the image is instantiated, accessed or run, and loaded into memory, it becomes a container. The container becomes a process that comprises the runtime and application code for responding to requests, messages, or web hooks. In summary, a container is a living instance of an image on a host, while an image is a package of code, libraries, and a runtime waiting to be used.
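
The distinction is easy to see with the Docker CLI: pulling an image only places files on disk, while running it creates a container process. The image name below is the public sample mentioned later in this chapter.

docker pull benperk/csharpguitar-aci                 # downloads the image; it is still just files on disk
docker images                                        # lists images stored locally
docker run -d -p 8080:80 benperk/csharpguitar-aci    # instantiates the image; it is now a container
docker ps                                            # lists running containers (the living instances)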

Container Groups and Multicontainers

A container group, as one would imagine, is a group of containers. So what is a container? Can you answer that without reading forward or backward? A container is an instantiated image, that is, an image that is loaded into memory and ready to execute and perform its intended purpose. Therefore, a container group is a group of instantiated images (containers) running on a host, ready to do some work.

It is possible to have multiple container groups, but it is currently only possible to have multiple containers in a single container group when running Linux containers. This means when you consider a container group as the unit that runs on a virtual machine and the container as a process running on that virtual machine, you can run many containers within the same process. If you know something about IIS, a synonymous scenario is when you run multiple websites within the same application pool, assuming the application pool maps to a single process. See Figure 4.9.

Snapshot of a representation of multiple containers in a single container group.

FIGURE 4.9 A representation of multiple containers in a single container group

Visualize the virtual machine where the container group consists of two websites (i.e., containers), both of which are running within the same process, where a process means one of the executables (EXEs) running on the machine that are presented when you open Task Manager or Process Explorer. The container is one of those EXEs. The caveat here is that each container in the container group must be bound to a unique port. As shown in Figure 4.9, one of the containers is bound to port 80, while the other is bound to port 8080. Now that you have some understanding about images and containers, complete Exercise 4.1, which will help you get a workstation configured to enable the creation of an image and the local execution of a container. If you do not want to install the third-party software required to create an image and run a container, that is not a problem. You can skip Exercises 4.1, 4.2, and 4.3. Exercise 4.4 and the following exercises are not dependent on those three exercises; complete them if you want to get hands-on experience with Docker.

As alluded to in Exercise 4.1, the variety of operating systems that exist, such as Linux, Windows, and macOS, would make the exercise too repetitive if the steps were performed for each one of them. In general, you simply need to get Git and Docker installed and working on your workstation; I am certain this is achievable with the details provided in the Windows-focused exercise.

With newer workstations, BIOS-level virtualization is enabled by default; for older ones it is not. The means for accessing the BIOS of a computer is dependent on the manufacturer of the machine. Although it is common to press F12 or F2 during bootup, it is not a standard activity, nor is the navigation within the BIOS system, as it too is created by the manufacturer.

Before continuing to the creation of a Docker image and running it in a local Docker container, please review these important limitations. These should be understood when choosing to run your application in ACI.

The following are not supported in an ACI:

  • Microsoft Message Queuing (MSMQ), Microsoft Distributed Transaction Coordinator (MSDTC), Microsoft Office, UI Apps, and Windows infrastructure roles such as DNS, DHCP, file servers, and NTP are not supported on Azure Container Instances.
  • There are limits based on the number of containers that can be created and deleted in a 60-minute time frame (300 creates/deletes per hour and 100 creates/deletes in a five-minute period).
  • There is a limit of 100 container groups per subscription and 60 containers per container group.

Docker Components

Before you build a Docker image and run it in a local Docker container, let's learn a bit more about what Docker is and how it works. The five specific Docker components that you need to know are listed next:

  • Docker engine
  • Docker daemon
  • Docker client
  • Docker registry
  • Docker objects
  • Docker Engine   The Docker engine is a group of the following Docker components: the daemon, the APIs, and the client.
  • Docker Daemon   The Docker daemon, dockerd, is a listener that responds to requests via APIs that originate from the Docker client or another Docker daemon. These requests concern the management of containers, images, volumes, and networks.
  • Docker Client   The Docker client, which is a command-line interface (CLI), was installed in Exercise 4.1 and is the program docker that you used to check the version and run the hello-world sample image, also in Exercise 4.1. The client sends commands to dockerd, which is then responsible for routing each request to the correct API that executes it.
  • Docker Registry   The Docker registry is a location where you can store Docker images for public or private consumption. The best-known registry is Docker Hub, which is accessible here: https://hub.docker.com. Executing the docker push or docker pull command with the required parameters will publish an image to Docker Hub or download an image from Docker Hub, respectively. As we are focused on Azure, the image created in the next exercise, Exercise 4.2, will not use Docker Hub. Instead, there is an Azure feature called the Azure Container Registry (ACR) that provides the same benefits as Docker Hub.
  • Docker Objects   Docker objects include images and containers, which have already been discussed. In short, an image is a template that includes the instructions for how to create a container, while a container is a runnable instance of the image. The container is executed using the Docker client command docker run, which sends a request to the daemon to spin up an instance of the container.

Finally, there are two more topics that require discussion: the Dockerfile and the runtime of the containers. In the next exercise, you will create a Dockerfile, build an image from it, and run that image in a Linux container. Here is an example of the Dockerfile that you will create later in Exercise 4.2:

FROM mcr.microsoft.com/dotnet/core/aspnet:2.2
WORKDIR /app
COPY ./publish .
EXPOSE 80
EXPOSE 443
ENTRYPOINT ["dotnet", "csharpguitar-aci.dll"]

The FROM instruction identifies the base image, in this case aspnet:2.2. By doing this, you do not need to start from scratch and manually install all the packages required to run this kind of application. If additional packages are required for the application, you would use the RUN instruction followed by the package name. WORKDIR is short for working directory. Once this is defined, it is the reference point from which other instructions operate. For example, the destination for the COPY instruction is the /app directory. EXPOSE is an important instruction as it informs users of the port on which the application will listen. Ports 80 and 443 are common ports for running web applications. ENTRYPOINT is like the Main() method of a console application. This instruction tells Docker where and how to interface with the image once it's instantiated into a container.
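
Once the Dockerfile exists, building the image and running it locally takes only two commands, similar to the following sketch; the tag name is arbitrary.

docker build -t csharpguitar-aci .            # build the image from the Dockerfile in the current directory
docker run -d -p 8080:80 csharpguitar-aci     # run it locally, mapping host port 8080 to container port 80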

Docker Desktop for Windows supports two container types, Windows and Linux. After installing Docker on your workstation, there will be a white badge that looks like a whale with some ship containers on its back, similar to that shown in Figure 4.13.

Snapshot of the Docker container operating systems.

FIGURE 4.13 Docker container operating systems

If you click that badge, a menu is rendered that displays the type of container you will get when you run the docker build command. If you see Switch To Windows Containers, it indicates that you are in Linux mode. If you see Switch To Linux Containers, you are in Windows mode. The sample program you will use in the next exercise is an ASP.NET Core Web API that can run on both Windows and Linux. We will use Linux for the next exercise, so make sure you have it set correctly. In Exercise 4.2, you will create a local Git repository, download an ASP.NET Core Web API hosted on GitHub, create a Docker image, and run it in a Linux Docker container.

Azure Container Registry

An Azure Container Registry (ACR) is a location where you can store container images. As mentioned earlier, Docker Hub provides the same capability as ACR; both are places to store container images. The primary difference is that ACR is private only, while with Docker Hub you can make your images publicly consumable. Docker Hub is accessible at https://hub.docker.com and is used in Exercise 4.4. If you did not perform Exercise 4.1 and Exercise 4.2, use the public Docker image that I created called benperk/csharpguitar-aci. Complete Exercise 4.3 to create an ACR and upload the image you created in Exercise 4.2.

During the creation of the ACR, you selected an Admin user option and the SKU. I mentioned that images hosted in an Azure Container Registry are private. A username and password are required for access, whether you are using the Docker CLI in a client like PowerShell or creating an Azure Container Instance. (You will do that later.) The Admin user option can be enabled or disabled from the Access Keys blade for the given ACR. That blade contains the registry name, the login server, the username, and two passwords, so that you can regenerate one while still having the other available at all times.
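
For reference, a hedged Azure CLI and Docker sketch of the same flow looks like the following; the registry name is a placeholder and must be globally unique.

az acr create --resource-group myRG --name csharpguitaracr --sku Basic --admin-enabled true
az acr credential show --name csharpguitaracr          # retrieves the username and the two passwords
az acr login --name csharpguitaracr                    # authenticates the local Docker client against the registry

docker tag csharpguitar-aci csharpguitaracr.azurecr.io/csharpguitar-aci:v1
docker push csharpguitaracr.azurecr.io/csharpguitar-aci:v1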

There are three SKUs for ACR: Basic, Standard, and Premium. As is common, the more resources you need, the higher the level of SKU you need. Be aware that those higher levels come with a higher price. However, the great thing about the cloud is that you pay based on consumption instead of a flat fee for the entire solution, which could be a significant one. I'd like to call out three specific features that differ between the pricing tiers. There are more; check them online if you desire. These three (Storage, Events, and Geo-replication) are the ones worthy of further comment.

Storage is pretty straightforward. Images and containers take space on a hard drive. The more space you need, the higher the tier.

  • Basic provides 10GB.
  • Standard provides 100GB.
  • Premium provides 500GB.

Read and write operations are limited per SKU, as well.

The Events feature of ACR is an interesting one. I have not seen this capability on Docker Hub; it may be available with the Enterprise version, but I won't go too deep into the capabilities of Docker Hub here. The interesting thing about the Events feature is that when an image is uploaded or updated, it can trigger an action of some kind. For example, an update to an image or the insertion of a new one could trigger a Logic App or an Azure Function that could then be programmed to execute code. For example, you could enable the copying of site content from a file store to the new image file store or cause the event to write and send a message to Event Hub, Service Bus, or Storage Queue. (Those three products are covered in Chapter 6.) The ability to send a notification to one of those messaging products when a change happens to the ACR image is a cool capability.

Lastly, Geo-replication, which we covered in Chapter 3, replicates your images to other regions. The capability is available in the Premium tier only. It is configurable by clicking the Replications link for the given ACR and then choosing which regions you want the ACR replicated into. A benefit is that you deploy once, and that same image is replicated and can be consumed in all regions; this limits the possibility of unintentionally having different versions of an application running at the same time.
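
Configuring a replica from the command line is a single call, similar to this sketch (Premium SKU required; the registry name is a placeholder):

az acr replication create --registry csharpguitaracr --location westeurope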

Now that the ACR is clear, let's get your hands dirty and create an Azure Container Instance using the image you created or the one that I publicly hosted on Docker Hub (see Exercise 4.4).

Something I found interesting when reviewing ACI in the Azure Portal was that there are not many configuration options. Other than Managed Identity support, which was discussed in Chapter 3, there are not many others worth discussing. Perhaps this is a product in expansion mode, or perhaps it is a product that doesn't need any additional features. After all, the image is generally completely packaged and needs no other features, because you must consider all the necessary options before deploying the image. Currently, if the application running inside your container is experiencing some unexpected behavior, such as exceptions or performance issues, you could have some problems troubleshooting and finding the root cause due to the product being relatively new. (Or perhaps it's just me!) Nonetheless, this entire concept was once a black box. Now, for me and I hope for you, it is no longer a mystery and all has been clarified. Actually, I found it relatively simple to create, test, deploy, and consume a simple Docker image in a Linux container. That is saying a lot coming from a Microsoft guy. Someone from the open source community would find it even easier; that's my point.

Orchestration

I often find that when words or concepts are used in technology, they match the meaning when used in a different context. For example, the concept of inheritance in C# can be applied to real life, where properties and attributes are inherited from your parent. Hair color, eye color, and height are all things one would inherit from a parent; the same goes when you inherit from a class in C#.

When I think about the word orchestration, the first thing that pops into my mind is music. In that context, an orchestra is a group of performers who play various instruments in unison, such as strings, woodwinds, brass, or percussion. The orchestra has a conductor who orchestrates or organizes the different components of the orchestra. You could say that the conductor is responsible for the conduct of the orchestra. When you then apply that same model to the concept of containers, the model seems to fit nicely. The concept of orchestration in technology is the management (conductor) of different containers (players) that play different instruments (Windows or Linux containers and images). The conventional aspects of container-based orchestration are the following:

  • Health monitoring
  • Networking
  • Scaling
  • Scheduling
  • Synchronizing application upgrades

Before we proceed into the discussion of those activities, it should be stated again that ACI is the place to run smaller workloads and is best for getting something deployed and running quickly. ACI doesn't provide any orchestration features. When you created the ACI instance, remember that you selected the size of the Azure VM on which it will run. By doing that, you bound your application to that size, and there is no automated way to orchestrate that container or multiple instances of that container. The products Azure provides for such orchestration are Azure Kubernetes Service (AKS) and Service Fabric. Refer to Figure 4.2 and you will see that those two products (located toward the bottom of the decision tree) are selected based on the necessity of orchestration. The point is that ACI is an entry point for using containers on Azure, but if you need greater control and more manageability options, then you might outgrow ACI pretty quickly. I will touch on orchestration a little bit more when we cover AKS and Service Fabric later in the chapter, but the activities in the bullet list apply to those products and not to ACI. This just seemed like a good place to introduce this topic.
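
To contrast this with ACI, the following is a hedged Azure CLI sketch of creating an AKS cluster and then scaling its node pool on demand, something a single container instance cannot do; the names and counts are placeholders.

az aks create --resource-group myRG --name csharpguitar-aks --node-count 3 --generate-ssh-keys
az aks scale --resource-group myRG --name csharpguitar-aks --node-count 5    # add capacity when demand increases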

Health monitoring doesn't need a lot of explanation. When an orchestrator is configured to do so, a service runs that pings the containers to make sure they are still up and responding. If they do not respond or respond in a way that is unexpected, the orchestrator will remove and replace or restart the container. From a networking perspective, you may encounter a scenario in which different containers need to communicate with each other. What happens if one of the containers or the host on which a container is running becomes unhealthy and the IP address or location of the container changes? This is what the networking capability of an orchestrator is responsible for: maintaining and updating the list of containers with location and metadata details. Unlike when you deploy to an ACI and are bound to a single instance of the chosen size, using an orchestrator will allow you to increase the number of container instances and the hardware on which they run based on demand. Of course, you can decrease the consumption when demand slows down, which is very cost effective. Scheduling is the most complicated activity to explain and comprehend, so if you get it, then you are so awesome! But then again, scheduling is just scheduling, and we all have schedules, right? Take a look at Figure 4.18.

Snapshot of a diagram of how the scheduler activity in a containerized orchestration works.

FIGURE 4.18 A diagram of how the scheduler activity in a containerized orchestration works

Consider that you have a large number of hosts, which are synonymous with virtual machines in this context, and you want to deploy some more instances of a container. Do you know if you have existing capacity on some currently deployed hosts, or do you need a new host? That seems like a pretty complicated piece of information to capture without some help. That help comes from the scheduler. Assume there is only a single instance of an image named csharpguitar running in a container, and you request that two more instances be deployed along with two instances of the csharpguitar-aci container image. The scheduler has knowledge of the current configuration, stored in a data source, and makes the deployment as required, whether the deployment needs new hosts or there is enough capacity to run the containers on existing ones. Lastly, the synchronization of application upgrades manages the rollout of new container versions. Additional capabilities are avoiding downtime and being in a position to roll back a deployment if something goes wrong.

Azure Container Instances and Docker are young concepts but growing fast on Azure. The consumption rate is one of the fastest at the moment. At this point, you should have a good understanding of what ACI is, what Docker is, and what orchestration is. We will get back to orchestration again later, but let's now move on to Azure Virtual Machines.

Azure Virtual Machines

Azure Virtual Machines is Microsoft's IaaS product offering. Azure VM is by far the most popular and utilized Azure service. This can be attributed to the fact that Azure VM existed before the PaaS, FaaS, or CaaS offerings were available. Some years ago, any company or individual who wanted to utilize compute resources in Azure had only one option to do so, and that option was Azure VM. Even at that early stage of cloud consumption, the savings you could realize by no longer needing to manage the hardware and network infrastructure was great. Recall from Figure 4.1 that networking, hardware, and the virtualization of an instance are owned by the cloud provider when you choose to use IaaS. Also, recall from Figure 4.2 that if your workload is not cloud optimized, if you do not require or desire containerization, and if the workload is of relative complexity, then Azure VM is the place to begin your investigation and consumption. But what is meant by “relative complexity”? Mostly this means that if you need to have some control over the operating system on which your program runs, i.e., you need registry changes, your application instantiates child processes, or the application requires some third-party assembly installation, then it would not work on PaaS, for example. It won't work on PaaS because you have no control over the OS using that cloud model. Also, if you wanted to move an entire application infrastructure that included multiple tiers like web, application, database, and authentication tiers, each of which was running on its own machine, then that would be “of relative complexity” and would best fit in a group of Azure VMs. Figure 4.19 illustrates that kind of architecture.

If someone asked you what a virtual machine was, could you answer that? In your own words, what is a virtual machine? In my words, a virtual machine is a simulated server running on physical hardware that is granted access to CPU, memory, and storage that actually exist on a physical server. There can be many virtual machines on a single physical server. For example, a physical server with 32 CPUs, 128GB of RAM, and 200GB of storage space could realistically host three virtual machines, each with eight CPUs, 32GB of RAM, and 50GB of storage. The remaining one-fourth of the capacity is necessary to run the OS and programs that manage the physical hardware and the virtual machines. You wouldn't want to allocate all physical resources to the virtual machines, leaving nothing for the host to use. Related to this, a virtual network is also a simulated network within a physical network, so by understanding the VM concept, you can also visualize the concept of a virtual network.


FIGURE 4.19 An example of a multitier application that fits well on Azure VMs (IaaS)

If you read the previous chapter and completed EXERCISE 3.3, then you already have some experience with Azure VM. Take a look at Figure 4.20 and reflect on that exercise; think about if this is what you had in mind when you created your first Azure virtual machine.

As you may have noticed earlier in Chapter 3, a number of additional products and features are created when an Azure VM is provisioned. Looking again at Figure 4.20, you will notice a few products such as a virtual network and subnet (if one doesn't already exist), a public IP address, a network interface card (NIC), and managed disks. You should already have a good understanding of each of these except for managed disks. Although managed disks are discussed in more detail later, since the disk roles OS, Data, and Temp are called out in the figure, a description of them is warranted. On Windows, there is a tool named diskmgmt.msc; when entered into a command window, it identifies the partitions and disks that are configured on the server. Execute this on your workstation or on a Windows Azure VM to see which disks currently exist. Compare the different elements and take away some personal learnings from that.
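If you prefer the command line, the built-in Storage cmdlets on Windows report information similar to diskmgmt.msc. The following is a minimal sketch you could run inside the VM (or on your workstation); the properties selected are just examples of what those cmdlets return.

# Run inside a Windows VM to list disks, partitions, and volumes,
# similar to what diskmgmt.msc displays graphically
Get-Disk      | Format-Table Number, FriendlyName, PartitionStyle, Size
Get-Partition | Format-Table DiskNumber, PartitionNumber, DriveLetter, Size
Get-Volume    | Format-Table DriveLetter, FileSystemLabel, FileSystem, Size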


FIGURE 4.20 How an Azure VM looks with an NIC and hard drives

The OS disk is the one that contains the pre-installed operating system that was selected during the creation of the VM, such as Ubuntu, Red Hat, Debian, Windows Server, or Windows 10. OS disks have a maximum capacity of 2,048GB. A data disk is one that stores application data such as cache, auditing logs, configuration settings, or data in a file or a database management system (DBMS). The number of data disks that can be attached to an Azure VM is based on the VM size and the expected number of input/output operations per second (IOPS). You'll learn more about this later in the chapter, but in short we are talking about a number potentially in the thousands, with a maximum capacity of 16,384GB per data disk, which is gigantic. These numbers increase often, so check the Azure documentation for the current maximums.

The temporary disk is a location that is used for storing swap and page files. Those kinds of files exist in the context of managing memory and are used to offload memory from RAM. Recognize that this disk exists and what its intended purpose is, but leave it pretty much alone unless there is a specific reason to change it. I checked using diskmgmt.msc, and the temporary storage drive is mapped to D:\ with a size of 7GB; on Linux the disk is at /dev/sdb. Take note that this disk is temporary, and you should expect that anything stored on it can be removed during updates, redeployments, or maintenance, so again, don't use it or change it unless you have a specific need.

Managed disks are available in the following types: Standard HDD, Standard SSD, and Premium SSD. Choosing between a hard disk drive (HDD) and solid-state drive (SSD) comes down to speed, lifespan, reliability, and cost. SSD is the newest and will outperform HDD in all aspects, but it comes with a higher cost. Choosing which disk type and the number of attached data disks depends on what the requirements of the application are.

Now that we've discussed the basics of what a VM is and the related products and features, let's create a few VMs and get into a little more detail.

Creating Azure Virtual Machines

Before creating and provisioning an Azure VM, there are a few items you must consider, such as its location, its size, the quota limits, and the operating system. As discussed in the previous chapter, the location in which you place your workload should be analyzed from a privacy perspective if the application will store customer or personal data. There are laws that dictate the legal requirements of its management. Knowing the regional laws, the concept of geographies, and which data may be sent to other regions outside of the geography are needed pieces of information and were covered in the previous chapter. In addition, you must confirm that all the products required to run your application also exist in the same region. For example, does your application require zone redundancy or a Cosmos DB? If yes, then you would want to make sure those products are in the region you place your Azure VM.

As you would expect, there is a great variety of available sizes of Azure VMs, aka a VM series. To list the available VM sizes in a given region, you can execute this PowerShell command, for example: Get-AzVMSize -Location "SouthCentralUS" . The VM size (i.e., the number of CPUs and amount of memory) is greatly influenced by the requirements of the application. In addition, there are different sizes available based on the chosen operating system, for example Windows or Linux. As it is not realistic to determine all the possible customer use case scenarios, both size and quota limits will be discussed later when we focus on Windows and Linux Azure VMs in more detail. There are, however, different categories of Azure VMs that can be applied to both Windows and Linux VMs; take a look at Table 4.2 for more details on the classification. There is more to come in regard to what instance prefix means, so please read on.
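To give a feel for how that list can be narrowed down, here is a small sketch that filters the Get-AzVMSize output; the four-vCPU/16GB thresholds are arbitrary examples, and the properties used (NumberOfCores, MemoryInMB, MaxDataDiskCount) are part of the objects the cmdlet returns.

# List VM sizes in a region with at least 4 vCPUs and 16GB of memory
Get-AzVMSize -Location "SouthCentralUS" |
    Where-Object { $_.NumberOfCores -ge 4 -and $_.MemoryInMB -ge 16384 } |
    Sort-Object NumberOfCores, MemoryInMB |
    Format-Table Name, NumberOfCores, MemoryInMB, MaxDataDiskCount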

TABLE 4.2 Azure Virtual Machine Categories

Type Instance Prefix Description
Compute Optimized F Optimal for batch processing of medium-sized applications
General Purpose B, D, A, DC Best for development, testing, and small applications
GPU NV, NC, ND Most useful for video editing and graphics
High Performance H Powerful CPUs with high network throughput
Memory Optimized E, M, D, DS Ideal for in-memory analytics and relational databases
Standard A Small VMs not for production, testing only
Storage Optimized L Useful for applications requiring high disk I/O and Big Data or SQL databases

Also recognize that there are numerous ways to deploy and create an Azure VM. You may remember when you created your first VM that there was a step where you decided which OS image to use from a drop-down. In that drop-down list there were approximately ten of the most common images, like Windows and Linux, some of which have already been mentioned. But as you will experience in Exercise 4.5, there is also a link below that drop-down that leads to hundreds of public and private images to select from. These other images exist in a place called the Azure Marketplace, which is a location for businesses to host their software products for consumption by Azure customers. Microsoft places its software products in the Azure Marketplace just like any other company would do. To learn more about the Azure Marketplace, visit https://azuremarketplace.microsoft.com.

In the previous exercise, you created an Azure virtual machine using a public Azure image and looked through other available images.


Using Images

An image in the Azure VM context is similar to the definition of an image in the container context discussed in the previous section. An image is a template that defines the environment requirements in which it will run. From an Azure VM perspective, the big difference is that the operating system is part of the image definition. Review Figure 4.7 if you need a refresher on the differences. There are numerous tools that help you create an image for deployment to an Azure VM, for example, Azure VM Image Builder, the System Preparation Tool (SYSPREP), Disk2VHD, snapshots, and exports (VHDs) from VMware, VirtualBox, Hyper-V, or an already created Azure VM via the portal. We won't cover all of those tools and options in detail now, but an interesting one is snapshots. We have discussed a bit about managed disks, specifically the OS disk. It is possible to create a snapshot of that disk, navigate to the disk in the portal, and use the export button on the Overview tab to create a VM from that snapshot; similar capabilities exist in on-premise virtualization software as well. The simplest way to create an image of an existing Azure VM is described in Exercise 4.6. You would want to do this after the VM that you provisioned is complete and ready to be shared and used in production. In the following exercise, you will create an image from the VM created in the previous exercise and use it to deploy a new VM.
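If you would rather script the capture, the following is a rough PowerShell sketch of the commonly documented flow; it assumes the VM has already been generalized with SYSPREP (for Windows), uses placeholder names, and is not necessarily identical to the steps in Exercise 4.6.

# Deallocate the VM and flag it as generalized (run SYSPREP inside the VM first)
Stop-AzVM -ResourceGroupName "<RG-NAME>" -Name "<VM-NAME>" -Force
Set-AzVM  -ResourceGroupName "<RG-NAME>" -Name "<VM-NAME>" -Generalized

# Capture the VM into a reusable image resource
$vm    = Get-AzVM -ResourceGroupName "<RG-NAME>" -Name "<VM-NAME>"
$image = New-AzImageConfig -Location $vm.Location -SourceVirtualMachineId $vm.Id
New-AzImage -ResourceGroupName "<RG-NAME>" -ImageName "<IMAGE-NAME>" -Image $image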

Now you have an image that, as in my example created for Exercise 4.6, responds with, "Hello from CSHARPGUITAR-SC." You can use this image any number of times as you create redundancy and failover solutions. The image is also available for use with a virtual machine scale set (VMSS), which will be discussed in more detail later in the chapter. Let's switch gears now and look a little more at the two most popular supported operating systems available for Azure VMs: Windows and Linux. An Azure Solutions Architect Expert must grasp which versions of each operating system are supported on Azure. Know, too, that the OS has limits imposed by the subscription, CPU, IOPS, RAM, and storage available. You can expect some questions about this on the exam, for example, which of the following types of VMs cannot be deployed to Azure?

Windows

The Windows Server operating system is one of the most used software products for running enterprise-level workloads. In this section and in the following one focused on Linux Azure VMs, we'll focus on the supported OS versions, a subset of recommended VM sizes for those OS versions, and some quota limits specific for the OS and version. Recall from EXERCISE 4.5 that you selected a prebuilt image of Windows Server 2019 Datacenter Server Core from the Azure Marketplace. Take a look at the following 10 supported Windows images available via the Azure Marketplace in the portal:

  • Windows Server 2008 R2 SP1
  • Windows Server 2012 Datacenter
  • Windows Server 2012 R2 Datacenter
  • Windows Server 2016 Datacenter
  • Windows Server 2016 Datacenter - Server Core
  • Windows Server 2016 Datacenter - with Containers
  • Windows Server 2019 Datacenter
  • Windows Server 2019 Datacenter Server Core
  • Windows Server 2019 Datacenter Server Core /w Containers
  • Windows Server 2019 Datacenter with Containers

There are some recommended sizes of an Azure VM based on each of these operating system versions. The recommendations are arranged based on the categories that were presented previously in Table 4.2. Therefore, it is important to know into which category your application falls. Table 4.3 provides a list of recommended Azure VM sizes based on the selected workload category. The values in the Instance/Series column, in my example, DS1V2, represent the grouping of many compute components such as CPU speeds, CPU-to-memory ratio, temp storage ranges, maximum data disks, IOPS, and network bandwidth throughput. The ratio between each of those components is different for each series and therefore has an intended target application category. You can find the specifics of each Windows OS Azure VM here:

docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-memory.

As an Azure Solutions Architect Expert, you would be expected to recommend the category and VM series when given the specifications of the workload being deployed onto an Azure VM.

TABLE 4.3 Windows Versions to Azure VM Size Recommendation

Windows Version                                   Category          Instance/Series
2008 R2 SP1, 2012 Datacenter, 2012 R2 Datacenter  General Purpose   DS1V2, DS2V2, D2SV3, D1V2, D1, DS1, DS2
2008 R2 SP1, 2012 Datacenter, 2012 R2 Datacenter  Memory Optimized  DS11V2, E2SV3, DS11
2016 Datacenter*, 2019 Datacenter*                General Purpose   DS1V2, DS2V2, D4SV3, D1V2, D1, DS1, DS2
2016 Datacenter*, 2019 Datacenter*                Memory Optimized  DS11V2, E2SV3, DS11

If you see an asterisk (*), it symbolizes all variants of the Windows versions. As covered later in the “Migrating Azure Virtual Machines” section, it is possible to bring your own on-premise VM to the Azure platform. This means that almost any OS and its configuration that you can export and convert to the virtual hard disk (VHD) format can be deployed to an Azure VM. That sounds a bit easier than actually doing it. Sure, you can deploy a VM running Windows Server 2003; however, some capabilities such as cluster failovers, Azure VM agents, or VM extensions will not work or be supported. Some other common Windows Server features that are not supported on Azure are included in the following list:

  • Wireless LAN Service
  • SNMP Services
  • Dynamic Host Configuration Protocol (DHCP)
  • Storage manager for SAN
  • Hyper-V
  • DirectAccess
  • Network Load Balancing
  • BitLocker Drive Encryption
  • RRAS
  • Rights Management Services
  • Peer Name Resolution Protocol
  • Windows Deployment Services

A primary reason for limits and quotas is to prevent the accidental consumption of resources that results in a high, unexpected charge. Most of the limits are considered soft, which means if you need more, then you can get more by contacting Microsoft and requesting more. These limits are typically bound to a subscription or a region. I have seen many customers creating multiple subscriptions or deploying to multiple regions to get around the soft limits instead of realizing that the limit, which is there for protection, can be increased. Contacting Microsoft to get the soft limits increased would make managing your workloads on Azure more intuitive. Moving resources to other subscriptions or regions when those resources are all part of the same solution makes things harder to manage. There are, however, some hard limits, imposed on almost all customers that you must adhere to. Those numbers are big, and most companies wouldn't need (or couldn't afford) so much. They also are not always documented. This is because if the hard limit isn't hard-coded into the product, then it could be increased even more than a documented limit, if the business case is justified. Table 4.4 describes some subscription limits that are related to Azure VMs.

TABLE 4.4 Azure VM Limits

Azure Resource Soft/Default Limit Hard/Max Limit
Virtual machines 25,000 per region Contact Microsoft
Virtual networks 100 per region 1000 per region
Managed disks 50,000 per region Contact Microsoft
Storage accounts 250 per region Contact Microsoft
Virtual machine cores 30 per region Contact Microsoft

Also note that these limits change often. The ones listed in Table 4.4 and most of the numerical and relationship limits existed when the Azure Solution Architect Expert exam was created. Learning these will help you answer any question on the exam regarding this topic. By relationship matching I am referring to Table 4.3 where the links between OS version, category, and VM size are displayed.

The limit of 30 cores per region in Table 4.4 refers to a scenario where the limit applies to specific instance/series groupings. For example, you cannot have more than 30 combined cores of A- and D-series VMs in the same region, and there is a separate combined limit of 30 cores for Dv2- and F-series VMs in the same region. You can, however, have 10 cores of A1 and 20 cores of D1 in the same region, equaling 30. You could also have 30 cores of D1 and 30 cores of F1 in the same region, because those instance/series are not grouped together in the limits logic. This is an important point to know when using IaaS; however, I wouldn't expect such a question on the Azure Solutions Architect Expert exam, so just keep it in mind as you progress to being not only a certified Azure Solutions Architect Expert but also a tenured and highly competent one.
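If you want to see where a subscription stands against those per-family core quotas in a region, the compute usage data can be queried with PowerShell. This is a minimal sketch; the wildcard filter on "vCPUs" simply trims the output to the core-related rows.

# Compare current vCPU consumption to the quota limit per VM family in a region
Get-AzVMUsage -Location "SouthCentralUS" |
    Where-Object { $_.Name.LocalizedValue -like "*vCPUs*" } |
    Format-Table @{ n='Family'; e={ $_.Name.LocalizedValue } }, CurrentValue, Limit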

Linux

Linux is rapidly becoming the most utilized operating system for running small to medium workloads. Microsoft is helping to support that growth by providing the means for its simple implementation on the Azure platform. As you have already built an Azure VM, you know how easy that was. The only difference from the previous exercise where you built the Windows VM is that you would choose a Linux image from the drop-down box instead of a Windows one; the other steps are the same, and there is no apparent attempt to make deploying Windows VMs easier than Linux. The image selection defaults to an Ubuntu image. Currently there are six flavors of Linux OS offerings in the Azure Marketplace:

  • Ubuntu Server
  • Red Hat Enterprise 7
  • Container Linux by CoreOS
  • Clear Linux OS
  • SUSE Linux Enterprise
  • Debian

Table 4.5 lists the Azure Marketplace images available for each of those flavors.

TABLE 4.5 Azure Marketplace Linux Versions

Linux OS Linux Version
Ubuntu Server 14.04 LTS, 16.04 LTS, 18.04 LTS
Red Hat Enterprise 7.2, 7.3, 7.6
CoreOS Alpha, Beta, Stable 7.5
Clear Linux OS Basic, containers, machine learning
SUSE Linux Enterprise 15, 12 SP4
Debian 8, 9

There is a concept referred to as blessed images or endorsed distributions in the context of Linux. As stated earlier, it is possible to build any machine using any OS and configuration and attempt to deploy it to Azure. The key word there is attempt. There are so many possible configurations that it would be inconceivable at this time to have 100% coverage. I once tried unsuccessfully to deploy a Linux OS that was not blessed. When you deploy and experience some kind of issue, the place where you tend to turn to is the Serial Console. The Serial Console lets you make a hardware connection to the VM instead of using SSH, which requires a network connection. The Serial Console is listening on a virtual console named tty0 by default. The Linux image I was deploying was configured to listen on tty1, and I couldn't connect to it. Only by chance was I able to figure that out, but this is an example of what happens when you do not use an endorsed or recommended image. There are many “one-offs” that can occur and delay your deployment or, worse, completely prevent it. It is therefore most prudent that your application targets one of the Azure Marketplace images; however, there are more Linux flavors that are considered blessed and endorsed that do not exist in the Azure Marketplace. They are listed in Table 4.6.

TABLE 4.6 Additional Endorsed Linux Flavors

Linux OS Linux Version
CentOS 6.3, 7.0
CoreOS 494.4
Debian 7.9, 8.2
Oracle Linux 6.4, 7.0
Red Hat Enterprise 6.7, 7.1, 8.0
SUSE Linux Enterprise SLES for SAP, 11 SP4, 12 SP1
openSUSE Leap, 42.2
Ubuntu 12.04

As with the Windows machines, the recommendations for the VM sizes are based on the category and the operating system version. Please find the VM size recommendations per Linux OS flavor in Table 4.7. For greater visibility into the details of the Linux VM instances, take a look at https://docs.microsoft.com/en-us/azure/virtual-machines/linux/sizes-memory. As the amount allocated for compute resources increases, so does the cost; therefore, choosing the right size is important.

TABLE 4.7 Linux Versions to Azure VM Size Recommendation

Linux Version            Category          Instance/Series
Ubuntu, CoreOS           General Purpose   DS1V2, DS2V2, D2SV3, D4SV3, D1V2, D1, DS1, DS2
Ubuntu, CoreOS           Memory Optimized  DS11V2, E2SV3, DS11
Red Hat Enterprise       General Purpose   DS2V2, D2SV3, D4SV3, D2V2, D2V3, D2, DS2
Clear Linux OS           General Purpose   D1, D3, DS3, DS4
SUSE Linux Enterprise    General Purpose   DS1V2, DS2V2, D1V2, D1, DS1, DS2
SUSE Linux Enterprise    Memory Optimized  DS11V2, DS11

There are no limits or quotas that focus specifically on Azure VMs running Linux; they are the same that were covered previously in Table 4.4. Microsoft fully supports Linux, and there are no policies or practices that knowingly inhibit this operating system.

Azure VM Extensions

Extensions are small programs or automation activities that are helpful for performing post-deployment tasks, security, monitoring, and automated deployments. If you were curious in the previous exercises where we created the Azure VM, there was a tab named Advanced, and on that tab there was a section that allowed you to select an extension. In the exercises I usually skip over those tabs, but you may have looked at them and wondered what all those features are. Consider accessing the portal and simulating the creation of a new Azure VM. Notice that the list of installable extensions differs depending on whether you select a Linux or a Windows-based VM. The region also plays a role in the extension list, so here again is an example of why you need to know what capabilities are available in each region prior to committing to one.

For Windows there are some nice extensions such as the PowerShell Desired State Configuration extension that will help in post-deployment activities to make sure the VM is configured in the same way in every case. This is important once your workloads get rather complicated and require automated deployments, which are discussed in more detail in the next section. There are anti-malware, cloud security, and other security-related agents that can be deployed, configured, and run on your Azure VM as an extension. When you create your initial Azure VM, you configure all these environment-specific capabilities and then capture your image for use with later automated or manual deployments.

Microsoft provides a lot of monitoring capabilities; however, it fully supports other companies with more specific monitoring capabilities through this extension feature. Some third-party monitoring products available for installation are Datadog, APM Insight, and Dynatrace. Monitoring is covered in more detail in Chapter 9 but will focus on the Azure platform–based capabilities and not third-party extensions in IaaS. If you have an interest in learning more about these extensions, check out this online document:

docs.microsoft.com/en-us/azure/virtual-machines/extensions/overview.

Automated Deployment

Deployment and migrations are covered in Chapter 8, which will target ARM and code deployments (aka content deployments). As you already know, there are many ways to deploy an application and many components that need to be deployed to make it work. If any portion of those deployment tasks can be automated, it decreases the amount of required effort. Consider that in many of the previous exercises, after the provisioning of the Azure VM was complete, you were requested to connect via RDP or Bastion to the server and manually install IIS using some PowerShell cmdlets. That is an acceptable approach if you have only one or two servers to deploy that need IIS; however, if you were to deploy 50 or 100, then that option really isn't worth considering. It is not realistic to manually log in to 50+ servers and make manual configurations to each of them. Exercise 4.6 offered one approach to realize the same outcome, where you create an image and use it as the baseline for all future deployments. Using automated scripting is another option to consider and is useful when you deploy your Azure VMs with PowerShell as well. There are even scenarios where a combination of both of these capabilities adds great value.

An example of a scenario where both a custom image and an automated deployment script are useful is when there is no public image that has the required utilities installed to run your script. For example, if you wanted to run an Az PowerShell cmdlet, then the image must have those cmdlets installed prior to executing. This currently requires that the following PowerShell installation command be run first; you may remember this from the previous chapter.

Install-Module -Name Az -AllowClobber -Scope AllUsers

As mentioned in the previous section, there is an option on the Advanced tab of the Azure VM creation blade called Extensions. Clicking Select An Extension To Install opens a window listing the available extensions. The one used for executing custom scripts is named Custom Script Extension for Windows and Custom Script for Linux. When building a Windows OS VM, you can save the script you want executed (for example, the Add-WindowsFeature command) to a file named something like iis.ps1 and upload it when configuring the extension in the portal, or you can deploy the extension to an existing VM yourself with a PowerShell cmdlet similar to the following.

# Deploy the Custom Script Extension to an existing VM and install IIS
Set-AzVMExtension `
    -ResourceGroupName "<RG-NAME>" `
    -VMName "<VM-NAME>" `
    -Location "<LOCATION>" `
    -ExtensionName "IIS" `
    -Publisher "Microsoft.Compute" `
    -ExtensionType "CustomScriptExtension" `
    -TypeHandlerVersion "1.4" `
    -SettingString '{"commandToExecute":"powershell.exe Add-WindowsFeature Web-Server -IncludeManagementTools"}'

Then, once the Azure VM is created, IIS will be installed using an extension. You can also install, for example, SQL Server and the .NET Framework using extensions. From a Linux VM perspective, in addition to the extensions, there is also the feature on the Advanced tab to implement cloud-init scripts, which can configure users and security and install software packages. The creativity customers and developers have within this tool is greatly varied, and the wonderful point about Azure is it provides a platform to realize that creativity. I recognize this section is a bit abstract. I simply point out this feature as an option to pursue and consider when you are deploying your application to an Azure VM.

You should now have a good understanding of how to create an Azure VM whether it be Windows or Linux. You should also, if asked, know which versions of those operating systems are endorsed and what trying to deploy an image that is not endorsed could entail. You should also know what the different categories/series of VMs mean, such as memory optimized and general purpose. Given a table that shows a list of different VMs with OS, CPU requirements, and memory requirements, you need to know which ones are endorsed by Azure and if the requested resources breach any quota or resource limits.

Azure Dedicated Hosts

When you provision an Azure virtual machine, you receive compute from a pool of capacity provided by hosts that are shared among all Azure customers. Be confident, however, that no content or configuration remains behind after a deallocation occurs; the virtual machine is completely cleaned before the capacity is placed back into the pool of available resources. If you want or need your virtual machines to run on their own host, aka a physical machine that is not provisioned from or deallocated back to a pool of shared compute resources, then you can choose an Azure Dedicated Host. Visit this site for more information about this product offering:

docs.microsoft.com/en-us/azure/virtual-machines/windows/dedicated-hosts.

The cost of an Azure Dedicated Host is higher than running in the shared hosting environment because you are charged for all the compute power available on the host instead of only the compute power you consume. An advantage of using an Azure Dedicated Host is the ability to control infrastructure changes that may impact your provisioned resources, such as host maintenance and networking changes. Azure Dedicated Hosts are not available with VM scale sets.

Managing Azure Virtual Machines

After your Azure VMs are provisioned, it's support time. You should already have solid knowledge about creating Azure VMs. Now it is time to learn some activities that can be done after their creation. This doesn't necessarily mean that the Azure VMs are in production and being actively consumed; rather, it means that you may encounter a scenario in which one or more of them requires a rebuild, a reconfiguration, or a redesign. This section will focus on networking, maintenance, cost and sizing, storage, managed disks, disaster recovery, and backup activities.

Networking

If you followed along with the previous chapter, you are competent from an Azure networking perspective. Even if you didn't complete the previous chapter, the following networking tips may come in handy at some point. The focus of these PowerShell cmdlets is to provide insight into how your network is configured, which is helpful if you need to find the cause of unexpected behaviors or transient outages. It would be possible to capture the same information from the portal, but in some cases a holistic view of what is going on can be achieved better by running some PowerShell cmdlets. Prior to running them, remember that you must authenticate and then set the context to a specific Azure subscription, as shown in the following code snippet. From now on, I will assume you know this step and will not mention it again.

Connect-AzAccount
$subscription = Get-AzSubscription -SubscriptionId "#####-####-###########"
Set-AzContext $subscription

The following is an example of a short PowerShell script. It lists all the network security groups (NSGs) in a given resource group, then cycles through them and writes out the NSG name, the direction of each rule, the rule name, and the destination port range. This is helpful just to get a quick understanding of all the different NSG rules you have in your resource group. The output might resemble something like Figure 4.22.

$nsgs = Get-AzNetworkSecurityGroup -ResourceGroupName "<Resource Group Name>"
foreach ($nsg in $nsgs)
{
    foreach ($rule in $nsg.SecurityRules)
    {
        $nsg.Name + ": " + $rule.Direction + " - " + $rule.Name + " - " `
                  + $rule.DestinationPortRange
    }
}
 

FIGURE 4.22 PowerShell output of NSG details per resource group

The following are some other helpful PowerShell cmdlets:

  • Get-AzVirtualNetwork
  • Get-AzVirtualNetworkSubnetConfig
  • Get-AzNetworkInterface

There are many more, but those are the ones that are most useful. They will require some customization similar to the code snippet shown prior to Figure 4.22. The output of those PowerShell cmdlets is often a large JSON-formatted document that contains all the details of the network, subnet, and network interface. Much of the information is unnecessary and can be filtered out with a little creative PowerShell scripting. PowerShell is a powerful tool with a large, open source set of Az cmdlets at your disposal.
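As an example of that kind of filtering, the following sketch walks the virtual networks in a resource group and prints only the names and address ranges; the properties used (AddressSpace.AddressPrefixes and AddressPrefix) are part of the objects those cmdlets return, and the resource group name is a placeholder.

# Summarize each virtual network and its subnets in a resource group
foreach ($vnet in Get-AzVirtualNetwork -ResourceGroupName "<Resource Group Name>")
{
    $vnet.Name + ": " + ($vnet.AddressSpace.AddressPrefixes -join ", ")
    foreach ($subnet in Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet)
    {
        "  " + $subnet.Name + " - " + ($subnet.AddressPrefix -join ", ")
    }
}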

Maintenance

There is no way around it; once you deploy your workloads to an Azure VM, you can't just walk away from it and forget it. Like a car or your garden/yard, it needs some ongoing attention. Some details you would want to be aware of fall into, but are not limited to, these areas:

  • Starting, stopping, and deleting
  • Resource locks
  • Resizing
  • Managing Windows Updates
  • Viewing boot diagnostics

In the next section, we'll go into more detail about stopping and starting VMs, but in this cloud service model it is most cost effective to turn off VMs that you don't need. There are numerous ways to achieve this; one simple way is to execute the PowerShell cmdlet Stop-AzVM -ResourceGroupName <name> -Name <VM Name> -Force to stop a VM or Start-AzVM -ResourceGroupName <name> -Name <VM Name> to start one. Stopping and starting VMs can also be achieved via the Azure Portal or via an RDP, Bastion, or SSH session directly on the VM. It is also possible to use Remove-AzVM -ResourceGroupName <name> -Name <VM Name> to delete a VM. The Remove-AzVM cmdlet can have significant impact if the VM performs a critical function in your solution. RBAC controls can be used to restrict this kind of activity to certain individuals or groups. If you want to make sure that no one, regardless of their RBAC permissions, is allowed to delete a resource or a resource group, then there is a feature called resource locks that will handle that requirement. To test a resource lock, complete Exercise 4.7.
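Exercise 4.7 creates the lock through the portal; if you prefer to script it, a sketch like the following places a delete lock on a single VM. The lock name is arbitrary, and you could swap CanNotDelete for ReadOnly.

# Place a CanNotDelete lock on a single Azure VM
New-AzResourceLock -LockName "ProtectVM" -LockLevel CanNotDelete `
    -ResourceGroupName "<RG-NAME>" -ResourceName "<VM-NAME>" `
    -ResourceType "Microsoft.Compute/virtualMachines"

# Review existing locks in the resource group; remove one by its ID when no longer needed
Get-AzResourceLock -ResourceGroupName "<RG-NAME>"
# Remove-AzResourceLock -LockId "<LOCK-ID>"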

You may have noticed in Figure 4.23 that there were links to the resource group and subscription next to the + Add button. In this context, on the VM blade, you can only add a lock on the given resource, i.e., the Azure VM. However, clicking those other links will show you a list of locks that exist in that resource group and subscription. If you want to apply a lock on a resource group or subscription, you need to navigate to that resource's blade and click the Locks link there. As you would expect, locks placed on the parent are applied to the child, in that a lock on a resource group applies to all resources within it. Additionally, if there are locks on both the parent and the child, the most restrictive lock is the one that takes effect. For example, the drop-down list in Figure 4.23 included not only Delete but also Read-Only. If there is a read-only lock placed on a resource group and a delete lock on an Azure VM within the resource group, then the read-only lock is the one that is respected for that VM, as it is the more restrictive of the two; a read-only lock prevents modification as well as deletion. Refer to Chapter 2 where we discussed scopes if you need a hierarchy refresher in regard to management groups, subscriptions, resource groups, and resources, as this concept applies to the scope model discussed here too.

From a read-only perspective, the lock applies to the resource's configuration, not to operations performed inside the resource. For example, if a read-only lock is placed on a VM, then its size, disks, configuration, and auto-shutdown schedule cannot be changed. However, someone who connects via RDP or SSH can still change or remove content on the VM itself. Assuming a SQL Server instance is running on the VM, the data in its databases remains changeable; the read-only setting applies only to changes made to the VM resource via the portal or another supported client.

The next maintenance-related activity has to do with resizing. Any financially minded person wants to spend the exact amount required to get the job done—nothing more, nothing less. This holds true when choosing an Azure VM because the cost is directly related to the size, i.e., how much compute you get. Starting off small and then growing is an option because resizing is not so complicated. When you create an Azure VM, a default size is selected (for example, D2SV3), but there is a link under it that allows you to change the size. If you decide to keep that size and later determine you need more compute power, there is a link in the navigation bar for the Azure VM named size. Clicking that link opens the Size blade and will show you the existing size and a list of other options, as shown in Figure 4.24.


FIGURE 4.24 Listing different Azure VM sizes

Simply select the desired size and click the Resize button, and the Azure VM will be scaled to that size. Note that changing the size will result in a restart of the VM and therefore the application, which should be expected since the compute power associated with the VM is undergoing a significant alteration. It is also possible to make the same change using PowerShell. Execute the following cmdlets and view the output in Figure 4.25:

Get-AzVMSize -ResourceGroupName "<RG-NAME>" -VMName "<VM-NAME>"
$vm = Get-AzVM -ResourceGroupName "<RG-NAME>" -Name "<VM-NAME>"
$vm.HardwareProfile.VmSize = "Standard_DS3_v2"
Update-AzVM -VM $vm -ResourceGroupName "<RG-NAME>"

The first PowerShell cmdlet lists all the possible VM sizes available for the Azure VM in that region. This is helpful for finding out not only the options you have but also the nomenclature (its name) you need later once you decide on the size. The options in the Azure Portal are a bit more restrictive; you will get an unfiltered list using PowerShell. The next lines of the PowerShell script get the VM into a PowerShell object, set the VmSize , and update it. Wait some time, and then the workload you had running on that VM will be scaled up to the newly allocated compute series. Just as an extra test, update the resource lock you created in EXERCISE 4.7 to read-only and try the same update process discussed just now. It will fail, because the configuration options for the VM would then be read-only and cannot be changed.


FIGURE 4.25 Resizing an Azure VM using PowerShell

Let's shift gears a little bit and consider updates. One of the major responsibilities you have when choosing IaaS is the management of the operating system. This means you need to schedule and manage security patches, bug fixes, and hotfixes (from a Windows perspective, KB patches). There is a product called Update Management that will help you with this. Update Management is used in collaboration with Log Analytics, which is discussed in Chapter 9. Once you configure Update Management, it will perform an analysis of the targeted VM and provide details about any missing updates. This works on Windows as well as CentOS, Red Hat, SUSE, and Ubuntu Linux VMs. If any updates are found to be missing, there is an additional feature within Update Management that allows you to schedule an update deployment. The update can be scheduled to run once or be recurring, and you can configure whether or not to reboot after the update is applied. That reboot option is an important one. Early in my IT career, there was high risk involved in installing operating system patches. On numerous occasions the patch simply wouldn't install and killed the server, or the patch installed, the server went down for a reboot, and it never came back up. In both scenarios, the only option we had was to rebuild the entire server, which was the easy part; installing, configuring, and testing the freshly built application was the hard part. I am so thankful for images, backups, and deployment slots that can now save me many hours, late nights, and weekends. I share this experience simply to point out that selecting the reboot option and automating OS patch installation need some thought about rollback and troubleshooting scenarios. There is another feature for Azure VMs called boot diagnostics that can help if, after an update is installed, the VM doesn't come back up, or if for any reason the VM hangs after a reboot.

To keep your head from spinning too much, an overview of what is going on in your subscription and resource group can help give some clarity. Those monitoring topics are covered more in Chapter 6, where we cover compliance topics, and in Chapter 9, when we cover monitoring. However, the Inventory link on the Azure VM navigation menu lets you enable change tracking on the VM. Once it's configured, you can get an overview of which software, files, Windows registry keys, and Windows services have been added, changed, or deleted. For Linux VMs, the software, files, and Linux daemons are monitored for changes. I am confident you will agree that knowing what is happening on your machines is helpful for their maintenance and support. If something stopped working, you could look on the Inventory blade and check whether something has changed without needing to RDP or SSH to the VM and look manually.

Finally, as you may recall from the numerous times you have created an Azure VM, on the Management tab there is a feature named Boot Diagnostics. It is enabled by default and stores some helpful pieces of information. One helpful piece is a screenshot of the console, which on Windows may show a BSOD and on Linux may show the text you would normally see on a monitor when running on-premise and directly connected to the machine. There is also a Serial log, which provides potentially helpful entries with errors and exceptions that may lead to a root cause and a fix. Another useful tool is the Serial console, which provides a COM1 serial connection to the VM. I mentioned tty0 earlier; the Serial console is where I was working when my deployment of an unblessed Azure VM image failed. I was using the boot diagnostics and the Serial console (which I couldn't connect to) while trying to get the Azure VM to work. Both of those features are useful for maintenance and troubleshooting efforts.

Costs and Sizes

Choosing the right compute size is important because the cost of an Azure VM is static, unlike when running a consumption-mode Azure Function, for example. You do not want too much, nor do you want too little, but hitting the precise size from the beginning can be challenging. If you decide to start small and then increase or decrease as you go along, that is a good plan. The Azure VM resizing capabilities shown in Figure 4.25 can help you proceed with that approach. Starting and stopping the VM when it is not being used is also a means to reduce costs. There are numerous ways to stop and start a VM; you already know that you can use the PowerShell cmdlets Start-AzVM and Stop-AzVM to achieve that. However, in that context, you need to watch out for a few things. As shown in Table 4.8, there are numerous Azure VM power states that you need to understand.

TABLE 4.8 Virtual Machine Power States

Power state Detail
Deallocating The VM is in the process of releasing allocated compute resource.
Deallocated Allocated compute resources are no longer consumed.
Stopping The VM is in the process of being shut down.
Stopped The VM is shut down; compute resources remain allocated.
Starting The VM is in the process of being activated.
Running The VM has been started and is ready for consumption.

The important point in those VM power states is that if you simply stop the VM, you will continue to incur a charge because the compute resources remain allocated to the VM. You need to make sure that you deallocate the VM if it is no longer needed. Deallocation is achieved when you use the Stop-AzVM PowerShell cmdlet or click the Stop button on the Overview blade in the portal. If you want to use the Stop-AzVM PowerShell cmdlet but not deallocate the virtual machine, pass the -StayProvisioned parameter along with the command. By contrast, if you have an RDP connection to a Windows VM and click the shutdown button, the VM is not deallocated; it is simply stopped. An important note is that when a VM is deallocated, the Public IP address is placed back into the pool, and when you start it back up again, it is likely that it has a different public IP address. That has an impact if your solution has any dependencies on that IP address. You will notice that when you click the stop button in the portal, you get warned of the loss of the IP address and are asked if you want to keep it. If you choose to keep it, the IP address is changed to a static IP address, and there is a cost associated with that. The default public IP address setting is dynamic, which is why it can change when the VM is deallocated.
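To see the difference between Stopped and Deallocated for yourself, the following sketch stops a VM both ways and then queries its power state; the -Status switch on Get-AzVM returns the instance view that contains the PowerState entries.

# Stop but keep compute allocated (the VM is Stopped and still incurs charges)
Stop-AzVM -ResourceGroupName "<RG-NAME>" -Name "<VM-NAME>" -StayProvisioned -Force

# Stop and deallocate (compute charges stop; a dynamic public IP is released)
Stop-AzVM -ResourceGroupName "<RG-NAME>" -Name "<VM-NAME>" -Force

# Check the current power state
(Get-AzVM -ResourceGroupName "<RG-NAME>" -Name "<VM-NAME>" -Status).Statuses |
    Where-Object { $_.Code -like "PowerState/*" } |
    Format-Table Code, DisplayStatus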

You may have also noticed during the creation of an Azure VM in the portal that on the Management tab there is an option called Auto-Shutdown. This is configurable during and after the creation of the VM. This feature allows you to deallocate the VM at a given time each day so you do not have to worry about forgetting to stop it all the time. Additionally, you can provide an email address to be notified when the auto-shutdown feature is executed; it includes the name of the VM it ran on as well. Up to now we have focused on Windows VMs; in Exercise 4.8, you will create a Linux VM and execute some Azure CLI commands using https://shell.azure.com.

It is possible to install Azure CLI onto your workstation and use it remotely, the same as with PowerShell, but I wanted to show you the Azure Cloud Shell feature. One advantage, and something I like about it, is that I do not have to constantly execute Connect-AzAccount to get logged in; I am prompted for my credentials when I first access the site. Another thing you may have noticed is that az vm stop doesn't deallocate the VM like Stop-AzVM does; you must instead use az vm deallocate. That is an important point to be aware of, and it is mentioned in the output when executing az vm stop. However, had you run this as an automated script, it could easily have been missed.

When all is said and done and you no longer need the allocated Azure compute power, you can simply remove the resource group, and all of its contents are removed. This is a good reason to keep all your resources grouped together in a resource group when you are testing or developing. Keep all the resources used for a given project or application together so you know what the provisioned resources belong to. There is a concept called tags that we will discuss in Chapter 6 that provides a similar capability, but for now, put every related Azure resource into a resource group, and when you're done, run Remove-AzResourceGroup -Name <Name> -Force in PowerShell or az group delete --name <Name> --no-wait --yes using Azure CLI.

Finally, when you have completed the project and the resources can be deleted, you can check the bill. The bill includes the ability to download a CSV file that contains a daily breakdown of resource usage and the associated charge. This is useful not only in hindsight but also to check and see whether there are any resources consuming Azure compute without any real purpose. You could then remove them. To get that report, navigate to the Subscription blade and select the desired subscription. In the Billing section, select Invoices, which renders the option to download the Usage + Charges report, similar to that shown in Figure 4.27.


FIGURE 4.27 Download Azure consumption charges from Azure Portal

It is important to keep an eye on costs, not only from an Azure VM perspective but from a subscription-wide perspective. I sit next to some Microsoft employees who work on the Subscription and Billing team and hear how customers get a shock with the bill and try to reason their way out of it because they didn't expect it to cost that much. I don't get to hear or see the result, but I don't think the charge gets forgiven 100% of the time. So, be careful and keep a close eye on this.

Managed Disk Storage

A managed disk is a virtual hard disk (VHD), and whether you know it or not, you have already created many of them. Each time you created an Azure VM, you used the Disks tab where you could select the OS disk type from a drop-down list, add one or more data disks, and expand an Advanced section. Azure managed disks are not physical disks that use, for example, Small Computer System Interface (SCSI) or Integrated Drive Electronics (IDE) standards to connect a hard drive to a physical server. Instead, a managed disk is backed by a page blob in an Azure Storage account. If you by chance do not know what a blob is, consider it to be a file (for example, DOCX, PHP, VHD, or PNG) that has content like text, source code, or an image contained within it. Instead of saving the content of the file into a database, for example, you would store the entire file as a blob. Data and storage concepts are covered in Chapter 5, so we won't go too deep into the topic now. Instead, just know that the hard disk really is a VHD, and the internals of how that works can be left to the Azure platform and accepted as part of the service the platform provides.

The alternative to a managed disk is called an ephemeral disk, which is found on the Disks tab during the creation of the VM; expand the Advanced section to see the option to create one. As you can see in Figure 4.28, an ephemeral disk is created on the local storage of the host running the virtual machine (VM) instead of on the abstracted page blob. There are specific use cases for choosing this kind of drive; for one, ephemeral disks are free, unlike managed disks. They also perform better if your workload needs to be reset or reimaged, which makes sense because the disk is local to the VM's host. Also, if your workload is stateless, you might consider an ephemeral disk; a stateless VM can come up, go down, and be reimaged often with no impact on the application or your customers, and in that case the ephemeral drive type might be an option. Most use cases, however, fit best on managed disks.


FIGURE 4.28 An advanced option to choose ephemeral drives instead of managed

Managed disks have a 99.999% availability threshold, and you can currently create up to 50,000 disks per subscription per region. Five nines is achieved by replicating the data on your disk across three different instances; all three would need to fail before there would be an outage, which is unlikely. Managed disks are deeply integrated with VMSS, which is discussed later, and they provide protection against the impact of a failure of a segment within a data center (you'll learn more about that later in the chapter). Managed disks support Availability Zones, which were covered in the previous chapter, and they can be encrypted. Remember encryption at rest from Chapter 2? Table 4.9 provides more details about managed disk types.

TABLE 4.9 Managed Disk Types

Offering Premium SSD Standard SSD Standard HDD
Disk Type SSD SSD HDD
Maximum Size 32,767GB 32,767GB 32,767GB
Maximum IOPS 20,000 6,000 2,000
Max Throughput 900 MB/s 750 MB/s 500 MB/s
Usage High-performance production workloads Web servers, dev and test Backup, noncritical

The managed disk types in the table should be familiar to you; they were the options in the drop-down list for the OS disk type on the Disks tab when creating an Azure VM in the portal. As a side note, notice that the maximum size of the disks is 32,767GB, which is one less than 2^15. There is also an Ultra Disk type that is currently limited to a few regions, and its maximum size is 65,536GB, which is 2^16, a number I called out specifically in the previous chapter. When I see things like that, it makes me feel like there was some real thought behind them. Let's get back to the topic at hand; up to now we have only added an OS disk to the VM, which resulted in a disk configuration like that shown in Figure 4.29. This is displayed after connecting via RDP to the VM and running diskmgmt.msc.


FIGURE 4.29 The default disk configuration on an Azure VM

In Exercise 4.9 you will add a data disk to an Azure VM. If you are not clear on the different types of disks available to a VM, refer again to Figure 4.20.
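Exercise 4.9 adds the data disk through the portal; for reference, a rough PowerShell equivalent looks like the following. The size, SKU, and LUN number are only examples, and the disk still needs to be initialized and formatted inside the operating system afterward.

# Create an empty 128GB Standard SSD managed disk
$diskConfig = New-AzDiskConfig -Location "<LOCATION>" -SkuName StandardSSD_LRS `
    -CreateOption Empty -DiskSizeGB 128
$dataDisk = New-AzDisk -ResourceGroupName "<RG-NAME>" -DiskName "<DISK-NAME>" -Disk $diskConfig

# Attach the disk to the VM at LUN 1 and apply the change
$vm = Get-AzVM -ResourceGroupName "<RG-NAME>" -Name "<VM-NAME>"
$vm = Add-AzVMDataDisk -VM $vm -Name "<DISK-NAME>" -CreateOption Attach `
    -ManagedDiskId $dataDisk.Id -Lun 1
Update-AzVM -ResourceGroupName "<RG-NAME>" -VM $vm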

During the creation of the data disk, you may have noticed a drop-down next to the Storage Type option. The drop-down contained three options; we chose None as it was the default, meaning we simply want a blank disk. There was also an option to build the disk from a Storage blob, which was mentioned earlier. This means you can make a VHD of any VM, store it in a blob container, and then reference it as the source for a managed disk. That is cool, and you'll learn more about it in the next chapter. The final option was Snapshot.

Think about what a snapshot is in the real world. When you take a picture of something, the state of the subject of the picture is frozen in time and won't change. Similar to a custom VM image, which you created in Exercise 4.6, a managed disk snapshot is a copy of the contents of the disk at a point in time that can be used as a backup or as a means to build another identical VM for troubleshooting. If you navigate to one of the managed disks you created, you will notice the + Create Snapshot link at the top of the Overview blade for the selected disk. After you create a snapshot, if you do Exercise 4.9 again and select Snapshot instead of None as the storage type, the one you created is listed in the drop-down.
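The + Create Snapshot button has a scripted equivalent as well; a minimal sketch, with the disk and snapshot names as placeholders:

# Create a snapshot of an existing managed disk
$disk       = Get-AzDisk -ResourceGroupName "<RG-NAME>" -DiskName "<DISK-NAME>"
$snapConfig = New-AzSnapshotConfig -SourceUri $disk.Id -Location $disk.Location -CreateOption Copy
New-AzSnapshot -ResourceGroupName "<RG-NAME>" -SnapshotName "<SNAPSHOT-NAME>" -Snapshot $snapConfig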

Backup, Redeploy, and Reset Password

There is an entire chapter, Chapter 9, that covers monitoring and recovery, so only some basic concepts will be discussed here. If you recall from Chapter 1 and Chapter 2, we have already discussed Azure Site Recovery. We will again touch on it with a comparison with the Azure VM backup feature, but the real, in-depth coverage is in Chapter 9. The specific topics covered in this section are provided in the following list:

  • Backup and redeploy
  • Azure Backup versus Azure Site Recovery
  • Resetting the password

The whole point of backing up data and software is to protect your intellectual property and your business. I cannot quote any specific incident, but imagine you have a program with 100,000 lines of code that cost the equivalent of 2 million hours of labor. Imagine, too, that the data it accesses and analyzes is on the same server and is a set of values captured over the course of the last 10 years. Imagine there is neither a backup of the source code nor the data. That's a scary thought. What happens if the server crashes and you cannot get any of that data or code back? If that happens, the business disappears. This is an avoidable situation if you simply back up. A less impactful scenario, discussed in the previous section, is what can happen after an update to the operating system that requires a reboot. When you configure Update Management to install updates, you might consider scheduling a backup some hours or a day before the scheduled patching process. There is a Backup link in the navigation menu on the Azure VM blade that allows you to schedule a backup based on a configured policy. The policy can be based on daily or weekly frequencies. Once configured, there is a link at the top of the Backup blade that allows you to back up manually. In the background, the backup creates and stores a snapshot that can be used later to provision a new Azure VM.

The Redeploy navigation menu item is a bit different than you might initially think. This can be easily misunderstood to mean that it is the feature used to rebuild a VM with a snapshot or a backup. However, when you click the menu item, it becomes quickly apparent that instead of recovery, the feature moves the application to a new host. That concept is interesting because it is only possible because the managed disks are not physically attached to the VM. It is therefore possible to attach those disks to another VM and boot it up, almost in real time, which is amazing. That is what happens when you redeploy and this model is used with VMSS, PaaS, and FaaS. Take note that with ephemeral disks, this kind of redeploy action is not possible because the disks are attached, but as mentioned, the reimaging is faster. You just need to determine which is best for your given use case.

The scenario of losing your business because you don't back up or haven't backed up should give an Azure Solutions Architect Expert goosebumps. If I were to reflect on the model being proposed in this book (security, network, compute), I think the next consideration would be a business continuity and disaster recovery (BCDR) plan, mentioned in Chapter 1. That leads to the comparison between Azure Backup and Azure Site Recovery. From a backup (Azure Backup) perspective, your context is singular and granular; when you back up a VM and later recover from that backup, you are thinking about files, machine state, or specific folders. You can see that in practice when you click the Backup navigation item for a VM versus when you click the Disaster Recovery navigation menu item. The Disaster Recovery menu item is the entry point into Azure Site Recovery. Azure Site Recovery is, in old terms, disaster recovery (DR). DR means something big happened, everything on the machine is permanently gone, and likely everything in the data center will be down for many hours. A BCDR or contingency plan is built upon Azure Site Recovery; jump to Chapter 9 if you urgently need to learn more about that.

Migrating Azure Virtual Machines

Migration is covered in detail in Chapter 8, but it is worth a mention here in the VM context. VMs are the most common cloud service model that is "migrated" to Azure. That's either because, as I mentioned, Azure VM was the first compute offering from Azure, or because it is an entry point for enterprises to deploy their existing applications that require virtual networks and have multiple tiers of compute requirements. Refer to Figure 4.19 if you don't know what I mean by multiple tiers. In this section we will touch briefly on the following migration topics:

  • Migrate from on-premise
  • Change regions, zones, subscriptions, or resource groups

Migrate from On-Premise

First, it would be prudent to know what is meant by migration. Say you have a website running on a virtual machine constructed using Hyper-V, Microsoft's virtualization offering, which provides capabilities similar to VMware. The website could also be running on a physical server where no virtualization software is in place. Migration means that you want to move the workload from one place, usually an on-premise data center, to an Azure VM, and you would want an automated process to achieve that, one that avoids the manual rebuild and reconfiguration of the server and application, a simple cut-and-paste scenario if you will. There are some advanced capabilities that help streamline a migration, but unfortunately they are not as simple as cut and paste. It is, however, as simple as preparing the on-premise machine by generating the proper configuration and data files, packaging them, and deploying them onto a disk, which can then be used for the Azure VM provisioning. That process is presented in Chapter 8. For now, just know that some of the tools are Azure Site Recovery and Azure Migrate; more manual tools include AzCopy, Disk2VHD, SYSPREP, Microsoft Virtual Machine Converter (MVMC), Hyper-V, and VMware.

Azure Site Recovery and Azure Migrate are the tools you will use if you have a large enterprise solution with multiple tiers and many machines. That scenario would be one where the number of servers being migrated is too great to even consider a manual provision. The other mentioned tools are for smaller migration projects, where much of the work can be accomplished by performing manual tasks and is performed by a single person or a small team.

Change Resource Groups, Subscriptions, Regions, or Zones

What happens if you realize you have placed your VM workloads into the wrong resource group, subscription, or region? If you realize it early in the deployment cycle, a simple delete and re-create won't have any impact, and that is the simplest and cleanest way to move the workloads. However, if you realize this after significant configuration has been performed on a VM, or there are now other resources with a dependency on a given VM, re-creating that VM isn't really an option because the impact would be too large. Moving the VM into another resource group is actually easy. As you recall, a resource group is only a logical grouping; there is nothing physical about it. In the Azure Portal, on the Overview blade, the resource group to which the VM is associated is displayed. Directly after the words resource group there is a link named Change. Simply click that and change it. The same goes for the subscription, as shown in Figure 4.31.

Snapshot of Azure Portal view of changing resource group or subscription.

FIGURE 4.31 Azure Portal view of changing resource group or subscription

There are three points to call out here, as shown in Figure 4.31. First, although the resource group did change, the VM's physical location did not. The VM was in CSHARPGUITAR-SN1-RG (South Central US), and I changed it to CSHARPGUITAR-DB3-RG (North Europe), but the physical location of the VM did not change. It does kind of make sense in that there may be IT solutions that are global, and I'd want workloads in different regions in the same resource group because they play a part in the overall solution. However, the name of the resource group no longer makes sense. The second point is that only the VM was moved, and there are more components to a VM than just the host. For example, if you enabled boot diagnostics, there would be a storage account for the VM, and there is a network interface, possibly an NSG, a static public IP address, and the disks. You would want to move all the pieces and not leave them separated; you would lose oversight of them quickly. I hope you are realizing more and more how important it is to organize your resources and that there are some good features in Azure to achieve that. Lastly, the name of the VM cannot already exist in the resource group to which it is being moved. The unique key in a resource group is name + resource type, which means you can name everything the same as long as the type of resource is different. An Azure SQL instance, a Cosmos DB account, a storage account, and an Azure VM can all be named CSHARPGUITAR since they are different types of resources.
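If you would rather script such a move than click the Change link, the Azure CLI supports it as well. Here is a minimal sketch with placeholder names; it moves every resource in a resource group, which addresses the point about not leaving the related pieces behind:

az resource list --resource-group <source-resource-group> --query "[].id" --output tsv

az resource move --destination-group <target-resource-group> --ids <resource-id-1> <resource-id-2>

The first command lists the IDs of everything in the source resource group; feed those IDs to the second command, and add --destination-subscription-id if you are also changing subscriptions.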

Now we come to the more difficult scenario where you actually want to physically move the workload to another region or zone. There are two points to mention in this chapter about moving like this: you cannot move from any region to any region, and Azure Site Recovery is the recommended tool for achieving that kind of move. In the previous chapter, we discussed the Azure network, and the concept of a geography was introduced. Well, the geography is the boundary within which you can physically move an Azure VM. One of the reasons for that limit has to do with the sovereignty of the data or intellectual property running on your VMs. It is in a way a protective barrier so you cannot by mistake move restricted data into a location where there is a risk of its presence being prohibited. You know from the previous chapter that the regions in a geography are usually in the same area of the world. For example, West Europe and North Europe are in the same geography, and there may be less chance of breaching the General Data Protection Regulation (GDPR) when moving between those two regions than between any region not in Europe. Moving between South Central US, West Central US, and North Central US is also supported, for example. You'll learn more about that and more on Azure Site Recovery in Chapter 8.

We also discussed what an Availability Zone was in the previous chapter. Simply, it means Microsoft has your back, and all of your compute is not running in a single data center. For a given region, there are multiple data centers, either across the street or across town, and your data and workload are replicated into them, making sure that your solution remains available even if an entire data center experiences a catastrophic event. If you recall from EXERCISE 4.5, on the Basic tab of the Azure VM creation process there was a drop-down list named Availability Options. In the exercise so far, the value was left at the default because infrastructure redundancy has an associated cost, and it wasn't ready to be discussed then. However, if you return and view the contents of the drop-down, you will find the Availability Zone and Availability Set options. Both of them are touched on in the next section; however, if after creating a VM with no infrastructure redundancy you decide you need it, you can move the VM into a zone using Azure Site Recovery.

Availability Sets and VM Scale Sets

Availability is a big deal. If you were to deploy a workload to the Azure platform and then the application experiences more unavailability than it had on-premise, some serious questions would be asked. In Chapter 2, we discussed what an SLA is, and it will be touched on again in Chapter 9, but from an Azure VM perspective we'll touch on it a bit here because it has something to do with the topic at hand. Basically, if you deploy only a single VM, the SLA commitment for its availability is 99.9%, which translates to approximately 45 minutes of allowed downtime per month. Is that good enough? It would depend on the criticality of the code running on the VM. If you need more, then you have to take some actions to add redundancies across availability sets and Availability Zones.

As you know, when you choose IaaS, the cloud provider is responsible for managing the hardware and infrastructure. Those components do sometimes experience transient outages or require some maintenance. If you have only a single instance of your VM, then there will be downtime if the hardware or infrastructure updates a component that requires a reboot or inhibits traffic from arriving at or departing from your VM, for example, when a hardware driver is updated or a memory module fails. If you create multiple instances of your VM, Microsoft has implemented two concepts, called fault domains and update domains, to help manage outages during transient failures or maintenance activities. When you place your VMs into those domains, the SLA increases to 99.95%, which allows about 20 minutes of downtime per month, less than half of what a single VM allows. There is a cost for the additional instance, but no cost for the domain feature. If you then go so far as to place those domain-protected instances into multiple zones, the SLA increases to 99.99%, which is a very low amount of downtime at roughly 4.5 minutes per month. Again, it costs more, but the cost may be justified based on the workloads running on the VMs. You may be asking yourself, have we spoken about these domains? The answer is not really, but by the end of this chapter, you will create some and add some VMs to them. You should certainly know what an Availability Zone is because it was discussed in the previous chapter. You will also add VMs to an Availability Zone later.

Availability Sets vs. Scale Sets

In Chapter 3, I stated that more would be covered regarding availability sets. That will happen in the coming paragraphs. In addition to that, it will be good to include scale sets in this discussion now. They kind of read the same and sound the same if spoken; some might even think they are the same, but they are not. An availability set is suited to a DR or primary/secondary architectural configuration, while a virtual machine scale set (VMSS) is a group of identically configured VMs. Both do have something in common, in that they benefit from fault domains and update domains. See Figure 4.32 for an illustration.

Schematic illustration of the fault domains and update domains.

FIGURE 4.32 Fault domains and update domains

A fault domain is a grouping of VMs that share the same network switch and power source. An update domain is a group of VMs that can be rebooted at the same time after a maintenance activity has been performed. That means it would be prudent to place VMs that shouldn't be rebooted at the same time into different update domains. The platform is intelligent enough to place primary and secondary VMs into different update domains on a best-effort basis. Both availability sets and VMSS provide fault domains; you get up to three for an availability set and a default of five for a VMSS. Update domains, five by default, are then spread across those fault domains. Finally, note that neither fault nor update domains will prevent downtime caused by application or operating system exceptions. Let's now get into a little more hands-on detail for both Azure products.

Availability Set

As mentioned just previously, an availability set is most useful for disaster recovery environments or primary/secondary environments. From a DR or failover perspective, consider that you have a web farm, an internet/intranet application running on many different servers behind some kind of load-balancing device. The reason for having many servers running the application is redundancy. If you have only one server and it goes down, then your business is down. Therefore, you have numerous servers, each acting as a DR or failover instance of the others. The issue comes when there is a power outage or transient network issue. Fault domains prevent all of your servers from depending on the same power supply and network switch. When you create your VMs, you select the availability set to place them into, and the platform decides which fault domain each lands in.

Each VM in an availability set is uniquely named, and each can be configured and added to an application gateway or load balancer. Additionally, if you are running a managed SQL Server instance, you would want to have a secondary copy of the database in case of outage, again with each copy isolated from the other by fault and update domains. Finally, if you were performing updates to your VMs, installing patches, or doing an OS upgrade, you wouldn't do it on all the machines and then reboot them all at the same time. Installing updates while maintaining availability and resiliency throughout the process is tedious; it is no small undertaking. But in the simplest context, you would install a patch on one and test to make sure all is well before moving on to the others. From a database perspective, you may want to fail over to the standby, making it the primary after the upgrade, and then upgrade the original primary, which then becomes the secondary. The update domain concept is intended to help maintain availability during Azure hardware and infrastructure updates, not operating system or application upgrades. When you create an availability set in Exercise 4.10, you have the option to choose the number of fault domains, from 1 to 3, and the number of update domains, from 1 to 20. Figure 4.33 simulates the organization of five VMs into five update domains in three fault domains.
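If you would like to see those fault and update domain options outside of the portal, an availability set can also be created with the Azure CLI. This is a minimal sketch with placeholder names; the domain counts mirror the ranges just described:

az vm availability-set create --resource-group <resource-group-name> --name <availability-set-name> \
    --platform-fault-domain-count 3 --platform-update-domain-count 5

az vm create --resource-group <resource-group-name> --name <vm-name> --image Win2019Datacenter \
    --availability-set <availability-set-name> --admin-username <username> --admin-password <password>

Remember that the second command only works at creation time; as noted, an existing VM cannot be moved into an availability set.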

Schematic illustration of an availability set with three fault domains and five update domains.

FIGURE 4.33 An availability set with three fault domains and five update domains

Complete EXERCISE 4.10 to create an availability set and then add some VMs to it. Note that you can only add a VM to an availability set during VM creation. You cannot move a VM into an availability set after creation; it must be deleted and re-created. However, with what you know about images, disks, and backups, that shouldn't be a significant issue. Just keep that in mind.

In Figure 4.34, notice that of the two created VMs placed into the availability set, one is in fault domain 0, update domain 0, while the other is in fault domain 1, update domain 1. This means that I can have, for example, either a primary and secondary database or two web servers running the same application behind a load balancer on those machines while they are highly isolated from each other. Additionally, recall from step 3 in EXERCISE 4.10 that the drop-down allowed the choice between either availability sets or Availability Zones. When you choose Availability Zone, you will get the availability set replicated into each of the zones, which is like a redundancy within a redundancy. You get the realized redundancies from the domains within a single data center replicated to other data centers (zones) within the region. That is great! Lastly, 2,000 availability sets are allowed per region, per subscription, with a maximum of 200 VMs per availability set. That is a large number, but maybe your business will take off due to its high performance and availability, and you need that many. Let's hope so.

Virtual Machine Scale Sets

As mentioned already, scale sets are identical instances of a VM that can scale out based on traffic and load. VMSS provides redundancy capabilities but for a different use case than availability sets. Instead of the primary/secondary or DR scenario, one VMSS benefit is to automatically scale out more instances of the VM based on load. As illustrated in Figure 4.35, the scaling minimum and maximum number of VM instances can be set from 0 to 1000; the default setting is a minimum of 1 and a maximum of 10. You will see that image in practice when you complete Exercise 4.11. From an autoscaling perspective, the default setting is to scale out when the average CPU percentage for all instances is greater than 75%. When that happens, the platform will create another instance of your VM up to a maximum of ten. If later the CPU consumption on the running VMs goes below an average of 25%, then the number of instances will be reduced by one until there is only one left. This scaling rule can be changed after creating the VMSS.
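Those default thresholds can also be expressed as autoscale rules from the command line. The following is a minimal sketch, with placeholder names, that recreates the scale-out-above-75% and scale-in-below-25% behavior for an existing VMSS; treat it as an illustration of the concept rather than a definitive setup:

az monitor autoscale create --resource-group <resource-group-name> --resource <vmss-name> \
    --resource-type Microsoft.Compute/virtualMachineScaleSets --name cpu-autoscale \
    --min-count 1 --max-count 10 --count 1

az monitor autoscale rule create --resource-group <resource-group-name> --autoscale-name cpu-autoscale \
    --condition "Percentage CPU > 75 avg 5m" --scale out 1

az monitor autoscale rule create --resource-group <resource-group-name> --autoscale-name cpu-autoscale \
    --condition "Percentage CPU < 25 avg 5m" --scale in 1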

Snapshot of the autoscaling rules for a virtual machine scale set.

FIGURE 4.35 Autoscaling rules for a virtual machine scale set

To gain more knowledge about VMSS, create one in EXERCISE 4.11. Before proceeding, however, you will need either an application gateway (Exercise 3.10) or a load balancer, both of which were discussed in the previous chapter. EXERCISE 4.11 utilizes an application gateway because we walked through the creation of one already.

In EXERCISE 4.11, we did not create public IP addresses for the instances that will live in the VMSS pool. The radio button for a public IP address per instance was left at the default of Off, since all the VMs will reside behind an application gateway and will not be directly accessible from the internet. Next, if you click the Instances navigation menu item, it will list the three VM instances created. Click one of the instances, and on the Overview blade you will see its location, which also includes the zone in which it resides. Since, per the instructions, you selected all three zones, each VM is placed into a different zone, which gives maximum redundancy. Note also that, behind the scenes, fault domains and update domains are still at play, providing an even greater level of resiliency as the number of VMs increases.

On the Scaling menu item, you can modify the autoscale configuration created originally or manually scale to a static number of instances. Those decisions, like many others, are dependent on the requirements of the application. The Storage menu item is an interesting one and one of the nice benefits of a VMSS. Remember that you chose a custom image when you created the VMSS. This means that each of the three VMs that got created used the same image. The content and configuration of all three VMs are therefore identical, making the deployment of new instances of the image into the VMSS quick and easy with no chance of an error occurring due to a wrongly configured VM. Before this kind of service existed, servers had to be built manually, and there were often scenarios where one of the servers in the pool didn't act like the others. It was hard to find which one it was and sometimes not possible to find out why it was misbehaving, so we just rebuilt it from scratch and hoped we got it right the next time. That manual scenario would not be feasible in an environment approaching the maximum number of VMs per VMSS of 1,000 instances, 600 of which can be based on the same custom image, per region, per subscription. At that scale, all instances must be identical, which is what is achieved with VMSS using a single base image for all VM instances within it. Finally, if you decide you need a larger VM size, click the Size menu option and scale up to a larger size, no problem.
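Manual scaling of a VMSS can also be performed from the Azure CLI, which is handy when you want to pin the instance count instead of relying on the autoscale rules. A minimal sketch with placeholder names:

az vmss scale --resource-group <resource-group-name> --name <vmss-name> --new-capacity 5

az vmss list-instances --resource-group <resource-group-name> --name <vmss-name> --output table

The first command sets the instance count to five, and the second lists the instances so you can confirm the new capacity.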

Managed Disks

You might be wondering about the dependencies an Azure VM host has on the disks that are attached to it, such as the OS disk, data disk, and temp disk, as you might recall from Figure 4.20. As inferred earlier, the contents of a managed disk that is attached to an Azure VM are stored as a page blob in an Azure Storage container. Storage is covered in Chapter 5; however, the concepts relating to storage such as LRS, ZRS, and GRS have been touched upon already and are relevant here. To see why they are relevant, execute this Azure CLI command from https://shell.azure.com/:

az disk show --resource-group <resource-group-name> --name <disk-name>

Notice the output for "sku": { "name": "Premium_LRS" } and make a quick assessment of what that means from a redundancy perspective. LRS means the disk, although copied three times, is locally redundant and has domain redundancies within the data center, but not outside of it. The same goes for a VMSS. To confirm that, navigate to the Instances menu item on the VMSS blade, where you can find the name and the numeric InstanceId of each VM instance. To view the details of the disk attached to a VMSS instance, run this PowerShell cmdlet. Be sure to replace the <name> placeholders with your specific/unique values.

(Get-AzVmssVm -ResourceGroupName <name> -VMScaleSetName <name> `
    -InstanceId <#>).StorageProfile.OsDisk.ManagedDisk.StorageAccountType

That cmdlet dumps out the account type into which the disk is stored, in this case Standard_LRS, which means locally redundant storage (LRS). The important point here is simply that you are aware of this and understand that the dependencies your solution has also impact its availability. Considering that an availability set is local but redundant across the fault and update domains, you can instead choose to deploy the VM into Availability Zones and get ZRS resilience in addition to the domain features. Lastly, the Get-AzVmssVm PowerShell cmdlet that ran previously was for a single managed disk on a single VM. Remember that in EXERCISE 4.11 you created VM instances in three zones; therefore, if you ran that command against all three managed disks, each would still show Standard_LRS, but each one would be in a different zone.

Securing Azure Virtual Machines

Chapter 2 covered security but did not focus specifically on the context of an Azure virtual machine. Let's discuss that a bit more now. There are three aspects of security that need some extra discussion:

  • Who can access the VM via the portal or client?
  • Who can access the VM using SSH or RDP?
  • Who can access the application running on the VM?

The Azure Portal, Azure PowerShell, Azure CLI, and REST APIs all allow great administrative capabilities. You already know that the way to restrict who can do what is via RBAC and the Access control (IAM) menu item on the VM blade in the portal. Determining who can do what to which resource at which level in the resource hierarchy requires thought, design, and maintenance. I mean, at the click of a button, a VM can be deleted. That may not be so bad, knowing what you do now about images and how managed disks work. The VM can relatively quickly be recovered, but what if the disks or images get deleted…by accident? You need to prevent that, and you should already know how to do that.

When you create the VM, part of the process is to create an administrator username and password. You need to protect them and make a conscious decision as to who gets the credentials. Instead of sharing the admin credentials, create specific accounts with a minimal set of permissions using a just-in-time access policy, and monitor the connection activity using the Connection Monitor menu link on the VM blade in the portal. Microsoft doesn't back up the content of your disks on your behalf, but you can, and it's recommended. Microsoft simply lets you do with the provisioned resource as you wish while keeping it highly available. Once someone gets onto the server with mischievous intentions, if they have the credentials, then there isn't a lot you can do to prevent accidents or unwanted behavior. Removing files, replacing files, running some malware, or planting an exploit are things that you must be cognizant of when running your workloads. If you ever feel the credentials of a VM have been compromised, there is a Reset password menu item on the VM blade. This is also helpful if you happen to forget the credentials, which happens sometimes since you never write them down because that's a no-no. There is also a Security menu option that is an entry point into Azure Security Center (ASC). ASC helps monitor for the kinds of activities discussed in this paragraph.

From an application perspective, security depends greatly on your chosen identity provider and how it is implemented. You can, however, add safeguards like network security groups, an Azure Policy that enforces TLS 1.2 and HTTPS, and perhaps Managed Identity. All of these application-level security techniques are helpful in securing your application running on a VM. There is one last technique that we also covered in Chapter 2, and that is the concept of encryption of data at rest. This is important if the contents of the managed disks need to be encrypted due to the storage of very sensitive data, intellectual property, or the code that runs the application. To encrypt a managed disk, complete Exercise 4.12.
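Exercise 4.12 uses the portal, but for reference, Azure Disk Encryption can also be enabled from the Azure CLI. This is a minimal sketch with placeholder names; it assumes a Key Vault already exists and has been enabled for disk encryption:

az vm encryption enable --resource-group <resource-group-name> --name <vm-name> --disk-encryption-keyvault <key-vault-name> --volume-type All

az vm encryption show --resource-group <resource-group-name> --name <vm-name>

The second command reports the encryption status of the OS and data disks so you can verify the operation completed.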

There you have it. After reading this section, you now know Azure VMs in some depth. You should feel confident going into a meeting or the Azure Solutions Architect Expert exam that your knowledge is ready for use. To make sure, answer the following question, and then move on to learn about other Azure compute products.

  • What is the best tool for moving an on-premise server or VM to an Azure Virtual Machine?
    1. Azure Migrate
    2. SYSPREP
    3. AzCopy
    4. Azure Site Recovery

The answer is Azure Site Recovery, which is helpful not only for planning and recovering from a disaster but also for moving on-premise workloads to Azure VMs. SYSPREP and AzCopy alone are not sufficient for moving a VM, and Azure Migrate is more for planning the move than for performing it.

Azure App Services

Azure App Services are Microsoft's PaaS offerings. This product grouping offers Web Apps, API Apps, Mobile Apps, Web Apps for Containers (Linux), App Service Environments, and Azure WebJobs. PaaS removes the responsibility of operating system maintenance from the customer. In Chapter 3, I mentioned the concept of a sandbox and that Microsoft PaaS offerings operate within one. The sandbox is what restricts the actions that you and your code can perform. For example, no writing to the registry, no access to the Event logs, no out-of-process COM, and no access to User32/GDI32 are a few of the most common impactful limitations. There is a complete list of the restrictions on GitHub, specifically at this location:

github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox

Additionally, supported framework runtime versions (for example, .NET, PHP, Python, Node, etc.), third-party drivers, cipher suites, registry editing, and root certificates are controlled by the cloud service provider. This means, for example, that if you want to target .NET Core 3.0 or Python 3.8 and the runtime is not on the platform, then you cannot use it. When you decide to run your workloads on a PaaS offering, you gain some value by delegating responsibility for the OS, such as patching, security, and OS upgrades. However, you lose some flexibility when it comes to making custom environmental configurations and targeting new versions of runtimes that include the newest technologies.

Azure App Services come in a variety of pricing tiers and sizes; the combination of those two attributes creates what is called an App Service Plan (ASP). From a pricing tier perspective, there are six different options, each having a different set of accompanying features. Free and Shared are the most economical offerings and are intended for testing and learning; you shouldn't run any production workload on them as neither of those tiers comes with an SLA, and they have some strict metered capacity constraints. For example, your process can consume 60 CPU minutes per day on the Free plan and 240 CPU minutes per day on the Shared plan. This may be enough for some small test sites, but as already mentioned, it's not for production workloads. The other four tiers are Basic, Standard, Premium, and Isolated. Isolated is also called an App Service Environment (ASE), which is discussed later. Some features available in Basic, Standard, and Premium are presented in Table 4.10.

TABLE 4.10 Basic, Standard, and Premium Tiers

Feature            Basic      Standard   Premium
Disk space         10 GB      50 GB      250 GB
Max instances      Up to 3    Up to 10   Up to 20
Deployment slots   No         Yes        Yes
Cloning            No         No         Yes
VNET Integration   No         Yes        Yes
Autoscale          No         Yes        Yes
Backup/restore     No         Yes        Yes
Traffic Manager    No         Yes        Yes
Number of apps     Unlimited  Unlimited  Unlimited

Any of the tiers discussed in Table 4.10 execute in what is called dedicated mode. This means the application is running on its own VM and is isolated from other customers. This is not the case in Free and Shared modes. In addition to the different feature limits available per tier, they also come in three sizes: Small, Medium, and Large. Basic and Standard VMs currently run on A-series sizes where:

  • Small: 1 CPU with 1.75GB RAM
  • Medium: 2 CPU with 3.5GB RAM
  • Large: 4 CPU with 7GB RAM

The Premium tier, on the other hand, runs on Dv2-series VMs and comes with the following allocated compute resources.

  • Small: 1 CPU with 3.5GB RAM
  • Medium: 2 CPU with 7GB RAM
  • Large: 4 CPU with 14GB RAM

A reason I call out the sizes is that there is an important restriction based on the size of the chosen VM. It is important because some common bad coding patterns cause trouble for an application running on an App Service. Those patterns will be discussed in Chapter 7. The specific restriction is the number of concurrent outbound connections. There is no limit on the overall number of outbound connections; the limit, and the key word here, is concurrent. In other words, how many open, in-scope connections is the application currently holding? The limit is 1,920, 3,968, and 8,064, per instance, for Small, Medium, and Large, respectively. Per instance means that if you have two instances of a Medium, you can have a total of 7,936 concurrent outbound connections, which is 2 × 3,968.

Recall some of the options covered in Table 4.10. Let's discuss them a bit more, specifically these options:

  • Max instances
  • Autoscale
  • Number of apps
  • Deployment slots
  • Cloning

The documented maximum number of instances can be increased if your workload is running in the Premium plan and you need more than 20. Instances were also just mentioned regarding the number of concurrent outbound connections. An instance is a dedicated VM within the ASP that runs your application workload. You can increase and decrease the number of instances using an autoscale rule, as discussed previously for IaaS Azure VMs, or you can scale out and in manually. The technique to achieve such scalability with App Services is also similar to a VMSS, where the image is stored in a central location and new instances are built from it as needed. The difference with an Azure App Service is that instead of an image, the application source code and its configuration are stored in an Azure Storage container and used when adding a new instance into the pool. An image is not necessary because you are running on PaaS and the operating system configurations are the same on every VM.
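For reference, both scaling actions can be scripted against the App Service Plan with the Azure CLI. This is a minimal sketch with placeholder names; scaling up changes the tier/size, while scaling out changes the instance count:

az appservice plan update --resource-group <resource-group-name> --name <plan-name> --sku S1

az appservice plan update --resource-group <resource-group-name> --name <plan-name> --number-of-workers 3

The first command scales up to the Standard S1 tier, and the second scales out to three instances of that plan.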

Recall from Table 4.10 that the number of apps per ASP was unlimited. Nothing is really unlimited; it just means that there is no coded limitation on it, as there is for Free (10) and Shared (100). Each time you create a web app, for example, you are effectively creating a website that will run in its own process within the ASP. Each website gets its own hostname, such as *.azurewebsites.net, which you already know since you created a few in previous chapters. The limitation isn't on the number of apps but on the amount of compute resources your applications require plus the number of instances per pricing tier. If you can get 1,000 websites onto a one-CPU machine with 1.75GB of RAM and keep them responsive, great, but I'd find that unlikely. You would need to use good judgment on which apps need how much compute and then balance them across different App Service Plans and instances. This leads nicely into deployment slots, which are test instances of an app on the same ASP. When you create a slot on an ASP for a given app, you are effectively creating a new process to run a newer version of that same app. In Chapter 9, we will cover deployments, but it is safe to write here that deploying straight into production is risky. Deploying to a deployment slot avoids doing that.

Take, for example, that your web app name is csharpguitar.azurewebsites.net and you create a deployment slot named test. A new web app will be created named csharpguitar-test.azurewebsites.net. You can then deploy a new version of the csharpguitar web app to the test site and test it. Once you are sure all things are working fine, you can do what is called a slot swap. This means the test site becomes production, and the production becomes the test. This happens by switching which process responds to requests made to the hostnames. Both processes and hostnames remain alive, but new requests start flowing to the new version once the slot swap is performed. Finally, cloning, which is like a backup, is a cool feature. It is also helpful in debugging and trying to find the root cause of an issue. From a debugging perspective, some logging or debugging techniques can have negative impacts on application performance and stability. Making an exact replica of the environment and placing it someplace else can give you more options when troubleshooting issues. Cloning is helpful if you want to move your application to another region, for example. From the portal you can move an App Service Plan between resource groups and subscriptions; however, moving to another physical region is not possible via a click activity in the Azure Portal. Cloning will help you achieve that if that is your objective.
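Creating and swapping a deployment slot can be done from the Azure CLI as well as the portal. Here is a minimal sketch using the csharpguitar example and a placeholder resource group name:

az webapp deployment slot create --resource-group <resource-group-name> --name csharpguitar --slot test

az webapp deployment slot swap --resource-group <resource-group-name> --name csharpguitar --slot test --target-slot production

The first command creates the csharpguitar-test slot, and the second performs the slot swap described previously, pointing production traffic at the version that was deployed to the test slot.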

Web Apps

An Azure App Service web app is a popular entry point onto the Azure platform from a PaaS perspective. There are few to no barriers to entry, and you can be up and running with an internet presence in minutes. That is an amazing feat when you reflect back 15 years or so on the amount of effort it took to achieve the same thing. Then, if required, you could scale out to 400 CPUs in a few additional minutes, which would have been inconceivable back then. Let's do a few things with web apps now. If you do not remember how to create an Azure App Service, please refer to EXERCISE 3.8 in the previous chapter. In Exercise 4.13, you will enable Azure Active Directory authentication for an Azure App Service web app. This is a feature named Easy Auth and was introduced in Chapter 2.

You certainly noticed the other authentication providers such as Microsoft, Facebook, Google, and Twitter. As easily as you enabled AAD authentication for your site, you can do the same using those other providers. Note that this is for authentication only, which means we can only be sure your visitor is who they say they are; it doesn't provide any information about what they can do on the site. That is more application-specific and is the authorization part, which we won't cover in more detail here. One last note is that in step 1 of EXERCISE 4.13 the action to take was to log in with Azure Active Directory. This means that no one can access your site if they do not have an identity to be validated. It is common to select the other option, Allow Anonymous, and then let the visitor create a profile on your site. Once the account is validated, the user can have access to other protected features in your site. However, that kind of accessibility requires code in your application to check for the validation tokens. What you did in EXERCISE 4.13 simply wraps the entire application with a security protocol; if more granular or more precise authorization is required, then it will require some code.
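For completeness, the Easy Auth configuration performed in EXERCISE 4.13 can be approximated from the Azure CLI. This is a hedged sketch with placeholder names; the AAD application (client) ID is something you would have registered beforehand:

az webapp auth update --resource-group <resource-group-name> --name <web-app-name> --enabled true --action LoginWithAzureActiveDirectory --aad-client-id <application-client-id>

Switching --action to AllowAnonymous reproduces the alternative behavior described here, where unauthenticated visitors are allowed in and your code decides what they can access.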

In Exercise 4.14, you will configure Hybrid Connection Manager (HCM) to connect to an IaaS Azure VM in a VNET. Instead of using an Azure VM, you can also use an on-premise machine. This is a simple method for connecting a Web App to an on-premise machine or an Azure VM in a virtual network. HCM uses ports that are typically open for allowing internet traffic, so in most cases this configuration works quickly.

In summary, an Azure App Service web app is a place to run internet applications using Microsoft's PaaS cloud service. It has low barriers to entry, and you can deploy, scale, and consume at a very fast pace. There are two other products that run on the same PaaS platform as a web app; they are API apps and mobile apps. A brief description of them follows.

API Apps

API apps, not to be confused with API Management discussed in the previous chapter, are a product offering that correlates more closely with a web API than with a web application. In short, an API is an interface that exposes one or more methods that can be called from other applications or consumers; there is no graphical user interface (GUI) with an API. API apps fully support cross-origin resource sharing (CORS) and Swagger.

When you create an API app, you follow the same process as when creating a regular web app. The difference is that on the Web App blade there is a navigation item named API definition that is used for the configuration and implementation of Swagger. Swagger has numerous capabilities, but its notable feature is making it easy to discover and consume the API app. As you may know, one of the difficulties in consuming an API is trying to figure out the names of all the methods it exposes, the required parameters for them, and the kind of authentication protocol required to communicate with the API. Swagger, when configured, provides this information and greatly simplifies consumption. There is also a navigation menu item named CORS, which is used for identifying allowed origins. CORS is a security feature that makes sure that when code (for example, JavaScript) running in your browser makes outbound calls, those calls go only to allowed origins. There have been cases where scripts were maliciously injected into client-side JavaScript to download some snippet of malware. CORS is implemented with the prevention of such injections in mind.
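CORS allowed origins can be configured from the Azure CLI in addition to the portal navigation item just mentioned. A minimal sketch with placeholder names and a hypothetical origin:

az webapp cors add --resource-group <resource-group-name> --name <api-app-name> --allowed-origins https://www.contoso.com

az webapp cors show --resource-group <resource-group-name> --name <api-app-name>

The first command allows browser-based code served from https://www.contoso.com to call the API app, and the second displays the current list of allowed origins.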

Mobile Apps

Some years ago, Mobile Apps was its own product; today it is running on the same platform as web apps and API apps. It will soon be retired, but it's worth mentioning because of its historical significance. When Microsoft was making its push into the mobile world, this Mobile App was a central point of that. It was planned to be the backend for all the apps that would be published to the Microsoft Store. Well, we all know how that went and it is why we see “no light at the end of the tunnel.” Regardless, you create a mobile app like you would an app service and look in the navigation menu for easy tables and easy APIs. Both of those made it quick and easy to set up the backend database or dependent APIs for the mobile app. I expect the feature to be removed, but I am confident it will come back in another shape or form in the next few years. That behavior is a fairly common one (i.e., retire something and then bring it back with a new look and marketing campaign).

Web App for Containers (Linux)

Yes, you can run Linux on PaaS. The version of Linux that Web Apps runs on is Ubuntu Server 18.04 LTS. If that doesn't fit your needs, it isn't a showstopper, because you can also run Web Apps in a container, which allows you to choose the targeted OS. To create a web app that targets Linux, you begin the same as you have previously, paying special attention to the Publish, Runtime Stack, and Operating System attributes on the Web App creation blade. Figure 4.44 shows how that looks.

Snapshot of creating a Linux Azure App Service web app.

FIGURE 4.44 Creating a Linux Azure App Service web app

The first attribute is Publish, which provides the option of code or a Docker container. If you choose Code, it means you will receive a VM that you can then deploy your application code to. The other option is Docker Container, which you should already be familiar with since containers were covered earlier in this chapter. If you choose Docker Container, the Runtime Stack drop-down goes away, and your only remaining choice is Linux or Windows. Choosing a Docker container results in a VM being allocated to run the container that you create locally or download from Docker Hub, for example.
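To illustrate the Docker Container option without the portal, here is a minimal Azure CLI sketch with placeholder names. It creates a Linux App Service Plan and a Web App for Containers that pulls a public image from Docker Hub (nginx is used purely as an example):

az appservice plan create --resource-group <resource-group-name> --name <plan-name> --is-linux --sku S1

az webapp create --resource-group <resource-group-name> --plan <plan-name> --name <web-app-name> --deployment-container-image-name nginx:latest

Swapping nginx:latest for your own image, whether from Docker Hub or a private registry such as Azure Container Registry, is how you bring a custom OS configuration onto the PaaS platform.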

The Runtime Stack drop-down contains a list of supported runtimes; for lack of a better analogy, think of them as common language runtimes (CLRs). The runtimes are the frameworks and libraries in different programming languages and their different versions that are available on the platform for your application to target. There are many of them; Table 4.11 displays some of the supported runtimes. For the most up-to-date list, check https://docs.microsoft.com/en-us/azure/app-service/containers/app-service-linux-intro#languages.

TABLE 4.11 Supported Open Source Languages

Language Versions
.NET Core 1.0, 1.1, 2.0, 2.1, 2.2
ASP.NET 3.5, 4.7
Java 8, 11, SE, Tomcat 9, 8.5
Node 4.x, 6.x, 8.x, 10.x
PHP 5.6, 7.x
Python 2.7, 3.6, 3.7
Ruby 2.x

If a version of the runtime, or the programming language itself, is not supported, then you can simply create a Docker container to run the workload on PaaS. An interesting behavior you may notice is that when you select, for example, Node or PHP from the Runtime Stack drop-down, the Operating System selection automatically changes to Linux. Even though you can run Node and PHP on a Windows OS, the portal is steering you toward Linux. The behavior is similar for .NET Core: even though it can run on Linux, when you select it, Windows is chosen as the operating system.

I have a last point to make about Web Apps for Containers, and it is an important one. At the beginning of this section I mentioned the sandbox that constrains the configuration you can perform when running your application on Azure App Services. With Web Apps for Containers, this is no longer the case. Whether you run your application on Windows or Linux, you can create a Docker container and deploy it to the Azure App Service environment. Within the Docker container you can make any modification you desire: registry changes, cipher suite reordering, whatever. The biggest hurdle for most, as I mentioned in the container section, is that this capability is still relatively new, and many are reluctant to adopt it. However, for smaller workloads that play a less mission-critical role in the IT solution, you should give this a try. It is time to start learning it, and you learn best by consuming, maintaining, and troubleshooting it.

App Service Environments

An App Service Environment, aka the Isolated tier, is an instance of the entire Azure App Service capability inside a virtual network accessible to only one entity. When running in other tiers such as Basic, Standard, and Premium, i.e., in a multitenant stamp, although your VMs are dedicated, there are other customers who use some of the shared resources that provide the App Services product. For example, the front ends that load balance and direct requests to the correct web app, or to one of the multiple instances of the web app, are shared resources in Basic, Standard, and Premium. If you do not want to or cannot share resources with other tenants, then you can choose the Isolated tier and get all the shared resources for yourself. You never share the VMs running your application in Basic, Standard, or Premium; they are dedicated to you.

There are two benefits that come to mind when running in an ASE that do not exist in other tiers. The first is that you can implement an internal load balancer (ILB), which prevents direct access to the ASE from the internet. With an ILB, the IP address and the internet endpoint (i.e., *.azurewebsites.net) are not globally accessible and require a VPN gateway connection to allow access. In addition, the reordering of cipher suites is allowed with an ASE. A cipher suite is an encryption algorithm used for encrypting TLS connectivity between a client and server. Cipher suites are added to an OS in a specific order, and the first match is the one that a client and server agree to use. Sometimes a customer wants to use a stronger or different cipher for any number of reasons. An ASE allows the reordering; other tiers do not. Table 4.12 shows the feature limits for the ASE/Isolated tier.

TABLE 4.12 ASE/Isolated Tier Limitations

Feature            Isolated Limit
Disk space         1 TB
Max instances      Up to 100
Deployment slots   Yes
Cloning            Yes
VNET integration   Yes
Autoscale          Yes
Backup/restore     Yes
Traffic Manager    Yes
Number of Apps     Unlimited

The VMs are a dedicated Dv2-series with the following specifications:

  • Small: One CPU with 3.5GB RAM
  • Medium: Two CPU with 7GB RAM
  • Large: Four CPU with 14GB RAM

Keep in mind that with an ASE you are getting a few more servers that run the other Azure App Service components that are typically shared in the multitenant environment. Those additional servers and the additional benefits come with an additional cost.

This product is most common for large enterprises or customers who have some stringent security requirements and need or desire to run PaaS within their own isolated VNET.

Azure WebJobs

Azure WebJobs is a feature strongly linked to Azure App Services that supports the execution of background tasks. This is similar to the older concept of batch jobs. Batch jobs are typically programs that are run at scheduled intervals to process data or to read from a queue and perform some action based on its contents. They typically have filename extensions like .cmd, .bat, .exe, .ps1, and .js, which the operating system recognizes as an executable program. There are two types of WebJobs.

  • Triggered (Remote debugging is not supported.)
  • Continuous (Remote debugging is supported.)

A triggered WebJob is a program that is scheduled to run using a scheduling scheme known as CRON, or it can be run manually. A common CRON format is illustrated in Figure 4.45. For example, a CRON schedule of 0 15 8 * * * would run every day at 8:15 a.m.

Schematic illustration of CRON schedule format for use with a triggered WebJob.

FIGURE 4.45 CRON schedule format for use with a triggered WebJob
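As a concrete example, a triggered WebJob can carry its CRON schedule in a settings.job file deployed alongside the WebJob binaries. The following is a minimal sketch of that file using the schedule just described (every day at 8:15 a.m.):

{ "schedule": "0 15 8 * * *" }

The six fields are second, minute, hour, day, month, and day of week, which is why the expression reads 0 seconds, 15 minutes, 8 hours.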

There are two common ways to manually trigger an Azure WebJob; the first is from within the Azure Portal. As stated, a WebJob is tied into an Azure App Service, meaning an Azure App Service is required to run a WebJob. To manually run a WebJob, access the Azure App Service where the WebJob is located, select the WebJobs link from the navigation menu item, select the WebJob, and click the Run button. The WebJob blade looks something like that shown in Figure 4.46.

Snapshot of WebJob portal display.

FIGURE 4.46 WebJob portal display

The other means of manually triggering a WebJob is the WebJob API. The WebJob is hosted on the Azure App Service platform, which exposes a global endpoint; this also means that the WebJob endpoint is globally accessible. When KUDU was mentioned and you accessed it via the Advanced Tools menu item, you may have noticed the URL that you were routed to. It was the name of the web app followed by .scm.azurewebsites.net. If you append /api/triggeredwebjobs/<WebJobName>/run and send a request to that URL, the WebJob will be triggered via the WebJob API. A final example might resemble the following, where * is the name of your web app:

https://*.scm.azurewebsites.net/api/triggeredwebjobs/Exception002/run
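Because the endpoint is protected by the App Service deployment credentials, a manual trigger from the command line needs basic authentication. A hedged sketch using curl, with placeholder credentials:

curl -X POST -u '<deployment-username>:<deployment-password>' https://*.scm.azurewebsites.net/api/triggeredwebjobs/Exception002/run

A successful call returns an HTTP 202 (Accepted) response, and the run then shows up in the WebJob's run history in the portal.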

Just on a side note, this API is one of the foundational technologies on which Azure Functions is built. Azure Functions is discussed later, and you will read briefly about a Timer-triggered Azure Function; just note that the WebJobs SDK is a fundamental component of Azure Functions, even though that may not be obvious without some deep technical review.

The other type of WebJob is one that runs continuously. This is useful for scenarios where you process orders or execute reports in the background but in near real time. Capturing a customer order is something that you always want to make easy, which is why you want to do as little as possible when the place order button is clicked. Perhaps simply place the order details into a queue; in other words, do a simple validation and then a simple insert into a data source. Then a WebJob continuously monitoring that queue can process the order offline. If there are any errors, notify an administrator to correct them and get the order reprocessed. Once the order is reprocessed, the WebJob can send the customer an email notification. Another scenario is running large reports. Many reports take a long time to process, and running them in real time can result in timeouts and no results at all. Again, you could save the query parameters in a queue that a WebJob is monitoring and let the WebJob process them. Batch or offline processes typically run longer than real-time web requests, which have built-in timeouts. Once the report is complete, a link to the output is emailed to the user. Easy.

There is a configuration that needs to happen on the Azure App Service to run continuous WebJobs: you must enable a capability called Always On. There is a mechanism that conserves resources by shutting down websites that haven't received a request in 20 minutes. This shutdown behavior can be disabled by enabling Always On, which is found on the Configuration menu item for the Azure App Service. Another point worth considering is what happens if you run multiple instances of your Azure App Service. Does the WebJob run continuously on every instance? The answer is yes, by default it does. However, you can configure the WebJob as a singleton by creating a file named settings.job and placing it into the root directory of where the WebJob is located. The contents of the settings.job file are as follows:

{ "is_singleton": true }

Instead of creating the settings.job file manually, if you configure the WebJob from the portal using the + Add button, as shown in Figure 4.46, when you select Continuous as the type, you have the ability to select Multi Instance or Single Instance from the drop-down, as shown in Figure 4.47.

Figure 4.47 also illustrates a view of how + Add Capability looks when Triggered is selected.

Snapshot of adding a continuous or triggered WebJob.

FIGURE 4.47 Adding a continuous or triggered WebJob

Azure Batch and HPC

In the previous section, we touched on batch jobs, so you should have some idea at this point of what they are. However, in that context I would think we are talking about a maximum of 20 to 30 cores and perhaps 7GB to 14GB of memory to perform a serial execution of a specific task. Let me now introduce you to Azure Batch, which provides you with the tools to run processes in parallel on a large scale. Azure Batch lets you connect thousands of compute instances across multiple graphics processing units (GPUs) that provide low latency and extremely powerful processing when compared to CPU-only architectures. Tasks such as dynamic image rendering, visual simulations with GPUs, and Big Data scientific and engineering scenarios are a few examples of workloads that run using Azure Batch and High-Performance Computing (HPC) concepts. Those workloads can run on either Windows or Linux, using coding languages like R, Node, PHP, Ruby, Python, Java, and .NET.

There are numerous HPC models, two of which work optimally with Azure Batch; they are InfiniBand and Remote Direct Memory Access (RDMA). As shown in Figure 4.48, the InfiniBand architecture provides a matrix-like structure to group together massive, and I mean massive, compute power. The compute power can come from a mixture of CPU, GPU, and XMC processors.

Figure 4.49 illustrates the RDMA architecture used with Azure Batch, which enables the sharing of memory between a pool of nodes.

Notice how the access to memory across nodes happens over an Ethernet connection. Both InfiniBand and RDMA support what is called cloud bursting. If you recall from the previous chapter, we discussed the concept of a hybrid network, and one use of that hybrid model was to handle extra traffic or unexpected growth when necessary. There is a similar model that allows you to cloud burst from an on-premise HPC architecture to Azure if you happen to run out of or want more compute power. The connectivity works via both ExpressRoute and a VPN Gateway connection. Finally, let's not fail to mention that if you want a supercomputer, Azure offers you the ability to get a dedicated Cray computer placed into one of your private virtual networks so you can crunch and crack those weak encryption algorithms in less than two months. You might be tempted to do that just for fun, but there isn't even a price I could find for it, so if costs matter, you might want to just consider this as something cool for NASA or other government organizations to consume.

Schematic illustration of the InfiniBand architecture.

FIGURE 4.48 InfiniBand architecture

Schematic illustration of the RDMA architecture.

FIGURE 4.49 RDMA architecture

Let's start off by creating an Azure Batch account in Exercise 4.15.

Before you attempt the online simulation, let's cover a few things. First, what is the difference between the Batch service and the Subscription service that you saw on the Advanced tab? In most cases, you would use the default Batch service, which handles most of what is going on with compute provisioning behind the scenes. In general, it simplifies the consumption and execution of workloads by managing the server pools for you. If you want more control over the server pools or want to utilize Azure Reserved VM Instances, choose the Subscription service. Next, take a look at Figure 4.50 to see more about how Azure Batch and HPC work.

  1. Upload the code and data input files into storage.
  2. The tasks, jobs, and the pool of compute nodes are created.
  3. Azure Batch downloads the application code and data to use for computation.
  4. The application monitors the pool and tasks.
  5. The computation completes, and the output is uploaded to storage.
  6. The application retrieves the output files and draws conclusions from them.

An application is the code and data that you are tasking Azure Batch to process. In EXERCISE 4.15, the source and data are accessible on GitHub and used in the Azure Portal when creating the application within the Azure Batch blade. The pool, as shown in Figure 4.50, is the group of compute nodes. Recall from earlier where you read about InfiniBand and RDMA. You would have noticed that the difference in those models is based on one having massive compute processing power and the other offering massive amounts of memory. Prior to creating your pool, you need to consider which kind of model your HPC workload requires. Consider Figure 4.51.
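Although EXERCISE 4.15 uses the portal, the same building blocks (account, pool, job, and task) can be created with the Azure CLI. This is a minimal sketch with placeholder and hypothetical names, using a small Ubuntu-based pool:

az batch account create --resource-group <resource-group-name> --name <batch-account-name> --location <region>

az batch account login --resource-group <resource-group-name> --name <batch-account-name> --shared-key-auth

az batch pool create --id csharp-pool --vm-size Standard_D2s_v3 --target-dedicated-nodes 3 \
    --image "canonical:ubuntuserver:18.04-lts" --node-agent-sku-id "batch.node.ubuntu 18.04"

az batch job create --id csharp-job --pool-id csharp-pool

az batch task create --job-id csharp-job --task-id task1 --command-line "/bin/bash -c 'echo processing'"

The pool size, node count, and the trivial echo command are placeholders; in a real HPC workload, the task command line would launch your application against the input files staged in storage.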

Schematic illustration of How Azure Batch and HPC work.

FIGURE 4.50 How Azure Batch and HPC work

Schematic illustration of the Ratios between CPU, RAM, and GPU.

FIGURE 4.51 Ratios between CPU, RAM, and GPU

Figure 4.51 symbolizes the different kinds of HPC, where some workloads perform computational activities that require large CPU power for simulations and analytics, while other data analysis programs parse large databases that have been loaded into memory, requiring lots of RAM. If the workload processes images for gaming or real-time image generation, then the workload needs large amounts of GPU power. There could also be a scenario where a solution would benefit from both CPU and GPU with a smaller amount of RAM. As shown in Figure 4.52, there are a large variety of options when choosing the kind of VM nodes to place into your pool.

Schematic illustration of choosing a VM node for the HPC pool.

FIGURE 4.52 Choosing a VM node for the HPC pool

When you run your workload, you do not have to worry about the amount of resources in the pool. When you create the pool in the portal, there is a scale section where you can create a customized autoscale formula to scale out and in with. When I think about VMSS and try to compare it with Azure Batch, a difference is that I do not need a base image, and it looks like there are VMs with much higher resource power in the selection list. I would think that it boils down to the reason you need the compute power. If you are doing intensive graphics simulations or Big Data analysis, then use Azure Batch. If you are running an enterprise web application or another workload type, then use VMSS or even App Services. Also, in Figure 4.50 there were tasks and a job, which is where you specify what method or activity in the application runs in which pool.

Storage

Storage is covered in detail in the next chapter. As you can imagine, images, movies, Big Data, and other kinds of files that get processed by Azure Batch will likely be huge. The space required to store them and the latency you experience when retrieving them are important aspects when implementing a workload using this Azure product. Azure-supported storage options include VMs that are storage optimized (for example, the Ls series), if your workload will store large amounts of data on the physical VM. Also, blob, table, and queue Azure Storage containers and Azure Files can all be utilized for storing the input and output for your Azure Batch HPC workloads. Let's not forget about a DBMS like SQL Server or Oracle, which can be used to store and retrieve data in such scenarios.

Marketplace

Access the Azure Portal, and in the search textbox at the top of the browser enter Marketplace. Then do a search for HPC, which will result in numerous preconfigured HPC solutions that can be created, configured, and utilized fast. You might consider taking a look through these to see whether they can meet your requirements prior to investing the time required to build a custom HPC solution.

Azure Functions

Many details of an Azure Function have been touched on in Chapter 1, so review that if you need a refresher. Azure Functions is Microsoft's FaaS product offering and is also commonly referred to as a serverless computing model. One of my colleagues once said, "If you have two servers and you take one away, what do you have? You have a server less." Although that is indeed true and kind of funny, it isn't what serverless means in regard to Azure Functions. What it means is that when your Azure Function workload is not running, it is not bound to any compute resource and therefore not costing you anything. The real beauty of serverless is its cost effectiveness. In reality, the product is free, yes, free, as long as you run in Consumption mode and stay under 1,000,000 executions and 400,000 gigabyte-seconds (GB-s) of compute. A gigabyte-second is a measure based on memory consumption multiplied by the number of seconds the Azure Function ran. It's a lot of free compute, more than enough to get your hands dirty with and even to run some smaller production workloads. When your Azure Function is invoked/triggered, a feature called the Scale Controller notifies the Azure Function that it has work to do, which results in real-time provisioning of compute capacity, configuration of that compute resource, and execution of the Azure Function code that you have deployed. As illustrated in Figure 4.53, the Scale Controller monitors supported message sources, and when a message arrives, it spins up a VM, configures it, and puts the code on the VM. The Azure Function then retrieves the message and processes it as the code dictates.

Schematic illustration of the Scale Controller role and an Azure Function.

FIGURE 4.53 The Scale Controller role and an Azure Function

When I spoke to one of my technical friends about serverless, the first question was about the latency of going from 0 to 100, meaning, how long does it take for the Scale Controller to get hardware provisioned, the Azure Function configured, and the function executed? I won't pretend that there is no latency, but it occurs only on the first invocation; after that, the instance remains warm for 20 minutes before it times out (using a Dedicated or Premium hosting plan, you can avoid this timeout and keep the instances warm). The internals of how it all works are proprietary and won't be shared, but there are some steps you can take if you want to improve the cold start of an Azure Function.

There are two ways to make the initial Azure Function invocation faster. The first is to deploy your code using a feature called run from package. This basically requires that you publish your code as a ZIP file. When you "zip" a file or a group of files, they are compressed and made smaller. As you would then imagine, deploying the code to the VM happens faster than if you have many files in an uncompressed form. Run from package is enabled using an App Setting that you will set in a later exercise, and it has much more impact when running in the Consumption hosting plan than in Dedicated. The reason it matters less in Dedicated is that the VM provisioned to run the Azure Function code doesn't get rebuilt after 20 minutes, which means the code is copied only once. Take note that because the zip capability discussed here is the Windows variety (in contrast to .7z, .tar, .war, etc.), this feature is only supported on Windows. The second way to reduce the latency of the first Azure Function invocation after a shutdown is to not let it shut down in the first place. Implementing this approach requires you to run on a Dedicated or Premium hosting plan instead of Consumption. When running in Dedicated or Premium mode, there is a general configuration setting named Always On, which, as the name implies, keeps the Azure Function warm by disabling the 20-minute timeout threshold.
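The App Setting that enables this feature is typically WEBSITE_RUN_FROM_PACKAGE, which you will see again in the exercise. As a minimal sketch, assuming hypothetical Function app and resource group names and a ZIP file that has already been built, the Azure CLI could be used like this:

# Tell the platform to run the app directly from the deployed package
az functionapp config appsettings set --name csharpguitar-function \
  --resource-group <resourceGroup> --settings WEBSITE_RUN_FROM_PACKAGE=1

# Deploy the application code as a single ZIP file
az functionapp deployment source config-zip --name csharpguitar-function \
  --resource-group <resourceGroup> --src ./functionapp.zip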

The remainder of this section will discuss hosting plans, triggers and bindings, runtime versions, and supported languages.

Hosting Plans

In Chapter 1, I shared two hosting plans, Consumption and Dedicated. There is another, which won't be on the Azure Solutions Architect Expert exam because it is in preview; it is called Premium. I will include Premium in the following text, but keep in mind that it won't be necessary to know for the exam. The Consumption hosting plan is the one that is considered the serverless model; when you operate in any of the other plans, you are still running an Azure Function, but you have a VM or a group of VMs that are actively provisioned and bound to the workload. Both Dedicated and Premium have a fixed cost, but the primary difference between Dedicated and Premium is the existence of the Scale Controller. In Dedicated, you are using the scaling capabilities that exist for an Azure App Service, which were just discussed. That scaling can be automated using that product's features, but you have to manage it; you can also manually scale out and in. However, in both Consumption and Premium, the Scale Controller manages the scaling out and in of compute resources based on numerous intellectual property algorithms and concepts. Table 4.13 lists additional limits and differences based on hosting plan. Please note that the Dedicated hosting plan is often referred to as the App Service plan.

TABLE 4.13 Hosting Plan Limits

Feature Consumption Dedicated Premium
Storage 1GB 50 to 1000GB 250GB
Maximum Apps per plan 100 Unlimited 100
Maximum memory 1.5GB 1.75 to 14GB 3.5 to 14GB
Default timeout 5 minutes 30 minutes 30 minutes
Maximum timeout 10 minutes Unlimited Unlimited
Maximum number of instances 200 10 to 20 20

From a storage perspective, note that when running in Consumption mode you are allocated 1GB of storage, which is where you can store the files (source, configuration, log, etc.) required to run and manage your application. Additionally, the content of an Azure Function running in Consumption mode is placed into Azure Files, which is unique to this mode. When running in Dedicated or Premium mode, your content is stored in an Azure Blob container. Both of those storage products are discussed in the next chapter, but they are different, and over time it was found that Azure Functions perform better with Azure Files when there are fewer files needing retrieval from the storage source, which is where the run from package feature came from. The differences in storage limits for the Dedicated mode (i.e., 50 to 1000GB, shown in Table 4.13) exist because you have to choose among different tiers of VM to run in that mode, like small, medium, and large, each having its own set of limits. The same applies to the amount of memory, where each tier has a specific allocated amount; for memory this applies to Premium mode as well. An Azure Function app in Consumption mode has a limit of 1.5GB of memory; if more is required, you can move to one of the other plans to get a larger allocation.

Maximum Apps per plan means that, because a plan is synonymous with a process (EXE), you can have from 100 up to an unlimited number of Function apps running on the VM. Remember that you can have multiple Functions per Function app. Also, remember that nothing is really unlimited, but there are large amounts of compute at your call. Realize that in Consumption mode you will run on a single processor, so would you really put workloads that have massive compute needs there? You would consider another hosting plan or compute product if that were the case. The maximum number of concurrent instances, where instances means the number of VMs that will run your Azure Function, is also limited based on the hosting plan. There is some major compute power there; consider Premium with a maximum of 20 EP3-tier VMs, which is massive. Finally, there is a timeout duration for the execution of an Azure Function. With Azure Functions there are two JSON configuration files, function.json and host.json. The function.json file is the place where your bindings and triggers are configured for a given Function within a Function app; you'll learn more about triggers and bindings in the next section. The host.json configuration file contains options that are applied to all the Functions in the Function app, one of which is functionTimeout. It resembles the following:

{
"functionTimeout": "00:05:00"
}

The default is set to five minutes per Function invocation, and an invocation is, in principle, the same as an execution. For Consumption mode, the value can range from 1 second to 10 minutes. For both Dedicated and Premium modes, the default is 30 minutes. Setting the attribute to -1 means the function will run to completion. You might agree that 30 minutes is a long time for a single method to run, but it's not unheard of; however, I think setting the limit to infinity is dangerous because you are charged based on usage, and what if something gets hung or runs in a loop for a week or so until you get the bill? Ouch. Keep in mind that if you consume the compute power, even if by accident, you will have to pay for it.

Triggers and Bindings

When you think about a regular program, there is a main() method that is the entry point into the program. From there all the if/then/else statements are assessed, and the code within the selected code block gets executed. In the Azure Function context, without getting too much into the coding aspect of this, instead of the main() entry point there is a method named run(), which includes some details about what will cause the Azure Function to be invoked, i.e., the trigger. The simplest kind of Azure Function to create and consume is the HTTP trigger, which is triggered from a browser, curl, or any client that can make a request to an internet address. Take a look at the following code snippet; notice the run() method that has an attribute named HttpTrigger. It defines the kind of trigger that will invoke the Azure Function. The attribute has additional parameters that define the details about the binding (i.e., its metadata).

[FunctionName("csharpguitar-http")]
public static async Task<IActionResult>
       Run([HttpTrigger(AuthorizationLevel.Function, "get", "post",
           Route = null)] HttpRequest req, ILogger log)

It is also possible to declare the binding in the function.json file instead of in the method definition, as shown here:

{
  "bindings": [
    {
      "authLevel": "function", "name": "req",
      "type": "httpTrigger", "direction": "in",
      "methods": [ "get", "post" ]
    },
    {
      "name": "$return",
      "type": "http",
      "direction": "out"
    }
  ]
}

To summarize, a trigger is what causes the function to run, and the binding defines the properties of that trigger type. Take, for example, the previous function.json file, which specifies an authentication level of function, a name of req (i.e., the HTTP request), and the HTTP methods that are supported, get and post. If the binding were to a database or other storage product, the connection string, the container name, an access key, and the name of a collection of data that the function will receive when triggered are all examples of what can be included in the binding. By doing so, you can access them in the Azure Function code without declaring them. There are numerous kinds of triggers supported by an Azure Function; a summary of the most popular ones is shown in Table 4.14.

TABLE 4.14 Azure Functions Supported Bindings

Type           Trigger  Input  Output
Timer          X
Service Bus    X               X
Queue Storage  X               X
Blob Storage   X        X      X
Cosmos DB      X        X      X
Event Grid     X
Event Hub      X               X
HTTP           X               X

If an image or file is uploaded into a blob container and there is an Azure Function bound to that container, then the Azure Function gets triggered. The same goes for an HTTP trigger; when it is called, the code within it is executed. Let's look at both the blob and HTTP triggers in a bit more detail, specifically at the binding metadata. If a file is added to a blob container, what kind of information do you think is important to know if your code needs to perform some action on that file? Its name, its location, and perhaps its size are important. All those details and many more are accessible within the run() method, as defined in the following blob storage binding example:

{
  "bindings": [
    {
      "name": "myBlob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "samples-workitems/{name}",
      "connection": "AzureWebJobsStorage"
    }
  ],
  "disabled": false
}

The name attribute is mapped to an element called myBlob that is passed from the blob storage to the Azure Function as a System.IO.Stream. That stream exposes properties such as Length, alongside blob details like its name and local path, which are populated with information about the blob and can be accessed via code to provide the details previously stated. Additionally, an important object when managing HTTP requests is the HttpRequest object, and as you see in the previous binding example found within a function.json file, it is identified as req and passed into the run() method as an HttpRequest object. That means you can access the properties and methods that exist on an HttpRequest object, like the query string and the request body.
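To make that concrete, the following is a minimal sketch of a blob-triggered run() method that matches the previous binding; the function name and log message are hypothetical, and the {name} token in the path is bound to the string name parameter.

[FunctionName("csharpguitar-blob")]
public static void Run(
    [BlobTrigger("samples-workitems/{name}", Connection = "AzureWebJobsStorage")] Stream myBlob,
    string name,
    ILogger log)
{
    // The blob content arrives as a Stream; its name and size are available for processing
    log.LogInformation($"Blob trigger processed blob: {name}, size: {myBlob.Length} bytes");
}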

To mention the input and output a bit more, consider that the direction of a trigger is always in. You can see that in the previous HTTP bindings example based on this setting: "direction": "in". In Table 4.14, note the types that support input bindings; this means you can have multiple input bindings. For example, if you wanted a Timer trigger to read some data from a Cosmos DB, because Cosmos DB supports input bindings, the Timer-triggered function can be declaratively preconfigured with the metadata about the Cosmos DB, for example, the connection string, database instance, etc. Then if you wanted that same Timer-triggered function to save the content from the Cosmos DB into a file and place it into a blob container, since blob storage supports an output binding, the details of it can be preconfigured as a binding with a setting of "direction": "out". You can see "direction": "out" as a binding for the HTTP trigger shown previously. It is, of course, also supported to define the binding information at runtime (imperatively) using the extensions and SDKs for those messaging and storage Azure products. There are more actions required to make that happen and more concepts to consider, which won't be discussed here, but it is possible.
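As a minimal sketch of that Timer/Cosmos DB/blob scenario (the schedule, database, collection, connection setting names, query, and output path are all hypothetical placeholders), the function.json could resemble the following:

{
  "bindings": [
    {
      "name": "myTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 */5 * * * *"
    },
    {
      "name": "orders",
      "type": "cosmosDB",
      "direction": "in",
      "databaseName": "csharpguitar",
      "collectionName": "orders",
      "connectionStringSetting": "CosmosDBConnection",
      "sqlQuery": "SELECT * FROM c"
    },
    {
      "name": "outputBlob",
      "type": "blob",
      "direction": "out",
      "path": "exports/orders-{rand-guid}.json",
      "connection": "AzureWebJobsStorage"
    }
  ]
}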

Every trigger has a corresponding extension module that must be loaded into the runtime when the Azure Function is started for the first time. All trigger extensions have some configuration settings that can be set in the host.json file. Take the HTTP trigger settings, for example.

{
    "extensions": {
        "http": {
            "routePrefix": "api",
            "maxOutstandingRequests": 200,
            "maxConcurrentRequests": 100,
            "dynamicThrottlesEnabled": true
        }
    }
}

If you have some experience writing ASP.NET Web APIs, then you know that by default the route to the API is prefixed with /api/ after the URL. Then after /api/ comes the name of the Azure Function, which in the previous example is csharpguitar-http; recall that the run() method declaration included the attribute [FunctionName("csharpguitar-http")]. What routePrefix allows you to do is change api to something else, like v1 or v2. For more information about the other HTTP trigger settings, take a look at the following online documentation: https://docs.microsoft.com/en-us/azure/azure-functions/functions-host-json#http.

Create an Azure Function that is triggered by an HTTP request by completing Exercise 4.16.

I'll point out a few things about some decisions made during the creation of the Azure Function. First, not all runtime stacks support developing in the portal. For example, both Python and Java require that you develop locally and then deploy; therefore, the exercise chose .NET Core because the example was a bit simpler. You also chose not to enable Application Insights. It is a valuable tool, but again, since this was just an example, it was not enabled. If you are developing an Azure Function to run a live workload, then Application Insights is by all means recommended. Application Insights will be discussed in more detail in Chapter 9, but keep in mind that Microsoft (i.e., the FaaS offering) doesn't record your application errors. Cloud hosting providers are concerned primarily with the platform. You, as the owner of the application, need to code in exception handlers, and the location where you write those exceptions to is Application Insights. Finally, although the exercise created the Azure Function via the portal, that scenario is more for getting your feet wet and learning a bit about the capabilities. For more complicated scenarios, it is recommended to develop and test locally using Visual Studio, Visual Studio Code, IntelliJ, Eclipse, or another IDE.

Runtime Versions

A runtime is an in-memory set of supporting code that helps the management and execution of the custom application running within it. What a runtime is has been mentioned before, so more depth isn't required here. From an Azure Functions perspective, there are three versions of the runtime: version 1, version 2, and version 3. The difference between version 1 and version 2 is quite significant, and by all means, if you are creating a new function, choose the latest version. Version 1 of Azure Functions targets the .NET Framework, which doesn't support cross-platform execution. This means you are bound to Windows and bound to the full .NET Framework library. That isn't necessarily a bad thing; it just isn't the newest thing. Version 2 targets .NET Core 2.x and is cross-platform. .NET Core has proven to be more performant and has a much smaller footprint. Also, in version 1, it is possible to have a function written in C# and another function written in JavaScript contained within the same Function app. This is no longer a supported scenario in version 2. The runtime stack must be the same for all functions within the Function app.

To view the targeted runtime of your Azure Function, from the Overview blade in the portal, click Function App Settings. When the blade is rendered, you will see something similar to Figure 4.56.

Snapshot of a view of the Azure Function runtime version.

FIGURE 4.56 A view of the Azure Function runtime version

At this moment, version 3 of the Azure Function runtime is in preview; however, it is moving quickly toward General Availability (GA). Version 3 targets .NET Core 3.x. Notice also the tilde in front of the versions: ~1, ~2, and ~3. This notifies the platform that when a newer version of the runtime is released, the code running will automatically target that new version. As you see in Figure 4.56, there is a specific version of the runtime, 2.0.12858.0. If a newer version is ever released that your code cannot support, you can pin your function to a specific version of the runtime using the FUNCTIONS_EXTENSION_VERSION app setting, accessible via the Configuration blade in the portal.
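As a minimal sketch, assuming a hypothetical Function app name and resource group, pinning to the exact build shown in Figure 4.56 could be done with the Azure CLI:

# Replace the floating ~2 value with a specific, pinned runtime version
az functionapp config appsettings set --name csharpguitar-function \
  --resource-group <resourceGroup> \
  --settings FUNCTIONS_EXTENSION_VERSION=2.0.12858.0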

Supported Programming Languages

The chosen runtime version is important because it dictates the operating system the Azure Function runs on and the languages in which you can write your code. As mentioned, version 1 targets the full version of the .NET Framework, which is not cross-platform and therefore can only run on the Windows operating system. However, version 2 and any future version target .NET Core, which is cross-platform and can therefore run on either Windows or Linux.

From a language perspective, C#, JavaScript (Node), and F# are supported in all versions of the Azure Functions runtime. Java, PowerShell, Python, and TypeScript are supported in version 2 and greater. There is likely work happening in the background to support more languages as time progresses. See Table 4.15 for an overview of supported languages by Azure Functions runtime version.

TABLE 4.15 Azure Function Supported Languages

Language 1.x 2.x 3.x
C# .NET Framework 4.7 .NET Core 2.2 .NET Core 3.x
JavaScript Node 6 Node 8 and 10 Node 8 and 10
F# .NET Framework 4.7 .NET Core 2.2 .NET Core 3.x
Java X Java 8 Java 8
PowerShell X PowerShell Core 6 PowerShell Core 6
Python X Python 3.6 Python 3.6
TypeScript X Supported (transpiled to JavaScript) Supported (transpiled to JavaScript)

As of the writing of this chapter, version 3 is in preview, and all the languages in the 3.x column are considered preview as well. There were also numerous other languages considered "experimental" in the context of version 1. I used the past tense because those will remain "experimental" and will never be fully supported development languages for Azure Functions v1. Those languages are Bash, Batch, and PHP.

  1. Which of the following runtime stacks are available for running an Azure Function on Linux?

    A. JavaScript
    B. .NET Core
    C. Python
    D. .NET Framework

The answer is A, B, and C because the .NET Framework is not cross-platform and therefore cannot run on the Linux operating system.

Service Fabric

The concept of microservices has been around for quite a number of years. The structural style of service-oriented architecture (SOA) may ring some bells. Service Fabric is a platform and an orchestrator for running microservices, which are a variant of the SOA development technique. What are microservices then? You can begin to understand by visualizing a nonmicroservice application, which would run on an Azure VM or an Azure Function, for example. When you create an application to run on virtual machines, the capabilities within it would typically span multiple tiers, i.e., a monolithic approach, as shown in Figure 4.57. The monolithic application solution would have a GUI and possibly these hypothetical built-in capabilities or services: security, order validation, logistics management, order fulfillment, and billing. This kind of monolithic architecture provides the compute using multiple tiers of IT architecture. Recall Figure 4.19, where the IT solution is composed of the web/GUI, application, database, and authentication tiers.

Schematic illustration of the monolithic architecture.

FIGURE 4.57 Monolithic architecture

Each of those services (security, ordering, logistics, etc.) has a set of executable code that possibly overlaps, reuses, or combines logic and shared libraries. A change to any part of the monolithic application would have some impact on all the services hosted throughout the multiple tiers.

From an Azure Function perspective, you may recognize some similarities with the term microservices, in that the unit of work performed by an Azure Function would be a much smaller service than the IT solution running on those multiple tiers. You could consider implementing a Function for each of those services within a given Function app. That scenario is a valid one so long as the number of Functions is manageable and the workload can run in a sandbox. However, there is no orchestrator for running a large number of Azure Functions that may or may not be dependent upon each other, and you can only scale a Function app, not specific Functions. This is where Service Fabric and microservices fill a gap. Service Fabric is an orchestrator that helps efficiently execute and manage a massive number of small, isolated programs. Take Figure 4.58, for example, where instead of running those capabilities within a dedicated tier bound to a VM or within a Function app, each service can scale and be managed independently.

Schematic illustration of the microservices architecture.

FIGURE 4.58 Microservices architecture

Scaling/managing a microservice independently is an interesting concept. In a monolithic code base, where you have many methods doing many different things, it doesn't take long to find where the bottlenecks are. A common place is code performing database create, read, update, and delete (CRUD) operations. Because all your code runs within the same process, a single slow-running, CPU-intensive, or memory-intensive procedure can have significant impact on the other methods running in the same process. There is no simple way to give that specific method more compute power; instead, you need to give compute to the entire tier. The group of services running in the process gets more compute by scaling up to a larger VM. This is not the most cost-effective or stable approach, because the need for extra compute power likely happens in bursts for a single service, and when the burst goes away, you have extra compute sitting idle generating costs. What if, by scaling up the application tier, you put greater load on the database tier, again requiring more scaling until you finally get it right for that burst? The effort to get it right is not a simple one, and contemplating scaling back down and doing it all over again when the next burst occurs is often a hard pill to swallow. With microservices, you can scale more precisely because it is the service that scales and not a tier. Service Fabric handles the scaling; it monitors for health and helps with the deployment of changes.

Clusters and Nodes

There was mention of a node in Chapter 3 within the context of network traffic routing. It was described as a server that is connected to a network that helps route a packet of data to its intended location. A node is a virtual or physical server, and a cluster is a tightly connected group of nodes. Nodes in clusters typically operate with the same operating system, have the same hardware specifications like CPU and memory, and are connected through a high-speed LAN. A cluster in the context of Service Fabric as you would expect isn't concerned about routing networking data packets. Instead, the cluster is concerned with running one or more specific microservices. If you look back at Figure 4.58, connected to each microservice there is a cluster containing three nodes. In reality, the number of nodes for a given cluster can scale into the thousands, and the scaling can be programmatically or manually performed. Recognize that the Service Fabric platform runs on VMSS. That's right, this is another example of an Azure product (i.e., Service Fabric) built on top of another Azure product (VMSS). This means everything you have learned about VMSS in this chapter can be applied here. For example, you know that VMSS scales out using a base image consisting of the operating system and application. Additionally, with that scale out, the concepts of fault domains and update domains are applied. Understanding that, the confidence in the stability and redundancy of the architecture and infrastructure behind Service Fabric should be high.

Recognize that there are durability tiers in regard to the durability of Service Fabric components. They are Gold, Silver, and Bronze. Each of those tiers, as with many of the Azure products, has its own benefits, limits, capabilities, and costs. You might assume that you can choose any series of VM as your nodes, but this is not true; it depends on the tier. The same goes for storing state. (State management, which is discussed later, is only supported in Silver and Gold.) It is recommended for production workloads that you use either Silver or Gold, because with Bronze the scheduling of updates and reboots is not respected. The speed at which scaling happens also depends on the tier and, interestingly, is slower with Silver and Gold. This is because in those tiers data safety is prioritized over speed, and the reverse is true for Bronze.

It is the responsibility of the orchestrator to properly deallocate and allocate microservices from or to nodes in the given cluster. This is a complicated activity considering that there can be a different set of microservices on each node. For example, a security and a logistics management service can coexist on the same node. To properly deallocate a node, Service Fabric must know the state of all other nodes in the cluster so that when moving a service from one node to another, there is no risk of overloading the new one. All of that scaling logic happens behind the scenes using the orchestration logic within Service Fabric. When to scale is most commonly based on the consumption of the services or on the resource usage on the node. There is a concept called logical load metrics that allows you to create custom, service consumption-based metrics for scaling. This kind of metric is concerned with elements such as the connection count over a given time frame, the responsiveness of the application, or the number of times the application has been run. Based on counters maintained in those custom metrics, it is possible to scale out or in based on your defined thresholds. The other, and most common, technique is to scale in or out using resource usage metrics such as CPU and memory. How all this works together can be better understood by reading more about the Service Fabric architecture, which is coming up next.
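For illustration only, the following is a minimal sketch of reporting a custom load metric from inside a Reliable Services stateless service; the service class, metric name, and value are hypothetical, and the metric would still need to be defined on the service and referenced by your scaling policy before Service Fabric acts on it.

using System.Collections.Generic;
using System.Fabric;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class OrderingService : StatelessService
{
    public OrderingService(StatelessServiceContext context) : base(context) { }

    // Hypothetical helper: report the current connection count as a logical load metric
    // so the orchestrator can factor it into placement and scaling decisions.
    public void ReportConnectionLoad(int currentConnections)
    {
        this.Partition.ReportLoad(new List<LoadMetric>
        {
            new LoadMetric("ConnectionCount", currentConnections)
        });
    }
}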

Architecture

The Service Fabric orchestration architecture is built upon a layered subsystem of components. Each subsystem component plays a role in making the microservice applications available, fault tolerant, manageable, scalable, and testable. Each component that makes up Service Fabric is shown in Figure 4.59 and discussed in more detail in the following text.

Schematic illustration of the Service Fabric subsystem components.

FIGURE 4.59 Service Fabric subsystem components

  • Activation and Hosting   The activation and hosting component is responsible for managing the lifecycle of the application running on each node. It knows which service or services are running on each node, and by interacting with the reliability component it can also determine whether the application is healthy and take appropriate action if not. An appropriate action would be to move the service to another node, move all the services to another node, and/or deallocate the node from the cluster.
  • Reliability   The reliability component consists of three mechanisms. A replicator ensures changes are applied to all services on all nodes. A failover manager monitors when nodes are added to and removed from the cluster and then adjusts the load to distribute it evenly across the additional or remaining nodes. Finally, a resource manager ensures replicas are placed across fault domains and that the contents within them remain operational.
  • Communication   The communication component helps resolve service names to a location on any given node in the cluster. As services, aka microservices, can be hosted on multiple nodes, knowing where they are and how to locate them is quite an important feature; consider this as a DNS or naming service within the cluster.
  • Federation   The federation component is the heart of the Service Fabric product that provides the overview or the entry point for making decisions that need to know the current state of the microservice and nodes running in the cluster. Examples include how many nodes are there, how many of each microservice is running on each node, and what is the health status of each node.
  • Management   The lifecycle management of the Service Fabric cluster is managed by the, you guessed it, management component. This component interacts with many of the other components for gathering statistics on health, placements, and versioning. In addition, this is where the application binaries are placed and referenced when new services need provisioning.
  • Transport   The transport component manages the communication between the Service Fabric components and the nodes in the cluster as well as between the nodes and the consumers of the microservices running upon them.
  • Testing   Finally, the testing component provides a set of tools for simulating failovers, scaling, and deployments without having negative impact on the live application.

Best-Practice Scenarios

There are three areas that I will offer some guidance on regarding Service Fabric.

  • Stateless and stateful
  • Scaling vertically and horizontally
  • Application logging

The HTTP protocol is stateless by design. This means that when an HTTP request is made to a web server, all the required information to respond exists within, for example, the header and/or body. Once the request is responded to, the web server does not remember much if anything about it. The next time you make a request, everything needed to make the request successful again must be sent from the client. A stateful application, on the other hand, could conceivably store some details about the client and the request on the server so that each request didn't have to carry along so much detail. Take a simple example of applications that would not require state like an API that returns the datetime or performs some kind of mathematical calculation, like a calculator. In most cases, neither of those examples would need to store information about a client's previous datetime request or previous calculation. On the other hand, applications that provide capabilities that require multiple steps to complete or need to know the identity of the client sending the request would need to support the maintaining of state. Completing an order online usually takes multiple steps that require clicking a button that sends a request to a web server. Each of those requests would need to know where you are in the order process, as well as who you are. Most websites store that information on the web server (for example, order details and identity token) because it is faster since the order detail doesn't have to be sent each time and because it's safer since just a small time-sensitive encrypted identity token can be sent back and forth instead of reauthenticating with each request.

Service Fabric supports both stateless and stateful scenarios. There is no extra step to take when creating a stateless microservice. However, when the application requires state, this is a place where the reliability component plays another important role. (Recall Figure 4.59.) Implementing the stateful capabilities in Service Fabric requires coding. The reliability component exposes a set of service APIs that can be consumed to create a StateManager for managing and storing the information your application needs to keep for subsequent reference. Since the implementation of this technique is code-based and not something you need to know for the Azure Solutions Architect Expert exam, it won't be covered in more detail. However, one important detail that you must understand is that the data stored in the StateManager exists only on the node where the request is sent. It is important to recognize this in the context of scaling and failover scenarios. The request from a client must always be routed to the same node, and if the node goes away for any reason, so does the data stored on that node for the given session. You need to code for those scenarios.

You should understand what scaling up and scaling out mean. Scaling up means, for example, that you have chosen to scale from a VM with 1 CPU and 32GB of RAM to a VM with 4 CPUs and 64GB of RAM. Scaling out means that when you have one VM with 4 CPUs and 64GB of RAM, you add an additional VM with those same specifications that will run the identical application. In the context of Service Fabric, those terms are referred to as vertical and horizontal scaling. Scaling up equals vertical, and scaling out equals horizontal. Scaling is recommended using Azure Resource Manager (ARM) templates, which are discussed in Chapter 8, or using the AzureClient, which is discussed in more detail here:

docs.microsoft.com/en-us/dotnet/api/overview/azure/service-fabric

Manually scaling the VMSS vertically or horizontally via the portal or any other interface circumvents the Service Fabric components and can get the management component out of sync with the overall state of the cluster. That is not recommended.

I learned a scaling lesson some years ago while working in support at Microsoft. Previously in this chapter I discussed App Service Environments (ASEs), which are a private PaaS Azure offering. I learned the hard way that ASEs exhibit a different behavior when scaling than what I had experienced many times before. Commonly, when you scale vertically (up/down), it happens within a relatively short amount of time; it is impactful. You will lose state, but the time it takes to get running again is usually about the same as a normal reboot of a physical server. What I learned was that for a version 1 ASE environment, it is not the same; the time required to scale an ASE (V1) is much, much longer. I became so comfortable with using this virtual reboot (vertical scaling) as a quick solution for solving downtime, hanging, or badly behaving applications that I used it way too often without much thought. My point is to take caution when scaling. The impact can be, and sometimes is, greater depending on the product with which you are performing that action. However, recognize, as stated many times (maybe too many so far), that the ability to scale compute up, down, in, and out is the most valuable and game-changing offering that exists in the cloud from a compute perspective.

Finally, it is important that you implement some kind of logging strategy. The cloud hosting provider does not have the manpower to monitor and resolve all the exceptions thrown by the applications running on the platform; instead, they focus on the platform only. There are limited platform-provided, application-focused logging capabilities built in by default. It is often the case that when an exception happens, it gets rendered to the client or shows up in the event viewer on the VM. However, if the application itself, in the form of code, does not implement any kind of logging, the scenario in which the error happened is usually unknown, and it will likely never be figured out. Microsoft provides and recommends tools like the Application Insights SDK, EventSource, and ASP.NET Core logging for Service Fabric microservice deployments. You can learn more in Chapter 9.

Application Insights and Log Analytics, two previously stand-alone products, have been bundled into a single product named Azure Monitor. You will read often about Application Insights, but keep in mind it is bundled into Azure Monitor. The EventSource class is found within the System.Diagnostics.Tracing namespace available in the .NET Framework. You would need to implement this class in the code of the microservice running on Service Fabric. Part of the implementation requires a configuration in a web.config file that identifies where the log file is to be written and whether logging is enabled. It is advised that you leave logging off unless there is a reason for it to be on. Logs can grow fast and large and can then have a negative impact on performance. Watch out for that. Lastly, EventSource is dependent on the .NET Framework, which does not run on Linux. If you are targeting an ASP.NET Core application to run on Linux, then you would consider the ASP.NET Core logging instrumentation instead. Logging capabilities are exposed via the Microsoft.Extensions.Logging namespace, which exposes an ILogger interface for implementing the log capturing, but again, this needs design, coding, and configuration. Application logging is a fundamental aspect of resolving bugs in production. If you want to increase the probability of having a successful IT solution running on Azure, then application logging is essential, which is why half of Chapter 9 is focused directly on it.
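As a minimal sketch of the ASP.NET Core approach (the controller, route, and messages are hypothetical), injecting an ILogger and writing both informational and exception entries looks something like the following; whichever providers you configure (console, Application Insights, etc.) receive the output.

using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Logging;

[ApiController]
[Route("api/[controller]")]
public class OrdersController : ControllerBase
{
    private readonly ILogger<OrdersController> _logger;

    // ILogger<T> is supplied by the ASP.NET Core dependency injection container
    public OrdersController(ILogger<OrdersController> logger) => _logger = logger;

    [HttpGet("{id}")]
    public IActionResult Get(int id)
    {
        _logger.LogInformation("Retrieving order {OrderId}", id);
        try
        {
            // ... retrieve and return the order ...
            return Ok();
        }
        catch (System.Exception ex)
        {
            _logger.LogError(ex, "Failed to retrieve order {OrderId}", id);
            return StatusCode(500);
        }
    }
}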

Azure Integration

It is common when running microservices or web applications on Service Fabric that you also consume and configure other Azure products. For example, when you create the cluster and the nodes within the cluster, there is a need to load balance requests across clusters and to also expose those endpoints to the clients and customers who will consume them. Figure 4.60 shows a common architecture scenario that a customer would implement.

Schematic illustration of a common Service Fabric architecture scenario.

FIGURE 4.60 A common Service Fabric architecture scenario

Notice that a load balancer can be used to balance requests across the nodes and microservices hosted in the Service Fabric cluster. The load balancer, as you learned in the previous chapter, can be configured to only support connections from a specific on-premise network, or it can be exposed globally. The same goes for the exposure of the microservice APIs, for example, the logistics, ordering, and billing services. API Management can interpret the path in a request and redirect the request to a specific backend pool of compute. This is helpful when changes happen to the Service Fabric cluster's location details, like a relocation into a new zone, virtual network, or subnet, where its endpoint may receive a new IP address. Those backend changes can be hidden from the client connecting to the exposed endpoint because the API Management configuration can be updated to route to the new location of the Service Fabric cluster as required. The point here is that simply provisioning an instance of Service Fabric doesn't get you up and running. There are coding and some significant configuration requirements to meet before this product becomes functional. The barrier to its implementation is a bit high and requires some skilled IT professionals to deploy and support it. However, for large microservice workloads that target the .NET Framework, it has proven to be a good option.

Azure Kubernetes Service

In principle, Service Fabric and Azure Kubernetes Service (AKS) provide similar capabilities. The difference between the two, or, better said, the reason you would choose one over the other, comes down to two things: open source versus the .NET Framework (Figure 4.2) and midlevel complexity versus high complexity. While you were reading the previous section about Service Fabric, there was no mention of Docker, Azure Container Instance (ACI), or Azure Container Registry (ACR), for example. AKS is specifically designed for the consumption of containers created using the Docker file format and stored in a registry such as Docker Hub or ACR. AKS was not only developed from the ground up using open source technologies but also designed specifically for running open source applications. Service Fabric can run open source code and provide Linux as an operating system to run it on; however, its original target was the .NET Framework development stack. Also, AKS is a bit less complicated to get up and running; the barriers to entry are less demanding. When it comes to Service Fabric, the orchestration of the workloads requires coding. When it comes to AKS, the default deployment and management of the workload requires no coding. Both of these orchestrators focus on the containerization of microservices; however, each one is targeted at a specific kind of application (open source versus .NET Framework), and each has a different complexity level of prerequisites and maintenance activities. Figure 4.61 illustrates the AKS concept, which is what will be discussed in this section.

The illustration describes not only the AKS components (like the cluster master) but also its tight integration with Azure DevSpaces, Azure DevOps, Azure Monitor, and container repositories like Docker Hub and ACR. Let's begin with a brief discussion of Kubernetes versus AKS and what existed prior to that on Azure.

Kubernetes vs. AKS

Kubernetes is itself a complete system for automating deployments, as well as scaling and managing applications. If you wanted, you could deploy Kubernetes itself onto the Azure platform and use the Kubernetes product without even touching AKS. What AKS provides is the infrastructure to run the cluster master (the Kubernetes control plane) for no cost. You will see the cluster master in Figure 4.61. Only the nodes within the cluster that run your application incur costs, so simply using AKS, you get some free compute that would otherwise result in a bill. How specifically the cluster master is architected is platform specific and not something you would need to worry so much about; however, it is worth some time to cover the various components that make up the cluster master. The cluster master components are as follows:

  • The APIs
  • etcd
  • The scheduler
  • The controller
Schematic illustration of an end-to-end illustration of Azure Kubernetes Services.

FIGURE 4.61 An end-to-end illustration of Azure Kubernetes Services

The API server uses JSON over HTTP REST for connecting with the internal Kubernetes API that reads/writes to the etcd data store for configuring the workloads and containers on the nodes in the cluster. The etcd data store is the location where the current state of the cluster is stored. Consider that you have numerous pods running on numerous nodes within the cluster and you want to update them with a new version of the container image. How many instances of the pods there are and on which node they are running are all stored in etcd, which helps find where the update needs to be rolled out. Additionally, knowing how many instances of a node you have is helpful when it comes to scaling. The scheduler also plays an important role; if you have multiple instances, then you would not want to deploy a change to all of them at the same time. The scheduler will make sure they are updated one after the other so that the highest availability is achieved. The controller is the engine that maintains the status between the desired state of the cluster and the current state. Consider that you have requested to manually scale out to five instances of a specific node running a specific pod; the request would be sent to the API server. The controller would find out how many instances currently exist via etcd, determine how many more need to be added, and then take on the responsibility of getting them all provisioned, configured, and accepting traffic. Then it would update etcd once the actual cluster state matches the desired state. Imagine if the implementation were very large; there could be a lot going on here. AKS provides you with the cluster master compute and interface for free.

There was a product before AKS named Azure Container Service (ACS) that is being deprecated and replaced by AKS. ACS fully supports the Windows OS for running workloads, which AKS is still in the process of providing full support for. As mentioned, AKS was designed from the ground up on Linux and open source languages. ACS doesn't support the concept of node pools, where nodes with the same configuration can target the same underlying VMs. Remember Figure 4.52, where you read about VMs that are designed specifically for different workloads such as memory-, CPU-, or storage-intensive work. Pooling the nodes together, which is not possible with ACS, keeps applications that have specific workload requirements together and thereby ensures the best possible performance. ACS is only mentioned here in case at some point you hear or read something about it. You should not target this product for any new projects. Finally, AKS supports 100 clusters per subscription, 100 nodes per cluster, and a maximum of 110 pods per node.

Clusters, Nodes, and Pods

A cluster is a group of nodes, where a node is typically a VM. The nodes typically span update and fault domains so that availability is preserved during transient or other more serious regional outages. Review the "Clusters and Nodes" section in the previous section, as the cluster and node concept here is similar to the scenario with Service Fabric. Take a look again at Figure 4.61 and notice the customer-managed component to get a good visual of the cluster, node, and pod configuration in the AKS context. Also note the term pod, which wasn't referred to in the Service Fabric context. I like to make the connection between the term microservice, which is used in the Service Fabric context, and a pod. Each of those references has to do with the location where the application code gets executed. From an AKS perspective, and also from a Service Fabric one, you can have multiple microservices or pods running on a single node at any given time. However, from an AKS perspective, the pod is a one-to-one mapping with a container, while in Service Fabric the mapping is with an image (no Docker). From now on, in this context, the terms pod and microservice will carry the same meaning.

Development and Deployment

Creating and deploying an application to AKS is a bit easier than you might expect and is something you will do in a few moments. Before you do the exercise, there are a few concepts I would like to introduce or perhaps refresh. The first is a configuration file format called YAML (rhymes with camel), which has the recursive acronym "YAML Ain't Markup Language." In the context of Kubernetes, AKS, Docker, and even some of Azure DevOps, you will see this file format often. For many, this is the "next" iteration of file formatting that progressed from XML to JSON and now to YAML. YAML has yet to show up much in the non-open-source world; however, it is likely on its way. There is also some helpful integration of AKS into Visual Studio and Visual Studio Code for deploying to AKS. The Visual Studio IDE allows a developer to develop and test an application on their local workstation and then deploy directly to an AKS cluster. You can see in Figure 4.61 that, once deployed, remote debugging is also possible. What is going on behind the scenes to make that happen is completely abstracted away from the developer by a service called Azure Dev Spaces, which you will see and use later. The final concepts I would like to introduce briefly here are GitHub and Azure DevOps. Both of those products are designed for the management and storage of application source code. Remember that in Exercise 4.2, with Azure Container Instance (ACI), you got some source code from GitHub to create the application running in that container. The same set of capabilities exists for Azure DevOps, as referenced in Figure 4.61 where Source Code Repository and Azure DevOps Pipeline are presented. Much more about Azure DevOps is discussed in Chapter 8, where deployments are covered in detail. Read more about it there if desired.
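To give you a feel for the format, here is a minimal sketch of a Kubernetes deployment manifest written in YAML; the names, image, and replica count are hypothetical, and the image would typically come from Docker Hub or ACR.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: csharpguitar-web
spec:
  replicas: 3                   # desired number of pods
  selector:
    matchLabels:
      app: csharpguitar-web
  template:
    metadata:
      labels:
        app: csharpguitar-web
    spec:
      containers:
      - name: csharpguitar-web
        image: csharpguitar.azurecr.io/web:v1   # hypothetical image stored in ACR
        ports:
        - containerPort: 80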

To learn more about AKS and use Azure Dev Spaces, complete Exercise 4.17 where you will create an Azure Kubernetes Service cluster using Azure CLI and then deploy an ASP.NET Core web application to it using Visual Studio. For this example, I used Visual Studio Community 2019 (16.2.4), which is free to download from here:

visualstudio.microsoft.com/downloads/

That wasn't too hard, but realize that the exercise just scratches the surface; there is so much more to it. There will be some books written about this at some point if there haven't been already. To explain a bit more about what you just did, I'll start with the Azure CLI commands. The first one, az aks create, is the one that created the AKS cluster. The second, az aks get-credentials, retrieved and stored the credentials of the AKS cluster so that I would then be able to capture information using kubectl and perform other administrative work on the cluster. The Kubernetes command-line tool kubectl is installed by default when using Azure Cloud Shell. A reason I decided to use Azure Cloud Shell was to avoid the complexities of installing Azure CLI and kubectl on my workstation. Now that you have created an AKS cluster, developed an application, and deployed it to the cluster, read on to learn a bit about maintaining and scaling the AKS cluster.
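For reference, those two commands resemble the following; the resource group, cluster name, and node count are placeholders and may differ from what the exercise instructed.

# Create a two-node AKS cluster with the monitoring add-on enabled (hypothetical names)
az aks create --resource-group <resourceGroup> --name csharpguitar-aks \
  --node-count 2 --enable-addons monitoring --generate-ssh-keys

# Merge the cluster credentials into ~/.kube/config so kubectl can reach the cluster
az aks get-credentials --resource-group <resourceGroup> --name csharpguitar-aks

# Verify connectivity by listing the nodes
kubectl get nodes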

Maintaining and Scaling

Since you know that VMSS is used behind the scenes of AKS and you know the roles and responsibilities that come with IaaS, you will recognize that you must manage any OS upgrades, the OS patching, the version of Kubernetes, and of course the application running on the nodes. The details of specifically performing these updates will not be covered here; instead, only the concepts will be explained. When an update to Kubernetes is initiated, it is rolled out using the concept of cordoning and draining. This is done to minimize the impact the update may have on the IT solution. The first action triggered from the controller of the cluster master is to cordon one of the nodes and target it for upgrade. There is a waiting period after the node is identified so that any request running on the node can complete; this is considered draining. Once the node is no longer being used, the upgrade proceeds. Once complete, the controller places the node back into the pool and takes out another; the process is repeated until all nodes are upgraded. It is possible to upgrade just a node pool within the cluster or the entire cluster of nodes. In the Azure Portal on the AKS blade, you will find a navigation link named Upgrade, which will upgrade the entire cluster. Click the Node Pools link on the AKS blade, and you will find an option to upgrade each node pool. Upgrading an application running on the AKS cluster has many available options; you can perform it manually, or you can choose any of the deployment options found by clicking the Deployment Center navigation menu item on the AKS blade. Azure DevOps Repos, GitHub, Bitbucket Cloud, and External Git are all options (as shown in Figure 4.61). Each of those implements CI/CD, which is very helpful when it comes to managing, developing, testing, and deploying application code (more on that in Chapter 8). For upgrading the OS version you have provisioned for AKS, you can find the VMSS instance that was created when you created the AKS cluster. Open the virtual machine scale set in the Azure Portal and navigate to the blade; the name should be prefixed with aks-. On the VMSS blade, click Instances, and then click the instance. The process for upgrading the instances running your AKS cluster is the same as you would follow for any other VMSS implementation.
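The same cluster upgrade can be driven from the Azure CLI. As a minimal sketch with hypothetical names and a hypothetical version number:

# Show which Kubernetes versions the cluster can be upgraded to
az aks get-upgrades --resource-group <resourceGroup> --name csharpguitar-aks --output table

# Upgrade the entire cluster; nodes are cordoned and drained one at a time
az aks upgrade --resource-group <resourceGroup> --name csharpguitar-aks \
  --kubernetes-version 1.16.7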

There are two methods for scaling: manual or automated. Manual scaling isn't something you should consider when running a live production application. This kind of scaling capability is most useful during testing, where you can learn how your application responds to scaling. You can scale from a pod or a node perspective. To manually scale a pod, you can use the Kubernetes command-line tool kubectl. First use it to get a list of pods so you can see what you have; then, once you identify the pod, scale the deployment that manages it to, for example, five replicas or instances.

kubectl get pods
kubectl scale deployment <deploymentName> --replicas=5

To manually scale a node to, for example, three instances, you can execute the following Azure CLI command from either Azure Cloud Shell or from a workstation with Azure CLI configured:

az aks scale --resource-group <name> --name <name> --node-count 3

From an autoscaling perspective, Kubernetes provides a service called the Horizontal Pod Autoscaler (HPA), which monitors the resource demand and scales the number of replicas/instances when required. This service does work with AKS but requires an optional component named the Metrics Server for Kubernetes 1.8+. The HPA checks the Metrics API exposed from the API server, which is part of the cluster master, every 30 seconds. If the metric threshold has been breached, then the scale out is managed by the controller. To check the scale metrics, you can use the Kubernetes command-line tool kubectl with this command:

kubectl get hpa

To set the metric threshold, if the previous command doesn't return any configurations, use the following:

kubectl autoscale deployment <deploymentName> --cpu-percent=70 --min=2 --max=15

This command will add a pod when the average CPU across all existing pods exceeds 70%. The rule will increase up to a maximum of 15 pods, and when the CPU usage across the pods falls below 70%, the autoscaler will decrease them one by one to a minimum of two instances.

Also, you can autoscale a node (the VM on which the pods are running). Although there are some capabilities in the Azure Portal to scale out the number of VMs in the AKS cluster (it is a VMSS product after all), this is not recommended as it will get the AKS cluster controller and etcd out of sync. You can scale a VMSS using PowerShell, Azure CLI, or a supported client. This also is not recommended. It is recommended to always configure scaling or to manually scale using the Azure CLI Kubernetes component, which is accessed via Azure CLI and always followed by aks , for example az aks . The script that follows is an example of how to scale a node:

az aks update --resource-group <name> --name <name> \
  --update-cluster-autoscaler --min-count 2 --max-count 15

Having used the Azure CLI Kubernetes component, as recommended, the AKS cluster is now set to have a minimum of 2 nodes and a maximum of 15. You might be asking yourself, what are the thresholds? Well, in practice, HPA and a node scaler service called the cluster autoscaler are used alongside each other. The cluster autoscaler checks the same metrics that were set for the pod every 10 seconds, versus every 30 seconds as HPA does for the pods. In a technique similar to the one HPA uses to increase the number of pods based on the metrics, the cluster autoscaler focuses on the nodes (aka the VMs) and adjusts them appropriately to match the needs of the pods. You must be running Kubernetes version 1.10.x or later to get the cluster autoscaler feature.

The final topic covered here is one that will handle a burst. I have mentioned the bursting concept a few times in regard to hybrid networks and HPC. That capability is also available when running AKS. In some scenarios, when a rapid burst occurs, there may be some delay in the provisioning of new pods as they wait for the scheduler and controller to get them rolled out; the requested increase in capacity simply hangs in a waiting state until the pods are added into the cluster. When the burst is unexpected and the number of nodes and pods exceeds the upper limit of your configuration, additional nodes simply are not allowed to scale out further. The way to handle such a burst a bit better is to use virtual nodes and our friend ACI. Virtual nodes and their configuration in AKS are an advanced feature and are mentioned here only for your information. You can find more details here:

docs.microsoft.com/en-us/azure/aks/concepts-scale
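
For reference only, the add-on that backs virtual nodes is enabled through the same az aks component. The following is a minimal sketch with placeholder names; it assumes a dedicated subnet for the virtual nodes already exists in the cluster's virtual network, and the full setup involves more steps than this single command:

az aks enable-addons --resource-group <name> --name <name> --addons virtual-node --subnet-name <name>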

Cloud Services

Cloud Services was Microsoft's first PaaS offering. At the time, Cloud Services offered some good capabilities, like the all-powerful autoscaling feature. The problem with Cloud Services was that the barrier to entry was high. To deploy an application to Cloud Services, the application needed to be reconfigured to run in that environment. This meant existing ASP.NET and ASP.NET MVC applications couldn't simply be deployed to and run from Cloud Services. The application code needed to be migrated to a Cloud Services project in Visual Studio, which was acceptable for smaller applications, but there was never a good migration process from an existing on-premise web application to Cloud Services. Many customers still use this service, but let it be known that this product is being deprecated, and you should not move any workloads to it unless there is a justified business case to do so. There is no need to discuss this Azure product in more detail, but it's worthy of mention because at one time it was the only PaaS Microsoft had, and it was good. It has simply been replaced by better Azure products and features.

Windows Virtual Desktop

The word virtual has been popping up a lot in this book, no? Examples are virtual network, virtual machine, and now Virtual Desktop. By now you should have an understanding of what virtual means in the context of computing. You should have no doubt that behind anything virtual there is a physical layer, because nothing virtual exists without real hardware somewhere underneath it. So, you shouldn't have much difficulty making an educated guess when it comes to Windows Virtual Desktop, right? Take a second now to formalize your definition, and then read on. As an additional mental exercise, guess why a company would implement or want such a thing.

To answer the first question, virtual means that the compute resources allocated to process a workload are drawn from a greater physical pool of compute power. But why would a company want to run a bunch of Windows desktops as virtual machines? The answer to the second question has two aspects. The first is client/server, and the other is cost. From a client/server perspective, before the internet/intranet, the big thing was the creation of GUI (desktop) applications using, for example, Windows Forms, which had very complicated client-side business logic. This design approach certainly existed before Model-View-Controller (MVC) and even before the concept of separating presentation and business logic entered into any kind of design-oriented conversation. The computing era I am referring to is the 1990s. If you recall from the introduction, I mentioned that 2013 was the time companies started migrating to the cloud. Now I am also calling for the rewriting or rearchitecting of all non-cloud-based applications. It should take limited effort to recognize how fast technology moves and why there are likely many programs from the 1990s that remain too mission critical and complex to attempt such an upgrade or fundamental modification. This is mostly because the people who coded them are long gone, because no one wants to look at the old code, or because the effort required to make the change is too great for most teams.

The solution, then, was to create a cost-effective, lift-and-shift option that keeps the code and process model intact. That option is Windows Virtual Desktop. If you think about the costs involved in running a client/server solution, one is certainly the server side, and this entire chapter has been about the provisioning and configuration of server-side compute power, along with some tips on its redundancies and how to control its associated costs. The other side of the equation is the client. Those who need to consume a client/server application need a workstation. It isn't like an internet application, where customers use their own workstations to access the code on your server. In this context, it is most commonly an employee, or maybe even a customer at an in-store kiosk, who accesses some kind of backend server application. The cost comes down to having those machines where they are needed, with ample compute power to run the desktop application.

If the workstation is required only to make a remote connection to a virtual desktop running the actual client-side program, then the employee workstation can be much less powerful than the virtual desktop. In addition, does the employee need to access and use the virtual machine 100% of the time? If not, then the compute power utilized, and charged for, to run the desktop can be reduced. If you purchase a workstation for each employee to run the desktop application, then that compute is 100% allocated to that employee and can sit idle much of the time. However, you might save some resources if a shared employee workstation with lower specifications is purchased and the remote shared virtual machine with higher specifications is used only on demand and charged only based on its consumption.

To run a Windows Virtual Desktop solution, the following components are required:

  • An Azure virtual network
  • An Azure Active Directory connected to a Windows Server Active Directory via Azure AD Connect or Azure AD Domain Services
  • The workstations that connect to the Windows Virtual Desktop solution (these can be either physical machines or VMs)

There is much more to this area; the topic is worthy of a book of its own. This section should be enough for you to know the requirements and use case for the Windows Virtual Desktop product. For more information, see docs.microsoft.com/en-us/azure/virtual-desktop/overview, which provides many more details.
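
As a small, concrete starting point for the first item in the requirements list, the following is a minimal sketch of creating the virtual network with the Azure CLI; the names and address ranges are placeholders, and your own environment will almost certainly dictate different values:

az network vnet create --resource-group <name> --name <name> --address-prefixes 10.0.0.0/16 --subnet-name <name> --subnet-prefixes 10.0.1.0/24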

Summary

Just like in the previous chapter, we covered a lot, but I hope that you have gotten your hands dirty, created some Azure products, and configured many of their features. You should feel confident that since you now have a solid understanding of Azure security, Azure networking, and now Azure compute, the probability of passing the Azure Solutions Architect Expert exam is rising rapidly.

Specifically, in this chapter, the key takeaways are that although compute power is advertised as unlimited, there are some limits. For the majority of consumers, however, those limits will never be reached, either because of cost or simply because such large workload computations are unnecessary. Also, you know that if you do ever hit a compute threshold and you have a justified business case, those limits can be lifted and more resources can be provided. The limits exist to protect you from receiving an outrageous bill.

Containers and images are gaining a lot of traction and can help simplify deployments onto IaaS and PaaS architectures, as well as help transform applications to run as microservices or AKS-orchestrated solutions. Docker is a leading technology for creating containers. Azure VMs have a vast catalog of VM types, with series that target CPU, GPU, memory, or high-storage workloads. You can run the Windows operating system and almost any version of Linux, and remember, when you want to move an on-premise workload to an Azure VM, the tool of choice is Azure Migrate. Also recall that the similarity in name between availability sets and VM Scale Sets doesn't mean they provide similar capabilities. Availability sets have to do with zones and fault and update domains, and they are usually implemented along with a tiered monolithic architecture model. A VMSS is a pool of VMs that are provisioned using the same image.

Azure App Services and Azure Functions provide great platforms for running web applications, Web APIs, and serverless workloads. Being PaaS, they both eliminate the maintenance of the operating system and third-party runtimes. That loss of control can also mean your application cannot run on the platform, but in that case you can either choose to run in a container or move over to IaaS. Don't forget about WebJobs, which run batch jobs, and Azure Batch, which also runs batch processing but at a galactic scale.

Lastly, you learned about Service Fabric and the Azure Kubernetes Service (AKS), which focus on microservices and containerization. Service Fabric can support containers and some open source stacks; however, its primary strength is running and managing .NET-based microservices. AKS is a robust open source orchestrator targeted at containerized open source applications running on Linux.

Key Terms

App Service Environment (ASE)
App Service Plan (ASP)
Azure Container Registry (ACR)
Azure Container Storage
Azure Marketplace
Azure Reserved VM Instance
batch jobs
Blue Screen of Death (BSOD)
Business Continuity and Disaster Recovery (BCDR)
Cloud Bursting
Cloud Optimized
Cluster autoscaler
Command Line Interface (CLI)
Common Language Runtimes (CLR)
Compute
Container as a service (CaaS)
Containerized
Create, Read, Update, Delete (CRUD)
Cross-Origin Resource Sharing (CORS)
Database Management System (DBMS)
Disaster Recovery (DR)
Docker
Electronic Data Interchange (EDI)
ephemeral disk
fault domains
Functions as a service (FaaS)
Global Availability (GA)
Graphical Processing Units (GPU)
Graphical User Interface (GUI)
High Performance Computing (HPC)
Horizontal Pod Autoscaler (HPA)
hyper-scale
Infrastructure as a service (IaaS)
Input/Output operations per second (IOPS)
Integrated Drive Electronics (IDE)
Internal Load Balancer (ILB)
lift and shift (aka rehost)
Microsoft Distributed Transaction Coordinator (MSDTC)
Microsoft Message Queuing (MSMQ)
Microsoft Virtual Machine Converter (MVMC)
Model-View-Controller (MVC)
Orchestration
OS level virtualization
Page Blob
Platform as a service (PaaS)
Remote Direct Memory Access (RDMA)
Representational state transfer (REST)
Resource lock
runtime
Service Oriented Architecture (SOA)
Simple Object Access Protocol (SOAP)
Small Computer System Interface (SCSI)
Software Development Kits (SDK)
update domains
Virtual Hard Disk (VHD)
Virtual Machine Scale Sets (VMSS)
virtual nodes
Web API
Web Application
Windows Communication Foundation (WCF)
Windows Subsystem for Linux (WSL)

Exam Essentials

  • Understand Azure Virtual Machines and scale sets.   Azure Virtual Machines is the most utilized compute product in Azure. It is also the most important area of compute to completely understand when it comes to the Azure Solutions Architect Expert exam. Knowing how to migrate, move, and create VMs, their limits, and the blessed (aka supported by Microsoft) images are must-know concepts.
  • Understand Azure Container Instances.   Containers are the new thing, and you need to get up to speed with them, not only regarding open source technologies but also from a Windows and Azure perspective. Docker is the driver of this concept, but Azure Container Instances, the Azure Container Registry, and AKS are products that prove Microsoft is serious about it.
  • Understand App Services and Azure Functions.   These PaaS and FaaS cloud services are great for getting your foot in the door of cloud computing. App Services are for running web applications and Web APIs, while Azure Functions are for small but rapid processing of messages that arrive on a work queue.
  • Understand Azure Batch and high-performance computing.   You know about WebJobs and what a batch job is, but Azure Batch and HPC are at another level. Machine learning, AI, Big Data, gaming, and visual rendering are all aspects of compute that require great CPU, GPU, memory, and storage resources. This is the kind of compute power governments need and use; supercomputer-class resources are also available to Microsoft Azure customers.
  • Understand the orchestrators Service Fabric and AKS.   Running large-scale .NET-based microservices and open source containerized application images are some of the newer kinds of architectural patterns. Using these orchestrators allows an administrator to properly scale, patch, and monitor large-scale workloads built on this new architectural design pattern.

Review Questions

  1. Which of the following cloud service models allow the customer to control the runtime? (Select all that apply.)
    1. IaaS
    2. FaaS
    3. CaaS
    4. PaaS
  2. Which Azure compute product would you choose to host a Web API considering compute cost and maintenance requirements?
    1. Azure VM
    2. Azure Function
    3. Azure App Service
    4. Azure Batch
  3. Which of the following Azure compute products support containerization? (Select all that apply.)
    1. Azure Container Instances
    2. Azure App Services
    3. Azure Container Registry
    4. Azure Functions
  4. What is the difference between a VM and a container?
    1. Only VMs are allocated a dedicated amount of compute resources; containers are scaled automatically based on load.
    2. A container can scale both out and up while a VM can only scale out.
    3. After provisioning a container image, the originally allocated compute resource cannot be changed without a rebuild and redeployment.
    4. When an Azure container is created, it must target either Windows or Linux OS.
  5. Which of the following actions do not deallocate an Azure VM? (Select all that apply.)
    1. az vm stop
    2. Stop-AzVM
    3. Logging off from an RDP connection
    4. Remove-AzResourceGroup on the resource group that the VM exists in
  6. What is true about Azure virtual machine scale sets (VMSS)?
    1. VMSS can be deployed into only two Availability Zones at any given time frame.
    2. The VMSS architecture is intended to run tiered applications that contain a front-end tier, a middle tier, and a database or backend tier.
    3. VM instances in a VMSS cluster are identical.
    4. Scaling out additional VMSS instances will not have any negative impact so long as your application is stateful.
  7. Which type of WebJob can be remotely debugged?
    1. Continuous
    2. Triggered
    3. Manual
    4. Singleton
  8. In which scenarios would you choose an ephemeral disk over managed? (Select all that apply.)
    1. You must use the lowest cost option.
    2. Your application is stateful and cannot be reimaged without impact.
    3. Your disk capacity requirement exceeds 65,535GB (i.e., 2¹⁶GB).
    4. Your application is stateless and can be reimaged without impact.
  9. Which of the following are true about both Service Fabric and Azure Kubernetes Service (AKS)? (Select all that apply.)
    1. The concept of a node is the same for Service Fabric and AKS.
    2. Both AKS and Service Fabric run on VMSS.
    3. Service Fabric is best suited for .NET Framework development stacks on Windows, and AKS is best suited for open source stacks on Linux.
    4. AKS and Service Fabric are orchestrators that manage clusters and nodes that run microservices.
  10. True or False? All of the following components are required to implement a Windows Virtual Desktop solution.

    Azure Active Directory

    A physical workstation

    Azure Virtual Network

    Azure AD Connect

    1. True
    2. False
  11. Which of the following are true about managed disks? (Select all that apply.)
    1. Managed disks are physically connected to the Azure VM.
    2. Managed disks can be used with both Windows and Linux VMs.
    3. The contents of managed disks are stored on Azure Storage as a page blob.
    4. Managed disks provide better performance when compared to an ephemeral drive.
  12. Which of the following represent a use case for running your workloads on Azure Virtual Machines? (Select all that apply.)
    1. Creating a BCDR footprint on Azure
    2. Non-cloud-optimized, lift-and-shift migration
    3. Content-based web application hosting
    4. High-performance compute batch processing
  13. Which of the following represent a use case for running your workloads on Azure App Services? (Select all that apply.)
    1. Your website requires a third-party toolkit.
    2. A website that exposes a REST API of relative complexity
    3. A website that requires no operating system–level configurations
    4. A website that is not containerized
  14. Which of the following represent a use case for running your workloads on App Service Web App for Containers? (Select all that apply.)
    1. You want to utilize an existing Docker container.
    2. The website currently runs Linux.
    3. You want to run PHP or Node on Windows.
    4. Your website requires a third-party toolkit.
  15. Which of the following represents a use case for running your workloads on Azure Functions?
    1. The code requires significant CPU and memory.
    2. Managing your cost is of utmost importance.
    3. Your code has a long startup/warmup time.
    4. Content-based web application hosting
  16. True or false? Azure Container Instances (ACI) requires the provisioning of Azure VMs, while Azure Kubernetes Service (AKS) does not.
    1. True
    2. False
  17. Which of the following are true regarding Azure Container Instances (ACI)? (Select all that apply.)
    1. Tightly aligned with Docker concepts and features
    2. Supports only the Windows operating system
    3. Best used for small to medium workloads
    4. Has comparable orchestrator capabilities when compared to Azure Kubernetes Service (AKS)
  18. Which of the following are true regarding Azure Kubernetes Service (AKS)? (Select all that apply.)
    1. AKS offers similar capabilities as Service Fabric.
    2. AKS only supports applications that can run on Linux.
    3. JSON syntax is primarily used for configuring AKS.
    4. You can integrate Visual Studio and Azure DevOps into your AKS development and deployment processes.
  19. Which one of the following is true concerning the Azure Container Registry (ACR)?
    1. ACR and the Docker Hub serve the same purpose.
    2. ACR containers are private and require a user ID and password to access and deploy.
    3. You can only deploy images contained in ACR into the Azure region where the ACR itself is deployed.
    4. Both options 1 and 2
  20. Which of the following is true regarding Windows Virtual Desktop? (Select all that apply.)
    1. You must connect to a Windows Virtual Desktop from a physical machine.
    2. An Azure Virtual Network (VNet) is required to implement a Windows Virtual Desktop solution.
    3. You can connect to a Windows Virtual Desktop either from a physical machine or from a VM.
    4. Windows Virtual Desktop reduces costs by optimizing the use of compute resources.