EXAM AZ-303 OBJECTIVES COVERED IN THIS CHAPTER:
- Implement and Monitor an Azure Infrastructure
- Implement VMs for Windows and Linux
- Implement Management and Security Solutions
- Manage workloads in Azure
- Implement Solutions for Apps
- Implement an application infrastructure
- Implement container-based applications
EXAM AZ-304 OBJECTIVES COVERED IN THIS CHAPTER:
- Design Infrastructure
- Design a compute solution
- Design an application architecture
Companies or developers expose their applications to consumers or employees via a computer. A computer is often visualized as a workstation or laptop on the desk or table in front of you. However, the computer I am referring to here is a server in a data center whose sole purpose is to provide a service instead of consuming one. That's clear, right? But because servers play such a significant role in the existence of an application, compute products are often wrongly chosen as the first point of entry into Azure design. Compute, aka the hosting model, is not the most optimal entry point for production workloads. As you already know, security and networking components need to be considered prior to compute.
Beginning with the creation of an Azure compute resource before the other two steps (security and networking) can lead to growing pains or leaving your application vulnerable to bad actors. For example, did you make your subnet too small for the planned number of resources being placed into it, or do you have control over who can create Azure resources in your subscription? Do you need to synchronize your on-premise Active Directory with Azure to run restricted applications on compute resources hosted in the cloud? What region do you need to place your compute resources into? Those are just a few of the questions needing answers that could be overlooked by jumping straight to compute.
By no means am I attempting to convince you to underestimate the importance of compute. Compute is where the magic happens and should be considered the heart of your IT solution. Without it, there is no place to run the service your program was written to provide. With no compute, there is no need for a network nor any kind of security implementation. You can even argue the importance of compute from a database perspective. It is great to have data, and companies have a lot of it, but without a computer to capture, interpret, manipulate, and present it, what value does data have? The point is that compute is a place where you can realize impact and show progress the quickest, which makes it a favorable entry point. However, doing so would showcase short-term thinking and lack of structured planning. Therefore, it is recommended you follow the process discussed so far (first security, then networking, then compute). By doing so, the odds of successful migration or creation of your solution onto Azure will greatly increase.
“Excellence is never an accident. It is always the result of high intention, sincere effort, and intelligent execution; it represents the wise choice of many alternatives—choice, not chance, determines your destiny.”
—Aristotle
Although Aristotle was referring to life in that quote, the same principle can be applied to choosing Azure compute products; there are a lot of choices, and making the wrong choice, as in life, can have undesirable consequences. Moving to or creating a new solution on Azure unfortunately doesn't get much easier at this point. Compute is simply the next level of technical competency required to progress toward your ultimate goal of Azure proficiency. Choosing which Azure compute resources you need, how much of them, how much they cost, and how you get your application configured to run on them, can be a great challenge. But with good intentions and sincere effort, it is possible.
This chapter explains the details of Azure compute, which is synonymous with the hosting model. To make sure you know what hosting model means, take a look at the exam objectives covered in this chapter. To call out a few, Azure VMs, Service Fabric, App Services, and Azure Functions are hosting models. Each of these compute products has specific benefits, use cases, and limitations that require clarity so that the best one is chosen for the workload being created or moved to Azure. The specifics of each hosting model will be discussed in more detail in their dedicated sections. Until then, let's focus on two key elements: cloud service models and how to choose the right hosting model.
I expect you to already know these cloud models if you are now preparing for the Azure Solutions Architect Expert exam; however, a review will cause no harm. You will know them as IaaS, CaaS, PaaS, and FaaS. There are numerous other “-aaS” acronyms for cloud service models; however, those listed previously are the most common in the Azure context. Refer to Figure 4.1 for a visual representation of them and read on for their description.
Infrastructure as a service (IaaS) was one of the earliest cloud service models, if not the first. This model most resembles the long-established on-premise architecture where a software application executes on a stand-alone or virtual server. The server is running an operating system and is connected to a network. As shown in Figure 4.1, the cloud provider (i.e., Microsoft) is responsible for the hardware and its virtualization, while the operating system (Windows or Linux) is the responsibility of the customer. The customer's responsibilities include, for example, updating the OS version and installing security patches. The customer has great control and freedom with this model, but with that comes greater responsibility. Microsoft only commits to providing the chosen compute power (CPU and memory) and ensuring that it has connectivity to a network. All other activities rest in the hands of the customer. Azure Virtual Machines, discussed in more detail later, is Microsoft's IaaS offering.
Container as a service (CaaS) is one of the newer cloud service models on Azure. CaaS delivers all the benefits available in IaaS, but you drop the responsibility for maintaining the operating system. This is visualized in Figure 4.1 as the operating system box is now the same shade as the hardware and virtualization boxes. A popular containerization product is Docker, which allows the bundling of software, dependent libraries, and configurations into a single package. That package can then be deployed into and run on a container hosting model. Azure Kubernetes Service (AKS), Azure Container Instances, Azure Service Fabric, and Web App for Containers all provide the service for running containerized packages on Azure, all of which are discussed later.
Platform as a service (PaaS) is an offering for customers who want to focus only on the application and not worry about the hardware, network, operating system, or runtimes. This offering comes with some restrictions, specifically in regard to the runtime. A runtime is the library of code that the application depends on, for example, .NET Framework, .NET Core, Java, PHP, or Python. The dependency on the runtime is defined during development. For example, if your application targets Python 3.7.4 but that runtime is not available on the PaaS hosting model, then your application will not run. The same goes for .NET Core 2.2: if you target 2.2 during development but that runtime is not yet on the platform, your application will not run either. Making changes to the runtime is not allowed when running on a PaaS platform. There are other constraints as well, such as a prohibition on changing or deleting operating system–level configurations; because you are not responsible for the operating system, you cannot make changes that might damage it.
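Because the runtime on a PaaS host is fixed, a common defensive technique is to verify compatibility when the application starts. Here is a minimal sketch in Python; the function name and version numbers are illustrative, not part of any Azure SDK:

```python
import sys

# A minimal sketch (not an Azure API) of a fail-fast startup check: verify
# that the platform's runtime is at least the version the application was
# developed against, since a PaaS host's runtime cannot be changed.

def runtime_is_compatible(required, actual=None):
    """Return True when the hosting platform provides a compatible runtime."""
    if actual is None:
        actual = (sys.version_info.major, sys.version_info.minor)
    return actual >= required

# At startup, the application might refuse to continue on a mismatch:
if not runtime_is_compatible((3, 0)):
    raise RuntimeError("Hosting platform runtime is older than required")
```

Failing fast like this surfaces a runtime mismatch at deployment time rather than as a confusing error deep inside the application later.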
Notice the shaded Unit of Scale box on the far right of Figure 4.1. The similarly shaded boxes under the titles IaaS, CaaS, PaaS, and FaaS symbolize where scaling rules (or manual scaling) are applied when executed. Scaling is a standard PaaS feature but can also be realized using virtual machine scale sets in IaaS. When a scale command is executed, a duplicate instance of your application is brought online for user consumption. The duplication is from the virtualization level upward, whereby the VM on which the PaaS runs will have the same specification. For example, if four CPUs and 7GB of memory (a common size of a single-instance PaaS production workload) are chosen, then the operating system with all patches, the runtime, the containers, and the application code will be scaled, making all the instances identical.
Autoscaling or scaling has not been discussed in much detail so far, but the concept has been touched on. Again, scaling is the most valuable and cost-effective offering that exists in the cloud, from a compute perspective, because it optimizes utilization, which reduces the overall cost of compute power. Products such as Azure App Service and Cloud Services (which is being deprecated) are Microsoft's PaaS offerings.
The final cloud service model to describe is functions as a service (FaaS). FaaS is most commonly referred to as serverless computing and is offered via a product called Azure Functions. Unlike the previously discussed cloud service models, FaaS does not require the creation of a compute instance. When creating an instance of Azure VM or an Azure App Service, each of those services requires the selection of an SKU, which describes the number of CPUs, amount of memory, and storage capacity. In the FaaS context, this is not required; instead, you simply create the Azure Function and deploy your code to it. The platform is then responsible for making sure there is enough compute capacity to execute the code. This simply means that the scaling is done for you. There are some restrictions such as the length of time an Azure Function can run, and there is a limit on the amount of capacity you can get allocated. Both of those limits and other limitations to watch out for will be covered in more detail later in the chapter.
Buying the right Azure compute product and getting it to work properly depends greatly on understanding your own application's requirements. Can your code run successfully using Azure Functions (FaaS), or does your solution require options only available through Azure VM (IaaS)? You might be thinking, “I know my application well. I know its dependencies, but I don't know what is and what is not supported on each of the Azure compute products. How do I get started?” I can relate to that due to the sheer number of Azure compute options; it can be an overwhelming situation and a cause of great uncertainty.
One reason you may be unsure as to which compute option to use is because you have not finished reading this chapter yet. All the details you need to know to make an educated decision are included in this chapter. But to get started, take a look at the decision diagram presented in Figure 4.2. The diagram is intended only to get you started and to narrow down the number of possible options to a more manageable amount. Notice there are seven possible compute options presented in the diagram; if it helps you reduce the number of options to two or three, then consider that a good thing.
Let's walk through the decision tree together starting with the choices for creating a new application or migrating an existing one (bold words in the discussion relate to a step in Figure 4.2). For the Azure Solutions Architect Expert exam, understanding the migration of existing solutions to Azure is most important, so let's focus specifically on that path. The answer to Create New, therefore, is no. The next decision point is Migrate. This may be a bit confusing, because of course you are migrating, but the question to answer here is whether you plan on making any cloud optimizations to the solution. Cloud optimizations are discussed later in the “Azure Compute Best Practices” section. But for now, the decision is, will you simply lift and shift (aka rehost), or will you do some optimizations to make the program behave better on cloud architecture?
Assuming you will simply lift and shift with no cloud optimizations (not Cloud Optimized) immediately reduces the number of recommended options to three Azure compute products.
Can your solution run within a container; that is, can it be Containerized? If yes, then proceed toward Azure Container Instances as the compute product for your application code. If no, is the product a Web Application or a Web API? If yes, then the best option for Azure compute would be an Azure App Service. If the code being migrated to Azure is not web/internet-based, then your best choice is an Azure VM.
Now go back and take a look at the decision tree where we chose no for making cloud optimizations; this time choose yes for Cloud Optimized. Notice that electing to make cloud optimizations to your existing application increases the number of available compute options. This is a good decision because many of the Azure compute products that support the most technically advanced cloud capabilities require tweaking to get the code functional; just a simple lift and shift is not enough to get the most advanced technical benefits. In many cases, these other cloud-optimized compute options are more cost effective. Lastly, as you will learn in Chapter 7, “Developing for the Cloud,” there are some specific technical concepts that exist in the cloud that you must be aware of. These concepts may not be intuitively obvious to those having only experience with creating and maintaining IT solutions on-premise.
Next, answer the question about HPC. High-performance computing (HPC), aka Big Compute, is an IT solution that uses a large amount of CPU, GPU, and memory to perform its function. These kinds of workloads are typically used in finance, genomics, and weather modeling. The amount of compute power for these processes is huge. If your application falls into the HPC category, then Azure Batch is the place to begin your research and analysis. If not, can your application run in a serverless context? Azure Functions is the Microsoft serverless compute offering. The primary difference between Azure Batch (HPC) and Azure Functions is the size and scale required for processing. The program triggered from an Azure Batch job can be small and compact, but the amount of compute power required to run it would likely consume more CPU and memory than is available from an Azure Function. Azure Functions, too, should be small and compact; the amount of CPU and memory they consume is more mainstream and can be large, just not jumbo. In both cases, HPC and serverless workloads are scaled dynamically for you so that the program will successfully complete; the scale is what differs. This will become clearer as you read through the chapter. Don't worry.
In reality, all Azure compute products are running on Azure VMs behind the scenes. Azure App Services, Azure Batch/HPC, Azure Functions, Azure Containers, Service Fabric, and Azure Kubernetes Service (AKS) all run on Azure Virtual Machines. The remaining three compute options are focused primarily on, but not limited to, deployment, maintenance, and failover tasks. These capabilities are commonly referred to as the orchestration of containerized workloads. I would go so far as to confidently state that most legacy enterprise applications cannot simply be containerized and orchestrated without significant investment in both application redesign and IT employee training. The concepts of containerization, orchestration, and maintenance didn't exist a little more than a decade ago. That being said, if the application would not benefit from Full Orchestration, then Azure Container Instances is a recommended point of entry for the solution. Service Fabric is focused on the Microsoft stack (.NET and Windows), and AKS is focused on open source stacks (PHP, Python, Node.js, and Linux).
The decision tree is intended as a starting point. I hope after reading the previous text and viewing the flow chart, things are not as overwhelming as you might have initially thought. Before we get deeper into the specific Azure compute products, take a look at the following two sections, which contain some general information to consider as you begin the procurement of Azure compute.
The decision-making process surrounding Azure compute products requires a solid understanding of the technical requirements of your application. Proficient knowledge of the application's style and architectural pattern is an added necessity. The style of application has great impact on the combined architectural pattern and its defined use cases. Here, use cases refers to the services that provide the application's purpose. The following bullet points give a quick overview of the three topics that are discussed briefly in this section:
A previous discussion around the decision tree flow had to do with HPC versus Azure Functions. There, I linked HPC with its Big Compute style. The Big Compute style is a rather standard architectural pattern. In general, there is a scheduler, like Azure Batch, which also coordinates the tasks on the provisioned Azure VM worker pool (see Figure 4.3).
The tasks are commonly either run in parallel when there is no dependency between them or coupled when more than a single task needs to run on the same resource in sequence. Perhaps the use case here is number computation for Monte Carlo simulations. The point is that a Big Compute architecture pattern would mimic Figure 4.3 to some extent most of the time in this application style. If you implement a different pattern for that style, then you may have availability or performance issues because the wrong approach was implemented. Take next, for example, Azure Functions, which is event-driven. This style would have an event producer, event ingestion, and one or more event consumers, as visualized in Figure 4.4.
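The scheduler/worker-pool shape of Figure 4.3 can be sketched in miniature. In this hedged example, a local thread pool stands in for an Azure Batch worker pool, and each independent task contributes samples to a Monte Carlo estimate of pi; the pool size and sample counts are illustrative only, and a real Big Compute job would fan tasks out to provisioned VMs rather than local threads:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sample_task(samples, seed):
    """One independent work item: count random points landing inside the unit circle."""
    rng = random.Random(seed)
    return sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )

def estimate_pi(tasks=8, samples_per_task=50_000):
    """The scheduler role: fan independent tasks out to a pool, then aggregate."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        hits = pool.map(sample_task, [samples_per_task] * tasks, range(tasks))
        return 4.0 * sum(hits) / (tasks * samples_per_task)
```

Because the tasks share no state, they can run in parallel in any order, which is exactly the property that lets a Big Compute scheduler spread them across a large VM pool.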
An event producer could be numerous IoT devices that are measuring humidity and updating a Service Bus message queue. The Service Bus is providing the event ingestion service. The humidity reading is then consumed or processed and stored into a database, for example by the Azure Function, i.e., the event consumer. Each style has a recommended or common architecture pattern, and each pattern is covered in detail in Chapter 7. Keep in mind that the architecture on which the application executes also has best-practice design principles and patterns. This means that in every case one must understand which compute product to use, know how it works, and then build the application following the best-case patterns for that hosting model; read Chapter 7 for those details.
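That producer/ingestion/consumer flow can be sketched in-process. In this hedged example, a plain Python queue stands in for the Service Bus, and a simple function plays the Azure Function's role as event consumer; all names and data are illustrative:

```python
import queue

# In-process stand-ins for Figure 4.4's roles: the queue plays the Service
# Bus (event ingestion), and the consume() function plays the Azure Function.

ingestion = queue.Queue()   # stands in for the Service Bus queue
database = []               # stands in for the destination data store

def produce(device_id, humidity):
    """Event producer: an IoT device publishing a humidity reading."""
    ingestion.put({"device": device_id, "humidity": humidity})

def consume():
    """Event consumer: drain the queue, persist each reading, report the count."""
    processed = 0
    while not ingestion.empty():
        database.append(ingestion.get())
        processed += 1
    return processed
```

The key property the sketch preserves is decoupling: producers and the consumer never call each other directly, so either side can scale or fail independently.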
To touch only briefly on Azure design principles, there is one that jumps out and is worthy of mention here and again later. The principle is referred to as design for self-healing. Self-healing means that when a server is recognized as not being healthy, the platform, in this case Azure, takes an action to make it healthy again. In many cases, this is a reboot of the virtual machine. In the cloud context, there is a term called hyperscale, which means so much capacity is being added and removed so quickly that managing it exceeds human capacity. There is no chance that Microsoft could hire enough people with the right skills to manage all the servers that exist in the Azure cloud today; it must be automated.
The health of an application is not the responsibility of the cloud provider; however, an unhealthy application can cause the host (i.e., the virtual machine) to become unhealthy. For example, when there is a memory leak, when storage capacity is 100% consumed, or when there is a fatal Blue Screen of Death (BSOD), the server and the application will no longer be usable. Some action needs to happen to bring the application back online, and that action cannot be a manual one. That action is called auto-heal or self-heal. That brings you to the conclusion that when writing code, your application must be able to withstand a self-healing event when a failure occurs.
One cloud design pattern for handling self-healing is called retry, as illustrated in Figure 4.5. Assume that for some reason the VM that the Azure App Service is making a connection to was determined to be unhealthy and is performing a recycle of the website.
If the site has high traffic, then during that recycle, it is probable that a request to the VM will fail. In the application code, you must handle the exception and perform a retry of the request that just failed. It does depend on the scenario and requirements of your application. It might be okay to simply return an exception message to a client requesting a document, for example, while not so acceptable if an exception is returned while placing an order. The preceding few sentences should now clarify my previous comment that you must be aware of specific technical concepts that exist in the cloud that may not be intuitively obvious to those having only experience with creating and maintaining IT solutions on-premise. Exceptions certainly happen when running applications on-premise, but most on-premise applications have support teams that can connect to the machine via RDP and manually correct the problem. This is not always the case in the cloud; the scale is simply too large for manual activities to be the norm. Therefore, instead of manual actions, recovery is performed by the platform automatically. All styles, principles, and patterns are discussed in detail in Chapter 7; if you are interested in learning more about them now, skip ahead and learn more.
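A minimal sketch of the retry pattern follows, assuming a transient failure that clears once the host finishes recycling. The exception type, attempt count, and delays are illustrative; production code would catch the specific transient exceptions its SDK actually raises:

```python
import time

# Sketch of the retry cloud design pattern: reissue a failed operation a
# bounded number of times, backing off between attempts so a recycling
# host has time to come back online.

class TransientError(Exception):
    """Illustrative stand-in for an SDK's transient failure exception."""

def with_retry(operation, attempts=3, base_delay=0.01):
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise                                        # give up: surface the failure
            time.sleep(base_delay * 2 ** (attempt - 1))      # exponential backoff
```

Note the two deliberate bounds: a maximum attempt count so the client does not hammer an unhealthy host forever, and a growing delay so each retry gives the platform more time to self-heal.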
Chapter 7 has in-depth coverage of cloud best practices, styles, principles, and patterns. The awareness of these concepts at this point, however, is necessary because each will influence the decision of which Azure compute product to deploy or create your application on. From a best-practice perspective, some decision points are again based on the requirements of the application. There are best-practice recommendations for applications that utilize and expose APIs and for applications that are background jobs. In addition, best-practice guidelines exist for implementing autoscaling, monitoring and diagnostics, caching, and how to best recover from a transient failure.
From an API best-practice perspective, applications are best suited for supporting clients and consumers if they implement a Representational State Transfer (REST) API. REST APIs are endpoints that typically expect requests and deliver responses in the form of JSON documents. There are numerous other technologies that support this kind of internet API capability, such as Electronic Data Interchange (EDI), XML documents, Tuxedo, Simple Object Access Protocol (SOAP), and the Windows Communication Foundation (WCF) framework. Each of those techniques would work, but the best-practice recommendation, when your requirement is to expose an API using the HTTP protocol, is to use the REST API architectural style.
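To make the REST style concrete, here is a small hedged sketch showing resource-oriented paths, HTTP verbs, and JSON bodies. The route and resource names are invented for illustration; a real service would sit behind a web framework or server rather than a bare dispatch function:

```python
import json

# Illustrative in-memory "data store" for a hypothetical orders resource.
ORDERS = {"1001": {"id": "1001", "status": "shipped"}}

def handle(method, path):
    """Dispatch 'GET /orders/<id>' REST-style; return (status code, JSON body)."""
    parts = path.strip("/").split("/")
    if method == "GET" and len(parts) == 2 and parts[0] == "orders":
        order = ORDERS.get(parts[1])
        if order is None:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(order)
    return 405, json.dumps({"error": "unsupported route or method"})
```

The shape to notice is that the URL identifies a resource, the HTTP verb identifies the action, and the payloads are plain JSON documents, which is what makes REST endpoints easy for any client to consume.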
From a background job perspective, there are as many options as there are scenarios in which background jobs operate. A background job is a program that runs on a computer without a user interface and typically processes data and delivers the results of that processing. There are two primary scenarios to discuss regarding background processing: how to trigger it and where to run it from. Triggering refers to how the program gets started. As mentioned, there is no interface with a button to click that tells the program to run. Instead, the background process can be scheduled to run at certain intervals or triggered when an event takes place. Running at a scheduled interval is relatively straightforward; CRON is the most common scheduler for this scenario. The other scenario is much more dependent on the type of event and what the requirements are. An event is somewhat synonymous with a message, and all the messaging Azure products are discussed in Chapter 6. There are a number of them, and all have their best-case scenarios, use cases, and software development kits (SDKs). In short, the background process would be hooked into a queue of some kind where a message would be sent (remember the event-driven diagram from Figure 4.4). When a message is received, the hook is triggered that invokes the background job, which then performs the code contained within it. Which hosting environment to use also plays an important role, as many Azure compute products can be used to run APIs and background jobs. Azure App Service WebJobs and Azure VMs are well suited for running background jobs and supporting APIs. Azure Batch and Azure Kubernetes Service (AKS) can also be used for running background jobs, but Azure Batch is not intended to host APIs, as you would imagine.
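The interval-trigger scenario can be sketched with the standard library's scheduler standing in for CRON or a WebJob timer trigger. The job body, run counts, and intervals below are illustrative:

```python
import sched
import time

results = []  # stands in for wherever the job delivers its output

def background_job(run_number):
    """The headless work: process data and record the outcome, no UI involved."""
    results.append(f"run {run_number} complete")

def run_on_schedule(runs=3, interval=0.01):
    """Fire background_job at fixed intervals, as a CRON schedule would."""
    s = sched.scheduler(time.monotonic, time.sleep)
    for i in range(1, runs + 1):
        s.enter(interval * i, 1, background_job, argument=(i,))
    s.run()  # blocks until every scheduled run has fired
```

The event-triggered scenario replaces the timer with a queue hook, as in the Figure 4.4 discussion: the job stays idle until a message arrives and invokes it.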
By the time you complete this chapter, it will be clear which model to use for which application style; however, you will need to know what patterns your application implements and then search the best-practice patterns for that one specifically.
If you intend to implement autoscaling, custom monitoring, diagnostics capabilities, or caching, there are some good examples of how to do so. Again, these are discussed in Chapter 7. A description of a transient failure and the expectation of its occurrence in the cloud are worthy of some initial discussion. From an on-premise perspective, transient errors, or any error caused by a network virtual appliance hang, memory failure, or hard drive crash, are not expected. This is because those hardware components are of brand-name quality and are expected to have a high level of stability (which you pay a premium for). In the context of hyperscale cloud architecture, the hardware is more of a commodity. That means when it fails, it simply gets replaced, and the old one is trashed. That's fine, but those failures can produce what is called a transient event: an event that happens and is self-healed after some short amount of time. This is something that isn't expected often when running on-premise but needs to be coded for when running in the cloud because it can and will happen more often. The application must gracefully recover from short, random, nonreproducible moments of downtime as a matter of design.
In conclusion, each best-practice example is bound to a cloud service model, hosting model, style, principle, and patterns that are implemented into or required by your application. Figure 4.6 illustrates a Venn diagram to visually represent the relationship between these concepts.
The sweet spot directly in the middle of those five concepts is where you want your application to land. Finding that spot is no easy task; however, knowing the requirements to make an educated decision is the first step. The rest comes from experience. You will gain some of that experience as you complete the exercises in this chapter. You should be able to answer questions like the following:
Azure Container Instances (ACI) is Microsoft's container as a service offering and an entry point for customers with applications that run within isolated containers, where a container is an application or program packaged with all its dependencies and deployed onto a server for execution. The package is often referred to as an image, while the running instance is referred to as a container. The following are a few benefits of running application code in containers:
From a portability perspective, containers allow a developer or release manager to have confidence that the code in the container will run anywhere: on a local development environment, on a corporate on-premise physical server, or in the cloud. It will run anywhere without configuration or coding changes because it carries its dependencies with it. A container is lightweight because it reuses the operating system of the host; there is no need for an operating system to be deployed along with the container. It is flexible because all kinds of programs, whether small, large, simple, or complex, can run in a container. Containers are scalable, which means the number of servers simultaneously running the container can be increased to provide more compute power when usage grows.
When a container package is deployed to ACI, it receives a public-facing IP address and domain name with the extension *.<region>.azurecontainer.io, where * is the name of the ACI container, which must be unique for the given region. A region, as covered in detail in Chapter 3, is the geographical location that the ACI will be installed into; it is selected during the container's creation. Regions are, for example, northeurope or southcentralus. Keep in mind that ACI is not yet available in all Azure regions; however, the global deployment is ongoing and will reach worldwide scale in the short term.
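The naming scheme is simple enough to express directly. This tiny sketch builds the public FQDN from a DNS name label and a region; the label and region values in the usage comment are examples, not reserved names:

```python
def aci_fqdn(dns_label, region):
    """Build the public FQDN an ACI container receives, per the pattern above."""
    return f"{dns_label}.{region}.azurecontainer.io"

# e.g., aci_fqdn("my-container", "northeurope")
```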
ACI offers support for containers running on either Windows or Linux. The container concept has been mostly focused on the open source community and therefore Linux, so you will find the number of supported features for Linux images is greater than that currently for Windows. Table 4.1 lists the currently supported Windows base images on ACI.
TABLE 4.1 ACI-Supported Windows Images
Operating System | Edition | Version
---|---|---
Windows Server 2016 | Nano Server | 10.0.14393.*
Windows Server 2016 | Server Core | 10.0.14393.*
Windows Server 2019 | Nano Server | 10.0.17763.*
Windows Server 2019 | Server Core | 10.0.17763.*
Windows Server 2019 | Windows | 10.0.17763.*
The thing about Linux is that there are a lot of versions and editions. And when I write “a lot,” I mean a lot. In reality, there can be an infinite number of versions and editions because the Linux operating system is open source, and if I were so inclined, I could create my own version and edition of Linux, and it would run on the ACI platform. Therefore, there is no table that defines which Linux images will run on ACI. Be assured, however, that mainstream versions of Linux will deploy and run as expected on the Azure Container Instances platform.
If you are like me, meaning most of your career has been in the Microsoft world of Windows and the .NET Framework, then the concept of images and containers may be a bit of a mystery. This is common because, like I wrote earlier, this concept was mostly confined to the open source community, which until recently Microsoft was not actively engaged in. Although there are numerous products that provide OS-level virtualization, the most common one is Docker. The Docker software is less than a decade old, which may seem like a long time when considering “cloud speed” but in reality is not. Docker became a publicly available open source product in 2013 and only became supported on Windows with Windows Server 2016, which shipped in 2016. Based on that, I am confident you will agree containers are a relatively new concept, especially from a Windows perspective. Be assured, however, that this area is picking up steam and will become a must-know skill set, especially for an Azure Solutions Architect Expert.
Let's look at what OS-level virtualization is in a bit more detail. Take a look at Figure 4.7, which compares a virtual machine with a container. We have not discussed Azure Virtual Machines in detail yet; that is coming in the next section. However, I would expect anyone reading this book to have a decent understanding of what a virtual machine is. You should have also created a number of Azure VMs in the previous chapter.
Notice that the primary difference between running application code on a virtual machine and a container is that the operating system is abstracted away. When you create a virtual machine, part of the process is to choose the operating system that is installed along with the acquisition of the CPU, memory, and storage resources. In many cases, a program you want to run doesn't warrant the existence of an operating system, because the operating system itself consumes more of the compute power than the application. In that scenario, having an alternative to run a program on a host in an isolated container, without having a dependency on an operating system, is a desirable option.
So, what is an image exactly? I will explain it here, but note that in one of the exercises later in this chapter, you will get to create an image. It will become clearer as you read on. An image is the package of files, configurations, libraries, and a runtime required by a program to run. If your program is written in ASP.NET Core, then the code itself, its dependent libraries, and the runtime in which the code can execute together constitute the image. The image is defined in text form and then built, most commonly, using the Docker program. Once the image is created, you can deploy it to Docker Hub for private or public consumption. Figure 4.8 illustrates the relationship between an image and a container visually, which may help with your understanding of those terms.
A container, on the other hand, is an instance of the image runtime. When an image is deployed to, for example, Azure Container Instances (the host), initially it will only consume space on some form of storage provider, like a hard drive. In that state, it is still an image. Once the image is instantiated, accessed or run, and loaded into memory, it becomes a container. The container becomes a process that comprises the runtime and application code for responding to requests, messages, or web hooks. In summary, a container is a living instance of an image on a host, while an image is a package of code, libraries, and a runtime waiting to be used.
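Assuming Docker is installed on your workstation, the image-versus-container distinction can be observed directly from the command line. This is a minimal sketch using Docker's public `hello-world` sample image; it requires a running Docker daemon.

```shell
# Pull an image: at this point it only occupies disk space
docker pull hello-world

# List images on disk (still just files, not running processes)
docker images hello-world

# Instantiate the image: now it is a container, a process loaded into memory
docker run hello-world

# List all containers, including ones that have already exited
docker ps -a
```

Notice that after `docker run` completes, `docker ps -a` still shows the exited container, while `docker images` shows the unchanged image it was instantiated from.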
A container group, as one would imagine, is a group of containers. So what is a container? Can you answer that without reading forward or backward? A container is an instantiated image, that is, an image loaded into memory and ready to execute its intended purpose. A container group, therefore, is a group of such containers running on a host, ready to do some work.
It is possible to have multiple container groups, but it is currently only possible to have multiple containers in a single container group when running Linux containers. This means that when you consider a container group as the unit that runs on a virtual machine and the container as a process running on that virtual machine, you can run many containers within the same process. If you know something about IIS, an analogous scenario is running multiple websites within the same application pool, assuming the application pool maps to a single process. See Figure 4.9.
Visualize the virtual machine where the container group consists of two websites (i.e., containers), both of which are running within the same process, where a process means all the EXEs running on the machine that are presented when you open Task Manager or Process Explorer. The container is one of those EXEs. The caveat here is that each container in the container group must be bound to a unique port. As shown in Figure 4.9, one of the containers is bound to port 80, while the other is bound to port 8080. Now that you have some understanding about images and containers, complete Exercise 4.1, which will help you get a workstation configured to enable the creation of an image and the local execution of a container. If you do not want to install the third-party software required to create an image and run a container, that is not a problem. You can skip Exercises 4.1, 4.2, and 4.3. EXERCISE 4.4 and the following exercises are not dependent on those three exercises; complete them if you want to get hands-on experience with Docker.
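To make the unique-port rule concrete, a multi-container group in ACI is typically described in a YAML file that is passed to the Azure CLI (for example, `az container create --file group.yaml`). The following is a hedged sketch, not a production definition; the group name and the reuse of the `benperk/csharpguitar-aci` image for both containers are assumptions for illustration. Note that each container exposes a unique port, mirroring Figure 4.9, and that the OS type must be Linux.

```yaml
# Hypothetical two-container group; only Linux container groups
# currently support multiple containers.
apiVersion: '2019-12-01'
location: eastus
name: csharpguitar-group
type: Microsoft.ContainerInstance/containerGroups
properties:
  osType: Linux
  containers:
  - name: website-one
    properties:
      image: benperk/csharpguitar-aci
      ports:
      - port: 80          # first container bound to port 80
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 1.5
  - name: website-two
    properties:
      image: benperk/csharpguitar-aci
      ports:
      - port: 8080        # second container must use a different port
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 1.5
  ipAddress:
    type: Public
    ports:
    - protocol: tcp
      port: 80
    - protocol: tcp
      port: 8080
```

Both containers share the group's single public IP address, which is why the port bindings cannot collide.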
As alluded to in Exercise 4.1, the variety of operating systems that exist, such as Linux, Windows, and macOS, would make the exercise too repetitive to perform the steps for each one of them. In general, you simply need to get Git and Docker installed and working on your workstation; I am certain this is achievable with the details provided in the Windows-focused exercise.
With newer workstations, BIOS-level virtualization is enabled by default; for older ones it is not. The means for accessing the BIOS of a computer is dependent on the manufacturer of the machine. Although it is common to press F12 or F2 during bootup, it is not a standard activity, nor is the navigation within the BIOS system, as it too is created by the manufacturer.
Before continuing to the creation of a Docker image and running it in a local Docker container, please review these important limitations; they should be understood when choosing to run your application in ACI.
The following are not supported in an ACI:
Before you build a Docker image and run it in a local Docker container, let's learn a bit more about what Docker is and how it works. The five specific Docker components that you need to know are listed next:

- The Docker daemon, `dockerd`, is a listener that responds to requests via APIs that originate from the Docker client or another Docker daemon. These requests concern the management of containers, images, volumes, and networks.
- The Docker client is the `docker` command that you used to check the version and run the `hello-world` sample image, also in Exercise 4.1. The client sends commands to `dockerd`, which is then responsible for routing each request to the correct API that executes it.
- Docker Hub is a public registry located at https://hub.docker.com. Executing the `docker push` or `docker pull` command with the required parameters will publish an image to Docker Hub or download an image from Docker Hub, respectively. As we are focused on Azure, the image created in the next exercise, Exercise 4.2, will not use Docker Hub. Instead, there is an Azure feature called the Azure Container Registry (ACR) that provides the same benefits as Docker Hub.
- An image is the package of code, libraries, configurations, and runtime discussed at the beginning of this section.
- A container is a running instance of an image, started with the `docker run` command, which sends a request to the daemon to spin up an instance of the container.

Finally, there are two more topics that require discussion: the `Dockerfile` and the runtime of the containers. In the next exercise, you will create a `Dockerfile` and run the resulting image in a Linux container. Here is an example of the `Dockerfile` that you will create later in Exercise 4.2:
FROM mcr.microsoft.com/dotnet/core/aspnet:2.2
WORKDIR /app
COPY ./publish .
EXPOSE 80
EXPOSE 443
ENTRYPOINT ["dotnet", "csharpguitar-aci.dll"]
The `FROM` instruction identifies the base image, in this case `aspnet:2.2`. By doing this, you do not need to start from scratch and manually install all the packages required to run this kind of application. If additional or basic packages are required for the application, you would use the `RUN` instruction followed by the package name. `WORKDIR` is short for working directory. Once this is defined, it is the reference point from which other instructions operate; for example, the point of entry for the `COPY` instruction is the `/app` directory. `EXPOSE` is an important instruction, as it informs users of the ports on which the application will listen. Ports 80 and 443 are common ports for running web applications. `ENTRYPOINT` is like the `Main()` method of a console application. This instruction notifies the Docker build where and how to interface with the image once it's instantiated into a container.
Docker Desktop for Windows supports two container types, Windows and Linux. After installing Docker on your workstation, there will be a white badge that looks like a whale with some ship containers on its back, similar to that shown in Figure 4.13.
If you click that badge, a menu displays which type of container you will get when you run the `docker build` command. If you see Switch To Windows Containers, you are currently in Linux mode. If you see Switch To Linux Containers, you are in Windows mode. The sample program you will use in the next exercise is an ASP.NET Core Web API that can run on both Windows and Linux. We will use Linux for the next exercise, so make sure the mode is set correctly. In Exercise 4.2, you will create a local Git repository, download an ASP.NET Core Web API hosted on GitHub, create a Docker image, and run it in a Linux Docker container.
An Azure Container Registry (ACR) is a location where you can store container images. As mentioned earlier, Docker Hub provides the same capability as ACR; both are places to store images. The primary difference is that ACR is private only, while with Docker Hub you can make your images publicly consumable. Docker Hub is accessible at https://hub.docker.com and is used in Exercise 4.4. If you did not perform Exercises 4.1 and 4.2, use the public Docker image that I created, called `benperk/csharpguitar-aci`. Complete Exercise 4.3 to create an ACR and upload the image you created in Exercise 4.2.
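The push flow from that exercise can also be sketched with the Azure CLI and Docker commands shown below. This is an illustrative sketch, not the exercise itself; the resource group name `myRG` and registry name `csharpguitar` are assumptions, so substitute your own.

```shell
# Create a registry (Basic SKU) in an existing resource group (names assumed)
az acr create --resource-group myRG --name csharpguitar \
  --sku Basic --admin-enabled true

# Authenticate the local Docker client against the registry
az acr login --name csharpguitar

# Tag the local image with the registry's login server, then push it
docker tag csharpguitar-aci csharpguitar.azurecr.io/csharpguitar-aci:v1
docker push csharpguitar.azurecr.io/csharpguitar-aci:v1

# Verify the image repository now exists in the registry
az acr repository list --name csharpguitar --output table
```

The login server (`<registry>.azurecr.io`) in the tag is what routes the `docker push` to your private ACR instead of Docker Hub.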
During the creation of the ACR, you selected an Admin user option and the SKU. I mentioned that images hosted in an Azure Container Registry are private. A username and password are required for access, whether you are using the Docker CLI in a client like PowerShell or creating an Azure Container Instance. (You will do that later.) The Admin user option can be enabled or disabled from the Access Keys blade for the given ACR. That blade contains the registry name, login server, username, and two passwords, so that you can regenerate one while the other remains available at all times.
There are three SKUs for ACR: Basic, Standard, and Premium. As is common, the more resources you need, the higher the SKU, and those higher tiers come with a higher price. However, the great thing about the cloud is that you pay based on consumption instead of a flat fee for the entire solution, which could be significant. I'd like to call out three specific features that differ between the pricing tiers; there are more, which you can check online if you desire. These three (Storage, Events, and Geo-replication) are the ones worthy of further comment.
Storage is pretty straightforward. Images and containers take space on a hard drive. The more space you need, the higher the tier.
Read and write operations are limited per SKU, as well.
The Events feature of ACR is an interesting one. (I have not seen this capability on Docker Hub; it may be available in the Enterprise version, but this book won't go deep into Docker Hub's capabilities.) When an image is uploaded or updated, the event can trigger an action of some kind. For example, an update to an image or the insertion of a new one could trigger a Logic App or an Azure Function, which could then be programmed to execute code. You could copy site content from a file store to the new image's file store, or have the event write and send a message to Event Hub, Service Bus, or a Storage Queue. (Those three products are covered in Chapter 6.) The ability to send a notification to one of those messaging products when an ACR image changes is a powerful capability.
Lastly, Geo-replication, which we covered in Chapter 3, replicates your images to other regions. The capability is available in Premium mode only. It is configurable by clicking the Replications link for the given ACR and then choosing which regions you want the ACR replicated into. A benefit is that you deploy once, and that same image is replicated and can be consumed in all regions; this limits the possibility of unintentionally having different versions of an application running at the same time.
Now that the ACR is clear, let's get your hands dirty and create an Azure Container Instance using the image you created or the one that I publicly hosted on Docker Hub (see Exercise 4.4).
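Exercise 4.4 walks through the portal, but the same deployment can be sketched with the Azure CLI. The resource group name, instance name, and DNS label below are assumptions for illustration; the image is the public one hosted on Docker Hub.

```shell
# Deploy a container instance from the public Docker Hub image used in this chapter
az container create \
  --resource-group myRG \
  --name csharpguitar-aci \
  --image benperk/csharpguitar-aci \
  --ports 80 \
  --dns-name-label csharpguitar-aci \
  --location eastus

# Retrieve the fully qualified domain name to browse to the running container
az container show --resource-group myRG --name csharpguitar-aci \
  --query ipAddress.fqdn --output tsv
```

Browsing to the returned FQDN on port 80 should render the sample application's response.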
Something I found interesting when reviewing the ACI options available in the Azure Portal was that there were not many of them. Other than Managed Identity support, which was discussed in Chapter 3, there are none worth discussing. Perhaps this is a product still in expansion mode, or a product that simply doesn't need additional features; after all, an image is meant to be completely packaged, so you must consider all necessary options before deploying it. Currently, if the application running inside your container experiences unexpected behavior, such as exceptions or performance problems, the relative newness of the product means troubleshooting and finding the root cause can be difficult. (Or perhaps it's just me!) Nonetheless, this entire concept was once a black box; now, for me and I hope for you, it is no longer a mystery. I actually found it relatively simple to create, test, deploy, and consume a simple Docker image in a Linux container, and that is saying a lot coming from a Microsoft guy. Someone from the open source community would find it even easier; that's my point.
I often find that when words or concepts are used in technology, they match the meaning when used in a different context. For example, the concept of inheritance in C# can be applied to real life, where properties and attributes are inherited from your parent. Hair color, eye color, and height are all things one would inherit from a parent; the same goes when you inherit from a class in C#.
When I think about the word orchestration, the first thing that pops into my mind is music. In that context, an orchestra is a group of performers who play various instruments in unison, such as strings, woodwinds, brass, or percussion. The orchestra has a conductor who orchestrates or organizes the different components of the orchestra. You could say that the conductor is responsible for the conduct of the orchestra. When you then apply that same model to the concept of containers, the model seems to fit nicely. The concept of orchestration in technology is the management (conductor) of different containers (players) that play different instruments (Windows or Linux containers and images). The conventional aspects of container-based orchestration are the following:
- Health monitoring
- Networking
- Scaling
- Scheduling
- Synchronizing application upgrades
Before we proceed into the discussion of those activities, it should be stated again that ACI is the place to run smaller workloads and is best for getting something deployed and running quickly. ACI doesn't provide any orchestration features. When you created the ACI instance, remember that you selected the size of the Azure VM on which it will run. By doing that, you bound your application to that, and there is no automated way to orchestrate that container or multiple instances of that container. The products Azure provides for such orchestration are Azure Kubernetes Service (AKS) and Service Fabric. Refer to Figure 4.2 and you will see that those two products (located toward the bottom of the decision tree) are triggered based on the necessity of orchestration. The point is that ACI is an entry point for using containers on Azure, but if you need greater control and manageability options, then you might outgrow ACI pretty quickly. I will touch on orchestration a little bit more when we cover AKS and Service Fabric later in the chapter, but the activities in the bullet list apply to those products and not to ACI. This just seemed like a good place to introduce this topic.
Health monitoring doesn't need a lot of explanation. When an orchestrator is configured to do so, a service pings the containers to make sure they are still up and responding. If a container does not respond, or responds in an unexpected way, the orchestrator will remove and replace it or restart it.

From a networking perspective, you may encounter a scenario in which different containers need to communicate with each other. What happens if a container, or the host on which it runs, becomes unhealthy and the IP address or location of the container changes? This is what the networking capability of an orchestrator is responsible for, specifically maintaining and updating the list of containers with their location and metadata details.

Scaling is where an orchestrator clearly pays off. Unlike deploying to an ACI, where you are bound to a single instance of the chosen size, an orchestrator allows you to increase the number of container instances, and the hardware on which they run, based on demand. Of course, you can also decrease consumption when demand slows down, which is very cost effective.

Scheduling is the most complicated activity to explain and comprehend, so if you get it, then you are so awesome! But then again, scheduling is just scheduling, and we all have schedules, right? Take a look at Figure 4.18.
Consider that you have a large number of hosts (synonymous with virtual machines in this context), and you want to deploy some more instances of a container. Do you know whether you have existing capacity on currently deployed hosts, or whether you need a new host? That seems like a pretty complicated piece of information to capture without some help, and that help comes from the scheduler. Assume there is only a single instance of an image named `csharpguitar` running in a container, and you request that two more instances be deployed, along with two instances of the `csharpguitar-aci` container image. The scheduler has knowledge of the current configuration, stored in a data source, and makes the deployment as required, whether that means provisioning new hosts or using spare capacity on existing ones; this is what the scheduler can do. Lastly, the synchronization of application upgrades manages the rollout of new container versions, with the additional capabilities of avoiding downtime and rolling back the deployment if something goes wrong.
Azure Container Instances and Docker are young concepts but are growing fast on Azure, with one of the fastest consumption rates of any service at the moment. At this point, you should have a good understanding of what ACI, Docker, and orchestration are. We will return to orchestration later, but let's now move on to Azure Virtual Machines.
Azure Virtual Machines is Microsoft's IaaS product offering and is by far the most popular and utilized Azure service. This can be attributed to the fact that Azure VM existed before the PaaS, FaaS, or CaaS offerings were available. Some years ago, any company or individual who wanted to utilize compute resources in Azure had only one option, and that option was Azure VM. Even at that early stage of cloud consumption, the savings you could realize by no longer needing to manage the hardware and network infrastructure were great. Recall from Figure 4.1 that networking, hardware, and the virtualization of an instance are owned by the cloud provider when you choose IaaS. Also, recall from Figure 4.2 that if your workload is not cloud optimized, if you do not require or desire containerization, and if the workload is of relative complexity, then Azure VM is the place to begin your investigation and consumption. But what is meant by "relative complexity"? Mostly it means that you need some control over the operating system on which your program runs: for example, you need registry changes, your application instantiates child processes, or the application requires some third-party assembly installation. Such a workload would not work on PaaS, because you have no control over the OS in that cloud model. Also, if you wanted to move an entire application infrastructure that included multiple tiers, such as web, application, database, and authentication tiers, each running on its own machine, then that would be "of relative complexity" and would best fit a group of Azure VMs. Figure 4.19 illustrates that kind of architecture.
If someone asked you what a virtual machine was, could you answer that? In your own words, what is a virtual machine? In my words, a virtual machine is a simulated server running on physical hardware that is granted access to CPU, memory, and storage that actually exist on a physical server. There can be many virtual machines on a single physical server. For example, a physical server with 32 CPUs, 128GB of RAM, and 200GB of storage space could realistically host three virtual machines with eight CPUs, 32GB of RAM, and 50GB of storage. The missing one-fourth capacity is necessary to run the OS and programs that manage the physical hardware and the virtual machines. You wouldn't want to allocate all physical resources to run virtual machines, leaving nothing left for the host to use. Related to this, a virtual network is also a simulated network within a physical network, so by understanding the VM concept, you can also visualize the concept of a virtual network.
If you read the previous chapter and completed EXERCISE 3.3, then you already have some experience with Azure VM. Take a look at Figure 4.20 and reflect on that exercise; think about if this is what you had in mind when you created your first Azure virtual machine.
As you may have noticed earlier in Chapter 3, there are a number of additional products and features created when an Azure VM is provisioned. Looking again at Figure 4.20, you will notice a few products such as a virtual network and subnet (if one doesn't already exist), a public IP address, a network interface card (NIC), and managed disks. You should already have a good understanding of each of these except managed disks. Although managed disks are discussed in more detail later, since the disk roles OS, Data, and Temp are called out in the figure, a description of them is warranted. On Windows, there is a command named `diskmgmt.msc` that, when entered into a command window, identifies the partitions and disks configured on the server. Execute it on your workstation or on a Windows Azure VM to see which disks currently exist, compare the different elements, and take away some personal learnings.
The OS disk is the one that contains the preinstalled operating system that was selected during the creation of the VM, such as Ubuntu, Red Hat, Debian, Windows Server, or Windows 10. OS disks have a maximum capacity of 2,048GB. A data disk is one that stores application data such as cache, auditing logs, configuration settings, or data in a file or a database management system (DBMS). The number of data disks that can be attached to an Azure VM is based on the size and expected number of input/output operations per second (IOPS). You'll learn more about this later in the chapter, but in short, we are talking about potentially thousands of IOPS and a maximum capacity of 16,384GB per data disk, which is gigantic. These numbers increase often, so check the Azure documentation for the current maximums.
The temporary disk is a location used for storing swap and page files. Those kinds of files exist in the context of managing memory and are used to offload memory from RAM. Recognize that this disk exists and what its intended purpose is, but leave it pretty much alone unless there is a specific reason to change it. I checked using `diskmgmt.msc`, and on Windows the temporary storage drive is mapped to `D:` and has a size of 7GB; on Linux the disk is at `/dev/sdb`. Take note that this disk is temporary, and you should expect that anything stored on it can be removed during updates, redeployments, or maintenance. So again, don't use it or change it unless you have a specific need.
Managed disks are available in the following types: Standard HDD, Standard SSD, and Premium SSD. Choosing between a hard disk drive (HDD) and solid-state drive (SSD) comes down to speed, lifespan, reliability, and cost. SSD is the newest and will outperform HDD in all aspects, but it comes with a higher cost. Choosing which disk type and the number of attached data disks depends on what the requirements of the application are.
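Creating and attaching a data disk of a chosen type can be sketched with the Azure CLI. This is an illustrative sketch; the resource group, disk, and VM names are assumptions, and the SKU names (`Standard_LRS`, `StandardSSD_LRS`, `Premium_LRS`) map to the Standard HDD, Standard SSD, and Premium SSD types just discussed.

```shell
# Create a 128GB Premium SSD managed disk (names assumed for illustration)
az disk create --resource-group myRG --name myDataDisk \
  --size-gb 128 --sku Premium_LRS

# Attach the managed disk to an existing VM as a data disk
az vm disk attach --resource-group myRG --vm-name myVM --name myDataDisk
```

Once attached, the disk appears on the VM as an uninitialized data disk, which you would then partition and format from within the operating system.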
Now that we've discussed the basics of what a VM is and the related products and features, let's create a few VMs and get into a little more detail.
Before creating and provisioning an Azure VM, there are a few items you must consider, such as its location, its size, the quota limits, and the operating system. As discussed in the previous chapter, the location in which you place your workload should be analyzed from a privacy perspective if the application will store customer or personal data; there are laws that dictate how such data must be managed. Knowing the regional laws, the concept of geographies, and which data may be sent to other regions outside of the geography are needed pieces of information, all covered in the previous chapter. In addition, you must confirm that all the products required to run your application also exist in the same region. For example, does your application require zone redundancy or Cosmos DB? If yes, then you want to make sure those products are available in the region where you place your Azure VM.
As you would expect, there is a great variety of available Azure VM sizes, aka VM series. To list the available VM sizes in a given region, you can execute a PowerShell command such as `Get-AzVMSize -Location "SouthCentralUS"`. The VM size (i.e., the number of CPUs and amount of memory) is greatly influenced by the requirements of the application. In addition, different sizes are available based on the chosen operating system, for example, Windows or Linux. As it is not realistic to determine all the possible customer use case scenarios, both size and quota limits will be discussed later when we focus on Windows and Linux Azure VMs in more detail. There are, however, categories of Azure VMs that apply to both Windows and Linux; take a look at Table 4.2 for more details on the classification. There is more to come regarding what the instance prefix means, so please read on.
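If you prefer the Azure CLI over PowerShell, a roughly equivalent query is sketched below; the region name is just an example, and the `--query` filter is an optional refinement.

```shell
# List available VM sizes in a region (equivalent to Get-AzVMSize)
az vm list-sizes --location southcentralus --output table

# Narrow the list to sizes with at least 8 vCPUs using a JMESPath query
az vm list-sizes --location southcentralus \
  --query "[?numberOfCores >= \`8\`].{Name:name, Cores:numberOfCores, MemoryMB:memoryInMb}" \
  --output table
```

Comparing the output against the categories in Table 4.2 helps map a size's series prefix to its intended workload.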
TABLE 4.2 Azure Virtual Machine Categories
Type | Instance Prefix | Description |
---|---|---|
Compute Optimized | F | Optimal for batch processing of medium-sized applications |
General Purpose | B, D, A, DC | Best for development, testing, and small applications |
GPU | NV, NC, ND | Most useful for video editing and graphics |
High Performance | H | Powerful CPUs with high network throughput |
Memory Optimized | E, M, D, DS | Ideal for in-memory analytics and relational databases |
Standard | A | Small VMs not for production, testing only |
Storage Optimized | L | Useful for applications requiring high disk I/O and Big Data or SQL databases |
Also recognize that there are numerous ways to deploy and create an Azure VM. You may remember when you created your first VM that there was a step where you decided which OS image to use from a drop-down. That drop-down list contained approximately ten of the most common images, like Windows and Linux, some of which have already been mentioned. But as you will experience in Exercise 4.5, there is also a link below that drop-down that leads to hundreds of public and private images. These other images exist in a place called the Azure Marketplace, a location for businesses to host their software products for consumption by Azure customers. Microsoft places its software products in the Azure Marketplace just like any other company would. To learn more about the Azure Marketplace, visit https://azuremarketplace.microsoft.com.
In the previous exercise, you created an Azure virtual machine using a public Azure image and looked through other available images. To create an Azure Bastion host for connecting to a VM, see docs.microsoft.com/en-us/azure/bastion/bastion-create-host-portal.
An image in the Azure VM context is similar to the definition of an image in the container context discussed in the previous section. An image is a template that defines the environment requirements in which it will run. From an Azure VM perspective, the big difference is that the operating system is part of the image definition. Review Figure 4.7 if you need a refresher on the differences. There are numerous tools that help you create an image for deployment to an Azure VM, for example, Azure VM Image Builder, the System Preparation Tool (SYSPREP), Disk2VHD, snapshots, and exports from VMware, VirtualBox, or Hyper-V (VHDs), or from an already created Azure VM in the portal. We won't cover all those tools and options in detail now, but an interesting one is snapshots. We have discussed a bit about managed disks, specifically the OS disk. It is possible to create a snapshot of that disk, navigate to the disk in the portal, and use the button on the Overview tab to export it and create a VM from that snapshot; similar capabilities exist in on-premises virtualization software as well. The simplest way to create an image of an existing Azure VM is described in Exercise 4.6. You would want to do this after the VM that you provisioned is complete and ready to be shared and used in production. In the following exercise, you will create an image from the VM created in the previous exercise and use it to deploy a new VM.
Now you have an image that, as in my example created for Exercise 4.6, responds with "Hello from CSHARPGUITAR-SC." You can use the image you created any number of times as you build redundancy and failover solutions. The image is also available for use with a virtual machine scale set (VMSS), which will be discussed in more detail later in the chapter. Let's switch gears now and look a little more at the two most popular supported operating systems available for Azure VMs: Windows and Linux. An Azure Solutions Architect Expert must grasp which versions of each operating system are supported on Azure. Know, too, that the OS has limits imposed by the subscription: the CPU, IOPS, RAM, and storage available. You can expect some questions about this on the exam, for example, which of the following types of VMs cannot be deployed to Azure?
The Windows Server operating system is one of the most used software products for running enterprise-level workloads. In this section and in the following one focused on Linux Azure VMs, we'll focus on the supported OS versions, a subset of recommended VM sizes for those OS versions, and some quota limits specific for the OS and version. Recall from EXERCISE 4.5 that you selected a prebuilt image of Windows Server 2019 Datacenter Server Core from the Azure Marketplace. Take a look at the following 10 supported Windows images available via the Azure Marketplace in the portal:
There are recommended Azure VM sizes for each of these operating system versions, arranged based on the categories presented previously in Table 4.2. Therefore, it is important to know into which category your application falls. Table 4.3 provides a list of recommended Azure VM sizes based on the selected workload category. The values in the Instance/Series column (in my example, DS1V2) represent the grouping of many compute components, such as CPU speed, CPU-to-memory ratio, temp storage ranges, maximum data disks, IOPS, and network bandwidth throughput. The ratio between those components is different for each series, and each series therefore has an intended target application category. You can find the specifics of each Windows OS Azure VM at docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-memory.
As an Azure Solutions Architect Expert, you would be expected to recommend the category and VM series when given the specifications of the workload being deployed onto an Azure VM.
TABLE 4.3 Windows Versions to Azure VM Size Recommendation

Windows Version | Category | Instance/Series |
---|---|---|
2008 R2 SP1, 2012 Datacenter, 2012 R2 Datacenter | General Purpose | DS1V2, DS2V2, D2SV3, D1V2, D1, DS1, DS2 |
2008 R2 SP1, 2012 Datacenter, 2012 R2 Datacenter | Memory Optimized | DS11V2, E2SV3, DS11 |
2016 Datacenter *, 2019 Datacenter * | General Purpose | DS1V2, DS2V2, D4SV3, D1V2, D1, DS1, DS2 |
2016 Datacenter *, 2019 Datacenter * | Memory Optimized | DS11V2, E2SV3, DS11 |
If you see an asterisk (*), it symbolizes all variants of the Windows versions. As covered later in the "Migrating Azure Virtual Machines" section, it is possible to bring your own on-premises VM to the Azure platform. This means that almost any OS and configuration that you can export and convert to the virtual hard disk (VHD) format can be deployed to an Azure VM. That sounds a bit easier than actually doing it. Sure, you can deploy a VM running Windows Server 2003; however, some capabilities such as cluster failovers, Azure VM agents, or VM extensions will not work or be supported. Some other common Windows Server features that are not supported on Azure are included in the following list:
A primary reason for limits and quotas is to prevent the accidental consumption of resources that results in a high, unexpected charge. Most of the limits are considered soft, which means if you need more, you can get it by contacting Microsoft and requesting an increase. These limits are typically bound to a subscription or a region. I have seen many customers creating multiple subscriptions or deploying to multiple regions to get around the soft limits instead of realizing that the limit, which is there for protection, can be increased. Contacting Microsoft to get the soft limits increased would make managing your workloads on Azure more intuitive. Moving resources to other subscriptions or regions when those resources are all part of the same solution makes things harder to manage. There are, however, some hard limits imposed on almost all customers that you must adhere to. Those numbers are big, and most companies wouldn't need (or couldn't afford) so much. They also are not always documented. This is because if the hard limit isn't hard-coded into the product, then it could be increased even beyond a documented limit, if the business case is justified. Table 4.4 describes some subscription limits that are related to Azure VMs.
TABLE 4.4 Azure VM Limits
Azure Resource | Soft/Default Limit | Hard/Max Limit |
---|---|---|
Virtual machines | 25,000 per region | Contact Microsoft |
Virtual networks | 100 per region | 1000 per region |
Managed disks | 50,000 per region | Contact Microsoft |
Storage accounts | 250 per region | Contact Microsoft |
Virtual machine cores | 30 per region | Contact Microsoft |
Also note that these limits change often. The ones listed in Table 4.4 and most of the numerical and relationship limits existed when the Azure Solutions Architect Expert exam was created. Learning these will help you answer any question on the exam regarding this topic. By relationship limits, I am referring to Table 4.3, where the links between OS version, category, and VM size are displayed.
The limit of 30 cores per region in Table 4.4 refers to a specific scenario where the limit applies to specific instance/series types. For example, you cannot have more than 30 cores of series A and D VMs in the same region. Additionally, there is a limit of 30 with the Dv2 and F series VMs in the same region. You can, however, have 10 cores of A1 and 20 cores of D1 in the same region, equaling 30. You could also have 30 cores of D1 and 30 cores of F1 in the same region because those instance/series are not grouped together in the limits logic. This is an important point to know when using IaaS; however, I wouldn't expect such a question on the Azure Solutions Architect Expert exam, so just keep it in mind as you progress to being not only a certified Azure Solutions Architect Expert but also a tenured and highly competent one.
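You can check how close your subscription is to these per-series core quotas with the `Get-AzVMUsage` cmdlet. The following is a minimal sketch; the region name is an example, and the exact quota names vary by subscription and region:

```powershell
# List current vCPU consumption against quota limits for a region
Get-AzVMUsage -Location "eastus" |
    Where-Object { $_.Name.LocalizedValue -like "*vCPUs*" } |
    Select-Object @{Name='Quota'; Expression={$_.Name.LocalizedValue}},
                  CurrentValue, Limit
```

Running this before a large deployment tells you whether you need to request a quota increase rather than discovering it when a deployment fails.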
Linux is rapidly becoming the most utilized operating system for running small to medium workloads. Microsoft is helping to support that growth by providing the means for its simple implementation onto the Azure platform. As you have already built an Azure VM, you know how easy that was. The only difference from the previous exercise where you built the Windows VM is that you would choose a Linux image from the drop-down box instead of a Windows one. The other steps are the same; there is no apparent attempt to make deploying Windows VMs easier than Linux. The image defaults to an Ubuntu image. Currently there are six flavors of Linux OS offerings in the Azure Marketplace.
Table 4.5 lists the Azure Marketplace images available for each of those flavors.
TABLE 4.5 Azure Marketplace Linux Versions
Linux OS | Linux Version |
---|---|
Ubuntu Server | 14.04 LTS, 16.04 LTS, 18.04 LTS |
Red Hat Enterprise | 7.2, 7.3, 7.6 |
CoreOS | Alpha, Beta, Stable 7.5 |
Clear Linux OS | Basic, containers, machine learning |
SUSE Linux Enterprise | 15, 12 SP4 |
Debian | 8, 9 |
There is a concept referred to as blessed images or endorsed distributions in the context of Linux. As stated earlier, it is possible to build any machine using any OS and configuration and attempt to deploy it to Azure. The key word there is attempt. There are so many possible configurations that it would be inconceivable at this time to have 100% coverage. I once tried unsuccessfully to deploy a Linux OS that was not blessed. When you deploy and experience some kind of issue, the place where you tend to turn to is the Serial Console. The Serial Console lets you make a hardware connection to the VM instead of using SSH, which requires a network connection. The Serial Console is listening on a virtual console named tty0 by default. The Linux image I was deploying was configured to listen on tty1, and I couldn't connect to it. Only by chance was I able to figure that out, but this is an example of what happens when you do not use an endorsed or recommended image. There are many “one-offs” that can occur and delay your deployment or, worse, completely prevent it. It is therefore most prudent that your application targets one of the Azure Marketplace images; however, there are more Linux flavors that are considered blessed and endorsed that do not exist in the Azure Marketplace. They are listed in Table 4.6.
TABLE 4.6 Additional Endorsed Linux Flavors
Linux OS | Linux Version |
---|---|
CentOS | 6.3, 7.0 |
CoreOS | 494.4 |
Debian | 7.9, 8.2 |
Oracle Linux | 6.4, 7.0 |
Red Hat Enterprise | 6.7, 7.1, 8.0 |
SUSE Linux Enterprise | SLES for SAP, 11 SP4, 12 SP1 |
openSUSE | Leap, 42.2 |
Ubuntu | 12.04 |
As with the Windows machines, the recommendations for the VM sizes are based on category and the operating system version. Please find the VM size recommendations per Linux OS flavor in Table 4.7. For greater visibility into the details of the Linux VM instances, take a look at https://docs.microsoft.com/en-us/azure/virtual-machines/linux/sizes-memory. As the amount allocated for compute resources increases, so does the cost; therefore, choosing the right size is important.
TABLE 4.7 Linux Versions to Azure VM Size Recommendation
Linux version | Category | Instance/Series |
---|---|---|
Ubuntu, CoreOS | General Purpose | DS1V2, DS2V2, D2SV3, D4SV3, D1V2, D1, DS1, DS2 |
Ubuntu, CoreOS | Memory Optimized | DS11V2, E2SV3, DS11 |
Red Hat Enterprise | General Purpose | DS2V2, D2SV3, D4SV3, D2V2, D2V3, D2, DS2 |
Clear Linux OS | General Purpose | D1, D3, DS3, DS4 |
SUSE Linux Enterprise | General Purpose | DS1V2, DS2V2, D1V2, D1, DS1, DS2 |
SUSE Linux Enterprise | Memory Optimized | DS11V2, DS11 |
There are no limits or quotas that focus specifically on Azure VMs running Linux; they are the same that were covered previously in Table 4.4. Microsoft fully supports Linux, and there are no policies or practices that knowingly inhibit this operating system.
Extensions are small programs or automation activities that are helpful for performing post-deployment tasks, security, monitoring, and automated deployments. If you were curious in the previous exercises where we created the Azure VM, there was a tab named Advanced, and on that tab there was a section that allowed you to select an extension. In the exercises I usually skip over those tabs, but you may have looked at them and wondered what all those features are. Consider accessing the portal and simulating the creation of a new Azure VM. First, notice that the list of installable extensions differs depending on whether you select a Linux-based or Windows-based VM. The region also plays a role in the extension list, so again, here is another example of knowing what capabilities are available in each region prior to committing to one.
For Windows there are some nice extensions such as the PowerShell Desired State Configuration extension that will help in post-deployment activities to make sure the VM is configured in the same way in every case. This is important once your workloads get rather complicated and require automated deployments, which are discussed in more detail in the next section. There are anti-malware, cloud security, and other security-related agents that can be deployed, configured, and run on your Azure VM as an extension. When you create your initial Azure VM, you configure all these environment-specific capabilities and then capture your image for use with later automated or manual deployments.
Microsoft provides a lot of monitoring capabilities; however, it fully supports other companies with more specific monitoring capabilities through this extension feature. Some third-party monitoring products available for installation are Datadog, APM Insight, and Dynatrace. Monitoring is covered in more detail in Chapter 9 but will focus on the Azure platform–based capabilities and not third-party extensions in IaaS. If you have an interest in learning more about these extensions, check out this online document:
docs.microsoft.com/en-us/azure/virtual-machines/extensions/overview.
Deployment and migrations are covered in Chapter 8, which will target ARM and code deployments (aka content deployments). As you already know, there are many ways to deploy an application and many components that need to be deployed to make it work. If any portions of those deployment tasks can be automated, it decreases the amount of required effort. Consider that in many of the previous exercises, after the provisioning of the Azure VM was complete, you were requested to connect via RDP or Bastion to the server and manually install IIS using some PowerShell cmdlets. That is an acceptable approach if you have only one or two servers to deploy that need IIS; however, if you were to deploy 50 or 100, then that option really isn't worth considering. It is not realistic to manually log in to 50+ servers and make manual configurations to each of them. Exercise 4.6 offered a similar approach to realize the same outcome: you create an image and use it as the baseline for all future deployments. Using automated scripting is another option to consider and is useful when you deploy your Azure VMs with PowerShell as well. There are even scenarios where a combination of both of these capabilities adds great value.
An example of a scenario where both a custom image and an automated deployment script are useful is when there is no public image that has the required utilities installed to run your script. For example, if you wanted to run an Az PowerShell cmdlet, then the image must have those cmdlets installed prior to executing. This currently requires that the following PowerShell installation command be run first; you may remember this from the previous chapter.
Install-Module -Name Az -AllowClobber -Scope AllUsers
As mentioned in the previous section, there is an option on the Advanced tab of the Azure VM creation blade called Extensions. Clicking Select An Extension To Install opens a window to select an extension to install. The one used for executing custom scripts is named Custom Script Extension for Windows and Custom Script for Linux. When building a Windows OS VM, you can save the following PowerShell cmdlet to a file named, for example,
iis.ps1
and upload it when configuring the build properties of the Azure VM in the portal.
Set-AzVMExtension `
    -ResourceGroupName "<RG-NAME>" `
    -VMName "<VM-NAME>" `
    -Location "<LOCATION>" `
    -ExtensionName IIS `
    -Publisher Microsoft.Compute `
    -ExtensionType CustomScriptExtension `
    -TypeHandlerVersion 1.4 `
    -SettingString '{"commandToExecute":"powershell Add-WindowsFeature -name Web-Server -IncludeManagementTools"}'
Then, once the Azure VM is created, IIS will be installed using an extension. You can also install, for example, SQL Server and the .NET Framework using extensions. From a Linux VM perspective, in addition to the extensions, there is also the feature on the Advanced tab to implement
cloud-init
scripts, which can configure users and security and install software packages. The creativity that customers and developers exercise with this tool varies greatly, and the wonderful point about Azure is that it provides a platform to realize that creativity. I recognize this section is a bit abstract. I simply point out this feature as an option to pursue and consider when you are deploying your application to an Azure VM.
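As an illustration of what a cloud-init script can do, here is a minimal sketch. The user name and package choice are examples of my own, not values from the exercises:

```yaml
#cloud-config
# Example: create a sudo-capable user and install the nginx web server on first boot
users:
  - name: azureadmin
    groups: sudo
    shell: /bin/bash
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
```

You would paste this into the cloud-init text area on the Advanced tab when creating the Linux VM, and the configuration runs automatically during the first boot.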
You should now have a good understanding of how to create an Azure VM whether it be Windows or Linux. You should also, if asked, know which versions of those operating systems are endorsed and what trying to deploy an image that is not endorsed could entail. You should also know what the different categories/series of VMs mean, such as memory optimized and general purpose. Given a table that shows a list of different VMs with OS, CPU requirements, and memory requirements, you need to know which ones are endorsed by Azure and if the requested resources breach any quota or resource limits.
When you provision an Azure virtual machine, you receive the compute from a pool of existing virtual machines running on a host. The available pool of compute capacity is shared by all Azure customers. Be confident, however, that no content or configuration remains behind after the deallocation occurs. The virtual machine is completely cleaned before being placed back into the pool of available resources. If you want or need all of your virtual machines to run on their own host (aka physical machine), one that is not deallocated or provisioned from a pool of shared compute resources, then you can choose an Azure dedicated host. Visit this site for more information about this product offering:
docs.microsoft.com/en-us/azure/virtual-machines/windows/dedicated-hosts.
The cost of the Azure dedicated host is more than running in the shared hosting environment. This is because you would be charged for all the compute power available on the host instead of only the consumed compute power of the host. An advantage of using an Azure dedicated host is the ability to control any infrastructure change that may impact your provisioned resource such as infrastructure or networking kinds of changes. Azure dedicated hosts are not available with VM scale sets.
After your Azure VMs are provisioned, it's support time. You should already have solid knowledge about creating Azure VMs. Now it is time to learn some activities that can be done after their creation. This doesn't necessarily mean that the Azure VMs are in production and being actively consumed; rather, it means that you may encounter a scenario in which one or more of them requires a rebuild, a reconfiguration, or a redesign. This section will focus on networking, maintenance, cost and sizing, storage, managed disks, disaster recovery, and backup activities.
If you followed along with the previous chapter, you are competent from an Azure networking perspective. Even if you didn't complete the previous chapter, the following networking tips may come in handy at some point. The focus of these PowerShell cmdlets is to provide insights into how your network is configured. These are helpful in case you need to find out the cause of unexpected behaviors or transient outages. It would be possible to capture the same information from the portal, but in some cases a holistic view of what is going on can be achieved better by running some PowerShell cmdlets. Prior to running PowerShell cmdlets, remember that you must authenticate and then set the focus to a specific Azure subscription, as shown in the following code snippet. From now on, I will assume you know this step and will not mention it again.
Connect-AzAccount
$subscription = Get-AzSubscription -SubscriptionId "#####-####-###########"
Set-AzContext $subscription
The following is an example of a PowerShell cmdlet. It lists all the network security groups (NSGs) in a given resource group. It then cycles through all the NSGs and dumps out the NSG name, the direction of the rule, the rule name, and the inbound port. This would be helpful just to get a quick understanding of all the different NSG rules you have in your resource group. The output might resemble something like Figure 4.22.
$nsgs = Get-AzNetworkSecurityGroup -ResourceGroupName <Resource Group Name>
foreach($nsg in $nsgs)
{
    $securityRules = $nsg.SecurityRules
    foreach($rule in $securityRules)
    {
        $nsg.Name + ": " + $rule.Direction + " - " + $rule.Name + " - " `
            + $rule.DestinationPortRange
    }
}
The following are some other helpful PowerShell cmdlets:
Get-AzVirtualNetwork
Get-AzVirtualNetworkSubnetConfig
Get-AzNetworkInterface
There are many, but those are the ones that are most useful. They will require some customization similar to the code snippet shown prior to Figure 4.22. The output of those PowerShell cmdlets is often large JSON-formatted documents that contain all the details of the network, subnet, and network interface. Much of the information is unnecessary and can be filtered out with a little creative PowerShell scripting. PowerShell is a powerful tool with a large open source set of cmdlets at your disposal.
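For example, to trim the large JSON output of `Get-AzNetworkInterface` down to only the values you typically need, something like the following sketch could work; the property names come from the Az module, and the resource group name is a placeholder:

```powershell
# Show each NIC's name, its private IP address, and the subnet it is attached to
Get-AzNetworkInterface -ResourceGroupName <Resource Group Name> |
    ForEach-Object {
        foreach ($ipConfig in $_.IpConfigurations)
        {
            $_.Name + ": " + $ipConfig.PrivateIpAddress + " - " + $ipConfig.Subnet.Id
        }
    }
```

The same looping pattern used for the NSG rules applies here: iterate the collection, then drill into the nested configuration objects you care about.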
There is no way around it; once you deploy your workloads to an Azure VM, you can't just walk away from it and forget it. Like a car or your garden/yard, it needs some ongoing attention. Some details you would want to be aware of fall into, but are not limited to, these areas:
In the next section, we'll go into more detail about stopping and starting VMs, but it's most optimal in this cloud service model that you turn off VMs that you don't need. There are numerous ways to achieve this; one simple way is to execute the PowerShell cmdlet
Stop-AzVM -ResourceGroupName <name> -Name <VM Name> -Force
to stop a VM or use
Start-AzVM -ResourceGroupName <name> -Name <VM Name>
to start one. Stopping and starting VMs can also be achieved via the Azure Portal or via an RDP, Bastion, or SSH session directly on a VM. It is also possible to use
Remove-AzVM -ResourceGroupName <name> -Name <VM Name>
to delete a VM. The
Remove-AzVM
cmdlet can have some significant impact if the VM performs a critical function in your solution. RBAC controls can be used to restrict this kind of activity based on individuals or groups. If you wanted to make sure that no one, no matter what, was allowed to delete a resource or any resource group regardless of RBAC restrictions, then there is a feature called resource locks that will handle that requirement. To test a resource lock, complete Exercise 4.7.
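In addition to the portal steps in Exercise 4.7, a resource lock can be created with PowerShell. The following is a sketch; the lock name and resource names are placeholders of my own:

```powershell
# Place a CanNotDelete lock on a VM so it cannot be removed until the lock itself is deleted
New-AzResourceLock `
    -LockName "DoNotDelete" `
    -LockLevel CanNotDelete `
    -ResourceGroupName "<RG-NAME>" `
    -ResourceName "<VM-NAME>" `
    -ResourceType "Microsoft.Compute/virtualMachines"
```

The `-LockLevel` parameter also accepts `ReadOnly`, which maps to the Read-Only option discussed next.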
You may have noticed in Figure 4.23 that there were links to the resource group and subscription next to the + Add button. In this context, on the VM blade, you can only add a lock on the given resource, i.e., the Azure VM. However, clicking those other links will show you a list of locks that exist in that resource group and subscription. If you wanted to apply a lock on a resource group or subscription, you would need to navigate to that resource’s blade and click the Lock link on that resource. As you would expect, locks placed on the parent will be applied to the child, in that a lock on a resource group will apply to all resources within it. Additionally, the most restrictive lock is the one that is applied if there are locks in both the parent and the child. For example, the drop-down list in Figure 4.23 included not only Delete but also Read-Only. If there is a read-only lock placed on a resource group and a delete lock on an Azure VM within the resource group, then the delete lock is the one that is respected, as it is more restrictive. Refer to Chapter 2 where we discussed scopes if you need a hierarchy refresher in regard to management groups, subscriptions, resource groups, and resources, as this concept applies to the scope model discussed here too.
From a read-only perspective, the meaning here is that modifications to the resource are read-only, not operations on the resource. For example, if there is a read-only lock placed on a VM, then the size, disks, configuration, and auto-shutdown schedule cannot be changed. However, if someone RDPs or SSHs to the VM, that person will still be able to change and/or remove content from the VM itself. Assuming there is a SQL Server instance running on the VM, the data in the database would remain changeable; the read-only setting applies only to changes made to the VM resource itself via the portal or another supported client.
The next maintenance-related activity has to do with resizing. Any financially minded person wants to spend the exact amount required to get the job done, nothing more and nothing less. This holds true when choosing an Azure VM because the cost is directly related to the size, i.e., how much compute you get. Starting off small and then growing is an option because resizing is not so complicated. When you create an Azure VM, a default size is selected (for example, D2SV3), but there is a link under it that allows you to change the size. If you decide to keep that size and later determine you need more compute power, there is a link in the navigation menu for the Azure VM named Size. Clicking that link opens the Size blade, which shows the existing size and a list of other options, as shown in Figure 4.24.
Simply select the desired size and click the Resize button, and the Azure VM will be scaled up to that size. Note that changing the size will result in a restart of the VM, which should be expected since changing the compute power associated with the VM is a significant alteration. It is also possible to make the same change using PowerShell. Execute the following cmdlets and view the output in Figure 4.25:
Get-AzVMSize -ResourceGroupName "<RG-NAME>" -VMName "<VM-NAME>"
$vm = Get-AzVM -ResourceGroupName "<RG-NAME>" -VMName "<VM-NAME>"
$vm.HardwareProfile.VmSize = "Standard_DS3_v2"
Update-AzVM -VM $vm -ResourceGroupName "<RG-NAME>"
The first PowerShell cmdlet lists all the possible VM sizes available for the Azure VM in that region. This is helpful for finding out not only the options you have but also the nomenclature (its name) you need later once you decide on the size. The options in the Azure Portal are a bit more restrictive; you will get an unfiltered list using PowerShell. The next lines of the PowerShell script get the VM into a PowerShell object, set the
VmSize
, and update it. Wait some time, and then the workload you had running on that VM will be scaled up to the newly allocated compute series. As an extra test, update the resource lock you created in Exercise 4.7 to read-only and try the same update process discussed just now. It will fail because the configuration options for the VM are then read-only and cannot be changed.
Let's shift gears a little bit and consider updates. One of the major responsibilities that you have when choosing IaaS is the management of the operating system. This means you need to schedule and manage security patches, bug fixes, and hot fixes (i.e., from a Windows perspective, KB patches). There is a product called Update Management that will help you with this. Update Management is used in collaboration with Log Analytics, which is discussed in Chapter 9. Once you configure Update Management, it will perform an analysis of the targeted VM and provide details about any missing updates. This works on Windows as well as on CentOS, Red Hat, SUSE, and Ubuntu Linux VMs. If any updates are found to be missing, there is an additional feature found within the capabilities of Update Management that allows you to schedule an update deployment. The update can be scheduled to run once or be recurring, and you can configure whether to reboot after the update is applied. That is an important option, i.e., whether to allow a reboot. Early in my IT career, there was high risk involved in installing operating system patches. On numerous occasions the patch simply wouldn't install and killed the server, or the patch installed and went down for a reboot and never came back up. In both scenarios, the only option we had was to rebuild the entire server, which was the easy part. Installing, configuring, and testing the freshly built application was the hard part. I am so thankful for images, backups, and deployment slots that can now save me many hours, late nights, and weekends. I share this experience simply to point out that selecting the reboot option and automating OS patch installation needs some thought about rollback and troubleshooting scenarios. There is another feature for Azure VMs called boot diagnostics that can help if, after an update is installed, the VM doesn't come back up, or if for any reason after a reboot the VM is hanging.
To keep your head from spinning too much, an overview of what is going on in your subscription and resource group can help to give some clarity. Those monitoring topics are covered more in Chapter 6 where we cover compliance topics and in Chapter 9 when we cover monitoring. However, the Inventory link on the Azure VM navigation menu lets you enable change tracking on the VM. Once it's configured, you can get an overview of what software, files, Windows registry, and Windows services have been added, changed, or deleted. For Linux VMs, the software, files, and Linux daemons are monitored for changes. I am confident you will agree that knowing what is happening on your machines is helpful toward their maintenance and support. If something stopped working, you could look on the Inventory blade and check whether something has changed without needing to RDP or SSH to the VM and looking manually.
Finally, if you recall from the numerous times you have created an Azure VM, on the Management tab there is a feature named Boot Diagnostics. It is enabled by default and stores some helpful pieces of information. A helpful one is that it captures a screenshot, which may be a BSOD, or for Linux there may be some text showing what you'd normally see on a monitor when running on-premises and directly connected to the machine. There is also a Serial log, which provides some potentially helpful logs with errors and exceptions that may lead to a root cause and a fix. Another useful tool is the Serial console, which provides a COM1 serial connection to the VM. I mentioned earlier about tty0; this was the place where I was working when failing at the deployment of an unblessed Azure VM image. I was using the boot diagnostics and the Serial console (which I couldn't connect to) trying to get the Azure VM to work. Both of those features are useful for maintenance and troubleshooting efforts.
Choosing the right compute size is important because the cost is static for an Azure VM, unlike when running a consumption mode Azure Function, for example. You do not want too much, nor do you want too little, but hitting the precise size from the beginning can be challenging. If you decide to start small and then increase or decrease as you go along, that is a good plan. The Azure VM resizing capabilities as shown in Figure 4.25 can help you proceed with that approach. Also, starting and stopping the VM when it is not being used is also a means to reduce costs. There are numerous ways to stop and start a VM; you know already that you can use PowerShell cmdlets
Start-AzVM
and
Stop-AzVM
to achieve that. However, in that context, you need to watch out for a few things. As shown in Table 4.8, there are numerous Azure VM power states that you need to understand.
TABLE 4.8 Virtual Machine Power States
Power state | Detail |
---|---|
Deallocating | The VM is in the process of releasing allocated compute resource. |
Deallocated | Allocated compute resources are no longer consumed. |
Stopping | The VM is in the process of being shut down. |
Stopped | The VM is shut down; compute resources remain allocated. |
Starting | The VM is in the process of being activated. |
Running | The VM has been started and is ready for consumption. |
The important point in those VM power states is that if you simply stop the VM, you will continue to incur a charge because the compute resources remain allocated to the VM. You need to make sure that you deallocate the VM if it is no longer needed. Deallocation is achieved when you use the
Stop-AzVM
PowerShell cmdlet or click the Stop button on the Overview blade in the portal. If you want to use the
Stop-AzVM
PowerShell cmdlet but not deallocate the virtual machine, pass the
-StayProvisioned
parameter along with the command. By contrast, if you have an RDP connection to a Windows VM and click the shutdown button, the VM is not deallocated; it is simply stopped. An important note is that when a VM is deallocated, the Public IP address is placed back into the pool, and when you start it back up again, it is likely that it has a different public IP address. That has an impact if your solution has any dependencies on that IP address. You will notice that when you click the stop button in the portal, you get warned of the loss of the IP address and are asked if you want to keep it. If you choose to keep it, the IP address is changed to a static IP address, and there is a cost associated with that. The default public IP address setting is dynamic, which is why it can change when the VM is deallocated.
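To make the distinction concrete, here are the two `Stop-AzVM` variants side by side; the resource group and VM names are placeholders:

```powershell
# Stop AND deallocate: compute charges end, and a dynamic public IP is released
Stop-AzVM -ResourceGroupName <name> -Name <VM Name> -Force

# Stop WITHOUT deallocating: the VM is shut down, but you are still billed for the allocated compute
Stop-AzVM -ResourceGroupName <name> -Name <VM Name> -StayProvisioned -Force
```

If your goal is to reduce cost, the first form is almost always the one you want.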
You may have also noticed during the creation of an Azure VM in the portal that on the Management tab there is an option called Auto-Shutdown. This is configurable during and after the creation of the VM. This feature allows you to deallocate the VM at a given time each day so you do not have to worry about forgetting to stop it all the time. Additionally, you can provide an email address to be notified when the auto-shutdown feature is executed; it includes the name of the VM it ran on as well. Up to now we have focused on Windows VMs; in Exercise 4.8, you will create a Linux VM and execute some Azure CLI commands using https://shell.azure.com
.
It is possible to install Azure CLI onto your workstation and use it remotely, the same as with PowerShell, but I wanted to show you this Azure Cloud Shell feature. One advantage, and something I like about it, is that I do not have to constantly execute
Connect-AzAccount
to get logged in. I am prompted for my credentials when I first access the site. Another experience you may have noticed is that
az vm stop
doesn't deallocate the VM like
Stop-AzVM
does; you must instead use
az vm deallocate
. That is an important point to be aware of, and it is mentioned when executing
az vm stop
. However, had you run this as an automated script, perhaps it could have been missed.
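For reference, the two Azure CLI commands look like the following; the resource names are placeholders:

```shell
# Stops the VM but keeps the compute allocated - you continue to be billed
az vm stop --resource-group <name> --name <VM Name>

# Releases the compute resources - billing for the VM's compute stops
az vm deallocate --resource-group <name> --name <VM Name>
```

Note the asymmetry with PowerShell: `Stop-AzVM` deallocates by default, while `az vm stop` does not.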
When all is said and done and you no longer need the allocated Azure compute power, you can simply remove the resource group, and all of its contents are removed. This is a good reason to keep all your resources grouped together in a resource group when you are testing or developing. Have all the resources being used for a given project or application so you know what the provisioned resources belong to. There is a concept called tags that we will discuss in Chapter 6 that provides a similar capability, but for now, put every related Azure resource into a resource group, and when you're done, run
Remove-AzResourceGroup -Name <Name> -Force
in PowerShell or
az group delete --name <Name> --no-wait --yes
using Azure CLI.
Finally, when you have completed the project and the resources can be deleted, you can check the bill. The bill includes the ability to download a CSV file that contains a daily breakdown of resource usage and the associated charge. This is useful not only in hindsight but also to check and see whether there are any resources consuming Azure compute without any real purpose. You could then remove them. To get that report, navigate to the Subscription blade and select the desired subscription. In the Billing section, select Invoices, which renders the option to download the Usage + Charges report, similar to that shown in Figure 4.27.
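If you prefer scripting over downloading the CSV from the portal, the Az.Billing module offers a way to pull similar usage data. The following is a sketch; the 30-day window is an example, and the exact properties returned can vary by subscription offer type:

```powershell
# Pull per-resource usage and pretax cost for roughly the last 30 days (requires the Az.Billing module)
Get-AzConsumptionUsageDetail `
    -StartDate (Get-Date).AddDays(-30) `
    -EndDate (Get-Date) |
    Select-Object InstanceName, UsageQuantity, PretaxCost |
    Sort-Object PretaxCost -Descending
```

Sorting by cost puts the most expensive resources at the top, which is a quick way to spot resources consuming compute without any real purpose.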
It is important to keep an eye on costs, not only from an Azure VM perspective but from a subscription-wide perspective. I sit next to some Microsoft employees who work on the Subscription and Billing team and hear how customers get a shock with the bill and try to reason their way out of it because they didn't expect it to cost that much. I don't get to hear or see the result, but I don't think the charge gets forgiven 100% of the time. So, be careful and keep a close eye on this.
A managed disk is a virtual hard disk (VHD), and whether you know it or not, you have already created many of them. Each time you created an Azure VM, you used the Disks tab, where you could select the OS disk type from a drop-down list, add one or more data disks, and expand an Advanced section. Azure managed disks are not physical disks that use, for example, Small Computer System Interface (SCSI) or Integrated Drive Electronics (IDE) standards to connect a hard drive to a physical server. Instead, a managed disk is a construct of a page blob in a container within an Azure Storage account. If you by chance do not know what a blob is, consider it to be a file (for example, DOCX, PHP, VHD, or PNG) that has content like text, source code, or an image contained within it. Instead of saving the content of the file into a database, for example, you would store the entire file as a blob. Data and storage concepts are covered in Chapter 5, so we won't go too deep into the topic now. Instead, just know that the hard disk really is a VHD, and the internals of how that works can be left to the Azure platform and accepted as part of the service the platform provides.
The alternative to a managed disk is called an ephemeral disk, which is found on the Disks tab during the creation of the VM. Expand the Advanced section to see the option to create one. As you can see in Figure 4.28, an ephemeral disk is stored on the local virtual machine (VM) host instead of in an abstracted page blob. There are specific use cases for choosing this kind of disk; for one, ephemeral disks are free, unlike managed disks. They also perform better if your workload needs to be reset or reimaged, which makes sense because the disk is physically attached to the host. Also, if your workload is stateless, you might consider an ephemeral disk. If your VM can come up, go down, and be reimaged often with no impact on the application or your customers, then the ephemeral disk type might be an option. Most use cases, however, fit best on managed disks.
Managed disks have a 99.999% availability guarantee, and you can currently create up to 50,000 disks per subscription per region. Five nines is achieved by replicating the data on your disk across three instances. All three disks would need to fail before there would be an outage, which is unlikely. Managed disks are deeply integrated with VMSS, which provides protection against the impact of a segment failure within a data center; you'll learn more about VMSS later in the chapter. Managed disks support Availability Zones, which were covered in the previous chapter, and they can be encrypted. Remember encryption at rest from Chapter 2? Table 4.9 provides more details about managed disk types.
TABLE 4.9 Managed Disk Types
Offering | Premium SSD | Standard SSD | Standard HDD
---|---|---|---
Disk Type | SSD | SSD | HDD
Maximum Size | 32,767GB | 32,767GB | 32,767GB
Maximum IOPS | 20,000 | 6,000 | 2,000
Maximum Throughput | 900 MB/s | 750 MB/s | 500 MB/s
Usage | High-performance production workloads | Web servers, dev and test | Backup, noncritical workloads
The managed disk types in the table should be familiar to you; they were the options in the drop-down list for the OS disk type on the Disks tab when creating an Azure VM in the portal. As a side note, notice that the maximum size of the disks is 32,767GB, which is 2^15 − 1. There is also an Ultra Disk type that is currently limited to a few regions; its maximum size is 65,536GB, which is 2^16, the number I called out specifically in the previous chapter. When I see things like that, it makes me feel like there was some real thought behind them. Let's get back to the topic at hand; up to now we have only added an OS disk to the VM, which resulted in a disk configuration like that shown in Figure 4.29. This is displayed after connecting via RDP to the VM and running diskmgmt.msc.
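The limits from Table 4.9 lend themselves to a simple lookup when deciding which disk type a workload needs. A quick sketch; the cheapest-first ordering is an assumption for illustration, since actual pricing depends on disk size and region:

```python
# Table 4.9 encoded as a lookup; figures are the chapter's quoted maximums.
DISK_TIERS = {
    "Premium SSD":  {"max_iops": 20_000, "max_mbps": 900},
    "Standard SSD": {"max_iops": 6_000,  "max_mbps": 750},
    "Standard HDD": {"max_iops": 2_000,  "max_mbps": 500},
}

def cheapest_tier(required_iops, required_mbps):
    """Return the first tier (checked cheapest-first, an assumption)
    whose maximums satisfy the workload's requirements."""
    for tier in ("Standard HDD", "Standard SSD", "Premium SSD"):
        spec = DISK_TIERS[tier]
        if spec["max_iops"] >= required_iops and spec["max_mbps"] >= required_mbps:
            return tier
    return None  # the need exceeds every tier in the table

print(cheapest_tier(5_000, 300))   # Standard SSD
print(cheapest_tier(30_000, 100))  # None (would need Ultra Disk)
```

A requirement above 20,000 IOPS falls outside the table entirely, which is where the Ultra Disk type mentioned above comes in.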
In Exercise 4.9 you will add a data disk to an Azure VM. If you are not clear on the different types of disks available to a VM, refer again to Figure 4.20.
During the creation of the data disk, you may have noticed a drop-down next to the Storage Type option. The drop-down contained three options; we chose None as it was the default, meaning we simply want a blank disk. There was also an option to build the disk from a Storage blob, which was mentioned earlier. This means you can make a VHD of any VM, store it in a blob container, and then reference it as the source for a managed disk. That is cool, and you'll learn more about it in the next chapter. The final option was Snapshot.
Think about what a snapshot is in the real world. When you take a picture of something, the state of the subject is frozen in time and won't change. Similar to a custom VM image, which you created in EXERCISE 4.6, a managed disk snapshot is a copy of the contents of the disk at a point in time that can be used as a backup or as a means to build an identical VM for troubleshooting. If you navigate to one of the managed disks you created, you will notice the + Create Snapshot link at the top of the Overview blade for the selected disk. After you create a snapshot, if you do EXERCISE 4.9 again and select Snapshot instead of None as the storage type, the one you created is listed in the drop-down.
There is an entire chapter, Chapter 9, that covers monitoring and recovery, so only some basic concepts will be discussed here. If you recall from Chapter 1 and Chapter 2, we have already discussed Azure Site Recovery. We will again touch on it with a comparison with the Azure VM backup feature, but the real, in-depth coverage is in Chapter 9. The specific topics covered in this section are provided in the following list:
The whole point of backing up data and software is to protect your intellectual property and your business. I cannot quote any specific incident, but imagine you have a program with 100,000 lines of code that cost the equivalent of 2 million hours of labor. Imagine too that the data it accesses and analyzes is on the same server and is a set of values captured over the course of the last 10 years. Imagine there is neither a backup of the source code nor the data. That's a scary thought. What happens if the server crashes and you cannot get any of that data or code back? If that happens, the business disappears. It is an avoidable situation if you simply back up. A less impactful scenario, discussed in the previous section, is what can happen after an update to the operating system that requires a reboot. When you configure Update Management to install updates, you might consider scheduling a backup some hours or a day before the scheduled patching process. There is a Backup link in the navigation menu on the Azure VM blade that allows you to schedule a backup based on a configured policy. The policy can be based on daily or weekly frequencies. Once configured, there is a link at the top of the Backup blade that allows you to back up manually. In the background, the backup creates and stores a snapshot that can be used later to provision a new Azure VM.
The Redeploy navigation menu item is a bit different than you might initially think. It can easily be misunderstood as the feature used to rebuild a VM from a snapshot or a backup. However, when you click the menu item, it quickly becomes apparent that instead of recovery, the feature moves the VM to a new host. That is interesting; it is only possible because the managed disks are not physically attached to the VM. It is therefore possible to attach those disks to another VM and boot it up, almost in real time, which is amazing. That is what happens when you redeploy, and this model is used with VMSS, PaaS, and FaaS. Take note that with ephemeral disks this kind of redeploy action is not possible because the disks are attached locally, but as mentioned, reimaging is faster. You just need to determine which is best for your given use case.
The scenario of losing your business because you don't back up should give an Azure Solution Architect Expert goosebumps. Reflecting on the model proposed in this book (security, network, compute), I think the next consideration would be a business continuity and disaster recovery (BCDR) plan, mentioned in Chapter 1. That leads to the comparison between Azure Backup and Azure Site Recovery. From a backup (Azure Backup) perspective, your context is singular and granular. When you back up a VM and later recover from that backup, you are thinking about files, machine state, or specific folders. You can see that in practice when you click the Backup navigation item for a VM versus when you click the Disaster Recovery navigation menu item. The Disaster Recovery menu item is the entry point into Azure Site Recovery, which is, in old terms, disaster recovery (DR). DR means something big happened, everything on the machine is permanently gone, and likely everything in the data center will be down for many hours. A BCDR or contingency plan is built upon Azure Site Recovery; jump to Chapter 9 if you urgently need to learn more about that.
Migration is covered in detail in Chapter 8, but it is worth a mention here in the VM context. VMs are the most common cloud service model that is “migrated” to Azure. That's either because, like I mentioned, it was the first offering from Azure or it is an entry point for enterprises to deploy their existing applications that require virtual networks and have multiple tiers of compute requirements. Refer to Figure 4.19 if you don't know what I mean by multiple tiers. In this section we will touch briefly on the following migration topics:
First, it would be prudent to know what is meant by migration. Say you have a website running on a virtual machine constructed using Hyper-V, Microsoft's virtualization offering, which provides the same capabilities as VMware. The website could also be running on a physical server where no virtualization software is in place. Migration means that you want to move the workload from one place, usually an on-premise data center, to an Azure VM. You would want an automated process that avoids the manual rebuild and reconfiguration of the server and application, a simple cut-and-paste scenario, if you will. There are some advanced capabilities that help streamline a migration, but unfortunately they are not as simple as cut and paste. It is, however, as simple as preparing the on-premise machine by generating the proper configuration and data files, then packaging them and deploying them onto a disk, which can then be used for the Azure VM provisioning. That process is presented in Chapter 8. For now, just know that the primary tools are called Azure Site Recovery and Azure Migrate; more manual tools include AzCopy, Disk2VHD, SYSPREP, Microsoft Virtual Machine Converter (MVMC), Hyper-V, and VMware.
Azure Site Recovery and Azure Migrate are the tools you will use if you have a large enterprise solution with multiple tiers and many machines. That scenario would be one where the number of servers being migrated is too great to even consider a manual provision. The other mentioned tools are for smaller migration projects, where much of the work can be accomplished by performing manual tasks and is performed by a single person or a small team.
What happens if you realize you have placed your VM workloads into the wrong resource group, subscription, or region? Perhaps, if you realize it early in the deployment cycle, a simple delete and re-create won't have any impact, and that is the simplest and cleanest way to move the workloads. However, if you realize this after some significant configuration has been performed on a VM or there are now other resources with a dependency on a given VM, re-creating that VM isn't really an option. It's not an option because the impact would be too large. Moving the VM into another resource group is actually easy. As you recall, a resource group is only a logical grouping; there is nothing physical about it. On the Azure Portal on the Overview blade, the resource group to which the VM is associated is displayed. Directly after the words resource group there is a link named Change. Simply click that and change it. The same goes for the subscription, as shown in Figure 4.31.
There are three points to call out here, as shown in Figure 4.31. First, although the resource group did change, the VM's physical location did not. The VM was in CSHARPGUITAR-SN1-RG (South Central US), and I changed it to CSHARPGUITAR-DB3-RG (North Europe), but the physical location of the VM did not change. That does kind of make sense, in that there may be IT solutions that are global, and I'd want workloads in different regions in the same resource group because they play a part in the overall solution. However, the name of the resource group no longer makes sense. The second point is that only the VM was moved, and there are more components to a VM than just the host. For example, if you enabled boot diagnostics, there would be a storage account for the VM, and there is a network interface, possibly an NSG, a static public IP address, and the disks. You would want to move all the pieces and not leave them separated; you would lose oversight of them quickly. I hope you are realizing more and more how important it is to organize your resources and that there are some good features in Azure to achieve that. Lastly, the name of the VM cannot already exist in the resource group to which it is being moved. The unique key in a resource group is name + resource type, which means you can name everything the same as long as the type of resource is different. An Azure SQL instance, a Cosmos DB, a Storage account, and an Azure VM can all be named CSHARPGUITAR since they are different types of resources.
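That uniqueness rule, name plus resource type, can be sketched as a simple set-membership check. The type strings follow Azure's provider/type convention; the group contents here are made up for illustration:

```python
def can_add(existing, name, resource_type):
    """A resource group's unique key is (name, resource type):
    a candidate is allowed only if that pair is not already taken."""
    return (name, resource_type) not in existing

# Hypothetical contents of a resource group.
group = {
    ("CSHARPGUITAR", "Microsoft.Compute/virtualMachines"),
    ("CSHARPGUITAR", "Microsoft.Storage/storageAccounts"),
}

# Same name, different type: allowed.
print(can_add(group, "CSHARPGUITAR", "Microsoft.Sql/servers"))              # True
# Same name, same type: rejected.
print(can_add(group, "CSHARPGUITAR", "Microsoft.Compute/virtualMachines"))  # False
```

This is why a whole solution's worth of differently typed resources can share one memorable name inside a single resource group.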
Now we come to the more difficult scenario where you actually want to physically move the workload to another region or zone. There are two points to mention about such a move in this chapter: you cannot move from any region to any region, and Azure Site Recovery is the recommended tool for achieving that kind of move. In the previous chapter, we discussed the Azure network, and the concept of a geography was introduced. That is the boundary within which you can physically move an Azure VM. One of the reasons for that limit has to do with the sovereignty of the data or intellectual property running on your VMs. It is, in a way, a protective barrier so you cannot by mistake move restricted data into a location where its presence risks being prohibited. You know from the previous chapter that the regions of a geography are usually in the same area of the world. For example, West Europe and North Europe are in the same geography, and there may be less chance of breaching the General Data Protection Regulation (GDPR) when moving between those two regions than to any region outside Europe. Moving between South Central US, West Central US, and North Central US is also supported, for example. You'll learn more about that and more on Azure Site Recovery in Chapter 8.
We also discussed what an Availability Zone is in the previous chapter. Simply, it means Microsoft has your back, and all of your compute is not running in a single data center. For a given region, there are multiple data centers, either across the street or across town, and your data and workload are replicated across them, making sure that your solution remains available even if an entire data center experiences a catastrophic event. If you recall from EXERCISE 4.5, on the Basic tab of the Azure VM creation process there was a drop-down list named Availability Options. In the exercises so far, the value was left at the default because infrastructure redundancy has an associated cost, and we weren't ready to discuss it then. However, if you return and view the contents of the drop-down, you will find availability zone and availability set. Both are touched on in the next section; however, if after creating a VM with no infrastructure redundancy you decide you need it, you can move the VM into a zone using Azure Site Recovery.
Availability is a big deal. If you were to deploy a workload to the Azure platform and then the application experienced more unavailability than it had on-premise, some serious questions would be asked. In Chapter 2, we discussed what an SLA is, and it will be touched on again in Chapter 9, but we'll cover it a bit here because it has something to do with the topic at hand. Basically, if you deploy only a single VM, the SLA commitment for its availability is 99.9%, which translates to approximately 43 minutes of allowed downtime per month. Is that good enough? It depends on the criticality of the code running on the VM. If you need more, then you have to take some actions to add redundancy across availability sets and Availability Zones.
As you know, when you choose IaaS, the cloud provider is responsible for managing the hardware and infrastructure. Those components do sometimes experience transient outages or require maintenance. If you have only a single instance of your VM, then there will be downtime whenever the hardware or infrastructure updates a component that requires a reboot or inhibits traffic from arriving at or departing from your VM; for example, a hardware driver is updated or a memory module fails. If you create multiple instances of your VM, Microsoft has implemented two concepts, called fault domains and update domains, to help manage outages during transient or maintenance activities. When you place your VMs into those domains, the SLA increases to 99.95%, which allows about 22 minutes of downtime per month, less than half that of a single VM. There is a cost for the additional instance, but no cost for the domain feature. If you then go so far as to place multiple domain instances into multiple zones, the SLA increases to 99.99%, a very low amount of downtime at about 4.3 minutes per month. Again, it costs more, but the cost may be justified based on the workloads running on the VMs. You may be asking yourself, have we spoken about these domains? The answer is not really, but by the end of this chapter, you will create some and add some VMs to them. You should certainly know what an Availability Zone is because it was discussed in the previous chapter. You will also add VMs to an Availability Zone later.
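The downtime figures quoted above are easy to derive yourself. A quick calculation, assuming a 30-day month (the figures in prose are rounded):

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Monthly minutes of downtime permitted by an availability SLA."""
    total_minutes = days * 24 * 60          # 43,200 minutes in a 30-day month
    return total_minutes * (100 - sla_percent) / 100

for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes/month")
# 99.9%  -> 43.2 minutes/month
# 99.95% -> 21.6 minutes/month
# 99.99% -> 4.3 minutes/month
```

Each extra nine (or half nine) roughly halves or better the budgeted outage window, which is the arithmetic behind paying for availability sets and zones.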
In Chapter 3, I stated that more would be covered regarding availability sets. That will happen in the coming paragraphs, and it will be good to include scale sets in the discussion as well. The two terms read and sound alike; some might even think they are the same, but they are not. An availability set is like a DR environment or a primary/secondary architectural configuration, while a virtual machine scale set is a group of identically configured VMs. Both do have something in common: they benefit from fault domains and update domains. See Figure 4.32 for an illustration.
A fault domain is a grouping of VMs that share the same network switch and power source. An update domain is a group of VMs that can be rebooted at the same time after a maintenance activity has been performed. It would therefore be prudent to place VMs that shouldn't be rebooted at the same time into different update domains. The platform is intelligent enough to place primary and secondary VMs into different update domains on a best-effort basis. Both availability sets and VMSS provide fault domains: up to three for availability sets and a default of five for VMSS. Update domains, five by default, are spread across the fault domains. Finally, note that neither fault nor update domains will prevent downtime caused by application or operating system exceptions. Let's now get into a little more hands-on detail for both Azure products.
As mentioned just previously, an availability set is most useful for disaster recovery or primary/secondary environments. From a DR or failover perspective, consider a web farm: an internet/intranet application running on many different servers behind some kind of load-balancing device. The reason for having many servers running the application is redundancy. If you have only one server and it goes down, then your business is down. Therefore, you have numerous servers, each acting as a DR or failover instance of the others. The issue comes when there is a power outage or transient network issue. Fault domains prevent your servers from depending on the same power supply and network switches. When you create your VMs, you select the availability set to place them into, and the platform decides which fault domain each lands in.
Each VM in an availability set is uniquely named, and each can be configured and added to an application gateway or load balancer. Additionally, if you are running a managed SQL Server instance, you would want a secondary copy of the database in case of outage, with the primary and secondary placed so they do not share fault or update domains. Finally, if you were performing updates to your VMs, installing patches, or doing an OS upgrade, you wouldn't do it on all the machines and then reboot them all at the same time. Installing updates while maintaining availability and resiliency throughout the process is tedious; it is no small undertaking. But in the simplest context, you would install a patch on one machine and test to make sure all is well before moving on to the others. From a database perspective, you may want to fail over to the standby, making it the primary after the upgrade, and then upgrade the old primary, which becomes the secondary. The update domain concept is intended to help maintain availability during Azure hardware and infrastructure updates, not operating system or application upgrades. When you create an availability set in Exercise 4.10, you have the option to choose the number of fault domains, from 1 to 3, and the number of update domains, from 1 to 20. Figure 4.33 simulates the organization of five VMs into five update domains across three fault domains.
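Microsoft does not document the exact placement algorithm, but a simple round-robin model reproduces the kind of spread shown in Figure 4.33, with each successive VM landing in the next fault and update domain:

```python
def place_vms(n_vms, fault_domains=3, update_domains=5):
    """Round-robin VMs across fault and update domains; a simplified
    model of what the platform does on a best-effort basis."""
    return [
        {"vm": i,
         "fault_domain": i % fault_domains,
         "update_domain": i % update_domains}
        for i in range(n_vms)
    ]

for placement in place_vms(5):
    print(placement)
# VM 0 -> FD 0 / UD 0, VM 1 -> FD 1 / UD 1, VM 2 -> FD 2 / UD 2,
# VM 3 -> FD 0 / UD 3, VM 4 -> FD 1 / UD 4
```

Notice that no two of the five VMs share an update domain, and a single fault domain outage takes down at most two of them.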
Complete EXERCISE 4.10 to create an availability set and then add some VMs to it. Note that you can only add a VM to an availability set during VM creation. You cannot move a VM into an availability set after creation; it must be deleted and re-created. However, with what you know about images, disks, and backups, that shouldn't be a significant issue. Just keep that in mind.
In Figure 4.34, notice that of the two created VMs placed into the availability set, one is in fault domain 0, update domain 0, while the other is in fault domain 1, update domain 1. This means that I can have, for example, either a primary and secondary database or two web servers running the same application behind a load balancer on those machines while they are highly isolated from each other. Additionally, recall from step 3 in EXERCISE 4.10 that the drop-down allowed the choice between either availability sets or Availability Zones. When you choose Availability Zone, you will get the availability set replicated into each of the zones, which is like a redundancy within a redundancy. You get the realized redundancies from the domains within a single data center replicated to other data centers (zones) within the region. That is great! Lastly, 2,000 availability sets are allowed per region, per subscription, with a maximum of 200 VMs per availability set. That is a large number, but maybe your business will take off due to its high performance and availability, and you need that many. Let's hope so.
As mentioned already, a scale set is a group of identical VM instances that can scale out based on traffic and load. VMSS provides redundancy capabilities but for a different use case than availability sets. Instead of the primary/secondary or DR scenario, one VMSS benefit is automatically scaling out more instances of the VM based on load. As illustrated in Figure 4.35, the minimum and maximum number of VM instances can be set from 0 to 1,000; the default is a minimum of 1 and a maximum of 10. You will see this in practice when you complete Exercise 4.11. From an autoscaling perspective, the default setting is to scale out when the average CPU percentage across all instances is greater than 75%. When that happens, the platform creates another instance of your VM, up to the maximum of ten. If the average CPU consumption later drops below 25%, the number of instances is reduced one at a time until only one remains. This scaling rule can be changed after creating the VMSS.
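That default rule, scale out above 75% average CPU and scale in below 25%, one instance at a time within the configured bounds, can be sketched as follows. This is a simplified model; the real autoscale engine also applies measurement windows and cooldown periods:

```python
def autoscale(current, avg_cpu, minimum=1, maximum=10):
    """Sketch of the default VMSS rule: add an instance above 75% average
    CPU, remove one below 25%, always staying within the bounds."""
    if avg_cpu > 75 and current < maximum:
        return current + 1
    if avg_cpu < 25 and current > minimum:
        return current - 1
    return current

print(autoscale(3, 82))   # 4  (scale out)
print(autoscale(3, 18))   # 2  (scale in)
print(autoscale(10, 90))  # 10 (already at the maximum)
print(autoscale(1, 10))   # 1  (never below the minimum)
```

The dead band between 25% and 75% is what keeps the set from flapping, adding and removing instances on every small load swing.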
To get more knowledge about a VMSS, create one in EXERCISE 4.11. Before proceeding, however, you will need either an application gateway (Exercise 3.10) or a load balancer. Both of these were discussed in the previous chapter. For EXERCISE 4.11, an application gateway is utilized because we walked through the creation of one in the previous chapter.
In EXERCISE 4.11, we did not create public IP addresses for the instances that will live in the VMSS pool. The radio button for a public IP address per instance was left at the default of Off, since all the VMs will reside behind an application gateway and will not be directly accessible from the internet. Next, if you click the Instances navigation menu item, it will list the three VM instances created. Click one of the instances, and on the Overview blade you will see its location, which also includes the zone in which it resides. Since, per the instructions, you selected all three zones, each VM is placed into a different zone, which gives maximum redundancy. Note also that, behind the scenes, the concept of fault domains and update domains is at play, providing an even greater level of resiliency as the number of VMs increases.
On the Scaling menu item, you can modify the autoscale configuration created originally or manually scale to a static number of instances. Those decisions, like many others, depend on the requirements of the application. The Storage menu item is an interesting one and one of the nice benefits of a VMSS. Remember that you chose a custom image when you created the VMSS. This means that each of the three VMs that got created used the same image. The content and configuration of all three VMs are therefore identical, making the deployment of new instances of the image into the VMSS quick and easy, with no chance of an error caused by a wrongly configured VM. Before this kind of service existed, servers had to be built manually, and there were often scenarios where one of the servers in the pool didn't act like the others. It was hard to find which one it was and sometimes not possible to find out why it was misbehaving, so we just rebuilt from scratch and hoped we got it right the next time. That manual scenario would not be possible in an environment approaching the maximum of 1,000 VM instances per VMSS (600 of which can be based on the same image), per region, per subscription. At that scale, all instances must be identical, which is what VMSS achieves by using a single base image for all VM instances within it. Finally, if you decide you need a larger VM size, click the Size menu option and scale to a larger size, no problem.
You might be wondering about the dependencies an Azure VM host has on the disks attached to it, like the OS disk, data disk, and temp disk, as you might recall from Figure 4.20. As inferred earlier, the contents of a managed disk attached to an Azure VM are stored as a page blob in an Azure Storage container. Storage is covered in Chapter 5; however, the concepts relating to storage such as LRS, ZRS, and GRS have been touched upon already and are relevant here. To see why, execute this Azure CLI command from https://shell.azure.com/:
az disk show --resource-group <name> --name <disk-name>
Notice the output for
"sku": { "name": "Premium_LRS" }
and make a quick assessment of what that means from a redundancy perspective. LRS means the disk, although copied three times, is locally redundant: it has redundancy within the data center but not outside of it. The same goes for a VMSS. To confirm that, navigate to the Instances menu item on the VMSS blade, where you can find the name and the numeric
InstanceId
of the disk. To view the details of the disk attached to a VMSS instance, run this PowerShell cmdlet, replacing the <name> placeholders with your specific values.
(Get-AzVmssVm -ResourceGroupName <name> -VMScaleSetName <name> `
-InstanceId <#>).StorageProfile.OsDisk.ManagedDisk.StorageAccountType
That cmdlet dumps out the account type into which the disk is stored, in this case
Standard_LRS
, which means locally redundant storage (LRS). The important point is simply that you know this, and that your solution's dependencies also impact its availability. Considering that an availability set is redundant across the fault and update domains but local to one data center, you can instead choose to deploy the VM into Availability Zones and gain ZRS resilience in addition to the domain features. Lastly, the
Get-AzVmssVm
PowerShell cmdlet that ran previously was for a single managed disk on a single VM. Remember that in EXERCISE 4.11 you created a VM in three zones; therefore, if you ran that command on all three managed disks, it would still show
Standard_LRS
, but each one would be in a different zone.
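If you capture the JSON that az disk show emits, a short script can translate the sku value into what it means for redundancy. The sample document below is trimmed and hypothetical apart from the sku shape shown earlier:

```python
import json

# Trimmed stand-in for `az disk show` output; only the sku block matters here.
sample = '{"name": "csharpguitar_OsDisk", "sku": {"name": "Premium_LRS"}}'

REDUNDANCY = {
    "LRS": "locally redundant (three copies within one data center)",
    "ZRS": "zone redundant (copies spread across availability zones)",
}

def describe_sku(raw_json):
    """Split a sku like 'Premium_LRS' into its tier and a plain-English
    reading of the replication suffix."""
    sku = json.loads(raw_json)["sku"]["name"]
    tier, _, replication = sku.partition("_")
    return tier, REDUNDANCY.get(replication, replication)

print(describe_sku(sample))
# ('Premium', 'locally redundant (three copies within one data center)')
```

Running the same parse over each instance's disk in a zonal VMSS would show the same LRS suffix three times, one copy set per zone, exactly as described above.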
Chapter 2 covered security, but not with a specific focus on Azure virtual machines. Let's discuss that a bit more now. There are three aspects of security that need some extra discussion.
The Azure Portal, Azure PowerShell, Azure CLI, and REST APIs all provide great administrative capabilities. You already know the way to restrict who can do what: via RBAC and the Access control (IAM) menu item on the VM portal blade. Determining who can do what to which resource, at which level in the resource hierarchy, requires thought, design, and maintenance. I mean, at the click of a button, a VM can be deleted. That may not be so bad, knowing what you do now about images and how managed disks work; the VM can be recovered relatively quickly. But what if the disks or images get deleted by accident? You need to prevent that, and you should already know how to do that.
When you create the VM, part of the process is to create an administrator username and password. You need to protect them and make a conscious decision as to who gets the credentials. Instead of sharing the admin credentials, create specific accounts with a minimal set of permissions using a just-in-time access policy and monitor the connection activity using the Connection Monitor menu link on the VM blade in the portal. Microsoft doesn't back up the content of your disks on your behalf, but you can, and it's recommended. Microsoft simply lets you do with the provisioned resource as you wish while keeping it highly available. Once someone gets onto the server with mischievous intentions, if they have the credentials, then there isn't a lot you can do to prevent accidents or unwanted behavior. Removing files, replacing files, running some malware, or planting an exploit are things that you must be cognizant of when running your workloads. If you ever feel the credentials of a VM have been compromised, then there is a Reset password menu item on the VM blade. This is also helpful if you happen to forget it, which happens sometimes since you never write them down because that's a no-no. There is also a Security menu option that is an entry point into Azure Security Center (ASC). ASC helps monitor for activities that have been discussed in this paragraph.
From an application perspective, security depends greatly on your chosen identity provider and how it is implemented. You can, however, add safeguards like network security groups, an Azure Policy that enforces TLS 1.2 and HTTPS, and perhaps Managed Identity. All of these application-level security techniques are helpful in securing your application running on a VM. There is one last technique that we also covered in Chapter 2, and that is the concept of encryption of data at rest. This is important if the contents of the managed disks need to be encrypted due to the storage of very sensitive data, intellectual property, or the code that runs the application. To encrypt a managed disk, complete Exercise 4.12.
There you have it. After reading this section, you now know Azure VMs in some depth. You should feel confident going into a meeting or into the Azure Solutions Architect Expert exam that your knowledge is ready for use. To make sure, answer the following question, and then move on to learn about other Azure Compute products.
The answer is Azure Site Recovery, which is helpful not only for planning and recovering from a disaster but also for moving on-premise workloads to Azure VMs. SYSPREP and AzCopy are not useful for moving VMs, and Azure Migrate is more for planning the move versus doing the actual move.
Azure App Services are Microsoft's PaaS offerings. This product grouping offers Web Apps, API Apps, Mobile Apps, Web Apps for Containers (Linux), App Service Environments, and Azure WebJobs. PaaS removes the responsibility of operating system maintenance from the customer. In Chapter 3, I mentioned the concept of a sandbox and that Microsoft PaaS offerings operate within one. The sandbox is what restricts the actions that you and your code can perform. For example, no writing to the registry, no access to the Event logs, no out-of-process COM, and no access to User32/GDI32 are a few of the most impactful limitations. There is a complete list of the restrictions on GitHub, specifically at this location:
github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox
Additionally, supported framework runtime versions (for example, .NET, PHP, Python, Node, etc.), third-party drivers, cipher suites, registry settings, and root certificates are driven by the cloud service provider. This means, for example, that if you want to target .NET Core 3.0 or Python 3.8 and the runtime is not on the platform, then you cannot use it. When you decide to run your workloads on a PaaS offering, you gain value by delegating responsibility for the OS, such as patching, security, and OS upgrades. However, you lose some flexibility when it comes to making custom environmental configurations and targeting new runtime versions that include the newest technologies.
Azure App Services come in a variety of pricing tiers and sizes; the combination of those two attributes creates what is called an App Service Plan (ASP). From a pricing tier perspective, there are six different options, each having a different set of accompanying features. Free and Shared are the most economical offerings and are intended for testing and learning; you shouldn't run any production workload on them, as neither of those tiers comes with an SLA, and they have some strict metered capacity constraints. For example, your process can consume 60 CPU minutes per day in the Free plan and 240 CPU minutes per day in the Shared plan. This may be enough for some small test sites, but as already mentioned, it's not for production workloads. The other four tiers are Basic, Standard, Premium, and Isolated. Isolated is also called an App Service Environment (ASE), which is discussed later. Some features available in Basic, Standard, and Premium are presented in Table 4.10.
TABLE 4.10 Basic, Standard, and Premium Tiers
Feature | Basic | Standard | Premium |
---|---|---|---|
Disk space | 10 GB | 50 GB | 250 GB |
Max instances | Up to 3 | Up to 10 | Up to 20 |
Deployment slots | X | ✓ | ✓ |
Cloning | X | X | ✓ |
VNET Integration | X | ✓ | ✓ |
Autoscale | X | ✓ | ✓ |
Backup/restore | X | ✓ | ✓ |
Traffic manager | X | ✓ | ✓ |
Number of Apps | Unlimited | Unlimited | Unlimited |
Any of the tiers discussed in Table 4.10 execute in what is called dedicated mode. This means the application is running on its own VM and is isolated from other customers. This is not the case in Free and Shared modes. In addition to the different feature limits available per tier, they also come in three sizes: Small, Medium, and Large. Basic and Standard VMs currently run on A-series sizes where:
The Premium tier, on the other hand, runs on Dv2-series VMs and comes with the following allocated compute resources.
A reason I call out the sizes is that there is an important restriction based on the size of the chosen VM. It is important because some common bad coding patterns cause trouble for applications running on an App Service; those patterns will be discussed in Chapter 7. The specific restriction is the number of concurrent outbound connections. There is no limit on the overall number of outbound connections; the limit, and the key word here, is concurrent. In other words, how many open, in-scope connections is the application currently holding? The limit is 1,920, 3,968, and 8,064 per instance for Small, Medium, and Large, respectively. Per instance means that if you have two instances of a Medium, you can have a total of 7,936 concurrent outbound connections, which is 2 × 3,968.
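The per-instance arithmetic can be captured in a few lines. This is a minimal sketch, not an Azure API; the limits come straight from the figures above:

```python
# Per-instance concurrent outbound connection limits for App Service
# VM sizes, as described in the text above.
CONNECTION_LIMITS = {"Small": 1920, "Medium": 3968, "Large": 8064}

def max_concurrent_connections(size: str, instances: int) -> int:
    """Total concurrent outbound connections = per-instance limit x instance count."""
    return CONNECTION_LIMITS[size] * instances

print(max_concurrent_connections("Medium", 2))  # 7936, matching the example above
```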
Recall some of the options covered in Table 4.10. Let's discuss them a bit more, specifically these options:
The documented maximum number of instances can be increased if your workload is running in the Premium plan and you need more than 20. Instances were also just mentioned regarding the number of concurrent outbound connections. An instance is a dedicated VM that is running your application workload. You can increase and decrease the number of instances using an Autoscale rule, as discussed previously for IaaS Azure VMs, or you can perform the scale-out or scale-in manually. The technique to achieve such scalability with App Services is similar to a VMSS, where the image is stored in a central location and new instances are built from it as needed. The difference with an Azure App Service is that instead of an image, the application source code and its configuration are stored in an Azure Storage container and used when adding a new instance into the pool. An image is not necessary because you are running on PaaS and the operating system configurations are the same on every VM.
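To make the Autoscale idea concrete, here is a tiny threshold-based rule in the spirit of the ones you configure in the portal. The thresholds, bounds, and function name are illustrative assumptions, not Azure defaults:

```python
def autoscale_decision(cpu_percent: float, instances: int,
                       scale_out_at: float = 70.0, scale_in_at: float = 30.0,
                       min_instances: int = 1, max_instances: int = 20) -> int:
    """Return the new instance count for a simple threshold-based scale rule.

    Scale out by one instance when CPU is high, scale in by one when CPU is
    low, and otherwise leave the count alone. All thresholds are hypothetical.
    """
    if cpu_percent > scale_out_at and instances < max_instances:
        return instances + 1
    if cpu_percent < scale_in_at and instances > min_instances:
        return instances - 1
    return instances

print(autoscale_decision(85.0, 3))  # 4: high CPU triggers a scale-out
```

A real Autoscale rule also adds cool-down periods so that one burst of traffic does not cause the instance count to oscillate.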
Recall from Table 4.10 that the number of apps per ASP is unlimited. Nothing is really unlimited; it just means that there is no coded limitation on it, like there is for Free (10) and Shared (100). Each time you create a web app, for example, you are effectively creating a website that will run in its own process within the ASP. Each website gets its own hostname, such as *.azurewebsites.net, which you already know since you created a few in previous chapters. The limitation isn't on the number of apps but on the amount of compute resources your applications require plus the number of instances per pricing tier. If you can get 1,000 websites onto a one-CPU machine with 1.75GB of RAM and keep them responsive, great, but I'd find that unlikely. You would need to use good judgment about which apps need how much compute and then balance them across different App Service Plans and instances. This leads nicely into deployment slots, which are test instances of an app on the same ASP. When you create a slot on an ASP for a given app, you are effectively creating a new process to run a newer version of that same app. In Chapter 9, we will cover deployments, but it is safe to write here that deploying straight into production is risky. Deploying to a deployment slot avoids doing that.
Take, for example, a web app named csharpguitar.azurewebsites.net for which you create a deployment slot named test. A new web app will be created named csharpguitar-test.azurewebsites.net. You can then deploy a new version of the csharpguitar web app to the test site and test it. Once you are sure all things are working fine, you can do what is called a slot swap. This means the test site becomes production, and production becomes the test. This happens by switching which process responds to requests made to each hostname. Both processes and hostnames remain alive, but new requests start flowing to the new version once the slot swap is performed. Finally, cloning, which is like a backup, is a useful feature. It is also helpful in debugging and trying to find the root cause of an issue. From a debugging perspective, some logging or debugging techniques can have negative impacts on application performance and stability. Making an exact replica of the environment and placing it someplace else gives you more options when troubleshooting issues. Cloning is also helpful if you want to move your application to another region. From the portal you can move an App Service Plan between resource groups and subscriptions; however, moving to another physical region is not possible via a click activity in the Azure Portal. Cloning will help you achieve that if that is your objective.
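A slot swap is easiest to picture as a routing table whose hostnames never change while the processes behind them are exchanged. The following is purely a conceptual model (the process names are hypothetical), not how the platform implements it:

```python
# Conceptual model of a slot swap: hostnames stay fixed; which deployed
# process answers each hostname is exchanged.
routing = {
    "csharpguitar.azurewebsites.net": "v1-process",       # production
    "csharpguitar-test.azurewebsites.net": "v2-process",  # test slot
}

def swap_slots(table: dict, prod_host: str, slot_host: str) -> None:
    """Swap which process serves the production vs. the slot hostname."""
    table[prod_host], table[slot_host] = table[slot_host], table[prod_host]

swap_slots(routing, "csharpguitar.azurewebsites.net",
           "csharpguitar-test.azurewebsites.net")
print(routing["csharpguitar.azurewebsites.net"])  # v2-process now serves production
```

Because only the routing changes, a bad deployment can be "swapped back" just as quickly.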
An Azure App Service web app is a popular entry point onto the Azure platform from a PaaS perspective. There are few to no barriers to entry, and you can be up and running with an internet presence in minutes. That is an amazing feat when you reflect back 15 years or so on the amount of effort it once took to achieve the same thing. Then, if required, you can scale out to 400 CPUs in a few additional minutes, which would once have been inconceivable. Let's do a few things with web apps now. If you do not remember how to create an Azure App Service, please refer to Exercise 3.8 in the previous chapter. In Exercise 4.13, you will enable Azure Active Directory authentication for an Azure App Service Web App. This feature, named Easy Auth, was introduced in Chapter 2.
You certainly noticed the other authentication providers such as Microsoft, Facebook, Google, and Twitter. As easily as you enabled AAD for authentication on your site, you can do the same using those other providers. Note that this is for authentication only, which means we can only be sure your visitor is who they say they are; it doesn't provide any information about what they can do on the site. That is more application specific and is the authorization part, which we won't cover in more detail here. One last note: in step 1 of Exercise 4.13, the action taken was to log in with Azure Active Directory. This means that no one can access your site without an identity that can be validated. It is common to instead select the other option, Allow Anonymous, and then let the visitor create a profile on your site. Once the account is validated, the user can have access to other protected features of your site. However, that kind of accessibility requires code in your application to check for the validation tokens. What you did in Exercise 4.13 simply wraps the entire application with a security protocol; if more granular or more precise authorized access is required, it will require some code.
In Exercise 4.14, you will configure Hybrid Connection Manager (HCM) to connect to an IaaS Azure VM in a VNET. Instead of using an Azure VM, you can also use an on-premise machine. This is a simple method for connecting a Web App to an on-premise machine or an Azure VM in a virtual network. HCM uses ports that are typically open for allowing internet traffic, so in most cases this configuration works quickly.
In summary, an Azure App Service web app is a place to run internet applications using Microsoft's PaaS cloud service. It has limited barriers of entry, and you can deploy, scale, and consume at a very fast pace. There are two other products that run on the same PaaS platform as a web app; they are API apps and mobile apps. A brief description of them follows.
API Apps, not to be confused with API Management discussed in the previous chapter, is a product offering that correlates more closely to a Web API than to a web application. In short, an API is an interface that exposes one or more methods that can be called from other applications or consumers; there is no graphical user interface (GUI) with an API. API Apps fully support cross-origin resource sharing (CORS) and Swagger.
When you create an API app, you follow the same process as when creating a regular web app. The difference is that on the Web App blade there is a navigation item named API definition that is used for the configuration and implementation of Swagger. Swagger has numerous capabilities, but its most notable feature is making the API app easy to discover and consume. As you may know, one of the difficulties in consuming an API is figuring out the names of all the methods it exposes, the required parameters for them, and the kind of authentication protocol required to communicate with the API. Swagger, when configured, provides this information and greatly simplifies consumption. There is also a navigation menu item named CORS, which is used for identifying allowed origins. CORS is a security feature that makes sure that when code (for example, JavaScript) is running in your browser, any outbound calls made from within that code are allowed. There have been cases where scripts maliciously injected into client-side JavaScript download some snippet of malware. CORS is implemented with the prevention of such injections in mind.
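Conceptually, the allowed-origins list you configure in the CORS blade drives a simple server-side check: the browser sends an Origin header, and the server only echoes back an Access-Control-Allow-Origin header when that origin is on the list. A minimal sketch of that check (not the platform's actual implementation) looks like this:

```python
def is_origin_allowed(origin: str, allowed_origins: list) -> bool:
    """Return True when a request's Origin header matches the configured
    allowed-origins list. '*' allows every origin (use with care)."""
    return "*" in allowed_origins or origin in allowed_origins

allowed = ["https://www.contoso.com"]  # hypothetical configured origin
print(is_origin_allowed("https://www.contoso.com", allowed))  # True
print(is_origin_allowed("https://evil.example", allowed))     # False
```

The browser, not the server, ultimately enforces CORS by blocking the response when the header is missing; the server's job is only to emit it for trusted origins.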
Some years ago, Mobile Apps was its own product; today it runs on the same platform as web apps and API apps. It will soon be retired, but it's worth mentioning because of its historical significance. When Microsoft was making its push into the mobile world, Mobile Apps was a central point of that effort. It was planned to be the backend for all the apps that would be published to the Microsoft Store. Well, we all know how that went, which is why there is no light at the end of this particular tunnel. Regardless, you create a mobile app like you would an app service and look in the navigation menu for Easy Tables and Easy APIs. Both of those made it quick and easy to set up the backend database or dependent APIs for the mobile app. I expect the feature to be removed, but I am confident it will come back in another shape or form in the next few years. That is fairly common behavior (i.e., retire something and then bring it back with a new look and marketing campaign).
Yes, you can run Linux on PaaS. The version of Linux that Web Apps runs on is Ubuntu Server 18.04 LTS. However, this isn't a showstopper because you can also run Web Apps in a container, which allows you to choose the targeted OS. To create a web app that targets Linux, you begin the same as you have previously, paying special attention to the Publish, Runtime Stack, and Operating System attributes on the Web App creation blade. Figure 4.44 shows how that looks.
The first attribute is Publish, which provides the option for code or a Docker container. If you choose Code, it means you will receive a VM that you can then deploy your application code to. The other option is Docker Container, which you should already be an expert in as it was covered already in this chapter. If you choose Docker Container, the Runtime drop-down goes away, and your only option then is to choose Linux or Windows. Choosing a Docker container would result in a VM being allocated to run the container that you create locally or download from Docker Hub, for example.
The contents of the Runtime Stack drop-down are a list of supported, for lack of a better analogy, common language runtimes (CLRs). The runtimes are the frameworks and libraries of different programming languages, in the different versions available on the platform for your application to target. There are many of them; Table 4.11 displays some of the supported runtimes. For the most up-to-date list, check https://docs.microsoft.com/en-us/azure/app-service/containers/app-service-linux-intro#languages.
TABLE 4.11 Supported Open Source Languages
Language | Versions |
---|---|
.NET Core | 1.0, 1.1, 2.0, 2.1, 2.2 |
ASP.NET | 3.5, 4.7 |
Java | 8, 11, SE, Tomcat 9, 8.5 |
Node | 4.x, 6.x, 8.x, 10.x |
PHP | 5.6, 7.x |
Python | 2.7, 3.6, 3.7 |
Ruby | 2.x |
If a version of the runtime, or the programming language itself, is not supported, then you can simply create a Docker container to run the workload on PaaS. An interesting behavior you may notice is that when you select, for example, Node or PHP from the Runtime Stack drop-down, the Operating System selection automatically changes to Linux. Even though you can run Node and PHP on a Windows OS, the portal is steering you toward Linux. The behavior is similar for .NET Core: even though it can run on Linux, selecting it defaults the operating system to Windows.
I have one last point to make about Web Apps for Containers, and it is an important one. At the beginning of this section I mentioned the sandbox that constrains the configuration you can perform when running your application with Azure App Services. With Web Apps for Containers, this is no longer the case. Whether you run your application on Windows or Linux, you can create a Docker container and deploy it to the Azure App Service environment. Within the Docker container you can make any modification you desire: registry changes, cipher suite reordering, whatever. The biggest hurdle for most, as I mentioned in the container section, is that this capability is relatively new, and many are reluctant to consume it for that reason. However, for smaller workloads with a less mission-critical role in the IT solution, you should give this a try. It is time to start learning this, and you learn best by consuming, maintaining, and troubleshooting it.
An App Service environment, aka the Isolated tier, is an instance of the entire Azure App Service capability inside a virtual network accessible to only one entity. When running in other tiers such as Basic, Standard, and Premium, i.e., in a multitenant stamp, although your VMs are dedicated, there are other customers who use some of the shared resources that provide the App Services product. For example, the front ends that load balance and direct requests to the correct web app or to one of the multiple instances of the web app are shared resources in Basic, Standard, or Premium. If you do not want or cannot share resources with other tenants, then you can choose the Isolated tier and get all the shared resources for yourself. You never share VMs running your application in Basic, Standard, or Premium; they are dedicated to you.
There are two benefits that come to mind when running in an ASE that do not exist in other tiers. The first is that you can implement an internal load balancer (ILB), which prevents direct access to the ASE from the internet. With an ILB, the IP address and the internet endpoint (i.e., *.azurewebsites.net) are not globally accessible and require a VPN gateway connection for access. In addition, the reordering of cipher suites is allowed with an ASE. A cipher suite is an encryption algorithm used for encrypting TLS connectivity between a client and server. Cipher suites are added to an OS in a specific order, and the first match is the one that the client and server agree to use. Sometimes a customer wants to use a stronger or different cipher for any number of reasons. An ASE allows the reordering; other tiers do not. Table 4.12 shows the feature limits for the ASE/Isolated tier.
TABLE 4.12 ASE/Isolated Tier Limitations
Feature | Isolated Limit |
---|---|
Disk space | 1 TB |
Max instances | Up to 100 |
Deployment slots | ✓ |
Cloning | ✓ |
VNET integration | ✓ |
Autoscale | ✓ |
Backup/restore | ✓ |
Traffic manager | ✓ |
Number of Apps | Unlimited |
The VMs are a dedicated Dv2-series with the following specifications:
Keep in mind that with ASEs you are getting a few more servers that run the other Azure App Service components that are typically shared in the multitenant environment. Those additional servers and the additional benefits come with an additional cost.
This product is most common for large enterprises or customers who have some stringent security requirements and need or desire to run PaaS within their own isolated VNET.
Azure WebJobs is a feature strongly linked to Azure App Services that supports the execution of background tasks. This is similar to the older concept of batch jobs. Batch jobs are typically programs that run at scheduled intervals to process data or to read from a queue and perform some action based on its contents. They typically have filename extensions like .cmd, .bat, .exe, .ps1, and .js, which the operating system recognizes as executable programs. There are two types of WebJobs: triggered and continuous.
A triggered WebJob is a program that is scheduled to run using a scheduler scheme named CRON, or it can also be run manually. A common format of CRON is illustrated in Figure 4.45. For example, a CRON schedule of 0 15 8 * * * would run every day at 8:15 a.m.
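The WebJob CRON format uses six fields, {second} {minute} {hour} {day} {month} {day-of-week}, which is one field more than classic Unix cron. A tiny helper (illustrative only) makes the field positions easy to verify:

```python
# Label the six fields of a WebJob-style CRON expression.
def describe_cron(expression: str) -> dict:
    fields = ["second", "minute", "hour", "day", "month", "day_of_week"]
    return dict(zip(fields, expression.split()))

schedule = describe_cron("0 15 8 * * *")
print(schedule["hour"], schedule["minute"])  # 8 15 -> every day at 8:15:00 a.m.
```

Reading right to left, the wildcards say "any day of week, any month, any day," and the literal fields pin the time to second 0, minute 15, hour 8.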
There are two common ways to manually trigger an Azure WebJob; the first is from within the Azure Portal. As stated, a WebJob is tied into an Azure App Service, meaning an Azure App Service is required to run a WebJob. To manually run a WebJob, access the Azure App Service where the WebJob is located, select the WebJobs link from the navigation menu item, select the WebJob, and click the Run button. The WebJob blade looks something like that shown in Figure 4.46.
The other means of manually triggering a WebJob is using the WebJob API. The WebJob is hosted on the Azure App Service platform, which exposes a global endpoint; this means the WebJob is globally accessible as well. When Kudu was mentioned and you accessed it via the Advanced Tools menu item, you may have noticed the URL you were routed to: it was the name of the web app followed by .scm.azurewebsites.net. If you append /api/triggeredwebjobs/<WebJobName>/run and access the URL, the WebJob will be manually triggered via the WebJob API. A final example might resemble the following, where * is the name of your web app:

https://*.scm.azurewebsites.net/api/triggeredwebjobs/Exception002/run
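Building that URL programmatically is straightforward. This sketch only constructs the endpoint; a real invocation would be an authenticated POST using the site's deployment credentials, which is deliberately left out here:

```python
def webjob_trigger_url(app_name: str, webjob_name: str) -> str:
    """Build the Kudu endpoint for manually triggering a triggered WebJob."""
    return (f"https://{app_name}.scm.azurewebsites.net"
            f"/api/triggeredwebjobs/{webjob_name}/run")

# Using the example names from the text:
url = webjob_trigger_url("csharpguitar", "Exception002")
print(url)
# A real call would POST to this URL with deployment credentials attached.
```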
On a side note, this API is one of the foundational technologies on which Azure Functions is built. Azure Functions is discussed later, and you will read briefly about a Timer-triggered Azure Function; just note that the WebJob SDK is a fundamental component of Azure Functions, though that lineage may not be recognizable without some deep technical review.
The other type of WebJob is one that runs continuously. This is useful for scenarios where you process orders or execute reports in the background but in near real time. Capturing a customer order is something that you always want to make easy, which is why you want to do as little as possible when the place-order button is clicked. Perhaps simply place the order details into a queue; in other words, do a simple validation and then a simple insert into a data source. Then you can have the WebJob continuously monitoring that queue and processing the orders offline. If there are any errors, notify an administrator to correct them and get the order reprocessed. Once the order is reprocessed, the WebJob can send the customer an email notification. Another scenario is running large reports. Many reports take a long time to process, and running them in real time can result in timeouts and no results at all. Again, you could save the query parameters in a queue that a WebJob is monitoring and let the WebJob process the report. Batch or offline processes typically run longer than real-time web requests, which have built-in timeouts. Once the report is complete, a link to the output is emailed to the user. Easy.
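The order-queue pattern just described can be sketched with an in-memory queue. In production the queue would be durable (for example, an Azure Storage Queue or Service Bus) rather than in-process, and the function names here are hypothetical:

```python
import queue

orders = queue.Queue()  # stand-in for a durable queue service

def place_order(order: dict) -> None:
    """Web tier: validate minimally, enqueue, and return to the user fast."""
    if "item" not in order:
        raise ValueError("invalid order")
    orders.put(order)

def process_pending(processed: list) -> None:
    """Body of a continuous WebJob loop: drain the queue, process offline."""
    while not orders.empty():
        processed.append(orders.get())  # real code would ship, bill, email...

place_order({"item": "guitar", "qty": 1})
done = []
process_pending(done)
print(len(done))  # 1 order processed in the background
```

The key design point is the split: the request path does only validation plus one enqueue, while all slow work happens in the background loop.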
There is a configuration that needs to happen on the Azure App Service to run continuous WebJobs: you must enable a capability called Always On. There is a mechanism that conserves resources by shutting down websites that haven't been used in 20 minutes. This shutdown feature can be disabled by enabling Always On, which is found on the Configuration menu item for the Azure App Service. Another point worth considering is what happens if you run multiple instances of your Azure App Service. Does the WebJob run continuously on every instance? The answer is yes, by default it does. However, you can configure the WebJob as a singleton by creating a file named settings.job and placing it in the root directory where the WebJob is located. The contents of the settings.job file are as follows:

{ "is_singleton": true }
Instead of creating the settings.job file manually, if you configure the WebJob from the portal using the + Add button, as shown in Figure 4.46, when you select Continuous as the type, you have the ability to select Multi Instance or Single Instance from the drop-down, as shown in Figure 4.47.
Figure 4.47 also illustrates a view of how + Add Capability looks when Triggered is selected.
In the previous section, we touched on batch jobs, so you should have some idea at this point of what they are. However, in that context we were talking about a maximum of 20 to 30 cores and perhaps 7 GB to 14 GB of memory performing a serial execution of a specific task. Let me now introduce you to Azure Batch, which provides you with the tools to run processes in parallel on a large scale. Azure Batch lets you connect thousands of compute instances across multiple graphical processing units (GPUs) that provide low latency and extremely powerful processing when compared to CPU-only architectures. Tasks such as dynamic image rendering and visual simulations with GPUs, along with Big Data scientific and engineering scenarios, are a few examples of workloads that run using Azure Batch and High-Performance Computing (HPC) concepts. Those workloads can run on either Windows or Linux, using coding languages like R, Node, PHP, Ruby, Python, Java, and .NET.
There are numerous HPC models, two of which work optimally with Azure Batch: InfiniBand and Remote Direct Memory Access (RDMA). As shown in Figure 4.48, the InfiniBand architecture provides a matrix-like structure to group together massive, and I mean massive, compute power. The compute power can come from a mixture of CPU, GPU, and XMC processors.
Using Azure Batch, the RDMA architecture, as shown in Figure 4.49, illustrates the sharing of memory between a pool of nodes.
Notice how the access to memory across nodes happens over an Ethernet connection. Both InfiniBand and RDMA support what is called cloud bursting. Recall from the previous chapter that we discussed the concept of a hybrid network, and one use of that hybrid model was to handle extra traffic or unexpected growth when necessary. There is a similar model that allows you to cloud burst from an on-premise HPC architecture to Azure if you happen to run out of, or want more, compute power. The connectivity works via both ExpressRoute and a VPN Gateway connection. Finally, let's not fail to mention that if you want a supercomputer, Azure offers the ability to place a dedicated Cray computer into one of your private virtual networks so you can crunch and crack those weak encryption algorithms in less than two months. You might be tempted to do that just for fun, but I couldn't even find a price for it, so if costs matter, you might want to consider this as simply something cool for NASA or other government organizations to consume.
Let's start off with creating an Azure Batch account in Exercise 4.15.
Before you attempt the online simulation, let's cover a few things. First, what is the difference between the Batch service and the Subscription service that you saw on the Advanced tab? In most cases, you would use the default Batch service, which masks most of what is going on with compute provisioning behind the scenes. In general, it simplifies the consumption and execution of workloads by handling the server pools for you. If you want more control over the server pools or want to utilize Azure Reserved VM Instances, choose the Subscription service. Next, take a look at Figure 4.50 to see more about how Azure Batch and HPC work.
An application is the code and data that you are tasking Azure Batch to process. In Exercise 4.15, the source and data are accessible on GitHub and used in the Azure Portal when creating the application within the Azure Batch blade. The pool, as shown in Figure 4.50, is the group of compute nodes. Recall from earlier where you read about InfiniBand and RDMA; the difference between those models is that one is based on massive compute processing power and the other offers massive amounts of memory. Prior to creating your pool, you need to consider which kind of model your HPC workload requires. Consider Figure 4.51.
Figure 4.51 symbolizes the different kinds of HPC, where some workloads perform computational activities that require large CPU power for simulations and analytics, while other data-analysis programs parse large databases that have been loaded into memory, requiring lots of RAM. If the workload processes images for gaming or real-time image generation, then it would need large amounts of GPU. There could also be a scenario where you need a solution that benefits from both CPU and GPU with a smaller amount of RAM. As shown in Figure 4.52, there is a large variety of options when choosing the kind of VM nodes to place into your pool.
When you run your workload, you do not have to worry about the amount of resources in the pool. When you create the pool in the portal, there is a scale section where you can create a customized scale algorithm to scale out and in with. When I compare VMSS with Azure Batch, one difference is that Azure Batch does not need a base image, and its selection list includes VMs with much higher resource power. It boils down to the reason you need the compute power: if you are doing intensive graphics simulations or Big Data analysis, use Azure Batch; if you are running an enterprise web application or another workload type, use VMSS or even App Services. Also, in Figure 4.50 there were tasks and a job, which is where you specify what method or activity in the application runs in which pool.
Storage is covered in detail in the next chapter. As you can imagine, images, movies, Big Data, and other kinds of files that get processed by Azure Batch will likely be huge. The space required to store them and the latency you get when retrieving them is an important aspect when implementing a workload using the Azure product. Azure-supported storage products include VMs that are storage optimized (for example, the Ls series), if your workload will store large amounts of data on the physical VM. Also, blob, table, and queue Azure storage containers and Azure files all can be utilized for storing the input and output for your Azure Batch HPC workloads. Let's not forget about a DBMS like SQL Server or Oracle, which can be used to store and retrieve data in such scenarios.
Access the Azure Portal, and in the search textbox at the top of the browser enter Marketplace. Then do a search for HPC, which will result in numerous preconfigured HPC solutions that can be created, configured, and utilized fast. You might consider taking a look through these to see whether they can meet your requirements prior to investing the time required to build a custom HPC solution.
Many details of Azure Functions were touched on in Chapter 1, so review that if you need a refresher. Azure Functions is Microsoft's FaaS product offering and is also commonly referred to as a serverless computing model. One of my colleagues once said, “If you have two servers and you take one away, what do you have? You have a server less.” Although that is indeed true and kind of funny, it isn't what serverless means in regard to Azure Functions. What it means is that when your Azure Function workload is not running, it is not bound to any compute resource and therefore not costing you anything. The real beauty of serverless is its cost effectiveness. In reality, the product is free, yes, free, as long as you run in Consumption mode and stay under 1,000,000 executions and 400,000 gigabyte-seconds of compute. A gigabyte-second (GB-s) is a measure based on memory consumption multiplied by the number of seconds the Azure Function ran. That is a lot of free compute, more than enough to get your hands dirty with and even to run some smaller production workloads. When your Azure Function is invoked/triggered, a feature called the Scale Controller notifies the Azure Function that it has work to do, which results in real-time provisioning of compute capacity, the configuration of that compute resource, and the execution of the Azure Function code that you have deployed. As illustrated in Figure 4.53, the Scale Controller monitors supported message sources, and when a message arrives, it spins up a VM, configures it, and puts the code on the VM. The Azure Function then retrieves the message and processes it as the code dictates.
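You can sanity-check whether a workload fits inside the free grant with simple arithmetic. This sketch ignores the minimum-memory rounding that Consumption-plan billing applies in practice:

```python
def gb_seconds(memory_gb: float, duration_s: float, executions: int) -> float:
    """Compute usage = memory consumed (GB) x execution time (s) x count."""
    return memory_gb * duration_s * executions

# 1,000,000 executions, each using 0.25 GB for 1 second:
usage = gb_seconds(0.25, 1.0, 1_000_000)
print(usage, usage <= 400_000)  # 250000.0 True -> inside the 400,000 GB-s grant
```

So a function at the 1,000,000-execution cap still fits the compute grant as long as each run averages under 0.4 GB-s.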
When I spoke to one of my technical friends about serverless, the first question was about the latency of going from 0 to 100; that is, how long does it take for the Scale Controller to get hardware provisioned, the Azure Function configured, and the function executed? I won't pretend that there is no latency, but it occurs only on the first invocation, and the function then remains warm for 20 minutes before it times out (using a Dedicated or Premium hosting plan, you can avoid this timeout and keep the instances warm). The internals of how it all works are proprietary and won't be shared, but there are some steps you can take if you want to improve the cold start of an Azure Function.
There are two ways to make the initial Azure Function invocation faster. The first is to deploy your code using something called run from package. This basically requires that you publish your code as a ZIP file. When you “zip” a file or a group of files, it compresses them and makes them smaller. As you would then imagine, when the code is deployed to the VM, the deployment happens faster than if you have many files in uncompressed form. Run from package is enabled using an App Setting that you will set in a later exercise, and it has much more impact when running in the Consumption hosting plan than in Dedicated. The reason the impact is smaller in Dedicated is that the VM provisioned to run the Azure Function code doesn't get rebuilt after 20 minutes, which means the code is copied only once. Take note that since the zip capability discussed here is the Windows variety (in contrast to .7z, .tar, .war, etc.), this feature is supported only on Windows. The second way to reduce latency for the first Azure Function invocation after a shutdown is to not let it shut down in the first place. Implementing the second approach requires you to run on a Dedicated or Premium hosting plan instead of Consumption. When running in Dedicated or Premium mode, there is a general configuration setting named Always On, which, as the name implies, keeps the Azure Function warm by disabling the 20-minute timeout threshold.
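Run from package publishes the app as one compressed archive rather than many loose files. A minimal sketch of building such a package follows; the folder and file names are made up for illustration, and the App Setting that enables the feature is left to the later exercise.

```python
# Sketch: compress an app folder into a single deployment ZIP, the shape
# that "run from package" expects. Folder and file names are invented.
import os
import tempfile
import zipfile

def build_package(src_dir: str, zip_path: str) -> int:
    """Compress every file under src_dir into one ZIP; return its size."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the app root inside the archive.
                zf.write(full, os.path.relpath(full, src_dir))
    return os.path.getsize(zip_path)

# Build a throwaway package from a temp folder holding one file.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    with open(os.path.join(src, "host.json"), "w") as f:
        f.write('{ "version": "2.0" }')
    print(build_package(src, os.path.join(out, "package.zip")) > 0)  # -> True
```

Because the platform copies one compressed file instead of walking a file tree, the deployment step during a cold start finishes faster.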
The remainder of this section will discuss hosting plans, triggers and bindings, runtime versions, and supported languages.
In Chapter 1, I shared two hosting plans, Consumption and Dedicated. There is another, called Premium, which won't be on the Azure Solutions Architect Expert exam because it is in preview. I will include Premium in the following text, but keep in mind that it won't be necessary to know for the exam. The Consumption hosting plan is the one considered the serverless model; when you operate in any of the other plans, you are still running an Azure Function, but a VM or a group of VMs is actively provisioned and bound to the workload. Both Dedicated and Premium have a fixed cost, but the primary difference between them is the existence of the Scale Controller. In Dedicated, you are using the scaling capabilities that exist for an Azure App Service, which were just discussed. That scaling can be automated using that product's features, but you have to manage it; you can also manually scale out and in. However, in both Consumption and Premium, the Scale Controller manages the scaling out and in of compute resources based on numerous intellectual property algorithms and concepts. Table 4.13 lists additional limits and differences based on hosting plan. Please note that the Dedicated hosting plan is often referred to as the App Service plan.
TABLE 4.13 Hosting Plan Limits
Feature | Consumption | Dedicated | Premium |
---|---|---|---|
Storage | 1 GB | 50 to 1,000 GB | 250 GB |
Maximum Apps per plan | 100 | Unlimited | 100 |
Maximum memory | 1.5 GB | 1.75 to 14GB | 3.5 to 14GB |
Default timeout | 5 minutes | 30 minutes | 30 minutes |
Maximum timeout | 10 minutes | Unlimited | Unlimited |
Maximum number of instances | 200 | 10 to 20 | 20 |
From a storage perspective, note that when running in Consumption mode you are allocated 1GB of storage, which is where you can store the files (source, configuration, log, etc.) required to run and manage your application. Additionally, the content of an Azure Function running in Consumption mode is placed into Azure Files, which is unique to this mode. When running in Dedicated or Premium mode, your content is stored in an Azure Blob container. Both of those storage products are discussed in the next chapter; they behave differently, and over time it was found that Azure Functions perform better with Azure Files when fewer files need to be retrieved from the storage source, which is where the run from package feature came from. The differences in storage limits for Dedicated mode (i.e., 50 to 1,000GB, shown in Table 4.13) exist because you choose among different tiers of VM to run in that mode, such as small, medium, and large, each having its own set of limits. The same applies to the amount of memory, where each tier has a specific allocated amount; for memory, this applies to Premium mode as well. An Azure Function in Consumption mode has a limit of 1.5GB for the Function app; if more is required, you can move to one of the other plans to get more allocated memory.
The maximum apps per plan means that, since a plan is synonymous with a process (EXE), you can have from 100 up to an unlimited number of Function apps running on the VM. Remember that you can have multiple Functions per Function app. Also remember that nothing is truly unlimited, but there are large amounts of compute at your disposal. Realize that in Consumption mode you run on a single processor, so would you really put workloads with massive compute needs there? You would consider another hosting plan or compute product if that were the case. The maximum number of concurrent instances, where an instance is a VM that runs your Azure Function, is also limited by the hosting plan. There is some major compute power there; consider that Premium, with a maximum of 20 EP3-tier VMs, is massive. Finally, there is a timeout duration for the execution of an Azure Function. With Azure Functions there are two JSON configuration files, function.json and host.json. The function.json file is where the bindings and triggers are configured for a given Function within a Function app; you'll learn more about triggers and bindings in the next section. The host.json configuration file contains options that are applied to all the Functions in the Function app, one of which is functionTimeout. It resembles the following:
{
"functionTimeout": "00:05:00"
}
The default is set to five minutes per Function invocation, and an invocation is, in principle, the same as an execution. For Consumption mode, the value can range from 1 second to 10 minutes. For both Dedicated and Premium modes, the default is 30 minutes. Setting the attribute to −1 means the function will run to completion. You might agree that 30 minutes is a long time for a single method to run, but it's not unheard of; however, I think setting the limit to infinity is a dangerous setting because you are charged based on usage. What if something hangs or runs in a loop for a week or so before you get the bill? Ouch. Keep in mind that if you consume the compute power, even if by accident, you will have to pay for it.
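As a sanity check on the functionTimeout format, the following sketch interprets the value the same way described above: an "hh:mm:ss" timespan, or −1 for run-to-completion. This is an illustration of the semantics, not platform code.

```python
# Sketch: interpret a host.json functionTimeout value ("hh:mm:ss",
# or -1 for unbounded, which is allowed only on Dedicated/Premium).
import json

def timeout_seconds(host_json: str) -> float:
    value = json.loads(host_json)["functionTimeout"]
    if str(value) == "-1":
        return float("inf")  # runs to completion; beware the bill
    h, m, s = (int(part) for part in str(value).split(":"))
    return h * 3600 + m * 60 + s

print(timeout_seconds('{ "functionTimeout": "00:05:00" }'))  # -> 300
print(timeout_seconds('{ "functionTimeout": "00:30:00" }'))  # -> 1800
```

The 5-minute Consumption default works out to 300 seconds; the 30-minute Dedicated/Premium default to 1,800.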
When you think about a regular program, there is a main() method that is the entry point into the program. From there, all the if/then/else statements are assessed, and the code within the selected code block gets executed. In the Azure Function context, without getting too much into the coding aspect of this, instead of the main() entry point there is a method named Run(), which contains some details about what will cause the Azure Function to be invoked, i.e., the trigger. The simplest kind of Azure Function to create and consume is the HTTP trigger, which is triggered from a browser, curl, or any client that can make a request to an internet address. Take a look at the following code snippet; notice that the Run() method has an attribute named HttpTrigger. It defines the kind of trigger that will invoke the Azure Function. The attribute has additional parameters that define the details about the binding (i.e., its metadata).
[FunctionName("csharpguitar-http")]
public static async Task<IActionResult>
Run([HttpTrigger(AuthorizationLevel.Function, "get", "post",
Route = null)] HttpRequest req, ILogger log)
It is also possible to declare the binding in the function.json file instead of in the method definition, as shown here:
{
"bindings": [
{
"authLevel": "function", "name": "req",
"type": "httpTrigger", "direction": "in",
"methods": [ "get", "post" ]
},
{
"name": "$return",
"type": "http",
"direction": "out"
}
]
}
To summarize, a trigger is what causes the function to run, and the binding defines the properties of the type. Take, for example, the previous function.json file, which specifies the authentication level (function), the name of the incoming request object (req, i.e., an HTTP request), and the HTTP methods that are supported (get and post). If the binding were to a database or other storage product, the connection string, the container name, an access key, and the name of a collection of data that the function receives when triggered are all examples of what can be included in the binding. By doing so, you can implicitly access them in the Azure Function code without declaration. There are numerous kinds of triggers supported by an Azure Function; a summary of the most popular ones is shown in Table 4.14.
TABLE 4.14 Azure Functions Supported Bindings
Type | Trigger | Input | Output |
---|---|---|---|
Timer | ✓ | X | X |
Service Bus | ✓ | X | ✓ |
Queue Storage | ✓ | X | ✓ |
Blob Storage | ✓ | ✓ | ✓ |
Cosmos DB | ✓ | ✓ | ✓ |
Event Grid | ✓ | X | X |
Event Hub | ✓ | X | ✓ |
HTTP | ✓ | X | ✓ |
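The rows of Table 4.14 can be captured as a small lookup that checks whether a given binding direction is legal. The type keys here are my own shorthand, not the exact strings used in function.json.

```python
# Sketch: the directions each binding type supports, per Table 4.14.
# Keys are informal shorthand, not the literal function.json type names.
SUPPORTED = {
    "timer":        {"trigger"},
    "serviceBus":   {"trigger", "out"},
    "queueStorage": {"trigger", "out"},
    "blobStorage":  {"trigger", "in", "out"},
    "cosmosDB":     {"trigger", "in", "out"},
    "eventGrid":    {"trigger"},
    "eventHub":     {"trigger", "out"},
    "http":         {"trigger", "out"},
}

def is_supported(binding_type: str, direction: str) -> bool:
    """Return True if the table lists this direction for this type."""
    return direction in SUPPORTED.get(binding_type, set())

print(is_supported("cosmosDB", "in"))  # -> True
print(is_supported("timer", "out"))    # -> False
```

A Timer trigger, for example, can start a function but cannot be written to, which is exactly what the table's X columns say.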
If an image or file is uploaded into a blob container and there is an Azure Function bound to that container, then the Azure Function gets triggered. The same goes for an HTTP trigger; when it is called, the code within it is executed. Let's look at both the blob and HTTP triggers in a bit more detail, specifically at the binding metadata. If a file is added to a blob container, what kind of information do you think is important to know if your code needs to perform some action on that file? Its name, its location, and perhaps its size are important. All those details and many more are accessible within the Run() method as defined in the following blob storage binding example:
{
"bindings": [
{
"name": "myBlob",
"type": "blobTrigger",
"direction": "in",
"path": "samples-workitems/{name}",
"connection": "AzureWebJobsStorage"
}
],
"disabled": false
}
The name attribute is mapped to an element called myBlob that is passed from blob storage to the Azure Function as a System.IO.Stream. The System.IO.Stream class contains properties called name, LocalPath, and Length that are populated with information about the blob and can be accessed via code to provide the details of the blob, as previously stated. Additionally, an important object when managing HTTP requests is the HttpRequest object; as you see in the previous binding example found within a function.json file, it is identified as req and passed into the Run() method as an HttpRequest object. That means you can access the properties and methods that exist on an HttpRequest object, like the query string and the request body.
To mention input and output a bit more, consider that the direction of a trigger is always in. You can see that in the previous HTTP bindings example based on this setting: "direction": "in". In Table 4.14, note the types that support input bindings; this means you can have additional input bindings alongside the trigger. For example, if you wanted a Timer trigger to read some data from a Cosmos DB, because Cosmos DB supports input bindings, the Timer trigger can be declaratively preconfigured with the metadata about the Cosmos DB, for example, the connection string, database instance, and so on. Then, if you wanted that same Timer trigger to save the content from the Cosmos DB into a file and place it into a blob container, since Blob storage supports an output binding, the details can be preconfigured as a binding with a setting of "direction": "out". You can see "direction": "out" as a binding for the HTTP trigger shown previously. It is, of course, also supported to define the binding information at runtime (imperatively) using the extensions and SDKs for the given messaging and storage Azure products. That approach requires more steps and more concepts to consider, which won't be discussed here, but it is possible.
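A Timer trigger with a Cosmos DB input binding and a blob output binding, as just described, could be declared in function.json along these lines. The binding names, schedule, database, container, and connection-setting values are all hypothetical; treat this as a sketch of the shape, not a verified configuration.

```json
{
  "bindings": [
    {
      "name": "exportTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 */5 * * * *"
    },
    {
      "name": "recentOrders",
      "type": "cosmosDB",
      "direction": "in",
      "databaseName": "store",
      "collectionName": "orders",
      "connectionStringSetting": "CosmosDBConnection"
    },
    {
      "name": "exportBlob",
      "type": "blob",
      "direction": "out",
      "path": "exports/{rand-guid}.json",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
```

When the timer fires, the platform hands the function the Cosmos DB documents through the input binding and writes whatever the function assigns to the output binding into the blob container, with no connection code in the function body.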
Every trigger has a corresponding extension module that must be loaded into the runtime when the Azure Function is started for the first time. All trigger extensions have some configuration settings that can be set in the host.json file. Take the HTTP trigger settings, for example.
{
"extensions": {
"http": {
"routePrefix": "api",
"maxOutstandingRequests": 200,
"maxConcurrentRequests": 100,
"dynamicThrottlesEnabled": true
}
}
}
If you have some experience writing ASP.NET Web APIs, then you know that by default the route to the API is prefixed by /api/ after the URL. After /api/ comes the name of the Azure Function, which in the previous example is csharpguitar-http; recall that the Run() method declaration included the attribute [FunctionName("csharpguitar-http")]. What the routePrefix setting allows you to do is change api to something else, like v1 or v2. For more information about the other HTTP trigger settings, take a look at the following online documentation: https://docs.microsoft.com/en-us/azure/azure-functions/functions-host-json#http.
Create an Azure Function that is triggered by an HTTP request by completing Exercise 4.16.
I'll point out a few things about some decisions made during the creation of the Azure Function. First, not all runtime stacks support developing in the portal. For example, both Python and Java require that you develop locally and then deploy; therefore, the exercise chose .NET Core because the example was a bit simpler. You also chose not to enable Application Insights. It is a valuable tool, but since this was just an example, it was not enabled. If you are developing an Azure Function to run a live workload, then Application Insights is by all means recommended. Application Insights will be discussed in more detail in Chapter 9, but keep in mind that Microsoft (i.e., the FaaS offering) doesn't record your application errors. Cloud hosting providers are concerned primarily with the platform. You, as the owner of the application, need to code in exception handlers, and the location where you write those exceptions is Application Insights. Finally, although the exercise created the Azure Function via the portal, that scenario is more for getting your feet wet and learning a bit about the capabilities. For more complicated scenarios, it is recommended to develop and test locally using Visual Studio, Visual Studio Code, IntelliJ, Eclipse, or other open source IDEs.
A runtime is an in-memory set of supporting code that helps with the management and execution of the custom application running within it. What a runtime is has been mentioned before, so more depth isn't required here. From an Azure Functions perspective, there are three versions of the runtime: version 1, version 2, and version 3. The difference between version 1 and version 2 is quite significant, and by all means, if you are creating a new function, choose the latest version. Version 1 of Azure Functions targets the .NET Framework, which doesn't support cross-platform execution. This means you are bound to Windows and to the full .NET Framework library. That isn't necessarily a bad thing; it just isn't the newest thing. Version 2 targets .NET Core 2.x and is cross-platform. .NET Core has proven to be more performant and has a much smaller footprint. Also, in version 1 it is possible to have a function written in C# and another function written in JavaScript contained within the same Function app. This is no longer a supported scenario in version 2; the runtime stack must be the same for all functions within the Function app.
To view the targeted runtime of your Azure Function, from the Overview blade in the portal, click Function App Setting. When the blade is rendered, you will see something similar to Figure 4.56.
At this moment, version 3 of the Azure Functions runtime is in preview; however, it is moving quickly toward General Availability (GA). Version 3 targets .NET Core 3.x. Notice also the tilde in front of versions ~1, ~2, and ~3. This notifies the platform that when a newer version of the runtime is released, the code running will automatically target that new version. As you see in Figure 4.56, there is a specific version of the runtime, 2.0.12858.0. If there is ever a newer version that your code cannot support, you can pin your function to a specific version of the runtime using the FUNCTIONS_EXTENSION_VERSION app setting, accessible via the Configuration blade in the portal.
The chosen runtime version is important because it dictates the operating system the Azure Function runs on and the languages in which you can write code. As mentioned, version 1 targets the full version of the .NET Framework, which is not cross-platform and therefore can run only on the Windows operating system. However, version 2 and any future version target .NET Core, which is cross-platform and can therefore run on either Windows or Linux.
From a language perspective, C#, JavaScript (Node), and F# are supported in all supported versions of the Azure Function runtime. Java, PowerShell, Python, and TypeScript are supported in version 2 and greater. There is likely work happening in the background to support more languages as time progresses. See Table 4.15 for an overview of supported languages by the Azure Function runtime version.
TABLE 4.15 Azure Function Supported Languages
Language | 1.x | 2.x | 3.x |
---|---|---|---|
C# | .NET Framework 4.7 | .NET Core 2.2 | .NET Core 3.x |
JavaScript | Node 6 | Node 8 and 10 | Node 8 and 10 |
F# | .NET Framework 4.7 | .NET Core 2.2 | .NET Core 3.x |
Java | X | Java 8 | Java 8 |
PowerShell | X | PowerShell Core 6 | PowerShell Core 6 |
Python | X | Python 3.6 | Python 3.6 |
TypeScript | X | ✓ | ✓ |
As of writing this chapter, version 3 is in preview, and all those languages in the 3.x column are considered preview as well. There were also numerous other languages considered “experimental” in the context of version 1. I used the past tense because those will remain “experimental” and will never be a fully supported development language for Azure Functions v1. Those languages are Bash, Batch, and PHP.
Which of the following runtime stacks are available for running an Azure Function on Linux?
The answer is A, B, and C because the .NET Framework is not cross-platform and therefore cannot run on the Linux operating system.
The concept of microservices has been around for quite a number of years. The structural style of service-oriented architecture (SOA) may ring some bells. Service Fabric is a platform and an orchestrator to run microservices, which are a variant of the SOA development technique. What are microservices then? You can begin to understand by visualizing a nonmicroservice application, which would run on an Azure VM or an Azure Function, for example. When you create an application to run on virtual machines, the capabilities within it would typically span multiple tiers, i.e., a monolithic approach, as shown in Figure 4.57. The monolithic application solution would have a GUI and possibly these hypothetical built-in capabilities or services: security, order validation, logistics management, order fulfillment, and billing. This kind of monolithic architecture provides the compute using multiple tiers of IT architecture. Recall Figure 4.19 where the IT solution is comprised of the web/GUI, application, database, and authentication tiers.
Each of those services (security, ordering, logistics, etc.) has a set of executable code that possibly overlaps, reuses, or combines logic and shared libraries. A change to any part of the monolithic application would have some impact on all the services hosted throughout the multiple tiers.
From an Azure Functions perspective, you may recognize some similarities with the term microservices, in that the unit of work performed by an Azure Function is a much smaller service than the IT solution running on those multiple tiers. You could consider implementing a Function for each of those services within a given Function app. That scenario is a valid one so long as the number of Functions is manageable and the workload can run in a sandbox. However, there is no orchestrator for running a large number of Azure Functions that may or may not be dependent on each other, and you can only scale a Function app, not specific Functions. This is where Service Fabric and microservices fill a gap. Service Fabric is an orchestrator that helps efficiently execute and manage a massive number of small, isolated programs. Take Figure 4.58, for example, where instead of running those capabilities within a dedicated tier bound to a VM or within a Function app, each service can scale and be managed independently.
Scaling/managing a microservice independently is an interesting concept. In a monolithic code base, where you have many methods doing many different things, it doesn't take long to find where the bottlenecks are. A common place is performing database create, read, update, and delete (CRUD) operations. Because all your code runs within the same process, a single slow-running, CPU- or memory-intensive procedure can have a significant impact on the other methods running in the same process. There is no simple way to give that specific method more compute power; instead, you need to give compute to the entire tier. The group of services running in the process gets more compute by scaling up to a larger VM. This is not the most cost-effective or stable approach, because the need for extra compute power likely happens in bursts for a single service, and when the burst goes away, you have extra compute sitting idle generating costs. What if, by scaling up the application tier, you put greater load on the database tier, which again needs more scaling until you finally get it right for that burst? The effort to get it right is not a simple one, and contemplating scaling back down and doing it all over again when the next burst occurs is often a hard pill to swallow. With microservices, you can scale more precisely, as it is the service that scales and not a tier. Service Fabric handles the scaling; it monitors for health and helps with the deployment of changes.
There was mention of a node in Chapter 3 within the context of network traffic routing. It was described as a server connected to a network that helps route a packet of data to its intended location. A node is a virtual or physical server, and a cluster is a tightly connected group of nodes. Nodes in clusters typically operate with the same operating system, have the same hardware specifications, like CPU and memory, and are connected through a high-speed LAN. A cluster in the context of Service Fabric, as you would expect, isn't concerned with routing network data packets. Instead, the cluster is concerned with running one or more specific microservices. If you look back at Figure 4.58, connected to each microservice there is a cluster containing three nodes. In reality, the number of nodes in a given cluster can scale into the thousands, and the scaling can be performed programmatically or manually. Recognize that the Service Fabric platform runs on VMSS. That's right, this is another example of an Azure product (i.e., Service Fabric) built on top of another Azure product (VMSS). This means everything you have learned about VMSS in this chapter can be applied here. For example, you know that VMSS scales out using a base image consisting of the operating system and application. Additionally, with that scale-out, the concepts of fault domains and update domains are applied. Understanding that, your confidence in the stability and redundancy of the architecture and infrastructure behind Service Fabric should be high.
Recognize that there are durability tiers regarding the durability of Service Fabric components: Gold, Silver, and Bronze. Each of those tiers, as with many Azure products, has benefits, different limits, and a different set of capabilities and costs. You might assume that you can choose any series of VMs as your nodes, but this is not true; it depends on the tier. The same goes for storing state. (State management, which is discussed later, is supported only in Silver and Gold.) It is recommended for production workloads that you use either Silver or Gold, because with Bronze, the scheduling of updates and reboots is not respected. The speed at which scaling happens also depends on the tier; interestingly, scaling is slower with Silver and Gold. This is because those tiers prioritize data safety over speed, whereas Bronze does the opposite.
It is the responsibility of the orchestrator to properly deallocate and allocate microservices from or to nodes in the given cluster. This is a complicated activity considering that there can be a different set of microservices on each node. For example, a security service and a logistics management service can coexist on the same node. To properly deallocate a node, Service Fabric must know the state of all other nodes in the cluster so that when moving a service from one node to another, there is no risk of overloading the new one. All of that scaling logic happens behind the scenes using the orchestration logic within Service Fabric. When to scale is most commonly based on the consumption of the services or on the resource usage of the node. There is a concept called logical load metrics that allows you to create custom, service-consumption-based metrics used for scaling. This kind of metric is concerned with elements such as the connection count over a given time frame, the responsiveness of the application, or how long the application has been running. Based on counters maintained in those custom metrics, it is possible to scale out or in based on your defined thresholds. The other, most common technique is to scale in or out using usage metrics such as CPU and memory. How all this works together can be better understood by reading more about the Service Fabric architecture, which is coming up next.
The Service Fabric orchestration architecture is built upon a layered subsystem of components. Each subsystem component plays a role in making the microservice applications available, fault tolerant, manageable, scalable, and testable. Each component that makes up Service Fabric is shown in Figure 4.59 and discussed in more detail in the following text.
There are three areas that I will offer some guidance on regarding Service Fabric.
The HTTP protocol is stateless by design. This means that when an HTTP request is made to a web server, all the required information to respond exists within, for example, the header and/or body. Once the request is responded to, the web server does not remember much, if anything, about it. The next time you make a request, everything needed to make the request successful must again be sent from the client. A stateful application, on the other hand, could conceivably store some details about the client and the request on the server so that each request doesn't have to carry along so much detail. Take a simple example of applications that would not require state, like an API that returns the datetime or performs some kind of mathematical calculation, like a calculator. In most cases, neither of those examples would need to store information about a client's previous datetime request or previous calculation. On the other hand, applications that provide capabilities requiring multiple steps to complete, or that need to know the identity of the client sending the request, would need to support maintaining state. Completing an order online usually takes multiple steps, each requiring a click of a button that sends a request to a web server. Each of those requests would need to know where you are in the order process, as well as who you are. Most websites store that information on the web server (for example, order details and an identity token) because it is faster, since the order detail doesn't have to be sent each time, and safer, since just a small, time-sensitive, encrypted identity token can be sent back and forth instead of reauthenticating with each request.
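The stateless/stateful distinction can be boiled down to a few lines. The handler names below are invented for illustration: a stateless handler recomputes everything from the request, while a stateful one remembers earlier calls on the server side.

```python
# Sketch: stateless vs. stateful handling. Names are hypothetical.
def stateless_add(request: dict) -> int:
    # Everything needed to respond arrives with the request itself.
    return request["a"] + request["b"]

class StatefulOrder:
    """Server-side session: each call builds on what came before."""
    def __init__(self):
        self.items = []

    def add_item(self, item: str) -> int:
        self.items.append(item)
        return len(self.items)  # later requests see earlier ones

print(stateless_add({"a": 2, "b": 3}))  # -> 5
order = StatefulOrder()
order.add_item("guitar")
print(order.add_item("strings"))  # -> 2
```

Notice the trade-off the text describes: the stateful version answers with less data per request, but its correctness now depends on every request reaching the object that holds the session.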
Service Fabric supports both stateless and stateful scenarios. There is no extra step to take when creating a stateless microservice. However, when the application requires state, this is another place where the reliability component plays an important role. (Recall Figure 4.59.) Implementing the stateful capabilities in Service Fabric requires coding. The reliability component exposes a set of service APIs that can be consumed to create a StateManager for managing and storing the information your application needs for subsequent reference. Since the implementation of this technique is code-based and not something you need to know for the Azure Solutions Architect Expert exam, it won't be covered in more detail. However, one important detail that you must understand is that the data stored in the StateManager exists only on the node where the request is sent. It is important to recognize this in the context of scaling and failover scenarios: the request from a client must always be routed to the same node, and if the node goes away for any reason, so does the data stored on that node for the given session. You need to code for those scenarios.
You should understand what scaling up and scaling out mean. Scaling up means you have chosen to scale from a VM with 1 CPU and 32GB of RAM to a VM with 4 CPUs and 64GB of RAM. Scaling out means that when you have one VM with 4 CPUs and 64GB of RAM, you add an additional VM with the same specifications that runs the identical application. In the context of Service Fabric, those terms are referred to as vertical and horizontal scaling: scaling up equals vertical, and scaling out equals horizontal. Scaling is recommended using Azure Resource Manager (ARM) templates, which are discussed in Chapter 8, or using the AzureClient, which is discussed in more detail here: docs.microsoft.com/en-us/dotnet/api/overview/azure/service-fabric. Manually scaling the VMSS via the portal or any other interface, whether vertically or horizontally, circumvents the Service Fabric components and can get the management component out of sync with the overall state of the cluster. That is not recommended.
I learned a scaling lesson some years ago while working in support at Microsoft. Previously in this chapter, I discussed App Service Environments (ASEs), which are a private PaaS Azure offering. I learned the hard way that ASEs exhibit a different behavior when scaling than what I had experienced many times before. Commonly, when you scale vertically (up/down), it happens within a relatively short amount of time; it is impactful. You will lose state, but the time it takes to get running again is usually about the same as a normal reboot of a physical server. What I learned was that for a version 1 ASE environment, it is not the same; the time required to scale an ASE (V1) is much, much longer. I had become so comfortable with using this virtual reboot (vertical scaling) as a quick solution for solving downtime, hanging, or badly behaving applications that I used it way too often without much thought. My point is to take caution when scaling. The impact can be, and sometimes is, greater depending on the product with which you are performing that action. However, recognize, as stated many times (maybe too many) so far, that the ability to scale compute up, down, in, and out is the most valuable and game-changing offering that exists in the cloud from a compute perspective.
Finally, it is important that you implement some kind of logging strategy. The cloud hosting provider does not have the manpower to monitor and resolve all exceptions thrown by applications running on the platform; instead, they focus on the platform itself. There are limited platform-provided, application-focused logging capabilities built in by default. It is often the case that, when an exception happens, it gets rendered to the client or shows up in the event viewer on the VM. However, if the application itself, in the form of code, does not implement any kind of logging, the scenario in which the error happened is usually unknown, and it will likely never be figured out. Microsoft provides and recommends tools like the Application Insights SDK, EventSource, and ASP.NET Core logging for Service Fabric microservice deployments. You can learn more in Chapter 9.
Application Insights and Log Analytics, two previously stand-alone products, have been bundled into a single product named Azure Monitor. You will read often about Application Insights, but keep in mind it is bundled into Azure Monitor. The EventSource class is found within the System.Diagnostics.Tracing namespace available in the .NET Framework. You would need to implement this class in the code of the microservice running on Service Fabric. Part of the implementation requires a configuration in a web.config file that identifies where the log file is to be written and whether the logging is enabled. It is advised that you leave logging off unless there is a reason for it to be on. Logs can grow large quickly and can negatively impact performance, so watch out for that. Lastly, EventSource is dependent on the .NET Framework, which does not run on Linux. If you are targeting an ASP.NET Core application to run on Linux, then you should consider the ASP.NET Core logging instrumentation instead. Logging capabilities are exposed via the Microsoft.Extensions.Logging namespace, which exposes an ILogger interface for implementing the log capturing, but again, this requires design, coding, and configuration. Application logging is fundamental to resolving bugs in production. If you want to increase the probability of having a successful IT solution running on Azure, then application logging is essential, which is why half of Chapter 9 is focused directly on it.
It is common when running microservices or web applications on Service Fabric that you also consume and configure other Azure products. For example, when you create the cluster and the nodes within the cluster, there is a need to load balance requests across clusters and to also expose those endpoints to the clients and customers who will consume them. Figure 4.60 shows a common architecture scenario that a customer would implement.
Notice that a load balancer can be used to balance requests across the nodes and microservices hosted in the Service Fabric cluster. The load balancer, as you learned in the previous chapter, can be configured to only support connections from a specific on-premise network, or it can be exposed globally. The same goes for the exposure of the microservice APIs, for example, the logistics, ordering, and billing services. API Management can interpret the path in a request and redirect the request to a specific backend pool of compute. This is helpful when changes happen to the Service Fabric cluster's location details, like a relocation into a new zone, virtual network, or subnet, where its endpoint may receive a new IP address. Those backend changes can be hidden from the client connecting to the exposed endpoint because the API Management configuration can be updated to route to the new location of the Service Fabric cluster as required. The point here is that simply provisioning an instance of Service Fabric doesn't get you up and running. There are coding and significant configuration requirements to satisfy before this product becomes functional. The barrier to its implementation is a bit high, and it requires some skilled IT professionals to deploy and support it. However, for large microservice workloads that target the .NET Framework, this has proven to be a good option.
In principle, Service Fabric and Azure Kubernetes Service (AKS) provide similar capabilities. The difference between the two, or, better said, the reason you would choose one over the other, comes down to two things: open source versus .NET Framework (Figure 4.2) and midlevel complexity versus high complexity. While you were reading the previous section about Service Fabric, there wasn't a mention of Docker, Azure Container Instance (ACI), or Azure Container Registry (ACR), for example. AKS is specifically designed around the consumption of containers created using the Docker file format and stored in a registry such as ACR. AKS was not only developed from the ground up using open source technologies but also designed specifically for running open source applications. Service Fabric can run open source code and provides Linux as an operating system to run it on; however, its original target was the .NET Framework development stack. Also, AKS is a bit less complicated to get up and running; the barriers to entry are less demanding. When it comes to Service Fabric, the orchestration of the workloads requires coding. When it comes to AKS, the default deployment and management of the workload requires no coding. Both of these orchestrators focus on the containerization of microservices; however, each one is targeted at a specific kind of application (open source versus .NET Framework), and each has a different complexity level of prerequisites and maintenance activities. Figure 4.61 illustrates the AKS concept, and that is what will be discussed in this section.
The illustration describes not only the AKS components (like the cluster master) but also its tight integration with Azure Dev Spaces, Azure DevOps, Azure Monitor, and container repositories like Docker Hub and ACR. Let's begin with a brief discussion of Kubernetes versus AKS and what existed prior to that on Azure.
Kubernetes is itself a complete system for automating deployments, as well as scaling and managing applications. If you wanted, you could deploy Kubernetes itself onto the Azure platform and use the Kubernetes product without even touching AKS. What AKS provides is the infrastructure to run the cluster master (the Kubernetes control plane) for no cost. You will see the cluster master in Figure 4.61. Only the nodes within the cluster that run your application incur costs, so simply using AKS, you get some free compute that would otherwise result in a bill. How specifically the cluster master is architected is platform specific and not something you would need to worry so much about; however, it is worth some time to cover the various components that make up the cluster master. The cluster master components are as follows:
The API server uses JSON over HTTP REST for connecting with the internal Kubernetes API that reads/writes to the etcd data store for configuring the workloads and containers on the nodes in the cluster. The etcd data store is the location where the current state of the cluster is stored. Consider that you have numerous pods running on numerous nodes within the cluster and you want to update them with a new version of the container image. How many instances of the pods exist and on which nodes they are running are all stored in etcd, which helps determine where the update needs to be rolled out. Additionally, knowing how many instances of a node you have is helpful when it comes to scaling. The scheduler also plays an important role: if you have multiple instances, you would not want to deploy a change to all of them at the same time. The scheduler will make sure they are updated one after the other so that the highest availability is achieved. The controller is the engine that maintains the status between the desired state of the cluster and the current state. Consider that you have requested to manually scale out to five instances of a specific node running a specific pod; the request would be sent to the API server. The controller would find out how many instances currently exist via etcd, determine how many more need to be added, and then take on the responsibility of getting them all provisioned, configured, and accepting traffic. Then it would update etcd once the actual cluster state matches the desired state. Imagine if the implementation is very large; there could be a lot going on here. AKS provides you with the cluster master compute and interface for free.
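To make the API server's JSON-over-HTTP-REST interface concrete, here is a hedged sketch of how you could query it directly once cluster credentials have been configured; kubectl proxy handles authentication and forwards local requests to the cluster's API server (namespace and port are assumptions):

```shell
# Start a local, authenticated proxy to the Kubernetes API server.
kubectl proxy --port=8001 &

# Query the raw REST API directly; this is the same endpoint kubectl
# itself uses, and the API server answers from state stored in etcd.
curl http://127.0.0.1:8001/api/v1/namespaces/default/pods

# The equivalent request via the CLI.
kubectl get pods --namespace default
```

This requires a live cluster and a configured kubeconfig, so consider it illustrative rather than a ready-to-run recipe.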
There was a product before AKS named Azure Container Service (ACS) that is being deprecated and replaced by AKS. ACS fully supports the Windows OS for running workloads, something AKS is still in the process of providing full support for. As mentioned, AKS is designed from the ground up on Linux and open source languages. ACS doesn't support the concept of node pools, where nodes with the same configuration can target the same underlying VMs. Remember Figure 4.52, where you read about VMs that are designed specifically for different workloads such as memory-, CPU-, or storage-intensive work. Pooling nodes together, which is not possible with ACS, keeps applications that have specific workload requirements together and thereby ensures the best possible performance. ACS is only mentioned here in case at some point you hear or read something about it; you should not target this product for any new projects. Finally, AKS supports 100 clusters per subscription, 100 nodes per cluster, and a maximum of 110 pods per node.
A cluster is a group of nodes, where a node is typically a VM. The nodes typically do span update and fault domains so that availability is preserved during transient or other more serious regional outages. Review the “Clusters and Nodes” section earlier, as the cluster and node concept here is similar to the scenario with Service Fabric. Take a look again at Figure 4.61 and notice the customer-managed component to get a good visual of the cluster, node, and pod configuration in the AKS context. Also note the term pod, which wasn't referred to in the Service Fabric context. I like to make the connection between the term microservice, which is used in the Service Fabric context, and a pod. Each of those references has to do with the location where the application code gets executed. From an AKS perspective and also from a Service Fabric one, you can have multiple microservices or pods running on a single node at any given time. However, from an AKS perspective, the pod is a one-to-one mapping with a container, while in Service Fabric the mapping is with an image (no Docker). From now on, in this context, the terms pod and microservice will carry the same meaning.
Creating and deploying an application to AKS is a bit easier than you might expect and is something you will do in a few moments. Before you do the exercise, there are a few concepts I would like to introduce to you or perhaps provide you with a refresher on. The first is a configuration file format called YAML (rhymes with camel), a recursive acronym for “YAML Ain't Markup Language.” In the context of Kubernetes, AKS, Docker, and even some of Azure DevOps, you will see this file format occurring often. For many, this is the “next” iteration of file formatting that progressed from XML to JSON and now to YAML. YAML has yet to show up much in the non-open source world; however, it is likely on its way. There is also some helpful integration of AKS into Visual Studio and Visual Studio Code for deploying to AKS. The Visual Studio IDE allows a developer to develop and test an application on their local workstation and then deploy directly to an AKS cluster. You can see in Figure 4.61 that, once deployed, remote debugging is also possible. What is going on behind the scenes to make that happen is completely abstracted away from the developer by a service called Azure Dev Spaces, which you will see and use later. The final concepts I would like to introduce you to briefly here are GitHub and Azure DevOps. Both of those products are designed for the management and storage of application source code. Remember in EXERCISE 4.2 that with Azure Container Instance (ACI) you got some source code from GitHub to create the application running in that container. The same set of capabilities exists for Azure DevOps, as referenced in Figure 4.61 where Source Code Repository and Azure DevOps Pipeline are presented. Much more about Azure DevOps is discussed in Chapter 8, where deployments are covered in detail. Read more about it there if desired.
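To give you a feel for YAML in a Kubernetes context, the following is a minimal, hypothetical Deployment manifest written out from the shell; the image name, labels, and replica count are placeholder assumptions, and the kubectl apply step is commented out because it requires a live cluster:

```shell
# Write a minimal Kubernetes Deployment manifest in YAML.
# Indentation is significant in YAML, much like it is in Python.
cat <<'EOF' > deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-web
  template:
    metadata:
      labels:
        app: sample-web
    spec:
      containers:
      - name: sample-web
        image: myregistry.azurecr.io/sample-web:v1
        ports:
        - containerPort: 80
EOF

# Deploy it to the cluster (requires a configured kubectl context):
# kubectl apply -f deployment.yaml
```

Compare this with the equivalent XML or JSON and you will see why YAML's lighter syntax has caught on for configuration.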
To learn more about AKS and use Azure Dev Spaces, complete Exercise 4.17, where you will create an Azure Kubernetes Service cluster using Azure CLI and then deploy an ASP.NET Core web application to it using Visual Studio. For this example, I used Visual Studio Community 2019 (16.2.4), which is free to download from here: visualstudio.microsoft.com/downloads/
That wasn't too hard, but realize that the exercise is just scratching the surface; there is so much more to it. There will be some books written about this at some point, if there haven't been already. To explain a bit more about what you just did, I'll start with the Azure CLI commands. The first one, az aks create, is the one that created the AKS cluster. The second, az aks get-credentials, retrieved and stored the credentials of the AKS cluster so that I would then be able to capture information using kubectl and perform other administrative work on the cluster. The Kubernetes command-line tool kubectl is installed by default when using Azure Cloud Shell. A reason I decided to use Azure Cloud Shell was to avoid the complexities of installing Azure CLI and kubectl on my workstation. Now that you have created an AKS cluster, developed an application, and deployed it to the cluster, read on to learn a bit about maintaining and scaling the AKS cluster.
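As a hedged recap of the command sequence from the exercise, the flow looks roughly like the following; the resource group, cluster name, region, and node count shown here are placeholder assumptions rather than the exercise's exact values:

```shell
# Create a resource group to hold the cluster.
az group create --name myResourceGroup --location eastus

# Create a two-node AKS cluster with the monitoring add-on enabled.
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 2 \
  --enable-addons monitoring \
  --generate-ssh-keys

# Merge the cluster credentials into the local kubeconfig so that
# subsequent kubectl commands target this cluster.
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

# Verify the nodes are up and ready.
kubectl get nodes
```

Running this from Azure Cloud Shell avoids any local installation, since both az and kubectl are preinstalled there.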
Since you know that VMSS is used behind the scenes of AKS and you know the roles and responsibilities that come with IaaS, you will recognize that you must manage any OS upgrades, the OS patching, the version of Kubernetes, and of course the application running on the nodes. The details of specifically performing these updates will not be covered here; instead, only the concepts will be explained. When an update to Kubernetes is initiated, it is rolled out using the concept of cordoning and draining. This is done to minimize the impact the update may have on the IT solution. The first action triggered from the controller of the cluster master is to cordon one of the nodes and target it for upgrade. There is a waiting period after the node is identified so that any request running on the node can complete; this is considered draining. Once the node is no longer being used, the upgrade proceeds. Once complete, the controller places the node back into the pool and takes out another; the process is repeated until all nodes are upgraded. It is possible to upgrade just a node within the cluster or the entire cluster of nodes. In the Azure Portal on the AKS blade, you will find a navigation link named Upgrade, which will upgrade the entire cluster. Click the Node pools link on the AKS blade, and you will find an option to upgrade each node. Upgrading an application running on the AKS cluster has many available options; you can perform it manually, or you can choose any of the deployment options found by clicking the Deployment Center navigation menu item on the AKS blade. Azure DevOps Repos, GitHub, Bitbucket Cloud, and External Git are all options (as shown in Figure 4.61). Each of those implements CI/CD, which is very helpful when it comes to managing, developing, testing, and deploying application code (more on that in Chapter 8).
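The cordon-and-drain cluster upgrade just described can also be initiated from the CLI; this is a hedged sketch, with the cluster name and target Kubernetes version as placeholder assumptions:

```shell
# Check which Kubernetes versions this cluster can be upgraded to.
az aks get-upgrades \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --output table

# Upgrade the entire cluster; AKS cordons and drains the nodes
# one at a time to preserve availability.
az aks upgrade \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --kubernetes-version 1.18.4
```

Only versions returned by az aks get-upgrades are valid upgrade targets, which is why checking first is worthwhile.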
For upgrading the OS version you have provisioned for AKS, you can find the VMSS instance that was created when you created the AKS cluster. Open the virtual machine scale set blade in the Azure Portal; the name should be prefixed with aks-. On the VMSS blade, click Instances, and then click the instance. The process for upgrading the instances running your AKS cluster is the same as you would follow for any other VMSS implementation.
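If you prefer the CLI to the portal for this, the same VMSS instance upgrade can be sketched as follows; the node resource group and scale set names below are placeholders following AKS's typical naming conventions, so verify them against your own subscription:

```shell
# List the instances in the scale set that backs the AKS node pool.
az vmss list-instances \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --name aks-nodepool1-12345678-vmss \
  --output table

# Bring specific instances up to the latest scale set model
# (for example, after an image or configuration change).
az vmss update-instances \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --name aks-nodepool1-12345678-vmss \
  --instance-ids 0 1
```

The MC_-prefixed resource group is the infrastructure group AKS creates automatically alongside your cluster.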
There are two methods for scaling: manual or automated. Manual scaling isn't something you should consider when running a live production application. This kind of scaling capability is most useful during testing, where you can learn how your application responds to scaling. You can scale from a pod or node perspective. To manually scale pods, you can use the Kubernetes command-line tool kubectl. First use it to get a list of pods so you can see what you have; then, once you identify the deployment managing the pods, scale it to, for example, five replicas or instances.
kubectl get pods
kubectl scale deployment <deploymentName> --replicas=5
To manually scale a node to, for example, three instances, you can execute the following Azure CLI command from either Azure Cloud Shell or from a workstation with Azure CLI configured:
az aks scale --resource-group <name> --name <name> --node-count 3
From an autoscaling perspective, Kubernetes provides a service called the Horizontal Pod Autoscaler (HPA), which monitors resource demand and scales the number of replicas/instances when required. This service does work with AKS but requires an optional component, the Metrics Server, for Kubernetes 1.8+. The HPA checks the Metrics API exposed from the API server, which is part of the cluster master, every 30 seconds. If the metric threshold has been breached, then the scale out is managed by the controller. To check the scale metrics, you can use the Kubernetes command-line tool kubectl with this command:
kubectl get hpa
To set the metric threshold, if the previous command doesn't return any configurations, use the following:
kubectl autoscale deployment <deploymentName> --cpu-percent=70 --min=2 --max=15
This command will add pods when the average CPU usage across all existing pods exceeds 70%. The rule will increase the count up to a maximum of 15 pods, and when the CPU usage across the pods falls below 70%, the autoscaler will decrease the count one by one to a minimum of two instances.
Also, you can autoscale a node (the VM on which the pods are running). Although there are some capabilities in the Azure Portal to scale out the number of VMs in the AKS cluster (it is a VMSS product, after all), this is not recommended, as it will get the AKS cluster controller and etcd out of sync. You can scale a VMSS using PowerShell, Azure CLI, or a supported client; this also is not recommended. It is recommended to always configure scaling, or to manually scale, using the Azure CLI Kubernetes component, which is accessed via Azure CLI and always prefixed with aks, for example, az aks. The script that follows is an example of how to scale a node:
az aks update --resource-group <name> --name <name>
--update-cluster-autoscaler --min-count 2 --max-count 15
Having used the Azure CLI Kubernetes component, as recommended, the AKS cluster is set to have a minimum of two nodes and a maximum of 15. You might be asking yourself, what are the thresholds? Well, in practice, HPA and a node scaler service called the cluster autoscaler are used alongside each other. The cluster autoscaler checks the same metrics that were set for the pods, but every 10 seconds instead of every 30 seconds as HPA does. Using a technique similar to the one HPA employs for increasing the number of pods based on metrics, the cluster autoscaler focuses on the nodes (aka the VMs) and adjusts them appropriately to match the needs of the pods. You must be running Kubernetes version 1.10.x or later to get this cluster autoscaler feature.
The final topic covered here is one that will handle a burst. I have mentioned the bursting concept a few times in regard to hybrid networks and HPC. That capability is also available when running AKS. In some scenarios, when a rapid burst occurs, there may be some delay with the provisioning of new pods as they wait for the scheduler and controller to get them rolled out; the requested increase in capacity simply hangs in a waiting state until they are added into the cluster. When the burst is unexpected and the number of nodes and pods exceeds the upper limit of your configuration, additional nodes simply are not allowed to be scaled out further. The way to handle such a burst a bit better is to use virtual nodes and our friend ACI. Virtual nodes and their configuration into AKS are an advanced feature and are mentioned here only for your information; you can find more details in the Azure documentation on virtual nodes.
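For reference, enabling virtual nodes on an existing cluster is done through the virtual-node add-on via Azure CLI. This is a hedged sketch only: the names are placeholders, and the subnet must already exist in the cluster's virtual network and be delegated to ACI:

```shell
# Enable the virtual nodes add-on, which is backed by
# Azure Container Instances (ACI), so bursts of pods can be
# scheduled without waiting for new VM nodes to provision.
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons virtual-node \
  --subnet-name myVirtualNodeSubnet
```

Because virtual nodes run pods on ACI rather than on cluster VMs, they trade some feature compatibility for near-instant scale-out during bursts.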
Cloud Services were Microsoft's first PaaS offering. At the time, Cloud Services offered some good capabilities like the all-powerful autoscaling feature. The problem with Cloud Services was that the barrier to entry was high. To deploy an application to Cloud Services, the application needed to be reconfigured to run on that environment. This meant existing ASP.NET and ASP.NET MVC applications couldn't simply be deployed to and run from Cloud Services. The application code needed to be migrated to a Cloud Services project in Visual Studio, which was acceptable for smaller applications, but there was never a good migration process from an existing on-premise web application to Cloud Services. Many customers still use this service, but let it be known that this product is being deprecated, and you should not move any workloads to it unless there is a justified business case to do so. There is no need to discuss this Azure product in more detail, but it's worthy of mention because at one time it was the only PaaS Microsoft had, and it was good. It has simply been replaced by better Azure products and features.
The word virtual has been popping up a lot in this book, no? Examples are virtual network, virtual machine, and now Virtual Desktop. You should already have an understanding of what virtual means in the context of computing. You should have no doubt that behind the virtual there is a physical layer, because nothing virtual exists without actual physical hardware underneath. So, you shouldn't have much difficulty making an educated guess when it comes to Windows Virtual Desktop, right? Take a second now to formalize your definition, and then read on. Try this additional mental exercise and guess why a company would implement or want such a thing.
To answer the first question, virtual means that the compute resources allocated to process a workload are allocated from a greater physical pool of compute power. But why would a company want to run a bunch of Windows desktops as virtual machines? The answer to the second question has two aspects: client/server and cost. From a client/server perspective, before the internet/intranet, the big thing was the creation of GUI (desktop) applications using, for example, Windows Forms, which had very complicated client-side business logic. This design principle existed before Model-View-Controller (MVC) and even before the concept of separating presentation and business logic entered into any kind of design-oriented conversation. The computing era referred to here is the 1990s. If you recall from the introduction, I mentioned that 2013 was the time companies started migrating to the cloud, and I have also called for the rewriting or rearchitecting of all non-cloud-based applications. With limited effort, you should recognize how fast technology moves and why there are possibly many programs from the 1990s that remain too mission critical and complex to attempt such an upgrade or fundamental modification. This is mostly because the people who coded them are long gone, because no one wants to look at the old code, or because the effort required to make the change is too great for most teams or budgets.
The solution, then, was to create a cost-effective means for a lift-and-shift approach that keeps the code and process model intact. That solution is Windows Virtual Desktop. If you think about the costs involved in running a client/server solution, one is the server side for sure, and this entire chapter has been about the provisioning and configuration of server-side compute power. There have also been some tips on its redundancies and how to control its associated costs. The other side of the equation is the client. Those who need to consume a client/server application need a workstation. It isn't like an internet application where customers have their own workstation to access the code on your server. In this context, it is most commonly an employee, or maybe even a customer accessing your application via an in-store kiosk, who uses some kind of backend server application. The cost comes down to having those machines where they are needed, with ample compute power to run the desktop application.
If the workstation is required only to make a remote connection to a virtual desktop running the actual client-side program, then the employee workstation can be much less powerful than the virtual desktop. In addition, does the employee need to access and use the virtual machine 100% of the time? If not, then the compute power utilized and charged to run the desktop can be reduced. If you purchase a workstation for each employee to run the desktop application, then that compute is 100% allocated to that employee and can be idle sometimes. However, you might be able to save some resources if a shared employee workstation is purchased with lower specifications and the remote shared virtual machine with higher specifications is used only on demand and charged only based on its consumption.
To run a Windows Virtual Desktop solution, the following components are required:
- Azure Active Directory
- An Azure virtual network
- Azure AD Connect, to synchronize your on-premise Active Directory with Azure AD
- A physical workstation or device from which to make the remote connection
There is much more to this area; the topic is worthy of a book on its own. This section should be enough to know the requirements and use case for the Windows Virtual Desktop product. For more information, see docs.microsoft.com/en-us/azure/virtual-desktop/overview, which provides many more details.
Just like in the previous chapter, we covered a lot, but I hope that you have gotten your hands dirty, created some Azure products, and configured many of the features. You should feel confident that since you now have a solid understanding about Azure security, Azure networking, and now Azure compute, the probability of passing the Azure Solutions Architect Expert exam is rising rapidly.
Specifically, in this chapter, the key takeaways are that although the compute power is advertised as unlimited, there are some limits. For the majority of consumers, however, those limits will never be reached, either because of cost or simply because such massive workload computations are unnecessary. Also, you know that if you do ever hit a compute threshold and you have a justified business case, those limits can be lifted, and more resources can be provided. The limits are there to protect you from receiving an outrageous bill.
Containers and images are gaining a lot of traction and can help simplify the deployments onto IaaS and PaaS architectures, as well as transforming applications to run as microservices or AKS orchestrated solutions. Docker is a leading technology for creating containers. Azure VMs have a vast catalog of VM types, with series that target CPU, GPU, memory, or high storage workloads. You can run the Windows operating system and almost any version of Linux, and remember, when you want to move an on-premise workload to an Azure VM, the tool of choice is Azure Migrate. Also recall the similarity in name between availability sets and VM Scale Sets, which doesn't mean they provide similar capabilities. Availability sets have to do with zones and fault and update domains, and they are usually implemented along with a tiered monolithic architecture model. VMSS is a pool of VMs that get provisioned using the same image.
Azure App Services and Azure Functions provide great platforms for running web applications, Web APIs, and serverless workloads. Being PaaS, they both eliminate the maintenance of operating system and third-party runtimes. That loss can also mean your application cannot run on the platform, but in that case you can either choose to run in a container or move over to IaaS. Don't forget about WebJobs, which run batch jobs, and Azure Batch, which also runs batch processing but at a galactic scale.
Lastly, you learned about Service Fabric and the Azure Kubernetes Service (AKS), which focus on microservices and containerization. Service Fabric can support containers and some open source; however, its primary strength is running and managing .NET-based microservices. AKS is a robust open source orchestrator targeted at containerized open source applications on Linux.
App Service Environment (ASE) | Horizontal Pod Autoscale (HPA) |
App Service Plan (ASP) | hyper-scale |
Azure Container Registry (ACR) | Infrastructure as a service (IaaS) |
Azure Container Storage | Input/Output operations per second (IOPS) |
Azure Marketplace | Integrated Drive Electronics (IDE) |
Azure Reserved VM Instance | Internal Load Balancer (ILB) |
batch jobs | lift and shift (aka rehost) |
Blue Screen of Death (BSOD) | Microsoft Distributed Transaction Coordinator (MSDTC) |
Business Continuity and Disaster Recovery (BCDR) | Microsoft Message Queuing (MSMQ) |
Cloud Bursting | Microsoft Virtual Machine Converter (MVMC) |
Cloud Optimized | Model-View-Controller (MVC) |
Cluster autoscaler | Orchestration |
Command Line Interface (CLI) | OS level virtualization |
Common Language Runtimes (CLR) | Page Blob |
Compute | Platform as a service (PaaS) |
Container as a service (CaaS) | Remote Direct Memory Access (RDMA) |
Containerized | Representational state transfer (REST) |
Create, Read, Update, Delete (CRUD) | Resource lock |
Cross-Origin Resource Sharing (CORS) | runtime |
Database Management System (DBMS) | Service Oriented Architecture (SOA) |
Disaster Recovery (DR) | Simple Object Access Protocol (SOAP) |
Docker | Small Computer System Interface (SCSI) |
Electronic Data Interchange (EDI) | Software Development Kits (SDK) |
ephemeral disk | update domains |
fault domains | Virtual Hard Disk (VHD) |
Functions as a service (FaaS) | Virtual Machine Scale Sets (VMSS) |
General Availability (GA) | virtual nodes |
Graphical Processing Units (GPU) | Web API |
Graphical User Interface (GUI) | Web Application |
High Performance Computing (HPC) | Windows Communication Foundation (WCF) |
Windows Subsystem for Linux (WSL) |