Chapter 2: Principles of Modern Architecture

In the previous chapter, we looked at why architecture is important, what it seeks to achieve, and how it has changed over time. Understanding how we got to where we are today helps us in our role and provides a solid framework for our designs.

This chapter will look at how we architect systems in general to understand the high-level requirements and potential methods. It is split into pillars, and we will examine different aspects of each; however, as we will see, they are all interlinked and have some element of dependency on each other.

We will start by looking at security, perhaps one of the most essential aspects of architecture, and understand how attackers gain access to systems and how we can prevent it.

Next, we'll investigate resilience, which is closely related to performance. By understanding the principles of these subjects, we can ensure our designs produce stable and performant applications.

Deployment mechanisms have become far more sophisticated in recent years, and we'll learn how the decisions we make about building platforms have become entwined with the systems themselves.

Finally, we will see how a well-designed solution must include a suite of monitoring, alerting, and analytics to support the other pillars.

With this in mind, throughout this chapter, we will cover the following topics:

  • Architecting for security
  • Architecting for resilience and business continuity
  • Architecting for performance
  • Architecting for deployment
  • Architecting for monitoring and operations

Architecting for security

As technology has advanced, the solutions we build have become more powerful, flexible, and complex. Our applications' flexibility and dynamic nature enable a business to leverage data and intelligence at a level previously unknown. The cloud is often touted by many vendors as having near unlimited capacity and processing power that is accessible by anyone.

But power comes at a cost, because it's not just businesses who wish to leverage the potential of the cloud—hackers also have access to that tooling. Therefore, the architect of any system must keep security at the core of any design they produce.

Knowing the enemy

The first step in ensuring security is to understand the hacker mindset or, at the very least, to think about what they wish to accomplish—why do hackers hack?

Of course, there are lots of reasons, but we'll state the obvious one: because they can! Some people see hacking a system as a challenge. Because of this, their attack vector could be any security hole they can exploit, even if doing so yields no tangible benefit.

The most dangerous attackers are the ones who wish to profit from or cause damage through their actions, either directly by selling/releasing private data, or by holding their victim to ransom by encrypting data and demanding money to release it.

Some wish to simply disrupt by bringing a system down—depending on the nature of the solution in question, the damage this causes can be reputational or financial.

All of these scenarios highlight some interesting points. The first is that when designing, you need to consider all areas of your solutions that might be vulnerable—authorization, data, application access points, network traffic, and so on.

The second is that, depending on your solution, you can prioritize the areas that would cause you the most damage. For example, if you are a high-volume internet retailer, uptime may be your primary concern, and you might therefore concentrate on preventing attacks that could overload your system. If data is your most valuable asset, then you should think about ways to ensure it cannot be accessed, either by preventing an intrusion in the first place or by protecting the information with encryption or obfuscation if an intrusion does occur.

Once we have the why, we need to think about the how.

How do they hack?

There are, of course, many ways hackers can gain access to your systems, but once you have identified the reason why an attacker may want to hack you, you can at least narrow down the potential methods. The following are some of the more common ones:

  • Researching login credentials: Although the simplest method, this is perhaps one of the most common. If an attacker can get your login details, they can do a lot of damage very quickly. Details can be captured by researching a user's social and public profiles to guess their password or to find the answers to their security questions.
  • Phishing: Another way of capturing your credentials: you may receive an email notifying you that your account has been locked, with a link to a fake website. The website looks like the one you are expecting, and when you enter your details, it merely captures them.
  • Email: Rather than capturing login details, some emails may contain malicious code in the form of an attachment or a link to a compromised site. The purpose is to infect your computer with a virus, Trojan, or similar. The payload could be a keylogger to capture keystrokes (that is, login details), or it could spread and use more sophisticated attacks to access other systems.
  • Website vulnerabilities: Poorly written code can lead to all sorts of entry points. SQL injection attacks, whereby Transact-SQL (T-SQL) statements are posted within a form, can update, add, or delete data if the backend is not written to protect against this type of attack (a parameterized-query sketch follows this list). Cross-site scripts that run on the hacker's website but access the backend on yours can override form posts, and so on.
  • Distributed Denial of Service (DDoS): A DDoS attack seeks to overwhelm your servers and endpoints by flooding them with requests—this can either bring down your applications or potentially trigger other exploits that grant complete access.
  • Vulnerability exploits: Third-party applications and operating systems can also have vulnerable code that hackers seek to exploit in many different ways, from triggering remote execution scripts to taking complete control of the affected system.

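To make the SQL injection point from the list above concrete, here is a minimal sketch in Python using the standard library's sqlite3 module as a stand-in for any database driver; the table and values are purely illustrative. The unsafe query concatenates user input into the statement, while the safe version passes it as a parameter so the driver never interprets it as SQL:

    import sqlite3

    # Hypothetical users table used purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

    user_input = "alice' OR '1'='1"  # a typical injection attempt posted via a form

    # Vulnerable: the input is concatenated straight into the SQL statement.
    unsafe = f"SELECT email FROM users WHERE username = '{user_input}'"
    print(conn.execute(unsafe).fetchall())   # returns data it should not

    # Safer: the driver treats the input strictly as a value, never as SQL.
    safe = "SELECT email FROM users WHERE username = ?"
    print(conn.execute(safe, (user_input,)).fetchall())  # returns nothing

The same principle applies to any data access library: never build statements through string concatenation.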
Of course, there are many more, but understanding the main reasons why and how hackers hack is the first step in your defense. With this knowledge, we can start to define and plan our strategy.

Defining your strategy

Once we have identified what we need to protect, including any prioritization based on your platform's characteristics, we can start to define a set of rules that set out how we protect ourselves.

Based on your requirements, which may be solution- and business-led, the strategy will state which elements need protecting, and how. For example, you may have a rule that states all data must be encrypted at rest, or that all logging is monitored and captured.

There are several industry compliance standards, such as ISO 27001, the National Institute of Standards and Technology (NIST) frameworks, and the Payment Card Industry Data Security Standard (PCI DSS). These can either form the basis of your internal policies or be used as a reference; however, depending on your business's nature, you may be required to align with one or more of them.

Information

ISO is the acronym for International Organization for Standardization, which is an international standard-setting body with representatives from multiple other standards organizations.

We can now consider which technologies we will use to implement the various policies; next, we will look at some of the more common ones.

Networking and firewalls

Preventing access to systems from non-authorized networks is, of course, a great way to control access. In an on-premises environment, this is generally the case by default, but in cloud environments, many services are open by default, at least from a networking perspective.

If your application is an internal system, consider controls that would force access along internal routes and block access to external routes.

If systems do need to be external-facing, consider network segregation by breaking the individual components up into their own networks or subnets. In this scenario, your solution would have externally facing user interfaces in a public subnet, a middle tier managing business rules in a second subnet, and your backend database in a third.

Using firewalls, you would only allow public access to the public subnet. The other subnets would only allow access on specific ports on the adjacent subnet. In this way, the user interface would have no direct access to the databases; it would only be allowed access to the business layer that then facilitates that access.

Azure provides firewall appliances and network security groups (NSGs) that deny and allow access between source and destination services, and using a combination of the two together provides even greater control.
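As a rough illustration of this tiering, the following toy sketch models NSG-style rules that are evaluated in priority order, with the first match winning; the subnet names, ports, and priorities are assumptions for the example, not a real Azure configuration:

    # A toy model of NSG-style rules: evaluated in priority order, first match wins.
    # Subnet names, ports, and priorities here are illustrative assumptions.
    rules = [
        {"priority": 100, "source": "public",   "dest": "web-tier",  "port": 443,  "action": "allow"},
        {"priority": 200, "source": "web-tier", "dest": "app-tier",  "port": 8080, "action": "allow"},
        {"priority": 300, "source": "app-tier", "dest": "data-tier", "port": 1433, "action": "allow"},
        {"priority": 4096, "source": "*", "dest": "*", "port": "*", "action": "deny"},  # default deny
    ]

    def evaluate(source: str, dest: str, port: int) -> str:
        for rule in sorted(rules, key=lambda r: r["priority"]):
            if (rule["source"] in (source, "*")
                    and rule["dest"] in (dest, "*")
                    and rule["port"] in (port, "*")):
                return rule["action"]
        return "deny"

    print(evaluate("public", "web-tier", 443))    # allow
    print(evaluate("public", "data-tier", 1433))  # deny: no direct path from the internet to the database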

Finally, creating a virtual private network (VPN) from an on-premises network into a cloud environment ensures only corporate users can access your systems, as though they were accessing them on-premises.

Network-level controls help control both perimeter and internal routes, but once a user has access, we need to confirm that the user is who they claim to be.

Identity management

Managing user access is sometimes considered the first line of defense, especially for cloud solutions that need to support mobile workforces.

Therefore, you must have a well-thought-out plan for managing access. Identity management is split into two distinct areas: authentication and authorization.

Authentication is the act of a user proving they are who they say they are. Typically, this would be a username/password combination; however, as discussed in the How do they hack? section, these credentials can be compromised.

Therefore, you need to consider options for preventing these types of attacks, as either guessing or capturing a user's password is a common exploit. You could use alternatives such as Multi-Factor Authentication (MFA) or monitor for suspicious login attributes, such as the location a user is logging on from.
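As a minimal sketch of how MFA strengthens a password check, the following example uses the third-party pyotp package to generate and verify time-based one-time passwords; in practice you would rely on your identity provider's built-in MFA rather than rolling your own, and the user and issuer names here are placeholders:

    import pyotp  # third-party package: pip install pyotp

    # Each user gets a shared secret, typically enrolled via a QR code in an authenticator app.
    secret = pyotp.random_base32()
    totp = pyotp.TOTP(secret)

    print(totp.provisioning_uri(name="alice@example.com", issuer_name="Contoso Portal"))

    # At sign-in, the password check alone is not enough; the current one-time code
    # from the user's device must also verify against the shared secret.
    code_from_user = totp.now()          # simulate the user's authenticator output
    print(totp.verify(code_from_user))   # True only within the short validity window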

Once a user is authenticated, the act of authorization determines what they can access. Following principles such as least privilege or Just Enough Access (JEA) ensures users can only access what they require to perform their role. Just-in-Time (JIT) processes provide elevated access only when a user needs it and remove it after a set period.

Continual monitoring with automated alerting and threat management tools helps ensure that any compromised accounts are flagged and shut down quickly.

Using a combination of authorization and authentication management and good user education around the danger of phishing emails should help prevent the worst attacks. Still, you also need to protect against attacks that bypass the identity layer.

Patching

When working with virtual machines (VMs), you are responsible for managing the operating system that runs on them, and attackers can seek to exploit known vulnerabilities in that code.

Regular and timely patching and security updates with anti-virus and anti-malware agents are the best line of defense against this. Therefore, your solution design needs to include processes and tools for checking, testing, and applying updates.

Of course, it is not just third-party code and operating systems that are susceptible; your application code is vulnerable too.

Application code

Most cloud services run custom code, in the form of web apps or backend application programming interface (API) services. Hackers often look for programming errors that can open holes in the application. As with other forms of protection, multiple options can be included in your architecture, and some are listed here:

  • Coding techniques: Breaking code into smaller, individually deployed components and employing good development practices such as Test-Driven Development (TDD), pair programming, or code reviews can help ensure code is cleaner and error-free.
  • Code scanners: Code can be scanned before deployment to check for known security problems, either accidental or malicious, as part of a deployment pipeline.
  • Web application firewalls (WAFs): Unlike layer 3 or 4 firewalls that block access based on Internet Protocol (IP) or protocol, WAFs inspect network packet contents, looking for arbitrary code or common exploits such as SQL injection attacks.

Application-level security controls help protect you against code-level exploits; however, new vulnerabilities are uncovered daily, so you still need to prepare for the eventuality of a hacker gaining data access.

Data encryption

If the data you hold is sensitive or valuable, you should plan for the eventuality that your security controls are bypassed by making that data impossible to read. Encryption will achieve this; however, there are multiple levels you can apply. Each level makes your information more secure, but at the cost of performance.

Encryption strategies should be planned carefully. Standard encryption at rest is lightweight but provides only a basic level of protection, and it should be applied to all data by default.

For more sensitive data such as credit card numbers, personal details, passwords, and so on, additional levels can be applied. Examples of how and where we can apply controls are given here:

  • Databases: Many databases now support Transparent Data Encryption (TDE), whereby the data is encrypted at rest. Because TDE is applied by the database engine itself, consuming applications are unaware of it and therefore do not need to be modified.
  • Database fields: Some databases provide field-level encryption that can be applied by the database engine itself or via client libraries. Again, this can be transparent from a code point of view but may involve additional client software.
  • Applications: Applications themselves can be built to encrypt and decrypt data before it is even sent to the database (a sketch of this pattern follows this list). Thus, the database is unaware of the encryption, but the client must be built specifically to perform this.
  • Transport: Data can be encrypted when transferring between application components. HyperText Transfer Protocol Secure (HTTPS) using Secure Sockets Layer (SSL) certificates is the most commonly known for end-user websites, but communications between elements such as APIs should also be protected. Other transport layer encryption is also available—for example, SQL database connections or file shares.

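The following is a minimal sketch of the application-level option above, using the third-party cryptography package's Fernet recipe; the field value is illustrative, and in a real system the key would be generated once and kept in a key vault rather than in code:

    from cryptography.fernet import Fernet  # third-party package: pip install cryptography

    # In production this key would be generated once and stored in a key vault,
    # never alongside the application code or the database.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    card_number = "4111 1111 1111 1111"  # illustrative sensitive field

    # The application encrypts before the value ever reaches the database...
    stored_value = cipher.encrypt(card_number.encode())
    print(stored_value)  # opaque ciphertext: useless to anyone who dumps the table

    # ...and decrypts only when an authorized code path needs the plaintext.
    print(cipher.decrypt(stored_value).decode())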
Data can be encrypted using either string keys or, preferably, certificates. When using certificates, many cloud vendors, including Azure, offer either managed or customer-supplied keys. With managed keys, the cloud vendors generate, store, and rotate the certificates for you, whereas with customer-supplied keys, you are responsible for obtaining and managing them.

Keys, secrets, and certificates should always be stored in a suitably secure container such as a key vault, with access explicitly granted to the users or services that need them, and access being logged.
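As a brief sketch of this pattern with the Azure SDK for Python (the azure-identity and azure-keyvault-secrets packages), the vault URL and secret name below are placeholders; the application's identity is granted access to the vault, and the secret never appears in configuration files:

    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    # Placeholder vault URL and secret name; access is granted to the app's identity,
    # not embedded in configuration files.
    credential = DefaultAzureCredential()   # works for managed identities, CLI logins, and so on
    client = SecretClient(vault_url="https://my-vault.vault.azure.net", credential=credential)

    secret = client.get_secret("sql-connection-string")
    connection_string = secret.value        # use it, but never log or persist it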

As with other security concerns, the variability and ranges of choices mean that you must carefully plan your encryption techniques.

On their own, each control can provide some protection; however, to give your solution the best defense, you need to implement multiple tactics.

Defense-in-Depth

Modern solutions, especially those built in the cloud using microservice patterns, will be made from many components. Although these provide excellent power and flexibility, they also offer numerous attack points.

Therefore, when considering our security controls, we need to consider multiple layers of defense. Always assume that your primary measures will fail, and ensure you have backup controls in place as well.

This approach is known as Defense-in-Depth (DiD). As an example, consider protecting the data in a database that serves an e-commerce website. Enforcing an authentication mechanism on the database might be your primary control, but you need to consider how to protect your application if those credentials are compromised. An example of a multilayer implementation might include (but not be limited to) the following:

  • Network segregation between the database and web app
  • Firewalls to only allow access from the web app
  • TDE on the database
  • Field-level encryption on sensitive data (for example, credit card numbers; passwords)
  • A WAF

The following diagram shows an example of a multilayer implementation:

Figure 2.1 – Multiple-layer protection example

We have covered many different technical layers that we can use to protect our services, but it is equally important to consider the human element, as this is often the first point of entry for hacks.

User education

Many attacks originate from either a phishing/email attack or social data harvesting.

Alongside an excellent technical defense, a solid education plan is an invaluable way to prevent attacks at their source.

Training users in acceptable online practices helps prevent them from leaking important information; therefore, any plan should include the following:

  • Social media data harvesting: Social media platforms are a gold mine for hackers; users rarely consider that the information they routinely supply could be used to access password-protected systems. Birth dates, geographical locations, even pet names and relationship information are routinely supplied and advertised, all of which are often used to confirm your identity through security questions, and so on.

    Some platforms present quizzes and games that again ask questions that answer common security challenges.

  • Phishing emails: A typical exploit is to send an email stating that an account has been suspended or a new payment has been made. A link will direct the user to a fake website that looks identical to an official site, so they enter their login details, which are then logged by the hacker. These details can not only be used to access the targeted site in question but can also be used to obtain additional information such as an address, contact information, and, as stated previously, answers to common security questions.
  • Password policies: Many people reuse the same password. If one system is successfully hacked, that same password can then be used across other sites. Educating users about password managers and the dangers of password reuse can protect your platform against such exploits.

This section has looked at the importance of designing for security throughout our solutions, from understanding how and why we may be attacked, to common defenses across different layers. Perhaps the critical point is that good design should include multiple layers of protection across your applications.

Next, we will look at how we can protect our systems against failure—this could be hardware, software, or network failure, or even an attack.

Architecting for resilience and business continuity

Keeping your applications running can be important for different reasons. Depending on your solution's nature, downtime can range from a loss of productivity to direct financial loss. Building systems that can withstand some form of failure has always been a critical aspect of architecture, and with the cloud, there are more options available to us.

Building resilient solutions comes at a cost; therefore, you need to balance the cost of an outage against the cost of preventing it.

High Availability (HA) is the traditional option and essentially involves doubling up on components so that if one fails, the other automatically takes over. An example might be a database server—building two or more nodes in a cluster with data replication between them protects against one of those servers failing as traffic would be redirected to the secondary replica in the event of a failure, as per the example in the following diagram:

Figure 2.2 – Highly available database servers

However, multiple servers are always powered on, which in turn means increased cost. Quite often, the additional hardware is not used except in the event of a failure.

For some applications, this additional cost is less than the cost of a potential failure, but it may be more cost-effective for less critical systems to have them unavailable for a short time. In such cases, our design must attempt to reduce how long it takes to recover.

The purpose of HA is to increase the Mean Time Between Failures (MTBF). In contrast, the alternative is to reduce the Mean Time To Recovery (MTTR); in other words, rather than concentrating on preventing outages, spend resources on reducing the impact of an outage and speeding up recovery from it. Ultimately, it is the business that must decide which of these is the most important, and therefore the first step is to define their requirements.

Defining requirements

When working with a business to understand their needs for a particular solution, you need to consider many aspects of how this might impact your design.

Identifying individual workloads is the first step—what are the individual tasks that are performed, and where do they happen? How does data flow around your system?

For each of these components, look at what a failure would mean for them: would it cause the system as a whole to fail or merely disrupt a non-essential task? The act of calculating costs during a transactional process is critical, whereas sending a confirmation email could withstand a delay or even complete failure in some cases.

Understand the usage patterns. For example, a global e-commerce site will be used 24/7, whereas a tax calculation service would be used most at particular times of the year or at the month-end.

The business will need to advise on two important metrics: the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). The RTO dictates the acceptable amount of time a system can be offline, whereas the RPO determines the acceptable amount of data loss. For example, a daily backup might mean you lose up to a day's worth of data; if this is not acceptable, more frequent backups are required.

Non-functional requirements such as these will help define our solution's design, which we can use to build our architecture with industry best practices.

Using architectural best practices

Through years of research and experience, vendors such as Microsoft have collected a set of best practices that provide a solid framework for good architecture when followed.

With the business requirements in mind, we can perform a Failure Model Analysis (FMA). An FMA is a process for identifying common types of failures and where they might appear in our application.

From the FMA, we can then start to create a redundancy and scalability plan; designing with scalability in mind helps build a resilient solution and a performant one, as technologies that allow us to scale also protect us from failure.

A load balancer is a powerful tool for achieving scale and resilience. It allows us to run multiple copies of a service and distribute the load between them, with unhealthy nodes being automatically removed.
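The following toy sketch illustrates the idea rather than a real Azure Load Balancer configuration: a pool of identical nodes, a health flag that would normally be driven by a probe, and traffic routed only to healthy copies (the node names are assumptions):

    import random

    # A toy pool of identical service copies; 'healthy' would normally be driven
    # by a probe such as an HTTP GET against /health on each node.
    nodes = [
        {"name": "web-1", "healthy": True},
        {"name": "web-2", "healthy": True},
        {"name": "web-3", "healthy": False},  # failed its last probe, so it receives no traffic
    ]

    def route_request() -> str:
        healthy = [n for n in nodes if n["healthy"]]
        if not healthy:
            raise RuntimeError("no healthy nodes available")
        return random.choice(healthy)["name"]

    print([route_request() for _ in range(5)])  # only web-1 and web-2 ever appear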

Consider the cost implications of any choices. As mentioned previously, we need to balance the cost of downtime against the cost of providing protection. This, in turn, may impact decisions between the use of Infrastructure-as-a-Service (IaaS) components such as VMs or Platform-as-a-Service (PaaS) technologies such as web apps, functions, and containers. Using VMs in our solution means we must build out server farms manually, which are challenging to scale and demand that components such as load balancers be explicitly included. Opting for managed services such as Azure Web Apps or Azure Functions can be cheaper and far more dynamic, with load-balancing and auto-scaling technologies built in.

Data needs to be managed effectively, and there are multiple options for providing resilience and backup. Replication strategies involving geographically dispersed copies provide the best RPO as the data is always consistent, but this comes at a financial cost.

For less critical data or information that does not change often, cheaper daily backup tools may suffice, but these require manual intervention in the event of a failure.

A well-defined set of requirements and adherence to best practices will help design a robust solution, but regular testing should also be performed to ensure the correct choices have been made.

Testing and disaster recovery plans

A good architecture defines a blueprint for your solution, but it is only theory until it is built; therefore, solutions need to be tested to validate our design choices.

Work through the identified areas of concern and then forcefully attempt to break them. Document and run through simulations that trigger the danger points we are trying to protect against.

Perform failover and failback tests to ensure that the application behaves as it should, and that data loss is within allowable tolerances.

Build test probes and monitoring systems to continually check for possible issues and to alert you to failed components so that these can be further investigated.
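A minimal probe might look like the following sketch, which uses only the Python standard library; the endpoint URL is a placeholder, and a real deployment would run such checks on a schedule (or use built-in platform probes and availability tests) and raise an alert after repeated failures:

    import urllib.request

    def probe(url: str, timeout: float = 5.0) -> bool:
        """Return True if the endpoint answers with HTTP 2xx within the timeout."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return 200 <= response.status < 300
        except OSError:  # covers URLError, HTTP errors, timeouts, and connection resets
            return False

    # Placeholder endpoint; replace with your own health check URL.
    if not probe("https://example.com/health"):
        print("ALERT: health endpoint is not responding")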

Always prepare for the worst—create a disaster recovery plan to detail how you would recover from complete system failure or loss, and then regularly run through that plan to ensure its integrity.

We have seen how a well-architected solution, combined with robust testing and detailed recovery plans, will prepare you for the worst outcomes. Next, we will look at a closely related aspect of design—performance.

Architecting for performance

As we have already seen, resilience can be closely linked to performance. If a system is overloaded, it will either impact the user experience or, in the worst case, fail altogether.

Ensuring a performant solution is more than just increasing resources; how our system is built can directly impact the options available and how efficient they are.

Breaking applications down into smaller discrete components not only makes our solution more manageable but also allows us to increase resources just where they are needed. If we wish to scale in a monolithic, single-server environment, our only option is to add more random-access memory (RAM) and CPU to the entire system. As we decompose our applications and head toward a microservices pattern whereby individual services are hosted independently, we can apportion additional resources where needed, thus increasing performance efficiently.

When we need to scale components, we have two options: the first is to scale up—add more CPU and RAM; the second option is to scale out—deploy additional instances of our services behind a load balancer, as per the example in the following diagram:

Figure 2.3 – Scale-out: identical web servers behind a load balancer

Again, our choice of the underlying technology is important here. Virtual servers can be scaled up or out, and with scale sets, this can even be dynamic. However, virtual servers are slower to scale since a new machine must be imaged, booted, and added to the load balancer. With containers and PaaS options such as Azure Web Apps, this is much more lightweight and far easier to set up; containers are exceptionally efficient from a resource usage perspective.

We can also decide what triggers a scaling event; services can be set to scale in response to demand—as more requests come in, we can increase resources as required and remove them again when idle. Alternatively, we may wish to scale to a schedule—this helps control costs but requires us to already know the periods when we need more power.
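The following toy decision function sketches how demand-based and schedule-based triggers can be combined; the thresholds, instance limits, and busy hours are assumptions for illustration rather than recommended values:

    from datetime import datetime, timezone

    MIN_INSTANCES, MAX_INSTANCES = 2, 10

    def desired_instances(current: int, cpu_percent: float, now: datetime) -> int:
        """Toy autoscale rule: demand-based triggers plus a schedule-based floor."""
        # Assumed busy window of 08:00-18:00 UTC keeps a minimum of 4 instances warm.
        floor = 4 if 8 <= now.hour < 18 else MIN_INSTANCES

        if cpu_percent > 75:                      # scale out under load
            return min(current + 1, MAX_INSTANCES)
        if cpu_percent < 25 and current > floor:  # scale back in when idle
            return current - 1
        return max(current, floor)

    print(desired_instances(3, 82.0, datetime.now(timezone.utc)))                        # 4: demand-based scale-out
    print(desired_instances(6, 12.0, datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)))   # 5: scaling in overnight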

An important design aspect to understand is that it is generally more efficient to scale out than up; however, to take advantage of such technologies, our applications need to avoid client affinity.

Client affinity is a scenario whereby the service processing a request is tied to the client; that is, it needs to remember state information for that client from one request to another. In a system built from multiple backend hosts, the actual host performing the work may change between requests, so any state must be held outside the individual hosts (for example, in a shared cache or database) rather than in a host's memory.

Particular types of functions can often cause bottlenecks—for example, processing large volumes of data for a report, or actions that must contact external systems such as sending emails. Instead of building these tasks as synchronous activities, consider using queuing mechanisms instead. As in the example in the following diagram, requests by the User are placed in a Job Queue and control is released back to the User. A separate service processes the job that was placed in the Job Queue and updates the User once complete:

Figure 2.4 – Messaging/queueing architectures

Decoupling services in this fashion gives the perception of a more responsive system and reduces the number of resources needed to service the request. Scaling patterns can now be based on the number of items in a queue rather than on immediate load, which is more efficient.
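The following in-process sketch uses Python's standard queue module as a stand-in for a real message service (such as a Service Bus or Storage queue) to show the shape of the pattern; the job names and timings are illustrative:

    import queue
    import threading
    import time

    jobs: "queue.Queue[str]" = queue.Queue()

    def worker() -> None:
        # The worker tier drains the queue at its own pace; in the cloud you would
        # scale the number of workers on the queue length rather than on raw load.
        while True:
            job = jobs.get()
            time.sleep(0.5)                    # stand-in for report generation, email sending, and so on
            print(f"completed {job}")
            jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()

    # The web tier only enqueues the request and returns control to the user immediately.
    for i in range(3):
        jobs.put(f"report-{i}")
        print(f"accepted report-{i}; the user is free to carry on")

    jobs.join()  # wait for the demo to finish; a real worker runs continuously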

By thinking about systems as individual components and how those components respond—either directly or indirectly—your solution can be built to not just scale, but to scale in the most efficient manner, thereby saving costs without sacrificing the user experience.

In this section, we have examined how the right architecture can impact our solution's ability to scale and perform in response to demand. Next, we will look at how we ensure these design considerations are carried through into the deployment phase.

Architecting for deployment

One area of IT solutions in which the cloud has had a dramatic impact is deployment. Traditional system builds, certainly at the infrastructure level, were mostly manual processes. Engineers would run through a series of instructions to build and configure the underlying hosting platform, followed by another set of instructions for deploying the software on top.

Manual methods are error-prone because instructions can be misunderstood or implemented wrongly. Validating a deployment is also a complicated process as it would involve walking back through an installation guide, cross-checking the various configurations.

Software deployments led the way on this with automated mechanisms that are scripted, which means they can be repeated time and time again consistently—in other words, we remove the human element.

We can define our infrastructure in code within Azure, too, using either Azure Resource Manager (ARM) templates or other third-party tools; the entire platform can be codified and deployed by automated systems.
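As one possible illustration (an assumption, since the chapter does not prescribe a tool), the following sketch uses Pulumi's Python SDK, one of the third-party IaC options; the resource names and region are placeholders, and running pulumi up against this file creates, and later consistently re-creates, the platform:

    """One possible IaC sketch using Pulumi's Python SDK (pulumi, pulumi-azure-native)."""
    import pulumi
    from pulumi_azure_native import resources, storage

    # Placeholder names and region; the whole stack is declared in code and deployed as a unit.
    group = resources.ResourceGroup("app-rg", location="westeurope")

    account = storage.StorageAccount(
        "appstorage",
        resource_group_name=group.name,
        location=group.location,
        sku=storage.SkuArgs(name="Standard_LRS"),
        kind="StorageV2",
    )

    pulumi.export("storage_account_name", account.name)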

The ability to deploy and re-deploy in a consistent manner gives rise to some additional opportunities. Infrastructure as Code (IaC) enables another paradigm: immutable infrastructure.

Traditionally, when modifications were required to a server's configuration, the process would be to make the change on the server manually and record it in the build documentation. With immutable infrastructure, any modifications are made to the deployment code, and then the server is re-deployed. In other words, the server never changes; it is immutable. Instead, it is destroyed and recreated with the new configuration.

IaC and immutable infrastructure have an impact on our designs. PaaS components are more straightforward to automate than IaaS ones. That is not to say you can't automate IaaS components; however, PaaS's management does tend to be simpler. Although not a reason to use PaaS in its own right, it does provide yet one more reason to use technologies such as web apps over VMs running Internet Information Services (IIS).

You also need to consider which deployment tooling you will use. Again, Microsoft has its own native solution in the form of Azure DevOps; however, there are other third-party options. Whichever you choose will have some impact on connectivity and on any additional agents and tools you use.

For example, most DevOps tools require some form of deployment agent to pull your code from a repository. Connectivity between the repository, the agent, and the Azure platform is required and must be established in a secure and resilient manner.

Because IaC and DevOps make deployments quicker and more consistent, it is easier to build different environments—development, testing, staging, and production. Solution changes progress through each environment and can be checked and signed off by various parties, thus creating a quality culture—as per the example in the following diagram:

Figure 2.5 – Example DevOps flow

The ability to codify and deploy complete solutions at the click of a button broadens the scope of your solution. An entire application environment can be encapsulated and deployed multiple times; this, in turn, provides the opportunity to create various single-tenant solutions instead of a one-off multi-tenant solution. This aspect is becoming increasingly valuable to organizations as it allows for better separation of data between customers.

In this section, we have introduced how deployment mechanisms can change what our end-state solution looks like, which impacts the architecture. Next, we will look in more detail at how monitoring and operations help keep our system healthy and secure.

Architecting for monitoring and operations

For the topics we have covered in this chapter to be effective, we must continually monitor all aspects of our system. From security to resilience and performance, we must know what is happening at all times.

Monitoring for security

Maintaining the security of a solution requires a monitoring solution that can detect, respond to, and ultimately recover from incidents. When an attack happens, the speed at which we respond will determine how much damage is incurred.

However, a monitoring solution needs to be intelligent enough to prioritize and filter false positives.

Azure provides several different monitoring mechanisms, both general-purpose and security-specific, which can be configured according to your organization's capabilities. Therefore, when designing a monitoring solution, you must align with your company's existing teams so that alerts are directed appropriately and pertinent information is sent as required.

Monitoring requirements cover more than just alerts; the policies that define business requirements around configuration settings such as encryption, passwords, and allowed resources must be checked to confirm they are being adhered to. The Azure risk and compliance reports will highlight any items that deviate so that the necessary team can investigate and remediate.

Other tools, such as Azure Security Center, will continually monitor your risk profile and offer advice on improving your security posture.

Finally, security patching reports also need regular reviews to ensure VMs are being patched so that insecure hosts can be investigated and brought in line.

Monitoring for resilience

Monitoring your solution is not just about being alerted to any issues; the ideal scenario is to detect and remediate problems before they cause an outage. In other words, we can use monitoring as an early warning system.

Applications should include in their designs the ability to output relevant logs and errors; this then enables health alerts to be set up that, when combined with resource thresholds, provide details of the running processes.

Next, a set of baselines can be created that identify what a healthy system looks like. When anomalies occur, such as long-running processes or specific error logs, they are spotted earlier.
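A baseline check can be as simple as the following sketch, which uses only the Python standard library and flags readings that stray several standard deviations from a learned healthy range; the sample values and threshold are illustrative, and a production system would typically use its monitoring platform's built-in dynamic thresholds instead:

    from statistics import mean, stdev

    def is_anomalous(baseline, latest, sigmas=3.0):
        """Flag a reading that strays too far from the learned healthy baseline."""
        if len(baseline) < 10:
            return False                      # not enough history to judge yet
        mu, sd = mean(baseline), stdev(baseline)
        return sd > 0 and abs(latest - mu) > sigmas * sd

    # Illustrative response times (ms) collected while the system was known to be healthy.
    healthy_samples = [120, 118, 125, 130, 122, 119, 127, 124, 121, 126]

    print(is_anomalous(healthy_samples, 128))  # False: within normal variation
    print(is_anomalous(healthy_samples, 450))  # True: a long-running process worth alerting on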

As well as defined alerts that will proactively contact administrators when possible issues are detected, visualization dashboards and reporting can also help responsible teams see potential problems or irregular readings as part of their daily checks.

Monitoring for performance

The same CPU, RAM, and input/output (I/O) thresholds used for early warning signs of errors also help identify performance issues. By monitoring response times and resource usage over time, you can understand usage patterns and predict when more power is required.

Performance statistics can be used either to set scaling schedules manually or to tune automated scaling rules more accurately.

Keeping track of scaling events throughout the life cycle of an application is useful. If an application is continually scaling up and down or not scaling at all, it could indicate that thresholds are set incorrectly.

Again, creating and updating baseline metrics will help alert you to potential issues. If resources for a particular service are steadily increasing over time, this information can predict future bottlenecks.

Network monitoring

CPU and RAM utilization are not the only sources of problems; issues can also arise from misconfigured firewalls and routing, or from misbehaving services generating too much traffic.

Traffic analytics tools will provide an overview of the networks in the solution and help identify sources that generate high traffic levels. Network performance managers offer tools that allow you to create specific tests between two endpoints to investigate particular issues.

For hybrid environments, VPN monitoring tools specifically track the direct connection links to your on-premises networks.

Monitoring for DevOps and applications

For solutions with well-integrated DevOps code libraries and deployment pipelines, additional metrics and alerts will notify you of failed builds and deployments. Information, support tickets, or work tasks can be automatically raised and linked to the affected build.

Additional application-specific monitoring tools allow for an in-depth analysis of your application's overall health, and again will help with troubleshooting problems.

Application maps, artificial intelligence (AI)-driven smart detection, usage analytics, and component communications can all be included in your designs to help drive operational efficiencies and warn of future problems.

We can see that for every aspect of your solution design—security, resilience, performance, and deployments—an effective monitoring and alerting regime is vital to ensure the platform's ongoing health. With proper forethought, issues can be prevented before they happen. Forecasting and planning can be based on intelligent extrapolation rather than guesswork, and responding to failure events becomes a science instead of an art.

Summary

In this chapter, we took a high-level view of architecture and the types of decisions that must be considered, agreed upon, and documented.

By thinking about how we might design for security, resilience, performance, and deployment, and about how we monitor all our systems, we gain a greater understanding of our solution as a whole.

The last point is important—although a system design must contain the individual components, they must all work together as a single, seamless solution.

In the next chapter, we will look at the different tools and patterns we can use in Azure to build great applications that align with best-practice principles.

Further reading

You can check out the following link for more information about Microsoft's Well-Architected Framework:

https://docs.microsoft.com/en-us/azure/architecture/framework/
