Using Auto Scaling

Auto Scaling automatically scales your compute instances up when demand increases and scales them down when it subsides, aligning your deployed infrastructure with the demand at any given point in time. You can define launch configurations for your EC2 instances and then set up appropriate auto scaling groups for them. This helps automate the process of saving money by turning off unused instances during a scale-down. You can set parameters, such as the minimum and maximum number of instances, to meet your functional and nonfunctional requirements while controlling your overall costs.
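As a sketch of the parameters involved, the following builds the request an auto scaling group is created from. The group and launch configuration names, AZs, and sizes are illustrative placeholders, not values from this text:

```python
# Hypothetical auto scaling group parameters; all names and numbers here
# are example choices, not prescribed values.
asg_params = {
    "AutoScalingGroupName": "web-asg",               # placeholder name
    "LaunchConfigurationName": "web-launch-config",  # placeholder name
    "MinSize": 2,          # floor: the group never shrinks below this
    "MaxSize": 10,         # ceiling: caps your maximum spend
    "DesiredCapacity": 2,  # start at the minimum; scale up on demand
    "AvailabilityZones": ["us-east-1a", "us-east-1b"],
}

# With boto3, this dict would be passed to:
#   boto3.client("autoscaling").create_auto_scaling_group(**asg_params)

# Sanity check: desired capacity must sit within the min/max bounds.
assert asg_params["MinSize"] <= asg_params["DesiredCapacity"] <= asg_params["MaxSize"]
```

The minimum guarantees baseline availability; the maximum is the knob that directly bounds cost.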

It can take a few minutes for new instances to come online during a scale-up, so make sure that you account for this lag when establishing your thresholds. Do not set the threshold too high in production (for example, at 90% CPU utilization), because there is a high likelihood that your existing instances will hit 100% utilization before the new instances have spun up. It is good practice to set production CPU utilization thresholds between 60% and 70% to give yourself sufficient headroom. At the same time, to guard against inadvertent scale-ups caused by a random spike, specify a duration, say 2 to 5 minutes, for which utilization must remain at the threshold before the scale-up kicks in. As EC2 instances are charged by the hour, do not rush to scale down immediately after you have scaled up just because utilization momentarily dips below the threshold. Instead, set a longer duration, say 10 to 20 minutes, at the reduced CPU utilization threshold before scaling down.
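These thresholds and durations map naturally onto a pair of CloudWatch alarms. The following is a minimal sketch under the guidance above; the alarm names and the exact threshold and period values are assumptions chosen for illustration:

```python
# Scale-up alarm: breach sustained for ~5 minutes at 65% average CPU.
scale_up_alarm = {
    "AlarmName": "web-asg-scale-up",   # hypothetical name
    "MetricName": "CPUUtilization",
    "Namespace": "AWS/EC2",
    "Statistic": "Average",
    "Threshold": 65.0,       # in the 60-70% band, leaving headroom for boot lag
    "ComparisonOperator": "GreaterThanThreshold",
    "Period": 60,            # seconds per evaluation period
    "EvaluationPeriods": 5,  # sustain the breach for ~5 minutes before acting
}

# Scale-down alarm: wait much longer (~15 minutes) before shrinking.
scale_down_alarm = {
    "AlarmName": "web-asg-scale-down",  # hypothetical name
    "MetricName": "CPUUtilization",
    "Namespace": "AWS/EC2",
    "Statistic": "Average",
    "Threshold": 40.0,        # well below the scale-up threshold to avoid flapping
    "ComparisonOperator": "LessThanThreshold",
    "Period": 60,
    "EvaluationPeriods": 15,  # ~15 minutes of low utilization before scaling down
}

# With boto3, each dict would feed cloudwatch.put_metric_alarm(...), with
# the scaling policy ARN attached via AlarmActions.
```

Keeping the scale-down threshold well below the scale-up threshold, and its duration several times longer, is what prevents the group from oscillating.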

You can also set thresholds for network and memory utilization, either by profiling your application or by starting with a best guess and iteratively adjusting it to arrive at the right values. However, avoid setting multiple scaling triggers per auto scaling group, because this increases the chance of conflicts between your triggers; you could end up scaling up based on one trigger while scaling down due to another. You should also specify a cooldown period after a scale-down.

If you have implemented a multi-AZ architecture, then scale up and scale down in increments of two instances at a time. This helps keep your AZs balanced, with an equal number of instances in each.
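The two-instance increments and the cooldown period mentioned above can be sketched as a pair of scaling policies. Again, the group and policy names are placeholders, and the cooldown durations are example choices:

```python
# Scale-up policy: add two instances at a time, one per AZ in a two-AZ setup.
scale_up_policy = {
    "AutoScalingGroupName": "web-asg",    # placeholder name
    "PolicyName": "scale-up-by-two",
    "AdjustmentType": "ChangeInCapacity",
    "ScalingAdjustment": 2,   # +2 keeps both AZs balanced
    "Cooldown": 300,          # seconds to wait before the next scaling action
}

# Scale-down policy: remove two at a time, with a longer cooldown so the
# group does not shrink again before the fleet has stabilized.
scale_down_policy = {
    "AutoScalingGroupName": "web-asg",
    "PolicyName": "scale-down-by-two",
    "AdjustmentType": "ChangeInCapacity",
    "ScalingAdjustment": -2,  # -2 keeps both AZs balanced
    "Cooldown": 600,
}

# With boto3: autoscaling.put_scaling_policy(**scale_up_policy), and likewise
# for scale_down_policy; alarms then reference the returned policy ARNs.
```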

Sometimes, massive scaling is required in response to planned or scheduled events. Special events, such as big sales or flash sales on popular e-commerce sites, or sporting events and election reporting on news sites, can cause disruptions due to the huge spikes in resource usage and demand they generate. In such cases, it may be better to over-provision instances for the sharp increase in traffic rather than rely on auto scaling alone. After the event is over, the instances can be scaled down appropriately.

You can also use schedule-based scaling to match capacity to the workload at different times of the day or on weekends. The same approach can be used to scale down development and test environments during off-peak hours.
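A minimal sketch of schedule-based scaling for a development environment follows: shrink the group each weekday evening and restore it each morning. The group name, action names, times, and sizes are all assumptions for illustration; recurrence fields use cron syntax in UTC:

```python
# Scheduled action: shut the dev environment down after hours.
scale_down_evenings = {
    "AutoScalingGroupName": "dev-asg",   # placeholder name
    "ScheduledActionName": "dev-off-peak",
    "Recurrence": "0 20 * * MON-FRI",    # 20:00 UTC on weekdays
    "MinSize": 0,
    "MaxSize": 1,
    "DesiredCapacity": 0,                # no instances overnight
}

# Scheduled action: bring it back up each weekday morning.
scale_up_mornings = {
    "AutoScalingGroupName": "dev-asg",
    "ScheduledActionName": "dev-peak",
    "Recurrence": "0 8 * * MON-FRI",     # 08:00 UTC on weekdays
    "MinSize": 2,
    "MaxSize": 6,
    "DesiredCapacity": 2,
}

# With boto3: autoscaling.put_scheduled_update_group_action(**scale_down_evenings)
# and likewise for scale_up_mornings.
```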

After you have architected your application environment, the next step is to monitor it. Monitoring is important because it helps you validate your architectural decisions. If your focus is both cost and usage, then you need to monitor them closely, as both are necessary to identify targets for further optimization. Tag your instances with identifying information, such as environment, owner, and cost center, for reporting purposes. You also need to establish monitoring thresholds and alerts. Analyze this information frequently and iterate on your architecture for further savings.
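The tagging scheme described above might look like the following; the keys and values are examples only:

```python
# Illustrative cost-allocation tags; keys and values are placeholders.
instance_tags = [
    {"Key": "Environment", "Value": "production"},
    {"Key": "Owner", "Value": "web-team"},
    {"Key": "CostCenter", "Value": "cc-1234"},
]

# With boto3: ec2.create_tags(Resources=[instance_id], Tags=instance_tags)
# for each instance. Tag keys applied this way can then be activated as
# cost allocation tags so they appear in billing reports.
```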
