Alarms

CloudWatch alarms let us trigger actions based on any metric, including metrics extracted from log data. Before creating an alarm, we should always determine why we need it. There are many reasons to have an action performed on a metric, and we will look at the following examples:

  • Notifications
  • Autoscaling
  • Auto-recovery
  • Event-based computing

The simplest use of alarms is notification. We can create an alarm that notifies us when a metric stays above (or below) a threshold for a specified period of time. For example, suppose we need to know about any RDS instance whose free storage space drops below 10%. We can create an alarm that sends an email through an SNS topic to our incident response team, who can then increase the size of the volume or handle the issue in another appropriate way.
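RDS publishes free storage as the FreeStorageSpace metric (namespace AWS/RDS) in bytes, not as a percentage, so a 10% rule has to be turned into a byte threshold based on the instance's allocated storage. The following is a minimal sketch that only builds the parameters for the CloudWatch PutMetricAlarm API (boto3 put_metric_alarm); the instance name orders-db, its 100 GiB allocation, the SNS topic ARN and the 5-minute period with 3 evaluation periods are hypothetical values chosen for illustration.

```python
# Minimal sketch: build PutMetricAlarm parameters for an RDS low-storage alarm.
# The instance name, allocated size and SNS topic ARN below are hypothetical.

GIB = 1024 ** 3  # bytes per GiB


def rds_low_storage_alarm(db_instance_id: str, allocated_gib: int,
                          sns_topic_arn: str, free_pct: int = 10) -> dict:
    """Return keyword arguments for the CloudWatch PutMetricAlarm call."""
    # FreeStorageSpace is reported in bytes, so convert the percentage into
    # an absolute byte count based on the instance's allocated storage.
    threshold_bytes = allocated_gib * GIB * free_pct // 100
    return {
        "AlarmName": f"{db_instance_id}-free-storage-below-{free_pct}pct",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Minimum",
        "Period": 300,           # one datapoint every 5 minutes
        "EvaluationPeriods": 3,  # must stay below the threshold for 15 minutes
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],  # topic emails the incident response team
    }


if __name__ == "__main__":
    params = rds_low_storage_alarm(
        "orders-db", 100, "arn:aws:sns:us-east-1:123456789012:incident-response")
    print(params["AlarmName"], params["Threshold"])
    # With AWS credentials configured, the alarm could then be created with:
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameter construction separate from the API call makes it easy to check the computed threshold before anything is created in the account.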

We can also use alarms to drive autoscaling for our applications. As discussed in the EC2 chapter, CloudWatch can scale our EC2 clusters when CPU utilization stays above a certain percentage for a certain amount of time. For example, if the average CPU utilization across the cluster stays above 80% for 5 minutes, the CloudWatch alarm triggers an Auto Scaling policy, which performs a scaling action that adds instances to the cluster.
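The scaling example above pairs two pieces: a scaling policy that says what to do, and an alarm whose action is that policy's ARN. This is a minimal sketch that builds the parameters for the Auto Scaling PutScalingPolicy and CloudWatch PutMetricAlarm APIs; the group name web-asg, the one-instance step and the 300-second cooldown are hypothetical, and a simple scaling policy is used here for clarity rather than the only possible policy type.

```python
# Minimal sketch: a simple scale-out policy plus the CPU alarm that triggers it.
# The Auto Scaling group name, step size and cooldown are hypothetical values.


def scale_out_policy(asg_name: str) -> dict:
    """Keyword arguments for Auto Scaling PutScalingPolicy: add one instance."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-scale-out",
        "PolicyType": "SimpleScaling",
        "AdjustmentType": "ChangeInCapacity",
        "ScalingAdjustment": 1,  # add one instance each time the alarm fires
        "Cooldown": 300,         # wait 5 minutes before scaling again
    }


def cpu_high_alarm(asg_name: str, policy_arn: str) -> dict:
    """Keyword arguments for PutMetricAlarm: average CPU above 80% for 5 minutes."""
    return {
        "AlarmName": f"{asg_name}-cpu-above-80",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # Aggregates the metric across all instances in the group.
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],  # the alarm invokes the scaling policy
    }


if __name__ == "__main__":
    # With boto3 and credentials, the flow would be roughly:
    #   policy = boto3.client("autoscaling").put_scaling_policy(**scale_out_policy("web-asg"))
    #   boto3.client("cloudwatch").put_metric_alarm(
    #       **cpu_high_alarm("web-asg", policy["PolicyARN"]))
    print(cpu_high_alarm("web-asg", "example-policy-arn")["AlarmName"])
```

A matching scale-in policy and low-CPU alarm would normally be added as well so the cluster can shrink again once the load drops.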

The same approach can be used for auto-recovery. Suppose our developers have identified a bug that causes instances to keep processing requests even after the session has been closed. Affected instances show a continuous increase in CPU utilization, which does not affect the application until the CPU reaches 100% and the instance can no longer accept new requests. The developers are working on a fix that is due in a few days, but in the meantime we need a temporary solution that reboots any affected instance before it reaches 90% CPU utilization. We can create an alarm with an EC2 action that reboots the instance when its CPU utilization reaches that threshold. This simple solution keeps our application running until the fix is deployed.
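EC2 alarm actions such as reboot act on the instance named in the alarm's InstanceId dimension, so the workaround above needs one alarm per instance. This minimal sketch generates PutMetricAlarm parameters for a list of instances using the built-in reboot action ARN (arn:aws:automate:region:ec2:reboot); the instance IDs, region and the 85% threshold (a margin below the 90% limit mentioned above) are assumptions.

```python
# Minimal sketch: one CPU alarm with a reboot action per affected instance.
# Instance IDs, region and the 85% threshold are hypothetical values.
from __future__ import annotations


def reboot_alarms(instance_ids: list[str], region: str,
                  threshold: float = 85.0) -> list[dict]:
    """Keyword arguments for PutMetricAlarm, one dict per instance."""
    # Built-in EC2 action ARN that tells CloudWatch to reboot the instance.
    reboot_arn = f"arn:aws:automate:{region}:ec2:reboot"
    return [
        {
            "AlarmName": f"{instance_id}-cpu-reboot",
            "Namespace": "AWS/EC2",
            "MetricName": "CPUUtilization",
            # The EC2 action applies to the instance in this dimension.
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Statistic": "Average",
            "Period": 300,
            "EvaluationPeriods": 1,
            "Threshold": threshold,  # assumed margin below the 90% limit
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "AlarmActions": [reboot_arn],
        }
        for instance_id in instance_ids
    ]


if __name__ == "__main__":
    for params in reboot_alarms(["i-0abc1234", "i-0def5678"], "us-east-1"):
        print(params["AlarmName"])
        # boto3.client("cloudwatch").put_metric_alarm(**params)
```

Because these alarms are a stopgap, giving them a recognizable name suffix makes them easy to find and delete once the bug fix ships.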

CloudWatch alarms can also be used as triggers for event-based computing. Any metric and threshold can be used to trigger a function or other process that performs an automated action. For example, when monitoring our RDS volumes, instead of only notifying the incident response team, we can also subscribe a Lambda function to the same SNS topic and grant it the permissions it needs to modify our RDS instances. The Lambda function can then automatically increase the size of the RDS volume, fully automating the response in this scenario.
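A Lambda function subscribed to the SNS topic receives the alarm notification as a JSON string in event["Records"][0]["Sns"]["Message"]. The following minimal sketch parses that message, finds the DBInstanceIdentifier dimension and raises the instance's AllocatedStorage (in GiB) through the RDS DescribeDBInstances and ModifyDBInstance APIs. The 20% growth rate, the use of ApplyImmediately=True and the tolerance for both lowercase and capitalized dimension keys are assumptions; the function's IAM role would also need permission for those two RDS actions.

```python
# Minimal sketch of a Lambda handler subscribed to the alarm's SNS topic.
# The 20% growth rate and ApplyImmediately=True are assumptions.
import json

GROWTH_PCT = 20  # hypothetical: grow storage by 20% per alarm


def alarm_from_sns_event(event: dict) -> dict:
    """SNS delivers the CloudWatch alarm notification as a JSON string."""
    return json.loads(event["Records"][0]["Sns"]["Message"])


def db_instance_from_alarm(alarm: dict) -> str:
    """Return the DBInstanceIdentifier dimension value from the alarm trigger."""
    for dim in alarm["Trigger"]["Dimensions"]:
        # Accept both lowercase and capitalized key spellings.
        if dim.get("name", dim.get("Name")) == "DBInstanceIdentifier":
            return dim.get("value", dim.get("Value"))
    raise ValueError("alarm has no DBInstanceIdentifier dimension")


def new_allocated_storage(current_gib: int, growth_pct: int = GROWTH_PCT) -> int:
    """Grow by growth_pct percent, rounded up, and by at least 1 GiB."""
    increase = max(1, (current_gib * growth_pct + 99) // 100)
    return current_gib + increase


def handler(event, context, rds=None):
    alarm = alarm_from_sns_event(event)
    if alarm.get("NewStateValue") != "ALARM":
        return None  # ignore transitions back to OK or INSUFFICIENT_DATA
    db_id = db_instance_from_alarm(alarm)
    if rds is None:  # create the real client only when none is injected
        import boto3
        rds = boto3.client("rds")
    instance = rds.describe_db_instances(DBInstanceIdentifier=db_id)["DBInstances"][0]
    target = new_allocated_storage(instance["AllocatedStorage"])
    rds.modify_db_instance(DBInstanceIdentifier=db_id,
                           AllocatedStorage=target,
                           ApplyImmediately=True)
    return target
```

Passing a stub object with describe_db_instances and modify_db_instance methods as rds lets the logic be exercised without touching a real database. RDS also limits how often and by how much allocated storage can be modified, so a production version would need to handle rejected modification requests.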
