Calculating target server utilization

First, calculate your custom target server utilization. This is the utilization level at which a server under increasing load should trigger provisioning of a new server. It must fire early enough that the new server is ready before the original server reaches 100% utilization and starts dropping requests. Consider this formula:

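$$\text{Target Utilization} = 1 - \frac{\text{Provisioning Speed} \times \text{95th-Percentile User Growth Rate}}{\text{User Capacity per Instance}}$$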

Let's demonstrate how the formula works with a concrete example:

Load test your instances to find the user capacity per instance. Load test results: 3,846 users/second

Requests/sec and users/sec are not equivalent, because a user makes multiple requests to complete an action and may execute multiple requests per second. Advanced load testing tools such as OctoPerf are necessary to execute realistic, varied workloads and to measure user capacity rather than raw request capacity.

Measure instance provisioning speed, from creation/cold boot to the first fulfilled request. Measured instance provisioning speed: 60 seconds

You don't need a stopwatch to measure this speed. Depending on your setup, AWS provides event and application logs in the ECS Service Events tab, CloudWatch, and CloudTrail. You can correlate these to determine when a new instance was requested and how long it took to become ready to fulfill requests. For example, in the ECS Service Events tab, take the target registration event as the start time. Once the task has started, click the task ID to see its creation time. Then, using the task ID, check the task's logs in CloudWatch for the time the task served its first web request. Take that as the end time and calculate the duration.
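Once you have the two timestamps, the remaining arithmetic is simple. The sketch below assumes you have copied the start time (for example, from the target registration event) and the end time (the first served request in the task's CloudWatch logs) as ISO 8601 strings. It does not call any AWS API. The helper name `provisioning_seconds` and the timestamp values are hypothetical.

```python
from datetime import datetime


def provisioning_seconds(start_iso: str, end_iso: str) -> float:
    """Return the seconds elapsed between a start and an end timestamp.

    Both arguments are ISO 8601 strings with a UTC offset, e.g. copied by
    hand from the ECS console and CloudWatch logs (hypothetical inputs).
    """
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    if end < start:
        raise ValueError("end time precedes start time")
    return (end - start).total_seconds()


# Hypothetical timestamps: start event and first served web request.
duration = provisioning_seconds(
    "2024-05-01T12:00:00+00:00",
    "2024-05-01T12:01:00+00:00",
)
print(f"Provisioning speed: {duration:.0f} seconds")
```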

Measure the 95th-percentile user growth rate, excluding known capacity increases. 95th-percentile user growth rate: 10 users/second

If you don't have prior metrics, your initial user growth rate will be an educated guess at best. Once you start collecting data, you can update your assumptions. In addition, no cost-effective infrastructure can respond to every imaginable outlier without dropping a request. Given your metrics, make a conscious business decision about what percentile of outliers to ignore as an acceptable business risk.
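Once you do have data, computing the percentile is straightforward. The following sketch assumes you have already exported per-second growth samples and flagged the seconds that belong to known capacity increases, such as a planned launch. The sample values and the `known_event` flag are made up for illustration, and the samples were chosen so the result matches the 10 users/second above. The sketch uses the nearest-rank percentile method. Other percentile definitions give slightly different results.

```python
import math


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def p95_growth_rate(samples: list[tuple[float, bool]]) -> float:
    """samples: (growth in users/second, known_event) pairs.

    Samples flagged as known capacity increases are excluded before the
    95th percentile is computed.
    """
    organic = [growth for growth, known_event in samples if not known_event]
    return nearest_rank_percentile(organic, 95)


# Hypothetical data: 20 organic samples plus one known capacity increase.
organic_growth = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 9, 10, 12]
samples = [(float(g), False) for g in organic_growth] + [(450.0, True)]
print(f"95th-percentile growth: {p95_growth_rate(samples)} users/second")
```

Without the `known_event` filter, the 450-user spike would be included in the samples and could skew the percentile for smaller data sets.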

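Let's plug the numbers into the formula:

$$\text{Target Utilization} = 1 - \frac{60 \times 10}{3846} = 1 - \frac{600}{3846} \approx 1 - 0.156 = 0.844$$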

Rounded down, the custom target utilization is 84%. Setting your scale-out trigger at 84% keeps instances from being overprovisioned while avoiding dropped user requests.
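The same calculation can be written as a small function, which makes it easy to rerun the numbers as your measurements change. This is a minimal sketch of the target utilization formula in plain Python. The function and parameter names are hypothetical. Rounding down is the conservative choice because it makes the scale-out trigger fire slightly earlier.

```python
import math


def target_utilization(
    user_capacity: float,         # users/second one instance handles (load test)
    provisioning_seconds: float,  # creation/cold boot to first fulfilled request
    growth_rate: float,           # 95th-percentile user growth rate
) -> float:
    """Return the scale-out threshold as a fraction of full utilization."""
    # Extra load that arrives while the new instance is still provisioning.
    growth_during_provisioning = provisioning_seconds * growth_rate
    # Share of the instance's capacity that must stay free to absorb it.
    headroom = growth_during_provisioning / user_capacity
    if headroom >= 1:
        raise ValueError("instance would saturate before a new one is ready")
    return 1 - headroom


def as_trigger_percent(fraction: float) -> int:
    """Round down to a whole percent, as the text does (84.4% -> 84%)."""
    return math.floor(fraction * 100)


utilization = target_utilization(user_capacity=3846, provisioning_seconds=60, growth_rate=10)
print(f"{utilization:.3f} -> scale out at {as_trigger_percent(utilization)}%")
```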

With this custom target utilization in hand, let's update the Per User Cost formula to account for scaling:

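$$\text{Per User Cost with Scaling} = \frac{\text{Infrastructure Cost}}{\text{Users at 100\% Utilization} \times \text{Target Utilization}}$$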

So if your infrastructure costs $100 per month to serve 150 users at 100% utilization, the Per User Cost is $0.67/user/month. Taking scaling into account, the cost would be as follows:

$$\text{Per User Cost with Scaling} = \frac{\$100}{150 \times 0.84} = \frac{\$100}{126} \approx \$0.79 \text{ per user per month}$$

Reserving 16% headroom so that scaling never drops requests raises the cost from the original $0.67 to $0.79 per user per month, roughly 19% more. Keep in mind, however, that your infrastructure won't always be this efficient. With lower utilization targets or misconfigured scaling triggers, costs can easily double, triple, or quadruple. The ultimate goal is to find the sweet spot where you pay the right amount per user.
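To see how sensitive the per-user cost is to the scale-out target, here is a minimal Python sketch. It assumes cost per user is Infrastructure Cost / (Users at 100% Utilization × Target Utilization). The function name and the extra targets in the loop (50% and 25%) are hypothetical.

```python
def per_user_cost(monthly_cost: float, users_at_full_utilization: int,
                  target_utilization: float = 1.0) -> float:
    """Monthly cost per user when instances only run up to target_utilization.

    At 100% utilization this is simply cost / users; a lower scale-out
    target means each instance serves proportionally fewer users.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return monthly_cost / (users_at_full_utilization * target_utilization)


baseline = per_user_cost(100, 150)  # $100/month, 150 users, 100% utilization

for target in (1.0, 0.84, 0.5, 0.25):
    cost = per_user_cost(100, 150, target)
    print(f"target {target:.0%}: ${cost:.2f}/user/month ({cost / baseline:.2f}x baseline)")
```

The 84% row reproduces the $0.79 figure. The 50% and 25% rows show how a lower target doubles or quadruples the baseline cost.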

There's no prescriptive per user cost to target. However, suppose you run a service where you charge users $5 per month. If, after all other operational costs and profit margins are accounted for, you still have budget left over while your users complain about poor performance, then you're underspending. On the other hand, if you're eating into your profit margins, or worse, breaking even, then you may be overspending, or you may need to reconsider your business model.

Several other factors, such as Blue-Green deployments, can also impact your per user cost. You can also increase the efficiency of your scaling by leveraging pre-scheduled provisioning.
