DynamoDB partitions and distribution

We already mentioned that DynamoDB is a distributed cluster that stores our data redundantly. The data is stored in partitions, which are distributed and replicated across availability zones for fault tolerance and high availability. The partitions are fully managed by AWS, but we can influence how our data is distributed across them through our choice of partition key.

DynamoDB will, in principle, always assign enough partitions to our table to handle the capacity provisioned for it. But it is on us to make sure the table can actually deliver that capacity, since each item is placed on a partition according to its partition key. If our partition keys are poorly distributed, we can create a so-called hot partition, where a lot of items with the same partition key are stored and accessed.

For example, when our partition key is username and the sort key is login_time, we could get a lot of entries for one particular username, and all of them will sit on the same backend partition. This does not become a problem until we try to perform operations that consume more than 3,000 RCUs or 1,000 WCUs per second on that particular partition, as this is the maximum throughput that a single partition, and therefore a single partition key, can deliver.

DynamoDB determines the partition for an item by running a hash operation against the partition key value when we write the item. This means that a good distribution of our partition keys is crucial to the performance of our tables. When there is no other option but to reuse the same partition key, we can add a calculated suffix to it instead of writing the identical partition key value each time the user logs in. Without a suffix, every login shares the same partition key:

username	login_time
`thownsr`	2018-09-11-18:30:07
`thownsr`	2018-09-13-12:18:46
`thownsr`	2018-09-15-19:21:13

We can use a calculated suffix derived from the date and time of the login and make sure our application understands that username.wxyz is the same as username without the suffix, like so:

username	login_time
`thownsr.2093`	2018-09-11-18:30:07
`thownsr.2116`	2018-09-13-12:18:46
`thownsr.2095`	2018-09-15-19:21:13

In this example, we get three different partition keys for the same username by adding up all the numeric fields of the login time. For the second login, the sum is:

2018+09+13+12+18+46 = 2116

This makes the suffix in this example .2116 and the partition key thownsr.2116.
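The calculation above can be sketched in a few lines of Python. This is only a minimal illustration: the function names login_suffix, suffixed_partition_key, and base_username are hypothetical, and we assume the login time always arrives in the format shown in the tables.

```python
import re


def login_suffix(login_time: str) -> int:
    """Sum every numeric field of a timestamp such as 2018-09-13-12:18:46."""
    return sum(int(field) for field in re.findall(r"\d+", login_time))


def suffixed_partition_key(username: str, login_time: str) -> str:
    """Build the partition key in the username.wxyz form."""
    return f"{username}.{login_suffix(login_time)}"


def base_username(partition_key: str) -> str:
    """Strip the calculated suffix; assumes the key always carries one."""
    return partition_key.rsplit(".", 1)[0]


key = suffixed_partition_key("thownsr", "2018-09-13-12:18:46")
print(key)                 # thownsr.2116
print(base_username(key))  # thownsr
```

Because the suffix is derived only from the login time, the application can rebuild the full partition key whenever it knows the username and the timestamp of the login it is looking for.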

This approach gives us a whole new range of partition keys that will have a whole new range of hashes, thereby avoiding having a hot partition without having to change the data model.
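To see why the suffix matters to the hash, we can run a rough experiment. The hash function DynamoDB uses internally is not something we control or need to know, so the sketch below uses MD5 from the Python standard library purely as a stand-in, and the helper name stand_in_hash is hypothetical. The point is only that identical keys always produce one hash, while suffixed keys produce different ones.

```python
import hashlib


def stand_in_hash(partition_key: str) -> str:
    """Hash a partition key. MD5 is an assumption for illustration only,
    not the function DynamoDB really uses."""
    return hashlib.md5(partition_key.encode("utf-8")).hexdigest()


# Two logins by the same user, without and with the calculated suffix.
plain_keys = ["thownsr", "thownsr"]
suffixed_keys = ["thownsr.2116", "thownsr.2095"]

plain_hashes = {stand_in_hash(k) for k in plain_keys}
suffixed_hashes = {stand_in_hash(k) for k in suffixed_keys}

print(len(plain_hashes))     # 1: every login hashes to the same value
print(len(suffixed_hashes))  # 2: each login gets its own hash
```

Different hashes do not guarantee different partitions, since several hash values can still be served by the same partition, but they give DynamoDB the chance to spread the items out.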
