Creating a Table

To create this table, you can use the sspa script in the prepared workspace. Just as with the identity pool we created in the last chapter, an action in the script will create and configure all the necessary resources for us. The two AWS CLI commands the script uses to do this are dynamodb create-table and iam put-role-policy. Also, just as before, we’re going to walk through what this script is doing for us, so there’s no mystery to it.

There’s a preconfigured table in the prepared workspace, with a configuration you can use or change to your liking. It’s located in conf/dynamodb/tables/learnjs/. Open the config.json file in that directory and take a look.

 {
   "AttributeDefinitions": [
     {
       "AttributeName": "problemId",
       "AttributeType": "N"
     },
     {
       "AttributeName": "userId",
       "AttributeType": "S"
     }
   ],
   "KeySchema": [
     {
       "KeyType": "HASH",
       "AttributeName": "userId"
     },
     {
       "KeyType": "RANGE",
       "AttributeName": "problemId"
     }
   ],
   "ProvisionedThroughput": {
     "ReadCapacityUnits": 5,
     "WriteCapacityUnits": 5
   }
 }

This file is passed to the dynamodb create-table command, and it specifies the configuration for the table. The three top-level attributes—AttributeDefinitions, KeySchema, and ProvisionedThroughput—are all required parameters for this configuration. Let’s look at what these settings mean, and then take a look at some of the optional settings we haven’t specified yet.
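You don’t need to run this command yourself, since the sspa script takes care of it, but the underlying AWS CLI invocation looks roughly like the following sketch. The table name here is an assumption, since config.json doesn’t include one; the script supplies it.

 learnjs $ aws dynamodb create-table --table-name learnjs \
     --cli-input-json file://conf/dynamodb/tables/learnjs/config.json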

Attributes and Keys

Attributes in DynamoDB not only have names and values, but they also have types. As you saw earlier, these can be simple types like Strings and Numbers, or they can be more complex types like Maps and Lists. When writing data to a table, the AWS SDK for JavaScript attempts to detect an appropriate type for an attribute, based on the data that’s in the object being saved.
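For example, here’s a minimal sketch of a write using the SDK’s DocumentClient, which marshals native JavaScript types into DynamoDB types for you. The table name and item values here are hypothetical.

 var docClient = new AWS.DynamoDB.DocumentClient();
 docClient.put({
   TableName: 'learnjs', // hypothetical table name
   Item: {
     userId: 'us-east-1:abc123', // a string becomes an S attribute
     problemId: 1,               // a number becomes an N attribute
     tags: ['easy', 'truthy']    // an array becomes an L attribute
   }
 }, function (err, data) {
   if (err) console.log(err);
 });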

However, you may want to define these attribute types up front, if you know what they’re going to be. In our case, we know that we’re going to need a number problemId attribute, and a string userId attribute. These types are specified with the AttributeType property and a string value that represents the type. The list of available types and their representations is available in the AWS documentation.[56] In this case, we have an "N" for the number type, and an "S" for the string type.
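To see what those representations look like in practice, here’s a sketch of the same kind of write using the low-level client, where you spell out the types yourself. Again, the table name is hypothetical.

 var dynamo = new AWS.DynamoDB();
 dynamo.putItem({
   TableName: 'learnjs',
   Item: {
     userId: { S: 'us-east-1:abc123' },
     problemId: { N: '1' } // the N type still takes a string value
   }
 }, function (err, data) {
   if (err) console.log(err);
 });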

It’s worth noting that you don’t need to specify attribute definitions ahead of time for every attribute. DynamoDB doesn’t require a fixed record schema, and you can add new attributes to any record you write to the database. You have to include the AttributeDefinitions property in the config.json file in order to create the table, but it only needs to cover the key attributes, not every attribute you want on your records. The AWS SDK will auto-detect your attribute types when you write records with new attributes.

As you also saw in the last section, DynamoDB has two options for the structure of a table’s primary key. In our case, we want to add a sort key (also called a range key). By using the Cognito identity ID as the hash key, and the problem number as the range key, we’ll be able not only to provide fast query access to the items in this table, but also to restrict each user to the data they create for themselves.

The KeySchema property in config.json lets us define the settings we want for this table’s primary key. Adding two objects to this array—one for the HASH key, and one for the RANGE key—lets us specify which attributes in the record will make up our multidimensional key. Note that the order of the keys matters: the first one must be the HASH key, and the RANGE key, if specified, must come second. Although this property is an array, multiple HASH or RANGE keys are not allowed.

Provisioned Throughput

A critical thing to understand about any web service, including DynamoDB, is how you pay for it. Certain costs are associated with data storage and data transfer, but we don’t need to be concerned about that when creating a table. All we need to figure out right now is what the ProvisionedThroughput setting on our table should be.

With DynamoDB, you purchase the capacity to perform read and write operations up front. This capacity is measured in read and write units, and you can increase or decrease it on demand to meet the needs of your app. Each read unit lets you perform one strongly consistent read per second of an item 4KB or smaller. Using eventually consistent reads gives you two such reads per second for each read capacity unit you allocate. A unit of write capacity allows you to write one item of 1KB or less per second. These units scale linearly—that is, a strongly consistent read of a 40KB item will use ten read capacity units, and writing five 1KB items in less than a second will consume five write capacity units.
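If it helps to see that arithmetic written down, here’s a small helper of my own (not part of the SDK) that sketches the read-side calculation:

 // One read unit per 4KB for strongly consistent reads;
 // eventually consistent reads cost half as much.
 function readUnitsConsumed(itemSizeKB, stronglyConsistent) {
   var units = Math.ceil(itemSizeKB / 4);
   return stronglyConsistent ? units : Math.ceil(units / 2);
 }
 readUnitsConsumed(40, true);  // => 10
 readUnitsConsumed(40, false); // => 5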

You can change your provisioned throughput whenever you like to meet the needs of your app. You can provision up to 40,000 read capacity units per table through the AWS console or APIs; beyond that, you need to contact Amazon support. Amazon claims that DynamoDB can scale without limits, but as you can imagine, Amazon wants a little heads-up first.

When provisioning throughput, you need to consider how random the application’s query patterns are. Amazon recommends that you choose primary keys that evenly distribute your query load across the hash space. This is because DynamoDB splits the total key space into partitions, assigning a fraction of your data (and of your throughput capacity) to each partition.[57] You might expect your provisioned throughput to apply to the table as a whole, but if your app doesn’t query evenly across the keys in the table, a single busy partition can run into capacity problems while the table as a whole stays below the capacity you’ve allocated.

If the app exceeds its allocated capacity, requests to DynamoDB will start failing with a ProvisionedThroughputExceededException. Our hash key is the userId, so what this means for our app is that if one user performs a disproportionate number of read and write operations, that user may start running into capacity errors while nobody else sees a problem. Whether this is a bug or a feature is up to you to decide.
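In the JavaScript SDK, that error arrives in the callback like any other. Here’s a sketch of what checking for it might look like, using a hypothetical table name. Keep in mind that the SDK retries throttled requests a few times on its own before surfacing the error.

 var docClient = new AWS.DynamoDB.DocumentClient();
 docClient.get({
   TableName: 'learnjs', // hypothetical table name
   Key: { userId: 'us-east-1:abc123', problemId: 1 }
 }, function (err, data) {
   if (err && err.code === 'ProvisionedThroughputExceededException') {
     // Back off and retry, or raise the table's provisioned capacity.
   }
 });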

To specify the provisioned throughput for our new table, we need to set the ReadCapacityUnits and WriteCapacityUnits properties in the config object. The example config in the prepared workspace has these set to five, but the AWS Free Tier provides up to twenty-five read and twenty-five write capacity units for free across all of your DynamoDB tables. If you’re creating two tables for production and test environments, you might want to split this capacity between the two; otherwise, you can set these values as high as twenty-five without adding to your AWS bill.

Tip: You can divide your Free Tier throughput capacity among multiple tables.

Now that you understand what throughput capacity is, go ahead and save any changes you’ve made to the config.json file. You’ll be able to change this as needed, so it’s not critical to get it right the first time. However, if you start seeing errors in your application, you’ll likely want to change these values right away. You can change them using the AWS CLI tool, or by adjusting the table settings in the AWS web console.
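For example, raising the capacity with the AWS CLI might look something like this sketch (the table name is an assumption):

 learnjs $ aws dynamodb update-table --table-name learnjs \
     --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=10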

This table has other settings that we’re not specifying in config.json. Before moving on, let’s take a look at another option you have when creating DynamoDB tables. Although you might not need it now, when it comes time to scale your application up, it may become an essential part of your app.

Secondary Indexes and Query vs. Scan

There are two primary methods of getting data out of a DynamoDB table. A query efficiently selects items out of the table based on primary key values—hash, or hash and range, depending on the structure of the key. When using a range key, queries can quickly select a subset of the items in the table that meet the conditions of the query.
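Here’s a sketch of what a query against our table’s composite key might look like with the DocumentClient. The hash key must be an equality match, while the range key condition narrows the result set. The table name is hypothetical.

 var docClient = new AWS.DynamoDB.DocumentClient();
 docClient.query({
   TableName: 'learnjs', // hypothetical table name
   KeyConditionExpression: 'userId = :id AND problemId >= :p',
   ExpressionAttributeValues: { ':id': 'us-east-1:abc123', ':p': 1 }
 }, function (err, data) {
   if (!err) console.log(data.Items);
 });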

The other way to get data out of a table is by using a scan. A scan evaluates every item in the table, and it can be much slower than a query. By default, a scan returns all the items in the table (up to a limit of 1MB of data), but you can narrow the results by providing a filter expression. Filtering doesn’t make the scan any cheaper, though: the expression is still evaluated against every item in the table.
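For comparison, here’s a sketch of what a filtered scan might look like:

 var docClient = new AWS.DynamoDB.DocumentClient();
 docClient.scan({
   TableName: 'learnjs', // hypothetical table name
   FilterExpression: 'problemId > :min',
   ExpressionAttributeValues: { ':min': 5 }
 }, function (err, data) {
   if (!err) console.log(data.Items);
 });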

Running a scan against a table can be slow if the table contains too much data, even if the results only contain a handful of items. To get around this, you can query against a secondary index[58] instead. There are two kinds of secondary indexes: global and local.

A global secondary index contains a copy of the data in a table, but it’s indexed with a different primary key to allow for fast access. That key can be either a hash or a hash and range, and it can be made up of any attributes in the table. Local secondary indexes, by contrast, use the same hash key as the table, but provide an additional range key, based on one of the other attributes on the items in the table.

Tip: Global secondary indexes need their own provisioned throughput.

While it’s only possible to create local secondary indexes when you create the table, you can create global secondary indexes later, if you need to. Keep this in mind as your applications grow. We don’t need to create any secondary indexes for our application right now, but as the data model evolves and the size of the tables starts to scale up, we may want to add them to maintain acceptable performance characteristics. Creating one or more global secondary indexes can be a quick and easy way to resolve query performance issues.
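If you did decide to create a global secondary index along with the table, the configuration would go in config.json next to the other settings. Here’s a sketch of what that property might look like; the index name, projection, and throughput values are placeholders:

 "GlobalSecondaryIndexes": [
   {
     "IndexName": "problemId-index",
     "KeySchema": [
       { "KeyType": "HASH", "AttributeName": "problemId" }
     ],
     "Projection": { "ProjectionType": "ALL" },
     "ProvisionedThroughput": {
       "ReadCapacityUnits": 1,
       "WriteCapacityUnits": 1
     }
   }
 ]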

Now that you understand all the options in our table configuration, it’s time to run the command to create the table. You’ll need to specify both the configuration directory and the name of the identity pool with users who are allowed to access the table, like so:

 learnjs $ ./sspa create_table conf/dynamodb/tables/learnjs/ learnjs

The reason you need to provide the name of the identity pool is that the sspa script also creates the necessary IAM policy to allow access to those users. In the next section, we’ll take a look at that generated policy and see how Cognito and DynamoDB can work together to allow safe and secure direct access to the database.
