Many applications and systems that you create need to store files. These may be profile images, documents uploaded by a user, or artifacts generated by the system. Some files are temporary and transient, whereas other files must be kept for a long time. A reliable service for storing files is Amazon’s Simple Storage Service (S3). It was Amazon’s first available web service, launched in March 2006, and it has been a cornerstone AWS service ever since. In this chapter, we’ll explore S3 in more detail. We’ll look at features such as versioning, storage classes, and transfer acceleration. And you’ll continue to work on 24-Hour Video by adding new storage-related features.
You’ve been working with S3 since chapter 3 but haven’t had a proper, in-depth look at it. Apart from basic file storage, S3 has many great features. These include versioning, hosting of static websites, storage classes, cross-region replication, and requester-pays buckets. Let’s explore some of the more compelling features of S3 and see how they’re useful.
If you ever need a thorough guide to S3, Amazon’s documentation is a great reference. Check out https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html for good walkthroughs and examples of how S3 works.
Up until now you’ve been using S3 as a basic storage mechanism for files (or objects, as S3 refers to them). Back in chapter 3 you created two S3 buckets to store videos. One of the buckets was for users uploading files. The other bucket was for transcoded files. This was simple and practical, but it also meant that you could overwrite and lose existing files. Luckily, S3 has an optional feature that allows it to keep copies of all versions of every object. This means that you can overwrite an object and then go back to previous versions of that object at any time. It’s powerful and completely automatic. We’ve been thankful for versioning when we’ve accidentally deleted files and had to restore them.
Buckets don’t have versioning enabled by default, so it must be turned on. And once versioning is enabled, it can’t be turned off in that bucket, only suspended. Thus, a bucket can be in only one of the following three possible states:
As expected, the cost of using S3 goes up when versioning is used. But you can remove versions you no longer need so that you’re not billed for the files you don’t want to keep. S3 Object Lifecycle Rules (see section 8.1.4) and versioning can help to automate removal and archival of old versions. For example, you can set up an S3 bucket to work like an operating system trash can (that is, you can set up a rule to delete old files from the bucket after a period of time, such as 30 days).
To enable versioning, follow these steps:
You can now overwrite, delete, and then recover older versions of an object yourself:
Every versioned object in S3 has a unique version ID. As you can see from figure 8.1, you can have many objects with the same key but different IDs (see the second-to-last column in the diagram). If you decide to retrieve a version programmatically, you need to know its ID. The version ID isn’t hard to get using the AWS SDK or the REST API. You can retrieve all objects and their version IDs or retrieve the version ID for a given key. You’ll also get other useful metadata such as the object’s LastModified date. Once you know the version ID, it’s easy to download the file. If, for example, you were retrieving an image using the REST API, you could issue the request GET /my-image.jpg?versionId=L4kqtJlcpXroDTDmpUMLUo HTTP/1.1 to get it.
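The versioned GET above boils down to a URL with a versionId query parameter. As a small sketch, this helper builds such a URL (the bucket name, key, and version ID here are placeholders, not real values):

```javascript
// Build the URL for fetching a specific object version via the S3 REST API.
// Bucket name, key, and version ID below are illustrative placeholders.
function versionedObjectUrl(bucket, key, versionId) {
  return 'https://' + bucket + '.s3.amazonaws.com/' +
    encodeURIComponent(key) +
    '?versionId=' + encodeURIComponent(versionId);
}

console.log(versionedObjectUrl('my-bucket', 'my-image.jpg', 'L4kqtJlcpXroDTDmpUMLUo'));
// → https://my-bucket.s3.amazonaws.com/my-image.jpg?versionId=L4kqtJlcpXroDTDmpUMLUo
```

With the AWS SDK, you’d instead pass a VersionId property in the params object given to getObject, and listObjectVersions returns all keys with their version IDs.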
Static website hosting is a popular use case for S3 buckets. S3 doesn’t support server-side code (that is, code that needs to execute on a server), but it can serve static website content like HTML, CSS, images, and JavaScript files. S3 is an effective way to host static websites because it’s quick and cheap to set up. After static website hosting is enabled, content in the bucket becomes accessible to web browsers via an endpoint provided by S3.
A Cloud Guru (https://acloud.guru) initially hosted its static website on S3. The website, built on AngularJS, worked well except with certain web crawlers. The team discovered that rendering of rich-media snippets of the website on Facebook, Slack, and other platforms didn’t work. This was because crawlers used by those platforms couldn’t run JavaScript. A Cloud Guru needed to serve a rendered, static HTML version of the website that those crawlers could parse. Unfortunately, with S3 and CloudFront that couldn’t be done. There was no way to prerender and serve an HTML version of the site to Facebook and then another (JavaScript-driven) version to everyone else. In the end, A Cloud Guru chose to move to Netlify (a static website-hosting service) to solve its problem.
Netlify integrates with a service called prerender.io. Prerender.io can execute JavaScript and create a static HTML page. This HTML page can then be served to crawlers while normal users continue to use the regular SPA website. Netlify (https://www.netlify.com) is a great little service that’s worth checking out.
Let’s walk through the process to see how to enable static website hosting and allow you to serve HTML from the bucket:
Next, you need to set up a bucket policy to grant everyone access to the objects in the bucket:
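A minimal public-read policy of the usual shape looks like this sketch (replace example-bucket with your bucket’s name):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```

Note that the Resource ARN ends with /* so that the policy applies to every object in the bucket rather than to the bucket itself.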
To test whether static website hosting works, upload an HTML file to the bucket (make sure it’s named index.html) and open the bucket’s endpoint in a web browser. You can go one step further: copy the 24-Hour Video website to the bucket and try to open it (in fact, this is one of the exercises at the end of this chapter).
You can buy a domain name and use Amazon’s Route 53 as the DNS provider for the domain. If you do this, be aware of a few gotchas. For example, the name of your bucket must match the name of your domain. So if your domain is www.google.com, the bucket must also be called www.google.com. Consult the following step-by-step guide if you want to set up a custom domain with your S3 bucket: https://docs.aws.amazon.com/AmazonS3/latest/dev/website-hosting-custom-domain-walkthrough.html.
Moving away from versioning, we think it’s fair to say that different data has different storage requirements. Some data, such as logs, may need to be kept for a long time but accessed infrequently. Other kinds of data may need to be accessed frequently but may not need the same kind of storage reliability. Luckily, S3 has something for everyone because it supports four kinds of storage classes with different levels of redundancy, access characteristics, and pricing (https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html):
We’ll discuss these in more detail, but first a note about pricing. Pricing is a combination of factors such as the storage class, location (region) of the files, and the quantity of data stored. For this example, we’ll simplify our requirements and assume the following:
If your requirements are different, you can always check the S3 pricing page for more detail (https://aws.amazon.com/s3/pricing/). Also, note that apart from storage, S3 charges for requests and data transfers.
This is the default storage class in S3. It’s set automatically on any object you create or upload (if you don’t specify a different class yourself). This class is designed for frequent access to data. The cost is $0.0300 per GB for the first TB of data (per month). This class, as well as Standard_IA and Glacier classes, has a durability rating of 99.999999999%.
This class is designed for less frequently accessed data. Amazon recommends using this class for backups and older data that requires quick retrieval when needed. The request pricing is higher for Standard_IA than for Standard ($0.01 per 10,000 requests for Standard_IA versus $0.004 per 10,000 requests for Standard). The storage cost, however, is less at $0.0125 per GB for the first TB of data (per month).
The Glacier storage class is designed for infrequently accessed data where retrieval can take three to five hours. It’s the best option for data such as backups that don’t require real-time access. The Glacier storage class uses the Amazon Glacier service, but objects are still managed from the S3 console. It’s important to note that objects can’t be created with the Glacier class from the get-go. They can only be transitioned to Glacier using lifecycle management rules (see section 8.1.4). Glacier storage is charged at $0.007 per GB for the first TB of data (per month).
The fourth class is Reduced Redundancy storage (RRS), which is designed to be cheaper and with less redundancy than other classes. This class has a durability rating of 99.99% (as opposed to all other classes that are designed for a durability rating of 99.999999999%). Amazon recommends using this storage class for data that can be easily re-created (for example, use the Standard storage class for original images uploaded by users and use RRS for autogenerated thumbnails). Naturally, RRS costs are lower. The price for storage is $0.0240 per GB for the first TB (per month).
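To make Amazon’s thumbnail recommendation concrete, here’s a sketch of how a storage class is chosen in code. The AWS SDK’s putObject accepts a StorageClass parameter; this helper builds the params object for an RRS thumbnail upload (the bucket and key are placeholders):

```javascript
// Sketch: choosing a storage class at upload time. You'd pass the returned
// params to s3.putObject() with the AWS SDK; bucket and key are placeholders.
function thumbnailUploadParams(bucket, key, body) {
  return {
    Bucket: bucket,
    Key: key,
    Body: body,
    StorageClass: 'REDUCED_REDUNDANCY' // cheaper, 99.99% durability
  };
}

const params = thumbnailUploadParams('example-bucket', 'thumbs/video1.jpg', 'image-bytes');
console.log(params.StorageClass); // → REDUCED_REDUNDANCY
```

Omitting StorageClass gives you the Standard class, which is what you’d keep for the original uploaded images.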
Lifecycle management is a great feature of S3 that can be used to define what happens to an object over its lifetime. In essence, you can set up rules to do the following:
Every rule requires you to enter a time period (in days from the creation of the file) after which it takes an effect. You can, for example, set up a rule to archive an object to Glacier class storage 20 days after it has been created.
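As a sketch, a lifecycle configuration of the kind you could pass to the SDK’s putBucketLifecycleConfiguration might look like this (the rule ID and day counts are illustrative, combining the Glacier transition above with a trash-can-style expiration):

```json
{
  "Rules": [
    {
      "ID": "archive-then-delete",
      "Prefix": "",
      "Status": "Enabled",
      "Transitions": [
        { "Days": 20, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 50 }
    }
  ]
}
```

An empty Prefix applies the rule to every object in the bucket; set a prefix such as "logs/" to scope it to a subset of keys.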
To set up lifecycle management for your objects, follow these steps:
If you ever want to disable or edit a rule, you can do so later in the Lifecycle section of the bucket.
Transfer acceleration is an S3 feature that speeds up uploads and transfers by routing them through Amazon CloudFront’s distributed edge locations. Amazon recommends using transfer acceleration in cases where users from all over the world need to upload data to a centralized bucket (this could be a use case for 24-Hour Video) or transfer gigabytes (or terabytes) of data across continents (https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html). You can use the speed comparison tool (figure 8.4) to see what effect enabling transfer acceleration would have for you (http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html). The pricing for data transfer (in and out) ranges from $0.04/GB to $0.08/GB, depending on which CloudFront edge location is used to accelerate the transfer.
To enable S3 bucket transfer acceleration using the AWS console, follow these steps:
You can suspend transfer acceleration by choosing the Suspend option at any time. You can also enable transfer acceleration using the AWS SDK or the CLI. Refer to https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration-examples.html for more information.
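If you prefer the CLI, a command along these lines should enable acceleration (the bucket name is a placeholder; this requires credentials with the appropriate S3 permissions):

```shell
aws s3api put-bucket-accelerate-configuration \
  --bucket my-upload-bucket \
  --accelerate-configuration Status=Enabled
```

Passing Status=Suspended to the same command suspends it again.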
We first used S3 event notifications all the way back in chapter 3 when we connected Lambda and then SNS to a bucket. Event notifications let you receive a message when the following events take place in a bucket:
S3 can publish events to the following destinations (the bucket and the target must be in the same region):
You might remember that S3 must be granted permission to post messages to SNS topics and SQS queues. You worked on an IAM policy for SNS and S3 permissions in chapter 3, but we haven’t looked at how to do it with SQS. The following listing shows an example policy that you’d need to attach to an SQS queue if you decided to use it as a destination for S3 events. Naturally, S3 must also be given permission to invoke Lambda functions, but if you use the S3 console, it will be done for you automatically.
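As a sketch, such a queue policy follows the usual shape (the account ID, queue name, and bucket name are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:us-east-1:123456789012:s3-events-queue",
      "Condition": {
        "ArnLike": { "aws:SourceArn": "arn:aws:s3:::example-upload-bucket" }
      }
    }
  ]
}
```

The aws:SourceArn condition is important: it restricts who can send messages to your queue to events originating from your bucket.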
You can find more information on S3 events, including examples and IAM policies, at https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html.
S3 events are important to understand if you use S3 and Lambda (or SNS or SQS) together. S3 events are a JSON message with a specific format that describes the bucket and the object. We briefly looked at the S3 event message structure in chapter 3 (section 3.1.4) when you tested the transcode-video function locally. Appendix F provides a more detailed overview of the event message structure, which you might find useful going forward.
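Abbreviated, an ObjectCreated event message looks roughly like the following sketch (the bucket and key are illustrative, and real messages carry additional fields; see appendix F for the full structure):

```json
{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "awsRegion": "us-east-1",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": {
          "name": "example-upload-bucket",
          "arn": "arn:aws:s3:::example-upload-bucket"
        },
        "object": {
          "key": "my-video.mp4",
          "size": 1048576
        }
      }
    }
  ]
}
```

Your Lambda function typically reads event.Records[0].s3.bucket.name and event.Records[0].s3.object.key to find out which file triggered it.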
So far, you’ve been uploading your videos directly to a bucket using the S3 console when you wanted to test 24-Hour Video. But that’s not going to work for your end users. They need an interface to be able to upload their files to the 24-Hour Video website. You also don’t want just anyone (that is, anonymous users) to upload files. Only registered, authenticated users should be allowed to do this. In this section, you’re going to work on adding secure upload functionality to 24-Hour Video. Your end users will be able to click a button on the website, select a file, and upload it to an S3 bucket. Figure 8.5 shows which component of your architecture you’ll be working on in this section.
To upload a file from a user’s browser to a bucket in a secure, authenticated fashion, you need the following:
To get started, you’re going to create a Lambda function. This function will validate the user and generate a policy and a signature needed to upload the file to S3. This information will be sent back to the browser. Upon receiving this information, the user’s browser will begin an upload to a bucket using HTTP POST. All this will be invisible to the end user because they’ll just select a file and upload. Figure 8.6 shows this flow in full.
You could do it differently and use Auth0 to provide you with temporary AWS credentials and then use the AWS JavaScript SDK to upload the file. It’s a viable way of doing things (https://github.com/auth0-samples/auth0-s3-sample) but we wanted to write a Lambda function to show you how to generate a policy and upload using a simple POST request. Having a Lambda function will also give you more opportunities to do interesting things later on (such as logging attempts to request credentials, updating a database, and sending a notification to an administrator).
Here are the steps you need to carry out to get everything working:
The IAM user you’re going to create will have the permissions needed to upload files to S3. If you don’t give this user the right permissions, uploads will fail. Create a new IAM user in the IAM console as per normal (refer back to chapter 4 for more information on creating IAM users if you need help) and name the user upload-s3. Make sure to save the user’s access and secret keys in a secure place. You’ll need them later. From here follow these steps:
The only parameter that this Lambda function takes is the name of the file the user wants to upload. The output from this function is the following:
All of this information is needed to upload a file to S3.
Clone one of the other Lambda functions you’ve written previously on your computer and rename it to get-upload-policy. Update package.json as you see fit (you’ll have to update the ARN or the function name in the deployment script if you want to deploy the function from the terminal). Also, update dependencies to match the following listing. Remember to run npm install from the terminal to install these dependencies.
Having updated package.json, copy the next listing to index.js.
Create a new blank function in the AWS console (you can always refer to appendix B for information on how to create a Lambda function) and name it get-upload-policy. Assign the lambda-s3-execution-role role to the function. You should have that role from chapter 3. Deploy the function from your computer to AWS (you can run npm run deploy from the terminal, but make sure to set the right ARN in package.json).
Finally, you need to set the right environment variables for get-upload-policy to work. In the AWS console, open the get-upload-policy Lambda function and add four environment variables at the bottom. These variables should be your upload bucket (UPLOAD_BUCKET), the access key of the upload-s3 user you created (ACCESS_KEY), the secret access key of that user (SECRET_ACCESS_KEY), and the S3 upload URL (UPLOAD_URI). Figure 8.7 shows what this looks like.
It’s time to turn to the API Gateway. You need to create an endpoint that will invoke the Lambda function you just created:
You care about security, so you should enable your custom authorizer for this method (you might remember that a custom authorizer is a special Lambda function that’s called by the API Gateway to authorize the incoming request):
Finally, deploy the API Gateway (click Deploy API under Actions) to make your changes live. The AWS Lambda and API Gateway side of things are done, but there’s one more thing left to do in AWS. You need to update the upload bucket CORS configuration to make sure that POST uploads are allowed.
The default S3 CORS configuration won’t allow POST uploads to take place. That’s the default set by AWS. It’s easy to change, though. Click into your upload bucket and follow these steps:
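The configuration you end up with should resemble this sketch (in production you’d typically replace the wildcard origin with your site’s actual domain):

```xml
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
  </CORSRule>
</CORSConfiguration>
```

Without the POST method in an AllowedMethod element, the browser’s preflight check fails and the upload never starts.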
Now you can move on to your website.
You’re going to add a new file called upload-controller.js to the 24-Hour Video website. Create this file in the js folder and copy listing 8.7 to it. The purpose of this file is to do the following:
Open index.html and add the line <script src="js/upload-controller.js"></script> above <script src="js/config.js"></script> to include the new file in the website. Finally, below the line that says <div class="container" id="video-list--container">, copy the contents of the next listing, which contains HTML for an upload button and an upload progress bar.
Edit main.js to include uploadController.init(configConstants); under videoController.init(configConstants); and modify main.css in the css directory and include the contents of the following listing at the bottom of the file.
There’s one more step. You need to modify user-controller.js to do the following:
In user-controller.js make the following edits:
Start the web server by running npm start from the terminal (in the website’s directory). Open the website in a browser and log in. You should see a blue button appear in the middle of the page. Click this button and upload the file. If you have the web browser’s developer tools opened, you can inspect requests. You’ll see that first there is a request to /s3-policy-document followed by a multipart POST upload to S3 (figure 8.8).
You might notice something odd if you inspect the transcoded bucket at this time. The key of newly uploaded files will look like this: <guid>/file/<guid>/file.mp4 instead of <guid>/file.mp4. That’s a little bit puzzling until you look at the transcode-video Lambda function you implemented in chapter 3. This function sets an OutputPrefix, which prepends a prefix and is the cause of your problem. You needed an output prefix originally when you uploaded files directly to S3. Now you’re creating a prefix manually in the upload-policy Lambda function, so you don’t need to do it twice. Remove the line OutputKeyPrefix: outputKey + '/', from the transcode-video Lambda function and redeploy it. That will fix the annoyance.
So far you’ve made your transcoded video files public. But what if you want to secure these videos and make them available only to authenticated users? You might decide to charge for videos (bandwidth isn’t free!) where only users who have registered and paid have access to files. To implement such a system, you need to do two things:
To restrict public access to files, you need to remove the bucket policy you already have. In the transcoded bucket, do the following:
You may also remember that you have a set-permissions Lambda function that changes permissions on the video (you created that function in chapter 3). You can remove that Lambda function or, better yet, disconnect it from SNS (it’s invoked from the transcoded-video-notifications SNS topic). Remove the subscription from SNS now.
Furthermore, you’ll need to change the permission for each video to make sure that it can’t be publicly accessed. To do this you need to do the following:
If you try refreshing 24-Hour Video now, you’ll see that every request for a video will come back as Forbidden, with a 403 status code.
The second step is to generate presigned URLs to allow users to access videos without hitting the 403 status code. You’re going to modify the get-video-list function to generate these presigned URLs and then return them to the client. Having this capability allows you to add additional functionality to 24-Hour Video. For example, you can put the get-video-list function behind a custom authorizer and force users to log in before they can retrieve videos. And once you have a database, you can implement features like private videos, subscriptions, and lists. With presigned URLs, you can control who gets access to which videos and for how long.
Let’s first update the video-listing Lambda function to generate presigned URLs and return them. Replace the line urls.push(file); in the createList function with the code in the next listing.
Modify video-controller.js in 24-Hour Video and replace the line
clone.find('source').attr('src',baseUrl + '/' + bucket + '/' + video.filename);
with
clone.find('source').attr('src', video.filename);
Refresh the website (make sure the web server is running) to see the videos again. You may have noticed that now you’re passing back the full URL rather than an S3 key as before. It’s also important to keep in mind that the default expiration for presigned URLs is 15 minutes. After 15 minutes your users would have to refresh to get new URLs. You can control the expiration by adding an Expires property to params (it’s specified as an integer in seconds). The following would make the URL valid for 30 minutes:
var params = {Bucket: config.BUCKET, Key: file.Key, Expires: 1800}
In this chapter we covered useful S3 features and implemented video uploads for 24-Hour Video. Try to complete the following exercises to reinforce your knowledge:
In this chapter, we explored S3 and you added a new feature to 24-Hour Video. The S3 features we covered in section 8.1 are useful for managing files. You learned about the following:
You can use this knowledge to effectively manage your storage service. We also showed how to upload files directly from a user’s web browser and how to generate presigned URLs. In the next chapter, we’ll introduce Firebase. This real-time streaming database can be a powerful addition to your serverless application. You’ll also work to complete 24-Hour Video by adding this last piece of the puzzle.