To give you a thorough understanding of serverless architectures, you’re going to build a serverless application. Specifically, you’ll build a video-sharing website, a YouTube mini clone, which we’ll call 24-Hour Video. This application will have a website with user registration and authentication capabilities. Your users will be able to watch and upload videos. Any videos uploaded to the system will be transcoded to different resolutions and bitrates so that people on different connections and devices will be able to watch them. You’ll use a number of AWS services to build your application, including AWS Lambda, S3, Elastic Transcoder, SNS, and non-AWS services such as Auth0 and Firebase. In this chapter, we’ll focus on building your serverless pipeline for transcoding uploaded videos.
Before we jump into the nitty-gritty of the chapter, let's step ahead and look at what you're going to accomplish by the time you get to the final chapter. Figure 3.1 shows a 10,000-foot view of the major components you're going to develop. These include a transcoding pipeline, a website, and a custom API. At the end, you'll also have a full-fledged system with a database and a user system.
The website you’re going to build will look like figure 3.2. Videos uploaded by your users will be shown on the main page. Your users will be able to click any video and play it.
The overall purpose of building 24-Hour Video throughout the book is threefold:
Before you begin, however, you need to set up your machine, install the necessary tooling, and configure a few services in AWS. Details for that process are in appendix B, “Installation and setup.” Go through appendix B first, and then come back here to begin your adventure!
You’re going to build an important part of your system in this chapter: an event-driven pipeline that will take uploaded videos and encode them to different formats and bitrates. 24-Hour Video will be an event-driven, push-based system where the workflow to encode videos will be triggered automatically by an upload to an S3 bucket. Figure 3.3 shows the two main components you’re going to work on.
A quick note about AWS costs: most AWS services have a free tier. By following the 24-Hour Video example, you should stay within the free tier of most services. Elastic Transcoder, however, is likely to be the one that costs a little. Its free tier includes 20 minutes of SD output and 10 minutes of HD (720p or above) output per month (a minute refers to the length of the source video, not transcoder execution time). As usual, costs depend on the region where Elastic Transcoder is used. In the eastern part of the United States, for example, the price for 1 minute of HD output is $0.03, so a 10-minute source file costs 30 cents to encode. Elastic Transcoder pricing for other regions can be found at https://aws.amazon.com/elastictranscoder/pricing/.
The S3 free tier allows users to store 5 GB of data with standard storage, issue 20,000 GET requests and 2,000 PUT requests, and transfer 15 GB of data out each month. Lambda provides a free tier with 1M free requests and 400,000 GB-seconds of compute time. You should be well within the free tiers of those services with your basic system.
The following are the high-level requirements for 24-Hour Video:
To make things simpler to manage, you'll set up a build and deployment system using the Node Package Manager (npm). You'll want to do this as early as possible to have an automated process for testing, packaging Lambda functions, and deploying them to AWS. You will, however, temporarily set aside other development and operational aspects, such as versioning, and come back to them later.
To create your serverless back end, you’ll use several services provided by AWS. These include S3 for storage of files, Elastic Transcoder for video conversion, SNS for notifications, and Lambda for running custom code and orchestrating key parts of the system. Refer to appendix A for a short overview of these services. For the most part, you’ll use the following AWS services:
Figure 3.4 shows a detailed flow of the proposed approach. Note that the only point where a user needs to interact with the system is at the initial upload stage. This figure and the architecture may look complex, but we’ll break the system into manageable chunks and tackle them one by one.
Now that you’ve taken care of the setup and configuration details in appendix B, it’s time to write the first Lambda function. In the same directory as package.json, which you created during installation, create a new file named index.js and open it in your favorite text editor. This file will contain the first function. The important thing to know is that you must define a function handler, which will be invoked by the Lambda runtime. The handler takes three parameters—event, context, and callback—and is defined as follows:
exports.handler = function(event, context, callback){}
Your Lambda function will be invoked from S3 as soon as a new file is placed in a bucket. Information about the uploaded video will be passed to the Lambda function via the event object. It will include the bucket name and the key of the file being uploaded. This function will then prepare a job for the Elastic Transcoder; it will specify the input file and all possible outputs. Finally, it will submit the job and write a message to an Amazon CloudWatch Log stream. Figure 3.5 visualizes this part of the process.
Listing 3.1 shows this function’s implementation; copy it into index.js. Don’t forget to set PipelineId to the corresponding Elastic Transcoder pipeline you created earlier. You can find the Pipeline ID (figure 3.6) in the Elastic Transcoder console by clicking the magnifier button next to the pipeline you created in appendix B.
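If you want a feel for the shape of listing 3.1 before pulling it from the repository, here's a minimal sketch. Treat it as an approximation rather than the authoritative code: the pipeline ID is a placeholder, and the three outputs assume AWS's generic system presets for 1080p, 720p, and web-friendly 720p.

'use strict';

var AWS = require('aws-sdk');

// Assumption: your Elastic Transcoder pipeline lives in us-east-1.
var elasticTranscoder = new AWS.ElasticTranscoder({
    region: 'us-east-1'
});

exports.handler = function (event, context, callback) {
    // The S3 event contains the bucket name and key of the uploaded video.
    var key = event.Records[0].s3.object.key;

    // S3 URL-encodes keys, turning spaces into plus signs.
    var sourceKey = decodeURIComponent(key.replace(/\+/g, ' '));

    // Strip the file extension so all outputs share a base name.
    var outputKey = sourceKey.split('.')[0];

    var params = {
        PipelineId: '1234567890123-abcdef',  // placeholder: your pipeline ID from figure 3.6
        OutputKeyPrefix: outputKey + '/',    // group outputs into a folder per video
        Input: {
            Key: sourceKey
        },
        Outputs: [
            {
                Key: outputKey + '-1080p.mp4',
                PresetId: '1351620000001-000001' // generic 1080p system preset
            },
            {
                Key: outputKey + '-720p.mp4',
                PresetId: '1351620000001-000010' // generic 720p system preset
            },
            {
                Key: outputKey + '-web-720p.mp4',
                PresetId: '1351620000001-100070' // web-friendly 720p system preset
            }
        ]
    };

    // Submit the job and log the outcome to CloudWatch.
    elasticTranscoder.createJob(params, function (error, data) {
        if (error) {
            callback(error);
        } else {
            console.log('Elastic Transcoder job created:', data.Job.Id);
            callback(null, 'Job submitted');
        }
    });
};

The OutputKeyPrefix setting groups the transcoded files into a per-video folder, which matches the folder you'll see later in figure 3.12.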
Our GitHub repository at https://github.com/sbarski/serverless-architectures-aws has all the code snippets and listings you need for this book. So you don’t have to manually type anything out—unless you really want to.
You can name the file containing your Lambda function something other than index.js. If you do that, you’ll have to modify the handler value in Lambda’s configuration panel in AWS to reflect the new name of the file. For example, if you decide to name your file TranscodeVideo.js rather than index.js, you’ll have to modify the handler to be TranscodeVideo.handler in the AWS console (figure 3.7).
Having copied the function from listing 3.1 into index.js, you can think about how to test it locally on your machine. One way to do that is to simulate events and have the function react to them. This means you have to invoke the function and pass three parameters representing the event, context, and callback objects. The function will execute as if it were running in Lambda, and you'll see a result without having to deploy it.
You can run Lambda functions locally using an npm module called run-local-lambda. To install this module, execute the following command from a terminal window (make sure you're in the function's directory): npm install run-local-lambda --save-dev.
This module allows you to invoke your Lambda function, but it doesn't emulate Lambda's environment. It doesn't enforce the memory or CPU limits, provide the ephemeral local disk storage, or replicate the operating system of the real Lambda runtime in AWS.
Modify package.json, as in the next listing, to change the test script. The test script will invoke the function and pass the contents of event.json, a file you’re about to create, as the event object. For more information about this npm module, including additional parameters and examples, see https://www.npmjs.com/package/run-local-lambda.
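For reference, the relevant part of package.json ends up looking something like this sketch; confirm the exact flags against the npm page linked above for the version you installed:

{
  "scripts": {
    "test": "run-local-lambda --file index.js --event tests/event.json --timeout 3"
  }
}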
The test script requires an event.json file to function. This file must contain the specification of the event object that run-local-lambda will pass in to the Lambda function. In the same directory as index.js, create a subdirectory called tests and then create a file called event.json in it. Copy the next listing into event.json and save it.
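The listing is short: it only has to mimic the parts of a real S3 event that the function reads, namely the bucket name and object key. A sample along these lines should work (the bucket name here is a placeholder for your own upload bucket):

{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "serverless-video-upload"
        },
        "object": {
          "key": "my birthday video.mp4"
        }
      }
    }
  ]
}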
To execute the test, run npm test from a terminal window in the directory of the function. If it works, you should see the values of key, sourceKey, and outputKey print to the terminal.
Having run the test script, you might see an error message with an AccessDeniedException. That's normal, because your user lambda-upload doesn't have permissions to create new Elastic Transcoder jobs. Once uploaded to AWS, your function will run correctly because it will assume the Identity and Access Management (IAM) role defined in appendix B. One of the exercises at the end of this chapter will be to add a policy to the IAM user (lambda-upload) so that you can create Elastic Transcoder jobs from your local system.
You're now ready to deploy the function to AWS. To do that, you need to modify package.json to create predeploy and deploy scripts. The predeploy script creates a zip file of the function. The deploy script then deploys the zip file to AWS. Note that if you're a Windows user, you won't have the zip utility, which is needed by the predeploy script, installed by default. Please refer to appendix B and the sidebar "Zip and Windows" for further information. Update package.json to include deploy and predeploy scripts, as shown in the following listing.
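In sketch form, and assuming your function in AWS is named transcode-video, the two scripts look roughly like this (the zip exclusions keep old archives, JSON files, and logs out of the package):

{
  "scripts": {
    "predeploy": "zip -r Lambda-Deployment.zip * -x *.zip *.json *.log",
    "deploy": "aws lambda update-function-code --function-name transcode-video --zip-file fileb://Lambda-Deployment.zip"
  }
}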
For deployment to work, the --function-name parameter must match the name or the ARN of the function. If you wish to use the ARN, follow these steps:
Having updated the ARN value in the deploy script, execute npm run deploy from the terminal. This will zip up the function and deploy it to AWS. If the deployment was successful, you’ll see the current function configuration, including timeout and memory size, printed to the terminal (chapter 6 goes into more detail on function configuration options and what all of this represents).
The last step before you can test the function in AWS is to connect S3 to Lambda. You need to configure S3 to raise an event and invoke a Lambda function whenever a new file is added to the upload bucket (figure 3.9).
To configure S3, follow these steps:
If this is your first time connecting S3 to Lambda, you may see a permissions error. If that happens, you’ll need to use Lambda’s console to set up the event instead:
To test the function in AWS, upload a video to the upload bucket. Follow these steps:
After a time, you should see three new videos in the transcoded videos bucket. These files should appear in a folder rather than in the root of the bucket (figure 3.12).
Having performed a test in the previous section, you should see three new files in the transcoded videos bucket. But things may not always go as smoothly. In case of problems, such as new files not appearing, you can check two logs for errors. The first is a Lambda log in CloudWatch. To see the log, do the following:
The latest log stream should be at the top, but if it’s not, you can sort log streams by date by clicking the Last Event Time column header. If you click into a log stream, you’ll see log entries with more detail. Often, if you make an error, these logs will reveal what happened. See chapter 4 for more information about CloudWatch and logging.
If Lambda logs reveal nothing out of the ordinary, take a look at the Elastic Transcoder logs:
The next part of the job is to connect Simple Notification Service to your transcoded videos bucket. After Elastic Transcoder saves a new file to this bucket, you need to send an email and invoke two other Lambda functions to make the new file publicly accessible and to create a JSON file with metadata.
You'll create an SNS topic and three subscriptions. One subscription will be used for email, and the other two will trigger Lambda functions (you're implementing the fan-out pattern described in chapter 2). The transcoded videos bucket will automatically create an event notification as soon as a new video appears and push it to the SNS topic to kick-start this part of the workflow. Figure 3.15 displays this part of the system, with the SNS topic in the middle and three subscribers consuming new notifications.
Create a new SNS topic by clicking SNS in the AWS console and then selecting Create Topic. Give your topic a name such as transcoded-video-notifications.
You need to connect S3 to SNS so that when a new object is added to the transcoded videos bucket, an event is pushed to SNS. To achieve this, the SNS security policy must be modified to allow communication with S3:
Figure 3.16 shows what the updated policy looks like. Make sure to modify the SourceArn to reflect the name of your bucket. It should be in the following form: arn:aws:s3:*:*:<your bucket name>.
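In outline, listing 3.5 is a standard SNS access policy along the following lines; the region, account ID, topic name, and bucket name shown here are placeholders for your own values:

{
  "Version": "2008-10-17",
  "Id": "sns-topic-policy",
  "Statement": [
    {
      "Sid": "allow-s3-to-publish",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "SNS:Publish",
      "Resource": "arn:aws:sns:us-east-1:123456789012:transcoded-video-notifications",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:s3:*:*:serverless-video-transcoded"
        }
      }
    }
  ]
}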
If you get an error message such as, “Permissions on the destination topic do not allow S3 to publish notifications from this bucket” when trying to save, double-check that you copied listing 3.5 correctly. If you get stuck, have a look at http://amzn.to/1pgkl4X for more helpful information.
One of your requirements is to get an email about each transcoded file. You have an SNS topic that receives events from an S3 bucket whenever a new transcoded file is saved in it. You need to create a new email subscription for the topic so that you can begin receiving emails. In the SNS console, follow these steps:
SNS will immediately send a confirmation email; you must confirm the subscription before you'll receive further notifications. Going forward, you'll receive an email whenever a file is added to the bucket.
To test if SNS is working, upload a video file to the upload bucket. You can also rename an existing file in the bucket to trigger the workflow. You should receive an email for each transcoded file.
The second Lambda function you create will make your newly transcoded files publicly accessible. Figure 3.18 shows this part of the workflow. In chapter 8, we’ll look at securing access to files using signed URLs, but for now your transcoded videos will be available for everyone to play and download.
First, create the second Lambda function in AWS the way you created the first one. This time, though, name your function set-permissions. You can follow the instructions in appendix B again. Then, on your system, create a copy of the directory containing the first Lambda function. You'll use this copy as a basis for the second function. Open package.json and change all occurrences of transcode-video to set-permissions. Also, change the ARN in the deploy script to reflect the ARN of the new function created in AWS.
In the second Lambda function, you’ll need to perform two tasks:
The next listing shows a reference implementation for the second function. Copy it to index.js, replacing anything that’s already there.
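In broad strokes, the function parses the original S3 event out of the SNS message and then calls putObjectAcl to make the object publicly readable. The following sketch captures that flow; the repository's listing remains the reference version:

'use strict';

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

exports.handler = function (event, context, callback) {
    // SNS delivers the original S3 event as a JSON string in the message body.
    var message = JSON.parse(event.Records[0].Sns.Message);

    var bucket = message.Records[0].s3.bucket.name;
    var key = decodeURIComponent(
        message.Records[0].s3.object.key.replace(/\+/g, ' ')
    );

    var params = {
        Bucket: bucket,
        Key: key,
        ACL: 'public-read' // make the transcoded file readable by everyone
    };

    s3.putObjectAcl(params, function (error) {
        if (error) {
            callback(error);
        } else {
            callback(null, 'ACL set to public-read for ' + key);
        }
    });
};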
Having copied over the second Lambda function to index.js, perform a deployment using npm run deploy. Finally, you need to connect Lambda to SNS:
There’s still one more security issue: the role under which the Lambda function executes has permissions only to download or upload new objects to the bucket. But this role doesn’t have permission to change the object ACL. You can fix this by creating a new inline policy for the role (lambda-s3-execution-role) you’ve been using:
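The inline policy itself needs to grant just the ACL actions on the transcoded videos bucket, roughly as follows (the bucket name is a placeholder for your own):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObjectAcl",
        "s3:PutObjectAcl"
      ],
      "Resource": "arn:aws:s3:::serverless-video-transcoded/*"
    }
  ]
}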
In a production environment, you should create separate roles for your Lambda functions, especially if they’ll use different resources and require different permissions.
Having configured the role permissions, you can test the second Lambda function by uploading or renaming a video in the upload bucket. To see if the function worked, find any newly created file in the transcoded videos bucket, select it, and click Permissions. You should see the second Grantee setting configured for Everyone with the Open/Download check box selected (figure 3.19). You can now copy the URL shown just above on that same page and share it with others.
If something goes wrong with the Lambda function, look at CloudWatch logs for the function. They might reveal clues as to what happened.
The third Lambda function needs to create a JSON file with metadata about the video. It should also save the metadata file next to the video. This Lambda function will be invoked via SNS, just like the one before it. The challenge in this function is figuring out how to analyze the video and extract the required metadata.
FFmpeg is a command-line utility that records and converts video and audio. It has several components, including the excellent FFprobe, which can be used to extract media information. You’re going to use FFprobe to extract metadata and then save it to a file. This section is slightly more advanced than other sections, but it’s also optional. You’ll learn a lot by working through it, but you can skip it without affecting what you do in other chapters.
There are two ways to acquire FFprobe. The first is to spin up an EC2 instance running Amazon Linux, grab the FFmpeg source code, and build FFprobe yourself. If you do that, you'll need to create a static build of the utility. The second is to find a static build of FFmpeg for Linux (for example, https://www.johnvansickle.com/ffmpeg/) from a reputable source or distribution. If you decide to compile your own binaries, per the article "Running Arbitrary Executables in AWS Lambda" (http://amzn.to/29yhvpD), ensure that they're either statically linked or built for the matching version of Amazon Linux. The current version of Amazon Linux in use within AWS Lambda can always be found on the Supported Versions page (http://amzn.to/29w0c6W) of the Lambda docs.
Having acquired a static copy of FFprobe, create the third Lambda function in the AWS console, and name it extract-metadata. Set the role for this function to lambda-s3-execution-role, the timeout to 2 minutes, and the memory to 256 MB. You can reduce the memory allocation and timeout at a later stage when everything works. On your system, copy the second function and its associated files into a new directory to create the third function. Open package.json and change all occurrences of the old function name (set-permissions) to the new one (extract-metadata). Make sure to update the ARN in package.json as well, so that it reflects the ARN of the new function.
In the function directory, create a new subdirectory called bin. Copy your statically built version of FFprobe into it. You’ll be pushing Lambda to the max with this function, so make sure to include only FFprobe and not the other components. The maximum deployment package size for Lambda is 50 MB, so including too many unnecessary files may cause your deployment to fail.
The third Lambda function works by copying the video from S3 to the /tmp directory on its local filesystem. It then executes FFprobe and collects the required information. Finally, it creates a JSON file with the required data and saves it in the bucket next to the video (figure 3.20). Lambda provides a maximum of 512 MB of ephemeral disk space in /tmp, so this function won't work if your videos are larger than that.
Listing 3.7 shows an implementation of the third Lambda function. Replace the contents of index.js with the code in the listing. Once you’ve finished, deploy the third function to AWS.
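Abbreviated, the flow of listing 3.7 looks like the sketch below: copy the video to /tmp, shell out to the bundled FFprobe, and upload the JSON output next to the video. Error handling and the FFprobe flags are simplified here, so use the repository's listing for the real thing:

'use strict';

var AWS = require('aws-sdk');
var exec = require('child_process').exec;
var fs = require('fs');

var s3 = new AWS.S3();

exports.handler = function (event, context, callback) {
    // The SNS message wraps the S3 event for the transcoded file.
    var message = JSON.parse(event.Records[0].Sns.Message);
    var bucket = message.Records[0].s3.bucket.name;
    var key = decodeURIComponent(
        message.Records[0].s3.object.key.replace(/\+/g, ' ')
    );

    var localFile = '/tmp/video.mp4'; // Lambda's only writable directory

    // 1. Stream the video from S3 to local disk.
    var writeStream = fs.createWriteStream(localFile);
    s3.getObject({ Bucket: bucket, Key: key })
        .createReadStream()
        .on('error', callback)
        .pipe(writeStream);

    writeStream.on('finish', function () {
        // 2. Run FFprobe (deployed in bin/; see the note on file permissions)
        //    and capture its JSON report of the container format.
        var cmd = 'bin/ffprobe -v quiet -print_format json -show_format ' + localFile;
        exec(cmd, function (error, stdout) {
            if (error) {
                return callback(error);
            }

            // 3. Save the metadata next to the video, swapping .mp4 for .json.
            s3.putObject({
                Bucket: bucket,
                Key: key.split('.')[0] + '.json',
                Body: stdout,
                ContentType: 'application/json'
            }, callback);
        });
    });
};

Note that bin/ffprobe is invoked with a relative path because Lambda runs your function from the root directory of the deployment package.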
Any script or program you wish to execute in Lambda must have the right (executable) file permissions. Unfortunately, you can’t change file permissions directly in Lambda, so it must be done on your computer before the function is deployed. If you use Linux or Mac, it’s easy. Run chmod +x bin/ffprobe from a terminal command line (you must be in the Lambda function’s directory). You can then deploy the function, and FFprobe will work. If you’re on Windows, it’s trickier because it doesn’t come with the chmod command. One way you can solve this problem is by spinning up an Amazon Linux machine in AWS, copying FFprobe over, changing permissions, and then copying the file back.
You may notice that the function in listing 3.7 has many callbacks. Having numerous callbacks in a function that essentially carries out sequential operations makes it harder to read and understand. Chapter 6 introduces a pattern called async waterfall that makes composition of asynchronous operations easier to manage.
The third Lambda function needs to subscribe to the SNS topic. Create a new subscription for it just as you did for the second Lambda function:
Deploy the third function to AWS, and you’re now ready to run the whole process end to end. Upload a video to the upload bucket; you should see JSON files created and placed next to the video files in the transcoded videos bucket (figure 3.21).
You might also see a few errors in CloudWatch if you didn’t set an mp4 suffix in the S3 event configuration back in section 3.2.1. If you didn’t set the suffix, your workflow will trigger automatically whenever any new object is saved to the transcoded videos bucket. When a JSON file is saved, the workflow runs again, except the extract-metadata function doesn’t know how to deal with a JSON file, which causes an error.
To fix this problem, S3 needs to create notifications only for objects whose keys end with mp4, so that other types of files, including JSON, don't trigger the workflow:
Of course, if you did this back in section 3.2.1, you don’t need to do it again.
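If you prefer the command line, the same filter can be expressed as a notification configuration and applied with aws s3api put-bucket-notification-configuration; here's a sketch with a placeholder topic ARN:

{
  "TopicConfigurations": [
    {
      "TopicArn": "arn:aws:sns:us-east-1:123456789012:transcoded-video-notifications",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "suffix",
              "Value": "mp4"
            }
          ]
        }
      }
    }
  ]
}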
At the moment, 24-Hour Video is functional, but it has a number of limitations that have been left for you to solve as exercises. See if you can implement solutions for the following problems:
In this chapter, we covered the basics of creating a serverless back end, including the following:
In the next chapter, we'll look at AWS security, logging, alerting, and billing in more detail. This information is important for creating a secure serverless architecture, knowing where to look for answers when things go wrong, and avoiding unexpected and unwelcome surprises on the monthly bill.