Revision

Let's have a quick recap to see why S3 is such a good fit for storing objects in a serverless application. First, why am I talking about objects in a storage system instead of files? S3 is an object storage system, as opposed to block storage with a filesystem layered on top. Each piece of data is considered an object, and objects are stored with their metadata in a flat pool of storage. This is distinct from a filesystem, which arranges files and folders in a hierarchy. Because of this flat design, an object storage system has an advantage when it comes to scaling capability and data retrieval performance.

This flat design is fine in theory but, in practice, we do need to group objects into areas for easier management and organization of data. For this, we can use S3 object prefixes, which look much like folders. Examples of object prefixes are raw-data/ and processed/, which come at the start of the object's key, before the rest of the name. Objects with the same prefix are grouped into the same area. From an application development point of view, we don't need to know about the nuances of supporting and interfacing with a filesystem. S3 has a public API that we can use to perform management tasks as well as to store and retrieve objects.
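To make the idea of prefixes in a flat keyspace more concrete, here is a minimal sketch in plain Python. It does not talk to AWS at all: the bucket is modeled as a dictionary, and the key names, the list_keys helper, and the top_level_groups helper are all made up for illustration. It only shows that "folders" are nothing more than a shared leading part of the key.

```python
# A toy model of a flat object store: one dict, keys are plain strings.
# Nothing here calls AWS; the contents are hypothetical.
bucket = {
    "raw-data/2020-01-01.csv": b"...",
    "raw-data/2020-01-02.csv": b"...",
    "processed/2020-01-01.parquet": b"...",
    "readme.txt": b"...",
}


def list_keys(store, prefix=""):
    """Return the keys that start with the given prefix, in sorted order."""
    return sorted(key for key in store if key.startswith(prefix))


def top_level_groups(store, delimiter="/"):
    """Return the distinct folder-like prefixes, for example 'raw-data/'."""
    groups = set()
    for key in store:
        head, sep, _ = key.partition(delimiter)
        if sep:  # the key contains the delimiter, so it has a prefix
            groups.add(head + sep)
    return sorted(groups)


print(list_keys(bucket, "raw-data/"))
print(top_level_groups(bucket))
```

Listing by prefix is just a string match over the keys, which is why there is no directory tree to create or maintain. The real S3 list operations accept a prefix and a delimiter and behave along similar lines.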

S3 comes with all the benefits of a serverless service. There is no need to run your own storage clusters or worry about how much redundancy you're building in. Its capacity is limitless. Here's a random fact for you: Amazon actually orders their storage in kilograms, not data volume. Bandwidth is high and there is no aggregated throughput limit, so there is no need to worry about S3 being a performance bottleneck in your account.

As an added benefit, you can also enable versioning on a bucket in order to protect its objects from accidental overwrites and deletions. S3 integrates with AWS Key Management Service so that you can protect the objects in your bucket with strong encryption.
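As a rough sketch of what this could look like from code, the two helpers below wrap the put_bucket_versioning and put_object calls of the boto3 S3 client. The helper names, the bucket name, and the key ID are assumptions for illustration. The functions take the client as an argument, so nothing here contacts AWS until you pass in a real client.

```python
def enable_versioning(s3_client, bucket_name):
    """Turn on versioning for a bucket so overwrites keep earlier versions."""
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )


def put_encrypted_object(s3_client, bucket_name, key, body, kms_key_id=None):
    """Upload an object using server-side encryption with AWS KMS."""
    params = {
        "Bucket": bucket_name,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",
    }
    # Only name a specific KMS key if the caller provided one (hypothetical ID).
    if kms_key_id is not None:
        params["SSEKMSKeyId"] = kms_key_id
    return s3_client.put_object(**params)
```

With boto3 installed and credentials configured, you would create the client with boto3.client("s3") and call, for example, enable_versioning(s3, "my-example-bucket"), where the bucket name is a placeholder.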

Finally, to give you flexibility in the solutions you build, S3 has several storage classes that have different characteristics:

Standard: Objects are replicated across a minimum of three availability zones within a region.
Standard with infrequent access: Lower price point than standard for a lower availability SLA.
One zone with infrequent access: Data stored in a single availability zone for lower availability and durability, at a lower cost.
Glacier: Designed for long-term storage for data archiving and backups.
Glacier deep archive: Even lower-cost storage designed to archive data for years.
Intelligent tiering: Automatically moves your data between the storage classes to achieve a price that is optimized for your data access patterns.
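When uploading through the API, the storage class is chosen per object. The sketch below maps the names in the list above to the StorageClass values that the S3 API accepts, and builds the arguments for an upload. The put_object_params helper and the example bucket and key are hypothetical; the dictionary it returns is meant to be passed to the boto3 put_object call.

```python
# Names from the list above mapped to the values used by the S3 API.
STORAGE_CLASSES = {
    "Standard": "STANDARD",
    "Standard with infrequent access": "STANDARD_IA",
    "One zone with infrequent access": "ONEZONE_IA",
    "Glacier": "GLACIER",
    "Glacier deep archive": "DEEP_ARCHIVE",
    "Intelligent tiering": "INTELLIGENT_TIERING",
}


def put_object_params(bucket_name, key, body, storage_class="Standard"):
    """Build keyword arguments for an upload in the chosen storage class."""
    if storage_class not in STORAGE_CLASSES:
        raise ValueError(f"Unknown storage class: {storage_class}")
    return {
        "Bucket": bucket_name,
        "Key": key,
        "Body": body,
        "StorageClass": STORAGE_CLASSES[storage_class],
    }


params = put_object_params(
    "my-example-bucket", "processed/report.csv", b"a,b\n1,2\n", "Glacier"
)
print(params["StorageClass"])
```

Given a boto3 S3 client named s3, the upload itself would be s3.put_object(**params).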

Now that we are up to speed with the S3 service, let's move on and learn how we can use it as an event source for Lambda.
