Amazon Simple Storage Service (S3) is a highly reliable web service that allows you to securely store and retrieve object data in the AWS cloud. Amazon S3 is one of the most commonly used AWS services, alongside Amazon EC2. Amazon S3 automatically stores your data redundantly across multiple devices and Availability Zones within a region.
Amazon S3 is an object-based storage service, not a block-based one. It is ideal for storing files, but it cannot be used to install an operating system; thus, it cannot provide the block storage for an EC2 instance.
Data within Amazon S3 is stored using a key-value system, with each key unique within its bucket. There is no limit to how much data you can store on Amazon S3; however, a single object cannot exceed 5 TB.
In this section, you learn some of the key concepts you will encounter when working with Amazon S3.
A bucket is a container on Amazon S3 in which you store your files (objects). Bucket names are globally unique; therefore, no two users can own a bucket with the same name. Amazon S3 does not internally implement a hierarchical file system similar to what you encounter on your computer's operating system; the objects within a bucket are stored in a flat namespace. However, object keys can contain the forward slash delimiter character (/). Therefore, you can name your objects in such a way as to create the appearance of a nested folder structure.
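To make the "appearance of folders" idea concrete, here is a small Python sketch that makes no AWS calls. It takes a flat list of object keys and, given a prefix and the / delimiter, separates the keys that sit directly "inside" that prefix from the keys that belong to deeper "subfolders." This is roughly how folder-style listings are derived from a flat namespace. The `list_keys` helper and the sample keys are hypothetical.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) for the keys that start with `prefix`.

    Keys whose remainder after `prefix` still contains `delimiter` are rolled
    up into a single folder-like common prefix instead of being listed.
    """
    objects = []
    common_prefixes = set()
    for key in sorted(keys):  # S3 lists keys in lexicographic (UTF-8 binary) order
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter acts as a "folder"
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = [
    "photos/2017/sunset.jpg",
    "photos/2017/beach.jpg",
    "photos/cover.jpg",
    "readme.txt",
]
print(list_keys(keys))             # (['readme.txt'], ['photos/'])
print(list_keys(keys, "photos/"))  # (['photos/cover.jpg'], ['photos/2017/'])
```

The bucket itself never stores a "photos" folder. The folder exists only because several keys share the photos/ prefix.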
For each bucket you create, you can set up permissions that control who can access the bucket and what they can do with the bucket. Each object you store in an Amazon S3 bucket has an object key and metadata associated with it.
An object key is a sequence of UTF-8 characters that identifies an object in an Amazon S3 bucket. The key is assigned to the object when it is first uploaded into an Amazon S3 bucket and can be up to 1024 bytes long.
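Because the 1024 limit is measured in bytes of the UTF-8 encoding, a key that uses non-ASCII characters can hit the limit with far fewer than 1024 characters. The following minimal Python check illustrates this. The helper name `is_valid_key_length` is hypothetical.

```python
MAX_KEY_BYTES = 1024  # limit on the UTF-8 encoded length of an object key


def is_valid_key_length(key: str) -> bool:
    """Check the key length in UTF-8 bytes, not in characters.

    An empty string is rejected because an object key cannot be empty.
    """
    return 0 < len(key.encode("utf-8")) <= MAX_KEY_BYTES


print(is_valid_key_length("sunset.jpg"))      # True
print(is_valid_key_length("\u00e9" * 512))   # True: 512 characters, 1024 bytes
print(is_valid_key_length("\u00e9" * 513))   # False: 513 characters, 1026 bytes
```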
The key name is essentially the name of the file you have uploaded to the bucket. Amazon S3 internally indexes key names in alphabetical (lexicographic) order, which means objects with similar names tend to be stored close to each other on the same storage partitions. This can be an important factor to consider if the files you plan to store in Amazon S3 follow a sequential naming scheme or share a common prefix. If this is the case, you could encounter performance bottlenecks when reading the data out of Amazon S3; you may want to consider naming the files differently, for example by adding a short random string or hash to the start of each filename.
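The following Python sketch shows one way to add a short prefix to the start of each filename. It assumes you want a prefix you can recompute later, so it uses the first few hex characters of a SHA-256 hash instead of a truly random string. The helper name `prefixed_key` and the 4-character prefix length are illustrative choices.

```python
import hashlib


def prefixed_key(filename: str, prefix_len: int = 4) -> str:
    """Prepend a short hash of the filename so that sequential names spread out."""
    # Unlike a timestamp, a hash scatters similar names across the key space.
    # Because it is deterministic, the key can be recomputed from the filename.
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{filename}"


for name in ["log-2017-01-15-0001.txt", "log-2017-01-15-0002.txt"]:
    print(prefixed_key(name))
```

One trade-off to keep in mind: once the prefix is a hash, you can no longer list all files whose names start with "log-2017" using a single prefix.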
The object value is the data that you are storing. It is a sequence of bytes and can be up to 5 TB in length.
The version ID is a string value that identifies a particular version of an object. Objects uploaded to a bucket that does not have versioning enabled have a version ID of null. Once versioning is enabled on the bucket, Amazon S3 assigns a unique version ID to each object you upload, and every update to an object creates a new version with a new version ID. Together, the object key and the version ID uniquely identify an object within a bucket.
Each object in Amazon S3 has a storage class associated with it. The storage class determines how Amazon S3 stores the data for the object and whether you will be charged additional costs to read the data. Amazon S3 offers the following storage classes:
Amazon charges you for the following aspects when you use Amazon S3. The specific costs differ between regions.
https://aws.amazon.com/about-aws/whats-new/2016/11/revolutionizing-s3-storage-management-with-4-new-features/
You can visit the following site to get an idea of the difference in access times with and without Amazon S3 Transfer Acceleration:
http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
For an up-to-date list of charges, visit https://aws.amazon.com/s3/pricing/.
Every bucket and object in Amazon S3 has a set of subordinate resources associated with it, called subresources. Subresources cannot exist on their own; they are always associated with a bucket or an object. When this chapter was written, two subresources were associated with Amazon S3 objects:
.torrent file associated with the specific resource.
Two kinds of metadata are associated with each object in Amazon S3: system-defined and user-defined.
As the name suggests, system-defined metadata is automatically maintained by Amazon S3 and includes information such as the object creation date, object size, and more. Not all system-defined metadata fields can be edited by users. Table 9.1 lists the system-defined metadata fields associated with an object.
TABLE 9.1: Amazon S3 System-Defined Metadata
NAME | DESCRIPTION | USER EDITABLE |
Date | Current date and time. | No |
Content-Length | Size of the object in bytes. | No |
Last-Modified | Date when the object was last modified (or created if the object has never been modified). | No |
Content-MD5 | MD5 hash of the object. | No |
x-amz-server-side-encryption | Indicates whether server-side encryption is enabled for the object and which service is providing the encryption. | Yes |
x-amz-version-id | The version ID of the object; only applicable to objects in buckets that have versioning enabled. | No |
x-amz-delete-marker | Only applicable to objects in buckets that have versioning enabled; indicates whether the object is a delete marker. | No |
x-amz-storage-class | Storage class used for storing the object. | Yes |
x-amz-website-redirect-location | If configured, allows you to redirect requests for the object to another object or external URL. | Yes |
x-amz-server-side-encryption-aws-kms-key-id | Applicable only if server-side encryption is enabled on the object. Contains the ID of the encryption key that encrypted the object. | Yes |
x-amz-server-side-encryption-customer-algorithm | Indicates if server-side encryption is enabled on the object using customer-provided keys. | Yes |
User-defined metadata is any additional key-value metadata that you provide when you upload the object.
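User-defined metadata travels with the upload request as HTTP headers whose names begin with x-amz-meta-, and Amazon S3 stores these names in lowercase. The Python sketch below builds such headers from a plain dictionary and extracts them again from a set of response headers. The helper names and the sample metadata are hypothetical, and no request is actually sent.

```python
USER_META_PREFIX = "x-amz-meta-"


def to_metadata_headers(metadata: dict) -> dict:
    """Turn {name: value} pairs into x-amz-meta-* request headers."""
    # S3 stores user-defined metadata names in lowercase, so normalize them here
    return {USER_META_PREFIX + name.lower(): value for name, value in metadata.items()}


def from_metadata_headers(headers: dict) -> dict:
    """Recover user-defined metadata from response headers, ignoring system headers."""
    return {
        name.lower()[len(USER_META_PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(USER_META_PREFIX)
    }


headers = to_metadata_headers({"Project": "awsbook", "Reviewed": "yes"})
print(headers)  # {'x-amz-meta-project': 'awsbook', 'x-amz-meta-reviewed': 'yes'}
print(from_metadata_headers({**headers, "Content-Length": "1024"}))
# {'project': 'awsbook', 'reviewed': 'yes'}
```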
In this section, you learn to use the AWS Management Console to create Amazon S3 buckets and manage their contents. Log in to the AWS Management Console using your IAM user-specific sign-in link and navigate to the Amazon S3 service home page (Figure 9.1).
To create a new Amazon S3 bucket, follow these steps.
If you have existing buckets in your Amazon S3 account, you will be presented with a page that lists them (Figure 9.3).
In this section, the bucket being created is named com.asmtechnology.samplebucket and is located in the EU (Ireland) region. The name you choose for your bucket must be globally unique, and prefixing it with a reverse domain name is a common way to ensure uniqueness.
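Besides being globally unique, a bucket name must follow the Amazon S3 naming rules. The Python sketch below checks a subset of those rules: 3 to 63 characters; only lowercase letters, digits, periods, and hyphens; starts and ends with a letter or digit; no adjacent periods; and not formatted like an IP address. It does not check uniqueness, which only AWS can confirm. AWS may also enforce additional rules, so treat this as a quick pre-check. The helper name `is_valid_bucket_name` is hypothetical.

```python
import ipaddress
import re

# 3-63 characters: a letter/digit, then 1-61 of [a-z0-9.-], then a letter/digit
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the Amazon S3 bucket naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:  # adjacent periods are not allowed
        return False
    try:
        ipaddress.IPv4Address(name)  # names formatted like an IP address are rejected
        return False
    except ValueError:
        return True


for candidate in ["com.asmtechnology.samplebucket", "My_Bucket", "192.168.5.4"]:
    print(candidate, is_valid_bucket_name(candidate))
```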
Access to Amazon S3 resources is controlled using resource-based policies. A resource-based policy is a JSON document that describes which principals (such as IAM users) have access to a resource, and what actions they can perform on it. Permissions on a bucket and permissions on the objects within it are independent: objects do not inherit the permissions granted on the bucket itself.
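To see why permissions on a bucket and on its objects are independent, it helps to look at what a bucket policy contains. The Python sketch below builds one as a dictionary and prints it as JSON. The account ID, the user name, and the choice of actions are hypothetical. The policy has two statements: s3:ListBucket is granted on the bucket ARN, while s3:GetObject is granted on bucket/*, the objects inside it. Granting only the first would let the user list keys but not read them.

```python
import json


def read_only_policy(bucket: str, user_arn: str) -> dict:
    """Build a bucket policy that lets one IAM user list the bucket and read its objects."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # GetObject is an object-level action, so it targets bucket/* (the objects)
                "Sid": "AllowRead",
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


policy = read_only_policy(
    "com.asmtechnology.samplebucket",
    "arn:aws:iam::123456789012:user/example-user",  # hypothetical account ID and user
)
print(json.dumps(policy, indent=2))
```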
Each bucket also has an XML document associated with it, called an access control list (ACL). The ACL is used to control access to the bucket from other AWS accounts and from the general public.
It is best to leave the default options on this screen unchanged and change them later if needed. Click Next to proceed.
If you have one or more buckets, this screen becomes the home screen presented to you whenever you visit the Amazon S3 console.
Complete these steps to upload an object to an existing bucket.
Once the file has finished uploading, it appears in your bucket (Figure 9.14).
To download an object from your Amazon S3 bucket onto your computer, follow these steps:
If you do not want to use the management console, you can also access any object in Amazon S3 using a URL.
The value within the Object URL field is a URL in the following format:
https://s3.<region name>.amazonaws.com/<bucket name>/<file name>
For example, a file called sunset.jpg, in a bucket called com.asmtechnology.awsbook.testbucket1, in the eu-west-2 region can be accessed using the following URL:
https://s3.eu-west-2.amazonaws.com/com.asmtechnology.awsbook.testbucket1/sunset.jpg
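The Python sketch below builds object URLs in the format shown above. The helper name `object_url` is hypothetical. It assumes the key should be percent-encoded, so that characters such as spaces are safe in a URL, and that / should be left alone so folder-like keys stay readable.

```python
from urllib.parse import quote


def object_url(region: str, bucket: str, key: str) -> str:
    """Build an object URL of the form https://s3.<region>.amazonaws.com/<bucket>/<key>."""
    # Percent-encode the key (spaces, '#', and so on) but keep '/' as-is
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key, safe='/')}"


print(object_url("eu-west-2", "com.asmtechnology.awsbook.testbucket1", "sunset.jpg"))
# https://s3.eu-west-2.amazonaws.com/com.asmtechnology.awsbook.testbucket1/sunset.jpg
print(object_url("eu-west-2", "com.asmtechnology.awsbook.testbucket1", "holiday photos/beach 1.jpg"))
# https://s3.eu-west-2.amazonaws.com/com.asmtechnology.awsbook.testbucket1/holiday%20photos/beach%201.jpg
```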
If the object you are accessing is not publicly accessible, you will receive an Access Denied error when you try the URL in a web browser (Figure 9.17).
After you make the object publicly readable, retry the URL in a web browser and you will be able to access the file. You can set permissions either on an individual object or on the entire bucket.
The default storage class of objects on Amazon S3 is Standard. To change the storage class of an object:
To delete an object from an Amazon S3 bucket:
Once you delete an object, it is permanently removed from Amazon S3. The only exception to this rule occurs when versioning has been enabled on a bucket, in which case an object that has been deleted from a bucket can be restored.
Versioning is a bucket-level feature that, when enabled, stores every version of each object in the bucket. You can download an older version of an object, and you can even recover an object after it has been deleted. Once versioning is enabled on a bucket, you cannot disable it; you can, however, suspend versioning.
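The rule that versioning can be suspended but never removed can be modeled as a tiny state machine. The Python sketch below encodes only that rule: a bucket starts out unversioned, and once it has left that state it can move between Enabled and Suspended but never return to Unversioned. The enum and the helper name are hypothetical and make no AWS calls.

```python
from enum import Enum


class Versioning(Enum):
    UNVERSIONED = "Unversioned"  # state of a newly created bucket
    ENABLED = "Enabled"
    SUSPENDED = "Suspended"


def change_versioning(current: Versioning, target: Versioning) -> Versioning:
    """Return the new state, refusing to go back to Unversioned once versioning was used."""
    if target is Versioning.UNVERSIONED and current is not Versioning.UNVERSIONED:
        raise ValueError("versioning cannot be removed once enabled; suspend it instead")
    return target


state = Versioning.UNVERSIONED
state = change_versioning(state, Versioning.ENABLED)
state = change_versioning(state, Versioning.SUSPENDED)
print(state.value)  # Suspended
```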
To enable versioning on a bucket:
To understand how versioning works:
welcome_letter.txt and in that document type the following line:
Welcome to the world of Amazon Web Services.
welcome_letter.txt file that you previously saved on your computer, and edit its contents to resemble the following:
Welcome to the world of Amazon Web Services.
Amazon S3 versioning allows you to access older versions of documents.
welcome_letter.txt to reveal a pop-up dialog with options. Expand the versions drop-down menu in the pop-up dialog to reveal links to the different versions of the document (Figure 9.26).
The newest version of the document is always listed at the top. It is important to remember that you are charged for the combined space occupied by all versions of a document.
When versioning is enabled on a bucket, you will see an additional selector that allows you to view all versions of the objects in your bucket (Figure 9.27).
When the selector is set to show versions, you can see not only object versions but also delete markers, which are special entries indicating that an object has been deleted. Restoring a deleted object is simply a matter of deleting its delete marker.
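The behavior described in this section can be summarized with a toy in-memory model. The Python class below is hypothetical and makes no AWS calls. Every upload appends a new version. A plain delete appends a delete marker instead of erasing data. Reading the object returns the newest version unless that version is a delete marker. Deleting the marker itself restores the object. The short IDs like v1 stand in for the opaque version ID strings S3 generates. The `stored_bytes` method reflects the point that you are charged for all stored versions.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (no AWS calls)."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, data), oldest first
        self._ids = itertools.count(1)  # stand-in for S3's opaque version ID strings

    def put(self, key, data):
        """Every upload adds a new version; data=None represents a delete marker."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        """A plain delete only adds a delete marker; older versions are kept."""
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # The newest entry decides: a delete marker makes the object look deleted
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))

    def delete_version(self, key, version_id):
        """Permanently remove one version; removing a delete marker restores the object."""
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]

    def stored_bytes(self):
        """Storage is billed for every stored version, not just the newest one."""
        return sum(len(d) for vs in self._versions.values() for _, d in vs if d is not None)


bucket = VersionedBucket()
first = bucket.put("welcome_letter.txt", b"Welcome to the world of Amazon Web Services.\n")
bucket.put("welcome_letter.txt", b"Welcome to the world of Amazon Web Services.\n"
           b"Amazon S3 versioning allows you to access older versions of documents.\n")
marker = bucket.delete("welcome_letter.txt")         # the object now appears deleted
bucket.delete_version("welcome_letter.txt", marker)  # deleting the marker restores it
print(bucket.get("welcome_letter.txt").decode())
print(bucket.get("welcome_letter.txt", first).decode())
```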
You can use the AWS CLI to interact with Amazon S3 over the command line. Setup and configuration of the CLI client for Mac OS X and Windows is covered in Appendix C.
The general syntax of the aws command is as follows:
$ aws <service identifier> <service instructions>
The service identifier is a string that identifies the AWS service you want to interact with. The service identifier for Amazon S3 is s3 (in lowercase). Each service supports a different list of instructions. For a complete list of the s3 instructions available within the CLI, visit http://docs.aws.amazon.com/cli/latest/userguide/using-s3-commands.html.
As an example, the ls instruction retrieves a list of all buckets owned by the AWS account whose credentials are configured in the CLI. If you type the following instruction at the command prompt:
$ aws s3 ls
you receive a list of buckets:
Abhisheks-MacBook:~ abhishekmishra$ aws s3 ls
2017-01-15 16:52:59 com.asmtechnology.awsbook.testbucket1
Abhisheks-MacBook:~ abhishekmishra$
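If you want to use this listing from a script, each line of output is a timestamp (the bucket's creation date) followed by the bucket name. The Python sketch below parses that format and includes a wrapper that runs the same aws s3 ls command. The wrapper assumes the AWS CLI is installed and configured as described in Appendix C. The helper names are hypothetical, and the parser relies on bucket names never containing spaces.

```python
import subprocess
from datetime import datetime


def parse_bucket_listing(output: str):
    """Parse `aws s3 ls` lines of the form 'YYYY-MM-DD HH:MM:SS bucket-name'."""
    buckets = []
    for line in output.splitlines():
        if not line.strip():
            continue
        date_part, time_part, name = line.split(maxsplit=2)
        created = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
        buckets.append((created, name))
    return buckets


def list_buckets():
    """Run `aws s3 ls` (assumes a configured AWS CLI) and parse its output."""
    result = subprocess.run(["aws", "s3", "ls"], capture_output=True, text=True, check=True)
    return parse_bucket_listing(result.stdout)


sample = "2017-01-15 16:52:59 com.asmtechnology.awsbook.testbucket1\n"
print(parse_bucket_listing(sample))
```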
In addition to the high-level operations that can be performed using the s3 service identifier, Amazon also provides access to lower-level operations using the s3api service identifier. For more information on lower-level operations that can be performed on Amazon S3 buckets using the s3api service identifier, visit http://docs.aws.amazon.com/cli/latest/userguide/using-s3api-commands.html.
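As an example of a lower-level operation, versioning can be switched on from the command line with the s3api put-bucket-versioning instruction. The Python sketch below builds that command as an argument list and optionally runs it. The helper names are hypothetical, and actually running the command assumes a configured AWS CLI and a bucket you own.

```python
import subprocess


def versioning_command(bucket: str, enabled: bool = True) -> list:
    """Build the s3api call that enables (or suspends) versioning on a bucket."""
    status = "Enabled" if enabled else "Suspended"
    return [
        "aws", "s3api", "put-bucket-versioning",
        "--bucket", bucket,
        "--versioning-configuration", f"Status={status}",
    ]


def set_versioning(bucket: str, enabled: bool = True) -> None:
    """Run the command (assumes the AWS CLI is installed and configured)."""
    subprocess.run(versioning_command(bucket, enabled), check=True)


print(" ".join(versioning_command("com.asmtechnology.awsbook.testbucket1")))
# aws s3api put-bucket-versioning --bucket com.asmtechnology.awsbook.testbucket1 --versioning-configuration Status=Enabled
```

Building the argument list separately from running it makes the command easy to inspect or log before anything changes in your account.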