Chapter 1

Azure Blob Storage

Overview

Microsoft Azure provides an object storage solution called Blob Storage that enables organizations to store large volumes of unstructured data in the cloud. This unstructured data can consist of text data, such as log and XML files, or binary data, such as image, audio, and video files.

The storage allows you to access the unstructured data in many ways, such as the following:

  • Applications can connect using REST APIs over HTTP/HTTPS.

  • Users and clients can connect and access the blobs using Azure portal, Azure PowerShell, Azure CLI, or Azure Storage client libraries.

Azure Blob Storage supports various programming languages, including .NET, Python, Node.js, PHP, Java, and Ruby. This enables developers to integrate Azure Blob Storage into a wide range of applications, whether to extend or to replace the existing underlying storage.

Key concepts

This section explains each of the components that make up the Azure Blob Storage service and the various redundancy types supported for blob storage. It also provides a better understanding of how you can connect to and access a blob storage account using blob endpoints, and of how to manage blob storage encryption and data integrity.

Storage components

Azure Blob Storage has three main components:

  • Storage account Every storage account in Azure has a unique namespace that helps construct the unique base address for every blob stored in that storage account. The unique base address is a combination of the storage account name and the Azure Blob Storage endpoint address.

  • Containers Containers are like folders in a Windows directory structure; just as folders hold files, containers store blobs, which are the text and binary data files.

  • Blobs Blobs are text or binary data, such as audio, video, log, image, CSV, and other such file types. They contain the data that you want to store in that storage account. There are three blob types supported by Azure Blob Storage: Block blobs, Append blobs, and Page blobs. We will review these in more detail in the subsequent sections.

Blobs are stored in containers. Storage accounts can hold a number of containers, making it possible to host numerous blobs in a single storage account. (See Figure 1-1.) You also can create multiple storage accounts in a single Azure subscription spread across different Azure regions, depending on your needs.

FIGURE 1-1 Account, container, and blob structure.

Storage accounts

A storage account is a unique namespace that contains containers and blobs. Every blob stored in a storage account has a unique address, which is a combination of the namespace and the Azure Blob Storage endpoint. For example, for a storage account named azureblobstorageaccount, the blob endpoint address would be https://azureblobstorageaccount.blob.core.windows.net. In turn, for a blob stored in this storage account in a container named blobcontainer, the URL address would be https://azureblobstorageaccount.blob.core.windows.net/blobcontainer/blobname.extension.

Data in a storage account is accessible from anywhere in the world over HTTP or HTTPS. Data is stored in a redundant manner and is massively scalable to accommodate your organization’s expanding needs. You can create multiple storage accounts in a single Azure subscription to meet your organization’s various redundancy, latency, and usage needs.

There are three storage account types that support blobs:

  • Standard general purpose v2 These storage accounts support various storage types, such as blob, files, queue storage, and table storage. This is the most commonly used storage account type for Azure Blob Storage because it provides a good balance of price, speed, redundancy, and reliability to meet general storage requirements.

  • Premium block blobs These storage accounts support both block and append blob types. They use solid-state disks (SSDs) to provide low latency and high input/output operations per second (IOPS). This makes them ideal for applications that require high IOPS, low latency, or the storage of large volumes of small files. For example:

    • Data analytics and data querying across large datasets

    • Real-time streaming analytics

    • Artificial intelligence (AI)/machine learning (ML) workloads

    • Internet of Things (IoT) data processing and analytics

    • High-volume e-commerce businesses

  • Premium page blobs These storage accounts support page blobs only. Like premium block blobs storage, they use SSDs for low latency and high IOPS. They are ideal for storing virtual machine hard disks (VHDs) that require high transaction volume or need to support low-latency workloads.

Standard general purpose v2 storage accounts support the following types of storage:

  • Locally redundant storage (LRS)

  • Zone-redundant storage (ZRS)

  • Geo-redundant storage (GRS)

  • Read-access geo-redundant storage (RA-GRS)

  • Geo-zone-redundant storage (GZRS)

  • Read-access geo-zone-redundant storage (RA-GZRS)

In contrast, premium page blobs provide only locally redundant storage (LRS), and premium block blobs provide LRS and ZRS.

Storage costs for premium data storage are higher than for standard general purpose v2 storage. However, transaction costs are lower. If you are storing a large volume of data, but interactions with that data will be limited or will not require fast response times, then a standard general purpose v2 storage account might be the right choice. However, if you need high IOPS and low latency, then the added costs of premium storage could be justified.

Containers

Containers help organize the block, page, or append blobs in a storage account. They provide a structure to the storage account similar to folders. So, you can organize related blobs together in a container or a set of containers.

Each storage account can hold an unlimited number of containers, and each container can hold an unlimited number of blobs, as long as the total size of these assets does not exceed the storage account's overall size limits.

Container names must meet the following requirements (a quick validation sketch follows the list):

  • Names must be between 3 and 63 characters long.

  • Names must start with either a number or a lowercase character.

  • Names can contain only numbers, lowercase characters, and dashes (-). No other special characters can be used.

  • Names cannot contain two or more consecutive dashes (--).

  • The name of every container within a storage account must be unique.
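If you generate container names programmatically, it can help to validate candidates client-side before calling the service. The following Bash sketch checks a name against the rules above; the helper function is hypothetical, and the service remains the final authority (including the per-account uniqueness rule, which cannot be checked offline).

#Rough client-side check of the container naming rules (hypothetical helper)
is_valid_container_name() {
    local name="$1"
    (( ${#name} >= 3 && ${#name} <= 63 )) || return 1     # length must be 3-63
    [[ "$name" =~ ^[a-z0-9][a-z0-9-]*$ ]] || return 1     # charset and first character
    [[ "$name" != *--* ]] || return 1                     # no consecutive dashes
    return 0
}

is_valid_container_name "blob-container-01" && echo "valid" || echo "invalid"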

Blobs

Azure Blob Storage accounts support three types of blobs:

  • Block blobs These contain text and binary data files (referred to as blocks) that can be individually managed. File types include TXT, HTML, XML, JPG, WAV, MP3, MP4, AVI, PNG, and other similar text, image, audio, and video file formats. Each block blob comprises multiple blocks, each identified by a block ID. A single block blob can contain up to 50,000 blocks. At the time of this writing, the maximum block blob size is 190.7 tebibytes (TiB), assuming the latest service APIs for put operations are used. Block blobs are optimized to support efficient uploading of large amounts of data with multiple parallel data streams.

  • Append blobs These are block blobs optimized for append operations. They are ideal for log files. Append blob operations add blocks only to the end of a blob, which helps prevent tampering with existing log entries. Like block blobs, a single append blob can contain up to 50,000 blocks, but the current maximum append blob size is 195 gibibytes (GiB).

  • Page blobs These are optimized for random read and write operations. This makes them ideal for use as VHD files or as storage for platform as a service (PaaS) offerings, such as Azure SQL DB. Each page blob is a collection of 512-byte pages that provide the ability to read/write arbitrary ranges of bytes. The current maximum page blob size is 8 TiB. You can create both premium and standard page blobs based on your storage account type. Page blobs provide REST APIs to access and interact with the blobs. The underlying storage is extremely durable, making page blobs ideal for storing index-based and sparse data structures like disks for Azure VMs and Azure SQL DB storage. (A short sketch uploading each blob type follows this list.)
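To make the distinction concrete, the following Azure CLI sketch uploads one blob of each type using the --type flag. The account and container names reuse the hypothetical values from the walkthroughs later in this chapter, the local files are assumed to exist, and page blob content must be sized in 512-byte multiples.

#Upload one blob of each type (account, container, and file names are hypothetical)
az storage blob upload --account-name mbspblobstorage01 --container-name container \
    --name app.log --file app.log --type block

az storage blob upload --account-name mbspblobstorage01 --container-name container \
    --name audit.log --file audit.log --type append

truncate -s 512 disk.vhd   #page blob content must be a multiple of 512 bytes
az storage blob upload --account-name mbspblobstorage01 --container-name container \
    --name disk.vhd --file disk.vhd --type page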

Storage tiers

Blob Storage also provides multiple storage tiers: Hot, Cool, and Archive. This enables data to be stored and accessed at different costs based on the differing needs of your users and applications. This helps organizations use Azure Blob Storage to address various scenarios, such as the following:

  • Audio and video streaming

  • Storing logs that need to be appended to on an ongoing basis

  • Preserving data for backup or archival purposes

  • Storing and serving static website content directly from storage

  • Hosting Azure VM disks

Now that we have a brief understanding of the components, structure, and some of the use cases of Azure Blobs, let's examine each area in more detail.

Storage redundancy types

Storage redundancy helps ensure that in the event of an outage, your data can be brought online and accessed within the specific period stated in your service level agreement (SLA). Outages can be planned or unplanned. An unplanned outage might occur due to a natural disaster, a power outage, a fire, cooling or network issues in the Azure datacenter, or storage hardware failures.

As mentioned in the “Storage accounts” section earlier in this chapter, Azure Blob Storage provides various levels of storage redundancy, depending on which storage account type you select for your Azure Blob Storage. These levels of redundancy (from least redundancy to most redundancy) are LRS, ZRS, GRS, RA-GRS, GZRS, and RA-GZRS. We’ll talk more about each of these levels in the sections that follow.

As the level of redundancy increases, the availability of your data increases, too—but so does the cost of storage. It is therefore important for you to consider your organization’s and application’s requirements, with respect to data availability and redundancy, along with the overall budget available for the storage, to select the storage account that’s best for your needs.

You can split your data across different storage accounts, providing different levels of redundancy based on the requirements of individual applications or application components. Some storage options also maintain an active read-only copy of your data in a secondary region. Before selecting this storage option, be sure your application is capable of using such read-only storage in the event of an outage. Also ensure that in the event of an outage in the primary region, your application will be available or recoverable in the secondary region by using the storage in that secondary region.

Locally redundant storage (LRS)

LRS is the cheapest redundancy option in Azure Blob Storage. With LRS storage, Azure maintains three replicas of your data in a single datacenter within your primary Azure region. Data is synchronously committed to each replica to ensure there is no data loss in the event of an outage. (See Figure 1-2.)

FIGURE 1-2 Locally redundant storage.

Synchronously committing and maintaining three copies of your data protects against local storage hardware, server rack, or network component failures. However, because all three replicas are stored in the same datacenter, if that datacenter experiences some type of disaster, all three copies of your data could be lost. Therefore, depending on your application, redundancy, and compliance requirements, LRS might not be the best option for you.

Zone-redundant storage (ZRS)

Like LRS, ZRS synchronously commits and maintains three replicas of your data in your primary Azure region. However, instead of residing in a single datacenter, the replicas are spread across three availability zones. (See Figure 1-3.) An availability zone is an independent datacenter in your primary Azure region with its own power, cooling, and networking components. So, if a disaster occurs in one availability zone, your data will still be accessible (unless the disaster also affects the other availability zones in that region).

FIGURE 1-3 Zone-redundant storage.

If an outage occurs in one availability zone, ZRS relies on automated network changes on the Microsoft back end to divert DNS endpoints from one zone to another, which could involve a small gap in availability. This could affect your application's performance if it is not configured to retry connections in the event one attempt to connect fails. Also, because all replicas remain within the primary region, ZRS can be an appropriate option if your organization has data governance requirements that restrict the storage of data to a specific geographical region.

Geo-redundant storage (GRS)

With GRS, Azure synchronously commits and maintains three replicas of your data in your primary Azure region in LRS. Then, three more replicas of your data in a secondary Azure region (selected automatically by Microsoft) are updated to match the three replicas in the primary Azure region, again using LRS. So, you have six copies of your data spread across two geographical regions that are hundreds of miles apart. (See Figure 1-4.)

FIGURE 1-4 Geo-redundant storage.

If the datacenter in your primary region experiences an outage or disaster, then your data will be available in the datacenter in the secondary region. However, the data in the secondary region might not be available for read or write operations until the storage has failed over to the secondary region. Azure Blob Storage has a Recovery Point Objective (RPO) of less than 15 minutes for geo-replication, but there is currently no SLA on how long it takes to replicate data to the secondary region. Also, in the event of a disaster, there is a chance of some data loss if not all write operations have been replicated over to the secondary region.

Geo-zone-redundant storage (GZRS)

GZRS is just like GRS, but the three replicas of your data in the primary region use ZRS, while the replicas in the secondary region use LRS. So, there is additional redundancy in the primary region. (See Figure 1-5.) The SLA for GZRS is similar to GRS.

FIGURE 1-5 Geo-zone-redundant storage.

Read-access geo-redundant storage (RA-GRS) and read-access geo-zone-redundant storage (RA-GZRS)

RA-GRS and RA-GZRS function in the same manner as GRS and GZRS, respectively. The only difference is that RA-GRS and RA-GZRS provide the ability to perform read operations against the secondary region, including during an outage in the primary region. (See Figure 1-6.) This allows your application to function partially while the storage is failed over to the secondary site. Also, while Microsoft manages the failover of the geo-redundant storage in the event of a disaster in the primary Azure region, you can perform a manual failover to the secondary region if you are using a standard general purpose v2 storage account.

FIGURE 1-6 Read-access geo-redundant storage and read-access geo-zone-redundant storage.
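With read access enabled, the secondary region is exposed through its own endpoint, formed by appending -secondary to the account name. As a sketch (the account, container, and blob names are the hypothetical ones used later in this chapter, and the SAS token is a placeholder), you could read a blob from the secondary like this:

#Read a blob from the secondary endpoint (SAS token is a placeholder)
curl "https://mbspblobstorage01-secondary.blob.core.windows.net/container/TextFile01.txt?<sas-token>"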

Storage endpoints

Every Azure Blob Storage account has a storage endpoint, accessible from an HTTP/HTTPS connection, that provides access to blobs stored in that account. The URL for the storage endpoint is a combination of the storage account namespace and a static predefined suffix. For Azure Blob Storage accounts, this is https://<storage-account-name>.blob.core.windows.net. (This is why, when you define a name for your storage account, it is validated against all existing storage accounts globally in Azure to ensure it is unique.)

The URL for a particular blob simply appends the container and blob name to the storage endpoint URL. For example, a blob named blob01 stored in a container named blobcontainer in a storage account named myblobstorageaccount would have the URL https://myblobstorageaccount.blob.core.windows.net/blobcontainer/blob01.
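Because the address is purely a concatenation of these parts, you can assemble it yourself. A minimal Bash sketch using the names from this example:

account="myblobstorageaccount"
container="blobcontainer"
blob="blob01"
echo "https://${account}.blob.core.windows.net/${container}/${blob}"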

Storage encryption for at-rest data

Azure Blob Storage accounts use AES 256-bit encryption to transparently encrypt and decrypt data. Encryption is applied on the underlying disks, similar to BitLocker encryption on Windows. As a result, the end client does not require access to the key to read from or write to the storage account. This also ensures that the underlying disks cannot be read if they are removed from the Azure storage cluster, because the data is unreadable without the encryption key.

Azure Storage encryption is enabled by default on all Azure Blob Storage accounts and cannot be disabled, regardless of the storage redundancy and tier selected. The encryption even extends to the object metadata, and it is offered at no additional charge.

The keys used for encryption can be Microsoft-managed, customer-managed, or customer-provided. Customers can select which type to use based on their organizational requirements for handling data within each storage account. Customer-managed keys must be stored in Azure Key Vault or Azure Key Vault Managed Hardware Security Model (HSM). With customer-provided keys, the client connecting to the blob for a read or write operation can provide the key along with the access request to allow the data to be encrypted and decrypted at that time.

Azure Blob Storage also offers infrastructure encryption, which adds a second layer of 256-bit AES encryption at the infrastructure level, on top of the encryption at the storage service level. With infrastructure encryption, the infrastructure-level keys are different from the service-level keys, even when Microsoft manages both. This ensures that a breach at one level does not compromise the other. You cannot use customer-managed keys for infrastructure encryption.

Depending on which option you choose, as a best practice, you might need to devise a key-hosting and rotation strategy to ensure keys are rotated on a regular basis but can still be accessed by the blob storage service for read/write operations. If you choose Microsoft-managed encryption keys, then Microsoft ensures those keys are available to the service for operational use and rotates the keys on a regular basis. (You cannot change the frequency at which this occurs.)
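If you choose customer-managed keys, the configuration looks something like the following Azure CLI sketch. The vault and key names here are hypothetical, and the storage account must already have a managed identity with permission to access the key in Azure Key Vault.

#Point the storage account at a customer-managed key (vault and key names are hypothetical)
keyVaultUri=$(az keyvault show --name KeyVault01 --query "properties.vaultUri" -o tsv)

az storage account update \
    --resource-group RG01 \
    --name mbspblobstorage01 \
    --encryption-key-source Microsoft.Keyvault \
    --encryption-key-vault "$keyVaultUri" \
    --encryption-key-name blob-cmk-key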

Storage data integrity

Azure regularly verifies data stored in an Azure Blob Storage account using cyclic redundancy checks (CRCs). These check for data corruption or integrity issues. If any such issues are detected, repairs are performed using the redundant data copies.
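These server-side checks are transparent to you, but you can add a client-side, per-request integrity check as well. As a sketch reusing the hypothetical names from the walkthroughs, the Azure CLI's --validate-content flag asks the client to compute an MD5 hash for each chunk so the service can verify it on arrival:

#Upload with client-side content validation
az storage blob upload \
    --account-name mbspblobstorage01 \
    --container-name container \
    --name TextFile01.txt \
    --file TextFile01.txt \
    --validate-content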

Storage account walkthrough

The following sections step you through the process of setting up an Azure Blob Storage account using the Azure portal, Azure PowerShell, and Azure CLI.

 
Using Azure portal

To set up an Azure Blob Storage account using the Azure portal, follow these steps:

  1. Log in to the Azure portal, type storage in the search box, and select storage accounts from the list that appears. (See Figure 1-7.)

    FIGURE 1-7 Search for storage accounts in the Azure portal.

  2. On the Storage Accounts page (see Figure 1-8), click the Create Storage Account button.

    FIGURE 1-8 Create a storage account.

  3. In the Basics tab of the Create a Storage Account wizard (see Figure 1-9), enter the following information and click Next:

    • Subscription Select the subscription in which you want to create the Azure Blob Storage account.

    • Resource Group Select an existing resource group or create a new one in which to create the Azure Blob Storage account.

    • Storage Name Enter a unique name for the storage account.

    • Region Select the Azure region you want to host the storage account.

    • Performance Select the Standard or Premium option button, depending on your needs.

    • Redundancy Select the redundancy type you want to use for the storage.

    • Make Read Access to Data Available in the Event of Regional Unavailability Select this check box.

    FIGURE 1-9 The Basics tab of the Create a Storage Account wizard.

  4. In the Advanced tab (see Figure 1-10), select the Default to Azure Active Directory Authorization in the Azure Portal check box, leave the other options set to their default values, and click Next.

    FIGURE 1-10 The Advanced tab of the Create a Storage Account wizard.

  5. In the Networking tab of the Create a Storage Account wizard (see Figure 1-11), for the sake of example, leave the Network Connectivity and Network Routing options set to their default values (Enable Public Access from All Networks and Microsoft Network Routing, respectively) and click Next.

    FIGURE 1-11 The Networking tab of the Create a Storage Account wizard.

  6. In the Data Protection tab (see Figure 1-12), leave the options set to their default values (unless your organization data-protection requirements dictate otherwise) and click Next.

    FIGURE 1-12 The Data Protection tab of the Create a Storage Account wizard.

  7. In the Encryption tab (see Figure 1-13), select the Enable Infrastructure Encryption check box, leave the other options set to their default values, and click Next:

    FIGURE 1-13 The Encryption tab of the Create a Storage Account wizard.

  8. In the Tags tab (see Figure 1-14), enter any tags you want to associate with the Azure Blob Storage account and click Next.

    FIGURE 1-14 The Tags tab of the Create a Storage Account wizard.

  9. In the Review tab (see Figure 1-15), review your settings. Then click Create to create the Azure Blob Storage account.

    FIGURE 1-15 The Review tab of the Create a Storage Account wizard.

  10. After the account is created, click Go to Resource to access the new account’s page. (See Figure 1-16.)

    FIGURE 1-16 Storage deployment completion.

    Your next step is to create a container inside the new storage account.

  11. In the left pane of the Azure Blob Storage account page, click Containers. Then click the Container button in the pane that opens on the right. (See Figure 1-17.)

    FIGURE 1-17 The Containers page for the new storage account.

  12. On the New Container page, enter the following details and click Create. (See Figure 1-18.)

    • Name Enter a unique name for the container.

    • Public Access Level Select Private.

    • Encryption Scope Leave these options set to their default values.

    FIGURE 1-18 Create a new container.

  13. The new container appears in the account’s Container page. (See Figure 1-19.) Now you’re ready to upload blobs (files) to the new container.

    FIGURE 1-19 The new container appears in the Container page.

  14. Click the container.

  15. In the right pane of the container’s Overview page (see Figure 1-20), click the Upload button.

    FIGURE 1-20 Start the blob upload.

  16. In the Upload Blob dialog box (see Figure 1-21), enter the following information (leave the rest of the options set to their default values) and click Upload:

    • Files Select the files to upload. You can select a single file or multiple files.

    • Overwrite If Files Already Exist Since this is the first upload, leave this unchecked.

    • Authentication Type Select Account Key.

    FIGURE 1-21 Upload Blob dialog box.

  17. When the upload is complete, the file(s) you selected will appear in the container’s Overview page. (See Figure 1-22.)

FIGURE 1-22 The files are uploaded to the container.

Using Azure PowerShell

Use the following Azure PowerShell code to create an Azure Blob Storage account and container and add a blob to it:

#Define required variables
$resourceGroup = "RG01"
$region = "eastus"
$storageaccname = "mbspblobstorage01"
$container = "container"
$vnet = "vNET01"
$subnet = "default"
$endpointname = "PrivateEndpoint"
$vaultname = "RecoveryServicesVault01"

#Create Azure Blob storage account
New-AzResourceGroup -Name $resourceGroup -Location $region

$storageAccount = New-AzStorageAccount `
    -ResourceGroupName $resourceGroup `
    -Name $storageAccName `
    -Location $region `
    -Kind StorageV2 `
    -AllowBlobPublicAccess $true `
    -SkuName Standard_RAGRS `
    -MinimumTlsVersion TLS1_2

#Create container
New-AzStorageContainer `
                       -Name $Container `
                       -Context $storageAccount.Context `
                       -Permission Blob

#Upload data to Blob Storage
cd "~/CloudDrive/"
Get-Date | Out-File -FilePath "TextFile01.txt" -Force

Set-AzStorageBlobContent `
   -Context $storageAccount.Context `
   -Container $container `
   -File "TextFile01.txt" `
   -Blob "TextFile01.txt"

#Verify data in Blob Storage
Get-AzStorageBlob `
   -Context $storageAccount.Context `
   -Container $container |
  Select-Object -Property Name

#Download file from Blob Storage
Get-AzStorageBlobContent `
    -Context $storageAccount.Context `
    -Container $container `
    -Blob "TextFile01.txt" `
    -Destination "./TextFile01.txt" `
    -Force

Using Azure CLI

Use the following code to create an Azure Blob Storage account and container and add a blob to it in the Azure CLI:

#Define required variables
resourceGroup="RG01"
region="eastus"
storageaccname="mbspblobstorage01"
container="container"
directory="directory"
vnet="vNET01"
subnet="default"
endpointname="PrivateEndpoint"
vaultname="RecoveryServicesVault01"

#Create Azure Blob Storage account
az group create \
    --name $resourceGroup \
    --location $region

az storage account create \
    --name $storageaccname \
    --resource-group $resourceGroup \
    --location $region \
    --kind StorageV2 \
    --sku Standard_ZRS \
    --encryption-services blob \
    --output none \
    --min-tls-version TLS1_2 \
    --allow-blob-public-access true

#Create container
az storage container create \
    --account-name $storageaccname \
    --name $container

#Upload data to Blob container
cd ~/clouddrive/
date > TextFile01.txt

az storage blob upload \
    --account-name $storageaccname \
    --container-name $container \
    --name TextFile01.txt \
    --file TextFile01.txt

#Verify data in container
az storage blob list \
    --account-name $storageaccname \
    --container-name $container \
    --output table

#Download file from container
az storage blob download \
    --account-name $storageaccname \
    --container-name $container \
    --name TextFile01.txt \
    --file "./TextFile01.txt"

Data access authorization

To access data in Azure Blob Storage, the client application you use for the access request must be authorized. Azure supports three main authorization methods:

  • Azure Active Directory (Azure AD)

  • Shared Key authorization

  • Shared access signature (SAS) key

In addition to these methods, Azure supports the following less-secure methods:

  • Anonymous public read access This method allows anyone to connect to the blob storage to read data without authorization. This is not recommended for use and can be disabled at the storage account level, as shown in the sketch after this list. Disabling this prevents containers within the storage account from being used to share blobs with anonymous access.

  • Storage local users This approach can be used to authorize access to blob storage only when accessing the storage via the SSH File Transfer Protocol (SFTP). Permissions are defined at the storage container level for the storage local user. You can then provide the credentials to the connecting party, who can use the password or the private key of an SSH public-private key pair to access the storage over SFTP.
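As a sketch of disabling anonymous access entirely at the account level, using the hypothetical account from the walkthroughs:

#Disallow anonymous public read access on the whole account
az storage account update \
    --resource-group RG01 \
    --name mbspblobstorage01 \
    --allow-blob-public-access false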

Azure Active Directory (Azure AD)

Integrating Azure Blob Storage with Azure AD is the recommended data-authorization approach (as long as your application supports it). It uses built-in Azure role-based access control (RBAC) to perform identity-based authorization and grant access permissions to users, groups, or an application service principal. (These are called security principals.) Azure AD also enables you to leverage Azure attribute-based access control (ABAC) to add conditions to Azure role assignments for more granular access to resources in Azure Blob Storage.

When you use this method to authenticate access to an Azure Blob Storage account, the security principal first authenticates against Azure AD. When this occurs, Azure AD generates an OAuth 2.0 token, which the security principal can then use to gain authorization to the Azure Blob Storage service.
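The Azure CLI can broker this flow for you: az login obtains the Azure AD token, and --auth-mode login tells data-plane commands to present it instead of an account key. A sketch with the hypothetical account names used earlier:

#Authenticate to Azure AD, then access blob data with the OAuth token
az login

az storage blob list \
    --account-name mbspblobstorage01 \
    --container-name container \
    --auth-mode login \
    --output table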

Azure ABAC

If you require fine-grained access to the Azure Blob Storage resources, you can use Azure ABAC to configure conditions for role assignments. With Azure ABAC, you can limit access on a more granular basis and to specific resources within the Azure Blob Storage account. Azure ABAC defines access levels based on attributes associated with security principals, resources, and requests. Conditions can be based on the following attributes (a role-assignment sketch follows the list):

  • Account name

  • Blob index tags

  • Blob path

  • Blob prefix

  • Container name

  • Encryption scope name

  • Is current version

  • Is hierarchical namespace enabled

  • Snapshot

  • Version ID
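As a sketch, the following role assignment layers an ABAC condition onto the Storage Blob Data Reader role so the principal can read blobs only in one container. The assignee, subscription ID, and resource names are hypothetical, and the condition string follows the documented ABAC condition format.

#Role assignment restricted by an ABAC condition (names are hypothetical)
az role assignment create \
    --assignee "user@contoso.com" \
    --role "Storage Blob Data Reader" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/RG01/providers/Microsoft.Storage/storageAccounts/mbspblobstorage01" \
    --condition "((!(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})) OR (@Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEquals 'container'))" \
    --condition-version "2.0"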

Resource scope

When planning the RBAC strategy for your environment, it is a good practice to first define the scope of the access that you would like each security principal to have. This helps ensure that access is limited to a specific set of resources in an Azure subscription. The levels at which you can scope access include the following (a container-scoped example follows the list):

  • Azure management group A management group is a combination of multiple subscriptions. This scope allows access to all storage accounts in all resource groups within all subscriptions in that management group. This is the widest scope for which you can provide access. Microsoft recommends against allowing this scope for client applications that require access only to specific Azure Blob Storage resources.

  • Azure subscription Access is granted to all resource groups and to all Azure Blob Storage accounts and containers within those resource groups.

  • Azure resource group Access is granted to all Azure Blob Storage accounts and all storage containers within that resource group.

  • Azure storage account Access is granted to all Azure Blob Storage containers within that storage account.

  • A single Blob Storage container Access is granted to a specific container and all objects and metadata in that container. This is the narrowest scope of access and is generally the recommended approach.
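Scope is expressed as a resource ID string on the role assignment. As a sketch, this grants Storage Blob Data Contributor at the narrowest (single-container) scope; the assignee, subscription ID, and names are hypothetical:

#Grant access scoped to a single container (names are hypothetical)
scope="/subscriptions/<subscription-id>/resourceGroups/RG01/providers/Microsoft.Storage/storageAccounts/mbspblobstorage01/blobServices/default/containers/container"

az role assignment create \
    --assignee "user@contoso.com" \
    --role "Storage Blob Data Contributor" \
    --scope "$scope"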

Built-in roles for RBAC

Azure provides multiple built-in RBAC roles to authorize access to blob data using Azure AD and OAuth. These include the following:

  • Storage Blob Data Contributor Grants read/write/delete permissions to the Azure Blob Storage resources.

  • Storage Blob Data Reader Grants read-only access permissions to the Azure Blob Storage resources.

  • Storage Blob Delegator Grants access to obtain a user delegation key to then create a SAS URL signed with Azure AD credentials for a container or blob (more on signed URLs in a moment).

Shared Key

This authorization approach uses the storage account key to authorize access to the Azure Blob Storage account. The client application signs every request using the storage account key, which provides root access to the entire storage account and all containers and blobs stored within it.

Azure Blob Storage supports the following Shared Key authorization schemes for version 2009-09-19 and later:

  • Shared Key for blob, queue, and file services This authorization scheme is used to make requests against the blob, queue, and file services. Shared Key authorization in version 2009-09-19 and later supports an augmented signature string for enhanced security. (You must update your service to authorize the use of this augmented signature.)

  • Shared Key Lite This authorization scheme is used to make requests against the blob, queue, table, and file services. For version 2009-09-19 and later of the blob and queue services, Shared Key Lite authorization supports a signature string identical to the one supported by Shared Key in previous versions of the blob and queue services. You can therefore use Shared Key Lite to make requests against the blob and queue services without updating your signature string.

Shared Key authorization requires you to store your storage account keys in your application. Any breach or misconfiguration in the application code that exposes the key can result in data exfiltration. It is therefore advisable to use this method only in testing or staging environments, and to leverage Azure AD or SAS keys (discussed next) for production deployments.
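For completeness, here is a sketch of Shared Key authorization with the Azure CLI, reusing the hypothetical names from earlier: the account key is retrieved once and then presented with each data-plane request. Keep in mind the caution above about exposing this key.

#Retrieve an account key and use it to authorize a request
accountKey=$(az storage account keys list \
    --resource-group RG01 \
    --account-name mbspblobstorage01 \
    --query "[0].value" -o tsv)

az storage blob list \
    --account-name mbspblobstorage01 \
    --container-name container \
    --account-key "$accountKey" \
    --output table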

Shared Keys walkthrough

The following section walks you through the process of creating Shared Keys using the Azure portal.

Using Azure portal

To set up Shared Keys using the Azure portal, follow these steps:

  1. In the left pane of the Azure Blob Storage account page, click Access Keys.

    The Access Keys page opens, with two keys displayed. (See Figure 1-23.) You can use the keys in the application configuration as needed.

    FIGURE 1-23 The Access Keys page for the storage account.

  2. To see a key’s value, click the Show button next to the Key box.

  3. To see a key’s connection string, click the Show button next to the Connection String box.

  4. To set a reminder to rotate the keys on a regular basis, click Set Rotation Reminder near the top of the Access Keys page.

  5. In the Set a Reminder to Rotate Access Keys dialog box (see Figure 1-24), enter the following information and click Save:

    • Enable Key Rotation Reminders Select this check box.

    • Send Reminders Choose an option from this drop-down list. (I chose Custom.)

    • Remind Me Every Enter a value and a unit (in this case, 60 and Days) to indicate how frequently you want to be reminded to rotate access keys.

    FIGURE 1-24 The Set a Reminder to Rotate Access Keys dialog box.

    When you set up a reminder to rotate the access key, Azure does not automatically generate a new access key at that time. The key rotation has to be performed manually.

  6. Click Yes to confirm that you want to generate a new access key. (See Figure 1-25.)

FIGURE 1-25 Generating a new access key.

Shared access signature (SAS)

If you are unable to employ Azure AD for blob authorization, you can provide access using a shared access signature (SAS) for authorization. This involves generating a SAS uniform resource locator (URL) to define access to storage resources based on permission levels, object types (blob, file, queue, or table), allowed IP ranges, and allowed protocols (HTTP or HTTPS), for a period that you specify. Client applications can then use the SAS URL to gain access to the defined storage resources in a time-bound manner without sharing the storage account key or exposing the storage resource to unauthorized security principals.

There are three types of shared access signatures:

  • Account-level SAS When you create an account-level SAS, it delegates access to resources in one or more storage services. This allows you to provide access to resources in the blob, queue, file, and table services at the same time using a single account-level SAS URL. An account-level SAS also allows you to grant access to service-level operations that are currently not supported using a service-level SAS (discussed next). These include write and delete operations for Azure Blob Storage containers.

  • Service-level SAS With a service-level SAS, access is delegated to a single service and the resources within that service. You can define the level of access to provide, and client application operations will be limited accordingly. With this approach, a SAS token is generated that contains a query string with the permission, protocol, IP ranges, and validity parameters, and is signed using the storage account keys. The URL for a service-level SAS consists of the URL to the resource to which the SAS grants access, followed by the SAS token. A service-level SAS can also reference a stored access policy, which provides an added layer of control over a set of signatures. This includes the ability to modify or revoke access to the resource if necessary.

  • User-delegation SAS This is a SAS URL signed using Azure AD user credentials. If you must use SAS rather than Azure AD, this is the recommended approach. With user-delegation SAS, you first request a user delegation key and then generate the SAS URL. So, there are two layers of authorization checks: one based on the Azure AD user’s RBAC permissions and another on the additional restrictions defined in the SAS URL itself. With this approach, there is no storage key stored in your application code, making it highly secure. (A user-delegation sketch follows this list.)
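The walkthroughs that follow sign with account keys; as a sketch of the user-delegation variant, the Azure CLI's --as-user flag (combined with --auth-mode login) signs the SAS with Azure AD credentials instead. The names reuse the hypothetical values from earlier:

#Generate a user-delegation SAS URL signed with Azure AD credentials
az storage blob generate-sas \
    --account-name mbspblobstorage01 \
    --container-name container \
    --name TextFile01.txt \
    --permissions r \
    --expiry 2024-12-31T00:00:00Z \
    --auth-mode login \
    --as-user \
    --full-uri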

Shared access signature (SAS) walkthrough

The following sections step you through the process of setting up SAS authorization by first creating a shared access token, URL, and a stored access policy using the Azure portal, Azure PowerShell, and Azure CLI.

Using Azure portal

To set up SAS authorization using the Azure portal, follow these steps:

  1. Navigate to the page for the container in your storage account for which you want to set up SAS authorization.

  2. In the left pane of the container page, click Shared Access Token.

  3. On the Shared Access Tokens page (see Figure 1-26), enter the following information, and click the Generate SAS Token and URL button:

    • Signing Method Select the Account Key option button (the default).

    • Signing Key Select the storage account key to use.

    • Stored Access Policy For now, select None. You will create and apply a new stored access policy momentarily.

    • Permissions Select a permission level. For this example, leave this set to Read.

    • Start Select the start date, time, and time zone for the SAS token to indicate when the token should become active.

    • Expiry Select an expiration date, time, and time zone to indicate when the SAS token should expire.

    • Allowed IP Addresses To allow access using the access token via specific IP addresses only, enter those addresses here.

    • Allowed Protocols Choose which protocols to allow.

    You can now select and use the SAS URL to access the storage account and container from your application. Next, you’ll add a stored access policy.

    FIGURE 1-26 Generating a SAS token and URL.

  4. In the left pane of the container page, click Access Policy.

  5. On the Access Policy page, under Stored Access Policies, click Add Policy. (See Figure 1-27.)

    FIGURE 1-27 The container’s Access Policy page.

  6. In the Add Policy dialog box (see Figure 1-28), enter the following information and click OK:

    • Identifier Enter a unique name for the storage access policy.

    • Permissions Specify which permissions you want to assign.

    • Start Time Select a start date, time, and time zone to indicate when the stored access policy should become active.

    • Expiry Time Select an expiration date, time, and time zone to indicate when the stored access policy should expire.

    FIGURE 1-28 Add a stored access policy.

  7. In the Access Policy page, click the Save button above the Stored Access Policy section. (See Figure 1-29.)

FIGURE 1-29 Saving the new stored access policy.

Using Azure PowerShell

Use the following Azure PowerShell code to generate a SAS token:

#Define required variables
$resourceGroup = "RG01"
$region = "eastus"
$storageaccname = "mbspblobstorage01"
$container = "container"
$vaultname = "RecoveryServicesVault01"
$blob = "TextFile01.txt"

#Generate SAS Access URL
Set-AzCurrentStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccname
New-AzStorageAccountSASToken -Service Blob -ResourceType Service,Container,Object `
    -Permission "racwdlup" -Protocol HttpsOnly -ExpiryTime (Get-Date).AddDays(5)

Using Azure CLI

Use the following code to generate a SAS token in the Azure CLI:

#Define required variables
resourceGroup="RG01"
region="eastus"
storageaccname="mbspblobstorage01"
container="container"
directory="directory"

#Generate SAS Access URL
az storage account generate-sas \
    --account-name $storageaccname \
    --account-key 00000000 \
    --expiry 2024-12-31 \
    --https-only \
    --permissions acuw \
    --resource-types sco \
    --services b

Networking

Azure Blob Storage supports the use of various network routing components, networking protocols, and network security features. You can use these to access and secure each individual storage account in your environment based on the needs of your application workload or end client. You can set these options either when you create the storage account or at a later time as your needs evolve.

Network routing

Azure provides two routing methods for Azure Blob Storage service endpoints. You can select which method you want to use. The options are as follows:

  • Microsoft routing With Microsoft routing, traffic is routed from your endpoint to the closest Microsoft edge point of presence (POP), at which point it traverses the Microsoft global fiber backbone to the Azure Blob Storage endpoint. (See Figure 1-30.) This generally results in lower latency and better network performance, and this is the default option for most Azure services. This routing type also supports all methods of authentication covered in the previous section.

    FIGURE 1-30 Microsoft routing.

  • Internet routing With internet routing, most traffic is routed from the customer over the public internet until it reaches the Microsoft POP that is closest to the Azure Blob Storage endpoint. (See Figure 1-31.) This can result in higher latency and performance issues, depending on your ISP. However, this routing method does help lower networking costs.

FIGURE 1-31 Internet routing.

Network routing walkthrough

The following sections step you through the process of selecting network routing options using the Azure portal, Azure PowerShell, and Azure CLI.

Using Azure portal

To select network routing options using the Azure portal (in this example, internet routing), follow these steps:

  1. In the left pane of the Azure Blob Storage account page, under Security + Networking, click Networking.

  2. In the Firewalls and Virtual Networks tab on the Networking page (see Figure 1-32), under Network Routing, enter the following information. Then click the Save button near the top of the page:

    • Routing preference Select the Internet Routing option button.

    • Publish Route-Specific Endpoints Select the Internet Routing check box.

FIGURE 1-32 Setting up internet routing.

Using Azure PowerShell

Use the following Azure PowerShell code to set up network routing:

#Define required variables
$resourceGroup = "RG01"
$region = "eastus"
$storageaccname = "mbspblobstorage01"
$container = "container"

#Configure network routing options
Set-AzStorageAccount -ResourceGroupName $resourcegroup `
 -AccountName $storageaccname `
 -RoutingChoice InternetRouting `
 -PublishInternetEndpoint $true

Using Azure CLI

Use the following code to set up network routing from the Azure CLI:

#Define required variables
resourceGroup="RG01"
region="eastus"
storageaccname="mbspblobstorage01"
container="container"
directory="directory"
vnet="vNET01"
subnet="default"
endpointname="PrivateEndpoint"
vaultname="RecoveryServicesVault01"
#Configure network routing options
az storage account update \
    --name $storageaccname \
    --routing-choice InternetRouting \
    --publish-internet-endpoints true

Network File System (NFS) 3.0 protocol

Azure Blob Storage supports the use of Linux clients hosted in an Azure VM or in an on-premises datacenter to mount Azure Blob Storage containers using the Network File System (NFS) 3.0 protocol.

NFS 3.0 support also enables legacy application workloads, such as high-performance computing (HPC), to run in the cloud. HPC workloads generally require the use of NFS or SMB protocols to access data, along with hierarchical namespaces. This was generally not offered by cloud storage services, causing HPC clients to resist migration to the cloud. Azure Blob Storage introduced the ability to enable hierarchical namespaces in September 2021. This, combined with NFS protocol support, allows these legacy applications to run in the cloud.
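As a sketch of what the client side looks like, a Linux machine with network access to the account can mount a container with the standard mount command. This assumes the storage account was created with a hierarchical namespace and NFS 3.0 enabled, and it reuses the hypothetical names from earlier:

#Mount a blob container over NFS 3.0 (assumes NFS was enabled at account creation)
sudo mkdir -p /mnt/blobnfs
sudo mount -o sec=sys,vers=3,nolock,proto=tcp \
    mbspblobstorage01.blob.core.windows.net:/mbspblobstorage01/container /mnt/blobnfs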

SSH File Transfer Protocol (SFTP)

Azure Blob Storage has introduced support for the SSH File Transfer Protocol (SFTP), enabling clients to connect to an Azure Blob Storage account through an SFTP endpoint. This allows you to easily and securely share data with others, without requiring a server to host the SFTP service or integrating a third-party SFTP service on top of Azure Blob Storage.

You can create local user accounts on the storage to provide access over port 22. SFTP does require the storage account to use a hierarchical namespace. This enables you to create a structure of directories and subdirectories, similar to a file system on a Windows or Linux VM. You must define a home directory for each local user. This serves as their default folder when they connect. Then, depending on the container-level permissions you define, they will be able to navigate to the containers and access the data within.

Currently, there is no support for Azure Active Directory (Azure AD), Azure Active Directory Domain Services (Azure AD DS), or on-premises Active Directory with SFTP. The only method to set up access to storage over SFTP is to create local users with either a password or a Secure Shell (SSH) private key to connect and access the data. Passwords are auto-generated by Microsoft. Currently, custom user-provided passwords are not supported. You can enable both password and SSH public-private key options, and users can choose their preferred method to connect.
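As a sketch of the setup with the Azure CLI (the flags reflect recent CLI versions, and the local user name is hypothetical): enable SFTP on the account, create a local user scoped to a container, and connect using the <account>.<local-user> username format.

#Enable SFTP on the account (requires a hierarchical namespace)
az storage account update \
    --resource-group RG01 \
    --name mbspblobstorage01 \
    --enable-sftp true

#Create a local user with read/write access to one container (user name is hypothetical)
az storage account local-user create \
    --resource-group RG01 \
    --account-name mbspblobstorage01 \
    -n sftpuser01 \
    --home-directory container \
    --has-ssh-password true \
    --permission-scope permissions=rw service=blob resource-name=container

#Connect over SFTP; the username format is <account>.<local-user>
sftp mbspblobstorage01.sftpuser01@mbspblobstorage01.blob.core.windows.net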

Storage account firewall and virtual networks

You can configure an Azure storage account to restrict access to the public endpoint using a storage account firewall. The storage account firewall is effectively a network policy on the storage account that restricts access based on your defined access list. Using a storage account firewall, you can restrict access from a public endpoint to specific public IP addresses, specific public IP address ranges, an Azure virtual network, or a private endpoint.

Storage account firewall and virtual networks walkthrough

The following sections step you through the process of setting up a storage account firewall and virtual network restrictions using the Azure portal, Azure PowerShell, and Azure CLI.

Using Azure portal

To configure the storage account firewall using the Azure portal, follow these steps. You’ll first learn how to allow access to the storage account only to connections that originate from specific virtual networks and IP addresses. Then you’ll learn how to allow access to the storage account only to connections from a specific private endpoint:

  1. In the left pane of the Azure Blob Storage account page, under Security + Networking, click Networking.

  2. In the Firewalls and Virtual Networks tab on the Networking page (see Figure 1-33), under Public Network Access, select the Enabled from Selected Virtual Networks and IP Addresses option button.

    FIGURE 1-33 Setting up a storage account firewall.

  3. Under Virtual Networks, click Add Existing Virtual Network.

  4. In the Add Networks dialog box (see Figure 1-34), enter the following information and click OK:

    • Subscription Select the subscription that contains the virtual network subnets you want to add.

    • Virtual Networks Select the virtual network(s) you want to add.

    • Subnets Select the subnet(s) you want to add.

    FIGURE 1-34 The Add Networks dialog box.

  5. Back in the Firewalls and Virtual Networks tab on the Networking page (refer to Figure 1-33), enter the following information. Then click Save near the top of the page:

    • Firewall Select the Add Your Client IP address check box to allow your public IP address access to the storage.

    • Resource Instances To allow access by specific resource instances, select the instance type in the Resource Type list and the specific instance in the Instance Name list. (For this example, leave these blank.)

    • Exceptions Select any of the check boxes in this section if you want to allow access to the storage in certain cases. In this example, select Allow Azure Services on the Trusted Services List to Access This Storage Account.

    • Network Routing Preference Choose Microsoft Network Routing or Internet Routing. (Refer to the section “Network routing walkthrough” earlier in this chapter for more information.)

    • Publish Route-Specific Endpoints Optionally, specify whether route-specific endpoints should be published by selecting the Microsoft Network Routing and/or Internet Routing check box. (For this example, leave these unchecked.)

After you click Save, you’ll see the virtual network you added in the Virtual Networks section. (See Figure 1-35.) You can test access to the storage account from the selected virtual network. (You will have to provision a VM in that network and then connect to the storage from that VM.)

FIGURE 1-35 The virtual network you added appears on the list.

Using Azure PowerShell

Use the following Azure PowerShell code to set up a storage account firewall:

#Define required variables
$resourceGroup = "RG01"
$region = "eastus"
$storageaccname = "mbspblobstorage01"
$container = "container"
$vnet = "vNET01"
$subnet = "default"
#Setting up Storage account firewall
#Setup access from Subnet
Update-AzStorageAccountNetworkRuleSet -ResourceGroupName $resourcegroup -Name $storage-
account -DefaultAction Deny
Get-AzVirtualNetwork -ResourceGroupName $resourcegroup -Name $vnet | Set-AzVirtualNet-
workSubnetConfig -Name $subnet -AddressPrefix "10.0.0.0/24" -ServiceEndpoint "Microsoft.
Storage" | Set-AzVirtualNetwork

$subnet = Get-AzVirtualNetwork -ResourceGroupName $resourcegroup -Name $vnet | Get-AzVir-
tualNetworkSubnetConfig -Name $subnet
Add-AzStorageAccountNetworkRule -ResourceGroupName $resourcegroup -Name $storageaccount
-VirtualNetworkResourceId $subnet.Id

#Block Public access
Set-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccount -PublicNet-
workAccess Disabled

# Create a private link service connection to the storage account.
$privateEndpointConnection = New-AzPrivateLinkServiceConnection `
        -Name "$storageAccount-Connection" `
        -PrivateLinkServiceId $storageAccount.Id `
        -GroupId "blob" `
        -ErrorAction Stop

#Configure the private endpoint
$privateEndpoint = New-AzPrivateEndpoint -Name $endpointname `
-ResourceGroupName $resourcegroup `
-Location $region `
-Subnet $subnet `
-PrivateLinkServiceConnection $privateEndpointConnection
Using the Azure CLI

Use the following code to set up a storage account firewall from the Azure CLI:

#Define required variables
resourceGroup="RG01"
region="eastus"
storageaccname="mbspblobstorage01"
container="container"
directory="directory"
vnet="vNET01"
subnet="default"
endpointname="PrivateEndpoint"
#Setting up Storage account firewall for traffic from specific subnets
#Setup access from Subnet
subnetid=$(az network vnet subnet show \
        --resource-group $resourceGroup \
        --vnet-name $vnet \
        --name $subnet \
        --query "id" |
    tr -d '"')

az network vnet subnet update \
    --ids $subnetid \
    --service-endpoints Microsoft.Storage \
    --output none

az storage account network-rule add \
    --resource-group $resourceGroup \
    --account-name $storageaccname \
    --vnet-name $vnet \
    --subnet $subnet

az storage account update \
    --resource-group $resourceGroup \
    --name $storageaccname \
    --bypass "AzureServices" \
    --default-action "Deny" \
    --output none

#Block Public access
az storage account update \
    --resource-group $resourceGroup \
    --name $storageaccname \
    --bypass "AzureServices" \
    --default-action "Deny" \
    --public-network-access Disabled \
    --output none

# Create a private link service connection to the storage account
storageAccount=$(az storage account show \
        --resource-group $resourceGroup \
        --name $storageaccname \
        --query "id" |
    tr -d '"')

privateEndpoint=$(az network private-endpoint create \
        --resource-group $resourceGroup \
        --name "$storageaccname-PrivateEndpoint" \
        --location $region \
        --vnet-name $vnet \
        --subnet $subnet \
        --private-connection-resource-id $storageAccount \
        --group-id "blob" \
        --connection-name "$storageaccname-Connection" \
        --query "id" |
    tr -d '"')

Networking endpoints

Azure Blob Storage can be accessed over either the public internet or a private connection such as Azure ExpressRoute or VPN. Depending on your organization’s security and access requirements, your approach might involve the use of one or both methods.

Public endpoints

By default, Azure Blob Storage assets are accessible over the internet by way of a public endpoint over HTTPS. This makes it convenient to access the storage if you have an active internet connection. The public endpoint is in the format https://<storage-account-name>.blob.core.windows.net.

Because this traffic uses standard HTTPS (port 443), which is rarely blocked by internet service providers (ISPs) or organizational firewalls, this is the easiest method for accessing the storage. However, some organizations consider this to be insecure because the storage is accessible over a public endpoint. In such scenarios, organizations can use private endpoints, discussed next.

Private endpoints

In brief, private endpoints provide the ability to assign a private or internal IP to the Azure Blob Storage asset and make it accessible over an Azure ExpressRoute, Azure peering, or Azure VPN connection. The private endpoint is in the format https://<storage-account-name>.privatelink.blob.core.windows.net.
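A quick way to confirm a private endpoint is in use is to resolve the account's name from inside the virtual network; with private DNS zone integration in place, the public name should resolve through the privatelink record to the endpoint's private IP. A sketch, with hypothetical names:

#From a VM inside the virtual network
nslookup mbspblobstorage01.blob.core.windows.net
#Expect a CNAME to mbspblobstorage01.privatelink.blob.core.windows.net,
#followed by the private IP assigned to the endpoint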

Private endpoints walkthrough

The following sections step you through the process of creating a private endpoint using the Azure portal, Azure PowerShell, and the Azure CLI.

Using Azure portal

To configure the private endpoint using the Azure portal, you must first disallow access to the storage account from the public network. Then you create a private endpoint and allow connections only from that specific private endpoint. Follow these steps:

  1. In the Firewalls and Virtual Networks tab on the Networking page, under Public Network Access, select the Disabled option button. Then click Save. (See Figure 1-36.)

    FIGURE 1-36 Disabling public network access.

  2. Click the Private Endpoint Connections tab. Then click Private Endpoint near the top of the page. (See Figure 1-37.)

    FIGURE 1-37 Setting up a private endpoint.

  3. In the Basics tab of the Create a Private Endpoint wizard (see Figure 1-38), enter the following information and click Next:

    • Subscription Select the subscription in which you want to create the private endpoint.

    • Resource group Select an existing resource group or create a new one in which to create the private endpoint.

    • Name Enter a unique name for the private endpoint.

    • Network Interface Name Enter a unique name for the private endpoint network interface.

    • Region Select the Azure region where you want to host the private endpoint. This should be the same region as the Azure Blob Storage account.

    FIGURE 1-38 The Basics tab of the Create a Private Endpoint wizard.

  4. On the Resource tab (see Figure 1-39), in the Target Sub-Resource drop-down list, select the storage account type—in this case, Blob. Then click Next.

    FIGURE 1-39 The Resource tab of the Create a Private Endpoint wizard.

  5. On the Virtual Network tab (see Figure 1-40), enter the following information and click Next:

    • Virtual Network Select the virtual network on which you want to create the private endpoint.

    • Subnet Select the subnet on which you want to create the private endpoint.

    • Private IP Configuration Select the Dynamically Allocate IP Address or Statically Allocate IP Address option button, depending on your needs.

    • Name Enter a unique name for the private endpoint.

    • Private IP Enter an IP address for the private endpoint.

    • Application Security Group Leave this blank (the default).

    FIGURE 1-40 The Virtual Network tab of the Create a Private Endpoint wizard.

  6. On the DNS tab (see Figure 1-41), enter the following information and click Next:

    • Integrate with Private DNS Zone Select the Yes option button.

    • Subscription Select the subscription to use for the private DNS zone.

    • Resource Group Select the resource group to use to create the private DNS zone.

    FIGURE 1-41 The DNS tab of the Create a Private Endpoint wizard.

  7. In the Tags tab (see Figure 1-42), add any tags you want to associate with the private endpoint, and click Next.

    FIGURE 1-42 The Tags tab of the Create a Private Endpoint wizard.

  8. In the Review + Create tab (see Figure 1-43), review your settings. Then click Create to create the private endpoint.

    FIGURE 1-43 The Review + Create tab of the Create a Private Endpoint wizard.

  9. After the private endpoint is created, click Go to Resource to access it. (See Figure 1-44.)

    FIGURE 1-44 Accessing the new private endpoint.

    The new private endpoint appears in the Networking page’s Private Endpoint Connections tab with its Connection State listed as Approved. (See Figure 1-45.)

    FIGURE 1-45 The new private endpoint appears in the Private Endpoint Connections tab of the Networking page.

  10. Click the private endpoint to open its Overview page.

  11. On the right side of the pane, click the Network Interface link. (See Figure 1-46.)

    FIGURE 1-46 The private endpoint’s Overview page.

    The network interface’s page (see Figure 1-47) contains the private IP address that you assigned earlier to the network interface. You can now use the private IP address to connect to the storage from a network connected to that subnet.

FIGURE 1-47 The private endpoint’s network interface page.

Using Azure PowerShell

Use the following Azure PowerShell code to create a private endpoint:

#Define required variables
$resourceGroup = "RG01"
$region = "eastus"
$storageaccname = "mbspblobstorage01"
$container = "container"
$vnetname = "vNET01"
$subnetname = "default"
$endpointname = "PrivateEndpoint"

#Block public access
Set-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageaccname `
    -PublicNetworkAccess Disabled

#Retrieve the storage account and the target subnet
$storageAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroup `
    -Name $storageaccname
$vnet = Get-AzVirtualNetwork -ResourceGroupName $resourceGroup -Name $vnetname
$subnet = $vnet.Subnets | Where-Object { $_.Name -eq $subnetname }

# Create a private link service connection to the storage account.
$privateEndpointConnection = New-AzPrivateLinkServiceConnection `
    -Name "$storageaccname-Connection" `
    -PrivateLinkServiceId $storageAccount.Id `
    -GroupId "blob" `
    -ErrorAction Stop

#Configure the private endpoint
$privateEndpoint = New-AzPrivateEndpoint -Name $endpointname `
    -ResourceGroupName $resourceGroup `
    -Location $region `
    -Subnet $subnet `
    -PrivateLinkServiceConnection $privateEndpointConnection

Using Azure CLI

Use the following code to create a private endpoint in the Azure CLI:

#Define required variables
resourceGroup="RG01"
region="eastus"
storageaccname="mbspblobstorage01"
container="container"
directory="directory"
vnet="vNET01"
subnet="default"
endpointname="PrivateEndpoint"

#Block public access
az storage account update \
    --resource-group $resourceGroup \
    --name $storageaccname \
    --bypass "AzureServices" \
    --default-action "Deny" \
    --public-network-access Disabled \
    --output none

# Retrieve the storage account resource ID
storageAccount=$(az storage account show \
        --resource-group $resourceGroup \
        --name $storageaccname \
        --query "id" |
    tr -d '"')

# Create the private endpoint with a connection to the blob sub-resource
privateEndpoint=$(az network private-endpoint create \
        --resource-group $resourceGroup \
        --name "$storageaccname-PrivateEndpoint" \
        --location $region \
        --vnet-name $vnet \
        --subnet $subnet \
        --private-connection-resource-id $storageAccount \
        --group-id "blob" \
        --connection-name "$storageaccname-Connection" \
        --query "id" |
    tr -d '"')

Storage access tiers

Azure Blob Storage provides three storage tiers to help you structure your data storage to achieve an optimal balance between use and cost. Each tier has different data storage costs, transaction costs, and data access costs. The tiers are as follows:

  • Hot In this tier, data is available online at all times. This tier is optimal for data that must be frequently accessed and modified. It has the highest data storage costs but the lowest data access costs. It is optimized for high numbers of transactions, making it ideal for active applications, active logs, data processing, and so on. All storage redundancy options support hot tiers, as do block, append, and page blobs.

  • Cool Data on the cool tier is online at all times but slightly less available than data on the hot tier. However, durability, retrieval latency, and throughput on the cool tier are the same as on the hot tier. This tier is ideal for storing processed data that will be accessed infrequently after processing, short-term backups that must be restored in a timely manner, data stored for disaster recovery purposes, and so on. Data storage costs are lower than with the hot tier, but access costs are higher. Data in the cool tier must be stored for a minimum of 30 days. All storage redundancy options support cool tiers. Block and append blobs support cool tiers, but page blobs do not.

  • Archive This is an offline tier optimized to store data that is rarely accessed. Compared to the hot and cool tiers, it has the lowest storage costs but the highest data access costs. It is ideal for long-term backups and for preserving data for compliance or historical reference purposes, or other data that does not require immediate access. You must store data in the archive tier for a minimum of 180 days; otherwise, you will be charged early deletion fees. Archive tiers are supported only on storage accounts configured for LRS, GRS, and RA-GRS. There is currently no support for this tier when using ZRS, GZRS, or RA-GZRS. Block and append blobs support archive tiers, but page blobs do not.

Although these are three distinct access tiers, you can store data in a storage account across these three tiers without any limits for each individual tier. Storage capacity is defined at the storage account level, and the tiers within can be used in any combination, as long as they are within the overall limits available for the storage account.
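
You can also place a blob directly into a specific tier at upload time. The following Azure PowerShell sketch uses the -StandardBlobTier parameter of Set-AzStorageBlobContent; the file path and names are illustrative:

#Upload a blob directly into the cool tier
$storagectx = (Get-AzStorageAccount -ResourceGroupName "RG01" `
    -Name "mbspblobstorage01").Context
Set-AzStorageBlobContent -File "C:\Data\TextFile01.txt" `
    -Container "container" `
    -Blob "TextFile01.txt" `
    -StandardBlobTier Cool `
    -Context $storagectx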

Early deletion fees

If you remove data from the cool or archive tier before its minimum period has elapsed—30 days for the cool tier and 180 days for the archive tier—you will be charged early deletion fees for the remaining period. For example, if you remove a blob stored in the archive tier after just 30 days, you will be charged an early deletion fee for the remaining 150 days. Early deletion fees are equivalent to the fees you would have paid to store the blob in that tier for the remaining 150 days. You must carefully plan your storage strategy before moving data into these tiers to avoid these charges.
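
To make the math concrete, here is a quick back-of-the-envelope calculation in PowerShell. The per-GB monthly price is a hypothetical placeholder, not a current Azure rate:

#Early deletion fee for a 100 GB blob removed from the archive tier after 30 days
$archivePricePerGBMonth = 0.002          #assumed price, USD per GB per month
$blobSizeGB = 100
$remainingDays = 180 - 30                #150 days left of the 180-day minimum
$fee = $archivePricePerGBMonth / 30 * $remainingDays * $blobSizeGB
$fee                                     #the charge for the remaining 150 days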

Default access tier configuration

If you upload a blob to a storage account without specifying which tier that blob should use, a tier is applied to it automatically. The default tier for any new standard general purpose v2 storage account is hot, but you can change this to cool when you create the storage account or at some later point in time.

If you change the default access tier for a storage account after you have already uploaded blobs into that account, the new tier will be applied to those existing blobs. This can result in significant transaction charges. So again, it is important to plan this configuration in advance if you can.

Storage access tier walkthrough

The following sections step you through the process of setting up a default access tier or changing the access tier using the Azure portal, Azure PowerShell, and Azure CLI.

Using Azure portal

To set up a default blob access tier or change the access tier of an existing blob using the Azure portal, follow these steps:

  1. In the left pane of the Azure Blob Storage account page, under Settings, click Configuration.

  2. Near the bottom of the Configuration page, under Blob Access Tier (Default), select the Cool or Hot option button. (See Figure 1-48.) Then click Save near the top of the page.

    FIGURE 1-48 Setting up the default access tier.

    The next steps show you how to change an existing blob’s access tier.

  3. Navigate to the page for the container in which the blob whose tier you want to change is stored.

  4. Select the check box next to the name of the blob whose access tier you want to change. Then click Change Access Tier. (See Figure 1-49.)

    FIGURE 1-49 Changing an individual blob’s access tier.

  5. In the Change Tier dialog box (see Figure 1-50), click the Access Tier drop-down list and choose a different access tier. Then click Save.

    FIGURE 1-50 The Change Tier dialog box.

    The blob’s entry in the container page reflects the change. (See Figure 1-51.)

FIGURE 1-51 The blob’s access tier has been changed.

Using Azure PowerShell

Use the following Azure PowerShell code to assign a storage access tier:

#Define required variables
$resourceGroup = "RG01"
$region = "eastus"
$storageaccname = "mbspblobstorage01"
$container = "container"
$blob = "TextFile01.txt"

#Set default access tier
Set-AzStorageAccount -ResourceGroupName $resourceGroup `
    -Name $storageaccname `
    -AccessTier Cool

#Change blob access tier
$storagectx = (Get-AzStorageAccount `
        -ResourceGroupName $resourceGroup `
        -Name $storageaccname).Context

$blobtochange = Get-AzStorageBlob -Container $container -Blob $blob -Context $storagectx
$blobtochange.BlobClient.SetAccessTier("Archive", $null, "Standard")

Using Azure CLI

Use the following code to assign a storage access tier from the Azure CLI:

#Define required variables
resourceGroup="RG01"
region="eastus"
storageaccname="mbspblobstorage01"
container="container"
directory="directory"

#Set default access tier
az storage account update \
    --resource-group $resourceGroup \
    --name $storageaccname \
    --access-tier Cool

#Change blob access tier
az storage blob set-tier \
    --account-name $storageaccname \
    --container-name $container \
    --name TextFile01.txt \
    --tier Archive

Blob lifecycle management

Azure Blob Storage supports blob lifecycle management, which is the ability to define rule-based policies that move data between access tiers based on certain conditions. Conditions can include the time of last modification, the time of last access, and the date of creation. Once your defined conditions are met, you can set up the storage account to do one of the following:

  • Move the blob to the cool tier.

  • Move the blob to the archive tier.

  • Delete the blob.

  • Move the blob to the cool tier, but move it back to the hot tier if it is accessed.

With these policies, you can automatically store data for long-term retention across the different tiers based on your organization's compliance requirements, and remove data that the organization no longer needs.

You can also define these policies using filters to identify data based on the first few characters of a blob name, a specific blob container, or a combination of the two. You can also define whether to target block blobs, append blobs, or both, and whether to include blob snapshots and versions in the targets or limit the rule to just the base blob.

When moving a blob between the cool and archive tiers, keep these points in mind:

  • In a storage account whose default tier is cool, any blob that inherits this tier when it is uploaded will be automatically moved to the archive tier according to the lifecycle management policy, and it will not be subjected to any early deletion charge.

  • A blob whose access tier has been explicitly set to cool (either upon upload or at some later time) will be subject to early deletion charges if it is moved to the archive tier according to the lifecycle management policy before its minimum period has elapsed.
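
For example, the following Azure PowerShell sketch builds a rule scoped by a name prefix; the prefix and day counts are illustrative, and the cmdlets are the same management policy cmdlets used in the walkthrough later in this section. Blobs in the container named container whose names begin with logs are tiered to cool after 30 days and deleted after 365 days:

#Build a two-step action: tier to cool at 30 days, delete at 365 days
$lcaction = Add-AzStorageAccountManagementPolicyAction `
    -BaseBlobAction TierToCool `
    -daysAfterModificationGreaterThan 30
$lcaction = Add-AzStorageAccountManagementPolicyAction -InputObject $lcaction `
    -BaseBlobAction Delete `
    -daysAfterModificationGreaterThan 365

#Limit the rule to block blobs whose names start with container/logs
$lcfilter = New-AzStorageAccountManagementPolicyFilter `
    -PrefixMatch "container/logs" `
    -BlobType blockBlob

$lifecyclerule = New-AzStorageAccountManagementPolicyRule -Name prefix-rule `
    -Action $lcaction `
    -Filter $lcfilter

Set-AzStorageAccountManagementPolicy -ResourceGroupName "RG01" `
    -StorageAccountName "mbspblobstorage01" `
    -Rule $lifecyclerule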

Blob lifecycle management walkthrough

The following sections step you through the process of setting up blob lifecycle management policies using the Azure portal and Azure PowerShell.

Using Azure portal

To set up a blob lifecycle management policy using the Azure portal, follow these steps:

  1. In the left pane of the Azure Blob Storage account page, under Data Management, click Lifecycle Management.

  2. In the Lifecycle Management page, click Add a Rule. (See Figure 1-52.)

    FIGURE 1-52 Setting up blob data lifecycle management.

  3. In the Details tab of the Add a Rule wizard (see Figure 1-53), enter the following information and click Next:

    • Rule Name Enter a unique name for the lifecycle management policy.

    • Rule Scope Specify whether the new policy applies to all blobs in this storage account or to a subset of blobs based on criteria you select. For the sake of example, select Apply Rule to All Blobs in Your Storage Account.

    • Blob Type Specify the blob types to which the policy should apply. In this case, select Block Blobs.

    • Blob Subtype Specify the blob subtypes to which the policy should apply. In this example, choose Base Blobs.

    FIGURE 1-53 The Details tab of the Add a Rule wizard.

  4. In the Base Blobs tab of the Add a Rule wizard (see Figure 1-54), enter the following information and click Add:

    • Base Blobs Were Select the Last Modified or Created option button.

    • More Than (Days Ago) Enter the number of days after which you want the policy to apply.

    • Then Select the action to perform when the criteria have been met.

FIGURE 1-54 The Base Blobs tab of the Add a Rule wizard.

Using Azure PowerShell

Use the following Azure PowerShell code to set up a blob lifecycle management policy:

#Define required variables
$resourceGroup = "RG01"
$storageaccname = "mbspblobstorage01"

# Create a new action object.
$lcaction = Add-AzStorageAccountManagementPolicyAction `
    -BaseBlobAction TierToArchive `
    -daysAfterModificationGreaterThan 90

# Create a new rule object.
$lifecyclerule1 = New-AzStorageAccountManagementPolicyRule -Name sample-rule `
    -Action $lcaction

# Create the policy.
Set-AzStorageAccountManagementPolicy -ResourceGroupName $resourcegroup `
    -StorageAccountName $storageaccname `
    -Rule $lifecyclerule1

Storage reservations

In the same way you can reserve Azure VMs, you can reserve storage capacity at a discounted rate to optimize your storage accounts. This can bring significant cost savings. You can reserve storage either one year in advance or three years in advance. You can purchase reservations in units of either 100 TiB or 1 PiB per month, for the selected one- or three-year periods. The reservation applies to a fixed amount of storage capacity; any data stored beyond that capacity is charged at the normal rate.

You can exchange or cancel a reservation if your needs change. Microsoft reserves the right to charge an early deletion penalty if you cancel your reservation before its end date, but it will refund at least some portion of your money (currently up to USD 50,000 per year). In case of an exchange, Microsoft issues a prorated refund as a credit, which you must apply to another storage reservation of an equal or higher value. For cancellations, the prorated refund is issued in whatever form of payment you used to purchase the reservation.

Static website hosting

You can leverage Azure Blob Storage to host a static content website that contains HTML, CSS, JavaScript, and image files, without additional web server infrastructure. When you enable the static website hosting feature on an Azure Blob Storage account, Azure creates a $web container in the storage account to host all the web server content. You can then use tools such as Visual Studio, the Azure portal, Azure PowerShell, or Azure CLI to upload your web content files to the $web container and render your site.
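
For example, you can enable the feature and upload a first page with Azure PowerShell. This is a minimal sketch; the document names and local file path are illustrative:

#Enable static website hosting on the storage account
$storagectx = (Get-AzStorageAccount -ResourceGroupName "RG01" `
    -Name "mbspblobstorage01").Context
Enable-AzStorageStaticWebsite -Context $storagectx `
    -IndexDocument "index.html" `
    -ErrorDocument404Path "404.html"

#Upload content to the $web container created by the feature
Set-AzStorageBlobContent -File "C:\Site\index.html" `
    -Container '$web' `
    -Blob "index.html" `
    -Properties @{ ContentType = "text/html" } `
    -Context $storagectx

Once enabled, the site is served from the storage account's web endpoint rather than the blob endpoint.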

Although this is a great way to start hosting a static website, there are limitations to this approach. You cannot configure host headers for your website, and neither authentication nor authorization is supported. Keep this in mind before you use this feature.

Data protection

In addition to the storage account redundancy options covered earlier in this chapter to help you to maintain or recover access to an Azure Blob Storage account in case of a local, regional, or geographical outage, Microsoft provides other features to help you quickly recover individual containers or blobs. These features include soft delete for containers and blobs and blob versioning. Other related tools and techniques include the blob change feed, point-in-time restores, integration with Azure Backup, and blob snapshots. These features can help you quickly identify, react to, and recover from malicious or accidental deletion or corruption of data. It is critical that you understand and use each of these features in your environment, as they offer an extremely cost-effective method of data protection.

Soft delete for containers and blobs

Soft delete protects your containers and blobs by allowing you to recover them quickly if they are accidentally or maliciously deleted. It works by maintaining the deleted data in the system for the period of time that you specify. This retention period can range from 1 to 365 days. You can define this period for containers and blobs independently of each other to align with your overall data recovery strategy. Within the retention period, you can recover containers and blobs using the Azure portal, Azure PowerShell, Azure CLI, or REST APIs. After the retention period elapses, the data is permanently deleted and cannot be restored unless it has been stored on a separate backup.

When you restore a container, all blobs, snapshots, and versions associated with that container are also restored. However, container soft delete does not let you restore an individual blob if its parent container was not itself deleted. To be able to restore individual blobs, blob snapshots, or versions when the parent container still exists, you must also configure soft delete for blobs. You can also soft-delete just a blob snapshot without deleting the base blob.
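
As an illustration, the following Azure PowerShell sketch lists the soft-deleted blobs in a container and undeletes the first one; the names are illustrative, and blob soft delete is assumed to be enabled on the account:

#List soft-deleted blobs alongside active ones
$storagectx = (Get-AzStorageAccount -ResourceGroupName "RG01" `
    -Name "mbspblobstorage01").Context
$deleted = @(Get-AzStorageBlob -Container "container" -IncludeDeleted `
    -Context $storagectx | Where-Object { $_.IsDeleted })

#Undelete restores the blob (and its snapshots) in place
$deleted[0].BlobBaseClient.Undelete()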

Blob versioning

You can enable blob versioning to automatically maintain previous versions of a blob. With blob versioning, a new version of the blob is created each time a write operation occurs on the blob. Each version is maintained with a version ID, which identifies the point-in-time state of that blob.

When blob versioning is enabled, you can access earlier versions of a blob to recover your data if it is modified or deleted. You can read or delete a specific version of the blob by providing its version ID, but blob versions themselves cannot be modified. Each version ID is unique; its value is derived from the timestamp of the write operation that created that version.
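
For example, you can enumerate a blob's versions and download a specific one with Azure PowerShell. This is a minimal sketch, assuming versioning is already enabled and using illustrative names and paths:

#List all versions of a blob; version IDs sort chronologically
$storagectx = (Get-AzStorageAccount -ResourceGroupName "RG01" `
    -Name "mbspblobstorage01").Context
$versions = Get-AzStorageBlob -Container "container" -Blob "TextFile01.txt" `
    -IncludeVersion -Context $storagectx | Sort-Object VersionId

#Download the oldest version to a local file for inspection
$versions[0].BlobClient.DownloadTo("C:\Restore\TextFile01_oldest.txt")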

Blob change feed

The blob change feed was brought into preview by Microsoft in November 2019 and became generally available in September 2020. This feature tracks blob changes in a guaranteed, ordered, and durable manner.

The blob change feed details every change taking place on the blob or metadata in that storage account in order. The feed is written to the storage account under a special $blobchangefeed container and ensures that the data is durable, immutable, and read-only. This makes it reliable for compliance and auditing purposes. (See Figure 1-55.)

FIGURE 1-55 Blob change feed events.

You can read the blob change feed as a stream or in batch mode. To do this, you develop client applications that use the blob change feed processor library provided with the change feed processor SDK. In this way, you can build low-cost, scalable solutions to process and manage change events.

Logs from the blob change feed are retained for a retention period that you define, so you can maintain them as long as necessary. The logs are charged at standard Azure Blob Storage rates. Because all the logs are maintained in the storage container, you can consume the logs in a synchronous or asynchronous manner, in single or multiple parallel streams from different applications. This enables you to integrate the logs with monitoring as well as compliance and analytical applications at the same time. You can monitor the logs for specific events that trigger automated actions or workflows.

Point-in-time restore

Point-in-time restore allows you to recover block blob data from a specific date and time in the event of accidental or malicious deletion or of data corruption caused by an application error or other means. In other words, you can restore all data in a container to a specific point in time before the deletion or corruption event. You can also use this feature in testing scenarios, where you can roll the storage back to undo changes made during a test run.

At this time, point-in-time restore is supported only on standard general purpose v2 storage accounts. In addition, you cannot use point-in-time restore for data in the archive tier; only data in hot and cool tiers can be recovered in this way.

When you set up point-in-time restore, you configure a retention period measured in days. For point-in-time restore to work, blob soft delete, blob versioning, and the blob change feed must also be enabled and configured with a retention period that is higher than the point-in-time restore threshold. After you have configured all these features, any blobs existing within their defined scope are subject to these restoration features. You cannot restore blobs to a point-in-time prior to the configuration of these features.

When performing a restore, you can do one of two things:

  • Define no specific ranges of containers or blob names, in which case everything will be restored.

  • Specify lexicographical ranges of container and blob names to target for the restoration. You can define up to 10 such ranges in a single restore operation. You can perform multiple restore operations one at a time.

When you initiate a point-in-time restore, not all data is necessarily restored. Based on the restoration criteria you select, only data that has changed since the selected restore point is restored; this analysis is performed by the restore operation. In addition, if any blob fails to restore, the entire restore operation fails.

All read and write operations are temporarily paused for any storage container that is within the scope of the restore operation. Once the restoration operation completes or fails, read and write operations automatically resume.
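
For reference, the following Azure PowerShell sketch initiates a restore over a single lexicographical range; the range boundaries and the two-day restore point are illustrative:

#Restore all blobs between container/a and container/z to their state two days ago
$range = New-AzStorageBlobRangeToRestore -StartRange "container/a" `
    -EndRange "container/z"
Restore-AzStorageBlobRange -ResourceGroupName "RG01" `
    -StorageAccountName "mbspblobstorage01" `
    -TimeToRestore (Get-Date).AddDays(-2).ToUniversalTime() `
    -BlobRestoreRange $range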

Data protection walkthrough

The following sections step you through the process of setting up data protection options using the Azure portal, Azure PowerShell, and Azure CLI.

 
Using Azure portal

To set up data protection options using the Azure portal, follow these steps:

  1. In the left pane of the Azure Blob Storage account page, under Data Management, click Data Protection.

  2. On the Data Protection page (see Figure 1-56), enter the following information and click Save:

    • Enable Point-in-Time Restore for Containers Select this check box.

    • Maximum Restore Point (Days Ago) Enter the number of days to retain point-in-time restore snapshots.

    • Enable Soft Delete for Blobs Select this check box.

    • Keep Deleted Blobs for (in Days) Enter the number of days to retain deleted blobs.

    • Enable Soft Delete for Containers Select this check box.

    • Keep Deleted Containers for (in Days) Enter the number of days to retain deleted containers.

    • Enable Blob Change Feed Leave this set at Keep All Logs (the default).

    FIGURE 1-56 Data protection configuration options.

  3. In the left pane of the Azure Blob Storage account page, under Data Storage, click Containers.

    The Containers page now contains an entry for a $blobchangefeed container. (See Figure 1-57.) This container contains the logs for the blob change feed.

    FIGURE 1-57 Notice the $blobchangefeed container.

  4. To view blob versions, click a blob in the Containers page.

    The blob’s page opens. If versions of this blob have been saved, you will see them here. (See Figure 1-58.) You can select a version to download, restore, or delete.

FIGURE 1-58 Blob versions.

Using Azure PowerShell

Use the following Azure PowerShell code to set up data protection options:

#Define required variables
$resourceGroup = "RG01"
$region = "eastus"
$storageaccname = "mbspblobstorage01"
$container = "container"

#Enable soft delete for containers
Enable-AzStorageContainerDeleteRetentionPolicy `
    -ResourceGroupName $resourcegroup `
    -StorageAccountName $storageaccname `
    -RetentionDays 30

#Enable soft delete for blobs
Enable-AzStorageBlobDeleteRetentionPolicy `
    -ResourceGroupName $resourcegroup `
    -StorageAccountName $storageaccname `
    -RetentionDays 30

#Enable blob versioning and the change feed
Update-AzStorageBlobServiceProperty `
    -ResourceGroupName $resourcegroup `
    -StorageAccountName $storageaccname `
    -EnableChangeFeed $true `
    -IsVersioningEnabled $true

#Enable point-in-time restore
Enable-AzStorageBlobRestorePolicy `
    -ResourceGroupName $resourcegroup `
    -StorageAccountName $storageaccname `
    -RestoreDays 29

Using Azure CLI

Use the following code to set up data protection from the Azure CLI:

#Define required variables
resourceGroup="RG01"
region="eastus"
storageaccname="mbspblobstorage01"
container="container"
directory="directory"
vnet="vNET01"
subnet="default"
endpointname="PrivateEndpoint"
vaultname="RecoveryServicesVault01"

#Enable soft delete for containers
az storage account blob-service-properties update \
    --account-name $storageaccname \
    --resource-group $resourceGroup \
    --enable-container-delete-retention true \
    --container-delete-retention-days 30

#Enable soft delete for blobs
az storage account blob-service-properties update \
    --account-name $storageaccname \
    --resource-group $resourceGroup \
    --enable-delete-retention true \
    --delete-retention-days 30

#Enable blob versioning
az storage account blob-service-properties update \
    --resource-group $resourceGroup \
    --account-name $storageaccname \
    --enable-versioning true

#Enable the change feed and point-in-time restore
az storage account blob-service-properties update \
    --resource-group $resourceGroup \
    --account-name $storageaccname \
    --enable-change-feed true \
    --enable-restore-policy true \
    --restore-days 29

Azure Backup integration

You can use the cloud-native Azure Backup service to perform short-term and long-term backups of your Azure storage containers. You can define custom retention policies for different containers based on the requirements of each container. The Azure Backup service allows for seamless backup integration. You can create backup retention policies that control how long the backups are retained. Azure Backup integration also provides backup monitoring, alerting, and reporting capabilities, making it easier for administrators to manage all blob storage backups in one single location.

The primary backup method available for production use at this time is continuous backup, which performs backups as changes are made. All backups are stored in the source storage account and do not require transfer to a backup vault. Because the backups are continuous, there is no need to define a backup schedule.

Continuous backup uses a feature called operational backup. Operational backup provides point-in-time restore capability by leveraging soft delete, change feed, and versioning features to store data based on a retention policy that you define. In addition, a lock is placed on the storage account to prevent the account’s accidental or unauthorized deletion. All these features are automatically configured when you set up this feature.

This backup can protect blobs only—not containers themselves. If you delete an entire container, the backup will not be able to restore that container. It is therefore recommended that you set up soft delete for containers you want to protect in addition to this operational backup.

When performing data restores, you can restore all the block blobs, individual blobs, or a subset of blobs (using a prefix) to the source storage account. Microsoft centralizes backup management through the Backup Center. There, you can manage backups and restores for all storage accounts from a single location.

Azure Backup integration walkthrough

The following sections step you through the process of integrating Azure Backup with an Azure Blob Storage account using the Azure portal, Azure PowerShell, and Azure CLI.

Using Azure portal

To set up Azure Blob Storage to use Azure Backup using the Azure portal, follow these steps:

  1. In the left pane of the Azure Blob Storage account page, under Data Management, click Data Protection.

  2. Near the top of the Data Protection page (see Figure 1-59), under Recovery, select the Enable Operational Backup with Azure Backup check box.

    FIGURE 1-59 Integrating Azure Backup with your Azure Blob Storage account.

  3. Click the Backup Vault drop-down list and select an existing backup vault—or, as in this example, the Add New link under the Backup Vault box. (See Figure 1-60.)

    FIGURE 1-60 Enable operational backup with Azure Backup.

  4. In the Create a New Backup Vault dialog box (see Figure 1-61), enter the following information and click Create:

    • Vault Name Enter a unique name for the backup vault.

    • Resource Group Select an existing resource group or create a new one in which to create the backup vault.

    • Backup Storage Redundancy Select the redundancy level of backup storage. (Note that you cannot change the redundancy level after you create a backup, so choose carefully.)

    FIGURE 1-61 Creating a new backup vault.

  5. Back on the Data Protection page, click the Backup Policy drop-down list and select an existing backup policy. Or, as in this example, click the Add New link under the Backup Policy box. (See Figure 1-62.)

    FIGURE 1-62 Creating a new backup policy.

  6. In the Create a New Backup Policy dialog box (see Figure 1-63), enter the following information and click Create:

    • Policy Name Enter a unique name for the backup policy.

    • Retention (in Days) Indicate how long backups should be retained.

    FIGURE 1-63 The Create a New Backup Policy dialog box.

  7. Back on the Data Protection page, click Save.

Using Azure PowerShell

Use the following Azure PowerShell code to set up a storage backup:

#Define required variables
$resourceGroup = "RG01"
$region = "eastus"
$storageaccname = "mbspblobstorage01"
$container = "container"
$vnet = "vNET01"
$subnet = "default"
$endpointname = "PrivateEndpoint"
$vaultname = "BackupVault01"

#Setting up Azure Backup integration
#Create the backup vault
$bkpvaultstorage = New-AzDataProtectionBackupVaultStorageSettingObject `
    -Type GeoRedundant `
    -DataStoreType VaultStore
New-AzDataProtectionBackupVault -ResourceGroupName $resourceGroup `
    -VaultName $vaultname `
    -Location $region `
    -StorageSetting $bkpvaultstorage

#Create the backup policy
$policyDefinition = Get-AzDataProtectionPolicyTemplate -DatasourceType AzureBlob
$BkpPolicy = New-AzDataProtectionBackupPolicy `
    -ResourceGroupName $resourceGroup `
    -VaultName $vaultname `
    -Name "DefaultBlobBackupPolicy" `
    -Policy $policyDefinition

#Enable backup
$storageaccount = Get-AzStorageAccount -ResourceGroupName $resourceGroup `
    -Name $storageaccname
$bkp = Initialize-AzDataProtectionBackupInstance -DatasourceType AzureBlob `
    -DatasourceLocation $region `
    -PolicyId $BkpPolicy[0].Id `
    -DatasourceId $storageaccount.Id
New-AzDataProtectionBackupInstance -ResourceGroupName $resourceGroup `
    -VaultName $vaultname `
    -BackupInstance $bkp

Using Azure CLI

Use the following code to set up a storage backup from the Azure CLI:

#Define required variables
resourceGroup="RG01"
region="eastus"
storageaccname="mbspblobstorage01"
container="container"
directory="directory"
vnet="vNET01"
subnet="default"
endpointname="PrivateEndpoint"
vaultname="BackupVault01"

#Setting up Azure Backup integration
#Create the backup vault (blob operational backup requires a Data
#Protection backup vault rather than a Recovery Services vault)
az dataprotection backup-vault create \
    --resource-group $resourceGroup \
    --vault-name $vaultname \
    --location $region \
    --storage-setting "[{type:'GeoRedundant',datastore-type:'VaultStore'}]"

#Create the backup policy from the default blob policy template
az dataprotection backup-policy get-default-policy-template \
    --datasource-type AzureBlob > defaultpolicy.json

#defaultpolicy.json now contains the following template:
{
  "datasourceTypes": [
    "Microsoft.Storage/storageAccounts/blobServices"
  ],
  "name": "DefaultBlobPolicy",
  "objectType": "BackupPolicy",
  "policyRules": [
    {
      "isDefault": true,
      "lifecycles": [
        {
          "deleteAfter": {
            "duration": "P30D",
            "objectType": "AbsoluteDeleteOption"
          },
          "sourceDataStore": {
            "dataStoreType": "OperationalStore",
            "objectType": "DataStoreInfoBase"
          }
        }
      ],
      "name": "Default",
      "objectType": "AzureRetentionRule"
    }
  ]
}

az dataprotection backup-policy create \
    --resource-group $resourceGroup \
    --vault-name $vaultname \
    -n "DefaultBlobBackupPolicy" \
    --policy defaultpolicy.json

#Create the backup instance JSON - replace the subscription and resource
#values in the IDs below before proceeding
az dataprotection backup-instance initialize \
    --datasource-type AzureBlob \
    -l $region \
    --policy-id "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/RG-01/providers/Microsoft.DataProtection/backupVaults/BackupVault01/backupPolicies/DefaultBlobBackupPolicy" \
    --datasource-id "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/RG-01/providers/Microsoft.Storage/storageAccounts/mbspblobstorage01" > backup_instance.json

#Enable backup
az dataprotection backup-instance create \
    --resource-group $resourceGroup \
    --vault-name $vaultname \
    --backup-instance backup_instance.json

Blob snapshots

A blob snapshot is a point-in-time read-only copy of a blob that you create and store in the hot or cool tier. The snapshot is identical to the base blob, except it is a point-in-time copy, allowing you to access that version of the blob. A snapshot copies all the system properties and metadata from the base blob (unless explicitly configured otherwise). You can have any number of snapshots of a blob, and each snapshot will persist until you either delete the snapshot or delete the base blob itself.

The URL for a snapshot is the same as for the base blob, except that the snapshot date and time value is appended to the end of the URL. For example, if the URL for a base blob is http://blockblobstorageaccount.blob.core.windows.net/blobcontainer/blob01, a snapshot of that blob taken on January 3, 2022, at 11 p.m. would have a URL similar to http://blockblobstorageaccount.blob.core.windows.net/blobcontainer/blob01?snapshot=2022-01-03T23:00:00.938291Z.

You can store snapshots on different storage tiers from base blobs. This can help reduce the cost of the snapshots. If you do not specify the storage tier to use, the snapshot is stored in the same tier as the base blob, and you will be charged for it at the same rate as for the base blob after the base blob changes and no longer matches the snapshot. If you store the snapshot in a different access tier, then you are charged for the entire blob based on the access tier rates. You should take this into account when planning your storage protection strategy.
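
When you need to roll a base blob back to one of its snapshots, a common approach is to copy the snapshot over the base blob. The following Azure PowerShell sketch picks the most recent snapshot and promotes it; the container and blob names are illustrative:

#Snapshots appear in blob listings alongside the base blob, marked by SnapshotTime
$storagectx = (Get-AzStorageAccount -ResourceGroupName "RG01" `
    -Name "mbspblobstorage01").Context
$snapshot = Get-AzStorageBlob -Container "container" -Prefix "TextFile01.txt" `
    -Context $storagectx | Where-Object { $_.SnapshotTime -ne $null } |
    Sort-Object SnapshotTime | Select-Object -Last 1

#Copy the snapshot over the base blob to restore its earlier contents
Start-AzStorageBlobCopy -CloudBlob $snapshot.ICloudBlob `
    -DestContainer "container" -DestBlob "TextFile01.txt" `
    -Context $storagectx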

Blob snapshots walkthrough

The following sections step you through the process of creating and managing blob snapshots using the Azure portal, Azure PowerShell, and Azure CLI.

Using Azure portal

To create and manage blob snapshots using the Azure portal, follow these steps:

  1. Open the page for the container that holds the blob you want to snapshot, select the check box for that blob, and click Create Snapshot. (See Figure 1-64.)

    FIGURE 1-64 Create a blob snapshot.

  2. To view the snapshot (and any other snapshots associated with the same blob), select the file and click View Snapshots (refer to Figure 1-64).

    The snapshot is listed in the Snapshots tab. (See Figure 1-65.)

    FIGURE 1-65 View snapshots.

  3. Optionally, to change the snapshot’s access tier, select the snapshot in the Snapshots tab and click Change Tier. Then open the Access Tier drop-down list in the Change Tier dialog box to choose a different tier and click Save. (See Figure 1-66.)

FIGURE 1-66 Changing the snapshot’s access tier.

Using Azure PowerShell

Use the following Azure PowerShell code to take a blob snapshot:

#Define required variables
$resourceGroup = "RG01"
$region = "eastus"
$storageaccname = "mbspblobstorage01"
$container = "container"
$vaultname = "RecoveryServicesVault01"
$blob = "TextFile01.txt"

#Generate Blob snapshot
$storagectx = (Get-AzStorageAccount -ResourceGroupName $resourceGroup `
    -Name $storageaccname).Context
$blob = Get-AzStorageBlob -Container $container -Blob $blob -Context $storagectx
$blob.BlobClient.CreateSnapshot()

Using Azure CLI

Use the following code to take a blob snapshot from the Azure CLI:

#Define required variables
resourceGroup="RG01"
region="eastus"
storageaccname="mbspblobstorage01"
container="container"
directory="directory"

#Generate Blob snapshot
az storage blob snapshot \
    --container-name $container \
    --name TextFile01.txt \
    --account-name $storageaccname

Disaster recovery

Disaster recovery is a critical component of any application architecture. The higher the criticality of the application, the more redundancy is required to ensure minimal to no downtime or data loss.

While setting up the data redundancy for the storage account and taking regular backups does ensure that you are able to recover from an outage, you must take into account other application components in your disaster recovery planning, too. This includes components such as the web application firewall or application gateway, web applications or application servers, connected API services, and so on. This ensures that in a disaster scenario, once the blob storage is online, all other related components can also be recovered and can read the storage with minimal interruption and data loss.

We’ve covered earlier how storage redundancy options such as GRS, GZRS, RA-GRS, and RA-GZRS can replicate your data asynchronously to a secondary Azure region.

One caveat to note: With RA-GRS or RA-GZRS storage accounts, if the primary region becomes unavailable, you can set up your applications to automatically switch to the secondary region for read operations. This keeps the application online in some form while a full storage failover is performed, either by you or by Microsoft. Once the storage has failed over completely, write operations to the storage account in the secondary region are also allowed, and your application can resume committing changes to the storage as before. With GRS or GZRS storage, read and write operations can be performed only after the storage has been failed over to the secondary region.

You saw earlier that the primary blob storage endpoint points to https://<storage-account-name>.blob.core.windows.net/<container-name>. Similarly, the secondary storage endpoint is reachable at https://<storage-account-name>-secondary.blob.core.windows.net/<container-name>. (The -secondary suffix is appended to the storage account name.) You can use this endpoint to connect to the secondary storage. The storage account access keys remain the same for both the primary and secondary endpoints.
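
Rather than constructing the secondary URL by hand, you can read both endpoints from the storage account's properties. A minimal Azure PowerShell sketch, using illustrative names (the SecondaryEndpoints property is populated only for read-access geo-redundant accounts):

#Retrieve the primary and secondary blob endpoints
$account = Get-AzStorageAccount -ResourceGroupName "RG01" `
    -Name "mbspblobstorage01"
$account.PrimaryEndpoints.Blob      #https://mbspblobstorage01.blob.core.windows.net/
$account.SecondaryEndpoints.Blob    #https://mbspblobstorage01-secondary.blob.core.windows.net/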

Storage account failover

There are two ways you can fail over a storage account to the secondary region:

  • Microsoft-managed failover In the event of a region-wide outage, Microsoft performs a full region failover to the secondary region. In such cases, you need not perform manual failover operations on your storage accounts. You would only have to ensure that when the DNS entries for the storage are updated, your applications are ready to resume normal operations.

  • Customer-managed failover You can also perform a manual failover on your own in case of an outage. When an outage occurs, Microsoft will initially work toward restoring data and operations in the primary region, if possible. If it is unable to do so, it will declare that region unrecoverable and initiate the failover to the secondary region. If you cannot wait that long, you can perform a manual failover from your storage account properties to bring your storage account online and make it accessible to your applications.

In either scenario, DNS entries must be updated, automatically or manually, before write operations to the storage can begin. Also take into account any private endpoints you may have created in the primary region; you need to make sure the same endpoints are set up in the secondary region, too.
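
For a customer-managed failover, a single Azure PowerShell command initiates the process. A sketch with illustrative names follows; a failover is disruptive and can involve data loss, so test it only against non-production accounts:

#Initiate a customer-managed failover to the secondary region
Invoke-AzStorageAccountFailover -ResourceGroupName "RG01" `
    -Name "mbspblobstorage01" -Force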

Last Sync Time

Data synchronized using GRS is often behind data in the primary region. The data sync is asynchronous to avoid affecting write operations and storage performance in the primary region. This allows write operations to be committed on the primary storage without waiting for the same operations to be written and acknowledged by the secondary storage. However, at the time of a disaster, some data written and committed to the primary storage might not yet have been committed to the secondary storage—in which case, that data would be lost.

You can determine whether this has happened by checking the Last Sync Time property for your storage account. This value is a GMT date/time value that you can query using Azure PowerShell, Azure CLI, or one of the Azure Storage client libraries. Any write operations performed after this Last Sync Time property value are most likely missing in the secondary region and might not be available for read operations. Incorporating this into your application logic can allow you to plan in advance how to handle such contingencies.

Last Sync Time walkthrough

The following sections step you through the process of checking the Last Sync Time on a storage account using Azure PowerShell and Azure CLI.

Using Azure PowerShell

Use the following Azure PowerShell code to retrieve the Last Sync Time on a storage account:

#Define variables

$rg = "RG01"
$storageaccname = "mbspblobstorage01"

#Retrieve the last sync time
$LastSyncTime = $(Get-AzStorageAccount -ResourceGroupName $rg `
    -Name $storageaccname `
    -IncludeGeoReplicationStats).GeoReplicationStats.LastSyncTime

Using Azure CLI

Use the following Azure CLI code to retrieve the Last Sync Time on a storage account:

#Define required variables

rg="RG01"
storageaccname="mbspblobstorage01"

#Retrieve the last sync time
LastSyncTime=$(az storage account show \
    --name $storageaccname \
    --resource-group $rg \
    --expand geoReplicationStats \
    --query geoReplicationStats.lastSyncTime \
    --output tsv)

Best practices

Following are some general best practices regarding setting up and using Azure Blob Storage accounts:

  • Protect your access keys Storage account access keys provide root access to the storage account, bypassing any permissions or authorization restrictions you have put in place. It is therefore of utmost importance that you limit the use of and protect access to the storage account access keys. Instead, use Azure Active Directory (Azure AD) to authorize access requests to the Azure Blob Storage account. You can use Azure Key Vault to automate the rotation of the access keys or manually rotate the keys if you suspect a compromise has taken place.

  • Plan your redundancy strategy Plan your storage redundancy strategy in advance so you select the correct storage redundancy options when creating the Azure Blob Storage account. If you decide to change the redundancy level at a later stage, you may have to migrate all data from one storage account to another, which can result in extensive transaction costs (depending on the amount of data you have stored).

  • Document recovery procedures and review regularly Once you have your redundancy strategy in place, document any recovery procedures, such as manual storage failover for geo-redundant storage, and review them regularly with your team. You may have to include the failover of other related components as well, along with the storage endpoint, to ensure that all interconnected components are online in case of a disaster. Regular reviews will help ensure that any changes to the Azure Blob Storage capabilities or feature set are incorporated in your design and you can optimize the recovery further if possible.

  • Limit mixing of storage account types When using general purpose v2 storage accounts, you can deploy a mix of blob, file, table, and queue storage in the same account. However, it is highly recommended to only deploy Azure Blob Storage containers together in the same storage account. This ensures that you can consider performance requirements related to Azure Blob Storage independently of any other file services. In a mixed storage account, it can get difficult to manage performance issues, as different storage account types using different protocols access the storage from a varied set of clients and share the same limits associated with the storage account.

  • Enable soft delete on all important containers Soft delete is a great feature to help you quickly recover your containers in the event of accidental or malicious deletion. As far as possible, enable soft delete on all your containers, and set up a retention period longer than the minimum time it will take your team to respond and to perform a restore, if necessary.

  • Enable soft delete on blobs Enabling soft delete for blobs allows you to easily restore blobs in case of accidental or malicious deletion. Select the time range for data retention based on your organization’s compliance, auditory, and data restoration RPO requirements to ensure you are able to restore blobs as quickly as possible.

  • Enable blob versioning to maintain previous versions Enabling blob versioning for all critical containers makes it easy to restore blobs in the event of unwanted modifications, data corruption, or malware attacks, without having to restore backups.

  • Use private endpoints for storage access Private endpoints enable you to access the Azure Blob Storage endpoint over a secure internal network connection via VPN or ExpressRoute. This allows you to control which network subnets are allowed to access the endpoint and limit access as necessary. This also allows you to closely monitor the connections to the blob endpoint over your internal firewall and detect any anomalies. It is highly recommended to use private endpoints if your environment and application access requirements allow for it.

  • Limit use of public endpoints While public endpoints can be secured with encryption in transit and firewall restrictions, it is highly recommended to turn them off unless allowing them is absolutely necessary. Instead, use private endpoints, as they are much more secure. Private endpoints allow you to control access more granularly from your internal networks to the internal storage endpoint.

  • Set up storage firewall restrictions Unless you completely disable the public endpoint, be sure you set up storage firewall restrictions to allow access over the public endpoint only from administrative or known IP addresses. This will help limit the exposure of that public endpoint and prevent malicious actors from attacking and accessing the storage.

  • Force secure transfers only While Azure Blob Storage supports connections over both HTTP and HTTPS, it is highly recommended to force secure transfers only, so that all connections are encrypted; see the sketch after this list. You can configure this setting using the Azure portal, Azure PowerShell, Azure CLI, and REST APIs. If you set up the storage using the Azure portal, this setting is enabled by default. If you use any other method, you must enable this setting manually.

  • Integrate with Azure Monitor Azure Monitor can closely monitor storage capacity and performance metrics to provide you with a comprehensive view of the storage IOPS and space utilization. This can help you in planning any changes to address capacity or IOPS issues before they affect your clients. You can build custom dashboards to monitor the metrics that are most important to your organization and share them with all administrators to ensure a consistent view across the monitoring environment. You can also set up alerting to highlight anomalous performance behavior to be able to respond to such events in a more time-effective manner.

  • Ensure regular backups using Azure Backup Azure Backup has a built-in integration with Azure Blob Storage to provide a seamless backup experience. You can configure backup policies to define how long to store operational backup data. You can configure different backup jobs with different backup policies based on your Azure Blob Storage requirements. Make sure you set up backups in line with your organization's data recovery and retention requirements.
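
As an example of putting one of these recommendations into practice, the following Azure PowerShell sketch enforces HTTPS-only access on an existing storage account; the resource group and account names are illustrative:

#Reject any connection that does not use HTTPS
Set-AzStorageAccount -ResourceGroupName "RG01" `
    -Name "mbspblobstorage01" `
    -EnableHttpsTrafficOnly $true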
