Chapter 5

Cloud Storage Infrastructure

CERTIFICATION OBJECTIVES

5.01     Storage Media

5.02     Cloud Storage Configuration

5.03     Databases in the Cloud

5.04     Content Delivery Networks

Two-Minute Drill

Q&A    Self Test


Data storage continues to be a large part of cloud computing. Cloud service providers support software-defined storage, which means cloud customers are spared the complexities of the underlying storage infrastructure when provisioning cloud storage. This chapter introduces you to various types of storage media that can impact cloud storage performance and cost.

We will then discuss cloud storage configurations, such as replication and disk performance settings, that can be enabled to address business needs. Next, we’ll identify differences between SQL and NoSQL databases. Finally, we’ll examine how data can be copied to geographical regions to place content near the users who need it.

CERTIFICATION OBJECTIVE 5.01

Storage Media

Data stored in electronic random access memory (RAM) is referred to as being volatile; in other words, its retention depends on the constant flow of electricity to the computer, and it is erased when the computer is turned off. Storage media is referred to as nonvolatile; it does not need a constant flow of electricity to retain the data. In the cloud, both RAM and storage media are configurable. Figure 5-1 shows an example of how you can determine the amount of RAM allocated to a virtual machine and the type of storage used.

FIGURE 5-1     Microsoft Azure virtual machine sizing options

Cloud service providers are responsible for the physical storage infrastructure upon which cloud storage services are made available to customers. Physical storage comes in a variety of different drive types with varying configuration settings.

Drive Types

Hard disk drive (HDD) storage uses spinning disk platters and stores data magnetically. This type of storage media has moving parts. Solid state drives (SSDs) use flash memory instead of spinning disk platters and are considered quieter and faster than HDDs. As you might expect, choosing SSD in the cloud costs more than choosing HDDs.

IOPS

Disk throughput is commonly expressed in input/output operations per second (IOPS), that is, the number of read and write operations a disk completes each second. Figure 5-2 shows virtual machine OS disk options where Premium SSD provides more IOPS than Standard HDD. More IOPS means better disk performance, and you should consider whether to increase IOPS depending on the VM workload requirements.

FIGURE 5-2     Microsoft Azure virtual machine disk options

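Because IOPS counts operations rather than bytes, the amount of data moved per second also depends on the size of each operation. The following minimal Python sketch assumes the usual approximation that data rate equals IOPS multiplied by I/O size; the function name and the figures are illustrative only and are not taken from any provider's documentation.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Approximate data rate in MiB/s for a given IOPS figure and I/O size."""
    return iops * io_size_kib / 1024


# Illustrative figures only; they do not describe any real disk offering.
small_io = throughput_mib_per_s(iops=500, io_size_kib=8)
large_io = throughput_mib_per_s(iops=500, io_size_kib=64)
print(small_io, large_io)
```

The same IOPS figure yields a very different data rate when the workload issues larger reads and writes, which is why the workload's I/O pattern matters when sizing disks.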

Cloud service providers often arrange their vast storage arrays into dedicated storage area networks (SANs) with Redundant Array of Independent Disks (RAID) configurations to increase performance and resiliency against disk failure. There are many levels of RAID configurations. The following list includes three common RAID levels:

•   RAID 0 (disk striping): Uses multiple physical disks working as one (striping) to improve disk I/O, but provides no resilience to disk failure because the failure of a single disk renders the entire disk array unavailable.

•   RAID 1 (disk mirroring): Copies data to a secondary disk when it is written to the primary disk and can tolerate the failure of one disk since the second disk has a complete copy of data from the primary disk.

•   RAID 5 (disk striping with distributed parity): Improves disk I/O and can tolerate the failure of one disk since the parity information for a stripe is never stored on the same disk as the data it protects. Parity is error recovery information that is used to rebuild data that resided on a failed disk in the array.
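To see how parity can rebuild lost data, consider the following Python sketch of a single stripe. It assumes parity is computed as the byte-wise XOR of the data blocks, which is the standard approach for single-parity RAID; the block contents and the helper name xor_blocks are made up for illustration, and a real array would also rotate the parity block across disks from stripe to stripe.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe; a fourth disk would hold the parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1], then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

XOR-ing the surviving blocks with the parity block cancels out everything except the missing block, which is why one failed disk can be tolerated but two cannot.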

SAN policies can be set to compress data written to the RAID array using techniques such as data deduplication, which reduces duplicate data blocks to single occurrences. Of course, you as the cloud customer can also enable these types of disk options within your cloud-based virtual machines. For example, you can enable compression for files and folders to reduce operating system disk space consumption within a Windows Server 2019 cloud-based virtual machine.
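The idea behind block-level deduplication can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions (fixed-size blocks, a SHA-256 hash as the block fingerprint, an in-memory dictionary as the block store); the function names are hypothetical and real storage systems use far more elaborate designs.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and store each distinct block once."""
    store = {}      # block hash -> block contents (stored once)
    pointers = []   # one hash per original block, in order
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        pointers.append(digest)
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[digest] for digest in pointers)


original = b"ABCDABCDABCDWXYZ"
store, pointers = deduplicate(original)
print(len(pointers), "blocks referenced,", len(store), "blocks stored")
```

Repeated blocks are kept once and referenced by pointer, so the stored size shrinks while the original data remains fully recoverable.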

Make sure to monitor VM performance metrics over time. It’s common to set VM resources such as virtual CPU and the amount of RAM high initially only to realize over time that less compute power is still sufficient, which reduces cloud computing costs.

CERTIFICATION OBJECTIVE 5.02

Cloud Storage Configuration

One great feature of cloud storage is that it offers elasticity along with rapid provisioning, otherwise called capacity on demand. A cloud customer can acquire additional storage capacity with the click of a mouse or by issuing a command, which is much faster and often much cheaper than having to physically acquire storage media without cloud computing. Additionally, the cloud customer’s data is stored offsite, which provides further backup protection from problems like on-premises theft, floods, and fires.

Some CSPs support the notion of VM managed disks. This means the CSP automatically manages the configuration and scaling of the underlying storage of VM virtual hard disks. When you’re working with clusters of virtual machines on a large scale, this can make your job much easier.

Depending on your storage needs, you can configure cloud storage settings for frequently versus infrequently accessed data. If availability and security are important, replication and encryption can be enabled. These types of cloud storage settings are detailed in the next sections.

File and Object Storage

File storage uses a file system to organize stored items. File systems are organized into a hierarchy, with a root folder containing subordinate folders and, ultimately, files within the folders.

In the cloud, you can configure file storage using services such as Microsoft Azure Files. Azure Files is essentially a Server Message Block (SMB) shared folder implementation in the cloud. Client devices can connect to the Azure Files shared folder as they would for an on-premises folder using standard drive mapping commands (Windows) or a mount point (Linux), although on-premises firewalls might need to be adjusted to allow SMB traffic, which normally uses TCP port 445. Figure 5-3 shows a Windows machine using the net use command to map drive letter P: to an existing Azure file share. Creating Azure file shares is accomplished using either the Azure portal GUI or command-line tools.

FIGURE 5-3     Mapping a drive letter to a Microsoft Azure file share

Object storage does not have to adhere to a filing hierarchy as a traditional file system does; in other words, object storage uses a flat storage structure that can be distributed across various platforms. Each object is given a unique identifier that is used to quickly locate the item, whereas file storage uses the directory path to locate the stored item. Accessing object storage is normally done over HTTP using a Representational State Transfer (REST) API instead of older network file access methods such as File Transfer Protocol (FTP) and SSH File Transfer Protocol (SFTP). Block binary large objects (blobs) are used to store common file types such as JPGs and PDFs, whereas page blobs are for random reading and writing to files such as VM hard disk files.
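The difference between a flat object namespace and a folder hierarchy can be illustrated with a toy in-memory store. The class below is a hypothetical sketch, not any provider's API: it assumes each object receives a generated unique identifier and that anything resembling a folder is only a shared name prefix.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory object store: one flat namespace, no real directories."""

    def __init__(self):
        self._objects = {}   # object ID -> (name, data)

    def put(self, name: str, data: bytes) -> str:
        object_id = str(uuid.uuid4())   # unique identifier for direct lookup
        self._objects[object_id] = (name, data)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][1]

    def list_names(self, prefix: str = "") -> list[str]:
        # A "folder" is only a shared name prefix, not a container.
        return sorted(n for n, _ in self._objects.values() if n.startswith(prefix))


store = FlatObjectStore()
photo_id = store.put("images/logo.jpg", b"jpg-bytes")
store.put("docs/report.pdf", b"pdf-bytes")
print(store.list_names("images/"))
```

Retrieval goes straight from identifier to object without walking a directory tree, which is part of what makes object storage easy to distribute.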

Hot and Cold Storage

Cloud storage settings are all about requirements. If your organization requires frequent access to cloud-stored data, you should enable hot storage, which offers higher performance to achieve quicker data access. CSPs will use different terminology for this type of storage, such as the Amazon Web Services (AWS) variations shown in Figure 5-4: Provisioned IOPS SSD or Throughput Optimized HDD for hot storage as opposed to Cold HDD.

FIGURE 5-4     Hot and cold AWS disk volume types

Using cold storage is cheaper than using hot storage, and thus cold storage should be enabled for data that will be accessed infrequently. Data retrieval is slower with cold storage, which is why cold storage costs less. For longer-term archiving, consider options such as AWS Glacier, which is even less expensive than standard cold storage. AWS Glacier data retrieval time can range from minutes to hours.

Replication and Encryption

To provide additional resiliency to failure, you can enable cloud storage replication. This is conceptually similar to storing on-premises backup tapes at an alternative location for safety, or replicating data to a different company data center. In disaster recovery terms, storage replication relates to the recovery point objective (RPO), which specifies the maximum amount of tolerable data loss for the organization. Figure 5-5 shows cloud storage replication settings, including geo-redundant storage (GRS). GRS replicates cloud data to an alternative geographical region.

FIGURE 5-5     Microsoft Azure storage account replication options

While backups are related to the RPO, they are also related to the recovery time objective (RTO), which specifies the maximum tolerable amount of downtime for a service or data. The exam could emphasize how quickly data can be restored from backup (RTO), or the emphasis could be on how often backups should occur (RPO).
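The relationship between backup frequency, restore speed, and the two objectives can be expressed as a simple check. The Python sketch below assumes that worst-case data loss equals the interval between backups and that downtime is at least the restore duration; the function names and the sample objectives are hypothetical and chosen only for illustration.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss equals the time since the last backup."""
    return backup_interval <= rpo


def meets_rto(restore_duration: timedelta, rto: timedelta) -> bool:
    """Downtime lasts at least as long as the restore takes."""
    return restore_duration <= rto


# Hypothetical objectives and measurements, chosen only for illustration.
rpo, rto = timedelta(hours=4), timedelta(hours=1)
nightly_ok = meets_rpo(timedelta(hours=24), rpo)     # nightly backups vs. a 4-hour RPO
restore_ok = meets_rto(timedelta(minutes=45), rto)   # 45-minute restore vs. a 1-hour RTO
print(nightly_ok, restore_ok)
```

In this example, nightly backups fail a 4-hour RPO even though the restore itself is fast enough for the RTO, so the fix would be more frequent backups or replication rather than faster restores.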

Legal and regulatory compliance sometimes dictates that data at rest must be protected through encryption, in addition to protection of data in transit using Hypertext Transfer Protocol Secure (HTTPS). Most CSPs support Advanced Encryption Standard (AES) 256-bit encryption, which is required by the U.S. federal government to protect sensitive information.

CSPs can provide encryption keys, but your organization might require full control of custom encryption keys. Figure 5-6 shows how cloud customers can opt to use their own encryption keys in Azure.

FIGURE 5-6     Microsoft Azure storage account customer encryption keys

CERTIFICATION OBJECTIVE 5.03

Databases in the Cloud

There are a few methods through which cloud databases can be established:

•   Migrated from on premises

•   Manually installed and configured within cloud virtual machines

•   Deployed as a managed cloud service

Database migration is used to copy on-premises database objects and data into the cloud. Before migrating on-premises databases, you should conduct an on-premises assessment to provide assurances of cloud readiness. One tool for conducting this type of assessment is the Microsoft Data Migration Assistant, shown in Figure 5-7. (Refer to Exercise 3-1 for instructions on downloading, installing, and running the Microsoft Data Migration Assistant.)

FIGURE 5-7     Microsoft Azure Data Migration Assistant

The option to manually install and configure a cloud database is in stark contrast to using a managed Database as a Service (DBaaS) solution. A DBaaS solution takes care of all the underlying complexities of managing the database and allows cloud users to focus on using the database itself. Manually installing and configuring a cloud database solution involves provisioning virtual machines and installing database software, as well as updating operating system and database software.

Database Types

One aspect of cloud data planning is determining whether to use a Structured Query Language (SQL) database or a NoSQL type of database; this decision depends on what will be stored in the database and how it will be used, as summarized in Table 5-1.

TABLE 5-1   Comparison of SQL and NoSQL Databases

Images
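As a rough illustration of the distinction, the following Python sketch contrasts the two styles using only the standard library. It assumes SQLite as a stand-in for a SQL database and plain dictionaries as a stand-in for a document-style NoSQL store; the table names, fields, and sample data are invented for the example.

```python
import json
import sqlite3

# SQL: a fixed schema with related data split across tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")
rows = conn.execute(
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id GROUP BY c.name"
).fetchall()
print(rows)

# NoSQL (document style): each record carries its own structure.
documents = [
    {"name": "Ana", "orders": [{"total": 25.0}, {"total": 40.0}]},
    {"name": "Ben", "loyalty_tier": "gold"},   # different fields, no schema change
]
print(json.dumps(documents[1]))
```

The SQL side relies on a join across related tables, while the document side tolerates records with different fields without any schema change.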

The amount of compute horsepower for underlying virtual machines supporting a database is dependent on the usual suspects:

•   Disk IOPS

•   Amount of RAM

•   Number of vCPUs

•   Number of disks

Some cloud service providers such as Microsoft Azure lump these items together into Database Transaction Units (DTUs), as shown in Figure 5-8. Virtual machine disks are referred to as block storage. Virtual machine disks can be attached and detached from cloud virtual machines, much like you would physically attach or detach physical disks to and from physical servers.

FIGURE 5-8     Microsoft Azure SQL Database DTUs

CERTIFICATION OBJECTIVE 5.04

Content Delivery Networks

Because CSPs have servers placed around the globe, cloud customers should be thinking about how this can serve their interests. Specifically, placing content geographically near the user base that will be accessing that content, such as for large media files stored on a website, could be beneficial to a cloud customer. This is where a content delivery network (CDN) comes in.

CDNs allow data to be copied (cached) to various geographical regions to reduce network latency for users accessing that content. You can configure wildcards to cache only specific types of media that tend to have large file sizes, so to cache only .AVI video files, for example, you could specify *.AVI.
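Wildcard matching of this kind can be sketched with Python's standard fnmatch module. The helper name should_cache and the sample paths are hypothetical, and the sketch assumes matching is case-insensitive and applies to the file name only; actual CDN products define their own pattern rules.

```python
from fnmatch import fnmatchcase


def should_cache(path: str, patterns: list[str]) -> bool:
    """Return True if the file name matches any wildcard pattern (case-insensitive)."""
    name = path.rsplit("/", 1)[-1].upper()
    return any(fnmatchcase(name, pattern.upper()) for pattern in patterns)


patterns = ["*.AVI"]
video_cached = should_cache("/media/intro.avi", patterns)
css_cached = should_cache("/css/site.css", patterns)
print(video_cached, css_cached)
```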

But is this copying a one-time thing? What if the source content changes? One CDN configuration item is the Time To Live (TTL) value. This is normally represented in seconds and determines how long before a cached CDN item is checked against the source item to detect changes. User requests for content are satisfied from data in the CDN cache, or if it hasn’t yet been cached, the content is fetched from the origin server. Once the TTL for CDN cached content expires, the content is refreshed from the origin server. For data that is rarely modified (static data) such as product lists, the TTL should be set in accordance with how often changes are made to the data. For dynamic data that changes often, a shorter TTL should be configured.
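The TTL behavior described above can be modeled with a toy edge cache. This Python sketch is a simplification under stated assumptions: the class name is hypothetical, the origin is a dictionary rather than a web server, and an expired item is simply refetched rather than revalidated against the source. The clock is injectable so that the passage of time can be simulated.

```python
import time


class CdnEdgeCache:
    """Toy edge cache: serve from cache until the TTL expires, then refetch."""

    def __init__(self, fetch_from_origin, ttl_seconds: float, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # path -> (content, time cached)

    def get(self, path: str):
        entry = self._entries.get(path)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]                       # cache hit, still fresh
        content = self._fetch(path)               # miss or expired: go to origin
        self._entries[path] = (content, self._clock())
        return content


origin = {"/products.json": "v1"}
now = [0.0]                                       # simulated clock, in seconds
cache = CdnEdgeCache(origin.__getitem__, ttl_seconds=60, clock=lambda: now[0])

first = cache.get("/products.json")               # not cached yet: fetched from origin
origin["/products.json"] = "v2"                   # the source content changes
stale = cache.get("/products.json")               # within TTL: cached copy is served
now[0] = 61.0
fresh = cache.get("/products.json")               # TTL expired: refetched from origin
print(first, stale, fresh)
```

Users keep receiving the old copy until the TTL expires, which is the trade-off behind choosing a shorter TTL for content that changes often.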

Directing user requests to the nearest CDN endpoint is automatic. References to CDN cached objects, such as for a link to a file on a website, use the CDN Domain Name System (DNS) domain name. If you want to use your own custom DNS domain names, you can create a DNS CNAME, or alias record, that points to the CSP-assigned CDN name.
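How an alias record leads to the CDN endpoint can be shown with a small lookup table. The Python sketch below is not a real DNS resolver; the zone data is hypothetical, using reserved example names and a documentation-only IP address, and the function simply follows CNAME records until it reaches an address record.

```python
# Hypothetical zone data; the names and address are placeholders.
RECORDS = {
    "cdn.example.com": ("CNAME", "example.cdn-provider.example"),
    "example.cdn-provider.example": ("A", "203.0.113.10"),
}


def resolve(name: str, records=RECORDS, max_hops: int = 8) -> str:
    """Follow CNAME aliases until an A record is found."""
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value
        name = value          # CNAME: continue with the target name
    raise RuntimeError("too many CNAME hops")


address = resolve("cdn.example.com")
print(address)
```

The custom name never stores an address of its own, so the CDN provider can change where its endpoint name points without the customer updating anything.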

The exam will often test your ability to apply cloud solutions such as CDNs to business needs. Be prepared for storage solutions that seem similar yet serve different needs. For example, both CDNs and geo-redundant replication copy data to alternative geographical locations. Replicated CDN data is only periodically refreshed based on the TTL and is used to improve user access time to content. Geo-redundant storage replication provides data high availability; data is replicated when it is modified and not based on a TTL value.

EXERCISE 5-1

Create a Microsoft Azure Storage Account

In this exercise, you will create a Microsoft Azure cloud storage account. This exercise depends on having successfully completed Exercise 1-1.

1.   Using a web browser, sign in to the Azure portal at https://portal.azure.com.

2.   At the top of the navigation pane on the left, click the Create a Resource button, as shown in Figure 5-9.

FIGURE 5-9     Creating a resource in the Microsoft Azure portal

3.   In the Search field, type storage account, select Storage Account from the search results list, then click Create.

4.   Configure the storage account with the following settings (accept the default values for all other settings):

•   Resource group: Create a new one named ResGroup1

•   Storage account name: storacct1289

5.   Click Review + Create, then click Create.

6.   When the deployment is complete, click the Go to Resource button to view the storage account.

EXERCISE 5-2

Upload Content to a Microsoft Azure Storage Account

In this exercise, you will use the Azure portal to upload sample files to an Azure storage account. This exercise depends on having successfully completed Exercise 5-1.

1.   Using a web browser, sign in to the Azure portal at https://portal.azure.com.

2.   In the left-hand navigator, click Storage Accounts.

3.   On the right, click storacct1289 to open the storage account properties.

4.   Scroll down the properties navigation bar and click Blobs.

5.   On the right, click the +Container button.

6.   Name the container samplefiles and click OK.

7.   Click the samplefiles container, then click the Upload button, as shown in Figure 5-10.

FIGURE 5-10   Uploading a blob to a storage account

8.   Click the folder icon and specify some local sample files to upload, then click the Upload button.

9.   After the upload completes, click the name of the uploaded file and verify that server-side encryption is set to a value of true. Microsoft Azure encrypts all storage account blobs by default, as shown in Figure 5-11.

FIGURE 5-11   Storage account blob properties

INSIDE THE EXAM

Cloud Storage and Hands-On

To prepare for answering questions related to cloud storage on the CompTIA Cloud Essentials+ CLO-002 exam, it’s helpful to experiment. Create cloud storage accounts, upload sample data, and configure settings such as encryption and replication.

CERTIFICATION SUMMARY

This chapter focused on cloud storage settings that align with business needs. When configuring cloud storage details, how the storage will be used is a major factor in determining whether high performance is important, in which case SSD hot access storage should be enabled. However, as you learned, the SSD option is more expensive than the HDD option because SSDs are faster than HDDs.

When choosing cloud storage options, the disk IOPS measurement represents overall disk I/O performance; opting for more IOPS results in better performance, which in turn means increased cost for the cloud storage service.

Next, hierarchical file storage such as Microsoft Azure Files was compared to flat object storage. Microsoft Azure Files provides cloud-based shared folders. Cloud-based virtual machine virtual disks are referred to as block storage.

We then discussed how replicating cloud storage to multiple regions increases data availability in the event of a regional disaster. Next, we talked about encrypting cloud data using either CSP keys or customer keys.

We next compared SQL databases, which are best suited for related data stored in separate tables, to NoSQL databases, which use a less rigid schema and can store vast amounts of unstructured data. You also learned that managed database solutions spare the cloud customer from the complexities of deploying and managing the underlying database structure.

Finally, you learned about content delivery networks and how they place content near the users that request it, thus improving the user experience by reducing network latency.

TWO-MINUTE DRILL

Storage Media

•   HDD-based cloud storage is slower than SSD-based cloud storage but is less expensive.

•   SSD-based cloud storage is best suited for intensive disk I/O usage.

•   Disk IOPS is a measurement of disk throughput; a higher value means better performance.

Cloud Storage Configuration

•   Managed disks remove the need for cloud customers to provision storage for cloud VM disks.

•   CSP file-based solutions are similar to on-premises shared folders.

•   Accessing Microsoft Azure Files shared folders occurs over TCP port 445.

•   CSP object-based storage is flat compared to file system hierarchies.

•   Network access to cloud-based storage is normally done over HTTP using the REST API.

•   Common file types such as text and media documents are stored as block blobs.

•   Virtual machine disk files are commonly stored as page blobs.

•   Hot storage should be used for data that will be accessed frequently.

•   Cold storage should be used for data that will be accessed infrequently.

•   Cloud storage replication creates additional copies of data for increased resiliency to failure.

•   Cloud customers can use custom encryption keys to secure data at rest.

Databases in the Cloud

•   On-premises databases should be assessed for cloud readiness with a tool such as Microsoft Data Migration Assistant and then be migrated to the cloud.

•   Managed databases remove the underlying infrastructure complexity from the cloud customer; this is often referred to as Database as a Service (DBaaS).

•   SQL-compliant databases use a structured data schema and are best suited to related data stored in separate tables.

•   NoSQL-compliant databases are designed to accommodate vast amounts of unstructured data.

•   Microsoft SQL Server access occurs over TCP port 1433.

•   Access to NoSQL databases such as MongoDB occurs over TCP port 27017.

•   Database Transaction Units (DTUs) are a performance unit consisting of vCPUs, amount of RAM, and disk IOPS.

Content Delivery Networks

•   CDNs copy (cache) data to different geographical locations near users to improve the user experience.

•   CDN Time To Live (TTL) values determine how long before the source data is checked for changes.

•   DNS CNAME (alias) records point to other DNS records.

SELF TEST

The following questions will help you measure your understanding of the material presented in this chapter. As indicated, some questions may have more than one correct answer, so be sure to read all the answer choices carefully.

Storage Media

1.   You are planning how cloud storage will address business needs. Choosing which cloud storage option will have the largest positive impact on performance?

A.   Capacity

B.   Storage media brand

C.   Solid state drives

D.   FTP access

2.   Which data storage characteristic is the most closely related to minimizing data redundancy?

A.   IOPS

B.   Replication

C.   Deduplication

D.   RAID

3.   Which RAID configuration improves disk I/O performance but does not include fault tolerance?

A.   RAID 0

B.   RAID 1

C.   RAID 5

D.   RAID deduplication

4.   Which solution protects stored data even if physical storage devices are stolen?

A.   Deduplication

B.   RAID 1

C.   RAID 5

D.   Encryption of data at rest

Cloud Storage Configuration

5.   Your organization is configuring cloud backup for on-premises servers. Which cloud backup storage configuration should be used to minimize costs?

A.   Increased IOPS

B.   Cool access tier

C.   Storage replication

D.   Hot access tier

6.   Developers are planning to write on-premises code that programmatically accesses cloud storage. You are configuring on-premises firewall rules to allow this storage access. Which type of outbound traffic will you most likely allow in this scenario?

A.   FTPS

B.   SMB

C.   NFS

D.   HTTPS

7.   In the event of a regional disaster, you would like cloud-stored data available elsewhere. What should you configure?

A.   RAID 0

B.   Geo-redundant storage

C.   Deduplication

D.   RAID 1

Databases in the Cloud

8.   Which type of database solution uses a rigid schema?

A.   NoSQL

B.   SQL

C.   Managed

D.   Replicated

9.   Which TCP port is normally used to connect to Microsoft SQL Server?

A.   80

B.   443

C.   1433

D.   3389

10.   What is another term for DBaaS?

A.   Unmanaged database

B.   NoSQL

C.   Managed database

D.   SQL

Content Delivery Networks

11.   What is the primary benefit of using a CDN?

A.   Regulatory compliance

B.   Adherence to standards

C.   Improved performance

D.   Enhanced security

12.   Which CDN configuration determines how long before the source of cached data is checked for changes?

A.   TTL

B.   Replication

C.   Path

D.   SSL

13.   You need to create a DNS record that redirects a custom domain name for a CDN configuration. What type of record should you create?

A.   MX

B.   A

C.   PTR

D.   CNAME

14.   What is the primary benefit of deploying a CDN?

A.   Enhanced security

B.   Improved performance

C.   Reduced costs

D.   Regulatory compliance

15.   You are configuring a CDN that will be used to serve media files to users. What should you configure to use the CDN most efficiently?

A.   Increased TTL

B.   Reduced TTL

C.   Wildcard path for media files

D.   Custom encryption keys

SELF TEST ANSWERS

Storage Media

1.   C. Solid state drives (SSDs) provide better performance than traditional hard disk drives (HDDs). Naturally, CSPs charge more for the performance improvement, so choosing SSDs over HDDs also increases cloud costs.

A, B, and D are incorrect. Storage capacity, the brand of storage media, and accessing cloud storage through FTP will not positively impact performance as much as the use of SSDs will.

2.   C. Deduplication removes duplicate disk blocks and replaces duplicates with pointers to reduce disk space consumption.

A, B, and D are incorrect. Disk IOPS is a disk I/O throughput measurement. Replication creates copies of data for increased resiliency to failure at a primary location. Redundant Array of Independent Disks (RAID) organizes multiple disk storage devices together in various ways to improve disk performance and/or to provide fault tolerance.

3.   A. RAID 0, disk striping, uses multiple physical disks working as one to improve performance, but the failure of a single disk renders the entire disk array unavailable.

B, C, and D are incorrect. RAID 1 (disk mirroring) and RAID 5 (disk striping with distributed parity) both provide fault tolerance. RAID deduplication is not a function specifically related to RAID; deduplication is a method of reducing disk space consumption.

4.   D. Encrypting data at rest protects stored data. The correct decryption key is required to read information that is encrypted.

A, B, and C are incorrect. Deduplication is a method of reducing disk space consumption. RAID 1 (disk mirroring) and RAID 5 (disk striping with distributed parity) both provide disk fault tolerance.

Cloud Storage Configuration

5.   B. Cool or cold cloud storage is best suited for data that is accessed infrequently, such as backups, and is less expensive than hot cloud storage.

A, C, and D are incorrect. Increasing disk IOPS and enabling storage replication increase cloud computing charges. Hot access tiers are best suited for data that is frequently accessed, because hot access provides higher performance for quicker data access, but is more expensive than cold access.

6.   D. Accessing cloud storage programmatically normally occurs through the REST API, which relies on HTTP and HTTPS.

A, B, and C are incorrect. These network file access protocols are not used for cloud storage access as often as HTTPS is.

7.   B. Geo-redundant storage keeps copies of data in different regions, which is resilient against a regional disaster.

A, C, and D are incorrect. RAID 0 (disk striping) uses multiple physical disks working as one to increase disk I/O performance. Deduplication is a method of reducing disk space consumption. RAID 1 (mirroring) copies data to a secondary disk when it is written to the primary disk.

Databases in the Cloud

8.   B. SQL databases use a structured, or rigid, schema that defines what type of data will be stored.

A, C, and D are incorrect. NoSQL databases do not use a structured schema; many different types of data can be stored without a definition of how that data will be stored. Managed SQL and NoSQL cloud databases hide the underlying infrastructure complexities related to hosting databases from cloud customers. Database replication is not determined by a structured or unstructured schema.

9.   C. By default, Microsoft SQL Server is accessible over TCP port 1433.

A, B, and D are incorrect. Port 80 is used for HTTP, port 443 is used for HTTPS, and port 3389 is used for Remote Desktop Protocol (RDP; covered in Chapter 6).

10.   C. Database as a Service (DBaaS) is a managed database service, which means the CSP takes care of the underlying infrastructure to host the database.

A, B, and D are incorrect. Unmanaged databases require cloud customers to install and configure the underlying infrastructure to support the database. NoSQL and SQL databases are available as managed and nonmanaged services.

Content Delivery Networks

11.   C. A content delivery network (CDN) improves the performance of users’ access to content by placing a copy of that content geographically near users, which reduces network latency.

A, B, and D are incorrect. A CDN does not specifically address regulatory compliance, standards adherence, or improved security.

12.   A. The Time To Live (TTL) value determines how long before the CDN cache checks the source data for changes.

B, C, and D are incorrect. CDN configuration settings for replication, path, and Secure Sockets Layer (SSL) do not determine when cached source data has changed.

13.   D. DNS CNAME records are alias records that point to other DNS records.

A, B, and C are incorrect. MX records are mail exchange records used for e-mail transfer; A records use names to point to IPv4 addresses; and PTR records are reverse lookup records that, given an IP address, return a DNS name.

14.   B. A CDN is configured to place data near the users who request it, which improves performance by reducing network latency.

A, C, and D are incorrect. CDNs do not enhance security, reduce costs, or help with regulatory compliance.

15.   C. Wildcard paths are used to specify which files should be included in the CDN.

A, B, and D are incorrect. Modifying the TTL value or using custom encryption keys for cloud data will not make as big a difference in efficiency as a correctly configured wildcard path that copies only the required files.
