© Baji Shaik, Avinash Vallarapu 2018
Baji Shaik and Avinash Vallarapu, Beginning PostgreSQL on the Cloud, https://doi.org/10.1007/978-1-4842-3447-1_1

1. Introduction to Databases in the Cloud

Baji Shaik1  and Avinash Vallarapu1
(1)
Hyderabad, Andhra Pradesh, India
 

This chapter is an overview of databases as a service (DBaaS) and their benefits. We also talk about the key things to consider when choosing a service provider, including how to implement DBaaS for PostgreSQL using popular cloud vendors. The chapter also discusses the pros and cons of on-premise and cloud databases. We discuss the cloud vendors available for PostgreSQL and explain how PostgreSQL is different from the other databases in the cloud.

What Is Database as a Service?

DBaaS is a service that delivers a powerful on-demand database platform, providing an efficient way to satisfy the needs of an organization. DBaaS enables DBAs to deliver database functionality as a service to their customers. This service eliminates the need to deploy, manage, and maintain on-premise hardware and software for the database (or, in the case of IaaS, for the entire software stack). It allows businesses to concentrate on the application without worrying about the complexities of database administration and management.

DBaaS can simplify the deployment of your development and testing environments during the software development and testing phases. Maintaining a production environment with a failover mechanism and load balancing adds overhead to any organization. DBaaS can help you meet these requirements through self-service portals that manage load balancing and failover.

DBaaS helps deliver production and non-production database services with an architecture that is designed for elasticity and resource pooling. DBaaS also enables businesses to effectively use their resources for everyday DBA work. By consuming DBaaS, you can easily avoid the costs and possible delays in setting up and maintaining an infrastructure. This enables applications to be deployed to the database with no CapEx for hardware and software, and only OpEx for the database service. Most of the tools and automations are embedded as services by several cloud vendors.

The elasticity of DBaaS helps you avoid investing in capacity and resources in advance. DBaaS enables you to upgrade resources and capacity as needed through on-demand and self-provisioning portals. Monitoring solutions are nearly free for the logical infrastructure maintained as a service. DBaaS also avoids the costs of maintaining infrastructure and training in-house expertise. Greater visibility into performance and diagnostic data helps you upgrade or downsize the service and thus keep resources right-sized. Metrics collected through the vendors' various solutions are helpful in forecasting business growth. DBaaS also brings improved availability through the monitoring and high availability solutions implemented by the vendors.

Who Should Use DBaaS?

DBaaS has no limitation on the type, size, or volume of business it can serve well. A number of companies using DBaaS manage several thousand transactions per second and terabytes of data. Start-ups as well as multinational companies use DBaaS as their database platform of choice, and it has been the right choice for numerous small and medium-sized businesses.

What Database Platform Does an Organization Need?

Organizations need a platform that achieves the following requirements:
  • A secured database

  • Fast performance

  • Reliable, redundant, and durable

  • Geographically distributed and independent

  • No single points of failure

  • Can be integrated into their existing systems

  • Helps globally distributed teams work and collaborate

Let’s discuss these needs in detail.

Secured Database Environment

DBaaS helps protect databases against data theft, loss of confidentiality or integrity, unauthorized or unintended activity, and misuse by hackers or unauthorized users. DBaaS enables you to configure a database environment that avoids leakage or disclosure of personal or confidential data. This is an important aspect of maintaining a secure database.

DBaaS helps administrators create users with a limited, appropriate set of roles and privileges so that no user is granted more privileges than needed. One of the important aspects of maintaining a secured database environment is encryption. Businesses consider it a must to encrypt data in motion and data at rest. DBaaS helps businesses achieve this through various solutions, such as Secure Socket Layer (SSL) communication and encrypted storage volumes. Practical implementation of these features is discussed in the forthcoming chapters.
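As an illustration of protecting data in motion, a client can refuse to connect unless SSL is enforced. The following Python sketch builds a libpq-style connection string and rejects any sslmode value that would allow an unencrypted connection; the host, database, and user names are placeholders, not real endpoints.

```python
# Sketch: building a PostgreSQL connection string that enforces SSL.
# The host and credentials below are placeholder assumptions.
def make_dsn(host, dbname, user, sslmode="verify-full"):
    """Return a libpq-style DSN that refuses unencrypted connections."""
    if sslmode not in ("require", "verify-ca", "verify-full"):
        raise ValueError("sslmode must enforce encryption for data in motion")
    return f"host={host} dbname={dbname} user={user} sslmode={sslmode}"

dsn = make_dsn("mydb.example.com", "appdb", "app_user")
print(dsn)
```

A driver such as psycopg2 would accept a DSN like this directly; the point of the guard is simply that modes such as `disable` never reach the driver.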

Fast-Performing Database

How can I make my database perform better? This is one of the concerns businesses raise when subscribing to DBaaS. Consider PostgreSQL as the database software: it requires several important parameters to be tuned in advance to achieve a fast-performing database. Parameters such as shared_buffers, work_mem, and the autovacuum settings should be tuned during the provisioning stage.

DBaaS from most vendors simplifies this requirement by tuning these settings automatically during the provisioning stage, using algorithms based on existing instances and benchmarks done by experts. This allows administrators and developers to self-provision an instance without worrying about tuning the database parameters. DBaaS vendors also let users choose better-performing storage, ranging from raw hard disks up to SSDs.
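As a rough sketch of the kind of sizing heuristic a vendor might apply at provisioning time, the ratios below are common PostgreSQL rules of thumb (roughly 25% of RAM for shared_buffers, about 75% for effective_cache_size), not any vendor's actual formula.

```python
# Illustrative sizing heuristic for PostgreSQL memory parameters.
# The ratios are assumptions for the sketch, not a vendor algorithm.
def suggest_settings(ram_mb, max_connections=100):
    shared_buffers = ram_mb // 4                 # ~25% of RAM
    work_mem = max(4, (ram_mb // 4) // max_connections)  # per-sort budget
    effective_cache_size = (ram_mb * 3) // 4     # ~75% of RAM
    return {
        "shared_buffers": f"{shared_buffers}MB",
        "work_mem": f"{work_mem}MB",
        "effective_cache_size": f"{effective_cache_size}MB",
    }

print(suggest_settings(16384))  # suggestions for a 16GB instance
```

Real provisioning logic would also weigh workload type and storage speed, but even this simple sketch shows why self-tuned defaults save new users from starting with PostgreSQL's very conservative stock settings.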

Reliable, Redundant, Durable Database

Database reliability is a serious concern. PostgreSQL, or any database software for that matter, is deployed on hardware, and it is the hardware that can cause reliability issues. For example, faulty RAM or a bad hard disk can bring down a database or cause performance issues and downtime. In an on-premise infrastructure, such issues are caught through continuous monitoring by DBAs or admins, and the only applicable solution is to purchase new hardware.

DBaaS and deployments on the cloud take care of this issue by eliminating the need to monitor the hardware and avoid the efforts and cost involved in replacing faulty hardware. Vendors provide efficient ways to back up databases, create replication instances, and back up transaction logs for databases deployed on the cloud.

For a production database, one of the important challenges is being able to recover the database to a certain point in time after a disaster. Vendors that provide DBaaS take care of this requirement and make it easy for users: performing point-in-time recovery (PITR) during a disaster takes just a few clicks in the UI, which makes the database more redundant and durable.
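Conceptually, a PITR restore starts from the newest base backup taken at or before the target time and then replays archived transaction logs forward to that point. A minimal Python sketch of the backup-selection step (the backup dates are invented for illustration):

```python
# Sketch: choosing the base backup for a point-in-time recovery (PITR).
# Restore begins at the newest base backup at or before the target time;
# archived transaction logs are then replayed up to the target.
from datetime import datetime

def pick_base_backup(backups, target):
    candidates = [b for b in backups if b <= target]
    if not candidates:
        raise ValueError("no base backup precedes the recovery target")
    return max(candidates)

backups = [datetime(2018, 1, 1), datetime(2018, 1, 8), datetime(2018, 1, 15)]
print(pick_base_backup(backups, datetime(2018, 1, 10)))  # the Jan 8 backup
```

A DBaaS hides exactly this bookkeeping: the user picks a timestamp in the console and the service locates the right backup and replays the logs.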

Geographically Distributed and Independent

Gone are the days when infrastructures were designed for vertical scaling. New generation techniques involve horizontal scaling and horizontal computing. What does horizontal scaling mean for database systems? It is all about slicing and dicing data across multiple machines horizontally to scale out.

When your users are distributed globally and your applications are accessed from various locations and countries, you must build infrastructures in several distributed regions. Building such environments is expensive, but it can be simplified by using the services provided by cloud vendors. DBaaS allows users to deploy their databases across several regions; most vendors provide infrastructure in various locations distributed globally, and businesses can choose database services distributed across regions with ease.

No Single Points of Failure

In the vast topology of an infrastructure consisting of a web application or an application connecting to a database, many components can be single points of failure. For example, a router, a switch, a database server, a hard disk, RAM, or an application server can all cause failures or downtime.

Several cloud vendors enable users to configure environments that prevent single points of failure by providing sufficient redundancy and backup mechanisms. These systems are generally called high availability features. DBaaS provides APIs and dashboard options that help configure database high availability.

Integrated into Existing Systems

Cloud vendors make it easy for developers and database admins to integrate their applications or environments into databases deployed using DBaaS. Most cloud vendors provide a way to deploy the DBaaS with no modifications needed on the application environment. Developers can just use the appropriate database drivers, which enable them to talk to the databases and perform their routine tasks. Moving from a database deployed on commodity hardware to a database on the cloud is no longer a tedious task. The options available for migrating and the steps involved in migrating to DBaaS are discussed in further chapters.

Help Distributed Teams Work and Collaborate More Efficiently

Developers, admins, businesses, and testing teams work from various locations across the world. It is important that they collaborate and continue structured, incremental code development. Development and testing teams should be able to deploy their code changes and test cases on development databases and revert changes at different points in time. Most cloud vendors enable users to create snapshots of their databases, and it is easy to create databases from these snapshots. This obviates the time and effort needed to involve a DBA.

Features of DBaaS

There are always feature-related questions about using DBaaS:
  • How secure is the service?

  • What level of availability does the vendor provide?

  • Is it scalable?

DBaaS delivers a powerful on-demand database platform that provides an efficient way to satisfy all the needs of an organization. These features are covered in the following sections.

Provisioning

DBaaS provided by most cloud vendors gives users such as DBAs and developers easy provisioning mechanisms.

Users get on-demand provisioning and self-service portals that enable user-friendly, rapid provisioning. Without these, organizations can spend days provisioning a database server.

Provisioning involves the following:
  • Allocating a server with the CPU, memory, and disks requested.

  • Installing an operating system.

  • Adding hard disks as requested.

  • Partitioning the disks.

  • Installing database software and any additionally requested software.

  • Configuring the database instance.

  • Managing host-based access control.

  • Managing encryption of hard disks.

  • Distributing the data directory and the transaction logs to multiple hard disks.

  • Creating users with appropriate sets of privileges.

All these stages can be automated using DBaaS.
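The stages above can be sketched as an ordered checklist that a DBaaS collapses into a single provisioning call. In this sketch, `provision()` is a hypothetical stand-in for a vendor API, and the step names simply restate the list.

```python
# Sketch: the manual provisioning stages expressed as an ordered checklist
# that a DBaaS automates behind one API call. provision() is hypothetical.
PROVISIONING_STEPS = [
    "allocate server (CPU, memory, disks)",
    "install operating system",
    "attach and partition disks",
    "install database software",
    "configure the database instance",
    "set up host-based access control",
    "encrypt storage volumes",
    "separate data directory and transaction logs",
    "create users with appropriate privileges",
]

def provision(request):
    """Simulate a one-call DBaaS provision by completing every step."""
    return [f"done: {step}" for step in PROVISIONING_STEPS]

for line in provision({"cpus": 4, "ram_gb": 16}):
    print(line)
```

Manually, each of these steps is a separate task (and often a separate team); the value of DBaaS is that the whole list runs as one automated workflow.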

DBaaS allows users to create a database service with the appropriate number of CPUs, amount of RAM, and hard disks in the first phase. Hardware resources can be limited per customer and upgraded on demand. If a user wants to modify server capacity to satisfy growing transaction volume, it is very easy to upgrade or downgrade server resources on demand through dashboards.

Self-service portals enable users to create database services on the fly. You cannot choose the underlying operating system of the DBaaS. If you are particular about the operating system, you must choose IaaS.

For example, you get an option to choose your own operating system while building an EC2 instance in Amazon but not an RDS instance.

You can increase the disk space and choose the type of storage for your DBaaS. Most vendors support on-demand upgrades to storage capacity, which in turn gives you more IOPS. However, you may not be able to partition the disks or select physically partitioned disks to balance IO across multiple disks while using DBaaS.

While choosing the database type, you can select a supported version of the database software and extensions that help you examine PostgreSQL's diagnostic data. Many cloud vendors provide APIs for automated provisioning and DevOps-style orchestration of DBaaS, along with APIs for monitoring and managing services. A few cloud vendors also provide free dashboards showing underlying CPU and memory utilization, disk IO, replication lag, and much more. DBaaS has a few limitations, so customers will choose an instance or a virtual server if their requirements are not met.

The following list describes a few of the features that may not be available with DBaaS and could be a great concern while using DBaaS.
  • Choosing multiple disks to redirect logs, transaction logs, and data files. Separating storage volumes is common practice to decrease IO bottlenecks and improve database performance.

  • If disk IO is a concern due to the previous limitation, you may need to purchase more disk space to get more IO.

  • Additional storage space that is not used by the database cannot be used for any other purpose, such as storing HTML files or backups.

  • You do not get a choice to install most of the Open Source extensions available while using DBaaS. In a way, you are at the mercy of the cloud vendor to provide extensions.

  • You cannot create tablespaces while using DBaaS. It is common practice to create tablespaces that span different storage volumes to improve database performance; you might want to partition frequently accessed transactional tables and redirect the partitions to multiple tablespaces.

Administration

As a user, you may not want to deal with sophisticated platforms that require a lot of knowledge to implement DBaaS or an infrastructure on the cloud. Gone are the days when you implemented a database using numerous manual steps. Users like to view performance and diagnostic data, and to set up monitoring for it, in a few clicks.

Every well-known cloud vendor provides APIs that not only help when provisioning a database environment but also enable monitoring and alerting in a few clicks. Vendors provide dashboards that let users view all the performance data for diagnostics in single-page or multi-page views. Everyday DBA activities include database cloning and refreshes to enable functional and performance testing; simplified cloning procedures put refreshes and cloning just a few clicks away. Hence, the time consumed in refreshing development, testing, QA, and performance environments can be reduced through these APIs and refresh options.

Several maintenance operations can be configured automatically without an impact on the application or on the users connected to the database while using DBaaS.

Monitoring

Most monitoring tools used with database environments require great effort from DBAs to set up and configure. An admin has to build a monitoring server, configure monitoring for all the database environments, and then manage that server; if the monitoring server goes down, nothing continues monitoring the database environment. With DBaaS, you get several monitoring and alerting mechanisms out of the box. This saves time and gives you an efficient monitoring system, as most of the monitoring checks are derived from the most frequent customer requests.

High Availability

An important question raised by many users and businesses is whether a DBaaS solution enables options for high availability.

This is one of the important ways to avoid downtime and loss during disasters. DBaaS does indeed provide several high availability features in the case of disasters.

Most customers look for options that provide seamless failover during disaster recovery. In fact, it may not be the DBaaS itself but the cloud vendors that allow you to configure load balancers and other features that make a system highly available automatically and seamlessly.

Scalability

The issue of scalability is raised by many customers who build their data warehouses and critical transaction systems on the cloud using DBaaS. The massive growth of data is a very big issue: several petabytes of data are generated every day. In a world with millions of transactions in critical transaction systems and terabytes of data in data warehouse environments, scalability is a burning need, and the same concern applies to DBaaS. However, cloud vendors that provide DBaaS are well equipped with features that enable scalability. You can still partition your database tables and perform archiving as usual, and you can add hardware incrementally as requirements grow rather than over-provisioning in advance. Most cloud vendors provide load balancers that allow you to distribute your transactions across multiple database services.
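Horizontal scaling of this kind usually rests on a deterministic routing rule that assigns each row to one of several database services. A minimal hash-sharding sketch in Python, where the shard count and keys are arbitrary assumptions:

```python
# Sketch: hash-based horizontal partitioning, routing rows across
# multiple database services. Shard count is an illustrative assumption.
import hashlib

def shard_for(key, num_shards=4):
    """Route a row to a shard by hashing its key deterministically."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards

rows = ["customer-1", "customer-2", "customer-3", "customer-4"]
placement = {r: shard_for(r) for r in rows}
print(placement)
```

Because the routing is deterministic, any application server computes the same shard for the same key, which is what lets reads and writes be spread across services without central coordination.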

Security

Database technology has advanced to the level where access management can be handled at the cluster/instance and server levels as well as the database level. A user is created inside the database and has to be assigned the required roles and privileges to connect and perform actions on the database. This has advanced to host-based authentication techniques in Open Source databases such as PostgreSQL and MySQL, and further to firewall- and network-level rules governing which users can connect to a database. Most vendors (like Amazon) enable organizations to design infrastructures that meet SOX, HIPAA, PCI, and other security compliance requirements. That is the reason many financial and security-sensitive, data-driven companies choose DBaaS.

Cloud vendors allow users to configure firewall policies and encrypt data in motion and data at rest.

Cloud Versus On-Premise Computing

On-premise computing is the type of computing in which all the computing resources are accessed and managed on premises. The organization that owns the premises incurs all the costs, which translates to diminished returns in the long run.

In cloud computing, the pool of resources is accessed online. It is a usage on-demand service and is perceived as a utility—you pay as you go.

The Pros and Cons of Cloud and On-Premise Computing

Let’s look at each approach in turn.

On the Cloud Pros and Cons

  • Databases can be deployed on the cloud using two types of services: DBaaS and IaaS. With DBaaS, a few vendors give you no choice of operating system. However, when you subscribe to IaaS, you can install and configure the OS, tools, and software of your choice on the database server.

  • No manpower is needed to manage your hardware. No extra cost is involved to replace hardware faults and no insurance or warranty cost is paid for your hardware.

  • It’s easy to set up: a quick registration with the cloud vendor, paying only for the time and resources you use. This makes it easily affordable.

  • Most of the software updates, especially for DBaaS, are managed completely by the vendor.

  • These services may suffer security breaches or man-in-the-middle attacks if they’re not configured properly. You should be aware of how to secure your database on the cloud.

  • There may be vendor lock-in, as DBaaS offerings are vendor-specific.

  • You may be limited to the list of services you can use.

  • Any damage or outage at the cloud vendor may leave you helpless. You need to wait until the vendor delivers a fix.

  • Monitoring can be enabled and managed by the cloud vendor for you. You do not have to configure or set up your own monitoring solutions on your infrastructure.

On-Premise Pros and Cons

  • You are free to install any software you want. Software licensing costs are managed by you and your team.

  • You need to manage your hardware, which may involve additional manpower.

  • To achieve disaster recovery capabilities, you may have to build the same hardware in a different data center.

  • You need to manage your software updates or hardware break fixes, which may involve more time during emergencies.

  • You have complete control over your data and have an estimate of time to fix an issue during a disaster.

  • You may have to manage your data backups and recovery .

  • You have to enable your own monitoring tools and that may involve additional costs .

Should You Move Your Databases to the Cloud?

Should your company move your data and infrastructure to the cloud or keep it on-premises?

Current IT organizations talk about three pain points.
  • They need to save on cost.

  • They need to improve the developer’s productivity in a cost-effective way.

  • They need to retain the skillset that they already have.

In the long run, cloud computing is cost effective compared to the on-premises approach. It reduces the overall capital expenditure while maximizing efficiency and productivity. Let’s compare the cost models next.

On-premises cost model :
  • Initial investment for procurement of hardware and software for a projected/anticipated load.

  • Scaling up requires investing more on hardware and other data center infrastructure.

  • Hardware refresh and upgrade investments at the end of life.

  • Software licenses.

Cloud-based cost model :
  • Subscription based (pay what you use).

  • No hardware related costs.

  • No dedicated software licenses fees.
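The trade-off between the two cost models can be made concrete with a toy break-even calculation. All dollar figures below are invented for illustration and are not real vendor pricing:

```python
# Sketch comparing the two cost models over time.
# Dollar amounts are made-up assumptions, not real pricing.
def on_premises_cost(months, capex=100_000, monthly_opex=2_000):
    """Up-front hardware/software investment plus running costs."""
    return capex + monthly_opex * months

def cloud_cost(months, monthly_subscription=4_500):
    """Pure subscription: pay only for what you use, no CapEx."""
    return monthly_subscription * months

def breakeven_month(max_months=120):
    """First month at which cloud spend exceeds on-premises spend."""
    for m in range(1, max_months + 1):
        if cloud_cost(m) > on_premises_cost(m):
            return m
    return None

print(breakeven_month())
```

Under these made-up numbers the cloud subscription overtakes the on-premises total in month 41; the real point is that cloud shifts spending from CapEx to OpEx, so the comparison always depends on how long the hardware would actually be used.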

Cloud Vendors Available for PostgreSQL

There are several vendors in the market who provide database services on the cloud. This book describes solutions from carefully selected vendors who are well known in the market for PostgreSQL on the cloud.

Here is the list of vendors we discuss in future chapters.
  • Amazon (Chapter 3)

  • Rackspace (Chapter 4)

  • Google Cloud (Chapter 5)

  • Microsoft Azure (Chapter 6)

We talk about each vendor briefly in the following sections.

Amazon

Amazon Web Services, also known as AWS , is a well-known cloud hosting platform that provides services to several databases.

AWS is growing at a rapid pace. The Amazon public cloud is often chosen as the platform for deploying applications and databases on the cloud, and it is the largest cloud computing platform on the market. Amazon offers thousands of services, more than the other major cloud providers combined.

AWS is distributed across 16 geographic regions and is expected to add five more in North America, Europe, and Asia. To overcome the major infrastructure challenges of reliability and durability, AWS provides 44 availability zones. Customers can deploy across multiple availability zones, which means they can build highly available database environments.

Amazon supports two major platforms on which PostgreSQL can be deployed.
  • Amazon Relational Database Service (Amazon RDS)

  • Amazon Elastic Compute Cloud (Amazon EC2 )

Amazon RDS

Amazon RDS is a well-known and widely implemented DBaaS solution for PostgreSQL, built to make deploying and setting up PostgreSQL as a DBaaS platform faster and easier. It also allows customers to scale their databases on demand.

Amazon RDS makes PostgreSQL deployments easy to set up and scale in the cloud: you can deploy scalable, resizable PostgreSQL instances cheaply and on demand. The AWS RDS Console helps administrators and developers operate and manage their cloud platforms with features such as:
  • Provisioning the database service including software installation

  • Seamless software upgrades and patches

  • Enable replication or high availability for PostgreSQL with a few clicks

  • Cost-efficient and on-demand resizable hardware capacity

  • An Amazon CloudWatch dashboard that stores diagnostic data for analysis of the RDS instance

Amazon RDS instances restrict access to the operating system. Once an instance is provisioned, a user cannot manage the OS and is limited to the options available on the dashboard. PostgreSQL has several extensions that need to be installed or compiled to be used effectively; because RDS restricts OS access, users are limited to the extensions that RDS supports.
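To make the provisioning options concrete, the following sketch assembles the kind of parameters you might pass to boto3's `rds.create_db_instance` call. The identifiers, sizes, and password are placeholder assumptions, and no API call is actually made here.

```python
# Sketch: parameters for provisioning PostgreSQL on RDS via boto3.
# All identifiers and sizes are placeholders; no AWS call is made.
params = {
    "DBInstanceIdentifier": "mypg-instance",
    "Engine": "postgres",
    "DBInstanceClass": "db.t2.medium",
    "AllocatedStorage": 100,          # GB; more storage also buys more IOPS
    "MasterUsername": "pgadmin",
    "MasterUserPassword": "change-me",
    "MultiAZ": True,                  # high availability in one flag
    "StorageEncrypted": True,         # encryption at rest
}

# With credentials configured, the actual call would look like:
# import boto3
# boto3.client("rds").create_db_instance(**params)
print(sorted(params))
```

Note how high availability and encryption at rest, each a significant project on premises, reduce to single boolean flags in the request.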

Amazon EC2

An Amazon EC2 instance is the best choice for users who want to go beyond the limitations of an RDS instance. IOPS in RDS are provisioned and depend on the storage chosen, so users must purchase more storage to get more IOPS, and that additional storage cannot be used for any other purpose. These limitations do not apply to EC2: customers can use additional storage to store application-related data or backups, and they can install any extension or software as needed. On the other hand, a few features available on the RDS console may not be available to EC2 users.

The Amazon cloud is well known for satisfying the compliance requirements of a secured cloud hosting environment. Implementing security features such as encryption often costs customers a performance hit and a significant expense. On AWS, however, customers have a wide range of features they can use to encrypt and secure data in motion and data at rest with a small or negligible performance hit. Amazon is also known for its network security, which enables users to configure firewalls and inbound/outbound rules for every RDS or EC2 instance. Security and encryption are discussed further in Chapter 3.

Rackspace

Rackspace is a managed cloud computing company that operates data centers across the globe. As you know, every cloud has to be managed by someone. Rackspace offers services like managed hosting, managed cloud, and application services with fanatical support. It provides public, private, hybrid, and multi-cloud solutions under managed cloud services. The platforms that Rackspace supports for these services are:
  • Dedicated servers

  • VMware

  • Amazon Web Services

  • Google Cloud Platform

  • Microsoft

  • OpenStack

  • Pivotal Cloud Foundry

Rackspace offers fully managed dedicated servers and physical firewall configurations. You can look at the detailed dedicated configurations that Rackspace can deploy at https://www.rackspace.com/dedicated-servers .

For the AWS platform, it provides ongoing architecture design, security, migration, and recurring optimization. Rackspace is the first premier managed services partner for Google Cloud Platform and it provides ongoing optimization and fanatical support for GCP.

You can run PostgreSQL on Rackspace in two ways—via the Managed Cloud and via the Private Cloud. More details on Managed Cloud are found at https://www.rackspace.com/cloud and more details on Private Cloud can be found at https://www.rackspace.com/cloud/private .

You can compare pricing details at https://www.rackspace.com/openstack/public/pricing .

Google Cloud

Google, well known for its search engine platform, has built a cloud platform that suits databases like PostgreSQL. As with other vendors, Google includes many features, especially security related, that are very important to any enterprise. On its cloud platform, Google promises to keep the same security model it has implemented for applications like Gmail and Google Search. On top of its excellent approach to network security and its default SSL policies, Google's data centers are physically secured with a layered security model, and Google builds its hardware, networking, and software stack with security in mind.

Google introduced transparent maintenance for Google Compute Engine in December 2013. Since then, most software updates, hardware break-fixes, and other issues have not required downtime for the database or applications. This is addressed by an innovative feature known as Live Migration, which allows Google to handle the following issues with no impact on its customers.
  • Regular infrastructure maintenance and upgrades

  • Network and power grid maintenance in the data centers

  • Bricked memory, disk drives, and machines

  • Host OS and BIOS upgrades

  • Security-related updates, with the need to respond quickly

  • System configuration changes, including changing the size of the host root partition, for storage of the host image and packages

If you want to try the Google Cloud, check out the price calculator, which helps you build and understand the cost of spinning up your VM: https://cloud.google.com/products/calculator/ .

Google currently has a hard limit of 64 vCPUs per virtual machine, along with a cap of 6.5GB of memory per vCPU. gcloud compute is the command-line tool, and it has many options for managing your Google Compute Engine instances on top of the Compute Engine API.
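These limits can be expressed as a small validation check. The following sketch is a hypothetical helper, not part of any Google SDK; it simply rejects machine specs that exceed the caps mentioned above.

```python
# Sketch: validating a Compute Engine machine spec against the limits
# the text mentions (64 vCPUs max, 6.5GB of memory per vCPU).
def validate_machine(vcpus, memory_gb, max_vcpus=64, gb_per_vcpu=6.5):
    if vcpus > max_vcpus:
        raise ValueError(f"vCPU count exceeds the {max_vcpus}-vCPU limit")
    if memory_gb > vcpus * gb_per_vcpu:
        raise ValueError("memory exceeds the per-vCPU cap")
    return True

print(validate_machine(32, 208))  # 32 vCPUs at the 6.5GB/vCPU cap
```

A request for, say, 96 vCPUs or for 32GB of RAM on a 4-vCPU machine would be rejected by this check, mirroring what the provisioning console enforces.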

Google Cloud platform services are currently available in 12 regions and 36 zones in locations across North America, Europe, and Asia.

Here is a list of regions where GCP has a presence.
  • Oregon

  • Iowa

  • N Virginia

  • S Carolina

  • London

  • Belgium

  • Frankfurt

  • Sao Paulo

  • Tokyo

  • Taiwan

  • Singapore

  • Sydney

It is also building new regions in Los Angeles and Montreal.

Every region consists of one or more zones. High availability is a major concern for any business, and Google allows customers to build their database infrastructures across multiple zones to allow failover during disasters. Google also provides features such as load balancing for customers who want to redirect some of their read traffic to disaster recovery sites.

Google provides APIs, command-line tools, and a friendly dashboard to all its customers to manage their virtual machines.

Microsoft Azure

Microsoft, the multinational technology company best known for software products such as Microsoft Office, introduced its cloud platform in October 2008. It is an emerging cloud platform that could become a great competitor to AWS. Azure provides a wide range of cloud services related to compute, analytics, storage, and networking.

Azure is spread across 36 regions around the world, almost equivalent to the number of regions provided by Amazon. Azure also provides DBaaS for the PostgreSQL Open Source database, called Azure Database for PostgreSQL. Users can create a PostgreSQL database in minutes with just a few clicks on its truly user-friendly self-service portals.

Azure provides the following features to attract customers with its data security model .
  • Multi-factor authentication

  • Encryption of data in motion and data at rest

  • Support for encryption mechanisms such as SSL/TLS, IPsec, and AES

  • Key vault service

  • Identity and Access Management

Azure offers 57 compliance offerings to help users comply with their national, regional, and industry-specific requirements governing the collection and use of individual data.

If you want to choose an Azure Cloud but are not sure if your compliance requirement is met, use the following URL to validate.

https://www.microsoft.com/en-us/trustcenter/compliance/complianceofferings

Azure allows you to select from 20 cores to 10,000 cores per subscription. You can increase the cores or your quota using the self-service dashboards given by Azure upon subscription .

This pricing calculator helps you calculate the cost of your subscription: https://azure.microsoft.com/en-us/pricing/calculator/

Azure Database for PostgreSQL is a fully managed database service that helps users deploy apps with ease. This service has built-in high availability with no additional costs for extra configuration and replication.

Features like automatic backup help users achieve point-in-time recovery for up to 35 days. CPUs and IOPS are provisioned for this service based on your prediction and subscription.

Microsoft is expected to launch a premium service for Azure Database for PostgreSQL shortly.

IaaS

IaaS is one of the services offered by cloud vendors along with DBaaS. IaaS stands for Infrastructure as a Service. Cloud vendors provide consoles through which users can self-provision and start a virtual machine/server on the cloud. These machines are managed by the vendors and do not require users to perform any hardware fixes or maintenance.

For example, Amazon provides EC2, S3, and several other services that come under IaaS. Google provides Google Compute Engine. Likewise, Microsoft Azure and Rackspace provide virtual machines on the cloud.

Many large, small, and medium-sized business organizations build their own infrastructure at several locations. Multiple locations enable them to improve performance, decrease latency for their globally distributed customers, and achieve high availability: if one data center goes down, a standby site can take over its role.

Consider, for example, an organization distributed over 40 locations across the globe. Given the latency between regions, it is almost impossible to build individual data centers in every region. This is a pressing need for any organization planning to expand its business all across the world.

Cloud vendors take this fact very seriously. They build their data centers in several regions across the globe, and most expand their services to almost every continent. They set up high availability zones within every region, with latencies of only a few milliseconds between them. Large organizations simply subscribe to the service and the vendors manage the infrastructure.

Physical hardware used by vendors to provide IaaS may not be dedicated to an organization unless requested. You may have to explicitly request a dedicated server for your organization; that way, all the cloud virtual machines on that physical server are yours. Not all cloud vendors take this approach, and a dedicated server may not be a pressing need for every organization. You may instead subscribe to an IaaS that is hosted on a physical server shared by several other organizations, which makes the subscription even cheaper. To attract more customers and revenue, cloud vendors continue to expand their data centers across the globe. This is why we see a notification of a new region being added almost every time we log in to a vendor's website.

We have already discussed the features of DBaaS such as provisioning, monitoring, high availability, scalability, and security. These same features are considered major features of IaaS offered by cloud vendors.

Migrating to the Cloud

The previous sections introduced DBaaS and IaaS on the cloud in detail and discussed a few of the vendors with a great customer base subscribing to their cloud solutions.

This section explains what is involved in moving to a DBaaS or an IaaS. Consider these situations that could make you think about migrating to the cloud.
  • Huge cost involved in maintaining the infrastructure and the disaster recovery sites.

  • Huge licensing costs involved in purchasing the software licenses.

  • Regular/periodic maintenance and man hours involved in hardware fixes and upgrades.

  • Issues with scalability.

  • Cost involved in monitoring the infrastructure and achieving more 9s of availability.

  • The option to keep instances up when needed and terminate them when not needed, to avoid unnecessary cost.

As discussed, migrating to the cloud provides many benefits in terms of cost and availability. Your organization can thus concentrate more on improving its business than on managing its infrastructure and tuning it for better performance.
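The "more 9s of availability" mentioned above translates into a concrete downtime budget. As a quick back-of-the-envelope sketch (plain Python, illustrative only), each extra nine divides the allowed annual downtime by ten:

```python
# Illustrative only: the annual downtime budget for "N nines" of availability.
MINUTES_PER_YEAR = 365 * 24 * 60

def allowed_downtime_minutes(nines: int) -> float:
    """Downtime budget per year for an availability of N nines (e.g., 3 -> 99.9%)."""
    availability = 1 - 10 ** (-nines)
    return MINUTES_PER_YEAR * (1 - availability)

for n in (2, 3, 4, 5):
    print(f"{n} nines ({(1 - 10 ** (-n)) * 100:.3f}%): "
          f"~{allowed_downtime_minutes(n):.1f} minutes/year")
```

Three nines allow roughly 526 minutes of downtime a year, while five nines allow barely five; this is why each additional nine costs disproportionately more to achieve on your own infrastructure.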

Before Migrating to the Cloud

Let’s say you have a PostgreSQL production database cluster hosted at your own data center (on-premise). You might have already thought about high availability while using your PostgreSQL environment and built one or more slaves for high availability. You may have also installed or configured tools that help you achieve automatic failover and seamless application failover at your site.

Most organizations plan to replicate their existing architecture on the cloud and tune their architecture design at later stages. A few organizations may plan to redesign their architecture while moving to the cloud, learning from current issues. For example, you may not have a load balancer in your existing architecture. A load balancer can redirect your application reads (such as reporting queries) to master and slave PostgreSQL instances using a round robin or least connection count algorithm. Likewise, a load balancer can also be used for automatic failover, by letting a failover service call the load balancer API to redirect all new connections to the promoted slave (the current master that was previously a slave). This is just a basic example.
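The two balancing algorithms mentioned above can be sketched in a few lines. This is an illustrative toy model, not the implementation of any particular load balancer, and the backend names are hypothetical:

```python
import itertools

# Hypothetical backend names; a real setup would list actual PostgreSQL hosts.
backends = ["pg-master", "pg-slave-1", "pg-slave-2"]

# Round robin: cycle through the backends in a fixed order.
rr = itertools.cycle(backends)

def round_robin() -> str:
    return next(rr)

# Least connection count: pick the backend with the fewest active connections.
active_connections = {b: 0 for b in backends}

def least_connections() -> str:
    target = min(active_connections, key=active_connections.get)
    active_connections[target] += 1  # a real balancer decrements on disconnect
    return target

print([round_robin() for _ in range(4)])
# -> ['pg-master', 'pg-slave-1', 'pg-slave-2', 'pg-master']
```

Round robin spreads connections evenly regardless of load, while least-connections favors whichever backend is currently emptiest, which often behaves better when query durations vary widely.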

Cloud vendors offer you several services that enable you to achieve load balancing, high availability, etc. Hence, you may want to redesign your existing environment on-premises by making the most of the services offered on the cloud. Be sure to understand the existing pain points in your database infrastructure design, be aware of the services that could help you overcome the existing issues, and create a test environment using the new architecture design .

Planning Your Infrastructure on the Cloud

Most organizations follow several approaches when migrating their PostgreSQL databases to the cloud. The following approaches are all legitimate, depending on your needs:
  1. Subscribe to the same number of CPUs, the same amount of RAM, and the same volume of disks as your existing on-premise PostgreSQL database.

  2. Subscribe to bigger hardware than your existing environment, either because it is a lot cheaper or on the assumption that performance may degrade on the cloud.

  3. Learn from your existing database usage metrics and plan hardware efficiently on the cloud.

  4. Along with point 3, test the performance metrics by generating almost double the peak application traffic on the hardware created on the cloud, as a phase of performance testing. If performance testing does not show satisfactory results, upgrade the hardware as needed.
You may follow any of these approaches, but it is always important to fine tune your database server and rightsize it by learning from its usage metrics over time.
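The fourth approach above amounts to a simple headroom test: can the candidate instance absorb roughly twice the observed peak traffic? A minimal sketch, with invented example numbers rather than recommendations:

```python
# Illustrative sizing check, following approach 4: verify that the cloud
# instance can absorb roughly double the observed on-premise peak load.
# All figures below are made-up example numbers, not recommendations.

def has_headroom(peak_usage: float, capacity: float, factor: float = 2.0) -> bool:
    """True if capacity covers `factor` times the observed peak usage."""
    return capacity >= peak_usage * factor

observed_peak_tps = 1800       # peak transactions/second measured on-premise
candidate_instance_tps = 4000  # throughput the candidate instance sustained in testing

if has_headroom(observed_peak_tps, candidate_instance_tps):
    print("Candidate instance passes the 2x peak-traffic test")
else:
    print("Upgrade the hardware and re-test")
```

The same check applies to CPU, memory, and IOPS individually; an instance should pass on every dimension, not just on aggregate throughput.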

After learning from several such migration experiences, here is how we recommend you start your migration to the cloud. Let’s divide migrations into two types.
  • Moving from other databases like Oracle to PostgreSQL on the cloud.

  • Moving from PostgreSQL on-premises to the cloud.

In this book, we concentrate on beginning PostgreSQL on the cloud, not on migration from other databases to PostgreSQL. That means we discuss only the second point, assuming an existing PostgreSQL database on-premise or on bare metal.

Consider these points before moving to the cloud:
  1. Understand the database's peak transaction time/hour/minute of the day, peak transaction day of the week, and peak week of the month and of the year.

  2. Take a snapshot of the OS metrics using a tool like sar. The ability to draw a pattern from the historical data drives you toward choosing the right server specifications for your database on the cloud.

  3. Compare the load averages over a certain period and see whether the load has been increasing with database growth or over time.

  4. Check whether the CPU, memory, and IO utilizations are increasing gradually or staying under-utilized even at peak transaction times.

  5. Enable snapshotting tools on the PostgreSQL database and look at the database traffic during the peak utilization times of the server resources.

  6. Fix any contention issues caused by the application logic. Identify such locking SQLs or application logic using snapshotting solutions.

  7. If the existing on-premise database server shows a pattern of increasing resource utilization, estimate a server architecture that could accommodate that transaction load for a year. As you already know, you can scale up resources such as CPU, RAM, disk space, and IOPS.

  8. Note whether there are any historic tables, or tables with lots of historic data, not being used by the application. Such data can be safely archived to avoid SQLs scanning the data blocks containing older data. If you find it difficult to archive such data at this point, plan for partitioning on the DB server being created on the cloud. PostgreSQL allows you to enable partitioning on your tables seamlessly.

  9. When you create your database service on the cloud, be sure to create all the extensions that help you deep dive into performance data and tune your database to an optimal level.

  10. As discussed, have at least one slave for high availability and use services such as a connection pooler or load balancer to effectively use your slaves for reads or reporting queries. Connection pooling is discussed in Chapter 11.

     
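Point 7 in the list above, estimating a server architecture that can carry a year of growth, can be approximated by fitting a straight line to historical peak utilization and extrapolating. A rough sketch with invented sample data (a real sizing exercise would use your own sar or monitoring history):

```python
# Rough capacity projection: fit a least-squares line to monthly peak CPU
# utilization samples and extrapolate 12 months ahead. The data is invented.
monthly_peak_cpu = [38, 41, 43, 46, 50, 52]  # % CPU at peak, last 6 months

def project(samples, months_ahead):
    """Linear extrapolation of the sample trend, months_ahead past the last sample."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) / \
            sum((x - mean_x) ** 2 for x in xs)
    return mean_y + slope * (n - 1 + months_ahead - mean_x)

in_a_year = project(monthly_peak_cpu, 12)
print(f"Projected peak CPU in 12 months: {in_a_year:.0f}%")
if in_a_year > 80:
    print("Plan a larger instance (or scale up) before reaching this level")
```

A linear fit is deliberately crude; if your growth is seasonal or accelerating, extend the model accordingly, and remember that cloud instances can be scaled up later, so the projection only needs to be roughly right.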

Tools and Extensions

These tools and extensions can help you rightsize your PostgreSQL server on the cloud. Rightsizing in this chapter refers to choosing the server or DB resources that suit your database traffic efficiently. You may have already deployed your PostgreSQL DB on the cloud, or you may be planning to do so in the future. Let’s say you chose an instance type that you feel is not rightsized. Here are the methods and tools that can help you efficiently rightsize PostgreSQL on the cloud.
  • sar : Can be used to snapshot your Linux server resource utilizations.

  • pgPulse : A snapshotting tool that helps PostgreSQL gather and store its history. Databases such as Oracle have a feature that enables you to see the activity in the database at a certain point in time; this helps you be more predictive than reactive while managing PostgreSQL databases. Unlike with other databases, this tool allows you to capture historic information in a remote centralized database, so you can avoid heavy write IO on the production database where the data is being collected.

  • This tool helps you query the historic data and understand the SQLs that have performed badly, tables/indexes accessed, locks acquired, etc., on the PostgreSQL database at a certain time. See https://bitbucket.org/avinash_vallarapu/pgpulse .

  • Here’s a list of extensions to be created on the PostgreSQL database on the cloud:
    a. pg_stat_statements: Historic data for SQLs that have hit the DB server, with their total resource utilizations and average execution times.

    b. pg_buffercache: List of tables in the shared buffers (PostgreSQL memory area), along with the amount of memory used by them.

    c. pgstattuple: Amount of bloat/fragmentation in a table.

    d. pg_repack: A tool used to perform online table maintenance.

       
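As an illustration of how the pg_stat_statements extension listed above is typically queried, here is a sketch of a "top queries by total execution time" report. The column names follow PostgreSQL 13 and later (older versions use total_time instead of total_exec_time), and the connection handling assumes a DB-API driver such as psycopg2:

```python
# A sketch of querying pg_stat_statements for the most expensive statements.
# Column names are from PostgreSQL 13+; adjust them for your server version.
TOP_QUERIES_SQL = """
SELECT query,
       calls,
       total_exec_time,
       total_exec_time / calls AS mean_time_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
"""

def top_queries(conn, sql: str = TOP_QUERIES_SQL):
    """Run the report against an open DB-API connection (e.g., from psycopg2)."""
    with conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetchall()
```

Running this report before and after migration gives you a like-for-like view of which statements dominate your database's resource consumption.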
Using all these tools, you can look into the following metrics to decide whether you have rightsized your PostgreSQL database:
  • CPU usage trend: The percentage of CPU being used and the maximum CPU used at any given time.

  • Memory usage: Amount of memory that is always free on the server. If you never have any free memory, including the cache, you need more RAM allocated to your instance, or there is a culprit in the database that is hogging all the server memory.

  • IO queue depth and IO wait: This indicates how many processes are waiting for IO and the amount of time they waited to get IO.

  • Temp usage: Heavy temp usage indicates that your application is badly designed or that your server’s memory is undersized.

To conclude, you need to plan for future growth while deploying a database on the cloud and have all the tools and extensions in place that enable you to efficiently tune your PostgreSQL instance. A rightsized PostgreSQL instance on the cloud is more efficient and cost-effective.
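For example, the CPU usage trend discussed above can be extracted from sar text output. The sample below mimics sar's layout, which varies across sysstat versions, so treat the parsing as a sketch rather than a robust parser:

```python
# Hedged example: extract a CPU usage trend from `sar -u`-style text output.
# The sample mimics sar's layout; real output varies by sysstat version/locale.
sample = """\
12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all     22.10      0.00      4.30      1.20      0.00     72.40
12:20:01 AM     all     35.70      0.00      6.10      2.80      0.00     55.40
12:30:01 AM     all     48.90      0.00      7.50      3.90      0.00     39.70
"""

def cpu_busy_series(text):
    """Return %busy (100 - %idle) for each 'all'-CPU sample line."""
    series = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 9 and fields[2] == "all":
            series.append(round(100.0 - float(fields[-1]), 1))
    return series

busy = cpu_busy_series(sample)
print("CPU busy %:", busy, "peak:", max(busy))
```

Collecting this series over weeks, rather than from a single day, is what lets you distinguish a genuine growth trend from a one-off spike before you commit to an instance size.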

Summary

This chapter covered what database as a service is and who should think about using it. It also covered the business needs it serves best and explained the differences between on-premise databases and cloud databases. This chapter briefly described the major cloud vendors offering a PostgreSQL service, as well as the security features and limitations of each vendor. The next chapter covers the basic architecture of PostgreSQL, including installation, configuration, and limitations. It provides some basic commands to get started with PostgreSQL.
