Chapter 2. What’s New in Greenplum?

Technology products evolve over time. Greenplum forked from the mainline branch of PostgreSQL at release 8.2.15 but continued to add new PostgreSQL features. PostgreSQL kept evolving as well, so Pivotal began reintegrating Greenplum with PostgreSQL. The effort has two goals: bringing the useful features of later PostgreSQL releases into Greenplum, and contributing Greenplum-specific features back to PostgreSQL.

This process began in release 5 of Greenplum in 2017 and continues with release 6 of Greenplum in 2019.

What’s New in Greenplum 5?

New in Greenplum Version 5

Following is a list of the new features in Greenplum 5. See later sections of the book for more details on some of these features.

PostgreSQL introduced a data file format change in release 8.4. Pivotal’s goal of rejoining the PostgreSQL code line is a gradual process. In Greenplum 5, we achieved parity with PostgreSQL 8.4. That meant a migration of the data files. There are too many new features in this release to list them all; here are a few important ones:

R and Python data science modules

These are collections of open source packages that data scientists find useful. They can be used in conjunction with the procedural languages for writing sophisticated analytic routines.

New datatypes

JSON, UUID, and improved XML support.

Enhanced query optimization

The GPORCA query optimizer has increased support for more complex queries.

PXF extension format for integrating external data

PXF is a framework for accessing data external to Greenplum. This is discussed in Chapter 8.

analyzedb enhancement

Good query performance depends on understanding the size and data contents of tables. The analyzedb utility was enhanced to cover more use cases and to run faster.

PL/Container for untrusted languages

Python and R are untrusted languages because they contain OS callouts. As a result, only the database superuser can create functions in these languages. Greenplum 5 added the ability to run such functions in a container isolated from the OS proper, so superuser powers are not required.

Improved backup and restore and incremental backup

Enhancements to the tools used to back up and restore Greenplum. These will be discussed in Chapter 7.

Resource groups to improve concurrency

The importance of controlling queries in an analytic environment cannot be overstated. The new resource group mechanism is discussed in Chapter 7.

Greenplum-Kafka integration

Kafka has emerged as a leading technology for data dissemination and integration for real-time and near-real-time data streaming. Its integration with Greenplum is discussed in Chapter 8.

Enhanced monitoring with Pivotal Greenplum Command Center 4

Critical for efficient use of Greenplum is the ability to understand what is occurring in Greenplum now and in the past. This is discussed in Chapter 7.

What’s New in Greenplum 6?

New in Greenplum Version 6

This is a list of the new features in Greenplum 6. Some features are explored in more detail later in the book.

Greenplum 6 continued the integration of later PostgreSQL releases and is now fully compatible with PostgreSQL 9.4. Pivotal is on a quest to add more PostgreSQL compatibility with each new major release.

PostgreSQL 9.4 merged

Pivotal Greenplum has now integrated the PostgreSQL 9.4 code base. This opens up new features and brings in many performance improvements.

Write-ahead logging (WAL) replication

WAL is a PostgreSQL method for assuring data integrity. Though beyond the scope of this book, more information about it is located in the High Availability section of the Administrator Guide.

Row-level locking

Prior to Greenplum 6, updates to tables required locking the entire table. Locking only the affected rows lets updates to different rows proceed concurrently, which can improve performance by a factor of 50.

Foreign data wrapper

The foreign data wrapper API allows Greenplum to access other data sources as though they were PostgreSQL or Greenplum tables. This is discussed in Chapter 8.

PostgreSQL extensions (e.g., pgaudit)

The inclusion of PostgreSQL 9.4 code brings along many utilities that depend upon 9.4 functionality in the database. pgaudit is one such contributed tool.

Recursive common table expressions (CTEs)

CTEs are like temporary tables, but they only exist for the duration of the SQL statement. Recursive CTEs reference themselves and are useful in querying hierarchical data.
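As a minimal sketch of a recursive CTE over hierarchical data, the following Python example uses the standard-library sqlite3 module as a stand-in database, since SQLite accepts the same WITH RECURSIVE syntax. The employee table, its columns, and the reporting_chain function are hypothetical names chosen for illustration. Against Greenplum 6 the same SQL would be sent through a PostgreSQL client driver, whose parameter placeholder style may differ.

```python
import sqlite3


def reporting_chain(conn, root_id):
    """Return (name, depth) for root_id and everyone reporting up to it."""
    query = """
        WITH RECURSIVE subordinates(id, name, depth) AS (
            -- Anchor member: the starting employee.
            SELECT id, name, 0 FROM employee WHERE id = ?
            UNION ALL
            -- Recursive member: the CTE references itself to walk down one level.
            SELECT e.id, e.name, s.depth + 1
            FROM employee e
            JOIN subordinates s ON e.manager_id = s.id
        )
        SELECT name, depth FROM subordinates ORDER BY depth, name
    """
    return conn.execute(query, (root_id,)).fetchall()


# Hypothetical sample hierarchy held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2)],
)

rows = reporting_chain(conn, 1)
```

The CTE exists only for the duration of the one statement. No temporary table is created or dropped.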

JSON, FTS, GIN indexes

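These features support JSON data, full-text search (FTS), and Generalized Inverted Index (GIN) indexes, which are specialized indexes for multicolumn and text-based searches. They are not discussed in this book.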

Vastly improved online transaction processing (OLTP) performance

Greenplum 6 now uses row-level locking for data manipulation language (DML) operations. This has an enormous impact on the performance of these operations, which often occur in ETL and data cleansing.

Replicated tables

Replicated tables have long been requested by customers. These are discussed in Chapter 4.

zStandard compression

zStandard is a fast lossless compression algorithm.

More efficient cluster expansion

Cluster expansion, though a rare event, requires computational and disk access resources to redistribute the data. A new algorithm minimizes this time.
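The text does not name the new algorithm. The Greenplum 6 release notes describe it as jump consistent hashing, under which adding a segment moves only the rows that now belong to the new segment. The following Python sketch illustrates that property with the published jump consistent hash function. It is not Greenplum's implementation, and the function name, key range, and segment counts are assumptions for illustration.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit integer key to a bucket in [0, num_buckets)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    key &= MASK_64
    bucket, jump = -1, 0
    while jump < num_buckets:
        bucket = jump
        # Advance a 64-bit linear congruential generator seeded by the key.
        key = (key * 2862933555777941757 + 1) & MASK_64
        # Jump ahead to the next bucket count at which this key would move.
        jump = int((bucket + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return bucket


# Hypothetical expansion from 4 segments to 5.
keys = range(10_000)
before = [jump_consistent_hash(k, 4) for k in keys]
after = [jump_consistent_hash(k, 5) for k in keys]

# Pairs of (old bucket, new bucket) for keys whose placement changed.
moved = [(old, new) for old, new in zip(before, after) if old != new]
moved_fraction = len(moved) / len(before)
```

In this sketch every key that moves lands in the new bucket (index 4), and roughly one fifth of the keys move. With a simple modulo placement, by contrast, most keys would change buckets when the bucket count changes.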

Greenplum on Kubernetes

Useful in on-premises cloud deployment; this is discussed in Chapter 3.

More optimizer enhancements

Other than performance improvements, these are mostly transparent to the user community.

Diskquota

The Diskquota extension provides disk usage enforcement for database objects. It sets a limit on the amount of disk space that a schema or a role can use.

Additional Resources

The Greenplum release notes contain information specific to the Greenplum version:

  • New features

  • Changed features

  • Beta features

  • Deprecated features

  • Resolved issues

  • Known issues and limitations
