APPENDIX A

Database Survey

In this appendix I’ve provided a short description of each of the major database systems covered in this book.

There are currently a huge number of database management systems in use. The DB-Engines site (http://db-engines.com/en/ranking) ranks database systems according to their popularity using website references, search frequency, job postings, and social network and Stack Overflow mentions. They track more than 250 systems, of which more than 100 achieve a non-trivial score (a score greater than 1.0 using their ranking scheme). Obviously I can’t include all of them here, and the choice of databases in this appendix, and within the book, represents nothing more than my subjective degree of interest in, and familiarity with, the various systems. Omission or inclusion shouldn’t be interpreted as representing endorsement or otherwise of the architecture or suitability for purpose of the system. Of the traditional relational vendors, I’ve included only Oracle; the coverage of its non-relational features in Chapter 12 argued for inclusion. However, many other relational vendors are implementing similar features (JSON integration, for instance).

Although I’ve argued that the era of database proliferation is coming to an end, to be followed by an era of consolidation, new database systems are still being launched fairly frequently. Some of these might represent major innovations and prove to be game-changing. However, I have generally concentrated only on databases that have been in active use for at least a few years: experience suggests that many of the latest “revolutionary” new databases will fail to gain traction.

The db-engines.com site provides an invaluable resource for tracking the relative popularity of databases. Their ranking system reflects more on Internet “chatter” than on revenues or enterprise adoption, but it does represent the single best resource for judging the relative level of mindshare for database systems.

Aerospike

Database Name: Aerospike

License/Company: AGPL, commercially distributed by Aerospike Corp.

Wikipedia description:

Aerospike is a flash-optimized in-memory open-source NoSQL database and the name of the company that produces it.

Vendor’s description:

High-performance NoSQL database delivering speed at scale.

My take:

Aerospike uses an architecture that assumes fast flash SSD as the persistence layer and uses memory to cache indexes only.

Data model:

Key-value

Transactional model:

Strictly consistent single-server transactions and eventually consistent across a cluster

Clustering:

Sharding and replication

APIs:

Aerospike Query Language (AQL), a subset of the SQL language, and drivers for most common languages.

Cassandra

Database Name: Apache Cassandra

License/Company: Open-source under Apache license; DataStax provides an Enterprise edition

Wikipedia description:

Apache Cassandra is an open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Vendor’s description:

Cassandra’s masterless, shared-nothing architecture provides organizations with constant uptime for their transactional/operational database applications, as well as a flexible data model capable of storing today’s modern datatypes and operational simplicity for easy database management.

My take:

Cassandra was envisioned as a database system that could merge the best features from Google’s BigTable and Amazon’s Dynamo. Cassandra has achieved some of the most high-scale NoSQL implementations at sites like Netflix. Cassandra has implemented significant innovations beyond its BigTable and Dynamo influences and clearly stands as a leader in the NoSQL category.

Data model:

Based on the BigTable wide column model, but with a narrow column abstraction provided through the Cassandra Query Language (CQL) interface

Transactional model:

Dynamo-style tunable consistency together with a Paxos-based lightweight transaction capability

Clustering:

Consistent hashing

APIs:

Cassandra Query Language (CQL) is a SQL-like language that provides a data query and manipulation interface. CQL abstracts the underlying wide column data model, presenting a more relational-like tabular interface.
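The consistent hashing listed under Clustering can be illustrated with a minimal Python sketch. It assumes one token per node and uses MD5 as the ring hash purely for illustration; the HashRing class, the ring_position function, and the node names are hypothetical and are not Cassandra's implementation or API. Production systems typically assign many tokens to each server.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Hash a string to an integer position on the ring (MD5 chosen for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with one token per node (hypothetical class)."""

    def __init__(self, nodes):
        # Sort nodes by ring position so lookups can use binary search.
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def owner(self, key: str) -> str:
        # The owner is the first node token at or after the key's position,
        # wrapping around to the first node past the end of the ring.
        index = bisect.bisect_left(self._tokens, ring_position(key))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(1000)]
    small = HashRing(["node-a", "node-b", "node-c"])
    large = HashRing(["node-a", "node-b", "node-c", "node-d"])
    moved = [k for k in keys if small.owner(k) != large.owner(k)]
    # Only keys that now belong to the new node change owner.
    print(len(moved), "keys moved; all to node-d:",
          all(large.owner(k) == "node-d" for k in moved))
```

The point of the design is that adding a node reassigns only the keys that fall on the new node's arc of the ring; every other key keeps its existing owner.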

Couchbase

Database Name: Couchbase

License/Company: Apache licensed, commercial support from Couchbase, Inc.

Wikipedia description:

Couchbase Server, originally known as Membase, is an open-source, distributed (shared-nothing architecture) NoSQL document-oriented database that is optimized for interactive applications.

Vendor’s description:

Couchbase Server is a distributed NoSQL database engineered for performance, scalability, and availability. It enables developers to build applications easier and faster by leveraging the power of SQL with the flexibility of JSON.

My take:

Today’s Couchbase Server is derived both from the Memcached-compatible Membase and from the pioneering JSON database CouchDB. Couchbase Server still supports both the key-value and document paradigms. The introduction of N1QL, a SQL language for documents, helps Couchbase differentiate itself from the other leading document database, MongoDB.

Data model:

Key-value and JSON

Transactional model:

Strict consistency for single-document transactions

Clustering:

Sharding and replication

APIs:

REST-based API with drivers for Java and other languages, and N1QL—a SQL language for working with documents

DynamoDB

Database Name: DynamoDB

License/Company: Amazon

Wikipedia description:

Amazon DynamoDB is a fully managed proprietary NoSQL database service that is offered by Amazon.com as part of the Amazon Web Services portfolio.

Vendor’s description:

Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models.

My take:

Amazon invented the Dynamo model and provides the dominant public cloud platform, so it’s natural that it provides a cloud-based database based on the Dynamo model. DynamoDB offers high performance, almost limitless scalability, and very low administrative overhead. However, the lock-in to Amazon and the relatively high price probably limit its current market adoption.

Data model:

Key-value store with support for lists and maps and JSON documents. Secondary index support.

Transactional model:

Eventually consistent by default, but applications can request consistent reads

Clustering:

Consistent hashing

APIs:

REST-based API with drivers for Java, .NET, and PHP

HBase

Database Name: Apache HBase

License/Company: Apache license, commercially provided by Cloudera, MapR, Hortonworks, and others

Wikipedia description:

HBase is an open-source, nonrelational, distributed database modeled after Google’s BigTable and written in Java. It is developed as part of Apache Software Foundation’s Apache Hadoop project, and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop.

Vendor’s description:

HBase is the high-performance, distributed data store built for Apache Hadoop.

My take:

HBase achieved early market penetration and technical acceleration through its association with the Hadoop platform, and has been proven at scale at many sites. The tight integration with Hadoop HDFS helps it achieve very high availability and provides a built-in synergy for Big Data analytics.

Data model:

BigTable wide column families

Transactional model:

Strictly consistent single-row transactions

Clustering:

Range keys are partitioned into regions and a master server keeps track of this partitioning. At the disk level, HBase uses HDFS for distribution and replication.

APIs:

REST and Thrift interfaces with Java driver
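The range partitioning described under Clustering, in which sorted row keys are divided into regions and a lookup structure records which server holds each region, can be sketched in a few lines of Python. The RegionMap class, the split keys, and the server names are hypothetical; this illustrates the general technique and is not the HBase client API.

```python
import bisect


class RegionMap:
    """Toy lookup table mapping sorted row-key ranges to servers (hypothetical)."""

    def __init__(self, split_keys, servers):
        # N split keys define N + 1 regions, so one server name per region is needed.
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per region")
        self._splits = list(split_keys)
        self._servers = list(servers)

    def locate(self, row_key: str) -> str:
        # Each region covers [start_key, end_key): a key equal to a split
        # point belongs to the region that starts at that split.
        return self._servers[bisect.bisect_right(self._splits, row_key)]


if __name__ == "__main__":
    regions = RegionMap(["g", "n", "t"], ["rs1", "rs2", "rs3", "rs4"])
    for key in ["apple", "g", "melon", "zebra"]:
        print(key, "->", regions.locate(key))
```

Because adjacent keys land in the same region, range scans touch few servers, at the cost of possible hot spots when keys arrive in sorted order.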

MarkLogic

Database Name: MarkLogic

License/Company: Proprietary, MarkLogic Corp.

Wikipedia description:

MarkLogic is considered a multi-model NoSQL database for its ability to store, manage, and search JSON and XML documents and graph data (RDF triples).

Vendor’s description:

MarkLogic is a new-generation database that is built with a flexible data model to store, manage, and search today’s data without sacrificing any of the data resiliency and consistency features of last-generation relational databases.

My take:

MarkLogic had built a powerful and widely adopted XML database before the emergence of what we now call NoSQL databases. MarkLogic has recently added support for JSON and now positions itself as an enterprise NoSQL database.

Data model:

XML, RDF, JSON

Transactional model:

Strictly consistent

Clustering:

Sharding

APIs:

XQuery, XSLT, SPARQL, REST

MongoDB

Database Name: MongoDB

License/Company: GNU AGPL, Apache licensed drivers. Commercially supported by MongoDB, Inc.

Wikipedia description:

MongoDB (from humongous) is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.

Vendor’s description:

MongoDB is an open-source, document database designed for ease of development and scaling. MongoDB provides high performance, high availability, and automatic scaling.

My take:

MongoDB has established a strong lead in NoSQL adoption, driven by its popularity with web developers, where the database has displaced MySQL as the default choice for websites built on modern open-source frameworks.

Data model:

JSON documents

Transactional model:

Strictly consistent by default for single-document transactions

Clustering:

Hash or range sharding with master nodes

APIs:

JavaScript query API and drivers for Java, .NET, Python, and other languages
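The difference between the hash and range sharding options listed under Clustering shows up clearly with sequential keys. The sketch below is a simplified illustration in Python: the shard names, the split points, and the use of MD5 with a modulo are all assumptions made for the example and do not reflect MongoDB's internal chunk management.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical shard names
RANGE_SPLITS = [250, 500, 750]                     # hypothetical range boundaries


def range_shard(key: int) -> str:
    """Range sharding: neighbouring keys are stored on the same shard."""
    return SHARDS[bisect.bisect_right(RANGE_SPLITS, key)]


def hash_shard(key: int) -> str:
    """Hash sharding: the key is hashed first, so neighbouring keys scatter."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


if __name__ == "__main__":
    newest_keys = range(900, 1000)  # e.g. monotonically increasing IDs
    print("range:", sorted({range_shard(k) for k in newest_keys}))
    print("hash: ", sorted({hash_shard(k) for k in newest_keys}))
```

With range sharding, all of the newest keys land on the last shard, which favors range queries but concentrates inserts; hashing spreads the same keys across shards at the cost of efficient range scans.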

Neo4j

Database Name: Neo4j

License/Company: GPL/AGPL, commercially provided by Neo Technology

Wikipedia description:

Neo4j is an open-source graph database implemented in Java and accessible from software written in other languages using the Cypher query language through a transactional HTTP endpoint.

Vendor’s description:

Neo4j is the World’s Leading Graph Database.

My take:

Neo4j is the most widely used property graph database. The open-source version of Neo4j’s Cypher programming language may become a standard.

Data model:

Property graph

Transactional model:

Strictly consistent

Clustering:

Master-slave replication

APIs:

Cypher graph programming language with drivers for most programming languages

NuoDB

Database Name: NuoDB

License/Company: Proprietary, provided by NuoDB Corp.

Wikipedia description:

NuoDB is a NewSQL database that works in the cloud. It can work both for single-vendor and multi-vendor cloud setup.

Vendor’s description:

NuoDB’s revolutionary durable distributed cache (DDC) architecture combines the strengths of traditional RDBMSs—rich ANSI SQL support, full ACID transactions, organization-class tooling for security, backup, and administration—with support for elastic scalability and continuous availability across multiple data centers.

My take:

NuoDB is a significant attempt at building an ACID-compliant distributed SQL database. It includes a tunable consistency model and a pluggable storage engine architecture.

Data model:

Relational model layered on top of a pluggable storage layer that may include nonrelational engines

Transactional model:

ACID with tunable consistency levels that may result in eventually consistent behavior

Clustering:

Proprietary clustering model

APIs:

SQL, with non-SQL access possible to underlying storage engines

Oracle RDBMS

Database Name: Oracle Database 12c

License/Company: Oracle

Wikipedia description:

Oracle Database (commonly referred to as Oracle RDBMS or simply as Oracle) is an object-relational database management system produced and marketed by Oracle Corp.

Vendor’s description:

Oracle Database 12c introduces a new multi-tenant architecture that makes it easy to consolidate many databases quickly and manage them as a cloud service. Oracle Database 12c also includes in-memory data processing capabilities delivering breakthrough analytical performance.

My take:

Oracle can claim to be the first successful commercial database based on the relational model and for roughly 30 years has dominated the database market.

From a technology standpoint, Oracle has generally led the market as well, pioneering many core RDBMS architectures including row-level locking, MVCC, and shared-disk clustering.

Oracle provides a Hadoop appliance and a NoSQL key-value store. Within the core RDBMS it has implemented many document-oriented database features, including a JSON store with a REST interface.

Data model:

Relational with extensions for object types (varrays, nested tables, etc.) and embedded XML and JSON

Transactional model:

ACID with MVCC

Clustering:

Shared disk cluster database (RAC) or sharding

APIs:

SQL with a proprietary PL/SQL stored procedure language
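The MVCC (multi-version concurrency control) idea named under the transactional model can be illustrated with a toy Python store in which writers append new versions and readers see only versions committed at or before their snapshot. The MVCCStore class and its methods are hypothetical and single-threaded; this is a conceptual sketch, not a description of how Oracle implements MVCC.

```python
class MVCCStore:
    """Toy multi-version store: readers never block writers (hypothetical)."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit timestamp

    def write(self, key, value) -> int:
        # Each write commits a new version instead of overwriting the old one.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self) -> int:
        # A reader's snapshot is simply the latest committed timestamp.
        return self._clock

    def read(self, key, snapshot_ts: int):
        # Return the newest version that was committed at or before the snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


if __name__ == "__main__":
    store = MVCCStore()
    store.write("balance", 100)
    snap = store.snapshot()        # a long-running query starts here
    store.write("balance", 250)    # a later transaction commits an update
    print(store.read("balance", snap))              # sees the older version
    print(store.read("balance", store.snapshot()))  # sees the newer version
```

The older snapshot continues to see a consistent view of the data even though a newer version has been committed in the meantime.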

Redis

Database Name: Redis

License/Company: BSD license, commercially supported by Redis Labs

Wikipedia description:

Redis is a data structure server. It is open-source, networked, in-memory; it stores keys with optional durability.

Vendor’s description:

Redis is an open-source (BSD licensed), in-memory data structure store, used as database, cache, and message broker.

My take:

Redis is a popular lightweight in-memory key-value store.

Data model:

Key-value

Transactional model:

Strictly consistent within a single server

Clustering:

Master-slave replication

APIs:

API with drivers for most commonly used languages

Riak

Database Name: Riak

License/Company: Open-source under Apache license, commercialized by Basho Technologies

Wikipedia description:

Riak is a distributed NoSQL key-value data store that offers high availability, fault tolerance, operational simplicity, and scalability. Riak implements the principles from Amazon’s Dynamo paper with heavy influence from the CAP theorem.

Vendor’s description:

Riak is a distributed NoSQL database that is highly available, scalable, and easy to operate. It automatically distributes data across the cluster to ensure fast performance and fault tolerance.

My take:

Riak is a fairly pure implementation of the Dynamo key-value store concept together with Solr integration, time series extensions, and an object cloud storage product. Riak has significant adoption and is technically sophisticated.

Data model:

Key-value

Transactional model:

Dynamo tunable consistency

Clustering:

Consistent hashing

APIs:

REST API with drivers for Java, Ruby, Python, etc.
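Dynamo-style tunable consistency is usually described with three numbers: N replicas per key, W replica acknowledgements required for a write, and R replicas consulted for a read. A read is guaranteed to include at least one replica that saw the latest acknowledged write when R + W > N. The sketch below checks that rule against a brute-force enumeration; the function names are illustrative and are not part of Riak's API.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Rule of thumb: reads see the latest acknowledged write when R + W > N."""
    return r + w > n


def brute_force_overlap(n: int, r: int, w: int) -> bool:
    """Check every possible read set against every possible write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


if __name__ == "__main__":
    n = 3
    for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
        label = "overlapping" if quorums_overlap(n, r, w) else "eventually consistent"
        print(f"N={n} R={r} W={w}: {label}")
```

Lower values of R and W reduce latency and improve availability, while values satisfying R + W > N trade some of that away for reads that reflect the latest acknowledged write.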

SAP HANA

Database Name: SAP HANA

License/Company: Proprietary, produced by SAP SE

Wikipedia description:

SAP HANA is an in-memory, column-oriented, relational database management system developed and marketed by SAP SE.

Vendor’s description:

Accelerate the pace of innovation with SAP HANA—an in-memory platform that combines an ACID-compliant database with advanced data processing, application services, and flexible data integration services.

My take:

HANA combines columnar or row-oriented storage formats and in-memory technology on a certified hardware specification to provide low latencies for OLTP or OLAP workloads.

Data model:

Relational

Transactional model:

ACID

Clustering:

Shared-nothing partitioning

APIs:

SQL

TimesTen

Database Name: TimesTen

License/Company: Proprietary to Oracle

Wikipedia description:

TimesTen is an in-memory, relational database management system with persistence and recoverability.

Vendor’s description:

Oracle TimesTen In-Memory Database (TimesTen) is a full-featured, memory-optimized, relational database with persistence and recoverability.

My take:

TimesTen was an early entrant to the in-memory database category and is a good example of an in-memory transactional relational architecture. It is mainly significant today as part of Oracle’s broader software stack.

Data model:

Relational

Transactional model:

ACID

Clustering:

None

APIs:

SQL

Vertica

Database Name: Vertica

License/Company: Proprietary, provided by HP

Wikipedia description:

The cluster-based, column-oriented Vertica Analytics Platform is designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses and other query-intensive applications.

Vendor’s description:

HP Vertica is the most advanced SQL database analytics portfolio built from the very first line of code to address the most demanding Big Data analytics initiatives.

My take:

Vertica is a fairly faithful implementation of the concepts outlined in Stonebraker et al.’s seminal papers, which partially launched the NewSQL category. Together with SAP Sybase IQ, it represents an example of a database system based primarily on the columnar concepts.

Data model:

Relational

Transactional model:

ACID

Clustering:

Shared-nothing

APIs:

SQL

VoltDB

Database Name: VoltDB

License/Company: Proprietary, VoltDB Corp.

Wikipedia description:

VoltDB is an in-memory database designed by several well-known database system researchers, including A. M. Turing Award winner Michael Stonebraker. It is an ACID-compliant RDBMS that uses a shared-nothing architecture.

Vendor’s description:

In-memory performance, never loses data. Streaming analytics with millisecond latency. OLTP in a scale-out architecture. SQL and JSON with ACID guarantees.

My take:

VoltDB implements a purer in-memory architecture than other databases that describe themselves as in-memory but still perform disk I/O during commit operations. The architecture is also notable for avoiding latching and locking within a single partition.

Data model:

Relational, but partitioning works best when data is hierarchical

Transactional model:

ACID

Clustering:

Shared-nothing

APIs:

SQL and Java stored procedures
