Overview

This book consists of four parts. Part I Introduction commences the book with a general introduction to the basics of data management and data modeling.

Chapter 1 Background (page 3) explains why we need databases in modern society. Desired properties of modern database systems like scalability and reliability are defined. Technical internals of database management systems (DBMSs) are explained with a focus on memory management. Central components of a DBMS (like buffer manager or recovery manager) are explored. Next, database design is discussed; a brief review of Entity-Relationship Models (ERM) and the Unified Modeling Language (UML) rounds this chapter off.
Chapter 2 Relational Database Management Systems (page 17) contains a review of the relational data model by defining relation schemas, database schemas and database constraints. It continues with an example of how to transform an ERM into a relational database schema. Next, it illustrates the core concepts of relational database theory like normalization to avoid anomalies, referential integrity, relational query languages (relational calculus, relational algebra and SQL), concurrency management and transactions (including the ACID properties, concurrency control and scheduling).

Part II NOSQL And Non-Relational Databases forms the main part of this book. In its seven chapters (Chapters 3 to 9) it gives an in-depth discussion of data models and database systems that depart from the conventional relational data model.

Chapter 3 New Requirements, “Not only SQL” and the Cloud (page 33) admits that relational database management systems (RDBMSs) have their strengths and merits but then contrasts them with cases where the relational data model might be inadequate and touches on weaknesses that current implementations of relational DBMSs might have. The chapter concludes with a description of current challenges in data management and a definition of NOSQL databases.
Chapter 4 Graph Databases (page 41) begins by explaining some basics of graph theory. Having presented several choices for graph data structures (from adjacency matrix to incidence list), it describes the predominant data model for graph databases: the property graph model. After a brief digression on how to map graphs to an RDBMS, two advanced types of graphs are introduced: hypergraphs and nested graphs.
Chapter 5 XML Databases (page 69) expounds the basics of XML (like XML documents and schemas, and numbering schemes) and surveys XML query languages. Then, the chapter shifts to the issue of storing XML in an RDBMS. Finally, the chapter describes the core concepts of native XML storage (like indexing, storage management and concurrency control).
Chapter 6 Key-value Stores and Document Databases (page 105) puts forward the simple data structure of key-value pairs and introduces the map-reduce concept as a pattern for parallelized processing of key-value pairs. Next, as a form of nested key-value pairs, the JavaScript Object Notation (JSON) is introduced. JSON Schema and Representational State Transfer are further topics of this chapter.
Chapter 7 Column Stores (page 143) outlines the column-wise storage of tabular data (in contrast to row-wise storage). Next, the chapter delineates several ways for compressed storage of data to achieve a more compact representation based on the fact that data in a column is usually more uniform than data in a row. Lastly, column striping is introduced as a recent methodology to convert nested records into a columnar representation.
Chapter 8 Extensible Record Stores (page 161) describes a flexible multidimensional data model based on column families. The surveyed database technologies also include ordered storage and versioning. After defining the logical model, the chapter explains the core concepts of the storage structures used on disk and the ways to handle writes, reads and deletes with immutable data files. This also includes optimizations like indexing, compaction and Bloom filters.
Chapter 9 Object Databases (page 193) starts with a review of object-oriented notions and concepts; this review gives particular focus to object identifiers, object normalization and referential integrity. Next, several options for object-relational mapping (ORM) – that is, how to store objects in an RDBMS – are discussed; the ORM approach is exemplified with the Java Persistence API (JPA). The chapter moves on to object-relational databases that offer object-oriented extensions in addition to their basic RDBMS functionalities. Lastly, several issues of storing objects natively with an Object Database Management System (ODBMS), such as object persistence and reference management, are attended to.

Part III Distributed Data Management treats the core concepts of data management when data are scaled out – that is, data are distributed in a network of database servers.

Chapter 10 Distributed Database Systems (page 235) looks at the basics of data distribution. Failures in distributed systems and requirements for distributed database management systems are addressed.
Chapter 11 Data Fragmentation (page 245) targets ways to split data across a set of servers, an approach also known as partitioning or sharding. Several fragmentation strategies for each of the different data models are discussed. Special focus is given to consistent hashing.
Chapter 12 Replication And Synchronization (page 261) elucidates the background on replication for the sake of increased availability and reliability of database systems. Afterwards, replication-related issues like distributed concurrency control and consensus protocols as well as hinted handoff and Merkle trees are discussed.
Chapter 13 Consistency (page 295) touches upon the topic of relaxing strong consistency requirements known from RDBMSs into weaker forms of consistency.

Part IV Conclusion is the final part of this book.

Chapter 14 Further Database Technologies (page 311) gives a cursory overview of related database topics that are outside the scope of this book. Among other topics, it glances at data stream processing, in-memory databases and NewSQL databases.
Chapter 15 Concluding Remarks (page 317) summarizes the main points of this book and discusses approaches for database reengineering and data migration. Lastly, it advocates the idea of polyglot architectures: for each data storage and processing task in an enterprise, users are free to choose the database system that is most appropriate for that task, while using different database systems for other tasks and integrating all of these systems into a common storage and processing architecture.
