Preface

During the last two decades, the landscape of database management systems has changed immensely. Because data are nowadays stored and managed in networks of distributed servers (“clusters”) and these servers consist of cheap hardware (“commodity hardware”), data of previously unthinkable magnitude (“big data”) are produced, transferred, stored, modified, transformed, and in the end possibly deleted. This form of continuous change calls for flexible data structures and efficient distributed storage systems with both high read and high write throughput. In many novel applications, the conventional table-like (“relational”) data format may not be the data structure of choice, for example when easy exchange of data or fast retrieval becomes a vital requirement. For historical reasons, conventional database management systems are not explicitly geared toward distribution and continuous change, as most implementations date back to a time when distributed storage was not a major requirement. These deficiencies might also be attributed to the fact that conventional database management systems try to incorporate several database standards and to offer high safety guarantees (for example, regarding concurrent user accesses or correctness and consistency of data).

Several kinds of database systems have emerged and evolved over recent years that depart from the established tracks of data management and data formats in different ways. Development of these emergent systems started from scratch and gave rise to new data models, new query engines and languages, and new storage organizations. Two features of these systems are particularly remarkable. On the one hand, a wide range of open source products is available (though some systems are supported by or even originated from large international companies), and development can be observed or even influenced by the public. On the other hand, several results and approaches from long-standing database research (with roots at least as early as the 1960s) have been put into practice in these database systems, and these research results now show their merits for novel applications in modern data management. On the downside, there are basically no standards (with respect to data formats or query languages) in this novel area, and hence portability of application code or long-term support usually cannot be guaranteed. Moreover, these emerging systems are not as mature (and probably not as reliable) as conventional established systems.

The term NOSQL has been used as an umbrella term for several emerging database systems without an exact formal definition. Starting with the notion of NoSQL (which can be interpreted as saying no to SQL as a query language), it has evolved to mean “not only SQL” (and is hence written as NOSQL with a capital O). The actual origin of the term is ascribed to the 2009 “NOSQL meetup”: a meeting with presentations of six database systems (Voldemort, Cassandra, Dynomite, HBase, Hypertable, and CouchDB). Still, the question of what exactly a NOSQL database system is cannot be answered unanimously; nevertheless, some structure is slowly becoming visible in the NOSQL field and has led to a broad categorization of NOSQL database systems. The main categories of NOSQL systems are key-value stores, document stores, extensible record stores (also known as column family stores), and graph databases. Yet other creatures live out there in the database jungle: object databases and XML databases espouse neither the relational data model nor SQL as a query language, but they typically would not be considered NOSQL database systems (probably because they predate the NOSQL systems). Moreover, column stores are an interesting variant of relational database systems.

This book is meant as a textbook for computer science lectures. It is based on Master-level database lectures and seminars held at the universities of Hildesheim and Göttingen. As such, it provides a formal analysis of alternative, non-relational data models and storage mechanisms and gives a decent overview of non-SQL query languages. However, it does not put much focus on installing or setting up database systems and hence complements other books that concentrate on more technical aspects. This book also surveys storage internals and implementation details from an abstract point of view and describes common notions as well as possible design choices (rather than singling out one particular database system and specializing in its technical features).

This book intends to give students a perspective beyond SQL and relational database management systems and thus covers the theoretical background of modern data management. Nevertheless, this book is also aimed at database practitioners: it wants to help developers and database administrators come to an informed decision about which database systems are most beneficial for their data management requirements.
