INTRODUCTION

THE GROWTH OF USER-DRIVEN CONTENT has fueled a rapid increase in the volume and variety of data that is generated, manipulated, analyzed, and archived. In addition, a varied set of newer sources, including sensors, Global Positioning System (GPS) devices, automated trackers, and monitoring systems, generates large amounts of data. These large data sets, often termed big data, bring new challenges and opportunities in storage, analysis, and archival.

In parallel with this rapid growth, data is also becoming increasingly semi-structured and sparse. As a result, traditional data management techniques built around upfront schema definition and relational references are also being questioned.

The quest to solve the problems of large-volume and semi-structured data has led to the emergence of a new class of database products. This class consists of column-oriented data stores, key/value pair databases, and document databases. Collectively, these are identified as NoSQL.

The products that fall under the NoSQL umbrella are quite varied, each with its own set of features and value propositions. Given this, it is often difficult to decide which product to use for the case at hand. This book prepares you to understand the entire NoSQL landscape. It provides the essential concepts that act as the building blocks for many of the NoSQL products. Instead of covering a single product exhaustively, it provides fair coverage of a number of different NoSQL products. The emphasis is on breadth and underlying concepts rather than complete coverage of every product's API. Because a number of NoSQL products are covered, a good deal of comparative analysis is also included.

If you are unsure where to start with NoSQL and how to learn to manage and analyze big data, then you will find this book to be a good introduction and a useful reference to the topic.

WHO THIS BOOK IS FOR

Developers, architects, database administrators, and technical project managers are the primary audience of this book. However, anyone savvy enough to understand database technologies is likely to find it useful.

The subject of big data and NoSQL is also of interest to many computer science students and researchers, who could benefit from reading this book as well.

Anyone starting out with big data analysis and NoSQL will gain from reading this book.

WHAT THIS BOOK COVERS

This book starts with the essentials of NoSQL and graduates to advanced concepts around performance tuning and architectural guidelines. Throughout, the book focuses on the fundamental concepts that relate to NoSQL and explains them in the context of a number of different NoSQL products. The book includes illustrations and examples that relate to MongoDB, CouchDB, HBase, Hypertable, Cassandra, Redis, and Berkeley DB. A few other NoSQL products are also referenced.

An important part of NoSQL is the way large data sets are manipulated. This book covers all the essentials of MapReduce-based scalable processing. It illustrates a few examples using Hadoop. Higher-level abstractions like Hive and Pig are also illustrated.

Chapter 10, which is entirely devoted to NoSQL in the cloud, brings to light the facilities offered by Amazon Web Services and the Google App Engine.

The book includes a number of examples and illustrations of use cases. Scalable data architectures at Google, Amazon, Facebook, Twitter, and LinkedIn are also discussed.

Toward the end of the book, NoSQL products are compared, and the idea of polyglot persistence in an application stack is explained.

HOW THIS BOOK IS STRUCTURED

This book is divided into four parts:

  • Part I: Getting Started
  • Part II: Learning the NoSQL Basics
  • Part III: Gaining Proficiency with NoSQL
  • Part IV: Mastering NoSQL

Topics in each part are built on top of what is covered in the preceding parts.

Part I of the book gently introduces NoSQL. It defines the types of NoSQL products and introduces the very first examples of storing data in and accessing data from NoSQL:

  • Chapter 1 defines NoSQL.
  • Starting with the quintessential Hello World, Chapter 2 presents the first few examples of using NoSQL.
  • Chapter 3 includes ways of interacting and interfacing with NoSQL products.

Part II of the book is where a number of the essential concepts of a variety of NoSQL products are covered:

  • Chapter 4 starts by explaining the storage architecture.
  • Chapters 5 and 6 cover the essentials of data management by demonstrating the CRUD operations and the querying mechanisms.
  • Data sets evolve with time and usage. Chapter 7 addresses the questions around data evolution.
  • The world of relational databases focuses a lot on query optimization by leveraging indexes. Chapter 8 covers indexes in the context of NoSQL products.
  • NoSQL products are often disproportionately criticized for their lack of transaction support. Chapter 9 demystifies the concepts around transactions and the transactional-integrity challenges that distributed systems face.

Parts III and IV of the book are where a select few advanced topics are covered:

  • Chapter 10 covers the Google App Engine data store and Amazon SimpleDB.
  • Much of big data processing rests on the shoulders of the MapReduce style of processing. Learn all the essentials of MapReduce in Chapter 11.
  • Chapter 12 extends the MapReduce coverage to demonstrate how Hive provides a SQL-like abstraction for Hadoop MapReduce tasks.
  • Chapter 13 revisits the topic of database architecture and internals.

Part IV, the last part of the book, starts with Chapter 14, where NoSQL products are compared. Chapter 15 promotes the idea of polyglot persistence: choosing the right database for each use case. Chapter 16 segues into tuning scalable applications. Chapter 17 presents a select few tools and utilities that you are likely to leverage with your own NoSQL deployment. Although seemingly eclectic, the topics in Part IV prepare you for practical usage of NoSQL.

WHAT YOU NEED TO USE THIS BOOK

Please install the required pieces of software to follow along with the code examples. Refer to Appendix A for install and setup instructions.

CONVENTIONS

To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.


The pencil icon indicates notes, tips, hints, tricks, and asides to the current discussion.

As for styles in the text:

  • We italicize new terms and important words when we introduce them.
  • We show file names, URLs, and code within the text like so: persistence.properties.
  • We present code in two different ways:
      ◦ We use a monofont type with no highlighting for most code examples.
      ◦ We use bold to emphasize code that is particularly important in the present context or to show changes from a previous code snippet.

SOURCE CODE

As you work through the examples in this book, you may choose either to type in all the code manually or to use the source code files that accompany the book. All the source code used in this book is available for download at www.wrox.com. When at the site, simply locate the book’s title (use the Search box or one of the title lists) and click the Download Code link on the book’s detail page to obtain all the source code for the book. Code that is included on the website is highlighted by an icon.

Listings include the filename in the title. If it is just a code snippet, you’ll find the filename in a code note such as this:

Code snippet filename


Because many books have similar titles, you may find it easiest to search by ISBN; this book’s ISBN is 978-0-470-94224-6.

Once you download the code, just decompress it with your favorite compression tool. Alternatively, you can go to the main Wrox code download page at www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.

ERRATA

We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata, you may save another reader hours of frustration, and at the same time, you will be helping us provide even higher quality information.

To find the errata page for this book, go to www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page, you can view all errata that have been submitted for this book and posted by Wrox editors. A complete book list, including links to each book’s errata, is also available at www.wrox.com/misc-pages/booklist.shtml.

If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.

P2P.WROX.COM

For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a Web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.

At p2p.wrox.com, you will find a number of different forums that will help you, not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:

1. Go to p2p.wrox.com and click the Register link.

2. Read the terms of use and click Agree.

3. Complete the required information to join, as well as any optional information you wish to provide, and click Submit.

4. You will receive an e-mail with information describing how to verify your account and complete the joining process.


You can read messages in the forums without joining P2P, but in order to post your own messages, you must join.

Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the Web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.

For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works, as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.
