Chapter 3. What Is Multi-Model?

The ability to handle multiple data models in a single system is the key benefit at the core of a multi-model database. By utilizing a single product or system that can handle many data models, you greatly reduce the complexity required to get a system up and running. You also increase your system’s ability to scale and adapt to future needs.

Here are the two key components a multi-model database offers:

Supports many data models in a single system
Provides the ability to query each data model using standard queries applicable to each data model

Data Models

The data model is the most important piece of a data-centric system. The data model is responsible for enabling a solid foundation to build upon. The following section describes some different types of data models and how each one can enhance a data-centric system.

Art or Science?

Many would agree that data modeling is an art more than it is a science. For example, if you put 10 data modelers in a room with a single set of requirements, you’d end up with 11 different data models. This is because we all have different experiences, knowledge, and viewpoints, which cause us to have different problem-solving approaches and ways of thinking. By using a multi-model database, instead of restricting and forcing data into a box, we are allowed to embrace these differences and even take advantage of them.

One Model Does Not Fit All

Rarely (if ever) does a single data model encapsulate all present and future requirements of a system. However, in the absence of a multi-model database, we are often forced to pretend this is not the case. Lucky for us, we are aware (or at least becoming aware) of the benefits that a multi-model database has to offer.

It is important to remember that data models can both enable and constrain our ability to create a functional system. Data models ensure that data is stored in a consistent manner, making it easier to store and retrieve. By using a multi-model approach, we can take advantage of the strengths of each data model and switch to another model where one falls short. This becomes especially important in dynamic environments. Systems that deal with dynamic data and changing data requirements will be able to adapt instead of die.

Types of Data and Databases

There are several different types of data that we might want to store in a multi-model database that would allow us to create a more robust and dynamic system. Let’s take a look at some of these types of data—and the types of databases we would normally store them in.

Relational database and tabular data

The relational database is an old tried-and-true tool with which most everyone is familiar. Structuring data into nice and neat columns and rows makes everyone feel good. The relational database is a good place for storing structured, tabular data. Unfortunately, we must spend time at the outset and make assumptions on how the data is going to be used. What happens when this structure needs to change, or we need to add additional structures to deal with additional use cases? How do we store and consolidate data across different systems? What happens is we end up with a growing web of complexity that eventually becomes unmaintainable.

Document storage (JSON and XML)

The document store removes many of the risks and issues of the relational database by not forcing implementers to define a schema up front unless they choose to. With this flexibility, the system can store many different types of data and different data structures with little to no effort. Further, a document database organizes data using self-describing, hierarchical formats such as JSON and XML. The document model often maps to business problems very naturally, and in some sense, it is a reaction to the relational model. The document store is a good place to store data that is less structured and more object-like.

Key–value store

The key–value store is an efficient way to store data in a hash table for very quick reads. Data that is accessed regularly can be kept in a key–value store so that users can retrieve it with sub-second response times.
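As a minimal sketch of the idea, a key-value store can be pictured as a hash table that maps a key to an opaque value. The Python below uses an ordinary in-memory dict as a stand-in. The class name, keys, and values are hypothetical and do not correspond to any particular product.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Insert or overwrite the value stored under the key.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookup by key; the store never inspects the value itself.
        return self._data.get(key, default)


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart_items": 3})
session = store.get("session:42")
missing = store.get("session:99")
```

The store can only answer "what is stored under this key?", which is what keeps reads fast and also why richer queries need another model.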

Graph database/semantic web

A graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data. The relationships allow data in the store to be linked directly and, in many cases, retrieved with a single operation. In this space, two major forms of graph store have emerged: the property graph store and the triple store.
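To make the triple form concrete, here is a small sketch in Python that holds facts as (subject, predicate, object) tuples and matches them against a pattern. The data and the match helper are illustrative assumptions, not the API of any real triple store.

```python
# Each fact is a (subject, predicate, object) triple; names are illustrative.
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
}


def match(pattern, facts):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# Who does alice know?
alice_knows = [o for _, _, o in match(("alice", "knows", None), triples)]
```

A real triple store would answer the same kind of pattern with a query language such as SPARQL and an index rather than a linear scan.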

Advantages of Multi-Model

The multi-model database offers many different solutions to many different data issues. Let’s take a look at some of these advantages as well as how they might affect different system scenarios.

Pros and Cons of Various Models

Each type of data model brings with it a different set of advantages and disadvantages. For example, a document store brings the flexibility of having hierarchical, self-describing data; semantic data allows you to store and query relationships and make inferences; and key–value stores allow you to quickly retrieve data with subsecond response times. At the same time, it would be difficult to implement a robust and dynamic system that today’s businesses need using only one of these data models. Using a multi-model database, you can look at the pros and cons of each of these data models and apply them to use cases where applicable.

When Strong Data Typing Matters

Another advantage of multi-model databases is that you don’t need to commit to being always strongly typed or always loosely typed. You can type your data however it makes sense. In fact, in some cases it might make perfect sense to strongly type your data, such as when a data source is very well defined with little change expected. On the other hand, if you suddenly need a new data type not previously anticipated, you no longer need to rebuild the database. Multi-model databases don’t require you to make a single decision for multiple scenarios. You can make many granular decisions that meet the unique and dynamic conditions of each use case.

Faster System Implementation

When implementing a new system, one of the first things that needs to be completed at the outset is data modeling. Before you can begin developing an application on top of a system, you need to complete all data modeling. Because of this, data modeling will sometimes hold up the development of a system. However, as we have discussed, with a multi-model database approach, we eliminate the need for this up-front modeling. This reduces the up-front time and cost of implementing a new system and allows users to realize the value of a system much faster. One of the ways that we can achieve system implementation without up-front data modeling is by using the envelope pattern. Let’s take a look at that now.

The envelope pattern

A strategy that you can use to increase data flexibility is to use the envelope pattern. Using the envelope pattern, you essentially wrap your data so that your original content remains untouched. To enable searching of your data, you simply add headers that facilitate any requirements. Using this strategy, you can bring in your data as is without processing it and add searchable headers, provenance data, and so on as the need arises. Let’s take a look at the envelope pattern as it is described in Pete Aven and Diane Burley’s book Building on Multi-Model Databases (O’Reilly):

From Building on Multi-Model Databases

In multi-model databases, a simple yet powerful pattern has emerged that MarkLogic calls the envelope pattern. Here’s how it works: take your source entity, and make it the subdocument of a parent document. This parent document is the “envelope” for your source data. As a sibling to your source data within the envelope, you add a header section where you start to add your standardized attribute names, attribute values, and attribute structure. In our example, we’ve standardized the “Zip” and “Postal” attributes as “Zip” properties in both envelopes (see Figure 3-1). As a result, we can now issue a structured query of the type “Show me all customers having a zip code equal to 94111” and get both entities back.

Figure 3-1. An envelope pattern

The benefits to this approach are numerous:

We retain our source entity as is. This is a big benefit for highly regulated environments that come in and request to see the entities as they were originally imported into our database.
We standardize only what we need, when we need it. We don’t require all the mappings up front. Now we iteratively manage our data in place.
We can continue to update our model later without reingesting.
We can divide and conquer the standardization process across development.
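The envelope pattern described above can be sketched with plain Python dictionaries. This is only an illustration: the key names envelope, headers, and source, the helper functions, and the customer records are assumptions made for the example, not a format prescribed by MarkLogic or by the quoted book.

```python
def wrap_in_envelope(source, headers):
    """Keep the source entity untouched, with standardized headers as a sibling."""
    return {"envelope": {"headers": headers, "source": source}}


# Two source records that name the postal code differently (hypothetical data).
customer_a = {"Name": "Ann Lee", "Zip": "94111"}
customer_b = {"Name": "Raj Patel", "Postal": "94111"}

envelopes = [
    wrap_in_envelope(customer_a, {"Zip": customer_a["Zip"]}),
    wrap_in_envelope(customer_b, {"Zip": customer_b["Postal"]}),
]


def find_by_header(docs, name, value):
    """Query only the standardized headers, never the raw source."""
    return [d for d in docs if d["envelope"]["headers"].get(name) == value]


matches = find_by_header(envelopes, "Zip", "94111")
```

Both customers come back from the single "Zip" query, while each source record still carries its original "Zip" or "Postal" field exactly as it was ingested.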

Single Comprehensive Index

Without multi-model, we are required to have multiple data storage technologies, and each technology comes with its own method for indexing data. Because of this, querying becomes cumbersome and complex. If we wanted to query each system’s indexed data in a single query, we would need to integrate all data systems into a single data store and index the data again. This leads to data duplication as well as countless other potential issues. With a multi-model system, because the data is stored and indexed in a single backend, we end up with a single comprehensive index, containing data from all data models stored in the system. In other words, we can store JSON, XML, and RDF data, take advantage of lightning-fast index searches, and we have to manage only a single index!

Failing Fast

Oftentimes, we don’t know what we don’t know. This is especially true when it comes to implementing hundreds of requirements across multiple use cases and disparate data. The chances of us getting it right the first time are slim to none. Using a multi-model database, we can quickly bring data in without spending time modeling and transforming it first. As users begin to discover what data is available, we can begin modeling the data as they need it. If things don’t work out as expected, we can make adjustments, remodel data, and continually make the system better. This type of iterative approach is another major advantage offered by multi-model databases.

Data Versus Information Versus Knowledge

The purpose of any data-based IT system is to provide users the ability to search for data and turn those results into information and knowledge. By using a multi-model database, you can easily bring in many types and forms of data. These additional types and forms of data help to complement search results and give them further context. For example, supplementing text search results with semantic relationships can help users to narrow down their search.

Let’s begin with a few definitions:

Data: The facts and statistics collected together for reference or analysis. Unfiltered and unrefined information.
Information: What is conveyed or represented by a particular arrangement or sequence of things. Refined data that is useful for analysis.
Knowledge: Facts, information, and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject. Occurs when context and human experience are applied to information.

With these definitions in mind, how do they relate to how we store data, and, specifically, how does a multi-model database help transform data into information, and information into knowledge?

Storing data is simple. There are hundreds of cost-effective and efficient ways of storing data. However, if we want to turn data into information and knowledge, we must somehow refine and filter the data. Enabling this refinement and filtering is much less complex with a multi-model database.

Enhanced Security

By using a multi-model database and eliminating the need for multiple system integrations, we eliminate the complexity involved in securing data across multiple systems. Each database in a traditional integrated system will require different processes and steps for securing its data. This means that we need to implement, document, and maintain security measures for each system separately. With the single backend offered by a multi-model database, we have only one system for which to implement security. This allows us to provide a much more secure system at a reduced cost.

Database Concepts: Consistency, Availability, and Tolerance

Before the multi-model database, if you wanted to have multiple data models in a system, you had to integrate multiple backends and databases. This made it difficult to create a system that met data consistency, availability, and fault-tolerance requirements.

Consider the following scenario as an example:

You have two databases with two distinct data models.
These two databases are integrated.
If a user makes an update in the first database, the second database must be updated.

With this scenario in mind, think about how you would handle the requirement of atomicity. You would need to maintain some logic that would roll back changes in both databases in case of a failure in one of them. These types of integrations greatly increase the complexity of your system. With a multi-model database, we have a single backend, which means there are no integrations and no additional complexity in handling atomicity.
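The rollback logic mentioned above might look something like the following sketch. FakeDatabase is a hypothetical in-memory stand-in for two independent stores, and update_both is an illustrative helper. A production integration would also have to cope with crashes between the two writes, which this sketch does not attempt.

```python
class FakeDatabase:
    """In-memory stand-in for an independent database (hypothetical)."""

    def __init__(self, fail_on_write=False):
        self.rows = {}
        self.fail_on_write = fail_on_write

    def write(self, key, value):
        if self.fail_on_write:
            raise RuntimeError("write failed")
        self.rows[key] = value


def update_both(first, second, key, value):
    """Apply an update to both stores, undoing the first if the second fails."""
    had_key = key in first.rows
    previous = first.rows.get(key)
    first.write(key, value)
    try:
        second.write(key, value)
    except RuntimeError:
        # Compensating action: restore the first store to its prior state.
        if had_key:
            first.rows[key] = previous
        else:
            del first.rows[key]
        return False
    return True


documents = FakeDatabase()
graph = FakeDatabase(fail_on_write=True)
ok = update_both(documents, graph, "customer:1", {"zip": "94111"})
```

Every pair of integrated stores needs its own version of this compensation code, which is the complexity a single backend avoids.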

In the next section, we highlight a few concepts so that it’s a little clearer how a multi-model database eases complexities like these.

ACID versus BASE

The ACID acronym stands for the following:

Atomic: Within a transaction, either all of the changes are committed or none of them are.
Consistent: Saved data cannot violate the database’s integrity.
Isolated: One transaction does not affect another transaction.
Durable: Completed transactions will always persist.
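The atomicity and consistency properties above can be seen in a short sketch using SQLite through Python's standard sqlite3 module. SQLite is used here only because it ships with Python; it is a relational engine, not a multi-model database, and the accounts table and its values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'"
        )
        # This violates the CHECK constraint, so the whole transfer is undone.
        conn.execute(
            "UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'"
        )
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Because the second update fails, the first one is rolled back as well, so neither balance changes.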

These attributes all seem like good things (and they are) when putting together a data-centric system. After all, we want our data to be consistent, reliable, and safe because it is the lifeblood of our systems. Unfortunately, in this case, we can’t have our cake and eat it too. The CAP theorem states that, of consistency, availability, and partition tolerance, a distributed system can guarantee at most two at the same time.

In very large systems, the CAP theorem can pose an issue when trying to offer both high availability and responsiveness. To compensate for this and still have relatively consistent and durable data, we can use a BASE database.

The BASE acronym stands for the following:

Basically available
Soft state
Eventually consistent

In some systems, strict data consistency might not be a requirement. In this case, eventual consistency would be fine. For example, a large online retailer such as Amazon might be more concerned with quickly returning inventory descriptions for a customer’s search, and not care so much about the exact number of items that are currently available.

In the context of the CAP theorem, we do not need to completely abandon consistency. We can fine-tune our systems to be on one end of the spectrum or the other, depending on what our system’s requirements are.

Schemas

A schema represents your data’s logical structure. It is a definition of how your data is going to look in your system. In a database, a schema serves some different purposes:

Defines how data is structured
Defines what types of data you are storing
Identifies constraints on how data can be stored
Describes data to a DBMS using a formal language

When schemas are essential, and when they are not

When using schemas, a DBMS will use that additional metadata to create more efficient indexes, sorting, and so on. Because of this, it is often advantageous to utilize schemas. For example, suppose that you have a field that is used for storing a date. If you needed to sort on that field, and didn’t have a schema defined for the field, it might just sort as a string value. If you defined a schema and set the field type to be a date, sorting would occur as expected, without any extra input.
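The date-sorting example can be illustrated with a few lines of Python. The sample values and the day/month/year format are assumptions chosen for the illustration.

```python
from datetime import date, datetime

# Dates stored as untyped strings in day/month/year form (hypothetical data).
raw = ["03/11/2021", "15/01/2022", "27/06/2020"]

# Without type information the values sort lexically, character by character.
sorted_as_strings = sorted(raw)


def parse(value: str) -> date:
    # With a declared date type, each value is parsed before comparison.
    return datetime.strptime(value, "%d/%m/%Y").date()


sorted_as_dates = sorted(raw, key=parse)
```

The string sort leaves the 2020 date last because it compares the leading day digits, while the typed sort puts it first.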

Unfortunately, by defining schemas you are forcing data into a defined structure. In some cases, this might not be a very big deal. However, when you have multiple data sources as input and/or multiple users contributing and accessing this data, it can become a huge burden to perform the up-front modeling required to satisfy all parties. Because of this, you might want to consider the trade-offs and costs of having one uber schema versus not.

One final option would be to perform a schema-on-read. With this approach, data can be inserted in many different formats. After the query results are returned, a schema is applied to the results. With this approach, you can have the best of both worlds. Databases such as MarkLogic will support the schema-on-read approach with no negative impact on search performance.
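As a rough sketch of schema-on-read, the Python below stores records exactly as they arrive and coerces field types only when results are read. The apply_schema helper, the field names, and the records are hypothetical, and this is not a description of how MarkLogic or any other product implements the feature.

```python
def apply_schema(record, schema):
    """Coerce the fields named in the schema; leave everything else untouched."""
    typed = dict(record)
    for field, convert in schema.items():
        if field in typed:
            typed[field] = convert(typed[field])
    return typed


# Records ingested as-is, in different shapes (hypothetical data).
stored = [
    {"id": "7", "price": "19.99"},
    {"id": 8, "price": 5, "note": "clearance"},
]

# The schema is supplied by the reader at query time, not at insert time.
read_schema = {"id": int, "price": float}
results = [apply_schema(r, read_schema) for r in stored]
```

The stored records are never modified, so a different reader can apply a different schema to the same data.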

Additional Benefits of Multiple Models in One Database

We’ve already discussed some of the key advantages of using a multi-model database:

Truly atomic updates across multiple models
Consistent data and indexes across models
Reduced up-front cost of data modeling
Faster proof of value
Reduced complexity in data integrations (need for data integrations is eliminated)
Reduced complexity in securing your data and consistent policies across models

Let’s talk about a few more benefits that come with a multi-model approach.

Cross-Model Querying

Bringing multiple data models into a single database provides the ability to search those data models together in a single query. For example, you can query documents that contain embedded triples using a simple search string. The embedded triples in the resulting documents can then be used to query graph data to further filter your results. Using multiple data models to enhance your data’s searchability is a great use case for multi-model databases.
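A toy version of this kind of cross-model query is sketched below in Python: a text search selects documents, and the triples embedded in those documents are then followed into separate graph data to filter the hits. The documents, predicates, and the search function are all assumptions made for the example. A real multi-model database would run this against its indexes instead of scanning in application code.

```python
# Documents carrying both text and embedded triples (hypothetical data).
documents = [
    {"text": "Quarterly report on solar panels",
     "triples": [("doc1", "mentions", "SolarCo")]},
    {"text": "Notes on wind turbines",
     "triples": [("doc2", "mentions", "WindCorp")]},
]

# Separate graph data describing the entities the documents mention.
graph = {
    ("SolarCo", "locatedIn", "California"),
    ("WindCorp", "locatedIn", "Texas"),
}


def search(term, region):
    """Text search first, then filter hits by following embedded triples."""
    hits = []
    for doc in documents:
        if term.lower() not in doc["text"].lower():
            continue
        mentioned = {o for _, p, o in doc["triples"] if p == "mentions"}
        if any((m, "locatedIn", region) in graph for m in mentioned):
            hits.append(doc["text"])
    return hits


california_reports = search("report", "California")
```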

Synergistic Models

One of the examples we just discussed involved searching document content and using resulting embedded triples. The same is true if you reverse it. You can search a graph dataset using a SPARQL query, and the associated triple results can point to another piece of data. This is a common strategy for creating media repositories, for example. Creating a semantic model of media such as movies, books, and songs allows the media to be quickly searched for, and related media can be returned. This is the classic example of a recommendation engine.
