As social media grew, so did data requirements. The need to store and retrieve large amounts of data almost instantly led some of the companies facing the problem to look for alternatives.
Projects such as BigTable (Google) and Dynamo (Amazon) were among the first attempts to solve this problem. They encouraged a new movement that we now know as the NoSQL initiative. The term was proposed by Johan Oskarsson at a conference in California on these topics, for which he created the Twitter hashtag #NoSQL.
We can define the NoSQL movement as a broad class of database management systems that differ from the classical relational model (RDBMS) in important ways, the most noticeable being that they do not use SQL as their primary query language.
Stored data does not require fixed structures such as tables. As a result, these systems generally don't support JOIN operations, and they do not fully guarantee the ACID (atomicity, consistency, isolation, and durability) features, which are the soul of the relational model. In exchange, they usually scale horizontally very efficiently.
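As a reminder, the four ACID features are defined as follows: atomicity means that a transaction is applied in full or not at all; consistency means that every transaction takes the database from one valid state to another; isolation means that concurrent transactions do not interfere with one another; and durability means that committed changes survive a subsequent failure.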
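Atomicity, the first of the ACID features, can be illustrated with a small relational example. The following is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database; the accounts table, the transfer function, and the sample balances are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with a rule that balances can never be negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.execute("INSERT INTO accounts VALUES ('bob', 0)")
conn.commit()


def balance(conn, name):
    """Return the current balance of one (hypothetical) account."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def transfer(conn, src, dst, amount):
    """Move money between accounts as a single atomic transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


failed = transfer(conn, "alice", "bob", 150)     # more than alice has
succeeded = transfer(conn, "alice", "bob", 30)   # valid transfer
```

The credit is deliberately executed before the debit, so that a failed transfer shows the earlier statement being rolled back along with the failing one.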
NoSQL systems are sometimes called "Not only SQL" to underline the fact that they can also support query languages such as SQL, although this characteristic depends on the implementation and the type of database.
Academic researchers refer to these databases as structured storage databases, a term that also covers classical relational databases. NoSQL databases are often classified according to how they store data, in categories such as Key-Value (Redis), BigTable/Column Family (Cassandra, HBase), Document Databases (MongoDB, CouchDB, RavenDB), and Graph Oriented Databases (Neo4j).
With the growth of real-time websites, it became clear that more processing power was needed for large volumes of data. Organizing data in horizontally distributed structures reached corporate consensus as the solution, since it can support millions of requests per second.
Many attempts have been made to categorize the different offerings now found in the NoSQL world according to various aspects: scalability, flexibility, functionality, and so on. One of these classifications, established by Scofield and Popescu (http://NoSQL.mypopescu.com/post/396337069/presentation-NoSQL-codemash-an-interesting), categorizes NoSQL databases according to the following criteria:
Data model | Performance | Scalability | Flexibility | Complexity | Functionality |
---|---|---|---|---|---|
Key-value stores | High | High | High | None | Variable (none) |
Column stores | High | High | Moderate | Low | Minimal |
Document stores | High | Variable (high) | High | Low | Variable (low) |
Graph databases | Variable | Variable | High | High | Graph theory |
Relational databases | Variable | Variable | Low | Moderate | Relational algebra |
So, the first step when adopting one of these models is to identify clearly which one best suits our needs. Let's quickly review these different architectural approaches:
Key-value stores resemble the browser's localStorage and sessionStorage APIs. These APIs allow read/write operations for a web page in a dedicated area of the local system. Storage is structured in pairs, the left-hand side being the key we'll use later on to retrieve the associated value. These databases don't care about the type of information saved as the value (numbers, documents, multimedia, and so on), although there might be some limitations.
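The key-value model can be sketched in a few lines. The following is a minimal in-memory illustration, not the API of any real product; the KeyValueStore class, its method names (loosely modeled on the web storage APIs), and the sample keys are all assumptions made for this example.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key-value store; values are opaque to the store."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def set_item(self, key: str, value: Any) -> None:
        # The store never inspects the value; it only files it under the key.
        self._data[key] = value

    def get_item(self, key: str) -> Optional[Any]:
        # Retrieval is by key only; a missing key yields None.
        return self._data.get(key)

    def remove_item(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.set_item("user:42:name", "alice")                           # text
store.set_item("user:42:visits", 17)                              # number
store.set_item("user:42:profile", {"city": "Madrid", "age": 30})  # document
store.set_item("user:42:avatar", b"\x89PNG")                      # binary data
```

Because the key is the only way in, any lookup by value (for example, "all users in Madrid") would have to be built on top of the store by the application.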
As we mentioned earlier, most NoSQL databases cannot perform joins in queries. Consequently, the database schema needs to be designed differently.
This has led to several techniques for managing relational data in a NoSQL database.
The first technique, issuing multiple queries, relies on the fast response times typical of these databases. Instead of getting all the data in a single request, several queries are chained in order to get the desired information.
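Chaining queries can be sketched as follows. This is a minimal illustration that uses plain Python dictionaries in place of a real database; the users and posts collections and the find_post, find_user, and post_with_author helpers are hypothetical names introduced for this example.

```python
# Two hypothetical collections; a post references its author by ID only.
users = {
    1: {"_id": 1, "username": "alice"},
    2: {"_id": 2, "username": "bob"},
}
posts = {
    10: {"_id": 10, "title": "Hello NoSQL", "author_id": 1},
    11: {"_id": 11, "title": "Scaling out", "author_id": 2},
}


def find_post(post_id):
    """Stand-in for one round trip to the posts collection."""
    return posts.get(post_id)


def find_user(user_id):
    """Stand-in for one round trip to the users collection."""
    return users.get(user_id)


def post_with_author(post_id):
    """Do by hand what a relational JOIN would do: two chained queries."""
    post = find_post(post_id)                  # query 1
    if post is None:
        return None
    author = find_user(post["author_id"])      # query 2, driven by query 1
    return {
        "title": post["title"],
        "author": author["username"] if author else None,
    }


result = post_with_author(10)
```

Each extra lookup is an extra round trip, which is where the performance penalty of this technique comes from.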
If the performance penalty is not acceptable, other approaches are possible.
A second technique solves the issue differently: instead of storing foreign keys, the corresponding foreign values are stored together with the model's data.
Let's imagine blog entries. Each entry can store both the user ID and the username, so we can read the username without requiring an extra query.
The shortcoming is that when the username changes, the modification has to be written in more than one place in the database. So, this kind of approach is handy when reads substantially outnumber writes.
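The trade-off can be made concrete with a small sketch. It uses plain Python dictionaries rather than a real database, and the field names (author_id, author_name) and the rename_user helper are assumptions chosen for illustration.

```python
# Hypothetical collections; each post duplicates the author's username.
users = {
    1: {"_id": 1, "username": "alice"},
    2: {"_id": 2, "username": "bob"},
}
posts = [
    {"_id": 10, "title": "Hello NoSQL", "author_id": 1, "author_name": "alice"},
    {"_id": 11, "title": "Scaling out", "author_id": 1, "author_name": "alice"},
    {"_id": 12, "title": "Graphs", "author_id": 2, "author_name": "bob"},
]


def author_of(post):
    """Reading is cheap: the username is already inside the post."""
    return post["author_name"]


def rename_user(user_id, new_name):
    """Writing is expensive: every duplicated copy must be updated too."""
    users[user_id]["username"] = new_name
    touched = 0
    for post in posts:
        if post["author_id"] == user_id:
            post["author_name"] = new_name
            touched += 1
    return touched  # number of extra writes caused by the duplication


name_before = author_of(posts[0])
extra_writes = rename_user(1, "alice_b")
```

Here a single rename costs one write to the user plus one write per post by that user, which is acceptable only if renames are rare compared with reads.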
As we will see when working with MongoDB, a common technique is to place more data in a smaller number of collections. In the blogging application we imagined earlier, this means that we could store the comments in the same document as the blog post itself.
In this way, a single query gets the post and all its related comments. With this methodology, a single document contains all the data you need for a specific task.
In fact, this approach has become a de facto standard, given the absence of a fixed schema in these databases.
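A post with embedded comments might look like the following sketch. The document is written as a plain Python dictionary, and the field names, the blog_posts collection, and the helper functions are assumptions for illustration rather than a schema taken from the text.

```python
# A hypothetical blog post document with its comments nested inside it.
post = {
    "_id": 10,
    "title": "Hello NoSQL",
    "body": "Why we moved away from joins.",
    "comments": [
        {"author": "bob", "text": "Nice write-up."},
        {"author": "carol", "text": "What about transactions?"},
    ],
}

# Stand-in for a collection, keyed by _id.
blog_posts = {post["_id"]: post}


def find_post(post_id):
    """A single lookup returns the post and every comment attached to it."""
    return blog_posts.get(post_id)


def add_comment(post_id, author, text):
    """Adding a comment modifies the one document that owns it."""
    blog_posts[post_id]["comments"].append({"author": author, "text": text})


add_comment(10, "dave", "Bookmarked.")
loaded = find_post(10)
comment_authors = [c["author"] for c in loaded["comments"]]
```

The design choice is that comments have no life of their own outside the post, so they never need to be queried through a separate collection.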
The terminology changes as well. The following table summarizes how SQL terms map to their MongoDB equivalents:
SQL | MongoDB |
---|---|
Database | Database |
Table | Collection |
Row | Document or BSON document |
Column | Field |
Index | Index |
Table joins | Embedded documents (with linking) |
Primary key (unique column or column combinations) | Primary key (automatically set to the _id field) |
Aggregation (for example, by group) | Aggregation pipeline |
In the case of MongoDB, which we'll use in this chapter, a read operation is a query that targets a specific collection of documents. Queries specify criteria (conditions) that identify which documents MongoDB has to return to the client.
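The idea of selecting documents by criteria can be sketched without a database. The following simplified model handles equality conditions only, whereas MongoDB's real query language also supports operators and nested fields; the matches and find helpers and the sample collection are hypothetical.

```python
def matches(document, criteria):
    """True if every field named in the criteria has the expected value."""
    return all(
        document.get(field) == expected
        for field, expected in criteria.items()
    )


def find(collection, criteria):
    """Return the documents of one collection that satisfy the criteria."""
    return [doc for doc in collection if matches(doc, criteria)]


# A hypothetical collection of blog post documents.
posts = [
    {"_id": 10, "author": "alice", "status": "published"},
    {"_id": 11, "author": "alice", "status": "draft"},
    {"_id": 12, "author": "bob", "status": "published"},
]

published_by_alice = find(posts, {"author": "alice", "status": "published"})
everything = find(posts, {})  # empty criteria match every document
```

As in the description above, the query always targets one collection, and the criteria decide which of its documents are returned.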
A query also needs to express which fields are required in the output. This is done with a projection: an expression that lists the fields of the matching documents to return. The behavior of MongoDB follows these rules:
The sort() method forms part of the query.
Traditionally, even in the relational model, operations that change information (create, update, or delete) have their own syntax (DDL or DML in the SQL world). In MongoDB, they are known as data modification operations, and they modify data in a single collection. However, for update operations, a conceptual distinction is usually made between targeted updates (modifications), which change only some fields, and updates that change the whole document (replacements). In a replacement, only the _id field is preserved.
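The difference between a modification and a replacement can be sketched on plain dictionaries. This is a conceptual model only and does not use MongoDB's actual update syntax; the modify and replace helpers and the sample document are assumptions for illustration.

```python
def modify(document, changes):
    """Targeted update: only the listed fields change, the rest survive."""
    updated = dict(document)
    for field, value in changes.items():
        if field != "_id":          # the identifier is never modified here
            updated[field] = value
    return updated


def replace(document, replacement):
    """Replacement: the whole content is swapped, only _id is preserved."""
    replaced = dict(replacement)
    replaced["_id"] = document["_id"]
    return replaced


original = {"_id": 10, "title": "Hello NoSQL", "status": "draft", "views": 3}

modified = modify(original, {"status": "published"})
replaced = replace(original, {"title": "Rewritten from scratch"})
```

After the modification the document still has its title and views fields, while after the replacement those fields exist only if the new content supplies them.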
To summarize, the available operations can be outlined in this way:
So, in the case of MongoDB, we would have a schema like the one shown below: