6 | Big Data Simplified
• Consistency implies that only valid data will be written to a database. If, for some reason,
a transaction violates the database’s consistency rules, then the entire transaction should be
rolled back, and the database will be restored to a state that was consistent with those rules.
On the other hand, if a transaction executes successfully according to the consistency rules of
the database, the database moves from one state that is consistent with the rules to another
state that is also consistent with those rules.
• Isolation requires that multiple transactions running on the database at the same time
do not interfere with one another. For example, if John carries out a transaction on a database
at the same time that Jane executes a different transaction, the two should operate in isolation
from each other. The outcome should be the same as if the database had performed John’s
transaction in its entirety before executing Jane’s, or vice versa, even if the database actually
interleaves their operations. This prevents one transaction from reading intermediate data
produced by part of the other transaction, data that may never be committed to the database and
would therefore lead to erroneous results. Note that the isolation property does not mandate
which transaction should execute first; it only guarantees that the transactions will not
interfere with each other.
• Durability ensures that any transaction that has been committed to the database will not be
lost. This is achieved through the use of database backups and transaction logs, which allow
committed transactions to be restored in the event of a software or hardware failure.
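The consistency and atomicity guarantees described above can be observed with Python’s built-in sqlite3 module. The following is a minimal sketch under stated assumptions: the accounts table, its CHECK (balance >= 0) constraint (standing in for a consistency rule) and the helper names make_db, transfer and balances are hypothetical, chosen only for illustration. A transfer first credits one account and then debits the other; if the debit would break the rule, the whole transaction, including the credit already applied, is rolled back.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory accounts table with one consistency rule."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint plays the role of the database's consistency rule.
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("john", 100), ("jane", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        # The connection context manager commits on success and
        # rolls back (then re-raises) if any statement fails.
        with conn:
            # Credit first, so a failing debit shows the credit being undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint was violated; the whole transaction was rolled back.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    conn = make_db()
    print(transfer(conn, "john", "jane", 30))   # succeeds
    print(transfer(conn, "jane", "john", 500))  # would make jane's balance negative
    print(balances(conn))                       # unchanged by the failed transfer
```

The failed call returns False and leaves both balances exactly as they were after the first transfer. Isolation and durability are not exercised by this sketch.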
Database management systems use a number of strategies to ensure ACID compliance.
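One strategy used to enforce atomicity and durability is write-ahead logging (WAL), in which
the details of every change made by a transaction are first written to a log that includes both
redo and undo information, and only then applied to the database itself. Because the log reaches
stable storage before the data does, after a software or hardware failure of any sort the
database can compare the contents of the log with its current state, redo the changes of
committed transactions that had not yet reached the data files, and undo the changes of
transactions that never committed.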
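The logging-and-recovery idea can be sketched as a toy in-memory key-value store. This is a simplified illustration, not how any particular DBMS implements WAL: the names WALStore, LogRecord and recover are hypothetical, the Python list and dict merely stand in for a log on stable storage and for the data files, and the sketch assumes that no two uncommitted transactions write the same key (which a real system would enforce with locking). Recovery first replays every logged change (redo) and then reverses, newest first, the changes of transactions that have no commit record (undo).

```python
from dataclasses import dataclass
from typing import Any, List, Optional

_ABSENT = object()  # marks "key did not exist before this write"


@dataclass
class LogRecord:
    txid: int
    kind: str                  # "update" or "commit"
    key: Optional[str] = None
    old: Any = None            # undo information (before-image)
    new: Any = None            # redo information (after-image)


class WALStore:
    """Hypothetical toy key-value store using write-ahead logging."""

    def __init__(self) -> None:
        self.log: List[LogRecord] = []  # stands in for the log on stable storage
        self.data: dict = {}            # stands in for the database's data files

    def write(self, txid: int, key: str, value: Any) -> None:
        # WAL rule: append the log record *before* touching the data.
        old = self.data.get(key, _ABSENT)
        self.log.append(LogRecord(txid, "update", key, old, value))
        self.data[key] = value

    def commit(self, txid: int) -> None:
        # A real system forces this record to disk before acknowledging the commit.
        self.log.append(LogRecord(txid, "commit"))


def recover(log: List[LogRecord], data: dict) -> dict:
    """Restore a consistent state from the log and whatever data survived a crash."""
    committed = {r.txid for r in log if r.kind == "commit"}
    # Redo pass: replay every logged update in log order.
    for r in log:
        if r.kind == "update":
            data[r.key] = r.new
    # Undo pass: reverse updates of uncommitted transactions, newest first.
    for r in reversed(log):
        if r.kind == "update" and r.txid not in committed:
            if r.old is _ABSENT:
                data.pop(r.key, None)
            else:
                data[r.key] = r.old
    return data
```

For example, if transaction 1 writes x and y and commits, and transaction 2 overwrites x but crashes before committing, recover returns transaction 1’s values. Recovering from an empty data dictionary, as if no change had reached the data files before the crash, gives the same result, because the log alone holds enough redo and undo information.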
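Another technique used to address atomicity and durability is shadow paging, in which a
copy of a page (the shadow page) is created whenever its data is to be modified. With this
strategy, the updates made by a transaction are written to the shadow page rather than to the
original data in the database. The database is modified only when the transaction completes: at
commit, the database switches its page table to point to the updated pages in a single atomic
step, so that either all of the changes become visible or none of them do.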
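A toy, in-memory sketch of this copy-on-write idea follows. The class and method names are hypothetical, a Python list stands in for the physical page slots on disk, and replacing the committed page table with a single reference assignment plays the role of the atomic switch at commit. Textbooks differ on which copy they call the “shadow”; here the copies that receive the updates correspond to the shadow pages described above.

```python
class ShadowPagedDB:
    """Hypothetical toy store illustrating shadow paging."""

    def __init__(self, pages):
        self.storage = list(pages)                        # physical page slots
        self.current = {i: i for i in range(len(pages))}  # committed page table

    def begin(self) -> "Txn":
        return Txn(self)

    def read(self, page_id):
        # Readers only ever see pages reachable from the committed page table.
        return self.storage[self.current[page_id]]


class Txn:
    def __init__(self, db: ShadowPagedDB):
        self.db = db
        self.table = dict(db.current)  # private copy of the page table
        self.copied = set()

    def write(self, page_id, value):
        if page_id not in self.copied:
            # Copy-on-write: allocate a fresh slot for the new version;
            # the original page is never overwritten.
            self.db.storage.append(self.db.storage[self.table[page_id]])
            self.table[page_id] = len(self.db.storage) - 1
            self.copied.add(page_id)
        self.db.storage[self.table[page_id]] = value

    def commit(self):
        # One reference swap makes every change visible at once.
        self.db.current = self.table

    def abort(self):
        # Nothing to undo: committed pages were never touched.
        self.table = None
```

Aborting simply drops the transaction’s private page table; the slots it allocated are left unused, and a real system would reclaim them.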
Finally, there is another strategy called the two-phase commit protocol, which is especially
useful in distributed database systems. This protocol splits the request to modify the data into
two phases, namely a commit-request (or voting) phase and a commit phase. In the commit-request
phase, a coordinator asks every node of the DBMS on the network that is affected by the
transaction to confirm that it has received the request and is able to complete the transaction.
Only once confirmation has been received from all relevant nodes does the commit phase take
place, in which the changes are actually committed on every node; if any node cannot confirm,
the transaction is aborted on all of them.
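The control flow of the protocol can be sketched as follows. This is a simplified, single-process illustration: Participant, its can_commit flag and two_phase_commit are hypothetical names, and the sketch leaves out the network messages, timeouts and durable logging of the decision that a real coordinator needs in order to survive failures.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    can_commit: bool = True  # hypothetical stand-in for the node's local checks
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1 (commit-request): vote yes only if the local work can be completed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit: bool) -> None:
        # Phase 2 (commit): apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants) -> bool:
    """Coordinator logic: commit everywhere only if every node votes yes."""
    votes = [p.prepare() for p in participants]  # commit-request phase
    decision = all(votes)
    for p in participants:                       # commit phase
        p.finish(decision)
    return decision
```

With every participant able to commit, all of them end in the committed state; a single participant created with can_commit=False causes the transaction to be aborted on every node.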
1.2.2 Unstructured Data
Unstructured data is commonly estimated to represent around 80% of the data in use today. It
mainly consists of text and multimedia content. Examples of unstructured data include
word-processing documents, e-mail messages, videos, photos, audio files, presentations and
webpages. Unstructured data is omnipresent: most individuals and organizations generate large
volumes of it throughout their lifetimes. Unstructured data can be produced either by machines
or by human beings.