Handling billions of records with big objects

The Lightning Platform inherits relational database design principles and values in a very explicit way: when you create Custom Objects or add lookup or master-detail fields, you are defining relationships. It is, of course, no secret that Salesforce itself uses the Oracle relational database management system (RDBMS) under the hood to support this. Defining data in a relational way is a very powerful feature, allowing a rich set of domain-specific data to be expressed, as we have seen with Seasons, Teams, Races, and Contestants so far in this book. We have also seen how SOQL and DML let us work with that data as a whole in a single operation when needed, through query joins when reading and through transactions (a unit of work) when updating data across several objects. This transactional behavior is summarized by the acronym ACID: Atomicity, Consistency, Isolation, and Durability.
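To make the atomicity part of ACID concrete, here is a minimal sketch that uses SQLite (from Python's standard library) as a stand-in relational database. It is not Salesforce, Apex, or the Unit of Work library. The `races`/`contestants` tables and the `save_race_with_contestants` helper are hypothetical. The sketch saves a race together with its contestants as one transaction, so a failure on any row undoes the whole operation.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Two related tables, loosely mirroring a Race and its Contestants (hypothetical schema)
    conn.executescript("""
        CREATE TABLE races (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE contestants (
            id INTEGER PRIMARY KEY,
            race_id INTEGER NOT NULL REFERENCES races(id),
            driver TEXT NOT NULL
        );
    """)


def save_race_with_contestants(conn: sqlite3.Connection, race_name: str, drivers: list) -> None:
    """Insert a race and its contestants as a single unit of work.

    Using the connection as a context manager commits on success and
    rolls back every insert in the transaction if any statement raises.
    """
    with conn:
        cur = conn.execute("INSERT INTO races (name) VALUES (?)", (race_name,))
        race_id = cur.lastrowid
        for driver in drivers:
            conn.execute(
                "INSERT INTO contestants (race_id, driver) VALUES (?, ?)",
                (race_id, driver),
            )


conn = sqlite3.connect(":memory:")
create_schema(conn)
save_race_with_contestants(conn, "Race 1", ["Driver A", "Driver B"])
try:
    # The None driver violates NOT NULL, so the whole race insert is rolled back
    save_race_with_contestants(conn, "Race 2", ["Driver C", None])
except sqlite3.IntegrityError:
    pass

races = [row[0] for row in conn.execute("SELECT name FROM races ORDER BY id")]
contestant_count = conn.execute("SELECT COUNT(*) FROM contestants").fetchone()[0]
print(races, contestant_count)  # ['Race 1'] 2
```

Neither "Race 2" nor its first contestant survives the failed call. That all-or-nothing guarantee is what the database has to work to preserve as volumes grow.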

As record volumes grow, the benefits of ACID come at a performance cost. This is mostly due to how relational databases physically store data on disk, but also to their inherent commitment to maintaining integrity through transactions. To improve access times, as we have seen earlier in this chapter, the database maintains indexes. These are faster access routes that hold pointers to the actual physical data on disk, and they must themselves be updated whenever that data changes. On the update side, managing so many transactions while also avoiding locking issues starts to hurt user response times, which increase as data volumes grow.
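The role of an index as a faster access route can be observed directly in a relational engine's query planner. The following sketch again uses SQLite as an illustrative stand-in. It does not show how Salesforce's own indexing works internally, and the table and index names are hypothetical. It asks SQLite how it would execute the same query before and after an index is created on the filtered column.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contestants (id INTEGER PRIMARY KEY, race_id INTEGER, driver TEXT)")
conn.executemany(
    "INSERT INTO contestants (race_id, driver) VALUES (?, ?)",
    [(i % 1000, f"driver-{i}") for i in range(10_000)],
)

sql = "SELECT driver FROM contestants WHERE race_id = ?"
before = query_plan(conn, sql, (42,))  # no index yet: the planner must scan every row

conn.execute("CREATE INDEX idx_contestants_race ON contestants (race_id)")
after = query_plan(conn, sql, (42,))  # the planner can now search via the index

print(before)
print(after)
```

The first plan reports a full table scan. The second reports a search using `idx_contestants_race`. The exact wording of the plan text varies between SQLite versions. The trade-off mentioned above still applies: every later insert or update to `contestants` now also has to maintain the index.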

In short, when you are dealing with many millions, and certainly billions, of records and still require consistent access times for your users, something has to change. Put another way, how the data is physically stored ultimately influences which access patterns (for reads and writes) are most efficient. The rapid growth of the internet and of business-to-consumer (B2C) websites has forced the industry to rethink how it stores and accesses data in situations where you rapidly run into billions of records, yet consumers still require fast and consistent access to specific records (for example, your last Amazon order).

An alternative to the RDBMS is what is known today as NoSQL. NoSQL databases physically lay out their data on disk to support a very specific set of access patterns, and they explicitly do not support transactions. The behavior of such databases is described by the acronym BASE: Basically Available, Soft state, Eventual consistency. In researching this, I found that it is a less well-known acronym than ACID. It was defined by Eric Brewer, a professor at the University of California, Berkeley, and VP of Infrastructure at Google (at the time of writing). In short, BASE contrasts with ACID by stating that it is acceptable to design a database in which client code may not always read the very latest data, provided the database guarantees that it will eventually become consistent with past writes ("eventually consistent" for short). This allows the database to scale across multiple physical disks and to implement replication more easily. This compromise is the key to keeping read and write times consistent, even with billions of records!
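Eventual consistency can be illustrated with a toy key-value store whose writes land on a primary copy and reach a replica only later. This is a deliberately simplified model. It is not how any particular NoSQL product or Salesforce big objects replicate data. The `EventuallyConsistentStore` class and its methods are hypothetical. The sketch shows a read that returns stale data until replication catches up, after which the copies converge.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EventuallyConsistentStore:
    """Toy store: writes go to a primary and are copied to a replica asynchronously.

    Pending writes wait in a queue until replicate() runs, so reads from
    the replica may return stale values in the meantime.
    """
    primary: dict = field(default_factory=dict)
    replica: dict = field(default_factory=dict)
    pending: deque = field(default_factory=deque)

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        self.pending.append((key, value))  # shipped to the replica later, not immediately

    def read_from_replica(self, key: str):
        return self.replica.get(key)

    def replicate(self) -> None:
        # Apply queued writes in order; the replica converges once the queue drains
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("last_order", "order-1")
store.replicate()

store.write("last_order", "order-2")
stale = store.read_from_replica("last_order")  # replica has not caught up yet
store.replicate()
fresh = store.read_from_replica("last_order")  # replica now matches the primary

print(stale, fresh)  # order-1 order-2
```

Neither the write nor the read ever waits on the other copy, which is what lets such designs scale out. The price is the window in which a client can see `order-1` after `order-2` has already been written.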
