Chapter 8. Transactions

Transactions are logical groups of processing in a database, and each group or transaction can contain one or more operations such as reads and/or writes across multiple documents. MongoDB supports ACID-compliant transactions across multiple operations, collections, databases, documents, and shards. In this chapter, we introduce transactions, define what ACID means for a database, highlight how you use these in your applications, and provide tips for tuning transactions in MongoDB. We will cover:

  • What a transaction is

  • How to use transactions

  • Tuning transaction limits for your application

Introduction to Transactions

As we mentioned above, a transaction is a logical unit of processing in a database that includes one or more database operations, which can be read or write operations. There are situations where your application may require reads and writes to multiple documents (in one or more collections) as part of this logical unit of processing. An important aspect of a transaction is that it is never partially completed: either every operation in it takes effect or none does.

Note

In order to use transactions, your MongoDB deployment must be on MongoDB version 4.2 or later and your MongoDB drivers must be updated for MongoDB 4.2 or later. MongoDB provides a Driver Compatibility Reference page that you can use to ensure your MongoDB Driver version is compatible.

A Definition of ACID

ACID is the accepted set of properties a transaction must meet to be a “true” transaction. ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. ACID transactions guarantee the validity of your data and of your database’s state even where power failures or other errors occur.

Atomicity ensures that all operations inside a transaction will either be applied or nothing will be applied. A transaction can never be partially applied; either it is committed or it aborts.

Consistency ensures that if a transaction succeeds, the database will move from one consistent state to the next consistent state.

Isolation is the property that permits multiple transactions to run at the same time in your database. It guarantees that a transaction will not view the partial results of any other transaction, which means multiple parallel transactions will have the same results as running each of the transactions sequentially.

Durability ensures that when a transaction is committed all data will persist even in the case of a system failure.

A database is said to be ACID-compliant when it ensures that all these properties are met and that only successful transactions can be processed. In situations where a failure occurs before a transaction is completed, ACID compliance ensures that no data will be changed.
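
The all-or-nothing behavior of atomicity can be illustrated with a toy in-memory model. This is only a sketch of the property itself, not of how MongoDB implements it; the `apply_atomically` helper and the dictionary standing in for a database are hypothetical.

```python
import copy


def apply_atomically(state, operations):
    """Apply every operation to a copy of state; publish the copy only if all succeed.

    If any operation raises, the original state is returned unchanged, so the
    caller never observes a partially applied group of operations.
    """
    working = copy.deepcopy(state)
    try:
        for operation in operations:
            operation(working)
    except Exception:
        return state, False   # "abort": discard the working copy
    return working, True      # "commit": publish the new state


def place_order(db):
    db["orders"].append({"sku": "abc123", "qty": 100})


def reduce_stock(db):
    if db["inventory"]["abc123"] < 100:
        raise ValueError("insufficient stock")
    db["inventory"]["abc123"] -= 100


# Only 50 items are in stock, so the second operation fails and the
# order appended by the first operation is discarded along with it.
db = {"orders": [], "inventory": {"abc123": 50}}
db, committed = apply_atomically(db, [place_order, reduce_stock])
```

Here `committed` is `False` and `db` is left exactly as it started, with no order recorded.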

MongoDB is a distributed database with ACID-compliant transactions across replica sets and/or across shards. The network layer adds an additional level of complexity. The engineering team at MongoDB has provided several chalk-and-talk videos that describe how they implemented the features necessary to support ACID transactions.

How to Use Transactions

MongoDB provides two APIs to use transactions. The first, called the core API, uses a syntax similar to that of relational databases (e.g., start_transaction and commit_transaction). The second, called the callback API, is the recommended approach to using transactions.

The core API does not provide retry logic for the majority of errors and requires the developer to code the logic for the operations, the transaction commit function, and any retry and error logic required.

The callback API provides a single function that wraps a large degree of functionality when compared to the core API, including starting a transaction associated with a specified logical session, executing a function supplied as the callback function, and then committing the transaction (or aborting on error). This function also includes retry logic that handles commit errors. The callback API was added in MongoDB 4.2 to simplify application development with transactions as well as make it easier to add application retry logic to handle any transaction errors.
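
The control flow that a callback-style helper wraps can be modeled in a few lines. The following is a simplified sketch under stated assumptions, not the driver's actual implementation: `LabeledError`, `FakeSession`, and `run_with_transaction` are hypothetical names, the session is an in-memory stand-in so that no server is needed, and retries are bounded by a simple attempt count chosen for illustration.

```python
class LabeledError(Exception):
    """Hypothetical stand-in for a driver error that carries an error label."""

    def __init__(self, label):
        super().__init__(label)
        self.label = label

    def has_error_label(self, label):
        return self.label == label


class FakeSession:
    """In-memory stand-in for a client session; records what was called."""

    def __init__(self, commit_failures=0):
        self.commit_failures = commit_failures
        self.log = []

    def start_transaction(self):
        self.log.append("start")

    def abort_transaction(self):
        self.log.append("abort")

    def commit_transaction(self):
        if self.commit_failures > 0:
            self.commit_failures -= 1
            self.log.append("commit-failed")
            raise LabeledError("UnknownTransactionCommitResult")
        self.log.append("commit")


def run_with_transaction(session, callback, max_attempts=5):
    """Start a transaction, run the callback, then commit, retrying as needed."""
    for _ in range(max_attempts):
        session.start_transaction()
        try:
            result = callback(session)
        except Exception as exc:
            session.abort_transaction()
            if isinstance(exc, LabeledError) and exc.has_error_label(
                    "TransientTransactionError"):
                continue  # retry the whole transaction
            raise
        for _ in range(max_attempts):
            try:
                session.commit_transaction()
                return result
            except LabeledError as exc:
                if not exc.has_error_label("UnknownTransactionCommitResult"):
                    raise
                # otherwise loop and retry only the commit
        raise RuntimeError("commit did not succeed within max_attempts")
    raise RuntimeError("transaction did not succeed within max_attempts")


# The commit fails twice with an unknown result and then succeeds.
session = FakeSession(commit_failures=2)
outcome = run_with_transaction(session, lambda s: "order placed")
```

In this run the callback executes once and only the commit is repeated, which is the distinction between the two error labels: a transient error restarts the whole transaction, while an unknown commit result retries the commit alone.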

In both APIs, the developer is responsible for starting the logical session that will be used by the transaction. Both APIs require operations in a transaction to be associated with a specific logical session (i.e., pass in the session to each operation). A logical session in MongoDB tracks the time and sequencing of the operations in the context of the entire MongoDB deployment. A logical session or server session is part of the underlying framework used by client sessions to support retryable writes and causal consistency in MongoDB—both of these features were added in MongoDB version 3.6 as part of the foundation required to support transactions. A specific sequence of read and write operations that have a causal relationship reflected by their ordering is defined as a causally consistent client session in MongoDB. A client session is started by an application and used to interact with a server session.

In 2019, six senior engineers from MongoDB published a paper at the SIGMOD 2019 conference entitled “Implementation of Cluster-wide Logical Clock and Causal Consistency in MongoDB”.1 This paper provides a deeper technical explanation of the mechanics behind logical sessions and causal consistency in MongoDB. The paper documents the efforts from a multiteam, multiyear engineering project. The work involved changing aspects of the storage layer, adding a new replication consensus protocol, modifying the sharding architecture, refactoring sharding cluster metadata, and adding a global logical clock. These changes provide the foundation required by the database before ACID-compliant transactions can be added.

The complexity and additional coding required in applications are the main reasons to recommend the callback API over the core API. These differences between the APIs are summarized in Table 8-1.

Table 8-1. Comparison of Core API versus Callback API
Core API | Callback API
Requires explicit call to start the transaction and commit the transaction. | Starts a transaction, executes the specified operations, and commits (or aborts on error).
Does not incorporate error-handling logic for TransientTransactionError and UnknownTransactionCommitResult, and instead provides the flexibility to incorporate custom error handling for these errors. | Automatically incorporates error-handling logic for TransientTransactionError and UnknownTransactionCommitResult.
Requires explicit logical session to be passed to API for the specific transaction. | Requires explicit logical session to be passed to API for the specific transaction.

To understand the differences between these two APIs, we can compare the APIs using a simple transaction example for an ecommerce site where an order is placed and the corresponding items are removed from the available stock as they are sold. This involves two documents in different collections in a single transaction. The two operations, which will be the core of our transaction example, are:

    orders.insert_one({"sku": "abc123", "qty": 100}, session=session)
    inventory.update_one({"sku": "abc123", "qty": {"$gte": 100}},
                         {"$inc": {"qty": -100}}, session=session)

First, let’s see how the core API can be used in Python for our transaction example. The two operations of our transaction are highlighted in Step 1 of the program listing below:

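from pymongo import MongoClient, ReadPreference
from pymongo.errors import ConnectionFailure, OperationFailure
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Define the uriString using the DNS Seedlist Connection Format
# for the connection
uriString = 'mongodb+srv://server.example.com/'
client = MongoClient(uriString)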

my_wc_majority = WriteConcern('majority', wtimeout=1000)

# Prerequisite / Step 0: Create collections, if they don't already exist. 
# CRUD operations in transactions must be on existing collections.

client.get_database( "webshop",
                     write_concern=my_wc_majority).orders.insert_one({"sku":
                     "abc123", "qty":0})
client.get_database( "webshop",
                     write_concern=my_wc_majority).inventory.insert_one(
                     {"sku": "abc123", "qty": 1000})

# Step 1: Define the operations and their sequence within the transaction
def update_orders_and_inventory(my_session):
    orders = my_session.client.webshop.orders
    inventory = my_session.client.webshop.inventory

    with my_session.start_transaction(
            read_concern=ReadConcern("snapshot"),
            write_concern=WriteConcern(w="majority"),
            read_preference=ReadPreference.PRIMARY):

        orders.insert_one({"sku": "abc123", "qty": 100}, session=my_session)
        inventory.update_one({"sku": "abc123", "qty": {"$gte": 100}},
                             {"$inc": {"qty": -100}}, session=my_session)
        commit_with_retry(my_session)

# Step 2: Attempt to run and commit transaction with retry logic
def commit_with_retry(session):
    while True:
        try:
            # Commit uses write concern set at transaction start.
            session.commit_transaction()
            print("Transaction committed.")
            break
        except (ConnectionFailure, OperationFailure) as exc:
            # Can retry commit
            if exc.has_error_label("UnknownTransactionCommitResult"):
                print("UnknownTransactionCommitResult, retrying "
                      "commit operation ...")
                continue
            else:
                print("Error during commit ...")
                raise

# Step 3: Attempt with retry logic to run the transaction function txn_func
def run_transaction_with_retry(txn_func, session):
    while True:
        try:
            txn_func(session)  # performs transaction
            break
        except (ConnectionFailure, OperationFailure) as exc:
            # If transient error, retry the whole transaction
            if exc.has_error_label("TransientTransactionError"):
                print("TransientTransactionError, retrying transaction ...")
                continue
            else:
                raise

# Step 4: Start a session.
with client.start_session() as my_session:

# Step 5: Call the function 'run_transaction_with_retry' passing it the function
# to call 'update_orders_and_inventory' and the session 'my_session' to associate
# with this transaction.

    try:
        run_transaction_with_retry(update_orders_and_inventory, my_session)
    except Exception as exc:
        # Do something with error. The error handling code is not
        # implemented for you with the Core API.
        raise

Now, let’s look at how the callback API can be used in Python for this same transaction example. The two operations of our transaction are highlighted in Step 1 of the program listing below:

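from pymongo import MongoClient, ReadPreference
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Define the uriString using the DNS Seedlist Connection Format
# for the connection
uriString = 'mongodb+srv://server.example.com/'
client = MongoClient(uriString)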

my_wc_majority = WriteConcern('majority', wtimeout=1000)

# Prerequisite / Step 0: Create collections, if they don't already exist.
# CRUD operations in transactions must be on existing collections.

client.get_database( "webshop",
                     write_concern=my_wc_majority).orders.insert_one({"sku":
                     "abc123", "qty":0})
client.get_database( "webshop",
                     write_concern=my_wc_majority).inventory.insert_one(
                     {"sku": "abc123", "qty": 1000})

# Step 1: Define the callback that specifies the sequence of operations to
# perform inside the transactions.

def callback(my_session):
    orders = my_session.client.webshop.orders
    inventory = my_session.client.webshop.inventory

    # Important: You must pass the session variable 'my_session' to
    # the operations.

    orders.insert_one({"sku": "abc123", "qty": 100}, session=my_session)
    inventory.update_one({"sku": "abc123", "qty": {"$gte": 100}},
                         {"$inc": {"qty": -100}}, session=my_session)

# Step 2: Start a client session.

with client.start_session() as session:

# Step 3: Use with_transaction to start a transaction, execute the callback,
# and commit (or abort on error).

    session.with_transaction(callback,
                             read_concern=ReadConcern('local'),
                             write_concern=my_wc_majority,
                             read_preference=ReadPreference.PRIMARY)

Note

In MongoDB multidocument transactions, you may only perform read/write (CRUD) operations on existing collections or databases. As shown in our example, you must first create a collection outside of a transaction if you wish to insert into it within a transaction. Create, drop, or index operations are not permitted in a transaction.
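
A small helper can make this prerequisite step explicit. The sketch below assumes a PyMongo-style database object that exposes `list_collection_names()` and `create_collection()`; the `ensure_collections` helper and the in-memory `FakeDatabase` used to exercise it are hypothetical and exist only so that the example runs without a server.

```python
def ensure_collections(db, names):
    """Create any missing collections before a transaction is started.

    Returns the names that had to be created.
    """
    existing = set(db.list_collection_names())
    created = []
    for name in names:
        if name not in existing:
            db.create_collection(name)
            created.append(name)
    return created


class FakeDatabase:
    """Hypothetical in-memory stand-in for a database handle."""

    def __init__(self, names):
        self._names = list(names)

    def list_collection_names(self):
        return list(self._names)

    def create_collection(self, name):
        self._names.append(name)


# "orders" already exists, so only "inventory" is created.
webshop = FakeDatabase(["orders"])
created = ensure_collections(webshop, ["orders", "inventory"])
```

With a real deployment, the same helper would be called with a database handle such as `client.webshop` before the session starts its transaction.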

Tuning Transaction Limits for Your Application

There are a few parameters that are important to be aware of when using transactions. They can be adjusted to ensure your application can make the optimal use of transactions.

Timing and Oplog Size Limits

There are two main categories of limits in MongoDB transactions. The first relates to time limits: how long a specific transaction can run, how long a transaction will wait to acquire locks, and the maximum lifetime that applies to all transactions. The second relates to the MongoDB oplog, specifically the size limit for an individual oplog entry.

Time limits

By default, a transaction may run for at most one minute. This can be increased by modifying the limit controlled by transactionLifetimeLimitSeconds at a mongod instance level. In the case of sharded clusters, the parameter must be set on all shard replica set members. After this time has elapsed, a transaction will be considered expired and will be aborted by a cleanup process, which runs periodically. The cleanup process will run once every 60 seconds or every transactionLifetimeLimitSeconds/2, whichever is lower.
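
The cleanup schedule is simple arithmetic and can be worked through directly. This is a minimal sketch of the rule as stated in the text; the function names are hypothetical, and the command document follows the shape of the `setParameter` administrative command, which you would pass to your driver's admin command method on each mongod.

```python
def cleanup_interval_seconds(transaction_lifetime_limit_seconds=60):
    """Return how often the cleanup process runs: the lower of 60 seconds
    and half of transactionLifetimeLimitSeconds."""
    return min(60, transaction_lifetime_limit_seconds / 2)


def lifetime_limit_command(seconds):
    """Build the setParameter command document for a new lifetime limit."""
    return {"setParameter": 1, "transactionLifetimeLimitSeconds": seconds}


default_interval = cleanup_interval_seconds()      # default limit of 60 seconds
raised_interval = cleanup_interval_seconds(300)    # limit raised to 5 minutes
command = lifetime_limit_command(300)
```

With the default limit the cleanup process runs every 30 seconds; once the limit is raised past 120 seconds, the interval is capped at 60 seconds.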

To explicitly set a time limit on a transaction, it is recommended that you specify a maxTimeMS on commitTransaction. If maxTimeMS is not set, transactionLifetimeLimitSeconds will be used. If maxTimeMS is set but would exceed transactionLifetimeLimitSeconds, transactionLifetimeLimitSeconds will be used instead.

The default maximum time a transaction will wait to acquire the locks it needs for the operations in the transaction is 5 ms. This can be increased by modifying the limit controlled by maxTransactionLockRequestTimeoutMillis. If the transaction is unable to acquire the locks within this time, it will abort. maxTransactionLockRequestTimeoutMillis can be set to 0, -1, or a number greater than 0. Setting it to 0 means a transaction will abort if it is unable to immediately acquire all the locks it requires. A setting of -1 will use the operation-specific timeout as specified by maxTimeMS. Any number greater than 0 sets the period, in milliseconds, during which a transaction will attempt to acquire the required locks.
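
The three kinds of value accepted by maxTransactionLockRequestTimeoutMillis can be summarized as a small lookup. This is an illustrative sketch only; the `describe_lock_timeout` function is hypothetical, and the value is treated as milliseconds, as the parameter name indicates.

```python
def describe_lock_timeout(value_ms=5):
    """Describe how a transaction waits for locks for a given setting."""
    if value_ms == 0:
        return "abort unless all locks are acquired immediately"
    if value_ms == -1:
        return "use the operation-specific timeout (maxTimeMS)"
    if value_ms > 0:
        return f"wait up to {value_ms} ms to acquire locks"
    raise ValueError("value must be 0, -1, or a number greater than 0")


default_behavior = describe_lock_timeout()     # the default of 5 ms
immediate_behavior = describe_lock_timeout(0)
per_operation_behavior = describe_lock_timeout(-1)
```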

Oplog size limits

MongoDB will create as many oplog entries as required for the write operations in a transaction. However, each oplog entry must be within the BSON document size limit of 16MB.

Transactions provide a useful feature in MongoDB to ensure consistency, but they should be used together with the rich document model. The flexibility of this model, combined with best practices such as schema design patterns, will help you avoid the need for transactions in most situations. Transactions are a powerful feature, best used sparingly in your applications.

1 The authors are Misha Tyulenev, staff software engineer for sharding; Andy Schwerin, vice president for Distributed Systems; Asya Kamsky, principal product manager for Distributed Systems; Randolph Tan, senior software engineer for sharding; Alyson Cabral, product manager for Distributed Systems; and Jack Mulrow, software engineer for sharding.
