Refs and Software Transactional Memory

Most objects in Clojure are immutable. When you really want mutable data, you must be explicit about it, such as by creating a mutable reference (ref) to an immutable object. You create a ref with this:

 (ref initial-state)

For example, you could create a reference to the current song in your music playlist:

 (​def​ current-track (ref ​"Mars, the Bringer of War"​))
 -> #​'user/current-track

The ref wraps and protects access to its internal state. To read the contents of the reference, you can call deref:

 (deref reference)

The deref function can be shortened to the @ reader macro. Try using both deref and @ to dereference current-track:

 (deref current-track)
 -> ​"Mars, the Bringer of War"
 
 @current-track
 -> ​"Mars, the Bringer of War"

Notice how in this example the Clojure model fits the real world. A track is an immutable entity. It doesn’t change into another track when you’re finished listening to it. But the current track is a reference to an entity, and it does change.

ref-set

You can change where a reference points with ref-set:

 (ref-set reference new-value)

Call ref-set to listen to a different track:

 (ref-set current-track ​"Venus, the Bringer of Peace"​)
 -> java.lang.IllegalStateException​:​ No transaction running

Oops. Because refs are mutable, you must protect their updates. In many languages, you would use a lock for this purpose. In Clojure, you can use a transaction. Transactions are wrapped in a dosync:

 (dosync & exprs)

Wrap your ref-set with a dosync, and all is well.

 (dosync (ref-set current-track ​"Venus, the Bringer of Peace"​))
 -> ​"Venus, the Bringer of Peace"

The current-track reference now refers to a different track.

Transactional Properties

Like database transactions, Software Transactional Memory (STM) allows programmers to describe reads and writes to stateful references in the scope of a transaction. These transactions guarantee some important properties:

  • Updates are atomic. If you update more than one ref in a transaction, the cumulative effect of all the updates will appear as a single instantaneous event to anyone not inside your transaction.

  • Updates are consistent. Refs can specify validation functions. If any of these functions fail, the entire transaction will fail.

  • Updates are isolated. Running transactions can’t see partially completed results from other transactions.

Databases provide the additional guarantee that updates are durable. Because Clojure’s transactions are in-memory transactions, Clojure does not guarantee that updates are durable. If you want a durable transaction in Clojure, you should use a database.

Together, the four transactional properties are called ACID. Databases provide ACID; Clojure’s STM provides ACI.

If you change more than one ref in a single transaction, the changes are all coordinated to “happen at the same time” from the perspective of any code outside the transaction. So, you can make sure that updates to current-track and current-composer are coordinated:

 (​def​ current-track (ref ​"Venus, the Bringer of Peace"​))
 -> #​'user/current-track
 (​def​ current-composer (ref ​"Holst"​))
 -> #​'user/current-composer
 
 (dosync
  (ref-set current-track ​"Credo"​)
  (ref-set current-composer ​"Byrd"​))
 -> ​"Byrd"

Because the updates are in a transaction, no other thread will ever see an updated track with an out-of-date composer, or vice versa.

alter

The current-track example is deceptively easy, because updates to the ref are totally independent of any previous state. Let’s build a more complex example, where transactions need to update existing information. A simple chat application fits the bill. First, create a message record that has a sender and some text:

 (defrecord Message [sender text])

Now, you can create messages by instantiating the record:

 (->Message ​"Aaron"​ ​"Hello"​)
 -> #:user.Message{:sender ​"Aaron"​, :text ​"Hello"​}

Users of the chat application want to see the most recent message first, so a list is a good data structure. Create a messages reference that points to an initially empty list:

 (​def​ messages (ref ()))

Now you need a function to add a new message to the front of messages. You could simply deref to get the list of messages, cons the new message, and then ref-set the updated list back into messages:

 ; bad idea
 (​defn​ naive-add-message [msg]
  (dosync (ref-set messages (cons msg @messages))))

But there’s a better option. Why not perform the read and update in a single step? Clojure’s alter will apply an update function to a referenced object within a transaction:

 (alter ref update-fn & args...)

alter returns the new value of the ref within the transaction. When a transaction successfully completes, the ref will take on its last in-transaction value. Using alter instead of ref-set makes the code more readable:

 (​defn​ add-message [msg]
  (dosync (alter messages conj msg)))

Notice that the update function is conj (short for “conjoin”), not cons. This is because conj takes arguments in an order suitable for use with alter:

 (cons item sequence)
 (conj sequence item)

The alter function calls its update-fn with the current reference value as its first argument, as conj expects. If you plan to write your own update functions, they should follow the same structure as conj:

 (your-func thing-that-gets-updated & optional-other-args)

Try adding a few messages to see that the code works as expected:

 (add-message (->Message ​"user 1"​ ​"hello"​))
 -> (#:user.Message{:sender ​"user 1"​, :text ​"hello"​})
 
 (add-message (->Message ​"user 2"​ ​"howdy"​))
 -> (#:user.Message{:sender ​"user 2"​, :text ​"howdy"​}
  #:user.Message{:sender ​"user 1"​, :text ​"hello"​})

alter is the workhorse of Clojure’s STM and the primary means of updating refs. But if you know a little about how the STM works, you may be able to optimize your transactions in certain scenarios.

How STM Works: MVCC

Clojure’s STM uses a technique called Multiversion Concurrency Control (MVCC), which is also used in several major databases. Here’s how MVCC works in Clojure.

Transaction A begins by taking a point, which is simply a number that acts as a unique timestamp in the STM world. Transaction A has access to its own effectively private copy of any reference it needs, associated with the point. Clojure’s persistent data structures (Persistent Data Structures) make it cheap to provide these effectively private copies.

During Transaction A, operations on a ref work against (and return) the transaction’s private copy of the ref’s data, called the in-transaction value.

If at any point the STM detects that another transaction has already set/altered a ref that Transaction A wants to set/alter, Transaction A is forced to retry. If you throw an exception out of the dosync block, then Transaction A aborts without a retry.

If and when Transaction A commits, its heretofore private writes will become visible to the world, associated with a single point in the transaction timeline.

Sometimes the approach implied by alter is too cautious. What if you don’t care that another transaction altered a reference in the middle of your transaction? If in such a situation you’d be willing to commit your changes anyway, you can beat alter’s performance with commute.

commute

commute is a specialized variant of alter allowing for more concurrency:

 (commute ref update-fn & args...)

Of course, there’s a trade-off. Commutes are so named because they must be commutative. That is, updates must be able to occur in any order. This gives the STM system freedom to reorder commutes.

To use commute, replace alter with commute in your implementation of add-message:

 (​defn​ add-message-commute [msg]
  (dosync (commute messages conj msg)))

commute returns the new value of the ref. However, the last in-transaction value you see from a commute will not always match the end-of-transaction value of a ref, because of reordering. If another transaction sneaks in and alters a ref that you’re trying to commute, the STM will not restart your transaction. Instead, it will run your commute function again, out of order. Your transaction will never even see the ref value that your commute function finally ran against.

Since Clojure’s STM can reorder commutes behind your back, you can only use them when you don’t care about ordering. Literally speaking, this isn’t true for a chat application. The list of messages most certainly has an order, so if two message adds get reversed, the resulting list will not show correctly the order in which the messages arrived.

Practically speaking, chat message updates are commutative enough. STM-based reordering of messages will likely happen on time scales of microseconds or less. For users of a chat application, there are already reorderings on much larger time scales due to network and human latency. (Think about times that you have “spoken out of turn” in an online chat because another speaker’s message hadn’t reached you yet.) Since these larger reorderings are unfixable, it’s reasonable for a chat application to ignore the smaller reorderings that might bubble up from Clojure’s STM.

Prefer alter

Many updates are not commutative. For example, consider a counter that returns an increasing sequence of numbers. You might use such a counter to build unique IDs in a system. The counter can be a simple reference to a number:

 (​def​ counter (ref 0))

You should not use commute to update the counter. commute returns the in-transaction value of the counter at the time of the commute, but reorderings could cause the actual end-of-transaction value to be different. This could lead to more than one caller getting the same counter value. Instead, use alter:

 (​defn​ next-counter [] (dosync (alter counter inc)))

Try calling next-counter a few times to verify that the counter works as expected:

 (next-counter)
 -> 1
 
 (next-counter)
 -> 2

In general, you should prefer alter over commute. Its semantics are easy to understand and error proof. commute, on the other hand, requires that you think carefully about transactional semantics. If you use alter when commute would suffice, the worst thing that might happen is performance degradation. But if you use commute when alter is required, you’ll introduce a subtle bug that’s difficult to detect with automated tests.

Adding Validation to Refs

Database transactions maintain consistency through various integrity checks. You can do something similar with Clojure’s transactional memory, by specifying a validation function when you create a ref:

 (ref initial-state options*)
 ; options include:
 ; :validator validate-fn
 ; :meta metadata-map

The options to ref include an optional validation function that can throw an exception to prevent a transaction from completing. Note that options is not a map; it’s a sequence of key/value pairs spliced into the function call.

Continuing the chat example, add a validation function to the messages reference that guarantees that all messages have non-nil values for :sender and :text:

 (​defn​ valid-message? [msg]
  (and (:sender msg) (:text msg)))
 
 (​def​ validate-message-list #(every? valid-message? %))
 
 (​def​ messages (ref () :validator validate-message-list))

This validation acts like a key constraint on a table in a database transaction. If the constraint fails, the entire transaction rolls back. Try adding an ill-formed message such as a simple string:

 (add-message ​"not a valid message"​)
 -> java.lang.IllegalStateException​:​ Invalid reference state
 
 @messages
 -> ()

Messages that match the constraint are no problem:

 (add-message (->Message ​"Aaron"​ ​"Real Message"​))
 -> (#:user.Message{:sender ​"Aaron"​, :text ​"Real Message"​})

Refs are great for coordinated access to shared state, but not all tasks require such coordination. For updating a single piece of isolated data, prefer an atom.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.222.185