Chapter 6. State Management

State in an Immutable World

As much as possible, Clojure advocates eliminating state from programs. In general, data should be passed and returned from functions in a purely functional way. It keeps things clean, protected, and parallelizable.

Often, however, that's simply not possible. The real world is full of changing concepts, and so real programs are full of state. If you're writing a word processor, the current document has a state. If you're writing a game, the objects in the game world have state. If you're writing financial software, the amount of money in an account is state. This is a fact of the way the world is and the way humans think, and programs need to be able to model it effectively.

With today's concurrent environments, effective modeling of state is not just something nice to have, but absolutely necessary to get anything done. Even aside from concurrency, however, there are many benefits to languages that have a clear conceptualization of state. Even in a single-threaded program, explicitly managed state is preferable to having state smeared across the entire application, and Clojure provides just that: efficient, explicitly managed state.

The Old Way

Most programming languages model state via a fairly naive process. There are things, represented by variables or objects, and these things can change. But how and when they change is not well defined. Usually, programs "bash objects in place"—each line of code is free to reach in and push arbitrary changes to any part of any thing as it executes. The only way to preserve consistency and prevent bugs caused by two changes happening at once is to place safeguards around each and every thing, ensuring that only one process can interact with a given thing at once. These are known as locks.

The problem with locks is that they're hard to get right. The first reaction, when trying to make them correct, is to use more of them, which only causes another problem: extensive use of locks solves the problems introduced by concurrency by, effectively, reducing the level of concurrency that is actually possible. It doesn't matter how many threads a program has running: if they all must queue up to access an object one at a time, then at that point they might as well be running in a single thread.

However, with the view that there are only mutable, changeable things, and without having well-defined semantics for how they change, locks are the only option. For a more effective approach to state, it is necessary to reevaluate and find better definitions for what things are, and establish clear rules for how they change.

State and Identity

Clojure introduces a philosophical and conceptual paradigm shift in its treatment of things. It takes the standard notion of a thing (an object or a variable) and decomposes it into two separate concepts—state and identity.^[4] Every thing has both a state and an identity. State is a value associated with an identity at a particular moment in time, whereas identity is the part of a thing that does not change, and creates the link between many different states at many different times. The value of each state is immutable and cannot change. Rather, change is modeled by the identity being updated to refer to a different state entirely.

For example, when I was a child, in 1990, I was a very, very different person than I am now in 2010, and it is very probable that I will be a different person still when I am much older in 2050. Luke₁₉₉₀, Luke₂₀₁₀ and Luke₂₀₅₀ are quite different people—you could go as far as to say that they don't have that many similarities at all. And yet, they do have a relationship, a constant identity—they are all me, Luke VanderHart.

In Clojure's logical terminology, Luke₁₉₉₀, Luke₂₀₁₀ and Luke₂₀₅₀ are all distinct values—distinct states. My name, Luke VanderHart, is the identity that links them all together. Like Clojure's values, these states are immutable. I may be able to change future versions of myself, but Luke₁₉₉₀ is set in stone. I can no longer do anything to change who that person was or is. Currently, the identity Luke VanderHart has Luke₂₀₁₀ as its state. Next year, it will have a new state: Luke₂₀₁₁, which will likely be very similar to Luke₂₀₁₀ but with subtle differences. Actually, in Clojure's model, every time I change at all, it generates a new state: millisecond by millisecond, I have new values associated with my identity as I have different thoughts, feelings, and motions. I am a near infinity of distinct, unchangeable persons, all slightly different, all linked by a common identity.

Another example is my bank account, a much less philosophical example and one more likely to be modeled in an actual program. As I spend money and receive paychecks, the balance of my bank account fluctuates. Clearly, it is something that needs to be modeled as changeable state. In this case, the identity that remains constant throughout the program is "my account"—call the identity account-balance. The state, then, is the amount of money in the account at a given time. For example, it might start at $1000. If I deposit a check for $100, then the account-balance identity is updated to point to a new state, $1100. Note that I have not changed the value of the state—changing the integer 1000 to 1100 is a clear impossibility: 1000 and 1100 are distinct mathematical values. The state has not changed; rather, the account-balance identity now points to a new state. The update takes place atomically; there is no intermediate state where the value of account-balance is half-set. At any point in the program, it is safe to query the current state of account-balance.
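The account-balance scenario can be sketched with an atom, one of the reference types described later in this chapter. The name balance-then is purely illustrative; the point of the sketch is that the earlier state, 1000, is still intact after the identity has moved on to 1100.

```clojure
;; The identity: a reference that will point at a succession of states.
(def account-balance (atom 1000))

;; Capture the state the identity refers to right now.
(def balance-then @account-balance)

;; Deposit $100: the identity now refers to a new state, 1100.
(swap! account-balance + 100)

balance-then      ;=> 1000, the earlier state is unchanged
@account-balance  ;=> 1100
```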

State and Identity in Clojure

In Clojure code, states are simply any of Clojure's data types. They can be primitives, such as numbers or strings, or more complex structures built out of lists, maps, and sets. The only limitation on values that can be used as states is that they ought to be immutable. If you use a mutable structure (such as a Java object) as a state, you haven't actually accomplished anything: Clojure's state management system is founded on the premise that values themselves are immutable, and it can provide no guarantees of consistency or isolation for mutable objects.

Identities are modeled using one of Clojure's reference types: refs, agents, atoms, and vars. Each implements the conceptual model outlined previously; each represents an identity and points to a state. They differ in the semantics of how they can be updated to refer to new state values and are useful in different situations. Between them, they can handle just about any state management task:

Use refs to manage synchronous, coordinated state
Use agents to manage asynchronous, independent state
Use atoms to manage synchronous, independent state
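As a rough side-by-side sketch, here is one identity of each of the three kinds, each updated with its own function. The names checking, page-hits, and event-log are hypothetical; await simply blocks until the agent has processed the actions sent to it so far, so the final read is predictable.

```clojure
;; Illustrative names; one identity of each kind.
(def checking  (ref 1000))   ; synchronous, coordinated
(def page-hits (atom 0))     ; synchronous, independent
(def event-log (agent []))   ; asynchronous, independent

(dosync (alter checking + 100))  ; refs change only inside a transaction
(swap! page-hits inc)            ; atoms change immediately, no transaction
(send event-log conj :login)     ; agents: returns at once, applied later

(await event-log)                ; block until the queued action has run
[@checking @page-hits @event-log] ;=> [1100 1 [:login]]
```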

Coordinated vs. Independent State

One requirement common to many systems is that updates to certain identities be coordinated to ensure data integrity. Coordinated updates can't just take one identity into account—they have to manage the states of several interdependent identities to ensure that they are all updated at the same time and that none are left out. The most common example of coordinated state is a transfer of funds between two bank accounts: money deposited into one account must also be subtracted from the other, and these two actions must occur as a single, coordinated event, or not at all. Clojure uses refs to provide coordinated state.

The alternative to coordinated state is independent state. Independent identities stand on their own and can have their state updated without concern for other identities. This still needs to be controlled in some way, but internally, this is usually a more efficient process than coordinating changes to multiple identities. Therefore, updates to independent identities are usually faster than updates to coordinated identities; use them in preference to refs unless coordinated access is required. Clojure provides agents and atoms as independent identity reference types.

Synchronous vs. Asynchronous Updates

Synchronous updates to the values of identities occur immediately, in the same thread from which they are invoked. Execution does not continue until the update has taken place, as most programmers would expect. This is the default way instructions execute in most programming languages. Updates to the values of refs and atoms are both handled synchronously in Clojure.

Asynchronous updates do not occur immediately, but at some unspecified point in the (near) future, usually in another thread. Code execution continues immediately from the point at which the update was invoked, without waiting for it to complete. Extensive use of asynchronous updates is useful for introducing concurrency into programs, and for more flexible event-based programming models. However, there is no guarantee when the effect of an asynchronous update will actually be in place. It will nearly always be instantaneous on a human scale, but from the code's perspective it might not be. For example, if one line of code updates an asynchronous identity, and the very next line of code in the same thread reads its state, it will probably get the old state. Don't use asynchronous identities where your code depends on the update happening right away. Agents are Clojure's implementation of asynchronously updated identities.

Refs and Transactions

Refs are Clojure's implementation of synchronous, coordinated identities. Each is a distinct identity, but operations on them can be run inside a transaction, guaranteeing that multiple identities whose values depend on each other are always in a consistent state. Refs provide access to Clojure's state-of-the-art Software Transactional Memory (STM) system.

Creating and Accessing refs

To create a ref, use the built-in ref function, which takes a single argument: the initial value of the ref:

user=> (def my-ref (ref 5))
#'user/my-ref

This code does two things: creates a ref with an initial state of the integer 5 and binds the ref to a var, my-ref. It is an important distinction: the var is not the ref itself, it is just bound to the ref. If you try to get the value of the var, you get the following:

user=> my-ref
#<Ref@1010058: 5>

my-ref is a var like any other. It just has a ref as its bound value, which is seen here. #<Ref@1010058: 5> is the string debugging representation of a ref. To actually get the current state of the ref, it is necessary to use the dereference function deref:

user=> (deref my-ref)
5

The deref function takes a single argument, a ref (or another of Clojure's reference types), and returns its current state. Because deref is used so frequently, there is a shorthand for it: the @ symbol. Prefixing an expression with @ is identical to calling deref on it:

user=> (deref my-ref)
5
user=> @my-ref
5

The shorthand form makes it easier to dereference symbols within expressions:

user=> (+ 1 @my-ref)
6

Dereferencing a ref always returns its state, immediately. Refs are never locked (at least, not in a traditional sense) and deref does not block while waiting for a transaction to complete. It always just returns a snapshot of the ref's current state. This means that if you call deref twice, outside of a transaction, it is possible that you will get two different values.
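When two refs must be read as a consistent pair, one option is to read both inside a transaction. The following is a minimal sketch with hypothetical refs checking and savings: outside dosync, each deref is an independent snapshot; inside it, both reads come from the same point in time.

```clojure
(def checking (ref 100))
(def savings  (ref 900))

;; Outside a transaction, each deref is an independent snapshot, so
;; another thread could commit a transfer between these two reads.
(defn total-unsafe [] (+ @checking @savings))

;; Inside dosync, both reads come from the same consistent snapshot.
(defn total-consistent [] (dosync (+ @checking @savings)))

(total-consistent) ;=> 1000
```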

Updating refs

There are several different functions which can be used to update the values of refs. They differ in their performance implications, and are explained in detail in the following sections, but they have one thing in common: they are designed exclusively for use within transactions. Executing any of them outside a transaction always throws an error.

Transactions

For anyone who has worked with relational databases, Clojure's transactions will be a familiar concept: they operate in almost exactly the same way as their database counterparts. Essentially, all updates contained within a single transaction are committed to the application state atomically, at the same time. Either all the updates occur at the same time, or none do. Consistency across ref values is guaranteed.

Transactions are also isolated, which means that no transaction can see the effects of any other transaction while it is running. When a transaction begins, it takes a snapshot of all the ref values involved. Subsequent updates to those values from outside the transaction are invisible to code within the transaction, just as changes made within the transaction are invisible to the outside world until it is finished and committed. Of course, changes made within a transaction are visible within the same transaction. Dereferencing a ref within a transaction always returns the "in-transaction" value of the ref, which reflects any updates that have been made since the beginning of the transaction.

Additionally, transactions nest. If a transaction is initiated while already inside a transaction, the inner transaction simply becomes part of the outer transaction and will not commit until the outer transaction commits.
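Nesting can be observed by aborting an outer transaction after an inner one has run. In this sketch (increment! and counter are hypothetical names), the inner dosync inside increment! joins the outer transaction, so when the outer one throws, the inner change is discarded with it.

```clojure
(def counter (ref 0))

;; A function with its own transaction.
(defn increment! [] (dosync (alter counter inc)))

(try
  (dosync
    (increment!)                    ; joins the outer transaction
    (throw (Exception. "abort")))   ; abort before the outer one commits
  (catch Exception _ :aborted))

@counter     ;=> 0, the inner update was never committed
(increment!) ;=> 1, called on its own, it commits its own transaction
```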

Transactions are conceptually lock-free and optimistic. This means that transactions don't wait for other transactions to complete before they begin. Transactions will never block a thread while waiting for another update. However, this doesn't remove the possibility that multiple transactions updating the same ref can conflict. A transaction might complete, only to find that the refs it is trying to update are stale and have already been updated by another transaction. In this case, the transaction simply retries, taking a snapshot of the new values and running itself again. The system prioritizes commits, ensuring that no matter how much contention there is for a particular ref, each transaction is guaranteed to complete eventually.

High-concurrency, high-contention scenarios will result in a slowdown of the STM system as many transactions are retried. However, in most cases it will still end up faster than the equivalent system using locks. Even in the worst case, where a perfectly designed system of locks is provably faster than STM, Clojure argues that STM is still worthwhile due to the decreased cognitive load and simplicity of the solutions.

Many consider the benefits of STM to be roughly analogous to those of managed memory and garbage collection: most of the time they are more than fast enough, and they save programmers and software architects so much effort that the occasional scenario where they underperform a meticulous, complicated manual solution can be accepted.

Tools for Updating refs

The most important form when working with refs is the dosync macro. dosync initiates a transaction and takes any number of additional forms. Each provided form is evaluated sequentially within a transaction. The value of the final form is returned after committing the transaction. If an exception is thrown from any of the provided forms, the transaction is terminated without committing.
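A small sketch of those dosync rules, using hypothetical balance and history refs: the forms run in order, later forms see the in-transaction values written by earlier ones, and the value of the last form becomes the value of the whole dosync expression.

```clojure
(def balance (ref 100))
(def history (ref []))

(dosync
  (alter balance - 30)            ; forms run in order...
  (alter history conj @balance)   ; ...so this sees the in-transaction 70
  [@balance @history])            ; value of the last form is returned
;;=> [70 [70]]
```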

For actually updating the state of a ref, the most basic function is ref-set. ref-set takes two arguments: a ref and a value. It sets the state of the reference to be the value, and then returns the value. Of course, it must be run within a transaction established by dosync.

For example:

user=> (def my-ref (ref 5))
#'user/my-ref
user=> @my-ref
5

user=> (dosync (ref-set my-ref 6))
6
user=> @my-ref
6

To emphasize, ref-set and the other ref update functions may only be called from within a transaction. Trying to call ref-set outside of a transaction throws the following error:

user=> (ref-set my-ref 7)
java.lang.IllegalStateException: No transaction running

Another common function for updating refs is alter. alter takes a ref, a function, and any number of additional arguments. It calls the provided function with the in-transaction value of the ref as its first argument and the other provided arguments as additional arguments. It sets the value of the ref to the return value of the function and returns the same value.

user=> (def my-ref (ref 5))
#'user/my-ref
user=> @my-ref
5
user=> (dosync (alter my-ref + 3))
8
user=> @my-ref
8

Note

The function provided to alter must be free of side effects and return a purely functional transformation of the ref value. This is because the function may be executed multiple times as the STM retries the transaction. If the function has side effects, including updates to other identities, they will be executed at least once, but potentially an arbitrary number of times if the update is highly contended. Almost always, this will have unexpected and undesired results. Double-check that all functions passed to alter are pure.
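One way to follow this rule is to keep only the pure update inside dosync and perform the side effect afterward, using the value the transaction returned. The sketch below uses a hypothetical stock ref and sell! function; io! is a core Clojure macro that throws an IllegalStateException if it is evaluated inside a transaction, which guards against accidentally moving the side effect back in.

```clojure
(def stock (ref 10))

(defn sell! [n]
  ;; Only a pure function (-) runs inside the transaction.
  (let [remaining (dosync (alter stock - n))]
    ;; The side effect happens after the commit, so it runs exactly once.
    ;; io! throws if it is ever evaluated inside a transaction.
    (io! (println "Remaining stock:" remaining))
    remaining))

(sell! 3) ; prints "Remaining stock: 7" and returns 7
```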

Some might wonder why both ref-set and alter are provided, given that they're essentially just different ways of doing the same thing—setting the state of a ref. The distinction is not so much in their actual functionality as in what they imply to someone reading the code. alter usually indicates that the new value of the ref is a function of the old, that it is an update that is related to it in some way. ref-set implies that the old value is being obliterated and replaced with the new. Under the hood, there isn't any difference, but when trying to understand a program, it can be a great help to see at a glance whether the value being set is tied to the old value or not.

The final function used to update refs is commute. commute has the same signature and basic functionality as alter, but with one important difference: in a contended transaction, rather than restarting the whole transaction as it normally would, it goes ahead and uses the new value instead of the in-transaction value when performing its calculation. This means that commute operations are less contentious, and will achieve much better performance in high-contention scenarios.

It also means that commute operations are not perfectly isolated within a transaction. However, if the function passed to commute is logically or mathematically commutative, it makes no difference. Commutative functions are those which may be applied in any order without impacting the end result. In contentious transactions which use commute, that is exactly what happens. commute buys efficiency by making the assumption that it can apply the update in any order relative to other updates. Therefore, you should only use commute if the provided function can be applied in any order without affecting the outcome (or if you don't care whether it does). If you use commute with a function that isn't guaranteed to be logically commutative, you will likely see inconsistent, unpredictable behavior.
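The kind of workload where commute fits well is a counter that many threads bump concurrently. In this sketch (page-views is a hypothetical name), ten futures each commit 100 increments; since increments commute with one another, the final count is the same whatever order they are applied in.

```clojure
(def page-views (ref 0))

;; 10 threads (futures), each committing 100 separate increments.
;; Increments commute, so the final count doesn't depend on order.
(let [workers (doall (for [_ (range 10)]
                       (future
                         (dotimes [_ 100]
                           (dosync (commute page-views inc))))))]
  (doseq [w workers] @w))   ; deref each future to wait for it

@page-views ;=> 1000
```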

An example of using commute appropriately (since + is a naturally commutative operation):

user=> (def my-ref (ref 5))
#'user/my-ref
user=> @my-ref
5
user=> (dosync (commute my-ref + 3))
8
user=> @my-ref
8

There is one more function that operates on refs: ensure. It takes a single argument, a ref. Like the other ref functions, it can only be used inside a transaction. Unlike other ref functions, it doesn't actually change the state of a ref. What it does do is force a transaction retry if the ensured ref changes during the transaction, just as it would if it were a ref you updated. Of course, you wouldn't see such changes inside the transaction in any case, due to transaction isolation. But normally, if you don't update a ref in a transaction, that ref is not included in the consistency guarantees of the final commit. If you want to ensure that a ref you don't update is nevertheless unchanged after a transaction for coordination reasons, use ensure on it within the transaction.
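A typical use is a rule that reads one ref to decide how to update another. In this sketch, the overdraft rule and the names overdraft-limit, checking-balance, and withdraw! are hypothetical: the limit is read but never written, and ensure keeps it part of the transaction's consistency guarantee. ensure also returns the ref's in-transaction value, so it can replace the deref.

```clojure
;; Hypothetical rule: a withdrawal is allowed only if it stays within
;; balance + overdraft-limit. The limit is read but never written here.
(def overdraft-limit  (ref 500))
(def checking-balance (ref 400))

(defn withdraw! [amount]
  (dosync
    ;; ensure returns the in-transaction value of the limit and keeps a
    ;; concurrent change to it from slipping in before this commit.
    (when (<= amount (+ @checking-balance (ensure overdraft-limit)))
      (alter checking-balance - amount))))

(withdraw! 800) ;=> -400 (allowed: 800 <= 400 + 500)
(withdraw! 200) ;=> nil  (refused: 200 > -400 + 500)
```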

Examples

Example 6-1 illustrates the classic example of transactional behavior mentioned previously: transferring money from one bank account to another. This is a scenario in which coordination between the two pieces of state—the two accounts—is vitally important. If the values were not coordinated, it would be possible, however briefly, to be in a state in which the money was added to one account but not yet subtracted from the other (or vice versa). Using refs and transactions ensures that the account addition and subtraction occur atomically.

Example 6-1. Bank Accounts in STM

(def account1 (ref 1000))
(def account2 (ref 1500))

(defn transfer
    "transfers amount of money from a to b"
    [a b amount]
    (dosync
        (alter a - amount)
        (alter b + amount)))

(transfer account1 account2 300)
(transfer account2 account1 50)

(println "Account #1:" @account1)
(println "Account #2:" @account2)

Running this code yields the expected output after the two transactions. Because the transactions are guaranteed by Clojure's STM, the results would be consistent no matter how many threads were concurrently updating the accounts. In this case, the output is:

Account #1: 750
Account #2: 1750
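The claim about concurrent threads can be exercised with a sketch that repeats the transfer function from Example 6-1 and runs 100 transfers of 10 from separate futures, half in each direction. Whatever order the transactions commit in, the balances end where they started and the total is preserved.

```clojure
(def account1 (ref 1000))
(def account2 (ref 1500))

(defn transfer
  "transfers amount of money from a to b"
  [a b amount]
  (dosync
    (alter a - amount)
    (alter b + amount)))

;; 100 concurrent transfers of 10: half in each direction.
(let [workers (doall (for [i (range 100)]
                       (future
                         (if (even? i)
                           (transfer account1 account2 10)
                           (transfer account2 account1 10)))))]
  (doseq [w workers] @w))   ; wait for every future to finish

[@account1 @account2]   ;=> [1000 1500]
(+ @account1 @account2) ;=> 2500
```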

The following example is much more complex, and demonstrates how refs can be stored in any data structure (not just def'd at the top level), how they can have any data structure as their value, not just integers, and how even refs can be part of the value of another ref. It is just a basic example of using refs: you will probably want to approach the ref structure in an actual program with a great deal more thought. In general, it's better to be judicious and use as few refs as will meet your needs.

The program represents a rudimentary address book. The main data structure is a vector of contacts. It is contained in a ref, since you need to be able to update it and it starts out empty. Each contact is a map containing a first name and a last name. Rather than storing the entries directly, though, they are each stored as a ref themselves, since each is an individually updateable piece of state (see Example 6-2).

Example 6-2. An Address Book in STM

(def my-contacts (ref []))

(defn add-contact
    "adds a contact to the provided contact list"
    [contacts contact]
    (dosync
        (alter contacts conj (ref contact))))

(defn print-contacts
    "prints a list of contacts"
    [contacts]
    (doseq [c @contacts]
      (println (str "Name: " (@c :lname) ", " (@c :fname)))))


(add-contact my-contacts {:fname "Luke" :lname "VanderHart"})
(add-contact my-contacts {:fname "Stuart" :lname "Sierra"})
(add-contact my-contacts {:fname "John" :lname "Doe"})

(print-contacts my-contacts)

Running the script creates a list of contacts, adds several contacts to it (as refs), and then prints the list, yielding:

Name: VanderHart, Luke
Name: Sierra, Stuart
Name: Doe, John

Note how the print-contacts function needs to dereference the contacts list and also each contact before it can use it, since both are references.

Now, as an example of coordinated access to multiple refs, consider the task of adding an "initials" field to each contact, but doing it in a coordinated way so there is no chance that any contact might be left out. This is slightly contrived, but is similar to many real-world tasks: the goal is to make it impossible for there to be a state in which some contacts have initials and others don't. This can be done by adding the code in Example 6-3 after the previous code. It is split into multiple functions for greater clarity.

Example 6-3. Adding Initials to the Address Book

(defn add-initials
    "adds initials to a single contact and returns it"
    [contact]
    (assoc contact :initials
        (str (first (contact :fname)) (first (contact :lname)))))

(defn add-all-initials
    "adds initials to each of the contacts in a list of contacts"
    [contacts]
    (dosync
        (doseq [contact (ensure contacts)]
          (alter contact add-initials))))

(defn print-contacts-and-initials
    "prints a list of contacts, with initials"
    [contacts]
    (doseq [c @contacts]
      (println (str "Name: " (@c :lname) ", " (@c :fname) " (" (@c :initials) ")"))))


(add-all-initials my-contacts)
(print-contacts-and-initials my-contacts)

When executed, the code prints the same names as before, with their initials added:

Name: VanderHart, Luke (LV)
Name: Sierra, Stuart (SS)
Name: Doe, John (JD)

The key function which actually deals with the refs is add-all-initials. It first opens a transaction, and then calls ensure on the contacts ref. This is to make sure that if contacts is updated while the transaction is running, it will be restarted. I want to include all of the contacts, and without the ensure, if contacts were updated with a new contact after the transaction had begun it would not be included.

Then, for each contact (using doseq), it alters it using the add-initials function, setting it to a map containing an initials key. Because all the alter statements are run in the same transaction, the update to all the contacts is atomic: from outside the transaction, all the contacts are updated to a value with the new field instantaneously.

Because the whole operation never blocks, other threads involved in reading the contacts list continue to do so at full speed. If another transaction in another thread tries to write to a contact at the same time, one transaction or the other might have to retry, but in the end, it's still guaranteed that everything that needs to happen will eventually happen to each contact, and that they will remain in a coordinated state.

Atoms

Atoms are Clojure's implementation of synchronous, uncoordinated identities. When an atom is updated, the change is applied atomically before the current thread proceeds, and all subsequent dereferences of the atom, from any thread, resolve to the new value.

Atoms are based on the atomic classes in Java's java.util.concurrent.atomic package. They provide a way to update values atomically with no chance of race conditions corrupting the update. Like those classes, atoms are lock-free: reads of atoms never block, and an update retries if the atom's value is changed by another thread while the update is in progress, much as a ref transaction does.
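The retry behavior can be illustrated with a compare-and-set loop built on Clojure's compare-and-set! function. The helper cas-swap! is hypothetical: it is a sketch of the idea behind swap!, not Clojure's actual implementation.

```clojure
;; Hypothetical helper: a compare-and-set retry loop, the idea behind swap!.
(defn cas-swap! [a f & args]
  (loop []
    (let [old-val @a
          new-val (apply f old-val args)]
      ;; compare-and-set! succeeds only if a still holds old-val (compared
      ;; by identity); otherwise another thread got there first, so retry.
      (if (compare-and-set! a old-val new-val)
        new-val
        (recur)))))

(def my-atom (atom 5))
(cas-swap! my-atom + 3) ;=> 8
```

Because f may run more than once when the loop retries, the same purity rule that applies to functions passed to alter applies here.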

In practice, atoms are used almost exactly like refs, except that since they are uncoordinated they do not need to participate in transactions.

Using Atoms

To create an atom, use the atom function, which takes a single argument and returns an atom with the argument as its initial state. To retrieve the value of an atom, use the deref function (the same one used for refs) or the @ shorthand.

user=> (def my-atom (atom 5))
#'user/my-atom
user=> @my-atom
5

There are two ways to update the value of an atom: swap! and reset!. The swap! function takes an atom, a function, and any number of additional arguments. It updates (swaps) the value of the atom for the value obtained by calling the supplied function with the current value of the atom as the first argument, and the other provided arguments as additional arguments. It returns the new value of the atom. Like the function provided to alter, the function passed to swap! may be executed multiple times and should therefore be free of side effects.

The following example uses the atom set up in the previous snippet and passes the addition function, along with an additional argument of 3.

user=> (swap! my-atom + 3)
8
user=> @my-atom
8

The reset! function sets the value of an atom regardless of the current value. It takes two arguments (the atom and a value) and returns the new value of the atom.

user=> (reset! my-atom 1)
1
user=> @my-atom
1

When to Use Atoms

In practice, atoms aren't used as frequently as refs in programs. Since they can't coordinate with other pieces of state, their usefulness is limited to scenarios in which an identity is truly, logically independent from other identities in the system.

For cases where an identity is independent, however, atoms are the right choice. They avoid much of the overhead associated with refs and are very fast, particularly to read. They don't have the parallelism implications of agents (discussed in the next section), and overall are the most lightweight of Clojure's identity types.

One example of a case where atoms are very useful is caching values. Cached values need to be accessible quickly, but aren't dependent on the rest of the system's state. Clojure's memoize function (which caches the results of calling a function and is described more fully in Chapter 14) uses atoms internally to maintain its cache.
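A cache of this kind can be sketched in a few lines. memoize-with-atom is a hypothetical function, similar in spirit to memoize but not its actual source: an atom holds a map from argument lists to results, and swap! adds new entries.

```clojure
;; Hypothetical memoize-with-atom, similar in spirit to clojure.core/memoize.
(defn memoize-with-atom [f]
  (let [cache (atom {})]                  ; argument list -> result
    (fn [& args]
      (if-let [entry (find @cache args)]
        (val entry)                       ; cache hit: return stored result
        (let [result (apply f args)]      ; cache miss: compute...
          (swap! cache assoc args result) ; ...and record it
          result)))))

(def call-count (atom 0))
(def slow-square
  (memoize-with-atom (fn [x] (swap! call-count inc) (* x x))))

(slow-square 4) ;=> 16 (computed)
(slow-square 4) ;=> 16 (served from the cache)
@call-count     ;=> 1
```

If two threads miss on the same arguments at once, both may compute the result; for a pure function that only costs time, which is why an independent atom is enough here.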

Asynchronous Agents

Agents are one of Clojure's most distinctive and powerful features. Like refs and atoms, they are identities and adhere to Clojure's philosophy of identity and state. Unlike refs and atoms, however, updates to their values occur asynchronously, in a separate system-managed thread pool dedicated to managing agent state.

This implies that agents are not only a means of storing and managing state in a concurrent environment (although they certainly are that), but are also a tool for introducing concurrency into a program. Using agents, there is no need to manually spawn threads, manage thread pools, or explicitly cause any other kind of concurrency. Agents are identity types, and just as easy to use and update as refs or atoms, but have concurrency thrown in "for free."

Creating and Updating Agents

Agents can be created by using the agent function, which takes a single value as the initial value of the agent. Like other Clojure identities, the value ought to be immutable.

user=> (def my-agent (agent 5))
#'user/my-agent

Also, like the other Clojure identities, the current value of an agent can always be obtained immediately, without blocking, by dereferencing it using the deref function (or @).

user=> @my-agent
5

The value of an agent can be updated by dispatching an action function using the send or send-off function. The call to send returns immediately in the current thread (returning the agent itself). At some undetermined point in the future, in another thread, the action function provided to send will be applied to the agent's current value, and its return value will be used as the new value of the agent.

send takes two or more arguments. The first two are the agent and the action function; the rest are additional arguments to be passed to the action function whenever it executes. For example, to send an update to the agent previously defined:

user=> (send my-agent + 3)
#<Agent@...>

Then, at some point in the future, the new value of the agent can be retrieved:

user=> @my-agent
8

There is no hard guarantee about when the update action will be applied, although usually it is nearly immediate from a human perspective. Don't write code that depends on an agent's value being updated at any given time: agents are asynchronous and can't provide guarantees about exactly when their actions will occur.

send-off has the same signature and behavior as send. The only difference is that the two functions hint at different performance characteristics to the underlying agent runtime. Use send for actions that are mostly CPU-intensive, and send-off for actions that are expected to spend time blocking on IO. This allows the agent runtime to optimize appropriately. If you use the "wrong" function, everything will still work, but the overall throughput of the agent system will be lower, since it will be optimizing for the wrong type of action.
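The sketch below dispatches a blocking action with send-off and then waits for it with await-for, which blocks for at most the given number of milliseconds and returns logical false on timeout. The agent downloads and the function slow-fetch are hypothetical, and Thread/sleep stands in for real IO.

```clojure
(def downloads (agent 0))

;; Simulated IO-bound action; send-off is the better fit for it.
(defn slow-fetch [n]
  (Thread/sleep 200)   ; stand-in for waiting on a network or disk
  (inc n))

(send-off downloads slow-fetch)  ; returns immediately with the agent
@downloads                       ; very likely still 0: the action is running

;; Wait up to 1000 ms for the actions sent so far to complete.
(await-for 1000 downloads)
@downloads ;=> 1
```

Reading the agent right after send-off shows why code should not depend on when an action takes effect; await or await-for is the explicit way to wait when a program genuinely needs to.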

Update Semantics

Although agents provide no guarantee as to when an action will take effect, update dispatches do follow certain rules that can be relied upon:

Actions to any individual agent are applied serially, not concurrently. Multiple updates to the same agent won't overwrite each other or encounter race conditions.
Multiple actions sent to an agent from the same thread will be applied in the order in which they were sent. Obviously, no such guarantees can be made about actions sent from different threads.
If an action function contains additional dispatches to agents, either to itself or to other agents, those dispatches are held and not actually sent until after the action function returns and the agent's value has been updated. This allows actions on an agent to trigger further actions without having the updates conflict.
If an update is dispatched to an agent within an STM transaction (for example, a dosync expression), the dispatch is not sent until the transaction is committed. This means that it is safe to dispatch updates to agents from within STM transactions.
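The last rule is what makes agents a safe place for side effects triggered by a transaction. Below is a minimal sketch, assuming a hypothetical withdraw! helper that updates a ref and logs to an agent in the same dosync. A transaction may be retried, and any held dispatches from an abandoned attempt are discarded, so the log entry is sent exactly once, after the commit.

```clojure
(def balance (ref 100))
(def audit-log (agent []))

(defn withdraw!
  "Hypothetical helper: debits the ref and records the withdrawal."
  [amount]
  (dosync
    (alter balance - amount)
    ;; Held until the transaction commits; discarded if the
    ;; transaction is retried, so the log never sees it twice.
    (send audit-log conj [:withdrew amount])))

(withdraw! 30)
(await audit-log)

[@balance @audit-log]
;; => [70 [[:withdrew 30]]]
```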

Errors and Agents

Because action functions dispatched to agents run asynchronously in a separate thread, they need a special error-handling mechanism. Normally, an exception propagates up the stack of the thread in which it occurs, where calling code can catch it. An exception thrown by an action function, however, occurs on an agent thread rather than in the thread that called send, so the only way to find out that it happened is through the built-in agent error handling.

Agents have one of two possible failure modes: :fail or :continue. If an exception is thrown while processing an action, and the agent's failure mode is :continue, the agent continues as if the action which caused the error had never happened, after calling an optional error-handler function. If, on the other hand, its failure mode is :fail, the agent is put into a failed state, and will not accept any more actions until it is restarted (although it saves its current action queue).

By default, an agent created with an error handler (supplied through the :error-handler option to agent) has a failure mode of :continue; an agent created without one defaults to :fail. The failure mode of an agent can also be set explicitly using the set-error-mode! function, which takes two arguments: an agent and a mode keyword. For example:

user=> (set-error-mode! my-agent :continue)
nil

You can check the current failure mode of an agent using the error-mode function:

user=> (error-mode my-agent)
:continue

Agents can be assigned an error handler using the set-error-handler! function, which takes an agent and an error function as arguments. The error function will be called whenever an action throws an exception or tries to set the agent to an invalid value. It must itself take two arguments: the agent and the exception. For example, the following handler prints the message of each error:

user=> (set-error-handler! my-agent (fn [agt ex] (println "Agent error:" (.getMessage ex))))
nil

Typically, the error handler is used to log an error, or implement some correction to ensure that it doesn't happen again. You can also retrieve the current error handler for an agent using the error-handler function, which takes a single agent as an argument and returns its error handler function.
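The error mode and handler can also be supplied when the agent is created, using the :error-mode and :error-handler options of agent. In the sketch below, the names seen-errors and safe-agent are hypothetical. The handler just records each exception's message in an atom, and because the mode is :continue, the agent keeps accepting actions after the failure.

```clojure
;; Collects the message of every exception the handler sees.
(def seen-errors (atom []))

(def safe-agent
  (agent 10
         :error-mode :continue
         :error-handler (fn [agt ex]
                          (swap! seen-errors conj (.getMessage ex)))))

(send safe-agent / 0)   ; the action throws ArithmeticException
(send safe-agent + 1)   ; still accepted: the failed action was skipped
(await safe-agent)

[@safe-agent @seen-errors]
;; => [11 ["Divide by zero"]]
```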

Dealing with Agents in a Failed State

Agents currently in a failure state throw an exception on any attempt to call send or send-off on them (although dereferencing will still return the last good value of the agent). For example, dividing by zero throws the agent into a failed state in the following example:

user=> (def an-agent (agent 10))
#'user/an-agent
user=> (send an-agent / 0)
#<Agent@1afa486: 10>
user=> (send an-agent + 1)
java.lang.RuntimeException: Agent is failed, needs restart

To inspect the current errors on an agent, use the agent-error function and pass it the agent as a single argument:

user=> (agent-error an-agent)
#<ArithmeticException java.lang.ArithmeticException: Divide by zero>

In order to put the agent back into a working state, you must call the restart-agent function. restart-agent takes as its arguments an agent, a new state, and any number of additional keyword option/value pairs. The only currently implemented option is :clear-actions with a boolean value.

When restart-agent is called, it resets the value of the agent to the provided state and clears the agent's failed state so the agent can accept new actions. If the :clear-actions true option is provided, the agent's pending action queue is discarded; otherwise, the pending actions will be run in order. restart-agent returns the new state of the agent.

To reset the agent in the preceding example:

user=> (restart-agent an-agent 5 :clear-actions true)
5

And now, the agent can be sent more actions:

user=> (send an-agent + 1)
#<Agent@1365360: 5>
user=> @an-agent
6
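In a program rather than at the REPL, nobody is waiting by eye for the failing action to run, so the sketch below polls agent-error in a small hand-written loop. That loop is an assumption of this example, not part of the agent API. await is avoided here because a failed agent stops running its queued actions. The sketch also confirms that send is refused while the agent is failed.

```clojure
(def an-agent (agent 10))   ; no handler, so the error mode is :fail

(send an-agent / 0)

;; Hypothetical polling loop: wait until the failing action has run.
(loop []
  (when-not (agent-error an-agent)
    (Thread/sleep 10)
    (recur)))

(def failure (agent-error an-agent))   ; the stored exception

;; While failed, send throws instead of queueing the action.
(def refused?
  (try (send an-agent + 1) false
       (catch RuntimeException _ true)))

(restart-agent an-agent 5 :clear-actions true)
(send an-agent + 1)
(await an-agent)

[(class failure) refused? @an-agent]
;; => [java.lang.ArithmeticException true 6]
```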

Waiting for Agents

Although agents are by their nature asynchronous, it is occasionally necessary to force a certain degree of synchronicity. For example, if a long-running action is being performed on an agent, a result might be required in the original thread before computation can continue. For this purpose, Clojure provides the await and await-for functions, both of which block a thread until an agent has finished processing its actions.

await takes any number of agents as its arguments and blocks the current thread indefinitely until all actions dispatched from the current thread to the provided agent(s) are complete. It always returns nil.

await-for is nearly identical, except that it takes a timeout (in milliseconds) as its first argument and any number of agents as additional arguments. If the timeout expires before all the agents are finished, await-for returns a logical false value. If the agents finish before the timeout, it returns a logical true value.
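Here is a sketch of both functions on a hypothetical slow-agent whose single action sleeps for half a second. The 50 ms await-for gives up while that action is still running. The plain await then waits as long as it takes.

```clojure
(def slow-agent (agent 0))

;; Hypothetical slow action: sleeps for half a second, then increments.
(send-off slow-agent (fn [v] (Thread/sleep 500) (inc v)))

;; Waits at most 50 ms; the action is still sleeping, so this is logical false.
(def finished-in-time? (await-for 50 slow-agent))

;; Waits with no time limit, then returns nil.
(await slow-agent)

[finished-in-time? @slow-agent]
;; => [false 1]
```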

Shutting Down Agents

Whenever agents are used in a Clojure program, the Clojure runtime creates a thread pool in which to run agent actions behind the scenes. Normally, you don't need to concern yourself with this, except that a Java/Clojure program will not terminate gracefully while there is still an active thread pool. To deactivate the agent thread pool, call the shutdown-agents function with no arguments. All currently running actions will complete, but no more actions will be accepted, and when all actions are complete the pool will shut down, allowing the program to terminate.

Never call shutdown-agents unless you intend to terminate the running program. shutdown-agents is irreversible without restarting your application, and after calling it agents can no longer be updated: all calls to send or send-off will throw exceptions.
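In practice, this means shutdown-agents is usually the very last step of a program's entry point. The sketch below assumes a hypothetical example.main namespace and a squares agent. It waits for its own actions, prints the result, and only then shuts the pool down so the JVM can exit.

```clojure
(ns example.main)   ; hypothetical namespace for a small program

(def squares (agent []))

(defn -main [& args]
  (doseq [n (range 5)]
    (send squares conj (* n n)))
  (await squares)
  (println "squares:" @squares)
  ;; Last step: let the agent thread pool wind down so the JVM can exit.
  ;; After this call, any further send or send-off would throw.
  (shutdown-agents))
```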

When to Use Agents

When deciding whether to use agents, it is very important to realize that agents are a means not only of managing state but also of managing program execution. Using agents doesn't just imply managed state with identities; it also means splitting computational processes across multiple threads.

As state management tools, agents are effective although they don't have all the features that refs do, such as transactions and ensuring data consistency. They are an uncoordinated identity type. For data that really needs these things, definitely consider using refs. Likewise, agents don't offer much more than atoms for simple uncoordinated state management. If all you need to do is ensure the integrity of individual pieces of state, atoms are probably a better choice than agents.

The important feature of agents is not only that they protect state, but that updates to that state occur concurrently with the thread that initiated the update. If, as in the previous examples, the only action functions being passed to agents are simple and blindingly fast, like +, there isn't much benefit to using an agent. But when the functions are more processing-intensive, or when they perform IO (which is unsafe inside transactions, since a transaction may be retried), there can be a huge benefit to having that work occur out-of-band. Every action function passed to an agent is offloaded from the calling thread, freeing it up for other important tasks.

This concurrency is the most important feature of agents. Their state management is convenient and works very well in concert with the concurrency features, but concurrency is the primary motivation behind choosing agents as opposed to one of Clojure's other identity types.

Vars and Thread-Local State

In addition to refs, atoms, and agents, Clojure has a fourth way of "changing" state: thread-local var bindings. Because they are thread-local, they're not useful for sharing state between threads.

Rather, vars are ordinary bindings (the same ones discussed in Chapter 1, created by def) that can be rebound on a per-thread basis and obey stack discipline. This allows for some level of imperative-style coding. It's the only way in Clojure to "change" a variable other than using a full-blown reference type.

To establish a thread-local binding for a var, use the binding form. binding takes a vector of bindings and one or more body expressions. The binding vector consists of a series of pairs of symbols and values. The body expressions are then evaluated within an implicit do, using the provided values whenever their matching symbols are encountered. binding may only be used on vars that have already been defined with def at the top level; in Clojure 1.3 and later, those vars must also be declared dynamic, for example (def ^:dynamic x 5). For example:

user=> (def ^:dynamic x 5)
#'user/x
user=> (def ^:dynamic y 3)
#'user/y
user=> (binding [x 2 y 1] (+ x y))
3
user=> (+ x y)
8

Within the context of the binding expression, (+ x y) yields 3. Outside the binding expression, (+ x y) uses the original values of the vars, yielding 8.

So far, binding might look similar to let. The difference is that rather than establishing a new local symbol, binding gives the existing var a new value for all code that runs within the binding form on the same thread, including functions called from its body, however deeply nested. For example, consider the following REPL session:

user=> (def ^:dynamic x 5)
#'user/x
user=> (def ^:dynamic y 3)
#'user/y
user=> (defn get-val [] (+ x y))
#'user/get-val
user=> (get-val)
8
user=> (binding [x 1 y 2] (get-val))
3

Binding actually reestablishes the values of x and y for all uses. When the get-val function is used within the stack context of the binding form, it picks up on the thread-local bindings of x and y established by binding and uses them.
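Because these bindings are per-thread, code running on a different thread does not see them. The sketch below assumes Clojure 1.3 or later and a hypothetical *level* var. It starts a plain java.lang.Thread inside a binding form, and that thread sees the root value. Note that some Clojure constructs, such as future and send since Clojure 1.3, deliberately convey the caller's bindings to the other thread, which is why a raw Thread is used here.

```clojure
;; Clojure 1.3+ requires ^:dynamic on vars that will be rebound.
(def ^:dynamic *level* :root)

(defn current-level [] *level*)

(def observed
  (binding [*level* :rebound]
    (let [other (promise)
          ;; A plain Java thread does not inherit this thread's bindings.
          t (Thread. (fn [] (deliver other (current-level))))]
      (.start t)
      (.join t)
      {:this-thread  (current-level)
       :other-thread @other})))

observed
;; => {:this-thread :rebound, :other-thread :root}
```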

Additionally, thread-local bindings established by binding can be updated using the set! special form, much like assigning to variables in most imperative programming languages. The following example is lengthy, but it demonstrates how independent code can update the same binding:

user=> (def ^:dynamic x 5)
#'user/x
user=> (def ^:dynamic y 3)
#'user/y
user=> (defn set-val [] (set! x 10))
#'user/set-val
user=> (defn get-val [] (+ x y))
#'user/get-val
user=> (binding [x 1 y 2] (set-val) (get-val))
12

Notice how set-val was called first, and resets the value of x to 10, so that when get-val comes along later, it uses the updated value. Within the binding form, all references to bound symbols will see changes made by set!, just as if, for that limited context, they were ordinary, imperative, mutable variables.
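set! only works where a thread-local binding exists. This sketch assumes Clojure 1.3 or later and a hypothetical *counter* var with a bump! helper. It shows set! updating a binding inside a binding form, and the same call throwing an IllegalStateException outside one, leaving the root value untouched.

```clojure
(def ^:dynamic *counter* 0)

;; Hypothetical helper that increments the current thread's binding.
(defn bump! [] (set! *counter* (inc *counter*)))

;; Inside binding, set! changes the thread-local value.
(def inside
  (binding [*counter* 0]
    (bump!)
    (bump!)
    *counter*))

;; Outside binding there is no thread-local value to change, so set! throws.
(def outside
  (try (bump!) :no-error
       (catch IllegalStateException _ :threw)))

[inside outside *counter*]
;; => [2 :threw 0]
```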

When to Use Thread-Local Vars

There are very few cases where it is appropriate to use thread-local state in Clojure. Extensive use of it goes against the spirit of functional programming, and it is provided only as a concession to the few cases where it is necessary for performance or practicality.

Scenarios where thread-local vars are useful tend to fall into two categories:

Algorithms where it is much more logical and convenient to keep track of some state as a mutable variable. Examples include some parsers and state machines. Usually, however, an equivalent, purely functional algorithm does exist, even if it's not apparent to a programmer from an imperative background.
Places where the semantics truly indicate a thread-local, context-based value that can be changed, such as a settings toggle. For example, many of Clojure's runtime settings are stored in var bindings, where they are easily accessible from all code and can be set! to new values conveniently. One example is *out*, which points to the standard output stream.
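Rebinding *out* is a good illustration of the second category. The sketch below uses a hypothetical greet function that simply calls println. It temporarily points *out* at a java.io.StringWriter so the output can be captured as a string, without changing greet at all. Clojure's core with-out-str macro packages this same pattern.

```clojure
(defn greet [name]
  (println "Hello," name))

(def captured
  (binding [*out* (java.io.StringWriter.)]
    (greet "Clojure")
    ;; *out* is still the StringWriter here; str returns what was written.
    (str *out*)))
```

Outside the binding form, *out* reverts to the standard output stream, so later calls to greet print normally.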

Keeping Track of Identities

There is more to managing state than just updating it, and so Clojure provides two very useful "hooks" into its state management system, which make it easy to write code that keeps track of states and identities.

Validators

Validators are functions that can be attached to any state type (refs, atoms, agents, and vars) and which validate any update before it is committed as the new value of the identity. If a new value is not approved by the validator function, the state of the identity is not changed.

To add a validator to an identity, use the set-validator! function. It takes two arguments: an identity and a function. The function must not have side effects, must take a single argument, and must return a boolean.

Then, whenever the state of the identity is about to be updated, the provided validator function will be passed the new value of the identity. If it returns true, the identity is updated normally. If it returns false or throws an exception, an exception is thrown from the identity update function.
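Because a validator may throw instead of returning false, it can report why a value was rejected. Below is a minimal sketch using a hypothetical non-negative! validator and account ref, attached through the :validator option of ref. The validator runs when the transaction commits, and its own exception propagates out of dosync. The ref keeps its old value.

```clojure
(defn non-negative!
  "Hypothetical validator: true for valid values, otherwise throws
  with a descriptive message instead of just returning false."
  [v]
  (when (neg? v)
    (throw (IllegalArgumentException. (str "negative balance: " v))))
  true)

(def account (ref 100 :validator non-negative!))

(def outcome
  (try
    (dosync (alter account - 150))
    (catch IllegalArgumentException e (.getMessage e))))

[outcome @account]
;; => ["negative balance: -50" 100]
```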

For example, the following code sets a validator on a ref, ensuring that all values must be greater than zero:

user=> (def my-ref (ref 5))
#'user/my-ref
user=> (set-validator! my-ref (fn [x] (< 0 x)))
nil
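user=> (dosync (alter my-ref - 10))
#<CompilerException java.lang.IllegalStateException: Invalid Reference State>
user=> (dosync (alter my-ref - 10) (alter my-ref + 15))
10
user=> @my-ref
10

The second transaction succeeds because the validator only checks the value being committed (10), not the intermediate value of -5.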

And on an agent:

user=> (def my-agent (agent 5))
#'user/my-agent
user=> (set-validator! my-agent (fn [x] (< 0 x)))
nil
user=> (send my-agent - 10)
#<Agent 5>
user=> (agent-error my-agent)
#<IllegalStateException java.lang.IllegalStateException: Invalid Reference State>

Note that on agents, the error is trapped and logged using the agent error-handling system, rather than being thrown immediately as it is with refs.

If the value of an identity is already invalid according to the given validator function when setting a validator, an exception is thrown and the validator is not set:

user=> (def my-atom (atom -5))
#'user/my-atom
user=> (set-validator! my-atom (fn [x] (< 0 x)))
#<CompilerException java.lang.RuntimeException java.lang.IllegalStateException: Invalid Reference State>

The current validator function for an identity may be retrieved using the get-validator function, which takes a single identity as an argument:

user=> (def my-agent (agent 5 :validator (fn [x] (< 0 x))))
#'user/my-agent
user=> (get-validator my-agent)
#<user$eval__4868$fn__4870 user$eval__4868$fn__4870@1dc518b>

As you can see, the string representation of a function isn't very useful. However, since functions are first-class entities in Clojure, you can use the returned function however you wish: use it as a validator on another identity, call it with a value to see what it returns, and so on.

To remove a validator, pass nil to set-validator! in place of a validator function.
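Here is the whole life cycle on an atom: add a validator, watch it reject an update, then remove it. The balance atom is a hypothetical example. When the validator returns false, swap! throws an IllegalStateException and the atom keeps its previous value. Once the validator is removed, the same update goes through.

```clojure
(def balance (atom 100))

(set-validator! balance (fn [v] (>= v 0)))

;; Rejected: the validator returns false for -50, so swap! throws
;; and the atom keeps its old value.
(def first-try
  (try (swap! balance - 150)
       (catch IllegalStateException _ :rejected)))

;; Passing nil removes the validator; the same update now succeeds.
(set-validator! balance nil)
(def second-try (swap! balance - 150))

[first-try second-try (get-validator balance)]
;; => [:rejected -50 nil]
```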

Watches

Watches are functions which are called whenever a state changes. They work on refs, atoms, agents, and vars (although with vars, they are only called with root binding changes, not when updated with set!).

Unlike validators, watches are called immediately after the state has changed (for agents, on the thread that ran the action). Each identity may have multiple watches: each watch has an arbitrary key that can be used to identify it later. Watches are useful for structuring program flow that logically depends on the value of an identity; they provide an easy form of event-based or reactive programming.

To add a watch, use the add-watch function. It takes three arguments: an identity, a key, and a function. The key may be any value, provided it is unique among an identity's watchers.

The watch function itself takes four arguments: the key, the identity, the old state of the identity, and the new state.

For example, the following code uses watches to print the old and new values of a ref whenever it is updated:

user=> (defn my-watch [key identity old-val new-val]
                  (println (str "Old: " old-val))
                  (println (str "New: " new-val)))
#'user/my-watch
user=> (def my-ref (ref 5))
#'user/my-ref
user=> (add-watch my-ref "watch1" my-watch)
#<Ref 5>
user=> (dosync (alter my-ref inc))
Old: 5
New: 6
6

Note that if an identity is being updated in rapid succession, it may have been updated again by the time the first watch function is called. This is why watch functions are passed the old and new value of the identity: they reflect the state change from the update that actually triggered the watch. Dereferencing the identity within the watch function may yield a different value than the new value passed in if there are a lot of updates occurring.

To remove a watch, use the remove-watch function. It is very simple: it takes an identity and a key, and removes the watch associated with that key from the identity.

user=> (remove-watch my-ref "watch1")
#<Ref 6>
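Watches work the same way on atoms, where they run synchronously in the thread that called swap! or reset!. The sketch below uses a hypothetical temperature atom and a keyword key, a common alternative to the string key above. It records each old/new pair, then removes the watch so that later changes go unrecorded.

```clojure
(def temperature (atom 20))
(def changes (atom []))   ; records [old new] pairs seen by the watch

(add-watch temperature :recorder
           (fn [key ref old-val new-val]
             (swap! changes conj [old-val new-val])))

(reset! temperature 22)
(swap! temperature inc)

(remove-watch temperature :recorder)
(reset! temperature 0)    ; no longer recorded

@changes
;; => [[20 22] [22 23]]
```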

Summary

Clojure's state management systems provide an array of effective ways to manage state. They combine a more sophisticated philosophical approach to state with state-of-the-art Software Transactional Memory and agent-based systems to make state management clean and effective to use. Managing state in Clojure is usually much less error-prone than in other languages and works the same in single-threaded or multithreaded programs. With four distinct state management tools, plus validators and watches, there should always be something that meets your needs:

Use refs for synchronous, coordinated updates and direct access to the STM system.
Use atoms to manage synchronous, independent state (such as cached or memoized values) with maximum efficiency.
Use agents to manage asynchronous state as well as introduce concurrency into your program.
Use vars to maintain state within a stack discipline to efficiently simulate mutable variables for algorithms that require it.
Use validator functions to maintain data integrity.
Use watches to trigger events dependent on an identity's values.

[4] For the definitive discussion of state and identity, see Rich Hickey's essay "On State and Identity" at http://clojure.org/state.
