© Stefania Loredana Nita and Marius Mihailescu 2017

Stefania Loredana Nita and Marius Mihailescu, Practical Concurrent Haskell, https://doi.org/10.1007/978-1-4842-2781-7_7

7. Transactional Memory Case Studies

Stefania Loredana Nita and Marius Mihailescu1

(1)Bucharest, Romania

Safety and ease in programming are two advantages of transactional memory. If the transactions are used correctly, then it is almost impossible for problems to occur in parallel code (for example, deadlocks). The programmer mostly needs to assign transactions (and maybe some transaction variables). It is not necessary to identify the locks or their correct order to prevent deadlocks or other problems. How do you use transactions correctly? All shared data is passed through transaction variables to threads. Transactional data is accessed only through transactions; and in transactions, there are no operations that can be rolled back.

This chapter explains what transactional memory is and how it works in Haskell.

Transactions

A transaction consists of grouped activities that are not individually visible by an external observer. The results of the combined activities are seen by the external observer. A database transaction has several properties: atomicity, consistency, isolation, and durability —known as ACID properties.

  • Atomicity means that all actions that compound a transaction must terminate successfully; but if an action fails, then the transaction has not finished successfully. When a transaction is successful, then it commits; otherwise, it aborts.

  • Consistency depends on the application, but generally, it represents a selection of states for data. This means a new state of data is created by a transaction; but if a failure occurs, then the data is returned to its state prior to the transaction.

  • Isolation means that transactions run separately. They do not interact with each other, even if they are executed in parallel.

  • Durability means that once a transaction successfully completes and commits, the result is irreversible and available for further transactions.

Introducing Transactional Memory

Transactions seem to be a good technique for programming languages. It assures consistency in the information used by other processes.

The basic concept is easy to understand. In a system based on concurrency or parallelism, the read and write operations are coordinated properly, applying abstraction provided by the characteristics of the transactions. Nowadays, programmers coordinate read and write operations using basic techniques, such as locks or mutexes, to avoid interaction between concurrent threads.

Because of their properties, transactions can be used in concurrency such that a program encapsulates a task in a transaction. The successful execution is assured by atomicity; then, the outcome is committed, or the task is aborted if there is a failure. Isolation assures that the result is always the same, whether the transaction is executed alone or there are concurrent transactions.

For transaction memory, atomicity becomes failure atomicity, assuring consistency. For example, if a transaction is not completed successfully, then some pieces of information could remain in an uncertain state and affect other transactions. To control concurrency, it is necessary to have a technique for implementing failure atomicity, so that the data can be reverted to a previous state in the event of failure.

Software Transactional Memory

Read logs and write logs supply software transactional memory (STM) systems the information needed for conflict detection and solution. Read logs have a version number that provides useful information to a thread (because when an object update occurs, its version number is increased). The thread checks if a concurrent thread has changed an object that it uses for reading. If a conflict occurs, then the thread will use undo logs to cancel a transaction.

There are many characteristics that differentiate STM systems. This is usually based on the purpose of the system. The following are a few examples.

  • Low sequential overhead. In these systems , the transaction instructions run as quickly as allowed.

  • Good scalability. A parallel amount of work utilizing transactions can enhance execution as processors are included. This is very useful in cloud computing systems.

  • Strong progress guarantees. For example, blocking is avoided. Another criteria is the programming semantics provided by the STM systems .

The evaluation of the transactional memory systems is influenced by the way that systems with multiple processors work. From a computational point of view, choosing a more expensive method leads to lower synchronization or cache movement. Hardware dissimilarities could sway the capabilities of STM systems; for example, the STM system model for memory vs. classical memory accessions.

Software Transactional Memory in Haskell

STM supplies compounding atomic transactions, which allow you to combine read and write operations, or other types of memory operations, into one atomic operation. As you have seen, the transactions could abort or retry.

The STM transactions are found in the STM monad.

data STM a
instance Monad STM

Transactions are executed using a special function called atomically, which is covered in a later in this chapter.

atomically :: STM a -> IO a

A very important and useful primitive is represented by retry, a function that cancels the actual transaction and reloads it in case any dependencies are modified during transactions in different threads.

retry :: STM a

The STM package supplies the essential transactional variables, but there are other complicated structures that are delivered by other packages. The following are a few structures from stm:Control.Concurrent.STM.

  • TVar: A localization of memory that is imparted; a correspondent of IORef, but its type is transactional.

  • TMVar: A changeable variant correspondent to IORef.

  • TChan: Channels corresponding to Chan from the base.

  • TQueue: A channel with high speed, but does not support channel replication.

  • TBQueue: Limited and non-replicable channels.

  • TArray: The corresponding transactional arrays from the array package.

The stm-containers and stm-chans packages also supply transactional structures.

As technology becomes more complex, processors contain more cores, and software products become more elaborated, there is an increasing need for transactional memory. Mainly, the software transactional memory represents the act of synchronizing common memory and avoiding deadlock.

GHC is one of the best compilers. It supplies very good assistance to STM systems. To work with STM, Haskell makes use of monads.

The following are simple examples that implement typical concurrent programming models.

module STM where
import Random
import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM

The first example implements the semaphores, or locks, as they are also named. In the snipped code, we use traditional obtain and release to achieve and issue the semaphore.

type Semaphore = TVar Bool
newSem :: Bool -> IO Semaphore
newSem available = newTVarIO available
 obtain :: Semaphore -> STM ()
 obtain sem = do b <- readTVar sem
            if b
               then writeTVar sem False
               else retry
 release :: Semaphore -> STM ()
 release sem = writeTVar sem True

One of the powerful characteristic of STM’s implementation in GHC is the retry combinator, because it manipulates conditional synchronization. When a state is not accomplished, it is called retry. At the right moment, the process is awoken by the runtime system.

Next is an example of an unlimited buffer , in which processes can add or retrieve information. In the following piece of code, we do not concentrate on efficiency, but on clarity of implementation.

type Buffer a = TVar [a]
newBuffer :: IO (Buffer a)
newBuffer = newTVarIO []
put :: Buffer a -> a -> STM ()
put buffer item = do ls <- readTVar buffer
                      writeTVar buffer (ls ++ [item])
get :: Buffer a -> STM a
get buffer = do ls <- readTVar buffer
                 case ls of
                   [] -> retry
                   (item:rest) -> do writeTVar buffer rest
                                     return item

In the preceding example, the main operation is writing. When there is only one process that writes on the buffer, it can be read an arbitrary number of times. If there are many processes, just one of them could access the buffer at a certain time, because the semaphore implementation gives a reciprocal exclusion. The first process that made a request is first served.

An easy but important pattern is resource allocation. In the following, we use a relaxed variant of resource allocation , in which a counter is used to keep the quantity of disposable resources. When a process requests a certain quantity of resources, and it is not disposable, then it blocks.

type Resource = TVar Int
acquire :: Resource -> Int -> STM ()
acquire res nr = do n <- readTVar res
                     if n < nr
                        then retry
                        else writeTVar res (n - nr)


release :: Resource -> Int -> STM ()
release res nr = do n <- readTVar res
                     writeTVar res (n + nr)

The next example is the dining philosophers, a representative problem for concurrent programming, and a good example of STM use. To check if the simulation works right, we need to display what the philosophers are doing. To do that, we need to pay attention to how we print output in standard out, because the classical I/O primitives in Haskell are thread unsafe. We have a buffer where the processes write, and a thread that reads from the buffer and prints the output. This is useful because only one process gives the output, so it will not be an unexpected output. The forks are implemented as binary semaphores, as before.

simulation n = do forks <- replicateM n (newSem True)
                   outputBuffer <- newBuffer
                   for [0..n-1] $ i ->
                     forkIO (philosopher i outputBuffer
                             (forks!!i)
                             (forks!!((i+1)`mod`n)))
                   output outputBuffer
 output buffer =
     do str <- atomically $ get buffer
        putStrLn str
        output buffer
for = flip mapM_

In the first line of code, the system is prepared for simulation, such that simulation is a function with just one argument; namely, the number of philosophers. The next step is to create the number of forks and the buffer , and then generate the philosopher processes that receive their corresponding forks. In the last step, the principal thread enters a loop that reads the output from the buffer and then displays the read messages.

philosopher :: Int -> Buffer String -> Semaphore -> Semaphore -> IO ()
philosopher n out fork1 fork2 =
     do atomically $ put out ("Philosopher " ++ show n ++ " is thinking.")
        randomDelay
        atomically $ do
          p fork1
          p fork2
        atomically $ put out ("Philosopher " ++ show n ++ " is eating.")
        randomDelay
        atomically $ do
          v fork1
          v fork2
        philosopher n out fork1 fork2
randomDelay = do r <- randomRIO (100000,500000)
                  threadDelay r

As you can see, the preceding code is quite simple, but its strength is that it is able to sequentially create transactions and then atomically execute them. Let’s observe that the philosopher processes run in the IO monad and access the transactional memory when they synchronize. This is a classic example of transactional memory use.

When a philosopher starves, randomDelay occurs, which calls threadDelay at a certain time.

A Bank Account Example

The following is a function that transfers an amount of money.

sendAmount :: Account -> Account -> Int -> IO ()
-- Send 'amount' from account 'from' to account 'to'
sendAmount fromAccount toAccount moneyAmount
  = atomically (do deposit toAccount moneyAmount
                    withdrawAmount fromAccount moneyAmount)

The inside do block is quite simple: deposit is called to put an amount in a repository, and withdrawAmount is called to retrieve an amount. We will implement these functions later in this chapter. First, let’s observe the atomically function . The parameter is an action that is executed atomically. So, there are two important characteristics of a transaction to observe:

  • Atomicity. The repercussions of atomic transactions are obvious to other threads at the same time. In our example, this means that an estate in which the amount was stored, but not withdrawn, could not be identified by another threads.

  • Isolation. When atomic transactions are performed, the action is not affected by other threads.

Let’s imagine a basic running model for atomicity: it is only one global lock. The atomically act gets the lock, runs the action, and discharges the lock. This usage guarantees that two atomic pieces will not be executed at the same time, and so the atomicity is guaranteed.

This model sounds very simple, but it still has disadvantages. First, isolation is not ensured because there is no condition that stops a thread to write the same IORef outright (namely, outside atomically, without keeping the general lock); whereas a certain thread is approaching an IORef inside an atomic piece (which holds the general lock). Another disadvantage is low performance, because each atomic sequence will be serialized, although interference is impossible. This problem is discussed later. For the moment, let’s review the atomically function.

atomically :: STM a -> IO a

As you can see, the parameter of the atomically function is an action that has STM a. An STM action is similar to an IO action in regards to side effects, but those of the STM actions are much lower. The most important operations done in an STM action are reading and writing transactional variables, which have TVar a type; like IORefs, it could be read or written in an IO action.

readTVar  :: TVar a -> STM a
writeTVar :: TVar a -> a -> STM ()

Like IO actions, the STM actions can be formed with the same do notation, because it has an implementation for every type. The following code presents the withdraw function.

type Account = TVar Int

withdrawAmount :: Account -> Int -> STM ()
withdrawAmount account amount = do
    bal <- readTVar account
    writeTVar account (bal - amount)

The Account type is has one variable, whose type is Int to represent the repository balance. The withdrawAmount function subtracts the desired amount from the total amount of the account, an operation that is implemented as an STM action.

The following is the entire sendAmount function.

depositAccount :: Account -> Int -> STM ()
depositAcoount account amount = withdraw account (- amount)


import System.IO
import Control.Concurrent.STM


type Account = TVar Int

withdrawAmount :: Account -> Int -> STM ()
withdrawAmount account amount = do
    bal <- readTVar account
    writeTVar account (bal - amount)


depositAccount :: Account -> Int -> STM ()
depositAccount account amount = withdrawAmount account (- amount)


sendAmount :: Account -> Account -> Int -> IO ()
-- Transfer 'amount' from account 'from' to account 'to'
sendAmount fromAccount toAccount moneyAmount
    = atomically (do depositAccount toAccount moneyAmount
                     withdrawAmount fromAccount moneyAmount)


displayAccount :: Account -> IO Int
displayAccount account = atomically (readTVar account)


main = do
    fromAccount <- atomically (newTVar 200)
    toAccount <- atomically (newTVar 100)
    sendAmount fromAccount toAccount 50
    v1 <- displayAccount fromAccount
    v2 <- displayAccount toAccount
    putStrLn $ (show v1) ++ ", " ++ (show v2)

newTVar represents two accounts: the sender and the beneficiary.

Observe that the sendAmount function realizes four read and write actions: reading and writing to the sender account, and reading and writing to the beneficiary account. The readings and writings run atomically, and satisfy the requirements described in the beginning of this chapter.

Read and write operations outside a TVar transaction are prevented by the type system. Let’s do the following.

wrong :: Account -> IO ()
wrong account = do
    hPutStr stdout "Withdrawing..."
    withdrawAmount account 10
This program won't compile:
import System.IO
import Control.Concurrent.STM


type Account = TVar Int

withdrawAmount :: Account -> Int -> STM ()
withdrawAmount account moneyAmount = do
    bal <- readTVar acc
    writeTVar account (bal - moneyAmount)


wrong :: Account -> IO ()
wrong account = do
    hPutStr stdout "Withdrawing..."
    withdrawAmount account 10


main = do
    account <- atomically (newTVar 200)
    wrong account
    hPutStr stdout " Done! "

The program will be cancelled because the hPutStr and withdraw actions have different types, IO and STM, respectively, so they cannot be put in the same single block .

The following is the error message.

Couldn't match type 'STM' with 'IO'       Expected type: IO ()         Actual type: STM ()

If we make the withdrawAmount action an argument for an atomically function, then it will work fine.

right :: Account -> IO ()
right account = do
    hPutStr stdout "Withdrawing..."
    atomically (withdrawAmount account 10)

The program is compiled and could be executed as follows.

import System.IO
import Control.Concurrent.STM


type Account = TVar Int

withdrawAmount :: Account -> Int -> STM ()
withdrawAmount account moneyAmount = do
    bal <- readTVar account
    writeTVar account (bal - moneyAmount)


right :: Account -> IO ()
right account = do
    hPutStr stdout "Withdrawing..."
    {-hi-}atomically{-/hi-} (withdrawAmount account 10)


main = do
    account <- atomically (newTVar 200)
    right account
    hPutStr stdout " Done! "

Transactional Memory Version

Isolation and atomicity should be enough for STM use. But it is also important to keep a clean implementation of a model to understand every step. One advantage of STMs is that it has a clean and easy-to-use interface that can be implemented in different ways.

An example of implementation called optimistic execution is in the database area. An empty thread is assigned for many local transactions whenever an atomically act occurs; after that, the locks are not taken during an act action. Every time an act is running, there is a log in which writeTVar it is writing the address of TVar in that log, but not the value of TVar. The writeTVar writes every time it is called. Similarly, when readTVar is called, the address corresponding to TVar is searched into the log (only if writeTVar was previously called). If there is no TVar address value in the log, then it reads the TVar value, which is saved in the log. Of course, at the same time, there could be more threads that read and write values and TVar addresses.

When the act is performed, the log is verified, and if the result is success, then the log is committed. To validate a log, every record of TVar in the log is verified such that the value in the log must match the actual TVar value. If there is a match, the verification has succeeded, and then in the commit, every record from write in the log is written in the real TVars.

The presented steps are dependent one of each other, such that, the implementation does not allow the interrupts, utilizes locks or instructions for example, make a comparison and then exchange. All these are needed for ensuring that the verification and commit are not seen by another threads. These steps are part of the implementation; the programmer is unaware of them.

Earlier, we presented a successful case. When it fails, however, the transaction’s view of memory is wrong. Thus, the transaction is aborted, the log is reinitialized, and the act is executed again, this procedure is named re-execution . This is possible, because the writes have not been registered into the memory. It is very important that act contains only read and write operations so as not to influence other threads. Let’s examine the following piece of code.

atomically (do x <- readTVar xv
               y <- readTVar yv
               if x>y then launchRockets
                      else return () )

launchRockets :: IO () has secondary effects over other threads. Locks are not taken when the atomic pieces are executed, so when a concurrent thread changes the values of xv and yv, an inconsistent view of memory will occur. If this happens, then the run of launchMissiles is at fault; it will be discovered later that the verification has failed, and the transaction should run again. But in Haskell, we have type system, which impedes executing IO actions inside STM actions, thus, the previous piece of code will be cancelled by the type checker. This is another benefit of differentiation between IO and STM actions.

import System.IO
import Control.Concurrent.STM


launchRockets :: IO ()
launchRockets = hPutStr stdout "Zzzing!"


main = do
    xv <- atomically (newTVar 2)
    yv <- atomically (newTVar 1)
    atomically (do x <- readTVar xv
                   y <- readTVar yv
                   if x > y then launchRockets
                            else return () )

Blocking and Choice

Atomic blocks (or pieces) are inappropriate for coordination of concurrent programs. They dismiss two important characteristics: blocking and choice. In this section, you will see how to elaborate a base STM interface, so that blocking and choice are included in a modular approach.

Let’s consider the situation when a user tries to withdraw an amount greater than what is available; in this case, a thread should prevent this action. These situations occur often in concurrent programming. Other examples are when a thread tries to read from an empty buffer, or it is waiting for an occurrence; in these cases, the thread should block. To make this happen in STM, all we need to do is add the retry function , whose type is

retry :: STM a

The following is an improved withdraw function that aborts if the desired amount is greater than the available amount.

limitWithdrawAmount :: Account -> Int -> STM ()
limitWithdrawAmount account moneyAmount = do
    bal <- readTVar account
    if moneyAmount > 0 && moneyAmount > bal
    then retry
    else writeTVar account (bal - moneyAmount)
import System.IO
import Control.Concurrent.STM
import Control.Concurrent


type Account = TVar Int

limitWithdrawAmount :: Account -> Int -> STM ()
limitWithdrawAmount account moneyAmount = do
    bal <- readTVar account
    if moneyAmount > 0 && moneyAmount > bal
    then retry
    else writeTVar account (bal - moneyAmount)


delayDepo account moneyAmount = do
    hPutStr stdout "Waiting for deposit "
    threadDelay 3000000
    hPutStr stdout "OK! Depositing now! "
    atomically ( do bal <- readTVar account
                    writeTVar account (bal + moneyAmount) )


main = do
    account <- atomically (newTVar 100)
    forkIO (delayDepo account 1)
    hPutStr stdout "Withdrawing... "
    atomically (limitedWithdrawAmount account 101)
    hPutStr stdout "Success! "

The preceding code is branched a thread that calls delayDeposit; this function waits 3 seconds to deposit the sum. Meantime, the limitedWithdrawAmount function aborts because the available sum is too low; limitedWithdrawAmount is successful after a thread successfully completes the deposit.

The retry function is very easy: when a retry action occurs, the actual transaction is abandoned and reloaded at a future time. The theoretical correct procedure is that transaction to be retried instantly, but this is also not efficient, because the available amount is probably the same, in which case the transaction is retried. An efficient way is to stop the thread until another thread writes to the account. This approach raises a question: Is the implementation able to await the account? Yes, it is, because the account is read by the transaction as a retry function; this deed is registered in the transaction log.

The limitedWithdrawAmount condition has a communal model: it verifies that a Boolean statement is pleased. If it is not, the retry occurs. We have globalized all that we have explained in the following check function.

checkAcc :: Bool -> STM ()
checkAcc True = return ()
checkAcc False = retry

The following is another version of limitedWithdrawAmount.

limitedWithdrawAmount :: Account -> Int -> STM ()
limitedWithdrawAmount account moneyAmount = do
    bal <- readTVar account
    checkAcc (moneyAmount <= 0 || moneyAmount <= bal)
    writeTVar account (bal - moneyAmount)

Here is the code in which we added the check function.

import System.IO
import Control.Concurrent.STM
import Control.Concurrent


type Account = TVar Int

limitedWithdrawAmount :: Account -> Int -> STM ()
limitedWithdrawAmount account moneyAmount = do
    bal <- readTVar account
    checkAcc (moneyAmount <= 0 || moneyAmount <= bal)
    writeTVar account (bal - moneyAmount)


delayDepo account moneyAmount = do
    threadDelay 3000000
    hPutStr stdout "Depositing... "
    atomically ( do bal <- readTVar account
                    writeTVar account (bal + moneyAmount) )


main = do
    account <- atomically (newTVar 100)
    forkIO (delayDepo account 1)
    hPutStr stdout "Withdrawing... "
    atomically (limitedWithdrawAmount account 101)
    hPutStr stdout "Oh, phew! "

Now, let the user to make a choice: if the initial account does not have enough funds, then the user could choose to withdraw the amount from another account.

For that, we need the ability to choose an alternative action if the first one retries. To support choice, STM Haskell has a primitive action called orElse, whose type is

orElse :: STM a -> STM a -> STM a

Like atomically, orElse takes actions as its arguments, and glues them together to make a bigger action. Here are its semantics: The action (orElse a1 a2) first performs a1; if a1 retries (i.e., calls retry), it tries a2 instead; if a2 also retries, the whole action retries. It may be easier to see how orElse is used.

limitedWithdrawAmount2 :: Account -> Account -> Int -> STM ()
-- (limitedWithdrawAmount2 acc1 acc2 amt) withdraws amt from acc1,
-- if acc1 has enough money, otherwise from acc2.
-- If neither has enough, it retries.
limitedWithdrawAmount2 account1 account2 amount
  = orElse (limitedWithdrawAmount account1 amount) (limitedWithdrawAmount account2 amount)
import System.IO
import Control.Concurrent.STM
import Control.Concurrent


type Account = TVar Int

limitedWithdrawAmount :: Account -> Int -> STM ()
limitedWithdrawAmount account moneyAmount = do
    bal <- readTVar account
    checkAcc (moneyAmount <= 0 || moneyAmount <= bal)
    writeTVar account (bal - moneyAmount)


displayAccount name account = do
    bal <- atomically (readTVar account)
    hPutStr stdout (name ++ ": $")
    hPutStr stdout (show bal ++ " ")


limitedWithdrawAmount2 :: Account -> Account -> Int -> STM ()
-- (limitedWithdrawAmount2 acc1 acc2 amt) withdraws amt from acc1,
-- if acc1 has enough money, otherwise from acc2.
-- If neither has enough, it retries.
limitedWithdrawAccount2 account1 account2 amount
  = orElse (limitedWithdrawAcount account1 amount) (limitedWithdrawAccount account2 amount)


delayDepo name account moneyAmount = do
    threadDelay 3000000
    hPutStr stdout ("Depositing $" ++ show moneyAmount++ " into " ++ name ++ " ")
    atomically ( do bal <- readTVar account
                    writeTVar account (bal + moneyAmount) )


main = do
    account1 <- atomically (newTVar 100)
    account2 <- atomically (newTVar 100)
    displayAccount "Left pocket" account1
    displayAccount "Right pocket" account2
    forkIO (delayDepo "Right pocket" account2 1)
    hPutStr stdout "Withdrawing $101 from either pocket... "
    atomically (limitedWithdrawAmount2 account1 account2 101)
    hPutStr stdout "Successful! "
    displayAccount "Left pocket" account1
    displayAccount "Right pocket" account2

We use a showAcc helper function to display the content of an account before and after the withdrawal. We have two accounts, account1 and account2, both with insufficient funds for the limitedWithdrawAmount2 to succeed immediately. However, when the background thread deposits $1 into account2, the call succeeds.

Since the result of orElse is an STM action, you can feed it to another call to orElse and so choose among an arbitrary number of alternatives.

Summary

This chapter discussed how software transactional memory (STM) works in Haskell, and covered the most important operations that can be applied.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.135.80