© Alejandro Serrano Mena 2019
Alejandro Serrano MenaPractical Haskellhttps://doi.org/10.1007/978-1-4842-4480-7_9

9. Dealing with Files: IO and Conduit

Alejandro Serrano Mena1 
(1)
Utrecht, The Netherlands
 

In Parts 1 and 2 you learned the basics of pure evaluation and how it helps in parallelizing and distributing code. However, sometimes you need to step out to the wild world of side effects. You’ll start by looking at simple input and output in Haskell. At the beginning, the information will be input in the console, and the output will be printed on the screen. Afterward, you’ll learn how to use a permanent means of storing data and reading and writing from files in disk.

Computations with side effects may turn out wrong in many ways; maybe the data is corrupted, or perhaps the network connection goes down. Haskell includes an exception mechanism that signals these conditions and allows you to react. In pure computations you’ve been handling erroneous scenarios using Maybe and MonadPlus. The relation between the different ways of handling errors in Haskell will be clear by the end of this chapter.

One problem with Haskell’s lazy evaluation model is that it brings some unpredictability to input and output. The Haskell community has come up with several streaming data libraries to solve this problem. This chapter discusses conduit, as well as some applications of this library to deal with file handling and networking. After learning all this information, you’ll be ready to save the client and product data on disk using a binary serialization format. For that matter, the binary library will also be presented.

Basic Input and Output

To begin the journey through side effects in Haskell programs, let’s create some simple interactions with the console. These small examples will help you to discover the concepts involved. The first bit of code to look at is an executable program that just prints “Hello Beginning Haskell!” at the console.
module Main where
main :: IO()
main = putStrLn "Hello Beginning Haskell!"
A second example is a bit more involved. In this case, the program asks for a place and decides which point in time you should travel to, based on the given name. The algorithm is fairly easy; the important part of this code is how the information is taken from the user and threaded through the program.
main = do putStrLn "Where do you want to travel?"
          place <- getLine
          let year = (length place) * 10
          putStrLn $ "You should travel to year " ++ show year
If you run the program, the output will be like the following. (As usual, bold denotes those parts that are to be input, while the program font is used for output from the program.)
Where do you want to travel?
India
You should travel to year 50

Buffering

In some systems, especially in Windows, you may find that there’s no output at all upon executing a program. This issue is related to buffering. To gain efficiency, the information you send to a file (or to the console) is not directly written (or shown) but instead is buffered until a certain amount of data is gathered. You can change the way in which each handle uses buffering via hSetBuffering. The most common use of this function, making the system flush the contents after each newline character, is coded as follows:
import System.IO
main = do hSetBuffering stdout LineBuffering  -- enable line buffering
          -- continue with the rest

There’s another possible value for hSetBuffering, namely, NoBuffering, that makes the system use no buffering at all. However, you should be wary. Problems can result from using NoBuffering in combination with console functions in Windows.

It’s interesting to look at the signature of the functions that have been used in the examples. The easiest way to get the signature is to ask the interpreter with its :t command.
*Chapter9.BasicInputOutput> :t putStrLn
putStrLn :: String -> IO ()
*Chapter9.BasicInputOutput> :t getLine
getLine :: IO String

The do notation that has been used and the data types wrapped inside another type (in this case, IO) should give you a clue that you’re indeed working inside a monad. In the previous chapters, you saw that State represents computations that carry a state, Maybe computations that may fail, and so on. IO brings to a certain bit of code the ability to perform side effects. In the previous examples, it was used to print to and get information from the console.

There’s one important difference between IO and the rest of the monads that have been considered up to this point. For the other monads, there was some way to inspect the resulting value (e.g., using pattern matching with Maybe) or run the monad (with functions similar to runState). In that way, you could get back to a context without the corresponding monad.

By contrast, IO has no escape hatch. There’s no operation that converts an IO t value into a plain t value.1 This is how Haskell imposes a barrier between the computations that involve side effects and must work inside the IO monad, as well as the rest. You want to be sure that a function without IO in its signature is pure, free of side effects, and referentially transparent. Furthermore, every call to IO in a certain application must span from the initial main function, which has IO () type.

Note

For the Haskell compiler, an expression that is pure must be treated differently than an expression involving side effects and that lives in IO. However, from a programmer’s point of view, IO is no different from any another monad. It just happens to introduce side effects into computations.

Haskell’s Prelude module offers several functions for interacting with the console. In addition to the aforementioned putStrLn, which prints a line of text and then starts a new line, you have at your disposal putStr, a variant of printing a string but without any final newline character, and putChar, which prints just one character on the screen. For the common case where the data to print is not yet a string but can be converted into one by the show function, you can use print. Here’s a brief example where the user is requested to input a first name and a last name, and the system shows a value of the type Person :
main = do putStrLn "First name?"
          fName <- getLine
          putStrLn "Last name?"
          lName <- getLine
          putChar '>' >> putChar ' '
          print $ Person fName lName
For the record, the Person and Client data types were declared as follows in Chapter 4:
data Person   = Person { firstName :: String, lastName :: String }
              deriving (Show, Eq, Ord)
data Client i = GovOrg  { clientId :: i, clientName :: String }
              | Company { clientId :: i, clientName :: String
                        , person :: Person, duty :: String }
              | Individual { clientId :: i, person :: Person }
              deriving (Show, Eq, Ord)

All these output functions have IO () as the return type. Remember, () is the unit type and there’s only one value of that type, which is confusingly also named (). It’s customarily used in Haskell programming to identify those monadic computations that don’t have a value to return but that has effects on the context that you’re interested in. In addition to putStr and friends, other functions that use () in their return types are tell (from the Writer monad) and put (from the State monad).

The counterparts to the previous functions are the ones that receive information from the user. The most common one is getLine; its task is to gather all the input up to the moment in which the user presses Enter. You may be interested, however, in getting the input character by character, which you can do using getChar. Finally, in some cases you want to get all input up to an end-of-file marker. This is common when piping data between processes in a Unix-like shell. The getContents function provides this functionality.

Haskeline

If you’re planning to create a command-line application, the simple functions that Prelude includes won’t offer the best experience to the end user. In particular, you may want to provide command history or autocompletion.

The haskeline library is specifically designed for this task. The core of the library is the InputT monad transformer, which provides those features. The previous example asking for a person could be rewritten as follows:
import System.Console.Haskeline
main = runInputT defaultSettings $ do
  fName <- getInputLine "First name? "
  lName <- getInputLine "Last name? "
  case (fName, lName) of
    (Just f, Just l) -> outputStrLn $ show (Person f l)
    (_     , _     ) -> outputStrLn "I cannot identify you"

One difference with standard Prelude is that input functions return their value wrapped on a Maybe, anticipating the case in which the input may stop earlier than expected.

Since IO is a monad, you can use the enormous set of functions that were presented in Chapters 6 and 7. As you may remember, any monad is a functor, so you can use fmap directly on IO. For example, you may refactor the following use of variable s since it’s used only to thread the information to upperS:
import Data.Char
main = do s <- getLine
          let upperS = map toUpper s
          putStrLn upperS >> putStrLn upperS
into a more concise form which directly generates upperS:
main = do upperS <- fmap (map toUpper) getLine
          putStrLn upperS >> putStrLn upperS
Another possibility is accumulating some information using foldM. The following code goes through a whole list of clients, and for each of them it asks the user whether they should be included in a special VIP list:
import Control.Monad (foldM)
createVIPList :: Show a => [Client a] -> IO [Client a]
createVIPList = foldM (lst c -> do
                         putStrLn $ " Should " ++ show c
                                                ++ "be included as VIP? "
                         answer <- getLine
                         case answer of
                           'Y':_ -> return $ c:lst
                           _     -> return lst) []
It’s interesting to consider what happens when IO values are inside a certain container. For example, you may want to create a list of actions and, based on some user input, execute one of them. This is exactly what is done in the following piece of code:
main = do actionName <- getLine
          case lookup actionName listOfActions of
            Just action -> action  -- execute action
            Nothing     -> putStrLn "Unknown action"
listOfActions :: [(String, IO ())]
listOfActions = [
  ("greet", do name <- getLine
               putStrLn $ "Hello " ++ name),
  ("sum"  , do putStrLn "First number:"
               n1 <- fmap read getLine
               putStrLn "Second number:"
               n2 <- fmap read getLine
               putStrLn $ show n1 ++ "+" ++ show n2
                                  ++ "=" ++ show (n1+n2))]

It’s important to think for a moment how the execution of such code differs from what you may expect. In most programming languages, the call to lookup would have triggered the evaluation of listOfActions. Then, all the calls to getLine or putStrLn would have been executed since they appear in the body of listOfActions. However, in the Haskell code the side effects aren’t executed until you’ve unwrapped the action to be of type IO t for some t and asked for its execution, which happens in the line in bold with the “execute action” comment. One important implication is that IO values are first-class citizens of Haskell, like functions are, and can be combined, can be passed as arguments, and can be returned as any other value.

Randomness

Let’s take a break from input and output and consider the issue of randomness in Haskell. For that purpose, I present a simulation of a time machine breaking in the middle of a journey. When this breakdown happens, the time traveler is involved in a disturbing experience: ending in a random place, at a random point in time, with no clue of what is outside the machine.

The following code uses the randomRIO function from the System.Random module in the random package to simulate a random walk from an initial point, that is, a series of random jumps in time made by a broken time machine. The randomRIO function needs upper and lower bounds for the value to obtain, which in this case have been set to 0 and 3000. Since the walk may be infinite, the code just prints the ten initial hops.
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Monad.Loops
import System.Random
main = do (initial :: Int) <- fmap read getLine
          jumps <- unfoldrM (\_ -> do next <- randomRIO (0, 3000)
                                      if next == initial
                                        then return Nothing
                                        else return $ Just (next, next))
                            initial
          print $ take 10 jumps

Note

In the preceding code I’m using the monadic counterpart of unfolding, namely, unfoldrM. However, you won’t find it in the usual Control.Monad module. Instead, you need to import Control.Monad.Loop from the monad-loops package. This is an interesting module, which you should add to your list of tools for monadic programming.

Since you know how to read and write from the console and how to generate random values, you can now develop small console games. Exercise 9-1 asks you to do this.

Exercise 9-1. Win a Time Travel Trip

Develop a small game in which you can win a time travel trip in one of the machines in the store. The game should generate a random number between 3 and 17. Then, the user has five possibilities of guessing the number. If the user guesses correctly, the program will show a message congratulating them. In case of failure, the program will show a message encouraging the user to try again.

While developing the game, try to think about how to modularize and abstract the code; the range of numbers or the number of guesses can be turned into parameters.

The previous code did its job in the main function, which has the IO () type. The call to randomRIO is not made on a let block, which points out that this function works also in IO contexts. You can see that this is the case by asking the interpreter its type.
*Chapter9.Randomness> import System.Random
*Chapter9.Randomness System.Random> :t randomRIO
randomRIO :: Random a => (a, a) -> IO a
Let’s think for a moment why randomness needs an IO context to work. The main reason is that randomRIO breaks the referential transparency property found in pure Haskell code; not every call to randomRIO will return the same result. That fact implies that the compiler may not be able to perform certain optimizations. For example, in pure code if you have a piece of code like g (f x) (f x), it may be rewritten to let h = f x in g h h, which involves one call less to f and thus less work to be done. But if instead you write this:
do x <- randomRIO (0, 10)
   y <- randomRIO (0, 10)
   return g x y
this cannot be rewritten to the following:
do z <- randomRIO (0, 10)
   return g z z

The code cannot be rewritten because the two calls to randomRIO may return a different random value. Many other good properties of Haskell code break in the presence of side effects, and thus you cannot use tools such as equational reasoning. This is another reason, in addition to maintainability, for keeping pure and IO code apart.

Furthermore, any call to that function must update the global random generator, which is kept in memory by the system. This is an important side effect. If you don’t want to use that global variable and being forced to use IO, you can create pure functions that involve random values given that you provide the initial random generator, which is a value of type StdGen. The corresponding pure functions return both a random value and the generator for the next value. For example, the previous code can be “purified” to work on StdGen values as follows:
import Data.List
getJumps :: StdGen -> Int -> [Int]
getJumps gen initial = unfoldr (g -> let (next, nextG) = randomR (0, 3000) g
                                      in if next == initial
                                         then Nothing
                                         else Just (next, nextG))
                               gen
You can either create a StdGen value with a fixed seed via mkStdGen or obtain the global one via getStdGen. Thus, the initial random code, which also outputs the result, can be written as follows:
main = do (initial :: Int) <- fmap read getLine
          gen <- getStdGen
          print $ take 10 $ getJumps gen initial

This example holds a valuable lesson; in many cases, you can split your functions with side effects in several pure functions. There is also a driver function that operates in IO and takes care of threading the information between the others. In that way, your code will be easier to maintain.

Working with Files

The next step after knowing how to deal with side effects and how to communicate with the console is to read and write on a durable location. In other words, you will learn how to read and write files on the system. At some point, the files turn into objects that can perform certain operations, such as moving files from one location to another or deleting one file from disk. This section will delve into the functions that provide this functionality in the Haskell Platform.

Reading and Writing

The Prelude provides functions for bulk operations on files, either writing from or reading an entire string into a file. The involved functions are writeFile or appendFile for output and readFile for input. One possibility is reading a list of clients and, for each of them, deciding whether they’ve won a time travel to a point in time (for this second part, the code uses randomRIO). The main assumption is that each line of the file will contain a client, so you can use the lines function in Data.String, which separates a string between newline boundaries.
{-# LANGUAGE ScopedTypeVariables #-}
import Data.String
import System.Random
main = do clients <- fmap lines $ readFile "clients.db"
          clientsAndWinners
                  <- mapM (c -> do (winner :: Bool) <- randomIO
                                    (year   :: Int ) <- randomRIO (0, 3000)
                                    return (c, winner, year))
                          clients
          writeFile "clientsWinners.db" $ concatMap show clientsAndWinners

However, working only with these operations has a severe performance impact: the information read from the file is kept entirely in memory, and the data to write to the file must be assembled into a string before writing it. In many cases, you will want further control. For example, you will want to read just a line or a file or write information to the disk as you go, instead of waiting for the entire process to finish. The module you should look at is System.IO .

Like in most programming languages, the flow of work with a file involves first opening a handle to it, then performing any operation that you need on the file, and finally closing the access to the file. The handle keeps track of all internal information that the system may need to work on the file.

The first step is handled by the openFile operation. The arguments for this function are the path to the file (the documentation shows that the type of this argument is FilePath, but it’s just a synonym for String) and the opening mode, which can be for reading, writing (or both), or appending. The result will be a file handle. In addition to opening your own files, you can use any of the predefined handles, such as stdin, stdout, or stderr, which map to standard input, output, and error, usually from the console.

The inverse operation, closing a file, is done via the hClose functions. As in any other programming language, it’s important that you close the file after you’ve finished working with it because an open handle consumes resources from the machine.

To read or write, you can use the generalizations of the previous console functions, which work on any file handle. These are all prefixed by h, and thus you get hGetChar, hGetLine, hGetContents, hPutChar, hPutStr, and hPutStrLn. Additionally, you can query the system as to whether you’ve finished reading the file with hIsEOF. Armed with these weapons, you can write a more efficient version of the previous example, which reads one line from the file at a time.
import System.Environment
import System.IO
main = do (inFile:outFile:_) <- getArgs
          inHandle  <- openFile inFile  ReadMode
          outHandle <- openFile outFile WriteMode
          loop inHandle outHandle
          hClose inHandle
          hClose outHandle
       where loop inHandle outHandle = do
               isEof <- hIsEOF inHandle
               if not isEof
                  then do client <- hGetLine inHandle
                          (winner :: Bool) <- randomIO
                          (year   :: Int ) <- randomRIO (0, 3000)
                          hPutStrLn outHandle $ show (client, winner, year)
                          loop inHandle outHandle
                  else return ()

You may have noticed that instead of hard-coding the input and output file names, the code obtains them via the getArgs function . This function, found in System.Environment, returns a list of all the command-line parameters that have been given to the executable command.

Since opening a file, working with it, and closing the handle afterward are common tasks, the Haskell Platform includes a special withFile function that takes care of the file and expects only the action to perform. For example, the previous code could have been written as follows:
main = do (inFile:outFile:_) <- getArgs
          withFile inFile  ReadMode  $ inHandle ->
            withFile outFile WriteMode $ outHandle ->
              loop inHandle outHandle
       where loop inHandle outHandle = do ...

Additionally, withFile will protect against possible errors while processing the file, ensuring that the file handle is always closed. In the next section, you will learn how to treat possible errors that may occur while working with files (e.g., data corruption, files that do not comply a certain schema, etc.).

Knowing how to read and write information from files, in addition to marshaling to and from strings with show and read, provides a way to save information about clients and products on disk. In Exercise 9-2 you are asked to classify clients in several files based on their category.

Exercise 9-2. Client Classification

Remember that clients in the time machine store can be individuals, companies, or government organizations. Right now, the store keeps the records for all clients in a single file. Each line contains a client, and it’s encoded by using the show function on them.

In this exercise, write a small executable that reads the information of those clients and generates another three files. Each of these files should contain all the clients of one of the three possible categories.

Up to this point, the code has just read the data in every file as a string. But in reality, two different scenarios may occur. The first one is that the file contains text, the other is that the information in the file is raw binary data. Furthermore, in the first case different encodings may have been used to translate from text data into a sequence of bytes. The hSetEncoding function is used to change the current encoding of a handle. The System.IO module includes many encoders, including latin1, utf8, utf16, and utf32, as well as its big-endian and little-endian versions. If you want to be sure that the contents of the output file for winners are written in UTF-8, you ought to change the code before going into the loop.
withFile outFile WriteMode $ outHandle -> do
  hSetEncoding outHandle utf8
  loop inHandle outHandle

Anyway, in the next chapter you’ll see that using Strings is rarely the best option when dealing with files. Instead, you should use ByteString and Text values. In that case, the encoding problem comes first, and you always need to specify how to convert from sequences of bytes to text values.

Handling Files

Let’s move now to another range of operations you can do with files. Moving, copying, and deleting don’t involve the data stored in files but rather the files themselves. For these operations, the Haskell Platform includes the directory package, which is quite straightforward to use.

The functions renameFile, copyFile, and removeFile, from the System.Directory module, take care of moving, copying, and deleting files from the system, respectively. It’s important to note here that none of these operations supports working on directories, only on files.

Because of the inability of the previous functions to work on folders, directory provides a different set of functions for them. The following list enumerates the most important ones:
  • getDirectoryContents returns a list of all the elements inside the folder.

  • createDirectory makes a new folder on the system. However, it may fail if the directory already exists or if some of the parent directories are not present. createDirectoryIfMissing takes care of those two conditions.

  • renameDirectory allows you to move a folder in the system. Notice that errors may happen if the path where you want to move already exists.

  • removeDirectory deletes a folder from the system. Usually, the directory cannot be removed if it’s not empty. removeDirectoryRecursive, on the other hand, deletes every element inside the folder and then the folder itself.

In addition to encoding, the other big issue when handling files is the format of the file paths, which changes depending on the underlying operating system. For example, Windows uses drive letters to prefix the paths and allows both and / to separate parts, whereas Unix and Mac OS X systems allow only / as a separator. Since the Haskell community considers interoperability between systems an important issue, a library has been included in the Haskell Platform that abstracts from these issues; its name is filepath.

The most important function in this library is (</>), which combines two path segments into a larger one. For example, if you want to read some database configuration found in the database.settings file in the config folder, the most correct way to do so is as follows:
withFile ("config" </> "database.settings") ReadMode $ handle -> ...
Conversely, you may want to split a certain file path in the directory between where the file resides and the file name. You can do this with splitFileName. As an example, here’s some code that gets an input file name from command-line arguments and writes into a file named example within the same folder:
import System.FilePath
main = do (file:_) <- getArgs
          let (folder, _) = splitFileName file
          withFile (folder </> "example") WriteMode $ handle -> ...

In some cases, it may be interesting not to split only between the folder and the file name but rather get a list of all the path segments. In that case, you can use splitDirectories instead.

Finally, filepath includes functions for dealing with extensions. You can use (<.>) to add an extension to an existing file path. The other way around, you can use splitExtensions to generate a tuple of the file name and all the extensions attached to it. The package includes many other little utilities, such as replacing an extension, dropping just the last extension, and so on. It’s useful to look at the filepath documentation when you need to handle file paths in your application.

Note

Again, never roll your own ways to combine extensions, add extensions, or do any other task involving file paths. Instead, use the filepath package to ensure that your code is correct and interoperable.

Error Handling

When dealing with input and output or many other kinds of side effects such as printing or communicating through a network, many kinds of errors can occur. In all the previous examples, the program would just crash when trying to open a file that doesn’t exist on the system. It’s important to know how to detect and recover from those error conditions.

But before proceeding with handling errors in IO contexts, I’m going to discuss how errors are handled in pure code, a topic you’ve already heard about previously in this book. In that way, you’ll notice the differences between pure errors and exceptions. The latter is the mechanism for signaling anomalous conditions in IO.

Pure Errors

Until now, when an operation could not be performed, the most common way to cope with it was to return a Maybe value. This happened, for example, when a function was not applicable to some of the constructors of the value, such as head to empty lists, [], or getting the company name of an Individual client. In that way, the calling function would get Nothing as a result if any problem happened.

Unfortunately, Maybe is not a precise way to specify what error has occurred. You can declare that the operation was not successful but cannot specify the reason. And in many cases, that information is relevant; it’s not the same failure that happens on a database transaction because the connection is not available as the failure that happens because some constraint has been violated. A useful type for these scenarios is Either, which is declared simply as follows:
data Either a b = Left a | Right b

For example, if x :: Either Int String, x can contain either an integer value, in which case the Left constructor would have been used, or a string value, which is wrapped on the Right constructor.

Conventionally, using Either for errors uses Right when the computation is successful and Left for failing scenarios. Thus, if r is the type of correct results and e is the type you would use for specifying the possible errors, Either e r is the customary type to use in functions. As an example, let’s define a version of companyName that tells you the specific error why it couldn’t retrieve the name of a company client.
data CompanyNameError = GovOrgArgument | IndividualArgument
companyName :: Client i -> Either CompanyNameError String
companyName Company { clientName = n } = Right n
companyName GovOrg     { }             = Left GovOrgArgument
companyName Individual { }             = Left IndividualArgument
A user of this function can now pattern match on the result and find the type of error in case it’s needed.
printCompanyName :: Client i-> IO ()
printCompanyName c = case companyName c of
  Right n -> putStrLn n
  Left GovOrgArgument-> putStrLn "A government organization was given"
  Left IndividualArgument -> putStrLn "An individual was given"
The dichotomy between using Maybe or Either for specifying when an operation was not successful is usually a source of headaches. This becomes especially painful when you’re using a library that uses a different style of error specification than the one you’ve decided to use in your application. Thankfully, the errors package, in its Control.Error.Util module, contains helpful functions to convert between styles. The signatures of those functions involved in the conversion are as follows:
hush :: Either a b -> Maybe b
note :: a -> Maybe b -> Either a b

Essentially, you use hush to forget about any concrete error in an Either value and just return Nothing if the computation fails. In the other direction, you need to tell which error value to return in case the Maybe value turns out to be Nothing. The name of the function is a reminder of its usage; you need to “add a note” to the possible error value.

Since deciding whether to use Maybe or Either is difficult but also may have ramifications throughout your application, so you may think about abstracting over the way errors are handled. Haskell type classes are the tool you need here.

In Chapter 7 you learned how to use MonadPlus to return values that declared an erroneous condition and its mplus operation to combine several of those values and returned the ones that were not errors. If you use MonadPlus, you can use Maybe or lists, signaling errors with Nothing and empty lists, respectively. Unfortunately, Either cannot be made an instance of MonadPlus. The problem is that the mempty operation in that type class must not have any parameter. Thus, you cannot specify which value to wrap in the Left constructor if an error should be returned.

The mtl package includes a generalization of MonadPlus to which both Maybe and Either can be given instances; its name is MonadError. Any type that supports this type class must provide two different operations, as its declaration needs.
class Monad m => MonadError e m | m -> e where
  throwError :: e -> m a
  catchError :: m a -> (e -> m a) -> m a
The first operation is the one responsible for signaling failure. As you can see, it satisfies the requirement that mempty didn’t; it takes an extra parameter that is the error value to return. For example, the companyName function could be generalized to work on both Maybe and Either as follows:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except
companyName :: MonadError CompanyNameError m => Client i -> m String
companyName Company { clientName = n } = return n
companyName GovOrg     { }             = throwError GovOrgArgument
companyName Individual { }             = throwError IndividualArgument
In the case of MonadPlus, the mplus function serves well for recovering from an error. Essentially, x `mplus` y was described in this context as returning the value of x if it represents success, or otherwise returning y if x represents failure.2 This operation has also been generalized: catchError has the same task but has access to the error value of the corresponding throwError if the operation fails. For example, let’s create a function that calls companyName and in case of failure returns a predefined empty value using MonadError.
companyNameDef :: MonadError CompanyNameError m => Client i -> m String
companyNameDef c = companyName c `catchError` (\_ -> return "")
Along with the MonadError type class, mtl and transformers include an ExceptT monad transformer you can add to your stack. The best way to understand its task is by thinking of the MaybeT transformer, in other words, of computations that may fail, with the addition of a tag specifying the error in the appropriate case. The errors package also encourages the use of ExceptT and provides conversion functions between stacks using MaybeT and ExceptT.
hushT :: Monad m => ExceptT a m b -> MaybeT m b
noteT :: Monad m => a -> MaybeT m b -> ExceptT a m b

Note

There is a historical reason for the breakage of the naming convention in the case of Either, MonadError, and ExceptT . Older versions of transformers contained an ErrorT monad transformer, along with the MonadError class. This type has been deprecated, because it imposed unnecessary constraints in the type of errors, and has been replaced by ExceptT.

Safe Functions

Because of the historical development of the Haskell libraries, some of the functions in the Prelude module don’t have a pure-friendly way to cope with errors. An archetypical example is head. In the case of applying it to [], this function raises an exception (which can be caught only inside the IO monad, as you will see in the next section) instead of returning some representation of the error.

To alleviate this problem, the safe package provides a lot of versions of common functions that fail in a more pleasant way. For example, head has a version called headMay, which returns the value wrapped in a Maybe, and thus allows you to return Nothing for empty lists; and headDef, which takes an extra argument with a default value to return in the case of an empty list.

Catching Exceptions

I’ve already discussed how dealing with the outer world opens the door to a whole new category of errors, such as nonexistent files or lost connections. For that kind of events, Haskell provides an exception mechanism. There are two main differences between exceptions and the pure errors discussed.
  • Pure errors can be thrown and caught in any place, usually by simply pattern matching on the final value of the computation. In contrast, exceptions can be handled only inside an IO context (but still be thrown from any place).

  • When using Either, you need to specify in advance every possible error that may happen in the execution of some code. On the other hand, Haskell’s exception mechanism is extensible. This decision allows new side effects to fail in new ways but hurts the analysis of the code because you cannot tell in advance which exception may be thrown.

The entry point of any work with exceptions is the Control.Exception module. The code examples in the rest of the section will assume that this module is included in the imports list. In many cases you need to specify exception types inside function bodies and let declarations; the ScopedTypeVariables GHC extension will be assumed to be enabled in all the samples.

Let’s start with an example that adds exception handling to the initial function that wrote a list of winners from the database of clients by using readFile and writeFile.
import Control.Exception
import System.IO.Error
import System.Random
main = do clients <- fmap lines $ readFile "clients.db"
          clientsAndWinners
                  <- mapM (c -> do (winner :: Bool) <- randomIO
                                    (year   :: Int ) <- randomRIO (0, 3000)
                                    return (c, winner, year))
                          clients
          writeFile "clientsWinners.db" $ concatMap show clientsAndWinners
       `catch` ((e :: IOException) -> if isDoesNotExistError e
                                       then putStrLn "File does not exist"
                                       else putStrLn $ "Other error: " ++ show e)

The first thing to notice is the use of the catch function. The idea is simple; you declare the main code to run and then a handler for a specific class of exceptions. The second thing to notice is that the code explicitly mentions the type of exceptions to be handled using that code.

In this particular case, the type you’re interested in is IOException, which describes those exceptions that have something to do with input and output. A value of type IOException encodes extra information about the kind of problem that occurred. You can query it via a set of functions in the System.IO.Error module. In the example, our interest is nonexistent files and checks via the isDoesNotExistError function.

The fact that the Haskell exception mechanism is dynamic and extensible makes the type specification an important part of handling erroneous scenarios. The predefined set of exceptions that is raised by functions in the Haskell Platform is also included in the Control.Exception module. This set includes, among others, ArithException, which signals that a numerical error such as underflow or division by zero has occurred; ErrorCall, which allows handling calls to error; and PatternMatchFail, which is thrown when no pattern matches a specific value.

The following code asks the user for two integer numbers and shows the quotient of the two. Two kinds of exceptions may be raised. First, the user may input something that is not a number, which will cause a call to error inside read. Second, the other possible problem is division by zero. As you can see, each exception has its own handler.
main = do (n1 :: Int) <- fmap read getLine
          (n2 :: Int) <- fmap read getLine
          putStrLn $ show n1 ++ " / " ++ show n2
                     ++ " = " ++ show (n1 `div` n2)
       `catch` ((_ :: ErrorCall) -> putStrLn "Error reading number")
       `catch` ((e :: ArithException) -> case e of
                  DivideByZero -> putStrLn "Division by zero"
                  _            -> putStrLn $ "Other error: " ++ show e)

These exception types are different from IOException in one sense. Whereas an IOException value needs to be queried through special-purpose functions about the kind of problem that happened, these types are defined as simple ADTs, and thus you can use pattern matching to discover the source of problems.

The Control.Exception module includes many other variations of catch for handling exceptions. One of them is catches, which receives a list of handlers for different exceptions. For example, the previous code could have been written without several calls to catch using that function.
main = do ...
       `catches`
         [ Handler ((_ :: ErrorCall) -> putStrLn "Error reading number")
         , Handler ((e :: ArithException) -> case e of ...)]
Another possibility is using handle, which is just catch with reversed arguments. It’s common to use it when the code to execute is long but the code to handle the errors is short because it makes the exception handling apparent up front. The following is a third way to write the same quotient code:
main = handle ((_ :: ErrorCall)      -> ...) $
       handle ((e :: ArithException) -> ...) $
       do (n1 :: Int) <- fmap read getLine
          ...

In some cases, you want to treat an exception in a similar way to an error in a pure computation. This may lead to code that is easier to read if other sources of errors are pure. Think of a scenario when you’re validating some values from a database. In that case, exceptions can be raised in the database connection code, but validation will use Maybe or Either. The way to bridge both worlds is via the try function, which returns an Either value that may contain a result in its Right or a thrown exception in its Left.

For every exception handling function there’s a corresponding one ending in Just: catchJust, handleJust, and tryJust. Those functions take as an extra parameter an exception filter, which decides whether a particular exception should be caught by that handler or rethrown. These filters take the exception value as a parameter and must return a Maybe value. If it’s Nothing, the exception should be rethrown; if the result is Just e, the exception is handled by the code corresponding to that catch. You’ve already found a case where this is interesting to do. From all the possible ArithException values, the real interest lies only in DivisionByZero. Thus, you can use catchJust to ensure that any other exception is correctly rethrown.
main = catchJust (e -> if e == DivideByZero then Just e else Nothing)
          (do (n1 :: Int) <- fmap read getLine
              (n2 :: Int) <- fmap read getLine
              putStrLn $ show n1 ++ " / " ++ show n2
                         ++ " = " ++ show (n1 `div` n2)
           `catch` ((_ :: ErrorCall) -> putStrLn "Error reading number") )
          (\_ -> putStrLn "Division by zero")

Note

As you can see, the exception mechanism in Haskell is much more powerful than those in other languages. In addition to defining handlers by type, functions such as catchJust allow you to perform a dynamic check on whether to catch a particular exception. Using catchJust and similar functions ensures that you handle only the exceptions you know how to deal with, and the rest are properly rethrown to subsequent handlers.

Combinators such as catchJust enable you to be specific about which exception each handler should catch. In some cases, a handler has the opposite intention, though: catching every exception that might have been thrown in the code. The solution comes after looking closely at how different types of exceptions relate to each other. In particular, Haskell exceptions form a hierarchy. Each exception type E has a parent exception type P, describing the fact that E exceptions are a subset of P exceptions. At the root of this hierarchy you find the SomeException type. In conclusion, if you want to add a handler that catches all possible exceptions that may arise in your application, your code should look like this:
main = do ...
       `catch` ((e :: SomeException) -> ...) -- uncaught exceptions
Most programming languages that use exceptions as their error mechanism include, in addition to ways of throwing and catching them, a way to ensure that a certain piece of code runs even in the case of an exception. The usual purpose is to include some cleanup code or release some resource. For example, if you open a file handle, you want to ensure that it’s closed even if some exception arose in its processing. In Haskell, this functionality is provided via the finally function. It can be used to create a more resilient version of the code that writes the winner clients in a file.
main = do (inFile:outFile:_) <- getArgs
          inHandle  <- openFile inFile  ReadMode
          outHandle <- openFile outFile WriteMode
          ( loop inHandle outHandle
            `finally` (do hClose inHandle
                          hClose outHandle) )
       where loop inHandle outHandle = ...
However, this code is not completely correct. In particular, it may be the case that an exception is thrown while opening any of the files. In that case, you cannot use finally because the call to hClose without opening the handle is incorrect. A three-stage flow is usual when dealing with resources: you acquire the resource, you perform some operation, and you release it. Even in the case of an exception during processing, you want to release the resource, but you don’t want to run that code if the acquisition failed. This pattern is made explicit in Haskell by the bracket function . The most correct way to write the previous code is as follows:
main = do (inFile:outFile:_) <- getArgs
          bracket (openFile inFile  ReadMode)  -- acquisition of inHandle
                  hClose                       -- release of inHandle
                  (inHandle -> bracket (openFile outFile WriteMode)
                                hClose
                                (outHandle -> loop inHandle outHandle))
       where loop inHandle outHandle = ...

In Exercise 9-3 you can apply your new knowledge about exceptions by taking Exercise 9-2 and improving on it.

Exercise 9-3. Better Client Classification

Add exception handling to the code you wrote for Exercise 9-2.

Throwing Exceptions

Now that you’ve seen how to catch exceptions, it’s time to learn how to throw them. If you want to reuse any of the predefined exception types in Control.Exception, you just need to call throwIO if you are within the IO monad, or you can call throw if you want to throw an exception from pure code (but remember that the handler still needs to be inside the IO monad). This simple example does so by reusing the NoMethodError exception type.
main = do throw $ NoMethodError "I don't know what to do"
          `catch` ((e :: SomeException) ->
                       do putStr "An exception was thrown: "
                          putStrLn $ show e)
Usually you will want to raise an exception of a new custom type, which describes those exceptions that may happen in your code. To use a type as an exception, you need to create instances for it of the Show, Typeable, and Exception type classes. Thankfully, Haskell’s deriving mechanism saves you from writing all the boilerplate code. The following code declares a type of exceptions in an authentication system:
{-# LANGUAGE ScopedTypeVariables, DeriveDataTypeable #-}
import Data.Typeable
data AuthenticationException = UnknownUserName  String
                             | PasswordMismatch String
                             | NotEnoughRights  String
                             deriving (Show, Typeable)
instance Exception AuthenticationException
Now you can use your new exception type as any built-in one.
main = do throw $ UnknownUserName "Alejandro"
          `catch` ((e :: AuthenticationException) -> ...)

The Typeable Type Class

Haskell’s exception mechanism makes heavy use of the Typeable type class. This class allows you to get information about the type of a value at runtime (because usually Haskell erases all type information after compiling in order to increase performance). Typeable can also be used, in the Data.Dynamic module, to create an interface for dynamic values, which you can cast at runtime to other types.

Like on other occasions, having the exception functionality wrapped in a type class makes it easier to write code so that it’s reusable among several monad stacks and users of your library. In most cases, the MonadError type class introduced before should be enough to cover stacks with exceptions. If you want a real generalization of the Control.Exception functions and interface, I recommend the MonadCatch type class from the exceptions package.

Caution

Throwing exceptions in pure code is not recommended at all. Instead, you should strive to use other kinds of error propagation mechanisms, such as Maybe or Either. Inside IO code, exceptions may result code that is more concise and clearer (such as having a last-chance exception handler that logs all critical errors) but still shouldn’t be overused.

Streaming Data with Conduit

The input/output framework that has been shown in the previous sections is usually known as classic I/O or lazy I/O. It has been included in Haskell since the first versions. However, the way it works does not interact well with the laziness inherent to the Haskell language, giving rise to the so-called streaming data problem. For that reason, several stream libraries have been developed, which solve the problems related to this interaction in an elegant and efficient way.

Problems with Lazy Input/Output

Let’s consider the following simple piece of code. You open a file, get its contents, close the file, and then work with the information you’ve just obtained. At first sight, this should be OK: hGetContents reads all the information, and you ensure that the handle is closed with hClose.
main = do h <- openFile "/some/text/file" ReadMode
          s <- hGetContents h
          hClose h
          print s
However, if you run this code, you’ll either get an empty string on the screen or an error message.
$ cabal run chapter9-stream
file: hGetContents: illegal operation (delayed read on closed handle)

But if you exchange the order of the printing and closing operations, everything works fine. That a simple change in order makes the difference pinpoints a problem in the interaction between input/output operations and the time at which each element in the program is evaluated. Because of the lazy nature of Haskell, the s value won’t be evaluated until it’s needed, something that happens in the call to print. But at that point, the handle has already been closed, so hGetContents is not able to bring any kind of information from the file. In this case, the behavior of your program doesn’t match your expectations.

One possible solution is to force the evaluation of s using seq or deepseq. While this is a working solution, it has two problems. The first one is efficiency: forcing the entire string brings it into memory, consuming scarce system resources. If you want to be clever and force only the string you need, you run into problems of maintenance and composability. For example, which function should be responsible for bringing into memory a determinate value? How can you know up front which of the values will be needed in the entire application?

An even worse solution is keeping the file handle open through the life of the application. But this would have the obvious problem of acquiring resources without releasing them. Files may be cheap resources, but when speaking about network or database connections, this becomes an impossible way to go.

Laziness and IO may bring even more surprises. Suppose that during the execution of hGetContents in the previous code, an exception is thrown. This exception won’t be seen by the block of code that generated the string but rather where the value is being used. That means in the middle of some pure computation that uses s, an exception may get in the way, and there’s no way to handle it without resorting to IO. Furthermore, your code loses its predictability because you can’t always be sure whether using some value would entail running some IO computation.

As you can see, using the simple model of handles that System.IO exposes brings unpredictability to when resources may be released and when exceptions could be thrown. The possible solutions such as forcing evaluation or keeping handles open are definitely inefficient. This is called the streaming data problem.

The Haskell community has come up with some solutions to this problem in the form of stream libraries. These libraries usually provide an abstraction of data that comes from a resource in the form of a stream. Furthermore, the way in which the resource that generates the data is acquired and released is made much more predictable. In many cases, stream libraries also introduce a boost in performance because they can ensure that only the necessary data for performing an operation is brought into memory.

Several libraries in Hackage are built around this idea. An initial approach, developed by Oleg Kiselyov, is shown in the iteratee and enumerator packages. Afterward, other libraries provided a more convenient and easy-to-use interface, including pipes, io-streams, and conduit. All those three libraries are used in the wild.

I will now focus on the conduit library because it’s been used in other libraries that will be presented later, such as Persistent for database connections. In addition to the core library, there are many other libraries connecting conduit to different sources of data. In any case, the notions that appear in the conduit library are similar to those in any other libraries.

Introducing Conduits

conduit is based on streams of data that are produced, modified, and consumed by different actors. For example, if the “winner clients” example was modeled using conduit, there would be an actor providing a stream of strings from a file, another actor modifying that stream to add the information about whether each client has won or not, and finally another actor converting that stream into a new file in the system.

There are three kinds of actors which take part in processing data. Sources provide streams of values to be consumed. Examples of sources are obtaining text data from a file in disk, reading from a network connection, and obtaining each of the elements in a list. The converse behavior, consuming a stream of values and not producing any further stream (but maybe some final value), is modelled by sinks, for example, writing data into disk or sending it via a network connection. Summing a list could also be seen as a sink, since it turns a list of values into a single one. Finally, we have stream transformers (originally known as conduits) that consume an input stream and produce an output stream.

One important feature is that each actor can take care of acquiring and releasing its resources in a safe and predictable way. For example, a “write to file” sink may open a handle when the stream of data starts and can safely close the handle when the input stream is finished. Furthermore, data is only requested when the next parts of the stream need them, which enables better performance.

The separation between sources, sinks, and conduits used to be part of the public interface of the conduit package, but it is no more. In its current incarnation, the library uses a single type ConduitT i o m r. The first argument i represents the type of values in the input stream, the second argument o represents that of the output stream. The third argument m should be a monad which defines which side effects may occur while processing the stream (in fact, ConduitT is a monad transformer). For example, a source that reads data from a file would have the m parameter equal to IO. Finally, the r argument defines the type of the final result. The trick to only need one type is to set an argument to Void when no output is generated, or to () if no input is required or no interesting result is produced.

A complete flow of data is established via the (.|) connect or fuse operator. Of course, you can only combine two ConduitTs if the output type of one matches the input type of the next one. Finally, to run the operations in a stream you call runConduit , or runConduitPure if no side effects are involved.

Let’s look at some examples involving the simplest kind of stream: a list. All the functions related to using lists in this way are found in the Data.Conduit.List module. The sourceList function produces a stream that gives each of the elements in the list in order. The fold function consumes the list and produces the result of folding a specific function over all the elements. Let’s look at an example in the interpreter.
*Chapter9.Stream> import Data.Conduit
*Chapter9.Stream Data.Conduit> import qualified Data.Conduit.List as L
*Chapter9.Stream Data.Conduit L> let c = L.sourceList [1 .. 5] .| L.fold (+) 0
*Chapter9.Stream Data.Conduit L> runConduitPure c
15
The Data.Conduit.List includes many other actors over streams that have a similar interface to list functions. A useful one is map, which applies a function to each element in the stream, producing a new stream with the result of each application. You also can use filter on a stream to eliminate those values that are not interesting. As an example, let’s compute the sum of the squares of all odd numbers from 1 to 20.
> :{
| runConduitPure $ L.sourceList [1 .. 20] .| L.filter odd
|                  .| L.map (x -> x*x) .| L.fold (+) 0
| :}
1330
As in the case of lists, unfolding is another way to generate streams, provided in this case by the unfold function. In the next example you’ll see how it is used to generate an infinite stream of natural numbers, from which it takes only the first ten via isolate. The example also showcases the use of consume, which converts a stream to a simple list.
> :{
| runConduitPure $ L.unfold (x -> Just (x,x+1)) 1
|                  .| L.isolate 10 .| L.consume
| :}
[1,2,3,4,5,6,7,8,9,10]
The operations that are similar to lists give you lots of possibilities, but it’s interesting to know how to create your own conduits. Inside a ConduitT context, you gain access to four functionalities that are used to build streams.
  • await tries to take the next element in the input stream. If it is successful, it’s returned wrapped in Just. If the stream doesn’t have any more elements, it returns Nothing.

  • Input streams can be manipulated also inside a ConduitM. leftover allows you to put back some value on the input stream. At this point, the documentation discourages you from putting back elements that haven’t been obtained from a call to await .

  • yield is the function used to send values to the output stream.

  • The return value of the ConduitT you’re coding is stated simply as using the return method of monads.

As an example, let’s create a simple conduit that takes a stream of Clients and returns every person, whether an individual or part of a company, which appears in that stream.
people :: Monad m => ConduitT (Client i) Person m ()
people = do client <- await
            case client of
              Nothing -> return ()
              Just c -> do case c of
                             Company { person = p }    -> yield p
                             Individual { person = p } -> yield p
                             _                         -> return ()
                           people
Notice that the code doesn’t have to return one element in the output stream per element in the input stream; government organizations are not yielded. In that way, you can implement filters. You can check that this can be used as a normal Conduit in the interpreter.
> :{
| runConduitPure $
|   L.sourceList [ GovOrg 1 "NASA", Individual 2 (Person "A" "S")]
|   .| people .| L.consume
| :}
[Person {firstName = "A", lastName = "S"}]
It’s important to notice that if you wrap some monad m in ConduitT, the result of streaming data will live inside such monad m. Say that you want to count the number of government organizations that are clients. You can build this with a simple counter, but for illustration purposes let’s do so using State. Since ConduitT is a monad transformer, you need to insert calls to lift before the State actions. Also, this conduit does not produce any output stream, so we set the second type argument to Void to indicate this fact.
import Control.Monad.State
countGovOrgs :: MonadState Int m => ConduitT (Client i) Void m Int
countGovOrgs = do client <- await
                  case client of
                    Nothing -> do n <- lift $ get
                                  return n
                    Just c  -> do case c of
                                    GovOrg { } -> lift $ modify (+1)
                                    _          -> return ()
                                  countGovOrgs
Once you connect countGovOrgs to a source and execute it using runConduit (not runConduitPure since we use the effects of a monad), what you get still needs to be executed on a monad supporting MonadState. In this case, you’re interested only in the state, so execState is the function you need to get the result.
main = let clients = [ GovOrg 1 "Zas"
                     , Individual 2 (Person "Alejandro" "Serrano")]
           conduitGovOrgs = L.sourceList clients .| countGovOrgs
        in print $ execState (runConduit conduitGovOrgs) 0
As an extra example of conduit in which you take advantage of the ability to use other monads underneath, let’s implement the “winner clients” intermediate step in this framework. In the same way that the code uses randomRIO, you can use print or any other IO action.
import Control.Monad.Trans
import System.Random
winners :: ConduitT (Client i) (Client i, Bool, Int) IO ()
winners = do client <- await
             case client of
               Nothing -> return ()
               Just c  -> do (w :: Bool) <- lift $ randomIO
                             (y :: Int)  <- lift $ randomRIO (0, 3000)
                             yield (c, w, y)
                             winners

Exercise 9-4 should help you get fluent with conduit idioms.

Exercise 9-4. Conduit Utilities

Port the list functions that were presented in Chapter 3 to work with streams. In particular, write the definitions of unfold, which should generate a stream of values based on a generator, map that applies a function to all elements of a stream, filter for dropping some of them, and fold that computes a fold of a binary operation over a whole stream.

Accessing Files via Conduit

The problem that pushed you to consider conduit was not about lists but about accessing files better, getting improved performance, and gaining much more predictability. It’s time to consider the functions that the conduit ecosystem provides for these tasks, which are available in the Data.Conduit.Binary module of the conduit-extra package.

The interface is simple: the functions sourceFile and sourceHandle generate a stream from a file, whereas sinkFile and sinkHandle consume a stream, writing it into a file. The difference between the two kinds of functions is that those ending in File take care of opening and closing the handle to the corresponding file, whereas the ones ending in Handle must be provided with an already open handle and do not close the file at the end. The first ones provide all the features of conduit, whereas the second set enables easy interoperation.

One small tidbit is that those streams do not provide String values but rather ByteString values. This latter type is a more efficient way to treat bytes of data. The next chapter includes a complete treatment of ByteString, which is defined in the bytestring package, along with ways to convert it from and to String. But for the simple example of “winner clients” where each line must be enlarged with some extra information, the only thing you need to know is that pack converts a String into a ByteString. Here’s the corresponding code:
import qualified Data.ByteString.Char8 as BS
import qualified Data.Conduit.Binary as B
import Data.Monoid
winnersFile :: (Monad m, MonadIO m)
            => ConduitT BS.ByteString BS.ByteString m ()
winnersFile = do
  client <- await
  case client of
    Nothing -> return ()
    Just c  -> do (w :: Bool) <- liftIO $ randomIO
                  (y :: Int ) <- liftIO $ randomRIO (0, 3000)
                  yield $ c <> BS.pack (" " ++ show w ++ " " ++ show y)
                  winnersFile

As you may notice, the winnersFile code does not refer directly to the IO monad but rather to a type class called MonadIO. This is the class of all monad stacks that support calling IO actions inside its body. The particular feature that the MonadIO type class adds is lifting computations via liftIO.

With the information you have, putting all the actors to work reading and writing the file should be as simple as the following code. Notice the lines function, which separates a file into parts delimited by newlines.
main = runConduit $
         B.sourceFile "clients.db" .| B.lines .| winnersFile
                                   .| B.sinkFile "clientsWinners.db"
But the compiler will refuse such code. At this point, you need the extra generality introduced earlier via the MonadIO constraint. The operations that create sources or sinks in the Data.Conduit.Binary module use an extra facility from the conduit package (or to be more precise, from the resource package), to handle the opening and closing of resources in a safe fashion. The only difference is that you need to use runConduitRes instead of runConduit:
main = runConduitRes $
         B.sourceFile "clients.db" .| B.lines .| winnersFile
                                   .| B.sinkFile "clientsWinners.db"

Now you are sure that your files will be opened and closed when required.

The Resourcet Package

In the previous example, you saw how conduit uses ResourceT to manage the allocation and release of resources. This monad transformer, which lives in the resourcet package, is a generalization of the bracket function in Control.Exception. In particular, you can use its allocate function to acquire a resource along with a release action that is ensured to be called when the control exists from ResourceT block.

If only that functionality is provided, you will gain nothing from using ResourceT instead of bracket . But the former also allows you to release resources explicitly by calling the release function along with an identifier that allocate returns.

This package is useful for implementing managers of scarce resources. If your intention is to use a pool of resources for sharing them, you should also look at the resource-pool package, which can be easily combined with resourcet.

Looking Further Than Text Files

At the beginning of the chapter I mentioned that IO allows you to access a large variety of resources and perform many kinds of side effects. However, the only focus until now has been interacting with the console and accessing text files in the disk. This section presents two examples of work inside conduit but that relate to networking and binary serialization of Haskell data.

Basic Networking

The conduit-extra package does not only provide a conduit-based interface to the file system. Its Data.Conduit.Network module provides ready-to-use sources and sinks for network programming. There is one simplification though; when using this interface an actor in the network is either a server, which listens for incoming connections, or a client, which connects to a server, but not both. Furthermore, the connection always runs through TCP. The network interface provided by lower-level packages allows a much wider range of behavior, but in practice these two modes are enough for most applications.

In both the server and the client, the module expects a value of type AppData -> IO r, where the result r differs in server and client, in addition to the connection parameters. That value of type AppData is used to retrieve the source and sink in which you can read and write the connection, respectively. There are also generalized versions which use AppData -> m r, where m supports IO operations, but I do not consider them here.

The network application to develop will be yet another way to look at the “winner clients.” In this case, the client will send its name, obtained from the console, and the server will return information about whether the client has won. The main Conduit in the server is like previous exercises. The only addition is a call to putStrLn to print the name of the user on the screen.
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Monad.Trans
import qualified Data.ByteString.Char8 as BS
import Data.Conduit
import Data.Monoid
import System.Random
isWinner :: ConduitT BS.ByteString BS.ByteString IO ()
isWinner = do client <- await
              case client of
                Nothing -> return ()
                Just c  -> do
                  lift $ BS.putStrLn c
                  (w :: Bool) <- liftIO $ randomIO
                  (y :: Int ) <- liftIO $ randomRIO (0, 3000)
                  yield $ c <> BS.pack (" " ++ show w ++ " " ++ show y)
                  isWinner
The next step is to create the conduit which will tie together the input flow of data to the server and the output to each client. For that matter, you can access the source and sink representing the connection via the appSource and appSink functions.
import Data.Conduit.Network
serverApp :: AppData -> IO ()
serverApp d = runConduit $ appSource d .| isWinner .| appSink d
The last step is to start the server in the entry point. runTCPServer is the one with that task and needs as parameters both the port in which it will keep listening and the kind of connections to accept. You can restrict connections through IPv4 or IPv6 or from a specific address. In this case, any client is welcome.
{-# LANGUAGE OverloadedStrings #-}
import Network.Socket
main :: IO ()
main = withSocketsDo $ runTCPServer (serverSettings 8900 "*") serverApp

Note

On Windows systems, you need to initialize the networking subsystem before doing any communication over that channel. You can achieve this by wrapping your main code with a call to withSocketsDo from the Network.Socket module in the network package, as done earlier. The function itself has no effect on other operating systems, so you should always include it to retain maximum compatibility between platforms.

The other side of the coin, the client, is much simpler. After the connection is created by the runTCPClient function, it must write the name given by the command line to the appSink, which will send that information to the server. Once some information is returned, it just prints it to the screen. In total, the code looks like this:
{-# LANGUAGE OverloadedStrings #-}
import Network.Socket
import System.Environment
main :: IO ()
main = withSocketsDo $ do
         (name:_) <- getArgs
         runTCPClient (clientSettings 8900 "127.0.0.1") (clientApp name)
clientApp :: String -> AppData -> IO ()
clientApp name d = do runConduit $ (yield $ BS.pack name) .| appSink d
                      runConduit $ appSource d .| (do Just w <- await
                                                      lift $ BS.putStrLn w)

As you can see, conduit allows you to treat both files in disk and network connections with the same abstractions. This makes it easy to reuse data transformation functions between different resources, as this example has done with isWinner.

Note

The network package provides an interface to networking using the more conventional approach of sockets. You can create the sockets using that package and still access the data using the sourceSocket and sinkSocket functions.

Binary Serialization

In the Haskell ecosystem, there are two main libraries to automatically serialize Haskell values into binary data. That is, it converts some value to a stream of bytes that can later be read to reconstruct such value. This data can be sent through a network, written to disk, or in general used as any other byte-encoded information.

These libraries are binary and cereal. Both provide almost the same interface. The only difference is that when using binary, you reference the Binary type class from the Data.Binary module; and when using cereal, you should use the Serialize type class in the Data.Serialize module. There are also corresponding packages to perform the serialization via streams: binary-conduit and cereal-conduit. The difference, which is important in terms of laziness and performance, is that cereal is strict, whereas binary is lazy. This implies, for example, that binary can cope with infinite streams of data (e.g., provided through a network connection), whereas cereal cannot. On the other hand, the lazy behavior of binary may give surprising results in some cases.

Since both are so similar, in this section the focus will be just on binary and binary-conduit. The first thing you must do to be able to serialize your own data types is instantiate the Binary type class. Prior to GHC 7.2.1, you had to write the code for the instance by hand, but since that version, the compiler can write the instance for you. Thus, I won’t delve into the details of Binary instances.

Let’s say you want to serialize values of the Person data type introduced in Chapters 2 and 3 using the automatic derivation of Binary. First, you need to enable the DeriveGeneric extension to GHC. Then, you can include Generic in the deriving clause of the data type. An instance of Generic contains information about the constructors and arguments that a specific data type declares. Using that information, binary can build a whole Binary instance, so you need only to declare it, but you need to do this without providing any implementation of the functions. For Person, the whole code reads as follows:
{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary)
import GHC.Generics (Generic)
data Person = Person { firstName :: String, lastName  :: String }
              deriving (Show, Read, Generic)
instance Binary Person
Another alternative is to enable a small GHC extension which allows us to indicate that we want to derive Binary directly in the data definition.
{-# LANGUAGE DeriveAnyClass #-}
data Person = Person { firstName :: String, lastName  :: String }
              deriving (Show, Read, Generic, Binary)
Once you have a Binary instance , you get access to the functions decode and encode, which convert from and to ByteStrings to the corresponding values, respectively. The binary-conduit package wraps those functions, allowing you to provide a stream of values from encode, or to consume a stream of values, serializing each of them in turn, from decode. The following example serializes a list of people to disk. The specific code that brings binary into the game is shown in bold.
import Data.Conduit
import qualified Data.Conduit.List as L
import qualified Data.Conduit.Binary as B
import qualified Data.Conduit.Serialization.Binary as S
main = runConduitRes $
         L.sourceList clients.| S.conduitEncode .| B.sinkFile "people.db"
   where clients = [Person "Alejandro" "Serrano", Person " Doctor" "Who?"]
Doing the converse is also simple. The only difference is using conduitDecode instead of conduitEncode. The following example gets a stream of Person elements and outputs them to the screen:
import Control.Monad.Trans
main = runConduitRes $
  B.sourceFile "people.db"
  .| S.conduitDecode
  .| L.mapM_ ((p :: Person) -> lift $ putStrLn $ show p)

It’s important that binary knows which kind of value it’s deserializing, either from using other functions on them or by explicitly writing a signature (like in the example). In the version of binary used when writing this book, if you change Person to String, the code still works OK but generates a stream of four strings instead of two Persons.

Comma-Separated Values

Another simple format to save or transmit data is comma-separated values, usually shortened to CSV. Turning Haskell values into this format is as simple as doing binary serialization as described above, thanks to the cassava package and the cassava-conduit bridge to encoding and decoding.

In fact, the only difference is that instead of Binary, you have to write instances for ToRecord and FromRecord. As in the previous case, all these instances can be automatically generated. Remember that you need to choose only one of the options below: either independent instance declarations or using the DeriveAnyClass extension and adding FromRecord and ToRecord to the deriving clause. If you add the code as it is, GHC complains about duplicate instances.
import Data.Csv (FromRecord, ToRecord)
-- option 1: using a separate instance
instance FromRecord Person
instance ToRecord Person
-- option 2: using DeriveAnyClass
data Person = Person { firstName :: String, lastName  :: String }
              deriving (Show, Read, Generic, FromRecord, ToRecord)
Encoding to CSV instead of a binary format translates to replacing the conduitEncode function from the latter with toCsv of the former. Decoding takes slightly more work, as shown in the following code:
import qualified Data.Csv as Csv
import qualified Data.Csv.Conduit as Csv
import System.IO.Error
main = runConduitRes $
  B.sourceFile "people.db"
  .| Csv.fromCsvLiftError (userError . show)
                          Csv.defaultDecodeOptions Csv.NoHeader
  .| L.mapM_ ((p :: Person) -> lift $ putStrLn $ show p)

When decoding from CSV, problems may arise. The number or format of the data in the file may not be as required to create a value of the corresponding type. As with other parts of the Haskell ecosystem, cassava-conduit forces you to decide what to do with those errors. The simplest strategy is to map the errors from CSV decoding into those understood by the monad in which the conduit runs. In the code above, that monad is IO, and its error type is IOException. We use the simplest mechanism: turn the error into a string using show and then throw the exception with that string, without further inspection.

In addition to this error mapping, the decoder also needs to know several options, including how values are separated, and whether the file contains a first row with headers or not. In the example above, we use the default, which means that values are separated by commas.

Better Serialization

None of the serialization mechanisms presented in this chapter is very efficient when you need to query and transform a lot of data because the full set of data must be brought into memory and decoded. A better solution is to use a database management system for storing the information. Chapter 12 is devoted to connecting to databases in Haskell and storing and querying the data saved in them. In Chapter 11, I consider another common interchange format: JSON.

Summary

This chapter covered how to interface with the “outer world” and explained how to deal with side effects.
  • The IO monad was presented as the one that gives context for side effects in computations. Furthermore, we looked at the way in which Haskell separates pure computations from those with side effects.

  • You learned how to read and write from the console and from files in the disk using the “classic I/O” approach.

  • Another source of side effects is randomness, which is provided by the random package.

  • The chapter covered the way in which errors are handled in pure code. In particular, I talked about the Either type.

  • Side-effects computations open the door to exceptions, an extensible but impure way to treat erroneous scenarios inside IO.

  • The “classic I/O” exhibit has several deficiencies in its interaction with laziness. The conduit library is a solution for those problems based on the stream abstraction.

  • Finally, you learned how to use conduit in several scenarios, such as reading and writing to files, communicating through the network, and serializing Haskell values to various formats.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.140.195.28