In Parts 1 and 2 you learned the basics of pure evaluation and how it helps in parallelizing and distributing code. However, sometimes you need to step out to the wild world of side effects. You’ll start by looking at simple input and output in Haskell. At the beginning, the information will be input in the console, and the output will be printed on the screen. Afterward, you’ll learn how to use a permanent means of storing data and reading and writing from files in disk.
Computations with side effects may turn out wrong in many ways; maybe the data is corrupted, or perhaps the network connection goes down. Haskell includes an exception mechanism that signals these conditions and allows you to react. In pure computations you’ve been handling erroneous scenarios using Maybe and MonadPlus. The relation between the different ways of handling errors in Haskell will be clear by the end of this chapter.
One problem with Haskell’s lazy evaluation model is that it brings some unpredictability to input and output. The Haskell community has come up with several streaming data libraries to solve this problem. This chapter discusses conduit, as well as some applications of this library to deal with file handling and networking. After learning all this information, you’ll be ready to save the client and product data on disk using a binary serialization format. For that matter, the binary library will also be presented.
Basic Input and Output
Buffering
There’s another possible value for hSetBuffering, namely, NoBuffering, that makes the system use no buffering at all. However, you should be wary. Problems can result from using NoBuffering in combination with console functions in Windows.
The do notation that has been used and the data types wrapped inside another type (in this case, IO) should give you a clue that you’re indeed working inside a monad. In the previous chapters, you saw that State represents computations that carry a state, Maybe computations that may fail, and so on. IO brings to a certain bit of code the ability to perform side effects. In the previous examples, it was used to print to and get information from the console.
There’s one important difference between IO and the rest of the monads that have been considered up to this point. For the other monads, there was some way to inspect the resulting value (e.g., using pattern matching with Maybe) or run the monad (with functions similar to runState). In that way, you could get back to a context without the corresponding monad.
By contrast, IO has no escape hatch. There’s no operation that converts an IO t value into a plain t value.1 This is how Haskell imposes a barrier between the computations that involve side effects and must work inside the IO monad, as well as the rest. You want to be sure that a function without IO in its signature is pure, free of side effects, and referentially transparent. Furthermore, every call to IO in a certain application must span from the initial main function, which has IO () type.
Note
For the Haskell compiler, an expression that is pure must be treated differently than an expression involving side effects and that lives in IO. However, from a programmer’s point of view, IO is no different from any another monad. It just happens to introduce side effects into computations.
All these output functions have IO () as the return type. Remember, () is the unit type and there’s only one value of that type, which is confusingly also named (). It’s customarily used in Haskell programming to identify those monadic computations that don’t have a value to return but that has effects on the context that you’re interested in. In addition to putStr and friends, other functions that use () in their return types are tell (from the Writer monad) and put (from the State monad).
The counterparts to the previous functions are the ones that receive information from the user. The most common one is getLine; its task is to gather all the input up to the moment in which the user presses Enter. You may be interested, however, in getting the input character by character, which you can do using getChar. Finally, in some cases you want to get all input up to an end-of-file marker. This is common when piping data between processes in a Unix-like shell. The getContents function provides this functionality.
Haskeline
If you’re planning to create a command-line application, the simple functions that Prelude includes won’t offer the best experience to the end user. In particular, you may want to provide command history or autocompletion.
One difference with standard Prelude is that input functions return their value wrapped on a Maybe, anticipating the case in which the input may stop earlier than expected.
It’s important to think for a moment how the execution of such code differs from what you may expect. In most programming languages, the call to lookup would have triggered the evaluation of listOfActions. Then, all the calls to getLine or putStrLn would have been executed since they appear in the body of listOfActions. However, in the Haskell code the side effects aren’t executed until you’ve unwrapped the action to be of type IO t for some t and asked for its execution, which happens in the line in bold with the “execute action” comment. One important implication is that IO values are first-class citizens of Haskell, like functions are, and can be combined, can be passed as arguments, and can be returned as any other value.
Randomness
Let’s take a break from input and output and consider the issue of randomness in Haskell. For that purpose, I present a simulation of a time machine breaking in the middle of a journey. When this breakdown happens, the time traveler is involved in a disturbing experience: ending in a random place, at a random point in time, with no clue of what is outside the machine.
Note
In the preceding code I’m using the monadic counterpart of unfolding, namely, unfoldrM. However, you won’t find it in the usual Control.Monad module. Instead, you need to import Control.Monad.Loop from the monad-loops package. This is an interesting module, which you should add to your list of tools for monadic programming.
Since you know how to read and write from the console and how to generate random values, you can now develop small console games. Exercise 9-1 asks you to do this.
Exercise 9-1. Win a Time Travel Trip
Develop a small game in which you can win a time travel trip in one of the machines in the store. The game should generate a random number between 3 and 17. Then, the user has five possibilities of guessing the number. If the user guesses correctly, the program will show a message congratulating them. In case of failure, the program will show a message encouraging the user to try again.
While developing the game, try to think about how to modularize and abstract the code; the range of numbers or the number of guesses can be turned into parameters.
The code cannot be rewritten because the two calls to randomRIO may return a different random value. Many other good properties of Haskell code break in the presence of side effects, and thus you cannot use tools such as equational reasoning. This is another reason, in addition to maintainability, for keeping pure and IO code apart.
This example holds a valuable lesson; in many cases, you can split your functions with side effects in several pure functions. There is also a driver function that operates in IO and takes care of threading the information between the others. In that way, your code will be easier to maintain.
Working with Files
The next step after knowing how to deal with side effects and how to communicate with the console is to read and write on a durable location. In other words, you will learn how to read and write files on the system. At some point, the files turn into objects that can perform certain operations, such as moving files from one location to another or deleting one file from disk. This section will delve into the functions that provide this functionality in the Haskell Platform.
Reading and Writing
However, working only with these operations has a severe performance impact: the information read from the file is kept entirely in memory, and the data to write to the file must be assembled into a string before writing it. In many cases, you will want further control. For example, you will want to read just a line or a file or write information to the disk as you go, instead of waiting for the entire process to finish. The module you should look at is System.IO .
Like in most programming languages, the flow of work with a file involves first opening a handle to it, then performing any operation that you need on the file, and finally closing the access to the file. The handle keeps track of all internal information that the system may need to work on the file.
The first step is handled by the openFile operation. The arguments for this function are the path to the file (the documentation shows that the type of this argument is FilePath, but it’s just a synonym for String) and the opening mode, which can be for reading, writing (or both), or appending. The result will be a file handle. In addition to opening your own files, you can use any of the predefined handles, such as stdin, stdout, or stderr, which map to standard input, output, and error, usually from the console.
The inverse operation, closing a file, is done via the hClose functions. As in any other programming language, it’s important that you close the file after you’ve finished working with it because an open handle consumes resources from the machine.
You may have noticed that instead of hard-coding the input and output file names, the code obtains them via the getArgs function . This function, found in System.Environment, returns a list of all the command-line parameters that have been given to the executable command.
Additionally, withFile will protect against possible errors while processing the file, ensuring that the file handle is always closed. In the next section, you will learn how to treat possible errors that may occur while working with files (e.g., data corruption, files that do not comply a certain schema, etc.).
Knowing how to read and write information from files, in addition to marshaling to and from strings with show and read, provides a way to save information about clients and products on disk. In Exercise 9-2 you are asked to classify clients in several files based on their category.
Exercise 9-2. Client Classification
Remember that clients in the time machine store can be individuals, companies, or government organizations. Right now, the store keeps the records for all clients in a single file. Each line contains a client, and it’s encoded by using the show function on them.
In this exercise, write a small executable that reads the information of those clients and generates another three files. Each of these files should contain all the clients of one of the three possible categories.
Anyway, in the next chapter you’ll see that using Strings is rarely the best option when dealing with files. Instead, you should use ByteString and Text values. In that case, the encoding problem comes first, and you always need to specify how to convert from sequences of bytes to text values.
Handling Files
Let’s move now to another range of operations you can do with files. Moving, copying, and deleting don’t involve the data stored in files but rather the files themselves. For these operations, the Haskell Platform includes the directory package, which is quite straightforward to use.
The functions renameFile, copyFile, and removeFile, from the System.Directory module, take care of moving, copying, and deleting files from the system, respectively. It’s important to note here that none of these operations supports working on directories, only on files.
getDirectoryContents returns a list of all the elements inside the folder.
createDirectory makes a new folder on the system. However, it may fail if the directory already exists or if some of the parent directories are not present. createDirectoryIfMissing takes care of those two conditions.
renameDirectory allows you to move a folder in the system. Notice that errors may happen if the path where you want to move already exists.
removeDirectory deletes a folder from the system. Usually, the directory cannot be removed if it’s not empty. removeDirectoryRecursive, on the other hand, deletes every element inside the folder and then the folder itself.
In addition to encoding, the other big issue when handling files is the format of the file paths, which changes depending on the underlying operating system. For example, Windows uses drive letters to prefix the paths and allows both and / to separate parts, whereas Unix and Mac OS X systems allow only / as a separator. Since the Haskell community considers interoperability between systems an important issue, a library has been included in the Haskell Platform that abstracts from these issues; its name is filepath.
In some cases, it may be interesting not to split only between the folder and the file name but rather get a list of all the path segments. In that case, you can use splitDirectories instead.
Finally, filepath includes functions for dealing with extensions. You can use (<.>) to add an extension to an existing file path. The other way around, you can use splitExtensions to generate a tuple of the file name and all the extensions attached to it. The package includes many other little utilities, such as replacing an extension, dropping just the last extension, and so on. It’s useful to look at the filepath documentation when you need to handle file paths in your application.
Note
Again, never roll your own ways to combine extensions, add extensions, or do any other task involving file paths. Instead, use the filepath package to ensure that your code is correct and interoperable.
Error Handling
When dealing with input and output or many other kinds of side effects such as printing or communicating through a network, many kinds of errors can occur. In all the previous examples, the program would just crash when trying to open a file that doesn’t exist on the system. It’s important to know how to detect and recover from those error conditions.
But before proceeding with handling errors in IO contexts, I’m going to discuss how errors are handled in pure code, a topic you’ve already heard about previously in this book. In that way, you’ll notice the differences between pure errors and exceptions. The latter is the mechanism for signaling anomalous conditions in IO.
Pure Errors
Until now, when an operation could not be performed, the most common way to cope with it was to return a Maybe value. This happened, for example, when a function was not applicable to some of the constructors of the value, such as head to empty lists, [], or getting the company name of an Individual client. In that way, the calling function would get Nothing as a result if any problem happened.
For example, if x :: Either Int String, x can contain either an integer value, in which case the Left constructor would have been used, or a string value, which is wrapped on the Right constructor.
Essentially, you use hush to forget about any concrete error in an Either value and just return Nothing if the computation fails. In the other direction, you need to tell which error value to return in case the Maybe value turns out to be Nothing. The name of the function is a reminder of its usage; you need to “add a note” to the possible error value.
Since deciding whether to use Maybe or Either is difficult but also may have ramifications throughout your application, so you may think about abstracting over the way errors are handled. Haskell type classes are the tool you need here.
In Chapter 7 you learned how to use MonadPlus to return values that declared an erroneous condition and its mplus operation to combine several of those values and returned the ones that were not errors. If you use MonadPlus, you can use Maybe or lists, signaling errors with Nothing and empty lists, respectively. Unfortunately, Either cannot be made an instance of MonadPlus. The problem is that the mempty operation in that type class must not have any parameter. Thus, you cannot specify which value to wrap in the Left constructor if an error should be returned.
Note
There is a historical reason for the breakage of the naming convention in the case of Either, MonadError, and ExceptT . Older versions of transformers contained an ErrorT monad transformer, along with the MonadError class. This type has been deprecated, because it imposed unnecessary constraints in the type of errors, and has been replaced by ExceptT.
Safe Functions
Because of the historical development of the Haskell libraries, some of the functions in the Prelude module don’t have a pure-friendly way to cope with errors. An archetypical example is head. In the case of applying it to [], this function raises an exception (which can be caught only inside the IO monad, as you will see in the next section) instead of returning some representation of the error.
To alleviate this problem, the safe package provides a lot of versions of common functions that fail in a more pleasant way. For example, head has a version called headMay, which returns the value wrapped in a Maybe, and thus allows you to return Nothing for empty lists; and headDef, which takes an extra argument with a default value to return in the case of an empty list.
Catching Exceptions
Pure errors can be thrown and caught in any place, usually by simply pattern matching on the final value of the computation. In contrast, exceptions can be handled only inside an IO context (but still be thrown from any place).
When using Either, you need to specify in advance every possible error that may happen in the execution of some code. On the other hand, Haskell’s exception mechanism is extensible. This decision allows new side effects to fail in new ways but hurts the analysis of the code because you cannot tell in advance which exception may be thrown.
The entry point of any work with exceptions is the Control.Exception module. The code examples in the rest of the section will assume that this module is included in the imports list. In many cases you need to specify exception types inside function bodies and let declarations; the ScopedTypeVariables GHC extension will be assumed to be enabled in all the samples.
The first thing to notice is the use of the catch function. The idea is simple; you declare the main code to run and then a handler for a specific class of exceptions. The second thing to notice is that the code explicitly mentions the type of exceptions to be handled using that code.
In this particular case, the type you’re interested in is IOException, which describes those exceptions that have something to do with input and output. A value of type IOException encodes extra information about the kind of problem that occurred. You can query it via a set of functions in the System.IO.Error module. In the example, our interest is nonexistent files and checks via the isDoesNotExistError function.
The fact that the Haskell exception mechanism is dynamic and extensible makes the type specification an important part of handling erroneous scenarios. The predefined set of exceptions that is raised by functions in the Haskell Platform is also included in the Control.Exception module. This set includes, among others, ArithException, which signals that a numerical error such as underflow or division by zero has occurred; ErrorCall, which allows handling calls to error; and PatternMatchFail, which is thrown when no pattern matches a specific value.
These exception types are different from IOException in one sense. Whereas an IOException value needs to be queried through special-purpose functions about the kind of problem that happened, these types are defined as simple ADTs, and thus you can use pattern matching to discover the source of problems.
In some cases, you want to treat an exception in a similar way to an error in a pure computation. This may lead to code that is easier to read if other sources of errors are pure. Think of a scenario when you’re validating some values from a database. In that case, exceptions can be raised in the database connection code, but validation will use Maybe or Either. The way to bridge both worlds is via the try function, which returns an Either value that may contain a result in its Right or a thrown exception in its Left.
Note
As you can see, the exception mechanism in Haskell is much more powerful than those in other languages. In addition to defining handlers by type, functions such as catchJust allow you to perform a dynamic check on whether to catch a particular exception. Using catchJust and similar functions ensures that you handle only the exceptions you know how to deal with, and the rest are properly rethrown to subsequent handlers.
In Exercise 9-3 you can apply your new knowledge about exceptions by taking Exercise 9-2 and improving on it.
Exercise 9-3. Better Client Classification
Add exception handling to the code you wrote for Exercise 9-2.
Throwing Exceptions
The Typeable Type Class
Haskell’s exception mechanism makes heavy use of the Typeable type class. This class allows you to get information about the type of a value at runtime (because usually Haskell erases all type information after compiling in order to increase performance). Typeable can also be used, in the Data.Dynamic module, to create an interface for dynamic values, which you can cast at runtime to other types.
Like on other occasions, having the exception functionality wrapped in a type class makes it easier to write code so that it’s reusable among several monad stacks and users of your library. In most cases, the MonadError type class introduced before should be enough to cover stacks with exceptions. If you want a real generalization of the Control.Exception functions and interface, I recommend the MonadCatch type class from the exceptions package.
Caution
Throwing exceptions in pure code is not recommended at all. Instead, you should strive to use other kinds of error propagation mechanisms, such as Maybe or Either. Inside IO code, exceptions may result code that is more concise and clearer (such as having a last-chance exception handler that logs all critical errors) but still shouldn’t be overused.
Streaming Data with Conduit
The input/output framework that has been shown in the previous sections is usually known as classic I/O or lazy I/O. It has been included in Haskell since the first versions. However, the way it works does not interact well with the laziness inherent to the Haskell language, giving rise to the so-called streaming data problem. For that reason, several stream libraries have been developed, which solve the problems related to this interaction in an elegant and efficient way.
Problems with Lazy Input/Output
But if you exchange the order of the printing and closing operations, everything works fine. That a simple change in order makes the difference pinpoints a problem in the interaction between input/output operations and the time at which each element in the program is evaluated. Because of the lazy nature of Haskell, the s value won’t be evaluated until it’s needed, something that happens in the call to print. But at that point, the handle has already been closed, so hGetContents is not able to bring any kind of information from the file. In this case, the behavior of your program doesn’t match your expectations.
One possible solution is to force the evaluation of s using seq or deepseq. While this is a working solution, it has two problems. The first one is efficiency: forcing the entire string brings it into memory, consuming scarce system resources. If you want to be clever and force only the string you need, you run into problems of maintenance and composability. For example, which function should be responsible for bringing into memory a determinate value? How can you know up front which of the values will be needed in the entire application?
An even worse solution is keeping the file handle open through the life of the application. But this would have the obvious problem of acquiring resources without releasing them. Files may be cheap resources, but when speaking about network or database connections, this becomes an impossible way to go.
Laziness and IO may bring even more surprises. Suppose that during the execution of hGetContents in the previous code, an exception is thrown. This exception won’t be seen by the block of code that generated the string but rather where the value is being used. That means in the middle of some pure computation that uses s, an exception may get in the way, and there’s no way to handle it without resorting to IO. Furthermore, your code loses its predictability because you can’t always be sure whether using some value would entail running some IO computation.
As you can see, using the simple model of handles that System.IO exposes brings unpredictability to when resources may be released and when exceptions could be thrown. The possible solutions such as forcing evaluation or keeping handles open are definitely inefficient. This is called the streaming data problem.
The Haskell community has come up with some solutions to this problem in the form of stream libraries. These libraries usually provide an abstraction of data that comes from a resource in the form of a stream. Furthermore, the way in which the resource that generates the data is acquired and released is made much more predictable. In many cases, stream libraries also introduce a boost in performance because they can ensure that only the necessary data for performing an operation is brought into memory.
Several libraries in Hackage are built around this idea. An initial approach, developed by Oleg Kiselyov, is shown in the iteratee and enumerator packages. Afterward, other libraries provided a more convenient and easy-to-use interface, including pipes, io-streams, and conduit. All those three libraries are used in the wild.
I will now focus on the conduit library because it’s been used in other libraries that will be presented later, such as Persistent for database connections. In addition to the core library, there are many other libraries connecting conduit to different sources of data. In any case, the notions that appear in the conduit library are similar to those in any other libraries.
Introducing Conduits
conduit is based on streams of data that are produced, modified, and consumed by different actors. For example, if the “winner clients” example was modeled using conduit, there would be an actor providing a stream of strings from a file, another actor modifying that stream to add the information about whether each client has won or not, and finally another actor converting that stream into a new file in the system.
There are three kinds of actors which take part in processing data. Sources provide streams of values to be consumed. Examples of sources are obtaining text data from a file in disk, reading from a network connection, and obtaining each of the elements in a list. The converse behavior, consuming a stream of values and not producing any further stream (but maybe some final value), is modelled by sinks, for example, writing data into disk or sending it via a network connection. Summing a list could also be seen as a sink, since it turns a list of values into a single one. Finally, we have stream transformers (originally known as conduits) that consume an input stream and produce an output stream.
One important feature is that each actor can take care of acquiring and releasing its resources in a safe and predictable way. For example, a “write to file” sink may open a handle when the stream of data starts and can safely close the handle when the input stream is finished. Furthermore, data is only requested when the next parts of the stream need them, which enables better performance.
The separation between sources, sinks, and conduits used to be part of the public interface of the conduit package, but it is no more. In its current incarnation, the library uses a single type ConduitT i o m r. The first argument i represents the type of values in the input stream, the second argument o represents that of the output stream. The third argument m should be a monad which defines which side effects may occur while processing the stream (in fact, ConduitT is a monad transformer). For example, a source that reads data from a file would have the m parameter equal to IO. Finally, the r argument defines the type of the final result. The trick to only need one type is to set an argument to Void when no output is generated, or to () if no input is required or no interesting result is produced.
A complete flow of data is established via the (.|) connect or fuse operator. Of course, you can only combine two ConduitTs if the output type of one matches the input type of the next one. Finally, to run the operations in a stream you call runConduit , or runConduitPure if no side effects are involved.
await tries to take the next element in the input stream. If it is successful, it’s returned wrapped in Just. If the stream doesn’t have any more elements, it returns Nothing.
Input streams can be manipulated also inside a ConduitM. leftover allows you to put back some value on the input stream. At this point, the documentation discourages you from putting back elements that haven’t been obtained from a call to await .
yield is the function used to send values to the output stream.
The return value of the ConduitT you’re coding is stated simply as using the return method of monads.
Exercise 9-4 should help you get fluent with conduit idioms.
Exercise 9-4. Conduit Utilities
Port the list functions that were presented in Chapter 3 to work with streams. In particular, write the definitions of unfold, which should generate a stream of values based on a generator, map that applies a function to all elements of a stream, filter for dropping some of them, and fold that computes a fold of a binary operation over a whole stream.
Accessing Files via Conduit
The problem that pushed you to consider conduit was not about lists but about accessing files better, getting improved performance, and gaining much more predictability. It’s time to consider the functions that the conduit ecosystem provides for these tasks, which are available in the Data.Conduit.Binary module of the conduit-extra package.
The interface is simple: the functions sourceFile and sourceHandle generate a stream from a file, whereas sinkFile and sinkHandle consume a stream, writing it into a file. The difference between the two kinds of functions is that those ending in File take care of opening and closing the handle to the corresponding file, whereas the ones ending in Handle must be provided with an already open handle and do not close the file at the end. The first ones provide all the features of conduit, whereas the second set enables easy interoperation.
As you may notice, the winnersFile code does not refer directly to the IO monad but rather to a type class called MonadIO. This is the class of all monad stacks that support calling IO actions inside its body. The particular feature that the MonadIO type class adds is lifting computations via liftIO.
Now you are sure that your files will be opened and closed when required.
The Resourcet Package
In the previous example, you saw how conduit uses ResourceT to manage the allocation and release of resources. This monad transformer, which lives in the resourcet package, is a generalization of the bracket function in Control.Exception. In particular, you can use its allocate function to acquire a resource along with a release action that is ensured to be called when the control exists from ResourceT block.
If only that functionality is provided, you will gain nothing from using ResourceT instead of bracket . But the former also allows you to release resources explicitly by calling the release function along with an identifier that allocate returns.
This package is useful for implementing managers of scarce resources. If your intention is to use a pool of resources for sharing them, you should also look at the resource-pool package, which can be easily combined with resourcet.
Looking Further Than Text Files
At the beginning of the chapter I mentioned that IO allows you to access a large variety of resources and perform many kinds of side effects. However, the only focus until now has been interacting with the console and accessing text files in the disk. This section presents two examples of work inside conduit but that relate to networking and binary serialization of Haskell data.
Basic Networking
The conduit-extra package does not only provide a conduit-based interface to the file system. Its Data.Conduit.Network module provides ready-to-use sources and sinks for network programming. There is one simplification though; when using this interface an actor in the network is either a server, which listens for incoming connections, or a client, which connects to a server, but not both. Furthermore, the connection always runs through TCP. The network interface provided by lower-level packages allows a much wider range of behavior, but in practice these two modes are enough for most applications.
In both the server and the client, the module expects a value of type AppData -> IO r, where the result r differs in server and client, in addition to the connection parameters. That value of type AppData is used to retrieve the source and sink in which you can read and write the connection, respectively. There are also generalized versions which use AppData -> m r, where m supports IO operations, but I do not consider them here.
Note
On Windows systems, you need to initialize the networking subsystem before doing any communication over that channel. You can achieve this by wrapping your main code with a call to withSocketsDo from the Network.Socket module in the network package, as done earlier. The function itself has no effect on other operating systems, so you should always include it to retain maximum compatibility between platforms.
As you can see, conduit allows you to treat both files in disk and network connections with the same abstractions. This makes it easy to reuse data transformation functions between different resources, as this example has done with isWinner.
Note
The network package provides an interface to networking using the more conventional approach of sockets. You can create the sockets using that package and still access the data using the sourceSocket and sinkSocket functions.
Binary Serialization
In the Haskell ecosystem, there are two main libraries to automatically serialize Haskell values into binary data. That is, it converts some value to a stream of bytes that can later be read to reconstruct such value. This data can be sent through a network, written to disk, or in general used as any other byte-encoded information.
These libraries are binary and cereal. Both provide almost the same interface. The only difference is that when using binary, you reference the Binary type class from the Data.Binary module; and when using cereal, you should use the Serialize type class in the Data.Serialize module. There are also corresponding packages to perform the serialization via streams: binary-conduit and cereal-conduit. The difference, which is important in terms of laziness and performance, is that cereal is strict, whereas binary is lazy. This implies, for example, that binary can cope with infinite streams of data (e.g., provided through a network connection), whereas cereal cannot. On the other hand, the lazy behavior of binary may give surprising results in some cases.
Since both are so similar, in this section the focus will be just on binary and binary-conduit. The first thing you must do to be able to serialize your own data types is instantiate the Binary type class. Prior to GHC 7.2.1, you had to write the code for the instance by hand, but since that version, the compiler can write the instance for you. Thus, I won’t delve into the details of Binary instances.
It’s important that binary knows which kind of value it’s deserializing, either from using other functions on them or by explicitly writing a signature (like in the example). In the version of binary used when writing this book, if you change Person to String, the code still works OK but generates a stream of four strings instead of two Persons.
Comma-Separated Values
Another simple format to save or transmit data is comma-separated values, usually shortened to CSV. Turning Haskell values into this format is as simple as doing binary serialization as described above, thanks to the cassava package and the cassava-conduit bridge to encoding and decoding.
When decoding from CSV, problems may arise. The number or format of the data in the file may not be as required to create a value of the corresponding type. As with other parts of the Haskell ecosystem, cassava-conduit forces you to decide what to do with those errors. The simplest strategy is to map the errors from CSV decoding into those understood by the monad in which the conduit runs. In the code above, that monad is IO, and its error type is IOException. We use the simplest mechanism: turn the error into a string using show and then throw the exception with that string, without further inspection.
In addition to this error mapping, the decoder also needs to know several options, including how values are separated, and whether the file contains a first row with headers or not. In the example above, we use the default, which means that values are separated by commas.
Better Serialization
None of the serialization mechanisms presented in this chapter is very efficient when you need to query and transform a lot of data because the full set of data must be brought into memory and decoded. A better solution is to use a database management system for storing the information. Chapter 12 is devoted to connecting to databases in Haskell and storing and querying the data saved in them. In Chapter 11, I consider another common interchange format: JSON.
Summary
The IO monad was presented as the one that gives context for side effects in computations. Furthermore, we looked at the way in which Haskell separates pure computations from those with side effects.
You learned how to read and write from the console and from files in the disk using the “classic I/O” approach.
Another source of side effects is randomness, which is provided by the random package.
The chapter covered the way in which errors are handled in pure code. In particular, I talked about the Either type.
Side-effects computations open the door to exceptions, an extensible but impure way to treat erroneous scenarios inside IO.
The “classic I/O” exhibit has several deficiencies in its interaction with laziness. The conduit library is a solution for those problems based on the stream abstraction.
Finally, you learned how to use conduit in several scenarios, such as reading and writing to files, communicating through the network, and serializing Haskell values to various formats.