© Alejandro Serrano Mena 2019
Alejandro Serrano Mena, Practical Haskell, https://doi.org/10.1007/978-1-4842-4480-7_10

10. Building and Parsing Text

Alejandro Serrano Mena1 
(1)
Utrecht, The Netherlands
 

The previous chapter opened the door to side effects, including, among others, saving data from your Haskell application to disk or sending it through the network. However, the only true serialization mechanism introduced was the binary one, through the binary package.

In many cases, a textual representation of data is preferable to a binary one. Even though it is usually much larger, data encoded as text is more readable for a human consumer, making your programs much easier to inspect and debug. Furthermore, many interchange formats, such as XML or JSON, encode the data as text, adding tags to the plain information to structure it.

Haskell has great features for working with text, which are the focus of this chapter. As you will see, the built-in String data type is not the best option in many cases and should be replaced by much more efficient alternatives such as ByteString and Text. You shouldn’t use plain concatenation to build output text either; your programs should use builders.

The converse operation, decoding some information from a textual representation, is called parsing. Haskell offers an approach to parsing that uses combinators, which is exemplified by the attoparsec library. Interestingly, parsing introduces new type classes, the most important one being Applicative.

The Five Textual Data Types

Treating text correctly is a difficult task in almost any programming language. Behind the simple concept of textual data there are always subtle issues related to encoding and representation in memory. This section discusses how to deal with text in the most correct and efficient way in your Haskell code.

Until now, you’ve worked with text values represented within the String data type, which is just a synonym for [Char], a list of characters. The main benefit of String is the simple model that allows any function working on lists to work also on text. Since lists are an integral part of Haskell, this is an important feature. Another good quality of String is its support for Unicode. You may not have tried it, but any of the code you’ve written with String would work perfectly with Chinese, Greek, or Cyrillic characters.

The downside is that this ease of use comes at the expense of performance. Even though GHC performs extensive optimizations on Strings, there’s some overhead both in time and in memory compared to the raw representation as used in other languages. When you are dealing with large quantities of text (e.g., generating output for a web application), you need the best performance possible.

At this point, two functionalities collide, and you need to balance them.
  • You want to see the string as a bare stream of bytes. If you don’t confer any extra meaning to the raw bytes, you can substantially increase the performance because operating systems usually provide specialized operations for moving bulk blocks of bytes quickly both on memory and to disk.

  • Those bytes have an intended meaning. To recover the meaning, you need to consider the encoding the data uses. The same bytes may mean different things when interpreted as a set of Latin-only characters or when using UTF-8 to decode them. Thus, if you care about the actual characters encoded in the string, you have to be careful when performing operations on them. The aim should be to have a data type that is as performant as possible while still maintaining correctness.
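To make the second point concrete, here is a minimal sketch, using the Text and ByteString types presented in the following paragraphs, showing how the same bytes mean different things under two encodings:

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- The same two bytes, 0xC3 0xA1, are two separate characters under
-- Latin-1 but a single character under UTF-8.
bytes :: BS.ByteString
bytes = BS.pack [0xC3, 0xA1]

asLatin1 :: T.Text
asLatin1 = TE.decodeLatin1 bytes   -- two characters

asUtf8 :: T.Text
asUtf8 = TE.decodeUtf8 bytes       -- one character, 'á'
```

This is exactly the kind of mismatch that motivates keeping encoding explicit.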

In the case of Haskell, the Haskell Platform splits the String data type into two other types. When you need a fast implementation but look at the text only as raw bytes, you should choose the ByteString type from the bytestring package. The Text type, bundled in the text package, focuses on describing Unicode strings while keeping good performance.

Note

String is good for experimentation and for the prototyping phases of development because Haskell makes String easy to use. However, when writing production code, you should use ByteString or Text. Furthermore, making this choice will force you to consider other important issues, such as encoding.

In addition to this distinction in focus, both libraries provide strict and lazy versions of their corresponding data types. Remember the subtle balance between strictness and laziness. Using a strict ByteString or Text means that you might end up evaluating chunks of text that are larger than needed. On the other hand, laziness may bring back some of the problems that were discussed in the previous chapter.

Table 10-1 shows the two dimensions and the module you should import to get each set of features. Notice that for lazy Unicode data you can use both lazy Text and built-in [Char].
Table 10-1

Haskell Platform Modules for Textual Data

                        Strict            Lazy
  Raw stream of bytes   Data.ByteString   Data.ByteString.Lazy
  Unicode data          Data.Text         Data.Text.Lazy or [Char]

As important as deciding which type to use is knowing how to convert between those different representations. For example, you may need to insert some String obtained by calling getLine inside a Text value. Another common scenario is getting some input from a file or from a network as a stream of bytes, which is described as a ByteString value, and decoding it to a Text value. Figure 10-1 summarizes all possible paths between the types.
Figure 10-1

Conversions between textual data types

There are two points to be made about Figure 10-1. The first one is that conversion between strict and lazy types can be done via toStrict and fromStrict, but also via toChunks and fromChunks. All these functions are available in the modules corresponding to lazy versions, that is, Data.Text.Lazy or Data.ByteString.Lazy. The difference is that the functions ending in Strict consume or produce strict versions, whereas the Chunks functions consume or produce lists of strict values. Here are two of these signatures:
toStrict :: Data.ByteString.Lazy.ByteString -> Data.ByteString.ByteString
toChunks :: Data.ByteString.Lazy.ByteString -> [Data.ByteString.ByteString]

This small distinction allows you to retain some degree of laziness even with strict versions because the elements in the list will be evaluated only when needed.
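As a small sketch of this difference, the following code builds a lazy ByteString from a list of strict chunks and converts it back; the byte values are arbitrary:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- fromChunks glues strict pieces into a lazy value without copying them
-- into a single buffer; toStrict and toChunks perform the conversions back.
lazyVal :: BL.ByteString
lazyVal = BL.fromChunks [BS.pack [104, 105], BS.pack [33]]

strictVal :: BS.ByteString
strictVal = BL.toStrict lazyVal

backToChunks :: [BS.ByteString]
backToChunks = BL.toChunks lazyVal
```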

There’s no single conversion between Text and ByteString but rather a family of them, found in the Data.Text.Encoding and Data.Text.Lazy.Encoding modules. Each of the functions includes in its name the encoding that should be used to give meaning to the raw stream of bytes. For example, encodeUtf8 converts some text to the byte representation using UTF-8; decodeLatin1 does the converse operation but with Latin-1 encoded text.
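As a quick sketch, encoding and then decoding with matching functions recovers the original value, and non-ASCII characters may occupy more than one byte:

```haskell
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString as BS

-- Decoding with the same encoding used to encode gives back the original.
roundTrip :: T.Text -> T.Text
roundTrip = TE.decodeUtf8 . TE.encodeUtf8

-- The number of bytes may exceed the number of characters.
byteLen :: T.Text -> Int
byteLen = BS.length . TE.encodeUtf8
```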

Note

There are direct conversions from String to ByteString defined in the modules Data.ByteString.Char8 and Data.ByteString.Lazy.Char8. However, they treat each character as an 8-bit value, dropping any extra information (many characters need more than one byte to be encoded). Thus, using the functions in those modules may result in a loss of information, so they should be used with care.

In addition to these conversions, the external interfaces of ByteString and Text are almost identical. For that reason, it's common to import the corresponding modules in qualified form. The functions are also rather similar to the list-based ones you would use for String. The main difference is that Text and ByteString do not expose their constructors, so you need to substitute cons for (:) when adding one character and head and tail for pattern matching. On the other hand, the usual functions such as map, reverse, foldr, or foldl keep their names, as this example shows:
*Chapter10.Text> import qualified Data.Text as T
*Chapter10.Text T> :t T.pack "Hello Text type"
T.pack "Hello Text type" :: T.Text
*Chapter10.Text T> import Data.Char
*Chapter10.Text T Data.Char> T.map toUpper (T.pack "Hello Text type")
"HELLO TEXT TYPE"
*...> T.intercalate (T.pack ",") (map T.pack ["apples","oranges"])
"apples,oranges"
*Chapter10.Text T Data.Char> T.length (T.pack "A Text example")
14
Both Text and ByteString are instances of the Monoid type class. This is quite useful because the (<>) operator is equivalent to concatenation, a quite common operation on these types.
*...> (T.pack "Hello ") <> (T.pack "Beginning Haskell")
"Hello Beginning Haskell"
There’s still one drawback to using Text or ByteString instead of String: the code becomes full of calls to T.pack to convert from literals. Because of the popularity of these packages, the GHC developers have introduced an extension that allows you to write constant values directly as string literals, making the compiler responsible for inserting the corresponding calls to pack (in many cases the call to pack is further optimized and the literal is compiled directly to its Text or ByteString representation). This extension is called OverloadedStrings. The following example enables such an extension in GHCi and uses it to construct directly a Text value:
*Chapter10.Text T Data.Char Data.Monoid> :set -XOverloadedStrings
*Chapter10.Text T Data.Char Data.Monoid> T.map toUpper "Hello Text type"
"HELLO TEXT TYPE"

Be aware that in some cases you may need to introduce an explicit type signature in order to fix the type of text value you’re creating. But in most cases, Haskell type inference will be able to make the decision.
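For instance, an annotation like the following minimal sketch pins the literal to Text when inference alone cannot decide:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- A signature on the binding fixes the type of the literal...
greeting :: T.Text
greeting = "Hello Text type"

-- ...and an inline annotation does the same for a literal used directly.
greetingLength :: Int
greetingLength = T.length ("Hello Text type" :: T.Text)
```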

In the previous chapter, I introduced the functions in Data.Conduit.Binary that work with files on disk, as well as those in Data.Conduit.Network, which do the same over a network connection. In that case, the type of the input and output streams was ByteString, which matches the treatment of raw bytes discussed here. If you want to work with Text data, you need to convert it via an encoding, using the conduits in Data.Conduit.Text. The functions in that module take a codec, like utf8, as a parameter. This is the final version of the “winner clients” application using conduit:
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans
import Data.Conduit
import qualified Data.Conduit.Binary as B
import qualified Data.Conduit.List as L
import qualified Data.Conduit.Text as T
import Data.Text
main :: IO ()
main = runConduitRes $
  B.sourceFile "clients.db" .| T.decode T.utf8 .|
  T.lines .| winnersFile .| L.concatMap (\x -> [x, "\n"]) .|
  T.encode T.utf8 .| B.sinkFile "clientsWinners.db"
winnersFile :: (Monad m, MonadIO m) => ConduitT Text Text m ()
winnersFile = ...  -- same as previous chapter, but using Text

Note

In addition to using conduit, you can also access files directly as Text or ByteString using the “classical I/O” approach. Those functions live in the Data.Text.IO module and in Data.ByteString itself, and they have the same names as their String counterparts.
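As a hedged sketch of this classical style (the file names below are just placeholders), reading and writing work exactly as they do for String:

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Read a whole file as strict Text and write an upper-cased copy,
-- mirroring the String-based readFile and writeFile.
shoutFile :: FilePath -> FilePath -> IO ()
shoutFile inPath outPath = do
  contents <- TIO.readFile inPath
  TIO.writeFile outPath (T.toUpper contents)
```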

Building as Fast as the Wind

Until now, the examples have shown how to deal with chunks of text data in several ways, but no complete example has been shown that stores the list of clients on disk. But this shouldn’t be complicated; one way to do it is just to generate a big Text value and then save it using conduit.

Since the purpose of the data is to be read back by other applications, you should impose some structure to the representation. In particular, the encoding that I’ve chosen for Client is the following:
  • Each client is written on its own line. So, the file contains a list of clients delimited by newline characters. This allows faster reading via lines afterward.

  • Each client’s data starts with client, and the fields composing the information will be written between parentheses and as a list separated by commas.

  • The first field in each line specifies the kind of client. gov, com, or ind will be used in each case.

  • Since newline, comma, and parenthesis characters are used for special purposes, they will be escaped as \n, \,, \(, and \) inside each representation.

  • The rules for the embedded Person data type are the same, but with the information prepended by person instead of client.

The implementation using Text is straightforward.
{-# LANGUAGE OverloadedStrings #-}
import Data.Conduit
import qualified Data.Conduit.Binary as B
import qualified Data.Conduit.List as L
import qualified Data.Conduit.Text as T
import Data.Monoid
import Data.Text
saveClients :: FilePath -> [Client Int] -> IO ()
saveClients fpath clients = runConduitRes $
  L.sourceList clients .| L.map clientToText
    .| L.concatMap (\x -> [x, "\n"])  -- write '\n' between clients
    .| T.encode T.utf8 .| B.sinkFile fpath
clientToText :: Client Int -> Text
clientToText (GovOrg  i n)     =
  "client(gov," <> escapeString (show i) <> "," <> escapeString n <> ")"
clientToText (Company i n p d) =
  "client(com," <> escapeString (show i) <> "," <> escapeString n <> ","
    <> personToText p <> "," <> escapeString d <> ")"
clientToText (Individual i p)  =
  "client(ind," <> escapeString (show i) <> "," <> personToText p <> ")"
personToText :: Person -> Text
personToText (Person f l) = "person(" <> escapeString f <> "," <> escapeString l <> ")"
escapeString :: String -> Text
escapeString = replace "\n" "\\n" . replace ","  "\\," .
               replace "("  "\\(" . replace ")"  "\\)" . pack

However, while its simplicity cannot be denied, this code is highly inefficient. Every time two elements are concatenated, a new Text value has to be created, and this comes with some overhead to allocate memory, to copy data, and also to keep track of the value and release it when it’s no longer needed. Furthermore, the conversion of integer values to Text entails an intermediate conversion to String via the show function.

Both the text and bytestring packages provide a Builder data type that can be used to efficiently generate large text values. The trick is that a Builder is not itself a value but instead encodes how to build the large text value from its constituent parts. Then, you execute the Builder, which finally generates the string you were looking for. In this last step, many optimizations can be applied. For example, the Builder knows how long the final result will be, so it can allocate all the memory it needs just once.

Builders are built from three basic blocks. You can build one out of a single character using the singleton function or build it from a larger string value using one of fromString, fromText, or fromLazyText. The following code, however, takes advantage of the OverloadedStrings extension, which allows you to create a Builder from a literal string. Finally, Builders are concatenated via the Monoid (<>), like Text values are. The following code is a complete replacement of the previous one, where Builders are used instead of Text:
import qualified Data.Text.Lazy.Builder as B
import qualified Data.Text.Lazy.Builder.Int as B
clientToText :: Client Int -> B.Builder
clientToText (GovOrg i n) =
  "client(gov," <> B.decimal i <> B.singleton ','
                <> B.fromText (escapeString n) <> B.singleton ')'
clientToText (Company i n p d) =
  "client(com," <> B.decimal i <> B.singleton ','
                <> B.fromText (escapeString n) <> B.singleton ','
                <> personToText p <> B.singleton ','
                <> B.fromText (escapeString d) <> B.singleton ')'
clientToText (Individual i p) =
  "client(ind," <> B.decimal i <> B.singleton ','
                <> personToText p <> B.singleton ')'
personToText :: Person -> B.Builder
personToText (Person f l) =
  "person(" <> B.fromText (escapeString f) <> B.singleton ','
            <> B.fromText (escapeString l) <> B.singleton ')'

You may have noticed that the Data.Text.Lazy.Builder.Int module has also been imported into the code. This module gives you access to the decimal combinator, a fast Builder for converting integer values into text. If you prefer base 16, you may use hexadecimal instead.
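A small sketch of both combinators rendering the same number, relying on OverloadedStrings to turn the literal pieces into Builders:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B
import qualified Data.Text.Lazy.Builder.Int as B

-- The same value rendered in base 10 and base 16.
rendered :: TL.Text
rendered = B.toLazyText
             (B.decimal (255 :: Int) <> " is 0x" <> B.hexadecimal (255 :: Int))
```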

To get the final Text value, you just need to call toLazyText on the Builder. As the name suggests, you won’t get a strict Text value but rather a lazy one. Remember that in any case you have the toStrict function to convert that value to a strict one when needed. Let’s see an example of building a client description in GHCi.
> let co = (Company 1 "Black Hole" (Person "John" "Smith") "Traveller")
> B.toLazyText $ clientToText co
"client(com,1,Black Hole,person(John,Smith),Traveller)"
Since conduit uses strict ByteStrings instead of lazy ones, the new version of saveClients using Builders must call toStrict at some point. The implementation shows that fact.
import qualified Data.Text.Lazy as LT
saveClients fpath clients = runConduitRes $
  L.sourceList clients .| L.map clientToText
    .| L.map (LT.toStrict . B.toLazyText)
    .| L.concatMap (\x -> [x, "\n"])  -- write '\n' between clients
    .| T.encode T.utf8 .| B.sinkFile fpath

The bytestring library provides Builders for both strict and lazy ByteStrings, in the Data.ByteString.Builder and Data.ByteString.Lazy.Builder modules, respectively. Note that the purpose of a ByteString Builder is not to produce text content but rather a stream of raw bytes. Thus, you can use it to create your own binary format via new combinators. For example, the library provides functions such as int16LE and word64BE, which directly emit a value of a specific number of bits (8, 16, 32, or 64) in big-endian or little-endian format as raw data.
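As a sketch of such a handcrafted binary format (the record layout here is invented purely for illustration):

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL

-- A made-up record: a 16-bit tag followed by a 32-bit payload,
-- both little-endian, for a total of 6 bytes.
record :: BB.Builder
record = BB.word16LE 1 <> BB.word32LE 42

encoded :: BL.ByteString
encoded = BB.toLazyByteString record
```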

Note

Remember, use Builders when you want to generate a large Text value or a big stream of bytes. They provide much more efficiency than bare concatenation of strings, without any loss in expressiveness.

You can use the same techniques shown here to write information about products and purchases to disk. Exercise 10-1 guides you through the process of doing so.

Exercise 10-1. Building Products and Purchases

Write Builders for the following data types that encode products and purchases in the Time Machine Store:
data Product  = Product  { id :: Int, name :: String, price :: Double
                         , description :: String }
data Purchase = Purchase { client :: Client Int, products :: [Product] }

You can use any representation you want. However, make sure to escape characters correctly so that later other applications can read the file.

Parsing with attoparsec

While generating output efficiently is useful, it’s often necessary to read back those values in your application. You can use a textual format to interchange information between applications, such as a web front end, an accounting program, an inventory application, and so on.

The most efficient way to deal with the problem of recognizing patterns is using a parser. Often, you create a description of the grammar (in other words, of the structure that strings will follow) in a file separate from your source code. Then, a parser generator converts the description into executable code written in a specific programming language. The best-known examples of these parser generators are bison or yacc for C, ANTLR and JavaCC for Java, and Happy for Haskell.

The focus in this section won’t be on these parser generators but rather on the attoparsec package that you can find in Hackage. When using attoparsec, you describe your grammar using a set of combinators within your Haskell code. The benefit of this approach is that you can take advantage of the modularization and reuse features of the Haskell programming language when writing your parser. For example, you may declare a function that takes as a parameter a parser and generates another one that is the same one given as an argument but run twice.

Note

In the following examples, the code will use the Data.Attoparsec.Text module, which creates parsers that work on Text values. If you're interested in parsing ByteStrings instead, you should use the Data.Attoparsec module.

The simplest example of a parser is one that needs to match an exact string to succeed. The corresponding combinator is called, quite naturally, string and takes as a parameter the string to match. In the interpreter let's try to run a parser that succeeds only on hello against some different strings.
*Chapter10.Parsing> :set -XOverloadedStrings
*Chapter10.Parsing> import Data.Attoparsec.Text
*Chapter10.Parsing Data.Attoparsec.Text> parse (string "hello") "hello"
Done "" "hello"
*Chapter10.Parsing ...> parse (string "hello") "hello world"
Done " world" "hello"
*Chapter10.Parsing ...> parse (string "hello") "bye bye"
Fail "bye bye" [] "string"
*Chapter10.Parsing ...> parse (string "hello") "he"
Partial _
*Chapter10.Parsing ...> feed (parse (string "hello") "he") "llo"
Done "" "hello"
As you can see, the function that ultimately executes a parser is called parse. It takes as a first argument the description of the format to decode and as a second argument the string that should be matched. However, the results from parsing may be a bit surprising.
  • Execution of the parser consumes only part of the string; the remainder is left for further work. The Done constructor for results wraps both the part of the input that wasn't consumed and the string that successfully matched. In the example, parsing "hello" leaves no remaining input, whereas doing so on "hello world" leaves " world" still available.

  • In some cases, the execution of the parser just fails; this is described by the Fail constructor of the result.

  • A third possibility to be returned is Partial. This highlights one important feature of attoparsec: it parses strings incrementally. In this example, when you provide "he", attoparsec does not have enough data to finish running the parser, so it gives you a callback you can use to provide more input. This is done via feed. In this example, if you provide "llo", you get a successful parse.

attoparsec includes some built-in parsers for different kinds of numbers, such as decimal, hexadecimal, or signed for integer values; rational for exact fractions; and double for floating-point values. Let’s try to use one of them.
*Chapter10.Parsing Data.Attoparsec.Text> parse decimal "78 cents"
Done " cents" 78

As you can see, the returned value is not a string but rather the actual integer value that could be parsed from the start of the string. This is another important characteristic of attoparsec: the parsers you create build Haskell values as they go. The type of the result is encoded in the type variable of the Parser type, which represents the basic attoparsec blocks. For example, the type of decimal is Integral a => Parser a.

The string and numeric parsers are basic blocks you can use to build larger ones. For that, you need combinators that combine parsers in certain ways. The first one is (<|>), which represents the disjunction of two parsers. If the first one doesn’t succeed on some input, the second one is tried. Let’s say you want to match either hello or bye. The following GHCi output shows how to do that. Notice that the (<|>) combinator lives in the Control.Applicative module (you’ll soon learn why) and that the code is using a variant of parse named parseOnly, which returns the success or failure of the parser using Either.
*> import Control.Applicative
*> parseOnly (string "hello" <|> string "bye") "hello"
Right "hello"
*> parseOnly (string "hello" <|> string "bye") "bye!"
Right "bye"
*> parseOnly (string "hello" <|> string "bye") "aloha"  -- should fail
Left "string"

Note

The result types of the two parsers combined with (<|>) must coincide.

The next step is to combine several parsers in sequence. There's one small wrinkle: since each parser returns a result, you need to tell attoparsec how to combine those results. This is done via the (<$>) and (<*>) combinators. The first one is used just after the function that combines the results, and the second one threads each of the arguments, in case the function has more than one. To make things clear, let's define a parser that accepts hello or bye followed by a number and builds a value of type GreetingYear.
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative
import Data.Text
import Data.Attoparsec.Text
data GreetingYear = GreetingYear Text Int
greetingYearParser :: Parser GreetingYear
greetingYearParser = GreetingYear <$> (string "hello" <|> string "bye")
                                  <*> decimal
The code without any parsing involved could be written using the ($) application function instead of (<$>) and simply a space instead of (<*>). As you can see, the interface of parser combinators resembles regular Haskell code, which makes it easy to use. With (<$>) and (<*>) you lift your normal functions to work on values wrapped in Parser. Let’s look at the types of those two operations.
(<$>) :: (a -> b)        -> Parser a -> Parser b
(<*>) :: Parser (a -> b) -> Parser a -> Parser b
The first one is essentially your old friend fmap, which makes a pure function work on elements of a functor. Parser is an instance of Functor. In the previous example, you have the following:
GreetingYear                       :: Text -> (Int -> GreetingYear)
(string "hello" <|> string "bye")  :: Parser Text
GreetingYear <$> (string "hello" <|> string "bye")
                                   :: Parser (Int -> GreetingYear)
The (<*>) function is the one helping you supply more arguments to the function. Or, from another point of view, (<*>) unwraps the function inside a Parser and applies one argument.
GreetingYear <$> (string "hello" <|> string "bye")
                                   :: Parser (Int -> GreetingYear)
decimal                            :: Parser Int
GreetingYear <$> (string "hello" <|> string "bye") <*> decimal
                                   :: Parser GreetingYear

Thanks to curried functions, you can iterate (<*>) to apply each of the arguments of a multiparameter function.

Sometimes working with (<$>) and (<*>) alone is not enough for creating a maintainable Parser. For example, say you want to recognize the same pattern as shown earlier but with a space between the greeting and the number. You have almost all the building blocks to do so, but how do you recognize a single character? This is achieved via the char function. Now, you could write code similar to the following:
greetingYearParserS :: Parser GreetingYear
greetingYearParserS = (\g _ y -> GreetingYear g y)
                      <$> (string "hello" <|> string "bye")
                      <*> char ' ' <*> decimal
Notice how the better-looking application of GreetingYear has been replaced by a lambda abstraction whose only purpose is to drop some of its arguments. If you were to change this parser (e.g., by requiring an extra ! character after the greeting), you would need to assemble a new abstraction and take care to drop the new unnecessary data. Clearly, this is not maintainable. The Control.Applicative module offers a version of (<*>), namely (<*), which describes the case where some input should be parsed but won't be used to build any larger structure. Using it, the previous code becomes the following:
greetingYearParserS :: Parser GreetingYear
greetingYearParserS = GreetingYear
                      <$> (string "hello" <|> string "bye")
                      <*  char ' ' <*> decimal
It's time to start building the parser for the Client output generated in the previous section. The first thing you need is some way to parse names, taking into account the escaping rules outlined earlier (remember, the characters \n, ,, (, and ) were replaced by a backslash followed by the character itself). Let's create a parser that returns a single character. The satisfy combinator, which matches any character fulfilling a particular predicate, and the notInClass function, which returns True only when an element is not in some set, will come in handy. Also, when \, and similar combinations are found, you want to return the corresponding single character, for which the const function is used.
aChar :: Parser Char
aChar =     (const ',')  <$> (string "\\,")
        <|> (const '\n') <$> (string "\\n")
        <|> (const '(')  <$> (string "\\(")
        <|> (const ')')  <$> (string "\\)")
        <|> satisfy (notInClass ",\n()")
The idea is to call this parser repeatedly until some nonmatching input is found. One way to do this would be to create a new Parser that calls itself recursively. In each step it prepends a character to the String value to be returned later. But you also need a base case, which will be applied when a nonmatching character is found. The way to create a Parser that returns some value, without consuming any input, is via pure :: a -> Parser a. With all these ingredients, the code reads as follows:
aString :: Parser String
aString = ((:) <$> aChar <*> aString) <|> (pure "")
This pattern is common in parsers (if you know some grammar theory, the operation is called Kleene star, and it’s one of the basics in that theory), so Control.Applicative offers a many function that just iterates a parser until no matching input is found. The result is a list of the type of inner parsers, in this case [Char].
aString = many aChar
Once you know how to parse the escaped strings and integer numbers, you can create parsers for Person and Client. The following code is straightforward; the only new function introduced is (<$), which runs the parser on its right but discards its result, returning the constant value on its left instead. Its purpose is similar to that of (<*).
aPerson :: Parser Person
aPerson = Person <$ string "person(" <*> aString
                 <* char ',' <*> aString <* char ')'
aClient :: Parser (Client Int)
aClient =     GovOrg     <$ string "client(gov," <*> decimal
                         <* char ',' <*> aString <* char ')'
          <|> Company    <$ string "client(com," <*> decimal
                         <* char ',' <*> aString <* char ','
                         <*> aPerson <* char ',' <*> aString <* char ')'
          <|> Individual <$ string "client(ind," <*> decimal
                         <* char ',' <*> aPerson <* char ')'
You can check that the parser works on the builder output using the interpreter.
*> let co = Company 1 "Black Hole Inc." (Person "John" "Smith") "Traveller"
*> let b = clientToText co
*> let c = Data.Text.Lazy.toStrict $ Data.Text.Lazy.Builder.toLazyText b
*> parseOnly aClient c
Right (Company {clientId = 1, clientName = "Black Hole Inc.",
                person = Person {firstName = "John", lastName = "Smith"},
                duty = "Traveller"})

To finish this brief introduction to attoparsec, I'll introduce the Data.Attoparsec.Combinator module. In that module you can find parser combinators that match many other typical patterns. For example, you have option, which helps you build parsers with optional input. A call to option has two parameters. The second one is the parser that is tried on the input; if the match is unsuccessful, the call to option returns the value given as the first parameter.
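Here is a hedged sketch using option: a version number whose patch component defaults to 0 when absent (the format is invented for the example):

```haskell
import Data.Attoparsec.Text
import qualified Data.Text as T

-- "1.2.3" parses as (1,2,3); "1.2" parses as (1,2,0), because option
-- returns its first argument when the inner parser does not match.
version :: Parser (Int, Int, Int)
version = (,,) <$> decimal <* char '.'
               <*> decimal
               <*> option 0 (char '.' *> decimal)

parsed :: Either String (Int, Int, Int)
parsed = parseOnly version (T.pack "1.2")
```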

Many other combinators help in matching lists of elements in many different ways. For example, sepBy parses a list with some separator between them. This is the case of the list of clients; a Parser for it would call aClient repeatedly, but it would expect newline characters between them.
parseClients :: Parser [Client Int]
parseClients = sepBy aClient (char '\n')

That module also features versions of these combinators whose names end in 1, such as many1 and sepBy1. These require the list to contain at least one element for the parse to succeed.
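For instance, a comma-separated list of numbers that must be nonempty could be sketched as:

```haskell
import Data.Attoparsec.Text
import qualified Data.Text as T

-- sepBy1 fails on empty input, whereas sepBy would return an empty list.
numbers :: Parser [Int]
numbers = sepBy1 decimal (char ',')

parsedNumbers :: Either String [Int]
parsedNumbers = parseOnly numbers (T.pack "1,2,3")
```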

Exercise 10-2 asks you to parse the products and purchases in Exercise 10-1. You can use this task to master the attoparsec interface and the many combinators it provides.

Exercise 10-2. Parsing Products and Purchases

Create parsers for the output that could be generated by the functions you wrote for Exercise 10-1. You might want to read the attoparsec documentation in Hackage beforehand to have a better idea of the built-in parsers that it provides.

In the same way that a Builder could work with conduit, a Parser can be used to produce a stream of values from the textual input. In this case, it may act either as a sink, returning the values themselves, or as an intermediate stage, providing the information for further processing. This functionality is provided in the Data.Conduit.Attoparsec module of the conduit-extra package.

Using this package, building a function that loads all the clients in a file generated by the Builder in the previous section becomes a simple composition of three actors: first you read the file as a ByteString, then you decode it into Text values, and finally you parse that information.
loadClients :: FilePath -> IO [Client Int]
loadClients fPath = runConduitRes $
  B.sourceFile fPath .| T.decode T.utf8 .| sinkParser parseClients

Introducing New Type Classes

The fact that (<$>) and (<*>) do not live in some attoparsec-specific module but rather in their own Control.Applicative offers a hint that those combinators are useful in other scenarios. It would be interesting to study how the lifting works with other types. For example, suppose they also worked on Maybe values. Then the types would be as follows:
(<$>) :: (a -> b)       -> Maybe a -> Maybe b
(<*>) :: Maybe (a -> b) -> Maybe a -> Maybe b
In combination, these functions provide a way to lift a function over possibly empty values, returning Nothing when any of the arguments is missing and returning Just when every argument is available. Let's confirm this by running some examples in the interpreter.
*Chapter10.TypeClasses> import Control.Applicative
*Chapter10.TypeClasses Control.Applicative> (+) <$> Just 2 <*> Just 3
Just 5
*Chapter10.TypeClasses Control.Applicative> (+) <$> Just 2 <*> Nothing
Nothing
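The same pattern appears whenever a regular function must be applied to values that may be absent. Here is a minimal, self-contained sketch using readMaybe from base (the function addStrings is invented for this example):

```haskell
import Text.Read (readMaybe)

-- Lift (+) over two parses that may fail: if either string is not
-- a number, the whole computation is Nothing.
addStrings :: String -> String -> Maybe Int
addStrings x y = (+) <$> readMaybe x <*> readMaybe y
```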

If both Parser and Maybe can be used with the same functions, it indicates you're in the presence of a new type class, namely, Applicative. In this section, you'll get a look at Applicative and several other related type classes. I already spoke about how (<$>) is quite like fmap, which opens the door to discussing the relation to the Functor type class, and you just saw that Maybe, a Monad, is an instance of Applicative too, so maybe there's also some relation with Monad.

Applicative

I will now introduce the basic Applicative type class. The (<*) and (*>) functions are also part of the type class, but they have default definitions and will be treated later. The basic functions in the type class are defined as follows:
class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
You already have some intuition about (<*>); it allows you to take a function that is inside a context of type f and apply one argument to it. This is the main strength of the Applicative type class: being able to compute when everything you have is inside the same context. In the following examples, both functions and arguments are wrapped inside Maybe, and (<*>) allows you to perform the application.
*Chapter10.TypeClasses ...> Just (+) <*> Just 2 <*> Just 3
Just 5
*Chapter10.TypeClasses ...> Just (+) <*> Just 2 <*> Nothing
Nothing
*Chapter10.TypeClasses ...> Nothing <*> Just 2 <*> Just 3
Nothing

As you can see in the third example, both the function to be applied and the arguments can be Just or Nothing. Intuitively, when you have no function to apply, you cannot return any result, as shown earlier.

The second function in the type class is pure. Looking at its type, a -> f a, you can guess what the function does; it wraps some pure value inside the context f. For example, when using Maybe, pure is equivalent to Just. Thus, you may write the first two examples in the previous code as follows:
*Chapter10.TypeClasses ...> (pure (+) <*> pure 2 <*> pure 3) :: Maybe Int
Just 5
*Chapter10.TypeClasses ...> pure (+) <*> Just 2 <*> Nothing
Nothing
The combination of wrapping some pure function and then applying it to several arguments is common when using Applicative. For that reason, the (<$>) operator was created, with this simple definition.
(<$>) :: (a -> b) -> f a -> f b
f <$> x = pure f <*> x

Applicative Laws

As is the case with Monoid, Functor, Monad, and many other type classes, Applicative not only mandates some functions but also has some laws that must be fulfilled. In this case, the laws are as follows:
pure id <*> x = x, or id <$> x = x
(.) <$> x <*> y <*> z = x <*> (y <*> z)
f <*> pure x = pure ($ x) <*> f
f <$> pure x = pure f <*> pure x = pure (f x)

The first three rules make Applicatives work nicely with regular functions. The identity function is still the identity inside f, and function composition and application are also reflected inside Applicative. The last law says that pure computations give the same result whether they are performed outside the context and lifted as a whole or lifted in parts and combined inside it.
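You can spot-check these laws at the Maybe type with sample values; each of the following definitions (names invented for this check, which illustrates but does not prove the laws) evaluates to True.

```haskell
-- Sample-value checks of the identity, homomorphism, and interchange
-- laws at the Maybe type
lawIdentity, lawHomomorphism, lawInterchange :: Bool
lawIdentity     = (pure id <*> Just 3) == Just 3
lawHomomorphism = (pure succ <*> pure 3 :: Maybe Int) == pure 4
lawInterchange  = (Just succ <*> pure 3) == (pure ($ 3) <*> Just succ)
```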

Up to now, you’ve seen that Parser and Maybe are instances of Applicative. Your old friend [] is also part of that group. As in Chapter 7, lists can be used to model nondeterminism, and this is exactly what its Applicative instance does. For example, if you have a set of functions that may be applied and a set of data that could be used as arguments, (<*>) returns a list with all pairwise applications.
*> import Data.Char
*> [("Hello, " ++), \x -> "HEY, " ++ map toUpper x] <*> ["Alex", "John", "Paul"]
["Hello, Alex","Hello, John","Hello, Paul","HEY, ALEX","HEY, JOHN","HEY, PAUL"]
The (<$>) function is useful for describing computations where a fixed function may be given several different arguments. For example, let's create several greetings for several people.
*> (++) <$> ["Hello, ", "Good bye, "] <*> ["Alex", "John", "Paul"]
["Hello, Alex","Hello, John","Hello, Paul"
,"Good bye, Alex","Good bye, John","Good bye, Paul"]

Functors, Applicatives, and Monads

There are some interesting things going on with the use of (<$>) in the preceding section. The signature of (<$>) is equal to that of fmap of a Functor.
(<$>) :: Applicative f => (a -> b) -> f a -> f b
fmap  :: Functor     f => (a -> b) -> f a -> f b
Also, (<*>) looks close to the (=<<) function (which is just (>>=) but with arguments reversed) in a Monad.
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
(=<<) :: Monad       f => (a -> f b) -> f a -> f b

And indeed, you’ve seen that Maybe and [], which are both monads, are instances of Applicative. In this section I’ll shed some light on the relation of these important type classes.

First, the Applicative type class mandates that every Applicative also be a Functor. Furthermore, the laws of the first type class imply that the definition of fmap must be exactly equal to (<$>). However, as discussed in the context of parsers, Applicative is stronger than Functor. This is because with Functor you can only lift functions of one parameter into the context. If you want to apply a function f with two parameters instead, lifting it with fmap and supplying the first argument gives you the following types:
fmap     :: (a -> b -> c) -> f a -> f (b -> c)
fmap f   :: f a -> f (b -> c)
fmap f x :: f (b -> c)

You cannot do anything more with only Functor because it’s not able to get the function within the f context. But if f were Applicative, you could use (<*>) to continue supplying parameters. The extension from Functor to Applicative gives you the power of using functions with any number of arguments.
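For example, with a three-argument constructor (the Date type here is invented for illustration), fmap alone would leave you stuck at Maybe (Int -> Int -> Date), while each (<*>) keeps supplying one more argument inside the context:

```haskell
data Date = Date Int Int Int deriving (Show, Eq)

-- Build a Date out of three possibly missing fields: fmap (via <$>)
-- supplies the first argument; each (<*>) supplies one more.
mkDate :: Maybe Int -> Maybe Int -> Maybe Int -> Maybe Date
mkDate y m d = Date <$> y <*> m <*> d
```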

The Monad and Applicative type classes are also related in a direct way. In short, every Monad is also an Applicative, which means that every monad that has been presented until now can be used also with the Applicative interface. You can verify this by remembering the function ap that was introduced in Chapter 7 as a way to generalize liftM to any number of parameters.
ap :: Monad m => m (a -> b) -> m a -> m b
This is exactly the signature of (<*>) in Applicative, and it works the same. In Chapter 7, I also discussed how liftMn f x1 ... xn could be changed into the more extensible version shown here:
return f `ap` x1 `ap` ... `ap` xn

This follows the same pattern that you use in the definition of (<$>) in terms of pure and (<*>). In particular, it shows that if a type is an instance of Monad, you can define the pure function in its Applicative instance by making it equal to return.

Of course, since Applicative and Monad are not equal and Applicative is higher in the hierarchy of classes, there must be something you can do with Monads that cannot be done using only Applicatives. The answer to that question comes by looking at the different signatures of (<*>) and (=<<) that were introduced earlier in the section. There you can spot the difference; the function provided as the first argument to (<*>) must be of type f (a -> b), whereas in (=<<) this type is a -> f b.

The implication here is that with monads the value computed in one step can change the function to be executed in the next step (because the first argument returns f b), whereas in Applicatives an argument to a function may affect the current step to be executed but not the remaining ones. To make this clear, consider the following function, which, depending on whether its parameter is 1, results in either adding one to or doubling a number:
\x -> if x == 1 then Just (+ 1) else Just (* 2)
If you apply one parameter using (<$>), you get the following type:
(\x -> if x == 1 then pure (+ 1) else pure (* 2)) <$> Just 1
  :: (Num a) => Maybe (Maybe (a -> a))

And at this point the Applicative type class is of no help; there’s no way you can unwrap those two layers of Maybe using either (<$>) or (<*>). However, in Chapter 7, a join function on monads was introduced, with exactly the signature that you need: Monad m => m (m a) -> m a. So, if you’re using a monad, you can “fuse” the two layers into one and continue execution.
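A short, self-contained sketch of that fusing (the names chooseOp and apply are invented for this example):

```haskell
import Control.Monad (join)

-- Choose the next operation depending on a runtime value
chooseOp :: Int -> Maybe (Int -> Int)
chooseOp x = if x == 1 then Just (+ 1) else Just (* 2)

-- (<$>) leaves two layers of Maybe; join fuses them so that (<*>)
-- can continue applying arguments
apply :: Maybe Int -> Maybe Int -> Maybe Int
apply mx my = join (chooseOp <$> mx) <*> my
```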

This doesn’t necessarily mean that a monadic interface is always better than an Applicative one. The fact that when using monads you can influence the flow of execution via the values that are computed at runtime makes it much harder to analyze what’s going on inside. But in a parsing library you do want to inspect the structure of the computation, because in doing so you can apply many optimizations that make the parsing process much more efficient. For that reason, the Applicative interface was chosen for attoparsec.

Writing More Readable Monadic Code

In this chapter and Chapter 7, you’ve seen many different ways to express the same concepts in Functor, Applicative, and Monad.
fmap f x = f <$> x = do y <- x; return (f y) = liftM f x
f <$> x <*> y = return f `ap` x `ap` y
pure x = return x
These equalities can help you write more readable code, especially when using monads. For example, consider the following code:
do name <- getLine
   let upperName = map toUpper name
   putStrLn upperName
In previous chapters I advocated for removing the intermediate let step with fmap, but the code can be written even closer to regular function application with (<$>).
do upperName <- map toUpper <$> getLine
   putStrLn upperName
Or it can even be much shorter using the Monad functions.
putStrLn =<< (map toUpper <$> getLine)

Finding the best way to write the code, balancing both readability and conciseness, is always important.

Alternative

I have already discussed the (<$>) and (<*>) functions, but while using Parsers you saw another important function: (<|>). It is part of the Alternative type class, which is reproduced here:
class Applicative f => Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a
  some  :: f a -> f [a]
  some v = (:) <$> v <*> many v
  many  :: f a -> f [a]
  many v = some v <|> pure []
As you can see, the Alternative type class resembles MonadPlus a lot. It gives you the possibility of failure via empty and of choice via (<|>), much like mzero and mplus did. For example, you can use it to return the first of several Maybe possibilities that is Just.
*Chapter10.TypeClasses> import Control.Applicative
*Chapter10.TypeClasses Control.Applicative> Nothing <|> Just 2 <|> Just 3
Just 2
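This idiom is handy outside the interpreter too, for example, to fall back through several optional sources of a configuration value (the function pickPort is invented for this sketch):

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- Take the port from the command line if given, otherwise from the
-- environment, otherwise a hard-coded default
pickPort :: Maybe Int -> Maybe Int -> Int
pickPort cmdLine env = fromMaybe 8080 (cmdLine <|> env)
```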

Note

Indeed, anything that is MonadPlus is both Monad and Alternative.

The some and many functions are intended to be used in parser-like environments. If you have a computation v that holds or returns elements of type t, you can use many and some to run it repeatedly and get a list of elements of type t instead. The difference is that with many you may get an empty list, whereas some enforces that at least one element is returned. These functions are rarely used outside of parsers because for many other instances their recursive definitions do not terminate.

Traversable

Now that many new type classes have been introduced, it is a good time to look at the last of the “structural” type classes, of which containers and contexts are examples. Its name is Traversable, and here’s the definition:
class (Functor t, Foldable t) => Traversable t where
    traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
    traverse f = sequenceA . fmap f
    sequenceA :: Applicative f => t (f a) -> f (t a)
    sequenceA = traverse id
    mapM :: Monad m => (a -> m b) -> t a -> m (t b)
    mapM f = unwrapMonad . traverse (WrapMonad . f)
    sequence :: Monad m => t (m a) -> m (t a)
    sequence = mapM id

It looks scary, so let’s first make some simplifications. If you look closely, the Traversable type class defines two sets of functions, one for Applicative (containing traverse and sequenceA) and another one for Monad (with mapM and sequence), whose signatures differ only in the type class constraints. Thus, you need to learn about only one of the two sets.

One example of Traversable is []. The types of sequence and sequenceA for that case are as follows:
sequenceA :: Applicative f => [f a] -> f [a]
sequence  :: Monad       m => [m a] -> m [a]
The functions are telling you how to go from a list of computations of type f a to a single computation with a list in its core. Say you want to make f or m equal to Maybe, which is both Applicative and Monad. Then the specific type in that case is as follows:
sequenceA, sequence :: [Maybe a] -> Maybe [a]
If you look at Maybe as having some effect, what both sequence and sequenceA are doing is bringing all those effects “out of the list.” Since the effect Maybe entails is the possibility of failure, you would expect that running any of those functions will succeed only when every element in the list is a Just; otherwise, it needs to bring the failure “out of the list,” which can be done only by returning Nothing altogether. You can check that this is indeed the case in the interpreter.
*Chapter10.TypeClasses> import Data.Traversable
*Chapter10.TypeClasses Data.Traversable> sequenceA [Just 2, Just 3, Just 4]
Just [2,3,4]
*Chapter10.TypeClasses Data.Traversable> sequenceA [Just 2, Nothing, Just 4]
Nothing

A more specific version of sequence in Control.Monad (only for lists) was already used in Chapter 7 for this same purpose: executing all the actions in a list of monadic values and returning the results of those actions.

The sequence and sequenceA functions are often described also as “commuting two functors.” If you look at their type, you can see that you start with t (f a) and end up with f (t a). The order of t and f has been reversed; this is what commuting means. Furthermore, since t is Traversable, it ought to be a Functor, and since f is Applicative, it needs to be a Functor as well. This means there are some kinds of structures (such as lists and Maybes) that are able to go “inside” every possible effect in a normal way.

Let’s look at the other set of functions: traverse and mapM. In this case, it might be helpful to recall the usage of the mapM function in Control.Monad. The usual way to look at these functions is as “fmap with extra effects.” The first argument is a function that returns some value and along the way has some extra computational effects. This function is applied to every element contained in the Traversable value, and the effects are all joined and taken outside the returned structure.

Once again, let’s look at the special case of the Traversable t being [] and the Applicative f being Maybe. The specific signature for traverse and mapM becomes the following:
traverse, mapM :: (a -> Maybe b) -> [a] -> Maybe [b]

In this case the first argument is applied to every element of the list. Again, since you need to return a Maybe value at the end, the only thing you can do is return Just if every application of that first argument returned Just, and Nothing otherwise.
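A compact, base-only sketch of this behavior, using readMaybe as the effectful function (the name parseAll is invented for the example):

```haskell
import Text.Read (readMaybe)

-- "fmap with effects": parse every element, failing as a whole
-- as soon as any single element fails to parse
parseAll :: [String] -> Maybe [Int]
parseAll = traverse readMaybe
```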

One important characteristic of all these functions is that they respect the structure of the Traversable being worked on. For example, if traverse is applied to a list with five elements, the resulting list will also have five elements, plus any extra effect. This is important because it means that the functions may change only the values contained but not the pattern they follow. This brings it closer to fmap and spans a variety of applications. For example, in Chapter 6 I spoke about the traversed lens that allowed you to go inside a container and retrieve or update the value of each of its elements. Now you can see the reason behind the name and why Traversable is exactly the type class you need.

The documentation of Traversable also specifies that the structures are traversed from left to right. This small detail was not important for Functor (because no extra effect was produced) or for Foldable (because the monoid you use underneath is associative). But in the case of Traversable, it may not be the same to execute the actions in one or another order. Think of the case of Applicative f or Monad m being the IO monad or the Writer monad.

Implementing Traversable for a specific container seems like a complex task, but it’s easy in practice. Take as an example the BinaryTree2 type, which was introduced in Chapter 4.
data BinaryTree2 a = Node2 a (BinaryTree2 a) (BinaryTree2 a)
                   | Leaf2
                   deriving Show
In Exercises 4-8 and 4-9, you were asked to implement its Functor and Foldable instances. Functor instances usually follow the pattern of applying the corresponding function to each value in the structure and then calling fmap recursively on each substructure. Foldable instances do a similar job, but they apply the monoid operations in between. Thus, the instances looked like this:
instance Functor BinaryTree2 where
  fmap f (Node2 x l r) = Node2 (f x) (fmap f l) (fmap f r)
  fmap _ Leaf2         = Leaf2
instance Foldable BinaryTree2 where
  foldMap f (Node2 x l r) = (f x) <> (foldMap f l) <> (foldMap f r)
  foldMap _ Leaf2         = mempty
The best way to write the Traversable instance is via traverse, which by itself forms a minimal complete definition. The function traverse was introduced before as “fmap with effects.” The only thing you need to do is follow the same structure as fmap but ensure that the Applicative operations are used along the way.
instance Traversable BinaryTree2 where
  traverse f (Node2 x l r) = Node2 <$> f x
                                   <*> traverse f l
                                   <*> traverse f r
  traverse _ Leaf2         = pure Leaf2
Since these instances always follow the same structure, GHC is able to derive them automatically. You do this by enabling a bunch of extensions in your source file.
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
Then you change the deriving part of the BinaryTree2 definition.
deriving (Show, Functor, Foldable, Traversable)
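A self-contained sketch of the derived instances at work (halveAll is an invented example): traversing the tree with a Maybe-producing function succeeds only when it succeeds on every element, and the shape of the tree is preserved.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

data BinaryTree2 a = Node2 a (BinaryTree2 a) (BinaryTree2 a)
                   | Leaf2
                   deriving (Show, Eq, Functor, Foldable, Traversable)

-- Halve every element, failing for the whole tree if any element is odd
halveAll :: BinaryTree2 Int -> Maybe (BinaryTree2 Int)
halveAll = traverse (\x -> if even x then Just (x `div` 2) else Nothing)
```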

Don’t Overengineer: Just Use JSON

The focus of this chapter has been building and parsing text, which are usually tedious and error-prone activities. Every time you want to add some new data, you have to change both parts and keep them synced so that the output generated by the Builder can be consumed by the Parser.

Instead, it would be nice to use a library that helps to interchange information in a textual way. If possible, it would be nice to use a format that is common to many languages and platforms because future interoperation between tools may be a requirement. The JSON format provides all those features. It has almost universal support between languages and platforms, and it uses a simple textual representation that makes it easy for both humans and computers to process it. Furthermore, JSON is the de facto standard for data interchange in the Web, so using it opens the door to creating web applications in Haskell.

A Client can be represented in JSON in the following way:
{ "type": "company", "id": 1, "name": "Black Hole Inc."
, "person": { "first": "John", "last": "Smith" }, "duty": "Traveller" }

In this brief example, you can see almost every possible way to build a JSON value. First, you have basic types, such as numbers, Booleans, and strings, which are represented by their literals. From these basic values, you can build either arrays (not shown here, but written using the same syntax as Haskell lists), or objects, which are key-value maps. In addition, you also have the null special value. In this case, the full value is an object with four keys and values, of which the person key contains a nested object.

The good news is that the Haskell community has come up with a simple yet efficient library for reading and writing JSON values. Its name is aeson, and its use revolves around a small data type.
data Value = Object Object
           | Array  Array
           | String Text
           | Number Scientific
           | Bool   Bool
           | Null

Each of the constructors corresponds to a type of value you can find in JSON, as explained earlier.

Let’s start by writing functions for converting between JSON representations and Client values. Most of the conversion to a JSON value entails wrapping data in the corresponding constructor. In the case of creating Objects, the easiest way to do this is via the object function, which takes a list of pairs as an argument. Each of these pairs is defined as key .= value. The conversion to JSON then reads as follows:
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Data.Text
clientToJSON :: Client Integer -> Value
clientToJSON (GovOrg i n) =
  object [ "type"   .= String "govorg"
         , "id"     .= Number (fromInteger i)
         , "name"   .= String (pack n) ]
clientToJSON (Company i n p d) =
  object [ "type"   .= String "company"
         , "id"     .= Number (fromInteger i)
         , "name"   .= String (pack n)
         , "person" .= personToJSON p
         , "duty"   .= String (pack d) ]
clientToJSON (Individual i p) =
  object [ "type"   .= String "individual"
         , "id"     .= Number (fromInteger i)
         , "person" .= personToJSON p ]
personToJSON :: Person -> Value
personToJSON (Person f l) = object [ "first" .= String (pack f)
                                   , "last"  .= String (pack l) ]
If you now want to recreate a Person from a JSON value, you need to check that every key is present. You can do this via the HashMap functions (similar to those in Map) that aeson’s Value uses. Since HashMap is defined in the unordered-containers package, you need to include it in your project for this example to work.
import qualified Data.HashMap.Strict as M
jsonToPerson :: Value -> Maybe Person
jsonToPerson (Object o) = do String f <- M.lookup "first" o
                             String l <- M.lookup "last"  o
                             return $ Person (unpack f) (unpack l)
jsonToPerson _          = Nothing
There’s more than one approach to accessing the fields in the JSON value. As you saw in Chapter 6, lenses are useful for this kind of navigation inside an object. The lens-aeson package provides a set of functions that aim to make the work with aeson Values easier. In this particular case, you can use key, which provides access to a named field in the object. Since the key may be unavailable, you do not use (^.) but its counterpart (^?), which may return Nothing. The previous code can be rewritten as follows:
import Control.Lens ((^?))
import Data.Aeson.Lens
jsonToPerson :: Value -> Maybe Person
jsonToPerson j = do String f <- j ^? key "first"
                    String l <- j ^? key "last"
                    return $ Person (unpack f) (unpack l)

Another interesting member of the Data.Aeson.Lens module is nth, which given an index obtains the element in that position if the JSON value being considered is an array. For the rest of the primitive JSON types, lens-aeson provides the combinators _Number (with its two variants _Double and _Integer for automatically converting into one specific numeric type), _String, and _Bool.

Anyway, directly matching on a Value is not the suggested way to decode information from JSON in aeson. Instead, you should use a JSON parser. The interface to that Parser is similar to the one in attoparsec; in particular, it also uses the Applicative functions. The main difference is the set of basic building blocks for parsers. In aeson you use (.:), which extracts a value with a given key. The most common way to convert from JSON to Person is as follows:
import Data.Aeson.Types
import Control.Applicative
jsonToPerson :: Value -> Parser Person
jsonToPerson (Object o) = Person <$> o .: "first" <*> o .: "last"
jsonToPerson _          = Control.Applicative.empty
The functions for converting JSON back and forth are not usually defined alone but rather as part of the ToJSON and FromJSON type classes that aeson provides. Let’s add them to the code for the Person type.
instance ToJSON Person where
  toJSON = personToJSON
instance FromJSON Person where
  parseJSON = jsonToPerson
The great benefit of using ToJSON and FromJSON is that aeson includes a bunch of predefined instances for many types. For example, once you write it for Person, you can also serialize into JSON lists of people, objects containing Persons, and so on. In particular, let’s use the just-defined instance for defining the parser for Client and the corresponding instances of ToJSON and FromJSON. Notice that FromJSON allows you to be more general in the types you accept as identifiers.
{-# LANGUAGE FlexibleInstances #-}
jsonToClient :: FromJSON i => Value -> Parser (Client i)
jsonToClient (Object o) =
  case M.lookup "type" o of
    Just (String "govorg")     -> GovOrg  <$> o .: "id" <*> o .: "name"
    Just (String "company")    -> Company <$> o .: "id" <*> o .: "name"
                                          <*> o .: "person"
                                          <*> o .: "duty"
    Just (String "individual") -> Individual <$> o .: "id"
                                             <*> o .: "person"
    _                          -> Control.Applicative.empty
jsonToClient _ = Control.Applicative.empty
instance ToJSON (Client Integer) where
  toJSON = clientToJSON
instance FromJSON i => FromJSON (Client i) where
  parseJSON = jsonToClient
Using the interpreter, you can see how the conversion to JSON works correctly.
*> :{
*| toJSON $ Company (1 :: Integer) "Black Hole Inc."
*|                  (Person "John" "Smith") "Traveller"
*| :}
Object fromList [("name",String "Black Hole Inc.")
                ,("duty",String "Traveller")
                ,("person",Object fromList [("last",String "Smith")
                                           ,("first",String "John")])
                ,("id",Number 1.0),("type",String "company")]
To perform the converse operation, you need to call fromJSON, which runs the Parser with a given Value and produces a Result that may be either Success or Error. Notice that the code uses an explicit signature to tell the interpreter which FromJSON instance it needs to use.
*> :{
*| fromJSON $ toJSON $
*|   Company (1 :: Integer) "Black Hole Inc."
*|           (Person "John" "Smith") "Traveller" :: Result (Client Integer)
*| :}
Success (Company { clientId = 1, clientName = "Black Hole Inc."
                 , person = Person {firstName = "John", lastName = "Smith"}
                 , duty = "Traveller" })
Once you know how to convert from and to aeson Value, the next step is to convert those Values into actual textual representations. For generating the text, you just need to use encode, which returns a ByteString. In the case of moving from text to JSON, you have several possibilities.
  • decode takes a ByteString, parses the text to produce an aeson Value, and then converts that Value into some other type using FromJSON. The result is wrapped in a Maybe to signal failure.

  • eitherDecode performs the same task, but in the case of failure, the error message is returned in the Left constructor of an Either.

  • You can perform the two steps separately. First, aeson provides json, an attoparsec Parser from ByteString into Value. Then, you can call fromJSON to generate the final value.
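As a quick, self-contained sketch of encode and decode working together (the Point type is invented for this example; it relies on the Generic-derived instances discussed later in this chapter):

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson
import GHC.Generics

-- A toy record with Generic-derived ToJSON/FromJSON instances
data Point = Point { px :: Int, py :: Int }
           deriving (Show, Eq, Generic)
instance ToJSON Point
instance FromJSON Point

-- encode produces a lazy ByteString; decode wraps the result in a Maybe
roundTrip :: Point -> Maybe Point
roundTrip = decode . encode
```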

One small tidbit is that encoding and decoding in aeson work on lazy ByteStrings. We have seen that other libraries, such as conduit, use strict ByteStrings instead. In those cases, you need to perform a conversion, as shown in the following code:
import Data.Conduit
import qualified Data.Conduit.Binary as B
import qualified Data.Conduit.List as L
import qualified Data.ByteString.Lazy as LB
saveClients :: FilePath -> [Client Integer] -> IO ()
saveClients fPath clients = runConduitRes $
  yield (toJSON clients) .| L.map (LB.toStrict . encode)
                         .| B.sinkFile fPath

Exercise 10-3 asks you to use JSON to save and load products and purchases in a file.

Exercise 10-3. JSON for Products and Purchases

Write ToJSON and FromJSON instances of the Product and Purchase data types that were introduced in Exercise 10-1. Then, using aeson and conduit, create an application that reads a list of Products represented in JSON and shows the average price.

ToJSON and FromJSON instances always follow the same pattern: ToJSON writes each of the fields of a constructor in key-value pairs in JSON, with an additional field for knowing the constructor if the data type has more than one (like type in the Client example). FromJSON just tries to read each of these keys. Haskell tries to avoid boilerplate code as much as possible, and this is a clear case of it. In case you don’t need any special behavior, aeson can write the instances for you.

To do so, you need to enable the DeriveGeneric extension in GHC and add Generic to the list of instances in the deriving clause, as we have done in the previous chapter for Binary and CSV. A Generic instance contains information about the constructors and fields in a data type. Then, you just need to write empty instances of ToJSON and FromJSON, and using that Generic information, aeson will do the rest. Here’s the code for the Client version:
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics
data Client i = ... deriving (Show, Generic)
data Person = ... deriving (Show, Read, Generic)
instance ToJSON   i => ToJSON   (Client i)
instance FromJSON i => FromJSON (Client i)
instance ToJSON   Person
instance FromJSON Person

These automatically derived instances will use as keys the name of the fields declared in the constructors. For that reason, it works only on data declarations using record syntax. If your type has more than one constructor, the key that will be used to distinguish them is called tag.

Parsing Command-Line Arguments

Every time you find some operation that involves parsing in some way or another, you will most likely find an Applicative interface to it. In addition to JSON decoding via aeson, one other useful package is optparse-applicative, which helps describe and organize the set of command-line arguments to an application.

For example, this is how you would define the arguments to an application that expects one file name containing a list of clients and another argument specifying whether the information is kept in JSON format or not:
import Options.Applicative
data Args = Args String Bool  -- data type holding the arguments
args :: Parser Args           -- read the arguments
args = Args <$> strOption (long "file" <> help "Database of clients to load")
            <*> switch    (long "json" <> help "Whether the database uses JSON")
argsInfo :: ParserInfo Args   -- define arguments + help text
argsInfo = info args fullDesc
main :: IO ()
main = do Args fPath json <- execParser argsInfo
          ...
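One nicety of this Applicative description is that the parser is a first-class value: with execParserPure you can run it over any list of arguments, without touching the program’s real argv. A sketch along the lines of the Args parser above (parseArgs and argsP are names invented here):

```haskell
import Options.Applicative

data Args = Args String Bool deriving (Show, Eq)

argsP :: Parser Args
argsP = Args <$> strOption (long "file" <> help "Database of clients to load")
             <*> switch    (long "json" <> help "Whether the database uses JSON")

-- Run the parser purely over a given argument list; getParseResult
-- turns the ParserResult into a Maybe
parseArgs :: [String] -> Maybe Args
parseArgs = getParseResult . execParserPure defaultPrefs (info argsP fullDesc)
```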

The great benefit of Applicative is that it provides a common interface to tasks that are not necessarily related at first sight. It’s becoming widely used, and nowadays Applicative is seen as a fundamental type class in Haskell, like Functor and Monad.

Summary

In this chapter, you learned how to deal with text in a variety of situations in Haskell.
  • You saw the differences between the String, ByteString, and Text types, as well as the purpose of each of them.

  • Both text and bytestring provide Builder types that allow you to efficiently generate textual data.

  • You were introduced to the converse operation, parsing, via the attoparsec package. That library uses the combinators approach to generate modular and easy-to-read parsers.

  • You were introduced to the Applicative type class, which lies between Functor and Monad in strength. You also learned about the Alternative and Traversable type classes.

  • Finally, I discussed the conversion of JSON back and forth using the aeson library.
