Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Stefania Loredana Nita and Marius MihailescuHaskell Quick Syntax Referencehttps://doi.org/10.1007/978-1-4842-4507-1_17

17. Parsing

Stefania Loredana Nita¹ and Marius Mihailescu¹

(1)

Bucharest, Romania

Wikipedia defines the parsing process as follows:

“The process of analyzing a string of symbols, either in natural language, computer languages, or data structures, conforming to the rules of a formal grammar. The term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.”

This sounds a little complex, and it really is, involving grammars, parsing trees, some tokens, expressions, terms, and so on. In this chapter, you will learn something easier: how to parse binary data in Haskell.

This chapter is inspired from [2] and will use the examples provided there. Note that the examples may not work with newer versions of ghc. To run them, use ghc 8.0.2.

When you have a string of bytes, you will want to do something with them. A package that deals with binary data in Haskell is called Binary . First you need to install it, so open a terminal and type the following:

cabal install binary

The Binary package has three main components.

The Get monad
The Put monad
A serialization component

Get is a state monad, so it keeps a state and changes it when an action is applied on that state. In this situation, the state is actually an offset of the lazy string of bytes that will be parsed. In scenarios where you get a failure when trying to parse a bytestring, the failure will be interpreted as an exception—these are handled in the IO monad due to laziness—which will probably be thrown in a different place than you expect. In such situations, you can use a stricter version of the Get monad .

Here is an example with a regular Get:

import qualified Data.ByteString.Lazy as BL

import Data.Binary.Get

import Data.Word

deserialiseHeader :: Get (Word32, Word32, Word32)

deserialiseHeader = do

alen <- getWord32be

plen <- getWord32be

chksum <- getWord32be

return (alen, plen, chksum)

main :: IO ()

main = do

input <- BL.getContents

print $ runGet deserialiseHeader input

The inputs are three 32-bit numbers in big-endian format, and the output is a tuple .

% runhaskell /ch17/parsing.hs << EOF

heredoc> 123412341235

heredoc> EOF

(825373492,825373492,825373493)

If the input is too short, you get an exception.

% runhaskell / ch17/parsing.hs << EOF

tooshort

EOF

parsing.hs: Data.Binary.Get.runGet at position 8: not enough bytes

CallStack (from HasCallStack):

error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.6.0:Data.Binary.Get

In the previous example, the function getWord32 takes the input , goes through it, and returns a value.

The next example decodes a list of numbers that end with EOF in a recursive manner:

listOfWord16 = do

empty <- isEmpty

if empty

then return []

else do v <- getWord64be

rest <- listOfWord16

return (v : rest)

You saw in the first example that you get an exception when the input is too short. There are two ways to handle exceptions: write your own parser or handle the exceptions in the IO monad . The second way is simpler because it involves a stricter version of the Get monad, in which the parser Get a taken as input to runGet results in (Either String a, ByteString). This means the first value is either a string (i.e., the exception message) or the result, while the second value is the remaining bytestring . The following is the modified version of the first example with the strict Get being used:

import qualified Data.ByteString as B

import Data.Binary.Strict.Get

import Data.Word

deserialiseHeader :: Get (Word32, Word32, Word32)

deserialiseHeader = do

alen <- getWord32be

plen <- getWord32be

chksum <- getWord32be

return (alen, plen, chksum)

main :: IO ()

main = do

input <- B.getContents

print $ runGet deserialiseHeader input

Note that this example requires binary-strict, which needs to be installed using cabal. The changes in the code are that it is using a strict bytestring instead of a lazy bytestring and it is importing Data.Binary.Strict.Get. If the example is run again, you will obtain:

% runhaskell /ch17/parsing.hs << EOF

heredoc> 123412341235

heredoc> EOF

(Right (825373492,825373492,825373493)," ")

Now it works correctly because the output is a Right, and a new line was added instead of being consumed by the parser. Let’s run it with the shorter input, as shown here:

% runhaskell /ch17/parsing.hs << EOF

heredoc> tooshort

heredoc> EOF

(Left "too few bytes"," ")

Now, the output is a Left instead of an exception and can be handled in the IO monad. The fail can be called inside the parser.

Operations on bits are allowed. Importing Data.Bits, the following operators can be used:

Operator	Symbol
AND	.&.
OR	.\|.
XOR	`xor`
NOT	`complement`
Left shift	`shiftL`
Right shift	`shiftR`

Don’t forget that working with bits needs special attention .

Summary

In this chapter, you learned how to parse a bytestring using binary and binary-strict.

For a parser created from scratch, check out [3].

References

1.
G. Hutton and E. Meijer, “Monadic Parsing in Haskell,” Journal of Functional Programming, 8.4: 437–444 (1998)
2.
Dealing with binary data, https://wiki.haskell.org/Dealing_with_binary_data#Binary_parsing
3.
Parser, https://www.schoolofhaskell.com/school/starting-with-haskell/basics-of-haskell/8_Parser

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 17. Parsing

Create new playlist

Sign In

Sign Up

17. Parsing

Summary

References

Table of Contents for
17. Parsing