© Stefania Loredana Nita and Marius Mihailescu 2019
Stefania Loredana Nita and Marius MihailescuHaskell Quick Syntax Referencehttps://doi.org/10.1007/978-1-4842-4507-1_17

17. Parsing

Stefania Loredana Nita1  and Marius Mihailescu1
(1)
Bucharest, Romania
 
Wikipedia defines the parsing process as follows:

“The process of analyzing a string of symbols, either in natural language, computer languages, or data structures, conforming to the rules of a formal grammar. The term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.”

This sounds a little complex, and it really is, involving grammars, parsing trees, some tokens, expressions, terms, and so on. In this chapter, you will learn something easier: how to parse binary data in Haskell.

This chapter is inspired from [2] and will use the examples provided there. Note that the examples may not work with newer versions of ghc. To run them, use ghc 8.0.2.

When you have a string of bytes, you will want to do something with them. A package that deals with binary data in Haskell is called Binary . First you need to install it, so open a terminal and type the following:
cabal install binary
The Binary package has three main components.
  • The Get monad

  • The Put monad

  • A serialization component

Get is a state monad, so it keeps a state and changes it when an action is applied on that state. In this situation, the state is actually an offset of the lazy string of bytes that will be parsed. In scenarios where you get a failure when trying to parse a bytestring, the failure will be interpreted as an exception—these are handled in the IO monad due to laziness—which will probably be thrown in a different place than you expect. In such situations, you can use a stricter version of the Get monad .

Here is an example with a regular Get:
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get
import Data.Word
deserialiseHeader :: Get (Word32, Word32, Word32)
deserialiseHeader = do
  alen <- getWord32be
  plen <- getWord32be
  chksum <- getWord32be
  return (alen, plen, chksum)
main :: IO ()
main = do
  input <- BL.getContents
  print $ runGet deserialiseHeader input
The inputs are three 32-bit numbers in big-endian format, and the output is a tuple .
% runhaskell /ch17/parsing.hs << EOF
heredoc> 123412341235
heredoc> EOF
(825373492,825373492,825373493)
If the input is too short, you get an exception.
% runhaskell / ch17/parsing.hs << EOF
tooshort
EOF
parsing.hs: Data.Binary.Get.runGet at position 8: not enough bytes
CallStack (from HasCallStack):
  error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.6.0:Data.Binary.Get

In the previous example, the function getWord32 takes the input , goes through it, and returns a value.

The next example decodes a list of numbers that end with EOF in a recursive manner:
listOfWord16 = do
  empty <- isEmpty
  if empty
     then return []
     else do v <- getWord64be
             rest <- listOfWord16
             return (v : rest)
You saw in the first example that you get an exception when the input is too short. There are two ways to handle exceptions: write your own parser or handle the exceptions in the IO monad . The second way is simpler because it involves a stricter version of the Get monad, in which the parser Get a taken as input to runGet results in (Either String a, ByteString). This means the first value is either a string (i.e., the exception message) or the result, while the second value is the remaining bytestring . The following is the modified version of the first example with the strict Get being used:
import qualified Data.ByteString as B
import Data.Binary.Strict.Get
import Data.Word
deserialiseHeader :: Get (Word32, Word32, Word32)
deserialiseHeader = do
  alen <- getWord32be
  plen <- getWord32be
  chksum <- getWord32be
  return (alen, plen, chksum)
main :: IO ()
main = do
  input <- B.getContents
  print $ runGet deserialiseHeader input
Note that this example requires binary-strict, which needs to be installed using cabal. The changes in the code are that it is using a strict bytestring instead of a lazy bytestring and it is importing Data.Binary.Strict.Get. If the example is run again, you will obtain:
% runhaskell /ch17/parsing.hs << EOF
heredoc> 123412341235
heredoc> EOF
(Right (825373492,825373492,825373493)," ")
Now it works correctly because the output is a Right, and a new line was added instead of being consumed by the parser. Let’s run it with the shorter input, as shown here:
% runhaskell /ch17/parsing.hs << EOF
heredoc> tooshort
heredoc> EOF
(Left "too few bytes"," ")

Now, the output is a Left instead of an exception and can be handled in the IO monad. The fail can be called inside the parser.

Operations on bits are allowed. Importing Data.Bits, the following operators can be used:

Operator

Symbol

AND

.&.

OR

.|.

XOR

`xor`

NOT

`complement`

Left shift

`shiftL`

Right shift

`shiftR`

Don’t forget that working with bits needs special attention .

Summary

In this chapter, you learned how to parse a bytestring using binary and binary-strict.

For a parser created from scratch, check out [3].

References

  1. 1.

    G. Hutton and E. Meijer, “Monadic Parsing in Haskell,” Journal of Functional Programming, 8.4: 437–444 (1998)

     
  2. 2.
     
  3. 3.
     
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.214.155