12. Large-Scale Design in Haskell

There are several approaches to managing the complexity of computations. We discussed some of them in previous chapters; here we explain why they are used in large-scale design. We also introduce new approaches and provide some examples.

The Type System

The type system is used to enforce abstractions and to simplify the interactions between the programmer and the environment. It expresses key invariants in types and ensures safety through checked error handling (the Maybe and Either monads). Distinct concepts get distinct types (Word, Int, Address), so they cannot be mixed accidentally. There are many useful data structures (such as zippers).
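
For illustration, here is a minimal sketch of both ideas; the Address and Port newtypes and the lookupPort helper are hypothetical names, not from this chapter. Distinct newtypes keep raw Word values from being interchanged, and Maybe makes a partial lookup safe.

-- hypothetical example: Address and Port wrap the same raw Word,
-- but the type system keeps them from being mixed up
newtype Address = Address Word deriving (Eq, Show)
newtype Port    = Port    Word deriving (Eq, Show)

-- a partial operation made safe: failure is visible in the type
lookupPort :: Address -> [(Address, Port)] -> Maybe Port
lookupPort = lookup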

Purity

The complexity of a program decreases when mutable state is removed. An advantage of pure functional code is scalability, due to its compositionality. Frege’s principle states that the meaning of a complex expression is determined by the meanings of the expressions that constitute it and the rules that are applied to combine them. Francis Jeffry Pelletier phrases it as follows in The Principle of Semantic Compositionality (available: https://link.springer.com/article/10.1007/BF00763644 ): “the meaning of a (syntactically complex) whole is a function only of the meanings of its (syntactic) parts together with the manner in which these parts were combined”. A good practice is the model-view-controller programming style, which works in functional programming as follows: data is parsed externally into pure functional data structures, operations are performed over those structures, and in the last step, the data is rendered, flushed, or serialized. This way, the code remains as pure as possible. We discussed pure functions in Chapter 2.
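
As a minimal sketch of this parse/operate/render pipeline (parseInts, summarize, and render are hypothetical helpers, invented for illustration), only main touches the outside world; everything in between is pure.

-- parsing at the boundary: turn raw input into a pure data structure
parseInts :: String -> [Int]
parseInts = map read . lines

-- the pure core: operate on the functional data structure
summarize :: [Int] -> Int
summarize = sum . filter even

-- rendering, again at the boundary
render :: Int -> String
render n = "total: " ++ show n ++ "\n"

main :: IO ()
main = interact (render . summarize . parseInts)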

Monads for Structuring

Monads capture key architectural designs and turn them into types, such that one part of the code accesses hardware, another part handles a session with a single user, and so on. We discussed monads in Chapter 2.
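
A minimal sketch of this idea, assuming the mtl package; the UserEnv type and the Session newtype are hypothetical names, not from this chapter. All code that belongs to a single user’s session is marked by its type.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Reader (ReaderT, asks, runReaderT)

data UserEnv = UserEnv { userName :: String }

-- hypothetical: everything in a single user's session runs in this monad
newtype Session a = Session (ReaderT UserEnv IO a)
  deriving (Functor, Applicative, Monad)

greet :: Session String
greet = Session (asks (("hello, " ++) . userName))

runSession :: UserEnv -> Session a -> IO a
runSession env (Session m) = runReaderT m env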

Type Classes and Existential Types

Type classes are used for abstraction: they hide an implementation behind a polymorphic interface. We discussed types in Chapter 2.
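
For illustration, a minimal sketch that combines a type class with an existential type (Shape, Circle, Square, and AnyShape are hypothetical names): callers see only the polymorphic interface, never the concrete implementations, and AnyShape lets differently typed shapes share one list.

{-# LANGUAGE ExistentialQuantification #-}

-- the polymorphic interface
class Shape s where
  area :: s -> Double

data Circle = Circle Double
instance Shape Circle where
  area (Circle r) = pi * r * r

data Square = Square Double
instance Shape Square where
  area (Square a) = a * a

-- the existential wrapper hides the concrete shape type
data AnyShape = forall s. Shape s => AnyShape s

totalArea :: [AnyShape] -> Double
totalArea xs = sum [area s | AnyShape s <- xs]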

Concurrency and Parallelism

Concurrency and parallelism are useful because they allow several tasks to make progress at the same time, which decreases the time needed to obtain a result. We also discuss concurrency and parallelism in other chapters.
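
As a tiny sketch (our own, assuming the async package from Hackage), two independent computations can run in separate threads at once:

import Control.Concurrent.Async (concurrently)
import Control.Exception (evaluate)

main :: IO ()
main = do
  -- run two independent computations concurrently and wait for both;
  -- evaluate forces each result inside its own thread
  (s, p) <- concurrently (evaluate (sum     [1 .. 1000000 :: Integer]))
                         (evaluate (product [1 .. 20     :: Integer]))
  print (s, p)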

Use of FFI

The Foreign Function Interface (FFI) lets Haskell programs call code written in other programming languages. It is very important to be careful with data returned by foreign code. We discussed the FFI in Chapter 11.
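
As a small reminder of the mechanism (a standard example, not specific to this book), binding the C library’s sin function looks like this; the foreign CDouble result is converted back to a Haskell Double before further use.

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble (..))

-- bind the C sin function from the standard math library
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

main :: IO ()
main = print (realToFrac (c_sin 0.5) :: Double)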

The Profiler

Profiling is a technique that analyzes several aspects of a program: its space and time complexity, the use of specific instructions, and the frequency and duration of function calls. A profiler can track program heaps and time profiles. It is good practice to profile the heap to make sure that memory is used only as needed. GHC has its own tools for profiling, and we can also configure our build so that programs are compiled with profiling by default. Profiling has three stages: compile the program for profiling, run it with specific profiling modes enabled, and inspect the resulting statistics.

Time Profiling

First, let’s talk about time profiling. Consider an example in which we compute the mean of the values in a list. To compile the program with profiling, we add the -prof flag. In addition, the profiling code needs to know which functions we want to profile. This is done by adding cost centers, which mark the expressions in our program that we want statistical information about; GHC generates code that computes the cost of evaluating the expression at each such place. To add a cost center, use the SCC pragma.

mean :: [Double] -> Double
mean xs = {-# SCC "mean" #-} sum xs / fromIntegral (length xs)

Another option is to let the compiler add cost centers to all top-level functions. This is done by compiling with the -auto-all option.

An important aspect of profiling in lazy languages concerns values with no arguments, which are computed once and whose results are reused later. Their evaluation does not actually happen on every call, but we still need to know how expensive they are. Such values are called constant applicative forms (CAFs), and they can be included in profiling using the -caf-all option.
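
For example, a top-level binding like the following is a CAF (bigTable is a hypothetical name): it takes no arguments, is evaluated at most once, and its cost is attributed in the profile only when -caf-all is used.

-- a CAF: no arguments, computed once and then shared
bigTable :: Integer
bigTable = sum [1 .. 1000000]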

Thus, the example is compiled as follows (the -fforce-recomp option forces full recompilation).

$ ghc -O2 --make mean.hs -prof -auto-all -caf-all -fforce-recomp
[1 of 1] Compiling Main             ( mean.hs, mean.o )
Linking mean ...

Space Profiling

Next, we examine the example from a space profiling view, in which GHC generates graphs of memory usage over the program’s lifetime. This is useful for discovering places where memory is used incorrectly, leading to heavy garbage collection. The setup is similar to time profiling (i.e., we compile with -prof -auto-all -caf-all), but at execution time, the runtime system collects detailed statistics about heap use. This information can be broken down in several ways: by cost center, by module, by constructor, or by data type. Running the profiled program produces an .hp file containing raw data, which the hp2ps tool processes to generate a graphical visualization of the heap over time. In order to obtain a heap profile, we add the -hc option.

$ time ./mean 1e6 +RTS -hc -p -K100M

Samples are taken at regular intervals while the program is running. If we want to change the time between two samples, we use -iN, where N is the number of seconds. The more often samples are taken, the more accurate the result, but the slower the program runs. We convert the resulting mean.hp file into a graph and take a look at it, as shown in Figure 12-1.

$ hp2ps -e8in -c mean.hp

Figure 12-1. The result of space profiling testing

We can learn some things from the graph. The execution of the program has two stages: in the first, an increasingly large amount of memory is used while the sum of the values is built up; in the second, the values are consumed and the memory is released.
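
A common remedy for such a profile (our addition, not part of the chapter’s example) is to compute the sum and the length in a single strict pass, so that no long chain of thunks accumulates on the heap:

{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- mean' accumulates the running sum and count strictly, in one pass
mean' :: [Double] -> Double
mean' xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (!acc, !len) x = (acc + x, len + 1)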

QuickCheck

QuickCheck ( https://hackage.haskell.org/package/QuickCheck ) is a library that makes it easy to test our programs. In unit testing, particular cases are tested; in property testing (the kind of testing QuickCheck provides), properties are tested. All we need to do is write specifications for the code that describe its invariant properties. QuickCheck generates random samples of data and tests whether the properties we defined are satisfied.

For the next example (from jasani.org, Testing Haskell with QuickCheck), we need to import some libraries.

> import Control.Monad ( liftM )
> import Data.List ( intersperse )
> import Test.QuickCheck.Gen
> import Test.QuickCheck.Arbitrary
> import Test.QuickCheck.Property
> import Test.QuickCheck.Test

Next, we write two simple functions that work with file names. The first is splitFN, which separates a file name into the name and the extension; the second is joinFN, which concatenates a name and an extension to obtain the file name.

> splitFN_0 :: String -> (String, String)
> splitFN_0 fn =
>   let fn' = span (/= '.') . reverse $ fn
>   in case (length (fst fn') == length fn) of
>        True  -> (fn, "")
>        False -> (reverse . drop 1 $ snd fn', ('.':) . reverse . fst $ fn')
>
> joinFN_0 :: (String, String) -> String
> joinFN_0 (name, ext) = name ++ ext

A property of the two functions is roundtripping (which we name prop_filenames_are_roundtrippable), because filename == joinFN (splitFN filename). Next, we want to generate file names, so we create a new type and an Arbitrary instance.

> newtype Filename = FN { unFN :: String } deriving Show
>
> instance Arbitrary Filename where
>   arbitrary = do name <- elements ["foo", "bar", "baz"]
>                  ext <- listOf $ elements ['a'..'z']
>                  return (FN (name ++ "." ++ ext))
>
> prop_filenames_are_roundtrippable_0 :: Filename -> Property
> prop_filenames_are_roundtrippable_0 fnStr =
>   property $ joinFN_0 (splitFN_0 fn) == fn
>   where fn = unFN fnStr

If we execute the code, we obtain

*Main> quickCheck prop_filenames_are_roundtrippable_0
+++ OK, passed 100 tests.

The test passed and the file names were successfully created, but we don’t know how they look. If we want to see some samples, we can do the following.

*Main> sample' arbitrary :: IO [Filename]

[ FN {unFN = "baz.x"}, FN {unFN = "bar.v"}, FN {unFN = "foo.k"}, FN {unFN = "foo.s"}, FN {unFN = "baz.esra"}, FN {unFN = "baz.vkgg"}, FN {unFN = "bar.uln"}, FN {unFN = "bar.k"}, FN {unFN = "baz.crynhi"}, FN {unFN = "baz.ys"} ]

We can combine the property that we defined with collect to show the data that is used.

> prop_filenames_are_roundtrippable_1 :: Filename -> Property
> prop_filenames_are_roundtrippable_1 fnStr =
>   collect fn $
>   joinFN_0 (splitFN_0 fn) == fn
>   where fn = unFN fnStr

The following are some of the results .

*Main> quickCheck prop_filenames_are_roundtrippable_1
+++ OK, passed 100 tests:

 1% "bar.tbgufhjxeqtfpn"
 1% "bar.rymnlegngyuzvl"
 1% "bar.ryddfkncgxdopxihkmb"
 1% "bar.rafel"
 1% "bar.qbrftss"
 1% "bar.pxwpbfovejqwiqslnrdboaluihlkjifawfiyerwwtdyuepynoejx"
 1% "bar.p"
 1% "bar.nyciiyidegiwpsxta"
 1% "bar.mtidqpnitvrseakbkppkjmqtlutkqtfuirlsmkrnsmxsvwhzhwfut"
 1% "bar.mogmmzl"

An alternative to the collect function is to classify the data.

> prop_filenames_are_roundtrippable_2 :: Filename -> Property
> prop_filenames_are_roundtrippable_2 fnStr =
>   classify (length ext == 0) "no ext" $
>   classify (length ext > 0 && length ext < 5) "normal ext" $
>   classify (length ext >= 5) "long ext" $
>   joinFN_0 (splitFN_0 fn) == fn
>   where fn = unFN fnStr
>         (name,ext) = splitFN_0 fn


*Main> quickCheck prop_filenames_are_roundtrippable_2
+++ OK, passed 100 tests:
72% long ext
21% normal ext
 7% no ext

We have seen the data that is generated, but we have not taken into consideration names like README, foo.txt.old, or .emacs. Therefore, we change the approach a little, writing a custom test generator for the property we defined.

> filenames :: Gen String
> filenames = do
>   name <- opt identifier
>   dot  <- opt (return ".")
>   ext  <- opt identifier
>   exts <- listOf identifier
>   oneof [ return $ name ++ dot ++ ext
>         , return $ name ++ "." ++ (concat . intersperse "." $ exts)]


> prop_filenames_are_roundtrippable_3 :: Property
> prop_filenames_are_roundtrippable_3 =
>   forAll filenames $ \fn ->
>   joinFN_0 (splitFN_0 fn) == fn

If we ask for some sample data, we note that they are diverse.

*Main> sample' filenames

[ ".K3", ".O.Va1", "1LAi.k", "rz.t", "41R8x.", ".wu.mi1kqh8.Y7PKH6.p86.O", "", ".", "P214MM71fu.k4Ayqns0f", ".k.k9.0o2e81n.d71ijpm7gh.XMNt" ]

*Main> quickCheck prop_filenames_are_roundtrippable_3
+++ OK, passed 100 tests.

Well, the results look better. But we haven’t finished yet, because we have a little problem with file names that have no extension, such as README and .emacs. We consider a particular generator for these kinds of files by adding a new property called prop_names_equal_filenames.

> noExtFilenames :: Gen String
> noExtFilenames = do
>   name <- identifier
>   dot  <- opt (return ".")
>   return ( dot ++ name )


> prop_names_equal_filenames_0 :: Property
> prop_names_equal_filenames_0 =
>   forAll noExtFilenames $ \fn ->
>   let (name,ext) = splitFN_0 fn
>   in name == fn

Now if we run the test for the new property, we get the following.

*Main> quickCheck prop_names_equal_filenames_0
*** Failed! Falsifiable (after 3 tests):
".i1"


*Main> splitFN_0 ".i1"
("",".i1")

We need to reconsider the split function. Afterward, we gather the generator helpers into a small library at the end of the file.

> splitFN_1 :: String -> (String, String)
> splitFN_1 fn =
>   let fn' = span (/= '.') . reverse $ fn
>   in case (length (fst fn') == length fn) of
>        True  -> (fn, "")
>        False | length (fst fn') == length fn - 1 -> (fn, "")
>              | otherwise -> (reverse . drop 1 $ snd fn'
>                             , ('.':) . reverse . fst $ fn')


> prop_names_equal_filenames_1 :: Property
> prop_names_equal_filenames_1 =
>   forAll noExtFilenames $ \fn ->
>   let (name,ext) = splitFN_1 fn
>   in name == fn


> prop_filenames_are_roundtrippable_4 :: Property
> prop_filenames_are_roundtrippable_4 =
>   forAll filenames $ \fn ->
>   joinFN_0 (splitFN_1 fn) == fn


> ----------------------------
> -- library functions


> iden0 :: Gen Char
> iden0 = oneof [ elements ['a'..'z'], elements ['A'..'Z']
>               , elements ['0'..'9'] ]
> idenN :: Gen String
> idenN = listOf iden0


> opt :: Gen String -> Gen String
> opt g = oneof [ g, return "" ]


> identifier :: Gen String
> identifier = iden0 >>= \i0 -> idenN >>= return . (i0:)

Finally, we have seen how QuickCheck works. It also helps us keep the APIs of our modules clean. An intuitive conclusion: if the properties of the code are complicated to state, the solution is to refactor until the code is clean.

Refactor

Refactoring can be applied liberally in Haskell: if the types are suitable, large-scale changes can be made safely. This scales to an entire code base. Pay attention to the type errors; they guide the change, and the refactoring is not complete until they are all resolved.

The following is a little tutorial from www.schoolofhaskell.com in which we compute the sum of the even numbers from a list. This is a first version of the solution.

evenSum :: [Integer] -> Integer
evenSum l = accumSum 0 l

accumSum n l = if l == []
                  then n
                  else let x = head l
                           xs = tail l
                       in if even x
                              then accumSum (n+x) xs
                              else accumSum n xs
-- The trace of the execution
*Main> evenSum [1..5]
accumSum 0 [1,2,3,4,5]
1 is odd
accumSum 0 [2,3,4,5]
2 is even
accumSum (0+2) [3,4,5]
3 is odd
accumSum (0+2) [4,5]
4 is even
accumSum (0+2+4) [5]
5 is odd
accumSum (0+2+4) []
l == []
0+2+4
0+6
6

This code can be improved. We could generalize the type as follows.

evenSum :: Integral a => [a] -> a

Next, we can use constructs like where and let to make accumSum local to evenSum.

-- Version 2
evenSum :: Integral a => [a] -> a
evenSum l = accumSum 0 l
    where accumSum n l =
            if l == []
                then n
                else let x = head l
                         xs = tail l
                     in if even x
                            then accumSum (n+x) xs
                            else accumSum n xs

Another improvement uses pattern matching and guards.

-- Version 3
evenSum :: Integral a => [a] -> a
evenSum l = accumSum 0 l
    where
        accumSum n [] = n
        accumSum n (x:xs)
            | even x    = accumSum (n+x) xs
            | otherwise = accumSum n xs

As you know, in Haskell, function definitions can be eta-reduced by dropping arguments that appear at the end of both sides of the equation. So, the following could be a final improvement.

-- Version 4
evenSum :: Integral a => [a] -> a
evenSum = accumSum 0
    where
        accumSum n [] = n
        accumSum n (x:xs) =
             if even x
                then accumSum (n+x) xs
                else accumSum n xs
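
Going one step further than these four versions (our addition, in the spirit of the same tutorial), the whole computation collapses into a pipeline of standard higher-order functions:

-- Version 5: keep the even numbers, then sum them
evenSum :: Integral a => [a] -> a
evenSum = sum . filter even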

Haskell-tools ( www.haskelltools.org ) is an automated refactoring tool for Haskell. It supports renaming, generating type signatures, generating and organizing exports, and extracting and inlining bindings. There is a simple demo on the official site.

Summary

In this chapter, you saw

  • large-scale design techniques in Haskell.

  • how time and space are profiled in Haskell.

  • an example QuickCheck test.

  • an easy example of refactoring in Haskell.
