© Alejandro Serrano Mena 2019
Alejandro Serrano MenaPractical Haskellhttps://doi.org/10.1007/978-1-4842-4480-7_2

2. Declaring the Data Model

Alejandro Serrano Mena1 
(1)
Utrecht, The Netherlands
 

You already know how to get a working installation of the Haskell Platform. The next step toward your Time Machine Store is to create the initial set of values and functions that will represent the data in the system: clients, machines, and orders.

This chapter will give you the basic ingredients for creating these values and functions. In a first approximation, you will create functions operating on basic types. You already know numbers, and you will add lists and tuples to the mix. Afterward, you will see how to create your own algebraic data types (ADTs) to better represent the kind of values you are interested in here. As part of this, you will learn about pattern matching, a powerful idiom to write concise code that follows closely the shape of the types.

Sometimes ADTs and pattern matching lead to code that’s not clear enough. Records introduce some syntactic forms that make values easier to create and modify, and they are a well-known tool of Haskell programmers. In addition, you will look at two design patterns that are common in Haskell libraries, namely, smart constructors and default values.

This chapter will also introduce how to manage projects using Cabal and Stack. In particular, you will see how to create a new project using both systems, along with the usual structure in folders, and how to load the code into the GHC interpreter to interactively test it.

Characters, Numbers, and Lists

Characters and numbers are universally accepted as the most basic kind of values that a language should provide to programmers. Haskell follows this tradition and offers dedicated character and number types that will be introduced in this section. Afterward, you will see how to put together several of these values to create strings or lists of numbers, as well as the basic operations you can perform on any kind of list.

Characters

In some programming languages, numbers are also used to represent characters, usually in some encoding such as ASCII or Unicode. But following its tradition of clearly separating different concerns of a value, Haskell has a special type called Char for representing character data. To prevent problems with locales and languages, a Char value contains one Unicode character. These values can be created in two ways.
  • Writing the character itself between single quotes, like 'a'.

  • Writing the code point, that is, the numeric value which represents the character as defined in the Unicode standard, in decimal between ' and ' or in hexadecimal between 'x and '. For example, the same 'a' character can be written as '97' or 'x61'.

Using GHCi, you can check the actual type of each expression you introduce in the system. To do so, you use the :t command, followed by the expression. Let’s check that characters indeed are characters.
Prelude> :t 'a'
'a' :: Char
Let’s now explore some of the functionality that Haskell provides for Chars. Only a few functions are loaded by default, so let’s import a module with a lot more functions, in this case Data.Char.
Prelude> import Data.Char
Prelude Data.Char>
The prompt of the interpreter changes to reflect the fact that now two different modules are loaded. Furthermore, if you now write to and press Tab, you will see a greater number of functions than before. In Haskell, everything has its own type, so let’s try to find out toUpper’s type.
Prelude Data.Char> :t toUpper
toUpper :: Char -> Char
The arrow syntax (shown as ->) is used to specify types of functions. In this case, toUpper is a function taking a character (the Char on the left side) and returning another one (because of the Char on the right side). Of course, types don’t have to be equal. For example, chr takes an integer and gives the character corresponding to that code point.
Prelude Data.Char> chr 97
'a'
Prelude Data.Char> :t chr
chr :: Int -> Char
For functions with more than one parameter, each argument type is separated from the next with a single arrow. For example, if you had a min function taking two integers and returning the smallest one, the type would be as follows:
min :: Integer -> Integer -> Integer
I mentioned in the previous chapter that Haskell is very strict at checking types. You can indeed verify this: if you try to apply the chr function to a character, the interpreter refuses to continue.
Prelude Data.Char> chr 'a'
<interactive>:7:5:
    Couldn't match expected type `Int' with actual type `Char'
    In the first argument of `chr', namely 'a'
    In the expression: chr 'a'
    In an equation for `it': it = chr 'a'

Numbers

In Chapter 1 you may have noticed that several kinds of numeric constants were used. Like most programming languages, Haskell supports a great variety of number types, depending on the width, precision, and support for decimal parts.
  • Int is the bounded integer type. It supports values between at least ±536870911, which corresponds to 229-1 (even though GHC uses a much wider range). Usually, values of the Int type have the native width of the architecture, which makes them the fastest.

  • Integer is an unbounded integral type. It can represent any value without a decimal part without underflow or overflow. This property makes it useful for writing code without caring about bounds, but it comes at the price of speed.

  • The Haskell base library also bundles exact rational numbers using the Ratio type. Rational values are created using n % m.

  • Float and Double are floating-point types of single and double precision, respectively.

Haskell is strict with the types. If you need to convert between different numeric representations, the functions fromInteger, toInteger, fromRational, and toRational will help you deal with conversions. For example, you can switch between rational and floating-point representations of values. The toRational function tries to create a Ratio not far from the original value (this depends on its width), and you can move from rational to floating-point by dividing the numerator by the denominator of the ratio. Be aware that many of these functions are found in the Data.Ratio module, so you should import it first.
Prelude> import Data.Ratio
Prelude Data.Ratio> 1 % 2 + 1 % 3
5 % 6
Prelude Data.Ratio> toRational 1.3
5854679515581645 % 4503599627370496
Prelude Data.Ratio> toRational (fromRational (13 % 10))
5854679515581645 % 4503599627370496
As you can see from the examples, perfect round-tripping between rational and floating-point values is not always possible. You may also get a puzzling result if you try to find the type of numeric constants.
Prelude> :t 5
5 :: Num a => a
Prelude> :t 3.4
3.4 :: Fractional a => a

Instead of making a numeric constant of a specific type, Haskell has a clever solution for supporting constants for different types: they are called polymorphic. For example, 5 is a constant that can be used for creating values of every type supporting the Num type class (which includes all types introduced before). On the other hand, 3.4 can be used for creating values of any type that is Fractional (which includes Float and Double but not Int or Integer). You will read in detail about type classes in Chapter 4, but right now you can think of a type class as a way to group sets of types that support the same operations. They share many commonalities with interfaces commonly found in object-oriented languages, and are close relatives of Scala’s traits and Swift’s protocols.

Caution

Since Haskell doesn’t use parentheses in function invocations, that is, you write f a b instead of f(a,b), you must be a bit more careful than usual when using negative numbers. For example, if you write atan -4 in GHCi, you will get an error indicating

Non type-variable argument in the constraint (Num (a -> a))

This means it has interpreted that you are trying to compute the subtraction of atan and 4. To get the arctangent of -4, you should instead write atan (-4).

Strings

After playing for some time with characters, you may wonder whether you can have a bunch of them together, forming what is commonly known as a string. The syntax for strings in Haskell is similar to C: you wrap letters in double quotes. The following code creates a string. If you ask the interpreter its type, what do you expect to get back?
Prelude Data.Char> :t "Hello world!"
"Hello world!" :: [Char]

Instead of some new type, like String, you see your old friend Char but wrapped in square brackets. Those brackets indicate that "Hello world!" is not a character but a list of characters. In general, given a type T, the notation [T] refers to the type of all lists whose elements are of that type T. Lists are the most used data structure in functional programming. The fact that a type like a list depends on other types is known as parametric polymorphism, and you will delve into the details of it in the next chapter. Right now, let’s focus on the practical side.

Lists

List literals (i.e., lists whose values are explicitly set into the program code) are written with commas separating each of the elements, while wrapping everything between square brackets. As I have said, there’s also special string syntax for a list of characters. Let’s look at the types of some of these literals and the functions reverse, which gives a list in reverse order, and (++), which concatenates two lists.
Prelude> :t [1,2,3]
[1, 2, 3] :: Num t => [t]
Prelude> :t reverse
reverse :: [a] -> [a]
Prelude> :t (++)
(++) :: [a] -> [a] -> [a]
Prelude> reverse [1,2,3]
[3,2,1]
Prelude> reverse "abc"
"cba"
Prelude> [1,2,3] ++ [4,5,6]
[1,2,3,4,5,6]

Notice from this example that there are functions, such as reverse and (++), that can operate on any kind of list. This means once you know them, you can apply your knowledge of them to any list (including strings of characters). To tell this fact, these functions show in its type a type variable. It is a variable because it can be replaced by any type because regular variables can take different values. Type variables must be written in code starting with lowercase letters, and they consist usually of one or two letters. Here, the type variable is shown as a.

Note

Functions whose names are built entirely by symbols, like ++, must be called using the so-called infix syntax. That is, they should be written between the arguments instead of in front of them. So, you write a ++ b, not ++ a b. In the case where you want to use the function in the normal fashion, you must use parentheses around its name. So, you can write (++) a b, meaning the same as a ++ b.

Lists in Haskell are homogeneous: each list can handle elements of only a single type. Because of that, you are forbidden to create a list containing integers and characters and also to concatenate two lists with different kinds of elements.
Prelude> [1,2,3,'a','b','c']
<interactive>:13:2:
    No instance for (Num Char) arising from the literal `1'
Prelude> "abc" ++ [1,2,3]
<interactive>:11:11:
    No instance for (Num Char) arising from the literal `1'
Like in most functional languages, lists in Haskell are linked lists. Such lists are composed of a series of cells that hold the values in a list and a reference to the next cell and a special marker for the end of the list. The basic operations to construct lists are [] (pronounced “nil”) to create an empty list and (:) (pronounced “cons”) to append an element to an already existing list. That is, elt:lst is the list resulting from putting the value elt in front of the list lst. So, list literals can also be written as follows:
Prelude> 1 : 2 : 3 : []
[1,2,3]
Prelude> 'a' : 'b' : 'c' : []
"abc"

Note how GHCi writes back the lists using the most common representation using brackets. In the case of lists of characters, it uses string notation.

The functions that get information about the shape and the contents of the list are null, to check whether a list is empty; head, to get the first element; and tail, to get the list without that first element, also known as the rest of the list. Here are some examples of applying these functions:
Prelude> null [1,2,3]
False
Prelude> null []
True
Prelude> head [1,2,3]
1
Prelude> tail [1,2,3]
[2,3]
Prelude> head []
*** Exception: Prelude.head: empty list
Figure 2-1 shows a graphical representation of the operators and functions on lists I have talked about. The (:) operator is used to bind together an element with the rest of the list, and you can split those elements apart again using head and tail. You can also see how a list is a series of cons operations that always end with the empty list constructor, [].
../images/316945_2_En_2_Chapter/316945_2_En_2_Fig1_HTML.jpg
Figure 2-1

Graphical representation of list constructors and destructors

If you try to get the head or the tail of an empty list, you get an error, as you may expect. Be aware that exceptions are not the preferred way to handle errors in Haskell (you will see why in more detail in subsequent chapters) and by default make the entire program crash when found. To prevent errors from operations on empty lists, just be sure to check for nonemptiness before applying functions such as head and tail (or even better, use pattern matching, which will be introduced shortly).

In fact, looking at the output of null, you may have noticed two new values I talked about before: True and False. These are the only two elements of the Bool type, which represent Boolean values. Several standard functions for combining these two values (and (&&), or (||) and not) are provided in the Prelude. Most programming languages originating from C, such as C++ and Java, inherit from the former two kinds of Boolean operators. You’ll find long-circuiting (& and |) operators, which always evaluate both sides of the expression, and short-circuiting (&& and ||) operators, which may stop after evaluating only one side. In Haskell, because of its lazy evaluation model, these operators always perform their job in the short-circuiting manner. Apart from that, there exist and and or functions that take a list of Booleans and perform the operations.
Prelude> (True && False) || (False && not False)
False
Prelude> or [True, False, and [False, True, True]]
True
Prelude> (2 == 2.1) || (2 < 2.1) || (2 > 2.1)
True

Caution

The usual warnings about comparing floating-point values apply here. Computers are not able to represent with exact precision all the values, so you may find that equalities that you expect not to hold actually do. For example, in my system the expression (4.00000000000000003 - 4) == 0 evaluates to True.

Along with these functions, another important construction related to Booleans is if-then-else. An expression with the form if b then t else f evaluates to t if the value of b is True, and it evaluates to f otherwise. This structure looks similar to the one found in imperative languages but has these important differences:
  • Both then and else branches must be present along with the if. If this were not the case, then the expression wouldn’t be evaluable for some of the values of b. Other languages opt to return a default value for the nonexistent else, but Haskell makes no commitment.

  • The entire expression must have a defined type. The way Haskell manages to ensure that is by forcing both t and f expressions to have the same type. Thus, an expression such as if True then 1 else "hello" won’t be accepted by either the compiler or the interpreter.

To make real use of if expressions, you need functions that return type Bool. This includes the comparison functions between numbers: == (equality), /= (inequality, but be aware that this function has a different name than in C and Java, where it’s called !=), >= (greater than or equal to), > (greater than), <= (less than or equal to), and < (less than). The following is an example of an if expression:
Prelude> if 3 < 4.5 then "3 is less than 4.5" else "3 is not less than 4.5"
"3 is less than 4.5"
Let’s make the interpreter return the head of a list of strings if it is not empty or return "empty" otherwise.
Prelude> :{
Prelude| if not (null ["hello","hola"])
Prelude| then (head ["hello","hola"]) else "empty"
Prelude| :}
"hello"
Prelude> if not (null []) then (head []) else "empty"
"empty"
Lists can contain other lists as elements (or to any level of nesting). As [T] are lists of type T, lists of lists would be [[T]]. The inner lists inside the outer lists need not be of the same length (so they are not equivalent to arrays of multiple dimensions). One important thing to remember is that an empty list can be a member of a larger list of lists, so [] and [[]] are not equivalent. The first is a completely empty list of lists, whereas the second is a list that contains only one element, which is an empty list.
Prelude> :t [['a','b','c'],['d','e']]
["abc","de"] :: [[Char]]
Prelude> head [['a','b','c'],['d','e']]
"abc"
Prelude> head (head [['a','b','c'],['d','e']])
'a'
Prelude> head [[]]
[]

For sure you have become bored while typing more than once the same constant list in the interpreter. To overcome this, you will learn about the essential ways to reuse functionality across all programming languages: defining functions that work on different input values and creating temporal bindings. But before that, Exercise 2-1includes some tasks to see whether you have understood the concepts up to this point.

Exercise 2-1. Lists Of Lists

I have covered a lot of material about the most basic types and expressions in Haskell. The following tasks exercise the knowledge you have gained so far. In all cases, the solutions are expressions that can be typed in the interpreter to check whether they work.
  • Rewrite the previous list literals using only (:) and the empty list constructor, [].

  • Write an expression that checks whether a list is empty, [], or its first element is empty, like [[],['a','b']].

  • Write an expression that checks whether a list has only one element. It should return True for ['a'] and False for [] or ['a','b'].

  • Write an expression that concatenates two lists given inside another list. For example, it should return "abcde" for ["abc","de"].

Use GHCi to check that those expressions work as required.

Creating a New Project

You can create a new project through Cabal and Stack, the main tools for packaging and building systems for Haskell projects. The advantage of using those tools is that they have been especially tailored for Haskell and its package repository, Hackage. In addition, the Cabal description file saves interesting metadata about the project, such as its name, maintainer, and license. In this section you will see how to use both Cabal and Stack from the command line. Feel free to change between them because the project structures are fully compatible.

Creating a Project with Cabal

If you want to create a project using the command line, the first thing to do is to create the folder where the files will reside, usually named the same as the package name. Then move inside the folder in the shell (usually by issuing a series of cd commands) and run cabal init. You will need to answer some questions, as shown here:
$ cd path/to/my/haskell/projects
$ mkdir chapter2
$ cd chapter2
$ cabal init
Package name? [default: chapter2]
Package version? [default: 0.1.0.0] 0.0.1
Please choose a license:
   ...
Your choice? [default: (none)]
Author name? Alejandro Serrano
Maintainer email? [email protected]
Project homepage URL? http://my-web-page.com
Project synopsis? Project example for Chapter 2
Project category:
 * 1) (none)
   ...
Your choice? [default: (none)]
What does the package build:
   1) Library
   2) Executable
   3) Library and Executable
Your choice? 1
Source directory:
 * 1) (none)
   2) src
   3) Other (specify)
Your choice? [default: (none)] 2
... -- More not very interesting questions
Include documentation on what each field means (y/n)? [default: n]

Note

You might receive a warning about cabal update. Don’t worry, we will download a list of packages shortly, after I introduce how to add dependencies to a Cabal project.

The most important answers to give are the package name and whether you want to create a library or an executable, because what you create affects the name and structure of the project file. The essential difference between a library and an executable project is whether a final program will be produced (in the latter case) or the code is just for consuming other libraries or executables. Right now, it does not matter which one you choose because you will be testing the code using the GHC interpreter. Furthermore, you can refine the project later to add more library or executable descriptions.

Because having all the files in the root of the project makes them difficult to manage, it’s customary to create a folder to hold all the source files of a project, as it is done in other build tools such as Maven for Java. I strongly recommend placing your files in a src folder, as shown in the project initialization above.

Creating a Project with Stack

The creation of a new project in Stack follows a very similar structure. In contrast to Cabal, you do not have to create the project folder before issuing the corresponding command, and instead of init you use new:
$ cd path/to/my/haskell/projects
$ stack new chapter2
Downloading template "new-template" to create project "chapter2" in chapter2...
...
Downloaded lts-12.18 build plan
...
Updating package index Hackage
Update complete
Populated index cache.
Matches lts-12.18
Selected resolver: lts-12.18
Initialising configuration using resolver: lts-12.18
Total number of user packages considered: 1
Writing configuration to file: chapter2stack.yaml
All done.

Stack asks much fewer questions. It is your further responsibility to change the author name, maintainer e-mail, and subsequent fields to the correct value.

There is another possibility to initialize a project using Stack. If you already have a Cabal file, maybe because you have created it previously, you can accommodate it for using Stack by running the command stack init. The only visible difference is the creation of a stack.yaml file in the root of the project.

Exercise 2-2. Your First Project

Create a new library project called chapter2 using either of the methods explained so far.

When doing Exercise 2-2, a pair of files named Setup.hs and chapter2.cabal will be created in the folder. The file Setup.hs is not useful, so you will focus on the .cabal file you have just created. The name of this file always coincides with the name of the package you are developing.

A Cabal project file is composed of a series of package properties followed by several blocks of code, called stanzas in Cabal terminology, that define the components (libraries and executables) to be built, the source files making each of them, and the options for compilation (such as flags or enabled extensions). If you are familiar with the JSON format or with Python code, you will find Cabal syntax comfortable to read and interpret. The following are the two important rules of interpretation:
  • Each property is given a value in the form name: value. The name is case-insensitive (it doesn’t matter whether you write name, Name, or nAmE), and the value is written without any kind of quotes or marks. If the value is a list, the elements are separated by commas.

  • Stanzas begin with a header, usually library or executable, followed by an application name. Be aware that there is no colon (:) after the header. All properties within the stanza must be indented an equal number of spaces or tabs.

For example, here is an extract of a possible Cabal file created after initializing a project as required by Exercise 2-2:
name:           chapter2
version:        0.1
cabal-version:  >=1.2
build-type:     Simple
author:         John Doe
library
  hs-source-dirs:  src
  build-depends:   base >= 4
  ghc-options:     -Wall

Understanding Modules

You build Haskell projects by writing what are termed modules. Each module contains a set of definitions, such as functions and data types, and groups them under a common umbrella. The names of modules are nested in a hierarchical fashion. For example, inside Data there are a bunch of different modules, like Data.Bool, Data.Ratio, and so forth. This nesting makes modules similar to packages in Java or to namespaces in C#.

You define each module in its own file. The file name should be equal to the last component of the module name (the part after the last dot) and must be nested in folders named like the rest of the components. For example, you would create a module named Chapter2.Section2.Example in the path Chapter2/Section2/Example.hs. At the source directory of your project (which is src is you have followed the instructions above), create a folder named Chapter2. Inside it, create another folder named Section2. Finally, inside Section2 create the Example.hs file.

Changing The Source Directory

You can always choose another source directory by adding a property
library
  hs-source-dirs:  src

to each of the stanzas in the Cabal file. In fact, you can use different source folder for each stanza, which helps us keeping files from libraries, executables, and tests apart.

Always begin a module file with a module declaration giving its name. For example, you would begin the Example.hs module just mentioned by writing the following line:
module Chapter2.Section2.Example where

Then, you can start writing the definitions for that module.

To tell Cabal to compile a module file, you must include that module in some stanza. To do so, include a new property under the stanza adding the module either to the exposed-modules property or to the other-modules property (the difference is that when using your library from another project, only exposed modules will be available; the others remain as internal). Here’s an example:
library
  exposed-modules: Chapter2.Section2.Example
  -- or
  other-modules:   Chapter2.Section2.Example

If you are using the command line, you can now compile the project by running cabal new-configure and then cabal new-build, or stack setup and then stack build, depending on your choice of tool. At this point you shouldn’t encounter any compiling errors.

New- Commands in Cabal

At the moment of writing, Cabal is undergoing an internal reorganization. For that reason, it keeps two sets of commands: those starting with the new- prefix (like new-build), and the older ones which do not start like that (e.g., build). Whenever possible, use the former set of commands, because it provides several benefits such as automatic sandboxing of projects.

In summary, to add a new module to your project, you follow these steps:
  1. 1.

    Choose a name for the module, for example A.B.C.

     
  2. 2.

    Create a folder for each component of its name but the last one, in this case a folder A inside a folder B.

     
  3. 3.

    Create a file with the same name of the last component ending in .hs (here C.hs) and write the module declaration you saw earlier.

     
  4. 4.

    Tell Cabal to include the file in your project.

    Note From now on, create a new project for each chapter in the book. Create a new module or set of modules for each section. This convention will help keep your work organized.

     

Cabal and Stack

The Haskell ecosystem has not one but two tools for building projects and managing their dependencies. A fair question to ask is what the differences between them are. In general, Stack is focused on having reproducible builds, whereas Cabal encompasses many more usage scenarios.

The first point of divergence between the two tools is that Stack manages your Haskell installation (including the compiler), whereas Cabal does not. Each Stack project comes with a stack.yaml file in addition to the .cabal one which declares which version of the compiler is targeted. If that specific version is not present in the system, Stack would download and install it in a local directory.

The other main difference is the source of the dependencies declared by each project. Cabal by default uses Hackage, the community-maintained repository of packages. This provides access to every single package in the Haskell ecosystem, but there is no guarantee that a specific combination of packages will work (or even compile) together.

Stack, on the other hand, targets Stackage by default. In Stackage, packages are grouped as resolvers, which specify not only an available set of packages, but also their specific versions. Each of those sets is known to compile together in a specific version of the compiler. Thus, by declaring that your project uses a certain resolver, you are fixing the version of every tool and package, leading to reproducible builds. The downside is that Stackage provides a smaller set of packages than Hackage, although there are ways to declare that some dependency ought to be obtained from the bigger brother.

If you are in doubt of which tool to use, don’t worry and start with any. As I discussed above, both share the same package description format, so changing from one to the other is fairly easy.

Defining Simple Functions

Now you are going to start creating functions in a module file. Function declarations include the following:
  • A name, which in Haskell always starts with a lowercase letter

  • The list of parameters, each of which must also begin with a lowercase letter, separated from the rest by spaces (not by commas, like in most languages) and not surrounded by parentheses

  • An = sign and the body of the function

Creating a Simple Function

Let’s try to abstract the last function created in the earlier section “List operations.” Given a list of strings, that function returns either the first string in the list or the string "empty" if there is nothing in the list. You can reuse most of the expression, replacing the constant lists by the parameter name.
firstOrEmpty lst = if not (null lst) then head lst else "empty"
To test the function, first create a new module Chapter2.SimpleFunctions for holding it. Then, load the file in the interpreter by issuing the command :l followed by the entire path to the file. Afterward, you can call firstOrEmpty directly.
Prelude> :l src/Chapter2/SimpleFunctions.hs
[1 of 1] Compiling Chapter2.SimpleFunctions ( src/Chapter2/SimpleFunctions.hs, interpreted )
Warning: Top-level binding with no type signature:
      firstOrEmpty :: [[Char]] -> [Char]
Ok, modules loaded: Chapter2.SimpleFunctions.
*Chapter2.SimpleFunctions> firstOrEmpty []
"empty"
*Chapter2.SimpleFunctions> firstOrEmpty ["hello","hola"]
"hello"

You surely have noticed that loading the file has resulted in a warning. This warning tells you that you have given no type signature, that is, that you haven’t specified the type of the function.

Specifying the Function’s Type

I emphasized in Chapter 1 that Haskell is a strong, statically typed language, and now you are writing functions without any kind of type annotation. How is this possible? The answer is in the same warning message: you didn’t tell anything to Haskell, and it inferred the correct type for the function. Type inference ( i.e., the automatic determination of the type of each expression based on the functions and syntax construct being used) is a key point that makes a strong type system such as Haskell’s still manageable to developers. This is a big contrast with other programming languages, such as Java and C#, which until their last revisions asked developers to write the types of all variables in the code.

However, it’s not considered good practice to leave a function definition without an annotation about its type. That’s the reason why a warning shows up even when the interpreter was able to realize the type of the function. The way to solve this is by adding a type signature: the name of the function being defined followed by :: and its type. Type signatures are conventionally added just before the definition of the element being typed. Being reminded that function types are written using ->, you can see that the type signature for firstOrEmpty is as follows:
firstOrEmpty :: [[Char]] -> [Char]
firstOrEmpty lst = if not (null lst) then head lst else "empty"

Developing a Robust Example

Now you’ll try to define your own functions for concatenating and reversing a list, which you will call (+++) and reverse2, respectively. A general way to define functions over lists (and most of the other data structures) in Haskell is by using recursion. In this case, defining a function by recursion boils down to considering these two general cases:
  • What to do when the list is empty

  • What to do when the list has some initial element and some tail

The basic skeleton is the same in both cases:
if null list
then <case for empty list>
else <do something with (head list) and (tail list)>
Let’s start with the concatenation function. First, because of its symbolic name of (+++), you have to write the name infix. So, in the definition, you will write the following:
lst1 +++ lst2
Remember the two general cases from the earlier list. Now that you are implementing a specific function, those cases can be stated in more specific terms.
  • When concatenating an empty list with any other list, just return the second list because the first one adds no elements.

  • When having a nonempty list and appending it to a second list, you have to think about what to do with the head and tail of the first list. Using recursion, you can call (+++) to append the tail of the first list and the second one. The return value from this call will be the list you need, but without the first element. To solve this problem, you can just plug the head of the first list using the (:) operator.

When this definition is translated into code, the result is as follows:
lst1 +++ lst2 = if null lst1  {- check emptyness -}
                then lst2      -- base case
                else (head lst1) : (tail lst1 +++ lst2)

This example also showcases for the first time the use of comments in Haskell code. Comments can include any kind of text and are completely ignored by both the interpreter and the compiler (although some tools like Haddock get information from the comments). As in many programming languages, there are two kinds of comments in Haskell. The first one is a multiline comment, which spans from {- to the nearest -}. Multiline comments are not affected by carriage returns like single-line comments are. Single-line comments span from -- to the first newline symbol found in the source code.

If you have problems understanding this recursive definition, I encourage you to try applying it to some small lists. For example, the following are the steps when evaluating [1, 2] +++ [3, 4]:
  • The initial expression comes in as [1,2] +++ [3,4].

  • It evaluates recursively to 1:([2] +++ [3,4]).

  • That evaluates recursively to 1:(2:([] +++ [3,4])).

  • The first list is now empty, so the recursion ends by returning lst2 with 1:(2:[3,4]).

  • The colon operators simply append list items. Thus, 2:[3,4] evaluates to [2,3,4], and so forth.

  • The final result is [1,2,3,4].

From now on, you will go through traces of execution often. To make the examples more concise, the book will use the convention of showing the steps separated by the => symbol. Here’s what that looks like for the previous example:
[1,2] +++ [3,4] => 1:([2] +++ [3,4]) => 1:(2:([] +++ [3,4]))
                => 1:(2:[3,4]) = [1,2,3,4]
Now let’s move on to the reverse2 function. Once again you will follow the methodology of separating the work by the possible cases of a list to be passed as input. Reversing an empty list is quite easy: you simply return an empty list. To reverse a list with some number of elements, you could take the following approach:
  1. 1.

    Reverse the tail of the list.

     
  2. 2.

    Concatenate the head of the list to the end of the reversed tail.

     

The recursion occurs in step 1. Reversing the tail of a list means to reverse a list that is shorter by one element than the original input list. That shorter-by-one list is passed to the reversal function, creating yet another list, shorter by one more element. This process continues until the tail becomes empty.

Since you have no direct way to add elements at the end of a list, you will use the (+++) function just defined to concatenate a list with a single element. The result in this case is as follows:
reverse2 list = if null list
                then []
                else reverse2 (tail list) +++ [head list]

I mentioned in Chapter 1 that a useful feature of the Haskell ecosystem is the ability to interactively test functions. Exercise 2-3 describes the steps you should follow for the functions in this section.

Exercise 2-3. Testing Functions

Load the file where you defined the functions into GHCi and call them with different arguments to test them. Based on the warnings that appear, add type signatures to your code.

Returning More Than One Value

You are moving toward defining larger functions. The next one will compute the maximum and minimum of a list of numbers. The first question you may have is, how can I return more than one value in a function? In other programming languages, doing so would require defining some kind of structure or data type to hold the result. Doing this is a valid approach in Haskell, but for easy cases like this one you can use a built-in type called the tuple. A tuple is just a type with a fixed number of components, each of them holding a value, not necessarily of the same type. Tuple values are written between parentheses and separated by commas, and the same notation is used for tuple types. For example, the following code creates a tuple with two elements; the first one is just the string "hello", and the second one is the result of evaluating a numeric condition:
Prelude> :t ("hello", True, if 2 > 3 then 'a' else 'b')
("hello", True, if 2 > 3 then 'a' else 'b') :: ([Char], Bool, Char)

Warning

Tuple types of different lengths are completely different types. For example, a function working on tuples in the form (a,b) cannot be applied to tuples such as (a,b,c) that have some other number of values.

Right now you will work only with pairs, that is, tuples of two components. For those tuples, there are two destructor functions: fst gives the first component, and snd gives the second one. Now you have all the ingredients to create a function computing both a maximum and a minimum of a list. If you forget for now the case of empty lists that don’t have a well-defined maximum or minimum, you can proceed again by cases. The first case is the list with a single element, and that element should be returned as both the maximum and the minimum and thus in both components of the tuple. If the list has more than one element, you can get the maximum and minimum of the tail of the list and then compare those values with the head. Thus, the recursive solution looks as follows:
maxmin list = if null (tail list)
              then (head list, head list)
              else ( if (head list) > fst (maxmin (tail list))
                     then head list
                     else fst (maxmin (tail list))
                   , if (head list) < snd (maxmin (tail list))
                     then head list
                     else snd (maxmin (tail list))
                   )
Wow! Somehow a function for such an easy task has become completely incomprehensible and unmaintainable: the code is full of repetition, and even worse, maxmin (tail list) is recomputed four times per recursive call, which is not very performant. The solution is to use a local binding, which gives a name to an expression to be used in a larger one. There are two kinds of binding constructs in Haskell: let and where . In both cases, a binding is introduced by name = expression. The difference lies in the position over the main expression: let introduces bindings before the main expression and must end with the in keyword. On the other hand, where does so after the expression. The following code rewrites the previous code by using local bindings to refer to the head of the list and the return values of the recursive case:
maxmin list = let h = head list
              in if null (tail list)
                 then (h, h)
                 else ( if h > t_max then h else t_max
                      , if h < t_min then h else t_min )
                      where t = maxmin (tail list)
                            t_max = fst t
                            t_min = snd t
The special position of the code in all of these examples is not random or just aesthetic, as you have noticed if you’ve tried to copy the code by hand into an editor. A first guess about the reason may lead you to think about indentation-sensitive languages such as Python. However, Haskell uses a different solution, called layout. In a layout-based syntax, how a line is indented isn’t as important as the fact that all elements in the same block start in the same column. Here’s an example:
  • In an if block, the lines for then and else must be indented the same way.

  • In a let or a where block, all local bindings must start in the same position.

    Note When reading Haskell code, you will notice that Haskellers also tend to align other symbols, like the = signs in a local bindings block. The layout rule applies only to the beginning of expressions, so alignment is not enforced. However, it’s a common convention that you should follow or at least get used to.

As a final remark, Haskell also allows you to group blocks with { and } and separate expressions with ;. For example, you can rewrite the last where clause in the example as follows:
where { t = maxmin (tail list) ; t_max = fst t ; t_min = snd t }

Be aware that this kind of syntax is highly discouraged when writing new code (it is typically used in cases where Haskell code is produced automatically by some other program).

Working with Data Types

Haskell provides tuples to group a fixed number of components of different types and lists to contain an unlimited number of elements of a homogeneous type. It seems that this is enough to start modeling the data for the web application. For example, a client named Paul, age 25 and buyer of two time machines, could be represented as follows:
("Paul", 25, ["Super Time Machine 2013", "Medieval Machine"])
There are two problems with using this approach. First, code is difficult to read because of nested calls to fst, snd, and head. Second, it defies strong typing because the compiler cannot distinguish a client from, say, the description of a fish with its common name, its length, and a list of seas where it is found. The solution is to introduce a new data type specific for representing clients. The most basic kind of data type that you can create in Haskell is called an algebraic data type (ADT ) and will be the focus of this section. An ADT is defined by two pieces of data.
  • A name for the type that will be used to represent its values.

  • A set of constructors that will be used to create new values. These constructors may have arguments that hold values of the specified types.

In many languages, different constructors can be defined for a data type (or a class, if you are working on an object-oriented language). However, these constructors are somehow linked and tend to be more like shortcuts for default values. In most functional languages, such as Haskell, different constructors are used to represent completely different alternatives to construct values.

To make these ideas clear, let’s begin modeling clients. There are three kinds of clients, listed here:
  1. 1.

    Government organizations, which are known by their name

     
  2. 2.

    Companies, for which you need to record a name, an identification number, a contact person, and that person’s position within the company hierarchy

     
  3. 3.

    Individual clients, known by their name, surname, and whether they want to receive further information about offers and discounts

     
The way to represent these three client types in Haskell is as follows:
data Client = GovOrg     String
            | Company    String Integer String String
            | Individual String String Bool

As you can see, the syntax for declaring data types starts with the data keyword, followed by the type name. After that, constructors are listed, separated by |. Each of them starts with a constructor name and then the types of the arguments to that constructor.

Capitalization in Haskell

One of the special characteristics of Haskell syntax is that names given by the user must follow some capitalization rules. Here is a brief summary of the conventions:
  • Functions, parameters, and bindings must start with a lowercase letter. In the case of an operator name, it must not start with :.

  • Types, constructors, type classes, and kinds must start with an uppercase letter. If using an operator name, it must start with the : symbol.

These rules make it easier to determine the kind of element you are looking at.

Using constructors, you can create values of type Client by just writing the constructor name and the value for each of the parameters in the order in which they appear in the declaration.
*Chapter2.DataTypes> :t GovOrg "Nasa"
GovOrg "Nasa" :: Client
*Chapter2.DataTypes> :t Company "Pear Inc." 342 "Mr. Sparrow" "CEO"
Company "Pear Inc." 342 "Mr. Sparrow" "CEO" :: Client
But when you try to print the values, something goes wrong.
*Chapter2.DataTypes> Individual "Jack" "Smith" True
    No instance for (Show Client) arising from a use of `print'
    Possible fix: add an instance declaration for (Show Client)
    In a stmt of an interactive GHCi command: print it
To show the values on the screen, the interpreter internally calls a print function over them. However, you haven’t written the corresponding code for this data type, so an error arises. To fix this problem, you can use a facility in Haskell called automatic deriving that allows you to add some functionality to an ADT without writing any code. In this case, you want to be able to get a string representation of the values, so you need to derive Show. Show is a type class: implementing it means that there’s a way to get a string out of any value of this type. You can write the code yourself, or you can allow Haskell to write it for you. The following example specifies deriving Show, causing Haskell to generate it automatically:
data Client = GovOrg String
            | Company    String Integer String String
            | Individual String String Bool
            deriving Show
Now the interpreter can display the values on the screen.
*Chapter2.DataTypes> Individual "Jack" "Smith" True
Individual "Jack" "Smith" True
There’s no impediment when using one ADT that you define inside another one. For example, in the previous code, there are some divergent options for representing a person as a member of a company and as an individual. One path you can take is to define a completely new data type called Person and use it inside Client.
data Client = GovOrg     String
            | Company    String Integer Person String
            | Individual Person Bool
            deriving Show
data Person = Person String String
            deriving Show
Here are some key points regarding this refactoring:
  • If you tried to create a completely new ADT, for example, named Client2, but you used the same constructor names, you would get a build error. This is because inside a module all constructors must have different names. If you think about it, it’s sensible to ask for that condition because otherwise the compiler wouldn’t be able to distinguish which type you are trying to create.

  • Data types and constructor names live in different worlds. That means it is possible to create a constructor with the same name as a data type. Indeed, it’s a common convention for one-alternative types, such as Person, to have two names that coincide.

  • To be able to use the default deriving functionality, all types used inside another one must be showable. For example, if you didn’t include deriving Show in Person, a compilation error would be signaled.

Sometimes you are just interested in the alternatives themselves, without saving any extra information apart from the constructors. For example, you could add gender information for people. Instead of using a raw Boolean value, for which you can forget which value corresponds to men and which to women, you can create a new Gender data type. This kind of data type with empty alternatives is similar to enumerations in other languages.
data Gender = Male | Female | Unknown

Exercise 2-4 provides a step-by-step recipe on how to integrate this new Gender data type in the existing code base and how to modify the existing functionality for covering it. In the following sections I assume that the Gender type has been defined.

Exercise 2-4. More Types of Values

You have just defined a new Gender data type. The reason you defined it was to include such information in a Person record, so you should add a new field in Person.
  • Add a Gender argument to Person and make it Showable.

  • Create new values of the new Client data type with the enhanced definition you worked with throughout this section.

You have learned how to define new data types, so it’s time to look at other types that could be useful for the Time Machine Store. Time machines are defined by their manufacturer, their model (which is an integer), their name, whether they can travel to the past and to the future, and a price (which can be represented as a floating-point number). Define a TimeMachine data type holding that information. Try to use more than one ADT to structure the values.

Pattern Matching

Now it’s time to define functions over your shiny new data types. The bad news is that I haven’t taught you how to extract the information from the constructors because you have been taught to use head and tail for lists and to use fst and snd for tuples. The general solution for this task is pattern matching . Matching a value against a pattern allows you to discern the structure of the value, including the constructor that was used to create the value, and to create bindings to the values encoded inside it. When entering the body of the match, the pattern variables will hold the actual inner values, and you can work with them.

Simple Patterns

To see a first example, let’s create a function giving the name of a client. In the case of a company or a government organization, the client name will be the first component of the constructor. In the case of an individual, you will have to look inside Person and concatenate the first and last names. As you can see, the patterns in this case look exactly like the ADT constructors but with the parameters replaced by bindings:
clientName :: Client -> String
clientName client = case client of
                      GovOrg  name                -> name
                      Company name id person resp -> name
                      Individual person ads       ->
                          case person of
                            Person fNm lNm gender -> fNm ++ " " ++ lNm

Let’s see how the execution of a call to clientName (Individual [Person "Jack" "Smith" Male]) False proceeds . First the system finds a case expression. So, it tries to match with the first and second patterns, but in both cases the constructor is not the same as the value. In the third case, the system finds the same constructor, and it binds the values: person now holds Person "Jack" "Smith" Male, and ads holds the value False. In the body of the match, there’s again a case expression, from which a match is done to the Person constructor, binding fNm to "Jack", lNm to "Smith", and gender to Male. Finally, the system proceeds into the innermost body and executes the concatenation, giving “Jack Smith” as the result.

Note

When loading this definition into the interpreter, you will receive a collection of warnings that look like:

Defined but not used: `id'

This tells you that you created a binding that was not used in the body of the match. The solution for this warning is telling the compiler that you won’t use that binding in your code, and this is done by replacing its binding variable by a single underscore, _. For example, the nonwarning pattern for Company name id person resp would have been Company name _ _ _ because you are using only the first pattern variable in the subsequent matching code.

For this example, I have used the simplest kind of match, which just looks at the constructors and binds the values of the parameters. But you can specify more complex patterns, in which some inner parts of the value will have to match also against other patterns. Using this approach, you can rewrite the match in clientName to be shorter, as shown here:
clientName :: Client -> String
clientName client = case client of
                      GovOrg  name       -> name
                      Company name _ _ _ -> name
                      Individual (Person fNm lNm _) _ -> fNm ++ " " ++ lNm
One important question that arises here is, what happens if no pattern matches the value that is given? The best way to find this out is by an easy example. Let’s consider a companyName function, shown here:
companyName :: Client -> String
companyName client = case client of
                       Company name _ _ _ -> name
The interpreter already warns about the pattern not covering all the cases, that is, not being exhaustive.
Warning: Pattern match(es) are non-exhaustive
In a case alternative:
    Patterns not matched:
        GovOrg _
        Individual _ _
Applying the function to a value that is not expected yields an exception. This is similar to what happens if you try to get the head of an empty list.
*Chapter2.DataTypes> companyName (GovOrg "NATO")
"*** Exception: Non-exhaustive patterns in case
The functions that are not defined over the complete domain of their arguments are called partial (the other side of the coin are the total functions). In some cases, a default value can be returned when you don’t get an applicable value (for example, returning "unknown" in companyName if the input is not a Company). However, this problem is so common in practice that the Haskell Platform already bundles a special data type for this matter: Maybe T. As lists and tuples, the Maybe type is parameterized by the type of value it holds, so you have Maybe Integer, Maybe String, Maybe [Integer], and so on. There are only two kinds of values that this type can have: Nothing, with no arguments, usually signaling that the function doesn’t have nothing sensible to return for that specific value; and Just v, which holds a single value v of the corresponding type. Let’s rewrite companyName.
companyName :: Client -> Maybe String
companyName client = case client of
                       Company name _ _ _ -> Just name
                       _                  -> Nothing
One interesting fact is that you can pattern match directly on let and where bindings. In that case, you can handle only one pattern, but it is useful when you know that only one kind of value can happen in a specific place. Let’s say you are sure at some point that the client you are working with is a company. Instead of this not very clear code:
let name = case companyName client of
             Just n -> n
You can write the following much more concise version:
let Just name = companyName client
Constants are also patterns, which match the exact values written. Let’s focus on an archetypical example for teaching programming languages: the Fibonacci numbers. The nth Fibonacci number, F(n), is defined as F(0) = 0, F(1) = 1, and F(n) = F(n-1) + F(n-2) for a larger value of n. This is easily expressed in terms of patterns and recursion.
fibonacci :: Integer -> Integer
fibonacci n = case n of
                0 -> 0
                1 -> 1
                _ -> fibonacci (n-1) + fibonacci (n-2)

In this case, you have implicitly used the fact that patterns are checked in the same order they appear in the code. This order-dependent behavior can lead to subtle bugs and sometimes even to programs that don’t terminate or run out of resources. As an exercise, rewrite the fibonacci function putting the last pattern in the first position. Now try to test the function in the interpreter. You will see that it never terminates.

Also, once a pattern has matched, it completely stops trying other alternatives, even if a further match raises an error. For example, the following two functions are not equivalent:
f :: Client -> String
f client = case client of
             Company _ _ (Person name _ _) "Boss" -> name ++ " is the boss"
             _                                  -> "There is no boss"
g :: Client -> String
g client = case client of
             Company _ _ (Person name _ _) pos ->
               case pos of "Boss" -> name ++ " is the boss"
             _                               -> "There is no boss"
*Chapter2.DataTypes> f (Company "A" 5 (Person "John" "Do" Male) "Director")
"There is no boss"
*Chapter2.DataTypes> g (Company "A" 5 (Person "John" "Do" Male) "Director")
"*** Exception: Non-exhaustive patterns in case

When the value is given to f, the first pattern does not match because "Director" is not equal to "Boss". So, the system goes into the second black-hole match and sees that there is no boss. However, on g it first matches into being a Company, which the value satisfies, and in this point it enters the body of the match and forgets about other alternatives. Then, the inner match fails, raising the exception.

Note

I strongly emphasize the fact that pattern matching does not backtrack when something goes wrong in the body of a match. This is important to remember, especially if you are coming from a logic programming background in which unification with backtracking is the norm.

You may have noticed that most of the case expressions just pattern match on some parameter to a function. For these cases, Haskell allows you to encode the pattern directly in the definition. You include several lines for the function, each defining it for a pattern. This approach creates code that is similar to the way you write mathematical functions. For example, new versions of clientName and fibonacci look like this:
clientName (GovOrg name)                     = name
clientName (Company name _ _ _)              = name
clientName (Individual (Person fNm lNm _) _) = fNm ++ " " ++ lNm
fibonacci 0 = 0
fibonacci 1 = 1
fibonacci n = fibonacci (n-1) + fibonacci (n-2)

Try to use this new syntax when writing the solution for Exercise 2-5, which provides a set of tasks to practice pattern matching on different kind of values, both clients and time machines.

Exercise 2-5. The Perfect Match For Your Time Machines

These exercises focus on pattern matching on data types defined by you. For working with lists, follow the pattern of having different branches for the empty and general case. Also, think carefully about the order of the patterns. Afterward, test the functions in the interpreter.

For statistical purposes, write a function that returns the number of clients of each gender. You may need to define an auxiliary data type to hold the results of this function.

Every year a time comes when time machines are sold with a big discount to encourage potential buyers. Write a function that, given a list of time machines, decreases their price by some percentage. Use the TimeMachine data type you defined in Exercise 2-4.

Lists and Tuples

One question that may have naturally arisen while doing the previous exercises is whether it’s also possible to use pattern matching on lists and tuples because it seems that doing so will lead to more concise code. It’s indeed possible because lists and tuples are no more special than any other user-defined data type. Lists constructors are [] and (:), and those are the ones you can use to pattern match. Furthermore, using pattern matching in lists allows you to get rid of all the null checks and calls to head and tail. For example, the function (+++) defined earlier could be rewritten as follows:
(+++) :: [a] -> [a] -> [a]
list1 +++ list2 = case list1 of
                    []   -> list2
                    x:xs -> x:(xs +++ list2)
Or directly matching in the function declaration would look like this:
[]     +++ list2 = list2
(x:xs) +++ list2 = x:(xs +++ list2)

Note

It’s customary in Haskell to write pattern matching on lists using a letter or a small word followed by the same identifier in plural, like x:xs.

The Prelude function’s null, head, and tail have no special magic inside them; they can be defined easily using pattern matching. Are you able to do so?

Sometimes you need to match on lists of more than one element. A possible function where you would need these is one that checks whether a list of integers is sorted. To check for sorted data, three cases need to be considered. The first two are the empty or singleton cases, and those are always sorted. But if more than one element is contained in a list, you need to compare the first with the second and then see whether the list comprised of the second and subsequent elements is sorted. That check on the second and subsequent elements is done recursively. The implementation of such a sorted function is as follows:
sorted :: [Integer] -> Bool
sorted []       = True
sorted [_]      = True
sorted (x:y:zs) = x < y && sorted (y:zs)
There is still some repetition in this code; you are matching on y:zs just to later reconstruct it. This sequence of checking whether some value conforms to some pattern but later to use the value as a whole and not its components is quite common in Haskell code. For that reason, Haskell introduces a syntactic form referred to by the term as pattern. As pattern allows you to bind some value in the match, while at the same time allowing you to match on inner components of that value. To use it, you have to wrap into parentheses the whole pattern you want to give a name to and prepend it by the variable that will be used to refer to the whole pattern and the @ symbol. A new definition of sorted that uses as patterns for y:zs looks like this:
sorted []            = True
sorted [_]           = True
sorted (x : r@(y:_)) = x < y && sorted r

One last remark about matching on lists: In many cases you have a function that at first sight makes sense only on a nonempty list, such as when computing the sum of all elements in a list. In most cases, this function can be extended in a sensible way to empty lists. For example, you can assign the value 0 to the sum of an empty list because if you add that value to any number, it does not change. Values such as 0, which can be safely applied with respect to an operation, such as sum, are called the neutral elements of that operation. I will cover such neutral elements in more detail in Chapter 3 when discussing folds and again in Chapter 4 when discussing monoids.

Matching on tuples is also easy. Just use the syntax of a comma-separated list of components between parentheses. Rewriting the maxmin example from the previous section in this style makes the algorithm much more apparent to the reader or maintainer of that code.
maxmin [x]    = (x,x)
maxmin (x:xs) = ( if x > xs_max then x else xs_max
                , if x < xs_min then x else xs_min
                ) where (xs_max, xs_min) = maxmin xs

Guards

A guard is part of the pattern-matching syntax that allows you to refine a pattern using Boolean conditions that must be fulfilled by the bound values after a successful match. Guards are useful for writing clearer code and avoiding certain problems, helping you to obtain the full power of pattern matching.

Two examples can help make the need for guards clear. The first is an extension of the Fibonacci function to any integer value. You will wrap the value on a Just if you are asked to get the Fibonacci number of a nonnegative number and return Nothing otherwise. With the syntax introduced up to this point, you could write the following code:
ifibonacci :: Integer -> Maybe Integer
ifibonacci n = if n < 0
               then Nothing
               else case n of
                     0  -> Just 0
                     1  -> Just 1
                     n' -> let Just f1 = ifibonacci (n'-1)
                               Just f2 = ifibonacci (n'-2)
                           in Just (f1 + f2)

At this time, your developing sense of clear code signals you that the initial check for negativeness hides part of the algorithm, which is mostly expressed in a pattern match. And that is true. Apart from that, notice that the case statement has used a binding n'. You could have reused n, but the interpreter would complain about shadowing a previous definition. Even though the interpreter knows completely well which n the code refers to, the fact you have used the same name twice may create confusion for another developer. It’s customary in Haskell code to use the same identifier, but with ' (pronounced prime) afterward, to refer to a highly related binding.

Another mathematical function I will cover is the binomial coefficient of n and k, usually written $$ left(egin{array}{c}n\ {}kend{array}
ight) $$. This coefficient gives the number of ways in which you can get k balls from a bag of n without repetition. Using the famous Pascal’s triangle, you can give a closed definition of this coefficient as follows:
$$ left(egin{array}{c}n\ {}kend{array}
ight)=Big{kern7.5em {displaystyle egin{array}{ll}1,&amp; k=0kern0.375em mathrm{V}kern0.375em n=k\ {}left(egin{array}{c}n-1\ {}k-1end{array}
ight)+left(egin{array}{c}n-1\ {}kend{array}
ight),&amp; otherwiseend{array}} $$
Your task is translating this mathematical definition into Haskell code. A first approximation could be as follows:
binom _ 0 = 1
binom x x = 1
binom n k = (binom (n-1) (k-1)) + (binom (n-1) k)

But sadly this approach doesn’t make the interpreter happy, which shows the error Conflicting definitions for `x'. This error is because of the restriction imposed on patterns in which a variable can appear only once in each of them. A possible solution is to change the entire shape of the function. Once again, it seems that pattern matching is not giving you all the power you are asking from it.

The solution to the problems found in both functions is to use guards. A guard is itself part of a pattern, so it allows backtracking (choosing other alternative) if it fails, in contrast to matching a pattern and later checking for a condition. The Boolean conditions in a guard are separated by a | sign from the rest of the pattern and allow the use of the variables bound during the match. The following is how you would rewrite the Fibonacci and binomial functions using guards:
ifibonacci n | n < 0     = Nothing
ifibonacci 0             = Just 0
ifibonacci 1             = Just 1
ifibonacci n | otherwise = let Just f1 = ifibonacci (n-1)
                               Just f2 = ifibonacci (n-2)
                           in Just (f1 + f2)
binom _ 0          = 1
binom x y | x == y = 1
binom n k          = (binom (n-1) (k-1)) + (binom (n-1) k)

Apart from the use of guards, you should notice another tidbit in the previous code. The use of otherwise in the last pattern when using guards is a common convention in Haskell. While using otherwise doesn’t add anything to the code (using no guard is equivalent), it signals clearly that the remaining pattern takes care of all the cases not handled by other cases.

Any expression returning a Boolean value can be used in a guard. This means you can also call a function that you have defined. For example, the following code returns special strings when a number is a multiple of 2, 3, or 5, and a default string otherwise:
multipleOf :: Integer -> Integer -> Bool
multipleOf x y = (mod x y) == 0
specialMultiples :: Integer -> String
specialMultiples n | multipleOf n 2 = show n ++ " is multiple of 2"
specialMultiples n | multipleOf n 3 = show n ++ " is multiple of 3"
specialMultiples n | multipleOf n 5 = show n ++ " is multiple of 5"
specialMultiples n | otherwise      = show n ++ " is a beautiful number"
For this example where you are checking several conditions on the same argument, Haskell allows an even more compact declaration. You don’t need to write specialMultiples n every time.
specialMultiples n
  | multipleOf n 2 = show n ++ " is multiple of 2"
  | multipleOf n 3 = show n ++ " is multiple of 3"
  | multipleOf n 5 = show n ++ " is multiple of 5"
  | otherwise      = show n ++ " is a beautiful number"

Up to this point, I have introduced matching on user defined data types, on lists, and on tuples and guards. The tasks in Exercise 2-6 will help ensure that you understand these concepts.

Exercise 2-6. More Matches and Guards

Up to this point I have introduced matching on lists and tuples and guards. The following tasks will help you ensure that you understand these concepts:

Define the famous Ackermann function. Try using guards:$$ Aleft(m,n
ight)=Big{kern5em {displaystyle egin{array}{c}n+1,kern0.5em m=0\ {}kern5em Aleft(m-1,1
ight),kern0.5em m&gt;0,n=0\ {}Aleft(m-1,Aleft(m,n-1
ight)
ight),kern0.5em m&gt;0,n&gt;0end{array}} $$

Define the unzip function, which takes a list of tuples and returns two lists, one with all the first components and other one with the seconds. Here’s an example: unzip [(1,2),(3,4)] = ([1,3],[2,4]).

View Patterns

Sometimes you want to look for patterns in a value, but in some way they are not directly encoded. So, you need to preprocess the value before matching. For those cases, you can use view patterns. These patterns extend all of those previously seen with a new syntax element, (function -> pattern), which applies function to the value and then matches the result with the pattern. For example, remember the clientName function from the beginning of the chapter, and let’s add a responsibility one:
responsibility :: Client -> String
responsibility (Company _ _ _ r) = r
responsibility _                 = "Unknown"
Now you can create a function returning whether a given client is special. Let’s consider a client special if the client is the director of a company or the client’s name is "Mr. Alejandro". View patterns allow very clear code.
specialClient :: Client -> Bool
specialClient (clientName -> "Mr. Alejandro") = True
specialClient (responsibility -> "Director")  = True
specialClient _                               = False
Oops! It seems that you rushed into making some sort of mistake. Notice the following interpreter error:
    Illegal view pattern:  clientName -> "Mr. Alejandro"
    Use -XViewPatterns to enable view patterns
This problem arises because view patterns are not part of the Haskell 2010 specification but rather an extension made by GHC developers. For that reason, you are asked to explicitly enable compatibility with this extension. You can do so adding special options to the compiler or interpreter, but the suggested approach is to add a pragma to the file. A pragma is a special comment that is interpreted by the compiler and that is used to enable or disable some flags. In this case, you need to include the following at the beginning of the source:
{-# LANGUAGE ViewPatterns #-}
If you are working in the interpreter, you need to execute a :set command to enable an extension. Notice that the extension name must be prefixed by –X.
Prelude> :set -XViewPatterns

In the rest of the book, I shall remark that an extension needs to be enabled for a specific piece of code by including the corresponding pragma as you would do in the beginning of a source file.

Note

GHC includes a great many extensions (more than 30 at the moment of writing). They range from simple extensions to the syntax (like the view patterns discussed earlier) for complete overhauls of the type system. Being so different in power, some of them are accepted by the community while others are controversial. All the GHC extensions that will be introduced in this book belong to the first set: they are seen as beneficial because they make code more elegant and easier to understand, without running into any problems. The main con of extensions, even with not-controversial ones, is that they are not part of the Haskell 2010 Report, so in theory they could make your code less interoperable between different Haskell compilers. However, this interoperability is almost never a problem.

Records

In most programming languages you can find the idea of a field as something that holds a value in a larger structure. Furthermore, fields can be accessed or changed easily (e.g., in C or Java using structure.field). From what you have learned so far, you can see that pattern matching on big structures may get unwieldy quickly, because it forces to write long matches to retrieve just a single value and to re-create entire data structures merely to change just a single field.

Creation and Use

The concept of a data structure with fields that can be accessed by name does exist in Haskell. Records make accessing or updating part of a structure much easier than otherwise. Records are defined using data declarations, but instead of just using a type for each parameter, you write parameter name :: parameter type. These declarations are the only exception to the layout rule. You always need to write the set of fields between { and } and to separate them by commas.

Let’s write the Client and Person definitions but now using record syntax. To leave all the previous functions from this chapter unchanged, you are going to encode the new records in new ClientR and PersonR types. Remember, constructor names should not clash; that’s why you need to use new names for the record types. But field names can, so you are allowed to use clientRName for fields in different alternatives, given that they have the same type. Here are the new record definitions:
data ClientR = GovOrgR  { clientRName :: String }
             | CompanyR { clientRName :: String
                        , companyId :: Integer
                        , person :: PersonR
                        , duty :: String }
             | IndividualR { person :: PersonR }
             deriving Show
data PersonR = PersonR { firstName :: String
                       , lastName :: String
                       } deriving Show
You can create values from these types using the same constructor syntax that you’ve been using. However, if the data is declared as a record, you can also use the constructor name followed by a list of each field name, followed by an = sign and the corresponding value. There are two benefits for doing this. First, constructing the new value this way results in better documentation because you can see directly to which field each value corresponds. Also, this syntax allows you to write the field names in any order, giving you more freedom. Here’s an example showing this named notation:
*Chapter2.DataTypes> IndividualR { person = PersonR { lastName = "Smith", firstName = "John" } }
IndividualR {person = PersonR {firstName = "John", lastName = "Smith"}}
*Chapter2.DataTypes> GovOrgR "NATO"
GovOrgR {clientRName = "NATO"}
Field names are also used to create special functions that access those particular fields. Here’s an example:
*Chapter2.DataTypes> clientRName (GovOrgR "NATO")
"NATO"
*Chapter2.DataTypes> :t duty
duty :: ClientR -> String
Because these functions will be automatically created, Haskell enforces two extra restrictions on field names.
  • They must not clash with any other field or function name.

  • As I mentioned earlier, you are allowed to use the same field name in more than one alternative of your data type. However, if you do so, all those fields must have the same type. If such is not the case, no correct type can be given to the corresponding function.

Records are useful when pattern matching. For a traditional constructor, you need to write a binding or another pattern for each field in it. Thus, in many cases the code ends up with a collection of _ bindings, which are difficult to maintain. With a record, you can use a new pattern that resembles the building one: the constructor, plus a list of field name = pattern elements enclosed in brackets. You don’t need to include all the fields, just those for which you want to bind or match. You are even allowed to write an empty list of fields, as this example highlights:
greet :: ClientR -> String
greet IndividualR { person = PersonR { firstName = fn } } = "Hi, " ++ fn
greet CompanyR    { clientRName = c } = "Hi, " ++ c
greet GovOrgR     { }                 = "Welcome"
There are two interesting additions in GHC to record matching that encode very usual patterns, allowing for a lesser amount of boilerplate code. The first addition is record puns, which are enabled by the pragma NamedFieldPuns. When using record puns, you can replace all field patterns of the form field name = field name, which creates a binding for the corresponding field available with the same name in the body of the match, with a single field name. You can interleave this kind of matching with the more usual one. Here’s an example:
{-# LANGUAGE NamedFieldPuns #-}
greet IndividualR { person = PersonR { firstName } } = "Hi, " ++ firstName
greet CompanyR    { clientRName } = "Hi, " ++ clientRName
greet GovOrgR     { }             = "Welcome"
Another common idiom is making some field obey a pattern and binding the rest of the fields to use them in the body of the match. Even with record puns, doing so can take a large amount of code. But GHC can take advantage of its knowledge of the field names and automatically generate all that code for you. In particular, the extension RecordWildCards allows the use of .. (two dots) to automatically create bindings for all variables that haven’t been mentioned in the pattern up to that point. The previous example could get a minimal form such as follows:
{-# LANGUAGE RecordWildCards #-}
greet IndividualR { person = PersonR { .. } } = "Hi, " ++ firstName
greet CompanyR    { .. }                      = "Hi, " ++ clientRName
greet GovOrgR     { }                         = "Welcome"

Note

Remember that to use these extensions, you need to include the {-# LANGUAGE Extension #-} declaration at the beginning of your source code.

I have spoken about facilities for record building and matching. The last step is using record syntax for updating a record. If r is a binding containing a value of a record type, you can use r { field name = new value } to create an exact copy of r where the corresponding field has been changed. For example, here is a function that ensures that PersonR’s first name always starts with a capital letter:
nameInCapitals :: PersonR -> PersonR
nameInCapitals p@(PersonR { firstName = initial:rest }) =
        let newName = (toUpper initial):rest
        in  p { firstName = newName }
nameInCapitals p@(PersonR { firstName = "" }) = p

Take the time to understand this last example because it shows a lot of features from this chapter. Record syntax is used to pattern match on PersonR. Inside it, x:xs is used to match a list. As you later want to refer to the entire value to update it, an as pattern is used to bind it to p. Finally, the new name is computed inside a let expression, which is used to update p using record-updating syntax.

As you have done for clients, you can also benefit from record syntax when describing time machines. That is the purpose of Exercise 2-7.

Exercise 2-7. Time Machine Records

Rewrite the TimeMachine data type defined earlier using records. You should find that updating the prices of time machines is now much more concise.

The Default Values Idiom

We are going to end this chapter by showing a particularly helpful convention the Haskell community has come up with to support a common use case. You will look at functions that can take a long list of parameters, but most of the time those parameters take on default values. Take as an example a network library. For creating a connection, you need information about the following:
  • URL to connect to

  • Connection type: TCP or UDP

  • Connection speed

  • Whether to use a proxy

  • Whether to use caching

  • Whether to use keep-alive

  • Time-out lapse

These elements can be encoded as follows:
data ConnType = TCP | UDP
data UseProxy = NoProxy | Proxy String
data TimeOut = NoTimeOut | TimeOut Integer
data Connection = ...  -- Definition omitted
connect :: String -> ConnType -> Integer -> UseProxy
        -> Bool -> Bool -> TimeOut -> Connection
Of course, most people simply want to connect to a target URL using TCP at the highest speed, with some sensible defaults for proxying, caching, keep-alive, and time-out lapse. A first solution is to create a special function for that case.
connectUrl :: String -> Connection
connectUrl u = connect u TCP 0 NoProxy False False NoTimeOut
This solution makes it easy to connect in the simple case but poses two problems.
  1. 1.

    Maintainability is harmed. If at some point you need to add a new connection parameter, all users of the function need to change their calls to connect. Or if the default value changes, all the uses must be reconsidered and rewritten.

     
  2. 2.

    Using the library is easy only for the simplest case. If you want to connect to a URL using a proxy, you need to step back and use the full connect function, passing all the parameters. In some cases, knowing which the sensible defaults are may be difficult.

     
Records come to the rescue. Instead of passing parameters one by one, you can group all or most of them into a record and use the record as a parameter. Here’s how that would look for the connection example:
data ConnOptions = ConnOptions { connType      :: ConnType
                               , connSpeed     :: Integer
                               , connProxy     :: UseProxy
                               , connCaching   :: Bool
                               , connKeepAlive :: Bool
                               , connTimeOut   :: TimeOut
                               }
connect' :: String -> ConnOptions -> Connection
connect' url options = ...
The second step is to create a constant, which encodes sensible defaults.
connDefault :: ConnOptions
connDefault = ConnOptions TCP 0 NoProxy False False NoTimeOut
Now creating a connection with the default parameters takes just a tad more code, but you gain in return the ability to change just one parameter (like the type to UDP) without having to write all the default values. The following examples show the simplest case and also the case of specifying UDP as the connection type:
*Chapter2.DefaultValues> connect' "https://apress.com" connDefault
*Chapter2.DefaultValues> :{
*Chapter2.DefaultValues> connect' "https://apress.com"
*Chapter2.DefaultValues>          connDefault { connType = UDP }
*Chapter2.DefaultValues> :}

There is only one problem left. If you add a new option and the developer has made direct use of the constructor for the record type, that use must be changed. The solution is to forbid calling the constructor directly, forcing the use of connDefault in some way or another. This can be done by not exporting the constructor. You will see how to do this in the next chapter, where you will also learn about smart constructors.

Summary

In this chapter you learned the basics of first-order Haskell programming.
  • Basic data types were introduced: characters, Booleans, lists, and tuples.

  • You learned how to define new functions and how to use let and where to create temporary bindings that allow reusing expressions and thus writing better code. Afterward, you learned how to define a function by cases.

  • You defined your first data types, learned about ADTs and constructors, and played with creating new values in the interpreter.

  • Pattern matching is a fundamental tool for Haskell programming, which I touched upon in this chapter. You saw how to match both primitive and user-defined types and how guards, as patterns and view patterns, make matching more concise.

  • Records were introduced as a better syntax for building, accessing, and updating fields in a Haskell value. You saw the default values design pattern, which uses records at its core.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.34.161