Chapter 10
IN THIS CHAPTER
Understanding types
Creating and managing types
Fixing type errors
Using types in code
The term type takes on new meaning when working with functional languages. In other languages, when you speak of a type, you mean the label attached to a certain kind of data. This label tells the compiler how to interact with the data. The label is intimately involved with the value. In functional languages, type is more about mapping. You compose functions that express a mapping of, or transformation between, types of data. The function is a mathematical expression that defines the transformation using a representation of the math involved in the transformation. Just how a language supports this idea of mapping and transformation depends on how it treats underlying types. Because Haskell provides a purer approach with regard to type and the functional programming paradigm, this chapter focuses a little more heavily on Haskell.
As with other languages, you can create new types as needed in functional languages. However, the manner in which you create and use new types differs because of how you view type. Interestingly enough, creating new types can be easier in functional languages because the process is relatively straightforward and the result is easier to read in most cases.
The other side of the coin is that functional languages tend toward stricter management of type. (This is true for the most part, at least. Exceptions definitely exist, such as JavaScript, which is being fixed; see https://www.w3schools.com/js/js_strict.asp
for details.) Because of this strictness, you need to know how to understand, manage, and fix type errors. In addition, you should understand how the use of type affects issues such as missing data. The chapter includes examples in both Haskell and Python to demonstrate all of the various aspects of type.
Functional languages provide a number of methods for defining type. Remember that no matter what programming paradigm you use, the computer sees numbers — 0s and 1s, actually. The concept of type has no meaning for the computer; type is there to help the humans writing the code. As with anything, when working with types, starting simply is best. The following sections examine the basics of type in the functional setting and discuss how to augment those types to create new types.
As mentioned in the introduction, a pure functional language, such as Haskell, uses expressions for everything. Because everything is an expression, you can substitute functions that provide the correct output in place of a value. However, values are also expressions, and you can test this idea by using :t to see their types. When you type :t True and press Enter, you see True :: Bool as output because True is an expression that produces a Bool output. Likewise, when you type :t 5 == 6 and press Enter, you see 5 == 6 :: Bool as the output. Any time you use the :t command, you see the definition of the type of whatever you place after the command.
Python takes a similar view, but in a different manner, because it supports multiple programming paradigms. In Python, you point to an object using a name. The object contains the value and provides its associated properties. The object controls its use because it knows how to be that particular object. You can point to a different object using the name you define, but the original object remains unchanged. To see this perception of type, you use the Python type function. When you type type(1), you see <class 'int'> as output. Other languages might say that the type of a value 1 is an int, rather than say that the type of a value 1 is an instance of the class int. If you create a variable by typing myInt = 1 and pressing Enter, then use the type(myInt) function, you still see <class 'int'> as output. The name myInt merely points to an object that is an instance of class int. Even expressions work this way. For example, when you type myAdd = 1 + 1 and then use type(myAdd), you still get <class 'int'> as output.
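The following short sketch gathers these checks in one place and also shows what happens when a name is pointed at a different object. The names my_int, my_add, and other are used only for illustration.

```python
# type() reports the class of the object that a name points to.
my_int = 1
print(type(1))        # <class 'int'>
print(type(my_int))   # <class 'int'>

# The result of an expression is an object with a type, too.
my_add = 1 + 1
print(type(my_add))   # <class 'int'>

# Pointing the name at a different object doesn't change the original object.
other = my_int        # a second name for the int object 1
my_int = "one"        # my_int now points to a str object
print(type(my_int))   # <class 'str'>
print(other)          # 1
```

The name carries no type of its own; only the object it currently points to does.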
A number of nonfunctional languages use type signatures to good effect, although they may have slightly different names and slightly different uses, such as the function signature in C++. Even so, signatures used to describe the inputs and outputs of the major units of application construction for a language are nothing new. The type signature in Haskell is straightforward. You use one for the findNext function in Chapter 8:
findNext :: Int -> [Int] -> Int
In this case, the expression findNext
(on the left side of the double colon) expects an Int
and an [Int]
(list) as input, and provides an Int
as output. A type signature encompasses everything needed to fully describe an expression and helps relieve potential ambiguity concerning the use of the expression. Haskell doesn't always require that you provide a type signature (many of the examples in this book don’t use one), but will raise an error if ambiguity exists in the use of an expression and you don’t provide the required type signature. When you don’t provide a type signature, the compiler infers one (as described in the previous section). Later sections of this chapter discuss some of the complexities of using type signatures.
Python can also use type signatures, but the philosophy behind Python is different from that of many other languages. The type signature isn’t enforced by the interpreter, but IDEs and other tools can use the type signature to help you locate potential problems with your code. Consider this function with the type signature:
def doAdd(value1: int, value2: int) -> int:
    return value1 + value2
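To see that the interpreter stores the signature without enforcing it, you might try a sketch like the following. It repeats the doAdd function so that the example is self-contained; the string arguments are simply an illustrative choice.

```python
def doAdd(value1: int, value2: int) -> int:
    return value1 + value2

# The annotations are stored on the function object...
print(doAdd.__annotations__)
# {'value1': <class 'int'>, 'value2': <class 'int'>, 'return': <class 'int'>}

# ...but nothing checks them when you call the function.
print(doAdd(1, 2))                # 3
print(doAdd("Hello", "Goodbye"))  # HelloGoodbye
```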
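Because the interpreter ignores the signature, a call such as doAdd("Hello", "Goodbye") runs without complaint and returns a string rather than an int. One way around this problem is to use a static type checker such as mypy (http://mypy-lang.org/). When you call on this tool, it checks your code against the signature you provide.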
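Another option is to add explicit type checks to the function itself:
def doAdd(value1: int = 0, value2: int = 0) -> int:
    # The parentheses allow the condition to span two lines.
    if (not isinstance(value1, int) or
            not isinstance(value2, int)):
        raise TypeError
    return value1 + value2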
The problem with this approach is that it runs counter to the Python way of performing tasks. When you add type checking code of this sort, you automatically limit the potential for other people to use functions in useful, unexpected, and completely safe ways. Python relies on an approach called Duck Typing (see http://wiki.c2.com/?DuckTyping
and https://en.wikipedia.org/wiki/Duck_typing
for details). Essentially, if it walks like a duck and talks like a duck, it must be a duck, despite the fact that the originator didn't envision it as a duck.
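As a rough illustration of the trade-off, the following sketch compares an unchecked doAdd with a checked version. The name doAddStrict is hypothetical and used here only to keep the two versions apart.

```python
def doAdd(value1, value2):
    # No type checks: any pair of objects that supports + is acceptable.
    return value1 + value2

def doAddStrict(value1: int = 0, value2: int = 0) -> int:
    # Explicit checks: anything that isn't an int is rejected.
    if not isinstance(value1, int) or not isinstance(value2, int):
        raise TypeError
    return value1 + value2

print(doAdd(1, 2))          # 3
print(doAdd(1.5, 2))        # 3.5
print(doAdd([1, 2], [3]))   # [1, 2, 3]

# The strict version refuses a perfectly reasonable float addition.
try:
    doAddStrict(1.5, 2)
    rejected = False
except TypeError:
    rejected = True
print(rejected)             # True
```

The unchecked version works with anything that behaves like an addable value, which is the essence of duck typing.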
At some point, the built-in types for any language won’t satisfy your needs and you'll need to create a custom type. The method used to create custom types varies by language. As noted in the “Understanding the functional perception of type” section, earlier in this chapter, Python views everything as an object. In this respect, Python is an object-oriented language within limits (for example, Python doesn’t actually support data hiding). With this in mind, to create a new type in Python, you create a new class, as described at https://docs.python.org/3/tutorial/classes.html
and https://www.learnpython.org/en/Classes_and_Objects
. This book doesn’t discuss object orientation to any degree, so you won’t see much with regard to creating custom Python types.
Haskell takes an entirely different approach to the process that is naturally in line with functional programming principles. In fact, you may be amazed to discover the sorts of things you can do with very little code. The following sections offer an overview of creating types in Haskell, emphasizing the functional programming paradigm functionality.
Haskell lets you combine existing types to create a new kind of type. One of the operations you can perform on these types is AND, which combines this type and that type into a single new type. In this case, you provide a definition like the one shown here:
data CompNum = Comp Int Int
It’s essential to track the left and right side of the definition separately. The left side is the type constructor and begins with the data
keyword. For now, you create a type constructor simply by providing a name, which is CompNum
(for complex number, see https://www.mathsisfun.com/numbers/complex-numbers.html
for details).
The right side is the data constructor. It defines the essence of the data type. In this case, it includes an identifier, Comp
, followed by two Int
values (the real component and the imaginary component). To create and test this type, you would use the following code:
x = Comp 5 7
:t x
The output, as you might expect, is x :: CompNum
, and the new data type shows the correct data constructor. This particular version of CompNum
has a problem. Type x by itself and you see the error message shown in Figure 10-1.
To fix this problem, you must tell the data type to derive the required functionality. The declarative nature of Haskell means that you don’t actually have to provide an implementation; declaring that a data type does something is enough to create the implementation, as shown here:
data CompNum = Comp Int Int deriving Show
x = Comp 5 7
:t x
x
One of the more interesting aspects of Haskell data types is that you can create a Sum data type — a type that contains multiple constructors that essentially define multiple associated types. To create such a type, you separate each data constructor using a bar (|), which is essentially an OR operator. The following code shows how you might create a version of CompNum (shown in the previous section) that provides for complex, purely real, and purely imaginary numbers:
data CompNum = Comp Int Int | Real Int | Img Int deriving Show
When working with a real number, the imaginary part is always 0. Likewise, when working with an imaginary number, the real part is always 0. Consequently, the Real
and Img
definitions require only one Int
as input. Figure 10-3 shows the new version of CompNum in action.
As you can see, you define each of the variables using the applicable data constructor. When you check type using :t
, you see that they all use the same type constructor: CompNum
. However, when you display the individual values, you see the kind of number that the expression contains.
The ability to enumerate values is essential as a part of categorizing. Providing distinct values for a particular real-world object's properties is important if you want to better understand the object and show how it relates to other objects in the world. Previous sections explored the use of data constructors with some sort of input, but nothing says that you must provide a value at all. The following code demonstrates how to create an enumeration in Haskell:
data Colors = Red | Blue | Green deriving (Show, Eq, Ord)
Notice that you provide only a label for the individual constructors, which are then separated by an OR operator. As with previous examples, you must use deriving to allow the display of the particular variable's content. Notice, however, that this example also derives Eq (which tests for equality and inequality) and Ord (which supports ordering comparisons, such as < and >). Figure 10-4 shows how this enumeration works.
As usual, the individual variables all use the same data type, which is Colors
in this case. You can compare the variable content. For example, x == y
is False
because they're two different values. Note that you can compare a variable to its data constructor, as in the case of x == Red
, which is True
. You have access to all of the logical operators in this case, so you could create relatively complex logic based on the truth value of this particular type.
Sometimes an enumeration needs to display alternative text rather than the constructor names. Fortunately, Haskell addresses this need as well. This updated code presents the colors in a new way:
data Colors = Red | Blue | Green deriving (Eq, Ord)
instance Show Colors where
    show Red = "Fire Engine Red"
    show Blue = "Sky Blue"
    show Green = "Apple Green"
The instance keyword defines a specific manner in which instances of this type should perform particular tasks. In this case, it defines the use of Show. Each color appears in turn with the text to associate with it. Notice that you don't define Show in deriving any longer; you use the deriving or instance form, but not both. Assuming that you create three variables as shown in Figure 10-4 (where x = Red, y = Blue, and z = Green), here's the output of this example:
x = Fire Engine Red
y = Sky Blue
z = Apple Green
Many data sources rely on records to package data for easy use. A record has individual elements that you use together to describe something. Fortunately, you can create record types in Haskell. Here’s an example of such a type:
data Name = Employee {
    first :: String,
    middle :: Char,
    last :: String} deriving Show
The Name
type includes a data constructor for Employee
that contains fields named first
and last
of type String
and middle
of type Char
.
newbie = Employee "Sam" 'L' "Wise"
Notice that the 'L'
must appear in single quotes to make it the Char
type, while the other two entries appear in double quotes to make them the String
type. Because you've derived Show
, you can display the record, as shown in Figure 10-5. Just in case you’re wondering, you can also display individual field values, as shown in the figure.
The problem with this construction is that it’s rigid, and you may need flexibility. Another way to create records (or any other type, for that matter) is to add the arguments to the type constructor instead, as shown here:
data Name f m l = Employee {
    first :: f,
    middle :: m,
    last :: l} deriving Show
This form of construction is parameterized, which means that the field types come from the arguments to the type constructor. The difference is that you can now create the record using a Char or a String for the middle name. Unfortunately, you can also create Employee records that really don't make any sense at all, as shown in Figure 10-6, unless you constrain the parameters with a type signature on the value, such as newbie :: Name String String String.
The following sections talk about composing special types: monoids, monads, and semigroups. What makes these types special is that they have a basis in math, as do most things functional; this particular math, however, is about abstracting away details so that you can see the underlying general rules that govern something and then develop code to satisfy those rules.
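The “Considering the math basis for monoids and semigroups” sidebar may still have you confused. Sometimes an example works best to show how something actually works, instead of all the jargon used to describe it. So, this section begins with a Haskell list, which is a monoid, as it turns out. To prove that it’s a monoid, a list has to follow three laws:
Closure: The result of an operation must belong to the same set as its inputs.
Associativity: The way in which you group the operations must not affect the result.
Identity: An element must exist that leaves any other element unchanged when combined with it.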
Lists automatically address the first law. If you’re working with a list of numbers, performing an operation on that list will result in a numeric output, even if that output is another list. In other words, you can’t create a list of numbers, perform an operation on it, and get a Char
result. To demonstrate the other two rules, you begin by creating the following three lists:
a = [1, 2, 3]
b = [4, 5, 6]
c = [7, 8, 9]
In this case, the example uses concatenation (++) to create a single list from the three lists. The associativity law demands that the way in which you group the operations shouldn't matter, although the order of the individual elements can matter. The following two lines test both of these criteria:
(a ++ b) ++ c == a ++ (b ++ c)
(a ++ b) ++ c == (c ++ b) ++ a
The output of the first comparison is True because the grouping of the concatenations doesn’t matter. The output of the second comparison is False because the order of the individual elements does matter.
The third law, the identity law, requires an element that leaves any other element unchanged. For list concatenation, that element is the empty list, which plays the same role that 0 plays in addition. Consequently, both of these statements are true:
a ++ [] == a
[] ++ a == a
When performing some tasks in Haskell, you need to use import Data.Monoid. This is the case when working with strings. As shown in Figure 10-7, strings also work just fine as monoids. Note the demonstration of identity using an empty string. In fact, many Haskell collection types work as monoids with a variety of operators, including Seq, Map, Set, IntMap, and IntSet. Using the custom type examples described earlier in the chapter as a starting point, any collection that you use as a basis for a new type will automatically have the monoid functionality built in. The example at https://www.yesodweb.com/blog/2012/10/generic-monoid shows a more complex Haskell implementation of monoids as a custom type (using a record in this case).
(a <> b) <> c == a <> (b <> c)
((a <> b) <> c) == getDual ((Dual c <> Dual b) <> Dual a)
Even though the right side would seem not to work based on earlier text, the use of the Dual
function makes it possible. To make the statement work, you must also call getDual
to convert the Dual
object to a standard list. You can find more functions of this sort at http://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Monoid.html
.
The same rules for collections apply with Python. As shown in Figure 10-8, Python lists behave in the same manner as Haskell lists.
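A minimal sketch of the same checks in Python, using + in place of the Haskell ++ operator, might look like this:

```python
a = [1, 2, 3]
b = [4, 5, 6]
c = [7, 8, 9]

# Closure: concatenating two lists always produces another list.
print(type(a + b) is list)          # True

# Associativity: grouping doesn't matter, but element order does.
print((a + b) + c == a + (b + c))   # True
print((a + b) + c == (c + b) + a)   # False

# Identity: the empty list leaves any list unchanged.
print(a + [] == a)                  # True
print([] + a == a)                  # True

# Strings behave the same way, with the empty string as the identity.
print("ab" + "" == "ab")            # True
```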
Haskell doesn’t actually have a universal sort of null value. It does have Nothing
, but to use Nothing
, the underlying type must support it. In addition, Nothing
is actually something, so it's not actually null (which truly is nothing). If you assign Nothing
to a variable and then print the variable onscreen, Haskell tells you that its value is Nothing
. In short, Nothing
is a special kind of value that tells you that the data is missing, without actually assigning null to the variable. Using this approach has significant advantages, not the least of which is fewer application crashes and less potential for a missing value to create security holes.
You normally don't assign Nothing
to a variable directly. Rather, you create a function or other expression that makes the assignment. The following example shows a simple function that adds two numbers. However, the function returns Nothing whenever either number is 0:
doAdd :: Int -> Int -> Maybe Int
doAdd _ 0 = Nothing
doAdd 0 _ = Nothing
doAdd x y = Just (x + y)
Notice that the type signature has Maybe Int as the output. This means that the output could be Just an Int or Nothing. The Maybe type is available by default, but the fromMaybe function used later in this section comes from Data.Maybe, so you need to load it first:
import Data.Maybe as Dm
To test how Maybe works, you can try various versions of the function call:
doAdd 5 0
doAdd 0 6
doAdd 5 6
The first two result in an output of Nothing
. However, the third results in an output of Just 11
. Of course, now you have a problem, because you can't use the output of Just 11
as numeric input to something else. To overcome this problem, you can make a call to fromMaybe 0 (doAdd 5 6)
. The output will now appear as 11
. Likewise, when the output is Nothing
, you see a value of 0
, as shown in Figure 10-9. The first value to fromMaybe
, 0
, tells what to output when the output of the function call is Nothing
. Consequently, if you want to avoid the whole Nothing
issue with the next call, you can instead provide a value of 1
.
The “Understanding monoids” section, earlier in this chapter, discusses three rules that monoids must follow. Semigroups are like monoids except that they have no identity requirement. Semigroups actually represent a final level of abstraction, as discussed in the earlier sidebar, “Considering the math basis for monoids and semigroups”. At this final level, things are as simple and flexible as possible. Of course, sometimes you really do need to handle a situation in which something is Nothing
, and the identity rule aids in dealing with this issue. People have differing opinions over the need for and usefulness of semigroups, as shown in the discussion at https://stackoverflow.com/questions/40688352/why-prefer-monoids-over-semigroups-in-haskell-why-do-we-need-mempty
. However, a good rule of thumb is to use the simplest abstraction that does the job, which means using a semigroup whenever you don't need an identity element. To work with semigroups, you must execute import Data.Semigroup.
The “Considering type constructors and data constructors” section, earlier in this chapter, shows you one example of a parameterized type in the form of the Name
type. In that section, you consider two kinds of constructions for the Name
type that essentially end in the same result. However, you need to use parameterized types at the right time. Parameterized types work best when the type acts as a sort of box that could hold any sort of value. The Name
type is pretty specific, so it's not the best type to parameterize because it really can’t accept just any kind of input.
A better example for parameterizing types would be to create a custom tuple that accepts three inputs and provides the means to access each member using a special function. It would be sort of an extension of the fst
and snd
functions provided by the default tuple. In addition, when creating a type of this sort, you want to provide some sort of conversion feature to a default. Here is the code used for this example:
data Triple a b c = Triple (a, b, c) deriving Show
fstT (Triple (a, b, c)) = show a
sndT (Triple (a, b, c)) = show b
thdT (Triple (a, b, c)) = show c
cvtToTuple (Triple (a, b, c)) = (a, b, c)
In this case, the type uses parameters to create a new value: a
, b
, and c
represent elements of any type. Consequently, this example starts with a real tuple, but of a special kind, Triple
. When you display the value using show
, the output looks like any other custom type.
The special functions enable you to access specific elements of the Triple
. To avoid name confusion, the example uses a similar, but different, naming strategy of fstT
, sndT
, and thdT
. Theoretically, you could use wildcard characters for each of the nonessential inputs, but no good reason exists to do so in this case.
Finally, cvtToTuple
enables you to change a Triple
back into a tuple
with three elements. The converted tuple has all the same functionality as a tuple that you create any other way. The following test code lets you check the operation of the type and associated functions:
x = Triple("Hello", 1, True)
show(x)
fstT(x)
sndT(x)
thdT(x)
show(cvtToTuple(x))
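The outputs, shown here as putStrLn would print them, demonstrate that the type works as expected (when you evaluate these expressions directly, GHCi wraps each String result in an additional pair of quotation marks):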
Triple ("Hello",1,True)
"Hello"
1
True
("Hello",1,True)
Unfortunately, there isn't a Python equivalent of this code. You can mimic it, but you must create a custom solution. The material at https://ioam.github.io/param/Reference_Manual/param.html#parameterized-module
and https://stackoverflow.com/questions/46382170/how-can-i-create-my-own-parameterized-type-in-python-like-optionalt
is helpful, but this is one time when you may want to rely on Haskell if this sort of task is critical for your particular application and you don’t want to create a custom solution.
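As one possible way to mimic the Triple type, the following sketch relies on the standard typing and dataclasses modules. The field names and the to_tuple method are assumptions made for this example, and Python doesn't enforce the type parameters at run time; only a static checker such as mypy would use them.

```python
from dataclasses import dataclass
from typing import Generic, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

@dataclass(frozen=True)
class Triple(Generic[A, B, C]):
    """A box holding three values of any types (hypothetical example)."""
    first: A
    second: B
    third: C

    def to_tuple(self) -> Tuple[A, B, C]:
        # Convert back to an ordinary three-element tuple.
        return (self.first, self.second, self.third)

x = Triple("Hello", 1, True)
print(x)              # Triple(first='Hello', second=1, third=True)
print(x.first)        # Hello
print(x.second)       # 1
print(x.third)        # True
print(x.to_tuple())   # ('Hello', 1, True)
```

This approach still counts as a custom solution: the class only describes the parameters, and nothing stops a caller from placing unexpected values in the box.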
In a perfect world, all data acquisition would result in complete records with nothing missing and nothing wrong. However, in the real world, datasets often contain a lot of missing data, and you're often left wondering just how to address the issue so that your analysis is correct, your application doesn’t crash, and no one from the outside can corrupt your setup using something like a virus. The following sections don’t handle every possible missing-data issue, but they give you an overview of what can go wrong as well as offer possible fixes for it.
Different languages use different terms for the absence of a value. Python uses the term None and Haskell uses the term Nothing. In both cases, the value indicates an absence of an anticipated value. Often, the reasons for the missing data aren’t evident. The issue is that the data is missing, which means that it's not available for use in analysis or other purposes. In some languages, the missing value can cause crashes or open a doorway to viruses (see the upcoming “Null values, the billion-dollar mistake” sidebar for more information).
When working with Haskell, you must provide a check for Nothing values, as described in the “Considering the use of Maybe and Just” section, earlier in this chapter. The goal is to ensure that checks are in place, now that no good reason exists for leaving null values unchecked. Of course, you must still write your code proactively to handle the Nothing case (helped by the Haskell compiler, which ensures that functions receive proper values). The point is that Haskell doesn't have an independent type that you can call upon as Nothing; the Nothing value is associated with each data type that requires it, which makes locating and handling null values easier.
Python does include an individual null type called None, and you can assign it to a variable. However, note that None is still an object in Python, whereas null isn't an object in some other languages. The variable still has an object assigned to it: the None object. Because None is an object, you can check for it using is. In addition, because of the nature of None, it tends to cause fewer crashes and leave fewer doors open to nefarious individuals. Here is an example of using None:
x = None
if x is None:
    print("x is missing")
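To connect None with the Haskell doAdd example shown earlier in the chapter, the following sketch returns None where the Haskell version returns Nothing. The names do_add and from_optional are hypothetical; from_optional is a small helper that plays roughly the role of the Haskell fromMaybe function.

```python
from typing import Optional

def do_add(x: int, y: int) -> Optional[int]:
    """Return the sum, or None when either input is 0."""
    if x == 0 or y == 0:
        return None
    return x + y

def from_optional(default: int, value: Optional[int]) -> int:
    """Return value, or default when value is missing."""
    return default if value is None else value

print(do_add(5, 0))                     # None
print(do_add(5, 6))                     # 11
print(from_optional(0, do_add(5, 0)))   # 0
print(from_optional(0, do_add(5, 6)))   # 11
```

Unlike Haskell, Python doesn't force the caller to handle the None case, so the check remains the caller's responsibility.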
Missing and incorrect data present problems. Before you can do anything at all, you must verify the dataset you use. Creating types (using the techniques found in earlier sections in the chapter) that automatically verify their own data is a good start. For example, you can create a type for bank balances that ensures that the balance is never negative (unless you want to allow an overdraft). However, even with the best type construction available, a dataset may contain unusable data entries or some entries that don’t contain data at all. Consequently, you must perform verification of such issues as missing data and data that appears out of range.
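A self-verifying bank balance type might be sketched in Python as follows. The class name Balance, the use of whole cents, and the allow_overdraft flag are all assumptions made for this example.

```python
class Balance:
    """Hypothetical bank balance, stored in cents, that verifies its own data."""

    def __init__(self, cents: int, allow_overdraft: bool = False):
        # Reject negative balances unless the account allows an overdraft.
        if cents < 0 and not allow_overdraft:
            raise ValueError("balance cannot be negative")
        self.cents = cents

good = Balance(10_000)
print(good.cents)        # 10000

overdrawn = Balance(-500, allow_overdraft=True)
print(overdrawn.cents)   # -500

# An invalid balance never makes it into the dataset.
try:
    Balance(-500)
    accepted = True
except ValueError:
    accepted = False
print(accepted)          # False
```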
After you find missing or incorrect data, you consider the ramifications of the error. In most cases, you have the following three options:
Ignore the issue
Correct the entry
Delete the entry
Ignoring the issue might cause the application to fail and will most certainly produce inaccurate results when the entry is critical for analysis. However, most datasets contain superfluous entries — those that you can ignore unless you require the amplifying information they provide.
Correcting the entry is time consuming in most cases because you must now define a method of correction. Because you don’t know what caused the data error in the first place, or the original data value, any correction you make will be flawed to some extent. Some people use statistical measures (as described in the next section) to make a correction that neither adds to nor detracts from the overall statistical picture of the entries taken together. Unfortunately, even this approach is flawed because the entry may have represented an important departure from the norm.
Deleting the entry is fast and fixes the problem in a way that’s unlikely to cause the application to crash. However, deleting the entry comes with the problem of affecting any analysis you perform. In addition, deleting an entire row (case) from a dataset means losing not only the problem entry (the particular feature) but also all the other entries in that row. Consequently, deletion of a row can cause noticeable data damage in some cases.
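The three options can be sketched in Python using a tiny, made-up dataset in which None marks a missing age. The replacement value of 30 is an arbitrary assumption, not a recommendation.

```python
# Hypothetical dataset: None marks a missing age.
rows = [
    {"name": "Ann", "age": 34},
    {"name": "Bob", "age": None},
    {"name": "Cy", "age": 29},
]

# Option 1: ignore the issue and skip the missing value during analysis.
known_ages = [row["age"] for row in rows if row["age"] is not None]
print(sum(known_ages) / len(known_ages))   # 31.5

# Option 2: correct the entry by substituting a replacement value.
corrected = [dict(row, age=30) if row["age"] is None else row for row in rows]
print(corrected[1])                        # {'name': 'Bob', 'age': 30}

# Option 3: delete the row, which also discards Bob's name.
deleted = [row for row in rows if row["age"] is not None]
print(len(deleted))                        # 2
```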
A statistical measure is one that relies on math to create some sort of overall or average entry to use in place of a missing or incorrect entry. Depending on the data in question and the manner in which you create types to support your application, you may be able to rely on statistical measures to fix at least some problems in your dataset.
Some statistical corrections for missing or inaccurate data see more use than others do. In fact, you can narrow the list of commonly used statistical measures down to these:
As you can see, using the right statistical measure is important. Of course, there are many other statistical measures, and you may find that one of them fits your data better. A technique that you can use to ensure that especially critical values are the most accurate possible is to plot the data to see what shape it creates and then use a statistical measure based on shape.
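As an illustration only, the following sketch computes three widely used measures (mean, median, and mode) with the Python statistics module and uses one of them to fill the gaps. The choice of these three measures and the sample values are assumptions made for this example.

```python
import statistics

# Hypothetical readings; None marks a missing value and 100 is an outlier.
values = [3, 5, None, 5, 100, None, 4]
known = [v for v in values if v is not None]

mean = statistics.mean(known)       # pulled upward by the outlier
median = statistics.median(known)   # middle value of the sorted data
mode = statistics.mode(known)       # most common value

print(mean)     # 23.4
print(median)   # 5
print(mode)     # 5

# Fill the gaps with the median, which the outlier doesn't distort.
filled = [median if v is None else v for v in values]
print(filled)   # [3, 5, 5, 5, 100, 5, 4]
```

The single outlier makes the mean a poor stand-in here, which is why looking at the shape of the data before choosing a measure matters.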
Haskell has plenty of type classes. In fact, you use them several times in this chapter. The most common type classes include Eq
, Ord
, Show
, Read
, Enum
, Bounded
, Num
, Integral
, and Floating
. The name type class confuses a great many people — especially those with an Object-Oriented Programming (OOP) background. In addition, some people confuse type classes and types such as Int
, Float
, Double
, Bool
, and Char
. Perhaps the best way to view a type class is as a kind of interface in which you describe what to do but not how to do it. You can't use a type class directly; rather, you derive from it. The following example shows how to use a type class named Equal
:
class Equal a where (##) :: a -> a -> Bool
data MyNum = I Int deriving Show
instance Equal MyNum where
    (I i1) ## (I i2) = i1 == i2
In this case, Equal defines the ## operator, which Haskell doesn't actually use. The operator accepts two values of any type, as long as both values have the same type (as defined by a), and outputs a Bool. However, other than these facts, Equal has no implementation.
MyNum
, a type, defines I
as accepting a single Int
value. It derives from the common type class, Show
, and then implements an instance of Equal
. When creating your own type class, you must create an implementation for it in any type that will use it. In this case, Equal
simply checks the equality of two variables of type MyNum
. You can use the following code to test the result:
x = I 5
y = I 5
z = I 6
x ## y
x ## z
In the first case, the comparison between x
and y
, you get True
as the output. In the second case, the comparison of x
and z
, you get False as the output. Type classes provide an effective means of creating common methods of extending basic type functionality. Of course, the implementation of the type class depends on the needs of the deriving type.