Chapter 10

Dealing with Types

IN THIS CHAPTER

  • Understanding types
  • Creating and managing types
  • Fixing type errors
  • Using types in code

The term type takes on new meaning when working with functional languages. In other languages, when you speak of a type, you mean the label attached to a certain kind of data. This label tells the compiler how to interact with the data. The label is intimately involved with the value. In functional languages, type is more about mapping. You compose functions that express a mapping of or transformation between types of data. The function is a mathematical expression that defines the transformation using a representation of the math involved in the transformation. Just how a language supports this idea of mapping and transformation depends on how it treats underlying types. Because Haskell actually provides a purer approach with regard to type and the functional programming paradigm, this chapter focuses a little more heavily on Haskell.

As with other languages, you can create new types as needed in functional languages. However, the manner in which you create and use new types differs because of how you view type. Interestingly enough, creating new types can be easier in functional languages because the process is relatively straightforward and the result is easier to read in most cases.

The other side of the coin is that functional languages tend toward stricter management of type. (This is true for the most part, at least. Exceptions definitely exist, such as JavaScript, which offers a strict mode that tightens up some of its looser behaviors; see https://www.w3schools.com/js/js_strict.asp for details.) Because of this strictness, you need to know how to understand, manage, and fix type errors. In addition, you should understand how the use of type affects issues such as missing data. The chapter includes examples in both Haskell and Python to demonstrate all of the various aspects of type.

Developing Basic Types

Functional languages provide a number of methods for defining type. Remember that no matter what programming paradigm you use, the computer sees numbers — 0s and 1s, actually. The concept of type has no meaning for the computer; type is there to help the humans writing the code. As with anything, when working with types, starting simply is best. The following sections examine the basics of type in the functional setting and discuss how to augment those types to create new types.

Understanding the functional perception of type

As mentioned in the introduction, a pure functional language, such as Haskell, uses expressions for everything. Because everything is an expression, you can substitute functions that provide the correct output in place of a value. However, values are also expressions, and you can test this idea by using :t to see their types. When you type :t True and press Enter, you see True :: Bool as output because True is an expression that produces a Bool output. Likewise, when you type :t 5 == 6 and press Enter, you see 5 == 6 :: Bool as the output. Any time you use the :t command, you see the definition of the type of whatever you place after the command.

Python takes a similar view, but in a different manner, because it supports multiple programming paradigms. In Python, you point to an object using a name. The object contains the value and provides its associated properties. The object controls its use because it knows how to be that particular object. You can point to a different object using the name you define, but the original object remains unchanged. To see this perception of type, you use the Python type function. When you type type(1), you see <class 'int'> as output. Other languages might say that the type of a value 1 is an int, rather than say that the type of a value 1 is an instance of the class int. If you create a variable by typing myInt = 1 and pressing Enter, then use the type(myInt) function, you still see <class 'int'> as output. The name myInt merely points to an object that is an instance of class int. Even expressions work this way. For example, when you type myAdd = 1 + 1 and then use type(myAdd), you still get <class 'int'> as output.

Considering the type signature

A number of nonfunctional languages use type signatures to good effect, although they may have slightly different names and slightly different uses, such as the function signature in C++. Even so, signatures used to describe the inputs and outputs of the major units of application construction for a language are nothing new. The type signature in Haskell is straightforward. You use one for the findNext function in Chapter 8:

findNext :: Int -> [Int] -> Int

In this case, the expression findNext (on the left side of the double colon) expects an Int and an [Int] (list) as input, and provides an Int as output. A type signature encompasses everything needed to fully describe an expression and helps relieve potential ambiguity concerning the use of the expression. Haskell doesn't always require that you provide a type signature (many of the examples in this book don’t use one), but will raise an error if ambiguity exists in the use of an expression and you don’t provide the required type signature. When you don’t provide a type signature, the compiler infers one (as described in the previous section). Later sections of this chapter discuss some of the complexities of using type signatures.

Python can also use type signatures, but the philosophy behind Python is different from that of many other languages. The type signature isn’t enforced by the interpreter, but IDEs and other tools can use the type signature to help you locate potential problems with your code. Consider this function with the type signature:

def doAdd(value1: int, value2: int) -> int:
    return value1 + value2

Warning The function works much as you might expect. For example, doAdd(1, 2) produces an output of 3. When you type type((doAdd(1, 2))) and press Enter, you also obtain the expected result of <class 'int'>. However, the philosophy of Python is that function calls will respect the typing needed to make the function work, so the interpreter doesn't perform any checks. The call doAdd("Hello", " Goodbye") produces an output of 'Hello Goodbye', which is most definitely not an int. When you type type((doAdd("Hello", " Goodbye"))) and press Enter, you obtain the correct, but not expected, output of <class 'str'>.

One way around this problem is to use a static type checker such as mypy (http://mypy-lang.org/). When you call on this tool, it checks your code against the signature you provide.
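To see that the interpreter stores a type signature without enforcing it, you can inspect the annotations yourself. The following is a minimal sketch; the name do_add is used here only for illustration, and the comment about mypy describes what a static checker would be expected to report when run separately over the file.

```python
from typing import get_type_hints


def do_add(value1: int, value2: int) -> int:
    return value1 + value2


# The annotations are stored on the function object...
hints = get_type_hints(do_add)
print(hints)

# ...but the interpreter never consults them, so this call runs without error.
result = do_add("Hello", " Goodbye")
print(type(result))  # <class 'str'>

# A static checker such as mypy, run over this file, would be expected to
# flag the call above because the arguments are str rather than int.
```

The point of the sketch is that the signature is information for tools and readers; nothing changes at run time unless you add a checker to your workflow.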

Technical Stuff A more complete type signature for Python would tend to include some sort of error trapping. In addition, you could use default values to make the intended input more apparent. For example, you could change doAdd to look like this:

def doAdd(value1: int = 0, value2: int = 0) -> int:
    # Reject any argument that isn't an int before adding.
    if (not isinstance(value1, int) or
            not isinstance(value2, int)):
        raise TypeError
    return value1 + value2

The problem with this approach is that it runs counter to the Python way of performing tasks. When you add type checking code of this sort, you automatically limit the potential for other people to use functions in useful, unexpected, and completely safe ways. Python relies on an approach called Duck Typing (see http://wiki.c2.com/?DuckTyping and https://en.wikipedia.org/wiki/Duck_typing for details). Essentially, if it walks like a duck and talks like a duck, it must be a duck, despite the fact that the originator didn't envision it as a duck.
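A short sketch shows what Duck Typing buys you. The function below (named do_add for illustration) performs no type checks at all, so it accepts any pair of values that knows how to respond to the + operator.

```python
def do_add(value1, value2):
    # No type checks: any two objects that support + together are accepted.
    return value1 + value2


print(do_add(1, 2))                 # 3
print(do_add("Hello", " Goodbye"))  # Hello Goodbye
print(do_add([1, 2], [3]))          # [1, 2, 3]
print(do_add(1.5, 2))               # 3.5
```

Adding isinstance checks would reject the last three calls, even though each one is safe and arguably useful.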

Creating types

At some point, the built-in types for any language won’t satisfy your needs and you'll need to create a custom type. The method used to create custom types varies by language. As noted in the “Understanding the functional perception of type” section, earlier in this chapter, Python views everything as an object. In this respect, Python is an object-oriented language within limits (for example, Python doesn’t actually support data hiding). With this in mind, to create a new type in Python, you create a new class, as described at https://docs.python.org/3/tutorial/classes.html and https://www.learnpython.org/en/Classes_and_Objects. This book doesn’t discuss object orientation to any degree, so you won’t see much with regard to creating custom Python types.

Haskell takes an entirely different approach to the process that is naturally in line with functional programming principles. In fact, you may be amazed to discover the sorts of things you can do with very little code. The following sections offer an overview of creating types in Haskell, emphasizing the functional programming paradigm functionality.

Using AND

Haskell lets you combine existing types to create a new kind of type. One of the operations you can perform on types is AND, which combines this type and that type into a single new type (often called a product type). In this case, you provide a definition like the one shown here:

data CompNum = Comp Int Int

It’s essential to track the left and right side of the definition separately. The left side is the type constructor and begins with the data keyword. For now, you create a type constructor simply by providing a name, which is CompNum (for complex number, see https://www.mathsisfun.com/numbers/complex-numbers.html for details).

The right side is the data constructor. It defines the essence of the data type. In this case, it includes an identifier, Comp, followed by two Int values (the real component and the imaginary component). To create and test this type, you would use the following code:

x = Comp 5 7
:t x

The output, as you might expect, is x :: CompNum, and the new data type shows the correct type constructor. This particular version of CompNum has a problem. Type x by itself and you see the error message shown in Figure 10-1.

Screen capture of WinGHCi window with codes data CompNum = Comp Int Int; x = Comp 5 7; :t x; x :: CompNum; x and the error message.

FIGURE 10-1: This data type doesn't provide a means of showing the content.

To fix this problem, you must tell the data type to derive the required functionality. The declarative nature of Haskell means that you don’t actually have to provide an implementation; declaring that a data type does something is enough to create the implementation, as shown here:

data CompNum = Comp Int Int deriving Show
x = Comp 5 7
:t x
x

Remember The deriving keyword is important to remember because it makes your life much simpler. The new data type now works as expected (see Figure 10-2).

Screen capture of WinGHCi window with codes data CompNum = Comp Int Int deriving Show; x = Comp 5 7; :t x; x :: CompNum; x and output Comp 5 7.

FIGURE 10-2: Use the deriving keyword to add features to the data type.

Using OR

One of the more interesting aspects of Haskell data types is that you can create a Sum data type — a type that contains multiple constructors that essentially define multiple associated types. To create such a type, you separate each data constructor using a bar (|), which is essentially an OR operator. The following code shows how you might create a version of CompNum (shown in the previous section) that provides for complex, purely real, and purely imaginary numbers:

data CompNum = Comp Int Int | Real Int | Img Int deriving Show

When working with a real number, the imaginary part is always 0. Likewise, when working with an imaginary number, the real part is always 0. Consequently, the Real and Img definitions require only one Int as input. Figure 10-3 shows the new version of CompNum in action.

Screen capture of WinGHCi window with codes defining x, y, z and codes, outputs: x, Comp 5 7; y, Real 5; z, Img 7.

FIGURE 10-3: Use the deriving keyword to add features to the data type.

As you can see, you define each of the variables using the applicable data constructor. When you check type using :t, you see that they all use the same type constructor: CompNum. However, when you display the individual values, you see the kind of number that the expression contains.
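If you want to approximate the same idea in Python, one possible approach is to write a small class for each data constructor and group them under a single alias. This is only a sketch under stated assumptions: the field names, the CompNum alias, and the describe function are invented for illustration, and the Union is a hint for tools rather than something the interpreter enforces.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Comp:
    real: int
    imaginary: int


@dataclass(frozen=True)
class Real:
    value: int


@dataclass(frozen=True)
class Img:
    value: int


# Stands in for the type constructor: a CompNum is a Comp OR a Real OR an Img.
CompNum = Union[Comp, Real, Img]


def describe(n: CompNum) -> str:
    # Choose the behavior based on which constructor built the value.
    if isinstance(n, Comp):
        return f"{n.real} + {n.imaginary}i"
    if isinstance(n, Real):
        return str(n.value)
    if isinstance(n, Img):
        return f"{n.value}i"
    raise TypeError("not a CompNum")


x, y, z = Comp(5, 7), Real(5), Img(7)
print(x)            # Comp(real=5, imaginary=7)
print(describe(x))  # 5 + 7i
print(describe(y))  # 5
print(describe(z))  # 7i
```

The dataclass decorator plays a role loosely similar to deriving Show here, because it generates the printable representation for you.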

Defining enumerations

The ability to enumerate values is essential as a part of categorizing. Providing distinct values for a particular real-world object's properties is important if you want to better understand the object and show how it relates to other objects in the world. Previous sections explored the use of data constructors with some sort of input, but nothing says that you must provide a value at all. The following code demonstrates how to create an enumeration in Haskell:

data Colors = Red | Blue | Green deriving (Show, Eq, Ord)

Notice that you provide only a label for the individual constructors that are then separated by an OR operator. As with previous examples, you must use deriving to allow the display of the particular variable's content. Notice, however, that this example also derives from Eq (which tests for equality) and Ord (which supports ordering comparisons, such as < and >). Figure 10-4 shows how this enumeration works.

Screen capture of WinGHCi window with codes defining x, y, z as Red, Blue, Green and codes, outputs: x == y, False; x == Red, True; y < x, False; y > x, True; y >= x, True.

FIGURE 10-4: Enumerations are made of data constructors without inputs.

As usual, the individual variables all use the same data type, which is Colors in this case. You can compare the variable content. For example, x == y is False because they're two different values. Note that you can compare a variable to its data constructor, as in the case of x == Red, which is True. You have access to all of the logical operators in this case, so you could create relatively complex logic based on the truth value of this particular type.

You may also need enumerations to appear using alternative text. Fortunately, Haskell addresses this need as well. This updated code presents the colors in a new way:

data Colors = Red | Blue | Green deriving (Eq, Ord)
instance Show Colors where
   show Red = "Fire Engine Red"
   show Blue = "Sky Blue"
   show Green = "Apple Green"

The instance keyword defines a specific manner in which instances of this type should perform particular tasks. In this case, it defines the use of Show. Each color appears in turn with the text to associate with it. Notice that you don't define Show in deriving any longer; you use the deriving or instance form, but not both. Assuming that you create three variables as shown in Figure 10-4 (where x = Red, y = Blue, and z = Green), here's the output of this example:

x = Fire Engine Red
y = Sky Blue
z = Apple Green
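Python's standard enum module offers a rough counterpart to this kind of enumeration. The sketch below is one possible arrangement, not the only one: it stores the display text as each member's value and orders the members by their position in the definition, which is an assumption made to imitate what Ord derives in Haskell.

```python
from enum import Enum


class Colors(Enum):
    RED = "Fire Engine Red"
    BLUE = "Sky Blue"
    GREEN = "Apple Green"

    def __str__(self):
        # Display the alternative text instead of the member name.
        return self.value

    def __lt__(self, other):
        # Order members by definition position: RED < BLUE < GREEN.
        if not isinstance(other, Colors):
            return NotImplemented
        members = list(Colors)
        return members.index(self) < members.index(other)


x, y, z = Colors.RED, Colors.BLUE, Colors.GREEN
print(x == y)           # False
print(x == Colors.RED)  # True
print(y < x)            # False
print(y > x)            # True
print(x)                # Fire Engine Red
```

Equality comes with Enum for free; only the display text and the ordering need extra code.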

Considering type constructors and data constructors

Many data sources rely on records to package data for easy use. A record has individual elements that you use together to describe something. Fortunately, you can create record types in Haskell. Here’s an example of such a type:

data Name = Employee {
   first :: String,
   middle :: Char,
   last :: String} deriving Show

The Name type includes a data constructor for Employee that contains fields named first and last of type String and middle of type Char.

newbie = Employee "Sam" 'L' "Wise"

Notice that the 'L' must appear in single quotes to make it the Char type, while the other two entries appear in double quotes to make them the String type. Because you've derived Show, you can display the record, as shown in Figure 10-5. Just in case you’re wondering, you can also display individual field values, as shown in the figure.

Screen capture of WinGHCi window with codes ending last :: l} deriving Show defining first, middle, last and code newbie, Employee {first = “Sam”, Middle = 'L', last = “Wise”} and print(first newbie) with output “Sam”.

FIGURE 10-5: Haskell supports record types using special data constructor syntax.

The problem with this construction is that it’s rigid, and you may need flexibility. Another way to create records (or any other type, for that matter) is to add the arguments to the type constructor instead, as shown here:

data Name f m l = Employee {
first :: f,
middle :: m,
last :: l} deriving Show

This form of construction is parameterized, which means that the input comes from the type constructor. The difference is that you can now create the record using a Char or a String for the middle name. Unfortunately, you can also create Employee records that really don't make any sense at all, as shown in Figure 10-6, unless you pin down the parameters with a type annotation such as newbie :: Name String Char String.

Screen capture of WinGHCi window with codes ending last :: l} deriving Show defining first, middle, last and code, output: wow, Employee {first = 1, middle = True, last = 2.2}.

FIGURE 10-6: Parameterized types are more flexible.

Remember Haskell supports an incredibly rich set of type structures, and this chapter doesn't do much more than get you started on understanding them. The article at https://wiki.haskell.org/Constructor provides some additional information about type constructors and data constructors, including the use of recursive types.

Composing Types

The following sections talk about composing special types: monoids, monads, and semigroups. What makes these types special is that they have a basis in math, as do most things functional; this particular math, however, is about abstracting away details so that you can see the underlying general rules that govern something and then develop code to satisfy those rules.

Remember The reason you want to perform the abstraction process is that it helps you create better code with fewer (or possibly no) side effects. Aren't functional languages supposed to be free of side effects, though? Generally, yes, but some activities, such as getting user input, introduce side effects. The math part of functional programming is side-effect free, but the moment you introduce user interaction (as an example), you begin having to perform tasks in a certain order, which introduces side effects. The article at https://wiki.haskell.org/Haskell_IO_for_Imperative_Programmers provides a good overview of why side effects are unavoidable and, in some cases, actually necessary.

Understanding monoids

The “Considering the math basis for monoids and semigroups” sidebar may still have you confused. Sometimes an example works best to show how something actually works, instead of all the jargon used to describe it. So, this section begins with a Haskell list, which is a monoid, as it turns out. To prove that it’s a monoid, a list has to follow three laws:

  • Closure: The result of an operation must always be a part of the set comprising the group that defines the monoid.
  • Associativity: The order in which operations on three or more objects occur shouldn’t matter. However, the order of the individual elements can matter.
  • Identity: There is always an element that does nothing; combining it with any other element leaves that other element unchanged.

Lists automatically address the first law. If you’re working with a list of numbers, performing an operation on that list will result in a numeric output, even if that output is another list. In other words, you can’t create a list of numbers, perform an operation on it, and get a Char result. To demonstrate the other two rules, you begin by creating the following three lists:

a = [1, 2, 3]
b = [4, 5, 6]
c = [7, 8, 9]

In this case, the example uses concatenation (++) to create a single list from the three lists. The associativity law demands that the order in which an operation occurs shouldn't matter, but that the order of the individual elements can matter. The following two lines test both of these criteria:

(a ++ b) ++ c == a ++ (b ++ c)
(a ++ b) ++ c == (c ++ b) ++ a

The output of the first comparison is True because the order of the concatenation doesn’t matter. The output of the second comparison is False because the order of the individual elements does matter.

The third law, the identity law, requires the use of an empty list, which plays the same role for concatenation that 0 plays for addition in the numeric examples often used to explain identity. Consequently, both of these statements are true:

a ++ [] == a
[] ++ a == a

When performing some tasks with monoids in Haskell, you need to use import Data.Monoid. This is the case when working with strings. As shown in Figure 10-7, strings also work just fine as monoids. Note the demonstration of identity using an empty string. In fact, many Haskell collection types work as monoids with a variety of operators, including Sequence, Map, Set, IntMap, and IntSet. Using the custom type examples described earlier in the chapter as a starting point, any collection that you use as a basis for a new type will automatically have the monoid functionality built in. The example at https://www.yesodweb.com/blog/2012/10/generic-monoid shows a more complex Haskell implementation of monoids as a custom type (using a record in this case).

Screen capture of WinGHCi window with import Data.Monoid used with codes defining a, b, c as Hello, There, ! and code, output: (a ++ b) ++ c == a ++ (b ++ c), True; a ++ [] == a, True.

FIGURE 10-7: Strings can act as monoids, too.

Tip After you import Data.Monoid, you also have access to the <> operator to perform append operations. For example, the following line of Haskell code tests the associative law:

(a <> b) <> c == a <> (b <> c)

Remember Even though this section has focused on the simple task of appending one object to another, most languages provide an assortment of additional functions to use with monoids, which is what makes monoids particularly useful. For example, Haskell provides the Dual function, which reverses the output of an append operation. The following statement is true because the right expression uses the Dual function:

((a <> b) <> c) == getDual ((Dual c <> Dual b) <> Dual a)

Even though the right side would seem not to work based on earlier text, the use of the Dual function makes it possible. To make the statement work, you must also call getDual to convert the Dual object to a standard list. You can find more functions of this sort at http://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Monoid.html.

The same rules for collections apply with Python. As shown in Figure 10-8, Python lists behave in the same manner as Haskell lists.

Screen capture of python window with codes defining a = [1, 2, 3]; b = [4, 5, 6]; c = [7, 8, 9] and code, output: (a + b) + c == a + (b + c), True; a + [] == a, True; [] + a == a, True.

FIGURE 10-8: Python collections can also act as monoids.

Tip In contrast to Haskell, Python doesn't have a built-in monoid class that you can use as a basis for creating your own type with monoid support. However, you can see plenty of Python monoid implementations online. The explanation at https://github.com/justanr/pynads/blob/master/pynads/abc/monoid.py describes how you can implement the Haskell functionality as part of Python. The implementation at https://gist.github.com/zeeshanlakhani/1284589 is shorter and probably easier to use, plus it comes with examples of how to use the class in your own code.
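Even without a dedicated class, you can treat a monoid in Python as nothing more than a combining operation plus its identity element. The following sketch assumes a helper named mconcat (borrowed from the Haskell name, but written here from scratch) that folds a collection together using functools.reduce.

```python
from functools import reduce
import operator


def mconcat(items, combine, identity):
    # Fold the items together, starting from the identity element.
    return reduce(combine, items, identity)


a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]

# Associativity and identity for list concatenation.
print((a + b) + c == a + (b + c))   # True
print(a + [] == a and [] + a == a)  # True

# The same helper works for any operation and identity pair.
print(mconcat([a, b, c], operator.add, []))                # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(mconcat(["Hello", " ", "There"], operator.add, ""))  # Hello There
print(mconcat([2, 3, 4], operator.mul, 1))                 # 24
print(mconcat([], operator.add, 0))                        # 0
```

The last call shows why the identity matters: it gives you a sensible answer even when the collection is empty.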

Considering the use of Nothing, Maybe, and Just

Haskell doesn’t actually have a universal sort of null value. It does have Nothing, but to use Nothing, the underlying type must support it. In addition, Nothing is actually something, so it's not actually null (which truly is nothing). If you assign Nothing to a variable and then print the variable onscreen, Haskell tells you that its value is Nothing. In short, Nothing is a special kind of value that tells you that the data is missing, without actually assigning null to the variable. Using this approach has significant advantages, not the least of which is fewer application crashes and less potential for a missing value to create security holes.

You normally don't assign Nothing to a variable directly. Rather, you create a function or other expression that makes the assignment. The following example shows a simple function that adds two numbers. However, neither number can be 0; the function returns Nothing when it sees a 0 input:

doAdd :: Int -> Int -> Maybe Int
doAdd _ 0 = Nothing
doAdd 0 _ = Nothing
doAdd x y = Just (x + y)

Notice that the type signature has Maybe Int as the output. This means that the output could be an Int or Nothing. Before you can use this example, you need to load some support for it:

import Data.Maybe as Dm

To test how Maybe works, you can try various versions of the function call:

doAdd 5 0
doAdd 0 6
doAdd 5 6

The first two result in an output of Nothing. However, the third results in an output of Just 11. Of course, now you have a problem, because you can't use the output of Just 11 as numeric input to something else. To overcome this problem, you can make a call to fromMaybe 0 (doAdd 5 6). The output will now appear as 11. Likewise, when the output is Nothing, you see a value of 0, as shown in Figure 10-9. The first value to fromMaybe, 0, tells what to output when the output of the function call is Nothing. Consequently, if you want to avoid the whole Nothing issue with the next call, you can instead provide a value of 1.

Screen capture of WinGHCi with UseMaybe.hs loaded and code, output: doAdd 5 0, Nothing; doAdd 6 0, Nothing; doAdd 5 6, Just 11; fromMaybe 0 (doAdd 5 6), 11; fromMaybe 0 (doAdd 0 6), 0; fromMaybe 0 (doAdd 5 0), 0.

FIGURE 10-9: Haskell enables you to process data in unique ways with little code.

Tip As you might guess, Python doesn't come with Maybe and Just installed. However, you can add this functionality or rely on code that others have created. The article at http://blog.senko.net/maybe-monad-in-python describes this process and provides a link to a Maybe implementation that you can use with Python. The PyMonad library found at https://pypi.org/project/PyMonad/ also includes all of the required features and is easy to use.
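If you don't want to install anything, you can get a rough approximation of the doAdd example with Optional from the standard typing module. This sketch isn't a full Maybe implementation: None stands in for Nothing, a plain int stands in for Just, and the from_maybe helper is a hypothetical name written here to imitate fromMaybe.

```python
from typing import Optional


def do_add(x: int, y: int) -> Optional[int]:
    # A 0 in either position counts as missing data.
    if x == 0 or y == 0:
        return None
    return x + y


def from_maybe(default: int, value: Optional[int]) -> int:
    # Substitute the default when the value is missing.
    return default if value is None else value


print(do_add(5, 0))                 # None
print(do_add(5, 6))                 # 11
print(from_maybe(0, do_add(5, 6)))  # 11
print(from_maybe(0, do_add(0, 6)))  # 0
```

Unlike Haskell, nothing forces the caller to go through from_maybe, so the protection here depends on discipline or on a static type checker.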

Understanding semigroups

The “Understanding monoids” section, earlier in this chapter, discusses three rules that monoids must follow. Semigroups are like monoids except that they have no identity requirement. Semigroups actually represent a final level of abstraction, as discussed in the earlier sidebar, “Considering the math basis for monoids and semigroups”. At this final level, things are as simple and flexible as possible. Of course, sometimes you really do need to handle a situation in which something is Nothing, and the identity rule aids in dealing with this issue. People have differing opinions over the need for and usefulness of semigroups, as shown in the discussion at https://stackoverflow.com/questions/40688352/why-prefer-monoids-over-semigroups-in-haskell-why-do-we-need-mempty. However, a good rule of thumb is to use the simplest abstraction when possible, which would be semigroups whenever possible. To work with semigroups, you must execute import Data.Semigroup.

Remember You may wonder why you would use a semigroup when a monoid seems so much more capable. An example of an object that must use a semigroup is a bounding box. A bounding box can't be empty; it must take up some space or it doesn’t exist and therefore the accompanying object has no purpose. Another example of when to use a semigroup is Data.List.NonEmpty (http://hackage.haskell.org/package/base-4.11.1.0/docs/Data-List-NonEmpty.html), which is a list that must always have at least one entry. Using a monoid in this case wouldn’t work. The point is that semigroups have a definite place in creating robust code, and in some cases, you actually open your code to error conditions by not using them. Fortunately, semigroups work much the same as monoids, so if you know how to use one, you know how to use the other.
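The bounding box idea can be sketched in Python to show what a missing identity looks like in practice. Everything here is illustrative: the Box class, its field names, and the sconcat helper are assumptions made for this example rather than part of any library.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def combine(self, other: "Box") -> "Box":
        # The smallest box that encloses both boxes.
        return Box(min(self.x_min, other.x_min),
                   min(self.y_min, other.y_min),
                   max(self.x_max, other.x_max),
                   max(self.y_max, other.y_max))


def sconcat(boxes):
    # No identity value is supplied, so at least one box is required;
    # reduce raises TypeError when given an empty sequence.
    return reduce(Box.combine, boxes)


a = Box(0, 0, 1, 1)
b = Box(2, 2, 3, 3)
c = Box(-1, 0, 0, 5)

# Associativity holds, so grouping doesn't matter.
print(a.combine(b).combine(c) == a.combine(b.combine(c)))  # True
print(sconcat([a, b, c]))  # Box(x_min=-1, y_min=0, x_max=3, y_max=5)
```

Because no empty box exists to act as a starting value, combining zero boxes is an error rather than a quiet default, which is exactly the behavior a semigroup describes.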

Parameterizing Types

The “Considering type constructors and data constructors” section, earlier in this chapter, shows you one example of a parameterized type in the form of the Name type. In that section, you consider two kinds of constructions for the Name type that essentially end in the same result. However, you need to use parameterized types at the right time. Parameterized types work best when the type acts as a sort of box that could hold any sort of value. The Name type is pretty specific, so it's not the best type to parameterize because it really can’t accept just any kind of input.

A better example for parameterizing types would be to create a custom tuple that accepts three inputs and provides the means to access each member using a special function. It would be sort of an extension of the fst and snd functions provided by the default tuple. In addition, when creating a type of this sort, you want to provide some sort of conversion feature to a default. Here is the code used for this example:

data Triple a b c = Triple (a, b, c) deriving Show

fstT (Triple (a, b, c)) = show a
sndT (Triple (a, b, c)) = show b
thdT (Triple (a, b, c)) = show c

cvtToTuple (Triple (a, b, c)) = (a, b, c)

In this case, the type uses parameters to create a new value: a, b, and c represent elements of any type. Consequently, this example starts with a real tuple, but of a special kind, Triple. When you display the value using show, the output looks like any other custom type.

The special functions enable you to access specific elements of the Triple. To avoid name confusion, the example uses a similar, but different, naming strategy of fstT, sndT, and thdT. Theoretically, you could use wildcard characters for each of the nonessential inputs, but no good reason exists to do so in this case.

Finally, cvtToTuple enables you to change a Triple back into a tuple with three elements. The converted tuple has all the same functionality as a tuple that you create any other way. The following test code lets you check the operation of the type and associated functions:

x = Triple("Hello", 1, True)

show(x)
fstT(x)
sndT(x)
thdT(x)
show(cvtToTuple(x))

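The outputs demonstrate that the type works as expected (GHCi displays each String result inside an extra pair of quotes, with inner quotes escaped; that quoting is omitted here for readability):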

Triple ("Hello",1,True)
"Hello"
1
True
("Hello",1,True)

Unfortunately, there isn't a Python equivalent of this code. You can mimic it, but you must create a custom solution. The material at https://ioam.github.io/param/Reference_Manual/param.html#parameterized-module and https://stackoverflow.com/questions/46382170/how-can-i-create-my-own-parameterized-type-in-python-like-optionalt is helpful, but this is one time when you may want to rely on Haskell if this sort of task is critical for your particular application and you don’t want to create a custom solution.
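As one example of what such a custom solution might look like, the following sketch mimics Triple using a dataclass and the Generic support in the standard typing module. The names fst_t, snd_t, thd_t, and to_tuple are invented for this illustration, and the type parameters serve only as hints for tools; Python doesn't check them at run time.

```python
from dataclasses import dataclass
from typing import Generic, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class Triple(Generic[A, B, C]):
    first: A
    second: B
    third: C

    def to_tuple(self) -> Tuple[A, B, C]:
        # Convert back to an ordinary three-element tuple.
        return (self.first, self.second, self.third)


def fst_t(t: Triple[A, B, C]) -> A:
    return t.first


def snd_t(t: Triple[A, B, C]) -> B:
    return t.second


def thd_t(t: Triple[A, B, C]) -> C:
    return t.third


x = Triple("Hello", 1, True)
print(x)             # Triple(first='Hello', second=1, third=True)
print(fst_t(x))      # Hello
print(snd_t(x))      # 1
print(thd_t(x))      # True
print(x.to_tuple())  # ('Hello', 1, True)
```

Unlike the Haskell version, these accessors return the original values rather than their string form.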

Dealing with Missing Data

In a perfect world, all data acquisition would result in complete records with nothing missing and nothing wrong. However, in the real world, datasets often contain a lot of missing data, and you're often left wondering just how to address the issue so that your analysis is correct, your application doesn’t crash, and no one from the outside can corrupt your setup using something like a virus. The following sections don’t handle every possible missing-data issue, but they give you an overview of what can go wrong as well as offer possible fixes for it.

Handling nulls

Different languages use different terms for the absence of a value. Python uses the term None and Haskell uses the term Nothing. In both cases, the value indicates an absence of an anticipated value. Often, the reasons for the missing data aren’t evident. The issue is that the data is missing, which means that it's not available for use in analysis or other purposes. In some languages, the missing value can cause crashes or open a doorway to viruses (see the upcoming “Null values, the billion-dollar mistake” sidebar for more information).

When working with Haskell, you must provide a check for Nothing values, as described in the “Considering the use of Nothing, Maybe, and Just” section, earlier in this chapter. The goal is to ensure that checks are in place, now that a good reason for unchecked null values no longer exists. Of course, you must still write your code proactively to handle the Nothing case (helped by the Haskell type checker, which ensures that functions receive proper values). The point is that Haskell doesn't have an independent type that you can call upon as Nothing; the Nothing value is associated with each data type that requires it, which makes locating and handling null values easier.

Python does include an individual null value called None (the only instance of the NoneType type), and you can assign it to any variable. However, note that None is still an object in Python, whereas null in many other languages is not an object. A variable set to None still has an object assigned to it: the None object. Because None is a single object, you can check for it using is. In addition, because None is a real object rather than an invalid reference, misusing it raises an exception that you can catch, so it tends to cause fewer crashes and leave fewer doors open to nefarious individuals. Here is an example of using None:

x = None
if x is None:
    print("x is missing")

Remember The output of this example is x is missing, as you might expect. You should also note that Python lacks the concept of pointers, which is a huge cause of null values in other languages. Someone will likely point out that you can also check for None using x == None. This is a bad idea because you can override the == (equality) operator but you can't override is, which means that using is provides a consistent behavior. The discussion at https://stackoverflow.com/questions/3289601/null-object-in-python provides all the details about the differences between == and is and why you should always use is.
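
To see why is gives the more consistent check, consider a minimal sketch. The AlwaysEqual class is hypothetical (it isn't part of Python or of this chapter's earlier code) and exists only to show what an overridden == operator can do:

```python
class AlwaysEqual:
    """Hypothetical class whose == operator claims equality with anything."""

    def __eq__(self, other):
        # A custom __eq__ can return whatever it likes, even for None.
        return True


value = AlwaysEqual()

equals_none = (value == None)   # the overridden == answers True
is_none = (value is None)       # identity can't be overridden, so False

print(equals_none, is_none)
```

The value variable clearly isn't None, yet the == comparison says that it is. The is comparison reports the correct result because it checks object identity rather than calling any method on the object.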

Performing data replacement

Missing and incorrect data present problems. Before you can do anything at all, you must verify the dataset you use. Creating types (using the techniques found in earlier sections in the chapter) that automatically verify their own data is a good start. For example, you can create a type for bank balances that ensures that the balance is never negative (unless you want to allow an overdraft). However, even with the best type construction available, a dataset may contain unusable data entries or some entries that don’t contain data at all. Consequently, you must still check for issues such as missing data and data that appears out of range.
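
The bank balance idea might look something like the following in Python. This is only a sketch under stated assumptions: the Balance class, its allow_overdraft flag, and the is_valid_balance() helper are hypothetical names invented for illustration, and a real application would likely need more checks than these:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Balance:
    """Hypothetical bank-balance type that rejects invalid amounts."""
    amount: float
    allow_overdraft: bool = False

    def __post_init__(self):
        # Reject missing data outright.
        if self.amount is None:
            raise ValueError("balance is missing")
        # Reject negative balances unless overdrafts are permitted.
        if self.amount < 0 and not self.allow_overdraft:
            raise ValueError("balance can't be negative")


def is_valid_balance(amount):
    """Return True when amount can be turned into a Balance."""
    try:
        Balance(amount)
        return True
    except (ValueError, TypeError):
        return False


raw_entries = [100.0, -25.0, None, 42.5]
checked = [is_valid_balance(entry) for entry in raw_entries]
print(checked)
```

The type catches the negative and missing entries as the data arrives, but it can't decide what to do about them. That decision is still yours to make.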

After you find missing or incorrect data, you must consider the ramifications of the error. In most cases, you have the following three options:

  • Ignore the issue
  • Correct the entry
  • Delete the entry and associated elements

Ignoring the issue might cause the application to fail and will most certainly produce inaccurate results when the entry is critical for analysis. However, most datasets contain superfluous entries — those that you can ignore unless you require the amplifying information they provide.

Correcting the entry is time consuming in most cases because you must now define a method of correction. Because you don’t know what caused the data error in the first place, or the original data value, any correction you make will be flawed to some extent. Some people use statistical measures (as described in the next section) to make a correction that neither adds to nor detracts from the overall statistical picture of the entries taken together. Unfortunately, even this approach is flawed because the entry may have represented an important departure from the norm.

Deleting the entry is fast and fixes the problem in a way that’s unlikely to cause the application to crash. However, deleting the entry comes with the problem of affecting any analysis you perform. In addition, deleting an entire row (case) from a dataset means losing not only the incorrect entry (the particular feature) but also all the other entries in that row. Consequently, deletion of a row can cause noticeable data damage in some cases.
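
The trade-off between deleting and correcting is easier to see with a tiny dataset. The following sketch assumes a hypothetical list of (name, age) rows in which None marks a missing age. The placeholder value of 0 is deliberately naive and is used only to show how a correction can distort the data:

```python
# Hypothetical dataset: each row is (name, age); None marks a missing age.
rows = [("Ann", 34), ("Bob", None), ("Cy", 29), ("Dee", None)]

# Delete every row that has a missing entry. Note that the names
# Bob and Dee are lost along with the missing ages.
deleted = [row for row in rows if row[1] is not None]

# Correct each missing entry with a placeholder value (here, 0).
# The rows survive, but the placeholder skews any analysis of age.
corrected = [(name, 0 if age is None else age) for name, age in rows]

print(deleted)
print(corrected)
```

Deleting halves the dataset and throws away two perfectly good names, while the naive correction keeps every row but drags the average age down.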

Considering statistical measures

A statistical measure is one that relies on math to create some sort of overall or average entry to use in place of a missing or incorrect entry. Depending on the data in question and the manner in which you create types to support your application, you may be able to rely on statistical measures to fix at least some problems in your dataset.

Remember Statistical measures generally see use for only numeric data. For example, guessing about the content of a string field would be impossible. If the analysis you perform on the string field involves a numeric measure such as length or frequency of specific letters, you might use statistical measures to create a greeked text (essentially nonsense text) replacement (see http://www.webdesignerstoolkit.com/copy.php for details), but you can’t create the actual original text.

Some statistical corrections for missing or inaccurate data see more use than others do. In fact, you can narrow the list of commonly used statistical measures down to these:

  • Average (or mean): A calculation that involves adding all the values in a column together and dividing by the number of items in that column. The result is a number that is the average of all the numbers in the column. This is the measure that is least likely to affect your analysis.
  • Median: The middle value of a series of numbers. This value is not necessarily an average but is simply a middle value. For example, in the series 1, 2, 4, 4, 5, the value 4 is the median because it appears in the middle of the set. The average (or mean) would be 3.2 instead. This is the measure that is most likely to represent the middle value and generally affects the analysis only slightly.
  • Most common (mode): The number that appears most often in a series, even if the value is at either end of the scale. For example, in the series 1, 1, 1, 2, 4, 5, 6, the mode is 1, the average is about 2.86, and the median is 2. This is the measure that reflects the value that has the highest probability of being correct, even if it affects your analysis significantly.
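
You can compute the three measures in the preceding list with the statistics module from the Python standard library. The following sketch uses the series from the mode example and then fills a hypothetical column (in which None marks a missing entry) with the median of the known values. Choosing the median here is an assumption made for illustration; the right measure depends on your data:

```python
from statistics import mean, median, mode

series = [1, 1, 1, 2, 4, 5, 6]

print(mode(series))             # 1
print(median(series))           # 2
print(round(mean(series), 2))   # 2.86

# Hypothetical column with missing entries marked as None.
column = [1, 2, None, 4, 4, None, 5]
known = [value for value in column if value is not None]

# Replace each missing entry with the median of the known values.
fill = median(known)
filled = [fill if value is None else value for value in column]
print(filled)                   # [1, 2, 4, 4, 4, 4, 5]
```

Swapping median for mean or mode in the fill calculation changes the replacement value, which is a quick way to see how much each measure would affect the column.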

As you can see, using the right statistical measure is important. Of course, there are many other statistical measures, and you may find that one of them fits your data better. A technique that you can use to ensure that especially critical values are the most accurate possible is to plot the data to see what shape it creates and then use a statistical measure based on shape.

Creating and Using Type Classes

Haskell has plenty of type classes. In fact, you use them several times in this chapter. The most common type classes include Eq, Ord, Show, Read, Enum, Bounded, Num, Integral, and Floating. The name type class confuses a great many people — especially those with an Object-Oriented Programming (OOP) background. In addition, some people confuse type classes and types such as Int, Float, Double, Bool, and Char. Perhaps the best way to view a type class is as a kind of interface in which you describe what to do but not how to do it. You can't use a type class directly; rather, you derive from it. The following example shows how to use a type class named Equal:

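-- Declare the type class: any instance must supply the ## operator.
class Equal a where
   (##) :: a -> a -> Bool

data MyNum = I Int deriving Show

-- Implement ## for MyNum by comparing the wrapped Int values.
instance Equal MyNum where
   (I i1) ## (I i2) = i1 == i2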

In this case, Equal declares the ## operator, which Haskell doesn't otherwise define. The operator accepts two values of any type, as long as both have the same type (as defined by a), and outputs a Bool. However, other than these facts, Equal has no implementation.

MyNum, a type, defines I as accepting a single Int value. It derives from the common type class, Show, and then implements an instance of Equal. When creating your own type class, you must create an implementation for it in any type that will use it. In this case, Equal simply checks the equality of two variables of type MyNum. You can use the following code to test the result:

x = I 5
y = I 5
z = I 6

x ## y
x ## z

In the first case, the comparison between x and y, you get True as the output. In the second case, the comparison of x and z, you get False as the output. Type classes provide an effective means of creating common methods of extending basic type functionality. Of course, the implementation of the type class depends on the needs of the deriving type.
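
Python has no type classes, but an abstract base class offers a loose analogy to the idea of describing what to do without saying how to do it. The following sketch is only an approximation: Python checks the requirement when you create an object rather than when the code compiles, and the same_as() method name is a hypothetical stand-in for the ## operator:

```python
from abc import ABC, abstractmethod


class Equal(ABC):
    """Describes what to do (compare two values), not how to do it."""

    @abstractmethod
    def same_as(self, other):
        """Return True when self and other count as equal."""


class MyNum(Equal):
    """Wraps a single int and supplies the comparison Equal asks for."""

    def __init__(self, value):
        self.value = value

    def same_as(self, other):
        return self.value == other.value


x = MyNum(5)
y = MyNum(5)
z = MyNum(6)

print(x.same_as(y))   # True
print(x.same_as(z))   # False
```

As with the Haskell version, you can't use Equal by itself. Trying to create an Equal object directly raises a TypeError because same_as() has no implementation.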
