Chapter 6

Working with Lists and Strings

IN THIS CHAPTER

Understanding and using lists

Manipulating lists

Working with Dict and Set

Using strings

Chapter 5 may have given you the idea that the use of lambda calculus in the functional programming paradigm precludes the use of standard programming structures in application design. That's not the case, however, and this chapter is here to dispel that myth. In this chapter, you begin with one of the most common and simplest data structures in use today: lists. A list is a programmatic representation of a real-world object. Everyone creates lists in real life and for all sorts of reasons. (Just imagine shopping for groceries without a list.) You do the same thing in your applications, even when you’re writing code using the functional style. Of course, the functional programming paradigm offers a few surprises, and the chapter discusses them, too.

Sometimes you need to create data structures with greater complexity, which is where the Dict and Set structures come in. Different languages use different terms for these two data structures, but the operation is essentially the same. A Dict offers an ordered list containing name and value pairs. You access the values using the associated name, and the name is what provides the order. A Set offers an unordered collection of elements of the same type with no duplicates. You often use a Set to eliminate duplicate entries from a dataset or to perform mathematical operations such as union and intersection.

The final topic in this chapter involves the use of strings. From a human perspective, strings are an essential means of communicating information. Remember, though, that a computer sees them solely as a string of numbers. Computers work only with numbers, never text, so the representation of a string really is a combination of things that you might not normally think of going together. As with all the other examples in this book, the Haskell and Python string examples use the functional coding paradigm, rather than other paradigms you may have used in the past.

Defining List Uses

After you have used lists, you might be tempted to ask what a list can't do. The list data structure is the most versatile offering for most languages. In most cases, lists are simply a sequence of values that need not be of the same type. You access the elements in a list using an index that begins at 0 for most languages, but could start at 1 for some. The indexing method varies among languages, but accessing specific values using an index is common. Besides storing a sequence of values, you sometimes see lists used in these coding contexts:

Stack
Queue
Deque
Sets
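As a rough illustration of these roles, the following Python sketch uses a plain list as a stack and as a queue, and it returns new lists rather than changing the originals. The helper names push, pop, enqueue, and dequeue are hypothetical and chosen only for this example; deque comes from the standard collections module.

```python
from collections import deque

# Stack (last in, first out): add and remove at the same end.
def push(stack, item):
    return stack + [item]           # builds a new list

def pop(stack):
    return stack[-1], stack[:-1]    # (top item, remaining stack)

stack = push(push([], 1), 2)
top, rest = pop(stack)

# Queue (first in, first out): add at the back, remove from the front.
def enqueue(queue, item):
    return queue + [item]

def dequeue(queue):
    return queue[0], queue[1:]      # (front item, remaining queue)

queue = enqueue(enqueue([], "a"), "b")
front, remaining = dequeue(queue)

# Deque: the standard library type supports both ends efficiently,
# but note that it modifies the object in place.
d = deque([2, 3])
d.appendleft(1)
d.append(4)

# Set-like use: drop duplicates while keeping first-seen order.
unique = list(dict.fromkeys([3, 1, 3, 2, 1]))
```

Building a new list on every operation costs more than in-place changes, which is the trade-off the functional style accepts.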

Generally, lists offer more manipulation methods than other kinds of data structures simply because the rules for using them are so relaxed. Many of these manipulation methods give lists a bit more structure for use in meeting specialized needs. The “Performing Common List Manipulations” section, later in this chapter, describes these manipulations in detail. Lists are also easy to search and to analyze in various ways. The point is that lists often offer significant flexibility at the cost of absolute reliability and dependability. (You can easily use lists incorrectly, or create scenarios in which lists can actually cause an application to crash, such as when you add an element of the wrong type.)

Depending on the language you use, lists can provide an impressive array of features and make conversions between types easier. For example, using an iterator in Python lets you perform tasks such as outputting the list as a tuple, processing the content one element at a time, and unpacking the list into separate variables. When working in Haskell, you can create list comprehensions, which are similar in nature to the set comprehensions you work with in math class. The list features you obtain with a particular language depend on the functions the language provides and your own creativity in applying them.
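The Python tasks just mentioned can be sketched in a few lines. The variable names are arbitrary choices for this example.

```python
a = [1, 2, 3, 4]

# Output the list as a tuple.
as_tuple = tuple(a)

# Process the content one element at a time with an explicit iterator.
it = iter(a)
first = next(it)
second = next(it)

# Unpack the list into separate variables.
w, x, y, z = a

# Starred unpacking splits off the first element and keeps the rest.
head, *tail = a
```

None of these operations changes a, which makes them a comfortable fit for functional-style code.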

Creating Lists

Before you can use a list, you must create one. Fortunately, most languages make creating lists extremely easy. In some cases, it’s a matter of placing a list of values or objects within the correct set of symbols, such as square brackets (which appear to be the most commonly used symbols).

The most important thing about creating lists is to ensure that you understand how you plan to use the list within the application. Sometimes developers create a freeform list and find out later that controlling the acceptable data types would have been a better idea. Some languages provide methods for ensuring that lists remain pure, but often the ability to control list content is something to add programmatically. The following sections describe how to create lists, first in Haskell and then in Python.
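One way to control list content programmatically in Python is to check each item when the list is built. The following is only a sketch; typed_list is a hypothetical helper invented for this example, not a standard library function.

```python
def typed_list(items, allowed_type):
    """Return a new list, or raise TypeError if an item has the wrong type."""
    items = list(items)
    for item in items:
        if not isinstance(item, allowed_type):
            raise TypeError(
                f"expected {allowed_type.__name__}, got {type(item).__name__}")
    return items

# Every item is an int, so the call succeeds.
numbers = typed_list([1, 2, 3, 4], int)
```

Calling typed_list([1, "two"], int) raises a TypeError at creation time, rather than letting the stray string cause trouble later.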

LIST AND ARRAY DIFFERENCE

At first, lists may simply seem to be another kind of array. Many people wonder how lists and arrays differ. After all, from a programming perspective, the two can sound like the same thing. It’s true that lists and arrays both store data sequentially, and you can often store any sort of data you want in either structure (although arrays tend to be more restrictive).

The main difference comes in how arrays and lists store the data. An array always stores data in sequential memory locations, which gives an array faster access times in some situations but also slows the creation of arrays. In addition, because an array must appear in sequential memory, updating arrays is often hard, and some languages don’t allow you to modify arrays in the same ways that you can lists.

A list stores data using a linked data structure in which a list element consists of the data value and one or two pointers. Lists take more memory because you must now allocate memory for pointers to the next data location (and to the previous location as well in doubly linked lists). Lists are often faster to create and add data to because of the linking mechanism, but they provide slower read access than arrays. Keep in mind that this is the classic definition: Haskell lists are singly linked, whereas the Python list type is actually implemented as a dynamic array, despite its name.

Using Haskell to create lists

In Haskell, you can create lists in a number of ways. The easiest method is to define a variable to hold the list and provide the list items within square brackets, as shown here:

let a = [1, 2, 3, 4]

Notice that the declaration begins with the keyword let, followed by a lowercase variable name, which is a in this case. You could also use something more descriptive, such as myList. However, if you were to try to use an uppercase beginning letter, you would receive an error message like the one shown in Figure 6-1.

Haskell provides some unique list creation features. For example, you can specify a range of values to put in a list without using any special functions. All you need to do is provide the beginning value, two dots (..), and the ending value, like this:

let b = [1..12]

You can even use a list comprehension to create a list in Haskell. For example, the following list comprehension builds a list called c based on the doubled content of list a:

let c = [x * 2 | x <- a]

In this case, Haskell sends the individual values in a to x, doubles the value of x by multiplying by 2, and then places the result in c. List comprehensions give you significant flexibility in creating customized lists. Figure 6-2 shows the output from these two specialized list-creation methods (and many others exist).

Using Python to create lists

Creating a list in Python is amazingly similar to creating a list in Haskell. The examples in this chapter are relatively simple, so you can perform them by opening an Anaconda Prompt (a command or terminal window), typing python at the command line, and pressing Enter. You use the following code to create a list similar to the one used for the Haskell examples in the previous section:

a = [1, 2, 3, 4]

In contrast to Haskell variable names, Python variable names can begin with a capital letter. Consequently, the AList example that generates an exception in Haskell works just fine in Python, as shown in Figure 6-3.

You can also create a list in Python based on a range, but the code for doing so is different from that in Haskell. Here is one method for creating a list in Python based on a range:

b = list(range(1, 13))

This example combines the list function with the range function to create the list. Notice that the range function accepts a starting value, 1, and a stop value, 13. The resulting list will contain the values 1 through 12 because the range ends just before the stop value, which is never part of the output. You can verify this output for yourself by typing b and pressing Enter.

As does Haskell, Python supports list comprehensions, but again, the code for creating a list in this manner is different. Here's an example of how you could create the list, c, found in the previous example:

c = [x * 2 for x in range(1, 5)]

This example shows the impure nature of Python because, in contrast to the Haskell example, you rely on a statement rather than lambda calculus to get the job done. As an alternative, you can define the range function stop value by specifying len(a)+1. (The alternative approach makes it easier to create a list based on comprehensions because you don't have to remember the source list length.) When you type c and press Enter, the result is the same as before, as shown in Figure 6-4.
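A comprehension can also iterate over the source list itself, which mirrors the Haskell version more closely than a range does. The sketch below assumes the list a defined earlier; the name c_mapped is introduced only for this example.

```python
a = [1, 2, 3, 4]

# Iterate over the source list itself instead of a range of numbers.
c = [x * 2 for x in a]

# The same result using map and a lambda, closer to the functional style.
c_mapped = list(map(lambda x: x * 2, a))
```

Both forms build a new list and leave a untouched.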

Evaluating Lists

At some point, you have a list that contains data. The list could be useful at this point, but it isn’t actually useful until you evaluate it. Evaluating your list means more than simply reading it; it also means ascertaining the value of the list. A list becomes valuable only when the data it contains also become valuable. You can perform this task mathematically, such as determining the minimum, maximum, or mean value of the list, or you can use various forms of analysis to determine how the list affects you or your organization (or possibly a client). Of course, the first step in evaluating a list is to read it.

This chapter doesn't discuss the process of evaluation fully. In fact, no book can discuss evaluation fully because evaluation means different things to different people. However, the following sections offer enough information to at least start the process, and then you can go on to discover other means of evaluation, including performing analysis (another step in the process) using the techniques described in Part 3 and those provided by other books. The point is that evaluation means to use the data, not to change it in some way. Changes come as part of manipulation later in the chapter.

Using Haskell to evaluate lists

The previous sections of the chapter showed that you can read a list simply by typing its identifier and pressing Enter. Of course, then you get the entire list. You may decide that you want only part of the list. One way to get just part of the list is to specify an index value, which means using the !! operator in Haskell. To see the first value in a list defined as let a = [1, 2, 3, 4, 5, 6], you type a !! 0 and press Enter. Indexes begin at 0, so the first value in list a is at index 0, not 1 as you might expect. Haskell actually provides a long list of ways to obtain just parts of lists so that you can see specific elements:

head a: Shows the value at index 0, which is 1 in this case.
tail a: Shows the remainder of the list after index 0, which is [2,3,4,5,6] in this case.
init a: Shows everything except the last element of the list, which is [1,2,3,4,5] in this case.
last a: Shows just the last element in the list, which is 6 in this case.
take 3 a: Requires the number of elements you want to see as input and then shows that number from the beginning of the list, which is [1,2,3] in this case.
drop 3 a: Requires the number of elements you don't want to see as input and then shows the remainder of the list after dropping the required elements, which is [4,5,6] in this case.

Haskell provides you with a wealth of other ways to slice and dice lists, but it really all comes down to reading the list. The next step is to perform some sort of analysis, which can come in a multitude of ways, but here are some of the simplest functions to consider:

length a: Returns the number of elements in the list, which is 6 in this case.
null a: Determines whether the list is empty and returns a Boolean result, which is False in this case.
minimum a: Determines the smallest element of a list and returns it, which is 1 in this case.
maximum a: Determines the largest element of a list and returns it, which is 6 in this case.
sum a: Adds the numbers of the list together, which is 21 in this case.
product a: Multiplies the numbers of the list together, which is 720 in this case.

An amazing array of statistical functions is available for Haskell in the statistics package at https://hackage.haskell.org/package/statistics, and you can likely find other third-party libraries that offer even more. The “Using Haskell Libraries” section of Chapter 3 tells you how to install and import libraries as needed. However, for something simple, you can also create your own functions. For example, you can use the sum and length functions to determine the average value in a list, as shown here:

avg = \x -> sum(x) `div` length(x)
avg a

The output is an integer value of 3 in this case (a sum of 21 divided by 6 elements, with the remainder discarded). The lambda function follows the same pattern as that used in Chapter 5. Note that the / operator works only on fractional types in Haskell, and both sum and length return integral values here, so you use `div` instead. Trying to use something like avg = \x -> sum(x) / length(x) will produce an error. In fact, a number of specialized division-oriented functions are summarized in the article at https://ebzzry.io/en/division/.

Using Python to evaluate lists

Python provides a lot of different ways to evaluate lists. To start with, you can obtain a particular element using an index enclosed in square brackets. For example, assuming that you have a list defined as a = [1, 2, 3, 4, 5, 6], typing a[0] and pressing Enter will produce an output of 1. Unlike in Haskell, you don't have to use a separate set of functions to obtain various list elements; instead, you use modifications of an index, as shown here:

a[0]: Obtains the head of the list, which is 1 in this case
a[1:]: Obtains the tail of the list, which is [2,3,4,5,6] in this case
a[:-1]: Obtains all but the last element, which is [1,2,3,4,5] in this case
a[-1]: Obtains just the last element, which is 6 in this case
a[:3]: Performs the same as take 3 a in Haskell
a[3:]: Performs the same as drop 3 a in Haskell
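The following sketch pairs each Haskell function from the earlier section with a Python index or slice that gives the same result. The variable names are chosen only to echo the Haskell function names.

```python
a = [1, 2, 3, 4, 5, 6]

head = a[0]      # like head a
tail = a[1:]     # like tail a
init = a[:-1]    # like init a
last = a[-1]     # like last a
take3 = a[:3]    # like take 3 a
drop3 = a[3:]    # like drop 3 a

# Slicing past the end of a short list does not raise an error,
# which matches how take behaves in Haskell.
short = [1, 2]
take_short = short[:3]
```

A slice always produces a new list, so the source list remains unchanged.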

As with Haskell, Python probably provides more ways to slice and dice lists than you'll ever need or want. You can also perform similar levels of basic analysis using Python, as shown here:

len(a): Returns the number of elements in a list.
not a: Checks for an empty list. This check is different from a is None, which checks whether a holds the null value None rather than a list.
min(a): Returns the smallest list element.
max(a): Returns the largest list element.
sum(a): Adds the numbers of the list together.

Interestingly enough, Python has no built-in function to obtain the product of a list (that is, all the numbers multiplied together), although Python 3.8 and later include math.prod in the standard library. Many developers rely on third-party libraries such as NumPy (http://www.numpy.org/) to perform this task. One of the easiest ways to obtain a product without resorting to a third-party library is shown here:

from functools import reduce
reduce(lambda x, y: x * y, a)

The reduce method found in the functools library (see https://docs.python.org/3/library/functools.html for details) is incredibly flexible in that you can define almost any operation that works on every element in a list. In this case, the lambda function multiplies the current list element, y, by the accumulated value, x. If you wanted to encapsulate this technique into a function, you could do so using the following code:

prod = lambda z: reduce(lambda x, y: x * y, z)

To use prod to find the product of list a, you would type prod(a) and press Enter. No matter how you call it, you get the same output as in Haskell: 720.
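One caveat with this approach is that reduce raises a TypeError when the list is empty and no starting value is given. A possible variation, shown here as a sketch, passes 1 as the initial value and uses operator.mul from the standard library in place of the lambda.

```python
from functools import reduce
import operator

# The third argument is the starting value, so an empty list yields 1.
prod = lambda z: reduce(operator.mul, z, 1)

a = [1, 2, 3, 4, 5, 6]
product_of_a = prod(a)
product_of_empty = prod([])
```

Returning 1 for an empty list matches the behavior of product in Haskell.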

Python does provide you with a number of statistical calculations in the statistics library (see https://pythonprogramming.net/statistics-python-3-module-mean-standard-deviation/ for details). However, as in Haskell, you may find that you want to create your own functions to determine things like the average value of the entries in a list. The following code shows the Python version:

avg = lambda x: sum(x) // len(x)
avg(a)

As before, the output is 3. Note the use of the // operator to perform integer division. If you were to use the standard division operator, you would receive a floating-point value as output.
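To make the difference between the two division operators concrete, this sketch computes the average both ways and compares the result with mean from the standard statistics module. The names avg_floor and avg_true are introduced only for this example.

```python
from statistics import mean

a = [1, 2, 3, 4, 5, 6]

avg_floor = lambda x: sum(x) // len(x)   # integer (floor) division
avg_true = lambda x: sum(x) / len(x)     # true division, returns a float

floor_result = avg_floor(a)
true_result = avg_true(a)
library_result = mean(a)
```

The floor version discards the fractional part, so it reports 3 where the true average of this list is 3.5.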

Performing Common List Manipulations

Manipulating a list means modifying it in some way to produce a desired result. A list may contain the data you need, but not the form in which you need it. You may need just part of the list, or perhaps the list is just one component in a larger calculation. Perhaps you don't need a list at all; maybe the calculation requires a tuple instead. The need to manipulate shows that the original list contains something you need, but it’s somehow incomplete, inaccurate, or flawed in some other way. The following sections provide an overview of list manipulations that you see enhanced as the book progresses.

Understanding the list manipulation functions

List manipulation means changing the list. However, in the functional programming paradigm, you can’t change anything. For all intents and purposes, every variable points to a list that is a constant — one that can’t change for any reason whatsoever. So when you work with lists in functional code, you need to consider the performance aspects of such a requirement. Every change you make to any list will require the creation of an entirely new list, and you have to point the variable to the new structure. To the developer, the list may appear to have changed, but underneath, it hasn’t — in fact, it can’t, or the underlying reason to use the functional programming paradigm fails. With this caveat in mind, here are the common list manipulations you want to consider (these manipulations are in addition to the evaluations described earlier):

Concatenation: Adding two lists together to create a new list with all the elements of both.
Repetition: Creating a specific number of duplicates of a source list.
Membership: Determining whether an element exists within a list and potentially extracting it.
Iteration: Interacting with each element of a list individually.
Editing: Removing specific elements, reversing the list in whole or in part, inserting new elements in a particular location, sorting, or in any other way modifying a part of the list while retaining the remainder.
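Each of these manipulations can be written in Python so that it produces a new list instead of changing an existing one. The sketch below uses two small lists invented for the example.

```python
a = [1, 2, 3]
b = [4, 5, 6]

concatenated = a + b                      # Concatenation
repeated = a * 2                          # Repetition
has_two = 2 in a                          # Membership
squares = [x * x for x in a]              # Iteration
without_two = [x for x in a if x != 2]    # Editing: remove an element
inserted = a[:1] + [99] + a[1:]           # Editing: insert at index 1
ordered = sorted(b, reverse=True)         # Editing: sort into a new list
```

After all of these statements run, a and b still hold their original values.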

Using Haskell to manipulate lists

Some of the Haskell list manipulation functionality comes as part of the evaluation process. You simply set the list equal to the result of the evaluation. For example, the following code places a new version of a into b:

let a = [1, 2, 3, 4, 5, 6]
let b = take 3 a

You must always place the result of an evaluation into a new variable. For example, if you were to try using let a = take 3 a, as you can with other languages, Haskell would either emit an exception or freeze, because a let binding is recursive and the a on the right side refers to the new a being defined rather than to the original list. However, you could later use a = b to move the result of the evaluation from b to a.

Haskell does provide a good supply of standard manipulation functions. For example, reverse a would produce [6,5,4,3,2,1] as output. You can also split lists using calls such as splitAt 3 a, which produces a tuple containing two lists as output: ([1,2,3],[4,5,6]). To concatenate two lists, you use the concatenation operator: ++. For example, to concatenate a and b, you use a ++ b.

You should know about some interesting Haskell functions. For example, the filter function keeps only the elements that meet specific criteria, such as being an odd number. In this case, you use filter odd a to produce an output of [1,3,5]. The zip function is also exceptionally useful. You can use it to combine two lists. Use zip a ['a', 'b', 'c', 'd', 'e', 'f'] to create a new list of tuples like this: [(1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e'),(6,'f')]. All these functions appear in the Data.List library that you can find discussed at http://hackage.haskell.org/package/base-4.11.1.0/docs/Data-List.html.

Using Python to manipulate lists

When working with Python, you have access to a whole array of list manipulation functions. Many of them are dot functions you append to a list. For example, using a list like a = [1, 2, 3, 4, 5, 6], reversing the list would require the reverse function like this: a.reverse(). However, what you get isn't the output you expected, but a changed version of a. Instead of the original list, a now contains: [6, 5, 4, 3, 2, 1].

Of course, using the dot functions is fine if you want to modify your original list, but in many situations, modifying the original list is simply a bad idea, so you need another way to accomplish the task. In this case, you can use the following code to reverse a list and place the result in another list without modifying the original:

reverse = lambda x: x[::-1]
b = reverse(a)

As with Haskell, Python provides an amazing array of list functions — too many to cover in this chapter (but you do see more as the book progresses). One of the best places to find a comprehensive list of Python list functions is at https://likegeeks.com/python-list-functions/.
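For comparison with the Haskell functions shown earlier, here is a sketch of non-mutating Python counterparts. The split_at helper is hypothetical and written only to mimic splitAt; the other calls are Python built-ins.

```python
a = [1, 2, 3, 4, 5, 6]

reversed_a = list(reversed(a))                 # like reverse a

# Hypothetical helper that mimics splitAt n xs.
split_at = lambda n, xs: (xs[:n], xs[n:])
halves = split_at(3, a)

odds = list(filter(lambda x: x % 2 == 1, a))   # like filter odd a
pairs = list(zip(a, ['a', 'b', 'c', 'd', 'e', 'f']))   # like zip
joined = a + reversed_a                        # like the ++ operator
```

The reversed, filter, and zip built-ins return iterators in Python 3, which is why each call is wrapped in list.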

Understanding the Dictionary and Set Alternatives

This chapter doesn't cover dictionaries and sets in any detail. You use these two structures in detail in Part 3 of the book. However, note that both dictionaries and sets are alternatives to lists and enforce certain rules that make working with data easier because they enforce greater structure and specialization. As mentioned in the chapter introduction, a dictionary uses name/value pairs to make accessing data easier and to provide uniqueness. A set also enforces uniqueness, but without the use of the keys offered by the name part of the name/value pair. You often use dictionaries to store complex datasets and sets to perform specialized math-related tasks.

Using dictionaries

Both Haskell and Python support dictionaries. However, when working with Haskell, you use the HashMap (or a Map). In both cases, you provide name/value pairs, as shown here for Python:

myDict = {"First": 1, "Second": 2, "Third": 3}

The first value, the name, is also a key. The keys are separated from the values by a colon; individual entries are separated by commas. You can access any value in the dictionary using the key, such as print(myDict["First"]). The Haskell version of dictionaries looks like this:

import qualified Data.Map as M
let myDict = M.fromList [("First", 1), ("Second", 2), ("Third", 3)]
import qualified Data.HashMap.Strict as HM
let myDict2 = HM.fromList [("First", 1), ("Second", 2), ("Third", 3)]

The Map and HashMap objects are different; you can't interchange them. The two structures are implemented differently internally, and you may find performance differences using one over the other. In creation and use, the two are hard to tell apart. To access a particular member of the dictionary, you use M.lookup "First" myDict for the first and HM.lookup "First" myDict2 for the second. In both cases, the output is Just 1, which indicates that the key exists and its value is 1. The lookup function returns a Maybe value, so a key that isn't in the dictionary produces Nothing instead. (The discussion at https://stackoverflow.com/questions/7894867/performant-haskell-hashed-structure provides some additional details on how the data structures differ.)
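Python has a rough counterpart to this style of lookup. The get method returns None (or a default that you supply) when a key is missing, instead of raising the KeyError that square-bracket access would. The sketch below also shows one way to add an entry without modifying the original dictionary; the names found, missing, fallback, and extended are invented for this example.

```python
myDict = {"First": 1, "Second": 2, "Third": 3}

found = myDict.get("First")          # the key exists
missing = myDict.get("Fourth")       # None, loosely like Nothing
fallback = myDict.get("Fourth", 0)   # caller-supplied default

# A functional-style update builds a new dictionary and leaves
# the original untouched.
extended = {**myDict, "Fourth": 4}
```

The dictionary unpacking syntax copies every existing pair into the new dictionary before adding the extra one.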

Using sets

Sets in Python are either mutable (the set object) or immutable (the frozenset object). The immutability of the frozenset allows you to use it as a subset within another set or make it hashable for use in a dictionary. (The set object doesn't offer these features.) There are other kinds of sets, too, but for now, the focus is on immutable sets for functional programming uses. Consequently, you see the frozenset used in this book, but be aware that other set types exist that may work better for your particular application. The following code creates a frozenset:

myFSet = frozenset([1, 2, 3, 4, 5, 6])

You use the frozenset to perform math operations or to act as a list of items. For example, you could create a set consisting of the days of the week. You can't retrieve individual values from a frozenset by index, because a set has no order. However, the object is iterable, so the following code tells you whether myFSet contains the value 1 (the membership expression 1 in myFSet produces the same answer more directly):

for entry in myFSet:
    if entry == 1:
        print(True)

Haskell sets follow a pattern similar to that used for dictionaries. As with all other Haskell objects, sets are immutable, so you don't need to make the same choices as you do when working with Python. The following code shows how to create a set:

import qualified Data.Set as Set
let mySet = Set.fromList [1, 2, 3, 4, 5, 6]

Checking membership in a Haskell set takes a single function call. For example, if you want to know whether mySet contains the value 1, you simply make the following call:

Set.member 1 mySet
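Returning to Python, the frozenset supports a direct membership test along with the usual set mathematics. The following sketch uses a second set, evens, and a small dictionary, labels, that are invented for this example.

```python
myFSet = frozenset([1, 2, 3, 4, 5, 6])
evens = frozenset([2, 4, 6, 8])

contains_one = 1 in myFSet           # membership test, no loop needed
union = myFSet | evens
intersection = myFSet & evens
difference = myFSet - evens

# Duplicates disappear when a list is converted to a set.
deduplicated = frozenset([1, 1, 2, 2, 3])

# Because a frozenset is hashable, it can serve as a dictionary key.
labels = {frozenset([1, 2]): "low", frozenset([5, 6]): "high"}
low_label = labels[frozenset([2, 1])]   # element order doesn't matter
```

Each operator returns a new frozenset, so myFSet and evens keep their original contents.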

Considering the Use of Strings

Strings convey thoughts in human terms. Humans don't typically speak numbers or math; they use strings of words made up of individual letters to convey thoughts and feelings. Unfortunately, computers don’t know what a letter is, much less strings of letters used to create words or groups of words used to create sentences. None of it makes sense to computers. So, as foreign as numbers and math might be to most humans, strings are just as foreign to the computer (if not more so). The following sections provide an overview of the use of strings within the functional programming paradigm.

Understanding the uses for strings

Humans see several kinds of objects as strings, but computer languages usually treat them as separate entities. Two of them are important for programming tasks in this book: characters and strings. A character is a single element from a character set, such as the letter A. Character sets can contain nonletter components, such as the carriage return control character. Extended character sets can provide access to letters used in languages other than English. However, no matter how someone structures a character set, a character is always a single entity within that character set. Depending on how the originator structures the character set, an individual character can consume 7, 8, 16, or even 32 bits.

A string is a sequential grouping of zero or more characters from a character set. When a string contains zero elements, it appears as an empty string. Most strings contain at least one character, however. The representation of a character in memory is relatively standard across languages; it consumes just one memory location for the specific size of that character. Strings, however, appear in various forms depending on the language. So computer languages treat strings differently from characters because of how each of them uses memory.
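You can see the numbers behind characters and strings directly in Python. This sketch uses the built-in ord and chr functions and the encode method; the sample text is arbitrary.

```python
code = ord("A")                        # numeric code of a character
letter = chr(65)                       # character for a numeric code
codes = [ord(ch) for ch in "Hi!"]      # a string as a list of numbers

# A character outside the basic Latin range (here \u00e9, an accented e)
# takes more than one byte when encoded as UTF-8.
utf8 = "\u00e9".encode("utf-8")
byte_count = len(utf8)

# A string with zero characters is still a valid string.
empty_length = len("")
```

How many bytes a character consumes depends on the encoding in use, which is one reason the text gives a range of sizes.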

Strings don’t just see use as user output in applications. Yes, you do use strings to communicate with the user, but you can also use strings for other purposes such as labeling numeric data within a dataset. Strings are also central to certain data formats, such as XML. In addition, strings appear as a kind of data. For example, HTTP relies on the user agent string to identify the characteristics of the client system. Consequently, even if your application doesn’t ever interact with the user, you're likely to use strings in some capacity.

Performing string-related tasks in Haskell

A string is actually a list of characters in Haskell. To see this for yourself, create a new string by typing let myString = "Hello There!" and pressing Enter. On the next line, type :t myString and press Enter. The output will tell you that myString is of type [Char], a character list.

As you might expect from a purely functional language, Haskell strings are also immutable. When you assign a new string to a Haskell string variable, what you really do is create a new string and point the variable to it. Strings in Haskell are the equivalents of constants in other languages.

Haskell does provide a few string-specific libraries, such as Data.String, where you find functions such as lines (which breaks a string into a list of individual strings at newline characters, \n) and words (which breaks strings into a list of individual words). You can see the results of these functions in Figure 6-5.

FIGURE 6-5: Haskell offers at least a few string-related libraries.

Later chapters spend more time dealing with Haskell strings, but string management is acknowledged as one of the major shortfalls of this particular language. The article at https://mmhaskell.com/blog/2017/5/15/untangling-haskells-strings provides a succinct discussion of some of the issues and demonstrates some string-management techniques. The one thing to get out of this article is that you actually have five different types to deal with if you want to fully implement strings in Haskell.

Performing string-related tasks in Python

Python, as an impure language, also comes with a full list of string functions — too many to go into in this chapter. Creating a string is exceptionally easy: You just type myString = "Hello There!" and press Enter. Strings are first-class citizens in Python, and you have access to all the usual manipulation features found in other languages, including special formatting and escape characters. (The tutorial at https://www.tutorialspoint.com/python/python_strings.htm doesn't even begin to show you everything, but it’s a good start.)

An important issue for Python developers is that strings are immutable. Of course, that leads to all sorts of questions relating to how someone can seemingly change the value of a string in a variable. However, what really happens is that when you change the contents of a variable, Python actually creates a new string and points the variable to that string rather than the existing string.

One of the more interesting aspects of Python is that you can also treat strings sort of like lists. The “Using Python to evaluate lists” section talks about how to evaluate lists, and many of the same features work with strings. You have access to all the indexing features to start with, but you can also do things like use min(myString), which returns the space, or max(myString), which returns r, to process your strings. Obviously, you can't use sum(myString) because there is nothing to sum. With Python, if you’re not quite sure whether something will work on a string, give it a try.
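The following sketch tries several list-style operations on a string, along with the split and splitlines methods, which behave much like the Haskell words and lines functions. It also demonstrates immutability; the variable names are chosen only for this example.

```python
myString = "Hello There!"

first = myString[0]                  # indexing works as it does for lists
ending = myString[-6:]               # so does slicing
smallest = min(myString)             # the space has the lowest code
largest = max(myString)              # r has the highest code
words = myString.split()             # like words in Haskell
lines = "one\ntwo".splitlines()      # like lines in Haskell

# Strings are immutable, so assigning to an index raises TypeError.
try:
    myString[0] = "J"
    mutated = True
except TypeError:
    mutated = False

# Changing a string really means building a new one.
changed = "J" + myString[1:]
```

After this code runs, myString still contains its original text, and changed holds the new string.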
