Chapter 7. Data Structures

In this chapter, we introduce the important idea of data structures: collections of values along with commonly performed functions. Python’s programmer-friendly philosophy is to provide a few powerful and efficient data structures—tuples, lists, dictionaries, and sets—that can be combined as needed to make more complex ones.

In the previous chapter we discussed strings, which can be thought of as data structures restricted to storing sequences of characters. The data structures in this chapter can contain not just characters but almost any kind of data.

Python’s two workhorse data structures are lists and dictionaries. Lists are simple, flexible data structures with many uses. Dictionaries, while necessarily a little more complex than lists, are an extremely efficient way to relate different pieces of data. Python provides a lot of support for both lists and dictionaries; in particular it has an extremely useful list-creation technique known as list comprehensions.

The type Command

It’s occasionally useful to ask Python to check the data type of a value or a variable. This is easily done with the built-in type command:

>>> type(5)
<class 'int'>
>>> type(5.0)
<class 'float'>
>>> type('5')
<class 'str'>
>>> type(None)
<class 'NoneType'>
>>> type(print)
<class 'builtin_function_or_method'>

The term class comes from object-oriented programming. Roughly speaking, classes and types are synonymous. As we will see in Chapter 10, one of the great benefits of classes is that you can create your own.

One common application of the type command is to determine the type of data stored in a collection. Python lets you store values with different types in the same data structure. For example, you can store numbers and strings in the same list; in contrast, the popular Java programming language does not let you mix data types this way. A Java list must be all strings or all integers.

Sequences

In Python, a sequence is an ordered collection of values. Python comes with three built-in sequence types: strings, tuples, and lists.

One very nice feature of sequences is that they share the same indexing and slicing commands that we saw for strings in the previous chapter. Thus, all sequences have the following characteristics:

  • They have positive indexes starting at 0 at their left end.

  • They have negative indexes starting at −1 at their right end.

  • They can have copies of sub-sequences made using slice notation. For example, seq[begin:end] returns a copy of the elements of seq starting at index location begin and ending at location end - 1.

  • They can be concatenated using + and *. The sequences must be of the same type for this to work—that is to say, you cannot concatenate a tuple and a list.

  • They have their length determined by the len function. For example, len(s) is the number of items in sequence s.

  • They test for membership using x in s—that is, x in s evaluates to True just when the element x is in the sequence s.

In practice, strings and lists are the most common kinds of sequences. Tuples have their uses but appear much less often.

It is often easy to write code that works with any sequence, no matter if it is a string, a list, or a tuple. For example, seq[-1] returns the last element of seq whether seq is a tuple, list, or string.

Tuples

A tuple is an immutable sequence of 0 or more values. It can contain any Python value—even other tuples. Here are a few examples:

>>> items = (-6, 'cat', (1, 2))
>>> items
(-6, 'cat', (1, 2))
>>> len(items)
3
>>> items[-1]
(1, 2)
>>> items[-1][0]
1

As you can see, the items of a tuple are enclosed in round brackets and separated by commas. The empty tuple is represented by (), but tuples with a single item (singleton tuples) have the unusual notation (x,). For instance:

>>> type(())
<class 'tuple'>
>>> type((5,))
<class 'tuple'>
>>> type((5))
<class 'int'>

If you forget the comma at the end of a singleton tuple, you have not created a tuple: All you’ve done is put brackets around an expression, which means you are specifying its order of evaluation.

Tuple immutability

As mentioned, tuples are immutable, meaning that once you’ve created a tuple, you cannot change it. This is not so unusual: Strings, integers, and floats are also immutable. If you do need to change a tuple, then you must create a new tuple that embodies the changes. For example, here’s how you can chop off the first element of a tuple:

>>> lucky = (6, 7, 21, 77)
>>> lucky
(6, 7, 21, 77)
>>> lucky2 = lucky[1:]
>>> lucky2
(7, 21, 77)
>>> lucky
(6, 7, 21, 77)

On the plus side, immutability makes it impossible to accidentally modify a tuple, which helps prevent errors. On the minus side, making even the smallest change to a tuple requires copying essentially the whole thing, and this can waste valuable time and memory for large tuples. If you find yourself needing to make frequent modifications to a tuple, then you should be using a list instead.

Tuple functions

Table 7.1 lists the most commonly used tuple functions. Compared with strings and lists, tuples have a relatively small number of functions. Here are some examples of how they are used:

>>> pets = ('dog', 'cat', 'bird', 'dog')
>>> pets
('dog', 'cat', 'bird', 'dog')
>>> 'bird' in pets
True
>>> 'cow' in pets
False
>>> len(pets)
4
>>> pets.count('dog')
2
>>> pets.count('fish')
0
>>> pets.index('dog')
0
>>> pets.index('bird')
2
>>> pets.index('mouse')
Traceback (most recent call last):
  File "<pyshell#41>", line 1, in <module>
    pets.index('mouse')
ValueError: tuple.index(x): x not in list

Table 7.1. Tuple Functions

NAME

RETURN VALUE

x in tup

True if x is an element of tup, False otherwise

len(tup)

Number of elements in tup

tup.count(x)

Number of times element x occurs in tup

tup.index(x)

Index location of the first (leftmost) occurrence of x in tup; if x is not in tup, raises a ValueError exception

As with strings, you can use + and * to concatenate tuples:

>>> tup1 = (1, 2, 3)
>>> tup2 = (4, 5, 6)
>>> tup1 + tup2
(1, 2, 3, 4, 5, 6)
>>> tup1 * 2
(1, 2, 3, 1, 2, 3)

Lists

Lists are essentially the same as tuples but with one key difference: Lists are mutable. That is, you can add, remove, or modify elements to a list without making a copy. In practice, lists are used far more frequently than tuples (indeed, some Python programmers are only faintly aware that tuples exist!).

The elements of a list are separated by commas and enclosed in square brackets. As with strings and tuples, you can easily get the length of a list (using len), and concatenate lists (using + and *):

>>> numbers = [7, -7, 2, 3, 2]
>>> numbers
[7, -7, 2, 3, 2]
>>> len(numbers)
5
>>> numbers + numbers
[7, -7, 2, 3, 2, 7, -7, 2, 3, 2]
>>> numbers * 2
[7, -7, 2, 3, 2, 7, -7, 2, 3, 2]

And just as with strings and tuples, you can use indexing and slicing to access individual elements and sublists:

>>> lst = [3, (1,), 'dog', 'cat']
>>> lst[0]
3
>>> lst[1]
(1,)
>>> lst[2]
'dog'
>>> lst[1:3]
[(1,), 'dog']
>>> lst[2:]
['dog', 'cat']
>>> lst[-3:]
[(1,), 'dog', 'cat']
>>> lst[:-3]
[3]

Notice that lists can contain any kinds of values: numbers, strings, or even other sequences. The empty list is denoted by [], and a singleton list containing just one element x is written [x] (in contrast to tuples, no trailing comma is necessary for a singleton list).

Mutability

As mentioned earlier, mutability is the key feature that distinguishes lists from tuples. For example:

>>> pets = ['frog', 'dog', 'cow', 'hamster']
>>> pets
['frog', 'dog', 'cow', 'hamster']
>>> pets[2] = 'cat'
>>> pets
['frog', 'dog', 'cat', 'hamster']

As you can see, this sets the second element of the list pets to point to 'cat'. The string 'cow' gets replaced and is automatically deleted by Python.

Figure 7.1 shows a helpful diagrammatic representation of a list. Just as with variables, it is important to understand that the elements of a list only point to their values and do not actually contain them.

A Python list points to its values.

Figure 7.1. A Python list points to its values.

The fact that lists point to their values can be the source of some surprising behavior. Consider this nasty example:

>>> snake = [1, 2, 3]
>>> snake[1] = snake
>>> snake
[1, [...], 3]

Here, we’ve made an element of a list point to the list itself: We’ve created a self-referential data structure. The [...] in the printout indicates that Python recognizes the self-reference and does not stupidly print the list forever(!). Figure 7.2 shows diagrammatically what snake looks like.

A self-referential list. Note that the second element is not pointing to the first element of the list, but to the entire list itself.

Figure 7.2. A self-referential list. Note that the second element is not pointing to the first element of the list, but to the entire list itself.

List Functions

Lists come with many useful functions (Table 7.2). All of these functions, except for count (which just returns a number), modify the list you call them with. Thus, these are mutating functions, so it pays to change your lists carefully. It is distressingly easy, for instance, to accidentally delete a wrong element or insert a new value at the wrong place.

Table 7.2. List Functions

NAME

RETURN VALUE

s.append(x)

Appends x to the end of s

s.count(x)

Returns the number of times x appears in s

s.extend(lst)

Appends each item of lst to s

s.index(x)

Returns the index value of the leftmost occurrence of x

s.insert(i, x)

Inserts x before index location i (so that s[i] == x)

s.pop(i)

Removes and returns the item at index i in s

s.remove(x)

Removes the leftmost occurrence of x in s

s.reverse()

Reverses the order of the elements of s

s.sort()

Sorts the elements of s into increasing order

The append function is a useful way to add elements to a list. One common programming pattern is to create an empty list at the beginning of a function and then add values to it in the rest of the function. For example, here is a function that creates a string of messages based on a list of input numbers:

# numnote.py
def numnote(lst):
    msg = []
    for num in lst:
        if num < 0:
            s = str(num) + ' is negative'
        elif 0 <= num <= 9:
            s = str(num) + ' is a digit'
        msg.append(s)
    return msg

For example:

>>> numnote([1, 5, -6, 22])
['1 is a digit', '5 is a digit',
'-6 is negative']

To print the messages on their own individual lines, you could do this:

>>> for msg in numnote([1, 5, -6, 22]): print(msg)
1 is a digit
5 is a digit
-6 is negative

Or even this:

>>> print('
'.join(numnote([1, 5, -6, 22])))
1 is a digit
5 is a digit
-6 is negative

The extend function is similar to append, but it takes an entire sequence as input:

>>> lst = []
>>> lst.extend('cat')
>>> lst
['c', 'a', 't']
>>> lst.extend([1, 5, -3])
>>> lst
['c', 'a', 't', 1, 5, -3]

The pop function removes an element at a given index position and then returns it. For example:

>>> lst = ['a', 'b', 'c', 'd']
>>> lst.pop(2)
'c'
>>> lst
['a', 'b', 'd']
>>> lst.pop()
'd'
>>> lst
['a', 'b']

Notice that if you don’t give pop an index, it removes and returns the element at the end of the list.

The remove(x) function removes the first occurrence of x from a list. However, it does not return x:

>>> lst = ['a', 'b', 'c', 'a']
>>> lst.remove('a')
>>> lst
['b', 'c', 'a']

As the name suggests, reverse reverses the order of the elements of a list:

>>> lst = ['a', 'b', 'c', 'a']
>>> lst
['a', 'b', 'c', 'a']
>>> lst.reverse()
>>> lst
['a', 'c', 'b', 'a']

It’s important to realize that reverse does not make a copy of the list: It actually moves the list pointers, so we say the reversal is done in place.

Sorting Lists

Sorting data is one of the most common things that computers do. Sorted data is usually easier to work with than unsorted data, for both humans and computers. For instance, finding the smallest element of a sorted list requires no searching at all: It is simply the first element of the list. Humans often prefer to see data in sorted order—just imagine a phone book that was not printed alphabetically!

In Python, sorting is most easily done using the list sort() function, which is extremely efficient: In practice, it can be used to quickly sort lists with tens of thousands of elements. Like reverse(), sort() modifies the list in place:

>>> lst = [6, 0, 4, 3, 2, 6]
>>> lst
[6, 0, 4, 3, 2, 6]
>>> lst.sort()
>>> lst
[0, 2, 3, 4, 6, 6]

The sort function always sorts elements into ascending order, from smallest to largest. If you want the elements sorted in reverse order, from largest to smallest, the simple trick of calling reverse after sort works well:

>>> lst = ['up', 'down', 'cat', 'dog']
>>> lst
['up', 'down', 'cat', 'dog']
>>> lst.sort()
>>> lst
['cat', 'dog', 'down', 'up']
>>> lst.reverse()
>>> lst
['up', 'down', 'dog', 'cat']

Python also knows how to sort tuples and lists. For example:

>>> pts = [(1, 2), (1, -1), (3, 5), (2, 1)]
>>> pts
[(1, 2), (1, -1), (3, 5), (2, 1)]
>>> pts.sort()
>>> pts
[(1, -1), (1, 2), (2, 1), (3, 5)]

Tuples (and lists) are sorted by their first element, then by their second element, and so on.

List Comprehensions

Lists are so used so frequently that Python provides a special notation for creating them called list comprehensions. For example, here’s how to create a list of the squares of the numbers from 1 to 10 with a list comprehension:

>>> [n * n for n in range(1, 11)]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

The main advantage of this notation is that it is compact and readable. Compare this with equivalent code without a comprehension:

result = []
for n in range(1, 11):
    result.append(n * n)

Once you get the hang of them, list comprehensions are quick and easy to write, and you will find many uses for them.

More examples

Let’s see a few more examples of comprehensions. If you want to double each number on the list and 7, you can do this:

>>> [2 * n + 7for n in range(1, 11)]
[9, 11, 13, 15, 17, 19, 21, 23, 25, 27]

Or if you want the first ten cubes:

>>> [n ** 3 for n in range(1, 11)]
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]

You can also use strings in comprehensions. For example:

>>> [c for c in 'pizza']
['p', 'i', 'z', 'z', 'a']
>>> [c.upper() for c in 'pizza']
['P', 'I', 'Z', 'Z', 'A']

A common application of comprehensions is to modify an existing list in some way. For instance:

>>> names = ['al', 'mei', 'jo', 'del']
>>> names
['al', 'mei', 'jo', 'del']
>>> cap_names = [n.capitalize() for n in names]
>>> cap_names
['Al', 'Mei', 'Jo', 'Del']
>>> names
['al', 'mei', 'jo', 'del']

Filtered comprehensions

List comprehensions can be written with tests, which allow you to filter out elements from other lists. For example, the following comprehension returns a list containing just the positive elements of nums:

>>> nums = [-1, 0, 6, -4, -2, 3]
>>> result = [n for n in nums if n > 0]
>>> result
[6, 3]

Here’s equivalent code without a comprehension:

result = []
nums = [-1, 0, 6, -4, -2, 3]
for n in nums:
    if n > 0:
       result.append(n)

Again, we see that list comprehensions are more compact and readable than a regular loop.

Here’s a comprehension that removes all the vowels from a word written inside a function:

# eatvowels.py
def eat_vowels(s):
    """ Removes the vowels from s.
    """

    return ''.join([c for c in s
                  if c.lower() not in 'aeiou'])

It works like this:

>>> eat_vowels('Apple Sauce')
'ppl Sc'

The body of eat_vowels looks rather cryptic at first, and the trick to understanding it is to read it a piece at a time. First, look at the comprehension:

[c for c in s if c.lower() not in
'aeiou']

This is a filtered comprehension that scans through the characters of s one at a time. It converts each character to lowercase and then checks to see if it is a vowel. If it is a vowel, it is skipped and not added to the resulting list; otherwise, it is added.

The result of this comprehension is a list of strings, so we use join to concatenate all the strings into a single string that is then immediately returned.

Dictionaries

A Python dictionary is an extremely efficient data structure for storing pairs of values in the form key:value. For example:

>>> color = {'red' : 1, 'blue' : 2, 'green' : 3}
>>> color
{'blue': 2, 'green': 3, 'red': 1}

The dictionary color has three members. One of them is 'blue':2, where 'blue' is the key and 2 is its associated value.

You access values in a dictionary by their keys:

>>> color['green']
3
>>> color['red']
1

Accessing dictionary values by their keys is extremely efficient, even if the dictionary has many thousands of pairs.

Like lists, dictionaries are mutable: You can add or remove key:value pairs. For example:

>>> color = {'red' : 1, 'blue' : 2, 'green' : 3}
>>> color
{'blue': 2, 'green': 3, 'red': 1}
>>> color['red'] = 0
>>> color
{'blue': 2, 'green': 3, 'red': 0}

Key restrictions

Dictionary keys have a couple of restrictions you need to be aware of. First, keys are unique within the dictionary: You can’t have two key:value pairs in the same dictionary with the same key. For example:

>>> color = {'red' : 1, 'blue' : 2, 'green' : 3,
             'red' : 4}
>>> color
{'blue': 2, 'green': 3, 'red': 4}

Even though we’ve written the key 'red' twice, Python only stores the second pair, 'red':4. There’s simply no way to have duplicate keys: Dictionary keys must always be unique.

The second restriction on keys is that they must be immutable. So, for example, a dictionary key cannot be a list or a dictionary. The reason for this requirement is that the location in a dictionary where a key:value pair is stored depends intimately on the key. If the key changes even slightly, the location of the key:value pair in the dictionary can also change. If that happens, then pairs in the dictionary can become lost and inaccessible.

Neither of these restrictions holds for values: Values can be mutable and can appear as many times as you like within the same dictionary.

Dictionary functions

Table 7.3 lists the functions that come with all dictionaries.

Table 7.3. Dictionary Functions

NAME

RETURN VALUE

d.items()

Returns a view of the (key, value) pairs in d

d.keys()

Returns a view of the keys of d

d.values()

Returns a view of the values in d

d.get(key)

Returns the value associated with key

d.pop(key)

Removes key and returns its corresponding value

d.popitem()

Returns some (key, value) pair from d

d.clear()

Removes all items from d

d.copy()

A copy of d

d.fromkeys(s, t)

Creates a new dictionary with keys taken from s and values taken from t

d.setdefault(key, v)

If key is in d, returns its value; if key is not in d, returns v and adds (key, v) to d

d.update(e)

Adds the (key, value) pairs in e to d; e may be another dictionary or a sequence of pairs

As we’ve seen, the standard way to retrieve a value from a dictionary is to use square-bracket notation: d[key] returns the value associated with key. Calling d.get(key) will do the same thing. If you call either function when key is not in d, you’ll get a KeyError.

If you are not sure whether a key is in a dictionary ahead of time, you can check by calling key in d. This expression returns True if key is in the d, and False otherwise. It is an extremely efficient check (especially as compared with using in with sequences!), so go ahead and do it when necessary.

You can also retrieve dictionary values using the pop(key) and popitem() functions. The difference between pop(key) and get(key) is that pop(key) returns the value associated with key and also removes its pair from the dictionary (get only returns the value). The popitem() function returns and removes some (key, value) pair from the dictionary.

You don’t know ahead of time which pair will be popped, so it’s useful only when you don’t care about the order in which you access the dictionary elements.

The items(), keys(), and values() functions all return a special object known as a view. A view is linked to the original dictionary, so that if the dictionary changes, so does the view. For example:

>>> color
{'blue': 2, 'orange': 4, 'green': 3, 'red': 0}
>>> k = color.keys()
>>> for i in k: print(i)
blue
orange
green
red
>>> color.pop('red')
0
>>> color
{'blue': 2, 'orange': 4, 'green': 3}
>>> for i in k: print(i)
blue
orange
green

Sets

In Python, sets are collections of 0 or more items with no duplicates. They are similar to a dictionary that only has keys and no associated values. Sets are a good way to remove duplicates from a data collection, plus they efficiently mimic finite mathematical sets, including basic set operations such as unions and intersections.

Sets come in two categories: mutable sets and immutable frozensets. You can add and remove elements from a regular set, while a frozenset can never change once it is created.

Perhaps the most common use of sets is to remove duplicates from a sequence. For example:

>>> lst = [1, 1, 6, 8, 1, 5, 5, 6, 8, 1, 5]
>>> s = set(lst)
>>> s
{8, 1, 5, 6}

Just as with dictionaries, the order of the elements in the set cannot be guaranteed.

Calling dir(set()) in the interactive command line will list the functions that all sets come with—there are quite a few! Since sets are not as frequently used as lists and dictionaries, we won’t list them all here. But keep sets in mind, and when you need them, refer to their online documentation at http://docs.python.org/dev/3.0/library/stdtypes.html#set.

 

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.12.232