4.3 Storing Data for Processing

Whenever we work with large amounts of data, we need some means of organizing the data’s storage so that we can process the data in an orderly and efficient manner. Computer science provides us with a number of alternatives for accomplishing this task. Python’s built-in collections provide a means for organizing and storing data values. We have already introduced one type of collection: strings. In this chapter, we will further explore strings and consider three more Python collections: lists, dictionaries, and tuples.

4.3.1 Strings Revisited

In the Codes and Other Secrets chapter, we introduced the idea of a string as a sequential collection of characters. Each string is considered to be ordered from left to right, and each individual character can be accessed using the indexing operation. Provided that our data items can be thought of as characters, a string might make the perfect collection mechanism.

As an example, assume that we have just taken a multiple-choice exam where each question has five possible answers labeled A, B, C, D, and E. Our answer sheet might look like this:

  1. A

  2. B

  3. E

  4. A

  5. D

  6. B

  7. B

  8. A

  9. C

  10. E

To store our answers for later processing, we could use a simple string of 10 characters. The string will contain the 10 data items gathered from the student:

  "ABEADBBACE"

This technique gives us easy access to each individual answer. Furthermore, by iterating over the string, we can process the entire exam. However, this approach has some potential drawbacks. What if the exam were for a math class, where we solve numeric problems and write down a final answer for each? Now the answers are numbers instead of single characters. Can a string still be used?
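The string-based storage of the multiple-choice answers can be sketched as follows. This is a minimal illustration, assuming the ten answers from the sample sheet above; the names answers, third_answer, and count_a are hypothetical and chosen only for this example.

```python
# The ten answers from the sample sheet, one character per question.
answers = "ABEADBBACE"

# Indexing starts at 0, so the answer to question n is stored at index n - 1.
third_answer = answers[2]

# Iterating over the string visits every answer in order.
# Here we tally how many times the student chose "A".
count_a = 0
for answer in answers:
    if answer == "A":
        count_a += 1

print(third_answer)  # E
print(count_a)       # 3
```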

For example, suppose this is our answer sheet for a math exam:

  1. 34

  2. 56

  3. 2

  4. 652

  5. 26

  6. 1

  7. 99

  8. 865

  9. 22

  10. 16

If we attempt to use the same string-based storage technique, we might get something like this:

  "34562652261998652216"

It is easy to see that we have a real problem: the digits run together, so how can we tell where one answer ends and the next begins?

In this case, we need a way to store collections of integers instead of being restricted to just using characters. As with the string, where there is one character per position, we would like an organization technique that can provide one integer per position. Fortunately, there is such a collection—the list.
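The difference between the two storage techniques can be sketched as follows, assuming the ten math answers from the sample sheet. The names math_answers and as_string are hypothetical and used only for illustration.

```python
# The ten math answers, stored one integer per position in a list.
math_answers = [34, 56, 2, 652, 26, 1, 99, 865, 22, 16]

# Packing the same answers into a single string loses the boundaries
# between them: each position now holds one digit, not one answer.
as_string = "".join(str(value) for value in math_answers)

print(as_string)          # 34562652261998652216
print(as_string[3])       # 6    (a single digit, not the fourth answer)
print(math_answers[3])    # 652  (the fourth answer)
print(len(math_answers))  # 10
```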

4.3.2 Lists

A list is a collection that is very similar to a string in general structure, but has some specific differences that must be understood for the list to be used properly. FIGURE 4.1 shows that strings and lists are examples of sequential collections.

A block diagram shows that Python collections include sequential collections, which in turn include strings and lists.

FIGURE 4.1 Lists and strings as sequential collections.

A list is an ordered, sequential collection of zero or more Python data objects. Lists are written as comma-delimited values enclosed in square brackets. We call a list with zero data objects the empty list, which is represented simply by []. Recall that strings are homogeneous collections because each item in the collection is the same type of object—a character. Lists, by contrast, are heterogeneous collections because lists can be composed of a mixture of any kind of object. In SESSION 4.1, the list myList consists of two integers, a floating-point value, and a string.

SESSION 4.1 A Python list

As with other values in Python, asking the interpreter to evaluate a list will simply return the list itself. To remember the list for later processing, we need to assign the list to a variable. Evaluating the variable then returns the list. FIGURE 4.2 shows the sequential organization of the items in the example list in Session 4.1.

A list of four elements (3, 'cat', 6.5, and 2) is arranged sequentially, with indexes 0 through 3.

FIGURE 4.2 Sequential storage of the elements in a list with indexes.
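A minimal sketch of creating and inspecting such a list is shown below. It assumes the four values shown in Figure 4.2; the names emptyList and item_types are hypothetical and introduced only for this example.

```python
# A heterogeneous list: two integers, a string, and a floating-point value.
myList = [3, "cat", 6.5, 2]

# The empty list contains zero data objects.
emptyList = []

print(myList)          # [3, 'cat', 6.5, 2]
print(myList[0])       # 3
print(myList[3])       # 2
print(len(emptyList))  # 0

# Collect the type name of each item to show the mixture of types.
item_types = []
for item in myList:
    item_types.append(type(item).__name__)

print(item_types)      # ['int', 'str', 'float', 'int']
```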

Since lists are sequential, many operations that can be applied to any Python sequence, such as strings, can be applied to lists as well. TABLE 4.1 reviews these operations, and SESSION 4.2 gives examples of their use. Note that the last example in Session 4.2 introduces the del statement (see TABLE 4.2), which allows you to delete an item from a list. This operation is not permitted on strings. Strings are immutable collections of data in which individual items within the string cannot be changed. That is not true for lists: We can change an individual member of a list by using the assignment statement and placing the indexed location on the left-hand side. Thus, lists are mutable collections of data; that is, lists can be modified.

TABLE 4.1 Operations on Any Python Sequence

Operation Name    Operator/Function    Explanation
Indexing          [ ]                  Access an element of a sequence
Concatenation     +                    Combine sequences of the same type
Repetition        *                    Concatenate a repeated number of times
Membership        in                   Ask whether an item is in a sequence
Membership        not in               Ask whether an item is not in a sequence
Length            len                  Ask the number of items in the sequence
Slicing           [ : ]                Extract a part of a sequence

SESSION 4.2 Using sequence operators with lists

TABLE 4.2 An Additional Operation on Lists

Operation Name    Statement    Explanation
Delete            del          Delete an item

Note that the indexes for lists, as with strings, start with 0. The slice operation myList[1:3] returns a list of items starting with the item indexed by 1 up to, but not including, the item indexed by 3.
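The sequence operations from Table 4.1 and the del statement from Table 4.2 can be sketched on a list as follows. This is an illustrative example, assuming the list values from Figure 4.2 rather than the exact contents of Session 4.2.

```python
myList = [3, "cat", 6.5, 2]

print(myList[2])            # 6.5   (indexing)
print(myList + ["dog", 7])  # [3, 'cat', 6.5, 2, 'dog', 7]   (concatenation)
print(myList * 2)           # [3, 'cat', 6.5, 2, 3, 'cat', 6.5, 2]   (repetition)
print("cat" in myList)      # True  (membership)
print(5 not in myList)      # True  (membership)
print(len(myList))          # 4     (length)
print(myList[1:3])          # ['cat', 6.5]   (slice: index 1 up to, not including, 3)

# del removes the item at the given index and changes the list in place.
del myList[2]
print(myList)               # [3, 'cat', 2]
```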

SESSION 4.3 shows an assignment statement modifying the item at index 2 in the list changeList. The reference diagram in FIGURE 4.3 shows that a list is a collection of references to Python objects. Changing an item in the list simply changes the reference stored at that position. Note that after the item at index 2 has been changed, the list no longer holds the previous reference, as indicated by a dotted line in the figure. As shown in Session 4.3, attempting to change an item in a string using the same operation will not work; the Python interpreter reports an error because strings do not support the ability to change a single character.

SESSION 4.3 Mutating a list

A figure depicts a list as a collection of references to Python objects.

FIGURE 4.3 The collection of references forming a list.
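The contrast between mutable lists and immutable strings can be sketched as follows. The list and string values here are assumptions chosen for illustration, not the contents of Session 4.3; string_changed is a hypothetical flag used only to record the outcome.

```python
# Hypothetical starting values for the list.
changeList = [1, 2, "buckle my shoe", 3, 4, "shut the door"]

# Assigning to an indexed location replaces the reference stored at index 2.
changeList[2] = "new value"
print(changeList)  # [1, 2, 'new value', 3, 4, 'shut the door']

# The same operation on a string raises a TypeError,
# because strings do not support item assignment.
name = "Monte"
try:
    name[2] = "x"
    string_changed = True
except TypeError as error:
    string_changed = False
    print("TypeError:", error)

print(name)  # Monte
```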

The use of the repetition operator (*) is also affected by this idea of a collection of references. The result of using the * operator is a repetition of references to the data objects in the sequence. This can best be seen by considering the statements in SESSION 4.4.

SESSION 4.4 Mutating a list created with repetition

The notation [myList] means we have a list containing one item, whose value is a reference to the list myList. Hence, it is a list containing a list. The variable listOfMyList holds a collection of three references to the original list, myList. Note that a change to one element of myList shows up in all three occurrences in listOfMyList. This outcome occurs because the repetition operation is implemented as a list of three references to the same list, as shown in FIGURE 4.4.

A figure shows the repetition operator copying references.

FIGURE 4.4 The repetition operator copies references.
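The effect of repeating references can be sketched as follows. The starting values of myList are an assumption made for illustration; the names myList and listOfMyList follow the text.

```python
myList = [1, 2, 3, 4]

# [myList] is a list with one item: a reference to myList.
# Repeating it three times copies that reference, not the list itself.
listOfMyList = [myList] * 3
print(listOfMyList)  # [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]

# Changing one element of myList shows up in all three occurrences.
myList[2] = 45
print(listOfMyList)  # [[1, 2, 45, 4], [1, 2, 45, 4], [1, 2, 45, 4]]

# Every position refers to the very same list object.
print(listOfMyList[0] is myList)  # True
print(listOfMyList[2] is myList)  # True
```

If three independent inner lists are wanted instead, each one has to be created separately, for example by calling list(myList) once per position.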

A useful function for creating lists is the list built-in function, which converts other sequences to lists. We have already seen two such sequences: strings and ranges. Recall that the range function returns an object representing a sequence of integers. SESSION 4.5 demonstrates the use of list to create some simple lists from strings and ranges.

SESSION 4.5 Using the list function
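A minimal sketch of the list function applied to a string and to ranges is shown below. The particular string and range arguments are assumptions chosen for illustration, and the variable names are hypothetical.

```python
# Converting a string produces a list of its individual characters.
letters = list("the cat")
print(letters)      # ['t', 'h', 'e', ' ', 'c', 'a', 't']

# Converting a range produces a list of the integers it represents.
first_five = list(range(5))
print(first_five)   # [0, 1, 2, 3, 4]

# A range with a start, stop, and step works the same way.
stepped = list(range(2, 10, 3))
print(stepped)      # [2, 5, 8]
```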

Lists support a number of other useful methods, as summarized in TABLE 4.3. Examples of their use can be seen in SESSION 4.6.

TABLE 4.3 Methods Provided by Lists in Python

Method Name    Use                      Explanation
list           list(sequence)           Creates a list from the elements in sequence.
append         aList.append(item)       Adds a new item to the end of a list.
insert         aList.insert(i, item)    Inserts an item at the ith position in a list.
pop            aList.pop()              Removes and returns the last item in a list. Raises an IndexError if the list is empty.
pop            aList.pop(i)             Removes and returns the ith item in a list. Raises an IndexError if the list is empty or there is no ith item in the list.
sort           aList.sort()             Modifies a list to be sorted.
reverse        aList.reverse()          Modifies a list to be in reverse order.
index          aList.index(item)        Returns the index of the first occurrence of item. Raises a ValueError if item is not found.
count          aList.count(item)        Returns the number of occurrences of item.
remove         aList.remove(item)       Removes the first occurrence of item. Raises a ValueError if item is not in the list.
clear          aList.clear()            Removes all items in the list.

SESSION 4.6 Examples of list methods

You can see that some of the methods, such as pop, return a value and also modify the list. Others, such as reverse and sort, simply modify the list with no return value. Although pop will default to the end of the list, it can also remove and return an item at a specific index location. Also notice the familiar “dot” notation, which is used when asking an object to invoke a method. You can read myList.append(False) as “ask the object myList to perform its append method using the value False as a parameter.” All data objects invoke methods in this way.
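The methods in Table 4.3 can be sketched as follows. The starting values of myList are an assumption made for illustration, and the names last, second, and sort_result are hypothetical.

```python
myList = [1024, 3, 6.5, 2]

myList.append(False)      # add to the end
print(myList)             # [1024, 3, 6.5, 2, False]

myList.insert(2, 4.5)     # insert 4.5 at index 2
print(myList)             # [1024, 3, 4.5, 6.5, 2, False]

last = myList.pop()       # remove and return the last item
print(last)               # False
second = myList.pop(1)    # remove and return the item at index 1
print(second)             # 3
print(myList)             # [1024, 4.5, 6.5, 2]

sort_result = myList.sort()  # sorts in place; there is no return value
print(sort_result)        # None
print(myList)             # [2, 4.5, 6.5, 1024]

myList.reverse()          # reverses in place
print(myList)             # [1024, 6.5, 4.5, 2]

print(myList.index(4.5))  # 2
print(myList.count(6.5))  # 1

myList.remove(6.5)        # remove the first occurrence of 6.5
print(myList)             # [1024, 4.5, 2]

myList.clear()            # remove every item
print(myList)             # []
```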

Before leaving this section, we will describe one additional string method, split, shown in TABLE 4.4. This method takes a string parameter that indicates where to break the original string into substrings; the substrings are returned in a list. By default, if no parameter is passed to split, it breaks the string wherever it finds a run of one or more whitespace characters. In Python, spaces, tabs, and newlines are all considered to be whitespace. The split method discards the delimiter. SESSION 4.7 demonstrates how the split method works.

TABLE 4.4 The String split Method

Method          Explanation
split()         Return a list of substrings created by splitting the string using whitespace as the delimiter. The whitespace is not part of the list.
split(delim)    Return a list of substrings created by splitting the string using delim as the delimiter, and omitting occurrences of delim from the substrings.

SESSION 4.7 Using the string split method
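Both forms of split can be sketched as follows. The input strings are assumptions chosen for illustration, and the variable names are hypothetical.

```python
# With no parameter, runs of spaces and tabs are treated as one delimiter.
words = "the  cat\tsat".split()
print(words)    # ['the', 'cat', 'sat']

# With a delimiter, the string is broken at each occurrence of it.
# Note that the resulting items are strings, not integers.
fields = "34,56,2,652".split(",")
print(fields)   # ['34', '56', '2', '652']

# The delimiter may be longer than one character and is always discarded.
# A delimiter at the very end leaves an empty string as the last item.
pieces = "the cat sat".split("at")
print(pieces)   # ['the c', ' s', '']
```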
