Lists

Lists are the least object-oriented of Python's data structures. While lists are, themselves, objects, there is a lot of syntax in Python to make using them as painless as possible. Unlike in many other object-oriented languages, lists in Python are simply available. We don't need to import them and rarely need to call methods on them. We can loop over a list without explicitly requesting an iterator object, and we can construct a list (like a dictionary) with custom syntax. Further, list comprehensions and generator expressions turn them into a veritable Swiss-army knife of functionality.

We won't go into too much detail of the syntax; you've seen it in introductory tutorials across the web and previous examples in this book. You can't code Python very long without learning how to use lists! Instead, we'll be covering when lists should be used, and their nature as objects. If you don't know how to create or append to a list, how to retrieve items from a list, or what "slice notation" is, I direct you to the official Python tutorial, post-haste. It can be found online at:

http://docs.python.org/py3k/tutorial/

Lists, in Python, should normally be used when we want to store several instances of the "same" type of object: lists of strings or lists of numbers or, most often, lists of objects we've defined ourselves. Lists should always be used when we want to store items in some kind of order. Often, this is the order in which they were inserted, but they can also be sorted by some other criterion.

As we saw in the case study from the previous chapter, lists are also very useful when we need to modify the contents: insert into or delete from an arbitrary location in the list, or update a value within the list.

Like dictionaries, Python lists use an extremely efficient and well-tuned internal data structure so we can worry about what we're storing, rather than how we're storing it. Many object-oriented languages provide different data structures for queues, stacks, linked lists, and array-based lists. Python does provide special instances of some of these classes, if optimizing access to huge sets of data is required. Normally, however, the list data structure can serve all these purposes at once, and the coder has complete control over how they access it.
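As a rough illustration of one list serving several of these roles, the following sketch uses a list as a stack and as a queue, and then shows collections.deque as one example of the specialized structures the standard library offers. The variable names are made up for this example.

```python
from collections import deque

# A list used as a stack: append() pushes, pop() removes the newest item.
stack = []
stack.append("first")
stack.append("second")
top = stack.pop()

# A list used as a queue also works, but pop(0) must shift every
# remaining item one position to the left.
queue = ["first", "second", "third"]
front = queue.pop(0)

# collections.deque supports efficient appends and pops at both ends,
# which suits large queues better than a plain list.
fast_queue = deque(["first", "second", "third"])
fast_front = fast_queue.popleft()
```

For small collections the plain list is usually good enough; the deque is only worth reaching for when the queue grows large.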

Don't use lists for collecting different attributes of individual items. We do not want, for example, a list of the properties a particular shape has. Tuples, named tuples, dictionaries, and objects would all be more suitable for this purpose. In some languages, programmers might create a list in which alternating items have different types; for example, they might write ['a', 1, 'b', 3] for our letter frequency list. They'd have to use a strange loop that accesses two elements in the list at once, or a modulus operator to determine which position was being accessed.
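To make the awkwardness concrete, here is a sketch of what reading such an alternating list might look like. The names flat and pairs_from_flat are invented for this illustration.

```python
# Hypothetical flat layout: letters and counts alternate in a single list.
flat = ['a', 1, 'b', 3]

# Reading it back means stepping through the list two positions at a time
# and remembering that even positions hold letters and odd ones hold counts.
pairs_from_flat = []
for position in range(0, len(flat), 2):
    letter = flat[position]
    count = flat[position + 1]
    pairs_from_flat.append((letter, count))
```

Nothing in the list itself records which positions belong together; that knowledge lives only in the loop.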

Don't do this in Python. We can group related items together using a dictionary, as we did in the previous section (if sort order doesn't matter), or using a list of tuples. Here's a rather convoluted example that demonstrates how we could do the frequency example using a list. It is much more complicated than the dictionary examples, and illustrates how much of an effect choosing the right (or wrong) data structure can have on the readability of our code.

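	import string

	# Every character we are willing to count: a-z, A-Z, then the space.
	CHARACTERS = list(string.ascii_letters) + [" "]

	def letter_frequency(sentence):
		frequencies = [(c, 0) for c in CHARACTERS]
		for letter in sentence:
			# index() raises ValueError if letter is not in CHARACTERS.
			index = CHARACTERS.index(letter)
			# Tuples are immutable, so we replace the whole tuple.
			frequencies[index] = (letter, frequencies[index][1] + 1)
		return frequencies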

This code starts with a list of possible characters. The string.ascii_letters attribute provides a string of all the letters, lowercase then uppercase, in order. We convert this to a list, and then use list concatenation (the plus operator causes two lists to be merged into one) to add one more character, the space. These are the available characters in our frequency list. The code would break if the sentence contained a character that wasn't in the list, because the index method raises a ValueError for a missing item, but an exception handler could solve this.

The first line inside the function uses a list comprehension to turn the CHARACTERS list into a list of tuples. List comprehensions are an important, non-object-oriented tool in Python; we'll be covering them in detail in the next chapter.

Then we loop over each of the characters in the sentence. We first look up the index of the character in the CHARACTERS list, which we know has the same index in our frequencies list, since we just created the second list from the first. We then update that index in the frequencies list by creating a new tuple, discarding the original one. Aside from the garbage collection and memory waste concerns, this is rather difficult to read!
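Returning to the caveat about characters that are not in CHARACTERS: one possible way to handle them is to catch the ValueError and skip the character. This is only a sketch; the function name letter_frequency_safe is hypothetical, and silently ignoring unknown characters is an assumption about the desired behavior.

```python
import string

CHARACTERS = list(string.ascii_letters) + [" "]

def letter_frequency_safe(sentence):
    frequencies = [(c, 0) for c in CHARACTERS]
    for letter in sentence:
        try:
            index = CHARACTERS.index(letter)
        except ValueError:
            # The character is not one we track (punctuation, digits, ...),
            # so skip it rather than letting the exception propagate.
            continue
        frequencies[index] = (letter, frequencies[index][1] + 1)
    return frequencies

result = letter_frequency_safe("a!b a")
```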

The resulting code works, but is not nearly so elegant as the dictionary. The code has two advantages over the earlier dictionary example, however. The list stores zero frequencies for characters not in the sentence, and when we receive the list, it comes in a predictable order: that of the CHARACTERS list. The output shows the difference:


>>> letter_frequency("the quick brown fox jumps over the lazy dog")
[('a', 1), ('b', 1), ('c', 1), ('d', 1), ('e', 3), ('f', 1), ('g', 1), 
('h', 2), ('i', 1), ('j', 1), ('k', 1), ('l', 1), ('m', 1), ('n', 1), 
('o', 4), ('p', 1), ('q', 1), ('r', 2), ('s', 1), ('t', 2), ('u', 2), 
('v', 1), ('w', 1), ('x', 1), ('y', 1), ('z', 1), ('A', 0), ('B', 0), 
('C', 0), ('D', 0), ('E', 0), ('F', 0), ('G', 0), ('H', 0), ('I', 0), 
('J', 0), ('K', 0), ('L', 0), ('M', 0), ('N', 0), ('O', 0), ('P', 0),
('Q', 0), ('R', 0), ('S', 0), ('T', 0), ('U', 0), ('V', 0), ('W', 0), 
('X', 0), ('Y', 0), ('Z', 0), (' ', 8)]

The dictionary version could be adapted to provide these advantages by pre-populating the dictionary with zero values for all available characters, and by sorting the keys on the returned dictionary whenever we need them in order.
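A minimal sketch of that adaptation follows. It does not reproduce the dictionary code from the previous section; the function name letter_frequency_dict is made up here, and dict.fromkeys is just one way to do the pre-population.

```python
import string

CHARACTERS = list(string.ascii_letters) + [" "]

def letter_frequency_dict(sentence):
    # Pre-populate so characters absent from the sentence still report 0.
    frequencies = dict.fromkeys(CHARACTERS, 0)
    for letter in sentence:
        # Raises KeyError for a character outside CHARACTERS, much as the
        # list version raises ValueError.
        frequencies[letter] += 1
    return frequencies

counts = letter_frequency_dict("hello world")

# Sort the keys only when we actually need ordered output.
ordered = [(letter, counts[letter]) for letter in sorted(counts)]
```

Note that sorted() orders the keys by character code, so the space comes first, followed by the capital letters and then the lowercase ones.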

Like dictionaries, lists are objects too, and they have several methods that can be invoked upon them. The most common is append(element), which adds an element to the end of the list. Similarly, insert(index, element) inserts an item at a specific position. The count(element) method tells us how many times an element appears in the list, and index(), as we saw in the previous example, tells us the index of an item in the list. The reverse() method does exactly what it says: it turns the list around. The sort() method is also obvious, but it has some fairly complicated object-oriented behaviors, which we'll cover now.
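The methods just mentioned can be tried out in a few lines. The list contents below are arbitrary example data.

```python
letters = ["b", "c"]
letters.append("d")      # add to the end
letters.insert(0, "a")   # insert at position 0
letters.append("a")      # a second "a", so count() has something to find

how_many = letters.count("a")   # occurrences of "a"
where = letters.index("c")      # position of the first "c"

letters.reverse()               # reverses the list in place
```

Like sort(), reverse() changes the list itself and returns None rather than a new list.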

Sorting lists

Without any parameters, sort will generally do the expected thing. If it's a list of strings, it will place them in alphabetical order. This operation is case sensitive, so all capital letters will be sorted before lowercase letters; that is, Z comes before a. If it is a list of numbers, they will be sorted in numerical order. If a list of tuples is provided, the list is sorted by the first element in each tuple, with ties broken by the following elements. If a mixture of unsortable items is supplied, the sort will raise a TypeError exception.
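The default behaviors described above can be checked with a short sketch; the sample data is invented for the purpose.

```python
words = ["banana", "Zebra", "apple"]
words.sort()      # capitals sort before lowercase letters

numbers = [3, 1, 2]
numbers.sort()    # numerical order

pairs = [(2, 'a'), (1, 'z'), (1, 'b')]
pairs.sort()      # first element, then second element to break ties

# Integers and strings cannot be compared with <, so sorting fails.
mixed = [1, "two", 3]
try:
    mixed.sort()
    error_raised = False
except TypeError:
    error_raised = True
```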

If we want to place objects we define ourselves into a list and make those objects sortable, we have to do a bit more work. The special method __lt__, which stands for "less than", should be defined on the class to make instances of that class comparable. The sort method on list will access this method on each object to determine where it goes in the list. This method should return True if our class is somehow less than the passed parameter, and False otherwise. Here's a rather silly class that can be sorted based on either a string or a number:

	class WeirdSortee:
		def __init__(self, string, number, sort_num):
			self.string = string
			self.number = number
			self.sort_num = sort_num

		def __lt__(self, other):
			if self.sort_num:
				return self.number < other.number
			return self.string < other.string

		def __repr__(self):
			return "{}:{}".format(self.string, self.number)

The __repr__ method makes it easy to see the two values when we print a list. This __lt__ implementation compares the object to another instance of the same class (or any duck-typed object that has string, number, and sort_num attributes; it will fail with an AttributeError if those attributes are missing). The following output illustrates this class in action, when it comes to sorting:


>>> a = WeirdSortee('a', 4, True)
>>> b = WeirdSortee('b', 3, True)
>>> c = WeirdSortee('c', 2, True)
>>> d = WeirdSortee('d', 1, True)
>>> l = [a,b,c,d]
>>> l
[a:4, b:3, c:2, d:1]
>>> l.sort()
>>> l
[d:1, c:2, b:3, a:4]
>>> for i in l:
... 	i.sort_num = False
...
>>> l.sort()
>>> l
[a:4, b:3, c:2, d:1]

The first time we call sort, it sorts by numbers, because sort_num is True on all the objects being compared. The second time, it sorts by letters. The __lt__ method is the only one we need to implement to enable sorting. Technically, however, if it is implemented, the class should normally also implement the similar __gt__, __eq__, __ne__, __ge__, and __le__ methods, so that all of the <, >, ==, !=, >=, and <= operators also work properly.
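Writing all six methods by hand is tedious. One option, not covered in the text above, is the functools.total_ordering class decorator from the standard library, which fills in the missing comparison methods when a class defines __eq__ and one of the ordering methods. The class NumberedItem below is a hypothetical, simplified stand-in for WeirdSortee that always compares by number.

```python
from functools import total_ordering

@total_ordering
class NumberedItem:
    def __init__(self, string, number):
        self.string = string
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __lt__(self, other):
        return self.number < other.number

    def __repr__(self):
        return "{}:{}".format(self.string, self.number)

a = NumberedItem('a', 4)
d = NumberedItem('d', 1)

# __gt__, __ge__, and __le__ are supplied by the decorator; != falls out
# of __eq__ automatically in Python 3.
comparisons = (d < a, a > d, a >= d, d <= a, a != d)
items = [a, d]
items.sort()
```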

The sort method can also take an optional key argument. This argument is a function that can transform each object in a list into an object that can be somehow compared. This is useful if we have a tuple of values and want to sort on the second item in the tuple rather than the first (which is the default for sorting tuples):


>>> x = [(1,'c'), (2,'a'), (3, 'b')]
>>> x.sort()
>>> x
[(1, 'c'), (2, 'a'), (3, 'b')]
>>> x.sort(key=lambda i: i[1])
>>> x
[(2, 'a'), (3, 'b'), (1, 'c')]

The lambda keyword in the sort call creates a function that takes a tuple as input and uses a sequence lookup to return the item at index 1 (that is, the second item in the tuple).
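If the lambda syntax is unfamiliar, the same key can be written as an ordinary named function. The name second_item is made up for this sketch.

```python
def second_item(pair):
    # Return the item at index 1, just as the lambda does.
    return pair[1]

x = [(1, 'c'), (2, 'a'), (3, 'b')]
x.sort(key=second_item)
```

Either way, sort calls the key function once per item and compares the returned values instead of the items themselves.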

As another example, we can also use the key parameter to make a sort case insensitive. To do this, we simply need to compare the all-lowercase versions of the strings, so we can pass the built-in str.lower function as the key function:


>>> l = ["hello", "HELP", "Helo"]
>>> l.sort()
>>> l
['HELP', 'Helo', 'hello']
>>> l.sort(key=str.lower)
>>> l
['hello', 'Helo', 'HELP']

Remember, even though lower is a method on string objects, it is also a function that can accept a single argument, self. In other words, str.lower(item) is equivalent to item.lower(). When we pass this function as a key, it performs the comparison on lowercase values instead of doing the default case-sensitive comparison.
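The equivalence is easy to confirm, and the same idea extends to any function that accepts a single argument. In the sketch below, using len as a key is an extra illustration that does not come from the examples above.

```python
words = ["hello", "HELP", "Helo"]

# Calling the method through the class is the same as calling it
# on the instance.
same_result = str.lower("HELP") == "HELP".lower()

by_lower = list(words)
by_lower.sort(key=str.lower)   # compare lowercase versions

by_length = list(words)
by_length.sort(key=len)        # compare lengths; ties keep their original order
```

Because the sort is stable, "HELP" stays ahead of "Helo" in by_length: both have four letters, and "HELP" came first in the original list.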
