Lists and deques

Python lists are ordered collections of elements and, in Python, are implemented as resizable arrays. An array is a basic data structure that consists of a series of contiguous memory locations, and each location contains a reference to a Python object.

Lists shine in accessing, modifying, and appending elements. Accessing or modifying an element involves fetching the object reference from the appropriate position of the underlying array and has complexity O(1). Appending an element is also very fast. When an empty list is created, an array of fixed size is allocated and, as we insert elements, the slots in the array are gradually filled up. Once all the slots are occupied, the list needs to increase the size of its underlying array, thus triggering a memory reallocation that can take O(N) time. Nevertheless, those memory allocations are infrequent, and the time complexity for the append operation is referred to as amortized O(1) time.

The list operations that may have efficiency problems are those that add or remove elements at the beginning (or somewhere in the middle) of the list. When an item is inserted, or removed, from the beginning of a list, all the subsequent elements of the array need to be shifted by a position, thus taking O(N) time.

In the following table, the timings for different operations on a list of size 10,000 are shown; you can see how insertion and removal performances vary quite dramatically if performed at the beginning or at the end of the list:

Code N=10000 (µs) N=20000 (µs) N=30000 (µs) Time
list.pop() 0.50 0.59 0.58 O(1)
list.pop(0) 4.20 8.36 12.09 O(N)
list.append(1) 0.43 0.45 0.46 O(1)
list.insert(0, 1) 6.20 11.97 17.41 O(N)

In some cases, it is necessary to efficiently perform insertion or removal of elements both at the beginning and at the end of the collection. Python provides a data structure with those properties in the collections.deque class. The word deque stands for double-ended queue because this data structure is designed to efficiently put and remove elements at the beginning and at the end of the collection, as it is in the case of queues. In Python, deques are implemented as doubly-linked lists.

Deques, in addition to pop and append, expose the popleft and appendleft methods that have O(1) running time:

Code N=10000 (µs) N=20000 (µs) N=30000 (µs) Time
deque.pop() 0.41 0.47 0.51 O(1)
deque.popleft() 0.39 0.51 0.47 O(1)
deque.append(1) 0.42 0.48 0.50 O(1)
deque.appendleft(1) 0.38 0.47 0.51 O(1)

Despite these advantages, deques should not be used to replace regular lists in most cases. The efficiency gained by the appendleft and popleft operations comes at a cost: accessing an element in the middle of a deque is a O(N) operation, as shown in the following table:

Code N=10000 (µs) N=20000 (µs) N=30000 (µs) Time
deque[0] 0.37 0.41 0.45 O(1)
deque[N -  1] 0.37 0.42 0.43 O(1)
deque[int(N / 2)] 1.14 1.71 2.48 O(N)

Searching for an item in a list is generally a O(N) operation and is performed using the list.index method. A simple way to speed up searches in lists is to keep the array sorted and perform a binary search using the bisect module.

The bisect module allows fast searches on sorted arrays. The bisect.bisect function can be used on a sorted list to find the index to place an element while maintaining the array in sorted order. In the following example, we can see that if we want to insert the 3 element in the array while keeping collection in sorted order, we should put 3 in the third position (which corresponds to index 2):

    insert bisect
collection = [1, 2, 4, 5, 6]
bisect.bisect(collection, 3)
# Result: 2

This function uses the binary search algorithm that has O(log(N)) running time. Such a running time is exceptionally fast, and basically means that your running time will increase by a constant amount every time you double your input size. This means that if, for example, your program takes 1 second to run on an input of size 1000, it will take 2 seconds to process an input of size 2000, 3 seconds to process an input of size 4000, and so on. If you had 100 seconds, you could theoretically process an input of size 1033, which is larger than the number of atoms in your body!

If the value we are trying to insert is already present in the list, the bisect.bisect function will return the location after the already present value.  Therefore, we can use the bisect.bisect_left variant, which returns the correct index in the following way (taken from the module documentation at https://docs.python.org/3.5/library/bisect.html):

    def index_bisect(a, x):
'Locate the leftmost value exactly equal to x'
i = bisect.bisect_left(a, x)
if i != len(a) and a[i] == x:
return i
raise ValueError

In the following table, you can see how the running time of the bisect solution is barely affected at these input sizes, making it a suitable solution when searching through very large collections:

Code N=10000 (µs) N=20000 (µs) N=30000 (µs) Time
list.index(a) 87.55 171.06 263.17 O(N)
index_bisect(list, a) 3.16 3.20 4.71 O(log(N))
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.19.243