Storing Data Using Sets

A set is an unordered collection of distinct items. Unordered means that items aren’t stored in any particular order. Something is either in the set or it’s not, but there’s no notion of it being the first, second, or last item. Distinct means that any item appears in a set at most once; in other words, there are no duplicates.

Python has a type called set that allows us to store mutable collections of unordered, distinct items. (Remember that a mutable object is one that you can modify.) Here we create a set containing the vowels:

 >>>​​ ​​vowels​​ ​​=​​ ​​{​​'a'​​,​​ ​​'e'​​,​​ ​​'i'​​,​​ ​​'o'​​,​​ ​​'u'​​}
 >>>​​ ​​vowels
 {'a', 'u', 'o', 'i', 'e'}

It looks much like a list, except that sets use braces (that is, { and }) instead of brackets (that is, [ and ]). Notice that, when displayed in the shell, the set is unordered. Python does some mathematical tricks behind the scenes to make accessing the items very fast, and one of the side effects of this is that the items aren’t in any particular order.

Here we show that each item is distinct; duplicates are ignored:

 >>>​​ ​​vowels​​ ​​=​​ ​​{​​'a'​​,​​ ​​'e'​​,​​ ​​'a'​​,​​ ​​'a'​​,​​ ​​'i'​​,​​ ​​'o'​​,​​ ​​'u'​​,​​ ​​'u'​​}
 >>>​​ ​​vowels
 {'u', 'o', 'i', 'e', 'a'}

Even though there were three ’a’s and two ’u’s when we created the set, only one of each was kept. Python considers the two sets to be equal:

 >>>​​ ​​{​​'a'​​,​​ ​​'e'​​,​​ ​​'i'​​,​​ ​​'o'​​,​​ ​​'u'​​}​​ ​​==​​ ​​{​​'a'​​,​​ ​​'e'​​,​​ ​​'a'​​,​​ ​​'a'​​,​​ ​​'i'​​,​​ ​​'o'​​,​​ ​​'u'​​,​​ ​​'u'​​}
 True

The reason they are equal is that they contain the same items. Again, order doesn’t matter, and only one of each element is kept.

Variable vowels refers to an object of type set:

 >>>​​ ​​type(vowels)
 <class 'set'>
 >>>​​ ​​type({1,​​ ​​2,​​ ​​3})
 <class 'set'>

In Storing Data Using Dictionaries, you’ll learn about a type that also uses the notation {}, which prevents us from using that notation to represent an empty set. Instead, to create an empty set, you need to call function set with no arguments:

 >>>​​ ​​set()
 set()
 >>>​​ ​​type(set())
 <class 'set'>

Function set expects either no arguments (to create an empty set) or a single argument that is a collection of values. We can, for example, create a set from a list:

 >>>​​ ​​set([2,​​ ​​3,​​ ​​2,​​ ​​5])
 {2, 3, 5}

Because duplicates aren’t allowed, only one of the 2s appears in the set:

images/setdict/set.png

Function set expects at most one argument. You can’t pass several values as separate arguments:

 >>>​​ ​​set(2,​​ ​​3,​​ ​​5)
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
 TypeError: set expected at most 1 arguments, got 3

In addition to lists, there are a couple of other types that can be used as arguments to function set. One is a set:

 >>>​​ ​​vowels​​ ​​=​​ ​​{​​'a'​​,​​ ​​'e'​​,​​ ​​'a'​​,​​ ​​'a'​​,​​ ​​'i'​​,​​ ​​'o'​​,​​ ​​'u'​​,​​ ​​'u'​​}
 >>>​​ ​​vowels
 {'i', 'a', 'u', 'e', 'o'}
 >>>​​ ​​set(vowels)
 {'i', 'a', 'u', 'e', 'o'}
 >>>​​ ​​set({5,​​ ​​3,​​ ​​1})
 {1, 3, 5}

Another such type is range from Generating Ranges of Numbers. In the following code a set is created with the values 0 to 4 inclusive:

 >>>​​ ​​set(range(5))
 {0, 1, 2, 3, 4}

In Storing Data Using Tuples, you will learn about the tuple type, another type of sequence, that can also be used as an argument to function set.

Set Operations

In mathematics, set operations include union, intersection, add, and remove. In Python, these are implemented as methods (for a complete list, see Table 14, Set Operations). We’ll show you these in action.

Sets are mutable. The methods add, remove, and clear all modify which items are in a set. The letter y is sometimes considered to be a vowel; here we add it to our set of vowels:

 >>>​​ ​​vowels​​ ​​=​​ ​​{​​'a'​​,​​ ​​'e'​​,​​ ​​'i'​​,​​ ​​'o'​​,​​ ​​'u'​​}
 >>>​​ ​​vowels
 {'o', 'u', 'a', 'e', 'i'}
 >>>​​ ​​vowels.add(​​'y'​​)
 >>>​​ ​​vowels
 {'u', 'y', 'e', 'a', 'o', 'i'}

Other methods, such as intersection and union, return new sets based on their arguments.

In the following code, we show all of these methods in action:

 >>>​​ ​​ten​​ ​​=​​ ​​set(range(10))
 >>>​​ ​​lows​​ ​​=​​ ​​{0,​​ ​​1,​​ ​​2,​​ ​​3,​​ ​​4}
 >>>​​ ​​odds​​ ​​=​​ ​​{1,​​ ​​3,​​ ​​5,​​ ​​7,​​ ​​9}
 >>>​​ ​​lows.add(9)
 >>>​​ ​​lows
 {0, 1, 2, 3, 4, 9}
 >>>​​ ​​lows.difference(odds)
 {0, 2, 4}
 >>>​​ ​​lows.intersection(odds)
 {1, 3, 9}
 >>>​​ ​​lows.issubset(ten)
 True
 >>>​​ ​​lows.issuperset(odds)
 False
 >>>​​ ​​lows.remove(0)
 >>>​​ ​​lows
 {1, 2, 3, 4, 9}
 >>>​​ ​​lows.symmetric_difference(odds)
 {2, 4, 5, 7}
 >>>​​ ​​lows.union(odds)
 {1, 2, 3, 4, 5, 7, 9}
 >>>​​ ​​lows.clear()
 >>>​​ ​​lows
 set()

Table 14. Set Operations

Method

Description

S.add(v)

Adds item v to a set S—this has no effect if v is already in S

S.clear()

Removes all items from set S

S.difference(other)

Returns a set with items that occur in set S but not in set other

S.intersection(other)

Returns a set with items that occur both in sets S and other

S.issubset(other)

Returns True if and only if all of set S’s items are also in set other

S.issuperset(other)

Returns True if and only if set S contains all of set other’s items

S.remove(v)

Removes item v from set S

S.symmetric_difference(other)

Returns a set with items that are in exactly one of sets S and other—any items that are in both sets are not included in the result

S.union(other)

Returns a set with items that are either in set S or other (or in both)


Many of the tasks performed by methods can also be accomplished using operators. If acids and bases are two sets, for example, then acids | bases creates a new set containing their union (that is, all the elements from both acids and bases), while acids <= bases tests whether acids is a subset of bases—that is, that all the values in acids are also in bases. Some of the operators that sets support are listed in Table 15, Set Operators.


Table 15. Set Operators

Method Call

Operator

set1.difference(set2)

set1 - set2

set1.intersection(set2)

set1 & set2

set1.issubset(set2)

set1 <= set2

set1.issuperset(set2)

set1 >= set2

set1.union(set2)

set1 | set2

set1.symmetric_difference(set2)

set1 ^ set2


The following code shows the set operations in action:

 >>>​​ ​​lows​​ ​​=​​ ​​set([0,​​ ​​1,​​ ​​2,​​ ​​3,​​ ​​4])
 >>>​​ ​​odds​​ ​​=​​ ​​set([1,​​ ​​3,​​ ​​5,​​ ​​7,​​ ​​9])
 >>>​​ ​​lows​​ ​​-​​ ​​odds​​ # Equivalent to lows.difference(odds)
 {0, 2, 4}
 >>>​​ ​​lows​​ ​​&​​ ​​odds​​ # Equivalent to lows.intersection(odds)
 {1, 3}
 >>>​​ ​​lows​​ ​​<=​​ ​​odds​​ # Equivalent to lows.issubset(odds)
 False
 >>>​​ ​​lows​​ ​​>=​​ ​​odds​​ # Equivalent to lows.issuperset(odds)
 False
 >>>​​ ​​lows​​ ​​|​​ ​​odds​​ # Equivalent to lows.union(odds)
 {0, 1, 2, 3, 4, 5, 7, 9}
 >>>​​ ​​lows​​ ​​^​​ ​​odds​​ # Equivalent to lows.symmetric_difference(odds)
 {0, 2, 4, 5, 7, 9}

Set Example: Arctic Birds

Suppose you have a file used to record observations of birds in the Canadian Arctic and you want to know which species have been observed. The observations file, observations.txt, has one species per line:

 canada goose
 canada goose
 long-tailed jaeger
 canada goose
 snow goose
 canada goose
 long-tailed jaeger
 canada goose
 northern fulmar

The following program reads each line of the file, strips off the leading and trailing whitespace, and adds the species on that line to the set. Notice the type annotation specifying that the function returns a set of strings:

 from​ typing ​import​ Set, TextIO
 from​ io ​import​ StringIO
 
 def​ observe_birds(observations_file: TextIO) -> Set[str]:
 """Return a set of the bird species listed in observations_file, which has
  one bird species per line.
 
  >>> infile = StringIO('bird 1​​\​​nbird 2​​\​​nbird 1​​\​​n')
  >>> birds = observe_birds(infile)
  >>> 'bird 1' in birds
  True
  >>> 'bird 2' in birds
  True
  >>> len(birds) == 2
  True
  """
  birds_observed = set()
 for​ line ​in​ observations_file:
  bird = line.strip()
  birds_observed.add(bird)
 
 return​ birds_observed
 
 if​ __name__ == ​'__main__'​:
 import​ doctest
  doctest.testmod()
 with​ open(​'observations.txt'​) ​as​ observations_file:
 print​(observe_birds(observations_file))

The resulting set contains four species. Since sets don’t contain duplicates, calling method add with a species already in the set had no effect.

You can loop over the values in a set. In the following code, a for loop is used to print each species:

 >>>​​ ​​for​​ ​​species​​ ​​in​​ ​​birds_observed:
 ...​​ ​​print(species)
 ...
 long-tailed jaeger
 canada goose
 northern fulmar
 snow goose

Looping over a set works exactly like a loop over a list, except that the order in which items are encountered is arbitrary: there is no guarantee that they will come out in the order in which they were added, in alphabetical order, in order by length, or in any other order.

Set Contents Must Be Immutable

Checking for set membership must be fast. It uses a mathematical technique called hashing, which relies on set values being immutable. Mutable values such as lists cannot be added to sets because mutable values are unhashable:

 >>>​​ ​​S​​ ​​=​​ ​​set()
 >>>​​ ​​L​​ ​​=​​ ​​[1,​​ ​​2,​​ ​​3]
 >>>​​ ​​S.add(L)
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
 TypeError: unhashable type: 'set'

This restriction means that we can’t store a set of sets. Sets themselves can’t be immutable, since we need to add and remove values, so a set can’t contain another one. To solve this problem, Python has another data type called a frozen set. As the name implies, frozen sets are sets that cannot be mutated. An empty frozen set is created using frozenset(); to create a frozen set that contains some values, use frozenset(values), where values is a list, tuple, set, or other collection.

In the next section, you will learn about tuples, which can also be used as items in sets.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.217.206.112