© Valentina Porcu 2018
Valentina Porcu, Python for Data Mining Quick Syntax Reference, https://doi.org/10.1007/978-1-4842-4113-4_3

3. Basic Objects and Structures

Valentina Porcu
Nuoro, Italy

One of the most important features of Python is managing data structures. Let’s take a look at them.

Numbers

The numbers in Python can be any of the following:
  • Integers, or int

  • Floating points, or float

  • Complex

  • Booleans—that is, True or False

Let’s look at some examples:
# create an object containing an integer (int)
>>> n1 = 19
>>> type(n1)
<type 'int'>
# a float
>>> n2 = 7.5
>>> type(n2)
<type 'float'>
# a Boolean (True/False)
>>> n3 = True
>>> type(n3)
<type 'bool'>
# a complex number
>>> n4 = 3j
>>> type(n4)
<type 'complex'>
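As a small sketch of how these numeric types interact (the variable names below are illustrative and not part of the original examples): mixing an int with a float yields a float, and bool is a subclass of int, so True behaves like 1 in arithmetic.

```python
# illustrative values; the names are assumptions made for this sketch
n_int = 19
n_float = 7.5
n_bool = True
n_complex = 3j

total = n_int + n_float           # int + float gives a float
print(type(total).__name__)       # float

print(isinstance(n_bool, int))    # True: bool is a subclass of int
print(n_bool + 1)                 # 2

product = n_complex * n_complex   # 3j * 3j equals -9 as a complex number
print(product)                    # (-9+0j)
```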

Container Objects

At the heart of Python are the various types of objects that can be created (Table 3-1).
Table 3-1. Python Container Objects

Container       Delimited by
Tuples          ( )
Lists           [ ]
Dictionaries    { }
Sets            { }
Strings         " " or ' '

Let’s examine each of them in turn.

Tuples

Tuples, like strings and lists, belong to the sequence category. Sequences are iterable objects that represent arbitrary-length containers. Tuples are immutable sequences that can hold heterogeneous objects, and they are written with parentheses. Immutability means that after we have created a tuple, we cannot alter it; we cannot replace one of its elements with another. Tuples are efficient with regard to memory consumption and runtime.

Let’s create a tuple:
>>> t1 = (1,2,3,4,5)
# we use the type() function to check the type of the object we created
>>> type(t1)
<class 'tuple'>
# Python tells us we created a tuple, so we have created the right data structure
Common operations for sequences are indexing and slicing, and concatenation and repetition. As mentioned, we cannot modify a tuple after it has been created, but we can extract, concatenate, or repeat its elements.
# we create another tuple
>>> t2 = ("a", "b", "c", "d")
>>> type(t2)
<class 'tuple'>
# we extract the first element of tuple t2
>>> t2[0]
'a'
# the elements of a tuple are counted starting from zero; to extract "a", which is the first element, we use square brackets for indexing and insert the number 0 between them
# we can also use the minus sign to extract elements of a tuple; these elements are counted from the last to the first
>>> t2[-1]
'd'
# we can extract more than one element using a colon
>>> t2[1:3]
('b', 'c')
To determine whether an item is present in a tuple, we use the “in” operator:
>>> 'a' in t2
True
>>> 'z' in t2
False
As mentioned, tuples are immutable. If we try to replace one element of a tuple with another, we get an error message:
>>> t2[0] = 15
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
To display the attributes and methods available for tuples, type
>>> dir(t2)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']
We cannot add elements to an existing tuple, but we can build a new tuple by concatenation and bind the same name to it:
>>> t2 = t2 + ('xyz',)
# let's see our tuple again
>>> t2
('a', 'b', 'c', 'd', 'xyz')
Last, we can create tuples that contain more than one type of object:
>>> t3 = (1,2,3,4,5, "test", 20.75, "string2")
>>> t3
(1, 2, 3, 4, 5, 'test', 20.75, 'string2')
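The following sketch gathers the tuple behaviors described above: concatenation produces a new tuple and leaves the original untouched, a one-element tuple needs a trailing comma, and count() and index() are the two public methods that dir() lists. The names t, t_extended, single, and not_a_tuple are illustrative assumptions.

```python
t = ("a", "b", "c", "d")

# concatenation builds a new tuple; the original object is not changed
t_extended = t + ("xyz",)
print(t_extended)                    # ('a', 'b', 'c', 'd', 'xyz')
print(t)                             # ('a', 'b', 'c', 'd')

# a one-element tuple needs a trailing comma
single = ("xyz",)
not_a_tuple = ("xyz")                # parentheses alone do not make a tuple
print(type(single).__name__)         # tuple
print(type(not_a_tuple).__name__)    # str

# count() and index() are the public tuple methods
print(t_extended.count("a"))         # 1
print(t_extended.index("c"))         # 2
```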

Lists

Python lists include items of various types. They are similar to tuples, with the difference that they are mutable; you can add or delete items from a list.

To create a list we include its elements in square brackets, separated by a comma:
>>> list1 = ["jan", "feb", "mar", "apr"]
>>> type(list1)
<class 'list'>
We can also create lists that contain numerical, logical, or string values, or we can mix multiple data types:
>>> list2 = ["one", 25, True]
>>> type(list2)
<class 'list'>
We can display a list using the print() function:
>>> print(list1)
['jan', 'feb', 'mar', 'apr']
Or we can determine its length with the len() function:
>>> len(list1)
4
We can also print a single list item according to its location:
>>> list1[0]
'jan'
>>> list1[-2]
'mar'
# if we insert a position that does not match any item in the list, we get an error
>>> list1[7]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
We can select some items from a list:
>>> list1 = ["jan", "feb", "mar", "apr"]
>>> list1[1:]
['feb', 'mar', 'apr']
>>> list1[:3]
['jan', 'feb', 'mar']
We can multiply a list:
>>> list1*2
['jan', 'feb', 'mar', 'apr', 'jan', 'feb', 'mar', 'apr']
Or we can create a new list by combining two lists:
>>> list3 = list1 + list2
>>> list3
['jan', 'feb', 'mar', 'apr', 'one', 25, True]
We can also extract a range of items and save them to another list; this operation is called slicing:
>>> list4 = list3[2:6]
>>> list4
['mar', 'apr', 'one', 25]
We can also delete an item from a list like this:
>>> del list1[1]
>>> list1
['jan', 'mar', 'apr']
By typing the dir() function with a list, we can see all the operations we can do on that list:
>>> dir(list1)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
Some of the most important functions include the following:
  • append() Adds elements to our list

  • clear() Removes all items in a list

  • copy() Makes a copy of the list

  • extend() Combines two lists

  • insert() Adds an item to a specific location in the list

  • pop() Removes and returns the item at a given position (the last item by default)

  • remove() Removes the first item that matches a given value
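A short sketch may help distinguish the methods that are easy to confuse: insert() and pop() work with positions, whereas remove() works with a value. The list name months and the placeholder item "xyz" are illustrative assumptions.

```python
months = ["jan", "feb", "mar", "apr"]

months.insert(1, "xyz")      # insert at position 1
print(months)                # ['jan', 'xyz', 'feb', 'mar', 'apr']

months.remove("xyz")         # remove by value (first match only)
print(months)                # ['jan', 'feb', 'mar', 'apr']

last = months.pop()          # remove and return the last item
first = months.pop(0)        # remove and return the item at position 0
print(last, first)           # apr jan
print(months)                # ['feb', 'mar']

backup = months.copy()       # an independent copy of the list
months.clear()               # empties the original list only
print(months, backup)        # [] ['feb', 'mar']
```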

Let’s use some of these operations on list3.
>>> list3
['jan', 'feb', 'mar', 'apr', 'one', 25, True]
# we can add an element with the append() method
>>> list3.append(7)
>>> list3
['jan', 'feb', 'mar', 'apr', 'one', 25, True, 7]
# reverse the order of the list items with reverse()
>>> list3.reverse()
>>> list3
[7, True, 25, 'one', 'apr', 'mar', 'feb', 'jan']
# delete the last element with pop()
>>> list3.pop()
'jan'
>>> list3
[7, True, 25, 'one', 'apr', 'mar', 'feb']
# reorder items of a list in ascending order with sort()
>>> list5 = [100, 12, 45, 67, 89, 7, 19]
>>> list5.sort()
>>> list5
[7, 12, 19, 45, 67, 89, 100]
# extend a list with another list with extend()
>>> list5.extend([260, 35, 98, 124])
>>> list5
[7, 12, 19, 45, 67, 89, 100, 260, 35, 98, 124]
# last, we can delete items in a list with the clear function
>>> list5.clear()
>>> list5
[]
We can also create lists that contain other containers, such as tuples:
>>> list6 = [(5,7), (9,2), (2,3), (14,27)]
>>> list6
[(5, 7), (9, 2), (2, 3), (14, 27)]
# in this case, let's select the third element of the list6 object:
>>> list6[2]
(2, 3)
# let's select only the second element of the third element of list6:
>>> list6[2][1]
3
We can create a list that features a series of numbers by using the range() function.
# in Python 2, range(20) creates a list of the numbers from 0 to 19; in Python 3, it returns a range object, which we can convert with list(range(20))
>>> list7 = range(20)
# let us check the type of object
>>> type(list7)
<type 'list'>
# we print the object
>>> print(list7)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
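The transcript above reflects Python 2. As a sketch of the Python 3 behavior, range() returns a lazy range object that we convert with list() when we need an actual list; the names numbers, as_list, and evens are illustrative.

```python
numbers = range(20)
print(type(numbers).__name__)   # range

as_list = list(numbers)         # materialize the numbers 0..19
print(as_list[:5])              # [0, 1, 2, 3, 4]
print(len(as_list))             # 20

# range also accepts start, stop, and step arguments
evens = list(range(0, 20, 2))
print(evens)                    # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```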

Dictionaries

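Dictionaries are another Python data structure. They are containers that store key-value pairs, written with braces, with a colon separating each key from its value. Dictionaries are mutable. Their items are accessed by key rather than by position, so we cannot index or slice a dictionary as we did with lists and tuples. (Since Python 3.7, dictionaries preserve insertion order; in earlier versions, the order is arbitrary.)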

In our first example, let’s look at a dictionary that records the names and heights of subjects:
>>> dict1 = {'Laura': 163, 'Francis': 169, 'Kate': 165}
>>> type(dict1)
<type 'dict'>
We can query the dictionary for a given value:
>>> print(dict1['Francis'])
169
We can also add an element to our dictionary by assigning a value to a new key:
>>> dict1['Simon'] = 180
>>> dict1
{'Laura': 163, 'Simon': 180, 'Francis': 169, 'Kate': 165}
To list dictionary keys, we use the .keys method:
>>> dict1.keys()
['Laura', 'Simon', 'Francis', 'Kate']
To get only the values, we use the .values method:
>>> dict1.values()
[163, 180, 169, 165]
To determine whether a given key is in our dictionary, we use the “in” operator:
>>> 'Laura' in dict1
True
>>> 'Stephanie' in dict1
False
We can delete a dictionary element with the del command:
>>> del dict1['Simon']
>>> dict1
{'Laura': 163, 'Francis': 169, 'Kate': 165}
We can delete all dictionary elements with the .clear method:
>>> dict1.clear()
>>> dict1
{}
Now, let’s create another dictionary:
>>> dict2 = {'Statistics':28, 'Machine Learning':30, 'Marketing':27,'Analysis':29}
>>> dict2
{'Marketing':27, 'Statistics':28, 'Analysis':29, 'Machine Learning':30}
We can verify the number of elements that make up the dictionary with len():
>>> len(dict2)
4

Dictionary dict2 features four key-value pairs.

We can query a dictionary about a given element even without the print() function:
>>> dict2['Marketing']
27
Let’s check the keys with the list() function:
>>> list(dict2)
['Marketing', 'Statistics', 'Analysis', 'Machine Learning']
We can place the keys in alphabetical order:
>>> sorted(list(dict2))
['Analysis', 'Machine Learning', 'Marketing', 'Statistics']
We can display values only with the .values method:
>>> dict2.values()
[27, 28, 29, 30]
And we can display all items with the .items method:
>>> dict2.items()
[('Marketing', 27), ('Statistics', 28), ('Analysis', 29), ('Machine Learning', 30)]
We can list the keys in our dictionary by looping over it:
>>> for i in dict2: print(i)
...
Marketing
Statistics
Analysis
Machine Learning
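Looping over a dictionary yields its keys. As a sketch, the .items method lets the loop unpack each key together with its value; the names exams, subject, score, and best are illustrative assumptions.

```python
exams = {'Statistics': 28, 'Machine Learning': 30, 'Marketing': 27, 'Analysis': 29}

# iterate over key-value pairs in alphabetical key order
for subject, score in sorted(exams.items()):
    print(subject, score)

# find the key with the highest value
best = max(exams, key=exams.get)
print(best)                     # Machine Learning
```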
We can also delete one of the items with the .pop method:
>>> dict2.pop('Marketing')
27
>>> dict2
{'Statistics': 28, 'Analysis': 29, 'Machine Learning': 30}
The .popitem method, on the other hand, removes and returns a key-value pair: the most recently inserted one in Python 3.7 and later, an arbitrary one in earlier versions:
>>> dict2.popitem()
('Statistics', 28)
>>> dict2
{'Analysis': 29, 'Machine Learning': 30}
There are now two elements in the dictionary dict2. We can compute with one of the values, for example, by subtracting 2 from 29:
>>> dict2
{'Analysis': 29, 'Machine Learning': 30}
>>> dict2['Analysis'] -2
27
In this case, we did not overwrite the value with the new one. Let’s add 1 to 29. To do this, we need to use the following notation:
>>> dict2['Analysis'] = dict2['Analysis'] + 1
>>> dict2
{'Analysis': 30, 'Machine Learning': 30}
We can also use assignment operators presented in Chapter 2. In this case, we can subtract 2 from 30:
>>> dict2['Analysis'] -= 2
>>> dict2
{'Analysis': 28, 'Machine Learning': 30}
We can also create an empty dictionary and fill it:
>>> dict3 = {}
>>> dict3['key1'] = ['value1']
>>> dict3
{'key1': ['value1']}
>>> dict3['key2'] = ['value2']
>>> dict3
{'key2': ['value2'], 'key1': ['value1']}
One of the properties of dictionaries is called nesting. With nesting, we insert one dictionary into another dictionary:
>>> dict4 = {'key1': { 'nested1': { 'subnested1':'value1'}}}
At this point, to get the innermost value, we have to chain the keys like this:
>>> dict4['key1']['nested1']['subnested1']
'value1'
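Chained lookups raise a KeyError as soon as one key is missing. A minimal sketch, assuming we prefer a default value to an exception, uses the .get method; the dictionary name nested and the key 'key2' are illustrative.

```python
nested = {'key1': {'nested1': {'subnested1': 'value1'}}}
print(nested['key1']['nested1']['subnested1'])             # value1

# .get returns None (or a chosen default) instead of raising KeyError
print(nested.get('key2'))                                  # None
print(nested.get('key2', {}).get('nested1', 'missing'))    # missing

# inner dictionaries are ordinary dictionaries and can be extended
nested['key1']['nested1']['subnested2'] = 'value2'
print(sorted(nested['key1']['nested1']))                   # ['subnested1', 'subnested2']
```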

Sets

Sets are another Python structure. They are unordered containers of unique items. Sets are mutable (we can add and remove items, although the items themselves must be immutable), and they support typical set operations, such as union, intersection, and difference.
# we create a set
>>> set1 = {2, 5, 7, 9, 15}
# check its type of structure
>>> type(set1)
<type 'set'>
# and check its length
>>> len(set1)
5
Sets do not support indexing:
>>> set1[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object does not support indexing
But, they do tell us whether an item is within the set:
>>> 9 in set1
True
>>> 17 in set1
False
We can also create an empty set:
>>> set2 = set()
>>> type(set2)
<type 'set'>
To fill it, we use the .add method:
>>> set2.add(17)
>>> set2
set([17])
>>> set2.add(24)
>>> set2
set([24, 17])
>>> set2.add(36)
>>> set2
set([24, 17, 36])
>>> type(set2)
<type 'set'>
>>> len(set2)
3
Let’s make another set:
>>> set3 = {1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,5}
>>> set3
set([1, 2, 3, 4, 5])
# as you can see, a set consists of unique elements
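The set operations mentioned at the start of this section can be sketched as follows. The names set_a, set_b, and frozen are illustrative, and sorted() is used only to print the results in a predictable order.

```python
set_a = {2, 5, 7, 9, 15}
set_b = {5, 9, 17, 24}

print(sorted(set_a | set_b))    # union: [2, 5, 7, 9, 15, 17, 24]
print(sorted(set_a & set_b))    # intersection: [5, 9]
print(sorted(set_a - set_b))    # difference: [2, 7, 15]
print(sorted(set_a ^ set_b))    # symmetric difference: [2, 7, 15, 17, 24]

# sets are mutable: items can be removed as well as added
set_a.discard(2)
print(sorted(set_a))            # [5, 7, 9, 15]

# frozenset is the immutable variant and has no add() method
frozen = frozenset(set_b)
print(hasattr(frozen, "add"))   # False
```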

Strings

Strings are character sequences that are enclosed in single or double quotation marks. They are immutable objects, but they can be repeated and combined, and parts can be extracted. We write a string like this:
>>> string1 = "Hi!"
# and print it
>>> string1
'Hi!'
# or write it this way with the single quotes
>>> string2 = 'Hello'
>>> string2
'Hello'
# we can print a string by writing its name or using the print() function
>>> print(string1)
Hi!
A string can be composed of single words, parts of sentences, or whole sentences. Be careful when using single quotes because they may create confusion, for example, with apostrophes:
>>> string3 = 'I'd like to code in Python'
  File "<stdin>", line 1
    string3 = 'I'd like to code in Python'
                 ^
SyntaxError: invalid syntax
>>> string4 = "I'd like to code in Python"
>>> string4
"I'd like to code in Python"
However, we can include quotation marks inside a string by escaping them with a backslash as follows:
>>> haml = "Hamlet said: \"to be or not to be ...\". Oratio answered "
>>> haml
'Hamlet said: "to be or not to be ...". Oratio answered '
There are some control characters that can also be useful. For example, “\n” indicates a new line:
>>> haml2 = "Hamlet said: to be or not to be\n Oratio answered ..."
>>> print(haml2)
Hamlet said: to be or not to be
 Oratio answered ...
In addition, “\t” indicates a tab:
>>> haml3 = "Hamlet said: to be or not to be\tOratio answered ..."
>>> print(haml3)
Hamlet said: to be or not to be   Oratio answered ...
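As a sketch of how escape sequences behave: each sequence, such as \n or \t, counts as a single character, repr() shows it in escaped form, and a raw string (prefix r) switches the interpretation off. The names text and raw, and the sample path, are illustrative assumptions.

```python
text = "line one\n\tline two"
print(text)               # printed on two lines, the second one indented by a tab
print(repr(text))         # 'line one\n\tline two'
print(len("\n"))          # 1: an escape sequence is a single character

# in a raw string, backslashes are kept literally
raw = r"C:\new\table"
print(raw)                # C:\new\table
print(len(raw))           # 12
```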
Several operators can be used with strings, including the concatenation operator “+”:
>>> string1 + string2
'Hi!Hello'
Or repetition operator “*”:
>>> string1*10
'Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!'
We can enter three quotation marks to mark the beginning and end of a string that extends over several lines:
>>> string5 = """I'd
... like
... to
... code
... in Python
... """
>>> print(string5)
I'd
like
to
code
in Python
We verify the class of a string with the type() function:
>>> type(string1)
<class 'str'>
And check the length with the len() function:
>>> len(string1)
3
To verify the object id, we use the id() function:
>>> id(string1)
4321859488
We can also display parts of a string:
>>> string1[0]
'H'
>>> string2[2]
'l'
>>> string4[-1]
'n'
>>> haml[1:10]
'amlet sai'
>>> haml[5:]
't said: "to be or not to be ...". Oratio answered '
>>> haml[:10]
'Hamlet sai'
>>> haml[:-2]
'Hamlet said: "to be or not to be ...". Oratio answere'
# the following notation is used to reverse a string (or even just a part of it)
>>> haml[::-1]
' derewsna oitarO ."... eb ot ton ro eb ot" :dias telmaH'
Several string methods change the case of a string. For example, the capitalize() method returns a copy of the string with its first letter in uppercase:
>>> string6 = "let's do a little test"
>>> string6.capitalize()
"Let's do a little test"
Other functions allow you to put an entire string in uppercase or lowercase letters:
>>> string6.upper()
"LET'S DO A LITTLE TEST"
>>> string7 = string6.upper()
>>> string7
"LET'S DO A LITTLE TEST"
>>> string7.lower()
"let's do a little test"
The .find method, the .index method, and the .count method are used to look for one or more characters in a string:
>>> string7.find("TT")
13
>>> string7.index('D')
6
>>> string7.count('L')
3
The strip() function removes blank spaces at the beginning and end of a string:
>>> string8 = "     test     "
>>> string8.strip()
'test'
The replace() function allows us to replace part of a string with another element:
>>> string9 = "Let's do some tests"
>>> string9.replace("some", "a couple of")
"Let's do a couple of tests"
We can verify the presence of a substring in our string like this:
>>> "do" in string9
True
>>> "ueioua" in string9
False
With the split() function, we can break a string into a list of multiple elements:
>>> string9.split()
["Let's", 'do', 'some', 'tests']
The join() function allows us to group a list into a single string:
>>> "-".join(["03", "01", "2017"])
'03-01-2017'
In the previous example, a hyphen has been inserted as a separator. The following example does not include a separator. The items are thus listed consecutively:
>>> "".join(["a", "b", "c", "d"])
'abcd'
# in this case we insert a space
>>> " ".join(["a", "b", "c", "d"])
'a b c d'
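split() and join() are inverse operations, which the following sketch illustrates with a round trip. The names date, parts, and numbers are illustrative; note that join() accepts only strings, so other types must be converted first.

```python
date = "03-01-2017"
parts = date.split("-")            # split on an explicit separator
print(parts)                       # ['03', '01', '2017']
print("/".join(parts))             # 03/01/2017

# join() requires strings, so numbers are converted with str()
numbers = [3, 1, 2017]
print("-".join(str(n) for n in numbers))   # 3-1-2017

# with no argument, split() breaks on any run of whitespace
print("  a   b  c ".split())       # ['a', 'b', 'c']
```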
Strings are immutable, meaning they cannot be modified in place. We can always reuse a name and bind it to a new string, but the result is a different object for all intents and purposes. Let’s look at an example:
# we create a string
>>> string1 = "a b c d e f g"
# we try to replace the first element "a" with "x"
>>> string1[0] = 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
# as you can see, we get an error because we can't change the string this way
The % operator allows advanced string formatting. It searches the string for placeholders introduced by % and replaces them with the value, or the tuple of values, that follows the operator. The % symbol must be followed by a character that indicates the type of data we are inserting in the string. To insert the contents of a string, we use “%s” like this:
# we create a first string
>>> string1 = 'test'
# to merge our string into this text, we put %s inside the quotation marks and follow the text with % and the value
>>> print('my string says: %s' % string1)
my string says: test
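When there are several placeholders, the values are passed as a tuple. The sketch below uses %s for strings, %d for integers, and %.2f for floats with two decimals; the names name, count, and ratio are illustrative assumptions. The str.format method is shown as an alternative way to obtain the same result.

```python
name = 'test'
count = 3
ratio = 20.75

# one placeholder, one value
print('my string says: %s' % name)            # my string says: test

# several placeholders take a tuple of values; %% prints a literal percent sign
line = '%s appears %d times (%.2f%%)' % (name, count, ratio)
print(line)                                   # test appears 3 times (20.75%)

# the same result with str.format
line2 = '{} appears {} times ({:.2f}%)'.format(name, count, ratio)
print(line2 == line)                          # True
```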
We can use a loop to iterate over the characters of a string:
>>> for letter in string1: print(letter)
...
t
e
s
t
We can count the characters in a string with a counter; the loop below prints the running count after each character (the space included):
>>> word = "string test"
>>> count = 0
>>> for letter in word :
...        count = count + 1
...        print(count)
1
2
3
4
5
6
7
8
9
10
11

Caution

Python 2 and Python 3 print strings a bit differently. In Python 3, print() is a function and requires parentheses.

# string management in Python2
>>> print 'Hello world'
Hello world
# string management in Python3
>>> print 'Hello world'
  File "<stdin>", line 1
    print 'Hello world'
                      ^
SyntaxError: Missing parentheses in call to 'print'
>>> print('Hello world')
Hello world
# to handle strings in Python2 as they are handled in Python3, we can import the future module:
# use of future module in Python2
>>> from __future__ import print_function
>>> print('Hello world')
Hello world

Files

In addition to the features we examined, we also typically import files to analyze from our computer or from the Internet. Files are often structured as dataframes, but we can also import images, audio, binary, text, or other proprietary formats, such as SPSS, SAS, a database, and so on. We learn how to import the simplest formats, such as .csv, in Chapter 6.

Immutability

As mentioned, immutability is a characteristic of some Python structures: Once created, the structure cannot be modified (see Table 3-2). We can reuse a name and overwrite the structure with another object inside it, but it will be different for all intents and purposes.
Table 3-2. Data Structures and Immutability

Structure       Mutable
Lists           Yes
Dictionaries    Yes
Tuples          No
Sets            Yes
Strings         No

Let’s look at more examples. We start first with a list, which is a mutable object:
# we create a list
>>> list1 = ["jan", "feb", "mar", "apr"]
# we check the type of object
>>> type(list1)
<class 'list'>
# and check the ID of the created list
>>> id(list1)
4302269184
# we add an element
>>> list1.append("oct")
# we reprint the list
>>> list1
['jan', 'feb', 'mar', 'apr', 'oct']
# and check the ID again
>>> id(list1)
4302269184
# as you can see, the ID is identical
Now, let’s create a tuple, which is an immutable object:
>>> tuple1 = (1,2,3,4)
# we check the object class
>>> type(tuple1)
<type 'tuple'>
# and verify the ID
>>> id(tuple1)
4302119432
# we then try to add an element
>>> tuple1.append(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'
# we recreate the tuple that also contains the last object
>>> tuple1 = (1,2,3,4,5)
# and print its contents
>>> tuple1
(1, 2, 3, 4, 5)
# and verify the ID
>>> id(tuple1)
4301382000
# as seen, we did not overwrite the first object; we created a second object with the same name (the first tuple1 object is no longer available)
Last, let’s examine some examples with strings, which are immutable:
# we create a string
>>> string1 = "a b c d e f g"
# and try to replace the first element "a" with "x"
>>> string1[0] = 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
# we get an error
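Because a string cannot be changed in place, the usual approach is to build a new string from the old one. A minimal sketch, with the illustrative names s, new_s, and replaced:

```python
s = "a b c d e f g"

# build a new string from a new first character plus a slice of the old one
new_s = "x" + s[1:]
print(new_s)                 # x b c d e f g
print(s)                     # a b c d e f g (the original is unchanged)

# replace() also returns a new string rather than modifying s
replaced = s.replace("a", "x")
print(replaced == new_s)     # True
print(new_s is s)            # False: new_s is a different object
```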

Converting Formats

We can transform one structure to another quite easily with the help of some functions.
# let's create some objects
>>> tuple1 = (1,2,3,4,5)
>>> list1 = ["jan", "feb", "mar", "apr"]
>>> string1 = "2017"
>>> int1 = 67
# we check the type of objects
>>> type(tuple1)
<type 'tuple'>
>>> type(list1)
<type 'list'>
>>> type(string1)
<type 'str'>
>>> type(int1)
<type 'int'>
To convert formats, we use the following functions:
# list() converts, for example, a tuple to a list
>>> convt1 = list(tuple1)
# the result must be saved to a new object; let's check the type of that object
>>> type(convt1)
<type 'list'>
# from list to tuple
>>> conv_to_tuple = tuple(list1)
>>> type(conv_to_tuple)
<type 'tuple'>
# from string to integer
>>> conv_to_int = int(string1)
>>> type(conv_to_int)
<type 'int'>
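A few more conversions follow the same pattern. This is a sketch with illustrative values and names (chars, unique, label, pairs, message), not an exhaustive list. A conversion that makes no sense, such as int("jan"), raises a ValueError.

```python
chars = list("2017")                      # a string becomes a list of characters
print(chars)                              # ['2', '0', '1', '7']

unique = sorted(set([1, 1, 2, 3, 3]))     # a set drops duplicates
print(unique)                             # [1, 2, 3]

label = str(67) + "!"                     # an integer becomes a string
print(label)                              # 67!

print(float("7.5"))                       # 7.5

pairs = dict([("a", 1), ("b", 2)])        # a list of pairs becomes a dictionary
print(pairs["b"])                         # 2

# an impossible conversion raises ValueError
try:
    int("jan")
    message = "converted"
except ValueError:
    message = "cannot convert"
print(message)                            # cannot convert
```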

Summary

In this chapter we learned how to create and manipulate the most important basic data structures in Python. An object-oriented programming language like Python is based on two main features: objects and actions. In this chapter we learned more about the objects; in Chapter 4, we focus on actions by creating functions.
