With Python and VS Code installed and Jupyter notebooks up and running, you have everything in place to dive into Python more seriously and learn about its concepts, syntax and rules. This chapter is a relatively extensive introduction: while it doesn’t go very deep, it covers a lot of ground. If you are at the beginning of your coding career, there may be a lot to digest. There’s nothing to worry though if you don’t understand everything the first time around: the concepts will get clearer once you use them in later chapters as part of a practical example. Whenever Python and VBA differ significantly, I will point this out to make sure you can transition from VBA to Python smoothly and are aware of the obvious traps. If you haven’t done any VBA before, feel free to jump over these parts.
I will start this chapter by saying a few words about object-oriented programming (OOP) and some other preliminaries before diving into the basic data types that Python offers. After that, I will introduce indexing and slicing, a core concept in Python that gives you access to specific elements of an object. Up next are data structures like lists and dictionaries that can hold multiple objects. In that context, I’ll also explain the difference between mutable and immutable objects. I’ll continue with the if statement and the for and while loops before getting to a more formal introduction of functions, modules and classes that allow you to organize and structure your code. To wrap this chapter up, I will show you how to handle exceptions and how to format your Python code properly. As you have probably guessed by now, this chapter is as technical as it can get. Running the examples for yourself in a Jupyter notebook is therefore a good idea to make everything a bit more interactive and playful.
Before we can dig into actually using Python, we need to cover some preliminaries:
after a very brief introduction to object-oriented programming, this section introduces a few Python basics, namely how to write comments, how to define variables and how to continue a line. It also explains None
and gives you a quick preview on how to call functions.
Object-oriented programming is one of the most prominent programming paradigms and often used when working with larger codebases in Python. As the name suggests, OOP is all about manipulating objects in your code. Since this is a book for beginners and practitioners, I don’t want to bore you with a formal definition of what OOP is. Instead, it’s enough to understand what an object is. To start with, in Python, everything is an object. A number is an object as well as a string or a function and everything else that I am going to introduce in this chapter. You can think of objects as the building blocks you use to write your programs. They keep related data and functionality together in an intuitive way: if you write a car racing game, you would most likely have an object in your code that represents a car and that holds all the data about that car.
In Python, an object has attributes that give you access to its data and methods that allow you to perform an action. Method is just a fancy name for function in the context of an object. The car
object could have a speed
attribute that allows you to get the current speed of your race car via car.speed
. You might also be able to accelerate the car by calling the accelerate method: car.accelerate(10)
. This would increase the speed by 10 miles per hour. In Python, you use the dot syntax on an object to get access to either an attribute (speed
) or a method (accelerate
).
This has just scratched the surface of what OOP is, but it should be enough to get you through the next sections of this chapter and understand what’s going on when I use the object.method()
syntax. I will get back to objects in a bit more detail towards the end of this chapter. Meanwhile, the rest of this section explains a few simple Python concepts that don’t really fit into any of the other sections.
Comments help other people (and yourself a few weeks after writing the code) to understand what’s going on in your program. It is good practice to only comment those things that are not already evident from reading the code: in doubt, it’s better to have no comment than an outdated comment that contradicts the code. Anything starting with a #
sign is a comment in Python and is ignored when you run the code.
In
[
1
]:
# This is a sample we've seen before.
# Every comment line has to start with a #
3
+
4
Out
[
1
]:
7
In
[
2
]:
3
+
4
# Comments also work inline
Out
[
2
]:
7
Most editors have a keyboard shortcut to comment/uncomment lines. In Jupyter notebooks and VS Code it is Ctrl-/
(Windows) or Command-/
(macOS), respectively.
I have already used variables in a few examples in the last chapter, so this is finally a formal introduction: in Python, you use the equal sign as assignment operator which simply assigns an object to a name. In the first line of the example, it assigns the object 3
to the name a
:
In
[
3
]:
a
=
3
b
=
4
a
+
b
Out
[
3
]:
7
In Python, this works the same for all objects which is simpler compared to VBA, where you use the equal sign for data types like numbers and strings and the Set
statement for objects like workbooks or worksheets. In Python, you can change a variable’s type simply by assigning it to a new object (I’ll introduce data types properly in the next section). This is referred to as dynamic typing:
In
[
4
]:
a
=
3
(
a
)
a
=
"three"
(
a
)
3 three
Unlike VBA, Python is case-sensitive so a
and A
are two different variables. Variable names must follow certain rules:
They must start with either a letter or an underscore
They must consist of letters, numbers and underscores
Python allows you to define multiple variables in one line which can be convenient:
In
[
5
]:
x
,
y
=
1
,
2
In
[
6
]:
x
Out
[
6
]:
1
In
[
7
]:
y
Out
[
7
]:
2
Sometimes, a line of code can get so long that you will need to break it up into two or more lines to keep your code readable. Technically, you can either use parentheses or a backslash to break up the line:
In
[
8
]:
a
=
(
1
+
2
+
3
)
In
[
9
]:
a
=
1
+
2
+
3
Python’s style guide, however, prefers that you use implicit line breaks if possible: whenever you are using an expression that contains parentheses, square brackets or curly braces, use them to introduce a line break without having to introduce an additional character. I will point out a few of these implicit line breaks in samples over the next few sections.
None
is a built-in constant and represents “the absence of a value” according to the official docs. For example, if a function does not explicitly return anything, it returns None
. It is also a good choice to represent empty cells in Excel as you will see in later chapters.
I will introduce functions with a lot more details later in this chapter. For now, you should simply know how to use built-in functions like print
that we have used already a few times: in Python, you call a function by adding parentheses to the function name and providing the arguments within the parentheses, which is pretty much equivalent to the mathematical notation:
function_name
(
argument1
,
argument2
,
...
)
So much for the preliminaries—we can now start the first real topic which introduces you to the most important data types.
Python, like every other programming language, treats numbers, text, booleans etc. differently by assigning them a different data type. This allows Python, for example, to offer you a method to make a text string uppercase, while it will give you an error if you try to use this method with a number. Python’s most important basic data types are integers, floats, booleans and strings. In this section, I am going to introduce them one after another with a few examples. I’ll start with the numeric types: integers and floats.
The data types int
and float
represent integers and floating-point numbers, respectively. To find out the data type of a given object, you can use the type
function:
In
[
10
]:
type
(
4
)
Out
[
10
]:
int
In
[
11
]:
type
(
4.4
)
Out
[
11
]:
float
If you want to force a number to be a float
instead of an int
, it’s good enough to use a trailing decimal point or the float
constructor:
In
[
12
]:
type
(
4.
)
Out
[
12
]:
float
In
[
13
]:
float
(
4
)
Out
[
13
]:
4.0
Using the int
constructor, you can turn a float
into an int
. If the fractional part is not zero, it will be truncated:
In
[
14
]:
int
(
4.9
)
Out
[
14
]:
4
You may need to convert a float
to an int
when you read in a number from an Excel cell and provide it as an argument to a Python function that expects an integer. The reason for that is that numbers in Excel cells are always stored as floats behind the scenes, even if Excel shows you what looks like an integer.
Python has a few more numeric types that I won’t use in this book though: There are the decimal
, fraction
and complex
data types. If floating-point inaccuracies are an issue, you can use the decimal
type for exact results. These cases are extremely rare though. As a rule of thumb: if Excel would we good enough for the calculations, use floats.
Calculating with numbers requires the use of mathematical operators like the plus or minus sign. Except for the power operator, there shouldn’t be any surprise if you come from Excel:
In
[
16
]:
3
+
4
# Sum
Out
[
16
]:
7
In
[
17
]:
3
-
4
# Subtraction
Out
[
17
]:
-
1
In
[
18
]:
3
/
4
# Division
Out
[
18
]:
0.75
In
[
19
]:
3
*
4
# Multiplication
Out
[
19
]:
12
In
[
20
]:
3
**
4
# The power operator (Excel uses 3^4)
Out
[
20
]:
81
In
[
21
]:
9
%
4
# Modulo (Excel uses MOD(9, 4))
Out
[
21
]:
1
In
[
22
]:
3
*
(
3
+
4
)
# Use of parentheses
Out
[
22
]:
21
Having integers and floats covered, let’s move straight to the next section about booleans!
In Python, boolean types are either True
or False
, exactly like in VBA. The boolean operators and
, or
and not
, however, are all lowercase while VBA shows them capitalized. Boolean expressions are similar to how they work in Excel, except for equality and inequality operators:
In
[
23
]:
3
==
4
# Equality (Excel uses 3 = 4)
Out
[
23
]:
False
In
[
24
]:
3
!=
4
# Inequality (Excel uses 3 <> 4)
Out
[
24
]:
True
In
[
25
]:
3
<
4
# Smaller than. Use > for bigger than.
Out
[
25
]:
True
In
[
26
]:
3
<=
4
# Smaller or equal. Use >= for bigger or equal.
Out
[
26
]:
True
In
[
27
]:
# You can chain logical expressions like this
10
<
12
<
17
# VBA would require you to write: 10 < 12 And 12 < 17
Out
[
27
]:
True
In
[
28
]:
not
True
# "not" operator
Out
[
28
]:
False
In
[
29
]:
False
and
True
# "and" operator
Out
[
29
]:
False
In
[
30
]:
False
or
True
# "or" operator
Out
[
30
]:
True
Every Python object evaluates to either True
or False
. Use the bool
constructor to find out the boolean value of an object:
In
[
31
]:
bool
(
2
)
Out
[
31
]:
True
In
[
32
]:
bool
(
0
)
Out
[
32
]:
False
In
[
33
]:
bool
(
"a string"
)
Out
[
33
]:
True
In
[
34
]:
bool
(
""
)
Out
[
34
]:
False
Booleans are often used with if statements as we will see later in this chapter. With booleans in our pockets, there is one more basic data type left: textual data, better known as strings.
If you have ever worked with strings in VBA that are longer than one line and contain variables and literal quotes, you probably wished it was easier. Fortunately, this is an area where Python is particularly strong. Strings can be expressed by using either double quotes ("
) or single quotes ('
). The only condition is that you have to start and end the string with the same type of quotes. You can use +
to concatenate strings or *
to repeat strings. Since I showed you the repeating case already when trying out the Python REPL in Chapter 2, here is a sample using the plus sign:
In
[
35
]:
"A double quote string. "
+
'A single quote string.'
Out
[
35
]:
'A double quote string. A single quote string.'
Depending on what you want to write, using single or double quotes can help you to easily print literal quotes without the need to escape them. If you still need to escape a character, you precede it with a backslash:
In
[
36
]:
(
"Don't wait! "
+
'Learn how to "speak" Python.'
)
Don't wait! Learn how to "speak" Python.
In
[
37
]:
(
"It's easy to
"
escape
"
characters by a leading
\
."
)
It's easy to "escape" characters by a leading .
There is a special type of string called docstring that starts and ends with triple quotes:
In
[
38
]:
(
"""This is a docstring. It can span over multiple lines.
You can use double quotes (") and single quotes (') easily.
"""
)
This is a docstring. It can span over multiple lines. You can use double quotes (") and single quotes (') easily.
As the name implies, docstrings are primarily made to document your Python code. But they are also great for other use cases like writing a multi-line database query in SQL. When you are mixing strings with variables, you usually work with f-strings. Simply put an f
in front of your string and use variables in between curly braces.
In
[
39
]:
first_adjective
,
second_adjective
=
"free"
,
"open-source"
f
"Python is {first_adjective} and {second_adjective}."
Out
[
39
]:
'Python is free and open-source.'
f-strings were introduced in Python 3.6 which means that there are alternative ways to do string formatting. Here is an example for both the format method as well as the %
operator notation. Note also the implicit line continuation by using the parentheses of format()
:
In
[
40
]:
"Python is {first_adjective} and {second_adjective}."
.
format
(
first_adjective
=
first_adjective
,
second_adjective
=
second_adjective
)
Out
[
40
]:
'Python is free and open-source.'
In
[
41
]:
"Python is
%s
and
%s
."
%
(
first_adjective
,
second_adjective
)
Out
[
41
]:
'Python is free and open-source.'
If you want to concatenate a string with a number, you need to convert that number to a string first or you will get an error (f-strings handle this automatically though):
In
[
42
]:
version
=
3.8
"This book uses Python "
+
str
(
version
)
Out
[
42
]:
'This book uses Python 3.8'
When you work with strings, a regular task is to select parts of a string: for example, you may want to get the USD
part out of a EURUSD
exchange rate notation. The next section shows you Python’s powerful indexing and slicing mechanism that allows you to do exactly this.
By using indexing and slicing, you can access specific elements of a sequence. Since strings are sequences of characters, we can use them to learn how it works. In the next section, we will meet additional sequences like lists and tuples where you can use indexing and slicing, too.
Figure 3-1 introduces the concept of indexing: Python is zero-based which means that the first element in a sequence is referred to by index 0
. Negative indices from -1
allow you to refer to elements from the end of the sequence.
If you are coming from VBA, indexing is a common error trap. VBA uses one-based indexing for most collections like sheets (Sheets(1)
) but uses zero-based indexing for arrays (MyArray(0)
), although that default can be changed. Also, note that VBA uses parentheses for indexing while Python uses square brackets.
The syntax for indexing is as follows:
sequence
[
index
]
Accordingly, you can access specific elements from a string like this:
In
[
43
]:
s
=
"PYTHON"
In
[
44
]:
s
[
0
]
Out
[
44
]:
'P'
In
[
45
]:
s
[
1
]
Out
[
45
]:
'Y'
In
[
46
]:
s
[
-
1
]
Out
[
46
]:
'N'
In
[
47
]:
s
[
-
2
]
Out
[
47
]:
'O'
Often though, you want to extract more than just a single character—that is where slicing comes in.
If you want to get more than one element from a sequence, you use the slicing syntax that works as follows:
sequence
[
start
:
stop
:
step
]
Python uses half-open intervals: the start
index is included while the stop
index is not. If you leave the start
or stop
arguments away, it will include everything from the beginning or to the end of the sequence, respectively. step
determines the direction and the step size: for example, 2
will return every second element from left to right and -3
will return every third element from right to left with step=1
being the default.
In
[
48
]:
s
[:
3
]
# Same as s[0:3]
Out
[
48
]:
'PYT'
In
[
49
]:
s
[
1
:
3
]
Out
[
49
]:
'YT'
In
[
50
]:
s
[
-
3
:]
# Same as s[-3:6]
Out
[
50
]:
'HON'
In
[
51
]:
s
[
-
3
:
-
1
]
Out
[
51
]:
'HO'
In
[
52
]:
s
[::
2
]
# Every second element
Out
[
52
]:
'PTO'
In
[
53
]:
s
[
-
1
:
-
4
:
-
1
]
# Negative step goes from right to left
Out
[
53
]:
'NOH'
So far we’ve looked at just a single index or slice operation, but Python allows you to chain multiple index and slice operations together, as I will explain next.
Slicing and indexing can be chained: for example, if you want to get the second character out of the last three characters, you could do it like this:
In
[
54
]:
s
[
-
3
:][
1
]
Out
[
54
]:
'O'
This is the same as s[-2]
so in this case, it wouldn’t make much sense to use chaining, but it will make much more sense when we use indexing and slicing with lists, one of the data structures that I am introducing in the next section.
Python offers powerful data structures that make working with a collection of objects really easy. In this section, I am going to introduce lists, dictionaries, tuples and sets. While each of these data structures has slightly different characteristics, they are all able to hold multiple objects. In VBA, you may have used collections or arrays to hold multiple values. VBA even offers a data structure called dictionary that works conceptually the same as Python’s dictionary. It is, however, only available on the Windows version of Excel out of the box. Let’s get started with lists, the data structure that you will probably use most.
Lists are capable of holding multiple objects of different data types. They are so versatile that you will use them all the time. For example, you could put together a list with the names of Excel files that you want to work with. You create a list as follows:
[
element1
,
element2
,
...
]
In practice, it looks like this:
In
[
55
]:
file_names
=
[
"one.xlsx"
,
"two.xlsx"
,
"three.xlsx"
]
numbers
=
[
1
,
2
,
3
]
Like strings, lists can easily be concatenated with the plus sign. This also shows you that lists can hold different types of objects.
In
[
56
]:
file_names
+
numbers
Out
[
56
]:
[
'one.xlsx'
,
'two.xlsx'
,
'three.xlsx'
,
1
,
2
,
3
]
As lists are objects like everything else, lists can also have other lists as their elements. I will call them nested lists:
In
[
57
]:
nested_list
=
[[
1
,
2
,
3
],
[
4
,
5
,
6
],
[
7
,
8
,
9
]]
If you rearrange this to span over multiple lines, you can easily recognize that this is a very nice representation of a matrix or a range of spreadsheet cells. Note again that the square brackets implicitly allow you to break the lines. By applying indexing and slicing, you get the elements you want:
In
[
58
]:
cells
=
[[
1
,
2
,
3
],
[
4
,
5
,
6
],
[
7
,
8
,
9
]]
In
[
59
]:
cells
[
1
]
# Second row
Out
[
59
]:
[
4
,
5
,
6
]
In
[
60
]:
cells
[
1
][
1
:]
# Second row, second and third column
Out
[
60
]:
[
5
,
6
]
You can change elements in lists:
In
[
61
]:
mylist
=
[
10
,
11
,
12
]
In
[
62
]:
mylist
.
append
(
13
)
# Most commonly you add to the end
mylist
Out
[
62
]:
[
10
,
11
,
12
,
13
]
In
[
63
]:
mylist
.
insert
(
0
,
9
)
# Insert 9 at index 0
mylist
Out
[
63
]:
[
9
,
10
,
11
,
12
,
13
]
To delete an element, use either pop
or del
. While pop
is a method, del
is implemented as a statement in Python:
In
[
64
]:
mylist
.
pop
()
# Removes and returns the last element by default
Out
[
64
]:
13
In
[
65
]:
mylist
Out
[
65
]:
[
9
,
10
,
11
,
12
]
In
[
66
]:
del
mylist
[
0
]
# del removes an element at the given index
Some other useful things you can do with lists are:
In
[
67
]:
len
(
mylist
)
# Length
Out
[
67
]:
3
In
[
68
]:
10
in
mylist
# Check if mylist contains 10
Out
[
68
]:
True
In
[
69
]:
mylist
=
[
1
,
3
,
5
,
2
]
mylist
.
sort
()
# Sorts the original list
mylist
Out
[
69
]:
[
1
,
2
,
3
,
5
]
In
[
70
]:
mylist2
=
[
1
,
3
,
5
,
2
]
(
sorted
(
mylist2
))
# Returns a new sorted list
(
mylist2
)
[1, 2, 3, 5] [1, 3, 5, 2]
Note that you can use len
and in
with other sequences, too. For example with strings:
In
[
71
]:
len
(
"Python"
)
Out
[
71
]:
6
In
[
72
]:
"free"
in
"Python is open-source and free."
Out
[
72
]:
True
To get access to elements in a list, you refer to them by their position or index—that’s not always practical. Dictionaries, the topic of the next section, allow you to get access to elements via a key (often a name).
Dictionaries map keys to values. You will come across key/value combinations all the time. For example, you could use Excel file names (key) that map to their sheet names (value). The code samples will use a currency pair (key) that maps to the exchange rate (value). The easiest way to create a dictionary is as follows:
{
key1
:
value1
,
key2
:
value2
}
While lists allow you to access elements by index, i.e. position, dictionaries allow you to access elements by key. As with indices, keys are accessed via square brackets:
In
[
73
]:
exchange_rates
=
{
"EURUSD"
:
1.1151
,
"GBPUSD"
:
1.2454
,
"AUDUSD"
:
0.6161
}
In
[
74
]:
exchange_rates
[
"EURUSD"
]
Out
[
74
]:
1.1151
The following samples show how you can change existing values and add new key/value pairs:
In
[
75
]:
exchange_rates
[
"EURUSD"
]
=
1.2
# Change an existing value
exchange_rates
Out
[
75
]:
{
'EURUSD'
:
1.2
,
'GBPUSD'
:
1.2454
,
'AUDUSD'
:
0.6161
}
In
[
76
]:
exchange_rates
[
"CADUSD"
]
=
0.714
# Add a new key/value pair
exchange_rates
Out
[
76
]:
{
'EURUSD'
:
1.2
,
'GBPUSD'
:
1.2454
,
'AUDUSD'
:
0.6161
,
'CADUSD'
:
0.714
}
The easiest way to merge two or more dictionaries is by unpacking them into a new one. You unpack a dictionary by using two leading asterisks. If the second dictionary contains keys from the first one, the values from the first will be overridden. You can follow this by looking at the GBPUSD
exchange rate and comparing it with the original one at the beginning of this section:
In
[
77
]:
exchange_rates
=
{
**
exchange_rates
,
**
{
"SGDUSD"
:
0.7004
,
"GBPUSD"
:
1.2222
}}
exchange_rates
Out
[
77
]:
{
'EURUSD'
:
1.2
,
'GBPUSD'
:
1.2222
,
'AUDUSD'
:
0.6161
,
'CADUSD'
:
0.714
,
'SGDUSD'
:
0.7004
}
Many objects can serve as keys, so you can easily create a dictionary with e.g. integers as keys:
In
[
78
]:
d
=
{
1
:
"EUR"
,
2
:
"USD"
,
3
:
"AUD"
}
In
[
79
]:
d
[
1
]
Out
[
79
]:
'EUR'
By using the get
method, dictionaries allow you to use a default value in case the key doesn’t exist:
In
[
80
]:
# d[100] would raise an Exception. Instead of 100,
# you could use any other non-existing key, too.
d
.
get
(
100
,
"XXX"
)
Out
[
80
]:
'XXX'
Dictionaries can often be used when you would use a Case
statement in VBA. Consider this VBA example:
x
=
1
Select
Case
x
Case
1
Debug
.
"Case 1"
Case
2
Debug
.
"Case 2"
Case
Else
Debug
.
"Other case"
End
Select
In Python, this could be written like this:
In
[
81
]:
x
=
1
cases
=
{
1
:
"Case 1"
,
2
:
"Case 2"
}
(
cases
.
get
(
x
,
"Other case"
))
Case 1
Now that you know how to work with dictionaries, it’s time to look at the next data structure: tuples. They are similar to lists with one big difference, as you will see in the next section.
Tuples are like lists with the difference that they are immutable: once created, their elements can’t be changed. While you can often use tuples and lists interchangeably, tuples are the obvious choice for a collection that never changes throughout the program. Tuples are created by separating values with commas, but in many cases, they also require parentheses, e.g. when you only have a single element.
mytuple
=
element1
,
element2
,
...
The following example wouldn’t require parentheses, but using them makes it often easier to read:
In
[
82
]:
currencies
=
(
"EUR"
,
"GBP"
,
"AUD"
)
Tuples allow you to access elements the same way as lists, but they won’t allow you to change elements. Instead, concatenating tuples will create a new tuple behind the scenes, then bind this new tuple to your variable name:
In
[
83
]:
currencies
[
0
]
# Accessing the first element
Out
[
83
]:
'EUR'
In
[
84
]:
# Concatenating tuples will return a new tuple.
# A tuple with a single element requires parentheses.
currencies
+
(
"SGD"
,)
Out
[
84
]:
(
'EUR'
,
'GBP'
,
'AUD'
,
'SGD'
)
I will explain the difference between mutable vs. immutable objects in more detail after introducing the last data structure of this section: sets.
Sets are collections that have no duplicate elements. While you can use them for set theory operations, you may use them more often to get the unique values of a list or a tuple. You can create sets by using curly braces like with dictionaries, but containing simple elements instead of key/value pairs:
{
element1
,
element2
,
...
}
The set constructor is often used to get the unique objects in a list or a tuple:
In
[
85
]:
set
([
'a'
,
'b'
,
'b'
,
'a'
,
'c'
,
'c'
,
'c'
])
Out
[
85
]:
{
'a'
,
'b'
,
'c'
}
Other than that, you can apply set theory operations like intersection and union:
In
[
86
]:
a
=
{
1
,
2
,
3
,
4
}
b
=
{
2
,
3
,
5
}
In
[
87
]:
a
.
union
(
b
)
# Same as b.union(a)
Out
[
87
]:
{
1
,
2
,
3
,
4
,
5
}
In
[
88
]:
a
.
intersection
(
b
)
# Same as b.intersection(a)
Out
[
88
]:
{
2
,
3
}
For a full overview of set operations, see the official docs. Before moving on, let’s quickly revise the four data structures we just met in Table 3-1. It shows a sample for each data structure in the notation I used in the previous paragraphs, the so-called literals. Additionally, I am also listing their constructors which offer an alternative to using the literals and are often used to convert from one data structure to another. For example, to convert a tuple to a list, you can do:
In
[
89
]:
mytuple
=
1
,
2
,
3
list
(
mytuple
)
Out
[
89
]:
[
1
,
2
,
3
]
Data Structure | Literals | Constructor |
---|---|---|
List |
|
|
Dictionary |
|
|
Tuple |
|
|
Set |
|
|
At this point, you know all important data types including basic ones like floats and strings and structures like lists and dictionaries. When I was introducing tuples, I mentioned that once created, they can’t be changed unlike for example lists. Let’s dig into that in a bit more detail in the next section.
Objects that can change their values are called mutable and those that can’t are called immutable objects. Table 3-2 shows how the data types of this chapter qualify (we will meet datetime
towards the end of this chapter).
Mutability | Data Types |
---|---|
mutable |
lists, dictionaries, sets |
immutable |
tuples, integers, floats, booleans, strings, datetime |
Knowing about the difference is important as mutable objects sometimes show behavior that may not be what you intuitively expect or what you are used to from other languages including VBA. The classic example is assigning the same mutable object to multiple variable names:
In
[
90
]:
a
=
[
1
,
2
,
3
]
b
=
a
# Assign "a" to "b"
In
[
91
]:
a
Out
[
91
]:
[
1
,
2
,
3
]
In
[
92
]:
b
# As expected, "b" has the same values as "a"
Out
[
92
]:
[
1
,
2
,
3
]
In
[
93
]:
a
[
1
]
=
22
# Change the second element of "a"
In
[
94
]:
a
Out
[
94
]:
[
1
,
22
,
3
]
In
[
95
]:
b
# "b" also changed!
Out
[
95
]:
[
1
,
22
,
3
]
What happened here? As variables are just names that you “attach” to an object, by doing b = a
, you attach both names to the same object, the list [1, 2, 3]
. All variables attached to that object will, therefore, show the changes to the list. This only happens with mutable objects: if you would replace the list with an immutable object like a tuple, changing a
would not change b
. If you want b
to be independent of changes in a
, you can assign b
to a copy of the list. When you slice an object, it returns a copy of that object, so one way of copying a list is by slicing it so that it returns all values, i.e. without start and end parameters, leaving us with a colon:
In
[
96
]:
a
=
[
1
,
2
,
3
]
b
=
a
[:]
# a[:] returns a copy, you could also use: a.copy()
In
[
97
]:
a
Out
[
97
]:
[
1
,
2
,
3
]
In
[
98
]:
b
Out
[
98
]:
[
1
,
2
,
3
]
In
[
99
]:
a
[
1
]
=
22
# Changing "a"...
In
[
100
]:
a
Out
[
100
]:
[
1
,
22
,
3
]
In
[
101
]:
b
# ...doesn't affect "b"
Out
[
101
]:
[
1
,
2
,
3
]
When copying a list by either using the copy
method or the slice notation, you will end up with a shallow copy. What this is and how it differs from a deep copy is explained in [Link to Come]. There, you’ll also find a few more occasions where you need to watch out when dealing with mutable objects.
We are now moving on to the next section about control flow: it shows you how you can execute certain code blocks on certain conditions only and how to repeat the execution of a code block multiple times.
This section presents the if statement, the for loop and the while loop. The if statement allows you to only execute certain lines of code if a condition is met and the for and while loops will execute a block of code repeatedly. At the end of the section, I will also introduce list comprehensions, a way to construct lists that often serve as an alternative to for loops. I will start this section with the definition of code blocks, for which I also need to introduce one of Python’s biggest particularities: significant white space.
A code block defines a section in your source code that is used for something special. For example, you use a code block to define the lines over which your program is looping or it makes up the definition of a function. In Python, you define code blocks by indenting them, not by using keywords like in VBA or curly braces like in most other languages. This is referred to as significant white space. The Python community has settled on four spaces as indentation, but you usually type them in by hitting the tab key: both Jupyter notebooks and VS Code will automatically convert your tab key into four spaces. Let me show you how code blocks are formally defined by using the if statement:
if
condition
:
pass
# Do nothing
The line preceding the code block always terminates with a colon. Since the end of the code block is defined by the end of the indentation, you need to use the pass
statement to create a dummy code block that does nothing. In VBA, this would correspond to this:
If
condition
Then
' Do nothing
End
If
Now that you know how to define code blocks, let’s start using them in the next section by introducing the if statement.
To introduce the if statement, let me get back to one of the introductory examples from Chapter 1:
In
[
102
]:
i
=
20
if
i
<
5
:
(
"i is smaller than 5"
)
elif
i
<=
10
:
(
"i is between 5 and 10"
)
else
:
(
"i is bigger than 10"
)
i is bigger than 10
Compared to VBA, the keywords are lowercase and instead of ElseIf
in VBA, Python uses elif
. If statements are an easy way to tell if a programmer is new to Python or if they have already adopted a Pythonic style: In Python, a simple if statement doesn’t require any parentheses around it. And to test if a value is True
, you don’t need to do that explicitly. Here is what I mean by that:
In
[
103
]:
is_important
=
True
if
is_important
:
(
"This is important."
)
else
:
(
"This is not important."
)
This is important.
The same works if you want to check if a sequence like a list is empty or not:
In
[
104
]:
values
=
[]
if
values
:
(
f
"The following values were provided: {values}"
)
else
:
(
"No values provided."
)
No values provided.
Programmers coming from other languages would often write something like if (is_important == True)
or if len(values) > 0
instead.
Conditional expressions, also called ternary operators, allow you to use a more compact style for simple if/else statements:
In
[
105
]:
is_important
=
False
(
"important"
)
if
is_important
else
(
"not important"
)
not important
Now that you know how to use if statements and conditional expressions, let’s turn our attention to for loops in the next section.
If you need to print the value of ten different variables, you are doing yourself a big favor by not copy/pasting the print statement ten times. Instead, use a for loop to do the work for you. For loops iterate over the items of a sequence like a list, a tuple or a string (remember that strings are sequences of characters). As an introductory example we tell Python to take each element of the currencies
list, assign it to the variable currency
and print it—one after another until there are no more elements in the list:
In
[
106
]:
currencies
=
[
"USD"
,
"HKD"
,
"AUD"
]
for
currency
in
currencies
:
(
currency
)
USD HKD AUD
As a side note, VBA’s For Each
statement is close to how Python’s for loop works. The previous example could be written like this in VBA:
Dim
currencies
(
3
)
As
String
Dim
curr
As
Variant
'currency is a reserved word in VBA
currencies
(
0
)
=
"USD"
currencies
(
1
)
=
"HKD"
currencies
(
2
)
=
"AUD"
For
Each
curr
In
currencies
Debug
.
curr
Next
In Python, if you need a counter variable in a for loop, you can use range
or enumerate
. Let’s first look at range
which provides a sequence of numbers: you call it by either providing a single stop
argument or by providing a start
and stop
argument, with an optional step
argument. Like with slicing, start
is inclusive, stop
is exclusive and step
determines the difference between the numbers with one being the default:
range
(
stop
)
range
(
start
,
stop
,
step
)
Since range
is lazy and only gives you back something when you request it, you need to convert it to a list to be able to see the generated sequence:
In
[
107
]:
list
(
range
(
5
))
Out
[
107
]:
[
0
,
1
,
2
,
3
,
4
]
In
[
108
]:
list
(
range
(
2
,
5
,
2
))
Out
[
108
]:
[
2
,
4
]
When you use range
in a for loop, you can use it directly though:
In
[
109
]:
for
i
in
range
(
3
):
(
i
)
0 1 2
If you need a counter variable while looping over a sequence, you can use enumerate
. It returns a sequence of (index, element)
tuples. By default, the index starts at zero and increments by one. You can use enumarate
in a loop like this:
In
[
110
]:
for
i
,
currency
in
enumerate
(
currencies
):
(
i
,
currency
)
0 USD 1 HKD 2 AUD
Looping over tuples and sets works the same as with lists. When you loop over dictionaries, it will loop over the keys:
In
[
111
]:
exchange_rates
=
{
"EURUSD"
:
1.1151
,
"GBPUSD"
:
1.2454
,
"AUDUSD"
:
0.6161
}
for
currency_pair
in
exchange_rates
:
(
currency_pair
)
EURUSD GBPUSD AUDUSD
By using the items
method, you can get the key and the value at the same time as tuple:
In
[
112
]:
for
currency_pair
,
exchange_rate
in
exchange_rates
.
items
():
(
currency_pair
,
exchange_rate
)
EURUSD 1.1151 GBPUSD 1.2454 AUDUSD 0.6161
To exit a loop, use the break
statement:
In
[
113
]:
for
i
in
range
(
15
):
if
i
==
2
:
break
else
:
(
i
)
0 1
You can skip the remainder of a loop with the continue
statement which means that execution continues with a new loop and the next element:
In
[
114
]:
for
i
in
range
(
4
):
if
i
==
2
:
continue
else
:
(
i
)
0 1 3
When comparing for loops in VBA with Python, there is a subtle difference: in VBA, the counter variable increases beyond your upper limit after finishing the loop:
For
i
=
1
To
3
Debug
.
i
Next
i
Debug
.
i
This prints:
1 2 3 4
In Python, it works probably more how you would expect:
In
[
115
]:
for
i
in
range
(
1
,
4
):
(
i
)
(
i
)
1 2 3 3
While for loops run a loop for all elements in a collection, you can use while loops to run a loop while a certain condition is true. Let’s see how they work!
After working through for loops, while loops should be straight forward. While loops are looping while a certain condition is True
:
In
[
116
]:
n
=
0
while
n
<=
2
:
(
n
)
n
+=
1
0 1 2
The break
and continue
statements work the same as with the for loop. Note that I have used the augmented assignment notation here: n += 1
. This is the same as if you would write n = n + 1
. You can do the same trick with all the other mathematical operators that I’ve introduced earlier on, for example for minus it would be -=
.
Quite often, you will need to collect certain elements in a list for further processing. In this case, Python offers an alternative to writing loops: list, dictionary and set comprehensions.
While list, dictionary and set comprehensions are technically a way to create the respective data structure, they often replace a for loop which is why I am introducing them here. Assume that in the following list of USD currency pairs, you’d like to pick out those currencies where USD is quoted as the second currency. You could write the following for loop:
In
[
117
]:
currency_pairs
=
[
"USDJPY"
,
"USDGBP"
,
"USDCHF"
,
"USDCAD"
,
"AUDUSD"
,
"NZDUSD"
]
In
[
118
]:
usd_quote
=
[]
for
pair
in
currency_pairs
:
if
pair
[
3
:]
==
"USD"
:
usd_quote
.
append
(
pair
[:
3
])
usd_quote
Out
[
118
]:
[
'AUD'
,
'NZD'
]
This is often easier to write with a list comprehension. A list comprehension is a concise way of creating a list. You can grab its syntax from this example which does the same as the previous for loop:
In
[
119
]:
[
pair
[:
3
]
for
pair
in
currency_pairs
if
pair
[
3
:]
==
"USD"
]
Out
[
119
]:
[
'AUD'
,
'NZD'
]
If you don’t have any condition to satisfy, simply leave the if part away. For example, to invert all the currency pairs so that the first currency comes second and vice versa, you would do:
In
[
120
]:
[
pair
[
3
:]
+
pair
[:
3
]
for
pair
in
currency_pairs
]
Out
[
120
]:
[
'JPYUSD'
,
'GBPUSD'
,
'CHFUSD'
,
'CADUSD'
,
'USDAUD'
,
'USDNZD'
]
With dictionaries, you can do dictionary comprehensions:
In
[
121
]:
exchange_rates
=
{
"EURUSD"
:
1.1151
,
"GBPUSD"
:
1.2454
,
"AUDUSD"
:
0.6161
}
{
k
:
v
*
100
for
(
k
,
v
)
in
exchange_rates
.
items
()}
Out
[
121
]:
{
'EURUSD'
:
111.50999999999999
,
'GBPUSD'
:
124.54
,
'AUDUSD'
:
61.61
}
And with sets you can do set comprehensions:
In
[
122
]:
{
s
+
"USD"
for
s
in
[
"EUR"
,
"GBP"
,
"EUR"
,
"HKD"
,
"HKD"
]}
Out
[
122
]:
{
'EURUSD'
,
'GBPUSD'
,
'HKDUSD'
}
At this point, you are already able to write a small script as you know most of the basics building blocks of Python. In the next section, you will learn how to organize your code to keep it maintainable when your scripts start to get bigger.
This section introduces various means to bring your code into a maintainable structure: first, I’ll introduce functions with all the details that you will commonly need. Next, we’ll see how you can split your codebase up into different Python modules and after that, you’ll learn a bit more about objects, specifically about classes that are used to define objects. To conclude this section, you will see modules and classes in action by working with datetime
objects.
Even if you will use Python for simple scripts only, you will still be using functions all the time: they are one of the most important constructs of every programming language and allow you to reuse the same lines of code from all over your program. I will start this section by showing you how to define and call functions. Then, I will explain how you can write functions in one line by using lambda expressions before wrapping this section up by giving you a quick introduction to function decorators, that allow you to change the behavior of a function.
To write your own function in Python, you have to use the keyword def
that stands for function definition. Unlike VBA, Python doesn’t differentiate between a function and a subroutine. In Python, the equivalent of a subroutine is simply a function that doesn’t return anything. Functions in Python follow the syntax for code blocks, i.e. you end the first line with a colon and indent the body of the function:
def
function_name
(
required_argument
,
optional_argument
=
default_value
,
...
):
return
value1
,
value2
,
...
Required arguments do not have a default value. Multiple arguments are separated by commas.
You make an argument optional by supplying a default value. None
is often used to make an argument optional if there is no meaningful default.
The return
keyword defines the value that the function returns. If you leave it away, the function automatically returns None
. As shown, Python conveniently allows you to return multiple values separated by commas.
To be able to play around with a function, let’s define one that is able to convert the temperature from Fahrenheit or Kelvin to degrees Celsius:
In
[
123
]:
def
convert_to_celsius
(
degrees
,
source
=
'fahrenheit'
):
if
source
.
lower
()
==
'fahrenheit'
:
return
(
degrees
-
32
)
*
5
/
9
elif
source
.
lower
()
==
'kelvin'
:
return
degrees
-
273.15
else
:
return
f
"Don't know how to convert from {source}"
I am using the string method lower
here which transforms the provided strings to all lowercase. This allows us to accept the source
string with any capitalization while the comparison will still work. Now that we have defined a function, let’s see how we can call it!
As briefly mentioned at the beginning of this chapter, you call a function by adding parentheses to the function name, enclosing the function arguments:
value1
,
value2
,
...
=
function_name
(
positional_arg
,
arg_name
=
value
,
...
)
If you provide a value as a positional argument (positional_arg
), the values are matched to the arguments according to their position in the function definition.
By providing the argument in the form arg_name=value
, you’re providing a keyword argument. This has the advantage that you can provide the arguments in any order. It is also more explicit to the reader and can make it easier to understand. For example, if the function definition is def f(a, b)
, you could call the function like this: f(b=1, a=2)
.
Let’s play around with the convert_to_celsius
function to see how this all works in practice:
In
[
124
]:
convert_to_celsius
(
100
,
'fahrenheit'
)
# Positional arguments
Out
[
124
]:
37.77777777777778
In
[
125
]:
convert_to_celsius
(
50
)
# Will use the default source (fahrenheit)
Out
[
125
]:
10.0
In
[
126
]:
convert_to_celsius
(
source
=
'kelvin'
,
degrees
=
0
)
# Keyword arguments
Out
[
126
]:
-
273.15
Sometimes, you need to define a simple function that you only use once. In this case, Python allows you to write a lambda expression which is the topic of the next section.
Python allows you to define a function in a single line via lambda expressions. Lambda expressions are anonymous functions which means that it is a function without a name. Consider this function:
def
function_name
(
arg1
,
arg2
,
...
):
return
return_value
This function can be rewritten as a lambda expression like this:
lambda
arg1
,
arg2
,
...
:
return_value
In essence, you replace def
with lambda
, leave away the return
keyword and the function name and put everything on one line. This can be useful at times as Python allows you to pass around functions as function arguments. If this sounds confusing, have a look at the following example:
In
[
127
]:
def
format_string
(
string
,
modifier
):
return
modifier
(
string
)
In
[
128
]:
def
greet
(
name
):
return
'hello '
+
name
In
[
129
]:
format_string
(
'world'
,
greet
)
Out
[
129
]:
'hello world'
In this sample, you submit greet
as the function argument so that the format_string
function knows how to format the string. Note that when you pass around a function, you do so by using the function name alone, without the parentheses that would cause the function to execute, i.e. greet
and not greet()
. You can rewrite the same without having to define an external function like so:
In
[
130
]:
format_string
(
'world'
,
lambda
name
:
'hello '
+
name
)
Out
[
130
]:
'hello world'
In
[
131
]:
format_string
(
'world'
,
lambda
name
:
(
'hello '
+
name
)
.
upper
())
Out
[
131
]:
'HELLO WORLD'
The second example defines a different function for the modifier so that the format_string
function returns the string in uppercase by making use of the upper
string method.
At this point, you know how to define functions, how to call them and how to write them in one line. The next topic wraps up the section about functions and describes how you can change the behavior of a function by using a decorator.
Function decorators don’t usually make it into introductory texts, but since we will use decorators when we get to user-defined functions (UDFs) in [Link to Come], I briefly want to show you how they work behind the scenes. You use a decorator in the following way:
@decorator_name
def
function_name
(
...
):
return
...
A decorator is a function name that you put on top of your function, starting with the @
sign. The decorator is a simple way to change the behavior of your function. The following example shows the definition of a decorator called verbose
that will print some text before and after the function is run. Technically, the decorator takes the function (print_hello
) and provides it as argument func
to the verbose
function. The inner function called wrapper
can then do whatever needs to be done, in this case, it prints a value before and after executing the print_hello
function. The name of the inner function doesn’t matter.
In
[
132
]:
# This is the definition of the function decorator
def
verbose
(
func
):
def
wrapper
():
(
"Before calling the function."
)
func
()
(
"After calling the function."
)
return
wrapper
In
[
133
]:
# Using a function decorator
@verbose
def
print_hello
():
(
"hello!"
)
In
[
134
]:
# Effect of calling the decorated function
print_hello
()
Before calling the function. hello! After calling the function.
You probably don’t need to write your own decorator anytime soon. Still, having an idea about what’s going on behind the scenes when using one, should allow you to decorate your functions without feeling like using dark magic.
Functions are a way of organizing your code to make it reusable and maintainable. If you write a lot of code and a lot of functions, you will additionally need to split it up into multiple files to keep the overview. The next section gives you an introduction to working with multiple source files.
When you write code for bigger projects, you will have to split it into different files to be able to bring it into a maintainable structure. As we have already seen in Chapter 2, Python files have the extension .py
and you usually refer to your main file as a script. If you now want your main script to access functionality from the other files, you need to import that functionality first. In that context, Python source files are called modules. To get a better feeling for how this works and what the different import options are, start by creating a new module, i.e. a file called temperature.py
in the same directory as you run the Jupyter notebook of this chapter, e.g. in C:Usersusernamepython-for-excelch03
. Alternatively, the file is also included in the companion repo. Use VS Code to edit the file as shown in Example 3-1.
TEMPERATURE_SCALES
=
(
'fahrenheit'
,
'kelvin'
,
'celsius'
)
def
convert_to_celsius
(
degrees
,
source
=
'fahrenheit'
):
if
source
.
lower
()
==
'fahrenheit'
:
return
(
degrees
-
32
)
*
5
/
9
elif
source
.
lower
()
==
'kelvin'
:
return
degrees
-
273.15
else
:
return
f
"Don't know how to convert from {source}"
(
"This is the temperature module."
)
In your Jupyter notebook, you can now import the temperature
module. To import, you only use the name of the module, without the .py
ending. After running the import statement, you will have access to all the objects in that Python module via the dot notation. For example, you can do temperature.convert_to_celsius()
to perform your conversion:
In
[
135
]:
import
temperature
This is the temperature module.
In
[
136
]:
temperature
.
TEMPERATURE_SCALES
Out
[
136
]:
(
'fahrenheit'
,
'kelvin'
,
'celsius'
)
In
[
137
]:
temperature
.
convert_to_celsius
(
120
,
'fahrenheit'
)
Out
[
137
]:
48.888888888888886
Note that I used uppercase letters for TEMPERATURE_SCALES
to express that it is a constant—I will say more about that a little later in this chapter. When running import temperature
, Python will run the temperature.py
file from top to bottom. You can easily see this happening since importing the module will fire the print function.
If you run the import temperature
cell again, you will notice that it does not print anything anymore. This is because Python modules are only imported once per session. If you change a module that you import, you need to restart your Python interpreter to pick up all the changes.
In reality, you usually don’t put print statements in modules. This was only to show you the effect of importing a module more than once. Most commonly you put functions and classes (see next section) in your modules. If you don’t want to type temperature
every time you use an object from the temperature module, you can change things like this:
In
[
138
]:
import
temperature
as
tp
In
[
139
]:
tp
.
TEMPERATURE_SCALES
Out
[
139
]:
(
'fahrenheit'
,
'kelvin'
,
'celsius'
)
Assigning a short alias tp
to your module can make it easier to use while it’s still always clear where an object comes from. Many third party packages are suggesting a specific convention when using an alias. For example, pandas is using import pandas as pd
. There is one more option you can use to import objects from another module:
In
[
140
]:
from
temperature
import
TEMPERATURE_SCALES
,
convert_to_celsius
In
[
141
]:
TEMPERATURE_SCALES
Out
[
141
]:
(
'fahrenheit'
,
'kelvin'
,
'celsius'
)
By using the from x import y
syntax, you import specific objects only. By doing this, you are importing them directly into the namespace of your main script: that is, without looking at the import statements, you won’t be able to tell whether the imported objects were defined in your current Python script or Jupyter notebook or if they come from another module. This could cause conflicts: if your main script has a function called convert_to_celsius
, it would override the one that you are importing from your temperature module. If, however, you use one of the two previous methods, your local function and the one from the imported module could live next to each other as convert_to_celsius
and temperature.convert_to_celsius
.
A common source for errors is to name your Python file the same as an existing Python package or module. If you create a file to test out some pandas functionality, don’t call that file pandas.py
as this will cause conflicts.
Knowing about functions and modules, you can organize your code in a meaningful way already. Object-oriented programming takes this a step further and allows you to group related functions and data into a single object. How this works is the topic of the next section.
In this section, I want to add a little more meat to the bone with regards to object-oriented programming (OOP). You’re still not going to be an OOP expert after reading this section, but you should have a much better understanding of what object methods and attributes are and how objects behave. So far, you know that everything in Python is an object and that they group related attributes and methods. In fact, we have made a few method calls in this chapter, for example, to append an object to a list or to make all characters of a string uppercase or lowercase.
What you don’t know yet is that you use classes to define new types of objects: a class behaves like a springform you use to bake a cake: depending on the ingredients you use, you get a different cake, for example, a chocolate cake or a cheesecake. The process of getting a cake (the object) out of the springform (the class) is called instantiation. Objects are therefore also called class instances. Whether chocolate or cheesecake, they are both a type of cake: with classes you can define new data types that help structure and organize your program. Let me now return to the car game example from the beginning of the chapter to make this a little less abstract. Let’s start by defining a Car
class:
In
[
142
]:
class
Car
:
def
__init__
(
self
,
color
,
speed
=
0
):
self
.
color
=
color
self
.
speed
=
speed
def
accelerate
(
self
,
mph
):
self
.
speed
+=
mph
This is a simple car class. Methods are functions that are part of a class definition. This class has one regular method called accelerate
. This method will change the data (speed) of an instance of this class. It also has a special method that starts and ends with double underscores called __init__
. It will be called automatically by Python when an object is initialized to attach some initial data to the object. The first argument of every method has to be self
which represents the instance of the class. This will get clearer when you see how you can use the Car
class. First, let’s instantiate a car for two players. You are doing this in the same way as you are calling a function: call the class by adding parentheses and by providing the arguments of the __init__
method. You never provide anything for self
, as Python will take care of that: in this sample, self
will be either car1
or car2
.
In
[
143
]:
car1
=
Car
(
'red'
)
car2
=
Car
(
color
=
'blue'
)
When you call a class, you are really calling the __init__
function which is why everything with regards to function arguments applies here, too: for car1
, the argument is provided as positional argument, while for car2
, I am using keyword arguments. After instantiating the two car objects from the Car
class, you can look at their attributes and call their methods. As you will see, after accelerating car1
, the speed of the car1
is changed, but it is unchanged for car2
as the two objects are independent:
In
[
144
]:
# By default it prints the memory location of that object
car1
Out
[
144
]:
<
__main__
.
Car
at
0x7f9b09429d50
>
In
[
145
]:
# Attributes give you access to the data of an object
car1
.
color
,
car1
.
speed
Out
[
145
]:
(
'red'
,
0
)
In
[
146
]:
# Calling the accelerate method on car_player_1
car1
.
accelerate
(
20
)
In
[
147
]:
# The attributes of car_player_1 changed
car1
.
speed
Out
[
147
]:
20
In
[
148
]:
# The attributes of car_player_2 remained the same
car2
.
speed
Out
[
148
]:
0
In Python, you can also change attributes directly without having to use methods:
In
[
149
]:
car1
.
color
=
'green'
In
[
150
]:
car1
.
color
Out
[
150
]:
'green'
To summarize: classes define the attributes and methods of objects. Classes allow you to group related functions (“methods”) and attributes together so that they can conveniently be accessed via dot notation: myobject.attribute
or myobject.method()
. Now that you have a better understanding of objects and know about importing modules, you are ready to learn about how to deal with date and time as this requires you to import the datetime
class.
Working with date and time is a common operation in Excel but it comes with limitations: for example, Excel’s cell format for time doesn’t support smaller units than milliseconds and time zones are not supported at all. In Excel, date and time are stored as a simple float called the date serial number. The Excel cell is then formatted to display it as date and/or time. For example, January 1st, 1900 has the date serial number of 1 which means that this is also the earliest date that you can work with in Excel. Time gets translated into the decimal part of the float, e.g. 01/01/1900 10:10:00
is represented by 1.4236111111
.
In Python, you work with the datetime
module which is part of the standard library. If you need to work with time-zone-aware time stamps, you can use the package dateutil
which isn’t part of the standard library but it comes pre-installed with Anaconda. Let’s go through a couple of examples to see how you work with datetime
objects in Python. Since both the Python module as well as the class contained therein are called datetime
, I will use the following import convention throughout this book: import datetime as dt
. The syntax to instantiate a datetime
object is as follows:
import
datetime
as
dt
dt
.
datetime
(
year
,
month
,
day
,
hour
,
minute
,
second
,
microsecond
,
timezone
)
Let’s first have a look at time-zone-naive datetime
objects:
In
[
151
]:
import
datetime
as
dt
In
[
152
]:
timestamp
=
dt
.
datetime
(
2020
,
1
,
31
,
14
,
30
)
timestamp
Out
[
152
]:
datetime
.
datetime
(
2020
,
1
,
31
,
14
,
30
)
In
[
153
]:
# Difference returns a timedelta object
timestamp
-
dt
.
datetime
(
2020
,
1
,
14
,
12
,
0
)
Out
[
153
]:
datetime
.
timedelta
(
days
=
17
,
seconds
=
9000
)
In
[
154
]:
# Accordingly, you can also work with timedeltas
timestamp
+
dt
.
timedelta
(
days
=
1
,
hours
=
4
,
minutes
=
11
)
Out
[
154
]:
datetime
.
datetime
(
2020
,
2
,
1
,
18
,
41
)
To format datetime
objects into strings, you can use the strftime
function and to parse a string and convert it into datetime
object, you can use the strptime
function. You can find an overview of the accepted format codes in the datetime docs.
In
[
155
]:
# Format a datetime object in a specific way
timestamp
.
strftime
(
"
%d
/
%m
/
%Y
%H
:
%M
"
)
Out
[
155
]:
'31/01/2020 14:30'
In
[
156
]:
# Parse a string into a datetime object
dt
.
datetime
.
strptime
(
'12.1.2020'
,
"
%d
.
%m
.
%Y
"
)
Out
[
156
]:
datetime
.
datetime
(
2020
,
1
,
12
,
0
,
0
)
Let’s now turn our attention to time zones. Most of the time, you want to work in the UTC time zone and only transform to local time zones for display purposes. UTC stands for Coordinated Universal Time and is the successor of Greenwich Mean Time (GMT). When working with Excel and Python, a common job may be to turn naive time stamps, as delivered by Excel, into time-zone-aware datetime
objects. As an example, if you get intraday stock prices in Excel based on local timestamps from stock exchanges across the globe, you will want to assign them their proper time zone before comparing them. The following samples show a few common operations when working with datetime
objects and time zones:
In
[
157
]:
from
dateutil
import
tz
In
[
158
]:
# Naive datetime
timestamp
=
dt
.
datetime
(
2020
,
1
,
31
,
14
,
30
)
# Printing in isoformat makes it easy to
# see the offset from UTC
timestamp
.
isoformat
()
Out
[
158
]:
'2020-01-31T14:30:00'
In
[
159
]:
# Time-zone-aware datetime object
timestamp_eastern
=
dt
.
datetime
(
2020
,
1
,
31
,
14
,
30
,
tzinfo
=
tz
.
gettz
(
"US/Eastern"
))
timestamp_eastern
.
isoformat
()
Out
[
159
]:
'2020-01-31T14:30:00-05:00'
In
[
160
]:
# Assign a time zone to a naive datetime object
timestamp_eastern
=
timestamp
.
replace
(
tzinfo
=
tz
.
gettz
(
"US/Eastern"
))
timestamp_eastern
.
isoformat
()
Out
[
160
]:
'2020-01-31T14:30:00-05:00'
In
[
161
]:
# Convert from one time zone to another time zone.
# Since the UTC time zone is so common,
# there is a shortcut: tz.UTC
timestamp_utc
=
timestamp_eastern
.
astimezone
(
tz
.
UTC
)
timestamp_utc
.
isoformat
()
Out
[
161
]:
'2020-01-31T19:30:00+00:00'
In
[
162
]:
# From time-zone-aware to naive
timestamp_eastern
.
replace
(
tzinfo
=
None
)
Out
[
162
]:
datetime
.
datetime
(
2020
,
1
,
31
,
14
,
30
)
In
[
163
]:
# Current time without time zone
dt
.
datetime
.
now
()
Out
[
163
]:
datetime
.
datetime
(
2020
,
8
,
24
,
12
,
37
,
22
,
917120
)
In
[
164
]:
# Current time in UTC time zone
dt
.
datetime
.
now
(
tz
.
UTC
)
Out
[
164
]:
datetime
.
datetime
(
2020
,
8
,
24
,
10
,
37
,
22
,
921704
,
tzinfo
=
tzutc
())
You now have a solid understanding of how to write scripts, organize your code and use functionality from the standard library like the datetime
module. What I haven’t shown you yet is how to deal with errors and files, the topic of the next section.
When an error happens in your Python script, the script will exit and print the error along with the last few lines of code that the program executed, just before the error happened. This is called the stacktrace or—using Python’s idiomatic terminology—the traceback. Often, you know which errors can happen and are therefore able to handle them gracefully. This is what the first part of this section is about. In the second part, we’re going to look at the with
statement that allows us to properly handle resources like files—even if an exception occurs during the operation.
I was mentioning exception handling in Chapter 1 as an example of where VBA with its go to mechanism has fallen behind. Here, I show you how Python uses the try/except mechanism to handle errors in your programs. Whenever something is outside of your control, there is a potential for errors happening. For example, the email server may be down when you try to send an email or a file may be missing that your program expects. Dealing with user input is another area where you have to prepare for inputs that don’t make sense. As usual, let’s get some practice: if the following function is called with a zero, you will get a ZeroDivisionError
:
In
[
165
]:
def
print_output
(
number
):
output
=
10
/
number
(
f
'Output: {output}'
)
In
[
166
]:
print_output
(
0
)
--------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) <ipython-input-166-74d723c5398e> in <module> ----> 1 print_output(0) <ipython-input-165-63114ef61b20> in print_output(number) 1 def print_output(number): ----> 2 output = 10 / number 3 print(f'Output: {output}') ZeroDivisionError: division by zero
To let your program react gracefully to such errors, you can use the try/except statements:
In
[
167
]:
def
print_output
(
number
):
try
:
output
=
10
/
number
(
f
'Output: {output}'
)
except
Exception
as
e
:
(
'There was an error: '
+
repr
(
e
))
else
:
(
'No exception was raised!'
)
finally
:
(
'Cleaning up!'
)
Whenever an error occurs in the try
block, code execution moves on to the except
block where you can handle the error. The as e
makes the exception object available as a variable e
. Getting the actual error string out of it is best done via repr(e)
. repr
stands for printable representation of an object and—in the case of exception objects—gives you back a string with the error message. Handling an error like this allows you to give the user helpful feedback or write the error to a log file. The else
clause runs if there was no error raised during the try
block and the finally
block runs always, whether or not an error was raised. Often, you will get away with just the try
and except
blocks. The following examples show how the program reacts given different inputs:
In
[
168
]:
print_output
(
1
)
Output: 10.0 No exception was raised! Cleaning up!
In
[
169
]:
print_output
(
'a'
)
There was an error: TypeError("unsupported operand type(s) for /: 'int' and 'str'") Cleaning up!
In
[
170
]:
print_output
(
0
)
There was an error: ZeroDivisionError('division by zero') Cleaning up!
The way that I have used the except statement means that any exception that happens in the try
block will cause the code execution to continue in the except
block. Usually, that is not what you want. You want to check for an error as specific as possible and handle only those you expect. Your program may otherwise fail for something completely unexpected which makes it hard to debug. To fix this, rewrite the function as follows, checking explicitly for the two errors that we expect:
In
[
171
]:
def
print_output
(
number
):
try
:
output
=
10
/
number
(
'Output: '
+
str
(
output
))
except
(
TypeError
,
ZeroDivisionError
)
as
e
:
(
'Please type in any number except 0.'
)
Let’s run the code again:
In
[
172
]:
print_output
(
'a'
)
Please type in any number except 0.
If you want to handle an error differently depending on which exception occurs, you can handle each exception separately:
In
[
173
]:
def
print_output
(
number
):
try
:
output
=
10
/
number
(
'Output: '
+
str
(
output
))
except
TypeError
as
e
:
(
'Please type in a number.'
)
except
ZeroDivisionError
as
e
:
(
"You can't use 0."
)
The next section is about exception handling in the context of managing resources such as files. For that case, Python offers the with statement that allows you to write simpler code.
This section shows you how to use the with
statement by using it with files. First of all, the with
statement in Python doesn’t have anything to do with the With
statement in VBA. In VBA, it is used to run a series of statements on the same object, while in Python, it is used to manage resources like files or database connections. If you want to load the latest sales data to be able to analyze it, you may have to open a file or establish a connection to a database. After you’re done reading the data, it’s best practice to close the file or connection as soon as possible again. Otherwise, you may run into situations where you can’t open another file or can’t establish another connection to the database: file handlers and database connections are limited resources. Open and closing a file manually works like this (w
stands for opening the file in write mode which replaces the file if it already exists):
In
[
174
]:
f
=
open
(
'output.txt'
,
'w'
)
f
.
write
(
'Some text'
)
f
.
close
()
Running this code will create a file called output.txt
in the same directory as the notebook you are running this from and write “some text” into it. To read a file, you would use the r
mode instead of w
and to append to the end of the file, use a
. Since files can also be manipulated from outside of your program, such an operation could fail. Knowing about exception handling from the last section, you could come up with a solution that uses try/except together with the finally
block where you make sure that the file is closed again—whether or not an exception happened. This works, but since this is such a common operation, Python is providing the with
statement to make things easier:
In
[
175
]:
with
open
(
'output.txt'
,
'w'
)
as
f
:
f
.
write
(
'Some text'
)
When code execution leaves the body of the with
statement, the file is automatically closed, whether or not there was an exception happening. Objects that support the with
statement are called context managers.
At this point, I have covered all basic language concepts that I wanted to introduce in this chapter. What’s left is learning some basics about code formatting. While you could use the same conventions with Python that you may have adopted for your VBA code, it would make it hard to work together with other Python programmers. Therefore, the next section introduces you to the rules that the Python community has adopted.
You may have been wondering why I was sometimes using names with underscores or in all caps. This section will give you a few rules to make your code follow Python’s official style guide. Python uses so-called Python Enhancement Proposals (PEPs) to discuss the introduction of new language features. One of these, the Style Guide for Python Code is usually referred to by its number: PEP 8. PEP 8 is a set of style recommendations for the Python community: if everybody, who works on the same code, adheres to the same style guide, the code becomes much more readable. This becomes especially useful in the world of open-source where many programmers, who often don’t know each other personally, work on the same project. Example 3-2 shows a short Python file to introduce the most important conventions.
"""This script shows a few PEP 8 rules.
"""
import
datetime
as
dt
from
dateutil
import
tz
TEMPERATURE_SCALES
=
(
'
fahrenheit
'
,
'
kelvin
'
,
'
celsius
'
)
class
DummyClass
:
pass
# doesn't do anything at the moment
def
convert_to_celsius
(
degrees
,
source
=
'
fahrenheit
'
)
:
"""This function converts degrees Fahrenheit or Kelvin into degrees Celsius.
"""
if
source
.
lower
(
)
==
'
fahrenheit
'
:
return
(
degrees
-
32
)
*
5
/
9
elif
source
.
lower
(
)
==
'
kelvin
'
:
return
degrees
-
273.15
else
:
return
f
"
Don
'
t know how to convert from {source}
"
celsius
=
convert_to_celsius
(
44
,
source
=
'
fahrenheit
'
)
non_celcius_scales
=
TEMPERATURE_SCALES
[
:
-
1
]
(
'
Current time:
'
+
dt
.
datetime
.
now
(
tz
.
UTC
)
.
isoformat
(
)
)
(
f
'
The temperature in Celsius is: {celsius}
'
)
Explain what the script/module does with a docstring at the top.
All imports are at the top of the file, one per line. List the imports of the standard library first, then those of third-party packages and finally those from your own modules. This sample doesn’t import an own module.
Use capital letters with underscores for constants. Use a maximum line length of 79 characters. If possible, take advantage of parentheses, square brackets or curly braces for implicit line breaks.
Separate classes and functions with 2 empty lines from the rest of the code.
Classes should use CapitalizedWords
as names.
Inline comments should be separated by at least 2 spaces from the code. Code blocks should be indented by 4 spaces.
Functions and function arguments should use lowercase names with underscores if they improve readability. Don’t use spaces between the argument name and its default value.
A function’s docstring should also list and explain the function arguments. This is not done here to keep the sample short.
Don’t use spaces around the colon.
Use spaces around mathematical operators. If operators with different priorities are used, use spaces around those with the lowest priority.
Use lowercase names for variables with underscores if they improve readability. When assigning a variable name, use spaces around the equal sign. However, when calling a function, don’t use spaces around the equal sign used with keyword arguments.
With indexing and slicing, don’t use spaces around the square brackets.
This is a very simplified summary of PEP 8 so it’s a good idea to have a look at the original PEP 8 once you start to get more serious with Python. PEP 8 clearly states that it is a recommendation and that your own style guides will have preference. After all, consistency is the most important factor. If you are interested in other publicly available guidelines, you may want to have a look at Google’s style guide for Python which is reasonably close to PEP 8. In practice, most Python programmers loosely adhere to PEP 8. Ignoring the maximum line length of 79 characters is probably the most prominent violation of PEP 8.
Since it might be difficult to think about your algorithm and formatting them properly at the same time, you can have your style checked automatically. The next section shows you how this works with VS Code.
When working with VS Code, there is an easy way to make sure your code sticks to PEP 8: you can use a linter. A linter checks your source code for syntax and style errors. Fire up the command palette and search for Python: Select Linter
. A popular option is flake8
. If enabled, VS Code will underline issues with squiggly lines every time you save your file. Hovering over such a squiggly line will give you an explanation in a tooltip. You can switch the linter off again by searching for Python: Enable Linting
in the command palette and choosing no
. If you prefer, you can also run flake8
on the command line to have a report printed:
$ flake8 sample.py
Python has recently taken static code analysis a step further by adding support for type hints. The next section explains how they work.
In VBA, you often see code that prefixes each variable with an abbreviation for the data type, like strEmployeeName
or wbWorkbookName
. While nobody will stop you from doing this in Python, it isn’t commonly done. You also won’t find an equivalent to VBA’s Option Explicit
or Dim
statement to declare the type of a variable. Instead, Python 3.5 introduced a feature called type hints. Type hints are also referred to as type annotations and allow you to declare the data type of a variable. They are completely optional and have no effect on how the code is run by the Python interpreter (there are, however, third-party packages like pydantic
that can enforce type hints at runtime). The main purpose of type hints is to allow text editors like VS Code to catch more errors before running the code. Type hints can also improve code autocompletion of VS Code or other editors. The most popular type checker for type annotated code is mypy
which VS Code allows you to select as a linter. To get a feeling of how type annotations work in Python, here is a short sample without type hints:
x
=
1
def
hello
(
name
):
return
f
"Hello {name}!"
And again with type hints:
x
:
int
=
1
def
hello
(
name
:
str
)
->
str
:
return
f
"Hello {name}!"
As type hints make generally more sense in bigger code bases, I am not going to use them in the remainder of this book.
This chapter was an intense introduction to Python. We met all the important building blocks of the language including data structures, functions, classes and modules. We also touched on some of Python’s particularities like meaningful white space, mutable vs. immutable objects and code formatting guidelines, better known as PEP 8. To continue with this book, you won’t need to know all the details: as a beginner, just knowing about lists and dictionaries, indexing and slicing as well as how to work with functions, modules, for loops and if statements will get you very far already.
Compared to VBA, I find Python more consistent and powerful but at the same time easier to learn. If you are a VBA die-hard fan and this chapter didn’t convince you just yet, I am pretty sure the next one will: there, I will give you an introduction to array-based calculations before starting our data analysis journey with the pandas library. pandas’ core data structure is called DataFrame and can act as a replacement for Excel or as an interface between Python and Excel.
3.145.50.83