Imagine trying to communicate without words.
All programs process data, and one of the most important types of data is text. In this chapter, you’ll work through the basics of textual data. You’ll automatically search text and get back exactly what you’re looking for. Along the way, you’ll pick up key programming concepts such as methods and how you can use them to bend your data to your will. And finally, you’ll instantly power up your programs with the help of library code.
Starbuzz Coffee has made a name for itself as the fastest growing coffee shop around. If you’ve seen one on your local corner, look across the street; you’ll see another one.
The Starbuzz CEO is always on the lookout for ways to boost profits, and he’s come up with a great idea. He wants a program that will show him the current price of coffee beans so that his buyers can make informed decisions about when to buy.
The previous programmer has already made a head start on the code, and we can use this as a basis. Here’s the existing Python code, but what does it do?
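The original listing is not reproduced here. As a rough sketch of what such a program might look like, assuming Python 3 and the urllib.request library named in this chapter's summary, the function below downloads a web page and returns its contents as a string. The function name fetch_page and the web address in the comment are hypothetical placeholders, not the ones from the Starbuzz code.

```python
import urllib.request

def fetch_page(url):
    """Download the page at url and return its contents as a string."""
    with urllib.request.urlopen(url) as page:
        # read() returns raw bytes; decode() turns them into a string
        return page.read().decode("utf8")

# Hypothetical usage (the address is a placeholder, not the real one):
# text = fetch_page("http://example.com/prices.html")
# print(text)
```

Printing the returned string would show the whole page, HTML tags included, which is why the price ends up buried in the middle of the output.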
Take a closer look at the results of the program. The current price of beans is right in the middle of the output:
The Starbuzz CEO would find it a lot easier if you could extract the price of beans and just display that, rather than have to look for it in the HTML. But how do you do that?
The output of the Starbuzz program is an example of a string. In other words, it’s a series of characters like this:
Somewhere within the string is the price of coffee beans. To retrieve just the price, all you need to do is go to the right bit of the string, retrieve the characters that give the price, and display just those characters. But how?
The computer keeps track of individual characters by using two pieces of information: the start of the string and the offset of an individual character. The offset is how far the individual character is from the start of the string.
The first character in a string has an offset of 0, because it is zero characters from the start. The second character has an offset of 1, and so on:
The offset value is always 1 less than the position. Python lets you read a single character from a string by providing the offset value in square brackets after the variable name. Because the offset value is used to find a character, it is called the index of the character:
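A small illustration of reading single characters by index, using a made-up string rather than the Starbuzz web page:

```python
text = "coffee"

# Offsets start at 0, so index 0 is the first character.
first = text[0]
# Index 3 is the fourth character: the offset is one less than the position.
fourth = text[3]

print(first)   # c
print(fourth)  # f
```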
For Starbuzz, you don’t just need a single character. You need to extract the price from the string of HTML, and the price is made up of several characters.
You need to extract a smaller substring from a bigger string. A substring is a sequence of characters contained within another string. Specifying substrings in Python is a little like reading single characters from a string, except that you provide two index values within the square brackets:
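Here is a minimal sketch of a substring, assuming a short made-up piece of HTML instead of the real page. The first index is where the substring starts; the second is where it stops, and the character at that second index is not included.

```python
html = "<strong>$5.49</strong>"

# "<strong>" fills indexes 0 to 7 and "$" sits at index 8,
# so the four price characters are at indexes 9, 10, 11, and 12.
price = html[9:13]

print(price)  # 5.49
```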
The CEO just got great news from the beans supplier.
The supplier actually maintains two prices: one for regular customers and one for loyalty program customers. The different prices are published on different web pages:
That means you need to change the web page address in the code:
Let’s run it to make sure everything works OK.
The web page for loyalty customers is much more dynamic than the old web page. The page for regular customers always displays the price in a substring beginning at index 234. That’s not true for the loyalty program web page. The price on that page can be almost anywhere. All you know for sure is that the price follows the substring >$:
You need to search for the price string.
You already know how to find a substring, so you could run through the entire web page and check each two characters to see if they match >$, like this:
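The original listing is missing here, so what follows is only one possible way to write such a search by hand. The function name find_marker and the sample string are invented for illustration.

```python
def find_marker(text, marker=">$"):
    """Return the index where marker starts in text, or -1 if it is absent."""
    position = 0
    # Stop early enough that a full slice of the marker's length always fits.
    while position <= len(text) - len(marker):
        if text[position:position + len(marker)] == marker:
            return position
        position = position + 1
    # The loop ran off the end of the string without finding a match.
    return -1

sample = "<strong>$5.49</strong>"
where = find_marker(sample)
print(where)  # 7
```

Even this short version has to track the current position, the end of the string, and the not-found case.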
You could do it this way... but should you?
There’s a lot to worry about. Which two characters are you currently comparing? Where in the string are you right now? What if “>$” isn’t found? Searching for substrings in strings is a little more complex than it first appears...
But if you don’t want to write code to search the string, what else could you do?
The more code you write, the more you will find yourself doing the same kinds of things to the data in your variables. To save you from writing the same code over and over, programming languages provide built-in functionality. Python data is smart: it can do things.
Let’s look at an example.
Imagine you have a piece of text in a variable that you want to display in uppercase (all CAPITAL letters):
msg = "Monster truck rally. 4pm. Monday."
You could write code that read through each character in the string and printed out the matching uppercase letter. But if you’re programming in a language like Python, you can do this:
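For instance, with the same msg string, a call to upper() gives back the uppercase text. The original string is left unchanged, because the method returns a new string.

```python
msg = "Monster truck rally. 4pm. Monday."

# upper() returns a new string with every letter converted to uppercase.
shouted = msg.upper()

print(shouted)  # MONSTER TRUCK RALLY. 4PM. MONDAY.
print(msg)      # Monster truck rally. 4pm. Monday.
```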
But what does msg.upper() mean?
Well, msg is the string containing our piece of text. The .upper() that follows it is called a string method. A method is just an instruction for the string. When you call msg.upper(), you are telling the string to give you an UPPERCASE version of its data.
But is there a string method that can help you search for a substring within a string object?
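Python strings do have a find() method, which returns the index where a substring first appears, or -1 if it does not appear at all. The sketch below uses a made-up piece of HTML and assumes the price is always four characters long; both are assumptions for illustration, not details of the real Beans’R’Us page.

```python
html = "<p>Current price: <strong>$5.49</strong></p>"

# find() returns the index of the ">" in ">$", or -1 if ">$" is missing.
where = html.find(">$")

if where == -1:
    price = None
else:
    start = where + 2   # skip over the two characters ">$"
    end = start + 4     # assumption: the price is four characters long
    price = html[start:end]

print(price)  # 5.49
```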
The new version of the program works, but now there’s a design issue.
The Starbuzz CEO wants to know when the price of the beans falls below $4.74. The program needs to keep checking the Beans’R’Us website until that happens. It’s time to restructure the program to add in this new feature.
Let’s add a loop to the program that stops when the price of coffee is right.
The program crashed because it tried to compare a string with a number, which is something that doesn’t make a lot of sense to a computer program. When a piece of data is classified as a string or a number, this refers to more than just the contents of the variable. We are also referring to its datatype. If two pieces of data have incompatible types, such as a string and a number, Python can’t compare them with an operator like >.
Think back to the previous chapter. You’ve seen this problem before, back when you were working on the guessing game program:
In the guessing-game program, you needed to convert the user’s guess into an integer (a whole number) by using the int() function. But coffee bean prices aren’t whole numbers, because they contain numbers after a decimal point. They are floating point numbers or floats, and to convert a string to a float, you need to use a function other than int(). You need to use float():
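A short sketch of the problem and the fix, assuming Python 3 and a made-up price string:

```python
price_text = "5.49"

# Comparing the string directly with a number raises a TypeError in Python 3.
try:
    price_text > 4.74
except TypeError:
    print("Cannot compare a string with a number")

# float() converts the string into a floating point number.
price = float(price_text)

if price > 4.74:
    print("Too expensive, keep waiting")
```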
That sounds weird. What happened?
It looks like there’s a problem with the program. It’s sending so many requests that it overwhelmed the Beans’R’Us website. So why did that happen? Let’s look at the code and see:
If the value of price isn’t low enough (if it’s more than 4.74), the program goes back to the top of the loop immediately and sends another request.
With the code written this way, the program will generate thousands of requests per hour. Multiply that by all the Starbuzz outlets around the world, and you can start to see the scale of the problem:
You need to delay the pricing requests. But how?
Just when you’re feeling completely lost, you get a phone call from the Starbuzz coder who wrote the original version of the program:
It seems that she can’t get back because of a storm in the mountains. But she does make a suggestion. You need to regulate how often you make a request of the Beans’R’Us web server. One way to do this is to use the time library. This will apparently make it possible to send requests every 15 minutes or so, which should help to lighten the load.
There’s just one thing: what’s a library?
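A library is a collection of pre-built code that your program can import and use. As a minimal sketch of the suggestion, the code below uses the sleep() function from Python's time library to pause between price checks. The function name wait_for_price, its parameters, and the made-up prices are all assumptions for illustration; a real program would fetch the price from the web page instead.

```python
import time

def wait_for_price(get_price, target=4.74, delay_seconds=900, sleep=time.sleep):
    """Call get_price() repeatedly, pausing between calls, until the
    price is no longer above target. Returns the final price."""
    price = get_price()
    while price > target:
        sleep(delay_seconds)  # time.sleep() takes a number of seconds
        price = get_price()
    return price

# Demonstration with made-up prices and a tiny delay so it finishes quickly.
fake_prices = [5.51, 5.02, 4.60]

def next_fake_price():
    return fake_prices.pop(0)

result = wait_for_price(next_fake_price, delay_seconds=0.01)
print(result)  # 4.6
```

The default delay of 900 seconds corresponds to the 15 minutes mentioned above.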
Starbuzz Coffee is off the blacklist, because their price-checking programs no longer kill the Beans’R’Us web server. The nice people at Webland Security have, rather quietly, gone away.
Coffee beans get ordered when the price is right!
You’ve got Chapter 2 under your belt. Let’s look back at what you’ve learned in this chapter:
* Strings are sequences of individual characters.
* Individual string characters are referenced by index.
* Index values are offsets that start from zero.
* Methods provide variables with built-in functionality.
* Programming libraries provide a collection of related pre-built code and functions.
* As well as having a value, data in variables also have a “data type.”
* Number is a data type.
* String is a data type.
* s[4] - access the 5th character of the variable “s”, which is a string
* s[6:12] - access a substring within the string “s” (from index 6 up to, but not including, index 12)
* s.find() method for searching strings
* s.upper() method for converting strings to UPPERCASE
* float() converts strings to decimal point numbers known as “floats”
* + addition operator
* > greater than operator
* urllib.request library for talking to the Web
* time library for working with dates/time