Chapter 5. Hashes and Databases: Putting data in its place


Arrays aren’t the only show in town when it comes to data.

Programming languages come with other data-arranging goodies too, and our chosen tool, Python, is no exception. In this chapter, you’ll associate values with names using a data structure commonly called the hash (better known as dictionary to Python-folk). And when it comes to working with stored data, you’ll read data from an external database system as well as from regular text-based files. All the world’s awash with data, so turn the page and start applying your ever-expanding programming skills to some cool data-processing tasks.

Who won the surfing contest?

In the previous chapter, you worked out the top three scores, but they’re not much use without the names of the surfers who achieved those scores. There will be no surfing for you until you’ve finished the program.

Here’s the code so far:

scores = []
result_f = open("results.txt")
for line in result_f:
    (name, score) = line.split()
    scores.append(float(score))
result_f.close()
scores.sort()
scores.reverse()
print("The top scores were:")
print(scores[0])
print(scores[1])
print(scores[2])

Load this!

Don’t forget to download results.txt from the Head First Programming website before continuing.

Associate the name with the score

Using two arrays just won’t cut it. You need some other data structure that holds your data in such a way that the association between each surfer’s name and score is maintained.

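To see why two arrays fall short, here is a minimal sketch. The surfer names and scores are invented for illustration and are not taken from results.txt. Sorting the scores array rearranges only that array, so the names no longer line up with the scores.

```python
# Two parallel arrays: names[i] is meant to go with scores[i].
# (Sample data invented for this sketch.)
names = ["Johnny", "Juan", "Joseph"]
scores = [8.65, 9.12, 8.45]

# Sorting rearranges the scores array only...
scores.sort()
scores.reverse()

# ...so position 0 in each array no longer describes the same surfer.
# 9.12 was Juan's score, but names[0] is still "Johnny".
print(scores[0], names[0])
```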

You need a different data structure. But which one?

The Scholar’s Corner


Data Structure: A standard method of organizing a collection of data items in your computer’s memory. You’ve already met one of the classic data structures: the array.


Note

Known in the Python world as a “dictionary.”

Use a hash

You need a data structure that maintains the association between each surfer’s score and name, which is exactly what a hash gives you. With lots of surfers and lots of scores, keeping those two pieces of information paired up is what matters.

Let’s take a look at how hashes work.

Geek Bits

Hashes go by different names in different programming languages: mapping, dictionary, associative array, and key-value list, to name a few. In this book, we’ll stick to using the name hash.

Note

This cuts down on the amount of typing and saves our poor fingers!

Associate a key with a value using a hash

Start with an empty hash:


You add data to an existing hash by describing the association between the key and the value. Here’s how to associate a surfer’s name with their score:

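Both steps, creating the empty hash and then adding an association, might look something like this. It is only a sketch: the surfer name and score are made up, and the score is used as the key (with the name as the value) to match the way this chapter sorts the data later on.

```python
# Start with an empty hash.
scores = {}

# Associate a key (the score) with a value (the surfer's name).
# The name and score are invented for this sketch.
scores[8.45] = 'Joseph'

# Look up the value by its key.
print(scores[8.45])
```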

Iterate hash data with for

Let’s add some additional rows of data to your hash:


Once you have a hash created, you can use the trusty for loop, together with the hash’s keys() method, to iterate over each of the rows:

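As a rough illustration, adding a few more rows and then looping over the keys could be written as follows. The surfers and scores are sample values, not the contents of results.txt.

```python
# Build up the hash one row at a time (sample data for this sketch).
scores = {}
scores[8.45] = 'Joseph'
scores[9.12] = 'Juan'
scores[7.21] = 'Zack'

# keys() hands the for loop each key in turn;
# use the key to look up its associated value.
for each_score in scores.keys():
    print(scores[each_score], 'had a score of', each_score)
```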

Another hash method, called items(), returns each key-value pair in turn, and can be used with the for loop, too:

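A sketch of the items() version, again with invented sample data, shows that each trip around the loop receives the key and the value together, so there is no need for a separate look-up:

```python
# Sample data for this sketch: scores are keys, names are values.
scores = {8.45: 'Joseph', 9.12: 'Juan', 7.21: 'Zack'}

# items() hands the for loop a (key, value) pair on each iteration.
for each_score, surfer in scores.items():
    print(surfer, 'had a score of', each_score)
```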

Which of the two methods you use to iterate over your hash’s data is up to you: items() and keys() give you access to the same data, so either loop can produce the same output.

The data isn’t sorted

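Your program now associates surfers and their scores, but it displays the data from the hash in no useful order: older versions of Python return hash entries in an arbitrary order, and Python 3.7 and later return them in the order they were added. Either way, you need to somehow sort the data in the hash to find out who actually won the contest.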


Python hashes don’t have a sort() method...

In Python, hashes are optimized for speedy insertions and even speedier look-ups (searches). As a consequence, the good folks that created Python were less interested in providing a method to sort a hash, so they didn’t bother.

Note

You’ll find similar design and implementation decisions in lots of different programming languages. People are different... and so are the programming languages they create.

...but there is a function called sorted()

Obviously, there’s a need to sort a hash, so the good folks that created Python provided a really smart built-in function that can sort most data structures (anything you can loop over whose items can be compared), and they called it sorted(). Here’s how to use the sorted() function with your hash:

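One way to use sorted() with the hash is sketched below, using made-up sample data. Passing reverse=True asks for the highest score first; the ranking list is a helper introduced just for this sketch.

```python
# Sample data for this sketch: scores are keys, names are values.
scores = {8.45: 'Joseph', 9.12: 'Juan', 7.21: 'Zack'}

# sorted() returns a new, ordered list of the keys; the hash itself is unchanged.
ranking = []
for each_score in sorted(scores.keys(), reverse=True):
    ranking.append((each_score, scores[each_score]))
    print('Surfer', scores[each_score], 'scored', each_score)
```

One thing to keep in mind with this layout: because the score is the key, two surfers with exactly the same score would overwrite each other in the hash.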

That’s one small change to one line at the bottom of your program. So, let’s go ahead and make that change. Now that you are sorting the keys of the hash (which represent the surfers’ scores), it should be clear why the scores were used as the key when adding data into the hash: you need to sort the scores, not the surfer names, so the scores need to be on the left side of the hash (because that’s what the built-in sorted() function works with).

Do this!

Make the change to your code to use sorted().

When data gets complex

Hot on the heels of your success with the local surfing club, you’ve just been contacted by the Regional Surfing Association (RSA) and they want you to write a new program for them! RSA’s offering a brand-new, state-of-the-art, epoxy resin surfboard as payment... once the program’s working to their satisfaction, of course.


You’ve been wanting to try out an epoxy board for ages. The trouble is, they’re sooooo expensive and you’re a poor surfer. The waves will have to wait (yet again). But the thought of surfing an epoxy board... now that’s worth waiting for.

So what does RSA’s data look like?


Return a data structure from a function

Processing one line of surfer data was pretty straightforward. But now you have to work with all the lines of data in the file. Your program has to make the data available quickly so that a request to display the details of a particular surfer can be performed as soon as possible.

You already know enough to write a function that takes the surfer ID as a parameter, searches the file one line at a time for a matching ID, and then returns the found data to the caller:


There are really only two choices for how you return data from this function. Pass back the surfer’s data either:

  • As a string

or

  • As a hash

But which? Returning a string requires the calling code to further process the data to extract the information it needs, which (although possible) gets messy, because the calling code is then required to cut up the string using split(). This is something best left to the function, because it hides the complexity of manipulating the data from the calling code. Returning a hash allows the calling code to simply pick out the information it needs without too much fuss and without any further processing.

Return a hash from the function to keep the calling code simple.
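Here is a minimal sketch of such a function. Several details are assumptions made for illustration: the file layout (one surfer per line, six comma-separated fields), the field names, the sample data, and the extra filename parameter, which is there only so the sketch can run against a small file it creates itself.

```python
import os
import tempfile

def find_details(id2find, filename):
    """Search the file one line at a time; return a hash for the
    matching surfer ID, or an empty hash if the ID is not found."""
    with open(filename) as surfers_f:
        for each_line in surfers_f:
            s = {}
            # Assumed layout: id,name,country,average,board,age
            (s['id'], s['name'], s['country'],
             s['average'], s['board'], s['age']) = each_line.strip().split(",")
            if id2find == int(s['id']):
                return s
    return {}

# Create a tiny sample file (invented data) so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "surfing_data.csv")
with open(sample_path, "w") as sample_f:
    sample_f.write("101,Johnny Jones,USA,8.32,Fish,21\n")
    sample_f.write("202,Juan Martino,Spain,9.01,Gun,36\n")

# The calling code just picks out the pieces it needs from the hash.
surfer = find_details(202, sample_path)
if surfer:
    print("Name:", surfer['name'])
    print("Board:", surfer['board'])
```

Notice that all of the split() work stays inside the function. Every value in the returned hash is still a string, so the calling code would need to convert fields such as the average or age before doing any arithmetic with them.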

Do this!

Type this tester code, together with your function, into IDLE. Be sure to put the function near the top of your file, so that this tester code appears after it.

Here’s your new board!

The RSA folks are delighted with your work.


Your program really hits the mark. The RSA folks can display the data from their tightly packed data file in a way that makes it easy for them to read and work with. It’s fast, too.

Your use of a hash within the function was an inspired choice. The calling code only needs to be aware that a hash is being returned from the function to work with it effectively. And, as you’ve seen, returning a data structure (like a hash) is as easy as returning any other variable from a function.


Word of your programming skills is spreading far and wide.

Meanwhile, down at the studio...


Head First TVN is an up-and-coming sports network specializing in everything and anything to do with water. They are covering the National Surfing Championship and want their TV presenters to be able to use your program to access each surfer’s details in much the same way that RSA did. There’s just one small kink in their plans: TVN has all their data in a database system, not in a file.

Brain Power

Which part of your program is most likely to change if you have to get the surfer data from a database, as opposed to a file?

The calling code remains the same; it’s the function that changes

Your program expects the find_details() function to return a hash representing the surfer’s details. Rather than the function searching the file for the data, it needs to search the TVN database, convert what the database provides to a hash, and then return the hash to the calling code.

All you need to know is which database system TVN is using and how to access it from your function.

Let’s base your code on TVN’s code.

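TVN’s own code isn’t reproduced here, so what follows is only a sketch of the general idea using Python’s built-in sqlite3 module. The table name (surfers), its columns, the sample row, and the extra db parameter are all assumptions made for illustration; a real program would connect to TVN’s database file rather than to an in-memory database.

```python
import sqlite3

def find_details(id2find, db):
    """Look up one surfer by ID in the database and convert the row
    to a hash. Returns an empty hash if the ID is not found."""
    cursor = db.cursor()
    # The ? placeholder lets sqlite3 substitute the ID safely.
    cursor.execute("SELECT * FROM surfers WHERE id = ?", (id2find,))
    row = cursor.fetchone()
    cursor.close()
    if row is None:
        return {}
    # Copy each named column of the row into a regular hash.
    return {key: row[key] for key in row.keys()}

# Set up a throwaway in-memory database with a hypothetical table.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row   # rows can then be read by column name
db.execute("CREATE TABLE surfers (id INTEGER, name TEXT, country TEXT, "
           "average REAL, board TEXT, age INTEGER)")
db.execute("INSERT INTO surfers VALUES (101, 'Johnny Jones', 'USA', 8.32, 'Fish', 21)")
db.commit()

# The calling code is the same as before: it just works with a hash.
surfer = find_details(101, db)
missing = find_details(999, db)
if surfer:
    print("Name:", surfer['name'])
    print("Board:", surfer['board'])
db.close()
```

Because the function still returns a hash with the same kinds of keys, the code that calls find_details() does not need to know whether the data came from a file or from a database.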

TVN’s data is on the money!

With the surfing data now displayed directly from the TVN database, the sports writers no longer need to worry about all those tattered and torn competition entry forms. The data is right there on screen when they need it. They couldn’t be happier. Your program has made their day.


Your Programming Toolbox

You’ve got Chapter 5 under your belt. Let’s look back at what you’ve learned in this chapter:

Programming Tools

* hash - a data structure that associates a name with a value

* s['age'] - retrieve the value associated with the ‘age’ name in a hash called ‘s’

* returning a data structure from a function

* database system - a technology, like SQLite3, that can store large quantities of data in a very efficient way

Python Tools

* {} - an empty hash

* s['wind'] = "off shore" - sets the value associated with “wind” in the “s” hash to the value “off shore”

* s.keys() - provide a list of keys for the hash called ‘s’

* s.items() - provide a list of keys AND values for the hash called ‘s’

* line.split(",") - split the string contained within the ‘line’ variable at every occurrence of a comma

* sorted() - a built-in function that can sort most data structures
