Chapter 12. Data and Dictionaries

Data visualization is one of the most active areas at the intersection of code and graphics and is also one of the most popular uses of Processing. This chapter builds on what has been discussed about storing and loading data earlier in the book and introduces more features relevant to data sets that might be used for visualization.

There is a wide range of software that can output standard visualizations like bar charts and scatter plots. However, writing code to create a visualization from scratch provides more control over the output and encourages users to imagine, explore, and create more unique representations of data. For us, this is the point of learning to code and using software like Processing, and we find it far more interesting than being limited to prepackaged methods or tools.

Data Summary

It’s a good time to rewind and discuss how data was introduced throughout this book. Recall that every value in a Python program has a data type. Each kind of data is unique and is stored in a different way. We started the book talking about simple data types like int (for integers) or float (for numbers with decimals). Later, we discussed compound data types like objects and lists. A compound data type keeps track of multiple values. The values in a list are accessed by their numerical index, whereas the values in an object are accessed by name as data attributes.

The examples in this chapter introduce a new compound data type: the dictionary. Dictionaries are data structures that are conceptually similar to lists, except instead of accessing values by numerical index, you access them by name. This makes dictionaries a data type especially suited for storing, transmitting, and processing structured data. There are several built-in Python tools that read data in various formats (e.g., from the data folder for a sketch) and return dictionaries. We’ll load data into dictionaries from two different sources: tables of data in comma-separated values (CSV) format, and data in JSON format.

Dictionaries

You can think of a dictionary as being sort of like a list, except you index its values not with a number but with a key. Dictionary keys are usually strings that identify the values they point to in an easy-to-remember way. Let’s say, for example, that we wanted to include in our sketch some information about the planets of our solar system. Here’s what a dictionary with information about Earth might look like in Python:

planetInfo = {
  "name": "Earth",
  "knownMoons": 1,
  "eqRadiusKm": 6387.1,
  "hasRings": False
}

In this example, we’ve created a dictionary and assigned it to a variable called planetInfo. The keys in this dictionary are name, knownMoons, eqRadiusKm, and hasRings. The values for those keys are Earth, 1, 6387.1, and False, respectively. As this example shows, dictionary values can be any data type: strings, integers, floating-point numbers, booleans—even objects, lists, and other dictionaries can be stored as dictionary values.

Once you’ve defined a dictionary, you can get the value for a particular key using square bracket notation. This looks similar to how you get the value for a particular index in a list, except this time we’re putting a string between the square brackets instead of a number:

planetInfo = {
  "name": "Earth",
  "knownMoons": 1,
  "eqRadiusKm": 6387.1,
  "hasRings": False
} 
print planetInfo['name'] # prints "Earth"
print planetInfo['knownMoons'] # prints 1

If you attempt to get the value for a key that is not present in the dictionary, Python will raise a KeyError, and your program will halt:

print planetInfo['extraterrestrialCount'] # raises KeyError

You can check to see whether or not a key is present in a dictionary by using the special operator in. Put the key you want to check for on the lefthand side of in, and the dictionary you want to check on the righthand side. The entire expression will return True or False, so you can use it in an if statement:

print 'extraterrestrialCount' in planetInfo # prints False
if 'eqRadiusKm' in planetInfo:
  print "Planetary radius (km): ", planetInfo['eqRadiusKm']
else:
  print "No radius information available."
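
When a missing key should fall back to a default value instead of halting the program, Python dictionaries also offer a get() method. The following is a minimal sketch that reuses the planetInfo dictionary; the discoveredBy key and the fallback values 0 and "unknown" are illustrative choices of ours, and print is written with parentheses so the snippet also runs outside Processing:

```python
planetInfo = {
  "name": "Earth",
  "knownMoons": 1,
  "eqRadiusKm": 6387.1,
  "hasRings": False
}

# get() returns the stored value when the key is present...
moons = planetInfo.get("knownMoons", 0)
# ...and the second argument (a default we chose) when it is not
aliens = planetInfo.get("extraterrestrialCount", 0)
discoverer = planetInfo.get("discoveredBy", "unknown")

print(moons)
print(aliens)
print(discoverer)
```

Whether to use in or get() is a judgment call: in makes the missing-key case explicit, while get() is shorter when a sensible default exists.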

The name dictionary is meant to evoke a physical dictionary, in which you look up words (keys) to find their definitions (values). In other computer languages (such as Java and C++), the analogous data structure is called a map. Sometimes when talking about dictionaries, we’ll say that keys “map” to values. (For instance, in the preceding example, the key name maps to the value Earth.)

Example 12-1: (Keyboard) Keys as (Dictionary) Keys

The following sketch shows how dictionaries can be used to store information and retrieve it in response to user input:

sizes = {
  'a': 40,
  'b': 80,
  'c': 120,
  'd': 160
}
def setup():
  size(200, 200)
  rectMode(CENTER)
def draw():
  background(0)
  fill(255)
  if keyPressed:
    if key in sizes:
      rect(100, 100, sizes[key], sizes[key])

This sketch displays rectangles of various sizes to the screen in response to user input. A dictionary called sizes maps particular keystrokes to integer values. Inside of draw(), we check to see if a keyboard key has been pressed and whether the string value of that keyboard key is present in the sizes dictionary. If so, we display a rectangle with its size determined by the value stored for that key.

Lists of Dictionaries

Let’s return to our dictionary with information about a particular planet. It looks like this:

planetInfo = {
  "name": "Earth",
  "knownMoons": 1,
  "eqRadiusKm": 6387.1,
  "hasRings": False
}

Now, let’s imagine that we want our program to contain information not just about Earth, but all of the terrestrial planets (Mercury, Venus, Earth, and Mars). We could do this by creating several dictionaries, one for each planet:

mercuryInfo = {
  "name": "Mercury",
  "knownMoons": 0,
  "eqRadiusKm": 2439.64,
}
venusInfo = {
  "name": "Venus",
  "knownMoons": 0,
  "eqRadiusKm": 6051.59,
}
earthInfo = {
  "name": "Earth",
  "knownMoons": 1,
  "eqRadiusKm": 6387.1,
}
marsInfo = {
  "name": "Mars",
  "knownMoons": 2,
  "eqRadiusKm": 3397.0
}

So far, so good. Let’s set ourselves to another task: how would you calculate the average equatorial radius for all planets in this data? Here’s the most obvious way to do it:

planetCount = 4
radiusSum = mercuryInfo['eqRadiusKm']
radiusSum += venusInfo['eqRadiusKm']
radiusSum += earthInfo['eqRadiusKm']
radiusSum += marsInfo['eqRadiusKm']
print radiusSum / planetCount # prints 4568.8325

But this solution has some problems, the foremost being the repetition. If we add more planet data, we would have to manually add each new planet to our statements that calculate the sum of their radii. This could get tedious very quickly.

Thankfully, in just the same way that Python allows us to make lists of integers or floating-point numbers, we can create lists of dictionaries:

planetList = [mercuryInfo, venusInfo, earthInfo, marsInfo]

This statement creates a list called planetList. Each element of planetList is a dictionary. We can access any value for a particular key for one of the dictionaries in this list like so:

print planetList[2]['name'] # prints 'Earth'
print planetList[3]['knownMoons'] # prints 2

That’s a lot of square brackets! Here’s how to understand what an expression like planetList[2]['name'] means. First, we know that we can put square brackets with a number inside (e.g., [2]) right after any expression that evaluates to a list. That expression will evaluate to the value stored at that index of the list, which in this case is a dictionary. We also know that we can put square brackets with a string inside (e.g., ['name']) right after any expression that evaluates to a dictionary. That expression will, in turn, evaluate to the value stored in the dictionary for that key. Keeping both of these ways to form expressions in mind, you can read the expression planetList[2]['name'] from left to right like so:

planetList is a list
planetList[2] is the element at index 2 of planetList, which is a dictionary
planetList[2]['name'] is the value for key name in that dictionary (Earth)
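
The same left-to-right reading can be spelled out in code by storing each intermediate result in its own variable. This is only a sketch: the names thirdPlanet, thirdName, and sameName are ours, and the dictionaries are trimmed to two keys to keep it short:

```python
planetList = [
  {"name": "Mercury", "knownMoons": 0},
  {"name": "Venus", "knownMoons": 0},
  {"name": "Earth", "knownMoons": 1},
  {"name": "Mars", "knownMoons": 2}
]

# Step 1: indexing the list gives back one dictionary
thirdPlanet = planetList[2]
# Step 2: indexing that dictionary gives back one value
thirdName = thirdPlanet["name"]

# The chained form produces the same value in a single expression
sameName = planetList[2]["name"]
print(thirdName == sameName)
```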

We can loop over a list of dictionaries the same way we loop over a regular list, with a for loop:

radiusSum = 0
for i in range(len(planetList)):
  radiusSum += planetList[i]['eqRadiusKm']
print radiusSum / len(planetList) # prints 4568.8325

Alternatively, we can use the for...in looping syntax introduced in Example 11-11. The “temporary loop variable” in this case will be a dictionary. With that in mind, here’s some revised code to calculate the average radius of all four terrestrial planets:

radiusSum = 0
for p in planetList:
  radiusSum += p['eqRadiusKm']
print radiusSum / len(planetList) # prints 4568.8325
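
Once the data lives in a list of dictionaries, the loop can be wrapped in a function that works for any numeric key. The sketch below assumes a hypothetical helper named averageOf() (not part of Python or Processing) and a list that is not empty:

```python
def averageOf(dictList, key):
  # Add up the value stored under `key` in every dictionary
  total = 0.0
  for d in dictList:
    total += d[key]
  # Assumes dictList has at least one element
  return total / len(dictList)

planetList = [
  {"name": "Mercury", "knownMoons": 0, "eqRadiusKm": 2439.64},
  {"name": "Venus", "knownMoons": 0, "eqRadiusKm": 6051.59},
  {"name": "Earth", "knownMoons": 1, "eqRadiusKm": 6387.1},
  {"name": "Mars", "knownMoons": 2, "eqRadiusKm": 3397.0}
]

avgRadius = averageOf(planetList, "eqRadiusKm")
avgMoons = averageOf(planetList, "knownMoons")
print(avgRadius)
print(avgMoons)
```

Adding a fifth planet now only requires adding one dictionary to the list; the averaging code does not change.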

Example 12-2: The Planets

This example takes a simplified version of our list of planet information dictionaries and uses it as a data source for drawing to the screen:

planetList = [
  {"name": "Mercury", "eqRadiusKm": 2439.64},
  {"name": "Venus", "eqRadiusKm": 6051.59},
  {"name": "Earth", "eqRadiusKm": 6387.1},
  {"name": "Mars", "eqRadiusKm": 3397.0}
]
def setup():
  size(600, 150)
  textAlign(LEFT, CENTER)
def draw():
  background(0)
  fill(255)
  planetCount = len(planetList)
  for i in range(planetCount):
    # scale radius to be screen-friendly
    planetRadius = planetList[i]['eqRadiusKm'] * 0.01
    offset = 50 + ((width/planetCount) * i)
    ellipse(offset, height/2, planetRadius, planetRadius)
    text(planetList[i]['name'], 10+offset+(planetRadius/2),
      height/2)

The sketch begins with a simplified version of our planetList variable. Here, instead of creating a variable for each dictionary first and then putting the variable names into the list declaration, we simply write the dictionaries straight into the list. In the draw() function, we loop over the list of planets and use each planet’s radius and name to draw it to the screen (using the planet’s position in the list to determine where on the screen to draw it).

CSV Files

Many data sets are stored in spreadsheets. You may have worked with a program like Microsoft Excel or Google Sheets that allows you to manipulate data in this format. Spreadsheets are made out of rows and columns, with each row usually representing one item and each cell in the row representing some aspect of that item.

Spreadsheet data is often stored in plain-text files, with the columns separated by commas or tab characters. A comma-separated values file is abbreviated as CSV and uses the file extension .csv. When tabs are used, the extension .tsv is sometimes used. Python includes a library to make it easy to work with data stored in this format. In this chapter, we will focus on loading data from a CSV file.

To load a CSV or TSV file, you’ll need to place it in your sketch’s data folder (as described in Chapter 7).

The data for the next example is a simplified version of Boston Red Sox player David Ortiz’s batting statistics from 1997 to 2014. From left to right, it is the year, number of home runs, runs batted in (RBIs), and batting average. When opened in a text editor, the first five lines of the file look like this:

1997,1,6,0.327
1998,9,46,0.277
1999,0,0,0
2000,10,63,0.282
2001,18,48,0.234

Example 12-3: Read the Data

To load this data into Processing, we need to use one of Python’s built-in libraries called csv. The csv library provides functions and classes that make it easy to work with data in CSV format. We also need to use the built-in Python function open() to gain access to the file in the sketch’s data folder. Once we’ve created a CSV reader object, we use a for loop to operate on each row of data in sequence:

import csv

statsFileHandle = open("ortiz.csv")
statsData = csv.reader(statsFileHandle)
for row in statsData:
  year = row[0]
  homeRuns = row[1]
  rbi = row[2]
  average = row[3]
  print year, homeRuns, rbi, average

The import statement at the beginning of the program is what signals to Python that we want to use the built-in csv library in our program. The open() function takes the name of the CSV file we want to work with as a parameter, and returns a special kind of object called a file handle. We then pass that file handle as a parameter to the csv.reader() function, which returns a CSV reader object (which we’ve assigned to a variable called statsData here). A CSV reader object works a lot like a list, in that we can iterate over it with a for loop. (We’ve called the temporary loop variable here row, but there’s nothing special about that word. You can call it whatever you want!)

Inside the for loop, we can access data for the current row using the numerical index of the relevant column. The expression row[0] evaluates to the item in the first column of the row (i.e., the year), the expression row[1] evaluates to the item in the second column, and so forth.

Getting the Right Type

There’s one tricky thing about using CSV files, which is that they don’t contain any information about what kind of data they’re storing. To illustrate, think about how you might go about finding the sum of David Ortiz’s home runs in his career. You might write some code that looks like this:

import csv

statsFileHandle = open("ortiz.csv")
statsData = csv.reader(statsFileHandle)

homeRunTotal = 0
for row in statsData:
  homeRunTotal += row[1]

print homeRunTotal

In this code, we made a variable called homeRunTotal to store the total number of home runs. As we iterate over each row, we add the number from the second column, which contains the number of home runs for that year. Looks good, right? But there’s a problem. If you try to run this, you’ll get the following error:

TypeError: unsupported operand type(s) for +: 'int' and 'str'

This error is telling you that you were attempting to add an int to a str. Python doesn’t know how to do that, so your program didn’t work. This happened because the csv library always gives you data from a CSV file as a string, even if the underlying data looks like a number. If you want to use that string as a number, you have to explicitly convert it yourself, using one of Python’s built-in conversion functions like int().

Here’s a corrected version of the preceding example. The only change we’ve made is to the line inside the for loop, where we use the int() function to convert the value from the CSV file from a string to an int:

import csv

statsFileHandle = open("ortiz.csv")
statsData = csv.reader(statsFileHandle)

homeRunTotal = 0
for row in statsData:
  homeRunTotal += int(row[1])

print homeRunTotal
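
The same idea extends to every column: each one can be converted to the type that suits it as the row is read. The sketch below is written for a standalone Python 3 interpreter rather than a sketch folder, so it reads the five sample lines shown earlier from a string instead of ortiz.csv; convertRow() is a hypothetical helper of ours:

```python
import csv
import io

# The first five lines of the data file, held in a string so the
# snippet runs without ortiz.csv on disk
sampleText = (u"1997,1,6,0.327\n"
              u"1998,9,46,0.277\n"
              u"1999,0,0,0\n"
              u"2000,10,63,0.282\n"
              u"2001,18,48,0.234\n")

def convertRow(row):
  # Year, home runs, and RBIs are whole numbers; the average is not
  return [int(row[0]), int(row[1]), int(row[2]), float(row[3])]

homeRunTotal = 0
seasons = []
for row in csv.reader(io.StringIO(sampleText)):
  season = convertRow(row)
  seasons.append(season)
  homeRunTotal += season[1]

print(homeRunTotal)
```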

Example 12-4: Draw the Table

The next example builds on the last. It creates a list called homeRuns to store data after it is loaded inside setup() and the data from that list is used within draw(). In setup(), we again use open() to get a file handle for our CSV file, and then give the file handle as a parameter to csv.reader(). In a for loop, we append each home run count to our homeRuns list, taking care to convert the values to integers first.

Two separate tasks are accomplished in draw(). First, a for loop draws vertical lines for our graph based on the number of entries in the homeRuns list. A second for loop reads each element of the homeRuns list and plots a line on the graph using the data.

This example is the visualization of a simplified version of Boston Red Sox player David Ortiz’s batting statistics from 1997 to 2014 drawn from a table:

import csv

homeRuns = list()

def setup():
  size(480, 120)
  statsFileHandle = open("ortiz.csv")
  statsData = csv.reader(statsFileHandle)
  for row in statsData:
    homeRuns.append(int(row[1]))
  print homeRuns

def draw():
  background(204)
  # Draw background grid for data
  stroke(153)
  line(20, 100, 20, 20)
  line(20, 100, 460, 100)
  for i in range(len(homeRuns)):
    x = map(i, 0, len(homeRuns)-1, 20, 460)
    line(x, 20, x, 100)
  # Draw lines based on home run data
  noFill()
  stroke(0)
  beginShape()
  for i in range(len(homeRuns)):
    x = map(i, 0, len(homeRuns)-1, 20, 460)
    y = map(homeRuns[i], 0, 60, 100, 20)
    vertex(x, y)
  endShape()

This example is so minimal that it’s not necessary to store this data in lists, but the idea can be applied to more complex examples you might want to make in the future. In addition, you can see how this example will be enhanced with more information—for instance, information on the vertical axis to state the number of home runs and on the horizontal to define the year.
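
Both loops in draw() depend on Processing's map() function, which rescales a number linearly from one range to another. To make that arithmetic concrete, here is our own plain-Python reimplementation under the hypothetical name remap(); it is an illustration of the calculation, not Processing's source code:

```python
def remap(value, start1, stop1, start2, stop2):
  # How far along the input range the value sits (0.0 to 1.0)
  fraction = (value - start1) / float(stop1 - start1)
  # The same fraction of the way along the output range
  return start2 + fraction * (stop2 - start2)

# Home run counts from 0 to 60 become y coordinates from 100 to 20,
# so larger counts are drawn higher on the screen
y0 = remap(0, 0, 60, 100, 20)
y30 = remap(30, 0, 60, 100, 20)
y60 = remap(60, 0, 60, 100, 20)
print(y30)
```

Because the output range runs from 100 down to 20, the function also flips the axis, which is why a season with more home runs ends up nearer the top of the window.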

Example 12-5: 29,740 Cities

To get a better idea about the potential of working with data tables, the next example uses a larger data set and introduces a convenient feature. This table data is different because the first row—the first line in the file—is a header. The header defines a label for each column to clarify the context. This is the first five lines of our new data file called cities.csv:

zip,state,city,lat,lng
35004,AL,Acmar,33.584132,-86.51557
35005,AL,Adamsville,33.588437,-86.959727
35006,AL,Adger,33.434277,-87.167455
35007,AL,Keystone,33.236868,-86.812861

The header makes it easier to read the data—for example, the second line of the file states the zip code of Acmar, Alabama, is 35004 and defines the latitude of the city as 33.584132 and the longitude as -86.51557. In total, the file is 29,741 lines long and it defines the location and zip codes of 29,740 cities in the United States.

The next example loads this data within setup() and then draws it to the screen in a for loop within draw(). The setXY() function converts the latitude and longitude data from the file into a point on the screen:

import csv

citiesData = None

def setXY(lat, lng):
  x = map(lng, -180, 180, 0, width)
  y = map(lat, 90, -90, 0, height) 
  point(x, y)

def setup():
  global citiesData
  size(240, 120)
  citiesFileHandle = open("cities.csv")
  citiesData = list(csv.DictReader(citiesFileHandle))
  strokeWeight(0.1)
  stroke(255)

def draw():
  background(0, 26, 51)
  xoffset = map(mouseX, 0, width, -width*3, -width)
  translate(xoffset, -300)
  scale(10)
  for row in citiesData:
    latitude = float(row["lat"])
    longitude = float(row["lng"])
    setXY(latitude, longitude)

The csv.DictReader object is a little different from the csv.reader object that we used in the previous example. When we used the csv.reader object, we had to access each cell in a row of data by its numerical index. The csv.DictReader object, on the other hand, gives us a dictionary for each row. This dictionary uses the strings in the header line of the CSV file as its keys, and each key maps to the corresponding value for the row in question. Because each row is a dictionary, we can use (for example) the expression row["lat"] to access the latitude column, which is much easier to remember than if we needed to reference the column by its numerical index.

You may have noticed the curious use of the built-in list() function in setup(). This is necessary because csv.DictReader objects, unlike regular lists, can only be iterated over once. We use the list() function to read all of the rows from one of these objects at once and store them in a separate variable. The resulting value, stored in the variable citiesData, is a list of dictionaries (much like the planetList variable in Example 12-2).
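
The one-pass behavior is easy to see in a small experiment. The sketch below targets a standalone Python 3 interpreter and reads the first lines of the cities data from a string (an assumption made so that it runs without cities.csv); the variable names are ours:

```python
import csv
import io

sampleText = (u"zip,state,city,lat,lng\n"
              u"35004,AL,Acmar,33.584132,-86.51557\n"
              u"35005,AL,Adamsville,33.588437,-86.959727\n")

# Looping over the reader directly: the second loop finds nothing left
reader = csv.DictReader(io.StringIO(sampleText))
firstPass = [row["city"] for row in reader]
secondPass = [row["city"] for row in reader]

# Copying the rows into a list first: both loops see every row
rows = list(csv.DictReader(io.StringIO(sampleText)))
listFirst = [row["city"] for row in rows]
listSecond = [row["city"] for row in rows]

print(len(firstPass))
print(len(secondPass))
print(len(listSecond))
```

This matters in a sketch because draw() runs many times per second; without the call to list(), only the first frame would have any rows to draw.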

JSON

The JavaScript Object Notation (JSON) format is another common system for storing data. Like HTML and XML formats, the elements have labels associated with them. For instance, the data for a film might include labels for the title, director, release year, rating, and more. These labels will be paired with the data like this:

"title": "Alphaville"
"director": "Jean-Luc Godard"
"year": 1964
"rating": 9.1

To work as a JSON file, the film labels need a little more punctuation to separate the elements. Commas are used between each data pair, and braces enclose it. The data defined within the curly braces is a JSON object.

With these changes, our valid JSON data file looks like this:

{
  "title": "Alphaville",
  "director": "Jean-Luc Godard",
  "year": 1964,
  "rating": 9.1
}

There’s another interesting detail in this short JSON sample related to data types: you’ll notice that the title and director data is contained within quotes to mark them as strings, and the year and rating are without quotes to define them as numbers. Specifically, the year is an integer and the rating is a floating-point number. This distinction becomes important after the data is loaded into a sketch.

To store more than one film, square brackets are placed at the top and bottom to signify that the data is an array of JSON objects. Each object is separated from the next by a comma.

Putting it together looks like this:

[
  {
    "title": "Alphaville",
    "director": "Jean-Luc Godard",
    "year": 1964,
    "rating": 9.1
  },
  {
    "title": "Pierrot le fou",
    "director": "Jean-Luc Godard",
    "year": 1965,
    "rating": 7.3
  }
]

This pattern can be repeated to include more films. At this point, it’s interesting to compare this JSON notation to the corresponding CSV representation of the same data.

As a CSV file, the data looks like this:

title,director,year,rating
Alphaville,Jean-Luc Godard,1964,9.1
Pierrot le fou,Jean-Luc Godard,1965,7.3

Notice that the CSV notation has fewer characters, which can be important when working with massive data sets. On the other hand, the JSON version is often easier to read because each piece of data is labeled.
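
The size difference can be checked directly by writing the same two films in both formats and counting characters. This is a rough sketch for a standalone Python 3 interpreter, using the values from the JSON sample above; json.dumps() and csv.DictWriter are standard library tools, while the variable names are ours:

```python
import csv
import io
import json

films = [
  {"title": "Alphaville", "director": "Jean-Luc Godard",
   "year": 1964, "rating": 9.1},
  {"title": "Pierrot le fou", "director": "Jean-Luc Godard",
   "year": 1965, "rating": 7.3}
]
columns = ["title", "director", "year", "rating"]

# JSON repeats every label for every film
jsonText = json.dumps(films)

# CSV states the labels once, in the header line
csvBuffer = io.StringIO()
writer = csv.DictWriter(csvBuffer, fieldnames=columns, lineterminator="\n")
writer.writeheader()
writer.writerows(films)
csvText = csvBuffer.getvalue()

print(len(csvText))
print(len(jsonText))
```

The gap widens as rows are added, since each extra JSON object carries its own copy of the labels.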

Now that the basics of JSON and its relation to CSV data have been introduced, let’s look at the code needed to read a JSON file into a Processing sketch.

Example 12-6: Read a JSON File

You may have noticed that the JSON format looks very similar to the way Python data structures look when we include them directly in our program. This similarity is a little misleading, as there are a number of subtle differences between the two, and you can’t just paste a JSON data structure verbatim into your Python program and expect it to work. What we need is a way to read data stored in JSON format and convert it into a Python data structure that we can use in our program. Python supplies us with this functionality through the built-in json library.

This sketch loads the JSON file from the beginning of this section, the file that includes only the data for the film Alphaville:

import json

def setup():
  filmFileHandle = open("film.json")
  film = json.load(filmFileHandle)
  title = film["title"]
  director = film["director"]
  year = film["year"]
  rating = film["rating"]
  print "Title: ", title
  print "Director: ", director
  print "Year: ", year
  print "Rating: ", rating

The json.load() function loads data in JSON format from a given file handle. (Just as with the CSV examples, we need to create the file handle first with the built-in open() function.) The json.load() function returns a value of a compound data type that corresponds to the data in the JSON file. In this example, the JSON object in film.json is converted into a Python dictionary, which we store in the variable film. We can then use square bracket syntax to access values for particular keys in that dictionary. After we’ve converted the JSON into Python, the types of the values retrieved will reflect their types from the original JSON data structure (i.e., JSON integers will become Python integers, JSON strings will become Python strings, etc.).
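
This is a useful contrast with the CSV examples, where every value arrived as a string. The short sketch below uses json.loads(), the counterpart of json.load() that reads from a string rather than a file handle, so that it runs without film.json; the variable names are ours:

```python
import json

filmText = (u'{"title": "Alphaville", "director": "Jean-Luc Godard", '
            u'"year": 1964, "rating": 9.1}')
film = json.loads(filmText)

# The numbers come back as numbers, not as strings
yearIsInt = isinstance(film["year"], int)
ratingIsFloat = isinstance(film["rating"], float)

# Because year is already an integer, arithmetic works without int()
yearsBefore2000 = 2000 - film["year"]

print(yearIsInt)
print(ratingIsFloat)
print(yearsBefore2000)
```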

Example 12-7: Visualize Data from a JSON File

In this example, the data file started before has been updated to include all of the director’s films from 1960–1966. The name of each film is placed in order on screen according to the release year and assigned a gray value based on the rating value.

There are several differences between this example and Example 12-6. The most important is the fact that the data structure in films.json is a list of dictionaries, not just a single dictionary. As a result, the call to json.load() in setup() returns a list. Each item in this list is a dictionary that contains data for a particular film. Inside draw(), we iterate over each item in this list and display its values to the screen:

import json

films = []

def setup():
  global films
  size(480, 120)
  filmFileHandle = open("films.json")
  films = json.load(filmFileHandle)

def draw():
  background(0)
  for i in range(len(films)):
    film = films[i]
    ratingGray = map(film["rating"], 6.5, 8.1, 102, 255)
    pushMatrix()
    translate(i*32 + 32, 105)
    rotate(-QUARTER_PI)
    fill(ratingGray)
    text(film["title"], 0, 0)
    popMatrix()

This example is bare bones in its visualization of the film data. It shows how to load the data and how to draw based on those data values, but it’s your challenge to format it to accentuate what you find interesting about the data. For example, is it more interesting to show the number of films Godard made each year? Is it more interesting to compare and contrast this data with the films of another director? Will all of this be easier to read with a different font, sketch size, or aspect ratio? The skills introduced in the earlier chapters in this book can be applied to bring this sketch to the next step of refinement.

Network Data and APIs

Public access to massive quantities of data collected by governments, corporations, organizations, and individuals is changing our culture, from the way we socialize to how we think about intangible ideas like privacy. This data is most often accessed through software structures called APIs.

The acronym API is mysterious, and its meaning (application programming interface) isn’t much clearer. However, APIs are essential for working with data and they aren’t necessarily difficult to understand. Essentially, they are requests for data made to a service. When data sets are huge, it’s not practical or desirable to copy the entirety of the data; an API allows a programmer to request only the trickle of data that is relevant from a massive sea.

This concept can be more clearly illustrated with a hypothetical example. Let’s assume there’s an organization that maintains a database of temperature ranges for every city within a country. The API for this dataset allows a programmer to request the high and low temperatures for any city during the month of October in 1972. In order to access this data, the request must be made through a specific line or lines of code, in the format mandated by the data service.

Some APIs are entirely public, but many require authentication so the data service can keep track of its users. Typically, you register as a developer on the service’s site to obtain an “API key,” a special identifying string that must be included with each API request. Most APIs also have rules about how many, or how frequently, requests can be made. For instance, it might be possible to make only 1,000 requests per month, or no more than one request per second.

Processing can request data over the Internet when the computer that is running the program is online. CSV, TSV, JSON, and XML files can be loaded using the corresponding load function with a URL as the parameter. For instance, the current weather in Cincinnati is available in JSON format at this URL:

http://api.openweathermap.org/data/2.5/find?q=Cincinnati&units=imperial&appid=YOUR_API_KEY

Read the URL closely to decode it:

It requests data from the api subdomain of the openweathermap.org site.
It specifies a city to search for (q is an abbreviation for query, and is frequently used in URLs that specify searches).
It indicates that the data will be returned in imperial format, which means the temperature will be in Fahrenheit. Replacing imperial with metric will provide temperature data in degrees Celsius.
It includes your API key, supplied as the appid parameter.
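
The same decoding can be done in code with Python's standard URL tools, which is handy when a request needs to be inspected or assembled by a program. This sketch only takes the URL apart and does not contact the service; the import is wrapped in try/except because the module has a different name in Python 2 and Python 3:

```python
try:
  from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
  from urlparse import urlparse, parse_qs  # Python 2

url = ("http://api.openweathermap.org/data/2.5/find"
       "?q=Cincinnati&units=imperial&appid=YOUR_API_KEY")

parts = urlparse(url)
# parse_qs() returns a dictionary that maps each parameter name
# to a list of values, because a name may appear more than once
params = parse_qs(parts.query)

host = parts.netloc
city = params["q"][0]
units = params["units"][0]

print(host)
print(city)
print(units)
```

Note that the result is itself a dictionary of lists, so the chained square brackets used earlier in the chapter apply here too.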

Visit http://openweathermap.org/api for more information on accessing the Open Weather Map API and obtaining an API key.

Looking at this data from OpenWeatherMap is a more realistic example of working with data found in the wild rather than the simplified data sets introduced earlier. At the time of this writing, the file returned from that URL looks like this:

{"message":"accurate","cod":"200","count":1,"list":[{"id":4508722,"name":"Cincinnati","coord":{"lon":-84.456886,"lat":39.161999},"main":{"temp":34.16,"temp_min":34.16,"temp_max":34.16,"pressure":999.98,"sea_level":1028.34,"grnd_level":999.98,"humidity":77},"dt":1423501526,"wind":{"speed":9.48,"deg":354.002},"sys":{"country":"US"},"clouds":{"all":80},"weather":[{"id":803,"main":"Clouds","description":"broken clouds","icon":"04d"}]}]}

This file is much easier to read when it’s formatted with line breaks and indentation that expose how the JSON objects (in braces) and lists (in brackets) are nested:

{
  "message": "accurate",
  "count": 1,
  "cod": "200",
  "list": [{
    "clouds": {"all": 80},
    "dt": 1423501526,
    "coord": {
      "lon": -84.456886,
      "lat": 39.161999
    },
    "id": 4508722,
    "wind": {
      "speed": 9.48,
      "deg": 354.002
    },
    "sys": {"country": "US"},
    "name": "Cincinnati",
    "weather": [{
      "id": 803,
      "icon": "04d",
      "description": "broken clouds",
      "main": "Clouds"
    }],
    "main": {
      "humidity": 77,
      "pressure": 999.98,
      "temp_max": 34.16,
      "sea_level": 1028.34,
      "temp_min": 34.16,
      "temp": 34.16,
      "grnd_level": 999.98
    }
  }]
}

Note that brackets are seen in the "list" and "weather" sections, indicating a list of JSON objects. Although the list in this example only contains a single item, in other cases, the API might return multiple days or variations of the data from multiple weather stations.

Example 12-8: Parsing the Weather Data

The first step in working with this data is to study it and then to write minimal code to extract the desired data. In this case, we’re curious about the current temperature. We can see that our temperature data is 34.16. It’s labeled as temp and it’s inside the main object, which is inside the list of objects given as a value for the key list. A function called getTemp() was written for this example to work with the format of this specific JSON file organization:

import json

def getTemp(fileName):
  weatherFileHandle = open(fileName)
  weather = json.load(weatherFileHandle)
  list_value = weather["list"] # get value for "list" key
  item = list_value[0] # get first item from list_value
  main = item["main"] # item is a dictionary; get "main" value
  temperature = main["temp"] # get value for "temp" key
  return temperature

def setup():
  temp = getTemp("cincinnati.json")
  print temp

The name of the JSON file, cincinnati.json, is passed into the getTemp() function inside setup() and loaded there. Next, because of the format of the JSON file, a series of lists and dictionaries are needed to get deeper and deeper into the data structure to finally arrive at our desired number. This number is stored in the temperature variable and then returned by the function to be assigned to the temp variable in setup() where it is printed to the console.

Example 12-9: Chaining Square Brackets

The sequence of JSON variables created in succession in the last example can be approached differently by chaining the indexes together. This example works like Example 12-8 except that each square bracket index is connected directly to the previous one, rather than calculated one at a time and assigned to variables in between:

import json

def getTemp(fileName):
  weather = json.load(open(fileName))
  return weather["list"][0]["main"]["temp"]

def setup():
  temp = getTemp("cincinnati.json")
  print temp

This example can be modified to access more of the data from the feed and to build a sketch that displays the data to the screen rather than just writing it to the console. You can also modify it to read data from another online API—you’ll find that the data returned by many APIs shares a similar format.
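
As one possible modification, the function could return the temperature of every item under the list key rather than only the first, which would cover responses that contain several matches. The sketch below assumes a hypothetical getTemps() function and uses a trimmed copy of the response held in a string, so it runs without cincinnati.json or a network connection:

```python
import json

# A trimmed copy of the response shown earlier (most keys omitted)
weatherText = (u'{"count": 1, "list": [{"name": "Cincinnati", '
               u'"main": {"temp": 34.16, "humidity": 77}}]}')

def getTemps(weather):
  # Collect the "temp" value from every item in the "list" value
  temps = []
  for item in weather["list"]:
    temps.append(item["main"]["temp"])
  return temps

weather = json.loads(weatherText)
temps = getTemps(weather)
print(temps)
```

If the service returns no matches, the loop simply never runs and the function returns an empty list, whereas indexing with [0] would raise an IndexError.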

Robot 10: Data

The final robot example in this book is different from the rest because it has two parts. The first part generates a data file using random values and for loops, and the second part reads that data file to draw an army of robots onto the screen.

The first sketch uses several new code elements. First, we’ll use the open() function to create a new file, and then we’ll use the file handle object’s write() method to write data to that file. In this example, the file handle object is called output and the file is called botArmy.tsv. (You’ll need to adjust the path in the following example to reflect a folder that exists on your own computer.) Random values are used to define which of three robot images will be drawn for each coordinate. For the file to be correctly created, the close() method must be run before the program is stopped.

Notice that we called the open() function in this example with a second parameter: the string "w". This parameter signals to Python that we want to open the file not just to read its contents, but to write new contents to it. (The w stands for write.)

The code that draws an ellipse is a visual preview to reveal the location of the coordinate on screen, but notice that the ellipse isn’t recorded into the file. Also note that we need to use the str() function to explicitly convert the x, y, and robotType values to strings so that we can build the line of text that gets written to the file:

def setup():
  size(720, 480)
  # Create the new file
  output = open("/Users/allison/botArmy.tsv", "w")
  # Write a header line with the column titles
  output.write("type\tx\ty\n")
  for y in range(0, height+1, 60):
    for x in range(0, width+1, 20):
      robotType = str(int(random(1, 4)))
      output.write(robotType+"\t"+str(x)+"\t"+str(y)+"\n")
      ellipse(x, y, 12, 12)
  output.close()  # Finish the file

After that program is run, you’ll find a file named botArmy.tsv in the location you specified in the first parameter to the open() function. Open it to see how the data is written. The first five lines of that file will be similar to this:

type	x	y
3	0	0
1	20	0
2	40	0
1	60	0
3	80	0

The first column is used to define which robot image to use, the second column is the x coordinate, and the third column is the y coordinate.
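
As an alternative to building each line by hand, the csv library can also write tab-separated rows. The sketch below is not what the robot sketch does; it is a variation for a standalone Python 3 interpreter that writes into an in-memory buffer instead of a file, with variable names of our choosing. The csv.writer object converts numbers to text itself, so the str() calls are not needed:

```python
import csv
import io

outputBuffer = io.StringIO()
writer = csv.writer(outputBuffer, delimiter="\t", lineterminator="\n")

# Header line with the column titles
writer.writerow(["type", "x", "y"])
# Two data rows; the integers are converted to text automatically
writer.writerow([3, 0, 0])
writer.writerow([1, 20, 0])

tsvText = outputBuffer.getvalue()
print(tsvText)
```

To write to disk instead, the buffer could be replaced with a file handle opened with "w", as in the sketch above.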

The next sketch loads the botArmy.tsv file and uses the data to draw the robots. Note that because the data was written in tab-separated values (TSV) format instead of comma-separated values (CSV) format, we need to include delimiter="\t" as an extra parameter in the call to csv.DictReader:

import csv

def setup():
  size(720, 480)
  background(0, 153, 204)
  bot1 = loadShape("robot1.svg")
  bot2 = loadShape("robot2.svg")
  bot3 = loadShape("robot3.svg")
  shapeMode(CENTER)
  robotsFileHandle = open("/Users/allison/botArmy.tsv")
  robots = csv.DictReader(robotsFileHandle, delimiter="\t")
  for row in robots:
    bot = int(row["type"])
    x = int(row["x"])
    y = int(row["y"])
    sc = 0.3
    if bot == 1:
      shape(bot1, x, y, bot1.width*sc, bot1.height*sc)
    elif bot == 2:
      shape(bot2, x, y, bot2.width*sc, bot2.height*sc)
    else:
      shape(bot3, x, y, bot3.width*sc, bot3.height*sc)
