Working with Shapely geometries

Shapely is a very capable library for performing various calculations on geo-spatial data. Let's put it through its paces with a complex, real-world problem.

Task: Identify parks in or near urban areas

The U.S. Census Bureau makes available a Shapefile containing something called Core Based Statistical Areas (CBSAs), which are polygons defining urban areas with a population of 10,000 or more. At the same time, the GNIS website provides lists of placenames and other details. Using these two datasources, we will identify any parks within or close to an urban area.

Tip

Because of the volume of data we are potentially dealing with, we will limit our search to California. Feel free to download the larger data sets if you want, though you will have to optimize the code or your program will take a very long time to check all the CBSA polygon/placename combinations.

  1. Let's start by downloading the necessary data. Go to the TIGER website at http://census.gov/geo/www/tiger
  2. Click on the 2009 TIGER/Line Shapefiles Main Page link, then follow the Download the 2009 TIGER/Line Shapefiles now link.
  3. Choose California from the pop-up menu on the right, and click on Submit. A list of the California Shapefiles will be displayed; the Shapefile you want is labelled Metropolitan/Micropolitan Statistical Area. Click on this link, and you will download a file named tl_2009_06_cbsa.zip. Once the file has downloaded, uncompress it and place the resulting Shapefile into a convenient location so that you can work with it.
  4. You now need to download the GNIS placename data for California. Go to the GNIS website:

    http://geonames.usgs.gov/domestic

  5. Click on the Download Domestic Names hyperlink, and then choose California from the pop-up menu. You will be prompted to save the CA_Features_XXX.zip file. Do so, then decompress it and place the resulting CA_Features_XXX.txt file into a convenient place.

    Tip

    The XXX in the above file name is a date stamp, and will vary depending on when you download the data. Just remember the name of the file as you'll need to refer to it in your source code.

  6. We're now ready to write the code. Let's start by reading through the CBSA urban area Shapefile and extracting the polygons that define the boundary of each urban area:
    shapefile = osgeo.ogr.Open("tl_2009_06_cbsa.shp")
    layer = shapefile.GetLayer(0)
    
    for i in range(layer.GetFeatureCount()):
        feature = layer.GetFeature(i)
        geometry = feature.GetGeometryRef()
        ...
    

    Note

    Make sure you add directory paths to your osgeo.ogr.Open() statement (and to the file() statement below) to match where you've placed these files.

  7. Using what we learned in the previous section, we can convert this geometry into a Shapely object so that we can work with it:
    wkt = geometry.ExportToWkt()
    shape = shapely.wkt.loads(wkt)
    
  8. Next, we need to scan through the CA_Features_XXX.txt file to identify the features marked as a park. For each of these features, we want to extract the name of the feature and its associated latitude and longitude. Here's how we might do this:
    f = file("CA_Features_XXX.txt", "r")
    for line in f.readlines():
    chunks = line.rstrip().split("|")
        if chunks[2] == "Park":
            name = chunks[1]
            latitude = float(chunks[9])
            longitude = float(chunks[10])
            ...
    

    Note

    Remember that the GNIS placename database is a pipe-delimited text file. That's why we have to split the line up using line.rstrip().split("|").

  9. Now comes the fun part—we need to figure out which parks are within or close to each urban area. There are two ways we could do this, either of which will work:
    • We could use the shape.distance() method to calculate the distance between the shape and a Point object representing the park's location:
      Task: Identify parks in or near urban areas
    • We could dilate the polygon using the shape.buffer() method, and then see if the resulting polygon contained the desired point:
      Task: Identify parks in or near urban areas
  • The second option is faster when dealing with a large number of points as we can pre-calculate the dilated polygons and then use them to compare against each point in turn. Let's take this option:
    # findNearbyParks.py
    
    import osgeo.ogr
    import shapely.geometry
    import shapely.wkt
    
    MAX_DISTANCE = 0.1 # Angular distance; approx 10 km.
    
    print "Loading urban areas..."
    
    urbanAreas = {} # Maps area name to Shapely polygon.
    
    shapefile = osgeo.ogr.Open("tl_2009_06_cbsa.shp")
    layer = shapefile.GetLayer(0)
    
    for i in range(layer.GetFeatureCount()):
        feature = layer.GetFeature(i)
        name = feature.GetField("NAME")
        geometry = feature.GetGeometryRef()
        shape = shapely.wkt.loads(geometry.ExportToWkt())
        dilatedShape = shape.buffer(MAX_DISTANCE)
        urbanAreas[name] = dilatedShape
    
    print "Checking parks..."
    
    f = file("CA_Features_XXX.txt", "r")
    for line in f.readlines():
        chunks = line.rstrip().split("|")
        if chunks[2] == "Park":
            parkName = chunks[1]
            latitude = float(chunks[9])
            longitude = float(chunks[10])
    
            pt = shapely.geometry.Point(longitude, latitude)
    
            for urbanName,urbanArea in urbanAreas.items():
                if urbanArea.contains(pt):
                    print parkName + " is in or near " + urbanName
    f.close()
    

    Note

    Don't forget to change the name of the CA_Features_XXX.txt file to match the actual name of the file you downloaded. You may also need to change the path names to the tl_2009_06_CBSA.shp file and the CA_Features file if you placed them in a different directory.

If you run this program, you will get a master list of all the parks that are in or close to an urban area:

% python findNearbyParks.py 
Loading urban areas...
Checking parks...
Imperial National Wildlife Refuge is in or near El Centro, CA
TwinLakesStateBeach is in or near Santa Cruz-Watsonville, CA
AdmiralWilliamStandleyState Recreation Area is in or near Ukiah, CA
Agate Beach County Park is in or near San Francisco-Oakland-Fremont, CA
...

Note that our program uses angular distances to decide if a park is in or near a given urban area. We looked at angular distances in Chapter 2. An angular distance is the angle (in decimal degrees) between two rays going out from the center of the Earth to the Earth's surface. Because a degree of angular measurement (at least for the latitudes we are dealing with here) roughly equals 100 km on the Earth's surface, an angular measurement of 0.1 roughly equals a real distance of 10 km.

Using angular measurements makes the distance calculation easy and quick to calculate, though it doesn't give an exact distance on the Earth's surface. If your application requires exact distances, you could start by using an angular distance to filter out the features obviously too far away, and then obtain an exact result for the remaining features by calculating the point on the polygon's boundary that is closest to the desired point, and then calculating the linear distance between the two points. You would then discard the points that exceed your desired exact linear distance. Implementing this would be an interesting challenge, though not one we will examine in this book.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.138.104