Working with GIS data manually

Let's take a brief look at the process of working with GIS data manually. Before we can begin, there are two things you need to do:

  • Obtain some GIS data
  • Install the GDAL Python library so that you can read the necessary data files

Let's use the U.S. Census Bureau's website to download a set of vector maps for the various U.S. states. The main site for obtaining GIS data from the U.S. Census Bureau can be found at:

To make things simpler, though, let's bypass the website and directly download the file we need:

The resulting file, tl_2009_us_state.zip, should be a ZIP-format archive. After uncompressing the archive, you should have the following files:

  • tl_2009_us_state.dbf

    tl_2009_us_state.prj

    tl_2009_us_state.shp

    tl_2009_us_state.shp.xml

    tl_2009_us_state.shx

These files make up a Shapefile containing the outlines of all the U.S. states. Place these files together in a convenient directory.

We next have to download the GDAL Python library. The main website for GDAL can be found at:

The easiest way to install GDAL onto a Windows or Unix machine is to use the FWTools installer, which can be downloaded from the following site:

If you are running Mac OS X, you can find a complete installer for GDAL at:

After installing GDAL, you can check that it works by typing import osgeo into the Python command prompt; if the Python command prompt reappears with no error message, GDAL was successfully installed and you are all set to go:

>>> import osgeo
>>>

Now that we have some data to work with, let's take a look at it. You can either type the following directly into the command prompt, or else save it as a Python script so you can run it whenever you wish (let's call this analyze.py):

importosgeo.ogr

shapefile = osgeo.ogr.Open("tl_2009_us_state.shp")
numLayers = shapefile.GetLayerCount()

print "Shapefile contains %d layers" % numLayers
print

forlayerNum in range(numLayers):
layer = shapefile.GetLayer(layerNum)
spatialRef = layer.GetSpatialRef().ExportToProj4()
numFeatures = layer.GetFeatureCount()
print "Layer %d has spatial reference %s" % (layerNum, spatialRef)
print "Layer %d has %d features:" % (layerNum, numFeatures)
print

forfeatureNum in range(numFeatures):
feature = layer.GetFeature(featureNum)
featureName = feature.GetField("NAME")

print "Feature %d has name %s" % (featureNum, featureName)

This gives us a quick summary of how the Shapefile's data is structured:

Shapefile contains 1 layers

Layer 0 has spatial reference +proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs
Layer 0 has 56 features:

Feature 0 has name American Samoa
Feature 1 has name Nevada
Feature 2 has name Arizona
Feature 3 has name Wisconsin
...
Feature 53 has name California
Feature 54 has name Ohio
Feature 55 has name Texas

This shows us that the data we downloaded consists of one layer, with 56 individual features corresponding to the various states and protectorates in the U.S. It also tells us the "spatial reference" for this layer, which tells us that the coordinates are stored as latitude and longitude values, using the GRS80 ellipsoid and the NAD83 datum.

As you can see from the above example, using GDAL to extract data from Shapefiles is quite straightforward. Let's continue with another example. This time, we'll look at the details for feature 2, Arizona:

importosgeo.ogr

shapefile = osgeo.ogr.Open("tl_2009_us_state.shp")
layer = shapefile.GetLayer(0)
feature = layer.GetFeature(2)

print "Feature 2 has the following attributes:"
print

attributes = feature.items()

forkey,value in attributes.items():
print "  %s = %s" % (key, value)
print

geometry = feature.GetGeometryRef()
geometryName = geometry.GetGeometryName()

print "Feature's geometry data consists of a %s" % geometryName

Running this produces the following:

Feature 2 has the following attributes:

  DIVISION = 8
  INTPTLAT = +34.2099643
  NAME = Arizona
  STUSPS = AZ
  FUNCSTAT = A
  REGION = 4
  LSAD = 00
  AWATER = 1026257344.0
  STATENS = 01779777
  MTFCC = G4000
  INTPTLON = -111.6024010
  STATEFP = 04
  ALAND = 294207737677.0

Feature's geometry data consists of a POLYGON

The meaning of the various attributes is described on the U.S. Census Bureau's website, but what interests us right now is the feature's geometry. A Geometry object is a complex structure that holds some geo-spatial data, often using nested Geometry objects to reflect the way the geo-spatial data is organized. So far, we've discovered that Arizona's geometry consists of a polygon. Let's now take a closer look at this polygon:

importosgeo.ogr

defanalyzeGeometry(geometry, indent=0):
    s = []
s.append("  " * indent)
s.append(geometry.GetGeometryName())
ifgeometry.GetPointCount() > 0:
s.append(" with %d data points" % geometry.GetPointCount())
ifgeometry.GetGeometryCount() > 0:
s.append(" containing:")

print "".join(s)

for i in range(geometry.GetGeometryCount()):
analyzeGeometry(geometry.GetGeometryRef(i), indent+1)

shapefile = osgeo.ogr.Open("tl_2009_us_state.shp")
layer = shapefile.GetLayer(0)
feature = layer.GetFeature(2)
geometry = feature.GetGeometryRef()

analyzeGeometry(geometry)

The analyzeGeometry() function gives a useful idea of how the geometry has been structured:

POLYGON containing:
  LINEARRING with 10168 data points

In GDAL (or more specifically the OGR Simple Feature library we are using here), polygons are defined as a single outer "ring" with optional inner rings that define "holes" in the polygon (for example, to show the outline of a lake).

Arizona is a relatively simple feature in that it consists of only one polygon. If we ran the same program over California (feature 53 in our Shapefile), the output would be somewhat more complicated:

MULTIPOLYGON containing:
  POLYGON containing:
    LINEARRING with 93 data points
  POLYGON containing:
    LINEARRING with 77 data points
  POLYGON containing:
    LINEARRING with 191 data points
  POLYGON containing:
    LINEARRING with 152 data points
  POLYGON containing:
    LINEARRING with 393 data points
  POLYGON containing:
    LINEARRING with 121 data points
  POLYGON containing:
    LINEARRING with 10261 data points

As you can see, California is made up of seven distinct polygons, each defined by a single linear ring. This is because California is on the coast, and includes six outlying islands as well as the main inland body of the state.

Let's finish this analysis of the U.S. state Shapefile by answering a simple question: what is the distance from the northernmost point to the southernmost point in California? There are various ways we could answer this question, but for now we'll do it by hand. Let's start by identifying the northernmost and southernmost points in California:

importosgeo.ogr

deffindPoints(geometry, results):
for i in range(geometry.GetPointCount()):
x,y,z = geometry.GetPoint(i)
if results['north'] == None or results['north'][1] < y:
results['north'] = (x,y)
if results['south'] == None or results['south'][1] > y:
results['south'] = (x,y)

for i in range(geometry.GetGeometryCount()):
findPoints(geometry.GetGeometryRef(i), results)

shapefile = osgeo.ogr.Open("tl_2009_us_state.shp")
layer = shapefile.GetLayer(0)
feature = layer.GetFeature(53)
geometry = feature.GetGeometryRef()

results = {'north' : None,
'south' : None}

findPoints(geometry, results)

print "Northernmost point is (%0.4f, %0.4f)" % results['north']
print "Southernmost point is (%0.4f, %0.4f)" % results['south']

The findPoints() function recursively scans through a geometry, extracting the individual points and identifying the points with the highest and lowest y (latitude) values, which are then stored in the results dictionary so that the main program can use it.

As you can see, GDAL makes it easy to work with the complex Geometry data structure. The code does require recursion, but is still trivial compared with trying to read the data directly. If you run the above program, the following will be displayed:

Northernmost point is (-122.3782, 42.0095)
Southernmost point is (-117.2049, 32.5288)

Now that we have these two points, we next want to calculate the distance between them. As described earlier, we have to use a great circle distance calculation here to allow for the curvature of the Earth's surface. We'll do this manually, using the Haversine formula:

import math

lat1 = 42.0095
long1 = -122.3782

lat2 = 32.5288
long2 = -117.2049

rLat1 = math.radians(lat1)
rLong1 = math.radians(long1)
rLat2 = math.radians(lat2)
rLong2 = math.radians(long2)

dLat = rLat2 - rLat1
dLong = rLong2 - rLong1
a = math.sin(dLat/2)**2 + math.cos(rLat1) * math.cos(rLat2) 
                        * math.sin(dLong/2)**2
c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
distance = 6371 * c

print "Great circle distance is %0.0f meters" % distance

Don't worry about the complex maths involved here; basically, we are converting the latitude and longitude values to radians, calculating the difference in latitude/longitude values between the two points, and then passing the results through some trigonometric functions to obtain the great circle distance. The value of 6371 is the radius of the Earth in kilometers.

More details about the Haversine formula and how it is used in the above example can be found at http://mathforum.org/library/drmath/view/51879.html.

If you run the above program, your computer will tell you the distance from the northernmost point to the southernmost point in California:

Great circle distance is 1149 meters

There are, of course, other ways of calculating this. You wouldn't normally type the Haversine formula directly into your program as there are libraries that will do this for you. But we deliberately did the calculation this way to show just how it can be done.

If you would like to explore this further, you might like to try writing programs to calculate the following:

  • The easternmost and westernmost points in California.
  • The midpoint in California. Hint: you can calculate the midpoint's longitude by taking the average of the easternmost and westernmost longitude.
  • The midpoint in Arizona.
  • The distance between the middle of California and the middle of Arizona.

As you can see, working with GIS data manually isn't too troublesome. While the data structures and maths involved can be rather complex, using tools such as GDAL make your data accessible and easy to work with.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.216.75