Chapter 7. Vector analysis with OGR

This chapter covers

  • Determining if geometries share a spatial location
  • Proximity relationships between geometries

Now you know how to access existing data and how to build your own geometries from scratch, but I see these as gateways to the more interesting task of spatial analysis. Without analysis capabilities, spatial data is only useful for making maps. Good cartography is essential for many things, but I imagine that even cartographers would get bored if new datasets weren’t continually created from various types of analyses. Plus, spatial analyses can answer countless questions relating to pretty much every discipline. In fact, you’re probably more likely to generate new data using the analysis functions described in this chapter than by creating geometries vertex by vertex as outlined previously in nauseating detail.

Spatial analysis with vector data comes down to looking at the spatial relationships between two or more geometries. Possible studies range from the extremely simple, such as the distance between two points, to much more complex algorithms such as network analyses. Have you ever wondered how certain mapping websites can provide you with various route options from point A to point B, and even provide travel times? That’s network analysis. One easy exercise that I sometimes find entertaining is comparing the distance I hiked with the straight-line distance between the starting and ending points, because these two distances can be significantly different in mountainous terrain. There probably isn’t much use for that particular example in my life, other than to satisfy my curiosity, but it’s important information for search-and-rescue teams who need to know actual distances. There are plenty of other important questions out there waiting to be answered using spatial analysis techniques.

For example, biologists can use the information downloaded from GPS collars to study how animals use various habitat types or their reactions to roads or other man-made features. Businesses use spatial data to help determine the best location for new stores or factories. Utility companies can use this type of data to select the best routes to install pipelines or electrical transmission lines, and mining companies use geographic information to determine areas that are likely rich in resources. If you’re reading this book, it’s likely that you have a specific type of analysis in mind, and it’s probably completely different from any of the examples mentioned. Spatial analyses are ubiquitous, and in fact, you use these sorts of analyses in your daily life when you choose where to live or what route to take to the office. OGR provides a good foundation for vector analysis, although it’s left to you to implement more-complicated algorithms that you may be interested in. This section will introduce you to the basic tools that make up this foundation.

7.1. Overlay tools: what’s on top of what?

One basic question in geographic analysis is what features occur at the same place. Certain entities, such as countries, don’t occur in the same location, although they may share borders. Other types of areas, such as the home ranges of individual bears, can easily overlap, as can boundaries that aren’t necessarily related, such as wetlands and land ownership. Many types of queries are concerned with this overlap idea. For example, insurance companies want to know if a parcel of land is on a floodplain before they set a premium, or even decide to insure it at all. A business looking for land to build a factory on would want to know which lots for sale are within an appropriate municipal land use zone. If you’re making a map of Stockholm, you’ll want to know which roads, train tracks, and parks, among other things, are within city limits.

What sorts of overlap tools exist? Several test certain conditions, such as Intersects, which tells you if two geometries share any space in common. For example, in figure 7.1, the line L2 intersects with the line L3 and the polygon P3. The polygons P2 and P4 also intersect. You can find out if two geometries touch edges, but don’t actually share any area, with Touches. This is also true for lines L2 and L3, but not L2 and P3, because they do more than touch. How about discovering if one geometry is contained completely within another? You can test that with either Contains or Within. Polygon P5 is within polygon P1, and P1 contains P5. See table 7.1 for a list of the available operations, along with examples for each one from figure 7.1. Note that while these functions work with polygons, they don’t work with linear rings. All functions return True or False. More information can be found in appendix C. (Appendixes C through E are available online on the Manning Publications website at https://www.manning.com/books/geoprocessing-with-python.)

Figure 7.1. Geometries used to get the results of overlay operations that are shown in table 7.1 and figure 7.2.

Table 7.1. Functions to test relationships between geometries. These all return True or False

Operation: Examples from figure 7.1

Intersects: Polygons P2 and P4 intersect. Line L3 and Point B intersect. Point A and Polygon P2 intersect. Lines L2 and L3 intersect. Line L2 and Polygon P3 intersect.
Touches: Polygon P2 and Point A touch. Polygon P5 and Point D do not touch. Lines L2 and L3 touch. Lines L1 and L3 do not touch.
Crosses: Lines L1 and L3 cross. Lines L2 and L3 do not cross.
Within: Line L1 is within Polygon P2. Line L3 is not within Polygon P2.
Contains: Polygon P1 contains Polygon P5. Polygon P2 does not contain Polygon P4.
Overlaps: Polygons P2 and P4 overlap. Polygons P1 and P5 do not overlap.
Disjoint: Polygon P1 and Line L1 are disjoint. Polygons P1 and P4 are disjoint.

Several functions create new geometries based on the spatial relationships of existing geometries. For example, you can use Intersection to get a new geometry that represents only the area that two others have in common. In figure 7.1, the intersection of L1 and L3 is a single point; the intersection of L2 and P3 is a short segment from L2; and the intersection of P2 and P4 is shown in figure 7.2. You would probably use Intersection to create new datasets containing features only found within the Stockholm boundary when making the map mentioned earlier.

Figure 7.2. The results of several overlay operations on the P2 and P4 geometries from figure 7.1 are shown as hatched areas with dark outlines.

You can combine the areas of two existing geometries into one with Union, which may return a geometry collection if the input geometries are different types. You can treat a geometry collection kind of like a multigeometry, except that the parts don’t all need to be the same kind of geometry. For example, the union of L2 and P3 is a geometry collection containing a polygon and two lines, as shown in figure 7.3. The section of L2 that intersects P3 no longer exists as a line, and instead the space it takes up is included in the polygon. The union of P2 and P4 is a single polygon, as shown in figure 7.2. You might use this function if you were given a roads dataset in which the roads were broken up into segments based on changes in speed limits, which would be required for an analysis looking at travel time, but you want each road to be a single feature so it’s easier to use in a map.

Figure 7.3. The three parts of the geometry collection created by unioning L2 and P3 together.

It’s also possible to clip an intersection out of a geometry so that you’re left with the part of the geometry that doesn’t intersect the second geometry. Unlike Intersection and Union, the results from Difference depend on which geometry the function is called on and which is passed to it. This is also illustrated in figure 7.2.

There’s also SymDifference, which returns the union of two geometries with the intersection removed. If you were looking at the home ranges, or territories, of two different mountain lions, you might want to know the area that the first cat uses but the second doesn’t, or vice versa. You’d use Difference to get that information. You could use SymDifference to determine the area that was used by either lion, but not both. Intersection would give you the shared territory, and Union would provide the combined territories. Each type of information is likely useful to a cougar researcher, but each in a different way. In fact, it was a study similar to this, although on a threatened species of lizard and much more sophisticated than this simplified example, that got me hooked on GIS in the first place!

Let’s look at a concrete example. Figure 7.4 might remind you of our discussion of wetlands within the boundaries of New Orleans back in chapter 3. You’re about to look at two different ways of using intersections to determine the percentage of New Orleans made up by wetlands. But first, it will be helpful to do a little interactive exercise with the data to visualize what’s happening. Open the water bodies shapefile for the United States, which contains features such as lakes, streams, canals, and marshes, and plot one specific feature that represents a marsh near New Orleans. This shapefile has approximately 27,000 features, so don’t try to plot the entire file unless you want to wait all day.

Figure 7.4. A simple map of New Orleans showing the city boundary, water, and wetlands

>>> water_ds = ogr.Open(r'D:\osgeopy-data\US\wtrbdyp010.shp')
>>> water_lyr = water_ds.GetLayer(0)

>>> water_lyr.SetAttributeFilter('WaterbdyID = 1011327')
>>> marsh_feat = water_lyr.GetNextFeature()
>>> marsh_geom = marsh_feat.geometry().Clone()
>>> vp.plot(marsh_geom, 'b')

You should now see something similar to figure 7.5, but without the city boundary. Add the New Orleans boundary to provide a little context:

>>> nola_ds = ogr.Open(r'D:\osgeopy-data\Louisiana\NOLA.shp')
>>> nola_lyr = nola_ds.GetLayer(0)
>>> nola_feat = nola_lyr.GetNextFeature()
>>> nola_geom = nola_feat.geometry().Clone()
>>> vp.plot(nola_geom, fill=False, ec='red', ls='dashed', lw=3)
Figure 7.5. The New Orleans city boundary is shown as a dashed line overlaid on a single, but large, marsh polygon from the United States water bodies dataset.

You now have two polygons, one for New Orleans and one for a marsh that’s partly contained within the New Orleans boundary. Now intersect the two geometries:

>>> intersection = marsh_geom.Intersection(nola_geom)
>>> vp.plot(intersection, 'yellow', hatch='x')

You can see from figure 7.6 that the intersection geometry consists of the area that’s contained within both the city boundary and the marsh polygon. How can you use this to figure out how much of New Orleans is wetlands? Well, if you intersect the city boundary with all of the wetland polygons that it overlaps, then you’ll end up with a bunch of polygons that represent wetlands within the boundary. All you need to do then is sum up their areas and divide by the area of the New Orleans geometry. Let’s assume that anything in the water bodies dataset that’s not a lake is a wetland, and try this:

Figure 7.6. The result of intersecting the New Orleans boundary with the marsh is shown in the hatched area.

The first thing you do is change the attribute filter on the water bodies so that lakes, specifically Lake Pontchartrain, are ignored. Then you use a spatial filter to toss out all of the features not in the vicinity of New Orleans, which gets rid of almost everything in the shapefile. This step isn’t technically necessary, but it speeds up processing time considerably because you get to ignore most of the dataset. Then you loop through the remaining water bodies, intersect each one with the New Orleans geometry, and add the intersection area to a running total. When done with the loop, all you need to do is divide by the area of New Orleans to get your answer.

Tip

Filtering out unneeded features, either with spatial or attribute filters, can significantly decrease your processing time.

There’s an easier way to do this, however, if you want to work with layers instead of individual geometries. In this case, OGR takes care of looping through the geometries in the layers for you. Let’s intersect the New Orleans boundary with the water layer to get the area in common between the two:

As before, you limit the water bodies to the non-lakes, but you don’t perform a spatial filter because the layer intersection handles that. An empty layer is required for a layer intersection, however, so you do need the extra step of creating that. Because there’s no reason to save the layer, you use the memory driver to create the data source and layer. This driver doesn’t write anything out to disk, so it’s a good choice for temporary data. Once you have the empty layer, you pass it to the layer Intersection function, which populates it with the intersection of nola_lyr and water_lyr.

Once you have the intersected area, you can use a SQL statement to sum up the areas of all geometries in temp_lyr. Remember that ExecuteSQL returns a new layer object, so you need to get the first feature from it in order to access the results of the SUM function:

>>> sql = 'SELECT SUM(OGR_GEOM_AREA) AS area FROM temp'
>>> lyr = temp_ds.ExecuteSQL(sql)
>>> pcnt = lyr.GetFeature(0).GetField('area') / nola_geom.GetArea()
>>> print('{:.1%} of New Orleans is wetland'.format(pcnt))
28.7% of New Orleans is wetland

One more important detail is that functions that operate on entire layers instead of individual geometries preserve the attribute values from the input layers. This is handy if you still need the information about each feature. In this case you don’t need it, but think about the mountain lion home range example, but with even more cats. The researcher would almost definitely want to know which two cougars were sharing the same habitat, and a layer intersection would keep this information in the results, assuming it was in the original attribute tables.

7.2. Proximity tools: how far apart are things?

Another common problem when analyzing geographic features is determining how far apart they are from one another. For example, many municipalities have regulations concerning the types of businesses allowed within a certain distance of a church or school, and proximity to a large customer base is another important factor when considering business locations. Or how about an ornithologist trying to determine how roads affect the nesting sites chosen by various species of birds? He would need to measure the distance between each nest and the closest roads as part of his study.

Two proximity tools are included with OGR, one to measure distance between geometries and one to create buffer polygons. A buffer is a polygon that extends out a certain distance from the original geometry. Figure 7.7 shows the yard geometries from chapter 6 with buffers around them, although they’re not in their true yard configuration so that you can see the buffers better. You could use a buffer to visualize which businesses were within walking distance of your location, or to make sure that you didn’t build a pizza joint within a certain distance of an existing one. You could also buffer a stream geometry to get an idea of the riparian area surrounding it, or to show where cattle aren’t allowed to graze and risk damaging the ecosystem.

Figure 7.7. The geometries from the make-believe yard shown along with buffer geometries. Notice how the buffer for multigeometries becomes a single polygon if the individual buffers overlap.

Tip

Unprojected datasets (those using latitude and longitude) are fine for displaying data in many cases, but can be a poor choice when it comes to analysis. Think about how the longitudinal lines on a globe converge on the poles. One longitudinal degree at 40° latitude is shorter than one degree at the equator, which makes comparing distances at different latitudes extremely problematic. You’re much better off converting your data to a different coordinate system with a constant unit of measure.

As a buffering example, let’s figure out how many cities in the United States are within 10 miles of a volcano. We’ll use datasets that have an Albers projection so that the map units are meters instead of decimal degrees. We’ll also use this example to highlight a potential source of error when doing analyses like this. The first step in your analysis will be to buffer a volcano dataset by 16,000 meters, which is roughly equivalent to 10 miles. Because there isn’t a buffer function on an entire layer, you’ll buffer each volcano point individually and add it to a temporary layer. Once that’s done, you can intersect the buffer layer with the cities layer to get the number of cities that fall within that 10-mile radius. All of this is shown in the following listing.

Listing 7.1. A flawed method for determining the number of cities near volcanoes

From this you could conclude that there are 83 cities in the United States that are within 10 miles of a volcano. But for good measure, try doing the same thing with the slightly different method shown in listing 7.2. This time you’ll add the buffers to a multipolygon instead of a temporary layer. A function called UnionCascaded efficiently unions all of the polygons in a multipolygon together; you’ll use this to create one polygon from all of the volcano buffers and then use the result as a spatial filter on the cities layer.

Listing 7.2. A better method for determining the number of cities near volcanoes

Huh, somehow you lost five cities in the last few minutes, which is a little disconcerting. What happened? In the first example, a copy of a city is included in the output every time it falls within a volcano buffer. This means a city will be included more than once if it’s within 16,000 meters of multiple volcanoes. This happened with a few cities, which is why the count from the intersection method was wrong, and higher than from the spatial filter method. This is a good example of why you should always think through your methodology carefully, because the “obvious” solution might be incorrect and provide the wrong results.

Tip

Use UnionCascaded when you need to union many geometries together. It will be much faster than joining them one by one.

We’ll look at one last example. Perhaps you want to know how far a particular city is from a certain volcano. The first thing you need to do is get the geometries for the city and volcano of interest:

>>> volcano_lyr.SetAttributeFilter("NAME = 'Rainier'")
>>> feat = volcano_lyr.GetNextFeature()
>>> rainier = feat.geometry().Clone()

>>> cities_lyr.SetAttributeFilter("NAME = 'Seattle'")
>>> feat = cities_lyr.GetNextFeature()
>>> seattle = feat.geometry().Clone()

Once you have the geometries, you can use the Distance function to ask them how far apart they are from each other:

>>> meters = round(rainier.Distance(seattle))
>>> miles = meters / 1600
>>> print('{} meters ({} miles)'.format(meters, miles))
92656 meters (57.91 miles)

The city of Seattle is approximately 58 miles from Mount Rainier, which is considered an active volcano. Of course, you’d get a different answer if you used actual city boundaries instead of a point, but I doubt that the fine people of Seattle would appreciate the distinction if the mountain did erupt.

2.5D geometries

You may remember from the last chapter that geometries with z values are considered 2.5D in OGR because the z values aren’t used when performing spatial operations. To illustrate this, let’s look at the distance between two points:

>>> pt1_2d = ogr.Geometry(ogr.wkbPoint)
>>> pt1_2d.AddPoint(15, 15)
>>> pt2_2d = ogr.Geometry(ogr.wkbPoint)
>>> pt2_2d.AddPoint(15, 19)
>>> print(pt1_2d.Distance(pt2_2d))
4.0

That returns a distance of 4 units, as expected. Now try the same thing but with 2.5D points:

>>> pt1_25d = ogr.Geometry(ogr.wkbPoint25D)
>>> pt1_25d.AddPoint(15, 15, 0)
>>> pt2_25d = ogr.Geometry(ogr.wkbPoint25D)
>>> pt2_25d.AddPoint(15, 19, 3)
>>> print(pt1_25d.Distance(pt2_25d))
4.0

That also returned a distance of 4, but taking the elevation values into account, the real distance is 5. Clearly, the z values weren’t used in the calculation. How about an area example? This polygon is 10 units long on each side, so it should have an area of 100:

>>> ring = ogr.Geometry(ogr.wkbLinearRing)
>>> ring.AddPoint(10, 10)
>>> ring.AddPoint(10, 20)
>>> ring.AddPoint(20, 20)
>>> ring.AddPoint(20, 10)
>>> poly_2d = ogr.Geometry(ogr.wkbPolygon)
>>> poly_2d.AddGeometry(ring)
>>> poly_2d.CloseRings()
>>> print(poly_2d.GetArea())
100.0

You got the expected result there, but try moving the right-most edge to a higher elevation so that the rectangle is tilted into the third dimension:

>>> ring = ogr.Geometry(ogr.wkbLinearRing)
>>> ring.AddPoint(10, 10, 0)
>>> ring.AddPoint(10, 20, 0)
>>> ring.AddPoint(20, 20, 10)
>>> ring.AddPoint(20, 10, 10)
>>> poly_25d = ogr.Geometry(ogr.wkbPolygon25D)
>>> poly_25d.AddGeometry(ring)
>>> poly_25d.CloseRings()
>>> print(poly_25d.GetArea())
100.0

This new rectangle also claims to have an area of 100 but in reality, the area is closer to 141.

Overlay operations also ignore the elevation values. For example, if elevation were accounted for, pt1_2d would be contained in the 2D polygon but not in the 2.5D one, which isn’t what we see:

>>> print(poly_2d.Contains(pt1_2d))
True
>>> print(poly_25d.Contains(pt1_2d))
True

Now you know the basics of spatial analysis with vector data. You might not need to do anything more complicated than what you’ve seen here, but if you do, these tools are the building blocks with which to start.

7.3. Example: locating areas suitable for wind farms

Let’s do a simple analysis to look for suitable wind farm locations in Imperial County, California. The United States National Renewable Energy Laboratory provides a wind dataset that shows areas in the United States that are suitable for wind farms based on wind speed and abundance, and geographical factors such as terrain (figure 7.8). Areas are rated on a scale of 1 to 7, where anything 3 and above is generally considered suitable. We’ll combine this with census data to locate areas with an appropriate wind rating and a population less than 0.5 per square kilometer.

Figure 7.8. Census and wind data for Imperial County, CA. The darker the shading, the better the wind conditions for a wind farm. The hatched area shows census tracts with a population density less than 0.5/km2.

The census dataset contains population per census tract, but doesn’t have a population density attribute. You can calculate that given the tract area and the population, however, so the first thing to do is add a field containing that information:

census_fn = r'D:\osgeopy-data\California\ca_census_albers.shp'
census_ds = ogr.Open(census_fn, True)
census_lyr = census_ds.GetLayer()
density_field = ogr.FieldDefn('popsqkm', ogr.OFTReal)
census_lyr.CreateField(density_field)
for row in census_lyr:
    pop = row.GetField('HD01_S001')
    sqkm = row.geometry().GetArea() / 1000000
    row.SetField('popsqkm', pop / sqkm)
    census_lyr.SetFeature(row)

You open the census shapefile for editing and add a floating-point field. Then you loop through each row and calculate the population density. The map units for this dataset are meters, so the geometry’s area is square meters, but you convert that to square kilometers by dividing by 1,000,000. You grab the tract population from the HD01_S001 field and divide by the calculated area to get population per km2.

Now get the geometry for Imperial County so that you can use it to spatially limit your analysis. You don’t need to keep the county data source open after cloning the geometry.

county_fn = r'D:\osgeopy-data\US\countyp010.shp'
county_ds = ogr.Open(county_fn)
county_lyr = county_ds.GetLayer()
county_lyr.SetAttributeFilter("COUNTY = 'Imperial County'")
county_row = next(county_lyr)
county_geom = county_row.geometry().Clone()
del county_ds

There’s one problem, though. The county data uses coordinates that are latitude and longitude values, but the census and wind datasets use meters. You’ll learn how to work with different spatial reference systems like these in the next chapter, but for now please trust me that this bit of code will convert the county geometry to the correct coordinate system:

county_geom.TransformTo(census_lyr.GetSpatialRef())
census_lyr.SetSpatialFilter(county_geom)
census_lyr.SetAttributeFilter('popsqkm < 0.5')

Once the geometry is converted, you use it to set a spatial filter on the census tract data so you’ll only be considering tracts in the correct part of the state. You also set an attribute filter to further limit the tracts to those with a low population density.

Now open the wind dataset and use an attribute filter to limit it to the areas with a rating of 3 or better:

wind_fn = r'D:\osgeopy-data\California\california_50m_wind_albers.shp'
wind_ds = ogr.Open(wind_fn)
wind_lyr = wind_ds.GetLayer()
wind_lyr.SetAttributeFilter('WPC >= 3')

It makes sense to create a data source to put the results in before starting any analysis, so let’s do that now. Create a new shapefile that uses the same spatial reference system as the wind data, and then add fields for the wind rating and the population density. You might as well use the layer’s definition to create an empty feature for inserting data later, too.

out_fn = r'D:\osgeopy-data\California\wind_farm.shp'
out_ds = ogr.GetDriverByName('ESRI Shapefile').CreateDataSource(out_fn)
out_lyr = out_ds.CreateLayer(
    'wind_farm', wind_lyr.GetSpatialRef(), ogr.wkbPolygon)
out_lyr.CreateField(ogr.FieldDefn('wind', ogr.OFTInteger))
out_lyr.CreateField(ogr.FieldDefn('popsqkm', ogr.OFTReal))
out_row = ogr.Feature(out_lyr.GetLayerDefn())

You’re finally ready to look for possible wind farm locations. In the next listing, you’ll loop through the census tracts, intersect them with the suitable wind polygons, and put the results in your new shapefile.

Listing 7.3. Intersecting census and wind data

There’s an extra step needed to get the results you want, however. Unfortunately, the census and county boundaries don’t line up exactly (figure 7.9), which means that a census tract that barely overlaps the county because of this data error would be used to select wind polygons even though you don’t need it. One way to deal with this is to intersect the census and county polygons so that you only use the part of the census polygon that falls within the county polygon (for example, the tiny sliver in figure 7.9). Once you’ve found this intersection, then you can use a spatial filter to select the wind polygons that it contains or overlaps.

Figure 7.9. The solid census tract boundary doesn’t line up perfectly with the dotted county boundary.

After setting the spatial filter, you iterate through the selected wind polygons and intersect each of them with the census polygon. This throws out parts of a census tract that don’t get enough wind or suitable wind areas with too high of a population density. The attribute filter remains in effect, even with the spatial filter changes, so this is always limited to the suitable wind polygons. You add each of these intersection polygons to the new dataset, along with attributes for wind class and population density.

Figure 7.10 is zoomed in on part of the results. You’re close, but it would be nice to have large polygons instead of many small ones. This will lose the information about wind suitability class and population density, but at this point you know that all of your polygons are appropriate, anyway.

Figure 7.10. Suitable wind farm locations according to our analysis. The darker the shading, the higher the wind rating.

The fastest way to combine the little polygons into one large one is to use the UnionCascaded function, which requires that the polygons to be joined are all contained in a single multipolygon. It works correctly only if you add individual polygons to the multipolygon, however. If you add a multipolygon, then you’ll get incorrect results later, so you need to break up any multipolygons created by your earlier intersections and add each one individually. The following listing shows this process.

Listing 7.4. Combining small polygons into large ones

After you union all of the polygons together into one large multipolygon, you go through it and break it up into individual polygons that you add to the new shapefile. Small islands of land that aren’t big enough to hold a wind farm can be thrown out, so you only keep the polygons with an area of at least a square kilometer. The results are shown in figure 7.11, and you can see that some of these little polygons that were off by themselves are now gone.

Figure 7.11. The results from unioning the small polygons in figure 7.10 together and throwing out the small island polygons

A dataset like this, with only large polygons, is probably easier to work with than one with many small polygons, as long as you don’t need the information that’s lost by joining them all together.

7.4. Example: animal tracking data

The website https://www.movebank.org/ has a database of animal tracking data for studies all over the world. I downloaded GPS location data for Galapagos Albatrosses as a CSV file, but let’s convert it into a shapefile and then play with the data a bit. You can use the x and y coordinates from the location-long and location-lat columns to create a point and copy that and the individual-local-identifier and timestamp columns as attributes. The shapefile format doesn’t support true date/time fields, so you’ll keep the timestamp information as a string. The code for this is shown in the following listing.

Listing 7.5. Create a shapefile from a .csv file

Unfortunately, if you plot this new shapefile or open it up in a GIS, you’ll see bad points over by Africa (figure 7.12). There must have been an error with the data collection for these points, so their latitude and longitude values are set to 0. Let’s get rid of them.

Figure 7.12. A few bad GPS locations by Africa instead of South America

Because you know that the bad points have coordinates of (0, 0), you can set a spatial filter to select those points and then delete them one by one:

shp_ds = ogr.Open(shp_fn, True)
shp_lyr = shp_ds.GetLayer()
shp_lyr.SetSpatialFilterRect(-1, -1, 1, 1)
for shp_row in shp_lyr:
    shp_lyr.DeleteFeature(shp_row.GetFID())
shp_lyr.SetSpatialFilter(None)
shp_ds.ExecuteSQL('REPACK ' + shp_lyr.GetName())
shp_ds.ExecuteSQL('RECOMPUTE EXTENT ON ' + shp_lyr.GetName())
del shp_ds

Don’t forget to use REPACK to permanently delete the points and RECOMPUTE EXTENT to recalculate the shapefile’s spatial extent. Now all of the points are between the Galapagos Islands and South America, as shown in figure 7.13.

Figure 7.13. GPS locations for Galapagos Albatrosses

Now that the bad points are gone, you can think about doing some analysis. The first things I think of doing with GPS tracking data from animals are to see how far they move and to look at the area they use. Unfortunately, latitude/longitude data in degrees isn’t ideal for this, but that’s the coordinate system used by these points. Because you won’t learn how to work with spatial references and coordinate systems until the next chapter, let’s see how to convert between coordinate systems using the ogr2ogr command-line utility. Remember that you need to run this from a terminal window or command prompt, not from Python. You’ll also need to make sure that you’re in the same folder as the albatross_dd shapefile.

You’ll convert the coordinates to a system that uses meters rather than degrees as units of measure. Not only are meters easier to understand (most people probably can’t visualize half a degree very well), but they’re constant, unlike degrees that change with latitude. The system you’ll use is called Lambert Conformal Conic, and you’ll use a variation of it that’s specific for South America. The parts of this command after -t_srs and up to +no_defs are what define the coordinate system. The output will be a shapefile called albatross_lambert.shp.

ogr2ogr -f "ESRI Shapefile" -t_srs "+proj=lcc +lat_1=-5 +lat_2=-42
 +lat_0=-32 +lon_0=-60 +x_0=0 +y_0=0 +ellps=aust_SA +units=m +no_defs"
 albatross_lambert.shp albatross_dd.shp

Now you’ve got a shapefile that uses meters, so let’s calculate the distance between each location. To do this, you need to select the points for an individual bird, so let’s write a function that will get unique values from an attribute column. You can use the function from the following listing to get tag_id values in later listings.

Listing 7.6. Function to get unique values from an attribute column
def get_unique(datasource, layer_name, field_name):
    sql = 'SELECT DISTINCT {0} FROM {1}'.format(field_name, layer_name)
    lyr = datasource.ExecuteSQL(sql)
    values = []
    for row in lyr:
        values.append(row.GetField(field_name))
    datasource.ReleaseResultSet(lyr)
    return values

To calculate distances, you’ll iterate through the points for each bird in order and then calculate the distance between each location and the previous one, so you’ll need to keep track of the previous point as you loop. The points should be in the correct order in the original .csv file, which means they’re also in order in the shapefile you created, but you’ll add code to check, just in case. If it does find something out of order, it will bail so that you can correct the problem. The following listing shows the process.

Listing 7.7. Calculating distance between adjacent points

Before starting the loop, you save the timestamp and point geometry for the first location so that you can use them to calculate the distance between the first and second features. Reading that first feature advances the layer’s internal cursor, so the loop starts with the second feature instead of the first, and you calculate a distance between the first and second points in the first iteration. After saving the distance, you store the timestamp and geometry for the current feature in your “previous” variables, so that the next time through the loop you’ll have this information. If you hadn’t stored the current values, you’d always calculate the distance to the first point, because that’s the one originally stored in previous_pt.
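The listing itself doesn’t appear in this excerpt, but the loop just described can be sketched as follows. This is a minimal sketch, not the book’s exact code: `planar_distance` and `add_distances` are names of my own invention, and the field names (`tag_id`, `timestamp`, `distance`) come from the surrounding text. Running `add_distances` requires GDAL/OGR and the albatross_lambert shapefile, but the distance arithmetic itself is plain Python:

```python
from math import hypot

def planar_distance(pt1, pt2):
    """Straight-line distance between two (x, y) tuples, in the layer's
    linear units (meters for the Lambert shapefile)."""
    return hypot(pt2[0] - pt1[0], pt2[1] - pt1[1])

def add_distances(lyr, tag_id):
    """Store the distance from each point to the previous one in a
    'distance' field, bailing out if timestamps are out of order.
    lyr is an OGR layer, so this needs GDAL/OGR to actually run."""
    lyr.SetAttributeFilter("tag_id = '{0}'".format(tag_id))
    row = next(iter(lyr))                      # first feature
    previous_time = row.GetField('timestamp')
    previous_pt = (row.geometry().GetX(), row.geometry().GetY())
    for row in lyr:                            # starts at the second feature
        current_time = row.GetField('timestamp')
        # The zero-padded timestamp format sorts correctly as strings.
        if current_time < previous_time:
            raise ValueError('Features are out of order')
        current_pt = (row.geometry().GetX(), row.geometry().GetY())
        row.SetField('distance', planar_distance(previous_pt, current_pt))
        lyr.SetFeature(row)                    # save the edited feature
        previous_time, previous_pt = current_time, current_pt
    lyr.SetAttributeFilter(None)
```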

Now it’s an easy matter to get information about the distances. For example, you could use SQL to find out which bird had the longest distance between GPS fixes:

ds = ogr.Open(r'D:\osgeopy-data\Galapagos')
for tag_id in get_unique(ds, 'albatross_lambert', 'tag_id'):
    sql = """SELECT MAX(distance) FROM albatross_lambert
             WHERE tag_id = '{0}'""".format(tag_id)
    lyr = ds.ExecuteSQL(sql)
    for row in lyr:
        print('{0}: {1}'.format(tag_id, row.GetField(0)))

The first few lines of output look like this:

4264-84830852: 106053.530233
4266-84831108: 167097.198703
1103-1103: 69342.7642097

What if later you want to know the maximum travel speed from one point to the next? You’ve got the distances, but you need to know the amount of time in between GPS readings to calculate speed. The fact that the timestamp field is a string, not a date/time, presents a small but easily surmountable problem. Fortunately, you can create Python datetime objects from a string as long as you can tell it how the string is formatted. The timestamps in your dataset look like this:

timestamp = '2008-05-31 13:30:02.001'

You can create a format string that matches this using the information at https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior, and then use the strptime function to convert the string to a datetime:

from datetime import datetime

date_format = '%Y-%m-%d %H:%M:%S.%f'
my_date = datetime.strptime(timestamp, date_format)

The following listing shows how to use this information to find the maximum travel speed between each location. This won’t be completely accurate because it’s probably rare that a bird has been flying the entire time between readings, but at least it’s a start.

Listing 7.8. Find maximum speed from locations and elapsed time

As with finding distance, you need to keep track of the previous point so that you can find the length of time between GPS fixes. After getting that information, you divide the distance by the number of hours between readings to get speed in meters per hour.
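The listing isn’t reproduced in this excerpt, but the calculation it describes can be sketched as below. This is a hedged sketch, not the book’s listing: `meters_per_hour` and `max_speed` are hypothetical helper names, and the timestamp format is the one shown earlier. `max_speed` needs GDAL/OGR and the distance field from listing 7.7 to actually run, but the speed arithmetic is plain Python:

```python
from datetime import datetime

DATE_FORMAT = '%Y-%m-%d %H:%M:%S.%f'

def meters_per_hour(previous_time, current_time, distance):
    """Speed between two timestamp strings, given a distance in meters."""
    elapsed = (datetime.strptime(current_time, DATE_FORMAT) -
               datetime.strptime(previous_time, DATE_FORMAT))
    return distance / (elapsed.total_seconds() / 3600.0)

def max_speed(lyr, tag_id):
    """Maximum meters per hour between consecutive fixes for one bird.
    lyr is an OGR layer with 'timestamp' and 'distance' fields."""
    lyr.SetAttributeFilter("tag_id = '{0}'".format(tag_id))
    row = next(iter(lyr))
    previous_time = row.GetField('timestamp')
    fastest = 0
    for row in lyr:
        current_time = row.GetField('timestamp')
        speed = meters_per_hour(previous_time, current_time,
                                row.GetField('distance'))
        fastest = max(fastest, speed)
        previous_time = current_time
    lyr.SetAttributeFilter(None)
    return fastest
```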

Now let’s take a look at the areas used by each bird. Sophisticated methods are available for determining an animal’s home range, but we’ll use convex hull polygons because they’re simple and OGR has them built in. To do this, you need to put the points for each bird into a multipoint geometry that can then be used to create the convex hull polygons, as shown in the following listing.

Listing 7.9. Create convex hull polygons for each bird
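The listing doesn’t appear in this excerpt, but the core of what it describes might look like this hedged sketch. `bird_hull` is a name of my own invention, and running it requires GDAL/OGR and the albatross_lambert layer:

```python
def bird_hull(lyr, tag_id):
    """Gather one bird's points into a multipoint geometry and return
    its convex hull polygon. The osgeo import is deferred so the sketch
    can be read (and the function inspected) without GDAL installed."""
    from osgeo import ogr
    lyr.SetAttributeFilter("tag_id = '{0}'".format(tag_id))
    multipoint = ogr.Geometry(ogr.wkbMultiPoint)
    for row in lyr:
        multipoint.AddGeometry(row.geometry().Clone())
    lyr.SetAttributeFilter(None)
    return multipoint.ConvexHull()
```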

The results are shown in figure 7.14, where the polygon for one bird is filled in and the rest are hollow. Maybe it’s me, but those big polygons don’t tell me a whole lot. I’d like to see the area each bird used around the islands and around the mainland, but without the middle of the ocean. In fact, comparing the area used between different visits to the archipelago or mainland might be interesting.

Figure 7.14. The ranges for each bird. The polygon for the bird with ID 4264-84830852 is filled in, but the rest are hollow.

I can think of a few different ways to separate the points into different visits to the continent or islands, but we’ll only look at one of them. Listing 7.10 does it by ignoring all locations that are more than 100 kilometers from land and by starting a new set of points each time a bird crosses an imaginary vertical line in the middle of the ocean, so that the two geographic areas are kept separate. In the interest of space, the code to create the new polygon shapefile itself is omitted and only the code to create the polygons is shown.

Listing 7.10. Create convex hull polygons separated by geographic area

In this example you need a land dataset so that you can tell which points are within 100 kilometers of land. After getting the land polygon, you buffer it by 100,000 meters, which is the same as 100 km. When iterating through the points, the first thing you do is check whether the point falls within the land buffer. If it doesn’t, you move on to the next point without doing anything more. If a point is within the buffer, and therefore within 100 kilometers of land, you check which side of the imaginary line the point is on and set a location variable to keep track of whether the point is on the islands or the mainland. If the location has changed since the previous point you looked at, and you’ve collected at least three points (the minimum required for a polygon), then you create a new convex hull polygon from the collected points. After checking the number of points and possibly creating a polygon, you create a new multipoint object to store the next set of points. If you hadn’t created a new multipoint, then your next convex hull would use all of the locations you’d saved so far, but you want to start over now that you’re in a different geographic area. When you finish iterating through the points for a specific animal, the points collected since the last location change still need to be turned into a polygon, so you’ve got another bit of code to take care of those last locations.
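The steps just described, minus the shapefile-writing code, can be sketched like this. It’s a hedged sketch, not the book’s listing: `location_for` and `area_hulls` are names of my own invention, the dividing x-coordinate is a placeholder you’d pick mid-ocean in Lambert meters, and `land_buffer` is the 100 km buffered land geometry described above:

```python
def location_for(x, divide_x):
    """The Galapagos lie west of the mainland, so x values smaller than
    the dividing line count as 'island' (a hypothetical split; divide_x
    is whatever mid-ocean coordinate you choose, in Lambert meters)."""
    return 'island' if x < divide_x else 'mainland'

def area_hulls(lyr, land_buffer, divide_x):
    """Yield (location, convex-hull polygon) pairs for one bird's points.
    lyr is an OGR layer already filtered to a single tag_id, so this
    needs GDAL/OGR to actually run (hence the deferred import)."""
    from osgeo import ogr
    multipoint = ogr.Geometry(ogr.wkbMultiPoint)
    previous_location = None
    for row in lyr:
        pt = row.geometry()
        if not pt.Within(land_buffer):          # skip points far out at sea
            continue
        location = location_for(pt.GetX(), divide_x)
        if previous_location and location != previous_location:
            if multipoint.GetGeometryCount() >= 3:   # minimum for a polygon
                yield previous_location, multipoint.ConvexHull()
            multipoint = ogr.Geometry(ogr.wkbMultiPoint)  # start over
        multipoint.AddGeometry(pt.Clone())
        previous_location = location
    if multipoint.GetGeometryCount() >= 3:      # don't forget the last batch
        yield previous_location, multipoint.ConvexHull()
```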

The result for one bird is shown in figure 7.15. This is the same animal whose range was shaded in figure 7.14, so you can see the difference in the calculated ranges.

Figure 7.15. The area-specific ranges for bird 4264-84830852. Compare this with the large polygon for the same animal shown in figure 7.14.

I don’t know about you, but I’m curious how much of the same area is used by an individual on separate visits to the islands or the mainland—do they haunt the same locations or do they switch it up a bit? A simple way to get at this, which ignores the fact that some polygons might be created from a day’s worth of data as opposed to others with a week or two, might be to look at the ratio of common area to total area (if you’re an albatross biologist, please don’t cringe too much at my idea). Figure 7.16 shows the difference between these two for one of the bird’s visits to the islands.

Figure 7.16. The outlines show the areas used on four different visits to the islands by bird 1163-1163, and the shaded areas show the results of union and intersection operations on these polygons.

Let’s look at how you’d calculate this ratio, and then we’ll leave the albatrosses alone. The following listing does it for the bird shown in figure 7.16, but you could easily adapt the code to calculate this value for all of the birds.

Listing 7.11. Calculate percentage of total area used in all island visits
ds = ogr.Open(r'D:\osgeopy-data\Galapagos')
lyr = ds.GetLayerByName('albatross_ranges2')
lyr.SetAttributeFilter("tag_id = '1163-1163' and location = 'island'")
row = next(lyr)
all_areas = row.geometry().Clone()
common_areas = row.geometry().Clone()
for row in lyr:
    all_areas = all_areas.Union(row.geometry())
    common_areas = common_areas.Intersection(row.geometry())
percent = common_areas.GetArea() / all_areas.GetArea() * 100
print('Percent of all area used in every visit: {0}'.format(percent))

The output looks like this:

Percent of all area used in every visit: 25.1565197202

It looks like a quarter of this bird’s total range is used during each visit to the islands, but it’s clear from figure 7.16 that this area is in the middle of the range. A next step might be to see how many points were used to create each polygon—maybe the larger polygons are bigger because they have more points and not because the bird changed its habits each time.

7.5. Summary

  • Overlap tools tell you the spatial relationship of geometries to each other, such as whether or not they intersect in space.
  • Proximity tools are used to determine distances between geometries or create buffers around them.
  • As with any type of analysis, it’s important to carefully consider your methodologies and assumptions when creating your workflow.