Chapter 5. Filtering data with OGR

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 5. Filtering data with OGR

This chapter covers

Efficiently selecting features using attribute values
Using spatial location to select features
Joining attribute tables from different layers

Back in chapter 3, you learned how to iterate through all of the features in a layer and use attribute values for each one to determine if it was interesting. You’ve got easier ways to throw out features that you don’t want, however, and that’s where filters come in. With filters you can easily select features that match specific criteria, such as all animal GPS locations from a certain day or all crabapple trees from a city tree inventory. Filters also let you limit features by spatial extent, so you could limit your crabapple trees to a specific neighborhood, or GPS locations to those within a kilometer of an animal feeding station. Filtering your data like this makes it easy to extract or process only the features you’re interested in. I’ve used these techniques to extract features such as city boundaries for a single county from a larger dataset, or to extract highways and freeways from road datasets, while ignoring the smaller residential roads.

You can also use SQL queries to join attribute tables together from different layers. For example, if you had a layer containing all of the locations of your store franchises and each feature had an attribute denoting the city that the store was in, then you could join this layer with one containing cities. If the city layer contained demographic information for each city, then that data would be associated with the store data, and you could easily compare demographics between stores.

Definition

SQL is short for Structured Query Language, although you’ll rarely see it written out like that. If you’ve used a relational database, then you’ve probably used SQL, even if you didn’t realize it. For example, if you build a graphical query in Microsoft Access, it still builds a SQL query behind the scenes, and you can see it if you switch to SQL View. SQL is featured more prominently in other database software such as PostgreSQL.

5.1. Attribute filters

If you need to limit the features by values contained in one or more attribute fields, then you want an attribute filter. To set one of these filters, you need to come up with a conditional statement that’s much like the WHERE clause in a SQL statement. You compare the value of an attribute field to another value, and then all features where that comparison is true are returned. The standard logical operators, such as =, !=, <>, >, <, >=, and <=, allow you to use statements such as the following:

You can probably guess what these comparisons do; they all test for equality or inequality. Notice that if you’re comparing strings, you need to put quotes around the string values, but they can be either single or double. Make sure they’re different from the quotes you use to surround the entire query string, or else you’ll end your string prematurely and get a syntax error. Don’t use quotes with numbers, because that turns them into string values, and you won’t get the comparison you were expecting. Another thing you might have noticed is that you use a single equal sign to test for equality, which isn’t the way programming languages typically work. But that’s the way SQL does things, so who are we to argue? In addition, if you want to test if something doesn’t equal another value, you can use either != or <>.

You can also combine statements using AND or OR:

'(Population > 25000) AND (Population < 50000)'
'(Population > 50000) OR (Place_type = "County Seat")'

The first of these selects features with a population value greater than 25,000 but less than 50,000. The second selects features that either have a population greater than 50,000 or are county seats (or both).

Conditions can be negated using NOT, and NULL is used to indicate a null or no data value in the attribute table:

'(Population < 50000) OR NOT (Place_type = "County Seat")'
'County NOT NULL'

That first example selects features that either have a population less than 50,000 or aren’t county seats. Again, a feature will be selected if it meets one or both of those conditions. The second example selects features that have a value for the County attribute.

If you want to check if a value is between two other values, you can use BETWEEN instead of two different comparisons joined with AND. For example, the following two statements are equivalent, and both select features with a population between 25,000 and 50,000:

'Population BETWEEN 25000 AND 50000'
'(Population > 25000) AND (Population < 50000)'

You have an easy way to check if a value is equal to one of several different values. Once again, both of these select features where the Type_code value is 4, 3, or 7:

'Type_code IN (4, 3, 7)'
'(Type_code = 4) OR (Type_code = 3) OR (Type_code = 7)'

This also works for strings:

'Place_type IN ("Populated Place", "County Seat")'

Last, you can compare strings using the normal logical operators (a is less than c), or you can do fancier, case-insensitive, string matching using LIKE. This allows you to use wildcards to match any character in a string. An underscore matches any single character and a percent sign matches any number of characters. Table 5.1 shows examples, and this is how you’d use them:

'Name LIKE "%Seattle%"'

Table 5.1. Match examples using the LIKE operator

Pattern	Matches	Doesn’t match
_eattle	Seattle	Seattle WA
Seattle%	Seattle, Seattle WA	North Seattle
%Seattle%	Seattle, Seattle WA, North Seattle	Tacoma
Sea%le	Seattle	Seattle WA
Sea_le	Seatle (note misspelling)	Seattle

If you want to read more about the SQL syntax available in OGR, check out the online documentation at http://www.gdal.org/ogr_sql.html and http://www.gdal.org/ogr_sql_sqlite.html. But for now, let’s see how to put this newfound information to use. It will definitely be more fun if you fire up a Python interactive window for testing this out, because you can use the VectorPlotter class to interactively draw your selections. After configuring an interactive vector plotter, open the global data folder and grab the low-resolution countries layer:

>>> ds = ogr.Open(r'D:osgeopy-dataglobal')
>>> lyr = ds.GetLayer('ne_50m_admin_0_countries')

Then plot out the features, but be patient if it takes it a few seconds to draw the output shown in figure 5.1, since it has a fair amount of data to plot. Remember that setting fill=False tells it to draw only country outlines.

Figure 5.1. The ne_50m_admin_0_countries shapefile layer in the global data folder, with no filters applied

>>> vp.plot(lyr, fill=False)

Now inspect the layer attributes by printing out the names of the first few features:

>>> pb.print_attributes(lyr, 4, ['name'], geom=False)
FID    name
0      Aruba
1      Afghanistan
2      Angola
3      Anguilla
4 of 241 features

Notice that the feature IDs (FIDs) are in order and also the fact that there are 241 features in the layer. Now find out how many of those are in Asia by using an attribute filter. To do this, pass a conditional statement to SetAttributeFilter:

>>> lyr.SetAttributeFilter('continent = "Asia"')
0
>>> lyr.GetFeatureCount()
53

Now the layer thinks it has only 53 features. The zero that got spit out when you called SetAttributeFilter means that the query executed successfully. Now that you have selected the countries in Asia, try drawing them in yellow; your result should look like figure 5.2:

>>> vp.plot(lyr, 'y')

Figure 5.2. An attribute filter that selects countries in Asia has been applied to the countries layer shown in figure 5.1, and the results are plotted on top of the original.

Note to Print Book Readers: Color Graphics

Many graphics in this book are best viewed in color. The eBook versions display the color graphics, so they should be referred to as you read. To get your free eBook in PDF, ePub, and Kindle formats, go to https://www.manning.com/books/geoprocessing-with-python to register your print book.

You can look a little more closely at what’s happening with the filter by printing attributes for the first few features:

>>> pb.print_attributes(lyr, 4, ['name'], geom=False)
FID    name
1      Afghanistan
7      United Arab Emirates
9      Armenia
17     Azerbaijan
4 of 53 features

Huh. Now you’re missing a bunch of FIDs. That’s because those features aren’t in Asia, so they’re ignored while iterating through the layer. Getting features by specific FID doesn’t honor the filter, however, because features aren’t truly being deleted, and therefore the FID values don’t change. You can prove it to yourself by getting a feature or two using FIDs:

>>> lyr.GetFeature(2).GetField('name')
'Angola'

You can see from this that even though Angola doesn’t show up when you iterate through the filtered layer, it’s still there. It should be obvious to you now that looping through a filtered layer using specific FIDs is a bad idea and you won’t get the desired results. Instead, you need to iterate through the layer using a for loop.

If you set another attribute filter, it doesn’t create a subset of the currently filtered features. Instead, the new filter is applied to the entire layer. To illustrate this, try applying a new filter that selects the countries in South America, and then draw them in blue, which results in the shading you see in figure 5.3.

Figure 5.3. An attribute filter that selects countries in South America has been applied to the countries layer and the results plotted on top of the previous data from figure 5.2.

>>> lyr.SetAttributeFilter('continent = "South America"')
>>> vp.plot(lyr, 'b')

You can, however, use both attribute and spatial filters together to refine your results, and you’ll see an example of that in the next section. To clear out the attribute filter and get all 241 features back, simply pass None to SetAttributeFilter:

>>> lyr.SetAttributeFilter(None)
>>> lyr.GetFeatureCount()
241

Removing the filter also resets the current feature back to the beginning, as if you had just opened the layer.

5.2. Spatial filters

Spatial filters let you limit the features by spatial extent rather than attribute value. These filters can be used to select features within another geometry or inside a bounding box. For example, if you had a dataset of global cities with no attribute indicating the country that the cities are in, but you had another dataset with the same spatial reference system that contained the boundary of Germany, you could use a spatial filter to select the German cities.

Spatial reference systems and spatial filters

The geometries or coordinates used for spatial filtering must use the same spatial reference system as the layer you’re trying to filter. Why is this? Pretend for a moment that you have a layer that uses a Universal Transverse Mercator (UTM) spatial reference system. Coordinates in that layer would be large numbers, much different than the latitude and longitude values we’re all familiar with. This means that they wouldn’t align if plotted on top of each other, and they’d appear to have non-overlapping spatial extents. For example, the UTM easting and northing coordinates for the capitol building in Salt Lake City, UT, are approximately 425045 and 4514422, but the corresponding longitude and latitude are -111.888 and 40.777. Those coordinates are awfully different from each other, and they wouldn’t overlay on each other unless one of them was transformed to the same spatial reference system as the other.

Try selecting cities in Germany using the natural earth shapefiles. After setting up a vector plotter in an interactive window, open the folder data source and get the countries layer. Then use an attribute filter to limit the countries to Germany and grab the corresponding feature and geometry:

>>> ds = ogr.Open(r'D:osgeopy-dataglobal')
>>> country_lyr = ds.GetLayer('ne_50m_admin_0_countries')
>>> vp.plot(country_lyr, fill=False)
>>> country_lyr.SetAttributeFilter('name = "Germany"')
>>> feat = country_lyr.GetNextFeature()
>>> germany = feat.geometry().Clone()

You can assume, in this case, that the attribute filter will return one and only one feature, so using GetNextFeature will get the first and only feature in the filtered results. Then you grab the geometry and clone it so that you can use the geometry even after the feature is removed from memory. Oh, and you also plot the world countries before applying the filter so that you have context for the cities later on. Now open the populated places layer and plot all cities (see figure 5.4) as yellow dots:

>>> city_lyr = ds.GetLayer('ne_50m_populated_places')
>>> city_lyr.GetFeatureCount()
1249
>>> vp.plot(city_lyr, 'y.')

Figure 5.4. All of the cities in the populated_places layer in the natural earth dataset

The call to GetFeatureCount indicates there are 1,249 city features in the full layer. Now try applying a spatial filter by passing the germany geometry that you got earlier to SetSpatialFilter, and then plot the resulting cities as large dots:

>>> city_lyr.SetSpatialFilter(germany)
>>> city_lyr.GetFeatureCount()
5
>>> vp.plot(city_lyr, 'bo')

Now the layer claims to have only five features, so five cities fall within the German boundary polygon. You can also see from your plot that the circles fall in the correct geographical area. You can use the Zoom to rectangle tool on the bottom of the plot window to zoom in on Germany if you’d like (figure 5.5).

Figure 5.5. A spatial filter has been applied to the populated_places layer to limit the features to those within the boundaries of Germany. These filtered points are shown in as large dots.

To clone or not to clone?

Geometry objects have a Clone function, which makes a copy of the object. Why would you want to use this? When you get a geometry from a feature, that geometry is still associated with that feature. If that feature is then deleted (or the variable is populated with a different feature), then the geometry is no longer useable. In fact, if you try to use it, Python will crash instead of spit out an error. This problem is easy to solve, however, by cloning the geometry. Now you can store a copy of the feature or geometry that’s no longer associated with other objects and will live on even if the parent objects disappear. Want to see this in action? Try this in an interactive window:

In this example, the geom variable holds a Geometry object that’s still owned by the Feature object stored in the feat variable, but the geom_clone variable holds a geometry that has been disassociated from that feature. After you populate the feat variable with a different feature, you can still use the geom_clone geometry, but not the object stored in the geom variable, because you no longer have a handle to the feature that it came from.

Incidentally, this is related to why all of these examples would also cause Python to crash:
feat = ogr.Open(fn, 0).GetLayer(0).GetNextFeature()
# or
lyr = ogr.Open(fn, 0).GetLayer(0)
feat = lyr.GetNextFeature()
# or
ds = ogr.Open(fn, 0)
lyr = ds.GetLayer(0)
del ds
feat = lyr.GetNextFeature()

In each case, the data source has gone out of scope or been deleted before you try to use the layer. But the layer is associated with the data source and becomes unusable once the data source is gone, the same way a geometry becomes unusable if its parent feature disappears. You should never close your data source if you still need access to the layer.

As promised, you now get to combine a spatial and an attribute query. Further refine your selection by finding the cities with a population over 1,000,000, and draw them as the squares shown in figure 5.6:

>>> city_lyr.SetAttributeFilter('pop_min > 1000000')
>>> city_lyr.GetFeatureCount()
3
>>> vp.plot(city_lyr, 'rs')

Figure 5.6. An attribute filter has been combined with a spatial filter to select the German cities with a population greater than 1,000,000 people. The selected features are shown as squares instead of circles.

Judging from these results, there are three German cities with populations of more than 1,000,000 people. Figure 5.6 shows the output plot zoomed in on Germany so you can see these features. But what if you decide that you want to know how many cities exist in the entire world with a population that large? All you have to do is remove the spatial filter by passing None to SetSpatialFilter. Note that the attribute filter will still be in effect. Go ahead and try it, drawing the results as triangles:

>>> city_lyr.SetSpatialFilter(None)
>>> city_lyr.GetFeatureCount()
246
>>> vp.plot(city_lyr, 'm^', markersize=8)

And now you know where the largest cities in the world are (figure 5.7).

Figure 5.7. The spatial filter has been removed, but the attribute filter is still in effect, so now all of the cities in the world with a population of more than 1,000,000 are drawn as triangles over the top of the original dots for all cities.

You’re not completely out of luck if you’d like to filter features spatially but don’t have a geometry to use. You can also use a rectangular extent by providing the minimum and maximum x and y coordinates:

SetSpatialFilterRect(minx, miny, maxx, maxy)

You can use this to select the countries that fall within the box shown in figure 5.8. Again, start by plotting all of the countries:

>>> vp.clear()
>>> country_lyr.SetAttributeFilter(None)
>>> vp.plot(country_lyr, fill=False)

Figure 5.8. The minimum and maximum x and y values for the rectangle surrounding Australia can be used to set a spatial extent on the global countries layer.

Now plug in the bounding coordinates shown in figure 5.8:

>>> country_lyr.SetSpatialFilterRect(110, -50, 160, 10)
>>> vp.plot(country_lyr, 'y')

Now you should have a plot that looks similar to figure 5.9, with Australia and a few surrounding countries shaded in.

Figure 5.9. The shaded countries were selected using the rectangular extent shown in figure 5.8.

Tip

To clear a spatial filter, whether it was created with a geometry or a bounding box, pass None to SetSpatialFilter. You can’t clear the filter using SetSpatialFilterRect.

5.3. Using SQL to create temporary layers

If you’re familiar with SQL, or are willing to learn, you can create more-complicated queries and do fun stuff using the ExecuteSQL function on a data source. This function applies to a data source instead of a layer because it allows you to use multiple layers if desired. It requires a SQL query and can optionally use a geometry as a spatial filter. In addition, you can also specify a different SQL dialect, but more on that later. Here’s the signature:

ExecuteSQL(statement, [spatialFilter], [dialect])

statement is the SQL statement to use.
spatialFilter is an optional geometry to use as a spatial filter on the results. The default is no filter.
dialect is a string specifying the SQL dialect to use. Available options are OGRSQL and SQLite. The default is to use the OGR dialect unless the data source has its own SQL engine (such as a SpatiaLite database).

This function is different from the filtering functions in that it returns a new layer containing the result set rather than only filtering features out of the existing layer. Let’s look at a few examples using this technique, starting with a simple one that returns global countries sorted by population in descending order:

>>> ds = ogr.Open(r'D:osgeopy-dataglobal')
>>> sql = '''SELECT ogr_geom_area as area, name, pop_est
...          FROM 'ne_50m_admin_0_countries' ORDER BY POP_EST DESC'''
>>> lyr = ds.ExecuteSQL(sql)

>>> pb.print_attributes(lyr, 3)
FID    Geometry        area                  name             pop_est
41     MULTIPOLYGON    950.9810937547769     China            1338612970.0
98     MULTIPOLYGON    278.3474038553223     India            1166079220.0
226    MULTIPOLYGON    1115.1781907153158    United States    313973000.0
3 of 241 features

As you can see from these results, the three most populous countries in the world are China, India, and the United States, in that order. The query returns each country’s name and population attributes because you request them in the SQL statement. You also use the special ogr_geom_area field to get the area of each geometry (table 5.2), and the FID and geometry itself are returned automatically. This example uses the default OGR SQL dialect because shapefiles don’t have any built-in SQL support.

Table 5.2. Special fields used in the OGR SQL dialect

Field	Returns
FID	The feature ID.
OGR_GEOMETRY	An OGR geometry type constant (see table 3.1). This is especially useful for data formats that support multiple geometry types in one layer.
OGR_GEOM_WKT	The well-known text (WKT) representation of the feature’s geometry.
OGR_GEOM_AREA	The area of the feature’s geometry. Returns zero for geometries with no area (for example, points or lines).
OGR_STYLE	The style string for the feature, if it exists. Very few applications use this.

If you’re querying a data source that has its own SQL support, that native SQL version will be used. For example, if you have the SQLite driver, you could get the same information from the natural_earth_50m.sqlite database using the SQLite version of SQL. This dialect also allows you to limit the number of returned features, so you could limit the result set to the three countries with the highest populations:

>>> ds = ogr.Open(r'D:osgeopy-dataglobal
atural_earth_50m.sqlite')
>>> sql = '''SELECT geometry, area(geometry) AS area, name, pop_est
...          FROM countries ORDER BY pop_est DESC LIMIT 3'''
>>> lyr = ds.ExecuteSQL(sql)
>>> pb.print_attributes(lyr)
FID    Geometry        area                  name             pop_est
0      MULTIPOLYGON    950.9810937547769     China            1338612970.0
1      MULTIPOLYGON    278.3474038553223     India            1166079220.0
2      MULTIPOLYGON    1115.1781907153158    United States    313973000.0
3 of 3 features

This time you could print attributes for the entire layer, because only three features are returned. You should also notice that now you use the area function instead of a special field name, and if you don’t rename it with the AS area syntax, then it would be called area(geometry) instead. You also have to specifically request the geometry because the SpatiaLite engine doesn’t return the geometry by default.

You can also use ExecuteSQL to join attributes from multiple layers. Take a look at this code and see if you can figure out what it’s doing:

The first thing to notice is that you use the ne_50m_populated_places and ne_50m_admin_0_countries shapefiles and rename them to pp and c, respectively. You do this by putting the alias directly after the layer name. This isn’t necessary, of course, but does make your SQL a lot shorter because those layer names are pretty long. You also link these two layers together by using a join, which allows you to link tables using a shared attribute. Here you use a LEFT JOIN to keep all records in the table on the left (populated places), and if a matching record exists in the table on the right (countries), then you’ll also get data from that record. But how does it figure out what matches? That’s where the ON clause comes in. For each feature in pp, it takes the adm0_a3 attribute value and tries to find a feature in the countries layer that has the same value for its adm0_a3 field. See figure 5.10 for an illustration.

Figure 5.10. An illustration of a SQL query that selects records from the populated places table where adm0cap equals 1, and then gets related data from the countries table based on the adm0_a3 field in both tables.

Now that you know what tables the data are coming from, go back to the beginning of the SQL statement and look at what attribute fields are being requested. You ask for the NAME and POP_MIN fields from the populated places layer, as well as the NAME and POP_EST fields from the countries layer. Because the fields from the two layers have the same names, it makes sense to rename them so that you can tell what’s what. Last, you use a WHERE clause to limit the results to features that represent capital cities (adm0cap = 1).

This technique is handy if you want to see related data from multiple layers at the same time. Without this, you could query city populations and country populations separately, but now you can see the country’s population right beside the city’s. To see this, look at the layer returned by this query:

pb.print_attributes(lyr, 3, geom=False)
FID    city            city_pop    country          country_pop
7      Vatican City    832         Vatican          832.0
48     San Marino      29000       San Marino       30324.0
51     Vaduz           5342        Liechtenstein    34761.0
3 of 200 features

I didn’t print the geometry column because it wouldn’t fit comfortably on the page, but because this uses the OGR SQL dialect, the geometry is returned automatically. But which one: the city or the country? It’s the city, because that’s the main table being used in the join, and corresponding country information is returned only if it existed for a city. You could plot the layer to prove it to yourself if you’d like.

Now check out a similar example using the SQLite dialect, but still shapefile data sources (you could use a SQLite database, of course, but I want to prove that the SQLite dialect will work with other data source types). See if you can spot the differences:

ds = ogr.Open(r'D:osgeopy-dataglobal')
sql = '''SELECT pp.name AS city, pp.pop_min AS city_pop,
             c.name AS country, c.pop_est AS country_pop
         FROM ne_50m_populated_places pp
         LEFT JOIN ne_50m_admin_0_countries c
         ON pp.adm0_a3 = c.adm0_a3
         WHERE pp.adm0cap = 1 AND c.continent = "South America"'''
lyr = ds.ExecuteSQL(sql, dialect='SQLite')
pb.print_attributes(lyr, 3)

The most obvious difference is the inclusion of the dialect parameter to the ExecuteSQL function. But you also add one thing to the SQL that doesn’t work with the OGR dialect. This time the results are limited to cities in South America by checking the value of the continent field in the countries layer. The OGR dialect doesn’t support using fields from the joined table in the WHERE clause, so the only attributes allowed would be ones from the populated places layer. Also, because you need to specifically request geometries if you want them when using the SQLite dialect, no geometries are returned by this particular query. You could add them in by specifying pp.geometry along with the other fields.

If your version of OGR was built with SpatiaLite support (not only SQLite), you can also manipulate geometries within your SQL. Be warned that this could take a while, depending on what you try to do. As an example, if you have SpatiaLite support, try merging all of the counties in California into one big geometry. Start with drawing the individual counties so you have something to compare your results with:

>>> ds = ogr.Open(r'D:osgeopy-dataUS')
>>> sql = 'SELECT * FROM countyp010 WHERE state = "CA"'
>>> lyr = ds.ExecuteSQL(sql)
>>> vp.plot(lyr, fill=False)

This will draw a map of the counties in California, as shown in figure 5.11A. Now try using the SpatiaLite st_union function to merge all of the county polygons into one, as shown in figure 5.11B:

>>> sql = 'SELECT st_union(geometry) FROM countyp010 WHERE state = "CA"'
>>> lyr = ds.ExecuteSQL(sql, dialect='SQLite')
>>> vp.plot(lyr, 'w')

Figure 5.11. Part A, on the left, shows the counties in California drawn individually. Part B, on the other hand, shows the result of running the SpatiaLite `st_union` function on the counties. They’re all joined together into one geometry.

Geometry operations also work with data sources that have their own native SQL flavor and the ability to perform geometry manipulations. SpatiaLite and PostGIS are two obvious examples of this. For example, this is how you’d do the same thing with a PostGIS data source:

conn_str = 'PG:host=localhost user=chrisg password=mypass dbname=geodata'
ds = ogr.Open(conn_str)
sql = "SELECT st_union(geom) FROM us.counties WHERE state = 'CA'"
lyr = ds.ExecuteSQL(sql)
vp.plot(lyr)

Don’t worry if you want to perform operations like this but aren’t using PostGIS or SpatiaLite, because you’ll learn how to do it without databases in the next chapter.

5.4. Taking advantage of filters

Remember back in chapter 3 when you copied all of the capital cities in a global shapefile into a new shapefile? You looped through each feature in the shapefile, checked the appropriate attribute, and copied the feature if it was a capital city. This whole process can be made much easier if the features you want can be selected with filters. Do you remember the CopyLayer method that was introduced in section 4.2.4? As a reminder, it copies an existing layer into a new data source. How do you think you could use this to do something similar to the code back in listing 3.3, but much easier? Think about this problem for a minute and then look at the next example:

ds = ogr.Open(r'D:osgeopy-dataglobal', 1)
in_lyr = ds.GetLayer('ne_50m_populated_places')
in_lyr.SetAttributeFilter("FEATURECLA = 'Admin-0 capital'")
out_lyr = ds.CopyLayer(in_lyr, 'capital_cities2')

Here the call to CopyLayer makes a copy of in_lyr in the ds data source. In this case, it happens to be the same data source as the original layer, but it could be any data source. Because you’ve already set an attribute filter on in_lyr, only the filtered features are copied. That’s certainly easier than checking each one.

If you only want certain attributes, you could use a layer created using ExecuteSQL. Write a SQL query that pulls out the attributes you want and copy the results to a new layer:

sql = """SELECT NAME, ADM0NAME FROM ne_50m_populated_places
         WHERE FEATURECLA = 'Admin-0 capital'"""
in_lyr2 = ds.ExecuteSQL(sql)
out_lyr2 = ds.CopyLayer(in_lyr2, 'capital_cities3')

It should be obvious by now that you can simplify your life by taking advantage of filters and the ExecuteSQL function whenever possible.

5.5. Summary

Attribute filters can be used to efficiently select specific features based on their attribute values.
Spatial filters allow you to select features based on their location by using a bounding polygon or coordinates for a bounding box. The coordinates to set a spatial filter must use the same spatial reference system as the data to be filtered.
Spatial and attribute filters can be combined.
You can use SQL queries to create temporary layers made up of multiple layers joined on attribute values.
You can’t use objects once their owners go out of scope, so if you want to use a geometry after you’ve lost the handle to its feature, make sure you clone the geometry. Always keep your data source open if you want access to the layers. If you break one of these rules, Python will crash and burn.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 5. Filtering data with OGR

Create new playlist

Sign In

Sign Up

Chapter 5. Filtering data with OGR

Definition

5.1. Attribute filters

Table 5.1. Match examples using the LIKE operator

Figure 5.1. The ne_50m_admin_0_countries shapefile layer in the global data folder, with no filters applied

Figure 5.2. An attribute filter that selects countries in Asia has been applied to the countries layer shown in figure 5.1, and the results are plotted on top of the original.

Note to Print Book Readers: Color Graphics

Figure 5.3. An attribute filter that selects countries in South America has been applied to the countries layer and the results plotted on top of the previous data from figure 5.2.

5.2. Spatial filters

Figure 5.4. All of the cities in the populated_places layer in the natural earth dataset

Figure 5.5. A spatial filter has been applied to the populated_places layer to limit the features to those within the boundaries of Germany. These filtered points are shown in as large dots.

Figure 5.6. An attribute filter has been combined with a spatial filter to select the German cities with a population greater than 1,000,000 people. The selected features are shown as squares instead of circles.

Figure 5.7. The spatial filter has been removed, but the attribute filter is still in effect, so now all of the cities in the world with a population of more than 1,000,000 are drawn as triangles over the top of the original dots for all cities.

Figure 5.8. The minimum and maximum x and y values for the rectangle surrounding Australia can be used to set a spatial extent on the global countries layer.

Figure 5.9. The shaded countries were selected using the rectangular extent shown in figure 5.8.

Tip

5.3. Using SQL to create temporary layers

Table 5.2. Special fields used in the OGR SQL dialect

Figure 5.10. An illustration of a SQL query that selects records from the populated places table where adm0cap equals 1, and then gets related data from the countries table based on the adm0_a3 field in both tables.

Figure 5.11. Part A, on the left, shows the counties in California drawn individually. Part B, on the other hand, shows the result of running the SpatiaLite st_union function on the counties. They’re all joined together into one geometry.

5.4. Taking advantage of filters

5.5. Summary

Table of Contents for
Chapter 5. Filtering data with OGR

Figure 5.11. Part A, on the left, shows the counties in California drawn individually. Part B, on the other hand, shows the result of running the SpatiaLite `st_union` function on the counties. They’re all joined together into one geometry.