Chapter 4. Working with different vector file formats

This chapter covers

  • Choosing a vector data file format
  • Working with various vector data formats
  • Checking what edits are allowed on a data source

As mentioned in the previous chapter, there are many different vector file formats, and they’re not always interchangeable, at least in a practical sense. Certain formats are more appropriate for certain uses than others. In this chapter you’ll learn how several of these formats differ, along with their strengths and weaknesses.

Another consideration with format is what you can and can’t do with the data using OGR. In general, working with one type is the same as working with another, but sometimes how you open the data source is different. The larger issue is the difference in capabilities of each driver. For example, certain formats can be read from but not written to, and others can be created but existing data can’t be edited. You’ll also learn how to determine what you can and can’t do with a dataset.

4.1. Vector file formats

Up to this point, you’ve only worked with shapefiles, but many more vector file formats are available. Chances are that you’ll probably only use a handful of them on a regular basis, but you need to have an idea of the available options. Several formats have open specifications and are supported by many different software programs, while others are used more sparingly. Certain formats also support more capabilities than others. Most of these formats allow for easy transfer from one user to another, much like you can give someone else your spreadsheet file. A few use database servers, however, which allows for many users to access and edit the same dataset at a central location, but sometimes makes it more difficult to move the data from one place to another.

4.1.1. File-based formats such as shapefiles and GeoJSON

What I call file-based formats are made up of one or more files that live on a disk drive and can be easily transferred from one location to another, such as from your hard drive to another computer or an external drive. Several of these are relational databases, but they’re designed to be easily moved around (think of Microsoft Access relational databases), so they’re considered file-based for the purposes of this discussion. Several of these formats have open standards, so anyone can create software to use them, while others are proprietary and supported by only a small number of programs. Examples of open formats are GeoJSON, KML, GML, shapefiles, and SpatiaLite.

Spatial data can also be stored in Excel spreadsheets, comma- or tab-delimited files, or other similar formats, although this is most common for point data when only x and y coordinates are required. Most spatial data, however, is stored using formats designed specifically for GIS data. Several of these formats are plain text, meaning that you can open them in any text editor and look at them, and others are binary files that require software capable of understanding them.

As mentioned previously, one advantage of plain text files is that you can open them in a text editor and inspect their contents. You can even edit them by hand, rather than using GIS software, if you’re so inclined. Listing 4.1 shows an example of a GeoJSON file that contains two cities in Switzerland, Geneva and Lausanne, both represented as points.

Listing 4.1. An example GeoJSON file with two features
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": { "NAME": "Geneva", "PLACE": "city" },
      "geometry": {
        "type": "Point",
        "coordinates": [ 6.1465886, 46.2017589 ]
      }
    },
    {
      "type": "Feature",
      "properties": { "NAME": "Lausanne", "PLACE": "city" },
      "geometry": {
        "type": "Point",
        "coordinates": [ 6.6327025, 46.5218269 ]
      }
    }
  ]
}

It’s okay if you don’t understand everything in this example. The point here is that you can open and edit the file in a text editor instead of using GIS software. For example, you could easily fix the spelling of a city name or tweak one of the point coordinates. While we’re on the subject, it’s worth mentioning that small GeoJSON files are automatically rendered as interactive maps when uploaded to GitHub. The example shown here is saved as a gist at https://gist.github.com/cgarrard/8049400. If you have a GitHub account, you can copy this gist to your own account, make changes, and instantly see the result.

Plain text formats such as GeoJSON, KML, and GML are popular for transferring small amounts of data and for web applications, but they don’t work so well for data analysis. For one thing, all three of these formats allow different geometry types to be present in the same dataset, which GIS software doesn’t really appreciate. For example, data in the popular shapefile format contains all points, all lines, or all polygons, but not a mixture. Therefore, a shapefile could contain roads (lines) or city boundaries (polygons), but not both. A GeoJSON file, on the other hand, can contain a combination of all three geometries in the same dataset, such as the roads and city boundaries mentioned previously that would have to live in two different shapefiles. Because you have only one file to download and process, this is an excellent solution for passing data to a web browser so it can render it on a map. However, most GIS software expects only points, only lines, or only polygons, and won’t read the data correctly if it has a mixture. If you need to load the data into GIS software, don’t combine multiple geometry types into one dataset, even when allowed.

Perhaps a more serious problem with plain text formats when it comes to data analysis is that they don’t have the same indexing capabilities as many binary formats. Indexes are used for searching and accessing data quickly. Attribute indexes allow for searching on values in the attribute fields for the features, such as searching for all cities in a dataset with a population over 100,000. Spatial indexes store information about the spatial location of features in the dataset so that searching can be limited to features in a certain geographic area, for example, when you overlay a small watershed polygon on a larger dataset of water-monitoring stations. A spatial index would be used to quickly find the monitoring stations that fall within the watershed boundary. Both of these operations, finding large cities and finding water-monitoring stations, would be slow on large datasets if the appropriate attribute or spatial index didn’t exist. In addition, spatial indexes can help a dataset be drawn more quickly because they help find the features that fall within the viewport. For example, if you’re looking at Asian cities and zoom in on Japan, the spatial index helps find Japanese cities faster while ignoring cities in western China.
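
To make both kinds of search concrete, here’s a minimal sketch of an attribute filter and a spatial filter in OGR. The POP_MAX and NAME field names are assumptions (check your data with ogrinfo), and whether these searches are fast depends on whether the driver maintains the corresponding indexes, which you can test with constants such as OLCFastSpatialFilter described in section 4.3.

from osgeo import ogr

ds = ogr.Open(r'D:\osgeopy-data\global\natural_earth_50m.sqlite')
lyr = ds.GetLayer('populated_places')

# Attribute search: restrict the layer to features matching a
# SQL-style WHERE clause (the field name here is an assumption).
lyr.SetAttributeFilter('POP_MAX > 100000')

# Spatial search: restrict results to a bounding box, given as
# min_x, min_y, max_x, max_y (roughly Japan here).
lyr.SetSpatialFilterRect(122, 24, 154, 46)

for feat in lyr:
    print(feat.GetField('NAME'))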

These issues aren’t as important with small datasets, but they’re extremely important with large ones. Certain formats have ways around these problems, though. For example, although the KML format doesn’t have true spatial indexes, it does allow for datasets to be broken up into different files for different spatial locations. This allows for smaller datasets to be loaded as a user zooms and pans around the map, which increases rendering speed.

Several vector data formats use familiar desktop-based, or personal, relational database software under the hood. This is true for Esri personal geodatabases and GeoMedia .mdb files, which use Microsoft Access databases to store data. Another example of a vector format based on an existing database format is SpatiaLite, a spatial extension for the SQLite database management system. These vector data formats can take advantage of the capabilities built into the database software, such as indexes. The underlying database also imposes much stricter rules for storing data. For example, all geographic features in a dataset must have the same geometry type and the same set of attribute fields. Similar to the way nonspatial databases can contain multiple tables, a spatial database can contain multiple datasets. Although an individual dataset is limited to a single geometry type, a solitary database file can contain multiple datasets, each with different geometry types and attribute fields. This is convenient for keeping related datasets together and for moving them from disk to disk. Figure 4.1 shows a schematic of a single SpatiaLite database file that contains multiple datasets with different geometries.

Figure 4.1. A sample SpatiaLite database containing multiple layers with different geometry types. All of these various datasets are contained within one easily transportable file.

Other vector formats consist of several files, such as the ever popular shapefile. These datasets store geometries, attribute values, and indexes in separate files. If you move a shapefile from one location to another, you need to ensure that you move all of the required files. Other format types that require multiple files make it a bit easier by using dedicated folders that contain the necessary files. As with shapefiles, you don’t need to know anything about the individual files, but you shouldn’t change anything in the folder. Two examples of formats that use this system are Esri grids and file geodatabases.

Many other vector data formats haven’t been mentioned here, but you should now have an idea of the types of formats and their strengths and weaknesses.

4.1.2. Multi-user database formats such as PostGIS

You’ve seen that file-based formats come in many shapes and sizes, including desktop relational database models such as SpatiaLite. One limitation of these formats is that they don’t allow multiple people to edit, or sometimes even use, a specific dataset at the same time. This is where the multi-user client-server database architecture comes in, because the data are stored in a database that is accessible by multiple clients across the network. Users access data from the server rather than opening a file on a local disk. Although this is certainly not for everyone, it’s a great choice for making data available to many users from a central location. This is especially useful if the data are updated frequently or are used by many different users, because all users will instantly have access to the updated data. It also allows multiple people to edit a dataset at once, which isn’t usually possible with file-based formats. In addition, in many cases the indexing and querying capabilities of these database systems provide faster performance when accessing data.

The most popular client-server database solutions for spatial data include PostgreSQL with the PostGIS spatial extension, ArcSDE, SQL Server, and Oracle Spatial and Graph. If you want to host the data on your own computer, you need to invest in a system like these. My favorite is PostGIS (www.postgis.net) because it’s open source and provides a feature-rich environment with many functions, operators, and indexes that are specific to spatial data. Even with huge amounts of data, you can still get good performance. Although you can’t zip up a PostGIS dataset and email it to a colleague, it comes with utilities to import and export several popular file-based formats, and it’s straightforward to run a query and export the data to a portable format. Not only does PostGIS store the data, but you can use it for many types of analyses as well, without the need for other GIS software. PostGIS also works with raster data.

If you’re not familiar with relational databases, then it might take effort to set one of these systems up and learn how to use it. But it’s extremely powerful and worth the investment in brain cells if you need to give multiple users simultaneous access to data.

4.2. Working with more data formats

Until now we’ve only worked with one data format out of many. The basics don’t change between formats, though: once you open the data source, reading the data is pretty much the same. But for kicks, let’s look at several formats that support more than one layer, because we haven’t done that yet. So far we’ve used the first and only layer in a data source, but if multiple layers exist, you need to know either the name or the index of the one you’re interested in. Generally, I’d use ogrinfo to get this information, but because this is a book on Python, let’s write a simple function that opens a data source, loops through the layers, and prints their names and indexes:

from osgeo import ogr

def print_layers(fn):
    """Print the index and name of each layer in a data source."""
    ds = ogr.Open(fn, 0)  # 0 opens the data source read-only
    if ds is None:
        raise OSError('Could not open {}'.format(fn))
    for i in range(ds.GetLayerCount()):
        lyr = ds.GetLayer(i)  # get the layer at index i
        print('{0}: {1}'.format(i, lyr.GetName()))

This function takes the filename of the data source as a parameter, and the first thing it does is open the file. Then it uses GetLayerCount to find out how many layers the data source contains, and iterates through a loop that many times. Each time through the loop, it uses the i variable to get the layer at the index corresponding to that iteration. Then it prints the name of the layer and its index. This function is included in the ospybook module, and you’ll use it to inspect other data sources in the following examples.

4.2.1. SpatiaLite

Let’s start with a SpatiaLite database. This type of data source can contain many different layers, all with unique (and hopefully descriptive) names. To see this, list the layers in the natural_earth_50m.sqlite file in the data download:

>>> import ospybook as pb
>>> pb.print_layers(r'D:osgeopy-dataglobal
atural_earth_50m.sqlite')
0: countries
1: populated_places

As you can see, the dataset has two layers. How would you get a handle to the populated_places layer? Well, you could use either the index or the layer name, so both ds.GetLayer(1) and ds.GetLayer('populated_places') would do the trick. It’s probably better to use the name rather than the index, however, because the index might change if other layers are added to the data source. To prove that this works, try plotting the layer, which will be dots representing cities around the world, as shown in figure 4.2.

Figure 4.2. The populated_places layer in natural_earth_50m.sqlite

>>> ds = ogr.Open(r'D:\osgeopy-data\global\natural_earth_50m.sqlite')
>>> lyr = ds.GetLayer('populated_places')
>>> vp = VectorPlotter(True)
>>> vp.plot(lyr, 'bo')

Ogrinfo

GDAL comes with several extremely useful command-line utilities, and in fact, you’ve already seen how to use ogrinfo to find out which vector data formats your version of OGR supports. You can also use ogrinfo to get information about specific data sources and layers. If you pass it a data source name, it will print a list of layers contained in that data source:

D:\osgeopy-data\global>ogrinfo natural_earth_50m.sqlite
INFO: Open of `natural_earth_50m.sqlite'
      using driver `SQLite' successful.
1: countries (Multi Polygon)
2: populated_places (Point)

You can also use ogrinfo to see metadata about a layer and even all of the attribute data. The first example shows a summary only (-so) of the countries layer in the natural earth SQLite database, including metadata such as the extent, spatial reference, and a list of attribute fields and their data types:

ogrinfo -so natural_earth_50m.sqlite countries

To display all of the attribute values for the feature with an FID of 1, you could do something like this, where -q means don’t print the metadata and -geom=NO means don’t print out a text representation of the geometry (which would be long):

ogrinfo -fid 1 -q -geom=NO natural_earth_50m.sqlite countries

See http://www.gdal.org/ogrinfo.html for full ogrinfo documentation.

4.2.2. PostGIS

What about connecting to a database server such as the PostGIS spatial extension for PostgreSQL? Note a couple of extra considerations that you don’t need to worry about with local files. You need to know the connection string to use, which involves host, port, database name, username, and password. You also need permission to connect to the database and tables in question. If you’re not managing your own database server, then you might need to talk to the database administrator to set all of this up. The following example connects to the geodata database being served by a PostgreSQL instance running on my local machine. It won’t work for you unless you go to the trouble to install PostgreSQL and PostGIS, and then set up a database.

>>> pb.print_layers('PG:user=chris password=mypass dbname=geodata')
0: us.counties
1: global.countries
2: global.populated_places
3: time_zones

You see four layers here, but they’re divided up into three different groups, or schemas. The time zones layer is in the default schema, counties is in the us schema, and the remaining two are in the global schema. Every user of the database could have access to different schemas, and even different layers within a schema, depending on how the database administrator has set up the security.

As you can see, you can access PostGIS databases with OGR, but you can do many things with a PostGIS database that aren’t covered in this book. If you’re interested in learning more about it, take a look at PostGIS in Action, also published by Manning.

4.2.3. Folders as data sources (shapefiles and CSV)

In certain cases OGR will treat entire folders as data sources. Two examples of this are the shapefile and comma-delimited text file (.csv) drivers, which can be used to open either individual files or entire folders as data sources. If you use a folder, then each file inside of the folder is treated as a layer. If a folder contains a variety of file types, then the shapefile driver is used. For example, try listing the layers in the US folder:

>>> pb.print_layers(r'D:\osgeopy-data\US')
0: citiesx020 (Point)
1: cities_48 (Point)
2: countyp010 (Polygon)
3: roadtrl020 (LineString)
4: statep010 (Polygon)
5: states_48 (Polygon)
6: volcanx020 (Point)

Compare this list to the contents of the folder, and you’ll see that it listed each of the shapefiles, but none of the others. The CSV driver is a little pickier, however, and wants all of the files in the folder to be CSV files. Although it won’t work with the US folder, it works fine with the csv subfolder. Does this mean that you can’t open a CSV file that’s in a folder with a bunch of other files? Fortunately, no. All you have to do is treat the CSV file itself as a data source with only one layer. You can do the exact same thing with a shapefile by providing the name of the .shp file.
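
For example, a minimal sketch (the filename here is hypothetical; substitute one of the files in your csv folder):

>>> ds = ogr.Open(r'D:\osgeopy-data\US\csv\cities.csv')
>>> lyr = ds.GetLayer(0)  # the only layer, named after the file
>>> lyr.GetName()
'cities'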

4.2.4. Esri file geodatabases

You Esri users out there might expect to see feature datasets inside file geodatabases treated like the schemas in PostGIS. If so, you’ll be disappointed, because all you see are feature class names. Figure 4.3 shows what the natural_earth file geodatabase looks like in ArcCatalog, but the large_scale feature dataset name isn’t included in the layer names that OGR uses.

Figure 4.3. The natural_earth file geodatabase as seen in ArcCatalog

>>> pb.print_layers(r'D:\osgeopy-data\global\natural_earth.gdb')
0: countries_10m
1: populated_places_10m
2: countries_110m
3: populated_places_110m

Fortunately, you don’t need the feature dataset name to access the layer, though; the feature class name works fine:

>>> ds = ogr.Open(r'D:\osgeopy-data\global\natural_earth.gdb')
>>> lyr = ds.GetLayer('countries_10m')

File geodatabases have two different drivers. You can read more about the differences on the OGR website, but one huge difference is that the read-only OpenFileGDB driver is compiled into OGR by default and the read/write FileGDB driver isn’t because it requires a third-party library from Esri. If somebody gave you a file geodatabase that you needed to change but you didn’t have access to the FileGDB driver, you could still use the OpenFileGDB driver to open the geodatabase and copy the data to a format that you could edit. This may not be ideal, but at least you have the option. For example, you could copy the countries_110m feature class in the natural earth geodatabase to a shapefile like this:

gdb_ds = ogr.Open(r'D:\osgeopy-data\global\natural_earth.gdb')
gdb_lyr = gdb_ds.GetLayerByName('countries_110m')
shp_ds = ogr.Open(r'D:\Temp', 1)
shp_ds.CopyLayer(gdb_lyr, 'countries_110m')
del shp_ds, gdb_ds

You haven’t seen the CopyLayer method before. This allows you to easily copy the contents of an entire layer into a new data source or to the same data source but with a different layer name. To use it, you need to get the layer that you want to make a copy of and open the data source that you want to save the copy into. Then call CopyLayer on the data source that will get the copy, and pass it the original layer and a name for the new layer that will be created.

If you do have the Esri FileGDB driver, you can create new file geodatabases, and even feature datasets, even though OGR doesn’t show you feature dataset names. Listing 4.2 shows a function that imports all of the layers from a data source into a feature dataset within a file geodatabase, but note that this only works if you have the FileGDB driver. If you try to use this function without that driver installed, you’ll get an error message that says AttributeError: 'NoneType' object has no attribute 'CreateDataSource'.

Listing 4.2. Function to import layers to a file geodatabase

This function requires three parameters: the path to the original data source, the path to the file geodatabase, and the name of the feature dataset to copy the layers into. After opening the original data source, it checks to see if the file geodatabase exists. If it does, then the geodatabase is opened for writing. If it doesn’t exist, it’s created. Feature datasets are specified using layer-creation options, so then a list containing a single option for FEATURE_DATASET is created. After that, all of the layers in the original data source are looped over and copied into the geodatabase while keeping the same layer name (although they’ll be renamed if naming conflicts arise in the geodatabase). If the FEATURE_DATASET layer-creation option wasn’t provided, then the layer will be added to the file geodatabase, but it will be at the top level instead of in a feature dataset.
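
The listing body itself isn’t reproduced in this copy, but reconstructed from the description above, a sketch might look like the following. This is a sketch, not the author’s exact code; FEATURE_DATASET is the FileGDB driver’s documented layer-creation option, and the function name comes from the usage example below.

import os
from osgeo import ogr

def layers_to_feature_dataset(ds_name, gdb_fn, dataset_name):
    """Copy all layers in a data source to a feature dataset in a
    file geodatabase (requires the read/write FileGDB driver)."""
    in_ds = ogr.Open(ds_name, 0)  # open the source read-only
    if in_ds is None:
        raise OSError('Could not open {}'.format(ds_name))
    driver = ogr.GetDriverByName('FileGDB')
    if os.path.exists(gdb_fn):
        out_ds = driver.Open(gdb_fn, 1)           # open for writing
    else:
        out_ds = driver.CreateDataSource(gdb_fn)  # or create it
    # Feature datasets are specified with a layer-creation option.
    options = ['FEATURE_DATASET=' + dataset_name]
    # Copy each layer into the geodatabase, keeping its name.
    for i in range(in_ds.GetLayerCount()):
        lyr = in_ds.GetLayer(i)
        out_ds.CopyLayer(lyr, lyr.GetName(), options)
    del in_ds, out_ds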

Now that you have this function, you could copy all of the shapefiles in a folder into a geodatabase like this:

layers_to_feature_dataset(
    r'D:\osgeopy-data\global', r'D:\Temp\osgeopy-data.gdb', 'global')

If you wanted to have the option of saving the feature classes to the top level of the geodatabase instead of in a feature dataset, you could modify this function so it doesn’t pass the option list to CopyLayer if the dataset_name parameter is None or an empty string.

4.2.5. Web feature services

You can also access online services, such as Web Feature Services (WFS). Let’s try this using a WFS hosted by the United States National Oceanic and Atmospheric Administration (NOAA) that serves out hazardous weather watches and advisories. Start with getting the list of available layers:

>>> url = 'WFS:http://gis.srh.noaa.gov/arcgis/services/watchWarn/' + \
...       'MapServer/WFSServer'
>>> pb.print_layers(url)
0: watchWarn:WatchesWarnings (MultiPolygon)
1: watchWarn:CurrentWarnings (MultiPolygon)

You can loop through these layers like the layers from other data sources, but all of the data are fetched immediately, so there could be quite a lag if the layer has lots of features. It looks like the second layer only contains warnings, which are more severe than watches, so it should have less data. Let’s find out what type of warning the first feature represents. I’ve discovered that things crash if I try to use GetFeature with an FID, but you can do it using GetNextFeature:

>>> ds = ogr.Open(url)
>>> lyr = ds.GetLayer(1)
>>> feat = lyr.GetNextFeature()
>>> print(feat.GetField('prod_type'))
Tornado Warning

If the first few features are all you want, however, there’s an easier and faster way to get them: tack a MAXFEATURES parameter onto your URL, like this:

>>> url += '?MAXFEATURES=1'
>>> ds = ogr.Open(url)
>>> lyr = ds.GetLayer(1)
>>> lyr.GetFeatureCount()
1

You can also work with the geometries from a WFS. Figure 4.4 shows my results when I used VectorPlotter to draw the watchWarn:WatchesWarnings layer on top of states.

Figure 4.4. The WatchesWarnings layer from the NOAA web feature service. If you plot it, your results will differ because this layer shows real-time data.

Let’s do something a little different—save real-time data from a WFS and use it to build a simple web map using Folium, which is a Python module that creates Leaflet maps. If you have no idea what Leaflet is, that’s okay, because you don’t have to know anything about web mapping to work through this example. First you need to install Folium, though. On my Windows computer, I opened up a command prompt and used pip to install Folium and Jinja2 (another module that Folium requires in order to work) for Python 3.3 like this:

C:\Python33\Scripts\pip install Jinja2
C:\Python33\Scripts\pip install folium

If you’re not familiar with installing Python modules via pip, please refer to the installation instructions in appendix A. Now let’s look at the example script, which breaks things out into functions so code can be easily reused. Listing 4.3 contains a function to retrieve stream gauge data from a WFS and save it as GeoJSON; a function to make the web map showing these stream gauges; a function to get a geometry so that the map focuses on a single state instead of the whole country; and a couple of helper functions to format data for the WFS request and the map.

Listing 4.3. Create a web map from WFS data

You can probably understand what the get_state_geom function does and how it does it, because you’ve seen the same process before. It takes a state name as a parameter, finds the corresponding feature in a layer, and returns the cloned geometry. The filename is hardcoded because you assume that the location of this state boundary file won’t change.
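
As a rough sketch of that function, assuming the states_48 shapefile from the data download and a hypothetical STATE attribute field holding state names:

def get_state_geom(state_name):
    """Return a clone of a state boundary geometry."""
    ds = ogr.Open(r'D:\osgeopy-data\US\states_48.shp')
    lyr = ds.GetLayer()
    # Find the feature for this state (field name is an assumption).
    lyr.SetAttributeFilter("STATE = '{}'".format(state_name))
    feat = lyr.GetNextFeature()
    return feat.GetGeometryRef().Clone()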

The two helper functions are also simple. The get_center function takes a geometry, gets its centroid, and then returns the coordinates as a [y, x] list. The order might seem weird to you, but that’s the order that Folium wants them in for the map.

The get_bbox function takes a geometry and returns its bounding coordinates as a string formatted like min_x,min_y,max_x,max_y. This is the format that a WFS uses to spatially subset results, and it’s how you’ll limit your gauge results to the bounding box of a state. This function takes advantage of the string formatting rules to rearrange the results of GetEnvelope, which returns a geometry’s bounding box (figure 4.5) as a [min_x, max_x, min_y, max_y] list.
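
Sketches of the two helpers, following the conventions just described:

def get_center(geom):
    """Return a geometry's centroid as a [y, x] list for Folium."""
    centroid = geom.Centroid()
    return [centroid.GetY(), centroid.GetX()]

def get_bbox(geom):
    """Return 'min_x,min_y,max_x,max_y' for a geometry."""
    # GetEnvelope returns (min_x, max_x, min_y, max_y), so reorder.
    return '{0},{2},{1},{3}'.format(*geom.GetEnvelope())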

Figure 4.5. The line is the bounding box for the state of Oklahoma.

Now let’s look at the slightly more complicated save_state_gauges function. Here you hardcode in the URL for a WFS that returns the observed river stages data from the Advanced Hydrologic Prediction Service. You also create a dictionary containing the parameters to be passed to the WFS. As you already know, the typeNames parameter is the name of the layer to retrieve data from. The version is the WFS version to use, and srsName specifies which coordinate system you’d like your data to be returned in. You can see the available options for this in the WFS’s capabilities output, which you can get by tacking ?request=GetCapabilities onto the end of the service URL and visiting it in a web browser. For example, part of the output from http://gis.srh.noaa.gov/arcgis/services/ahps_gauges/MapServer/WFSServer?request=GetCapabilities looks like this:

<wfs:FeatureType>
    <wfs:Name>ahps_gauges:Observed_River_Stages</wfs:Name>
    <wfs:Title>Observed_River_Stages</wfs:Title>
    <wfs:DefaultSRS>urn:ogc:def:crs:EPSG:6.9:4269</wfs:DefaultSRS>
    <wfs:OtherSRS>urn:ogc:def:crs:EPSG:6.9:4326</wfs:OtherSRS>
    <snip>
</wfs:FeatureType>

From this you can see that the default spatial reference system (DefaultSRS) is EPSG 4269, which happens to be unprojected data using the NAD83 datum. If that doesn’t make much sense, don’t worry about it for now, because you’ll learn all about it in chapter 8. All you need to know now is that web-mapping libraries generally want coordinates that use WGS84, which corresponds to EPSG 4326. Fortunately, that’s listed as an OtherSRS option in the capabilities output, so you insert it into your parameters dictionary:

parms = {
    'version': '1.1.0',
    'typeNames': 'ahps_gauges:Observed_River_Stages',
    'srsName': 'urn:ogc:def:crs:EPSG:6.9:4326',
}
if bbox:
    parms['bbox'] = bbox

If the user provided a bbox parameter to the function, you also insert that into your dictionary. If a bbox parameter is provided to the WFS, it returns features that fall in that box instead of returning all of them. Remember that your get_bbox function creates a string in the correct format for this based on a geometry’s bounding box.

Creating this dictionary wasn’t absolutely necessary, because you could have built your query string the same way you did in earlier examples, but I think that using a dictionary makes it easier to see what parameters are being passed. It’s easy to create the query string from the dictionary by using the urlencode function, which formats everything for you. In Python 2, this function lives in the urllib module, but in Python 3 it lives in urllib.parse, which is why you have the next step in a try/except block. You try to create the query string using the Python 2 function, but if that fails because the script was run with Python 3, then you do it the Python 3 way instead:

try:
    request = 'WFS:{0}?{1}'.format(url, urllib.urlencode(parms))        # Python 2
except AttributeError:
    request = 'WFS:{0}?{1}'.format(url, urllib.parse.urlencode(parms))  # Python 3

After creating your query string, you use it to open a connection to the WFS and get the layer. You want to save the output to a local file this time, though, so then you create an empty GeoJSON data source. Data sources have a CopyLayer function that copies an existing layer into the data source; this existing layer can be from another data source altogether. You use that function to copy the data from the WFS into your new GeoJSON file:

json_ds.CopyLayer(wfs_lyr, '')

The second parameter to CopyLayer is the name for the new layer, but GeoJSON layers don’t have names, so you pass a blank string. You could pass a real layer name, but it wouldn’t do much good. When your function returns after creating the layer, the data sources go out of scope, so the files get closed automatically, which is why you don’t bother to close them inside the function.
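
Assembled from those pieces, the whole function might look like the following sketch. The parameter names out_fn and bbox are assumptions, and the import handling differs slightly from the snippet above so the block is self-contained in both Python 2 and 3.

import os
import urllib
from osgeo import ogr

def save_state_gauges(out_fn, bbox=None):
    """Save stream gauge data from the WFS to a GeoJSON file."""
    url = ('http://gis.srh.noaa.gov/arcgis/services/ahps_gauges/'
           'MapServer/WFSServer')
    parms = {
        'version': '1.1.0',
        'typeNames': 'ahps_gauges:Observed_River_Stages',
        'srsName': 'urn:ogc:def:crs:EPSG:6.9:4326',
    }
    if bbox:
        parms['bbox'] = bbox
    try:
        request = 'WFS:{0}?{1}'.format(url, urllib.urlencode(parms))
    except AttributeError:
        import urllib.parse  # Python 3
        request = 'WFS:{0}?{1}'.format(url, urllib.parse.urlencode(parms))
    wfs_ds = ogr.Open(request)
    if wfs_ds is None:
        raise RuntimeError('Could not open WFS.')
    wfs_lyr = wfs_ds.GetLayer(0)
    # Overwrite any existing output file, then copy the WFS layer in.
    driver = ogr.GetDriverByName('GeoJSON')
    if os.path.exists(out_fn):
        driver.DeleteDataSource(out_fn)
    json_ds = driver.CreateDataSource(out_fn)
    json_ds.CopyLayer(wfs_lyr, '')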

The last function you write is called make_map. It wants a state name along with filenames for the output GeoJSON and HTML files. It can also take other named arguments that get passed to Folium, which allows you to pass optional Folium parameters without having to worry about them in your make_map function:

def make_map(state_name, json_fn, html_fn, **kwargs):
    """Make a folium map."""
    geom = get_state_geom(state_name)
    save_state_gauges(json_fn, get_bbox(geom))
    fmap = folium.Map(location=get_center(geom), **kwargs)
    fmap.geo_json(geo_path=json_fn)
    fmap.create_map(path=html_fn)

The basic outline is shown in figure 4.6, but the first thing this function does is get the geometry for the state of interest. Then it gets the bbox for the geometry and passes that, along with the output GeoJSON filename, to the function that saves the WFS data to file. Then it creates a Folium map centered on the geometry, and also uses any named arguments that the user might have passed in. Remember that ** in the function signature gathers any extra named arguments into a dictionary called kwargs, and ** in the call to folium.Map explodes that dictionary back into individual keyword arguments. You can read about the optional parameters at http://folium.readthedocs.org/en/latest/. This map uses OpenStreetMap tiles as the basemap by default, but that’s one of the things you can change.

Figure 4.6. Tasks in the make_map function

After creating the basic map, the contents of the GeoJSON file are added and the map is saved to the HTML filename provided by the user. All that’s left is to use it.

os.chdir(r'D:\Dropbox\Public\webmaps')
make_map('Oklahoma', 'ok.json', 'ok.html',
         zoom_start=7)

I used a Dropbox folder so that I could view the output on the web using the Dropbox public link functionality. You probably won’t have much luck viewing the output straight from your local drive without using a web server. If you don’t have something like Dropbox you can use, check out the sidebar to learn how to start up a simple Python web server on your local machine instead. I wanted to make a map of Oklahoma, and I also passed one of those optional parameters, zoom_start, through to Folium. By default, Folium maps start with a zoom level of 10, which is zoomed in too far to see the entire state. A start level of 7 works much better for this example.

Python SimpleHTTPServer

Python ships with a simple web server that you can use for testing things out, although you probably shouldn’t use it for production websites. The easiest way to use it is to open up a terminal window or command prompt, change to the directory that contains the files you want to serve, and then invoke the server from the command line.

For Python 2:

D:\>cd dropbox\public\webmaps
D:\Dropbox\Public\webmaps>c:\python27\python -m SimpleHTTPServer

For Python 3:

D:\>cd dropbox\public\webmaps
D:\Dropbox\Public\webmaps>c:\python33\python -m http.server

This will start up a web server running on your local port 8000, so you can get to it in a web browser at http://localhost:8000/. If a file called index.html is in the folder you started the server from (D:\Dropbox\Public\webmaps, in this case), then that page will automatically be displayed. Otherwise, a list of files in the folder will display, and you can click on one to see it. The URL for the Oklahoma example would be http://localhost:8000/ok.html.

Once you’ve run the script, you can get the Dropbox public link for ok.html and view it in a web browser. If all went well, it will look something like figure 4.7.

Figure 4.7. A simple Folium map made with a GeoJSON file

The map in figure 4.7 shows the location of stream gauges, but other than that, it’s not too useful. Smaller markers would be nice, and so would popups that provide the gauge reading if you click on the marker. Unfortunately, I don’t believe there’s a way to do this by adding a GeoJSON file to the map directly, but it’s not hard to do manually. Let’s add a function to make custom markers, along with a couple of helper functions, and then change the make_map function to use those instead of adding the GeoJSON straight to the map.

Listing 4.4. Custom markers for a Folium map

The first thing you do here is set up colors to use. These come from the online legend for this map service, which is available at http://gis.srh.noaa.gov/arcgis/rest/services/ahps_gauges/MapServer/0. The keys in the colors dictionary are possible values in the Status attribute field, and the values are hex strings that describe a color.

The get_popup function creates an HTML string by exploding the attributes dictionary for a feature and inserting the values in the corresponding placeholders in a template string. For example, the value from the Location field would get inserted in place of “{location}” in the template string.

The markers are created in the add_markers function, which loops through the GeoJSON layer and creates a marker for each point in the layer. This uses the Folium circle_marker function, which wants a [y, x] list as its first argument; this is where the marker will be placed on the map. You use a different color based on the flood status at that location and also add a popup to go along with the marker. The radius parameter is the marker radius in pixels; yours are a little larger than the default.
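
The listing body isn’t reproduced in this copy; a sketch of the three pieces might look like the following. The status values, hex colors, field names, and popup fields are placeholders; take the real ones from the legend page and the service’s attribute fields.

import folium
from osgeo import ogr

# Placeholder status-to-color mapping; use the values and hex colors
# from the online legend for the ahps_gauges map service.
colors = {
    'no_flooding': '#00FF00',
    'minor': '#FFFF00',
    'moderate': '#FF9900',
    'major': '#FF0000',
}

# Popup template; {location} and {status} are filled in from a
# feature's attribute dictionary (field names are assumptions).
popup_template = '{location}<br/>Status: {status}'

def get_popup(attributes):
    """Build popup HTML from a feature's attribute dictionary."""
    return popup_template.format(**attributes)

def add_markers(fmap, json_fn):
    """Add a circle marker for each point in a GeoJSON file."""
    ds = ogr.Open(json_fn)
    lyr = ds.GetLayer()
    for row in lyr:
        geom = row.geometry()
        color = colors.get(row.GetField('status'), '#000000')
        fmap.circle_marker(
            location=[geom.GetY(), geom.GetX()],  # [y, x] for Folium
            radius=8,                             # pixels
            fill_color=color,
            popup=get_popup(row.items()))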

The last steps are to change the make_map function so that it calls add_markers instead of geo_json, and then to create a new map. This time you use Stamen Toner tiles instead of OpenStreetMap, mostly because the markers are easier to see that way. Your output should look like figure 4.8, and if you click on a marker, you’ll see a popup containing the relevant information.

Figure 4.8. A nicer map created by manually constructing colored markers with popups

Although it isn’t the subject of this book, I hope you enjoyed the short foray into web mapping. If you didn’t know anything on the subject and are anything like me, you now have another item on your “to learn” list.

4.3. Testing format capabilities

As mentioned earlier, not all operations are available with all data formats and drivers. How do you find out what’s allowed on your data, other than trying it and crossing your fingers that your code doesn’t crash? Fortunately, drivers, data sources, and layers are all willing to convey that information if you ask. Table 4.1 shows which capabilities you can check for each of these data types.

Table 4.1. Constants used for testing capabilities

Driver capabilities                                        OGR constant
Create new data sources                                    ODrCCreateDataSource
Delete existing data sources                               ODrCDeleteDataSource

DataSource capabilities                                    OGR constant
Create new layers                                          ODsCCreateLayer
Delete existing layers                                     ODsCDeleteLayer

Layer capabilities                                         OGR constant
Read random features using GetFeature                      OLCRandomRead
Add new features                                           OLCSequentialWrite
Update existing features                                   OLCRandomWrite
Supports efficient spatial filtering                       OLCFastSpatialFilter
Has an efficient implementation of GetFeatureCount         OLCFastFeatureCount
Has an efficient implementation of GetExtent               OLCFastGetExtent
Create new fields                                          OLCCreateField
Delete existing fields                                     OLCDeleteField
Reorder fields in the attribute table                      OLCReorderFields
Alter properties of existing fields                        OLCAlterFieldDefn
Supports transactions                                      OLCTransactions
Delete existing features                                   OLCDeleteFeature
Has an efficient implementation of SetNextByIndex          OLCFastSetNextByIndex
Values of string fields are guaranteed to be UTF-8         OLCStringsAsUTF8
Supports ignoring fields when fetching feature data,
    which can speed up data access                         OLCIgnoreFields

To check for a given capability, all you have to do is call the TestCapability function on a driver, data source, or layer, and pass a constant from table 4.1 as a parameter. The function will return True if that operation is allowed and False if it isn’t. Try using this to determine if you can add new shapefiles to a folder:
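
The original snippet is elided in this copy; a minimal version against the US folder would look like this, with the output reflecting a read-only versus writable open:

>>> ds = ogr.Open(r'D:\osgeopy-data\US', 0)  # read-only
>>> ds.TestCapability(ogr.ODsCCreateLayer)
False
>>> ds = ogr.Open(r'D:\osgeopy-data\US', 1)  # writable
>>> ds.TestCapability(ogr.ODsCCreateLayer)
True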

As you probably could’ve guessed, you’re allowed to create new layers when the folder has been opened for writing, but not when it has been opened read-only. How could you use this information to make sure you didn’t attempt to do something that would cause an error? You can modify your code to add checks before you try to do any editing:
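
This snippet is also elided here; a sketch of the idea, with a hypothetical layer and field name:

ds = ogr.Open(r'D:\osgeopy-data\US', 1)
lyr = ds.GetLayer('cities_48')  # hypothetical layer choice
if not lyr.TestCapability(ogr.OLCCreateField):
    raise RuntimeError('Cannot create fields on {}.'.format(lyr.GetName()))
lyr.CreateField(ogr.FieldDefn('new_field', ogr.OFTString))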

This snippet will raise an error and not continue if you aren’t allowed to add fields to the layer. You could catch and handle this error if you needed to, or let it bail out. If you don’t want to handle the errors, the biggest reason for checking beforehand is to make sure that all edits are possible before you start.

For example, what if a layer supported editing fields but not deleting features, and you wanted to do both? If you edited the fields before deleting the features, then part of your changes would take place (the field edits) before your code crashed when trying to delete features. Obviously, this is a problem if you want all or none when it comes to your edits. If partial edits don’t bother you, then you may not want to worry about this issue, but you can avoid the problem by checking capabilities beforehand and not proceeding if you’re not allowed to make all of your changes.

Another option, if partial edits are okay in your book but you still want to handle errors instead of letting the script crash, is to use OGR exceptions. You wouldn’t need to add any code to test capabilities, but you’d need to remember to add ogr.UseExceptions() somewhere early in your script. Using this approach, the attempt to delete a feature would still fail, but it then throws a RuntimeError that you could catch.
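
For example, a sketch (the layer name is hypothetical):

from osgeo import ogr
ogr.UseExceptions()  # turn OGR errors into Python exceptions

ds = ogr.Open(r'D:\osgeopy-data\US', 1)
lyr = ds.GetLayer('cities_48')  # hypothetical layer choice
try:
    lyr.DeleteFeature(1)  # raises RuntimeError if deletes aren't supported
except RuntimeError as e:
    print('Could not delete feature: {}'.format(e))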

A function in the ospybook module called print_capabilities will print what capabilities a driver, data source, or layer supports. Here’s how to use it from the Python interactive window:

>>> driver = ogr.GetDriverByName('ESRI Shapefile')
>>> pb.print_capabilities(driver)
*** Driver Capabilities ***
ODrCCreateDataSource: True
ODrCDeleteDataSource: True

Because this function only prints out information, you can’t use it in your code to decide what action to take based on available capabilities, but it’s handy in an interactive window for seeing which operations an object supports.

4.4. Summary

  • The vector file format you choose to use might depend on the application. You might go with GeoJSON for making a web map, but use shapefiles or PostGIS for data analysis.
  • Perhaps the most popular data transfer format is the shapefile because it’s simple, the specifications are public, and it has been around for a long time.
  • Formats based on databases, such as SpatiaLite, PostGIS, and Esri geodatabases, tend to be more efficient and support more features than other vector formats.
  • Although the syntax for opening various data source types differs, once you have the data source open, you can access the layers and features the same way no matter the source.
  • Multiple layers in a data source can be different from one another. For example, they can have different geometry types, attribute fields, spatial extents, and spatial reference systems.
  • You can use TestCapability to determine which edits are allowed on your dataset.