Chapter 6. Spatial reference system considerations

 

This chapter covers

  • Characteristics of spatial reference systems
  • How to determine and select spatial reference systems

 

Up to this point we’ve been working mostly with fictitious data and have only glimpsed real-world data. Using sample data to learn the basics of PostGIS is an excellent beginning. You’re immediately rewarded with results without facing the distractions and obstacles of real-world data. From this chapter forward, we’re not going to shield you anymore.

We start this chapter with coverage of spatial reference systems. We follow up with exercises on determining the spatial reference of source data and selecting suitable ones for storage.

The art and science of modeling our bulbous earth and getting a 2D representation of it on paper have been around since antiquity. Geodesy is the science of measuring and modeling the earth. Cartography is the science of representing the earth on flat maps. The intricacies of these two venerated sciences are far beyond the scope of this book. After all these mathematical gyrations, we end up with something that’s of utmost importance to GIS: the spatial reference system (SRS).

In this chapter, we’re not going to take the easy way out by accepting SRS without understanding it. We’ll also avoid the path of arcane mathematics necessary to study the science in all its glory. We choose a middle ground so that you can at least have more than a one-sentence explanation of SRS when your kids finally get around to asking you about it. Our journey into the real world begins.

6.1. Spatial reference system: What is it?

The topic of spatial reference systems is one of the more abstruse in GIS to understand. This is mainly due to the loose way in which people use the term spatial reference system and secondly to its unglamorous nature compared to other areas of GIS. If GIS is Disneyland, think of SRS as the bookkeeping necessary to keep the Disneyland operation afloat.

Take any two paper maps from your collection that have one point in common and overlay one atop the other, using the common point as a reference. Both maps represent the whole or a part of the earth, but unless you’re extremely lucky, the two maps have no relation to each other. Travel five centimeters to the right on one map and you can end up on another street. Five centimeters on the other map could put you on another continent. Your two maps don’t overlay well because they don’t have the same spatial reference system. The main reason for the GIS data consumer to become acquainted with SRS is to bring in data from disparate sources in different SRSes and be able to overlay one atop another. Many standards exist to make this task easy without having to delve into the nuances of SRS. The most common one is the European Petroleum Survey Group (EPSG) numbering system. Take any two sources of data with the same EPSG number, and they’ll overlay perfectly. EPSG is a fairly recent SRS numbering system. If you uncover data from a few decades ago, you won’t find an EPSG number, and you’ll have no choice but to delve into the constituent pieces that form a spatial reference system. So what is a spatial reference system?

6.1.1. The geoid

From outer space, our good earth appears spherical, often described as a blue marble. To anyone living on its surface, nothing could be further from the truth. The slick glossy surface seen from outer space actually comprises mountain ranges, deep canyons, and ocean trenches. The surface of the earth with all its nooks and crannies resembles a slightly charred English muffin much more than a lustrous marble. Even the idea of the earth being spherical isn’t accurate, because the equator bulges out: the equatorial diameter is about 42.77 km longer than the pole-to-pole diameter, which makes a trip around the equator roughly 67 km longer than a trip around the poles along a meridian.

In light of the fact that we have a deeply pitted and somewhat squashed orange under our feet, what are we going to do? With our new GPS toys we could conceivably represent every square meter on earth as a satellite map, assigning it a spherical 3D coordinate, and be done with it. This is the approach taken by many digital elevation models. Though this brute force computation method could certainly become the standard one day, we still need a simpler and more computationally cost-effective model for most use cases.

A model, by definition, is a simplified representation of reality. All models are inherently flawed in some way or other. In exchange for their shortcomings, they provide us with a more cost-effective way of doing things. A key factor in selecting a model is finding one that balances cost of computation (in speed and complexity) with observed failure. Some models may fail in ways you don’t care about because you’ll never exercise their points of failure. Until the time when we can afford to carry around portable holograms of the earth, we need several cheap models.

A starting point for any 3D model is the choice of definition of the surface of the earth. Do you use the mean sea level? An average of the peaks and valleys? Quite a few options are available, but they all suffer from a common problem: you can’t really go out and set up a standard of measurement that’s applicable around the entire world. Take the notion of sea level, for instance. Someone in Cardiff, Wales, can say that her house is 50 meters above the sea during low tide and use this as a reference against her neighbor’s house. Suppose a fellow in Pago Pago has a small house and measures his house also to be 50 meters above sea level. What can we say about the relative elevation of the two houses to each other? Not much. Sea level varies from place to place relative to the center of the earth. And even the notion of center of the earth is ambiguous.

Along comes Gauss, who, with the help of a crude pendulum, determined in the early nineteenth century that the surface of the earth should be defined using gravitational measurements. Though he lacked a digital gravity meter, we can picture the idea of going around the surface of the globe with such a device and measuring out a surface on which the gravitational potential is constant, an equipotential surface. This is the basic idea behind the geoid. We take gravity readings at various sea levels to come up with a consensus and then use this constant potential to map out an equipotential surface around the globe. Many consider the geoid to be the true figure of the earth.

Surprisingly, the geoid is far from spherical; see figure 6.1. You must not forget that the core of the earth isn’t homogeneous. Mass is distributed unevenly, giving rise to bulges and craters that rival those found on the lunar surface. The advent of the geoid didn’t simplify matters. On the contrary, it created even more headaches. The true surface of the earth is now even less marblelike; even a slightly squashed orange is no longer a faithful representation.

Figure 6.1. A geoid seen from different angles

Although the geoid is rarely talked about in GIS, it’s the foundation of both planar and geodetic models. In the next section, we’ll discuss the more commonly used ellipsoids, which are simplifications of the geoid and are generally good enough for most geographic modeling needs.

6.1.2. Ellipsoids

Since ancient times, the starting point for modeling the earth has always been an ellipsoid of some sort. An ellipsoid is merely a 3D ellipse.

 

Ellipsoids

An ellipsoid is defined by three radii: a and b are the equatorial radii (along the X and Y axes), and c is the polar radius (along the Z axis). In geodesy only two are considered: the semi-major axis (equatorial) and the semi-minor axis (polar). Spheroids are a subclass of ellipsoids where a = b. A spheroid where c < a, like the earth, is called an oblate spheroid; one where c > a is a prolate spheroid. By the way, if a = b = c, you have a perfect sphere.

 

By varying the X/Y and polar axes on the ellipsoid, you can model the equatorial bulge. At some point in the history of cartography, people must have postulated one ellipsoid that could be used all around the world—a reference ellipsoid. Everyone can locate each other by finding their placement on the reference ellipsoid. The discovery of the geoid shattered the idea of using a single ellipsoid. One look at the geoid will show why. The geoid paints a picture where the local curvature varies from place to place. An ellipsoid that fits the curvature for one spot may be awfully inaccurate for another; see figure 6.2.

Figure 6.2. The geoid and the ellipsoid seen together

Now instead of one ellipsoid to rule us all, people on different continents want their own to better reflect the regional curvature of the earth. This gave rise to the multitude of ellipsoids we have today. This was all well and good when we didn’t care about people far away from us. This disparate use of different systems became more of an issue with time because of the need for scientists and governments to collaborate and the rise of oil surveying and aviation. Fortunately, today the world is settling on the World Geodetic System (WGS 84) and GRS 80 ellipsoids, with WGS 84 becoming the standard of choice. WGS 84 is what all GPS systems are based on. To call WGS 84 simply an ellipsoid isn’t quite accurate. The WGS 84 GPS systems we use have a geoid component as well. The present WGS 84 system uses the 1996 Earth Gravitational Model (EGM96) geoid and is the best-fitting ellipsoid to the geoid model for the selected survey points in the set.

Common ellipsoids used today are:

  • GRS 80
  • WGS 84 (more common nowadays and the standard for GPS data)

The 80 and 84 stand for 1980 and 1984, when the standards came out, and they’re very similar.

Many ellipsoids have been used over the years, and some continue to be used because of their better fit for a particular region. Much historical data is still referenced against other ellipsoids. Table 6.1 shows a sampling of some common ellipsoids and their various ellipsoidal parameters.

Table 6.1. Common ellipsoids

| Ellipsoid | Equatorial radius (m) | Polar radius (m) | Inverse flattening | Where used |
|---|---|---|---|---|
| Clarke 1866 | 6,378,206.4 | 6,356,583.8 | 294.9786982 | North America |
| NAD 27 | 6,378,206.4 | 6,356,583.8 | 294.978698208 | North America |
| Australian 1966 | 6,378,160 | 6,356,774.719 | 298.25 | Australia |
| GRS 80 | 6,378,137 | 6,356,752.3141 | 298.257222101 | North America |
| WGS 84 | 6,378,137 | 6,356,752.3142 | 298.257223563 | GPS (World) |
| IERS 1989 | 6,378,136 | 6,356,751.302 | 298.257 | Time (World) |
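The inverse flattening column in table 6.1 isn't an independent parameter; it follows from the two radii. As a minimal sketch (the function and variable names are ours, chosen for illustration), the flattening f is (a - b) / a, so the inverse flattening is a / (a - b), where a is the equatorial radius and b the polar radius:

```python
def inverse_flattening(equatorial_radius_m: float, polar_radius_m: float) -> float:
    """Return 1/f, where f = (a - b) / a is the flattening of the ellipsoid."""
    a, b = equatorial_radius_m, polar_radius_m
    if a == b:
        raise ValueError("a sphere has zero flattening, so 1/f is undefined")
    return a / (a - b)


# Radii in meters, copied from table 6.1.
ELLIPSOIDS = {
    "Clarke 1866": (6378206.4, 6356583.8),
    "GRS 80": (6378137.0, 6356752.3141),
    "WGS 84": (6378137.0, 6356752.3142),
}

for name, (a, b) in ELLIPSOIDS.items():
    print(f"{name}: 1/f = {inverse_flattening(a, b):.4f}")
```

Because the polar radii in the table are rounded, the computed values agree with the published inverse flattening only to a few decimal places. That is enough to see how little GRS 80 and WGS 84 differ.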

One common old ellipsoid is Clarke 1866 (it’s so close to what is called the NAD 27 ellipsoid that the two are synonymous for most purposes). Even though old data referenced against such ellipsoids is measured in longitude and latitude, it isn’t the same longitude and latitude we use today, because the older systems also use different grounding points. The coordinates are shifted.

 

Lon lat which ellipsoid?

This is why it’s important to not just call things lon lat. You can have NAD 27 lon lat, NAD 83 lon lat, and WGS 84 lon lat, and each will be subtly different. As a rule, when people nowadays refer to lon lat, they mean WGS 84 datum and WGS 84 spheroid in lon lat units. NAD 27 differs the most because it was established a long time ago. (Note that a datum is the shift of a spheroid. See the next section.)

 

In the next section we’ll discuss the concept of datums and how they fit into the overall picture of the spatial reference system.

6.1.3. Datum

The ellipsoid alone only models the overall shape of the earth. After picking out an ellipsoid, you need to anchor it should you ever need to use it for real-world navigation. Every ellipsoid that’s not a perfect sphere has two poles, the points where its axis meets the surface. These ellipsoid poles must be permanently tied to actual points on earth. This is where the datum comes into play. Even if two reference systems use the same ellipsoid, they could still have different anchors, or datums, on earth.

The simplest example of a datum is to look at the tilt between the geographic pole and the magnetic pole. In both models, the earth has the same spherical shape, but one is anchored at the north pole and the other is somewhere in Canada.

To anchor an ellipsoid to a point on earth, you need two types of datum: a horizontal datum to specify where on the plane of the earth to pin down the ellipsoid and a vertical datum to specify the height. For example, the North American Datum of 1927 (NAD 27) is anchored at Meades Ranch in Kansas because it’s close to the geographic center of the contiguous United States. NAD 27 is a horizontal datum; heights from the same era were referenced to a separate vertical datum, now known as the National Geodetic Vertical Datum of 1929 (NGVD 29). Here are some commonly used datums:

  • NAD 83 (North American 1983, which is often accompanied by the GRS 80 spheroid)
  • NAD 27 (North American 1927, which is generally accompanied by the Clarke 1866/NAD 27 ellipsoid)
  • European Datum 1950
  • Australian Geodetic Datum 1984

6.1.4. Coordinate reference system

Many people confuse coordinate reference systems (CRS) with spatial reference systems. A CRS is only a necessary ingredient that goes into the making of an SRS and not the SRS itself. To identify a point on our reference ellipsoid, you need a coordinate system. For use on a reference ellipsoid, the most popular CRS is the geographical coordinate system (also known as geodetic coordinate system or simply as lon lat). You’re already intimately familiar with this coordinate system. You find the two poles on an ellipsoid and draw longitude (meridian) lines from pole to pole. You then find the equator of your ellipsoid and start drawing latitude lines parallel to it. Keep in mind that even though you’ve only seen geographical coordinate systems used on a globe, the concept applies to any reference ellipsoid. For that matter, it applies to anything resembling an ellipsoid. For instance, a watermelon has nice longitudinal bands on its surface.

6.1.5. Projection

Let’s summarize what we discussed thus far about spatial reference systems:

  • We start by modeling the earth using some variant of a reference ellipsoid, which should be the ellipsoid that deviates least from the geoid for the regions on earth we care about.
  • We use a datum to pin the ellipsoid to an actual place on earth, and we assign a coordinate reference system to the ellipsoid so we can identify every point on the surface. For example, the zero milestone in Washington, D.C. is at 77.03655° W, 38.8951° N (in spatial terms x: -77.03655, y: 38.8951) on a WGS 84 ellipsoid using the WGS 84 datum, but on the NAD 27 datum with the Clarke 1866 ellipsoid, this would be 77.03685° W, 38.8950° N.

We can quit at this point, because we have all the elements necessary to tag every spot on earth. We can even develop transformation algorithms to convert coordinates based on one ellipsoid to coordinates based on another. Many sources of geographic data do stop at this point and don’t go on to the next step, projections. We term this data unprojected data. All data served up in the form of latitudes and longitudes is unprojected. You can do quite a bit with unprojected data. For example, by using the great circle distance formula, you can get the distance between any two points. You can also use it to navigate between any two points on earth.
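To make the great circle distance idea concrete, here is a minimal sketch in Python using the haversine form of the formula. It assumes a spherical earth with a mean radius of 6,371,008.8 m, which is our simplification rather than anything PostGIS prescribes, and the function name is made up for illustration:

```python
import math

# Assumption: a spherical earth with the commonly quoted mean radius.
MEAN_EARTH_RADIUS_M = 6371008.8


def great_circle_distance_m(lon1, lat1, lon2, lat2, radius_m=MEAN_EARTH_RADIUS_M):
    """Haversine distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    h = min(1.0, h)  # guard against rounding pushing h slightly above 1
    return 2 * radius_m * math.atan2(math.sqrt(h), math.sqrt(1 - h))


# From the zero milestone in Washington, D.C. (coordinates quoted earlier
# in this section) to the north pole.
print(great_circle_distance_m(-77.03655, 38.8951, 0.0, 90.0))
```

Because the sketch treats the earth as a sphere, its answers can drift from ellipsoid-based results by a fraction of a percent. That is fine for rule-of-thumb work but not for surveying.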

Projection has distortion built in. The concept of projection generally refers to taking an ellipsoidal earth and squashing it on a flat surface. Because geodetic and 3D globes are ellipsoidal, they by definition don’t refer to a flat surface and are referred to as unprojected. In the next section, we’ll briefly go over the different kinds of projections and why we have them.

6.1.6. Different kinds of projections

So why do we have 2D projections of our ellipsoid or geoid? The obvious reason is eminently practical: You can’t carry a huge globe everywhere you go. Less obvious but more relevant is the mathematical and visual simplicity that comes with planar (Euclidean) geometry.

As we have repeated many times, PostGIS works for the most part on a Cartesian plane, and most of the powerful functions assume a Cartesian model. Your brain and the quite different brain of PostGIS can perform area and distance calculations quickly on a Cartesian plane. On a plane, the area of a square is its side squared. Distance is nothing more than applying the Pythagorean theorem. A planar model fits nicely on a piece of paper. Calculating the area of a square directly on the surface of an ellipsoid becomes quite a challenge, not the least aspect of which is deciding what constitutes a square on an ellipsoid in the first place.

 

PostGIS 1.5 supports geodetic data

PostGIS 1.5 introduced support for geodetic data using the new datatype geography, similar in concept to the SQL Server 2008 geography type. All spatial functions work with geometry data; only a few functions and operators, such as the distance functions, also work with geography.

 

How exactly you’d squash an ellipsoidal earth on a flat surface is controlled by several classes of rules we’ll loosely refer to as the classes of Cartesian coordinate systems. Each class of rules tries to optimize for a set of features, each specific instance of a coordinate system is bounded by a particular region on earth, and each uses a particular unit (usually meters or feet).

Needless to say, you try to balance four conflicting features. The importance you place on each will dictate the choice of coordinate system and eventually of the spatial reference system(s):

  • Measurement
  • Shape—How accurately it represents angles
  • Direction—Is north really north?
  • Range of area supported

The general tradeoff is if you want to span a large area, you have to give up measurement accuracy or deal with the pain of maintaining multiple spatial reference systems and some mechanism to shift among them. The larger your area, the less accurate and potentially grossly unusable your measurements will be. If you try to optimize for shape and to cover a large range, your measurements may be off, perhaps way off.

There are a few flavors of projections (squashing) you can do to optimize for different things. These are listed here:

  • Cylindrical projections— Imagine a piece of paper rolled around the globe and imprinting the globe on its surface. Then you unroll it to make it flat. The most common of these is the Mercator projection, which has the bottom of the rolled cylinder parallel to the equator. This results in great distortion at the polar regions, whereas measurement accuracy is best the closer you are to the equator, because there the approximation of flat is most accurate.
  • Conic projections— These are sort of like the cylindrical projection except you wrap a cone around the globe, take the imprint of the globe on the cone, and then roll it out.
  • Azimuthal projections— You project a spherical surface onto a plane tangent to the spheroid.

Within these three kinds of projections you must also consider the orientation of the paper you roll around the globe. These are the possibilities:

  • Oblique— Neither parallel nor perpendicular to the equator; some other angle
  • Equatorial— Perpendicular to the plane of the equator
  • Transverse— Parallel along the equator

Combinations of these categories form the main classes of planar coordinate systems:

  • Lambert Azimuthal Equal Area (LAEA)— These are reasonably good for measurement and can cover some large areas but are not great for shape. The one we like most when dealing with United States data and when we’re concerned with somewhat decent measurement is US National Atlas (EPSG:2163). This is a meter-based spatial reference system. These are in general not good at maintaining direction or angle.
  • Universal Transverse Mercator (UTM)— These are generally good for maintaining measurement, shape, and direction but only span six-degree longitudinal strips. If you need to cover the whole globe and you use these, you’ll have to maintain SRIDs for 60 zones, each with a northern and a southern variant. You cannot use them for the polar regions.
  • Mercator— These are good for maintaining shape and direction and span the globe, but they’re not good for measurement, and they make the regions near the poles look huge. The measurements you get from them are nothing less than cartoonish, depending on where you are. The most common Mercator projections in use are variants of World Mercator (SRID:3395) or Spherical Mercator, aka Google Mercator (SRID:900913), which became an EPSG standard as EPSG:3785 (later superseded by EPSG:3857). Being a fairly recent addition, it may be missing from your spatial_ref_sys table if your PostGIS version is older. They’re common favorites for web map display because you only have to maintain one SRID, and they look good to most people.
  • National Grid Systems— These are generally a variant of UTM or LAEA but are used to define a restricted region such as a country. As mentioned, US National Atlas (SRID:2163, US National Atlas Equal Area) is common for the United States. These are generally decent for measurement (but not super accurate), don’t always maintain good shape, but cover a fair amount of area, which is in many cases the national area you care about.
  • State Plane— These are U.S. spatial reference systems. They’re usually designed for a specific state, and most are based on Transverse Mercator or Lambert Conformal Conic projections. Generally there are two for a state, one measured in meters and one measured in feet, although some larger states have four or more. Optimal for measurement, these are commonly used by state/city land surveyors but, as we said, they can deal with only a single state.
  • Geodetic— PostGIS can store WGS 84 lon lat (4326) as a geometry data type, but more often than not you’ll want to transform it to another spatial reference system or store it in the geography data type for it to be usable. You can sometimes get away with using it as a geometry data type for small distances along the same longitude and when two things intersect, but keep in mind that when you use it, PostGIS is really projecting it. It squashes it on a flat surface, treating longitude as X and latitude as Y, so even though it looks unprojected, in reality it’s projected and in a mostly unusable way. The colloquial name for this kind of projection is Plate Carrée.
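To see why treating longitude as X and latitude as Y is a poor basis for measurement, consider how much ground one degree actually covers. The following sketch assumes a spherical earth (our simplification, with a hypothetical function name): a degree of latitude is roughly the same length everywhere, whereas a degree of longitude shrinks with the cosine of the latitude.

```python
import math

MEAN_EARTH_RADIUS_M = 6371008.8  # assumption: spherical earth, mean radius


def meters_per_degree(lat_deg, radius_m=MEAN_EARTH_RADIUS_M):
    """Return (east-west, north-south) ground length in meters of one degree."""
    one_degree_arc = math.radians(1.0) * radius_m
    return one_degree_arc * math.cos(math.radians(lat_deg)), one_degree_arc


for lat in (0, 30, 60, 80):
    ew, ns = meters_per_degree(lat)
    print(f"lat {lat:2d}: 1 degree of longitude = {ew / 1000:6.1f} km, "
          f"1 degree of latitude = {ns / 1000:6.1f} km")
```

A planar distance computed directly on lon lat coordinates treats both kinds of degree as equal units, which is why it only holds up for short hops along the same longitude.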

Given all these different options for spatial reference systems, determining which one your source data is in as well as choosing one for storage is often a tricky undertaking. In the next section we’ll show how to select a spatial reference system as well as some simple exercises for determining which spatial reference system your source data is in.

6.2. Selecting a spatial reference system to store data

One of the most common questions people ask is what spatial reference system(s) is appropriate for their data. The answer is, it depends.

Table 6.2 lists the most commonly used spatial reference systems and their PostGIS/EPSG SRIDs. PostGIS SRIDs follow the EPSG numberings, so you can assume for sake of argument they’re the same. This isn’t necessarily true for other spatial databases, so keep in mind that a spatial reference system can have several different IDs. Although EPSG is the most common authority on spatial reference systems, it isn’t the only one. Many people, for example, load up their tables with ESRI definitions, which are sometimes identical to EPSG definitions, but under an SRID code that’s more ArcGIS friendly.

Table 6.2. Common spatial reference systems and their fitness for purpose

| EPSG/PostGIS SRID | Colloquial name | Range | Measurement | Shape |
|---|---|---|---|---|
| 4326 | WGS 84 lon lat | Excellent | Bad | Bad |
| 3785/900913 (old number) | Spherical Mercator | Good | Bad | Good |
| 900913 (deprecated) | Google Mercator | Good | Bad | Good |
| 32601-32760 | UTM WGS 84 Zones | Medium | Fairly good | Good |
| 2163 | US National Atlas EA | All U.S. | Medium | Medium |
| State Planes | US State Planes | Medium | Good | Good |

If you deal with mostly regional data, say for a country or state, then it’s generally best to stick with one of the national grid or State Planes systems. You’ll get fairly good measurement accuracy, and it will also look good on a map.

Be forewarned that because PostGIS 1.4 and lower support only Cartesian coordinate systems, you may have to use several if you need to span large areas and maintain measurement accuracy.

6.2.1. Pros and cons of using EPSG:4326

The most common spatial reference system people use is WGS 84 lon lat (EPSG:4326). Aside from the common reason, that people just don’t know any better, the reasons why knowledgeable people use this system are:

  • It covers the whole globe and is the most common transport spatial reference system. For example, all GPS data is stored in this SRS. If you need to cover the world, dish out data to lots of people, and also deal with lots of GPS data, this isn’t a bad choice.
  • Most commercial mapping toolkits, although they use some variant of Mercator for display, expect the data to be fed in WGS 84 lon lat. ST_Transform also introduces some rounding errors as you retransform data, so it’s best to transform only once from the source format. ST_Transform is a fairly cheap process, so it’s okay to run it for each geometry if you keep functional indexes on the transformations you use for distance checking.

Reasons not to use it:

  • It’s bad for measurement. If measurement is something you do often, especially when you’re concerned about only small regions such as a country or state, you’ll spend a lot of time transforming back and forth if you use 4326. There are hacks for avoiding this with point data using a combination of ST_Distance_Spheroid/Sphere and ST_DWithin, and in PostGIS 1.5+ you can just use the geography data type instead (in exchange for far fewer functions). For non-point geometries where you need minimum distance rather than distance from centroid, the ST_Distance_Spheroid/Sphere hack doesn’t work for PostGIS 1.4 and below.
  • Things like intersects, intersection, and union generally work fine for small geometries but fall apart for large geometries, like continents or long fault lines.
  • It’s bad for shape. It also doesn’t look good on a map. It’s all squashed because we’re showing longitude and latitude, which are meant to be measured around an ellipsoid, and we’re showing it on a planar axis we call X and Y.

6.2.2. Geography data type for EPSG:4326

If you’ll be storing your data in WGS 84 spatial reference system and are using PostGIS 1.5, you should consider using the new geography data type that was introduced in PostGIS 1.5. The key benefit it provides over the geometry EPSG:4326 is that it’s ideal for measurement because it’s not projected and measurements are always in meters. Pros are as follows:

  • It will more or less work out of the box for you.
  • Distance and area measurements are as good as or better than UTM, so if your data covers the globe and you just need distance, area, and length measurements, this is probably the best choice.
  • Most web mapping layers such as Google, Virtual Earth (Bing), and the like expect data to be fed to them in WGS 84 so geography will work fine out of the box.

So if geography is great, why should you use geometry instead?

  • Processing functions for geography are limited. As of PostGIS 1.5, you can do an ST_Intersection and an ST_Buffer. But these are just wrappers around the geometry implementation that perform behind the scenes a transformation to a suitable planar projection, so it’s not too hard to roll your own functions.
  • Although you can piggyback on the geometry functions for processing by casting and transforming to geometry and casting back, the ST_Transform operation isn’t a lossless operation. ST_Transform introduces some floating-point errors that can quickly accumulate if you do a fair amount of geometry processing.
  • If you’re dealing with regional data, WGS 84 is generally not quite as accurate for measurement as regional spatial reference systems.
  • If you’re building your own mapping app, you’ll still need to learn how to transform your data to other spatial reference systems if you want them to look good on a map, and although the transformation process is fairly cheap, it can quickly become taxing the more data you pull, the more users hitting your database, or the greater number of points you have in a geometry.
  • Not as many tools support geography. In theory, any tool that just uses the ST_AsBinary and other output functions of PostGIS geometries will work fine with geography without any change.

6.2.3. Mapping just for presentation

Although the basic Mercator projections are horrible for measurement calculations, especially far from the equator, they’re a favorite for web mappers because they look good on a map. The advantage of Google Mercator, for example, is that the whole globe is covered with just one spatial ref.

So if your primary concern is looking good on a map and overlaying on Google Maps with something like OpenLayers, Mercator isn’t a bad option for native storage of data. If you’re concerned with distances and areas, it depends on the accuracy you need. Table 6.3 (generated from code in chapter 8) lists the distances between city pairs measured using various spatial reference systems.

Table 6.3. Results of distance calculations in kilometers

| city1 | city2 | sp | spwgs84 | wm |
|---|---|---|---|---|
| Beijing | Jerusalem | 7119 | 7135 | 9104 |
| Beijing | Melbourne | 9128 | 9095 | 9938 |
| Beijing | Philadelphia | 11060 | 11085 | 21330 |
| Beijing | Sao Paulo | 17600 | 17601 | 19656 |
| Beijing | Shanghai | 1066 | 1065 | 1315 |
| Cairo | Jerusalem | 423 | 424 | 494 |
| Cairo | Melbourne | 13977 | 13973 | 15024 |
| Cairo | Philadelphia | 9154 | 9173 | 11928 |
| Cairo | Sao Paulo | 10224 | 10216 | 10667 |
| Cairo | Shanghai | 8351 | 8367 | 10045 |
| Rio de Janeiro | Jerusalem | 10323 | 10315 | 10808 |
| Rio de Janeiro | Melbourne | 13221 | 13240 | 21078 |
| Rio de Janeiro | Philadelphia | 7706 | 7680 | 8250 |
| Rio de Janeiro | Sao Paulo | 338 | 338 | 368 |
| Rio de Janeiro | Shanghai | 18249 | 18256 | 19399 |
| Sydney | Jerusalem | 14114 | 14111 | 15040 |
| Sydney | Melbourne | 694 | 694 | 858 |
| Sydney | Philadelphia | 15895 | 15895 | 26702 |
| Sydney | Sao Paulo | 13357 | 13377 | 22041 |
| Sydney | Shanghai | 7878 | 7849 | 8354 |

This table lists various city point pairs and their distances measured on the WGS 84 sphere (sp), on the WGS 84 spheroid (spwgs84), and in Web Mercator (wm). As you can see, Web Mercator distances are much less accurate than the others and get worse the farther apart two cities are or the farther the region is from the equator. The computed distance between, for example, Beijing and Philadelphia is really poor with Mercator. The sphere calculations are pretty good for long-range/short-range rule-of-thumb calculations.

This table covers distance, but what about the areas of geometries? How bad is the story there? Again, this depends where you are on the globe, but in general the situation is bad. Table 6.4 shows the areas of 10-meter buffers around the globe, generated from code in chapter 8.

Table 6.4. List of different areas in different regions of the world

| City | utm_sm | geog_sm | wm_sm | diff_utm_g | diff_utm_wm |
|---|---|---|---|---|---|
| Honolulu | 312 | 312 | 362 | 0.13 | 49.48 |
| San Francisco | 312 | 312 | 500 | 0.22 | 188.03 |
| Boston | 312 | 312 | 572 | 0.02 | 260.22 |
| Paris | 312 | 312 | 722 | 0.24 | 409.54 |
| Oslo | 312 | 312 | 1240 | 0.18 | 927.74 |
| Saint Petersburg | 312 | 312 | 1241 | 0.09 | 929.03 |
| Helsinki | 312 | 312 | 1260 | 0.15 | 947.76 |
| Bergen | 312 | 312 | 1272 | 0.11 | 959.40 |
| Arkhangelsk | 312 | 312 | 1681 | 0.20 | 1368.54 |
| Murmansk | 312 | 312 | 2412 | 0.25 | 2100.22 |
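The growth of the wm_sm column with latitude follows a simple pattern. On a spherical Mercator projection, small features are stretched by roughly 1/cos(latitude) in each direction, so small areas are inflated by about the square of that factor. The sketch below is a back-of-the-envelope check under that spherical assumption; the function name is ours and the latitudes are rounded illustrative values, not figures taken from the table's source data:

```python
import math


def mercator_area_inflation(lat_deg):
    """Approximate factor by which spherical Mercator inflates small areas."""
    linear_scale = 1.0 / math.cos(math.radians(lat_deg))  # stretch per direction
    return linear_scale ** 2


# Rounded, illustrative latitudes for three of the cities in table 6.4.
for city, lat in (("Honolulu", 21.3), ("Paris", 48.9), ("Oslo", 59.9)):
    inflated = 312 * mercator_area_inflation(lat)
    print(f"{city}: a 312 sq m buffer appears as roughly {inflated:.0f} sq m")
```

The estimates land close to the wm_sm values in the table. Near 60 degrees of latitude the factor is about 4, which is why the Oslo, Saint Petersburg, and Helsinki buffers come out around four times too large.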

 

Why is a 10-meter buffer of a point 314 sq m?

It isn’t. If you do the calculation, a perfect 10-meter circular buffer will give you an area of 10*10*pi(), which is around 314 sq m, but table 6.4 reports about 312 sq m. The difference arises because the default buffer in PostGIS is a 32-sided polygon (eight points approximate a quarter segment of a circle), which covers slightly less area than the true circle. You can make this more accurate by using the overloaded version of the ST_Buffer function that allows you to pass in the number of points used to approximate a quarter segment.
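The sidebar's arithmetic can be checked with a few lines of Python. This sketch assumes the buffer is a regular polygon whose vertices all lie on the true circle, which is our reading of how the approximation behaves; `quad_segs` is just our name for the number of points per quarter circle:

```python
import math


def polygon_buffer_area(radius, quad_segs=8):
    """Area of a regular polygon with 4 * quad_segs vertices inscribed in a circle."""
    n = 4 * quad_segs
    return 0.5 * n * radius ** 2 * math.sin(2 * math.pi / n)


print(round(polygon_buffer_area(10), 2))                # 32 sides (the default)
print(round(polygon_buffer_area(10, quad_segs=32), 2))  # 128 sides
print(round(math.pi * 10 ** 2, 2))                      # the true circle
```

With eight points per quarter circle, the polygon's area rounds to the 312 sq m seen in table 6.4. Raising the number of points moves the area toward the circle's 314 sq m, at the cost of more vertices per geometry.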

 

6.2.4. Covering the globe when distance is a concern

If you’re in the unfortunate predicament of needing to cover the whole globe with good measurements and shape accuracy, then most likely a single spatial reference system isn’t going to cut it. A common favorite is the UTM family of SRIDs. There are 60 UTM zones for WGS 84, each covering a six-degree longitudinal strip and each with one SRID for the northern hemisphere and one for the southern. There is also a series of UTMs for NAD 83, but the WGS 84 one is more common.

You’ll need to figure out the UTM WGS 84 SRID for your particular dataset. There is a function for that in the PostGIS wiki at http://trac.osgeo.org/postgis. The following listing shows a slight variant of that function that takes any geometry and returns the WGS 84 UTM SRID of the centroid of that geometry.

Listing 6.1. Determining WGS 84 UTM SRID of a geometry

We convert our geometry to a point and then transform it to WGS 84 lon lat. This function assumes the SRIDs are numbered the same as the EPSG codes for UTMs, which is the case with the default spatial_ref_sys that comes packaged with PostGIS. We first determine whether the latitude is positive or negative. For positive latitudes, the UTM EPSG numbers start at 32601 and go up by one for every six degrees of longitude, ending at 32660. For negative latitudes, or 0, they run from 32701 to 32760. The final SRID is therefore 32600 or 32700 plus the zone number.
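A minimal sketch of the logic just described might look like the following. It isn’t the wiki function verbatim: the function name utm_srid_sketch is our own invention, and the sketch assumes the input geometry carries a valid SRID that ST_Transform can convert to 4326.

```sql
-- Hypothetical name; a sketch of the logic described above
CREATE OR REPLACE FUNCTION utm_srid_sketch(geom geometry)
RETURNS integer AS
$$
DECLARE
    pt geometry;   -- centroid of the input, in WGS 84 lon/lat
    zone integer;  -- UTM zone number, 1 to 60
    base integer;  -- 32600 for the northern hemisphere, 32700 for the southern
BEGIN
    pt := ST_Transform(ST_Centroid(geom), 4326);
    IF ST_Y(pt) > 0 THEN
        base := 32600;
    ELSE
        base := 32700;
    END IF;
    -- zones are six degrees wide, counted eastward from 180 degrees west;
    -- LEAST keeps a longitude of exactly 180 in zone 60
    zone := LEAST(floor((ST_X(pt) + 180) / 6)::integer + 1, 60);
    RETURN base + zone;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- A point near Los Angeles falls in zone 11 north, so this returns 32611
SELECT utm_srid_sketch(ST_SetSRID(ST_MakePoint(-118.25, 34.05), 4326));
```

Marking the function IMMUTABLE carries the same caveat as ST_Transform itself, because the result ultimately depends on the contents of spatial_ref_sys.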

If you need to maintain multiple SRIDs, you have three approaches:

  • Store one (usually 4326) and transform on the fly as needed.
  • Maintain one for each region and possibly partition your data by region using table inheritance.
  • Maintain multiple geometries, one field for each you commonly use.

There are many philosophies about the correct way to go, and none is right or wrong. For our cases, we’ve found that keeping one SRID (usually 4326) and transforming as needed works best, provided we maintain functional indexes on the transforms used for distance calculations. We also like using views as an abstraction layer, where the view contains the calculated transform.

PostgreSQL supports not only functional indexes but also partial ones. A partial index allows you to index only part of your data, which matters here because you should apply ST_Transform only to data that falls within the region defined for a given UTM zone; otherwise you’ll run into coordinate bounds issues. When your data spans several zones, it’s generally best to partition it using table inheritance and create a separate transform index for each table. The following listing is an example of a functional ST_Transform index and a possible view you may create to take advantage of it.

Listing 6.2. Using functional indexes
CREATE INDEX feature_data_the_geom_utm
 ON feature_data
 USING gist
 (st_transform(the_geom, 32611));

CREATE VIEW vwfeature_data AS
    SELECT gid, f_name, the_geom,
        ST_Transform(the_geom,32611) As the_geom_utm
    FROM feature_data;

In this view, we’re transforming our native data to SRID 32611, which is one of the UTM SRIDs for a region of California in the United States.
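To illustrate how such an index gets used, here is a hedged sketch of a partial variant of the index and a distance query that could take advantage of it. It assumes the_geom is stored in SRID 4326; the index name, the search point, and the 1,000-meter radius are made up for the example.

```sql
-- Partial variant: index only rows whose bounding box overlaps
-- UTM zone 11 north (120 to 114 degrees west, equator to 84 degrees north)
CREATE INDEX feature_data_the_geom_utm11
 ON feature_data
 USING gist
 (ST_Transform(the_geom, 32611))
 WHERE the_geom && ST_MakeEnvelope(-120, 0, -114, 84, 4326);

-- The ST_Transform(the_geom, 32611) expression must match the indexed
-- expression, and the WHERE condition of the partial index is repeated
-- so the planner can tell that the index applies.
SELECT gid, f_name
FROM feature_data
WHERE the_geom && ST_MakeEnvelope(-120, 0, -114, 84, 4326)
    AND ST_DWithin(
        ST_Transform(the_geom, 32611),
        -- hypothetical search point near Los Angeles, given in lon/lat
        ST_Transform(ST_SetSRID(ST_MakePoint(-118.25, 34.05), 4326), 32611),
        1000  -- UTM units are meters
    );
```

Whether the planner actually picks the index depends on your data and statistics, so it’s worth confirming with EXPLAIN.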

 

Functional indexes on ST_Transform

Putting functional indexes on ST_Transform is something we do when building a view on our data with the transformed version of the data. It’s a gray zone, in the sense that we’re exploiting the fact that ST_Transform is treated as an immutable function when technically it isn’t. In PostGIS, ST_Transform is marked as immutable mostly for performance reasons. This means that the result for a given geometry can be assumed never to change, so PostgreSQL kindly believes PostGIS, caches the result, and allows the function to be used in functional indexes. Only functions marked as immutable can be used in functional indexes, and in theory a function that relies on a table (except possibly for a static system table in pg_catalog) is at best considered stable (meaning it won’t change within a query given the same inputs). In actuality, it’s a bit of a lie that ST_Transform is immutable, because it relies on entries in the spatial_ref_sys table. If you happen to change the entry for your transform in the table, you’ll need to reindex your data; otherwise the index will be wrong. But then again, the same would be true if you kept a second transformed geometry column. We tend to think a bit liberally and treat the spatial_ref_sys table as practically immutable. Though you may add entries, it’s rare that you’d change the definitions of entries once created, and thus the immutability argument is valid in practice.

The other issue with functional indexes is they get dropped when you restore your data, unless you make sure to set the search_path of the ST_Transform function to include the schema the spatial_ref_sys resides in (supported only in PostgreSQL 8.3 and above). Read our diatribe on this topic for more details: http://www.postgresonline.com/journal/index.php?/archives/121-Restore-of-functional-indexes-gotcha.html.
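As a rough sketch of that workaround, the search_path can be attached to the function with ALTER FUNCTION. The example assumes that both the PostGIS functions and the spatial_ref_sys table live in the public schema, which may not match your installation.

```sql
-- Assumes spatial_ref_sys and the PostGIS functions are in the public schema;
-- adjust the schema list to wherever they reside in your database
ALTER FUNCTION st_transform(geometry, integer) SET search_path = public;
```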

So why do we use it even though it’s a bit of a no-no? The other alternative is to keep a geometry field for your alternative spatial references. This is annoying for two reasons: (1) You have to ensure it’s updated when your main geometry field is updated, which means putting in a trigger. Someone may get confused and update that one instead. (2) The more annoying reason is that if you have big geometries, having a second big geometry in your table slows down updates considerably because of the MVCC nature of PostgreSQL to create a copy of a record during update. It probably slows down selects too because you have a fatter row to contend with. Using ST_Transform on the fly is cheap, but doing an index search on this calculated call isn’t possible without a GIST index on this transformed data.

 

Often you’ll have to load spatial data into your database that you didn’t create. Before you even worry about what spatial reference you should use to transform your source data to for storage, you first have to figure out what spatial reference system your source data is in. If you guess wrong on that, then all your spatial transformations will be wrong. In the next section we’ll cover how to determine the spatial reference system of a data source.

6.3. Determining the spatial reference system of source data

In this section, we’ll go through some exercises to determine the spatial reference system of source data. This will prepare you for the next chapter, where we finally start loading real data. Before being able to do that, you need to know where you can get free data to play with. Locations for free data can be found in appendix A.

Determining the spatial reference system of your source data is sometimes a fairly easy task and sometimes not. Sometimes a site just tells you the EPSG code for its data, and your work is done. Often, it will give you a text representation of the spatial reference system either in WKT SRS notation or some sort of free text. In these cases you’ll need to match up the description with a record in the spatial_ref_sys table.

With newer ESRI shapefiles there often is a file with a .prj extension giving the spatial reference system information in WKT SRS notation. This file is often used by third-party tools to derive the projection for the case where different layers need to be transformed to the same spatial reference system to be overlaid on a map. In the following exercises, we’ll demonstrate some SRS text descriptions and demonstrate how you can match these with an SRID in the spatial_ref_sys table. In some cases your task may be hard, especially when the record you’re looking for doesn’t exist and you’ll need to add it. We’ll go over that too.

More shockingly, some data comes with no spatial reference information or (even worse) the wrong information. The easiest way to check is to take the layer you’re unsure about, assign it the spatial reference system you suspect, and overlay it on a layer for the same region whose spatial reference system you know, reprojecting one to match the other. A common error is, for example, treating NAD 27 data as if it were in a NAD 83 spatial reference system. In these cases you’ll see Doppler-like shifts when you overlay the two. If things are way off, one of your layers won’t even show when you transform it to the same SRS as your known layer. This is the cause of a well-known beginner’s FAQ: “Why don’t I see anything?”
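Before reaching for a map, a quick look at the coordinate ranges of a loaded layer often tells you a lot. The following sketch assumes a hypothetical table named suspect_layer with a geometry column named the_geom.

```sql
-- Hypothetical table name; substitute the layer you just loaded
SELECT ST_SRID(the_geom) AS srid,
       ST_Extent(the_geom) AS extent,
       count(*) AS n
FROM suspect_layer
GROUP BY ST_SRID(the_geom);
```

If the extent stays within -180 to 180 in X and -90 to 90 in Y, the data is probably in degrees. Values in the hundreds of thousands or millions point to a projected system measured in meters or feet. This is only a rule of thumb, not a substitute for the overlay test.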

6.3.1. Guessing at a spatial reference system

We’ll go over some simple but common exercises for determining the spatial reference system of source data. In these examples we’ll cover picking out key elements in SRS text representations.

Exercise 1: The US States Data

Earlier in this chapter, we downloaded the file http://edcftp.cr.usgs.gov/pub/data/nationalatlas/statesp020.tar.gz. But for this particular set, the site gave us a states020.txt file, which gives us spatial reference information as well as lots of details about how the dataset was made and its licensing.

If you scroll down far enough in the file, you’ll see this:

Spatial_Reference_Information:
 Horizontal_Coordinate_System_Definition:
 Geographic:
 Latitude_Resolution: 0.000278
 Longitude_Resolution: 0.000278
 Geographic_Coordinate_Units: Decimal degrees
 Geodetic_Model:
 Horizontal_Datum_Name: North American Datum of 1983
 Ellipsoid_Name: GRS1980
 Semi-major_Axis: 6378137
 Denominator_of_Flattening_Ratio: 298.257222

This is an important piece of information. It tells us that the data is in decimal degrees, and uses ellipsoid GRS1980 and datum North American Datum of 1983. These are the three ingredients you need to know about every data source you have:

  • Unit: degrees
  • Ellipsoid: grs1980
  • Datum: nad1983

If you’re dealing with projected data (non-degree data), there are some other fuzzy pieces you’ll need to know. One is the projection, and each type of projection comes with its own additional parameters:

  • Projection: (degree is longlat), eaea, utm, tmerc, lcc, stere

Once you’ve figured out these pieces, the next thing to do is match your source to a spatial reference system defined in the spatial_ref_sys table and then record the SRID number for it. Sometimes the record you’re seeking isn’t in the table and you’ll need to add it. Living without one is only an option if you know your data is planar, you know the units, and all data you’ll be getting is from the same source and was made using the same spatial reference system. In this case, you’re using the unknown SRID, which is -1 currently in PostGIS but 0 in the OGC standard.

Two fields of information in the spatial_ref_sys table can help you guess at the projection. For the previous data, we do a simple SELECT query to determine the SRID and use the PostgreSQL ILIKE predicate to do a case-insensitive search:

SELECT srid, srtext,proj4text
FROM spatial_ref_sys
WHERE proj4text ILIKE '%nad83%'
    AND proj4text ILIKE '%grs80%' AND proj4text ILIKE '%longlat%';

The SELECT query will return one record with SRID 4269. It’s generally easier to query the proj4text field for matches because the proj4text field is much shorter and more consistent than the srtext field.

Exercise 2: San Francisco Data (Reading From .Prj Files)

For this second exercise we grabbed a zip file with Bay Area bridges. The file includes a .prj file, which has projection information: http://gispub02.sfgov.org/website/sfshare/catalog/bayarea_bridges.zip.

The .prj contents look like this:

PROJCS["NAD_1983_StatePlane_California_III_FIPS_0403_Feet",
GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",
  SPHEROID["GRS_1980",6378137.0,298.257222101]],
   PRIMEM["Greenwich",0.0],
UNIT["Degree",0.0174532925199433]],
PROJECTION["Lambert_Conformal_Conic"],
 PARAMETER["False_Easting",6561666.666666666],
PARAMETER["False_Northing",1640416.666666667],
PARAMETER["Central_Meridian",-120.5],
PARAMETER["Standard_Parallel_1",37.06666666666667],
PARAMETER["Standard_Parallel_2",38.43333333333333],
PARAMETER["Latitude_Of_Origin",36.5],
UNIT["Foot_US",0.3048006096012192]]

We can surmise from this file based on the PROJCS that the units are measured in feet, it’s NAD83 datum, and the projection is some California State Plane. So now we guess by doing a query:

SELECT srid, srtext,proj4text
FROM spatial_ref_sys
WHERE srtext ILIKE '%california%' AND proj4text ILIKE '%nad83%'
    AND proj4text ILIKE '%ft%';

This query yields six records. When we look at the srtext field of each, each has something of the form NAD83 / California zone 1 (ftUS), where the number ranges from 1 to 6. Remembering our Roman numeral lessons from grade school, we recall that III is the Roman numeral for 3. So our answer must be SRID 2227, which has an srtext field that looks like this:

"PROJCS["NAD83 / California zone 3 (ftUS)",
GEOGCS["NAD83",DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],
UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4269"]],
UNIT["US survey foot",0.3048006096012192,AUTHORITY["EPSG","9003"]],
PROJECTION["Lambert_Conformal_Conic_2SP"],
PARAMETER["standard_parallel_1",38.43333333333333],
PARAMETER["standard_parallel_2",37.06666666666667],
PARAMETER["latitude_of_origin",36.5],
PARAMETER["central_meridian",-120.5],
PARAMETER["false_easting",6561666.667],
PARAMETER["false_northing",1640416.667],
AUTHORITY["EPSG","2227"],
AXIS["X",EAST],AXIS["Y",NORTH]]"
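If reading Roman numerals feels too much like guesswork, you can also match on the projection parameters from the .prj file. The following is a sketch that assumes the proj4text entries spell the parameters as +lat_0 and +lon_0 and separate them with spaces, as the entries packaged with PostGIS generally do.

```sql
SELECT srid, proj4text
FROM spatial_ref_sys
WHERE proj4text ILIKE '%+proj=lcc%'          -- Lambert Conformal Conic
    AND proj4text ILIKE '%+lat_0=36.5 %'     -- Latitude_Of_Origin from the .prj
    AND proj4text ILIKE '%+lon_0=-120.5 %'   -- Central_Meridian from the .prj
    AND proj4text ILIKE '%+datum=NAD83%';
```

This should narrow the candidates down to the California zone 3 entries, which you can then tell apart by their units.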

Now that you have a small grasp of how to match an SRS to one in your table, what do you do if there isn’t one in the table?

Exercise 3: If You Guess Wrong

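Let’s imagine you guessed wrong at the SRID of your data, and you’ve already loaded in all your data. What do you do now? Luckily there’s a maintenance function in PostGIS called UpdateGeometrySRID to help you out in this situation. It changes the SRID recorded for the column and its geometries without altering the coordinates themselves, which is what you want when the data was loaded with the wrong label.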

SELECT UpdateGeometrySRID('sf', 'bridges', 'the_geom', 2227);

Let’s imagine that we brought in our San Francisco data with the unknown SRID (-1) or with some wrong spatial reference. This would become quite apparent if we tried to transform our data. If we did and the SRID was wrong, we’d get errors such as “NaN” when doing distance checks on the transformed data or a transform error when doing the transformation. In the next section we’ll talk a bit about what to do when you have concluded your spatial_ref_sys doesn’t have the spatial reference you’re looking for.
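A quick way to check your guess after setting the SRID is to see which SRIDs are stamped on the rows and whether a few features land where you expect once converted to lon lat. This is a sketch that reuses the sf.bridges table and the_geom column from the statement above.

```sql
-- Which SRIDs are actually stamped on the rows?
SELECT ST_SRID(the_geom) AS srid, count(*) AS n
FROM sf.bridges
GROUP BY ST_SRID(the_geom);

-- Do the coordinates land near San Francisco once converted to lon/lat?
SELECT ST_AsText(ST_Transform(ST_Centroid(the_geom), 4326)) AS lonlat
FROM sf.bridges
LIMIT 5;
```

For Bay Area data you’d expect longitudes of roughly -122 and latitudes of roughly 37 to 38. Anything wildly different suggests the SRID is still wrong.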

6.3.2. When the spatial reference system is missing

Sometimes you may come up short, and no record in the spatial_ref_sys table matches what you’re looking at. The best place to go at that point is http://spatialreference.org.

The spatialreference.org site contains thousands of user-contributed spatial reference systems in addition to the standard ones. Best of all, if the record you’re looking for can’t be found and you happen to have a .prj file, you can submit the contents of that via the Upload Your Own link, and the site will magically determine the INSERT statement you need to use to insert the new item into your spatial_ref_sys table.

 

SpatialReference.org uses the auth_srid field instead of SRID

The spatial reference site by default assigns an SRID starting with 9 to denote it was grabbed from the spatialreference.org site. For sake of consistency, we replace this SRID number with what is listed in the auth_srid field. By using this convention, you won’t accidentally insert a record into spatial_ref_sys that’s already in the table.
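As a sketch of what this looks like in practice, the following statement adds an entry under its auth_srid number and skips the insert if that SRID is already present. The ESRI:102003 definition (USA Contiguous Albers Equal Area Conic) is used here only as an illustration, and the srtext column is left out for brevity; in real use, copy the full statement, including srtext, from the site.

```sql
-- Illustrative values; copy the real definition from spatialreference.org
INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid, proj4text)
SELECT 102003, 'esri', 102003,
       '+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'
-- guard so an existing entry is never duplicated
WHERE NOT EXISTS (SELECT 1 FROM spatial_ref_sys WHERE srid = 102003);
```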

 

Although it’s possible to create your own custom spatial reference system to suit your specific needs, such a topic is beyond the scope of this book. PostGIS uses the PROJ.4 library to underpin its projection support. For those interested in how to do this, the links to articles in appendix A on spatial reference systems and PROJ.4 syntax may be of use.

6.4. Summary

In this chapter we explained the details of a spatial reference system and what makes up one. We hope from our discussions that you understand their importance, as well as the general rules of thumb for selecting one and determining which ones your source data is using.

In the next chapter we’ll continue our journey into the real world by loading real geographic data. We’ll cover some of the more popular free and open source tools, both packaged and not packaged with PostGIS, that are useful for importing and exporting data. We’ll go over the pros and cons of each as well as provide examples of how to use them.
