Up to this point we’ve been working mostly with fictitious data and have only glimpsed real-world data. Using sample data to learn the basics of PostGIS is an excellent beginning. You’re immediately rewarded with results without facing the distractions and the obstacles of real-world data. From this chapter forward, we’re not going to shield you anymore.
We start this chapter with coverage of spatial reference systems. We follow up with exercises on determining the spatial reference of source data and selecting suitable ones for storage.
The art and science of modeling our bulbous earth and being able to get a 2D representation on paper have been around since antiquity. Geodesy is the science of measuring and modeling the earth. Cartography is the science of representing the earth on flat maps. The intricacies of these two venerated sciences are far beyond the scope of this book. After all these mathematical gyrations, we end up with something that’s of utmost importance to GIS: the spatial reference system (SRS).
In this chapter, we’re not going to take the easy way out by accepting SRS without understanding it. We’ll also avoid the path of arcane mathematics necessary to study the science in all its glory. We choose a middle ground so that you can at least have more than a one-sentence explanation of SRS when your kids finally get around to asking you about it. Our journey into the real world begins.
The topic of spatial reference systems is one of the more abstruse in GIS to understand. This is mainly due to the loose way in which people use the term spatial reference system and secondly to its unglamorous nature compared to other areas of GIS. If GIS is Disneyland, think of SRS as the bookkeeping necessary to keep the Disneyland operation afloat.
Take any two paper maps from your collection having one point in common and overlay one atop the other, using the point they have in common as a reference. Both maps represent the whole or a part of the earth, but unless you’re extremely lucky, the two maps have no relation to each other. Travel five centimeters right on one map and you can end up on another street. Five centimeters on the other map could put you on another continent. Your two maps don’t overlay well because they don’t have the same spatial reference system. The main reason for the GIS data consumer to become acquainted with SRS is to bring in data from disparate sources in different SRSes and be able to overlay one atop another. Many standards exist to make this task easy without having to delve into the nuances of SRS. The most common one is the European Petroleum Survey Group (EPSG) numbering system. Take any two sources of data with the same EPSG number, and they’ll overlay perfectly. EPSG is a fairly recent SRS numbering system. If you uncover data from a few decades ago, you won’t find an EPSG number. You’ll have no choice but to delve into the constituent pieces that form a spatial reference system. So what is a spatial reference system?
From outer space, our good earth appears spherical, often described as a blue marble. To anyone living on its surface, nothing can be further from the truth. The slick glossy surface seen from outer space actually comprises mountain ranges, deep canyons, and ocean trenches. The surface of the earth with all its nooks and crannies resembles a slightly charred English muffin much more than a lustrous marble. Even the idea of the earth being spherical isn’t accurate, because the equator bulges out: the equatorial diameter is about 42.8 km larger than the polar diameter, which makes a trip around the equator roughly 67 km longer than a trip around the earth along a meridian.
In light of the fact that we have a deeply pitted and somewhat squashed orange under our feet, what are we going to do? With our new GPS toys we could conceivably represent every square meter on earth as a satellite map, assigning it a spherical 3D coordinate, and be done with it. This is the approach taken by many digital elevation models. Though this brute force computation method could certainly become the standard one day, we still need a simpler and more computationally cost-effective model for most use cases.
A model, by definition, is a simplified representation of reality. All models are inherently flawed in some way or other. In exchange for their shortcomings, they provide us with a more cost-effective way of doing things. A key factor in selecting a model is finding one that balances cost of computation (in speed and complexity) with observed failure. Some models may fail in ways you don’t care about because you’ll never exercise their points of failure. Until the time when we can afford to carry around portable holograms of the earth, we need several cheap models.
A starting point for any 3D model is the choice of definition of the surface of the earth. Do you use the mean sea level? An average of the peaks and valleys? Quite a few options are available, but they all suffer from a common problem: you can’t really go out and set up a standard of measurement that’s applicable around the entire world. Take the notion of sea level, for instance. Someone in Cardiff, Wales, can say that her house is 50 meters above the sea during low tide and use this as a reference against her neighbor’s house. Suppose a fellow in Pago Pago has a small house and measures his house also to be 50 meters above sea level. What can we say about the relative elevation of the two houses to each other? Not much. Sea level varies from place to place relative to the center of the earth. And even the notion of center of the earth is ambiguous.
Along comes Gauss, who, with the help of a crude pendulum, determined in the early nineteenth century that the surface of the earth should be defined using gravitational measurements. Though he lacked a digital gravity meter, we can picture the idea of going around the surface of the globe with such a device and measuring out a surface where gravity was constant—an equipotential surface. This is the basic idea behind the geoid. We take gravity readings of various sea levels to come up with a consensus and then use this constant gravitational force to map out an equigravitational surface around the globe. Many consider the geoid to be the true figure of the earth.
Surprisingly, the geoid is far from spherical; see figure 6.1. You must not forget that the core of the earth isn’t homogeneous. Mass is distributed unevenly, giving rise to bulges and craters that rival those found on the lunar surface. The advent of the geoid didn’t simplify matters. On the contrary, it created even more headaches. The true surface of the earth is now even less marblelike; even a slightly squashed orange is no longer a faithful representation.
Although the geoid is rarely talked about in GIS, it’s the foundation of both planar and geodetic models. In the next section, we’ll discuss the more commonly used ellipsoids, which are simplifications of the geoid and are generally good enough for most geographic modeling needs.
Since ancient times, the starting point for modeling the earth has always been an ellipsoid of some sort. An ellipsoid is merely a 3D ellipse.
An ellipsoid is defined by three radii: a and b are equatorial radii (along the X and Y axes), and c is the polar radius (along the Z axis). In geodesy only two axes are considered: the semimajor axis and the semiminor axis. Spheroids are a subclass of ellipsoids where a = b. A spheroid where c < a, like the earth, is called an oblate spheroid; one where c > a is a prolate spheroid. By the way, if a = b = c, you have a perfect sphere.
By varying the X/Y and polar axes on the ellipsoid, you can model the equatorial bulge. At some point in the history of cartography, people must have postulated one ellipsoid that could be used all around the world—a reference ellipsoid. Everyone can locate each other by finding their placement on the reference ellipsoid. The discovery of the geoid shattered the idea of using a single ellipsoid. One look at the geoid will show why. The geoid paints a picture where the local curvature varies from place to place. An ellipsoid that fits the curvature for one spot may be awfully inaccurate for another; see figure 6.2.
Now instead of one ellipsoid to rule us all, people on different continents want their own to better reflect the regional curvature of the earth. This gave rise to the multitude of ellipsoids we have today. This was all well and good when we didn’t care about people far away from us. This disparate use of different systems became more of an issue with time because of the need for scientists and governments to collaborate and the rise of oil surveying and aviation. Fortunately, today the world is settling on the World Geodetic System (WGS 84) and GRS 80 ellipsoids, with WGS 84 becoming the standard of choice. WGS 84 is what all GPS systems are based on. To call WGS 84 simply an ellipsoid isn’t quite accurate. The WGS 84 GPS systems we use have a geoid component as well. The present WGS 84 system uses the 1996 Earth Gravitational Model (EGM96) geoid and is the best-fitting ellipsoid to the geoid model for the selected survey points in the set.
The common ellipsoids used today are GRS 80 and WGS 84. The 80 and 84 stand for 1980 and 1984, when the standards came out, and the two ellipsoids are very similar.
Many ellipsoids have been used over the years, and some continue to be used because of their better fit for a particular region. All historical data is still referenced against other ellipsoids. Table 6.1 shows a sampling of some common ellipsoids and their various ellipsoidal parameters.
Ellipsoid | Equatorial radius (m) | Polar radius (m) | Inverse flattening | Where used |
---|---|---|---|---|
Clarke 1866 | 6,378,206.4 | 6,356,583.8 | 294.9786982 | North America |
NAD 27 | 6,378,206.4 | 6,356,583.8 | 294.978698208 | North America |
Australian 1966 | 6,378,160 | 6,356,774.719 | 298.25 | Australia |
GRS 80 | 6,378,137 | 6,356,752.3141 | 298.257222101 | North America |
WGS 84 | 6,378,137 | 6,356,752.3142 | 298.257223563 | GPS (World) |
IERS 1989 | 6,378,136 | 6,356,751.302 | 298.257 | Time (World) |
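The inverse flattening column of table 6.1 can be derived from the two radii, and the same radii give a feel for the size of the equatorial bulge. The following is a minimal Python sketch rather than anything PostGIS does internally. The function names are made up for illustration, and the meridian length uses Ramanujan's approximation for the perimeter of an ellipse, which is an assumption of this sketch.

```python
import math

# (equatorial radius a, polar radius b) in meters, copied from table 6.1
ELLIPSOIDS = {
    "Clarke 1866": (6378206.4, 6356583.8),
    "GRS 80": (6378137.0, 6356752.3141),
    "WGS 84": (6378137.0, 6356752.3142),
}


def inverse_flattening(a, b):
    """Return 1/f, where the flattening f is (a - b) / a."""
    return a / (a - b)


def meridian_circumference(a, b):
    """Approximate perimeter of the meridian ellipse (Ramanujan's formula)."""
    h = ((a - b) / (a + b)) ** 2
    return math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))


for name, (a, b) in ELLIPSOIDS.items():
    equator = 2 * math.pi * a
    bulge_km = (equator - meridian_circumference(a, b)) / 1000
    print(f"{name}: 1/f = {inverse_flattening(a, b):.4f}, "
          f"equator is {bulge_km:.1f} km longer than a meridian loop")
```

The values printed for 1/f should agree with the table to within rounding of the polar radius.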
One common old ellipsoid is the Clarke 1866 (this is so close to what is called the NAD 27 ellipsoid that they’re synonymous for most purposes). So even though these old data points are measured in longitude and latitude, they aren’t the same longitude and latitude we use today, and they also use different grounding points. They’re shifted.
This is why it’s important to not just call things lon lat. You can have NAD 27 lon lat, NAD 83 lon lat, and WGS 84 lon lat, and each will be subtly different. As a rule, when people nowadays refer to lon lat, they mean WGS 84 datum and WGS 84 spheroid in lon lat units. NAD 27 is the most different because it was done a long time ago. (Note that a datum is the shift of a spheroid. See the next section.)
In the next section we’ll discuss the concept of datums and how they fit into the overall picture of the spatial reference system.
The ellipsoid alone only models the overall shape of the earth. After picking out an ellipsoid, you need to anchor it should you ever need to use it for real-world navigation. Every ellipsoid that’s not a perfect sphere has two poles. This is where the axis arrives at the surface. These ellipsoid poles must permanently be tagged to actual points on earth. This is where the datum comes into play. Even if two reference systems use the same ellipsoid, they could still have different anchors, or datums, on earth.
The simplest example of a datum is to look at the tilt between the geographic pole and the magnetic pole. In both models, the earth has the same spherical shape, but one is anchored at the north pole and the other is somewhere in Canada.
To anchor an ellipsoid to a point on earth, you need two types of datum: a horizontal datum to specify where on the plane of the earth to pin down the ellipsoid and a vertical datum to specify the height. For example, the North American Datum of 1927 (NAD 27) is anchored at Meades Ranch in Kansas because it’s close to the geographical centroid of the United States. NAD 27 is a horizontal datum; heights from the same era are usually referenced to a separate vertical datum, the National Geodetic Vertical Datum of 1929 (NGVD 29). Here are some commonly used datums:
Many people confuse coordinate reference systems (CRS) with spatial reference systems. A CRS is only a necessary ingredient that goes into the making of an SRS and not the SRS itself. To identify a point on our reference ellipsoid, you need a coordinate system. For use on a reference ellipsoid, the most popular CRS is the geographical coordinate system (also known as geodetic coordinate system or simply as lon lat). You’re already intimately familiar with this coordinate system. You find the two poles on an ellipsoid and draw longitude (meridian) lines from pole to pole. You then find the equator of your ellipsoid and start drawing latitude lines. Keep in mind that even though you’ve only seen geographical coordinate systems used on a globe, the concept applies to any reference ellipsoid. For that matter, it applies to anything resembling an ellipsoid. For instance, a watermelon has nice longitudinal bands on its surface.
Let’s summarize what we discussed thus far about spatial reference systems:
We can quit at this point, because we have all the elements necessary to tag every spot on earth. We can even develop transformation algorithms to convert coordinates based on one ellipsoid to coordinates based on another. Many sources of geographic data do stop at this point and don’t go on to the next step, projections. We term this data unprojected data. All data served up in the form of latitudes and longitudes is unprojected. You can do quite a bit with unprojected data. For example, by using the great circle distance formula, you can get the distance between any two points. You can also use it to navigate to and from any point on earth.
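As an illustration of what can be done with unprojected data, here is a minimal Python sketch of a great circle distance calculation using the haversine form of the formula. It treats the earth as a sphere. The radius constant and the function name are assumptions chosen for this sketch, not values taken from PostGIS.

```python
import math

# Mean earth radius in kilometers; an assumed value for this sketch.
EARTH_RADIUS_KM = 6371.0088


def great_circle_km(lon1, lat1, lon2, lat2, radius_km=EARTH_RADIUS_KM):
    """Haversine distance between two lon lat points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1 to guard against floating-point overshoot near antipodes.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


# Approximate lon lat positions for Cairo and Jerusalem (illustrative only).
print(great_circle_km(31.24, 30.04, 35.21, 31.77))
```

Because the coordinates are angles on a sphere, no projection is involved at any step; the only modeling choice is the radius.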
Projection has distortion built in. The concept of projection generally refers to taking an ellipsoidal earth and squashing it on a flat surface. Because geodetic and 3D globes are ellipsoidal, they by definition don’t refer to a flat surface and are referred to as unprojected. In the next section, we’ll briefly go over the different kinds of projections and why we have them.
So why do we have 2D projections of our ellipsoid or geoid? The obvious reason is eminently practical: You can’t carry a huge globe everywhere you go. Less obvious but more relevant is the mathematical and visual simplicity that comes with planar (Euclidean) geometry.
As we have repeated many times, PostGIS works for the most part on a Cartesian plane, and most of the powerful functions assume a Cartesian model. Your brain and the quite different brain of PostGIS can perform area and distance calculations quickly on a Cartesian plane. On a plane, the area of a square is its side squared. Distance is nothing more than applying the Pythagorean theorem. A planar model fits nicely on a piece of paper. Calculating the area of a square directly on the surface of an ellipsoid becomes quite a challenge, not the least aspect of which is deciding what constitutes a square on an ellipsoid in the first place.
PostGIS 1.5 introduced support for geodetic data using the new datatype geography, similar in concept to SQL Server 2008 geography types. All spatial functions work for geometry data, with only a few functions and operators also for geography, such as distance functions.
How exactly you’d squash an ellipsoidal earth on a flat surface is controlled by several classes of rules we’ll loosely refer to as the classes of Cartesian coordinate systems. Each class of rules tries to optimize for a set of features, each specific instance of a coordinate system is bounded by a particular region on earth, and each uses a particular unit (usually meters or feet).
Needless to say, you try to balance four conflicting features. The importance you place on each will dictate the choice of coordinate system and eventually of the spatial reference system(s):
The general tradeoff is if you want to span a large area, you have to give up measurement accuracy or deal with the pain of maintaining multiple spatial reference systems and some mechanism to shift among them. The larger your area, the less accurate and potentially grossly unusable your measurements will be. If you try to optimize for shape and to cover a large range, your measurements may be off, perhaps way off.
There are a few flavors of projections (squashing) you can do to optimize for different things. These are listed here:
Combinations of these categories form the main classes of planar coordinate systems:
Given all these different options for spatial reference systems, determining which one your source data is in as well as choosing one for storage is often a tricky undertaking. In the next section we’ll show how to select a spatial reference system as well as some simple exercises for determining which spatial reference system your source data is in.
One of the most common questions people ask is what spatial reference system(s) is appropriate for their data. The answer is, it depends.
Table 6.2 lists the most commonly used spatial reference systems and their PostGIS/EPSG SRIDs. PostGIS SRIDs follow the EPSG numberings, so you can assume for sake of argument they’re the same. This isn’t necessarily true for other spatial databases, so keep in mind that a spatial reference system can have several different IDs. Although EPSG is the most common authority on spatial reference systems, it isn’t the only one. Many people, for example, load up their tables with ESRI definitions, which are sometimes identical to EPSG definitions, but under an SRID code that’s more ArcGIS friendly.
EPSG/PostGIS SRID | Colloquial name | Range | Measurement | Shape |
---|---|---|---|---|
4326 | WGS 84 lon lat | Excellent | Bad | Bad |
3785/900913 (old number) | Spherical Mercator | Good | Bad | Good |
900913 (deprecated) | Google Mercator | Good | Bad | Good |
32601-32760 | UTM WGS 84 Zones | Medium | Fairly good | Good |
2163 | US National Atlas EA | All U.S. | Medium | Medium |
State Planes | US State Planes | Medium | Good | Good |
If you deal with mostly regional data, say for a country or state, then it’s generally best to stick with one of the national grid or State Planes systems. You’ll get fairly good measurement accuracy, and it will also look good on a map.
Be forewarned that because PostGIS 1.4 and lower support only Cartesian coordinate systems, you may have to use several if you need to span large areas and maintain measurement accuracy.
The most common spatial reference system people use is WGS 84 lon lat (EPSG:4326). Aside from the common reason, that people just don’t know any better, the reasons why knowledgeable people use this system are:
Reasons not to use it:
If you’ll be storing your data in WGS 84 spatial reference system and are using PostGIS 1.5, you should consider using the new geography data type that was introduced in PostGIS 1.5. The key benefit it provides over the geometry EPSG:4326 is that it’s ideal for measurement because it’s not projected and measurements are always in meters. Pros are as follows:
So if geography is great, why should you use geometry instead?
Although the basic Mercator projections are horrible for measurement calculations, especially far from the equator, they’re a favorite for web mappers because they look good on a map. The advantage of Google Mercator, for example, is that the whole globe is covered with just one spatial ref.
So if your primary concern is looking good on a map and overlaying on Google Maps with something like OpenLayers, Mercator isn’t a bad option for native storage of data. If you’re concerned with distances and areas, it depends on the accuracy you need. Table 6.3 (generated from code in chapter 8) lists the distances between city pairs measured using various spatial reference systems.
city1 | city2 | sp | spwgs84 | wm |
---|---|---|---|---|
Beijing | Jerusalem | 7119 | 7135 | 9104 |
Beijing | Melbourne | 9128 | 9095 | 9938 |
Beijing | Philadelphia | 11060 | 11085 | 21330 |
Beijing | Sao Paulo | 17600 | 17601 | 19656 |
Beijing | Shanghai | 1066 | 1065 | 1315 |
Cairo | Jerusalem | 423 | 424 | 494 |
Cairo | Melbourne | 13977 | 13973 | 15024 |
Cairo | Philadelphia | 9154 | 9173 | 11928 |
Cairo | Sao Paulo | 10224 | 10216 | 10667 |
Cairo | Shanghai | 8351 | 8367 | 10045 |
Rio de Janeiro | Jerusalem | 10323 | 10315 | 10808 |
Rio de Janeiro | Melbourne | 13221 | 13240 | 21078 |
Rio de Janeiro | Philadelphia | 7706 | 7680 | 8250 |
Rio de Janeiro | Sao Paulo | 338 | 338 | 368 |
Rio de Janeiro | Shanghai | 18249 | 18256 | 19399 |
Sydney | Jerusalem | 14114 | 14111 | 15040 |
Sydney | Melbourne | 694 | 694 | 858 |
Sydney | Philadelphia | 15895 | 15895 | 26702 |
Sydney | Sao Paulo | 13357 | 13377 | 22041 |
Sydney | Shanghai | 7878 | 7849 | 8354 |
In this table are various city point pairs and their distances, in kilometers, measured in WGS 84 sphere (sp), WGS 84 spheroid (spwgs84), and Web Mercator (wm). As you can see, Web Mercator distance precision is much worse than the others and gets worse the farther away two cities are from each other or for regions farther from the equator. The computed distance between, for example, Beijing and Philadelphia is really poor with Mercator. The sphere calculations are pretty good for long-range/short-range rule-of-thumb calculations.
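To see why Mercator distances degrade away from the equator, the following Python sketch projects two points with the spherical Mercator formulas and compares the flat Pythagorean distance with the great circle distance on the same sphere. It is an illustration only: the function names are hypothetical and the code isn't the query used to build table 6.3.

```python
import math

R = 6378137.0  # sphere radius in meters used by spherical (web) Mercator


def to_mercator(lon, lat):
    """Project lon lat in degrees to spherical Mercator x, y in meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def mercator_distance_m(lon1, lat1, lon2, lat2):
    """Planar (Pythagorean) distance between the projected points."""
    x1, y1 = to_mercator(lon1, lat1)
    x2, y2 = to_mercator(lon2, lat2)
    return math.hypot(x2 - x1, y2 - y1)


def sphere_distance_m(lon1, lat1, lon2, lat2):
    """Great circle (haversine) distance on the same sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(min(1.0, a)))


# One degree of longitude measured at the equator and at 60 degrees north.
for lat in (0, 60):
    ratio = mercator_distance_m(10, lat, 11, lat) / sphere_distance_m(10, lat, 11, lat)
    print(f"latitude {lat}: Mercator / sphere distance ratio = {ratio:.3f}")
```

For short east-west hops the ratio behaves like 1/cos(latitude), so the overstatement grows quickly toward the poles.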
This table covers distance, but what about the areas of geometries? How bad is the story there? Again, this depends where you are on the globe, but in general the situation is bad. Table 6.4 shows the areas of 10-meter buffers around the globe, generated from code in chapter 8.
City | utm_sm | geog_sm | wm_sm | diff_utm_g | diff_utm_wm |
---|---|---|---|---|---|
Honolulu | 312 | 312 | 362 | 0.13 | 49.48 |
San Francisco | 312 | 312 | 500 | 0.22 | 188.03 |
Boston | 312 | 312 | 572 | 0.02 | 260.22 |
Paris | 312 | 312 | 722 | 0.24 | 409.54 |
Oslo | 312 | 312 | 1240 | 0.18 | 927.74 |
Saint Petersburg | 312 | 312 | 1241 | 0.09 | 929.03 |
Helsinki | 312 | 312 | 1260 | 0.15 | 947.76 |
Bergen | 312 | 312 | 1272 | 0.11 | 959.40 |
Arkhangelsk | 312 | 312 | 1681 | 0.20 | 1368.54 |
Murmansk | 312 | 312 | 2412 | 0.25 | 2100.22 |
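The growth of the Web Mercator areas with latitude follows a simple rule of thumb: spherical Mercator stretches lengths by 1/cos(latitude) in every direction, so small areas are inflated by 1/cos²(latitude). The Python sketch below applies that factor to the 312 sq m UTM area. The city latitudes are approximate values supplied for illustration and are not taken from this chapter.

```python
import math


def mercator_area_factor(lat_deg):
    """Factor by which spherical Mercator inflates a small area at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# Approximate city latitudes in degrees; assumptions for this sketch.
CITY_LATITUDES = [("Honolulu", 21.3), ("Paris", 48.86), ("Murmansk", 68.97)]

for city, lat in CITY_LATITUDES:
    estimate = 312 * mercator_area_factor(lat)
    print(f"{city}: estimated Web Mercator area = {estimate:.0f} sq m")
```

The estimates should land near the wm_sm column, with small differences because the earth is modeled here as a sphere and the latitudes are rounded.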
You may have noticed that the 10-meter buffers in table 6.4 measure 312 sq m in UTM and wondered whether that’s the true area of a 10-meter circle. It isn’t. If you do your calculation, a perfect 10-meter buffer will give you an area of 10*10*pi(), which is around 314 sq m. The default buffer in PostGIS is a 32-sided polygon (eight points approximate a quarter segment of a circle). You can make this more accurate by using the overloaded version of the ST_Buffer function that allows you to pass in the number of points to approximate a quarter segment.
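The gap between 312 and 314 can be checked with a little geometry. The Python sketch below computes the area of a regular polygon whose vertices sit on a circle of the buffer radius, assuming that is how the buffer outline is approximated. The function name and the quad_segs parameter name are hypothetical and used only for this illustration.

```python
import math


def buffer_polygon_area(radius, quad_segs=8):
    """Area of a regular polygon with 4 * quad_segs vertices on a circle."""
    n = 4 * quad_segs
    return 0.5 * n * radius ** 2 * math.sin(2 * math.pi / n)


print(buffer_polygon_area(10))      # default: 32-sided polygon
print(buffer_polygon_area(10, 32))  # 128-sided polygon, closer to the circle
print(math.pi * 10 ** 2)            # true circle
```

Raising the number of points per quarter segment moves the polygon area toward the area of the true circle, at the cost of more vertices per geometry.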
If you’re in the unfortunate predicament of needing to cover the whole globe with good measurements and shape accuracy, then most likely a single spatial reference system isn’t going to cut it. A common favorite is the UTM family of SRIDs. UTM for WGS 84 divides the globe into 60 zones, each a six-degree longitudinal strip, with one SRID per zone for the northern hemisphere and another for the southern. There is also a series of UTMs for NAD 83, but the WGS 84 one is more common.
You’ll need to figure out the UTM WGS 84 SRID for your particular dataset. There is a function for that in the PostGIS wiki at http://trac.osgeo.org/postgis. The following listing shows a slight variant of that function that takes any geometry and returns the WGS 84 UTM SRID of the centroid of that geometry.
We convert our geometry to a point and then transform it to WGS 84 lon lat. This function assumes the SRIDs are named the same as the EPSG for UTMs, which is the case with the default spatial_ref_sys that comes packaged with PostGIS. We then determine whether the latitude is positive or negative. For positive latitudes, the UTM EPSG numbers start at 32600, and the zone number added to that base increments every six degrees of longitude. For negative latitude, or 0, the base is 32700. So the final SRID is the base plus the zone number.
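The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the numbering rule and not the PostGIS wiki function itself. It assumes the input is already a WGS 84 lon lat point, and the function name is hypothetical.

```python
import math


def utm_wgs84_srid(lon, lat):
    """Return the WGS 84 UTM SRID for a lon lat point in decimal degrees."""
    # Zones are six-degree strips numbered 1 to 60, starting at longitude -180.
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)  # keep longitude 180 inside zone 60
    # Positive latitudes use the 32600 series; 0 and negative use 32700.
    base = 32600 if lat > 0 else 32700
    return base + zone


print(utm_wgs84_srid(-122.4, 37.8))  # a point near San Francisco
print(utm_wgs84_srid(151.2, -33.9))  # a point near Sydney
```

For a geometry that isn’t a point, the same rule would be applied to its centroid after transforming it to WGS 84 lon lat.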
If you need to maintain multiple SRIDs, you have three approaches:
There are many philosophies about the correct way to go, and none is right or wrong. For our cases, we’ve found that keeping one SRID (usually 4326) and transforming as needed works best, provided we maintain functional indexes on transforms used for distance calculations. We also like using views as an abstraction layer where the view contains the calculated transform. PostgreSQL supports not only functional indexes but also partial ones. A partial index, for example, allows you to index only part of your data. So in general you should only apply an ST_Transform function for the region defined for a given UTM; otherwise you’ll run into coordinate bounds issues. Generally speaking, it’s best to partition your data using table inheritance and use different transform indexes for each table separately. The following listing is an example of a functional st_transform index and a possible view you may create to take advantage of it.
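-- Functional GiST index on the geometry transformed to SRID 32611
CREATE INDEX feature_data_the_geom_utm
  ON feature_data
  USING gist (ST_Transform(the_geom, 32611));

-- View exposing both the native geometry and the transformed one
CREATE VIEW vwfeature_data AS
SELECT gid, f_name, the_geom,
       ST_Transform(the_geom, 32611) AS the_geom_utm
FROM feature_data;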
In this view, we’re transforming our native data to SRID 32611, which is one of the UTM SRIDs for a region of California in the United States.
Putting functional indexes on ST_Transform is something we do when building a view on our data with the transformed version of the data. It’s a gray zone, in the sense that we’re exploiting a small violation of treating ST_Transform as an immutable function, when technically it isn’t. In PostGIS, ST_Transform is marked as immutable mostly for performance reasons, which means when you calculate it for a given geometry it can be assumed to never change, and PostgreSQL kindly believes PostGIS and caches it and allows it to be used in functional indexes. Only functions marked as immutable can be used in functional indexes, and in theory a function that relies on a table (except possibly for a static system table in pg_catalog) is at best considered stable (meaning it won’t change within a query given the same inputs). In actuality, it’s a bit of a lie that it’s immutable, because it relies on entries in the spatial_ref_sys table. If you happen to change the entry for your transform in the table, you’ll need to reindex your data, otherwise it will be wrong, but then again so would be the case if you kept a second transformed geometry column. We tend to think a bit liberally and think of the spatial_ref_sys table as practically immutable. Though you may add entries, it’s rare that you’d change the definitions of entries once created, and thus the immutability argument is valid.
The other issue with functional indexes is they get dropped when you restore your data, unless you make sure to set the search_path of the ST_Transform function to include the schema the spatial_ref_sys resides in (supported only in PostgreSQL 8.3 and above). Read our diatribe on this topic for more details: http://www.postgresonline.com/journal/index.php?/archives/121-Restore-of-functional-indexes-gotcha.html.
So why do we use it even though it’s a bit of a no-no? The other alternative is to keep a geometry field for your alternative spatial references. This is annoying for two reasons: (1) You have to ensure it’s updated when your main geometry field is updated, which means putting in a trigger. Someone may get confused and update that one instead. (2) The more annoying reason is that if you have big geometries, having a second big geometry in your table slows down updates considerably because of the MVCC nature of PostgreSQL to create a copy of a record during update. It probably slows down selects too because you have a fatter row to contend with. Using ST_Transform on the fly is cheap, but doing an index search on this calculated call isn’t possible without a GIST index on this transformed data.
Often you’ll have to load spatial data into your database that you didn’t create. Before you even worry about what spatial reference you should use to transform your source data to for storage, you first have to figure out what spatial reference system your source data is in. If you guess wrong on that, then all your spatial transformations will be wrong. In the next section we’ll cover how to determine the spatial reference system of a data source.
In this section, we’ll go through some exercises to determine the spatial reference system of source data. This will prepare you for the next chapter, where we finally start loading real data. Before being able to do that, you need to know where you can get free data to play with. Locations for free data can be found in appendix A.
Determining the spatial reference system of your source data is sometimes a fairly easy task and sometimes not. Sometimes a site just tells you the EPSG code for its data, and your work is done. Often, it will give you a text representation of the spatial reference system either in WKT SRS notation or some sort of free text. In these cases you’ll need to match up the description with a record in the spatial_ref_sys table.
With newer ESRI shapefiles there often is a file with a .prj extension giving the spatial reference system information in WKT SRS notation. This file is often used by third-party tools to derive the projection for the case where different layers need to be transformed to the same spatial reference system to be overlaid on a map. In the following exercises, we’ll demonstrate some SRS text descriptions and demonstrate how you can match these with an SRID in the spatial_ref_sys table. In some cases your task may be hard, especially when the record you’re looking for doesn’t exist and you’ll need to add it. We’ll go over that too.
More shockingly, some data comes with no spatial reference information or (even worse) the wrong information. The easiest way to determine this is to overlay a map where you suspect this to be the case on top of a layer for the same region that you know the spatial reference system for and reproject to the suspected projection. Common errors are, for example, using NAD 27 data in a NAD 83 spatial reference system. In these cases you’ll see Doppler-like shifts when you overlay the two. If things are way off, one of your layers won’t even show when you transform it to the same SRS as your known layer. This is the cause for a well-known beginner’s FAQ: “Why don’t I see anything?”
We’ll go over some simple but common exercises for determining the spatial reference system of source data. In these examples we’ll cover picking out key elements in SRS text representations.
Earlier in this chapter, we downloaded the file http://edcftp.cr.usgs.gov/pub/data/nationalatlas/statesp020.tar.gz. But for this particular set, the site gave us a states020.txt file, which gives us spatial reference information as well as lots of details about how the dataset was made and its licensing.
If you scroll down far enough in the file, you’ll see this:
Spatial_Reference_Information:
  Horizontal_Coordinate_System_Definition:
    Geographic:
      Latitude_Resolution: 0.000278
      Longitude_Resolution: 0.000278
      Geographic_Coordinate_Units: Decimal degrees
    Geodetic_Model:
      Horizontal_Datum_Name: North American Datum of 1983
      Ellipsoid_Name: GRS1980
      Semi-major_Axis: 6378137
      Denominator_of_Flattening_Ratio: 298.257222
This is an important piece of information. It tells us that the data is in decimal degrees, and uses ellipsoid GRS1980 and datum North American Datum of 1983. These are the three ingredients you need to know about every data source you have:
If you’re dealing with projected data (non-degree data), there are some other fuzzy pieces you’ll need to know. One is the projection, and depending on the projection, each type of projection has additional parameters:
Once you’ve figured out these pieces, the next thing to do is match your source to a spatial reference system defined in the spatial_ref_sys table and then record the SRID number for it. Sometimes the record you’re seeking isn’t in the table and you’ll need to add it. Living without one is only an option if you know your data is planar, you know the units, and all data you’ll be getting is from the same source and was made using the same spatial reference system. In this case, you’re using the unknown SRID, which is -1 currently in PostGIS but 0 in the OGC standard.
Two fields of information in the spatial_ref_sys table can help you guess at the projection. For the previous data, we do a simple SELECT query to determine the SRID and use the PostgreSQL ILIKE predicate to do a case-insensitive search:
SELECT srid, srtext, proj4text
FROM spatial_ref_sys
WHERE proj4text ILIKE '%nad83%'
  AND proj4text ILIKE '%grs80%'
  AND proj4text ILIKE '%longlat%';
The SELECT query will return one record with SRID 4269. It’s generally easier to query the proj4text field for matches because the proj4text field is much shorter and more consistent than the srtext field.
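If you run this kind of search often, it can help to generate the query from a list of keywords. The following is a minimal sketch, not part of PostGIS: build_srs_search is a hypothetical helper, and the %s placeholders assume a driver that uses that parameter style (psycopg2 is one such driver).

```python
def build_srs_search(keywords, column="proj4text"):
    """Return (sql, params) for a case-insensitive search of spatial_ref_sys."""
    # Only the two text columns discussed in this section are allowed,
    # because the column name is placed directly into the SQL text.
    if column not in ("proj4text", "srtext"):
        raise ValueError("column must be 'proj4text' or 'srtext'")
    if not keywords:
        raise ValueError("at least one keyword is required")
    clauses = " AND ".join(f"{column} ILIKE %s" for _ in keywords)
    sql = f"SELECT srid, srtext, proj4text FROM spatial_ref_sys WHERE {clauses};"
    # Wrap each keyword in wildcards so it matches anywhere in the column.
    params = [f"%{word}%" for word in keywords]
    return sql, params


sql, params = build_srs_search(["nad83", "grs80", "longlat"])
print(sql)
print(params)
```

Passing the keywords as parameters rather than pasting them into the SQL text leaves the quoting to the database driver.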
For this second exercise we grabbed a zip file with Bay Area bridges. The file includes a .prj file, which has projection information: http://gispub02.sfgov.org/website/sfshare/catalog/bayarea_bridges.zip.
The .prj contents look like this:
PROJCS["NAD_1983_StatePlane_California_III_FIPS_0403_Feet",
  GEOGCS["GCS_North_American_1983",
    DATUM["D_North_American_1983",
      SPHEROID["GRS_1980",6378137.0,298.257222101]],
    PRIMEM["Greenwich",0.0],
    UNIT["Degree",0.0174532925199433]],
  PROJECTION["Lambert_Conformal_Conic"],
  PARAMETER["False_Easting",6561666.666666666],
  PARAMETER["False_Northing",1640416.666666667],
  PARAMETER["Central_Meridian",-120.5],
  PARAMETER["Standard_Parallel_1",37.06666666666667],
  PARAMETER["Standard_Parallel_2",38.43333333333333],
  PARAMETER["Latitude_Of_Origin",36.5],
  UNIT["Foot_US",0.3048006096012192]]
We can surmise from this file based on the PROJCS that the units are measured in feet, it’s NAD83 datum, and the projection is some California State Plane. So now we guess by doing a query:
SELECT srid, srtext, proj4text
FROM spatial_ref_sys
WHERE srtext ILIKE '%california%'
  AND proj4text ILIKE '%nad83%'
  AND proj4text ILIKE '%ft%';
This query yields six records. When we look at the srtext field of each, each has something of the form NAD83 / California zone 1 (ftUS), where the number ranges from 1 to 6. Remembering our Roman numeral lessons from grade school, we recall that III is the Roman numeral for 3. So our answer must be SRID 2227, which has an srtext field that looks like this:
"PROJCS["NAD83 / California zone 3 (ftUS)",
  GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
      SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],
      AUTHORITY["EPSG","6269"]],
    PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4269"]],
  UNIT["US survey foot",0.3048006096012192,AUTHORITY["EPSG","9003"]],
  PROJECTION["Lambert_Conformal_Conic_2SP"],
  PARAMETER["standard_parallel_1",38.43333333333333],
  PARAMETER["standard_parallel_2",37.06666666666667],
  PARAMETER["latitude_of_origin",36.5],
  PARAMETER["central_meridian",-120.5],
  PARAMETER["false_easting",6561666.667],
  PARAMETER["false_northing",1640416.667],
  AUTHORITY["EPSG","2227"],
  AXIS["X",EAST],AXIS["Y",NORTH]]"
Now that you have a small grasp of how to match an SRS to one in your table, what do you do if there isn’t one in the table?
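Looking back at the bridges exercise, the way we read the .prj file by eye can be sketched in code. This is only an illustration under stated assumptions: summarize_prj and the ROMAN lookup are hypothetical names, the regular expressions assume the Esri-style layout shown in the .prj contents earlier, and a real WKT parser would be more robust.

```python
import re

# The .prj contents from the bridges exercise, joined into a single string.
PRJ = (
    'PROJCS["NAD_1983_StatePlane_California_III_FIPS_0403_Feet",'
    'GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",'
    'SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],'
    'UNIT["Degree",0.0174532925199433]],'
    'PROJECTION["Lambert_Conformal_Conic"],'
    'PARAMETER["False_Easting",6561666.666666666],'
    'PARAMETER["False_Northing",1640416.666666667],'
    'PARAMETER["Central_Meridian",-120.5],'
    'PARAMETER["Standard_Parallel_1",37.06666666666667],'
    'PARAMETER["Standard_Parallel_2",38.43333333333333],'
    'PARAMETER["Latitude_Of_Origin",36.5],'
    'UNIT["Foot_US",0.3048006096012192]]'
)

# California State Plane zones under NAD83 run from 1 to 6.
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6}


def summarize_prj(wkt):
    """Pull out the pieces of a projected WKT string that guide an SRID search."""
    name = re.search(r'PROJCS\["([^"]+)"', wkt).group(1)
    datum = re.search(r'DATUM\["([^"]+)"', wkt).group(1)
    projection = re.search(r'PROJECTION\["([^"]+)"', wkt).group(1)
    # In this .prj the last UNIT entry is the one for the projected coordinates.
    unit = re.findall(r'UNIT\["([^"]+)"', wkt)[-1]
    # Lowercase the parameter names so they compare easily with srtext names.
    params = {
        key.lower(): float(value)
        for key, value in re.findall(r'PARAMETER\["([^"]+)",\s*([-+0-9.eE]+)\]', wkt)
    }
    # Look for a Roman numeral among the underscore-separated name parts.
    zone = next((ROMAN[part] for part in name.split("_") if part in ROMAN), None)
    return {"name": name, "datum": datum, "projection": projection,
            "unit": unit, "zone": zone, "params": params}


summary = summarize_prj(PRJ)
print(summary["datum"], summary["projection"], summary["unit"], summary["zone"])
```

The zone number, datum, and unit are the same clues we used to narrow the six candidate records down to one.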
Let’s imagine you guessed wrong at the SRID of your data, and you’ve already loaded in all your data. What do you do now? Luckily PostGIS has a maintenance function for this situation called UpdateGeometrySRID, which will correct the mistake. It changes the SRID recorded for the column and its geometries without altering the coordinates themselves.
SELECT UpdateGeometrySRID('sf', 'bridges', 'the_geom', 2227);
Let’s imagine that we brought in our San Francisco data with the unknown SRID of -1 or with some wrong spatial reference. This would become quite apparent if we tried to transform our data. If we did and the SRID was wrong, we’d get errors such as “NaN” when doing distance checks on the transformed data or a transform error when doing the transformation. In the next section we’ll talk a bit about what to do when you’ve concluded that your spatial_ref_sys table doesn’t have the spatial reference you’re looking for.
Sometimes you may come up short, and no record in the spatial_ref_sys table matches what you’re looking at. The best place to go at that point is http://spatialreference.org.
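A cheap plausibility check can catch the grossest mislabeling before you attempt a transform. The sketch below is a heuristic of our own, not a PostGIS feature: looks_like_degrees is a hypothetical helper and the sample coordinates are made up for illustration. Coordinates that fall outside longitude and latitude bounds can’t be in degrees, although coordinates that fall inside those bounds aren’t thereby proven to be degrees.

```python
def looks_like_degrees(points):
    """Return True if every (x, y) pair fits within longitude/latitude bounds."""
    return all(-180.0 <= x <= 180.0 and -90.0 <= y <= 90.0 for x, y in points)


# Hypothetical sample coordinates, not taken from the bridges file.
state_plane_feet = [(6010000.0, 2110000.0), (6025000.0, 2125000.0)]
lon_lat = [(-122.48, 37.82), (-122.35, 37.80)]

print(looks_like_degrees(state_plane_feet))
print(looks_like_degrees(lon_lat))
```

Values in the millions, as in the first list, are typical of projected data measured in feet or meters, so a long/lat SRID on such data is a sign that the SRID was guessed wrong.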
The spatialreference.org site contains thousands of user-contributed spatial reference systems in addition to the standard ones. Best of all, if the record you’re looking for can’t be found and you happen to have a .prj file, you can submit the contents of that via the Upload Your Own link, and the site will magically determine the INSERT statement you need to use to insert the new item into your spatial_ref_sys table.
The spatial reference site by default assigns an SRID starting with 9 to denote it was grabbed from the spatialreference.org site. For the sake of consistency, we replace this SRID number with what is listed in the auth_srid field. By using this convention, you won’t accidentally insert a record into spatial_ref_sys that’s already in the table.
Although it’s possible to create your own custom spatial reference system to suit your specific needs, such a topic is beyond the scope of this book. PostGIS uses the PROJ.4 library to underpin its projection support. For those interested in how to do this, the links to articles in appendix A on spatial reference systems and PROJ.4 syntax may be of use.
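The renumbering convention can be sketched as a small text rewrite. Everything specific here is an assumption for illustration: the statement below is a hypothetical example with shortened placeholder values, the column order (srid, auth_name, auth_srid, ...) is assumed, and use_auth_srid is a made-up helper name. Check the statement you actually receive before relying on a rewrite like this.

```python
import re

# Hypothetical statement; the 9-prefixed SRID and the shortened text values
# are placeholders for illustration only.
STATEMENT = (
    "INSERT into spatial_ref_sys (srid, auth_name, auth_srid, proj4text, srtext) "
    "values ( 92227, 'epsg', 2227, '+proj=lcc ...', 'PROJCS[...]');"
)

# Matches: values ( <srid>, '<auth_name>', <auth_srid>
PATTERN = re.compile(
    r"(values\s*\(\s*)(\d+)(\s*,\s*'[^']*'\s*,\s*)(\d+)", re.IGNORECASE
)


def use_auth_srid(statement):
    """Replace the leading srid value with the auth_srid value that follows it."""
    def swap(match):
        # Keep the text around the numbers and reuse auth_srid as the srid.
        return f"{match.group(1)}{match.group(4)}{match.group(3)}{match.group(4)}"

    rewritten, count = PATTERN.subn(swap, statement, count=1)
    if count == 0:
        raise ValueError("statement does not have the assumed column order")
    return rewritten


fixed = use_auth_srid(STATEMENT)
print(fixed)
```

Raising an error when the pattern doesn’t match is deliberate: a statement in an unexpected shape is better edited by hand than rewritten silently.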
In this chapter we explained the details of a spatial reference system and what makes one up. We hope from our discussions that you understand their importance, as well as the general rules of thumb for selecting one and determining which one your source data is using.
In the next chapter we’ll continue our journey into the real world by loading real geographic data. We’ll cover some of the more popular free and open source tools, both packaged and not packaged with PostGIS, that are useful for importing and exporting data. We’ll go over the pros and cons of each as well as provide examples of how to use them.