As the old saying goes, “No man is an island”; the same holds true for geometries. In the previous chapter we concentrated on describing geometries in isolation. We described common properties for geometries and various functions to measure, morph, or transform single geometries. From this chapter forward, we’ll no longer entertain ourselves with one geometry at a time. The richness and the power of spatial queries really come to light when we start working with more than a single geometry. If we liken geometries to tables, an SQL statement that queries from a single table can only go so far. It is only when more than one table gets involved, and we have join operations at our disposal, that things become interesting. Mastery of join operations is what separates the casual database user from the serious database analyst.
Spatial databases have a similar jumping-off point; the casual consumer of a spatial database may use PostGIS to store geometry data or to filter geometries befitting certain conditions. The serious spatial database analyst will be able to write queries that join and morph multiple geometries to solve seemingly intractable problems with brisk elegance.
Although spatial databases are thought of as a tool for geographic information systems, the problems they can tackle aren’t limited to geographic systems. In this chapter we’ll explore the fundamental underpinnings of spatial databases. Spatial databases are all about space—how objects occupy space and interact with other objects in space. Any problem you can state using the physical or abstract concept of vector space is a potential use case for a spatial database. For these exercises we’ll be using the unknown spatial reference system. Later on we’ll delve into spatial reference systems when loading geographic data and cover the special considerations involved with dealing with geographic data.
As the old saying goes, “No pain, no gain.” Working with more than one geometry introduces a new level of conceptual challenges. In non-spatial databases, disparate data interacts through various mathematical or string operations. When one number meets another number, you can add, subtract, multiply, divide, or do some combination thereof. When one string meets another, you can concatenate, or “substring,” one against the other. In spatial databases, when one geometry meets another, things heat up quite a bit. PostGIS has many ways in which the relationship can be consummated. This chapter explores the most commonly used of these relationships. We’ll describe each relationship separately as much as possible in this chapter so that you can gain a solid understanding of what each means. Keep in mind that the full analytical power of spatial SQL usually entails multiple relationship functions, operators, and processing functions being applied in unison.
To avoid muddling meaning while speaking about functions, we suggest that instead of saying ST_SomeRelationship(A,B) say A SomeRelationship B. For example,
ST_Contains(geom1,geom2)
would be read as
geom1 contains geom2
Spatial relationship functions in PostGIS accept two input geometries and return either true or false or another geometry. As the name implies, relationship functions describe how the two input geometries relate to each other spatially. For example, if you want to see if one geometry encloses another, you could use ST_Contains. If you want to see if two geometries rub up against each other, you could use ST_Touches.
Not all relationship functions are commutative. Reversing the order of geometries in non-commutative relationships is a fairly common mistake. For instance, if you want to know if A contains B, reversing the input arguments will give you exactly the opposite answer. The exception is invalid geometries, which often return false regardless of the spatial relationship.
When using spatial relationship functions, the two geometries being compared must both be in either the same spatial reference system or in the unknown spatial reference system. If they aren’t, the function may return an error. Keep in mind that all spatial relationship functions for the geometry data type presuppose a planar projection (Cartesian coordinates), so when using lon lat data (spherical coordinates), use the geography type instead, especially when comparing large areas. The geography data type will model the spatial relationships using a true geodetic model, but it’s not as rich in the number of spatial relationship functions you can use with it. For small areas, the Cartesian spatial relationship assumptions will generally work fine, but as the degree differences increase or you approach the poles, treating lon lat as flat is no longer correct, and you may end up with incorrect results.
Although you can create geometries with X, Y, Z, M, and curved geometries in PostGIS, as far as relationships go (those that return true/false), PostGIS currently ignores the third and fourth dimensions, whereas the GEOS relationship functions reject curved geometries.
However, the geometry relationship output functions like ST_Intersection, ST_Difference, and ST_SymDifference don’t completely ignore the Z coordinate, but they apply the Z coordinate after doing a 2D relationship process. The results are sometimes less than desirable.
As a workaround for the lack of support for curves, you can approximate a curve with a non-curve by converting to non-curve and then applying the relationship ST_Intersects(ST_CurveToLine(a.the_geom,100), ST_CurveToLine(b.the_geom, 100)), where 100 is the number of segments to approximate a quarter circle; the default is 32. The 3D issue is harder to compensate for, but PostGIS 2.0 offers new relationship functions specifically designed for 3D.
We start with intersections, because this is by far the most commonly used relationship between two geometries.
The idea of intersection encompasses a wide range of ways in which geometries can interact. We’ll delve into the nuances in time, but let’s start with the basic definition of intersection: Two geometries intersect when they share space.
PostGIS has two functions that work with intersections. The first is ST_Intersects. It takes two geometries and returns true if any part of those geometries is shared between the two. The other function is ST_Intersection. This function returns a geometry that represents the shared part of the two input geometries. If the geometries don’t intersect, then the intersection is an empty geometry. Both functions are defined in the OGC/SQL-MM specs and so can be found in most databases that follow the ISO SQL-MM model.
An empty geometry is a geometry with no points, but it isn’t NULL. You can create an empty geometry by this command: ST_GeomFromText('GEOMETRYCOLLECTION EMPTY'),. Although you can have an empty polygon, PostGIS silently converts it to an empty geometry collection. In version PostGIS 2.0, an empty polygon will return an empty polygon instead of an empty geometry collection.
We’ll demonstrate the two functions in action with some examples.
In listing 5.1, we start with a polygon and a linestring and see if they intersect and what the resultant geometry looks like. This example is quite common in real-world scenarios. The linestring can represent the planned route for a new roadway. The polygon can represent private property. Our query quickly tells us whether the new road will cut through the private property. If so, we can determine which part of the road falls within the boundaries to determine the cost associated with an eminent domain takeover. Although we show only a simple example, you can imagine how useful this can be if you have all the private properties in a city and want to determine which properties the road will cut through. The route planner can virtually trace any path through the city and obtain an immediate calculation for the eminent domain purchase.
Figure 5.1 shows a planned roadway (linestring), our land mass (polygon), and the resulting intersection geometry—the portion we need to take over by eminent domain.
The code to generate these images is as follows:
SELECT ST_Intersects(g1.geom1,g1.geom2) As they_intersect, GeometryType( ST_Intersection(g1.geom1, g1.geom2)) As intersect_geom_type FROM (SELECT ST_GeomFromText('POLYGON((2 4.5,3 2.6,3 1.8,2 0, -1.5 2.2,0.056 3.222,-1.5 4.2,2 6.5,2 4.5))') As geom1, ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') As geom2) AS g1;
In this code we end up with a MULTILINESTRING, as shown in table 5.1, because the line is cut by the polygon.
they_intersect |
intersect_geom_type |
---|---|
t | MULTILINESTRING |
Should you be unimpressed by the previous example, the next one ought to change your mind.
One of the most common uses of the ST_Intersection function is to clip polygons. Clipping loosely refers to the process of breaking up a geometry into smaller segments or regions. For instance, if you were in charge of sales for a city and had a dozen sales representatives on your staff, you could clip the polygon of the city into 12 sales regions, one for each representative. Another common use is to make your spatial database queries faster by breaking up your geometries beforehand. If you have data covering more area than you generally need to work with, you can clip the original geometry so as to query against a smaller geometry. For example, if you’re working with data covering the entire island of Hispaniola but only need to report on Haiti, you could clip the island using a linestring and so only query against the data covering the Haitian half of the island. We start with an example where we break up an arbitrarily shaped polygon (the one we used in section 5.2.1) into square regions, as shown in figure 5.2. In order to perform this trick we take a rectangle, break it into eight equal-size cells, and then intersect with our starting polygon, as shown in the following listing.
We use the PostgreSQL generate_series to generate two series from min/max x and min to max y and skip every two steps so each square is two units wide and high. This represents the region we want to cut. We combine the two and take the intersection, which results in our geometry being diced.
The well-known text representation of each slice is shown in table 5.2.
grid_xy |
intersect_geom |
---|---|
0 0 | POLYGON((0.5 0.942857142857143 ......)) |
2 0 | POLYGON((2.5 0.9,2 0,0.5 0.942857142857143,0.5 2,2.5 2,2.5 0.9)) |
0 2 | POLYGON((-1.18181818181818 2,-1.5 2.2,..)) |
2 2 | POLYGON((2.26315789473684 4,2.5 3.55,...)) |
0 4 | POLYGON((-1.18179959100204 4,-1.5 4.2,0.5 5....)) |
2 4 | POLYGON((2 4.5,2.26315789473684 4,0.5 4,0.5 5.51428571428571...)) |
2 6 | POLYGON((1.23913043478261 6,2 6.5,2 6,1.23913043478261 6)) |
This example shows how intersections can be useful for partitioning a single geometry into separate records. Notice that the cutting squares don’t need to completely cover the polygon. In this case we left out the last sliver by making our grid not completely cover the extent of the region.
Another important thing to keep in mind is that the geometry type returned by ST_Intersection may look rather different than the input geometries, but it’s guaranteed to be of equal or lower dimension than the lowest dimension geometry. For example, if you have two polygons that share an edge, then the intersection of the two will be the linestring representing the shared edge. Similarly, if you intersect a road with a parcel of land, then the intersection would be possibly a linestring that represents the portion of the road that runs through the parcel of land.
To summarize, if A and B are the input geometries to ST_Intersection, then the following are true:
Now that we’ve covered the basic concept of intersects and intersection, we’ll delve into the finer details of intersecting relationships.
Recall that the definition of intersection involves two geometries sharing space. Sometimes, you may want to have more detail about how the space is shared and have to say something about the space not being shared. For these situations, you have at your disposal many PostGIS functions that focus on the properties of the intersection. Again, these functions rely on the fact that the spatial reference system (SRS) of both geometries is the same, though their geometric dimensions need not be the same. The geometries should also be valid; otherwise, the results can’t be trusted.
Most of the intersection relationship functions rely on the concepts of interior, exterior, and boundary of a geometry and whether these intersect with the interior, exterior, and boundary of the second geometry. In the case of intersection of these three parts, the geometric dimension of the resulting geometry is also important. Will the intersection result in a geometry of zero, one, or two dimensions?
We’ll cover these concepts in more detail in a later section of this chapter. For brevity, this is what the terms mean:
The boundary of a geometry is another geometry. In the case of finite points, the boundary is an empty geometry. You can get the boundary of a geometry with the ST_Boundary function, and this resultant geometry also has an interior, exterior, and boundary. The model of an interior and exterior is a bit harder to fathom without introducing the concept of limit theorems. You can’t adequately represent it with a geometry construct except in the case of interior for points and multipoints. The interior of points is simply the points, and the exterior is the rest of the coordinate space that’s not the points. In the case of lines and polygons, the interior and exterior are limits approaching the boundary of the geometry and therefore not representable by themselves. In short, the model of a geometry is a mathematical trick. In the case of linestrings and polygons, we can’t quantify what an interior is or an exterior is, but we can say there exists an object called a “geometry” composed of an infinite number of points that has an interior, an exterior, and a boundary, where the interior and exterior approach the boundary.
The result of an intersection of these nine pairs can be non-dimensional (no intersection), zero dimensional (finite points), one dimensional (lineal), two dimensional (areal polygons), or a combination thereof in which the dimension is the dimension of the highest dimension element in the collection.
These intersect classes of functions should be used only with valid geometries. The reason is that when there are self-intersections at the boundaries, the concepts of interior, exterior, and boundary are not well defined. This is a common mistake.
The official PostGIS manual has diagrams of these relationships. We’ll try to focus on the corner cases where people have a hard time comprehending the relationships.
We’ll use the house example shown in figure 5.3 for explanation of specific intersection relationships because it’s a simple corner case that exercises the subtleties of all these relationships.
When these geometries are viewed together on the same grid, the overlay looks like figure 5.4.
In this example, we have four geometries that we can envision as any set of objects:
At a particular moment in time, we see all these objects at these particular positions.
We create this particular dataset with the following piece of code.
CREATE TABLE example_set(ex_name varchar(150) PRIMARY KEY, the_geom geometry); INSERT INTO example_set(ex_name, the_geom) VALUES ('A polygon with hole', ST_GeomFromText('POLYGON ((110 180, 110 335, 184 316, 260 335, 260 180, 209 212.51, 110 180), (160 280, 200 240, 220 280, 160 280))') ), ('A point',ST_GeomFromText('POINT(110 245)')) , ('A linestring', ST_GeomFromText('LINESTRING(110 245,200 260, 227 309)')) , ('A multipoint', ST_GeomFromText('MULTIPOINT(110 245,200 260)')) ;
Next we’ll look at Contains and Within.
Contains and Within are companion relationships. If geometry A is within geometry B, then geometry B contains geometry A. The Within and Contains relationships are supported by the PostGIS ST_Within and ST_Contains functions. Both of these functions are OGC/SQL-MM–defined functions, so they can be found in other spatial databases. They have more or less the same meaning in all spatial databases.
One of the confusing but necessary conditions for geometry A to contain geometry B is that the intersection of the boundary of A with B can’t be B. In other words, B can’t sit entirely on the boundary of A. A geometry doesn’t contain its boundary, but a geometry always contains itself.
With the following query we can answer such fascinating questions as, are both greeters inside the house and not in the courtyard? Are they both on the walkway? Is one still at the front door?
SELECT A.ex_name As a_name, B.ex_name As b_name, ST_Contains(A.the_geom, B.the_geom) As a_co_b, ST_Intersects(A.the_geom, B.the_geom) As a_in_b FROM example_set As A CROSS JOIN example_set As B;
The result of this code is shown in table 5.3.
a_name |
b_name |
a_co_b |
a_in_b |
---|---|---|---|
A polygon with hole | A polygon with hole | t | t |
A polygon with hole | A point | f | t |
A polygon with hole | A linestring | f | t |
A polygon with hole | A multipoint | f | t |
A point | A polygon with hole | f | t |
A point | A point | t | t |
A point | A linestring | f | t |
A point | A multipoint | f | t |
A linestring | A polygon with hole | f | t |
A linestring | A point | f | t |
A linestring | A linestring | t | t |
A linestring | A multipoint | t | t |
A multipoint | A polygon with hole | f | t |
A multipoint | A point | t | t |
A multipoint | A linestring | f | t |
A multipoint | A multipoint | t | t |
The example brings to light the following items:
If we were to use ST_Within, it would just be the inverse of ST_Contains—just flip the A and B geometry columns.
As you’ve observed from the Contains example, the concept of OGC/SQL-MM containment is non-intuitive at the boundaries. Most people make the mistake that a geometry should contain its boundary points, and that’s an often-desired feature. This is why PostGIS introduced the concept of Covers and CoveredBy to satisfy this need that interestingly enough also exist in Oracle Spatial via the SDO_RELATE mask=COVEREDBY construct. These functions are called ST_Covers and ST_CoveredBy in PostGIS, and they are not OGC/SQL-MM defined functions. Here are a couple of other caveats:
The following listing and table 5.4 demonstrate situations where ST_Covers covers a geometry but does not contain it.
a_name |
b_name |
a_co_b |
a_in_b |
---|---|---|---|
A polygon with hole | A point | t | t |
A linestring | A point | t | t |
SELECT A.ex_name As a_name, B.ex_name As b_name, ST_Covers(A.the_geom, B.the_geom) As a_co_b, ST_Intersects(A.the_geom, B.the_geom) As a_in_b FROM example_set As A CROSS JOIN example_set As B WHERE NOT (ST_Covers(A.the_geom, B.the_geom) = ST_Contains(A.the_geom, B.the_geom));
In this code we’re going to list only those geometries where the ST_Covers answer is different from the ST_Contains answer. Table 5.4 shows the result of this query.
Because we limited our result to the case where the answer produced by ST_Covers is different from that of ST_Contains, A is considered to cover B even in the case where a geometry sits wholly on the boundary of A. Both the walkway and the house cover the door, but they don’t contain the door.
ContainsProperly is a concept that’s more stringent than the Contains or Covers relationships. Its main benefit is that it’s in general faster to compute than the others, and if you want to exclude geometries that sit partly or wholly on the boundary of another, it’s the one you want. You may want to do this, for example, if you want to make sure your ships are all without a doubt legally within your political boundary.
ContainsProperly will give you the same result as Contains except in the case where any part of geometry B sits on the boundary of A. In listing 5.4 we repeat the same exercise, except that we list only the ContainsProperly options where ST_ContainsProperly gives a different answer from ST_Contains. Here are a couple of caveats to consider:
SELECT A.ex_name As a_name, B.ex_name As b_name, ST_ContainsProperly(A.the_geom, B.the_geom) As a_co_b, ST_Intersects(A.the_geom, B.the_geom) As a_in_b FROM example_set As A CROSS JOIN example_set As B WHERE NOT (ST_ContainsProperly(A.the_geom, B.the_geom) = ST_Contains(A.the_geom, B.the_geom));
Table 5.5 shows the result of this query. Observe that ST_ContainsProperly gives an identical answer to ST_Contains except in the case of an areal geometry, a line geometry compared to itself, or a geometry sitting partly on the boundary of another. A geometry never properly contains itself except in the case of points and multipoints. The reason points and multipoints properly contain themselves is that they consist of a finite number of points and therefore have no boundary to speak of. A point can never be sitting partly on its nonexistent boundary.
a_name |
b_name |
a_co_b |
a_in_b |
---|---|---|---|
A polygon with hole | A polygon with hole | f | t |
A linestring | A linestring | f | t |
A linestring | A multipoint | f | t |
You can see from this that the only cases where the ST_ContainsProperly answer is different from that of ST_Contains are the cases of a polygon against itself, a linestring against itself, or a point or multipoint that partly sits on the boundary of another. This tells us one person must be on the start or end of the walkway because the walkway doesn’t properly contain both people, but it does contain both people.
Two geometries overlap when they’re the same geometric dimension (points, areal, linestring), they intersect, and one is not contained in another. The function that supports overlaps in PostGIS is called ST_Overlaps. This function is an OGC/SQL-MM–compliant function. If we used the same example as we used previously, comparing if each one overlaps the other, we’d find that none overlap. The reasons for that are as follows:
Two geometries are considered to touch if they have at least one point in common but those points don’t lie in the interior of both geometries. The function that supports this relationship is ST_Touches, and it’s an OGC/SQL-MM–defined function. Revisiting our example of the polygon with a hole (house with a courtyard), linestring (walkway), point (door), and multipoint (front and courtyard greeters), we ask which pairs of these touch. Can you guess?
SELECT A.ex_name As a_name, B.ex_name As b_name, ST_Touches(A.the_geom, B.the_geom) As a_tou_b, ST_Contains(A.the_geom, B.the_geom) As a_co_b FROM example_set As A CROSS JOIN example_set As B WHERE ST_Touches(A.the_geom, B.the_geom) ;
The result of this question is listed in table 5.6.
a_name |
b_name |
a_tou_b |
a_in_b |
---|---|---|---|
A polygon with hole | A point | t | f |
A polygon with hole | A multipoint | t | f |
A point | A polygon with hole | t | f |
A point | A linestring | t | f |
A linestring | A point | t | f |
A multipoint | A polygon with hole | t | f |
There are a couple of important points to glean from these results:
Two geometries are said to cross each other if they have some interior points in common but not all. The function that supports crosses is ST_Crosses, and it’s an OGC/ SQL-MM–defined function.
Revisiting our example of the polygon with a hole (the house with a courtyard), linestring (walkway), point (door), and multipoint (front door and courtyard greeters), we ask which pairs of these cross. Can you guess?
SELECT A.ex_name As a_name, B.ex_name As b_name, ST_Crosses(A.the_geom, B.the_geom) As a_cr_b, ST_Contains(A.the_geom, B.the_geom) As a_co_b FROM example_set As A CROSS JOIN example_set As B WHERE ST_Crosses(A.the_geom, B.the_geom) ;
The result of this question is listed in table 5.7.
a_name |
b_name |
a_cr_b |
a_co_b |
---|---|---|---|
A polygon with hole | A linestring | t | f |
A linestring | A polygon with hole | t | f |
We can glean a few things from this example:
The Disjoint relationship is the antithesis of Intersects. It means the two geometries have no interiors or boundaries shared. In the case of invalid geometries, it’s possible for the ST_Intersects and ST_Disjoint functions to both return false. If you see such a thing, you know your geometry is invalid.
The disjoint relationship is supported by the function ST_Disjoint, and it too is an OGC/SQL-MM–defined function.
Now that we’ve covered all the many facets of intersects and intersection, in the next section we’ll take a look at output functions that are very closely related to Intersection. These are the Difference family of functions.
Two output relationship functions are very closely related to Intersection and Intersects. These are Difference and Symmetric Difference. These are much less commonly used than ST_Intersection and return the remainder of an intersection. ST_Difference is a non-commutative function whereas Symmetric Difference is, as the name implies, commutative.
Symmetric Difference is the dark twin of the intersection. It will return the simplest geometric representation of what’s left out when two geometries form an intersection. The ST_Difference function when given a geometry A and B ST_Difference(A,B) returns that portion of A that’s not shared with B.
Here’s one way to think about it:
ST_SymDifference(A,B) = Union(A,B) - Intersection(A,B) ST_Difference(A,B) = A - Intersection(A,B)
In the following listing we repeat a similar exercise to the one we did with Intersection except we’re computing a Difference instead of an Intersection.
This results in a polygon, which is pretty much the same polygon we started out with. This may be surprising to some; you’d expect a linestring would split it. This results in a multilinestring composed of three linestrings where the polygon cuts through. Finally, this results in a geometry collection as expected, composed of a multilinestring and a polygon.
Figure 5.5 is a diagram of the results.
As you can see, the linestring doesn’t bisect the polygon, though this is a common desire in many cases. Why? If you think of a geometry as an infinite set of points, the difference caused by the linestring would be two polygons with partially shared boundaries. The simplest geometry to describe them is the union of these polygons, which is more or less the original polygon, not a geometry collection of two polygons or a multipolygon. It can’t be a multipolygon because a valid multipolygon can’t have polygons that intersect at more than one point.
The result you get of the difference of the polygon from the line is not quite the polygon you started out with. It looks like it, but it has one or two point differences at the boundaries where the line cuts through. This is more an artifact of the differencing operation and not because of any theoretical reason.
One not so elegant hack to achieve bisection is to turn the linestring into a thin knife by buffering it ever so slightly, such that the boundaries of the resulting polygons won’t intersect, and then gluing back the leftover slivers onto one of the resulting polygons. This often is good enough in many cases. The following listing is such a demonstration minus the glue. Table 5.8 and figure 5.6 show the result.
gid |
wktpoly |
---|---|
1 | POLYGON((2 4.5,3 2.6,3 1.8,2 0,-0.760732 1.7353172,...-0.6572407 4.7538133,2 6.5,2 4.5)) |
2 | POLYGON((-0.760732 1.7353173,-1.5 2.2,-0.7274017 2.7074521,-0.760732 1.7353173)) |
3 | POLYGON((-0.6936062 3.6931535,-1.5 4.2,-0.6572407 4.7538133,-0.6936062 3.6931535)) |
SELECT foo.path[1] As gid, ST_AsText( ST_SnapToGrid(foo.geom, 0.0000001)) As wktpoly FROM (SELECT g1.geom2 As the_knife_cut, (ST_Dump(ST_Difference(g1.geom1, g1.geom2))).* FROM (SELECT ST_GeomFromText('POLYGON((2 4.5,3 2.6,3 1.8,2 0, -1.5 2.2,0.056 3.222,-1.5 4.2,2 6.5,2 4.5))') As geom1, ST_Buffer( ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)'),0.00000001) As geom2) AS g1 WHERE ST_Intersects(g1.geom1,g1.geom2)) As foo;
As you can see in this example, we buffer the line to make it a very thin polygon we can cut with. This cuts the polygon into three, resulting in a multipolygon, which we then break apart into three polygons using the ST_Dump function you learned about in the previous chapter. In later sections we’ll delve into more precise but advanced options of cutting geometries.
PostGIS 2.0 added the ST_Split function, which allows for splitting a line by a point, line by line, or polygon by line with a single statement. All these can be done much simpler and more exact using ST_Split.
In the coming sections we’ll explore the ST_DWithin relationship function that you’ve seen in earlier examples. It’s very closely related in functionality to ST_Intersects.
We briefly covered the ST_DWithin function in chapter 1. In this section, we’ll cover some more examples of its usage.
ST_DWithin, as mentioned earlier, is used most commonly to find geometries that are close to another or determine if a geometry is within X units of another. This process is often called a nearest neighbor search. In versions of PostGIS prior to 1.3.5, this was shorthand for ST_Expand(A,X) && B AND ST_Expand(B,X) && A AND ST_Distance(A,B) < X.
These expand calls may seem redundant: If one is true then both are true, and if one is false then both are false. The reason both are needed is that depending on how you expand you may end up using an index or not, and SQL planners don’t process statements in order. They first try to process statements in order of cost when computing costs are not more costly than sequential. At runtime one of those checks is generally far more costly than another because one is generally a constant geometry (that’s the one you’d want to expand) and the other is a table with a spatial index on the geometry, which if you expand it will no longer use the spatial index. The planner will choose the least costly way to process the first, so only in the case of positives will the second call be made. Also keep in mind that ST_Expand expands the bounding box, not the geometry, and so returns a box. You can think of ST_Expand as the lazy sibling of ST_Buffer.
From PostGIS 1.3.5 on, additional short-circuit logic is built in.
The ST_DWithin is not an MM/SQL function, but it’s pseudo standards compliant in that the many Web Feature Services (WFS) such as GeoServer, MapServer, MapInfo WFS, and Deegree have a DWithin filter operator that does the same thing. Oracle also has a function called SDO_GEOM.WITHIN_DISTANCE that operates the same way, except the Oracle version accepts a unit of measure as argument, whereas the PostGIS one always assumes the units are the units defined for the spatial reference system of the geometries.
Another side use of this function is as a substitute for the ST_Intersects function. This usage we term as doing an intersects with tolerance.
Here are a few reasons why you may choose to use this function instead of the more obvious ST_Intersects:
When used in this manner, the distance parameter must be set to very small to represent the maximum distance error to consider two geometries as intersecting. For example, you may consider the linestring and a point to be intersecting if they’re within 0.01 units of each other:
SELECT ST_DWithin( ST_GeomFromText('LINESTRING(1 2, 3 4)'), ST_Point(3.00001,4.000001), 0.0001 ) ;
Next we’ll look at finding the N closest objects.
There are many ways of doing a nearest neighbor search. The classic example is finding, for example, the nearest five roads to a particular location that are within 10 kilometers of the location:
SELECT r.name, ST_Distance(r.geom, loc.geom)/1000 As dist_km FROM ch01.roads As r INNER JOIN (SELECT ST_Transform( ST_SetSRID(ST_Point(-118.42494, 37.31942), 4326), 2163) As geom) As loc ON ST_DWithin(r.geom, loc.geom, 10000) ORDER BY ST_Distance(r.geom, loc.geom) LIMIT 5;
In this example we’re concerned only with finding roads near a single point. The location needn’t be a point; it can just as well be a lake, a river, or a building that’s represented as a linestring or a polygon or even a geometry collection. Here we’re assuming our table geometries are stored in US National Atlas Equal Area meter projection (2163) and our point of interest is in lon lat (4326), so we transform to (2163) to do a meter-based search. The ST_DWithin check as described in earlier chapters returns true if any point on geometry A is within X distance of any point on geometry B. The X is always in the units of the spatial reference system of those geometries, and the SRID of the two geometries must be the same.
As you’ll note, with ST_DWithin you need to provide a limit distance. If you wanted to find the five closest roads, doing a simple order by ST_Distance(....) limit 5 without the ST_DWithin, it would be very slow because the query would resort to what’s called a table scan—calculating the distance of loc to every geometry in the table and returning the five closest. ST_DWithin reduces the set of false positives significantly. Thus it requires that you know something about your data—that you know all the top five geometries are within 10 kilometers. The less you know about your data, the larger you should make your X, and in return the slower your query will become because it will grab more false positives to inspect. Some of this will change in PostgreSQL 9.1/ 9.2 and PostGIS 2.0+ because k nearest neighbors (kNN) scanning will be added to GIST indexes in PostgreSQL 9.1 or PostgreSQL 9.2 that may make this exercise much easier and efficient.
Another common use case is to find the closest from some other geometry. For this kind of query the DISTINCT ON custom SQL addition of PostgreSQL in conjunction with ST_DWithin comes in very handy. Listing 5.7 is a classic example of its use. DISTINCT ON is guaranteed to return at most one record for any DISTINCT set defined in the ON part. In this example we do the same as previously, except we do it for all locations of interest and return the closest road to each location of interest.
SELECT DISTINCT ON(loc.loc_name, loc.loc_type) loc.loc_name, loc.loc_type, r.road_name, ST_Distance(r.the_geom, loc.geom)/1000 As dist_km FROM ch05.loc LEFT JOIN ch05.road As r ON ST_DWithin(r.the_geom, loc.geom, 10000) ORDER BY loc.loc_name, loc.loc_type, ST_Distance(r.the_geom, loc.geom) ;
There are three important things about using DISTINCT ON:
Sometimes you want n nearest neighbors for each of your location records; sadly neither DISTINCT ON nor the LIMIT clause will let you do this in one query. Luckily Post-greSQL 8.4 has what are called Window functions. Window functions are an ANSI SQL 2003 standard piece you’ll find in all the high-end enterprise-class commercial databases (Oracle, IBM DB2, and SQL Server 2005/2008). The windowing support in PostgreSQL is more feature rich than in SQL Server 2005/2008, but it’s not quite as feature rich as what you’ll find in Oracle or IBM DB2. Read the SQL Primer appendix for more details of what ANSI windowing functionality is supported and what is not.
The following example requires PostgreSQL 8.4. It will return the top five closest roads for each location of interest that are within 20 kilometers.
This query is run against the fictitious poorly designed town we demonstrated at Post-greSQL Conference 2009 (PGCON2009). The code to build the town is available for download from http://www.postgis.us/presentations. The dataset already built is included in the chapter 5 code data download.
Table 5.9 shows a sampling of the results. There are three important things to take away from this example:
Pid |
land_type |
row_num |
road_name |
dist_km |
---|---|---|---|---|
000001038 | police station | 1 | ||
000001202 | police station | 1 | ||
000002997 | police station | 1 | Main Rd | 0.25 |
000003927 | police station | 1 | Main Rd | 0.07 |
000006442 | police station | 1 | ||
000010131 | police station | 1 | Main Rd | 0.23 |
000010131 | police station | 2 | Curvy St | 0.34 |
000013872 | police station | 1 | ||
000015423 | police station | 1 | Elephantine Rd | 0.45 |
Table 5.9 lists partial results from our query. Observe that for police stations farther than one kilometer from a road, we still get back a record—but the street slots are filled in with NULLs. Some police stations have more than one road within one kilometer (for example, 000010131), and for that one we get two records back with the first including the closest road and the second the next closest.
Now that we’ve covered the basics of proximity analysis, we’ll move on to bounding boxes and basic geometry comparators. Although we didn’t stress it in this section, what makes proximity queries fast are the box-based short-circuit comparisons. Bounding boxes and their geometry operators are used by spatial indexes to reduce the set of records that need to be more closely scrutinized by the exact comparison operators. We’ll cover this in the next section.
Recall from the previous chapter that every geometry has a bounding box, defined as the smallest rectangular box that completely encloses the geometry. Bounding boxes play a critical role when two geometries interact. Now that we’ve covered the basics of proximity analysis, we’ll move on to bounding boxes and basic geometry comparators.
Let’s demonstrate with a quick example. Suppose we have two multipolygons, one representing the state of Washington and one representing the state of Florida. If the bounding box of Washington is strictly above and to the left of (northwest of) Florida, then we know for sure that the actual geometries share the same relationship as well. Of course, if the bounding boxes check is false, we can’t be sure. Remember now that both states have numerous islands hanging off their coasts. In order to answer the questions unequivocally, we’d have to pretty much visit every point in Washington and compare it to every point in Florida. Only after this exhaustive point-by-point checking could we conclude that all of Washington is northwest of all of Florida. The bounding box methodology shortcuts the point-by-point checking by first drawing rectangular boxes around each state and then asking if the box enclosing Washington is above and to the left of the box enclosing Florida. We can obtain an answer almost instantly. Furthermore, because a rectangular box is completely specified by the coordinate of two opposing corners, we can pre-calculate all the bounding boxes for geometries in our table and store their coordinates in indexes. Once we have the bounding box of every geometry indexed, comparing any two geometries becomes a simple task of comparing two pairs of numbers.
Bounding boxes are so fundamental to the spatial queries that PostGIS always assumes that you’d take advantage of them, freeing you from having to worry about explicitly calculating bounding boxes and creating indexes out of them.[1] Certainly, we can foresee many instances where bounding boxes won’t do us much good in the end. Suppose we want to know if the centroid of Washington State is to the left of the centroid of Oregon; we can’t shortcut ourselves to an answer by simply looking at the bounding box of the two states. The next listing contains examples of geometries with their bounding boxes wrapped around them.
1 The only exception is when you use deprecated functions.
SELECT ex_name, ST_Box2D(the_geom) As bbox2d, the_geom FROM ( VALUES ('A line', ST_GeomFromEWKT('LINESTRING (3 5, 3.4 4.5, 4 5)')), ('A multipoint',ST_GeomFromText('MULTIPOINT (4.4 4.75, 5 5)')) , ('A triangle', ST_GeomFromText('POLYGON ((2 5, 1.5 4.5, 2.5 4.5, 2 5))') ) ) AS foo(ex_name, the_geom);
Figure 5.7 illustrates the output of this query, showing the geometries encased in their bounding boxes.
For brevity we stated that the bounding box is the smallest box you can wrap around a geometry. Strictly speaking, it’s not the smallest. PostGIS will often expand a bounding box to ensure that the coordinates can be represented with float4 numbers. For example, if the smallest box is defined by the two corners of (-3.14159265,0) and (0,2.71828182), PostGIS may round this off to (-3.15,0) and (0,2.72). The size limit of the box coordinates may change in future versions.
PostGIS offers a number of geometry bounding box comparators that work exclusively with box2d objects and one comparator that works against the actual geometry. Some but not all of these operators have functional counterparts that apply to the entire geometry. As a convenient shorthand, PostGIS uses various operators to symbolize comparators. For example, A && B returns true if bounding box of geometry A intersects bounding box of geometry B or vice versa where the double ampersand operator (&&) is the intersection comparator. The && operator is the one most commonly used as a precheck for spatial relationships.
Table 5.10 is a quick table of the operators, what they do, and what kind of index they use. Keep in mind that in general you put GIST indexes on geometries. B-tree indexes are possible only if geometry objects are relatively small, such as points or small polygons and lines. If you have no B-tree index, then an operator that works only with B-tree won’t use an index. You can have both a GIST and a B-tree index on the same geometry field; B-tree indexes on geometries are rare and of minimal utility.
Operator |
What is checks |
Index |
---|---|---|
&& | Returns true if A’s bounding box intersects B’s | gist |
&< | Returns true if A’s bounding box overlaps or is to the left of B’s. | gist |
&<| | Returns true if A’s bounding box overlaps or is below B’s. | gist |
&> | Returns true if A’s bounding box overlaps or is to the right of B’s | gist |
<< | Returns true if A’s bounding box is strictly to the left of B’s | gist |
<<| | Returns true if A’s bounding box is strictly below B’s | gist |
= | Returns true if A’s bounding box is the same as B’s | B-tree |
>> | Returns true if A’s bounding box is strictly to the right of B’s | gist |
@ | Returns true if A’s bounding box is contained by B’s | gist |
|&> | Returns true if A’s bounding box overlaps or is above B’s | gist |
|>> | Returns true if A’s bounding box is strictly above B’s | gist |
~ - | Returns true if A’s bounding box contains B’s | gist |
~= | Obsolete; superseded by ST_OrderingEquals | gist |
Although all the operators are used to compare only bounding boxes (box2d objects), except for the ~= sameness operator, which works against true geometries, all these operators can be called for both plain bounding boxes box objects (box2d) as well as actual geometries. They are generally used with geometries and look at the bounding box wrapper around the geometry.
Now that we’ve covered the basics of bounding boxes, which are used extensively as a precheck for other relationships, we’ll explore the most fundamental relationship: equality and its multifaceted meaning.
In conventional databases, you probably never gave the equality comparator (=) a second thought before using it. This unambiguity doesn’t carry over to spatial databases. When we compare two geometries, equality is a multifaceted notion. We can ask whether geometries occupy the same space. We can ask whether they are represented by the same points. We can ask whether their bounding boxes are the same.
Three basic kinds of equality are specific to geometries in PostGIS:
We consider two geometries as spatially equal if they occupy exactly the same underlying space. PostGIS uses ST_Equals to test for spatial equality. ST_Equals is also an OGC/ SQL-MM–defined function you’ll find in many spatial databases and is based on what’s called an intersection matrix model.
ST_Equals is most commonly used to determine if two geometries that are described by potentially different points or polygon rings in a different orientation represent the same geometry. It will also equate a polygon with a multipolygon or geometry collection that has only that single polygon in its list. It’s an equality that doesn’t care about the vector direction of the line segments or point ordering in the geometry.
Geometric equality is even more strict than spatial equality. When two valid geometries are geometrically equal, not only must they share the same space, but the underlying geometric representation must also be more or less the same. For example, take any interstate highway road in the United States. Depending on which side of the road you’re traveling on, the interstate is signed as north versus south or east versus west. Although it’s the same interstate highway, the direction of travel matters. Sometimes it matters greatly when you get lost. In the same vein, LINESTRING(0 0,1 1) is spatially equal to LINESTRING (1 1,0 0) but not geometrically equal.
PostGIS offers the ST_OrderingEquals function for geometric equality. In versions pre PostGIS 1.4, the ~= meant the same thing, but in newer installs it’s just bounding box equality that uses a spatial index. To be safe, stay away from using ~= and stick with ST_OrderingEquals. In listing 5.10 we demonstrate the difference between ST_OrderEquals and ST_Equals.
In the case of invalid geometries, ST_Equals (spatial equality) may be false when two invalid geometries are exactly the same. Only in this case will you run across the paradox of geometries being geometrically equivalent ST_OrderingEquals but not being spatially equivalent. The reason for this is that the space occupied by an invalid geometry is often ambiguous.
SELECT ex_name, ST_OrderingEquals(the_geom, the_geom) As g_oeq_g, ST_OrderingEquals(the_geom, ST_Reverse(the_geom)) As g_oeq_rev, ST_OrderingEquals(the_geom, ST_Multi(the_geom)) AS g_oeq_m, ST_Equals(the_geom, the_geom) As g_seq_g, ST_EQuals(the_geom, ST_Multi(the_geom)) As g_seq_m FROM ( VALUES ('A 2d line', ST_GeomFromText('LINESTRING(3 5, 2 4, 2 5)') ), ('A point',ST_GeomFromText( 'POINT(2 5)')) , ('A triangle', ST_GeomFromText('POLYGON((3 5, 2.5 4.5, 2 5, 3 5))') ), ('poly with self-inter', ST_GeomFromText('POLYGON((2 0,0 0,1 1,1 -1, 2 0))') ) ) AS foo(ex_name, the_geom);
The results are shown in table 5.11.
ex_name |
g_oeq_g |
g_oeq_rev |
g_oeq_m |
g_seq_g |
g_seq_m |
---|---|---|---|---|---|
A 2d line | t | f | f | t | t |
A point | t | t | f | t | t |
A triangle | t | f | f | t | t |
poly with self-inter | t | f | f | f | f |
Observe also that the multigeometry variant is not geometrically equal to the singular version, but in the case of spatial equality, because they occupy the same space, they’re equal.
The geometric equality is not quite the same as what you get when you compare the point-by-point structure of a geometry. It doesn’t even follow the OGC ST_Ordering-Equals standard, which considers geometries equal if they are spatially equal and the ordering of the points is the same.
Another caveat is that ST_OrderingEquals doesn’t work with curved geometries in Post-GIS 1.4 and below, though it works fine with 3D geometries. If you have curved geometries, do a binary compare with ST_AsEWKB(A) = ST_AsEWKB(B), and if you don’t care about SRID do a ST_AsBinary(A) = ST_AsBinary(B).
In versions of PostGIS 1.3 and below, ~= returned the same answer as ST_Ordering-Equals. This may or may not be true in PostGIS 1.4 versions and above and all depends on whether you did a soft upgrade or a hard upgrade. For people who soft-upgraded from PostGIS 1.3 and PostGIS 1.4.1 and below, it still behaves as ST_OrderingEquals, but for PostGIS 1.4.0 and PostGIS 1.5+, it behaves like a spatial indexable =. We suggest you don’t rely on ~= anymore and use ST_OrderingEquals instead.
In PostGIS, the one-and-only equality comparator (=) is reserved for bounding box equality. If you ask if geometry A = geometry B, the result will return true if the bounding boxes of A and B are spatially equal. Because bounding box equality usurped the ubiquitous equal sign, many people mistake bounding box equality for geometric equality. A = B doesn’t mean that A is B. Here’s an example illustrating the difference:
SELECT ST_GeomFromText('LINESTRING (3 5, 3.4 4.5, 4 5)') = ST_GeomFromText('POLYGON ((3 5, 3 4.5, 4 4.5, 3 5))') As op_eq ; --Result t
The comparison of a polygon and a linestring returns true. You may ask, how is this possible? It’s because their bounding boxes are equal although the geometries are quite different. However, if you use the geometric equality operator, you get the expected false answer:
SELECT ST_OrderingEquals(ST_GeomFromText('LINESTRING (3 5, 3.4 4.5, 4 5)') , ST_GeomFromText('POLYGON ((3 5, 3 4.5, 4 4.5, 3 5))') ) As op_same; --Result f
Bounding box equality is what PostGIS uses for equality comparison, which produces often-unexpected results when UNIONing without ALL, using DISTINCT or doing a GROUP BY on a geometry. The examples in the following listing demonstrate this anomaly.
These examples demonstrate the oddity that is the bounding box = operator. Because = for geometries is mapped to = of the bounding box and SQL uses = for DISTINCT checks, you end up with somewhat strange situations, as demonstrated in the listing. In we get two records back because we’re doing a UNION ALL, and a UNION ALL by definition returns all records in the union. In we get one record back, which is the first one that’s hit (the linestring), because our subquery has a UNION, and UNION without ALL puts in an implicit DISTINCT. Because the bounding boxes of the geometries are equal, they are seen as equal. In we get one record back for the same reason as . In we get two records back because the output of ST_AsText isn’t a geometry, but text and text = text means the text has to match exactly.
This operator also comes into play when you group by geometries, and this is probably the case where people get bitten the most. Here’s a demonstration of this tragedy.
As you can see in the DISTINCT count gives an answer of 1, because both geometries share the same bounding box and therefore there’s only one distinct bounding box. In we get an answer of 2 because we changed the linestring slightly so that the bounding box is different from the polygon. In we get only one geometry back, because the group by sees the geometries as the same although they are different. In we get both geometries back because they have different bounding boxes.
If this behavior causes so much confusion and pain, why do we have it? We’re not sure. One theory is it’s efficient and isn’t that much of an issue. It’s efficient because when doing a group by or a union, the query planner need only consider the bounding box caricature that surrounds the geometry, which is a lot less painful than considering a huge complex geometry. In most cases, it’s rare that we’re only doing a DISTINCT or GROUP BY on a geometry and that our geometries have exactly the same bounding boxes, so the uniqueness of the other fields and the rareness of non-dupes having exactly the same bounding box counterbalances this behavior.
The bounding box equality issue comes into play in several common SQL constructs:
Note that in this example we’re stuffing the text representation of the geometry into a geometry field. The text representation happens to be the HEXEWB normally displayed when you do a SELECT of a geometry. When you insert the code into the table, PostgreSQL silently casts it to a geometry for you.
All the operators work for curved geometries except for ~=. For 3D geometries (geometries with a Z coordinate), the Z coordinate is ignored for the bounding box operators but considered with ~=.
Next we’ll look at the underpinnings of relationship functions.
The intersection relationship we covered earlier might have given you the impression that ST_Intersect is the most generic relationship between two geometries. In actuality, we can generalize one step further. The underpinning of most relationship functions in PostGIS and in fact most spatial databases is based on the Dimensionally Extended 9 Intersection Matrix (DE-9IM), which we’ll loosely refer to as the intersection matrix. The PostGIS function that can work directly with an intersection matrix is the ST_Relate function.
The intersection matrix is the foundation of most geometric relationships supported by the OpenGIS OGC/SQL-MM standards. It’s a mathematical approach that defines the pair-wise intersection and geometric dimension of the resulting intersection of the three regions of a geometry. It’s a 3 x 3 matrix consisting of interior, boundary, and exterior on each axis, with one axis defining geometry A and the other defining geometry B. This matrix is used to both define a requirement for an arbitrary relationship as well as define the most encompassing relationship between two geometries. When used to define a custom relationship, it can have (T, F, *, 0, 1, or 2) in each of the nine cell slots. When used to output the most constraining relationship between predefined geometries A and B, it can contain only F, 0, 1, or 2 in the cell slots. The reason for that is that an intersection must always have a corresponding dimensionality, and with F (no intersection) there’s no dimensionality. Not only do there exist quite a number of possible matrices, but you can construct more complex statements by chaining intersection matrices together with boolean and/or operations.
The DE-9IM matrix concept derives from the work by M. J. Egenhofer, J. R. Herring, et al. http://www.spatial.maine.edu/~max/9intReport.pdf.
PostGIS has two variants of the ST_Relate function. The first variant returns a boolean true or false that states whether geometries A and B satisfy the specified relationship matrix. The second variant denotes the most constraining relationship matrix satisfied by the two geometries.
The three quadrants of the intersection matrix are listed here as well as what each means:
Each cell of the matrix can hold one of the values shown in table 5.12.
Value | Description |
T | An intersection must exist; the resultant geometry can be 0, 1, or 2 dimensions (point, line, area). |
F | An intersection must not exist. |
* | It doesn’t matter if an intersection exists or not. |
0 | An intersection must exist, and the intersection must be at finite points (dim = 0 ). |
1 | An intersection must exist, and the intersection’s dimension must be 1 (finite lines). |
2 | An intersection must exist, and the intersection’s dimension must be 2 (areal). |
In figure 5.8 we show ST_Disjoint in intersection notation. ST_Intersects is the opposite of ST_Disjoint. If you were to write out ST_Intersects in DE-9IM notation, it would require three matrix statements. In DE-9IM notation it’s easier to use proof by contradiction (assuming you’re dealing with valid geometries)—state that geometry A intersects geometry B if they are not Disjoint, thus reducing the three matrix statements to a NOT 1 matrix.
The intersection matrix idea of equality means you can represent two geometries with totally different points or reversed points, and as long as the resulting geometry occupies the same space, they’re equal. This is what we earlier referred to as “spatial equality” (space equal). This kind of equality is determined using the OGC SQL-MM function ST_Equals, which can be written as shown in figure 5.9 in DE-9IM notation.
Observe in the chart that interiors must intersect; exterior/interior and exterior/ boundary never intersect. The reason for that is that for a given geometry there’s a demarcation between exterior/interior, so the exterior should never intersect with the interior for a valid geometry. Points, however, have no boundary, so you can’t say two points that are equal have intersecting boundaries or say anything about the intersection relation of the boundary with its interior.
The following listing is a simple example for various geometries and the accompanying ST_Relate matrix.
SELECT ex_name, ST_Equals(the_geom, ST_Reverse(the_geom)) As g_eq_rev, ST_Equals(the_geom, the_geom) As g_eq_g, ST_AsText(ST_Reverse(the_geom)) As g_rev, ST_Relate(the_geom, ST_Reverse(the_geom)) As g_rel_rev, ST_Equals(the_geom, ST_Multi(the_geom)) AS g_eq_m FROM ( VALUES ('A 2d line', ST_GeomFromText('LINESTRING(3 5, 2 4, 2 5)') ), ('A point',ST_GeomFromText( 'POINT(2 5)')) , ('A triangle', ST_GeomFromText('POLYGON((3 5, 2.5 4.5, 2 5, 3 5))') ), ('poly with self-inter', ST_GeomFromText('POLYGON((2 0,0 0,1 1,1 -1, 2 0))') ) ) AS foo(ex_name, the_geom);
As you can see in the results of this query, shown in table 5.13, a given geometry is generally equal to itself and its reverse (same geometry with coordinate points reversed), and it’s also equal to its multigeometry counterpart.
ex_name |
g_eq_rev |
g_eq_g |
g_rev |
g_rel_rev |
g_eq_m |
---|---|---|---|---|---|
A 2d line | t | t | LINESTRING(2 5,2 4,3 5) | 1FFF0FFF2 | T |
A point | t | t | POINT(2 5) | 0FFFFFFF2 | T |
A triangle | t | t | POLYGON((3 5,2 5,...)) | 2FFF1FFF2 | t |
Poly with self-inter | f | f | POLYGON((2 0,1 -1,1 1,0 0,2 0)) 212111212 |
f |
This model of a geometry that’s equal to itself can break down if you have an invalid geometry. The DEM-9IM relation matrix of all satisfies the T*F**FFF* rule except for our bowtie self-intersecting polygon from chapter 2. It fails the DE-9IM test because its interior intersects with its exterior. In other words, the area it defines is ambiguous.
The most generic of all relationship functions is ST_Relate. There are two variants. One takes two geometries as argument and returns the relationship matrix between the two. The other function accepts any two geometries and intersection matrix as an input argument and returns true or false whether the geometries satisfy the constraints defined by the matrix.
In theory most of the Intersect type relationships can be constructed using one or more ST_Relate calls. In practice they aren’t because the core relationship functions have numerous shortcuts imbedded in them that take advantage of the kind of geometric type each geometry is and how many geometries and so forth. Most of the other relationship functions such as ST_Contains and ST_Touches also take advantage of spatial indexes because their bounding boxes are required to intersect. ST_Relate doesn’t take advantage of spatial indexes automagically, because it can be used to express both Intersect relation types as well as non-intersecting relationships.
In some cases, the various permutations that can be allowed by the intersection matrix are more than can be achieved with the functions we’ve described. Although ST_Relate is rarely used, it’s still good to understand it to get a better grasp of what the other relationship functions mean because many can be unequivocally expressed in DE-9IM geeky notation.
listing 5.15 is an example that exercises both ST_Relate functions. The example uses CTEs introduced in 8.4 to create our virtual table that we use twice in our query. It also uses table row constructors syntax (VALUES ..) introduced in PostgreSQL 8.2. These are both SQL features defined in the ANSI SQL specs.
In we use a CTE and row constructors (VALUES) to construct an inline table of sample geometries called example_set. We then use example_set in and for each row to relate to each other row in the set. The output of this query is shown in figure 5.10.
The results of our query are shown in table 5.14.
a_name |
b_name |
de9im |
inter |
rel_disj |
not_rel_disj |
---|---|---|---|---|---|
A 2d line | A 2d line | 1FFF0FFF2 | t | f | t |
A 2d line | A point | FF1FF00F2 | t | f | t |
A 2d line | A triangle | F11F00212 | t | f | t |
A point | A 2d line | FF0FFF102 | t | f | t |
A point | A point | 0FFFFFFF2 | t | f | t |
A point | A triangle | FF0FFF212 | f | t | f |
A triangle | A 2d line | FF2101102 | t | f | t |
A triangle | A point | FF2FF10F2 | f | t | f |
A triangle | A triangle | 2FFF1FFF2 | t | f | t |
Observe in this example that the result of the not disjoint DE-9-IM statement is equivalent to the answer we get with intersects. Note that the relationship of the triangle with the 2d line is FF2101102 and the 2d line with the triangle is F11F00212. The DE-9IM notation of this relationship is shown in figure 5.11.
Observe how if you take FF2101102 and flip the rows and columns, you end up with F11F00212. Here are some other important observations:
For a more in-depth explanation of the DE-9-IM model refer to http://docs.codehaus.org/display/GEOTDOC/Point+Set+Theory+and+the+DE-9IM+Matrix#PointSetTheoryandtheDE-9IMMatrix-9IntersectionMatrix.
Figure 5.12 shows the intersection matrix for ST_Within.
From the ST_Within example you can see that for a geometry to be within another, the interiors of both must intersect, the interior of A can’t fall outside B (it can’t intersect with the exterior of B), and the boundary can’t fall outside B (the boundary can’t intersect with the exterior of B). The boundaries, however, are free to intersect or not to intersect.
In this chapter we covered a fair amount of territory involving spatial relationships. Hopefully we provided you insight into the subtleties of these not-quite-obvious relationships. Now that you understand the foundations of spatial databases, we’ll look at their application in more real-world examples. You’ll learn how to load data from various formats, dealing with spatial references and more detail about what they are, and how to do more concrete things with spatial functions. Some of the spatial functions we expose may be ones we haven’t covered; many will be ones we’ve already explored but that we’ll combine with other functions in thought-provoking ways.
Two subjects we haven’t yet delved into too deeply are spatial aggregates and geometric processing functions. In the coming chapters we’ll demonstrate these.
3.16.81.33