Spatial Data

Spatial data refers to anything that needs to be localized on the Earth. From points to localize a person or a point of interest on the Earth surface, to lines to model a street or a river path, and to areas to delimit countries and buildings, all of these geometries are spatial data. They have been harmonized within Geographic Information System (GIS), which provides a unified approach for representing these objects. In this chapter, we will discover the shapefile and GeoJSON formats. The full power of these systems is their ability to compute metrics based on these geometries with ease: getting a line length, a surface area, or computing the distance between objects.

While spatial data has always been important in the field of cartography, it has attracted more and more interest with the development of smartphones that are equipped with GPS sensors and are able to localize their carrier at any time. In this chapter, we are going to learn how to manage spatial data with Neo4j. Indeed, Neo4j provides both a built-in type for spatial points and another plugin to manage more complex types, such as lines and polygons. In the later sections of this chapter, we will be linking the GDS plugin that we covered in the previous chapter (Chapter 4, The Graph Data Science Library and Path Finding) with spatial data to build a routing engine for New York City.

The following topics will be covered in this chapter:

  • Representing spatial attributes
  • Creating a geometry layer in Neo4j with neo4j-spatial
  • Performing spatial queries
  • Finding the shortest path based on distance
  • Visualizing spatial data with Neo4j

Technical requirements

The following tools will be used in this chapter:

neo4j-spatial is not yet compatible with Neo4j 4.0 (lastest version at the time of writing: 0.26.2). 
If you are using Neo4j < 4.0, then the latest compatible version of the GDS is 1.1.
If you are using Neo4j ≥ 4.0, then the first compatible version of the GDS is 1.2.

Representing spatial attributes

In this section, we are going to use the point built-in data type from Neo4j. Before that, we need to understand a few concepts purely related to spatial data.

Understanding geographic coordinate systems

The spatial data we'll be talking about in this chapter are points on the Earth's surface. There is no unique way to assign coordinates to each point.

Coordinates of a point P on a fixed radius sphere rely on two angles: the longitude, the angle from a reference meridian (θ in the following diagram), and the latitude, the angle with respect to the equator (δ in the following diagram):

Changing the longitude involves moving in a north-south direction, which corresponds to the y-axis in other projection systems. This means in a general case that longitude is denoted with an x and latitude with a y, which is counter-intuitive.

But, as you may know, the Earth is not a perfect sphere. Indeed, the radius close to the equator is higher than the radius at the poles due to centripetal forces, as illustrated in the following image (with exaggerated flatness):

To make things even more complex, the Earth's surface is not smooth, but has a distorted shape due to the effects of gravity. The following image shows the Earth's actual shape, with exaggerated distortion:

Image credits: https://en.wikipedia.org/wiki/File:Geoid_undulation_10k_scale.jpg

In order to assign a unique coordinate to a point on such a shape, we have to decide between two tradeoffs:

  • Global: A global model of the Earth has the advantage of being universal, but its accuracy won't be constant in all places.
  • Local: On the other hand, you can use a local model, a representation of the Earth surface valid for a small area (say, a country like the UK). Local models are required when accuracy is crucial.

In the rest of this chapter, we will work with the World Geodetic System 1984 (WSG84), which is the most widely used system due to its adoption by all GPS systems. According to the preceding nomenclature, WSG84 is a global model, working everywhere on the Earth's surface.

After choosing a convenient shape, you will face the problem of drawing maps on a plane. Spheres have an annoying mathematical property: they cannot be mapped into a plane without any distortion. This means that, when projecting a sphere on a plane, you have to make choices about which properties will be conserved and which properties will be distorted.

Look at the following diagram, representing two different projections of the Earth on a map:

You can see that the distances along the horizontal axis are much shorter in the left projection compared to the right one, while the distances along the vertical axis are much higher in the left image. Although we are mostly used to the left representation, the right image corresponds to the WSG84, so we will stick to this representation in this chapter.

Using the Neo4j built-in spatial types

If your application only deals with punctual coordinates, then the Neo4j built-in spatial type is perfect for you. It allows you to store locations as points and perform queries based on distance without adding any external dependency.

Creating points

To create a point, we need to provide its latitude and longitude:

RETURN point({latitude: 45, longitude: 3})

The point representation looks like this:

point({srid:4326, x:3, y:45}

You can recognize SRID 4326, meaning the coordinates are given in the GPS projection system. As I have already mentioned previously, the y coordinate holds the latitude, while the longitude is stored to the x coordinate.

We can also use the equivalent syntax by explicitly setting the srid and using the xy notation directly:

RETURN point({srid: 4326, y: 45, x: 3})

Be careful, however. If you omit the srid parameter in the preceding expression, the point will be interpreted as being in a Cartesian projection with SRID 7203. Creating a point in the Cartesian projection can be achieved with the following syntax:

RETURN point({y: 45, x: 3})

The previous query returns a totally different point:

point({srid:7203, x:3, y:45})

If you try to draw this point on a map, it will not be at latitude 45 and longitude 3, since it is not defined using the latitude/longitude coordinate system (srid=4326).

In the rest of this chapter, we will only work with latitude and longitude.

Of course, a point can be attached to a node as a property. We are going to import some data to help demonstrate this. The NYC_POI.csv dataset contains several points of interest in Manhattan (New York City). You can download the file from GitHub (https://github.com/PacktPublishing/Hands-On-Graph-Analytics-with-Neo4j/ch5/data/NYC_POI.csv) and check it has the following structure:

longitude,latitude,name
-73.9829219091849,40.76380762325644,ED SULLIVAN THTR
-73.98375310434135,40.76440130450186,STUDIO 54
-73.96566408280091,40.79823099029733,DOUGLASS II HOUSES BUILDING 13

Each row contains the information about a single point of interest: its name, its latitude, and its longitude. In total, there are 1,076 points of interest.

We can import this data into a new Neo4j graph by using the LOAD CSV statement:

LOAD CSV WITH HEADERS FROM "file:///NYC_POI.csv" AS row
CREATE (n:POI) SET n.name=row.name,
n.latitude=toFloat(row.latitude),
n.longitude=toFloat(row.longitude),
n.point = point({latitude: toFloat(row.latitude),
longitude: toFloat(row.longitude)})

You can check that each created node has a point attribute:

MATCH (n:POI) RETURN n LIMIT 1

The preceding query returns the following JSON result, which includes a point key:

{
  "name": "ST NICHOLAS HOUSES BUILDING 11",
  "point": point({srid:4326, x:-73.9476517312946, y:40.813276483717175}),
  "latitude": "40.813276483717175",
  "longitude": "-73.9476517312946"
}

Now, let's see what we can do with this information.

Querying by distance

The only built-in spatial operation as of Neo4j 4.0 is the distance function, which computes the distance between two points. It can be used in the following way:

RETURN distance(point({latitude: 45, longitude: 3}), point({latitude: 44, longitude: 2}))

According to this function, the distance between these two random points in France is 136,731 meters, that is to say, around 137 km (approximately 85 miles).

The two points in the distance function need to be in the same CRS!
The result is in meters if the two points were in the GPS projection, otherwise the unit depends on the unit of the projection.

Let's use this function in a more realistic example. We are going to use the NYC_POI.csv dataset we imported in the previous section to get the five closest points of interest from a given point, for instance, Times Square, whose GPS coordinates are (40.757961, -73.985553):

MATCH (n:POI)
WITH n, distance(n.point, point({latitude: 40.757961, longitude: -73.985553})) as distance_in_meters
RETURN n.name, round(distance_in_meters)
ORDER BY distance_in_meters
LIMIT 5

The result looks like the following table:

╒══════════════════╤═══════════════════════════╕
│"n.name" │"round(distance_in_meters)"│
╞══════════════════╪═══════════════════════════╡
│"MINSKOFF THEATRE"│34.0 │
├──────────────────┼───────────────────────────┤
│"BEST BUY THEATER"│58.0 │
├──────────────────┼───────────────────────────┤
│"MARQUIS THEATRE" │83.0 │
├──────────────────┼───────────────────────────┤
│"LYCEUM THEATRE" │86.0 │
├──────────────────┼───────────────────────────┤
│"PALACE THEATRE" │132.0 │
└──────────────────┴───────────────────────────┘

In other words, we've found many theaters close to Times Square.

This kind of distance-based query can be used to recommend activities close to the location of the user.

If we want to represent more complex spatial data, such as lines (streets) or polygons (area boundaries), we will have to use a plugin, neo4j-spatial, which we are going to look at in the following section.

Creating a geometry layer in Neo4j with neo4j-spatial

neo4j-spatial is an extension of Neo4j containing tools to represent and manipulate complex spatial data types. In this section, we are going to learn about this plugin by importing data on districts in Manhattan and finding points of interest located within each district.

Introducing the neo4j-spatial library

The neo4j-spatial plugin can be installed by downloading the latest release jar from https://github.com/neo4j-contrib/spatial/releases and copying this jar to the plugins directory of your active graph. You then need to restart the graph for the changes to be taken into account. Once this is done, you can check that the spatial plugin is enabled by calling the spatial.procedures() procedure, listing all available procedures within the plugin:

CALL spatial.procedures()

With this plugin, we will be able to do the following:

  • Import from well-known geographic data formats such as shapefile
  • Use topology operations such as contains or intersects
  • Take advantage of spatial indexing for faster queries

A note on spatial indexes

Spatial indexes, like any other indexes, are a way of running spatial queries much faster by avoiding having to perform the most complex operations on all the entities. For instance, let's consider the intersection query, where we're trying to find the points intersecting a complex polygon, similar to the one illustrated in the following diagram:

The rules of deciding whether points lie within the real shape are quite complex. However, it is straightforward to decide whether the same points are within the rectangle drawn around the complex area: we just have to perform four comparisons:

x1 < x < x2
and y1 < y < y2

Only the points fulfilling this condition will be tested in order to decide whether they belong to the real shape Doing this usually reduces the number of (complex) operations to be performed considerably.

The rectangle drawn around the real shape is called the bounding box of the real shape. It is the smaller rectangle containing the whole shape.

Let's now go back to neo4j-spatial and create our first spatial layer.

Creating a spatial layer of points

neo4j-spatial is able to represent complex geometries, but also points, so that we do not have to switch between using Neo4j built-in and neo4j-spatial for spatial data. In the first example we are going to study now, we will create a spatial layer of points, using the same dataset as the previous section, representing some points of interest within the Manhattan district of New York.

Defining the spatial layer

The first step to managing geometries within neo4j-spatial is to define a layer. A layer contains information about the type of data it stores. To create a simple point layer named pointLayer, we can use the addPointLayer procedure:

CALL spatial.addPointLayer('pointLayer');

You will notice that this procedure adds a new node to the graph. This node is the one containing the information about the layer. To check the existing layers within a graph, you can call the following:

CALL spatial.layers();

The result of the previous statement is copied here:

╒════════════╤══════════════════════════════════════════════════════════════════════╕
│"name" │"signature" │
╞════════════╪══════════════════════════════════════════════════════════════════════╡
│"pointLayer"│"EditableLayer(name='pointLayer', encoder=SimplePointEncoder(x='longit│
│ude', y='latitude', bbox='bbox'))" │
└────────────┴──────────────────────────────────────────────────────────────────────┘

So far, we have only created a layer, or a container for our spatial data, but this container is empty. Let's now add some data to it.

Adding points to a spatial layer

Once the spatial layer is created, we can add data to it, which will also update the spatial index. The following query adds all nodes with the POI label to the newly created spatial layer with the name 'pointLayer':

MATCH (n:POI)
CALL spatial.addNode('pointLayer', n) YIELD node
RETURN count(node)

This operation adds two more properties to the node:

  • point attribute, similar to the built-in Neo4j type
  • bbox attribute used for querying the data with a spatial index, as discussed earlier in this chapter

We will see how to use this data in the following sections. Before that, we are going to deal with other types of geometries, so as to take full advantage of neo4j-spatial compared to built-in Neo4j types.

Defining the type of spatial data

So far, we have only used points, defined as a single couple of x and y coordinates. But when dealing with geometries, we may have to deal with more complex data structures:

  • POINT is a single location.
  • LINESTRING represents lines, an ordered collection of points. It can be used to represent the shape of a street or a river, for instance.
  • POLYGON represents closed areas, such as city and country boundaries, forests, or buildings.

There are standard representations for each of these types of geometries. The one we are going to use in the rest of this section is the Well-Known Text (WKT) representation, which is a human-readable list of coordinates. The following tables give examples of WKT representation for each of the aforementioned three types of geometries:

Type WKT representation
POINT POINT (x y)
LINESTRING LINESTRING (x1 y1, x2 y2, x3 y3, ..., xn yn)
POLYGON POLYGON (x1 y1, x2 y2, x3 y3, ..., xn yn, x1 y1)

 

In the following subsection, we are going to use some polygon geometries representing the districts of Manhattan. After that, we will use linestrings to represent the streets in this borough.

Creating layers with polygon geometries

We are now going to use some Polygon geometries with our Manhattan data. In this section, we will download and import the data into Neo4j.

Getting the data

The dataset we are going to use now was created from data published by the city of New York. You can download it from the GitHub repository associated with this book, under ch5/data/mahattan_district.shp. More information about how this file was created can also be found at the same location.

The data file is in a shapefile format, a format very common among spatial data professionals. It contains the boundaries and name of each community district within Manhattan in New York. The following image shows the districts we are going to work with:

The 11 districts of the borough of Manhattan in New York. Image created using QGIS

Any GIS is able to decode the shapefile format, as is neo4j-spatial. The first step of our analysis with Neo4j is to import the data into our graph.

Creating the layer

With neo4j-spatial, we can load data directly from a shapefile. The procedure is called spatial.importShapefile and accepts a single parameter: the path to the data file. It will automatically create a new spatial layer to hold the data, with a name derived from the input file name, and create a node for each geometry.

The full syntax is as follows:

  1.  First, let's create the layer, with a given name and type:
CALL spatial.addLayer("manhattan_districts")
  1. Import the shapefile data into this newly created layer:
CALL spatial.importShapefile("/<absolute_path_to>/manhattan_districts.shp")

After you run this procedure, your database contains twelve more nodes, one for each district contained in the shapefile. Each of these nodes contains the following information:

{
"CD_ID": 4,
"geometry": {....},
"gtype": 6,
"ID": 1,
"bbox": [
-74.01214813183593,
40.7373608893982,
-73.98142817921234,
40.77317951096692
],
"NAME": "Chelsea, Clinton"
}

I have cut the geometry representation because it is very long, but the geometry attribute is there and non-empty. It also contains a NAME node (taken from shapefile) and the bbox property (added by neo4j-spatial to serve the spatial index).

To summarize, we now have two spatial layers in our graph:

  1. The pointLayer class we created in the previous section to hold the point of interest.
  2. Our newly created manhattan_districts layer, containing the boundaries of twelve partitions of Manhattan.

It is now time to learn how to use these geometries. The following section deals with spatial queries, in other words, how to take full advantage of the fact that we know the exact location and shape of our nodes.

Performing spatial queries

Now that we have data inside our graph, we are going to use it to extract geographical information. More precisely, we will learn how to select nodes that lie within a certain distance from a given point. We will also discover a new functionality: the ability to find nodes contained within a polygon.

Finding the distance between two spatial objects

With neo4j-spatial, we can get nodes within a certain distance from another node. The procedure to perform this operation is called spatial.withinDistance and its signature is the following:

spatial.withinDistance(layerName, coordinates, distanceInKm) => node, distance

This means it will search, within a given layer, all points that are less than distanceInKm away from coordinates. It returns those nodes and the computed distance between the returned node and coordinates.

The coordinates parameter can be an instance of a point or a map of latitude and longitude.

For instance, we can find the point of interest located less than 1 km away from Times Square with this query:

CALL spatial.withinDistance("ny_poi_point", {latitude: 40.757961, longitude: -73.985553}, 1)
YIELD node, distance
RETURN node.name, distance
ORDER BY distance

We could also look for the points of interest that are close to the Marquis Theater. In this case, we would first find the node corresponding to the Marquis Theater, and then use the point property of this node for the spatial query:

MATCH (p:POI {name: "MARQUIS THEATRE"})
CALL spatial.withinDistance("ny_poi_point", p.point, 1)
YIELD node, distance
RETURN node.name, distance
ORDER BY distance

Last but not least, distance queries also work for non-point layers:

MATCH (p:POI {name: "MARQUIS THEATRE"})
CALL spatial.withinDistance("manhattan_districts", p.point, 1)
YIELD node, distance
RETURN node.NAME, round(distance*1000) as distance_in_meters
ORDER BY distance

The preceding query returns a list of districts whose closest distance to the Marquis Theater is less than 1 km:

╒═════════════════════════════╤════════════════════╕
│"node.NAME" │"distance_in_meters"│
╞═════════════════════════════╪════════════════════╡
│"Midtown Business District" │0.0 │
├─────────────────────────────┼────────────────────┤
│"Chelsea, Clinton" │205.0 │
├─────────────────────────────┼────────────────────┤
│"Stuyvesant Town, Turtle Bay"│965.0 │
└─────────────────────────────┴────────────────────┘

We are now experts in writing distance queries. Next, we are going to discover other kinds of queries that can be performed with spatial data.

Finding objects contained within other objects

From the last query we wrote, we can infer that the Marquis Theater is located within the Midtown Business District of Manhattan. But there is an easier way to find this information, thanks to the intersects procedure:

MATCH (p:POI {name: "MARQUIS THEATRE"})
CALL spatial.intersects("manhattan_districts", p) YIELD node as district
RETURN district.NAME

Now, we get only one result – Midtown Business District. We can even use the graph pattern to save this information by adding a relationship, CONTAINED_IN, between the point of interest and the matched district:

MATCH (p:POI {name: "MARQUIS THEATRE"})
CALL spatial.intersects("manhattan_districts", p) YIELD node as district
CREATE (p)-[:CONTAINED_IN]->(district)

In this way, we will have to perform the spatial query only once, when inserting the node or creating the data, and then rely on graph traversals and Cypher only to get the information we need, which may simplify the queries.

We now know how to import, store, and query spatial data within Neo4j. During the last few sections, we talked a lot about distance between points. This is related to the shortest path concept we studied in the previous chapter (Chapter 4, The Graph Data Science Library and Path Finding). In the next two sections, we will use both spatial data and shortest path algorithms to build a routing engine to guide us around New York.

Finding the shortest path based on distance

Spatial data and path finding algorithms are very much related. In this section, we are going to use a dataset representing the road network in New York, neo4j-spatial, and the GDS plugin (see Chapter 4, The Graph Data Science Library and Path Finding) to build a routing system.

The specifications for this routing application are the following:

  • The user will input the start and end locations as (latitude, longitude) tuples.
  • The system must return an ordered list of the streets the user needs to follow in order to go from his start to his end location by traveling the shortest distance.

Let's start by discovering and preparing the data.

Importing the data

In order to build a routing engine, we need a precise description of the road network in our area of interest. Luckily, the street network of New York is available as open data. You can find this file in the GitHub repository for this book, together with more information about its provenance (https://github.com/PacktPublishing/Hands-On-Graph-Analytics-with-Neo4j/ch5/data/manhattan_road_network.graphml.zip). The file format is called GraphML. It is an XML-like file format, with graph-specific entities. Here's a sample of this data file:

<?xml version='1.0' encoding='utf-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://grap
hml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key attr.name="geometry" attr.type="string" for="edge" id="d16" />
<key attr.name="maxspeed" attr.type="string" for="edge" id="d15" />
<key attr.name="length" attr.type="string" for="edge" id="d9" />
<key attr.name="name" attr.type="string" for="edge" id="d8" />
<key attr.name="ref" attr.type="string" for="node" id="d7" />
<key attr.name="osmid" attr.type="string" for="node" id="d5" />
<key attr.name="longitude" attr.type="double" for="node" id="d4" />

<key attr.name="latitude" attr.type="double" for="node" id="d3" />
<key attr.name="streets_per_node" attr.type="string" for="graph" id="d2" />
<key attr.name="name" attr.type="string" for="graph" id="d1" />
<key attr.name="crs" attr.type="string" for="graph" id="d0" />
<graph edgedefault="directed">
<data key="d0">{'init': 'epsg:4326'}</data>
<data key="d1">Manhattan, New York, USA</data>
<node id="42459137">
<data key="d3">40.7755735</data>
<data key="d4">-73.9603796</data>
<data key="d5">42459137</data>
</node>
<node id="1773060099">
<data key="d3">40.7137811</data>
<data key="d4">-73.9980743</data>
<data key="d5">1773060099</data>
</node>
<edge source="42434559" target="1205714910">
<data key="d8">Margaret Corbin Drive</data>
<data key="d16">LINESTRING (-73.9326239 40.8632443, -73.93267090000001 40.8631814, -73.93273120000001 40.8630891, -73.9327701 40.863009, -73.9338518 40.8594721, -73.93399549999999 40.8594143)</data>
<data key="d9">491.145731265</data>
</edge>
</graph>
</graphml>

As you can infer, the first part defines the different keys or attributes that can be associated with the different entities. These attributes have a name, a type, are assigned to either nodes or edges, and have a key identifier. Only these identifiers is used in the rest of the document. For instance, the node with ID 42459137 (the first one in the preceding reproduced list) has d3=40.7137811, and checking the definition of keys, d3 means y or latitude.

 In the previous snippet, I have highlighted the fields we are going to use for this section:

  • For the nodes representing the intersections between streets, we are going to use latitude and longitude fields. osmid will also be used to uniquely identify nodes.
  • For the edges representing the streets themselves, we are mainly going to use name and length. In the final section dedicated to visualization, we will also use the geometry attribute.

It is now time to import this data into our Neo4j graph. This section will not use the data from the beginning of this chapter, so you can import it into a new empty graph.

Preparing the data

Fortunately, someone has already implemented the import procedure to load this data into Neo4j. It is part of the APOC plugin. You can easily install it from Neo4j Desktop from the Plugins tab of the Manage view of your graph.

Importing data

So, after installing APOC and copying the data file into your import folder, you can simply call the apoc.import.graphml procedure in this way:

CALL apoc.import.graphml("manhattan_road_network.graphml", {defaultRelationshipType:"LINKED_TO"})

After a couple of seconds, you will see a result similar to this one:

4,426 nodes (junctions) were inserted and 9,626 relationships of type LINKED_TO representing road segments were created between them.

In order to facilitate the queries in the rest of this chapter, we will assign a label, Junction, to the newly created nodes:

MATCH (n) 
WHERE n.latitude IS NOT NULL AND n.longitude IS NOT NULL
SET n:Junction

Our graph is well imported. You can check a few nodes and relationships. For instance, those associated with Fifth Avenue look like the following diagram:

In the final section of this chapter, we will learn how to represent this data on a map for a more realistic visualization. Before that, let's work on our routing engine.

Creating a spatial layer

Both junctions and streets have spatial attributes. Junctions have latitude and longitude properties and streets (relationships of the LINKED_TO type in our graph) have a geometry property containing the WKT representation of a LINESTRING object. We can create a spatial layer for each of these entities; however, remember that the GDS path finding algorithms work between nodes, not relationships. This means that, from the (latitude, longitude) user input, we will have to find the closest Junction node. So we need to create a spatial point layer to index our 4426 junctions. For now, there is no need to create a layer to hold the streets; we will create it later on if necessary.

Let's then create the point layer that will index the nodes with the Junction label:

CALL spatial.addPointLayer("junctions")

Now, add points to it:

MATCH (n:Junction)
CALL spatial.addNode("junctions", n) YIELD node
RETURN count(node)

After a few seconds, 4,426 nodes are added to the junctions spatial layer.

Let's now enter into the most important part of the exercise: the path finding algorithm itself.

Running the shortest path algorithm

Before running any algorithm from the GDS, we need to create a projected graph. At this stage, we need to be very careful about the entities to include in this graph. For shortest path applications, we need Junction nodes, LINKED_TO relationships, and the length of each road segment, stored in the length property of each relationship. However, this property is a String type, which is not compatible with the addition operation we need to do in shortest path algorithms. For this reason, we are going to create the projected graph using Cypher projection in order to cast the length property in the projected graph:

CALL gds.graph.create.cypher(
"road_network",
"MATCH (n:Junction) RETURN id(n) as id",
"MATCH (n)-[r:LINKED_TO]->(m) RETURN id(n) as source, id(m) as target, toInteger(r.length) as length"
)

Now, we can use the shortest path finding algorithm. Since our nodes do have latitude and longitude properties, we are able to use the A* implement ation in Neo4j, which uses the Haversine formula as heuristics. Let's remind ourselves of the A* algorithm syntax:

CALL gds.alpha.shortestPath.astar.stream(
graphName::STRING,
{
startNode::NODE,
endNode::NODE,
relationshipWeightProperty::STRING,
propertyKeyLat::STRING,
propertyKeyLon::STRING
}
)

In this signature, we have the following:

  • graphName is the name of the projected graph we want our algorithm to run on.
  • startNode is the starting node in the path. We are going to use the closest Junction to a given pair of coordinates, (latStart, lonStart).
  • endNode is the destination node. Again, we will use the closest Junction to a given pair of coordinates, (latEnd, lonEnd).
  • relationshipWeightProperty is the relationship property containing the weight to be applied to each link, if any. In our case, we will use the length of the street as the weight.
  • propertyKeyLat and propertyKeyLon are the names of the properties containing the latitude and longitude of each node. The A* algorithm uses them to infer which way it needs to go for the shortest path.

The first step is to identify the closest junctions. For the given location, let's say Times Square. This can be achieved with the following query:

WITH {latitude: 40.757961, longitude: -73.985553} as p
CALL spatial.withinDistance("junctions", p, 1) YIELD node, distance
RETURN node.osmid
ORDER BY distance LIMIT 1

With this query, we get that the closest junction to Times Square has osmid = 42428297.

Let's set Times Square as our start location. Our destination for the rest of this section is Central Park (South), whose coordinates are (40.767098, -73.978869). Using a similar query to the previous one, we find out that the closest node to the end location has osmid=42435825. This information is summarized in the following table.

Name Latitude Longitude OSM ID Closest Node
Start location Times Square 40.757961 -73.985553 "42428297"
End location Central Park (South) 40.767098 -73.978869

"42423674"

 

With these identifiers (osmid), we can then execute the A* algorithm using the following query:

MATCH (startNode:Junction {osmid: "42428297"})
MATCH (endNode:Junction {osmid: "42423674"})
CALL gds.alpha.shortestPath.astar.stream(
"road_network",
{
startNode: startNode,
endNode: endNode,
relationshipWeightProperty: "length",
propertyKeyLat: "latitude",
propertyKeyLon: "longitude"
}
) YIELD nodeId, cost
WITH gds.util.asNode(nodeId) as node, cost
RETURN node.osmid as osmid, cost

This produces the following result:

╒════════════╤══════╕
│"osmid" │"cost"│
╞════════════╪══════╡
│"42428297" │0.0 │
├────────────┼──────┤
│"42432700" │274.0 │
├────────────┼──────┤
│"42435675" │353.0 │
├────────────┼──────┤
│"42435677" │431.0 │
├────────────┼──────┤
│"42435680" │510.0 │
├────────────┼──────┤
│"42432589" │589.0 │
├────────────┼──────┤
│"42435684" │668.0 │
├────────────┼──────┤
│"42435687" │746.0 │
├────────────┼──────┤
│"42435702" │823.0 │
├────────────┼──────┤
│"42435705" │903.0 │
├────────────┼──────┤
│"42435707" │984.0 │
├────────────┼──────┤
│"42435710" │1062.0│
├────────────┼──────┤
│"42431556" │1140.0│
├────────────┼──────┤
│"42435714" │1226.0│
├────────────┼──────┤
│"42435716" │1312.0│
├────────────┼──────┤
│"1825841700"│1361.0│
├────────────┼──────┤
│"1825841743"│1370.0│
├────────────┼──────┤
│"4557482266"│1393.0│
├────────────┼──────┤
│"4347550074"│1444.0│
├────────────┼──────┤
│"42423674" │1631.0│
└────────────┴──────┘

The last row tells us that the shortest path between Times Square and South Central Park is around 1.6 km long.

We can extract more useful information by getting the name of the street and summing the length of all relationships belonging to the same street. The RETURN statement from our previous query is then replaced by the following code:

WITH gds.util.asNode(nodeId) as node, cost
WITH collect(node) as path, collect(cost) as costs
UNWIND range(0, size(path)-1) AS index
WITH path[index] AS currentNode, path[index+1] AS nextNode
MATCH (currentNode)-[r:LINKED_TO]->(nextNode)
RETURN r.name, sum(toFloat(r.length)) as length

The result of the full query is as follows:

╒════════════════════╤═══════════════╕
│"r.name" │"length" │
╞════════════════════╪═══════════════╡
│"West 45th Street" │274.598479979 │
├────────────────────┼───────────────┤
│"8th Avenue" │1095.3842483859│
├────────────────────┼───────────────┤
│"Columbus Circle" │33.37408896743 │
├────────────────────┼───────────────┤
│"Central Park South"│238.7529348628 │
└────────────────────┴───────────────┘

This means that, starting from Times Square, I will have to walk 275 meters along West 45th Street, then turn onto 8th Avenue and follow it for 1 km. Arriving at Columbus Circle, I will have to walk along Central Park South for 240 meters before reaching my destination. Let's check this result on a map:

As you can see, the shortest path from the starting point (bottom triange) to the destination point (top triangle, close to Center Park), is using 7th Avenue. However, this is a one-way street, and cars are not allowed to use it in the south-north direction. If we were driving, the preceding route would be correct. But let's assume we are walking and we do not want to make such a detour. With the GDS plugin, this means we have to create another projected graph, road_network_undirected, where the relationships direction will be ignored. We can do this in the following way:

CALL gds.graph.create.cypher(
"road_network_undirected",
"MATCH (n:Junction) RETURN id(n) as id",
"MATCH (n)-[r:LINKED_TO]-(m) RETURN id(n) as source, id(m) as target, toInteger(r.length) as length"
)

The difference compared to the road_network graph is the query used to define the relationships:

  • In road_network, we used directed edges from (n) to (m).

 

MATCH (n)-[r:LINKED_TO]->(m)
  • On the other hand, in road_network_undirected, we used undirected edges simply by removing the > symbol in Cypher:
MATCH (n)-[r:LINKED_TO]-(m)

Running the same path finding algorithm on road_network_undirected, we get the following result:

╒════════════╤═══════════════╕
│"r.name" │"length" │
╞════════════╪═══════════════╡
│"7th Avenue"│1130.5023049762│
└────────────┴───────────────┘
Remember to remove the > symbol in the query producing the result as well.

We reduced the route by 500 meters, which is a long way on foot!

Thanks to spatial data and the A* path finding algorithm from the Neo4j GDS plugin, we have built a fully functional routing system. It even works for cars (with the directed projected graph) and pedestrians (with the undirected projected graph).

For the routing system to be fully trusted by pedestrians, we'll need to exclude motorways and unsuitable roads for walkers.

In the following section, we are going to talk about spatial data visualization when it is stored inside a Neo4j graph.

Visualizing spatial data with Neo4j

Spatial data experts have developed several tools for visualizing and manually updating geometries. ArgQIS and QGIS desktop applications allow us to load data from various data sources, such as shapefile, create edit geometries, and perform operations such as distance calculations.

In this section, we are going to investigate two solutions compatible with Neo4j. The first one is an application that can be installed to Neo4j Desktop and that allows us to visualize nodes. The second one uses the Neo4j JavaScript driver and leaflet, a JavaScript library done especially to visualize maps in order to start creating a web application to visualize the shortest path between two points.

neomap – a Neo4j Desktop application for spatial data

neomap is an open source application available at https://github.com/stellasia/neomap/. It can be added to Neo4j Desktop and connected to the active graph when you start it. It allows you to visualize nodes with spatial attributes (latitude and longitude). We will see two applications in this section.

Disclaimer: I am the author of this application, and as far as I know at the time of writing, there is no equivalent.

Visualizing nodes with simple layers

There are two ways to create a layer in neomap. The first and simplest one is to select the node label(s) you want to fetch and to set the name of the properties holding the latitude and longitude properties. The following screenshot shows the simple layer configuration needed to display the Junctions node from our Manhattan street network graph:

When needed, you can turn the markers into a heatmap by changing the map rendering configuration.

Visualizing paths with advanced layer

The advanced layer mode allows the user to more precisely select the nodes to be displayed. For instance, we may want to visualize only the nodes involved in the shortest path from Times Square to Central Park South. In that case, we will have to write the Cypher query to match the nodes, and return at least two elements: the latitude and longitude of the nodes.

You can see the configuration used to display the nodes belonging to our shortest path in the following screenshot:

This application helps in visualizing data for data analysis and understanding. However, if we want to visualize spatial attributes in a more interactive way in web applications, we will have to investigate other methods. The following section will demonstrate how to use two JavaScript libraries, the Neo4j JavaSscript driver, and leaflet, to visualize the same path.

In the following section, we will go deeper into the Neo4j JavaScript driver to build stand-alone web pages able to interact with Neo4j.

Using the JavaScript Neo4j driver to visualize shortest paths

Neo4j officially provides a driver for JavaScript. We are going to use this driver to create a small web application to visualize the shortest paths between two junctions. The full code is available at https://github.com/PacktPublishing/Hands-On-Graph-Analytics-with-Neo4j/blob/master/ch5/shortest_path_visualization.html and you just have to open it with your favorite web browser to see the result (all dependencies are included in the file).

Neo4j JS driver

The first step is to set the connection parameters to Neo4j:

     var driver = neo4j.driver(
// update the next two lines depending on your configuration
'bolt://127.0.0.1:7687',
neo4j.auth.basic('user', 'password')
);
var session = driver.session();

You can find the bolt port in the active graph management window as illustrated in the following screenshot:

Leaflet and GeoJSON

Leaflet is an open source JavaScript library used to visualize maps. Like many map visualization systems, it works with superimposed layers. Each layer can be a different type. In this section, we will only use two of them:

  • The Tile layer displays square PNG images. Depending on the layer, it can contain streets, toponymy, or even points of interest. In our case, we will only use the default Open Street Map tile layer.
  • The GeoJSON layer displays data with the GeoJSON format. So far, we have only seen geometries stored as WKT. GeoJSON is another human-readable representation of geometries. The following code shows the WKT and GeoJSON representations of the same shape:
WKT:
LINESTRING(-73.9879285 40.7597869,-73.9878847 40.759847)

GeoJSON:
{"type":"LineString","coordinates":[[-73.9879285,40.7597869],[-73.9878847,40.759847]]}

Let's get started and build our map. Creating a leaflet map is as simple as executing the following command:

var map = L.map('mapid').setView([40.757965, -73.985561], 7);

The mapid parameter is the ID of the HTML element the map will be drawn in. This means that our document object model (DOM) needs to contain the following:

<div id="mapid"></div>

The parameters in setView are the coordinates of the map center and the initial zoom level, respectively. As we will see later, these are not really of any importance for us.

Once the map is created, we can start adding layers to it. The first layer we are going to add is the OSM tiles, in order to work out where we are:

L.tileLayer(
'https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
maxZoom: 19,
attribution: '&copy; <a href="https://openstreetmap.org/copyright">OpenStreetMap contributors</a>'
}).addTo(map);

Then, we can start querying Neo4j and adding the geometry to the map. To do so, we will use the same query we wrote earlier in this chapter to find the shortest path between two junctions. The osmid of the junction node will be a parameter of our query, so the beginning of the query has to be written as follows:

MATCH (startNode:Junction {osmid: { startNodeOsmId }})
MATCH (endNode:Junction {osmid: { endNodeOsmId }})

We are going to display the street geometries, which are contained in the geometry parameter of the relationships, as WKT format, so let's return this information at the end of our query like this:

RETURN r.name, r.geometry as wkt

We can define our query variable as follows:

var query = `
MATCH (startNode:Junction {osmid: { startNodeOsmId }})
MATCH (endNode:Junction {osmid: { endNodeOsmId }})
// [...] same query than in previous section
RETURN r.name, r.geometry as wkt
`;

The parameters of our query are as follows:

var params = {
"startNodeOsmId": "42428297",
"endNodeOsmId": "42423674"
};

Finally, we can query Neo4j and add the result to the map. This is done in two steps:

  1. Send a query to Neo4j, read the result, and parse WKT to GeoJSON in a format understood by leaflet:
session
.run(query, params)
.then(function(result){
let results = [];
result.records.forEach(function(record) {
let wkt = record.get("wkt");
// we need to filter out records
// for which we do not have any geometry information
if (wkt !== null) {
// parse result from WKT to GeoJSON
let geo = wellknown.parse(wkt);
results.push(geo);
}
});
return results;
})
  1. Once we have a list of GeoJSON objects, we can create the new layer:
.then(function(result) {
var myLayer = L.geoJSON(
result,
{
"weight": 5,
"color": "red",
}
).addTo(map);
map.fitBounds(myLayer.getBounds());
});

We need to set some some styling properties for this new layer in order for it to be more visible. Finally, with the fitBounds operation, we tell leaflet to automatically find the correct viewport so that our path is fully visible. 

If you open the full HTML file with your browser, after updating your connection parameters, you should see a map similar to the one reproduced in the preceding section. It looks pretty nice, except for the fact that some segments are missing from the path. The reason is that some edges in our graph do not have a geometry property. Since the streets in this area are quite straight, we just need the position of the first and last point of the segment to have the geometry displayed nicely. This means we can manually update the geometry of the road segment by using the position of its starting and ending nodes, like this:

MATCH (currentNode)-[r:LINKED_TO]->(nextNode) 
WHERE r.geometry is null
SET r.geometry = 'LINESTRING (' + currentNode.longitude + ' ' + currentNode.latitude + ', ' + nextNode.longitude + ' ' + nextNode.latitude + ')'

If you reload the shortest_path_visualization.html page once this update is complete, you will see a fully colored path, as illustrated on the following map:

You can also check the result by running the algorithm on the undirected projected graph, or even choosing different start and end nodes.

This first visualization example can, of course, be improved. We'll touch on some ideas for improvement in the Questions section of this chapter, and we will learn more interesting techniques later in this book, especially GraphQL, to prevent exposing the query in the JavaScript code (see Chapter 11, Using Neo4j in Your Web Application).

Summary

In this chapter, you have learned how spatial data is stored in information systems, and how to use it with Neo4j. You now know how to use the Neo4j built-in point type and perform distance queries using it. You have also learned about the neo4j-spatial plugin, allowing more complex operations such as geometry intersection and within distance queries.

Finally, you have learned how to build a graph-based routing engine using spatial data and the shortest path algorithms implemented in the GDS library. With the help of some JavaScript code, you have even been able to nicely display the result of the path finding algorithm on a map.

In the next chapter, you will discover a new type of algorithm: centrality algorithms. They are used to quantify the node importance, depending on your definition of importance.

Questions

  1. Spatial data:
    1. Add a new point of interest in New York and create a Neo4j relationship between it and the district it belongs to.
    2. Write the query to find the closest street to a given point (latitude, longitude).
  2. Routing engine:
    1. Modify the shortest path algorithm to find the shortest path in terms of duration instead of distance.
    2. Improve the routing engine for pedestrians by excluding motorways from the possible streets.
    3. How would you find alternative paths?
  3. Visualization:
    1. Modify the web page we created to let the user choose the start and end nodes.
    2. Modify the web page we created to let the user choose the start and end latitude and longitude. The script will have to find the OSM ID of the start and end nodes in order to display the shortest path between them.

Further reading

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.178.9