Time for action – creating a repository of cities

Our work is then very simple. We have a city name in our index, but we don't have its coordinates. We can query the Nominatim OpenStreetMap API to obtain its coordinates (if present) and some extra metadata. The procedure for this is as follows:

  1. I decided that you need not worry about retrieving that data yourself, so I put a very simple Scala script called import_cities_osm.sh in the /SolrStarterBook/test/chp05/cities directory. The script searches for the list of cities in your running paintings core, asks Nominatim for matches, and imports any matches it finds into a new core, which we will simply call cities. Obviously this will produce some extra noise, but that's fine while we are still experimenting.
  2. Before running the script, we have to define the new cities core. Its solrconfig.xml is nothing new (a core with the usual capabilities), while the schema will be really minimal, as shown in the following code:
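    <schema name="cities" version="1.1">
      <types>
        <fieldType name="string" class="solr.StrField" />
        <fieldType name="long" class="solr.TrieLongField" />
        <!-- numeric type, so that range queries on lat and lon compare numbers rather than strings -->
        <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" />
        <!-- LatLonType indexes each point into two numeric subfields whose names end with _lat_lon -->
        <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_lat_lon" />
        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
          <analyzer>
            <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
          </analyzer>
        </fieldType>
      </types>
    
      <fields>
        <field name="id" type="long" required="true" />
        <field name="city" type="text" required="true" indexed="true" stored="true" />
        <!-- the coordinates as two separate numeric fields -->
        <field name="lat" type="tdouble" indexed="true" stored="true" />
        <field name="lon" type="tdouble" indexed="true" stored="true" />
        <!-- the same coordinates as a single spatial field -->
        <field name="coordinates" type="location" indexed="true" stored="true" />
        <dynamicField name="*_lat_lon" type="tdouble" indexed="true" stored="false" />
        <dynamicField name="*_coordinates" type="location" indexed="true" stored="true" />
        <dynamicField name="*" type="string" multiValued="false" indexed="true" stored="true" />
      </fields>
      <uniqueKey>id</uniqueKey>
    </schema>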
  3. When we have defined our new core, we can restart the Solr instance and run the following script:
    >> ./import_cities_osm.sh
    
  4. After a while, we will have our small repository of city information. To verify it, we can simply perform a filtered range query over the latitude and longitude fields (the -g option stops curl from interpreting the square brackets as a URL pattern):
    >> curl -g -X GET 'http://localhost:8983/solr/cities/select?q=*:*&wt=csv&fq=lat:[46.00+TO+50.00]+AND+lon:[2.00+TO+3.00]'
    
  5. The results show that we now have information about Paris and other places.
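To check the schema before (or without) running the import script, you can index one city by hand. The following is a minimal sketch, assuming the field names used by the queries in this section (id, city, lat, lon, coordinates); the city_doc helper is hypothetical and not part of the book's scripts, and the id value is arbitrary. Note that the same point is written twice: as two numbers, and as the single "lat,lon" string that a LatLonType field expects.

```shell
# Hypothetical helper: print one city as a Solr JSON document.
# Arguments: id, city name, latitude, longitude (decimal degrees).
# No JSON escaping is done, so the city name must not contain double quotes.
city_doc() {
  # The point is stored twice: as separate lat/lon numbers and as the
  # "lat,lon" string expected by the LatLonType field named coordinates.
  printf '{"id":%s,"city":"%s","lat":%s,"lon":%s,"coordinates":"%s,%s"}\n' \
    "$1" "$2" "$3" "$4" "$3" "$4"
}

city_doc 1 Paris 48.8565056 2.3521334
```

With Solr running, the document can be wrapped in a JSON array and posted to the update handler of the cities core, for example: curl 'http://localhost:8983/solr/cities/update?commit=true' -H 'Content-Type: application/json' --data-binary "[$(city_doc 1 Paris 48.8565056 2.3521334)]"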

What just happened?

The only new part in this small experiment was the introduction of a new type of field, designed specifically for spatial search, as described at: http://wiki.apache.org/solr/SpatialSearch

Note that we are using a filter query (we will see it again in the next chapters), in which we define two separate range queries, one on the latitude (lat) field and one on the longitude (lon) field. We saved the coordinate values both as two separate fields and as a single field, which Solr handles internally by splitting it into two numeric parts optimized for this kind of search. If you look carefully at the schema, you will see that I have kept both solutions in its fields and field types, so you can test either of them as you like.
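The two solutions can be compared side by side. The following hypothetical helper (a sketch, not part of the book's scripts) takes a box as min_lat min_lon max_lat max_lon and prints both filters: the pair of independent range queries on lat and lon, and the single range query on the coordinates field. For a box that does not cross the date line, the two should match the same cities.

```shell
# Hypothetical helper: print the two equivalent filter queries for a box
# given as min_lat min_lon max_lat max_lon.
box_filters() {
  # 1) two independent range queries on the separate numeric fields
  printf 'lat:[%s TO %s] AND lon:[%s TO %s]\n' "$1" "$3" "$2" "$4"
  # 2) one range query on the LatLonType field: [lat,lon TO lat,lon]
  printf 'coordinates:[%s,%s TO %s,%s]\n' "$1" "$2" "$3" "$4"
}

box_filters 46.00 2.00 50.00 3.00
```

Either line can be passed to Solr as an fq parameter; letting curl do the URL encoding avoids problems with spaces and brackets, for example: curl -G 'http://localhost:8983/solr/cities/select' --data-urlencode 'q=*:*' --data-urlencode 'wt=csv' --data-urlencode "fq=$(box_filters 46.00 2.00 50.00 3.00 | sed -n 2p)"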

Playing more with spatial search

Thanks to this new type of field, we can perform more or less the same search as a bounding box search, in order to capture all the cities in a specific area:

>> curl -g -X GET 'http://localhost:8983/solr/cities/select?q=*:*&fq=coordinates:[46.00,2.00+TO+50.00,3.00]&wt=csv'

But we can also perform queries based on a given distance radius (for example, 190 km) around a central point:

>> curl -g -X GET 'http://localhost:8983/solr/cities/select?q=*:*&fl=city,display_name&fq={!bbox}&sfield=coordinates&pt=48.8565056,2.3521334&d=190&spatial=true&wt=json&indent=true'

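In the latter case, my dataset again returned Paris, along with two other cities in France (Le Liège and Rueil-Malmaison) that are within the radius. Strictly speaking, {!bbox} matches the bounding box that encloses the circle, so it can also return a few points slightly farther than d; the {!geofilt} filter, which takes the same sfield, pt, and d parameters, applies the exact radius. This kind of search is quite useful in the back end of an HTML widget that lets users define searches, for example, with a slider.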
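The difference between a radius and its bounding box is easy to check by hand. The following sketch assumes a spherical Earth with a radius of 6371 km: circle_bbox computes the rectangle enclosing a circle of radius d km around a point, and haversine_km computes the great-circle distance between two points. Both helper names are hypothetical; Solr uses its own Earth radius and also handles the poles and the date line, which this sketch ignores, so its numbers may differ slightly.

```shell
# Hypothetical helpers, assuming a spherical Earth with radius 6371 km.

# Great-circle distance in km between (lat1, lon1) and (lat2, lon2), haversine formula.
haversine_km() {
  awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
    r = 6371; rad = atan2(0, -1) / 180        # atan2(0, -1) is pi
    dlat = (lat2 - lat1) * rad; dlon = (lon2 - lon1) * rad
    a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
    printf "%.2f\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}

# Rectangle enclosing the circle of radius d km around (lat, lon),
# printed as: min_lat min_lon max_lat max_lon (poles and date line ignored).
circle_bbox() {
  awk -v lat="$1" -v lon="$2" -v d="$3" 'BEGIN {
    r = 6371; rad = atan2(0, -1) / 180
    delta = d / r                             # angular radius in radians
    dlat = delta / rad                        # latitude half-height in degrees
    x = sin(delta) / cos(lat * rad)
    dlon = atan2(x, sqrt(1 - x * x)) / rad    # asin(x): longitude half-width in degrees
    printf "%.3f %.3f %.3f %.3f\n", lat - dlat, lon - dlon, lat + dlat, lon + dlon
  }'
}

circle_bbox 48.8565056 2.3521334 190
```

The rectangle is wider in longitude than in latitude because meridians converge away from the equator. Points near its corners lie inside the box but outside the circle, and haversine_km can tell whether a returned city really lies within 190 km of the center.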

Looking at the new Solr 4 spatial features – from points to polygons

In the previous example, we saw how Solr manages coordinates with basic distance and spatial functions. These functions were introduced to offer a more structured approach: instead of using two distinct floating-point values for latitude and longitude, we can handle them together in a single type, which internally stores the values separately so that range queries and other operations can be run with a syntax designed for geolocation.

Since Lucene 4 introduced a new spatial module, Solr 4 adds new spatial features based on it. These components are changing fast, so I suggest you follow the updates on the page http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4.

Beyond the Solr 3 spatial points handled with LatLonType, it's now possible to index the following shapes:

  • Points (latitude, longitude): This has the same syntax as LatLonType.
  • Rectangles: This is used with syntax minX, minY, maxX, and maxY (the same used for bounding box queries).
  • Circles: This is used with syntax Circle(latX, lonX, d=distance). Note that this makes a circle (a point plus a distance) an indexable object, whereas in Solr 3 the distance functions were only available as query parameters.
  • Polygons, LineString, and other shapes: These are written using the Well Known Text (WKT) syntax, for example, POLYGON ((lon1 lat1, lon2 lat2, lon3 lat3, …)). WKT writes each vertex as X Y, that is, longitude before latitude.

The basic idea, as expected, is to build complex objects (and shapes) on top of the simplest ones.
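WKT polygons are easy to get wrong when written by hand: coordinates inside a vertex are separated by a space, vertices are separated by commas, and the ring must be closed by repeating the first vertex at the end. The following hypothetical helper (a sketch for experimenting, not part of Solr) builds such a string from a list of "lon lat" pairs.

```shell
# Hypothetical helper: build a WKT polygon from "lon lat" vertex arguments,
# closing the ring by repeating the first vertex at the end.
wkt_polygon() {
  [ "$#" -ge 3 ] || { echo "a polygon needs at least 3 vertices" >&2; return 1; }
  first="$1"
  ring=""
  for v in "$@"; do
    ring="${ring}${ring:+, }$v"   # add ", " before every vertex except the first
  done
  printf 'POLYGON ((%s, %s))\n' "$ring" "$first"
}

# The box used in the earlier queries, as a polygon (longitude first).
wkt_polygon "2.0 46.0" "3.0 46.0" "3.0 50.0" "2.0 50.0"
```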

Note

Well Known Text is a text markup format originally defined by the Open Geospatial Consortium (OGC), and since adopted by many databases and systems with spatial support, for example, PostGIS, GeoServer, Oracle, and others. WKT can represent various kinds of spatial objects, from points to complex polyhedral surfaces. The Wikipedia page http://en.wikipedia.org/wiki/Well-Known_Text also contains some examples and references to the standards.

In order to properly handle polygons in queries, new spatial predicates have been introduced: Contains, Intersects, IsWithin, and IsDisjointTo.
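In Solr 4 the predicate and the shape are given together as a quoted query on a spatial field, in the form field:"Predicate(shape)". A minimal sketch of a helper that builds such a filter, assuming a hypothetical field called geo declared with one of the new Lucene 4 spatial field types (the cities schema shown earlier does not define one); the spatial_fq name is also hypothetical:

```shell
# Hypothetical helper: print a filter query applying a spatial predicate
# to a shape (for example, a WKT string) on the given field.
# Arguments: field name, predicate, shape.
spatial_fq() {
  case "$2" in
    Contains|Intersects|IsWithin|IsDisjointTo) ;;   # the four predicates listed above
    *) echo "unknown predicate: $2" >&2; return 1 ;;
  esac
  printf '%s:"%s(%s)"\n' "$1" "$2" "$3"
}

spatial_fq geo IsWithin 'POLYGON ((2.0 46.0, 3.0 46.0, 3.0 50.0, 2.0 50.0, 2.0 46.0))'
```

The printed string is meant to be sent as an fq parameter, URL-encoded (for example, with curl's --data-urlencode option), since it contains spaces, quotes, and parentheses.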

The new components extend the Lucene 4 spatial module with features from the Spatial4j and JTS (Java Topology Suite) libraries. At the moment there are still some minor problems with dependencies, so if you want to test these features yourself, you have to do a little hacking of the Solr application: add the libraries either to the standard WAR file in SOLR_DIST/example/webapps/solr.war or to the expanded application in SOLR_DIST/example/solr-webapp/webapp/WEB-INF/lib, remembering that if you change the WAR file, Jetty will also update the expanded folder. Another option is to use Maven; in this case, take a look at the sample application on GitHub, https://github.com/ryantxu/spatial-solr-sandbox, which can be used to test these features.

The most notable addition among these new features is the PrefixTree type. This is basically a structured type that represents the world as a grid and internally decomposes each grid cell recursively into finer-grained cells. This fractal-like approach is interesting because it can also be used to store other kinds of content (not strictly related to spatial data) while still applying the topology functions to them.
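Geohash is a well-known example of such a recursive grid (the Lucene 4 spatial module provides both a geohash-based and a quad-tree-based prefix tree). Each step halves the current cell, alternating between longitude and latitude; every five halvings become one base-32 character, so each extra character names a smaller cell inside the previous one, and points sharing a prefix lie in the same cell. The following is a minimal sketch of the encoding in awk, for illustration only; it is not Solr's implementation, and the geohash function name is hypothetical.

```shell
# Encode a point as a geohash of the given length (number of characters).
# Each bit halves the current cell: bits alternate longitude, latitude, ...
geohash() {
  awk -v lat="$1" -v lon="$2" -v len="$3" 'BEGIN {
    lat += 0; lon += 0; len += 0              # force numeric values
    alphabet = "0123456789bcdefghjkmnpqrstuvwxyz"
    latmin = -90; latmax = 90; lonmin = -180; lonmax = 180
    hash = ""; even = 1; bits = 0; ch = 0
    while (length(hash) < len) {
      if (even) {                             # split the cell in longitude
        mid = (lonmin + lonmax) / 2
        if (lon >= mid) { ch = ch * 2 + 1; lonmin = mid } else { ch = ch * 2; lonmax = mid }
      } else {                                # split the cell in latitude
        mid = (latmin + latmax) / 2
        if (lat >= mid) { ch = ch * 2 + 1; latmin = mid } else { ch = ch * 2; latmax = mid }
      }
      even = !even
      if (++bits == 5) {                      # five bits make one base-32 character
        hash = hash substr(alphabet, ch + 1, 1)
        bits = 0; ch = 0
      }
    }
    print hash
  }'
}

geohash 57.64911 10.40744 11
```

For this sample point the sketch prints u4pruydqqvj; asking for only 5 characters gives u4pru, a larger cell that contains the smaller one.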
