Using Solr for indexing

As mentioned several times in this book, there is already a very nice search plugin based on Apache Lucene. Its primary drawback is that it does not scale, because it searches indexes stored on the local file system. In a multi-node setup you would have to make sure that every change to an entity is propagated to all nodes and that all indexes are updated, which would be very cumbersome. There is already a solution for this problem: Apache Solr, which is essentially an HTTP interface around Lucene. It actually offers far more functionality than that, but for this example that description is sufficient. Using Apache Solr moves the index from a single Play instance to a real search engine, which can then be queried and updated by any Play instance.

This recipe will create a completely new plugin, which will use Solr for searching. This means that every change of an entity has to be forwarded to the search index. The search itself will also always be handled by Solr.

The source code of the example is available at examples/chapter6/search-solr.

Getting ready

As usual we will write a test to show off what the Solr support should look like. This test should be put inside an application, not inside of the plugin which is created throughout this recipe:

public class SolrSearchTest extends UnitTest {

   CommonsHttpSolrServer server;
   
   @Before
   public void setup() throws Exception {
      Fixtures.deleteAllModels();

      server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      server.setRequestWriter(new BinaryRequestWriter());
      clearSolrServerIndex();

      Fixtures.loadModels("test-data.yml");
   }

   private void clearSolrServerIndex() throws Exception {
      server.deleteByQuery( "*:*" );
      server.commit();
   }

   @Test
   public void testEnhancedSearchCapability() {
      assertEquals(1, Car.search("byBrandAndType", "BMW", "320").fetchIds().size());

      List<User> users = User.search("byNameAndTwitter", "a*ex", "spinscale*").fetch();
      User user = users.get(0);
      User u1 = User.find("byName", "alex").first();
      assertEquals(user.id, u1.id);
   }
}

If you ignore the setup and take a look at the test method itself, you will see the Model.search() method, which looks pretty similar to the Model.find() method. Its basic function is the same, except that it always queries the Solr index instead of the database. There are also two methods to access the data returned by the Solr query: Model.search().fetchIds() and Model.search().fetch(). The first fetches only the IDs from the index, allowing you to load the data from the database manually, whereas the second issues the database query automatically.

You also need a working Solr instance in order to follow this example. You can download the latest Solr release from http://lucene.apache.org/solr/ and start the bundled example application, which is sufficient for this recipe. Any changes to the Solr configuration files will be documented in the next section. After downloading and extracting the archive, start Solr with:

cd apache-solr-3.1.0/example
java -jar start.jar

You should now be able to go to http://localhost:8983/solr/ and see a Welcome to Solr message.

How to do it...

After creating a new module via play new-module solr, you need to get the dependencies right. As several dependencies overlap with versions shipped with Play, you have to carefully define which files should be downloaded and which should not. Create the following dependencies.yml file and run play dependencies:

self: play -> solr 0.1

require:
    - org.apache.solr -> solr-solrj 3.1.0:
        transitive: false
    - commons-httpclient -> commons-httpclient 3.1:
        transitive: false
    - org.codehaus.woodstox -> wstx-asl 3.2.7
    - stax -> stax-api 1.0.1

The next step is to put the module together. Do not forget to create an appropriate play.plugins file. The plugin needs only four classes. The first is SearchModel, which extends the standard Model class. Put all classes into the play.modules.solr package:
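A play.plugins file for this module might look like the following one-liner; the priority value (1000 here is a hypothetical choice) determines the order in which Play loads the plugin relative to others:

```
1000:play.modules.solr.SolrPlugin
```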

public class SearchModel extends Model {

   public static Query search(String query, String ... values) {
        throw new UnsupportedOperationException("Check your configuration. Bytecode enhancement did not happen");
   }
   
   protected static Query search(Class clazz, String query, String ... values) {
      StringBuilder sb = new StringBuilder();
      if (query.startsWith("by")) {
         query = query.replaceAll("^by", "");
      }
      String fieldNames[] = query.split("And");
      
      for (int i = 0 ; i < fieldNames.length; i++) {
         String fieldStr = fieldNames[i];
         String value = values[i];
         
         String fieldName = StringUtils.uncapitalize(fieldStr);
         String solrFieldName = getSolrFieldName(fieldName, clazz);
         
         sb.append(solrFieldName);
         sb.append(":");
         sb.append(value);
         
         if (i < fieldNames.length-1) {
            sb.append(" AND ");
         }
      }
      
      return new Query(sb.toString(), clazz);
   }
   
   private static String getSolrFieldName(String fieldName, Class clazz) {
      try {
         java.lang.reflect.Field field = clazz.getField(fieldName);
         Field annot = field.getAnnotation(Field.class);
         if (annot != null && !annot.value().equals("#default")) {
            return annot.value();
         }
      } catch (Exception e) {
         e.printStackTrace();
      }
      return fieldName;
   }
}
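To make the translation concrete, here is a minimal standalone sketch of the query-string construction performed by search() above. It omits the @Field annotation lookup (so twitter would not be mapped to tw_s here), and the field values are taken from the test for illustration:

```java
public class QueryStringSketch {

    // Mirrors SearchModel.search(): strips the "by" prefix, splits the
    // camel-cased field names on "And", and joins field:value pairs with " AND "
    static String buildQuery(String query, String... values) {
        if (query.startsWith("by")) {
            query = query.replaceAll("^by", "");
        }
        String[] fieldNames = query.split("And");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fieldNames.length; i++) {
            // uncapitalize the field name, as StringUtils.uncapitalize() does
            String field = Character.toLowerCase(fieldNames[i].charAt(0))
                    + fieldNames[i].substring(1);
            sb.append(field).append(":").append(values[i]);
            if (i < fieldNames.length - 1) {
                sb.append(" AND ");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints name:a*ex AND twitter:spinscale*
        System.out.println(buildQuery("byNameAndTwitter", "a*ex", "spinscale*"));
    }
}
```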

The main part of this class is the static search() method, which was already used in the test. Its body is replaced via bytecode enhancement, so a bytecode enhancer is needed next:

public class SolrEnhancer extends Enhancer {

   public void enhanceThisClass(ApplicationClass applicationClass) throws Exception {
        CtClass ctClass = makeClass(applicationClass);

        if (!ctClass.subtypeOf(classPool.get("play.modules.solr.SearchModel"))) {
            return;
        }

        String method = "public static play.modules.solr.Query search(String query, String[] values) { return search("+applicationClass.name+".class, query, values); }";
        CtMethod count = CtMethod.make(method, ctClass);
        ctClass.addMethod(count);
        
        // Done.
        applicationClass.enhancedByteCode = ctClass.toBytecode();
        ctClass.defrost();
        
        Logger.info("Enhanced search of %s", applicationClass.name);
   }
}

The enhancer replaces the empty static search() method with an invocation of the second search() method, supplying the concrete entity class as a parameter during enhancement. As this method does not return entity objects but a Query object, this result class has to be defined as well. The Query class issues the actual query to the Solr server and handles the response:

public class Query {

   private SolrQuery query;
   private SolrServer server;
   private Class clazz;

   public <T extends Model> Query(String queryString, Class<T> clazz) {
      query = new SolrQuery();
      query.setFilterQueries("searchClass:" + clazz.getName());
      query.setQuery(queryString);
      this.server = SolrPlugin.getSearchServer();
      this.clazz = clazz;
   }
   
   public Query limit(int limit) {
      query.setRows(limit);
      return this;
   }
   
   public Query start(int start) {
      query.setStart(start);
      return this;
   }
   
   public List<String> fetchIds() {
      query.setFields("id");
      SolrDocumentList results = getResponse();
      List<String> ids = new ArrayList(results.size());
      for (SolrDocument doc : results) {
         String entityId = doc.getFieldValue("id").toString().split(":")[1];
         ids.add(entityId);
      }
      
      return ids;
   }
   
   public <T extends Model> List<T> fetch() {
      List<T> result = new ArrayList<T>();
      
      List<String> ids = fetchIds();
      for (String id : ids) {
         Object objectId = getIdValueFromIndex(clazz, id);
         result.add((T) JPA.em().find(clazz, objectId));
      }

      return result;
   }
   
   private SolrDocumentList getResponse() {
      try {
         QueryResponse rp = server.query(query);
         return rp.getResults();
      } catch (SolrServerException e) {
         Logger.error(e, "Error on solr query: %s", e.getMessage());
      }
      
      return new SolrDocumentList();
   }
   
    private Object getIdValueFromIndex(Class<?> clazz, String indexValue) {
        java.lang.reflect.Field field = getIdField(clazz);
        Class<?> parameter = field.getType();
        try {
            return Binder.directBind(indexValue, parameter);
        } catch (Exception e) {
            throw new UnexpectedException("Could not convert the ID from index to corresponding type", e);
        }
    }

    private java.lang.reflect.Field getIdField(Class<?> clazz) {
        for (java.lang.reflect.Field field : clazz.getFields()) {
            if (field.getAnnotation(Id.class) != null) {
                return field;
            }
        }
        throw new RuntimeException("Your class " + clazz.getName()  + " is annotated with javax.persistence.Id but the field Id was not found");
    }
}
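The composite document ID used throughout the plugin deserves a closer look. The following standalone sketch shows how it is built in SolrPlugin.onEvent() and split apart again in fetchIds(); the class name and ID value are hypothetical examples:

```java
public class DocumentIdSketch {

    // SolrPlugin builds the Solr document ID as "<entity class name>:<entity id>"
    static String buildDocumentId(String className, Object entityId) {
        return className + ":" + entityId;
    }

    // Query.fetchIds() recovers the entity ID by splitting on the colon
    static String extractEntityId(String documentId) {
        return documentId.split(":")[1];
    }

    public static void main(String[] args) {
        String docId = buildDocumentId("models.User", 1L);
        System.out.println(docId);                  // models.User:1
        System.out.println(extractEntityId(docId)); // 1
    }
}
```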

This class issues the query, gets the result, and performs database lookups for the returned IDs, if necessary.

The plugin class which invokes the bytecode enhancer on startup is the central piece of the following plugin:

public class SolrPlugin extends PlayPlugin {

   private SolrEnhancer enhancer = new SolrEnhancer();
   
    public void enhance(ApplicationClass applicationClass) throws Exception {
      enhancer.enhanceThisClass(applicationClass);
    }

   public void onEvent(String message, Object context) {
      if (!StringUtils.startsWith(message, "JPASupport.")) {
         return;
      }
      
      try {
         Model model = (Model) context;
         String entityId = model.getClass().getName() + ":" + model.getId().toString();
         
         SolrServer server = getSearchServer();
         server.deleteById(entityId);
         
         if ("JPASupport.objectUpdated".equals(message)) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", entityId);
            doc.addField("searchClass", model.getClass().getName());
            
            for (java.lang.reflect.Field field : context.getClass().getFields()) {
               String fieldName = field.getName();
               Field annot = field.getAnnotation(Field.class);
               if (annot == null) {
                  continue;
               }
               
               String annotationValue = annot.value();
               if (annotationValue != null && !"#default".equals(annotationValue)) {
                  fieldName = annotationValue;
               }
               
               doc.addField(fieldName, field.get(context));
            }
            server.add(doc);
         }
         server.commit();
      } catch (Exception e) {
         Logger.error(e, "Problem updating entity %s on event %s with error %s", context, message, e.getMessage());
      }
   }
   
   public static SolrServer getSearchServer() {
      String url = Play.configuration.getProperty("solr.server", "http://localhost:8983/solr");
      CommonsHttpSolrServer server = null;
      try {
         server = new CommonsHttpSolrServer( url );
         server.setRequestWriter(new BinaryRequestWriter());
      } catch (MalformedURLException e) {
         Logger.error(e, "Problem creating solr server object: %s", e.getMessage());
      }
      return server;
   }
}

The last step is to configure your entities inside your application appropriately like the following User entity:

package models;

import javax.persistence.Entity;
import org.apache.solr.client.solrj.beans.Field;
import play.modules.solr.SearchModel;

@Entity
public class User extends SearchModel {

   @Field
   public String name;
   @Field
   public String shortDescription;
   @Field("tw_s")
   public String twitter;

   public Car car;
}

As you can see, you annotate the fields to be indexed with the @Field annotation from the SolrJ client. Because SolrJ is also used when indexing the entities, the plugin does not have to define its own annotations.

How it works...

A lot of source code is needed for a lot of functionality, and several pieces deserve an explanation. First, a little background on how the plugin talks to Solr. The plugin does not use XML to import data into Solr, but rather a binary format, as this is much faster than creating an XML request and sending it to the server. You have to explicitly configure Solr to support this binary format, as described on the SolrJ wiki page at http://wiki.apache.org/solr/Solrj.
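At the time of writing, enabling the binary format means making sure a request handler along the following lines is registered in solrconfig.xml (recent example configurations may already contain it; the wiki page above is the authoritative reference):

```
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
```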

Furthermore, SolrJ supports indexing beans directly instead of creating a SolrInputDocument for each of them. However, this does not work for entities, as they are complex objects consisting of more than simple types. The problem is the entity ID field: it is defined in the Model class, where it would also need to be annotated. This would mean that only one entity class could be stored, as entity IDs are not unique across different models. Therefore the document ID has to be created manually. Adding entities to the index is again done via the event mechanism of Play. Whenever an entity is updated or deleted, the change is propagated to the index by the SolrPlugin. When you take a look at the onEvent() method, you will see that the following happens when the right event occurs:

  • The old document gets deleted from the index
  • A new document is created, with the ID of the entity
  • A searchClass property is added, which is needed for querying entities of a certain type
  • Every field of an entity with a @Field annotation is added to the document, possibly under a different name than the name of the field in the entity, if specified in the annotation

Be aware that you cannot store object graphs with Solr. This is basically all the code needed to get data into the index.

The SearchModel class serves two purposes. First, it gives you IDE support and auto-completion for the static search() method; second, it translates the already familiar find("byFooAndBar") syntax into a real search query.

The last part is the Query class, which issues the actual search queries and parses the results. Similar to the JPAQuery object returned when calling the find() method of an entity, it supports limiting the result size as well as starting from an offset. It also features the fetch() method, which fetches all the IDs from the Solr search and then queries the database to get the complete entities. Be aware that this again adds some load to your database. The fetchIds() method only returns the IDs from the search and does not query your database.

Until now it has looked as if it were not necessary to configure Apache Solr at all. This is not completely true. When you take a look at the schema.xml file, you will find the schema definition used for indexing, including support for dynamic fields. Dynamic fields often end with "_s", representing a dynamic string field; this is used in the User entity above for the Twitter handle. However, as the name and shortDescription fields of this entity do not have such an alias set, they have to be defined like this:

  <field name="name" type="string" indexed="true" stored="true"/>
  <field name="shortDescription" type="text" indexed="true" stored="true"/>
  <field name="searchClass" type="textgen" indexed="true" stored="true" required="true" />
  <field name="id" type="string" indexed="true" stored="true" required="true" />

The name and shortDescription fields are needed for the User class, while the searchClass and id properties are used for any entity which gets indexed.

When writing real applications, you should use a better index value than the one used here (such as models.User:1), which is a long string rather than a numerical hash, though of course it is still unique from the application's point of view. A murmur hash function or something similar might be a good choice. Solr also supports UUID-based IDs; check http://wiki.apache.org/solr/UniqueKey for more information.
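As an illustration of what a more compact, deterministic document ID could look like, the following sketch hashes the composite key into a fixed-length hex string. MD5 is used here purely because it is available in the JDK; a murmur hash, as suggested above, would need an external library:

```java
import java.math.BigInteger;
import java.security.MessageDigest;

public class HashedIdSketch {

    // Hashes the composite entity key (e.g. "models.User:1")
    // into a fixed-length, 32-character hex string
    static String hashedId(String compositeKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(compositeKey.getBytes("UTF-8"));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hashedId("models.User:1")); // 32 hex characters
    }
}
```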

This is quite a good example of how to integrate an existing API into the Play framework, as it closely resembles existing methods such as find(). It adds the search() method, which makes it pretty easy for the developer to adopt this new functionality. You should always think about the possibilities of integration. For example, the SolrQuery class used in this example supports faceted searches. However, integrating this neatly into the Play framework is a completely different issue. If it does not work in a nice way, you should use the native API instead of adding another layer of complexity that needs to be maintained.

If you want to optimize this module, the first spot to look at is the fetch() method in the Query class. It is very inefficient to execute JPA.em().find() for every ID returned by the search query. You could easily group these lookups into one statement using a JPA query with the IN operator of JPQL.
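A sketch of how fetch() could be batched into a single statement. Only the JPQL string construction is shown as runnable code; in the plugin you would pass the string to JPA.em().createQuery() and bind the IDs collected by fetchIds() with setParameter(), as outlined in the comment:

```java
import java.util.Arrays;
import java.util.List;

public class BatchedFetchSketch {

    // Builds one JPQL query that loads all entities of the given class
    // whose IDs were returned by Solr, instead of one find() per ID
    static String buildInQuery(String entityClassName) {
        return "select e from " + entityClassName + " e where e.id in (:ids)";
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2", "3"); // IDs from fetchIds()
        System.out.println(buildInQuery("User"));
        // In the plugin, this would become:
        // JPA.em().createQuery(buildInQuery("User"))
        //         .setParameter("ids", ids).getResultList();
    }
}
```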

There's more...

Search is a really complex topic. Make sure you know the search engine you are using before you tie it to your application.

More information about SolrJ

SolrJ is an incredibly useful library for connecting Solr with Java. You should read more about it at http://wiki.apache.org/solr/Solrj.

More complex queries

As you can access the SolrQuery object inside the Query class, you can define arbitrarily complex queries by changing the Query class as needed. This includes features such as faceting, range queries, and many more. Solr is actually very versatile, and this recipe uses only a small number of its features.

Support for other search engines

There is a very nice module for the elasticsearch engine, a new kid on the block of Lucene-based search engines. It scales transparently, and you can even use it as a complete backend data store. More information about elasticsearch is available at http://www.elasticsearch.org.
