Setting up full-text search

While most relational databases provide some mechanism for full-text search, they are optimized for online transaction processing (OLTP) workloads. Document databases, on the other hand, are designed specifically for full-text search queries and excel at them. In this recipe, I'll show you how to use NHibernate Search and Lucene.Net to provide full-text search capabilities for your entities.

Getting ready

  1. Download the NHibernate Search binary files from SourceForge at http://sourceforge.net/projects/nhcontrib/files/.
  2. Extract NHibernate.Search.dll and Lucene.Net.dll from the downloaded ZIP file to your solution's Lib folder.
  3. Complete the Eg.Core model and mappings from Chapter 1.

How to do it...

  1. In Eg.Core, add a reference to NHibernate.Search.dll.
  2. On the Entity base class, decorate the Id property with the DocumentId attribute from NHibernate.Search.Attributes.
  3. On the Product class, add the following attributes:
    [Indexed]
    public class Product : Entity
    {
    
      [Field]
      public virtual string Name { get; set; }
    
      [Field]
      public virtual string Description { get; set; }
      public virtual Decimal UnitPrice { get; set; }
    
    }
  4. On the Book class, add the following attributes:
    [Indexed]
    public class Book : Product
    {
    
      [Field(Index = Index.UnTokenized)]
      public virtual string ISBN { get; set; }
    
      [Field]
      public virtual string Author { get; set; }
    
    }
  5. Create a new console project named Eg.Search.Runner.
  6. Add references to the Eg.Core model, log4net.dll, Lucene.Net.dll, NHibernate.dll, NHibernate.ByteCode.dll, and NHibernate.Search.dll.
  7. Add an App.config file with the standard log4net and hibernate-configuration sections.
  8. Add a new class named SearchConfiguration using the following code:
    public class SearchConfiguration
    {
    
      public ISessionFactory BuildSessionFactory()
      {
        var cfg = new Configuration().Configure();
        SetSearchProps(cfg);
        AddSearchListeners(cfg);
        var sessionFactory = cfg.BuildSessionFactory();
        return new SessionFactorySearchWrapper(
          sessionFactory);
      }
    
      private void SetSearchProps(Configuration cfg)
      {
        cfg.SetProperty(
          "hibernate.search.default.directory_provider", 
          typeof(FSDirectoryProvider)
          .AssemblyQualifiedName);
    
        cfg.SetProperty(
          "hibernate.search.default.indexBase",
          "~/Index");
      }
    
      private void AddSearchListeners(Configuration cfg)
      {
        cfg.SetListener(ListenerType.PostUpdate, 
          new FullTextIndexEventListener());
        cfg.SetListener(ListenerType.PostInsert, 
          new FullTextIndexEventListener());
        cfg.SetListener(ListenerType.PostDelete, 
          new FullTextIndexEventListener());
        cfg.SetListener(ListenerType.PostCollectionRecreate, 
          new FullTextIndexCollectionEventListener());
        cfg.SetListener(ListenerType.PostCollectionRemove, 
          new FullTextIndexCollectionEventListener());
        cfg.SetListener(ListenerType.PostCollectionUpdate, 
          new FullTextIndexCollectionEventListener());
      }
    }
  9. Create a new class named SessionFactorySearchWrapper using the following code:
    public class SessionFactorySearchWrapper 
    : ISessionFactory
    {
      private readonly ISessionFactory _sessionFactory;
    
      public SessionFactorySearchWrapper(
        ISessionFactory sessionFactory)
      {
        _sessionFactory = sessionFactory;
      }
    
      public ISession OpenSession()
      {
        var session = _sessionFactory.OpenSession();
        return WrapSession(session);
      }
    
      public ISession OpenSession(
        IDbConnection conn, 
        IInterceptor sessionLocalInterceptor)
      {
        var session = _sessionFactory
          .OpenSession(conn, sessionLocalInterceptor);
        return WrapSession(session);
      }
    
      public ISession OpenSession(
        IInterceptor sessionLocalInterceptor)
      {
        var session = _sessionFactory
          .OpenSession(sessionLocalInterceptor);
        return WrapSession(session);
      }
    
      public ISession OpenSession(
        IDbConnection conn)
      {
        var session = _sessionFactory.OpenSession(conn);
        return WrapSession(session);
      }
    
      private static ISession WrapSession(
        ISession session)
      {
        return NHibernate.Search
          .Search.CreateFullTextSession(session);
      }
    
    }
  10. Implement the remaining ISessionFactory methods and properties in SessionFactorySearchWrapper by passing the call to the _sessionFactory field, as shown in the following code:
    public IClassMetadata GetClassMetadata(string entityName)
    {
      return _sessionFactory.GetClassMetadata(entityName);
    }
  11. In Program.cs, use the following code:
    class Program
    {
      static void Main(string[] args)
      {
        
        XmlConfigurator.Configure();
        var log = LogManager.GetLogger(typeof(Program));
    
        var cfg = new SearchConfiguration();
        var sessionFactory = cfg.BuildSessionFactory();
    
        var theBook = new Book()
                        {
                          Name = @"Gödel, Escher, Bach: An Eternal Golden Braid",
                          Author = "Douglas Hofstadter",
                          Description =
                            @"This groundbreaking Pulitzer Prize-winning book sets the standard for interdisciplinary writing, exploring the patterns and symbols in the thinking of mathematician Kurt Godel, artist M.C. Escher, and composer Johann Sebastian Bach.",
                          ISBN = "978-0465026562",
                          UnitPrice = 22.95M
                        };
    
        var theOtherBook = new Book()
                             {
                               Name = "Technical Writing",
                               Author = "Joe Professor",
                               Description = "College text",
                               ISBN = "123-1231231234",
                               UnitPrice = 143.73M
                             };
    
        var thePoster = new Product()
                          {
                            Name = "Ascending and Descending",
                            Description = "Poster of famous Escher print",
                            UnitPrice = 7.95M
                          };
    
        using (var session = sessionFactory.OpenSession())
        {
          using (var tx = session.BeginTransaction())
          {
            session.Delete("from Product");
            tx.Commit();
          }
        }
    
        using (var session = sessionFactory.OpenSession())
        {
          using (var tx = session.BeginTransaction())
          {
            session.Save(theBook);
            session.Save(theOtherBook);
            session.Save(thePoster);
            tx.Commit();
          }
        }
    
    
        var products = GetEscherProducts(sessionFactory);
        OutputProducts(products, log);
    
        var books = GetEscherBooks(sessionFactory);
        OutputProducts(books.Cast<Product>(), log);
      }
    
      private static void OutputProducts(
        IEnumerable<Product> products,
        ILog log)
      {
    
        foreach (var product in products)
        {
          log.InfoFormat("Found {0} with price {1:C}",
                         product.Name, product.UnitPrice);
        }
    
      }
    
      private static IEnumerable<Product> 
        GetEscherProducts(
        ISessionFactory sessionFactory)
      {
        IEnumerable<Product> results;
        using (var session = sessionFactory.OpenSession()
                             as IFullTextSession)
        {
          using (var tx = session.BeginTransaction())
          {
            var queryString = "Description:Escher";
            var query = session
              .CreateFullTextQuery<Product>(queryString);
            results = query.List<Product>();
            tx.Commit();
          }
        }
        return results;
      }
    
      private static IEnumerable<Book> GetEscherBooks(
        ISessionFactory sessionFactory)
      {
        IEnumerable<Book> results;
        using (var session = sessionFactory.OpenSession()
                             as IFullTextSession)
        {
          using (var tx = session.BeginTransaction())
          {
            var queryString = "Description:Escher";
            var query = session
              .CreateFullTextQuery<Book>(queryString);
            results = query.List<Book>();
            tx.Commit();
          }
        }
        return results;
    
      }
    }
  12. Build and run your application.

How it works...

In this recipe, we've offloaded our full-text queries to a Lucene index in the bin/Debug/Index folder.

First, let's quickly discuss some Lucene terminology. The Lucene database is referred to as an Index. Each record in the Index is referred to as a Document. In the case of NHibernate Search, each Document in the Index has a corresponding entity in the relational database. Each Document has Fields, and each Field comprises a name and a value. By default, fields are tokenized, or broken up into terms. A term can best be described as a single, significant, lower-case word from some string of words. For example, with Lucene's standard analyzer, the string "Bag of Cats" would be tokenized into the terms "bag" and "cats"; the stop word "of" is dropped. Additionally, Lucene maintains a map from each term in a field to the documents that contain it, along with the frequency of that term in each document. This makes keyword searches extremely fast.
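To make the term map concrete, here is a minimal sketch of a toy inverted index in plain C#. It is an illustration only, not Lucene's implementation: the ToyIndex class, its method names, and the short stop-word list are all hypothetical, and the tokenizer is a rough approximation of what a real analyzer does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Toy inverted index for illustration; this is NOT how Lucene is implemented.
public static class ToyIndex
{
    // Hypothetical stop-word list; real analyzers ship their own.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "of", "the" };

    // Lower-case the text, keep runs of letters and digits, drop stop words.
    public static List<string> Tokenize(string text)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(t => !StopWords.Contains(t))
            .ToList();
    }

    // Builds a map: term -> (document id -> occurrences of the term).
    public static Dictionary<string, Dictionary<int, int>> Build(
        IDictionary<int, string> documents)
    {
        var index = new Dictionary<string, Dictionary<int, int>>();
        foreach (var doc in documents)
        {
            foreach (var term in Tokenize(doc.Value))
            {
                Dictionary<int, int> postings;
                if (!index.TryGetValue(term, out postings))
                {
                    postings = new Dictionary<int, int>();
                    index[term] = postings;
                }
                int count;
                postings.TryGetValue(doc.Key, out count); // count is 0 if absent
                postings[doc.Key] = count + 1;
            }
        }
        return index;
    }

    // A keyword search is a single dictionary lookup on the term.
    public static IEnumerable<int> Search(
        Dictionary<string, Dictionary<int, int>> index, string term)
    {
        Dictionary<int, int> postings;
        return index.TryGetValue(term.ToLowerInvariant(), out postings)
            ? postings.Keys.OrderBy(id => id)
            : Enumerable.Empty<int>();
    }

    public static void Main()
    {
        var docs = new Dictionary<int, string>
        {
            { 1, "Poster of famous Escher print" },
            { 2, "Exploring Godel, Escher, and Bach" },
            { 3, "College text" }
        };
        var index = Build(docs);
        // prints: 1, 2
        Console.WriteLine(string.Join(", ", Search(index, "Escher")));
    }
}
```

Because the lookup goes from term to documents rather than scanning every document for the term, the cost of a keyword search depends mostly on the number of matches, not on the size of the collection.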

Entity classes with the Indexed attribute will be included as documents in the Lucene index. The remaining attributes determine which properties of these entities are included in the document, and how that data is stored. The _hibernate_class field is added automatically and stores the entity type. Each searchable entity must have a field or property decorated with the DocumentId attribute. This is stored in the ID field, and is used to maintain the relationship between entities and documents. In our case, the Id property on Entity will be used.

To be useful, we should include additional data in our documents using the Field attribute. For keyword searches, we've included the tokenized name and description of every product, and the author of every book. We've also included the ISBN of every book, but have chosen not to tokenize it because a partial ISBN match is useless.

The SearchConfiguration class is responsible for building an NHibernate configuration, adding the necessary NHibernate Search settings to the configuration, building an NHibernate session factory, and wrapping the session factory in our search wrapper.

The SessionFactorySearchWrapper wraps the standard NHibernate session factory and returns an IFullTextSession from calls to OpenSession. These sessions behave as normal NHibernate sessions, and provide additional methods for creating full-text search queries against the Lucene index. The CreateFullTextQuery method of the session takes a Lucene query in string or query object form and returns a familiar NHibernate IQuery interface, the same interface used for HQL and SQL queries. When we call List or UniqueResult, the query is executed against our Lucene index. For example, the query in our GetEscherProducts method will search Lucene for documents with a Description containing the term escher. This query returns two results: the GEB book and the M. C. Escher poster. The IDs of those search results are gathered up and used to build a SQL database query similar to the following:

SELECT this_.Id          as Id0_0_,
       this_.Name        as Name0_0_,
       this_.Description as Descript4_0_0_,
       this_.UnitPrice   as UnitPrice0_0_,
       this_.Director    as Director0_0_,
       this_.Author      as Author0_0_,
       this_.ISBN        as ISBN0_0_,
       this_.ProductType as ProductT2_0_0_
FROM   Product this_
WHERE  (this_.Id in ('5933e3ba-3092-4db7-8d19-9daf014b8ce4' /* @p0 */,'05058886-8436-4a1d-8412-9db1010561b5' /* @p1 */))

Because this database query is performed on the primary key, it is very fast. The Lucene query is fast because the index was designed specifically for that purpose. Together, these offer the potential for large performance and functionality gains over the comparatively weak full-text search capabilities in most relational databases.
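The two-step shape of this lookup can be sketched in plain C# without either library. In this sketch, the list of matching IDs stands in for the Lucene result and a dictionary keyed by Id stands in for the Product table; ProductRow, TwoPhaseSearch, and Load are hypothetical names used only for illustration, and skipping IDs that have no row is an assumption of the sketch rather than documented NHibernate Search behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a row in the Product table (hypothetical type).
public class ProductRow
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public decimal UnitPrice { get; set; }
}

public static class TwoPhaseSearch
{
    // Step 1 (not shown): the full-text index resolves the query to IDs.
    // Step 2: each ID is fetched by primary key, like WHERE Id IN (...).
    public static List<ProductRow> Load(
        IEnumerable<Guid> matchingIds,
        IDictionary<Guid, ProductRow> table)
    {
        var rows = new List<ProductRow>();
        foreach (var id in matchingIds)
        {
            ProductRow row;
            // Assumption of this sketch: IDs without a row are skipped.
            if (table.TryGetValue(id, out row))
            {
                rows.Add(row);
            }
        }
        return rows;
    }

    public static void Main()
    {
        var poster = new ProductRow
        {
            Id = Guid.NewGuid(),
            Name = "Ascending and Descending",
            UnitPrice = 7.95M
        };
        var text = new ProductRow
        {
            Id = Guid.NewGuid(),
            Name = "Technical Writing",
            UnitPrice = 143.73M
        };
        var table = new Dictionary<Guid, ProductRow>
        {
            { poster.Id, poster },
            { text.Id, text }
        };

        // Pretend the index matched only the poster.
        var found = Load(new[] { poster.Id }, table);
        Console.WriteLine(found.Single().Name);
    }
}
```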

There's more...

This is just the most basic example of what we can do with NHibernate Search. We can also choose to store the original value of a field in the document. This is useful when we want to display Lucene query results without querying the SQL database. Additionally, Lucene has many more features, like search-term highlighting and spell-checking. Although Lucene is a very capable document database, remember that it is not relational. There is no support for relationships or references between documents stored in a Lucene index.
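As a rough sketch of storing a field's original value, the Field attribute can be given a Store setting alongside the Index setting used earlier on ISBN. This fragment assumes that the NHibernate Search version you downloaded exposes a Store property on the Field attribute and a Store enumeration in NHibernate.Search.Attributes; check both against your binaries before relying on it.

```csharp
using NHibernate.Search.Attributes;

[Indexed]
public class Product : Entity
{
    // Assumed setting: Store.Yes keeps the original Name value in the
    // Lucene document in addition to indexing its terms.
    [Field(Store = Store.Yes)]
    public virtual string Name { get; set; }

    [Field]
    public virtual string Description { get; set; }

    public virtual decimal UnitPrice { get; set; }
}
```

Stored values make the index larger, so it is usually worth storing only the fields you intend to display directly from search results.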
