Setting up full-text searches

While many relational databases provide some mechanism for full-text searches, these databases are optimized for Online Transaction Processing (OLTP) workloads. Full-text search engines, on the other hand, are designed specifically for text queries, and excel at them. In this recipe, we'll show you how to use NHibernate Search and Lucene.Net to provide full-text search capabilities to your applications.

Getting ready

Complete the Eg.Core model and mappings from Chapter 1, The Configuration and Schema.

How to do it…

  1. Install the NHibernate Search package to the Eg.Core project using the NuGet Package Manager Console by running the following command:
    Install-Package NHibernate.Search
    
  2. In the Entity base class, decorate the Id property with the [DocumentId] attribute from NHibernate.Search.Attributes.
  3. Add the following attributes to the Product class:
    [Indexed]
    public class Product : Entity
    {
        [Field]
        public virtual string Name { get; set; }

        [Field]
        public virtual string Description { get; set; }

        public virtual Decimal UnitPrice { get; set; }
    }
  4. Add the following attributes to the Book class:
    [Indexed]
    public class Book : Product
    {
        [Field(Index = Index.UnTokenized)]
        public virtual string ISBN { get; set; }
    
        [Field]
        public virtual string Author { get; set; }
    }
  5. Create a new console project named Eg.Search.Runner.
  6. Install the NHibernate Search and log4net packages to the Eg.Search.Runner project using the NuGet Package Manager Console by running the following commands:
    Install-Package NHibernate.Search
    Install-Package log4net
    
  7. Add an App.config file with the standard log4net and hibernate-configuration sections.
  8. Add a new class named SearchConfiguration using the following code:
    using NHibernate;
    using NHibernate.Cfg;
    using NHibernate.Event;
    using NHibernate.Search.Event;
    using NHibernate.Search.Store;

    public class SearchConfiguration
    {
        public ISessionFactory BuildSessionFactory()
        {
            var cfg = new Configuration().Configure();
            SetSearchProps(cfg);
            AddSearchListeners(cfg);
            var sessionFactory = cfg.BuildSessionFactory();
            return sessionFactory;
        }

        // Store the Lucene index on the file system, under the Index folder.
        private void SetSearchProps(Configuration cfg)
        {
            cfg.SetProperty(
                "hibernate.search.default.directory_provider",
                typeof(FSDirectoryProvider).AssemblyQualifiedName);

            cfg.SetProperty(
                "hibernate.search.default.indexBase", "~/Index");
        }

        // Keep the Lucene index in sync with entity and collection changes.
        private void AddSearchListeners(Configuration cfg)
        {
            cfg.SetListener(ListenerType.PostUpdate,
                new FullTextIndexEventListener());
            cfg.SetListener(ListenerType.PostInsert,
                new FullTextIndexEventListener());
            cfg.SetListener(ListenerType.PostDelete,
                new FullTextIndexEventListener());
            cfg.SetListener(ListenerType.PostCollectionRecreate,
                new FullTextIndexCollectionEventListener());
            cfg.SetListener(ListenerType.PostCollectionRemove,
                new FullTextIndexCollectionEventListener());
            cfg.SetListener(ListenerType.PostCollectionUpdate,
                new FullTextIndexCollectionEventListener());
        }
    }
  9. In Program.cs, use the following code:
    using System.Collections.Generic;
    using System.Linq;
    using Eg.Core;
    using log4net;
    using log4net.Config;
    using NHibernate;
    using NHibernate.Search;

    class Program
    {
        static void Main(string[] args)
        {
            XmlConfigurator.Configure();
            var log = LogManager.GetLogger(typeof(Program));
            var cfg = new SearchConfiguration();
            var sessionFactory = cfg.BuildSessionFactory();

            var theBook = new Book()
            {
                Name = @"Gödel, Escher, Bach: An Eternal Golden Braid",
                Author = "Douglas Hofstadter",
                Description = "This groundbreaking Pulitzer Prize-winning book " +
                    "sets the standard for interdisciplinary writing, exploring the " +
                    "patterns and symbols in the thinking of mathematician Kurt Godel, " +
                    "artist M.C. Escher, and composer Johann Sebastian Bach.",
                ISBN = "978-0465026562",
                UnitPrice = 22.95M
            };
            var theOtherBook = new Book()
            {
                Name = "Technical Writing",
                Author = "Joe Professor",
                Description = "College text",
                ISBN = "123-1231231234",
                UnitPrice = 143.73M
            };
            var thePoster = new Product()
            {
                Name = "Ascending and Descending",
                Description = "Poster of famous Escher print",
                UnitPrice = 7.95M
            };

            // Remove any products left over from a previous run.
            using (var session = sessionFactory.OpenSession())
            using (var tx = session.BeginTransaction())
            {
                session.Delete("from Product");
                tx.Commit();
            }

            // Saving the entities also adds them to the Lucene index.
            using (var session = sessionFactory.OpenSession())
            using (var tx = session.BeginTransaction())
            {
                session.Save(theBook);
                session.Save(theOtherBook);
                session.Save(thePoster);
                tx.Commit();
            }

            var products = GetEscherProducts(sessionFactory);
            OutputProducts(products, log);
            var books = GetEscherBooks(sessionFactory);
            OutputProducts(books.Cast<Product>(), log);
        }

        private static void OutputProducts(
            IEnumerable<Product> products,
            ILog log)
        {
            foreach (var product in products)
            {
                log.InfoFormat(
                    "Found {0} with price {1:C}",
                    product.Name,
                    product.UnitPrice);
            }
        }

        // Finds every product whose description contains the term "escher".
        private static IEnumerable<Product> GetEscherProducts(
            ISessionFactory sessionFactory)
        {
            IEnumerable<Product> results;
            using (var session = sessionFactory.OpenSession())
            using (var search = Search.CreateFullTextSession(session))
            using (var tx = session.BeginTransaction())
            {
                var queryString = "Description:Escher";
                var query = search
                    .CreateFullTextQuery<Product>(queryString);
                results = query.List<Product>();
                tx.Commit();
            }
            return results;
        }

        // Same search, restricted to Book entities.
        private static IEnumerable<Book> GetEscherBooks(
            ISessionFactory sessionFactory)
        {
            IEnumerable<Book> results;
            using (var session = sessionFactory.OpenSession())
            using (var search = Search.CreateFullTextSession(session))
            using (var tx = session.BeginTransaction())
            {
                var queryString = "Description:Escher";
                var query = search
                    .CreateFullTextQuery<Book>(queryString);
                results = query.List<Book>();
                tx.Commit();
            }
            return results;
        }
    }
  10. Build and run your application.

How it works…

In this recipe, we've offloaded our full-text queries to a Lucene index in the bin/Debug/Index folder.

First, let us quickly discuss some Lucene terminology. The Lucene database is referred to as an index. Each record in the index is referred to as a document. In the case of NHibernate Search, each document in the index has a corresponding entity in the relational database. Each document has fields, and each field comprises a name and a value. By default, fields are tokenized, or broken up into terms. A term can best be described as a single, significant, lowercase word from some string of words. For example, the string Bag of Cats can be tokenized into the terms bag and cat: the insignificant word of is dropped and, with an analyzer that applies stemming, cats is reduced to cat. Additionally, for each field Lucene maintains a map from every term to the documents that contain it, along with the frequency of that term in each document. This makes keyword searches extremely fast.
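
To make the term map described above more concrete, here is a minimal, library-free sketch of such a structure. It is a toy illustration, not Lucene's actual implementation: the ToyIndex class, its stop-word list, and its split-on-non-letters tokenizer are all assumptions made for this example, and no stemming is applied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy term map: each term points to the documents that contain it and
// to how often it occurs in each one. Hypothetical, not Lucene code.
public static class ToyIndex
{
    // Assumed stop-word list; real analyzers ship their own.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "of", "the" };

    // Lowercase the text, split on non-letter characters, drop stop words.
    public static IEnumerable<string> Tokenize(string text)
    {
        var words = new string(text
            .Select(c => char.IsLetter(c) ? char.ToLowerInvariant(c) : ' ')
            .ToArray())
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return words.Where(w => !StopWords.Contains(w));
    }

    // Builds: term -> (document id -> term frequency).
    public static Dictionary<string, Dictionary<int, int>> Build(
        IDictionary<int, string> documents)
    {
        var index = new Dictionary<string, Dictionary<int, int>>();
        foreach (var doc in documents)
        {
            foreach (var term in Tokenize(doc.Value))
            {
                Dictionary<int, int> postings;
                if (!index.TryGetValue(term, out postings))
                {
                    postings = new Dictionary<int, int>();
                    index[term] = postings;
                }
                int count;
                postings.TryGetValue(doc.Key, out count); // 0 if absent
                postings[doc.Key] = count + 1;
            }
        }
        return index;
    }
}

public static class ToyIndexDemo
{
    public static void Main()
    {
        var docs = new Dictionary<int, string>
        {
            { 1, "Poster of famous Escher print" },
            { 2, "College text" },
            { 3, "Escher, Escher, and more Escher" }
        };
        var index = ToyIndex.Build(docs);

        // A keyword search is a single dictionary lookup.
        foreach (var hit in index["escher"].OrderBy(p => p.Key))
        {
            Console.WriteLine("doc {0}: {1} occurrence(s)", hit.Key, hit.Value);
        }
        // prints:
        // doc 1: 1 occurrence(s)
        // doc 3: 3 occurrence(s)
    }
}
```

Because the lookup goes from term to documents rather than scanning each document for the term, the cost of a keyword search depends mostly on how many documents match, not on how many are stored.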

Entity classes with the Indexed attribute will be included as documents in the Lucene index. The remaining attributes determine which properties from these entities should be included in the document, and how that data will be stored. NHibernate Search automatically stores the entity type in the _hibernate_class field. Each searchable entity must have a field or property decorated with the DocumentId attribute. Its value is stored in the Id field, and is used to maintain the relationship between entities and documents. In our case, the Id property on Entity will be used.

To be useful, we should include additional data in our documents using the Field attribute. For keyword searches, we've included the tokenized name and description of every product, and the author of every book. We've also included the ISBN of every book, but have chosen not to tokenize it because a partial ISBN match is useless.
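
The difference between a tokenized and an untokenized field can be sketched without any library. The FieldMatching class below is hypothetical, and its tokenizer, which simply splits on anything that is not a letter or a digit, is an assumption for illustration; Lucene's analyzers follow more elaborate rules and may treat a hyphenated number differently.

```csharp
using System;
using System.Linq;

public static class FieldMatching
{
    // Tokenized field: the value is split into lowercase terms, and a
    // single-term query matches if any one of those terms equals it.
    public static bool MatchesTokenized(string fieldValue, string queryTerm)
    {
        var terms = new string(fieldValue
            .Select(c => char.IsLetterOrDigit(c) ? char.ToLowerInvariant(c) : ' ')
            .ToArray())
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return terms.Contains(queryTerm.ToLowerInvariant());
    }

    // Untokenized field: the whole value is kept as one term, so only
    // an exact match counts.
    public static bool MatchesUntokenized(string fieldValue, string queryTerm)
    {
        return string.Equals(fieldValue, queryTerm, StringComparison.Ordinal);
    }
}

public static class FieldMatchingDemo
{
    public static void Main()
    {
        const string isbn = "978-0465026562";

        // Under this naive tokenizer, the common prefix alone is a hit.
        Console.WriteLine(FieldMatching.MatchesTokenized(isbn, "978"));   // True

        // Kept whole, the ISBN only matches when queried in full.
        Console.WriteLine(FieldMatching.MatchesUntokenized(isbn, "978")); // False
        Console.WriteLine(FieldMatching.MatchesUntokenized(isbn, isbn));  // True
    }
}
```

In this sketch, a tokenized ISBN would match a query for a prefix shared by a great many books, which is exactly the kind of partial match we want to avoid.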

The SearchConfiguration class is responsible for building an NHibernate configuration, adding the necessary NHibernate Search settings to the configuration, and building an NHibernate session factory.

The Search.CreateFullTextSession method wraps the standard NHibernate session and returns an IFullTextSession. These sessions behave as normal NHibernate sessions, but provide additional methods for creating full-text search queries against the Lucene index. The CreateFullTextQuery method of the session takes a Lucene query in string or query object form and returns a familiar NHibernate IQuery interface, the same interface used for HQL and SQL queries. When we call List or UniqueResult, the query is executed against our Lucene index. For example, the query in our GetEscherProducts method will search Lucene for documents with a Description containing the term escher. This query returns two results: the GEB book and the M. C. Escher poster. The IDs of all of those search results are gathered up and used to build a SQL database query similar to the following:

SELECT this_.Id          as Id0_0_,
       this_.Name        as Name0_0_,
       this_.Description as Descript4_0_0_,
       this_.UnitPrice   as UnitPrice0_0_,
       this_.Director    as Director0_0_,
       this_.Author      as Author0_0_,
       this_.ISBN        as ISBN0_0_,
       this_.ProductType as ProductT2_0_0_
FROM   Product this_
WHERE  (this_.Id in ('5933e3ba-3092-4db7-8d19-9daf014b8ce4' /* @p0 */,'05058886-8436-4a1d-8412-9db1010561b5' /* @p1 */))

This database query is very fast because it filters on the primary key. The Lucene query is fast because the index was specially designed for that purpose. Together, they have the potential for huge performance and functionality gains over the weak full-text search capabilities in most relational databases.
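
The two-step lookup described above can be sketched with plain in-memory collections. This is only an illustration of the idea, not how NHibernate Search is implemented: the TwoPhaseSearch class and both dictionaries are hypothetical stand-ins for the Lucene index and the Product table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwoPhaseSearch
{
    // Step 1: ask the full-text index for the IDs matching a term.
    // Step 2: load those rows by primary key, in the spirit of
    //         SELECT ... FROM Product WHERE Id IN (@p0, @p1, ...).
    public static List<string> FindNames(
        IDictionary<string, List<Guid>> termToIds,   // stand-in for the index
        IDictionary<Guid, string> productNamesById,  // stand-in for the table
        string term)
    {
        List<Guid> ids;
        if (!termToIds.TryGetValue(term.ToLowerInvariant(), out ids))
        {
            return new List<string>(); // no document contains the term
        }
        return ids
            .Where(id => productNamesById.ContainsKey(id))
            .Select(id => productNamesById[id])
            .ToList();
    }
}

public static class TwoPhaseSearchDemo
{
    public static void Main()
    {
        var bookId = new Guid("5933e3ba-3092-4db7-8d19-9daf014b8ce4");
        var posterId = new Guid("05058886-8436-4a1d-8412-9db1010561b5");
        var textbookId = Guid.NewGuid();

        var table = new Dictionary<Guid, string>
        {
            { bookId, "Gödel, Escher, Bach: An Eternal Golden Braid" },
            { posterId, "Ascending and Descending" },
            { textbookId, "Technical Writing" }
        };
        var index = new Dictionary<string, List<Guid>>
        {
            { "escher", new List<Guid> { bookId, posterId } },
            { "college", new List<Guid> { textbookId } }
        };

        foreach (var name in TwoPhaseSearch.FindNames(index, table, "Escher"))
        {
            Console.WriteLine(name);
        }
    }
}
```

Neither step scans the full set of products: the first is a lookup by term and the second a lookup by key, which mirrors why the combined Lucene and SQL approach performs well.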

There's more…

This is just the most basic example of what we can do with NHibernate Search. We can also choose to store the original value of a field in the document. This is useful when we want to display Lucene query results without querying the SQL database. Additionally, Lucene has many more features, such as search-term highlighting and spell-checking. Although Lucene is a very capable document database, remember that it is not relational. There is no support for relationships or references between documents stored in a Lucene index.
