While many relational databases provide some mechanism for full-text search, they are optimized for Online Transaction Processing (OLTP) workloads. Full-text search engines, on the other hand, are designed specifically for text queries and excel at them. In this recipe, we'll show you how to use NHibernate Search and Lucene.Net to provide full-text search capabilities to your applications.
Complete the Eg.Core model and mappings from Chapter 1, The Configuration and Schema.
Install NHibernate Search into the Eg.Core project using the NuGet Package Manager Console by running the following command:

    Install-Package NHibernate.Search
On the Entity base class, decorate the Id property with the [DocumentId] attribute from NHibernate.Search.Attributes.

Decorate the Product class as follows:

    using System;
    using NHibernate.Search.Attributes;

    [Indexed]
    public class Product : Entity
    {
        [Field]
        public virtual string Name { get; set; }

        [Field]
        public virtual string Description { get; set; }

        public virtual Decimal UnitPrice { get; set; }
    }
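Decorate the Book class as follows:

    using NHibernate.Search.Attributes;

    [Indexed]
    public class Book : Product
    {
        // Index the ISBN as a single, untokenized term.
        [Field(Index = Index.UnTokenized)]
        public virtual string ISBN { get; set; }

        [Field]
        public virtual string Author { get; set; }
    }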
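Create a new console application named Eg.Search.Runner with a reference to the Eg.Core project.

Install NHibernate Search and log4net into the Eg.Search.Runner project using the NuGet Package Manager Console by running the following commands:

    Install-Package NHibernate.Search
    Install-Package log4net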
Add an App.config file with the standard log4net and hibernate-configuration sections.

Add a new class named SearchConfiguration using the following code:

    using NHibernate;
    using NHibernate.Cfg;
    using NHibernate.Event;
    using NHibernate.Search.Event;
    using NHibernate.Search.Store;

    public class SearchConfiguration
    {
        public ISessionFactory BuildSessionFactory()
        {
            // Read the hibernate-configuration section from App.config.
            var cfg = new Configuration().Configure();
            SetSearchProps(cfg);
            AddSearchListeners(cfg);
            var sessionFactory = cfg.BuildSessionFactory();
            return sessionFactory;
        }

        private void SetSearchProps(Configuration cfg)
        {
            // Keep the Lucene index on the file system...
            cfg.SetProperty(
                "hibernate.search.default.directory_provider",
                typeof(FSDirectoryProvider).AssemblyQualifiedName);
            // ...in an Index folder under the application's base directory.
            cfg.SetProperty(
                "hibernate.search.default.indexBase", "~/Index");
        }

        private void AddSearchListeners(Configuration cfg)
        {
            // Update the index whenever an entity is inserted, updated, or deleted.
            cfg.SetListener(ListenerType.PostUpdate,
                new FullTextIndexEventListener());
            cfg.SetListener(ListenerType.PostInsert,
                new FullTextIndexEventListener());
            cfg.SetListener(ListenerType.PostDelete,
                new FullTextIndexEventListener());
            // Update the index whenever an entity's collections change.
            cfg.SetListener(ListenerType.PostCollectionRecreate,
                new FullTextIndexCollectionEventListener());
            cfg.SetListener(ListenerType.PostCollectionRemove,
                new FullTextIndexCollectionEventListener());
            cfg.SetListener(ListenerType.PostCollectionUpdate,
                new FullTextIndexCollectionEventListener());
        }
    }
In Program.cs, use the following code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Eg.Core;
    using log4net;
    using log4net.Config;
    using NHibernate;
    using NHibernate.Search;

    class Program
    {
        static void Main(string[] args)
        {
            XmlConfigurator.Configure();
            var log = LogManager.GetLogger(typeof(Program));
            var cfg = new SearchConfiguration();
            var sessionFactory = cfg.BuildSessionFactory();

            var theBook = new Book()
            {
                Name = @"Gödel, Escher, Bach: An Eternal Golden Braid",
                Author = "Douglas Hofstadter",
                Description = @"This groundbreaking Pulitzer Prize-winning book sets the standard for interdisciplinary writing, exploring the patterns and symbols in the thinking of mathematician Kurt Godel, artist M.C. Escher, and composer Johann Sebastian Bach.",
                ISBN = "978-0465026562",
                UnitPrice = 22.95M
            };

            var theOtherBook = new Book()
            {
                Name = "Technical Writing",
                Author = "Joe Professor",
                Description = "College text",
                ISBN = "123-1231231234",
                UnitPrice = 143.73M
            };

            var thePoster = new Product()
            {
                Name = "Ascending and Descending",
                Description = "Poster of famous Escher print",
                UnitPrice = 7.95M
            };

            // Remove products left over from previous runs.
            using (var session = sessionFactory.OpenSession())
            using (var tx = session.BeginTransaction())
            {
                session.Delete("from Product");
                tx.Commit();
            }

            // Save the sample data; the listeners index it on commit.
            using (var session = sessionFactory.OpenSession())
            using (var tx = session.BeginTransaction())
            {
                session.Save(theBook);
                session.Save(theOtherBook);
                session.Save(thePoster);
                tx.Commit();
            }

            var products = GetEscherProducts(sessionFactory);
            OutputProducts(products, log);

            var books = GetEscherBooks(sessionFactory);
            OutputProducts(books.Cast<Product>(), log);
        }

        private static void OutputProducts(
            IEnumerable<Product> products, ILog log)
        {
            foreach (var product in products)
            {
                log.InfoFormat("Found {0} with price {1:C}",
                    product.Name, product.UnitPrice);
            }
        }

        private static IEnumerable<Product> GetEscherProducts(
            ISessionFactory sessionFactory)
        {
            IEnumerable<Product> results;
            using (var session = sessionFactory.OpenSession())
            using (var search = Search.CreateFullTextSession(session))
            using (var tx = session.BeginTransaction())
            {
                // Lucene query syntax: field name, colon, search text.
                var queryString = "Description:Escher";
                var query = search
                    .CreateFullTextQuery<Product>(queryString);
                results = query.List<Product>();
                tx.Commit();
            }
            return results;
        }

        private static IEnumerable<Book> GetEscherBooks(
            ISessionFactory sessionFactory)
        {
            IEnumerable<Book> results;
            using (var session = sessionFactory.OpenSession())
            using (var search = Search.CreateFullTextSession(session))
            using (var tx = session.BeginTransaction())
            {
                var queryString = "Description:Escher";
                var query = search
                    .CreateFullTextQuery<Book>(queryString);
                results = query.List<Book>();
                tx.Commit();
            }
            return results;
        }
    }
In this recipe, we've offloaded our full-text queries to a Lucene index in the bin/Debug/Index folder.
First, let's quickly go over some Lucene terminology. The Lucene database is referred to as an index, and each record in the index is referred to as a document. In the case of NHibernate Search, each document in the index has a corresponding entity in the relational database. Each document has fields, and each field comprises a name and a value. By default, fields are tokenized, or broken up into terms. A term is best described as a single, significant, lowercase word from a string of words. For example, with an analyzer that drops common words such as "of" and reduces plurals to their stem, the string Bag of Cats can be tokenized into the terms bag and cat. For each field, Lucene maintains a map from each term to the documents that contain it, along with the frequency of the term in each of those documents. This structure, called an inverted index, makes keyword searches extremely fast: finding the documents that contain a word is a lookup by term rather than a scan of every record.
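To make this terminology concrete, here is a toy inverted index in C#. It is a sketch under simplifying assumptions, not Lucene's implementation. The stop-word list and the crude plural-stripping Stem helper are hypothetical stand-ins for a real analyzer, and ToyIndex is a name invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// A toy inverted index: term -> (document id -> term frequency).
// The analyzer here is a simplified stand-in, not Lucene's.
public static class ToyIndex
{
    // Hypothetical stop-word list, for illustration only.
    static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "of", "the" };

    // Crude plural stripping; real stemmers (such as Porter or Snowball) are far more careful.
    static string Stem(string word) =>
        word.Length > 3 && word.EndsWith("s", StringComparison.Ordinal)
                        && !word.EndsWith("ss", StringComparison.Ordinal)
            ? word.Substring(0, word.Length - 1)
            : word;

    // Lowercase, split on anything that isn't a letter or digit,
    // drop stop words, then stem what's left.
    public static IEnumerable<string> Tokenize(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
            .Where(w => w.Length > 0 && !StopWords.Contains(w))
            .Select(w => Stem(w));

    public static Dictionary<string, Dictionary<int, int>> Build(
        IDictionary<int, string> documents)
    {
        var index = new Dictionary<string, Dictionary<int, int>>();
        foreach (var doc in documents)
        {
            foreach (var term in Tokenize(doc.Value))
            {
                if (!index.TryGetValue(term, out var postings))
                {
                    postings = new Dictionary<int, int>();
                    index[term] = postings;
                }
                // Count how many times the term occurs in this document.
                postings[doc.Key] = postings.TryGetValue(doc.Key, out var n) ? n + 1 : 1;
            }
        }
        return index;
    }

    // A keyword search is one dictionary lookup, not a scan of every document.
    public static IEnumerable<int> Search(
        Dictionary<string, Dictionary<int, int>> index, string word)
    {
        // Analyze the query word exactly as the indexed text was analyzed.
        var term = Tokenize(word).FirstOrDefault();
        if (term == null || !index.TryGetValue(term, out var postings))
            return Enumerable.Empty<int>();
        return postings.Keys.OrderBy(id => id);
    }

    public static void Main()
    {
        var documents = new Dictionary<int, string>
        {
            [1] = "Bag of Cats",
            [2] = "Poster of famous Escher print",
            [3] = "College text"
        };
        var index = Build(documents);
        Console.WriteLine(string.Join(" ", Tokenize("Bag of Cats")));  // bag cat
        Console.WriteLine(string.Join(",", Search(index, "Escher")));  // 2
    }
}
```

The query word passes through the same Tokenize step as the indexed text. That mirrors why a search for Escher can match the lowercase term escher.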
Entity classes decorated with the Indexed attribute are included as documents in the Lucene index. The remaining attributes determine which properties of these entities are included in the document and how that data is stored. NHibernate Search automatically adds a _hibernate_class field, which stores the entity type. Each searchable entity must have a field or property decorated with the DocumentId attribute. Its value is stored in the Id field and is used to maintain the relationship between entities and documents. In our case, the Id property on Entity is used.
To make these documents useful for searching, we include additional data using the Field attribute. For keyword searches, we've included the tokenized name and description of every product, and the tokenized author of every book. We've also included the ISBN of every book, but chose not to tokenize it because a partial ISBN match is useless. UnitPrice has no Field attribute, so it is not included in the index.
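The following sketch uses plain dictionaries to model roughly what a Book document looks like under the attributes above, and it shows why the ISBN is left untokenized. ToyBookDocument and its Analyze helper are hypothetical. The "Eg.Core.Book" value written to _hibernate_class is an assumed format, and NHibernate Search's real field encoding may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical model of a Book document: field name -> indexed terms.
// Field names mirror the mapping attributes; the encoding is a simplification.
public static class ToyBookDocument
{
    // Simplified analyzer: lowercase and split on anything that isn't a letter or digit.
    public static List<string> Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
            .Where(t => t.Length > 0)
            .ToList();

    public static Dictionary<string, List<string>> Build(
        Guid id, string name, string description, string author, string isbn)
    {
        return new Dictionary<string, List<string>>
        {
            // Added automatically: the entity type (value format assumed).
            ["_hibernate_class"] = new List<string> { "Eg.Core.Book" },
            // [DocumentId]: links the document back to its database row.
            ["Id"] = new List<string> { id.ToString() },
            // [Field]: tokenized into terms.
            ["Name"] = Analyze(name),
            ["Description"] = Analyze(description),
            ["Author"] = Analyze(author),
            // [Field(Index = Index.UnTokenized)]: the whole value is a single term.
            ["ISBN"] = new List<string> { isbn }
            // UnitPrice has no [Field] attribute, so it never reaches the index.
        };
    }

    public static bool Matches(
        Dictionary<string, List<string>> doc, string field, string term) =>
        doc.TryGetValue(field, out var terms) && terms.Contains(term);

    public static void Main()
    {
        var doc = Build(Guid.NewGuid(),
            "Gödel, Escher, Bach: An Eternal Golden Braid",
            "exploring the patterns and symbols in the thinking of M.C. Escher",
            "Douglas Hofstadter",
            "978-0465026562");
        Console.WriteLine(Matches(doc, "Author", "hofstadter"));    // True
        Console.WriteLine(Matches(doc, "ISBN", "978"));             // False
        Console.WriteLine(Matches(doc, "ISBN", "978-0465026562"));  // True
        Console.WriteLine(Matches(doc, "UnitPrice", "22.95"));      // False
        // Tokenized, the ISBN would split into "978" and "0465026562",
        // and "978" would then match every ISBN-13 with that prefix.
        Console.WriteLine(string.Join(" | ", Analyze("978-0465026562")));  // 978 | 0465026562
    }
}
```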
The SearchConfiguration class is responsible for building an NHibernate configuration, adding the necessary NHibernate Search settings to it, and building an NHibernate session factory.
The Search.CreateFullTextSession method wraps a standard NHibernate session and returns an IFullTextSession. These sessions behave as normal NHibernate sessions, but provide additional methods for creating full-text queries against the Lucene index. The session's CreateFullTextQuery method takes a Lucene query, in either string or query object form, and returns an IFullTextQuery. This interface extends the familiar NHibernate IQuery interface used for HQL and SQL queries. When we call List or UniqueResult, the query is executed against our Lucene index. For example, the query in our GetEscherProducts method searches Lucene for documents whose Description field contains the term escher. This query returns two results: the GEB book and the M. C. Escher poster. The IDs of these search results are gathered up and used to build a SQL query similar to the following:
    SELECT this_.Id as Id0_0_,
           this_.Name as Name0_0_,
           this_.Description as Descript4_0_0_,
           this_.UnitPrice as UnitPrice0_0_,
           this_.Director as Director0_0_,
           this_.Author as Author0_0_,
           this_.ISBN as ISBN0_0_,
           this_.ProductType as ProductT2_0_0_
    FROM Product this_
    WHERE (this_.Id in ('5933e3ba-3092-4db7-8d19-9daf014b8ce4' /* @p0 */,
                        '05058886-8436-4a1d-8412-9db1010561b5' /* @p1 */))
This database query is fast because it filters on the primary key. The Lucene query is fast because the index was designed specifically for that kind of search. Together, they offer the potential for large performance and functionality gains over the limited full-text search capabilities of most relational databases.
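The two-phase lookup described above can be sketched without NHibernate at all. In this hypothetical example, a dictionary keyed by term stands in for the Lucene index, and a dictionary keyed by Id stands in for the Product table. We assume that restricting results to the requested entity type works roughly like a filter on the _hibernate_class field. The class and method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the mapped entities.
class Product
{
    public Guid Id;
    public string Name;
    public decimal UnitPrice;
}

class Book : Product { }

static class TwoPhaseLookup
{
    // Phase 1 stand-in (Lucene): term -> hits, each carrying the document's Id
    // and the entity type recorded in its _hibernate_class field.
    public static IEnumerable<Guid> SearchIndex<T>(
        Dictionary<string, List<(Guid Id, Type EntityType)>> index, string term)
    {
        if (!index.TryGetValue(term, out var hits))
            return Enumerable.Empty<Guid>();
        // Keep hits of the requested type or one of its subclasses.
        return hits.Where(h => typeof(T).IsAssignableFrom(h.EntityType))
                   .Select(h => h.Id);
    }

    // Phase 2 stand-in (SQL): "WHERE Id IN (...)" becomes primary-key lookups.
    public static List<T> Query<T>(
        Dictionary<string, List<(Guid Id, Type EntityType)>> index,
        Dictionary<Guid, Product> table, string term) where T : Product
    {
        return SearchIndex<T>(index, term).Select(id => (T)table[id]).ToList();
    }

    static void Main()
    {
        var geb = new Book { Id = Guid.NewGuid(), Name = "Gödel, Escher, Bach", UnitPrice = 22.95M };
        var poster = new Product { Id = Guid.NewGuid(), Name = "Ascending and Descending", UnitPrice = 7.95M };
        var table = new Dictionary<Guid, Product> { [geb.Id] = geb, [poster.Id] = poster };
        var index = new Dictionary<string, List<(Guid Id, Type EntityType)>>
        {
            ["escher"] = new List<(Guid Id, Type EntityType)>
            {
                (geb.Id, typeof(Book)),
                (poster.Id, typeof(Product))
            }
        };
        Console.WriteLine(Query<Product>(index, table, "escher").Count);  // 2
        Console.WriteLine(Query<Book>(index, table, "escher").Count);     // 1
    }
}
```

This mirrors the recipe's output. Searching Product returns both the book and the poster, while searching Book returns only the book.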
This is just the most basic example of what we can do with NHibernate Search. We can also choose to store the original value of a field in the document. This is useful when we want to display Lucene query results without querying the SQL database. Additionally, Lucene has many more features, such as search-term highlighting and spell-checking. Although Lucene is a very capable document database, remember that it is not relational. There is no support for relationships or references between documents stored in a Lucene index.
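Here is a toy illustration of the difference between indexed and stored values. The ToyDocument and StoredFieldDemo types are hypothetical. NHibernate Search makes this choice per field through the Field attribute, but treat the exact setting name (for example, Store = Store.Yes) as an assumption to verify against your version.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical toy document: searchable terms plus, separately, stored original values.
class ToyDocument
{
    public HashSet<string> DescriptionTerms { get; } = new HashSet<string>();
    public Dictionary<string, string> Stored { get; } = new Dictionary<string, string>();
}

static class StoredFieldDemo
{
    public static ToyDocument MakeDocument(string name, decimal unitPrice, string description)
    {
        var doc = new ToyDocument();
        // Indexed but not stored: only lowercase terms survive, so the
        // original description text cannot be read back from the index.
        foreach (var word in description.ToLowerInvariant().Split(' '))
            doc.DescriptionTerms.Add(word);
        // Stored: original values kept verbatim for display.
        doc.Stored["Name"] = name;
        doc.Stored["UnitPrice"] = unitPrice.ToString(CultureInfo.InvariantCulture);
        return doc;
    }

    // Search on the indexed terms, then display only the stored values.
    public static List<string> SearchAndDisplay(List<ToyDocument> index, string term) =>
        index.Where(d => d.DescriptionTerms.Contains(term))
             .Select(d => d.Stored["Name"] + ": " + d.Stored["UnitPrice"])
             .ToList();

    static void Main()
    {
        var index = new List<ToyDocument>
        {
            MakeDocument("Ascending and Descending", 7.95M, "Poster of famous Escher print"),
            MakeDocument("Technical Writing", 143.73M, "College text")
        };
        // Both the search and the display use only the index; no SQL query is needed.
        foreach (var line in SearchAndDisplay(index, "escher"))
            Console.WriteLine(line);  // Ascending and Descending: 7.95
    }
}
```

The trade-off is index size. Every stored value is kept verbatim alongside the terms, so it is usually reserved for the few fields a results list actually displays.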