Apache Solr

Apache Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene. Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via JSON, XML, CSV or binary over HTTP. You query it via HTTP GET and receive JSON, XML, CSV or binary results.
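
To make the indexing and querying flow above concrete, here is a minimal Python sketch that only builds the HTTP requests and does not send them. It assumes a Solr instance on its default port (localhost:8983) and a hypothetical collection named "datalake"; the document fields are likewise made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local Solr on its default port
COLLECTION = "datalake"                   # hypothetical collection name


def build_index_request(docs, base=SOLR_BASE, collection=COLLECTION):
    """Build (but do not send) a POST that indexes a list of JSON documents."""
    url = f"{base}/{collection}/update?{urlencode({'commit': 'true'})}"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_url(query, rows=10, base=SOLR_BASE, collection=COLLECTION):
    """Build the HTTP GET URL for a search that returns JSON results."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base}/{collection}/select?{urlencode(params)}"


# Hypothetical document; the field names are illustrative only.
index_request = build_index_request([{"id": "doc-1", "title": "Data Lake overview"}])
query_url = build_query_url("title:lake")
```

Against a running server, the request could be sent with urllib.request.urlopen(index_request), and the query URL fetched the same way; both steps are left out here so the sketch runs without a Solr instance.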

The following features, taken from http://lucene.apache.org/solr/features.html, make Solr a strong fit for the search capability that we are looking for in our Data Lake implementation:

  • Advanced and optimized full-text search: Powered by Lucene's advanced matching and searching capability
  • Capable of handling high-volume traffic
  • Standards-based open interfaces: XML, JSON, and HTTP; because Solr follows these standards, applications are easy to code and easy to maintain
  • Comprehensive administration interfaces: Built-in responsive administrative user interface
  • Easy monitoring: Publishes various metrics via Java Management eXtensions (JMX)
  • Highly scalable and fault-tolerant: Uses Apache ZooKeeper internally, which makes it easy to scale out and distribute
  • Flexible with adaptable configuration
  • Near real-time indexing: Uses Lucene’s near real-time indexing capability
  • Extensible with plugin architecture: Ships with packaged plugins/extensions, and custom ones are easy to create as needed
  • Support for both schema and schema-less documents
  • Faceted search and filtering capability
  • Capable of geospatial search: Location based search features built-in
  • Highly configurable text analysis: Built-in support for many languages, along with other text analysis tools
  • Configurable and extensible caching
  • Built-in security: SSL, authentication and role-based authorization
  • Diverse and advanced storage options
  • Capable of rich document parsing: Apache Tika is built in, making it easy to index rich content such as PDF, Word, and so on
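
As an illustration of the faceted search and filtering feature in the list above, the following Python sketch builds a select URL that uses Solr's facet, facet.field, and fq request parameters. The host, the "datalake" collection, and the field names (source, format, year) are assumptions made for this example, and no request is actually sent.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local Solr on its default port
COLLECTION = "datalake"                   # hypothetical collection name


def build_faceted_query_url(query, facet_field, filters=(),
                            base=SOLR_BASE, collection=COLLECTION):
    """Build a select URL with one facet field and any number of filter queries."""
    params = [
        ("q", query),
        ("facet", "true"),             # turn faceting on
        ("facet.field", facet_field),  # count matching documents per value of this field
    ]
    # Each fq narrows the result set without affecting relevance scoring.
    params += [("fq", f) for f in filters]
    params.append(("wt", "json"))
    return f"{base}/{collection}/select?{urlencode(params)}"


# Hypothetical fields: facet on "source", keep only PDF documents from 2017.
faceted_url = build_faceted_query_url(
    "*:*", "source", filters=["format:pdf", "year:2017"]
)
```

A list of tuples is used instead of a dict so that fq can appear more than once in the query string.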

These are some of the features to evaluate when considering this technology for your specific use cases.
