Index

A

Abstract Window Toolkit.
    See AWT.
access control list.
    See ACL.
ACL
Activity Monitor process monitor
administration interface
Adobe Flash, extracting text from
Adobe PDF, extracting text from.
    See also PDF.
AFP
  used for indexing
AgoFilterBuilder
AllDocCollector
AlreadyClosedException
analysis, Snowball, supported languages.
    See also indexing, analysis.
analytics interface
  Google Analytics
  Lucene-specific metrics
analyzer, provided during indexing
AnalyzerDemo
AnalyzerUtils
  displayTokens
  displayTokensWithFullDetails
  displayTokensWithPositions
  tokensFromAnalysis
AnalyzingQueryParser
Another Tool for Language Recognition.
    See ANTLR.
Ant, building Lucene.
    See also Apache Ant.
ANTLR
Apache Ant
  preparing to use
  to build contrib modules
Apache Commons
  columnar formatting
  Digester, indexing using
Apache Jakarta
Apache JMeter
Apache POI project
Apache Software Foundation
Apache Software License
Apache subversion instance
Apache Tika.
    See Tika.
Aperture
  open source project
Apple Mac OS X, search feature
AR Archives, extracting text from
ArabicAnalyzer
Aroush, George
ASCIIFoldingFilter
Asian language analysis
Attribute
AttributeSource
  addAttribute
  analysis
  captureState
  restoreState
audio formats, extracting text from metadata
AutoDetectParser
  Tika class
Autonomy
AWT

B

backward compatibility.
    See Version.
BalancedMergePolicy
BalancedSegmentMergePolicy
Balmain, David
Beagle
benchmark, OpenIndex
Berkeley DB, storing index
BerkeleyDBJESearcher
Bialecki, Andrzej
Bobo Browse
  beyond simple faceting
  integration with Zoie
  Runtime FacetHandlers
  sorting
BoboBrowser
BoboIndexReader
BodyContentHandler
BookLinkCollector
BooksLikeThis
  using MoreLikeThis
BooleanClause
  Occur
    MUST
    MUST_NOT
    SHOULD
BooleanFilter
BooleanQueryBuilder
BooleanScorer
BoostingQuery
  negativeQuery
  positiveQuery
boosts
BrazilianAnalyzer
Browsable
Browse Engine, denormalization
BrowseFacet
BrowseHit
BrowseRequest
  setFilter
BrowseResult
  getFacets
  getHits
BrowseSelection
Builder
BulletinPayloadsAnalyzer
BulletinPayloadsFilter
BZIP2 files, extracting text from

C

C#, tokenizing
C++, tokenizing
CachedFilter
caching
  field values
  filter
CachingSpanFilter
CachingTokenFilter
  used during highlighting
CachingWrapperFilter
CartesianTierPlotter
Cascading Style Sheets.
    See CSS.
Catasta, Michele
catchall field, for searching multiple fields
categorizing documents, using term vectors
CellConjunctionScorer
CellDisjunctionScorer
CellQuery
CellReqExclScorer
CellScorer
ChainedFilter
  combining with AND
  combining with ANDNOT
  combining with OR
  combining with XOR
  security filter example
chaining filters
charades
CharFilter
CharReader
CharStream
CharTokenizer
Chinese analysis
Chinese, Japanese, and Korean.
    See CJK.
ChineseAnalyzer
ChineseDemo
ChineseTest
CIFS.
    See Samba file system.
CJK, analysis of
CJKAnalyzer
client-server port definition
CloseIndex benchmark task
CLucene
  API compatibility
  supported platforms
  Unicode support
Collator, used for sorting String fields
Collector
  acceptsDocsOutOfOrder
  collect
  custom
  setNextReader
  setScorer
  using field cache
common errors
Comparable
Compass
  denormalization
ComplexPhraseQueryParser
compound index, creating
CompressionTools
ConcurrentMergeScheduler
  setMaxThreadCount
ConcurrentModificationException
ConstantScoreQuery
content
  acquiring
  dividing into shards
  raw, extracting for documents
ContentHandler
ContentSource
contrib modules
  introduction
  spatial search
coordination, query term
CorePlusExtensionsParser
CorruptIndexException
CPIO Archives, extracting text from
createLineFile.alg
CreateSpellCheckerIndex
CreateThreadedIndexTask
CSS
  in highlighting
CustomQueryParser
CustomScoreQuery
  getCustomScoreProvider
Cutting, Doug
  relevant work
CzechAnalyzer

D

database
  primary key
  storing index inside Berkeley DB
DatabaseConfig
DateField, dateToString
DateFilter, within ChainedFilter
DateFormat, SHORT
DateRecognizerSinkTokenizer
DateTools
debugging queries
DefaultEncoder
DefaultSimilarity
Delbru, Renaud
DeletionPolicy
denormalization
DERI
  Sindice.com search engine
Dictionary
Digester.
    See also Apache Commons Digester.
  addCallMethod
  addObjectCreate
  addSetNext
  addSetProperties
DigesterXMLDocument
Digg
Digital Enterprise Research Institute.
    See DERI.
DirContentSource
Directory implementations
  FileSwitchDirectory
  MMapDirectory
  NIOFSDirectory
  RAMDirectory
  SimpleFSDirectory
directory in Berkeley DB
DirectSolrConnection
DisjunctionMaxQuery
  tie-breaker
DistanceComparatorSource
DistanceQueryBuilder
DistanceSortSource
DocIdBitSet
DocIdSet
Document
  editing with Luke
  reuse
  setBoost
document type definition.
    See DTD.
documentation
documents and fields
DOMUtils
Donovan, Aaron
downloading Lucene
Droids
DSight, denormalization
DTD
DuplicateFilter
DutchAnalyzer
dynamic fragmenting vs. highlighting

E

EdgeNGramFilter
EdgeNGramTokenizer
Edit distance.
    See Levenshtein distance.
Elastic search, sharding and replication
Elschot, Paul
encoding UTF-8
entitlements, definition
EnvironmentConfig
EnwikiContentSource
Eventful
Excel.
    See Microsoft Excel.
Explanation
Extensible Hypertext Markup Language.
    See XHTML.
Extensible Stylesheet Language.
    See XSL.

F

FacetAccessible
faceted search
  Bobo Browse
  definition
FacetHandler
FacetSpec
  setMaxCount
  setMinHitCount
  setOrderBy
FastVectorHighlighter
  compared to Highlighter
Ferret
field cache
  DEFAULT
  memory usage
  per segment readers
  setInfoStream
  used by sorting
  used for sorting
field options
  combinations
  compressing fields
  indexing
    ANALYZED
    ANALYZED_NO_NORMS
    NO
    NOT_ANALYZED
    NOT_ANALYZED_NO_NORMS
  sorting
  storing
    NO
    YES
  term vectors
    NO
    WITH_OFFSETS
    WITH_POSITIONS
    WITH_POSITIONS_OFFSETS
    YES
FieldCacheRangeFilter
FieldCacheSource
FieldCacheTermsFilter
FieldComparator
FieldDocs
FieldMaskingSpanQuery
FieldNormModifier
FieldQuery
FieldScoreQuery
FieldSelector
  accept
  loading only specified fields
  specify fields by set
  stopping after first field
  time savings
FieldSelectorResult
  LAZY_LOAD
  LOAD
  LOAD_AND_BREAK
  LOAD_FOR_MERGE
  NO_LOAD
  SIZE
  SIZE_AND_BREAK
FieldSortedTermVectorMapper
file descriptors, finding the limit
FileNotFoundException over remote file systems
FileSwitchDirectory
FilteredDocIdSet
  match
FilteredQuery
filtering token.
    See TokenFilter.
finding similar documents using termvectors
FlagsAttribute
Flash.
    See Adobe Flash.
Formatter
FragmentsBuilder
FrenchAnalyzer
frequency factor formula
fsync
Fuller, Robert
function queries
  boosting by recency
  using field cache
FuzzyLikeThisQuery
FuzzyQuery
  formula
  minimumSimilarity
  prohibiting

G

GermanAnalyzer
Glouser, Grant
Google Analytics
Google Enterprise Connector Manager
GradientFormatter
Grails search plugin
GreekAnalyzer
Grub
GZIP compression, extracting text from

H

Hadoop, creation of
Harwood, Mark
hasDeletions
Hatcher, Erik
Heritrix
Hibernate Search, denormalization
hierarchical organizational schemes
HighFreqTerms
highlighting
  query terms
  using CSS
  vs. dynamic fragmenting
HighlightIt
Hoschek, Wolfgang
HTML
  cookie
  extracting text from
  meta tag
  parsing
HtmlParser
HTTP headers, indexing Last-Modified header
HTTP request, content type
HttpServletRequest
Humphrey, Marvin

I

I18N.
    See internationalization.
IDF
images, extracting text from metadata
index structure, converting
index, inverted
IndexCommit
  getUserData
IndexDeletionPolicy
  example usage
Indexer program
IndexFiles
IndexHTML
indexing classes
IndexMergeTool
IndexReaderDecorator
IndexReaderFactory
IndexReaderWarmer
IndexSplitter
IndexWriter
information
  explosion, dealing with
  overload
  specific, locating quickly
information retrieval.
    See IR.
InputStream
INSO, filters
Installing Lucene
InstantiatedIndex
InstantiatedIndexWriter
intelligent agent, creating
internationalization
InvalidTokenOffsetException
inverse document frequency.
    See IDF.
inverted index.
    See index, inverted.
InvIndexer
IR
  definition
  library vs. search engine
ISYS file readers
iTunes search feature

J

J2ME
JaroWinkler, distance metric for spell correction
Java 2 Micro Edition.
    See J2ME.
Java C Compiler.
    See JCC.
Java class files, extracting text from
Java JAR files, extracting text from
Java Management Extensions.
    See JMX.
Java Native Interface.
    See JNI.
Java Runtime Environment.
    See JRE.
javac, compile with UTF-8 encoding
JavaCC, building Lucene
JavaServer Page.
    See JSP.
JCC
JConsole used by Zoie
JEDirectory
Jetty
JFlex
  building Lucene
  usage in SIREn
JMX
  used by Zoie
JNI
Jones, Tim
JRE
JSP
Jython

K

Katta, sharding and replication
KeepOnlyLastCommitDeletionPolicy
KeyView filters
keyword analyzer
KeywordAnalyzer
KeywordTokenizer
KinoSearch
  differences vs Lucene
Krugle
  enterprise appliance
Krugle.org
Krugler, Ken
KStem

L

language detection
Last.fm
lemmatization
LengthFilter
letter ngrams used by spellchecker
LetterTokenizer
Levenshtein
  distance
  distance metric for spell correction
LineDocSource
LinkedIn
LoadFirstFieldSelector
local wrapper port, definition
LockFactory
locking
  during indexing
  write.lock file
LockObtainFailedException
LockStressTest
LockVerifyServer
LogByteSizeMergePolicy
  setMaxMergeDocs
  setMaxMergeMB
  setMergeFactor
  setMinMergeMB
LogDocMergePolicy
LowerCaseFilter
LowerCaseTokenizer
lowercasing, order may matter
lsof
Lucene ports
Lucene.Net
  API compatibility
  index compatibility
  performance
LUCENE_24
LUCENE_29
Lucy
Luke
  Analyzer Tool
  browsing by term
  browsing term vectors
  Custom Similarity
  document browsing
  editing documents
  Hadoop Plugin
  indexing file view
  Overview tab
  scripting with JavaScript
  search explanation
  searching
  searching with QueryParser
  viewing synonyms
  viewing term statistics
LuSQL, denormalization

M


Mannix, Jake
MAP
MapFieldSelector
MappingCharFilter
MatchAllDocsQuery
  used for browsing facets
Maven 2, used by Tika
maxDoc vs. numDocs
MaxFieldLength
  UNLIMITED
  UNLIMITED or LIMITED
MD5, reducing field cache memory usage
mean average precision.
    See MAP.
mean reciprocal rank.
    See MRR.
MemoryIndex
mergeFactor
  performance impact
MergePolicy
  avoiding large segments
MergeScheduler
merging
  LogByteSizeMergePolicy
  LogDocMergePolicy
  waiting for merges to finish
Metadata, Tika class
Metaphone
Microsoft Excel, extracting text from
Microsoft Office 2007, extracting text from
Microsoft Outlook, extracting text from
Microsoft PowerPoint, extracting text from
Microsoft Visio, extracting text from
MIDI files, extracting text from
Miller, George and WordNet
MMapDirectory
Montezuma
MoreLikeThis
MoreLikeThisQuery
MP3 audio, extracting text from tags
MRR
MultiFieldQueryParser
  default operator
  interactions with Analyzer
multifile index, creating
MultiPassIndexSplitter
MultiPhraseQuery
  QueryParser
  slop
MultiSearcher
multithreaded searching.
    See ParallelMultiSearcher.

N

native port, definition
native2ascii, Java tool
NativeFSLockFactory
near-real-time reader
near-real-time search
  avoiding commit
  introduction
  reducing turnaround time
Networked File System.
    See NFS.
newBooleanQuery
newFuzzyQuery
newMatchAllDocsQuery
newMultiPhraseQuery
newPhraseQuery
newPrefixQuery
newRangeQuery
newTermQuery
newWildcardQuery
NFS
  sharing index over
  used for indexing
NGramTokenizer
NIOFSDirectory
NoLockFactory
non-English language analysis
normalization
  field length
  query
NullFragmenter
numDocs vs. maxDoc
numeric range queries
NumericField
  filtering during searching
  precisionStep
  setDoubleValue
  setIntValue
  setLongValue
  sorting
NumericPayloadTokenFilter
NumericRangeFilter
NumericRangeQuery
  created by QueryParser
  creation from QueryParser
  precisionStep
NutchDocumentAnalyzer

O

O’Leary, Patrick
OfficeParser
OffsetAttribute
  endOffset
OLE
Open Office, extracting text from
open source software, judging success
OpenBitSet, used by Filter
OpenDocument files, extracting text from
OpenSolaris, open file limit
optimize
Oracle/Lucene integration, denormalization
OS, I/O cache
Outlook.
    See Microsoft Outlook.
OutOfMemoryError
OutOfMemoryException
OutputStream

P

paging through results
ParallelMultiSearcher
Parr, Terence
ParseContext
ParseException
parsing
  query expressions
  QueryParser method
  versus analysis.
    See QueryParser.
ParsingReader
partitioning indexes
PayloadAttribute
PayloadHelper
PayloadNearQuery
payloads
  access via TermPositions
  and SpanQuery
  constructors
  during analysis
  during searching
  example uses
  usage in SIREn
  used by SIREn
PayloadTermQuery
PDF.
    See also Adobe PDF.
PDFBox
PDFParser
PerFieldAnalyzerWrapper
Perl, tokenizing
per-segment searching, field cache
PersianAnalyzer
PHP Bridge
PhraseQuery
  contrasted with SpanNearQuery
  converting to SpanNearQuery
  forcing term order
  from QueryParser
  multiple terms
  scoring
  slop
  slop factor
  with synonyms
PipedReader
PipedWriter
plain text, detecting character set
PLucene
Porter stemmer.
    See Porter stemming algorithm.
Porter stemming algorithm
Porter, Dr. Martin
PorterStemFilter
PositionalPorterStopAnalyzer
PositionBasedTermVectorMapper
PositionIncrementAttribute
  setPositionIncrement
positionIncrementGap
PowerPoint.
    See Microsoft PowerPoint.
PrecedenceQueryParser
precision, definition
PrefixFilter
PrefixQuery
PrintStream
probabilistic model
Process Monitor
properties file, encoding
ps Unix process monitor
pure Boolean model
PyLucene
  API compatibility
Python, tokenizing

Q

queries, built-in
query expression.
    See QueryParser.
QueryAutoStopWordAnalyzer
QueryBuilder
QueryBuilderFactory
querying
QueryNodeProcessor
QueryTemplateManager
QueryTermScorer
QueryWrapperFilter

R

RAID array
RAMDirectory
RangeFilter
RDF
  creating the Web of Data
  definition
  triplestores
ReadTokens task
RecencyBoostingQuery
RegexFragmenter
RegexQuery
regular expressions.
    See WildcardQuery.
relational database
relevance
remote file systems
Remote Method Invocation.
    See RMI.
remote procedure call.
    See RPC.
remote searching
RemoteSearchable
RemoteSearcher
removing common terms.
    See stop words.
Representational State Transfer.
    See REST.
Resource Description Framework.
    See RDF.
REST
ReutersContentSource
reverse native port, definition
ReverseStringFilter
Rich Text Format.
    See RTF.
robocopy, for hot backups of an index
RPC
RSolr
rsync, for hot backups of an index
RTF
  extracting text from
Ruby, tokenizing
RussianAnalyzer

S

Samba file system, used for indexing
SAX
  parsing using
scaling
  index replication
  index sharding
schema, flexible
SCM
SCMI
score
ScoreCachingWrapperSource
ScoreDoc
ScoreOrderFragmentsBuilder
scoring
  formula
  raw score
scrolling.
    See paging.
search model
  probabilistic
  pure Boolean
  vector space
search within search, using Filters
Searchable
SearchClient
Searcher program
SearcherManager
  get
  maybeReopen
  release
  warm
SearchFiles
searching classes
SearchServer
security filtering
  mixed static and dynamic
SegmentReader
SegmentTermEnum, next
Sekiguchi, Koji
Semantic Information Retrieval Engine.
    See SIREn.
semantic web
semistructured data
SerialMergeScheduler
SetBasedFieldSelector
setMaxBufferedDeleteTerms
setRAMBufferSizeMB
shard, definition
Similarity
  improving default relevance
  lengthNorm
similarity between documents.
    See term vectors.
similarity scoring formula
Simple API for XML.
    See SAX.
SimpleDateFormat
SimpleFragmenter
SimpleFSDirectory
SimpleFSLockFactory
SimpleHTMLEncoder
SimpleHTMLFormatter
SimpleSpanFragmenter
SingleInstanceLockFactory
sinks
SinkTokenizer
SinusoidalProjector
SIREn
  benchmarks
  BooleanQuery performance compared to Lucene
  data model
  data preparation
  postings format compared to Lucene
  searching entities
  semistructured search
SirenPayloadFilter
slop
  factor defined
  with MultiPhraseQuery
  with SpanNearQuery
SmartChineseAnalyzer
Snowball stemmer
SnowballAnalyzer
solid-state disk.
    See SSD.
SolPerl
SolPHP
SolPython
Solr
  creating analysis chain
  Ruby response format
  sharding and replication
  SIREn integration
Solr.pm
Solr.QParser
Solr.QParserPlugin
SortedTermVectorMapper
SortField
  types
SortingExample
Soundex.
    See Metaphone.
source code management interface.
    See SCMI.
source code management.
    See SCM.
span queries
  access to payloads
  combining
  dumpSpans method
  excluding matches
  matching near one another
  matching near the field start
  matching single term
  phrase within phrase matching
  QueryParser
  turning into a filter
SpanFirstQuery
SpanGradientFormatter
SpanNearQuery
  contrasted with PhraseQuery
  deriving from PhraseQuery
  inOrder flag
  slop
SpanNotQuery
SpanOrQuery
SpanQuery
  aggregating
  and QueryParser
  getSpans
  visualization utility
SpanQueryFilter
  bitSpans
SpanRegexQuery
SpanScorer
SpanTermQuery
SPARQL query language
SPARQLParser
SPARQLParserPlugin
SPARQLQueryAnalyzer
SpecialsAccessor
SpecialsFilter
SpellChecker
  setAccuracy
  suggestSimilar
Spencer, David
spider.
    See web crawler.
Spolsky, Joel
Spotlight search
Spring
Spring-RPC
SSD
Stale NFS file handle exception
StandardFilter
StandardQueryParser
StandardTokenizer
Stellent document filters.
    See INSO filters.
stemmers, SnowballAnalyzer family
stemming analyzer
stop words
  default
  removing
StopAnalyzer
StopFilter
  setEnablePositionIncrements
StopWordFilter
Store, YES
stored fields, custom loading
String.compareTo, compares by UTF16 code unit
StringDistance, getDistance
StringUtils
swappiness, controlling swapping on Linux
SweetSpotSimilarity
SynLookup
SynonymAnalyzer
SynonymAnalyzerViewer
SynonymEngine
Syns2Index
System, nanoTime

T

Tan, Kelvin
TAR Archives, extracting text from
tar, for hot backups of an index
TeeSinkTokenFilter
TeeTokenFilter
Term
TermAttribute
TermFreqVector
TermPositions
TermPositionVector
TermRangeFilter
  includeLower
  includeUpper
  open-ended ranges
  with caching
TermRangeQuery
  created by QueryParser
terms, vs. tokens
TermsFilter
  addTerm
TermVectorAccessor
TermVectorMapper
  isIgnoringOffsets
  isIgnoringPositions
  map
  setDocumentNumber
  setExpectations
ThaiAnalyzer
The Grinder load testing tool
ThreadedIndexWriter
Tika
  alternatives
  built-in text extraction tool
  customizing parser selection
  getFileMetadata
  installing
  introduction
  limitations
  logical design
  metadata extraction
  modular design
  parse
  parser implementations
  using UNIX pipes
  utility class
TikaConfig
  getParsers
TikaException
TikaIndexer
TimeExceededException
TimeLimitingCollector
  limitations
Token
TokenFilter
  additional
  importance of order
  shingles
  splitting source code terms
TokenFilters, for creating payloads
tokenization, definition
Tokenizer
  additional
TokenOffsetPayloadTokenFilter
TokenRangeSinkTokenizer
TokenSources
  getAnyTokenStream
TokenStream
  architecture
  buffering
  incrementToken
  used for highlighting
TokenTypeSinkTokenizer
Tomcat, demo application
tool, Luke
top Unix process monitor
top, measuring page faults
TopDocs
TopFieldCollector
TopFieldDocs
TopScoreDocCollector
Toupikov, Nickolai
triplestore, searching the Web of Data
troubleshooting
truncation.
    See field truncation.
Tummarello, Giovanni
TupleAnalyzer
TupleQuery
  addClause
TupleScorer
TupleTokenizer
two-phased commit
TypeAsPayloadTokenFilter
TypeAttribute

U

UI novel, creating
unanalyzed fields, searching
Unicode
Unix, deletion of open files
URINormalisationFilter
user interface.
    See UI.
UTF-8

V

Vajda, Andi
value
ValueSource
ValueSourceQuery
van Klinken, Ben
van Rossum, Guido
Vector Space Model
VerifyingLockFactory
Version
Visio.
    See Microsoft Visio.
vmstat, measuring page faults

W

W3C
Wall, Larry
Wang, John
WAVE Audio, extracting text from sampling metadata
Web 3.0
web application
  CSS highlighting
  demo
web application server, thread pool
Web of Data
Wettin, Karl
WhitespaceAnalyzer
WhitespaceTokenizer
Wikipedia
  document source
  indexing
WikipediaTokenizer
WildcardQuery
  inefficiency
  prohibiting
Windows Explorer
Windows Server 2003, open file limit
Windows, deletion of open files
with payloads
Word.
    See Microsoft Word.
WordNet
  adding synonyms during analysis
  building synonym index
  example synonyms
WordNetSynonymEngine
write.lock
WriteLineDoc
Writer

X

XHTML
  used by Tika
XmlQueryParser
XSL
