Summarizing some easy recipes for the maintenance of an index

Some actions are useful for the ordinary maintenance of an index, and essential for testing it while we modify our configurations. These commands are worth saving for later reference, as they are commonly used.

Let's try to include some of them in a short list:

  • Adding a simple dummy document:
    >> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true&wt=json' -H 'Content-Type: text/xml' -d '<add><doc><field name="id">ID01</field><field name="text">Test Content</field></doc></add>'
    
  • Deleting a document by criteria:
    >> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>uid:00000000</query></delete>'
    
  • Extracting text and metadata from a file:
    >> curl -X POST 'http://localhost:8983/solr/pdfs/update/extract?extractOnly=true' -F '[email protected]'
    
  • Posting and indexing a file:
    >> curl -X POST 'http://localhost:8983/solr/pdfs/update/extract?commit=true' -F '[email protected]'
    
  • Saving the last uncommitted modifications to the index:
    >> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<commit />'
    
  • Ignoring the last uncommitted modifications to the index:
    >> curl -X POST 'http://localhost:8983/solr/pdfs/update' -H 'Content-Type: text/xml' --data-binary '<rollback />'
    
  • Optimizing the index:
    >> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<optimize />'
    
  • Cleaning the index:
    >> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>*:*</query></delete>'
    

This short list can be easily used as a quick cheat sheet for the most used operations when testing Solr; I'm sure you will use them many times while reading this book.
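
Since most of these recipes differ only in the XML body they send, they can be wrapped in a couple of small shell functions. The following is only a minimal sketch: the function names (delete_query_xml, solr_update) and the SOLR_URL and DRY_RUN variables are hypothetical conveniences introduced here, not part of Solr or curl. It assumes the same pdfs core on localhost used above, and it does not escape XML special characters in the query.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the update recipes; all names are illustrative.
# Assumption: the core lives at the same URL used in the recipes above.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/pdfs}"

# Build the XML body for a delete-by-query request.
# Note: the query is inserted as-is, without XML escaping.
delete_query_xml() {
  printf '<delete><query>%s</query></delete>' "$1"
}

# Post an XML command to the update handler, committing immediately.
# With DRY_RUN=1 the equivalent curl command is printed instead of executed.
# Because it always commits, it is not suitable for <rollback />.
solr_update() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf "curl -X POST '%s/update?commit=true' -H 'Content-Type: text/xml' --data-binary '%s'\n" "$SOLR_URL" "$1"
  else
    curl -X POST "$SOLR_URL/update?commit=true" \
      -H 'Content-Type: text/xml' --data-binary "$1"
  fi
}
```

After sourcing the file, setting DRY_RUN=1 and running solr_update "$(delete_query_xml 'uid:00000000')" prints the curl command that would be sent, which is a safe way to check a request before running it against a real index.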

In the following chapters we will go into more detail, step by step, exploring the main parts of the two XML files seen here.

Pop quiz

Q1. Where is the data actually saved?

  1. Under the core/index folder
  2. Under the core/index/data folder
  3. Under the core/data/index folder

Q2. What are the differences between enabling a field to be stored or indexed?

  1. A field stored is always indexed
  2. A field defined as indexed can be used for searches, while a stored one cannot
  3. A field defined as indexed can be used for searches, regardless of whether it is stored or not
  4. A field defined as stored can be returned in the output

Q3. How do we remove from the index only the documents whose author field contains the term Alighieri?

  1. Posting a document containing the text <delete><query>*:*</query></delete>
  2. Posting a document containing the text <delete><query><field name='author'>alighieri</field></query></delete>
  3. Posting a document containing the text <delete><query>author:alighieri</query></delete>

Q4. What can we see with SimpleTextCodec?

  1. The codec used for saving binary files
  2. The internal structure of an index
  3. The text saved in the index for a full-text search

Q5. Disable tokenization, restart and look again at the index, then index some more data again. Take a look at the SimpleTextCodec saved file; has the data been saved differently?

  1. There are no differences in the file
  2. There are more items in the file, one for each word
  3. There are more items in the file, one for each term

Q6. After cleaning or optimizing your index with one of the recipes provided at the end of the chapter, how does the index change?

  1. The number of segments in the core/data/index directory changes
  2. All the files in the core/data/index directory get deleted
  3. All the files in the core/index directory get deleted

Q7. How can we index more than one document?

  1. Writing a single XML file containing multiple documents for indexing them at once
  2. Writing multiple XML files, one for each document to be indexed
  3. Changing the configuration for an update handler

Q8. Is it possible to index a PDF file, adding custom metadata to the corresponding generated Solr document?

  1. Yes, using a parameter in the request sent to the /update handler
  2. No, all the metadata is extracted by Tika and we can't control it
  3. Yes, but only changing the configuration files for the Tika library