Summarizing some easy recipes for the maintenance of an index

Some actions are useful for the ordinary maintenance of an index, and essential for testing it while we modify our configurations. It is worth saving these commands for later reference, as they are commonly used.

Let's collect some of them in a short list:

Adding a simple dummy document:

>> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true&wt=json' -H 'Content-Type: text/xml' -d '<add><doc><field name="id">ID01</field><field name="text">Test Content</field></doc></add>'

Deleting a document by criteria:

>> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>uid:00000000</query></delete>'

Extracting text and metadata from a file (replace <path-to-file> with the path to your own document):

>> curl -X POST 'http://localhost:8983/solr/pdfs/update/extract?extractOnly=true' -F 'file=@<path-to-file>'

Posting and indexing a file (replace <path-to-file> with the path to your own document):

>> curl -X POST 'http://localhost:8983/solr/pdfs/update/extract?commit=true' -F 'file=@<path-to-file>'

Saving the last uncommitted modifications to the index:

>> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<commit />'

Ignoring the last uncommitted modifications to the index:

>> curl -X POST 'http://localhost:8983/solr/pdfs/update' -H 'Content-Type: text/xml' --data-binary '<rollback />'

Optimizing the index:

>> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<optimize />'

Cleaning the index:

>> curl -X POST 'http://localhost:8983/solr/pdfs/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>*:*</query></delete>'

This short list works as a quick cheat sheet for the most common operations when testing Solr; you will probably use them many times while reading this book.
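If you find yourself retyping these commands, one option is to wrap them in a few shell functions. The following is only a sketch under stated assumptions: the names SOLR_URL, DRY_RUN, xml_escape, solr_update, solr_add, solr_delete_by_query, solr_commit, and solr_rollback are hypothetical, and the URL is the pdfs core assumed throughout the recipes. With DRY_RUN=1 nothing is sent to Solr; the XML message is printed instead, so it can be inspected first.

```shell
#!/bin/sh
# Base URL of the update handler; assumes the pdfs core used in the recipes.
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr/pdfs/update}

# Escape the characters that are special in XML text.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Post an XML message to the update handler, or just print it when DRY_RUN=1.
solr_update() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$1"
  else
    curl -s -X POST "$SOLR_URL" -H 'Content-Type: text/xml' --data-binary "$1"
  fi
}

# usage: solr_add ID TEXT
solr_add() {
  solr_update "<add><doc><field name=\"id\">$(xml_escape "$1")</field><field name=\"text\">$(xml_escape "$2")</field></doc></add>"
}

# usage: solr_delete_by_query 'field:value'
solr_delete_by_query() {
  solr_update "<delete><query>$(xml_escape "$1")</query></delete>"
}

solr_commit()   { solr_update '<commit />'; }
solr_rollback() { solr_update '<rollback />'; }

# Dry run: print the messages instead of posting them.
DRY_RUN=1
solr_add ID01 'Test & Content'
solr_delete_by_query 'uid:00000000'
solr_commit
```

Because this wrapper does not append commit=true to the URL, changes should stay pending until solr_commit is called, and solr_rollback should discard them, unless automatic commits are enabled in your configuration.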

In the following chapters we will go into more detail, step by step, exploring the main parts of the two XML files seen here.
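Returning to the recipes above: after a delete or a cleanup it can help to check how many documents are left in the index. Here is a minimal sketch, assuming the core exposes the standard /select handler and that its JSON response carries a numFound value. The helper name num_found is hypothetical, and the extraction is a simple sed pattern match rather than a real JSON parser.

```shell
#!/bin/sh
# num_found: read a Solr JSON response on stdin and print its numFound value.
# Hypothetical helper; it uses a plain pattern match, not a JSON parser.
num_found() {
  sed -n 's/.*"numFound" *: *\([0-9][0-9]*\).*/\1/p'
}

# Example with a canned response, so the snippet runs without a server.
sample='{"responseHeader":{"status":0},"response":{"numFound":0,"start":0,"docs":[]}}'
printf '%s\n' "$sample" | num_found

# Against a running instance (assumes the pdfs core used in the recipes):
# curl -s 'http://localhost:8983/solr/pdfs/select?q=*:*&rows=0&wt=json' | num_found
```

With rows=0 the query returns no documents, only the count, which keeps the check cheap even on a large index.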

Pop quiz

Q1. Where is the data actually saved?

Under the core/index folder
Under the core/index/data folder
Under the core/data/index folder

Q2. What is the difference between defining a field as stored and defining it as indexed?

A stored field is always indexed
A field defined as indexed can be used for searches, while a stored one cannot
A field defined as indexed can be used for searches, regardless of whether it is stored or not
A field defined as stored can be returned in the output

Q3. How do we remove from the index only the documents whose author field contains the term Alighieri?

Posting a document containing the text <delete><query>*:*</query></delete>
Posting a document containing the text <delete><query><field name='author'>alighieri</field></query></delete>
Posting a document containing the text <delete><query>author:alighieri</query></delete>

Q4. What can we see with SimpleTextCodec?

The codec used for saving binary files
The internal structure of an index
The text saved in the index for a full-text search

Q5. Disable tokenization, restart, and index some more data. Then look again at the file saved by SimpleTextCodec; has the data been saved differently?

There are no differences in the file
There are more items in the file, one for each word
There are more items in the file, one for each term

Q6. After cleaning or optimizing your index with one of the recipes provided at the end of the chapter, how does the index change?

The number of segments in the core/data/index directory changes
All the files in the core/data/index directory get deleted
All the files in the core/index directory get deleted

Q7. How can we index more than one document?

Writing a single XML file containing multiple documents for indexing them at once
Writing multiple XML files, one for each document to be indexed
Changing the configuration for an update handler

Q8. Is it possible to index a PDF file, adding custom metadata to the corresponding generated Solr document?

Yes, using a parameter in the request sent to the /update handler
No, all the metadata is extracted by Tika and we can't control it
Yes, but only changing the configuration files for the Tika library
