Loading sample data

Now that we're acquainted with Solr and the commands involved in its day-to-day usage, let's populate it with data so that we can query it as needed. Solr ships with sample data under the example directory; we will use the film data in $solr_home/example/films for our queries.

Fire up the terminal and create a collection named films with 10 shards:

bin\solr create -c films -shards 10

Now, in $solr_home/example/films there is a file called films.json. Let's import it into our films collection. Based on your OS, run the appropriate command using either the post script or post.jar, as shown next.
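For reference, from $solr_home the import would look roughly like this on Linux/macOS, using the bin/post script:

bin/post -c films example/films/films.json

And on Windows, using post.jar (adjust the path to your own install directory):

java -Dc=films -Dauto -jar example\exampledocs\post.jar example\films\films.json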

Uh oh! It throws an error.

What could have gone wrong? If you check the logs from when we created the collection, you will see a warning like this:

Warning
Using _default configset. Data driven schema functionality is enabled by default, which is not recommended for production use.

To turn it off, use the following command:

curl http://localhost:8983/solr/films/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'

While creating the collection, we went with the _default config set. The _default config set does two things: it uses a managed schema, meaning the schema can be modified only through Solr's Schema API, and it doesn't require us to specify any field mappings up front, leaving the config set to guess them. This may seem advantageous at first because we don't have to tell Solr about any fields in advance and can adopt a schemaless approach: Solr creates fields on demand as it encounters documents.

Now, this very advantage has become a problem. If you open up films.json and check the name of the first film, you will see it is .45. Solr guesses this value as a Float and fixes the datatype of the name field as Float; the moment it encounters text in that field, it spits out an error. We end up in serious trouble, because we can't change a field's mapping once the index contains data.
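To make the guessing problem concrete, a document in films.json has roughly the following shape (the values apart from the film's name are illustrative here; the point is that the very first name Solr sees is .45, which happens to parse as a number):

{
  "name": ".45",
  "directed_by": ["Gary Lennon"],
  "genre": ["Thriller"],
  "initial_release_date": "2006-11-30"
}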

For this very reason, it is not recommended to go with the _default config set's data-driven schema in production. So, let's solve this issue by leveraging the Schema API to modify our schema definition.

Delete the films collection by hitting the REST endpoint:

http://localhost:8983/solr/admin/collections?action=DELETE&name=films
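The same call can be made from the terminal with curl (assuming Solr is running locally on the default port 8983); quoting the URL keeps the shell from interpreting the & in the query string:

curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=films"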

Create the collection again using the following command:

bin\solr.cmd create -c films -shards 10 -n schemaless

We pass -n so that the collection picks up the schemaless configuration instead of the previous _default configuration.

Now make a POST call with the following details, which basically tells Solr that the name field holds text rather than letting it auto-guess the type as Float:

End point

http://localhost:8983/solr/films/schema

Header

Content-Type: application/json

Body

{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}

 

You should get a successful response.

Now try hitting the import command again:
java -Dauto -Dc=films -jar post.jar E:\solr-7.1.0\solr-7.1.0\example\films\films.json

This time the import should succeed. Go to the browser and open http://localhost:8983/solr/films/select?q=*:*. You should see 1,100 records. You can similarly import CSV and XML files.
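You can also check the count from the terminal; a request like the following returns only the response header and the numFound count, without the documents themselves:

curl "http://localhost:8983/solr/films/select?q=*:*&rows=0"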

While importing a CSV file, you need to tell Solr that a field holding multiple values should be split, and what the separator is. If you check out films.csv, you will notice that genre and directed_by are such fields. In this case, our import command would be:

java -jar -Dc=films -Dparams="f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|" -Dauto example\exampledocs\post.jar example\films\*.csv
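On Linux/macOS, the equivalent call with the bin/post script would look something like this (the params string is quoted so the shell does not interpret the & and | characters):

bin/post -c films example/films/films.csv -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"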

This is how we can load unstructured data into Solr. Let's now look at how to load structured data.
