Case-insensitive

Case-insensitive lookups are a common use case for indexes. Before version 3.4, this was handled at the application level by storing a duplicate field with all-lowercase characters and indexing that field to simulate a case-insensitive index.
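The application-level workaround can be sketched in plain JavaScript. This is a minimal illustration rather than code from the book: the shadow field name name_lower and both helper functions are assumptions made for the example, and an in-memory array stands in for the books collection.

```javascript
// Hypothetical helper: store a lowercase copy of `name` next to the original.
// The field name `name_lower` is an assumption made for this example.
function withLowercaseName(book) {
  return { ...book, name_lower: book.name.toLowerCase() };
}

// An in-memory array stands in for the books collection.
const books = [
  { name: "Mastering MongoDB" },
  { name: "MASTERING MONGODB" },
  { name: "Learning Python" },
].map(withLowercaseName);

// Hypothetical lookup: lowercase the search term and match on the shadow field.
function findByNameIgnoreCase(docs, name) {
  const needle = name.toLowerCase();
  return docs.filter((doc) => doc.name_lower === needle);
}

console.log(findByNameIgnoreCase(books, "mastering mongodb").length); // 2
```

In a real deployment, the index would be built on the shadow field, and the application would have to keep that field in sync on every write, which is the overhead collation removes.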

Using the collation parameter, we can create case-insensitive indexes and even collections that behave as case-insensitive.

Collation in general allows users to specify language-specific rules for string comparison. A possible (but not the only) usage is for case-insensitive indexes and queries.

Using our familiar books collection, we can create a case-insensitive index on the name field like this:

> db.books.createIndex( { "name" : 1 },
                          { collation: {
                              locale : 'en',
                              strength : 1
                            }
                          } )

strength is the collation parameter that determines case sensitivity in comparisons. Strength levels follow the International Components for Unicode (ICU) comparison levels. The values it accepts are as follows:

Strength value	Description
1	Primary level of comparison. Comparison based on string value, ignoring any other differences such as case and diacritics.
2	Secondary level of comparison. Comparison based on primary level and if this is equal then compare diacritics (that is, accents).
3 (default)	Tertiary level of comparison. Same as level 2, adding case and variants.
4	Quaternary level. Limited to specific use cases: considering punctuation when levels 1-3 ignore it, or processing Japanese text.
5	Identical level. Limited to a specific use case: acting as a tie breaker.
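The first three levels can be tried out without a database, because JavaScript's built-in Intl.Collator follows the same ICU comparison levels. The sketch below is an illustration, not MongoDB code: the mapping from strength to the sensitivity option and the equalAtStrength helper are assumptions made for this example.

```javascript
// Approximate mapping from collation strength to Intl.Collator sensitivity.
// This mapping is an assumption for illustration; it is not a MongoDB API.
const SENSITIVITY_BY_STRENGTH = { 1: "base", 2: "accent", 3: "variant" };

// Hypothetical helper: do two strings compare as equal at the given strength?
function equalAtStrength(a, b, strength, locale = "en") {
  const collator = new Intl.Collator(locale, {
    sensitivity: SENSITIVITY_BY_STRENGTH[strength],
  });
  return collator.compare(a, b) === 0;
}

for (const strength of [1, 2, 3]) {
  console.log(
    strength,
    equalAtStrength("resume", "Resume", strength), // differs only in case
    equalAtStrength("resume", "résumé", strength)  // differs only in diacritics
  );
}
// 1 true true
// 2 true false
// 3 false false
```

Strength 1 treats all three spellings as equal, strength 2 separates the accented form, and strength 3 also separates the capitalized form.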

Creating the index with collation is not enough to get back case-insensitive results. We need to specify collation in our query as well:

> db.books.find( { name: "Mastering MongoDB" } ).collation( { locale: 'en', strength: 1 } )

If we specify the same collation in our query as in our index, then the index will be used.

We could specify a different level of collation as follows:

> db.books.find( { name: "Mastering MongoDB" } ).collation( { locale: 'en', strength: 2 } )

Here, we cannot use the index as our index has collation level 1 and our query looks for collation level 2.

If we don't use any collation in our queries, we will get results defaulting to level 3, that is, case-sensitive.

Indexes created in a collection that has a non-default collation will automatically inherit that collation.

If we create a collection with collation level 1 as follows:

> db.createCollection("case_sensitive_books", { collation: { locale: 'en_US', strength: 1 } } )

Then, the following index will also have collation strength: 1:

> db.case_sensitive_books.createIndex( { name: 1 } )

Default queries to this collection will also use collation strength: 1, that is, case-insensitive. To override this in a query, we need to specify a different level of collation or omit the strength part altogether. The following two queries will return case-sensitive, default collation level results in our case_sensitive_books collection:

> db.case_sensitive_books.find( { name: "Mastering MongoDB" } ).collation( { locale: 'en', strength: 3 } ) // default collation strength value
> db.case_sensitive_books.find( { name: "Mastering MongoDB" } ).collation( { locale: 'en' } ) // no value for strength, will reset to global default (3) instead of default for case_sensitive_books collection (1)
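The override rules can be summarized as a small model in plain JavaScript. This is a sketch of the behavior described here, not MongoDB's implementation: the effectiveCollation function and the DEFAULT_STRENGTH constant are hypothetical names introduced for this example.

```javascript
// Global default strength applied when a collation document omits `strength`.
const DEFAULT_STRENGTH = 3;

// Hypothetical model of which collation a query ends up using.
function effectiveCollation(collectionDefault, queryCollation) {
  if (queryCollation === undefined) {
    // No collation on the query: the collection default (if any) applies.
    return collectionDefault;
  }
  // An explicit collation replaces the collection default entirely, so a
  // missing strength falls back to 3 rather than to the collection's value.
  return { strength: DEFAULT_STRENGTH, ...queryCollation };
}

// Default collation of the case_sensitive_books collection created earlier.
const booksDefault = { locale: "en_US", strength: 1 };

console.log(effectiveCollation(booksDefault, undefined).strength);        // 1
console.log(effectiveCollation(booksDefault, { locale: "en" }).strength); // 3
```

The second call mirrors the query that passes only a locale: the strength resets to 3, so the comparison becomes case-sensitive even though the collection default is 1.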

Collation is a powerful and relatively new concept in MongoDB, so we will keep exploring it throughout different chapters.
