Securing Solr from prying eyes

Solr, by default, comes completely open. Anyone can make search requests, anyone can upload documents, anyone can access the administration interface, and anyone can delete data. However, it isn't difficult to lock down Solr for use in any kind of environment. We can do this by making use of the standard practices that you would apply to any kind of web application or server software.

Limiting server access

The single biggest thing you can do to secure Solr is to lock down who has access to the server. Using standard firewall techniques, you can control which IP addresses are allowed to connect to Solr on port 8983.

Unless you have very unusual needs, you won't expose Solr to the Internet directly; instead, users will access Solr through some sort of web application that in turn forwards requests to Solr, collects the results, and displays them to your users. By limiting the IP addresses that can connect to Solr to just those belonging to your web farm, you ensure that random Internet users and internal users can't mess with Solr.
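As a hedged sketch, on a Linux host the firewall restriction described above might look like the following iptables rules; the web-farm subnet (10.0.1.0/24) is purely illustrative and should be replaced with your own address range:

```shell
# Allow connections to Solr's port (8983) only from the web farm's
# subnet; 10.0.1.0/24 is a placeholder for your own address range.
iptables -A INPUT -p tcp --dport 8983 -s 10.0.1.0/24 -j ACCEPT

# Drop connections to Solr's port from everywhere else.
iptables -A INPUT -p tcp --dport 8983 -j DROP
```

Equivalent rules can be expressed with whatever firewall your platform provides; the point is that only the web tier can reach Solr directly.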

Note

If you lock down access via IP addresses, then don't forget that if you have external processes uploading content, those IP addresses need to be allowed as well.

Using IP addresses to control access is crude and basic; it doesn't help if someone is connecting to Solr from one of the valid IP addresses. Fortunately, Solr is just a WAR file deployed in a Servlet container, so you can use all of the capabilities of Servlet containers to control access. To limit access to /solr/update* and /solr/admin/* in Jetty by requiring BASIC authentication from your users, edit the web.xml in your Solr WAR, adding the following stanza at the bottom:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr Admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
</security-constraint>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr Update</web-resource-name>
    <url-pattern>/update*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
    <role-name>content_updater</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
   <auth-method>BASIC</auth-method>
   <realm-name>Solr Realm</realm-name>
</login-config>

This specifies that access to the /update* URLs is limited to users in the admin or content_updater roles, while only admin users can access the /admin/* URLs. The realm-name is what ties the security constraints to the users configured in Jetty, and it must match the name of the UserRealm defined in jetty.xml.

Tip

Customizing web.xml in Jetty

Sometimes cracking open a WAR file just to customize web.xml can be a pain. However, if you are a Jetty user, then you can put the changes into ./etc/webdefault.xml and Jetty will apply the changes to any WAR file deployed. This is a nice trick if you have just a single webapp in the Jetty container. See ./examples/solr/etc/webdefault.xml and ./examples/solr/etc/jetty.xml for an example.
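If you do need to modify the web.xml inside the WAR itself, the JDK's jar tool can extract the file and put the edited version back. A hedged sketch, with an illustrative WAR filename:

```shell
# Extract WEB-INF/web.xml from the Solr WAR (filename is illustrative).
jar xf solr.war WEB-INF/web.xml

# ...edit WEB-INF/web.xml to add the security constraints...

# Update the WAR in place with the modified file.
jar uf solr.war WEB-INF/web.xml
```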

Edit the jetty.xml file and uncomment the <Set name="UserRealms"/> stanza so that it looks like the following:

<Set name="UserRealms">
  <Array type="org.mortbay.jetty.security.UserRealm">
    <Item>
      <New class="org.mortbay.jetty.security.HashUserRealm">
        <Set name="name">Solr Realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties
        </Set>
      </New>
    </Item>
  </Array>
</Set>

The ./etc/realm.properties file contains a list of users with their passwords and roles to which they belong. We've specified that the user named administrator has the roles of content_updater and admin, and therefore can access any /update and /admin URLs. However, the user eric can only access the /update URLs as shown in the following code:

administrator: $ecretpa$$word,content_updater,admin
eric: mypa$$word,content_updater
guest: guest,read-only

Adding authentication introduces an extra roadblock for automated scripts that need to interact with Solr to upload information. However, if you use BASIC authentication, then you can easily pass the username and password as part of the URL request. The downside is that the password is transmitted in cleartext, so you should wrap the entire request in SSL for maximum security:

http://USERNAME:PASSWORD@localhost:8080/solr/update 
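Alternatively, a script could pass the credentials through curl's -u option instead of embedding them in the URL. A hedged sketch, where the hostname, port, and credentials are illustrative placeholders:

```shell
# Send a commit to the protected /update handler using BASIC auth.
# -u passes USERNAME:PASSWORD; use an https:// URL in production so
# the credentials aren't transmitted in cleartext.
curl -u USERNAME:PASSWORD \
  -H 'Content-Type: text/xml' \
  -d '<commit/>' \
  'http://localhost:8080/solr/update'
```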

Note

Normally, you wouldn't want to store passwords in plain text on the server in a file such as realm.properties that isn't encrypted. More information is available at http://docs.codehaus.org/display/JETTY/Realms.

Put Solr behind a Proxy

Another approach to securing Solr is to lock it down via firewall rules and run a proxy that mediates access to the locked-down Solr. If you specify that port 8983 isn't accessible to the public, but only accessible on the local box, then you can deploy a proxy on the same server that controls access. There are some Solr-specific proxy servers available, such as https://github.com/o19s/solr_nginx, as well as a NodeJS option: https://github.com/dergachev/solr-security-proxy.

Let's try out the NodeJS option. Assuming that you have the Node Package Manager (npm) installed, run the following command:

>> npm install solr-security-proxy

Then, to start the proxy on port 9090, allowing access to the mbartists and mbtracks cores but none of the other cores, run the startup script in ./examples/11 as follows:

>> ./start-solr-security-proxy.sh

You can verify access by querying the mbartists core at http://localhost:9090/solr/mbartists/select?q=*:*, while being denied access to the karaoke core at http://localhost:9090/solr/karaoke/select?q=*:*. Go ahead, try out some attacks, such as triggering commits or accessing the admin control panel!

To administer the protected Solr, you will either need to be on the local box, or set up the firewall to allow access from your specific IP address.

Securing public searches

Although you typically access Solr through an intermediate web application, you may want to expose Solr directly to the Internet, albeit in a limited way. One scenario for this is exposing a search in an RSS/Atom feed made possible with Solr's XSLT support (see Chapter 5, Searching, for more on XSLT). Another is using JavaScript, AJAX, and JSONP callbacks from the browser to directly connect to Solr and issue searches. There may be other scenarios where firewall rules and/or passwords are still used to protect parts of Solr, such as index modification, while some search requests are exposed to direct Internet access. In this case, you need to configure the exposed request handlers with invariants and/or appends clauses as applicable. For a limited example of this, see the A RequestHandler per search interface section earlier in this chapter.

If there are certain records that need to be excluded from public access, then you'll need to specify an appropriate fq (filter query). If there are certain fields on documents that need to be kept private, then this can be problematic to completely secure, especially if you are working with sensitive data. It's simple enough to specify fl (field list) through invariants, but there are a good number of other parameters that might expose the data (for example, highlighting, maybe faceting) in ways you didn't realize:

<lst name="invariants">
  <str name="fl">public_id,public_description</str>
  <str name="fq">public:true</str>
</lst>

Therefore, if you are working with sensitive data, exposing Solr in this way is not recommended.

Controlling JMX access

If you have started Solr with JMX enabled, then you should also have a JMX username and password configured. While today the JMX interface only exposes summary information about the Solr components and memory consumption, future versions will most likely expose actual management operations, such as triggering index optimization, through JMX. So, putting JMX access under lock and key is a good idea.
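As a hedged sketch, password-protected remote JMX can be enabled with the standard JVM remote-management system properties when starting Solr's example Jetty distribution; the port and file paths here are illustrative placeholders:

```shell
# Start Solr with remote JMX requiring a username/password.
# The password and access files map JMX users to credentials and
# read/write permissions; port and paths are placeholders.
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=3000 \
     -Dcom.sun.management.jmxremote.ssl=false \
     -Dcom.sun.management.jmxremote.authenticate=true \
     -Dcom.sun.management.jmxremote.password.file=/etc/jmxremote.password \
     -Dcom.sun.management.jmxremote.access.file=/etc/jmxremote.access \
     -jar start.jar
```

The password file must be readable only by the user running Solr, or the JVM will refuse to start.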

Securing index data

One of the weaknesses of Solr, due to the lack of a built-in security model, is that there aren't well-defined approaches for controlling which users can manipulate the indexes by adding, updating, and deleting documents, and who can search which documents. Nevertheless, there are some approaches for controlling access to documents.

Controlling document access

You can start off with some of the ideas talked about in the A RequestHandler per search interface section to control search access to your index. However, if you need to control access to documents within your index and must control it based on the user accessing the content, then one approach is to leverage the faceted search capabilities of Solr. You may want to look back at Chapter 7, Faceting, to refresh your memory on faceting. For example, you may have a variety of documents that have differing visibility depending on whether someone is a member of the public or an internal publicist.

The public can only see a subset of the data, but a publicist can see more information, including information that isn't ready for public viewing. When indexing documents, you should store, in a separate multiValued field, the roles that a user must belong to in order to gain access to the document:

<field name="roles" type="text" indexed="true" stored="true" multiValued="true" />

A document that was for everyone would be indexed with the role values Public and Publicist. Another document that was for internal use would just have the Publicist role. Then, at query time, you could append extra request parameters to limit what is returned depending on the roles that someone belonged to by treating the roles as a facet:

/solr/select/?q=music&start=0&facet=on&facet.field=roles&fq=roles%3Apublic

In the preceding example, we are querying for music that is accessible by anyone with the role public. Obviously, this requires significant logic to be implemented on the client side interfacing with Solr, and is not as robust a solution as we may wish.
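Sketching that client-side broker logic in shell: the trusted web tier looks up the user's role server-side and appends the fq itself, never accepting a role parameter from the browser. Hostname, core layout, and role value are illustrative:

```shell
# The web tier determines the user's role server-side
# (from its own session store, not from request parameters).
USER_ROLE=public

# It then appends the role as a filter query before forwarding the
# search to Solr, so the browser never controls the fq parameter.
curl "http://localhost:8983/solr/select/?q=music&fq=roles%3A${USER_ROLE}"
```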

Other things to look at

Remote streaming is the ability to give Solr the URL of a remote resource or local file and have Solr download the contents as a stream of data. This can be very useful when indexing large documents, as it reduces the amount of data that your updating process needs to move around. However, it means that if you have the /debug/dump request handler enabled, then the contents of any file can be exposed. Here is an example that exposes my authorized_keys file to anyone:

http://localhost:8983/solr/mbartists/debug/dump?stream.file=/Users/epugh/.ssh/authorized_keys

If you have this turned on, then make sure that you are monitoring the log files, and also that access to Solr is tightly controlled. The example application has this function turned on by default.

In addition, in a production environment, you should comment out the /debug/dump request handler unless you are actively debugging an issue.

Just as you need to be wary of a SQL injection attack for a relational database, there is a similar concern for Solr. Solr should not be exposed to untrusted clients if you are concerned about the risk of a denial of service attack. This is also a concern if you are lax in how your application acts as a broker to Solr. It's fairly easy to bring down Solr by asking it to sort by every field in the schema, which would result in sudden exorbitant memory usage. There are other similar attacks if an attacker can submit an arbitrary function query as part of their query.
