Accessing Solr from PHP applications

There are several ways to access Solr from PHP, and none of them seems to have emerged as the single best approach, so keep an eye on the wiki page at http://wiki.apache.org/solr/SolPHP for new developments.

Adding the URL parameter, wt=php, produces simple PHP output in a typical array data structure:

array(
  'responseHeader'=>array(
    'status'=>0,
    'QTime'=>0,
    'params'=>array(
      'wt'=>'php',
      'indent'=>'on',
      'rows'=>'1',
      'start'=>'0',
      'q'=>'Pete Moutso')),
  'response'=>array('numFound'=>523,'start'=>0,'docs'=>array(
    array(
      'a_name'=>'Pete Moutso',
      'a_type'=>'1',
      'id'=>'Artist:371203',
      'type'=>'Artist'))
  ))

The same response in the serialized PHP format, requested with the wt=phps URL parameter, is much less human-readable but more compact to transfer over the wire:

a:2:{s:14:"responseHeader";a:3:{s:6:"status";i:0;s:5:"QTime";i:1;s:6:"params";a:5:{s:2:"wt";s:4:"phps";s:6:"indent";s:2:"on";s:4:"rows";s:1:"1";s:5:"start";s:1:"0";s:1:"q";s:11:"Pete Moutso";}}s:8:"response";a:3:{s:8:"numFound";i:523;s:5:"start";i:0;s:4:"docs";a:1:{i:0;a:4:{s:6:"a_name";s:11:"Pete Moutso";s:6:"a_type";s:1:"1";s:2:"id";s:13:"Artist:371203";s:4:"type";s:6:"Artist";}}}}
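As a minimal sketch of how the serialized format might be consumed, the following snippet decodes a payload with PHP's unserialize() function. To keep the example self-contained, the payload is built locally with serialize() as a stand-in for a real wt=phps response body; the variable names are illustrative only. The allowed_classes option (available in PHP 7.0 and later) tells the decoder not to instantiate any objects, which narrows the risk of decoding data you do not fully trust.

```php
<?php
// Stand-in for the body of a wt=phps response. A real application would
// fetch this string from Solr over HTTP instead of building it locally.
$body = serialize(array(
    'responseHeader' => array('status' => 0, 'QTime' => 1),
    'response' => array(
        'numFound' => 523,
        'start'    => 0,
        'docs'     => array(
            array('a_name' => 'Pete Moutso', 'id' => 'Artist:371203'),
        ),
    ),
));

// Refuse to instantiate any objects while decoding (PHP 7.0 or later).
$result = unserialize($body, array('allowed_classes' => false));

if ($result === false) {
    throw new RuntimeException('Response could not be unserialized');
}

// The decoded value is a plain nested array.
$numFound  = $result['response']['numFound'];
$firstName = $result['response']['docs'][0]['a_name'];
echo "$numFound matches, first: $firstName\n";
```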

Tip

Think twice before using the PHP writer types

Unserializing potentially untrusted data can expose your application to security vulnerabilities. Additionally, the future of these writer types is in some doubt, as PHP client abstraction projects such as solr-php-client and Solarium use JSON in preference to the PHP writer types.

solr-php-client

A richer option for PHP integration is the solr-php-client. It is available at http://code.google.com/p/solr-php-client/. Interestingly enough, this project uses the JSON writer type to communicate with Solr instead of the PHP writer type, which shows how prevalent JSON has become for language-agnostic communication between applications. The developers chose JSON over XML because they found that JSON parsed much more quickly than XML in most PHP environments. Moreover, using the native PHP format requires the eval() function, which carries a performance penalty and opens the door to code injection attacks.
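To illustrate why JSON is the safer choice, here is a minimal sketch that decodes a JSON response with PHP's built-in json_decode() function. The $body string is a hand-written stand-in for a wt=json response body, not output captured from a live server, and the variable names are illustrative only.

```php
<?php
// Stand-in for the body of a wt=json response.
$body = '{"responseHeader":{"status":0,"QTime":1},'
      . '"response":{"numFound":523,"start":0,"docs":['
      . '{"a_name":"Pete Moutso","a_type":"1",'
      . '"id":"Artist:371203","type":"Artist"}]}}';

// json_decode() only builds data structures; unlike eval(), it never
// executes code contained in the payload.
$result = json_decode($body);

if ($result === null) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// By default, JSON objects are decoded into stdClass objects.
$jsonNumFound = $result->response->numFound;
$jsonFirstId  = $result->response->docs[0]->id;
echo "$jsonNumFound matches, first ID: $jsonFirstId\n";
```

Decoding into objects rather than arrays gives the same property-style access ($result->response->docs) that appears when displaying results with solr-php-client.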

The solr-php-client can both create documents in Solr and perform queries for data. In /examples/9/solr-php-client/demo.php, there is a demo that creates a new artist document in Solr for the singer Susan Boyle and then performs some queries. Installing the demo in your specific local environment is left as an exercise for the reader. On a Macintosh, you should place the solr-php-client directory in /Library/WebServer/Documents/.

An array of key-value pairs that match your schema can easily be created and then used to build an array of Apache_Solr_Document objects to send to Solr. Notice that we are using the artist ID value -1. Solr doesn't care what the ID field contains, only that it is present. Using -1 ensures that we can find Susan Boyle by ID later!

$artists = array(
  'susan_boyle' => array(
    'id' => 'Artist:-1',
    'type' => 'Artist',
    'a_name' => 'Susan Boyle',
    'a_type' => 'person',
    'a_member_name' => array('Susan Boyle')
  )
);

The value for a_member_name is an array, because a_member_name is a multivalued field.
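The conversion from the $artists array into Apache_Solr_Document objects might look like the following sketch. It is one plausible approach, not necessarily the code in demo.php. The setField() and addField() method names reflect our understanding of the solr-php-client document class and should be verified against the version you install. So that the sketch runs without the library, it declares a tiny stand-in class when the real one is not loaded; the stand-in and its public $fields property are assumptions made for illustration.

```php
<?php
// Stand-in so this sketch runs without the library installed. It is NOT the
// real Apache_Solr_Document; it only mimics the two setters used below.
if (!class_exists('Apache_Solr_Document')) {
    class Apache_Solr_Document
    {
        public $fields = array();

        // Store a single-valued field.
        public function setField($key, $value)
        {
            $this->fields[$key] = $value;
        }

        // Append one value to a multivalued field.
        public function addField($key, $value)
        {
            $this->fields[$key][] = $value;
        }
    }
}

$artists = array(
    'susan_boyle' => array(
        'id' => 'Artist:-1',
        'type' => 'Artist',
        'a_name' => 'Susan Boyle',
        'a_type' => 'person',
        'a_member_name' => array('Susan Boyle'),
    ),
);

// Build one document per artist, adding array values one at a time so
// that multivalued fields such as a_member_name are handled correctly.
$documents = array();
foreach ($artists as $fields) {
    $document = new Apache_Solr_Document();
    foreach ($fields as $name => $value) {
        if (is_array($value)) {
            foreach ($value as $item) {
                $document->addField($name, $item);
            }
        } else {
            $document->setField($name, $value);
        }
    }
    $documents[] = $document;
}
```

The resulting $documents array is what gets passed to addDocuments().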

Sending the documents to Solr and triggering the commit and optimize operations is as simple as follows:

  $solr->addDocuments( $documents );
  $solr->commit();
  $solr->optimize();

If you are not running Solr on the default port, then you will need to tweak the Apache_Solr_Service configuration:

$solr = new Apache_Solr_Service( 'localhost', '8983', '/solr/mbartists' );

Queries can be issued using one line of code; the variables $query, $offset, and $limit contain what you would expect them to:

$response = $solr->search( $query, $offset, $limit );
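To make the roles of these variables concrete, the following sketch builds the kind of request URL that such a search corresponds to: the offset maps to Solr's start parameter and the limit maps to rows. This is an approximation for illustration and not the library's internal code; the host, port, and path are taken from the constructor example above, and the /select handler and wt=json parameter are assumptions based on the client's use of the JSON writer type.

```php
<?php
$query  = 'id:"Artist:-1"'; // Solr query string
$offset = 0;                // index of the first result to return
$limit  = 10;               // maximum number of results to return

// Map the search arguments onto Solr's standard request parameters.
$params = array(
    'q'     => $query,
    'start' => $offset,
    'rows'  => $limit,
    'wt'    => 'json',
);

// http_build_query() URL-encodes the colon and quotes in the query.
$url = 'http://localhost:8983/solr/mbartists/select?'
     . http_build_query($params);

echo $url, "\n";
```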

Displaying the results is very straightforward as well. Here we are looking for Susan Boyle based on her ID of -1, highlighting the result using a blue font:

foreach ( $response->response->docs as $doc ) {
  $output = "$doc->a_name ($doc->id) <br />";

  // highlight Susan Boyle if we find her.
  if ($doc->id == 'Artist:-1') {
    $output = "<em><font color=blue>" . $output . "</font></em>";
  }

  echo $output;
}

Successfully running the demo creates Susan Boyle and issues a number of queries. Notice that if you know the ID of the artist, it's almost like using Solr as a relational database to select a single specific row of data. Instead of select * from artist where id=-1, we did q=id:"Artist:-1", but the result is the same!

Note

Solarium may be what you want!

Solarium (http://www.solarium-project.org/) attempts to improve on other PHP client libraries by not just abstracting away the HTTP communication layer but also more fully modeling the concepts expressed by Solr. It has objects that allow you to easily build complex filter queries and faceting logic.

Drupal options

Drupal is a very successful open source Content Management System (CMS) that has been used for building everything from the WhiteHouse.gov site to political campaigns and university websites. Drupal's built-in search has always been considered adequate, but not great, so the option of using Solr to power search is very popular.

The Apache Solr Search integration module

The Apache Solr Search integration module, hosted at http://drupal.org/project/apachesolr, builds on top of the core search services provided by Drupal, but adds extra features such as faceted search and improves performance by offloading search requests to another server. The module has had significant adoption and is the basis of some other Drupal search-related modules.

In order to see the Apache Solr module in action, just visit Drupal.org and perform a search to see the faceted results.

Hosted Solr by Acquia

Acquia is a company providing commercially supported Drupal distributions, and also offers hosted Solr search for Drupal sites that want better search than the built-in MySQL-based search. Acquia's adoption of Solr as a better solution for Drupal than Drupal's own search shows the rapid maturing of the Solr community and platform.

Acquia maintains a large infrastructure of Solr servers in the cloud, saving individual Drupal administrators from the overhead of maintaining their own Solr server. A module provided by Acquia is installed into your Drupal site and monitors for content changes. Every five or ten minutes, the module sends content that either hasn't been indexed or needs to be re-indexed up to the indexing servers in the Acquia network. When a user performs a search on the site, the query is sent up to the Acquia network, where the search is performed, and Drupal is only responsible for displaying the results. Acquia's hosted search option supports all of the usual Solr goodies, including faceting. Drupal has always been very database intensive, with even moderately complex pages performing hundreds of individual SQL queries to render! Moving indexing and searching off one's Drupal server and into the cloud drastically reduces the load on Drupal.

Thanks to its tight integration with the Drupal framework, Acquia has developed some slick features beyond the standard Solr ones, including the following:

  • The Content Construction Kit (CCK) allows you to define custom fields for your nodes through a web browser. For example, you can add a field to a blog node with values such as oranges/apples/peaches. Solr understands these CCK data model mappings and actually provides a facet of oranges/apples/peaches for it.
  • Turn on a single module and instantly receive content recommendations giving you more-like-this functionality based on results provided by Solr. Any Drupal content can have recommendation links displayed with it.
  • Multisite support is a strength of Drupal, which can run multiple sites on a single codebase, such as drupal.org, groups.drupal.org, and api.drupal.org. The Apache Solr module currently tracks which site a document came from when it was indexed and, as a result, can add the various sites as new filters in the search interface.

Acquia's hosted search product is a great example of Platform as a Service (PaaS), and hosted Solr search is a very common integration approach for organizations that don't wish to manage their own Java infrastructure and don't need to customize the behavior of Solr drastically. For a list of all the companies offering hosted Solr search, visit http://wiki.apache.org/solr/SolrHostingProviders.
