Such has been the pace of change in the search business that this chapter has had to be totally rewritten for this edition. All of the largest independent search vendors have now been acquired. Some vendors have found market niches, but given the rate of adoption of open source search, their business strategy will have to be agile. This chapter provides an introduction to the search business, with more details on open source search and on Microsoft SharePoint in Chapters 6 and 7, respectively.
Tracking developments in the business of search is eased substantially by the work of Stephen Arnold and his team of researchers, who compile the Beyond Search blog and publish reports on various sectors of the market. International Data Corporation and Gartner Group also track developments in this sector, but this research is only available to corporate subscribers. Gartner Group also prepares an annual Magic Quadrant report on this sector; one or more of the vendors given a strong endorsement in the review usually release it publicly fairly soon after publication.
Over the last decade, search vendors have come and (mostly) gone. Their technology was generally good, but until recently there was no compelling reason for organizations to invest in search technology. Moreover, large multinational companies required global support and most of the vendors had very limited marketing, sales, and technical support outside of the United States. There would perhaps have been substantial benefits in these vendors working together in some form of trade association to raise the visibility and value of search. However, each vendor considered themselves to be the world leader in technology and working together with their competitors was anathema. The end result was a lonely disappearance from the market.
Some did succeed, at least to a reasonable degree. Autonomy was the most visible and the only search software company to be publicly listed. However, by the end of 2012, most of these mid-range vendors had been acquired, as the following table shows.
Company | Acquired by | Date |
---|---|---|
FAST | Microsoft | April 2008 |
Exalead | Dassault Systèmes | June 2010 |
Autonomy | HP | August 2011 |
Endeca | Oracle | October 2011 |
Isys | Lexmark | March 2012 |
Vivisimo | IBM | April 2012 |
Subsequent to the purchase of Autonomy, HP took the view that Autonomy was overvalued, and HP then wrote down $8.8 billion of the $10 billion purchase price. Subsequent legal actions may have impacted the degree of confidence of current and prospective customers in the long-term future of Autonomy and its IDOL search software. FAST, Exalead, Endeca, Vivisimo, and Isys have all now been fully integrated into enterprise application suites.
As a result, IBM, Oracle, Microsoft, HP, and SAP are all able to offer search functionality in their enterprise suites. However, it can be difficult to ascertain exactly what functionality is offered, and to find search experts to speak to within pre-sales, sales, and support operations. The roadmap for the search functionality is also driven by the overall roadmap for the enterprise suite. As an example, some SharePoint 2013 functionality is currently only available in the cloud implementations.
There are at least 60 independent search vendors whose main line of business is the development of search applications (see Appendix E for a list of these vendors). Most of these independent search vendors have revenues of less than $20 million, and many operate largely in a specific national market to reduce the costs of customer sales and support.
The challenge that these companies face is that they cannot afford to do much in the way of marketing and are virtually unknown to most IT managers. There is also a procurement issue in that procurement departments are always concerned about potential suppliers that have no published accounts. All the vendors will provide financial information under a nondisclosure agreement, but in many cases the profits will be minimal, as they are being ploughed back into the development of the software. Even then, the number of people who have a full understanding of the search software code base will be quite small.
There is no doubt that commercial search vendors have had a difficult time over the last few years. Many have vanished, either through acquisition or through a fundamental failure to attract and keep customers. Others, notably Coveo and Funnelback, seem to have been able to capitalize on this situation because there remains a good case for having a vendor deliver an integrated stack of applications and then maintain performance through technical upgrades and ongoing support.
It is very difficult to get a reliable figure for the installed base of commercial search applications, but in trying to get a sense of the level of adoption, you need to take into account IBM and Oracle enterprise suite customers, any organization running SharePoint (especially SharePoint 2013), Google Search Appliance installations, and, of course, cloud services such as Amazon Web Services. The search capability of file-share applications is getting better, and most of the intranet products (e.g., Thoughtfarmer and Interact Intranet) offer good search applications. Sitecore now has Coveo as a partner, which is just one example of improved search performance in the CMS business.
An important trend to note is the increasing use of Apache Lucene and Solr, with additional applications integrated into the suite and charged for on a commercial basis. This is, of course, the situation with IBM Omnifind, and other examples include Attivio, PolySpot, and IntraFind.
Clearly, open source development will continue to be a very important solution to a wide range of unstructured and structured content discovery requirements, but this approach does not suit every organization. An IT department may not have the resources in terms of skills and time to develop a set of search requirements, manage the development process, integrate it into the existing stack of enterprise applications, and then continue to manage the upgrade process as new requirements come along. The integration process is a particular challenge. As the benefits of search become more widely recognized, there will be a requirement to search across a wider range of repositories and applications.
This integration expertise could be provided by an internal or external development team, but software development companies often do not want to be in the business of systems integration. One of the justifications that is often used for open source search adoption is that the risk that the vendor goes out of business is eliminated. Provided that the application is built solely on open source components, then that would be the case, but as the industry continues to add in proprietary code to lock in organizations and to improve margins, that justification is increasingly of less value.
Much is made, rightly, of the benefits of having a development community that is constantly seeking to improve the code base. For example, the Elasticsearch support plans are all focused on development and production support. However, many organizations are looking for a different type of community, one that offers the ability to meet other users of the application and share implementation and development experience. Before the Microsoft acquisition of FAST Search and Transfer in 2008, the annual event in the search business was the FAST Forward user conference, providing an excellent opportunity for customers to meet together and share experiences, as well as helping FAST identify product development opportunities.
The need to provide their venture fund owners with a return on investment is inevitably going to focus the minds of the open source development community on revenue opportunities. My sense is that there is a slow but steadily increasing awareness of the importance of effective search solutions, thanks mainly to the results of surveys from AIIM, Findwise, and NetStrategyJMC, referred to in Chapter 1. The result is that the market for both new and replacement search applications is growing. That is good news for the venture funds behind companies such as Elasticsearch and Lucid Works. But it is also good news for startup companies that have a vision for a new generation of search applications built on open source platforms but targeted at IT departments (who still own or manage the majority of search applications) that feel more comfortable with purchasing a product with edges to it, and a traditional approach to product roadmap development and post-implementation support.
Over the last few years, open source search has moved center stage as a search solution. Although there are a number of open source search applications available, the dominant applications are Lucene, used in combination with Solr, and Elasticsearch, which is based on Lucene. Both can be downloaded at no charge. The indications are that there had been over 20 million downloads of Elasticsearch by early 2015. Chapter 6 looks in detail at the structure and future prospects for open source search.
There is a range of commercial and open source intranet application solutions, all of which include a search application. Most of these applications are custom designed to integrate very closely with the functionality (especially in searching for people) of the intranet. The issue here is the extent to which the search application can index other repositories, for example, a social networking application. In addition, consideration needs to be given to how the search application is going to be supported. It could be supported by the intranet team itself, except that this team is likely to be very small and have little expertise in search management.
A search appliance is a search application and disk storage pre-installed in a standard rack casing. In principle, it can be installed and switched on in perhaps 30 minutes. The product concept has been made famous by Google with the Google Search Appliance, but Google was not the first company to offer an appliance product. The search appliance was pioneered by the US company Thunderstone in 2003, though the company itself was founded in 1981. Other appliance vendors include Fabasoft Mindbreeze (Austria), MaxxCat, SearchBlox, Searchdaimon, and Teradata.
The Google innovation was the pricing policy, which is based on the number of documents to be indexed and searched. The search appliance license points begin at indexing 500,000 documents, and extend all the way up to 30 million documents or more. The Google Search Appliance is offered at two- or three-year license points, which include support, hardware replacement coverage, and software updates. When the contract period ends, a new contract has to be negotiated and a new appliance is provided.
This means that some careful calculations have to be made about the total cost of ownership over a five-year period, which would be the minimum typical life span for a more conventional application. Most companies have no idea of how much information they need to index, much less the number of documents. Multiple versions of the same document quickly increase the number being indexed. Another factor to be considered is the cost of purchasing additional server licenses to provide for redundancy in the event of a server failure and also for development and test purposes.
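The arithmetic behind such a calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier prices, the intermediate 5 million tier, the `versions_per_document` multiplier, and the idea that redundancy and test units cost the same as the production unit are all invented for the example and are not taken from any vendor's price list. Only the 500,000 and 30 million document bounds and the two- or three-year terms come from the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTier:
    max_documents: int     # largest index size covered by this tier
    price_per_term: float  # HYPOTHETICAL price per appliance for one contract term


# HYPOTHETICAL tiers: the prices and the 5 million tier are invented for illustration.
TIERS = [
    LicenseTier(500_000, 30_000.0),
    LicenseTier(5_000_000, 120_000.0),
    LicenseTier(30_000_000, 400_000.0),
]


def pick_tier(document_count, tiers=TIERS):
    """Return the smallest tier that covers the given number of documents."""
    for tier in sorted(tiers, key=lambda t: t.max_documents):
        if document_count <= tier.max_documents:
            return tier
    raise ValueError("document count exceeds the largest tier")


def total_cost(unique_documents, versions_per_document, term_years,
               horizon_years=5, extra_appliances=0):
    """Estimate licence cost over the planning horizon.

    versions_per_document inflates the count to reflect multiple versions
    of the same document; extra_appliances covers redundancy and
    development/test units (assumed to cost the same as production).
    """
    indexed = math.ceil(unique_documents * versions_per_document)
    tier = pick_tier(indexed)
    # A contract that ends inside the horizon must be renewed in full.
    terms = math.ceil(horizon_years / term_years)
    appliances = 1 + extra_appliances
    return terms * appliances * tier.price_per_term


if __name__ == "__main__":
    print(total_cost(300_000, 1.0, term_years=2))
    print(total_cost(300_000, 2.0, term_years=3, extra_appliances=1))
```

Even with made-up prices, the two calls show the pattern to watch for: doubling the document count through retained versions pushes the estimate into the next tier, and a three-year contract still has to be bought twice to cover a five-year horizon.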
In general, search appliances offer very good processing performance because the software and hardware are fully integrated by the vendor. However, it is usually difficult to tune appliances to improve relevancy, the range of connectors to other applications is limited, and customer support is often restricted to a local partner.
Microsoft SharePoint is probably the most widely installed of all search applications, with some organizations still working with SharePoint 2007 (aka MOSS07), SharePoint 2010, and now SharePoint 2013 either on premise or in the cloud. There is more about SharePoint search in Chapter 7.
There are two key points to take into consideration. The first is that the full functionality of SharePoint 2013 search is only available against content that is stored in SharePoint. The second is that specialized search expertise is needed to get the best out of SharePoint 2013. If you have been running FAST Search Server for SharePoint 2010 (FS4SP) on an Enterprise license, then the jump to SharePoint 2013 is not too difficult. By comparison, the jump from the Standard license search in SharePoint 2010 to SharePoint 2013 is considerable.
One of the most contentious issues around search applications is what the future roadmap for the application is going to be. The situation with search applications is no different from most other enterprise applications, with vendors being very reluctant to disclose more than perhaps a six-month release schedule. Two elements to pay particular attention to are any proposed changes to the server architecture and any proposed changes that might require a partial or complete re-index. Small changes in the user interface or the administration interface can usually be accommodated without too much effort.
Before considering the prospects for commercial search, the question of whether there is a future for “keyword” search is worth considering. In 1947, Winston Churchill remarked to the House of Commons that “No one pretends that democracy is perfect or all wise. Indeed it has been said that democracy is the worst form of government except all those other forms that have been tried from time to time.” The same can be said for keyword search. The approach may not be perfect, though in the case of search it is not possible to define what perfection would be. Other approaches have come along (probably the most visible being Autonomy IDOL) but none have become widely adopted.
The demands of the intelligence community, in particular, are stimulating the development of applications that can sift through very large quantities of content (often streaming) to find weak signals, but these applications are being used in organizations with highly skilled staff and very sophisticated technology platforms.
For the foreseeable future, most organizations will not be able to make full use of the capabilities of the wide range of keyword-based solutions that are already available, and behind the scenes, a substantial amount of research is being undertaken into how to extend the functionality and performance of these solutions.
Some of the software modules used in enterprise search applications are highly specialized. This is particularly the case with the management of languages. Two companies, Basis Technology and Teragram (SAS), are the market leaders in providing very sophisticated text analytics applications. Both companies have developed techniques for parsing and indexing Arabic and Asian languages that are widely used within the search industry. Another important sector is the development of document filters, which are available from companies such as Lexmark Enterprise Software and dtSearch. Oracle and HP/IDOL also have document filter modules, but the availability of these for use outside of their enterprise suites may be questionable.
The scale of the cloud-based file-sharing industry is immense and many IT departments see cloud-based solutions as the default strategy for the organization. It also seems likely that hybrid search applications will emerge to take advantage of the scalability of cloud applications and yet maintain the security management and data privacy management features of on-premise applications.
It is vital to look very carefully at the small print to understand what is actually being crawled and indexed, what the costs are, and what the implications will be of migrating from one cloud provider to another in terms of transferring the index files. If these cannot be transferred then all the repositories will have to be re-indexed. There could also be differences between an on-premise solution and a cloud solution of the same product, with SharePoint 2013 being a good example.
Similar problems can arise with hosted extranet and collaboration applications, intranets, and enterprise social network applications. Currently, most of these applications provide no more than a fairly basic level of search functionality. The hosted search/on-premise search strategy needs to be carefully considered from a user perspective, as well as from an enterprise architecture perspective.
Some vendors will provide a version of their search application to companies that supply document management, customer relationship management, and other enterprise applications. The version supplied to the customer may well have reduced functionality compared to the current version of the product, and indeed may not be subject to the same upgrade roadmap as the standalone search product. In addition, it is highly unlikely that the search application can be extended to search other repositories.
Smaller search vendors will often work directly with clients, especially where the software has been designed to work out of the box. There may be a need for a few days of support, mainly around the installation of the software on the server, sorting out disaster recovery options and testing them, and setting up the crawl routines.
There are now a number of systems integration companies that specialize in search implementation projects, offering a range of services, including defining the search requirements, managing the process of product selection, and then supporting the implementation. Most of these companies tend to focus their business around a selection of search software applications, but will have the skills and expertise to handle almost any search implementation project.
In some cases, vendors may feel that the implementation process is too complex for them to support, especially in countries where they may have little or no local office support, or where there are particular technical issues to be overcome, and will then partner with a local search systems integrator. This is usually a win-win situation for all concerned, though it is wise to make sure that the integration team is fully conversant with the version of the search software they are planning to implement.
Companies often outsource IT services, or use a systems integrator to provide support for the implementation of new applications. Search implementation usually represents only a very small revenue opportunity for systems integrators, and so there may not be many staff who can manage a search implementation. It is therefore not surprising that a systems integrator typically works with only a small number of search vendors, who can provide backup support to its consultants.
The market for independent commercial search application vendors is being eroded by the high level of adoption of Microsoft SharePoint, the rapidly increasing use of open source search solutions, and to some extent, the Google Search Appliance. However, it is unclear whether the open source search business is sustainable given the requirement of investors for a return either through dividends or through a trade sale. This will not affect the availability of Lucene, Solr, and Elasticsearch source code for development purposes, but may result in a new generation of commercial vendors targeting specific market niches by building specialized and charged-for applications on top of open source code.
The search industry is tracked and analyzed by Stephen Arnold on his blog Beyond Search.
The Gartner Magic Quadrant report on Enterprise Search does not have a defined publication date each year. The report can be purchased from Gartner, but is often published (sometimes in a slightly edited format) by companies featured in the report. In the past, Forrester, Ovum, and the Real Story Group published reports on the enterprise search business, but these have been discontinued.
Steve Silberman, “The Quest for Meaning: The Story of Autonomy,” Wired, February 2000.