CHAPTER 9


Implementing Enterprise Search

Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information on it.

—Samuel Johnson

How can users find content if they do not know where a piece of content exists or what a particular file name is? If you know where something is, then you can click into the directory and directly access it; in all other cases, you can use a search engine to quickly connect you to the information you seek. In this chapter, I provide an overview of enterprise search and the search architecture in SharePoint. From there, I discuss how to analyze your enterprise search requirements. I also walk you through how to create an enterprise search application, and how to configure content sources to index and make available for search results.

After reading this chapter, you will know how to

  • Explain the importance of search.
  • Describe enterprise search.
  • Describe the SharePoint search architecture.
  • Analyze your enterprise search requirements.
  • Create an enterprise search application.
  • Configure search content sources.

The Importance of Search

Search has become the centerpiece of any enterprise content management solution. It is critical because so many other features and capabilities depend on search for different aspects of information management, such as content auditing and reporting, and in particular, content discovery.

This chapter focuses on providing a search engine to support users who are explicitly searching for desired information—an aspect of the SharePoint search engine that provides a compelling portion of an organization’s content discovery needs. But the search capabilities run even deeper and reach even further than providing search results for a user’s query entered into a search box. With the content crawled and included in the index, search provides a foundational service that other capabilities can consume and take advantage of in providing their own content discovery features.

Two capabilities that especially utilize and extend the search engine are social and eDiscovery, as illustrated in Figure 9-1. In the next chapter, I point out areas where SharePoint suggests relevant search results without a user entering a search query, all based on social data and what SharePoint search predicts the user will find relevant. Then, in Chapter 11, I walk you through how search provides the plumbing underneath to make eDiscovery and case management work, and I show how the more extensive the amount of content you include in your search index, the more effective your eDiscovery and case management solution will be.

9781430261698_Fig09-01.jpg

Figure 9-1. Social and eDiscovery leveraging the SharePoint search engine

My point is that search reaches further and can affect more than what first might be apparent. Maximizing the number of content sources to crawl and include in the index helps to provide a richer search experience for users, providing them with a single place to search for information across a variety of content repositories. A thorough index also increases the range of content you can discover and place on hold in a legal or regulatory case, as I discuss later.

Search crawls content across your organization, indexes it, and then relates pieces of content to each other. It gives you a comprehensive view into information across your organization. I like to think of SharePoint as the glue that holds the different enterprise systems together; this makes search the adhesive part of that glue because its index can hold a global perspective across all the content repositories, no matter how many silos you use to structure your information.

I often hear questions about whether SharePoint should replace different systems, particularly as stakeholders learn about the different capabilities built into the product. SharePoint does not have to replace every enterprise application, and indeed, it should not replace an application when the other application does a better job. Pick the best application for each job, and then use SharePoint search to provide a centralized access point to all of the enterprise data.

Deploying the SharePoint search component adds a wealth of features to support your enterprise content management initiative, and as such, the effort deserves your forethought and attention to think through the possibilities and maximize a search engine’s potential in your organization. Most notably, a well-designed enterprise search engine provides your organization with

  • An entry-point to discover information in different repositories across your organization.
  • A rich drill-down experience for filtering and refining search results.
  • Customizable search-result formatting and ranking.
  • Document previews to determine result relevancy for supported Office documents.
  • A usage analytics and reporting engine to identify popular content and usage trends.
  • Secure search results filtered based on what a user has permission to see.

Even though I only focus on the search engine itself in this chapter, know that your work here provides the groundwork for those other capabilities when you are ready to enable them. Investing effort and planning in search now will reward you later as you tackle those other capabilities and leverage the search engine and index. Before I venture into the technical details of search, I want to discuss the conceptual aspects of enterprise search.

Understanding Enterprise Search

The goal of an enterprise search engine is to connect users with relevant information, quickly. People are used to quickly finding information on the Internet using one of the popular public search engines. Enterprise search engines essentially attempt to replicate that experience for users within an organization.

Your ultimate goal for an enterprise search deployment is to provide a search experience where users can find relevant information based on their context and what they are seeking. To achieve this, you need a search engine with sophisticated algorithms to determine the relevancy of each piece of content in a massive corpus of data. You also need a search engine that crawls data in different content repositories, not just SharePoint sites.

The SharePoint search engine crawls content across the enterprise, in multiple repositories and file formats, to then build an index that users can run queries against to find relevant content. As Figure 9-2 illustrates, an enterprise search portal can aggregate search results from a range of content sources, including:

  • Local and remote SharePoint sites
  • Network file shares
  • Structured and unstructured data accessed using Business Connectivity Services (BCS)
  • E-mail and public folders in Exchange or Lotus Notes
  • Portals and web sites hosting HTML pages
  • People and profile information

Note  For more information on Business Connectivity Services in SharePoint 2013, including how to create data connections to external data sources for search to crawl and index, please see the MSDN site at http://msdn.microsoft.com/jj163782.

9781430261698_Fig09-02.jpg

Figure 9-2. Enterprise search portal with multiple content sources

Including all the different silos of content across the organization makes the search results more compelling for users—the more content repositories you include, the richer the search experience will be for users because they will be able to search across a more complete range of content. This helps to provide users with search results likely to include the piece of content they are looking for; it also helps the search engine to tune relevancy and build a more complete index of the enterprise’s content.

When I think about the search capability in SharePoint, I think about deploying an enterprise search portal, one that users can treat as a web destination to find information anywhere in the organization. To establish it as a destination, it needs to include all of the content repositories necessary for users to consider it an enterprise portal.

Note  For more information on how to install and configure the search service in a SharePoint 2013 farm, please see the search configuration details in the “Farm Configuration” section of my SharePoint 2013 Build Guide at http://stevegoodyear.wordpress.com/sharepoint-2013-build-guide.

SharePoint Search Architecture Overview

Search runs as a service in SharePoint 2013, and Microsoft designed its architecture to facilitate redundancy and scalability in multiple directions. The search architecture consists of components and databases working cohesively to provide the search service.

Microsoft architected the search service across six components, each responsible for processing a specific portion of the search service. A search architecture includes the following components:

  • Crawl Component: Accesses content repositories and crawls content.
  • Content Processing Component: Processes crawled items, including document parsing and property mapping.
  • Index Component: The logical representation of the search index. You can divide the index into discrete partitions, each stored in files on a disk.
  • Query Processing Component: Analyzes and processes search queries and results.
  • Analytics Processing Component: Runs the search analytics and usage analytics.
  • Search Administration Component: Runs system processes essential to search.

Search uses the components to run and process the different aspects of the search service. In addition to the search components, the search architecture also includes the following databases:

  • Crawl Database: Manages the crawl operations and stores the crawl history.
  • Link Database: Stores link click-through information as well as link information extracted by the content processing component.
  • Search Administration Database: Stores search configuration data.
  • Analytics Reporting Database: Stores the results of usage analytics.

Figure 9-3 illustrates the relationship between the search components and databases in the search architecture. At the center of the architecture is the search index, which is paramount to providing a rich search experience. The search architecture feeds content to the index through the crawl and content processing components. A search portal sends a user’s search query to the index through the query processing component.

9781430261698_Fig09-03.jpg

Figure 9-3. The SharePoint search architecture
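To make the flow through these components concrete, here is a minimal sketch in Python of the same four stages: crawl, content processing, index, and query processing. It is a toy model for illustration only, not how SharePoint implements search; the sample addresses, the sample text, and every function name are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample content standing in for crawled repositories.
REPOSITORIES = {
    "http://intranet/sites/hr/policy.docx": "Vacation policy for all employees",
    "file://fileserver/finance/budget.xlsx": "Annual budget and vacation accrual",
}

def crawl(repositories):
    """Crawl step: yield (address, raw text) for every item found."""
    for address, text in repositories.items():
        yield address, text

def process(text):
    """Content processing step: lowercase the text and split it into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(items):
    """Index step: map each term to the set of addresses containing it."""
    index = defaultdict(set)
    for address, text in items:
        for term in process(text):
            index[term].add(address)
    return index

def query(index, query_text):
    """Query processing step: return addresses containing every query term."""
    terms = process(query_text)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)

index = build_index(crawl(REPOSITORIES))
print(query(index, "vacation"))
print(query(index, "vacation policy"))
```

The point of the sketch is the separation of duties: the crawl and processing functions only feed the index, and the query function only reads from it, which is what lets the real architecture place each component on a different server.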

In the smallest deployment on single-server farms, you will deploy all of the search components on the SharePoint server and all of the databases on the database server. As your search needs evolve, you can scale out the farm by adding servers to host specific search components. For example, if you scale from a single-server farm to a six-server farm, you can distribute your search components across the servers with redundancy: two servers handling web requests, two servers hosting the query processing component, and two servers hosting the remaining crawl, content processing, analytics, and administration components.

One of the key benefits this search architecture offers is the ability to scale your SharePoint farm to meet a wide variety of usage characteristics by targeting individual components to specific servers. You can also create additional index partitions to handle a growing number of content items and divide the index load across multiple servers.
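As a rough illustration of why partitioning helps, the following Python sketch spreads a set of item identifiers across a configurable number of partitions. The checksum-based assignment is my own assumption for the example; it is not a description of how SharePoint decides which partition holds an item, and the item identifiers are made up.

```python
import zlib

def partition_for(item_id, partition_count):
    """Pick a partition from a stable checksum of the item identifier."""
    return zlib.crc32(item_id.encode("utf-8")) % partition_count

def distribute(item_ids, partition_count):
    """Group item identifiers into partition_count lists."""
    partitions = [[] for _ in range(partition_count)]
    for item_id in item_ids:
        partitions[partition_for(item_id, partition_count)].append(item_id)
    return partitions

# Hypothetical item identifiers standing in for indexed documents.
items = [f"doc-{n}" for n in range(1000)]
for count in (1, 2, 4):
    sizes = [len(partition) for partition in distribute(items, count)]
    print(count, "partition(s):", sizes)
```

Each added partition holds a smaller share of the items, which is the effect you are after when a single index grows too large for one server.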

Note  For more information on the search architecture in SharePoint 2013, please see the TechNet site at http://technet.microsoft.com/jj898538.

Analyzing Your Enterprise Search Requirements

Search is one of my favorite aspects of SharePoint to deploy. It is like bringing an organization out of the dark ages, mostly because it suddenly makes information available—people can find whatever they want, wherever it is located, with just a few keywords. I might be dramatizing it a little, but at the time, it can feel that exciting. It all starts with gathering the requirements and imagining the possibilities.

As you analyze your search requirements, avoid complicating them. Remember: one of the things that Google did well was to create a simple interface with a single text box where users could enter a search query. My best advice to you is to follow that trend and avoid getting too crazy with design activities. Figure 9-4 shows the default search portal page, a simple user interface with everything your users need to conduct a search. This is a good place to start.

9781430261698_Fig09-04.jpg

Figure 9-4. The default SharePoint search portal

The first big requirement to solve is what happens from there, once users submit a search query. Luckily, SharePoint has solved most of the details around this requirement. As Figure 9-5 shows, the search results on a default enterprise search portal are already feature-rich. Along the left side of the page, you can see refiners, a set of links that users can click to refine the search results. Down the middle are search results, and if you hover over one, you can see a result preview pop up along the right.

9781430261698_Fig09-05.jpg

Figure 9-5. The search results page

I generally try to start with this experience in the initial phase of the search deployment. Unless you are replacing an existing and well-established enterprise search engine in your organization, these default features will almost always be sufficient to generate a wow factor for your initial launch. Avoid complicating things in the user interface at this stage unless you have a compelling reason (such as adding functionality to meet a specific and critical requirement).

Hopefully, you can avoid those requirements initially and focus instead on the content. Indexing the right content, after all, is the value of your enterprise search engine. You can always enhance the interface and user experience in the search portal later because this does not have a lasting effect on overall adoption. Having relevant and comprehensive search results, on the other hand, does have a lasting effect on overall adoption. This is what attracts users and establishes your search portal as a destination for finding information, but only if it delivers in its search results right from the start. This is the most critical piece.

The essence of your requirements analysis for an enterprise search, at least initially, consists of identifying all of the content repositories available on your network. This includes every SharePoint farm, every network file share, and every other web resource containing information. If there is information somewhere on the network that will benefit users by being searchable and discoverable, now is the time to identify that content and consider including it in the search index.

Important  Particularly with network file shares, I have noticed some people practicing what is commonly referred to as security through obscurity: content stays hidden only because others are not aware it exists, even though no permissions restrict access to it. As you analyze requirements for indexing a network file share, look at where you might be missing access control permissions, because otherwise, the search engine will expose the content in related search results.
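The following Python sketch shows why this matters once content is indexed. It models search results that are filtered by what a user has permission to see, and it flags items readable by everyone. The paths, group names, and the shape of the data are hypothetical; this is a simplified model for illustration, not the SharePoint security trimming implementation.

```python
EVERYONE = "Everyone"

# Hypothetical indexed items with the groups allowed to read each one.
INDEXED_ITEMS = [
    {"path": "file://fileserver/hr/handbook.pdf", "readers": {"HR", "Staff"}},
    {"path": "file://fileserver/hr/salaries.xlsx", "readers": {"HR"}},
    {"path": "file://fileserver/old/layoff-plan.docx", "readers": {EVERYONE}},
]

def visible_results(items, user_groups):
    """Return the paths a user may see, based on group membership."""
    allowed = set(user_groups) | {EVERYONE}
    return [item["path"] for item in items if item["readers"] & allowed]

def unrestricted(items):
    """Return the paths with no access restriction at all."""
    return [item["path"] for item in items if EVERYONE in item["readers"]]

print(visible_results(INDEXED_ITEMS, ["Staff"]))
print(unrestricted(INDEXED_ITEMS))
```

In this model the sensitive document in the old folder shows up for every user, because nothing but obscurity was protecting it. Reviewing the unrestricted list before the first crawl is the kind of check I am recommending.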

Think about this: how valuable would Google be if it only contained half of the content on the Web? Google can calculate relevancy based on the number of other sources referencing an item; it can even translate text based on human translations somewhere else on the Web. Having such an extensive and comprehensive corpus of data feeds many of Google’s algorithms. The larger the corpus, the more effective the algorithms. But more importantly for our purposes, not only does its underlying engine rely on its indexes having enough coverage of the Web’s content, but its users expect results from an extensive and fresh index of content.

Your users will have the same expectation that your enterprise search engine’s index adequately covers your organization’s array of content. Start with identifying and including the content first, and then build your requirements from there based on the search experience your users desire or the relevancy you need to tune.

Its simplicity is what makes search striking. This does not mean there is no work to do or nothing to administer; there is. Even as you continue to refine your search requirements and tune your search engine, always start with the content.

Administering an Enterprise Search Service

Administering an enterprise search service revolves around two main areas: indexing content and tuning for queries. The entire goal is to crawl all the necessary content as frequently as needed to provide users with meaningful and relevant search results for their search queries. The majority of your search administration tasks will involve those two areas.

You can administer and monitor search much as you administer other aspects of SharePoint, but through the Search Administration page. From SharePoint Central Administration, click the Manage Service Applications link in the Application Management section. Select the desired search service and click the Manage button in the ribbon to navigate to the Search Administration page for the search service, as shown in Figure 9-6.

9781430261698_Fig09-06.jpg

Figure 9-6. The Search Administration page

Notice the different administration options on the left navigation menu. The navigation menu groups the administrative options into three categories: Diagnostics, for reviewing logs and reports; Crawling, for configuring and managing the crawling settings; and Queries and Results, for configuring and managing how the search service processes queries.

In the center area, the page displays the system status. This dashboard includes details about any active crawling activity and the number of searchable items that the index contains. It also includes the content access account and e-mail address for crawls, both of which you can edit by clicking the respective link for the item you wish to change.

Note  The Contact E-mail Address for Crawls specifies the e-mail address that SharePoint will include along with the request header information that the crawler service sends to servers as it requests content from them. The target servers can then log the request, including the e-mail address, so that a server administrator can use it to contact a search administrator if there are any issues with the crawling service, such as the crawler overloading the target server’s resources.

Scrolling down on the Search Administration page, you can see the Search Application Topology, similar to the one shown in Figure 9-7. This topology lists the different components configured in the search architecture and which server hosts each component. In my example, I have a single server and it hosts all six components. The topology also lists the four databases in the search architecture and which database server hosts each.

9781430261698_Fig09-07.jpg

Figure 9-7. The Search Application Topology section

Note  Although this is not a comprehensive guide for managing SharePoint search, I wanted to provide you with a brief introduction to the service and the key areas you can configure. For more thorough administration guidance, please see the TechNet site at http://technet.microsoft.com/ee792877.

The Search Administration page is the main entry page for administering different aspects of your search service. In the next few sections, I cover the most common tasks you need to perform to administer a search service, starting with configuring a content source.

Configuring Search Content Sources

Content sources in SharePoint search identify the target locations and the protocol to use for connecting to each target storage location. A content source loosely relates to a content repository, although in some cases a content source can include multiple content repository locations, typically referred to as start addresses, as long as they share the same protocol for connecting and accessing the location. A content source also manages the crawling schedule and the account used to authenticate for crawling the content.

Your search engine will not provide any search results if you do not have a content source configured and scheduled to crawl content. As such, this is the most important step for you to complete before you can enable an enterprise search engine. To manage the content sources for your search service, click the Content Sources link in the Crawling section on the left navigation menu on the Search Administration page to navigate to the Manage Content Sources page similar to Figure 9-8.

9781430261698_Fig09-08.jpg

Figure 9-8. The Manage Content Sources page

You can manage the settings for an existing content source by clicking the name of the content source you wish to edit. You can also create a new content source by clicking the New Content Source button. If you create a new content source, you have to specify a name for the content source and the type of content to crawl (the protocol to connect with), as shown in Figure 9-9. By default, you can select from the following content source types:

  • SharePoint Sites
  • Web Sites
  • File Shares
  • Exchange Public Folders
  • Line of Business Data (Business Connectivity Services)
  • Custom Repositories

9781430261698_Fig09-09.jpg

Figure 9-9. The Add Content Source page

Next, enter the start addresses for the content source. You can include multiple content repositories by adding multiple start addresses—they do not have to be the same host, only the same source type (protocol). You might also separate content repositories into different content sources to specify different crawl schedules for each. Finally, select or create the crawl schedules and click OK.

Tip  For SharePoint content sources, you specify the people profile content as well as the My Site content by specifying sps3 and http(s) protocols for the respective start addresses. For example, you could include sps3://people and http://people for profiles and site content hosted on that web application.
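Before you enter a long list of start addresses, it can help to check that each one uses a protocol that fits the content source type. The Python sketch below does this with the standard library URL parser. The mapping of source types to protocols is my own simplification based on the examples in this section, and the host names are hypothetical; check the product documentation for the full set of protocols your farm supports.

```python
from urllib.parse import urlsplit

# Assumed mapping for illustration; not an exhaustive or official list.
ALLOWED_SCHEMES = {
    "SharePoint Sites": {"http", "https", "sps3"},
    "Web Sites": {"http", "https"},
    "File Shares": {"file"},
}

def invalid_start_addresses(source_type, start_addresses):
    """Return the start addresses whose protocol does not fit the source type."""
    allowed = ALLOWED_SCHEMES[source_type]
    return [
        address
        for address in start_addresses
        if urlsplit(address).scheme.lower() not in allowed
    ]

print(invalid_start_addresses("SharePoint Sites", ["sps3://people", "http://people"]))
print(invalid_start_addresses("File Shares", ["file://fileserver/share", "http://intranet"]))
```

The first call returns an empty list because both addresses from the tip fit a SharePoint content source; the second call flags the web address that does not belong in a file share content source.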

Configuring Crawl Rules

You can fine-tune how SharePoint crawls a location by creating a crawl rule. Create a new crawl rule by clicking the Crawl Rules link on the left navigation menu on the Search Administration page, and then clicking the New Crawl Rule link. Figure 9-10 shows the options for creating a new crawl rule.

9781430261698_Fig09-10.jpg

Figure 9-10. The Add Crawl Rule page

You can apply a crawl rule by following these steps:

  1. Specify a path and any wildcards or regular expressions for identifying the location to apply the rule.
  2. Select the crawl configuration to specify what to exclude or include in the location during a crawl.
  3. Specify the authentication credentials for the crawl content access account.
  4. Click OK.
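To reason about how a set of crawl rules will treat a given address, you can model the rules outside of SharePoint first. The Python sketch below assumes that rules are evaluated in order and that the first matching rule decides whether to include or exclude the address. It uses simple shell-style wildcards from the standard library, which may not match SharePoint's own pattern handling exactly; the rule paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, evaluated top to bottom: (path pattern, include?)
CRAWL_RULES = [
    ("http://intranet/sites/archive/*", False),
    ("http://intranet/sites/*", True),
    ("file://fileserver/temp/*", False),
]

def should_crawl(url, rules, default=True):
    """Return the decision of the first rule whose pattern matches the URL."""
    for pattern, include in rules:
        if fnmatchcase(url.lower(), pattern.lower()):
            return include
    # No rule matched, so fall back to the default behavior.
    return default

for url in (
    "http://intranet/sites/archive/2009/report.docx",
    "http://intranet/sites/hr/policy.docx",
    "file://fileserver/temp/scratch.txt",
):
    print(url, "->", "crawl" if should_crawl(url, CRAWL_RULES) else "skip")
```

Because the first match wins in this model, the more specific archive rule has to come before the broader sites rule, or the archive would be crawled along with everything else.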

Configuring Result Sources

Use result sources to scope search results and to federate queries to external sources, such as remote SharePoint farms and public Internet search engines. You can specify result sources for the following protocols:

  • Local SharePoint: Results from the index of the local search service
  • Remote SharePoint: Results from the index of a search service hosted in another farm
  • OpenSearch 1.0/1.1: Results from a search engine that uses this protocol
  • Exchange: Results from an Exchange source

Note  Result Sources in SharePoint 2013 replace the deprecated Search Scopes in previous versions of SharePoint.

To create a new result source, click the Result Sources link in the left navigation menu on the Search Administration page to navigate to the Result Sources page, and then click the New Result Source button. On the Add Result Source page shown in Figure 9-11, specify a name, description, and protocol for the desired result source. Set any other relevant options and click Save.

9781430261698_Fig09-11.jpg

Figure 9-11. The Add Result Source page

Configuring Query Suggestions

Query suggestions appear when users begin to type a query in the search text box. SharePoint dynamically manages the list of suggestions based on popular search terms and what a user has searched for in the past. You can also manually supply a list of phrases to always or never suggest, depending on your objective.

Configure the query suggestions lists by clicking the Query Suggestions link in the left navigation menu on the Search Administration page to navigate to the Query Suggestion Settings page shown in Figure 9-12.

9781430261698_Fig09-12.jpg

Figure 9-12. The Query Suggestion Settings page

If there are phrases you do not want to appear as query suggestions, such as query phrases that users frequently mistype, simply list the phrases in a text file and click to import the text file for the Never Suggest Phrases option.

For those phrases you wish to always suggest, list them in a separate text file and click to import it for the Always Suggest Phrases option. Figure 9-13 illustrates an example of the phrase “practical sharepoint 2013 enterprise content management” I added to always suggest. Now, whenever a user begins to type the word practical on this search portal, the search page will suggest the entire phrase.

9781430261698_Fig09-13.jpg

Figure 9-13. Search query suggestion example
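The interplay between the popular terms and the two phrase lists can be sketched in a few lines of Python. This is a simplified model that assumes the import files contain one phrase per line and that suggestions match on the start of the phrase; the sample phrases other than the one from Figure 9-13 are made up, and SharePoint's own suggestion logic is more involved than this.

```python
def load_phrases(text):
    """Parse an import file's text, assuming one phrase per line."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]

def suggest(prefix, popular, always, never, limit=5):
    """Return phrases starting with the prefix, honoring both phrase lists."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    blocked = set(never)
    # Always-suggest phrases come first, followed by the popular phrases.
    ordered = list(always) + [phrase for phrase in popular if phrase not in always]
    matches = [
        phrase
        for phrase in ordered
        if phrase.startswith(prefix) and phrase not in blocked
    ]
    return matches[:limit]

always = load_phrases("practical sharepoint 2013 enterprise content management\n")
never = load_phrases("practicle\n")
popular = ["practice schedule", "practicle", "project plan"]  # hypothetical terms
print(suggest("pract", popular, always, never))
```

In this model the mistyped phrase never appears, even though users search for it often, and the always-suggest phrase leads the list as soon as the prefix matches.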

Building an Enterprise Glossary

One common theme across every company I have consulted with or worked for is a distinct language of company- or industry-specific terms and acronyms. These are any terms that a regular person outside the organization would be unfamiliar with, such as terms with a specific usage only relevant within your firm or acronyms that outsiders would be unaware of.

To give you an example, years ago, I worked for Pepsi, and a common acronym at the time was PQI. It was so common, in fact, that our people would use it as a verb. To an outsider, this would seem foreign and difficult to understand, simply because the letters in the acronym are not obvious, at least not to anyone outside the soft-drink industry at the time. The acronym stands for product quality initiative, and it essentially refers to the “best before” dates on the soft drinks.

When private-label soft drinks came on the market and challenged the bigger bottlers and their brand-name beverages with cheaper alternatives, Pepsi responded by adding best-before dates to each bottle and can. The message was that buying the brand-name drink meant you were buying freshness and quality, because you could visually verify that a bottle or can had not sat on a shelf going stale. In other words, you could visually verify the product quality of the name-brand drinks, but not the no-name alternatives, and that gave the company an initiative to differentiate itself. Hence, product quality initiative.

If a bottler used PQI as a verb, as in “these drinks are going PQI,” he or she was saying that the drinks were about to expire. Anyone familiar with the acronym, which you now are, would understand; but for everyone else, particularly new employees, this would sound odd and foreign. To compound this, try to imagine just how many different acronyms and terms a typical organization would have (at Pepsi, we printed out a list, and later we discovered we missed a few despite the list spanning nearly thirty pages).

You can solve this problem the way we solved it at Pepsi back in the day, by printing out a glossary booklet. However, paper-based glossaries can be difficult to distribute and a challenge to maintain, and let’s face it: a bound booklet glossary is not convenient to carry around with you. A better solution is to add a glossary to your SharePoint search engine.

I like to describe this second option in the context of new people joining your organization. As part of their orientation, you could introduce them to the search portal, telling them that when they hear an unfamiliar term, they can type it into search and see a glossary defining it in the search results. Instead of being lost or confused, forcing them to face the embarrassment of asking what a term means, they can inconspicuously search and find out what it means on their own—from the browser on their desktop or even from their mobile device while they are on the go.

A search-powered glossary provides a nice way to onboard new employees and a convenient way to maintain a list of relevant terms in an easy-to-find source. This feature also helps to drive adoption of the search engine, because not only can people use it to find content and other people (as I discuss in the next chapter), they can also find information on terms or phrases used within the organization.

In previous versions of SharePoint, you could add terms in what the product referred to as best bets. In SharePoint 2013, the product team made some subtle changes to the idea of promoted search results, ultimately to extend the functionality and add a finer degree of control to how you define the criteria for promoted results. This functionality is now found in query rules.

Configuring Query Rules

Query rules enable you to conditionally promote important search results, show blocks of additional result information, and even tune result rankings.

You can create a query rule to provide a glossary item in the search results by following these steps:

  1. Click the Query Rules link in the left navigation menu on the Search Administration page to navigate to the Manage Query Rules page.
  2. Select the Local SharePoint Results result source from the Result Source drop-down menu. This adds the glossary item to the default result source.
  3. Click the New Query Rule button to navigate to the Add Query Rule page.
  4. Specify a rule name. In my example, I called the rule ECM.
  5. Define the Query Conditions that make the rule fire. In my example in Figure 9-14, I specified “ecm; enterprise content management” for the Query Conditions so that the rule would fire and return the glossary definition for when users submit either the acronym or the phrase for a search query.

    9781430261698_Fig09-14.jpg

    Figure 9-14. The Add Query Rule page

  6. Click the Add Promoted Result link in the Actions section. Enter the glossary term’s title and definition in the modal window, as shown in Figure 9-15. Click Save to dismiss the modal window, and then click Save to create the rule.

9781430261698_Fig09-15.jpg

Figure 9-15. The Add Promoted Result modal window
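Conceptually, the rule in these steps behaves like a lookup from query text to a promoted result. The Python sketch below models that behavior, assuming the condition is a semicolon-separated list of phrases and that the rule fires only when the whole query matches one of them, ignoring case. The data structure and function names are hypothetical, the definition text is a placeholder, and real query rules offer more condition types than this.

```python
def parse_conditions(text):
    """Split a semicolon-separated condition string into normalized phrases."""
    return {term.strip().lower() for term in text.split(";") if term.strip()}

# Hypothetical in-memory version of the ECM rule from the steps above.
RULE = {
    "name": "ECM",
    "conditions": parse_conditions("ecm; enterprise content management"),
    "promoted": {"title": "ECM", "description": "Enterprise content management"},
}

def promoted_results(query_text, rules):
    """Return the promoted result of every rule the query fires."""
    normalized = " ".join(query_text.lower().split())
    return [rule["promoted"] for rule in rules if normalized in rule["conditions"]]

print(promoted_results("ECM", [RULE]))
print(promoted_results("ecm software", [RULE]))
```

The second query returns nothing in this model because it contains more than the exact phrase, which is worth keeping in mind when you decide how many variations of a term to list in the Query Conditions.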

You can test the query rule by navigating to the search portal and submitting the term as the search query. If you created the query rule successfully, you should see a result similar to Figure 9-16.

9781430261698_Fig09-16.jpg

Figure 9-16. Promoted result example
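If you prefer to submit the same test query without the browser, SharePoint 2013 also exposes search through a REST endpoint at _api/search/query. The Python sketch below only builds the request URL; the site address is hypothetical, and sending the request would also require authentication, which this sketch leaves out.

```python
from urllib.parse import quote

def search_query_url(site_url, query_text):
    """Build a search REST URL with the query text wrapped in single quotes."""
    base = site_url.rstrip("/")
    return f"{base}/_api/search/query?querytext='{quote(query_text)}'"

# Hypothetical search center address.
print(search_query_url("http://intranet/search", "ecm"))
print(search_query_url("http://intranet/search", "enterprise content management"))
```

Pasting the resulting address into a browser session that is already signed in to the site is a quick way to see the raw results for the term.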

Wrapping Up

Users rely on search engines to find the relevant information that they seek. This makes search an important aspect of your SharePoint deployment, or even, as I suggested, the centerpiece of your SharePoint deployment. Enterprise search offers a single entry-point to discover information in the enterprise. Its architecture in SharePoint includes the components that access and crawl the content to build an index, and the components that process search queries by utilizing the index. Your enterprise search solution should start with the content, and you should design its search experience from there.

People in an organization search for all types of information: sometimes this is content, and sometimes it is other people and their knowledge. Sometimes people discover content through other people and their knowledge. Discovering information through others is the essence of social computing, an aspect of which builds on the SharePoint search engine in addition to other social features. In the next chapter, I discuss the topics relating to social capabilities in SharePoint, including searching for people and for knowledge through other people. I also look at other social features, such as profiles, social tagging, and blogging.
