Configuring and Maintaining the Search and Indexing Components

Configuring search and indexing correctly is important so that users receive accurate, relevant, and expected results. Configuring search for your organization involves three primary steps:

  • Indexing content: All the content that you want to make available to users must be defined as content sources and indexed on a regular basis so that the results are available to be returned in searches. Content sources are defined and indexing is scheduled and completed by the Shared Service Provider (SSP).

  • Associating Web applications with content indexes: Each Web application needs to have an associated SSP that provides the indexes for searching.

  • Defining scopes: Search scopes allow users to target the slice of content they want to search. If scopes are configured appropriately, users should be able to pick a scope for their search that returns a reasonable number of relevant results.

Indexing content

Choosing what content to index and configuring your indexing settings are the primary tasks for indexing content. MOSS can index content from several types of content sources, including these sources for which SharePoint provides out-of-the-box support:

  • File share content: SharePoint can index content that is placed on file shares.

  • Exchange server content: Exchange public folders content is a good potential source for indexing corporate knowledge.

  • Lotus Notes servers: If your organization uses Lotus Notes, SharePoint can index the Lotus Notes databases. You will need to run the Lotus Notes Index Setup Wizard to configure the Lotus Notes protocol handler before configuring a Lotus Notes content source.

  • SharePoint sites: MOSS indexes all the local SharePoint content and, for cross-organizational content, can be configured to index SharePoint sites that are not consumers of the local SSP.

  • Business data: MOSS can also index data that has been defined in the Business Data Catalog.

    Cross-Ref

    For more information on the Business Data Catalog, see Chapter 18.


  • Web sites: MOSS can index Web site content for cross-platform or cross-product integration.

Planning content sources

Selecting your content sources from the myriad of available corporate repositories of data is an important step in the indexing configuration process. Indexing content can be a resource-intensive task, both for the indexing server and for the server that is responding to the crawler requests for the content.

The SSP is automatically configured with a content source that covers all the local SharePoint sites, which are defined as all the site collections that are using the SSP. You can choose to index SharePoint content that is external to the SSP, but this will most likely result in content being indexed more than once: first by its local SSP and then one or more times by external SSPs that have defined it as a content source. The exception is SharePoint sites that you index from a WSS farm.

For each set of content that is a potential source for your SSP users, you need to decide what the source is and how often you will do full and incremental updates to the index.

Warning

Keep in mind that for external content, or content that you do not control, requesting too much content or requesting content too often may overload the external source, and the administrator of that source can block you from crawling in the future.


For each content source that you identify, determine the following content source options:

  • How deep would you like to crawl? For SharePoint sites, you can determine whether you want to crawl everything under the start address or just the SharePoint site of the start address. For Web content, you need to decide whether you will crawl just the first page of the site or the entire server. You can also set custom hop settings to limit the number of server hops and the depth of the pages that the crawler follows. Setting the server hop limit to 2 or more can overwhelm your indexing resources because the crawler indexes not only your starting address but also any server that is linked from the starting address content (one server hop) and any other servers linked from those servers (second server hop).

  • What is the crawl schedule? The crawl schedule can be determined by understanding how often the target content changes and how long it takes to index the content source. Try to plan full crawls for times when the content source has low resource usage and schedule them less frequently if the content does not change frequently. Schedule incremental crawls to update the content between full crawls.

  • Does this content source need to be accessed by an account other than the default content access account? The default content access account credentials are presented to gain read access to content sources unless a crawl rule specifies a different account for a specific site. Managing several content access accounts can be a time-consuming procedure, especially if the accounts require password changes on a regular basis. We recommend defining unique content access accounts only when the default content access account cannot be used.
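
To see why the server hop limit matters so much, the following minimal Python sketch walks a small, made-up link graph and counts what a crawl would reach under different limits. The link graph, the example hostnames, and the function name pages_in_crawl are all hypothetical, and the way page depth and server hops are counted here is a simplified assumption for illustration, not a description of how the MOSS crawler is implemented.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph for illustration: page URL -> URLs it links to.
LINKS = {
    "http://a.example/": ["http://a.example/p1", "http://b.example/"],
    "http://a.example/p1": ["http://a.example/p2"],
    "http://a.example/p2": [],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def pages_in_crawl(start, max_page_depth, max_server_hops, links=LINKS):
    """Return the set of URLs reachable from start within both limits.

    Simplified model (an assumption): page depth is the number of links
    followed from the start address, and a server hop is counted each
    time a followed link points to a different hostname. A page is kept
    on the first path that reaches it.
    """
    seen = {start}
    queue = deque([(start, 0, 0)])  # (url, page depth, server hops so far)
    while queue:
        url, depth, hops = queue.popleft()
        for target in links.get(url, []):
            crosses_server = urlparse(target).netloc != urlparse(url).netloc
            new_hops = hops + 1 if crosses_server else hops
            new_depth = depth + 1
            if new_depth > max_page_depth or new_hops > max_server_hops:
                continue  # outside the configured limits
            if target not in seen:
                seen.add(target)
                queue.append((target, new_depth, new_hops))
    return seen


if __name__ == "__main__":
    for hops in range(3):
        reached = pages_in_crawl("http://a.example/", 10, hops)
        print(f"server hops = {hops}: {len(reached)} pages")
```

In this toy graph each extra server hop adds only one server, but on real Web content every additional hop can pull in every server that any already-crawled page links to, which is why a limit of 2 or more grows so quickly.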

Implementing content sources

To implement the content sources that you have identified, follow these steps for each content source:

1.
Navigate to the administration page for your Shared Service Provider and select Search settings from the Search section.

2.
Select Content sources and crawl schedules.

3.
Click New Content Source in the top navigation bar.

4.
Enter a name for your content source in the Name field, as shown in Figure 7.1.



Figure 7.1. Add a new content source


5.
Select the type of content to be crawled.

If your content source is a SharePoint server, enter the address of the top-level SharePoint site in the start address box. If you want to include more than one top-level site in your content source, you can add additional start addresses on separate lines. In the Crawl Settings section, select whether you want to crawl everything under the hostname or just the SharePoint site.

If your content is a Web site, enter the address of the site in the start address box. If you want to include more than one site in your content source, you can add additional start addresses on separate lines. In the Crawl Settings section, select whether you want to crawl the server, only the first page, or a custom depth of your start addresses. The custom crawling behavior allows you to set the page depth and server hops. Page depth is how many levels down from the first page the crawler will follow. The number of server hops defines how many times the crawler is allowed to follow a link from one server to another.

If your content is a file share, enter the address of the file share in the start address box. If you want to include more than one file share in your content source, you can add additional start addresses on separate lines. Select whether you want to crawl only the start address folder or the folder and its subfolders in the Crawl Settings section.

If your content is an Exchange public folder, enter the address of the public folder in the start address box. If you want to include more than one public folder in your content source, you can add additional start addresses on separate lines. Select whether you want to crawl only the start address folder or the folder and its subfolders in the Crawl Settings section.

If your content is business data, select which BDC application you would like to crawl or select whether you want to crawl the entire BDC in the Applications section.

6.
Set the schedule for the full crawl and incremental crawls in the Crawl Schedules section. Any schedules that you have previously configured will be available in the pull-down, or you can select the Create Schedule link to define a new schedule.

7.
Select whether the crawl should immediately start a full crawl of the content source in the Start Full Crawl section.

8.
Click OK.

Implementing SSP settings for all sources

There are several SSP search and indexing settings that apply to all content sources, including the default content access account, crawl rules, and file-type inclusions.

The default content access account provides the credentials that are used to gain read access to the content sources for indexing content. This account was configured during the post-setup configuration steps of the SSP in Chapter 2. You should choose a default content access account that has broad read access to your content sources to simplify the content access account administration process.

Crawl rules limit content crawls, either to increase the relevancy of results or to limit the resource impact on the sources. You can create crawl rules to include or exclude a URL or set of URLs. You can also use crawl rules to set broad crawler behavior: whether to crawl only the links identified at the source, whether to crawl URLs with complex characters, and whether SharePoint sites should be crawled as HTTP. Crawl rules also allow you to specify authentication for a particular path that differs from the default content access account.

File-type inclusions let the crawler know what file extensions to crawl or not crawl. You can add file types to the file-type list that is populated initially with the commonly used file types. If you add a file type, you must have an iFilter that MOSS can use to understand and crawl that content type.

To create crawl rules, follow these steps:

1.
Navigate to the administration page for your Shared Service Provider and select Search settings from the Search section.

2.
Select Crawl rules.

3.
Click New Crawl Rule in the top navigation bar.

4.
Enter a path in the Path field. You can use wildcards in the path to designate that the rule should apply to anything that matches, so http://*.* would match any hostname.

5.
Select whether you want to include or exclude content with this rule in the Crawl Configuration section.

6.
Specify whether you want to use a crawling account other than the default crawling account, or a client certificate, in the Specify Authentication section, as shown in Figure 7.2.

Figure 7.2. Specifying a different crawling account using a crawl rule


7.
Click OK.
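
Before entering a set of crawl rules, it can help to check on paper which URLs they would include or exclude. The following Python sketch is one way to do that. The rule list, the example URLs, and the function name is_crawled are hypothetical. It assumes that rules are evaluated in the order listed and that the first matching rule decides the outcome, and it uses Python's fnmatch wildcards, whose matching details may differ from the matching that MOSS performs.

```python
from fnmatch import fnmatchcase

# Hypothetical rules in evaluation order: (path pattern, include?).
# True means "include", False means "exclude".
RULES = [
    ("http://intranet.example/archive/*", False),
    ("http://intranet.example/*", True),
    ("http://*.*", False),
]


def is_crawled(url, rules=RULES, default=True):
    """Return True if the URL would be crawled under the given rules.

    Assumption: the first rule whose pattern matches the URL wins, and a
    URL that matches no rule falls back to the default. Note that fnmatch
    also treats '?' and '[' as special characters.
    """
    for pattern, include in rules:
        if fnmatchcase(url.lower(), pattern.lower()):
            return include
    return default


if __name__ == "__main__":
    for url in (
        "http://intranet.example/teams/plan.doc",
        "http://intranet.example/archive/2005/a.doc",
        "http://www.contoso.example/",
    ):
        print(url, "->", "crawl" if is_crawled(url) else "skip")
```

Because the first match decides, the narrow exclusion for the archive folder has to be listed before the broader inclusion for the rest of the intranet host.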

Implementing server name mappings

SharePoint provides the capability to change the display location URL for search item results by using server name mappings. Server name mappings are configured at the SSP level and are applied when search queries are performed.

Server name mappings are useful when you want to replace local addresses for content with addresses on the server or if you want to hide the source of the content. However, you should not use them unless you have access or display problems. To implement server name mappings, follow these steps:

1.
Navigate to the administration page for your Shared Service Provider and select Search settings from the Search section.

2.
Select Server name mappings.

3.
Click New Mapping in the top navigation bar.

4.
Type an address in the Address in index field. This is the name that you want the SSP to find and replace.

5.
Type an address in the Address in search results field. This is the name that you want the SSP to insert in the search results.

6.
Click OK.
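
Conceptually, a server name mapping is a find-and-replace on the address of each search result. The following Python sketch illustrates the idea. The function name apply_server_name_mappings, the example addresses, and the prefix-matching behavior are assumptions made for illustration and do not describe how the SSP implements the feature.

```python
def apply_server_name_mappings(url, mappings):
    """Rewrite a result URL using (address in index, address in results) pairs.

    Assumption: a mapping applies when the URL starts with the indexed
    address and the match ends at a server boundary (end of string, '/',
    or ':'). The first matching mapping is used.
    """
    for address_in_index, address_in_results in mappings:
        rest = url[len(address_in_index):]
        at_boundary = rest == "" or rest[0] in "/:"
        if url.lower().startswith(address_in_index.lower()) and at_boundary:
            return address_in_results + rest
    return url  # no mapping applies, so the indexed address is shown


if __name__ == "__main__":
    # Hypothetical mapping from an internal server name to a public one.
    mappings = [("http://internal-sp01", "https://portal.example.com")]
    print(apply_server_name_mappings(
        "http://internal-sp01/sites/hr/policy.doc", mappings))
```

The boundary check keeps a mapping for internal-sp01 from also rewriting a different server whose name merely starts with the same characters.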

Configuring Search for your server farm

In addition to configuring your content sources, you need to configure the settings that the crawler uses to reach the content sources. These settings are configured at the farm level. To configure your farm search settings, follow these steps:

1.
Open SharePoint Central Administration and select the Application Management tab.

2.
Select Manage search service in the Search section.

3.
Select Farm-level search settings.

4.
Type the contact e-mail address in the E-mail Address field as shown in Figure 7.3. This is the e-mail address that the gatherer will use to let other administrators know who to contact in the case of a problem.

Figure 7.3. Configuring the e-mail account in server farm settings for search


5.
Enter the proxy settings in the Proxy Server Settings section. The proxy settings need to be configured so that the crawler can reach sites that are on the other side of the proxy, if appropriate for your organization.

6.
Enter the timeout settings for connection time and acknowledgement time. This is the amount of time the crawler is configured to wait for content before moving to the next item.

7.
Select whether the crawler should ignore SSL warnings. This setting determines whether the SSL certificate must match exactly.

8.
Click OK.

The farm-level search settings also include the crawler impact rules. These rules determine how many documents the crawler requests at a time and how frequently the crawler requests documents from a particular site. To configure crawler impact rules, follow these steps:

1.
Open SharePoint Central Administration and select the Application Management tab.

2.
Select Manage search service in the Search section.

3.
Select Crawler impact rules.

4.
Click Add Rule in the top navigation bar.

5.
Enter the site for which you want this rule to apply.

6.
Select how many documents you want the crawler to request at a time. You can configure the crawler to make simultaneous requests of up to 64 documents without waiting between requests, or you can configure the crawler to request one document at a time with a specific waiting time between requests.

7.
Click OK.
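
The choice between the two request modes has a large effect on how long a crawl takes. The following Python sketch gives a rough back-of-the-envelope estimate for each mode. The function names, the document count, and the average fetch time are hypothetical, and the model assumes every document takes the same time to fetch, so treat the results as an order-of-magnitude guide rather than a prediction.

```python
import math


def estimate_simultaneous(doc_count, fetch_seconds, simultaneous_requests):
    """Rough crawl time when documents are requested in parallel batches."""
    if not 1 <= simultaneous_requests <= 64:
        raise ValueError("simultaneous requests must be between 1 and 64")
    batches = math.ceil(doc_count / simultaneous_requests)
    return batches * fetch_seconds


def estimate_one_at_a_time(doc_count, fetch_seconds, wait_seconds):
    """Rough crawl time when one document is requested at a time,
    with a fixed wait between consecutive requests."""
    if doc_count == 0:
        return 0
    return doc_count * fetch_seconds + (doc_count - 1) * wait_seconds


if __name__ == "__main__":
    docs, fetch = 6400, 0.5  # hypothetical site size and fetch time
    print("64 at a time:", estimate_simultaneous(docs, fetch, 64), "seconds")
    print("1 at a time, 1 s wait:", estimate_one_at_a_time(docs, fetch, 1), "seconds")
```

Under these assumed numbers the gentle one-at-a-time setting takes well over a hundred times longer, which is the trade-off to weigh against the load you place on the target site.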

Configuring search scopes

Search scopes can be configured at both the SSP level and the site collection level. SSP search scopes should be broad scopes that are relevant to all users regardless of the site collection. In the out-of-the-box installation, SharePoint creates two SSP-level search scopes: All Sites and People. Additional SSP search scopes could be configured for organizational information that is pertinent to everyone such as a corporate event portal. Site collection search scopes should be specific to the information on that site collection.

Note

MOSS provides the ability to create an RSS subscription to search results. However, this subscription will work only for search scopes created at the SSP level.


Search scopes are defined by one or more rules. The rules can be based on managed properties and location or content sources, and can include rules that exclude content. Your goal in creating search scopes is to create logical divisions of the content so that users understand which scope to pick and get a reasonable number of results returned when they execute their search.

Search scopes are organized in display groups. Search Web Parts use these groups to identify which scopes to show in the search drop-down menu.

Defining search scopes at the SSP

In addition to defining the SSP scopes, the SSP administrator can also create scope rules. To define a scope at the SSP level, follow these steps:

1.
Navigate to the administration page for your Shared Service Provider and select Search settings from the Search section.

2.
Select View scopes from the Scopes section.

3.
Select New Scope from the top navigation bar.

4.
Provide a name for your search scope in the Title box and the contact for the scope in the Contact field, as shown in Figure 7.4.

Figure 7.4. Creating a new shared scope


5.
Select whether you want to use the default Search results page, or enter a different search results page that you would like to use and click OK.

6.
From the View Scopes page, select Edit Properties and Rules from the drop-down menu on the scope title that you just added.

7.
Select New Rule.

8.
If you want to create a rule based on the Web address properties of the indexed items, select Web Address in the Scope Rule Type section and select whether the Web address will be limited by folder, hostname, or domain/subdomain.

If you select folder, enter the URL of the folder that you want the rule to be based on in the Folder box, as shown in Figure 7.5. For example, http://server/site/folder.

Figure 7.5. Creating a scope rule to include content from a Web address


If you select hostname, enter the hostname that you want the rule to be based on in the Hostname box. For example, servername.

If you select domain or subdomain, enter the domain name that you want the rule to be based on in the Domain or subdomain box. For example, office.microsoft.com.

9.
If you want to create a rule based on the managed properties of the indexed items, select Property Query in the Scope Rule Type section. Select the property that you want the rule to be based on in the Add property restrictions pull-down menu. Enter the rule value in the = field. For example, Author (the property) = John Doe (the rule value).

10.
If you want to create a rule that is based on a specific content source, select Content Source in the Scope Rule Type section and select the content source from the pull-down menu.

11.
Select the All Content radio button if you want the scope to return all indexed items.

12.
Select whether you want to include, require, or exclude content based on the rules you enter.

Include rules specify what content will be included unless another rule excludes it.

If you choose a required rule, all items returned in the scope must match the rule.

Exclude rules specify what content will not be included. This content will not be included even if it matches the other rules.

13.
Click OK.

14.
Add as many rules as you need to tune your scope to the appropriate content.
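
When a scope has several rules, it is worth working out how the include, require, and exclude behaviors combine before you rely on the scope. The following Python sketch is one reading of the behavior descriptions above. The function name in_scope, the item fields, and the sample rules are hypothetical, and the treatment of a scope with no include or require rules (it returns nothing) is an assumption rather than documented behavior.

```python
def in_scope(item, rules):
    """Decide whether an indexed item belongs to a scope.

    rules is a list of (behavior, predicate) pairs, where behavior is
    'include', 'require', or 'exclude' and predicate takes the item.
    """
    includes = [test for behavior, test in rules if behavior == "include"]
    requires = [test for behavior, test in rules if behavior == "require"]
    excludes = [test for behavior, test in rules if behavior == "exclude"]

    if any(test(item) for test in excludes):
        return False  # an exclude rule wins over every other rule
    if not all(test(item) for test in requires):
        return False  # every require rule must match
    if includes:
        return any(test(item) for test in includes)
    # Assumption: with no include rules, the require rules alone define
    # the scope, and a scope with no positive rules is empty.
    return bool(requires)


if __name__ == "__main__":
    # Hypothetical rules mirroring the rule types described above.
    rules = [
        ("include", lambda i: i["source"] == "Local sites"),
        ("require", lambda i: i["author"] == "John Doe"),
        ("exclude", lambda i: i["url"].startswith("http://server/site/drafts")),
    ]
    item = {"source": "Local sites", "author": "John Doe",
            "url": "http://server/site/folder/a.doc"}
    print(in_scope(item, rules))
```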

Copying SSP scopes for your site collection

Although it is not possible for a site collection administrator to modify a scope created by the SSP administrator, the site collection administrator can duplicate and subsequently modify a copy of an SSP search scope.

If you want to implement changes to an SSP scope by copying it to your site collection, follow these steps:

1.
Go to the top-level site of the site collection for which you want to add the scope and select Site Settings from the Site Actions menu in the top right corner.

2.
Select Search scopes from the Site Collection Administration menu.

3.
From the pull-down menu on the scope that you want to copy, select Make copy.

Defining site collection search scopes and scope display groups

SharePoint provides two display groups to organize your site collection search scopes, one for the search drop-down menu and one for the advanced search page. To create a new display group, follow these steps:

1.
Go to the top-level site of the site collection for which you want to add the scope and select Site Settings from the Site Actions menu in the top right corner.

2.
Select Search scopes from the Site Collection Administration menu.

3.
Select New Display Group from the top navigation bar.

4.
Provide a name for your display group in the Title field.

5.
If you have already created the scope or scopes that you want to include in this display group, select the scope in the Scopes section, as shown in Figure 7.6.

Figure 7.6. Creating a new display group


6.
From the View Scopes page, select Edit Properties and Rules from the drop-down menu on the scope title that you just added.

7.
Click OK.

To define a scope for your site collection, follow these steps:

1.
Go to the top-level site of the site collection for which you want to add the scope and select Site Settings from the Site Actions menu in the top right corner.

2.
Select Search scopes from the Site Collection Administration menu.

3.
Select New Scope from the top navigation bar.

4.
Provide a name for your scope in the Title field and the contact for the scope in the Contact field.

5.
Select whether you want to use the default Search results page, or enter a different search results page and click OK.

6.
From the View Scopes page, select Edit Properties and Rules from the drop-down menu on the scope title that you just added.

7.
Select New rule.

8.
If you want to create a rule based on the Web address properties of the indexed items, select Web Address in the Scope Rule Type section and select whether the Web address will be limited by folder, hostname, or domain/subdomain.

If you select folder, type the URL of the folder that you want the rule to be based on in the Folder box. For example, http://server/site/folder.

If you select hostname, type the hostname that you want the rule to be based on in the Hostname box. For example, servername.

If you select domain or subdomain, enter the domain name that you want the rule to be based on in the Domain or subdomain box. For example, office.microsoft.com.

9.
If you want to create a rule based on the managed properties of the indexed items, select Property Query in the Scope Rule Type section. Select the property that you want the rule to be based on in the Add property restrictions pull-down menu. Enter the rule value in the = field. For example, Author (the property) = John Doe (the rule value). This rule operates by finding exact matches.

10.
Select the All Content radio button if you want the scope to return all indexed items.

11.
Select whether you want to include, require, or exclude content based on the rules you enter.

Include rules specify what content will be included unless another rule excludes it.

If you choose a required rule, all items returned in the scope must match the rule.

Exclude rules specify what content will not be included. This content will not be included even if it matches the other rules.

12.
Click OK.

13.
Add as many rules as you need to tune your scope to the appropriate content.
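
The three Web Address options (folder, hostname, and domain or subdomain) differ in how much of the address they compare. The following Python sketch shows one plausible interpretation of each option. The function name matches_web_address and the comparison logic are assumptions for illustration; the example values are the ones used in the steps above.

```python
from urllib.parse import urlparse


def matches_web_address(url, kind, value):
    """Check a URL against a Web Address scope rule.

    kind is 'folder', 'hostname', or 'domain'. The comparisons are an
    assumed interpretation: folder matches the folder and everything
    beneath it, hostname matches the server name exactly, and domain
    matches the domain itself plus any subdomain of it.
    """
    host = (urlparse(url).hostname or "").lower()
    if kind == "folder":
        folder = value.rstrip("/").lower()
        candidate = url.lower()
        return candidate == folder or candidate.startswith(folder + "/")
    if kind == "hostname":
        return host == value.lower()
    if kind == "domain":
        domain = value.lower()
        return host == domain or host.endswith("." + domain)
    raise ValueError(f"unknown rule kind: {kind}")


if __name__ == "__main__":
    print(matches_web_address("http://server/site/folder/doc.docx",
                              "folder", "http://server/site/folder"))
    print(matches_web_address("http://support.office.microsoft.com/x",
                              "domain", "office.microsoft.com"))
```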
