Create a New Content Source

Content sources track the locations of content that you want indexed for search. SharePoint 2010 creates a default content source for indexing all the content stored inside SharePoint. If you want to search any other content stored on the network, you will need to create additional content sources to index it.

A content source can reference more than one start address, but each start address must be of the same type—that is, SharePoint must use the same protocol for accessing all the addresses. Content sources are crawled based on a schedule configured for each content source. All of the addresses in the content source are crawled under the same schedule.

The decision to create a new content source is therefore based on two primary criteria: the protocol of the content address and the schedule that will be applied. You will need to create a new content source if the address you want to crawl uses a different protocol than the existing content sources. Likewise, you will need to create a new content source if you want to crawl the address on a different schedule from the existing content sources.
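The two criteria above can be pictured as a grouping problem. The following is a minimal sketch in Python, not anything SharePoint itself runs: the function name, the content type labels, the schedule labels, and the addresses are all hypothetical. It groups start addresses so that each planned content source holds a single type and a single schedule.

```python
from collections import defaultdict

def plan_content_sources(addresses):
    """Group start addresses by (content type, schedule).

    `addresses` is a list of (start_address, content_type, schedule) tuples.
    Each distinct (content_type, schedule) pair needs its own content source.
    """
    groups = defaultdict(list)
    for start_address, content_type, schedule in addresses:
        groups[(content_type, schedule)].append(start_address)
    return dict(groups)

# Hypothetical addresses and schedule labels, for illustration only.
planned = plan_content_sources([
    ("http://intranet", "SharePoint Sites", "nightly"),
    ("http://teams", "SharePoint Sites", "nightly"),
    ("http://news", "SharePoint Sites", "hourly"),
    ("\\\\fileserver\\docs", "File Shares", "nightly"),
])

for (content_type, schedule), start_addresses in planned.items():
    print(content_type, schedule, start_addresses)
```

In this example the four addresses fall into three groups, so three content sources would be needed: the two nightly SharePoint addresses can share one, while the hourly address and the file share each need their own.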

The setup of the search service automatically creates the Local SharePoint Sites content source, which holds the addresses of all the SharePoint web applications. This content source can be modified by moving some or all of the content addresses into different content sources. SharePoint automatically grants the default content access account permissions on these addresses through the Policy for Web Applications feature.

Creating a New Content Source

SharePoint 2010 has a recommended limit of 50 content sources per search service. While it is possible to exceed this limit, there may be problems with scalability as the number of content sources increases.

When a content source is crawled, SharePoint will index all the start addresses listed in the content source. As more addresses are added to a content source, the crawl takes longer. Adding more content sources will reduce the crawl time for each and allow for portions of an address to be crawled on different schedules.

Regardless of the type of content that is being crawled, the content access account must be granted full read permissions on all the content to be indexed.

To add a new content source, do the following:

1. From the Search Administration page, click Content Sources.

2. On the Manage Content Sources page, click New Content Source.

3. On the New Content Source page, shown in Figure 8.10, enter a name for the content source that describes the content that will be crawled.

Figure 8.10: Add Content Source page (partial)

4. Under the Content Source Type section, select the option for the protocol or type of content that will be included. While some types may sound similar to each other, such as SharePoint Sites and Web Sites, there are differences in the way that the content is treated. These differences will be discussed in the next section, “Types of Content Sources.”

5. Under Start Addresses, enter the addresses of all content locations that will be included in this content source.

6. In the Crawl Settings section, elect to crawl all content under the start address or only the portion of the content specified by the start address. This option allows you to set up different crawl schedules for different subsites or subfolders of content so that some portions of the content can be crawled more frequently.

7. Under Crawl Schedules, shown in Figure 8.11, select an existing schedule or create a new one so that crawls run automatically. For more details on crawl schedules, see the section “Managing Crawl Schedules” later in this chapter.

Figure 8.11: Setting crawl schedules

8. Under Content Source Priority, set the priority of this content source in the crawl queue. Higher-priority sources are processed first if multiple crawls are running at the same time. You might want to configure this setting if an important set of content is not being indexed in a timely manner because other crawls are slowing down the indexing process.

9. Select the Start Full Crawl option if the content should be crawled right away, so that it becomes searchable as soon as the crawl finishes.

10. Click OK.
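The settings collected in the steps above can be summarized as a simple record. The sketch below is a hypothetical Python model of the form, useful for checking a planned content source before you enter it. The class name and field names are illustrative and are not SharePoint API names, and the two priority values are assumed choices.

```python
from dataclasses import dataclass, field

# Hypothetical model of the New Content Source form. Field names are
# illustrative only; they are not SharePoint object model names.
@dataclass
class ContentSourceForm:
    name: str
    source_type: str
    start_addresses: list = field(default_factory=list)
    crawl_everything_under_start: bool = True
    schedule: str = ""            # hypothetical schedule label
    priority: str = "Normal"      # assumed choices: "Normal" or "High"
    start_full_crawl: bool = False

    def problems(self):
        """Return a list of human-readable problems with the planned settings."""
        issues = []
        if not self.name.strip():
            issues.append("name is required")
        if not self.start_addresses:
            issues.append("at least one start address is required")
        if not self.schedule:
            issues.append("no crawl schedule selected")
        if self.priority not in ("Normal", "High"):
            issues.append("unknown priority: " + self.priority)
        return issues

form = ContentSourceForm(
    name="Engineering file shares",
    source_type="File Shares",
    start_addresses=["\\\\fileserver\\engineering"],
    schedule="nightly",
    priority="High",
    start_full_crawl=True,
)
print(form.problems())  # an empty list means no problems were found
```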

Types of Content Sources

The following are the types of content sources that SharePoint can crawl:

SharePoint Content Sources A SharePoint content source can be used to index sites in farms that are running SharePoint 2010 (including SharePoint Foundation), SharePoint 2007 (including WSS 3.0), or SharePoint 2003 (including WSS 2.0). The sites can be located either in the local farm or in a separate farm, as long as the content access account is granted permissions to the sites.

Note that SharePoint 2010 cannot automatically identify all site collections hosted in earlier versions of SharePoint, so each site collection start address will have to be added separately.

SharePoint is able to read the security descriptors on all SharePoint content for use in filtering search queries by users.

Website Content Sources This content source is used to index any HTTP or HTTPS website, regardless of the technology used by its host. SharePoint does not read the security settings on the pages, so all content will be searchable by all users. You can crawl the entire site, crawl only the first page of the site, or specify a page depth and hop count from the start page, as shown in Figure 8.12.

Figure 8.12: Website content source crawl settings

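To make the page depth and hop count limits concrete, here is a minimal Python sketch of a limited crawl over a small in-memory link graph. It is a simplified model and not SharePoint's crawler. It assumes that depth counts the links followed from the start page and that a hop is counted each time a link leads to a different host. The function name, the URLs, and the link graph are all hypothetical.

```python
from collections import deque
from urllib.parse import urlsplit

def limited_crawl(start, links, max_depth=None, max_hops=None):
    """Return the pages reached from `start` within the given limits.

    `links` maps a URL to the URLs it links to (a stand-in for the web).
    A limit of None means "unlimited". Each page is visited once, via the
    first path that reaches it (a simplification of this sketch).
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0, 0)])  # (url, depth, hops)
    while queue:
        url, depth, hops = queue.popleft()
        for target in links.get(url, []):
            next_depth = depth + 1
            changed_host = urlsplit(target).netloc != urlsplit(url).netloc
            next_hops = hops + int(changed_host)
            if max_depth is not None and next_depth > max_depth:
                continue
            if max_hops is not None and next_hops > max_hops:
                continue
            if target in seen:
                continue
            seen.add(target)
            order.append(target)
            queue.append((target, next_depth, next_hops))
    return order

# Hypothetical link graph, for illustration only.
links = {
    "http://a.example/": ["http://a.example/p1", "http://b.example/"],
    "http://a.example/p1": ["http://a.example/p2"],
    "http://b.example/": ["http://b.example/q1"],
}
start = "http://a.example/"

print(limited_crawl(start, links, max_depth=0))              # first page only
print(limited_crawl(start, links, max_depth=1, max_hops=0))  # one link deep, same host
print(limited_crawl(start, links))                           # no limits
```

With a depth of 0 only the start page is returned. With a depth of 1 and no hops allowed, the crawl adds only the pages on the same host that are linked directly from the start page.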

File Share Content Sources The file share content source allows indexing of any Windows server shared folder and all its contents. SharePoint automatically indexes all the security descriptors on this type of source and filters the search results for users accordingly.

Exchange Public Folder Content Sources This content source is used to index documents, calendars, and other types of public folders hosted on an Exchange server. This source type does expose security information to SharePoint for use in filtering content.

Line-of-Business Data Content Sources This content source is used to index external data accessed through Business Connectivity Services (BCS). The data sources must be configured to support indexing before the content source can be configured.

Custom Repository Content Sources Custom repositories represent any additional content source type installed on the server, such as third-party connectors for document management systems and non-Windows servers.
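The difference between source types that expose security information and those that do not can be illustrated with a toy model of result filtering. The Python sketch below is an assumption-laden illustration, not how SharePoint stores or evaluates security descriptors. Each indexed item carries either a set of groups allowed to read it or None when no security information was indexed, as with website content. The function name, the group names, and the sample items are hypothetical.

```python
def trim_results(results, user_groups):
    """Return the titles of the results that the user is allowed to see.

    An `acl` of None models content indexed without security information,
    which is therefore visible to every user.
    """
    visible = []
    for item in results:
        acl = item["acl"]
        if acl is None or acl & user_groups:
            visible.append(item["title"])
    return visible

# Hypothetical index entries, for illustration only.
index = [
    {"title": "Team site page", "source": "SharePoint", "acl": {"engineering"}},
    {"title": "HR share document", "source": "File share", "acl": {"hr"}},
    {"title": "Public web page", "source": "Website", "acl": None},
]

print(trim_results(index, {"engineering"}))
```

A user in the engineering group sees the team site page and the public web page, but not the HR document. The website item appears for every user because no security information was indexed for it.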
