Chapter 4. Deployment

After completing this chapter, you will be able to

  • Understand the FS4SP hardware and software requirements and best practices.

  • Deploy and configure your own development environment.

  • Plan and scale your production FS4SP farm according to your usage scenario.

When you install enterprise software such as Microsoft SharePoint and Microsoft FAST Search Server 2010 for SharePoint (FS4SP), being familiar with network and hardware setup is a necessity. Both SharePoint and FS4SP require substantial hardware resources when running in a production environment; planning which internal FS4SP component should run on what hardware is key when optimizing your FS4SP deployment.

This chapter describes the hardware and software requirements for an FS4SP farm and goes over some of the important points about installing FS4SP in a development environment. It also covers various scaling strategies in production environments.

Overview

Developing for FS4SP is very similar to developing for SharePoint. For efficiency and stability reasons, it is often recommended to configure separate environments for development, testing, staging, and production. By using multiple environments, developers can safely test new features without interrupting the live production environment. Typically, each of these environments has its own characteristics; a development environment often consists of one machine only, but the number of servers increases as you move your solution over to the testing, staging, and production environments. As such, a SharePoint farm shares similarities with an FS4SP farm.

In practice, a typical FS4SP setup consists of a development environment that each developer runs locally, one or two test server installations (perhaps virtualized), and the production environment. When FS4SP is installed in addition to SharePoint Server 2010, a fully deployed FS4SP solution consists of both a SharePoint farm and an FS4SP farm.

Hardware Requirements

FS4SP is hardware-intensive, requiring fast CPUs, a lot of RAM, and low-latency disks. CPU and RAM are the most important factors when indexing content, whereas disk speed is a limiting factor for query latency and throughput. RAM is also important for searches, depending on which and how many index features you have enabled. The following list provides an easy-to-remember—although crude—rule of thumb of FS4SP’s relationship to hardware:

  • Faster CPU. The system can index data faster.

  • More RAM. The system can handle more item processing and more index features, such as refiners and sortable properties.

  • Faster disk. The system can handle more queries per second (QPS) and execute complex and heavier queries faster.

A good principle is to allocate 2 GB of RAM per CPU core. This allocation provides high CPU usage processes with enough memory for the processing they do. Additionally, because more RAM allows the operating system to cache disk reads and writes, FS4SP performance generally improves across the board.
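As a quick illustration of the 2 GB per core guideline, the following sketch computes a RAM target from a core count. The function name is an assumption made for this example and is not part of FS4SP.

```python
GB_PER_CORE = 2  # rule of thumb from this section


def ram_target_gb(cpu_cores: int) -> int:
    """Return the RAM target in GB for a server with the given core count."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return cpu_cores * GB_PER_CORE


for cores in (4, 8, 16):
    print(f"{cores} cores -> {ram_target_gb(cores)} GB of RAM")
```

For 8 cores, the guideline gives 16 GB, which matches the recommended configuration in Table 4-1.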

Complementing these general guidelines is a list of performance effects, provided by Microsoft TechNet, for enabling a certain index feature. You can find this list at http://technet.microsoft.com/en-us/library/gg702611.aspx. Refer to this list when designing solutions in which high query throughput is anticipated.

The general hardware requirements from Microsoft are listed in Table 4-1. Keep in mind that a multi-server farm can benefit from heterogeneous hardware depending on which FS4SP services, such as item processing or indexing, run on each server.

Table 4-1. Official FS4SP hardware requirements

Minimum                      Recommended
4 GB of RAM                  16 GB of RAM
4 CPU cores, 2.0 GHz CPU     8 CPU cores, 2.0 GHz CPU
50-GB disk                   1 TB of disk space on RAID across six spindles or more

Storage Considerations

When you set up storage for the servers storing the search index, you can find multiple options available from hardware vendors. Should you use local disks or a storage area network (SAN), and what disk characteristics are important? This section outlines what you need to consider when deciding on your storage approach for FS4SP.

Disk Speed

FS4SP is very I/O intensive; the nature of a full-text index gives rise to lots of random disk access operations. As such, disk latency is a very important factor for achieving good performance during indexing and searching. In general, a faster disk means lower latency and is usually a good investment when equipping FS4SP servers. SAS disks are preferred over SATA disks, and 15,000 RPM disks are preferred over 10,000 RPM disks.

Today, more options are available for using solid-state drives (SSD) in server environments. The main characteristic of an SSD is low latency on random access because of its use of memory chips instead of mechanical components. You should consider using SSDs in your FS4SP deployment if you have the option; they can cut down on the number of columns used in your deployment. The more items you store per server, the more data the search engine must evaluate for each search query. Because using SSDs cuts down the disk latency compared to regular disks, you can evaluate more data in the same amount of time, meaning that you can store all your content in fewer columns.

Note

Within a fixed power and cost budget, a larger number of 10,000 RPM disks is a better choice than fewer 15,000 RPM disks. For example, if you have two 15,000 RPM disks and a particular operation requires 100 disk reads, each disk performs 50 reads. If you have five 10,000 RPM disks instead, each disk performs only 20 reads. Because the disks execute their reads in parallel, the 20 reads per disk complete sooner even though the individual disks are slower.
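The arithmetic in this note can be sketched as follows. The per-read times are assumed values chosen only to illustrate the effect; they are not measurements of any particular disk, and the model assumes that reads are spread evenly and served in parallel.

```python
import math


def reads_per_disk(total_reads: int, disks: int) -> int:
    """Reads each disk serves when the reads are spread evenly (rounded up)."""
    return math.ceil(total_reads / disks)


def elapsed_ms(total_reads: int, disks: int, ms_per_read: float) -> float:
    """Elapsed time when all disks work through their share in parallel."""
    return reads_per_disk(total_reads, disks) * ms_per_read


# Assumed average random-read times, for illustration only.
MS_PER_READ_15K = 5.5
MS_PER_READ_10K = 7.0

fast_few = elapsed_ms(100, 2, MS_PER_READ_15K)   # 50 reads per disk
slow_many = elapsed_ms(100, 5, MS_PER_READ_10K)  # 20 reads per disk
print(f"2 x 15,000 RPM disks: {fast_few:.0f} ms")
print(f"5 x 10,000 RPM disks: {slow_many:.0f} ms")
```

Under these assumptions, the five slower disks finish in roughly half the time of the two faster ones.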

Disk Layout

If you are planning to use local disks, FS4SP benefits from partitioning the physical disks similarly to how you would partition a Microsoft SQL Server database: using the three separate volumes shown in Table 4-2. By keeping data and log files away from the operating system partition, you will prevent the operating system drive from filling up and potentially crashing your system. Storing the data files on a high-performance volume configured for high availability is good production practice; likewise, you should keep the log files on a smaller volume (approximately 30 GB) because they consume less space.

Table 4-2. FS4SP volume layout

Volume   Contains                                        Physical disk layout
1        Operating system, FS4SP program files           RAID 1
2        FS4SP data files (binary index, dictionaries)   RAID 10 (1+0), RAID 5, or RAID 50 (5+0)
3        FS4SP log files                                 RAID 1

If you have a large number of drives available for the data files volume, RAID 50 is usually preferred over RAID 10 because the storage capacity is almost doubled on the same number of disks.
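To see where the capacity difference comes from, the sketch below compares usable capacity for the two layouts. It relies on the standard definitions of the RAID levels (RAID 10 mirrors every disk; RAID 50 loses one disk per RAID 5 group to parity). The function names and the choice of two RAID 5 groups are assumptions made for this example.

```python
def raid10_usable_tb(disks: int, disk_tb: float) -> float:
    """RAID 10 mirrors every disk, so half of the raw capacity is usable."""
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disks / 2 * disk_tb


def raid50_usable_tb(disks: int, disk_tb: float, groups: int = 2) -> float:
    """RAID 50 stripes across RAID 5 groups; each group gives up one disk to parity."""
    if groups < 2 or disks % groups or disks // groups < 3:
        raise ValueError("RAID 50 needs 2 or more equal groups of 3 or more disks")
    return (disks - groups) * disk_tb


for n in (12, 24):
    r10 = raid10_usable_tb(n, 1.0)
    r50 = raid50_usable_tb(n, 1.0)
    print(f"{n} x 1 TB disks: RAID 10 = {r10:.0f} TB, RAID 50 = {r50:.0f} TB")
```

The more disks each RAID 5 group contains, the closer RAID 50 comes to double the usable capacity of RAID 10.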

Note

FS4SP uses predefined locations for data files and log files. If you want to partition your drives as per Table 4-2, see the section Changing the Location of Data and Log Files in Chapter 5.

More Info

Read more about the different RAID levels and how to combine them at http://en.wikipedia.org/wiki/RAID. The level that is available to you depends on your disk controller.

If you are storing your FS4SP files on a SAN, network-attached storage (NAS), or SSDs, different rules apply from those just outlined; the following sections discuss these storage technologies.

Using a SAN

Because company server farms are often streamlined for storage operational efficiency, you might not have the option to use local disks on an FS4SP server in your deployment. The company might have standardized on SAN or even NAS storage for its data.

When you use a SAN, multiple servers share a centralized storage pool, shown in Figure 4-1. This storage pool facilitates data exchange between the servers connected to the SAN. Each server is connected to the SAN by using a dedicated Fibre Channel and has its own storage area on the SAN, where it can access data in much the same fashion as when using local disks.

Figure 4-1. Storage area network (SAN).

Tests by Microsoft have shown that a sufficiently powerful SAN will not be the bottleneck in an FS4SP farm if the SAN is properly configured with dedicated disks. The key metrics for determining whether your SAN’s performance is enough for FS4SP are as follows:

  • 2,000–3,000 I/O operations per second (IOPS)

  • 50–100 KB average block size

  • Less than 10 milliseconds (ms) average read latency

More Info

You can read more about performance characteristics of different deployment configurations at http://technet.microsoft.com/en-us/library/ff599526.aspx. Both the extra-small and medium scenarios describe configurations with SSD disks.

If you have five servers in your FS4SP farm, the SAN must be able to serve 10,000–15,000 IOPS to the FS4SP farm alone, regardless of other servers using the same SAN.
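The sizing rule can be sketched as a small calculation. The constants come from the key metrics listed earlier in this section; the function names and the measured values in the usage lines are hypothetical (in practice they could come from a SQLIO run). The check is deliberately conservative: it requires the upper end of the IOPS range.

```python
from typing import Tuple

# Per-server targets quoted in this section.
IOPS_PER_SERVER = (2000, 3000)
MAX_READ_LATENCY_MS = 10.0


def farm_iops_range(servers: int) -> Tuple[int, int]:
    """Aggregate IOPS range the SAN must serve to the FS4SP farm alone."""
    low, high = IOPS_PER_SERVER
    return servers * low, servers * high


def san_looks_sufficient(servers: int, measured_iops: float,
                         measured_read_latency_ms: float) -> bool:
    """Conservative check: require the upper IOPS bound and sub-10 ms reads."""
    _, required = farm_iops_range(servers)
    return (measured_iops >= required
            and measured_read_latency_ms < MAX_READ_LATENCY_MS)


print(farm_iops_range(5))
# Hypothetical measurements for a five-server farm.
print(san_looks_sufficient(5, measured_iops=16000, measured_read_latency_ms=6.5))
```

Remember that other servers sharing the same SAN add their own load on top of the figure computed here.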

More Info

You can use a tool called SQLIO to measure the raw I/O performance of your storage system. Go to http://technet.microsoft.com/en-us/library/gg604775.aspx for information about how to obtain and use SQLIO on an FS4SP server.

Using NAS

NAS, shown in Figure 4-2, operates similarly to a SAN except the traffic is carried over a standard high-speed local network rather than over a dedicated Fibre Channel.

Figure 4-2. Network-attached storage (NAS).

NAS does not scale as well as either SAN or local disks because the servers share the bandwidth to the centralized storage instead of having dedicated connections. In addition, a network connection offers lower bandwidth than Fibre Channel or local disks: 125 MBps for a 1 Gbps network, 250 MBps for 2 Gbps Fibre Channel, and 320 MBps for local disks. The network therefore becomes the limiting factor as you add more servers. Because FS4SP is I/O-intensive, you want disk operations to be as fast as possible, so using the network as the main I/O transport is not a good idea.
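The bandwidth figures for the network and Fibre Channel follow directly from the link speeds, as the sketch below shows. The even split of a shared link between servers is a simplifying assumption that ignores protocol overhead and uneven load; the function names are made up for this example.

```python
def gbps_to_mbytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


def share_per_server(link_gbps: float, servers: int) -> float:
    """Bandwidth per server when one link is shared evenly (simplifying assumption)."""
    return gbps_to_mbytes_per_second(link_gbps) / servers


print(gbps_to_mbytes_per_second(1))  # 1 Gbps network
print(gbps_to_mbytes_per_second(2))  # 2 Gbps Fibre Channel
for n in (1, 2, 4):
    print(f"{n} server(s) on a shared 1 Gbps link: {share_per_server(1, n):.1f} MBps each")
```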

Important

The only NAS usage you should consider with FS4SP is to store log files or to do backup because NAS hinders search and indexing performance.

Using SSD

A solid-state drive (SSD) uses solid-state memory instead of magnetic platters to store persistent data. As such, an SSD has no moving parts like a traditional hard drive and does not require spin-up time. Disk access is typically around 0.1 ms—many times faster than a mechanical hard drive. This faster disk access provides improved I/O performance and very fast random disk access, which helps FS4SP perform better across the board.

High query-volume scenarios are ideal for solid-state drives. Whereas a 15,000-RPM SAS disk can deliver around 200 IOPS on random read/write, an SSD can deliver from 5,000 IOPS to more than 1 million IOPS depending on your hardware. A test performed by Microsoft shows that two SSDs perform the same as seven 10,000-RPM SAS disks; both setups deliver more than 60 QPS while crawling is idle. To maintain a high QPS during crawling, you need to set up a dedicated search row.

More Info

For more information about IOPS and to see where the numbers just referenced came from, go to http://en.wikipedia.org/wiki/IOPS. To see the results of the Microsoft test, go to http://technet.microsoft.com/en-us/library/gg512814.aspx.

FS4SP and Virtualization

For several years, there has been a shift toward setting up virtualized servers instead of physical ones and using a SAN instead of physical disks. The typical use case for virtualizing a server is to use hardware resources more efficiently and make it easier to scale a particular server’s allotted resources. By hosting several servers on the same physical hardware, you can also cut utility costs: fewer physical servers require less cooling, are cheaper to host, and need less maintenance.

Even though FS4SP supports virtualization with Microsoft Hyper-V and VMware server products, Microsoft recommends that you only use virtualized servers for FS4SP in nonproduction environments with one exception. The exception occurs when you have a separate administration server, which can be virtualized because the I/O requirements are low for the administration services. Tests by Microsoft have shown a general decrease of 30 percent to 40 percent in overall performance when running on virtualized servers; this decrease most likely results from the heavy use of CPU and disk I/O by FS4SP.

More Info

You can read more about setting up FS4SP in a virtualized environment, and see the recommendations resulting from the Microsoft tests, at http://technet.microsoft.com/en-us/library/gg702612.aspx.

When discussing virtual machines, we talk about virtual CPUs (vCPUs). A vCPU is what the guest operating system sees as an available CPU. A vCPU maps to a physical CPU core, not a physical CPU. Virtual machines have access to only a limited number of vCPUs—four vCPUs with Hyper-V and eight vCPUs with VMware. This low access to vCPUs limits the resources you can assign to the virtual FS4SP server, even though the physical hardware may have more available. The following provides the pros and cons of virtualization:

  • Pros. If you are a small- or medium-sized organization and have fewer than 8 million items to index, tests from Microsoft show that you can get close to 10 QPS on a single-server FS4SP farm running on Hyper-V. For a small- or medium-sized organization, 10 QPS can be enough to serve queries from a Search Center in SharePoint.

  • Cons. An issue with virtualization is redundancy. An FS4SP farm can be scaled to multiple rows to provide a hot failover of indexing and search; however, if the farm is running on the same virtual infrastructure, the entire solution could fail if the host server goes down.

To put this all into perspective, our experience shows that typical intranet scenarios peak at around 3 QPS for normal out-of-the-box FS4SP usage, well within what a virtual server can deliver. If you build a lot of search-driven applications and rely heavily on search for data retrieval on your site, the query latency will increase. When the latency per query rises above what you deem acceptable, you should move to physical hardware to scale better.

Software Requirements

You will not install any SharePoint components on the FS4SP servers, but FS4SP requires either SharePoint 2010 with Enterprise Client Access Licenses (ECAL) or a SharePoint Server 2010 for Internet Sites Enterprise Edition license in order to run. This is because all search queries pass through SharePoint, as explained in Chapter 3.

More Info

You can find information about the software and hardware requirements for SharePoint at http://technet.microsoft.com/en-us/library/cc262485.aspx.

You can install FS4SP on the following operating systems:

  • Windows Server 2008 R2 x64

  • Windows Server 2008 SP2 x64 (Standard, Enterprise, and Datacenter versions are supported.)

You should fully update the server with the latest service pack and updates before you install FS4SP. In addition, you must install Microsoft .NET Framework 3.5 SP1 on each FS4SP server before installing the FS4SP software. You can download .NET Framework 3.5 SP1 from http://www.microsoft.com/download/en/details.aspx?id=22.

Note

You may use .NET Framework 4 for Pipeline Extensibility stages because an External Item Processor can be any executable and installing .NET Framework 4 will not interfere with the older versions of .NET installed on the server.

Installation Guidelines

This section does not contain detailed step-by-step installation instructions for FS4SP; instead, it focuses on what you need to do before installing FS4SP and on the parts of the installation that require close attention.

More Info

For a complete installation guide for FS4SP, go to http://technet.microsoft.com/en-us/library/ff381243.aspx.

FS4SP can be set up and configured in numerous ways, from a single server installation to a farm handling several hundred million documents. Adding to the complexity of having many servers, each server can also host several internal FS4SP components in a multitude of configurations. If the complexity of the SharePoint farm and IT infrastructure is included, you are bound to encounter issues that are not covered directly in the deployment guides provided by Microsoft. It’s impossible to cover everything, but this section tries to outline what you should think about before deploying FS4SP and what you need to pay extra attention to during the installation. Table 4-3 lists some good resources that may help when you encounter an issue you do not find an immediate solution to.

Table 4-3. Troubleshooting resources

Resource title

Link

“Troubleshooting (FAST Search Server 2010 for SharePoint)”

http://technet.microsoft.com/en-us/library/ff393753.aspx

“Microsoft Search Server 2010 and Microsoft FAST Search Server 2010 Known Issues/ReadMe”

http://office.microsoft.com/en-us/search-server-help/microsoft-search-server-2010-and-microsoft-fast-search-server-2010-known-issues-readme-HA101793221.aspx

“Survival Guide: FAST Search Server 2010 for SharePoint”

http://social.technet.microsoft.com/wiki/contents/articles/2149.aspx

“FAST Search for SharePoint” (forum)

http://social.technet.microsoft.com/Forums/en-US/fastsharepoint/threads

Before You Start

To successfully install FS4SP, you must fulfill several requirements. Use the information in Table 4-4 as a checklist before and during the installation. This information is also available as a separate downloadable document (PDF file) at http://fs4spbook.com/fs4sp-install-checklist.

Table 4-4. Installation checklist

Category

Requirement

Notes

Accounts

The user running the installer must be a member of the local Administrators group.

Administration rights are needed to perform certain operations during installation.

Accounts

The user running the installer must have the Allow Log On Locally permission.

This permission is assigned during installation and should not be removed. Make sure you don’t have any policies that will change this on reboot.

Accounts

Create a domain account to run FS4SP under, for example, sp_fastservice.

The user needs the following rights:

  • dbcreator role on the SQL server

  • Log on as a service

  • Allow logon locally

Make sure the user account is not a local administrator and not a domain administrator because this is unnecessary and increases security risks. Also make sure you don’t have Group Policies overwriting the account rights.

Accounts

To simplify the process of adding more search administrators, create a domain group, for example, FASTSearchAdmin. Add this group to a local group named FASTSearchAdministrators on every FS4SP server.

You can also manually create a local group called FASTSearchKeywordAdministrators and add user accounts that should have access only to keyword administration.

By using a domain group, you can add new search administrators in one place instead of on all servers. Only user accounts that are added via a domain group or directly to the local group named FASTSearchAdministrators are able to execute commands against FS4SP.

Accounts

Add all users who will administer the FS4SP installation to your domain search administrators group FASTSearchAdmin.

If you didn’t create a domain group, you must add the users directly to the local FASTSearchAdministrators group on each FS4SP server, as stated earlier in this table.

Accounts

Add the user who runs the Application Pool for the web application hosting your search services to your domain search administrators group FASTSearchAdmin.

Typically, the application pool is named Web Application Pool - SharePoint - 80 or something similar.

The user account running the Application Pool for the web application is needed to communicate with the FS4SP administration server.

If you didn’t create a domain group, you must add the user accounts directly to the local FASTSearchAdministrators group on each FS4SP server, as stated earlier in this table.

Antivirus

Exclude all FS4SP folders from antivirus scans. This includes the installation folder of FS4SP, the location of your data and log files, and any folder you have created for pipeline extensibility modules.

This prevents the antivirus software from falsely identifying FS4SP files as malware and ensures that performance is not affected by virus scans.

Antivirus

Exclude the following FS4SP binaries from antivirus scans:

  • cobra.exe

  • contentdistributor.exe

  • create_attribute_files.exe

  • docexport.exe

  • fastsearch.exe

  • fastsearchexec.exe

  • fdispatch.exe

  • fdmworker.exe

  • fixmlindex.exe

  • fsearch.exe

  • fsearchctrl.exe

  • ifilter2html.exe

  • indexer.exe

  • indexingdispatcher.exe

  • jsort2.exe

  • make_pu_diff.exe

  • monitoringservice.exe

  • pdftotext.exe

  • procserver.exe

  • qrserver.exe

  • spelltuner.exe

  • truncate_anchorinfo.exe

  • walinkstorerreceiver.exe

  • walookupdb.exe

  • webanalyzer.exe

All binaries reside in the <FASTSearchFolder>\bin folder.

Database

Verify that the main SQL Server service and the SQL Server Browser service are running.

SQL Server and SQL Server Browser have to be running in order for the installer to detect the SQL Server instance.

Database

In SQL Server Network Configuration, verify that TCP/IP is enabled under Protocols for your running instance.

FS4SP communicates with SQL Server over the TCP/IP protocol.

Database

Make sure your FS4SP service account is given the dbcreator role.

This role is needed for the installer to be able to create the admin database.

Network

Verify that all servers in the FS4SP farm have Windows Firewall turned on.

You can check the status of the Windows Firewall via Windows PowerShell with Get-Service MpsSvc.

You can start the Windows Firewall service via Windows PowerShell by using Start-Service MpsSvc.[a]

Network

The FS4SP installer uses the computer’s default settings for Internet Protocol security (IPsec) key exchange, data protection, and authentication.

If you do not use the default settings for IPsec—for example, because of domain Group Policies or the computer’s local policies—ensure that the custom global firewall IPsec settings provide sufficient security. Also, ensure that the global IPsec settings are the same for all servers in the deployment.

Network

Make sure you are using static IP addresses.

Make sure that all FS4SP servers have static IP addresses to enable the installer to automatically configure IPsec in the firewall during the installation.

Network

Make sure port 13390 is open for connections on the FS4SP administration server, and make sure the other FS4SP servers can connect to the port.

By default, non-administration servers communicate with the administration server on port 13390.

Network

During installation, FS4SP adds the port range 13000–13499 to the Windows Firewall settings.

If you change the base port from 13000, the range and ports are adjusted accordingly.

Network

The following default ports should be accessible from the SharePoint farm to the FS4SP farm (assuming the base port is 13000):

  • 13255: Resource Store (HTTP)

  • 13257: Administration Service (HTTP)

  • 13287: Query Service (HTTP)

  • 13391: Content Distributor

You have the option of securing the query and administration traffic by using HTTPS. The following default ports should be accessible when using HTTPS:

  • 13258: Administration Service (HTTPS)

  • 13286: Query Service (HTTPS)

Network

Make sure the TCP/IP offloading engine (TOE) is disabled.

In a multi-server FS4SP farm, the non-administration servers communicate with the administration server via IPsec and IPsec does not work correctly with TOE. This can lead to issues with the Kerberos sessions and with the FS4SP servers not being able to communicate with each other.[b]

Network

Make sure your database server can accept connections on port 1433.

The FS4SP administration server accesses the database server via TCP/IP on port 1433 by default. You must make sure port 1433 is added to the SQL Server firewall rules.[c]

Operating system

Make sure you are running either Windows Server 2008 SP2 64 bit or Windows Server 2008 R2 64 bit.

Standard, Enterprise, and Datacenter editions are supported for Windows Server 2008 SP2.

Security

Make sure you do not have a Group Policy that restricts the Kerberos Maximum Token Size. Also make sure you do not have a Group Policy that restricts Kerberos to use only User Datagram Protocol (UDP).

Changing these default values prevents FS4SP servers from communicating with each other.

Server

The server name of your FS4SP server cannot exceed 15 characters.

This is a limitation within some FS4SP components.

Server

If possible, use lowercase letters for your server name and fully qualified domain name (FQDN). Make sure you use the same casing when entering the FQDN during installation.

Some components of FS4SP require the correct casing of server names even though Windows itself is not case sensitive.[d]

If you can use lowercase letters everywhere instead of mixed capitalization, you are less prone to errors. As an example, use fast.contoso.com instead of FAST.contoso.com.

Server

Make sure your machine is part of a domain.

FS4SP requires your servers to be part of a domain.

Server

Turn off automatic Windows updating.

Plan to install updates after a controlled shutdown of FS4SP to avoid any possible data corruption.

Server

Disable automatic adjustment of daylight saving time (DST).

If DST adjustment is enabled, query timeouts may occur for a brief period around midnight on the date of DST adjustment.

Server

Verify that the clocks on the servers in the FS4SP farm and the SharePoint Server farm are synchronized to within a minute.

The DST setting on the servers in the SharePoint Server farm is allowed to differ from the DST setting on the servers in the FS4SP farm.

Server

Activate the following features on your SharePoint farm:

  • SharePoint Server Publishing Infrastructure

  • SharePoint Server Enterprise Site Collections

By activating these SharePoint features, you can create a search center based on the FAST Search Center site template.

Server

Make sure you apply SP1 before running the configuration if you have more than four CPU cores enabled during installation.

The RTM version of FS4SP has a known bug that makes the installation of FS4SP fail if you have too many CPU cores in your system.[e]

Server

Make sure the Windows service called Secondary Logon is running.

This service is often disabled during hardening of the operating system but is needed by the FS4SP configuration.

Software

Install .NET Framework 3.5 SP1

Handled by the prerequisites installer.

Software

SharePoint software is not installed on the server.

Although you can have SharePoint installed on the same server(s) as FS4SP, this setup is not supported by Microsoft. In a development scenario, it is acceptable to host SharePoint and FS4SP on the same server.

Software

The server is not a domain controller.

Although FS4SP works on a domain controller, this setup is not supported by Microsoft. In a development scenario, it is acceptable to host FS4SP on a domain controller.

[a] Even if you intend to run without a firewall in production—for example, to improve performance—you must have it turned on during installation.

[b] For support information about communication issues between Kerberos sessions and the FS4SP servers, go to http://support.microsoft.com/kb/2570111.

[c] For information about how to configure a Windows Firewall for Database Engine access, go to http://msdn.microsoft.com/en-us/library/ms175043.aspx.

[d] For more information about case sensitivity, go to http://support.microsoft.com/kb/2585922.

[e] For more information about this bug, go to http://support.microsoft.com/kb/2449600.
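Two of the checklist items in Table 4-4 lend themselves to a quick script: the port numbers that depend on the base port, and the server name rules. The sketch below derives the offsets from the default ports listed in the table and assumes that every port keeps the same offset when the base port changes; the checklist only states that the ports are "adjusted accordingly," so verify the result against your own installation. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

DEFAULT_BASE_PORT = 13000

# Offsets derived from the default ports in Table 4-4 (base port 13000).
# Assumption: each port keeps its offset when the base port changes.
PORT_OFFSETS = {
    "Resource Store (HTTP)": 255,
    "Administration Service (HTTP)": 257,
    "Administration Service (HTTPS)": 258,
    "Query Service (HTTPS)": 286,
    "Query Service (HTTP)": 287,
    "Administration server (farm communication)": 390,
    "Content Distributor": 391,
}


def fs4sp_ports(base_port: int = DEFAULT_BASE_PORT) -> Dict[str, int]:
    """Map each service to its port for the given base port."""
    return {name: base_port + offset for name, offset in PORT_OFFSETS.items()}


def firewall_range(base_port: int = DEFAULT_BASE_PORT) -> Tuple[int, int]:
    """Port range added to Windows Firewall (13000-13499 by default)."""
    return base_port, base_port + 499


def server_name_problems(fqdn: str) -> List[str]:
    """Check an FQDN against the server name rules in the checklist."""
    problems = []
    host = fqdn.split(".")[0]
    if len(host) > 15:
        problems.append("server name exceeds 15 characters")
    if fqdn != fqdn.lower():
        problems.append("FQDN contains uppercase letters")
    return problems


for name, port in sorted(fs4sp_ports().items(), key=lambda item: item[1]):
    print(f"{port}: {name}")
print(server_name_problems("FAST.contoso.com"))
```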

Software Prerequisites

Before installing FS4SP, you must prepare the server with some additional software components. The FS4SP installer bundle contains a prerequisites installer that takes care of downloading and installing these components for you.

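If your server is not connected to the Internet during installation, you need to download the software prerequisites manually. You can then install each component by hand or pass the downloaded files to the prerequisites installer by using the command-line options listed in Table 4-5.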

Table 4-5 lists the software prerequisites for installing FS4SP, and Listing 4-1 provides a script for installing the components via the prerequisites installer. Using a script is well suited for unattended installation.

Table 4-5. Prerequisites and download links

Software component

Download link

Command-line option

Web Server (IIS) Role

Operating system feature

 

Mimefilt.dll

Operating system feature

 

Distributed Transaction Support

Operating system feature

 

Windows Communication Foundation (WCF) Activation Components

Operating system feature

 

XPS Viewer

Included with .NET Framework 3.5 SP1 redistributable package

 

Microsoft .NET Framework 3.5 SP1

http://go.microsoft.com/FWLink/?Linkid=188659

/NETFX35SP1:file

Microsoft .NET Framework 3.5 SP1 Hotfix[a]

http://go.microsoft.com/FWLink/?Linkid=166368

/KB976394:file

WCF Hotfix[b]

http://go.microsoft.com/FWLink/?Linkid=166369

/KB976462:file

Windows PowerShell 2.0

Included with Windows Server 2008 R2.

Windows Server 2008 SP2 x64: http://go.microsoft.com/FWLink/?Linkid=161023

/PowerShell:file

Windows Identity Foundation

Windows Server 2008 SP2: http://go.microsoft.com/FWLink/?Linkid=160381

Windows Server 2008 R2: http://go.microsoft.com/FWLink/?Linkid=166363

/IDFX:file /IDFXR2:file

Microsoft Primary Interoperability Assemblies 2005

Included with the FS4SP installer

/VSInteropAssembly:file

Microsoft Visual C++ 2008 SP1 Redistributable Package (x64)

http://go.microsoft.com/FWLink/?Linkid=188658

/VCCRedistPack:file

Microsoft Filter Pack 2

Included with the FS4SP installer

/FilterPack:file

[a] Required for Windows Server 2008 SP2 only

[b] Required for Windows Server 2008 R2 only

FS4SP Preinstallation Configuration

Before installing FS4SP, you need to change the execution policy of Windows PowerShell scripts. This is to ensure that the Configuration Wizard you run later is able to run Windows PowerShell scripts during configuration. To change the execution policy, open a Windows PowerShell command window as an administrator. At the Windows PowerShell prompt, run the following code.

Set-ExecutionPolicy RemoteSigned

You can install the FS4SP binaries in two ways:

  • Via the GUI by starting the installer from the splash screen or by manually executing fsserver.msi

  • Unattended by using the following command.

    Msiexec /i fsserver.msi /q FASTSEARCHSERVERINSTALLLOCATION="<InstallDir>" /l <LogFile>

<InstallDir> is the location where you want to install the FS4SP binaries. <LogFile> is the path and file name of the installation log file. You can omit the /q parameter if you want to see progress while installing. For example:

Msiexec /i fsserver.msi /q FASTSEARCHSERVERINSTALLLOCATION="C:\FASTSearch" /l C:\fastinstall.log
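If you install FS4SP on several servers, you may want to generate the unattended command line from a script rather than typing it. The following sketch only assembles the command string shown above; the function name is hypothetical, and the sketch assumes that the log file path contains no spaces.

```python
def build_install_command(install_dir: str, log_file: str, quiet: bool = True) -> str:
    """Assemble the unattended FS4SP install command as a single string."""
    parts = ["Msiexec", "/i", "fsserver.msi"]
    if quiet:
        parts.append("/q")  # omit to see progress while installing
    parts.append(f'FASTSEARCHSERVERINSTALLLOCATION="{install_dir}"')
    parts += ["/l", log_file]  # assumes the log path has no spaces
    return " ".join(parts)


print(build_install_command(r"C:\FASTSearch", r"C:\fastinstall.log"))
```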

After you install the binaries, continue with the post-setup configuration and be sure to add your FAST admin users to the local FASTSearchAdministrators group before rebooting the server.

Important

If you are using SQL authentication between your FS4SP farm and SQL Server, you must configure FS4SP via Windows PowerShell and cannot use the Microsoft FAST Search Server 2010 For SharePoint Configuration Wizard.

FS4SP Update Installation

After you complete the initial FS4SP configuration and setup, Microsoft recommends that you install the latest service pack and/or cumulative updates for FS4SP. Note that fixes for FS4SP occur both on the SharePoint farm and on the FS4SP farm. Use the following links to find information on the latest updates:

When applying updates to a running system, you need to address a couple of points:

  • Have you added a custom Pipeline Extensibility module? See Chapter 7 for more information about how to extend the indexing pipeline.

  • Have you increased the content capacity to support more than 30 million items per column? See Chapter 5 for more information about how to increase the content capacity.

If you answered “yes” to one or both questions, make sure you back up the changed configuration files before running the post-setup configuration script. If you don’t, there is a high probability the files will be overwritten during the patch process and you will lose your customizations. See the section Manual and Automatic Synchronization of Configuration Changes later in this chapter for a list of files that are overwritten during configuration changes.

Best practice with FS4SP is to make a copy of any configuration file you have edited manually before applying an update. After applying an update and running the post-setup configuration script in patch mode, compare the backed-up files with the live ones to ensure all your edits are still in place.
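
The backup-and-compare practice can be sketched as follows. This is a minimal illustration in Python, assuming Python 3 is available on the server. The function names and the backup folder are hypothetical, and the file list is taken from the entries that Table 4-6 marks as overwritten, so trim it to the files you have actually edited.

```python
import filecmp
import shutil
from pathlib import Path

# Files that Table 4-6 marks as overwritten in patch mode, given
# relative to <FASTSearchFolder>.
OVERWRITTEN = [
    "etc/loggerconfig.xml",
    "etc/pipelineextensibility.xml",
    "etc/crawlercollectiondefaults.xml",
    "etc/crawlerglobaldefaults.xml",
    "etc/cctklog4j.xml",
    "etc/config_data/DocumentProcessor/formatdetector/user_converter_rules.xml",
    "etc/config_data/RTSearch/webcluster/rtsearchrc.xml",
]


def backup_configs(fast_folder, backup_folder, files=OVERWRITTEN):
    """Copy each existing file to backup_folder, keeping its relative path."""
    copied = []
    for rel in files:
        src = Path(fast_folder) / rel
        if src.is_file():
            dst = Path(backup_folder) / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel)
    return copied


def changed_after_patch(fast_folder, backup_folder, files=OVERWRITTEN):
    """Return backed-up files whose live copy is missing or now differs."""
    changed = []
    for rel in files:
        saved = Path(backup_folder) / rel
        live = Path(fast_folder) / rel
        if saved.is_file() and not (
            live.is_file() and filecmp.cmp(saved, live, shallow=False)
        ):
            changed.append(rel)
    return changed
```

Call backup_configs before applying the update and changed_after_patch after running the post-setup configuration script in patch mode. Any file that is reported needs a manual review, because an update can also bring legitimate changes that you want to keep.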

Run the post-setup configuration script in patch mode

An update may include updates to configuration files, and these updates are usually patched into the existing files. As such, you need to run the post-setup configuration script in patch mode to apply the patches to your system.

  1. Open an FS4SP shell.

  2. Browse to <FASTSearchFolder>\installer\scripts, where <FASTSearchFolder> is the path of the folder in which you have installed FAST Search Server 2010 for SharePoint, for example, C:\FASTSearch.

  3. Type the following command to run the post-setup configuration script in patch mode.

    .\psconfig.ps1 -action p

More Info

For a detailed description about how to apply a software update to FS4SP in a single-server or multi-server farm, go to http://technet.microsoft.com/en-us/library/hh285624.aspx.

FS4SP Slipstream Installation

There is no official documentation on slipstreaming in service packs or cumulative updates with the installation of FS4SP, but because service packs and cumulative updates contain Windows Installer patch files (.msp files), you can use the patch parameter of Windows Installer to also install the updates during installation.

Using Service Pack 1 as an example, run the following command from a command prompt to unpack the .msp files from the service pack, where x:\installfolder is the destination drive and folder for the extracted files. The same procedure applies to cumulative updates.

fastsearchserver2010sp1-kb2460039-x64-fullfile-en-us.exe /extract:x:\installfolder

Next, save the script in Example 4-2 to a file named fs4spslipstream.cmd.

Example 4-2. Slipstream installation script

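@echo off
rem Both arguments are required: the install files folder and the patch files folder.
if "%1" == "" goto error
if "%2" == "" goto error
rem Delayed expansion lets !patches! grow inside the for loop.
setlocal enabledelayedexpansion
set patches=
rem Build a semicolon-separated list of all .msp files in the patch folder.
for %%i In ("%2\*.msp") DO set patches=!patches!%%i;
rem Install FS4SP and apply the collected patches in the same run.
%1\fsserver.msi /update "%patches%"
goto end
:error
echo.
echo Usage: fs4spslipstream.cmd [fs4sp install files folder] [patch files folder]
echo.
echo Example: fs4spslipstream.cmd d: c:\patches
echo.
:end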

If your FS4SP installation files reside in d: and your extracted patches in c:\patches, you can do a slipstream installation by using the following command from a command prompt.

fs4spslipstream.cmd d: c:\patches

After the installation, you need to run the post-setup configuration as normal.

Single-Server FS4SP Farm Configuration

This section covers the steps of the Microsoft FAST Search Server 2010 For SharePoint Configuration Wizard. (We refer to this wizard throughout the rest of the chapter as just the Configuration Wizard.) The section also explains the different options found in the Configuration Wizard. The same options must be set in a scripted configuration.

Configure a single-server FS4SP farm by using the Configuration Wizard

  1. When you configure FS4SP as a single-server FS4SP farm, choose the Single Server (Stand-Alone) option in the Configuration Wizard, shown in Figure 4-4. (The other two options are used for deploying a multi-server farm.)

    Figure 4-4. Single-server FS4SP deployment.

  2. On the next page of the wizard, enter the user name and password that your FS4SP service will run under, as shown in Figure 4-5. (See Table 4-4 for the account requirements.)

    Figure 4-5. FS4SP service user credentials.

  3. On the next page of the wizard, enter a password for the self-signed certificate, as shown in Figure 4-6. This password is needed when installing the certificate on the SharePoint farm (if you decide to go with the self-signed certificate for your deployment).

    Figure 4-6. FS4SP self-signed certificate password.

  4. On the Server Settings page of the Configuration Wizard, shown in Figure 4-7, you can either use a default deployment configuration file or provide a custom one. See the section Deployment Configuration later in this chapter for more information about the default deployment configuration file.

    Figure 4-7. Use a default or an existing deployment file.

  5. On the next page of the wizard, enter the database settings for where FS4SP stores the administration database, as shown in Figure 4-8.

    Figure 4-8. Database settings.

  6. Click-through relevancy is a feature that gives an additional boost to items that are frequently opened from the search results. Over time, this feature helps identify high-value and important items, ensuring they are moved up the result list.

    If you want to enable click-through relevancy, on the Click-Through Relevancy Settings page, choose either Standalone or Server Farm depending on the installation mode of SharePoint Server, as shown in Figure 4-9. If you set up SharePoint Server as a farm, enter the user name of the user running the Microsoft SharePoint 2010 Timer Service.

    Note

    When you enable click-through relevancy, SharePoint logs all clicks for results on the search page in a SharePoint Search Center site. These logs are then transferred to your FS4SP server at specific intervals for processing.

    The more clicks a result item receives from your users on a given search term, the higher ranked the result item will be for that search term. The result item is given a boost for future queries, potentially moving it higher up the result list as more users click it.

    If you decide not to install click-through relevancy during the initial configuration, you can enable it afterward by following the steps outlined at this link: http://technet.microsoft.com/en-us/library/ff384289.

    Figure 4-9. Enabling click-through relevancy.

  7. The last page of the wizard shows a summary of your choices. Click Configure to finish the configuration.

Deployment Configuration

When using the default deployment configuration, the following components are installed on your server:

  • Administration services

  • Document processors (four instances)

  • Content Distributor

  • Indexing Dispatcher

  • FAST Search Web crawler

  • Web Analyzer

  • Indexer

  • QR Server

The deployment configuration file in use after an initial configuration with the default file deployment.xml is located at <FASTSearchFolder>\etc\config_data\deployment\deployment.xml. The deployment configuration file is identical in configuration to the single-server sample deployment configuration located at <FASTSearchFolder>\etc\deployment.sample.single.xml.

The following code shows a sample file for single-server deployment.

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="" modifiedTime="2010-06-21T14:39:17+01:00" comment="FAST Search Server single node deployment example" xmlns="http://www.microsoft.com/enterprisesearch" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <!-- Logical name for entire installation - replace with suitable name after own wishes -->
  <instanceid>FAST Search Server Single Node</instanceid>
  <!-- Used by connectors - replace with correct values -->
  <connector-databaseconnectionstring><![CDATA[jdbc:sqlserver://sqlservername\instancename:portnumber;DatabaseName=dbname]]></connector-databaseconnectionstring>
  <!-- Single node, replace name of host with correct hostname -->
  <host name="single01.search.microsoft.com">
    <admin />
    <!-- Set with number of document processors needed -->
    <document-processor processes="4" />
    <content-distributor id="0" />
    <indexing-dispatcher />
    <!-- replace organization name and email if crawler will be used, this information is left behind at sites the crawler visits. This tag can be removed if the crawler will not be used -->
    <crawler role="single" />
    <!-- Max targets is the number of CPU's the web analyzer utilizes -->
    <webanalyzer server="true" max-targets="1" link-processing="true" lookup-db="true" />
    <searchengine row="0" column="0" />
    <query />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>

Multi-Server FS4SP Farm Configuration

When configuring a multi-server farm, you use a deployment configuration file very similar to the one used for single-server deployment. The difference is that you define more host servers where you spread out the different components, as described in Chapter 3. Inside the deployment configuration file, you define the configuration for all the servers building up your FS4SP farm, and you use the same configuration file on all machines. The first server to be configured in a multi-server farm is the server that holds the administration component.

Example 4-3 shows the configuration for a two-server deployment. Because multi01.contoso.com hosts the administration component (the <admin /> element), it must be configured first as an administration server by using the Admin server option in the Configuration Wizard. The other server, multi02.contoso.com, is then configured by using the Non-admin server option.

Example 4-3. Deployment configuration file for two servers

<?xml version="1.0" encoding="utf-8" ?>
<deployment xmlns="http://www.microsoft.com/enterprisesearch">
  <instanceid>FAST Search Server Multi Node 1</instanceid>
  <connector-databaseconnectionstring><![CDATA[jdbc:sqlserver://sqlservername\instancename:portnumber;DatabaseName=dbname]]></connector-databaseconnectionstring>
  <host name="multi01.contoso.com">
    <admin />
    <document-processor processes="8" />
    <content-distributor />
    <indexing-dispatcher />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
    <searchengine row="0" column="0" />
    <query />
  </host>
  <host name="multi02.contoso.com">
    <document-processor processes="4" />
    <searchengine row="1" column="0" />
    <query />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>
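
To see how a deployment file maps components to hosts, the following sketch parses a trimmed copy of Example 4-3 with Python's standard xml.etree.ElementTree module. It reports the host that carries the admin element, which is the server to configure first, along with the components assigned to each host. The function names are hypothetical and the embedded XML is shortened for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <deployment> root element.
NS = {"d": "http://www.microsoft.com/enterprisesearch"}

# Trimmed copy of the two-server example, for illustration only.
SAMPLE = """<deployment xmlns="http://www.microsoft.com/enterprisesearch">
  <host name="multi01.contoso.com">
    <admin />
    <document-processor processes="8" />
    <content-distributor />
    <indexing-dispatcher />
    <searchengine row="0" column="0" />
    <query />
  </host>
  <host name="multi02.contoso.com">
    <document-processor processes="4" />
    <searchengine row="1" column="0" />
    <query />
  </host>
</deployment>"""


def host_components(xml_text):
    """Map each host name to the list of component element names it runs."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.findall("d:host", NS):
        # Strip the "{namespace}" prefix that ElementTree adds to tag names.
        result[host.get("name")] = [child.tag.split("}", 1)[1] for child in host]
    return result


def admin_host(xml_text):
    """Return the name of the host that contains the <admin /> element."""
    root = ET.fromstring(xml_text)
    for host in root.findall("d:host", NS):
        if host.find("d:admin", NS) is not None:
            return host.get("name")
    return None


if __name__ == "__main__":
    print("Configure first:", admin_host(SAMPLE))
    for name, components in host_components(SAMPLE).items():
        print(name, components)
```

A check like this is useful before you copy the same deployment file to every server, because a file without an admin host, or with the wrong one, is easier to catch here than during configuration.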

Your FS4SP installation comes with three reference sample deployment files located in <FASTSearchFolder>\etc:

  • A single-server farm (deployment.sample.single.xml)

  • A two-server farm (deployment.sample.multi1.xml)

  • A five-server farm (deployment.sample.multi2.xml)

More Info

A detailed explanation about all settings of the deployment.xml file can be found at http://technet.microsoft.com/en-us/library/ff354931.aspx.

Manual and Automatic Synchronization of Configuration Changes

When setting up a multi-server FS4SP farm, make sure all servers performing specific roles are configured in the same way. Some settings that you change on the administration server are automatically copied to the other servers; other settings must be applied manually on each server.

Table 4-6 outlines the different configuration files in FS4SP that you can edit, on which server or servers you have to edit the files, and if you need to take special actions on the files when applying an FS4SP service pack or cumulative update.

Table 4-6. Supported configuration files

| File | Where to update | Comment | File location |
| --- | --- | --- | --- |
| loggerconfig.xml | All servers | This file is overwritten when you apply updates and service packs.[a] | <FASTSearchFolder>\etc |
| logserverconfig.xml | All servers | | <FASTSearchFolder>\etc |
| monitoringservice.xml | All servers | | <FASTSearchFolder>\etc |
| custompropertyextractors.xml | Administration server | | <FASTSearchFolder>\etc\config_data\DocumentProcessor |
| deployment.xml | Administration server | | <FASTSearchFolder>\etc\config_data\deployment |
| optionalprocessing.xml | Administration server | | <FASTSearchFolder>\etc\config_data\DocumentProcessor |
| user_converter_rules.xml | Administration server | This file is overwritten when you apply updates and service packs.[a] | <FASTSearchFolder>\etc\config_data\DocumentProcessor\formatdetector |
| xmlmapper.xml | Administration server | | <FASTSearchFolder>\etc\config_data\DocumentProcessor |
| pipelineextensibility.xml | Servers running a document processor | This file is overwritten when you apply updates and service packs.[a] | <FASTSearchFolder>\etc |
| Third-party IFilters | Servers running a document processor | You must install the IFilters on all servers. | |
| All custom pipeline extensibility modules | Servers running a document processor | You must deploy your custom modules to all servers. | |
| jdbctemplate.xml | Servers running the FAST Database connector | You should always make a copy of this file and not use the template. | <FASTSearchFolder>\etc |
| beconfig.xml | Servers running the FAST Enterprise Web crawler | | <FASTSearchFolder>\etc |
| crawlercollectiondefaults.xml | Servers running the FAST Enterprise Web crawler | This file is overwritten when you apply updates and service packs.[a] | <FASTSearchFolder>\etc |
| crawlerglobaldefaults.xml | Servers running the FAST Enterprise Web crawler | This file is overwritten when you apply updates and service packs.[a] | <FASTSearchFolder>\etc |
| cctklog4j.xml | Servers running the FAST Search Database connector and servers running the FAST Search Lotus Notes connector | This file is overwritten when you apply updates and service packs.[a] | <FASTSearchFolder>\etc |
| lotusnotessecuritytemplate.xml | Servers running the FAST Search Lotus Notes connector | You should always make a copy of this file and not use the template. | <FASTSearchFolder>\etc |
| lotusnotestemplate.xml | Servers running the FAST Search Lotus Notes connector | You should always make a copy of this file and not use the template. | <FASTSearchFolder>\etc |
| rtsearchrc.xml | Servers that are part of the search cluster | This file is overwritten when you apply updates and service packs.[a] | <FASTSearchFolder>\etc\config_data\RTSearch\webcluster |

[a] When you run the post-setup configuration script in patch mode after installing an update or a service pack, the script executes the FS4SP cmdlet Set-FASTSearchConfiguration, which overwrites the file.

Certificates and Security

Certificates are used for authenticating traffic between FS4SP and SharePoint and for authenticating traffic between servers in a multi-server FS4SP farm. In addition, certificates are used to encrypt traffic when you use Secure Sockets Layer (SSL) communication between the SharePoint farm and the FS4SP farm over the optional HTTPS protocol for query traffic and administration traffic (for example, when adding best bets).

Each server in an FS4SP farm potentially has three certificates that serve different functions and must be configured and replaced separately. The first two certificates can be combined into one certificate if you do not have a requirement to use HTTPS for the query and administration services traffic:

  • A general purpose FS4SP certificate that is used for internal communication, administration services, and when indexing content via SharePoint (FAST Content SSA).

  • A server-specific certificate for query traffic that uses HTTPS. This is needed only on query servers that have HTTPS query traffic enabled.

  • A Claims certificate, which is a certificate that you export from your SharePoint farm to the FS4SP servers running as QR Servers.

By using certificates issued by a common certification authority (CA) for authentication and for traffic encryption over HTTPS, FS4SP can provide a very high level of security if needed. This ensures that no one can access sensitive information that is contained in the query traffic; best practice is to use HTTPS if your documents contain highly sensitive information.

Important

Even though traffic is encrypted when you use HTTPS, the query logs are still accessible in clear text on the FS4SP servers for users that have access to those servers.

During the initial installation, FS4SP generates a self-signed certificate. This certificate has a one-year expiration date from the date of installation and is meant to be used in test and development environments. If you decide to use the self-signed certificate in production and don’t want to replace the certificate every year, you can modify the certificate creation script in FS4SP in order to extend the expiration date to, for example, 100 years. See Chapter 9 for information about how to extend the default certificate.
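
Because the self-signed certificate expires one year after installation, it helps to know how many days remain before it must be replaced. The following Python sketch does the date arithmetic. The function name and the 365-day lifetime are assumptions made for illustration, so check the actual expiration date on the certificate itself.

```python
from datetime import date, timedelta


def days_until_expiry(installed_on, today, lifetime_days=365):
    """Return the number of days left before the certificate expires.

    Assumes a lifetime of lifetime_days counted from the installation date.
    A negative result means the certificate has already expired.
    """
    expires = installed_on + timedelta(days=lifetime_days)
    return (expires - today).days


if __name__ == "__main__":
    remaining = days_until_expiry(date(2010, 6, 21), date.today())
    print("Days until the self-signed certificate expires:", remaining)
```

A scheduled job could run a check like this and raise a warning some weeks ahead, which leaves time to replace the certificate on both the FS4SP farm and the SharePoint farm.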

To verify that your certificate is registered correctly, you can execute the following Windows PowerShell cmdlet. This cmdlet checks whether the SharePoint server can connect to a content distributor on the FS4SP farm.

Ping-SPEnterpriseSearchContentService

Follow these steps:

  1. Open a SharePoint Management Shell as an administrator.

  2. Type the following command, where <hostname>:<port> is the host name and port number for your content distributor found in <FASTSearchFolder>\Install_Info.txt.

    Ping-SPEnterpriseSearchContentService -HostName <hostname>:<port>

The command output should list ConnectionSuccess as True for your certificate, as shown in Figure 4-10. If all lines read False, the certificate is not correctly set up.

Figure 4-10. Successfully installed FS4SP self-signed certificate in SharePoint.

Creating FAST Content SSAs and FAST Query SSAs

As described in Chapter 3, FS4SP requires two Search Service Applications (SSAs). The FAST Query SSA handles all incoming search requests as well as the search index for People Search, and the FAST Content SSA handles the content sources and indexing.

More Info

Setting up the FAST Content SSA is described in a TechNet article at http://technet.microsoft.com/en-us/library/ff381261.aspx. Setting up the FAST Query SSA is described in a TechNet article at http://technet.microsoft.com/en-us/library/ff381251.aspx.

When you create the SSAs, each has one crawler component assigned, and each crawler component is associated with one crawler database. You add crawler components on additional farm servers to distribute the crawl load. With FS4SP, the crawler components use fewer system resources than the built-in SharePoint search does because of the architecture: Items are sent over to the FS4SP farm for processing instead of being processed on the SharePoint server. However, if monitoring shows that CPU usage peaks during crawling or that the crawler database on the SQL Server is performing poorly, adding another crawler component, and optionally another crawler database, can improve the situation.

To provide failover for search queries, you can add additional query components in the same fashion that you add crawler components. Because the FS4SP farm handles the actual query processing and the load associated with it, you should never need more than two query components in your SharePoint farm.

Important

You must configure each server that hosts a crawl component to use an SSL certificate for FS4SP. If you used the self-signed certificate when you enabled SSL communication for the FAST Content SSA, you must use the same certificate on the server with the added crawler component. If you instead used a CA-issued certificate, you must use a certificate issued by the same CA as the original certificate on the server with the added crawler component.

A detailed description about how to add and remove crawler components and the required certificates can be found at http://technet.microsoft.com/en-us/library/ff599534.aspx.

Enabling Queries from SharePoint to FS4SP

Queries from SharePoint are sent over to the FS4SP farm by using claims-based authentication with certificates. To set up claims-based authentication, you export a certificate from the SharePoint farm and install it on all QR Servers in the FS4SP farm. The traffic is, by default, sent over HTTP, but you can optionally secure this traffic further by using SSL certificates over HTTPS.

More Info

Detailed steps about installing the certificates can be found at http://technet.microsoft.com/en-us/library/ff381253.aspx.

Creating a Search Center

After you set up the FAST Content SSA for crawling, the FAST Query SSA for handling search queries, and the communication between the SharePoint farm and the FS4SP farm, you can set up a Search Center on your SharePoint farm. This Search Center can be used to execute queries against the FAST Query SSA. The Search Center is a SharePoint site based on the FAST Search Center site template. Customizations and configurations of the Search Center are covered in depth in Part II of this book.

More Info

Detailed steps about setting up a Search Center site can be found at http://technet.microsoft.com/en-us/library/ff381248.aspx.

Scripted Installation

When you install FS4SP, particularly in larger deployments, you are often required to automate and streamline the deployment procedure. With FS4SP, this can be achieved for both the installer and the post-configuration process because they can be run in unattended mode. The included offering requires you to manually start the installation and configuration in unattended mode locally on each server; however, it is possible to automate this process even further by creating custom Windows PowerShell scripts that use remoting to install and configure FS4SP on all servers.

After you install and configure FS4SP, you still must manually execute the certificate procedures between the FS4SP farm and the SharePoint farm, set up the FAST Content SSA and the FAST Query SSA on the SharePoint farm, and create a Search Center. These tasks are outside the scope of the scripts included with FS4SP but can be automated with custom scripts.

More Info

You can read more about scripted installation and configuration of FS4SP at http://technet.microsoft.com/en-us/library/ff381263.aspx.

Advanced Filter Pack

Out of the box, FS4SP supports extracting text from the file formats listed in Table 4-7. The default text extraction is provided by the Microsoft Filter Pack, one of the prerequisites when installing FS4SP.

Table 4-7. File types included by default

| File name extension | Comment |
| --- | --- |
| .pdf | Adobe Portable Document Format file |
| .html (and all other HTML files, regardless of file name extension) | Hypertext Markup Language file |
| .mht | MHTML web archive |
| .eml | Microsoft email message |
| .xlb | Microsoft Excel binary spreadsheet (Excel 2003 and earlier) |
| .xlsb | Excel binary spreadsheet (Excel 2010 and Excel 2007) |
| .xlm | Excel macro-enabled spreadsheet (Excel 2003 and earlier) |
| .xlsm | Excel Open XML macro-enabled spreadsheet (Excel 2010 and Excel 2007) |
| .xlsx | Excel Open XML spreadsheet (Excel 2010 and Excel 2007) |
| .xls | Excel spreadsheet (Excel 2003 and earlier) |
| .xlc | Excel spreadsheet chart file |
| .xlt | Excel template |
| .one | Microsoft OneNote document |
| .msg | Microsoft Outlook mail message |
| .pptm | Microsoft PowerPoint Open XML macro-enabled presentation (PowerPoint 2010 and PowerPoint 2007) |
| .pptx | PowerPoint Open XML presentation (PowerPoint 2010 and PowerPoint 2007) |
| .ppsx[a] | PowerPoint Open XML slide show (PowerPoint 2010 and PowerPoint 2007) |
| .ppt | PowerPoint presentation |
| .pps | PowerPoint slide show |
| .pot | PowerPoint template |
| .pub | Microsoft Publisher document |
| .vsd | Microsoft Visio drawing file |
| .vdw | Visio Graphics Service file |
| .vss | Visio stencil file |
| .vsx | Visio stencil XML file |
| .vst | Visio template |
| .vtx | Visio template XML file |
| .doc | Microsoft Word document (Word 2003 and earlier) |
| .dot | Word document template (Word 2003 and earlier) |
| .docx | Word Open XML document (Word 2010 and Word 2007) |
| .dotx | Word Open XML document template (Word 2010 and Word 2007) |
| .docm | Word Open XML macro-enabled document (Word 2010 and Word 2007) |
| .xps | Microsoft XML Paper Specification file |
| .mhtml | MIME HTML file |
| .odp | OpenDocument presentation |
| .ods | OpenDocument spreadsheet |
| .odt | OpenDocument text document |
| .txt (and all other plain text files, regardless of file name extension) | Plain text file |
| .rtf | Rich Text Format file |
| .nws | Windows Live Mail newsgroup file |
| .zip | Zipped file |

[a] Support for this format requires that you have installed Microsoft Office 2010 Filter Pack Service Pack 1, which is available at http://support.microsoft.com/kb/2460041.

FS4SP comes with an option named the Advanced Filter Pack. With the Advanced Filter Pack, you can extract text and metadata from several hundred additional document formats. These formats complement those included with the Microsoft Filter Pack.

More Info

You can see the complete list of files supported by the Advanced Filter Pack by opening the file <FASTSearchFolder>\etc\formatdetector\converter_rules.xml in a text editor.

Before purchasing a third-party IFilter for metadata and text extraction, you should see whether the Advanced Filter Pack already supports your document format. The Advanced Filter Pack is turned off by default but is easily enabled by using Windows PowerShell. See Chapter 9 for information about enabling or disabling the Advanced Filter Pack.

Important

Remember to activate the Advanced Filter Pack on all servers that have the document processor component installed.

IFilter

If you have a file format that is not covered by the Microsoft Filter Pack or the Advanced Filter Pack, you can obtain and install a third-party IFilter that handles the document conversion. You can find several vendors of third-party IFilters; you can also develop your own.

When installing a third-party IFilter, you should pay attention to the following points:

  • Only 64-bit versions of IFilters work with FS4SP.

  • The IFilter has to be installed on all FS4SP servers hosting the document processor component.

  • Make sure the file extension that your new IFilter handles is not listed in the File Types list on your FAST Content SSA. The File Types list includes all extensions not to be crawled.

  • Edit <FASTSearchFolder>\etc\config_data\DocumentProcessor\formatdetector\user_converter_rules.xml on the FS4SP administration server to include the extension for the file handled by the IFilter.

    More Info

    For information about configuring FS4SP to use a third-party IFilter, go to http://msdn.microsoft.com/en-us/library/ff795798.aspx.

  • Issue psctrl reset in order for the document processors to pick up and use the new IFilter.

Replacing the Existing SharePoint Search with FS4SP

Depending on your content volume, doing a full crawl can take a lot of time. Fortunately, if you are upgrading to FS4SP from an existing SharePoint deployment, you can keep your existing SharePoint search while you deploy FS4SP and get it up and running.

SharePoint Central Administration has a section where you configure which service applications are associated with your current web application. The SharePoint Search Service Application is set as the default SSA in the application proxy group.

As long as you keep your existing SSA and include it in the application proxy group, SharePoint redirects any search queries to the existing search. When you are finished setting up the FAST Content SSA and the FAST Query SSA, you associate the FAST Query SSA as your new SSA in the application proxy group instead of your existing one.

More Info

You can read more about the application proxy group and how to add and remove service applications to a web application at http://technet.microsoft.com/en-us/library/ee704550.aspx.

To ease the transition of moving your crawler setup from your existing SSA to the FAST Content SSA, you can use the export and import Windows PowerShell scripts at the following locations:

Another approach is to upgrade your existing SSA to a FAST Query SSA. See Chapter 9 for information about how you can upgrade your existing SSA to ease the migration process.

Development Environments

When developing for FS4SP, you need both SharePoint and FS4SP installed. Because you may often change configurations in both environments, the most suitable development environment has everything installed on the same machine. This is not a supported production environment but works fine for a single-user development setup.

For a single-server FS4SP development setup, you start off with a domain controller that has SQL Server and SharePoint installed, and you continue with installing FS4SP on the same machine. If you develop pipeline extensibility stages or custom Web Parts in your solution, you also have to install Microsoft Visual Studio on your machine.

More Info

For information about setting up the development environment for SharePoint on Windows Vista, Windows 7, and Windows Server 2008, go to http://msdn.microsoft.com/en-us/library/ee554869.aspx.

FS4SP supports SharePoint in both stand-alone and farm mode, but it requires SharePoint to be installed with an Enterprise license to enable the FS4SP features.

Single-Server Farm Setup

As explained earlier, FS4SP is a disk I/O–intensive application. The same goes for SharePoint because most operations require database access. When bundling SQL Server, SharePoint, and FS4SP on the same machine, disk I/O quickly becomes the bottleneck.

Desktop computers have room for setting up multiple disks in RAID and can increase the disk I/O throughput that way; however, laptops usually have room for only one or two hard drives, which are also slower than the desktop counterparts because of the smaller form factor.

Using a solid-state drive (SSD) is key to setting up an optimal single-server development environment for FS4SP. An SSD yields far better performance than mechanical drives do, even when the mechanical drives are set up in RAID. If you have the option to use SSDs, you can save a lot of waiting time during development and test cycles.

When it comes to RAM, 4 GB is sufficient when combined with an SSD but should be increased if you don’t have an SSD as an option.

To sum it up, the faster the disk is and the more memory you have, the more responsive your development environment will be.

Multi-Server Farm Setup

If you develop modules for both the SharePoint environment and the FS4SP environment, best practice is to have both SharePoint and FS4SP installed on the same server, as in the single-server setup described in the previous section.

By adding a second server, you can move the domain controller role and SQL Server over to that machine. This move reduces the disk I/O on your development machine.

If you develop modules for either SharePoint or FS4SP only, you can add a third server and move out the system you are not targeting in your development to that server. This leaves the most resources to the server you are working on for your development, ensuring that your environment is as effective as possible.

Physical Machines

If you have a dedicated machine for development, install Windows Server 2008 SP2 or Windows Server 2008 R2 as your operating system and install SharePoint and FS4SP afterward, just like you would when setting up a development server for SharePoint. This allows your setup to have full access to disk, RAM, and CPU.

Virtual Machines

If you don’t have a dedicated FS4SP development machine but have to host FS4SP on an existing machine that runs your day-to-day applications like Microsoft Exchange Server or Active Directory directory services, a better option is to use a virtual machine. A virtual machine does not have direct access to your disk, RAM, and CPU and will have a performance overhead, most noticeably on the disk.

If you must use solely Microsoft products, your only option for running a virtual machine is to use Windows Server 2008 R2 with Hyper-V as your desktop operating system. Microsoft Virtual PC is not an option because it does not support running guest operating systems in 64 bit, a requirement for SharePoint and FS4SP.

Other alternatives are to use the free VMware Player, the commercial VMware Workstation, or Oracle VirtualBox. All of these support 64-bit guest operating systems, for example, Windows Server 2008 R2.

Booting from a VHD

An alternative to running a full virtual server is to only virtualize the disk and use dual boot. With dual booting, you can have several operating systems installed side by side on different disk partitions or on Hyper-V virtual disks and pick which operating system to start when you turn on your computer.

Both Windows 7 and Windows Server 2008 R2 come with a feature that you can use to boot the operating system from a virtual hard disk (VHD) file. This boot option has the benefit that only the disk subsystem is virtualized while the CPU and RAM are accessed directly. This way, you can run your FS4SP development setup side by side with your day-to-day machine setup, but at the expense of having to reboot in order to switch between the environments.

More Info

TechNet has a video on how to set up dual-boot with Windows 7 and Windows Server 2008 R2 at http://technet.microsoft.com/en-us/edge/Video/ff710851.

Production Environments

Setting up a production environment for FS4SP involves careful planning and consideration to match your search requirements. This section highlights important areas that affect how you set up your production environment.

More Info

TechNet has a thorough explanation on performance and capacity planning from both a business perspective and system architectural perspective at http://technet.microsoft.com/en-us/library/gg604780.aspx.

Content Volume

When you start to plan your FS4SP farm, the first thing you should consider is how much content you intend to index. Content includes objects such as documents, webpages, SharePoint list items, and database records.

In our opinion, it is better to know how many files of different file formats you want to index than to know the disk usage of the raw data, for example, 2 TB. There are a couple of reasons to use the number-of-items approach instead of the disk space approach. One reason is that a file server usually contains a wide variety of files, many of which consume a lot of space but are not interesting for indexing. Log files, backups, and company event pictures are such examples and often consume a large part of a file server’s total volume. The second reason is that one index column has a default absolute maximum of 30 million items, as mentioned earlier; when you exceed the recommended 15 million items, indexing becomes progressively slower until you reach 30 million, at which point the index column stops accepting more items.

More Info

For recommendations about content volume capacity for FS4SP, go to http://technet.microsoft.com/en-us/library/gg702617.aspx.

The 30-million item limit exists regardless of the item size because FS4SP internally is configured with six internal partitions, each holding 5 million items. During indexing, items are moved between the partitions; when you start filling them up, you generate more disk I/O, which accounts for the degradation in indexing speed. Also, searching over a larger index requires more disk I/O per search. That said, size is not unimportant, and it is important to know what data the 2 TB contains and how many unique items it represents. Indexing 100,000 list rows in SharePoint generates a smaller index on disk than indexing 100,000 Word documents. Because the index is smaller, searching across the 100,000 list items is also faster than searching across the Word documents.

Note

The 15-million item limit can be extended to 40 million by using high density mode. For more information, go to http://technet.microsoft.com/en-us/library/gg482017.aspx.

Ideally, you would be able to add more servers as your content grows; however, there is currently no automatic distribution of the already-indexed content to new servers. You must either perform an index reset and full crawl when extending the index capacity or see the section Server Topology Management in Chapter 5, which discusses how to redistribute the internal FIXML files. If downtime of your search solution is acceptable, you can go for the first approach, which is easier to perform. The second approach—redistributing the FIXML files—only causes downtime for indexing new content and is the most desirable when adding servers to accommodate content growth. For most businesses, however, search has become a vital part of the information infrastructure, and you should plan and estimate your content growth for the next two to three years in order to provide a search solution without downtime or partial results.

As an example, if you have 7 million items today and expect an annual growth of 30 percent, you will hit 15 million items in three years’ time. Although one server would accommodate your needs for this timespan, using two would be better because you can grow even more. With two servers, you also mitigate the risk that your content volume grows faster than planned while potentially improving performance.

Even though you can extend the capacity to 40 million items per server, you are better off following the recommendation of 15 million items per server for performance reasons. Under that recommendation, two index servers would be the best choice for the previous example. Content is distributed across all index servers in your FS4SP farm, and you get an added speed benefit.
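The growth estimate in the previous example can be reproduced with a few lines of code. The following is a minimal sketch, not a sizing tool: the function names are our own, the growth is assumed to compound once per year, and the only FS4SP-specific figure is the recommendation of 15 million items per index column cited in this chapter.

```python
import math

# Recommended number of items per index column, as cited in this chapter.
RECOMMENDED_ITEMS_PER_COLUMN = 15_000_000


def projected_items(current_items, annual_growth, years):
    """Project the item count, assuming growth compounds once per year."""
    return current_items * (1 + annual_growth) ** years


def columns_needed(items, per_column=RECOMMENDED_ITEMS_PER_COLUMN):
    """Smallest number of index columns that keeps each column at or below per_column."""
    return max(1, math.ceil(items / per_column))


# Hypothetical planning run: 7 million items today, 30 percent annual growth.
for year in range(4):
    items = projected_items(7_000_000, 0.30, year)
    print(f"year {year}: {round(items):>10,} items, {columns_needed(items)} column(s)")
```

With these inputs, the projection for year three lands slightly above 15 million items, which is why a strict reading of the recommendation already points to two index columns.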

Important

When you add a new index column, you must reindex all your content. There is no automatic distribution of the already-indexed items to the newly added index column. With this information in mind, you should plan your FS4SP farm in regard to content growth as well as what you have today.

Failover and High Availability

After you have scaled your FS4SP farm for content volume, you have to think about failover capabilities. If you need failover for search queries, you can add more search rows. If you want failover on indexing, you have to add a row with a backup indexer. The backup indexer does not provide an automatic failover like search does, and you must manually configure your deployment to start using the backup indexer.

In the previously mentioned example, you used three servers (columns). In order to add failover capabilities, you must add at least three more servers to add a new row to the deployment. Depending on your budget, you may want to start with two servers in a one-column, two-row layout where you can store up to 30 million items per server, and have failover for the items. The cost of using fewer servers is speed degradation over time, plus search downtime and the time needed for a full crawl when you add more index servers.
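The relationship between columns, rows, and server count can be written down as simple arithmetic. The sketch below uses hypothetical helper names and assumes one server per column in each row; it also assumes that additional rows hold copies of the same columns, so rows add redundancy but not item capacity.

```python
def servers_required(columns, rows):
    """Total servers, assuming one server per column in every row."""
    return columns * rows


def servers_for_extra_row(columns):
    """Adding one row means adding one server for each existing column."""
    return columns


def item_capacity(columns, items_per_column):
    """Item capacity depends on columns only; rows duplicate the columns."""
    return columns * items_per_column


# A three-column farm needs three more servers to gain a second row.
print(servers_for_extra_row(3), servers_required(3, 2))

# A one-column, two-row farm uses two servers but holds one column's worth of items.
print(servers_required(1, 2), item_capacity(1, 30_000_000))
```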

Query Throughput

After you decide on what content you want to index, you have to plan for how this content is going to be used in search scenarios. The number of queries your solution needs to handle is largely dependent on how many users you have and how often those users issue search queries. Many Internet retailers are dependent on search to drive the user interface. Retailers typically have thousands of concurrent users around the clock, all with completely different characteristics, and with most searches happening during work hours.

In cases where you don’t know what query volume you have to serve, we recommend that you start out simple with a one-row or two-row deployment and increase the query capacity as you go along by adding more search rows. You also need to determine how important search is in your organization. Again, for an Internet retailer, search is what drives the website, and you have to set up your system for redundancy and high availability. But for a small company, search might not be considered that important; the company might be able to tolerate several hours of downtime for the search system because it won’t affect day-to-day operations in the same way as for the Internet retailer.

In a one-row setup, your servers handle both indexing and search queries. During crawling, the search performance may degrade because of increased disk I/O. If your current search row is not able to serve the incoming query traffic, you can add more search rows to your FS4SP farm.

When scaling the query throughput with additional search rows, the search queries are distributed among the rows. Adding search rows improves performance in a linear fashion: If one search row can deliver 5 QPS, then two rows can deliver 10 and three rows can deliver 15 QPS. Adding new search rows does not require downtime. How many queries per second you can get from one search row depends on how many items you have, how many managed properties are searchable, and the number of refiners returned for your search query. More items increase query latency because more items have to be evaluated before returning the result.
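If you accept the linear scaling described above as a working assumption, the number of search rows for a target load follows directly. The function below is an illustrative sketch with names of our own choosing; the QPS one row delivers has to come from measurements on your own content, because it depends on item count, searchable managed properties, and refiners.

```python
import math


def search_rows_needed(target_qps, qps_per_row):
    """Rows required for target_qps, assuming throughput scales linearly with rows."""
    if qps_per_row <= 0:
        raise ValueError("qps_per_row must be positive")
    return max(1, math.ceil(target_qps / qps_per_row))


# With 5 QPS measured per row, 15 QPS needs three rows, and so does 12 QPS.
print(search_rows_needed(15, 5))
print(search_rows_needed(12, 5))
```

Treat the result as a lower bound: it does not account for indexing load on a shared row or for the extra row you may want for failover.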

More Info

Query latency is measured as the average round-trip delay from the point where a query is issued until a query result is returned.

The query latency can be reduced by limiting the number of items per column or by adding one or more dedicated search rows, or both. By adding search rows, you avoid having the indexing load affect the query latency, and you also achieve increased query availability. We recommend that you add a separate search row when the query latency must be kept low. Also, quicker disks reduce query latency, as mentioned earlier in this chapter.

The FAST Query SSA is what handles the search queries on the SharePoint farm. You can achieve high availability for the SharePoint farm by adding an additional query component to the FAST Query SSA.

Important

Do not deploy more than one FAST Query SSA associated with your FS4SP farm. For information about FAST Query SSA redundancy and availability, go to http://technet.microsoft.com/en-us/library/ff599525.aspx#QuerySSARedundancyAndAvailability.

See Performance Monitoring in Chapter 5 for information about how to identify where your bottlenecks are and for recommendations about adding more columns or rows to your FS4SP farm.

Freshness

Content freshness refers to how long it takes before an item appears in the search results after it is created or updated. When a user adds a new document to a SharePoint document library, does it need to be accessible in search results right away, or is waiting 15 minutes or until tomorrow sufficient? Because SharePoint uses schedule-based crawling, the theoretical minimum time from when an item is created to when it is searchable is around 2 minutes, but 10–15 minutes is a more realistic number to tell your users.

The answer to the freshness question varies depending on the data you index and is something you must take into account when setting up crawl schedules for your content. The more often you crawl, the fresher your content, but also the more load you put on the source systems and the FS4SP indexer servers. You might find that crawling at off-peak hours is sufficient for some sources and near-instant search is needed for other items.

Balancing your freshness requirements against the load these requirements will put on the different systems impacted by crawling is something you have to monitor and adjust accordingly when deploying FS4SP.

More Info

For detailed information about how to plan your FS4SP topology, go to http://technet.microsoft.com/en-us/library/ff599528.aspx.

Disk Sizing

When setting up your FS4SP servers and the SSAs on the SharePoint farm, you have to plan for how much disk space is needed for your deployment.

More Info

The following tables are based on the “Performance and capacity results” scenarios from TechNet at http://technet.microsoft.com/en-us/library/ff599526.aspx. The scenarios use a mix of SharePoint containing file server and web content and mimic what you would find in a typical intranet scenario.

In Table 4-8 and Table 4-9, we have listed the disk usage characteristics for each scenario. Only the medium scenario had actual numbers for Web Analyzer disk usage. For the other two scenarios, this number has been increased to account for more realistic numbers in a production environment. Also, note that the extra-large scenario uses two search rows, effectively doubling the disk space needed.

More Info

A note for the medium scenario states that the disk value is somewhat lower than the recommended dimensioning. For more information, go to http://technet.microsoft.com/en-us/library/ff599532.aspx.

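Table 4-8. FS4SP disk sizing

| Scenario    | Number of items (million) | Original data size (TB) | Rows | Web Analyzer/million items peak (GB) | Index size (TB) | Total index size (TB) |
|-------------|---------------------------|-------------------------|------|--------------------------------------|-----------------|-----------------------|
| Medium      | 44.0                      | 11.0                    | 1    | 1.63                                 | 2.2             | 2.3                   |
| Large       | 105.0                     | 28.0                    | 1    | 3.00                                 | 5.0             | 5.3                   |
| Extra-large | 518.0                     | 121.0                   | 2    | 3.00                                 | 44.2            | 45.8                  |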

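Table 4-9. Crawler database sizing

| Scenario    | Crawl database data + log (GB) |
|-------------|--------------------------------|
| Medium      | 149.0                          |
| Large       | 369.0                          |
| Extra-large | 4400.0                         |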

Note

One crawler database can hold approximately 50 million items. You can read about crawl components and crawl databases in the FS4SP Architecture section of Chapter 3.

When breaking down the numbers in space needed per item (per search row), we get the values in Table 4-10.

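Table 4-10. Per-item size (KB)

| Scenario    | Original data size | Index size | Crawler database |
|-------------|--------------------|------------|------------------|
| Medium      | 250.00             | 51.63      | 3.39             |
| Large       | 266.67             | 50.62      | 3.51             |
| Extra-large | 233.59             | 44.16      | 8.49             |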

If we compare the numbers in Table 4-10 to a test in which we indexed 100,000 HTML files in 20 different languages, we get the per-item sizes shown in Table 4-11.

Table 4-11. Per-item size (KB)

| Original data size | Index size | Crawler database |
|--------------------|------------|------------------|
| 2.08               | 19.93      | 2.50             |

For the intranet scenarios, we have a ratio of approximately 5:1 on raw data to indexed data (about 250 KB of raw data per 50 KB of index), whereas the small HTML files have a ratio of approximately 1:10 on raw data to indexed data (about 2 KB of raw data per 20 KB of index). Seeing how different the index size can be compared to the raw data shows the importance of doing a test indexing of a representative sample of data; by doing so, you can properly estimate how much disk space is needed on your FS4SP servers.
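The per-item figures for the intranet scenarios can be recomputed from the scenario totals, which is a useful sanity check before you apply them to your own item counts. The sketch below rests on assumptions of our own: decimal units (1 GB = 1,000,000 KB), an index figure that adds the Web Analyzer peak to the index size, and a division by the number of search rows. These choices were made because they reproduce the published per-item values; they are not taken from the TechNet material. All function and variable names are hypothetical.

```python
KB_PER_GB = 1_000_000          # decimal units, an assumption
KB_PER_TB = 1_000_000_000

# (items in millions, original TB, rows, Web Analyzer GB per million items,
#  index TB, crawl database GB), copied from the scenario tables.
SCENARIOS = {
    "Medium": (44.0, 11.0, 1, 1.63, 2.2, 149.0),
    "Large": (105.0, 28.0, 1, 3.00, 5.0, 369.0),
    "Extra-large": (518.0, 121.0, 2, 3.00, 44.2, 4400.0),
}


def per_item_kb(items_million, original_tb, rows, wa_gb_per_million,
                index_tb, crawl_db_gb):
    """Return (original, index, crawl database) size per item in KB."""
    items = items_million * 1_000_000
    web_analyzer_kb = wa_gb_per_million * items_million * KB_PER_GB
    # Index plus Web Analyzer peak, split per search row (assumed breakdown).
    index_kb_per_row = (index_tb * KB_PER_TB + web_analyzer_kb) / rows
    return (
        round(original_tb * KB_PER_TB / items, 2),
        round(index_kb_per_row / items, 2),
        round(crawl_db_gb * KB_PER_GB / items, 2),
    )


def estimate_index_gb(item_count, index_kb_per_item, rows=1):
    """Rough index disk estimate in GB; every search row stores its own copy."""
    return item_count * index_kb_per_item * rows / KB_PER_GB


for name, figures in SCENARIOS.items():
    print(name, per_item_kb(*figures))
```

For your own planning, replace the per-item index size with a value measured from a test indexing of representative content, and pass it to `estimate_index_gb` together with your expected item count and number of search rows.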

Server Load Bottleneck Planning

When you index content, many integration points come into play. You can look in several places for bottlenecks. Figure 4-11 provides a graphical overview of bottleneck points between the different components. The following list describes some of the typical bottlenecks seen with FS4SP deployment and provides suggestions about how you should prepare and plan to avoid them:

  • Content sources. How long does it take to do a full or incremental crawl of a particular content source? If you want to decrease this time, you first have to see whether the connector in use is working as fast as possible or if it’s the source system that is not able to deliver content fast enough. As long as the content source is able to deliver data without maxing out the hardware limit or the bandwidth available, or without disturbing normal operations, you can increase the number of simultaneous requests a connector issues to the content source. By adding more connector threads, you can decrease the time you use to retrieve all items.

  • Document processors. After the FAST Content SSA has sent the data over to the FS4SP farm, each item has to be run through the indexing pipeline. If you don’t have enough document processors in your system to handle the data coming in, there is no need to add crawler impact rules to speed up crawling of the content sources.

    Figure 4-11. Load bottleneck points in FS4SP.

    When monitoring your FS4SP farm, if you see that the document processors are not running at full capacity during indexing, you can add crawler impact rules on the content source to speed up crawling. If doing so, you have to check that the source system can handle the increased load. As a rule, you should not add crawler impact rules to speed up crawling if you have fewer than 100 document processors in your FS4SP farm because the crawlers will most likely fetch data quicker than the document processors can handle. By default, the FAST Content SSA sends a maximum of 100 batches at a time to the Content Distributor, and each batch is further sent to a document processor. The FAST Content SSA throttles the crawls if the FS4SP farm cannot handle all items being sent over without queuing them up. Throttling starts if FS4SP takes more time to process the items it receives than it takes for the crawling to put more items in the queue.

    Note

    When adding more document processors, make sure the indexer can keep up with the data coming in to prevent moving the bottleneck from the document processors to the indexer. Also, pay close attention to whether the source system you are indexing can handle the crawl load.

    More Info

    For more information about performance and capacity tuning, go to http://technet.microsoft.com/en-us/library/gg604781.aspx. Also, see the section Performance Monitoring in Chapter 5 for information about how to monitor your FS4SP deployment to find potential performance bottlenecks.

  • Network. Determine how much bandwidth is available between the content sources and SharePoint, between SharePoint and FS4SP, and between the different FS4SP servers. Planning and monitoring this is key to having an optimized search system. If you have reached your maximum bandwidth, latency will start increasing, and you might consider using dedicated network cards and switches between key components.

  • Dedicated Search Rows. When building the search indexes, the FS4SP indexing servers use a lot of hardware resources. If you also have the search component on the same server, this is affected by indexing, thus lowering the queries per second the search row can deliver. By having one or more dedicated search rows, you have control over the queries per second your FS4SP farm can deliver without being affected by the indexing.

Conclusion

When you set out to deploy FS4SP, you should know your content and have a good idea of how you want to search against it. What type of items you have, what content sources they come from, and how many of them you will index are key factors that will determine hardware and structural considerations. The more questions you can answer up front, the easier it is to make the right decisions when you plan your initial deployment.

Factors like content volume, query throughput, content freshness, and redundancy are important to consider when you start sketching out how your SharePoint and FS4SP farms should be deployed and how much and what type of hardware to deploy. Discussing the farm topologies with your IT administrator is another important part of your deployment in order to get all domain policies and service accounts set up correctly.

Also keep in mind which scenarios you may have to prepare for in the future. Be aware that adding new search columns to your deployment requires you to plan ahead to reduce potential downtime. If you set up a large farm, you should consider how to script the installation in order to more quickly move changes between environments, or consider changing the FS4SP topology itself.

The goal of this chapter has been to help you surface the proper questions, understand the hardware and software requirements of FS4SP, and be ready to deploy the solution in the most efficient and problem-free way.
