Chapter 5. Operations

After completing this chapter, you will be able to

  • Manage the day-to-day operations of FS4SP.

  • Modify the FS4SP farm topology.

  • Configure logging and do basic monitoring and system health analysis.

  • Back up your FS4SP solution, and recover it when disaster strikes.

You can manage many features of Microsoft FAST Search Server 2010 for SharePoint (FS4SP) in Microsoft SharePoint directly. But many more FS4SP toggles and settings are neither immediately apparent nor available from the SharePoint GUI. In this chapter, you explore the importance of Windows PowerShell and delve into how you can manage your search engine on a daily basis. Additionally, you go through the tools available for monitoring and analyzing system health and learn what you should do to back up your solution properly.

Introduction to FS4SP Operations

Operations are an essential element of any IT platform. The concept of “set it and forget it” is entirely foreign to Enterprise Search. Search is a constant three-party dance between content sources, the search engine, and the users. Keeping a constant eye on your search engine’s health and performance and all its components makes the difference between success and failure.

Compared to previous FAST technology, monitoring is now three-tiered. You have Windows-based event logging for simple identification of critical errors, plus monitoring via performance counters and System Center Operations Manager (SCOM). You can perform more in-depth investigations using the detailed FAST logs. Finally, you also have logs accessible in SharePoint.

On a high level, FS4SP administration has four main methods. Table 5-1 briefly describes these four methods, which are covered in more detail in the following sections of this chapter.

Table 5-1. Four main administration methods

  Administration method    Example of typical operation
  ---------------------    ----------------------------
  SharePoint               Examine crawl logs and search usage via Central Administration.
  Windows PowerShell       Change FS4SP farm topology, apply patches, and modify the index schema.
  Configuration files      Modify low-level settings, and edit FAST Search specific connector configurations.
  Command-line tools       Start or stop internal FS4SP processes, and retrieve statistics of search servers.

Administration in SharePoint

Via Central Administration, you have access to information regarding crawling and searching, based on the activity of the FAST Content Search Service Application (SSA) and the FAST Query SSA. Timer jobs run at regular intervals to check the health of the crawler database and trigger alerts in SharePoint Central Administration when something is wrong.

Administration in Windows PowerShell

You can execute most of the FS4SP Windows PowerShell commands in the Microsoft FAST Search Server 2010 for SharePoint shell (FS4SP shell), but you must run a few commands from the SharePoint 2010 Management Shell.

In general, if a cmdlet contains the word FAST as part of its name, you execute it in an FS4SP shell on an FS4SP server; otherwise, you should execute it in a SharePoint Management Shell on a SharePoint server.

Note

Some of the command-line utilities and Windows PowerShell scripts in FS4SP require elevated privileges to run. Opening your shells by using Run as Administrator provides the appropriate rights. Alternatively, modifying the shortcut so that it always runs as a user that has administrative privileges eases management because you will always have the required permissions to execute any FS4SP-related command or script.

Also make sure the logged-in user executing the Windows PowerShell cmdlets is a member of the FASTSearchAdministrators local group on the server as specified during installation.

More than 80 different FS4SP-related Windows PowerShell cmdlets are available. These commands are roughly divided into five categories:

  • Administration cmdlets

  • Index schema cmdlets

  • Installation cmdlets

  • Spell-tuning cmdlets

  • Security cmdlets

More Info

For more information about Windows PowerShell cmdlets, go to http://technet.microsoft.com/en-us/library/ff393782.aspx.

Other Means of Administration

You can perform administrative tasks for FS4SP in a couple more ways, as the following topics explain.

Using Configuration Files

If you are moving from an older FAST incarnation to FS4SP, you might notice that the number of configuration files has dropped drastically. Many low-level settings are still available only by editing files, whether plain text or XML, but there are only 16 configuration files, 6 of which are related to the FAST Search specific connectors. You may edit any of these 16 configuration files without leaving your system in an unsupported state. The 6 connector-related configuration files are all XML files.

More Info

For more information about FS4SP configuration files, go to http://msdn.microsoft.com/en-us/library/ff354943.aspx.

There are several more important files, and you’ll learn more about a few of them in this chapter.

Using Command-Line Tools

You run and control some FS4SP operations using command-line tools (.exe files). These tools live in the <FASTSearchFolder>\bin folder and are available from the FS4SP shell. All in all, there are 25 of these tools. Throughout this book, you’ll become familiar with the most critical of these tools. You should understand the difference between these command-line tools and the Windows PowerShell cmdlets mentioned in the preceding section, Administration in Windows PowerShell. An example of a command-line tool operation is nctrl restart (nctrl.exe), which restarts the local FS4SP installation. An example of a Windows PowerShell cmdlet is Get-FASTSearchMetadataManagedProperty, which displays information about managed properties.

More Info

For a complete list of FS4SP command-line tools, go to http://msdn.microsoft.com/en-us/library/ee943520.aspx.

Basic Operations

When working with FS4SP, as with any larger system, you need to be able to control it in a safe manner. Even though the internal processes have safeguards in place, it is never a good idea to abruptly end one of the internal processes. If you do, the best case scenario is that you will temporarily lose some functionality. In the worst case, you can corrupt your index and cause the system to become nonfunctional. This section describes how to safely start, stop, suspend, and resume FS4SP or any of its internal components.

The Node Controller

Chapter 3 lists and describes all the FS4SP internal processes in Table 3-1. Depending on how many servers you’re using in your solution, these internal processes might be spread out or duplicated across the servers. The tool you use to safely manage these processes is called the Node Controller, which is an internal process just like any other in FS4SP except that the Node Controller is always running on all servers in your FS4SP farm.

You use the Node Controller to start, stop, suspend, resume, and dynamically scale up certain internal processes. Its command-line interface is called nctrl.exe. You find it in the <FASTSearchFolder>\bin folder. This tool is one of the command-line tools mentioned in the previous section, Using Command-Line Tools.

From an FS4SP shell, you can invoke the Node Controller with the command nctrl status. This returns a list of all internal FS4SP processes and their status on the local machine, as shown in Figure 5-1. Running this command is a good first step when you want to check the status of FS4SP. However, when troubleshooting FS4SP, a process marked as Running does not necessarily mean it is running correctly and without any problems.

Figure 5-1. Output from issuing the command nctrl status in a single-server FS4SP installation.
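When a server runs many processes, it can be handy to filter the nctrl output so that only problem processes stand out. The following is a minimal sketch for the FS4SP shell; it assumes the status column of each process line contains the word Running, so adjust the pattern if your nctrl output differs.

    # List only lines of nctrl output that do not contain "Running".
    # Header lines will also match; the processes to investigate are
    # the ones whose status reads something other than Running.
    nctrl status | Select-String -Pattern 'Running' -NotMatch

An empty result (apart from the header lines) is a quick indication that all locally defined processes report Running, although, as noted, Running does not guarantee the process is problem free.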

Starting and Stopping a Single-Server Installation

To shut down a local single-server installation in a controlled and safe manner, run the command nctrl stop from an FS4SP shell. Subsequently, you can bring the server back up again using the command nctrl start. If you want to restart FS4SP immediately after a shutdown, you can save yourself some typing by using the command nctrl restart.

Important

Microsoft recommends that you suspend all crawls before shutting down FS4SP. This is also recommended when using the FAST Search specific connectors.

When running nctrl stop, the order in which FS4SP shuts down its internal processes is predefined according to the Node Controller’s configuration file. See the section A Note on Node Controller Internals later in this chapter for more information about this.

Important

If you need to reboot the whole server, Microsoft recommends that you shut down FS4SP manually before doing so, because FS4SP does not adequately shut down its internal processes when Windows is shutting down. Read more about this in the section Relationship Between the Node Controller and the FS4SP Windows Services later in this chapter.

Starting and Stopping a Multi-Server Installation

If your FS4SP installation is spread out over several servers, starting and stopping the solution works in the same manner for each server as it does in a single-server installation. To shut down the entire installation, you have to manually (or by using remote scripting) issue the nctrl command on each server. Follow the procedure described in the previous section, Starting and Stopping a Single-Server Installation.

Depending on your farm configuration, the order in which you shut down the servers can be important. On a standard multi-server installation, you should follow this procedure:

  1. Suspend all crawls and content feeding.

  2. Stop all FAST Search specific connectors.

  3. Shut down all servers configured to run document processors.

  4. Shut down all servers configured with the indexer component.

  5. Shut down all servers configured as QR Servers.

  6. Shut down all remaining servers; the FS4SP administration server should be the last one to shut down.

Even though a healthy FS4SP installation should have no problem being shut down in a different order than just listed, this strategy minimizes the risk of ending up with an index in an unknown or invalid state. It also minimizes the number of warnings that the FS4SP internal processes inevitably emit when other processes become unavailable.

During startup, you start the servers in the reverse order—proceeding from item 6 to item 2 in the previous list—and then resume all relevant crawls.
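If you prefer remote scripting over logging on to each server, the shutdown order in the previous list can be sketched with PowerShell remoting. The server names and the FS4SP installation path in this sketch are assumptions; it also presumes that PowerShell remoting is enabled and that the executing account is a member of the FASTSearchAdministrators group on each server.

    # Hypothetical server names, grouped in the recommended shutdown order
    # (crawls suspended and connectors stopped beforehand, per steps 1-2).
    $shutdownOrder = @(
        @('docproc1', 'docproc2'),   # 3. document processor servers
        @('indexer1'),               # 4. indexer servers
        @('qrserver1'),              # 5. QR Servers
        @('fs4spadmin')              # 6. the admin server goes last
    )

    foreach ($group in $shutdownOrder) {
        Invoke-Command -ComputerName $group -ScriptBlock {
            # Path is an assumption; substitute your <FASTSearchFolder>\bin.
            & 'D:\FASTSearch\bin\nctrl.exe' stop
        }
    }

Reversing the array order (admin server first) gives you the corresponding startup script.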

More Info

Microsoft has published a farm version of nctrl that you can use to start or stop any FS4SP server remotely from the administration server, making it considerably easier to manage large farms. Go to http://gallery.technet.microsoft.com/scriptcenter/cc2f9dc4-2af8-4176-98d2-f7341d0d5b39 for more information.

Starting and Stopping Internal Processes

To stop an individual FS4SP process, run nctrl stop [process name], for example, nctrl stop indexer. This puts the specified process into a “User Suspended” status, meaning that you have to manually bring it back up by running nctrl start [process name]. If you do not, the process remains in the “User Suspended” state even after an nctrl restart or a server reboot.

FS4SP also provides the psctrl.exe tool, which can control all available document processors in the solution—regardless of which server they run on. Among other things, you can use it to restart all available document processors without having to log on to every server in the farm. You do this by issuing a psctrl stop command from any FS4SP server in the farm. Confusingly, there is no start command, because FS4SP automatically restarts the document processors shortly after they shut down.

More Info

For more information about the psctrl.exe tool, go to http://technet.microsoft.com/en-us/library/ee943506.aspx.

Relationship Between the Node Controller and the FS4SP Windows Services

As the boxed area in Figure 5-2 shows, six Windows services are related to FS4SP. There is an overlap between the Node Controller and these services. In fact, if you compare the services in Figure 5-2 to the processes shown in Figure 5-1, you will notice that five of the six services are also defined in the Node Controller.

Figure 5-2. The FS4SP services. All services except FAST Search for SharePoint Monitoring are defined in the Node Controller. Note that this screenshot shows a single-server installation, and the content may differ for a multi-server installation.

This overlap of services means that instead of shutting down the FAST Search for SharePoint Browser Engine service through the GUI, you can run nctrl stop browserengine. This applies to all FS4SP services except the FAST Search for SharePoint Monitoring service, which exists only as a Windows service. Because it is a monitoring tool, you should never shut down the FAST Search for SharePoint Monitoring service unless the documentation states otherwise, even when the rest of FS4SP is shut down by using the nctrl stop command.

Restarting the most important Windows service, FAST Search for SharePoint, is the equivalent of running nctrl restart. Doing so also shuts down the other Node Controller–managed services—all but the monitoring service. Choose whichever method you prefer; the two are equivalent. If the monitoring service is stopped when you issue an nctrl start or when you start the FAST Search for SharePoint service, the monitoring service starts automatically.

Notice that the Windows services (and consequently the internal processes) running on a particular server in a multi-server installation vary, depending on your deployment. A larger FS4SP multi-server deployment might have dedicated servers for the indexer, document processors, and so on. See Chapter 3 and Chapter 4, for more information.

Adding and Removing Document Processors

In the list of running processes shown in Figure 5-1, you can see four Document Processor processes. These are the processes that run items through the indexing pipeline, executing every processor for every item that flows through the pipeline. Thus, the number of document processors is one of the factors that limits how fast you can crawl and index items. The Production Environments section in Chapter 4 provides in-depth information about how you can distribute the FS4SP components over several servers to achieve optimal performance and avoid bottlenecks.

A default FS4SP single-server installation comes with four document processors, meaning that the solution can process a maximum of four batches of items at one time.

Note

Items are not sent one by one through the indexing pipeline, but rather in batches of several items at one time.

It is generally recommended that you enable no more than one document processor per available CPU core. However, depending on your hardware, you might want to increase or decrease this number. If you do, remember to leave some CPU power for the operating system itself and for the rest of the FS4SP processes running on the same server.

Note

A dual core CPU with hyper-threading enabled shows up as four CPU cores, so you can enable four document processors.

To dynamically add a new document processor on a server configured with document processors, run the following command.

nctrl add procserver

After you run the command, the nctrl status command should list one additional document processor, for example, “procserver_5.” As soon as the process is marked as Running, it is live and ready to process incoming documents. Note that it is possible to add document processors to any server in a multi-server installation, but remember that the nctrl add procserver command affects only the server on which it is run.
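As a sketch of the one-document-processor-per-core guideline, you can read the logical core count from WMI and top up the processor count accordingly. Parsing the existing count out of nctrl status output is an assumption about its format, so treat this as a starting point rather than a finished script.

    # Number of logical cores on this server (includes hyper-threading).
    $cores = (Get-WmiObject Win32_ComputerSystem).NumberOfLogicalProcessors

    # Count document processors already defined (lines such as "procserver_1").
    $existing = @(nctrl status | Select-String -Pattern 'procserver_').Count

    # Leave one core for the OS and the other FS4SP processes on this box.
    $target = [Math]::Max(1, $cores - 1)

    for ($i = $existing; $i -lt $target; $i++) {
        nctrl add procserver
    }

Reserving a core for the rest of the system is a judgment call; on a dedicated document processing server you might use all cores.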

Warning

If you have developed your own Pipeline Extensibility processors, ensure that all servers on which you intend to run document processors are equipped with a local copy and that the input parameters are correct. Custom code is not distributed automatically. Input parameters that do not exist will make the indexing pipeline skip the item entirely.

You can remove a document processor on a local server with the following command.

nctrl remove procserver

This command removes the document processor with the highest available number as listed by nctrl status. If the document processor is currently running, it will shut down before it is removed.

It is possible to perform these operations during content feeding. However, if you are shutting down a document processor that is currently processing content, you might see warnings in your content source or connector logs. Each currently processed item triggers a callback indicating that it was dropped in the indexing pipeline before it reached the indexer. Both the FAST Content SSA crawlers and the FAST Search specific connectors typically re-feed such items to another document processor as soon as possible after such callbacks occur.

You can also increase the number of document processors by changing the deployment configuration as described in the section Server Topology Management later in this chapter, but doing so requires downtime on the servers. In contrast, adding them with nctrl is a run-time operation.

A Note on Node Controller Internals

During a full startup or shutdown, the processes are started and stopped in the sequence defined at the top of the Node Controller’s configuration file (<FASTSearchFolder>\etc\NodeConf.xml). Here you can also see exactly how each internal process is invoked, which binary is called, and what parameters are passed to it.

Even though you can modify which processes run on a particular server by reconfiguring the Node Controller configuration, the supported method to perform deployment reconfiguration is through the deployment configuration templates and the Set-FASTSearchConfiguration cmdlet.

More Info

For more information about reconfiguring farm deployment, go to http://technet.microsoft.com/en-us/library/ff381247.aspx.

Warning

Do not modify the Node Controller configuration file unless you know exactly what you are doing. NodeConf.xml is an internal configuration file, so in addition to leaving your system in an unsupported state, you might also end up breaking your installation.

If you do change the Node Controller configuration file, you have to issue the command nctrl reloadcfg to force the Node Controller to pick up the changes.

Indexer Administration

FS4SP provides an indexer administration tool called indexeradmin.exe. It provides only a few—but invaluable—commands for interacting with the indexer. Additionally, the tool is equipped with a set of parameters that allows you to control all indexers in a multi-server farm, as opposed to just the local indexer.

More Info

For more information about indexeradmin.exe, go to http://technet.microsoft.com/en-us/library/ee943517.aspx.

Suspending and Resuming Indexing Operations

A frequent use of indexeradmin is to suspend all indexer processes while feeding FS4SP large volumes of data at once, for example, during a full crawl. While the indexers are suspended, FS4SP continues to receive items, run them through the document processors, and generate the intermediate FAST Index Markup Language (FIXML) files used to build the search index, but the indexer(s) will not convert these files into a searchable index. Hence, indexing is suspended. This suspension reduces the number of simultaneous read and write I/O operations on the disk subsystem, improving the speed at which FS4SP can receive items.

After the large feeding session completes, you can resume indexing operations, at which point the indexer will start processing the FIXML files, making content searchable to the end users.

You suspend indexing operations by issuing the following command.

indexeradmin -a suspendindexing

The parameter -a tells indexeradmin to suspend indexing on all indexers in the farm. It is, however, possible to manually go to each server and run the same command without -a.

You resume indexing operations on all servers by issuing the following command.

indexeradmin -a resumeindexing
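In practice, the suspend/resume pair is wrapped around the large feed itself. The following sketch shows the shape of such a wrapper; the crawl step is only a placeholder for whatever feeding mechanism you use (for example, starting a full crawl from the FAST Content SSA), and the script does not start a crawl by itself.

    # Suspend indexing on all indexers before the heavy feed begins.
    indexeradmin -a suspendindexing

    # ... start and complete the large feed here (placeholder step) ...

    # When the feed has finished, let the indexers process the
    # accumulated FIXML files and make the content searchable.
    indexeradmin -a resumeindexing

Remember that content fed while indexing is suspended is not searchable until after the resume, so schedule the resume as soon as the feed completes.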

You can use the tool indexerinfo.exe to retrieve information about the FS4SP indexer. To check the status of the indexer(s), issue the following command and examine the XML output, as shown in Figure 5-3.

indexerinfo -a status

More Info

For more information about indexerinfo.exe, go to http://technet.microsoft.com/en-us/library/ee943511.aspx.

Figure 5-3. Status from a two-server deployment, where the server node2.comp.test has a suspended indexer while the server test.comp.test is “running ok”.

Rebuilding a Corrupt Index

You can also use the indexeradmin.exe tool to rebuild the binary index from the intermediate FIXML files. If you experience unusual behavior after an index schema configuration change, or notice warnings or even errors in the indexer logs, rebuilding the binary index can often alleviate the problems.

Because you would typically rebuild your index from FIXML only when your solution is malfunctioning, it is considered good practice to stop all current crawls during the process to minimize strain on the solution. Start the process by invoking the following command.

indexeradmin -a resetindex

This operation is asynchronous, and although you’ll see command output that says SUCCESS in capital letters, that message indicates only that the indexers have successfully started the procedure. You can monitor the operation’s progress by using the command indexerinfo -a status. Issue the command repeatedly and observe the status attribute of each <partition> tag. Each partition changes status from “waiting” to “indexing” until it finally reaches “idle.”
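You can automate that repeated check with a small polling loop. This is a sketch only: the <partition> element path and its status attribute are assumptions based on the output described above, so verify them against your own indexerinfo output first.

    do {
        Start-Sleep -Seconds 30
        # Drop any DOCTYPE line so the tool output casts cleanly to XML.
        $raw = (indexerinfo -a status) | Where-Object { $_ -notmatch '<!DOCTYPE' }
        [Xml]$info = $raw -join "`n"
        # Collect the status attribute of every <partition> element.
        $states = $info.SelectNodes('//partition') | ForEach-Object { $_.status }
        Write-Host ("Partition states: " + ($states -join ', '))
    } while ($states -contains 'waiting' -or $states -contains 'indexing')

The loop exits once no partition reports “waiting” or “indexing,” that is, when all partitions have reached “idle.”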

Note

The indexeradmin resetindex command is different from the SharePoint index reset procedure. An indexeradmin resetindex command rebuilds the binary index on the FS4SP index servers, whereas the FAST Content SSA’s reset index procedure erases the crawler information in SharePoint, requiring a full crawl of all content sources, and possibly rendering the crawler store and the search index out of sync. For more information about resetting the content index, go to http://technet.microsoft.com/en-us/library/ff191228.aspx.

Search Administration

The indexerinfo tool previously mentioned has an equivalent for the query matching and query dispatching components in FS4SP. The tool is called searchinfo.exe. (For more information about this tool, go to http://technet.microsoft.com/en-us/library/ee943522.aspx.)

Running the following command returns an XML report showing the current state of the search and query subsystem of FS4SP.

searchinfo -a status

As was the case with indexeradmin and indexerinfo, omitting the parameter -a returns only the status from the server on which the command is executed.

The output from searchinfo can be extremely large and quite overwhelming. A simple method for making it easier to digest is to use the built-in XML parser in Windows PowerShell. Running the following script in an FS4SP shell rewrites the output from searchinfo to a table, as shown in Figure 5-4.

[Xml]$xml = (searchinfo status) -Replace '<!DOCTYPE search-stats SYSTEM "search-stats-1.0.dtd">'
$xml."search-stats".fdispatch.datasets.dataset

Of particular interest in this report is the uptime of the solution, the total number of searches, and the various search times. You can also inspect the status of each partition by printing the engine property.

[Xml]$xml = (searchinfo status) -Replace '<!DOCTYPE search-stats SYSTEM "search-stats-1.0.dtd">'
$xml."search-stats".fdispatch.datasets.dataset.engine

The output printed from the previous command gives you a broad view of the status of each index partition across all servers configured as QR Servers. When the status of a partition is listed as “up,” the partition is functioning properly, and items stored in the partition are returned for matching queries. When a partition is listed as “down,” the particular slice of the index held in the corresponding partition is not searchable.
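Using the same XML parsing trick, you can filter the report down to partitions that need attention. The name and status attribute names in this sketch are assumptions based on the description above; check them against your own searchinfo output.

    [Xml]$xml = (searchinfo -a status) -Replace '<!DOCTYPE search-stats SYSTEM "search-stats-1.0.dtd">'
    # List only partitions whose status is not "up".
    $xml.'search-stats'.fdispatch.datasets.dataset.engine |
        Where-Object { $_.status -ne 'up' } |
        Select-Object name, status

An empty result means every index partition is reporting “up” and serving queries.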

Figure 5-4. Search statistics showing five executed searches; the average time per search was 0.021 seconds.

More Info

For detailed documentation about the XML report syntax of searchinfo, see the TechNet article at http://technet.microsoft.com/en-us/library/gg471169.aspx.

Search Click-Through Analysis

If you enabled search click-through analysis during the installation process, you activated the SPRel component of FS4SP. SPRel is a search click-through analysis engine that analyzes which entries users click in search results. SharePoint collects the click-through logs, which are harvested by FS4SP using a timer job. The SPRel component analyzes the logs, flagging the most frequently clicked items as best matches for the terms that were used to retrieve the search result. SPRel uses this information to improve the relevance of future queries, giving the flagged best matches an extra boost in the search results.

Checking the Status of SPRel

You check the status of SPRel with the spreladmin.exe command-line tool.

spreladmin showstatus

Running this command tells you how often SPRel is scheduled to run, lists the overall status, and prints out run-time statistics from previous log analysis sessions.

More Info

For more information about spreladmin.exe, go to http://technet.microsoft.com/en-us/library/ee943519.aspx.

Reconfiguring SPRel

View the current configuration of SPRel by using the following command.

spreladmin showconfig

Several aspects of SPRel are configurable, such as the number of CPU cores that will be used during processing and whether old click-through logs are kept or deleted. For example, to change how many days of logs SPRel should consume in each log analysis session, you can reconfigure the use_clicklogs parameter.

spreladmin setconfig -k use_clicklogs -u 90

Note that you can change all other parameters listed from spreladmin showconfig in the same manner.

Note

Bear in mind that changing how many days of logs SPRel includes in the analysis, as in the previous example, directly affects the ranking of search results. The default setting is 30 days. If click-through analysis seems to have little to no effect on search results, as is typical in a solution with few users, it makes sense to increase the time span so that more clicks can accumulate.

Be careful when lowering this value below the default, because small amounts of user traffic would then be able to influence the ranking of your search results.

Link Analysis

The Web Analyzer components examine crawled content and scan items for incoming and outgoing links to other items. A link graph is calculated to find highly connected items, which receive a contribution to their rank component. The technique is similar to Google’s well-known PageRank algorithm. FS4SP uses this technique not only when crawling external websites, but also when crawling content from SharePoint.

The Web Analyzer, which can be scaled out to several servers if necessary, consists of four logical components:

  • webanalyzer. A single-server process that handles configuration and scheduling and acts as the master in distributed Web Analyzer configurations.

  • fdmworker. A worker process that receives tasks from the Web Analyzer master during link processing. Distributed across servers in large-scale solutions.

  • walinkstorerreceiver. A server that receives and stores the links extracted from items during document processing. Distributed across servers in large-scale solutions.

  • walookupdb. A key-value lookup server that retrieves link processing information. When items are processed, the indexing pipeline talks to this component to find out how well connected the current item is in the link graph. Distributed across servers in large-scale solutions.

Checking the Status of the Web Analyzer

The waadmin.exe tool is the common front end for administrating the Web Analyzer. Check the current status by using the following command.

waadmin showstatus

The output tells you whether the Web Analyzer is currently processing links, which content collections are being monitored, whether any errors were reported during processing, and other information.

More Info

For more information about waadmin.exe, go to http://technet.microsoft.com/en-us/library/ee943529.aspx.

Forcing the Web Analyzer to Run

You can start the Web Analyzer outside its regular schedule with the following command.

waadmin forceprocessing

Listing the Relevance Data for a Specific Item

The following example shows relevance data for an item with an identifying <URL>, assuming the item has been indexed and processed by the Web Analyzer.

waadmin -q <URL> ShowURIRelevance

Tip

The <URL> is, confusingly enough, not a standard URL; instead, it’s the primary key that FS4SP uses internally in the index. If a particular item was indexed using any of the FAST Content SSA connectors, the <URL> starts with ssic, for example, ssic://13451. If a particular item was indexed using any of the FAST Search specific connectors, or by using the docpush command-line utility, the <URL> is typically a proper URL, such as http://support.microsoft.com/kb/2592062.

Find the <URL>—that is, the primary key—for a particular item by using either of the following two methods:

  • If the item was indexed through the FAST Content SSA, use Central Administration. Follow these steps:

    1. In Central Administration, click Manage Service Applications, and then click your FAST Content SSA.

    2. Click Crawl Log in the left pane.

    3. Click the Successes number on the content source for your item.

    4. Under Search By, choose URL Or Host Name, and paste in your document link, for example, http://test/Docs/workerz/Shared%20Documents/Platform-Test-Framework-1.0.doc.

    5. Click Search.

    6. Make a note of the internal ID for this document, as shown in Figure 5-5. Prefix this value with ssic:// to build the item ID expected by the Web Analyzer utilities.

      Figure 5-5. Inspect a content source for a crawled item and display the item ID. Note that this value will have to be prefixed with ssic:// to correctly reflect the internal primary key of FS4SP.

  • If the item was indexed using the FAST Search specific connectors, use the Search Result Page. Follow these steps:

    1. Navigate to your Search Center result page.

    2. Edit the page, and then edit the Core Results Web Part.

    3. Under Display Properties, clear the Use Location Visualization check box, and then add a new column definition to the Fetched Properties list.

      <Column Name="contentid"/>
    4. Either edit the XSLT via the XSL Editor... button to include listing the new column, or use the following XSLT to render all data that reaches the Web Part as pure XML without any design elements.

      <?xml version="1.0" encoding="UTF-8"?>
      <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <xsl:template match="/">
          <xmp><xsl:copy-of select="*"/></xmp>
        </xsl:template>
      </xsl:stylesheet>
    5. Save your Web Part, and then reload the Search Center result page.

    6. Search for something that will return the item for which you want to find the ID.

    7. Make a note of the value listed inside the <contentid> tag, as shown in Figure 5-6.

      Figure 5-6. A search result listed as raw XML, displaying the value that FS4SP uses internally as the primary key of the index.

When you have figured out the exact item ID that the Web Analyzer utility expects, call the waadmin tool (shown in the following example command) to show a particular item’s link relevance. Example results of running the command are shown in Figure 5-7.

waadmin -q ssic://294 ShowURIRelevance

Figure 5-7. Retrieving link relevance for the item identified with ssic://294.

Server Topology Management

FS4SP was designed from the beginning to scale well. You might, for example, want to increase index volume, get better performance, or add servers for fail-over redundancy. However, because FS4SP is integrated into SharePoint, the different scaling scenarios involve modifying either the FS4SP farm or the SharePoint farm. The following two sections describe which scenarios relate to which farm and outline the steps you need to take.

Warning

Modifying the server topology is considered an advanced operation. You should be very careful when doing so, especially in production environments.

Modifying the Topology on the FS4SP Farm

The main scenarios for modifying the topology of your FS4SP farm—that is, adding more machines to your server configuration—are:

  • Increasing the index capacity or indexing throughput.

  • Increasing the query throughput.

  • Adding redundancy to a component.

Important

If you add one or more columns to your existing index topology, you need to do a full recrawl of all your content because the index will be deleted. Other topology changes cause downtime, but the index will be left untouched and incremental crawls can be continued.

There is currently no automatic redistribution of the existing content from the existing columns over to the added one(s), and therefore you must either perform a full crawl of all the content or use the fixmlfeeder tool. Because of this limitation, it is recommended that you plan ahead to determine how much content you will need to index in the next one to three years. The more content you have, the longer it will take to run full recrawls.

TechNet provides a step-by-step guide on how to increase the capacity by adding new index columns. Go to http://technet.microsoft.com/en-us/library/gg482015.aspx.

Modify the index topology

  1. If adding new index columns, stop all crawls and reset the content index.

    More Info

    For detailed steps about adding index columns, go to http://technet.microsoft.com/en-us/library/gg482015.aspx.

  2. Edit <FASTSearchFolder>\etc\config_data\deployment\deployment.xml on the administration server and make the necessary changes.

  3. Stop the FAST Search for SharePoint service and the FAST Search for SharePoint Monitoring service on the administration server.

  4. Update the deployment configuration by executing the following command in an FS4SP shell.

    Set-FASTSearchConfiguration
  5. Start the FAST Search for SharePoint service. This also starts the FAST Search for SharePoint Monitoring service.

  6. Repeat steps 2–5 on the other servers in your FS4SP farm.

  7. If you added index columns or Query Result (QR) servers, update the FAST Content SSA and FAST Query SSA configuration via Central Administration with the updated values from <FASTSearchFolder>\Install_Info.txt.

    More Info

    For detailed steps about updating FAST Content SSA or FAST Query SSA servers, go to http://technet.microsoft.com/en-us/library/ff381247.aspx#BKMK_MakeTheNecessaryChangesOnMOSS.

  8. If you added index columns, start a full crawl on all your content sources.

Adding One or More Index Columns Without Recrawling

As stated previously, a full crawl of all content sources is required when adding index columns to an FS4SP farm deployment. This is because content has to be redistributed to span across all columns, including the new one. Recrawling all content is the easy but time-consuming way to achieve this. Another method, arguably harder but also much faster, involves using the FIXMLfeeder.exe tool.

This tool was created for the exact purpose of redistributing content across columns, and is used to refeed the intermediate FIXML files that were generated during the original indexing. Instead of performing a full crawl where all items are processed, the already-processed FIXML files are fed directly to the indexing dispatcher, bypassing the content distributor and indexing pipeline. This direct feeding ensures that content is again distributed correctly across all servers, including the new one.

More Info

The steps for using the FIXMLfeeder.exe tool to redistribute the index when adding or removing columns are described in detail in the white paper “FAST Search Server 2010 for SharePoint Add or Remove an Index Column,” which can be downloaded from http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=28548.

Modifying the Topology on the SharePoint Farm

Scaling the SharePoint end of your search topology relates mainly to the crawling of content via the FAST Content SSA. It is possible to scale the FAST Query SSA as well, but because all search logic is handled by the FS4SP farm, there is little need for this except to provide failover on the SharePoint farm for search queries. As such, the main scenarios for modifying the topology of your SharePoint farm are:

  • Increasing the crawl rate.

  • Scaling the crawler database.

The methods used for modifying the topology are as follows:

  • Add crawler impact rules. You can create crawler impact rules to specify how many items are retrieved in parallel and how long to wait between each request. Before increasing the number of items to retrieve in parallel, make sure the system you are crawling can handle the load and you have enough processing power on the FS4SP farm to handle the increased crawling rate.

    More Info

    For more information about managing crawler impact rules, go to http://technet.microsoft.com/en-us/library/ff381257.aspx.

  • Add crawl components. If your SharePoint application server is saturated on CPU usage or network bandwidth because of crawling of content sources, you can distribute the crawling by adding crawl components on several SharePoint farm servers. This distributes the crawl load between the servers.

    More Info

    For more information about adding or removing crawl components, go to http://technet.microsoft.com/en-us/library/ff599534.aspx.

  • Add crawl databases. When crawling more than 50 million items, you should consider adding additional crawl databases. Doing so keeps crawling from slowing down as a result of increased time spent updating the SQL Server crawl database.

    More Info

    For more information about adding or removing crawl databases, go to http://technet.microsoft.com/en-us/library/ff599536.aspx.

  • Add host distribution rules. If you have only one crawler database, you do not need to add any host distribution rules. Host distribution rules are used to associate a host with a specific crawl database. By default, hosts are load balanced across crawl databases based on space availability. However, you may want to assign a host to a specific crawl database for availability and performance optimization.

    More Info

    For more information about adding or removing host distribution rules, go to http://technet.microsoft.com/en-us/library/ff599527.aspx.

Refer to Chapter 3 for a longer discussion on crawl components and crawl databases.

Changing the Location of Data and Log Files

FS4SP stores data and log files in the <FASTSearchFolder>\data and <FASTSearchFolder>\var\log folders. These are predefined locations and are not customizable during the installation process. It is possible to change these locations after the installation by editing two internal configuration files, although this is not supported by Microsoft. Service packs or hotfixes might overwrite the configuration files, changing your edits back to the default settings. Creating a junction point from any subfolder below the <FASTSearchFolder> folder to other folders and volumes is, however, supported.

Note

A junction point is different from a shortcut and is a feature of the NTFS file system. Junction points are transparent to the user; the links appear as normal files or folders and can be acted upon by the user or application in exactly the same manner as if the file or folder were physically present in the location referred to.

You can use the command-line utility mklink to create junction points. For information about the mklink utility, go to http://technet.microsoft.com/en-us/library/cc753194(WS.10).aspx.

Change the location of data and log files

In this scenario, assume the operating system is on the C drive, and FS4SP is installed in the C:\FASTSearch folder. Also assume that you want to store the data files on the D drive and the log files on the E drive:

  1. Open an FS4SP shell with administrative privileges.

  2. Stop the local FS4SP solution by running nctrl stop.

  3. Move the FS4SP data files to the new volume.

    move C:\FASTSearch\data d:\data
  4. Move the FS4SP log files to the new volume.

    move C:\FASTSearch\var\log e:\log
  5. Create a junction point from the new volume to the old data location.

    mklink /j C:\FASTSearch\data d:\data
  6. Create a junction point from the new volume to the old log location.

    mklink /j C:\FASTSearch\var\log e:\log
  7. Start the local FS4SP solution by running nctrl start. Verify that all internal processes are running again by using nctrl status.

You need to do this on all your FS4SP servers, and we recommend doing this during initial deployment before you start any indexing to avoid downtime.

Logging

FS4SP can produce log files from almost every aspect of the solution. General-purpose logs are generated during run time and are a good indicator of the current health of the solution. When something goes wrong, you can typically increase the log-level threshold for the module you suspect is to blame and rerun the operation that triggered the error.

Besides generating the general-purpose logs, FS4SP can be configured to emit targeted activity logs regarding document processing and search queries. These are described in the section Functional Logs later in this chapter.

In addition to logs, FS4SP publishes a large amount of monitoring data and performance counters that can be consumed by SCOM, the built-in Windows tool Performance Monitor, and custom scripts as well as several third-party monitoring tools. Read more about this in this chapter’s section on Performance Monitoring.

General-Purpose Logs

All FS4SP internal processes log information to files on disk. This topic is covered at the end of this section, in Internal FS4SP Logs. Depending on the configuration, most information from these file-based logs is also syndicated into the Windows Event Viewer and to the SharePoint Unified Logging Service (ULS) trace logs. However, because the internal FS4SP logs are the source of almost everything else, you should be sure to at least acquire a basic understanding of these.

Windows Event Logs

A good starting point for FS4SP logs is the standard Windows Event Viewer, shown in Figure 5-8. There are two important event log folders: FAST Search and FAST Search Farm. Both of these reside under the Applications and Services Logs node.

Logs from all local FS4SP processes are logged into the FAST Search event log folder by the local FS4SP log server. If several servers are involved, the administration server generates aggregated logs from all servers to the FAST Search Farm event log folder. If your solution involves only one FS4SP server, there is still a FAST Search Farm folder, simulating farm-level logs.

Logs that are collected in the Event Viewer are produced from the various FS4SP internal processes. These internal logs are available on disk on each server but are also collected and aggregated on the farm administration server. For troubleshooting, it might be easier to deduce what is wrong from these internal logs, especially when a higher level of detail is required. Note that the log levels in the Windows Event logs and the internal FS4SP logs differ. See Table 5-2 for a comparison, and the section Internal FS4SP Logs later in this chapter for more details.

Figure 5-8. The Event Viewer showing logs aggregated on the main log server on the FS4SP administration server.

Table 5-2. Comparison of log levels used in FS4SP internal logs and the Windows event log

FS4SP logs   Event log     Role
CRITICAL     Error         A critical problem is effectively rendering an important component of FS4SP unusable. Examples: indexing, searching, or internal communication between processes.
ERROR        Error         A specific task failed. May imply bigger problems. Example: FS4SP failed to index a document.
WARNING      Warning       An issue should be looked into, but it is not necessarily crucial to the well-being of the system.
INFO         Information   Standard run-time information is available.
VERBOSE      Information   Verbose run-time information is available. Useful for debugging. Not enabled by default.
DEBUG        Information   Very detailed run-time information is available. Useful for low-level debugging, but sometimes too detailed. Not enabled by default.
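When scripting your own log analysis, the level mapping in Table 5-2 can be expressed as a simple lookup table. This is just an illustrative sketch using the level names listed above:

```python
# Internal FS4SP log levels mapped to their Windows event log counterparts
FS4SP_TO_EVENT_LOG = {
    "CRITICAL": "Error",
    "ERROR": "Error",
    "WARNING": "Warning",
    "INFO": "Information",
    "VERBOSE": "Information",  # not enabled by default
    "DEBUG": "Information",    # not enabled by default
}

print(FS4SP_TO_EVENT_LOG["CRITICAL"])  # Error
```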

SharePoint Unified Logging Service (ULS)

In addition to logging messages to <FASTSearchFolder>\var\log, FS4SP also uses the SharePoint Unified Logging Service for logging via the FAST Search for SharePoint Monitoring service. The default log location is %ProgramFiles%\Common Files\Microsoft Shared\ULS\14\Logs on your FS4SP servers.

Important

If you install FS4SP on the same machine as SharePoint, the ULS service will not be installed and logs will not be written to the %ProgramFiles%\Common Files\Microsoft Shared\ULS\14\Logs folder.

A good tool for reading and live monitoring of the ULS log files is the free ULS Viewer tool from Microsoft, shown in Figure 5-9.

Figure 5-9. The ULS Viewer tool.

More Info

For more information about the ULS Viewer tool, go to http://archive.msdn.microsoft.com/ULSViewer.

Internal FS4SP Logs

All components of FS4SP have their own folder below <FASTSearchFolder>\var\log on the server on which they are running. When you experience problems with a component in FS4SP, this is a good place to start looking for errors.

Additionally, the administration server in your deployment stores aggregated logs from all the other servers in the farm in <FASTSearchFolder>\var\log\syslog, making it easy to check all the logs using only one server instead of having to check each log on each server.

Functional Logs

Besides the general run-time logs, FS4SP can emit special types of functional logs produced during item processing and when FS4SP receives incoming search queries (query logs).

Item Processing Logging

A couple of tools are at your disposal that make life a lot easier when developing custom Pipeline Extensibility processors or when debugging the indexing pipeline.

Inspect runtime statistics

To see statistics about the indexing pipeline and the items that passed through, issue the command psctrl statistics. As mentioned in the section Basic operations earlier in this chapter, the psctrl tool is used to control all available document processors, regardless of which machine they run on. As such, running psctrl statistics gathers run-time data from all relevant servers in the farm.

The output generated from psctrl statistics should look similar to Figure 5-10. For each stage, you can inspect time and memory consumption and see how many items have passed through the stage correctly. Note that if a stage drops an item, the item will not continue to be processed in subsequent stages. These statistics are especially important if you added your own external item processing components, because you can examine how much time your custom code consumes, and how many items make it through successfully.

Figure 5-10. Statistics per stage in the indexing pipeline.

Even more item processing run-time statistics can be gathered by using the rc.exe tool. Running rc -r esp/subsystems/processing shows the minimum, maximum, and average time used per stage in the indexing pipeline.

Note

The rc.exe tool collects monitoring data from several of the running components of FS4SP, for example, the content distributor, document processors, and indexing dispatcher. For more information about the rc.exe tool, go to http://technet.microsoft.com/en-us/library/ee943503.aspx.

Turn on debug and tracing for the running document processors

If you want deep information about what goes on per item in the indexing pipeline, turn on tracing and debug logging using the following commands.

psctrl doctrace on
psctrl debug on

At the next indexing session, FS4SP will gather trace logs showing items entering and leaving each and every stage in the indexing pipeline, and display all properties that were added, modified, or deleted in any given stage during processing. To inspect these logs, use the tool doclog.exe.

More Info

For detailed information about the doclog tool, go to http://technet.microsoft.com/en-us/library/ee943514.aspx.

To display the trace log for every item that passed through the indexing pipeline since tracing was enabled, run the following command.

doclog –a

However, the output from the preceding command can be a little overwhelming in a large feeding session; the variants doclog -e and doclog -w show only items that either were dropped in the indexing pipeline or triggered warnings. These two variants are especially useful when debugging a faulty indexing pipeline.

Note that it is possible to run the doclog commands just mentioned without turning on doctrace and debug with psctrl, but the information returned lists only logs printed using the INFO log level.

Important

After you stop investigating these detailed item processing logs, be sure to disable them with the following commands. The log generation is very expensive and will, if left enabled, slow down your indexing pipeline considerably and fill up large amounts of disk space.

psctrl doctrace off
psctrl debug off

Inspect crawled properties in the indexing pipeline by using FFDDumper

FFDDumper is a convenient tool for inspecting which crawled properties are actually sent in to the indexing pipeline. This inspection can be particularly useful when mapping crawled properties to managed properties or when you are debugging your own custom processors.

Enable the tool by editing the configuration file for Optional Processing processors residing in <FASTSearchFolder>\etc\config_data\DocumentProcessor\optionalprocessing.xml. Change no to yes in the following line.

<processor name="FFDDumper" active="no"/>

After editing this file, you must also issue a psctrl reset in an FS4SP shell to make the indexing pipeline pick up the changes.

As you probably guessed by now, the FFDDumper is nothing more than an item processor. After activation, all crawled properties that are sent into the pipeline, and consequently also through FFDDumper, are logged to disk so that they are easily inspected.

The output is put into <FASTSearchFolder>\data\ffd. Start a small test crawl to verify the output. Check the output directory after the test crawl finishes. It should now contain files with the suffix .ffd. Open one of these in a text editor. Each line contains the following data:

  • A number indicating the length of the crawled property name

  • The name of the crawled property

  • The character s, followed by a number indicating the length of the crawled property’s content

  • The content of the crawled property
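A short parser illustrates the line layout just described. The exact separators in a real .ffd file are an assumption here; the length fields are used to slice out the name and the content, so both may themselves contain spaces:

```python
def parse_ffd_line(line):
    """Parse one .ffd line: <name length> <name> s <content length> <content>.
    Assumed layout; slice by the length fields rather than splitting on spaces."""
    name_len, rest = line.split(" ", 1)
    name = rest[:int(name_len)]
    rest = rest[int(name_len):].lstrip()
    if not rest.startswith("s"):
        raise ValueError("expected 's' marker before the content length")
    content_len, rest = rest[1:].lstrip().split(" ", 1)
    return name, rest[:int(content_len)]

print(parse_ffd_line("5 title s 11 Hello world"))  # ('title', 'Hello world')
```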

Be sure to disable the FFDDumper by reverting to active="no" in optionalprocessing.xml.

Note

The output files will be stored only on the local server in the FS4SP farm that processes the particular item.

Crawl Logging

Logs from crawling content sources using the built-in FAST Content SSA are easily inspected in Central Administration. On your FAST Content SSA, click Crawling | Crawl Log. Each available content source is listed along with the number of successfully indexed items, the number of items that generated warnings during indexing, and the number of items that could not be indexed at all because of errors. Click any of these numbers to display a list with time stamps and log information detailing what happened and why.

Content crawled and indexed using the FAST Search specific connectors does not have a content source listed on the FAST Content SSA, so this method of inspecting crawl logs does not apply.

Query Logging

FS4SP logs search queries into the <FASTSearchFolder>\var\log\querylogs folder on all servers hosting a QR Server component. These logs contain detailed internal timing data and are particularly useful for performance analysis. The logs are stored in plain text.

The log files are named query_log.[TIMESTAMP], for example, query_log.20110907190001; a new log file is created for every hour the local qrserver process is running.
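Assuming the timestamp suffix follows the yyyyMMddHHmmss pattern visible in the example name, the rotation time can be recovered like this:

```python
from datetime import datetime

def query_log_started_at(filename):
    # query_log.[TIMESTAMP] -> datetime; pattern assumed from the example name
    stamp = filename.rsplit(".", 1)[1]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")

print(query_log_started_at("query_log.20110907190001"))  # 2011-09-07 19:00:01
```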

Tip

It is a good idea to use a competent and fast text editor when inspecting query logs. Because query logs use a nonstandard suffix, you need to rename the log files to .txt if you want to open them in Notepad. Query logs can grow quite large in high-volume solutions, so it is beneficial to use an editor that performs well even with large files.

Each line in any of the query log files corresponds to a search query and contains information such as originating IP, a UTC time stamp, the internal qrserver request string (into which the user query is converted), an XML representation of the resulting internal query, and some statistics. Quite hidden in the middle of each line, you can find a sequence looking like the one shown in Figure 5-11.

Figure 5-11. An excerpt from a query log file.

The numbers depicted in Figure 5-11 reveal some interesting performance data. The last number tells you how many search results were returned for the particular query. The meanings of the time-related parameters are:

  • Total processing time. The total time as seen by the QR Server component—that is, measured from when the query was received until the results were sent back to the service that initiated the search request. The lower boundary of this time is the document summary time plus the query time. However, the time is usually significantly higher because of additional processing in the QR Server and communication overhead between processes in the FS4SP farm.

  • Query time. The time spent locating matching items in the index.

  • Document summary time. Sometimes referred to as the docsum time, this number corresponds to the time FS4SP spent actually pulling matching items from the index.
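The relationship between the three timings can be sketched numerically. The millisecond values below are hypothetical; a real log line is far denser than this:

```python
def qr_overhead(total, query_time, docsum_time):
    """Total processing time is bounded below by query time plus document
    summary time; the remainder is QR Server processing and communication
    overhead between processes in the FS4SP farm."""
    overhead = total - (query_time + docsum_time)
    if overhead < 0:
        raise ValueError("total can never be below query + docsum time")
    return overhead

print(qr_overhead(120, 45, 30))  # 45 ms spent outside the index lookup itself
```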

In addition to the query logs located in <FASTSearchFolder>\var\log\querylogs, a community tool called FS4SP Query Logger, created by one of the authors of this book, can be used to inspect live queries as they arrive. Detailed information about the queries, including a special type of relevance log explaining how rank was calculated, is included. The tool is shown in Figure 5-12.

More Info

For more information about this tool, go to http://fs4splogger.codeplex.com.

Figure 5-12. Live monitoring of queries using the FS4SP Query Logger.

Performance Monitoring

The Windows service FASTSearchMonitoring is automatically installed on all servers in your FS4SP farm that host indexing or search components. The service repeatedly collects statistics, performance counters, and other health information from the FS4SP components and publishes them into performance counters and Windows Management Instrumentation (WMI).

More Info

For more information about WMI, go to http://msdn.microsoft.com/en-us/library/aa394582(VS.85).aspx.

Monitoring the FS4SP performance counters is one of the easiest methods to see how your solution is behaving. Performance counters can be accessed via the Windows Reliability and Performance Monitor (perfmon.exe). This tool is preinstalled in the operating system and is shown in Figure 5-13.

More Info

For more information about Windows Reliability and Performance Monitor, go to http://go.microsoft.com/FWLink/?Linkid=188660.

Figure 5-13. Viewing FS4SP performance counters by using Windows Reliability and Performance Monitor.

Note

Although using perfmon.exe is quick and easy, the recommended monitoring solution is Microsoft Systems Center Operations Manager (SCOM). For more information about SCOM, go to http://technet.microsoft.com/en-us/library/ff383319.aspx. A preconfigured FS4SP management pack for SCOM is available for download at http://go.microsoft.com/FWLink/?Linkid=182110.

Performance counters for FS4SP come in 13 different categories, covering areas such as indexing, searching, item processing, link analysis, and the Enterprise Web Crawler. The complete list and explanation of all available performance counters can be found on TechNet at http://technet.microsoft.com/en-us/library/ff383289.aspx.

The sheer amount of performance data can be overwhelming at first. In the following sections, a couple of common and important monitoring scenarios are listed along with their key characteristics.

Identifying Whether an FS4SP Farm Is an Indexing Bottleneck

On the SharePoint server hosting the FAST Content SSA, you can examine the performance counter named Batches Ready in the OSS Search FAST Content Plugin category. If this counter reports that zero batches are ready, then FS4SP is processing and indexing content faster than the FAST Content SSA is capable of crawling your content sources. This means that you can further increase your crawling rate. If the number is higher than zero, consider adding more document processors, or see whether you have added an external item processor that slows down the speed of certain items flowing through the indexing pipeline.

If you add more document processors, ensure that the CPU load stays below 90 percent on average. See the next section for more information.

Tip

You can increase the crawler speed by adding crawler impact rules, essentially increasing the number of items being crawled in parallel. For more information, go to http://technet.microsoft.com/en-us/library/ff381257.aspx.

Identifying Whether the Document Processors Are the Indexing Bottleneck

If the servers hosting the document processors are constantly utilizing 90 percent to 100 percent of the CPU capacity, this is a good indicator that you should add more servers to your FS4SP farm to host additional document processors. In addition to monitoring the CPU load, you can watch the Queue Full State performance counter of the FAST Search Document Processor category. If this counter reports 1 over a long time, the queue is full and not being emptied quickly enough, in which case you might need to add more document processors. A rule of thumb is to have no more than one document processor per available CPU core in the operating system.
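The indicators from the last two sections can be combined into a rough diagnostic. This sketch uses the counter names and rules of thumb described above; the thresholds are guidelines from the text, not an official formula:

```python
def indexing_bottleneck_hint(batches_ready, avg_cpu_pct, queue_full_state):
    """batches_ready: OSS Search FAST Content Plugin / Batches Ready counter
    avg_cpu_pct: average CPU load on the document processor servers
    queue_full_state: FAST Search Document Processor / Queue Full State counter"""
    if batches_ready == 0:
        return "increase the crawl rate"        # FS4SP outpaces the crawler
    if avg_cpu_pct >= 90 or queue_full_state == 1:
        return "add more document processors"   # processing is the bottleneck
    return "inspect external item processors"   # queued batches but idle CPUs

print(indexing_bottleneck_hint(0, 50, 0))  # increase the crawl rate
```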

Identifying Whether Your Disk Subsystem Is a Bottleneck

FS4SP is a very disk-intensive system with lots of random disk reads and writes. If you monitor the Avg. Disk Queue Length performance counter in the Physical Disk category, you can see whether your disks are being saturated. The value should stay below 2 per physical disk; if it stays above 2 for longer periods of time, the disk is likely the bottleneck.

If the volume is built up of six disks in an array and the average disk queue length is 10, the average disk queue length per physical disk is approximately 1.67 (10/6 ≈ 1.67), which is within the recommended limit of 2.
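The arithmetic above generalizes to any array size; a one-line helper (illustrative only):

```python
def per_disk_queue_length(avg_disk_queue_length, disks_in_array):
    # Divide the volume-level counter by the number of physical disks;
    # the per-disk result should stay below 2
    return avg_disk_queue_length / float(disks_in_array)

print(round(per_disk_queue_length(10, 6), 2))  # 1.67, within the guideline
```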

If your disk is a bottleneck, you can either expand your RAID with more disks or add more columns or rows to your FS4SP farm to spread the load. You need to add an index column if indexing is saturating your disks, and additional search rows if searching is saturating your disks.

Backup and Recovery

Backup and recovery is an important aspect of your disaster recovery plan, and deciding how to implement it depends on several factors, such as how critical the service is for your business and how much downtime is acceptable.

If you require uptime 24 hours a day, seven days a week, you should consider configuring redundancy on the components in FS4SP. This is described in the Production Environments section in Chapter 4.

When backing up FS4SP, there are four parts to consider: the FS4SP administration database, the FS4SP configuration files, the FS4SP binary index, and any custom pipeline extensibility modules you have deployed.

FS4SP comes bundled with a Windows PowerShell backup script that backs up all of FS4SP, including any custom pipeline extensibility stages. The backup script uses functions in the Microsoft.SqlServer.Management.Smo namespace to back up and restore the administration database. The tool robocopy is used for reliable file copying.

More Info

The Microsoft.SqlServer.Management.Smo namespace is included with SQL Server and is available for download at http://go.microsoft.com/FWLink/?Linkid=188653. For more information about robocopy, go to http://technet.microsoft.com/en-us/library/cc733145(WS.10).aspx.

Note

Although the backup of the FS4SP files is stored at a specified file area during backup, the administration database is not copied over to this folder. The backup of the FS4SP administration database is located in a folder called Backup below the Microsoft SQL Server installation folder on SQL Server. The exact path and host name of the database backup file can be found in the file SQLServerMetaData.txt in the backup folder, on the lines that contain BackupDirectory and NetName. The backup file name can be found in the file dbbackup.txt in the backup folder.

With SQL Server 2008 R2, the backup would reside in C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Backup.
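A small helper can pull the relevant lines out of SQLServerMetaData.txt. The key names come from the text above, but the surrounding line layout is an assumption:

```python
def backup_location_lines(metadata_text):
    # Return the lines that reveal where the administration database
    # backup went (host name and backup directory)
    keys = ("BackupDirectory", "NetName")
    return [line.strip() for line in metadata_text.splitlines()
            if any(key in line for key in keys)]

sample = "NetName=SQL01\nVersion=10.50\nBackupDirectory=D:\\Backup"
print(backup_location_lines(sample))
```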

Certificates are not backed up and have to be put in place manually if they expire.

The backup script can be run in two modes: configuration and full. Configuration mode makes copies of the FS4SP internal configuration files, excluding any pipeline extensibility changes, and can be run while the system is up and running. Full mode copies all the folders and data below <FASTSearchFolder> and requires suspending indexing while the backup is taking place. Table 5-3 lists what is covered in the different backup modes, including backup of the SharePoint farm.

Table 5-3. What is covered by the different backup options in FS4SP

Component                                Configuration backup   Full backup   Full backup + SharePoint farm backup
Installer-generated files                                       Yes           Yes
Config server files                      Yes                    Yes           Yes
SAM admin configuration                  Yes                    Yes           Yes
SPRel configuration                      Yes                    Yes           Yes
Web Analyzer configuration               Yes                    Yes           Yes
FAST Enterprise Crawler configuration                           Yes           Yes
Binaries and DLLs
People Search                                                                 Yes
Searchable index                                                Yes[a]        Yes[a]
FIXML files                                                     Yes           Yes
Custom folders in <FASTSearchFolder>                            Yes           Yes
FAST Search Administration database      Yes                    Yes           Yes
FAST Content SSA state                                                        Yes

[a] Can be disabled with the -excludeindex parameter

Note

Besides the FAST Search Administration database, FS4SP also stores content in databases for the FAST Content SSA and the FAST Query SSA. These databases are covered by the backup routines in SharePoint and are therefore not covered by the FS4SP backup scripts.
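A full SharePoint farm backup, which covers those SSA databases, can be taken from a SharePoint Management Shell. The following is only an illustration; the directory path is a placeholder for your own backup share.

```powershell
# Back up the entire SharePoint farm, including the FAST Content SSA and
# FAST Query SSA databases, to a UNC share. The path is an example.
Backup-SPFarm -Directory \\storage\spfarmbackup -BackupMethod Full
```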

Which backup mode you decide to implement in your backup and recovery strategy depends largely on your accepted threshold for downtime. Table 5-4 lists the most common scenarios and recommendations, along with the cost of the required upfront investment. Lower-cost alternatives might end up costing you more in downtime if, or rather when, disaster strikes.

Table 5-4. Downtime considerations

| Method | Downtime period | Recommendation | Cost |
| --- | --- | --- | --- |
| Multiple rows[a] | No downtime | Add search rows to your deployment in order to provide failover and redundancy for one or more of the FS4SP components. | High. You have to add more servers and purchase additional FS4SP server licenses. |
| Full backup | One or more days | Use the FS4SP backup/restore scripts for full backup, or use a third-party backup solution. | Medium. The cost is related to storage and varies with how much data you have indexed. |
| Configuration backup | Days to several weeks | Reinstall your farm from scratch, use the FS4SP backup/restore scripts for configuration backup/restore, and do a full crawl of all your content sources. | Low. Backing up the configuration takes very little space. |

[a] Can also be combined with “Full backup”

More Info

You can read more about backup and recovery strategies for SharePoint at TechNet, which also discusses business requirements, which parts to back up, and which tools to use. Go to http://technet.microsoft.com/en-us/library/cc261687.aspx.

Prerequisites

Before you run the backup or restore script in FS4SP, certain prerequisites need to be fulfilled. Table 5-5 lists the prerequisites and can be used as a checklist when preparing your solution for backup.

Table 5-5. Prerequisites for running the backup and restore scripts

| Prerequisite | Comment | Perform at |
| --- | --- | --- |
| The user who runs the backup and restore scripts must be a member of the local FASTSearchAdministrators group. | Commands executed in the scripts are secured by requiring the user to be a member of the FASTSearchAdministrators group. | All FS4SP servers |
| The user who runs the backup and restore scripts must be a member of the local administrators group. | This is required for Windows PowerShell remoting to work in its default setup. | All FS4SP servers |
| Windows PowerShell remoting must be enabled on all servers. | Open a Windows PowerShell prompt as an administrator and run: Enable-PSRemoting -Force | All FS4SP servers |
| Windows PowerShell remoting must be set up as a Credential Security Support Provider (CredSSP) server. | Open a Windows PowerShell prompt as an administrator and run: Enable-WSManCredSSP -Role Server -Force | All FS4SP servers |
| Increase the maximum amount of memory allocated to a Windows PowerShell instance to 1 GB. | Open a Windows PowerShell prompt as an administrator and run: Set-Item WSMan:\localhost\Shell\MaxMemoryPerShellMB 1000 | All FS4SP servers |
| Set up Windows PowerShell remoting as a CredSSP client. | Open a Windows PowerShell prompt as an administrator and run: Enable-WSManCredSSP -Role Client -DelegateComputer server1,server2,...,serverN -Force, where server1,server2,...,serverN must be identical to the server names listed in <host name="server"> in <FASTSearchFolder>\etc\config_data\deployment\deployment.xml. | Administration server |
| The SQL Server Management Objects (SMO) .NET assemblies must be installed. | If SQL Server is installed on the same server as the FS4SP administration server, the assemblies are already installed. If not, download Microsoft SQL Server System CLR Types and Microsoft SQL Server 2008 Management Objects. These downloads can be found at http://technet.microsoft.com/en-us/library/cc261687.aspx and http://go.microsoft.com/FWLink/?Linkid=123709, respectively. | Administration server |
| The backup store must be available as a UNC path to the user who runs the backup scripts, with read and write permissions. | Ensure that the file share has correct read and write permissions for the user running the backup. | Backup store |
| The user who runs the backup and restore scripts must have sysadmin permissions on the FS4SP administration database. | Database permissions can be verified and modified by using SQL Server Management Studio. | SQL Server |
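Several of these prerequisites can be checked with a short script before you attempt a backup. The following is only a sketch, run on an FS4SP server: the group names assume a default FS4SP installation, and the checks cover group membership, remoting, and the WSMan memory limit.

```powershell
# Hypothetical prerequisite check; group and limit values assume a default FS4SP setup.
$user = [Security.Principal.WindowsIdentity]::GetCurrent().Name

# Verify membership in the two required local groups.
foreach ($group in 'FASTSearchAdministrators', 'Administrators') {
    $members = net localgroup $group
    if ($members -match [regex]::Escape($user.Split('\')[-1])) {
        Write-Host "OK: $user is a member of $group"
    } else {
        Write-Warning "$user is not a member of $group"
    }
}

# Verify that WinRM/PowerShell remoting answers locally.
try {
    Test-WSMan -ErrorAction Stop | Out-Null
    Write-Host "OK: PowerShell remoting is running"
} catch {
    Write-Warning "PowerShell remoting is not enabled; run Enable-PSRemoting -Force"
}

# Verify the WSMan memory limit (should be at least 1000 MB).
$limit = (Get-Item WSMan:\localhost\Shell\MaxMemoryPerShellMB).Value
if ([int]$limit -lt 1000) {
    Write-Warning "MaxMemoryPerShellMB is $limit; increase it to 1000"
}
```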

Backup and Restore Configuration

The configuration backup option makes a copy of the FS4SP configuration, most importantly the <FASTSearchFolder>\etc\config_data folder, and only backs up files that are modified or generated after the initial FS4SP installation. The search index is not part of this kind of backup. Because the backup contains an incomplete set of files, you have to restore it over an existing FS4SP deployment.

The configuration backup is not deployment-aware and can safely be restored to any system by using the -action configmigrate parameter during restore. This means you can restore the configuration from a five-server deployment to, for example, a two-server deployment.

Note

A configuration backup is nonintrusive and can be performed without interrupting a live system. It does not require shutting down any FS4SP services.

Be sure to always perform a configuration backup and restore when the set of managed properties changes. If the restored system and the backed-up system have unsynchronized managed properties, your content will not be searchable after the restore, and you will have to perform a full crawl to bring the system back to a consistent state.
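One way to detect such a mismatch is to snapshot the managed property names when you take the backup and compare them after a restore. This is only a sketch: Get-FASTSearchMetadataManagedProperty is available in an FS4SP shell, and the snapshot path is an arbitrary example.

```powershell
# Sketch: snapshot managed property names at backup time, then diff after a restore.
# The snapshot file location is an example; store it alongside your backup.
$snapshot = '\\storage\fs4spbackup\managedproperties.txt'

# At backup time, save the sorted list of managed property names.
Get-FASTSearchMetadataManagedProperty |
    Select-Object -ExpandProperty Name |
    Sort-Object |
    Set-Content $snapshot

# After a restore, any output from Compare-Object indicates
# unsynchronized managed properties.
$current = Get-FASTSearchMetadataManagedProperty |
    Select-Object -ExpandProperty Name | Sort-Object
Compare-Object (Get-Content $snapshot) $current
```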

Performing a Configuration Backup

To run a configuration backup, use the Windows PowerShell script backup.ps1, located in the <FASTSearchFolder>\bin folder. For example:

.\backup.ps1 -action config -backuppath <UNC path>

<UNC path> is the location of your backup store, for example, \\storage\fs4spbackup.

Performing a Configuration Restore

The following procedure can be used to do a configuration restore to a system that uses the same server name(s) and deployment configuration as the backup. This can be either the same system from which you took the backup or an identical mirrored system. The restore process must be performed on a running system, and you do not stop any FS4SP services.

To run a configuration restore, use the Windows PowerShell script restore.ps1, located in the <FASTSearchFolder>\bin folder. For example:

.\restore.ps1 -action config -backuppath <UNC path>

<UNC path> is the location of your backup store, for example, \\storage\fs4spbackup.

If you are restoring the configuration to a system that uses a different deployment configuration, use the -action configmigrate parameter as mentioned before. For example:

.\restore.ps1 -action configmigrate -backuppath <UNC path>

After the restore has completed, restart FS4SP, for example, by executing the following commands on each server in the farm.

Stop-Service FASTSearchService
Start-Service FASTSearchService

Full Backup and Restore

A full backup makes a complete copy of the FS4SP installation folder and the FS4SP administration database. For each server in your FS4SP farm, the backup stores a copy of the <FASTSearchFolder> except for the executable binaries and DLL files. The backup is coupled to the deployment it was generated from and can be restored only on the same set of servers or on exact duplicates of the originating servers.

When performing a full backup, be sure to stop all processes related to item processing and indexing; query processing keeps running, so search still works during the backup. Doing a full restore, on the other hand, requires you to shut down FS4SP completely on all servers in the farm.

It is important that the data in the crawler database associated with the FAST Content SSA stays synchronized with the indexed content in FS4SP, to avoid duplicates or indexed items that are not searchable. Therefore, when setting up your FS4SP backup schedule, be sure to pair it with the backup of your SharePoint farm to ensure consistency.

Performing a Full Backup

Before starting the backup, ensure that all crawls are paused by running the following commands in a SharePoint Management Shell.

$ssa = Get-SPEnterpriseSearchServiceApplication <FAST Content SSA>
$ssa.Pause()

You must also stop all indexing and feeding processes on your FS4SP farm before starting the backup. Do this with the Windows PowerShell script suspend.ps1, located in <FASTSearchFolder>\bin. After the backup has completed, you can resume operations with the resume.ps1 script.

To run a full backup, run the following commands in an FS4SP shell on the administration server.

.\suspend.ps1
.\backup.ps1 -action full -backuppath <UNC path>
.\resume.ps1

<UNC path> is the location of your backup store, for example, \\storage\fs4spbackup. It is good practice to let the system settle for a few minutes after running suspend.ps1 before starting the backup script, to make sure all processes are properly stopped and files are flushed to disk.
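The suspend, backup, and resume steps can be combined into a small wrapper script. This is a sketch, not a supported tool: it assumes it runs from an FS4SP shell with <FASTSearchFolder>\bin as the current directory, and the five-minute settle time is an arbitrary choice.

```powershell
# Hypothetical wrapper around the documented suspend/backup/resume sequence.
param(
    [Parameter(Mandatory = $true)]
    [string]$BackupPath   # UNC path to the backup store
)

# Stop indexing and feeding processes.
.\suspend.ps1

# Let the system settle so all processes stop and files flush to disk.
Start-Sleep -Seconds 300

try {
    .\backup.ps1 -action full -backuppath $BackupPath
} finally {
    # Resume indexing even if the backup fails.
    .\resume.ps1
}
```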

When FS4SP is back up and running, you can resume crawling operations by executing the following in a SharePoint Management Shell.

$ssa = Get-SPEnterpriseSearchServiceApplication <FAST Content SSA>
$ssa.ForceResume($ssa.IsPaused())

Performing a Full Restore

When performing a full restore, it is important that the target system is identical to the source system used for the backup. What this means is that:

  • The numbers of servers are the same.

  • The roles assigned to each server are the same.

  • The names of the servers are the same.

Before restoring FS4SP, be sure that you have done a restore of the SharePoint farm first with the associated SharePoint farm backup.

  1. Stop the alternate access mapping job on SharePoint. From a SharePoint Management Shell, run the following commands.

    $extractorJob = Get-SPTimerJob | where {$_.Name.StartsWith("FAST Search Server 2010 for SharePoint Alternate Access Mapping Extractor Job")}
    Disable-SPTimerJob $extractorJob
  2. Perform a restore on the SharePoint Server farm installation. From a SharePoint Management Shell, run the following command.

    Restore-SPFarm -RestoreMethod overwrite -Directory <path of backup directory> -RestoreThreads 1
  3. Resume the alternate access mapping job, and pause all crawls. From a SharePoint Management Shell, run the following commands.

    $extractorJob = Get-SPTimerJob | where {$_.Name.StartsWith("FAST Search Server 2010 for SharePoint Alternate Access Mapping Extractor Job")}
    Enable-SPTimerJob $extractorJob
    $ssa = Get-SPEnterpriseSearchServiceApplication <FAST Content SSA>
    $ssa.Pause()
  4. Stop the FS4SP services on all servers, for example, by running the following commands.

    Stop-Service FASTSearchService
    Stop-Service FASTSearchMonitoring
  5. Perform the FS4SP restore. In an FS4SP shell, run the following command.

    .\restore.ps1 -action full -backuppath <UNC path>
  6. Start the FS4SP services on all servers, for example, by running the following command.

    Start-Service FASTSearchService
  7. Resume crawling operations. From a SharePoint Management Shell, run the following commands.

    $ssa = Get-SPEnterpriseSearchServiceApplication <FAST Content SSA>
    $ssa.ForceResume($ssa.IsPaused())

Incremental Backup

You perform incremental backups by adding the -force option to the backup.ps1 script.

.\backup.ps1 -action full -backuppath <UNC path> -force

You can use the same backup path with this option; because robocopy runs in incremental mode, only modified files are copied, which saves time on subsequent backups.
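Because incremental runs reuse the same backup path, they lend themselves to scheduling. The following sketch writes a small helper script and registers it with the built-in schtasks tool. The folder paths, task name, and schedule are all assumptions, and you must still pause crawls on the FAST Content SSA around each run.

```powershell
# Hypothetical nightly incremental backup job; all paths and names are examples.
$helper = 'C:\FASTSearch\bin\nightly-backup.ps1'

@'
# Runs the documented suspend/backup/resume sequence with an incremental backup.
Set-Location 'C:\FASTSearch\bin'   # assumed <FASTSearchFolder>\bin; adjust as needed
.\suspend.ps1
Start-Sleep -Seconds 300           # let processes stop and files flush to disk
.\backup.ps1 -action full -backuppath \\storage\fs4spbackup -force
.\resume.ps1
'@ | Set-Content $helper

# Register the helper as a daily 02:00 task (schtasks ships with Windows).
schtasks /Create /TN "FS4SP incremental backup" /SC DAILY /ST 02:00 `
    /TR "powershell.exe -ExecutionPolicy Bypass -File $helper"
```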

Speeding Up Backups

You can speed up backups by adding the -multithreaded option to the backup.ps1 script.

.\backup.ps1 -action full -backuppath <UNC path> -multithreaded 16

This option enables robocopy to use multiple threads in parallel during file copying. The thread count is configurable between 1 and 128.

Warning

The multithreading option of robocopy is not supported on Windows Server 2008; it requires Windows Server 2008 R2.

Conclusion

A lot of things can happen in a solution as complex as FS4SP. It is important to establish procedures and guidelines in your organization for monitoring the solution and being able to catch indications of problems as they are reported. Knowing the tools that can help identify problems and knowing which logs to examine when something does happen is key to managing the FS4SP farm efficiently, ensuring the best possible search experience for your users.

If you experience server failures or other unexpected problems, you need to have proper routines in place for recovery by using the redundancy capabilities of FS4SP, backing up the FS4SP servers, or using a combination of both. After you choose a backup and recovery strategy, be sure that you perform regular test runs of the procedure so that you are well-prepared when disaster strikes and you have to execute the backup plan in a production environment.

This chapter covered tools and procedures that we think will get you ready for running FS4SP in production. You saw how to monitor an FS4SP deployment and the various areas in which logs related to FS4SP are held and how to effectively monitor them. The chapter also outlined how to effectively and safely modify the FS4SP topology. Finally, the chapter looked at the different methods of backing up an FS4SP deployment and gave several best practices for doing so.
