The best way to illustrate how to troubleshoot problems is by example. This section covers some of the common problems that customers encounter and offers pointers and hints for possible causes as well as how to resolve them.
To facilitate diagnostics, install the Office Communications Server 2007 R2 Resource Kit Tools, which provide additional tools to help diagnose problems. For more information on these tools, see Chapter 20.
This section details login and basic operation troubleshooting when using Office Communicator. Specific examples of failures as well as the data or troubleshooting steps associated with them are presented, and appropriate next steps are identified. Enabling Communicator event logs and using protocol trace logs on the client and server can make most problems readily apparent and even point out helpful next steps.
In the following log examples, the date, time, and thread columns have been removed to keep the data succinct and easy to read.
In this scenario, internal users are unable to log on using Office Communicator 2007 R2. The IT administrator has validated that the Office Communications Server 2007 R2 services are running and the users reporting the problem have been properly enabled for Office Communications. The following is one approach that can be used to troubleshoot the problem.
The IT administrator troubleshoots this issue by asking an affected user to enable client logging in Office Communicator and attempt to log on. Doing so records the logon attempt in the Communicator log. The following excerpt from the Communicator log file shows the client performing DNS service locator (SRV) record lookups as part of automatic configuration. Errors are indicated by the tag ERROR at the beginning of a log entry:
INFO :: QueryDNSSrv - DNS Name[_sipinternaltls._tcp.litwareinc.com]
ERROR :: QueryDNSSrv GetDnsResults query: _sipinternaltls._tcp.litwareinc.com failed 0
ERROR :: DNS_RESOLUTION_WORKITEM::ProcessWorkItem ResolveHostName failed 8007232a
INFO :: QueryDNSSrv - DNS Name[_sip._tls.litwareinc.com]
INFO :: CUccDnsQuery::UpdateLookup - error code=80ee0066, index=0
INFO :: CUccDnsQuery::CompleteLookup - index=0
ERROR :: QueryDNSSrv GetDnsResults query: _sip._tls.litwareinc.com failed 0
ERROR :: DNS_RESOLUTION_WORKITEM::ProcessWorkItem ResolveHostName failed 8007232a
INFO :: CUccDnsQuery::UpdateLookup - error code=80ee0066, index=1
INFO :: CUccDnsQuery::CompleteLookup - index=1
...
ERROR :: gethostbyname failed for host sipinternal.litwareinc.com, error: 0x2afc
ERROR :: DNS_RESOLUTION_WORKITEM::ProcessWorkItem ResolveHostName failed 80072afc
TRACE :: SIP_MSG_PROCESSOR::OnDnsResolutionComplete[012CAC60] Entered host sipinternal.litwareinc.com
ERROR :: SIP_MSG_PROCESSOR::OnDnsResolutionComplete - error : 80ee0066
...
ERROR :: gethostbyname failed for host sip.litwareinc.com, error: 0x2afc
ERROR :: DNS_RESOLUTION_WORKITEM::ProcessWorkItem ResolveHostName failed 80072afc
TRACE :: SIP_MSG_PROCESSOR::OnDnsResolutionComplete[01D93130] Entered host sip.litwareinc.com
ERROR :: SIP_MSG_PROCESSOR::OnDnsResolutionComplete - error : 80ee0066
...
ERROR :: gethostbyname failed for host sipexternal.litwareinc.com, error: 0x2afc
ERROR :: DNS_RESOLUTION_WORKITEM::ProcessWorkItem ResolveHostName failed 80072afc
TRACE :: SIP_MSG_PROCESSOR::OnDnsResolutionComplete[01D93130] Entered host sipexternal.litwareinc.com
ERROR :: SIP_MSG_PROCESSOR::OnDnsResolutionComplete - error : 80ee0066
In this case, the DNS resolution for the SRV records failed with error code 0x80ee0066. The Resource Kit tool Err.exe translates this code to mean that no record exists. In the absence of the SRV records, Office Communicator falls back to resolving the following host names in order: sipinternal.litwareinc.com, sip.litwareinc.com, and finally sipexternal.litwareinc.com. Note that these requests also fail. The most likely cause is that the DNS infrastructure is unreachable or configured incorrectly.
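The lookup order visible in the log can be written down as a small helper, which is convenient when the same checks must be repeated for several SIP domains. The following Python sketch only rebuilds the list of names from the excerpt above. The function name autoconfig_lookup_names is hypothetical, and the order is inferred from this log rather than taken from product documentation; because the excerpt elides some entries, the real sequence may contain additional lookups.

```python
def autoconfig_lookup_names(sip_domain):
    """Return (record_type, name) pairs in the order seen in the log excerpt."""
    domain = sip_domain.strip().rstrip(".").lower()
    # SRV names queried first, then the host-name fallbacks (order assumed from the log).
    srv_prefixes = ["_sipinternaltls._tcp", "_sip._tls"]
    host_prefixes = ["sipinternal", "sip", "sipexternal"]
    names = [("SRV", f"{prefix}.{domain}") for prefix in srv_prefixes]
    names += [("A", f"{prefix}.{domain}") for prefix in host_prefixes]
    return names


for record_type, name in autoconfig_lookup_names("litwareinc.com"):
    print(record_type, name)
```

Each printed name can then be queried by hand, for example from the NSLookup prompt after running set type=srv for the SRV entries.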
To further isolate the root cause of this issue, use nslookup.exe to determine what DNS server is being used and change the default server to the publishing point for the domain’s DNS SRV records (the server <NewDnsServerName> command changes the default DNS server from the NSLookup command prompt). Failure to resolve the DNS entries directly from the server that should be hosting these service records indicates that either the DNS service is not running on that machine or that the entries are configured incorrectly on the DNS server. Failures to resolve the records from intermediate machines could be related to the machines’ inability to see the publishing server as an authority or to existing DNS caches holding invalid entries.
Make sure that all necessary autoconfiguration records exist in the same domain name space as the user’s SIP domain. The server’s A record that the SRV records point to must also have a domain name suffix that matches the user’s domain. If either of these does not match, Communicator will not accept the returned DNS information and will not be able to log on.
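The suffix rule can be checked mechanically before changing any DNS records. The following Python sketch compares a server host name with the domain part of a user's SIP URI. Both function names and the sample URI sip:user@litwareinc.com are hypothetical, and the comparison is a plain string test; it approximates the rule described above and is not Communicator's actual validation logic.

```python
def sip_domain(sip_uri):
    """Extract the domain part of a simple SIP URI such as sip:user@example.com."""
    address = sip_uri.split(":", 1)[-1]
    return address.rsplit("@", 1)[-1].strip().rstrip(".").lower()


def host_matches_sip_domain(host_fqdn, sip_uri):
    """True if the host name ends with the user's SIP domain as a DNS suffix."""
    host = host_fqdn.strip().rstrip(".").lower()
    # The leading dot prevents "notlitwareinc.com" from matching "litwareinc.com".
    return host.endswith("." + sip_domain(sip_uri))


print(host_matches_sip_domain("sipserver.litwareinc.com", "sip:user@litwareinc.com"))
```

URIs that carry a port or parameters are not handled here; strip those first if they appear in your environment.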
This section shows three common scenarios that are encountered when users are unable to connect due to certificate-related issues.
Case 1: Certificate Name Does Not Match DNS Name
TRACE :: SIP_MSG_PROCESSOR::OnDnsResolutionComplete[01D87BA0] Entered host sipserver.litwareinc.com
...
ERROR :: SECURE_SOCKET: negotiation failed: 80090322
In case 1, the error shown is recorded in the Communicator log when the client cannot connect because certificate validation fails. These issues occur when Communicator uses Transport Layer Security (TLS) and cannot validate the server’s certificate.
Scanning for the keyword ERROR shows that Communicator received the error code 0x80090322, which err.exe translates to SEC_E_WRONG_PRINCIPAL. This means that the fully qualified domain name (FQDN) to which the client connected does not match the subject name (SN) of the certificate or an entry in the certificate’s Subject Alternate Name (SAN) list. By looking back in the trace log, you can see the FQDN of the server or pool that the client connected to: sipserver.litwareinc.com. Make sure that this name matches the SN or an entry in the SAN list on the certificate installed on the front-end services. The SN and the SAN list of a certificate can be checked by viewing the certificate in the Communications Server Management Snap-In and checking the Subject and Subject Alternate Names settings on the assigned certificate’s Details tab.
This failure can also occur when the client is configured manually to use an IP address to identify the server when using TLS. The host name needs to be used so that it can be matched against the name in the certificate.
Case 2: Server Certificate Is Not Issued by a Trusted CA
ERROR :: SECURE_SOCKET: negotiation failed: 80090325
In case 2, the client’s TLS negotiation failed because the server certificate isn’t trusted. The error code 0x80090325 maps to SEC_E_UNTRUSTED_ROOT, which indicates that the root certification authority (CA) isn’t trusted. The root CA certificate must be present in the list of Trusted Root CAs found in the Computer Certificate store. For testing, you can add the root CA certificate to the Trusted Root CA path in the user’s store. If doing so resolves the issue, repeat this process for the Computer store so that it will impact all users of the machine.
Windows servers are configured to hand out the root and intermediate CA certificates along with the server certificate to avoid certificate authentication problems. However, because some public instant messaging (IM) connectivity (PIC) partners do not use Windows-based servers, it is recommended that the intermediate CA certificates be placed in the Computer’s Certificate Store list of intermediate CAs.
Case 3: Root CA Certificate Is Missing the Server EKU
ERROR :: SECURE_SOCKET: negotiation failed: 80090349
In case 3, Communicator received the SEC_E_CERT_WRONG_USAGE failure message (0x80090349), which means that the certificate is not trusted for the purpose for which it is being presented. It does not have rights to be used as a server certificate. Often this is caused by the root CA certificate not being enabled for the Server Authentication and Client Authentication usages that the certificate presented by Office Communications Server lists in its Enhanced Key Usage (EKU) field. This can be verified by using the Certificate MMC snap-in. Figure 21-1 shows the properties of an incorrectly configured root CA certificate.
To resolve this issue, make sure that the root CA and Intermediate CA certificates stored in the Machine store are enabled for at least Server Authentication and Client Authentication as defined by the EKU field of the certificate being used by Office Communications Server.
If problems validating or negotiating certificates arise, use the Certificate MMC snap-in to validate the certificate configuration on the client and server. Using a combination of the Office Communications Server event log and the Certificate MMC, you can also determine if the issue is due to an expired certificate.
Remember to use lcserror.exe from the Office Communications Server Resource Kit or err.exe to translate error codes into strings that give more information about the failure. Err.exe is included in the Windows Server 2008 and Windows Vista operating systems and can be obtained from the Windows Resource Kit for earlier releases. Finally, the validation wizards can be used to help verify that the certificates are properly installed on the Office Communications Server.
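When a log contains many entries, it can help to pull out the TLS negotiation failures and label them in one pass. The following Python sketch scans log text for the SECURE_SOCKET failure lines shown in the three cases and maps the codes covered in this section to their names. The function name and the lookup table are illustrative only; they cover just these three codes and do not replace err.exe or lcserror.exe.

```python
import re

# The three codes discussed in the cases above, with their Windows SSPI names.
TLS_ERRORS = {
    0x80090322: "SEC_E_WRONG_PRINCIPAL",
    0x80090325: "SEC_E_UNTRUSTED_ROOT",
    0x80090349: "SEC_E_CERT_WRONG_USAGE",
}

NEGOTIATION_FAILED = re.compile(
    r"SECURE_SOCKET: negotiation failed: ([0-9a-fA-F]{8})"
)


def explain_tls_failures(log_text):
    """Return (code, name) for every TLS negotiation failure found in the log."""
    results = []
    for match in NEGOTIATION_FAILED.finditer(log_text):
        code = int(match.group(1), 16)
        name = TLS_ERRORS.get(code, "unknown; look it up with err.exe")
        results.append((f"0x{code:08x}", name))
    return results


sample = "ERROR :: SECURE_SOCKET: negotiation failed: 80090325"
print(explain_tls_failures(sample))
```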
This section details the troubleshooting steps used to resolve a common Web conferencing issue.
In the following log example, the date, time, and thread columns have been removed to keep the data succinct and easy to read.
In this scenario, participants are unable to download content, such as handouts or a Microsoft Office PowerPoint presentation, that the meeting organizer has successfully uploaded. This problem is reproducible and occurs with all meetings hosted on a single pool. This scenario assumes that the files are successfully uploaded to the Web Component Server.
The first step an administrator takes is to reproduce the problem to generate a set of logs to be used for troubleshooting. PWConsole logging, which is used in this scenario, is enabled by default. Start by creating a new meeting by using the Office Live Meeting client. There are two ways to attempt to reproduce the issue:
Upload an Office PowerPoint file to the meeting (on the Content menu, click Share, select Add File To Meeting, and then choose View). Once the file is uploaded, the main console window displays a gray background with an error message: Content failed to download due to a problem with the Conference Center configuration. Contact your administrator.
Upload a file by using the Handouts feature (see the Handouts button on the toolbar). Once you have uploaded a file, try to download it. A message box pops up with the following error message: Download failed.
Once the problem has been reproduced, review the console trace file, which can be found in the user’s default temp folder (%temp%). This folder is easily accessed by running the following in the command console:
cd %temp%
Usually, this folder will contain multiple console trace files for the user. To view these trace files from the command prompt window, run the following command:
dir pwconsole-debug*.txt
Using your favorite text editor (for example, Notepad), open the most recent trace file. This can be determined by using the modified date/timestamp. Look for the section in which the failure occurs by searching the trace log for Downloader::addRequest(). This will help you review all areas in the log relating to the file download process. The error will look similar to the following:
[MC] 21:06:10:064 GMT [THREAD 4888] [I] Downloader::addRequest() – Found previously failed request, will not download https://se.litwareinc.com/etc/place/null/FileTree/IE6HRFPCBJ3K1CCUC1HH75UPN8Q/6a24bb55f381433285cc878baa11ed3a/slidefiles/xc75dbd0baa6a.epng
This trace line indicates that the console failed to download the content from a specific URL. The URL is printed at the end of the trace line, starting with HTTPS (for example, https://se.litwareinc.com). When debugging the problem, copy the URL from the actual log for use later in the investigation.
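Locating the newest trace file and copying the failed URL can also be scripted. The following Python sketch assumes that the trace files follow the pwconsole-debug*.txt naming shown above, that each trace entry sits on a single line, and that the failure entry uses the wording from the sample. The function names are hypothetical.

```python
import glob
import os
import re
import tempfile

# Matches the failure entry shown above and captures the URL at its end.
FAILED_REQUEST = re.compile(
    r"Downloader::addRequest\(\).*?will not download\s+(https?://\S+)"
)


def latest_trace_file(folder=None):
    """Return the most recently modified pwconsole-debug*.txt, or None."""
    folder = folder or tempfile.gettempdir()
    files = glob.glob(os.path.join(folder, "pwconsole-debug*.txt"))
    return max(files, key=os.path.getmtime) if files else None


def failed_download_urls(trace_text):
    """Collect the URLs that the console reported it would not download."""
    return FAILED_REQUEST.findall(trace_text)
```

A typical use is to call latest_trace_file(), read the file it returns, and pass the text to failed_download_urls(). On Windows, tempfile.gettempdir() normally resolves to the same folder as %temp%, but confirm this on the affected machine.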
Next, verify that Internet Explorer can browse to the server. Open Internet Explorer and paste the URL copied from the trace file. Change the scheme from HTTPS to HTTP and trim the URL to its base address, for example, http://se.litwareinc.com. If Internet Explorer is properly configured, the result will be a page that says Under construction.
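The base address used for this test can be derived from the copied URL rather than edited by hand. The following Python sketch uses the standard urllib.parse module; the function name http_base_url is hypothetical.

```python
from urllib.parse import urlsplit


def http_base_url(content_url):
    """Reduce a content URL to the plain-HTTP base address of its server."""
    host = urlsplit(content_url).hostname
    if not host:
        raise ValueError(f"no host name in URL: {content_url!r}")
    return f"http://{host}"


print(http_base_url("https://se.litwareinc.com/etc/place/null/FileTree"))
```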
If Internet Explorer returns a result that says Internet Explorer cannot display the webpage instead of Under construction, verify that Internet Information Services (IIS) is running on the Office Communications Server. To do this check, connect to the Office Communications Server running the Web Conferencing service and open the IIS Management Console.
Use the IIS Management Console and view the SE\Web Sites\Default Web Site icons in the left panel tree. Red circles on the icons indicate that a failure has occurred within IIS. If the service is not running, try to restart IIS by right-clicking the machine icon. From the context menu, select All Tasks and then Restart IIS. If the red circles remain, there are two options to resolve the issue:
Option 1: Reinstall IIS by performing the following steps in order:
Deactivate Web Components.
Uninstall IIS.
Reinstall IIS.
Activate Web Components.
Option 2: Contact Microsoft Support Services or an Office Communications Server certified partner for assistance with issue resolution.
Upon resolution, retest by opening Internet Explorer and browsing to the base URL listed in the trace log, for example, http://se.litwareinc.com. If the Web page still does not display correctly, verify that the DNS is working properly by using NSLookup. In this scenario, Internet Explorer displays the Under Construction page, and the next step is to check if the file was uploaded successfully.
Using the IIS Manager console, expand the nodes in the left panel tree:
Internet Information Services
SE
Web Sites
Default Web Site
Etc
Null
FileTree
<organizerguid>
<consoleguid>
slidefiles
The following image, Figure 21-2, shows what the expanded IIS Management Console will look like.
The <organizerguid> and <consoleguid> come from the URL copied from the PWConsole log. In this scenario, the <organizerguid> is IE6HRFPCBJ3K1CCUC1HH75UPN8Q and the <consoleguid> is 6a24bb55f381433285cc878baa11ed3a.
Click the slidefiles node, and a list of files should appear in the right panel list. One of these files should be the EPNG file from the end of the URL (xc75dbd0baa6a.epng).
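The values needed to walk the tree can be read directly from the URL. The following Python sketch splits the URL path and returns the four segments that follow FileTree. The function name and dictionary keys are hypothetical, and the layout (organizer ID, console ID, folder, file name) is assumed from the single sample URL in this scenario.

```python
from urllib.parse import urlsplit


def content_location(content_url):
    """Return the four path segments that follow FileTree in a content URL."""
    segments = [s for s in urlsplit(content_url).path.split("/") if s]
    start = segments.index("FileTree")  # raises ValueError if the segment is absent
    # Unpacking also raises ValueError if fewer than four segments follow FileTree.
    organizer_guid, console_guid, folder, file_name = segments[start + 1:start + 5]
    return {
        "organizer_guid": organizer_guid,
        "console_guid": console_guid,
        "folder": folder,
        "file_name": file_name,
    }
```

Passing the URL from the trace log returns the node names to expand in IIS Manager and the file to look for in the right panel.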
If the file is not located in the directory, there are multiple possible causes: problems with the Web Conferencing Server or connectivity problems between the Web Conferencing Server and meeting participants. Check the Office Communications Server event logs for issues reported on the server running the Web Conferencing Server role. Also verify that the necessary ports are open between the meeting organizer and the Web Conferencing Server (these ports are discussed in detail in Chapter 4). In the case where the organizer is remotely connected, check port connectivity to the Web Conferencing service on the Edge Server.
The Live Meeting Console downloads the conference content by using Hypertext Transfer Protocol Secure (HTTPS), and therefore the server must be configured to accept such requests. Verify that IIS is properly configured for HTTPS. IIS does not accept HTTPS requests by default; HTTPS must be enabled manually, and this configuration step may have been overlooked.
Using the IIS Management Console, right-click Default Web Site. Select Properties, then Directory Security, and then Server Certificate. Follow the steps in the wizard by selecting Assign An Existing Certificate. (A Web certificate must already be installed on the server machine. See Chapter 4.) Select the certificate and ensure that the port is set to 443. Save the settings by clicking Finish in the wizard and then clicking OK on the Properties page.
IIS should now be configured for HTTPS. Double-check that everything is working by repeating the resolution procedures by using the original HTTPS URL (for example, https://se.litwareinc.com). Internet Explorer should be able to display the Under Construction page.
If the issue is still unresolved, run the validation wizard and the Best Practices Analyzer tools. These tools verify the functionality of the underlying components and may identify additional problems that are preventing issue resolution.
This section details external audio troubleshooting when using Office Communicator. Specific examples of failures as well as troubleshooting steps associated with them are presented and appropriate next steps are identified. Enabling Communicator event logs and using protocol trace logs on the client and server can make most problems readily apparent and even point out helpful next steps.
In the following log examples, the date, time, and thread columns have been removed to keep the data succinct and easy to read.
In this scenario, internal users can call other internal users; however, they are unable to place or receive calls when they are outside the corporate network behind a Network Address Translation (NAT) device, such as a router or firewall. The IT administrator has validated that the Office Communications Server 2007 R2 services are running, and the users reporting the problem have been properly enabled for Office Communications and remote access. The following is one approach that can be used to troubleshoot the problem.
The IT administrator begins to troubleshoot this issue by enabling client logging on Office Communicator and attempting to place a call to an external user behind a firewall. The call attempt will be recorded in the log file.
Now that the IT administrator has a set of logs from the customer, he can view the log output by using Snooper, a tool from the Office Communications Server 2007 R2 Resource Kit. For detailed information on Snooper and how to use it, please refer to Chapter 20. To isolate the failed call attempt, the IT administrator uses the following search string:
Invite m=audio
The m= field in the message body can be used to determine the specific modality of a particular INVITE. The following are some common examples:
- m=message is used for IM messages
- m=audio is used for audio calls
The search will return multiple SIP messages, including INVITE messages and other messages related to the invitation. The next step is to find the correct INVITE message from the failed call attempt. This is accomplished by looking at the To header in the SIP message, as well as the timestamp. Because the log was taken from the person making the call, the To header should contain the SIP URI of the remote user, and the timestamp should match when the call was placed. The following is an example of the INVITE message:
10/1/2008|14:31:53.354 2A44:2B98 INFO :: INVITE sip:[email protected]
...
From: <sip:[email protected]>;tag=616870b365;epid=3da06eb148
To: sip:[email protected]
After finding the specific INVITE, the IT administrator looks at the details of the INVITE message, searching for the listed a=candidate entries. The following lists a=candidate entries returned in this scenario:
a=candidate:wL0o20SkJFunhXCMnB6ql+Z/kzBn0FuzWk1XGE28clY 1 GtkmEX1ZUj65wXiQ4YyfhQ UDP 0.870 192.168.0.198 50035
a=candidate:wL0o20SkJFunhXCMnB6ql+Z/kzBn0FuzWk1XGE28clY 2 GtkmEX1ZUj65wXiQ4YyfhQ UDP 0.870 192.168.0.198 50027
Note that this list contains only entries with the user’s internal IP address, 192.168.0.198, and does not include any a=candidate entries that contain the IP address of the Edge Server. Because the information for the Edge Server is not listed and the internal user’s IP address is in a non–publicly routable network, the external user will not be able to establish a point-to-point audio connection. In most situations in which the Edge Server’s a=candidate entries are missing, there is an issue with the configuration of the A/V Authentication service or a communications issue caused by an improperly configured firewall.
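The same check can be applied to a saved INVITE body. The following Python sketch extracts the address and port from each a=candidate line and reports whether every candidate uses a private address. It assumes one candidate per line and the field order seen in the sample above (address in the sixth field, port in the seventh); the function names are hypothetical.

```python
import ipaddress


def candidate_addresses(sdp_text):
    """Return (ip, port) for each a=candidate line, assuming the sample's field order."""
    found = []
    for line in sdp_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("a=candidate:"):
            continue
        found.append((ipaddress.ip_address(fields[5]), int(fields[6])))
    return found


def only_private_candidates(sdp_text):
    """True if candidates exist and none of them has a publicly routable address."""
    addresses = candidate_addresses(sdp_text)
    return bool(addresses) and all(ip.is_private for ip, _ in addresses)
```

A result of True for an externally connected user matches the symptom described here: no Edge Server candidate was offered, so the remote party has no routable address to reach.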
To further verify the root cause of the issue, the IT administrator uses Snooper to search the log for the keyword MRAS.
Note that the registration with the Media Relay Authentication Server (MRAS) occurs only at login. If the search turns up no responses, it means one of two things:
The log does not contain the user logon information. This is common when logging is enabled after the user is logged in. After logging is enabled, have the user log back in and then repeat the test.
No MRAS Server is defined for the pool. Check the pool’s A/V Conferencing settings and select the correct MRAS Server for the environment.
When the search does return results, they show the original request sent to MRAS, as well as the response returned. A common failure response code in these types of issues is 504. The ms-diagnostics header of the 504 message provides further information that can be used to understand the cause of the problem and which servers are involved. The following is the ms-diagnostics header from a 504 error:
ms-diagnostics: 1007;reason="Temporarily cannot route";source="se.litwareinc.com";ErrorType="Connect Attempt Failure";WinsockFailureDescription="The peer actively refused the connection attempt";WinsockFailureCode="274D(WSAECONNREFUSED)";Peer="edge.litwareinc.com"
This error shows that the connection attempt to the MRAS, edge.litwareinc.com, was actively refused. These types of errors normally occur due to port configuration on the Edge Server being improperly set in the Global Settings properties. The default port for the Edge Server audio/video (A/V) authentication is 5062; however, when setting the Edge Server configuration in the Global Settings properties, it is commonly set to 5061 or another port. Verify that these port numbers match on the Edge Server, pools, Mediation Servers, and global settings.
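Because the ms-diagnostics header is a list of name="value" pairs, it is easy to break apart when many such headers must be compared. The following Python sketch parses the header into its error ID and parameters. The function name is hypothetical, and the parser assumes the simple quoted form shown above, with no escaped quotation marks inside values.

```python
import re

PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_ms_diagnostics(header_line):
    """Split an ms-diagnostics header into its error ID and quoted parameters."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "ms-diagnostics":
        raise ValueError("not an ms-diagnostics header")
    error_id = value.split(";", 1)[0].strip()
    return error_id, dict(PAIR.findall(value))
```

For the header above, the source parameter identifies the server that made the connection attempt and the Peer parameter identifies the server that refused it, which tells you where to compare the port settings.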
Another common issue that shows similar symptoms is caused by specific ports being blocked on either the internal or external firewall. Refer to Chapter 4 for additional information on firewall requirements.
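A quick way to separate a refused or blocked port from other failures is to attempt a plain TCP connection from the server named in the source parameter of the diagnostic header. The following Python sketch does only that; the function name is hypothetical, and a successful connection shows that the port is reachable, not that the service behind it is configured correctly.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

In this scenario, the check would be tcp_port_open("edge.litwareinc.com", 5062), using the default A/V authentication port mentioned above. Substitute the port that is actually configured in your deployment.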