The Office Communications Server product line has always provided event logs to help identify problems and point out irregularities in the system. In Office Communications Server 2007, there is even a filtered view of server warnings and errors presented in the MMC snap-in as a tabbed overview for each server. Office Communicator 2007 has added the capability to create event logs that can be enabled from the Tools | Options menu item on the General tab by selecting Turn On Windows Event Logging For Communicator, as shown in Figure 16-1. When issues arise, the event log is the first place to turn for high-level guidance on problems that the server or client might have already identified. In general, warnings and errors will be of interest, but informational event logs will not.
Because the event log has the capability to filter and sort itself, it is not necessary to spend time hunting through the Application Event Log. All events logged by the client show up in the Application Event Log, and all events from servers show up in the Office Communications Server Event Log. The View menu provides the ability to filter by event source; therefore, filtering for Communicator events or a particular server role is possible, as is filtering for only warnings and errors.
The following example presents a normal Office Communicator event log along with examples of problematic logs that may show up. Additional information and advice on how to resolve problems are presented with each log.
Normal event log entries are provided here simply to point out expected log entries so that they can be ignored during troubleshooting. The first log shown is created during every normal Office Communicator login when event logging is enabled.
Type:
InformationalEventID:
1020Source:
CommunicatorDescription:
Communicator has enabled event logging. Information about failed calls will be sent to the Windows event log.
Event ID 1020 is useful only for identifying that client event logs are turned on when looking for problematic warnings and errors in the event log.
The detailed tracing log (described in more detail in the following section) is a primary means of understanding the details of problems when they occur, and if event logs are enabled in Office Communicator, it will also create event log entries when there are failures to provide diagnostic information. These entries are incredibly useful for diagnosing login problems. The following examples clearly explain the problem, point at data related to the problem, and explain the steps involved for solving the problem.
This first sample event log message (1007) occurs because Communicator is configured with an invalid server name, IConfiguredAnInvalidServerName.contoso.com. The message explains that Communicator cannot find a server with that name when attempting to resolve the hostname specified using the DNS. As is typical for most events, the actions to be taken next are clearly laid out in the event log Resolution, which greatly aids in resolving the problem.
Type:
ErrorEventID:
1007Source:
CommunicatorDescription:
Communicator was unable to resolve the DNS hostname of the login server IConfiguredAnInvalidServerName.contoso.com. Resolution: If you are using manual configuration for Communicator, please check that the server name is typed correctly and in full. If you are using automatic configuration, the network administrator will need to double-check the DNS A record configuration for IConfiguredAnInvalidServerName.contoso.com because it could not be resolved.
The second sample event log error (1015) occurs because Communicator is configured to connect to an invalid port (9999) instead of a valid port (for example, 5061) on the server. The message explains that Communicator is unable to connect (10065 is the code for Windows WSAEHOSTUNREACH error, which means that the host is unreachable on the network). Again, the steps to resolve the problem are laid out in the Resolution section in the event log text.
Type:
ErrorEventID:
1015Source:
CommunicatorDescription:
Communicator failed to connect to server srv.contoso.com (192.168.3.1) on port 9999 due to error 10065. The server is not listening on the port in question, the service is not running on this machine, the service is not responsive, or network connectivity doesn't exist. Resolution: Please make sure that your workstation has network connectivity. If you are using manual configuration, please double-check the configuration. The network administrator should make sure that the service is running on port 9999 on server srv.contoso.com (192.168.3.1).
The third sample event log error occurs because Communicator cannot validate the certificate credentials presented by the server to which it is connecting. The message explains that Communicator attempted to connect to sipfed.contoso.com, but received a certificate with a Subject Name (SN) that didn't match. This typically occurs because the wrong certificate is installed on the server or an unsupported name is published in the DNS and is used to connect to the server. The Resolution section offers the appropriate steps to take to move forward.
Type:
ErrorEventID:
1008Source:
CommunicatorDescription:
Communicator could not connect securely to server sipfed.contoso.com because the certificate presented by the server did not match the expected hostname (sipfed.contoso.com). Resolution: If you are using manual configuration with an IP address or a NetBIOS shortened server name, a fully qualified server name will be required. If you are using automatic configuration, the network administrator will need to make sure that the published server name in DNS is supported by the server certificate.
Some event log messages do not clearly state what needs to happen next, which can occur when the server infrastructure identifies a problem and Office Communicator simply reports the raw data that it receives from the server. Event ID 1000 is used for reporting exactly these types of scenarios, and examples of logs with this event as well as additional information about them are provided in the samples that follow. For event ID 1000, the first number in the technical data (1011 for this example) and the reason text ("Ms-Diagnostics header not provided by previous hop") are linked in that the number is simply an identifier that matches the reason text. It is not another form of error code to be analyzed.
The first sample event log message displays an unexpected failure (diagnostic event 1011 passed back from the server) during federation routing. An error is returned from the remote organization (woodgrovebank.com) by the server sip.woodgrovebank.com, but no diagnostics information is provided. Therefore, the server sipfed.contoso.com adds the ms-diagnostics header to identify the source of the failure. The problem might be transitory and not occur again later (e.g., it is handled by the client and the infrastructure by retrying the request, or a network outage will be resolved at some point), it might be a software bug in the remote implementation, or it might be an invalid request that can't be handled. In the latter case, looking at the transaction details in the Office Communicator tracing logs is the best approach to fully identify the issue.
Type:
WarningEventID:
1000Source:
CommunicatorDescription:
Communicator failed in an unexpected way. More information is contained in the following technical data: 1011;reason="Ms-Diagnostics header not provided by previous hop";source="sipfed.contoso.com";Domain="woodgrovebank.com";PeerServer= "sip.woodgrovebank.com" Resolution: If this error continues to occur, please contact your network administrator. The network administrator can use a tool like winerror.exe from the Windows Resource Kit or lcserror.exe from the Office Communications Server Resource Kit in order to interpret any error code listed above.
The second sample event log message shows an unexpected failure (diagnostic event 1004) during internal routing. An error is returned by server5.contoso.com because it identifies a problem at the protocol level. In this case, a messaging session or some other interaction must have been in place long enough so that the server signature used to protect and validate the routing information is no longer valid. Servers rotate these keys regularly, and extremely long sessions can come up against this error, but Office Communicator should handle this error by re-establishing the session automatically.
Type:
WarningEventID:
1000Source:
CommunicatorDescription:
... 1004;reason="Route set is no longer valid";source="server15.contoso.com";ErrorType="A key that was used to sign the route set is no longer valid" ...
The third sample event log message (1013) occurs because the client and the server are not in sync on the current (universal) time. This can happen when the time is set properly but the time zone is set improperly because the system time has not been set to take daylight savings time or simply because the clock is set to the incorrect time. Authentication using Kerberos relies on clock accuracy, so the solution is to check the time and the time zones on the client and then on the server.
Type:
WarningEventID:
1000Source:
CommunicatorDescription:
... 1013;reason="Significant time skew detected during authentication";source="server6.contoso.com" ...
The fourth sample event log message (1010) occurs due to certificate validation problems between two servers. This can happen because the servers are in different organizations and do not have common trust for each other's certificates, the other server's certificate isn't valid due to its name, the certificate has expired, or the remote server isn't using a server certificate. In this case, the problem is actually listed in the ErrorType field, which explains that Transport Layer Security (TLS) or Mutually Authenticated TLS (MTLS) was not able to be negotiated quickly enough. This is likely due to some networking problems (slow networks or firewall issues) or because the server went offline during the connection. Steps to resolve the problem would be to explore the remote server and the network path to that server to understand what caused the delay.
Type:
WarningEventID:
1000Source:
CommunicatorDescription:
... 1013;reason="Certificate trust with next-hop server could not be established";source="sipfed.contoso.com";ErrorType="The peer did not respond to TLS or MTLS negotiation in a timely manner" ...
The fifth sample event log message shows an unexpected failure (diagnostic event 2) with an error code that doesn't provide much explanation. The Windows calculator (calc.exe) can often be used to convert error codes into hexadecimal or decimal numbers so that a tool, such as winerror.exe or lcserror.exe, can provide a descriptive error message. Sometimes, error codes are printed in decimal numbers and need to be converted to hexadecimal numbers to display properly. At other times, a Windows system error message is wrapped with decorations and presented as a hexadecimal error code. If the tools are unable to present an error name and text, try using only the last four hexadecimal digits because it can take a bit of fooling around with calc.exe in scientific mode to back out some error codes. However, for this example, converting the error isn't necessary because it can be interpreted when passed directly to lcserror.exe: "SIPPROXY_E_AUTHENTICATION_LEG <--> The request processing was stopped to continue authentication exchange with the client through challenge response." This error message explains that sometimes the client and server get out of sync with authentication and, though the client thinks its current credentials are still valid, the server needs to prompt for authentication and cannot continue processing the request. In general, this is a harmless warning because the client will establish new credentials and the user is unlikely to even notice a problem.
Type:
WarningEventID:
1000Source:
CommunicatorDescription:
... 2;reason="See response code and reason phrase";source="sipfed.contoso.com";HRESULT="C3E93D81 (SIPPROXY_E_AUTHENTICATION_LEG)" ...
Examples of normal Office Communications Server event logs are presented to avoid confusion, along with examples of problematic logs that might arise. Additional advice on how to resolve problems is presented with each log.
When working through problems in the topology or on specific servers, a good way to minimize the scope or to quickly identify errors is to use the Validation Wizard, which is available as a tool in the Office Communications Server 2007 administration MMC snap-in. This tool checks configuration against connectivity and does the basic validation checks to point out and avoid misconfigurations in Office Communications Server 2007 settings or in network and certificate configurations that relate to the server.
For anyone who has looked at the number of events created in the event logs on a server as part of starting up, it can be an overwhelming experience to understand what information is actually useful and what can be ignored. In general, all informational logs can safely be ignored, but examples are provided here to explain some of the detailed information available within them. Warning and error messages are also commonly seen, and some are explained here to alleviate administrator concerns about warnings or errors that aren't fully understood. All server logs will show up in the Office Communications Server Event Log, which can be loaded as a snap-in in the MMC or launched directly as eventvwr.exe, as shown in Figure 16-2.
Examples of Normal Informational Events The following startup event log message (Event ID 14426) simply identifies that the registry key was enabled to log message bodies, as shown in Figure 16-3.
This isn't a great cause for concern. However, depending on your corporate security policy, it might be a privacy violation to log this type of information in tracing logs; therefore, this type of logging should be used only for short intervals during detailed diagnostic sessions if message bodies are actually required (typically, they are not). Additional information about enabling message body tracing is available in the Using Client and Server Trace Logs section later in this chapter.
Type:
InformationalEventID:
14426Source:
OCS Protocol StackDescription:
Configuration information was loaded from the registry. The registry value is EnableMessageBodyTracing with a DWORD value of 1.
The next event log message (56001) is similar to several of the first messages in the 56000s, which identify that default settings are being used because they weren't actually stored in the Windows Management Instrumentation (WMI) store. None of these messages are of much concern other than to record the current settings that the server will be using in case of future debugging or to work with Product Support Services (PSS).
Type:
InformationalEventID:
56001Source:
OCS WMI Event ProviderDescription:
The WMI event provider parameter 'initial due time' was not set or failed to be retrieved. Use the default value. Initial due time: 300000 milliseconds
Several additional settings are shown in other server events during startup, and these are included in the following examples to demonstrate some of the configuration information available and identify the default settings for some potentially interesting detailed configurations. The first event log message shows some of the detailed default configuration used by the SIP stack.
Type:
InformationalEventID:
14413Source:
OCS Protocol StackDescription:
Proxy configuration changes were applied successfully. Request compression on outgoing server to server connections [1=yes,0=no]: 0 Accept compression requests from server connections [1=yes,0=no]: 1 Accept compression requests from client connections [1=yes,0=no]: 1 Maximum number of server-to-server connections: 1024 Maximum number of client-to-server connections: 5000 Maximum outgoing TLS connections to the same server: 4 Maximum size of content body for client to server connections: 131072 bytes Maximum size of content body for server to server connections: 5120000 bytes
The second event log message shows some of the detailed default configuration used by the User Services SIP registrar.
Type:
InformationalEventID:
30926Source:
OCS User ServicesDescription:
User Services Global Settings configuration applied: MinRegistrationExpiry: 300. DefRegistrationExpiry: 600. MaxRegistrationExpiry: 900. MinPresenceSubscriptionExpiry: 1200. DefPresenceSubscriptionExpiry: 28800 MaxPresenceSubscriptionExpiry: 43200 MinRoamingDataSubscriptionExpiry:900 DefRoamingDataSubscriptionExpiry: 43200 MaxRoamingDataSubscriptionExpiry: 86340 NumOfDevicesPerUser: 8 MaxSubscriptionPerUser: 200 AllowPollingForPresence: true EnableBENotify: true UserDomainList: false
The third event log message shows additional default configuration used by User Services, with the most interesting probably being the maximum contacts that a given user is allowed to maintain.
Type:
InformationalEventID:
30934Source:
OCS User ServicesDescription:
Server specific User Services Settings configuration applied: RedirectMethods: AsAppropriate MaintenanceHourOfDay: 2 MaxContactsPerUser: 150.
As previously mentioned, most informational logs are useful only for backtracking configuration settings of the server after problems are detected or for determining whether the configuration is the same as it was expected to be (based on setting registry keys, WMI configuration, or changes in the user interface [UI]).
Examples of Normal Warning and Error Events Several warning and error event log messages can show up during initial startup of the server, which isn't really a cause for concern. The first event log message warns that a critical application or service has not yet registered with the server. This can occur due to longer startup processing times when the server is booted. As long as the server starts successfully and a subsequent information log identifies that the application finished startup, this log is not of much concern.
Type:
WarningEventID:
30232Source:
OCS Applications ModuleDescription:
Office Communications Server startup is pending. Some configured critical applications have not yet registered. Resolution: For script only applications ensure that the application is available in the path specified in the MMC, and that no errors are reported by the Office Communications Server Script-Only Applications Service. For non-script only critical applications ensure that they are configured to register on server startup.
The next event log message warns that no Exchange Unified Messaging (UM) servers could be found in the domain. This message is a concern only if servers were installed, but this is not a mandatory part of the installation.
Type:
WarningEventID:
44028Source:
OCS Exchange Unified Messaging RoutingDescription:
Exchange Unified Messaging Routing Application did not find any Exchange UM servers. Cause: No Exchange UM servers are configured for SIP traffic or the RTCSRV service account does not have permission to read Exchange objects in Active Directory. Resolution: Configure one or more Exchange UM servers to handle SIP traffic.
The following event log message has an error due to the failure of a service (in this case, the Address Book Server [ABS]) to start up and respond within an expected time frame. Again, this is something that can happen while the server is starting up shortly after the machine boot process when many processes are loading or on slower servers on which a lot of programs are running. As long as subsequent event log messages show that the service eventually started, this message isn't of concern.
Type:
ErrorEventID:
12330Source:
OCS ServerDescription:
Failed starting a worker process. Process: 'C:Program FilesOffice Communications Server 2007ServerCoreABServer.exe' Exit Code: C3E8302D (The worker process failed to initialize itself in the maximum allowable time.). Cause: This could happen due to low resource conditions or insufficient privileges. Resolution: Try restarting the server. If the problem persists contact Product Support Services. ...
The next event log message warns that delta files aren't being generated. However, this is normally the case for small servers or test systems because there just aren't enough users. This message can largely be ignored and treated as an informational event log.
Type:
WarningEventID:
21012Source:
OCS Address Book ServerDescription:
A delta file would be too large in relation to the full file it is based on. The delta file will not be generated. Path: C:Program FilesOffice Communications Server 2007Web Components Address Book FilesD-0909-0926.lsabs Full File Records: 5 Delta File Records: 3 Cause: A delta file is generated in memory. If the number of new/updated contacts plus the number of deleted contacts is greater than a configured percentage of the number of contacts in the base file, the delta file is not generated because it would be quicker for the client to just download the full file. Resolution: None needed.
Many possible ways exist in which server configurations, network conditions, and invalid requests can create error conditions for the server. A few examples are included in this section both for reference and to explain what is happening.
This first error event occurs because the server didn't have a static IP address. Office Communications Server must be run on a static IP address because doing so will provide the most reliable service if its IP address doesn't change. The SIP records and uses IP addresses during routing, and a change in this address can cause failures for messaging or other active sessions. Under normal conditions, the Dynamic Host Configuration Protocol (DHCP) server (which is in charge of handing out IP addresses) does not change the IP address that it originally assigned to the Office Communications Server, but it can happen, especially across reboots or service windows. The following log helps identify that the IP address on which the server was listening no longer exists, which is an indicator that DHCP was in use.
Type:
ErrorEventID:
14336Source:
OCS Protocol StackDescription:
A configured transport has failed to start. Transport TLS has failed to start on local IP address 192.168.3.104 at port 5061.Cause: This can occur due to a configuration error, low system resources or other programs using the specified port. It can also happen if the IP address specified has become invalid. Resolution: Ensure that the IP address specified is valid and that no other program is listening on the specified port.
The second event shows a warning that is logged to alert the administrator that a server certificate (used by Office Communications Server for authentication and encryption with TLS) is close to expiring. Server certificates are valid only for a certain period of time after they are issued and then need to be replaced by a newly issued certificate. This warning helps administrators act before the certificate expires so that an unplanned outage and diagnostics can be avoided.
Type:
WarningEventID:
14342Source:
OCS Protocol StackDescription:
The certificate configured for secure transport will expire soon. Transport TLS on 192.168.1.103:5061 will expire on Thursday, June 05, 2007 at 09:35 Local Time. The certificate serial number is attached for reference. <attached binary data shows the certificate's serial number>
The third event shows an error that will be logged if the administrator doesn't update the certificate, which will prevent the server from starting successfully. The solution for this and the previous log are the same—install a valid certificate for use by the server.
Type:
ErrorEventID:
14341Source:
OCS Protocol StackDescription:
The certificate configured for secure transport has expired or is not yet valid. Transport TLS on 192.168.1.103:5061 will not start. The certificate serial number is attached for reference. Resolution: Renew the certificate or replace the transport with a new one. <attached binary data shows the certificate's serial number>
Many possible types of events could arise, and all of them cannot be presented here. Use of winerror.exe and/or lcserror.exe from the Windows Server 2003 or Office Communications Server 2007 Resource Kit Tools can help explain any error codes that you come across. However, almost all of these events contain relatively detailed information to help point out the problem, identify specific data related to the problem, and offer solutions or items to investigate.
Please see the Additional Resources section at the end of this chapter for instructions on obtaining the Windows Server 2003 Resource Kit Tools.
3.142.194.55