How to troubleshoot a problem

We cannot troubleshoot an issue just by reading the user's comments or the problem statement. To troubleshoot an issue, we have to narrow the problem down to its root cause and then fix it.

Many web administrators ask the same question: how do we know that the system has a particular problem?

The solution will be found if you investigate the problem along the correct path. Also, as you come across more problems in your career, you can correlate them and solve new ones faster. Practically speaking, it's impossible to teach troubleshooting, as it comes from your own experience and your interest in solving the problem. Here, we discuss one common problem, application slowness, which occurs in every environment and which every web administrator has to face at some point in his/her career.

Slowness issue in applications

Let's take a real-world situation where users complain about the performance of the application. The application runs on an enterprise setup that combines the Apache HTTP server as the frontend, Tomcat 7 as the servlet container, and an Oracle database as the backend database server.

Issue:

Let's discuss one of the common issues of middleware applications, which is very difficult for the administrator to solve. This issue is called slowness of application, where users complain that the application is running slow. It's a very critical problem from the administrator's point of view, as slowness can be caused by any component of the web application, such as the OS, DB, web server, network, and so on.

Until we find out which particular component is causing the problem, the slowness will persist and, from the user's point of view, the application will not run in a stable manner. The following figure shows the typical web infrastructure request flow for a web application:


How to solve slowness issues in Tomcat 7

Slowness in the application can be caused by any component, so it is best practice to start troubleshooting from the user end.

User end troubleshooting

Perform the following steps to troubleshoot:

  1. Try to access the application from the user's browser and check how much time it takes to load the application page.
  2. Check the ping response of the server, for example, abc.com, from the user's machine using the ping command. If you get an appropriate response, it means the connectivity between the user's machine and the application server is working fine.
    ping abc.com
    
  • The previous screenshot indicates the ping response for abc.com. There are some important points to keep in mind while monitoring the ping status:
    • The number of packets sent and received should be equal. In the previous screenshot, both counts are 4. If fewer packets are received than sent, there is some issue within the network.
    • There should be no packet loss. Also, the average response time should not be high.
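
The two ping checks above can also be scripted. The following is a minimal sketch, not a definitive implementation: the function names ping_loss and ping_verdict are hypothetical, and it assumes the summary line reads like "0% packet loss" (Linux) or "(0% loss)" (Windows).

```shell
# Print the packet-loss percentage found in ping summary text read from stdin.
# Assumption: the summary contains "N% packet loss" (Linux) or "(N% loss)" (Windows).
ping_loss() {
  grep -oE '[0-9.]+% (packet )?loss' | head -n 1 | cut -d'%' -f1
}

# Turn the loss percentage into a short verdict.
ping_verdict() {
  loss=$(ping_loss)
  case "$loss" in
    "")         echo "unknown" ;;                  # no summary line was found
    0|0.0|0.00) echo "ok" ;;                       # nothing was lost
    *)          echo "degraded (${loss}% loss)" ;; # some packets were lost
  esac
}

# Example (Linux): ping -c 4 abc.com | ping_verdict
```

Because the functions only read text from standard input, they can be tried on a saved ping output before being pointed at a live host.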

Note

Many external sites disable the ping response for their nodes. This doesn't mean the system is down. In that case, test the application port with telnet instead, using the command telnet hostname port.

Note

Windows 7 does not enable the telnet client by default; we need to turn it on as a Windows feature first.
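
Where telnet is not available, the same port test can be sketched in the shell. This is only an illustration under stated assumptions: the function name port_status is hypothetical, and it relies on bash (for its /dev/tcp redirection) and on the timeout command from coreutils being installed.

```shell
# Print "open" if a TCP connection to HOST:PORT succeeds within 3 seconds,
# otherwise "closed" (which also covers filtered or unreachable ports).
# Assumption: bash and the coreutils timeout command are available.
port_status() {
  host=$1
  port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example: port_status abc.com 80
```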

The previous screenshot shows that the server responds to ping appropriately. That means there are no issues at the user end in terms of the system and network.

Web server troubleshooting

Once we know that there are no issues at the user end, we move to the next level in the application, that is, the web server. Now, we have to dig into the server to check if there are any issues.

Web server issues are most often related to the load on the server, user threads, or mount problems. Let us see how to solve the issue.

  1. Check whether the web server process is running or not. If it is running, check how many processes are running by using the following command. This command will show the number of processes and their status.
    ps -aef |grep httpd
    
    • The previous command lists the processes running for the Apache httpd server (the grep command itself also appears in the output). If there are noticeably more processes than the server normally runs, for example, more than 50, it suggests an issue with the web server such as high CPU utilization, high user traffic, high disk I/O, and so on.
  2. Then, check the CPU utilization and memory status of the system to see if any Apache processes are consuming a lot of CPU by using the following command:
    top|head
    
    • The previous command displays the load average of the machine and the processes that consume the most CPU. The following screenshot shows the output of the previous command. If the load average is high or an Apache process has a high CPU utilization, then it is one of the reasons for slowness in the application; otherwise, we can proceed to the next level.

    Tip

    In such cases, as mentioned earlier, stop all the Apache processes and then restart (recycle) the Apache instance.

  3. The next step is to check the Apache logs and search for errors in the error and access logs. The following screenshot shows the system has started successfully:
    httpd.exe: Could not reliably determine the server's fully qualified domain name, using 10.0.0.3 for ServerName.
    
    • The previous message is only a notification (info) message in the Apache error_log, not an error. It shows that the Apache HTTP server could not determine a fully qualified domain name for itself. This means that in httpd.conf, we have not defined the server name with a fully qualified domain name; for example, we have defined localhost as the server name, whereas the ServerName directive should be set to the fully qualified domain name of the server.

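      Also, there are two commands which are useful for searching for errors in the logs. They are as follows:

      tail -f <log file> | grep -i error
      
    • The previous command follows the log file as it grows and prints only the lines that contain an error. The -i switch makes the match case insensitive, because the Apache error_log writes the level in lowercase, as [error].
      grep " 500 " access_log
      
    • The previous command searches the access log for requests that returned the HTTP status code 500 (internal server error).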

    Note

    If Apache is not generating any logs, the hard drive may have run out of space.

  4. One of the major reasons for running out of space on the mount where the application logs are written is improper log rotation. Use the df command to check the free space on each mount; df stands for disk free and the -h switch makes the output human readable. The syntax of the df command is as follows and the output is shown in the following screenshot:
    df -h
    

    Tip

    If any mount is more than 95 percent full, then reduce the disk utilization; otherwise, services may be disrupted.
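
The 95 percent rule from the previous tip can be checked automatically. The following is a minimal sketch; the function name full_mounts is hypothetical, and it assumes the POSIX output format of df -P, where the fifth column is the usage and the sixth is the mount point.

```shell
# Print the mount points whose usage exceeds LIMIT percent (default 95).
# Reads "df -P" output on stdin; -P keeps every filesystem on a single line.
# Assumption: mount points do not contain spaces.
full_mounts() {
  limit=${1:-95}
  awk -v limit="$limit" '
    NR > 1 {                      # skip the header line
      use = $5
      sub(/%/, "", use)           # "97%" -> "97"
      if (use + 0 > limit) print $6 " " $5
    }'
}

# Example: df -P | full_mounts 95
```

An empty result means that no mount is above the limit.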

If we don't find any error in the previously mentioned components, then we can conclude that there are no issues with the web server.
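
The process count and the access log search from the previous steps can be sketched as two small helpers. The function names count_procs and count_5xx are hypothetical, and count_5xx assumes that the access log uses the common or combined log format, where the status code is the ninth whitespace-separated field.

```shell
# Count the processes whose command name is exactly NAME.
# Reads "ps -e -o comm=" output on stdin, so no grep process is counted by mistake.
count_procs() {
  awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# Count the requests that returned a 5xx status in an access log read from stdin.
# Assumption: common/combined log format (status code in field 9).
count_5xx() {
  awk '$9 ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }'
}

# Examples:
#   ps -e -o comm= | count_procs httpd
#   count_5xx < access_log
```

Matching on the status field instead of the string " 500 " avoids counting a response whose size happens to be 500 bytes.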

Tomcat 7 troubleshooting

In Java-based applications, slowness can have many causes, such as the JVM memory, improper application deployment, incorrect DB configuration, and so on. Let's discuss some basic troubleshooting steps for Tomcat 7:

  1. Check the Java processes for Tomcat and the load average for the instance machine:
    ps -ef |grep java
    
    • The previous screenshot shows the Java processes running in the machine. The ps command only lists the Java processes; the load average, together with the RAM and swap usage, is reported by the top command. The load average gives us some important clues. If you find that the load average is very high, then check which process has a high CPU usage and find out the reason for it.

      The following screenshot shows the output of the top command, trimmed by the head command, on the Tomcat server:

      top|head
      
    • The head command displays the first lines of a file or of another command's output. It is frequently used with the -n switch, where n is the number of lines to display. If -n is not used, it displays 10 lines.
  2. Then check the Tomcat logs, which can be found in TOMCAT_HOME/logs, mainly catalina.out and localhost.yyyy-mm-dd.log. First, confirm that Tomcat started cleanly by listing the INFO messages using the following command:
    grep INFO catalina.out
    
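    • The previous screenshot shows the Tomcat startup in the logs. If there are any errors in the logs, they can be checked using the following command. Tomcat's own default logging marks its errors with the SEVERE level, so it is worth searching for SEVERE and Exception in the same way: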
      grep ERROR catalina.out
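
The log search above can be widened into a single count. This is a minimal sketch; the function name count_log_errors is hypothetical, and the three patterns are an assumption about what the application writes (SEVERE from Tomcat's default java.util.logging setup, ERROR and Exception from typical application frameworks).

```shell
# Count the lines of a Tomcat log (read from stdin) that look like errors.
# Assumption: errors contain SEVERE, ERROR, or Exception somewhere on the line.
count_log_errors() {
  grep -cE 'SEVERE|ERROR|Exception'
}

# Example: count_log_errors < "$TOMCAT_HOME/logs/catalina.out"
```

A count of 0 right after a restart, followed by a growing count under load, points to the application rather than to the startup configuration.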
      

Troubleshooting at the database level

As a web administrator, you usually don't have access to the database servers. However, a web administrator can connect to the DB server externally, without logging into the physical machine, as the administrator has the connection string (credentials for accessing the database). For example, you can telnet to the port on which the DB server is running and check whether the service is listening.

telnet <DB server IP> <port>

If the telnet is successful, then you can verify the following points:

  • Number of database connections: We can always ask our DBA to check the number of connections on the database. If the connection count is high, then we can work with the DBA to reduce the connections on the server.
  • SQL query optimization: We can check with the DBA to see which queries take the most time to execute in the database and ask our developers to optimize those queries. This really helps in improving the performance of the application.
  • Load balancing the database across multiple servers: Another important point which may cause slowness in the application is how the database is load balanced across multiple servers. If the load balancing is not configured correctly, then it may cause slowness in the application. If there is a network delay between the two database servers, then the sync between them may not happen appropriately.
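
Before asking the DBA, we can get a rough idea of the connection count from the Tomcat machine itself. The following is only a sketch: the function name count_db_conns is hypothetical, it assumes the Linux layout of netstat -an (remote address in the fifth column, state in the sixth), and it uses 1521, the default Oracle listener port, unless another port is given. It counts only the connections opened from this machine, not the total seen by the database.

```shell
# Count the ESTABLISHED TCP connections whose remote endpoint uses PORT.
# Reads "netstat -an" output on stdin; PORT defaults to 1521.
# Assumption: Linux netstat columns (field 5 = foreign address, field 6 = state).
count_db_conns() {
  port=${1:-1521}
  awk -v port="$port" '
    $6 == "ESTABLISHED" && $5 ~ (":" port "$") { n++ }
    END { print n + 0 }'
}

# Example: netstat -an | count_db_conns 1521
```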

JVM analysis in the Tomcat instance

There are cases where the JVM memory is overutilized by the application. To view the memory allocation for the JVM instance, you can use the command-line utility, jmap. This command comes with JDK 1.6. It's a Java utility that reports the memory allocation of the Tomcat instance.

[root@localhost logs]# jmap -heap <TOMCAT INSTANCE PID>

Let us discuss how the previous command works. The jmap command collects the JVM memory details, -heap is the switch that tells jmap to collect and display the heap memory footprint, and TOMCAT INSTANCE PID is the process ID of the Tomcat instance for which jmap has to fetch the memory details.

[root@localhost logs]# jmap -heap 10638

The following screenshot shows the output of the jmap command for the previous process ID:

Tip

How to find the process ID

We can find the process ID using the following command:

ps -ef | grep "tomcat instance name" | grep -v grep | awk '{print $2}' | head -1

This command can be described as follows: ps -ef | grep "tomcat instance name" finds all the processes running for the Tomcat instance, grep -v grep drops the grep process itself from that list, awk '{print $2}' prints the process ID (the second column of the ps output), and head -1 displays the first process ID.

The jmap command is present in JAVA_HOME/bin; if you add JAVA_HOME/bin to the path, then you can execute the command from anywhere.


The previous utility gives the entire footprint of the JVM memory and its allocation for the Tomcat instance. The output comprises the following components:

  • Heap configuration
  • Heap usage
  • From space
  • To space
  • Tenured generation
  • Perm generation
  • Eden space

Out-of-memory issues, such as an exhausted perm generation or max heap, are very commonly known issues in the production environment. Check the memory to see whether any of the previous components is utilizing more than 95 percent. If so, then we have to increase the respective parameter, for example, -Xmx for the maximum heap or -XX:MaxPermSize for the perm generation.

Now we have reached the point where we can determine which JVM component is creating the issue for the Tomcat instance. If the memory is working fine, then it is time to generate a thread dump to drill down into the application-level issue.
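
The 95 percent check can be run directly against the jmap output. This is a minimal sketch under stated assumptions: the function name hot_spaces is hypothetical, and it assumes the layout printed by jmap -heap on JDK 6 and 7, where an unindented line names the memory space and an indented line of the form "N% used" follows it.

```shell
# Print every memory space whose usage exceeds LIMIT percent (default 95).
# Reads "jmap -heap <pid>" output on stdin.
# Assumption: space names are unindented, "N% used" lines are indented.
hot_spaces() {
  limit=${1:-95}
  awk -v limit="$limit" '
    /^[^ \t]/ { name = $0; sub(/:[ \t]*$/, "", name) }   # remember the space name
    /% used/  {
      pct = $1
      sub(/%/, "", pct)                                  # "98.0%" -> "98.0"
      if (pct + 0 > limit) print name ": " pct "%"
    }'
}

# Example: jmap -heap 10638 | hot_spaces 95
```

If the function prints nothing, no space is above the limit and the memory is unlikely to be the cause of the slowness.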
