TCP retransmissions – where they come from and why

When TCP sends a packet or a group of packets (refer to the How it works… section later in this recipe), it waits for an acknowledgment that confirms the acceptance of these packets. Retransmissions happen when a packet does not arrive, or when its acknowledgment does not arrive on time. There can be various reasons for this, and finding the reason is the goal of this recipe.

Getting ready

When the network becomes slow, one of the possible reasons is retransmissions. Connect Wireshark, using a port mirror, to the port of the suspicious client or server, and watch the results.

In this recipe, we will see some common problems that we encounter with Wireshark, and what they indicate.

How to do it...

Let's get started:

  1. Start capturing data on the relevant interface.
  2. Go to the Analyze | Expert Info menu.
  3. Under Notes, look for Retransmissions.
  4. Click on the (+) sign to open a list of retransmissions. A single mouse click on any line takes you to that retransmission in the packet capture pane.
  5. Now comes the important question: how to locate the problem.

    Tip

    When you capture packets over a communication line, server interface, link to the Internet, or any other line, you can have traffic from many IP addresses, many applications, and even specific procedures on every application, for example, accessing a specific table in a database application. The important thing here is to locate the TCP connections on which the retransmissions happen.

  6. You can see where the retransmissions come from by:
    • Moving packet by packet in the Expert Info window, and checking which packets each entry takes you to in the packet capture pane (good for experienced users)
    • Configuring the display filter expert.message == "Retransmission (suspected)" in the packet pane, which shows all retransmissions in the capture file (the filter tcp.analysis.retransmission gives a similar result)
    • Applying the filter, and then selecting the Limit to display filter checkbox at the bottom-right corner of the Statistics | Conversations window
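The idea of locating the connections on which retransmissions happen can be sketched outside Wireshark as well. The following is a minimal sketch, not Wireshark's own analysis: it assumes that the packets have already been exported as tuples, and it flags a data segment as a suspected retransmission when the same flow has already sent the same sequence number. The function name, the tuple layout, and the sample addresses are all hypothetical.

```python
from collections import Counter


def retransmissions_per_connection(packets):
    """Count suspected retransmissions per directional TCP flow.

    Each packet is a tuple (src, sport, dst, dport, seq, payload_len).
    A data segment is counted when the same flow has already sent the
    same sequence number. This is a simplification of what a real
    analyzer does.
    """
    seen = set()
    counts = Counter()
    for src, sport, dst, dport, seq, payload_len in packets:
        if payload_len == 0:
            continue  # pure ACKs carry no data that could be retransmitted
        flow = (src, sport, dst, dport)
        key = (flow, seq)
        if key in seen:
            counts[flow] += 1
        else:
            seen.add(key)
    return counts


# Illustrative data only (documentation address ranges).
packets = [
    ("10.0.0.5", 50001, "203.0.113.10", 80, 1000, 500),
    ("10.0.0.5", 50001, "203.0.113.10", 80, 1500, 500),
    ("10.0.0.5", 50001, "203.0.113.10", 80, 1000, 500),  # seq 1000 again
    ("10.0.0.5", 50002, "203.0.113.20", 80, 7000, 300),
    ("203.0.113.10", 80, "10.0.0.5", 50001, 9000, 0),    # pure ACK
]
result = retransmissions_per_connection(packets)
for flow, count in result.most_common():
    print(flow, count)
```

Sorting the flows by count gives the same kind of view as the Conversations window limited to the retransmission filter.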

Case 1 – retransmissions to many destinations

In the following screenshot, you see that we've got many retransmissions, spread between many servers, with destination port 80 (HTTP). What we can also see from here is that the host 10.0.0.5 sends the retransmissions, so either packets were lost on the way to the Internet, or acknowledgments were not sent back on time from the web servers.

[Screenshot: Case 1 – retransmissions to many destinations]

Well, obviously something is wrong on the line to the Internet. How can we know what it is?

  1. From the Statistics menu, open IO Graph.
  2. In this case (case 1), we see that the line is nearly empty, so load on this line is not the cause. The retransmissions are probably caused by errors on the line, or by another, loaded line on the way to the Internet.
  3. You can check packet losses and errors that cause them by logging into the communications equipment or by any SNMP browser (when the SNMP agent is configured on the equipment). Check the following screenshot for reference:
    [Screenshot: Case 1 – retransmissions to many destinations]

Case 2 – retransmissions on a single connection

If all the retransmissions are on a single IP address, with a single TCP port number, the likely cause is a slow application. We can see this in the following screenshot:

[Screenshot: Case 2 – retransmissions on a single connection]

For retransmissions on a single connection, perform the following steps:

  1. Open Conversations from the Statistics menu and select the Limit to display filter checkbox. This lists all the conversations that have retransmissions; in this case, a single conversation.
  2. Choose the IPv4 tab, as shown in the following screenshot, to see which IP addresses the retransmissions come from:
    [Screenshot: Case 2 – retransmissions on a single connection]
  3. Choose the TCP tab, as shown in the following screenshot, to see which port numbers (or applications) the retransmissions come from:
    [Screenshot: Case 2 – retransmissions on a single connection]
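The difference between Case 1 and Case 2 comes down to how the retransmitting connections are spread. The following is a minimal sketch of that distinction, assuming the retransmitting flows are already known as (src, sport, dst, dport) tuples. The function name, the labels it returns, and the port numbers in the sample data are hypothetical.

```python
def classify_spread(flows):
    """Return a rough label describing where retransmissions concentrate.

    flows: iterable of (src, sport, dst, dport) tuples, one entry per
    retransmitted packet.
    """
    unique = set(flows)
    if not unique:
        return "no retransmissions"
    if len(unique) == 1:
        return "single connection"
    destinations = {dst for _src, _sport, dst, _dport in unique}
    if len(destinations) == 1:
        return "single destination"
    return "many destinations"


# Case 1 style: one client retransmitting towards several web servers.
case1 = [
    ("10.0.0.5", 50001, "203.0.113.10", 80),
    ("10.0.0.5", 50002, "203.0.113.20", 80),
    ("10.0.0.5", 50003, "198.51.100.7", 80),
]
# Case 2 style: every retransmission on the same connection
# (port numbers here are illustrative only).
case2 = [("10.90.30.12", 40000, "10.1.1.200", 40001)] * 4

print(classify_spread(case1))
print(classify_spread(case2))
```

A "many destinations" result points towards the line or the network path, while a "single connection" result points towards one application or host.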

To isolate the problem, perform the following steps:

  1. Look at the IO graph, and make sure that the line is not busy.

    Tip

    An indication of a busy communication line will be a straight line very close to the maximum bandwidth of the line. For example, if you have a 10 Mbps communication line, you port mirror it, and see in the IO graph a straight line which is close to the 10 Mbps, this is a good indication of a loaded line. A non-busy communication line will have many ups and downs, peaks and empty intervals.

  2. If the line is not busy, the problem can be on the server with the IP address 10.1.1.200 (10.90.30.12 is sending most of the retransmissions, so it may be that 10.1.1.200 responds slowly).
  3. From the packet pane, we see that the application is FTP-DATA. It is possible that the FTP server works in active mode: we opened a connection on one port (2350), and the server changed the port to 1972. The cause can therefore be slow, non-responsive FTP software (which eventually turned out to be the problem here).
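The busy-line check described in the tip above can be approximated numerically. The following is a minimal sketch, assuming that the capture has been reduced to (timestamp, frame size) pairs and that the line capacity is known. The 90 percent threshold, the 80 percent share of intervals, and all names are assumptions chosen for illustration.

```python
from collections import defaultdict


def utilization_per_second(packets, link_bps):
    """Return {second: fraction of the line capacity used}.

    packets: iterable of (timestamp_seconds, frame_bytes).
    Seconds without any packets do not appear in the result.
    """
    bits = defaultdict(int)
    for timestamp, frame_bytes in packets:
        bits[int(timestamp)] += frame_bytes * 8
    return {sec: b / link_bps for sec, b in sorted(bits.items())}


def looks_busy(utilization, threshold=0.9, min_share=0.8):
    """True when most observed intervals sit close to the line capacity."""
    if not utilization:
        return False
    busy = sum(1 for u in utilization.values() if u >= threshold)
    return busy / len(utilization) >= min_share


LINK_BPS = 10_000_000  # the 10 Mbps line from the tip

# A flat line near capacity: 800 frames of 1500 bytes in every second.
steady = [(sec + i / 800, 1500) for sec in range(5) for i in range(800)]
# Peaks and empty intervals: two short bursts.
bursty = [(0.1, 1500)] * 100 + [(3.2, 1500)] * 700

steady_util = utilization_per_second(steady, LINK_BPS)
bursty_util = utilization_per_second(bursty, LINK_BPS)
print(looks_busy(steady_util), looks_busy(bursty_util))
```

The IO graph remains the quicker way to see this; the sketch only makes the "straight line close to the maximum bandwidth" rule explicit.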

Case 3 – retransmission patterns

An important thing to watch for in TCP retransmissions is whether the retransmissions follow any visible pattern.

In the following screenshot, we see that all retransmissions are coming from a single connection, between a single client and NetBIOS Session Service (TCP port 139) on the server.

[Screenshot: Case 3 – retransmission patterns]

Looks like a simple server/application problem, but when we look at the packet capture pane, we see something interesting (refer to the following screenshot):

[Screenshot: Case 3 – retransmission patterns]

The interesting thing is that when we look at the pattern of retransmissions, we see that they occur cyclically every 30 ms. The time display format here is Seconds Since Previous Displayed Packet, so the values in the Time column are in seconds.

The problem in this case was a client that performed a financial procedure in the software that caused the software to slow down every 30-36 ms.
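A cyclic pattern like this one can also be checked numerically from the timestamps of the retransmissions. The following is a minimal sketch, assuming the timestamps (in seconds) have been exported from the capture. The function name and the 20 percent tolerance are assumptions, and the sample values are illustrative rather than taken from the capture in the screenshot.

```python
from statistics import mean, pstdev


def retransmission_period(timestamps, max_relative_spread=0.2):
    """Return the mean gap between retransmissions if the gaps are regular.

    Returns None when there are too few samples or when the gaps vary by
    more than max_relative_spread (standard deviation divided by mean).
    """
    if len(timestamps) < 3:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average <= 0:
        return None
    if pstdev(gaps) / average <= max_relative_spread:
        return average
    return None


# Gaps of roughly 30 to 35 ms: a regular pattern.
regular = [0.000, 0.030, 0.062, 0.095, 0.130]
# Gaps with no visible rhythm.
irregular = [0.00, 0.01, 0.50, 0.52, 3.00]

period = retransmission_period(regular)
print(period, retransmission_period(irregular))
```

A result other than None suggests something periodic on the client or the server, which is worth correlating with what the application does at that rhythm.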

Case 4 – retransmission due to a non-responsive application

Another reason for retransmissions can be a client or a server that does not answer requests. In this case, you will see five retransmissions, with an increasing time difference between them. After these five consecutive retransmissions, the sending side considers the connection lost (in some cases, a reset will be sent to close the connection, depending on the software implementation). After the disconnection, two things may happen:

  • A SYN request will be sent by the client, in order to open a new connection. What the user will see in this case is a freeze in the application, and after 15-20 seconds it will start to work again
  • No SYN will be sent, and the user will have to run the application (or a specific part of it) again

In the following screenshot we can see a case in which a new connection is opened:

[Screenshot: Case 4 – retransmission due to a non-responsive application]
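The increasing time difference between consecutive retransmissions comes from the sender backing off its retransmission timer after every expiry. The following is a minimal sketch of such a schedule, assuming that the timer doubles each time. The initial timeout of 0.5 seconds is an illustrative assumption; real stacks derive it from measured round-trip times, and the function name is hypothetical.

```python
from itertools import accumulate


def backoff_schedule(initial_rto, retransmissions=5, factor=2.0):
    """Return (waits, offsets) for a run of unanswered retransmissions.

    waits[i]   is the timer value that expires before retransmission i + 1.
    offsets[i] is the time elapsed since the original segment was sent.
    """
    waits = [initial_rto * factor ** i for i in range(retransmissions)]
    offsets = list(accumulate(waits))
    return waits, offsets


waits, offsets = backoff_schedule(0.5)
for number, (wait, offset) in enumerate(zip(waits, offsets), start=1):
    print(f"retransmission {number}: after waiting {wait} s, at t = {offset} s")
```

In a capture, this appears as gaps between the retransmissions of the same segment that roughly double each time.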

Case 5 – retransmission due to delay variations

TCP is a protocol that is quite tolerant of delays, as long as the delay does not vary. When you have variations in delay, you can expect retransmissions. The way to find out if this is the problem is as follows:

  1. The first thing to do is, of course, to ping the destination, and get a first indication of the delay on the communication line. Look at the How it works… section to see what it should be.
  2. Check for the delay variations, which can happen due to the following reasons:
    • A delay can happen due to a non-stable or busy communication line. In this case, you will see delay variations using the Ping command. Usually it will happen on lines with a narrow bandwidth, and in some cases on cellular lines.
    • A delay can happen due to a loaded or inefficient application. In this case, you will see many retransmissions on this specific application only.
    • A delay can happen due to loaded communication equipment (CPU load, buffer load, and so on). You can check this by accessing the communication equipment directly.
  3. Use the Wireshark tools as explained in Chapter 13, Troubleshooting Bandwidth and Delay Problems.

    Tip

    The bottom line with TCP retransmissions is that retransmissions are a natural behavior of TCP as long as we don't have too many of them. Degradation in performance will start when the retransmissions are around 0.5 percent, and disconnections will start around 5 percent. It also depends on the application and its sensitivity to retransmissions.
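The thresholds in the tip above translate into a simple calculation. The following is a minimal sketch that uses the 0.5 percent and 5 percent figures from the tip as defaults. The function names and the returned labels are hypothetical, and, as the tip says, the real impact depends on the application.

```python
def retransmission_rate(retransmitted, total):
    """Return retransmitted packets as a percentage of all TCP packets."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * retransmitted / total


def assess(rate_percent, degrade_at=0.5, disconnect_at=5.0):
    """Map a retransmission rate to the rough expectations from the tip."""
    if rate_percent >= disconnect_at:
        return "disconnections likely"
    if rate_percent >= degrade_at:
        return "performance degradation likely"
    return "normal"


for retransmitted in (12, 80, 600):
    rate = retransmission_rate(retransmitted, 10_000)
    print(f"{rate:.2f}% -> {assess(rate)}")
```

The two counts can be read from the Wireshark status bar: the number of displayed packets with the retransmission filter applied, against the total number of TCP packets.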

Finding what it is

When you see retransmissions on a communication link (to the Internet, on a server, between sites, or any other link), perform the following:

  1. Locate the problem: is it a specific IP address, a specific connection, a specific application, or some other problem?
  2. Check if the problem is because of the communication link, packet loss, or a slow server or PC. Check if the application is slow.
  3. If it is not due to any of the preceding reasons, check for delay variations.
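The delay-variation check in the last step can start from a handful of ping results. The following is a minimal sketch, assuming the round-trip times have been copied from the ping output into a list of milliseconds. The function name and the sample values are hypothetical, and no pass or fail threshold is implied.

```python
from statistics import mean, pstdev


def delay_variation(rtts_ms):
    """Summarize a list of round-trip times given in milliseconds."""
    average = mean(rtts_ms)
    return {
        "mean_ms": average,
        "min_ms": min(rtts_ms),
        "max_ms": max(rtts_ms),
        "stdev_ms": pstdev(rtts_ms),
        "max_over_mean": max(rtts_ms) / average,
    }


stable = delay_variation([100, 102, 98, 101, 99])
unstable = delay_variation([100, 450, 95, 600, 110])
print(stable)
print(unstable)
```

A large standard deviation, or a maximum that is several times the mean, is the kind of variation to investigate further with the steps in Case 5.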

How it works...

Let's see the regular operation of TCP, and the causes of the problems that might happen.

Regular operation of the TCP Sequence/Acknowledge mechanism

One of the mechanisms that is built into TCP is the retransmission mechanism. This mechanism enables the recovery of data that is damaged, lost, duplicated, or delivered out of order.

This is achieved by assigning a sequence number to every transmitted byte, and expecting an acknowledgment (ACK) from the receiving party. If the ACK is not received within a timeout interval, the data is retransmitted.

At the receiver end, the sequence numbers are used to verify that the information arrives in the order in which it was sent. If it does not, the receiver rearranges the data into its original order.

This mechanism works as follows:

  1. At the connection establishment, both sides tell each other what will be their initial sequence number.
  2. When data is sent, every packet has a sequence number. The sequence number indicates the number of the first byte in the TCP payload. The next packet that is sent will have the sequence number of the previous one, plus the number of payload bytes in the previous packet (as shown in the next screenshot).
  3. When a packet is sent, the RTO (Retransmission Timeout) counter starts to count the time from the moment it was sent.

    Tip

    The Retransmission Timeout timer is based on the Van Jacobson congestion avoidance and control algorithm, which basically says that TCP is tolerant of high delays, but not of fast delay variations.

  4. When the receiver receives the packet, it answers with an ACK (Acknowledge) packet. The acknowledgment number in it is the next byte that the receiver expects, which tells the sender to send the next packet. In the following screenshot you will see how it works:
    1. You can see from here that 10.0.0.7 is downloading a file from 62.219.24.171. The file is downloaded via HTTP (the Wireshark window was configured to show tcp.seq and tcp.ack from the Edit | Preferences columns configuration, as described in Chapter 1, Introducing Wireshark).
      [Screenshot: Regular operation of the TCP Sequence/Acknowledge mechanism]
    2. You can see from here that 62.219.24.171 sends a packet with the sequence number 120185105, and then a packet with the sequence number 120186557. After receiving these two packets, the client 10.0.0.7 tells the server to send the next packet with ACK = 120188009, after which the server sends the packet with the sequence number 120188009, then the packet with the sequence number 120189461, and so on.

      The following diagram illustrates this exchange.

    [Diagram: Regular operation of the TCP Sequence/Acknowledge mechanism]
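The numbers in this example are consistent with each sequence number advancing by exactly the number of payload bytes in the previous packet. The following is a minimal sketch that checks the arithmetic. The payload length of 1452 bytes is derived from the difference between the first two sequence numbers, and the assumption that the second packet carries the same number of bytes is inferred from the ACK value; the function names are hypothetical.

```python
def next_sequence_number(seq, payload_len):
    """Sequence number of the segment that follows (32-bit wraparound)."""
    return (seq + payload_len) % 2**32


def cumulative_ack(segments):
    """ACK number a receiver would send after in-order segments.

    segments: list of (seq, payload_len) in arrival order. The ACK stops
    advancing at the first gap.
    """
    ack = segments[0][0]
    for seq, payload_len in segments:
        if seq != ack:
            break  # a segment is missing; keep acknowledging the old point
        ack = next_sequence_number(seq, payload_len)
    return ack


first_seq = 120185105
second_seq = 120186557
payload_len = second_seq - first_seq  # 1452 bytes, derived from the example

ack = cumulative_ack([(first_seq, payload_len), (second_seq, payload_len)])
print(payload_len, ack, next_sequence_number(ack, payload_len))
```

The printed ACK matches the ACK = 120188009 sent by the client, and the following value matches the next sequence number in the example, 120189461.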

What are TCP retransmissions and what do they cause

When a packet or its acknowledgment is lost, or when an ACK does not arrive on time, the sender does two things:

  1. Send the packet again, as described earlier in this recipe.
  2. Decrease the throughput.

    In the next screenshot we see an example of retransmissions that reduce the sender throughput (red thin lines added for clarity):

    [Screenshot: What are TCP retransmissions and what do they cause]
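The drop in throughput after a retransmission can be illustrated with a toy model of the sender's congestion window. The following is a simplified, textbook-style sketch (slow start, congestion avoidance, and a window cut on loss); it is not the exact behavior of any particular TCP stack, and the event names, the starting values, and the function name are assumptions for illustration.

```python
def simulate_cwnd(events, initial_cwnd=1, ssthresh=16):
    """Return the congestion window (in segments) after each event.

    events: sequence of
      "ack"             one round trip in which the whole window was acknowledged
      "fast_retransmit" loss detected through duplicate ACKs
      "timeout"         loss detected through the retransmission timer
    """
    cwnd = initial_cwnd
    history = []
    for event in events:
        if event == "ack":
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)  # slow start: double per round trip
            else:
                cwnd += 1                       # congestion avoidance: add one
        elif event == "fast_retransmit":
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh                     # halve the window
        elif event == "timeout":
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1                            # start again from one segment
        else:
            raise ValueError(f"unknown event: {event}")
        history.append(cwnd)
    return history


history = simulate_cwnd(["ack"] * 5 + ["timeout"] + ["ack"] * 3)
print(history)
```

In this model, a single timeout takes the window from 17 segments back to 1, which is the kind of dip that shows up in a throughput graph after retransmissions.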

There's more...

TCP is tolerant of high delays, as long as they are reasonably stable. The algorithm that defines the TCP behavior under delay variations (among other things) is called the Van Jacobson algorithm, after its inventor. It tolerates delays of up to 3-4 times the average delay; for example, with an average delay of 100 ms, TCP will tolerate delays of up to 300-400 ms, as long as they do not change frequently.
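The way the retransmission timeout follows the measured delay can be sketched with the standard estimator from RFC 6298, which builds on Jacobson's work. The following is a minimal sketch, not the code of any specific operating system: it leaves out the clock granularity term, and the class name and the min_rto parameter are assumptions (min_rto is set to zero in the example so that the effect of the formula is visible).

```python
class RtoEstimator:
    """Smoothed RTT and RTT variance, combined into a retransmission timeout."""

    def __init__(self, alpha=1 / 8, beta=1 / 4, k=4, min_rto=1.0):
        self.alpha = alpha
        self.beta = beta
        self.k = k
        self.min_rto = min_rto
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        """Feed one round-trip time sample (seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variance is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return self.rto

    @property
    def rto(self):
        return max(self.min_rto, self.srtt + self.k * self.rttvar)


estimator = RtoEstimator(min_rto=0.0)
first = estimator.update(0.100)           # first sample: 100 ms
for _ in range(19):
    stable = estimator.update(0.100)      # delay stays at 100 ms
after_spike = estimator.update(0.400)     # one sample jumps to 400 ms
print(round(first, 3), round(stable, 3), round(after_spike, 3))
```

With the first sample, the timeout is three times the measured delay. While the delay stays stable, the timeout shrinks towards the delay itself, and a single spike widens it again. This is why stable delays are tolerated well, while sudden variations lead to retransmissions.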

See also

You can check the Van Jacobson algorithm at http://ee.lbl.gov/papers/congavoid.pdf.
