When TCP sends a packet or a group of packets (refer to the How it works… section later in this recipe), it waits for an acknowledgment to confirm that these packets were accepted. Retransmissions happen when a packet has not arrived, or when its acknowledgment has not arrived on time. There can be various reasons for this, and finding the reason is the goal of this recipe.
When the network becomes slow, one possible reason is retransmissions. Connect Wireshark with a port mirror to the suspect client or server, and watch the results.
In this recipe, we will see some common problems that we can detect with Wireshark, and what they indicate.
Let's get started:
When you capture packets over a communication line, server interface, link to the Internet, or any other line, you can have traffic from many IP addresses, many applications, and even specific procedures on every application, for example, accessing a specific table in a database application. The important thing here is to locate the TCP connections on which the retransmissions happen.
Apply the display filter expert.message == "Retransmission (suspected)", and you will get all retransmissions in the capture file.
In the following screenshot, you can see many retransmissions, spread between many servers, with destination port 80 (HTTP). We can also see that 10.0.0.5 sends the retransmissions, so either packets were lost on the way to the Internet, or acknowledgments were not sent back on time from the web servers.
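To see where retransmissions concentrate, it can help to count them per connection outside the GUI. The following is a minimal sketch, assuming the retransmitted packets were exported as tab-separated source address, destination address, and destination port (for example with tshark -r capture.pcap -Y "tcp.analysis.retransmission" -T fields -e ip.src -e ip.dst -e tcp.dstport, where capture.pcap is a placeholder file name). The function name and the sample rows are hypothetical and are not taken from the capture shown in this recipe.

```python
from collections import Counter


def count_retransmissions(lines):
    """Count retransmissions per (source, destination, destination port)."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) != 3 or not all(fields):
            continue  # skip blank or incomplete rows
        src, dst, dport = fields
        counts[(src, dst, int(dport))] += 1
    return counts


# Hypothetical sample rows (documentation addresses), for illustration only.
sample = [
    "10.0.0.5\t203.0.113.10\t80",
    "10.0.0.5\t203.0.113.10\t80",
    "10.0.0.5\t198.51.100.7\t80",
]

for (src, dst, dport), n in count_retransmissions(sample).most_common():
    print(f"{src} -> {dst}:{dport}  {n} retransmissions")
```

If one row dominates the output, the problem is probably tied to a single connection; if the counts are spread over many destinations, the shared line is the more likely suspect.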
Well, obviously something is wrong on the line to the Internet. How can we know what it is?
If all retransmissions are on a single IP address, with a single TCP port number, the likely cause is a slow application. We can see this in the following screenshot:
For retransmissions on a single connection, perform the following steps:
To isolate the problem, perform the following steps:
An indication of a busy communication line is a straight line very close to the maximum bandwidth of the line. For example, if you port mirror a 10 Mbps communication line and see in the IO graph a straight line close to 10 Mbps, this is a good indication of a loaded line. A non-busy communication line will have many ups and downs, with peaks and empty intervals.
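The same check can be approximated numerically. The following is a rough sketch, assuming you have a list of (timestamp in seconds, packet size in bytes) pairs from the capture and know the nominal line rate. The function names and the 90 percent and 80 percent thresholds are illustrative assumptions, not values from Wireshark.

```python
def utilization_per_second(packets, link_bps):
    """Return {second: fraction of link capacity used}.

    packets is an iterable of (timestamp_seconds, size_bytes) pairs.
    Seconds in which no packet was seen are not included in the result.
    """
    bits = {}
    for ts, size in packets:
        sec = int(ts)
        bits[sec] = bits.get(sec, 0) + size * 8
    return {sec: b / link_bps for sec, b in sorted(bits.items())}


def looks_saturated(utilization, threshold=0.9, min_share=0.8):
    """True if at least min_share of the observed seconds reach threshold."""
    if not utilization:
        return False
    busy = sum(1 for u in utilization.values() if u >= threshold)
    return busy / len(utilization) >= min_share


# Hypothetical traffic: 800 packets of 1500 bytes in each of 5 seconds
# on a 10 Mbps line, which is 9.6 Mbps, or 96 percent utilization.
busy_line = [(s + i / 800, 1500) for s in range(5) for i in range(800)]
usage = utilization_per_second(busy_line, 10_000_000)
print(usage, looks_saturated(usage))
```

A flat series of values close to 1.0 corresponds to the straight line in the IO graph, while a lightly used line produces low and uneven values.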
10.1.1.200 (10.90.30.12 is sending most of the retransmissions, so it can be that 10.1.1.200 responds slowly).
2350), and the server changed the port to 1972, so it can be a slow non-responsive FTP software (that was the problem here eventually).
An important thing to watch for in TCP retransmissions is if the retransmissions have any pattern that you can see.
In the following screenshot, we see that all retransmissions are coming from a single connection, between a single client and NetBIOS Session Service (TCP port 139) on the server.
Looks like a simple server/application problem, but when we look at the packet capture pane, we see something interesting (refer to the following screenshot):
The interesting thing is that when we look at the pattern of retransmissions, we see that they occur cyclically, about every 30 ms. The time format here is Seconds Since Previous Displayed Packet, and the time scale is in seconds.
The problem in this case was a client that performed a financial procedure in the software that caused the software to slow down every 30-36 ms.
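A cyclic pattern like this can also be checked from the timestamps of the retransmitted packets. The following is a small sketch, assuming a list of capture timestamps in seconds. The function names, the 0.2 variation limit, and the sample timestamps are illustrative assumptions, with the sample chosen only to mimic gaps of 30 to 36 ms.

```python
from statistics import mean, pstdev


def retransmission_intervals(timestamps):
    """Return the gaps, in seconds, between consecutive retransmissions."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(timestamps, max_cv=0.2):
    """True if the gaps are regular.

    Regularity is measured with the coefficient of variation
    (standard deviation divided by mean) of the gaps.
    """
    gaps = retransmission_intervals(timestamps)
    if len(gaps) < 3:
        return False  # too few samples to call it a pattern
    average = mean(gaps)
    if average <= 0:
        return False
    return pstdev(gaps) / average <= max_cv


# Hypothetical timestamps with gaps of roughly 30-36 ms.
cyclic = [0.000, 0.030, 0.063, 0.099, 0.130, 0.164]
print(looks_periodic(cyclic))
```

Regular gaps point toward something periodic in the application or on the host, rather than random loss on the line.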
Another reason for retransmissions can be a client or a server that does not respond to requests. In this case, you will see five retransmissions, with an increasing time difference between them. After these five consecutive retransmissions, the sending side considers the connection lost (in some cases, a reset will be sent to close the connection, depending on the software implementation). After the disconnection, two things may happen:
In the following screenshot we can see a case in which a new connection is opened:
TCP is a protocol that is quite tolerant of delays, as long as the delay does not vary. When you have variations in delay, you can expect retransmissions. The way to find out if this is the problem is as follows:
The bottom line with TCP retransmissions is that retransmissions are a natural behavior of TCP as long as we don't have too many of them. Degradation in performance will start when the retransmissions are around 0.5 percent, and disconnections will start around 5 percent. It also depends on the application and its sensitivity to retransmissions.
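These rules of thumb can be turned into a quick check. The following is a minimal sketch that uses the 0.5 percent and 5 percent figures quoted above; the function names and the returned labels are illustrative, and the thresholds should be adjusted to the sensitivity of your application.

```python
def retransmission_rate(retransmitted, total):
    """Return retransmitted packets as a percentage of all TCP packets."""
    if total <= 0:
        raise ValueError("total must be a positive packet count")
    return 100.0 * retransmitted / total


def assess(rate_percent):
    """Classify a retransmission rate using the rules of thumb above."""
    if rate_percent >= 5.0:
        return "disconnections likely"
    if rate_percent >= 0.5:
        return "performance degradation likely"
    return "normal"


# Hypothetical counts: 12 retransmissions out of 1000 TCP packets.
rate = retransmission_rate(12, 1000)
print(f"{rate:.2f}% -> {assess(rate)}")
```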
When you see retransmissions on a communication link (to the Internet, on a server, between sites, or any other link), perform the following:
Let's look at the regular operation of TCP, and at the causes of problems that might occur.
One of the mechanisms that is built into TCP is the retransmission mechanism. This mechanism enables the recovery of data that is damaged, lost, duplicated, or delivered out of order.
This is achieved by assigning a sequence number to every transmitted byte, and expecting an acknowledgment (ACK) from the receiving party. If the ACK is not received within a timeout interval, the data is retransmitted.
At the receiver end, the sequence numbers are used to verify that the information arrives in the order in which it was sent. If it does not, the receiver rearranges it into its original order.
This mechanism works as follows:
The client 10.0.0.7 is downloading a file from the server 62.219.24.171. The file is downloaded via HTTP (the Wireshark window was configured to show tcp.seq and tcp.ack from the Edit | Preferences columns configuration, as described in Chapter 1, Introducing Wireshark).
The server 62.219.24.171 sends a packet with the sequence number 120185105, and then a packet with the sequence number 120186557. When receiving these two packets, the client 10.0.0.7 tells the server to send the next packet with ACK = 120188009, after which the server sends the packet with the sequence number 120188009, then the packet with the sequence number 120189461, and so on.
You can see a diagram for this.
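The arithmetic behind these numbers is simple: the ACK carries the sequence number of the next byte the receiver expects. The following is a short sketch of that calculation. The function name is hypothetical, and the payload length of 1452 bytes is not stated in the text; it is inferred from the difference between the consecutive sequence numbers in the example.

```python
def next_expected_ack(seq, payload_len):
    """ACK number that acknowledges a segment of payload_len bytes at seq.

    TCP sequence numbers are 32 bits wide, so the result wraps at 2**32.
    """
    return (seq + payload_len) % 2**32


payload_len = 1452  # inferred: 120186557 - 120185105

# The two segments sent by the server in the example above.
ack = None
for seq in (120185105, 120186557):
    ack = next_expected_ack(seq, payload_len)

print(ack)  # 120188009, the ACK the client sends back
```

The server's next segment then starts at that ACK value, and the segment after it starts 1452 bytes later, at 120189461.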
When a packet acknowledgment is lost, or when an ACK does not arrive on time, the sender will perform two things:
In the next screenshot we see an example of retransmissions that reduce the sender throughput (red thin lines added for clarity):
TCP is tolerant of high delays, as long as they are reasonably stable. The algorithm that defines the TCP behavior under delay variations (among other things) is called the Van Jacobson algorithm (after the name of its inventor). The Van Jacobson algorithm enables tolerance of up to 3-4 times the average delay, so if, for example, you have a delay of 100 ms, TCP will tolerate delays of up to 300-400 ms as long as they do not change frequently.
You can check the Van Jacobson algorithm at http://ee.lbl.gov/papers/congavoid.pdf.
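To get a feel for how the retransmission timeout follows the measured delay and its variation, here is a simplified sketch of the round-trip time estimator in the form standardized in RFC 6298, which builds on Jacobson's work. It is an illustration under stated assumptions, not a reproduction of any particular TCP stack: the class name is hypothetical, and the minimum timeout and clock granularity terms of the RFC are left out.

```python
class RtoEstimator:
    """Smoothed RTT and RTT variation, combined into a retransmission timeout."""

    def __init__(self, alpha=1 / 8, beta=1 / 4, k=4):
        self.alpha = alpha   # gain for the smoothed RTT
        self.beta = beta     # gain for the RTT variation
        self.k = k           # multiplier applied to the variation
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        """Feed one RTT sample (seconds) and return the new timeout."""
        if self.srtt is None:
            # First measurement: variation starts at half the sample.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variation is updated before the smoothed RTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return self.rto()

    def rto(self):
        """Current timeout: smoothed RTT plus k times the variation."""
        if self.srtt is None:
            raise ValueError("no RTT sample has been recorded yet")
        return self.srtt + self.k * self.rttvar


# Hypothetical samples: a stable 100 ms delay followed by a 400 ms spike.
estimator = RtoEstimator()
for sample in [0.1] * 10 + [0.4]:
    print(f"rtt={sample:.3f}s  rto={estimator.update(sample):.3f}s")
```

With a stable delay the timeout settles close to the measured delay, and a sudden change in delay widens it again, which is why frequent delay variations lead to retransmissions.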