Chapter 15. IPsec Overhead and Fragmentation

Introduction

Finding out how much overhead IPsec will add to a given packet is not a simple task, because many different factors contribute to the size of an IPsec packet. In the first half of this chapter, we will look at the structure of the GRE, Encapsulating Security Payload, and Authentication Header protocols, the overhead each one involves, and the characteristics of the algorithms that may be used to protect the traffic. We will then look at several ways to derive the plaintext maximum transmission unit (MTU) from the transport MTU and present a table and a set of formulas that make it easy to compute the maximum IPsec overhead for any given situation.

In the second half of this chapter, we will first review fragmentation in IPv4 and IPv6 as well as the concepts of MTU, Path MTU Discovery (PMTUD), and Maximum Segment Size (MSS) tuning. Then, we will look at fragmentation in IPsec and explain how the previous concepts translate to virtual interfaces. We will finish with recommendations on how to avoid the undesirable effects of fragmentation.

This chapter contains a lot of theory. Even though it would have been easier to produce a short chapter with a simple table of algorithm combinations and associated overhead, we strongly feel that it is crucial to build a deep understanding of these topics in order to avoid the many traps of IPsec fragmentation. Performance and packet drop problems due to incorrect MTU/MSS or broken PMTUD are far more common than they should be, and too many of those problems are due to lack of knowledge about how IPsec handles fragmentation.

Considerations regarding legacy crypto map configurations have been intentionally left out. Crypto maps introduce additional complexity with regard to fragmentation that we feel would only add confusion in an already dense chapter. Furthermore, as mentioned in Chapter 4, “IOS IPsec Implementation”, the way forward in IOS VPN is clearly to move away from legacy crypto maps in favor of tunnel interfaces due to their numerous advantages.

Computing the IPsec Overhead

This section describes the various components involved in computing the IPsec overhead.

General Considerations

The following factors may contribute to the IPsec encapsulation overhead:

• the addition of a new IP header (tunnel mode only)

• the GRE encapsulation overhead (GRE over IPsec only)

• the addition of an Authentication Header or Encapsulating Security Payload header and trailer

• the payload structure of the encryption algorithm (Encapsulating Security Payload only)

• the length of the plaintext and the block size of the encryption algorithm (Encapsulating Security Payload only)

• the output size of the integrity algorithm (Encapsulating Security Payload with integrity or Authentication Header only)

• padding required for payload or integrity check value (ICV) alignment

• additional TFC padding

The IPsec-v3 standards (RFCs 4301 to 4303) introduced new concepts such as combined-mode algorithms (which provide both confidentiality and integrity) and Traffic Flow Confidentiality (TFC). With TFC, Encapsulating Security Payload can add extra padding to hide the actual length of the plaintext in a packet; Cisco IOS does not support this padding. TFC also allows dummy packets to be generated in order to shape encrypted traffic flows and mask the pattern of the real traffic. The following example illustrates enabling dummy traffic on Cisco IOS.

Router(config)#crypto ipsec security-association dummy pps ?
  <0-25>  Rate of simulated traffic (in PPS)

Router(config)#crypto ipsec security-association dummy seconds ?
  <1-3600>  Rate of simulated traffic (seconds between packets)

For the sake of simplicity and because IOS does not allow it, we will not cover TFC padding in this chapter; refer to RFC 4303 for details.

Combined-mode algorithms (AES-GCM for example) provide confidentiality and integrity services. A combined-mode algorithm returns both a ciphertext and an integrity check value (ICV) as output, thus a separate integrity algorithm is not needed. Particular combined-mode algorithms (AES-GMAC for example) may provide no encryption and only integrity; those can be used with Encapsulation Security Payload to carry unencrypted but authenticated traffic.

A valid but rare scenario is the combination of Encapsulation Security Payload and Authentication Header in tunnel or transport mode (double encapsulation). Since that scenario can be reduced to having nested Encapsulation Security Payload and Authentication Header security associations, the overhead can be calculated in two steps provided that the order of encapsulation is known (that is, which transform is applied first). Refer to RFC 2401 for detailed considerations regarding nested transforms.

IPsec Mode Overhead (without GRE)

Encapsulation Security Payload and Authentication Header in tunnel mode add an IP header in front of the original packet. The base length of an IP header is 20 bytes for IPv4 and 40 bytes for IPv6. The actual length can be greater if IP options (for IPv4) or extension headers (for IPv6) are added.

With IPv4, the protocol field in the new IP header is set to 50 for Encapsulation Security Payload or 51 for Authentication Header and the Encapsulation Security Payload or Authentication Header is next in the encapsulation chain. With IPv6, the Encapsulation Security Payload or Authentication Header is added as an extension header to the new IP packet.

Table 15-1 shows the IP header overhead of tunnel mode and transport mode (without GRE) for IPv4 and IPv6.

Table 15-1 IPsec Overhead

GRE Overhead

GRE/IPsec (see Chapter 4, “IOS IPsec Implementation”) involves two successive encapsulations: first the original packet within GRE, then the resulting GRE packet within IPsec. During the first step, an IP header (IPv4 or IPv6 depending on the GRE endpoints) and a GRE header are added at the beginning of the plaintext packet. The length of the GRE header depends on the tunnel configuration (for example, presence of the Key field).

The influence of these two additional headers on the total IPsec overhead depends on the IPsec mode used in the second step:

• with transport mode, the IP header from GRE encapsulation is reused by IPsec, and only the GRE header becomes part of the plaintext data;

• with tunnel mode, the IP and GRE headers both become part of the plaintext data, and a new IP header is added by IPsec.

It is important to distinguish the bytes added to the IP header from those added to the plaintext, because the total length of the plaintext can influence the amount of padding required to reach the appropriate block size for encryption (Encapsulating Security Payload with certain algorithms only; see “Encryption Overhead” later in this section). It is also important to remember that using GRE with tunnel mode adds an IP header that serves no purpose when using IPsec on Cisco IOS, except for NAT traversal when multiple nodes sit behind the same PAT device, sharing the same public address but having unique private addresses.

Figure 15-1 shows the order of headers in the output packet for each of the two modes.

Figure 15-1 GRE Overhead

Table 15-2 summarizes the impact of IPsec mode, IPsec transport protocol (IPv4 or IPv6) and GRE mode (over IPv4 or IPv6) on the IP header overhead and the plaintext overhead.

Table 15-2 GRE with IPsec Overhead
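
The rules described above for GRE over IPsec can be sketched as a small function. This is only an illustration: the function name, the parameter names, and the use of base header lengths (20 bytes for IPv4, 40 bytes for IPv6, 4 bytes for GRE plus 4 bytes for an optional tunnel key, with no IP options or extension headers) are our assumptions.

```python
# Base header lengths in bytes (no IPv4 options, no IPv6 extension headers).
IP_HEADER = {"ipv4": 20, "ipv6": 40}
GRE_BASE = 4  # base GRE header
GRE_KEY = 4   # optional Key field


def gre_ipsec_overhead(ipsec_mode, ipsec_ip, gre_ip, tunnel_key=False):
    """Return (ip_header_overhead, plaintext_overhead) in bytes.

    ipsec_ip and gre_ip are "ipv4" or "ipv6". In transport mode the two
    are the same header, so only gre_ip is used.
    """
    gre_header = GRE_BASE + (GRE_KEY if tunnel_key else 0)
    if ipsec_mode == "transport":
        # The IP header added by GRE is reused by IPsec;
        # only the GRE header becomes part of the plaintext.
        return IP_HEADER[gre_ip], gre_header
    if ipsec_mode == "tunnel":
        # IPsec adds a new IP header; the GRE IP header and the
        # GRE header both become part of the plaintext.
        return IP_HEADER[ipsec_ip], IP_HEADER[gre_ip] + gre_header
    raise ValueError(f"unknown IPsec mode: {ipsec_mode!r}")


print(gre_ipsec_overhead("transport", "ipv6", "ipv6", tunnel_key=True))
print(gre_ipsec_overhead("tunnel", "ipv4", "ipv4"))
```

Keeping the two values separate matters because only the plaintext overhead influences the amount of encryption padding.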


Note

In this chapter we consider only the case of GRE over IPsec. The opposite scenario, IPsec over GRE, is a rare but valid order of encapsulation that may be used in some specific designs. In that case, the IPsec overhead does not depend on the presence of GRE, since the GRE encapsulation takes place after IPsec encapsulation; thus the total overhead is the sum of the (outer) GRE overhead and the (inner) IPsec overhead.

• The IPsec and the GRE transport protocols can differ when the IPsec and GRE tunnel endpoints are different. For example, IPsec tunnel mode of IPv4 with GRE/IPv6 or IPsec tunnel mode of IPv6 with GRE/IPv4.

• The IP header overhead refers to the overhead of the IP header in front of the IPsec header, which depends only on the IP transport type (IPv4/IPv6) and is independent of IPsec transport or tunnel mode.

• The plaintext overhead refers to the extra bytes protected by IPsec in addition to the original IP packet.


Encapsulating Security Payload Overhead

Encapsulation Security Payload adds both a header and a trailer around the encapsulated packet. Figure 15-2 shows the general structure of an Encapsulation Security Payload packet (outer IP header not shown).

Figure 15-2 Encapsulating Security Payload Format

The Encapsulation Security Payload header contains two 32-bit (4-byte) values, the SPI and sequence number. The payload data that follows is the result of the encryption algorithm applied to the plaintext packet and may contain additional data on top of the ciphertext, for example an Initialization Vector (IV). The structure of the payload data is only relevant to the specific encryption algorithm that is used and is transparent to Encapsulation Security Payload.

The Encapsulating Security Payload trailer contains two 8-bit (1-byte) values: the Pad Length, which contains the number of padding bytes (see below), and the Next Header field, which contains the IP protocol number for the packet contained within the payload field. Those two fields are encrypted along with the payload data and are always the last 2 bytes of the encrypted portion of the Encapsulating Security Payload packet.

If an integrity algorithm is part of the transform set, the trailer also contains an ICV produced by the algorithm.

The variable-length padding (0 to 255 bytes) that precedes the trailer has two purposes:

• adjusting the plaintext (which includes the Pad Length and Next Header fields) to a multiple of the block size required by the block cipher used for encryption (if applicable)

• aligning the start of the ICV on a 32-bit (4-byte) boundary

The first item may not be required if the cipher operates as a stream cipher or if it allows for any plaintext input size without padding (like AES-GCM for example).

Based on this, the Encapsulation Security Payload overhead will be:

• 8 bytes for the header

• a certain amount of payload data (length depending on the encryption algorithm and the size of the plaintext packet, see “Encryption Overhead” later in this section)

• a certain amount of padding (length depending on the amount of payload data)

• 2 bytes for the fixed trailer fields

• a certain amount of ICV data (length depending on the integrity algorithm, see “Integrity Overhead” later in this section)

We will therefore need to examine the overhead for the various encryption and integrity algorithms before we can calculate the exact Encapsulation Security Payload overhead. Also note that the plaintext may include a GRE header and an IP header (see “GRE Overhead” earlier in this section).

When using a combined-mode algorithm (like AES-GCM), the ICV data may be accompanied by additional data such as the IV for the algorithm (see “Combined-mode Algorithm Overhead” later in this section).

Authentication Header Overhead

Authentication Header only adds a header in front of the encapsulated packet. Figure 15-3 shows the structure of an Authentication Header packet (outer IP header not shown).

Figure 15-3 Authentication Header Format

The Authentication Header contains two 8-bit (1-byte) values: a Next Header field with the same meaning as in Encapsulation Security Payload and a Payload Length field that indicates the length of the Authentication Header (not the payload data length as the name seems to imply; see RFC 4302 for details). The next 16 bits (2 bytes) are reserved and set to zero. Two 32-bit (4-byte) fields for SPI and sequence number follow. The total is thus 12 bytes.

The last part of the Authentication Header is the ICV produced by the integrity algorithm. If the ICV does not terminate on a 32-bit (4-byte, for IPv4 transport) or 64-bit (8-byte, for IPv6 transport) boundary, padding is added after the ICV so that the payload starts on the next boundary.

Based on this, the Authentication Header overhead will be:

• 12 bytes for the fixed-length header fields

• a certain amount of ICV data (length depending on the integrity algorithm; see “Integrity Overhead” later in this section)

• 0 to 7 bytes of padding (depending on the ICV length and the transport protocol)

When a combined-mode algorithm (like AES-GMAC) is used with Authentication Header, the ICV data may be accompanied by additional data such as the IV for the algorithm. This additional data will have to be accounted for in the total overhead.
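
The Authentication Header calculation above can be sketched as follows. The function and parameter names are ours, and the sketch assumes an integrity algorithm that adds only an ICV (no IV) and ignores any tunnel mode IP header.

```python
AH_FIXED = 12  # Next Header, Payload Length, Reserved, SPI, Sequence Number


def ah_overhead(icv_len, transport="ipv4"):
    """Return the Authentication Header length in bytes, padding included.

    The header is padded after the ICV so that the payload starts on a
    4-byte boundary (IPv4 transport) or an 8-byte boundary (IPv6 transport).
    """
    align = 4 if transport == "ipv4" else 8
    padding = -(AH_FIXED + icv_len) % align
    return AH_FIXED + icv_len + padding


# A 12-byte ICV needs no padding with either transport.
print(ah_overhead(12, "ipv4"), ah_overhead(12, "ipv6"))
# A 16-byte ICV (assumed here for illustration) needs 4 bytes of padding over IPv6.
print(ah_overhead(16, "ipv4"), ah_overhead(16, "ipv6"))
```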

Encryption Overhead

The structure of the encrypted payload in Encapsulation Security Payload packets depends on the type and characteristics of the encryption algorithm. Block ciphers work on input and output blocks of a given length: before encryption, the input data must be padded to reach the block size, and after decryption, the padding must be removed to get back the original input data. This is referred to as explicit padding as it is actually carried as part of the encrypted payload, concealed within the ciphertext (Figure 15-4).

Figure 15-4 Ciphertext Explicit Padding

Algorithms that work in cipher block chaining (CBC) or Counter (CTR) mode require that an Initialization Vector (IV) be carried along with the encrypted payload. The IV typically comes first and is followed by the encrypted payload. In CBC mode, the IV length is the same as the block size for the algorithm (note that the block size is not necessarily the same as the key size, for example AES-256 uses a 256-bit key and a 128-bit block). AES-CTR as specified for IPsec in RFC 3686 is an exception: it carries an 8-byte IV.

Table 15-3 lists the block/IV and key size for the most common encryption algorithms, as well as the most recent RFC that defines their use in IPsec. The NULL algorithm (Encapsulation Security Payload without encryption) produces output that is the same as its input, thus no input padding is required.

Table 15-3 Algorithm Block, IV, and Key Sizes

Integrity Overhead

Many integrity algorithms are based on some sort of one-way hash function; a few others use an encryption algorithm (for example, AES-XCBC-MAC-96 is based on 128-bit AES). Different hash functions or algorithms have different output sizes, and the resulting hash is sometimes truncated to a certain length to produce the ICV. For example, the outputs of SHA-1 and MD5 are 160 bits (20 bytes) and 128 bits (16 bytes) long, respectively, but both are truncated to 96 bits (12 bytes) for use in IPsec.

Most origin authentication and integrity algorithms use HMAC, an algorithm defined in RFC 2104 that provides message authentication based on a given hash function and a secret key. The output size of HMAC is the same as that of the underlying hash function, but it may also be truncated to produce a shorter ICV.
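
As a minimal sketch of this truncation, the following uses Python's standard hmac and hashlib modules to compute a full HMAC-SHA-1 value and keep its first 96 bits. The function name, key, and data are placeholders of our own; in IPsec the key comes from the negotiated keying material and the input is the protected portion of the packet.

```python
import hashlib
import hmac


def hmac_sha1_96(key: bytes, data: bytes) -> bytes:
    """Return a 12-byte ICV: HMAC-SHA-1 truncated to its first 96 bits."""
    full = hmac.new(key, data, hashlib.sha1).digest()  # 20 bytes (160 bits)
    return full[:12]


icv = hmac_sha1_96(b"placeholder-key", b"placeholder packet bytes")
print(len(icv))  # length of the truncated ICV in bytes
```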

For the protection of IPsec Security Associations, the secret key is derived from the attributes negotiated in the IKE_SA_INIT exchange:

KEYMAT = prf+(SK_d, Ni | Nr)

where SK_d is generated from the IKE_SA_INIT exchange.

Additionally, at rekey, new IPsec Security Associations are created using the formula:

KEYMAT = prf+(SK_d, g^ir (new) | Ni | Nr)

where g^ir (new) is the shared secret from the ephemeral Diffie-Hellman exchange.
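
The prf+ function used in these formulas is the key expansion function defined for IKEv2 in RFC 7296: it calls the negotiated PRF repeatedly, feeding each output block back into the next call together with a one-byte counter. The following is a minimal sketch, assuming HMAC-SHA-256 as the PRF and using placeholder values for SK_d and the nonces; the function names are ours.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """PRF assumed for this sketch: HMAC-SHA-256."""
    return hmac.new(key, data, hashlib.sha256).digest()


def prf_plus(key: bytes, seed: bytes, length: int) -> bytes:
    """Expand key material: T1 = prf(K, S | 0x01), Tn = prf(K, Tn-1 | S | n)."""
    output = b""
    block = b""
    counter = 1
    while len(output) < length:
        if counter > 255:
            raise ValueError("prf+ is limited to 255 iterations")
        block = prf(key, block + seed + bytes([counter]))
        output += block
        counter += 1
    return output[:length]


# Placeholder inputs; real values come from the IKEv2 exchange.
sk_d = b"\x11" * 32
ni, nr = b"\x22" * 16, b"\x33" * 16
keymat = prf_plus(sk_d, ni + nr, 88)
print(len(keymat))
```

The requested length (88 bytes here) is arbitrary; in practice it depends on the key sizes of the negotiated encryption and integrity algorithms in both directions.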

Table 15-4 lists the most common integrity algorithms, the output size of the hash function they are based on (if applicable), the actual length of the produced ICV, and the most recent RFC that defines their use in IPsec.

Table 15-4 HMAC Base and ICV Output

These integrity algorithms have different input block sizes, and the input data may require padding when computing the ICV. However, this padding is only used during computation and is never sent as part of the packet (it is referred to as implicit padding); therefore, Table 15-4 makes no mention of the input block size since it has no bearing on the integrity overhead in the output packet.

Combined-mode Algorithm Overhead

Since IPsec-v3, encryption and integrity protection with Encapsulation Security Payload can be achieved using a single algorithm called a combined-mode algorithm (which returns both a ciphertext and an ICV), like AES-GCM (Galois Counter Mode) for example.

A particular case of such an algorithm is AES-GMAC, which only provides integrity but is available both as an integrity algorithm (for use in Authentication Header) and a combined algorithm (for use in Encapsulation Security Payload). Both variants require the transmission of an IV; this sets the integrity variant of GMAC aside from other integrity algorithms that only add an ICV to the packet.

Table 15-5 shows the most common combined-mode algorithms, which share a number of characteristics:

• any input size is allowed (no input padding is required)

• the output ciphertext is the same length as the input plaintext

• an IV needs to be added to the Encapsulating Security Payload packet

• AES is the underlying algorithm and the three AES key sizes are supported

Table 15-5 Combined-Algorithm IV, ICV, and Key Sizes

Plaintext MTU

The IPsec overhead does not depend only on the transform set; it also varies with the length of the packet due to the presence of padding. Table 15-6 shows the encrypted packet length and total overhead for selected plaintext packet lengths when using Encapsulation Security Payload tunnel mode, AES-CBC 128 and HMAC-SHA-1-96. The overhead varies between 58 and 73 bytes.

Table 15-6 Varying Overhead for AES-CBC 128 and HMAC-SHA-1-96

Internally, Cisco IOS computes the IPsec overhead by considering the IP MTU on the output interface and finding the largest possible plaintext packet that will fit inside that MTU after encapsulation. Table 15-6 tells us that for an MTU of 1500 bytes, the largest possible plaintext packet is 1438 bytes; therefore the overhead as computed by IOS for these transforms and for an MTU of 1500 is 62 bytes.

This is the best way to compute the highest possible plaintext MTU: any plaintext packet with a length of (1500 – 62) = 1438 bytes or smaller will not require fragmentation. Note that this value is not the true IPsec overhead for the packet, which is (1496 – 1438) = 58 bytes. The disadvantage of this method is that it requires a new computation every time the output IP MTU changes.
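
The figures above can be reproduced with a short calculation. The sketch below is only our illustration of the method, not the IOS implementation; the default parameters correspond to Encapsulating Security Payload tunnel mode over IPv4 with AES-CBC 128 (16-byte IV and block size) and HMAC-SHA-1-96 (12-byte ICV), and all names are hypothetical.

```python
def esp_packet_length(plaintext_len, outer_ip=20, iv_len=16, block=16, icv_len=12):
    """Length in bytes of the encapsulated packet for a given plaintext packet."""
    esp_header = 8  # SPI + sequence number
    trailer = 2     # Pad Length + Next Header
    # Pad so that plaintext + trailer fields is a multiple of the block size.
    padding = -(plaintext_len + trailer) % block
    return outer_ip + esp_header + iv_len + plaintext_len + padding + trailer + icv_len


def plaintext_mtu(transport_mtu, **kwargs):
    """Largest plaintext packet whose encapsulated form fits in transport_mtu."""
    candidate = transport_mtu
    while candidate > 0 and esp_packet_length(candidate, **kwargs) > transport_mtu:
        candidate -= 1
    return candidate


largest = plaintext_mtu(1500)
print(largest, esp_packet_length(largest))  # 1438 1496
overheads = [esp_packet_length(n) - n for n in range(20, 1501)]
print(min(overheads), max(overheads))       # 58 73
```

The linear search is deliberately naive; it makes the definition of the plaintext MTU explicit at the cost of efficiency.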

Maximum Overhead

Another, simpler method to find a safe value for the plaintext MTU is to find an upper bound for the overhead. This method has the advantage of returning a worst case value that can be trusted regardless of the output interface MTU, at the expense of a slightly suboptimal plaintext MTU.

Table 15-7 shows the overhead characteristics of the encryption, combined and integrity algorithms that were mentioned so far, summarized into the following values:

• Maximum input padding: the largest amount of input data padding required for the plaintext to reach the appropriate input size. For integrity algorithms the value is N/A as implicit padding is used (see “Integrity Overhead”).

• Maximum output overhead: the largest amount of output data produced by the algorithm. It can be an Initialization Vector, an integrity check value or both. For encryption and combined-mode algorithms, this comes on top of the ciphertext (which is the same length as the original plaintext plus the input padding).

• Maximum ICV padding: for encryption algorithms, the largest amount of padding required before the ICV (if present) to align it with a 4-byte boundary; for integrity algorithms (only when used in Authentication Header), the largest amount of padding required after the ICV to align the payload to a 4- or 8-byte boundary (see “Authentication Header Overhead”).

Table 15-7 Worst Case Maximum Overhead

In this list, all encryption algorithms that require the input to be aligned to a certain block size (that is, all except NULL, AES-GCM, AES-CCM, and AES-GMAC) have a block size that is a multiple of 4 bytes. Therefore, after encryption the ICV (if present) is already aligned on a 4-byte boundary as required by Encapsulation Security Payload, and no extra padding is necessary. For the others, when they are used in combination with data integrity (which is the case by definition for the three combined algorithms and is mandated by the RFC for ESP-NULL), up to 3 additional bytes may be required to align the ICV.

Maximum Encapsulation Security Payload Overhead

The maximum Encapsulation Security Payload overhead for a given encryption or combined-mode algorithm and an optional integrity algorithm is the sum of:

• 20 bytes (tunnel mode over IPv4 only)

• 40 bytes (tunnel mode over IPv6 only)

• 24 bytes (GRE/IPv4 only) + 4 bytes (if tunnel key configured)

• 44 bytes (GRE/IPv6 only) + 4 bytes (if tunnel key configured)

• 10 bytes (fixed header/trailer fields)

• the maximum input padding of the encryption/combined algorithm

• the maximum output overhead of the encryption/combined algorithm

• the maximum ICV padding of the encryption/combined algorithm

• the maximum output overhead of the integrity algorithm (if configured)
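
The sum above translates directly into code. The sketch below is illustrative only: the function and dictionary names are ours, and the algorithm characteristics are limited to the three algorithms used in Example 1 and Example 2 later in this section, with the values stated there.

```python
# (max input padding, max output overhead, max ICV padding), in bytes.
ENCRYPTION = {
    "aes-cbc": (15, 16, 0),  # block padding, IV, no ICV padding
    "aes-gcm": (0, 24, 3),   # no block padding, IV + ICV, ICV alignment
}
INTEGRITY = {"hmac-sha-1-96": 12}  # max output overhead (ICV)

TUNNEL_IP = {"ipv4": 20, "ipv6": 40}
GRE = {"ipv4": 24, "ipv6": 44}  # IP header + base GRE header


def max_esp_overhead(encryption, integrity=None, tunnel_ip=None,
                     gre_ip=None, gre_key=False):
    """Worst-case overhead in bytes; extra encapsulations are not included."""
    total = 10  # fixed header/trailer fields
    if tunnel_ip:
        total += TUNNEL_IP[tunnel_ip]
    if gre_ip:
        total += GRE[gre_ip] + (4 if gre_key else 0)
    total += sum(ENCRYPTION[encryption])
    if integrity:
        total += INTEGRITY[integrity]
    return total


# Tunnel mode over IPv4, AES-CBC with HMAC-SHA-1-96.
print(max_esp_overhead("aes-cbc", "hmac-sha-1-96", tunnel_ip="ipv4"))
# Transport mode with GRE over IPv6 and a tunnel key, AES-GCM.
print(max_esp_overhead("aes-gcm", gre_ip="ipv6", gre_key=True))
```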

Maximum Authentication Header Overhead

The maximum Authentication Header overhead for a given integrity algorithm is the sum of:

• 20 bytes (tunnel mode over IPv4 only)

• 40 bytes (tunnel mode over IPv6 only)

• 24 bytes (GRE/IPv4 only) + 4 bytes (if tunnel key configured)

• 44 bytes (GRE/IPv6 only) + 4 bytes (if tunnel key configured)

• 12 bytes (fixed header fields)

• the maximum output overhead of the integrity algorithm

• the maximum ICV padding of the integrity algorithm

Extra Overhead

The above formulas do not account for additional overhead that may exist due to additional encapsulation of specific features, such as IPv4 options or IPv6 extension headers (for example, routing headers) and UDP encapsulation for NAT traversal that may be added to the IPsec packet.

Example 1

Consider IPv4 Encapsulation Security Payload tunnel mode with AES-CBC-128 and HMAC-SHA-1-96. The contributing factors to the IPsec overhead are:

• 20 bytes for the tunnel mode IPv4 header

• 10 bytes for the fixed header/trailer fields

• up to 15 bytes for the encryption padding

• 16 bytes for the encryption overhead (IV)

• 12 bytes for the integrity overhead (ICV)

This produces a total of 73 bytes, which matches the highest number in Table 15-6.

The smallest plaintext IPv4 packet size that would result in this worst case overhead is 31 bytes (20 bytes IPv4 header + 11 bytes payload). After adding the Pad Length and Next Header fields (1 byte each), the extended plaintext is 33 bytes long and requires 15 bytes of padding to reach 48 bytes and be a multiple of the 16-byte AES block size.

Figure 15-5 shows the structure of the packet after encapsulation.

Figure 15-5 Packet Structure Post-Encapsulation

Example 2

Consider GRE/IPsec over IPv6 with Encapsulation Security Payload transport mode and AES-GCM-256. A GRE tunnel key is configured. The contributing factors to the IPsec overhead are:

• 48 bytes for GRE (40 bytes for IPv6, 4 bytes for GRE, 4 bytes for the tunnel key)

• 10 bytes for the fixed header/trailer fields

• 24 bytes for the combined encryption and integrity overhead (IV and ICV)

• up to 3 bytes for the ICV padding

This produces a total of 85 bytes.

Please refer to the Cisco IPsec Overhead Calculator tool at https://cway.cisco.com/tools/ipsec-overhead-calc/ipsec-overhead-calc.html for a quick calculation of IPsec overhead based on various input parameters such as plaintext packet size, IPsec mode, protocols, and algorithms.

IPsec and Fragmentation

This section describes the factors that determine the fragmentation of IP packets before and after IPsec encryption and the impact of fragmentation and reassembly.

Maximum Transmission Unit

The maximum transmission unit (MTU) of a link is the largest protocol data unit (PDU) that can be carried by a Layer 2 link. On a regular Ethernet link the MTU is 1500 bytes, meaning that an Ethernet frame will carry up to 1500 bytes of payload data. The frame as seen on the wire is actually larger, since Ethernet adds a 14-byte Ethernet header (6 bytes for the destination MAC address, 6 bytes for the source MAC address and 2 bytes for the EtherType value), an optional 802.1Q VLAN tag (4 bytes) and a frame check sequence (4 bytes), for a total of up to 1518 bytes without the VLAN tag or 1522 bytes with it.

Additional layers of encapsulation may reduce the MTU of the link; for example, PPP over Ethernet (PPPoE) adds 8 bytes of overhead for a resulting MTU of (1500 – 8) = 1492 bytes.

On a Cisco router, the MTU of a physical interface is set by the interface driver. Some interface types (for example, virtual Ethernet interfaces) have a default MTU that can be overridden using the mtu command, to make room for additional encapsulation or adjust the MTU to that of the underlying physical link. The interface MTU takes effect for all protocols and is displayed in the output of show interfaces (as shown in the following example).

Router#show interfaces FastEthernet4
FastEthernet4 is up, line protocol is up
  Hardware is PQUICC_FEC, address is 0021.5528.7f84 (bia 0021.5528.7f84)
  Internet address is 10.48.66.98/23
  MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation 802.1Q Virtual LAN, Vlan ID  1., loopback not set
  Keepalive set (10 sec)

Separate MTU values specific to the IP protocol family can be configured using the ip mtu and ipv6 mtu commands; those MTU values do not affect any protocols other than IPv4 and IPv6. The IP MTU is displayed in the output of show ip interface and show ipv6 interface (as shown in the following example).

Router#show running-config interface FastEthernet4
interface FastEthernet4
 ip address 10.0.0.1 255.255.255.0
 ip mtu 1400
 ipv6 enable
 ipv6 mtu 1300
end

Router#show ip interface FastEthernet4
FastEthernet4 is up, line protocol is up
  Internet address is 10.0.0.1/24
  Broadcast address is 255.255.255.255
  Address determined by setup command
  MTU is 1400 bytes
  ...

Router#show ipv6 interface FastEthernet4
FastEthernet4 is up, line protocol is up
  IPv6 is enabled, link-local address is FE80::221:55FF:FE28:7F84
  No Virtual link-local address(es):
  No global unicast address is configured
  Joined group address(es):
    FF02::1
    FF02::1:FF28:7F84
  MTU is 1300 bytes
  ...

The length of the forwarded packets is checked against the interface MTU before the interface encapsulation in the outbound direction only. If the packet is larger than the MTU (or the relevant IP MTU), it cannot be forwarded as-is. In the case of IPv4 traffic, it must be fragmented (if allowed) before being forwarded. In the case of IPv6 traffic, the packet must be dropped, and the originating host must be notified.

Fragmentation in IPv4

IPv4 packets may be fragmented at any point along the path from source to destination. In addition, fragments of a packet may be further fragmented along the path to the destination. Reassembly is always performed by the destination host. Intermediate nodes may still process the fragments (for example, a firewall or IPS may perform virtual reassembly, buffering the fragments in order to inspect the reassembled packet), but the packet remains fragmented on the wire.

Fragmentation involves three fields in the IPv4 header (illustrated in Figure 15-6):

• the 16-bit (2-byte) Identification field (sometimes referred to as the IP ID), an identifier for fragments of an original packet

• the 3-bit Flags field, containing the Do not Fragment (DF) and More Fragments (MF) flags (1 bit each + 1 reserved)

• the 13-bit Fragment Offset field, indicating the offset (in 8-byte units) of the current fragment within the original packet

Figure 15-6 IP Header and Fragmentation

RFC 791 specifies that the Identification value of an IPv4 packet must be different from that of all other IPv4 packets from the same flow (source/destination/protocol 3-tuple) for the time the packet will be present on the network. In practice with today’s high-speed networks, a 2-byte field has become much too small for such a requirement; this may lead to problems with reassembly on high-throughput links (see “The Impact of Fragmentation”). The IPv6 protocol avoids such issues by taking a completely different approach towards fragmentation (see “Fragmentation in IPv6”).
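
To give a sense of scale, the following back-of-the-envelope calculation estimates how quickly the 16-bit Identification space is exhausted. It is a simplification of our own: it assumes a single flow saturating the link with fixed-size packets and ignores Layer 2 overhead.

```python
def ip_id_wrap_seconds(link_bps, packet_bytes):
    """Seconds needed to send 2**16 packets back to back on the link."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return 2 ** 16 / packets_per_second


# 1500-byte packets at 1 Gbit/s and 10 Gbit/s.
print(round(ip_id_wrap_seconds(1e9, 1500), 3))   # 0.786
print(round(ip_id_wrap_seconds(10e9, 1500), 4))  # 0.0786
```

Under these assumptions the Identification values repeat in well under a second, far less than the time a delayed fragment may spend in the network or in a reassembly buffer.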

Upon creation of an IP datagram, the MF bit and Fragment Offset field are set to 0. The DF bit may be set to 0 or 1, depending on the local configuration and PMTUD settings (see “Path MTU Discovery”). If at any point on the path (this can be on the originating node or on any intermediate node) the packet is too large for the outgoing interface MTU, the forwarding node will react based on the value of the DF bit in the IP header.

If the DF bit is set to 1, the packet will be dropped, and an ICMP Destination Unreachable packet (type 3, code 4: fragmentation needed and DF set) will be sent back to the source of the dropped packet, including the MTU value that triggered the drop so that the source host can adjust its packet size (see “Path MTU Discovery”) as well as a portion of the packet (IP header + first 64 bits of payload). The ICMP notification is generated only if the interface that the packet was received on has the ip unreachables command configured, which is the default configuration.

If the DF bit is set to 0, the IP payload must be split across two or more new IP packets. The original IP header is copied into the header portion of the new fragment packets, and the header fields are adjusted as follows:

• the MF field is set to 1 in all but the last fragment

• the Fragment Offset reflects the offset of the payload in each fragment within the payload of the original unfragmented packet, in units of 8 bytes, starting at 0 in the first fragment

• the Total Length and Header Checksum are adjusted for each fragment

Figure 15-7 shows a schematic view (not to scale) of IPv4 fragmentation for a 1500-byte UDP packet going through an interface with a 1400-byte MTU. The IP payload is composed of an 8-byte UDP header and a 1472-byte UDP payload. The first fragment contains the UDP header and the beginning of the UDP payload, while the second fragment only contains the end of the UDP payload. Note that for clarity the Fragment Offset (FO) for the second fragment is shown as 1400 bytes; within the IP header, however, this value is expressed in units of 8 octets, so it would actually be seen as 1400/8 = 175. (In a real packet, the first fragment must itself fit within the 1400-byte MTU, header included, so it would carry at most 1376 bytes of payload and the second fragment would have an offset of 1376/8 = 172.)

Figure 15-7 IPv4 Fragmentation

The packet may be split in multiple ways depending on the software or hardware performing the fragmentation. A 1500-byte packet (20-byte IPv4 header plus 1480 bytes of payload) going through an interface with a 1400-byte MTU could be split as follows (fragment lengths include the 20-byte header that each fragment carries):

• a first fragment of 1396 bytes and a second fragment of 124 bytes

• a first fragment of 124 bytes and a second fragment of 1396 bytes

• a first fragment of 764 bytes and a second fragment of 756 bytes

The only constraint is that the payload length of all but the last fragment must be a multiple of 8 bytes (because of the Fragment Offset field).
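
One possible strategy is a greedy split that fills each fragment as much as the MTU allows. The sketch below is our own illustration (actual implementations may split differently); it assumes a 20-byte header without options and requires that each fragment, header included, fit within the MTU. For a 1500-byte packet and a 1400-byte MTU, the 1380 bytes of payload room are rounded down to 1376, the nearest multiple of 8.

```python
def fragment_ipv4(total_len, mtu, header_len=20, df=False):
    """Return a list of (fragment_total_len, offset_in_8_byte_units, mf_flag)."""
    if total_len <= mtu:
        return [(total_len, 0, False)]  # fits as-is, no fragmentation
    if df:
        # The router would drop the packet and send ICMP type 3, code 4.
        raise ValueError("fragmentation needed and DF set")
    # Payload carried by every fragment but the last: a multiple of 8 bytes.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    remaining = total_len - header_len
    offset = 0
    fragments = []
    while remaining > 0:
        chunk = min(max_payload, remaining)
        remaining -= chunk
        fragments.append((header_len + chunk, offset // 8, remaining > 0))
        offset += chunk
    return fragments


print(fragment_ipv4(1500, 1400))  # [(1396, 0, True), (124, 172, False)]
```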

Reassembly is performed only on the destination host, by buffering fragments as they arrive until the complete packet can be reassembled. Fragments of a single original packet are identified based on the source, destination, protocol, identification 4-tuple. If not all fragments are received within a certain amount of time, the already received ones are discarded to free up memory. There are no retransmissions; thus the loss of a single fragment means the loss of the entire packet.
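
The buffering logic can be sketched as follows. This is a simplified illustration with names of our own choosing: it handles out-of-order arrival but omits the reassembly timeout, overlapping fragments, and memory limits that a real IP stack must deal with.

```python
class Reassembler:
    """Buffers fragment payloads per (source, destination, protocol, ID)."""

    def __init__(self):
        self.parts = {}  # key -> {byte offset: payload bytes}
        self.total = {}  # key -> total payload length (known after last fragment)

    def add(self, src, dst, proto, ident, offset_units, more_fragments, payload):
        """Store one fragment; return the full payload once complete, else None."""
        key = (src, dst, proto, ident)
        pieces = self.parts.setdefault(key, {})
        pieces[offset_units * 8] = payload
        if not more_fragments:
            # The last fragment (MF = 0) reveals the total payload length.
            self.total[key] = offset_units * 8 + len(payload)
        if key not in self.total:
            return None
        data = b""
        while len(data) < self.total[key]:
            piece = pieces.get(len(data))
            if not piece:
                return None  # a fragment is still missing
            data += piece
        del self.parts[key], self.total[key]
        return data


r = Reassembler()
# The second fragment arrives first; nothing can be delivered yet.
print(r.add("10.0.0.1", "10.0.0.2", 17, 42, 172, False, b"B" * 104))
# The first fragment completes the 1480-byte payload.
print(len(r.add("10.0.0.1", "10.0.0.2", 17, 42, 0, True, b"A" * 1376)))
```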

The show ip traffic command on IOS and the show platform hardware qfp active feature ipfrag global command on IOS-XE allow for viewing of traffic statistics pertaining to fragmentation of IPv4 traffic. The following example illustrates the use of show ip traffic.

Router#show ip traffic | include reassembled|fragmented
  Frags: 7 reassembled, 0 timeouts, 0 couldn't reassemble
         7 fragmented, 14 fragments, 0 couldn't fragment

Fragmentation in IPv6

IPv6 takes a different approach than IPv4 when it comes to fragmentation. In IPv6, packets are never fragmented on the path, only on the originating host. A packet that is too large for the egress MTU must be dropped, and an ICMPv6 message (type 2, code 0: packet too big) must be generated, similar to the way IPv4 behaves when the DF bit is set. The ICMPv6 notification is generated only if the interface that the packet was received on (typically the one facing the source host) has ipv6 unreachables configured, which is the default.

This behavior enables the source host to perform Path MTU Discovery (see next topic) and adapt its Path MTU to send smaller packets. Any Layer 2 link used to carry IPv6 must be capable of forwarding datagrams of up to 1280 bytes (IPv6 header included).

Fragmentation is signaled through the addition of a Fragment extension header in the IPv6 fragments. An IPv6 packet that needs to be fragmented is first split into two parts:

• an unfragmentable part: the IPv6 header plus some of the extension headers, if any (those that must be processed by nodes on the way to the destination)

• a fragmentable part: the rest of the extension headers, if any, and the IPv6 payload data

First the unfragmentable part is replicated into all fragments, and a Fragment extension header is added at the end. Figure 15-8 shows the structure of the IPv6 Fragment extension. It contains three fields with similar semantics as their counterparts in IPv4:

• the 13-bit Fragment Offset field is the offset in 8-byte units of the fragment payload in the original packet payload (nonzero in all fragments except the first)

• the M flag has the same meaning as the MF flag in IPv4 (more fragments follow; it is set to 1 in all fragments except the last)

• the 32-bit Identification field is used to correlate fragments from the same original packet

Figure 15-8 IPv6 Fragmentation Extension

An equivalent to the IPv4 DF bit is not required, since it is forbidden to fragment IPv6 packets in transit. The Identification field is 32 bits long and populated only when the Fragment extension is present, which makes collisions virtually impossible.

Reassembly is performed by the destination host based on the source address, destination address, and identification 3-tuple (unlike IPv4, the protocol is not part of the key). All fragments must be received within 60 seconds starting with the reception of the first-arriving fragment (not necessarily that with an offset of zero, as fragments may be reordered in transit); after that time, if one or more fragments are missing, the ones that have been received are discarded. If the first fragment (with an offset of zero) was received, an ICMPv6 Time Exceeded message (type 3, code 1: fragment reassembly time exceeded) is sent back to the source.

The show ipv6 traffic command on IOS and the show platform hardware qfp active feature ipfrag global command on IOS-XE allow for viewing of traffic statistics pertaining to fragmentation of IPv6 traffic. The following example illustrates the use of show ipv6 traffic.

Router#show ipv6 traffic | include reassembled|fragmented
            17 fragments, 8 total reassembled
            10 fragmented into 20 fragments, 0 failed

Path MTU Discovery

By default, a source host that is sending IP traffic out of an interface will send IP datagrams that are as big as the MTU on the output interface will allow. For a standard Ethernet interface without jumbo frame support, IP datagrams will be 1500 bytes long (IP header included, that is, 1480 bytes of payload if no IP options are present).

This packet size may not be optimal, as the MTU one or more hops away may be lower than 1500 due to different link types or encapsulation, requiring fragmentation on the path and reassembly at the destination host. RFC 1191 describes a technique known as Path MTU Discovery (PMTUD) for dynamically learning about the lowest MTU on the path to a certain IPv4 destination and adjusting the size of output packets accordingly. Some of the considerations in RFC 1191 are outdated and no longer apply to modern networks; however, the basic principles it outlines are still in use today. PMTUD for IPv6 is defined in RFC 1981 and is based on the same principles.

In IPv4, PMTUD requires setting the DF bit in the IP header of all generated packets. In IPv6, there is no DF bit; the behavior is implicit, because IPv6 traffic may not be fragmented except by the source host itself. In both cases, if the packet happens to be too large to be forwarded by one of the hops on the path, it is discarded, and an ICMP or ICMPv6 message of the appropriate type (often simply referred to as Cannot Fragment or CF) is generated and sent back to the source indicating the MTU value that caused the drop.

A source host with PMTUD enabled will process this notification, update a per-destination or per-path record, and inform higher-layer protocols (for example, TCP) of the change. The intent is to keep the higher layer (referred to as the packetization layer) informed of the maximum size of the data that it may send to IP for forwarding. Most modern operating systems perform PMTUD by default.

Figure 15-9 shows the basic principle of Path MTU Discovery with IPv4.

Figure 15-9 Path MTU Discovery

This mechanism remains enabled at all times, so that an MTU change in the middle of a connection can be detected. While MTU discovery generally occurs at the beginning of a data transfer, it can take place multiple times afterwards depending on the evolution of packet sizes received from the upper layers and/or changes on the path to the destination.

There is a common misconception that PMTUD is only supported in conjunction with TCP. That is the case only for certain implementations; PMTUD can theoretically work with any protocol that can adapt its datagram size (for an example, see “Tunnel PMTUD” later in this chapter). It is true however that a connection-based protocol like TCP is naturally well-adapted to keeping track of the maximum allowed packet size, since it already maintains descriptor structures on a per-connection basis.

Path MTU Discovery is based on negative signaling. No news is good news in theory, but not always in practice: the absence of signaling may be due to ICMP filtering on the return path towards the source. Specific precautions may be required if this is the case. Also, there is no positive signaling if the MTU goes up (typically when the traffic gets rerouted through a different path with a higher Path MTU). For this reason, information learned from PMTUD expires after a certain amount of time to enable re-discovery of a potentially higher Path MTU.
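
The negative-signaling loop can be illustrated with a small simulation. The Python sketch below is a deliberately simplified model with hypothetical names: it assumes that every ICMP CF message reaches the sender, that the sender immediately retries at the reported MTU, and that the path does not change during discovery.

```python
def discover_path_mtu(link_mtus, initial_size):
    """Simulate PMTUD for DF-set packets along a path.

    link_mtus: MTU of each successive link toward the destination.
    initial_size: first packet size tried (the sender's interface MTU).
    Returns (path_mtu, packets_lost).
    """
    size = initial_size
    lost = 0
    while True:
        for mtu in link_mtus:
            if size > mtu:
                # The router in front of this link drops the packet and
                # returns an ICMP CF message carrying the link MTU.
                lost += 1
                size = mtu  # sender lowers its Path MTU record and retries
                break
        else:
            # No hop dropped the packet: it reached the destination.
            return size, lost


print(discover_path_mtu([1500, 1400, 1300, 1450], 1500))
# (1300, 2)
```

In this example the path contains two successive drops in MTU (1400, then 1300), so two packets are lost before the sender settles on a Path MTU of 1300. If the ICMP messages were filtered on the return path, the sender in this model would never learn the lower MTU, which is the failure mode described above.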


Note

Another method for performing PMTUD, this time at the packetization layer, is described by RFC 4821, Packetization Layer Path MTU Discovery. This method is based on probes generated by the upper layer protocol (TCP for example), with increasing packet sizes in order to detect the Path MTU in a more reliable way. That method is still not widely deployed and is not covered in this book.


TCP MSS Clamping

While PMTUD provides a way to automatically adapt the size of TCP segments based on the discovered MTU, it still has disadvantages: it requires the loss and retransmission of one or more packets (depending on how many drops in MTU there are on the path), requires proper behavior from all filtering devices on the return path to deliver the ICMP messages back to the source, and may be broken if a device on the path to the destination clears the DF bit in IPv4 traffic. These last two points are particularly problematic, because they potentially involve devices that are outside the administrative domain of the sender.

One way to work around situations where PMTUD cannot be relied upon is to adjust the TCP Maximum Segment Size (MSS) on the fly, a method known as MSS clamping.

MSS Refresher

The MSS is a value advertised during the TCP 3-way handshake within the initial SYN and SYN-ACK packets. It is advertised, as opposed to negotiated: each connection endpoint informs its peer of the largest TCP segment size that it is prepared to receive. The MSS represents the amount of data in the TCP packet, excluding the TCP header, options, and all lower-layer headers (IP and below), as illustrated in Figure 15-10. The typical length for the IP and TCP headers is 20 bytes each (though this value can be greater if IP or TCP options are present).

Figure 15-10 TCP Maximum Segment Size (MSS)

The advertised MSS is generally chosen based on the available buffer size for single TCP segments and the MTU on the output interface for the connection. The Path MTU to the remote endpoint (if known) should not be taken into account, as the MSS is relevant to inbound traffic, while the Path MTU is relevant to outbound traffic, which may be going through a different path (asymmetric routing situation).

If PMTUD signals to TCP that the Path MTU to a destination has decreased, the MSS that was recorded for that destination may need to be adjusted so that TCP packets remain within the limits of the Path MTU after encapsulation in IP.

Figure 15-11 illustrates the initial exchange of MSS during the 3-way handshake, as well as a simple example of automatic adjustment of the effective MSS triggered by PMTUD.

Figure 15-11 TCP MSS exchange

MSS Adjustment

In scenarios where PMTUD is not possible or reliable (for example, ICMP filtering by a firewall or IPS) or the Path MTU to some or all destinations is known to be lower than the output interface MTU on source hosts, it may be desirable to adjust the TCP MSS in both directions to a value low enough to ensure that PMTUD will not be required (assuming that the Path MTU remains above a certain value).

The TCP MSS in the SYN packets is controlled by the TCP endpoints; however, Cisco routers can perform adjustment (also referred to as clamping) of the TCP MSS on the fly. This is accomplished by configuring the ip tcp adjust-mss value command on an interface traversed by the SYN packets (see the following example). Note that this command only takes effect for TCP connections going through the router, not those initiated by the router. For TCP connections originating or terminating on a router, the TCP MSS is controlled using the ip tcp mss value command.

interface FastEthernet0/0
 ip address 209.165.201.1 255.255.255.252
 ip tcp adjust-mss 1416
 ipv6 address 2001:DB8:FFF1::10/127
 ipv6 enable

The adjustment should be performed in a conservative way (lower is safer): if the command is configured on both the input and output interface traversed by the SYN packet, the MSS is adjusted to the lowest of the two configured values, and if the MSS in the SYN packet is already lower than the configured value, it is not modified.
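
The clamping rule described above amounts to taking a minimum. The following Python sketch uses a hypothetical function name and models only the decision, not the rewriting of the TCP option or the checksum update that a router would also have to perform.

```python
def clamp_mss(syn_mss, *configured_values):
    """Return the MSS carried by a SYN after traversing a router.

    syn_mss: MSS advertised in the SYN as received.
    configured_values: adjust-mss values on the interfaces traversed
    (input and/or output); may be empty if nothing is configured.
    """
    # The MSS is only ever lowered, never raised.
    return min([syn_mss, *configured_values])


print(clamp_mss(1460, 1416))        # 1416
print(clamp_mss(1460, 1416, 1360))  # 1360
print(clamp_mss(1200, 1416))        # 1200
```

The second call corresponds to the command being configured on both the input and output interfaces, and the third to a SYN whose MSS is already lower than the configured value.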


Note

Initially the ip tcp adjust-mss command only affected TCP sessions going over IPv4. The behavior then changed in IOS 15.2(4)M and IOS-XE 15.3(1)S, in which the command was updated to also affect TCP sessions over IPv6. However, using a single MSS value for the two protocols did not prove to be practical due to the 20 extra bytes in the IPv6 header compared to IPv4 (the MSS had to be configured either 20 bytes too high for IPv6, or 20 bytes too low for IPv4). Because of this, IOS 15.3(3)M and IOS-XE 15.3(3)S reverted to the IPv4-only behavior of the original command, and a new ipv6 tcp adjust-mss command was introduced, enabling separate MSS adjustment for TCP sessions over IPv4 and IPv6.


Figure 15-12 shows a simple example of MSS clamping when the Path MTU is known to be lower in the middle of the path to the IPv4 destination because of the presence of a GRE/IPv6 tunnel (44 bytes of overhead). Since the Path MTU for IPv4 between the TCP endpoints is known to be 1456, the TCP MSS is clamped to a value of (1456 – 40) = 1416. Even though it is sufficient to configure the ip tcp adjust-mss command on only one of the routers (as illustrated) to adjust the MSS in both directions, it would be good practice to configure the command on both sides.

Figure 15-12 MSS clamping
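
The arithmetic behind the clamped value can be sketched as follows. The helper name is hypothetical, and the calculation assumes base headers only (no IP options, no IPv6 extension headers, no TCP options), which is the usual basis for choosing an adjust-mss value.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, without extension headers
TCP_HEADER = 20   # bytes, without options


def mss_for_path_mtu(path_mtu, ipv6=False):
    """Largest TCP MSS that keeps packets within the given Path MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - TCP_HEADER


gre_ipv6_overhead = 44                 # value used in the Figure 15-12 example
path_mtu = 1500 - gre_ipv6_overhead    # 1456
print(mss_for_path_mtu(path_mtu))             # 1416
print(mss_for_path_mtu(path_mtu, ipv6=True))  # 1396
```

The 20-byte difference between the two results is the reason why a single MSS value cannot suit TCP sessions over both IPv4 and IPv6.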

IPsec Fragmentation and PMTUD

RFC 4301, “Security Architecture for the Internet Protocol,” mandates that IPsec implementations behave in a specific way with regard to fragmentation and MTU.

With tunnel mode IPsec over IPv4, the DF bit in the outer IP header may be cleared, set, or copied from the inner IPv4 header. All three options must be available through configuration.

Every entry in the SA database contains a Path MTU value that represents the largest plaintext packet size (prior to encapsulation within IPsec) that it can protect. The initial value is derived from the output interface MTU minus the IPsec overhead, and the Path MTU is updated based on ICMP CF (ICMP 3/4 or ICMPv6 2/0) issued by downstream routers for IPsec packets. Upon receipt, the SPI in the original IPsec header is extracted from the portion of the dropped packet included in the ICMP message, and the Path MTU in the SA database entry for that SPI is updated after subtracting the IPsec overhead from the MTU in the ICMP packet.

When an outbound plaintext packet exceeds the Path MTU for the SA that will protect it, the encapsulating router handles it as follows:

• IPv4 traffic with the DF bit clear may be fragmented before or after encapsulation in IPsec tunnel mode. Both options must be available through configuration. IPsec transport mode will fragment only after encapsulation.

• IPv4 traffic with the DF bit set and IPv6 traffic are dropped, and an ICMP or ICMPv6 CF message is generated towards the source to enable PMTUD.
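
The SA Path MTU bookkeeping and the resulting decision can be summarized in a toy model. The Python class below is a hypothetical sketch, not the IOS implementation. It assumes that the IPsec overhead is a fixed, known maximum (in practice it depends on the algorithms and padding in use) and uses 73 bytes purely as an illustrative value.

```python
class SecurityAssociation:
    """Toy model of the Path MTU kept in an SA database entry."""

    def __init__(self, interface_mtu, ipsec_overhead):
        self.ipsec_overhead = ipsec_overhead
        # Initial value: output interface MTU minus the IPsec overhead.
        self.plaintext_mtu = interface_mtu - ipsec_overhead

    def on_icmp_cf(self, reported_mtu):
        # MTU reported by a downstream router for a dropped IPsec packet.
        self.plaintext_mtu = reported_mtu - self.ipsec_overhead

    def handle_outbound(self, packet_length, df_set=False, ipv6=False):
        if packet_length <= self.plaintext_mtu:
            return "encapsulate"
        if ipv6 or df_set:
            return "drop, send ICMP CF"
        # Before or after encapsulation, depending on configuration.
        return "fragment"


sa = SecurityAssociation(interface_mtu=1500, ipsec_overhead=73)
print(sa.plaintext_mtu)                       # 1427
print(sa.handle_outbound(1500, df_set=True))  # drop, send ICMP CF
print(sa.handle_outbound(1500))               # fragment
sa.on_icmp_cf(1400)
print(sa.plaintext_mtu)                       # 1327
```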

On Cisco routers, fragmentation before encapsulation (known as pre-fragmentation) or after (known as post-fragmentation) is controlled by the command crypto ipsec fragmentation {before-encryption | after-encryption}. Despite the name, the command applies to both Encapsulation Security Payload and Authentication Header. By default, IPsec will fragment before encryption. The DF bit in the outer IP header is controlled by the crypto ipsec df-bit {clear | set | copy} command. The default is to copy the DF bit from the inner header. These commands may be configured globally and/or per interface (with per-interface taking precedence).


Note

It is important to note that these commands only affect the behavior of the IPsec layer, not that of the tunnel layer (VTI or GRE/IPsec). Tunnel interfaces are hit before crypto in the forwarding path of the egress packet, and the tunnel MTU is a determining factor in deciding whether to pre-fragment or reject a packet. Please continue to “Fragmentation on Tunnels” to get a complete picture of fragmentation in tunnel-based configurations.


Figure 15-13 illustrates Encapsulation Security Payload tunnel mode fragmentation before and after encryption. The Interface MTU (1500) is the MTU of the physical interface that the IPsec traffic is sourced from. The Path MTU (1427) is initialized by subtracting the maximum IPsec overhead (73 bytes in this example) from the Interface MTU. Note the figure payload sizes are not to scale.

Figure 15-13 Pre- and Post-Encryption Fragmentation

The impact of fragmenting before or after IPsec will be discussed in “The Impact of Fragmentation” later in this section.

Figure 15-14 illustrates PMTUD on the source host in conjunction with IPsec PMTUD. If tunnel mode is used, this configuration requires setting or copying the DF bit to the outer IP header. The example shows IPv4 with the DF bit set, but the same logic applies to IPv6 with the implicit DF.

Figure 15-14 PMTUD and IPsec

One packet is lost when performing PMTUD at the entrance of the IPsec tunnel, plus two more packets for every drop in MTU on the path of the IPsec traffic: one to adjust the Path MTU on the SA, and another to adjust the Path MTU on the host. Further drops in MTU after the exit of the IPsec tunnel would only cause the loss of a single packet, but the ICMP CF message may or may not reach the source host, as it probably needs to be routed back through the IPsec tunnel. Since the source address of the ICMP packet is the router that dropped the traffic instead of the original destination, there is no guarantee that this address is also covered by an IPsec Security Association.

Until IOS 15.3(2)T and IOS-XE 15.3(2)S it was difficult to find out the SA Path MTU that was applied to plaintext traffic on IOS, as the output of show crypto ipsec sa only displayed the MTU and Path MTU as seen by IPsec on the transport network (including the encapsulation overhead). The following examples show an SA database entry that uses a physical interface with an MTU of 1500, but received an ICMP CF for IPsec traffic that indicated a path MTU of 1400.

The following example shows the output of show crypto ipsec sa collected on IOS 15.2(4)M. The SA Path MTU as seen by the plaintext traffic is not displayed and must be computed manually (or if the IPsec Security Association is bound to a VTI, it can be found in the output of show ip interfaces—see “Fragmentation on Tunnels” for details).

Router#show crypto ipsec sa | include mtu
     path mtu 1400, ip mtu 1500, ip mtu idb Ethernet0/0

The following example shows the same output collected on 15.3(3)M, now also including the plaintext MTU.

Router#show crypto ipsec sa | include mtu
     plaintext mtu 1342, path mtu 1400, ip mtu 1500, ip mtu idb Ethernet0/0

If the IPsec tunnel endpoints are IPv6, the path MTU to the peer address also appears in the IPv6 MTU table as illustrated in the following example.

Router#show crypto ipsec sa | include mtu
     path mtu 1400, ipv6 mtu 1500, ipv6 mtu idb Ethernet0/0

Router#show ipv6 mtu
 MTU     Since    Source Address      Destination Address
1400    00:00:11  2001:DB8:FFF1::10   2001:DB8:FFF1::2510

Fragmentation on Tunnels

Tunnel interfaces introduce an additional MTU check that takes place when the packet is routed through the tunnel, presenting an opportunity to take care of fragmentation before any encapsulation takes place—it is therefore crucial to understand the interaction between the different layers in order to reach a robust and optimal configuration. The following subsections detail the characteristics of the main modes of encapsulation used in this book.

IPsec Only (VTI)

A native IPsec tunnel interface (sVTI/dVTI) represents direct encapsulation into IPsec, thus by default the IP MTU of the tunnel is the same as the MTU on the associated IPsec Security Association. This default value can be overridden by explicitly configuring ip mtu or ipv6 mtu. The MTU for protocols other than IP (as displayed in show interfaces) is not relevant on VTI, since native IPsec can only protect IP traffic.

In the following example the tunnel is sourced from a loopback interface; that is, the IKE and IPsec traffic terminate locally on the IP address of the loopback. However, it is really the MTU of the physical output interface that matters. The IPsec code determines which physical interface is facing the remote peer and uses it as the basis for MTU calculations (Ethernet0/0 in the example). The same interface also shows up in the output of show cef interface for the VTI.

Router#show running-config interface Tunnel0
interface Tunnel0
 ip address 10.0.0.1 255.255.255.252
 tunnel source Loopback1
 tunnel mode ipsec ipv4
 tunnel destination 209.165.200.225
 tunnel protection ipsec profile default
end

Router#show crypto ipsec sa | include mtu
     path mtu 1500, ip mtu 1500, ip mtu idb Ethernet0/0

Router#show interface Tunnel0 | include MTU
  MTU 17878 bytes, BW 100 Kbit/sec, DLY 50000 usec,
  Tunnel transport MTU 1438 bytes

Router#show ip interface Tunnel0 | include MTU
  MTU is 1438 bytes

Router#show cef interface Tunnel0 | include output interface
  Real output interface is Ethernet0/0

As with all tunnel interfaces, the packet length is checked against the MTU of the interface before the encapsulation takes place. For VTI, it is before the packet is handed over to the crypto engine; this means that VTI always performs pre-fragmentation regardless of the crypto ipsec fragmentation setting, which only applies once the packet has hit the crypto layer. The only case in which a VTI may do post-fragmentation is if the IP MTU of the VTI is hardcoded to a value that is higher than the IPsec Security Association PMTU (due to misconfiguration, or a drop of MTU on the path that was discovered through PMTUD).

Figure 15-15 shows a block representation of VTI encapsulation and the MTU values encountered during processing of the packet.

Figure 15-15 VTI encapsulation with MTU

GRE Only

The default MTU of a GRE tunnel interface without IPsec is equal to the IP MTU of the output interface minus the IP and GRE encapsulation overhead. Just as with VTI, the tunnel code is aware of the actual output interface and will select the correct physical IP MTU as a basis, even if sourced from a loopback. Since GRE is capable of transporting any kind of protocol including non-IP, the tunnel code will adjust the MTU for all protocols (as reported in the show interfaces command).

The smallest possible GRE overhead is 24 bytes (GRE/IPv4 without optional fields) and the largest is 56 bytes (GRE/IPv6 with tunnel key, checksum and sequence numbering), without counting optional extension headers. The following example shows a GRE/IPv6 tunnel with a tunnel key configured, amounting to 48 bytes of overhead in total.

Router#show running-config interface Tunnel0
interface Tunnel0
 ip address 10.0.0.1 255.255.255.252
 tunnel source Ethernet0/0
 tunnel mode gre ipv6
 tunnel destination 2001:DB8:FFF1::2510
 tunnel key 1
end

Router#show interface Tunnel0 | include MTU
  MTU 1452 bytes, BW 100 Kbit/sec, DLY 50000 usec,
  Tunnel transport MTU 1452 bytes

Router#sh ip interface Tunnel0 | include MTU
  MTU is 1452 bytes
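
The overhead figures quoted above can be reproduced with a short calculation. The Python helper below is a hypothetical sketch that assumes a 4-byte base GRE header, 4 bytes for each optional field (key, checksum, sequence number), and an outer IP header without options or extension headers.

```python
def gre_overhead(ipv6=False, key=False, checksum=False, sequence=False):
    """Bytes added by GRE encapsulation, outer IP header included."""
    outer_ip = 40 if ipv6 else 20
    # Base GRE header plus 4 bytes per enabled optional field.
    gre = 4 + 4 * (int(key) + int(checksum) + int(sequence))
    return outer_ip + gre


print(gre_overhead())                                # 24 (smallest)
print(gre_overhead(ipv6=True, key=True,
                   checksum=True, sequence=True))    # 56 (largest)
print(gre_overhead(ipv6=True, key=True))             # 48
print(1500 - gre_overhead(ipv6=True, key=True))      # 1452
```

The last line matches the tunnel MTU of 1452 bytes reported by the router for the GRE/IPv6 tunnel with a tunnel key.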

The IP MTU on the tunnel can be adjusted with the ip mtu or ipv6 mtu commands. Both can be configured simultaneously, because GRE allows for dual-stack (IPv4 and IPv6 inside the same GRE tunnel). The MTU for other protocols cannot be adjusted (the mtu command is not allowed on tunnel interfaces).

Figure 15-16 shows a block representation of GRE encapsulation and the MTU values encountered during processing of the packet.

Figure 15-16 GRE encapsulation with MTU

An important characteristic of GRE over IPv4 is that by default it clears the DF bit in the outer IP header, so the check against the physical interface MTU may result in fragmentation even if the inner IP packet does have the DF bit set. The solution to this problem is covered in “Tunnel PMTUD.”

GRE over IPsec

Adding IPsec on top of a GRE interface (with the tunnel protection command) adds a layer of IPsec encapsulation right after the creation of the GRE packet. As with VTI, the MTU on the GRE tunnel is bound to that of the IPsec Security Association, so that a drop in the SA Path MTU is reflected on the MTU of the GRE tunnel. The IP MTU can also be set manually on the GRE interface to ensure that all produced GRE packets will be small enough to fit inside the IPsec plaintext MTU.

Figure 15-17 shows a block representation of GRE/IPsec encapsulation and the MTU values encountered during processing of the packet.

Figure 15-17 GRE over IPsec MTU

Two options exist in order to properly set the IP MTU on the GRE tunnel. The first is to compute the maximum GRE/IPsec overhead, subtract it from the physical interface MTU, and configure it using ip mtu or ipv6 mtu on the tunnel interface. This is usually done in conjunction with adjusting the TCP MSS (see “TCP MSS Clamping” earlier in this section). The second option is Tunnel PMTUD (see below).

Tunnel PMTUD

Tunnel PMTUD allows tunnel interfaces to automatically adjust their MTU based on received ICMP CF messages. For this to work, these messages must be received on the source interface of the tunnel. This requires any access control policy, including an ACL applied on ingress to the tunnel source, to permit ICMP 3/4 or ICMPv6 2/0 messages (also known as ICMP CF messages).

A GRE IPsec tunnel interface will by default not copy the DF bit from the protected IP header to the outer GRE IP header. In some instances this can cause issues due to fragmentation, and it also prevents PMTUD from functioning, because traffic must have the DF bit set for PMTUD to occur. To enable PMTUD on a GRE IPsec tunnel interface, the tunnel path-mtu-discovery command must be configured (it is disabled by default). This enables copying of the DF bit from the inner protected IP header to the outer GRE IP header.

Once PMTUD has been enabled on a tunnel interface, it can be verified as illustrated in the following example.

Router#show interface tunnel 1 | include Path MTU
  Path MTU Discovery, ager 10 mins, min MTU 92

There are a number of options for the PMTUD command. The following example illustrates the age timer and the minimum MTU size.

Router(config)#interface tunnel 1
Router(config-if)#tunnel path-mtu-discovery ?
  age-timer  Set PMTUD aging timer
  min-mtu    Min pmtud mtu allowed
  <cr>

Router(config-if)#tunnel path-mtu-discovery age-timer ?
  <10-30>   Aging time
  infinite  Disable pathmtu aging timer

Router(config-if)#tunnel path-mtu-discovery min-mtu ?
  <92-65535>  Bytes

VTI will by default adjust the tunnel MTU when ICMP CF messages are received. The command tunnel path-mtu-discovery has no effect on a VTI. A VTI will by default copy the DF value in the IP header from the protected IP packet to the outer IP header. In other words, out-of-the-box VTI will always perform PMTUD as long as the inner protected traffic is setting the DF bit in the IP header.

As the tunnel path-mtu-discovery command has no effect for a VTI, if the behavior needs to be changed, then clearing the IP DF bit is possible on the interface. This is achieved by using the crypto ipsec df-bit clear command on the VTI or globally.

For either VTI or GRE IPsec, if PMTUD is active (by default on a VTI, or through the tunnel path-mtu-discovery command on GRE IPsec) and the tunnel interface has the ip mtu command applied, then the lowest MTU will take precedence. So if the dynamically obtained MTU via PMTUD is lower than the configured IP MTU, the PMTUD MTU will be used; otherwise the IP MTU will be used.

The Impact of Fragmentation

There are several issues that make IP fragmentation undesirable. When traffic requires fragmentation, there is an increase in CPU and memory overhead to fragment an IP datagram. This holds true for the sender as well as for a router in the path between a sender and a receiver. As the creation of fragments simply involves generating a fragment header and copying the original datagram into the fragment, it can be achieved fairly efficiently because all the information needed to create the fragments is immediately available.

Fragmentation causes more overhead for the receiver when it reassembles the fragments, because the receiver must allocate memory for the arriving fragments and reconstruct all of them back into one datagram once all of the fragments are received. Reassembly on a host is generally not considered a problem, because the host in many circumstances has the time and memory resources to devote to this task.

Reassembly is very inefficient on a router whose primary job is to forward packets as quickly as possible. A router is not designed to hold on to packets for any length of time. Also a router that does reassembly chooses the largest buffer available (18KB) with which to work because it has no way to know the size of the original IP packet until the last fragment is received. Reassembly of packets fragmented after encryption on a router can cause the packets to arrive out of order at the destination host; also the reassembled packet can fall out of the anti-replay window.

In many instances when fragmentation is occurring, network performance will be severely degraded, with reports of slow connectivity and intermittent access.

Another fragmentation issue involves how dropped fragments are handled. If one fragment of an IP datagram is dropped, then the entire original IP datagram must be resent, and it will also be fragmented.

Many network devices, such as Network Address Translation (NAT) gateways or load balancers, might have trouble processing IP fragments correctly. If the IP fragments are out of order, a NAT gateway blocks the non-initial fragments because they do not carry the information that would match the packet filter. This would mean that the original IP datagram could not be reassembled by the receiving host. In many cases, simple access control lists (ACLs) do not reconstruct fragments, so they do not apply policy on fragmented traffic, either allowing or denying all fragments.

Summary

IPsec always incurs a form of overhead; however, depending on the algorithms used and the method of encapsulation, the amount of overhead can vary. Encapsulation Security Payload, Authentication Header, GRE and the modes of IPsec all have defined overheads.

Depending on the overhead generated, the MTU for tunnel interfaces is automatically adjusted. This can be manually configured or dynamically adjusted using PMTUD. There are considerations when using PMTUD for both GRE IPsec and VTI.

One of the side effects of IPsec overhead is fragmentation; this is the nemesis of many network architects and engineers who are tasked with understanding why network connectivity is degraded. Enabling PMTUD or manually setting the IP MTU on a tunnel interface can mitigate issues resulting from fragmentation.

References

https://tools.ietf.org/html/rfc791 Internet Protocol

https://tools.ietf.org/html/rfc1191 Path MTU Discovery

https://tools.ietf.org/html/rfc1981 Path MTU Discovery for IP version 6

https://tools.ietf.org/html/rfc2401 Security Architecture for the Internet Protocol

https://tools.ietf.org/html/rfc2403 The Use of HMAC-MD5-96 within Encapsulation Security Payload and Authentication Header

https://tools.ietf.org/html/rfc2404 The Use of HMAC-SHA-1-96 within Encapsulation Security Payload and Authentication Header

https://tools.ietf.org/html/rfc2410 The NULL Encryption Algorithm and Its Use With IPsec

https://tools.ietf.org/html/rfc2451 The Encapsulation Security Payload CBC-Mode Cipher Algorithms

https://tools.ietf.org/html/rfc3602 The AES-CBC Cipher Algorithm and Its Use with IPsec

https://tools.ietf.org/html/rfc3566 The AES-XCBC-MAC-96 Algorithm and Its Use With IPsec

https://tools.ietf.org/html/rfc3686 Using Advanced Encryption Standard (AES) Counter Mode with IPsec Encapsulating Security Payload

https://tools.ietf.org/html/rfc4106 The Use of Galois/Counter Mode (GCM) in IPsec Encapsulating Security Payload

https://tools.ietf.org/html/rfc4301 Security Architecture for the Internet Protocol

https://tools.ietf.org/html/rfc4302 IP Authentication Header

https://tools.ietf.org/html/rfc4303 IP Encapsulating Security Payload

https://tools.ietf.org/html/rfc4309 Using Advanced Encryption Standard (AES) CCM Mode with IPsec Encapsulating Security Payload

https://tools.ietf.org/html/rfc4494 The AES-CMAC-96 Algorithm and Its Use with IPsec

https://tools.ietf.org/html/rfc4543 The Use of Galois Message Authentication Code (GMAC) in IPsec Encapsulation Security Payload and Authentication Header

https://tools.ietf.org/html/rfc4821 Packetization Layer Path MTU Discovery

https://tools.ietf.org/html/rfc4868 Using HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512 with IPsec
