© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
V. Jain, Wireshark Fundamentals, https://doi.org/10.1007/978-1-4842-8002-7_5

5. Analyzing Control Plane Traffic

Vinit Jain, San Jose, CA, USA
This chapter covers the following topics:
  • Analyzing routing protocol traffic

  • Analyzing overlay traffic

Analyzing Routing Protocol Traffic

So far, we have learned how to set up Wireshark, perform packet captures, and analyze Layer 2 to Layer 4 traffic. Most of the traffic we have looked at so far is data traffic. When dealing with packet loss in the network, we usually try to understand the problem based on what is happening in the network: Is an errored link dropping the traffic? Is network congestion leading to data loss? When packets are being dropped, chances are high that some of them are control plane traffic. Although we can give control plane traffic separate treatment from data traffic using QoS, that only helps prioritize packets on the device, not on the wire. A lossy link can therefore drop data traffic and control plane traffic alike, so a control plane flap caused by packet loss can still be analyzed using the methods we have seen so far. It could also be the case, though, that control plane protocols misbehave even when there is no packet loss or congestion in the network. This chapter focuses on analyzing control plane traffic: understanding the headers and functionality of various routing protocols, diving deeper into certain cases, and seeing how we can troubleshoot them using Wireshark. Note that this chapter does not teach the control plane protocols themselves; it focuses on analyzing control plane and data plane traffic, which can prove useful for network engineers. It is assumed that readers are already familiar with the protocols discussed in this chapter.

OSPF

Open Shortest Path First (OSPF), defined in RFC 2328, is one of the best-known and most widely adopted interior gateway protocols (IGPs). It is a dynamic routing protocol that operates within a single autonomous system (AS) and is suitable for large heterogeneous networks. OSPF uses the Dijkstra algorithm, also known as the shortest path first (SPF) algorithm, to calculate the shortest path to each destination. In OSPF, the shortest path to a destination is chosen based on the cost of the route, which by default is derived from the bandwidth of the links along the path.
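To see how the default bandwidth-based cost works out in practice, here is a minimal sketch in Python, assuming the 100 Mbps reference bandwidth that Cisco IOS uses by default (the reference bandwidth is configurable on real routers):

```python
def ospf_cost(link_bandwidth_bps, reference_bandwidth_bps=100_000_000):
    """Compute an OSPF interface cost as reference bandwidth / link bandwidth.

    100 Mbps is the common default reference bandwidth (an assumption based
    on Cisco IOS defaults); the result is floored to an integer and never
    drops below the minimum cost of 1.
    """
    return max(1, reference_bandwidth_bps // link_bandwidth_bps)

# A 10 Mbps link costs 10, a 100 Mbps link costs 1, and anything faster
# than the reference bandwidth is clamped to the minimum cost of 1.
print(ospf_cost(10_000_000))     # -> 10
print(ospf_cost(100_000_000))    # -> 1
print(ospf_cost(1_000_000_000))  # -> 1
```

This is why operators raise the reference bandwidth on networks with 10 Gbps and faster links: otherwise all such links share the same cost of 1.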

OSPF allows network administrators to break large networks into smaller segments known as OSPF areas, which are essentially collections of networks. Using areas reduces the size of the link-state database and the scope of SPF recalculation on each router. OSPF supports multiple area types:
  • Backbone area: Network segment that belongs to area 0.0.0.0. All other areas are either physically or virtually connected to the backbone area. Exchanging routing information between multiple nonzero or nonbackbone areas is only possible through the backbone area.

  • Standard nonzero area: In this area, OSPF packets are normally transmitted. This area is directly or virtually connected to the backbone area.

  • Stub area: This area does not accept routes from external sources, such as routes learned by other routing protocols and redistributed into OSPF.

  • Totally stubby area: This area accepts neither routes from external sources nor link information from other areas. Instead, a default route is advertised into this area, allowing the routers in it to reach destinations in other areas or external networks.

  • Not-so-stubby area (NSSA): An NSSA is derived from a stub area, with the difference that this area also has an Autonomous System Boundary Router (ASBR) attached to it and learns external routes from the redistribution happening on the ASBR.

In an OSPF area, based on the placement of the router, each router assumes different responsibilities and performs various functions. OSPF has four router types:
  • Backbone router: A backbone router runs OSPF and has at least one interface part of the backbone area or area 0.0.0.0.

  • Internal router: An internal router has OSPF adjacency only with the devices in the same area. These routers do not form adjacency across multiple areas.

  • Area Border Router (ABR): An ABR forms OSPF neighbor adjacencies with devices in multiple areas. Because it has adjacencies in multiple areas, it maintains a copy of the link-state topology database of each of those areas and distributes this information into the backbone area.

  • ASBR: An ASBR participates in other routing protocols apart from OSPF and exchanges the routing information learned from those protocols into OSPF and vice versa.

The OSPF routing protocol uses a link-state database (LSDB) that is formed using the information exchanged between all the routers within an area. This information exchange is done using link-state advertisements (LSAs). Instead of exchanging all the network and link information in a single LSA, OSPF uses different LSA types for different kinds of information. The following LSAs are commonly used in OSPF for exchanging routing updates:
  • Router LSA (Type 1)

  • Network LSA (Type 2)

  • Summary LSA (Type 3)

  • Summary ASBR LSA (Type 4)

  • AS External LSA (Type 5)

  • NSSA LSA (Type 7)

Based on the information in the LSDB, every router in an OSPF area runs the SPF algorithm against all the destination prefixes and installs the resulting routes in the routing table. Note that every router in an OSPF area has an identical copy of that area's LSDB. Each area type allows only specific types of LSAs. Table 5-1 displays the LSAs allowed in the different OSPF area types.
Table 5-1

OSPF Area to LSA Mapping

Area Type                 LSAs Allowed
Backbone area             Type 1, 2, 3, 4, 5
Standard or normal area   Type 1, 2, 3, 4, 5
Stub area                 Type 1, 2, 3
Totally stubby area       Type 1, 2, and Type 3 default route
NSSA                      Type 1, 2, 3, 7
Totally NSSA              Type 1, 2, 7, and Type 3 default route

Now that we have learned the basics of the OSPF routing protocol, let's examine the most commonly seen issues, the majority of which are neighbor adjacency issues. When two devices form an OSPF adjacency, they do so over one of these network types:
  • Broadcast

  • Nonbroadcast

  • Point-to-point

  • Point-to-multipoint

We can focus on broadcast and point-to-point networks, because the broadcast and nonbroadcast types both require a Designated Router (DR) / Backup Designated Router (BDR) election, and point-to-multipoint networks work on the same principle as point-to-point networks. To form a neighbor adjacency, different kinds of OSPF packets are exchanged, including Hello, database description (DBD), link-state request, link-state (LS) update, and link-state acknowledgment (LSAck) packets. Any two devices forming a neighbor adjacency go through the following states in the finite state machine:
  • Down: This is the initial state of an OSPF router where no information is exchanged between the routers.

  • Attempt: This state is similar to the down state, with the difference that the router is in the state of initiating a conversation. This state is only applicable for nonbroadcast multiaccess (NBMA) networks.

  • Init: In this state, a Hello packet has been received from the neighbor router, but two-way communication has not yet been established.

  • 2-Way: Indicates that a bidirectional conversation has been established between two routers. After this state, the DR/BDR election takes place on broadcast and NBMA networks. A router on a broadcast or NBMA network becomes fully adjacent with the DR and BDR but remains in the 2-Way state with all the remaining routers.

  • Exstart: In this state, a master–subordinate relationship is established between the two routers. The router with the higher router ID becomes the master and initiates the exchange of link-state information.

  • Exchange: In this state, the OSPF neighbors exchange database description (DBD) packets. The DBD packets contain LSA headers that describe the contents of the LSDB and are compared with the router’s LSDB.

  • Loading: If any discrepancy or missing information is found by comparing the DBD packets with the LSDB, routers send link-state request packets to the neighbor routers. In response, the neighbor router replies with LS Update packets, which are acknowledged by the receiving router using LSAck packets.

  • Full: In this state, the router’s database is completely synchronized with the LSDB of the neighbor routers and the routers become fully adjacent.

Let’s now look at the Wireshark captures based on the different states. Figure 5-1 displays the initial OSPF Hello packet where an OSPF-enabled router sends out a Hello packet on the 224.0.0.5 multicast address. Because the router has not received any hello back from the other end, there is no information available about the active neighbor.
Figure 5-1

OSPF Hello packet

Once the OSPF router is able to see the neighbor router, you can then see the Active Neighbor field in the Hello packet. Figure 5-2 displays the active neighbor in the Hello packet for router R3 with OSPF router ID 192.168.3.3. Notice that so far no DR/BDR election has happened in this network segment.
Figure 5-2

OSPF Hello packet with active neighbor

Once the OSPF routers negotiate the DR/BDR roles, the Hello packet will then have both the DR and BDR fields populated as shown in Figure 5-3.
Figure 5-3

OSPF Hello packet with DR/BDR

After the DR/BDR election, the routers perform the master and subordinate election on the segment. Remember that initially both routers send DBD packets with the Master (MS) bit set, but once the OSPF software determines that the router with the higher router ID is the master, the router with the lower router ID clears the MS bit. The OSPF DBD packet also advertises the MTU of the segment; if the MTU values mismatch, the OSPF neighborship gets stuck in the Exstart or Exchange state. Only after the master and subordinate election completes do the routers start exchanging LSA information in the DBD packets. Figure 5-4 displays the DBD packet with the MS bit set, coming from the router with the higher router ID. Also notice the various LSAs being advertised to the neighboring router. Another important thing to remember is that the Init (I) bit is always set on the initial DBD packet sent by each side of the segment, and the More (M) flag is set when there are more DBD packets pending to be sent by the router.
Figure 5-4

OSPF database description packet

LS Update packets are exchanged between the routers on the segment. The DR floods LS Updates to the destination address 224.0.0.5 (all OSPF routers), whereas updates destined for the DR and BDR are sent to 224.0.0.6 (all DR routers). The LS Update packet contains the list of LSAs that the OSPF router wants to advertise to its neighboring device to synchronize the OSPF database. Figure 5-5 displays the Wireshark capture of an OSPF LS Update packet advertising a Type 1 and a Type 2 LSA to the neighboring router.
Figure 5-5

OSPF LS Update packet

What does an LSA header look like? Each LSA has a common 20-byte header, followed by additional fields that describe the link. Here are the fields present in the OSPF LSA header:
  • LS Age (2 bytes): Represents the elapsed time since the LSA was created.

  • Options (1 byte): Used for advertising OSPF capabilities supported by the router.

  • LS Type (1 byte): Indicates the type of LSA.

  • Link State ID (4 bytes): Identifies the piece of the routing domain the LSA describes, such as the originating router's ID or the network address.

  • Advertising Router (4 bytes): Indicates the OSPF router ID of the router originating the LSA.

  • LS Sequence Number (4 bytes): A sequence number used to detect old or duplicate LSAs.

  • LS Checksum (2 bytes): Checksum of the LSA, which is used for identifying any data corruption.

  • Length (2 bytes): Length of the LSA including 20 bytes of the header.
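The 20-byte layout above maps directly onto a fixed struct format. The following sketch packs and then unpacks a hypothetical Type 3 (summary) LSA header the way a dissector would; all field values are made up for illustration:

```python
import struct
import socket

# The 20-byte LSA header: LS Age, Options, LS Type, Link State ID,
# Advertising Router, LS Sequence Number, LS Checksum, Length.
LSA_HEADER = struct.Struct(">HBB4s4sIHH")

# Pack a hypothetical Type 3 (summary) LSA header; every value is made up.
raw = LSA_HEADER.pack(
    7,                                # LS Age: seconds since origination
    0x22,                             # Options: capability bits
    3,                                # LS Type: 3 = summary LSA
    socket.inet_aton("10.1.1.0"),     # Link State ID
    socket.inet_aton("192.168.3.3"),  # Advertising Router
    0x80000001,                       # LS Sequence Number (initial value)
    0,                                # LS Checksum (left zero in this sketch)
    28,                               # Length: 20-byte header + 8-byte body
)
assert len(raw) == 20  # matches the common header size described above

age, options, ls_type, ls_id, adv_router, seq, cksum, length = LSA_HEADER.unpack(raw)
print(ls_type, socket.inet_ntoa(adv_router), hex(seq))  # -> 3 192.168.3.3 0x80000001
```

Reading a hex dump of a capture against a struct layout like this is a handy sanity check when a dissector and a device disagree about a field.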

Figure 5-6 displays the Wireshark capture of an LSA Type 3 header within a DBD packet.
Figure 5-6

OSPF LSA header

Based on the LS Update packet, the router sends a link-state acknowledgment (LSAck) packet. Figure 5-7 displays the acknowledgment sent by router R1 in response to the LS Update packet sent by R3 in Figure 5-5.
Figure 5-7

OSPF LSA packet

Most of the issues in OSPF are usually seen during adjacency bring-up; once the adjacency is up, OSPF remains stable. There can be slight differences based on the OSPF area type, especially with an OSPF NSSA. In an OSPF NSSA, the Hello packet has the NSSA (N) bit set, which tells the peering router that the NSSA capability is enabled on it. Figure 5-8 displays the Hello packet in the NSSA.
Figure 5-8

OSPF NSSA Hello packet

Because an NSSA advertises external prefixes as Type 7 LSAs and these Type 7 LSAs are converted to Type 5 LSAs at the ABR, the ABR specifically checks the Propagate (P) bit to determine whether the conversion from Type 7 to Type 5 is allowed. If the P bit is not set, the conversion is not performed. Figure 5-9 displays the Wireshark capture of a Type 7 LSA in the LS Update packet with the P bit set.
Figure 5-9

OSPF Type 7 LSA

Note that most network OSs come with debug capabilities for the various routing protocols that can be enabled for troubleshooting, but Wireshark is helpful in instances where running debugs on a production router carries too great a risk of affecting the device. When working with Wireshark, the filters listed in Table 5-2 can be helpful to filter OSPF packets.
Table 5-2

Wireshark OSPF Filtering

Filter                          Description
ospf.area_id == 0.0.0.10        Filters OSPF packets for the specified area ID
ospf.advrouter == 192.168.5.5   Filters OSPF packets with the specified advertising router ID
ospf.hello                      Filters OSPF Hello packets
ospf.lsa.router                 Filters OSPF router (Type 1) LSAs
ospf.lsa.network                Filters OSPF network (Type 2) LSAs
ospf.lsa.summary                Filters OSPF summary (Type 3) LSAs
ospf.lsa.nssa                   Filters NSSA (Type 7) LSAs
ospf.lsa.asext                  Filters external (Type 5) LSAs

EIGRP

Enhanced Interior Gateway Routing Protocol (EIGRP), defined in RFC 7868, is another IGP, designed and developed by Cisco Systems. It is an advanced distance vector protocol that leverages the Diffusing Update Algorithm (DUAL) to calculate loop-free routing paths using diffusing computations. All routing protocols, including EIGRP, fundamentally work the same way and share functions such as these:
  • Establishing communication: EIGRP uses a three-way handshake for establishing communication.

  • Exchanging routes: EIGRP uses reliable transport for exchanging routes.

  • Performing path computation: The protocol leverages the DUAL algorithm to perform path computation.

  • Installing routes in the Routing Information Base (RIB): EIGRP only installs loop-free paths in the RIB.

One of the key components of EIGRP is its Topology table. It contains all known paths, locally learned routes, and externally learned routes (learned via redistribution). The information available in the Topology table is used by the DUAL algorithm to calculate the loop-free paths. The EIGRP Topology table not only contains information about the paths, but it also maintains information about when a route was withdrawn by a neighbor.

Most of the computation element resides locally on the router, but EIGRP performs all its tasks using five types of packets:
  • Hello

  • Update

  • Acknowledge

  • Query

  • Reply

Let’s take a closer look at these packets one by one.

Hello Packet

Hello packets are used for peer discovery and maintenance. This packet is the first message sent when the EIGRP process comes up on a router, and it contains several parameters, such as K values and the AS number, that are checked by the peer router on receiving the Hello packet before forming a neighborship. The Hello timer defaults to 5-second intervals on high-bandwidth links and 60 seconds on low-bandwidth links. Hello packets are usually sent to the multicast address 224.0.0.10, unless the neighbors are statically configured on a nonbroadcast medium such as Frame Relay, in which case they are sent as unicast packets. Figure 5-10 displays the Wireshark capture of EIGRP Hello packets. Note that you can filter EIGRP Hello packets using the filter (eigrp.opcode == 5) && (eigrp.ack == 0). The eigrp.ack part is needed because Hello and Acknowledge packets share the same opcode, but the latter has a nonzero value in the Acknowledge field.
Figure 5-10

EIGRP Hello packet

Note

The Hello packet also has the Stub flag set when sent by an EIGRP stub router. Users can filter it in Wireshark using the filter eigrp.stub_flags.
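The opcode-based distinction between Hello and Acknowledge packets can be mirrored in a few lines of Python. This is a sketch assuming the RFC 7868 fixed header layout; the sample packets below are hypothetical:

```python
import struct

# EIGRP fixed header (RFC 7868): version, opcode, checksum, flags,
# sequence, acknowledgment, virtual router ID, autonomous system number.
EIGRP_HEADER = struct.Struct(">BBHIIIHH")
OPCODES = {1: "Update", 3: "Query", 4: "Reply", 5: "Hello"}

def classify(packet: bytes) -> str:
    """Mirror the Wireshark filter logic: opcode 5 with a zero Acknowledge
    field is a Hello; opcode 5 with a nonzero Acknowledge field is an Ack."""
    version, opcode, cksum, flags, seq, ack, vrid, asn = EIGRP_HEADER.unpack(packet[:20])
    if opcode == 5:
        return "Hello" if ack == 0 else "Acknowledge"
    return OPCODES.get(opcode, f"Unknown ({opcode})")

# Hypothetical sample headers: same opcode (5), different Acknowledge field.
hello = EIGRP_HEADER.pack(2, 5, 0, 0, 0, 0, 0, 100)
ack = EIGRP_HEADER.pack(2, 5, 0, 0, 0, 42, 0, 100)
print(classify(hello), classify(ack))  # -> Hello Acknowledge
```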

Update Packet

Update packets are used by EIGRP to convey reachability information for prefixes to EIGRP neighbors. After an EIGRP neighborship is established, EIGRP routers send Update packets as unicast to the neighbor routers containing all the routes; these are known as full updates. Each route in the update message contains metrics such as bandwidth, delay, load, and reliability, plus other information such as hop count and MTU. Once the full updates have been exchanged between the EIGRP neighbors, Update packets are exchanged only when there is a change in topology. For instance, a link flap triggers a withdrawal of multiple routes, which is communicated to EIGRP neighbors via a multicast packet containing only the changed routes. These updates are called partial updates. Figures 5-11 and 5-12 display EIGRP full updates and partial updates, respectively.
Figure 5-11

EIGRP full update

Figure 5-12

EIGRP partial update

Users can filter EIGRP update packets in Wireshark using the eigrp.opcode == 1 filter. This filter displays both full and partial updates in EIGRP.

Acknowledge Packet

EIGRP communication works similarly to the TCP three-way handshake: The initial packet can be an Update, Query, or Reply packet, and in acknowledgment of these packets, an Acknowledge packet is sent by the EIGRP router. The difference between the TCP and EIGRP handshakes is that the sequence number in EIGRP is not incremented but rather copied into the Acknowledge field. This communication is carried by Cisco's proprietary Reliable Transport Protocol (RTP). The EIGRP Acknowledge packet has the same opcode as the EIGRP Hello packet, but with a nonzero Acknowledge field value. Figure 5-13 displays the Wireshark capture of an EIGRP Acknowledge packet.
Figure 5-13

EIGRP Acknowledge packet

You can filter the EIGRP Acknowledge packet by using the Wireshark filter (eigrp.opcode == 5) && !(eigrp.ack == 0). The ! operator ensures that we only capture Acknowledge packets and not EIGRP Hello packets.

Query Packet

EIGRP queries are sent when a router loses a route to a destination network (the destination prefix goes into active state). Queries are normally sent as multicast to all the neighboring routers to find other paths to the destination prefix. If a receiving router cannot find an alternate path to the destination prefix, it will then query its peers for the destination prefix. This process goes on until the query has reached the boundary router. Figure 5-14 displays the EIGRP Query packet for the destination prefix.
Figure 5-14

EIGRP Query packet

You can filter the EIGRP Query packet in Wireshark using the filter eigrp.opcode == 3.

Note

EIGRP Query packets are not sent to stub routers.

Reply Packet

The EIGRP Reply packet is sent in response to the Query packet. After sending the Query packet, a router waits for a reply from its peer routers. If a router receiving the Query packet knows about an alternate path to the destination prefix, it will respond back to the querying router with the necessary metrics to reach the destination prefix. Figure 5-15 displays the Wireshark capture of an EIGRP Reply packet. You can filter the Reply packet using the Wireshark filter eigrp.opcode == 4.
Figure 5-15

EIGRP Reply packet

BGP

BGP, often called the routing protocol of the Internet, is an open standard protocol used for connecting networks across different AS boundaries. BGP is a highly scalable protocol with support for multiple address families, such as IPv4, IPv6, VPNv4, L2VPN, and EVPN, which makes it the protocol of choice in enterprise, data center, and service provider environments. BGP, in general, cannot route traffic on its own: It knows the next hop for each prefix, whether the prefix is within the same AS or across multiple AS boundaries, but it relies on the IGP to actually reach that next hop.

Because BGP exchanges information across AS boundaries, it is also important that the information is exchanged via a reliable mechanism. Thus, BGP leverages TCP as its transport mechanism. A BGP session is established on TCP port 179. In BGP, two types of neighborships can be established:
  • Internal BGP (iBGP): BGP peering established with other routers within the same AS boundary.

  • External BGP (eBGP): BGP peering established with routers across AS boundaries.

For two routers to establish a BGP peering, they go through a finite state machine as listed here:
  • Idle: In this state, BGP detects a start event and initializes the BGP resources on the router. The BGP process initiates a TCP connection toward the peer.

  • Connect: In this state, BGP waits for the TCP three-way handshake to complete. If the handshake is successful, an OPEN message is sent and the BGP process moves to the OpenSent state. If it is not successful, BGP moves to the Active state and waits for the ConnectRetry timer to expire.

  • Active: BGP starts a new three-way handshake. If the connection is successful, it moves to OpenSent state. If it is unsuccessful, the BGP process moves back to the Connect state.

  • OpenSent: In this state, the BGP process sends an OPEN message to the remote peer and waits for an OPEN message from the peer.

  • OpenConfirm: In this state, the router has already received the OPEN message from the remote peer and is now waiting for a KEEPALIVE or NOTIFICATION message. On receiving the KEEPALIVE message, the BGP session is established. On receiving a NOTIFICATION message, BGP moves to the Idle state.

  • Established: This state indicates that the BGP session is established and is now ready to exchange routing updates via the BGP UPDATE message.
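The states above can be sketched as a simplified transition table. This is illustrative only: a real BGP implementation handles many more events and timers than shown here (see RFC 4271, Section 8), and the event names below are invented for the sketch.

```python
# A simplified transition table for the BGP finite state machine described
# above. Only the happy path and the failures mentioned in the text are modeled.
TRANSITIONS = {
    ("Idle",        "start"):                  "Connect",
    ("Connect",     "tcp_established"):        "OpenSent",
    ("Connect",     "tcp_failed"):             "Active",
    ("Active",      "tcp_established"):        "OpenSent",
    ("Active",      "tcp_failed"):             "Connect",
    ("OpenSent",    "open_received"):          "OpenConfirm",
    ("OpenConfirm", "keepalive_received"):     "Established",
    ("OpenConfirm", "notification_received"):  "Idle",
}

def run(events, state="Idle"):
    """Walk the FSM; any unexpected event resets the session to Idle."""
    for event in events:
        state = TRANSITIONS.get((state, event), "Idle")
    return state

# Happy path: TCP comes up, OPEN is exchanged, KEEPALIVE confirms the session.
print(run(["start", "tcp_established", "open_received", "keepalive_received"]))
# -> Established
```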

From this finite state machine, we can see that there are four types of BGP messages:
  • OPEN: This is the first message exchanged between BGP peers after a three-way handshake has been established between the peers. Once each side confirms the information shared in the BGP OPEN message, other messages are exchanged between them. The following information is compared as part of the OPEN message:
    • BGP version

    • Source IP of the OPEN message should match with configured peer IP

    • Received AS number should match the configured remote AS number of the BGP peer

    • BGP Router ID must be unique

    • Other security parameters such as password, TTL, and so on

  • KEEPALIVE: The BGP KEEPALIVE message acts like a Hello packet to check whether the BGP peer is alive or not. This message is used to keep sessions from expiring.

  • NOTIFICATION: BGP NOTIFICATION is sent when the BGP process encounters an error condition. When this message is received, the BGP process closes the active session for which the notification was received. The NOTIFICATION message also contains the information such as error code and suberror code that can be used to determine the cause of the error condition.

  • UPDATE: This message is used for exchanging routing updates (advertisements and withdrawals) between BGP peers.

We’ll now examine these BGP messages in Wireshark. First for the BGP OPEN message, the following fields are present in the header:
  • Marker (16 bytes): Set to all ones (ffffffffffffffffffffffffffffffff).

  • Length (2 bytes): Total length of the BGP message, including the header.

  • Type (1 byte): Value set to 1 for an OPEN message.

  • Version: Specifies the current BGP version used by the router. The current version is 4 as defined in RFC 4271.

  • My AS: Specifies the AS number of the router originating the OPEN message.

  • Hold Time: Specifies the Hold Timer value set on the router sending the OPEN message.

  • BGP Identifier: Router ID of the router sending the OPEN message.

  • Optional Parameters Length: Variable length, specifies the combined length of all the parameters included in the Optional Parameters field.

  • Optional Parameters: This field is used by the router to advertise optional BGP capabilities that are supported in BGP by the OS running on the advertising router. Some of these capabilities include the following:
    • Multiprotocol BGP (MP-BGP) support

    • Route Refresh support

    • 4-octet (4-byte) AS number support
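The OPEN message layout described above can be reproduced with a few struct calls. This is a sketch with made-up values (AS 65001, router ID 192.168.1.1), not a usable BGP speaker:

```python
import struct
import socket

MARKER = b"\xff" * 16  # all-ones marker required by RFC 4271

def build_open(my_as, hold_time, router_id, opt_params=b""):
    """Build a minimal BGP OPEN message (type 1) with version 4.

    OPEN body: Version (1), My AS (2), Hold Time (2), BGP Identifier (4),
    Optional Parameters Length (1), then the optional parameters themselves.
    """
    body = struct.pack(">BHH4sB", 4, my_as, hold_time,
                       socket.inet_aton(router_id), len(opt_params)) + opt_params
    length = 16 + 2 + 1 + len(body)  # marker + length + type + body
    return MARKER + struct.pack(">HB", length, 1) + body

msg = build_open(my_as=65001, hold_time=180, router_id="192.168.1.1")

# Parse it back the way a dissector would.
length, msg_type = struct.unpack(">HB", msg[16:19])
version, peer_as, hold, bgp_id, opt_len = struct.unpack(">BHH4sB", msg[19:29])
print(msg_type, version, peer_as, hold, socket.inet_ntoa(bgp_id))
# -> 1 4 65001 180 192.168.1.1
```

Note that a 2-byte My AS field cannot carry a 4-byte AS number directly; that is exactly what the 4-octet AS capability in the Optional Parameters is for.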

Figure 5-16 displays the BGP OPEN message.
Figure 5-16

BGP OPEN message

The BGP KEEPALIVE message, as mentioned before, is used to ensure BGP peers are active. The BGP process does not rely on the TCP connection to validate that the BGP peers are up. BGP KEEPALIVE messages are sent every 60 s by default with the Hold Timer set to 180 s. Figure 5-17 displays the Wireshark capture of a BGP KEEPALIVE message sent between BGP peers. We can see from the Wireshark capture that there are only three fields present in the BGP KEEPALIVE message.
Figure 5-17

BGP KEEPALIVE message

A BGP NOTIFICATION message is also a short message that contains the information about error code (major error code) and suberror code (minor error code). Because BGP peering may be established multiple hops away, BGP provides a mechanism to notify other peers about what might have triggered the error condition, causing the BGP peering to flap. Figure 5-18 displays the Wireshark capture of the BGP NOTIFICATION message. In the Wireshark capture we can see that the error code is 6 and the suberror code is 4, which indicates Administratively Reset. Thus, this notification message indicates that the BGP peering was manually reset.
Figure 5-18

BGP NOTIFICATION message

Before we dive into the BGP UPDATE message, let's first understand how BGP update packaging happens. Once the initial TCP session is established, both endpoints maintain the negotiated TCP MSS. As mentioned in Chapter 3, MSS = MTU – IP header (20 bytes) – TCP header (20 bytes). When the BGP process wants to send updates to a peer, it packages updates up to a maximum of MSS bytes and sends them with the Don't Fragment (DF) bit set. If all the updates cannot fit in a single message, BGP sends multiple UPDATE messages. If any segment in the path has a lower interface or IP MTU than the negotiated MSS implies, the BGP updates might never reach the remote peer. While an UPDATE is being sent, the BGP process does not send a separate KEEPALIVE message: The UPDATE itself serves as the keepalive, and its acknowledgment serves as the keepalive acknowledgment. Therefore, if UPDATE packets cannot reach the remote end, the BGP session flaps on Hold Timer expiry. Figure 5-19 displays the Wireshark capture of a BGP UPDATE packet. Notice that the DF bit is set in the IP header, and in the BGP header we can see the UPDATE message, which carries the attributes attached to the BGP prefixes; the prefixes themselves are listed as Network Layer Reachability Information (NLRI).
Figure 5-19

BGP UPDATE message
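The MSS arithmetic above is simple enough to sketch directly; the 1400-byte transit MTU below is a hypothetical example of the mismatch scenario just described:

```python
def tcp_mss(mtu, ip_header=20, tcp_header=20):
    """MSS = MTU - IP header - TCP header, as described in Chapter 3.
    A BGP speaker packs each UPDATE into at most this many bytes."""
    return mtu - ip_header - tcp_header

# On a standard 1500-byte Ethernet MTU, the negotiated MSS is 1460 bytes.
# If a transit link has only a 1400-byte MTU, any DF-marked segment
# carrying more than 1360 bytes of payload cannot pass, so large UPDATEs
# are lost and the session eventually flaps on Hold Timer expiry.
print(tcp_mss(1500))  # -> 1460
print(tcp_mss(1400))  # -> 1360
```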

Most BGP issues can be investigated from the CLI. You might only need to leverage the help of Wireshark when there is an issue with the following:
  • TCP session

  • Packet loss

  • Network OS not generating packets in a timely manner

  • Device not sending the BGP packets out in a timely manner

  • BGP updates getting corrupted

PIM

Today, almost every network uses multicast in one way or another. Multicast allows one-to-many traffic, but only to receivers that have subscribed to or are interested in that traffic. Multicast applications have wide use in financial services, health care, digital streaming, and many other types of organizations. Before we dive into multicast and the routing protocol used to build multicast trees, we need to understand some key terms:
  • Source address: Unicast address of a multicast source or sender.

  • Group address: Destination IP address of a multicast group. Note that multicast addresses range from 224.0.0.0 to 239.255.255.255.

  • Multicast distribution tree (MDT): Multicast flows from source to receivers over an MDT. The MDT is either shared or dedicated based on the multicast implementation.

  • Rendezvous point (RP): A multicast-enabled router that acts as the root of the shared MDT.

  • Protocol Independent Multicast (PIM): Routing protocol used to create MDTs.

  • First-hop router (FHR): First L3 hop that is directly adjacent to the multicast source.

  • Last-hop router (LHR): First L3 hop that is directly adjacent to the receivers.
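The group address range above corresponds to the 224.0.0.0/4 block, which can be checked with Python's standard ipaddress module:

```python
import ipaddress

def is_multicast_group(addr: str) -> bool:
    """A valid IPv4 group address falls in 224.0.0.0/4,
    i.e., 224.0.0.0 through 239.255.255.255."""
    return ipaddress.IPv4Address(addr) in ipaddress.IPv4Network("224.0.0.0/4")

print(is_multicast_group("239.1.1.1"))    # -> True
print(is_multicast_group("224.0.0.13"))   # -> True  (the all-PIM-routers group)
print(is_multicast_group("192.168.1.1"))  # -> False
```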

In this chapter, we focus on the PIM protocol and its messages and see how it is used to build MDTs.

The PIM protocol is used to build shared trees as well as shortest path trees from source to receivers to facilitate the distribution of multicast traffic. The PIM protocol runs over the L3 network and builds an overlay network for multicast using the information from the underlying IGP. Thus, when troubleshooting multicast issues, it is important to validate the unicast routing information learned via the IGP. With the help of IGP, PIM is able to locate where the source, receiver, and the RP resides. PIM operates in two modes:
  • Dense mode: Based on a push model, PIM Dense mode operates under the assumption that receivers are densely dispersed throughout the network. In this mode, multicast traffic is flooded domain-wide to build a shortest path tree, and the branches are pruned back where no receivers are found.

  • Sparse mode: Based on a pull model, PIM Sparse mode assumes that the receivers are sparsely dispersed. In this mode, PIM neighbors are formed and traffic is forwarded only over the PIM-enabled path. Using PIM messages, the join request from receivers is forwarded to the RP and thus the mechanism is known as explicit join. Because of this method, it is also the most preferred and widely used method for multicast distribution.

The PIM protocol has the following fields in its header:
  • PIM Version (4 bits): Version number is set to 2.

  • Type (4 bits): Used to specify the PIM message type.

  • Reserved (8 bits): Reserved for future use. The value is set to 0 in this field during transmission and is ignored by the PIM neighbor.

  • Checksum (16 bits): A checksum calculated over the entire PIM message, excluding the encapsulated data payload in the case of Register messages.
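The checksum in question is the standard 16-bit ones'-complement Internet checksum (RFC 1071). A minimal sketch, applied to a hypothetical PIM Hello header with the checksum field zeroed first:

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Standard 16-bit ones'-complement checksum (RFC 1071), the algorithm
    PIM applies over its message with the Checksum field zeroed first."""
    if len(data) % 2:            # pad odd-length input with a zero byte
        data += b"\x00"
    total = sum(struct.unpack(f">{len(data) // 2}H", data))
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Hypothetical 4-byte PIM Hello header: version 2 / type 0 packed in one
# byte, then the reserved byte, then the (initially zero) checksum field.
header = struct.pack(">BBH", (2 << 4) | 0, 0, 0)
cksum = internet_checksum(header)

# Verification property: recomputing over the message with the checksum
# filled in yields 0, which is how a receiver validates the packet.
patched = struct.pack(">BBH", (2 << 4) | 0, 0, cksum)
print(internet_checksum(patched))  # -> 0
```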

There are multiple PIM message types, but not all messages are used in all deployments. Some of the most commonly seen PIM messages in basic multicast deployment are listed in Table 5-3.
Table 5-3

PIM message types

Type  Message Type   Destination Address       Description
0     Hello          224.0.0.13                Neighbor discovery
1     Register       Address of RP (unicast)   Sent by the FHR to the RP to register the source
2     Register-Stop  Address of FHR (unicast)  Sent by the RP to the FHR in response to a PIM Register message
3     Join/Prune     224.0.0.13                Join or prune from an MDT

PIM Hello Message

The PIM Hello message, identified by Type 0, is sent on all PIM-enabled interfaces to discover and form PIM neighbor adjacencies. PIM neighborship is unidirectional in nature, so it is important to validate the PIM neighborship from both ends of the link. PIM Hello messages are sent periodically to the destination address 224.0.0.13. A PIM Hello message allows for multiple options in Type, Length, Value (TLV) format. The options supported in PIM Hello messages are listed in Table 5-4.
Table 5-4 PIM Hello message options

Option Type | Option Value
------------|-------------------------------------------------------------------------------------------------
1           | Holdtime: the amount of time for which the neighbor is considered reachable
2           | LAN Prune Delay, with three parts: LAN Prune Delay (delay before transmitting a Prune message on a shared LAN segment), Override Interval (time interval for overriding a Prune message), and the T bit (join message suppression capability)
19          | DR Priority: priority used during DR election
20          | Generation ID: random number indicating neighbor status
24          | Address List: used to inform neighbors about secondary IP addresses on the interface

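The option list in Table 5-4 is a flat sequence of TLVs, so walking it programmatically is straightforward. The sketch below (a hypothetical helper, assuming each option is a 2-byte type, a 2-byte length, and the value) parses the body of a PIM Hello:

```python
import struct

# PIM Hello option type numbers from Table 5-4
OPTION_NAMES = {1: "Holdtime", 2: "LAN Prune Delay", 19: "DR Priority",
                20: "Generation ID", 24: "Address List"}

def parse_hello_options(payload: bytes):
    """Walk the TLV-encoded option list in a PIM Hello body."""
    options, offset = [], 0
    while offset + 4 <= len(payload):
        opt_type, opt_len = struct.unpack_from("!HH", payload, offset)
        value = payload[offset + 4 : offset + 4 + opt_len]
        options.append((OPTION_NAMES.get(opt_type, "Unknown"), value))
        offset += 4 + opt_len
    return options

# Holdtime option: type 1, length 2, value 105 seconds
tlv = struct.pack("!HHH", 1, 2, 105)
print(parse_hello_options(tlv))  # [('Holdtime', b'\x00i')]
```

A Holdtime of 0xFFFF means "never time out," and a Holdtime of 0 in a Hello is how a router signals an immediate teardown of the adjacency.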
Figure 5-20 displays the PIM Hello message between two PIM neighbors.
Figure 5-20

PIM Hello message

PIM Register Message

When the source sends multicast traffic, the PIM DR on the FHR takes the first packet, encapsulates it in a PIM Register message, and sends it as a unicast packet to the PIM RP. The PIM Register message informs the PIM RP that the source is actively sending traffic for the given multicast group. The PIM Register message contains the following fields in its header:
  • Type: Value is set to 1 for Register message.

  • Border (B-bit): The PIM multicast border router functionality is defined in RFC 4601. This bit is set to 0 when the router is registering a local source and set to 1 when the source is in a directly connected cloud.

  • Null-Register: This bit is set to 1 when a Null-Register message is sent. In the Null-Register message, the FHR only encapsulates the header from the source and not the complete encapsulated data packet of the multicast stream coming from the source.

  • Multicast Data packet: The original multicast packet sent by the source is encapsulated inside the PIM Register message. If the message is a Null-Register message, only a dummy IP header containing the source and group address is encapsulated in the PIM Register message. Note that the TTL of the original packet is decremented before encapsulation into the PIM Register message.

Figure 5-21 displays the Wireshark capture of the PIM Register message sent by the FHR to the RP (192.168.3.3).
Figure 5-21

PIM Register message

PIM Register-Stop Message

On receiving the PIM Register message, the RP adds the source to the multicast distribution tree. Once the RP receives the first packet natively through the shortest path, it will send a PIM Register-stop message to the DR that has built the Shortest Path Tree (SPT) toward the source. The PIM Register-stop message has the following fields:
  • Type: Value is set to 2 for PIM Register-stop message.

  • Group Address: Group address of the encapsulated multicast packet in the PIM Register message.

  • Source Address: Source address of the encapsulated multicast packet in the PIM Register message.

Figure 5-22 displays the Wireshark capture of the PIM Register-stop message from RP to the DR that sent the PIM Register message.
Figure 5-22

PIM Register-stop message

PIM Join/Prune Message

The PIM Join/Prune message is sent by PIM routers toward the PIM RP or toward the source with the destination set to PIM multicast address 224.0.0.13. These messages are used to build RP trees (RPTs) toward the PIM RP or to build SPT toward the source. The PIM Join/Prune message contains a list of sources (called source lists) and groups (called group sets) to be joined or pruned. The following fields are present in the PIM Join/Prune message:
  • Type: Value is set to 3 for Join/Prune message.

  • Upstream Address: Address of the upstream neighbor to which the message is targeted. It also has subfields that represent the address family of the upstream neighbor as well as the encoding.

  • Number of Groups: Represents the number of multicast group sets in the message.

  • Holdtime: The amount of time to keep the Join/Prune state alive.

  • Num Joins: Number of joined sources in the message.

  • Joined Source Address {IP Address x.x.x.x/32}
    • Sparse bit (S): Set to 1 for PIM Sparse mode.

    • Wildcard bit (W): When set to 1, this represents the wildcard, as in a (*, G) entry. When set to 0, the encoded source address refers to an (S, G) entry.

    • RP bit (R): When set to 0, join is sent toward source. When set to 1, join is sent toward RP.

  • Num Prunes: Number of pruned sources in the message.

  • Pruned Source Address {IP Address x.x.x.x/32}: Represents the list of sources being pruned for the group. All three flags in Joined Source Address are applicable for Pruned Source Address, too.

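The S, W, and R flags described above occupy the three low-order bits of the flags byte in each encoded source address. A small sketch (hypothetical helper names) shows how the bit combinations map onto join semantics:

```python
def decode_source_flags(flags: int) -> dict:
    """Decode the S, W, and R bits of a PIM encoded source address."""
    return {
        "sparse": bool(flags & 0x04),    # S bit: Sparse mode
        "wildcard": bool(flags & 0x02),  # W bit: (*, G) when set
        "rp_tree": bool(flags & 0x01),   # R bit: sent toward RP when set
    }

# 0x07 = S+W+R all set: a Sparse mode (*, G) join sent toward the RP
print(decode_source_flags(0x07))
# 0x04 = only S set: a Sparse mode (S, G) join sent toward the source
print(decode_source_flags(0x04))
```

In practice, a (*, G) join always carries W=1 and R=1 together, while an SPT join toward the source carries W=0 and R=0.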
The PIM Join message is sent by the LHR’s DR toward the RP whenever a receiver shows an interest in receiving a multicast stream. Figure 5-23 displays the Wireshark capture of the PIM Join message from the LHR toward the RP.
Figure 5-23

PIM Join message

A PIM Prune message is sent by a PIM router when it wants to remove itself from the multicast tree for a particular multicast group. Figure 5-24 displays the Wireshark capture of a PIM Prune message when there is no receiver interested in the multicast stream.
Figure 5-24

PIM Prune message

Analyzing Overlay Traffic

So far, we have learned about analyzing routing protocol traffic that runs on physical links or virtual links such as SVIs. Such networks are known as underlay networks. Routing protocols, however, can also run over an overlay network. An overlay network is a network built on top of another network; it leverages the underlying network's configuration and protocols to establish communication between endpoints as if they were directly connected. The devices or endpoints in an overlay network can reside multiple hops away, in the same or a different geographical location. In overlay traffic, the actual host traffic is encapsulated with the headers of the underlay network. We next look at different overlay protocols and how we can analyze the overlay traffic using Wireshark.

GRE

Generic Routing Encapsulation (GRE), defined in RFC 2784, is an overlay protocol that allows users to create virtual point-to-point links and encapsulate data packets in a tunnel interface. Because it creates a point-to-point link, each side can encapsulate any outgoing packets toward the remote end and de-encapsulate any incoming packets from the far end of the tunnel. With GRE, users might run one routing protocol in the underlay to establish reachability between the two tunnel endpoints while running a different routing protocol in the overlay to establish end-to-end connectivity of hosts and devices sitting behind the tunnel endpoints. Figure 5-25 displays the Wireshark capture of a GRE encapsulated packet. Notice that the base GRE header is only 4 bytes, but encapsulation also adds the overhead of a 20-byte outer IP header. Thus, we need to make sure that the IP MTU value is adjusted accordingly when encapsulating traffic with GRE.
Figure 5-25

GRE encapsulation

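The MTU adjustment mentioned above is simple arithmetic: the base GRE header plus the new outer IP header reduce the room available for the inner packet. A minimal sketch, assuming plain IPv4 GRE with no optional key or sequence fields:

```python
# Overhead added by basic GRE encapsulation (no optional fields)
OUTER_IP_HDR = 20   # new outer IPv4 header
GRE_HDR = 4         # base GRE header (RFC 2784, no key/sequence)

def gre_tunnel_ip_mtu(physical_mtu: int = 1500) -> int:
    """Largest inner IP packet that fits without fragmentation."""
    return physical_mtu - OUTER_IP_HDR - GRE_HDR

print(gre_tunnel_ip_mtu())      # 1476 on a standard Ethernet link
print(gre_tunnel_ip_mtu(9000))  # 8976 on a jumbo-frame link
```

This is why a GRE tunnel interface over standard Ethernet is commonly configured with an IP MTU of 1476; optional GRE fields such as a tunnel key or checksum would shrink it further.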
When data traffic is GRE encapsulated, the TTL value in the outer IP header is decremented at each hop, but the TTL in the inner IP header is not. Figure 5-26 displays the Wireshark capture of GRE encapsulated traffic captured after the first Layer 3 hop. Notice that the outer IP header has a TTL value of 254, whereas the inner IP header (with source IP set to 192.168.1.1 and destination IP set to 192.168.2.2) has a TTL value of 255.
Figure 5-26

GRE encapsulated traffic after first Layer 3 hop

IPSec

IP Security (IPSec), originally defined in RFC 1825 through RFC 1827 and since updated by RFC 4301 and related RFCs, is a suite of protocols that establishes secure communication between two endpoints across an IP network, providing authentication, data integrity, and confidentiality. The suite also defines the protocols needed for secure key exchange and key management. The following protocols are part of the IPSec protocol suite:
  • Authentication Header (AH): AH provides data integrity, authentication, and antireplay capabilities, which protect against tampered or replayed packets.

  • Internet Key Exchange (IKE): IKE is a network security protocol that dynamically exchanges encryption keys and uses Security Associations (SAs) to establish shared security attributes between the two IPSec tunnel endpoints. The Internet Security Association and Key Management Protocol (ISAKMP) provides a framework for authentication and key exchange and defines how to set up SAs. There are two versions of IKE:
    • IKEv1

    • IKEv2

  • Encapsulating Security Payload (ESP): ESP provides confidentiality (encryption), data integrity, and authentication for the payload, and protects it against replay attacks.

Let’s now look at the negotiation for IKEv1 in Wireshark. Figure 5-27 displays the Wireshark capture of all the initial communication between the two routers participating in IPSec IKEv1 negotiations and then transmitting data after secure communication has been established. From the Wireshark capture we can see six Main mode messages in Phase 1 that negotiate the security parameters used to protect the three Quick mode messages exchanged in Phase 2.
Figure 5-27

Wireshark capture of IPSec IKEv1 negotiations

In Phase 1, as shown in Figure 5-28, the first step is policy negotiation. In the first packet, the sender adds its unique Security Parameter Index (SPI) to identify itself. Along with the SPI, the sender also sends a set of proposals with various security parameters, called transforms. The receiver matches these transforms against its local policies.
Figure 5-28

Wireshark capture of first Phase 1 packet

On receiving the packet, the receiver responds with the Responder SPI and picks one of the transforms that it received based on the configuration. Figure 5-29 displays the Wireshark capture of the reply sent by the responder for the first packet.
Figure 5-29

Wireshark capture of second Phase 1 packet

In the next two packets, both the peers exchange Diffie-Hellman (DH) public keys, which allows them to agree on a shared secret key. Figure 5-30 displays the Wireshark capture of the DH keys. Notice that there is a Nonce data highlighted in the packet capture. The Nonce value helps protect against replay attacks by adding randomness to the key generation process.
Figure 5-30

Wireshark capture of DH keys

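The DH exchange captured above can be illustrated with deliberately tiny numbers. This is purely a toy sketch of the underlying math; real IKE uses standardized DH groups whose primes are 1024 bits and larger:

```python
# Toy Diffie-Hellman: both peers derive the same shared secret
# without ever sending it on the wire.
p, g = 23, 5                     # public prime modulus and generator

a_private, b_private = 6, 15     # each peer's secret exponent
a_public = pow(g, a_private, p)  # value peer A sends on the wire
b_public = pow(g, b_private, p)  # value peer B sends on the wire

# Each side combines its own secret with the peer's public value
a_shared = pow(b_public, a_private, p)
b_shared = pow(a_public, b_private, p)

assert a_shared == b_shared      # both arrive at the same secret
print(a_shared)                  # 2
```

An eavesdropper sees only p, g, and the two public values; recovering the shared secret from those requires solving the discrete logarithm problem, which is what the Nonce-seeded key derivation in IKE builds on.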
The last two packets of Main mode are used for authentication purposes. In this exchange, both peers confirm each other’s identity. If both sides agreed on preshared key authentication, each side verifies that the other holds the same preshared key. Figure 5-31 displays the Wireshark capture of the identification-related payload. Notice in this Wireshark capture that the Flags field highlights that there is no authentication between the two peers.
Figure 5-31

Wireshark capture of Phase 1 authentication process

After this step, we move to Phase 2 (Quick mode). This phase primarily establishes the security parameters that will be used by the IPSec SA. Figure 5-32 displays one of the packets exchanged in Quick mode. Remember that three packets are exchanged in Quick mode, but only one is shown for brevity.
Figure 5-32

Wireshark capture of Phase 2 Quick mode

Once Phase 2 is completed, the IPSec tunnels are formed, and all packets exchanged over the tunnel interface are encrypted. For instance, if you send ICMP traffic, you will not be able to tell from the Wireshark capture whether it is an ICMP packet or some other type of packet, because the payload is hidden inside ESP.

VXLAN

VXLAN, defined in RFC 7348, is an overlay protocol that provides Layer 2 extension in a datacenter environment. It allows users to extend Layer 2 domains in multitenant environments by leveraging the underlying IP infrastructure. VXLAN is often described as MAC-in-UDP encapsulation: the original Layer 2 frame is encapsulated with a VXLAN header and an outer UDP header. VXLAN packets are sent to the well-known destination UDP port 4789. The VXLAN header carries a 24-bit segment ID that allows up to 16 million VXLAN segments in the same datacenter environment. Figure 5-33 displays how a classical Ethernet frame looks when encapsulated with VXLAN.
Figure 5-33

VXLAN encapsulated Ethernet frame

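The encapsulation shown above prepends an 8-byte VXLAN header to the original frame. A minimal sketch (hypothetical helper names) that builds and parses that header, with the I ("VNI present") flag set and the 24-bit VNI occupying bytes 4 through 6:

```python
import struct

VXLAN_PORT = 4789          # well-known destination UDP port
VXLAN_FLAG_I = 0x08        # "VNI present" flag in the first byte

def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags, 3 reserved bytes,
    24-bit VNI, 1 reserved byte."""
    assert 0 <= vni < 2**24        # 24 bits -> ~16 million segments
    return (struct.pack("!BBBB", VXLAN_FLAG_I, 0, 0, 0)
            + vni.to_bytes(3, "big") + b"\x00")

def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header."""
    return int.from_bytes(header[4:7], "big")

hdr = build_vxlan_header(10000)
print(len(hdr), parse_vni(hdr))   # 8 10000
```

The 24-bit VNI field is exactly why the text cites a limit of 16 million segments: 2^24 = 16,777,216 possible identifiers, compared to the 4,094 usable IDs of a classic 12-bit VLAN.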
The VXLAN encapsulation and de-encapsulation is done by VXLAN Tunnel Endpoints (VTEPs), which connect classic Ethernet segments to the VXLAN fabric. The VXLAN core fabric is usually based on a spine-leaf architecture. Traffic forwarding in a VXLAN fabric depends on the type of traffic. Broadcast, Unknown Unicast, and Multicast (BUM) traffic requires either multicast or unicast replication because these packets must be delivered to multiple remote VTEPs at the same time. Unicast traffic, on the other hand, does not require any replication: it is encapsulated with a VXLAN and UDP header and sent to the destination VTEP behind which the host resides. There are, thus, two replication methods supported with VXLAN.

The first method is multicast replication. In multicast replication, a multicast group is mapped to the VXLAN Network Identifier (VNI), which in turn is mapped to a VLAN ID where the host resides. When BUM traffic is sent—for instance, an ARP request is sent for a destination host residing in the same VLAN or same VXLAN segment—the ARP request is multicast replicated to all the VTEPs that have the matching VXLAN Network Identifier (VNID) configured. The multicast destination address in the VXLAN encapsulation is the same multicast address that was mapped to the VNI. Figure 5-34 displays the VXLAN-encapsulated BUM traffic. Notice that in this Wireshark capture, the destination address in the IP header is set to 239.1.150.1, which is the multicast address mapped to VNI 10000.
Figure 5-34

VXLAN encapsulation BUM traffic with multicast replication

Because the ARP response is a unicast packet, the ARP reply is also encapsulated with the VXLAN header, but it is sent as a unicast packet to the source VTEP behind which the source host resides. Once both end hosts have learned each other’s MAC address, all communication is unicast. Figure 5-35 displays the Wireshark capture of unicast packets between the two hosts residing in the same VNI segment. Notice that the outer IP header carries the addresses of the VTEPs, whereas the inner IP header carries the source and destination addresses of the hosts.
Figure 5-35

VXLAN encapsulated unicast packet

The second replication method is ingress replication, or unicast replication. This method is used in scenarios where either the organization is not interested in enabling multicast in its fabric or the devices are incapable of running multicast features. The BUM traffic, in this case, is replicated to statically configured remote VTEPs as unicast packets.

So far, we have explored the communication of hosts within the same VNI. Inter-VNI communication in a VXLAN fabric is performed through symmetric Integrated Routing and Bridging (IRB) with the help of a Layer 3 VNI. To put Layer 2 and Layer 3 VNIs in context, let’s first understand the concept of a tenant. A tenant is a logical instance that provides Layer 2 or Layer 3 services in a datacenter. Each tenant consists of multiple Layer 2 VNIs and a Layer 3 VNI. Layer 2 VNIs are the segments where the hosts are connected, and the Layer 3 VNI is used for inter-VNI routing.

To understand symmetric IRB from a packet forwarding perspective, consider an example where host H1 with IP address 10.150.1.1, residing in VLAN 1501 (mapped to VXLAN segment ID 10000), tries to reach host H3 with IP address 10.150.2.3, residing in VLAN 1502 (mapped to VXLAN segment ID 10001). Because these hosts are in different VXLAN segments, we have to leverage the Layer 3 VNI, say 50000. When the packet from the source host reaches the source VTEP, the VTEP performs a lookup for the destination and learns that the destination resides in a different VXLAN segment and on a remote VTEP. It therefore sets the VNID value to 50000 when encapsulating the packet with a VXLAN header and sends it out. When the remote VTEP receives the VXLAN encapsulated packet, it sees that the VNID is set to the Layer 3 VNI, performs a routing lookup for the destination IP in the tenant VRF, and determines that the destination resides in segment 10001. Because the segment after de-encapsulation is just a VLAN segment, the packet is forwarded to the host residing in VLAN 1502. Figure 5-36 displays the Wireshark capture of the VXLAN encapsulated packet with the VNID value set to 50000, which is the Layer 3 VNI.
Figure 5-36

VXLAN encapsulated packet with the Layer 3 VNI

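The forwarding decision described above boils down to choosing which VNI goes into the VXLAN header on the ingress VTEP. The sketch below models that choice with hypothetical lookup tables populated with the example's numbers (VLANs 1501/1502, Layer 2 VNIs 10000/10001, Layer 3 VNI 50000); a real VTEP derives these from its EVPN control plane:

```python
# Hypothetical tables matching the worked example in the text
VLAN_TO_L2VNI = {1501: 10000, 1502: 10001}
TENANT_L3VNI = 50000
# Host routes the fabric has learned: dest IP -> (remote VTEP, VLAN)
HOST_ROUTES = {"10.150.2.3": ("remote-vtep-2", 1502)}

def ingress_vtep_encap(dst_ip: str, src_vlan: int) -> int:
    """Pick the VNI for the VXLAN header on the ingress VTEP."""
    _, dst_vlan = HOST_ROUTES[dst_ip]
    if dst_vlan == src_vlan:
        return VLAN_TO_L2VNI[src_vlan]  # bridged: keep the L2 VNI
    return TENANT_L3VNI                 # routed: use the tenant L3 VNI

# H1 (VLAN 1501) reaching H3 (VLAN 1502): routed via the L3 VNI
print(ingress_vtep_encap("10.150.2.3", 1501))  # 50000
```

The "symmetric" in symmetric IRB refers to both the ingress and egress VTEP performing a routing lookup against the same Layer 3 VNI, which is exactly what this VNI selection enables.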
There are various implementations of VXLAN, such as VXLAN-EVPN and VXLAN Multi-Site, but the concept and the method of encapsulation and de-encapsulation remain the same. Thus, when investigating any VXLAN issue, you might run into problems related to BUM replication or unicast forwarding. In the case of BUM replication with multicast, you might want to troubleshoot the issue from a multicast perspective more than from a VXLAN perspective.

Summary

This chapter primarily focused on topics specific to network engineers, to assist them in day-to-day troubleshooting of various routing protocols and overlay network traffic. We began the chapter by learning how to analyze routing protocol traffic such as OSPF, EIGRP, BGP, and PIM. We then moved on to overlay traffic such as GRE and IPSec VPNs. As part of analyzing overlay traffic, we also covered one of the most widely used and critical encapsulations, VXLAN. This chapter assumed that readers understand how these protocols work; they can build on that foundation to reach a deeper understanding of those protocols by learning the content of their headers and how to troubleshoot some scenarios commonly seen in production environments.
