Chapter 11

Troubleshooting BGP

This chapter covers the following topics:

BGP Fundamentals

Defined in RFC 1654, Border Gateway Protocol (BGP) is a path-vector routing protocol that provides scalability, flexibility, and network stability. When BGP was first developed, the primary design consideration was IPv4 inter-organizational routing information exchange across public networks, such as the Internet, or across private dedicated networks. BGP is often referred to as the protocol for the Internet, because it is the only protocol capable of holding the Internet routing table, which has more than 600,000 IPv4 routes and over 42,000 IPv6 routes, both of which continue to grow.

From the perspective of BGP, an autonomous system (AS) is a collection of routers under a single organization’s control. Organizations requiring connectivity to the Internet must obtain an autonomous system number (ASN). ASNs were originally 2 bytes (16-bit) providing 65,535 ASNs. Due to exhaustion, RFC 4893 expands the ASN field to accommodate 4 bytes (32-bit). This allows for 4,294,967,295 unique ASNs, providing quite a leap from the original 65,535 ASNs. The Internet Assigned Numbers Authority (IANA) is responsible for assigning all public ASNs to ensure that they are globally unique.

Two blocks of private ASNs are available for any organization to use as long as they are never exchanged publicly on the Internet. ASNs 64,512 to 65,534 are private ASNs within the 16-bit ASN range, and 4,200,000,000 to 4,294,967,294 are private ASNs within the extended 32-bit range.
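The private ranges can be checked programmatically. The following is a minimal sketch, assuming the private-use ranges reserved in RFC 6996 (64512 to 65534 and 4200000000 to 4294967294); the function name classify_asn and its return labels are hypothetical and chosen only for illustration.

```python
# Private-use ASN ranges, assumed here to follow RFC 6996.
PRIVATE_16BIT = range(64512, 65535)            # 64512..65534 inclusive
PRIVATE_32BIT = range(4200000000, 4294967295)  # 4200000000..4294967294 inclusive


def classify_asn(asn: int) -> str:
    """Label an ASN as private or not (hypothetical helper)."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError(f"ASN {asn} does not fit in 32 bits")
    if asn in PRIVATE_16BIT:
        return "private (16-bit range)"
    if asn in PRIVATE_32BIT:
        return "private (32-bit range)"
    return "not private"
```

A check like this is useful in a pre-deployment script to catch a private ASN that is about to be configured on an Internet-facing peering.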

Note

It is imperative that you use only the ASN assigned by IANA, the ASN assigned by your service provider, or private ASNs. Public prefixes are mapped to the ASNs of the organizations that own them. Thus, mistakenly or maliciously advertising a prefix using the wrong ASN could result in traffic loss and cause havoc on the Internet.

Address Families

Originally, BGP was intended for routing of IPv4 prefixes between organizations, but RFC 2858 added Multi-Protocol BGP (MP-BGP) capability by adding an extension called the address-family identifier (AFI). An address-family correlates to a specific network protocol, such as IPv4 or IPv6, and additional granularity is provided through the subsequent address-family identifier (SAFI), such as unicast and multicast. MBGP achieves this separation by using the BGP path attributes (PA) MP_REACH_NLRI and MP_UNREACH_NLRI. These attributes are carried inside BGP UPDATE messages and are used to carry network reachability information for different address families.

Note

Some network engineers refer to Multi-Protocol BGP as MP-BGP and other network engineers use the term MBGP. Both terms refer to the same thing.

Network engineers and vendors continue to add functionality and feature enhancements to BGP. BGP now provides a scalable control plane for signaling overlay technologies like Multiprotocol Label Switching (MPLS) Virtual Private Networks (VPN), IPsec Security Associations, and Virtual Extensible LAN (VXLAN). These overlays provide Layer 3 connectivity via MPLS L3VPNs, or Layer 2 connectivity via Ethernet VPNs (EVPN).

Every address-family maintains a separate database and configuration for each protocol (address-family + subaddress-family) in BGP. This allows for a routing policy in one address-family to be different from a routing policy in a different address-family, even though the router uses the same BGP session to the other router. BGP includes an AFI and SAFI with every route advertisement to differentiate between the AFI and SAFI databases. Table 11-1 provides a small list of common AFI and SAFIs used with BGP.

Table 11-1 BGP AFI/SAFI

AFI    SAFI    Network Layer Information
1      1       IPv4 Unicast
1      2       IPv4 Multicast
1      4       MPLS Label
1      128     MPLS L3VPN IPv4
2      1       IPv6 Unicast
2      4       MPLS Label
2      128     MPLS L3VPN IPv6
25     65      Virtual Private LAN Service (VPLS)
25     70      Ethernet VPN (EVPN)
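The idea that each AFI/SAFI pair selects its own database can be sketched as a lookup keyed by the pair. This is only an illustration built from the values in Table 11-1; the names AFI_SAFI, describe_family, and tables are hypothetical and do not reflect how NX-OS stores its tables internally.

```python
# (AFI, SAFI) pairs from Table 11-1 mapped to the network layer information.
AFI_SAFI = {
    (1, 1): "IPv4 Unicast",
    (1, 2): "IPv4 Multicast",
    (1, 4): "MPLS Label",
    (1, 128): "MPLS L3VPN IPv4",
    (2, 1): "IPv6 Unicast",
    (2, 4): "MPLS Label",
    (2, 128): "MPLS L3VPN IPv6",
    (25, 65): "Virtual Private LAN Service (VPLS)",
    (25, 70): "Ethernet VPN (EVPN)",
}


def describe_family(afi: int, safi: int) -> str:
    """Return the name of an address family, or 'unknown' if not listed."""
    return AFI_SAFI.get((afi, safi), "unknown")


# One independent prefix table per address family: a route learned for
# (1, 1) never lands in the table for (2, 1), even over the same session.
tables = {family: {} for family in AFI_SAFI}
tables[(1, 1)]["192.168.4.4/32"] = {"next_hop": "0.0.0.0"}
```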

Path Attributes

BGP attaches path attributes (PAs) to each network path. The PAs provide BGP with granularity and control of routing policies within BGP. The BGP prefix PAs are classified as follows:

  • Well-known mandatory

  • Well-known discretionary

  • Optional transitive

  • Optional nontransitive

Per RFC 4271, well-known attributes must be recognized by all BGP implementations. Well-known mandatory attributes must be included with every prefix advertisement, whereas well-known discretionary attributes may or may not be included with the prefix advertisement.

Optional attributes do not have to be recognized by all BGP implementations. Optional attributes can be set so that they are transitive and stay with the route advertisement from AS to AS. Other PAs are nontransitive and cannot be shared from AS to AS. In BGP, the Network Layer Reachability Information (NLRI) is the routing update that consists of the network prefix, prefix-length, and any BGP PAs for that specific route.
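The four categories can be derived from the attribute flags and type code carried with each PA. The sketch below assumes the RFC 4271 encoding, in which the high-order flag bit marks an attribute as optional and the next bit marks it as transitive, and it treats ORIGIN (1), AS_PATH (2), and NEXT_HOP (3) as the well-known mandatory type codes. The function name classify_attribute is hypothetical.

```python
OPTIONAL = 0x80    # high-order flag bit: attribute is optional
TRANSITIVE = 0x40  # second flag bit: attribute is transitive

# Type codes assumed to be well-known mandatory: ORIGIN, AS_PATH, NEXT_HOP.
WELL_KNOWN_MANDATORY = {1, 2, 3}


def classify_attribute(flags: int, type_code: int) -> str:
    """Map a path attribute's flags and type code to one of the four classes."""
    if flags & OPTIONAL:
        if flags & TRANSITIVE:
            return "optional transitive"
        return "optional nontransitive"
    if type_code in WELL_KNOWN_MANDATORY:
        return "well-known mandatory"
    return "well-known discretionary"
```

For instance, under these assumptions AS_PATH (flags 0x40, type 2) is reported as well-known mandatory, while a COMMUNITY attribute (flags 0xC0, type 8) is reported as optional transitive.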

Loop Prevention

BGP is a path-vector routing protocol and does not contain a complete topology of the network like link-state routing protocols. BGP behaves similarly to distance-vector protocols to ensure that a path is loop-free.

The BGP attribute AS_PATH is a well-known mandatory attribute and includes a complete listing of all the ASNs that the prefix advertisement has traversed from its source AS. The AS_PATH is used as a loop-prevention mechanism in the BGP protocol. If a BGP router receives a prefix advertisement with its AS listed in the AS_PATH, it discards the prefix because the router thinks the advertisement forms a loop.
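The AS_PATH check is simple enough to express in a few lines. The following sketch is illustrative only; the function names are hypothetical, and the AS_PATH is modeled as a plain list of ASNs with the most recently traversed AS first.

```python
from typing import List, Sequence


def accept_advertisement(local_asn: int, as_path: Sequence[int]) -> bool:
    """Return False when the local ASN already appears in the AS_PATH."""
    return local_asn not in as_path


def advertise_to_ebgp_peer(local_asn: int, as_path: Sequence[int]) -> List[int]:
    """Prepend the local ASN before sending the prefix to an EBGP peer."""
    return [local_asn, *as_path]
```

With this model, a router in AS 65000 accepts a path of [65001] but discards a path of [65001, 65000], because the second path has already traversed AS 65000.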

Note

The other IBGP-related loop-prevention mechanisms are discussed later in this chapter.

BGP Sessions

A BGP session refers to the established adjacency between two BGP routers. BGP sessions are always point-to-point and are categorized into two types:

  • Internal BGP (iBGP): Sessions established with a BGP router that is in the same AS or participates in the same BGP confederation. iBGP sessions are considered more secure, and some of BGP’s security measures are lowered in comparison to EBGP sessions. iBGP prefixes are assigned an administrative distance (AD) of 200 upon installing into the router’s Routing Information Base (RIB).

  • External BGP (EBGP): Sessions established with a BGP router that is in a different AS. EBGP prefixes are assigned an AD of 20 upon installing into the router’s RIB.

Note

Administrative distance (AD) is a rating of the trustworthiness of a routing information source. If a router learns about a route to a destination from more than one routing protocol and they all have the same prefix length, AD is compared. The preference is given to the route with the lower AD.
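The AD comparison can be sketched as picking the minimum value among the sources that offer the same prefix. Only the two BGP values quoted in this chapter are included; the names ADMIN_DISTANCE and preferred_source are hypothetical.

```python
# Administrative distances quoted in this chapter for BGP-learned routes.
ADMIN_DISTANCE = {"ebgp": 20, "ibgp": 200}


def preferred_source(sources):
    """Among sources offering the same prefix length, pick the lowest AD."""
    return min(sources, key=ADMIN_DISTANCE.__getitem__)
```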

BGP uses TCP port 179 to communicate with other routers. Transmission Control Protocol (TCP) allows for handling of fragmentation, sequencing, and reliability (acknowledgement and retransmission) of communication (control plane) packets. Although BGP can form neighbor adjacencies that are directly connected, it can also form adjacencies that are multiple hops away. Multihop sessions require that the router use an underlying route installed in the RIB (static or from any routing protocol) to establish the TCP session with the remote endpoint.

Note

BGP neighbors connected via the same network use the ARP table to locate the IP address of the peer. Multihop BGP sessions require route table information for finding the IP address of the peer. It is common to have a static route or Interior Gateway Protocol (IGP) running between iBGP peers for providing the topology path information for establishing the BGP TCP session. A default route is not sufficient to establish a multihop BGP session.

BGP can be thought of as a control plane routing protocol or as an application, because it allows for the exchanging of routes with peers multiple hops away. BGP routers do not have to be in the data plane (path) to exchange prefixes, but all routers in the data path need to know all the routes that will be forwarded through them.

BGP Identifier

The BGP Router-ID (RID) is a 32-bit unique number that identifies the BGP router in the advertised prefixes as the BGP Identifier. The RID is also used as a loop-prevention mechanism for routes advertised within an autonomous system. The RID can be set manually or dynamically for BGP. A nonzero value must be set for routers to become neighbors. NX-OS nodes use the IP address of the lowest up loopback interface. If there are no up loopback interfaces, then the IP address of the lowest active up interface becomes the RID when the BGP process initializes.

Router-IDs typically represent an IPv4 address that resides on the router, such as a loopback address. Any IPv4 address can be used, including IP addresses not configured on the router. NX-OS uses the command router-id router-id under the BGP router configuration to statically assign the BGP RID. Upon changing the router-id, all BGP sessions reset and need to reestablish.

Note

It is a best practice to statically assign the BGP Router-ID.

BGP Messages

BGP communication uses four message types as shown in Table 11-2.

Table 11-2 BGP Packet Types

Type    Name            Functional Overview
1       OPEN            Sets up and establishes BGP adjacency
2       UPDATE          Advertises, updates, or withdraws routes
3       NOTIFICATION    Indicates an error condition to a BGP neighbor
4       KEEPALIVE       Ensures that BGP neighbors are still alive

OPEN

The OPEN message is used to establish a BGP adjacency. Both sides negotiate session capabilities before a BGP peering establishes. The OPEN message contains the BGP version number, ASN of the originating router, Hold Time, BGP Identifier, and other optional parameters that establish the session capabilities.
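To make the OPEN fields concrete, the sketch below builds and parses a minimal OPEN message. It assumes the RFC 4271 wire layout (a 16-byte all-ones marker, a 2-byte length, a 1-byte type, then version, 2-byte ASN, Hold Time, BGP Identifier, and optional parameter length) and sends no optional parameters, so it does not negotiate any capabilities. The function names are hypothetical.

```python
import socket
import struct

MARKER = b"\xff" * 16   # header marker, all ones
HEADER_LEN = 19         # marker (16) + length (2) + type (1)
OPEN = 1                # message type from Table 11-2


def build_open(asn: int, hold_time: int, router_id: str, version: int = 4) -> bytes:
    """Build an OPEN message with no optional parameters (ASN must fit 16 bits)."""
    body = struct.pack("!BHH4sB", version, asn, hold_time,
                       socket.inet_aton(router_id), 0)
    header = MARKER + struct.pack("!HB", HEADER_LEN + len(body), OPEN)
    return header + body


def parse_open(message: bytes) -> dict:
    """Decode the fixed fields of an OPEN message built as above."""
    length, msg_type = struct.unpack("!HB", message[16:HEADER_LEN])
    if message[:16] != MARKER or msg_type != OPEN or length != len(message):
        raise ValueError("not a well-formed OPEN message")
    version, asn, hold_time, rid, _opt_len = struct.unpack(
        "!BHH4sB", message[HEADER_LEN:HEADER_LEN + 10])
    return {"version": version, "asn": asn, "hold_time": hold_time,
            "router_id": socket.inet_ntoa(rid)}
```

Decoding a captured OPEN this way is a quick method for confirming which ASN, Hold Time, and BGP Identifier a peer is actually advertising.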

The Hold Time attribute sets the Hold Timer in seconds for each BGP neighbor. Upon receipt of an UPDATE or KEEPALIVE, the Hold Timer resets to the initial value. If the Hold Timer reaches zero, the BGP session is torn down, routes from that neighbor are removed, and an appropriate update route withdraw message is sent to other BGP neighbors for the impacted prefixes. The Hold Time is a heartbeat mechanism for BGP neighbors to ensure that the neighbor is healthy and alive.

When establishing a BGP session, the routers use the smaller Hold Time value contained in the two routers’ OPEN messages. The Hold Time value must be set to at least 3 seconds, or zero. For Cisco routers the default hold timer is 180 seconds.

UPDATE

The UPDATE message advertises any feasible routes, withdraws previously advertised routes, or can do both. The UPDATE message includes the Network Layer Reachability Information (NLRI) that includes the prefix and associated BGP PAs when advertising prefixes. Withdrawn NLRIs include only the prefix. An UPDATE message can act as a KEEPALIVE message to reduce unnecessary traffic.

NOTIFICATION

A NOTIFICATION message is sent when an error is detected with the BGP session, such as a Hold Timer expiring, a neighbor capabilities change, or a BGP session reset is requested. This causes the BGP connection to close.

Note

More details on the BGP messages are discussed in the troubleshooting sections.

KEEPALIVE

BGP does not rely upon the TCP connection state to ensure that the neighbors are still alive. KEEPALIVE messages are exchanged every 1/3 of the Hold Timer agreed upon between the two BGP routers. Cisco devices have a default Hold Time of 180 seconds, so the default KEEPALIVE interval is 60 seconds. If the Hold Time is set to zero, no KEEPALIVE messages are sent between the BGP neighbors.
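The timer arithmetic described for the Hold Time and KEEPALIVE interval can be sketched as follows. This is a minimal illustration of the rules stated in the text (smaller value wins, values of 1 or 2 seconds are invalid, KEEPALIVE is one third of the Hold Time); the function names are hypothetical.

```python
from typing import Optional


def negotiate_hold_time(local: int, remote: int) -> int:
    """Return the Hold Time both peers use: the smaller of the two offers."""
    for value in (local, remote):
        if value != 0 and value < 3:
            raise ValueError("Hold Time must be zero or at least 3 seconds")
    return min(local, remote)


def keepalive_interval(hold_time: int) -> Optional[int]:
    """Return the KEEPALIVE interval, or None when keepalives are disabled."""
    if hold_time == 0:
        return None
    return hold_time // 3
```

With the Cisco default of 180 seconds on one side and 90 seconds on the other, the session uses a Hold Time of 90 seconds and a KEEPALIVE interval of 30 seconds.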

BGP Neighbor States

BGP forms a TCP session with neighbor routers called peers. BGP uses the Finite State Machine (FSM) to maintain a table of all BGP peers and their operational status. The BGP session may report one of the following states:

  • Idle

  • Connect

  • Active

  • OpenSent

  • OpenConfirm

  • Established

Figure 11-1 displays the BGP FSM and the states in order of establishing a BGP session.

Figure 11-1 BGP Finite State Machine

Idle

This is the first stage of the BGP FSM. BGP detects a start event and tries to initiate a TCP connection to the BGP peer and also listens for a new connection from a peer router.

If an error causes BGP to go back to the Idle state for a second time, the ConnectRetryTimer is set to 60 seconds and must decrement to zero before the connection is initiated again. Further failures to leave the Idle state result in the ConnectRetryTimer doubling in length from the previous time.
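The backoff behavior can be sketched numerically. The function below is one reading of the paragraph above: no delay on the first pass through Idle, 60 seconds on the second, and a doubling on each further return. It models no upper bound, and the function name is hypothetical.

```python
def connect_retry_delay(idle_entries: int, base: int = 60) -> int:
    """Seconds to wait in Idle, given how many times the peer has entered it."""
    if idle_entries < 2:
        return 0  # first pass through Idle: connect immediately
    return base * 2 ** (idle_entries - 2)
```

Under this reading, a peer that keeps failing waits 60, 120, and then 240 seconds on its second, third, and fourth returns to Idle, which is why a misconfigured peer can appear to take progressively longer to retry.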

Connect

In this state, BGP initiates the TCP connection. If the 3-way TCP handshake completes, the BGP process resets the ConnectRetryTimer, sends the Open message to the neighbor, and changes to the OpenSent state.

If the ConnectRetry timer depletes before this stage is complete, a new TCP connection is attempted, the ConnectRetry timer is reset, and the state is moved to Active. If any other input is received, the state is changed to Idle.

During this stage, the neighbor with the higher IP address manages the connection. The router initiating the request uses a dynamic source port, but the destination port is always 179.

Note

Service providers consistently assign their customers the higher or lower IP address for their networks. This helps the service provider create proper instructions for ACLs or firewall rules, or for troubleshooting them.

Active

In this state, BGP starts a new 3-way TCP handshake. If a connection is established, an Open message is sent, the Hold Timer is set to 4 minutes, and the state moves to OpenSent. If this attempt for TCP connection fails, the state moves back to the Connect state and resets the ConnectRetryTimer.

OpenSent

In this state, an Open message has been sent from the originating router, which is awaiting an Open message from the other router. After the originating router receives the OPEN message from the other router, both OPEN messages are checked for errors. The following items are checked:

  • BGP versions must match.

  • The source IP Address of the OPEN message must match the IP address that is configured for the neighbor.

  • The AS number in the OPEN message must match what is configured for the neighbor.

  • BGP Identifiers (RID) must be unique. If a RID does not exist, this condition is not met.

  • Security Parameters (Password, Time to Live [TTL], and so on)

If the Open messages do not have any errors, the Hold Time is negotiated (using the lower value), and a KEEPALIVE message is sent (assuming the value is not set to zero). The connection state is then moved to OpenConfirm. If an error is found in the OPEN message, a Notification message is sent, and the state is moved back to Idle.

If TCP receives a disconnect message, BGP closes the connection, resets the ConnectRetryTimer, and sets the state to Active. Any other input in this process results in the state moving to Idle.

OpenConfirm

In this state, BGP waits for a Keepalive or Notification message. Upon receipt of a neighbor’s Keepalive, the state is moved to Established. If the Hold Timer expires, a stop event occurs, or a Notification message is received, the state is moved to Idle.

Established

In this state, the BGP session is established. BGP neighbors exchange routes via Update messages. As Update and Keepalive messages are received, the Hold Timer is reset. If the Hold Timer expires, an error is detected, and BGP moves the neighbor back to the Idle state.
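The state descriptions above can be condensed into a transition table. The sketch below is a simplified model of the transitions described in this section, not a complete FSM implementation: the event names are hypothetical labels, timers are not modeled, and any input that is not listed falls back to Idle.

```python
# (current state, event) -> next state, following the prose in this section.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "connect_retry_expired"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "tcp_failed"): "Connect",
    ("OpenSent", "open_ok"): "OpenConfirm",
    ("OpenSent", "open_error"): "Idle",
    ("OpenSent", "tcp_disconnect"): "Active",
    ("OpenConfirm", "keepalive"): "Established",
    ("OpenConfirm", "notification"): "Idle",
    ("OpenConfirm", "hold_timer_expired"): "Idle",
    ("Established", "update"): "Established",
    ("Established", "keepalive"): "Established",
    ("Established", "hold_timer_expired"): "Idle",
}


def next_state(state: str, event: str) -> str:
    """Return the next FSM state; unlisted inputs return the peer to Idle."""
    return TRANSITIONS.get((state, event), "Idle")


def run(events, state: str = "Idle") -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

Walking a failing session through a table like this helps map the state reported by the router to the step that is failing, for example a peer stuck between Connect and Active points at the TCP connection rather than at the OPEN negotiation.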

BGP Configuration and Verification

BGP configuration on NX-OS can be laid out in a few simple steps, but the BGP command line is available only after enabling the BGP feature. Use the command feature bgp to enable the BGP feature on Nexus platforms. The steps for configuring BGP on an NX-OS device are as follows:

Step 1. Create the BGP routing process. Initialize the BGP process with the global configuration command router bgp as-number.

Step 2. Assign a BGP router-id. Assign a unique BGP router-id under the BGP router process. The router-id can be an IP address assigned to a physical interface or a Loopback interface.

Step 3. Initialize the address-family. Initialize the address-family with the BGP router configuration command address-family afi safi so it can be associated to a BGP neighbor.

Step 4. Identify the BGP neighbor’s IP address and autonomous system number. Identify the BGP neighbor’s IP address and autonomous system number with the BGP router configuration command neighbor ip-address remote-as as-number.

Step 5. Activate the address-family for the BGP neighbor. Activate the address-family for the BGP neighbor with the BGP neighbor configuration command address-family afi safi.

Examine the topology shown in Figure 11-2. This topology is used as reference for the next section as well. In this topology, Nexus devices NX-1, NX-2, and NX-4 are part of AS 65000, whereas router NX-6 belongs to AS 65001.

Figure 11-2 Reference Topology

Example 11-1 displays the BGP configuration for router NX-4 demonstrating both IBGP and EBGP peering. For this example, NX-4 is trying to establish an IBGP peering with NX-1 and an EBGP peering with NX-6. While configuring a BGP peering, it is important to ensure the following information is correct:

  • Local and remote ASN

  • Source peering IP

  • Remote peering IP

  • Authentication passwords (optional)

  • EBGP-multihop (EBGP only)

In Example 11-1, NX-4 forms an IBGP peering with NX-1 and an EBGP peering with NX-6. NX-4 also advertises its loopback address under the IPv4 address family using the network command and redistributes the directly connected 10.46.1.0/24 network through the route-map conn.

Example 11-1 NX-OS BGP Configuration

NX-4
feature bgp
router bgp 65000
  router-id 192.168.4.4
  address-family ipv4 unicast
    network 192.168.4.4/32
    redistribute direct route-map conn
  neighbor 10.46.1.6
    remote-as 65001
    address-family ipv4 unicast
  neighbor 192.168.1.1
    remote-as 65000
    update-source loopback0
    address-family ipv4 unicast
      next-hop-self
!
ip prefix-list connected-routes seq 5 permit 10.46.1.0/24
!
route-map conn permit 10
  match ip address prefix-list connected-routes

After the configuration is performed on NX-4, peering should be established between NX-1 and NX-4 as well as between NX-4 and NX-6. The BGP peerings are verified using the command show bgp afi safi summary, where afi and safi select the address family. In this case, the IPv4 unicast address family is used. Examine the verification of BGP peering between NX-1, NX-4, and NX-6, as shown in Example 11-2. Notice that both the IBGP and EBGP peerings are established on the NX-4 switch, and a prefix is being learned from each neighbor.

Example 11-2 NX-OS BGP Peering Verification

NX-4# show bgp ipv4 unicast summary
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 192.168.4.4, local AS number 65000
BGP table version is 8, IPv4 Unicast config peers 2, capable peers 2
4 network entries and 4 paths using 576 bytes of memory
BGP attribute entries [3/432], BGP AS path entries [1/6]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor      V       AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.46.1.6     4    65001      24      27        8    0    0 00:16:01 1
192.168.1.1   4    65000      23      24        8    0    0 00:16:24 1
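When many peers are configured, the summary output can be scanned for sessions that are not established. The sketch below assumes the column layout shown in Example 11-2, where the last column holds a prefix count for an established peer and a state name otherwise. The sample text is copied from that example, and the function name is hypothetical.

```python
SUMMARY = """\
Neighbor      V       AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.46.1.6     4    65001      24      27        8    0    0 00:16:01 1
192.168.1.1   4    65000      23      24        8    0    0 00:16:24 1
"""


def peers_not_established(summary: str):
    """Return (neighbor, state) for rows whose last column is not a prefix count."""
    down = []
    for line in summary.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a full neighbor row.
        if len(fields) < 10 or fields[0] == "Neighbor":
            continue
        state = " ".join(fields[9:])
        if not state.isdigit():
            down.append((fields[0], state))
    return down
```

For the output in Example 11-2 the function returns an empty list, because both peers report a prefix count in the State/PfxRcd column.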

After the BGP peering is established, the BGP prefixes are verified using the command show bgp afi safi. This command lists all the BGP prefixes in the respective address family. Example 11-3 displays the output of the BGP prefixes on NX-4. In the output, locally originated prefixes appear with a next-hop value of 0.0.0.0, learned prefixes appear with their next-hop IP address, and a path-type flag indicates whether each prefix was learned from an IBGP (i) or EBGP (e) peer.

Example 11-3 NX-OS BGP Table Output

NX-4# show bgp ipv4 unicast
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 20, local router ID is 192.168.4.4
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist,
I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
*>r10.46.1.0/24       0.0.0.0                  0        100      32768 ?
*>i192.168.1.1/32     192.168.1.1                       100          0 i
*>l192.168.4.4/32     0.0.0.0                           100      32768 i
*>e192.168.6.6/32     10.46.1.6                                      0 65001 i

On NX-OS, the BGP process is instantiated the moment the router bgp asn command is configured. The details of the BGP process and the summarized configuration are viewed using the command show bgp process. This command displays the BGP process ID, state, number of configured and active BGP peers, BGP attributes, VRF information, redistribution and relevant route-maps used with various redistribute statements, and so on. If there is a problem with the BGP process, this command can be used to verify the state of BGP along with the memory information of the BGP process. Example 11-4 displays the output of the command show bgp process.

Example 11-4 NX-OS BGP Process

NX-4# show bgp process

BGP Process Information
BGP Process ID                 : 9618
BGP Protocol Started, reason:  : configuration
BGP Protocol Tag               : 65000
BGP Protocol State             : Running
BGP MMODE                      : Not Initialized
BGP Memory State               : OK
BGP asformat                   : asplain

BGP attributes information
Number of attribute entries    : 4
HWM of attribute entries       : 4
Bytes used by entries          : 400
Entries pending delete         : 0
HWM of entries pending delete  : 0
BGP paths per attribute HWM    : 3
BGP AS path entries            : 1
Bytes used by AS path entries  : 26

Information regarding configured VRFs:

BGP Information for VRF default
VRF Id                         : 1
VRF state                      : UP
Router-ID                      : 192.168.4.4
Configured Router-ID           : 192.168.4.4
Confed-ID                      : 0
Cluster-ID                     : 0.0.0.0
No. of configured peers        : 2
No. of pending config peers    : 0
No. of established peers       : 2
VRF RD                         : Not configured

    Information for address family IPv4 Unicast in VRF default
    Table Id                   : 1
    Table state                : UP
    Peers      Active-peers    Routes     Paths      Networks   Aggregates
    2          2               4          4          1          0       

    Redistribution
        direct, route-map conn

    Wait for IGP convergence is not configured

    Nexthop trigger-delay
        critical 3000 ms
        non-critical 10000 ms

    Information for address family IPv6 Unicast in VRF default
    Table Id                   : 80000001
    Table state                : UP
    Peers      Active-peers    Routes     Paths      Networks   Aggregates
    0          0               0          0          0          0       

    Redistribution              
        None

    Wait for IGP convergence is not configured

    Nexthop trigger-delay
        critical 3000 ms
        non-critical 10000 ms

Troubleshooting BGP Peering Issues

BGP peering issues fall primarily into two categories:

  • BGP peering down

  • Flapping BGP peer

BGP peering issues are among the most common problems that network operators experience in production environments. The impact of a down or flapping BGP peer ranges from minimal (if there is redundancy in the network) to severe (where the peering to the Internet provider is completely down). This section focuses on troubleshooting both issues.

Troubleshooting BGP Peering Down Issues

When a configured BGP session is not in an established state, network engineers refer to this scenario as BGP peering down. It is one of the most common issues seen in BGP environments. A peering down issue typically surfaces in one of the following situations:

  • During establishment of BGP sessions because of misconfiguration

  • Triggered by network migration or event, software or hardware upgrades

  • Failure to maintain BGP keepalives due to transmission problems

A down BGP peer is in either the Idle or the Active state. From the peer state standpoint, these states point to the following possible problems:

  • Idle State

    • No connected route to peer

  • Active State

    • No route to peer address (IP connectivity not present)

    • Configuration error, such as update-source missing or wrongly configured

  • Idle/Active State

    • TCP establishes but BGP negotiation fails; for example, misconfigured AS

    • Routers did not agree on the peering parameters

The following subsections list the various steps involved in troubleshooting BGP peering down issues.

Verifying Configuration

The very first step in troubleshooting BGP peering issues is verifying the configuration and understanding the design. Many times, a basic configuration mistake causes a BGP peering not to establish. The following items should be checked when a new BGP session is configured:

  • Local AS number

  • Remote AS number

  • Network topology and other documentation

It is important to understand the traffic flow of BGP packets between peers. By default, the source IP address of BGP packets is the IP address of the outbound interface. When a BGP packet is received, the router correlates the source IP address of the packet to the BGP neighbor table. If the BGP packet source does not match an entry in the neighbor table, the packet cannot be associated to a neighbor and is discarded.

In most deployments, the iBGP peering is established over loopback interfaces, and if the update-source interface is not specified, the session does not come up. The explicit sourcing of BGP packets from an interface is verified by ensuring that the update-source interface-id command under the neighbor ip-address configuration section is correctly configured for the peer.

If there are multiple hops between the EBGP peers, a proper hop count is required. Ensure that ebgp-multihop [hop-count] is configured with the correct hop count. If the hop count is not specified, the default value is 255. Note that the default TTL value for IBGP sessions is 255, whereas the default for EBGP sessions is 1. If an EBGP peering is established between two directly connected devices but over loopback addresses, users can use the disable-connected-check command instead of the ebgp-multihop 2 command. This command disables the connection verification mechanism, which by default prevents the session from being established when the EBGP peer is not on a directly connected segment.

Another configuration that is important, although optional, for successful establishment of a BGP session is peer authentication. A misconfigured or mistyped authentication password causes the BGP session to fail.

Verifying Reachability and Packet Loss

After the configuration has been verified, the connectivity between the peering IP addresses needs to be verified. If the peering is being established between loopback interfaces, a loopback-to-loopback ping test should be performed. If a ping test is performed without specifying the source interface, the packet is sourced from the outgoing interface IP address, which does not match the peering IP address. Example 11-5 displays a loopback-to-loopback ping test between NX-1 and NX-4, which peer over their loopback addresses.

Example 11-5 Ping with Source Interface as Loopback

NX-4# ping 192.168.1.1 source 192.168.4.4
PING 192.168.1.1 (192.168.1.1) from 192.168.4.4: 56 data bytes
64 bytes from 192.168.1.1: icmp_seq=0 ttl=253 time=4.555 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=253 time=2.72 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=253 time=2.587 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=253 time=2.559 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=253 time=2.695 ms

--- 192.168.1.1 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 2.559/3.023/4.555 ms

Note

At times, users may experience packet loss when performing a ping test. If there is a pattern to the loss seen in the ping test, it is most likely due to a CoPP policy that is dropping those packets.

Using the preceding ping methods, reachability is verified for both the IBGP and EBGP peers. But if there is a problem with the reachability, use the following procedure to isolate the problem or direction of the problem.

Identify the direction of packet loss. On NX-OS, the show ip traffic command helps identify packet loss and its direction. Use this method when ping (ICMP) packets from source to destination are lost completely or intermittently. The command output contains the section ICMP Software Processed Traffic Statistics, which consists of two subsections: Transmission and Reception. Both subsections include statistics for echo request and echo reply packets. To perform this test, first ensure that the sent and received counters are stable (not incrementing) on both the source and the destination devices. Then initiate the ping test toward the destination, specifying the source interface or IP address. After the ping completes, check the show ip traffic output again on both sides and compare the counter increases to determine the direction of the packet loss. Example 11-6 demonstrates the method for isolating the direction of packet loss. In this example, the ping is initiated from NX-1 to the NX-4 loopback. The first output shows 10 echo requests received and 10 echo replies sent. After the ping test from NX-1 to the NX-4 loopback, both counters increase to 15.

Example 11-6 Ping Test and show ip traffic Command Output

NX-4
NX-4# show ip traffic | in Transmission:|Reception:|echo
Transmission:
  Redirect: 0, unreachable: 0, echo request: 33, echo reply: 10,
Reception:
  Redirect: 0, unreachable: 0, echo request: 10, echo reply: 29,
NX-1
NX-1# ping 192.168.4.4 source 192.168.1.1
PING 192.168.4.4 (192.168.4.4) from 192.168.1.1: 56 data bytes
64 bytes from 192.168.4.4: icmp_seq=0 ttl=253 time=3.901 ms
64 bytes from 192.168.4.4: icmp_seq=1 ttl=253 time=2.913 ms
64 bytes from 192.168.4.4: icmp_seq=2 ttl=253 time=2.561 ms
64 bytes from 192.168.4.4: icmp_seq=3 ttl=253 time=2.502 ms
64 bytes from 192.168.4.4: icmp_seq=4 ttl=253 time=2.571 ms

--- 192.168.4.4 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 2.502/2.889/3.901 ms
NX-4
NX-4# show ip traffic | in Transmission:|Reception:|echo
Transmission:
  Redirect: 0, unreachable: 0, echo request: 33, echo reply: 15,
Reception:
  Redirect: 0, unreachable: 0, echo request: 15, echo reply: 29,

Similarly, the outputs are verified on NX-1 as well for the echo reply received counters. In the previous example, the ping test is successful, and thus both the echo request received and echo reply sent counters incremented. When the ping test is failing, it is worth checking these counters closely and over multiple test iterations. If the ping to the destination device is failing but both counters still increment on the destination device, the problem could be with the return path, and the users may have to check the path for the return traffic.
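The counter comparison can be reduced to a small decision function. The sketch below assumes that the counters were otherwise stable during the test and that the values are read on the destination device before and after the ping; the dictionary keys and the function name are hypothetical, and the sample values come from Example 11-6.

```python
def diagnose(pings_sent: int, before: dict, after: dict) -> str:
    """Compare destination-side ICMP counters taken before and after a ping."""
    requests_seen = after["echo_request_rx"] - before["echo_request_rx"]
    replies_sent = after["echo_reply_tx"] - before["echo_reply_tx"]
    if requests_seen < pings_sent:
        return "forward path: some echo requests never reached the destination"
    if replies_sent < requests_seen:
        return "destination: not every echo request was answered"
    return "destination replied to everything: check the return path if ping fails"


# Counters read on NX-4 in Example 11-6, around a 5-packet ping from NX-1.
before = {"echo_request_rx": 10, "echo_reply_tx": 10}
after = {"echo_request_rx": 15, "echo_reply_tx": 15}
result = diagnose(5, before, after)
```

With the values from Example 11-6, all five requests arrive and all five replies are sent, so the function reports that the destination side is healthy.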

ACLs are also useful when troubleshooting packet loss or reachability issues. Configuring an ACL that matches the source and destination IP addresses helps to confirm whether the packet has actually reached the destination router. The one caution is that permit ip any any should be configured at the end of the ACL, or else other packets could be dropped, causing a service impact.

Verifying ACLs and Firewalls in the Path

In most of the deployments, the edge routers or Internet Gateway (IGW) routers are configured with ACLs to limit the traffic allowed in the network. If the BGP session is being established across those links where the ACL is configured, ensure that BGP packets (TCP port 179) are not getting dropped due to those ACLs.

Example 11-7 shows how the ACL configuration should look if BGP is passing through that link. The example shows both an IPv4 access list and an IPv6 access list, the latter for IPv6 BGP sessions. To apply an IPv4 ACL on an interface, use the ip access-group access-list-name {in|out} command, which is available on all platforms. To apply an IPv6 ACL on NX-OS, use the ipv6 traffic-filter access-list-name {in|out} interface command.

Example 11-7 ACL for Permitting BGP Traffic

NX-4(config)# ip access-list v4_BGP_ACL
NX-4(config-acl)# permit tcp any eq bgp any
NX-4(config-acl)# permit tcp any any eq bgp
! Output omitted for brevity
NX-4(config)# ipv6 access-list v6_BGP_ACL
NX-4(config-ipv6-acl)# permit tcp any eq bgp any
NX-4(config-ipv6-acl)# permit tcp any any eq bgp
! Output omitted for brevity
NX-4(config)# interface Ethernet2/1
NX-4(config-if)# ip access-group v4_BGP_ACL in
NX-4(config-if)# ipv6 traffic-filter v6_BGP_ACL in

In addition to ACLs configured on the edge devices, many deployments have firewalls to protect the network from unwanted and malicious traffic. Installing a firewall is a better option than configuring a huge ACL on the routers and switches. Firewalls can be configured in two modes:

  • Routed mode

  • Transparent mode

In routed mode, the firewall has routing capabilities and is considered to be a routed hop in the network. In transparent mode, the firewall is not seen as a routed hop by the connected devices but merely acts like a bump in the wire. Thus, if an EBGP session is being established across a transparent firewall, ebgp-multihop might not be required, and even if ebgp-multihop must be configured because of other devices in the path, the firewall is not counted as another routed hop.

Firewalls implement various security levels for the interfaces. For example, the ASA Inside interface is assigned a security level of 100 and the Outside interface is assigned security level 0. An ACL needs to be configured to permit the relevant traffic from the less secure interface toward the more secure interface. This rule applies to both routed and transparent mode firewalls, and an ACL is required in both cases.

Bridge groups are configured in transparent mode firewall for each network to help minimize the overhead on security contexts. The interfaces are made part of a bridge group and a Bridge Virtual Interface (BVI) interface is configured with a management IP address.

Example 11-8 displays an ASA ACL configuration that allows ICMP as well as BGP packets to traverse across the firewall and shows how to assign the ACL to the interface. Any traffic that is not part of the ACL is dropped.

Example 11-8 Configuration on Transparent Firewall

interface GigabitEthernet0/0
  nameif Inside
  bridge-group 200
  security-level 100
!
interface GigabitEthernet0/1
  nameif Outside
  bridge-group 200
  security-level 0
!
! Creating BVI with Management IP and should be the same subnet
! as the connected interface subnet
interface BVI200
  ip address 10.1.13.10 255.255.255.0
!
access-list Out extended permit icmp any any
access-list Out extended permit tcp any eq bgp any
access-list Out extended permit tcp any any eq bgp
!
access-group Out in interface Outside

Strictly speaking, the access list named Out does not need both statements permitting BGP packets, but it is good practice to have both.

Another problem users might run into with a firewall in the middle involves two features on an ASA firewall:

  • Sequence number randomization

  • Enabling TCP Option 19 for MD5 authentication

ASA firewalls by default perform sequence number randomization and thus can cause BGP sessions to flap. Also, if the BGP peering is secured using MD5 authentication, enable TCP option 19 on the firewall’s policy.

Verifying TCP Sessions

A TCP session must be established before a BGP peering can come up. Thus, it is vital to ensure that TCP sessions are getting established and are not being blocked anywhere in the path between the two BGP peering devices. TCP connections on NX-OS are verified using the command show sockets connection tcp. Example 11-9 shows TCP in the Listening state for port 179 and also the TCP connections that are established on NX-4 for both IBGP and EBGP peerings.

Example 11-9 TCP Socket Connections

NX-4# show sockets connection tcp
! Output omitted for brevity
 Total number of tcp sockets: 6
 Active connections (including servers)
 Protocol State/       Recv-Q/   Local Address(port)/
          Context      Send-Q    Remote Address(port)
 tcp      LISTEN       0         *(179)
          Wildcard     0         *(*)
 tcp6     LISTEN       0         *(179)
          Wildcard     0         *(*)

[host]: tcp      ESTABLISHED  0         10.46.1.4(53879)
                 default      0         10.46.1.6(179)

[host]: tcp      ESTABLISHED  0         192.168.4.4(179)
                 default      0         192.168.1.1(21051)

If BGP peering is not getting established, it may be possible that there is a stale entry in the TCP table. The stale entry may show the TCP session to be in established state and thus prevent the router from initiating another TCP connection, thus preventing the router from establishing a BGP peering.

A good troubleshooting technique for down BGP peers is using Telnet on TCP port 179 toward the destination peer IP and using local peering IP as the source. This technique helps ensure that the TCP is not getting blocked or dropped between the two BGP peering devices. This test is useful for verifying any TCP issues on the destination router and also helps verify any ACL that could possibly block the BGP packets.

Example 11-10 shows the use of Telnet on port 179 from NX-1 (192.168.1.1) to NX-4 (192.168.4.4) to verify BGP session. When this test is performed, the BGP TCP session gets established but is closed/disconnected immediately.

Example 11-10 Using Telnet to Port 179

NX-4# show sockets connection tcp foreign 192.168.1.1 detail

Total number of tcp sockets: 4
Active connections (including servers)
NX-1# telnet 192.168.4.4 179 source 192.168.1.1
Trying 192.168.4.4...
Connected to 192.168.4.4.
Escape character is '^]'.
Connection closed by foreign host.
NX-4# show sockets connection tcp foreign 192.168.1.1 detail

Total number of tcp sockets: 5
Active connections (including servers)
[host]: Local host: 192.168.4.4 (179), Foreign host: 192.168.1.1 (40944)
  Protocol: tcp, type: stream, ttl: 64, tos: 0xc0, Id: 18
  Options:  REUSEADR, pcb flags none, state:  | ISDISCONNECTED
! Output omitted for brevity

If the Telnet session is not sourced from the interface or IP address with which the remote device is configured to form a BGP neighborship, the Telnet request is refused. This is another way to confirm whether the peering device configuration matches the documentation.
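The same reachability check can be approximated from a management host with a short Python script. This is only a sketch under stated assumptions: the helper name probe_bgp_port is hypothetical, and the source address passed in must be an address that exists on the host running the script.

```python
import socket

def probe_bgp_port(peer, source=None, port=179, timeout=3.0):
    """Attempt a TCP connection to the peer, optionally from a specific source IP."""
    source_address = (source, 0) if source else None  # port 0 lets the OS pick
    try:
        with socket.create_connection((peer, port), timeout=timeout,
                                      source_address=source_address):
            return "connected"
    except ConnectionRefusedError:
        # The peer answered with a reset: nothing accepts this source on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: a device in the path may be silently dropping packets.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

A call such as probe_bgp_port("192.168.4.4", source="192.168.1.1") mirrors the Telnet test: "connected" indicates that TCP port 179 is reachable from that source, "refused" suggests the peer rejected the connection, and "timeout" points toward an ACL or firewall dropping the traffic.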

When troubleshooting TCP connection issues, it is also important to check the event-history logs for the netstack process. Netstack is an implementation of a Layer-2 to Layer-4 stack on NX-OS. It is one of the critical components involved in the control plane on NX-OS. If there is a problem with establishing a TCP session on a Nexus device, it could be a problem with the netstack process. The show sockets internal event-history events command helps show which TCP state transitions happened for the BGP peer IP.

Example 11-11 demonstrates the use of the show sockets internal event-history events command to see the TCP session getting closed for BGP peer IP 192.168.1.1, but it does not show any request coming in.

Example 11-11 show sockets internal event-history events Command

NX-4# show sockets internal event-history event
1) Event:E_DEBUG, length:67, at 192101 usecs after Fri Sep  1 05:21:38 2017
    [138] [4226]: Marking desc 22 in mts_open for client 25394, sotype 2
! Output omitted for brevity
4) Event:E_DEBUG, length:91, at 810192 usecs after Fri Sep  1 05:17:09 2017
    [138] [4137]: PCB: Removing pcb from hash list L: 192.168.4.4.179, F: 192.168.1.1.21051 C: 1
5) Event:E_DEBUG, length:62, at 810184 usecs after Fri Sep  1 05:17:09 2017
    [138] [4137]: PCB: Detach L 192.168.4.4.179 F 192.168.1.1.21051
6) Event:E_DEBUG, length:77, at 810164 usecs after Fri Sep  1 05:17:09 2017
    [138] [4137]: TCP: Closing connection L: 192.168.4.4.179, F: 192.168.1.1.21051

Note

For any problems encountered with TCP-related protocol such as BGP, capture show tech netstack [detail] and share the information with Cisco TAC.

OPEN Message Errors

If the information within the OPEN message is wrong, the BGP peering does not get established. Instead, the BGP speaker that receives information different from what is configured locally sends a BGP notification to the peer. A few reasons for a BGP OPEN message error are the following:

  • Unsupported version number

  • Wrong Peer AS

  • Bad or wrong BGP router ID

  • Unsupported optional parameters

Of the reasons listed, wrong peer AS and bad BGP identifier are the most common OPEN message errors and are usually caused by documentation or human error. The notification messages for these two errors are self-explanatory and clearly indicate both the received value and the expected value, as shown in Example 11-12. In this example, the router expects the peer to be in AS 65001 but receives AS 65002.

Example 11-12 BGP Wrong Peer AS Notification Message

04:51:33 NX-4 %BGP-3-BADPEERAS:  bgp-100 [9544]  VRF default, Peer 10.46.1.6 - bad remote-as, expecting 65001 received 65002.

During the initial BGP negotiation between the BGP speakers, certain capabilities are exchanged. If a BGP speaker receives a capability that it does not support, BGP detects an OPEN message error for unsupported capability (or unsupported optional parameter). For instance, if one BGP speaker advertises the enhanced route refresh capability but the speaker on the receiving end runs older software that lacks it, the receiver detects this as an OPEN message error. The following optional capabilities are negotiated between the BGP speakers:

  • Route Refresh capability

  • 4-byte AS capability

  • Multiprotocol capability

  • Single/Multisession capability

To overcome the challenges of unsupported capability, use the command dont-capability-negotiate under the BGP neighbor configuration mode. This command disables the capability negotiations between the BGP peers and allows the BGP peer to come up.

BGP Debugs

Running debugs should always be the last resort for troubleshooting any network problem because debugs can sometimes cause an impact in the network if not used carefully. But sometimes they are the only option when other troubleshooting techniques don’t help explain the problem. Using the NX-OS debug logfile, users can mitigate the impact of chatty debug outputs. Along with the debug logfile, network operators can use debug-filter to restrict the debug output to a specific neighbor, prefix, or even address family, greatly reducing the chance of an impact on the Nexus switch.

When a BGP peer is down and the other troubleshooting steps do not reveal the problem, enable debugs to see whether the router is generating and sending the necessary BGP packets and whether it is receiving the relevant packets. On NX-OS, however, debugs are often not required because the BGP traces carry sufficient information to debug the problem. Several debugs are available for BGP. Depending on the state in which BGP is stuck, certain debug commands are helpful.

For a BGP peering down situation, one of the key debugs used is for BGP keepalives. The BGP keepalive debug is enabled using the command debug bgp keepalives. In the debug output, the two important factors to consider for ensuring a successful BGP peering are as follows:

  • If the BGP keepalive is being generated at regular intervals

  • If the BGP keepalive is being received at regular intervals

If the BGP keepalive is being generated at regular intervals but the BGP peering still remains down, it may be that the keepalive did not make it to the other end, or that it reached the peering router but was dropped or not processed. In such cases, enable the debug command debug bgp keepalives to verify whether the BGP keepalives are being sent and received. Example 11-13 illustrates the use of the BGP keepalive debug. The first output helps the user verify that the BGP keepalive is being generated every 60 seconds. The second output shows the keepalive being received from the remote peer 192.168.1.1.

Example 11-13 BGP Keepalive Debugs

NX-4# debug logfile bgp
NX-4# debug bgp keepalives

NX-4# show debug logfile bgp | grep "192.168.1.1 sending"
05:37:13.870261 bgp: 100 [9544] (default) ADJ: 192.168.1.1 sending KEEPALIVE
05:38:13.890290 bgp: 100 [9544] (default) ADJ: 192.168.1.1 sending KEEPALIVE
05:39:13.900376 bgp: 100 [9544] (default) ADJ: 192.168.1.1 sending KEEPALIVE
05:40:13.920290 bgp: 100 [9544] (default) ADJ: 192.168.1.1 sending KEEPALIVE
05:41:13.940395 bgp: 100 [9544] (default) ADJ: 192.168.1.1 sending KEEPALIVE
05:42:13.960350 bgp: 100 [9544] (default) ADJ: 192.168.1.1 sending KEEPALIVE
05:43:13.980363 bgp: 100 [9544] (default) ADJ: 192.168.1.1 sending KEEPALIVE

NX-4# show debug logfile bgp | grep 192.168.1.1
05:37:13.870160 bgp: 100 [9544] (default) ADJ: 192.168.1.1 keepalive timer fired
05:37:13.870236 bgp: 100 [9544] (default) ADJ: 192.168.1.1 keepalive timer fired for peer
05:37:13.870261 bgp: 100 [9544] (default) ADJ: 192.168.1.1 sending KEEPALIVE
05:37:13.870368 bgp: 100 [9544] (default) ADJ: 192.168.1.1 next keepalive expiry due in 00:00:59
05:37:13.946248 bgp: 100 [9544] (default) ADJ: Peer 192.168.1.1 has pending data on socket during recv, extending expiry timer
05:37:13.946387 bgp: 100 [9544] (default) ADJ: 192.168.1.1 KEEPALIVE rcvd

Demystifying BGP Notifications

BGP notifications play a crucial role in understanding and troubleshooting failed BGP peering or flapping peer issues. A BGP notification is sent from a BGP speaker to a peer when an error is detected. The notification can be sent either before the BGP session has been established or after it is established, based on the type of error. Each message has a fixed-size header. There may or may not be a data portion following the header, depending on the message type. The layout of these fields is shown in Figure 11-3.

Figure 11-3 BGP Notification Header

In addition to the fixed-size BGP message header, a notification contains the following, as shown in Figure 11-4.

Figure 11-4 Notification Section Information in BGP Header

The Error code and Error-Subcode values are defined in RFC 4271. Table 11-3 lists the error codes and subcodes along with their interpretation.

Table 11-3 BGP Notification Error and Error-Subcode

Error Code   Subcode   Description
01           00        Message Header Error
01           01        Message Header Error: Connection Not Synchronized
01           02        Message Header Error: Bad Message Length
01           03        Message Header Error: Bad Message Type
02           00        OPEN Message Error
02           01        OPEN Message Error: Unsupported Version Number
02           02        OPEN Message Error: Bad Peer AS
02           03        OPEN Message Error: Bad BGP Identifier
02           04        OPEN Message Error: Unsupported Optional Parameter
02           05        OPEN Message Error: Deprecated
02           06        OPEN Message Error: Unacceptable Hold Time
03           00        Update Message Error
03           01        Update Message Error: Malformed Attribute List
03           02        Update Message Error: Unrecognized Well-Known Attribute
03           03        Update Message Error: Missing Well-Known Attribute
03           04        Update Message Error: Attribute Flags Error
03           05        Update Message Error: Attribute Length Error
03           06        Update Message Error: Invalid Origin Attribute
03           07        (Deprecated)
03           08        Update Message Error: Invalid NEXT_HOP Attribute
03           09        Update Message Error: Optional Attribute Error
03           0A        Update Message Error: Invalid Network Field
03           0B        Update Message Error: Malformed AS_PATH
04           00        Hold Timer Expired
05           00        Finite State Machine Error
06           00        Cease
06           01        Cease: Maximum Number of Prefixes Reached
06           02        Cease: Administrative Shutdown
06           03        Cease: Peer Deconfigured
06           04        Cease: Administrative Reset
06           05        Cease: Connection Rejected
06           06        Cease: Other Configuration Change
06           07        Cease: Connection Collision Resolution
06           08        Cease: Out of Resources

Whenever a notification is generated, the error code and the subcode are always printed in the message. These notification messages are really helpful when troubleshooting down peering issues or flapping peer issues.
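When many log messages need to be interpreted, the code and subcode pairs from Table 11-3 can be turned into a small lookup. The Python sketch below covers the table entries with a named meaning; the function name describe_notification is a hypothetical helper, not an NX-OS or library API.

```python
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "Update Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

SUBCODES = {
    (1, 1): "Connection Not Synchronized",
    (1, 2): "Bad Message Length",
    (1, 3): "Bad Message Type",
    (2, 1): "Unsupported Version Number",
    (2, 2): "Bad Peer AS",
    (2, 3): "Bad BGP Identifier",
    (2, 4): "Unsupported Optional Parameter",
    (2, 6): "Unacceptable Hold Time",
    (3, 1): "Malformed Attribute List",
    (3, 2): "Unrecognized Well-Known Attribute",
    (3, 3): "Missing Well-Known Attribute",
    (3, 4): "Attribute Flags Error",
    (3, 5): "Attribute Length Error",
    (3, 6): "Invalid Origin Attribute",
    (3, 8): "Invalid NEXT_HOP Attribute",
    (3, 9): "Optional Attribute Error",
    (3, 10): "Invalid Network Field",
    (3, 11): "Malformed AS_PATH",
    (6, 1): "Maximum Number of Prefixes Reached",
    (6, 2): "Administrative Shutdown",
    (6, 3): "Peer Deconfigured",
    (6, 4): "Administrative Reset",
    (6, 5): "Connection Rejected",
    (6, 6): "Other Configuration Change",
    (6, 7): "Connection Collision Resolution",
    (6, 8): "Out of Resources",
}

def describe_notification(code, subcode):
    """Translate a notification code/subcode pair into the Table 11-3 description."""
    name = ERROR_CODES.get(code, f"Unknown error code {code}")
    detail = SUBCODES.get((code, subcode))
    return f"{name}: {detail}" if detail else name
```

For example, describe_notification(2, 2) returns "OPEN Message Error: Bad Peer AS", and the 4/0 pair seen with flapping peers returns "Hold Timer Expired".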

Troubleshooting IPv6 Peers

With the depletion of IPv4 addresses, IPv6 adoption has picked up pace. Most service providers have already upgraded, or are planning to upgrade, their infrastructure to dual stack to support both IPv4 and IPv6 traffic and to offer IPv6-ready services to enterprise customers. New applications are also being developed with IPv6 compatibility or to run entirely on IPv6. With this pace of adoption, appropriate techniques for troubleshooting IPv6 BGP neighbors are needed.

The methodology for troubleshooting IPv6 BGP peers is the same as that for IPv4 BGP peers. Here are a few steps you can use to troubleshoot down peering issues for IPv6 BGP neighbors:

Step 1. Verify the configuration for correct peering IPv6 addresses, AS numbers, update-source interface, authentication passwords, and EBGP multihop configuration.

Step 2. Verify reachability using the ping ipv6 ipv6-neighbor-address [source interface-id | ipv6-address] command.

Step 3. Verify the TCP connections using the command show sockets connection tcp on NX-OS. For IPv6, look for TCP connections between the source and destination IPv6 addresses with port 179 as one of the ports.

Step 4. Verify any IPv6 ACLs in the path. As with IPv4, the IPv6 ACLs in the path should permit TCP connections on port 179 as well as the ICMPv6 packets that help in verifying reachability.

Step 5. Debugs. On NX-OS switches, use the debug bgp ipv6 unicast neighbors ipv6-neighbor-address debug command to capture IPv6 BGP packets. Before enabling the debugs, enable the debug logfile for BGP debug. To limit the debugs to a particular IPv6 neighbor, use an IPv6 ACL to filter the debug output for that neighbor.

BGP Peer Flapping Issues

When the BGP session is down, the state never goes to Established state. The session keeps flapping between Idle and Active. But when the BGP peer is flapping, it means it is changing state after the session is established. In this case the BGP state keeps flapping between Idle and Established states. Following are the two flapping states in BGP:

  • Idle/Active: Discussed in previous section

  • Idle/Established: Bad update, TCP problem (MSS size in multihop deployment)

Flapping BGP peers could be due to one of several reasons:

  • Bad BGP update

  • Hold Timer expired

  • MTU mismatch

  • High CPU

  • Improper control-plane policing

Bad BGP Update

A bad BGP update refers to a corrupted update packet received from a peer. This is not a normal condition and is usually caused by one of the following:

  • Bad link carrying the update or bad hardware

  • Problem with BGP update packaging

  • Malicious update generated or the UPDATE packet modified by an attacker (hacker)

Whenever a BGP update is corrupted, a BGP notification is generated with the error code of 3, as shown in Table 11-3. When an error is noticed in the BGP update, BGP generates a hex-dump of the bad update message, which can be further decoded to understand which section of the update was corrupted. Along with the hex-dump, BGP also generates a log message that explains what kind of update error has occurred, as shown in Example 11-14.

Example 11-14 Corrupt BGP Update Message

22:10:13.366354 bgp: 65000 [14982] Hexdump at 0xd5893430, 19 bytes:
22:10:13.366362 bgp: 65000 [14982]      FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
22:10:13.366368 bgp: 65000 [14982]      001302
22:10:13.366379 bgp: 65000 [14982] (default) UPD: Badly formatted UPDATE message from peer 10.46.1.4, illegal length for withdrawn routes 65001 [afi/safi: 1/1]
22:10:13.366393 bgp: 65000 [14982] (default) UPD: Sending NOTIFY bad msg length error of length 2 to peer 10.46.1.4
22:10:13.366403 bgp: 65000 [14982] Hexdump at 0xd7eaa5fc, 23 bytes:
22:10:13.366413 bgp: 65000 [14982]      FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
22:10:13.366426 bgp: 65000 [14982]      00170301 02FFFF
22:10:13 %BGP-5-ADJCHANGE:  bgp-65000 [14982] (default) neighbor 10.46.1.4 Down - bad msg length error

Use the command debug bgp packets to view the BGP messages in hexdump, which can be further decoded. If too many BGP updates and messages are being exchanged on the NX-OS devices, a better option is to use Ethanalyzer or SPAN to capture the malformed BGP update packet for further analysis.
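As an illustration of how such a hexdump can be decoded, the Python sketch below reads the fixed 19-byte BGP header (16-byte marker, 2-byte length, 1-byte type) and, for a notification, the error code and subcode that follow. The function name decode_bgp_header is hypothetical, and the sketch decodes only the header and notification fields, not the body of an update.

```python
import struct

MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION",
                 4: "KEEPALIVE", 5: "ROUTE-REFRESH"}

def decode_bgp_header(hexdump):
    """Decode the BGP message header from a whitespace-separated hex string."""
    raw = bytes.fromhex("".join(hexdump.split()))
    if len(raw) < 19:
        raise ValueError("a BGP message header is 19 bytes")
    marker = raw[:16]
    length, msg_type = struct.unpack("!HB", raw[16:19])  # network byte order
    info = {
        "marker_ok": marker == b"\xff" * 16,
        "length": length,
        "type": MESSAGE_TYPES.get(msg_type, f"unknown ({msg_type})"),
    }
    if msg_type == 3 and len(raw) >= 21:
        # A notification carries an error code, a subcode, and optional data.
        info["error_code"] = raw[19]
        info["error_subcode"] = raw[20]
        info["data"] = raw[21:].hex()
    return info
```

Applied to the two hexdumps in Example 11-14, the first decodes as an UPDATE with a length of 19 bytes, which is shorter than the 23-byte minimum that RFC 4271 specifies for an UPDATE message. The second decodes as a NOTIFICATION with error code 1 and subcode 2, which Table 11-3 lists as Message Header Error, Bad Message Length.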

Note

The hexdump in the BGP message can be further analyzed using some online tools, such as http://bgpaste.convergence.cx.

Hold Timer Expired

Hold Timer expiry is a very common cause for flapping BGP peers. It simply means that the router didn’t receive or process a Keepalive message or an Update message within the hold time. Thus, it sends a notification message 4/0 (Hold Timer Expired) and closes the session. BGP flaps due to Hold Timer expiry are caused by one of the following reasons:

  • Interface/platform drops

  • MTS queue stuck

  • Control-plane policy drops

  • BGP Keepalive generation

  • MTU issues

One reason may be Interface/platform drops. Various Interface issues like a physical layer issue or drops on the interface can lead to the BGP session getting flapped due to Hold Timer expiry. If the interface is carrying excessive traffic or even the line card itself is overloaded or busy, the packets may get dropped on the interface level or on the line card ASIC. If the BGP keepalive or update packets are dropped in such instances, BGP may notify the peer of Hold Timer expiry.

Another possibility is that the MTS queue is stuck. Sometimes, BGP Keepalives have arrived at the TCP receiving queue but are not being processed and moved to the BGP InQ. This is noticed when the BGP InQ queues are empty and a BGP neighbor goes down due to Hold Timer expiry. The most common reason for such a scenario on Nexus switches is because the MTS queue is stuck on either the BGP or TCP process. MTS is the main component that takes care of carrying information from one component to another component within NX-OS. In such scenarios, it may be possible that multiple BGP peers may get impacted on the system. To recover, a supervisor switchover or a reload may be required.

In addition, CoPP policy drops can also be a cause. The CoPP policy is designed to protect the CPU from excessive and unwanted traffic, but a poorly designed CoPP policy causes control-plane protocol flaps. If the CoPP policy has not been sized to accommodate all the BGP control-plane packets and the number of BGP peers on the router, there might be instances where those packets get dropped. In such situations, users might experience random BGP flaps due to the CoPP policy dropping certain packets.

Note

MTS, CoPP, and other platform troubleshooting is covered in detail in Chapter 3, “Troubleshooting Nexus Platform Issues.”

BGP Keepalive Generation

In networks, there are instances when a BGP peering might flap randomly. Apart from the scenarios such as packet loss or control-plane policy drops, there might be other reasons that the BGP peering flaps and the reason is still seen as hold timer expiry. One such reason may be due to the BGP keepalives not being generated in a timely manner. For troubleshooting such instances, the first step is to understand if there is any pattern to the BGP flaps. This information is gathered by getting answers to the following questions:

  • At what time of the day is the BGP flap happening?

  • How frequently is the flap happening?

  • How is the traffic load on the interface/system when the flap occurs?

  • Is the CPU high during the time of the flap? If yes, is it due to traffic or a particular process?

These questions help establish a pattern for the BGP flaps, and relevant troubleshooting can be performed around the time the flaps occur. To troubleshoot further, understand that the BGP flap is due to one of two reasons:

  • Keepalives are getting generated at regular intervals but are not leaving the router or not making it to the other end.

  • Keepalives are not getting generated at regular intervals.

If the keepalives are getting generated at regular intervals but are not leaving the router, the OutQ for the BGP peer keeps piling up. The OutQ keeps incrementing due to keepalive generation, but the MsgSent counter does not increase, which may be an indication that the messages are stuck in the OutQ. Example 11-15 illustrates such a scenario where the BGP keepalives are generated at regular intervals but do not leave the router, leading to a BGP flap due to hold timer expiry. Notice that in this example, the OutQ value increases from 10 to 12, but the MsgSent counter is stagnant at 3938. In this scenario, the peering may flap each time the BGP hold timer expires.

Example 11-15 BGP Message Sent and OutQ

NX-4# show bgp ipv4 unicast summary
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 192.168.4.4, local AS number 65000
BGP table version is 19, IPv4 Unicast config peers 2, capable peers 2
4 network entries and 4 paths using 576 bytes of memory
BGP attribute entries [4/576], BGP AS path entries [1/6]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.46.1.6       4 65001    3933    3938       19    0   10 14:30:46 1
192.168.1.1     4 65000     997    1009       19    0    0 15:02:52 1
NX-4# show bgp ipv4 unicast summary
! Output omitted for brevity

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.46.1.6       4 65001    3933    3938       19    0   12 14:30:46 1
192.168.1.1     4 65000     997    1009       19    0    0 15:02:52 1
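The comparison made in Example 11-15 can be automated when many peers are involved. The Python sketch below parses the neighbor rows from two saved copies of the show bgp ipv4 unicast summary output and reports peers whose OutQ grew while MsgSent stayed the same. The function names and the assumption of the ten-column row layout shown above are illustrative.

```python
def parse_neighbors(summary):
    """Map each neighbor to its MsgSent and OutQ values from the summary table."""
    rows = {}
    in_table = False
    for line in summary.splitlines():
        fields = line.split()
        if fields[:1] == ["Neighbor"]:
            in_table = True  # rows after this header are neighbor entries
            continue
        if in_table and len(fields) >= 10:
            # Columns: Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
            rows[fields[0]] = {"msg_sent": int(fields[4]), "out_q": int(fields[7])}
    return rows

def stuck_peers(before, after):
    """Return neighbors whose OutQ increased while MsgSent did not move."""
    first, second = parse_neighbors(before), parse_neighbors(after)
    return [peer for peer, now in second.items()
            if peer in first
            and now["out_q"] > first[peer]["out_q"]
            and now["msg_sent"] == first[peer]["msg_sent"]]
```

Given the two outputs from Example 11-15, stuck_peers returns only 10.46.1.6, the neighbor whose OutQ rose from 10 to 12 with MsgSent unchanged at 3938.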

But if the device experiences random BGP flaps at irregular intervals, it is possible that the BGP keepalives are not getting generated at regular intervals, even though the flaps may still happen frequently. For instance, a BGP peering may flap every 4 to 10 minutes. These issues are hard to troubleshoot and may require a different technique than just running show commands. The reason is that it is not easy to isolate which device is failing to generate the keepalive in a timely manner, or whether the keepalive is generated on time but is delayed on its way to the remote peer. To troubleshoot, follow this two-step process on both ends of the BGP connection.

Step 1. Enable BGP keepalive debug on both routers along with the debug logfile.

Step 2. Enable Ethanalyzer on both routers.

The purpose of enabling Ethanalyzer or any other packet capture tool (based on the underlying platform) is that it is possible that the BGP keepalives reach the other end in a timely manner, but those keepalives may be delayed before reaching BGP process itself. Based on the outputs of the BGP keepalive debug and the Ethanalyzer from the far end device, the timelines could be matched to conclude where exactly the delay might be happening that is causing the BGP to flap. It may be the BGP process that is delaying the keepalive generation, or it may be the other components that interact with BGP to delay the keepalive processing.

MTU Mismatch Issues

Generally, maximum transmission unit (MTU) is not a big concern when bringing up a BGP neighborship, but MTU mismatch issues can cause BGP sessions to flap. MTU settings vary in different devices in the network because of various factors, such as

  • Improper planning and network design

  • Device not supporting Jumbo MTU or certain MTU values

  • Change due to application requirement

  • Change due to end customer requirement

BGP sends updates based on the Maximum Segment Size (MSS) value calculated by TCP. If Path-MTU-Discovery (PMTUD) is not enabled, the BGP MSS value defaults to 536 bytes as defined in RFC 879. The problem is that if a large number of updates are exchanged between the two routers at an MSS of 536 bytes, convergence is slow and the network is used inefficiently. An interface with an MTU of 1500 can send nearly three times that MSS value in a single packet, and far more if the interface supports jumbo MTU, but it has to break the updates into chunks of 536 bytes.

Defined in RFC 1191, PMTUD was introduced to reduce the chances of IP packets getting fragmented along the path, which in turn helps with faster convergence. Using PMTUD, the source identifies the lowest MTU along the path to the destination and thus decides what packet size should be sent.

How does PMTUD work? When the source generates a packet, it sets the MTU size equal to the outgoing interface with a DF (Do-Not-Fragment) bit set. For any intermediate device that receives the packet and has an MTU value of its egress interface lower than the packet it received, the device drops the packet and sends an ICMP error message with Type 3 (Destination Unreachable) and Code 4 (Fragmentation needed and DF bit set) along with the MTU information of the outgoing interface in the Next-Hop MTU field back toward the source. When the source receives the ICMP unreachable error message, it modifies the MTU size of the outgoing packet to the value specified in the Next-Hop MTU field above. This process continues until the packet successfully reaches the final destination.
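The discovery loop described above can be modeled in a few lines of Python. This is a simplified simulation, not a packet-level implementation: it assumes every hop returns the ICMP Type 3 Code 4 message, and the function name discover_path_mtu and the list of egress MTUs are hypothetical inputs.

```python
def discover_path_mtu(source_mtu, hop_mtus):
    """Simulate PMTUD: shrink the packet each time a hop reports a smaller MTU."""
    size = source_mtu
    attempts = [size]  # packet sizes tried, in order
    while True:
        # The first hop whose egress MTU is smaller than the packet drops it
        # and reports its own MTU in the Next-Hop MTU field.
        reported = next((mtu for mtu in hop_mtus if mtu < size), None)
        if reported is None:
            return size, attempts  # packet fits every hop and is delivered
        size = reported
        attempts.append(size)
```

For a source MTU of 9100 across hops with egress MTUs of 9100, 1500, and 1400, the source tries 9100, then 1500, then 1400, and settles on a path MTU of 1400. If the ICMP messages are filtered, the loop in a real network never advances past the first attempt, which is the failure mode discussed later in this section.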

BGP also supports PMTUD. PMTUD allows a BGP router to discover the best MTU size along the path to a neighbor so that packets are exchanged efficiently. With Path MTU discovery enabled, the initial TCP negotiation between two neighbors has an MSS value equal to (IP MTU − 20 byte IP Header − 20 byte TCP Header) and the DF bit set. Thus, if the IP MTU value is 1500 (equal to the interface MTU), the MSS value is 1460. If a device in the path, or the destination router itself, has a lower MTU (for example, 1400), the MSS value is negotiated based on 1400 − 40 bytes = 1360 bytes. To derive the MSS, use the following formulas:

  • MSS without MPLS = MTU − IP Header (20 bytes) − TCP Header (20 bytes)

  • MSS over MPLS = MTU − IP Header − TCP Header − n*4 bytes (where n is the number of labels in the label stack)

  • MSS across GRE Tunnel = MTU − IP Header (Inner) − TCP Header − [IP Header (Outer) + GRE Header (4 bytes)]
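The three formulas can be checked with a short Python sketch. The header sizes are the ones given in the formulas (20-byte IP header without options, 20-byte TCP header without options, 4 bytes per MPLS label, 4-byte GRE header); the function names are illustrative.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options
MPLS_LABEL = 4    # bytes per label in the stack
GRE_HEADER = 4    # bytes, basic GRE header

def mss_plain(mtu):
    """MSS without MPLS."""
    return mtu - IP_HEADER - TCP_HEADER

def mss_mpls(mtu, labels):
    """MSS over MPLS with the given number of labels."""
    return mtu - IP_HEADER - TCP_HEADER - labels * MPLS_LABEL

def mss_gre(mtu):
    """MSS across a GRE tunnel (inner IP + TCP + outer IP + GRE)."""
    return mtu - IP_HEADER - TCP_HEADER - (IP_HEADER + GRE_HEADER)

print(mss_plain(1500))    # 1460
print(mss_plain(1400))    # 1360
print(mss_mpls(1500, 2))  # 1452
print(mss_gre(1500))      # 1436
```

The first two results match the 1460-byte and 1360-byte values worked out in the text; an interface MTU of 9100 gives 9060 by the same arithmetic.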

Note

MPLS VPN providers should increase the MPLS MTU to at least 1508 (assuming a minimum of 2 labels) or to 1516 (to accommodate up to 4 labels).

Why does an MTU mismatch cause BGP sessions to flap? When the BGP connection is established, the MSS value is negotiated over the TCP session. BGP updates are packaged in BGP update messages, each of which can hold prefixes and header information up to the MSS. These update messages are sent to the remote peer with the do-not-fragment (DF) bit set. If a device in the path, or the destination itself, cannot accept packets of that size, it drops them and sends an ICMP error message back to the BGP speaker. If that ICMP message never reaches the speaker, the speaker keeps retransmitting at the same size. Meanwhile, the destination router waits for either a BGP Keepalive or a BGP Update packet to reset its hold timer. After 180 seconds (the default hold time), the destination router sends a Notification back to the source with a Hold Timer Expired error message.

Note

When a BGP router sends an update to a BGP neighbor, it does not send a BGP Keepalive separately. Instead, it resets the Keepalive timer for that neighbor. During the BGP update process, the update message is treated as a keepalive by the BGP speakers.

Example 11-16 illustrates a BGP peer flapping problem when there is an MTU mismatch in the path. Consider the same set of devices NX-1, NX-2, NX-4, and NX-6 from the topology shown in Figure 11-2. In this topology, assume the devices have ICMP unreachable disabled on their interfaces. The NX-6 device is advertising 10,000 prefixes to NX-4, which are further advertised toward NX-1. The interface MTU on NX-1 and NX-4 is set to 9100, whereas the MTU on the NX-2 interface facing NX-1 is still set to the default of 1500. Because path MTU discovery (PMTUD) is enabled, the MSS is negotiated to a value of 9060. Because ICMP unreachable messages are disabled, the ICMP error that the lower MTU on the NX-2 interface should trigger is never received by NX-1.

Example 11-16 BGP Flaps due to MSS Issue

NX-4
NX-4# show bgp ipv4 unicast summary
! Output omitted for brevity

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.46.1.6       4 65001   10475   10482       26    0    0    1d17h 10000
192.168.1.1     4 65000    2643    2659       26    0    0 00:01:59     1
NX-1
NX-1# show bgp ipv4 unicast summary
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 192.168.1.1, local AS number 65000
BGP table version is 37, IPv4 Unicast config peers 1, capable peers 1
4 network entries and 4 paths using 576 bytes of memory
BGP attribute entries [4/576], BGP AS path entries [1/6]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.4.4     4 65000    2579    2566       37    0    0 00:02:49     0

NX-1# show sockets connection tcp foreign 192.168.4.4 detail
Total number of tcp sockets: 4
[host]: Local host: 192.168.1.1 (22543), Foreign host: 192.168.4.4 (179)
  Protocol: tcp, type: stream, ttl: 64, tos: 0xc0, Id: 19
  Options: none, pcb flags  unknown, state:  | NBIO
  MTS: sap 10486
  Receive buffer:
    cc: 0, hiwat: 17184, lowat: 1, flags: none
  Send buffer:
    cc: 0, hiwat: 17184, lowat: 2048, flags: none
  Sequence number state:
    iss: 987705410, snduna: 987705603, sndnxt: 987705603, sndwnd: 17184
    irs: 82840884, rcvnxt: 82841199, rcvwnd: 17184, sndcwnd: 4296
  Timing parameters:
    srtt: 3200 ms, rtt: 0 ms, rttv: 0 ms, krtt: 1000 ms
    rttmin: 1000 ms, mss: 9060, duration: 43800 ms
  State: ESTABLISHED
  Flags:  | SENDCCNEW
No MD5 peers  Context: default
NX-1
! Logs showing BGP flap after hold timer expiry
00:56:27.873 NX-1 %BGP-5-ADJCHANGE:  bgp-65000 [6884] (default) neighbor 192.168.4.4
 Down - holdtimer expired error
00:57:26.627 NX-1 %BGP-5-ADJCHANGE:  bgp-65000 [6884] (default) neighbor 192.168.4.4 Up

The BGP flap does not occur when only a small number of prefixes is exchanged between the peers, because the BGP packet size stays under 1460 bytes. One symptom of BGP flaps due to MSS/MTU issues is a repetitive BGP flap that occurs because the Hold Timer expires.
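The following sketch shows why small messages survive while large updates are lost in the scenario of Example 11-16. It is a simplified model: the function name segment_delivered is an assumption for this example, TCP is assumed to send at most MSS bytes of payload per segment, and the 19-byte keepalive and 4096-byte maximum message sizes come from the BGP-4 specification rather than from this chapter.

```python
IP_TCP_OVERHEAD = 40  # 20-byte IP header + 20-byte TCP header


def segment_delivered(payload_bytes, negotiated_mss, lowest_path_mtu):
    """Return True if a DF-bit segment carrying the payload crosses the path.

    TCP sends at most negotiated_mss bytes of payload in one segment, so the
    packet on the wire is that payload plus the IP and TCP headers.
    """
    segment_payload = min(payload_bytes, negotiated_mss)
    return segment_payload + IP_TCP_OVERHEAD <= lowest_path_mtu


# MSS 9060 negotiated between 9100-byte interfaces, but one hop has MTU 1500.
print(segment_delivered(19, 9060, 1500))    # True: a keepalive fits
print(segment_delivered(4096, 9060, 1500))  # False: a full update is dropped
# With a correctly discovered MSS of 1460, the same update is segmented to fit.
print(segment_delivered(4096, 1460, 1500))  # True
```

The session therefore establishes normally and only fails once enough prefixes are queued to fill a segment larger than the smallest MTU in the path.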

The following are a few possible causes of BGP session flapping due to MTU mismatch:

  • The interface MTUs on the two peering routers do not match.

  • The Layer 2 path between the two peering routers does not have consistent MTU settings.

  • PMTUD did not calculate the correct MSS for the BGP TCP session.

  • BGP PMTUD could be failing because ICMP messages are blocked by a router or a firewall in the path.

To verify whether there are MTU mismatch issues in the path, perform an extended ping test with the packet size set to the outgoing interface MTU value and the DF bit set. Also, ensure that ICMP messages are not being blocked in the path so that PMTUD functions properly. Review the configuration to ensure that the MTU values are consistent throughout the network.

Perform a ping test to the remote peer with the packet size set to the MTU of the interface and the do-not-fragment (df-bit) option set, as shown in Example 11-17.

Example 11-17 PING with DF-Bit Set

NX-1# ping 192.168.4.4 source 192.168.1.1 packet-size 1500 df-bit
PING 192.168.4.4 (192.168.4.4) from 192.168.1.1: 1500 data bytes
Request 0 timed out
Request 1 timed out
Request 1 timed out
--- 192.168.4.4 ping statistics ---
3 packets transmitted, 0 packets received, 100.00% packet loss

NX-1# ping 192.168.4.4 source 192.168.1.1 packet-size 1472 df-bit
PING 192.168.4.4 (192.168.4.4) from 192.168.1.1: 1472 data bytes
1480 bytes from 192.168.4.4: icmp_seq=0 ttl=253 time=5.298 ms
1480 bytes from 192.168.4.4: icmp_seq=1 ttl=253 time=3.494 ms
1480 bytes from 192.168.4.4: icmp_seq=2 ttl=253 time=4.298 ms
1480 bytes from 192.168.4.4: icmp_seq=3 ttl=253 time=4.528 ms
1480 bytes from 192.168.4.4: icmp_seq=4 ttl=253 time=3.606 ms

--- 192.168.4.4 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 3.494/4.244/5.298 ms

Note

The Nexus platform adds 28 bytes (20 bytes IP header + 8 bytes ICMP header) to the packet size specified in the ping command. Thus, when the ping test is performed with the DF bit set, a ping with a packet size of 1500 fails. To successfully test the ping at the interface MTU with the df-bit set, subtract 28 bytes from the MTU value on the interface. In this case, 1500 − 28 = 1472.
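The arithmetic in this note can be captured in two helper functions. This is a small sketch; the function names are assumptions for this example and do not correspond to any NX-OS command.

```python
ICMP_ECHO_OVERHEAD = 20 + 8  # IP header + ICMP header added to the ping payload


def max_ping_payload(interface_mtu):
    """Largest packet-size value that fits in one unfragmented packet."""
    return interface_mtu - ICMP_ECHO_OVERHEAD


def ping_fits(packet_size, path_mtu):
    """True if a DF-bit ping with this packet-size crosses a path with path_mtu."""
    return packet_size + ICMP_ECHO_OVERHEAD <= path_mtu


print(max_ping_payload(1500))  # 1472
print(ping_fits(1500, 1500))   # False, matches the timed-out pings
print(ping_fits(1472, 1500))   # True, matches the successful pings
```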

BGP Route Processing and Route Propagation

After the BGP peering is established, the BGP peers exchange network prefixes and path attributes. Unlike an IGP, BGP allows the routing policy to be different for each peer within an AS. BGP route processing for the inbound and outbound exchange of network prefixes is summarized in Figure 11-5. When a BGP router receives routes from a peer, it first filters them through the inbound policy, if one is configured, and then installs them in the BGP table. If the BGP table contains multiple paths for the same prefix, a best path is selected and installed in the routing table. Similarly, when advertising a prefix, only the best route is advertised to the peer device. If there is an outbound policy, the prefixes are filtered before being advertised to the remote peer.

Image

Figure 11-5 BGP Route Processing
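The processing order in Figure 11-5 can be sketched as a short pipeline. This is a conceptual model only: the function run_pipeline, the dictionary-based route records, and the use of local preference alone as the best path criterion are simplifying assumptions for this example, not the NX-OS implementation.

```python
from collections import defaultdict


def run_pipeline(received_routes, inbound_policy=None, outbound_policy=None):
    """Inbound policy -> BGP table -> best path -> outbound policy.

    A policy is a callable that returns the (possibly modified) route,
    or None to filter the route out.
    """
    bgp_table = defaultdict(list)
    for route in received_routes:
        if inbound_policy is not None:
            route = inbound_policy(route)
            if route is None:
                continue  # filtered by the inbound policy
        bgp_table[route["prefix"]].append(route)

    # Placeholder best path selection: highest local preference wins.
    best = {
        prefix: max(paths, key=lambda p: p["local_pref"])
        for prefix, paths in bgp_table.items()
    }

    advertised = []
    for route in best.values():  # only best paths are candidates to advertise
        if outbound_policy is not None:
            route = outbound_policy(route)
            if route is None:
                continue  # filtered by the outbound policy
        advertised.append(route)
    return dict(bgp_table), best, advertised


received = [
    {"prefix": "192.168.5.5/32", "next_hop": "192.168.2.2", "local_pref": 100},
    {"prefix": "192.168.5.5/32", "next_hop": "192.168.3.3", "local_pref": 200},
    {"prefix": "10.0.0.0/8", "next_hop": "192.168.2.2", "local_pref": 100},
]


def deny_ten(route):
    return None if route["prefix"] == "10.0.0.0/8" else route


table, best, advertised = run_pipeline(received, inbound_policy=deny_ten)
print(best["192.168.5.5/32"]["next_hop"])  # 192.168.3.3
print(len(advertised))                     # 1
```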

Let’s now understand the various fundamentals of route advertisement in the sections that follow. For this section, examine the topology shown in Figure 11-6.

Image

Figure 11-6 BGP Route Propagation Topology

BGP Route Advertisement

BGP prefixes are injected into the BGP table for advertisement by explicit configuration. The four methods that are used to inject the BGP prefixes into the BGP table are the following:

  • Network statement: Using the network ip-address/length command.

  • Redistribution: Redistribute directly connected links, static routes, and IGP, such as Routing Information Protocol (RIP), Open Shortest Path First (OSPF), Enhanced Interior Gateway Routing Protocol (EIGRP), Intermediate System to Intermediate System (IS-IS), and Locator ID Separation Protocol (LISP). A route-map is required when a prefix is being redistributed from another routing protocol including directly connected links.

  • Aggregate Route: Summarizing a route, though the component route must exist in the BGP table.

  • Default Route: Using the default-information originate command.

Network Statement

A BGP prefix is advertised via BGP using a network statement. For the network statement to function properly, the route must be present in the routing table. If the route is not present in the routing table, the prefix is not marked as a valid best route in the BGP table and is not advertised to the BGP peers. Example 11-18 illustrates the use of network statements to advertise two prefixes. One of the prefixes has the loopback configured locally on the router, and the other prefix does not have a route present in the routing table. It is clear from the output of the command show bgp ipv4 unicast neighbors ip-address advertised-routes that the prefix 192.168.4.4/32 gets advertised to the BGP peer 192.168.1.1 but the prefix 192.168.44.44/32 does not. When looking at the BGP table for any address-family, it is important to verify the status flags, which indicate how the prefix is learned on the router. These status flags and their meanings are listed in the legend that precedes the prefixes in the BGP table. In Example 11-18, the prefix 192.168.4.4/32 is a local prefix and thus has the path type flag l along with the flags *>, which indicate the route is valid and selected as the best route.

Example 11-18 Prefix Advertisement Using network Command

NX-4
router bgp 65000
  router-id 192.168.4.4
  log-neighbor-changes
  address-family ipv4 unicast
    network 192.168.4.4/32
    network 192.168.44.44/32
  neighbor 192.168.1.1
    remote-as 65000
    update-source loopback0
    address-family ipv4 unicast
      next-hop-self
NX-4# show ip route 192.168.4.4/32
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.4.4/32, ubest/mbest: 2/0, attached
    *via 192.168.4.4, Lo0, [0/0], 1w1d, local
    *via 192.168.4.4, Lo0, [0/0], 1w1d, direct

NX-4# show ip route 192.168.44.44/32
Route not found
NX-4# show bgp ipv4 unicast
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 27, local router ID is 192.168.4.4
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist,
I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
*>i192.168.1.1/32     192.168.1.1                       100          0 i
*>l192.168.4.4/32     0.0.0.0                           100      32768 i
  l192.168.44.44/32   0.0.0.0                           100      32768 i

NX-4# show bgp ipv4 unicast neighbors 192.168.1.1 advertised-routes
! Output omitted for brevity

   Network            Next Hop            Metric     LocPrf     Weight Path
*>l192.168.4.4/32     0.0.0.0                           100      32768 i

Redistribution

Redistributing routes into BGP is a common method of populating the BGP table. Examine the same topology shown in Figure 11-6. On router NX-1, OSPF is being redistributed into BGP. The route-map used for redistribution permits the prefixes 192.168.4.4/32 and 192.168.44.44/32, although the routing table learns only 192.168.4.4/32 from NX-4. Example 11-19 demonstrates the redistribution process into BGP. Notice in the output that the prefix 192.168.4.4/32 has an r flag, which indicates a redistributed prefix. Also, the redistributed prefix has an origin code of incomplete, shown as a question mark (?) at the end of the Path column.

Example 11-19 BGP and IGP Redistribution

NX-1
router bgp 65000
  address-family ipv4 unicast
    redistribute ospf 100 route-map OSPF-BGP
!
ip prefix-list OSPF-BGP seq 5 permit 192.168.4.4/32
ip prefix-list OSPF-BGP seq 10 permit 192.168.44.44/32
!
route-map OSPF-BGP permit 10
  match ip address prefix-list OSPF-BGP
NX-1# show ip route ospf
192.168.4.4/32, ubest/mbest: 1/0
    *via 10.14.1.4, Eth2/1, [110/41], 00:30:27, ospf-100, intra

NX-1# show bgp ipv4 unicast
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 6, local router ID is 192.168.1.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist,
I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
*>i192.168.2.2/32     192.168.2.2                       100          0 i
*>r192.168.4.4/32     0.0.0.0                 41        100      32768 ?

Note

The redistribution process is the same for other routing protocols, static routes, and directly connected links, as shown in Example 11-19.

There are a few caveats when performing redistribution for OSPF and IS-IS as listed:

  • OSPF: When redistributing OSPF into BGP, the default behavior includes only routes that are internal to OSPF. The redistribution of external OSPF routes requires a conditional match on route-type under route-map.

  • IS-IS: IS-IS does not include directly connected subnets for any destination routing protocol. This behavior is overcome by redistributing the connected networks into BGP.

Example 11-20 displays the various match route-type options available under the route-map. The route-type options are available for both OSPF and IS-IS route types.

Example 11-20 match route-type Command Options

NX-1(config-route-map)# match route-type ?
  external       External route (BGP, EIGRP and OSPF type 1/2)
  inter-area     OSPF inter area route
  internal       Internal route (including OSPF intra/inter area)
  intra-area     OSPF intra area route
  level-1        IS-IS level-1 route
  level-2        IS-IS level-2 route
  local          Locally generated route
  nssa-external  Nssa-external route (OSPF type 1/2)
  type-1         OSPF external type 1 route
  type-2         OSPF external type 2 route

Route Aggregation

Not all devices in the network are powerful enough to hold all the routes learned via BGP or other routing protocols. Also, having multiple paths in the network leads to consumption of more CPU and memory resources. To overcome this challenge, route aggregation or summarization can be performed. Route aggregation in BGP is performed using the command aggregate-address aggregate-prefix/length [advertise-map | as-set | attribute-map | summary-only | suppress-map]. Table 11-4 describes the options available with the aggregate-address command.

Table 11-4 aggregate-address Command Options

Option | Description
------ | -----------
advertise-map map-name | Used to select attribute information from specific routes.
as-set | Generates an AS_SET path information and community information from the contributing paths.
attribute-map map-name | Used to set the attribute information for specific routes. Allows the attributes of the aggregate route to be changed.
summary-only | Filters all more specific routes from the updates and only advertises summary route.
suppress-map map-name | Conditionally filters more specific routes specified in the route-map.

Example 11-21 demonstrates the use of the summary-only option with the aggregate-address command. Notice that NX-2 has three component prefixes in its BGP table, but only the single aggregate prefix gets advertised to NX-5. When the summary-only option is configured on NX-2, the more specific routes are suppressed, as shown by the s status flag.

Example 11-21 Route Aggregation

NX-2
router bgp 65000
  address-family ipv4 unicast
    network 192.168.2.2/32
    aggregate-address 192.168.0.0/16 summary-only
NX-2# show bgp ipv4 unicast
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 19, local router ID is 192.168.2.2
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist,
I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
*>a192.168.0.0/16     0.0.0.0                           100      32768 i
s>i192.168.1.1/32     192.168.1.1                       100          0 i
s>l192.168.2.2/32     0.0.0.0                           100      32768 i
s>i192.168.4.4/32     192.168.4.4                       100          0 i
NX-5# show bgp ipv4 unicast

   Network            Next Hop            Metric     LocPrf     Weight Path
*>e192.168.0.0/16     10.25.1.2                                      0 65000 i
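The effect of summary-only in Example 11-21 can be modeled with the Python standard library ipaddress module. This is an illustrative sketch, not the NX-OS logic: the function name apply_summary_only is an assumption for this example, and the extra prefix 10.0.0.0/8 is added only to show a route outside the aggregate.

```python
import ipaddress


def apply_summary_only(aggregate, bgp_prefixes):
    """Return (advertised, suppressed) prefixes for a summary-only aggregate.

    The aggregate is generated only if at least one component route
    (a more specific prefix inside the aggregate) exists in the BGP table.
    """
    agg = ipaddress.ip_network(aggregate)
    nets = [ipaddress.ip_network(p) for p in bgp_prefixes]
    components = [n for n in nets if n != agg and n.subnet_of(agg)]
    if not components:
        return [str(n) for n in nets], []  # no component route, no aggregate
    others = [n for n in nets if n not in components and n != agg]
    advertised = [str(agg)] + [str(n) for n in others]
    return advertised, [str(n) for n in components]


table = ["192.168.1.1/32", "192.168.2.2/32", "192.168.4.4/32", "10.0.0.0/8"]
advertised, suppressed = apply_summary_only("192.168.0.0/16", table)
print(advertised)  # ['192.168.0.0/16', '10.0.0.0/8']
print(suppressed)  # ['192.168.1.1/32', '192.168.2.2/32', '192.168.4.4/32']
```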

Default-Information Originate

Not every external route can be redistributed and advertised within the network. In such instances, the gateway or edge device advertises a default route to other parts of the network using a routing protocol. To advertise a default route using BGP, use the command default-information originate under the neighbor configuration mode. It is important to note that the command advertises the default route only if the default route is present in the routing table. If there is no default route present, create a default route pointing to the null0 interface.

BGP Best Path Calculation

In BGP, route advertisements consist of the Network Layer Reachability Information (NLRI) and the path attributes (PA). The NLRI comprises the network prefix and prefix-length, and the BGP attributes, such as AS-Path, Origin, and so on, are stored in the path attributes.

BGP uses three tables for maintaining the network prefix and path attributes (PAs) for a route. The three BGP tables are briefly explained here:

  • Adj-RIB-in: Contains the NLRIs in original form before inbound route policies are processed. The table is purged after all route-policies are processed to save memory.

  • Loc-RIB: Contains all the NLRIs that originated locally or were received from other BGP peers. After NLRIs pass the validity and next-hop reachability check, the BGP best path algorithm selects the best NLRI for a specific prefix. The Loc-RIB table is the table used for presenting routes to the IP routing table.

  • Adj-RIB-out: Contains the NLRIs after outbound route-policies have been processed. A BGP route may contain multiple paths to the same destination network. Every path’s attributes impact the desirability of the route when a router selects the best path. A BGP router advertises only the best path to the neighboring routers.

Inside the BGP Loc-RIB table, all the routes and their path attributes are maintained with the best path calculated. The best path is then installed in the RIB of the router. In the event the best path is no longer available, the router can use the existing paths to quickly identify a new best path. BGP recalculates the best path for a prefix upon four possible events:

  • BGP next-hop reachability change

  • Failure of an interface connected to an EBGP peer

  • Redistribution change

  • Reception of new paths for a route

The BGP best path selection algorithm influences how traffic enters or leaves an autonomous system (AS). BGP does not use metrics to identify the best path in a network. BGP uses path attributes to identify its best path. But even before BGP influences the best path selection using PAs, the router looks for the longest prefix match for the routes present in the RIB and prefers that route to be installed in the forwarding information base (FIB).

BGP path attributes are modified upon receipt or advertisement to influence routing in the local AS or neighboring AS. A basic rule for traffic engineering with BGP is that modifications in outbound routing policies influence inbound traffic, and modifications to inbound routing policies influence outbound traffic.

BGP installs the first received path as the best path automatically. When additional paths are received, the newer paths are compared against the current best path. If there is a tie, processing continues onto the next step, until a best path winner is identified.

Table 11-5 lists the attributes that the BGP best path algorithm uses for the best route selection process. These attributes are processed in the order listed.

Table 11-5 BGP Attributes

BGP Attribute | Scope
------------- | -----
Weight | Router only. Highest value wins.
Local Preference | Within AS boundary. Highest value wins.
Locally Originated | Network or redistribute command preferred over local aggregates (aggregate-address command).
Accumulated Interior Gateway Protocol (AIGP) | AIGP Path Attribute.
AS_PATH | Shortest AS_PATH wins. Skipped if bgp bestpath as-path ignore is configured. AS_SET counts as 1. CONFED parts do not count.
Origin Type | IGP < EGP < Incomplete. Lowest wins.
Multi-Exit Discriminator (MED) | Compared only if the first AS in AS_SEQUENCE is the same for multiple paths. Lowest value wins.
EBGP over IBGP | External BGP path preferred over Internal BGP path.
Metric to Next Hop | Cost of IGP to reach BGP next-hop. Lowest metric wins.
Oldest External | When both paths are external, prefer the first (oldest).
BGP Router ID (RID) | Path with lowest BGP RID is preferred.
CLUSTER_LIST | Prefer the route with minimum CLUSTER_LIST length.
Neighbor Address | Prefer the path that is received from the lowest neighbor address (neighbor configured using neighbor ip-address command).
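The step-by-step comparison in Table 11-5 can be sketched as a pairwise function that stops at the first attribute that differs. This is a partial model under stated assumptions: the Path record and compare_paths function are invented for this example, an unset MED is modeled as 0, and the Locally Originated, AIGP, Oldest External, and CLUSTER_LIST steps are omitted for brevity. The reason strings are labels chosen for the example.

```python
import ipaddress
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    neighbor: str
    router_id: str
    as_path: tuple = ()
    weight: int = 0
    local_pref: int = 100
    origin: str = "igp"
    med: int = 0            # assumption: "MED not set" is treated as 0
    external: bool = False
    igp_metric: int = 0


def compare_paths(a, b):
    """Return (winner, reason). Values are arranged so the lower one wins."""
    steps = [
        ("Weight", -a.weight, -b.weight),                    # highest wins
        ("Local Preference", -a.local_pref, -b.local_pref),  # highest wins
        ("AS Path", len(a.as_path), len(b.as_path)),
        ("Origin", ORIGIN_RANK[a.origin], ORIGIN_RANK[b.origin]),
    ]
    if a.as_path[:1] == b.as_path[:1]:  # MED only if the first AS is the same
        steps.append(("MED", a.med, b.med))
    steps += [
        ("EBGP over IBGP", not a.external, not b.external),
        ("IGP Metric", a.igp_metric, b.igp_metric),
        ("Router Id", int(ipaddress.ip_address(a.router_id)),
         int(ipaddress.ip_address(b.router_id))),
        ("Neighbor Address", int(ipaddress.ip_address(a.neighbor)),
         int(ipaddress.ip_address(b.neighbor))),
    ]
    for reason, a_val, b_val in steps:
        if a_val != b_val:
            return (a if a_val < b_val else b), reason
    return a, "tie"


via_nx2 = Path("192.168.2.2", "192.168.2.2", as_path=(65001,), igp_metric=41)
via_nx3 = Path("192.168.3.3", "192.168.3.3", as_path=(65001,), igp_metric=41)
print(compare_paths(via_nx2, via_nx3)[1])  # Router Id
via_nx3.local_pref = 200
print(compare_paths(via_nx2, via_nx3)[1])  # Local Preference
```

The two calls use the attribute values shown for the paths via 192.168.2.2 and 192.168.3.3 in Example 11-22, before and after the local preference is raised on NX-3.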

The best path algorithm is used to manipulate network traffic patterns for a specific route by modifying various path attributes on BGP routers. Changing BGP PAs influences traffic flow into, out of, and around an autonomous system (AS). The BGP routing policy varies from organization to organization based upon the manipulation of the BGP PAs. Because some PAs are transitive and carry from one AS to another AS, those changes could impact downstream routing for other SPs, too. Other PAs are nontransitive and influence only the routing policy within the organization. Network prefixes are conditionally matched on a variety of factors, such as AS-Path length, specific ASN, BGP communities, or other attributes.

Examining the topology shown in Figure 11-6, NX-5 and NX-6 advertise their loopbacks toward AS 65000. NX-1 receives each loopback via both NX-2 and NX-3, but only one of the paths is chosen as the best. The command show bgp afi safi ip-address/length displays both received paths along with the reason the other path was not chosen as the best path, as shown in Example 11-22. In this example, the path for 192.168.5.5/32 via NX-2 is initially chosen because of the lowest RID, but when an inbound policy that sets a higher local preference is defined on NX-3, the path via NX-3 is chosen as the best.

Example 11-22 BGP Best Path Selection

NX-1# show bgp ipv4 unicast 192.168.5.5/32
BGP routing table information for VRF default, address family IPv4 Unicast
BGP routing table entry for 192.168.5.5/32, version 32
Paths: (2 available, best #1)
Flags: (0x08001a) on xmit-list, is in urib, is best urib route, is in HW,

  Advertised path-id 1
  Path type: internal, path is valid, is best path
  AS-Path: 65001 , path sourced external to AS
    192.168.2.2 (metric 41) from 192.168.2.2 (192.168.2.2)
      Origin IGP, MED not set, localpref 100, weight 0

  Path type: internal, path is valid, not best reason: Router Id
  AS-Path: 65001 , path sourced external to AS
    192.168.3.3 (metric 41) from 192.168.3.3 (192.168.3.3)
      Origin IGP, MED not set, localpref 100, weight 0

  Path-id 1 advertised to peers:
    192.168.3.3        192.168.4.4
NX-3(config)# route-map LP permit 10
NX-3(config-route-map)# set local-preference 200
NX-3(config-route-map)# exit
NX-3(config)# router bgp 65000
NX-3(config-router)# neighbor 10.36.1.6
NX-3(config-router-neighbor)# address-family ipv4 unicast
NX-3(config-router-neighbor-af)# route-map LP in
NX-3(config-router-neighbor-af)# end
NX-1# show bgp ipv4 unicast 192.168.5.5/32
BGP routing table information for VRF default, address family IPv4 Unicast
BGP routing table entry for 192.168.5.5/32, version 38
Paths: (2 available, best #2)
Flags: (0x08001a) on xmit-list, is in urib, is best urib route, is in HW,

  Path type: internal, path is invalid, not best reason: Local Preference, is deleted, no labeled nexthop
  AS-Path: 65001 , path sourced external to AS
    192.168.2.2 (metric 41) from 192.168.2.2 (192.168.2.2)
      Origin IGP, MED not set, localpref 100, weight 0

  Advertised path-id 1
  Path type: internal, path is valid, is best path
  AS-Path: 65001 , path sourced external to AS
    192.168.3.3 (metric 41) from 192.168.3.3 (192.168.3.3)
      Origin IGP, MED not set, localpref 200, weight 0

  Path-id 1 advertised to peers:
    192.168.2.2        192.168.4.4    
NX-1# show bgp ipv4 unicast 192.168.5.5/32
BGP routing table information for VRF default, address family IPv4 Unicast
BGP routing table entry for 192.168.5.5/32, version 38
Paths: (1 available, best #1)
Flags: (0x08001a) on xmit-list, is in urib, is best urib route, is in HW,

  Advertised path-id 1
  Path type: internal, path is valid, is best path
  AS-Path: 65001 , path sourced external to AS
    192.168.3.3 (metric 41) from 192.168.3.3 (192.168.3.3)
      Origin IGP, MED not set, localpref 200, weight 0

  Path-id 1 advertised to peers:
    192.168.2.2        192.168.4.4

Note

While a prefix is being removed from the BGP RIB (BRIB), the prefix is marked as deleted and the path is never used for forwarding. After the update is complete, the BRIB does not show the path/prefix that was removed.

BGP Multipath

BGP’s default behavior is to advertise only the best path to the RIB, which means that only one path for a network prefix is used when forwarding network traffic to a destination. BGP multipath allows for multiple paths to be presented to the RIB, so that both paths can forward traffic to a network prefix at the same time. BGP multipath is an enhanced form of BGP multihoming.

Note

It is vital to understand that the primary difference between BGP multihoming and BGP multipath is how load balancing works. BGP multipath attempts to distribute the load of the traffic dynamically. BGP multihoming is distributed somewhat by the nature of the BGP best path algorithm, but manipulation to the inbound/outbound routing policies is required to reach a more equally distributed load among the links.

BGP supports three types of equal cost multipath (ECMP): EBGP multipath, IBGP multipath, and eiBGP multipath. In all three types of BGP multipath, the following BGP path attributes (PAs) must match for a path to be eligible for multipath:

  • Weight

  • Local Preference

  • AS-Path length and content (confederations can contain a different AS_CONFED_SEQ path)

  • Origin

  • MED

  • Advertisement method must match (IBGP or EBGP); if the prefix is learned via an IBGP advertisement, the IGP cost must match to be considered equal
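The eligibility rules in the preceding list can be expressed as a simple check between the best path and a candidate path. This is a minimal sketch: the CandidatePath record and the multipath_eligible function are assumptions for this example, an unset MED is modeled as 0, and the confederation exception for AS_CONFED_SEQ is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePath:
    next_hop: str
    as_path: tuple
    weight: int = 0
    local_pref: int = 100
    origin: str = "igp"
    med: int = 0            # assumption: "MED not set" is treated as 0
    external: bool = False  # True for EBGP-learned, False for IBGP-learned
    igp_metric: int = 0


def multipath_eligible(best, other):
    """True if 'other' can be installed alongside 'best' as a multipath."""
    must_match = ("weight", "local_pref", "as_path", "origin", "med", "external")
    if any(getattr(best, f) != getattr(other, f) for f in must_match):
        return False
    # For IBGP-learned paths, the IGP cost to the next hop must also match.
    if not best.external and best.igp_metric != other.igp_metric:
        return False
    return True


via_nx2 = CandidatePath("192.168.2.2", (65001,), igp_metric=41)
via_nx3 = CandidatePath("192.168.3.3", (65001,), igp_metric=41)
print(multipath_eligible(via_nx2, via_nx3))  # True

higher_lp = CandidatePath("192.168.3.3", (65001,), local_pref=200, igp_metric=41)
print(multipath_eligible(via_nx2, higher_lp))  # False
```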

Note

NX-OS does not support the eiBGP multipath feature at the time of writing.

EBGP and IBGP Multipath

EBGP multipath is enabled on NX-OS with the BGP configuration command maximum-paths number-paths. The number of paths indicates the allowed number of EBGP paths to install in the RIB. Note that the EBGP multipath configuration only allows for external path type to be selected as multipath best path. For internal path types, the IBGP multipath feature is required. The command maximum-paths ibgp number-paths sets the number of IBGP routes to install in the RIB. The commands are placed under the appropriate address-family.

Examine the topology shown in Figure 11-6. In this topology, NX-1 learns the same prefixes from both NX-2 and NX-3. Because there is an IBGP peering between NX-1, NX-2, and NX-3, the paths learned on NX-1 are internal. To have multiple BGP paths installed in the RIB and BRIB, IBGP multipath is configured on NX-1. Example 11-23 demonstrates the IBGP multipath functionality.

Example 11-23 IBGP Multipath

NX-1# show bgp ipv4 unicast 192.168.5.5/32
BGP routing table information for VRF default, address family IPv4 Unicast
BGP routing table entry for 192.168.5.5/32, version 32
Paths: (2 available, best #1)
Flags: (0x08001a) on xmit-list, is in urib, is best urib route, is in HW,

  Advertised path-id 1
  Path type: internal, path is valid, is best path
  AS-Path: 65001 , path sourced external to AS
    192.168.2.2 (metric 41) from 192.168.2.2 (192.168.2.2)
      Origin IGP, MED not set, localpref 100, weight 0

  Path type: internal, path is valid, not best reason: Router Id
  AS-Path: 65001 , path sourced external to AS
    192.168.3.3 (metric 41) from 192.168.3.3 (192.168.3.3)
      Origin IGP, MED not set, localpref 100, weight 0

  Path-id 1 advertised to peers:
    192.168.3.3        192.168.4.4
NX-1(config)# router bgp 65000
NX-1(config-router)# address-family ipv4 unicast
NX-1(config-router-af)# maximum-paths ibgp 2
NX-1# show bgp ipv4 unicast
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 65, local router ID is 192.168.1.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist,
  I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
*>l192.168.1.1/32     0.0.0.0                           100      32768 i
*>i192.168.2.2/32     192.168.2.2                       100          0 i
*>i192.168.3.3/32     192.168.3.3                       100          0 i
*>i192.168.4.4/32     192.168.4.4                       100          0 i
*>i192.168.5.5/32     192.168.2.2                       100          0 65001 i
*|i                   192.168.3.3                       100          0 65001 i
*>i192.168.6.6/32     192.168.2.2                       100          0 65001 i
*|i                   192.168.3.3                       100          0 65001 i
NX-1# show bgp ipv4 unicast 192.168.5.5
BGP routing table information for VRF default, address family IPv4 Unicast
BGP routing table entry for 192.168.5.5/32, version 59
Paths: (2 available, best #1)
Flags: (0x08001a) on xmit-list, is in urib, is best urib route, is in HW,
Multipath: iBGP

  Advertised path-id 1
  Path type: internal, path is valid, is best path
  AS-Path: 65001 , path sourced external to AS
    192.168.2.2 (metric 41) from 192.168.2.2 (192.168.2.2)
      Origin IGP, MED not set, localpref 100, weight 0

  Path type: internal, path is valid, not best reason: Router Id, multipath
  AS-Path: 65001 , path sourced external to AS
    192.168.3.3 (metric 41) from 192.168.3.3 (192.168.3.3)
      Origin IGP, MED not set, localpref 100, weight 0

  Path-id 1 advertised to peers:
    192.168.3.3        192.168.4.4
NX-1# show ip route 192.168.5.5/32 detail

192.168.5.5/32, ubest/mbest: 2/0
    *via 192.168.2.2, [200/0], 00:45:02, bgp-65000, internal, tag 65001,
         client-specific data: a       
         recursive next hop: 192.168.2.2/32
         extended route information: BGP origin AS 65001 BGP peer AS 65001
    *via 192.168.3.3, [200/0], 00:02:22, bgp-65000, internal, tag 65001,
         client-specific data: a       
         recursive next hop: 192.168.3.3/32
         extended route information: BGP origin AS 65001 BGP peer AS 65001

The BGP event-history logs are used to verify that the second-best path is added to the Unicast Routing Information Base (URIB). Use the command show bgp event-history detail to view the details for both the best path and the second-best path of a prefix being added to the URIB, as shown in Example 11-24. In Example 11-24, the best path via 192.168.2.2 is selected first, and then another path, learned via next hop 192.168.3.3, is added to the URIB.

Example 11-24 Event-History Logs for BGP Multipath

NX-1# show bgp event-history detail | in 192.168.5.5
16:48:55.864118: (default) RIB: [IPv4 Unicast] Adding path (0x18) to
 192.168.5.5/32 via 192.168.3.3 in URIB (table-id 0x1, flags 0x10, nh 192.168.3.3)
 extcomm-len=0, preference=200
16:48:55.864112: (default) RIB: [IPv4 Unicast]: adding route 192.168.5.5/32 via 192.168.3.3
16:48:55.864108: (default) RIB: [IPv4 Unicast] Sending route 192.168.5.5/32 to URIB
16:48:55.864101: (default) RIB: [IPv4 Unicast] No change (0x80038) in best path
 for 192.168.5.5/32 , resync with RIB, backup/multipath changed
16:48:55.864093: (default) RIB: [IPv4 Unicast] Begin select bestpath for
 192.168.5.5/32, adv_all=0, cal_nth=0, install_to_rib=0, flags=0x80038
16:48:55.863833: (default) RIB: [IPv4 Unicast] Triggering bestpath selection
 for 192.168.5.5/32 , flags=0x8003a

! Output omitted for brevity

16:06:15.704376: (default) BRIB: [IPv4 Unicast] 192.168.5.5/32, no Label AF
16:06:15.704373: (default) RIB: [IPv4 Unicast] 192.168.5.5/32 path#1
: set to rid=192.168.2.2 nh=192.168.2.2, flags=0x12, changed=1
16:06:15.704369: (default) RIB: [IPv4 Unicast] Selected new bestpath
192.168.5.5/32 flags=0x880018 rid=192.168.2.2 nh=192.168.2.2

BGP Update Generation Process

The update generation process on NX-OS differs from that of both Cisco IOS and IOS XR based platforms. Unlike IOS and IOS XR, NX-OS does not have any concept of update-groups. BGP processes route update messages received from its peers, runs prefixes and attributes through any configured inbound policy, and installs the new paths in the BGP RIB (BRIB). After the route has been updated in the BRIB, BGP marks the route for further update generation. Before the prefixes are packaged, they are processed through any configured outbound policies. BGP then puts the marked routes into update messages and sends them to peers. Example 11-25 illustrates the BGP update generation on NX-OS. To understand the update generation process, enable the debug commands debug ip bgp update and debug ip bgp brib. From the debug output in Example 11-25, notice that the update received from NX-4 (192.168.4.4) includes the advertisement for prefix 192.168.44.44/32, which is then installed in the BRIB. Further updates are then generated for the peers NX-2 and NX-3. Notice that the updates are generated separately for NX-2 (192.168.2.2) and NX-3 (192.168.3.3).
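
The flow described above can be sketched in a few lines of Python. This is a simplified illustration, not the NX-OS implementation: the function names, the dictionary standing in for the BRIB, and the rule that a route is not sent back to the peer it was learned from are all assumptions made for the example.

```python
def install_update(brib, marked, sender, prefix, inbound_permit):
    """Run one received prefix through inbound policy and install it in the BRIB."""
    if not inbound_permit(sender, prefix):
        return False
    brib[prefix] = sender      # record which peer the path was learned from
    marked.add(prefix)         # mark the route for update generation
    return True


def generate_updates(brib, marked, peers, outbound_permit):
    """Build one update per peer (no update-groups), then clear the marks."""
    updates = {}
    for peer in peers:
        packed = [
            prefix for prefix in sorted(marked)
            # Assumption for this sketch: do not send a route back to its sender.
            if brib[prefix] != peer and outbound_permit(peer, prefix)
        ]
        if packed:
            updates[peer] = packed
    marked.clear()
    return updates


def permit_all(peer, prefix):
    """Stand-in for 'no policy configured'."""
    return True


brib, marked = {}, set()
peers = ["192.168.2.2", "192.168.3.3", "192.168.4.4"]

install_update(brib, marked, "192.168.4.4", "192.168.44.44/32", permit_all)
updates = generate_updates(brib, marked, peers, permit_all)
print(updates)
```

The outbound policy is evaluated once per peer inside the loop, which mirrors the separate update runs for 192.168.2.2 and 192.168.3.3 seen in the debug output.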

Example 11-25 Debugs for BGP Update and Route Installation in BRIB

NX-1# debug logfile bgp
NX-1# debug ip bgp update
NX-1# debug ip bgp brib
NX-1# show debug logfile bgp
! Receiving an update from peer for 192.168.44.44/32

22:40:31.707254 bgp: 65000 [10739] (default) UPD: Received UPDATE message from
 192.168.4.4
22:40:31.707422 bgp: 65000 [10739] (default) UPD: 192.168.4.4 parsed UPDATE
 message from peer, len 55 , withdraw len 0, attr len 32, nlri len 0
22:40:31.707499 bgp: 65000 [10739] (default) UPD: Attr code 1, length 1,
 Origin: IGP
22:40:31.707544 bgp: 65000 [10739] (default) UPD: Attr code 5, length 4,
 Local-pref: 100
22:40:31.707601 bgp: 65000 [10739] (default) UPD: Peer 192.168.4.4 nexthop
 length in MP reach: 4
22:40:31.707672 bgp: 65000 [10739] (default) UPD: Recvd NEXTHOP 192.168.4.4
22:40:31.707716 bgp: 65000 [10739] (default) UPD: Attr code 14, length 14,
 Mp-reach
22:40:31.707787 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] Received prefix
 192.168.44.44/32 from peer 192.168.4.4, origin 0, next hop 192.168.4.4,
 localpref 100, med 0
22:40:31.707859 bgp: 65000 [10739] (default) BRIB: [IPv4 Unicast] Installing
 prefix 192.168.44.44/32 (192.168.4.4) via 192.168.4.4  into BRIB with extcomm
22:40:31.707915 bgp: 65000 [10739] (default) BRIB: [IPv4 Unicast] Created new
 path to 192.168.44.44/32 via 0.0.0.0 (pflags=0x0)
22:40:31.707962 bgp: 65000 [10739] (default) BRIB: [IPv4 Unicast]
 (192.168.44.44/32 (192.168.4.4)): bgp_brib_add: handling nexthop
22:40:31.708054 bgp: 65000 [10739] (default) BRIB: [IPv4 Unicast]
 (192.168.44.44/32 (192.168.4.4)): returning from bgp_brib_add, new_path: 1,
 change : 1, undelete: 0, history: 0, force: 0, (pflags=0x2010), reeval=0
22:40:31.708292 bgp: 65000 [10739] (default) BRIB: [IPv4 Unicast]
 192.168.44.44/32, no Label AF

! Generating update for peer 192.168.2.2

22:40:31.709476 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] Starting
 update run for peer 192.168.2.2 (#65)
22:40:31.709514 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] consider
 sending 192.168.44.44/32 to peer 192.168.2.2, path-id 1, best-ext is off
22:40:31.709553 bgp: 65000 [10739] (default) UPD: 192.168.2.2 Sending attr
 code 1, length 1, Origin: IGP
22:40:31.709581 bgp: 65000 [10739] (default) UPD: 192.168.2.2 Sending attr
 code 5, length 4, Local-pref: 100
22:40:31.709613 bgp: 65000 [10739] (default) UPD: 192.168.2.2 Sending attr
 code 9, length 4, Originator: 192.168.4.4
22:40:31.709654 bgp: 65000 [10739] (default) UPD: 192.168.2.2 Sending attr
 code 10, length 4, Cluster-list
22:40:31.709700 bgp: 65000 [10739] (default) UPD: 192.168.2.2 Sending attr
 code 14, length 14, Mp-reach
22:40:31.709744 bgp: 65000 [10739] (default) UPD: 192.168.2.2 Sending nexthop
  address 192.168.4.4 length 4
22:40:31.709789 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] 192.168.2.2
 Created UPD msg (len 69) with prefix 192.168.44.44/32 ( Installed in HW
) path-id 1 for peer
22:40:31.709820 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] 192.168.2.2:
 walked 0 nodes and packed 0/0 prefixes
22:40:31.709859 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] (#66) Finished
 update run for peer 192.168.2.2 (#66)

! Generating update for peer 192.168.3.3

22:40:31.709891 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] Starting update
 run for peer 192.168.3.3 (#65)
22:40:31.709917 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] consider
 sending 192.168.44.44/32 to peer 192.168.3.3, path-id 1, best-ext is off
22:40:31.709948 bgp: 65000 [10739] (default) UPD: 192.168.3.3 Sending attr
 code 1, length 1, Origin: IGP
22:40:31.709974 bgp: 65000 [10739] (default) UPD: 192.168.3.3 Sending attr
 code 5, length 4, Local-pref: 100
22:40:31.709998 bgp: 65000 [10739] (default) UPD: 192.168.3.3 Sending attr
 code 9, length 4, Originator: 192.168.4.4
22:40:31.710149 bgp: 65000 [10739] (default) UPD: 192.168.3.3 Sending attr
 code 10, length 4, Cluster-list
22:40:31.710180 bgp: 65000 [10739] (default) UPD: 192.168.3.3 Sending attr
 code 14, length 14, Mp-reach
22:40:31.710204 bgp: 65000 [10739] (default) UPD: 192.168.3.3 Sending nexthop
 address 192.168.4.4 length 4
22:40:31.710231 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] 192.168.3.3
 Created UPD msg (len 69) with prefix 192.168.44.44/32 ( Installed in HW)
 path-id 1 for peer
22:40:31.710261 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] 192.168.3.3:
 walked 0 nodes and packed 0/0 prefixes
22:40:31.710286 bgp: 65000 [10739] (default) UPD: [IPv4 Unicast] (#66)
Finished update run for peer 192.168.3.3 (#66)

On NX-OS, debugs are not necessarily required to understand the update generation process. Use the command show bgp event-history detail to view the detailed event logs. The detail option is not available by default and must be enabled under the router bgp configuration using the command event-history detail [size large | medium | small]. Example 11-26 displays the detailed output of the BGP event-history logs showing the same update process. In this example, the update is being generated for NX-3. If the event-history logs have rolled over and the issue keeps recurring, enable the debugs, as demonstrated in Example 11-25.

Example 11-26 Event-History Logs for BGP Update Generation

NX-1# show bgp event-history detail
BGP event-history detail
22:40:31.710283: (default) UPD: [IPv4 Unicast] (#66) Finished update
 run for peer 192.168.3.3 (#66)
22:40:31.710258: (default) UPD: [IPv4 Unicast] 192.168.3.3: walked 0 nodes and
 packed 0/0 prefixes
22:40:31.710226: (default) UPD: [IPv4 Unicast] 192.168.3.3 Created UPD msg
 (len 69) with prefix 192.168.44.44/32 ( Installed in HW) path-id 1 for peer
22:40:31.710201: (default) UPD: 192.168.3.3 Sending nexthop address
192.168.4.4 length 4
22:40:31.710177: (default) UPD: 192.168.3.3 Sending attr code 14, length 14,
 Mp-reach
22:40:31.710145: (default) UPD: 192.168.3.3 Sending attr code 10, length 4,
 Cluster-list
22:40:31.709995: (default) UPD: 192.168.3.3 Sending attr code 9, length 4,
 Originator: 192.168.4.4
22:40:31.709971: (default) UPD: 192.168.3.3 Sending attr code 5, length 4,
 Local-pref: 100
22:40:31.709945: (default) UPD: 192.168.3.3 Sending attr code 1, length 1,
 Origin: IGP
22:40:31.709913: (default) UPD: [IPv4 Unicast] consider sending 192.
168.44.44/32 to peer 192.168.3.3, path-id 1, best-ext is off
22:40:31.709887: (default) UPD: [IPv4 Unicast] Starting update run for peer
 192.168.3.3 (#65)

! Output omitted for brevity

BGP Convergence

BGP convergence depends on various factors. It is all about the speed of the following:

  • Establishing sessions with a number of peers.

  • Locally generating all the BGP paths, either via network statements or redistribution of static, connected, or IGP routes, and/or from other components for other address-families (for example, Multicast VPN (MVPN) from multicast, L2VPN from l2vpn manager, and so on).

  • Sending and receiving multiple BGP tables; that is, different BGP address-families to and from each peer.

  • Performing best path calculation, upon receiving all the paths from peers, to find the best path and/or multipath, additional-path, and backup path.

  • Installing the best path into multiple routing tables, such as the default or VRF routing table.

  • Running the import and export mechanism.

  • Passing the path calculation result to different lower-layer components for other address-families, such as L2VPN or multicast.

BGP uses a lot of CPU cycles when processing BGP updates and requires memory for maintaining BGP peers and routes in BGP tables. Based on the role of the BGP router in the network, appropriate hardware should be chosen. The more memory a router has, the more routes it can support, much as a router with a faster CPU supports a larger number of peers.

Note

Because BGP updates rely on TCP, optimizing router resources, such as memory, and TCP session parameters, such as maximum segment size (MSS), path MTU discovery, interface input queues, TCP window size, and so on, helps improve convergence.

Several steps should be followed to verify whether BGP has converged and the routes are installed in the BRIB.

If traffic loss occurs before BGP has completed its convergence for a given address-family, verify the routing information in the URIB and the forwarding information in the FIB. Example 11-27 demonstrates a BGP route getting refreshed. The command show bgp event-history [event | detail] is used to validate that the prefix is installed in the BRIB, and the command show routing internal event-history [add-route | modify-route | delete-route] is used to check that the route has been installed in the URIB. In the URIB, verify the timestamp of when the route was downloaded. If the prefix was recently downloaded to the URIB, there might have been an event that caused the route to get refreshed. Also, the difference in time between when the prefix was installed in the BRIB and when it was downloaded to the URIB helps in understanding the convergence time.

Example 11-27 BRIB and URIB Route Installation

NX-1# show bgp event-history detail
BGP event-history detail
! Output omitted for brevity
22:40:31.707849: (default) BRIB: [IPv4 Unicast] Installing prefix 19
2.168.44.44/32 (192.168.4.4) via 192.168.4.4  into BRIB with extcomm
NX-1# show routing internal event-history add-route | grep 192.168.44.44
22:40:31.708531 urib: "bgp-65000": 192.168.44.44/32 xri info for rnh
 192.168.4.4/32: origin AS fde8 peer AS fde8
22:40:31.708530 urib: "bgp-65000": 192.168.44.44/32, new rnh 192.168
.4.4/32, metric [200/0] route-type internal tag 0x0000fde8 flags 0x0000080e
22:40:31.708496 urib: "bgp-65000": 192.168.44.44/32 add rnh 192.168.
4.4/32 epoch 1 recursive
22:40:31.708495 urib: "bgp-65000": 192.168.44.44/32, adding rnh 192.
168.4.4/32, metric [200/0] route-type internal tag 0x0000fde8 flags 0x00000010
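
The time between BRIB installation and URIB download can be worked out directly from the two timestamps in Example 11-27. The following Python sketch does that arithmetic; the helper name parse_ts is hypothetical, and the calculation assumes both events fall on the same day, because the logs carry no date.

```python
from datetime import datetime, timedelta


def parse_ts(ts):
    """Parse an event-history timestamp of the form HH:MM:SS.ffffff."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


# Timestamps copied from Example 11-27.
brib_installed = parse_ts("22:40:31.707849")   # prefix installed into BRIB
urib_updated = parse_ts("22:40:31.708531")     # last URIB add-route event

# Integer number of microseconds between the two events.
delta_us = (urib_updated - brib_installed) // timedelta(microseconds=1)
print(f"BRIB to URIB: {delta_us} microseconds")
```

For this prefix, the route reached the URIB 682 microseconds after it was installed in the BRIB.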

BGP convergence for the relevant address-family is checked using the command show bgp convergence detail vrf all. Example 11-28 shows the output of the show bgp convergence detail vrf all command. This command shows when the best-path selection process was started and the time taken to complete it. The command also displays the time taken to converge the prefix to the URIB, which helps in understanding how the device is performing from a BGP and URIB convergence perspective.

Example 11-28 show bgp convergence detail Command Output

NX-1# show bgp convergence detail vrf all
Global settings:
BGP start time 1 day(s), 04:38:39 ago
Config processing completed 0.068404 after start
BGP out of wait mode 0.068493 after start
LDP convergence not required
Convergence to ULIB not required

Information for VRF default
Initial-bestpath timeout: 300 sec, configured 0 sec
BGP update-delay-always is not enabled
First peer up 00:06:14 after start
Bestpath timer not running

   IPv4 Unicast:
   First bestpath signalled 0.068443 after start
   First bestpath completed 0.069397 after start
   Convergence to URIB sent 0.082041 after start
   Peer convergence after start:
    192.168.2.2        (EOR after bestpath)
    192.168.3.3        (EOR after bestpath)
    192.168.4.4        (EOR after bestpath)

   IPv6 Unicast:
   First bestpath signalled 0.068467 after start
   First bestpath completed 0.069574 after start
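
As a worked example, the IPv4 Unicast values from Example 11-28 can be subtracted to get the duration of each stage. The sketch below assumes the "after start" values are expressed in seconds; Decimal is used so that the subtraction is exact.

```python
from decimal import Decimal

# IPv4 Unicast values copied from Example 11-28 (assumed to be seconds after start).
bestpath_signalled = Decimal("0.068443")
bestpath_completed = Decimal("0.069397")
urib_convergence_sent = Decimal("0.082041")

# Time spent in the first best-path run.
bestpath_duration = bestpath_completed - bestpath_signalled

# Time from best-path completion until convergence was signalled to the URIB.
urib_delay = urib_convergence_sent - bestpath_completed

print(f"Best-path run: {bestpath_duration} s")
print(f"Best path to URIB: {urib_delay} s")
```

In this output, the first best-path run took 0.000954 seconds and convergence was sent to the URIB 0.012644 seconds after that.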

Note

If the BGP best-path has not run yet, the problem is likely not related to BGP on that node.

If the best-path runs before the EOR is received, or if a peer fails to send the EOR marker, it can lead to traffic loss. In such situations, enable debugs for BGP updates with the relevant debug-filters for VRF, address-family, and peer, as shown in Example 11-29.

Example 11-29 Debug Commands with Filter

debug logfile bgp
debug bgp events updates rib brib import
debug-filter bgp address-family ipv4 unicast
debug-filter bgp neighbor 192.168.4.4
debug-filter bgp prefix 192.168.44.44/32

From the debug output, check the event log to look at the timestamp to see when the most recent EOR was sent to the peer. This also shows how many routes were advertised to the peer before the sending of the EOR. A premature EOR sent to the peer can also lead to traffic loss if the peer flushes stale routes early.

If the route in URIB has not been downloaded, it needs to be further investigated because it may not be a problem with BGP. The following commands can be run to check the activity in URIB that could explain the loss:

  • show routing internal event-history ufdm

  • show routing internal event-history ufdm-summary

  • show routing internal event-history recursive

Scaling BGP

BGP is one of the most feature-rich protocols ever developed, providing ease of routing and control using policies. Although BGP has many built-in features that allow the protocol to scale very well, these enhancements are often not utilized properly. This poses various challenges when BGP is deployed in a scaled environment.

BGP is a heavy protocol because it uses the most CPU and memory resources on a router. Many factors explain why it keeps utilizing more and more resources. The three major factors for BGP memory consumption are as follows:

  • Prefixes

  • Paths

  • Attributes

BGP can hold many prefixes, and each prefix consumes some amount of memory. But when the same prefix is learned via multiple paths, that information is also maintained in the BGP table. Each path consumes additional memory. Because BGP was designed to give control to each AS to manage the flow of traffic through various attributes, each prefix can have various attributes per path. This can be expressed as a mathematical function, where N represents the number of prefixes, M represents the number of paths for a given prefix, and L represents the number of attributes attached to a given path:

  • Prefixes: O(N)

  • Paths: O(M × N)

  • Attributes: O(L × M × N)
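
To make the growth concrete, the following Python sketch counts table entries for given values of N, M, and L. It is only an illustration of the formulas above: the function name and the sample values are hypothetical, and the result counts entries rather than bytes. It is also an upper bound, because paths that carry identical attributes can share a single attribute entry.

```python
def bgp_table_entries(n_prefixes, paths_per_prefix, attrs_per_path):
    """Return (prefix, path, attribute) entry counts for the O(N), O(M*N), O(L*M*N) model."""
    prefixes = n_prefixes
    paths = paths_per_prefix * n_prefixes
    attributes = attrs_per_path * paths
    return prefixes, paths, attributes


# Hypothetical example: 600,000 prefixes, 2 paths each, 4 attributes per path.
prefixes, paths, attributes = bgp_table_entries(600_000, 2, 4)
print(prefixes, paths, attributes)

# Halving the number of paths halves both the path and attribute counts.
single_path = bgp_table_entries(600_000, 1, 4)
print(single_path)
```

The sketch shows why reducing paths or attributes pays off more than the raw prefix count suggests: each is a multiplier on everything below it.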

Tuning BGP Memory

To reduce or tune the BGP memory consumption, make adjustments to the three major factors leading to most BGP memory consumption, as discussed. The next sections examine the various adjustments that can be made for each factor.

Prefixes

BGP memory consumption becomes huge when BGP is holding a large number of prefixes or holding the Internet routing table. In most cases, not all the BGP prefixes are required to be maintained by all the routers running BGP in the network. To reduce the number of prefixes, take the following actions:

  • Aggregation

  • Filtering

  • Partial routing table instead of full routing table

With the use of aggregation, multiple specific routes can be aggregated into one route. But aggregation is challenging when tried on a fully deployed running network. After the network is up and running, the complete IP addressing scheme has to be looked at to perform aggregation. Aggregation is a good option for green field deployments. The green field deployments give more control on the IP addressing scheme, which makes it easier to apply aggregation.

Filtering provides control over the number of prefixes maintained in the BGP table or advertised to BGP peers. BGP provides filtering based on prefix, BGP attributes, and communities. One important point to remember is that complex route filtering, or route filtering applied for a large number of prefixes, helps reduce the memory, but the router takes a hit on the CPU.

Many deployments do not require all the BGP speakers to maintain a full BGP routing table. Especially in enterprise and data center deployments, there is no real need to hold the full Internet routing table. The BGP speakers can instead maintain a partial routing table containing the most relevant and required prefixes, or just a default route toward the Internet gateway. Such designs greatly reduce the resources being used throughout the network and increase scalability.

Paths

Sometimes the BGP table carries fewer prefixes but still holds more memory because of multiple paths. A prefix can be learned via multiple paths, but only the best or multiple best paths are installed in the routing table. To reduce the memory consumption by BGP due to multiple paths, the following solutions should be adopted:

  • Reduce the number of peerings.

  • Use RRs instead of IBGP full mesh.

Multiple BGP paths are a direct effect of multiple BGP peerings. Especially in an IBGP full-mesh environment, the number of BGP sessions grows quadratically with the number of routers, and the number of paths grows with it. Many customers increase the number of IBGP neighbors to have more redundant paths, but two paths are sufficient to maintain redundancy. Increasing the number of peerings can cause scaling issues both from the perspective of the number of sessions and from the perspective of BGP memory utilization.

It is a well-known fact that IBGP needs to be in full mesh. Figure 11-7 illustrates an IBGP full-mesh topology. In an IBGP full-mesh deployment of n nodes, there are a total of n*(n−1)/2 IBGP sessions and (n−1) sessions per BGP speaker.

Image

Figure 11-7 IBGP Full Mesh

This not only affects the scalability of an individual node or router but the whole network. To increase the scalability of IBGP network, two design approaches can be used:

  • Confederations

  • Route Reflectors

Note

BGP Confederations and Route Reflectors are discussed in another section later in this chapter.

Attributes

A BGP route is a “bag” of attributes. Every BGP prefix has certain default or mandatory attributes that are assigned automatically, such as next-hop or AS-PATH, as well as attributes that are configured manually by customers, such as Multi-Exit Discriminator (MED). Each attribute attached to the prefix adds to memory utilization. Along with attributes, communities (both standard and extended) increase memory consumption. To reduce the BGP memory consumption due to various attributes and communities, the following solutions can be adopted:

  • Reduce the number of attributes.

  • Filter standard or extended communities.

  • Limit local communities.

On NX-OS, use the command show bgp private attr detail to view the various attributes attached to the BGP prefixes. Example 11-30 displays the various global BGP attributes on NX-1. These attributes were learned across various prefixes, including the community attached to the prefix learned from NX-4.

Example 11-30 BGP Attributes Detail

NX-1# show bgp private attr detail
BGP Global attributes vxlan-enable:0  nve-api-init:0 nve-up:0 mac:0000.0000.0000

BGP attributes information
Number of attribute entries    : 4
HWM of attribute entries       : 5
Bytes used by entries          : 400
Entries pending delete         : 0
HWM of entries pending delete  : 0
BGP paths per attribute HWM    : 4
BGP AS path entries            : 1
Bytes used by AS path entries  : 26

BGP as-path traversal count   : 20

Attribute 0x64c1b8bc : Hash: 191, Refcount: 1, Attr ID 10
 origin        : IGP
 as-path       : 65001
               : 0 (path hash)
               : 1 (path refcount)
               : 20 (path marker)
 localpref     : 100
 weight        : 0
      Extcommunity presence mask: (nil)

Attribute 0x64c1b7dc : Hash: 2649, Refcount: 1, Attr ID 5
 origin        : IGP
 as-path       :
 localpref     : 100
 weight        : 32768
      Extcommunity presence mask: (nil)

Attribute 0x64c1b84c : Hash: 2651, Refcount: 1, Attr ID 2
 origin        : IGP
 as-path       :
 localpref     : 100
 weight        : 0
      Extcommunity presence mask: (nil)

Attribute 0x64c1b6fc : Hash: 3027, Refcount: 1, Attr ID 12
 origin        : IGP
 as-path       :
 localpref     : 100
 weight        : 0
      Community: 65000:44
      Extcommunity presence mask: (nil)

There is no method to get rid of the default BGP attributes, but the use of other attributes can be controlled. Using attributes that make things more complex is of no use. For example, using MED and various MED-related commands, such as the command bgp always-compare-med or bgp deterministic-med, can have an adverse impact on the network and can lead to route instability or routing loop conditions. At the same time, the user-assigned attributes will consume more BGP memory, which can easily be avoided.

Scaling BGP Configuration

BGP templates are used to assign common policies and attributes, such as the AS number or source-interface, to multiple neighbors. This saves a lot of typing when multiple neighbors share the same policy. The NX-OS implementation of peer templates consists of three template types: peer-policy, peer-session, and peer template.

A peer-policy defines the address-family dependent policy aspects for a peer, including inbound and outbound policy, filter-list and prefix-lists, soft-reconfiguration, and so on. A peer-session template defines session attributes, such as transport details and session timers. Both the peer-policy and peer-session templates are inheritable; that is, a peer-policy or peer-session can inherit attributes from another peer-policy or peer-session, respectively. A peer template pulls the peer-session and peer-policy sections together to allow cookie-cutter neighbor definitions. Example 11-31 illustrates the configuration of BGP templates on NX-1.

Example 11-31 BGP Template Configuration

NX-1(config)# router bgp 65000
! Configure peer-policy template
NX-1(config-router)# template peer-policy PEERS-V4
NX-1(config-router-ptmp)# route-reflector-client
NX-1(config-router-ptmp)# exit
! Configure peer-session template
NX-1(config-router)# template peer-session PEER-DEFAULT
NX-1(config-router-stmp)# remote-as 65000
NX-1(config-router-stmp)# update-source loopback0
NX-1(config-router-stmp)# password cisco
NX-1(config-router-stmp)# exit
! Configure peer template
NX-1(config-router)# template peer IBGP-RRC
NX-1(config-router-neighbor)# inherit peer-session PEER-DEFAULT
NX-1(config-router-neighbor)# address-family ipv4 unicast
NX-1(config-router-neighbor-af)# inherit peer-policy PEERS-V4 10
! Applying Peer Template to BGP peers
NX-1(config-router)# neighbor 192.168.4.4
NX-1(config-router-neighbor)# inherit peer IBGP-RRC
NX-1(config-router-neighbor)# exit
NX-1(config-router)# neighbor 192.168.3.3
NX-1(config-router-neighbor)# inherit peer IBGP-RRC
NX-1(config-router-neighbor)# exit
NX-1(config-router)# neighbor 192.168.2.2
NX-1(config-router-neighbor)# inherit peer IBGP-RRC
NX-1(config-router-neighbor)# exit

Soft Reconfiguration Inbound Versus Route Refresh

When inbound BGP policies are adjusted, the BGP updates must be requested again from the peers. BGP updates are incremental; that is, after the initial update is completed, only the changes are received. Thus, the BGP sessions must be reset to request that the peers resend a BGP UPDATE message with all the NLRIs, so that those updates can be rerun through the new filter. There are two methods to perform the session reset:

  • Hard Reset: Dropping and reestablishing BGP session. Performed by the command clear bgp afi safi [* | ip-address].

  • Soft Reset: A soft reset uses filtered prefixes stored in the memory to reconfigure and activate BGP routing tables without tearing down the BGP session. Performed using the command clear bgp afi safi [* | ip-address] soft [in | out].

Hard reset of a BGP session is disruptive to an operational network. If a BGP session is reset repeatedly over a short period of time due to multiple changes in BGP policy, it can result in other routers in the network dampening prefixes, causing destinations to be unreachable and traffic to be black-holed.

Soft reconfiguration is a traditional way to allow route-policy to be applied on the inbound BGP route update. BGP soft reconfiguration is enabled using the command soft-reconfiguration inbound under the neighbor configuration mode. When configured, BGP stores an unmodified copy of all routes received from that peer at all times, even when the routing policies do not change frequently. Enabling soft reconfiguration means that the router also stores prefixes and attributes received prior to any policy application. This causes extra memory and CPU overhead on the router.

To manually perform a soft reset, use the command clear bgp ipv4 unicast [* | ip-address] soft [in | out]. The soft-reconfiguration feature is useful when the operator wants to know which prefixes have been sent to a router prior to the application of any inbound policy.

To overcome the challenges of soft-reconfiguration inbound configuration, BGP route refresh capability was introduced and is defined in RFC 2918. The BGP route refresh capability has a capability code of 2 and the capability length of 0. Using the route refresh capability, the router sends out a route refresh request to peer to get the full table from the peer again. The good part of route refresh capability is there is no preconfiguration needed to enable this capability. The ROUTE-REFRESH message is a new BGP message type described in Figure 11-8.

Image

Figure 11-8 BGP Route Refresh Message

The AFI and SAFI in the ROUTE-REFRESH message point to the address-family for which the configured peer is negotiating the route refresh capability. The Reserved bits are unused; they are set to 0 by the sender and ignored by the receiver.
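
The message layout can be illustrated by packing the fields with Python's struct module. This is a sketch based on the layout in RFC 2918 and the standard 19-byte BGP header (16-byte marker, 2-byte length, 1-byte type), with ROUTE-REFRESH being message type 5. The function and constant names are hypothetical and do not come from any BGP library.

```python
import struct

BGP_MARKER = b"\xff" * 16        # all-ones marker in the BGP message header
BGP_HEADER_LEN = 19              # marker (16) + length (2) + type (1)
MSG_TYPE_ROUTE_REFRESH = 5       # message type assigned by RFC 2918


def build_route_refresh(afi, safi):
    """Return the bytes of a ROUTE-REFRESH message for one AFI/SAFI."""
    # Body: AFI (16 bits), Reserved (8 bits, set to 0), SAFI (8 bits).
    body = struct.pack("!HBB", afi, 0, safi)
    length = BGP_HEADER_LEN + len(body)
    header = BGP_MARKER + struct.pack("!HB", length, MSG_TYPE_ROUTE_REFRESH)
    return header + body


# The capability advertised in the OPEN message: code 2, length 0.
route_refresh_capability = struct.pack("!BB", 2, 0)

# IPv4 (AFI 1) unicast (SAFI 1).
msg = build_route_refresh(afi=1, safi=1)
print(len(msg), msg[16:].hex())
```

The resulting message is 23 bytes long: the 19-byte header followed by the 4-byte AFI, Reserved, and SAFI fields.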

A BGP speaker sends a ROUTE-REFRESH message only if it has negotiated the route refresh capability with its peer. This implies that all the participating routers should support the route refresh capability. The router sends a route refresh request (REFRESH_REQ) to the peer. After the peer receives the route refresh request, it readvertises the Adj-RIB-Out of the AFI and SAFI carried in the message to the requesting router. If the advertising BGP speaker has an outbound route filtering policy, the updates are filtered accordingly, and the requesting router receives the filtered routes.

The clear ip bgp ip-address in or clear bgp afi safi ip-address in command tells the peer to resend the full BGP announcement by sending a route-refresh request. The clear bgp afi safi ip-address out command, in contrast, resends the full BGP announcement to the peer; it does not initiate a route refresh request. The route refresh capability is verified using the show bgp afi safi neighbors ip-address command. Example 11-32 displays the route refresh capability negotiated between the two BGP peers.

Example 11-32 BGP Route Refresh Capability

NX-1
NX-1# show bgp ipv4 unicast neighbors 192.168.2.2
BGP neighbor is 192.168.2.2, remote AS 65000, ibgp link, Peer index 1
  Inherits peer configuration from peer-template IBGP-RRC
  BGP version 4, remote router ID 192.168.2.2
  BGP state = Established, up for 01:10:46
  Using loopback0 as update source for this peer
  Last read 00:00:43, hold time = 180, keepalive interval is 60 seconds
  Last written 00:00:31, keepalive timer expiry due 00:00:28
  Received 77 messages, 0 notifications, 0 bytes in queue
  Sent 80 messages, 0 notifications, 0 bytes in queue
  Connections established 1, dropped 0
  Last reset by us never, due to No error
  Last reset by peer never, due to No error

 Neighbor capabilities:
  Dynamic capability: advertised (mp, refresh, gr) received (mp, refresh, gr)
  Dynamic capability (old): advertised received
  Route refresh capability (new): advertised received
  Route refresh capability (old): advertised received
  4-Byte AS capability: advertised received
  Address family IPv4 Unicast: advertised received
  Graceful Restart capability: advertised received
! Output omitted for brevity

Note

When the soft-reconfiguration feature is configured, the BGP route refresh capability is not used, even though the capability is negotiated. The soft-reconfiguration configuration controls whether route refresh requests are processed or initiated.

The BGP refresh request (REFRESH_REQ) is sent in one of the following cases:

  • clear bgp afi safi [* | ip-address] in command is issued.

  • clear bgp afi safi [* | ip-address] soft in command is issued.

  • Adding or changing inbound filtering on the BGP neighbor via route-map.

  • Configuring allowas-in for the BGP neighbor.

  • Configuring soft-reconfiguration inbound for the BGP neighbor.

  • Adding a route-target import to a VRF in MPLS VPN (for AFI/SAFI value 1/128 or 2/128).

Note

It is recommended to use soft-reconfiguration inbound only on EBGP peering whenever it is required to know what the original prefix attributes are before being filtered or modified by the inbound route-map. It is not recommended on routers that receive a large number of prefixes being exchanged, such as the Internet routing table.

Scaling BGP with Route-Reflectors

Because BGP cannot advertise a prefix learned from one iBGP peer to another iBGP peer, scalability issues can arise within an AS. The formula n(n−1)/2 provides the number of sessions required, where n represents the number of routers. A full mesh topology of 5 routers requires 10 sessions, and a topology of 10 routers requires 45 sessions. IBGP scalability becomes an issue for large networks.
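
The session counts quoted above follow directly from the formula, as this short Python sketch shows. The function names are chosen for the example only.

```python
def full_mesh_sessions(n):
    """Total IBGP sessions in a full mesh of n routers: n(n-1)/2."""
    return n * (n - 1) // 2


def sessions_per_speaker(n):
    """Sessions each router maintains in a full mesh of n routers."""
    return n - 1


for routers in (5, 10, 100):
    total = full_mesh_sessions(routers)
    per_router = sessions_per_speaker(routers)
    print(f"{routers} routers: {total} sessions, {per_router} per router")
```

Doubling the routers from 5 to 10 more than quadruples the sessions, and a mesh of 100 routers already needs 4,950 of them.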

RFC 1966 introduces the concept that an iBGP peering can be configured so that it reflects routes to another iBGP peer. The router reflecting routes is known as a route reflector (RR), and the router receiving reflected routes is a route reflector client. The RR design turns an IBGP mesh into a hub-and-spoke design where the RR is the hub router. The RR clients are either regular IBGP peers that are not directly connected to each other, or they are interconnected with each other. Three basic rules govern route reflectors and route reflection:

  • Rule #1: If an RR receives an NLRI from a non-RR client, the RR advertises the NLRI to a RR client. It will not advertise the NLRI to a non-RR client.

  • Rule #2: If an RR receives an NLRI from an RR client, it advertises the NLRI to RR client(s) and non-RR client(s). Even the RR client that sent the advertisement receives a copy of the route, but it discards the NLRI because it sees itself as the route originator.

  • Rule #3: If an RR receives a route from an EBGP peer, it advertises the route to RR client(s) and non-RR client(s).

Only route-reflectors are aware of this change in behavior because no additional BGP configuration is performed on route-reflector clients. BGP route reflection is specific to each address-family. The command route-reflector-client is used on NX-OS devices under the neighbor address-family configuration.
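
The three rules can be expressed as a small decision function. The following Python sketch is an illustration under simplifying assumptions: it models only the RR's IBGP peers, the peer names and the role labels ("client", "non-client", "ebgp") are hypothetical, and R5 is an invented non-client added to show the difference between the rules.

```python
def reflect_to(learned_from, ibgp_peers):
    """Return the IBGP peers an RR advertises a route to, per Rules #1 to #3.

    learned_from: role of the peer the route came from
                  ("client", "non-client", or "ebgp").
    ibgp_peers:   mapping of peer name to "client" or "non-client".
    """
    if learned_from == "non-client":
        allowed = {"client"}                   # Rule #1
    elif learned_from in ("client", "ebgp"):
        allowed = {"client", "non-client"}     # Rules #2 and #3
    else:
        raise ValueError(f"unknown peer role: {learned_from}")
    return sorted(name for name, role in ibgp_peers.items() if role in allowed)


# R2 to R4 are RR clients; R5 is a hypothetical non-client IBGP peer.
peers = {"R2": "client", "R3": "client", "R4": "client", "R5": "non-client"}

print(reflect_to("non-client", peers))   # clients only
print(reflect_to("client", peers))       # clients and non-clients
print(reflect_to("ebgp", peers))         # clients and non-clients
```

Under Rule #2 the sending client appears in the result as well, matching the statement that it receives a copy and then discards it as the originator.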

Examine the two RR design scenarios shown in Figure 11-9. The topology in (a) has R1 acting as the RR, whereas R2, R3, and R4 are the RR clients. The topology shown in (b) has a similar setup to that of (a), with the difference that the RR clients are fully meshed with each other.

Figure 11-9 Topology

The RR and the client peers form a cluster and are not required to be fully meshed. Because the topology in (b) has an RR along with fully meshed IBGP client peers, which defeats the purpose of having an RR, the client-to-client reflection behavior should be disabled on the RR. On NX-OS, this is done with the command no client-to-client reflection under the address-family configuration. This command is required only on the RR and not on the RR clients. Example 11-33 displays the configuration for disabling BGP client-to-client reflection.

Example 11-33 Disabling BGP Client-to-Client Reflection

NX-1(config)# router bgp 65000
NX-1(config-router)# address-family ipv4 unicast
NX-1(config-router-af)# no client-to-client reflection

Loop Prevention in Route Reflectors

Removing the full mesh requirements in an iBGP topology introduces the potential for routing loops. When RFC 1966 was drafted, two other BGP route reflector specific attributes were added to prevent loops.

ORIGINATOR_ID

This optional nontransitive BGP attribute is created by the first route-reflector, which sets its value to the RID of the router that injected/advertised the route into the AS. If the ORIGINATOR_ID is already populated on an NLRI, it should not be overwritten.

If a router receives an NLRI with its RID in the Originator attribute, the NLRI is discarded.

CLUSTER_LIST

This nontransitive BGP attribute is updated by the route-reflector, which prepends its cluster-id to the list rather than overwriting the existing entries. By default, the cluster-id is the BGP identifier. It can be set explicitly with the BGP configuration command cluster-id.

If a route reflector receives an NLRI with its cluster-id in the Cluster List attribute, the NLRI is discarded.
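The two loop-prevention checks can be sketched in Python as follows. This is a simplified model and not the NX-OS implementation. The Path class and the reflect and accept_path helpers are hypothetical names, and the addresses are borrowed from the chapter topology for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Path:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


def reflect(path: Path, advertiser_rid: str, cluster_id: str) -> Path:
    """Return the copy of a path that a route reflector would send on."""
    # ORIGINATOR_ID is set only by the first RR and is never overwritten.
    originator = path.originator_id or advertiser_rid
    # The RR's cluster-id goes to the front, so the most recent RR is first.
    return Path(path.prefix, originator, [cluster_id] + path.cluster_list)


def accept_path(path: Path, local_rid: str,
                local_cluster_id: Optional[str] = None) -> bool:
    """Apply the ORIGINATOR_ID check and, on an RR, the CLUSTER_LIST check."""
    if path.originator_id == local_rid:
        return False  # this router originated the route: discard
    if local_cluster_id is not None and local_cluster_id in path.cluster_list:
        return False  # this RR's cluster has already reflected the route
    return True


# NX-1 (RID and cluster-id 192.168.1.1) reflects a path learned from NX-2.
reflected = reflect(Path("192.168.5.5/32"), "192.168.2.2", "192.168.1.1")
print(reflected)
print("NX-4 accepts:", accept_path(reflected, "192.168.4.4"))
print("NX-2 accepts:", accept_path(reflected, "192.168.2.2"))
```

Passing local_cluster_id only on route reflectors mirrors the text: the CLUSTER_LIST check is performed by route reflectors, whereas the ORIGINATOR_ID check applies to any router.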

Example 11-34 provides a sample prefix output from a route that was reflected by the route reflector NX-1, as shown in Figure 11-9. Notice that the originator ID is the advertising router and that the cluster list contains the route-reflector ID. The cluster list records the route-reflectors that the prefix traversed, starting with the last route-reflector that advertised the route.

Example 11-34 Output of an RR Reflected Prefix

NX-4# show bgp ipv4 unicast 192.168.5.5/32
BGP routing table information for VRF default, address family IPv4 Unicast
BGP routing table entry for 192.168.5.5/32, version 52
Paths: (1 available, best #1)
Flags: (0x08001a) on xmit-list, is in urib, is best urib route, is in HW,

  Advertised path-id 1
  Path type: internal, path is valid, is best path
  AS-Path: 65001 , path sourced external to AS
   192.168.2.2 (metric 81) from 192.168.1.1 (192.168.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Originator: 192.168.2.2 Cluster list: 192.168.1.1

  Path-id 1 not advertised to any peer

If a topology contains more than one RR and the RRs are configured with different cluster IDs, the second RR holds the path from the first RR and hence consumes more memory and CPU resources. Both the single cluster-id design and the multiple cluster-id design have disadvantages:

  • Different cluster-id: Additional memory and CPU overhead on RR

  • Same cluster-id: Fewer redundant paths

If the RR clients are fully meshed within the cluster, the no client-to-client reflection command can be configured on the RR.

Maximum Prefixes

By default, a BGP peer holds all the routes advertised by the peering router. The number of routes can be reduced by filtering either inbound on the local router or outbound on the peering router, but there can still be instances where the number of routes is more than a router needs or can handle.

NX-OS supports the BGP maximum-prefix feature, which allows you to limit the number of prefixes on a per-peer basis. Generally, this feature is enabled for EBGP sessions, but it is also used for IBGP sessions. Although this feature helps scale the network and protect it from an excessive number of routes, it is very important to understand when to use it. Before enabling the BGP maximum-prefix feature, determine the following:

  • How many BGP routes are anticipated from the peer.

  • What action needs to be taken if the number of routes is exceeded. Should the BGP connection be reset, or should a warning message be logged?

To limit the number of prefixes, use the command maximum-prefix maximum [threshold] [restart restart-interval | warning-only] for each neighbor. Table 11-6 describes each of the fields in the command.

Table 11-6 BGP maximum-prefix Command Options

Option                     Description
maximum                    Defines the maximum prefix limit.
threshold                  Defines the threshold percentage at which a warning is generated.
restart restart-interval   Default behavior. Resets the BGP connection after the specified prefix limit is exceeded. The restart-interval is configured in minutes. BGP tries to reestablish the peering after the specified time interval is passed. When the restart option is set, a cease notification is sent to the neighbor, and the BGP connection is terminated.
warning-only               Only gives a warning message when the specified limit is exceeded.

An important point to remember is that when the restart option is configured with the maximum-prefix command, there are only two ways to reestablish the BGP connection: wait for the restart-interval timer to expire, or perform a manual reset of the peer using the clear bgp afi safi ip-address command.
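The interaction between the limit, the threshold, and the warning-only option can be sketched as a small decision function. This Python model is a simplification and not the NX-OS logic. The function name, the returned labels, the 75 percent default threshold, and the exact boundary comparisons are all assumptions made for illustration.

```python
def evaluate_prefix_count(received: int, maximum: int,
                          threshold_pct: int = 75,
                          warning_only: bool = False) -> str:
    """Classify a peer's received prefix count against maximum-prefix."""
    if received > maximum:
        # Limit exceeded: either only log, or tear the session down.
        return "warn-limit" if warning_only else "shutdown"
    # Integer arithmetic avoids floating-point rounding at the boundary.
    if received * 100 >= maximum * threshold_pct:
        return "warn-threshold"
    return "ok"


# A limit of 10 prefixes, with counts chosen for illustration.
for count in (5, 8, 11):
    print(count, evaluate_prefix_count(count, maximum=10))
print(11, evaluate_prefix_count(11, maximum=10, warning_only=True))
```

In the "shutdown" case, the session stays down until the restart-interval expires or the peer is cleared manually, as described above.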

Example 11-35 illustrates the use of the maximum-prefix command. NX-2 is receiving more than 10 prefixes from neighbor 10.25.1.5, but NX-2 has set the maximum-prefix limit to 10 prefixes. In such an instance, the BGP peering is shut on the device where maximum-prefix is set (the Shut (PfxCt) state on NX-2), and the remote end peer remains in Idle state. While troubleshooting BGP peering issues, check the output of the show bgp afi safi neighbors ip-address command to verify the reason for the last reset.

Example 11-35 Maximum-Prefixes

NX-2(config)# router bgp 65000
NX-2(config-router)# neighbor 10.25.1.5
NX-2(config-router-neighbor)# address-family ipv4 unicast
NX-2(config-router-neighbor-af)# maximum-prefix 10
NX-2# show bgp ipv4 unicast summary
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 192.168.2.2, local AS number 65000
BGP table version is 257, IPv4 Unicast config peers 2, capable peers 1
106 network entries and 108 paths using 15424 bytes of memory
BGP attribute entries [16/2304], BGP AS path entries [11/360]
BGP community entries [0/0], BGP clusterlist entries [2/8]

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.25.1.5       4 65001    7349    7354        0    0    0 00:00:08 Shut (PfxCt)
192.168.1.1     4 65000    7781    7778      257    0    0    3d05h 5  
NX-5# show bgp ipv4 unicast neighbors 10.25.1.2
BGP neighbor is 10.25.1.2,  remote AS 65000, ebgp link, Peer index 1
  BGP version 4, remote router ID 0.0.0.0
  BGP state = Idle, down for 00:32:34, retry in 00:00:01
  Last read never, hold time = 180, keepalive interval is 60 seconds
  Last written never, keepalive timer not running
  Received 7354 messages, 1 notifications, 0 bytes in queue
  Sent 7379 messages, 0 notifications, 0 bytes in queue
  Connections established 1, dropped 1
  Connection attempts 28
  Last reset by us 00:01:27, due to session closed
  Last reset by peer 00:32:34, due to maximum prefix count error
 Message statistics:
                              Sent               Rcvd
  Opens:                        31                  1  
  Notifications:                 0                  1  
  Updates:                      13                 20  
  Keepalives:                 7333               7330  
  Route Refresh:                 0                  0  
  Capability:                    2                  2  
  Total:                      7379               7354  
  Total bytes:              140687             140306  
  Bytes in queue:                0                  0
! Output omitted for brevity

BGP Max AS

Various attributes are, by default, assigned to every BGP prefix. The attributes attached to a single prefix can grow up to 64 KB in size, which can cause scaling as well as convergence issues for BGP.

Often, the as-path prepend option is used to lengthen the AS-PATH list of one path so that another path with a shorter AS-PATH list is preferred. This operation by itself does not have much of an impact. But from the perspective of the Internet, a longer AS-PATH list can not only cause convergence issues but also open security loopholes. The AS-PATH list actually signifies a router’s position on the Internet.

To limit the maximum AS-PATH length accepted in the network, the maxas-limit command was introduced. When the command maxas-limit 1-512 is configured in NX-OS, any route with an AS-PATH length higher than the specified number is discarded.
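The effect of the limit can be sketched as a simple length check. The Python below is illustrative only. The function accept_as_path is a hypothetical helper, and it assumes that the length is the plain count of AS numbers in the path, so it does not model how AS_SET or confederation segments are counted.

```python
from typing import List


def accept_as_path(as_path: List[int], maxas_limit: int) -> bool:
    """Return True if the AS-PATH length is within the configured limit."""
    if not 1 <= maxas_limit <= 512:
        raise ValueError("maxas-limit must be between 1 and 512")
    # Routes whose AS-PATH is longer than the limit are discarded.
    return len(as_path) <= maxas_limit


normal_path = [65001, 100, 220]
prepended_path = [65001] * 6 + [100]  # heavily prepended, for illustration

print(accept_as_path(normal_path, 5))
print(accept_as_path(prepended_path, 5))
```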

BGP Route Filtering and Route Policies

In addition to being scalable, BGP provides route filtering, traffic engineering, and traffic load-sharing capabilities. BGP provides all these functions through route policies and route filters. Route filtering is defined using three methods:

  • Prefix-lists

  • Filter-lists

  • Route-maps

BGP route-maps are more flexible than prefix-lists and filter-lists because they not only perform route filtering but also allow network operators to define policies and set attributes that can be further used to control traffic flow within the network. All these route filtering and route policy methods are discussed in the following sections.

Example 11-36 displays the BGP table of Nexus switch NX-2 in the topology shown in Figure 11-9. The NX-2 switch is used as the base to demonstrate all the filtering techniques shown further in this chapter.

Example 11-36 BGP Table on NX-2

NX-2# show bgp ipv4 unicast | b Network
   Network            Next Hop      Metric  LocPrf  Weight Path
*>e100.1.1.0/24       10.25.1.5                          0 65001 100 {220} e
*>e100.1.2.0/24       10.25.1.5                          0 65001 100 {220} e
*>e100.1.3.0/24       10.25.1.5                          0 65001 100 {220} e
*>e100.1.4.0/24       10.25.1.5                          0 65001 100 {220} e
*>e100.1.5.0/24       10.25.1.5                          0 65001 100 {220} e
*>e100.1.6.0/24       10.25.1.5                          0 65001 100 {220} e
*>e100.1.7.0/24       10.25.1.5                          0 65001 100 {220} e
*>e100.1.8.0/24       10.25.1.5                          0 65001 100 {220} e
*>e100.1.9.0/24       10.25.1.5                          0 65001 100 {220} e
*>e100.1.10.0/24      10.25.1.5                          0 65001 100 {220} e
*>e100.1.11.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.12.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.13.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.14.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.15.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.16.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.17.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.18.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.19.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.20.0/24      10.25.1.5                          0 65001 100 292 {218 230}
*>e100.1.21.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.22.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.23.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.24.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.25.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.26.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.27.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.28.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.29.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.30.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>i192.168.1.1/32     192.168.1.1           100          0 i
*>l192.168.2.2/32     0.0.0.0               100      32768 i
*>i192.168.3.3/32     192.168.3.3           100          0 i
*>i192.168.4.4/32     192.168.4.4           100          0 i
*>e192.168.5.5/32     10.25.1.5                          0 65001 i
*>e192.168.6.6/32     10.25.1.5                          0 65001 i
*>i192.168.44.0/24    192.168.4.4           100          0 i

Prefix-List-Based Filtering

As explained in Chapter 10, “Troubleshooting Nexus Route-Maps,” prefix lists provide another method of identifying networks in a routing protocol. They identify a specific IP address, network, or network range, and allow for the selection of multiple networks with a variety of prefix lengths (subnet masks) by using a prefix match specification.

The prefix-list can be applied directly to a BGP peer and also as a match statement within a route-map. A prefix-list is configured using the command ip prefix-list name [seq sequence-number] [permit ip-address/length | deny ip-address/length] [le length | ge length | eq length]. Examine the same topology as shown in Figure 11-6. Example 11-37 illustrates the configuration of BGP inbound and outbound route filtering using prefix-lists on NX-2. The inbound prefix-list permits five networks, whereas the outbound prefix-list permits host network entries, that is, /32 prefixes within the 192.168.0.0/16 range. When the prefix-lists are configured, use the command show bgp afi safi neighbors ip-address to ensure that the prefix-lists have been attached to the neighbor.
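The matching logic of a prefix-list can be approximated in Python with the standard ipaddress module. This is a sketch of the commonly documented semantics (first match wins, implicit deny at the end, optional ge, le, and eq length bounds) and not the RPM implementation. PrefixListEntry and evaluate are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefixListEntry:
    action: str                 # "permit" or "deny"
    prefix: str                 # for example "192.168.0.0/16"
    ge: Optional[int] = None
    le: Optional[int] = None
    eq: Optional[int] = None

    def matches(self, candidate: str) -> bool:
        base = ipaddress.ip_network(self.prefix)
        cand = ipaddress.ip_network(candidate)
        if self.eq is not None:
            low = high = self.eq
        elif self.ge is None and self.le is None:
            low = high = base.prefixlen          # exact-length match
        else:
            low = self.ge if self.ge is not None else base.prefixlen
            high = self.le if self.le is not None else base.max_prefixlen
        # The candidate must fall inside the entry's prefix and have a
        # prefix length inside the allowed range.
        return low <= cand.prefixlen <= high and cand.subnet_of(base)


def evaluate(prefix_list: List[PrefixListEntry], candidate: str) -> str:
    """Return the action of the first matching entry, else implicit deny."""
    for entry in prefix_list:
        if entry.matches(candidate):
            return entry.action
    return "deny"


# Entries modeled on a five-network inbound list and a /32-only outbound list.
inbound = [PrefixListEntry("permit", f"100.1.{i}.0/24") for i in range(1, 6)]
outbound = [PrefixListEntry("permit", "192.168.0.0/16", eq=32)]

for route in ("100.1.5.0/24", "100.1.30.0/24"):
    print("Inbound ", route, evaluate(inbound, route))
for route in ("192.168.4.4/32", "192.168.44.0/24"):
    print("Outbound", route, evaluate(outbound, route))
```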

Example 11-37 Prefix-List-Based Route Filtering

NX-2(config)# ip prefix-list Inbound permit 100.1.1.0/24
NX-2(config)# ip prefix-list Inbound permit 100.1.2.0/24
NX-2(config)# ip prefix-list Inbound permit 100.1.3.0/24
NX-2(config)# ip prefix-list Inbound permit 100.1.4.0/24
NX-2(config)# ip prefix-list Inbound permit 100.1.5.0/24
NX-2(config)#
NX-2(config)# ip prefix-list Outbound permit 192.168.0.0/16 eq 32
NX-2(config)# router bgp 65000
NX-2(config-router)# neighbor 10.25.1.5
NX-2(config-router-neighbor)# address-family ipv4 unicast
NX-2(config-router-neighbor-af)# prefix-list Inbound in
NX-2(config-router-neighbor-af)# prefix-list Outbound out
NX-2(config-router-neighbor-af)# end
NX-2# show bgp ipv4 unicast neighbors 10.25.1.5
BGP neighbor is 10.25.1.5,  remote AS 65001, ebgp link, Peer index 2
  BGP version 4, remote router ID 192.168.5.5
  BGP state = Established, up for 2d00h
! output omitted for brevity
  For address family: IPv4 Unicast
  BGP table version 1085, neighbor version 1085
  5 accepted paths consume 400 bytes of memory
  4 sent paths
  Inbound ip prefix-list configured is Inbound, handle obtained
  Outbound ip prefix-list configured is Outbound, handle obtained
  Last End-of-RIB received 1d23h after session start

  Local host: 10.25.1.2, Local port: 58236
  Foreign host: 10.25.1.5, Foreign port: 179
  fd = 74

Example 11-38 displays the output of the BGP table after the prefix-lists have been configured and attached to BGP neighbor 10.25.1.5. Notice that on the NX-2 switch, only 5 prefixes are seen from neighbor 10.25.1.5. On NX-5, the loopback addresses of all the nodes in AS 65000 are received, but 192.168.44.0/24 is not, because it is not a /32 prefix.

Example 11-38 BGP Table Output After Prefix-List Configuration

NX-2# show bgp ipv4 unicast
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 1085, local router ID is 192.168.2.2
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
*>e100.1.1.0/24       10.25.1.5                         0 65001 100 220
*>e100.1.2.0/24       10.25.1.5                         0 65001 100 220
*>e100.1.3.0/24       10.25.1.5                         0 65001 100 220
*>e100.1.4.0/24       10.25.1.5                         0 65001 100 220
*>e100.1.5.0/24       10.25.1.5                         0 65001 100 220
*>i192.168.1.1/32     192.168.1.1                       100          0 i
*>l192.168.2.2/32     0.0.0.0                           100      32768 i
*>i192.168.3.3/32     192.168.3.3                       100          0 i
*>i192.168.4.4/32     192.168.4.4                       100          0 i
NX-5# show bgp ipv4 unicast neighbors 10.25.1.2 routes
Peer 10.25.1.2 routes for address family IPv4 Unicast:
BGP table version is 1209, local router ID is 192.168.5.5
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist,
  I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
*>e192.168.1.1/32     10.25.1.2                         0 65000 i
*>e192.168.2.2/32     10.25.1.2                         0 65000 i
*>e192.168.3.3/32     10.25.1.2                         0 65000 i
*>e192.168.4.4/32     10.25.1.2                         0 65000 i

In the inbound direction on NX-2, use the command show bgp event-history detail to view the details of the prefixes being matched against the prefix-list Inbound. Based on the match, the prefixes are either permitted or denied. If no entry exists for the prefix in the prefix-list, it is dropped by BGP and does not become part of the BGP table. Example 11-39 displays the event-history detail output, which shows the prefix 100.1.30.0/24 being rejected by the BGP prefix-list and the prefix 100.1.5.0/24 being permitted.

Example 11-39 BGP Event-History for Inbound Prefixes

! Event-History output for incoming prefixes
NX-2# show bgp event-history detail
14:54:41.278141: (default) UPD: [IPv4 Unicast] 10.25.1.5 processing EOR update from peer
14:54:41.278138: (default) UPD: 10.25.1.5 parsed UPDATE message from peer, len 29 , withdraw len 0, attr len 6, nlri len 0
14:54:41.278135: (default) UPD: Received UPDATE message from 10.25.1.5
14:54:41.278131: (default) UPD: [IPv4 Unicast] Dropping prefix 100.1.30.0/24 from peer 10.25.1.5, due to prefix policy rejected
14:54:41.278129: (default) UPD: [IPv4 Unicast] Prefix 100.1.30.0/24 from peer 10.25.1.5 rejected by inbound policy
14:54:41.278126: (default) UPD: [IPv4 Unicast] 10.25.1.5 Inbound ip prefix-list Inbound, action deny
14:54:41.278124: (default) UPD: [IPv4 Unicast] Received prefix 100.1.30.0/24 from peer 10.25.1.5, origin 1, next hop 10.25.1.5, localpref 0, med 0
14:54:41.278119: (default) UPD: [IPv4 Unicast] Dropping prefix 100.1.29.0/24 from peer 10.25.1.5, due to prefix policy rejected
14:54:41.278116: (default) UPD: [IPv4 Unicast] Prefix 100.1.29.0/24 from peer 10.25.1.5 rejected by inbound policy

! output omitted for brevity

14:54:41.277740: (default) BRIB: [IPv4 Unicast] (100.1.5.0/24 (10.25.1.5)): returning from bgp_brib_add, new_path: 0, change: 0, undelete: 0, history: 0, force: 0, (pflags=0x28), reeval=0
14:54:41.277737: (default) BRIB: [IPv4 Unicast] 100.1.5.0/24 from 10.25.1.5 was already in BRIB with same attributes
14:54:41.277734: (default) BRIB: [IPv4 Unicast] (100.1.5.0/24 (10.25.1.5)): bgp_brib_add: handling nexthop
14:54:41.277731: (default) BRIB: [IPv4 Unicast] Path to 100.1.5.0/24 via 192.168.5.5 already exists, dflags=0x8001a
14:54:41.277728: (default) BRIB: [IPv4 Unicast] Installing prefix 100.1.5.0/24 (10.25.1.5) via 10.25.1.5  into BRIB with extcomm
14:54:41.277723: (default) UPD: [IPv4 Unicast] 10.25.1.5 Inbound ip prefix-list Inbound, action permit
14:54:41.277720: (default) UPD: [IPv4 Unicast] Received prefix 100.1.5.0/24 from peer 10.25.1.5, origin 1, next hop 10.25.1.5, localpref 0 , med 0

For the outbound direction, the show bgp event-history detail command output displays the prefixes in the BGP table being permitted and denied based on the matching entries in the outbound prefix-list named Outbound. After the filtering is performed, the prefixes are then advertised to the BGP peer along with relevant attributes, as shown in Example 11-40.

Example 11-40 BGP Event-History for Outbound Prefixes

NX-2# show bgp event-history detail
BGP event-history detail

17:53:22.110665: (default) UPD: [IPv4 Unicast] 10.25.1.5 192.168.44.0/24 path-id 1 not sent to peer due to: outbound policy
17:53:22.110659: (default) UPD: [IPv4 Unicast] 10.25.1.5 Outbound ip prefix-list Outbound, action deny
17:53:22.110649: (default) UPD: [IPv4 Unicast] 10.25.1.5 Created UPD msg (len 54) with prefix 192.168.4.4/32 ( Installed in HW) path-id 1 for peer
17:53:22.110643: (default) UPD: 10.25.1.5 Sending nexthop address 10.25.1.2 length 4
17:53:22.110638: (default) UPD: 10.25.1.5 Sending attr code 14, length 14, Mp-reach
17:53:22.110631: (default) UPD: 10.25.1.5 Sending attr code 2, length 6, AS-Path: <65000 >
17:53:22.110624: (default) UPD: 10.25.1.5 Sending attr code 1, length 1, Origin: IGP
17:53:22.110614: (default) UPD: [IPv4 Unicast] 10.25.1.5 Outbound ip prefix-list Outbound, action permit
17:53:22.110605: (default) UPD: [IPv4 Unicast] consider sending 192.168.4.4/32 to peer 10.25.1.5, path-id 1, best-ext is off

NX-OS also provides CLI to verify policy-based statistics for prefix-lists. The statistics are maintained for the policy applied in both inbound and outbound directions and show the number of prefixes permitted and denied in either direction. Use the command show bgp afi safi policy statistics neighbor ip-address prefix-list [in | out] to view the policy statistics for a prefix-list applied on a BGP neighbor. The counters increment every time the BGP neighbor flaps or a soft clear is performed on the neighbor. Example 11-41 demonstrates the policy statistics command for BGP peer 10.25.1.5 in both directions. In this example, a soft clear is performed in the outbound direction, after which the outbound prefix-list counters increment by 4 for permitted prefixes (17 to 21) and by 1 for dropped prefixes (3 to 4).

Example 11-41 BGP Policy Statistics for Prefix-List

NX-2# show bgp ipv4 unicast policy statistics neighbor 10.25.1.5 prefix-list in
Total count for neighbor rpm handles: 1

C: No. of comparisons, M: No. of matches

ip prefix-list Inbound seq 5 permit 100.1.1.0/24             M: 3
ip prefix-list Inbound seq 10 permit 100.1.2.0/24            M: 3
ip prefix-list Inbound seq 15 permit 100.1.3.0/24            M: 3
ip prefix-list Inbound seq 20 permit 100.1.4.0/24            M: 3
ip prefix-list Inbound seq 25 permit 100.1.5.0/24            M: 3

Total accept count for policy: 15
Total reject count for policy: 81
NX-2# show bgp ipv4 unicast policy statistics neighbor 10.25.1.5 prefix-list out
Total count for neighbor rpm handles: 1

C: No. of comparisons, M: No. of matches

ip prefix-list Outbound seq 5 permit 192.168.0.0/16 eq 32    M: 17

Total accept count for policy: 17
Total reject count for policy: 3
! Perform soft clear out on neighbor 10.25.1.5
NX-2# clear bgp ipv4 unicast 10.25.1.5 soft out
NX-2# show bgp ipv4 unicast policy statistics neighbor 10.25.1.5 prefix-list out
Total count for neighbor rpm handles: 1

C: No. of comparisons, M: No. of matches

ip prefix-list Outbound seq 5 permit 192.168.0.0/16 eq 32    M: 21

Total accept count for policy: 21
Total reject count for policy: 4

If a problem is noticed with an inbound or outbound prefix-list on a BGP neighbor, verify the route policy manager (RPM) NX-OS component. The first step is to check whether the prefix-list is attached to the BGP process. This is verified using the command show system internal rpm ip-prefix-list, which displays the name of the prefix-list, its client information, and the number of entries present in the prefix-list.

After verifying the prefix-list and its clients, use the command show system internal rpm event-history rsw to ensure the correct prefix-list has been bound to the BGP process. An incorrect binding or a missing binding event-history log can indicate that the prefix-list is not properly associated with the BGP process or the BGP neighbor.

Example 11-42 shows the output of both the preceding commands.

Example 11-42 RPM Client Info and Event-History for Prefix-Lists

NX-2# show system internal rpm ip-prefix-list
Policy name: Inbound           Type: ip prefix-list
Version: 6                     State: Ready
Ref. count: 1                  PBR refcount: 0
Stmt count: 5                  Last stmt seq: 25
Set nhop cmd count: 0          Set vrf cmd count: 0
Set intf cmd count: 0          Flags: 0x00000003
PPF nodeid: 0x00000000         Config refcount: 0
PBR Stats: No
Clients:
    bgp-65000 (Route filtering/redistribution)    ACN version: 0

Policy name: Outbound          Type: ip prefix-list
Version: 2                     State: Ready
Ref. count: 1                  PBR refcount: 0
Stmt count: 1                  Last stmt seq: 5
Set nhop cmd count: 0          Set vrf cmd count: 0
Set intf cmd count: 0          Flags: 0x00000003
PPF nodeid: 0x00000000         Config refcount: 0
PBR Stats: No
Clients:
    bgp-65000 (Route filtering/redistribution)    ACN version: 0
NX-2# show system internal rpm event-history rsw

Routing software interaction logs of RPM
1) Event:E_DEBUG, length:81, at 104214 usecs after Sun Sep 17 06:01:47 2017
    [120] [5736]: Bind ack sent - client bgp-65000 uuid 0x0000011b for policy Outbound
2) Event:E_DEBUG, length:76, at 104179 usecs after Sun Sep 17 06:01:47 2017
    [120] [5736]: Bind request - client bgp-65000 uuid 0x0000011b policy Outbound
3) Event:E_DEBUG, length:80, at 169619 usecs after Sun Sep 17 06:01:42 2017
    [120] [5736]: Bind ack sent - client bgp-65000 uuid 0x0000011b for policy Inbound
4) Event:E_DEBUG, length:75, at 169469 usecs after Sun Sep 17 06:01:42 2017
    [120] [5736]: Bind request - client bgp-65000 uuid 0x0000011b policy Inbound

Filter-Lists

BGP filter-lists allow for filtering of prefixes based on AS-Path lists. A BGP filter-list can be applied in both inbound and outbound directions. A BGP filter-list is configured using the command filter-list as-path-list-name [in | out] under the neighbor address-family configuration mode. Example 11-43 illustrates a sample filter-list configuration on the NX-2 switch in the topology referenced in Figure 11-6. In this example, an inbound filter-list is configured to allow the prefixes that have AS 274 in the AS_PATH list. The second output of the example shows that the filter-list is applied in the inbound direction.
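The idea behind an AS-path based filter can be sketched in Python with the re module. This is an approximation only: Python regular expressions are not identical to the NX-OS regex engine, and the function as_path_action is a hypothetical helper. The sketch assumes first-match-wins evaluation with an implicit deny, and it treats the AS path as the string displayed in the BGP table.

```python
import re
from typing import List, Tuple

# Each ACL entry is (action, regular expression), evaluated in order.
AsPathAcl = List[Tuple[str, str]]


def as_path_action(as_path: str, acl: AsPathAcl) -> str:
    """Return the action of the first entry whose regex matches the path."""
    for action, pattern in acl:
        if re.search(pattern, as_path):
            return action
    return "deny"  # implicit deny when nothing matches


allow_274: AsPathAcl = [("permit", "274")]

# AS paths modeled on the three groups of prefixes in the chapter's BGP table.
as_paths = [
    "65001 100 {220}",
    "65001 100 292 {218 230}",
    "65001 100 228 274 {300 243}",
]
for path in as_paths:
    print(f"{path!r}: {as_path_action(path, allow_274)}")
```

Because the pattern 274 is unanchored, it also matches any AS number that merely contains those digits, such as 2274. A stricter pattern is needed when only AS 274 itself should match.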

Example 11-43 BGP Filter-Lists

NX-2(config)# ip as-path access-list ALLOW_274 permit 274
NX-2(config)# router bgp 65000
NX-2(config-router)# neighbor 10.25.1.5
NX-2(config-router-neighbor)# address-family ipv4 unicast
NX-2(config-router-neighbor-af)# filter-list ALLOW_274 in
NX-2(config-router-neighbor-af)# end
NX-2# show bgp ipv4 unicast neighbors 10.25.1.5
BGP neighbor is 10.25.1.5,  remote AS 65001, ebgp link, Peer index 2
  BGP version 4, remote router ID 192.168.5.5
  BGP state = Established, up for 2d00h
! output omitted for brevity
  For address family: IPv4 Unicast
  BGP table version 1085, neighbor version 1085
  5 accepted paths consume 400 bytes of memory
  4 sent paths
  Inbound as-path-list configured is ALLOW_274, handle obtained
  Outbound ip prefix-list configured is Outbound, handle obtained
  Last End-of-RIB received 1d23h after session start

  Local host: 10.25.1.2, Local port: 58236
  Foreign host: 10.25.1.5, Foreign port: 179
  fd = 74

Note

AS-Path access-list is discussed later in this chapter.

Example 11-44 displays the prefixes in the BGP table received from peer 10.25.1.5 after being filtered by the filter-list. Notice that all the prefixes shown in the BGP table have AS 274 in their AS_PATH list.

Example 11-44 BGP Table with Filter-List Applied

! Output after configuring filter-list
NX-2# show bgp ipv4 unicast neighbor 10.25.1.5 routes
! Output omitted for brevity
   Network            Next Hop      Metric  LocPrf  Weight Path
*>e100.1.21.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.22.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.23.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.24.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.25.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.26.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.27.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.28.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.29.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}
*>e100.1.30.0/24      10.25.1.5                          0 65001 100 228 274 {300 243}

Note

If a BGP peer is configured with the soft-reconfiguration inbound command, you can also use the command show bgp afi safi neighbor ip-address received-routes to view the received BGP prefixes.

The easiest way to verify which prefixes are being permitted and denied is to use the show bgp event-history detail command output. If the event-history detail command is not enabled under the router bgp configuration, you can enable debugs to verify the updates instead. The debug bgp updates command can be used to verify both the inbound and the outbound updates. Example 11-45 demonstrates the use of debug bgp updates to verify which prefixes are being permitted and which are being denied. The action of permit or deny is always based on the entries present in the AS-path list.

Example 11-45 debug bgp updates Output

NX-2# debug logfile bgp
NX-2# debug bgp updates
NX-2# clear bgp ipv4 unicast 10.25.1.5 soft in
NX-2# show debug logfile bgp
21:39:01.721587 bgp: 65000 [10743] (default) UPD: [IPv4 Unicast] 10.25.1.5 Inbound as-path-list ALLOW_274, action deny
21:39:01.721622 bgp: 65000 [10743] (default) UPD: [IPv4 Unicast] Received prefix 100.1.1.0/24 from peer 10.25.1.5, origin 1, next hop 10.25.1.5, localpref 0, med 0
21:39:01.721649 bgp: 65000 [10743] (default) UPD: [IPv4 Unicast] Dropping prefix 100.1.1.0/24 from peer 10.25.1.5, due to attribute policy rejected
21:39:01.721678 bgp: 65000 [10743] (default) UPD: [IPv4 Unicast] Received prefix 100.1.2.0/24 from peer 10.25.1.5, origin 1, next hop 10.25.1.5, localpref 0, med 0
21:39:01.721702 bgp: 65000 [10743] (default) UPD: [IPv4 Unicast] Dropping prefix 100.1.2.0/24 from peer 10.25.1.5, due to attribute policy rejected
! Output omitted for brevity
21:39:01.723538 bgp: 65000 [10743] (default) UPD: [IPv4 Unicast] 10.25.1.5 Inbound as-path-list ALLOW_274, action permit
21:39:01.723592 bgp: 65000 [10743] (default) UPD: [IPv4 Unicast] Received prefix 100.1.21.0/24 from peer 10.25.1.5, origin 1, next hop 10.25.1.5, localpref 0, med 0
21:39:01.723687 bgp: 65000 [10743] (default) UPD: [IPv4 Unicast] Received prefix 100.1.22.0/24 from peer 10.25.1.5, origin 1, next hop 10.25.1.5, localpref 0, med 0

Similar to policy statistics for prefix-lists, statistics are also available for filter-list entries. When executing the command show bgp afi safi policy statistics neighbor ip-address filter-list [in | out], notice the relevant AS-path access list referenced as part of the filter-list command and the number of matches for each entry. The output also displays the number of prefixes accepted and rejected by the filter-list, as displayed in Example 11-46.

Example 11-46 BGP Filter-Lists

NX-2# show bgp ipv4 unicast policy statistics neighbor 10.25.1.5 filter-list in
Total count for neighbor rpm handles: 1

C: No. of comparisons, M: No. of matches

ip as-path access-list ALLOW_274 permit "274"                C: 5      M: 1

Total accept count for policy: 1
Total reject count for policy: 4

Because the filter-list uses an AS-path access-list, RPM information can be verified for the as-path-access-list using the command show system internal rpm as-path-access-list as-path-acl-name. This command confirms whether the BGP process is listed as a client of the AS-path access-list. The command show system internal rpm event-history rsw shows the bind request and bind acknowledgment exchanged when the AS-path access-list was bound to the BGP process. Example 11-47 displays the output of both commands.

Example 11-47 BGP Filter-Lists

NX-2# show system internal rpm as-path-access-list ALLOW_274
Policy name: ALLOW_274         Type: as-path-list
Version: 2                     State: Ready
Ref. count: 1                  PBR refcount: 0
Stmt count: 1                  Last stmt seq: 1
Set nhop cmd count: 0          Set vrf cmd count: 0
Set intf cmd count: 0          Flags: 0x00000003
PPF nodeid: 0x00000000         Config refcount: 0
PBR Stats: No
Clients:
    bgp-65000 (Route filtering/redistribution)    ACN version: 0
! RPM Event-History
NX-2# show system internal rpm event-history rsw

Routing software interaction logs of RPM
1) Event:E_DEBUG, length:82, at 684846 usecs after Sun Sep 17 19:46:46 2017
    [120] [5736]: Bind ack sent - client bgp-65000 uuid 0x0000011b for policy ALLOW_274
2) Event:E_DEBUG, length:77, at 684797 usecs after Sun Sep 17 19:46:46 2017
    [120] [5736]: Bind request - client bgp-65000 uuid 0x0000011b policy ALLOW_274

BGP Route-Maps

BGP uses route-maps to provide route filtering capability and traffic engineering by setting various attributes to the prefixes that help control the inbound and outbound traffic. Route-maps typically use some form of conditional matching so that only certain prefixes are blocked or accepted. At the simplest level, route-maps can filter networks similar to an AS-Path filter/prefix-list, but also provide additional capability by adding or modifying a network attribute. Route-maps are referenced to a specific route-advertisement or BGP neighbor and require specifying the direction of the advertisement (inbound/outbound). Route-maps are a critical component of BGP because they allow for a unique routing policy on a neighbor-by-neighbor basis.

Example 11-48 illustrates a sample configuration of a multisequence route-map that is applied to a neighbor. In this example, route-map sequence 10 uses a prefix-list to match a certain set of prefixes, and sequence 20 matches on an AS-Path access-list. Note that there is no sequence 30. The absence of any further entry in the route-map acts as an implicit deny statement, so all prefixes not matched by sequence 10 or 20 are denied.

Example 11-48 BGP Route-Map Configuration

NX-2(config)# route-map Inbound-RM permit 10
NX-2(config-route-map)# match ip address prefix-list Inbound
NX-2(config-route-map)# set local-preference 200
NX-2(config-route-map)# exit
NX-2(config)# route-map Inbound-RM permit 20
NX-2(config-route-map)# match as-path ALLOW_274
NX-2(config-route-map)# set local-preference 300
NX-2(config-route-map)# exit
! The above referenced Prefix-list and AS-Path Access-list were shown in previous
! examples
NX-2(config)# router bgp 65000
NX-2(config-router)# neighbor 10.25.1.5
NX-2(config-router-neighbor)# address-family ipv4 unicast
NX-2(config-router-neighbor-af)# route-map Inbound-RM in
NX-2(config-router-neighbor-af)# end

Example 11-49 shows the BGP table after inbound route-map filtering. Notice that the prefixes 100.1.1.0/24 to 100.1.5.0/24 are set with the local preference of 200, whereas the prefixes that match AS 274 in the AS-path list are set with the local preference of 300. Because the route-map has no sequence 30, all the other prefixes are denied by the implicit deny at the end of the inbound route-map.

Example 11-49 BGP Table Output with Route-Map Filtering

NX-2# show bgp ipv4 unicast neighbor 10.25.1.5 routes
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 1141, local router ID is 192.168.2.2
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup
   Network            Next Hop      Metric  LocPrf  Weight Path
*>e100.1.1.0/24       10.25.1.5             200          0 65001 100 {220} e
*>e100.1.2.0/24       10.25.1.5             200          0 65001 100 {220} e
*>e100.1.3.0/24       10.25.1.5             200          0 65001 100 {220} e
*>e100.1.4.0/24       10.25.1.5             200          0 65001 100 {220} e
*>e100.1.5.0/24       10.25.1.5             200          0 65001 100 {220} e
*>e100.1.21.0/24      10.25.1.5             300          0 65001 100 228 274 {300 243}
*>e100.1.25.0/24      10.25.1.5             300          0 65001 100 228 274 {300 243}
*>e100.1.26.0/24      10.25.1.5             300          0 65001 100 228 274 {300 243}
*>e100.1.27.0/24      10.25.1.5             300          0 65001 100 228 274 {300 243}
*>e100.1.28.0/24      10.25.1.5             300          0 65001 100 228 274 {300 243}
*>e100.1.29.0/24      10.25.1.5             300          0 65001 100 228 274 {300 243}
*>e100.1.30.0/24      10.25.1.5             300          0 65001 100 228 274 {300 243}
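
The first-match behavior of the route-map, including the implicit deny, can be sketched in a few lines of Python. This is an illustrative model rather than how NX-OS implements route-maps. It assumes that prefix-list Inbound contains 100.1.1.0/24 through 100.1.5.0/24 (inferred from the table output) and that ALLOW_274 permits any AS-path containing 274. The AS-path used for the denied prefix 100.1.20.0/24 is hypothetical.

```python
import re

# Assumption: these five prefixes stand in for prefix-list "Inbound".
PREFIX_LIST_INBOUND = {"100.1.%d.0/24" % i for i in range(1, 6)}

# Assumption: AS-path access-list ALLOW_274 permits any path containing "274".
ALLOW_274 = re.compile(r"274")

# Each entry: (sequence, action, match function, local preference to set).
ROUTE_MAP_INBOUND_RM = [
    (10, "permit", lambda r: r["prefix"] in PREFIX_LIST_INBOUND, 200),
    (20, "permit", lambda r: ALLOW_274.search(r["as_path"]) is not None, 300),
]


def apply_route_map(route, route_map=ROUTE_MAP_INBOUND_RM):
    """Return the modified route, or None when the route is denied."""
    for seq, action, matches, local_pref in sorted(route_map, key=lambda e: e[0]):
        if matches(route):
            if action == "deny":
                return None
            # First matching permit sequence wins; later sequences are skipped.
            return dict(route, local_pref=local_pref, matched_seq=seq)
    return None  # no sequence matched: implicit deny


if __name__ == "__main__":
    samples = [
        {"prefix": "100.1.1.0/24", "as_path": "65001 100 {220}"},
        {"prefix": "100.1.21.0/24", "as_path": "65001 100 228 274 {300 243}"},
        # Hypothetical AS-path for a prefix that matches neither sequence.
        {"prefix": "100.1.20.0/24", "as_path": "65001 100 {220}"},
    ]
    for route in samples:
        print(route["prefix"], "->", apply_route_map(route))
```

The third sample falls through both sequences and is dropped, which corresponds to the missing sequence 30.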

The show bgp event-history detail command can be used again to verify which prefixes are being permitted or denied based on the route-map policy. Based on the underlying match statements, relevant set actions are taken (if any). Example 11-50 displays the event-history detail output demonstrating prefixes being permitted and denied by route-map.

Example 11-50 BGP Event-History

NX-2# show bgp event-history detail
04:36:32.954809: (default) BRIB: [IPv4 Unicast] Installing prefix 100.1.21.0/24
(10.25.1.5) via 10.25.1.5  into BRIB with extcomm
04:36:32.954796: (default) UPD: [IPv4 Unicast] 10.25.1.5 Inbound route-map Inbound-RM, action permit
04:36:32.954763: (default) UPD: [IPv4 Unicast] Received prefix 100.1.21.0/24 from
peer 10.25.1.5, origin 1, next hop 10.25.1.5, localpref 0, med 0

! Output omitted for brevity
04:36:32.954690: (default) UPD: [IPv4 Unicast] Dropping prefix 100.1.20.0/24 from
peer 10.25.1.5, due to prefix policy rejected
04:36:32.954684: (default) UPD: [IPv4 Unicast] Prefix 100.1.20.0/24 from peer
10.25.1.5 rejected by inbound policy
04:36:32.954679: (default) UPD: [IPv4 Unicast] 10.25.1.5 Inbound route-map Inbound-RM, action deny
04:36:32.954647: (default) UPD: [IPv4 Unicast] Received prefix 100.1.20.0/24 from
peer 10.25.1.5, origin 1, next hop 10.25.1.5, localpref 0, med 0

You can also validate the policy statistics for the route-map similar to prefix-list and filter-list. The command show bgp ipv4 unicast policy statistics neighbor ip-address route-map [in | out] displays the matching prefix-list or AS-path access-list or any other attributes under each route-map sequence and its matching statistics, as shown in Example 11-51.

Example 11-51 BGP Policy Statistics for Route-Map

NX-2# show bgp ipv4 unicast policy statistics neighbor 10.25.1.5 route-map in
Total count for neighbor rpm handles: 1

C: No. of comparisons, M: No. of matches

route-map Inbound-RM permit 10
  match ip address prefix-list Inbound                       C: 52     M: 5
route-map Inbound-RM permit 20
  match as-path ALLOW_274                                    C: 47     M: 30

Total accept count for policy: 35
Total reject count for policy: 17
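
The counters in Example 11-51 are consistent with first-match processing: 52 prefixes are compared at sequence 10 and 5 match, the remaining 47 are compared at sequence 20 and 30 match, and the 17 left over hit the implicit deny. The Python sketch below reproduces that arithmetic. It assumes every sequence is a permit sequence and that a prefix is compared once at each sequence it reaches. The function name policy_totals is hypothetical.

```python
def policy_totals(sequences):
    """Derive (accepted, rejected) from ordered (comparisons, matches) pairs.

    Assumes all sequences are permit sequences and that prefixes which do
    not match a sequence fall through to the next one.
    """
    accepted = 0
    remaining = None  # prefixes that fell through the previous sequence
    for comparisons, matches in sequences:
        if remaining is not None and comparisons != remaining:
            raise ValueError("counters do not fit first-match fall-through")
        accepted += matches
        remaining = comparisons - matches
    rejected = remaining if remaining is not None else 0
    return accepted, rejected


if __name__ == "__main__":
    # Counters from Example 11-51: sequence 10, then sequence 20.
    print(policy_totals([(52, 5), (47, 30)]))
    # Counters from Example 11-46: a single AS-path access-list entry.
    print(policy_totals([(5, 1)]))
```

If the comparison count of a sequence does not equal what fell through the previous one, the counters were probably collected at different times or cleared in between.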

Within route-maps, various conditional matching features are used, such as prefix-lists, regular expressions (regex), AS-Path access-list, BGP communities, and community-lists. When multiple filtering mechanisms are configured under the same neighbor, the following order of preference is used for both inbound and outbound filtering:

  • Inbound Filtering

    • Route-map

    • Filter-list

    • Prefix-list, distribute-list

  • Outbound Filtering

    • Filter-list

    • Route-map

    • Advertise-map (conditional advertisement)

    • Prefix-list, distribute-list

Network prefixes are conditionally matched by a variety of routing protocol attributes, but the following sections explain the most common techniques for conditionally matching a prefix.

Regular Expressions (RegEx)

There may be times when conditionally matching network prefixes is too complicated, and identifying all routes from a specific organization is preferred. In this case, routes are selected based on the BGP AS-Path.

To parse through the large number of available ASNs (4,294,967,295), regular expressions (regex) are used. Regular expressions are based upon query modifiers to select the appropriate content. On Nexus switches, the BGP table is parsed with regex using the command show bgp afi safi regexp “regex-pattern”.

Note

NX-OS devices require the regex-pattern to be placed within a pair of double-quotes “”.

Table 11-7 provides a brief list and description of the common regex query modifiers.

Table 11-7 RegEx Query Modifiers

Modifier                   Description
_ (Underscore)             Matches a space
^ (Caret)                  Indicates the start of the string
$ (Dollar Sign)            Indicates the end of the string
[] (Brackets)              Matches a single character or nesting within a range
- (Hyphen)                 Indicates a range of numbers in brackets
[^] (Caret in Brackets)    Excludes the characters listed in brackets
() (Parentheses)           Used for nesting of search patterns
| (Pipe)                   Provides or functionality to the query
. (Period)                 Matches a single character, including a space
* (Asterisk)               Matches zero or more characters or patterns
+ (Plus Sign)              One or more instances of the character or pattern
? (Question Mark)          Matches one or no instances of the character or pattern

Note

The .^$*+()[]? characters are special control characters. To match one of them literally, precede it with the backslash (\) escape character. For example, to match on the * in the output, use the \* syntax.

The following section provides a variety of common tasks to help demonstrate each of the regex modifiers. Example 11-52 provides a reference BGP table for displaying scenarios of each regex query modifier for querying the prefixes learned via Figure 11-10.

Figure 11-10 BGP Regex Reference Topology

Example 11-52 BGP Table for Regex Queries

NX-2# show bgp ipv4 unicast
! Output omitted for brevity
     Network          Next Hop      Metric LocPrf Weight Path
*>e172.16.0.0/24     172.32.23.3      0             0 300 80 90 21003 2100 i
*>e172.16.4.0/23     172.32.23.3      0             0 300 878 1190 1100 1010 i
*>e172.16.16.0/22    172.32.23.3      0             0 300 779 21234 45 i
*>e172.16.99.0/24    172.32.23.3      0             0 300 145 40 i
*>e172.16.129.0/24   172.32.23.3      0             0 300 10010 300 1010 40 50 i
*>e192.168.0.0/16    172.16.12.1      0             0 100 80 90 21003 2100 i
*>e192.168.4.0/23    172.16.12.1      0             0 100 878 1190 1100 1010 i
*>e192.168.16.0/22   172.16.12.1      0             0 100 779 21234 45 i
*>e192.168.99.0/24   172.16.12.1      0             0 100 145 40 i
*>e192.168.129.0/24  172.16.12.1      0             0 100 10010 300 1010 40 50 i

Note

The AS-Path for the prefix 172.16.129.0/24 has the AS 300 twice nonconsecutively for a specific purpose. This is not seen in real life, because it indicates a routing loop.

_ Underscore

Query Modifier Function: Matches a space

Scenario: Only display routes that passed through AS 100. The first assumption is that the syntax show bgp ipv4 unicast regexp “100”, as shown in Example 11-53, is ideal. However, the regex query also matches the following unwanted ASNs: 1100, 2100, 21003, and 10010.

Example 11-53 BGP Regex Query for AS 100

NX-2# show bgp ipv4 unicast regexp "100"
! Output omitted for brevity
     Network          Next Hop    Metric LocPrf Weight Path
*>e172.16.0.0/24     172.32.23.3      0             0 300 80 90 21003 2100 i
*>e172.16.4.0/23     172.32.23.3      0             0 300 878 1190 1100 1010 i
*>e172.16.129.0/24   172.32.23.3      0             0 300 10010 300 1010 40 50 i
*>e192.168.0.0/16    172.16.12.1      0             0 100 80 90 21003 2100 i
*>e192.168.4.0/23    172.16.12.1      0             0 100 878 1190 1100 1010 i
*>e192.168.16.0/22   172.16.12.1      0             0 100 779 21234 45 i
*>e192.168.99.0/24   172.16.12.1      0             0 100 145 40 i
*>e192.168.129.0/24  172.16.12.1      0             0 100 10010 300 1010 40 50 i

Example 11-54 uses the underscore (_) to imply a space to the left of the 100 to remove the unwanted ASNs. The regex query still includes the following unwanted ASN: 10010.

Example 11-54 BGP Regex Query for AS _100

NX-2# show bgp ipv4 unicast regexp "_100"
! Output omitted for brevity
     Network          Next Hop    Metric LocPrf Weight Path
*>e172.16.129.0/24   172.32.23.3      0             0 300 10010 300 1010 40 50 i
*>e192.168.0.0/16    172.16.12.1      0             0 100 80 90 21003 2100 i
*>e192.168.4.0/23    172.16.12.1      0             0 100 878 1190 1100 1010 i
*>e192.168.16.0/22   172.16.12.1      0             0 100 779 21234 45 i
*>e192.168.99.0/24   172.16.12.1      0             0 100 145 40 i
*>e192.168.129.0/24  172.16.12.1      0             0 100 10010 300 1010 40 50 i

Example 11-55 provides the final query by using the underscore (_) before and after the ASN (100) to finalize the query for the route that passes through AS 100.

Example 11-55 BGP Regex Query for AS _100_

NX-2# show bgp ipv4 unicast regexp "_100_"
! Output omitted for brevity
     Network          Next Hop    Metric LocPrf Weight Path
*>e192.168.0.0/16    172.16.12.1      0             0 100 80 90 21003 2100 i
*>e192.168.4.0/23    172.16.12.1      0             0 100 878 1190 1100 1010 i
*>e192.168.16.0/22   172.16.12.1      0             0 100 779 21234 45 i
*>e192.168.99.0/24   172.16.12.1      0             0 100 145 40 i
*>e192.168.129.0/24  172.16.12.1      0             0 100 10010 300 1010 40 50 i
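
The three queries above can be replayed offline against the AS-paths from Example 11-52. The Python sketch below is an approximation, not the NX-OS regex engine. Python has no underscore modifier, so the sketch assumes that the underscore can be emulated as the start of the string, the end of the string, or one of the delimiter characters (space, comma, braces, parentheses). The helper names as_path_regex and query are hypothetical.

```python
import re

# Assumption: "_" behaves like start of string, end of string, or one of
# the delimiter characters space , { } ( ).
UNDERSCORE = r"(?:^|$|[ ,{}()])"

# AS-paths copied from the reference BGP table in Example 11-52.
AS_PATHS = {
    "172.16.0.0/24": "300 80 90 21003 2100",
    "172.16.4.0/23": "300 878 1190 1100 1010",
    "172.16.16.0/22": "300 779 21234 45",
    "172.16.99.0/24": "300 145 40",
    "172.16.129.0/24": "300 10010 300 1010 40 50",
    "192.168.0.0/16": "100 80 90 21003 2100",
    "192.168.4.0/23": "100 878 1190 1100 1010",
    "192.168.16.0/22": "100 779 21234 45",
    "192.168.99.0/24": "100 145 40",
    "192.168.129.0/24": "100 10010 300 1010 40 50",
}


def as_path_regex(pattern):
    """Translate a router-style AS-path pattern into a Python regex."""
    return re.compile(pattern.replace("_", UNDERSCORE))


def query(pattern):
    """Return the prefixes whose AS-path matches the pattern."""
    rx = as_path_regex(pattern)
    return [prefix for prefix, path in AS_PATHS.items() if rx.search(path)]


if __name__ == "__main__":
    for pattern in ("100", "_100", "_100_"):
        print(pattern, "->", query(pattern))
```

Each added underscore removes false matches: the bare pattern also hits 1100, 2100, 21003, and 10010, the leading underscore leaves only 10010, and the trailing underscore removes that as well.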

^ Caret

Query Modifier Function: Indicates the start of the string.

Scenario: Only display routes that were advertised from AS 300. At first glance, the command show bgp ipv4 unicast regexp “_300_” might be acceptable for use, but in Example 11-56 the route 192.168.129.0/24 is also included.

Example 11-56 BGP Regex Query for AS 300

NX-2# show bgp ipv4 unicast regexp "_300_"
! Output omitted for brevity
     Network          Next Hop      Metric LocPrf Weight Path
*>e172.16.0.0/24     172.32.23.3        0             0 300 80 90 21003 2100 i
*>e172.16.4.0/23     172.32.23.3        0             0 300 878 1190 1100 1010 i
*>e172.16.16.0/22    172.32.23.3        0             0 300 779 21234 45 i
*>e172.16.99.0/24    172.32.23.3        0             0 300 145 40 i
*>e172.16.129.0/24   172.32.23.3        0             0 300 10010 300 1010 40 50 i
*>e192.168.129.0/24  172.16.12.1        0             0 100 10010 300 1010 40 50 i

Because AS 300 is directly connected, it is more efficient to ensure that AS 300 was the first AS listed. Example 11-57 shows the caret (^) in the regex pattern.

Example 11-57 BGP Regex Query with Caret

NX-2# show bgp ipv4 unicast regexp "^300_"
! Output omitted for brevity
     Network          Next Hop      Metric LocPrf Weight Path
*>e172.16.0.0/24    172.32.23.3        0             0 300 80 90 21003 2100 i
*>e172.16.4.0/23    172.32.23.3        0             0 300 878 1190 1100 1010 i
*>e172.16.16.0/22   172.32.23.3        0             0 300 779 21234 45 i
*>e172.16.99.0/24   172.32.23.3        0             0 300 145 40 i
*>e172.16.129.0/24  172.32.23.3        0             0 300 10010 300 1010 40 50 i

$ Dollar Sign

Query Modifier Function: Indicates the end of the string.

Scenario: Only display routes that originated in AS 40. In Example 11-58 the regex pattern “_40_” was used. Unfortunately, this also includes routes that originated in AS 50.

Example 11-58 BGP Regex Query with AS 40

NX-2# show bgp ipv4 unicast regexp "_40_"
! Output omitted for brevity
     Network          Next Hop    Metric  LocPrf Weight Path
*>e172.16.99.0/24   172.32.23.3      0             0 300 145 40 i
*>e172.16.129.0/24  172.32.23.3      0             0 300 10010 300 1010 40 50 i
*>e192.168.99.0     172.16.12.1      0             0 100 145 40 i
*>e192.168.129.0    172.16.12.1      0             0 100 10010 300 1010 40 50 i

Example 11-59 provides the solution using the dollar sign ($) with the regex pattern “_40$”.

Example 11-59 BGP Regex Query with Dollar Sign

NX-2# show bgp ipv4 unicast regexp "_40$"
! Output omitted for brevity
     Network          Next Hop    Metric  LocPrf Weight Path
*>e172.16.99.0/24   172.32.23.3      0             0 300 145 40 i
*>e192.168.99.0     172.16.12.1      0    100      0 100 145 40 i
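
The effect of the two anchors can be checked the same way. The Python sketch below is a hedged approximation that replays the caret and dollar sign queries against the AS-paths from Example 11-52. As before, the underscore translation and the helper name query are assumptions made for illustration, not NX-OS behavior guarantees.

```python
import re

# Assumption: "_" is emulated as start of string, end of string, or one of
# the delimiter characters space , { } ( ).
UNDERSCORE = r"(?:^|$|[ ,{}()])"

# AS-paths copied from the reference BGP table in Example 11-52.
AS_PATHS = {
    "172.16.0.0/24": "300 80 90 21003 2100",
    "172.16.4.0/23": "300 878 1190 1100 1010",
    "172.16.16.0/22": "300 779 21234 45",
    "172.16.99.0/24": "300 145 40",
    "172.16.129.0/24": "300 10010 300 1010 40 50",
    "192.168.0.0/16": "100 80 90 21003 2100",
    "192.168.4.0/23": "100 878 1190 1100 1010",
    "192.168.16.0/22": "100 779 21234 45",
    "192.168.99.0/24": "100 145 40",
    "192.168.129.0/24": "100 10010 300 1010 40 50",
}


def query(pattern):
    """Return the prefixes whose AS-path matches the router-style pattern."""
    rx = re.compile(pattern.replace("_", UNDERSCORE))
    return [prefix for prefix, path in AS_PATHS.items() if rx.search(path)]


if __name__ == "__main__":
    # ^ pins the match to the neighboring (first) AS in the path.
    print("_300_ ->", query("_300_"))
    print("^300_ ->", query("^300_"))
    # $ pins the match to the originating (last) AS in the path.
    print("_40_  ->", query("_40_"))
    print("_40$  ->", query("_40$"))
```

The caret selects on the first AS in the path, which is the directly connected neighbor AS, and the dollar sign selects on the last AS in the path, which is the originating AS.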

[ ] Brackets

Query Modifier Function: Matches a single character or nesting within a range.

Scenario: Only display routes with an AS that contains 11 or 14 in it. The regex filter “1[14]” can be used as shown in Example 11-60.

Example 11-60 BGP Regex Query with Brackets

NX-2# show bgp ipv4 unicast regexp "1[14]"
! Output omitted for brevity
     Network          Next Hop    Metric LocPrf Weight Path
*>e172.16.4.0/23    172.32.23.3      0             0 300 878 1190 1100 1010 i
*>e172.16.99.0/24   172.32.23.3      0             0 300 145 40 i
*>e192.168.4.0/23   172.16.12.1      0             0 100 878 1190 1100 1010 i
*>e192.168.99.0     172.16.12.1      0             0 100 145 40 i

- Hyphen

Query Modifier Function: Indicates a range of numbers in brackets.

Scenario: Only display routes with the last two digits of the AS of 40, 50, 60, 70, or 80. The regex query “[4-8]0_” and its output are shown in Example 11-61.

Example 11-61 BGP Regex Query with Hyphen

NX-2# show bgp ipv4 unicast regexp "[4-8]0_"
! Output omitted for brevity
     Network          Next Hop    Metric LocPrf Weight Path
*>e172.16.0.0/24    172.32.23.3      0             0 300 80 90 21003 2100 i
*>e172.16.99.0/24   172.32.23.3      0             0 300 145 40 i
*>e172.16.129.0/24  172.32.23.3      0             0 300 10010 300 1010 40 50 i
*>e192.168.0.0      172.16.12.1      0             0 100 80 90 21003 2100 i
*>e192.168.99.0     172.16.12.1      0             0 100 145 40 i
*>e192.168.129.0    172.16.12.1      0             0 100 10010 300 1010 40 50 i

[^] Caret in Brackets

Query Modifier Function: Excludes the character listed in brackets.

Scenario: Only display routes where the second AS from AS 100 or AS 300 does not start with 3, 4, 5, 6, 7, or 8. The first component of the regex query is to restrict the AS to the AS 100 or 300 with the regex query “^[13]00_”, and the second component is to filter out AS starting with 3-8 with the regex filter “_[^3-8]”. The complete regex query is “^[13]00_[^3-8]” as shown in Example 11-62.

Example 11-62 BGP Regex Query with Caret in Brackets

NX-2# show bgp ipv4 unicast regexp "^[13]00_[^3-8]"
! Output omitted for brevity
     Network          Next Hop    Metric LocPrf Weight Path
*>e172.16.99.0/24   172.32.23.3      0             0 300 145 40 i
*>e172.16.129.0/24  172.32.23.3      0             0 300 10010 300 1010 40 50 i
*>e192.168.99.0     172.16.12.1      0             0 100 145 40 i
*>e192.168.129.0    172.16.12.1      0             0 100 10010 300 1010 40 50 i

( ) Parentheses and | Pipe

Query Modifier Function: Nesting of search patterns and provides or functionality.

Scenario: Only display routes where the AS_PATH ends with AS 40 or AS 45. The regex filter “_4(5|0)$” is shown in Example 11-63.

Example 11-63 BGP Regex Query with Parentheses

NX-2# show bgp ipv4 unicast regexp "_4(5|0)$"
! Output omitted for brevity
     Network          Next Hop     Metric LocPrf Weight Path
*>e172.16.16.0/22   172.32.23.3       0             0 300 779 21234 45 i
*>e172.16.99.0/24   172.32.23.3       0             0 300 145 40 i
*>e192.168.16.0/22  172.16.12.1       0             0 100 779 21234 45 i
*>e192.168.99.0     172.16.12.1       0             0 100 145 40 i

. Period

Query Modifier Function: Matches a single character, including a space.

Scenario: Only display routes with an originating AS of 1–99. In Example 11-64, the regex query “_..$” requires a space, followed by exactly two characters of any kind (including other spaces) at the end of the AS_Path.

Example 11-64 BGP Regex Query with Period

NX-2# show bgp ipv4 unicast regexp "_..$"
! Output omitted for brevity
     Network          Next Hop    Metric LocPrf Weight Path
*>e172.16.16.0/22   172.32.23.3      0             0 300 779 21234 45 i
*>e172.16.99.0/24   172.32.23.3      0             0 300 145 40 i
*>e172.16.129.0/24  172.32.23.3      0             0 300 10010 300 1010 40 50 i
*>e192.168.16.0/22  172.16.12.1      0    100      0 100 779 21234 45 i
*>e192.168.99.0     172.16.12.1      0    100      0 100 145 40 i
*>e192.168.129.0    172.16.12.1      0    100      0 100 10010 300 1010 40 50 i

+ Plus Sign

Query Modifier Function: One or more instances of the character or pattern.

Scenario: Only display routes that contain one or more instances of ‘11’ in the AS path. The regex pattern is “(11)+” as shown in Example 11-65.

Example 11-65 BGP Regex Query with Plus Sign

NX-2# show bgp ipv4 unicast regexp "(11)+"
! Output omitted for brevity
     Network          Next Hop     Metric LocPrf Weight Path
*>e172.16.4.0/23     172.32.23.3      0             0 300 878 1190 1100 1010 i
*>e192.168.4.0/23   172.16.12.1       0             0 100 878 1190 1100 1010 i
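
The bracket, hyphen, negated-bracket, grouping, period, and plus sign patterns from the preceding scenarios can also be exercised offline. The Python sketch below is an approximation under the same assumption as before: the underscore is emulated as the start of the string, the end of the string, or a delimiter character. The AS-paths come from Example 11-52, and the helper name query is hypothetical.

```python
import re

# Assumption: "_" is emulated as start of string, end of string, or one of
# the delimiter characters space , { } ( ).
UNDERSCORE = r"(?:^|$|[ ,{}()])"

# AS-paths copied from the reference BGP table in Example 11-52.
AS_PATHS = {
    "172.16.0.0/24": "300 80 90 21003 2100",
    "172.16.4.0/23": "300 878 1190 1100 1010",
    "172.16.16.0/22": "300 779 21234 45",
    "172.16.99.0/24": "300 145 40",
    "172.16.129.0/24": "300 10010 300 1010 40 50",
    "192.168.0.0/16": "100 80 90 21003 2100",
    "192.168.4.0/23": "100 878 1190 1100 1010",
    "192.168.16.0/22": "100 779 21234 45",
    "192.168.99.0/24": "100 145 40",
    "192.168.129.0/24": "100 10010 300 1010 40 50",
}

# Patterns taken from the bracket through plus sign scenarios.
PATTERNS = [
    "1[14]",            # brackets: 11 or 14 anywhere in an AS
    "[4-8]0_",          # hyphen: AS ending in 40, 50, 60, 70, or 80
    "^[13]00_[^3-8]",   # negated brackets: second AS does not start with 3-8
    "_4(5|0)$",         # parentheses and pipe: origin AS 45 or 40
    "_..$",             # period: delimiter plus two characters at the end
    "(11)+",            # plus sign: one or more instances of 11
]


def query(pattern):
    """Return the prefixes whose AS-path matches the router-style pattern."""
    rx = re.compile(pattern.replace("_", UNDERSCORE))
    return [prefix for prefix, path in AS_PATHS.items() if rx.search(path)]


if __name__ == "__main__":
    for pattern in PATTERNS:
        print("%-16s -> %s" % (pattern, query(pattern)))
```

Trying a pattern this way before applying it in an AS-path access-list helps catch patterns that match more ASNs than intended.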

? Question Mark

Query Modifier Function: Matches one or no instances of the character or pattern.

Scenario: Only display routes from the neighboring AS or its directly connected AS (that is, restrict to two ASs away). This query is more complicated and requires you to define an initial query for identifying the AS, which is “[0-9]+”. The second component includes the space and an optional second AS. The “?” limits the AS match to one or two ASs as shown in Example 11-66.

Note

The CTRL+V escape sequence must be used before entering the ?.

Example 11-66 BGP Regex Query with Question Mark

NX-2# show bgp ipv4 unicast regexp "^[0-9]+ ([0-9]+)?$"
! Output omitted for brevity
     Network          Next Hop       Metric LocPrf Weight Path
*>e172.16.99.0/24   172.32.23.3         0             0 300 40 i
*>e192.168.99.0     172.16.12.1         0    100      0 100 40 i

* Asterisk

Query Modifier Function: Matches zero or more characters or patterns.

Scenario: Display all routes from any AS. This may seem like a useless task, but may be a valid requirement when using AS-Path access lists, which are explained later in this chapter. Example 11-67 shows the regex query.

Example 11-67 BGP Regex Query with Asterisk

NX-2# show bgp ipv4 unicast regexp ".*"
! Output omitted for brevity
     Network          Next Hop   Metric LocPrf Weight Path
*>e172.16.0.0/24    172.32.23.3     0             0 300 80 90 21003 2100 i
*>e172.16.4.0/23    172.32.23.3     0             0 300 1080 1090 1100 1110 i
*>e172.16.16.0/22   172.32.23.3     0             0 300 11234 21234 31234 i
*>e172.16.99.0/24   172.32.23.3     0             0 300 40 i
*> 172.16.129.0/24  172.32.23.3     0             0 300 10010 300 30010 30050 i
*>e192.168.0.0      172.16.12.1     0    100      0 100 80 90 21003 2100 i
*>e192.168.4.0/23   172.16.12.1     0    100      0 100 1080 1090 1100 1110 i
*>e192.168.16.0/22  172.16.12.1     0    100      0 100 11234 21234 31234 i
*>e192.168.99.0     172.16.12.1     0    100      0 100 40 i
*>e192.168.129.0    172.16.12.1     0    100      0 100 10010 300 30010 30050 i

AS-Path Access List

Selecting routes by using the AS_Path in a route-map requires the definition of an AS-path access-list (AS-path ACL). Processing is performed in a sequential top-down order, and the first qualifying match determines whether the permit or deny action is applied. An implicit deny exists at the end of the AS-Path ACL. IOS supports up to 500 AS-path ACLs and uses the command ip as-path access-list acl-number {deny | permit} regex-query for creating the AS-path access-list.

Example 11-68 provides two sample AS-Path access lists. AS-Path access-list 1 matches any prefix originated within the local AS (an empty AS_Path, as seen on local and IBGP routes) or any prefix that passes through AS 300, whereas AS-Path access-list 2 is a more complicated AS-Path access list that matches the 16-bit private ASN range (64,512 – 65,535).

Example 11-68 AS-Path Access List Configuration

ip as-path access-list 1 permit _300_
ip as-path access-list 1 permit ^$
ip as-path access-list 2 permit _(6451[2-9])_
ip as-path access-list 2 permit _(645[2-9][0-9])_
ip as-path access-list 2 permit _(64[6-9][0-9][0-9])_
ip as-path access-list 2 permit _(65[0-4][0-9][0-9])_
ip as-path access-list 2 permit _(655[0-2][0-9])_
ip as-path access-list 2 permit _(6553[0-5])_
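
Because a private-ASN pattern set is easy to get wrong by one digit, it can be worth checking the coverage before deploying it. The Python sketch below models an AS-path ACL as an ordered list with first-match processing and an implicit deny, then verifies the coverage of the range 64,512 to 65,535. It is an approximation: the underscore is emulated as the start of the string, the end of the string, or a delimiter character, the last entry uses 6553[0-5] so that 65,536 is excluded, and the names acl_action, ACL_1, and ACL_2 are hypothetical.

```python
import re

# Assumption: "_" is emulated as start of string, end of string, or one of
# the delimiter characters space , { } ( ).
UNDERSCORE = r"(?:^|$|[ ,{}()])"

# Access-list 1: paths through AS 300, or an empty AS-path.
ACL_1 = [
    ("permit", "_300_"),
    ("permit", "^$"),
]

# Access-list 2: the 16-bit private ASN range 64512-65535.
ACL_2 = [
    ("permit", "_(6451[2-9])_"),
    ("permit", "_(645[2-9][0-9])_"),
    ("permit", "_(64[6-9][0-9][0-9])_"),
    ("permit", "_(65[0-4][0-9][0-9])_"),
    ("permit", "_(655[0-2][0-9])_"),
    ("permit", "_(6553[0-5])_"),
]


def acl_action(acl, as_path):
    """Return the action of the first matching entry, else implicit deny."""
    for action, pattern in acl:
        if re.search(pattern.replace("_", UNDERSCORE), as_path):
            return action
    return "deny"


def uncovered_or_extra(acl, low, high, limit=70000):
    """List ASNs below the limit where the ACL disagrees with [low, high]."""
    wrong = []
    for asn in range(1, limit):
        permitted = acl_action(acl, str(asn)) == "permit"
        if permitted != (low <= asn <= high):
            wrong.append(asn)
    return wrong


if __name__ == "__main__":
    print("mismatches:", uncovered_or_extra(ACL_2, 64512, 65535))
    print(acl_action(ACL_1, ""), acl_action(ACL_1, "100 300 40"),
          acl_action(ACL_1, "100 40"))
```

An empty list from uncovered_or_extra indicates that, within the tested range of ASNs, the patterns permit exactly the private range and nothing else.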

BGP Communities

BGP communities provide additional capability for tagging routes and are considered either well-known or private BGP communities. Private BGP communities are used for conditional matching for a router’s route-policy, which could influence routes during inbound or outbound route-policy processing. There are four well-known communities that affect only outbound route-advertisement:

  • No-Advertise: The No_Advertise community (0xFFFFFF02 or 4,294,967,042) specifies that routes with this community should not be advertised to any BGP peer. The No-Advertise BGP community can be advertised from an upstream BGP peer or locally with an inbound BGP policy. In either method, the No-Advertise community is set in the BGP Loc-RIB table that affects outbound route advertisement.

  • No-Export: The No_Export community (0xFFFFFF01 or 4,294,967,041) specifies that when a route is received with this community, the route is not advertised to any EBGP peer. If the router receiving the No-Export route is a confederation member, the route is advertised to other sub ASs in the confederation.

  • Local-AS: The No_Export_SubConfed community (0xFFFFFF03 or 4,294,967,043) known as the Local-AS community specifies that a route with this community is not advertised outside of the local AS. If the router receiving a route with the Local-AS community is a confederation member, the route is advertised only within the sub-AS (Member-AS) and is not advertised between Member-ASs.

  • Internet: Advertise this route to the Internet community and all the routers that belong to it.

The private community value is of the format (as-number:16-bit-number). Conditionally matching BGP communities allows for selection of routes based upon the BGP communities within the route’s path attributes so that selective processing occurs in a route-map.
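
The relationship between the as-number:16-bit-number notation and the 32-bit values quoted for the well-known communities can be shown with simple bit arithmetic. The Python sketch below treats a standard community as a 32-bit number whose upper 16 bits are the AS number and whose lower 16 bits are the locally significant value. The function names are hypothetical.

```python
# Well-known community values as quoted in the list above.
WELL_KNOWN = {
    "no-export": 0xFFFFFF01,
    "no-advertise": 0xFFFFFF02,
    "local-AS": 0xFFFFFF03,
}


def encode_community(asn, value):
    """Pack as-number:16-bit-number into a single 32-bit community."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value


def decode_community(raw):
    """Split a 32-bit community into (as-number, 16-bit-number)."""
    return raw >> 16, raw & 0xFFFF


def format_community(raw):
    """Render a 32-bit community in as-number:16-bit-number notation."""
    return "%d:%d" % decode_community(raw)


if __name__ == "__main__":
    raw = encode_community(65001, 274)
    print(raw, format_community(raw))
    for name, value in WELL_KNOWN.items():
        print(name, value, format_community(value))
```

The same arithmetic explains why the well-known communities all begin with 65535 when displayed in the two-part notation.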

NX-OS devices do not advertise BGP communities to peers by default. Communities are enabled on a neighbor-by-neighbor basis with the BGP address-family configuration command send-community [standard | extended | both] under the neighbor’s address family configuration. Standard communities are sent by default, unless the optional extended or both keywords are used.

Conditionally matching on NX-OS devices requires the creation of a community list. A community list shares a similar structure to an ACL, is standard or expanded, and is referenced via number or name. Standard community lists match either well-known communities or a private community number (as-number:16-bit-number), whereas expanded community lists use regex patterns.

Consider the same topology as shown in Figure 11-6. In this topology, NX-5 assigns a community value of 65001:274 to the prefixes that have AS 274 in their AS_Path list. Example 11-69 illustrates the configuration on NX-5 that attaches the community value to those prefixes, followed by the resulting BGP table entry on NX-2.

Example 11-69 Advertising Community Value

NX-5(config)# ip as-path access-list ASN_274 permit 274
NX-5(config)# route-map set-Comm
NX-5(config-route-map)# match as-path ASN_274
NX-5(config-route-map)# set community 65001:274
NX-5(config-route-map)# route-map set-Comm per 20
NX-5(config-route-map)# exit
NX-5(config)# router bgp 65001
NX-5(config-router)# neighbor 10.25.1.2
NX-5(config-router-neighbor)# address-family ipv4 unicast
NX-5(config-router-neighbor-af)# route-map set-Comm out
NX-5(config-router-neighbor-af)# send-community
NX-5(config-router-neighbor-af)# end
NX-2# show bgp ipv4 unicast 100.1.25.0/24
BGP routing table information for VRF default, address family IPv4 Unicast
BGP routing table entry for 100.1.25.0/24, version 1195
Paths: (1 available, best #1)
Flags: (0x08001a) on xmit-list, is in urib, is best urib route, is in HW,

  Advertised path-id 1
  Path type: external, path is valid, is best path
  AS-Path: 65001 100 228 274 {300 243} , path sourced external to AS
    10.25.1.5 (metric 0) from 10.25.1.5 (192.168.5.5)

      Origin EGP, MED not set, localpref 100, weight 0
      Community: 65001:274

  Path-id 1 advertised to peers:
    192.168.1.1

On NX-2, if an operator wants to set a BGP attribute based on the matching community value, a community-list is referenced in the match statement under the route-map. Example 11-70 illustrates the configuration for using BGP community values for influencing route policy.

Example 11-70 Influencing Route Policy Using BGP Community

NX-2(config)# ip community-list standard Comm-65001:274 permit 65001:274
NX-2(config)# route-map Match-Comm per 10
NX-2(config-route-map)# match community Comm-65001:274
NX-2(config-route-map)# set local-preference 200
NX-2(config-route-map)# route-map Match-Comm per 20
NX-2(config-route-map)# exit
NX-2(config)# router bgp 65000
NX-2(config-router)# neighbor 10.25.1.5
NX-2(config-router-neighbor)# address-family ipv4 unicast
NX-2(config-router-neighbor-af)# route-map Match-Comm in
NX-2(config-router-neighbor-af)# end
NX-2# show bgp ipv4 unicast neighbor 10.25.1.5 routes
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 1141, local router ID is 192.168.2.2
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup
   Network            Next Hop      Metric  LocPrf  Weight Path
! Output omitted for brevity
*>e100.1.21.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}
*>e100.1.22.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}
*>e100.1.23.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}
*>e100.1.24.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}
*>e100.1.25.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}
*>e100.1.26.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}
*>e100.1.27.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}
*>e100.1.28.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}
*>e100.1.29.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}
*>e100.1.30.0/24      10.25.1.5             200          0 65001 100 228 274 {300 243}

Looking Glass and Route Servers

Hands-on experience is helpful when learning technologies such as regex. There are public devices called looking glass or route servers that allow users to log in and view BGP tables. Most of these devices are Cisco routers, but there are other vendors as well. These servers allow network engineers to see if they are advertising their routes to the Internet, as they had intended, and provide a great method to try out regular expressions on the Internet BGP table. A quick search on the Internet provides website listings of looking glass and route servers.

Logs Collection

In the event of a BGP failure, collect the following show tech logs:

  • show tech bgp

  • show tech netstack

If the issue involves BGP route policies, collect the following logs along with show tech bgp:

  • show tech rpm

If routes are present in the BGP table but are not being installed in the routing table, also collect the following show tech output:

  • show tech routing ipv4 unicast [brief]

Collect and share these logs with Cisco TAC for a root-cause analysis of the problem.
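
The symptom-to-command mapping above can be captured in a small lookup, for example as part of a collection script. The following Python sketch is only an illustration: the symptom labels and the function name are hypothetical, and the commands are exactly the ones listed in this section. It builds the list of commands to run and does not connect to any device.

```python
# Symptom labels are hypothetical; the commands are those listed in this section.
SHOW_TECH_BY_SYMPTOM = {
    "bgp_failure": ["show tech bgp", "show tech netstack"],
    "route_policy": ["show tech bgp", "show tech rpm"],
    # Collected in addition to the BGP logs when routes are in the BGP table
    # but are not installed in the routing table.
    "route_not_in_rib": ["show tech routing ipv4 unicast"],
}


def commands_for(symptoms):
    """Return the show tech commands for the symptoms, deduplicated in order."""
    commands = []
    for symptom in symptoms:
        for command in SHOW_TECH_BY_SYMPTOM[symptom]:
            if command not in commands:
                commands.append(command)
    return commands


if __name__ == "__main__":
    for command in commands_for(["bgp_failure", "route_policy"]):
        print(command)
```

Combining symptoms keeps each command once, so a BGP failure that also involves route policies yields show tech bgp, show tech netstack, and show tech rpm.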

Summary

BGP is a powerful path-vector routing protocol that provides scalability and flexibility unmatched by any other routing protocol. Because BGP uses TCP port 179 to establish neighbor sessions, it can peer with directly attached routers or with routers that are multiple hops away.

Originally, BGP was intended for routing of IPv4 prefixes between organizations, but over the years it has gained significant functionality and feature enhancements. BGP has expanded from being an Internet routing protocol to serving other parts of the network, including the data center.

BGP provides scalable control plane signaling for overlay topologies, including MPLS VPNs, IPsec SAs, and VXLAN. These overlays provide Layer 3 services, such as L3VPNs, or Layer 2 services, such as eVPNs, across a widely used scalable control plane for everything from provider-based services to data center overlays. Every AFI/SAFI combination maintains an independent BGP table and routing policy, which makes BGP the perfect control plane application.

This chapter focused on various techniques for troubleshooting BGP peering issues, including flapping peers caused by an MTU mismatch or by bad BGP updates. It then examined BGP route processing and convergence issues, covering route processing concepts such as BGP update generation, route advertisement, best path calculation, and multipath. The chapter also covered various scaling techniques for BGP, including BGP route reflectors.

Finally, the chapter covered route filtering using prefix-lists, filter-lists, and route-maps, and reviewed the matching criteria available with route-maps, such as prefix-lists, community-lists, and regular expressions.

Further Reading

Some of the topics involving validity checks and next-hop resolution are explained further in the following books:

Halabi, Sam. Internet Routing Architectures. Indianapolis: Cisco Press, 2000.

Zhang, Randy, and Micah Bartell. BGP Design and Implementation. Indianapolis: Cisco Press, 2003.

White, Russ, Alvaro Retana, and Don Slice. Optimal Routing Design. Indianapolis: Cisco Press, 2005.

Jain, Vinit, and Brad Edgeworth. Troubleshooting BGP. Indianapolis: Cisco Press, 2016.

References

Jain, Vinit, and Brad Edgeworth. Troubleshooting BGP. Indianapolis: Cisco Press, 2016.

Edgeworth, Brad, Aaron Foss, and Ramiro Garza Rios. IP Routing on Cisco IOS, IOS XE and IOS XR. Indianapolis: Cisco Press, 2014.

Cisco. Cisco NX-OS Software Configuration Guides, www.cisco.com.
