Chapter 11 Troubleshooting BGP


This chapter covers the following topics:

BGP Fundamentals

Originally defined in RFC 1654 and currently specified in RFC 4271, Border Gateway Protocol (BGP) is a path-vector routing protocol that provides scalability, flexibility, and network stability. When BGP was first developed, the primary design consideration was the exchange of IPv4 routing information between organizations across public networks, such as the Internet, or across private dedicated networks. BGP is often referred to as the protocol for the Internet, because it is the only protocol capable of holding the Internet routing table, which has more than 600,000 IPv4 routes and over 42,000 IPv6 routes, both of which continue to grow.

From the perspective of BGP, an autonomous system (AS) is a collection of routers under a single organization’s control. Organizations requiring connectivity to the Internet must obtain an autonomous system number (ASN). ASNs were originally 2 bytes (16 bits), providing 65,535 ASNs. Because of exhaustion, RFC 4893 expanded the ASN field to 4 bytes (32 bits). This allows for 4,294,967,295 unique ASNs, quite a leap from the original 65,535. The Internet Assigned Numbers Authority (IANA) is responsible for assigning all public ASNs to ensure that they are globally unique.

Two blocks of private ASNs are available for any organization to use as long as they are never exchanged publicly on the Internet. ASNs 64,512 to 65,534 are private ASNs within the 16-bit ASN range, and 4,200,000,000 to 4,294,967,294 are private ASNs within the extended 32-bit range.
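The private ranges can be checked programmatically. The following is a minimal sketch, not taken from any BGP implementation; the function names are hypothetical, and the range bounds follow RFC 6996, which reserves 64512 to 65534 and 4200000000 to 4294967294 for private use.

```python
# Private-use ASN ranges per RFC 6996 (both bounds inclusive).
PRIVATE_16BIT = (64512, 65534)
PRIVATE_32BIT = (4_200_000_000, 4_294_967_294)
MAX_ASN = 2**32 - 1


def is_private_asn(asn: int) -> bool:
    """Return True if the ASN falls in either private-use block."""
    if not 0 <= asn <= MAX_ASN:
        raise ValueError(f"ASN out of 32-bit range: {asn}")
    return any(low <= asn <= high for low, high in (PRIVATE_16BIT, PRIVATE_32BIT))


def needs_four_bytes(asn: int) -> bool:
    """Return True if the ASN does not fit in the original 2-byte field."""
    return asn > 0xFFFF
```

For instance, `is_private_asn(65000)` is true, which is why AS 65000 and AS 65001 are common choices for lab topologies.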

Note

It is imperative that you use only the ASN assigned by IANA, the ASN assigned by your service provider, or private ASNs. Public prefixes are mapped to the ASNs of the organizations that own them, so mistakenly or maliciously advertising a prefix using the wrong ASN could result in traffic loss and cause havoc on the Internet.

Address Families

Originally, BGP was intended for routing of IPv4 prefixes between organizations, but RFC 2858 added Multi-Protocol BGP (MP-BGP) capability by adding extensions called address-family identifiers (AFI). An address-family correlates to a specific network protocol, such as IPv4 or IPv6, and additional granularity is provided through the subsequent address-family identifier (SAFI), such as unicast and multicast. MBGP achieves this separation by using the BGP path attributes (PA) MP_REACH_NLRI and MP_UNREACH_NLRI. These attributes are carried inside BGP update messages and are used to carry network reachability information for different address families.

Note

Some network engineers refer to Multi-Protocol BGP as MP-BGP, and other network engineers use the term MBGP. Both terms refer to the same thing.

Network engineers and vendors continue to add functionality and feature enhancements to BGP. BGP now provides a scalable control plane for signaling for overlay technologies like Multiprotocol Label Switching (MPLS) Virtual Private Networks (VPN), IPsec Security Associations, and Virtual Extensible LAN (VXLAN). These overlays provide Layer 3 connectivity via MPLS L3VPNs, or Layer 2 connectivity via Ethernet VPNs (EVPN).

Every address-family maintains a separate database and configuration for each protocol (address-family + subaddress-family) in BGP. This allows for a routing policy in one address-family to be different from a routing policy in a different address-family, even though the router uses the same BGP session to the other router. BGP includes an AFI and SAFI with every route advertisement to differentiate between the AFI and SAFI databases. Table 11-1 provides a small list of common AFI and SAFIs used with BGP.

Table 11-1 BGP AFI/SAFI

AFI	SAFI	Network Layer Information
1	1	IPv4 Unicast
1	2	IPv4 Multicast
1	4	MPLS Label
1	128	MPLS L3VPN IPv4
2	1	IPv6 Unicast
2	4	MPLS Label
2	128	MPLS L3VPN IPv6
25	65	Virtual Private LAN Service (VPLS)
25	70	Ethernet VPN (EVPN)

Path Attributes

BGP attaches path attributes (PA) associated with each network path. The PAs provide BGP with granularity and control of routing policies within BGP. The BGP prefix PAs are classified as follows:

Well-known mandatory
Well-known discretionary
Optional transitive
Optional nontransitive

Per RFC 4271, well-known attributes must be recognized by all BGP implementations. Well-known mandatory attributes must be included with every prefix advertisement, whereas well-known discretionary attributes may or may not be included with the prefix advertisement.

Optional attributes do not have to be recognized by all BGP implementations. Optional attributes can be set so that they are transitive and stay with the route advertisement from AS to AS. Other PAs are nontransitive and cannot be shared from AS to AS. In BGP, the Network Layer Reachability Information (NLRI) is the routing update that consists of the network prefix, prefix-length, and any BGP PAs for that specific route.

Loop Prevention

BGP is a path-vector routing protocol and does not contain a complete topology of the network like link-state routing protocols. BGP behaves similarly to distance-vector protocols to ensure that a path is loop free.

The BGP attribute AS_PATH is a well-known mandatory attribute and includes a complete listing of all the ASNs that the prefix advertisement has traversed from its source AS. The AS_PATH is used as a loop-prevention mechanism in the BGP protocol. If a BGP router receives a prefix advertisement with its AS listed in the AS_PATH, it discards the prefix because the router thinks the advertisement forms a loop.
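The AS_PATH check described above can be sketched in a few lines. This is an illustrative model only; the function names are hypothetical, and the prepend step reflects standard EBGP behavior (a router adds its own ASN when advertising to an external peer) rather than anything stated in this section.

```python
from typing import List


def accept_advertisement(local_asn: int, as_path: List[int]) -> bool:
    """Discard the prefix if our own ASN already appears in the AS_PATH."""
    return local_asn not in as_path


def prepend_local_as(local_asn: int, as_path: List[int]) -> List[int]:
    """Return the AS_PATH as it would be advertised to an EBGP peer."""
    return [local_asn] + as_path


# A prefix originated in AS 65000 is advertised to AS 65001 and then back.
path_seen_by_65001 = prepend_local_as(65000, [])
path_returned_to_65000 = prepend_local_as(65001, path_seen_by_65001)
```

Here `accept_advertisement(65001, path_seen_by_65001)` is true, while `accept_advertisement(65000, path_returned_to_65000)` is false because AS 65000 finds itself in the path.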

Note

The other IBGP-related loop-prevention mechanisms are discussed later in this chapter.

BGP Sessions

A BGP session refers to the established adjacency between two BGP routers. BGP sessions are always point-to-point and are categorized into two types:

Internal BGP (iBGP): Sessions established with a BGP router that is in the same AS or participates in the same BGP confederation. iBGP sessions are considered more secure, and some of BGP’s security measures are lowered in comparison to EBGP sessions. iBGP prefixes are assigned an administrative distance (AD) of 200 upon installing into the router’s Routing Information Base (RIB).
External BGP (EBGP): Sessions established with a BGP router that is in a different AS. EBGP prefixes are assigned an AD of 20 upon installing into the router’s RIB.

Note

Administrative distance (AD) is a rating of the trustworthiness of a routing information source. If a router learns about a route to a destination from more than one routing protocol and they all have the same prefix length, AD is compared. The preference is given to the route with the lower AD.
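As a rough illustration of the AD comparison, the sketch below picks the candidate with the lowest AD among routes for the same prefix. The `Candidate` type, the `select_for_rib` function, and the addresses are hypothetical; only the EBGP (20) and iBGP (200) values come from the text above.

```python
from typing import Dict, List, NamedTuple

# AD values stated in this section; other sources would need their own entries.
DEFAULT_AD: Dict[str, int] = {"ebgp": 20, "ibgp": 200}


class Candidate(NamedTuple):
    prefix: str    # for example "192.0.2.0/24"
    source: str    # key into the AD table
    next_hop: str


def select_for_rib(candidates: List[Candidate],
                   ad_table: Dict[str, int] = DEFAULT_AD) -> Candidate:
    """Among candidates for one identical prefix, prefer the lowest AD."""
    if len({c.prefix for c in candidates}) != 1:
        raise ValueError("AD is compared only between routes for the same prefix")
    return min(candidates, key=lambda c: ad_table[c.source])


winner = select_for_rib([
    Candidate("192.0.2.0/24", "ibgp", "10.0.0.1"),
    Candidate("192.0.2.0/24", "ebgp", "10.0.0.2"),
])
```

In this example `winner.source` is `"ebgp"`, because 20 is lower than 200.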

BGP uses TCP port 179 to communicate with other routers. Transmission Control Protocol (TCP) allows for handling of fragmentation, sequencing, and reliability (acknowledgement and retransmission) of communication (control plane) packets. Although BGP can form neighbor adjacencies that are directly connected, it can also form adjacencies that are multiple hops away. Multihop sessions require that the router use an underlying route installed in the RIB (static or from any routing protocol) to establish the TCP session with the remote endpoint.

Note

BGP neighbors connected via the same network use the ARP table to locate the IP address of the peer. Multihop BGP sessions require route table information for finding the IP address of the peer. It is common to have a static route or Interior Gateway Protocol (IGP) running between iBGP peers for providing the topology path information for establishing the BGP TCP session. A default route is not sufficient to establish a multihop BGP session.

BGP can be thought of as a control plane routing protocol or as an application, because it allows for the exchanging of routes with peers multiple hops away. BGP routers do not have to be in the data plane (path) to exchange prefixes, but all routers in the data path need to know all the routes that will be forwarded through them.

BGP Identifier

The BGP Router-ID (RID) is a 32-bit unique number that identifies the BGP router in the advertised prefixes as the BGP Identifier. The RID is also used as a loop-prevention mechanism for routes advertised within an autonomous system. The RID can be set manually or dynamically for BGP. A nonzero value must be set for routers to become neighbors. NX-OS nodes use the IP address of the lowest up loopback interface. If there are no up loopback interfaces, then the IP address of the lowest active up interface becomes the RID when the BGP process initializes.

Router-IDs typically represent an IPv4 address that resides on the router, such as a loopback address. Any IPv4 address can be used, including IP addresses not configured on the router. NX-OS uses the command router-id router-id under the BGP router configuration to statically assign the BGP RID. Upon changing the router-id, all BGP sessions reset and need to reestablish.
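Because the RID is just a 32-bit number written in dotted-decimal form, it can be converted and validated with Python's standard `ipaddress` module. This is a small sketch with hypothetical helper names; it only checks the nonzero requirement mentioned above and does not model how NX-OS selects a RID.

```python
import ipaddress


def rid_to_int(rid: str) -> int:
    """Convert a dotted-decimal Router-ID into its 32-bit integer value."""
    return int(ipaddress.IPv4Address(rid))


def is_valid_rid(rid: str) -> bool:
    """A usable RID must parse as an IPv4-style value and be nonzero."""
    try:
        return rid_to_int(rid) != 0
    except ipaddress.AddressValueError:
        return False
```

For example, `rid_to_int("192.0.2.1")` returns 3221225985, and `is_valid_rid("0.0.0.0")` returns `False`.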

Note

It is a best practice to statically assign the BGP Router-ID.

BGP Messages

BGP communication uses four message types as shown in Table 11-2.

Table 11-2 BGP Packet Types

Type	Name	Functional Overview
1	OPEN	Sets up and establishes BGP adjacency
2	UPDATE	Advertises, updates, or withdraws routes
3	NOTIFICATION	Indicates an error condition to a BGP neighbor
4	KEEPALIVE	Ensures that BGP neighbors are still alive
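To make the type codes in Table 11-2 concrete, the sketch below builds and parses the fixed BGP message header. The layout (a 16-byte marker of all ones, a 2-byte length, a 1-byte type, and a 4096-byte maximum) is taken from RFC 4271 rather than from this chapter, and the function names are hypothetical.

```python
import struct
from typing import Tuple

MARKER = b"\xff" * 16   # 16 bytes, all bits set
HEADER_LEN = 19         # marker (16) + length (2) + type (1)
MAX_LEN = 4096          # maximum message size in RFC 4271
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_message(msg_type: int, payload: bytes = b"") -> bytes:
    """Prefix a payload with the BGP header for the given type code."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    length = HEADER_LEN + len(payload)
    if length > MAX_LEN:
        raise ValueError("message exceeds 4096 bytes")
    return MARKER + struct.pack("!HB", length, msg_type) + payload


def parse_header(data: bytes) -> Tuple[int, str]:
    """Return (total length, type name) from the first 19 bytes."""
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("not a valid BGP header")
    length, msg_type = struct.unpack("!HB", data[16:HEADER_LEN])
    return length, MESSAGE_TYPES.get(msg_type, "UNKNOWN")


keepalive = build_message(4)
```

A KEEPALIVE carries no payload, so `keepalive` is exactly 19 bytes long and `parse_header(keepalive)` returns `(19, "KEEPALIVE")`.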

OPEN

The OPEN message is used to establish a BGP adjacency. Both sides negotiate session capabilities before a BGP peering establishes. The OPEN message contains the BGP version number, ASN of the originating router, Hold Time, BGP Identifier, and other optional parameters that establish the session capabilities.

The Hold Time attribute sets the Hold Timer in seconds for each BGP neighbor. Upon receipt of an UPDATE or KEEPALIVE, the Hold Timer resets to the initial value. If the Hold Timer reaches zero, the BGP session is torn down, routes from that neighbor are removed, and an appropriate update route withdraw message is sent to other BGP neighbors for the impacted prefixes. The Hold Time is a heartbeat mechanism for BGP neighbors to ensure that the neighbor is healthy and alive.

When establishing a BGP session, the routers use the smaller Hold Time value contained in the two routers’ OPEN messages. The Hold Time value must be set to at least 3 seconds, or zero. For Cisco routers the default hold timer is 180 seconds.
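The negotiation rule can be written as a short function. This is a sketch under the rules stated above (use the smaller value; each value must be zero or at least 3 seconds); the function name is hypothetical.

```python
def negotiate_hold_time(local: int, remote: int) -> int:
    """Return the Hold Time both peers will use, in seconds."""
    for value in (local, remote):
        if value != 0 and value < 3:
            raise ValueError(f"unacceptable Hold Time: {value}")
    return min(local, remote)
```

With the Cisco default of 180 seconds on one side and 90 seconds on the other, `negotiate_hold_time(180, 90)` returns 90.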

UPDATE

The UPDATE message advertises any feasible routes, withdraws previously advertised routes, or can do both. The UPDATE message includes the Network Layer Reachability Information (NLRI) that includes the prefix and associated BGP PAs when advertising prefixes. Withdrawn NLRIs include only the prefix. An UPDATE message can act as a KEEPALIVE message to reduce unnecessary traffic.

NOTIFICATION

A NOTIFICATION message is sent when an error is detected with the BGP session, such as a Hold Timer expiring, a neighbor capabilities change, or a BGP session reset is requested. This causes the BGP connection to close.

Note

More details on the BGP messages are discussed during troubleshooting sections.

KEEPALIVE

BGP does not rely upon the TCP connection state to ensure that the neighbors are still alive. KEEPALIVE messages are exchanged every 1/3 of the Hold Timer agreed upon between the two BGP routers. Cisco devices have a default Hold Time of 180 seconds, so the default KEEPALIVE interval is 60 seconds. If the Hold Time is set to zero, no KEEPALIVE messages are sent between the BGP neighbors.
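The relationship between the Hold Timer and the KEEPALIVE interval can be modeled as follows. This is a simplified sketch with hypothetical names; timestamps are plain numbers in seconds, and treating a zero Hold Time as "never expires" is an assumption based on RFC 4271 rather than on the text above.

```python
from typing import Optional


def keepalive_interval(hold_time: int) -> Optional[float]:
    """One third of the negotiated Hold Time, or None when it is zero."""
    if hold_time == 0:
        return None
    return hold_time / 3


class HoldTimer:
    """Tracks when the last UPDATE or KEEPALIVE was heard from a peer."""

    def __init__(self, hold_time: int, now: float = 0.0) -> None:
        self.hold_time = hold_time
        self.last_heard = now

    def message_received(self, now: float) -> None:
        # Any UPDATE or KEEPALIVE resets the timer to its initial value.
        self.last_heard = now

    def expired(self, now: float) -> bool:
        if self.hold_time == 0:
            return False  # assumption: a zero Hold Time disables the timer
        return now - self.last_heard >= self.hold_time
```

With the default of 180 seconds, `keepalive_interval(180)` returns 60.0, so a peer can miss two consecutive KEEPALIVE messages before the timer expires.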

BGP Neighbor States

BGP forms a TCP session with neighbor routers called peers. BGP uses the Finite State Machine (FSM) to maintain a table of all BGP peers and their operational status. A BGP session may report one of the following states:

Idle
Connect
Active
OpenSent
OpenConfirm
Established

Figure 11-1 displays the BGP FSM and the states in order of establishing a BGP session.

Figure 11-1 BGP Finite State Machine

Idle

This is the first stage of the BGP FSM. BGP detects a start event, tries to initiate a TCP connection to the BGP peer, and also listens for a new connection from a peer router.

If an error causes BGP to go back to the Idle state for a second time, the ConnectRetryTimer is set to 60 seconds and must decrement to zero before the connection is initiated again. Further failures to leave the Idle state result in the ConnectRetryTimer doubling in length from the previous time.
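The backoff described above can be expressed as a small function. This is a sketch of the behavior as worded in this section (60 seconds on the second entry into Idle, doubling afterward); the function name and the choice to count entries from 1 are assumptions for illustration.

```python
def connect_retry_delay(idle_entries: int) -> int:
    """Seconds to wait before retrying, given how many times Idle was entered.

    idle_entries is 1 for the initial Idle state, 2 after the first error, and so on.
    """
    if idle_entries < 1:
        raise ValueError("idle_entries starts at 1")
    if idle_entries == 1:
        return 0  # first attempt starts right after the start event
    return 60 * 2 ** (idle_entries - 2)
```

Successive failures therefore wait 60, 120, and 240 seconds before the next connection attempt.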

Connect

In this state, BGP initiates the TCP connection. If the 3-way TCP handshake completes, the BGP process resets the ConnectRetryTimer, sends the Open message to the neighbor, and changes to the OpenSent state.

If the ConnectRetryTimer depletes before this stage is complete, a new TCP connection is attempted, the ConnectRetryTimer is reset, and the state is moved to Active. If any other input is received, the state is changed to Idle.

During this stage, the neighbor with the higher IP address manages the connection. The router initiating the request uses a dynamic source port, but the destination port is always 179.

Note

Service providers consistently assign their customers the higher or lower IP address for their networks. This helps the service provider create proper instructions for ACLs or firewall rules, or for troubleshooting them.

Active

In this state, BGP starts a new 3-way TCP handshake. If a connection is established, an Open message is sent, the Hold Timer is set to 4 minutes, and the state moves to OpenSent. If this attempt for TCP connection fails, the state moves back to the Connect state and resets the ConnectRetryTimer.

OpenSent

In this state, an Open message has been sent from the originating router and is awaiting an Open message from the other router. After the originating router receives the OPEN message from the other router, both OPEN messages are checked for errors. The following items are being compared:

BGP versions must match.
The source IP Address of the OPEN message must match the IP address that is configured for the neighbor.
The AS number in the OPEN message must match what is configured for the neighbor.
BGP Identifiers (RID) must be unique. If a RID does not exist, this condition is not met.
Security Parameters (Password, Time to Live [TTL], and so on)

If the Open messages do not have any errors, the Hold Time is negotiated (using the lower value), and a KEEPALIVE message is sent (assuming the value is not set to zero). The connection state is then moved to OpenConfirm. If an error is found in the OPEN message, a Notification message is sent, and the state is moved back to Idle.
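The OPEN checks listed above can be sketched as a validation function. The data classes and function name are hypothetical, and the security parameters (password, TTL) are left out because they are enforced outside the OPEN message itself. The returned strings mirror the OPEN Message Error descriptions used in BGP notifications, except for the source address check, which has no OPEN subcode.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL_BGP_VERSION = 4


@dataclass
class OpenMessage:
    version: int
    asn: int
    hold_time: int
    bgp_id: str
    source_ip: str   # source address of the packet carrying the OPEN


@dataclass
class NeighborConfig:
    address: str
    remote_as: int


def check_open(msg: OpenMessage, neighbor: NeighborConfig,
               local_bgp_id: str) -> Optional[str]:
    """Return a description of the first error found, or None if the OPEN is fine."""
    if msg.version != LOCAL_BGP_VERSION:
        return "Unsupported Version Number"
    if msg.source_ip != neighbor.address:
        return "Source address does not match configured neighbor"
    if msg.asn != neighbor.remote_as:
        return "Bad Peer AS"
    if msg.bgp_id in ("0.0.0.0", local_bgp_id):
        return "Bad BGP Identifier"
    if msg.hold_time != 0 and msg.hold_time < 3:
        return "Unacceptable Hold Time"
    return None
```

A return value of `None` corresponds to moving on to OpenConfirm; any string corresponds to sending a Notification and returning to Idle.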

If TCP receives a disconnect message, BGP closes the connection, resets the ConnectRetryTimer, and sets the state to Active. Any other input in this process results in the state moving to Idle.

OpenConfirm

In this state, BGP waits for a Keepalive or Notification message. Upon receipt of a neighbor’s Keepalive, the state is moved to Established. If the Hold Timer expires, a stop event occurs, or a Notification message is received, the state is moved to Idle.

Established

In this state, the BGP session is established. BGP neighbors exchange routes via Update messages. As Update and Keepalive messages are received, the Hold Timer is reset. If the Hold Timer expires, an error is detected, and BGP moves the neighbor back to the Idle state.
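The state descriptions in this section can be condensed into a transition table. This is a deliberately simplified sketch: the event names are hypothetical labels for the conditions described in the text, timers are not modeled, and any event not listed sends the peer back to Idle.

```python
from typing import Dict, Iterable, Tuple

TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "connect_retry_expired"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "tcp_failed"): "Connect",
    ("OpenSent", "open_valid"): "OpenConfirm",
    ("OpenSent", "tcp_disconnect"): "Active",
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
    ("Established", "update_received"): "Established",
}


def next_state(state: str, event: str) -> str:
    """Look up the next state; unlisted inputs fall back to Idle."""
    return TRANSITIONS.get((state, event), "Idle")


def run(events: Iterable[str], state: str = "Idle") -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

A clean session setup is `run(["start", "tcp_established", "open_valid", "keepalive_received"])`, which ends in Established; a Hold Timer expiry in any state falls through to Idle.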

BGP Configuration and Verification

BGP configuration on NX-OS can be laid out in a few simple steps, but the BGP command line is available only after enabling the BGP feature. Use the command feature bgp to enable the BGP feature on Nexus platforms. The steps for configuring BGP on an NX-OS device are as follows:

Step 1. Create the BGP routing process. Initialize the BGP process with the global configuration command router bgp as-number.

Step 2. Assign a BGP router-id. Assign a unique BGP router-id under the BGP router process. The router-id can be an IP address assigned to a physical interface or a Loopback interface.

Step 3. Initialize the address-family. Initialize the address-family with the BGP router configuration command address-family afi safi so it can be associated to a BGP neighbor.

Step 4. Identify the BGP neighbor’s IP address and autonomous system number. Identify the BGP neighbor’s IP address and autonomous system number with the BGP router configuration command neighbor ip-address remote-as as-number.

Step 5. Activate the address-family for the BGP neighbor. Activate the address-family for the BGP neighbor with the BGP neighbor configuration command address-family afi safi.
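The five steps map directly onto a handful of configuration lines. The following Python sketch renders them from parameters; it uses only the commands named in the steps above, while the function name, the indentation style, and the addresses in the usage note are illustrative assumptions and do not reproduce any device configuration from this chapter.

```python
from typing import List, Tuple


def render_bgp_config(local_as: int, router_id: str, afi: str, safi: str,
                      neighbors: List[Tuple[str, int]]) -> str:
    """Build NX-OS style BGP configuration text following Steps 1 through 5."""
    lines = [
        "feature bgp",                        # prerequisite: enable the feature
        f"router bgp {local_as}",             # Step 1: create the process
        f"  router-id {router_id}",           # Step 2: assign the router-id
        f"  address-family {afi} {safi}",     # Step 3: initialize the address-family
    ]
    for ip_address, remote_as in neighbors:
        # Step 4: identify the neighbor and its ASN
        lines.append(f"  neighbor {ip_address} remote-as {remote_as}")
        # Step 5: activate the address-family for that neighbor
        lines.append(f"    address-family {afi} {safi}")
    return "\n".join(lines)


config = render_bgp_config(
    local_as=65000,
    router_id="192.0.2.4",                    # hypothetical value
    afi="ipv4",
    safi="unicast",
    neighbors=[("192.0.2.1", 65000), ("198.51.100.6", 65001)],  # hypothetical peers
)
```

In this usage the first neighbor shares the local ASN and is therefore an IBGP peer, while the second has a different ASN and is an EBGP peer.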

Examine the topology shown in Figure 11-2. This topology is used as reference for the next section as well. In this topology, Nexus devices NX-1, NX-2, and NX-4 are part of AS 65000, whereas router NX-6 belongs to AS 65001.

Figure 11-2 Reference Topology

Example 11-1 displays the BGP configuration for router NX-4 demonstrating both IBGP and EBGP peering. For this example, NX-4 is trying to establish an IBGP peering with NX-1 and an EBGP peering with NX-6. While configuring a BGP peering, it is important to ensure the following information is correct:

Local and remote ASN
Source peering IP
Remote peering IP
Authentication passwords (optional)
EBGP-multihop (EBGP only)

In Example 11-1, NX-4 is forming an IBGP peering with NX-1 and an EBGP peering with NX-6 router. The NX-4 device is also advertising its loopback address under the IPv4 address family using the network command.

Example 11-1 NX-OS BGP Configuration

Error Code	Subcode	Description
01	00	Message Header Error
01	01	Message Header Error—Connection Not Synchronized
01	02	Message Header Error—Bad Message Length
01	03	Message Header Error—Bad Message Type
02	00	OPEN Message Error
02	01	OPEN Message Error—Unsupported Version Number
02	02	OPEN Message Error—Bad Peer AS
02	03	OPEN Message Error—Bad BGP Identifier
02	04	OPEN Message Error—Unsupported Optional Parameter
02	05	OPEN Message Error—Deprecated
02	06	OPEN Message Error—Unacceptable Hold Time
03	00	Update Message Error
03	01	Update Message Error—Malformed Attribute List
03	02	Update Message Error—Unrecognized Well-Known Attribute
03	03	Update Message Error—Missing Well-Known Attribute
03	04	Update Message Error—Attribute Flags Error
03	05	Update Message Error—Attribute Length Error
03	06	Update Message Error—Invalid Origin Attribute
03	07	(Deprecated)
03	08	Update Message Error—Invalid NEXT_HOP Attribute
03	09	Update Message Error—Optional Attribute Error
03	0A	Update Message Error—Invalid Network Field
03	0B	Update Message Error—Malformed AS_PATH
04	00	Hold Timer Expired
05	00	Finite State Machine Error
06	00	Cease
06	01	Cease—Maximum Number of Prefixes Reached
06	02	Cease—Administrative Shutdown
06	03	Cease—Peer Deconfigured
06	04	Cease—Administrative Reset
06	05	Cease—Connection Rejected
06	06	Cease—Other Configuration Change
06	07	Cease—Connection Collision Resolution
06	08	Cease—Out of Resources

BGP Attribute	Scope
Weight	Router only. Highest value wins.
Local Preference	Within AS boundary. Highest value wins.
Locally Originated	Network or redistribute command preferred over local aggregates (aggregate-address command).
Accumulated Interior Gateway Protocol (AIGP)	AIGP Path Attribute.
AS_PATH	Shortest AS_PATH wins: Skipped if bgp bestpath as-path ignore configured. AS_SET counts as 1. CONFED parts do not count.
Origin Type	IGP < EGP < Incomplete. Lowest wins.
Multi-Exit Discriminator (MED)	Compare only if the first AS in AS_SEQUENCE is same for multiple paths.
EBGP over IBGP	External BGP path preferred over Internal BGP path.
Metric to Next Hop	Cost of IGP to reach BGP next-hop. Lowest metric wins.
Oldest External	When both paths are external, prefer the first (oldest).
BGP Router ID (RID)	Path with lowest BGP RID is preferred.
CLUSTER_LIST	Prefer the route with minimum CLUSTER_LIST length.
Neighbor Address	Prefer path that is received from the lowest neighbor address (neighbor configured using neighbor ip-address command).
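The ordering in this table can be approximated with a sort key. The sketch below is intentionally partial: it covers Weight, Local Preference, AS_PATH length, Origin, EBGP over IBGP, IGP metric, and Router ID, and it skips the Locally Originated, AIGP, MED, Oldest External, CLUSTER_LIST, and Neighbor Address steps. The `Path` class, its field names, and its default values are assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lowest wins


@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100
    as_path_len: int = 0
    origin: str = "igp"
    is_ebgp: bool = False
    igp_metric: int = 0
    router_id: str = "0.0.0.1"


def sort_key(path: Path) -> Tuple[int, ...]:
    """Smaller tuples are better; 'highest wins' fields are negated."""
    return (
        -path.weight,
        -path.local_pref,
        path.as_path_len,
        ORIGIN_RANK[path.origin],
        0 if path.is_ebgp else 1,
        path.igp_metric,
        int(ipaddress.IPv4Address(path.router_id)),
    )


def best_path(paths: List[Path]) -> Path:
    """Return the preferred path under the simplified ordering above."""
    return min(paths, key=sort_key)
```

Because the tuple is compared left to right, a path with a higher Local Preference wins even when its AS_PATH is longer, which matches the order of the table.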

maximum	Defines the maximum prefix limit.
threshold	Defines the threshold percentage at which a warning is generated.
restart restart-interval	Default behavior. Resets the BGP connection after the specified prefix limit is exceeded. The restart-interval is configured in minutes. BGP tries to reestablish the peering after the specified time interval is passed. When the restart option is set, a cease notification is sent to the neighbor, and the BGP connection is terminated.
warning-only	Only gives a warning message when the specified limit is exceeded.

Modifier	Description
_ (Underscore)	Matches a space
^ (Caret)	Indicates the start of the string
$ (Dollar Sign)	Indicates the end of the string
[] (Brackets)	Matches a single character or nesting within a range
- (Hyphen)	Indicates a range of numbers in brackets
[^] (Caret in Brackets)	Excludes the characters listed in brackets
() (Parentheses)	Used for nesting of search patterns
| (Pipe)	Provides OR functionality to the query
. (Period)	Matches a single character, including a space
* (Asterisk)	Matches zero or more characters or patterns
+ (Plus Sign)	One or more instances of the character or pattern
? (Question Mark)	Matches one or no instances of the character or pattern
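Most of these modifiers behave the same way in Python's `re` module, which makes it convenient for experimenting with AS_PATH patterns offline. The underscore is the exception: it is not special in Python, so the sketch below rewrites it to match a space or either end of the string. That translation is an approximation for illustration only (it does not handle an underscore inside brackets), and the helper names are hypothetical.

```python
import re


def compile_as_path_regex(pattern: str) -> re.Pattern:
    """Translate a router-style AS_PATH pattern into a Python regex."""
    # Approximate '_' as: start of string, whitespace, or end of string.
    return re.compile(pattern.replace("_", r"(?:^|\s|$)"))


def matches(pattern: str, as_path: str) -> bool:
    """Return True if the pattern is found anywhere in the AS_PATH string."""
    return compile_as_path_regex(pattern).search(as_path) is not None
```

For example, `matches("^65000_", "65000 65001")` is true (the path was learned from AS 65000), `matches("_65001$", "65000 65001")` is true (the path originated in AS 65001), and `matches("_65001$", "65000 165001")` is false because the underscore prevents a partial match inside a longer ASN.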
