Chapter 6

BGP

Lab 1: Establishing a BGP Session Using the Correct TTL Value

[Topology diagram]

This lab should be conducted on the Enterprise POD.

BGP Peering Session Overview

Unlike other routing protocols such as OSPF and EIGRP, BGP does not implement its own transport when forming neighbor relationships (called peer relationships in BGP terminology) with other BGP-speaking routers. Instead, BGP uses TCP as its transport protocol, on the well-known BGP TCP port 179. For a BGP peering session to come up between two routers, they must therefore first establish a TCP session with each other. BGP session establishment is thus a two-step process: first a TCP session is established, and then the routers exchange BGP-specific information to build the BGP peering session.

TCP sessions operate on a client/server model. The server listens for connection attempts on a specific TCP port number, and the client attempts to connect to that port. The client sends a TCP synchronize (TCP SYN) segment to the listening server, indicating that it wants to open a connection and announcing its initial sequence number. The server responds with a TCP synchronize-acknowledgment (TCP SYN-ACK) segment, confirming that it received the client's request and announcing its own initial sequence number. The client then responds with a simple TCP acknowledgment (TCP ACK) to acknowledge the server's SYN-ACK. From this point on, data can be sent over the connection as TCP segments. This process is known as the TCP three-way handshake.
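The handshake can be modeled as the three segments it produces. The following is a minimal Python sketch, not any router's implementation. The initial sequence numbers (ISNs) and the names `Segment` and `three_way_handshake` are hypothetical and exist only for illustration. The sketch assumes only that a SYN consumes one sequence number, so each side acknowledges the other's ISN + 1.

```python
# Sketch of the TCP three-way handshake. The initial sequence numbers (ISNs)
# are hypothetical values chosen for illustration.
from dataclasses import dataclass
from typing import List, Optional

SEQ_MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class Segment:
    flags: str            # "SYN", "SYN-ACK", or "ACK"
    seq: int              # sequence number carried by the segment
    ack: Optional[int]    # acknowledgment number (None if ACK flag not set)


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments that open a TCP connection.

    A SYN consumes one sequence number, so each side acknowledges
    the other side's ISN + 1.
    """
    syn = Segment("SYN", seq=client_isn, ack=None)
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=(client_isn + 1) % SEQ_MOD)
    ack = Segment("ACK", seq=(client_isn + 1) % SEQ_MOD,
                  ack=(server_isn + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    # Client: a BGP router doing an active open to a peer listening on port 179.
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```

Real TCP stacks also carry ports, window sizes, and options in these segments. The sketch keeps only the flags and sequence/acknowledgment numbers that the handshake description relies on.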

When BGP is enabled on a router, the router begins listening for incoming TCP connection attempts on port 179. When the router is configured to peer with a particular neighbor (using the neighbor command in BGP router configuration mode), it also attempts to establish a TCP connection to that neighbor by sending it a TCP SYN. This is known as an active open. By default, the TCP SYN is sent with the following values:

  • the IP address of the outgoing interface the router uses to reach the neighbor as its source
  • the IP address of the potential neighbor as its destination
  • destination TCP port 179

In this situation, the router is acting as a TCP client, attempting to connect to a TCP server on port 179.

The remote neighbor listens for connections on TCP port 179. When it receives the TCP SYN, it checks whether the source IP address of the connection attempt matches a peer configured with the neighbor command in its own BGP configuration. If it finds a match, it responds with a TCP SYN-ACK. Otherwise, it resets the TCP session. At the same time, the remote router is also sending its own TCP SYNs to its configured neighbors in an attempt to establish BGP peering sessions with them.
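The acceptance check on the listening router can be sketched as a small function. This is a simplified Python model, not IOS code. The function name `respond_to_syn` and the example addresses are hypothetical. Only the BGP listener on port 179 is considered, so any other destination port is treated as having no listener.

```python
# Simplified model of how a BGP speaker answers an incoming TCP SYN.
# The addresses in the example are hypothetical.
import ipaddress
from typing import Iterable

BGP_PORT = 179


def respond_to_syn(src_ip: str, dst_port: int, configured_neighbors: Iterable[str]) -> str:
    """Return "SYN-ACK" if the SYN comes from a configured neighbor, else "RST"."""
    if dst_port != BGP_PORT:
        return "RST"  # in this model, only the BGP listener exists
    source = ipaddress.ip_address(src_ip)
    neighbors = {ipaddress.ip_address(n) for n in configured_neighbors}
    return "SYN-ACK" if source in neighbors else "RST"


if __name__ == "__main__":
    # The peer is configured by its loopback address, but the SYN is sourced
    # from the sender's outgoing interface address, so the attempt is reset.
    print(respond_to_syn("12.1.1.1", 179, ["1.1.1.1"]))   # RST
    print(respond_to_syn("12.1.1.1", 179, ["12.1.1.1"]))  # SYN-ACK
```

The first call shows why the source address of the SYN matters. If the listener expects a peer at one address but the SYN arrives from another, the connection is reset, even though both addresses belong to the same router.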

Because BGP routers both passively listen for TCP connections and actively open TCP sessions to configured neighbors, they act as both TCP clients and TCP servers during the TCP exchange. If two neighbors each send a TCP SYN to the other, two TCP connections can result (a connection collision), and only one of them is kept. The surviving connection is the one opened by the router with the higher BGP identifier (BGP router ID). That router becomes the TCP client, and the router with the lower BGP router ID becomes the TCP server.
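The collision rule above reduces to a numeric comparison of router IDs. The following Python sketch assumes router IDs are compared as unsigned 32-bit integers, which is how RFC 4271 specifies BGP Identifier comparison. The function name `resolve_collision` is hypothetical, and the router IDs are borrowed from Lab 2, Task 6.

```python
# Sketch of BGP connection-collision resolution: when both routers open a TCP
# connection to each other, the connection opened by the router with the
# higher BGP router ID is kept, so that router ends up as the TCP client.
import ipaddress
from typing import Dict


def resolve_collision(local_id: str, remote_id: str) -> Dict[str, str]:
    # Router IDs are compared as unsigned 32-bit integers, not as strings.
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))
    if local == remote:
        raise ValueError("BGP router IDs must be unique")
    higher, lower = (local_id, remote_id) if local > remote else (remote_id, local_id)
    return {"tcp_client": higher, "tcp_server": lower}


if __name__ == "__main__":
    print(resolve_collision("10.1.1.1", "10.2.2.2"))
    # Numeric comparison matters: as strings, "10.10.1.1" < "10.2.2.2".
    print(resolve_collision("10.10.1.1", "10.2.2.2"))
```

Converting the dotted-decimal IDs to integers avoids the trap of comparing them as text. Compared as text, 10.10.1.1 would wrongly sort below 10.2.2.2.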

Once the TCP session is established, the routers begin the BGP peering session establishment phase, determining the type of BGP peering and exchanging capabilities. Neighbors are identified by their IP addresses and BGP autonomous system numbers (ASNs):

  • If a peer’s ASN matches the local ASN, it is considered to be an internal BGP (iBGP) peer.

  • If a peer’s ASN does not match the local ASN, it is considered to be an external BGP (eBGP) peer.
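The iBGP/eBGP rule in the list above is a single comparison. This Python sketch is illustrative only. The function name `peer_type` is hypothetical, the example ASNs come from Lab 2, and BGP confederations (covered in Lab 4) are deliberately ignored.

```python
# Classify a configured neighbor as iBGP or eBGP by comparing ASNs.
# The ASNs in the example are taken from Lab 2 of this chapter.


def peer_type(local_asn: int, peer_asn: int) -> str:
    """Return "iBGP" when the peer's ASN equals the local ASN, else "eBGP"."""
    return "iBGP" if peer_asn == local_asn else "eBGP"


if __name__ == "__main__":
    print(peer_type(100, 100))  # iBGP: both routers in AS 100 (Lab 2, Task 1)
    print(peer_type(100, 200))  # eBGP: AS 100 peering with AS 200 (Lab 2, Task 2)
```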

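The two peers also exchange BGP OPEN messages, which carry session parameters including the following:

  • the BGP version
  • the ASN
  • the hold time (from which the keepalive interval is derived)
  • the BGP router ID
  • optional capabilities

If the receiving BGP peer finds a parameter unacceptable, it sends a NOTIFICATION message, and the BGP peering session does not come up.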
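Hold-time handling can be illustrated with a short sketch based on RFC 4271. Under that RFC, the negotiated hold time is the smaller of the local and received values, and a received hold time of 1 or 2 seconds must be rejected. The sketch assumes the keepalive interval is one third of the negotiated hold time, the value RFC 4271 suggests. A given IOS release may also take a configured keepalive value into account. The function name `negotiate_timers` is hypothetical.

```python
# Sketch of hold-time negotiation from the BGP OPEN exchange (RFC 4271).
from typing import Tuple


def negotiate_timers(local_hold: int, received_hold: int) -> Tuple[int, int]:
    """Return (hold_time, keepalive_interval) in seconds.

    - A received hold time of 1 or 2 seconds is unacceptable; RFC 4271
      requires the hold time to be 0 or at least 3 seconds.
    - The negotiated hold time is the smaller of the two values.
    - A hold time of 0 disables the hold timer and periodic keepalives.
    - Otherwise, the keepalive interval is one third of the hold time.
    """
    if received_hold in (1, 2):
        raise ValueError("unacceptable hold time: peer is sent a NOTIFICATION")
    hold = min(local_hold, received_hold)
    keepalive = hold // 3  # 0 when the hold time is 0 (no keepalives)
    return hold, keepalive


if __name__ == "__main__":
    print(negotiate_timers(180, 180))  # (180, 60)
    print(negotiate_timers(180, 90))   # (90, 30)
```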

This is the basic process used to establish a BGP session between two routers. The following labs address the subtleties in how this interaction occurs. The key point to remember is that BGP uses TCP as transport and it is therefore bound by the rules of TCP regarding how it operates. Due to this reliance on TCP, BGP session establishment has two phases: TCP session establishment and BGP peering session establishment.

Task 1

Configure appropriate IP addressing as indicated in the diagram above.

Task 2

Configure R1 and R2 to become eBGP neighbors. They should use their Loopback0 interfaces as the peering addresses.

Task 3

Reconfigure the above BGP peering such that R1 and R2 are able to become eBGP peers without using the disable-connected-check command. Ensure that the TTL value of sent packets is as low as possible. Use a BGP-related command to accomplish this task.

Task 4

Reconfigure R1 and R2 such that they form an eBGP peering with each other. Ensure that the TTL value of any received BGP packet is no less than 253. Do not use the disable-connected-check command.

Task 5

Configure R1 and R2 to become eBGP peers. Do not use disable-connected-check, ebgp-multihop, or ttl-security to accomplish this task. Do not configure any tunneling mechanisms or IRB to accomplish this task.

Task 6

Reload the routers and configure the following topology.

[Topology diagram]

Task 7

Configure R1 and R3 with an eBGP session using their Loopback0 interfaces.

Task 8

Reconfigure R1 and R3 with an eBGP session without modifying the TTL value, configuring IRB, or using GRE or IP-in-IP tunneling mechanisms.

Task 9

Erase the startup configuration and reload the routers before proceeding to the next lab.

Lab 2: Establishing Neighbor Adjacency Using Different Methods

[Topology diagram]

This lab should be conducted on the Enterprise POD.

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-2-T1-4-Establishing Peering session Using Different Methods in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to Initial-config folder > BGP folder > Lab-2-T1-4.

Task 1

Configure R1 through R4 in AS 100.

  • Ensure that these routers create an iBGP peering session with each other in a full mesh manner.

  • Ensure that these routers advertise their Loopback0 interfaces in this AS.

Task 2

Reconfigure the routers based on the following policy:

  • Keep R1 in AS 100.

  • Configure R2, R3, and R4 in AS 200, 300, and 400, respectively.

A full mesh BGP peering session must be configured between these routers. Advertise Loopback0 on each router into BGP.

Task 3

Reconfigure all the routers in AS 100. Use the following policy for their iBGP peering sessions:

  • Enable authentication between the peers, using cisco as the password.

  • Ensure that the peering session is established based on the Loopback0 interface’s IP address.

  • Ensure that these routers only advertise their Loopback1 interface in BGP.

  • Provide reachability to the Loopback0 interfaces using RIPv2.

  • Ensure that peering sessions between the routers are established only if they are running BGP Version 4.

  • Use peer groups to accomplish this task.

Task 4

Remove the BGP configuration from the routers and reconfigure all four routers in AS 100 using peer session templates. You should configure the following two templates to accomplish this task:

  • Common template

    • This template should contain the neighbor version 4 and neighbor password commands.

    • This template should be applied to all neighbors.

  • iBGP template

    • This template should contain the neighbor update-source and neighbor remote-as commands.

    • This template should be applied to all iBGP neighbors.

You should advertise the Loopback1 interface in BGP and use Loopback0 for establishing the peering sessions. Do not remove RIPv2’s configuration.

Task 5

Erase the startup configurations and reload the routers before continuing to Task 6.

Task 6

[Topology diagram]

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, then ignore the following and use Lab-2-T6-7-Establishing Peering session Using Different Methods in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to Initial-config folder > BGP folder > Lab-2-T6-7.

Configure the routers based on the following policy:

  • Ensure that R1 in AS 100 establishes an eBGP peering session with R2 in AS 200. Ensure that R1 advertises all of its loopback interfaces in AS 100.

  • Ensure that R2, R3, and R4 are configured in AS 200. Ensure that these routers establish iBGP peering sessions in a full mesh manner and advertise their Loopback0 interfaces in AS 200.

  • Configure the BGP router IDs of the routers as follows:

    • R1: 10.1.1.1

    • R2: 10.2.2.2

    • R3: 10.3.3.3

    • R4: 10.4.4.4

  • Ensure that the loopback interfaces on R4 and R3 have reachability to the networks advertised by R1.

Task 7

Configure R2 to provide reachability to R3 and R4 by changing the next hop IP address for all the networks advertised by R1 to the IP address of its G0/0 interface. Use a template so that future policies can be implemented once in that template and make sure it affects R3 and R4. Do not use peer groups to accomplish this task.

Task 8

Erase the startup configuration and reload the routers before proceeding to the next lab.

Lab 3: Route Reflectors

[Topology diagram]

This lab should be conducted on the Enterprise POD.

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-3-T1-3-Route Reflectors in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to initial-config folder > BGP folder > Lab-3-T1-2.

Task 1

Configure BGP AS 100 on all routers and ensure that the routers can successfully establish an iBGP peering session with each other. These routers should only advertise their Loopback0 interface in BGP.

Task 2

Management emails you, stating that within the next 12 months, 20 additional routers will be added to this AS. In order to minimize the number of peering sessions within this AS, you decide to implement route reflectors. Configure R1 as a route reflector for this AS. You must remove OSPF.

Task 3

[Topology diagram]

Do not erase the existing configuration. Add the following configuration to the existing configuration.

After implementing the route reflector, you realize that if the route reflector is down, the entire network is dysfunctional; therefore, you decide to add R4 as the second route reflector for redundancy. Ensure that the routers can reach the advertised networks and the redundancy is operational. R4 should advertise its Lo0 interface in BGP.

Task 4

Erase the startup configuration and reload the routers. Reconfigure the routers based on the following topology. Do not configure BGP.

[Topology diagram]

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-3-T4-Route Reflectors in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to initial-config folder > BGP folder > Lab-3-T4.

Task 5

Configure BGP on R1 through R6, based on the following policy:

  • Ensure that all routers belong to AS 100.

  • Configure R1 as the route reflector for routers R2 and R3.

  • Configure R4 to be the route reflector for routers R5 and R6.

  • Configure R1 and R4 to have an iBGP peering session between them.

  • Advertise the networks on the Loopback0 interface on every router into BGP.

  • Ensure that reachability for the links is provided through OSPF. The OSPF router ID should be set to 0.0.0.x, where x is the router number.

Task 6

Erase the startup configuration and reload the routers before proceeding to the next lab.

Lab 4: BGP Confederation

[Topology diagram]

This lab should be conducted on the Enterprise POD.

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-4-BGP Confederation in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to initial-config folder > BGP folder > Lab-4.txt.

Task 1

Configure BGP peering on the routers as follows:

  • Ensure that R1 in AS 65001 can establish an eBGP peering session with R2 in AS 65023.

  • Ensure that R2 in AS 65023 can establish an iBGP peering session with R3.

  • Ensure that R3 in AS 65023 can establish an eBGP peering session with R4 in AS 65045.

  • Ensure that R4 in AS 65045 can establish an iBGP peering session with R5.

  • Ensure that R4 and R5 can establish an eBGP peering session with R6 in AS 600.

  • Provide reachability to the links in AS 100 using EIGRP AS 1.

  • Ensure that these routers advertise their loopback interfaces in BGP.

  • Configure R1, R2, R3, R4, and R5 in AS 100.

Task 2

Change the default local preference attribute on R5 to 500 and on R4 to 400.

Task 3

Configure R6 such that when it advertises its network 6.0.0.0/8 to routers R4 and R5 in AS 65045, the routers in AS 65045 do not advertise this network to any of their existing and future eBGP peers.

Task 4

Erase the startup configuration and reload the routers before proceeding to the next lab.

Lab 5: BGP Backdoor and Conditional Advertisement

[Topology diagram]

This lab should be conducted on the Enterprise POD.

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-5-T1-6-BGP Backdoor and Conditional Advertisement in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to initial-config folder > BGP folder > Lab-5-T1-6.

Task 1

Configure R1 in AS 100 to establish an eBGP peering session with R2 and R3 in AS 200 and 300, respectively.

Task 2

Configure R1, R2, and R3 to advertise their Loopback0 interface in BGP.

Task 3

Configure RIPv2 and EIGRP AS 100 on the routers based on the following rules:

  • Configure RIPv2 on the networks 12.1.1.0/24 and 13.1.1.0/24. Disable auto-summarization.

  • Configure EIGRP AS 100 on R2’s G0/3 and R3’s G0/2 and their Loopback1 interfaces.

Task 4

The network 23.1.1.0/24 is not advertised in BGP. This means if the link between R2 and R3 goes down, these routers will not be able to reach each other’s Loopback1 interfaces, even though there is a redundant link between these two routers through BGP. Therefore, the administrator of R2 and R3 decided that the Loopback1 interfaces of R2 and R3 should be advertised in BGP for redundancy. Configure these routers to accommodate this decision.

Task 5

After implementing the previous task, the administrators realized that the traffic between networks 10.1.2.0/24 and 10.1.3.0/24 was taking a suboptimal path and was not using the direct path between routers R2 and R3.

Implement a BGP solution to fix this problem. You should not use the distance, PBR, or any global configuration mode command to accomplish this task.

Task 6

Remove the IP addresses from the G0/3 interface of R2 and G0/2 interface of R3 and ensure that both the G0/2 and G0/3 interfaces are in an administratively down state. Also remove the Loopback1 interfaces from these two routers.

Task 7

Reconfigure the routers based on the following topology.

[Topology diagram]

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-5-T7-10-BGP Backdoor and Conditional Advertisement in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to initial-config folder > BGP folder > Lab-5-T7-10.

Configure BGP based on the above topology. You must use the directly connected IP addresses to establish the peering.

Task 8

Ensure that R1, which is connected to AS 200 and AS 300, uses AS 300 as its primary service provider. Configure the following policy:

  • If network 3.0.0.0/8 from AS 300 is up and is advertised to R1, R1 should not advertise its network 1.0.0.0/8 to R2 in AS 200.

  • R1 should advertise network 1.0.0.0/8 to R2 only if network 3.0.0.0/8 is down and R1 is no longer receiving this network from AS 300.

Task 9

Remove the configuration commands entered in the previous task before you proceed to the next task. Ensure that the routers have the advertised networks in their BGP tables.

Task 10

Configure R2 and R1 such that R1 advertises the network 200.1.1.0/24.

Restrictions:

  • Do not configure another interface or logical interface with this IP address to accomplish this task.

  • Do not use the network command or configure redistribution to complete this task.

  • Upon completion of this task, this network may not be reachable.

Task 11

Erase the startup configuration and reload the routers before proceeding to the next lab.

Lab 6: BGP Aggregation

[Topology diagram]

This lab should be conducted on the Enterprise POD.

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-6- BGP Aggregation in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to initial-config folder > BGP folder > Lab-6.txt.

Task 1

R3 is configured in AS 300, and R4 is configured in AS 400. Configure R3 and R4 to advertise their loopback interfaces in BGP. If this configuration is done successfully, R2 in AS 200 should see four prefixes in its BGP table. You should use network statements to accomplish this task.

Task 2

Configure R2 to aggregate all four prefixes received from AS 300 and AS 400. Ensure that R2 can advertise the aggregate route to existing and future neighbor(s). Configure a BGP peering session between R2 and R1. If this configuration is performed successfully, R1 in AS 100 should only receive a single prefix representing the four prefixes.

Task 3

Configure R2 such that R1 in AS 100 can see the AS numbers where some or all the specific prefixes originated. Do not directly attach a route map to a neighbor or prefix in order to accomplish this.

Task 4

Configure R2 such that only R1 in AS 100 and R4 in AS 400 accept the aggregate. The other AS(es) in this topology (such as AS 300) should receive the aggregate route, but they should discard it.

Task 5

Configure R2 to advertise the aggregate plus the 10.1.2.0/24 prefix to R1 and R4.

Task 6

Configure R2 to advertise 10.1.3.0/24 to R1 only. R4 should not receive this prefix.

Task 7

Erase the startup configuration of the routers and reload the devices before proceeding to the next lab.

Lab 7: BGP Filtering

[Topology diagram]

This lab should be conducted on the Enterprise POD.

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-7- BGP Filtering in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to initial-config folder > BGP folder > Lab-7.

The following loopback interfaces are preconfigured on the routers:

Router   Loopback0      Loopback200
R1       1.1.1.1/8      200.1.1.1/32
R2       2.2.2.2/8      200.1.1.2/32
R3       3.3.3.3/8      200.1.1.3/32
R4       4.4.4.4/8      200.1.1.4/32
R5       5.5.5.5/8      200.1.1.5/32
R6       6.6.6.6/8      200.1.1.6/32
R7       7.7.7.7/8      200.1.1.7/32
R8       8.8.8.8/8      200.1.1.8/32
SW1      10.1.1.10/8    200.1.1.10/32
SW2      20.1.1.20/8    200.1.1.20/32
SW3      30.1.1.30/8    200.1.1.30/32
SW4      40.1.1.40/8    200.1.1.40/32
SW5      50.1.1.50/8    200.1.1.50/32

OSPF PID 1 is preconfigured to provide reachability.

Task 1

Establish a BGP peering session on the devices based on the diagram. The BGP sessions must be established based on the Loopback0 interfaces of the devices. These devices should advertise their Lo200 interface in BGP.

Task 2

Configure R2 such that AS 300 and AS 600 do not use AS 200 as a transit AS.

Task 3

Configure SW4 such that it filters any prefix(es) that has originated or traversed AS 200. Do not use a route map to accomplish this task.

Task 4

Configure SW5 to filter any prefix(es) that has originated in AS 400.

Task 5

Configure R5 to filter paths that have AS 800 as the second AS in the AS-Path.

Task 6

Configure R8 in AS 800 to only allow prefixes from existing and future autonomous systems that are directly connected.

Task 7

Configure SW1 to filter any prefix(es) that originated in AS 200 and traversed through AS 300.

Task 8

Configure R7 to prepend its AS number four additional times when it advertises its Loopback200 interface to its directly connected neighbors.

Task 9

Configure R4 to filter any prefix(es) that has prepended the AS number multiple times.

Task 10

Configure R6 to filter all prefixes that were received by and originated in AS 200.

Task 11

Configure SW4 to filter the prefixes that originated from AS 933’s directly connected neighbors. This should not override the previous policy implemented on SW4 in Task 3.

Task 12

Configure R7 such that it discards paths for any prefix(es) that has more than two AS hops.

Task 13

Configure R3 to advertise network 30.3.3.0/24. This network is a static route to Null0 that was part of the initial configuration file.

Task 14

Ensure that R1 is configured such that if the number of routes received from R3 exceeds 10, it shuts down the neighbor (R3). R1 should generate a console warning message when 80% of this threshold is reached. If the adjacency goes down because of this policy, R1 should restart the adjacency after 1 minute and check the number of routes; if they still exceed the threshold, the adjacency should once again go down, and the cycle should repeat.

Task 15

Configure SW4 to advertise networks 40.0.0.0/8 through 49.0.0.0/8, which were preconfigured in SW4’s routing table as static routes pointing to Null0.

Task 16

Since SW3 is running low on system resources, it cannot handle networks advertised by SW4 in the previous task. Configure the appropriate device(s) such that networks 40.0.0.0/8 through 49.0.0.0/8 are filtered.

Task 17

Erase the startup configuration on all routers and switches, delete vlan.dat on all switches, and reload all devices before proceeding to the next lab.

Lab 8: BGP Load Balancing

[Topology diagram]

This lab should be conducted on the Enterprise POD.

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-8- BGP Load Balancing in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to initial-config folder > BGP folder > Lab-8.txt.

Task 1

Configure R1 in AS 100 to establish a peering session with R3 in AS 300. These two routers should be configured such that they load share:

  • R1 should use its G0/9 and G0/3 interfaces.

  • R3 should use its G0/9 and G0/1 interfaces.

Underlying reachability should be provided with RIPv2.

Task 2

Configure the following:

  • Configure R2, R4, and R6 with full mesh iBGP peering in AS 246.

  • Configure R8 and SW1 for iBGP peerings in AS 890. The peerings should be established over their Loopback0 interfaces.

  • Configure R4 in AS 246 such that it performs load sharing between the two eBGP neighbors (R8 and SW1) to reach the 118.1.1.0/24 network.

Task 3

Configure R1 to advertise its Lo1 interface in BGP. Configure the appropriate router(s) such that R6 performs unequal-cost load sharing for R1’s Lo1, based on the bandwidth of the R1–R2 and R1–R4 connections. Configure the appropriate BGP peering to accomplish this task.

Task 4

Configure eBGP peerings between the following routers:

  • R7–R1

  • R7–R2

  • R1–R5

  • R2–R5

Configure R7 to advertise its Lo0 interface into AS 700. Configure the appropriate router such that R5 in AS 500 load shares between R1 and R2 to reach the Loopback0 interface of R7 in AS 700.

Task 5

Erase the startup configuration and reload the devices before proceeding to the next lab.

Lab 9: Remove-Private-AS: A Walkthrough

This lab should be conducted on the Enterprise POD.

[Topology diagram]

Erase the startup configuration and reload the routers before proceeding with this lab.

The following configurations handle IP addressing and BGP peerings on the devices in the topology:

On R1:

R1(config)#interface lo0
R1(config-if)#ip address 1.1.1.1 255.255.255.255
 
R1(config)#interface g0/2
R1(config-if)#ip address 12.1.1.1 255.255.255.0
R1(config-if)#no shut
 
R1(config)#router bgp 65001
R1(config-router)#neighbor 12.1.1.2 remote-as 65002
R1(config-router)#network 1.1.1.1 mask 255.255.255.255

On R2:

R2(config)#interface lo0
R2(config-if)#ip address 2.2.2.2 255.255.255.255
 
R2(config)#interface g0/1
R2(config-if)#ip address 12.1.1.2 255.255.255.0
R2(config-if)#no shut
 
R2(config-if)#interface g0/3
R2(config-if)#ip address 23.1.1.2 255.255.255.0
R2(config-if)#no shut
 
R2(config)#router bgp 65002
R2(config-router)#neighbor 12.1.1.1 remote-as 65001

R2(config-router)#neighbor 23.1.1.3 remote-as 300
R2(config-router)#network 2.2.2.2 mask 255.255.255.255

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 12.1.1.1 Up

On R3:

R3(config)#interface lo0
R3(config-if)#ip address 3.3.3.3 255.255.255.255
 
R3(config)#interface g0/2
R3(config-if)#ip address 23.1.1.3 255.255.255.0
R3(config-if)#no shut
 
R3(config)#interface g0/4
R3(config-if)#ip address 34.1.1.3 255.255.255.0
R3(config-if)#no shut
 
R3(config)#router bgp 300
R3(config-router)#network 3.3.3.3 mask 255.255.255.255
R3(config-router)#neighbor 23.1.1.2 remote-as 65002
R3(config-router)#neighbor 34.1.1.4 remote-as 400

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 23.1.1.2 Up

On R4:

R4(config)#interface lo0
R4(config-if)#ip address 4.4.4.4 255.255.255.255
 
R4(config)#interface g0/3
R4(config-if)#ip address 34.1.1.4 255.255.255.0
R4(config-if)#no shut
 
R4(config)#router bgp 400
R4(config-router)#network 4.4.4.4 mask 255.255.255.255
R4(config-router)#neighbor 34.1.1.3 remote-as 300

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 34.1.1.3 Up

The show ip bgp command is issued on R4 to verify the BGP paths the router learns:

R4#show ip bgp | begin Net
 
     Network          Next Hop    Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                       0 300 65002 65001 i
 *>   2.2.2.2/32       34.1.1.3                       0 300 65002 i
 *>   3.3.3.3/32       34.1.1.3          0            0 300 i
 *>   4.4.4.4/32       0.0.0.0           0        32768 i

A ping from R4’s loopback address 4.4.4.4 to R1’s address 1.1.1.1 is issued to verify reachability:

R4#ping 1.1.1.1 source 4.4.4.4
 
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 4.4.4.4
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 5/26/109 ms

This topology has been created to demonstrate the differences between private and public ASNs. Every organization that participates in the global BGP routing table is assigned an autonomous system number (ASN). The ASNs are prepended to paths that an organization either passes on or originates. The original BGP RFC defined a 16-bit field to carry the ASN, which means there are a total of 65536 (0–65535) ASNs in the 16-bit ASN space. In order to participate in the global BGP table, an organization must apply for a BGP ASN from its local registry.

In some situations, an organization might want to run BGP only within its own internal network. It might also need to run BGP with its service provider only so that its public IP address space is advertised to the global BGP table. In these cases, it does not need a globally unique ASN. This is similar to how organizations use private IP addresses for devices within their own networks and rely on techniques such as proxy servers or Network Address Translation (NAT) to give those devices Internet connectivity. That need is why the private IP address ranges in RFC 1918 were created.

Based on RFC 1930, BGP reserves the range 64512–65535 for private use. These ASNs should not appear in the global BGP table for Internet routes.
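The arithmetic of the 16-bit ASN space and the private-range test fit in a few lines of Python. This sketch follows the RFC 1930 range stated in this lab. RFC 6996 later narrowed the 16-bit private range to 64512–65534, and RFC 7300 reserves 65535. The constant and function names are hypothetical.

```python
# 16-bit ASN space and the private range cited above (RFC 1930).
ASN16_MAX = 2**16 - 1                # 65535: 65536 values in total (0-65535)
PRIVATE_ASN16 = range(64512, 65536)  # 64512-65535 inclusive (range end is exclusive)


def is_private_asn(asn: int) -> bool:
    """Return True if a 16-bit ASN falls in the RFC 1930 private range."""
    if not 0 <= asn <= ASN16_MAX:
        raise ValueError("this sketch only covers 16-bit ASNs")
    return asn in PRIVATE_ASN16


if __name__ == "__main__":
    # ASNs used in the first part of this lab.
    for asn in (65001, 65002, 300, 400):
        print(asn, "private" if is_private_asn(asn) else "not private")
```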

Note

RFC 6793 defines 4-octet (32-bit) ASNs, and subsequent RFCs define ranges of public and private ASNs within the 4-octet space. This lab focuses on 16-bit ASNs to show the basic concepts.

In R4’s BGP table, the AS_PATH attribute for 1.1.1.1/32 includes the private ASNs 65002 and 65001, and the AS_PATH attribute for 2.2.2.2/32 includes the private ASN 65002. Both paths also include the public ASN 300. R3 and R4 are the only routers in the topology configured with public ASNs; therefore, private ASNs should not be exchanged between them.

R3 can be configured to remove the private ASNs and advertise the prefixes on behalf of its customer by using the remove-private-as option of its neighbor command for R4. The following configuration demonstrates the removal of these private ASNs on R3 with the neighbor 34.1.1.4 remove-private-as command:

On R3:

R3(config)#router bgp 300
R3(config-router)#neighbor 34.1.1.4 remove-private-as

This command causes R3 to remove all occurrences of private ASNs from the AS_PATH attribute. The following packet capture of the BGP UPDATE messages sent by R3 to R4 verifies this:

Internet Protocol Version 4, Src: 34.1.1.3, Dst: 34.1.1.4
Transmission Control Protocol, Src Port: 45745, Dst Port: 179, Seq: 262, Ack: 20, Len: 179
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 48
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 20
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: 300
        Path Attribute - NEXT_HOP: 34.1.1.3
    Network Layer Reachability Information (NLRI)
        2.2.2.2/32
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 48
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 20
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: 300
        Path Attribute - NEXT_HOP: 34.1.1.3
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: NEXT_HOP (3)
            Length: 4
            Next hop: 34.1.1.3
    Network Layer Reachability Information (NLRI)
        1.1.1.1/32
Border Gateway Protocol - UPDATE Message
Border Gateway Protocol - UPDATE Message

R4’s BGP table is also verified below. Notice that, unlike in the earlier output, the private ASNs 65002 and 65001 no longer show up in the AS_PATH attribute:

On R4:

R4#show ip bgp | begin Net
 
     Network          Next Hop              Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                               0 300 i
 *>   2.2.2.2/32       34.1.1.3                               0 300 i
 *>   3.3.3.3/32       34.1.1.3                 0             0 300 i
 *>   4.4.4.4/32       0.0.0.0                  0         32768 i

The topology is modified as shown below in order to demonstrate another scenario. The BGP configuration from earlier is removed from R1, R2, and R3. R1 is reconfigured in AS 65001. R2 now belongs to the public AS 200.

[Topology diagram]

The following commands remove the BGP configuration from R1, R2, and R3:

On R1:

R1(config)#no router bgp 65001

On R2:

R2(config)#no router bgp 65002

On R3:

R3(config)#no router bgp 300

The following commands reconfigure the eBGP peerings based on the diagram above:

On R1:

R1(config)#router bgp 65001
R1(config-router)#neighbor 12.1.1.2 remote-as 200
R1(config-router)#network 1.1.1.1 mask 255.255.255.255

On R2:

R2(config)#router bgp 200
R2(config-router)#neighbor 23.1.1.3 remote-as 300
R2(config-router)#neighbor 12.1.1.1 remote-as 65001
R2(config-router)#network 2.2.2.2 mask 255.255.255.255

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 12.1.1.1 Up

On R3:

R3(config)#router bgp 300
R3(config-router)#network 3.3.3.3 mask 255.255.255.255
R3(config-router)#neighbor 34.1.1.4 remote-as 400
R3(config-router)#neighbor 23.1.1.2 remote-as 200

You should see the following console messages:

%BGP-5-ADJCHANGE: neighbor 34.1.1.4 Up
%BGP-5-ADJCHANGE: neighbor 23.1.1.2 Up

In R4’s BGP table, notice that the path to 1.1.1.1/32 has the private ASN 65001 included in the AS_PATH attribute:

On R4:

R4#show ip bgp | begin Net
 
     Network          Next Hop           Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                         0 300 200 65001 i
 *>   2.2.2.2/32       34.1.1.3                         0 300 200 i
 *>   3.3.3.3/32       34.1.1.3           0             0 300 i
 *>   4.4.4.4/32       0.0.0.0            0         32768 i

Much as in the earlier scenario, the neighbor 34.1.1.4 remove-private-as command is issued on R3 to remove the private ASN 65001:

On R3:

R3(config)#router bgp 300
R3(config-router)#neighbor 34.1.1.4 remove-private-as
 
R3#clear ip bgp *

After clear ip bgp * is issued on R3, the BGP table on R4 is displayed again. Unlike in the earlier scenario, the remove-private-as command has no effect: the private ASN 65001 still appears in the path to the 1.1.1.1/32 prefix:

On R4:

R4#show ip bgp | begin Net
 
     Network          Next Hop           Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                         0 300 200 65001 i
 *>   2.2.2.2/32       34.1.1.3                         0 300 200 i
 *>   3.3.3.3/32       34.1.1.3           0             0 300 i
 *>   4.4.4.4/32       0.0.0.0            0         32768 i

This scenario is different from the previous one in a key area. In the previous scenario, R3 received only private ASNs in the AS_PATH attribute for paths received from R2. In this scenario, R3 receives a combination of public and private ASNs in the AS_PATH attribute for paths received from R2. It is this distinction that causes the behavior above. Cisco documents this situation as follows:

If the AS path includes both private and public AS numbers, the software considers this to be a configuration error and does not remove the private AS numbers.

Since the AS_PATH attribute for the 1.1.1.1/32 network contains both public and private ASNs, the software on R3 does not remove the private ASN. This was the only available behavior in IOS versions prior to 15.1(2)T. IOS 15.1(2)T and later introduced a keyword that modifies this behavior: you can explicitly tell the router to remove all private ASNs by appending the all keyword to the end of the remove-private-as command, as shown below:

On R3:

R3(config)#router bgp 300
R3(config-router)#neighbor 34.1.1.4 remove-private-as all
 
R3#clear ip bgp * soft out

After issuing a clear ip bgp * soft out on R3, R4’s BGP table verifies that the private ASN is no longer included in the AS_PATH attribute for the 1.1.1.1/32 network:

On R4:

R4#show ip bgp | begin Net
 
     Network          Next Hop             Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                            0 300 200 i
 *>   2.2.2.2/32       34.1.1.3                            0 300 200 i
 *>   3.3.3.3/32       34.1.1.3                 0          0 300 i
 *>   4.4.4.4/32       0.0.0.0                  0      32768 i

The routers are reconfigured according to the topology below to demonstrate another scenario:

Images

On R4:

R4(config)#no router bgp 400

On R3:

R3(config)#no router bgp 300

On R2:

R2(config)#no router bgp 200

On R1:

R1(config)#no router bgp 65001
 
R1(config)#router bgp 65001
R1(config-router)#network 1.1.1.1 mask 255.255.255.255
R1(config-router)#neighbor 12.1.1.2 remote-as 200

On R2:

R2(config)#router bgp 200
R2(config-router)#network 2.2.2.2 mask 255.255.255.255
R2(config-router)#neighbor 23.1.1.3 remote-as 65003
R2(config-router)#neighbor 12.1.1.1 remote-as 65001

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 12.1.1.1 Up
 

On R3:

R3(config)#router bgp 65003
R3(config-router)#network 3.3.3.3 mask 255.255.255.255
R3(config-router)#neighbor 34.1.1.4 remote-as 400
R3(config-router)#neighbor 23.1.1.2 remote-as 200

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 23.1.1.2 Up
 

On R4:

R4(config)#router bgp 400
R4(config-router)#network 4.4.4.4 mask 255.255.255.255
R4(config-router)#neighbor 34.1.1.3 remote-as 65003

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 34.1.1.3 Up

Now the show ip bgp command is issued on R4:

R4#show ip bgp | begin Net
 
     Network          Next Hop         Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                       0 65003 200 65001 i
 *>   2.2.2.2/32       34.1.1.3                       0 65003 200 i
 *>   3.3.3.3/32       34.1.1.3         0             0 65003 i
 *>   4.4.4.4/32       0.0.0.0          0         32768 i

Here, R4 receives private ASNs in the AS_PATH attribute for paths received from R3 once again. The neighbor 34.1.1.4 remove-private-as command is once again issued to have R3 remove the private ASNs from the BGP routing updates to R4:

On R3:

R3(config)#router bgp 65003
R3(config-router)#neighbor 34.1.1.4 remove-private-as
 
R3#clear ip bgp * soft out

As expected, the command doesn’t work. The private ASNs still exist in the AS_PATH attributes of the 1.1.1.1, 2.2.2.2, and 3.3.3.3 networks:

On R4:

R4#show ip bgp | begin Net
 
     Network          Next Hop         Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                       0 65003 200 65001 i
 *>   2.2.2.2/32       34.1.1.3                       0 65003 200 i
 *>   3.3.3.3/32       34.1.1.3         0             0 65003 i
 *>   4.4.4.4/32       0.0.0.0          0         32768 i

The result is expected because the AS_PATH attribute that R3 receives contains both public and private ASNs. As in the previous example, by default the software considers this an error and does not remove the private ASNs. The obvious solution would seem to be adding the all keyword. Doing so, however, produces the following message on R3:

On R3:

R3(config)#router bgp 65003
R3(config-router)#neighbor 34.1.1.4 remove-private-as all
 

%BGP: Private AS cannot be removed for local private as

This message is the router’s way of letting you know that using the all keyword in this case would mean removing the local private ASN as well. When R3 advertises paths to R4, it must add its own ASN, 65003, to the AS_PATH attribute. Using the path to the 1.1.1.1/32 prefix as an example, R3 prepends its own ASN, yielding the AS_PATH attribute value 65003 200 65001. If R3 were configured with the remove-private-as all command on its peering to R4, it would then have to remove all private ASNs. The only ASN that would remain is AS 200, which is NOT R3’s ASN. This is where the problem begins.

Whenever a router removes the private ASNs from an AS_PATH attribute, it is taking on responsibility for advertising those paths into the global BGP table on behalf of the autonomous systems that have only private ASNs. In other words, when R3 removes the private ASNs from the path to the 1.1.1.1/32 network when advertising to R4, it is abstracting those ASNs that are private and representing them as being advertised by its own AS. If R3 removes its own private ASN during this process, then it would appear to R4 that the paths are actually coming from AS 200 on R2. This can have negative impacts on the global BGP topology.

If, for example, R3 had other peers with public ASNs, paths R3 advertised to R4 could loop back to R3 through those other peers. This is because R3 has removed its own ASN from the AS_PATH attribute and has broken the primary loop-detection mechanism for BGP.

Removing ASNs from the AS_PATH attribute comes at a cost in the BGP table. The AS_PATH attribute is supposed to be a record of all autonomous systems a path passes through. This way, routing information loops can be detected and blocked at AS boundaries (which happens when an AS receives a BGP update with its own ASN in the AS_PATH attribute). Each advertising AS adds its ASN to the AS_PATH attribute. When this information is removed, the loop detection is broken.

Breaking loop detection doesn’t matter for stub autonomous systems, which are autonomous systems that do not provide transit for Internet traffic. Such an autonomous system is typically a customer peered with a single ISP. The ISP participates in global BGP routing and has its own public ASN. Since the customer is peered only with that ISP, it is safe for the ISP to remove the customer’s private ASN from paths received for customer prefixes and advertise those paths with its own ASN to its other public BGP peers. The customer cannot receive a looped path, because that would require the ISP itself to receive a looped path first.

Thus the proper design for removing private ASNs is the one presented in the earlier scenarios: the router removing the private ASNs needs its own public ASN, which it prepends to the paths it advertises to its neighbor. In this design, the router receiving the private ASNs is acting as a service provider, filtering out the private ASNs and advertising the paths into the public BGP table on behalf of its customers.

If a customer is multihomed to two ISPs, the customer should have its own public ASN that it advertises to both ISPs. If it uses a private ASN and both providers remove it, the customer AS could potentially receive a looped path, depending on local best-path decision policies in transit autonomous systems between the two ISPs. (The path from ISP1 is advertised to ISP2 and then advertised to the customer again.)
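The loop-detection failure described above can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not router code: the AS_PATH is treated as a flat list of ASNs (no AS_SETs or confederation segments), only the 16-bit private range 64512-65534 is recognized, and the example topology (customer, ISP1 in AS 100, a transit AS 300, ISP2 in AS 200) and the public customer ASN 500 are hypothetical.

```python
"""Toy model of AS_PATH loop detection for a multihomed customer.

Assumptions: AS_PATH is a flat list of ASNs; only 64512-65534 counts as private;
all ASNs in the example topology are hypothetical.
"""
from typing import List

ISP1_ASN, TRANSIT_ASN, ISP2_ASN = 100, 300, 200  # hypothetical ASNs


def prepend(as_path: List[int], asn: int) -> List[int]:
    """An eBGP speaker adds its own ASN in front of the path before advertising it."""
    return [asn] + as_path


def strip_private(as_path: List[int]) -> List[int]:
    """Drop 16-bit private ASNs, roughly as an ISP using remove-private-as would."""
    return [asn for asn in as_path if not 64512 <= asn <= 65534]


def accepts(as_path: List[int], local_asn: int) -> bool:
    """A receiver drops any update whose AS_PATH already lists its own ASN."""
    return local_asn not in as_path


def path_back_to_customer(customer_asn: int, isps_strip_private: bool) -> List[int]:
    """Customer -> ISP1 -> transit AS -> ISP2 -> back to the same customer."""
    path = prepend([], customer_asn)      # customer originates its prefix
    if isps_strip_private:
        path = strip_private(path)        # ISP1 removes the private ASN
    path = prepend(path, ISP1_ASN)
    path = prepend(path, TRANSIT_ASN)
    path = prepend(path, ISP2_ASN)        # ISP2 advertises the path to the customer
    return path


if __name__ == "__main__":
    public = path_back_to_customer(500, isps_strip_private=False)
    print(public, accepts(public, 500))         # [200, 300, 100, 500] False -> loop caught
    private = path_back_to_customer(65001, isps_strip_private=True)
    print(private, accepts(private, 65001))     # [200, 300, 100] True -> looped path accepted
```

With a public ASN the customer finds itself in the AS_PATH and rejects the update; once the private ASN has been stripped, nothing in the path identifies the customer, so the looped path is accepted.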

For these reasons, IOS warns about this condition and rejects the all keyword whenever the local ASN is also a private ASN. The BGP configuration on R3 confirms that the all keyword was not accepted:

R3#show run | section router bgp
 
router bgp 65003
 bgp log-neighbor-changes
 network 3.3.3.3 mask 255.255.255.255
 neighbor 23.1.1.2 remote-as 200
 neighbor 34.1.1.4 remote-as 400
 neighbor 34.1.1.4 remove-private-as

The routers are modified for the following topology.

Images

On R4:

R4(config)#no router bgp 400

On R3:

R3(config)#no router bgp 65003

On R2:

R2(config)#no router bgp 200

On R1:

R1(config)#no router bgp 65001

R1(config)#router bgp 65001
R1(config-router)#network 1.1.1.1 mask 255.255.255.255
R1(config-router)#neighbor 12.1.1.2 remote-as 65002

On R2:

R2(config)#router bgp 65002
R2(config-router)#network 2.2.2.2 mask 255.255.255.255
R2(config-router)#neighbor 23.1.1.3 remote-as 65003
R2(config-router)#neighbor 12.1.1.1 remote-as 65001

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 12.1.1.1 Up
 

On R3:

R3(config)#router bgp 65003
R3(config-router)#network 3.3.3.3 mask 255.255.255.255
R3(config-router)#neighbor 34.1.1.4 remote-as 400
R3(config-router)#neighbor 23.1.1.2 remote-as 65002

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 23.1.1.2 Up
 

On R4:

R4(config)#router bgp 400
R4(config-router)#network 4.4.4.4 mask 255.255.255.255
R4(config-router)#neighbor 34.1.1.3 remote-as 65003

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 34.1.1.3 Up

In this scenario, all routers except for R4 are using private ASNs. R4’s BGP table shows the following:

R4#show ip bgp | begin Net
 
     Network        Next Hop      Metric LocPrf Weight Path
 *>   1.1.1.1/32     34.1.1.3                    0 65003 65002 65001 i
 *>   2.2.2.2/32     34.1.1.3                    0 65003 65002 i
 *>   3.3.3.3/32     34.1.1.3         0          0 65003 i
 *>   4.4.4.4/32     0.0.0.0          0      32768 i

Once again, R3 is tasked with removing the private ASNs from the AS_PATH attribute it receives from R2 before advertising paths to R4, using the remove-private-as command:

On R3:

R3(config)#router bgp 65003
R3(config-router)#neighbor 34.1.1.4 remove-private-as

After R3 re-sends its updates (when it is forced to do so by the clear ip bgp * soft out command), R4’s BGP table shows the following:

R3#clear ip bgp * soft out

On R4:

R4#show ip bgp | begin Net
 
      Network          Next Hop             Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                              0 65003 i
 *>   2.2.2.2/32       34.1.1.3                              0 65003 i
 *>   3.3.3.3/32       34.1.1.3                 0            0 65003 i
 *>   4.4.4.4/32       0.0.0.0                  0        32768 i

Notice that in this situation, R3 successfully removed the private ASNs 65002 and 65001, while its own private ASN, 65003, remains. This behavior is consistent with the previous examples. Since there are no public ASNs in the AS_PATH attribute of the paths R3 receives from R2, R3 can remove all private ASNs except its own by using just the regular remove-private-as command. It does not attempt to remove its own private ASN because the all keyword has not been configured. If the all keyword were added, the IOS software on R3 would return the following error:

On R3:

R3(config)#router bgp 65003
R3(config-router)#neighbor 34.1.1.4 remove-private-as all
%BGP: Private AS cannot be removed for local private as

This error occurs for the same reason as in the previous scenario: R3 cannot remove its own ASN from the AS_PATH attribute. In this case, doing so would actually make R3 advertise an empty AS_PATH attribute to R4. With the configuration as is, from R4’s perspective, AS 65003 (R3) has originated the paths to the 1.1.1.1, 2.2.2.2, and 3.3.3.3 prefixes.

A final scenario for removing private ASNs is shown here:

Images

The routers are modified to reflect this topology.

On R4:

R4(config)#no router bgp 400

On R3:

R3(config)#no router bgp 65003

On R2:

R2(config)#no router bgp 65002

On R1:

R1(config)#no router bgp 65001
R1(config)#router bgp 100
R1(config-router)#network 1.1.1.1 mask 255.255.255.255
R1(config-router)#neighbor 12.1.1.2 remote-as 65002

On R2:

R2(config)#router bgp 65002
R2(config-router)#network 2.2.2.2 mask 255.255.255.255

R2(config-router)#neighbor 23.1.1.3 remote-as 300
R2(config-router)#neighbor 12.1.1.1 remote-as 100

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 12.1.1.1 Up
   

On R3:

R3(config)#router bgp 300
R3(config-router)#network 3.3.3.3 mask 255.255.255.255
R3(config-router)#neighbor 34.1.1.4 remote-as 400
R3(config-router)#neighbor 23.1.1.2 remote-as 65002

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 23.1.1.2 Up
 

On R4:

R4(config)#router bgp 400
R4(config-router)#network 4.4.4.4 mask 255.255.255.255
R4(config-router)#neighbor 34.1.1.3 remote-as 300

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 34.1.1.3 Up
 

To verify the configuration:

R4#show ip bgp | begin Net
 
     Network          Next Hop                  Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                         0 300 65002 100 i
 *>   2.2.2.2/32       34.1.1.3                         0 300 65002 i
 *>   3.3.3.3/32       34.1.1.3           0             0 300 i
 *>   4.4.4.4/32       0.0.0.0            0         32768 i

In this topology, AS 65002 exists between AS 100 and AS 300. It advertises paths to AS 300. AS 300 needs to remove the private ASN from the paths as it advertises to AS 400. The show ip bgp | begin Net command on R4 in AS 400 confirms the received private ASNs:

On R4:

R4#show ip bgp | begin Net
 
     Network          Next Hop         Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                      0 300 65002 100 i
 *>   2.2.2.2/32       34.1.1.3                      0 300 65002 i
 *>   3.3.3.3/32       34.1.1.3           0          0 300 i
 *>   4.4.4.4/32       0.0.0.0            0      32768 i

Building on the results of the prior examples, it’s clear that issuing only the remove-private-as command will not be sufficient to complete the task. The AS_PATH attribute that R3 receives from R2 (65002 100) contains both private and public ASNs, and once R3 prepends its own ASN, the private ASN sits in the middle of the path. To accomplish the task, the remove-private-as all command is used on R3’s neighbor statement for R4:

On R3:

R3(config)#router bgp 300
R3(config-router)#neighbor 34.1.1.4 remove-private-as all

After clearing the BGP updates outbound on R3, R4’s BGP table shows the following:

R3#clear ip bgp * soft out

On R4:

R4#show ip bgp | begin Net
 
     Network          Next Hop           Metric LocPrf Weight Path
 *>   1.1.1.1/32       34.1.1.3                        0 300 100 i
 *>   2.2.2.2/32       34.1.1.3                        0 300 i
 *>   3.3.3.3/32       34.1.1.3             0          0 300 i
 *>   4.4.4.4/32       0.0.0.0              0      32768 i

R3 removes the private ASN from the middle of the AS_PATH attribute. From R4’s perspective, AS 300 learned the path directly from AS 100; R4 has no knowledge of AS 65002 sitting in between. R3 has advertised the path on behalf of AS 65002.

To summarize this section: the remove-private-as command applied to a neighbor removes the private ASNs from an AS_PATH attribute only when the received AS_PATH attribute contains nothing but private ASNs; the local ASN that the router prepends is never removed. When public and private ASNs are mixed, the all keyword must be added to the command. The all keyword is accepted only if the local ASN is a public ASN.
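The rules summarized above can be captured in a short Python model, for example to predict what R4 should see before clearing the sessions. This is a hypothetical sketch inferred from the lab output in this section, not Cisco’s implementation: the function name and mode strings are made up for illustration, the private ranges are assumed to be those of RFC 6996 (64512-65534 and 4200000000-4294967294), the AS_PATH is treated as a flat sequence, and only the behaviors demonstrated here are modeled (confederations and other special cases IOS may handle are ignored).

```python
"""Hypothetical model of the remove-private-as behavior observed in this section.

Assumptions (not taken from IOS source):
- Private ASNs are 64512-65534 and 4200000000-4294967294 (RFC 6996).
- The AS_PATH is a flat AS_SEQUENCE; AS_SETs and confederation segments are ignored.
- Private ASNs are stripped from the path as received, then the local ASN is prepended.
"""
from typing import List


def is_private(asn: int) -> bool:
    return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294


def outbound_as_path(received: List[int], local_asn: int, mode: str = "") -> List[int]:
    """Return the AS_PATH sent to an eBGP peer.

    mode: "" (no removal), "default" (remove-private-as), "all" (remove-private-as all)
    """
    if mode == "all":
        if is_private(local_asn):
            # IOS rejects the keyword with this console message.
            raise ValueError("Private AS cannot be removed for local private as")
        kept = [asn for asn in received if not is_private(asn)]
    elif mode == "default" and all(is_private(asn) for asn in received):
        kept = []  # only private ASNs were received, so all of them are removed
    else:
        kept = list(received)  # no removal configured, or a mixed public/private path
    return [local_asn] + kept  # the local ASN is always prepended and never removed


if __name__ == "__main__":
    print(outbound_as_path([200, 65001], 300, "default"))      # [300, 200, 65001]
    print(outbound_as_path([200, 65001], 300, "all"))          # [300, 200]
    print(outbound_as_path([65002, 65001], 65003, "default"))  # [65003]
    print(outbound_as_path([65002, 100], 300, "all"))          # [300, 100]
```

The four calls mirror the scenarios above: a mixed path left untouched by the default command, the same path cleaned with all, an all-private path trimmed down to R3’s own private ASN, and a private ASN removed from the middle of the path.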

Lab 10: AS Migration

Images

This lab should be conducted on the Enterprise POD.

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-10- AS Migration in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to the initial-config folder → BGP folder → Lab-10.

Task 1

Configure R1 in AS 100 to establish an eBGP session with R2 in AS 200. Ensure that these routers advertise their Loopback0 interfaces.

Task 2

Configure R1 in AS 111 to establish an eBGP peering session with R2 in AS 200 such that the output of the show ip bgp command on these two routers will be identical to the following:

On R1:

R1#show ip bgp | begin Network
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.0.0.0          0.0.0.0                  0         32768 i
*> 2.0.0.0          12.1.1.2                 0             0 100 200 i

On R2:

R2#show ip bgp | begin Network
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.0.0.0          12.1.1.1                 0             0 100 111 i
*> 2.0.0.0          0.0.0.0                  0         32768 i

Task 3

Configure R1 such that when R2 advertises network 2.0.0.0/8, the output of the show ip bgp command on R1 resembles the following. Do not remove any commands to accomplish this task.

On R1:

R1#show ip bgp | begin Network
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.0.0.0          0.0.0.0                  0         32768 i
*> 2.0.0.0          12.1.1.2                 0             0 200 i

Task 4

Configure R1 such that the output of the show ip bgp command on R2 is identical to the following. Do not remove any commands to accomplish this task.

On R2:

R2#show ip bgp | begin Network
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.0.0.0          12.1.1.1                 0             0 100 i
*> 2.0.0.0          0.0.0.0                  0         32768 i

Task 5

Configure R1 such that R2 can establish an eBGP peering session with R1 using AS 111 or AS 100.

Task 6

Erase the startup configuration and reload the routers before proceeding to the next lab.

Lab 11: BGP Best-Path Algorithm: A Walkthrough

This lab should be conducted on the MOCK LAB POD.

Lab Setup:

If you are using EVE-NG, and you have imported the EVE-NG topology from the EVE-NG-Topology folder, ignore the following and use Lab-11- BGP BestPath Algorithm a Walk Through in the BGP folder in EVE-NG.

To copy and paste the initial configurations, go to the initial-config folder → BGP folder → Lab-11.

Introduction

This lab follows a guided lab format focused on explaining the BGP best-path algorithm. The lab explores how and why BGP uses such an algorithm for making decisions about which paths should be added to the local RIB of a router. The lab begins with a brief introduction to core BGP concepts and then elaborates on BGP path attributes as they relate to the processing of the BGP best-path algorithm. These examples are demonstrated in a sample topology with real output from IOS devices. The lab assumes a basic understanding of configuring BGP peers and knowledge of the difference between an internal peer and an external peer.

Building Blocks of BGP

To understand BGP’s place in the modern network, you must first be familiar with the concept of autonomous systems. An autonomous system (AS) is a set of networking equipment that belongs to the same governing body. The Internet is composed of many autonomous systems. Each AS advertises reachability to its networks, which host resources such as websites, FTP servers, or other services made available across the public Internet. To exchange this routing information dynamically, a routing protocol was needed that could run between these autonomous systems.

BGP is the routing protocol of choice for carrying these routes. It is the successor to EGP, which was the original exterior gateway protocol, and it added many features that EGP lacked. BGP gained this position because it was engineered to be scalable, flexible, stable, and tunable, and it maintains these four properties through its unique operation.

Path Vector

Routing protocols can generally be classified into two categories: link-state and distance vector protocols. Link-state routing protocols operate by advertising the status of connected links—such as cost and what other devices are connected to those links—to neighboring routers. Neighboring routers then use this information to build a graph of the overall network topology. Routes are calculated as shortest path computations against points on the resulting graph. Distance vector protocols operate by advertising to a neighboring router all networks reachable by the local router and the local router’s cost to each network.

In both link-state and distance vector protocols, the prefixes are advertised based on the physical connections of the network. These connections have static metrics assigned to them that are aggregated along the way. The metric can be cost, delay, bandwidth, or any other quantifiable value. It is important to note that these characteristics are attached to the physical links of the network. As a result, if a link goes down or is added to the network, the aggregate metric values contained in the routing updates need to be updated accordingly. This adds a degree of instability to the overall network design.

Such instability would cause constant route updates across the public Internet. If an internal link in AS A goes down, for example, AS B does not need to be aware of it. It is up to AS A to route around the failure. For this reason, BGP needs a different way to calculate best paths that doesn’t rely on metrics attached to physical links. Instead, BGP uses the concept of paths.

Rather than describing links of an internal network, BGP describes virtual paths through an AS. When a BGP peer advertises a prefix, it isn’t advertising a physical connection but advertising the availability of a path that can be used to transit traffic. BGP peers exchange path information with each other and glean reachability information for all available paths between all autonomous systems participating in the global Internet routing table.

This advertisement and collection of different paths is one of the reasons BGP is called a path vector protocol. Rather than make decisions based on a calculated link-state graph or explicitly advertise all prefixes with metric assignments, BGP advertises paths. What kind of path, where the path goes, which BGP router advertised the path, which BGP router is the next step in the path, and how the path was first learned are all issues that are addressed through BGP path attributes.

Controlling Routing to Paths

The entire reason BGP advertises routing information as paths is to provide mechanisms to control traffic flow. When speaking about BGP routing, there are two kinds of traffic flows: local traffic and transit traffic. Local traffic is traffic that originates from or is destined to a node (that is, a host, a server, or another network device) within the local AS. Transit traffic is traffic that originates outside the local AS and is destined to another AS that is external to the local AS. A transit AS is an AS that acts as an intermediary point between two or more autonomous systems.

Think of it like a business that spans a large geographic region containing multiple buildings. That business requires roads to connect all of its buildings together to allow employees and supplies to enter and leave the establishment. The business also requires a road that allows access to the main road utilized by all other citizens. Within the business’s compound, traffic traveling the internal roads is always destined for a building owned by the company. A normal citizen looking for access to another business would not go through this business to reach their destination. This business does not provide transit for any other businesses on its own local roads.

Some businesses are located in shared geographic areas where internal connections allow access to two separate businesses. Each business has a separate connection to the main road. The difference is that a citizen who wishes to access Business B could use Business A’s connection to the main road, pass through Business A’s internal roads, and access Business B. In this case, Business A has become a transit business because it can carry traffic that neither originates from nor is destined to a building it controls.

Using BGP, a network administrator can set policies, based on their own autonomous system’s corporate policies, regarding whether an AS is a transit AS and how traffic is sent and received from the Internet. These policies are similar to a business employing a gate at their main connection to the road. The security guards at the gate can deny or allow traffic based on a configured set of rules (policies); these rules are based on the attributes of the car wishing to transit the business. BGP paths have the same system of attributes assigned to each potential path. These attributes, called path attributes, are the main way administrators can influence the path selection process.

Path Attributes

BGP path attributes are descriptors attached to a BGP path that describe what kind of path it is. These descriptors include the destination network, the originating router, and the list of all autonomous systems that are traversed along the path. Each attribute is classified as one of the following:

  • Well-known mandatory: The attribute must be recognized by all BGP-speaking routers and must be included in every BGP update that advertises a path.

  • Well-known discretionary: The attribute must be recognized by all BGP-speaking routers but does not have to be included in every BGP update.

  • Optional transitive: The attribute does not have to be recognized by all BGP-speaking routers. A router that does not recognize it still passes it along to its peers, marked as partial.

  • Optional non-transitive: The attribute does not have to be recognized by all BGP-speaking routers. A router that does not recognize it quietly ignores it and does not pass it along to other peers.

For example, the BGP AS_PATH attribute is a well-known mandatory attribute: it must be included in all BGP updates and must be understood by all BGP implementations. The ORIGINATOR_ID attribute, on the other hand, is an optional non-transitive attribute. Not all BGP-speaking routers need to understand ORIGINATOR_ID; it is set by route reflectors for use inside the local AS and is not advertised to eBGP peers.
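The well-known/optional and transitive/non-transitive classification is carried in the Attribute Flags octet of each path attribute (RFC 4271, section 4.3); the Wireshark decode earlier in this chapter shows NEXT_HOP with flags 0x40. The Python sketch below decodes that octet and shows how a router treats an optional attribute it does not recognize. Mandatory versus discretionary is not encoded in the flags; it is defined per attribute type by the specification. The function names are hypothetical.

```python
"""Decode the BGP Attribute Flags octet (bit values from RFC 4271, section 4.3)."""
from typing import Optional

OPTIONAL = 0x80         # 0 = well-known, 1 = optional
TRANSITIVE = 0x40       # always 1 for well-known attributes
PARTIAL = 0x20          # set when an unrecognized optional transitive attribute was passed on
EXTENDED_LENGTH = 0x10  # attribute length field is 2 octets instead of 1


def classify(flags: int) -> str:
    """Return a human-readable description of the flags octet."""
    kind = "Optional" if flags & OPTIONAL else "Well-known"
    transitivity = "transitive" if flags & TRANSITIVE else "non-transitive"
    completeness = "partial" if flags & PARTIAL else "complete"
    return f"{kind} {transitivity}, {completeness}"


def handle_unrecognized(flags: int) -> Optional[int]:
    """Flags to re-advertise for an attribute this router does not recognize, or None to drop it."""
    if not flags & OPTIONAL:
        # Every router must recognize well-known attributes; RFC 4271 treats this as an UPDATE error.
        raise ValueError("unrecognized well-known attribute")
    if flags & TRANSITIVE:
        return flags | PARTIAL  # optional transitive: pass it along, marked partial
    return None                 # optional non-transitive: quietly ignore, do not propagate


if __name__ == "__main__":
    print(classify(0x40))                   # Well-known transitive, complete (NEXT_HOP above)
    print(classify(0x80))                   # Optional non-transitive, complete (e.g. ORIGINATOR_ID)
    print(hex(handle_unrecognized(0xC0)))   # 0xe0
    print(handle_unrecognized(0x80))        # None
```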

Path attributes provide the framework for BGP’s decision-making algorithm when it encounters two paths to the same destination and can be modified by the administrator to achieve specific traffic flow goals.

Modifying Path Attributes

Let’s go back to the earlier example of the business with a security gate. If the business has two entrances, one for customers and one for freight trucks, the business could specify in its advertisements to the city which entrance each kind of visitor should use. The business would be modifying its advertisements outbound toward the rest of the city, but it would be affecting how the rest of the city enters its own establishment.

The same business can have a policy that all goods delivered to customers with large orders should leave through the freight entrance as well. Such orders are marked as they are received by the business to keep them separate from normal orders. In this case, the order received inbound from the customer determines how the company uses its exits outbound toward the city.

This concept applies to how path attributes are modified and advertised between BGP peers. Path attributes can be modified inbound, as paths received from a BGP peer enter the local BGP table, or outbound, as the local BGP router advertises paths to another BGP peer.

In general, outbound path attribute updates affect how traffic enters the AS, and inbound path attribute updates affect how traffic leaves the AS. The exact effect of a particular change depends on the specific attribute being modified and its position with regard to BGP’s decision-making algorithm.

The Best-Path Algorithm

By default, BGP advertises only a single path for each prefix to its BGP peers. This path, called the best path, is also the path that is offered for installation in the local router’s RIB. A common occurrence in routing protocols is receiving several advertisements for the same destination. IGPs use metrics as tie-breakers in this event, but BGP does not have a concept of traditional metrics. Instead, it relies on path attributes to make its routing decisions.

BGP consumes the information provided in path attributes in order to choose a best path from among several competing paths. It does so in a step-by-step elimination process, in which a particular attribute is compared between the two competing paths, and preference is given to a specific value of that attribute. If the values tie, BGP continues to the next step in the algorithm.

Assuming that the next-hop IP address specified in the BGP updates is reachable by the local router, the BGP best-path algorithm proceeds step by step as follows:

  1. Prefer the path with the higher WEIGHT attribute.

  2. Prefer the path with the higher LOCAL_PREF attribute.

  3. Prefer locally originated routes (injected with the network or aggregate-address command or through redistribution).

  4. Prefer the path with the shortest AS_PATH attribute.

  5. Prefer the path with the lowest ORIGIN type: IGP (i, typically injected with the network command) is preferred over EGP (e), which is preferred over INCOMPLETE (?, typically learned through redistribution).

  6. Prefer the path with the lowest MED value.

  7. Prefer a path learned from an external (eBGP) peer over a path learned from an internal (iBGP) peer.

  8. Prefer the path with the lowest IGP metric to the BGP next hop.

  9. If all of the above tie and BGP multipath is configured, multiple paths may be installed in the RIB, but evaluation continues in order to select a single best path.

  10. If both competing paths are external paths, prefer the path that was learned first (the oldest path).

  11. Prefer the path from the peer with the lowest BGP router ID.

  12. Prefer the path with the shortest CLUSTER_LIST length.

  13. Prefer the path learned from the peer with the lowest peering IP address.
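The elimination order above can be sketched in Python as a pairwise comparison in which the first step where two paths differ decides the winner. This is a simplified model, not Cisco’s implementation: the `Path` fields, the helper names, and the sample values are hypothetical; MED is compared between any two paths (IOS by default compares MED only between paths from the same neighboring AS); step 9 is skipped because multipath never eliminates a path; and a list of candidates is folded pairwise, which simplifies the order in which IOS actually compares paths.

```python
"""Simplified model of the BGP best-path steps listed above.

Assumptions: field names and sample values are hypothetical; MED is compared
between any two paths; step 9 (multipath) is omitted because it never
eliminates a path.
"""
from dataclasses import dataclass
from functools import reduce
from ipaddress import IPv4Address
from typing import List

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}  # IGP is preferred over EGP over INCOMPLETE


@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 0
    origin: str = "i"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    age_rank: int = 0          # lower value = learned earlier (older)
    router_id: str = "0.0.0.0"
    cluster_list_len: int = 0
    peer_ip: str = "0.0.0.0"


def best_of(a: Path, b: Path) -> Path:
    """Walk the steps in order; the first step where the paths differ decides."""
    # Each entry: (value for a, value for b, True if the higher value wins)
    steps = [
        (a.weight, b.weight, True),                                # 1
        (a.local_pref, b.local_pref, True),                        # 2
        (a.locally_originated, b.locally_originated, True),        # 3
        (a.as_path_len, b.as_path_len, False),                     # 4
        (ORIGIN_RANK[a.origin], ORIGIN_RANK[b.origin], False),     # 5
        (a.med, b.med, False),                                     # 6
        (a.ebgp, b.ebgp, True),                                    # 7
        (a.igp_metric, b.igp_metric, False),                       # 8
    ]
    if a.ebgp and b.ebgp:
        steps.append((a.age_rank, b.age_rank, False))              # 10: oldest external
    steps += [
        (IPv4Address(a.router_id), IPv4Address(b.router_id), False),  # 11
        (a.cluster_list_len, b.cluster_list_len, False),               # 12
        (IPv4Address(a.peer_ip), IPv4Address(b.peer_ip), False),      # 13
    ]
    for value_a, value_b, higher_wins in steps:
        if value_a != value_b:
            a_wins = value_a > value_b if higher_wins else value_a < value_b
            return a if a_wins else b
    return a  # identical on every compared attribute


def best_path(paths: List[Path]) -> Path:
    """Fold the candidates pairwise (a simplification of how IOS orders comparisons)."""
    return reduce(best_of, paths)


if __name__ == "__main__":
    via_r2 = Path("via R2", as_path_len=2, router_id="2.2.2.2", peer_ip="12.1.1.2")
    via_r3 = Path("via R3", as_path_len=1, router_id="3.3.3.3", peer_ip="13.1.1.3")
    print(best_path([via_r2, via_r3]).name)  # via R3: shorter AS_PATH (step 4)
    via_r2.local_pref = 200
    print(best_path([via_r2, via_r3]).name)  # via R2: higher LOCAL_PREF (step 2)
```

The usage example shows why the order matters: raising LOCAL_PREF on the longer path overrides the AS_PATH comparison, because step 2 is evaluated before step 4.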

This algorithm provides two important things to the operation of BGP. First, it ensures that, no matter how the path attributes of two competing paths differ, a single best path is always chosen. Second, it allows an administrator to manipulate the path attributes in order to influence BGP’s best-path decision.

The remainder of this lab examines all the steps of the BGP decision-making algorithm and how specific path attributes can be modified to influence BGP’s decision making. It also points out cases where modifying an attribute inbound rather than outbound makes a difference in specific traffic flows.

The topology for this lab is located in the initial-config folder → BGP folder, BGP-Topology.pdf.

The following section describes the topology setup. The initial configurations for all the switches and routers are found in the initial configuration folder.

[--L2 Switching--]

  1. SW1 - SW5 are used for this topology

    1. R1, R2, R3, R4, R5, R6, R7, R8 are connected to SW1

    2. R9 and R10 are connected to SW2

    3. R11, R12, R13, R14, and R15 are connected to SW3

    4. R21 is connected to SW4

    5. R16, R17, R18, R19, and R20 are connected to SW5

  2. The router-to-switch connections are configured as trunk ports

  3. All x/1 interfaces (where x is the switch number) between the switches are configured as trunk ports

[--IP Addressing--]

  1. Configure IPv4 addresses on the physical and logical interfaces as indicated in the diagram

    1. Each router in the topology is configured with a loopback 1 interface. The IP address format for the loopback 1 interface is x.x.x.x/32, where x is the router number. For example, R20’s loopback 1 interface is configured with the IP address 20.20.20.20/32

[--BGP Configuration--]

  1. AS 300

    1. All BGP peerings within AS 300 are established over Loopback 1 interfaces. AS 300 is subdivided as follows:

      1. R1 and R2 are iBGP peers in confederation sub-AS 312

      2. R4 and R5 are iBGP peers in confederation sub-AS 345

      3. R7 and R8 are iBGP peers in confederation sub-AS 378

      4. R3 and R6 are iBGP peers in confederation sub-AS 336

      5. R1 and R4 are eBGP confederation peers

      6. R2 and R3 are eBGP confederation peers

      7. R3 and R4 are eBGP confederation peers

      8. R5 and R7 are eBGP confederation peers

      9. R6 and R8 are eBGP confederation peers

  2. AS 100

    1. R19 and R20 use their loopback 1 interface for the iBGP peering session between them

  3. AS 200

    1. R16, R17, and R18 use their loopback 1 interface for the iBGP peering session between them

  4. AS 400

    1. All routers in AS 400 use their loopback 1 interfaces for their iBGP peerings

    2. R11 reflects routes to R10 and R14

    3. R12 reflects routes to R9, R10, R11, and R13

    4. R13 reflects routes to R14 and R15

    5. R9 and R10 are iBGP peers

  5. eBGP peerings are established between:

    1. R2 - R20

    2. R2 - R16

    3. R2 - R10

    4. R3 - R10

    5. R6 - R11

    6. R18 - R19

    7. R10 - R17

    8. R9 - R17

Step 1: WEIGHT

Note

Before starting this section, revert the configuration on all the routers to the base initial configuration files provided with the lab.

The first step in the best-path algorithm, which takes precedence over all other steps, compares a path attribute called WEIGHT. The WEIGHT attribute is Cisco-proprietary, so this step is performed only on Cisco routers. It is a 16-bit number with a valid range of 0 to 65,535. The attribute is an optional, non-transitive attribute, meaning its value is significant only to the local router and is not exchanged with neighboring BGP routers in UPDATE messages.

By default, the router sets the WEIGHT to 32,768 for all paths it injects into the BGP table with the network, redistribute, or aggregate-address command. Paths injected this way are considered local paths, as the show ip bgp 110.19.1.1 output on R19 below shows:

On R19:

R19#show run | section router bgp
 
router bgp 100
 bgp log-neighbor-changes
 network 110.19.1.1 mask 255.255.255.255
 
R19#show ip bgp 110.19.1.1
 
BGP routing table entry for 110.19.1.1/32, version 2
Paths: (1 available, best #1, table default)
  Advertised to update-groups:
     1          2
  Refresh Epoch 1
  Local
    0.0.0.0 from 0.0.0.0 (19.19.19.19)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, sourced, local, best
      rx pathid: 0, tx pathid: 0x0

R19 is configured to inject a path to the 110.19.1.1 network into BGP. This path is flagged in its BGP table as a local route.

All other paths received through BGP updates have their WEIGHT attributes initialized to 0. These defaults can be modified per neighbor or per prefix. Because the WEIGHT attribute is non-transitive, it cannot be set in the outbound direction for a neighbor. If such a configuration is attempted, the router rejects it with the message “% ‘WEIGHT’ used as BGP outbound route-map, set weight not supported.” This restriction leaves only two ways of manually setting the WEIGHT attribute on a router:

  • Per neighbor, using the neighbor x.x.x.x weight weight command

  • Per prefix, using an inbound route map with a set weight clause attached to the neighbor command

The non-transitive property of the WEIGHT attribute also ensures that a router cannot influence the WEIGHT value of its BGP peer. The WEIGHT attribute’s position as first in the best-path algorithm and its inability to be affected by neighboring BGP routers give the administrator complete control over what the local router ultimately chooses as its best path, regardless of the interactions of other path attributes.

The WEIGHT attribute can be used to force the local router to prefer one path over another in its BGP table. When deciding between two paths to the same prefix in the BGP table, the router chooses the path with the higher WEIGHT value. Because of this preference for higher WEIGHT values, all paths originated by the router itself (through the network, redistribute, or aggregate-address commands) are preferred by default over paths learned from other routers, thanks to their default WEIGHT of 32,768.

This point can be observed below. R20 is configured to inject paths to the 110.20.1.1 and 110.20.2.1 prefixes into its BGP table using a network command:

On R20:

router bgp 100
 network 110.20.1.1 mask 255.255.255.255
 network 110.20.2.1 mask 255.255.255.255
 
R20#show ip bgp
 
BGP table version is 16, local router ID is 20.20.20.20
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop             Metric LocPrf Weight Path
 *>i  110.19.1.1/32    19.19.19.19              0    100      0 i
 *>i  110.19.2.1/32    19.19.19.19              0    100      0 i
 *>   110.20.1.1/32    0.0.0.0                  0         32768 i
 *>   110.20.2.1/32    0.0.0.0                  0         32768 i
 *>i  120.18.1.1/32    19.19.19.19              0    100      0 200 i
 *                     200.2.20.2                             0 300 200 i
 *>i  120.18.2.1/32    19.19.19.19              0    100      0 200 i
 *                     200.2.20.2                             0 300 200 i
 *>   130.7.1.1/32     200.2.20.2                             0 300 i
 * i  140.15.1.1/32    19.19.19.19              0    100      0 200 400 i
 *>                    200.2.20.2                             0 300 400 i
 * i  140.15.2.1/32    19.19.19.19              0    100      0 200 400 i
 *>                    200.2.20.2                             0 300 400 i

The WEIGHT values for those locally injected paths are set to 32768 in the output. In contrast, the WEIGHT values for the BGP paths learned from R19 and R2 (for example, 110.19.1.1/32, 110.19.2.1/32, 120.18.1.1/32, and 120.18.2.1/32) are set to 0.

As stated above, the administrator can manually modify the WEIGHT attribute for a particular path to influence the local router’s outbound routing decision. In the output above, R20 learns paths to the 120.18.1.1 and 120.18.2.1 prefixes from its iBGP neighbor R19 and its eBGP neighbor R2. Because the WEIGHT values for these paths are tied, the path via R19 is chosen as best based on its shorter AS_PATH attribute length (step 4 in the best-path algorithm, explained in detail later). To demonstrate that WEIGHT takes precedence over AS_PATH length, R20 will be configured to prefer the path via R2 instead of R19 for the 120.18.1.1 prefix.

Recall that the WEIGHT value can be assigned directly with the neighbor x.x.x.x weight command. However, that would apply to all paths learned from that neighbor, which is not the desired outcome of this example. Instead, a route map with a set weight clause will be configured. The route map uses a prefix list to target only the 120.18.1.1 prefix and will be applied inbound to R20’s neighbor statement for R2, as shown below.

Note

The neighbor x.x.x.x weight command and an inbound route map can be used simultaneously to set WEIGHT for paths coming from a particular neighbor.

In such a situation, the inbound route map takes precedence over the neighbor x.x.x.x weight command for the paths it sets a WEIGHT on. Any path that is permitted by the inbound route map without a set weight clause receives the WEIGHT set by the neighbor x.x.x.x weight command.
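The precedence rules in this note, together with the defaults described earlier, can be summarized in a short Python sketch. This is a model, not IOS behavior verified line by line. The function names, the dict representation of the route map’s set weight clauses, and the neighbor weight of 100 in the usage lines are hypothetical. Locally originated paths are assumed to keep the default of 32,768.

```python
# Defaults described in this section
WEIGHT_LOCAL = 32768     # paths injected with network/redistribute/aggregate-address
WEIGHT_RECEIVED = 0      # paths received in BGP UPDATEs


def assign_weight(prefix, *, local=False, neighbor_weight=None, inbound_map=None):
    """Return the WEIGHT the local router gives a path to `prefix`.

    neighbor_weight models `neighbor x.x.x.x weight <n>` (None = not configured).
    inbound_map models the `set weight` clauses of an inbound route map as a
    {prefix: weight} dict (None = no route map applied).
    """
    if local:
        return WEIGHT_LOCAL
    if inbound_map is not None and prefix in inbound_map:
        return inbound_map[prefix]      # route map wins for the paths it sets
    if neighbor_weight is not None:
        return neighbor_weight          # every other path from this neighbor
    return WEIGHT_RECEIVED


def check_weight_direction(direction):
    """WEIGHT never leaves the router, so set weight is accepted inbound only."""
    if direction != "in":
        raise ValueError("set weight is not supported in an outbound route map")


tst = {"120.18.1.1/32": 32768}   # hypothetical model of R20's route map in this lab
print(assign_weight("120.18.1.1/32", neighbor_weight=100, inbound_map=tst))  # 32768
print(assign_weight("120.18.2.1/32", neighbor_weight=100, inbound_map=tst))  # 100
print(assign_weight("110.20.1.1/32", local=True))                            # 32768
print(assign_weight("120.18.2.1/32"))                                        # 0
```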

In the following configuration, the prefix 120.18.1.1 is first identified and permitted in a prefix list called 123. Then a route map called tst is created that references the 123 prefix list and contains a set weight clause with a value of 32768. Finally, the route map is applied to the neighbor statement for the eBGP peer 200.2.20.2 in the inbound direction:

On R20:

R20(config)#ip prefix-list 123 permit 120.18.1.1/32
 
R20(config)#route-map tst permit 10
R20(config-route-map)#match ip address prefix 123
R20(config-route-map)#set weight 32768
R20(config)#route-map tst permit 90
 
R20(config)#router bgp 100
R20(config-router)#neighbor 200.2.20.2 route-map tst in

Note

The relationship between route maps and prefix lists (and access lists) has two fundamental properties:

  • The prefix list or access list identifies prefixes that will be manipulated by the route map.

  • The route map manipulates those prefixes (or paths to prefixes, in the case of BGP), or permits or denies them for advertisement or acceptance by the local router.

When using prefix lists or access lists for BGP path attribute modifications, the following rules apply:

  • A permit action in a prefix list indicates a prefix that will be matched.

  • A deny action in a prefix list indicates prefixes that will not be matched.

Similarly, when working with route maps for the same purpose, the following rules apply:

  • A permit route map statement allows the path to be accepted or advertised if it matches all match clauses.

  • A deny route map statement disallows the path from being accepted or advertised if it matches all match clauses.

The implicit deny at the end of a prefix list or access list prevents all prefixes that are not matched by a permit statement from being manipulated by that particular route map statement. Those prefixes can still be processed by subsequent route map statements.

The implicit deny at the end of a route map causes all prefixes that are not matched by any route map statement to be filtered out, blocking them from being advertised or accepted. For this reason, it is important to include an empty permit route map statement (containing no match clauses) at the end of a route map to ensure that unaffected prefixes pass through safely and are advertised or accepted.
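The evaluation rules in this note can be sketched in Python. Route map statements are checked in sequence order. A statement whose match clause fails passes the path to the next statement. The first statement that matches either drops the path (deny) or applies its set clauses (permit). A path that no statement matches hits the implicit deny. The classes below are hypothetical. Only exact prefix/length matches are modeled; ge/le ranges and multiple match clauses are not.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrefixList:
    """Ordered (action, prefix) entries with an implicit deny at the end."""
    entries: list

    def matches(self, prefix: str) -> bool:
        net = ipaddress.ip_network(prefix)
        for action, entry in self.entries:
            if ipaddress.ip_network(entry) == net:
                return action == "permit"
        return False  # implicit deny: the prefix is simply "not matched"


@dataclass
class RouteMapEntry:
    seq: int
    action: str                          # "permit" or "deny"
    match: Optional[PrefixList] = None   # None: no match clause, matches every path
    sets: dict = field(default_factory=dict)


def apply_route_map(entries, prefix, attrs):
    """Return the (possibly modified) attributes, or None if the path is dropped."""
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.match is not None and not entry.match.matches(prefix):
            continue                     # try the next sequence number
        if entry.action == "deny":
            return None
        return {**attrs, **entry.sets}   # first matching statement wins
    return None                          # implicit deny at the end of the route map


pl_123 = PrefixList([("permit", "120.18.1.1/32")])
tst = [RouteMapEntry(10, "permit", pl_123, {"weight": 32768}),
       RouteMapEntry(90, "permit")]      # empty permit statement
print(apply_route_map(tst, "120.18.1.1/32", {"weight": 0}))      # {'weight': 32768}
print(apply_route_map(tst, "120.18.2.1/32", {"weight": 0}))      # {'weight': 0}
print(apply_route_map(tst[:1], "120.18.2.1/32", {"weight": 0}))  # None
```

The last line shows why the empty permit statement matters: without sequence 90, the path to 120.18.2.1/32 falls through to the implicit deny and is dropped.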

With the above configured, clear ip bgp * soft in is issued on R20. R20 sends a route refresh message to all its neighbors, asking them to re-send their routing advertisements. When R20 receives the refreshed BGP updates, it processes the inbound route map, assigning the WEIGHT value of 32768 to R2’s path for the prefix 120.18.1.1.

Since the WEIGHT value has higher precedence than the AS_PATH attribute, R20 chooses the path via R2 as best. Notice that the best path to 120.18.2.1 has not been affected. Traffic destined to the 120.18.1.1 network from R20 will now transit AS 300, and traffic destined to the 120.18.2.1 network from R20 will be sent to R19.

R20#show ip bgp regexp _200$
 
BGP table version is 48, local router ID is 20.20.20.20
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop             Metric LocPrf Weight Path
 *>  120.18.1.1/32    200.2.20.2                     32768 300 200 i
 * i                  19.19.19.19          0    100      0 200 i
 *   120.18.2.1/32    200.2.20.2                         0 300 200 i
 *>i                  19.19.19.19          0    100      0 200 i

Observing R20’s advertisement of the 120.18.1.1 network to R19 helps confirm the non-transitive property of the WEIGHT attribute. A quick look at R19’s BGP table for its paths to the 120.18.1.1/32 prefix reveals a missing WEIGHT value for the path received from R20:

On R19:

R19#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 6
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     2
  Refresh Epoch 2
  300 200
    20.20.20.20 (metric 11) from 20.20.20.20 (20.20.20.20)
      Origin IGP, metric 0, localpref 100, [weight value of 0 should be here but isn’t], valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200
    200.18.19.18 from 200.18.19.18 (18.18.18.18)
      Origin IGP, metric 0, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

The missing WEIGHT value is a result of how Cisco IOS displays path attributes in the long format of the show ip bgp x.x.x.x command: the router shows a WEIGHT value only if it is greater than 0. Paths with a WEIGHT of 0 do not report this value in the long version of the command. The 0 WEIGHT value is, however, reported in the short version of the show ip bgp command output shown below:

R19#show ip bgp | begin Net
 
     Network          Next Hop             Metric LocPrf Weight Path
 *>  110.19.1.1/32    0.0.0.0               0         32768 i
 *>  110.19.2.1/32    0.0.0.0               0         32768 i
 *>i 110.20.1.1/32    20.20.20.20           0    100      0 i
 *>i 110.20.2.1/32    20.20.20.20           0    100      0 i
 * i 120.18.1.1/32    20.20.20.20           0    100      0 300 200 i
 *>                   200.18.19.18          0             0 200 i
 *>  120.18.2.1/32    200.18.19.18          0             0 200 i
 *   130.7.1.1/32     200.18.19.18                        0 200 300 i
 *>i                  20.20.20.20           0    100      0 300 i
 * i 140.15.1.1/32    20.20.20.20           0    100      0 300 400 i
 *>                   200.18.19.18                        0 200 400 i
 * i 140.15.2.1/32    20.20.20.20           0    100      0 300 400 i
 *>                   200.18.19.18                        0 200 400 i

The output above confirms that R19 has a WEIGHT value of 0 for all paths to the 120.18.1.1/32 network, including the path it receives from R20. The packet capture below provides further evidence:

Internet Protocol Version 4, Src: 20.20.20.20, Dst: 19.19.19.19
Transmission Control Protocol, Src Port: 179, Dst Port: 24627, Seq: 43, Ack: 43, Len: 283
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 66
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 38
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: 300 200
        Path Attribute - NEXT_HOP: 20.20.20.20
        Path Attribute - MULTI_EXIT_DISC: 0
        Path Attribute - LOCAL_PREF: 100

    Network Layer Reachability Information (NLRI)
        120.18.1.1/32
Border Gateway Protocol - UPDATE Message
Border Gateway Protocol - UPDATE Message
Border Gateway Protocol - UPDATE Message

The key point to notice is that the capture contains no “Path Attribute - WEIGHT” section, which proves that R20 did not advertise the WEIGHT value to R19.
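To make the capture concrete, the following Python sketch encodes an UPDATE like the one R20 sent. It uses the path attribute type codes defined in RFC 4271 (ORIGIN 1, AS_PATH 2, NEXT_HOP 3, MULTI_EXIT_DISC 4, LOCAL_PREF 5). The standard defines no type code for WEIGHT, so there is nothing to put on the wire. The sketch assumes 4-byte AS numbers in AS_PATH, which is consistent with the 38-byte Total Path Attribute Length in the capture. The helper names build_update and attr_types are hypothetical.

```python
import socket
import struct

# Path attribute flag bits (RFC 4271, section 4.3)
OPTIONAL, TRANSITIVE, EXTENDED_LENGTH = 0x80, 0x40, 0x10
# Path attribute type codes (RFC 4271); there is no code for WEIGHT
ORIGIN, AS_PATH, NEXT_HOP, MED, LOCAL_PREF = 1, 2, 3, 4, 5
AS_SEQUENCE = 2


def attr(flags: int, type_code: int, value: bytes) -> bytes:
    """Encode one attribute with a 1-byte length (enough for these short values)."""
    return struct.pack("!BBB", flags, type_code, len(value)) + value


def build_update(as_path, next_hop, med, local_pref, prefix, plen):
    """Return (UPDATE message, total path attribute length)."""
    segment = struct.pack("!BB", AS_SEQUENCE, len(as_path)) + b"".join(
        struct.pack("!I", asn) for asn in as_path)          # 4-byte ASNs
    attrs = (attr(TRANSITIVE, ORIGIN, b"\x00")                # 0 = IGP
             + attr(TRANSITIVE, AS_PATH, segment)
             + attr(TRANSITIVE, NEXT_HOP, socket.inet_aton(next_hop))
             + attr(OPTIONAL, MED, struct.pack("!I", med))
             + attr(TRANSITIVE, LOCAL_PREF, struct.pack("!I", local_pref)))
    nlri = bytes([plen]) + socket.inet_aton(prefix)[:(plen + 7) // 8]
    body = struct.pack("!HH", 0, len(attrs)) + attrs + nlri  # no withdrawn routes
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), 2)  # type 2 = UPDATE
    return header + body, len(attrs)


def attr_types(msg: bytes) -> list:
    """List the attribute type codes in an UPDATE that has no withdrawn routes."""
    total = struct.unpack("!H", msg[21:23])[0]  # bytes 19-20: Withdrawn Routes Length
    i, end, types = 23, 23 + total, []
    while i < end:
        flags, code = msg[i], msg[i + 1]
        if flags & EXTENDED_LENGTH:               # 2-byte attribute length
            length, header = struct.unpack("!H", msg[i + 2:i + 4])[0], 4
        else:
            length, header = msg[i + 2], 3
        types.append(code)
        i += header + length
    return types


msg, attr_len = build_update([300, 200], "20.20.20.20", 0, 100, "120.18.1.1", 32)
print(attr_len, len(msg))   # 38 66
print(attr_types(msg))      # [1, 2, 3, 4, 5]
```

Under these assumptions, the computed sizes match the Length: 66 and Total Path Attribute Length: 38 fields reported in the capture.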

All of the above evidence confirms that the WEIGHT value is not exchanged between neighbors. R19 chooses the path received from R18 over the one received from R20 for the same prefix because of its shorter AS_PATH length.

Step 2: Local Preference

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

Local preference, or LOCAL_PREF, is the next step in the BGP best-path algorithm. Unlike WEIGHT, local preference is a well-known discretionary BGP attribute, meaning all BGP implementations must recognize it. It is a 32-bit number ranging from 0 to 4,294,967,295. As with WEIGHT, higher values are preferred over lower ones. The default local preference value for received paths that have no explicit local preference is 100.

Local preference is used within the local AS to signal an AS-wide preference for a specific path. To serve this purpose, local preference has what is known as limited transitive capability.

Limited transitivity means that LOCAL_PREF can be sent in BGP updates, but only in specific situations: it is sent in BGP updates to iBGP peers and never to eBGP peers. This limited transitivity is what allows LOCAL_PREF to define preferred paths for prefixes across the entire AS and to affect the entire AS’s outbound routing decisions. The restriction against sending LOCAL_PREF to eBGP peers does not apply to confederation eBGP peers, as documented in RFC 4271:

5.1.5. LOCAL_PREF

LOCAL_PREF is a well-known attribute that SHALL be included in all UPDATE messages that a given BGP speaker sends to other internal peers. A BGP speaker MUST NOT include this attribute in UPDATE messages it sends to external peers, except in the case of BGP Confederations.

This is because BGP confederations are simply subdivisions of a large AS. Confederations are typically implemented to reduce the iBGP full-mesh scaling issues as the number of iBGP routers grows within the AS. These subdivisions are known as sub-autonomous systems. Even though the AS has been divided into sub-autonomous systems, the routing inside the AS remains unchanged. The routers within the AS should still all agree on all outbound routing decisions.
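The inclusion rule from RFC 4271 and the default value for received paths can be sketched in Python as follows. The peer-type names and functions are hypothetical labels for the four session types discussed in this lab. The default of 100 is the Cisco default stated above.

```python
from enum import Enum
from typing import Optional

DEFAULT_LOCAL_PREF = 100   # Cisco default when a received path carries no LOCAL_PREF


class Peer(Enum):
    IBGP = "iBGP"
    CONFED_IBGP = "confederation iBGP"   # same sub-AS
    CONFED_EBGP = "confederation eBGP"   # different sub-AS, same confederation
    EBGP = "eBGP"                        # outside the confederation


def local_pref_to_send(peer: Peer, local_pref: int) -> Optional[int]:
    """RFC 4271 5.1.5: include LOCAL_PREF for internal peers, never for external
    peers, except within a confederation. None means the attribute is omitted."""
    return None if peer is Peer.EBGP else local_pref


def local_pref_received(value: Optional[int]) -> int:
    """A path that arrives without LOCAL_PREF is treated as having the default."""
    return DEFAULT_LOCAL_PREF if value is None else value


# R6 -> R8 (confederation eBGP) keeps the value; R2 -> R16 (true eBGP) drops it
print(local_pref_to_send(Peer.CONFED_EBGP, 200))                 # 200
print(local_pref_to_send(Peer.EBGP, 200))                        # None
print(local_pref_received(local_pref_to_send(Peer.EBGP, 200)))   # 100
```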

For example, in AS 300, the policy could be such that outbound traffic destined to AS 400 should exit R2 or R6. This policy could be enforced using local preference and can be set on R2 or R6 in the inbound direction for routes received from AS 400. To maintain consistent routing within the AS, this local preference value should be communicated to R4 even though it is in a different sub-AS within AS 300.

Local preference can be set in the following manner using a route map with the set local-preference clause:

  • Inbound from any BGP peer

  • Outbound to iBGP peers only

An example will help illustrate the implications of each option. In this case, we examine AS 300’s choice of best path to reach the 120.18.1.1/32 prefix. A path to this prefix is received by edge routers R2, R3, and R6 in the network from their eBGP peers. These routers will advertise the same path to their confederation iBGP and eBGP peers. This act of advertising the prefix makes R2, R3, and R6 potential exit points for traffic destined to 120.18.1.1/32. Traffic will be funneled from the internal routers R1, R4, R5, R7, and R8 toward R2, R3, or R6, based on which path the AS decides is the best path.

Note

In the paragraph above, R1, R4, R5, R7, and R8 are called internal routers to signify the fact that they do not have any true eBGP peers. They only have confederation iBGP or eBGP peers that exist within AS 300. Not having true eBGP peers from which external paths can be learned means they will not receive any external paths. This makes them ineligible to be exit points for any traffic.

Alternatively, R2, R3, and R6 are called edge routers because they have eBGP peerings and sit on the edge of the network. Being positioned as such, these routers will learn external paths and advertise them as internal paths inside AS 300.

By default, BGP selects only a single best path to submit to the routing table. This means that only one of the paths via R2, R3, or R6 will be chosen as best by all routers in AS 300. Next, we examine the path selection decisions of each edge router and of the internal router R8.

R2 in AS 300 has multiple paths to reach 120.18.1.1/32 and 120.18.2.1/32. The output below shows that it currently uses the direct path from R16 as its best path due to the shorter AS_PATH attribute length.

On R2:

R2#show ip bgp regex _200$
 
BGP table version is 28, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop             Metric LocPrf Weight Path
 *   120.18.1.1/32    200.2.20.20                         0 100 200 i
 *                    200.2.10.10                         0 400 200 i
 *>                   200.2.16.16                         0 200 i
 *   120.18.2.1/32    200.2.20.20                         0 100 200 i
 *                    200.2.10.10                         0 400 200 i
 *>                   200.2.16.16                         0 200 i

R2 advertises this selected best path to all of its BGP neighbors, setting itself as the next hop because it has been configured with the next-hop-self setting on all of its iBGP peers. Of those neighbors, R3, as an edge router, is most notable for this exercise. R3 receives the paths from R2 and R10. It compares the two paths and chooses R2’s path due to the shorter AS_PATH attribute, as shown below. As a result, R3’s routing table entry for the 120.18.1.1/32 network shows R2 as the next hop.

On R3:

R3#show ip bgp regex _200$
 
--- omitted ---
     Network          Next Hop               Metric LocPrf Weight Path
 *   120.18.1.1/32    200.3.10.10                           0 400 200 i
 *>                   2.2.2.2                 0    100      0 (312) 200 i
 *   120.18.2.1/32    200.3.10.10                           0 400 200 i
 *>                   2.2.2.2                 0    100      0 (312) 200 i
 

R3#show ip route bgp
 
--- omitted ---
Gateway of last resort is not set
 
      110.0.0.0/32 is subnetted, 2 subnets
B        110.19.1.1 [200/0] via 2.2.2.2, 00:27:47
B        110.19.2.1 [200/0] via 2.2.2.2, 00:27:47
      120.0.0.0/32 is subnetted, 2 subnets
B        120.18.1.1 [200/0] via 2.2.2.2, 00:27:16
B        120.18.2.1 [200/0] via 2.2.2.2, 00:27:16
 
--- omitted ---

R3 then advertises this chosen best path to R6 with itself as the next hop, because it has been configured with the next-hop-self setting on all of its BGP peers. R6 has also received a path to the same prefix from R11. It compares the two paths and again selects the path from R3 as best because of its shorter AS_PATH attribute length. Just as in R3’s case, R6’s routing table reflects R3 as the next hop to reach the 120.18.1.1/32 prefix:

On R6:

R6#show ip bgp regex _200$
 
--- omitted ---
     Network          Next Hop              Metric LocPrf Weight Path
 *>i 120.18.1.1/32    3.3.3.3                0    100      0 (312) 200 i
 *                    200.6.11.11                          0 400 200 i
 *>i 120.18.2.1/32    3.3.3.3                0    100      0 (312) 200 i
 *                    200.6.11.11                          0 400 200 i
 
R6#show ip route bgp
 
--- omitted ---
Gateway of last resort is not set
 
      110.0.0.0/32 is subnetted, 2 subnets
B        110.19.1.1 [200/0] via 3.3.3.3, 00:30:18
B        110.19.2.1 [200/0] via 3.3.3.3, 00:30:18
      120.0.0.0/32 is subnetted, 2 subnets
B        120.18.1.1 [200/0] via 3.3.3.3, 00:29:50
B        120.18.2.1 [200/0] via 3.3.3.3, 00:29:50

Finally, R6 advertises this same path to its confed-eBGP peer R8, as shown below. It also sets the next hop to itself because it has been configured with the next-hop-self setting on all of its BGP peers. R8 only receives a single path to this prefix from R6, so it marks it as best and installs a route in its routing table pointing to R6 as the next hop:

On R8:

R8#show ip bgp regexp _200$
 
BGP table version is 32, local router ID is 8.8.8.8
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop          Metric LocPrf Weight Path
 *>  120.18.1.1/32    6.6.6.6             0    100      0 (336 312) 200 i
 *>  120.18.2.1/32    6.6.6.6             0    100      0 (336 312) 200 i
 
R8#show ip route bgp | begin Gateway
Gateway of last resort is not set
 
      110.0.0.0/32 is subnetted, 2 subnets
B        110.19.1.1 [200/0] via 6.6.6.6, 00:31:22
B        110.19.2.1 [200/0] via 6.6.6.6, 00:31:22
      120.0.0.0/32 is subnetted, 2 subnets
B        120.18.1.1 [200/0] via 6.6.6.6, 00:31:10
B        120.18.2.1 [200/0] via 6.6.6.6, 00:31:10

Note

The path on R3 and R6 may not seem shorter because the AS_PATH is listed as (312) 200. The numbers in parentheses represent the confederation sub-AS numbers the path has traversed; more specifically, they form the AS_CONFED_SEQUENCE. The AS_CONFED_SEQUENCE is not counted when BGP compares AS_PATH lengths during best-path selection. The guide goes into more detail on this in the AS_PATH section.
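The counting rule can be sketched in Python for IOS-style AS_PATH strings. Plain AS numbers count one each. A {...} AS_SET counts as one in total (RFC 4271). A (...) AS_CONFED_SEQUENCE is not counted (RFC 5065). The function as_path_length is hypothetical and handles only these display forms.

```python
import re


def as_path_length(display: str) -> int:
    """AS_PATH length used for best-path comparison, from an IOS-style string.

    Plain ASNs (AS_SEQUENCE) count one each, a {...} AS_SET counts as one,
    and a (...) AS_CONFED_SEQUENCE is not counted. Anything else, such as
    the trailing origin code, is ignored.
    """
    length = 0
    for token in re.findall(r"\([^)]*\)|\{[^}]*\}|\d+", display):
        if not token.startswith("("):
            length += 1
    return length


# R3's path via R2 counts as 1 versus 2 for the path via R10; R8's path counts as 1
for shown in ("(312) 200 i", "400 200 i", "(336 312) 200 i"):
    print(f"{shown!r}: {as_path_length(shown)}")
```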

From the above outputs, it can be concluded that R8 uses R6 as the next hop to reach the 120.18.1.1/32 prefix, R6 uses R3, and R3 uses R2. Traffic from R8 to the 120.18.1.1/32 prefix therefore flows R8 → R6 → R3 → R2, as shown in the traceroute below:

R8#traceroute 120.18.1.1
 
Type escape sequence to abort.
Tracing the route to 120.18.1.1
VRF info: (vrf in name/id, vrf out name/id)
  1 30.6.8.6 1 msec 1 msec 1 msec
  2 30.3.6.3 1 msec 1 msec 1 msec
  3 30.2.3.2 2 msec 2 msec 1 msec
  4  *  *  * 

The traffic exits the AS at the edge router R2, as indicated by the * * * output of the traceroute, which signifies that the traceroute fails after the R2 hop. The failure occurs because AS 200 has no route back to R8, so the replies to the traceroute probes cannot be returned to R8.

No matter which edge router first receives traffic destined to the 120.18.1.1/32 network, it eventually ends up at R2 because R6 forwards it to R3, and R3 ultimately forwards to R2. Thus, the exit point for traffic destined to the 120.18.1.1/32 network in AS 300 is R2.
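The reasoning in the paragraph above amounts to following each router’s next hop until the router with the external best path is reached. The Python sketch below does this using the next hops read from the outputs above. It treats each BGP next hop as the forwarding hop, which matches the traceroute in this topology. In general, the IGP resolves the BGP next hop recursively. The NEXT_HOP table and forwarding_path function are hypothetical.

```python
# Next hop toward 120.18.1.1/32 on each AS 300 router, read from the outputs above.
# None marks the router whose best path is external (R2's direct path from R16).
NEXT_HOP = {"R8": "R6", "R6": "R3", "R3": "R2", "R2": None}


def forwarding_path(start: str) -> list:
    """Follow each router's BGP next hop until reaching the exit router."""
    hops = [start]
    while NEXT_HOP[hops[-1]] is not None:
        nxt = NEXT_HOP[hops[-1]]
        if nxt in hops:
            raise RuntimeError("forwarding loop at " + nxt)
        hops.append(nxt)
    return hops


for router in ("R8", "R6", "R3"):
    path = forwarding_path(router)
    print(" -> ".join(path), "| exit point:", path[-1])
```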

The following sections show what happens when the administrator attempts to make R6 become the preferred exit point for the 120.18.1.1/32 prefix by modifying the local preference attribute R6 advertises to its neighbors. These sections examine the change made in the outbound direction and again in the inbound direction. Each example uses the following prefix-list/route-map combination to set the local preference advertised by R6 to other routers in AS 300, in an attempt to ensure that all other routers accept R6’s path as the best path. The only difference is the choice of inbound/outbound direction and to which BGP neighbor the route map is applied:

On R6:

R6(config)#ip prefix-list 123 permit 120.18.1.1/32
 
R6(config)#route-map tst permit 10
R6(config-route-map)#match ip address prefix 123
R6(config-route-map)#set local-preference 200
R6(config)#route-map tst permit 90

Local Preference Outbound

In this section, the administrator chooses to apply the change in the outbound direction on R6. R6 will only advertise the selected path with the modified LOCAL_PREF value to neighbors where the route map is applied.

To try to influence the best-path decision of all the other routers in AS 300, the route map is applied to every neighbor statement for R6’s peers within AS 300. In this case, it is applied outbound toward R3 and R8, as shown:

On R6:

R6(config)#router bgp 336
R6(config-router)#neighbor 3.3.3.3 route-map tst out
R6(config-router)#neighbor 8.8.8.8 route-map tst out
 
R6#clear ip bgp * soft

After the changes have been made, the BGP table is refreshed by issuing the clear ip bgp * soft command. As a result, R6 re-sends the updates to R3 and R8 with the modified LOCAL_PREF value 200, as shown in the packet capture below:

Internet Protocol Version 4, Src: 6.6.6.6, Dst: 8.8.8.8
Transmission Control Protocol, Src Port: 29995, Dst Port: 179, Seq: 24, Ack: 20, Len: 387
Border Gateway Protocol - UPDATE Message
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 72
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 44
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: (336 312) 200
        Path Attribute - NEXT_HOP: 6.6.6.6
        Path Attribute - MULTI_EXIT_DISC: 0
        Path Attribute - LOCAL_PREF: 200
    Network Layer Reachability Information (NLRI)
        120.18.1.1/32

The expectation at this point is that all of the routers in AS 300 now choose to use R6 to reach the designated prefix 120.18.1.1/32. A quick glance at R8’s BGP table seems promising:

On R8:

R8#show ip bgp regexp _200$
 
BGP table version is 33, local router ID is 8.8.8.8
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop          Metric LocPrf Weight Path
 *>  120.18.1.1/32    6.6.6.6             0    200      0 (336 312) 200 i
 *>  120.18.2.1/32    6.6.6.6             0    100      0 (336 312) 200 i

R8’s BGP table reflects the changes made on R6. The path to 120.18.1.1/32 now has a local preference value of 200. This proves that even though R8 is a confed-eBGP peer of R6, R6 still communicates the local preference change to R8, in compliance with the RFC excerpt quoted earlier.

The problem, however, becomes apparent when R3’s BGP table is examined:

On R3:

R3#show ip bgp regexp _200$
 
--- omitted ---
     Network          Next Hop              Metric LocPrf Weight Path
 *>  120.18.1.1/32    2.2.2.2                 0    100      0 (312) 200 i
 *                    200.3.10.10                           0 400 200 i
 *>  120.18.2.1/32    2.2.2.2                 0    100      0 (312) 200 i
 *                    200.3.10.10                           0 400 200 i

R3’s BGP table shows no noticeable changes. It continues to use R2’s path as the best path to reach the prefix 120.18.1.1/32. Why does R3 seemingly ignore R6’s advertisements? The answer lies in R6’s BGP table, shown below:

On R6:

R6#show ip bgp regexp _200$
 
--- omitted---
 
     Network          Next Hop             Metric LocPrf Weight Path
 *>i 120.18.1.1/32    3.3.3.3                0    100      0 (312) 200 i
 *                    200.6.11.11                          0 400 200 i
 *   120.18.2.1/32    200.6.11.11                          0 400 200 i
 *>i                  3.3.3.3                0    100      0 (312) 200 i

This table reveals that, although R6 has advertised a modified local preference value to its neighbors, its own BGP table is unchanged by the outbound policy, because outbound modifications to path attributes apply only to the copy of the path sent to the neighbor, not to the local router’s BGP table. Both paths still carry the default LOCAL_PREF of 100, so R6 continues to choose R3’s path as best because its AS_PATH is shorter: confederation segments are not counted in the AS_PATH length, so (312) 200 has a length of 1, compared with 2 for 400 200. Because R3 is a confederation internal peer, R6 will not advertise R3’s path back to R3; the iBGP split-horizon rule prevents R6 from advertising an internal route to an internal peer. This is confirmed by R6’s advertised routes to R3, displayed with the command show ip bgp neighbors 3.3.3.3 advertised-routes:

R6#show ip bgp neighbors 3.3.3.3 advertised-routes
 
BGP table version is 27, local router ID is 6.6.6.6
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop            Metric LocPrf Weight Path
 *>  140.15.1.1/32    200.6.11.11                            0 400 i
 *>  140.15.2.1/32    200.6.11.11                            0 400 i

Note

A BGP router follows different rules for advertising paths depending on whether the peer is an internal or external peer. In the output above, R3 and R6 have a confederation internal BGP peering with each other, and such peerings behave like internal peerings. The show ip bgp neighbors 3.3.3.3 command on R6 confirms that the two peers are connected via an internal link:

R6#show ip bgp neighbors 3.3.3.3
BGP neighbor is 3.3.3.3,  remote AS 336, internal link
  BGP version 4, remote router ID 3.3.3.3
  Neighbor under common administration
  BGP state = Established, up for 6d00h
  Last read 00:00:22, last write 00:00:24, hold time is 180, keepalive interval is 60 seconds
  Neighbor sessions:
    1 active, is not multisession capable (disabled)

The reasoning behind this lies in how BGP advertises routes. For external (eBGP) peers, the router prepends its own ASN to the AS_PATH attribute in the UPDATE message sent to the external peer. For internal (iBGP or confederation iBGP) peers, this AS_PATH prepend is not performed.

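Recall that BGP uses the AS_PATH attribute for loop prevention: a router rejects any path received with its own ASN in the AS_PATH attribute. Internal peers share the same ASN, so if a router prepended its ASN when advertising to an internal peer, that peer would reject every internal advertisement. For this reason, the AS_PATH is not prepended when advertising to internal peers.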

The side effect is that an internal peer has no way to use the AS_PATH to verify that a path is not looped. The only sure way to prevent a loop is for the receiving peer never to advertise a path learned from an internal peer to another internal peer. This is the iBGP split-horizon rule.
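The advertisement rules described above can be condensed into a short Python sketch. It is a simplified model under stated assumptions, not a description of IOS internals: it ignores confederations, route reflection, and policy, and the function name advertised_as_path and the peer-type strings are hypothetical labels chosen for illustration. AS_PATH values are tuples whose leftmost ASN is the most recently prepended one.

```python
from typing import Optional, Tuple

AsPath = Tuple[int, ...]  # leftmost ASN = most recently prepended

def advertised_as_path(as_path: AsPath, learned_from: str,
                       local_asn: int, peer_type: str) -> Optional[AsPath]:
    """Return the AS_PATH sent to a peer, or None if the path is not advertised.

    learned_from: "ebgp", "ibgp", or "local" (how the path entered our table)
    peer_type:    "ebgp" or "ibgp" (the peer we are advertising to)
    """
    if peer_type == "ebgp":
        # Crossing an AS boundary: prepend the local ASN.
        return (local_asn,) + as_path
    if peer_type == "ibgp":
        # iBGP split horizon: never pass an iBGP-learned path to another iBGP peer.
        if learned_from == "ibgp":
            return None
        # Same AS on both ends, so the AS_PATH is sent unchanged.
        return as_path
    raise ValueError(f"unknown peer type: {peer_type!r}")

# A router in AS 300 holding a path with AS_PATH 400 200:
print(advertised_as_path((400, 200), "ebgp", 300, "ibgp"))  # (400, 200)
print(advertised_as_path((400, 200), "ebgp", 300, "ebgp"))  # (300, 400, 200)
print(advertised_as_path((400, 200), "ibgp", 300, "ibgp"))  # None
```

The third call shows the split-horizon case: the path exists in the table but is simply not offered to the internal peer.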

Because of this rule, iBGP peers must be fully meshed (all internal BGP routers peer with all other internal BGP routers), or the AS must employ route reflection or confederations, so that every internal router still learns every path.

AS 300 has decided to employ confederations to avoid the full-mesh requirement. Confederations are implemented by breaking up the AS into multiple sub-ASes. Each sub-AS has its own ASN, which is recorded in the AS_CONFED_SEQUENCE attribute of paths advertised between sub-ASes. The AS_CONFED_SEQUENCE attribute is used in lieu of the AS_PATH attribute for loop prevention.

R6 does, however, advertise the path to R8. You might therefore expect R3 to receive the path from R6 indirectly, through R8 → R7 → R5 → R4. Tracing the path hop by hop shows why it does not:

  1. First, R3 selects R2’s path as the best path to reach the 120.18.1.1/32 prefix. The AS_CONFED_SEQUENCE at this point is (312). R3 advertises this best path to R6.

  2. R6 receives the path and chooses it as its best path. It advertises the path to R8 with the modified LOCAL_PREF value of 200 and the AS_CONFED_SEQUENCE value (336 312).

  3. R8 receives the path as well and chooses it as its best path. It advertises to R7, retaining the received LOCAL_PREF of 200 and the AS_CONFED_SEQUENCE value (336 312).

  4. R7 receives the path and marks it as the best path. It advertises the path to R5 with the retained LOCAL_PREF value 200 and the modified AS_CONFED_SEQUENCE value (378 336 312).

  5. R5 receives the path, marks it as best, and advertises to R4, retaining the received LOCAL_PREF value 200 and the AS_CONFED_SEQUENCE value (378 336 312).

  6. R4 receives the path and marks it as best. It advertises the path to R1 and R3 with the retained LOCAL_PREF value 200 and the AS_CONFED_SEQUENCE value (345 378 336 312).

What happens next is critical to this example. Both R3 and R1 receive the path from R4 in a BGP update. They examine the AS_CONFED_SEQUENCE attribute, only to find their own sub-ASN in the sequence. For this reason, both R1 and R3 deny the update. This is proven on R3 through the following debug message:

BGP(0): 4.4.4.4 rcv UPDATE w/ attr: nexthop 6.6.6.6, origin i, localpref 200, metric 0, originator 0.0.0.0, merged path (345 378 336 312) 200, AS_PATH , community , extended community , SSA attribute
BGPSSA ssacount is 0

BGP(0): 4.4.4.4 rcv UPDATE about 120.18.1.1/32 -- DENIED due to: AS-PATH contains our own AS;

Notice that the path is received from 4.4.4.4, with the LOCAL_PREF value 200 and the indicated AS_CONFED_SEQUENCE clearly containing R3’s sub-ASN 336. The same occurs on R1. Both R1 and R3 are forced to deny the path, in compliance with standard BGP confederation rules. These rules protect the topology from a loop situation that could occur as a result of modifying the LOCAL_PREF value outbound on R6.
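The hop-by-hop rewriting of the AS_CONFED_SEQUENCE in the steps above, together with the sub-AS loop check, can be replayed with the following Python sketch. The sub-AS assignments are inferred from the outputs in this section (R3 and R6 in 336, R7 and R8 in 378, R4 and R5 in 345); the helper names are hypothetical, and only the AS_CONFED_SEQUENCE and LOCAL_PREF of a single path are modeled.

```python
# Sub-AS membership inferred from the confederation sequences shown in this section.
SUB_AS = {"R3": 336, "R6": 336, "R7": 378, "R8": 378, "R4": 345, "R5": 345}

def send(confed_seq, sender, receiver):
    """Return the AS_CONFED_SEQUENCE as received by `receiver`."""
    if SUB_AS[sender] != SUB_AS[receiver]:
        # Confed-eBGP: the sender prepends its own sub-AS.
        return (SUB_AS[sender],) + confed_seq
    # Confed-iBGP: the sequence is passed on unchanged.
    return confed_seq

def accepts(receiver, confed_seq):
    """Loop check: deny the path if the receiver's sub-AS is already present."""
    return SUB_AS[receiver] not in confed_seq

def trace(chain, confed_seq, local_pref=100, policy_router="R6", policy_lp=200):
    """Follow one path along `chain`, applying an outbound LOCAL_PREF policy on policy_router."""
    log = []
    for sender, receiver in zip(chain, chain[1:]):
        if sender == policy_router:
            local_pref = policy_lp  # models R6's outbound route map setting LOCAL_PREF 200
        confed_seq = send(confed_seq, sender, receiver)
        ok = accepts(receiver, confed_seq)
        log.append((receiver, confed_seq, local_pref, ok))
        if not ok:
            break  # the path is denied and goes no further
    return log

for receiver, seq, lp, ok in trace(["R3", "R6", "R8", "R7", "R5", "R4", "R3"], (312,)):
    print(receiver, seq, lp, "accepted" if ok else "DENIED: own sub-AS in path")
```

The last line printed corresponds to R3 receiving (345 378 336 312) with LOCAL_PREF 200 and denying it, which is the same outcome as the debug message on R3.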

All in all, modifying the LOCAL_PREF value outbound on R6 does not produce the desired outcome because the change does not apply to R6’s own BGP table. R6 continues to use the same path through R3 to reach the 120.18.1.1/32 prefix, triggering the sequence of events that leads to the state reported above.

While there may be specific use cases where setting the local preference in this manner is desirable, it should be done with caution and careful examination, because such changes can result in suboptimal routing and in control plane or data plane loops. R6 could also manipulate its own BGP table to remedy the problem, but it is far easier to simply apply the change inbound, as shown in the next section.

Note

The potential for suboptimal routing can be observed in the case of R5. R5 chooses the path received from R7 as the best path. When it installs the path in its BGP table, the next hop points to R6’s 6.6.6.6 address. When R5 needs to route a packet, it must recurse to an exit interface of the next hop 6.6.6.6:

R5#show ip route bgp | in 120.18.1.1
B        120.18.1.1 [200/0] via 6.6.6.6, 00:49:06

R5’s routing table has two equal-cost paths to reach 6.6.6.6—one through R4 and one through R7—as shown below:

R5#show ip cef 6.6.6.6
6.6.6.6/32
  nexthop 30.4.5.4 Ethernet0/0.45
  nexthop 30.5.7.7 Ethernet0/0.57

Because R6 currently uses R3 as its best path and R3 uses R2 as its best path, the ultimate exit point is R2. Given the current state of R5’s table, R5 may send packets along either the R4 → R3 → R2 path or the R7 → R8 → R6 → R3 → R2 path.

The path through R7 is the suboptimal one, traversing four routers to reach R2, whereas the path through R4 traverses only two routers to reach R2.

Local Preference Inbound

The preceding section demonstrated how an outbound policy that modified the local preference value on R6 for the 120.18.1.1 prefix did not change the LOCAL_PREF value on R6. As a result, R6’s best-path decision remained unaffected, and R6 continued to use the path from R3 as best. The effects of the change were seen only on R4, R5, R7, and R8, which all decided to use the path from R6 as their best paths. The net result is that R6 was not chosen as the preferred exit path in AS 300, and the modifications performed on R6 made suboptimal routing a possibility.

To force R6 to become the new exit point, rather than setting the local preference outbound toward its neighbors, the local preference in this case is set inbound as R6 receives the path from its eBGP peer R11. This change is made by first removing the previous outbound configuration and applying the route map tst inbound on the neighborship to R11, as shown below:

On R6:

R6(config)#router bgp 336
 
R6(config-router)#no neighbor 3.3.3.3 route-map tst out
R6(config-router)#no neighbor 8.8.8.8 route-map tst out
 
R6#clear ip bgp * soft
R6(config)#router bgp 336
R6(config-router)#neighbor 200.6.11.11 route-map tst in

After performing a soft clear of the BGP table, R6 assigns the LOCAL_PREF value 200 to the path for 120.18.1.1/32 received from R11, as shown below:

R6#clear ip bgp * soft
 
R6#show ip bgp regexp _200$
 
--- omitted ---
     Network          Next Hop              Metric LocPrf Weight Path
 *>  120.18.1.1/32    200.6.11.11                  200      0 400 200 i
 *   120.18.2.1/32    200.6.11.11                           0 400 200 i
 *>i                  3.3.3.3                 0    100      0 (312) 200 i

The output above contrasts with the output observed in the outbound section. Here, the path to 120.18.1.1/32 exists in R6’s BGP table, with a LOCAL_PREF value 200, whereas in the outbound section, there was no noticeable change in R6’s BGP table following the policy modification.
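The difference between the two directions can be modeled in a few lines of Python. This is a simplified sketch, not IOS internals: route_map_tst is a hypothetical function standing in for the lab’s route map tst, only R6’s two paths for 120.18.1.1/32 are represented, and only LOCAL_PREF and AS_PATH length are compared (WEIGHT is 0 on both paths).

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Path:
    learned_from: str
    local_pref: int
    as_path_len: int  # confederation segments are not counted

def route_map_tst(path):
    """Hypothetical stand-in for route map tst: set LOCAL_PREF to 200."""
    return replace(path, local_pref=200)

def best(paths):
    # Only step 2 (higher LOCAL_PREF) and step 4 (shorter AS_PATH) are modeled.
    return max(paths, key=lambda p: (p.local_pref, -p.as_path_len))

from_r3 = Path("R3", 100, 1)    # (312) 200
from_r11 = Path("R11", 100, 2)  # 400 200

# Outbound: R6 stores the paths unchanged; only the copy it sends is modified.
table_out = [from_r3, from_r11]
sent_to_r8 = route_map_tst(best(table_out))
print(best(table_out).learned_from, sent_to_r8.local_pref)      # R3 200

# Inbound: the modified path is what R6 stores and compares.
table_in = [from_r3, route_map_tst(from_r11)]
print(best(table_in).learned_from, best(table_in).local_pref)  # R11 200
```

In the outbound case R6 keeps R3’s path as best and merely advertises it with LOCAL_PREF 200; in the inbound case R6’s own comparison changes.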

R6 can now act on the new LOCAL_PREF setting and choose the path received from R11 as the best path. As the debug ip bgp updates output below from R3 and R8 shows, R6 advertises this best path to all of its BGP neighbors, this time including R3, because the path received from R11 is an external path, not an internal one.

On R3:

BGP(0): 6.6.6.6 rcvd UPDATE w/ attr: nexthop 6.6.6.6, origin i, localpref 200, metric 0, merged path 400 200, AS_PATH

BGP(0): 6.6.6.6 rcvd 120.18.1.1/32

On R8:

BGP(0): 6.6.6.6 rcvd UPDATE w/ attr: nexthop 6.6.6.6, origin i, localpref 200, metric 0, merged path (336) 400 200, AS_PATH

20:07:09.543: BGP(0): 6.6.6.6 rcvd 120.18.1.1/32

Because R3 chooses R6’s path as its best path, it advertises this path to R2 and R4, its confed-eBGP neighbors:

On R3:

R3#show ip bgp regexp _200$
 
BGP table version is 26, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop              Metric LocPrf Weight Path
 *>i 120.18.1.1/32    6.6.6.6                 0    200      0 400 200 i
 *                    200.3.10.10                           0 400 200 i
 *>  120.18.2.1/32    2.2.2.2                 0    100      0 (312) 200 i
 *                    200.3.10.10                           0 400 200 i

In the following example, R2 receives R3’s new best path with LOCAL_PREF set to 200, even though it is a confed-eBGP peer of R3. R2 chooses this path as best and will advertise it on to its BGP peers as well.

On R2:

R2#show ip bgp regexp _200$
 
BGP table version is 22, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop       Metric LocPrf Weight Path
 *>  120.18.1.1/32    3.3.3.3             0    200      0 (336) 400 200 i
 *                    200.2.16.16                       0 200 i
 *                    200.2.10.10                       0 400 200 i
 *                    200.2.20.20                       0 100 200 i
 *>  120.18.2.1/32    200.2.16.16                       0 200 i
 *                    200.2.10.10                       0 400 200 i
 *                    200.2.2                           0 100 200 i

This process continues until all routers in the AS receive the path originally advertised by R6. The other routers will choose this path as best, resulting in the entire AS 300 agreeing that R6 should be the exit point for the 120.18.1.1/32 network.

Note

Reviewing the BGP table on R6 after the inbound policy change reveals that an interesting dichotomy exists between the BGP table’s current state and the state of the BGP table before the policy change was made.

R6 Before:

R6#show ip bgp regex _200$
--- omitted ---
     Network          Next Hop            Metric LocPrf Weight Path
 *>i 120.18.1.1/32    3.3.3.3                  0    100      0 (312) 200 i
 *                    200.6.11.11                            0 400 200 i
 *>i 120.18.2.1/32    3.3.3.3                  0    100      0 (312) 200 i
 *                    200.6.11.11                            0 400 200 i

R6 After:

R6#show ip bgp regexp _200$
--- omitted ---
     Network          Next Hop            Metric LocPrf Weight Path
 *>  120.18.1.1/32    200.6.11.11                   200      0 400 200 i
 *   120.18.2.1/32    200.6.11.11                            0 400 200 i
 *>i                  3.3.3.3                  0    100      0 (312) 200 i

Before any changes were made, R6 received two paths to the 120.18.1.1/32 prefix, one from R3 and one from R11. After the changes were made, however, R6 only receives a single path.

The sequence of events that brought about this change is as follows:

  1. R6 first receives the two paths (from R11 and R3).

  2. Because R6’s modified policy sets the local preference to 200 for the path received from R11, R6 chooses this path as best over the path received from R3.

  3. R6 advertises its best path, with local preference 200, to R3.

  4. R3 compares this newly received path from R6 with its other paths for the same prefix, namely the path it received from R2, which is its current best path.

  5. R3 selects the path from R6 as the best path because of the local preference.

  6. R3 now advertises R6’s path to its other BGP neighbors. At the same time, R3 withdraws from R6 the path it had previously advertised (the path through R2).

Through this sequence of events, R3 discovers a better path to 120.18.1.1/32 through R6, which replaces its previous best path through R2. Because R3’s best path has changed, it must update all of its neighbors: it replaces the path through R2 that it previously advertised with the new path through R6. It does not, however, advertise R6’s path back to R6, because that path was learned from an internal peer and a BGP router does not advertise paths learned from an iBGP peer to other iBGP peers. Instead, R3 sends R6 a withdrawal for the path through R2.
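R3’s reaction to its best-path change can be sketched as a per-neighbor decision in Python. The sketch is a simplified model under stated assumptions: the neighbor list is trimmed to R2, R4, and R6, the peer-type labels and the function name update_actions are hypothetical, and the set of neighbors that previously received the path through R2 is assumed for illustration.

```python
def update_actions(neighbors, best_from, best_from_type, advertised_before):
    """Decide what R3 sends each neighbor after its best path changes.

    neighbors:         dict of neighbor name -> "ibgp" or "confed-ebgp"
    best_from:         neighbor the new best path was learned from
    best_from_type:    peer type of that neighbor
    advertised_before: neighbors that received the previous best path
    """
    actions = {}
    for name, peer_type in neighbors.items():
        # A path is never sent back to its source, and iBGP split horizon
        # blocks iBGP-learned paths toward other iBGP peers.
        blocked = name == best_from or (best_from_type == "ibgp" and peer_type == "ibgp")
        if not blocked:
            actions[name] = "announce new best path"
        elif name in advertised_before:
            actions[name] = "withdraw old path"
    return actions

r3_neighbors = {"R2": "confed-ebgp", "R4": "confed-ebgp", "R6": "ibgp"}
# Assumption: the old best path (from R2) had been advertised to R4 and R6.
print(update_actions(r3_neighbors, "R6", "ibgp", advertised_before={"R4", "R6"}))
```

For R2 and R4, announcing the new path implicitly replaces the old one; R6 receives only a withdrawal, which matches the debug captured on R6.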

A debug performed on R6 during this transition confirms that R3 withdraws its previously advertised path and does not advertise a new one to R6:

19:00:42.568: BGP(0): 3.3.3.3 rcv UPDATE about 120.18.1.1/32 -- withdrawn

The final table on R3 is as follows:

R3#show ip bgp regexp _200$
BGP table version is 26, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i 120.18.1.1/32    6.6.6.6             0    200      0 400 200 i
 *                    200.3.10.10                       0 400 200 i
 *>  120.18.2.1/32    2.2.2.2             0    100      0 (312) 200 i
 *                    200.3.10.10                       0 400 200 i

A quick glance at the BGP table on R20 helps prove that local preference, while communicated to iBGP and confed-eBGP peers, is not communicated to true eBGP peers. In fact, the LOCAL_PREF attribute isn’t even included in the UPDATE packet sent from R2 to R20, as evidenced by the packet capture below:

Internet Protocol Version 4, Src: 200.2.20.2, Dst: 200.2.20.20
Transmission Control Protocol, Src Port: 179, Dst Port: 59790, Seq: 43, Ack: 20, Len: 293
Border Gateway Protocol - UPDATE Message
Border Gateway Protocol - UPDATE Message
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 52
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 24
    Path attributes !LOCAL_PREF is missing here
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: 300 200
        Path Attribute - NEXT_HOP: 200.2.20.2
    Network Layer Reachability Information (NLRI)
        120.18.2.1/32
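The byte counts in this capture can be reproduced by encoding the attributes in the RFC 4271 wire format. The Python sketch below is a simplified encoder, assuming a single AS_SEQUENCE segment with 4-byte ASNs (as in the AS4 fields Wireshark displays in this lab), no extended-length attributes, and /32 NLRI only; the function names are hypothetical. It shows that leaving out LOCAL_PREF toward an eBGP peer yields exactly the 24-byte attribute section and 52-byte UPDATE seen in the capture.

```python
import struct

FLAGS = 0x40  # well-known transitive, as displayed in the captures

def attr(type_code, value):
    """Encode one path attribute: flags, type code, 1-byte length, value."""
    return bytes([FLAGS, type_code, len(value)]) + value

def ipv4(address):
    return bytes(int(octet) for octet in address.split("."))

def path_attributes(as_path, next_hop, local_pref=None):
    origin = attr(1, b"\x00")  # ORIGIN: IGP
    # AS_SEQUENCE segment: type 2, ASN count, then 4-byte ASNs
    segment = bytes([2, len(as_path)]) + b"".join(struct.pack("!I", asn) for asn in as_path)
    attrs = origin + attr(2, segment) + attr(3, ipv4(next_hop))  # AS_PATH, NEXT_HOP
    if local_pref is not None:
        # LOCAL_PREF (type 5) belongs only in updates to internal/confederation peers.
        attrs += attr(5, struct.pack("!I", local_pref))
    return attrs

def update_length(attrs, host_prefixes):
    """19-byte header + withdrawn-routes length + attribute length + attributes + NLRI."""
    nlri = b"".join(bytes([32]) + ipv4(p) for p in host_prefixes)
    return 19 + 2 + 2 + len(attrs) + len(nlri)

to_r20 = path_attributes((300, 200), "200.2.20.2")  # eBGP update: no LOCAL_PREF
# Same attributes with LOCAL_PREF added, for comparison only.
with_lp = path_attributes((300, 200), "200.2.20.2", local_pref=200)
print(len(to_r20), update_length(to_r20, ["120.18.2.1"]), len(with_lp))  # 24 52 31
```

Applied to path_attributes((400,), "200.10.17.10") with the two 140.15.x.x prefixes, the same functions give 20 and 53, matching the R10 and R9 captures later in this lab.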

The BGP table shows the path received from R2, but without the local preference modifications:

On R20:

R20#show ip bgp regexp _200$
 
BGP table version is 17, local router ID is 20.20.20.20
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop            Metric LocPrf Weight Path
 *   120.18.1.1/32    200.2.20.2                          0 300 400 200 i
 *>i                  19.19.19.19           0    100      0 200 i
 *   120.18.2.1/32    200.2.20.2                          0 300 200 i
 *>i                  19.19.19.19           0    100      0 200 i

Above, it appears as though no local preference value is assigned to the path received from R2. In reality, for external paths, the show ip bgp table output leaves the LocPrf column blank when the value has not been modified from the default, which, as stated earlier, is 100. The detailed show ip bgp 120.18.1.1 output confirms the default LOCAL_PREF setting:

R20#show ip bgp 120.18.1.1
BGP routing table entry for 120.18.1.1/32, version 48
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 13
  200
    19.19.19.19 (metric 11) from 19.19.19.19 (110.19.2.1)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 13
  300 400 200
    200.2.20.2 from 200.2.20.2 (2.2.2.2)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

Step 3: Locally Originated

Note

Before starting this section, revert the configuration on all the routers to the base initial configuration files provided with the lab.

Step 3 of the BGP best-path algorithm distinguishes between locally originated paths and paths received from other BGP peers. As mentioned earlier, in BGP, a locally originated path is a path that was injected into the BGP table using the network, redistribute, or aggregate-address command. If the previous steps of the best-path algorithm result in a tie, a BGP router prefers paths that it locally originates over any path received from another BGP peer.

To demonstrate this preference, R16 has been chosen. Currently R16 chooses to use the path from R18 to reach the 120.18.1.1/32 prefix. Because this path has not been locally originated by R16 and instead was received from R18, it is a prime candidate for this demonstration.

On R16:

R16#show ip bgp regex ^$
 
BGP table version is 36, local router ID is 16.16.16.16
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop            Metric LocPrf Weight Path
 *>i 120.18.1.1/32    18.18.18.18              0    100      0 i
 *>i 120.18.2.1/32    18.18.18.18              0    100      0 i

On R16, a static route to Null0 is created for the 120.18.1.1/32 prefix, as shown below. Such a route is called a discard route because Null0 is a discard interface on a Cisco router: any traffic sent to Null0 is silently dropped. The static route installs 120.18.1.1/32 in R16’s routing table with Null0 as its exit interface, so all traffic R16 routes toward that prefix is discarded. Because the network command injects a prefix only if a matching route exists in the routing table, the static route also allows R16 to inject 120.18.1.1/32 into its BGP table with a network command:

On R16:

 
R16(config)#ip route 120.18.1.1 255.255.255.255 null0
 
R16(config)#router bgp 200
R16(config-router)#network 120.18.1.1 mask 255.255.255.255

The results of this configuration can be seen in the output below:

R16#show ip bgp regexp ^$
 
--- omitted ---
 
     Network          Next Hop            Metric LocPrf Weight Path
 *>  120.18.1.1/32    0.0.0.0                  0         32768 i
 * i                  18.18.18.18              0    100      0 i
 *>i 120.18.2.1/32    18.18.18.18              0    100      0 i
 
R16#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 19
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     1          2
  Refresh Epoch 1
  Local
    0.0.0.0 from 0.0.0.0 (16.16.16.16)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, sourced, local, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  Local
    18.18.18.18 (metric 11) from 18.18.18.18 (18.18.18.18)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

R16 chooses the locally sourced path to Null0 instead of the one received from R18. However, it chooses this path because of step 1 of the best-path algorithm, not step 3.

Recall that step 1 of the best-path algorithm prefers paths with higher WEIGHT values. By default, Cisco IOS assigns a local weight of 32768 to all locally originated paths in the BGP table, whereas received paths default to a WEIGHT of 0. As a result, unless a policy changes the WEIGHT, locally originated paths are preferred over received paths to the same prefix. To properly test the step 3 preference, the WEIGHT of the locally originated path must be set to 0 to match the default WEIGHT of received paths.

A route map that sets WEIGHT to 0 is therefore created. Unlike most other changes made in this lab, the route map is applied to the network command instead of to a neighbor statement. A route map applied to a network command affects only the prefix being injected by that network command, and, more importantly, any path attributes it modifies are reflected in the local router’s BGP table. No in or out direction is specified when applying a route map to the network command; it always acts as the path enters the local BGP table. The configuration commands used are shown below:

On R16:

R16(config)#route-map tst permit 10
R16(config-route-map)#set weight 0
 
R16(config)#router bgp 200
R16(config-router)#network 120.18.1.1 mask 255.255.255.255 route-map tst

After applying the changes, R16’s BGP table reflects the following:

R16#show ip bgp regex ^$
 
--- omitted ---
     Network          Next Hop            Metric LocPrf Weight Path
 *>  120.18.1.1/32    0.0.0.0                  0             0 i
 * i                  18.18.18.18              0    100      0 i
 *>i 120.18.2.1/32    18.18.18.18              0    100      0 i
 

R16#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 18
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     1          2
  Refresh Epoch 1
  Local
    0.0.0.0 from 0.0.0.0 (16.16.16.16)
      Origin IGP, metric 0, localpref 100, valid, sourced, local, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  Local
    18.18.18.18 (metric 11) from 18.18.18.18 (18.18.18.18)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

With the WEIGHT and LOCAL_PREF values now equal, R16 still prefers its own locally sourced path to the prefix, this time because of step 3 of the best-path algorithm. Note that this configuration was implemented as a proof of concept to demonstrate step 3 of the best-path algorithm. It is not recommended in a production environment, because it may unintentionally blackhole traffic.
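The two R16 outcomes can be reproduced with a Python sketch of the first three best-path steps. It is a deliberately partial model, not the full algorithm: the Candidate class and compare function are hypothetical, only WEIGHT, LOCAL_PREF, and local origination are compared, and ties beyond step 3 are left undecided.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    weight: int
    local_pref: int
    locally_originated: bool

def compare(a, b):
    """Return (winner, step) for steps 1-3, or (None, None) if still tied."""
    if a.weight != b.weight:                          # step 1: higher WEIGHT
        return (a if a.weight > b.weight else b), 1
    if a.local_pref != b.local_pref:                  # step 2: higher LOCAL_PREF
        return (a if a.local_pref > b.local_pref else b), 2
    if a.locally_originated != b.locally_originated:  # step 3: locally originated
        return (a if a.locally_originated else b), 3
    return None, None                                 # later steps would decide

from_r18 = Candidate("iBGP path from R18", 0, 100, False)
local_default = Candidate("network command, default weight", 32768, 100, True)
local_weight0 = Candidate("network command with route-map tst", 0, 100, True)

for local in (local_default, local_weight0):
    winner, step = compare(local, from_r18)
    print(f"{winner.name} wins at step {step}")
```

With the default weight the local path wins at step 1; with the route map setting WEIGHT to 0, the decision falls through to step 3.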

Step 4: AS_PATH

Note

Before starting this section, revert the configuration on all the routers to the base initial configuration files provided with the lab.

In the first three steps of the BGP best-path algorithm, determinations are based on administrative preferences or on how the paths were inserted into the BGP table. While each step is instrumental in creating policies, none of them describes measurable characteristics of the path itself, as the metrics of a distance vector or link-state routing protocol do.

The fourth step in the BGP best-path algorithm deals with the length of the AS_PATH attribute. Specifically, when deciding between two paths, BGP should choose the path with the shorter AS_PATH attribute length. The reasoning for this preference is grounded in the assumption that a shorter AS_PATH attribute length indicates that the path goes through fewer autonomous systems and therefore is a shorter path to the destination prefix.
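The length comparison can be sketched in Python. The counting rules follow RFC 4271 (an AS_SET counts as one) and RFC 5065 (confederation segments are not counted), which is why R6 earlier in this lab chose (312) 200 over 400 200. The segment representation and the function name as_path_length are hypothetical.

```python
def as_path_length(segments):
    """segments: list of (segment_type, [asns]) pairs, leftmost first."""
    length = 0
    for seg_type, asns in segments:
        if seg_type == "AS_SEQUENCE":
            length += len(asns)  # every ASN counts
        elif seg_type == "AS_SET":
            length += 1          # a whole set counts as one
        # AS_CONFED_SEQUENCE and AS_CONFED_SET are not counted
    return length

# R6's two paths to 120.18.1.1/32 before any policy change
paths = {
    "via R3 (312) 200": [("AS_CONFED_SEQUENCE", [312]), ("AS_SEQUENCE", [200])],
    "via R11 400 200": [("AS_SEQUENCE", [400, 200])],
}
for name, segs in paths.items():
    print(name, as_path_length(segs))
print("preferred:", min(paths, key=lambda n: as_path_length(paths[n])))
```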


This assumption is based on how the AS_PATH attribute is set. When a BGP router originates a path to a prefix, the AS_PATH attribute is empty. When the BGP router advertises the path to an external neighbor, it prepends its local ASN, such as ASN1, to the AS_PATH attribute. The external peer receives the path and stores it in its BGP table with the AS_PATH attribute value ASN1. When that external peer advertises the path to one of its own external peers, it prepends its ASN, ASN2, to the existing AS_PATH attribute value. The AS_PATH attribute now contains two ASNs: ASN2 and ASN1.

Following suit, the third external peer receives the path with the AS_PATH attribute ASN2 ASN1. When it advertises the path onward, it prepends its ASN, ASN3, resulting in the AS_PATH attribute ASN3 ASN2 ASN1. This process continues as the path is advertised from external peer to external peer, with each successive peer prepending its own ASN to the AS_PATH attribute.

The prepending process ensures that when a BGP peer receives a path from its external peer, the AS_PATH attribute lists the ASNs in order from the nearest AS (the neighbor that sent the path) to the originating AS. By looking at the AS_PATH attribute of a BGP prefix in the BGP table, it is possible to determine every AS that has received and forwarded the path.
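The prepending sequence can be traced with a short Python sketch. The ASNs 65001, 65002, and 65003 are hypothetical placeholders for ASN1, ASN2, and ASN3, and each hop is assumed to be an eBGP advertisement with no policy applied.

```python
def propagate(advertising_asns):
    """Return the AS_PATH seen after each eBGP advertisement, in order."""
    as_path = ()                    # empty when the path is originated
    seen = []
    for asn in advertising_asns:    # each AS advertises to the next one
        as_path = (asn,) + as_path  # prepend the advertising AS's own ASN
        seen.append(as_path)
    return seen

hops = propagate([65001, 65002, 65003])  # ASN1, ASN2, ASN3
for as_path in hops:
    print(" ".join(map(str, as_path)))
final = hops[-1]
print("neighbor AS:", final[0], "origin AS:", final[-1])
```

The leftmost entry is always the neighbor that delivered the path, and the rightmost entry stays the originating AS.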


Notice the use of the term external BGP peer in the prepending process above. This terminology is important because the prepending process only occurs when a path is advertised between external BGP peers. When a BGP router advertises a prefix to an internal BGP peer, it does not automatically modify the AS_PATH attribute in any way. This behavior makes sense, based on the purpose of the AS_PATH attribute.

The AS_PATH attribute is a well-known mandatory attribute. Its primary purpose is to record the ASNs a path traverses in order to prevent routing loops between autonomous systems. Whenever a BGP router receives a path from an external BGP peer, it checks the AS_PATH attribute for any occurrence of its own ASN. If the path contains the local ASN, the path has already passed through the local AS, so the router rejects it and does not install it in its BGP table.
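The receive-side check can be sketched in Python as follows. receive_update is a hypothetical helper; R20’s AS 100 and the 300 200 path for 120.18.2.1/32 are taken from this lab, while the AS_PATH values for the 110.19.x.x prefixes are assumed for illustration, because the rejected paths never appear in any BGP table.

```python
def receive_update(local_asn, routes):
    """Split eBGP-received routes into accepted and denied (AS_PATH loop check).

    routes: dict of prefix -> AS_PATH tuple as received
    """
    accepted, denied = {}, []
    for prefix, as_path in routes.items():
        if local_asn in as_path:
            denied.append(prefix)       # path already crossed our AS: loop
        else:
            accepted[prefix] = as_path  # eligible for the BGP table
    return accepted, denied

accepted, denied = receive_update(100, {
    "110.19.1.1/32": (300, 100),  # assumed AS_PATH
    "110.19.2.1/32": (300, 100),  # assumed AS_PATH
    "120.18.2.1/32": (300, 200),  # as shown in R20's table
})
for prefix in denied:
    print(prefix, "DENIED: AS-PATH contains our own AS")
print(accepted)
```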

For example, suppose that debug ip bgp updates is enabled on R20 and that R2 is then forced to re-send its best paths to all of its BGP neighbors, including R20. R20 receives an UPDATE with path information for its 110.19.1.1/32 and 110.19.2.1/32 prefixes. However, it rejects the update from R2 because its own ASN, 100, is contained in the AS_PATH attribute:

On R20:

 
R20#debug ip bgp updates

On R2:

R2#clear ip bgp * soft out

On R20:

BGP(0): 200.2.20.2 rcv UPDATE about 110.19.1.1/32 -- DENIED due to: AS-PATH contains our own AS;
BGP(0): 200.2.20.2 rcv UPDATE about 110.19.2.1/32 -- DENIED due to: AS-PATH contains our own AS;
 

This check has been instituted to prevent external routing loops in BGP.

Because the AS_PATH is designed to record AS hops, it makes sense for internal peers not to perform prepending when advertising paths amongst themselves because no AS boundary has been crossed.

The AS_PATH attribute can be modified in a route map with the set as-path prepend command, followed by one or more ASNs (for example, set as-path prepend asn1 asn2 asn3). The route map is then applied to a neighbor command in the inbound or outbound direction, and its effects differ depending on the direction. To demonstrate these differences, R10 and R9 from AS 400, along with R17 from AS 200, are used as an example, with the focus on the 140.15.1.1/32 and 140.15.2.1/32 prefixes. R17 receives these prefixes from R9 and R10 with an AS_PATH attribute of 400, as shown below:

On R17:

R17#show ip bgp regex _400$
 
--- omitted ---
 
     Network          Next Hop              Metric LocPrf Weight Path
 *   140.15.1.1/32    200.9.17.9                             0 400 i
 *>                   200.10.17.10                           0 400 i
 *   140.15.2.1/32    200.9.17.9                             0 400 i
 *>                   200.10.17.10                           0 400 i

The Wireshark captures below show that the AS_PATH attribute in the UPDATE messages R17 receives from R10 and from R9 contains ASN 400 as an AS_SEQUENCE segment:

On R10:

Internet Protocol Version 4, Src: 200.10.17.10, Dst: 200.10.17.17
Transmission Control Protocol, Src Port: 60843, Dst Port: 179, Seq: 100, Ack: 96, Len: 231
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 53
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 20
        Path Attribute - AS_PATH: 400
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 6
            AS Path segment: 400
                Segment type: AS_SEQUENCE (2)
                Segment length (number of ASN): 1
                AS4: 400
        Path Attribute - NEXT_HOP: 200.10.17.10
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: NEXT_HOP (3)
            Length: 4
            Next hop: 200.10.17.10
    Network Layer Reachability Information (NLRI)
        140.15.1.1/32
        140.15.2.1/32
  

On R9:

Internet Protocol Version 4, Src: 200.9.17.9, Dst: 200.9.17.17
Transmission Control Protocol, Src Port: 59690, Dst Port: 179, Seq: 96, Ack: 220, Len: 204
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 53
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 20
        Path Attribute - AS_PATH: 400
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 6
            AS Path segment: 400
                Segment type: AS_SEQUENCE (2)
                Segment length (number of ASN): 1
                AS4: 400
        Path Attribute - NEXT_HOP: 200.9.17.9
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: NEXT_HOP (3)
            Length: 4
            Next hop: 200.9.17.9
    Network Layer Reachability Information (NLRI)
        140.15.1.1/32
        140.15.2.1/32

R17 receives two paths for each of the 140.15.1.1/32 and 140.15.2.1/32 prefixes. Because the paths tie on the earlier best-path criteria, R17 prefers whichever path it received first, which in this example is R10’s path.

Note

The above output may vary depending on the order in which R17 received the updates from R9 and R10, because the two paths tie on every attribute compared before path age. In such a situation, BGP selects the path that was received first as the best path because it is the older path. In this lab, that path was received from R10.

If R17 has instead selected the path through R9 as best, shut down R17’s neighbor connection to R9 to better align with the examples in this lab, using the following command sequence:

R17(config)#router bgp 200 
R17(config-router)#neighbor 200.9.17.9 shutdown
  
22:02:17.293: %BGP-5-NBR_RESET: Neighbor 200.9.17.9 reset (Admin. shutdown)
22:02:17.298: %BGP-5-ADJCHANGE: neighbor 200.9.17.9 Down Admin. shutdown
22:02:17.298: %BGP_SESSION-5-ADJCHANGE: neighbor 200.9.17.9 IPv4 Unicast topology base removed from session  Admin. shutdown

Doing so causes R17 to remove R9’s path from its BGP table and install the path through R10. When the session with R9 is brought back up, R10’s path remains the best path because it is now the older route:

R17(config-router)#no neighbor 200.9.17.9 shutdown
*Jun  5 22:02:31.164: %BGP-5-ADJCHANGE: neighbor 200.9.17.9 Up 

Details behind this path selection behavior are explained in step 10 of this lab.
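The tie-breaking behavior described in this note can be sketched in Python. This is a simplified model, not IOS code. The `Path` class and its `received_at` counter are hypothetical stand-ins for the path age that IOS tracks internally, and the sketch assumes that every earlier best-path attribute already ties, as it does for R17’s two paths.

```python
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str
    as_path: tuple[int, ...]
    received_at: int  # hypothetical arrival counter: lower value = older path


def oldest_path(paths: list[Path]) -> Path:
    """Final tie-breaker only: assumes all earlier best-path attributes tie."""
    return min(paths, key=lambda p: p.received_at)


# Suppose R9's path to 140.15.1.1/32 happened to arrive first.
table = [Path("200.9.17.9", (400,), 1), Path("200.10.17.10", (400,), 2)]
print(oldest_path(table).next_hop)  # 200.9.17.9

# "neighbor 200.9.17.9 shutdown" removes R9's path from the table...
table = [p for p in table if p.next_hop != "200.9.17.9"]
# ...and "no neighbor 200.9.17.9 shutdown" relearns it later, as the newer path.
table.append(Path("200.9.17.9", (400,), 3))
print(oldest_path(table).next_hop)  # 200.10.17.10
```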

AS 400 has a specific application that uses the 140.15.1.1/32 address. Because of internal politics, traffic for this application needs to enter the AS via R9 whenever possible. The goal is for R17 to prefer the path through R9 over the path through R10 to reach the 140.15.1.1/32 prefix, using only the AS_PATH attribute.

The following sections demonstrate solutions for this scenario that involve prepending AS_PATH outbound and inbound.

AS_PATH Outbound

The more common way of using the AS_PATH attribute is in the outbound direction. Typically, an AS prepends its own ASN to make certain paths less preferred by other autonomous systems. Recall that a shorter AS_PATH attribute is preferred over a longer one. Thus, to influence R17 to choose the path through R9, R10 needs to lengthen the AS_PATH of the path it advertises to R17 by prepending to it. Once this is applied, R10’s path is less preferable to R17 than R9’s path because its AS_PATH attribute is longer.
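The comparison R17 makes can be sketched as follows. This is a deliberately reduced model: it compares only AS_PATH length and then path age, and it assumes every other best-path attribute ties, as it does here. The `Path` class and the `received_at` counter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str
    as_path: tuple[int, ...]  # AS_SEQUENCE, leftmost ASN first
    received_at: int          # hypothetical arrival counter: lower value = older path


def best_path(paths: list[Path]) -> Path:
    """Shortest AS_PATH wins; on a length tie, the oldest path wins."""
    return min(paths, key=lambda p: (len(p.as_path), p.received_at))


# Before prepending: the lengths tie, so the older path (via R10) wins.
before = [Path("200.10.17.10", (400,), 1), Path("200.9.17.9", (400,), 2)]
# After R10 prepends 400 once, its path is one ASN longer.
after = [Path("200.10.17.10", (400, 400), 1), Path("200.9.17.9", (400,), 2)]

print(best_path(before).next_hop)  # 200.10.17.10
print(best_path(after).next_hop)   # 200.9.17.9
```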

To apply AS_PATH prepending in the outbound direction, a prefix list is created to identify the 140.15.1.1/32 network. This prefix list is then matched in a route map that uses the set as-path prepend command. The route map is then applied in the outbound direction on R10’s neighbor statement for R17. The following configuration causes R10 to prepend ASN 400 to the beginning of the AS_PATH attribute one time:

On R10:

R10(config)#ip prefix-list 123 permit 140.15.1.1/32
 
R10(config)#route-map tst permit 10
R10(config-route-map)#match ip address prefix 123
R10(config-route-map)#set as-path prepend 400
 
R10(config)#route-map tst permit 90
R10(config-route-map)#router bgp 400
R10(config-router)#neighbor 200.10.17.17 route-map tst out
 
R10#clear ip bgp * soft out

The effect of this configuration can be seen in before and after views of R17’s BGP table. Before the change, the paths received from R9 and R10 tie on AS_PATH attribute length (each contains a single 400). After the change is made and clear ip bgp * soft out is issued on R10, R10 advertises its path to R17 with a modified AS_PATH attribute, as shown in the following packet capture of the UPDATE message sent by R10 to R17:

Internet Protocol Version 4, Src: 200.10.17.10, Dst: 200.10.17.17
Transmission Control Protocol, Src Port: 179, Dst Port: 23911, Seq: 1, Ack: 20, Len: 204
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 52
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 24
    Path attributes
        Path Attribute - AS_PATH: 400 400
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 10
            AS Path segment: 400 400
                Segment type: AS_SEQUENCE (2)
                Segment length (number of ASN): 2
                AS4: 400
                AS4: 400
        Path Attribute - NEXT_HOP: 200.10.17.10
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: NEXT_HOP (3)
            Length: 4
            Next hop: 200.10.17.10
    Network Layer Reachability Information (NLRI)
        140.15.1.1/32

As a result, R17 now prefers R9’s path because the AS_PATH attribute length is shorter, as evidenced by the before and after views of R17’s BGP table:

R17 before the change:

 
R17#show ip bgp regex _400$
 
--- omitted ---
 
     Network          Next Hop            Metric LocPrf Weight Path
 *   140.15.1.1/32    200.9.17.9                             0 400 i
 *>                   200.10.17.10                           0 400 i
 *   140.15.2.1/32    200.9.17.9                             0 400 i
 *>                   200.10.17.10                           0 400 i
 

R17 after the change:

R17#show ip bgp regex _400$
 
--- omitted ---
 
     Network          Next Hop            Metric LocPrf Weight Path
 *>  140.15.1.1/32    200.9.17.9                             0 400 i
 *                    200.10.17.10                           0 400 400 i
 *   140.15.2.1/32    200.9.17.9                             0 400 i
 *>                   200.10.17.10                           0 400 i

Note

When configuring AS_PATH prepending, it’s important to keep in mind these three rules:

  • The router will always prepend its own ASN to the AS_PATH attribute as the leftmost ASN in the AS_PATH attribute.

  • The AS_PATH attribute is prepended with the exact values in order, as entered in the set as-path prepend command, in addition to the normal AS_PATH prepending that happens whenever an eBGP peer advertises to another eBGP peer.

  • It is wise to use only the local ASN when prepending in the outbound direction.

The first point reinforces the fact that the set as-path prepend command does not replace the router’s own ASN. The router will always prepend its own ASN to the path as the leftmost ASN in the path.

The second point is best illustrated with an example. In the example above, the set as-path prepend command contains only a single 400. When R10 advertised the path to the 140.15.1.1/32 prefix to R17, it performed its normal prepending, adding a single 400 to the AS_PATH attribute. Because of the outbound route map, it also added an additional 400 to the advertisement, creating an AS_PATH attribute of 400 400. If the command set as-path prepend 400 400 were used instead, the result would be an AS_PATH attribute of 400 400 400.
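The three rules can be expressed as a small function. This is a hypothetical model of the behavior described above, not IOS internals. `prepend` stands for the ASNs listed in set as-path prepend, which are placed in front of the existing path in the order entered, and the local ASN is then added as the leftmost ASN. Because 140.15.1.1/32 originates in AS 400, the sketch assumes that R10 starts from an empty AS_PATH.

```python
def advertise_to_ebgp(as_path: tuple[int, ...], local_asn: int,
                      prepend: tuple[int, ...] = ()) -> tuple[int, ...]:
    """Return the AS_PATH sent to an eBGP peer (leftmost ASN first).

    The route-map prepend values go in front of the existing path in the order
    entered, and the normal eBGP prepend keeps the local ASN leftmost.
    """
    return (local_asn, *prepend, *as_path)


local_path: tuple[int, ...] = ()  # path originated inside AS 400: empty AS_PATH on R10
print(advertise_to_ebgp(local_path, 400))              # (400,)
print(advertise_to_ebgp(local_path, 400, (400,)))      # (400, 400)
print(advertise_to_ebgp(local_path, 400, (400, 400)))  # (400, 400, 400)
print(advertise_to_ebgp(local_path, 400, (200,)))      # (400, 200)
```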

The last point is a best practice rooted in BGP’s default loop-prevention mechanism. If an AS prepends another AS’s ASN to a BGP path, that AS denies the path when it arrives. As a result, every AS that relies on that AS for its Internet routes will not have the prepended path.

For example, if R10 prepended ASN 200 instead of 400 on its advertisement to R17, as shown in the configuration below, R17 would receive the BGP UPDATE message and find its own ASN in the AS_PATH attribute. This is shown in the capture below:

R10:
 
ip prefix-list PREPEND seq 5 permit 140.15.1.1/32

!
route-map PREPEND permit 10
 match ip address prefix-list PREPEND
 set as-path prepend 200
route-map PREPEND permit 20
!
router bgp 400
 neighbor 200.10.17.17 route-map PREPEND out
 
BGP Update message from R10 to R17:
Internet Protocol Version 4, Src: 200.10.17.10, Dst: 200.10.17.17
Transmission Control Protocol, Src Port: 60843, Dst Port: 179, Seq: 455, Ack: 91, Len: 204
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 52
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 24
    Path attributes
        Path Attribute - ORIGIN: IGP
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: ORIGIN (1)
            Length: 1
            Origin: IGP (0)
        Path Attribute - AS_PATH: 400 200
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 10
            AS Path segment: 400 200
                Segment type: AS_SEQUENCE (2)
                Segment length (number of ASN): 2
                AS4: 400
                AS4: 200
    Network Layer Reachability Information (NLRI)
        140.15.1.1/32

Note that, in keeping with the first rule above, R10 advertises the AS_PATH as “400 200”, keeping its own ASN as the leftmost ASN. Because R17 is a member of AS 200 and its own ASN appears in the AS_PATH, R17 denies the path. This can be confirmed by enabling debug ip bgp updates on R17:

16:46:39.466: BGP(0): 200.10.17.10 rcv UPDATE about 140.15.1.1/32 -- DENIED due to: AS-PATH contains our own AS;

As a result, any AS that peers with AS 200 will not receive the path from it, and whatever resource is reachable at 140.15.1.1/32 becomes unavailable to those autonomous systems.

However, prepending with the local ASN does not suffer from this problem and is the recommended way to perform outbound AS_PATH manipulation.
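R17’s check can be sketched as a membership test on the received AS_PATH. This is a simplified model of BGP’s default eBGP loop prevention, and the function name is hypothetical. It reproduces why the “400 200” path is denied while the “400 400” path from the earlier example is accepted.

```python
def accept_from_ebgp(as_path: tuple[int, ...], local_asn: int) -> bool:
    """Default loop check: deny any path whose AS_PATH already contains our ASN."""
    return local_asn not in as_path


R17_ASN = 200
print(accept_from_ebgp((400, 400), R17_ASN))  # True: prepended with R10's own ASN
print(accept_from_ebgp((400, 200), R17_ASN))  # False: "AS-PATH contains our own AS"
```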

The AS_PATH attribute can only be modified outbound toward eBGP neighbors, as in the example above. It cannot be modified outbound toward iBGP neighbors. BGP routers do not modify the AS_PATH attribute for paths advertised to iBGP peers at all, even when a change is explicitly set using a route map. This is demonstrated in the example below.
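The difference between eBGP and iBGP advertisements can be sketched with a peer-type flag. This is a hypothetical model that simply encodes the behavior observed in this lab: toward an iBGP peer, the AS_PATH passes through unchanged even when a route map requests a prepend. The eBGP call is included only for contrast and does not correspond to a peering in this example.

```python
def advertise(as_path: tuple[int, ...], local_asn: int, peer_is_ebgp: bool,
              prepend: tuple[int, ...] = ()) -> tuple[int, ...]:
    """Return the AS_PATH a router sends to a peer (leftmost ASN first)."""
    if not peer_is_ebgp:
        # iBGP: no local ASN is added, and the route-map prepend has no effect.
        return as_path
    # eBGP: route-map prepend values, then the local ASN as the leftmost ASN.
    return (local_asn, *prepend, *as_path)


best_via_r2 = (300,)  # R10's best path to 130.7.1.1/32
print(advertise(best_via_r2, 400, peer_is_ebgp=False, prepend=(400,)))  # (300,)
print(advertise(best_via_r2, 400, peer_is_ebgp=True, prepend=(400,)))   # (400, 400, 300)
```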

R10 will attempt to advertise the 130.7.1.1/32 prefix to its iBGP neighbor R12 by prepending the ASN 400:

R10 and R12 before AS_PATH prepending configuration:

 

On R10:

R10#show ip bgp regexp _300$
 
--- omitted ---
 
     Network          Next Hop            Metric LocPrf Weight Path
 * i 130.7.1.1/32     11.11.11.11              0    100      0 300 i
 *                    200.10.17.17                           0 200 300 i
 *                    200.3.10.3                             0 300 i
 *>                   200.2.10.2                             0 300 i
 

! Depending on the order in which UPDATE messages are received, R10 may receive the path through R3 before the path through R2. In that case, R10 may choose R3’s path as best because it is the older route.

 

On R12:

R12#show ip bgp regexp _300$
 
--- omitted ---
 
     Network          Next Hop            Metric LocPrf Weight Path
 * i 130.7.1.1/32     11.11.11.11              0    100      0 300 i
 *>i                  10.10.10.10              0    100      0 300 i

On R10:

R10(config)#ip prefix-list PREPEND permit 130.7.1.1/32
 
R10(config)#route-map PREPEND permit 10
R10(config-route-map)#match ip address prefix PREPEND
R10(config-route-map)#set as-path prepend 400
R10(config)#route-map PREPEND permit 20
 
R10(config-route-map)#router bgp 400
R10(config-router)#neighbor 12.12.12.12 route-map PREPEND out
 
R10#clear ip bgp *

After the change, on R10:

R10#show ip bgp regexp _300$
 
--- omitted ---
 
     Network          Next Hop               Metric LocPrf Weight Path
 * i 130.7.1.1/32     11.11.11.11              0    100      0 300 i
 *                    200.10.17.17                           0 200 300 i
 *                    200.3.10.3                             0 300 i
 *>                   200.2.10.2                             0 300 i

On R12:

R12#show ip bgp regexp _300$
 
--- omitted ---
 
     Network          Next Hop            Metric LocPrf Weight Path
 * i 130.7.1.1/32     11.11.11.11              0    100      0 300 i
 *>i                  10.10.10.10              0    100      0 300 i

Notice that even though the route map PREPEND sets an AS_PATH prepend, it has no effect on the path R10 advertises to its iBGP neighbor R12. BGP routers do not modify the AS_PATH attribute when advertising a path to an internal peer. This can also be seen in the capture below:

Internet Protocol Version 4, Src: 10.10.10.10, Dst: 9.9.9.9
Transmission Control Protocol, Src Port: 12201, Dst Port: 179, Seq: 20, Ack: 20, Len: 233
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 62
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 34
    Path attributes
        Path Attribute - AS_PATH: 300
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 6
            AS Path segment: 300
                Segment type: AS_SEQUENCE (2)
                Segment length (number of ASN): 1
                AS4: 300
    Network Layer Reachability Information (NLRI)
        130.7.1.1/32

AS_PATH Inbound

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

The AS_PATH attribute can also be modified in the inbound direction on a BGP router. This is uncommon, but it is a valid configuration. When the AS_PATH attribute is modified inbound, the change affects the local AS and any other AS to which the local AS advertises the path. The same result as in the outbound example can be achieved by moving the configuration to the receiving router. Instead of R10 prepending outbound toward R17, R17 prepends inbound on the path to 140.15.1.1/32 that it receives from R10.

This time, R17 is configured with an equivalent route map and prefix list combination to the one used in the outbound example. The route map is then applied in the inbound direction on the neighbor statement for R17’s peering with R10, as shown below:

R17 before AS_PATH prepending:

R17#show ip bgp regex _400$
 
--- omitted ---
 
     Network          Next Hop            Metric LocPrf Weight Path
 *   140.15.1.1/32    200.9.17.9                             0 400 i
 *>                   200.10.17.10                           0 400 i
 *   140.15.2.1/32    200.9.17.9                             0 400 i
 *>                   200.10.17.10                           0 400 i

Note

R17 chooses the path via R10 as best because it received it before R9’s path. The results may vary in lab testing, depending on whose path is received first by R17.

R17 AS_PATH prepending configuration:

R17(config)#ip prefix-list PREPEND permit 140.15.1.1/32
 
R17(config)#route-map PREPEND permit 10
R17(config-route-map)#match ip address prefix PREPEND
R17(config-route-map)#set as-path prepend 400
R17(config)#route-map PREPEND permit 90
 
R17(config)#router bgp 200
R17(config-router)#neighbor 200.10.17.10 route-map PREPEND in
 
R17#show ip bgp regex _400$
 
--- omitted ---
 
     Network          Next Hop            Metric LocPrf Weight Path
 *   140.15.1.1/32    200.10.17.10                        0 400 400 i
 *>                   200.9.17.9                          0 400 i
 *   140.15.2.1/32    200.10.17.10                        0 400 i
 *>                   200.9.17.9                          0 400 i

Here, the net result is the same: R17 prefers the path from R9 to enter AS 400. A special note about prepending inbound is that, unlike with outbound prepending, it is possible to prepend with the local ASN or the peer’s ASN without negative impact to the global BGP table. In this example, R17 could prepend the ASNs 200 or 400 to achieve this result. However, care should be taken to ensure that only the local ASN or the peer’s ASN is used in the inbound prepending. If another ASN is used, it could cause that path to be rejected by the AS that actually owns that particular ASN.
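Inbound prepending can be sketched in the same way. Here the prepend is applied to the path R17 receives rather than the path R10 sends. The function names are hypothetical. `is_safe_inbound_prepend` simply encodes the guidance above (use only the local ASN or the peer’s ASN); it is not a check that IOS performs.

```python
def prepend_inbound(received: tuple[int, ...],
                    prepend: tuple[int, ...]) -> tuple[int, ...]:
    """Place the route-map prepend values in front of the received AS_PATH."""
    return (*prepend, *received)


def is_safe_inbound_prepend(prepend: tuple[int, ...], local_asn: int,
                            peer_asn: int) -> bool:
    """Guidance from the text: only the local ASN or the peer's ASN should be used."""
    return all(asn in (local_asn, peer_asn) for asn in prepend)


from_r10 = prepend_inbound((400,), (400,))  # R17's inbound route map on the R10 session
from_r9 = (400,)                            # R9's path is left untouched
print(from_r10, from_r9)                    # (400, 400) (400,)
print(len(from_r9) < len(from_r10))         # True: the path via R9 is now preferred
print(is_safe_inbound_prepend((200,), local_asn=200, peer_asn=400))  # True
print(is_safe_inbound_prepend((300,), local_asn=200, peer_asn=400))  # False
```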

AS_PATH in Confederations

Loop Prevention Within the AS

The examples above dealt heavily with how the AS_PATH attribute is communicated between BGP routers. One key aspect of this is that the AS_PATH attribute is not prepended when advertised between iBGP peers. This section deals with the implications of that behavior. Because the AS_PATH attribute is not prepended between iBGP peers, routers in the AS cannot use it to prevent loops from forming within the AS. The default loop-prevention mechanism does not work there because it relies on finding the AS’s own ASN in the AS_PATH attribute.

iBGP peers have to rely on another loop-prevention mechanism to compensate for this deficiency and must first advertise external paths to their iBGP neighbors as internal paths. With the paths designated as internal paths, all iBGP routers then must follow the rule that they are not to advertise internal paths to other internal peers. This rule is often called the iBGP split-horizon rule in the networking community. The process works as follows:

  1. A BGP router receives a path from its eBGP peer.

  2. The BGP router checks for loops, using the AS_PATH attribute.

  3. After passing the AS_PATH attribute check, the path is accepted into the BGP table and marked as best (after the best-path algorithm is run).

  4. The BGP router advertises the same external path as an internal path to its iBGP peers.

  5. The iBGP peers cannot advertise the same path to each other, so the only copy of the path comes from the original BGP router that received it.
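The consequence of the iBGP split-horizon rule can be sketched with a tiny propagation model. Routers A, B, and C and the session sets are hypothetical. The model assumes that router A learned the path from an eBGP peer and that neither route reflection nor confederations are in use, so only A may pass the path to its iBGP peers.

```python
def routers_that_learn(edge_router: str,
                       ibgp_sessions: set[frozenset[str]]) -> set[str]:
    """Return the routers that end up with a path first learned by edge_router."""
    learned = {edge_router}
    for session in ibgp_sessions:
        # Only the edge router advertises the path over iBGP; a router that
        # learned it over iBGP never re-advertises it to another iBGP peer.
        if edge_router in session:
            learned |= session
    return learned


full_mesh = {frozenset(s) for s in [("A", "B"), ("A", "C"), ("B", "C")]}
partial = {frozenset(s) for s in [("A", "B"), ("B", "C")]}

print(sorted(routers_that_learn("A", full_mesh)))  # ['A', 'B', 'C']
print(sorted(routers_that_learn("A", partial)))    # ['A', 'B']: C never receives the path
```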

This sequence of events ensures that no loops are formed inside the AS when BGP peers advertise internal prefixes. The unfortunate side effect of this rule is that, to ensure all iBGP peers receive all internal updates, all the BGP routers inside the AS should be iBGP peers with each other. In other words, there should be a full mesh of iBGP peers within the AS; otherwise, some routers may not receive all prefixes.

This requirement can cause scalability issues within the AS, because the number of required peerings grows quadratically as new BGP routers are added to the AS. Two methods are generally used to handle this scalability problem: route reflection and confederations. Route reflection involves designating a BGP router as a route reflector, which serves a set of clients. When advertising internal prefixes, the route reflector is able to relax the iBGP split-horizon rule and advertise internal prefixes to other internal routers. (Route reflection is explained further in later steps of the BGP best-path algorithm.)
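The growth in peerings can be made concrete with the full-mesh session count. Each of n routers peers with the other n - 1 routers, and each session is shared by two routers, which gives n(n - 1)/2 sessions. The router counts below are arbitrary examples.

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions needed for a full mesh of n routers."""
    return n * (n - 1) // 2


for routers in (4, 10, 50):
    print(routers, full_mesh_sessions(routers))
# 4 6
# 10 45
# 50 1225
```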

The method most applicable to the AS_PATH attribute is confederations, which deserves more explanation here.

Confederations introduce a new AS_PATH segment type called AS_CONFED_SEQUENCE. This segment behaves similarly to the AS_SEQUENCE segment but is specific to how confederations function. Rather than allowing certain routers to relax the iBGP split-horizon rule, confederations break up the AS into multiple sub-autonomous systems, each assigned its own sub-ASN. Routers belonging to two different sub-autonomous systems form a special kind of eBGP peering called a confederation eBGP peering. A router is allowed to advertise a received internal prefix to its confederation eBGP peers.

Advertising Paths Within Confederations

Confederation eBGP peers are able to advertise internal paths to each other as confederation external paths. When doing so, they prepend their sub-ASNs to the AS_CONFED_SEQUENCE segment of the path. The process is similar to how regular eBGP peers prepend their local ASNs to paths advertised to other external peers, and the purpose is identical: by prepending the local sub-ASN to the AS_CONFED_SEQUENCE, the original BGP loop-prevention check is restored. When a router receives a confederation external path from a confederation eBGP neighbor, it first checks the AS_CONFED_SEQUENCE segment for its own sub-ASN. If the sub-ASN is found, the local router denies the path.
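The prepend-then-check behavior between confederation eBGP peers can be sketched with a two-segment AS_PATH model. The `AsPath` class and the function names are hypothetical. `__str__` mimics the show ip bgp style, with the AS_CONFED_SEQUENCE in parentheses. The example replays the exchanges among R2 (sub-AS 312), R3 (sub-AS 336), and R4 (sub-AS 345) that are examined in the captures later in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsPath:
    confed_seq: tuple[int, ...] = ()  # AS_CONFED_SEQUENCE: sub-ASNs, leftmost first
    as_seq: tuple[int, ...] = ()      # AS_SEQUENCE: real ASNs, leftmost first

    def __str__(self) -> str:
        parts = []
        if self.confed_seq:
            parts.append("(" + " ".join(map(str, self.confed_seq)) + ")")
        parts.extend(map(str, self.as_seq))
        return " ".join(parts)


def send_to_confed_ebgp(path: AsPath, local_sub_asn: int) -> AsPath:
    """Prepend the local sub-ASN to the AS_CONFED_SEQUENCE segment."""
    return AsPath((local_sub_asn, *path.confed_seq), path.as_seq)


def accept_from_confed_ebgp(path: AsPath, local_sub_asn: int) -> bool:
    """Deny the path if our own sub-ASN already appears in AS_CONFED_SEQUENCE."""
    return local_sub_asn not in path.confed_seq


external = AsPath(as_seq=(400,))                # path to 140.15.1.1/32 learned from AS 400
from_r2 = send_to_confed_ebgp(external, 312)    # R2 (sub-AS 312) -> R3
print(from_r2, accept_from_confed_ebgp(from_r2, 336))  # (312) 400 True

to_r4 = send_to_confed_ebgp(external, 336)      # R3 (sub-AS 336) -> R4
back_to_r3 = send_to_confed_ebgp(to_r4, 345)    # R4 (sub-AS 345) -> R3
print(back_to_r3, accept_from_confed_ebgp(back_to_r3, 336))  # (345 336) 400 False
```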

The AS_CONFED_SEQUENCE and AS_SEQUENCE segments are separate segments within the AS_PATH attribute that are combined in the show ip bgp output under the Path column, as shown in the packet capture text and show ip bgp output from an exchange between R2 and R3 in the lab topology:

Packet Capture of an UPDATE message from R2 to R3:

Frame 723: 586 bytes on wire (4688 bits), 586 bytes captured (4688 bits) on interface 0
Ethernet II, Src: aa:bb:cc:00:02:00 (aa:bb:cc:00:02:00), Dst: aa:bb:cc:00:03:00 (aa:bb:cc:00:03:00)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 23
Internet Protocol Version 4, Src: 2.2.2.2, Dst: 3.3.3.3
Transmission Control Protocol, Src Port: 179, Dst Port: 11176, Seq: 62, Ack: 570, Len: 528
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 73
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 40
    Path attributes
        Path Attribute - ORIGIN: IGP
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: ORIGIN (1)
            Length: 1
            Origin: IGP (0)
        Path Attribute - AS_PATH: (312) 400
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 12
            AS Path segment: (312)
                Segment type: AS_CONFED_SEQUENCE (3)
                Segment length (number of ASN): 1
                AS4: 312
            AS Path segment: 400
                Segment type: AS_SEQUENCE (2)
                Segment length (number of ASN): 1
                AS4: 400
       --output omitted--
    Network Layer Reachability Information (NLRI)
        140.15.1.1/32
            NLRI prefix length: 32
            NLRI prefix: 140.15.1.1
        140.15.2.1/32
            NLRI prefix length: 32
            NLRI prefix: 140.15.2.1

Here, R2 sends an UPDATE message advertising a path to the 140.15.1.1/32 and 140.15.2.1/32 prefixes. The AS_PATH attribute is included in the UPDATE message as expected, because it is a well-known mandatory attribute. The AS_PATH attribute contains two segments: the AS_CONFED_SEQUENCE segment (312) and the AS_SEQUENCE segment (400). R2, belonging to sub-AS 312, prepends 312 to the AS_CONFED_SEQUENCE segment before advertising the path to R3.

Notice how the AS_CONFED_SEQUENCE value is written in parentheses. This matches how the AS_PATH attribute is represented in R3’s BGP table for the same prefixes, as shown below: the sub-ASNs of the AS_CONFED_SEQUENCE appear in parentheses, followed by the AS_SEQUENCE.

On R3:

R3#show ip bgp regexp _400$
 
--- omitted---
 
     Network          Next Hop               Metric LocPrf Weight Path
 *   140.15.1.1/32    2.2.2.2                0    100      0 (312) 400 i
 * i                  6.6.6.6                0    100      0 400 i
 *>                   200.3.10.10                          0 400 i
 *   140.15.2.1/32    2.2.2.2                0    100      0 (312) 400 i
 * i                  6.6.6.6                0    100      0 400 i
 *>                   200.3.10.10                          0 400 i

R3 itself is a member of sub-AS 336. Because sub-AS 336 does not appear in the AS_CONFED_SEQUENCE segment, R3 accepts the path as advertised by R2. R3 also receives the same paths from R4, as shown in the packet capture below:

Packet Capture of an UPDATE Message from R4 to R3:

Internet Protocol Version 4, Src: 4.4.4.4, Dst: 3.3.3.3
Transmission Control Protocol, Src Port: 38522, Dst Port: 179, Seq: 81, Ack: 570, Len: 485
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 77
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 44
    Path attributes
        Path Attribute - AS_PATH: (345 336) 400
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 16
            AS Path segment: (345 336)
                Segment type: AS_CONFED_SEQUENCE (3)
                Segment length (number of ASN): 2
                AS4: 345
                AS4: 336
            AS Path segment: 400
                Segment type: AS_SEQUENCE (2)
                Segment length (number of ASN): 1
                AS4: 400
    Network Layer Reachability Information (NLRI)
        140.15.1.1/32
            NLRI prefix length: 32
            NLRI prefix: 140.15.1.1
        140.15.2.1/32
            NLRI prefix length: 32
            NLRI prefix: 140.15.2.1

In the capture above, R4 is advertising R3’s path back to R3. It does so because R4 and R3 are confederation eBGP peers. The iBGP split-horizon rule does not apply, so R4 advertises all of its best paths to R3 even when it has selected the path through R3 as best (as in this situation). This is the same way a BGP router advertises a path back to a regular eBGP peer.

R4 advertises the path with AS_CONFED_SEQUENCE (345 336). When this advertisement reaches R3, it fails the AS_PATH attribute check at the sub-AS boundary, and R3 denies the path because its own sub-ASN, 336, is contained in the AS_CONFED_SEQUENCE. The debug ip bgp updates output below illustrates this process:

BGP(0): 4.4.4.4 rcv UPDATE about 140.15.1.1/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;
 
BGP(0): 4.4.4.4 rcv UPDATE about 140.15.2.1/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;

We have just looked at the process by which paths are exchanged between routers belonging to different sub-autonomous systems. Routers within a single sub-AS can exchange paths as well, but they are considered confederation iBGP peers. Confederation iBGP peers are constrained by the same rules as normal iBGP peers. They cannot advertise internal paths, called confederation internal paths, to each other, and they do not prepend the local sub-ASN to the AS_CONFED_SEQUENCE when advertising paths to each other.

The final point that deserves highlighting in this section regarding confederations is how confederations are represented to external autonomous systems. When a BGP router belonging to a confederation advertises a path to a true external peer, it strips off the AS_CONFED_SEQUENCE segment from the AS_PATH attribute and prepends the main ASN to the AS_SEQUENCE segment. As an example, see the advertisement of the 140.15.1.1/32 prefix from R2 to R20 below:

On R20:

R20#show ip bgp regexp _400$
 
--- omitted ---
 
     Network          Next Hop               Metric LocPrf Weight Path
 * i 140.15.1.1/32    19.19.19.19              0    100      0 200 400 i
 *>                   200.2.20.2                             0 300 400 i
 * i 140.15.2.1/32    19.19.19.19              0    100      0 200 400 i
 *>                   200.2.20.2                             0 300 400 i

This can be verified with a Wireshark capture showing the BGP update message sent by R2 to R20. Notice the missing AS_CONFED_SEQUENCE segment:

Internet Protocol Version 4, Src: 200.2.20.2, Dst: 200.2.20.20
Transmission Control Protocol, Src Port: 32403, Dst Port: 179, Seq: 96, Ack: 410, Len: 299
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 57
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 24
    Path attributes
        Path Attribute - ORIGIN: IGP
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: ORIGIN (1)
            Length: 1
            Origin: IGP (0)
        Path Attribute - AS_PATH: 300 400
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 10
            AS Path segment: 300 400
                Segment type: AS_SEQUENCE (2)
                Segment length (number of ASN): 2
                AS4: 300
                AS4: 400
        Path Attribute - NEXT_HOP: 200.2.20.2
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: NEXT_HOP (3)
            Length: 4
            Next hop: 200.2.20.2
    Network Layer Reachability Information (NLRI)
        140.15.1.1/32
        140.15.2.1/32

This behavior exists because routers in other autonomous systems have no use for the information contained in the AS_CONFED_SEQUENCE segment; that information is relevant only within the local AS. By stripping the AS_CONFED_SEQUENCE and prepending the main ASN, the confederation is represented as a single AS to all BGP-speaking routers in other autonomous systems.
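The stripping and prepending performed at the confederation edge can be sketched as a single transformation on a hypothetical two-segment AS_PATH model. The two input paths are illustrative. Either way, the output matches the “300 400” that R20 receives from R2, where 300 is the confederation’s main ASN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsPath:
    confed_seq: tuple[int, ...] = ()  # AS_CONFED_SEQUENCE (sub-ASNs)
    as_seq: tuple[int, ...] = ()      # AS_SEQUENCE (real ASNs)


def send_to_true_ebgp(path: AsPath, confed_asn: int) -> AsPath:
    """Strip AS_CONFED_SEQUENCE and prepend the confederation's main ASN."""
    return AsPath((), (confed_asn, *path.as_seq))


learned_externally = AsPath(as_seq=(400,))           # illustrative input
learned_via_confed = AsPath((336,), (400,))          # illustrative input
for inside in (learned_externally, learned_via_confed):
    outside = send_to_true_ebgp(inside, 300)         # advertised from R2 to R20
    print(outside.confed_seq, outside.as_seq)        # () (300, 400)
```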

AS_PATH Processing Within the Confederation

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

As shown in the sections above, the AS_CONFED_SEQUENCE segment within the AS_PATH attribute is primarily used for loop prevention. AS_CONFED_SEQUENCE does not contribute to the overall AS_PATH attribute length comparison used in step 4 of the best-path algorithm. Only the AS_SEQUENCE segment is compared for the AS_PATH attribute length selection criteria. R4 receives two paths for both the 120.18.1.1/32 and 120.18.2.1/32 prefixes, as shown below:

On R4:

R4#show ip bgp regex 200$
 
BGP table version is 16, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 

     Network          Next Hop       Metric LocPrf Weight Path
 *   120.18.1.1/32    2.2.2.2            0    100      0 (312) 200 i
 *>                   3.3.3.3            0    100      0 (336 312) 200 i
 *   120.18.2.1/32    2.2.2.2            0    100      0 (312) 200 i
 *>                   3.3.3.3            0    100      0 (336 312) 200 i

The path via next hop R2 has a single sub-AS (312) in the AS_CONFED_SEQUENCE segment, while the path via next hop R3 has two sub-autonomous systems (336 312). If the AS_CONFED_SEQUENCE counted toward AS_PATH length, the path via R2 would be shorter and would be selected as best.

R4, however, chooses the path via next hop R3 as best, even though its AS_CONFED_SEQUENCE segment is longer than that of the path via R2. This shows that the AS_CONFED_SEQUENCE segment is not counted as part of the AS_PATH attribute length.

When R4 compares the two paths, it considers their AS_PATH attribute lengths a tie, because each AS_SEQUENCE segment contains only 200. With the intermediate attributes also tied, the deciding factor is the IGP metric to the next hop, as shown in the output below:

On R4:

 
R4#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 12
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     1          2
  Refresh Epoch 1
  (312) 200
    2.2.2.2 (metric 21) from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, confed-external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  (336 312) 200
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, confed-external, best
      rx pathid: 0, tx pathid: 0x0
 
R4#show ip bgp 120.18.2.1
 
BGP routing table entry for 120.18.2.1/32, version 13
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     1          2
  Refresh Epoch 1
  (312) 200
    2.2.2.2 (metric 21) from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, confed-external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  (336 312) 200
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, confed-external, best
      rx pathid: 0, tx pathid: 0x0

In this case, the metric shown in parentheses next to each next hop in the show ip bgp output above is R4’s IGP (OSPF) metric to that next-hop address:

On R4:

R4#show ip route ospf | i 2.2.2.2|3.3.3.3
 
O        2.2.2.2 [110/21] via 30.3.4.3, 00:44:07, Ethernet0/0.34
O        3.3.3.3 [110/11] via 30.3.4.3, 00:44:07, Ethernet0/0.34

This metric is attached to each path through the IOS route recursion process, which resolves the BGP next hop against the IGP routing table. Because R4 has a lower metric to reach next hop R3, it uses the path with next hop R3 as its best path, on the basis that this path has a shorter, more preferred internal route to its exit point. Step 8 of the BGP best-path process covers this determination in more detail.

This behavior of ignoring the AS_CONFED_SEQUENCE in best-path calculations is consistent with RFC 5065, Section 5.3, rule 3:

5.3. AS_PATH and Path Selection

   Path selection criteria for information received from members inside a confederation MUST follow the same rules used for information received from members inside the same autonomous system, as specified in [BGP-4].

   In addition, the following rules SHALL be applied:

   --- omitted for brevity ---

   3) When comparing routes using AS_PATH length, CONFED_SEQUENCE and CONFED_SETs SHOULD NOT be counted.
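The length rule quoted above, together with the IGP-metric tie-breaker that decides R4's choice, can be sketched in Python. This is a minimal model under stated assumptions, not IOS code: the Path class and the as_path_length and pick_best helpers are hypothetical, segment types are plain strings, and steps 1 through 3 and 5 through 7 are assumed to tie, as they do for R4's two paths.

```python
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str
    segments: list   # hypothetical model: list of (segment_type, [asns]) tuples
    igp_metric: int  # IGP cost to reach the BGP next hop


def as_path_length(segments):
    """AS_PATH length as used at best-path step 4.

    An AS_SEQUENCE counts one per ASN, an AS_SET counts as 1, and
    AS_CONFED_SEQUENCE / AS_CONFED_SET are not counted (RFC 5065, 5.3).
    """
    length = 0
    for seg_type, asns in segments:
        if seg_type == "AS_SEQUENCE":
            length += len(asns)
        elif seg_type == "AS_SET":
            length += 1
        # Confederation segments contribute nothing to the length.
    return length


def pick_best(paths):
    # Step 4: keep only the shortest AS_PATH (confederation segments excluded).
    shortest = min(as_path_length(p.segments) for p in paths)
    tied = [p for p in paths if as_path_length(p.segments) == shortest]
    # Steps 5-7 assumed to tie; step 8: lowest IGP metric to the next hop.
    return min(tied, key=lambda p: p.igp_metric)


# R4's two paths to 120.18.1.1/32, as shown in the output above.
via_r2 = Path("2.2.2.2", [("AS_CONFED_SEQUENCE", [312]), ("AS_SEQUENCE", [200])], 21)
via_r3 = Path("3.3.3.3", [("AS_CONFED_SEQUENCE", [336, 312]), ("AS_SEQUENCE", [200])], 11)

best = pick_best([via_r2, via_r3])
print(best.next_hop)
```

Both paths have a length of 1, so the lower IGP metric (11) selects next hop 3.3.3.3, matching R4's choice.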

as-path ignore

The command bgp bestpath as-path ignore is a hidden command that allows a BGP router to ignore the AS_PATH attribute length in its decision-making process. It causes the router to completely skip step 4 of the best-path algorithm and ignore the AS_PATH length comparison. R2 is chosen below to demonstrate the use of this command:

On R2:

R2#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 20
Paths: (3 available, best #1, table default)
  Advertised to update-groups:
     8          9          10
  Refresh Epoch 1
  200
    200.2.16.16 from 200.2.16.16 (16.16.16.16)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  100 200
    200.2.20.20 from 200.2.20.20 (20.20.20.20)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  400 200
    200.2.10.10 from 200.2.10.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

Here, R2 chooses the path from R16 to reach the 120.18.1.1 prefix because its AS_PATH (200) is shorter than those of the paths received from R20 and R10. R2 is then configured with the bgp bestpath as-path ignore command, and its BGP peering with R16 is cleared, with the following result:

On R2:

 
R2(config)#router bgp 312
R2(config-router)#bgp bestpath as-path ignore
 
R2#clear ip bgp 200.2.16.16
 
%BGP-5-ADJCHANGE: neighbor 200.2.16.16 Down User reset
%BGP_SESSION-5-ADJCHANGE: neighbor 200.2.16.16 IPv4 Unicast topology base removed from session  User reset
 
%BGP-5-ADJCHANGE: neighbor 200.2.16.16 Up
 
 
R2#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 6
Paths: (4 available, best #3, table default)
  Advertised to update-groups:
     8          9          10
  Refresh Epoch 1
  200
    200.2.16.16 from 200.2.16.16 (16.16.16.16)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  100 200
    200.2.20.20 from 200.2.20.20 (20.20.20.20)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  400 200
    200.2.10.10 from 200.2.10.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  (336) 400 200
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, confed-external
      rx pathid: 0, tx pathid: 0

After the change is implemented, R2's decision changes. When the peering to R16 is reset, R2 loses its best path and reevaluates the paths from R10 and R20. R2 chooses R10's path as best because R10 has a lower router ID (10.10.10.10) than R20 (20.20.20.20).

After R2 selects the path from R10 as its best path, the R2/R16 peering comes back up and R16 advertises its path to R2 again. R2 reruns the best-path algorithm. This time, because it is configured with the bgp bestpath as-path ignore command, it skips step 4 of the best-path algorithm, so the paths from R16, R10, and R20 tie. Because R2 has already marked R10's path as best, step 10 of the best-path algorithm (prefer the older path) keeps R10's path as best.
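The sequence of decisions on R2 can be reproduced with a deliberately simplified Python model. It is a sketch under stated assumptions, not the IOS algorithm: the Path class and best_path helper are hypothetical, every step other than 4, 10, and 11 is assumed to tie (true for R2's three eBGP paths here), R3's confederation path is left out, and details such as bgp bestpath compare-routerid (which disables the older-path preference) are ignored.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Path:
    neighbor: str
    as_path: list    # AS_SEQUENCE only, for simplicity
    router_id: str   # BGP router ID of the advertising peer


def best_path(paths, as_path_ignore=False, current_best=None):
    """Simplified model of steps 4, 10 and 11 for tied eBGP paths."""
    candidates = list(paths)
    if not as_path_ignore:
        # Step 4: shortest AS_PATH wins (skipped by bgp bestpath as-path ignore).
        shortest = min(len(p.as_path) for p in candidates)
        candidates = [p for p in candidates if len(p.as_path) == shortest]
    if len(candidates) == 1:
        return candidates[0]
    # Step 10: among tied eBGP paths, keep the existing (older) best path.
    if current_best is not None and current_best in candidates:
        return current_best
    # Step 11: otherwise, prefer the lowest router ID.
    return min(candidates, key=lambda p: ipaddress.IPv4Address(p.router_id))


r16 = Path("R16", [200], "16.16.16.16")
r20 = Path("R20", [100, 200], "20.20.20.20")
r10 = Path("R10", [400, 200], "10.10.10.10")

before = best_path([r16, r20, r10])                           # shortest AS_PATH
after_reset = best_path([r20, r10], as_path_ignore=True)      # R16 is down
after_r16_returns = best_path([r16, r20, r10], as_path_ignore=True,
                              current_best=after_reset)       # older path kept
print(before.neighbor, after_reset.neighbor, after_r16_returns.neighbor)
```

The model prints R16, R10, R10, following the three stages described above.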

Note

When R2 chooses R10 as its best path, it advertises this new path selection to R3. Before the change, R3 chose R2's path as best because of its shorter AS_PATH, as shown below:

R3#show ip bgp 120.18.1.1
BGP routing table entry for 120.18.1.1/32, version 27
Paths: (3 available, best #2, table default)
  Advertised to update-groups:
     1          2          3
  Refresh Epoch 2
  (345 312) 200
    2.2.2.2 (metric 11) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, confed-external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  (312) 200
    2.2.2.2 (metric 11) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, valid, confed-external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  400 200
    200.3.10.10 from 200.3.10.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

When R2 advertises the new path through R10, the AS_PATH length of R2's path ties with that of the path R3 receives directly from R10 (400 200 in both cases, ignoring the confederation segment). With the AS_PATH tied, R3 chooses the path from R10 as best because it prefers an external (eBGP) path over the confederation external path received from R2. This complies with the confederation route selection rules in RFC 5065.

R3 then advertises this path to R2, where it appears as the (336) 400 200 path in R2's output above. R2 prefers its external path received from R10 over the confederation external path received from R3, just as R3 itself did.

More details about this path selection are provided in step 7 of the best-path algorithm.

It's important to keep in mind that the bgp bestpath as-path ignore command does not disable the loop-prevention check. Before accepting a path, the router still checks the AS_PATH attribute to ensure that its own ASN does not appear in it. The command affects only whether AS_PATH length is considered in the best-path decision.

Step 5: Origin Code

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

The fifth step in the BGP best-path decision algorithm relates to how the path was first injected into the BGP table. This information is carried in the well-known mandatory ORIGIN attribute, which takes one of three values: IGP, EGP, or INCOMPLETE.

The ORIGIN value of IGP signifies that the path was first injected into the BGP table directly by the originating router. In Cisco IOS, paths that are injected into the BGP process in the following ways are given the ORIGIN code IGP:

  1. Using a network command in the BGP process

  2. As the result of the aggregate-address command where the as-set option is not used

  3. As the result of the aggregate-address command with the as-set option, when all component paths have an ORIGIN code of IGP

A value of EGP signifies that the path was originally learned through the legacy Exterior Gateway Protocol (EGP), which predates BGP. Finally, a value of INCOMPLETE signifies that there is no authoritative origin information for the path.

The INCOMPLETE origin code requires a bit more explanation than the first two. Paths with origin of INCOMPLETE are typically for NLRI that have been redistributed into BGP from another routing source. The INCOMPLETE origin code also applies to NLRI that are injected into BGP using an aggregate-address command along with the as-set option, if any component path covered by the aggregate has an INCOMPLETE origin.
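The ORIGIN assigned to an aggregate can be expressed as a small Python function. It is a sketch, and the aggregate_origin name is hypothetical. The branches for "no as-set", "all IGP", and "any INCOMPLETE" follow the text above. The EGP branch is an added assumption taken from RFC 4271, Section 9.2.2.2, which ranks INCOMPLETE over EGP over IGP when deriving an aggregate's origin.

```python
def aggregate_origin(component_origins, as_set):
    """ORIGIN of an aggregate-address route (hypothetical helper).

    Without as-set, the aggregate is given IGP. With as-set, any INCOMPLETE
    component makes it INCOMPLETE; otherwise any EGP component makes it EGP;
    otherwise it is IGP.
    """
    if not as_set:
        return "IGP"
    if "INCOMPLETE" in component_origins:
        return "INCOMPLETE"
    if "EGP" in component_origins:
        return "EGP"
    return "IGP"


print(aggregate_origin(["IGP", "INCOMPLETE"], as_set=False))  # IGP
print(aggregate_origin(["IGP", "INCOMPLETE"], as_set=True))   # INCOMPLETE
```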

These origins appear at the end of the Path column in the show ip bgp output, with the values i, e, or ?, as shown below:

On Any Router:

Router#show ip bgp
BGP table version is 14, local router ID is 100.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop            Metric LocPrf Weight Path
 *>  1.11.1.1/32      200.2.16.16                            0 200 100 i
 *>  2.22.2.2/32      0.0.0.0                  0         32768 ?

In the output above, the i and ? at the far right of the AS_PATH information indicate the origin of each path.

Because redistributed information is external to the BGP process, BGP has less detailed knowledge of such a path than of a path injected directly into BGP. The BGP best-path algorithm therefore trusts paths originated directly by BGP (ORIGIN code IGP) or learned through EGP more than redistributed (INCOMPLETE) paths.

The ORIGIN attribute of a particular path can be changed with the set origin command in a route map. The BGP best-path algorithm prefers ORIGIN code IGP over EGP, and EGP over INCOMPLETE.
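The step 5 preference can be written as a ranking in which lower wins. In this sketch the ORIGIN_RANK table and the prefer_by_origin helper are hypothetical names. The rank values coincide with the on-the-wire ORIGIN encoding defined in RFC 4271 (IGP = 0, EGP = 1, INCOMPLETE = 2), which the packet capture later in this section displays as "Origin: IGP (0)".

```python
# Lower rank is preferred at step 5: IGP over EGP over INCOMPLETE.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


def prefer_by_origin(paths):
    """Return the names of the paths that survive step 5.

    paths: list of (name, origin) tuples; all earlier steps are assumed to tie.
    """
    best = min(ORIGIN_RANK[origin] for _, origin in paths)
    return [name for name, origin in paths if ORIGIN_RANK[origin] == best]


print(prefer_by_origin([("A", "INCOMPLETE"), ("B", "EGP"), ("C", "IGP")]))
```

When several paths share the best origin, all of them survive and the decision moves on to step 6.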

R19 is chosen to demonstrate this path preference. It has two potential paths it can use to reach the 140.15.1.1/32 prefix: one from R18 and another from R20.

On R19:

R19#show ip bgp 140.15.1.1
 
BGP routing table entry for 140.15.1.1/32, version 10
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     2
  Refresh Epoch 1
  200 400
    200.18.19.18 from 200.18.19.18 (18.18.18.18)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  300 400
    20.20.20.20 (metric 11) from 20.20.20.20 (20.20.20.20)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

Without any further modifications, R19 chooses the path through R18 because it is an external (eBGP) path and all earlier steps tie: WEIGHT, LOCAL_PREF, locally originated, AS_PATH length, and ORIGIN. (MED is not compared, because the two paths were received from different neighboring ASes, 200 and 300.)

The following packet capture shows the BGP UPDATE message sent by R18 to R19, in which the ORIGIN code for the 140.15.1.1 network is set to IGP. R15 set this code when it injected the network into BGP with the network command, and the code is preserved in UPDATE messages as the path is exchanged between BGP routers.

Internet Protocol Version 4, Src: 200.18.19.18, Dst: 200.18.19.19
Transmission Control Protocol, Src Port: 40092, Dst Port: 179, Seq: 24, Ack: 24, Len: 192
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 57
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 24
    Path attributes
        Path Attribute - ORIGIN: IGP
            Flags: 0x40, Transitive, Well-known, Complete
                0... .... = Optional: Not set
                .1.. .... = Transitive: Set
                ..0. .... = Partial: Not set
                ...0 .... = Extended-Length: Not set
                .... 0000 = Unused: 0x0
            Type Code: ORIGIN (1)
            Length: 1
            Origin: IGP (0)
    Network Layer Reachability Information (NLRI)
        140.15.1.1/32
        140.15.2.1/32
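The flags octet and ORIGIN value decoded in the capture above can be reproduced in a few lines of Python. This sketch makes two assumptions. First, the helper decode_attr_flags and the raw byte string are hypothetical, with the bytes rebuilt from the decoded fields rather than copied from a hex dump. Second, the bit positions come from RFC 4271 (Optional 0x80, Transitive 0x40, Partial 0x20, Extended-Length 0x10).

```python
ORIGIN_NAMES = {0: "IGP", 1: "EGP", 2: "INCOMPLETE"}


def decode_attr_flags(flags):
    """Split a path attribute flags octet into the fields Wireshark shows."""
    return {
        "optional": bool(flags & 0x80),
        "transitive": bool(flags & 0x40),
        "partial": bool(flags & 0x20),
        "extended_length": bool(flags & 0x10),
        "unused": flags & 0x0F,
    }


# ORIGIN attribute rebuilt from the capture: flags, type code, length, value.
raw = bytes([0x40, 0x01, 0x01, 0x00])
flags = decode_attr_flags(raw[0])
type_code, length = raw[1], raw[2]
origin = ORIGIN_NAMES[raw[3]]
print(flags, type_code, length, origin)

# For comparison: 0x80 is the flags value carried by MULTI_EXIT_DISC.
print(decode_attr_flags(0x80))
```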

The path preference for the 140.15.1.1 prefix on R19 can be changed by modifying the ORIGIN value of the path received from R18 to INCOMPLETE. This method is implemented below on R19.

A prefix list 123 permitting the 140.15.1.1/32 prefix is configured first and then referenced inside the route map TST. The set origin incomplete command in the route map sets the ORIGIN code of matching prefixes to INCOMPLETE. An empty permit route map statement (sequence 90) allows all other prefixes to pass through unchanged. Finally, the route map TST is applied to the neighbor 200.18.19.18 statement in the inbound direction:

On R19:

R19(config)#ip prefix-list 123 permit 140.15.1.1/32
 
R19(config)#route-map TST permit 10
R19(config-route-map)#match ip address prefix-list 123
R19(config-route-map)#set origin incomplete
R19(config)#route-map TST permit 90
 
R19(config)#router bgp 100
R19(config-router)#neighbor 200.18.19.18 route-map TST in
 
R19#clear ip bgp * soft in
 
R19#show ip bgp 140.15.1.1
 
BGP routing table entry for 140.15.1.1/32, version 30
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 2
  200 400
    200.18.19.18 from 200.18.19.18 (120.18.2.1)
      Origin incomplete, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  300 400
    20.20.20.20 (metric 11) from 20.20.20.20 (20.20.20.20)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

The clear ip bgp * soft in command issued on R19 causes its peers, including R18, to resend their BGP advertisements, so R19 can apply the newly configured inbound policy that sets the ORIGIN attribute of the 140.15.1.1/32 prefix to INCOMPLETE. With these changes made, R19 now chooses the path through R20, because its ORIGIN code of IGP is preferred over the INCOMPLETE origin code of the path through R18.
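R19's decision before and after the route map can be modeled as a single sort key, because comparing tuples lexicographically mirrors the sequential tie-breaking of the best-path steps. This is a minimal sketch: the Path class and sort_key function are hypothetical, and only steps 1, 2, 4, 5, and 7 are modeled. Step 3 does not apply because neither path is locally originated, and step 6 does not apply because the paths come from different neighboring ASes.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass
class Path:
    via: str
    weight: int
    local_pref: int
    as_path: list
    origin: str
    ebgp: bool


def sort_key(p):
    # Lower tuple wins: higher WEIGHT, higher LOCAL_PREF, shorter AS_PATH,
    # better ORIGIN, then eBGP over iBGP.
    return (-p.weight, -p.local_pref, len(p.as_path),
            ORIGIN_RANK[p.origin], 0 if p.ebgp else 1)


via_r18 = Path("R18", 0, 100, [200, 400], "IGP", True)
via_r20 = Path("R20", 0, 100, [300, 400], "IGP", False)
before = min([via_r18, via_r20], key=sort_key).via   # decided at step 7

# Effect of route map TST: set origin incomplete on the path from R18.
via_r18.origin = "INCOMPLETE"
after = min([via_r18, via_r20], key=sort_key).via    # decided at step 5
print(before, after)
```

The model prints R18 and then R20, matching the two show ip bgp outputs for 140.15.1.1 on R19.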

Step 6: MED

Note

Before starting this section, revert the configuration on all the routers to the base initial configuration files provided with the lab.

The sixth step in the BGP best-path algorithm compares the value of the MED or MULTI_EXIT_DISC attribute. The MED attribute is an optional non-transitive attribute that is used to indicate a preferred entry point into the local AS to a neighboring AS. MED is expressed as a 32-bit unsigned number and is commonly referred to as the metric of a BGP path because it behaves in a similar way to IGP metrics.

The MED attribute is set and exchanged in many different ways, depending on a specific set of circumstances. To aid in the explanation, the following adjustments have been made to the original topology:

  • The prefix 130.7.1.1 is no longer advertised into BGP on R7.

  • The prefix 130.7.1.1 is advertised into OSPF Area 0 on R7.

  • R2 and R3 advertise the 130.7.1.1 prefix into BGP using a network statement.

  • R1 has been configured to set the WEIGHT value to 100 for the path to the 130.7.1.1 network it receives from R4.

On R7:

R7(config)#router bgp 378
R7(config-router)#no network 130.7.1.1 mask 255.255.255.255
 
R7(config-router)#interface lo10
R7(config-if)#ip ospf 1 area 0

On R2:

R2(config)#router bgp 312
R2(config-router)#network 130.7.1.1 mask 255.255.255.255
 

On R3:

R3(config)#router bgp 336
R3(config-router)#network 130.7.1.1 mask 255.255.255.255

On R1:

R1(config)#ip prefix-list 123 permit 130.7.1.1/32
 
R1(config)#route-map TST permit 10
R1(config-route-map)#match ip address prefix-list 123
R1(config-route-map)#set weight 100
R1(config)#route-map TST permit 90
 
R1(config)#router bgp 312
R1(config-router)#neighbor 4.4.4.4 route-map TST in

After these changes, R2 and R3 learn the prefix 130.7.1.1 through OSPF instead of through BGP, and the network command on each router matches this OSPF-learned route. As a result, R2 and R3, rather than R7, originate the prefix 130.7.1.1 into BGP. Because of the WEIGHT of 100, R1 prefers the path originated by R3 and received from R4, and it advertises that path to R2. With this setup, the different ways in which MED is set can be examined.

How MED Is Set

The BGP table on R2 helps reveal one way that MED can initially be set:

On R2:

 
 
R2#show ip bgp
---omitted---
     Network          Next Hop               Metric LocPrf Weight Path
 * i 130.7.1.1/32     3.3.3.3                 31    100    0 (345 336) i
 *                    3.3.3.3                 31    100    0 (336) i
 *>                   30.1.2.1                41           32768 i
 
R2#show ip bgp 130.7.1.1
BGP routing table entry for 130.7.1.1/32, version 20
Paths: (3 available, best #3, table default)
  Advertised to update-groups:
     1          2          3
  Refresh Epoch 5
  (345 336)
    3.3.3.3 (metric 11) from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric 31, localpref 100, valid, confed-internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 6
  (336)
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 31, localpref 100, valid, confed-external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  Local
    30.1.2.1 from 0.0.0.0 (2.2.2.2)
      Origin IGP, metric 41, localpref 100, weight 32768, valid, sourced, local, best
      rx pathid: 0, tx pathid: 0x0

Note

Depending on the IOS version, the next hop for the locally injected path to the 130.7.1.1 network may not match the output above. This is because R2 has two equal-cost OSPF paths to this network in its routing table, via 30.1.2.1 and 30.2.3.3, but BGP records only one of them as the next hop in the BGP table. In older IOS versions such as 15.4, the lower IP address, 30.1.2.1, is used. Newer IOS versions such as 15.7 display the higher IP address, 30.2.3.3.

R2’s BGP table in the output above shows three paths for the 130.7.1.1 network: one from R1, one from R3, and one locally injected by R2 itself. R2 chooses its own locally originated path as best due to the WEIGHT attribute, as discussed in step 1.

Looking at the MED value of each path, the paths received from R1 and R3 both have a MED value of 31, while the locally injected path has a MED value of 41. These values are not randomly generated by BGP; they correspond to the IGP metric used to reach the destination prefix. This can be verified on R2 by examining the following show ip route output for the 130.7.1.1 prefix:

R2#show ip route 130.7.1.1
 
Routing entry for 130.7.1.1/32
  Known via "ospf 1", distance 110, metric 41, type intra area
  Advertised by bgp 312
  Last update from 30.1.2.1 on Ethernet0/0.12, 01:59:48 ago
  Routing Descriptor Blocks:
  * 30.2.3.3, from 7.7.7.7, 01:59:48 ago, via Ethernet0/0.23
      Route metric is 41, traffic share count is 1
    30.1.2.1, from 7.7.7.7, 01:59:48 ago, via Ethernet0/0.12
      Route metric is 41, traffic share count is 1

By default, when a router originates a prefix into BGP, it sets the MED to its internal (IGP) metric for that prefix. For the 130.7.1.1 prefix, R2's metric is its OSPF cost of 41, so this value is copied into the MED attribute when the network command in BGP configuration matches the route. The same is true when a route is redistributed into BGP from another routing source with the redistribute command. MED can also be set manually with the set metric command in a route map; the configured value is then sent in BGP updates as the new MED value.
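The default origination rule and its route-map override can be sketched in a few lines of Python. The originated_med function is a hypothetical illustration of the behavior just described, not an IOS API. Its set_metric argument stands in for the value a route map with set metric would apply.

```python
from typing import Optional


def originated_med(igp_metric: int, set_metric: Optional[int] = None) -> int:
    """MED attached when a router originates a prefix into BGP (hypothetical).

    By default the IGP metric to the prefix is copied into MED; a route map
    with set metric (modeled here as set_metric) overrides it.
    """
    if set_metric is not None:
        return set_metric
    return igp_metric


print(originated_med(41))        # R2: OSPF cost 41 becomes MED 41
print(originated_med(31))        # R3: OSPF cost 31 becomes MED 31
print(originated_med(41, 100))   # manual override through set metric
```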

How MED Is Communicated

The discussion above shows that MED is initially set from the local router's metric to the target prefix: R2 inserted a MED value of 41 for its locally originated path to 130.7.1.1 because its internal OSPF cost to reach that prefix is also 41. Examining the paths received from R1 and R3, however, reveals an apparent contradiction:

R2#show ip bgp
 
 * i  130.7.1.1/32     3.3.3.3                31    100      0 (345 336) i
 *                     3.3.3.3                31    100      0 (336) i
 *>                    30.1.2.1               41         32768 i

Note

Depending on the IOS version, the next hop IP address for the best path may differ. In older IOS versions such as 15.4, when multiple IGP next hops exist, the lowest IP address (30.1.2.1, as above) is shown. Newer IOS versions such as 15.7 display the higher next hop IP address, 30.2.3.3.

The MED values assigned to R1's and R3's paths are listed as 31 instead of 41. Why isn't MED set to 41 on R2 for these two paths? The answer lies in which router originated the path with next hop 3.3.3.3 into BGP: R3.

The simple explanation is that R3 is also configured to inject the 130.7.1.1/32 prefix into BGP, using a network command. R3’s own OSPF cost to reach the 130.7.1.1/32 prefix is 31, as shown below:

R3#show ip route ospf | section 130.7.1.1
 
O        130.7.1.1 [110/31] via 30.3.6.6, 00:23:21, Ethernet0/0.36
                   [110/31] via 30.3.4.4, 00:23:21, Ethernet0/0.34
 

R3#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 27
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     1          2          3
  Refresh Epoch 1
  (312)
    2.2.2.2 (metric 11) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 41, localpref 100, valid, confed-external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  Local
    30.3.4.4 from 0.0.0.0 (3.3.3.3)
      Origin IGP, metric 31, localpref 100, weight 32768, valid, sourced, local, best
      rx pathid: 0, tx pathid: 0x0

Just like R2, R3 copies the internal metric from its routing table (OSPF cost 31) into the MED value of the BGP path. This MED value stays with R3's path as it is advertised from peer to peer within the AS.

The process is as follows:

  1. R3 injects the 130.7.1.1/32 prefix into its BGP table with MED value 31 and marks it as the best path. It advertises this prefix to R2 and R4, its confederation external BGP peers, and to R6, its confederation internal BGP peer.

  2. R4 also marks R3’s path as the best path and advertises the same path to R1 with the same MED value of 31.

  3. R1 receives R3’s path from R4 with the original MED value of 31 and marks it as the best path. R1 then advertises R3’s path to R2 as well with the retained MED value.

R2 therefore receives two paths to the prefix with next hop 3.3.3.3, one from R1 and one from R3, both with a MED of 31.

The sequence of events above shows that MED values are passed unchanged between confederation internal and confederation external BGP peers. The path originated by R3 travels from R3 to R4 to R1 to R2, retaining its original MED value of 31 throughout its journey.

MED is not limited to confederation peers; it is also communicated to normal iBGP peers and even eBGP peers under certain circumstances. To illustrate this, R20’s BGP table is examined below:

R20#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 22
Paths: (1 available, best #1, table default)

  Advertised to update-groups:
     2
  Refresh Epoch 1
  300
    200.2.20.2 from 200.2.20.2 (2.2.2.2)
      Origin IGP, metric 41, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

R20 also receives a path from R2 with the MED value set to 41, the same value R2 set for its locally originated path to the prefix. The following packet capture of the UPDATE message that R2 sent to R20 for the 130.7.1.1 prefix confirms this:

Internet Protocol Version 4, Src: 200.2.20.2, Dst: 200.2.20.20
Transmission Control Protocol, Src Port: 25684, Dst Port: 179, Seq: 24, Ack: 240, Len: 306
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 55
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 27
    Path attributes
        Path Attribute - ORIGIN: IGP
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: ORIGIN (1)
            Length: 1
            Origin: IGP (0)
        Path Attribute - AS_PATH: 300
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 6
            AS Path segment: 300
        Path Attribute - NEXT_HOP: 200.2.20.2
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: NEXT_HOP (3)
            Length: 4
            Next hop: 200.2.20.2
        Path Attribute - MULTI_EXIT_DISC: 41
            Flags: 0x80, Optional, Non-transitive, Complete
            Type Code: MULTI_EXIT_DISC (4)
            Length: 4
            Multiple exit discriminator: 41
    Network Layer Reachability Information (NLRI)
        130.7.1.1/32
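The UPDATE message in the capture above can be rebuilt byte by byte with Python's struct and ipaddress modules, which shows where the Length (55) and Total Path Attribute Length (27) values come from. This sketch makes three assumptions. First, the attr helper is hypothetical. Second, 4-octet AS numbers are in use, which is consistent with the 6-octet AS_PATH length in the capture. Third, the Extended-Length flag is not set, so each attribute length fits in one octet.

```python
import ipaddress
import struct


def attr(flags, type_code, value):
    # Attribute header: flags, type code, 1-octet length (Extended-Length not set).
    return struct.pack("!BBB", flags, type_code, len(value)) + value


origin = attr(0x40, 1, bytes([0]))                        # ORIGIN: IGP (0)
as_path = attr(0x40, 2, struct.pack("!BBI", 2, 1, 300))   # AS_SEQUENCE of one 4-octet ASN
next_hop = attr(0x40, 3, ipaddress.IPv4Address("200.2.20.2").packed)
med = attr(0x80, 4, struct.pack("!I", 41))                # MULTI_EXIT_DISC: 41

path_attrs = origin + as_path + next_hop + med
nlri = bytes([32]) + ipaddress.IPv4Address("130.7.1.1").packed  # prefix length + prefix

# 19-octet header (16-octet marker, 2-octet length, 1-octet type) plus the body:
# withdrawn routes length, total path attribute length, attributes, NLRI.
total_len = 19 + 2 + 2 + len(path_attrs) + len(nlri)
update = (b"\xff" * 16 + struct.pack("!HB", total_len, 2)  # type 2 = UPDATE
          + struct.pack("!H", 0)                           # no withdrawn routes
          + struct.pack("!H", len(path_attrs)) + path_attrs
          + nlri)

print(len(path_attrs), len(update), med.hex())
```

The attribute sizes are 4 + 9 + 7 + 7 = 27 octets, and 19 + 2 + 2 + 27 + 5 = 55 octets for the whole message, matching the capture.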

Note

Interestingly, the MED is also recorded as the route metric in R20's routing table entry for the prefix 130.7.1.1. In the show ip route output below, R20 has a BGP-learned entry for 130.7.1.1 with the AD/metric combination 20/41. The 41 is the same MED value carried by the path to 130.7.1.1 in R20's BGP table:

On R20:

R20#show ip route bgp | sec 130.7.1.1
 
B        130.7.1.1 [20/41] via 200.2.20.2, 00:02:23
      140.15.0.0/32 is subnetted, 2 subnets

R20 in turn advertises this MED value to its iBGP peer R19. This can be seen in the show ip bgp 130.7.1.1 output on R19 and in the UPDATE message from R20 to R19, shown below:

R19#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 23
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 1
  200 300
    200.18.19.18 from 200.18.19.18 (18.18.18.18)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300
    20.20.20.20 (metric 11) from 20.20.20.20 (20.20.20.20)
      Origin IGP, metric 41, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

Internet Protocol Version 4, Src: 20.20.20.20, Dst: 19.19.19.19
Transmission Control Protocol, Src Port: 179, Dst Port: 51761, Seq: 43, Ack: 20, Len: 217
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 62
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 34
    Path attributes
        Path Attribute - ORIGIN: IGP
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: ORIGIN (1)
            Length: 1
            Origin: IGP (0)
        Path Attribute - AS_PATH: 300
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: AS_PATH (2)
            Length: 6
            AS Path segment: 300
        Path Attribute - NEXT_HOP: 20.20.20.20
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: NEXT_HOP (3)
            Length: 4
            Next hop: 20.20.20.20
        Path Attribute - MULTI_EXIT_DISC: 41
            Flags: 0x80, Optional, Non-transitive, Complete
            Type Code: MULTI_EXIT_DISC (4)
            Length: 4
            Multiple exit discriminator: 41
        Path Attribute - LOCAL_PREF: 100
            Flags: 0x40, Transitive, Well-known, Complete
            Type Code: LOCAL_PREF (5)
            Length: 4
            Local preference: 100
    Network Layer Reachability Information (NLRI)
        130.7.1.1/32

The behavior differs when R19 advertises the same path to its eBGP peer R18. R18's BGP table shows no MED value for the path from R19, rather than the value 41 that might be expected:

R18#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 25
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 1
  100 300
    200.18.19.19 from 200.18.19.19 (19.19.19.19)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300
    16.16.16.16 (metric 11) from 16.16.16.16 (16.16.16.16)
      Origin IGP, metric 41, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

A packet capture showing the BGP update message sent by R19 to R18 also reveals the missing MED attribute for the 130.7.1.1/32 prefix:

Internet Protocol Version 4, Src: 200.18.19.19, Dst: 200.18.19.18
Transmission Control Protocol, Src Port: 179, Dst Port: 62942, Seq: 43, Ack: 24, Len: 188
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 52
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 24
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: 100 300
        Path Attribute - NEXT_HOP: 200.18.19.19
    Network Layer Reachability Information (NLRI)
        130.7.1.1/32

The reason lies in the nature of the MED attribute. MED is an optional non-transitive attribute, and there are restrictions on when MED values are retained as paths are advertised between BGP peers. MED is retained when a path is advertised to iBGP peers, confederation internal peers, and confederation external peers. MED is also sent to eBGP peers, but only if the MED value was not received from another AS. MED values received from a neighboring AS must not be passed on to a different AS. RFC 4271 makes this point clear:

If received over EBGP, the MULTI_EXIT_DISC attribute MAY be propagated over IBGP to other BGP speakers within the same AS (see also 9.1.2.2). The MULTI_EXIT_DISC attribute received from a neighboring AS MUST NOT be propagated to other neighboring ASes.

In compliance with the RFC, R19 strips the MED value from its UPDATE message to R18 because the value was set by AS 300, not by R19's own AS. The reasoning behind this behavior lies in the purpose of the MED attribute. MED stands for Multi-Exit Discriminator. The value is intended to tell a neighboring AS which entry point into the local AS is preferred when several exist. MED accomplishes this by giving the neighboring AS a glimpse into the local AS's internal metric structure.

In this example, AS 300 has communicated to AS 100 that it has an internal metric of 41 to reach the prefix 130.7.1.1. If R19 advertised this same value to R18, R18 would believe that AS 100 was advertising its own internal metric, which is not true. The value 41 represents AS 300's internal metric structure, not AS 100's. Thus, R19 strips it from the advertisement. Put more simply, by default a BGP router does not advertise a MED learned from one neighboring AS to a different neighboring AS.
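The retention rules described above can be condensed into a single decision function. In this Python sketch, the med_to_send name and the peer-type strings are hypothetical labels. It models only the default behavior covered in this section and ignores outbound policy such as a route map with set metric.

```python
def med_to_send(med, med_from_other_as, peer_type):
    """Return the MED to include in an outgoing UPDATE, or None to omit it.

    peer_type: "ibgp", "confed-internal", "confed-external", or "ebgp".
    med_from_other_as: True if the MED was received from a neighboring AS
    rather than set inside the local AS (or confederation).
    """
    if med is None:
        return None
    if peer_type in ("ibgp", "confed-internal", "confed-external"):
        return med  # passed unchanged inside the AS or confederation
    if peer_type == "ebgp":
        # RFC 4271: a MED received from a neighboring AS MUST NOT be
        # propagated to other neighboring ASes.
        return None if med_from_other_as else med
    raise ValueError(f"unknown peer type: {peer_type}")


print(med_to_send(31, False, "confed-external"))  # R3 to R4: 31
print(med_to_send(41, False, "ebgp"))             # R2 to R20: 41
print(med_to_send(41, True, "ibgp"))              # R20 to R19: 41
print(med_to_send(41, True, "ebgp"))              # R19 to R18: None
```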

R18 does receive a path from R16 that has MED set to 41. This is the path R2 sent directly to R16, containing the indicated MED value. The path was communicated in the same manner as from R2 to R20 to R19. This time, because it was received directly from R2 in AS 300, the MED is representative of AS 300’s internal metric structure.

How MED Is Evaluated

Now that we have established how MED is set and communicated between BGP peers, the discussion can turn to how MED is evaluated and influences path selection in the BGP best-path algorithm.

As mentioned, MED is a communication of internal path preferences for a specific prefix between the local AS and its neighboring AS. The neighboring AS can consider this information when making decisions on how to send traffic into the local AS when there are multiple direct entry points from the neighboring AS to the local AS. The lower MED value is preferred between competing entry points, allowing MED to behave in a manner similar to IGP metrics.

To help show this preference, R2, R6, R10, and R11 are examined. First, the BGP table on R10 shows the following paths to reach the 130.7.1.1 prefix:

R10#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 25
Paths: (5 available, best #5, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 1
  300
    11.11.11.11 (metric 11) from 12.12.12.12 (12.12.12.12)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 11.11.11.11, Cluster list: 12.12.12.12
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200 300
    200.10.17.17 from 200.10.17.17 (17.17.17.17)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300
    200.3.10.3 from 200.3.10.3 (3.3.3.3)
      Origin IGP, metric 31, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300
    200.2.10.2 from 200.2.10.2 (2.2.2.2)
      Origin IGP, metric 41, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300
    11.11.11.11 (metric 11) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

In the above, R10 has paths from R2, R3, R11, R12, and R17 to reach the prefix 130.7.1.1. With default path selection, only the paths from R2, R3, R11, and R12 are considered (the path from R17 has a longer AS_PATH and is least favorable as a result). The ORIGIN codes of these paths tie, so the decision falls to the MED attribute. R10 chooses the path from R11 as its best path because its MED value is 0, which is the lowest possible MED value. Where did this MED value of 0 come from?

Note

Technically, R11’s and R12’s paths tie on R10. The deciding factor in this case is the shorter CLUSTER_LIST length: because R11’s path carries no CLUSTER_LIST attribute, it is chosen over R12’s. This part of the decision-making process is explained in more detail in step 12 of the best-path algorithm.

Missing MED Values

R10 receives a path from R11 with a mysterious MED value of 0. To track down where this MED value comes from, the BGP table on R11 reveals that it learns the path to the 130.7.1.1/32 prefix from R6 with no MED value set:

R11#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 30
Paths: (1 available, best #1, table default)
  Advertised to update-groups:
     2          3
  Refresh Epoch 1
  300
    200.6.11.6 from 200.6.11.6 (6.6.6.6)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

R6’s BGP table reveals that its path to reach 130.7.1.1/32 has a MED value of 31 set:

R6#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 29
Paths: (1 available, best #1, table default, RIB-failure(17) - next-hop mismatch)
  Advertised to update-groups:
     1          3
  Refresh Epoch 1
  Local
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 31, localpref 100, valid, confed-internal, best
      rx pathid: 0, tx pathid: 0x0

So why does R11 report a MED value of 0 and not 31 for the same path? Recall that, by default, BGP does not advertise MED values of internal paths to external peers. As a result, R6 strips the MED value from the advertisement to R11. When R11 receives the update with the missing MED value, it advertises it to R10 with a MED value of 0. This is a behavior specific to the Cisco implementation of BGP.

Cisco routers treat missing MED values as 0 when they are received from eBGP neighbors. Originally, the BGP specification was unclear about how to handle missing MED values. Some implementations set them to the maximum value, while others, Cisco included, set them to 0. As a result, the path R10 receives from R11 for the 130.7.1.1/32 prefix can never lose the MED comparison against the other paths R10 receives for the same prefix, because no MED value is lower than 0.

BGP MED Missing as Worst

By default, Cisco routers treat missing MED values as having a value of 0 when received from an external peer. This means paths without a MED value always win the MED comparison against paths that carry a nonzero MED. Cisco routers, however, include a command that reverses this preference. Instead of treating a missing MED value as 0, the router treats it as the maximum MED value, 4294967295. This change ensures that paths with a MED value set are always preferred over paths without one. The bgp bestpath med missing-as-worst command activates this behavior.
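To make the two interpretations concrete, the following Python sketch models how a router could turn an absent MED into a comparable number. It is an illustration only, not Cisco's implementation: the helper names effective_med and lowest_med are hypothetical, None stands for an UPDATE that carries no MED attribute, and the three paths are the ones R11 holds in the output shown after the change (via R12 and via R10 with MED 31, via R6 with no MED), evaluated under both settings.

```python
# Hypothetical model of how a missing MED is interpreted.
# None means the UPDATE carried no MULTI_EXIT_DISC attribute.
MAX_MED = 4294967295  # largest 32-bit MED value


def effective_med(med, missing_as_worst=False):
    """Return the MED value used in the best-path comparison."""
    if med is not None:
        return med
    # Default Cisco behavior: a missing MED counts as 0 (the best value).
    # With 'bgp bestpath med missing-as-worst' it counts as the maximum.
    return MAX_MED if missing_as_worst else 0


# Paths to 130.7.1.1/32 on R11 and the MED each one arrived with.
paths = {"via R12": 31, "via R10": 31, "via R6": None}


def lowest_med(paths, missing_as_worst=False):
    """Return the sorted names of the paths sharing the lowest effective MED."""
    meds = {name: effective_med(m, missing_as_worst) for name, m in paths.items()}
    best = min(meds.values())
    return sorted(name for name, m in meds.items() if m == best)


print(lowest_med(paths))                         # ['via R6']
print(lowest_med(paths, missing_as_worst=True))  # ['via R10', 'via R12']
```

With the default interpretation the MED-less path from R6 wins outright; with missing-as-worst it drops to the bottom, leaving the two MED 31 paths to be separated by later steps.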

To demonstrate this configuration, the network is modified so that R10 chooses the path through R3 to reach the 130.7.1.1 network by changing MED alone. To do so, R10 needs to receive the path from R3 with the lowest MED value. However, the path R10 receives from R11 has a MED value of 0, which is the lowest possible value. Issuing the bgp bestpath med missing-as-worst command on R11 makes R11 assign the maximum MED value to all paths with a missing MED value.

On R11:

R11(config)#router bgp 400
R11(config-router)#bgp bestpath med missing-as-worst
 
R11#clear ip bgp *

After configuring the above and executing the clear ip bgp * command, R11’s BGP table for 130.7.1.1 looks as follows:

R11#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 42
BGP Bestpath: med
Paths: (3 available, best #2, table default)
  Advertised to update-groups:
     1          3
  Refresh Epoch 2
  300
    10.10.10.10 (metric 11) from 12.12.12.12 (12.12.12.12)
      Origin IGP, metric 31, localpref 100, valid, internal
      Originator: 10.10.10.10, Cluster list: 12.12.12.12
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 3
  300
    10.10.10.10 (metric 11) from 10.10.10.10 (10.10.10.10)
      Origin IGP, metric 31, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  300
    200.6.11.6 from 200.6.11.6 (6.6.6.6)
      Origin IGP, metric 4294967295, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

In the output above, R11 receives three paths to the prefix 130.7.1.1: from R12, R10, and R6. Originally, R6’s path was assigned a MED value of 0 because the MED attribute was missing from R6’s BGP UPDATE, and with the lowest possible MED value, R6’s path was chosen. Now that bgp bestpath med missing-as-worst has been configured, R11 instead assigns R6’s path the highest possible MED value. When running the best-path algorithm again, R11 decides between the paths received from R10 and R12, because their MED value of 31 is much lower than the maximum MED value stored for the path received from R6. R10’s path is marked best due to step 12 of the best-path algorithm.

Modifying MED Evaluation

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

By default, the MED attribute is only evaluated when deciding between two external paths from the same AS. Used in this way, as discussed previously, the remote AS can indicate a degree of preference for which entry point the local AS should use to reach the prefix advertised by the remote AS. If a BGP router receives multiple paths to the same prefix with differing MED values from different autonomous systems, these MED values are ignored during MED evaluation because they cannot be directly compared. The representation of internal metric information from one AS may not be the same as from another. For example, one AS may use monetary cost as its metric, while another uses bandwidth—and these values are not comparable. This is why, under normal circumstances, only MED values received from the same AS are compared.
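The same-AS rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the AS_PATH is taken as the plain string printed by show ip bgp, confederation segments and AS_SETs are ignored, and only the same-AS test is modeled.

```python
def neighbor_as(as_path: str) -> int:
    """Return the leftmost (neighboring) ASN of an AS_PATH such as "300 100".

    Assumes a non-empty, plain AS_SEQUENCE: no confederation segments in
    parentheses and no AS_SETs in braces.
    """
    return int(as_path.split()[0])


def meds_comparable(as_path_a: str, as_path_b: str,
                    always_compare_med: bool = False) -> bool:
    """Return True if the MED values of two paths may be compared."""
    if always_compare_med:  # models 'bgp always-compare-med'
        return True
    return neighbor_as(as_path_a) == neighbor_as(as_path_b)


print(meds_comparable("300 100", "300 100"))  # True: both via AS 300
print(meds_comparable("300 100", "200 100"))  # False: AS 300 vs AS 200
```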

Equally important, BGP best-path processing follows a top-down model. Paths are compared in the order in which they are listed in the BGP table, starting with the most recently received path: the first path is compared to the second, the winner is then compared to the third, then the fourth, and so on until all available paths to a specific prefix have been evaluated and only a single best path remains. This approach can lead to very different results, depending on the order in which paths are received from external and internal peers.

To demonstrate these differences, the topology is modified in the following manner:

  1. R10 has its peerings to R2, R3, R9, R11, R12, and R17 shut down in the BGP configuration.

  2. R2 is configured to send a MED value of 200 to R10 for the prefix 110.19.1.1/32.

  3. R3 is configured to send a MED value of 300 to R10 for the prefix 110.19.1.1/32.

  4. R9 is configured to send a MED value of 100 to R10 for the prefix 110.19.1.1/32.

  5. R17 is configured to send a MED value of 150 to R10 for the prefix 110.19.1.1/32.

  6. R10’s BGP peerings to R9, R3, R17, and R2 are brought back up—in that order (making sure BGP learns a path to 110.19.1.1/32 from each peer before bringing the next up).

The relevant configuration modifications are shown below:

On R10:

R10(config)#router bgp 400
R10(config-router)#neighbor 200.10.17.17 shutdown
R10(config-router)#neighbor 200.3.10.3 shutdown
R10(config-router)#neighbor 200.2.10.2 shutdown
R10(config-router)#neighbor 9.9.9.9 shutdown
R10(config-router)#neighbor 11.11.11.11 shutdown
R10(config-router)#neighbor 12.12.12.12 shutdown
 

On R2:

R2(config)#ip prefix-list 123 permit 110.19.1.1/32
 
R2(config)#route-map TST permit 10
R2(config-route-map)#match ip address prefix 123
R2(config-route-map)#set metric 200
 
R2(config)#router bgp 312
R2(config-router)#neighbor 200.2.10.10 route-map TST out

On R3:

R3(config)#ip prefix-list 123 permit 110.19.1.1/32
 
R3(config)#route-map TST permit 10
R3(config-route-map)#match ip address prefix 123
R3(config-route-map)#set metric 300
 
R3(config)#router bgp 336
R3(config-router)#neighbor 200.3.10.10 route-map TST out
 

On R9:

R9(config)#ip prefix-list 123 permit 110.19.1.1/32
 
R9(config)#route-map TST permit 10
R9(config-route-map)#match ip address prefix 123
R9(config-route-map)#set metric 100
 
R9(config)#router bgp 400
R9(config-router)#neighbor 10.10.10.10 route-map TST out

On R17:

R17(config)#ip prefix-list 123 permit 110.19.1.1/32
 
R17(config)#route-map TST permit 10
R17(config-route-map)#match ip address prefix 123
R17(config-route-map)#set metric 150
 
R17(config)#router bgp 200
R17(config-router)#neighbor 200.10.17.10 route-map TST out

On R10:

R10(config)#router bgp 400
R10(config-router)#no neighbor 9.9.9.9 shutdown
R10(config-router)#no neighbor 200.3.10.3 shutdown
R10(config-router)#no neighbor 200.10.17.17 shutdown
R10(config-router)#no neighbor 200.2.10.2 shutdown

As a result of the sequence of configurations, R10’s BGP table lists all of the learned paths for the 110.19.1.1/32 prefix in the order presented in the example below. For readability, internal paths are highlighted in orange, external paths are highlighted in green, MED values are highlighted in red, and the AS_PATH attribute is highlighted in purple:

On R10:

R10#show ip bgp 110.19.1.1
 
BGP routing table entry for 110.19.1.1/32, version 84
BGP Bestpath: med
Paths: (4 available, best #3, table default)
  Advertised to update-groups:
     6          7
  Refresh Epoch 1
  300 100
    200.2.10.2 from 200.2.10.2 (2.2.2.2)
      Origin IGP, metric 200, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200 100
    200.10.17.17 from 200.10.17.17 (17.17.17.17)
      Origin IGP, metric 150, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300 100
    200.3.10.3 from 200.3.10.3 (3.3.3.3)
      Origin IGP, metric 300, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  200 100
    9.9.9.9 (metric 11) from 9.9.9.9 (9.9.9.9)
      Origin IGP, metric 100, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

With the top-down BGP comparison method, the best path is determined as follows:

  1. The path from R2 is compared to the path from R17. R17 is chosen as best because it is the older path.

  2. The path from R17 is then compared to the path from R3. The path from R3 is chosen as best because it is the older path as well.

  3. The path from R3 is compared to the path from R9. The path from R3 is chosen as best because it is an external path.

In this comparison, R3 is chosen as the best path, even though all other paths have better MED values. MED is not considered because, at every comparison, the paths being compared did not have the same source AS (the first ASN in the AS_PATH). In addition, the R3/R9 comparison does not consider MED because the two paths are not both external paths.
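The walk above can be reproduced with a small Python model. It is a sketch, not the IOS algorithm: the Path class and better() function are hypothetical, WEIGHT, LOCAL_PREF, AS_PATH length, and ORIGIN are assumed to tie (as they do in this table), and only the MED, eBGP-over-iBGP, and oldest-external-path steps are modeled. Folding the paths newest first with functools.reduce mirrors the top-down comparison; no pair compared in this walk shares a neighboring AS, so MED never decides.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Path:
    router: str      # advertising peer, for readability
    as_path: str     # AS_PATH as printed by show ip bgp
    med: int
    external: bool   # True if learned from an eBGP peer
    received: int    # arrival order, 1 = oldest


def neighbor_as(p: Path) -> str:
    return p.as_path.split()[0]


def better(current: Path, challenger: Path) -> Path:
    """Simplified pairwise comparison (earlier steps assumed tied)."""
    # MED, only when both paths come from the same neighboring AS.
    if neighbor_as(current) == neighbor_as(challenger) and current.med != challenger.med:
        return current if current.med < challenger.med else challenger
    # eBGP over iBGP.
    if current.external != challenger.external:
        return current if current.external else challenger
    # Both external: prefer the oldest path.
    if current.external and challenger.external:
        return current if current.received < challenger.received else challenger
    return current  # later tie-breakers are not modeled


# R10's paths to 110.19.1.1/32, listed newest first as in the output.
table = [
    Path("R2", "300 100", 200, True, 4),
    Path("R17", "200 100", 150, True, 3),
    Path("R3", "300 100", 300, True, 2),
    Path("R9", "200 100", 100, False, 1),
]

print(reduce(better, table).router)  # R3
```

Because each comparison only sees two paths, the result depends on arrival order: R3 wins even though R2, received later from the same AS, has a lower MED.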

BGP best-path processing can be altered to produce a different outcome by forcing the router to always compare MED values, by forcing it to regroup the received paths before comparing them, or by doing both. The following sections examine each option using the setup above.

Always Comparing MED Values

In the above case, the BGP decision would be very different if the MED values were compared among the four competing paths. However, due to the rules regarding MED comparison, MED has no effect on the decision-making process.

The bgp always-compare-med command changes this behavior. With this command enabled, the MED attribute is always compared between two competing paths, regardless of whether the paths are both external paths or were received from the same AS. The effects can be seen when the command is applied to R10, as shown below.

First, the BGP peering of R10 to R9, R3, R17, and R2 is shut down. The bgp always-compare-med command is issued on R10 in BGP router configuration mode. The peerings are once again brought back up in the order R9, R3, R17, and R2:

On R10:

R10(config)#router bgp 400
 
R10(config-router)#neighbor 9.9.9.9 shutdown
R10(config-router)#neighbor 200.3.10.3 shutdown
R10(config-router)#neighbor 200.10.17.17 shutdown
R10(config-router)#neighbor 200.2.10.2 shutdown
R10(config-router)#bgp always-compare-med
 
! Reestablishing the BGP peerings in order:
 
R10(config-router)#no neighbor 9.9.9.9 shutdown
 
%BGP-5-ADJCHANGE: neighbor 9.9.9.9 Up
 
R10(config-router)#no neighbor 200.3.10.3 shutdown
 
%BGP-5-ADJCHANGE: neighbor 200.3.10.3 Up
 
R10(config-router)#no neighbor 200.10.17.17 shutdown
 
%BGP-5-ADJCHANGE: neighbor 200.10.17.17 Up
 
R10(config-router)#no neighbor 200.2.10.2 shutdown
 
%BGP-5-ADJCHANGE: neighbor 200.2.10.2 Up

At first glance, the result of this run seems to contradict normal best-path operation:

R10#show ip bgp 110.19.1.1
 
BGP routing table entry for 110.19.1.1/32, version 85
BGP Bestpath: med
Paths: (4 available, best #4, table default)
  Advertised to update-groups:
     7
  Refresh Epoch 1
  300 100
    200.2.10.2 from 200.2.10.2 (2.2.2.2)
      Origin IGP, metric 200, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200 100
    200.10.17.17 from 200.10.17.17 (17.17.17.17)
      Origin IGP, metric 150, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300 100
    200.3.10.3 from 200.3.10.3 (3.3.3.3)
      Origin IGP, metric 300, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200 100
    9.9.9.9 (metric 11) from 9.9.9.9 (9.9.9.9)
      Origin IGP, metric 100, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

In this case, the internal path from R9 is chosen over all three competing external paths. The reason is the bgp always-compare-med command enabled on R10: this time, R10 considers MED in every comparison, regardless of whether the paths are external or internal and regardless of whether they were received from the same AS. The paths are compared as follows:

  1. The path from R2 is compared to the path from R17. The path from R17 has a lower MED value than the path from R2. R17’s path is preferred.

  2. The path from R17 is compared to the path from R3. The path from R17 still has a lower MED value than the path from R3. R17’s path is preferred.

  3. The path from R17 is compared to the path from R9. The path from R9 has a lower MED value than the path from R17. R9’s path is preferred.
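With bgp always-compare-med, the same top-down walk reduces to keeping whichever path has the lower MED. The Python sketch below traces the three comparisons listed above. It is illustrative only: the tuple layout and the function name are assumptions, and because the four MED values all differ, later tie-breakers are not modeled (on a tie the sketch simply keeps the current path).

```python
from functools import reduce

# (router, MED) for R10's paths to 110.19.1.1/32, newest first.
table = [("R2", 200), ("R17", 150), ("R3", 300), ("R9", 100)]
trace = []


def better_always_compare(current, challenger):
    """Keep the path with the lower MED; on a tie, keep the current path."""
    winner = challenger if challenger[1] < current[1] else current
    trace.append(f"{current[0]} vs {challenger[0]} -> {winner[0]}")
    return winner


best = reduce(better_always_compare, table)
print("\n".join(trace))
print("best:", best[0])
# R2 vs R17 -> R17
# R17 vs R3 -> R17
# R17 vs R9 -> R9
# best: R9
```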

The bgp always-compare-med option is a powerful command. It is most useful in situations where a set of autonomous systems all agree on how MED values are measured. Typically, MED values received from different autonomous systems are not directly comparable because the MED values are measures of different metrics between the two autonomous systems. If all autonomous systems agree on how MED values are measured, then the MED values become directly comparable once again.

Note

The logic of comparable metrics is not unique to BGP. The same logic is used with OSPF external routes as well. OSPF type 1 external routes will combine the external metric with the internal metric when calculating costs. With a type 1 OSPF external route, it is assumed that the external metric is directly comparable to the internal metrics and thus the metrics can be aggregated together in the OSPF domain.

More Deterministic MED Evaluation

There may be cases in which the MED values are not directly comparable, but the administrator wants to ensure that the best MED value possible is represented during the best-path processing of a BGP router. In the original example, the path from R2 has a MED value of 200, and the path from R3 has a MED value of 300. If these paths were compared directly, R2 would provide the better path based on MED. Because of the order in which the paths were received, however, the R2/R3 comparison never happens, and R3 is chosen as the best path.

BGP can be modified to always compare the MED values of paths received from the same AS with each other. This modification ensures that, for each neighboring AS, the path with the best MED value is selected before any comparison with paths from other autonomous systems occurs. In this way, MED’s influence on the decision process becomes more deterministic. The bgp deterministic-med command activates this feature.

Once configured with the bgp deterministic-med command, the router will reorganize its BGP table such that paths received from the same AS are grouped together. MED values are first compared between all paths belonging to the same AS. After this comparison, BGP then compares the winning paths from each AS to each other in a top-down fashion.

To demonstrate, R10 is reset as indicated above. The bgp always-compare-med command is removed, and the bgp deterministic-med command is then enabled on R10. The results are as follows:

On R10:

R10(config)#router bgp 400
R10(config-router)#no bgp always-compare-med
R10(config-router)#bgp deterministic-med
 
R10#show ip bgp 110.19.1.1
 
BGP routing table entry for 110.19.1.1/32, version 87
BGP Bestpath: deterministic-med: med
Paths: (4 available, best #3, table default)
  Advertised to update-groups:
     6          7
  Refresh Epoch 1
  200 100
    9.9.9.9 (metric 11) from 9.9.9.9 (9.9.9.9)
      Origin IGP, metric 100, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200 100
    200.10.17.17 from 200.10.17.17 (17.17.17.17)
      Origin IGP, metric 150, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300 100
    200.2.10.2 from 200.2.10.2 (2.2.2.2)
      Origin IGP, metric 200, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  300 100
    200.3.10.3 from 200.3.10.3 (3.3.3.3)
      Origin IGP, metric 300, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

Notice in the output above that BGP has actually reorganized the paths based on their source AS. R10 also chooses R2’s path as the best path. To come to this conclusion, R10 follows these steps:

  1. The path received from R3 is compared to the path received from R2. The path from R2 has a lower MED value than the path from R3. R2’s path replaces R3’s as the current best path.

  2. The path received from R9 is compared to the path received from R17. R17’s path is considered best because it is an external path.

  3. The path from R17 is compared to the path from R2. MED is not considered because the paths are from different autonomous systems. Processing falls to retaining the current best path over installing a new best path. R2’s path is chosen as the best path.

This processing order ensures that R2’s path is the chosen path to represent AS 300, even though it is the path that was received last. This processing order creates a more deterministic outcome for paths received from the same AS. The administrator knows the path received with the lower MED value will always be chosen for comparison to other paths in the BGP table.

Combining the Options

The final case for modifying MED processing is one in which MED values are always compared and are also made deterministic. This configuration is achieved by using the bgp always-compare-med and bgp deterministic-med commands together. When both options are present, processing starts by grouping together the paths received from the same AS and selecting a best path within each group. The winners of the groups are then compared with each other, and the path with the better MED value wins.

R10’s configuration is updated to include both the bgp always-compare-med and bgp deterministic-med commands. After the configuration change, R10’s BGP sessions are reset using the clear ip bgp * command. The configuration steps and output are shown below:

On R10:

R10(config)#router bgp 400
R10(config-router)#bgp always-compare-med
R10(config-router)#bgp deterministic-med
 
R10#clear ip bgp *
 
R10#show ip bgp 110.19.1.1
 
BGP routing table entry for 110.19.1.1/32, version 96
BGP Bestpath: deterministic-med: med
Paths: (4 available, best #1, table default)
Flag: 0x820
  Advertised to update-groups:
     11
  Refresh Epoch 1
  200 100
    9.9.9.9 (metric 11) from 9.9.9.9 (9.9.9.9)
      Origin IGP, metric 100, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  200 100
    200.10.17.17 from 200.10.17.17 (17.17.17.17)
      Origin IGP, metric 150, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300 100
    200.2.10.2 from 200.2.10.2 (2.2.2.2)
      Origin IGP, metric 200, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300 100
    200.3.10.3 from 200.3.10.3 (3.3.3.3)
      Origin IGP, metric 300, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

R10 now chooses the path through R9 as the best path. This decision is made using the following process:

  1. All paths from the same AS are grouped together.

  2. The path from R9 is compared to the path from R17. The path from R9 has the lower MED value and is chosen as the best path from all AS 200 paths.

  3. The path from R2 is compared to the path from R3. The path from R2 has the lower MED value and is chosen as the best path from all AS 300 paths.

  4. The path from R9 is compared to the path from R2. The path from R9 has the lower MED value and is chosen as the best path overall.
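The four steps can be sketched in Python as a grouping pass followed by MED comparisons. This is an illustrative model, not Cisco's code: the tuple layout (router, neighboring AS, MED) and the best_combined() helper are hypothetical, and only MED is modeled, which is enough here because every compared pair has a different MED value.

```python
# (router, neighboring AS, MED) for R10's paths to 110.19.1.1/32.
paths = [
    ("R9", 200, 100),
    ("R17", 200, 150),
    ("R2", 300, 200),
    ("R3", 300, 300),
]


def best_combined(paths):
    """Group by neighboring AS, pick the lowest MED per group, then compare
    the group winners on MED as well (always-compare-med)."""
    groups = {}
    for path in paths:
        groups.setdefault(path[1], []).append(path)
    # min() returns the first path on a MED tie; later tie-breakers not modeled.
    winners = {asn: min(group, key=lambda p: p[2]) for asn, group in groups.items()}
    overall = min(winners.values(), key=lambda p: p[2])
    return winners, overall


winners, overall = best_combined(paths)
print({asn: p[0] for asn, p in winners.items()})  # {200: 'R9', 300: 'R2'}
print(overall[0])  # R9
```

The grouping step is what removes the dependence on arrival order: R2 and R3 are always compared with each other, no matter when each path arrived.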

Keep in mind that each of these features, bgp always-compare-med and bgp deterministic-med, should be configured on every BGP router in the AS to ensure consistent decision making. In addition, the bgp always-compare-med command is typically used only if a group of autonomous systems all agree on the same measure for assigning MED values.

Step 7: eBGP over iBGP

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

The seventh step in the BGP best-path algorithm gives preference to external paths over internal paths. If no best path has been chosen within the first six steps of the best-path algorithm, BGP will prefer to send traffic on an external path rather than an internal one. Paths received from an eBGP peer are external paths, while paths received from an iBGP peer are internal paths.

This is an important distinction. What makes a path internal or external is not the path itself but the type of BGP peering from which the path was learned. For example, a path advertised to the router by an external peer exists in the local router’s BGP table as an external path.

When the router advertises this same path to one of its iBGP peers, the iBGP peer will consider the path an internal path in its own BGP table. The distinction points to the fact that BGP does not advertise routes per se. Instead, it advertises paths and descriptions about those paths. A path learned from an eBGP peer describes a path that is external to the local BGP domain. A path learned from an iBGP peer describes a path inside the local BGP domain. Consider this example:

On R19:

R19#show ip bgp 140.15.1.1
 
BGP routing table entry for 140.15.1.1/32, version 34
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     2
  Refresh Epoch 1
  300 400
    20.20.20.20 (metric 11) from 20.20.20.20 (20.20.20.20)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200 400
    200.18.19.18 from 200.18.19.18 (18.18.18.18)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

In this case, R19 has learned two paths to reach the prefix 140.15.1.1/32. Keep in mind that the 140.15.1.1/32 prefix does not exist within R19’s local BGP domain (AS 100). However, that does not determine the path types reported in the BGP table. R19 learns one path from its iBGP neighbor R20 and another from its eBGP peer R18. If R19 were to follow the path to R20, the packets would flow toward a BGP router inside the domain first. Thus, it is an internal path. If R19 were to follow the path received from R18, the packets would flow to a BGP router outside, or external to, the domain. Thus the path is an external path.

Note

The above statement may seem erroneous when considering multi-hop eBGP peerings. It is possible for a BGP router to form an eBGP peering with a BGP router that is not directly connected to it. In this case, even if following an external path, the packets may flow between multiple internal routers before reaching the appropriate eBGP peer.

BGP does not see the topologies as such. BGP does not concern itself with the inner specific workings of a particular AS. Instead, it is more concerned with routing outside the AS. From BGP’s perspective, the AS is treated as a single router. An eBGP peering describes a link between two autonomous systems or two singular routers. From the perspective of how BGP understands the topology, the BGP router is following an external path because it was learned from an external router.

This policy assumes that the administrator has no preference between the two paths in question because the preceding six criteria (WEIGHT, LOCAL_PREF, locally originated versus received from another peer, AS_PATH, ORIGIN, and MED) have tied, and there is no explicit administrator-driven preference between the two paths. Thus, BGP makes the decision to prefer the external path over the internal one.

This is basically a form of hot-potato routing, in which the AS tries to hand external traffic off to the next AS in as few internal hops as possible. For example, the BGP table on R10 for the 120.18.1.1/32 and 120.18.2.1/32 prefixes reveals the following:

On R10:

R10#show ip bgp regex _200$
 
BGP table version is 8, local router ID is 10.10.10.10
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop            Metric LocPrf Weight Path
 * i  120.18.1.1/32    9.9.9.9              0    100      0 200 i
 * i                   9.9.9.9              0    100      0 200 i
 *                     200.2.10.2                         0 300 200 i
 *                     200.3.10.3                         0 300 200 i
 *>                    200.10.17.17                       0 200 i
 * i  120.18.2.1/32    9.9.9.9              0    100      0 200 i
 * i                   9.9.9.9              0    100      0 200 i
 *                     200.2.10.2                         0 300 200 i
 *                     200.3.10.3                         0 300 200 i
 *>                    200.10.17.17                       0 200 i

R10 receives five paths for each prefix. The paths with next hops 200.3.10.3 and 200.2.10.2 are not considered because of their longer AS_PATH. Of the remaining paths, R10 chooses the path with next hop 200.10.17.17 as its best path, in accordance with step 7, because it was received from an external peer. The other two remaining paths were received from the iBGP peer R9 (indicated by the i to the left of the prefix).
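R10’s choice for 120.18.1.1/32 can be sketched in Python as two filtering passes: keep the shortest AS_PATHs, then keep the external paths if any remain. This is a simplified illustration; the tuple layout and the select() helper are hypothetical, and WEIGHT, LOCAL_PREF, ORIGIN, and MED are assumed to tie for these paths, as the table above suggests.

```python
# (next hop, AS_PATH, learned via eBGP?) for R10's paths to 120.18.1.1/32.
paths = [
    ("9.9.9.9", "200", False),
    ("9.9.9.9", "200", False),
    ("200.2.10.2", "300 200", True),
    ("200.3.10.3", "300 200", True),
    ("200.10.17.17", "200", True),
]


def select(paths):
    # AS_PATH step: keep only the paths with the fewest ASNs.
    shortest = min(len(p[1].split()) for p in paths)
    remaining = [p for p in paths if len(p[1].split()) == shortest]
    # Step 7, eBGP over iBGP: keep the external paths if there are any.
    external = [p for p in remaining if p[2]]
    return external or remaining


print(select(paths))  # [('200.10.17.17', '200', True)]
```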

Thinking about the flow of traffic, with R10 choosing the external path directly to R17, R10 will forward transit traffic for 120.18.1.1/32 and 120.18.2.1/32 directly out the local AS toward the proper destination.

If R10 instead sent traffic to R9 (following its internal paths), transit traffic would traverse an extra hop inside the local AS before ultimately leaving. This is an example of cold-potato routing, in which transit traffic stays inside the AS longer before leaving it, possibly adding latency and increasing internal link utilization.

Confederations

Recall the concept of confederation internal peers and confederation external peers. Some publications claim that the decision-making algorithm prefers confederation external peers over confederation internal peers. Let’s examine these claims from R5’s point of view of the 120.18.1.1 prefix:

On R5:

R5#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 4
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 1
  (336 312) 200
    3.3.3.3 (metric 21) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, confed-internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  (378 336 312) 200
    6.6.6.6 (metric 31) from 7.7.7.7 (7.7.7.7)
      Origin IGP, metric 0, localpref 100, valid, confed-external
      rx pathid: 0, tx pathid: 0

Here, R5 receives two paths to 120.18.1.1: one from its confederation iBGP (internal) neighbor R4 and the other from its confederation eBGP (external) neighbor R7. According to the statements from other sources, R5 should prefer the path through R7, but this is not the case, as you can see above. In reality, with all else being equal, the decision falls on the lowest IGP metric to the next hop.

R5’s metric to reach next hop 3.3.3.3 is 21, whereas the metric to reach next hop 6.6.6.6 is 31. Since the metric to the next hop 3.3.3.3 is lower than the metric to 6.6.6.6, R5 uses the path from R4.

To prove this point, the metrics will be modified in the topology. This is done by modifying the OSPF cost on the E0/0.45 interface on R5 to 20. With the changes made, the output below reveals a tie in metric to reach both 3.3.3.3 and 6.6.6.6:

R5(config)#interface e0/0.45
R5(config-subif)#ip ospf cost 20
 
R5#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 4
Paths: (2 available, best #1, table default)
Flag: 0x100
  Advertised to update-groups:
     1
  Refresh Epoch 1
  (336 312) 200
    3.3.3.3 (metric 31) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, confed-internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  (378 336 312) 200
    6.6.6.6 (metric 31) from 7.7.7.7 (7.7.7.7)
      Origin IGP, metric 0, localpref 100, valid, confed-external
      rx pathid: 0, tx pathid: 0

Because R4’s RID, 4.4.4.4, is lower than R7’s RID, 7.7.7.7, R4’s path is chosen as best. This can once again be proven by changing R7’s router ID so that it is lower than R4’s:

On R7:

R7(config)#router bgp 378
R7(config-router)#bgp router-id 1.1.1.7

On R5:

R5#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 14
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     2
  Refresh Epoch 1
  (378 336 312) 200
    6.6.6.6 (metric 31) from 7.7.7.7 (1.1.1.7)
      Origin IGP, metric 0, localpref 100, valid, confed-external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  (336 312) 200
    3.3.3.3 (metric 31) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, confed-internal
      rx pathid: 0, tx pathid: 0

With the changes made, the path via R7 is chosen as the best path due to the lower RID. The results above should not be surprising, as this is the exact behavior described in RFC 5065 Section 5.3, point 4:

Path selection criteria for information received from members inside a confederation MUST follow the same rules used for information received from members inside the same autonomous system, as specified in [BGP-4].

In addition, the following rules SHALL be applied:

  1. If the AS_PATH is internal to the local confederation (i.e., there are only AS_CONFED_* segments), consider the neighbor AS to be the local AS.

  2. Otherwise, if the first segment in the path that is not an AS_CONFED_SEQUENCE or AS_CONFED_SET is an AS_SEQUENCE, consider the neighbor AS to be the leftmost AS_SEQUENCE AS.

  3. When comparing routes using AS_PATH length, CONFED_SEQUENCE and CONFED_SETs SHOULD NOT be counted.

  4. When comparing routes using the internal (IBGP learned) versus external (EBGP learned) rules, treat a route that is learned from a peer that is in the same confederation (not necessarily the same Member-AS) as "internal".

This simply means that all peers that are members of the same confederation are treated as internal peers, regardless of whether they belong to the same member AS. Because R4, R5, and R7 are all members of confederation AS 300, the paths R5 receives from them are compared as though they were internal paths.

A side effect of this rule is that true eBGP paths (learned from outside the confederation) are preferred over both iBGP and confederation eBGP paths, because iBGP and confederation eBGP paths are treated the same.
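The two RFC 5065 rules that matter most in this example (rule 3, which ignores confederation segments when measuring AS_PATH length, and rule 4, which treats confederation peers as internal) can be sketched in Python. This is a minimal illustration under stated assumptions, not how IOS implements path selection; the `Path` class, the segment tuples, and the `learned_from` labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical model: an AS_PATH is a list of (segment_type, ASNs) tuples using the
# RFC 5065 segment names; learned_from is one of "ibgp", "confed-ebgp", or "ebgp".
CONFED_SEGMENTS = {"AS_CONFED_SEQUENCE", "AS_CONFED_SET"}


@dataclass
class Path:
    next_hop: str
    as_path: List[Tuple[str, List[int]]]
    learned_from: str


def as_path_length(path: Path) -> int:
    """RFC 5065 rule 3: confederation segments are not counted."""
    length = 0
    for seg_type, asns in path.as_path:
        if seg_type in CONFED_SEGMENTS:
            continue
        # An AS_SET counts as a single AS; an AS_SEQUENCE counts every AS in it.
        length += 1 if seg_type == "AS_SET" else len(asns)
    return length


def is_internal(path: Path) -> bool:
    """RFC 5065 rule 4: any peer inside the confederation is treated as internal."""
    return path.learned_from in ("ibgp", "confed-ebgp")


def compare_length_then_type(a: Path, b: Path) -> Optional[Path]:
    """AS_PATH-length step, then the eBGP-over-iBGP step; None means still tied."""
    la, lb = as_path_length(a), as_path_length(b)
    if la != lb:
        return a if la < lb else b
    if is_internal(a) != is_internal(b):
        return b if is_internal(a) else a  # the truly external path wins
    return None


# R5's two paths to 120.18.1.1/32, as shown in the outputs above.
via_r4 = Path("3.3.3.3", [("AS_CONFED_SEQUENCE", [336, 312]), ("AS_SEQUENCE", [200])], "ibgp")
via_r7 = Path("6.6.6.6", [("AS_CONFED_SEQUENCE", [378, 336, 312]), ("AS_SEQUENCE", [200])], "confed-ebgp")
print(compare_length_then_type(via_r4, via_r7))  # None: the tie moves on to later steps
```

Because both of R5’s paths have an AS_PATH length of 1 and both count as internal, the comparison falls through to the later steps (IGP metric, then RID), which matches the R5 outputs above.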

Step 8: Lowest IGP Metric to the Next Hop

Note

Before starting this section, revert the configuration on all of the routers to the base initial configuration files provided with the lab.

At this stage in the best-path algorithm, BGP is left with paths that are either all external or all internal. Step 8 compares the internal (IGP) metric to reach each path’s BGP next hop: BGP prefers the path whose next hop the local router can reach with the lowest metric. This comparison refines BGP’s default hot-potato routing behavior. By comparing the internal cost to reach the next hop for two otherwise similar paths, BGP further ensures that the closest exit is chosen for the traffic.

The IGP metric to the next hop is the aggregate IGP metric, as stored in the local router’s RIB. This information is kept up to date with IGP metric changes.
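Step 8 can be sketched as a lookup of each next hop in the routing table followed by a minimum-metric comparison. In this hedged Python sketch, a plain dictionary of prefixes and metrics stands in for the RIB (a hypothetical structure, not an IOS API), and unreachable next hops are simply dropped, although in practice such paths are invalid and never reach this step.

```python
import ipaddress
from typing import Dict, List, Optional


def igp_metric_to(next_hop: str, rib: Dict[str, int]) -> Optional[int]:
    """Longest-prefix match of the next hop against a {prefix: metric} RIB stand-in."""
    addr = ipaddress.ip_address(next_hop)
    best_len, best_metric = -1, None
    for prefix, metric in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_metric = net.prefixlen, metric
    return best_metric


def lowest_igp_metric(next_hops: List[str], rib: Dict[str, int]) -> List[str]:
    """Step 8 sketch: keep only the next hops tied for the lowest IGP metric."""
    metrics = {nh: igp_metric_to(nh, rib) for nh in next_hops}
    reachable = {nh: m for nh, m in metrics.items() if m is not None}
    if not reachable:
        return []
    lowest = min(reachable.values())
    return [nh for nh, m in reachable.items() if m == lowest]


# Metrics taken from R14's "show ip route" output later in this step.
r14_rib = {"9.9.9.9/32": 31, "10.10.10.10/32": 21}
print(lowest_igp_metric(["9.9.9.9", "10.10.10.10"], r14_rib))  # ['10.10.10.10']
```

If more than one next hop survives, the tie passes to the later steps of the algorithm.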

To examine the effects of the IGP next hop on BGP decision making, the 120.18.1.1/32 network is examined within AS 400. The following catalogs the prefix’s journey from edge router to internal router:

  1. R10 and R9 receive multiple paths for the 120.18.1.1 prefix. They both choose their eBGP paths from R17 as best and advertise them to R12. R10 also advertises its best path to R11.

    On R10:

    R10#show ip bgp regexp _200$
     
    BGP table version is 10, local router ID is 10.10.10.10
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
                  x best-external, a additional-path, c RIB-compressed,
                  t secondary path,
    Origin codes: i - IGP, e - EGP, ? - incomplete
    RPKI validation codes: V valid, I invalid, N Not found
     
         Network          Next Hop            Metric LocPrf Weight Path
     * i  120.18.1.1/32    9.9.9.9              0    100      0 200 i
     * i                   9.9.9.9              0    100      0 200 i
     *>                    200.10.17.17                       0 200 i
     *                     200.3.10.3                         0 300 200 i
     *                     200.2.10.2                         0 300 200 i
     * i  120.18.2.1/32    9.9.9.9              0    100      0 200 i
     * i                   9.9.9.9              0    100      0 200 i
     *>                    200.10.17.17                       0 200 i
     *                     200.3.10.3                         0 300 200 i
     *                     200.2.10.2                         0 300 200 i
    

    On R9:

    R9#show ip bgp regexp _200$
     
    BGP table version is 97, local router ID is 9.9.9.9
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
                  x best-external, a additional-path, c RIB-compressed,
                  t secondary path,
    Origin codes: i - IGP, e - EGP, ? - incomplete
    RPKI validation codes: V valid, I invalid, N Not found
         Network          Next Hop            Metric LocPrf Weight Path
     * i  120.18.1.1/32    10.10.10.10              0    100      0 200 i
     *>                    200.9.17.17                            0 200 i
     * i  120.18.2.1/32    10.10.10.10              0    100      0 200 i
     *>                    200.9.17.17                            0 200 i
    
  2. R12 receives two paths: one from R9 and one from R10. Both next hops have the same IGP metric (11), so R12 chooses the path from R9 because of its lower RID and advertises it to R11 and R13. R12’s BGP table entry is shown below.

    On R12:

    R12#show ip bgp 120.18.1.1
    BGP routing table entry for 120.18.1.1/32, version 6
    Paths: (2 available, best #1, table default)
      Advertised to update-groups:
         1
      Refresh Epoch 1
      200, (Received from a RR-client)
        9.9.9.9 (metric 11) from 9.9.9.9 (9.9.9.9)
          Origin IGP, metric 0, localpref 100, valid, internal, best
          rx pathid: 0, tx pathid: 0x0
      Refresh Epoch 1
      200, (Received from a RR-client)
        10.10.10.10 (metric 11) from 10.10.10.10 (10.10.10.10)
          Origin IGP, metric 0, localpref 100, valid, internal
          rx pathid: 0, tx pathid: 0
    
  3. R11 receives three paths: one from R6, one from R10, and one from R12. The path through R6 has a longer AS_PATH and is less preferred. The remaining paths from R10 and R12 are compared. The next hop for the path via R10 is 10.10.10.10, and the next hop for the path via R12 is 9.9.9.9. R11 chooses the path from R10 because its metric to next hop 10.10.10.10 (11) is lower than its metric to next hop 9.9.9.9 (21). R11 then advertises the path received from R10 to R14.

    On R11:

    R11#show ip bgp 120.18.1.1
    BGP routing table entry for 120.18.1.1/32, version 4
    Paths: (3 available, best #3, table default)
      Advertised to update-groups:
         1          3
      Refresh Epoch 1
      200
        9.9.9.9 (metric 21) from 12.12.12.12 (12.12.12.12)
          Origin IGP, metric 0, localpref 100, valid, internal
          Originator: 9.9.9.9, Cluster list: 12.12.12.12
          rx pathid: 0, tx pathid: 0
      Refresh Epoch 1
      300 200
        200.6.11.6 from 200.6.11.6 (6.6.6.6)
          Origin IGP, localpref 100, valid, external
          rx pathid: 0, tx pathid: 0
      Refresh Epoch 1
      200
        10.10.10.10 (metric 11) from 10.10.10.10 (10.10.10.10)
          Origin IGP, metric 0, localpref 100, valid, internal, best
          rx pathid: 0, tx pathid: 0x0
    

    Note

    The output above may look odd because the next hop advertised by R12 for the path to 120.18.1.1 is 9.9.9.9 instead of itself (12.12.12.12). However, this is normal behavior for BGP routers in a route reflector configuration.

    Originally, R9 received the advertisement from R17, its eBGP neighbor. When R9 advertised the path to R12, it set itself as the next hop because its neighbor statement for R12 includes the next-hop-self option.

    Under normal circumstances, R12 would not advertise the path to R11 because it is an internal path. R11 is a route reflector client of R12, however, which means R12 can relax its normal iBGP split-horizon rule and reflect the path to R11.

    When a route reflector reflects a path, it does not change the next hop to itself; this keeps the route reflector from inserting itself into the data forwarding path unnecessarily. The next hop is retained as R9 because R9 originally advertised the prefix into the AS, and R9 is the edge router toward which all other routers in the topology should recurse.

    To better illustrate this interaction, two packet captures are included below:

    ! BGP update message from R9 to R12:

    Internet Protocol Version 4, Src: 9.9.9.9, Dst: 12.12.12.12
    Transmission Control Protocol, Src Port: 179, Dst Port: 41971, Seq: 66, Ack: 397, Len: 171

    Border Gateway Protocol - UPDATE Message
        Marker: ffffffffffffffffffffffffffffffff
        Length: 67
        Type: UPDATE Message (2)
        Withdrawn Routes Length: 0
        Total Path Attribute Length: 34
        Path attributes
            Path Attribute - ORIGIN: IGP
            Path Attribute - AS_PATH: 200
            Path Attribute - NEXT_HOP: 9.9.9.9
            Path Attribute - MULTI_EXIT_DISC: 0
            Path Attribute - LOCAL_PREF: 100
        Network Layer Reachability Information (NLRI)
            120.18.1.1/32
            120.18.2.1/32

    ! BGP update message from R12 to R11:

    Internet Protocol Version 4, Src: 12.12.12.12, Dst: 11.11.11.11
    Transmission Control Protocol, Src Port: 179, Dst Port: 60255, Seq: 24, Ack: 1, Len: 354

    Border Gateway Protocol - UPDATE Message
        Marker: ffffffffffffffffffffffffffffffff
        Length: 81
        Type: UPDATE Message (2)
        Withdrawn Routes Length: 0
        Total Path Attribute Length: 48
        Path attributes
            Path Attribute - ORIGIN: IGP
            Path Attribute - AS_PATH: 200
            Path Attribute - NEXT_HOP: 9.9.9.9
            Path Attribute - MULTI_EXIT_DISC: 0
            Path Attribute - LOCAL_PREF: 100
            Path Attribute - CLUSTER_LIST: 12.12.12.12
            Path Attribute - ORIGINATOR_ID: 9.9.9.9
        Network Layer Reachability Information (NLRI)
            120.18.1.1/32
            120.18.2.1/32
    

    In the first packet capture, R9 advertises itself as the next hop to R12. In the second capture, R12 continues to advertise R9 as the next hop to R11 because it is performing route reflection.

  4. R13 receives a single path to the prefix from R12 and advertises it on to R14.

    On R13:

    R13#show ip bgp 120.18.1.1
     
    BGP routing table entry for 120.18.1.1/32, version 129
    Paths: (1 available, best #1, table default)
      Advertised to update-groups:
         2
      Refresh Epoch 1
      200
        9.9.9.9 (metric 21) from 12.12.12.12 (12.12.12.12)
          Origin IGP, metric 0, localpref 100, valid, internal, best
          Originator: 9.9.9.9, Cluster list: 12.12.12.12
          rx pathid: 0, tx pathid: 0x0
    
  5. R14 has two paths: one path with next hop 10.10.10.10 and the other with next hop 9.9.9.9, shown below. R14 chooses the path through R10 as best because of the lower metric to the next hop 10.10.10.10.

    On R14:

    R14#show ip bgp 120.18.1.1
     
    BGP routing table entry for 120.18.1.1/32, version 43
    Paths: (2 available, best #2, table default)
    Flag: 0x100
      Not advertised to any peer
      Refresh Epoch 2
      200
        9.9.9.9 (metric 31) from 13.13.13.13 (13.13.13.13)
          Origin IGP, metric 0, localpref 100, valid, internal
          Originator: 9.9.9.9, Cluster list: 13.13.13.13, 12.12.12.12
          rx pathid: 0, tx pathid: 0
      Refresh Epoch 2
      200
        10.10.10.10 (metric 21) from 11.11.11.11 (11.11.11.11)
          Origin IGP, metric 0, localpref 100, valid, internal, best
          Originator: 10.10.10.10, Cluster list: 11.11.11.11
          rx pathid: 0, tx pathid: 0x0
    

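The route-reflection behavior described in the note under item 3 can be summarized in a short Python sketch of the attribute handling defined in RFC 4456. The `Update` class and `reflect()` function are hypothetical; cluster IDs are taken to be the reflectors’ router IDs, as the cluster lists in the R11 and R14 outputs suggest.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Update:
    nlri: str
    next_hop: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def reflect(update: Update, learned_from_rid: str, cluster_id: str) -> Optional[Update]:
    """Attribute handling when a route reflector reflects an iBGP path (RFC 4456)."""
    if cluster_id in update.cluster_list:
        return None  # our own cluster ID is already present: a loop, so drop the path
    return replace(
        update,
        # NEXT_HOP is deliberately left untouched.
        # ORIGINATOR_ID is set only if absent, so the first reflector records the originator.
        originator_id=update.originator_id or learned_from_rid,
        # The reflector prepends its cluster ID to CLUSTER_LIST.
        cluster_list=(cluster_id,) + update.cluster_list,
    )


# R9 -> R12 (cluster 12.12.12.12) -> R13 (cluster 13.13.13.13) -> R14
from_r9 = Update("120.18.1.1/32", next_hop="9.9.9.9")
from_r12 = reflect(from_r9, learned_from_rid="9.9.9.9", cluster_id="12.12.12.12")
from_r13 = reflect(from_r12, learned_from_rid="12.12.12.12", cluster_id="13.13.13.13")
print(from_r13)
```

Running the chain R9 → R12 → R13 leaves NEXT_HOP at 9.9.9.9 and produces ORIGINATOR_ID 9.9.9.9 with the cluster list 13.13.13.13, 12.12.12.12, which matches R14’s output above.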
In the sequence above, both R11 and R14 make best-path decisions based on their internal IGP metric to reach the next hop for the prefix 120.18.1.1/32.

First, R11 chooses its path from R10 because its metric to the next hop 10.10.10.10 is 11, compared to a metric of 21 to the next hop 9.9.9.9 for the alternate path through R12. R14 chooses the path from R11, with a metric of 21 to reach the next hop 10.10.10.10, over the path from R13, with a metric of 31 to reach the next hop 9.9.9.9.

All of the metrics above are the cumulative IGP metrics to reach the next hop. In this case, the metric used is the OSPF cost of the route as installed in the routing table, as shown in the output on R14:

R14#show ip route 9.9.9.9
Routing entry for 9.9.9.9/32
  Known via "ospf 1", distance 110, metric 31, type intra area
  Last update from 40.11.14.11 on Ethernet0/0.1114, 2d23h ago
  Routing Descriptor Blocks:
  * 40.13.14.13, from 9.9.9.9, 2d23h ago, via Ethernet0/0.1314
      Route metric is 31, traffic share count is 1
    40.11.14.11, from 9.9.9.9, 2d23h ago, via Ethernet0/0.1114
      Route metric is 31, traffic share count is 1
 
R14#show ip route 10.10.10.10
 
Routing entry for 10.10.10.10/32
  Known via "ospf 1", distance 110, metric 21, type intra area
  Last update from 40.11.14.11 on Ethernet0/0.1114, 00:51:21 ago
  Routing Descriptor Blocks:
  * 40.11.14.11, from 10.10.10.10, 00:51:21 ago, via Ethernet0/0.1114
      Route metric is 21, traffic share count is 1

To manipulate BGP’s choice of best path utilizing the IGP metric to the next hop, the administrator can modify the physical link OSPF costs to engineer the desired results. For example, to force R14 to prefer the path through R9 instead of R10, the administrator could change the costs of R12’s e0/0.1012 interface and R14’s e0/0.1114 interface.

In the following, R12’s e0/0.1012 interface cost is increased to 20, and R14’s e0/0.1114 interface cost is increased to 30. As a result, the metric R14 uses to reach the next hop 10.10.10.10 increases from 21 to 41, as shown below:

On R12:

R12(config)#interface e0/0.1012
R12(config-subif)#ip ospf cost 20

On R14:

R14(config)#interface e0/0.1114
R14(config-subif)#ip ospf cost 30
 
R14#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 53
Paths: (2 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 2
  200
    9.9.9.9 (metric 31) from 13.13.13.13 (13.13.13.13)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 9.9.9.9, Cluster list: 13.13.13.13, 12.12.12.12
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200
    10.10.10.10 (metric 41) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 10.10.10.10, Cluster list: 11.11.11.11
      rx pathid: 0, tx pathid: 0

As a result of the above configuration, R14 now installs the path via R9 as the best path due to the lower metric (31) to reach 9.9.9.9.
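The metrics in R14’s output can be checked by hand. The per-link costs below are inferred from the outputs in this step rather than stated in the lab, so treat them as assumptions: R11 reaches 10.10.10.10 with metric 11 and 9.9.9.9 with metric 21, R13 reaches 9.9.9.9 with metric 21, and R14’s original metric of 21 through R11 implies a cost of 10 on e0/0.1114 (likewise 31 minus 21 implies a cost of 10 toward R13).

```latex
\begin{aligned}
M_{\text{R14}\to 10.10.10.10}^{\text{before}} &= c_{\text{R14 e0/0.1114}} + M_{\text{R11}\to 10.10.10.10} = 10 + 11 = 21 \\
M_{\text{R14}\to 10.10.10.10}^{\text{after}}  &= 30 + 11 = 41 \\
M_{\text{R14}\to 9.9.9.9}^{\text{after}} &= \min\bigl(c_{\text{R14}\to\text{R13}} + M_{\text{R13}\to 9.9.9.9},\; 30 + M_{\text{R11}\to 9.9.9.9}\bigr) = \min(10 + 21,\; 30 + 21) = 31
\end{aligned}
```

Because 31 is lower than 41, step 8 now favors the path with next hop 9.9.9.9, and the former equal-cost route to 9.9.9.9 through R11 (51) is no longer used.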

Step 9: Determine if Multiple Paths Exist

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

A common phenomenon in IP routing is a situation in which a router learns multiple equal-cost routes to reach a specific destination. In such a situation, the individual routing protocols can offer all of these routes to the RIB for load sharing. BGP also includes this functionality, but it is governed by two criteria.

First, BGP selects only a single path as the best path to a particular prefix, and by default it installs only that path into the RIB. The number of paths BGP can install is limited by what is known as the maximum path setting. By default, the maximum path setting is 1, which is why only a single path is installed into the RIB from BGP. The show ip protocols output below confirms this default setting:

On R10:

R10#show ip protocols | section "bgp 400"
 
Routing Protocol is "bgp 400"
  Outgoing update filter list for all interfaces is not set
  Incoming update filter list for all interfaces is not set
  IGP synchronization is disabled
  Automatic route summarization is disabled
  Neighbor(s):
    Address          FiltIn FiltOut DistIn DistOut Weight RouteMap
    9.9.9.9
    11.11.11.11
    12.12.12.12
    200.2.10.2
    200.3.10.3
    200.10.17.17
  Maximum path: 1
  Routing Information Sources:
    Gateway         Distance      Last Update
    12.12.12.12          200      03:43:11
    200.10.17.17          20      03:43:48
    200.2.10.2            20      03:43:48
  Distance: external 20 internal 200 local 200

The administrator must designate the type and quantity of paths that can be considered for multipath. For example, if the administrator designates that two external paths can be installed into the RIB, then BGP marks two equal external paths as multipath, one of which is also the best path. Both paths are installed into the RIB.

The maximum-paths [ibgp | eibgp] [number-of-paths] command controls the number and type of paths that can be selected as multipath and installed in the RIB. There are three ways this command can be used in BGP router configuration mode:

  • maximum-paths [number-of-paths]: Chooses only equal external paths

  • maximum-paths eibgp [number-of-paths]: Chooses between a mix of equal external and internal paths

  • maximum-paths ibgp [number-of-paths]: Chooses only equal internal paths

For example, let’s configure R10 to increase its default number of maximum paths to four by using the command maximum-paths 4. With this form of the command, R10 selects up to four equal-cost external paths as multipath in the BGP table for installation in the RIB. The effects of this change are reflected in the show ip protocols output on R10:

R10(config)#router bgp 400
R10(config-router)#maximum-paths 4
 
R10#show ip protocols | section bgp
 
Routing Protocol is "bgp 400"
  Outgoing update filter list for all interfaces is not set
  Incoming update filter list for all interfaces is not set
  IGP synchronization is disabled
  Automatic route summarization is disabled
  Neighbor(s):
    Address          FiltIn FiltOut DistIn DistOut Weight RouteMap
    9.9.9.9
    11.11.11.11
    12.12.12.12
    200.2.10.2
    200.3.10.3
    200.10.17.17
  Maximum path: 4
  Routing Information Sources:
    Gateway         Distance      Last Update
    12.12.12.12          200      00:22:51
    200.2.10.2            20      00:00:09
    200.3.10.3            20      00:22:18
    200.10.17.17          20      00:22:48
  Distance: external 20 internal 200 local 200

Now that the BGP default has been modified to allow multiple equal-cost paths from the BGP table to be installed in the RIB, the second criterion deals with how BGP determines whether two paths are equal. This is a simple calculation for IGPs, which use metric values to determine which route is preferred over another. If the metric value ties between two routes learned by an IGP, the IGP automatically lists them as candidates for multipath routing. BGP does not use metric values in this way.

As discussed earlier, BGP does not include the concept of traditional metrics, as IGPs do. Instead, it relies on its path attributes to determine degrees of preference for all received paths. Logically speaking, if BGP determined that particular path attributes are equal between two competing paths, then it could provide both paths as routes to the RIB of the local router. The only thing BGP would need to do is determine which path attributes should be equal and integrate such a check into the best-path algorithm it already uses. BGP includes this functionality but with different criteria.

First, it must be established that the BGP algorithm always chooses a single best path. No matter what multipath settings are applied, there is always exactly one best path selected in the BGP table for each prefix. Any additional paths to be installed in the RIB are selected based on whether they match the chosen best path.

Step 9 of the best-path algorithm performs this check. If certain attributes of a competing path are equal to those of the current best path, step 9 checks the router’s multipath settings, and BGP selects as many equal paths as the maximum path setting allows. In the example above, R10’s maximum path setting was increased to four, so BGP would install up to four paths from the BGP table into the RIB.

So, in short, these are the two criteria for selecting multiple paths in the BGP table:

  • The maximum path setting must be set to allow more than one path.

  • Certain attributes of the path in question must match the same attributes of the current best path.

Although the specific attributes that must match vary depending on which version of the maximum-paths command is used, at least the following attributes must be equal to the current best path:

  • WEIGHT

  • LOCAL_PREF

  • AS_PATH length

  • ORIGIN

  • MED

  • AS_PATH SEQUENCE
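The two multipath criteria and the attribute list above can be combined into a Python sketch of how step 9 might pick multipath candidates. It is a simplification under stated assumptions: the variant labels (`ebgp`, `ibgp`, `eibgp`) are shorthand invented here for the three forms of maximum-paths, MED is taken as 0 where the outputs show none, the IGP metric to a directly connected eBGP next hop is taken as 0, and skipping the IGP-metric test for `eibgp` is inferred from R10’s eiBGP output later in this section rather than from documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical shorthand for the three forms of the command:
#   "ebgp"  -> maximum-paths N        (equal external paths only)
#   "ibgp"  -> maximum-paths ibgp N   (equal internal paths only)
#   "eibgp" -> maximum-paths eibgp N  (a mix of external and internal paths)
EXTERNAL = {"external", "confed-external"}
INTERNAL = {"internal", "confed-internal"}
ALLOWED = {"ebgp": EXTERNAL, "ibgp": INTERNAL, "eibgp": EXTERNAL | INTERNAL}


@dataclass
class Path:
    name: str
    kind: str               # "external", "confed-external", "internal" or "confed-internal"
    weight: int
    local_pref: int
    as_path: Tuple[int, ...]
    origin: str
    med: int                # assumed 0 where the output shows no MED
    igp_metric: int         # IGP metric to the BGP next hop


def attribute_key(p: Path) -> tuple:
    """WEIGHT, LOCAL_PREF, AS_PATH length, ORIGIN, MED and the exact AS_PATH sequence."""
    return (p.weight, p.local_pref, len(p.as_path), p.origin, p.med, p.as_path)


def select_multipaths(best: Path, others: List[Path], variant: str, max_paths: int) -> List[Path]:
    """Return the best path followed by equal candidates, at most max_paths in total."""
    selected = [best]
    if best.kind not in ALLOWED[variant]:
        return selected
    for p in others:
        if len(selected) >= max_paths:
            break
        if p.kind not in ALLOWED[variant] or attribute_key(p) != attribute_key(best):
            continue
        # Assumption: the equal-IGP-metric test is skipped for "eibgp", matching R10's
        # output, where a metric-11 iBGP path joins metric-0 eBGP paths.
        if variant != "eibgp" and p.igp_metric != best.igp_metric:
            continue
        selected.append(p)
    return selected


# Paths modeled on R10's four entries for 130.7.1.1/32; R2's path is the best path.
r2 = Path("R2", "external", 0, 100, (300,), "i", 0, 0)
r3 = Path("R3", "external", 0, 100, (300,), "i", 0, 0)
r11 = Path("R11", "internal", 0, 100, (300,), "i", 0, 11)
r17 = Path("R17", "external", 0, 100, (200, 300), "i", 0, 0)
others = [r11, r3, r17]

for variant, n in [("ebgp", 1), ("ebgp", 2), ("eibgp", 3)]:
    print(variant, n, [p.name for p in select_multipaths(r2, others, variant, n)])
```

The three printed cases mirror the lab: with the default of one path only R2 is used, maximum-paths 2 adds R3, and maximum-paths eibgp 3 adds R11 as well, while R17 never qualifies because of its longer AS_PATH.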

The next subsections review the specific requirements for each maximum-paths command variant.

External Paths

In order for BGP to consider two or more external paths as equal-cost paths suitable for multipathing, the following requirements must be met in addition to the attributes listed above:

  • The path must be learned from an external or confederation-external BGP neighbor.

  • The IGP metric to the next hop must be equal to the best path’s IGP metric.

In other words, the path must be learned from an external peer or an external confederation peer, and the router’s own metric to reach its next hop must match that of the best path. These concepts are demonstrated next using R10. In the following configuration, the previous maximum-paths 4 command is removed. R10 receives a mix of internal and external paths to reach the 130.7.1.1/32 prefix, as shown below:

R10(config)#router bgp 400
R10(config-router)#no maximum-paths 4
 
R10#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 58
Paths: (4 available, best #3, table default)
  Advertised to update-groups:
     1          2
  Refresh Epoch 2
  300
    11.11.11.11 (metric 11) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300
    200.3.10.3 from 200.3.10.3 (3.3.3.3)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  300
    200.2.10.2 from 200.2.10.2 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200 300
    200.10.17.17 from 200.10.17.17 (17.17.17.17)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

The path from R17 is not considered the best path because of its longer AS_PATH. The path from R11 is also not considered the best path because it is an internal BGP path, unlike the external BGP paths received from R2 and R3. The paths from R2 and R3 are identical up to step 10. Without multipath enabled, the decision is based on whichever path is older. In this case, the path from R2 was received (or marked as best) before the path from R3, so R10 chooses R2’s path as its best path.

Note

The results of this best-path calculation depend heavily on the order in which R10 received the paths from R2 and R3. If R10 received both paths at the same time, before running the best-path algorithm, then R2’s path is chosen over R3’s because of its lower BGP RID.

However, if R3’s path is received before R2’s and marked as best, then R10 will retain R3 as its best path because it is the older route. These concepts are explained in greater detail in steps 10 and 11.
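The timing dependency described in the note can be modeled by running the best-path comparison after each batch of arriving paths. In this hedged Python sketch (the `ExternalPath` class and the batching are hypothetical), step 10 is reduced to keeping the existing best path and step 11 to choosing the lowest router ID; the real steps have additional conditions covered later.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExternalPath:
    peer: str
    router_id: str


def pick_best(table: List[ExternalPath], current_best: Optional[ExternalPath]) -> ExternalPath:
    """Tie-break for external paths that are equal through step 9."""
    if current_best in table:
        return current_best  # step 10: keep the older path that is already best
    # step 11: lowest BGP router ID, compared numerically rather than as strings
    return min(table, key=lambda p: ipaddress.ip_address(p.router_id))


def run(arrival_batches: List[List[ExternalPath]]) -> Optional[ExternalPath]:
    """Each batch of paths arrives before one run of the best-path algorithm."""
    table: List[ExternalPath] = []
    best: Optional[ExternalPath] = None
    for batch in arrival_batches:
        table.extend(batch)
        best = pick_best(table, best)
    return best


r2 = ExternalPath("200.2.10.2", "2.2.2.2")
r3 = ExternalPath("200.3.10.3", "3.3.3.3")
print(run([[r2, r3]]).peer)    # both at once: lower RID wins -> 200.2.10.2
print(run([[r3], [r2]]).peer)  # R3 first: the older best path is kept -> 200.3.10.3
```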

R10 submits the above best path via R2 to the RIB. This can be confirmed with the show ip route 130.7.1.1 output on R10:

R10#show ip route 130.7.1.1
 
Routing entry for 130.7.1.1/32
  Known via "bgp 400", distance 20, metric 0
  Tag 300, type external
  Last update from 200.2.10.2 00:08:59 ago
  Routing Descriptor Blocks:
  * 200.2.10.2, from 200.2.10.2, 00:08:59 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 300
      MPLS label: none

R10 has an additional path to the same prefix in its BGP RIB that appears to be virtually identical to its best path through R2. However, because the default maximum path setting is 1, it will only install the best path in the RIB. In order for R10 to consider the extra external path, the maximum-paths 2 command is configured on R10. Remember that the maximum-paths command followed by a number only considers external paths for equal-cost multipathing.

R10(config)#router bgp 400
R10(config-router)#maximum-paths 2

The output of the show ip protocols command confirms that the maximum paths setting has taken effect:

R10#show ip protocols | s bgp
 
Routing Protocol is "bgp 400"
  Outgoing update filter list for all interfaces is not set
  Incoming update filter list for all interfaces is not set
  IGP synchronization is disabled
  Automatic route summarization is disabled
  Neighbor(s):
    Address          FiltIn FiltOut DistIn DistOut Weight RouteMap
    9.9.9.9
    11.11.11.11
    12.12.12.12
    200.2.10.2
    200.3.10.3
    200.10.17.17
  Maximum path: 2
  Routing Information Sources:
    Gateway         Distance      Last Update
    12.12.12.12          200      00:29:17
    200.10.17.17          20      00:29:16
    200.3.10.3            20      00:00:54
    200.2.10.2            20      00:20:03
  Distance: external 20 internal 200 local 200

With the new maximum path setting in effect, R10’s BGP table shows the following:

R10#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 122
Paths: (4 available, best #3, table default)
Multipath: eBGP
  Advertised to update-groups:
     1          2
  Refresh Epoch 2
  300
    11.11.11.11 (metric 11) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300
    200.3.10.3 from 200.3.10.3 (3.3.3.3)
      Origin IGP, localpref 100, valid, external, multipath(oldest)
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  300
    200.2.10.2 from 200.2.10.2 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, multipath, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200 300
    200.10.17.17 from 200.10.17.17 (17.17.17.17)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

Now R10 marks the external paths received from R3 and R2 as multipath, with R2’s path as best. The multipath designation indicates that R10 has sent both paths to the RIB for multipath installation consideration. This can be seen in R10’s routing table for the same prefix:

R10#show ip route 130.7.1.1
 
Routing entry for 130.7.1.1/32
  Known via "bgp 400", distance 20, metric 0
  Tag 300, type external
  Last update from 200.2.10.2 00:00:11 ago
  Routing Descriptor Blocks:
  * 200.3.10.3, from 200.3.10.3, 00:00:11 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 300
      MPLS label: none
    200.2.10.2, from 200.2.10.2, 00:00:11 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 300
      MPLS label: none

It is important to understand that even though BGP sends both paths to the RIB to be installed, BGP still advertises only one path, its best path, to its other BGP neighbors. In this example, R10 has selected R2’s path as its best path. The output of the show ip bgp neighbor 12.12.12.12 advertised-routes command shows that this is the same path R10 advertises to R12, its iBGP neighbor:

R10#show ip bgp neighbor 12.12.12.12 advertised-routes
 
BGP table version is 27, local router ID is 10.10.10.10
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
 
     Network          Next Hop           Metric LocPrf Weight Path
 *>   110.19.1.1/32    200.10.17.17                       0 300 100 i
 *>   110.19.2.1/32    200.10.17.17                       0 300 100 i
 *>   120.18.1.1/32    200.10.17.17                       0 200 i
 *>   120.18.2.1/32    200.10.17.17                       0 200 i
 *>   130.7.1.1/32     200.2.10.2                         0 300 i

This behavior resembles IGP behavior, specifically that of distance vector protocols. Distance vector IGPs advertise only a single route to reach a destination prefix, even if they store multiple routes in their topology tables and install them in the RIB. Similarly, BGP advertises a single best path while installing multiple equal-cost paths.

External and Internal Paths

The example above shows a configuration where R10 is allowed to install its additional external equal-cost path into the RIB. However, if you look again at R10’s show ip bgp 130.7.1.1 output, you see that there is another path that could potentially be installed in the routing table:

R10#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 27
Paths: (4 available, best #3, table default)
Multipath: eBGP
  Advertised to update-groups:
     2          3
  Refresh Epoch 2
  300
    11.11.11.11 (metric 11) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300
    200.3.10.3 from 200.3.10.3 (3.3.3.3)
      Origin IGP, localpref 100, valid, external, multipath(oldest)
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  300
    200.2.10.2 from 200.2.10.2 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, multipath, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200 300
    200.10.17.17 from 200.10.17.17 (17.17.17.17)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

The path received from R11 ties with the paths received from R2 and R3 except for one thing: The path received from R11 is an internal path, while the current best path received from R2 is an external path. With the maximum-paths 2 command, only external paths are considered for multipathing.

If you wanted to include any potential equal path, regardless of whether it was an internal or external path, you would use the maximum-paths eibgp command. In this case, the router will designate all paths as multipath candidates if the following attributes are the same as for its current best path:

  • WEIGHT

  • LOCAL_PREF

  • AS_PATH length

  • ORIGIN

  • MED

  • AS_PATH SEQUENCE

Basically, the requirements are that steps 1–6 should all result in a tie when compared to the current best path. To configure this command, the maximum-paths 2 command should be removed from the configuration on R10 and replaced with the maximum-paths eibgp 3 command. Note that the number increases from 2 to 3. This ensures that three total paths (including the best path) can be marked for multipathing. This configuration is shown below:

R10 after maximum-paths eibgp 3:

R10(config)#router bgp 400
R10(config-router)#no maximum-paths 2
R10(config-router)#maximum-paths eibgp 3
R10#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 144
Paths: (4 available, best #3, table default)
Multipath: eiBGP
  Advertised to update-groups:
     1          2
  Refresh Epoch 2
  300
    11.11.11.11 (metric 11) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300
    200.3.10.3 from 200.3.10.3 (3.3.3.3)
      Origin IGP, localpref 100, valid, external, multipath(oldest)
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  300
    200.2.10.2 from 200.2.10.2 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, multipath, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200 300
    200.10.17.17 from 200.10.17.17 (17.17.17.17)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

After you configure this command, the internal path from R11 and the external paths from R2 and R3 are all designated as multipath in the BGP table, as shown above. All three paths are installed in the routing table, as shown in the output below:

R10#show ip route 130.7.1.1
 
Routing entry for 130.7.1.1/32
  Known via "bgp 400", distance 20, metric 0
  Tag 300, type external
  Last update from 200.2.10.2 00:00:55 ago
  Routing Descriptor Blocks:
  * 200.3.10.3, from 200.3.10.3, 00:00:55 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 300
      MPLS label: none
    200.2.10.2, from 200.2.10.2, 00:00:55 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 300
      MPLS label: none
    11.11.11.11, from 11.11.11.11, 00:00:55 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 300
      MPLS label: none

The maximum-paths eibgp command includes equal internal and external paths in the calculation, as shown above. Even though the multipath configuration includes both internal and external paths, there is still only one best path. In this example, the best path is the path received from R2; the other paths are selected as multipaths because their attributes are equal to those of the path R10 received from R2.

Note

When entering the maximum-paths eibgp command, the following output can be observed:

R10(config-router)#maximum-paths eibgp 3
 
%BGP: This may cause traffic loop if not used properly (command accepted)
R10(config-router)#
*Jun 28 14:35:08.029: %BGP-4-MULTIPATH_LOOP: This may cause traffic loop if not used properly (command accepted).

This message serves as a warning when you use the eiBGP multipath feature. In certain situations, such as when an iBGP path has inconsistent next hops that lead back to the calculating router, loops may form when using this form of multipathing.

Note

Even though the AS_PATH sequence normally must match exactly for multipath decision making, the hidden command bgp bestpath as-path multipath-relax removes this requirement. When it is configured, the router considers all paths with the same AS_PATH length, not necessarily the same sequence, as potential multipath candidates.

Without this command, the AS_PATH SEQUENCE of the multipath candidate must equal the AS_PATH SEQUENCE of the best path.
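The difference between the default check and the relaxed check can be shown with a small Python model. The function name `as_path_ok` and the `relax` flag are hypothetical stand-ins for the router's behavior, and AS 500 is an illustrative AS number not taken from the lab.

```python
def as_path_ok(best_as_path: list, cand_as_path: list, relax: bool = False) -> bool:
    """Model of the AS_PATH part of the multipath check.

    Default: the AS_PATH sequence must match the best path exactly.
    relax=True models 'bgp bestpath as-path multipath-relax': only the
    AS_PATH length must match.
    """
    if relax:
        return len(cand_as_path) == len(best_as_path)
    return cand_as_path == best_as_path

print(as_path_ok([300], [500]))                    # False: different sequence
print(as_path_ok([300], [500], relax=True))        # True: same length
print(as_path_ok([300], [200, 300], relax=True))   # False: different length
```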

Internal Paths

Finally, the maximum-paths ibgp command can be used to have the router select multipaths only among internally learned paths. When selecting multipaths among internal paths, the following additional conditions must hold between the candidate path and the current best path:

  • Both paths must be learned from an internal neighbor.

  • The IGP metric to the BGP next hop must be equal to that of the best path.
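The two extra iBGP conditions can be sketched as a Python check layered on top of the steps 1–6 tie, which is assumed to have passed already and is not repeated here. The `IbgpPath` class and `ibgp_multipath_ok` function are hypothetical; the metric value 11 is the "(metric 11)" shown in R12's output in this section, while the metric-21 and external paths in the test are illustrative.

```python
from dataclasses import dataclass

@dataclass
class IbgpPath:
    next_hop: str
    internal: bool      # learned from an internal (iBGP) neighbor
    igp_metric: int     # IGP metric to the BGP next hop

def ibgp_multipath_ok(best: IbgpPath, cand: IbgpPath) -> bool:
    """Extra checks for maximum-paths ibgp (steps 1-6 tie assumed)."""
    both_internal = best.internal and cand.internal
    return both_internal and cand.igp_metric == best.igp_metric

via_r10 = IbgpPath("10.10.10.10", internal=True, igp_metric=11)
via_r11 = IbgpPath("11.11.11.11", internal=True, igp_metric=11)
print(ibgp_multipath_ok(via_r10, via_r11))   # True
```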

To demonstrate this feature, R12’s BGP table for the 130.7.1.1 prefix is shown below:

On R12:

R12#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 112
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 1
  300, (Received from a RR-client)
    10.10.10.10 (metric 11) from 10.10.10.10 (10.10.10.10)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  300, (Received from a RR-client)
    11.11.11.11 (metric 11) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

Here, R12 receives two iBGP paths to reach the prefix. It has marked the path from R10 as the best path because of R10's lower BGP RID. For the path through R11, however, all of the attributes required for multipath are equal to those of R12's current best path through R10.

The maximum-paths ibgp 2 command is configured under BGP router configuration mode on R12 to allow R12 to install the extra path through R11 as multipath in the RIB. The results are shown in the output below. R12 designates both internal paths as multipaths, with the path received from R10 as the best path.

On R12:

R12(config)#router bgp 400
R12(config-router)#maximum-paths ibgp 2
 
R12#show ip bgp 130.7.1.1
 
BGP routing table entry for 130.7.1.1/32, version 120
Paths: (2 available, best #1, table default)
Multipath: iBGP
  Advertised to update-groups:
     1
  Refresh Epoch 1
  300, (Received from a RR-client)
    10.10.10.10 (metric 11) from 10.10.10.10 (10.10.10.10)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  300, (Received from a RR-client)
    11.11.11.11 (metric 11) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath(oldest)
      rx pathid: 0, tx pathid: 0

As you have seen in this section, the BGP multipath settings allow BGP to act like a normal IGP and install multiple paths into the routing table if certain attributes are equal to its current best path. When configuring BGP multipathing, the following should be considered:

  • Potential multipath paths are compared to the current BGP best path.

  • It is not possible to configure maximum-paths or maximum-paths ibgp along with maximum-paths eibgp.

  • It is possible to configure maximum-paths along with maximum-paths ibgp.

The first point is just another friendly reminder that the multipath comparison is made by comparing the path attributes of the additional path to the current BGP best path. If multipathing isn’t working, start by verifying that the proper set of attributes matches between the two paths.

The second point means that the administrator must choose between two approaches: multipathing across internal and external paths together (maximum-paths eibgp), or multipathing enabled separately for external paths (maximum-paths) and internal paths (maximum-paths ibgp). Because the maximum-paths eibgp command already covers what the other two commands do, combining it with either of them would be conflicting configuration, so it is not allowed.

On the other hand, the third point notes that it is possible to configure the maximum-paths and maximum-paths ibgp commands together. This use is not detailed in this document.
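The configuration rules above can be expressed as a small Python validator. This is only an illustration of the rules as stated in this section: `check_maximum_paths` is a hypothetical function, and its exception text is not the message IOS prints.

```python
def check_maximum_paths(commands: set) -> None:
    """Reject the multipath command combinations described above.

    `commands` holds the configured multipath command names as plain strings.
    """
    others = {"maximum-paths", "maximum-paths ibgp"}
    if "maximum-paths eibgp" in commands and commands & others:
        raise ValueError(
            "maximum-paths eibgp cannot be combined with "
            "maximum-paths or maximum-paths ibgp"
        )

# Allowed: external and internal multipath configured separately.
check_maximum_paths({"maximum-paths", "maximum-paths ibgp"})
# Allowed: eiBGP multipath on its own.
check_maximum_paths({"maximum-paths eibgp"})
```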

Step 10: Oldest Route

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

The next steps of the BGP best-path algorithm introduce a series of tie-breaker conditions that are designed to help deterministically choose a best path from two paths that are virtually identical. At this point, the two paths under consideration have similar attributes and are either both external or both internal paths. Step 10 of the best-path algorithm specifically deals with stability between external peers.

Put simply, at this step BGP prefers an external path that has already been selected as best over any competing external path with the same attributes. This prevents BGP from introducing unnecessary route flaps toward its external peers by switching to, and advertising, a new path that is better than the current best path only on criteria evaluated after this step.

The show ip bgp output lists paths in reverse order from when they were received. That is, paths at the bottom of the list were received first, and paths at the top of the list were received last.

This step only applies to external paths and is skipped if any of the following are true:

  • The router ID is the same for multiple paths, indicating that the paths were learned from the same router.

  • There is currently no best path, indicating that the current best path was lost or has never been selected.

  • The bgp bestpath compare-routerid command has been enabled. This is explained in step 11 of the best-path algorithm.
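The step 10 logic and its skip conditions can be sketched in Python. The `ExtPath` class and `step10` function are hypothetical; the function returns the winning path or `None` when the step is skipped and processing moves on. Returning `None` when neither path is the current best is a simplifying assumption of this sketch. The sample paths use R9's and R10's addresses and router IDs from R17's output in this section.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtPath:
    neighbor: str
    router_id: str
    external: bool = True

def step10(a: ExtPath, b: ExtPath, current_best: Optional[ExtPath],
           compare_routerid: bool = False) -> Optional[ExtPath]:
    """Return the path that wins at step 10, or None if the step is skipped."""
    if not (a.external and b.external):
        return None   # step 10 applies only to external paths
    if a.router_id == b.router_id:
        return None   # both paths came from the same router
    if current_best is None:
        return None   # no current best path (lost or never selected)
    if compare_routerid:
        return None   # bgp bestpath compare-routerid: go on to step 11
    if current_best is a or current_best is b:
        return current_best   # keep the existing (oldest) best path
    return None       # simplification: neither path is the current best

r9 = ExtPath("200.9.17.9", "9.9.9.9")
r10 = ExtPath("200.10.17.10", "10.10.10.10")
print(step10(r9, r10, current_best=r10).neighbor)                # 200.10.17.10
print(step10(r9, r10, current_best=r10, compare_routerid=True))  # None
```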

Because this section is heavily dependent on a specific order of operations, the following has been applied to the topology in order to achieve consistent results:

  • The R17/R10 peering is shut down.

  • The R17/R10 peering is brought up.

To demonstrate how this step functions, examine the output of show ip bgp 140.15.1.1 on R17:

On R17:

R17(config)#router bgp 200
R17(config-router)#neighbor 200.10.17.10 shutdown
 
%BGP-5-NBR_RESET: Neighbor 200.10.17.10 reset (Admin. shutdown)
 
%BGP-5-ADJCHANGE: neighbor 200.10.17.10 Down Admin. shutdown
 
%BGP_SESSION-5-ADJCHANGE: neighbor 200.10.17.10 IPv4 Unicast topology base removed from session  Admin. shutdown
 
R17(config-router)#no neighbor 200.10.17.10 shutdown

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 200.10.17.10 Up
 
 
R17#show ip bgp 140.15.1.1
 
BGP routing table entry for 140.15.1.1/32, version 28
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     1          2
  Refresh Epoch 1
  400
    200.10.17.10 from 200.10.17.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  400
    200.9.17.9 from 200.9.17.9 (9.9.9.9)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

R17 receives two paths for the 140.15.1.1/32 prefix: one from R9 and one from R10. Between the two competing paths, the WEIGHT, LOCAL_PREF, AS_PATH, ORIGIN, and MED are all tied, and both paths are external. The deciding factor is that the path from R9 was received first, as indicated by its position as the last path listed in the output. Since R9's path is the oldest route, BGP prefers it.

To prove this point, the peering between R17 and R9 is shut down and brought back up. R17 removes all prefixes learned from R9 when the peering goes down. At this point, the only path R17 has is from R10, which it now considers best. When the peering to R9 is restored, R9 advertises its path back to R17, and R17 runs the best-path algorithm for the two paths again. As a result, R17 retains its current best path via R10, which is now the oldest path.

R17(config)#router bgp 200
R17(config-router)#neighbor 200.9.17.9 shut

You should see the following console messages:

%BGP-5-NBR_RESET: Neighbor 200.9.17.9 reset (Admin. shutdown)
%BGP-5-ADJCHANGE: neighbor 200.9.17.9 Down Admin. shutdown
%BGP_SESSION-5-ADJCHANGE: neighbor 200.9.17.9 IPv4 Unicast topology base removed from session  Admin. shutdown
 
R17(config-router)#no neighbor 200.9.17.9 shut

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 200.9.17.9 Up
 
R17#show ip bgp 140.15.1.1
 
BGP routing table entry for 140.15.1.1/32, version 20
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     1          2
  Refresh Epoch 1
  400
    200.9.17.9 from 200.9.17.9 (9.9.9.9)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  400
    200.10.17.10 from 200.10.17.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

A key aspect of this step is what happens when the best path is not also the oldest path. In the case above, R10's path was both the best path and the oldest path, which made the basic behavior easy to see. To fully grasp the functionality, however, consider a case where the best path isn't the oldest path. The path to the prefix 120.18.1.1/32 on R2 demonstrates this.

R2 has three paths to reach this prefix. Of those three paths, R2 chooses the path through R16 as its best path because it has the shortest AS_PATH. Before looking at the output on R2, reset R2's peerings as follows:

  1. The R2/R20 and R2/R16 peerings are shut down.

  2. The R2/R20 peering is brought up.

  3. The R2/R16 peering is brought up.

Now, the output of show ip bgp 120.18.1.1 on R2 should appear in the following order:

On R2:

R2#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 20
Paths: (3 available, best #1, table default)
  Advertised to update-groups:
     8          9          10
  Refresh Epoch 1
  200
    200.2.16.16 from 200.2.16.16 (16.16.16.16)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  100 200
    200.2.20.20 from 200.2.20.20 (20.20.20.20)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  400 200
    200.2.10.10 from 200.2.10.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

In the output above, R16's path is chosen as the best path because its AS_PATH is shorter than those of the paths received from R10 and R20. This determination is made in step 4 of the best-path algorithm. If step 4 (AS_PATH length) were skipped, processing would eventually fall to step 10, where BGP prefers the oldest route.

This scenario can be tested using the bgp bestpath as-path ignore command on R2. After making this change, R2 ignores step 4 of the best-path algorithm when evaluating best paths. In this case, the expectation is that R10’s path would be chosen because it was received before the path received from R16. However, R2 still chooses R16, as shown below:

R2(config)#router bgp 312
R2(config-router)#bgp bestpath as-path ignore
 
R2#show ip bgp 120.18.1.1
BGP routing table entry for 120.18.1.1/32, version 20
Paths: (3 available, best #1, table default)
  Advertised to update-groups:
     8          9          10
  Refresh Epoch 1
  200
    200.2.16.16 from 200.2.16.16 (16.16.16.16)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  100 200
    200.2.20.20 from 200.2.20.20 (20.20.20.20)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  400 200
    200.2.10.10 from 200.2.10.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

R2 chooses R16 again because BGP has determined that the three paths are all equal. Since R2 had already marked the path through R16 as best, step 10 of the best-path algorithm prevents R2 from swapping the best path to R10. For that to happen, R2 would first have to lose R16's path from its BGP table, prompting a new best-path evaluation, as demonstrated below:

! Shutting down the peering between R2 and R16
 
R2(config)#router bgp 312
R2(config-router)#neighbor 200.2.16.16 shutdown
 
Neighbor 200.2.16.16 reset (Admin. shutdown)
neighbor 200.2.16.16 Down Admin. Shutdown
neighbor 200.2.16.16 IPv4 Unicast topology base removed from session Admin. Shutdown
 
! Bringing back the peering between R2 and R16
 
R2(config-router)#no neighbor 200.2.16.16 shutdown
%BGP-5-ADJCHANGE: neighbor 200.2.16.16 Up
 
 
R2#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 27
Paths: (4 available, best #4, table default)
  Advertised to update-groups:
     1          2          3
  Refresh Epoch 1
  200
    200.2.16.16 from 200.2.16.16 (16.16.16.16)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  (336) 400 200
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, confed-external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  100 200
    200.2.20.20 from 200.2.20.20 (20.20.20.20)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  400 200
    200.2.10.10 from 200.2.10.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

Here, R2 shuts down its peering to R16, which causes it to lose its best path. R2 then decides between the paths from R10 and R20 that remain in its BGP table. The path from R10 is chosen, but this time not because it is the older route; it wins at step 11, covered below, because R10's router ID (10.10.10.10) is lower than R20's (20.20.20.20).

R2 skips the oldest-path comparison because, once it loses its best path through R16, there is no current best path in its table. As covered above, if there is no current best path, BGP does not evaluate the oldest route.

During convergence, R2 also receives a path from R3. R10's path is preferred over R3's because R10's path is an external path, while R3's path is a confederation-external path. Finally, the R16 peering comes up again, and R2 receives R16's path again. Because R2 is still ignoring the AS_PATH, it retains R10's path, which is now both its current best path and its oldest path, as best.

Step 10 of the best-path algorithm can therefore be summarized as follows: if there is already a current best path, keep using that best path when all other attributes are equal. It isn't necessarily the oldest path that is selected; rather, the oldest best path is retained.
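The sequence of events on R2 can be replayed with a toy Python model of steps 10 and 11. It is a sketch under stated assumptions, not the IOS implementation: `OldestBestModel` and `rid_key` are hypothetical names, every path is assumed to tie through the earlier steps (as with bgp bestpath as-path ignore), each path is identified only by its router ID, and R3's confederation-external path is left out.

```python
import ipaddress

def rid_key(router_id: str) -> int:
    # Router IDs compare as 32-bit numbers: 9.9.9.9 < 10.10.10.10 < 20.20.20.20.
    return int(ipaddress.IPv4Address(router_id))

class OldestBestModel:
    """Toy model of steps 10 and 11 for external paths that tie on every
    earlier step. Each path is identified only by the advertiser's router ID."""

    def __init__(self, paths, best):
        self.paths = set(paths)
        self.best = best

    def select(self):
        if self.best in self.paths:
            return  # step 10: a current best path exists, so keep it
        # No current best path: step 10 is skipped, step 11 picks the lowest RID.
        self.best = min(self.paths, key=rid_key) if self.paths else None

    def withdraw(self, router_id):
        self.paths.discard(router_id)
        self.select()

    def receive(self, router_id):
        self.paths.add(router_id)
        self.select()

# R2 right after "bgp bestpath as-path ignore": R16's path is still best.
r2 = OldestBestModel({"10.10.10.10", "20.20.20.20", "16.16.16.16"},
                     best="16.16.16.16")
r2.select()
print(r2.best)   # 16.16.16.16 (kept even though R10 has a lower router ID)
r2.withdraw("16.16.16.16")
print(r2.best)   # 10.10.10.10 (no current best, so step 11 decides)
r2.receive("16.16.16.16")
print(r2.best)   # 10.10.10.10 (step 10 keeps the current best)
```

Comparing router IDs as integers rather than strings matters: as strings, "9.9.9.9" would sort after "10.10.10.10".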

Note

If the bgp bestpath as-path ignore command were removed from the configuration and a soft inbound reset were performed, R2 would again select R16 as its best path, because step 4 (AS_PATH length) is evaluated before step 10. This is shown in the output below:

On R2:

R2(config)#router bgp 312
R2(config-router)#no bgp bestpath as-path ignore
R2#clear ip bgp * soft in
R2#show ip bgp 120.18.1.1
BGP routing table entry for 120.18.1.1/32, version 31
Paths: (3 available, best #1, table default)
  Advertised to update-groups:
     1          2          3
  Refresh Epoch 2
  200
    200.2.16.16 from 200.2.16.16 (16.16.16.16)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 3
  100 200
    200.2.20.20 from 200.2.20.20 (20.20.20.20)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 3
  400 200
    200.2.10.10 from 200.2.10.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

Step 11: Lowest Router ID

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

Step 11 in the best-path algorithm considers the router ID assigned to the BGP router. The BGP router ID is a 32-bit value, written in dotted-decimal notation like an IPv4 address, that is automatically assigned to a router as follows:

  1. If a loopback address exists, the highest IP address of a loopback on the router is used as the BGP router ID.

  2. Otherwise, the highest IP address assigned to a non-shut physical interface is used as the BGP router ID.
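The two selection rules above can be sketched in Python. `pick_router_id` and its `(name, address, is_up)` input format are hypothetical, and the interface names and the 192.0.2.1 and 198.51.100.1 documentation addresses are illustrative rather than taken from the lab. Addresses are compared numerically via the standard `ipaddress` module.

```python
import ipaddress

def pick_router_id(interfaces):
    """Pick a BGP router ID from (name, address, is_up) tuples.

    Rule 1: the highest loopback address wins.
    Rule 2: otherwise, the highest address on an interface that is up.
    Addresses are compared as 32-bit numbers, not as strings.
    """
    def as_int(addr):
        return int(ipaddress.IPv4Address(addr))

    loopbacks = [addr for name, addr, _ in interfaces
                 if name.lower().startswith("loopback")]
    if loopbacks:
        return max(loopbacks, key=as_int)
    active = [addr for _, addr, up in interfaces if up]
    return max(active, key=as_int) if active else None

# A loopback beats a numerically higher physical address.
print(pick_router_id([("Loopback0", "9.9.9.9", True),
                      ("Ethernet0/0", "200.9.17.9", True)]))    # 9.9.9.9
# No loopback: the highest address on an up interface is used.
print(pick_router_id([("Ethernet0/0", "192.0.2.1", True),
                      ("Ethernet0/1", "198.51.100.1", False)]))  # 192.0.2.1
```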

The administrator is also given the option to manually set the BGP router ID for better control over its value. This is accomplished using the bgp router-id command in BGP router configuration mode.

At this step, BGP prefers the path that was learned from the router with the lower BGP router ID. The preference itself is largely arbitrary, but it comes with some interesting caveats. First, to demonstrate the functionality, R12's 120.18.1.1/32 prefix is examined below:

On R12:

R12#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 74
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 1
  200, (Received from a RR-client)
    9.9.9.9 (metric 11) from 9.9.9.9 (9.9.9.9)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  200, (Received from a RR-client)
    10.10.10.10 (metric 11) from 10.10.10.10 (10.10.10.10)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

R12 receives two paths to reach this particular prefix: one from R9 with router ID 9.9.9.9 and one from R10 with router ID 10.10.10.10. R12 chooses the path with the lower RID as its best path. To prove that the lower router ID is the deciding factor, R10's router ID is changed to a value lower than R9's in the following configuration:

On R10:

R10(config)#router bgp 400
R10(config-router)#bgp router-id 1.1.1.10

You should see the following console messages:

%BGP-5-ADJCHANGE: neighbor 9.9.9.9 Down Router ID changed
%BGP_SESSION-5-ADJCHANGE: neighbor 9.9.9.9 IPv4 Unicast topology base removed from session  Router ID changed

Changing the router ID on R10 causes R10 to reset all of its BGP sessions. As a result, R12 now chooses the path through R10 as the best path because of its lower BGP router ID.

On R12:

R12#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 86
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 1
  200, (Received from a RR-client)
    10.10.10.10 (metric 11) from 10.10.10.10 (1.1.1.10)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  200, (Received from a RR-client)
    9.9.9.9 (metric 11) from 9.9.9.9 (9.9.9.9)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

There are two caveats to how this step is processed, both involving paths that carry route reflector attributes. Route reflection is an iBGP scaling mechanism in which a router is designated as a route reflector and statically configured with a set of route reflector clients. When advertising routes to its clients, the route reflector relaxes the iBGP split-horizon rule and advertises internally learned paths to each of its clients.

When the route reflector reflects a path, it adds attributes to the path in the UPDATE messages sent to its clients. The two most important attributes are Originator ID and Cluster List. The Originator ID records which BGP router originally advertised the path to the route reflector, and it is important in preventing loops: if a client receives a path with its own BGP router ID in the Originator ID attribute, it rejects the path. Both attributes can be seen in the following capture of the UPDATE packet R12 sends to R11:

Internet Protocol Version 4, Src: 12.12.12.12, Dst: 11.11.11.11
Transmission Control Protocol, Src Port: 36963, Dst Port: 179, Seq: 115, Ack: 218, Len: 242
Border Gateway Protocol - UPDATE Message
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 81
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 48
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: 200
        Path Attribute - NEXT_HOP: 10.10.10.10
        Path Attribute - MULTI_EXIT_DISC: 0
        Path Attribute - LOCAL_PREF: 100
        Path Attribute - CLUSTER_LIST: 12.12.12.12
        Path Attribute - ORIGINATOR_ID: 1.1.1.10
    Network Layer Reachability Information (NLRI)
        120.18.1.1/32
        120.18.2.1/32

In the UPDATE message above, R12, having selected the path through R10 with new RID 1.1.1.10 as the best path, reflects that path in its BGP UPDATE message to R11. Within the UPDATE message, R12 has added the Cluster List and Originator ID attributes. In this case, Originator ID is given the value of R10’s RID 1.1.1.10 because R10 was the originator of the path. Cluster List, on the other hand, lists R12’s RID because R12 is a route reflector participating in the route reflector cluster 12.12.12.12.

The Cluster List attribute records the route reflector clusters the path has traversed. Multiple route reflectors may coexist in larger iBGP environments, and route reflectors identify themselves using a cluster ID. The cluster ID designates which route reflectors service the same clients and helps prevent loops within the route reflector environment. When a route reflector reflects a path, it adds its own cluster ID to the Cluster List attribute. Route reflectors do not accept paths whose Cluster List already contains the local cluster ID.
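The two loop checks can be sketched in Python. `accept_reflected` is a hypothetical function, not IOS code; `local_cluster_id` is passed only when the receiving router is itself a route reflector. The examples use values from this section: the Originator ID 1.1.1.10 from the capture above, and the Cluster List 13.13.13.13, 12.12.12.12 that R14 shows later for the path via R13.

```python
from typing import Optional

def accept_reflected(local_router_id: str, originator_id: str,
                     cluster_list: list,
                     local_cluster_id: Optional[str] = None) -> bool:
    """Loop checks for a reflected path (illustrative model only)."""
    if originator_id == local_router_id:
        return False  # the path originated here
    if local_cluster_id is not None and local_cluster_id in cluster_list:
        return False  # the path already passed through this cluster
    return True

# R10 (RID 1.1.1.10) receiving its own path back: rejected.
print(accept_reflected("1.1.1.10", "1.1.1.10", ["12.12.12.12"]))        # False
# R11 receiving the path from the capture: accepted.
print(accept_reflected("11.11.11.11", "1.1.1.10", ["12.12.12.12"]))     # True
# R12 (cluster 12.12.12.12) seeing its own cluster ID in the list: rejected.
print(accept_reflected("12.12.12.12", "9.9.9.9",
                       ["13.13.13.13", "12.12.12.12"],
                       local_cluster_id="12.12.12.12"))                  # False
```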

When processing step 11 of the best-path algorithm, if a path carries route reflector attributes, its Originator ID is used instead of the router ID as the comparison value. The 120.18.1.1 prefix on R11 illustrates how this comparison works. Before examining the BGP table, the competing paths from R10 and R12 must first tie in all steps through step 10. Without modification, R11 chooses the path through R10 because of its lower metric to the next hop. To engineer the tie, the ip ospf cost 20 command is issued on the VLAN 1011 interface on R11 to even out the metrics. In addition, R10's BGP RID is set back to 10.10.10.10:

On R10:

R10(config)#router bgp 400
R10(config-router)#bgp router-id 10.10.10.10
 

On R11:

R11(config)#interface e0/0.1011
R11(config-subif)#ip ospf cost 20
 
R11#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 24
Paths: (3 available, best #2, table default)
  Advertised to update-groups:
     1          3
  Refresh Epoch 1
  300 200
    200.6.11.6 from 200.6.11.6 (6.6.6.6)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200
    9.9.9.9 (metric 21) from 12.12.12.12 (12.12.12.12)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 9.9.9.9, Cluster list: 12.12.12.12
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  200
    10.10.10.10 (metric 21) from 10.10.10.10 (10.10.10.10)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

Now that the metrics tie, R11 is left to use the lower router ID to determine the best path. At first glance, R11 appears to have incorrectly chosen R12's path with RID 12.12.12.12 rather than R10's path with router ID 10.10.10.10. This is not the case. The path received from R12 has route reflector attributes, so instead of comparing R12's router ID 12.12.12.12 with R10's RID 10.10.10.10, R11 compares the path's Originator ID, 9.9.9.9, with 10.10.10.10. Because 9.9.9.9 is lower than 10.10.10.10, R11 chooses R12's path over R10's, as expected.

This decision-making process is outlined in Section 9 of RFC 4456:

9. Impact on Route Selection

    The BGP Decision Process Tie Breaking rules (Sect. 9.1.2.2, [1]) are
    modified as follows:

        If a route carries the ORIGINATOR_ID attribute, then in Step f)
        the ORIGINATOR_ID SHOULD be treated as the BGP Identifier of the
        BGP speaker that has advertised the route.

The reason for this preference relates to the purpose of the route reflector. The route reflector acts much like a route server: its primary job is to advertise paths on behalf of other internal peers. As a result, it doesn't necessarily have to be in the data plane and can act purely as a control plane device. The router ID associated with a reflected path therefore identifies the route reflector, not the router that originated the path. This step of the BGP best-path algorithm gives preference based on the originator of a path rather than the route reflector itself, so the Originator ID is used instead of the router ID for paths with route reflector attributes.

The comparison above was between an Originator ID and a router ID. When both paths are learned from route reflectors, meaning both carry route reflector attributes, the Originator IDs of both paths are compared. R14 serves as an example. It learns a path to the 120.18.1.1/32 network from R11 and from R13, both of which are route reflectors serving R14. To even out the metrics to the next hops of the two paths, the OSPF cost on the VLAN 1114 interface on R14 is set to 20. In addition, the OSPF cost of R11's e0/0.1011 interface is returned to the default using the no ip ospf cost 20 command:

On R11:

R11(config)#interface e0/0.1011
R11(config-subif)#no ip ospf cost 20

On R14:

R14(config)#interface e0/0.1114
R14(config-subif)#ip ospf cost 20
 
R14#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 88
Paths: (2 available, best #2, table default)
  Not advertised to any peer
  Refresh Epoch 1
  200
    10.10.10.10 (metric 31) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 10.10.10.10, Cluster list: 11.11.11.11
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200
    9.9.9.9 (metric 31) from 13.13.13.13 (13.13.13.13)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 9.9.9.9, Cluster list: 13.13.13.13, 12.12.12.12
      rx pathid: 0, tx pathid: 0x0

With the metrics tied, R14 prefers the path with the lower Originator ID, 9.9.9.9 (R9), as shown above, even though that path was received from R13, whose router ID 13.13.13.13 is higher than the router ID of R11, 11.11.11.11.
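The Originator ID substitution from RFC 4456 can be sketched in Python and checked against both cases in this section (R11 and R14). `RrPath`, `step11_key`, and `step11_winner` are hypothetical names; the sketch assumes the paths already tie through step 10, and on a tie `min()` simply returns the first argument, whereas BGP would move on to step 12.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class RrPath:
    label: str
    peer_router_id: str                  # router ID of the peer that sent the path
    originator_id: Optional[str] = None  # present only on reflected paths

def step11_key(p: RrPath) -> int:
    # RFC 4456 Sect. 9: use ORIGINATOR_ID as the BGP Identifier when present.
    return int(ipaddress.IPv4Address(p.originator_id or p.peer_router_id))

def step11_winner(a: RrPath, b: RrPath) -> RrPath:
    return min(a, b, key=step11_key)  # lower value wins

# R11: reflected path via R12 vs. the direct path from R10.
r11_choice = step11_winner(RrPath("via R12", "12.12.12.12", "9.9.9.9"),
                           RrPath("via R10", "10.10.10.10"))
print(r11_choice.label)   # via R12
# R14: both paths reflected (via R11 and via R13).
r14_choice = step11_winner(RrPath("via R11", "11.11.11.11", "10.10.10.10"),
                           RrPath("via R13", "13.13.13.13", "9.9.9.9"))
print(r14_choice.label)   # via R13
```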

Finally, a special note for this step concerns whether it is used when comparing external paths. When evaluating external paths, processing typically stops at step 10, where BGP prefers the oldest received path. Because the outcome then depends on the order in which paths were received, this can lead to unpredictable behavior, especially during failure scenarios. The bgp bestpath compare-routerid command configures the router to bypass the oldest-path check entirely and to always compare the BGP router IDs of two external paths.

As proof of this concept, R17’s 140.15.1.1/32 prefix is examined. To begin, R17’s peerings to R10 and R9 are shut down. Then, the bgp bestpath compare-routerid command is issued on R17. After that, R17’s peering to R10 is brought back up.

In this state, R10 advertises the 140.15.1.1/32 prefix to R17. R17 is allowed to mark R10’s path as best:

On R17:

R17#show ip bgp 140.15.1.1

BGP routing table entry for 140.15.1.1/32, version 24
BGP Bestpath: compare-routerid
Paths: (1 available, best #1, table default)
Flag: 0x820
  Advertised to update-groups: (Pending Update Generation)
     2          3
  Refresh Epoch 2
  400
    200.10.17.10 from 200.10.17.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

At this point, under normal circumstances, when the R9 peering is brought back up, R17 would not switch to R9’s path as the best path. The R9 peering is brought back up, and the following is recorded in the BGP table on R17:

R17(config-router)#no neighbor 200.9.17.9 shutdown
 
R17#show ip bgp 140.15.1.1
 
BGP routing table entry for 140.15.1.1/32, version 26
BGP Bestpath: compare-routerid
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     2          3
  Refresh Epoch 1
  400
    200.9.17.9 from 200.9.17.9 (9.9.9.9)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  400
    200.10.17.10 from 200.10.17.10 (10.10.10.10)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

R17 marks the path received from R9 as its best path over the path received from R10 due to the bgp bestpath compare-routerid command. R17 is forced to always compare the router ID between two external paths, regardless of whether it has a previously learned best path. This feature has the potential to introduce some instability in the BGP process, particularly if R17’s peering to R9 flaps. To prevent such instability, BGP features such as route dampening should be employed. This method is not expounded upon in this lab.

Step 12: Minimum Cluster List Length

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

This next step in the BGP best-path algorithm provides another tie-breaker for paths with route reflector attributes. In step 11, the Originator IDs of the two paths were compared, with preference given to the lower value. If the two Originator IDs tie, this step considers the length of the Cluster List attribute. Because the Cluster List records the route reflector clusters that a path has traversed, its length shows how many clusters the path passed through between its originator and the local router. At this step, BGP prefers the path with the shorter Cluster List.
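Step 12 can be sketched as a Python comparison of Cluster List lengths. `ReflectedPath` and `step12_winner` are hypothetical names, and the function assumes the two paths already tie through step 11 (same Originator ID). The example assumes both of R14's paths carry Originator ID 1.10.10.10 while keeping the Cluster Lists shown in R14's output above; that combination is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ReflectedPath:
    label: str
    originator_id: str
    cluster_list: list = field(default_factory=list)

def step12_winner(a: ReflectedPath, b: ReflectedPath) -> Optional[ReflectedPath]:
    """Prefer the shorter CLUSTER_LIST; None if the lengths also tie."""
    if len(a.cluster_list) == len(b.cluster_list):
        return None  # still tied: later tie-breakers decide
    return a if len(a.cluster_list) < len(b.cluster_list) else b

via_r11 = ReflectedPath("via R11", "1.10.10.10", ["11.11.11.11"])
via_r13 = ReflectedPath("via R13", "1.10.10.10", ["13.13.13.13", "12.12.12.12"])
print(step12_winner(via_r11, via_r13).label)   # via R11
```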

As an example, we can examine R14’s BGP table for the 120.18.1.1/32 prefix:

On R14:

R14#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 62
Paths: (2 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  200
    10.10.10.10 (metric 21) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 10.10.10.10, Cluster list: 11.11.11.11
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200
    9.9.9.9 (metric 31) from 13.13.13.13 (13.13.13.13)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 9.9.9.9, Cluster list: 13.13.13.13, 12.12.12.12
      rx pathid: 0, tx pathid: 0

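As it stands now, R14 has two paths for the 120.18.1.1 prefix: one from the route reflector R11 and the other from the route reflector R13. Without intervention, the originator IDs differ (10.10.10.10 and 9.9.9.9), and R14 selects the path from R11 because it has the lower IGP metric to the next hop (21 versus 31). To test the impact of this step, the two paths must tie on every earlier attribute, including the originator ID.

To engineer this tie, R12 needs to prefer the path it learns from R10 over the path it learns from R9. At R12, both paths tie up to the router ID comparison (each has an IGP metric of 11), so lowering R10’s router ID to 1.10.10.10, below R9’s 9.9.9.9, accomplishes this goal: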

On R10:

R10(config)#router bgp 400
R10(config-router)#bgp router-id 1.10.10.10

After you make the changes on R10, R12 properly selects the path from R10 as best and advertises the new path to R13 instead of advertising the path it received from R9:

On R12:

R12#show ip bgp 120.18.1.1
BGP routing table entry for 120.18.1.1/32, version 86
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     1
  Refresh Epoch 1
  200, (Received from a RR-client)
    10.10.10.10 (metric 11) from 10.10.10.10 (1.10.10.10)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 3
  200, (Received from a RR-client)
    9.9.9.9 (metric 11) from 9.9.9.9 (9.9.9.9)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0 

R12 advertises this best path to R13, which in turn advertises it to R14. The results are shown below: R14 receives two paths that both have the originator ID set to 1.10.10.10. Recall that the Originator ID is a BGP attribute added by a route reflector; it carries the router ID of the route’s originator, which is 1.10.10.10 in this case.

On R14:

R14#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 72
Paths: (2 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  200
    10.10.10.10 (metric 21) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 1.10.10.10, Cluster list: 11.11.11.11
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200
    10.10.10.10 (metric 21) from 13.13.13.13 (13.13.13.13)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 1.10.10.10, Cluster list: 13.13.13.13, 12.12.12.12
      rx pathid: 0, tx pathid: 0

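When R14 processes the two paths, they tie on every attribute through the Originator ID: both next hops are 10.10.10.10 with a metric of 21, and both originator IDs are 1.10.10.10. R14 therefore falls to step 12 of the best-path algorithm and compares the Cluster Length of the two paths. The path received from R11 has a Cluster List length of 1, while the path received from R13 has a length of 2. The shorter list wins, so the path received from R11 is selected as best.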
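The comparison R14 makes at this step can be sketched in a few lines of Python. This is an illustrative model, not router code: the ReflectedPath type and the prefer_shorter_cluster_list() helper are hypothetical, and the sketch assumes every earlier step of the best-path algorithm, including the originator ID, has already tied. The attribute values are copied from R14’s output above.

```python
from typing import NamedTuple, Optional, Tuple

# Hypothetical sketch of step 12 only; earlier steps are assumed to have tied.


class ReflectedPath(NamedTuple):
    via: str                       # route reflector that sent the path to R14
    originator_id: str             # ORIGINATOR_ID attribute
    cluster_list: Tuple[str, ...]  # CLUSTER_LIST, most recently added ID first


def prefer_shorter_cluster_list(a: ReflectedPath, b: ReflectedPath) -> Optional[ReflectedPath]:
    """Return the path with the shorter Cluster List, or None if the lengths tie."""
    if len(a.cluster_list) == len(b.cluster_list):
        return None  # still tied; processing continues to step 13
    return a if len(a.cluster_list) < len(b.cluster_list) else b


via_r11 = ReflectedPath("R11", "1.10.10.10", ("11.11.11.11",))
via_r13 = ReflectedPath("R13", "1.10.10.10", ("13.13.13.13", "12.12.12.12"))

best = prefer_shorter_cluster_list(via_r11, via_r13)
print(best.via)  # R11
```

Only the number of cluster IDs matters here; the cluster ID values themselves are not compared at this step.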

Note

You have just seen how two paths that both carry route reflector attributes are compared. But what happens when a router receives one path from a route reflector and another directly from a normal iBGP peer? To answer this question, we can consult R11’s BGP table:

R11#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 18
Paths: (3 available, best #1, table default)
  Advertised to update-groups:
     1          2
  Refresh Epoch 1
  200
    10.10.10.10 (metric 11) from 10.10.10.10 (1.10.10.10)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200
    10.10.10.10 (metric 11) from 12.12.12.12 (12.12.12.12)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 1.10.10.10, Cluster list: 12.12.12.12
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300 200
    200.6.11.6 from 200.6.11.6 (6.6.6.6)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

R11 learns the prefix 120.18.1.1 from R10, from R12, and from its eBGP peer R6. The path from R6 loses on AS_PATH length, leaving the two iBGP paths. These tie through the router ID comparison, because R10’s router ID and the Originator ID on R12’s reflected path are both 1.10.10.10, so the decision falls to step 12, which focuses on the minimum Cluster List length. However, the path received directly from R10 carries no Cluster List. This is because R10 originated the route and is a normal iBGP peer of R11 rather than a route reflector serving R11. The path received from R12 carries route reflector attributes, including the Cluster List. RFC 4456 addresses this situation in Section 9:

9. Impact on Route Selection

   The BGP Decision Process Tie Breaking rules (Sect. 9.1.2.2, [1]) are
   modified as follows:

      If a route carries the ORIGINATOR_ID attribute, then in Step f)
      the ORIGINATOR_ID SHOULD be treated as the BGP Identifier of the
      BGP speaker that has advertised the route.

      In addition, the following rule SHOULD be inserted between Steps
      f) and g): a BGP Speaker SHOULD prefer a route with the shorter
      CLUSTER_LIST length.  The CLUSTER_LIST length is zero if a route
      does not carry the CLUSTER_LIST attribute.

A path without a Cluster List attribute is therefore treated as having a Cluster Length of 0. R10’s path carries no Cluster List (length 0), while R12’s path has a Cluster Length of 1 (12.12.12.12). Because 0 is less than 1, R11 chooses R10’s path over R12’s.
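The two RFC 4456 modifications can be modeled together. The Python sketch below is a hypothetical illustration (the IbgpPath type and the effective_bgp_id(), cluster_list_length(), and rr_tiebreak() helpers do not come from any router software): it substitutes the ORIGINATOR_ID for the advertising peer’s router ID when one is present, and it treats a missing CLUSTER_LIST as length 0. The values come from R11’s output above, including R12’s router ID 12.12.12.12.

```python
from ipaddress import IPv4Address
from typing import NamedTuple, Optional, Tuple

# Hypothetical sketch of the RFC 4456, Section 9 tie-breaking changes.


class IbgpPath(NamedTuple):
    peer: str
    peer_router_id: str                             # BGP Identifier of the advertising peer
    originator_id: Optional[str] = None             # present only on reflected paths
    cluster_list: Optional[Tuple[str, ...]] = None  # present only on reflected paths


def effective_bgp_id(p: IbgpPath) -> IPv4Address:
    # If ORIGINATOR_ID is present, it is treated as the BGP Identifier.
    return IPv4Address(p.originator_id or p.peer_router_id)


def cluster_list_length(p: IbgpPath) -> int:
    # A route without CLUSTER_LIST has a CLUSTER_LIST length of zero.
    return len(p.cluster_list) if p.cluster_list is not None else 0


def rr_tiebreak(a: IbgpPath, b: IbgpPath) -> Optional[IbgpPath]:
    """Apply the lowest BGP Identifier rule, then the shorter CLUSTER_LIST rule."""
    for key in (effective_bgp_id, cluster_list_length):
        ka, kb = key(a), key(b)
        if ka != kb:
            return a if ka < kb else b
    return None  # still tied; the neighbor address decides


from_r10 = IbgpPath("R10", "1.10.10.10")
from_r12 = IbgpPath("R12", "12.12.12.12", "1.10.10.10", ("12.12.12.12",))

print(rr_tiebreak(from_r10, from_r12).peer)  # R10
```

Note that without the ORIGINATOR_ID substitution, R10’s path would already win on router ID (1.10.10.10 versus 12.12.12.12); with it, the router IDs tie and the Cluster List length decides, as described above.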

Step 13: Lowest Neighbor Address

Note

Before starting this section, revert the configuration on all routers to the base initial configuration files provided with the lab.

The final factor in deciding between two paths is the neighbor peering address. At this stage, all other attributes have tied, and the router still needs an arbitrary but deterministic way to select a single best path. It makes this choice by examining the peering address over which each path was learned: the path learned from the neighbor with the lowest peering address is considered the best path.
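As a minimal sketch of this last step, assuming all earlier steps have tied, the Python function below picks the path whose neighbor address is numerically lowest. The lowest_neighbor() name and its dictionary input are hypothetical; the addresses are the peering addresses used later in this step.

```python
from ipaddress import IPv4Address

# Hypothetical sketch of step 13; assumes every earlier step has tied.


def lowest_neighbor(paths):
    """paths maps a peer label to the neighbor address the path was learned from."""
    return min(paths, key=lambda peer: IPv4Address(paths[peer]))


print(lowest_neighbor({"R11": "11.11.11.11", "R13": "13.13.13.13"}))  # R11
print(lowest_neighbor({"R11": "20.20.20.20", "R13": "13.13.13.13"}))  # R13
```

The addresses are compared as IPv4Address objects so that the comparison is numeric; compared as plain strings, "20.20.20.20" would sort before "9.9.9.9".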

To see this, we once again examine AS 400. R14 receives two paths to reach the 120.18.1.1/32 prefix:

On R14:

R14#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 6
Paths: (2 available, best #2, table default)
  Not advertised to any peer
  Refresh Epoch 1
  200
    9.9.9.9 (metric 31) from 13.13.13.13 (13.13.13.13)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 9.9.9.9, Cluster list: 13.13.13.13, 12.12.12.12
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200
    10.10.10.10 (metric 21) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 10.10.10.10, Cluster list: 11.11.11.11
      rx pathid: 0, tx pathid: 0x0

R14 chooses the path from R11 as best because it has a lower metric to the next hop 10.10.10.10. To properly examine the processing of the final step in the BGP best-path algorithm, the two paths R14 receives need to tie for all attributes.

One way to accomplish this is to ensure that both paths use the same next hop, so that the metric to the next hop is the same for both paths. The Originator ID and Cluster Length attributes must tie as well. Because the path from R13 already has next hop 9.9.9.9, originator ID 9.9.9.9, and a Cluster Length of 2, these modifications must be made to the path R11 chooses to send to R14.

R11 receives three paths to reach the 120.18.1.1/32 prefix: two internal paths (from R10 and R12) and one external path (from R6). Without intervention, R11 chooses the path from R10, with next hop 10.10.10.10, as shown below:

On R11:

R11#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 6
Paths: (3 available, best #1, table default)
  Advertised to update-groups:
     1          3
  Refresh Epoch 1
  200
    10.10.10.10 (metric 11) from 10.10.10.10 (10.10.10.10)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200
    9.9.9.9 (metric 21) from 12.12.12.12 (12.12.12.12)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 9.9.9.9, Cluster list: 12.12.12.12
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  300 200
    200.6.11.6 from 200.6.11.6 (6.6.6.6)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

The path R11 receives from R12 has a next hop of 9.9.9.9, an originator ID of 9.9.9.9, and a single cluster ID (12.12.12.12) in its Cluster List. When R11 reflects this path, it prepends its own cluster ID, giving a Cluster Length of 2, the same as the path R14 already receives from R13. These properties make R11’s path from R12 a prime candidate for being advertised to R14.

For R11 to advertise this path, it must choose the path from R12 as its best path. It currently chooses the path from R10 because of the lower IGP metric to the next hop (11 versus 21). If this metric ties between the paths received from R10 and R12, processing shifts to the lowest router ID. This is desirable because the path received from R12 carries route reflector attributes, so its originator ID (9.9.9.9) is compared with the router ID of R10 (10.10.10.10). Because 9.9.9.9 is lower, the path originated by R9 becomes the best path.

To engineer this decision, the OSPF cost on the e0/0.1011 interface of R11 is increased to 30:

On R11:

R11(config)#interface e0/0.1011
R11(config-subif)#ip ospf cost 30
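Before looking at the result, the expected change in the IGP metric can be checked with a small shortest-path calculation. The Python sketch below runs Dijkstra’s algorithm over a hypothetical model of the relevant part of AS 400: the links and per-link costs (10 per Ethernet link, 1 for each loopback) are assumptions chosen to reproduce the metrics shown in this lab, not values read from the lab configuration. Only R11’s outgoing cost toward R10 is varied, mirroring the ip ospf cost 30 change on e0/0.1011.

```python
import heapq

# Hypothetical model: links and costs are assumptions chosen to reproduce
# the metrics in this lab, not read from the lab configuration.


def ospf_costs(graph, source):
    """Dijkstra's algorithm: lowest total outgoing-interface cost from source."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist


def build_topology(r11_to_r10_cost):
    # Assumed: cost 10 per Ethernet link, cost 1 to reach each loopback.
    return {
        "R11": {"R10": r11_to_r10_cost, "R12": 10},  # only this cost changes
        "R12": {"R11": 10, "R10": 10, "R9": 10},
        "R10": {"R11": 10, "R12": 10, "10.10.10.10": 1},
        "R9": {"R12": 10, "9.9.9.9": 1},
    }


before = ospf_costs(build_topology(10), "R11")
after = ospf_costs(build_topology(30), "R11")
print(before["10.10.10.10"], before["9.9.9.9"])  # 11 21
print(after["10.10.10.10"], after["9.9.9.9"])    # 21 21
```

In this model, the direct path to 10.10.10.10 after the change costs 30 + 1 = 31, but the path through R12 costs 21, so the route through R12 is installed. Because OSPF cost is applied on the outgoing interface, R10’s cost toward R11 stays at 10; only R11’s side of the link changes.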

Now, R11’s cost to reach 10.10.10.10 directly through R10 increases to 31. OSPF instead installs the route to this next hop through R12, with a metric of 21. This raises the IGP metric that BGP reports for next hop 10.10.10.10 from 11 to 21, so it now ties with the metric to next hop 9.9.9.9 on the path received from R12:

R11#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 16
Paths: (3 available, best #2, table default)
  Advertised to update-groups:
     1          3
  Refresh Epoch 1
  200
    10.10.10.10 (metric 21) from 10.10.10.10 (10.10.10.10)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  200
    9.9.9.9 (metric 21) from 12.12.12.12 (12.12.12.12)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 9.9.9.9, Cluster list: 12.12.12.12
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  300 200
    200.6.11.6 from 200.6.11.6 (6.6.6.6)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0

In the output above, R11 compares R10’s path against the path originated by R9 (received from R12). Because the IGP metrics tie at 21, the lowest router ID is the deciding tie-breaker: the Originator ID of R9’s path (9.9.9.9) is lower than R10’s router ID (10.10.10.10), so R9’s path is chosen over R10’s. R11 then compares R9’s path against R6’s. R6’s path has a longer AS_PATH (300 200), so R11 correctly keeps the path through R9 as best.

R11 now advertises this path to R14 as well. As shown below, R14’s BGP table now contains two paths to 120.18.1.1/32 that tie on every attribute up to and including the Cluster Length.
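The pairwise comparison R11 performs can be summarized as an ordered key. This Python sketch is a simplified, hypothetical model: it assumes weight, local preference, origin, and MED already tie, it skips steps that do not apply here (such as the oldest-external-path rule), and the Candidate type and the decision_key() and best_path() helpers are invented for illustration. The attribute values come from R11’s output above; 6.6.6.6 is the router ID shown for R6.

```python
from ipaddress import IPv4Address
from typing import NamedTuple, Optional, Tuple

# Hypothetical, simplified model of the comparison; assumes weight, local
# preference, origin, and MED already tie for every candidate.


class Candidate(NamedTuple):
    peer: str                            # neighbor the path was received from
    as_path: Tuple[str, ...]
    external: bool                       # True for eBGP-learned paths
    igp_metric: int                      # IGP cost to the next hop (0 for eBGP here)
    router_id: str                       # router ID of the advertising neighbor
    originator_id: Optional[str] = None  # set only on reflected paths


def decision_key(p: Candidate):
    rid = p.originator_id or p.router_id  # ORIGINATOR_ID stands in for the router ID
    return (
        len(p.as_path),          # shorter AS_PATH wins
        0 if p.external else 1,  # eBGP preferred over iBGP
        p.igp_metric,            # lower IGP metric to the next hop wins
        IPv4Address(rid),        # lowest (originator) router ID wins
    )


def best_path(paths):
    best = paths[0]
    for challenger in paths[1:]:
        # Compare each path against the current best, one pair at a time.
        if decision_key(challenger) < decision_key(best):
            best = challenger
    return best


r11_paths = [
    Candidate("R10", ("200",), False, 21, "10.10.10.10"),
    Candidate("R12", ("200",), False, 21, "12.12.12.12", originator_id="9.9.9.9"),
    Candidate("R6", ("300", "200"), True, 0, "6.6.6.6"),
]
print(best_path(r11_paths).peer)  # R12 (the path originated by R9)
```

The path from R6 loses on AS_PATH length before the IGP metric is ever considered, which is why the placeholder IGP metric of 0 for that eBGP path does not affect the result.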

On R14:

R14#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 78
Paths: (2 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  200
    9.9.9.9 (metric 31) from 11.11.11.11 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 9.9.9.9, Cluster list: 11.11.11.11, 12.12.12.12
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  200
    9.9.9.9 (metric 31) from 13.13.13.13 (13.13.13.13)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 9.9.9.9, Cluster list: 13.13.13.13, 12.12.12.12
      rx pathid: 0, tx pathid: 0

With all attributes tied, processing shifts to the lowest neighbor IP address. Because R11’s peering address (11.11.11.11) is lower than R13’s (13.13.13.13), the path from R11 is chosen as best. To confirm that the neighbor address is the deciding factor, the peering address between R14 and R11 is modified in the following steps:

  1. A loopback interface is added to R11 with IP address 20.20.20.20/32 and is advertised into OSPF.

  2. R11’s peering to R14 is configured with the neighbor update-source command so that the BGP TCP session to R14, and therefore every BGP message including the OPEN, is sourced from the new loopback interface’s address (20.20.20.20).

  3. R14’s peering to R11’s 11.11.11.11 address is shut down, and a new peering is created to R11’s new 20.20.20.20 address.

These modifications are shown below, in sequence. After the peering to R11 comes up fully, R14 again receives identical paths to reach the 120.18.1.1/32 network. This time, R11’s peering address is 20.20.20.20, so when R14 evaluates step 13 of the best-path algorithm, it chooses the path through R13 as its best path, because R13’s peering address (13.13.13.13) is now the lowest.

On R11:

R11(config)#interface lo20
R11(config-if)#ip address 20.20.20.20 255.255.255.255
R11(config-if)#ip ospf 1 area 0
 
R11(config)#router bgp 400
R11(config-router)#neighbor 14.14.14.14 update-source lo20

On R14:

R14(config)#router bgp 400
R14(config-router)#neighbor 11.11.11.11 shutdown
R14(config-router)#neighbor 20.20.20.20 remote-as 400

You should see the following console message:

%BGP-5-ADJCHANGE: neighbor 20.20.20.20 Up

On R14:

R14#show ip bgp 120.18.1.1
 
BGP routing table entry for 120.18.1.1/32, version 84
Paths: (2 available, best #2, table default)
  Not advertised to any peer
  Refresh Epoch 1
  200
    9.9.9.9 (metric 31) from 20.20.20.20 (11.11.11.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 9.9.9.9, Cluster list: 11.11.11.11, 12.12.12.12
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  200
    9.9.9.9 (metric 31) from 13.13.13.13 (13.13.13.13)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 9.9.9.9, Cluster list: 13.13.13.13, 12.12.12.12
      rx pathid: 0, tx pathid: 0x0