Chapter 10. Using NETCONF and YANG

This chapter covers

The example business case background for this chapter
How to get started, a top-down service model approach
Building from the bottom up with device templates
How to connect the service YANG model to the device YANG models
Setting up NETCONF on a couple of different devices
Discovering what your devices can do for you
Playing around with creating, updating, and rolling back services
Checking that the orchestrator is synchronized with its devices
Seeing the NETCONF network-wide transactions in action

Introduction

The purpose of this chapter is to show a full story, starting with a business need and taking it all the way to verifying that it works correctly in the network. This chapter builds a reasonably realistic software-defined wide area network (SDWAN) service using the Network Services Orchestrator (NSO) product as the orchestrator. The plan is to design a YANG service model, map it to the device models, write a little bit of code to assist in that mapping, and then plug in some devices, configure them for NETCONF management, and try out the service with some service creation, modifications, and rollback. This chapter also examines how the service gets deployed from an orchestrator point of view as well as the NETCONF messages flying around between the orchestrator and devices. The project is available as a hands-on project you can clone from GitHub and then build, run, and play with as you please. Instructions for how to obtain the necessary free tools are found within the project README file at https://github.com/janlindblad/bookzone.

So the Story Goes

BookZone is a fictitious bookstore franchise. They already implemented a NETCONF/YANG-based solution to manage each of their store locations (see Chapter 3, “YANG Explained”). This interface is used by BookZone central management to add and remove authors and titles from individual store databases, and to monitor inventory counts.

BookZone naturally has many publishing houses as suppliers. Some of the best suppliers sell so many books with BookZone that it makes sense that the publishers have a direct connection to stores, to restock without involvement from the central office. This service is called “BookZone Store Connect.”

Now, with each store offering a NETCONF/YANG interface, it is time to manage the Store Connect service using an orchestrated network service. In order to understand how the service is implemented in the BookZone enterprise network, have a look at the network diagram shown in Figure 10-1.

A diagram depicts the setup of the Store Connect Service. — **Figure 10-1** Network Diagram of the Store Connect Service

In the Store Connect Service network setup, seven different book store interfaces are connected to the Core network of the BookZone. Three book store interfaces are connected to the core through nodes "E" and four book store interfaces are connected to the core through nodes "I." Both sets of interfaces are connected to a node "M."

The device symbols marked “E” are externally facing routers, while the ones marked “I” are internally facing routers. The system marked “M” is the monitoring system. It may consist of a multitude of nodes and measurement devices, but from the point of view of the orchestration system, this is one device with a single application programming interface (API). This idea is often referred to as a “single system view.”

This chapter uses a device-naming convention. The first letter of a device name indicates whether it’s an externally or internally facing device. The second letter indicates the brand of the device, where “c” indicates a Cisco IOS XE device and “j” refers to a Juniper JunOS device. There is also a number at the end to distinguish multiple devices of the same role and brand. The seven devices in Figure 10-1 are called ej0, ej1, ec0, ij0, ic0, ic1, and m0. The YANG module developed in this chapter also refers to devices with names starting with “ei”. The “i” refers to the Internet Engineering Task Force (IETF), meaning a device of any brand that uses the standard IETF interface YANG module.
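
The naming convention is regular enough to be checked mechanically. The following is a minimal Python sketch of such a check; it is not part of the BookZone project, and the names parse_device_name, DeviceName, ROLES, and BRANDS are assumptions made for illustration. It also assumes that the monitoring system (m0) carries no brand letter, as in Figure 10-1.

```python
import re
from typing import NamedTuple

# Role and brand letters as described in the naming convention.
ROLES = {"e": "external", "i": "internal", "m": "monitoring"}
BRANDS = {"c": "Cisco IOS XE", "j": "Juniper JunOS", "i": "IETF (any brand)"}


class DeviceName(NamedTuple):
    role: str
    brand: str  # empty string for the monitoring system
    index: int


def parse_device_name(name: str) -> DeviceName:
    """Split a device name such as 'ej0' into role, brand, and index."""
    match = re.fullmatch(r"([eim])([cji]?)(\d+)", name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    role, brand, index = match.groups()
    # Routers need a brand letter; the monitoring system must not have one.
    if (role == "m") != (brand == ""):
        raise ValueError(f"{name!r} does not follow the naming convention")
    return DeviceName(ROLES[role], BRANDS.get(brand, ""), int(index))
```

For example, parse_device_name("ic1") yields the role "internal", the brand "Cisco IOS XE", and the index 1, while a name such as "x9" raises a ValueError.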

Whenever a new publisher qualifies and signs up for the BookZone Store Connect service, the BookZone networking staff has to perform the following tasks:

Allocate a virtual local area network identity (VLAN ID) for this publisher.
Open up the firewall on relevant E routers to allow external connections from a specified set of IP addresses for each site of the publisher.
Configure the E routers to place the traffic for the publisher on the allocated VLAN.
Configure the I routers to terminate the traffic for the publisher that runs over the allocated VLAN.
Configure the monitoring system to continuously monitor the connectivity/quality of the setup.

The core network does not need any configuration changes just because a new publisher joins. Nor are there any changes on the configuration of the stores in this scenario.

The networking department decided on the following rules for the service:

The traffic of each publisher must run on a separate VLAN.
Each publisher may have one or more sites.
Each publisher site is connected to one E router.
Each publisher site has a single range of allowed IPv4 addresses.
Each store is connected to one I router.
Each store has one management IPv4 address.
The monitoring system must monitor all connections from the publisher site to the stores created.

With this background, the following sections take a closer look at what the BookZone network team came up with for their automation solution. They decided to implement their use cases based on the NSO platform. Note that several other controllers/orchestrators that also leverage YANG and NETCONF, such as OpenDaylight and CloudOpera, could have been used instead.

Top-Down Service Model

Generally, a “service” is developed either top down or bottom up. Which approach is the better one can be debated. To a high degree, what is better depends on the engineer doing the modeling. Network engineers often start bottom up, because then they are starting with known concepts and can abstract from there. Software engineers might start with the outward facing interface instead (that is, the service YANG model) and then drill down to figure out how that maps to the underlying infrastructure.

Let’s go with the top-down approach, starting with designing the YANG service interface as the first step toward automating the Store Connect service. A service YANG model doesn’t describe the interface of a device; instead, it describes how an operator wants to interact with the service at a high level.

The service models need to contain enough information to be used to configure all the low-level details required to configure the service on all the types of devices involved, as shown in Example 10-1. This does not mean that all low-level information must be present in the service model. Far from it. Many low-level configuration options may be hard-coded to a given value for a specific type of service, and many configuration settings may be computed or fetched from other systems by service code, so that the operator doesn’t need to figure out a suitable value.

Example 10-1 Store Connect Service YANG Module, Imports, Description, and Revision

module storeconnect {
  yang-version 1.1;
  namespace "http://example.com/storeconnect";
  prefix storeconnect;

  import ietf-inet-types {
    prefix inet;
  }
  import tailf-common {
    prefix tailf;
  }
  import tailf-ncs {
    prefix ncs;
  }
  import junos-conf-root {
    prefix jc;
  }
  import junos-conf-interfaces {
    prefix jc-interfaces;
  }
  import Cisco-IOS-XE-native {
    prefix ios;
  }
  import ietf-interfaces {
    prefix if;
  }

  description
    "Bla bla...";

  revision 2018-02-01 {
    description
      "Initial revision.";
  }

Example 10-1 shows a bit of boilerplate YANG, which gives the service a name, description, and revision. It also imports a number of YANG modules that the rest of the module references.

The next bit in Example 10-2 declares a list of stores. Each store has a network address and is connected to an I router, which must be among the devices NSO is managing. The configuration points out a specific interface to which the store is connected. The interface name is left as a free-form string. This modeling choice is debated later.

Finally, a leaf-list of tags is given for each store, as shown in Example 10-2. This set of tags on each store is meant to reflect general attributes of the store, such as its location, which languages it carries, how large the store is, and any specialties (such as focusing on fantasy and science fiction literature). This helps publishers know which stores they should target.

Example 10-2 Store Connect Service YANG Module: The List of Stores

container stores {
  list store {
    key name;
    leaf name {
      type string;
    }
    container network {
      leaf address {
        type inet:ipv4-address;
      }
      leaf i-router {
        type leafref {
          path "/ncs:devices/ncs:device/ncs:name";
        }
      }
      leaf interface {
        type string;
      }
    }
    leaf-list tags {
      type string;
    }
  }
}

The next part, shown in Example 10-3, declares a list of publishers. Note that this is just the initial content of the publisher list. Soon, there will be more elements added to this list. Each publisher has a simple string name. The statements beginning with tailf: and ncs: are extension keywords (that is, they are declared using the extension keyword in YANG). Their meaning is proprietary to whoever declares them, but since the syntax for extension keywords is standardized, any YANG-compliant parser can read this module and get past the extensions without tripping over them. Many other languages do similar things using comments with special character sequences.

The ncs:servicepoint extension keyword tells the orchestrator that this list is a service. This means that service code registered with the name storeconnect-servicepoint will be invoked as changes are made to this part of the configuration. The service point is usually located in a list so that an operator can configure many instances of the service. This is illustrated by Example 10-3.

Example 10-3 Store Connect Service YANG Module: The List of Publishers, Core Part

container publishers {
  list publisher {
    description "Storeconnect service for BookZone publishers";

    key name;
    leaf name {
      tailf:info "Name of publisher connecting";
      tailf:cli-allow-range;
      type string;
    }

    uses ncs:service-data;
    ncs:servicepoint storeconnect-servicepoint;

Example 10-4 shows container network, which contains a list of publisher sites. Each site has a name and an IPv4 address range.

Example 10-4 Store Connect Service YANG Module: The List of Publishers, Network Part

      container network {
        list site {
          key name;
          leaf name {
            type string;
          }
          leaf address {
            type inet:ipv4-address;
          }
          leaf mask-len {
            type uint32 {
              range "0..32";
            }
            default 32;
          }

Each publisher has a number of sites. Each site has a name and an IPv4 address range that is allowed to connect with BookZone. No matter which device type happens to sit at the location to which this site is connected, the address and mask are expressed as a dotted quad and an integer. Some devices use other formats in their management interfaces, as you shall soon see.

Each site is attached to a specific E router as well as a specific interface on that E router. This is modeled as a leafref, pointing to any device in the NSO managed device list. Here, you only want to let the operator select devices with the E router role. This is accomplished with a must statement that only allows selection of devices that have names starting with ej, ec, or ei.

There are some advantages and disadvantages with making models depend on naming conventions. Naming conventions are a weak form of architecture. On the other hand, if this sort of simple statement can prevent mistakes and make the operators’ lives easier, why not? It’s as easy to change the YANG as changing any other code. You need to decide for yourself in which situations you want to consider this a good or bad thing.

Then there is the interface reference. In the store model shown in Example 10-2, the interface reference was modeled as a simple string. The advantage is dead simple YANG as well as total decoupling between service-level YANG and device YANG. The disadvantage is that the operator gets no help at all in figuring out what valid values are. With a leafref, everyone sees what YANG list and leaf has the values to go here. A good client application displays the options in a drop-down menu or as tab-completion options.

Since this service supports devices of several different kinds, and not all device types support the IETF interface’s YANG module, this service lists a number of different options it allows. This is the price to pay for being specific and helping the operator understand what the options are. Each interface reference has a when statement to only make it available with the right device type. Next, the leafref path follows the e-router leaf to the device it points to, and then goes into that device’s configuration tree and down into the interface list (of the right kind, depending on device type).
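
The effect of the when statements can be mimicked in a few lines of Python. This is only a sketch of the selection logic, assuming the leaf names and name prefixes used in Example 10-5; the names INTERFACE_LEAF_BY_PREFIX and interface_leaf_for are hypothetical and do not appear in the BookZone project.

```python
from typing import Optional

# Router-name prefix -> interface leaf that becomes available for it.
# This mirrors the when statements on the three leafs in the choice.
INTERFACE_LEAF_BY_PREFIX = {
    "ej": "junos-interface",
    "ec": "ios-ge-interface",
    "ei": "ietf-interface",
}


def interface_leaf_for(e_router: str) -> Optional[str]:
    """Return the interface leaf available for this E router, or None.

    None corresponds to a device name that the must statement on
    e-router would reject, such as an internally facing router.
    """
    for prefix, leaf in INTERFACE_LEAF_BY_PREFIX.items():
        if e_router.startswith(prefix):
            return leaf
    return None
```

With this sketch, interface_leaf_for("ec0") selects ios-ge-interface, while interface_leaf_for("ic0") returns None because ic0 is not an E router.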

The require-instance false statement is important to discuss. It allows the leafref reference to be broken so that devices can be removed without tearing down the services that depend on them. If you feel the operator should not be allowed to remove a device if a service depends on it, simply remove the require-instance false statement.

The device and interface selection configuration for both externally and internally facing routers is shown in Example 10-5.

Example 10-5 Store Connect Service YANG Module: The List of Publishers, Interface Part

          leaf e-router {
            must "starts-with(current(), 'ej') or "+
                 "starts-with(current(), 'ec') or "+
                 "starts-with(current(), 'ei')";
            type leafref {
              path "/ncs:devices/ncs:device/ncs:name";
            }
          }
          choice interface-type {
            leaf junos-interface {
              when "starts-with(../e-router, 'ej')";
              type leafref {
                path "/ncs:devices/ncs:device[ncs:name=current()/../e-router]/"+
                     "ncs:config/jc:configuration/jc-interfaces:interfaces/"+
                     "jc-interfaces:interface/jc-interfaces:name";
                // The path above expressed using deref():
                // path "deref(../e-router)/../ncs:config/jc:configuration/"+
                //      "jc-interfaces:interfaces/jc-interfaces:interface/"+
                //      "jc-interfaces:name";
                require-instance false;
              }
            }
            leaf ios-ge-interface {
              when "starts-with(../e-router, 'ec')";
              type leafref {
                path "/ncs:devices/ncs:device[ncs:name=current()/../e-router]/"+
                     "ncs:config/ios:native/ios:interface/ios:GigabitEthernet/"+
                     "ios:name";
                // The path above expressed using deref():
                // path "deref(../e-router)/../ncs:config/"+
                //      "ios:native/ios:interface/ios:GigabitEthernet/ios:name";
                require-instance false;
              }
            }
            leaf ietf-interface {
              when "starts-with(../e-router, 'ei')";
              type leafref {
                path "/ncs:devices/ncs:device[ncs:name=current()/../e-router]/"+
                     "ncs:config/if:interfaces/if:interface/if:name";
                // The path above expressed using deref():
                // path "deref(../e-router)/../ncs:config/"+
                //      "if:interfaces/if:interface/if:name";
                require-instance false;
              }
            }
          }
        }
        leaf allocated-vlan {
          config false;
          type uint32 {
            range "1..4094";
          }
        }
      }

The last leaf in Example 10-5, allocated-vlan, is a config false operational state element. The operator is not asked to enter a VLAN ID for this publisher. Instead, the service code allocates a free VLAN. The operator might still be interested to know what allocation was done, if only for debugging purposes. That allocation is given back to the operator through this leaf by the service code.

The last part of the publisher list is about which publishers should be connected to which stores. A simple policy could of course be that all publishers are connected to all stores. BookZone’s policy is based on stores and publishers both listing tag keywords. If they have any tag keyword in common, they are connected. A tag keyword could be a specialty, like science, manga, or cooking. Or it could be mainstream books in a given language, like English, French, or Mandarin.

In order to make it easier to understand the model and for operators to fill in values that are in use, the tag is modeled as a leafref to any tag value in use by any store, with require-instance false. This means a particular keyword, such as cd-rom, can be dropped by the last store without the configuration becoming invalid, even if some publishers still want to connect with stores that market something that has gone out of fashion (such as cd-rom). The number-of-stores leaf is a read-only count, filled in by the system, of how many stores carry the given tag. It serves as a guide to publishers.
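
The tag policy boils down to set intersection. Here is a minimal Python sketch of the policy and of the number-of-stores count, using plain dictionaries instead of the NSO data tree; the function names stores_to_connect and number_of_stores, as well as the sample store names in the usage note, are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def stores_to_connect(publisher_tags: Iterable[str],
                      store_tags: Dict[str, Set[str]]) -> List[str]:
    """Return the names of all stores sharing at least one tag with the publisher."""
    wanted = set(publisher_tags)
    return sorted(name for name, tags in store_tags.items() if wanted & tags)


def number_of_stores(tag: str, store_tags: Dict[str, Set[str]]) -> int:
    """Count how many stores carry the given tag."""
    return sum(1 for tags in store_tags.values() if tag in tags)
```

Given the hypothetical stores {"paris": {"french", "manga"}, "london": {"english"}, "tokyo": {"manga", "science"}}, a publisher targeting manga and cd-rom is connected to paris and tokyo, and the count for cd-rom is 0, which hints that no store carries that tag any longer.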

Example 10-6 shows the YANG model for configuring the mapping between stores and publishers.

Example 10-6 Store Connect Service YANG Module: The List of Publishers, target-stores Part

      container target-stores {
        list tag {
          key tag;

          leaf tag {
            type leafref {
              path "/stores/store/tags";
              require-instance false;
            }
          }
          leaf number-of-stores {
            config false;
            type uint32;
          }
        }
      }
    }
  }
}

Bottom-Up Device Templates

In the end, the service needs to push configuration changes to the devices. In NSO, this part is often described using device templates. A device template pushes Extensible Markup Language (XML) data structures to devices with a mix of hard-coded values and expressions in curly brackets, as shown in Example 10-7. The expressions may draw input data directly from the service YANG and service code using XPath pointers. The expression {/name} picks up the value of leaf name directly from the service YANG. The expressions may also refer to variables computed and published by the service code, such as {$VLAN_ID}.

Example 10-7 Store Connect Service Template: Cisco IOS-XE Part

<config-template xmlns="http://tail-f.com/ns/config/1.0">
  <devices xmlns="http://tail-f.com/ns/ncs">
    <device>
      <name>{$DEVICE}</name>
      <config>

        <!-- CISCO XE1671 -->
        <native xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-native">
          <vrf>
            <definition tags="merge">
              <name>{/name}</name>
              <rd>300:{$VLAN_ID}</rd>
...
          <interface tags="nocreate">
            <GigabitEthernet>
              <name>{$INTERFACE}</name>
              <description tags="merge">connection to {/name}</description>
              <ip tags="merge">
                <address>
                  <primary>
                    <address>{$ADDRESS}</address>
                    <mask>{$MASK}</mask>

Further down in the same device template file is the mapping to a JunOS device, as shown in Example 10-8. The orchestrator automatically picks the right namespace(s) for the given device.

Example 10-8 Store Connect Service Template: Juniper JunOS Part

        <!-- Juniper Junos18 -->
        <configuration xmlns="http://yang.juniper.net/junos/conf/root">
          <interfaces xmlns="http://yang.juniper.net/junos/conf/interfaces">
            <interface tags="nocreate">
              <name>{$INTERFACE}</name>
              <unit tags="merge">
                <name>{$VLAN_ID}</name>
                <description>connection to {/name}</description>
                <vlan-id>{$VLAN_ID}</vlan-id>
                <family>
                  <inet>
                    <address>
                      <name>{$ADDRESS}/{$MASK_LEN}</name>
...
          <routing-instances 
               xmlns="http://yang.juniper.net/junos/conf/routing-instances">
            <instance>
              <name>{/name}</name>
              <instance-type>vrf</instance-type>
              <interface>
                <name>{$INTERFACE}.{$VLAN_ID}</name>
              </interface>
              <route-distinguisher>
                <rd-type>300:{$VLAN_ID}</rd-type>
              </route-distinguisher>
              <vrf-import>{/name}-IMP</vrf-import>
              <vrf-export>{/name}-EXP</vrf-export>
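
To make the two kinds of curly-bracket expressions concrete, here is a toy Python sketch of the substitution idea. It is not how NSO evaluates templates (the real expressions are XPath, and the real engine also handles tags such as merge and nocreate); the function name expand and its two dictionary arguments are assumptions made purely for illustration.

```python
import re
from typing import Mapping


def expand(text: str,
           variables: Mapping[str, object],
           service: Mapping[str, object]) -> str:
    """Replace {$NAME} with a variable and {/leaf} with a service leaf value.

    Toy model only: a real template engine evaluates full XPath
    expressions, whereas this sketch handles just these two simple forms.
    """
    def substitute(match):
        expr = match.group(1)
        if expr.startswith("$"):
            return str(variables[expr[1:]])   # variable set by service code
        if expr.startswith("/"):
            return str(service[expr[1:]])     # top-level leaf of the service
        raise ValueError(f"unsupported expression: {expr!r}")

    return re.sub(r"\{([^{}]+)\}", substitute, text)
```

For instance, expanding the line <rd>300:{$VLAN_ID}</rd> with VLAN_ID set to 1234 gives <rd>300:1234</rd>, and expanding <name>{$INTERFACE}.{$VLAN_ID}</name> works the same way with two variables.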

The service also needs a device template for the monitoring system, but that is not shown here.

The service could equally well configure other services rather than devices this way, or a mix of lower-level services and devices. Some of the “devices” might be lower-level NSO systems in their own right.

Service Logic Connecting the Dots

A simple service might not need any service logic code at all, simply relying on the control logic available in the device templates. Most services need some sort of more complex calculations, assignments, or communications, however. This requires some code to compute good values to use in the templates.

This service was implemented in Python. If this isn’t your forte, keep reading anyway. The gist of the code is explained at every step.

The first part, shown in Example 10-9, registers this class as responsible for the storeconnect-servicepoint declared in the YANG earlier. This is how the system knows which code to invoke when a particular service changes.

Note

The NSO product was called NCS before it was acquired by Cisco. Cisco already had several products called NCS, so it decided to rename the product. The programming APIs still reflect the old name.

Example 10-9 Store Connect Service Code: Service Registration Part

# -*- mode: python; python-indent: 4 -*-
import ncs
from ncs.application import Service
# ---------------------------------------------
# COMPONENT THREAD THAT WILL BE STARTED BY NCS.
# ---------------------------------------------
class Main(ncs.application.Application):
    def setup(self):
        self.log.info('Main RUNNING')
        self.register_service('storeconnect-servicepoint', ServiceCallbacks)
    def teardown(self):
        self.log.info('Main FINISHED')

The next thing to do is to implement the create() method of the service. This method is called whenever a service is created or modified. It is supposed to take the input parameters from the service YANG, and possibly other inputs such as topology or server load situation, and configure devices or lower-level services. The interface towards the devices and lower-level services is always given by their YANG models. The service remains unaware of which protocol is actually used to convey this information, and whether it is a local or remote entity. The service data and resulting device changes live in the same transaction, so they either succeed or fail together. This means there is no need for error detection or recovery code in the service.

The top-level create() method, shown in Example 10-10, allocates a VLAN ID and updates the allocated-vlan operational data element from the YANG model, to reflect the assignment to the operator. Then it calls methods to configure the E routers, the I routers, and the monitoring system.

Example 10-10 Store Connect Service Code: The create() Callback

# ------------------------
# SERVICE CALLBACK EXAMPLE
# ------------------------
class ServiceCallbacks(Service):

    # The create() callback is invoked inside NCS FASTMAP and
    # must always exist.
    @Service.create
    def cb_create(self, tctx, root, publisher, proplist):
        self.log.info('Service create(publisher=', publisher._path, ')')

        vlan_id = self.allocate_vlan(publisher)
        publisher.network.allocated_vlan = vlan_id
        mon  = self.config_e_routers(publisher, vlan_id)
        mon += self.config_i_routers(publisher, vlan_id, root)
        self.config_monitoring(publisher, mon)

        self.log.info('Service creation done')

The VLAN ID allocation could be done using a resource management component, but in order to keep things really simple, here it’s just computed as a hash on the publisher’s name. The code is found in Example 10-11.

Example 10-11 Store Connect Service Code: The allocate_vlan() Method

def allocate_vlan(self, publisher):
    # Let's make this as simple as possible for now:
    # Just return a hash on the name (1000..2999)
    return 1000 + hash(publisher.name) % 2000
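
One caveat with this shortcut is worth knowing: in Python 3, the built-in hash() of a string is randomized per process unless PYTHONHASHSEED is set, so the computed VLAN ID can differ after a restart. A minimal sketch of a restart-stable variant follows. It swaps in zlib.crc32 from the standard library, which is not what the project code uses, and the function name stable_vlan_id is an assumption. Like the original, it can still map two publishers to the same VLAN ID, which is why a real deployment would use a resource management component.

```python
import zlib


def stable_vlan_id(publisher_name: str) -> int:
    """Map a publisher name to a VLAN ID in 1000..2999.

    crc32 depends only on the bytes of the name, so the result is the
    same in every Python process, unlike the built-in hash() for strings.
    """
    checksum = zlib.crc32(publisher_name.encode("utf-8"))
    return 1000 + checksum % 2000
```

Inside the service class, the same expression could replace the hash() call, with str(publisher.name) as input.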

To configure the E routers, loop over the configured sites in the publisher’s network. For each site, fetch the name of the interface that was configured and then check that a router device, an interface, and an address were given. If not, this site is simply skipped. Alternatively, an error could have been thrown, or these values could have been made mandatory or given default values in the YANG.

Next, a bag of template variables is created, and named variables are assigned suitable values. Near the end, the e-router-template is applied with the bag of variables. All this does is to update the ongoing transaction, which is pushed to the network at a later stage. Finally, the name of the leg is created, and the address and vlan_id are saved as input to the monitoring system. This is shown in Example 10-12.

Example 10-12 Store Connect Service Code: The config_e_routers() Method

def config_e_routers(self, publisher, vlan_id):
    mon = []
    for site in publisher.network.site:
        site_interface = self.get_interface(site)
        if bool(site_interface) and bool(site.e_router) and bool(site.address):
            # e-router, address and interface are not mandatory in YANG
            # (they could have been => we would not have needed this)
            # Unless all three are set, we will simply skip this site
            vars = ncs.template.Variables()
            vars.add('DEVICE', site.e_router)
            vars.add('INTERFACE', site_interface)
            vars.add('ADDRESS', site.address)
            vars.add('MASK_LEN', site.mask_len)
            vars.add('MASK', self.ip_size_to_mask[site.mask_len])
            vars.add('VLAN_ID', vlan_id)
            template = ncs.template.Template(publisher)
            template.apply('e-router-template', vars)
            mon += [("%s-%s-int"%(publisher.name, site.name), 
                     site.address, vlan_id)]
    return mon
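
The ip_size_to_mask lookup used for the MASK variable is not shown in this excerpt. Assuming it maps a prefix length (0 to 32) to a dotted-quad netmask, which is the format the IOS XE template expects, it could be built with the standard ipaddress module as sketched here. The helper name mask_len_to_mask is hypothetical.

```python
import ipaddress


def mask_len_to_mask(mask_len: int) -> str:
    """Convert a prefix length such as 24 to a netmask such as 255.255.255.0."""
    return str(ipaddress.ip_network(f"0.0.0.0/{mask_len}").netmask)


# Lookup table covering every legal IPv4 prefix length, indexed the same
# way as self.ip_size_to_mask[site.mask_len] in the service code.
ip_size_to_mask = {mask_len: mask_len_to_mask(mask_len) for mask_len in range(33)}
```

JunOS takes the address and prefix length in one string, which is why the service code passes both MASK and MASK_LEN and lets each template pick the format it needs.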

Since you modeled the E router interface as a choice with one of many possible cases, getting the interface name actually requires a few lines on the code side. Since it was modeled as a choice, only one alternative can have a value. Example 10-13 shows the code that looks up which value is configured by the operator.

Example 10-13 Store Connect Service Code: The get_interface() Method

def get_interface(self,site):
    if bool(site.junos_interface):
        return site.junos_interface
    if bool(site.ios_ge_interface):
        return "GigabitEthernet"+site.ios_ge_interface
    return site.ietf_interface

The config_i_routers() method, shown in Example 10-14, is similar to the config_e_routers() method from Example 10-12 in many ways. Only here, the code first needs to figure out which stores the publisher should be connected to. This is done by looping over all stores and checking if the publisher has any interest tags in common with the store in question. If so, they are connected. Every time a store is connected, that is recorded as a connection to monitor.

At the end is a loop to count the number of stores that carry each of the publisher’s interest tags and then update the operational data to reflect this. The purpose is to detect any misspellings, or see how these numbers change, so that the publisher can change the interest tags over time.

Example 10-14 Store Connect Service Code: The config_i_routers() Method

def config_i_routers(self, publisher, vlan_id, root):
    mon = []
    for store in root.storeconnect__stores.store:
        connect = False # Assume no connection to this store
        # Names follow the YANG in Example 10-6: container target-stores,
        # list tag, leaf number-of-stores
        for tag in [x.tag for x in publisher.target_stores.tag]:
            if tag in store.tags:
                # This publisher targets a tag that is
                # carried by this store. Let's connect!
                connect = True
                break
        if connect:
            self.log.info('connecting store ', store.name, ' to publisher ', 
                          publisher.name)
            vars = ncs.template.Variables()
            vars.add('DEVICE', store.network.i_router)
            vars.add('INTERFACE', store.network.interface)
            vars.add('ADDRESS', store.network.address)
            vars.add('VLAN_ID', vlan_id)
            template = ncs.template.Template(publisher)
            template.apply('i-router-template', vars)
            mon += [("%s-%s-ext"%(publisher.name, store.name), 
                     store.network.address, vlan_id)]
    for tag in [x.tag for x in publisher.target_stores.tag]:
        publisher.target_stores.tag[tag].number_of_stores = len(
            [store for store in root.stores.store if tag in store.tags])
    return mon

Finally, the monitoring system needs to know about all the connections configured for E and I routers so that they are properly monitored for connectivity, quality of service, and asserting security policies. Example 10-15 shows the code that applies the template, once for each device configured by this service instance, so that everything configured is also monitored.

Example 10-15 Store Connect Service Code: The config_monitoring() Method

def config_monitoring(self, publisher, mon):
    self.log.info('setup monitoring for ', publisher.name, ': ', len(mon), ' legs')
    vars = ncs.template.Variables()
    template = ncs.template.Template(publisher)
    vars.add('DEVICE', 'm0')
    for (mon_name, address, vlan_id) in mon:
        vars.add('MON_NAME', mon_name)
        vars.add('ADDRESS', address)
        vars.add('VLAN_ID', vlan_id)
        template.apply('monitoring-template', vars)

Setting Up NETCONF on a Device

When a NETCONF-capable device is powered on, generally the NETCONF subsystem is not ready for use. It first needs to be configured. How this is done obviously varies with each vendor and device family, and it might change over time. Let’s take a brief look at a couple of different systems here to provide a general idea. Google is typically a good source for additional information of this kind.

On a JunOS device, after you install the necessary software and create a user with sufficient privileges, you need to create a crypto key pair. In a Linux environment, this could be accomplished as shown in Example 10-16.

Example 10-16 Creating an RSA Crypto Key Pair for Setting Up SSH Communications

$ ssh-keygen -t rsa -b 2048 -f mykey
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in mykey.
Your public key has been saved in mykey.pub.
The key fingerprint is:
SHA256:zQuqZpBsFydGdRoVBrUhKRrHBEGQ7Nmg7oINJII3eUw jlindbla@JLINDBLA-M-W0J2
The key's randomart image is:
+---[RSA 2048]----+
|o++=..==B.       |
|.o. E .* o       |
|+ +O .. .        |
|=+=.* .  o       |
|=o = +  S o      |
|..= .  . . .     |
|o+ o  .   .      |
|o.. o.           |
|.  o.            |
+----[SHA256]-----+
$ ls mykey*
mykey           mykey.pub
$

Once the key pair is created, log in to the device, install the public key (mykey.pub) on the device, and enable NETCONF, as shown in Example 10-17. The private key (mykey) stays on the management station.

Example 10-17 Installing the Crypto Key Pair and Enabling NETCONF on JunOS

edit system login user username authentication
set load-key-file sftp://.../mykey.pub
commit
edit system services
set netconf ssh
commit

On a Cisco IOS-XE device, the procedure is similar. First, a crypto key pair needs to be generated, Secure Shell (SSH) enabled, and the NETCONF-YANG subsystem started, as shown in Example 10-18.

Example 10-18 Installing the Crypto Key Pair and Enabling NETCONF on IOS-XE

crypto key generate rsa modulus 2048
ip ssh version 2
netconf ssh
netconf-yang

It is also possible to configure the system to use a key generated with ssh-keygen; just paste the public key under

ip ssh pubkey-chain, username username, key-string ...

Once the device is configured, it is easy to do a quick check to verify that the NETCONF subsystem is operational. The standard port for NETCONF, as assigned by IANA, is port 830. Some implementations also allow connecting to the NETCONF subsystem on port 22, the IANA assigned port for SSH. Use an ssh client to connect, as shown in the following snippet:

ssh username@device -p 830 -s netconf

The device should respond with a hello message, which might look something like Example 10-19.

Example 10-19 Hello Message Indicating the NETCONF Subsystem Is Operational

<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.0</capability>
<capability>urn:ietf:params:netconf:base:1.1</capability>
...
</capabilities>
<session-id>13</session-id></hello>]]>]]>

If this is what you see, close the SSH connection (Ctrl+D) and start playing with the device over NETCONF.
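The same check can be scripted. The following is a minimal sketch, not part of the bookzone project, that parses a hello message captured from such an SSH session and extracts the session ID and the capability list. It uses only the Python standard library; the function name parse_hello and the sample text are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
FRAME_END = "]]>]]>"  # end-of-message marker used by NETCONF 1.0 framing

# Sample input modeled on Example 10-19 (capability list shortened).
SAMPLE_HELLO = """<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.0</capability>
<capability>urn:ietf:params:netconf:base:1.1</capability>
</capabilities>
<session-id>13</session-id></hello>]]>]]>"""


def parse_hello(raw):
    """Return (session_id, capabilities) from a raw hello message string."""
    text = raw.strip()
    if text.endswith(FRAME_END):
        text = text[: -len(FRAME_END)]
    root = ET.fromstring(text.encode("utf-8"))
    if root.tag != f"{{{NETCONF_NS}}}hello":
        raise ValueError("not a NETCONF hello message")
    capabilities = [
        cap.text.strip()
        for cap in root.iter(f"{{{NETCONF_NS}}}capability")
        if cap.text
    ]
    session_id = root.findtext(f"{{{NETCONF_NS}}}session-id")
    return (int(session_id) if session_id is not None else None), capabilities


if __name__ == "__main__":
    sid, caps = parse_hello(SAMPLE_HELLO)
    print("session", sid)
    for cap in caps:
        print(" ", cap)
```

A script like this is handy as a smoke test after enabling NETCONF on a batch of devices: if parse_hello raises, the subsystem did not answer with a well-formed hello.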

Discovering What’s on a Device

Once you have a working connection to a NETCONF server, you can figure out what this system can do. There are three main mechanisms that NETCONF servers use to declare which YANG models they implement, which is the way NETCONF servers communicate what they do for a living.

The hello message lists all the device NETCONF capabilities, as well as all the YANG 1.0 modules that it supports, along with version and features for each module. Example 10-20 shows a short example listing some of the modules announced by a device. These module listings are often hundreds of lines long.

Example 10-20 Hello Message Listing Some YANG 1.0 Modules

<capability>http://yang.juniper.net/junos/conf/fabric?module=junos-conf-fabric&revision=2018-01-01</capability>
<capability>http://yang.juniper.net/junos/conf/firewall?module=junos-conf-firewall&revision=2018-01-01</capability>
<capability>http://yang.juniper.net/junos/conf/forwarding-options?module=junos-conf-forwarding-options&revision=2018-01-01</capability>
<capability>http://yang.juniper.net/junos/conf/interfaces?module=junos-conf-interfaces&revision=2018-01-01</capability>
<capability>http://yang.juniper.net/junos/conf/logical-systems?module=junos-conf-logical-systems&revision=2018-01-01</capability>
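Each of these capability strings packs the namespace, module name, and revision into one URI. A client usually wants them as separate fields. The sketch below shows one way to split them with the Python standard library; the function name parse_module_capability is hypothetical, and the input is assumed to be the capability text after XML parsing (so the separator is a plain & rather than the escaped form used on the wire).

```python
from urllib.parse import parse_qs


def parse_module_capability(capability):
    """Split a YANG 1.0 module capability into its parts.

    Returns None for capabilities that do not announce a module,
    such as urn:ietf:params:netconf:base:1.0.
    """
    namespace, _, query = capability.partition("?")
    params = parse_qs(query)
    if "module" not in params:
        return None
    features = params.get("features", [""])[0]
    return {
        "namespace": namespace,
        "module": params["module"][0],
        "revision": params.get("revision", [None])[0],
        # Features, when present, are a comma-separated list.
        "features": [name for name in features.split(",") if name],
    }


if __name__ == "__main__":
    cap = ("http://yang.juniper.net/junos/conf/interfaces"
           "?module=junos-conf-interfaces&revision=2018-01-01")
    print(parse_module_capability(cap))
```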

If one of the modules listed in hello is the ietf-netconf-monitoring module, as shown in the following snippet, additional information about the server may be retrieved:

<capability>urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring?module=ietf-netconf-monitoring&revision=2010-10-04</capability>

If supported, this module can tell you about the server capabilities, datastores, schemas (that is, modules), sessions, statistics, and available streams. Retrieve this information as shown in the following snippet; just add arguments for --host, --port, --user, and so on:

$ netconf-console --get --xpath /netconf-state

Example 10-21 shows what it could look like.

Example 10-21 NETCONF Server Information Reply (Abridged)

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
  <data>
    <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
      <capabilities>
        <capability>urn:ietf:params:netconf:base:1.0</capability>
        <capability>urn:ietf:params:netconf:base:1.1</capability>
...
      </capabilities>
      <datastores>
        <datastore>
          <name>running</name>
          <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring">1541-58483-523787</transaction-id>
        </datastore>
      </datastores>
      <schemas>
        <schema>
          <identifier>audiozone-example</identifier>
          <version>2018-01-09</version>
          <format>yang</format>
          <namespace>http://example.com/ns/audiozone</namespace>
          <location>NETCONF</location>
        </schema>
        <schema>
          <identifier>bookzone-example</identifier>
          <version>2018-01-05</version>
          <format>yang</format>
          <namespace>http://example.com/ns/bookzone</namespace>
          <location>NETCONF</location>
        </schema>
...
      </schemas>
      <sessions>
        <session>
          <session-id>15</session-id>
          <transport>netconf-ssh</transport>
          <username>admin</username>
          <source-host>127.0.0.1</source-host>
          <login-time>2018-11-01T08:59:15+01:00</login-time>
          <in-rpcs>1</in-rpcs>
          <in-bad-rpcs>0</in-bad-rpcs>
          <out-rpc-errors>0</out-rpc-errors>
          <out-notifications>0</out-notifications>
        </session>
      </sessions>
      <statistics>
        <netconf-start-time>2018-11-01T08:48:41+01:00</netconf-start-time>
        <in-bad-hellos>0</in-bad-hellos>
        <in-sessions>3</in-sessions>
        <dropped-sessions>0</dropped-sessions>
        <in-rpcs>4</in-rpcs>
        <in-bad-rpcs>0</in-bad-rpcs>
        <out-rpc-errors>0</out-rpc-errors>
        <out-notifications>0</out-notifications>
      </statistics>
      <streams xmlns="http://tail-f.com/yang/netconf-monitoring">
        <stream>
          <name>NETCONF</name>
          <description>default NETCONF event stream</description>
          <replay-support>false</replay-support>
        </stream>
        <stream>
          <name>Trader</name>
          <description>BookZone trading and delivery events</description>
          <replay-support>true</replay-support>
        </stream>
      </streams>
    </netconf-state>
  </data>
</rpc-reply>

Another interesting aspect of the schema (module) list is the RPC to download the actual YANG source for the module, as discussed in Chapter 7, “Automation Is as Good as the Data Models, Their Related Metadata, and the Tools: For the Network Architect and Operator.” This is an optional feature, so your device may or may not support this operation, and even if it does, it may not apply to every possible YANG file.

Downloading the YANG from the device is a nice way for a NETCONF client to understand exactly what the interface is like. YANG files are often downloaded from public Git sites, but by downloading directly from the device, you don’t have to go looking, so you’ll be sure you found the right version for this particular device.
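For reference, the download operation is the get-schema RPC defined in the ietf-netconf-monitoring module. The following sketch builds such a request for one entry of the schema list using only the Python standard library. The function name build_get_schema is an assumption for illustration, and sending the request (and framing it for the session) is left to whatever NETCONF client you use. Note that ElementTree emits generated namespace prefixes (ns0, ns1), which is valid XML but looks different from hand-written requests.

```python
import xml.etree.ElementTree as ET

BASE_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
MONITORING_NS = "urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring"


def build_get_schema(identifier, version=None, message_id="1"):
    """Return a get-schema request for one YANG module as an XML string."""
    rpc = ET.Element(f"{{{BASE_NS}}}rpc", {"message-id": str(message_id)})
    get_schema = ET.SubElement(rpc, f"{{{MONITORING_NS}}}get-schema")
    ET.SubElement(get_schema, f"{{{MONITORING_NS}}}identifier").text = identifier
    if version:
        # The version corresponds to the module revision in the schema list.
        ET.SubElement(get_schema, f"{{{MONITORING_NS}}}version").text = version
    ET.SubElement(get_schema, f"{{{MONITORING_NS}}}format").text = "yang"
    return ET.tostring(rpc, encoding="unicode")


if __name__ == "__main__":
    # Identifier and version taken from the schema list in Example 10-21.
    print(build_get_schema("bookzone-example", "2018-01-05", message_id=4))
```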

Because the hello message is getting so long on many devices, YANG 1.1 (RFC 7950) prescribes a different hello behavior for YANG 1.1 modules. Instead of listing all the YANG 1.1 modules as capabilities in hello together with the YANG 1.0 modules, only a single capability is listed in lieu of all YANG 1.1 modules:

<capability>urn:ietf:params:netconf:capability:yang-library:1.0?revision=2016-06-21&module-set-id=2c6ee52de6f4e3db52497342fb3cc282</capability>

When this capability is present in the hello message, the system may have some YANG 1.1 modules. Note the module-set-id parameter in the snippet just mentioned. By tracking the module-set-id in the hello message, the NETCONF client can tell whether or not rereading the ietf-yang-library module list is required. If the module-set-id hasn’t changed, the set of modules hasn’t changed either. Use the following command to query the ietf-yang-library module list (add --user, and so on):

$ netconf-console --get --xpath /modules-state

The reply might look like Example 10-22. The conformance-type in the following reply is either implement or import, where implement means the module is really implemented on the server, while import means the module isn’t implemented. The module is still included on the system because some groupings or types were imported from that module, so it isn’t possible to compile the YANG modules without it.

Example 10-22 List of Modules Supported by the Server Taken from ietf-yang-library

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
  <data>
    <modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library">
      <module-set-id>7aec0b1b1d4e5783ff4d305475e6e92c</module-set-id>
      <module>
        <name>audiozone-example</name>
        <revision>2018-01-09</revision>
        <namespace>http://example.com/ns/audiozone</namespace>
        <conformance-type>implement</conformance-type>
      </module>
      <module>
        <name>bookzone-example</name>
        <revision>2018-01-05</revision>
        <namespace>http://example.com/ns/bookzone</namespace>
        <conformance-type>implement</conformance-type>
      </module>
      <module>
        <name>iana-crypt-hash</name>
        <revision>2014-08-06</revision>
        <namespace>urn:ietf:params:xml:ns:yang:iana-crypt-hash</namespace>
        <feature>crypt-hash-md5</feature>
        <feature>crypt-hash-sha-256</feature>
        <feature>crypt-hash-sha-512</feature>
        <conformance-type>import</conformance-type>
      </module>
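A client typically reduces such a reply to two things: the module-set-id to cache, and the split between implemented and import-only modules. The sketch below illustrates this with the Python standard library. The function names and the sample reply (a completed, shortened variant of Example 10-22) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

YANGLIB_NS = "urn:ietf:params:xml:ns:yang:ietf-yang-library"

SAMPLE_REPLY = """
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
  <data>
    <modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library">
      <module-set-id>7aec0b1b1d4e5783ff4d305475e6e92c</module-set-id>
      <module>
        <name>bookzone-example</name>
        <revision>2018-01-05</revision>
        <namespace>http://example.com/ns/bookzone</namespace>
        <conformance-type>implement</conformance-type>
      </module>
      <module>
        <name>iana-crypt-hash</name>
        <revision>2014-08-06</revision>
        <namespace>urn:ietf:params:xml:ns:yang:iana-crypt-hash</namespace>
        <conformance-type>import</conformance-type>
      </module>
    </modules-state>
  </data>
</rpc-reply>
"""


def summarize_modules_state(reply_xml):
    """Return (module_set_id, implemented_names, import_only_names)."""
    root = ET.fromstring(reply_xml)
    state = next(root.iter(f"{{{YANGLIB_NS}}}modules-state"), None)
    if state is None:
        raise ValueError("reply contains no modules-state")
    module_set_id = state.findtext(f"{{{YANGLIB_NS}}}module-set-id")
    implemented, import_only = [], []
    for module in state.findall(f"{{{YANGLIB_NS}}}module"):
        name = module.findtext(f"{{{YANGLIB_NS}}}name")
        conformance = module.findtext(f"{{{YANGLIB_NS}}}conformance-type")
        (implemented if conformance == "implement" else import_only).append(name)
    return module_set_id, implemented, import_only


def module_list_is_stale(cached_set_id, advertised_set_id):
    """True when the module list must be read again from the device."""
    return cached_set_id != advertised_set_id


if __name__ == "__main__":
    print(summarize_modules_state(SAMPLE_REPLY))
```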

Another important thing a NETCONF manager often needs to know is whether someone else changed the configuration since the last time the manager was connected. Of course, one way to find out is to issue a full get-config and compare it with a saved result from earlier. Some devices support leaner mechanisms, such as a transaction ID or a timestamp of the last change. By simply reading the transaction ID or timestamp and comparing it with the latest known value, the manager can quickly discover if any out-of-band (OOB) changes took place.

There are several proprietary mechanisms for this. One of the more common ones is shown in Example 10-23 for reference. The manager sends a get request for the transaction-id leaf, augmented into the datastore list in the ietf-netconf-monitoring module, and looks at the running datastore.

Example 10-23 Manager Checks the Latest transaction-id on a Device

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="3">
  <get xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <filter>
      <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
        <datastores>
          <datastore>
            <name>running</name>
            <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring"/>
          </datastore>
        </datastores>
      </netconf-state>
    </filter>
  </get>
</rpc>

A device that supports this augmented leaf might reply as shown in Example 10-24.

Example 10-24 Device Returns Latest transaction-id

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="3">
  <data>
    <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
      <datastores>
        <datastore>
          <name xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">running</name>
          <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring">1540-997164-482246</transaction-id>
        </datastore>
      </datastores>
    </netconf-state>
  </data>
</rpc-reply>
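On the manager side, the comparison itself is simple. The following sketch, which assumes a device that supports the proprietary transaction-id leaf shown above, extracts the value for the running datastore from a reply and compares it with the last value the manager recorded. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MONITORING_NS = "urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring"
TAILF_MONITORING_NS = "http://tail-f.com/yang/netconf-monitoring"

# Sample reply modeled on Example 10-24.
SAMPLE_REPLY = """
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="3">
  <data>
    <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
      <datastores>
        <datastore>
          <name>running</name>
          <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring">1540-997164-482246</transaction-id>
        </datastore>
      </datastores>
    </netconf-state>
  </data>
</rpc-reply>
"""


def running_transaction_id(reply_xml):
    """Return the transaction-id of the running datastore, or None."""
    root = ET.fromstring(reply_xml)
    for datastore in root.iter(f"{{{MONITORING_NS}}}datastore"):
        name = (datastore.findtext(f"{{{MONITORING_NS}}}name") or "").strip()
        if name == "running":
            return datastore.findtext(f"{{{TAILF_MONITORING_NS}}}transaction-id")
    return None


def changed_out_of_band(last_known_id, reply_xml):
    """True if the device configuration changed since last_known_id."""
    current = running_transaction_id(reply_xml)
    if current is None:
        raise ValueError("device did not report a transaction-id")
    return current != last_known_id


if __name__ == "__main__":
    print(changed_out_of_band("1540-997164-482246", SAMPLE_REPLY))
```

When the device does not return the leaf, the sketch raises instead of guessing; a real manager would fall back to a full get-config comparison in that case.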

With this, you’re ready to start your automation journey. To spark your imagination, the next section looks at what a small automation solution looks like, and how it behaves on the network.

Managing Services

How to set up an orchestration environment that allows YANG-based service configuration is beyond the scope of this book, but seeing how services are used and what they look like on a NETCONF level is central to the mission of this book.

Let’s say you have the network setup described in the beginning of this chapter up and running. That is, you have three E routers (ej0, ej1, ec0), three I routers (ic0, ic1, ij0), a monitoring system (m0), a core network (ignored here), three publishers, four stores, and an NSO-based system orchestrating it all. On top of the NSO platform, you are running the Store Connect service application shown earlier in this chapter. This application contains a service YANG module, templates, and service code. It’s now time to actually use this service.

As your tour begins, note that the NSO orchestrator was already installed and configured with operator logins and device YANG modules as well as loaded with the Store Connect service package. All the devices already mentioned were added to the device list and synchronized. This means that NSO has a full copy of the configuration of every managed device in memory. NSO only manages devices that are configured in the NSO device list.

When you log in to the NSO command-line interface (CLI), there are initially no stores or publishers configured. A quick way to fix that is to load some configuration data from a file that someone else prepared. To see what’s about to change by way of the loaded file, the show c command, short for show configuration, displays the uncommitted configuration changes. Example 10-25 shows the initial data loading in a CLI interaction toward the NSO orchestrator.

Example 10-25 Loading Stores’ Configuration Data

admin connected from 127.0.0.1 using console on JLINDBLA-M-W0J2
admin@ncs# show running-config stores
% No entries found.
admin@ncs# show running-config publishers
% No entries found.
admin@ncs# con
Entering configuration mode terminal
admin@ncs(config)# load merge store
Possible completions:
  <filename>  storedstate  stores-init.xml
admin@ncs(config)# load merge stores-init.xml
Loading.
1.38 KiB parsed in 0.01 sec (77.27 KiB/sec)
admin@ncs(config)# show c
stores store Singoalla
 network address 10.0.0.3
 network i-router ij0
 network interface ge-0/0/1
 tags [ english french nobel small sweden ]
!
stores store Took-Look
 network address 10.0.0.4
 network i-router ij0
 network interface ge-0/0/0
 tags [ belgium english large ]
!
stores store Varnes-Soble
 network address 10.0.0.1
 network i-router ic0
 network interface GigabitEthernet0/0/2
 tags [ crime english large science usa ]
!
stores store Yoihon
 network address 10.0.0.2
 network i-router ic1
 network interface GigabitEthernet0/1/1
 tags [ english japan japanese manga nobel science ]
!

Next, you need to load some initial publisher configs into the same transaction. Each publisher is a separate service instance the way you modeled this. You are about to create three new service instances in a single transaction. To save some space, we’ll only look at one of them here. The other two are similar. Example 10-26 shows the CLI interaction toward the central orchestrator.

Example 10-26 Loading Publishers’ Configuration Data

admin@ncs(config)# load merge publishers-init.xml
Loading.
2.02 KiB parsed in 0.04 sec (44.81 KiB/sec)
admin@ncs(config)# show full-configuration publishers publisher
Possible completions:
  Astrakan-Media   Name of publisher connecting
  Best-Books       Name of publisher connecting
  Culture-Froide   Name of publisher connecting
  <cr>
Possible match completions:
  network  target-store
admin@ncs(config)# show full-configuration publishers publisher Astrakan-Media
publishers publisher Astrakan-Media
 network site 1
  address         172.20.1.1
  mask-len        24
  e-router        ej0
  junos-interface ge-0/0/1
 !
 target-store tags belgium
 !
 target-store tags english
 !
 target-store tags french
 !
!

With all these changes loaded into the current transaction, it’s time to commit them. Before you do, check what the system would do if these changes were committed. The command to show that is called commit dry-run. Plus signs indicate additions; minus signs indicate removals. Note that these changes are going to many different devices of different brands, YANG models, and roles.

Since the full output is about 350 lines long, Example 10-27 just shows a sampling to give you a taste of what it looks like. Seven devices are touched by this network-wide transaction.

Example 10-27 A commit dry-run to Inspect the Service Footprint on Devices (Abridged)

admin@ncs(config)# commit dry-run
cli {
    local-node {
        data  devices {
                  device ec0 {
                      config {
                          ios:native {
                              vrf {
             +                    definition Culture-Froide {
             +                        rd 300:2620;
...
             +                        route-target {
             +                            export 300:2620;
             +                            import 300:2620;
             +                        }
...
                  device ej0 {
                      config {
                          jc:configuration {
                              routing-instances {
             +                    instance Astrakan-Media {
             +                        instance-type vrf;
             +                        interface ge-0/0/1.2246;
             +                        route-distinguisher {
             +                            rd-type 300:2246;
             +                        }
             +                        vrf-import [ Astrakan-Media-IMP ];
             +                        vrf-export [ Astrakan-Media-EXP ];
...
                  device ij0 {
                      config {
                          jc:configuration {
                              routing-instances {
             +                    instance Astrakan-Media {
             +                        instance-type vrf;
             +                        interface ge-0/0/0.2246;
             +                        interface ge-0/0/1.2246;
             +                        route-distinguisher {
             +                            rd-type 300:2246;
...
                  device m0 {
                      config {
                          netrounds-ncc:accounts {
             +                account bookzone {
             +                    monitors {
             +                        monitor Astrakan-Media {
             +                            description "connectivity with standard qos";
             +                            template connectivity-std-qos;
...
             +                    twamp-reflectors {
             +                        twamp-reflector Astrakan-Media-1-int {
             +                            address 172.20.1.1;
             +                            port 6789;
             +                        }
             +                        twamp-reflector Astrakan-Media-Singoalla-ext {
             +                            address 10.0.0.3;
             +                            port 6789;
             +                        }

This looks good, so let’s commit it. Now at commit time, the managed devices hear about the change for the first time:

admin@ncs(config)# commit
Commit complete.
admin@ncs(config)#

During the commit, the service-level configuration change was computed to include the device-level changes; then the result was validated, written to the database, and finally communicated to all participating devices in a network-wide transaction.

We’ll inspect the NETCONF messages between the client (orchestrator) and servers (all the devices) involved in the network-wide transaction in the next section. If any device has an issue with the configuration change, the entire transaction is aborted. No device has activated any part of the change at that point, so there should be zero service interruptions.

Another reason the orchestrator might abort the transaction is if it finds that a device configuration changed since last being synchronized, usually because of OOB changes (for example, by manual intervention by an operator on the device console or by action from another automation system). Many strategies for handling multiple managers in an automated process are possible. Generally speaking, automation gets easier the fewer managers (unaware of each other) that are involved—much like with your own work situation.

Since you committed a change, at this point it might be interesting to have a look at your running services from an operational point of view. Look at Example 10-28 and notice which devices are being touched by one particular service instance. Also notice the service’s operational data (that is, the allocated-vlan and the list of store tags that this publisher targets) and how many stores each tag connects to.

Example 10-28 Showing a Service Instance with Operational State

admin@ncs(config)# do show publishers
publishers publisher Astrakan-Media
 modified devices [ ej0 ic0 ic1 ij0 m0 ]
 directly-modified devices [ ej0 ic0 ic1 ij0 m0 ]
 device-list [ ej0 ic0 ic1 ij0 m0 ]
 network allocated-vlan 2246
         NUMBER
         OF
         STORES
         WITH
TAG      TAG
-----------------
belgium  1
english  4
french   1
...

At a later time, it might be interesting to go back and see exactly which device-level changes a particular service instance has incurred. The command publishers publisher Astrakan-Media get-modifications shows just that, but the output looks just like the dry-run output shown in Example 10-27, so it is not repeated here.

Now the services are up and running on the seven managed devices. What next? Let’s see what happens if the publisher wants to modify the service. Say one of the publishers wants to drop the very generic store tag english. The tag is removed with a simple no command.

The orchestrator knows just what the service created last time on the devices (and any lower-layer services). When the create() method runs again now, the orchestrator notices that the method no longer creates some of the connection objects it created on the previous run. The orchestrator updates the transaction to remove exactly that diff. A commit dry-run shows what will happen, as shown in Example 10-29. The change affects three devices: ic0, ic1, and m0.

Example 10-29 Showing What Would Happen if a Particular Store Tag Was Removed

admin@ncs(config)# no publishers publisher Astrakan-Media target-store tags english
admin@ncs(config)# commit dry-run
cli {
    local-node {
        data  devices {
                  device ic0 {
                      config {
                          ios:native {
                              vrf {
             -                    definition Astrakan-Media {
             -                        rd 300:2246;
...
             -                        route-target {
             -                            export 300:2246;
             -                            import 300:2246;
...
                  device ic1 {
                      config {
                          ios:native {
                              vrf {
             -                    definition Astrakan-Media {
             -                        rd 300:2246;
...
             -                        route-target {
             -                            export 300:2246;
             -                            import 300:2246;
             -                        }
...
                  device m0 {
                      config {
                          netrounds-ncc:accounts {
                              account bookzone {
                                  twamp-reflectors {
             -                        twamp-reflector Astrakan-Media-Varnes-Soble-ext {
             -                            address 10.0.0.1;
             -                            port 6789;
             -                        }
             -                        twamp-reflector Astrakan-Media-Yoihon-ext {
             -                            address 10.0.0.2;
             -                            port 6789;
             -                        }
                                  }
                              }
                          }
                      }
                  }
              }
              publishers {
                  publisher Astrakan-Media {
                      target-store {
             -            tags english {
             -            }
                      }
                  }
              }
    }
}
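The idea behind this minimal diff can be illustrated with a toy model. The sketch below is not how NSO is implemented; it simply treats the output of each create() run as a dictionary of configuration paths and values, and computes what must be removed and what must be set to get from the previous output to the new one. The path strings and the Took-Look and Singoalla reflector names are made up for illustration, following the naming pattern used by the service code.

```python
_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def service_diff(previous, current):
    """Return (paths_to_remove, values_to_set) between two create() outputs."""
    to_remove = sorted(path for path in previous if path not in current)
    to_set = {
        path: value
        for path, value in current.items()
        if previous.get(path, _MISSING) != value
    }
    return to_remove, to_set


# Toy output of create() with tags belgium, english, and french.
PREVIOUS = {
    "ic0/vrf/Astrakan-Media": "300:2246",
    "ic1/vrf/Astrakan-Media": "300:2246",
    "ij0/routing-instance/Astrakan-Media": "300:2246",
    "m0/twamp-reflector/Astrakan-Media-Varnes-Soble-ext": "10.0.0.1",
    "m0/twamp-reflector/Astrakan-Media-Yoihon-ext": "10.0.0.2",
    "m0/twamp-reflector/Astrakan-Media-Singoalla-ext": "10.0.0.3",
    "m0/twamp-reflector/Astrakan-Media-Took-Look-ext": "10.0.0.4",
}

# Toy output after the english tag is removed: the stores reached only
# through that tag (Varnes-Soble and Yoihon) are no longer connected.
CURRENT = {
    "ij0/routing-instance/Astrakan-Media": "300:2246",
    "m0/twamp-reflector/Astrakan-Media-Singoalla-ext": "10.0.0.3",
    "m0/twamp-reflector/Astrakan-Media-Took-Look-ext": "10.0.0.4",
}

if __name__ == "__main__":
    removed, changed = service_diff(PREVIOUS, CURRENT)
    for path in removed:
        print("-", path)
    print("devices touched:", sorted({path.split("/")[0] for path in removed}))
```

In this toy model, the removals land on ic0, ic1, and m0, the same three devices as in the dry-run output, while ij0 is left untouched because its part of the service output did not change.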

This matches the publisher’s expectations, so you can commit. Now look at the service’s operational state in Example 10-30. The five devices touched by this service instance are now down to three. All configuration from this service instance on ic0 and ic1 is gone, so they are no longer part of the modified devices list for this service instance. The m0 device still has some configuration, so it is still listed.

Example 10-30 Showing a Service Instance with Operational State after a Particular Store Tag Was Removed

admin@ncs(config)# commit
Commit complete.
admin@ncs(config)# do show publishers publisher Astrakan-Media
publishers publisher Astrakan-Media
 modified devices [ ej0 ij0 m0 ]
 directly-modified devices [ ej0 ij0 m0 ]
 device-list [ ej0 ij0 m0 ]
 network allocated-vlan 2246
         NUMBER
         OF
         STORES
         WITH
TAG      TAG
-----------------
belgium  1
french   1

Great. This was a productive morning. But after a change like this, a not entirely unlikely event is a phone call from someone who thinks the latest change wasn’t such a good idea after all (sometimes expressed quite differently). In order to restore the service, either the latest transaction, all transactions back to a given point in time, or some specific set of transactions must be undone. The rollback command does that. The rollback configuration command creates a new transaction with content that undoes the previous transaction. A commit dry-run shows all the details, but in the interest of time, let’s skip this. There is nothing special about a rollback transaction. It’s simply yet another transaction that happens to make the configuration the same as (or similar to, in case of cherry-picked transactions to roll back) an earlier provisioned configuration. The rollback command sequence is shown in Example 10-31.

Example 10-31 Rolling Back a Configuration Change

admin@ncs(config)# rollback configuration
admin@ncs(config)# show c
publishers publisher Astrakan-Media
 target-store tags english
 !
!
admin@ncs(config)# comm
Commit complete.
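To make the point that a rollback is just another transaction, consider the following toy model. It is a sketch under simplifying assumptions, not NSO's rollback file format: a transaction is recorded as a list of (path, old value, new value) entries, where None means the path does not exist, and the rollback is obtained by swapping old and new. All names and path strings are hypothetical.

```python
def apply_changes(config, changes):
    """Return a new configuration dict with the changes applied."""
    result = dict(config)
    for path, _old, new in changes:
        if new is None:
            result.pop(path, None)  # the change removes this path
        else:
            result[path] = new
    return result


def invert_changes(changes):
    """Build the rollback transaction: the same changes, reversed and swapped."""
    return [(path, new, old) for path, old, new in reversed(changes)]


if __name__ == "__main__":
    tag = "publishers/publisher[Astrakan-Media]/target-store/tags[english]"
    before = {tag: True}
    remove_tag = [(tag, True, None)]       # the "no ... tags english" commit

    after = apply_changes(before, remove_tag)
    restored = apply_changes(after, invert_changes(remove_tag))
    print(after)     # the tag is gone
    print(restored)  # the tag is back
```

Because the rollback is an ordinary transaction, it goes through the same validation and network-wide commit as any other change.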

Manager Synchronization with Devices

Having observed the service-level management flow from the operator’s perspective, it’s now time to drop down to the underlying foundations and observe the same flow from a protocol perspective. Here, assume all devices involved communicate over NETCONF. In the real world, there is often a mix of different protocols and even among NETCONF devices, capabilities vary.

When the operator configures the orchestrator (NSO) with the devices to be managed, the orchestrator connects to them to find out what kind of devices they are and to synchronize with them. The orchestrator starts by sending a hello message to each of the seven devices in its device list, as shown in Example 10-32.

Example 10-32 Manager Sends Hello Message

<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <capabilities>
    <capability>urn:ietf:params:netconf:base:1.0</capability>
    <capability>urn:ietf:params:netconf:base:1.1</capability>
  </capabilities>
</hello>

Each device responds to the hello message; the response from device m0 is shown in Example 10-33.

Example 10-33 Manager Receives Hello Messages (Abridged)

<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <capabilities>
    <capability>urn:ietf:params:netconf:base:1.0</capability>
    <capability>urn:ietf:params:netconf:base:1.1</capability>
    <capability>urn:ietf:params:netconf:capability:candidate:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:confirmed-commit:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:confirmed-commit:1.1</capability>
    <capability>urn:ietf:params:netconf:capability:validate:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:validate:1.1</capability>
    <capability>urn:ietf:params:netconf:capability:rollback-on-error:1.0</capability>
...
    <capability>urn:ietf:params:xml:ns:yang:ietf-yang-library?module=ietf-yang-library&revision=2016-06-21</capability>

Fairly similar responses from ej0, ej1, ec0, ic0, ic1, and ij0 were omitted. The manager takes note of the capabilities of each device. Here, all devices support the NETCONF base protocol, the candidate datastore, configuration validation (separate from activation), and rollback-on-error (that is, transactions). That’s nice, because all of those are required for a device to be able to participate in a network-wide transaction. All devices also support confirmed-commit, which allows you to use the most powerful form of network-wide transactions.

If one or a few devices are missing one or several of these capabilities, the manager can still go ahead with the devices that have the required support and handle those that don’t in a best-effort kind of way at the end of the PREPARE phase of the transaction. This is not a full network-wide transaction, but it’s significantly more reliable than a pure best-effort, scripted solution.
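The capability bookkeeping described above can be sketched as follows. This is an illustration only, not the orchestrator's actual logic: it strips the version suffix from each standard capability URI and sorts devices into those that can take part in a network-wide transaction (with or without confirmed-commit) and those that must be handled best effort. The device name legacy0 and the category labels are made up for the example.

```python
CAPABILITY_PREFIX = "urn:ietf:params:netconf:capability:"

# Capabilities the text lists as required for a network-wide transaction
# (in addition to the NETCONF base protocol).
REQUIRED = {"candidate", "validate", "rollback-on-error"}


def standard_capabilities(capabilities):
    """Return capability short names, e.g. 'candidate' for '...:candidate:1.0'."""
    names = set()
    for capability in capabilities:
        if capability.startswith(CAPABILITY_PREFIX):
            rest = capability[len(CAPABILITY_PREFIX):].split("?", 1)[0]
            names.add(rest.split(":", 1)[0])  # drop the ":1.0" / ":1.1" version
    return names


def classify_devices(device_capabilities):
    """Map each device name to how it can participate in a transaction."""
    result = {}
    for device, capabilities in device_capabilities.items():
        names = standard_capabilities(capabilities)
        if not REQUIRED <= names:
            result[device] = "best-effort"
        elif "confirmed-commit" in names:
            result[device] = "transaction+confirmed-commit"
        else:
            result[device] = "transaction"
    return result


# Capabilities of m0 as listed in Example 10-33.
M0_CAPABILITIES = [
    "urn:ietf:params:netconf:base:1.0",
    "urn:ietf:params:netconf:base:1.1",
    "urn:ietf:params:netconf:capability:candidate:1.0",
    "urn:ietf:params:netconf:capability:confirmed-commit:1.0",
    "urn:ietf:params:netconf:capability:confirmed-commit:1.1",
    "urn:ietf:params:netconf:capability:validate:1.0",
    "urn:ietf:params:netconf:capability:validate:1.1",
    "urn:ietf:params:netconf:capability:rollback-on-error:1.0",
]

if __name__ == "__main__":
    print(classify_devices({
        "m0": M0_CAPABILITIES,
        "legacy0": ["urn:ietf:params:netconf:base:1.0"],  # hypothetical device
    }))
```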

Looking at the hello response from the devices, the orchestrator notes that some devices announce the ietf-yang-library module capability. For those devices, the manager reads the list of modules on the device via a get operation on the modules-state list in the ietf-yang-library. Example 10-34 shows the request toward the m0 device.

Example 10-34 Manager Reads ietf-yang-library modules-state

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="1">
  <get>
    <filter>
      <modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library"/>
    </filter>
  </get>
</rpc>

Similar requests are sent to the other devices supporting the ietf-yang-library. Example 10-35 shows the response from device m0.

Example 10-35 Device Responding with Supported Modules by Listing modules-state (Abridged)

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="1">
  <data>
    <modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library">
      <module-set-id>61e5be5ab84c9d7b2db0bef3aca036b7</module-set-id>
      <module>
        <name>iana-crypt-hash</name>
        <revision>2014-08-06</revision>

A similar response arrives from all other devices that support the ietf-yang-library (omitted here).
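
On the manager side, turning such a reply into a list of module names and revisions takes little code. The following is a minimal sketch using only the Python standard library. The function name is an assumption, and the sample reply is the abridged listing from Example 10-35 with its open elements closed by hand so that it parses. A real reply carries many more modules and more leaves per module.

```python
import xml.etree.ElementTree as ET

NS = {
    "nc": "urn:ietf:params:xml:ns:netconf:base:1.0",
    "yl": "urn:ietf:params:xml:ns:yang:ietf-yang-library",
}

# Abridged reply from Example 10-35, closed by hand for illustration.
REPLY = """<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="1">
  <data>
    <modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library">
      <module-set-id>61e5be5ab84c9d7b2db0bef3aca036b7</module-set-id>
      <module>
        <name>iana-crypt-hash</name>
        <revision>2014-08-06</revision>
      </module>
    </modules-state>
  </data>
</rpc-reply>"""


def parse_modules_state(reply_xml):
    """Return (module-set-id, [(name, revision), ...]) from a modules-state reply."""
    state = ET.fromstring(reply_xml).find("nc:data/yl:modules-state", NS)
    if state is None:
        return None, []
    set_id = state.findtext("yl:module-set-id", default="", namespaces=NS)
    modules = [
        (
            module.findtext("yl:name", default="", namespaces=NS),
            module.findtext("yl:revision", default="", namespaces=NS),
        )
        for module in state.findall("yl:module", NS)
    ]
    return set_id, modules


set_id, modules = parse_modules_state(REPLY)
print(set_id, modules)
```

Storing the module-set-id alongside the list lets a manager notice later that the set of modules on a device has changed without re-reading the whole list.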

Next, the manager needs to synchronize the configuration data from each device to its own database. Example 10-36 is the request sent to device m0. Someone configured the manager so that it only cares about a single namespace on m0, so the manager issues a get-config to retrieve only that one YANG namespace to save time and memory.

Example 10-36 Manager Synchronizing the Configuration from One Device

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="2">
  <get-config>
    <source>
      <running/>
    </source>
    <filter>
      <accounts xmlns="http://com/netrounds/ncc"/>
    </filter>
  </get-config>
</rpc>

Toward one of the other devices, the manager is configured to handle a few more YANG namespaces. Example 10-37 is the request toward ec0.

Example 10-37 Manager Synchronizing the Configuration from Another Device

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="2">
  <get-config>
    <source>
      <running/>
    </source>
    <filter>
      <mdt-subscriptions xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-mdt-cfg"/>
      <mpls-ldp xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-mpls-ldp"/>
      <native xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-native"/>
      <netconf-yang xmlns="http://cisco.com/yang/cisco-self-mgmt"/>
      <pseudowire-config xmlns="urn:cisco:params:xml:ns:yang:pw"/>
      <mpls-static xmlns="urn:ietf:params:xml:ns:yang:common-mpls-static"/>
      <classifiers xmlns="urn:ietf:params:xml:ns:yang:ietf-diffserv-classifier"/>
      <policies xmlns="urn:ietf:params:xml:ns:yang:ietf-diffserv-policy"/>
      <filters xmlns="urn:ietf:params:xml:ns:yang:ietf-event-notifications"/>
      <subscription-config xmlns="urn:ietf:params:xml:ns:yang:ietf-event-notifications"/>
      <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces"/>
      <key-chains xmlns="urn:ietf:params:xml:ns:yang:ietf-key-chain"/>
      <routing xmlns="urn:ietf:params:xml:ns:yang:ietf-routing"/>
      <nvo-instances xmlns="urn:ietf:params:xml:ns:yang:nvo"/>
    </filter>
  </get-config>
</rpc>

A similar request goes to all other devices (omitted here). Each device responds with its configuration (also omitted here).
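
The two requests differ only in the list of top-level elements placed inside the filter, so they are easy to generate from a per-device list. The sketch below assumes such a list of (element, namespace) pairs is available. The function name and that data shape are made up for this example. Python's ElementTree picks its own namespace prefixes (ns0, ns1, and so on) when serializing, so the output looks different from the listings above but is equivalent XML.

```python
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_get_config(message_id, subtrees, source="running"):
    """Build a get-config whose filter holds one empty element per subtree.

    subtrees is a list of (element name, namespace) pairs, for example the
    namespaces the manager is configured to care about on one device.
    """
    rpc = ET.Element(f"{{{NC_NS}}}rpc", {"message-id": str(message_id)})
    get_config = ET.SubElement(rpc, f"{{{NC_NS}}}get-config")
    src = ET.SubElement(get_config, f"{{{NC_NS}}}source")
    ET.SubElement(src, f"{{{NC_NS}}}{source}")
    flt = ET.SubElement(get_config, f"{{{NC_NS}}}filter")
    for tag, namespace in subtrees:
        ET.SubElement(flt, f"{{{namespace}}}{tag}")
    return ET.tostring(rpc, encoding="unicode")


# The single subtree requested from m0 in Example 10-36.
print(build_get_config(2, [("accounts", "http://com/netrounds/ncc")]))
```

Passing the longer list from Example 10-37 would produce the request toward ec0 in the same way.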

On devices that support this, the manager reads and stores the transaction ID that is current on that device, to make it possible to later quickly check if the configuration of the device was altered. The request for the transaction-id is shown in Example 10-38.

Example 10-38 Manager Requesting transaction-id from a Device

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="3">
  <get xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <filter>
      <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
        <datastores>
          <datastore>
            <name>running</name>
            <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring"/>
          </datastore>
        </datastores>
      </netconf-state>
    </filter>
  </get>
</rpc>

Each device that is asked responds. The previous request was sent to m0, which responds as shown in Example 10-39. (The other device responses have been omitted.)

Example 10-39 Device Responding with transaction-id

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="3">
  <data>
    <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
      <datastores>
        <datastore>
          <name xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">running</name>
          <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring">1540-997218-679912</transaction-id>
        </datastore>
      </datastores>
    </netconf-state>
  </data>
</rpc-reply>
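
The stored transaction-id makes the later "did anything change?" check a simple string comparison. Here is a minimal sketch of that idea, using the reply from Example 10-39 as input. The function names are assumptions for this example, and a real manager may well apply more checks than this.

```python
import xml.etree.ElementTree as ET

NS = {
    "nc": "urn:ietf:params:xml:ns:netconf:base:1.0",
    "ncm": "urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring",
    "tf": "http://tail-f.com/yang/netconf-monitoring",
}

# Reply from m0, as in Example 10-39.
REPLY = """<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="3">
  <data>
    <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
      <datastores>
        <datastore>
          <name xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">running</name>
          <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring">1540-997218-679912</transaction-id>
        </datastore>
      </datastores>
    </netconf-state>
  </data>
</rpc-reply>"""


def running_transaction_id(reply_xml):
    """Return the transaction-id of the running datastore, or None if absent."""
    root = ET.fromstring(reply_xml)
    path = "nc:data/ncm:netconf-state/ncm:datastores/ncm:datastore"
    for datastore in root.iterfind(path, NS):
        name = datastore.findtext("ncm:name", default="", namespaces=NS)
        if name.strip() == "running":
            return datastore.findtext("tf:transaction-id", namespaces=NS)
    return None


def in_sync(stored_id, reply_xml):
    """True if the device still reports the transaction-id the manager stored."""
    current = running_transaction_id(reply_xml)
    return current is not None and current == stored_id


print(in_sync("1540-997218-679912", REPLY))
```

If the values differ, someone or something else changed the device, and a full synchronization is needed before the manager's copy can be trusted again.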

At this point, the manager has retrieved the basic information it needs about the devices and has no pending work for them, so it closes the connection to each device. It would work fine to simply close the connection, but sending a polite close-session message, as shown in Example 10-40, makes everybody understand that the disconnection was intentional.

Example 10-40 Manager Closing the Connection to All Devices

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="4">
  <close-session/>
</rpc>

Network-Wide Transactions

Let’s look at the NETCONF exchange that takes place when the operator commits one of the transactions discussed earlier in this chapter.

First, the manager opens up the connection to all relevant devices and sends out a hello message. The hello response is compared with what the manager already knows about each device. If nothing changed, the manager proceeds to send out the configuration changes computed for each device in a few steps.

Example 10-41 shows the messages sent to device m0. These requests are sent in rapid succession, without waiting for a reply in between. These messages ask the device to clear the candidate datastore, lock it to prevent others from using it, and get a readout of the latest transaction-id. Similar messages are sent to all the other devices involved in this transaction in parallel.

Example 10-41 Manager Preparing a Device for a Configuration Change

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="1">
  <discard-changes/>
</rpc>
...
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="2">
  <lock>
    <target>
      <candidate/>
    </target>
  </lock>
</rpc>
...
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="3">
  <get xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <filter>
      <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
        <datastores>
          <datastore>
            <name>running</name>
            <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring"/>
          </datastore>
        </datastores>
      </netconf-state>
    </filter>
  </get>
</rpc>

The m0 device responds as shown in Example 10-42.

Example 10-42 Device Responds with ok and the Latest transaction-id

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="1">
  <ok/>
</rpc-reply>
...
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="2">
  <ok/>
</rpc-reply>
...
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="3">
  <data>
    <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
      <datastores>
        <datastore>
          <name xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">running</name>
          <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring">1540-997218-679912</transaction-id>
        </datastore>
      </datastores>
    </netconf-state>
  </data>
</rpc-reply>

A similar exchange happens for all other devices (omitted here).
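
Because the requests go out without waiting for replies, the manager has to pair each rpc-reply with its request through the message-id attribute, which the device echoes back. The sketch below shows one way to do that bookkeeping. The function name, the dictionary of sent operations, and the shortened sample replies are assumptions for this example. The lock-denied error is a hypothetical failure, not something seen in the exchange above.

```python
import xml.etree.ElementTree as ET

NC = "{urn:ietf:params:xml:ns:netconf:base:1.0}"


def summarize_replies(sent, replies):
    """Match pipelined replies to requests by message-id.

    sent maps message-id to an operation name; replies is an iterable of
    rpc-reply documents. Returns (outcome per operation, operations still
    waiting for a reply).
    """
    outcome = {}
    for text in replies:
        reply = ET.fromstring(text)
        message_id = reply.get("message-id")
        if message_id not in sent:
            raise ValueError(f"unexpected message-id {message_id!r}")
        if reply.find(NC + "rpc-error") is not None:
            outcome[sent[message_id]] = "error"
        elif reply.find(NC + "ok") is not None:
            outcome[sent[message_id]] = "ok"
        else:
            outcome[sent[message_id]] = "data"
    missing = [op for op in sent.values() if op not in outcome]
    return outcome, missing


def reply(message_id, body):
    """Wrap a body in an rpc-reply envelope (helper for the samples below)."""
    return (
        '<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" '
        f'message-id="{message_id}">{body}</rpc-reply>'
    )


SENT = {"1": "discard-changes", "2": "lock", "3": "get"}
GOOD = [reply(1, "<ok/>"), reply(2, "<ok/>"), reply(3, "<data/>")]
# Hypothetical failure: another session already holds the candidate lock.
DENIED = [
    reply(1, "<ok/>"),
    reply(2, "<rpc-error><error-type>protocol</error-type>"
             "<error-tag>lock-denied</error-tag>"
             "<error-severity>error</error-severity></rpc-error>"),
]

print(summarize_replies(SENT, GOOD))
print(summarize_replies(SENT, DENIED))
```

Only when every operation on every device comes back without an error does it make sense to continue with the edit-config step.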

Since all devices responded positively, it’s now time for the manager to send out the edit-config messages to each device with the respective actual configuration changes. The edit-config toward the candidate (shown in Example 10-43) is sent toward the m0 device.

Example 10-43 Manager Sends an edit-config Toward One Device (Abridged)

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="4">
  <edit-config xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <target>
      <candidate/>
    </target>
    <test-option>test-then-set</test-option>
    <error-option>rollback-on-error</error-option>
    <config>
      <accounts xmlns="http://com/netrounds/ncc">
        <account>
          <name>bookzone</name>
          <twamp-reflectors>
            <twamp-reflector>
              <name>Astrakan-Media-1-int</name>
              <port>6789</port>
              <address>172.20.1.1</address>
            </twamp-reflector>

In parallel, an edit-config is sent toward device ec0, as shown in Example 10-44.

Example 10-44 Manager Sends an edit-config Toward Another Device (Abridged)

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="5">
  <edit-config xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <target>
      <candidate/>
    </target>
    <test-option>test-then-set</test-option>
    <error-option>rollback-on-error</error-option>
    <config>
      <native xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-native">
        <vrf>
          <definition>
            <name>Culture-Froide</name>
            <address-family>
              <ipv4/>
            </address-family>
            <route-target>
              <import>
                <asn-ip>300:2620</asn-ip>
              </import>
              <export>
                <asn-ip>300:2620</asn-ip>
              </export>
            </route-target>
            <rd>300:2620</rd>

In parallel, another edit-config is sent toward device ij0, as shown in Example 10-45.

Example 10-45 Manager Sends an edit-config Toward Yet Another Device (Abridged)

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="4">
  <edit-config xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <target>
      <candidate/>
    </target>
    <test-option>test-then-set</test-option>
    <error-option>rollback-on-error</error-option>
    <config>
      <configuration xmlns="http://yang.juniper.net/junos/conf/root">
        <routing-instances xmlns="http://yang.juniper.net/junos/conf/routing-instances">
          <instance>
            <name>Astrakan-Media</name>
            <vrf-export>Astrakan-Media-EXP</vrf-export>
            <interface>
              <name>ge-0/0/0.2246</name>
            </interface>
            <interface>
              <name>ge-0/0/1.2246</name>
            </interface>
            <vrf-table-label/>
            <vrf-import>Astrakan-Media-IMP</vrf-import>
            <instance-type>vrf</instance-type>
            <route-distinguisher>
              <rd-type>300:2246</rd-type>

A similar exchange happens for all other devices involved in this transaction (omitted here).

The manager immediately also sends a validate request to each device involved. Example 10-46 shows the message sent toward m0.

Example 10-46 Manager Requests Validation of the Candidate Datastore on a Device

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="5">
  <validate>
    <source>
      <candidate/>
    </source>
  </validate>
</rpc>

A similar exchange happens for all other devices involved in this transaction (omitted here).

In this case, notice that the device m0 responds with ok to the edit-config and to the validate, as shown in Example 10-47.

Example 10-47 Manager Receives Positive Response to the edit-config and validate Requests on the Candidate Datastore on a Device

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="4">
  <ok/>
</rpc-reply>
...
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="5">
  <ok/>
</rpc-reply>

You have now completed the PREPARE phase in the transaction. If all devices reach this state with all happy ok messages, the manager goes ahead with the COMMIT phase.

If anything bad happens, such as a lost connection to a device or a negative validation response, the manager proceeds with the ABORT phase. Initiating the ABORT sequence is very simple in NETCONF—simply drop the connection. To be clear, a lost connection is the abort command. That way, any device that loses the connection to the manager in the middle of a transaction always aborts.

Since the transaction was only ever targeting the candidate datastore, no harm is done. The device operations, as guided by the running datastore, continue unaffected as before. There is nothing on the device that must be undone.

Assuming all devices responded well to the edit-config and validate, the manager decides to cross the transaction’s point of no return. After this point, the transaction “happened” (is persisted), and a new transaction must be created to revert it, should the manager feel the desire to have the action undone. The manager commits by writing the transaction to stable storage (that is, disk) and telling all devices to go ahead with the transaction.
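
The PREPARE, ABORT, and COMMIT logic is easier to see with the NETCONF plumbing removed. The following toy model replaces each device with an in-memory object that has a candidate and a running datastore. Everything in it (the FakeDevice class, the rejects parameter, the dictionary-shaped configuration) is invented for illustration and is not how NSO or any device is implemented.

```python
class FakeDevice:
    """In-memory stand-in for a device with candidate and running datastores."""

    def __init__(self, name, running=None, rejects=()):
        self.name = name
        self.running = dict(running or {})
        self.candidate = dict(self.running)
        self.rejects = set(rejects)  # keys this device refuses when validating

    def edit_candidate(self, changes):
        self.candidate.update(changes)

    def validate(self):
        return not any(key in self.rejects for key in self.candidate)

    def drop_connection(self):
        # Abort: pending candidate changes are discarded, running is untouched.
        self.candidate = dict(self.running)

    def commit(self):
        self.running = dict(self.candidate)


def run_transaction(changes_by_device):
    """Apply changes to all devices or to none. Returns True if committed."""
    # PREPARE: write to every candidate and validate. Running is not touched.
    for device, changes in changes_by_device:
        device.edit_candidate(changes)
    if not all(device.validate() for device, _ in changes_by_device):
        # ABORT: one bad answer is enough to drop every connection.
        for device, _ in changes_by_device:
            device.drop_connection()
        return False
    # COMMIT: past the point of no return, tell everyone to go ahead.
    for device, _ in changes_by_device:
        device.commit()
    return True


m0 = FakeDevice("m0")
ec0 = FakeDevice("ec0", rejects={"bad-vrf"})
print(run_transaction([(m0, {"reflector": "172.20.1.1"}), (ec0, {"bad-vrf": "x"})]))
print(m0.running)  # m0 is unchanged because ec0 refused its part
```

The point to take away is that a refusal from ec0 leaves m0 untouched as well, even though m0 had nothing to complain about.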

The manager tells a device to go ahead by sending a commit message to each participating device. In this case, the manager chose to also set the confirmed flag, indicating that a three-phase transaction (PREPARE, COMMIT, CONFIRM) will be used, as shown in Example 10-48.

Example 10-48 Manager Sends a Confirmed commit Message to a Device

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="6">
  <commit>
    <confirmed/>
  </commit>
</rpc>

All devices participating in the transaction are now busy transferring the changes from the candidate datastore to their running datastore and implementing the new configuration. When done, each device returns an ok message. Example 10-49 shows the message from m0.

Example 10-49 Device Sends a Confirmed commit ok Message

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="6">
  <ok/>
</rpc-reply>

It is of course possible that a commit operation fails. It’s not nice for a device to promise that the configuration is fine by responding ok to the validation and then fail when the actual commit arrives, but things can always go wrong regardless of promises.

The best way to deal with an over-promising device is to treat it the same whether the failure happens 2 milliseconds, 2 minutes, or 2 months after the transaction’s point of no return. Hopefully, there is a supervision and recovery mechanism that deals with devices that fail because of connectivity, power, software bugs, attacks, or whatever. One recovery option might be to roll back the transaction from the manager.

When all devices have committed the change and returned ok, that’s the end of the COMMIT phase. If this was a basic two-phase transaction, the connection could have been closed now, releasing locks and so on. In this case, as noted earlier, the manager requested a three-phase transaction with a CONFIRM phase. That is the phase that starts now.

During the CONFIRM phase, the manager determines whether or not the proposed new configuration, which is already running, is desirable. Is the service level agreement (SLA) met? Is the connectivity in place? Is the resource utilization still in the safe zone?

By default, the manager has at most 10 minutes (600 seconds, the default confirm-timeout in RFC 6241) to reach a decision about how to proceed. A shorter or longer timeframe may be specified by the manager, and the running timer extended, if desired. If the manager doesn’t send a confirming commit message to a device within the timeout period, the device rolls back the configuration change. Similarly, if the connection to the manager is lost, the device rolls back, unless the manager asked the device to allow disconnects during the CONFIRM phase.

This scheme protects against accidentally cutting off the management network and then having to involve a red-faced human making a call to someone at the other end with specific recovery instructions.
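
The timeframe and the "allow disconnects" behavior are both expressed as optional elements inside the confirmed commit. RFC 6241 names them confirm-timeout (in seconds) and persist (available with the confirmed-commit:1.1 capability). The sketch below builds such a message. The function name, the 300-second value, and the token are arbitrary choices for this example, and the ns0 prefixes in the output come from Python's ElementTree rather than from NETCONF.

```python
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_confirmed_commit(message_id, timeout_seconds=None, persist_token=None):
    """Build a confirmed commit, optionally with a timeout and a persist token."""
    rpc = ET.Element(f"{{{NC_NS}}}rpc", {"message-id": str(message_id)})
    commit = ET.SubElement(rpc, f"{{{NC_NS}}}commit")
    ET.SubElement(commit, f"{{{NC_NS}}}confirmed")
    if timeout_seconds is not None:
        timeout = ET.SubElement(commit, f"{{{NC_NS}}}confirm-timeout")
        timeout.text = str(timeout_seconds)
    if persist_token is not None:
        # With persist, the confirmed commit survives a session disconnect.
        persist = ET.SubElement(commit, f"{{{NC_NS}}}persist")
        persist.text = persist_token
    return ET.tostring(rpc, encoding="unicode")


# Hypothetical values: a 300-second window and an arbitrary token.
print(build_confirmed_commit(6, timeout_seconds=300, persist_token="txn-example"))
```

Called with only a message-id, the function produces the same message as Example 10-48.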

When the manager has measured, evaluated, and finally determined the desirability of the change, it communicates the verdict to all participating devices by dropping the connection (or sending an abort message) for rolling back or by sending a commit message for keeping the change. Example 10-50 shows the commit, transaction-id, and unlock messages the manager sends to m0 for keeping the new configuration.

Example 10-50 Manager Sends a Confirming commit, Reads transaction-id and Sends unlock to Devices

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="7">
  <commit/>
</rpc>
...
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="8">
  <get xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <filter>
      <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
        <datastores>
          <datastore>
            <name>running</name>
            <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring"/>
          </datastore>
        </datastores>
      </netconf-state>
    </filter>
  </get>
</rpc>
...
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="9">
  <unlock>
    <target>
      <candidate/>
    </target>
  </unlock>
</rpc>

A similar exchange happens for all other devices involved in this transaction (omitted here). Example 10-51 shows the response from m0.

Example 10-51 Device’s Response to commit, transaction-id, and unlock

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="7">
  <ok/>
</rpc-reply>
...
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="8">
  <data>
    <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
      <datastores>
        <datastore>
          <name xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">running</name>
          <transaction-id xmlns="http://tail-f.com/yang/netconf-monitoring">1541-83740-854414</transaction-id>
        </datastore>
      </datastores>
    </netconf-state>
  </data>
</rpc-reply>
...
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
           message-id="9">
  <ok/>
</rpc-reply>
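
What happens on the device during the CONFIRM phase can be modeled as a snapshot plus a deadline. The toy class below is a sketch of that behavior under simplifying assumptions: time is passed in explicitly as a number of seconds, the configuration is a plain dictionary, and the class and method names are invented for this example.

```python
class ConfirmedCommitDevice:
    """Toy device that rolls back unless a confirming commit arrives in time."""

    def __init__(self, running, timeout_seconds):
        self.running = dict(running)
        self.timeout_seconds = timeout_seconds
        self._previous = None  # snapshot to restore on rollback
        self._deadline = None  # None means no confirmed commit is pending

    def confirmed_commit(self, candidate, now):
        """Activate the candidate and start the confirm timer."""
        self._previous = dict(self.running)
        self.running = dict(candidate)
        self._deadline = now + self.timeout_seconds

    def tick(self, now):
        """Roll back if the confirm window has expired."""
        if self._deadline is not None and now >= self._deadline:
            self.running = self._previous
            self._previous = self._deadline = None

    def confirming_commit(self, now):
        """Keep the change. Returns False if it was already rolled back."""
        self.tick(now)
        if self._deadline is None:
            return False
        self._previous = self._deadline = None
        return True


device = ConfirmedCommitDevice({"vrf": "old"}, timeout_seconds=60)
device.confirmed_commit({"vrf": "new"}, now=0)
device.tick(now=60)  # no confirming commit arrived within the window
print(device.running)
```

Confirming at any time before the deadline clears the snapshot, after which only a new transaction can undo the change.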

When all devices have responded, the transaction is complete, and unless there are more transactions pending for a given device, the connection may be closed. Example 10-52 shows the message sent to m0.

Example 10-52 The Manager’s Goodbye to the Device. The Device Closes the Connection.

<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"
     message-id="10">
  <close-session/>
</rpc>

Interview with the Experts

Q&A with Kristian Larsson

Kristian Larsson had an early interest in computer networks, acquiring his first Cisco router in his teens. Unsurprisingly, his career was quickly absorbed by nascent ISPs and mobile carriers, premier among them Tele2, for nearly a decade, first operating large IP networks, then designing and automating them. When Deutsche Telekom started a clean-slate networking project called TeraStream, headed by disruptive innovation industry leader Axel Clauberg and designed by Internet legend Peter Löthberg, it was only natural for Kristian to join. Kristian’s DevOps profile, with deep network design and programming skills, is a perfect match for this all-new network design.

Question:

What’s the new insight behind TeraStream?

Answer:

I think it’s a refocus on the core tenets that made the Internet work well. You keep the network design simple. That leads to reliability, scalability, and enables you to automate its operation. Being simple, it is also much more cost efficient than the currently installed network. We run all services on top of this infrastructure, from residential Internet through business services and telephony to mobile. In a connected world, society runs on top of this network. It just has to work. If you run this network with software, then the software just has to work, too.

Traditionally automation is mostly about cost savings. We take a fundamentally different approach with a focus on reliability.

Question:

Deutsche Telekom (DT) is a major operator with lots of networks. How come DT came up with the idea to start from scratch? There are plenty of large and small operators around the world that didn’t.

Answer:

That’s a good question. Someone at DT realized a fresh start was needed to stay competitive. It’s so much easier to build a simple solution when starting from scratch. Any established organization suffers from silo behavior, leading to piecemeal fixes. You’re simply not going to get the same answer if you go to an existing organization and ask, how much smarter can you do what you do? It will never result in a globally optimal solution.

TeraStream is an excellent example where the whole is greater than the sum of its parts. When you can simplify some areas of your network, which allows you to automate it, it can in turn unlock completely new possibilities. Having a holistic view of the network and the software to manage it allows us to deliver a very efficient solution.

Question:

How did you get into the picture?

Answer:

I started my career operating networks and transitioned into more of design and architectural work, where I realized that automation was key. You can’t run a network comprising thousands of routers without automation. I had always been scripting, one liners in bash, Perl, or similar, and used this for many years to do things at scale. I don’t have a formal programming background but learned out of necessity. At one point I wrote my own configuration templating language. Looking back it wasn’t the best of choices but it certainly was educational, and it did solve some domain-specific problems that would have taken a lot more boilerplate in other languages.

With the TeraStream project, my focus turned entirely to writing software, not just some helpful automation scripts, but a software system able to handle all aspects of building and operating an optical and IP network.

Question:

You said the network just has to work. What is it then that you do that just makes it work?

Answer:

(Laughter) Right. I guess it’s first and foremost a way of working. We take Quality Assurance (QA) through Continuous Integration and Deployment (CI/CD) dead seriously. If you get into our development environment (we use GitLab) and open a merge request, the code will always be tested automatically in a virtual network before you are allowed to merge it.

Every merge request needs to have test coverage that proves the code works as intended. Same thing for documentation being up to date, and we encourage self-review. Good code and testing is important to us, in particular CI testing, where code is tested before deployment. Operating critical network infrastructure is different from doing daily deployments of a web app. Failure can be catastrophic, and rolling back might mean rolling a truck.

Much of this CI/CD infrastructure also doubles as development environment. We launch virtual routers for testing the same way as for a development environment. When you want to develop a new feature, you start a virtual network and work toward those routers. You can pick one or multiple router vendors to work with. Similarly, for testing, we use our software to configure a virtual network and then observe that network, making sure it conforms to our expectations. When someone asks, “Did you provision anything with this code?”, the answer is “Yes, we’ve used the code to provision tens of virtual networks every day, day in and day out. We’re very happy about the way this works.” Repeatability is key in both testing and development. The ability to bring up a clean, virtual network easily and quickly is essential for good results.

Question:

What made you choose NETCONF and YANG for your network?

Answer:

Since the dawn of the Internet, operating IP networks has pretty much been about the command-line interface (CLI). I think it’s clear to anyone who’s tried to parse or generate CLI configuration that it’s an interface primarily meant for human consumption. We needed something better to programmatically interface with our network devices. NETCONF/YANG was moving in the IETF, and we thought it was a good choice. Being purpose written, it is a protocol stack that is uniquely suited for the task. There are no other protocols that feature three-phase commits, which NETCONF has, nor any modeling languages that can express network configuration and operational state as naturally as YANG can.

YANG in particular has struck a perfect balance of pragmatism and elegance. It, together with the meta-model behind it with declarative configuration, is applicable far outside the world of IP networks.

When you standardize on these protocols, you get rid of a whole swath of problems. It’s somewhat ironic that we are still interfacing many routers over CLI even today. I have worked with vendors for many years, helping to improve their NETCONF/YANG implementations. Sadly, it takes a very long time. In many cases, the assumptions and behavior of the CLI permeate the design so that the device can’t handle transactions properly. Transaction support is not something you can simply bolt on top of a legacy solution and get a good result. A proper implementation needs transactions at the core.

Our time is spent with a rather even split between development, testing, and integration toward legacy interfaces such as the CLI. As soon as a device has proper NETCONF/YANG support, that integration piece simply goes away. You don’t need to spend time on it. It’s a major time thief.

We’re working toward NETCONF/YANG everywhere. If you place NETCONF/YANG alongside other protocols out there, there’s no question that NETCONF/YANG is uniquely qualified for the task. It’s really good for us.

Question:

Have your expectations on NETCONF/YANG come true?

Answer:

They have, yes. Not only when compared to a world of CLI scraping but also when compared to a lot of the RESTful APIs out there, you find that NETCONF/YANG typically provides a better experience.

Question:

You have come far in the automation journey. What’s your next step?

Answer:

Our focus so far has almost entirely been on what I think of as the lowest layer of services, often called resource-facing services (RFS), that interface directly with devices. We’re keeping feature parity across four different vendor platforms, so it requires a fair bit of work. The RFS layer exposes a device- and vendor-neutral interface on top of which other services can be built. Really cool things start to emerge when you get up a level or two. You can quickly leave the realm of classic network configuration; imagine predictive capacity planning with the network telling you what hardware to order, or even doing it for you, instead of the other way around—or root cause analysis across all network layers.

A new and interesting development is the application of formal verification on networks. There are some exciting new systems that can reason and prove correctness of your network. Some are focusing on the forwarding plane, like how packets are sent. Others are focusing on the policy layer, which is how your routing policies will work. It’s a good complement to our existing CI testing.

Question:

Do you have any advice for folks who would like to get to where you are?

Answer:

First and foremost, it’s about putting the right people together. Seeing how it’s hard to find people with experience both in networking and software development, an alternative can be to team up in pairs; put the networking guy next to the programmer and let them work together. Another essential piece is to provide a development environment that is both true to reality and cheap and simple to set up. I think virtual routers play a crucial role there.

Question:

Do you have any advice for folks who would like to sell you some devices?

Answer:

NETCONF, please! (Laughter) Well, that’s pretty much it. We have a plethora of hardware requirements, but from a network management perspective, it’s all about proper NETCONF/YANG—and software deliveries in parity for physical and virtual routers so it can be easily tested!

Summary

This chapter started out describing a business automation idea for the fictitious enterprise BookZone, then discussed how the BookZone staff created a high-level service YANG module, capturing the user interface for the service. Next, it examined how this service could be implemented in an orchestrator using a few templates and a little code.

The next phase was about plugging actual or simulated devices into the orchestrator, configuring them for NETCONF, and checking that their capabilities match the expectations. With service and devices in place, it was time to create some service instances and observe how NETCONF messages play out on the network. Finally, services were modified and rolled back in a transactional manner, and the resulting changes were observed on the wire.

At the end there was an interview with Kristian Larsson, automation engineer in the famous TeraStream project, about working toward complete automation.
