Building a network management application

For this chapter, we will build a very simple network management application, and then write different types of tests and check their coverage. This network management application is focused on digesting alarms, also referred to as network events. This is different from certain other network management tools that focus on gathering SNMP alarms from devices.

For reasons of simplicity, this correlation engine doesn't contain complex rules, but instead contains simple mapping of network events onto equipment and customer service inventory. We'll explore this in the next few paragraphs as we dig through the code.

How to do it...

With the following steps, we will build a simple network management application.

  1. Create a file called network.py to store the network application.
  2. Create a class definition to represent a network event.
    class Event(object):
        def __init__(self, hostname, condition, severity, event_time):
            self.hostname = hostname
            self.condition = condition
            self.severity = severity
            self.id = -1
    
        def __str__(self):
            return "(ID:%s) %s:%s - %s" % (self.id, self.hostname, self.condition, self.severity)
    • hostname: It is assumed that all network alarms originate from pieces of equipment that have a hostname.
    • condition: Indicates the type of alarm being generated. Two different alarming conditions can come from the same device.
    • severity: 1 indicates a clear, green status; and 5 indicates a faulty, red status.
    • id: The primary key value used when the event is stored in a database.
  3. Create a new file called network.sql to contain the SQL code.
  4. Create a SQL script that sets up the database and adds the definition for storing network events.
    CREATE TABLE EVENTS (
        ID INTEGER PRIMARY KEY,
        HOST_NAME TEXT,
        SEVERITY INTEGER,
        EVENT_CONDITION TEXT
        );
  5. Code a high-level algorithm where events are assessed for impact to equipment and customer services and add it to network.py.
    from springpython.database.core import*
    
    class EventCorrelator(object):
        def __init__(self, factory):
            self.dt = DatabaseTemplate(factory)
    
        def __del__(self):
            del(self.dt)
    
        def process(self, event):
            stored_event, is_active = self.store_event(event)
    
            affected_services, affected_equip = self.impact(event)
    
            updated_services = [
                self.update_service(service, event)
                for service in affected_services]
            updated_equipment = [
                self.update_equipment(equip, event)
                for equip in affected_equip]
    
            return (stored_event, is_active, updated_services, updated_equipment)

    Note

    The __init__ method contains some setup code to create a DatabaseTemplate. This is a Spring Python utility class used for database operations. See http://static.springsource.org/spring-python/1.2.x/sphinx/html/dao.html for more details. We are also using sqlite3 as our database engine, since it is a standard part of Python.

    The process method contains some simple steps to process an incoming event.

    • We first need to store the event in the EVENTS table. This includes evaluating whether or not it is an active event, meaning that it is actively impacting a piece of equipment.
    • Then we determine what equipment and what services the event impacts.
    • Next, we update the affected services by determining whether it causes any service outages or restorations.
    • Then we update the affected equipment by determining whether it fails or clears a device.
    • Finally, we return a tuple containing all the affected assets to support any screen interfaces that could be developed on top of this.
  6. Implement the store_event algorithm.
        def store_event(self, event):
            try:
                max_id = self.dt.query_for_int("""select max(ID)
                                                  from EVENTS""")
            except DataAccessException, e:
                max_id = 0
    
            event.id = max_id+1
    
            self.dt.update("""insert into EVENTS
                              (ID, HOST_NAME, SEVERITY,
                                             EVENT_CONDITION)
                              values
                              (?,?,?,?)""",
                           (event.id, event.hostname,
                            event.severity, event.condition))
    
            is_active = 
                     self.add_or_remove_from_active_events(event)
    
            return (event, is_active)

    This method stores every event that is processed. This supports many things including data mining and post mortem analysis of outages. It is also the authoritative place where other event-related data can point back using a foreign key.

    • The store_event method looks up the maximum primary key value from the EVENTS table.
    • It increments it by one.
    • It assigns it to event.id.
    • It then inserts it into the EVENTS table.
    • Next, it calls a method to evaluate whether or not the event should be added to the list of active events, or if it clears out existing active events. Active events are events that are actively causing a piece of equipment to be unclear.
    • Finally, it returns a tuple containing the event and whether or not it was classified as an active event.

    Note

    For a more sophisticated system, some sort of partitioning solution needs to be implemented. Querying against a table containing millions of rows is very inefficient. However, this is for demonstration purposes only, so we will skip scaling as well as performance and security.

  7. Implement the method to evaluate whether to add or remove active events.
    def add_or_remove_from_active_events(self, event):
            """Active events are current ones that cause equipment
               and/or services to be down."""
    
            if event.severity == 1:
                self.dt.update("""delete from ACTIVE_EVENTS
                                  where EVENT_FK in (
                                   select ID   
                                   from   EVENTS   
                                   where  HOST_NAME = ?   
                                   and    EVENT_CONDITION = ?)""",   
                               (event.hostname,event.condition))
                return False
            else:
                self.dt.execute("""insert into ACTIVE_EVENTS
                                   (EVENT_FK) values (?)""",
                                (event.id,))
                return True

    When a device fails, it sends a severity 5 event. This is an active event and in this method, a row is inserted into the ACTIVE_EVENTS table, with a foreign key pointing back to the EVENTS table. Then we return back True, indicating this is an active event.

  8. Add the table definition for ACTIVE_EVENTS to the SQL script.
    CREATE TABLE ACTIVE_EVENTS (
        ID INTEGER PRIMARY KEY,
        EVENT_FK,
        FOREIGN KEY(EVENT_FK) REFERENCES EVENTS(ID)
        );

    This table makes it easy to query what events are currently causing equipment failures.

    Later, when the failing condition on the device clears, it sends a severity 1 event. This means that severity 1 events are never active, since they aren't contributing to a piece of equipment being down. In our previous method, we search for any active events that have the same hostname and condition, and delete them. Then we return False, indicating this is not an active event.

  9. Write the method that evaluates the services and pieces of equipment that are affected by the network event.
        def impact(self, event):
            """Look up this event has impact on either equipment
               or services."""
    
            affected_equipment = self.dt.query(
                           """select * from EQUIPMENT
                              where HOST_NAME = ?""",
                           (event.hostname,),
                           rowhandler=DictionaryRowMapper())
    
            affected_services = self.dt.query(
                           """select SERVICE.*
                              from   SERVICE
                              join   SERVICE_MAPPING SM
                              on (SERVICE.ID = SM.SERVICE_FK)
                              join   EQUIPMENT
                              on (SM.EQUIPMENT_FK = EQUIPMENT.ID)
                              where  EQUIPMENT.HOST_NAME = ?""",
                              (event.hostname,),
                              rowhandler=DictionaryRowMapper())
    
            return (affected_services, affected_equipment)
    • We first query the EQUIPMENT table to see if event.hostname matches anything.
    • Next, we join the SERVICE table to the EQUIPMENT table through a many-to-many relationship tracked by the SERVICE_MAPPING table. Any service that is related to the equipment that the event was reported on is captured.
    • Finally, we return a tuple containing both the list of equipment and list of services that are potentially impacted.

    Note

    Spring Python provides a convenient query operation that returns a list of objects mapped to every row of the query. It also provides an out-of-the-box DictionaryRowMapper that converts each row into a Python dictionary, with the keys matching the column names.

  10. Add the table definitions to the SQL script for EQUIPMENT, SERVICE, and SERVICE_MAPPING.
    CREATE TABLE EQUIPMENT (
        ID        INTEGER PRIMARY KEY,
        HOST_NAME TEXT    UNIQUE,
        STATUS    INTEGER
        );
    CREATE TABLE SERVICE (
        ID INTEGER PRIMARY KEY,
        NAME TEXT UNIQUE,
        STATUS TEXT
        );
    
    CREATE TABLE SERVICE_MAPPING (
        ID INTEGER PRIMARY KEY,
        SERVICE_FK,
        EQUIPMENT_FK,
        FOREIGN KEY(SERVICE_FK) REFERENCES SERVICE(ID),
        FOREIGN KEY(EQUIPMENT_FK) REFERENCES EQUIPMENT(ID)
        );
  11. Write the update_service method that stores or clears service-related events, and then updates the service's status based on the remaining active events.
        def update_service(self, service, event):
            if event.severity == 1:
                self.dt.update("""delete from SERVICE_EVENTS
                                  where EVENT_FK in (
                                      select ID
                                      from EVENTS
                                      where HOST_NAME = ?
                                      and EVENT_CONDITION = ?)""",
                               (event.hostname,event.condition))
            else:
                self.dt.execute("""insert into SERVICE_EVENTS
                                   (EVENT_FK, SERVICE_FK)
                                   values (?,?)""",
                                (event.id,service["ID"]))
    
            try:
                max = self.dt.query_for_int(
                                """select max(EVENTS.SEVERITY)
                                   from SERVICE_EVENTS SE
                                   join EVENTS
                                   on (EVENTS.ID = SE.EVENT_FK)
                                   join SERVICE
                                   on (SERVICE.ID = SE.SERVICE_FK)
                                   where SERVICE.NAME = ?""",
                               (service["NAME"],))
            except DataAccessException, e:
                max = 1
    
            if max > 1 and service["STATUS"] == "Operational":
                service["STATUS"] = "Outage"
                self.dt.update("""update SERVICE
                                  set STATUS = ?
                                  where ID = ?""",
                               (service["STATUS"], service["ID"]))
    
            if max == 1 and service["STATUS"] == "Outage":
                service["STATUS"] = "Operational"
                self.dt.update("""update SERVICE
                                  set STATUS = ?
                                  where ID = ?""",
                               (service["STATUS"], service["ID"]))
    
            if event.severity == 1:
                return {"service":service, "is_active":False}
            else:
                return {"service":service, "is_active":True}

    Service-related events are active events related to a service. A single event can be related to many services. For example, what if we were monitoring a wireless router that provided Internet service to a lot of users, and it reported a critical error? This one event would be mapped as an impact to all the end users. When a new active event is processed, it is stored in SERVICE_EVENTS for each related service.

    Then, when a clearing event is processed, the previous service event must be deleted from the SERVICE_EVENTS table.

  12. Add the table definition for SERVICE_EVENTS to the SQL script.
    CREATE TABLE SERVICE_EVENTS (
        ID INTEGER PRIMARY KEY,
        SERVICE_FK,
        EVENT_FK,
        FOREIGN KEY(SERVICE_FK) REFERENCES SERVICE(ID),
        FOREIGN KEY(EVENT_FK) REFERENCES EVENTS(ID)
        );

    Tip

    It is important to recognize that deleting an entry from SERVICE_EVENTS doesn't mean that we delete the original event from the EVENTS table. Instead, we are merely indicating that the original active event is no longer active and it does not impact the related service.

  13. Prepend the entire SQL script with drop statements, making it possible to run the script for several recipes.
    DROP TABLE IF EXISTS SERVICE_MAPPING;
    DROP TABLE IF EXISTS SERVICE_EVENTS;
    DROP TABLE IF EXISTS ACTIVE_EVENTS;
    DROP TABLE IF EXISTS EQUIPMENT;
    DROP TABLE IF EXISTS SERVICE;
    DROP TABLE IF EXISTS EVENTS;
  14. Append the SQL script used for database setup with inserts to preload some equipment and services.
    INSERT into EQUIPMENT (ID, HOST_NAME, STATUS) values (1, 'pyhost1', 1);
    INSERT into EQUIPMENT (ID, HOST_NAME, STATUS) values (2, 'pyhost2', 1);
    INSERT into EQUIPMENT (ID, HOST_NAME, STATUS) values (3, 'pyhost3', 1);
    
    INSERT into SERVICE (ID, NAME, STATUS) values (1, 'service-abc', 'Operational'),
    INSERT into SERVICE (ID, NAME, STATUS) values (2, 'service-xyz', 'Outage'),
    
    INSERT into SERVICE_MAPPING (SERVICE_FK, EQUIPMENT_FK) values (1,1);
    INSERT into SERVICE_MAPPING (SERVICE_FK, EQUIPMENT_FK) values (1,2);
    
    INSERT into SERVICE_MAPPING (SERVICE_FK, EQUIPMENT_FK) values (2,1);
    INSERT into SERVICE_MAPPING (SERVICE_FK, EQUIPMENT_FK) values (2,3);
  15. Finally, write the method that updates equipment status based on the current active events.
        def update_equipment(self, equip, event):
            try:
                max = self.dt.query_for_int(
                           """select max(EVENTS.SEVERITY)
                              from ACTIVE_EVENTS AE
                              join EVENTS
                              on (EVENTS.ID = AE.EVENT_FK)
                              where EVENTS.HOST_NAME = ?""",
                           (event.hostname,))
            except DataAccessException:
                max = 1
    
            if max != equip["STATUS"]:
                equip["STATUS"] = max
                self.dt.update("""update EQUIPMENT
                                  set STATUS = ?""",
                               (equip["STATUS"],))
    
            return equip

Here, we need to find the maximum severity from the list of active events for a given host name. If there are no active events, then Spring Python raises a DataAccessException and we translate that to a severity of 1.

We check if this is different from the existing device's status. If so, we issue a SQL update. Finally, we return the record for the device, with its status updated appropriately.

How it works...

This application uses a database-backed mechanism to process incoming network events, and checks them against the inventory of equipment and services to evaluate failures and restorations. Our application doesn't handle specialized devices or unusual types of services. This real-world complexity has been traded in for a relatively simple application, which can be used to write various test recipes.

Note

Events typically map to a single piece of equipment and to zero or more services. A service can be thought of as a string of equipment used to provide a type of service to the customer. New failing events are considered active until a clearing event arrives. Active events, when aggregated against a piece of equipment, define its current status. Active events, when aggregated against a service, defines the service's current status.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.223.170.223