Network Operations Center

When an IT service desk receives a trouble call from a user stating “I can’t connect to the network,” a network analyst knows that this could simply mean the user can’t connect to an application. But what if the network is down? Where is it down? What exactly is broken? These are the questions one asks when performing problem resolution. In that sense, network troubleshooting and problem resolution are the same activity.

A network operations center (NOC) or IT service desk (e.g., help desk) is the central location for network administrators and support personnel. The NOC is where continuous monitoring occurs. It is also the center of the organization’s network infrastructure. This is where problem resolution and troubleshooting occur on the front line. For organizations that demand high availability, network operations is a 24 × 7 × 365 job. The NOC’s main purpose is to ensure availability of the network. This covers local area networks (LANs), metropolitan area networks (MANs), and wide area networks (WANs), as well as management of service providers.

Some network problems turn out to be security incidents, in which case the organization’s incident response team is engaged. The functions of a NOC typically include the following:

  • Asset inventory and management—This includes network infrastructure hardware assets, software licensing assets, and management of these assets.
  • Business communications—This includes voice, video, email, and unified communications.
  • Capacity planning—This includes monitoring bandwidth, identifying scalable growth, and managing implementation.
  • Data backup—Networks are needed to support daily, nightly, and weekly data backups. More important is how data is recovered through the network when needed.
  • Incident response—Scope is limited to ticketing and to WAN or LAN availability. Security incidents and their tickets are forwarded to the IT security organization and incident response team.
  • Firewall and IDS/IPS management—Sometimes network administrators double as firewall managers. In other organizations, the IT security organization is responsible for layered security solutions and security appliances.
  • Maintenance—NOC members adhere to hardware and software maintenance agreements.
  • Network monitoring—This requires 24 × 7 × 365 continuous monitoring solutions to manage availability and confirm service level agreements (SLAs).
  • Patch management—This is part of vulnerability management and software patching for production network assets.
  • Performance management—Part of capacity planning and monitoring. Throughput, bits transmitted, dropped packets, and availability are some of the performance metrics that can be monitored and linked to an SLA.
  • SLA management—Service providers require the customer to prove or validate that an SLA was not met during a billing cycle.
  • Service provider and vendor management—When a circuit or network link goes down, problem resolution, escalation, and third-party vendor management are critical.
  • Vulnerability management—This is part of ongoing vulnerability scanning, configuration and change management, and performing patch updates on network assets.
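
To illustrate how a performance metric such as availability can be linked to an SLA, here is a minimal Python sketch. It computes measured availability over a billing cycle from a list of outage durations and compares it with a target. The 99.9 percent target, the 30-day cycle, the sample outages, and the function names are assumptions chosen for illustration; they do not come from any particular service provider contract.

```python
from datetime import timedelta

def availability_percent(cycle: timedelta, outages: list[timedelta]) -> float:
    """Return the percentage of the billing cycle during which the link was up."""
    downtime = sum(outages, timedelta())  # total downtime as a timedelta
    return 100.0 * (1 - downtime / cycle)

def sla_met(cycle: timedelta, outages: list[timedelta],
            target_percent: float = 99.9) -> bool:
    """Compare measured availability with the (assumed) SLA target."""
    return availability_percent(cycle, outages) >= target_percent

# Hypothetical billing cycle with two recorded outages.
cycle = timedelta(days=30)
outages = [timedelta(minutes=25), timedelta(minutes=40)]

print(round(availability_percent(cycle, outages), 3))
print(sla_met(cycle, outages))
```

A 99.9 percent target over 30 days allows 43.2 minutes of downtime, so the 65 minutes recorded here would miss it. Outage records of this kind are what a customer would present to a service provider to validate that an SLA was not met.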

Network Troubleshooting and Ticketing

There are many approaches and methodologies regarding problem resolution. A common approach is for the user to call the IT service desk, and then the IT service desk analyst opens a ticket with the NOC or IT service desk. The ticket must capture and document the IT assets and network connection involved. In general, the NOC or IT service desk analyst must perform some immediate problem identification and insert the information into the trouble ticket. This is the first step in problem resolution. The following are typical problem identification steps performed during the initial opening of a ticket:

  1. Interview the user. Capture initial user information and details about the current state of the problem, including any steps taken by the user prior to the call.
  2. Identify the symptoms of the network problem. For example, is there no network connectivity, or is the network slow? Is there a network-wide problem, or are the issues being experienced by only one user?
  3. Determine whether anything changed in the network before the issues appeared. Is a new piece of hardware in use? Have new users been added to the network? Has there been a software update or change somewhere in the network?
  4. Define individual problems clearly. Sometimes a network can have multiple problems. Identify each issue separately so your solution to one isn’t bogged down by other unsolved problems.
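
The four identification steps above map naturally onto the fields of a trouble ticket. The following Python sketch shows one possible record layout, with a helper that reports which steps still lack information. The class name, field names, and sample values are hypothetical and are not taken from any specific ticketing product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class TroubleTicket:
    user: str
    assets: list[str]        # IT assets and network connection involved
    interview_notes: str     # step 1: what the user reported and already tried
    symptoms: list[str]      # step 2: observed symptoms
    scope: str               # step 2: "single user" or "network-wide"
    # step 3: None means "not asked yet"; an empty list means "nothing changed"
    recent_changes: Optional[list[str]] = None
    # step 4: each distinct problem is recorded separately
    problems: list[str] = field(default_factory=list)
    opened_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def missing_steps(self) -> list[str]:
        """List the identification steps that have no recorded information."""
        checks = {
            "interview": bool(self.interview_notes.strip()),
            "symptoms": bool(self.symptoms),
            "recent changes": self.recent_changes is not None,
            "problem definitions": bool(self.problems),
        }
        return [step for step, done in checks.items() if not done]

# Hypothetical ticket opened during the first call.
ticket = TroubleTicket(
    user="jdoe",
    assets=["laptop-0142", "floor-3 switch port 17"],
    interview_notes="Cannot reach the payroll application; rebooted once.",
    symptoms=["no connectivity to one application"],
    scope="single user",
)
print(ticket.missing_steps())
```

Distinguishing "not asked yet" from "nothing changed" for step 3 is a design choice in this sketch: a confirmed absence of recent changes is itself useful troubleshooting information.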

NOC and IT Service Desk Structure

Most NOCs and IT service desks have a hierarchical structure to capture initial tickets and perform problem resolution. The tiers in this structure are referred to as levels. Here is a sample hierarchical structure with a definition of each level:

  • Level 1 support—This is the first line of defense for the IT service desk and the point of entry for users. Level 1 is typically supported by an 800 number or email address. The goal of Level 1 is to open a ticket and capture as much preliminary information as possible. The second goal is to resolve the problem at Level 1 support if possible.
  • Level 2 support—If the problem is not resolved at Level 1, a network analyst or engineer with network and protocol knowledge is required to rule out Layer 1, 2, 3, and 4 network- and protocol-related issues.
  • Level 3 support—If the problem is not resolved at Level 2, a senior network engineer and protocol expert is required to work with the application owner and the user on the application issue. In addition, service provider escalation is performed by a Level 3 senior network engineer or manager.
  • Level 4 support—If the problem is not resolved at Level 3, the NOC director or manager will recommend to the application owner that a vendor or subject matter expert consultant be hired to assist with the Application Layer problem resolution efforts.

FIGURE 15-1 depicts a user-to-IT-service-desk problem resolution workflow.

A flow diagram. The structure has multiple levels, starting at Level 1 and continuing through successive support levels until the problem is solved. The first goal of the IT service desk at Level 1 is to open and document a ticket and to gather preliminary information for it. The second goal is to resolve the problem at Level 1 support. If the problem is resolved at Level 1, the ticket is closed. If the problem is not resolved at Level 1, it is further investigated and remediated at Level 2 support. If the problem is not resolved at Level 2, it is further investigated and remediated at Level 3 support. This continues up to N levels until the problem is resolved and the ticket is closed.

FIGURE 15-1 NOC or IT service desk problem resolution workflow.
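
The workflow in Figure 15-1 can also be expressed as a short Python sketch: each support level is tried in order, and the ticket is closed at the first level that produces a resolution. The handler functions and their matching rules below are purely hypothetical stand-ins for the work a human analyst would perform at each level.

```python
from typing import Callable, Optional

# A handler takes a problem description and returns a resolution,
# or None if that level could not resolve the problem.
Handler = Callable[[str], Optional[str]]

def resolve(problem: str, levels: list[Handler]) -> tuple[int, Optional[str]]:
    """Try each support level in order.

    Returns (level reached, resolution), where resolution is None if
    every level was exhausted and the ticket remains open.
    """
    for number, handler in enumerate(levels, start=1):
        resolution = handler(problem)
        if resolution is not None:
            return number, resolution  # ticket closed at this level
    return len(levels), None

# Hypothetical handlers for Levels 1 through 3.
def level1(problem: str) -> Optional[str]:
    return "Reset password" if "password" in problem else None

def level2(problem: str) -> Optional[str]:
    return "Replaced faulty patch cable" if "link down" in problem else None

def level3(problem: str) -> Optional[str]:
    return "Escalated to service provider" if "WAN" in problem else None

levels = [level1, level2, level3]
print(resolve("user forgot password", levels))
print(resolve("WAN circuit outage", levels))
```

Because the levels are passed in as a list, the same function covers the "up to N levels" case shown in the figure without modification.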

FYI

Internal IT organizations and service desks can define service level agreements (SLAs). An SLA is an agreement among the users, the departments, and the IT organization. This agreement defines a commitment by the IT organization to the user community for providing technical support and troubleshooting assistance. SLAs define response times and communication updates for users and department heads that are impacted by the outage. Organizational SLAs are defined for problem resolution and response time to users and departments. Escalation procedures define a process for users to obtain a response within a period of time. This is typically aligned to IT service desk response times.

Here is an example of an internal organization’s SLAs:

  • Level 2 response time—A 15-minute telephone or email response with follow-up every 60 minutes; if no follow-up, escalate this to Level 3.
  • Level 3 response time—A 10-minute telephone or email response with follow-up every 30 minutes; if no follow-up, escalate this to Level 4, where the director or manager of the NOC or IT service desk assumes responsibility moving forward.
  • Level 4 response time—A special project is crafted and led by the director or manager of the NOC or IT services, the application owner, and any third parties regarding problem resolution.
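
The Level 2 and Level 3 timers in this example lend themselves to a simple automated check. The Python sketch below encodes the response and follow-up intervals listed above and reports whether a ticket is within its SLA or should be escalated. The function name, the status strings, and the treatment of a late first response as "response overdue" are assumptions; the example SLA itself only states that a missed follow-up triggers escalation, and it does not define timers for Level 1.

```python
from typing import Optional

# Intervals in minutes, taken from the example SLA above.
SLA_MINUTES = {
    2: {"response": 15, "follow_up": 60},
    3: {"response": 10, "follow_up": 30},
}

def sla_status(level: int, minutes_since_assigned: int,
               minutes_since_last_update: Optional[int] = None) -> str:
    """Return the SLA status of a ticket at Level 2 or Level 3.

    minutes_since_last_update is None when no first response has been
    sent to the user yet.
    """
    limits = SLA_MINUTES[level]
    if minutes_since_last_update is None:
        # Assumption: a late first response is flagged, not escalated.
        if minutes_since_assigned > limits["response"]:
            return "response overdue"
        return "within SLA"
    if minutes_since_last_update > limits["follow_up"]:
        # A missed follow-up escalates the ticket to the next level.
        return f"escalate to Level {level + 1}"
    return "within SLA"

print(sla_status(2, minutes_since_assigned=20))
print(sla_status(3, minutes_since_assigned=45, minutes_since_last_update=31))
```

In practice a check like this would run on a schedule against all open tickets, so that escalation does not depend on someone noticing a missed update.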