Chapter 2. DDoS Detection

The first step in mitigating a DDoS attack is to know the attack is happening. This might sound obvious, since a volumetric attack will by nature tie up computing resources, such as bandwidth, CPU, buffer, memory, or a combination of all of those. But just as DoS, distributed or otherwise, comes in many shapes and sizes, our detection needs to match the ever-increasing types of attacks.

There are many ways to stop an ongoing or potential attack; some are obvious, and others are less well known. Our goal for detection is to diagnose the attack quickly and accurately and to lower the mean time to mitigation.

In this chapter, we will look at some of the common ways to detect DDoS attacks using information gathered through poll-based and flow-based monitoring. In some instances, we also need to perform packet inspection using network mirrors. Finally, we can use anomaly-based and frequency-based detection mechanisms to flag possible DDoS attacks.

It is our opinion that there is no single detection mechanism that can detect all types of DDoS attacks. In our experience, whenever possible, all of the detection technologies mentioned in this chapter should be set up in advance and continuously validated with ongoing feedback from live traffic. The machine needs to be trained to recognize potential signals of attack from actual attacks in order to accurately predict the next one.

Tools in Your Detection Toolbelt

No single detection mechanism is able to detect every DDoS attack. If possible, set up all of the detection technologies mentioned in this chapter in advance, and keep validating them with ongoing feedback from live traffic. We should leverage all data sources to help identify and understand the impact of any given attack.

Let’s begin by looking at poll-based network detection.

Poll-Based Monitoring and Detection

The first place to start in your detection strategy is to examine the current reporting capabilities of the hardware and software in your infrastructure. Simple Network Management Protocol (SNMP) is a mature internet standard protocol defined in RFC 3411–3418 for collecting and organizing information about networked devices. It is widely supported on routers, switches, servers, workstations, and more.

The basic operation of SNMP consists of one or more management stations that collect data from a group of hosts and devices. Each managed node typically runs an SNMP agent that returns the data to the manager in a standardized format conforming to the RFCs. The agent serves as a proxy that in turn queries the subagents within the device. This setup hides the proprietary components behind a common interface, which makes monitoring systems from different vendors easier.

Poll-based information retrieval is handy because support for it likely already exists in your devices. Once you have a management station in place, the incremental effort involved in adding a new managed node is minimal.

In terms of DDoS, SNMP can generally reveal device health information that shows signs of stress at points in your network, such as the following:

  • Saturated interfaces

  • High CPU

  • High packets-per-second

  • High rate of packet losses

Generally, when the device is under a DDoS attack, you would see a significant deviation of the metric you are tracking from the normal usage, such as the spike in network traffic shown in Figure 2-1. As mentioned, this is usually an indication of stress, and the administrator should perform further investigation in order to determine the cause of the stress. The result could have been caused by a DDoS attack but does not have to be.

Figure 2-1. Bandwidth spike (source: http://bit.ly/2EurjMI/)
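As a minimal sketch of how a management station might turn raw SNMP counters into the kind of signal shown in Figure 2-1, the following Python computes link utilization from two polls of an interface octet counter (such as ifHCInOctets) and flags a large deviation. The function names, the counter values, and the thresholds are our own assumptions for illustration; fetching the counters from the device is left out.

```python
# Sketch: turn two polls of an interface octet counter into utilization.
# Function names, sample values, and thresholds are illustrative assumptions.

COUNTER64_MAX = 2**64  # a 64-bit counter such as ifHCInOctets wraps to zero


def utilization(prev_octets, curr_octets, interval_s, if_speed_bps,
                counter_max=COUNTER64_MAX):
    """Return link utilization (0.0 to 1.0) between two counter samples."""
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        # The counter wrapped between polls. A device reboot looks similar
        # and would need separate handling (for example, via sysUpTime).
        delta += counter_max
    bits_per_second = delta * 8 / interval_s
    return bits_per_second / if_speed_bps


def is_stressed(util, baseline_util, factor=3.0, saturation=0.9):
    """Flag a sample that is near line rate or far above the baseline."""
    return util >= saturation or util >= baseline_util * factor


# A 1 Gbps link polled 60 seconds apart (made-up counter values).
u = utilization(1_000_000_000, 7_750_000_000, 60, 1_000_000_000)
print(f"utilization: {u:.0%}, stressed: {is_stressed(u, baseline_util=0.20)}")
```

A flagged sample only tells us that the interface is under stress; as noted above, further investigation is still needed to decide whether a DDoS attack is the cause.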

The poll-based detection mechanism is handy and useful, but the operation tends to be control-plane based and CPU-intensive. We have been in an environment where multiple management stations were polling information from a network device at a high frequency. When we reduced the number of pollers, the CPU level dropped by 30%.

First Layer of Detection: SNMP

SNMP is a mature protocol that serves as a common denominator among network and computing devices. It is a great first-response detection mechanism and should be a starting point of reference for network behavior. However, it is unlikely to provide meaningful insight beyond the fact that your network is under stress.

Imagine a time when your device is already under stress, such as during a DDoS attack, and the only way to retrieve more information, an SNMP poll, costs the device even more CPU cycles and adds to that stress. SNMP might not be the best choice of tool in that situation, and care needs to be taken when using it. But because SNMP is so widely used and adopted, it can be a useful first-alert tool in your DDoS detection toolbelt.

Flow-Based Network Parameter Detections

Compared to a poll-based detection mechanism, flow-based network detection is push-based. As shown in Figure 2-2, the information is collected on the device itself and pushed to the collector. The basic operation consists of flow exporters and collectors. Similar to SNMP, the collector is a central aggregation point for multiple exporters. Unlike SNMP, the exporter on the device is responsible for aggregating the information before exporting it to the collector. This task delegation allows the exporter, usually a network or system device, to place a higher priority (if necessary) on more critical operations, such as processing BGP control packets.

The flow-based monitoring mechanism was first introduced by Cisco in the form of NetFlow; many vendors have similar mechanisms under different names, such as JFlow or CFlowd for Juniper Networks, and NetStream for Huawei Technologies. IPFIX, the IETF standard based on NetFlow v9, is specified in RFC 7011, with its information model defined in RFC 7012.

Figure 2-2. NetFlow architecture (source: http://bit.ly/2E3C2Qp)

Flow-based technologies can often perform the same function as SNMP with fewer CPU cycles. Although it is mainly used as a flow observer, an IPFIX exporter can export more relevant information than its SNMP counterpart, because its template-based configuration allows more agile adaptation to new kinds of information.

Being a newer, vendor-introduced technology, NetFlow and its variants take longer to sort out and set up; however, they are invaluable tools in DDoS detection and should be used whenever possible. The most useful property of NetFlow is its ability to identify top offenders individually. For example, SNMP data is usually collected at a per-interface level, where you see the total bytes and packets per time interval on a network interface. When drilling down, NetFlow can be used to identify which source IP is the offender. This information is critical for mitigation, which we will cover in Chapter 3.

Flow Information Identifies Individual Offenders

Flow information can identify the top-N traffic usage by source and destination IP. Since infrastructure devices are typically shared among many resources, this information is critical to our mitigation strategy. Figure 2-3 shows an example output.

Figure 2-3. IPFIX screen output
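To make the top-N idea concrete, here is a small sketch that sums bytes per source or destination address over a list of flow records. The record layout (dictionaries with "src", "dst", and "bytes" keys) and the addresses are hypothetical; a real collector would decode these fields from NetFlow or IPFIX exports.

```python
from collections import Counter


def top_talkers(flows, key="src", n=3):
    """Return the n largest (address, total_bytes) pairs, grouped by `key`."""
    totals = Counter()
    for flow in flows:
        totals[flow[key]] += flow["bytes"]
    return totals.most_common(n)


# Made-up flow records using documentation address ranges.
flows = [
    {"src": "198.51.100.7", "dst": "192.0.2.10", "bytes": 9_000_000},
    {"src": "203.0.113.5", "dst": "192.0.2.10", "bytes": 4_000_000},
    {"src": "198.51.100.7", "dst": "192.0.2.11", "bytes": 1_000_000},
    {"src": "203.0.113.9", "dst": "192.0.2.20", "bytes": 500_000},
]

print("top sources:", top_talkers(flows, key="src", n=2))
print("top destinations:", top_talkers(flows, key="dst", n=1))
```

Grouping by destination shows which of our resources is being targeted, and grouping by source shows who is sending the traffic.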

In a typical flow, such as a client web browser downloading a webpage, the number of packets is not known in advance. The exporter takes the first packet with a unique 5-tuple (source and destination IP address, source and destination port, and protocol) and matches subsequent packets against it. When the flow is deemed finished, such as by a timeout value or a TCP FIN or RST, the packet and byte counts are tallied and exported.
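The bookkeeping described here can be sketched as a toy flow cache. This is a simplified illustration, not how any particular vendor implements its exporter: the FlowCache class, its method names, and the 15-second idle timeout are our own assumptions, and "exporting" is reduced to appending to a list.

```python
from dataclasses import dataclass

FIN, RST = 0x01, 0x04  # TCP flag bits
TCP = 6                # IP protocol number for TCP


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Toy exporter cache keyed by the 5-tuple (hypothetical class)."""

    def __init__(self, idle_timeout=15.0):
        self.idle_timeout = idle_timeout  # assumed value, in seconds
        self.active = {}
        self.exported = []  # stand-in for sending records to a collector

    def observe(self, ts, src, dst, sport, dport, proto, length, tcp_flags=0):
        """Account one packet to its flow, creating the flow if needed."""
        key = (src, dst, sport, dport, proto)
        rec = self.active.get(key)
        if rec is None:
            rec = self.active[key] = FlowRecord(first_seen=ts, last_seen=ts)
        rec.last_seen = ts
        rec.packets += 1
        rec.octets += length
        # A TCP FIN or RST marks the flow as finished.
        if proto == TCP and tcp_flags & (FIN | RST):
            self._export(key)

    def expire(self, now):
        """Export every flow that has been idle for at least idle_timeout."""
        idle = [k for k, r in self.active.items()
                if now - r.last_seen >= self.idle_timeout]
        for key in idle:
            self._export(key)

    def _export(self, key):
        self.exported.append((key, self.active.pop(key)))


cache = FlowCache(idle_timeout=15.0)
cache.observe(0.0, "198.51.100.7", "192.0.2.10", 40000, 80, TCP, 60)
cache.observe(0.1, "198.51.100.7", "192.0.2.10", 40000, 80, TCP, 1500)
cache.observe(0.2, "198.51.100.7", "192.0.2.10", 40000, 80, TCP, 52, tcp_flags=FIN)
cache.observe(0.3, "203.0.113.5", "192.0.2.10", 5353, 53, 17, 80)  # UDP
cache.expire(now=20.0)

for key, rec in cache.exported:
    print(key, rec.packets, "packets,", rec.octets, "bytes")
```

The TCP flow is exported as soon as the FIN is seen, while the UDP flow has no end marker and is exported only when the idle timeout expires.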

As such, the exporter needs to keep track of the flow information, record it, and export it at the end. It is important to note that the exporter uses onboard resources, such as TCAM, to keep track of the flows before exporting. Because devices today can process thousands of flows per second, the flow information is generally sampled due to resource constraints. The sampling is typically expressed as “1 in N packets,” and the degree of error grows with N: the higher the N, the less accurate the flow information is. When designing a NetFlow architecture, it is always a balancing act between accuracy and device overhead.
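The trade-off can be illustrated with a rough calculation. The sketch below assumes random packet sampling and uses a commonly cited approximation for the 95 percent relative error of a sampled count, 1.96 × sqrt(1/c), where c is the number of samples collected for the traffic of interest. The function names and the traffic volume are made up for illustration.

```python
import math


def estimate_packets(samples, sampling_rate):
    """Scale a sample count up to an estimated packet total."""
    return samples * sampling_rate


def relative_error_95(samples):
    """Approximate 95% relative error bound for a count of `samples`."""
    if samples <= 0:
        return float("inf")
    return 1.96 * math.sqrt(1.0 / samples)


true_packets = 1_000_000  # packets actually sent by one source (made-up)
for n in (100, 1_000, 10_000):
    expected_samples = true_packets // n
    err = relative_error_95(expected_samples)
    print(f"1 in {n}: about {expected_samples} samples, "
          f"estimate within +/-{err:.1%}")
```

For the same amount of traffic, a larger N yields fewer samples and therefore a wider error bound, which is the accuracy being traded for lower device overhead.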

Sampled Flow (RFC 3176), or sFlow, on the other hand, tries to lessen the exporter's resource burden by moving the calculation and flow state information to the collector. It does so by taking a “1 in N” sample of packets along with the interface counters for the same time period, and exporting each sampled packet right away without keeping flow state on the device. By correlating the two numbers with a simple calculation, the collector can analyze the data and derive an estimate of individual flow usage.

sFlow was originally developed by InMon but aims to be open source, supported by multiple vendors, and scale-out in design. The technology has proved popular with so-called “white box” or newer vendors who need to lower overhead on network devices by focusing their limited resources on core functions, such as routing and switching. In Figure 2-4, we see an example of sFlow in operation.
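One way to picture the collector-side calculation is to split the interface counter delta across sources according to each source's share of the sampled bytes. This is our own simplified reading of how the two numbers can be correlated, not the sFlow specification's algorithm; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def estimate_usage(samples, interface_octets):
    """Split an interface octet delta across sources by sampled share.

    samples: iterable of (source_address, frame_length) from sampled packets.
    interface_octets: counter delta for the same interface and time period.
    """
    sampled = defaultdict(int)
    for src, frame_len in samples:
        sampled[src] += frame_len
    total = sum(sampled.values())
    if total == 0:
        return {}
    return {src: interface_octets * octets / total
            for src, octets in sampled.items()}


# Made-up samples: three from one source, one from another.
samples = [
    ("198.51.100.7", 1500),
    ("198.51.100.7", 1500),
    ("198.51.100.7", 1500),
    ("203.0.113.5", 500),
]
usage = estimate_usage(samples, interface_octets=50_000_000)
for src, octets in sorted(usage.items(), key=lambda item: -item[1]):
    print(f"{src}: about {octets:,.0f} bytes")
```

All of the per-flow state lives in the collector here; the device only has to forward samples and counters.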

Figure 2-4. sFlow in operation (source: http://bit.ly/2nsbbUI)

Compared to SNMP, flow-based detection technology is newer and more fragmented. For example, the operator might need to implement different collectors for NetFlow and sFlow. However, because it is one of the few technologies that can identify individual usage information, it is critical in DDoS detection. Besides serving immediate mitigation needs, this information is often used if you need to take legal action against the attackers.

Between the two approaches of flow-based network monitoring mechanisms, there is obviously no right or wrong solution. Sometimes you need to go with the technology that is already part of your network; other times it is worth exploring new technologies. Generally, we prefer the sFlow technology over NetFlow because of scalability and broader vendor support.

FastNetMon Project

One of the open source projects we participate in and contribute to is FastNetMon. It has both an open source community edition and a commercial paid edition. The project aims to use flow exports to quickly detect DDoS attacks and automatically trigger mitigation techniques.

Network Mirrors and Deep Packet Inspection

The technologies we have mentioned so far mainly cover up to Layer 4 in the OSI model. They are suitable for monitoring and detecting activities at scale, at a macro level, for your infrastructure. Whenever we see a segment in a movie or TV show depicting a Network Operations Center (NOC), or a real-world NOC for that matter, macro-level monitoring is the type of output that is rightfully projected on the giant screen while the engineers look busy analyzing the data.

While SNMP and flow data can give you a great place to start, they sometimes sacrifice detail in favor of scale: SNMP, by nature, is not meant to look into the packet payload, and we already discussed the sampling nature of flow-based detection. Imagine a low-and-slow attack on your HTTP web server like the one that we mentioned in Chapter 1. In order to detect the specifics of the attack, we need to actually look at the contents of the packets instead of relying on just the header. This is typically done by configuring a network mirror that identifies a source port on a network device, makes a copy of each packet transmitted on it, and sends the copy out of the mirror port.

As illustrated in Figures 2-5 and 2-6, in many instances the only way to be 100 percent positive of the attack behavior is to look at the packets in detail. In both cases, we are able to see the payload of the packet. In the case of NTP amplification, we are able to see the IP addresses in the NTP monlist response, which we can use for mitigation.

Figure 2-5. SSDP amplification packet
Figure 2-6. NTP amplification packet
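As a rough sketch of what payload inspection adds over header data, the following function labels UDP payloads that look like SSDP or NTP monlist responses. It assumes the mirrored packet has already been decoded down to its UDP source port and payload bytes; the function name is hypothetical, and the checks are deliberately minimal (NTP mode 7 with a monlist request code, and an HTTP-style SSDP reply).

```python
NTP_PORT, SSDP_PORT = 123, 1900
NTP_MODE_PRIVATE = 7              # mode 7 "private" messages used by ntpdc
MONLIST_REQUEST_CODES = {20, 42}  # MON_GETLIST and MON_GETLIST_1


def classify_udp_payload(src_port, payload):
    """Return a label for known amplification responses, else None."""
    if src_port == NTP_PORT and len(payload) >= 4:
        is_response = bool(payload[0] & 0x80)  # top bit marks a response
        mode = payload[0] & 0x07               # low three bits are the mode
        if (is_response and mode == NTP_MODE_PRIVATE
                and payload[3] in MONLIST_REQUEST_CODES):
            return "ntp-monlist-response"
    if src_port == SSDP_PORT and payload.startswith(b"HTTP/1.1 200 OK"):
        headers = payload.upper()
        if b"\r\nST:" in headers or b"\r\nUSN:" in headers:
            return "ssdp-response"
    return None


# Hand-built example payloads (not captured traffic).
ntp_reply = bytes([0xD7, 0x00, 0x03, 0x2A]) + b"\x00" * 4
ssdp_reply = (b"HTTP/1.1 200 OK\r\n"
              b"CACHE-CONTROL: max-age=120\r\n"
              b"ST: upnp:rootdevice\r\n"
              b"USN: uuid:00000000-0000-0000-0000-000000000000\r\n\r\n")

print(classify_udp_payload(123, ntp_reply))
print(classify_udp_payload(1900, ssdp_reply))
print(classify_udp_payload(53, b"\x12\x34\x81\x80"))
```

A flow record for the same traffic would show only UDP from port 123 or 1900; the label comes entirely from the bytes that the mirror lets us see.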

While simple network mirrors are easy to construct, they are difficult to replicate at scale. With the advance of software-defined networking (SDN), big data, machine learning, and cloud, we are seeing an increase in technologies that combine these fields into an attractive DDoS detection mechanism:

  • SDN, in the form of OpenFlow protocol (Figure 2-7), can offer two advantages over the traditional network in terms of monitoring and detection:

    • More precise matching of packets, with match criteria of up to a 15-tuple.

    • Once matched, the controller provides the mechanism to replicate traffic flow on demand without impacting the original flow.

  • Big data technology provides a way to store and index data for efficient information gathering.

  • Machine learning allows for an automatic self-learning cycle of the DDoS training set.

  • Public and hybrid cloud provides a lower bar of entry for utilizing SDN, big data, and machine learning.

Figure 2-7. OpenFlow controller-based network monitoring (source: http://bit.ly/2FzuDp7)

It is worth pointing out that the technologies we have mentioned can be decoupled and used independently of each other. Another example of real-time packet inspection is shown in Figure 2-8.

Figure 2-8. Packet inspection and reporting (source: http://bit.ly/2DStzjP)

With the rise of Bring Your Own Device (BYOD), users bring their own devices while using services provided by the company, such as email. We have also seen a growing trend of host-based monitoring and detection in the marketplace, in both commercial and open source projects. While these tools are great for detecting security breaches, such as social engineering and compromised data, they are not as relevant for DDoS attacks. They can provide value in specific use cases, when the agent is installed on a host that is under attack and we need to isolate the attacker and pattern. But in general, they are more useful for detecting other types of security breaches than for DDoS detection.

Anomalies and Frequency-Based Detections

We are still in the early stage of machine learning, but it is already showing great promise in making detection of DDoS attacks easier.

If we take a step back and review the steps we normally take in detecting a DDoS attack, they typically include:

  • Baseline our normal traffic usage, such as interface utilization level, requests per second, etc. This baselining needs to take into account the normal fluctuation over the course of a day, quarter, and year.
  • Detect any deviation from our defined normal usage. For example, in the SNMP section, we saw a burst of traffic that was five times our normal usage.
  • Examine the deviation further to see if it was caused by a known event, such as a Black Friday sale on an e-commerce site, or by a DDoS attack.
  • If it was not caused by a known event, collect information, match it against well-known attack patterns, and decide on a mitigation action.
  • Document the event for future reference and knowledge.
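The first three steps can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the baseline is kept per hour of day only, deviation is measured as a z-score against that hour's history, and the function names, threshold, and sample values are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """history: iterable of (hour_of_day, value). Returns hour -> (mean, std)."""
    buckets = defaultdict(list)
    for hour, value in history:
        buckets[hour].append(value)
    return {hour: (mean(values), pstdev(values))
            for hour, values in buckets.items()}


def classify(baseline, hour, value, known_event=False, z_threshold=3.0):
    """Label a new measurement as normal, known-event, or investigate."""
    mu, sigma = baseline[hour]
    if sigma > 0:
        z = (value - mu) / sigma
    else:
        z = 0.0 if value == mu else float("inf")
    if z < z_threshold:
        return "normal"
    return "known-event" if known_event else "investigate"


# Made-up requests-per-second readings for 14:00 on five earlier days.
history = [(14, 100), (14, 110), (14, 90), (14, 105), (14, 95)]
baseline = build_baseline(history)

print(classify(baseline, 14, 110))                    # ordinary fluctuation
print(classify(baseline, 14, 500))                    # five times normal
print(classify(baseline, 14, 500, known_event=True))  # e.g., a planned sale
```

A production baseline would also need to account for the weekly, quarterly, and yearly fluctuation mentioned in the first step, which this hour-of-day sketch ignores.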

Many of the steps can be taken over by computers with machine learning capabilities. In fact, the computer is much better suited for the job because it can identify “needle in the haystack” types of anomalies much better than a human can. Elasticsearch is an open source technology that supports scalable, near-real-time search. Along with its sister projects Logstash and Kibana, together sometimes referred to as the ELK stack, it is a great example of how machine learning can drastically help with DDoS detection.

We will use the following workflow as an illustration of the example:

  1. Collect NetFlow, SNMP, and log information via Logstash input.

  2. Normalize and augment data via Logstash filters and databases.

  3. Output data to Elasticsearch for indexing.

  4. Use the X-Pack machine learning feature to create a baseline model of the data set, identify anomalies from the baseline, and correlate influencers as the likely cause of outliers.

The example in Figure 2-9 shows the continuous baselining of traffic data.

Figure 2-9. Modeling of data (source: http://bit.ly/2GDJuAu)

Once the baseline is determined, an outlier can be identified, as shown in Figure 2-10.

Figure 2-10. Outlier identification (source: http://bit.ly/2GDJuAu)

A correlation of event to outcome can then be inferred, as shown in Figure 2-11.

Figure 2-11. Influence of outlier (source: http://bit.ly/2GDJuAu)

The biggest gain from the workflow is a continuous baselining of traffic. Keep in mind that the first time an outlier event happens, even if it is a known event, it will generate an alert. A good example would be the year-end holiday season, when sales volume is expected to be higher than normal. If this is the first year the model is being built, a false positive alert will be generated. However, as time goes on, the model will become more accurate.

Another open source tool that has gained a lot of traction is Graylog. This is a more log-centric approach where you can centrally collect Syslog and event log messages and spot problems early.

Summary

In this chapter, we identified the various DDoS detection methods and mechanisms. We looked at SNMP and flow-based detection, as well as network mirrors and packet inspection. As we move into the world of machine learning, we see great promise in making DDoS detection easier and more autonomous.

In the next chapter, you will use the data we collected from the network and application and start to examine different types of mitigation and countermeasures against DDoS attacks.
