Appendix B. Integrating Open Source Intelligence

The community of security professionals works tirelessly toward securing perimeters, preventing breaches, and keeping hackers out. Because attackers commonly target more than one organization at a time, sharing information freely among defenders has significant merit in strengthening the line of defense. Security intelligence sharing has proven to be quite useful in detecting attacks and assessing risk. The term Open Source Intelligence (OSINT) refers to data that has been collected from various sources (not necessarily in the context of security) and is shared with other systems that can use it to drive predictions and actions. Let’s take a brief look at a few different types of open source intelligence and consider their impact in the context of security machine learning systems. Our coverage is by no means exhaustive; we refer you to the literature (see notes 1, 2, and 3) for more information.

Security Intelligence Feeds

Threat intelligence feeds can be a double-edged sword when applied to security machine learning systems. The most common manifestation of security intelligence is the real-time IP or email blacklist feed. By collecting the latest attack trends and characteristics from honeypots, crawlers, scanners, and proprietary sources, these feeds provide an up-to-date list of values that can be used by other systems as a feature for classifying entities. For instance, the Spamhaus Project tracks spam, malware, and phishing vectors around the world, providing real-time feeds of mail server, hijacked server, and end-user IP addresses that its data and analysts have determined to be consistently exhibiting bad behavior online. A subscriber to Spamhaus blocklists can query an endpoint to find out if a request coming into their system has exhibited bad behavior elsewhere on the internet. The response can then motivate secondary decisions or actions, such as increasing the risk score of this request if it has been marked as originating from a potentially hijacked server.
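
The lookup-then-score flow described above can be sketched as follows. This is a minimal illustration, not Spamhaus's client or a recommended integration: it assumes the conventional DNS blocklist query form (the IPv4 octets reversed and prepended to a zone name), assumes that only answers in 127.0.0.x indicate a listing, and uses a hypothetical `penalty` value and 0-to-1 risk scale. The resolver is passed in as a parameter so that the logic can be exercised without touching the network; check the feed's own documentation and usage terms before querying it for real.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the blocklist hostname for an IPv4 address: reversed octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, resolve=socket.gethostbyname):
    """Return True if the blocklist answers with a 127.0.0.x listing code."""
    try:
        answer = resolve(dnsbl_query_name(ip))
    except socket.gaierror:
        return False  # no record: the address is not on the list
    # Assumption: answers outside 127.0.0.x are errors, not listings.
    return answer.startswith("127.0.0.")


def adjusted_risk(base_score, ip, penalty=0.25, resolve=socket.gethostbyname):
    """Raise a 0-to-1 risk score (capped at 1.0) when the IP is listed."""
    if is_listed(ip, resolve):
        return min(1.0, base_score + penalty)
    return base_score
```

Treating the blocklist answer as one input to a score, rather than as an outright block, keeps the feed in the role the text describes: a signal that motivates a secondary decision.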

A common problem observed by consumers of threat intelligence feeds is that their reliability and applicability vary across different systems. What has been determined to be a threat in one context might not be a threat in every other context. Furthermore, how can we guarantee that the feeds are reliable and have not themselves been subject to poisoning attacks? These questions can severely limit the direct applicability of threat intelligence feeds in many systems. The Threat Intelligence Quotient Test is a system (not currently under active development) that allows for the “easy statistical comparison of different threat intelligence indicator sources such as novelty, overlap, population, aging and uniqueness.” Tools such as this one can help you to measure and compare the reliability and usefulness of threat feeds.
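
To make the idea of comparing feeds concrete, here is a small sketch that computes three of the quantities named above (population, overlap, and uniqueness) for feeds represented as sets of indicators. It is not the Threat Intelligence Quotient Test's implementation; the function name `feed_stats` and the simplified definitions in the comments are assumptions chosen for illustration.

```python
def feed_stats(feeds):
    """Compare indicator feeds given as a dict of name -> set of indicators.

    Simplified definitions (assumptions for this sketch):
      population: number of indicators in the feed
      unique:     indicators that appear in no other feed
      overlap:    fraction of this feed's indicators also found in another feed
    """
    stats = {}
    for name, indicators in feeds.items():
        # Union of every other feed's indicators.
        others = set().union(*(v for k, v in feeds.items() if k != name))
        overlap = {}
        for other, other_indicators in feeds.items():
            if other == name:
                continue
            shared = len(indicators & other_indicators)
            overlap[other] = shared / len(indicators) if indicators else 0.0
        stats[name] = {
            "population": len(indicators),
            "unique": len(indicators - others),
            "overlap": overlap,
        }
    return stats


if __name__ == "__main__":
    example = {
        "feed_a": {"192.0.2.1", "192.0.2.2"},
        "feed_b": {"192.0.2.2", "192.0.2.3", "192.0.2.4", "192.0.2.5"},
    }
    for feed, values in feed_stats(example).items():
        print(feed, values)
```

A feed with high overlap and few unique indicators adds little beyond what you already consume, whereas a feed made up almost entirely of unique indicators deserves closer scrutiny before you trust it.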

Despite their drawbacks, security intelligence feeds can provide useful features for enriching datasets, and they can serve as a source of confirmation when your security machine learning system suspects an entity to be malicious.

Another common use of threat intelligence feeds is to fuel entity reputation systems that keep track of the historical behavior of an IP address, domain, or user account. Mature organizations typically maintain a compounding (see note 4) knowledge base of entities in a system that contributes to how much trust they place in an entity. For instance, if an IP address originating from Eastern Europe has consistently been showing up in threat intelligence feeds as a host potentially hijacked by a botnet, its score in the IP reputation database will probably be low. When a future request originating from that IP address exhibits the slightest sign of anomaly, we might go ahead and take action on it, whereas we might give more leeway to an IP address with no history of malice.
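
The reputation logic in this paragraph can be sketched in a few lines. Everything here is hypothetical: the `Reputation` class, the 0-to-1 score scale, the fixed penalty per feed hit, and the threshold range are assumptions chosen to show the shape of the idea, not values from any production system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Reputation:
    score: float = 1.0  # 1.0 = no history of malice, 0.0 = no trust left

    def record_feed_hit(self, penalty=0.25):
        """Lower the score each time the entity appears in a threat feed."""
        self.score = max(0.0, self.score - penalty)


def should_act(anomaly_score, reputation, strict=0.2, lenient=0.8):
    """Decide whether to act on an anomaly score between 0 and 1.

    The threshold slides from `lenient` (good reputation) down to
    `strict` (bad reputation), so poorly regarded entities get less leeway.
    """
    threshold = strict + (lenient - strict) * reputation.score
    return anomaly_score >= threshold


# Entities without any history start with a full score.
reputations = defaultdict(Reputation)
```

For example, after `reputations["203.0.113.7"].record_feed_hit()` has been called four times, a mild anomaly score of 0.3 from that address triggers action, whereas the same score from an address with no history does not.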

Geolocation

The IP address is the most common unit of threat identification for web applications. Because every request originates from an IP address and most addresses can be associated with a set of physical location coordinates, collecting IP addresses enables data analysts to obtain information about the initiator of the request and make inferences about the threat level. In addition to the physical location, IP intelligence feeds commonly also provide the autonomous system number (ASN), internet service provider (ISP), and even device type associated with an IP address. MaxMind is one of the most popular providers of IP intelligence, providing frequently updated databases and APIs for resolving the location information of an IP address.

Even though geolocation is a valuable feature to add to security machine learning systems, there are some gotchas to keep in mind when considering the IP address associated with a web request. It might not be the IP address of the user making the request, because your system sees only the address of the last hop in the request routing path. For example, if the user is sitting behind a proxy, the IP address seen will be that of the proxy instead of the user. In addition, IP addresses cannot be reliably associated with a single person. Multiple users in a household or large enterprise will share the same IP address if they share an internet connection or sit behind the same proxy service. Many ISPs also provide dynamic IPs, which means that the IP addresses of their end users are rotated regularly. Mobile users on a cellular network will typically have rotating IP addresses even if they don’t change their physical location, because each cell tower has a pool of nonsticky IP addresses that is shared by the users connected to it.
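
One practical consequence of shared addresses is that an IP-level signal should carry less weight when many distinct accounts sit behind the same address. The sketch below counts accounts per IP address and flags heavily shared ones. The event format (pairs of IP address and account ID) and the `max_accounts` cutoff are assumptions for illustration; a real system would also need to bound the time window it counts over.

```python
from collections import defaultdict


def accounts_per_ip(events):
    """Map each IP address to the set of accounts seen using it.

    `events` is an iterable of (ip, account_id) pairs.
    """
    seen = defaultdict(set)
    for ip, account_id in events:
        seen[ip].add(account_id)
    return seen


def shared_ips(events, max_accounts=3):
    """Return the IP addresses used by more than `max_accounts` accounts."""
    return {
        ip
        for ip, accounts in accounts_per_ip(events).items()
        if len(accounts) > max_accounts
    }
```

Addresses returned by `shared_ips` are likely to be proxies, enterprise gateways, or carrier pools, so features derived from them say little about any individual user.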

1 Lee Brotherston and Amanda Berlin, Defensive Security Handbook: Best Practices for Securing Infrastructure (Sebastopol, CA: O’Reilly Media, 2017), Chapter 18.

2 Robert Layton and Paul Watters, Automating Open Source Intelligence: Algorithms for OSINT (Waltham, MA: Syngress, 2015).

3 Sudhanshu Chauhan and Nutan Panda, Hacking Web Intelligence: Open Source Intelligence and Web Reconnaissance Concepts and Techniques (Waltham, MA: Syngress, 2015).

4 The word “compounding” is used here in the same way that “compounding interest” is used in the financial context. Knowledge bases are frequently compounded in the sense that they are used to build systems that generate more knowledge to be fed back into the original knowledge base.
