Counting in population analysis

Detecting anomalies in how often something occurs, relative to an entity's own history, is clearly useful. But, as we introduced conceptually in Chapter 1, Machine Learning for IT, comparing the behavior of an entity against that of its peers is also informative, especially when we are assessing the number of times something happens. Counting occurrences across a population to find individual outliers has a variety of important use cases, including the following:

Finding machines that are logging more (or less) than similarly configured machines. Here are some example scenarios:
- Incorrect configuration changes that have caused more errors to suddenly occur in the log file for the system or application.
- A system compromised by malware may be instructed to suppress logging in certain situations, drastically decreasing its log volume.
- A system that has lost connectivity or has operationally failed, and whose log volume has diminished as a result.
- An otherwise harmless change to a logging-level setting (debug instead of normal), now annoyingly making your logs take up more disk space.
Finding behavior that differs from that of most normal users. In a nod to user behavioral analytics, comparing the rate of activity of users against that of their peers can be useful in the following cases:
- Automated users: Instead of the typical human behavior or usage pattern, an automated script may exhibit behavioral patterns that look quite different in terms of the speed, duration, and diversity of events they create. Whether it is finding a crawler trying to harvest the products and prices of an online catalog or detecting a bot that might be engaged in the spread of misinformation on social media, the automatic identification of automated users can be helpful.
- Snooping users: Whether it is a real human testing the boundaries of what they can get away with or an intelligent piece of malware doing some reconnaissance, a snooper may execute a wide variety of things, hoping for a match or to find a way in (such as by port scanning). Often, using the distinct_count function can help find a snooper.
- Malicious/abusive users: After the reconnaissance phase, a malicious user or piece of malware is actively wreaking havoc through measures such as denial of service, brute forcing, or stealing valuable information. Again, compared with typical users, malicious and abusive users differ starkly in the volume, diversity, and intensity of their activity per unit of time.
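
The idea common to all of these cases, comparing each entity's count against the counts of its peers in the same time window, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple median and median-absolute-deviation rule rather than the statistical model that Elastic ML actually builds, and the function name `population_outliers`, the `threshold` value, and the host counts are all hypothetical.

```python
from statistics import median

def population_outliers(counts, threshold=5.0):
    """Return entities whose count is far from the population median.

    counts: mapping of entity name -> event count in one time bucket.
    threshold: hypothetical cutoff, in units of median absolute deviation.
    """
    values = list(counts.values())
    center = median(values)
    # Median absolute deviation: a robust measure of spread among peers.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the peers sit exactly at the median;
        # flag any entity that differs from it.
        return {name: c for name, c in counts.items() if c != center}
    return {
        name: c for name, c in counts.items()
        if abs(c - center) / mad > threshold
    }

# Synthetic log-line counts per host for a single time bucket.
log_counts = {"web-01": 980, "web-02": 1010, "web-03": 1005,
              "web-04": 995, "web-05": 12, "web-06": 9400}
outliers = population_outliers(log_counts)
print(outliers)
```

With these synthetic numbers, both the host that has gone nearly silent and the host that is logging far more than its peers are flagged, which mirrors the "more (or less)" wording used above.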

A practical example of exposing behavioral anomalies would involve the analysis of a log that tracks usage, such as a web access log. We could set up a job that looks for unusual client IP addresses: those that are acting like automated bots rather than humans (since bots often make requests with higher volume, frequency, and diversity than humans do). The configuration compares the count of web requests per unit of time, split by the HTTP status code (since bots often produce random access patterns that result in a diverse set of response codes), against a population of client IPs.
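
As a rough sketch of what such a job configuration might look like outside the UI, the following Python dictionary mirrors the structure of an Elastic ML anomaly detection job body. Several things here are assumptions rather than details from the original: the field names `clientip`, `response`, and `@timestamp` depend on how the access log was ingested, the 15-minute bucket span is a placeholder, and the "split" is mapped to `partition_field_name` (using `by_field_name` would be another option).

```python
import json

# Field names below are assumptions about how the web access log was ingested.
job_config = {
    "description": "Unusual request volume per client IP versus the population",
    "analysis_config": {
        "bucket_span": "15m",  # assumed; the source does not state the bucket span
        "detectors": [
            {
                "function": "count",                 # count of requests per bucket
                "over_field_name": "clientip",       # the population being compared
                "partition_field_name": "response",  # split by HTTP status code
            }
        ],
        "influencers": ["clientip", "response"],
    },
    "data_description": {"time_field": "@timestamp"},
}

print(json.dumps(job_config, indent=2))
```

The `over_field_name` setting is what makes this a population analysis: each client IP is compared against the other client IPs rather than only against its own history.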

When executed, the job nicely identifies some rogue IP addresses:

The heatmap shows the top 10 most unusual client IP addresses, again based on the volume of requests per unit of time. Focusing on the top offender, 173.203.78.60, we can see the details when clicking on the red tile in its swim lane:

We can see that this rogue IP address was executing literally thousands of requests for a URI of /wp-login.php, which fortunately doesn't exist on this web server (thus resulting in the status code of 404). It seems like this was a rather unsophisticated brute-force login attempt, but an interesting find nonetheless.

As a point of comparison, if the analysis of the web logs had used distinct_count on the URL field instead of the standard count function, the preceding rogue IP address would not have been highlighted as anomalous. This is simply because the thousands of requests were all made for the same URL (wp-login.php), so the diversity of the requests was very low. However, a job that uses distinct_count to look for IP addresses with an unusually high diversity of URL requests will find different situations, such as this IP:

This IP (109.234.202.124) was making hundreds of requests for unique URLs (whereas a human does not make that many different ones in the same amount of time). If you were to use Kibana's Discover panel to look at the raw requests in the web logs, filtered for this IP address, it would reveal that this IP was trying all sorts of requests for different PHP pages, each time passing an odd-looking argument in the query string:

It seems as if this traffic is driven by a bot that is hoping to find an exploit in a site's PHP code. It is blindly testing a variety of presumably well-known PHP filenames, and the result of passing the contents of an established text file (one that is always hosted on Google) may indicate to the bot that a vulnerability exists for that PHP page. If a vulnerability is found, malicious follow-up actions against that PHP page are likely.
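
The difference between the two functions discussed above can be illustrated with a small, self-contained Python sketch. The data is entirely synthetic (the IP addresses come from a documentation-reserved range and the URLs are made up), and the code only computes the raw per-IP values that each function would feed into the analysis; it does not perform any anomaly scoring.

```python
from collections import defaultdict

# Synthetic (client IP, requested URL) pairs for one time bucket.
access_log = (
    [("192.0.2.10", "/wp-login.php")] * 3000                 # brute-force style
    + [("192.0.2.20", f"/page{i}.php") for i in range(400)]  # scanner style
    + [("192.0.2.30", url) for url in ["/", "/about", "/", "/products"]]  # human style
)

count = defaultdict(int)  # total requests per IP
urls = defaultdict(set)   # set of unique URLs requested per IP
for ip, url in access_log:
    count[ip] += 1
    urls[ip].add(url)

distinct_count = {ip: len(seen) for ip, seen in urls.items()}

for ip in count:
    print(ip, "count =", count[ip], "distinct_count =", distinct_count[ip])
```

The first address dominates on count but has a distinct_count of 1, while the second stands out only on distinct_count, which is why the two jobs surface different offenders.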
