"We need to connect the dots to make the invisible visible" | ||
--Samir Datt |
Just as we need to connect the dots to build a big picture, from a network forensics perspective, we need to correlate logs in order to get to the big picture of activity on a network. Every device that maintains logs of events is a great resource for tracking intruder activity. In our role as Network 007s, we will use these logs to try to track every step of the route taken by an intruder through our network.
Let's begin by trying to understand what logs are. A log, as the name suggests, is a record of information generated in response to a specific event or activity that occurs on a system or network. A log aims to capture the who, what, when, and where of an event. Logs can include the date and time of the activity; the device or application that the log relates to; the associated user or account; the type of log entry, such as error, warning, information, and success or failure audits; and, of course, cryptic or detailed information (depending on how logging is configured) about the actual event itself. Each specific event generates a log record. Multiple records form a log file, which is usually either in text format or stored as records in a database for easy management and analysis.
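To make the who, what, when, and where concrete, here is a minimal sketch of parsing a single log record into those fields. The log line format and field names below are illustrative assumptions, not a real device's output:

```python
import re
from datetime import datetime

# Hypothetical log line in a syslog-like key=value format (purely illustrative):
line = "2023-04-12 09:15:32 fw01 user=alice level=WARNING msg=Failed login attempt"

# Capture the when (timestamp), where (device), who (user),
# and what (severity level plus message) of the event.
pattern = re.compile(
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<where>\S+) user=(?P<who>\S+) level=(?P<level>\S+) msg=(?P<what>.*)"
)

match = pattern.match(line)
record = match.groupdict()
# Convert the raw timestamp string into a datetime object for later correlation.
record["when"] = datetime.strptime(record["when"], "%Y-%m-%d %H:%M:%S")
print(record["who"], record["where"], record["level"])
```

Real formats vary widely (syslog, Windows Event Log, JSON), but the idea is the same: every record should let you recover these four Ws.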
One of the typical problems that we face with log analysis is the sheer magnitude of the generated logs. Depending on the number of devices and the level of logging enabled, the volume can really add up. To tackle this issue, most logging systems are designed to store logs only up to a specific limit. A firewall, for example, may by default retain only the 10,000 most recent events or the last 100 MB of data, unless configured otherwise.
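The effect of such a cap can be sketched with a fixed-size buffer: once the limit is reached, each new record silently pushes out the oldest one. The 10,000-record cap below mirrors the firewall example and is purely illustrative:

```python
from collections import deque

# A minimal sketch of capped log storage: keep only the N most recent records,
# the way a firewall might retain just its latest 10,000 events.
MAX_RECORDS = 10_000

log_buffer = deque(maxlen=MAX_RECORDS)

# Simulate 25,000 incoming events; the first 15,000 are silently discarded.
for event_id in range(25_000):
    log_buffer.append(f"event {event_id}")

print(len(log_buffer))    # never grows past the cap
print(log_buffer[0])      # oldest surviving record
```

This is exactly why evidence vanishes if we arrive late: anything older than the cap is gone unless logs were exported or forwarded elsewhere.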
However, our perspective is different from that of a normal network administrator. Our aim is to detect, track, and identify intruders and illegal activity on our network. Therefore, access to older logs can be quite useful. A comparison of logs over time can produce interesting results. A simple example is a sudden threefold increase in the size of a log file: this in itself indicates increased network activity and is a possible sign of a malware compromise.
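The threefold-increase check can be sketched as a simple comparison of daily log volumes. The dates, sizes, and threshold below are illustrative assumptions:

```python
# Sketch: flag days whose log volume is at least 3x that of the prior day,
# a crude indicator of a spike in network activity. Sizes are in bytes.
daily_log_sizes = {
    "2023-04-10": 1_200_000,
    "2023-04-11": 1_350_000,
    "2023-04-12": 4_500_000,   # sudden jump worth investigating
}

THRESHOLD = 3.0
dates = sorted(daily_log_sizes)
alerts = [
    (prev, cur)
    for prev, cur in zip(dates, dates[1:])
    if daily_log_sizes[cur] >= THRESHOLD * daily_log_sizes[prev]
]

for prev, cur in alerts:
    ratio = daily_log_sizes[cur] / daily_log_sizes[prev]
    print(f"Log volume on {cur} is {ratio:.1f}x that of {prev}")
```

A size jump alone proves nothing, but it tells us where to point our attention first; the flagged day's records are then examined in detail.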
Another major problem caused by the volume of generated logs is that of manually analyzing them to look for anomalies. Wherever manual log analysis is in place, we see that, over time, logs tend to be ignored. This can be remedied by using good log analysis tools such as Splunk and Logstash. We will examine Splunk's use for log analysis and visualization in this chapter.
In this chapter, we will cover the following topics:
At the beginning of this chapter, we discussed how logs keep track of the four Ws related to an event. These were the when, where, who, and what of the event. Let's understand how each of these is done in a bit more detail in the following table:
One critical aspect of analyzing logs is the when. Each log entry carries a date and time, which establishes the when of a specific event. Should this date or time be incorrect, all correlations will be hopelessly wrong and may completely derail the analysis.
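One common source of such errors is devices reporting local time with different UTC offsets. A minimal sketch of normalizing everything to UTC before building a timeline follows; the device names, timestamps, and offsets are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Sketch: normalize timestamps from devices in different timezones to UTC
# before correlating, so mixed offsets don't wreck the event timeline.
events = [
    ("fw01",  "2023-04-12 09:15:32", timedelta(hours=-5)),  # device clock at UTC-5
    ("web01", "2023-04-12 14:10:05", timedelta(hours=0)),   # device clock at UTC
]

normalized = []
for device, stamp, offset in events:
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    # Subtracting the offset converts local time back to UTC.
    utc = (local - offset).replace(tzinfo=timezone.utc)
    normalized.append((utc, device))

# Sorting the normalized timestamps gives the true order of events.
for utc, device in sorted(normalized):
    print(utc.isoformat(), device)
```

In UTC, the firewall event (09:15:32 at UTC-5) actually happened at 14:15:32, after the web server event, not five hours before it; without normalization the timeline would be reversed.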
To illustrate this, let me share an incident that we came across in one of our investigations.