Chapter 13. Beautiful Log Handling

Anton Chuvakin

A well-thrashed maxim proclaims that “knowledge is power,” but where do we get our knowledge about the components of information technology (IT) for which we’re responsible—computers, networking gear, application frameworks, SOA web infrastructure, and even whatever future, yet-uninvented components come our way? The richest source of such information, almost always available but often unnoticed, is the logs and audit trails produced by the systems and applications themselves. Through logs, audit trails, and alerts, information systems often give signs that something is amiss, or even let us look ahead and warn us that something will soon be amiss.

The logs might also reveal larger weaknesses, such as lapses in our controls that affect regulatory compliance. They even bear on IT governance and, by extension, corporate governance, thus reaching beyond the IT realm where they originated.

However, more often than not, such logs contain merely data (and sometimes junk data!) rather than information. Extra effort—sometimes gargantuan effort—is needed to distill that data into usable and actionable information about IT and our businesses.

Logs in Security Laws and Standards

To start at a very high level, logs equal accountability. This idea is not new; it goes all the way back to the venerable Orange Book (“Department of Defense Trusted Computer System Evaluation Criteria”), first released in 1983. Under the “Fundamental Requirements” section, we find a requirement for logging:

Requirement 4 — ACCOUNTABILITY — Audit information must be selectively kept and protected so that actions affecting security can be traced to the responsible party. A trusted system must be able to record the occurrences of security-relevant events in an audit log.

Wikipedia defines accountability as follows:

Accountability is a concept in ethics with several meanings. It is often used synonymously with such concepts as responsibility, answerability, enforcement, blameworthiness, liability and other terms associated with the expectation of account-giving.[99]

There are many other mechanisms for accountability in an organization, but logs are the mechanism that pervades IT. And if your IT is not accountable, neither is your business. Thus, if you are not serious about logs, you are not serious about accountability. Is that the message your organization wants to send?

Along the same lines, logs are immensely valuable for regulatory compliance. Many recent U.S. laws have clauses related to audit logging and the handling of those logs; just a few of the most important laws are the Health Insurance Portability and Accountability Act (HIPAA), the Gramm-Leach-Bliley Financial Services Modernization Act (GLBA), and the Sarbanes-Oxley Act (SOX).

For example, a detailed analysis of the security requirements and specifications outlined in the HIPAA Security Rule sections §164.306, §164.308, and §164.312 reveals items relevant to auditing and logging. Specifically, section §164.312(b), “Audit Controls,” covers audit, logging, and monitoring controls for systems that contain a patient’s protected health information (PHI). Similarly, GLBA section 501 and SOX section 404, among other clauses, indirectly address the collection and review of audit logs.

Centralized event logging across a variety of systems and applications, along with its analysis and reporting, provides information to demonstrate the presence and effectiveness of the security controls implemented by organizations. These practices also help identify, reduce the impact of, and remedy a variety of security weaknesses and breaches in the organization. The importance of logs for regulatory compliance will only grow as other standards (such as PCI DSS, ISO 2700x, ITIL, and COBIT) become the foundations of new regulations that are sure to emerge.

Focus on Logs

With regulatory lecturing out of the way, what are some examples of logfiles and audit trails? We can classify logfiles by the source that produced them, since it usually broadly determines the type of information they contain. For example, system logfiles produced by Unix, Linux, and Windows systems are different from network device logs produced by routers, switches, and other network gear from Cisco, Nortel, and Lucent. Similarly, security appliance logs produced by firewalls, intrusion detection or prevention systems, and messaging security appliances are very different from both system and network logs.

In fact, security systems display a wide diversity in what they log and the format in which they do it. Ranging in function from simply recording suspicious IP addresses all the way to capturing full network traffic, security logs store an amazing wealth of data, both relevant and totally irrelevant—or even deceitful!—to the situation at hand.

When Logs Are Invaluable

Logs turn up as an essential part of an investigation, and in fact are often the first data one needs to look at. Once recorded, logs are not altered through the course of normal system use, meaning they can serve as a permanent record (at least as long as the logs are retained). As such, they provide an accurate complement to other data on the system, which may be more susceptible to alteration or corruption. (This assumes that the administrator has followed recommended procedures for logging to a system that’s off the Internet and hard to corrupt.)

Since logs have timestamps on each record, they provide a chronological sequence of events, showing not only what happened but also when it happened and in what order.[100]
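
As a rough illustration of this (a sketch only, in Python, with hypothetical file names and assuming classic year-less syslog timestamps), records from several sources can be merged into a single timeline by parsing and sorting on those timestamps:

from datetime import datetime

# Hypothetical exported logfiles; a real environment would have many more sources.
SOURCES = ["firewall.log", "ftp-server.log", "ids.log"]

def parse_ts(line, year=2008):
    # Classic syslog records omit the year, e.g., "Oct  1 13:36:56 ..."
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

records = []
for name in SOURCES:
    with open(name) as f:
        for line in f:
            try:
                records.append((parse_ts(line), name, line.rstrip()))
            except ValueError:
                continue  # skip lines without a parsable leading timestamp

# Sorting by timestamp reconstructs what happened, when, and in what order.
for ts, source, text in sorted(records):
    print(ts.isoformat(), source, text)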

In addition, logs forwarded to a dedicated logging collector host provide a source of evidence that is separate from the originating source. If the accuracy of the information on the original source is called into question (such as the issue of an intruder who may have altered or deleted logs), the separate source of information may be considered more reliable. Logs from different sources, and even different sites, can corroborate other evidence and reinforce the accuracy of each data source.
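
For example, here is a minimal sketch, using Python’s standard syslog handler, of an application forwarding a copy of its events to such a collector; the host name “loghost” and the message format are placeholders, not a prescription:

import logging
import logging.handlers

# Send a copy of application events to a dedicated, separate collector host.
# "loghost" and port 514/UDP are placeholders; adjust for your environment.
handler = logging.handlers.SysLogHandler(address=("loghost", 514))
handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))

log = logging.getLogger("myapp")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("user alice logged in from 10.10.7.196")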

In addition, logs serve to reinforce other evidence that was collected during a forensic investigation. Often, the re-creation of an event is based not on just one piece or even one source of information, but on data from a variety of sources: files and timestamps on the system, user command history, network data, and logs. Occasionally logs may refute other evidence, which in itself may indicate that other sources have been corrupted (e.g., by an attacker). When a host is compromised, the logs recorded remotely on other systems may be the only source of reliable information.

As I’ll explain in the following section, the evidence in logs is at times indirect or incomplete. For example, a log entry might show a particular activity but not who performed it; process accounting logs, for instance, show what commands a user has run, but not the arguments to those commands. So logs can’t always be relied on as a sole source of information.

Challenges with Logs

Given the chaos of log formats, syntax, and meaning, logs present many unique challenges not only to analysis but even to simply collecting and retaining them for future use. We will review some of these challenges and then illustrate how they come to life in a representative story about the investigative use of log data:

Too much data

This is the first challenge that usually comes to mind with logging. Hundreds of firewalls (not uncommon for a large environment) and thousands of desktop applications have the potential to generate millions of records every day. And log volume keeps growing, driven by increasing bandwidth and connectivity, if nothing else.

The sheer volume of log messages can make analysis consume significant time and computing resources. Even simply using the Unix grep utility (which looks for strings in a file, line by line) on a multigigabyte file can take 10 minutes or more. Some types of analysis, such as data mining, can take hours or even days with this volume of data.
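
To make the scale concrete, consider the simplest possible streaming filter, sketched below in Python with a hypothetical file name and search string; it keeps memory use flat, but it still has to touch every line, so its runtime grows with the volume of data:

# Scan a large logfile line by line, grep-style; memory use stays flat,
# but the runtime still grows with the size of the file.
pattern = "Inbound TCP connection denied"      # hypothetical search string

matches = 0
with open("firewall-2008-10.log") as f:        # hypothetical multigigabyte file
    for line in f:
        if pattern in line:
            matches += 1
print(matches, "matching records")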

Not enough data

This is the opposite of the preceding problem. The processing of incident or event responses could be hindered because the application or security device could not record essential data, or because the administrator did not anticipate the need to collect it. This challenge is also often caused by a log retention policy that is too short.

Poor information delivery

This is similar to the previous challenge. Many logs just don’t have the right information—or the right information needs to be wrangled out of them with some pain. For example, some email systems will record sent and received emails in different log messages or even different files, thus making it harder to follow the sequence of messages and correlate an email message with its responses.

Some logs just miss key pieces of data in records of interest. One blatant example is a login failure message that does not indicate the user account on which somebody tried to log in.

False positives

These are common in network intrusion detection systems (NIDSs), wasting administrators’ time and occluding more important information that may indicate real problems. In addition to false positives (benign events that trigger alerts), systems overwhelm administrators with false alarms (events that may be malicious but have no potential to harm the target).

Hard-to-get data

For political or technical reasons, data is frequently unavailable to the person who can benefit from analyzing it, undercutting log management projects. A less common variant is data that is hard to get due to the use of legacy software or hardware. For example, getting mainframe audit records is often a challenge.

Redundant and inconsistent data

Redundant data comes from multiple devices recording the same event, and confusion can arise from the different ways they record it. This adds extra steps to log analysis because data “deduplication” needs to be performed to remove the records that “say the same thing.”
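
One common approach, sketched below in Python under the assumption that the records have already been parsed into comparable fields, is to collapse records that describe the same event within a short time window:

from datetime import datetime, timedelta

# Parsed records: (timestamp, reporting device, event name, source IP, destination IP).
# Here, the same FTP connection reported by a firewall, a NIDS, and the server itself.
records = [
    (datetime(2008, 10, 1, 13, 36, 59), "pix-fw",  "ftp_connection", "10.10.7.196", "10.10.15.16"),
    (datetime(2008, 10, 1, 13, 37, 0),  "nids-1",  "ftp_connection", "10.10.7.196", "10.10.15.16"),
    (datetime(2008, 10, 1, 13, 37, 2),  "ftp-srv", "ftp_connection", "10.10.7.196", "10.10.15.16"),
]

WINDOW = timedelta(seconds=5)
deduped, last_seen = [], {}            # last_seen: (event, src, dst) -> last kept timestamp

for ts, device, event, src, dst in sorted(records):
    key = (event, src, dst)
    if key in last_seen and ts - last_seen[key] <= WINDOW:
        continue                       # same event seen from another device; drop it
    last_seen[key] = ts
    deduped.append((ts, device, event, src, dst))

print("kept", len(deduped), "of", len(records), "records")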

There has never been a universal logging standard. Most applications log in whatever format was developed by their creators (who are sometimes creative to the point of bordering on insanity), thus leading to massive analysis challenges. Logs come in a dizzying variety of formats, are logged via different ports, and sometimes look different while meaning the same thing or look the same while meaning something different. For example, the following strings all indicate successful access to a system, albeit on different systems:

login
logon
log on
access granted
password accepted
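
One crude way to tame this particular problem is to map the vendor-specific phrasings to a single normalized event name; the sketch below, in Python, covers only the variants listed above and is by no means complete:

import re

# Vendor-specific phrasings that all mean "somebody successfully accessed the system".
SUCCESS_RE = re.compile(
    r"\blog(?:in|on)\b|\blog on\b|\baccess granted\b|\bpassword accepted\b",
    re.IGNORECASE,
)

def normalize(raw_message):
    # Real analysis must also check for failure indicators ("failed", "invalid"),
    # since many systems reuse the same keywords in failure messages.
    if SUCCESS_RE.search(raw_message):
        return "auth.success"
    return "unknown"

print(normalize("Oct  1 02:21:57 ftp ftpd[27703]: ANONYMOUS FTP LOGIN FROM 10.10.7.196"))
print(normalize("password accepted for user admin from 10.10.7.196"))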

Along with the diversity of messages, many are also rather obscure. Many systems do not provide a catalog of messages that they can produce or explanations of how to interpret them. In an extreme case, an individual programmer makes a decision about logging something as well as about the format, syntax, and content of a message, often making it completely inscrutable. Hence the maxim, “log analysis is an art.”

In addition, one has to deal with binary as well as text logs, some of the latter being freeform logs that are hard to parse or to convert from text to structured data.

Heterogeneous IT environments

How many folks use only one platform, one type of network gear and security device, and a single vendor? Not many. Most companies have multiple types of devices from multiple vendors.

Heterogeneous IT environments amplify some of the preceding problems and bring forth new ones. For example, more peculiar log formats need to be understood and processed to get to the big picture. Volume gets out of control, NIDSs get confused by what they’re monitoring, and custom application logs complicate this already complex problem dramatically.

We will illustrate these and other challenges in the following case study of the investigative use of logs.

Case Study: Behind a Trashed Server

The example in this section is loosely based on several real investigations led by the author, combined to provide an interesting illustration of several concepts in a small space.

Architecture and Context for the Incident

The company in question, a medium-sized online retailer, understands the value of network and host security because its business depends upon reliable and secure online transactions. Its internal network and DMZ setup was designed with security in mind and protected by the latest in security technology. The DMZ was a bastion network with one firewall separating the DMZ from the hostile Internet and another protecting internal networks from DMZ and Internet attacks (with all connections from the DMZ to the internal network blocked).

A network intrusion prevention system (IPS) was also deployed inside the firewall that separated the network from the outside. In the DMZ, the company gathered the standard set of network servers: web, email, and a legacy FTP server dedicated to supporting some long-running operations, a holdover from older times. A few of the network services, such as DNS, were outsourced to external providers.

The Observed Event

On Monday morning, the company support team was alerted by one of their field personnel who was trying to download a large ZIP file from the FTP server. He reported that his browser was “timing out” while trying to connect to the company’s FTP server. Upon failing to log into the FTP server remotely via secure shell from the internal network, the support team member walked to a server room, only to discover that the machine had crashed and was unable to boot. The reason was simple: it had lost its operating system.

At that point, the company’s incident response plan was triggered. Since the security team had long argued that the FTP server needed to be retired in favor of more secure file transfer methods, this situation was used to “drive the final nail in the FTP coffin” and stop using the server. However, the security team was told to complete an investigation to prevent other critical network services from being disrupted. Note that at this point we didn’t know whether the system crash was due to a malicious attack, or whether there were any other persistent effects.

The Investigation Starts

Thus, the primary purpose of the investigation was to learn what had happened and, in case it was of a malicious nature, to secure the other servers against its recurrence. The main piece of evidence for the investigation was the server’s disk drive. No live forensics were possible because the machine had crashed while running unattended, and memory contents and other live data were totally lost.

However, we did have a set of logfiles from the firewall and IPS as well as logs from other DMZ systems, collected by a log management system. We would have been delighted to find logs collected by the log management system from the FTP server; however, due to an omission, remote logging was not enabled on the FTP server. Thus, no firsthand attack information was available from the FTP server itself.

We started the investigation by reviewing the traffic log patterns.

First, by analyzing the log data from the network firewall, we found that somebody had probed the company’s externally visible IP addresses at least several hours prior to the incident. That person had also tried to connect to multiple servers in the DMZ. All such attempts were unsuccessful—and logged, of course.

Here are the firewall log records that provided us with the evidence of the rapid attempts to access all external-facing systems. (All IP addresses in all the log records in this section have been sanitized to be in the LAN 10.10.0.0/16 range.)

Oct 1 13:36:56: %PIX-2-106001: Inbound TCP connection denied from 
 10.10.7.196/41031 to 10.10.15.21/135 flags SYN  on interface outside
Oct 1 13:36:57: %PIX-2-106001: Inbound TCP connection denied from 
 10.10.7.196/41031 to 10.10.15.21/80 flags SYN  on interface outside
Oct 1 13:36:58: %PIX-2-106001: Inbound TCP connection denied from 
 10.10.7.196/41031 to 10.10.15.21/443 flags SYN  on interface outside
Oct 1 13:37:15: %PIX-2-106002: udp connection denied by outbound list 
 1 src 10.10.7.196 3156 dest 10.10.175.7 53
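
As a minimal sketch (assuming records like those above, one per line, exported to a hypothetical text file), pulling the probing source addresses out of these denial messages takes only a few lines of Python:

import re
from collections import Counter

# Matches the "Inbound TCP connection denied" (PIX 106001) records shown above.
DENIED_RE = re.compile(
    r"Inbound TCP connection denied from (?P<src>[\d.]+)/\d+ to (?P<dst>[\d.]+)/(?P<port>\d+)"
)

probes = Counter()
with open("pix-firewall.log") as f:          # hypothetical exported firewall log
    for line in f:
        m = DENIED_RE.search(line)
        if m:
            probes[m.group("src")] += 1

# A source with many denied connections in a short period looks like a scanner.
for src, count in probes.most_common(5):
    print(src, count)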

The attacker finally connected to the FTP server, as indicated by this log record:

Oct 1 13:36:59: %PIX-6-302001: Built inbound TCP connection 11258524 
 for faddr 10.10.7.196/3904 gaddr 10.10.15.16/21 laddr 10.10.16.120/21

Having gained access, the attacker finally uploaded a file to the FTP server, as shown by the following firewall log record:

Oct 1 14:03:30 2008 11:10:49: %PIX-6-303002:  10.10.7.196 Stored 
 10.10.15.66:rollup.tar.gz

We suspected that the file rollup.tar.gz contained a rootkit, which was later confirmed by a more complete investigation.

The last item shown was another unpleasant surprise. How was the attacker able to get a file onto the system in the first place? The company’s system administration team was questioned, and the unpleasant truth came out: the FTP server had a world-writable directory for customers to upload the logfiles used for troubleshooting. Unrestricted anonymous uploads were possible, as on many classic FTP servers, to a directory named incoming, and it was set up in the most insecure manner possible: anonymous users were able to read any of the files uploaded by other people. Among other things, this presents the risk of an FTP server being used by anonymous outside parties to store and exchange pirated software.

Bringing Data Back from the Dead

After network log analysis, it was time for some forensics on the hard drive. We decided to look for fragments of logfiles (originally in /var/log) to confirm the nature of the attack as well as to learn other details. The investigation brought up the following log fragments from the system messages log, the network access log, and the FTP transfer log (fortunately, the FTP server was verbosely logging all transfers):

Oct 1 00:08:25 ftp ftpd[27651]: ANONYMOUS FTP
 LOGIN FROM 10.10.7.196 [10.10.7.196], mozilla@
Oct  1 00:17:19 ftp ftpd[27649]: lost connection to 10.10.7.196 [10.10.7.196]
Oct  1 00:17:19 ftp ftpd[27649]: FTP session closed
Oct  1 02:21:57 ftp ftpd[27703]: ANONYMOUS FTP LOGIN FROM
 10.10.7.196 [10.10.7.196], mozilla@
Oct  1 02:29:45 ftp ftpd[27731]: ANONYMOUS FTP LOGIN FROM
 10.10.7.196 [192.168.2.3], x@
Oct  1 02:30:04 ftp ftpd[27731]: Can't connect to a mailserver.
Oct  1 02:30:07 ftp ftpd[27731]: FTP session closed

(At this point, an astute reader will notice that one of the challenges I have discussed manifested itself: the timestamps between the FTP server and firewall logs were not in sync.)

This sequence indicates that the attacker looked around first with a browser (which left the standard footprint mozilla@). Then, presumably, the exploit was run (password x@). The line showing an attempt to access the mail server looks ominous as well.

The hard disk also yielded the corresponding xinetd records for the FTP service:

Oct  1 00:08:25 ftp xinetd[921]: START: ftp pid=27692 from=10.10.7.196
Oct  1 00:17:19 ftp xinetd[921]: EXIT: ftp pid=27692 duration=255(sec)

All downloads initiated from the FTP server to the attacker’s machine had failed due to rules on the company’s external firewall. But by that time the attacker already possessed a root shell obtained through the exploit.
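
For the curious, recovering such fragments largely boils down to scanning the raw disk image, unallocated space included, for byte sequences that look like syslog records. The following Python sketch assumes a hypothetical dd image of the drive and uses a deliberately simplified pattern:

import mmap
import re

# Syslog-style records start with a month abbreviation, a day, and a time.
SYSLOG_LINE = re.compile(
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    rb"\s+\d{1,2} \d{2}:\d{2}:\d{2}[^\n]{1,300}"
)

with open("ftp-server-disk.img", "rb") as img:        # hypothetical dd image of the drive
    data = mmap.mmap(img.fileno(), 0, access=mmap.ACCESS_READ)
    for m in SYSLOG_LINE.finditer(data):
        # Print the byte offset within the image and the recovered text.
        print(m.start(), m.group().decode("ascii", "replace"))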

Summary

Two conclusions can be drawn from this incident. First, the server was indeed compromised from outside the perimeter using a machine at 10.10.7.196 (address sanitized). Second, the attacker managed to get some files onto the victim host.

Overall, this teaches us that despite the challenges they present, logs are of great use when investigating an incident; they can often be retrieved even after being erased.

Future Logging

How will the humble logfile evolve and continue to play critical roles in system administration and security?

A Proliferation of Sources

First we should consider the increase in the breadth of log sources. There used to be just firewall and IDS logs; then came server logs; and now the range is expanding to all sorts of sources: databases, web servers, applications, etc.

A few years ago, any firewall or network administrator worth her salt would at least look at a simple summary of connections logged by her baby PIX or Check Point firewall. Indeed, firewall log analysis represented a lot of early business for log management vendors. Many firewalls log their records in syslog format, which fortunately is easy to collect and review.

At the next historic stage, even though system administrators always knew to look at logs in case of problems, massive operating system log analysis on servers didn’t materialize until more recently. It is now de rigueur for both Windows and Unix/Linux. Collecting logs from all critical (and many noncritical) Windows servers, for example, was hindered for a long time by the lack of agentless log collection tools such as LASSO. On the other hand, Unix server log analysis was severely undercut by the total lack of a unified format for log content in syslog records.

Electronic mail tracking through email server logs languished in a somewhat similar manner. People turn to email logs only when something goes wrong (email failures) or even horribly wrong (an external party subpoenas your logs). Lack of native centralization and, to some extent, the use of complicated log formats slowed down initiatives in email log analysis.

Database logging probably wasn’t on the radar of most IT folks until last year. In fact, IT folks were perfectly happy never to turn on the extensive logging and data access auditing capabilities that DBMSs offered. That has certainly changed now! It will be all the rage in the very near future. Oracle, MS SQL, DB2, and MySQL all provide excellent logging, if you know how to enable it (and know what to do with the resulting onslaught of data).

What’s next? Web applications and large enterprise application frameworks used to live largely in worlds of their own, but people are finally starting to realize that log data from these sources provides unique insight into insider attacks, insider data theft, and other abuses of trusted access. It is expected that many more such logs will be flowing into log management solutions. Desktop log analysis should not be too far behind.

In a more remote future, various esoteric log sources will be added into the mix. Custom applications, physical sensors, and many other uncommon devices and software want to “be heard” as well!

So, we have observed people typically paying attention first to firewall logs, then to server logs, then to other email and web logs, then to databases (this is coming now), and ultimately to other applications and even non-IT log sources.

Log Analysis and Management Tools of the Future

To conclude this chapter, let’s imagine the ideal log management and analysis application of the future—one that will help solve the challenges we presented earlier and address the needs we brought up.

Such an ideal log management tool will have the following capabilities:

Logging configuration

The application will go out and find all possible log sources (systems, devices, applications, etc.) and then enable the right kind of logging on them, following a high-level policy that you give it. As of today, this requires the tools to have “God-like powers” that are far beyond current products.

Log collection

The application will collect all the logs it finds securely (and without using any risky super-user access) and with little to no impact on networks and systems. As of today, this also is impossible.

Log standards

The tool will be able to make use of logging standards that I hope to see adopted in the future. Today’s growing log standard efforts (such as MITRE’s Common Event Expression, or CEE) will lead first to the creation of log standards and ultimately to their adoption. It might take a few years, but at least partial order will be imposed on the chaotic world of logs.

Log storage

The application can securely store the logs in the original format for as long as needed and in a manner allowing quick access to them in both raw and summarized/enriched form. This is not impossible today, as long as one is willing to pay for a lot of storage hardware.

Log analysis

This ideal application will be able to look at all kinds of logs, even those previously unknown to it, from standard and custom log sources, and tell users what they need to know about their environment. What is broken? What is hacked? Where? What is in violation of regulations or policies? What will break soon? Who is doing this stuff? The analysis will drive automated actions, real-time notifications, long-term historical analysis, and compliance relevance analysis (discussed later). Future development of AI-like systems might bring this closer to reality.

Information presentation

The tool will distill the data, information, and conclusions generated by the analytic components and present them in a manner consistent with the user’s role, whether an operator, an analyst, an engineer, or an executive. Interactive visual as well as advanced text-based data presentation with drill-down capabilities will be available across all log sources. Future log visualization tools will not only present “pretty” pictures but will fit the tasks of their users, from operators to managers. Users will also be able to customize the data presentation based on their wishes, job needs, and information perception styles. This might not take more than a bunch of daring user interface designers who deeply understand logs.

Automation

The tool will be able to take limited automated actions to resolve discovered and confirmed issues as well as generate guidance to users so that they know what actions to take when fully automatic mode is not appropriate. The responses will range from fully automatic actions, to assisted actions (“click here to fix it”), to issuing detailed remediation guidance. The output will include a to-do list of discovered items complete with actions suggested, ordered by priority. This is also very far from today’s reality.

Compliance

This tool can also be used directly by auditors to validate or prove compliance with relevant regulations by using regulation-specific content and all the collected data. The tool will also point out gaps in data collection relevant to specific regulations with which the user is interested in complying. Again, this capability calls for “God-like” powers and might never be developed (but we sure can try!).

Conclusions

Logs are extremely useful for investigative and regulatory purposes, as I’ve demonstrated in this chapter. Despite the challenges I outline, log handling is indeed “beautiful.” At the very least, it serves the beautiful purpose of discovering the hidden truth in an incident.

Finally, let’s cast a look ahead to the potential world of logging in the future. First, we will see more logging: more volume, more log sources, and more diversity. Second, we will see more and more need for log information and more uses for such data. Third, we will hopefully see better tools to deal with logs. Fourth, the log standardization efforts of today should bear fruit and make the world of logs better. When all this will happen is anybody’s guess, but the author of this chapter is working hard to make it a reality.

But before the logging challenge “gets better,” it is likely to “get worse” in the coming years. It will be interesting to watch the race between slowly emerging log standardization efforts and the sharply rising tide of new log types (such as messages logged by various applications) as well as new uses for logs. Developing credible and workable log standards, such as CEE, will take years and the efforts of many people. Standard adoption will also be a challenge because there are so many legacy logging systems.

At the same time, compliance efforts as well as incident investigation requirements are driving increased logging across platforms and applications today. Developers are asked to “add logging” before the community has a chance to give them guidance on how to do it effectively and in a manner useful for security, compliance, and operations. I am confident that logging will be more uniform in the future, but I am not willing to say when this future will come.



[99] In the interest of accountability, I’ll note that this definition began the Wikipedia entry on “Accountability”, last accessed on January 10, 2009.

[100] Keep in mind the challenges to correct log timing mentioned earlier.
