CHAPTER  11
Analysis Methodology
Science literacy advocate Neil deGrasse Tyson once said, “Science is a way of equipping yourself with the tools to interpret what happens in front of you.” As a practitioner in the field of incident response, computer forensics, or computer security, it’s hard to deny the importance of science. Our fields exist because of science. So we must also understand, accept, and use science and its tools as we perform our duties.
When we consider performing analysis on new data, we follow a general process that is similar to the scientific method:
 1. Define and understand objectives.
 2. Obtain relevant data.
 3. Inspect the data content.
 4. Perform any necessary conversion or normalization.
 5. Select a method.
 6. Perform the analysis.
 7. Evaluate the results.
This process may repeat multiple times for the same set of data until we get good answers. In other words, the process is iterative. We encourage you to use this process whenever you receive new data for analysis—whether it is a small text file, terabytes of server logs, a potential malware file, a forensic image of a hard drive, or something completely unknown. Following the same process will help ensure you stay focused and produce consistent and accurate results. We hope that sharing our lessons learned in this chapter will help you to better “interpret what happens in front of you” throughout your investigation.
DEFINE OBJECTIVES
Most of us recognize that well-defined objectives usually result in a better outcome for a given project. Making objectives is not hard, but making good objectives can be a major challenge. You must have commanding knowledge of both the situation and the technology to be effective. What are you looking to determine? Is it possible to form a conclusion from the facts you have? How long will it take? What resources will you need? Who is interested in your results? What do they plan to do with them? You should have a clear understanding of these types of questions before you start performing analysis.
An important step in this process is to identify (or designate) who will define the objectives. Although this might seem unimportant, it’s often critical to the success of the investigation. In addition to identifying who defines the objectives, you must also ensure the entire investigative team is aware of who that person is. Failure to take this step often leads to miscommunication and loss of focus, thus severely impeding investigative progress.
In the Field
On the topic of objectives, be careful when someone asks you to “prove” a negative. Instead, focus on something positive and realistic. For example, you might be asked if you can “prove” that a system was not compromised. In nearly all cases, that will be difficult, if not impossible. The reason is, it’s very unlikely you would have access to all the information you would need. Most systems do not have full audit trails of every action taken, preserved and waiting for your review. Also, even when evidence is generated, it may not exist long enough—log files have limited space, deleted files get overwritten, and so on. Therefore, it’s likely that at least some evidence of past events on a system either never existed or was lost over time. Instead, you can look for a set of indicators of compromise and state whether you find any. Provided the indicators are reasonable, you can state an opinion that the system was likely not compromised—but you don’t know for sure. Your supporting factual evidence is that you performed an appropriate analysis and uncovered no evidence of compromise.
The investigative objectives are normally defined as a series of questions. We review each question and evaluate how realistic it is. Some questions are impossible to answer without some limiting scope. For example, the question “Is malware present on this computer?” may seem simple enough, but it is not easy to answer. You could expend a lot of effort looking for malware, and still easily miss something. So the analysis normally becomes a short list of tasks that will provide reasonable confidence in an answer to the question. The stakeholder must realize that there is no guarantee in that situation. Some questions can be answered more definitively. The question “Is there an active file with the MD5 hash d41d8cd98f00b204e9800998ecf8427e on the computer’s file system?” can be answered with a much higher level of confidence with a very simple analysis. In this case, you would compute an MD5 hash for every file on the system and see if there is a match. The important concept here is to go through each question with the stakeholder and outline the proposed approach. This will give them a very good understanding of what can be answered, how definitive the answers will be, and how much effort is involved.
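As a concrete illustration of the hash-matching task just described, here is a minimal Python sketch. It assumes the evidence file system is mounted read-only at a hypothetical path; in practice, a forensic tool would typically enumerate the files for you, and note that this covers only active files, not deleted content or slack space.

```python
import hashlib
import os

TARGET_MD5 = "d41d8cd98f00b204e9800998ecf8427e"  # hash from the question
MOUNT_POINT = "/mnt/evidence"                    # hypothetical read-only mount

def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large files do not exhaust memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

for root, _dirs, files in os.walk(MOUNT_POINT):
    for name in files:
        path = os.path.join(root, name)
        try:
            if md5_of_file(path) == TARGET_MD5:
                print("MATCH:", path)
        except OSError:
            # Unreadable files (permissions, locks) are noted, not fatal.
            print("SKIPPED:", path)
```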
As you may have guessed, an important consideration when defining objectives is scope. If you intend to perform an analysis to the best of your ability, you must clearly understand what the scope of the analysis is. If someone asks “Can you look at this hard drive for me?”, they probably don’t mean “Can you examine every bit on this hard drive and determine what it all means?” That would likely take more time and incur more cost than anyone can afford. It is your job to understand what is really important, and stay focused. Perhaps e-mail is an important aspect of the investigative objectives that helps to narrow the scope, but “look at all e-mail” is still a very broad and ambiguous statement. You should look at further defining the scope into something that is clearly actionable—perhaps something like “Review all active .pst files for any e-mail from Bob Smith received within the past month,” if that is appropriate. Like a curious child, you should always ask “Why?” If the answer doesn’t make sense, ask more questions. Keep asking questions until you and the stakeholders come to a consensus about the scope and purpose of the analysis.
As an analyst, you may need to define the objectives because the individuals within the organization may not always understand what analysis is possible or reasonable to perform. Based on the available information, you may need to clearly state what you think the objectives should be. You should allow the individual responsible for defining objectives to ask questions and come to an understanding of the situation. Other times, there may not be a specific individual, and you are essentially your own boss. In those cases, you must resist the temptation to run toward shiny objects, or you may never come back. Instead, think about what is most important with respect to the issue at hand.
KNOW YOUR DATA
Computing systems can store data in many formats and locations. Before you can perform analysis, or even select an analysis approach, you will need to explore possible data sources and understand how you can use them. This section will discuss high-level sources and formatting of data.
As you run an investigation, one of the tasks you will need to perform is data collection. You will use the data you collect to perform analysis and answer investigative questions. To increase your chances of being successful, you want to collect data that is useful to the investigation. Knowing what sources of data exist and what they can provide will help you to determine what data to collect. You should refer to the inventory of data sources you created as part of the pre-incident preparation recommendations from Chapter 3 of this book.
We often hear statements such as “Technology changes so fast, how can I keep up?” Although outwardly things can look quite different over time, the basic fundamentals tend to be fairly stable. We encourage you to explore and understand the fundamentals. They will make you an investigator who stands the test of time.
Where Is Data Stored?
Let’s start by taking a look at seven of the most common places data is stored. In this context, “data” is used in a very broad sense, meaning operating system files, applications, databases, and user data.
• Desktops and laptops  User desktops and laptops are physical computers that are used for day-to-day business. They are typically located at a user’s desk or work area. The system usually contains one or more hard drives that contain the operating system, applications, and associated data. Data may also be stored on an external storage solution or media that is physically connected or accessed through a computer network.
Today, desktops can also be virtualized. Virtual desktops are commonly accessed through a terminal that has no local data storage and only provides remote access to the virtualized system. The virtual desktop is commonly run on a centralized virtualization infrastructure. This shifts data storage from the traditional desktop to a central infrastructure.
• Servers  Server systems typically provide core business or infrastructure services. They are usually found in data centers, server rooms, or communication closets. Server systems may physically look like a user desktop or laptop, but are more commonly rack mount devices. Servers will normally have at least one hard drive for the operating system, but may or may not contain any additional drives for applications or data. In some cases, application and data are stored exclusively on external storage solutions. This is especially true in the case of virtual servers, which are typically centralized in a virtual server infrastructure.
• Mobile devices  Mobile devices are typically small, handheld, networked computers. They include cell phones, personal digital assistants (PDAs), tablets, and wearable computers. Nearly all mobile devices have a relatively small amount of built-in storage, typically some form of nonvolatile (flash) memory. Many mobile devices also have expansion slots for additional internal storage, or interfaces that can access external storage media.
• Storage solutions and media  USB flash drives, USB hard drives, CDs, and DVDs are common storage media in almost any environment. Small office as well as medium and large enterprise environments typically use some form of network-based shared storage solution, such as network attached storage (NAS) or storage area network (SAN). In environments that use NAS or SAN solutions, you will likely need to coordinate with the local staff because those solutions can be complex to deal with.
• Network devices  Most network environments today contain devices such as firewalls, switches, and routers. Although these devices do not typically store user data, they may contain configuration and logging data that can be critical in an investigation.
• Cloud services  In this context, a cloud service is an off-site third-party service that provides hosted applications or data storage for an organization. Common business services are hosted e-mail, timesheets, payroll, and human resources. But there are also many personal services, such as Dropbox or Google Drive.
• Backups  Backups are copies of important data, typically part of a disaster recovery plan. Backups can be stored on existing storage solutions and media, but any comprehensive disaster recovery plan will require off-site backups. The backups are usually rotated off-site on a regular schedule. The backups may be stored on common media, such as external USB drives or DVDs, but are more commonly saved to tape media. Some cloud-based backup solutions, commonly targeted to individuals, are also available, such as Carbonite or Mozy.
What’s Available?
From a general analysis standpoint, four high-level categories of evidence might exist in the locations we just outlined. Each of these categories is covered in more detail in the remaining chapters of this book:
• Operating system  This category includes file systems such as NTFS and HFS+, state information such as running processes and open network ports (memory), operating system logs, and any other operating system–specific data sources. OS-specific sources include the Windows registry, a Unix syslog, and Apple OS X property list (plist) files. Forensic tools can examine file systems and provide file listings that include file name, path, size, and timestamps.
File systems can be independent of operating systems, and each has its own unique characteristics. Keep in mind that many storage concepts apply to most file systems, such as allocation units, active files, deleted files, timestamps, unallocated (free) space, file slack, and partition tables. Each file system will also have unique characteristics, data, and artifacts (for example, NTFS file name timestamps, NTFS streams, UFS inodes, HFS resource forks, and the file allocation table for FAT12, 16, and 32 file systems). A fantastic resource for file system analysis is Brian Carrier’s book File System Forensic Analysis (Addison-Wesley Professional, March 2005).
• Application  The application category includes all artifacts that are specific to an application (for example, Internet browser cache, database files, web server logs, chat program user preferences and logs, e-mail client data files, and so on). Keep in mind that many of the artifacts for a given application are similar across different operating systems. Also, when applications are removed or uninstalled, some artifacts are frequently left behind. Resources for application artifacts are enormous, but are usually specialized. Books and other resources on applications tend to focus on a single category, or sometimes a specific product. We recommend that you review and experiment with applications that are common in your environment.
• User data  If the subject of your investigation involves a specific user, or group of users, it will be important to understand where user data is stored. Each user is likely to have user data on their day-to-day system; however, there may be user data on other systems throughout the environment. For example, e-mail, documents, spreadsheets, or source code may be stored in centralized locations for each user. These areas should be part of your data source inventory.
• Network services and instrumentation  Nearly every organization has internal network services or instrumentation. Sometimes they may be forgotten, but even common services such as DHCP, DNS, and proxy servers may be key sources of data for an investigation. Imagine trying to determine what computer was compromised if all you have is an IP address, and the environment is DHCP with a high lease turnover rate. In addition, common instrumentation such as network flow data, IDS/IPS systems, and firewalls are frequently important to an investigation. These sources should also be part of your data source inventory.
ACCESS YOUR DATA
Once you’ve obtained data to analyze, one of the first challenges you may encounter is figuring out how to access it. In this context, “access” means “in a state that you can perform analysis.” The data may be encrypted, compressed, encoded, in a custom format, provided on original hard drives, contained in hard drive images, in hard drive clones, or even just plain broken. Although the information we’re going to cover in this section is not directly related to analysis methods, you won’t be able to perform any analysis unless you understand what it is you are looking at. This section should give you ideas to accomplish that.
Remember to follow your organization’s policy on handling data you receive as evidence. Take the proper measures to document, secure, and prevent alteration of the data. Failure to do so may jeopardize the investigation, or worse.
The first thing you’ll need to do is determine what you actually have. This may sound simple, but it can be quite complicated. If you generate or collect the data yourself, it’s usually easier. If someone else provides you the data, you’ll need to do a good job of asking questions about what you are receiving. If you don’t, you may have a very difficult time figuring things out.
Disk Images
Let’s take a look at a scenario. Imagine that someone in your organization tells you they are going to provide you a copy of a hard drive from a system and you will need to analyze it for signs of a security compromise. The next day, you receive a hard drive via courier. You eagerly open the package, connect the drive via a write blocker, and begin analysis. You are stuck within moments because when you connect the drive to your analysis system, forensic tools do not recognize a file system. Even worse, the drive looks like it’s full of junk—seemingly random characters. Based on this, you guess that perhaps the drive is encrypted. However, there is no clear indication of what kind of encryption, and you don’t have the password or other required access components. So, you try to contact the person who gave you the drive to find out more. Unfortunately, the person went on vacation and co-workers are unsure where the drive came from. At this point, you are stuck and cannot perform any analysis.
The message here is to always ask questions when someone provides you data. You must get some basic answers to ensure you know how to handle what you receive. For the situation in the last paragraph, you should ask the person what they mean by “copy of a hard drive from a system.” The reason is, that statement is extremely ambiguous. You should pick apart that statement and ask questions that will help you understand what is meant by “copy,” “hard drive,” and “system.” What kind of copy? Perhaps it was a logical copy, but maybe it’s actually a forensic image, or even a clone. Depending on what the goal of the analysis is, the format might be unacceptable or have some downsides that need to be addressed. If the “copy” was actually an image, you should ask what the image format is. Also, whenever receiving a “copy of a hard drive,” you should ask if disk encryption was in use, and, if so, how to get the information needed to access the drive. Some forensic tools provide somewhat seamless access to disks encrypted with popular encryption packages. At times, the tools will require extra libraries or files from the system administrators. If the package is not supported, you may need to look into other solutions to deal with the encryption. Finally, you should ask questions about the “system.” Is it a desktop, laptop, server, or something else? What is the make and model of the system? What operating system, including version, does it run? If it is a server, does it have a RAID setup? If so, what is the make and model of the RAID card, and what are the RAID settings? What is its primary function? The questions can go on and will vary based on the situation and the responses to previous questions, but we think you probably get the idea. If you don’t ask these types of questions, you could find yourself wasting a lot of time.
We commonly encounter three disk image formats: Expert Witness/EnCase (E01), Raw (DD), and virtual machine disk files (VMDK, OVF). If you have a commercial forensic tool, such as EnCase from Guidance Software, support is normally built in for all these formats. That means you don’t have to perform a conversion to get access to the data. If you don’t have a commercial tool, you can either perform a conversion or use additional tools that interpret, or convert, on the fly. Let’s look a little deeper at dealing with disk images.
Most large organizations have licenses for at least one commercial forensic tool. Assuming that tool is EnCase, it’s very straightforward to open a disk image. Typically, for E01 and VMDK files, you can simply drag and drop them into EnCase v6 (after you start a new case). If the image is a DD, you have to configure the image through File | Add Raw Image. Another advantage to some of the commercial tools is their ability to deal with various full disk encryption (FDE) solutions. For example, you may receive an image of a hard drive that was encrypted with Credant full disk encryption. With a commercial product such as EnCase, you can directly work on the image, provided you have the appropriate passwords and/or key files. This saves you from having to figure out how to decrypt the image or otherwise determine how to get access to the data.
If you are working on a tight budget, or just like to minimize costs, there are a couple of free tools that can help you deal with disk images. AccessData’s FTK Imager can create, convert, and view disk images of many different types. The viewing capability is mostly for sanity checking or simple file export versus forensic analysis. But the conversion capability is nice—you can add a forensic image to FTK Imager, and then right-click on the evidence and select Export. Following the export wizard, you can select a different export format than that of the source. Using this method, you can convert E01 images to DD, DD to E01, or any of the other formats that FTK Imager supports. Under Linux, you can use Filesystem in Userspace (FUSE) to mount DD images, and libewf to mount E01 images.
What Does It Look Like?
The analysis approaches discussed in this chapter are rendered useless if you forget about one simple fact: there are limitless ways to represent a single piece of data. When you consider character encodings, compression, encryption, data encoding, time zones, languages, and other locale-related formatting, your job of finding evil and solving crime can become exponentially more difficult. For example, what do the following strings have in common?
[Figure: three seemingly unrelated strings: a base64 string, a uuencoded line, and a 32-character MD5 hash]
At first glance, it may not seem like they have anything in common. However, they are all derived from the string “the password is solvecrime.” The first is base64 encoding, the second is UU encoding, and the third is the MD5 hash. As an investigator, it is important to realize that all information consists of layers, such as these simple types of encoding. In order to effectively find what you are looking for, you must understand where layers can exist, what they look like, and how to deal with them.
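To make the layering concrete, the following Python sketch reproduces those three representations of the same string using only the standard library. Note that the MD5 value, unlike the two encodings, is a one-way hash and cannot be reversed back to the original string.

```python
import base64
import binascii
import hashlib

secret = b"the password is solvecrime"

print(base64.b64encode(secret).decode())          # base64 encoding
print(binascii.b2a_uu(secret).decode(), end="")   # one uuencoded line
print(hashlib.md5(secret).hexdigest())            # MD5 hash (one-way)
```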
The idea of information consisting of layers is probably not new to you. Most aspects of computing involve layers. What may be new to you is how those layers affect your analysis. Even something as simple as determining if a name or credit card number is present on a hard drive can become a complex task, depending on what layers are present. There are many different character sets, encodings, compression schemes, encryption methods, and custom data structures and formats. In one case, we examined a debug log for a point-of-sale terminal. A third party’s analysis concluded, in writing, that a particular credit card number was not present in the debug log. We examined the file and found the unencrypted credit card number. How could that be?
Taking a quick look at the file, we observed that it was not consistent with the formatting of a typical text document. Instead, the format was similar to a standard hex viewer application. Typically, hex viewers show a hex offset on the left, the binary data represented as hexadecimal values in the middle, and an ASCII conversion on the right.
In this case, the file contained text that was formatted as if it were a hex editor display, as shown next:
[Figure: a debug log excerpt formatted like a hex editor display: an offset column on the left, hexadecimal byte values in the middle, and an ASCII column on the right, with dashed separator rows between transactions. The card number 4444555566667777 is split across two lines of the ASCII column.]
If you opened the file in a text editor, you would see exactly what is shown here. A simple search against this data format will likely miss strings due to how the data is presented. In this illustration, the card number “4444555566667777” is present in the file. However, the credit card number is split into two parts and displayed on separate lines, with other text in between. Because a typical string search tool does not compensate for data presentation, it fails to find the card number. It takes a person to inspect the data, understand how the information is formatted, and create an appropriate search method. In this case, we created a script that read in the source data and changed the formatting. The script discarded the hex columns and reformatted the ASCII columns so characters between each transaction separator (the dashes) were merged into a single line.
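A minimal sketch of such a reformatting script follows. It assumes each log line carries an offset column, hex byte columns, and a trailing ASCII column, with rows of dashes separating transactions; the column layout of a real log would dictate the exact parsing.

```python
import re
import sys

HEX_DUMP_LINE = re.compile(
    r"^\s*[0-9A-Fa-f]+\s+"        # offset column
    r"(?:[0-9A-Fa-f]{2}\s+)+"     # hex byte columns
    r"(.*)$"                      # trailing ASCII column
)

buffer = []
for line in sys.stdin:
    stripped = line.strip()
    if stripped and set(stripped) == {"-"}:       # transaction separator row
        print("".join(buffer))                    # emit the merged record
        buffer = []
        continue
    match = HEX_DUMP_LINE.match(line)
    if match:
        buffer.append(match.group(1))
if buffer:
    print("".join(buffer))
```

Once the ASCII columns are merged into single records, the card number 4444555566667777 becomes contiguous and an ordinary string search will find it.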
When thinking about what your data might look like, don’t forget about localization. Different areas of the world use different conventions for representing dates, times, numbers, characters, and other information. Even within the same locale there may be variances. For example, if you looked at log files on your own computer, you can probably find dates that are in at least six different formats.
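One practical way to cope with mixed formats is to normalize timestamps into a single canonical form before analysis. The following sketch tries a list of formats observed in the data; the format list here is purely illustrative, and you would build yours from the actual logs.

```python
from datetime import datetime

KNOWN_FORMATS = [
    "%Y-%m-%d %H:%M:%S",     # 2014-03-01 13:05:00
    "%d/%b/%Y:%H:%M:%S",     # 01/Mar/2014:13:05:00 (Apache style)
    "%m/%d/%Y %I:%M:%S %p",  # 03/01/2014 01:05:00 PM
    "%b %d %H:%M:%S",        # Mar  1 13:05:00 (syslog style, no year)
]

def normalize(raw):
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).isoformat()
        except ValueError:
            continue
    return None  # unknown format: set aside for manual inspection

print(normalize("01/Mar/2014:13:05:00"))  # 2014-03-01T13:05:00
```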
ANALYZE YOUR DATA
Now that you understand your objectives and data sources, it’s time to determine an approach. Review your investigative questions and compile a preliminary list of the data sources you will need to answer the questions. Sometimes data from an unexpected source can provide indirect evidence, so be sure to always consider each of the four “What’s Available?” categories listed earlier in this chapter. Let’s take a look at a sample scenario.
All of this work—listing objectives, creating lists of data sources, and documenting the approach you intend to take—seems like a lot of overhead. When working by yourself on a handful of systems, it may be. We’ve found it to be quite useful and at times necessary when working on large projects with multiple teams. Documenting and making lists helps you keep the investigation organized, allows you to manage multiple lines of inquiry concurrently, and gives you useful metrics for periodic meetings with the stakeholders.
Outline an Approach
Perhaps your investigative question is to determine if data theft occurred. That alone is quite broad. Sometimes attackers will make claims about what they did, either publicly or privately. That information may provide good leads, but keep in mind that their claims could be partially or completely false. Without more specific information, you could begin your investigation by looking for the following two types of evidence:
• Network anomalies
• Common host-based artifacts of data theft
The next step is to consider what data sources may contain this type of evidence.
To search for network anomalies, you could start by looking at network flow data for egress points to see if there are any anomalies. Perhaps there was a day within the last month where the volume of data transferred (outbound) to the Internet was abnormally high. Or perhaps there was an unusual level of traffic over certain protocols or ports. You could investigate why, and see where that leads you. You could also examine proxy logs, DNS logs, firewall logs, or other network instrumentation for anomalies and investigate anything that seems suspicious. For example, you might observe a large number of failed login attempts.
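The following is a hedged sketch of that kind of volume review, assuming the flow data has been exported to a hypothetical CSV file with date, source, destination, and byte-count columns. It totals outbound bytes per day and flags days well above the average.

```python
import csv
from collections import defaultdict
from statistics import mean, stdev

bytes_per_day = defaultdict(int)
with open("egress_flows.csv", newline="") as f:   # hypothetical export
    for row in csv.DictReader(f):                 # columns: date,src,dst,bytes
        bytes_per_day[row["date"]] += int(row["bytes"])

volumes = list(bytes_per_day.values())
spread = stdev(volumes) if len(volumes) > 1 else 0
threshold = mean(volumes) + 2 * spread            # simple two-sigma cutoff

for day, total in sorted(bytes_per_day.items()):
    marker = "  <-- investigate" if total > threshold else ""
    print(f"{day}: {total:,} bytes{marker}")
```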
Concurrently, you can search systems in your environment for artifacts of data theft. Knowledge of the attacker’s methods will help greatly, but there are also some generic artifacts to look for. Here are some examples:
• Abnormal user activity
• Login activity outside of expected hours
• Odd connection durations
• Unexpected connection sources (remote session from a workstation to a server, for example)
• Periods of abnormally high CPU or disk utilization (common when compressing data)
• File artifacts associated with the use of common compression tools
• Recently installed or modified services, or the presence of other persistence mechanisms
If you have information about how an attacker is operating in your environment, be sure to include those items as well.
Another common investigative question is whether malware is present on the system. It’s unlikely that you would review every combination of bytes or possible location to store malware on a hard drive. You cannot prove there is no malware on a system; therefore, you will have to create a list of steps to take that provide a reasonable level of confidence that the system does not contain malware. A sample list of steps for this situation might be:
• Follow the initial leads. For example, if a date is relevant, review system activity for that day. If you know that specific file names are involved, search for them.
• Review programs that automatically start.
• Verify the integrity of system binaries (a sketch of this check appears below).
• Make a list of other well-known artifacts of infection and look for them.
• Perform a virus scan of the system.
You would perform each of these steps, document the factual results, and use that as supporting evidence for an opinion regarding the presence of malware. Remember that a file doesn’t have to be malicious to be used with ill intent. Keep in mind that legitimate tools and system programs may be cause for suspicion in some situations. For example, finding a copy of cmd.exe in a directory other than Windows/System32 should get your attention.
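To show what one of these steps might look like in practice, here is a minimal sketch of the binary-integrity check from the list above. The known-good hash list and the mount path are hypothetical; in practice, the reference hashes might come from the NSRL or from a trusted installation of the same operating system version.

```python
import hashlib
import os

KNOWN_GOOD = {}                                   # file name -> expected MD5
with open("known_good_hashes.txt") as f:          # hypothetical reference list
    for line in f:
        md5, name = line.split()
        KNOWN_GOOD[name.lower()] = md5

SYSTEM_DIR = "/mnt/evidence/Windows/System32"     # hypothetical mounted image

for name in os.listdir(SYSTEM_DIR):
    expected = KNOWN_GOOD.get(name.lower())
    path = os.path.join(SYSTEM_DIR, name)
    if expected is None or not os.path.isfile(path):
        continue
    with open(path, "rb") as binary:
        digest = hashlib.md5(binary.read()).hexdigest()
    if digest != expected:
        print("HASH MISMATCH, REVIEW:", name)
```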
Next, you will have to describe a number of specific tasks and decide how to proceed. Before you can, you may need to further define each task. For example, if the task is to search for abnormal user login times, how will that be accomplished? You may already have the ability to automate the process, or you may need to develop a technique or perform steps manually. Also, for each task, there are considerations such as the volume of data, the time it will take to process, who is available to work on it, and how likely the data source is to help answer your question. Let’s take a more detailed look at what categories of steps you might take to perform analysis.
Select Methods
Several analysis methods are commonly used across many different types of operating systems, disk images, log files, and other data. Some are useful when you know what you are looking for, and others are useful when you don’t. Although the implementation details may change, these methods are not directly tied to any particular technology:
• Use of external resources
• Manual inspection
• Use of specialized tools
• Data minimization through sorting and filtering
• Statistical analysis
• Keyword searching
• File and record carving
For example, although NTFS artifacts are (mostly) specific to that file system, the general concept of identifying and using file system artifacts to further an investigation is not. The chapters in this book were structured around this concept. This is an important paradigm to think about. If you plan to examine the artifacts of a popular web browser, chances are that the underlying browser technology will result in at least some artifacts that are independent of the operating system. The artifacts may exist in different locations, or have different names, but often the same information is present. Good forensic tools, techniques, and documentation will take this into account, and allow you to easily apply the same process in multiple environments. Let’s talk about these analysis methods in a little more detail.
Using external resources, or using other people’s work, may sound a little like we’re telling you to cheat. However, unless you have time to treat a situation as a learning experience, we think it is perfectly acceptable to use others’ work. If mounds of data are preventing you from quickly solving crime, and time is of the essence, you should use any reasonable and accepted method to figure things out. In this case we’re referring to resources such as known file databases, search engines, knowledge repositories, online forums, automated tools, co-workers, and colleagues. If you are looking at a file and you are unsure what it is, compute its MD5 hash and perform a lookup in a known file database such as the National Software Reference Library (NSRL), Bit9, or a popular search engine. If you run across a domain name and are wondering if it’s associated with malware, try searching the websites of security vendors. Finally, don’t be afraid to ask for help. Making assumptions or impeding investigative progress puts your organization at risk.
Another approach you may consider is manual inspection of data. Sometimes the size of the data you need to review is small. In those cases, it may make sense to manually review the entire contents of the data. Even as recently as 2012, we have been involved in investigations that required analysis of a few floppy disks. With a data set of that size, it’s unacceptable to do anything but a full review of all of the data. This does not happen very often, but you should keep it in mind for at least two reasons. First, you may waste more time trying to “figure out a better way.” Second, it may help provide increased confidence in your results. When a situation exists that lends itself to automation, we consistently use manual inspection of the data to validate other methods. Take, for example, a process designed to perform comparisons of data for a copyright infringement case. In many situations, categorizing (or “binning”) information in the data sets is perfectly suited for automation. In parallel to the development of a process, we manually review a subset of the data for validation. As the larger data set is processed, we take samples and repeat the manual review. You should carefully consider the situation before using a manual inspection approach, because it can easily get out of hand and become very time consuming. In most cases, we use a combination of one or more other methods, such as using a specialized tool, sorting and filtering, or keyword searching. We’ll describe several of these methods in the following paragraphs.
Commercial entities, practitioners, and researchers in the fields of incident response and computer forensics have created a mountain of specialized tools to help us get our job done. They include tools that perform tasks such as data visualization, browser artifact analysis, malware identification, and file system metadata reporting. It’s important to have a comprehensive collection of tools on your tool belt so you can effectively handle different situations. Sometimes general tools, such as a tool that identifies malware, can help you get started in cases where there are no other leads. In the following chapters we cover a number of tools and techniques that we think you should consider. When used effectively, tools can save you a lot of time. However, remember that an important aspect of using tools is to perform some validation, or testing, to help ensure they actually do what they say they do. Using tools that are untested or uncommon in our field could land you in hot water.
The next category we cover is data minimization through sorting and filtering. When reviewing metadata, such as full file listings, we find that most of the data is not useful. A system may have hundreds of thousands of files, and it’s very unlikely that you would manually review each entry. In most incidents, only a small subset of the metadata is actually relevant to the investigation, and if the metadata is voluminous, sifting through it to find what’s important can be difficult. In cases like this, it’s useful to be able to sort and filter so you can focus on specific dates, directories, file names, or other attributes. Of course, sorting and filtering applies to much more than just reviewing metadata. Most structured data, meaning data that has some parsable record format, is well suited for sorting and filtering. However, depending on the volume of data you are looking at, sorting and filtering can be time consuming and clunky. It’s best to sort and filter when you have general leads, or know certain properties of the data that are effective in helping you find what you are after. If sorting and filtering doesn’t help much in a specific case, you may need to explore the next category, statistical analysis.
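As a brief illustration, the following sketch filters a hypothetical CSV file listing (path, size, and modified timestamp, as exported from a forensic tool) down to files modified during a window of interest under a directory of interest, sorted by timestamp.

```python
import csv
from datetime import datetime

START = datetime(2014, 2, 1)                      # window of interest
END = datetime(2014, 3, 1)

hits = []
with open("file_listing.csv", newline="") as f:   # columns: path,size,modified
    for row in csv.DictReader(f):
        modified = datetime.strptime(row["modified"], "%Y-%m-%d %H:%M:%S")
        if START <= modified <= END and row["path"].startswith("C:\\Users\\"):
            hits.append((modified, row["path"], row["size"]))

for modified, path, size in sorted(hits):         # oldest first
    print(modified.isoformat(), size.rjust(12), path)
```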
Statistical analysis is typically used in cases where you don’t know exactly what you are looking for or how to look for it. A statistical analysis will normally help to uncover patterns or anomalies that would be difficult or impossible for a human to discover. For example, if you have a large volume of web server logs to review for “malicious activity,” it may be useful to use a log analysis tool. Common features of these tools include automated parsing of fields, geolocation lookups, indexing, and generation of statistics and summaries based on different ways you can slice the data. Once the processing is complete, you may notice patterns based on statistics for requests made to the server. For example, a dashboard display of Apache logs processed by the log analysis tool Sawmill is shown here:
[Figure: Sawmill dashboard summarizing the processed Apache logs, including request counts, traffic volume, and date-based statistics]
As you drill down into the statistics, perhaps you find there were numerous “POST” requests on a single day in a month. You could investigate those specific requests in more detail to determine if they were malicious. Perhaps on another day, an above average amount of total data was transferred. If you look into what was transferred, you may discover something relevant to the investigation. When reviewing the results of an analysis, be careful not to let it consume you. Many “anomalies” are either false positives, infrequent but legitimate events, or perhaps are really malicious but are not the bad guys you are after.
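A log analysis tool does this work for you, but the underlying idea is simple. The sketch below counts POST requests per day from an Apache-style access log and prints the busiest days; the log file name is a placeholder.

```python
import re
from collections import Counter

# e.g. 10.0.0.5 - - [01/Mar/2014:13:05:00 -0500] "POST /login HTTP/1.1" 200 ...
PATTERN = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]+\]\s+"(\w+)')

posts_per_day = Counter()
with open("access.log") as f:                     # hypothetical log file
    for line in f:
        match = PATTERN.search(line)
        if match and match.group(2) == "POST":
            posts_per_day[match.group(1)] += 1

for day, count in posts_per_day.most_common(5):   # top POST days first
    print(f"{day}: {count} POST requests")
```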
The “string” or “keyword” search is a basic analysis method that forensic examiners have used since the beginning of computer forensics. The idea is that you create a list of keywords (strings) that are used to search evidence (files) with the goal of uncovering data that will help you answer investigative questions. This method is one of the most obvious to use whenever you are interested in finding keywords relevant to an investigation. However, there are many subtleties and nuances. As discussed earlier in this chapter, certain conditions, such as encoding or formatting, can render a string search useless. It is imperative that the analyst understand how the string they are searching for is represented in the data they are searching.
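One way to account for some of those representations is to search for several byte-level variants of the same keyword. The sketch below generates ASCII, UTF-16LE (common in Windows artifacts), and base64 forms; note that a base64 variant only matches when the keyword begins on a three-byte boundary of the encoded stream, so a thorough search would generate all three alignments.

```python
import base64

keyword = "solvecrime"

variants = {
    "ascii": keyword.encode("ascii"),
    "utf-16le": keyword.encode("utf-16-le"),
    "base64": base64.b64encode(keyword.encode("ascii")),
}

with open("evidence.bin", "rb") as f:             # hypothetical evidence file
    data = f.read()

for name, pattern in variants.items():
    offset = data.find(pattern)
    if offset != -1:
        print(f"{name} hit at offset {offset:#x}")
```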
Building on the keyword search concept, one of the early evolutionary steps in disk forensics was the ability to search unallocated and slack space. Being able to keyword search unallocated and slack space opened up new sources of evidence. Unallocated space includes deleted files, which are frequently a very important source of evidence. File slack is the data present between the logical end of a file and the end of the allocation unit. File slack is technically allocated space (another file cannot use that area); however, slack space typically contains data that was part of a previous file, random contents of memory, or both. In most cases, a user cannot control what data is placed in slack space. Evidence of user activities can therefore be spread throughout the drive and cover long periods of time. The tools and procedures you use to find evidence should include unallocated and slack space.
The final category we cover is known as “file carving.” This technique combines aspects of several other methods. The idea is to search for a unique sequence of bytes that corresponds with the header, or the first few bytes, of a file. Most common file formats have a standardized header that marks the beginning of a file, and sometimes a footer that marks the end of the file. A sample header for a JPEG graphics file is shown next:
[Figure: hex dump of a sample JPEG file header, beginning with the byte sequence FF D8 FF E0 followed by the JFIF identifier]
In this case, the most common portion of the header is the first 10 bytes. The goal of the process is to identify all instances of a specific file type that exist in the source evidence and extract them for analysis. Most commercial forensic tools can perform this type of analysis. There are also open source tools, such as Foremost.
GO GET IT ON THE WEB
Foremost  foremost.sourceforge.net
This method is not affected by file extensions, file names, or whether the file is active or deleted; the data does not even need to be part of a file at all. This technique is a very powerful method to identify and recover specific file types. However, sometimes you may not be interested in locating entire files. If a file format consists of uniquely identifiable discrete records, you can attempt to locate and extract the individual records. This technique attempts to extract records based on a record header and some knowledge of the record format. In cases where a file is partially overwritten or fragmented, or when you are analyzing a memory image or swap space, searching for records instead of entire files is more likely to return useful information.
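To tie the idea together, here is a simplified carving sketch for JPEG files: it scans raw evidence for the start-of-image marker and carves through the end-of-image marker. A real carver such as Foremost also validates internal structure, caps carve sizes, and streams rather than reading the whole image into memory; this only illustrates the core mechanism, and the evidence file name is a placeholder.

```python
SOI = b"\xff\xd8\xff"   # JPEG start-of-image marker prefix
EOI = b"\xff\xd9"       # JPEG end-of-image marker

with open("evidence.dd", "rb") as f:              # hypothetical raw image
    data = f.read()

count = 0
position = data.find(SOI)
while position != -1:
    end = data.find(EOI, position)
    if end == -1:
        break
    with open(f"carved_{count:04d}.jpg", "wb") as out:
        out.write(data[position:end + len(EOI)])
    count += 1
    position = data.find(SOI, end + len(EOI))

print(f"carved {count} candidate JPEGs")
```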
EVALUATE RESULTS
An important part of the analysis process is to evaluate your results and adjust your methods as needed. There are two parts to this:
• You should evaluate results periodically throughout the analysis process.
• Once the process is complete, you should evaluate how well the result answers the investigative questions.
Don’t wait until the end of a long analysis process before checking results. The reason is, many things can go wrong. For example, a keyword that sounds unique might occur very frequently, causing so many false positives that the output is useless. Or perhaps you attempt to carve for ZIP files and the tool is detecting millions of ZIPs. Or possibly the opposite—you might look for a keyword you expect to see a moderate number of hits on, but after six hours of running there are still no hits. Watching for conditions like this, sometimes called “sanity checking,” is a key part of your job as an analyst. The root cause might be a simple mistake in setting up the parameters of the process—a typo, perhaps. Sometimes, the problem is with the approach. It’s best to find out about either of these as soon as possible in the process, so you can fix the issue. We normally spot-check some of the initial results to see if we’re getting what we expect.
If things go well throughout the analysis process, you’ll be left with some results to look at. Once you begin your review, be sure to have the relevant investigative questions fresh in your mind. Examine the results in that context and build evidence to support a position. Sometimes you may get inconclusive results or even no results (for example, a keyword search that returns no hits). Perhaps you can improve the keyword, but that is not always possible. Also consider that the absence of a hit is not proof that the keyword never existed—it just doesn’t exist now. If the results you are looking at don’t help, you may need to consider a different approach or different sources of evidence.
SO WHAT?
Taking the time to familiarize yourself with data, tools, and methods is a critical part of being a good investigator. All too often, we see analysts who don’t understand the data they are looking at, don’t understand (or test) the tools they are using, or don’t know if the method will even provide them with valid results. We recommend that you keep a healthy skepticism about the results of any tools or processes you use. Always ask yourself if you are taking the right approach—because if you don’t, your opposition most certainly will.
QUESTIONS
 1. Based on the information presented in this chapter, what do you think is the most challenging part of the analysis process? Explain why.
 2. Given the following scenario, explain how you would proceed. An investigation stakeholder tells you that one of the most critical objectives is to prove that a file with a specific MD5 hash was not present on a system at the time of analysis. You have a recent forensic disk image for the system.
 3. List four common types of text encoding. Explain how you would perform an effective keyword search if your source data were encoded with them.
 4. A manager at another office lets you know to expect a disk image via courier within the next day. You are tasked with recovering deleted files. What questions would you ask before the image arrives? Why?