CHAPTER FOUR

Files

AS WE MOVE FURTHER ALONG, diving deeper into the science behind cyber forensics, our goal is to provide usable materials to the reader: materials that will withstand the evolutionary creep of technology and will not become dated or obsolete before they are published.

In order to accomplish our goal, it is necessary to explore general or broad concepts (although perhaps complex) while refraining from addressing specific software, programs, or even generalized forensic tools, which can quickly become dated and obsolete over time.

As we examine further the building blocks of cyber forensics, special attention has been paid to tools that will not quickly become dated, expire, or lose vendor support. We spend more time, for example, discussing a tool such as a HEX editor than discussing the Windows NT operating system. HEX editors have been around nearly as long as HEX itself, whereas Windows NT is quickly becoming less and less relevant.

Read on as we continue with our exploration of the science behind cyber forensics, focusing here on files, file signatures, and their role and relevancy in cyber forensic investigations.

OPENING

In Chapter 3 the following topics were addressed:

1. Discussed HEX and the steps involved with converting this binary representation to ASCII.

2. Covered the actual conversion process in an effort to better understand the HEX character representation.

3. Went into some detail discussing the nuts and bolts of HEX.

4. Referred to HEX editors and their function, but stopped short of probing deeply into the functionality and usefulness of a HEX editor when viewing files (or pieces of files). This discussion is saved for a later chapter.

HEX, as was discussed, is useful when attempting to view a file that is partially deleted. This raises two questions:

1. Why would a partially deleted file have difficulties being opened or viewed normally?

2. What parts of a file does a HEX editor allow us to see, which otherwise would not be visible?

FILES, FILE STRUCTURES, AND FILE FORMATS

To answer the questions posed above, we need to further investigate the basics of a file, file structures, and file formats. A partially deleted file in many cases may be missing part of its formatting data, the data that identifies the file.

It is this formatting information that uniquely identifies the file to its parent or native software. If a file does not contain this formatting information, the software or operating system (OS) will most likely not be able to access or execute the file.

There are hundreds of different formats for data (databases, word processing, spreadsheets, images, video, etc.). There are also formats for executable programs on different platforms (Windows, Mac, Linux, Unix, etc.). Each format defines how the sequence of bits and bytes is laid out, with the ASCII-based text file being one of the simplest formats for humans to decipher.

Some file formats are designed to store very particular sorts of data: the JPEG format, for example, is designed only to store static photographic images. Other file formats, however, are designed for storage of several different types of data: the GIF format supports storage of both still images and simple animations, and the QuickTime format can act as a container for many different types of multimedia.

A text file is simply one that stores any text, in a format such as ASCII or UTF-8, with few if any control characters. Some file formats, such as HTML, or the source code of some particular programming language, are in fact also text files, but adhere to more specific rules which allow them to be used for specific purposes.1

There is a wide variety of digital file types in our ever-expanding electronic universe. These various file types contain specific formatting information which allows for file access, storage, or “manipulation.” This “manipulation” may occur via the operating system itself, or it may occur via a “parent” program installed on the operating system.

A parent program is the program (possibly proprietary software) used to create, execute, or otherwise access the file. In most cases a file will contain data, its file signature, from which its parent software (or the operating system) can identify the file and handle its operation.

This file signature information is contained in what is sometimes referred to as a file header. The data contained within a file header is not seen by the casual user, yet is very important for the file to function as designed. It is this data contained within the file header that is used to identify the format of the file.

File headers may also contain data regarding the integrity of the file as well as information about itself and its contents. This data is often referred to as metadata.

There is no one specific file format structure that fits all file types. File formats will vary as does file content. The contents of an image, as well as its format, for example, will be different from the contents and format of a word processing document.

A summary of some more common file formats along with their Windows file extensions can be found in Appendix 4A.

FILE EXTENSIONS

Within the Windows Operating System environment, file formats are easily identified by file extensions.

The Windows Operating System uses file extensions to “bind” an application to a specific file type. For example, Windows will bind Adobe Reader software to the .PDF file extension, or MS Word to the .DOC (or .DOCX) file extension.

Windows relies on file extensions: without an extension, the Windows Operating System would not know how to open, process, or handle a file. The Windows Operating System looks at the extension when binding a file to an application.

Question: What would occur if the file extension of an executable (.EXE) file was changed to that of an Adobe file extension (.PDF)?

Answer: Windows would look at the file extension and see that it’s a .PDF; it would therefore hand that file over to Adobe to open. Adobe would attempt to launch or open the file and report an error since the file, regardless of its name, is not actually an Adobe file.

Windows stores this application binding information in a section of the Operating System (OS) called the registry.

Each file type has a corresponding file extension; this correlation, stored within the registry, tells the OS what type of program is needed to access a certain file type. This is Windows’ way of matching the many different types of files to their corresponding software.

When the OS identifies an extension, say .CSV (Comma Separated Values), the OS looks to the registry and finds which application is bound to this extension. In most cases, MS Excel is bound to CSVs, so Windows will hand that file over to Excel to open and process. A file extension and/or its corresponding registry information can be manipulated by a savvy user.

For example, suppose a change was made to the registry so that the .CSV file extension was associated with, and therefore opened by, an image viewer such as Windows Picture Viewer. If a user were to click on an actual CSV file, Windows would hand that file over to Windows Picture Viewer instead of the logical application (e.g., Excel), and the image viewer would attempt to open the file. Because the file is an actual CSV file, a “no preview available” message or an error would be displayed, as Windows Picture Viewer cannot process comma-separated text.

Say the file was an image, which had been renamed with a .CSV extension. Windows would hand that CSV file over to the image viewing software and the image would be displayed. A file with an incorrect file extension would open as long as the Windows Registry had that “incorrect” file extension associated with the correct software. Remember, changing or renaming a file’s extension does not change the content of the file; it only changes the way in which Windows OS handles the file (i.e., which application the file is sent to).
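The binding behavior described above can be modeled in a few lines. The sketch below is a hypothetical, greatly simplified stand-in for the registry: `ASSOCIATIONS` and `application_for` are illustrative names, not Windows APIs, and the real registry stores file associations quite differently. What it shows is that the application is chosen from the name alone; the file’s bytes are never consulted.

```python
import os

# Hypothetical extension-to-application table standing in for the
# registry's file associations (not the real registry structure).
ASSOCIATIONS = {
    ".pdf": "Adobe Reader",
    ".doc": "MS Word",
    ".csv": "MS Excel",
    ".jpg": "Windows Picture Viewer",
}

def application_for(filename: str) -> str:
    """Choose an application by extension only, never by file content."""
    ext = os.path.splitext(filename)[1].lower()
    return ASSOCIATIONS.get(ext, "no associated application")

print(application_for("racy pic.jpg"))   # Windows Picture Viewer
print(application_for("racy pic.csv"))   # MS Excel (same bytes, new name)

# The registry change from the example: bind .csv to the image viewer.
ASSOCIATIONS[".csv"] = "Windows Picture Viewer"
print(application_for("racy pic.csv"))   # Windows Picture Viewer
```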

So why is the way the OS handles the interpretation of a file’s extension important to the cyber forensic investigator?

What if a cyber forensic investigator receives a forensic image of a suspected child molester’s hard drive and searches the drive’s contents for image files (e.g., JPGs)? Let’s say that the investigator is unable to find any files with image extensions such as .JPG. Is this case closed? Is the suspected child molester innocent and free to go?

Hardly; there could be plenty of images on this hard drive that have just been renamed. The fact that Windows uses file extensions creates a means by which a user can hide information by renaming file extensions.

CHANGING A FILE’S EXTENSION TO EVADE DETECTION

The process to change a file’s extension to evade detection is quite simple, as shown in the following steps.

Step 1

Create a legitimate looking folder into which you wish to place your files (see Figure 4.1).

FIGURE 4.1 Creating a Misnamed but Legitimate Looking Folder

Step 2

Open the misnamed but legitimate looking folder called My Doxuments (see Figure 4.2).

FIGURE 4.2 Contents of My Doxuments Folder

Step 3

Open the Tools menu and select Folder Options (see Figure 4.3).

FIGURE 4.3 Preparing to Change a File’s Extension

Step 4

Open the View tab (see Figure 4.4).

FIGURE 4.4 File Management Option to Hide File Extensions

Step 5

Uncheck “Hide extensions for known file types” (see Figure 4.5).

FIGURE 4.5 Option to Hide File Extensions Deselected

Step 6

File extension type is revealed (see Figure 4.6).

FIGURE 4.6 Individual File Extension Types Revealed

Step 7

Right-click the file name and select Rename, then type a new name that includes any valid file extension type.

Windows now treats the file as whatever type the new extension indicates (see Figure 4.7).

FIGURE 4.7 Renaming File and File’s Extension

Step 8

Check “Hide extensions for known file types” again to hide the new file extensions (see Figure 4.8).

FIGURE 4.8 Original Files and Extensions and the Files after Changing Their Extensions

Where there were once 10 JPEG image files there are now only six (see Figure 4.9). A search for image files by extension alone will miss the four files with modified extensions!

FIGURE 4.9 Results of Changes to a File’s Extension

Advice to the potential criminal: It may be wise to rename the file names from “racy pic” to something more inconspicuous! Also, using or renaming a less well-known folder buried further down in the directory tree may be advantageous.

Remember, Windows looks at a file’s extension first and hands that file over to the corresponding application to open. If a .JPEG or .TIF file were renamed with a .doc extension, Microsoft Word would attempt to open it and report an error since the file, regardless of its name, is not actually a Microsoft Word file.
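Forensic tools counter this trick by comparing a file’s extension with its actual content. Here is a minimal sketch, assuming a small hand-built table of leading-byte signatures; the table and function names are hypothetical. The JPEG start-of-image marker `FF D8 FF` and the PDF signature `%PDF` are well-known values not otherwise discussed in this chapter, and the compound document signature is covered later in this chapter.

```python
import os

# Small, hypothetical table: extension -> bytes a genuine file of that type starts with.
SIGNATURES = {
    ".jpg": b"\xff\xd8\xff",                     # JPEG start-of-image marker
    ".jpeg": b"\xff\xd8\xff",
    ".gif": b"GIF8",                             # GIF87a and GIF89a
    ".doc": bytes.fromhex("d0cf11e0a1b11ae1"),   # compound document (pre-2007 Word)
    ".pdf": b"%PDF",
}

def expected_extensions(header: bytes) -> set:
    """Extensions whose signature matches the file's leading bytes."""
    return {ext for ext, sig in SIGNATURES.items() if header.startswith(sig)}

def extension_mismatch(filename: str, header: bytes) -> bool:
    """True if the content matches a known type but the name says otherwise.
    Unknown content returns False: the table simply cannot tell."""
    expected = expected_extensions(header)
    ext = os.path.splitext(filename)[1].lower()
    return bool(expected) and ext not in expected

jpeg_header = bytes.fromhex("ffd8ffe000104a464946")   # start of a JFIF image
print(extension_mismatch("racy pic.jpg", jpeg_header))   # False
print(extension_mismatch("budget.doc", jpeg_header))     # True
```

Note that an unrecognized signature is reported as "no mismatch"; a real tool would flag it separately as unknown rather than treating it as clean.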

FILES AND THE HEX EDITOR

In our intellectual property theft case, Ronelle Sawyer is investigating whether Jose McCarthy has potentially engaged in the unlawful distribution of his organization’s intellectual property to a competitor, Janice Witcome, Managing Director of the XYZ Company.

Ronelle is faced with examining millions of pieces of potential evidential data residing on Jose’s hard drive, such as any occurrence of the character string “XYZ.” To add to the complexity of Ronelle’s task, these files could have easily been renamed and moved to locations buried deep within the logical folder structure of the computer.

Figure 4.10 displays the contents of a folder (7.0.6000.374) buried within the Windows folder structure, a location which would normally contain system files such as .DLLs. The folder’s full path is [C:\WINDOWS\system32\SoftwareDistribution\Setup\ServiceStartup\wups2.dll\7.0.6000.374].

FIGURE 4.10 Windows Folder 7.0.6000.374

Windows Folder 7.0.6000.374 contains two files: file #1, wups2.dll, and file #2, systemm32.dll.

Remember, there can be hundreds if not thousands of folders and even more files, all of which may seem inconsequential as they are scattered and stored throughout an individual’s hard drive.

So these file types (i.e., .DLL) seem normal enough, right?

Let’s look at the files with a HEX editor. There are many HEX editors available, most of which are free to download. Google is your friend; just be sure you are downloading a HEX editor and not a Trojan.

Figure 4.11 shows File #1 “wups2.dll” viewed in a HEX editor.

FIGURE 4.11 HEX Editor View of File wups2.dll

Figure 4.12 shows File #2, “systemm32.dll,” viewed in a HEX editor.

FIGURE 4.12 HEX Editor View of File systemm32.dll

Contained within the file format information is the file signature, sometimes referred to as the “Magic Number.”

Magic numbers are referred to as magic because the purpose and significance of their values are not apparent without some additional knowledge. The term magic number is also used in programming to refer to a constant that is employed for some specific purpose but whose presence or value is inexplicable without additional information.2

FILE SIGNATURE

A file signature is the sequence of bytes that identifies a particular file: the data that aids in identifying the file to its native or parent software.

For some common file formats, the file signature conveniently spells out the name of the file type. For example, the file signature for image files conforming to the widely used GIF87a format is 0x474946383761 in HEX, which converts to the ASCII string GIF87a. ASCII is the de facto standard used by computers and communications equipment for character encoding (i.e., associating alphabetic and other characters with numbers).

Likewise, the file signature for image files in the subsequently introduced GIF89a format is 0x474946383961 (GIF89a). For both types of GIF (Graphics Interchange Format) files, the file signature occupies the first six bytes of the file and is followed by additional general information (i.e., metadata) about the file.

Similarly, a commonly used file signature for JPEG (Joint Photographic Experts Group) image files is 0x4A464946, the ASCII equivalent of JFIF (JPEG File Interchange Format). However, this JFIF identifier does not occupy the first bytes of the file; it begins at the seventh byte, after the marker bytes FF D8 FF E0 and a two-byte length field. Additional examples include 0x4D546864 (MThd) for MIDI (Musical Instrument Digital Interface) files and 0x425A68 (BZh) for bzip2 compressed files.

Remember, the prefix “0x” (read “zero-x”) denotes hexadecimal notation. The value 0x474946383761 therefore should not be interpreted as a decimal number; it is a hexadecimal representation of the bytes that make up the signature.
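The HEX-to-ASCII conversion behind these signatures is easy to check. This short sketch decodes the GIF, JFIF, MIDI (`MThd`), and bzip2 (`BZh`) signatures; the dictionary is simply an illustrative way to hold them. Each pair of hex digits is one byte, and each of these bytes happens to fall in the printable ASCII range.

```python
# Signatures written as hexadecimal strings (two hex digits per byte).
signatures = {
    "GIF87a": "474946383761",
    "GIF89a": "474946383961",
    "JPEG (JFIF marker, from the seventh byte)": "4A464946",
    "MIDI": "4D546864",
    "bzip2": "425A68",
}

for name, hex_string in signatures.items():
    raw = bytes.fromhex(hex_string)          # "47" -> 0x47, "49" -> 0x49, ...
    print(f"{name:42} 0x{hex_string:<13} -> {raw.decode('ascii')}")
```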

Notice in the HEX editor view of file “systemm32.dll” (Figure 4.12) we see a file signature of “d0 cf 11 e0.” This is the known file signature of Microsoft’s compound document format, used by MS Word documents prior to Office 2007. In fact, Microsoft picked the binary code to identify its files with some forethought, as the HEX representation of the binary (which is d0 cf 11 e0) almost spells out (if you look closely and use your imagination) the word “docfile” (d0c f11e 0). Perhaps it’s an example of tech humor or clever, albeit maybe bored, application designers?

Curious? Why would a file with a .dll file extension contain a “docfile” file signature? If we scroll down through the HEX editor some more, we will also see the actual text contained within the file (Figure 4.13).

FIGURE 4.13 Text Contained within the .dll File “systemm32.dll”

Notice the HEX value 58 59 5a and its ASCII equivalent, “XYZ” contained within the ASCII Character Panel. HEX editors, as part of their “tool set,” will automatically convert HEX to ASCII, so the rigorous HEX to ASCII conversion process we performed in the previous discussion is not necessary here.
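The conversion a HEX editor performs for its ASCII Character Panel can be sketched as follows. `hex_dump` is a hypothetical helper, not part of any particular HEX editor; it assumes the common convention of printing any byte outside the printable ASCII range (0x20 to 0x7E) as a dot.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes like a HEX editor: offset, hex column, ASCII panel.
    Bytes outside printable ASCII (0x20-0x7E) are shown as '.'."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_col = " ".join(f"{b:02x}" for b in chunk)
        ascii_col = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_col:<{width * 3 - 1}}  {ascii_col}")
    return "\n".join(lines)

# Compound document signature followed by some text.
sample = bytes.fromhex("d0cf11e0a1b11ae1") + b" XYZ Corp"
print(hex_dump(sample))
```

The eight signature bytes show up as "........" in the ASCII panel, while 58 59 5a appears as "XYZ".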

The binary values representing the text “XYZ” are contained within the file “systemm32.dll.” When we rename “systemm32.dll” to “systemm32.doc” and double click the file we will see that it is not a system file (.dll file) after all but a Word document (.doc).

This example shows the importance of HEX when viewing, or attempting to view, files. Since we already know the file signature for MS Word documents, “d0 cf 11 e0,” we can now search the entirety of Jose McCarthy’s drive for those specific HEX values, revealing any MS Word documents regardless of their file extensions.
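A signature search across a drive image amounts to finding every offset at which a byte pattern occurs. In the sketch below, the in-memory buffer stands in for a raw drive image, and its size and the offsets of the two planted signatures are made up for illustration; `find_all` is a hypothetical helper. A real image would be read from disk in chunks, taking care with matches that straddle chunk boundaries.

```python
DOC_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")

def find_all(haystack: bytes, needle: bytes) -> list:
    """Return every offset at which `needle` occurs in `haystack`."""
    hits, start = [], 0
    while (pos := haystack.find(needle, start)) != -1:
        hits.append(pos)
        start = pos + 1                      # keep scanning past this hit
    return hits

# Hypothetical 16 KiB "drive image" with two renamed .doc files in it.
image = bytearray(16384)
image[4096:4104] = DOC_SIGNATURE
image[12288:12296] = DOC_SIGNATURE

print(find_all(bytes(image), DOC_SIGNATURE))   # [4096, 12288]
```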

Notice we could not search the drive for an ASCII equivalent, because these bytes have none: D0, CF, and E0 lie outside the 7-bit ASCII range (00 to 7F), and 11 is a non-printing control character. A HEX editor displays such bytes as dots (“....”), which happens often with file signatures. See Appendix 4C for a further review and discussion of file signatures.

ASCII IS NOT TEXT OR HEX

Remember, the ASCII equivalent of HEX d0 cf 11 e0 is not the file extension “.doc.”

There is not always an ASCII equivalent for a file type, or for a file header (as in this case); this is one of the main reasons HEX is so important.

Remember, ASCII has limitations and was extended by Unicode. When these signature bytes are rendered using an extended character set, they appear as ÐÏ.ࡱ.á, which loosely resembles the letters “DIATA” (see Figure 4.14). However, this isn’t easily searchable either, as some of the bytes are non-printing control characters and the rest do not correspond to ordinary text.

FIGURE 4.14 ASCII Equivalent of HEX D0 CF 11 E0 (MS Word .doc file)

Open a .doc file (pre-Office 2007) in a HEX editor and you will notice dots “................” standing in for many different HEX values. The file signature, along with some of the code and other file header data, has no printable ASCII equivalent.

The first eight bytes of the file make up the fixed compound document file identifier, so the full file signature is D0 CF 11 E0 A1 B1 1A E1 (Figure 4.15).

When searching in HEX, the first four bytes (D0 CF 11 E0) will usually suffice, but it is best to be as accurate as possible and search for all eight.

FIGURE 4.15 ASCII Equivalent of HEX D0 CF 11 E0 (MS Word .doc file)

For a more detailed breakdown of the compound file header, please refer to Appendix 4D.

VALUE OF FILE SIGNATURES

We see that even though a file is renamed, we can still view its contents. If we were to search for the binary representation of “XYZ” across the entire drive, we would find this value regardless of the file’s modified extension or its file signature. As was discussed previously, in the course of normal day-to-day operations and file processing, a deleted file and its associated metadata will often be partially overwritten, perhaps losing the entire file signature, other important formatting information, and even some text. However, if the binary values representing a piece of evidence (in our case, “XYZ”) remain within the remnants, then the evidence can still be found.
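The sketch below simulates that situation with made-up data: a hypothetical .doc remnant whose first 512 bytes, signature included, have been overwritten. Identification by signature fails, but a keyword search still finds the evidence. It assumes the keyword is stored as plain single-byte ASCII, as in this chapter’s example; the search pattern must match however the file actually encodes its text.

```python
DOC_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")

# Hypothetical deleted .doc: signature, 504 filler bytes, then document text.
original = DOC_SIGNATURE + bytes(504) + b"...letter to XYZ Corp regarding the designs..."

# Simulate the first 512 bytes (header and signature) being overwritten.
remnant = bytes(512) + original[512:]

print(remnant.startswith(DOC_SIGNATURE))  # False: no signature to identify the file
print(remnant.find(b"XYZ"))               # 525: the keyword survives in the remnant
```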

It is important to note that a forensic examiner cannot always depend on having an intact file or a file with the authorized (correct) file extension (i.e., file type), available in its native format, on which to perform an analysis.

The previous example holds true for Office 2003 and earlier versions; Office 2007 drastically altered the MS Word file format. Microsoft Office 2007 documents are stored in what is referred to as the Office Open XML File Format.

An Office Open XML document is essentially a ZIP file containing various XML documents that together describe the entire document.
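Python’s standard `zipfile` module makes the container structure easy to see. This sketch builds a toy package in memory; the part names mirror those found in a real .docx, but the XML content is a placeholder and the result is not a valid Word document.

```python
import io
import zipfile

# Toy package: real .docx part names, placeholder XML content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("word/document.xml", "<w:document>letter to XYZ Corp</w:document>")

data = buffer.getvalue()
print(data[:4].hex(" "))        # 50 4b 03 04: the ZIP signature ("PK..")
with zipfile.ZipFile(io.BytesIO(data)) as package:
    print(package.namelist())   # ['[Content_Types].xml', 'word/document.xml']
```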

The point of the previous example is not to discuss the inner workings of an MS Word file format, but to show how file formats and signatures work in general. All file signatures are different and will continue to evolve. The purpose here is not to cover all file signatures, but to provide the reader with a very practical example of the relevance of a file’s signature in locating and identifying potentially incriminating information as part of a cyber forensic investigation.

There are file signature databases and tables available on the Internet. Most forensic tools are able to identify file signatures and header information, and will verify file types in this manner. Forensic tools will convert binary to ASCII, verify file signatures, and search for binary strings (or keywords, such as “XYZ”) without much effort on the part of the forensic examiner, and now you know HOW the software accomplishes this. See Appendix 4B for an example of a file signature database.
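The core of such a database lookup is a table of (offset, signature) pairs checked against a file’s leading bytes. The table below is a tiny, hypothetical one built from the signatures discussed in this chapter; real databases hold thousands of entries and more nuanced rules. Note the JFIF entry, which is matched at offset 6 rather than at the start of the file.

```python
# Hypothetical mini signature table: (offset, signature bytes, description).
SIGNATURE_TABLE = [
    (0, b"GIF87a", "GIF image (87a)"),
    (0, b"GIF89a", "GIF image (89a)"),
    (6, b"JFIF", "JPEG image (JFIF)"),
    (0, b"MThd", "MIDI audio"),
    (0, b"BZh", "bzip2 archive"),
    (0, bytes.fromhex("d0cf11e0a1b11ae1"), "Compound document (e.g., pre-2007 MS Word .doc)"),
    (0, b"PK\x03\x04", "ZIP archive (also Office 2007 .docx)"),
]

def identify(header: bytes) -> str:
    """Match a file's leading bytes against the table, honoring each offset."""
    for offset, sig, description in SIGNATURE_TABLE:
        if header[offset:offset + len(sig)] == sig:
            return description
    return "unknown"

print(identify(bytes.fromhex("ffd8ffe000104a464946")))   # JPEG image (JFIF)
print(identify(b"PK\x03\x04\x14\x00"))                   # ZIP archive (also Office 2007 .docx)
print(identify(b"hello"))                                # unknown
```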

COMPLEX FILES: COMPOUND, COMPRESSED, AND ENCRYPTED FILES

Before ending this section there are other more complex files worth discussing: compound, compressed, and encrypted files. The full complexities of these files are not covered here, as there are books written about each. We do, however, explain some basics and their importance in forensics.

A compound file is a file that contains numerous other files; the compound file itself is little more than a container for them. Its internal structure resembles a real file system: a hierarchy of storage under a single root directory.

There is a root directory, child storages (folders) contained within it, and files (data streams) contained therein. The term compound file is often associated with Microsoft’s Compound File Binary Format (CFBF).

All allocations of space within a compound file are done in “chunks,” or units called sectors. The sector size is set when the compound file is created and is usually 512 bytes (version 4 files use 4,096-byte sectors). A virtual stream is made up of a sequence of sectors, which need not be contiguous.
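The sector size is recorded in the compound file’s header as a power of two (the “sector shift”). The sketch below reads a few fixed header fields using the offsets given in Microsoft’s [MS-CFB] specification; because no real file is assumed to be at hand, it builds a synthetic 512-byte version 3 header for the demonstration. `parse_cfb_header` is a hypothetical helper name.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")

def parse_cfb_header(header: bytes) -> dict:
    """Read a few fixed fields from a Compound File Binary header."""
    if header[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file")
    # Five little-endian 16-bit fields starting at offset 0x18:
    # minor version, major version, byte order, sector shift, mini sector shift.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", header, 0x18)
    return {
        "major_version": major,
        "byte_order": hex(byte_order),         # 0xfffe means little-endian
        "sector_size": 1 << sector_shift,      # 2**9 = 512 (v3) or 2**12 = 4096 (v4)
        "mini_sector_size": 1 << mini_shift,   # normally 64
    }

# Synthetic version 3 header (everything else left as zeros).
hdr = bytearray(512)
hdr[:8] = CFB_SIGNATURE
struct.pack_into("<5H", hdr, 0x18, 0x003E, 3, 0xFFFE, 9, 6)
print(parse_cfb_header(bytes(hdr)))
```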

At its simplest, the Compound File Binary Format is a container, with little restriction on what can be stored within it.

However, in forensics, the term compound file is sometimes used more loosely to describe any file that may contain a directory structure. Again, our goal is not to cover a specific file type or software, but concepts generally.

As with other files, the file header of a compound file will contain a file signature, identifying the file; it will also contain information required to interpret the rest of the file such as the file’s size and storage location.

It is this metadata that allows the software to reconstruct the file into the appropriate format and display its specific information (e.g., size, creation date, change date). The file therefore needs to be “reconstructed” by its parent software in order for the data to be legible or otherwise accessible.

To further explain, we typically think of data storage as linear. For example, consider the information in the following data stream, “XYZ Corp.” The data is displayed in a linear contiguous pattern, X before Y and Y before Z. If that data was displayed in a nonlinear pattern we would see perhaps, “oZ pYCrX.”

If that same data were also not contiguous, other data from the same compound file might be intertwined with it (e.g., ...?>>o....Z^qL p....77Ymn....C@qwerbsbdX......,,,,). The original data stream “XYZ Corp” is no longer easily discernible. Even searching for the HEX equivalent wouldn’t help us uncover the data in this example.

We would need an instruction set to reconstruct this data.
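One kind of instruction set is a chain table. The toy sketch below uses a hypothetical layout (2-byte sectors and made-up data), though it is similar in spirit to the sector allocation table a Compound File Binary file uses: each entry records which sector comes next in a stream, and following the chain reassembles the scattered pieces.

```python
END_OF_CHAIN = -1

# Hypothetical 2-byte "sectors": the stream "XYZ Corp" (sectors 3, 1, 4, 6)
# interleaved with another stream's data (sectors 0, 2, 5).
sectors = [b"?>", b"Z ", b"^q", b"XY", b"Co", b"77", b"rp"]
next_sector = [2, 4, 5, 1, 6, END_OF_CHAIN, END_OF_CHAIN]

def read_stream(start: int) -> bytes:
    """Follow the chain from `start`, concatenating sector contents."""
    data, current, seen = b"", start, set()
    while current != END_OF_CHAIN:
        if current in seen:                 # a corrupted table could loop forever
            raise ValueError("cycle in sector chain")
        seen.add(current)
        data += sectors[current]
        current = next_sector[current]
    return data

raw = b"".join(sectors)
print(raw)              # b'?>Z ^qXYCo77rp' (no contiguous "XYZ" to search for)
print(read_stream(3))   # b'XYZ Corp'
```

If the chain table itself is overwritten, the sectors survive but the order is lost, which is exactly the recovery problem described later in this chapter.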

WHY DO COMPOUND FILES EXIST?

Files have become more complex and need to contain a lot of information. Many files use Object Linking and Embedding (OLE) technology, in which one file may contain many files.

OLE allows users to integrate data from different applications. Object linking allows users to share a single source of data for a particular object. The document contains the name of the file containing the data, along with a picture of the data. When the source is updated, all the documents using the data are updated as well.

With object embedding, one application (referred to as the “source”) provides data or an image that will be contained in the document of another application (referred to as the “destination”). The destination application contains the data or graphic image, but does not understand it or have the ability to edit it. It simply displays, prints, and/or plays the embedded item. To edit or update the embedded object, it must be opened in the source application that created it. This occurs automatically when you double-click the item or choose the appropriate edit command while the object is highlighted.

While embedding doesn’t allow users to have a single source of data, it does make it easier to integrate applications. An embedded object contains the actual data for the object, the name of the application that created it, and a picture of the data.3

For instance, an MS Word document may contain a JPG image; a file within a file. Compound files allow for incremental access, allowing for individual components to be accessed without the need of the entire file. This can save time and resources by not having to load an entire file, only the piece or pieces desired.

COMPRESSED FILES

As we continue on with our discussion, compressed files are essentially compound files (and sometimes referred to as such in the forensic community) that are compressed. They work in similar fashion; however, also contained within the compound file are compression instructions.

A common file extension associated with compressed files is .ZIP. This file format has gone mainstream and is supported by many software utilities other than its parent software, PKZIP. The .ZIP file format was publicly released, making it an open format used by other programs and formats, including Microsoft’s Office Open XML format. The ZIP name is often used loosely to describe any archival file format. Other compression programs and archive formats include WinZip, 7-Zip, gzip, and rzip.

The file format of a compressed file (or .ZIP file) changes depending upon its compression algorithm. Algorithms are mathematical operations or instructions for completing a task, in this case compressing data. Compression is a method of encoding data using fewer bits than the original encoding. These algorithms are complex, to say the least, and books have been written on the topic.
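Real compression algorithms such as DEFLATE, which ZIP files commonly use, are too involved to reproduce here. Run-length encoding, a much simpler scheme swapped in purely for illustration, shows the core idea of representing the same data with fewer bytes. The function names are hypothetical.

```python
def rle_encode(data: bytes) -> list:
    """Run-length encoding: store each run of identical bytes as (count, value)."""
    runs = []
    for b in data:
        if runs and runs[-1][1] == b and runs[-1][0] < 255:   # cap so count fits in a byte
            runs[-1] = (runs[-1][0] + 1, b)
        else:
            runs.append((1, b))
    return runs

def rle_decode(runs) -> bytes:
    """Expand each (count, value) pair back into its run of bytes."""
    return b"".join(bytes([value]) * count for count, value in runs)

original = b"A" * 8 + b"B" * 4 + b"C" * 12   # 24 bytes
encoded = rle_encode(original)
print(encoded)                           # [(8, 65), (4, 66), (12, 67)]
print(len(encoded) * 2)                  # 6: two bytes per (count, value) pair
print(rle_decode(encoded) == original)   # True
```

Data without long runs would actually grow under this scheme, which is one reason practical compressors use more sophisticated algorithms.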

To exemplify the difference between a regular file and a more complex file (compound or compressed) it would be best to examine a similar file in both formats.

Let’s examine a letter from Jose McCarthy, seized as part of the Ronelle Sawyer investigation. The letter examined was an MS Word file format (i.e., .doc), as was made clear when viewed via the HEX editor, shown here again.

We can easily see the doc file signature, d0 cf 11 e0, displayed in the HEX editor in Figure 4.16.

FIGURE 4.16 MS Word Office 2003 Document File Signature

By knowing the file signature for an MS Word document, we can easily identify and/or search for the text contained within this .doc file, and in doing so, find references to the “XYZ” company, as shown in Figure 4.17.

FIGURE 4.17 HEX and ASCII Identification of “XYZ” Company

What happens, however, when an application is upgraded? How might this affect the application’s file signature? To see the result of a change in software application file formatting, let us view the same document file from Jose McCarthy with a HEX editor, when Microsoft Office 2007 rather than Office 2003 is used to generate the document.

We see in Figure 4.18 that the file signature has changed. If you search for a file signature matching 50 4B 03 04, you will find that it corresponds to the .ZIP file signature (the ASCII panel shows PK . . . .). With the release of Office 2007, Microsoft Word documents use the same file signature as a .ZIP file.

FIGURE 4.18 MS Word Office 2007 Document File Signature

What does this mean, and why is it important to a cyber forensic investigation? For starters, it means that the file is a compound file consisting of other files. If we were to view the entirety of the file with our HEX editor, we would not find the document’s text as legible ASCII characters, unlike the ASCII panel in Figure 4.17.

Why? The contents are compressed, and the file structure and assembly instructions are contained within the file itself; thus, the file would need to be mounted by its native software in order for the contents to be viewed. As can be seen in Figure 4.19, the ASCII representation is not identifiable.

FIGURE 4.19 Identifiable ASCII Representation for .doc File

Viewing and, more importantly, searching the contents of these “complex” files are possible once they are mounted. Forensic tools incorporate the software to mount these so that searching is possible. If these complex files are not mounted then no search results will be obtained.
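The difference between searching raw bytes and searching a mounted file can be sketched with Python’s standard `zipfile` module, treating a ZIP-based package as the complex file. The package is built in memory with made-up contents, and `search_raw` and `search_mounted` are hypothetical helper names. Because DEFLATE packs compressed data at the bit level, the keyword usually does not appear in the raw bytes, while the mounted search finds it.

```python
import io
import zipfile

def search_raw(data: bytes, keyword: bytes) -> bool:
    """Search the container's bytes exactly as they sit on disk."""
    return keyword in data

def search_mounted(data: bytes, keyword: bytes) -> list:
    """Open ("mount") the ZIP container and search each decompressed member."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return [name for name in package.namelist() if keyword in package.read(name)]

# Hypothetical Office 2007-style package built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
    package.writestr("word/document.xml", "<body>" + "Letter to XYZ Corp. " * 20 + "</body>")
data = buffer.getvalue()

print(search_raw(data, b"XYZ"))       # searches the compressed bytes directly
print(search_mounted(data, b"XYZ"))   # ['word/document.xml']
```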

FORENSICS AND ENCRYPTED FILES

Encrypted files are also complex but differ in that an encryption key is required to decrypt an encrypted file.

Encryption uses an algorithm (cipher) to alter or transform the data in an attempt to prevent reconstruction by anyone without the encryption key. Decryption refers to the reverse process of making the data readable or otherwise accessible again.

Encryption is a method by which the confidentiality of data can be protected. For the most part, an encrypted file cannot be decrypted without the encryption key (often derived from a password). The encryption process uses an algorithm, or cipher, to mathematically combine the plaintext with the encryption key, encoding it in such a manner that it is illegible or indecipherable.

With the correct decryption key, the ciphertext is run back through its associated cipher (algorithm) and converted back to plaintext. Remember, this entire process occurs in binary, as 0s and 1s.

It is the cipher that actually changes the file; the key (or password) is just a set of data that the cipher uses to “mathematically mix” the plaintext, setting in motion the process that turns it into an unreadable end product.

THE STRUCTURE OF CIPHERS

The structure of ciphers depends upon the cipher’s type. Types of ciphers vary but generally they can be categorized by the following:

  • Block or stream. Block ciphers work on fixed-length groups of bits called blocks. AES, for example, takes a 128-bit block of plaintext data and encrypts it, producing a 128-bit block of encrypted data. In a stream cipher, the plaintext bits (or bytes) are encrypted one at a time by combining them with a keystream derived from the encryption key.
  • Symmetric or asymmetric. In symmetric encryption, the same encryption key or password is used for both encryption and decryption, whereas asymmetric encryption uses different keys. Symmetric key encryption is intuitive in that the same password is used to encrypt and decrypt the data. Asymmetric key encryption, or public-key cryptography, uses two different encryption keys: a public key and a private key. Data is encrypted using a person’s public key, which may be freely distributed to anyone. However, the data can only be decrypted using the person’s private key, which is kept secret by the individual.
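To make the stream-cipher and symmetric-key ideas concrete, here is a deliberately toy cipher. It derives a keystream by hashing the key with a counter (SHA-256 from Python’s `hashlib`) and XORs that keystream with the data. It is NOT AES and NOT secure: it uses the password directly as key material, has no nonce, and provides no integrity protection. Real work should rely on vetted libraries. It only shows that the same key reverses the transformation while a wrong key yields unreadable output. The function names are hypothetical.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks, concatenated and truncated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def toy_stream_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts (symmetric)."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

plaintext = b"Letter to XYZ Corp"
ciphertext = toy_stream_cipher(plaintext, b"correct password")
print(ciphertext.hex())                                     # unreadable bytes
print(toy_stream_cipher(ciphertext, b"correct password"))   # b'Letter to XYZ Corp'
print(toy_stream_cipher(ciphertext, b"wrong password"))     # still unreadable
```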

There are various encryption methods available, such as the Advanced Encryption Standard (AES), which is currently the standard adopted by the United States government and one of the most popular encryption methods available and in use today.

There are other encryption algorithms (or formats) available and many books have been written regarding each. It is not within the scope of this text to cover the various standards of encryption.

A similar attribute shared by all these “complex” file types discussed is that they contain some level or form of instruction needed to reconstruct the file. If that information is overwritten or otherwise missing, the ability to retrieve the data contained within the file will be severely compromised.

If the instructional data needed to reconstruct a compound file is missing, overwritten, destroyed, compromised, etc., the file may not be recoverable, even though the data containing the evidence (e.g., XYZ Corp) may still be contained within the file itself.

That said, it may be possible to reconstruct a complex file that has been partially overwritten. Forensic analysts are creative, cutting-edge, innovative, and very intelligent; they have developed solutions to some of the most complex problems. Recovering the data with normal “point and click” methods, however, may not always be possible.
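Compressed data offers a compact way to see this in action. The following Python sketch uses the standard zlib module as an analogy (an illustration chosen here, not a procedure from this chapter): zlib output begins with a two-byte header that tells the decoder how to read the deflate stream that follows. Overwriting only those two bytes makes the ordinary decompression call fail, although every byte of the payload is intact. An analyst who knows the format can skip the damaged header and decode the raw deflate stream directly. The helper names are hypothetical.

```python
import zlib


def damage_header(blob: bytes) -> bytes:
    """Simulate overwriting the 2-byte zlib header with zeros."""
    return b"\x00\x00" + blob[2:]


def recover_raw_deflate(blob: bytes) -> bytes:
    """Skip the (damaged) 2-byte header and decode the raw deflate stream.
    A negative wbits value tells zlib to expect no header; the 4-byte
    Adler-32 trailer is left over in the decompressor's unused_data."""
    d = zlib.decompressobj(-zlib.MAX_WBITS)
    return d.decompress(blob[2:])


if __name__ == "__main__":
    original = b"Wire transfer authorized by XYZ Corp. " * 20
    damaged = damage_header(zlib.compress(original))

    try:
        zlib.decompress(damaged)  # the normal, "point and click" route
    except zlib.error as exc:
        print("normal decompression failed:", exc)

    print(recover_raw_deflate(damaged) == original)  # True
```

The recovery works only because the damage is confined to a small, well-understood structure; real partially overwritten files are rarely this cooperative.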

SUMMARY

It is important to understand that not all binary values are convertible into readable ASCII. ASCII is a character code based on the English alphabet, and not all data contained within a computer is text (ASCII) based. Many programs or software applications, for example, are stored as compiled code that is not ASCII based.

This programming code is not meant to be viewed in ASCII; it is meant to perform a function. Recall from our earlier discussions that a computer’s functions are all based on math, not on English (nor French, Chinese, Slavic, Greek, or Arabic); code therefore needs to be based on mathematical principles, not grammatical ones.

A file’s type or format is determined by its file signature, not by its Microsoft Windows extension. The file header, including the file signature, is best viewed in HEX, because it often has no legible or identifiable ASCII representation. As we discussed, file signatures and headers are the pieces of a file that identify the file to its “parent” software, not to the user.
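A basic check that follows from this is to compare what the extension claims with what the signature says. The Python sketch below is a minimal illustration under stated assumptions: the signature table covers only a few well-known formats, and the extension-to-format mapping is a hypothetical simplification; real forensic tools rely on much larger signature databases.

```python
from pathlib import Path
from typing import Optional

# Leading bytes (file signatures) of a few well-known formats.
SIGNATURES = {
    bytes.fromhex("D0CF11E0A1B11AE1"): "OLE compound document",  # e.g., .doc, .xls
    bytes.fromhex("504B0304"): "ZIP container",                   # e.g., .zip, .docx
    bytes.fromhex("89504E470D0A1A0A"): "PNG image",
    bytes.fromhex("FFD8FF"): "JPEG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF": "PDF document",
}

# Hypothetical, simplified mapping: the format each extension implies.
EXPECTED = {
    ".doc": "OLE compound document", ".xls": "OLE compound document",
    ".docx": "ZIP container", ".xlsx": "ZIP container", ".zip": "ZIP container",
    ".png": "PNG image", ".jpg": "JPEG image", ".gif": "GIF image",
    ".pdf": "PDF document",
}


def format_from_signature(header: bytes) -> Optional[str]:
    """Identify the format from the file's leading bytes, ignoring its name."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def check_extension(filename: str, header: bytes) -> str:
    """Compare the format implied by the extension with the signature."""
    actual = format_from_signature(header)
    claimed = EXPECTED.get(Path(filename).suffix.lower())
    if actual is None:
        return "unknown signature"
    if claimed is None:
        return f"extension not in table; signature says {actual}"
    if claimed == actual:
        return f"match: {actual}"
    return f"MISMATCH: extension suggests {claimed}, signature says {actual}"


if __name__ == "__main__":
    ole_header = bytes.fromhex("D0CF11E0A1B11AE1") + bytes(8)
    print(check_extension("report.doc", ole_header))   # match: OLE compound document
    print(check_extension("holiday.jpg", ole_header))
    # MISMATCH: extension suggests JPEG image, signature says OLE compound document
```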

Thus, when we view a file in a HEX editor and see values appearing as “...............” in the ASCII Character Panel, those bytes most likely have no printable ASCII representation. The HEX values themselves are still present, however, and distinctive sequences such as file signatures can be searched for.
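The periods in the ASCII panel can be reproduced with a few lines of Python. This sketch assumes the common HEX editor convention of printing bytes 20H through 7EH as ASCII characters and everything else as a period (individual editors differ in the details). The second, hypothetical helper shows that a byte sequence with no readable ASCII form can still be located by its HEX values.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render offset, HEX, and ASCII columns the way a HEX editor does.
    Bytes outside printable ASCII (0x20-0x7E) appear as '.'."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which a byte pattern occurs (overlaps included)."""
    hits, start = [], 0
    while (pos := data.find(pattern, start)) != -1:
        hits.append(pos)
        start = pos + 1
    return hits


if __name__ == "__main__":
    sample = bytes.fromhex("D0CF11E0A1B11AE1") + b"XYZ Corp"
    print(hex_dump(sample))
    # 00000000  D0 CF 11 E0 A1 B1 1A E1 58 59 5A 20 43 6F 72 70  ........XYZ Corp
    print(find_all(sample, bytes.fromhex("D0CF11E0")))  # [0]
```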

It is very easy (and potentially dangerous) to become dependent on forensic tools and to forget the nuts and bolts of the underlying technological process, or even to be unaware of HOW an answer is obtained.

Reliance on any “tool” without having a solid understanding of how the tool works could spell personal and professional disaster for the cyber forensic investigator.

This is akin to successfully providing the correct answers to all the questions on a mathematics exam, and still receiving a failing grade because you failed to show your work.

If you are asked to explain how an answer was obtained, or on what data analysis a conclusion rests, and you reply, “I used tool ‘ABC’ and it provided the answer,” without being able to explain how the tool obtained that answer or how you validated and substantiated it, the validity of and reliance on your answer could be called into question and held suspect.

Use tools, but be certain you know how each tool works and how you would replicate its results without it, if you had to.

NOTES

1. “File Format,” retrieved January 2010, www.answers.com/topic/file-format.

2. Bellevue Linux Users Group (BLUG), “Magic Number Definition,” The Linux Information Project, August 21, 2006, retrieved January 2010, www.linfo.org/magic_number.html.

3. “Common Questions: Object Linking and Embedding, Data Exchange,” Microsoft Support, retrieved November 2011, http://support.microsoft.com/kb/122263, © 2007 Microsoft Corporation. All rights reserved. Used with permission from Microsoft.

APPENDIX 4A: COMMON FILE EXTENSIONSa

Common file extensions that are good to know, organized by file format.

Text Files
.doc Microsoft Word Document
.docx Microsoft Word Open XML Document
.log Log File
.msg Outlook Mail Message
.pages Pages Document
.rtf Rich Text Format File
.txt Plain Text File
.wpd WordPerfect Document
.wps Microsoft Works Word Processor Document
Data Files
.123 Lotus 1-2-3 Spreadsheet
.accdb Access 2007 Database File
.csv Comma Separated Values File
.dat Data File
.db Database File
.dll Dynamic Link Library
.mdb Microsoft Access Database
.pps PowerPoint Slide Show
.ppt PowerPoint Presentation
.pptx Microsoft PowerPoint Open XML Document
.sdb OpenOffice.org Base Database File
.sdf Standard Data File
.sql Structured Query Language Data
.vcf vCard File
.wks Microsoft Works Spreadsheet
.xls Microsoft Excel Spreadsheet
.xlsx Microsoft Excel Open XML Document
.xml XML File
Image Files
.pct Picture File
Raster Image Files
.bmp Bitmap Image File
.gif Graphical Interchange Format File
.jpg JPEG Image File
.png Portable Network Graphic
.psd Photoshop Document
.psp Paint Shop Pro Image File
.thm Thumbnail Image File
.tif Tagged Image File
Vector Image Files
.ai Adobe Illustrator File
.drw Drawing File
.dxf Drawing Exchange Format File
.eps Encapsulated PostScript File
.ps PostScript File
.svg Scalable Vector Graphics File
3D Image Files
.3dm Rhino 3D Model
.dwg AutoCAD Drawing Database File
.pln ArchiCAD Project File
Page Layout Files
.indd Adobe InDesign File
.pdf Portable Document Format File
.qxd QuarkXPress Document
.qxp QuarkXPress Project File
Audio Files
.aac Advanced Audio Coding File
.aif Audio Interchange File Format
.iff Interchange File Format
.m3u Media Playlist File
.mid MIDI File
.midi MIDI File
.mp3 MP3 Audio File
.mpa MPEG-2 Audio File
.ra Real Audio File
.wav WAVE Audio File
.wma Windows Media Audio File
Video Files
.3g2 3GPP2 Multimedia File
.3gp 3GPP Multimedia File
.asf Advanced Systems Format File
.asx Microsoft ASF Redirector File
.avi Audio Video Interleave File
.flv Flash Video File
.mov Apple QuickTime Movie
.mp4 MPEG-4 Video File
.mpg MPEG Video File
.rm Real Media File
.swf Flash Movie
.vob DVD Video Object File
.wmv Windows Media Video File
Web Files
.asp Active Server Page
.css Cascading Style Sheet
.htm Hypertext Markup Language File
.html Hypertext Markup Language File
.js JavaScript File
.jsp Java Server Page
.php Hypertext Preprocessor File
.rss Rich Site Summary
.xhtml Extensible Hypertext Markup Language File
Font Files
.fnt Windows Font File
.fon Generic Font File
.otf OpenType Font
.ttf TrueType Font
Plugin Files
.8bi Photoshop Plug-in
.plugin Mac OS X Plug-in
.xll Excel Add-In File
System Files
.cab Windows Cabinet File
.cpl Windows Control Panel
.cur Windows Cursor
.dmp Windows Memory Dump
.drv Device Driver
.key Security Key
.lnk File Shortcut
.sys Windows System File
Settings Files
.cfg Configuration File
.ini Windows Initialization File
.prf Outlook Profile File
Executable Files
.app Mac OS X Application
.bat DOS Batch File
.cgi Common Gateway Interface Script
.com DOS Command File
.exe Windows Executable File
.pif Program Information File
.vb VBScript File
.ws Windows Script
Compressed Files
.7z 7-Zip Compressed File
.deb Debian Software Package
.gz GNU Zipped File
.pkg Mac OS X Installer Package
.rar WinRAR Compressed Archive
.sit Stuffit Archive
.sitx Stuffit X Archive
.zip Zip File
.zipx Extended Zip File
Encoded Files
.bin MacBinary II Encoded File
.hqx BinHex 4.0 Encoded File
.mim Multi-Purpose Internet Mail Message
.uue Uuencoded File
Developer Files
.c C/C++ Source Code File
.cpp C++ Source Code File
.java Java Source Code File
.pl Perl Script
Backup Files
.bak Backup File
.bup Backup File
.gho Norton Ghost Backup File
.ori Original File
.tmp Temporary File
Disk Files
.dmg Mac OS X Disk Image
.iso Disc Image File
.toast Toast Disc Image
.vcd Virtual CD
Game Files
.gam Saved Game File
.nes Nintendo (NES) ROM File
.rom N64 Game ROM File
.sav Saved Game
Misc Files
.msi Windows Installer Package
.part Partially Downloaded File
.torrent BitTorrent File
.yps Yahoo! Messenger Data File

APPENDIX 4B: FILE SIGNATURE DATABASE


APPENDIX 4C: MAGIC NUMBER DEFINITIONb

A magic number is a number embedded at or near the beginning of a file that indicates its file format (i.e., the type of file it is). It is also sometimes referred to as a file signature.

Magic numbers are generally not visible to users. However, they can easily be seen with the use of a HEX editor, which is a specialized program that shows and allows modification of every byte in a file.

For common file formats, the numbers conveniently represent the names of the file types. Thus, for example, the magic number for image files conforming to the widely used GIF87a format in hexadecimal (i.e., base 16) terms is 0x474946383761, which when converted into ASCII is GIF87a. ASCII is the de facto standard used by computers and communications equipment for character encoding (i.e., associating alphabetic and other characters with numbers).

Likewise, the magic number for image files having the subsequently introduced GIF89a format is 0x474946383961. For both types of GIF (Graphic Interchange Format) files, the magic number occupies the first six bytes of the file. They are then followed by additional general information (i.e., metadata) about the file.

Similarly, a commonly used magic number for JPEG (Joint Photographic Experts Group) image files is 0x4A464946, which is the ASCII equivalent of JFIF (JPEG File Interchange Format). However, this identifier does not occupy the first bytes of the file; it begins with the seventh byte (offset 6), after the JPEG start-of-image marker 0xFFD8 and the header of the APP0 segment that contains it. Additional examples include 0x4D546864 (MThd) for MIDI (Musical Instrument Digital Interface) files and 0x425A68 (BZh) for bzip2 compressed files.
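These conversions can be checked with Python's bytes.fromhex, which turns a HEX string into raw bytes that can then be decoded as ASCII. The short lookup table below, an illustration rather than a complete database, also records the offset at which each value is expected, since the JFIF identifier starts at offset 6 (the seventh byte) rather than at the beginning of the file.

```python
# (offset, signature bytes, description), using values quoted in the text.
# Illustrative only; not a complete signature database.
MAGIC_NUMBERS = [
    (0, bytes.fromhex("474946383761"), "GIF87a image"),
    (0, bytes.fromhex("474946383961"), "GIF89a image"),
    (6, bytes.fromhex("4A464946"), "JPEG image (JFIF)"),
    (0, bytes.fromhex("4D546864"), "MIDI file"),
]


def identify(header: bytes) -> str:
    """Return the first description whose signature appears at its offset."""
    for offset, magic, description in MAGIC_NUMBERS:
        if header[offset:offset + len(magic)] == magic:
            return description
    return "unknown"


if __name__ == "__main__":
    for _, magic, description in MAGIC_NUMBERS:
        print(magic.hex().upper(), "->", magic.decode("ascii"), f"({description})")
    # JFIF header: FF D8 (start of image), FF E0 (APP0 marker),
    # 00 10 (segment length), then "JFIF" followed by a zero byte.
    jfif_header = bytes.fromhex("FFD8FFE00010") + b"JFIF\x00"
    print(identify(jfif_header))  # JPEG image (JFIF)
```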

Magic numbers are not always the ASCII equivalent of the name of the file format, or even something similar. For example, in some types of files they represent the name or initials of the developer of that file format. Also, in at least one type of file the magic number represents the birthday of that format’s developer.

Various programs make use of magic numbers to determine the file type. Among them is the command line (i.e., all-text mode) program named file, whose sole purpose is determining the file type.

Although they can be useful, magic numbers are not always sufficient to determine the file type. The main reason is that some file types do not have magic numbers, most notably plain text files, which include HTML (hypertext markup language), XHTML (extensible HTML), and XML (extensible markup language) files as well as source code.

Fortunately, there are also other means that can be used by programs to determine file types. One is by looking at a file’s character set (e.g., ASCII) to see if it is a plain text file. If it is determined that a file is a plain text file, then it is often possible to further categorize it on the basis of the start of the text, such as <html> for HTML files and #! (the so-called shebang) for script (i.e., short program) files.
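A rough Python sketch of this fallback, assuming the file has no recognized magic number: first confirm that every byte is printable ASCII or common whitespace, then look at how the text begins. The rules and labels are illustrative assumptions, not the algorithm used by the file program or by any other particular tool.

```python
def classify_text(data: bytes) -> str:
    """Illustrative fallback for files without a recognized magic number."""
    allowed_controls = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
    if not all(0x20 <= b <= 0x7E or b in allowed_controls for b in data):
        return "not plain ASCII text"
    if data.startswith(b"#!"):  # a shebang must be the very first two bytes
        return "script (shebang)"
    text = data.decode("ascii").lstrip().lower()
    if text.startswith(("<!doctype html", "<html")):
        return "HTML document"
    if text.startswith("<?xml"):
        return "XML document"
    return "plain ASCII text"


if __name__ == "__main__":
    print(classify_text(b"#!/bin/sh\necho hello\n"))       # script (shebang)
    print(classify_text(b"<html><body>Hi</body></html>"))  # HTML document
    print(classify_text(bytes.fromhex("D0CF11E0")))         # not plain ASCII text
```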

Another way to determine file type is through the use of filename extensions (e.g., .exe, .html, and .jpg), which are required on the various Microsoft operating systems but only to a small extent on Linux and other Unix-like operating systems. However, this approach has the disadvantage that it is relatively easy for a user to accidentally change or remove the extensions, in which case it becomes difficult to determine the file type and use the file.

Still another way that is possible in the case of some commonly used filesystems is through the use of file type information that is embedded in each file’s metadata. In Unix-like operating systems, such metadata is contained in inodes, which are data structures (i.e., efficient ways of storing information) that store all the information about files except their names and their actual data.

Magic numbers are referred to as magic because the purpose and significance of their values are not apparent without some additional knowledge. The term magic number is also used in programming to refer to a constant that is employed for some specific purpose but whose presence or value is inexplicable without additional information.

APPENDIX 4D: COMPOUND DOCUMENT HEADERc

The first 512 bytes of the file may look like Table 4D.1.

TABLE 4D.1 Compound Document Header

00000000H D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00
00000010H 00 00 00 00 00 00 00 00 3B 00 03 00 FE FF 09 00
00000020H 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00000030H 0A 00 00 00 00 00 00 00 00 10 00 00 02 00 00 00
00000040H 01 00 00 00 FE FF FF FF 00 00 00 00 00 00 00 00
00000050H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000060H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000070H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000080H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000090H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000A0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000B0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000C0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000D0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000E0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000F0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000100H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000110H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000120H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000130H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000140H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000150H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000160H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000170H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000180H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000190H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001A0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001B0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001C0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001D0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001E0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001F0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

Examining the details of this Compound Document Header discloses the following: eight (8) bytes containing the fixed compound document file identifier (Table 4D.2).

TABLE 4D.2 Document File Identifier

00000000H D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00

Sixteen (16) bytes containing a unique identifier, followed by four (4) bytes containing a revision number and a version number (Table 4D.3).

TABLE 4D.3 Unique Identifier, Revision Number and Version Number

00000000H D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00
00000010H 00 00 00 00 00 00 00 00 3B 00 03 00 FE FF 09 00

Two (2) bytes containing the byte order identifier. It should always consist of the byte sequence FEH FFH (Table 4D.4).

TABLE 4D.4 Byte Order Identifier

00000010H 00 00 00 00 00 00 00 00 3B 00 03 00 FE FF 09 00

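Two (2) bytes containing the size of sectors, followed by two (2) bytes containing the size of short-sectors. Both sizes are stored as powers of two: 09 00, read in little-endian order as 9, gives a sector size of 2^9 = 512 bytes, and 06 00 gives a short-sector size of 2^6 = 64 bytes here (Table 4D.5).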

TABLE 4D.5 Size of Sectors

00000010H 00 00 00 00 00 00 00 00 3B 00 03 00 FE FF 09 00
00000020H 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00

Ten (10) bytes without valid data can be ignored (Table 4D.6).

TABLE 4D.6 Bytes without Valid Data

00000020H 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00

Four (4) bytes containing the number of sectors used by the sector allocation table (SAT). The SAT uses only one sector here (Table 4D.7).

TABLE 4D.7 Number of Sectors Used by the Sector Allocation Table

00000020H 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00

Four (4) bytes containing the SecID of the first sector used by the directory. The directory starts at sector 10 here (Table 4D.8).

TABLE 4D.8 SecID of the First Sector Used by the Directory

00000030H 0A 00 00 00 00 00 00 00 00 10 00 00 02 00 00 00

Four (4) bytes without valid data can be ignored (Table 4D.9).

TABLE 4D.9 Bytes without Valid Data

00000030H 0A 00 00 00 00 00 00 00 00 10 00 00 02 00 00 00

Four (4) bytes containing the minimum size of standard streams. The bytes 00 10 00 00, read in little-endian order, give 00001000H = 4096 bytes here (Table 4D.10).

TABLE 4D.10 Minimum Size of Standard Streams

00000030H 0A 00 00 00 00 00 00 00 00 10 00 00 02 00 00 00

Four (4) bytes containing the SecID of the first sector of the short-sector allocation table (SSAT) (Table 4D.11).

TABLE 4D.11 SecID of the First Sector of the Short-Sector Allocation Table

00000030H 0A 00 00 00 00 00 00 00 00 10 00 00 02 00 00 00

Four (4) bytes containing the number of sectors used by the SSAT. In this example, the SSAT starts at sector 2 and uses one sector (Table 4D.12).

TABLE 4D.12 Number of Sectors Used by the SSAT

00000040H 01 00 00 00 FE FF FF FF 00 00 00 00 00 00 00 00

Four (4) bytes containing the SecID of the first sector of the master sector allocation table (MSAT), followed by four (4) bytes containing the number of sectors used by the MSAT. The SecID here is FE FF FF FF, which as a signed little-endian value is −2 (the special End of Chain SecID), and the sector count is 0; together these state that there is no extended MSAT in this file (Table 4D.13).

TABLE 4D.13 SecID of the First Sector of the Master Sector Allocation Table, Followed by Four (4) Bytes Containing the Number of Sectors Used by the MSAT

00000040H 01 00 00 00 FE FF FF FF 00 00 00 00 00 00 00 00
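Two details make these values easy to misread. The header stores multi-byte numbers in little-endian order (least significant byte first), and SecIDs are signed 32-bit integers, so FE FF FF FF means −2 rather than a very large positive number. A minimal Python sketch using the built-in int.from_bytes makes both conversions explicit; the helper names are hypothetical.

```python
def le_uint32(raw: bytes) -> int:
    """Unsigned 32-bit little-endian value (a size or a count)."""
    return int.from_bytes(raw, byteorder="little", signed=False)


def le_int32(raw: bytes) -> int:
    """Signed 32-bit little-endian value (a SecID)."""
    return int.from_bytes(raw, byteorder="little", signed=True)


# Byte groups copied from the header in Table 4D.1
print(le_uint32(bytes.fromhex("00100000")))  # 4096: minimum size of standard streams
print(le_int32(bytes.fromhex("FEFFFFFF")))   # -2: SecID of the first MSAT sector
print(le_int32(bytes.fromhex("FFFFFFFF")))   # -1: the Free SecID
print(le_uint32(bytes.fromhex("0A000000")))  # 10: first sector of the directory
```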

436 bytes containing the first 109 SecIDs of the MSAT. Only the first SecID is valid, because the SAT uses only one sector (see earlier).

Therefore, all remaining SecIDs are set to the special Free SecID with the value −1.

The only sector used by the SAT is sector 0 (Table 4D.14).

TABLE 4D.14 436 Bytes Containing the First 109 SecIDs of the MSAT

00000040H 01 00 00 00 FE FF FF FF 00 00 00 00 00 00 00 00
00000050H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000060H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000070H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000080H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000090H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000A0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000B0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000C0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000D0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000E0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000000F0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000100H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000110H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000120H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000130H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000140H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000150H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000160H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000170H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000180H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
00000190H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001A0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001B0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001C0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001D0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001E0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000001F0H FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
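The field-by-field walk-through above can be pulled together into a small Python parser built on the standard struct module. This is a sketch, not a complete implementation: the offsets follow the layout described in this appendix, the field names are our own labels, and only minimal validation is performed. The sample header is rebuilt from Table 4D.1 so the parsed values can be compared with those discussed above (for example, a 512-byte sector size, a directory starting at sector 10, and an MSAT first SecID of −2).

```python
import struct

OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
FREE_SECID = -1

# Little-endian ("<") layout of the first 76 header bytes, in order:
# identifier, unique ID, revision, version, byte order, sector-size exponent,
# short-sector-size exponent, 10 unused, SAT sector count, first directory SecID,
# 4 unused, minimum standard stream size, first SSAT SecID, SSAT sector count,
# first MSAT SecID, MSAT sector count. Field names below are our own labels.
HEADER_FORMAT = "<8s16sHH2sHH10sIi4sIiIiI"


def parse_compound_header(header: bytes) -> dict:
    if len(header) < 512 or not header.startswith(OLE_SIGNATURE):
        raise ValueError("not a compound document header")
    (_sig, _uid, revision, version, byte_order, sector_exp, short_exp, _unused1,
     sat_sectors, dir_first, _unused2, min_stream, ssat_first, ssat_sectors,
     msat_first, msat_sectors) = struct.unpack_from(HEADER_FORMAT, header, 0)
    if byte_order != b"\xfe\xff":
        raise ValueError("unexpected byte order identifier")
    # 436 bytes at offset 76 hold the first 109 MSAT SecIDs (signed 32-bit).
    msat = struct.unpack_from("<109i", header, 76)
    return {
        "revision": revision,
        "version": version,
        "sector_size": 1 << sector_exp,        # stored as a power of two
        "short_sector_size": 1 << short_exp,
        "sat_sectors": sat_sectors,
        "directory_first_secid": dir_first,
        "min_standard_stream_size": min_stream,
        "ssat_first_secid": ssat_first,
        "ssat_sectors": ssat_sectors,
        "msat_first_secid": msat_first,        # -2 = End of Chain: no extended MSAT
        "msat_sectors": msat_sectors,
        "sat_secids": [s for s in msat if s != FREE_SECID],
    }


# Sample header rebuilt from Table 4D.1: the first 80 bytes, then FF up to 512.
SAMPLE = bytes.fromhex(
    "D0CF11E0A1B11AE1" "0000000000000000"
    "0000000000000000" "3B000300FEFF0900"
    "0600000000000000" "0000000001000000"
    "0A00000000000000" "0010000002000000"
    "01000000FEFFFFFF" "0000000000000000"
) + b"\xff" * (512 - 0x50)

if __name__ == "__main__":
    for field, value in parse_compound_header(SAMPLE).items():
        print(f"{field:26} {value}")
```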

aThe information in this Appendix came from www.fileinfo.com/common.php.

bBellevue Linux Users Group (BLUG), “Magic Number Definition,” The Linux Information Project, August 21, 2006, retrieved January 2010, www.linfo.org/magic_number.html.

cD. Rentz, “Documentation of the Microsoft Compound Document File Format,” OpenOffice.org Source Project, August 7, 2007, retrieved February 2010, http://sc.openoffice.org/compdocfileformat.pdf.
