Chapter 4. Classification of Infection Strategies

“All art is an imitation of nature.”

—Seneca

In this chapter, you will learn about common computer virus infection techniques that target various file formats and system areas.

4.1 Boot Viruses

The first known successful computer viruses were boot sector viruses. In 1986, two Pakistani brothers created the first such virus, called Brain, on the IBM PC.

Today the boot infection technique is rarely used. However, you should become familiar with boot viruses because they can infect a computer regardless of the actual operating system installed on it.

Boot sector viruses take advantage of the boot process of personal computers (PCs). Because most computers do not contain an operating system (OS) in their read-only memory (ROM), they need to load the system from somewhere else, such as from a disk or from the network (via a network adapter).

A typical IBM PC's disk is organized in up to four partitions, which have logical letters assigned to them on several operating systems such as MS-DOS and Windows NT, typically C:, D:, and so on. (Drive letters are particularities of the operating system; for example, UNIX systems use mount points, not drive letters.) Most computers only use two of these partitions, which can be accessed easily. Some computer vendors, such as COMPAQ and IBM, often use hidden partitions to store additional BIOS setup tools on the disk. Hidden partitions do not have any logical names assigned to them, making them more difficult to access. Good tools such as Norton Disk Editor can reveal such areas of the disk. (Please use advanced disk tools very carefully because you can easily harm your data!)

Typically PCs load the OS from the hard drive. In early systems, however, the boot order could not be defined, and thus the machine would boot from the diskette, allowing great opportunity for computer viruses to load before the OS. The ROM-BIOS reads the first sector of the specified boot disk according to the boot order settings in the BIOS setup, stores it in memory at 0:0x7C00 when successful, and runs the loaded code [1].

On newer systems, each partition is further divided into additional partitions. The disk is always divided into heads, tracks, and sectors. The master boot record (MBR) is located at head 0, track 0, sector 1, which is the first sector on the hard disk. The MBR contains generic, processor-specific code to locate the active boot partition from partition table (PT) records. The PT is stored in the data area of the MBR. At the front of the MBR is some tiny code, often called a boot strap loader.

Each PT entry contains the following:

• The addresses of the first and last sectors of the partition

• A flag indicating whether the partition is bootable

• A type byte

• The offset of the first sector of the partition from the beginning of the disk in sectors

• The size of the partition in sectors
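
The fields listed above can be read out of a raw sector with a few lines of code. The following is a minimal sketch, assuming the classic MBR layout in which the table starts at offset 0x1BE and holds four 16-byte entries; on disk the fields appear in the order flag, first sector address, type byte, last sector address, start offset, and size. The function names and the synthetic sector are illustrative only.

```python
import struct

PT_OFFSET = 0x1BE            # assumed classic layout: table follows 446 bytes of code
ENTRY_FORMAT = "<B3sB3sII"   # flag, first CHS, type, last CHS, start offset, size
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16 bytes per entry


def parse_partition_table(mbr: bytes) -> list:
    """Return the four partition table entries of a 512-byte MBR as dicts."""
    if len(mbr) != 512:
        raise ValueError("an MBR sector is exactly 512 bytes")
    entries = []
    for index in range(4):
        fields = struct.unpack_from(ENTRY_FORMAT, mbr, PT_OFFSET + index * ENTRY_SIZE)
        flag, chs_first, ptype, chs_last, lba_start, sectors = fields
        entries.append({
            "bootable": flag == 0x80,   # 0x80 marks the active partition
            "chs_first": chs_first,     # packed head/sector/cylinder bytes
            "type": ptype,
            "chs_last": chs_last,
            "lba_start": lba_start,     # offset from the start of the disk, in sectors
            "sectors": sectors,         # size of the partition, in sectors
        })
    return entries


def active_partition(entries: list):
    """Return the first entry flagged as bootable, or None if there is none."""
    return next((entry for entry in entries if entry["bootable"]), None)


def build_example_mbr() -> bytes:
    """Build a synthetic MBR with one active partition (made-up values)."""
    mbr = bytearray(512)
    entry = struct.pack(ENTRY_FORMAT, 0x80, b"\x01\x01\x00", 0x06,
                        b"\x0f\x3f\x20", 63, 20480)
    mbr[PT_OFFSET:PT_OFFSET + ENTRY_SIZE] = entry
    mbr[510:512] = b"\x55\xaa"          # boot signature
    return bytes(mbr)


if __name__ == "__main__":
    for entry in parse_partition_table(build_example_mbr()):
        print(entry)
```

A tool that compares these values before and after a suspicious event can notice when the active entry suddenly points somewhere else.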

The loader locates the active partition and loads its first logical sector as the boot sector. The boot sector contains OS-specific code. The MBR is general-purpose code, not related to any OS. Thus IBM PCs can easily support more than one partition with different kinds of file systems and operating systems. This also makes the job of computer viruses very simple. The MBR code can be easily replaced with virus code that loads the original MBR after itself and stays in memory, depending on the installed operating system. In the case of MS-DOS, boot viruses can easily remain in memory and infect other inserted media on the fly. A few tricky boot viruses, like Exebug, always force the computer to load them on the system first and then complete the boot process themselves. Exebug changes the CMOS settings of the BIOS to trick the PC into thinking it has no floppy drives. Thus, the PC will boot using the infected MBR first. When the virus is executed (from the hard disk), it checks if there is a diskette in drive A:, and if there is one, it will load the boot sector of the diskette and transfer control to it. Thus when you try to boot from a boot diskette, the virus can trick you into believing that you indeed booted from the diskette, but in reality, you did not.

In the case of floppy diskettes, the boot sector is the first sector of the diskette. The boot record contains OS-specific filenames to load, such as IBMBIO.COM and IBMDOS.COM.

It is advisable to set the boot process in such a way that you boot from the hard drive first. In first-generation IBM PCs, the boot process was not designed that way, so whenever a diskette was left in drive A:, the PC attempted to boot from it. Boot viruses took advantage of this design mistake. By setting the boot process properly, you can easily avoid simple boot sector viruses.

Note

If your system has a SCSI disk connected to it, the system might not boot from those drives first because it is unable to handle these disks directly from its BIOS.

The following sections discuss in detail the basic kinds of MBR and boot sector infection techniques.

4.1.1 Master Boot Record (MBR) Infection Techniques

Infection of the MBR is a relatively trivial task for viruses. The size of the MBR is 512 bytes. Only a short code fits in there, but it is more than enough for a small virus. Typically the MBR gets infected immediately upon booting from an infected diskette in drive A.

4.1.1.1 MBR Infection by Replacement of Boot Strap Code

The classic type of MBR virus uses the INT 13h BIOS disk routine to read and write the disks. Most MBR infectors replace the boot strap code at the front of the MBR with their own copy and do not change the PT. This is important because, when booting from a diskette, the hard disk is accessible only if the PT is in place. Otherwise, DOS has no way to find the data on the drive.

The Stoned virus is a typical example of this technique. The virus stores the original MBR on sector 7 (see Figure 4.1). After the virus gets control via the replaced MBR, it reads the stored MBR located on sector 7 in memory and gives it control. A couple of empty sectors are typically available after the MBR, and Stoned takes advantage of this. However, this condition cannot be 100% guaranteed, and this is exactly why some MBR viruses make a system unbootable after infection.
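
To see where these sectors sit on the disk, the head, track, and sector coordinates can be converted to a linear sector number. The sketch below uses the standard conversion formula; the geometry values (16 heads, 63 sectors per track) are hypothetical, and it assumes that "sector 7" refers to head 0, track 0.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a zero-based linear sector number.

    Sector numbers start at 1, while cylinders (tracks) and heads start at 0.
    """
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector must be in 1..sectors_per_track")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head must be in 0..heads_per_cylinder-1")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


# Hypothetical drive geometry, chosen only for illustration.
GEOMETRY = {"heads_per_cylinder": 16, "sectors_per_track": 63}

mbr_lba = chs_to_lba(0, 0, 1, **GEOMETRY)        # the MBR itself
saved_mbr_lba = chs_to_lba(0, 0, 7, **GEOMETRY)  # where the original MBR is kept

if __name__ == "__main__":
    print("MBR at linear sector", mbr_lba)
    print("saved copy at linear sector", saved_mbr_lba)
```

With this geometry the first partition would typically begin on the next head, so both sectors fall in the normally unused gap after the MBR, which is why the approach usually works.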

Figure 4.1. The typical layout of the disk before and after a Stoned infection.

4.1.1.2 Replacing the MBR Code but Not Saving It

Another technique of viruses to infect the MBR is to overwrite the boot strap code, leaving the PT entries in place but not saving the original MBR anywhere. Such viruses need to perform the function of the original MBR code. In particular, they need to locate the active partition, load it, and give control to it after themselves.

One of the first viruses that used this technique was Azusa [2], discovered in January of 1991 in Ontario, Canada. Viruses like this cannot be disinfected with regular methods because the original copy of the MBR is not stored anywhere.

Antivirus programs quickly reacted to this threat by carrying a standard MBR code within them. To disinfect the virus, this generic MBR code was used to overwrite the virus code, thereby saving the system.
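
The repair described above can be sketched on in-memory buffers. This is a minimal illustration, not the routine of any particular product: it assumes the classic layout in which the boot strap code occupies the first 446 bytes and the PT plus the 0x55AA signature fill the rest of the sector, and it only makes sense when the virus left the PT entries in place. The function name is hypothetical.

```python
SECTOR_SIZE = 512
BOOTSTRAP_SIZE = 446          # assumed size of the code area in the classic layout
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


def rebuild_mbr(infected: bytes, generic: bytes) -> bytes:
    """Combine generic boot strap code with the partition table of the infected sector."""
    for sector in (infected, generic):
        if len(sector) != SECTOR_SIZE:
            raise ValueError("both sectors must be exactly 512 bytes")
    if infected[510:] != BOOT_SIGNATURE:
        # Without the signature the table area cannot be trusted either.
        raise ValueError("infected sector lacks the 0x55AA signature")
    # Keep the table and signature, replace only the code in front of them.
    return generic[:BOOTSTRAP_SIZE] + infected[BOOTSTRAP_SIZE:]


if __name__ == "__main__":
    # Synthetic buffers: filler bytes stand in for real code.
    infected = b"\xee" * 446 + bytes(range(64)) + BOOT_SIGNATURE
    generic = b"\x90" * 446 + bytes(64) + BOOT_SIGNATURE
    repaired = rebuild_mbr(infected, generic)
    print(repaired[446:510] == infected[446:510])
```

A real disinfector would also verify that the table entries are sane before writing anything back to disk.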

4.1.1.3 Infecting the MBR by Changing the PT Entries

An easy target of MBR viruses is the partition table record of the MBR. By manipulating the PT entry of the active partition, a virus can make sure it loads a different boot sector, where the virus body is stored. Thus the MBR will load the virus boot sector instead of the original one, and the virus will load the original after itself.

The StarShip virus is an example of this technique. Some tricky viruses, such as some members of the Ginger family, manipulate the PT entries in such a way as to create a “circular partition” [3,4] effect. Apparently this trick causes MS-DOS v4.0–7.0 to run in an endless loop when booted. Thus only a clean MS-DOS 3.3x or some other non-Microsoft-made DOS system, such as PC DOS, can be used to boot properly from a diskette.
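
The text does not spell out how the circular effect is built. One plausible mechanism, assumed here purely for illustration, is a chain of extended partition tables that links back to a table already visited. Under that assumption, a checker can walk the chain and stop at the first repeat. The read_sector callable and the helper names are hypothetical.

```python
import struct

PT_OFFSET = 0x1BE
ENTRY_FORMAT = "<B3sB3sII"
EXTENDED_TYPES = {0x05, 0x0F}   # type bytes commonly used for extended partitions


def table_entries(sector: bytes):
    """Yield (type byte, start offset) for the four entries of a table sector."""
    for index in range(4):
        _, _, ptype, _, start, _ = struct.unpack_from(
            ENTRY_FORMAT, sector, PT_OFFSET + 16 * index)
        yield ptype, start


def find_partition_loop(read_sector):
    """Follow extended partition links from sector 0.

    read_sector(lba) must return the 512-byte sector at that position.
    Returns the position of the first table reached twice, or None.
    """
    visited = set()
    base = None      # start of the extended partition, set by the MBR link
    lba = 0
    while True:
        if lba in visited:
            return lba
        visited.add(lba)
        link = next((start for ptype, start in table_entries(read_sector(lba))
                     if ptype in EXTENDED_TYPES), None)
        if link is None:
            return None
        if base is None:
            base = lba = link    # the link in the MBR is absolute
        else:
            lba = base + link    # later links are relative to the extended base


def make_table(ptype: int, start: int) -> bytes:
    """Build a synthetic table sector with a single entry."""
    sector = bytearray(512)
    sector[PT_OFFSET:PT_OFFSET + 16] = struct.pack(
        ENTRY_FORMAT, 0, b"\0\0\0", ptype, b"\0\0\0", start, 1)
    return bytes(sector)


if __name__ == "__main__":
    looping_disk = {0: make_table(0x05, 100), 100: make_table(0x05, 0)}
    print(find_partition_loop(looping_disk.__getitem__))
```

A system that walks such a chain without a visited set would never terminate, which matches the endless loop described above.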

4.1.1.4 Saving the MBR to the End of the Hard Disk

A common method of infecting the MBR is to replace the MBR completely and save the original at the end of the hard drive, in the hope that nothing overwrites it there. Some of the more careful viruses reduce the size of the partition to make sure that this area of the disk will not be overwritten again. The multipartite virus, Tequila, uses this technique.

4.1.2 DOS BOOT Record (DBR) Infection Techniques

Boot sector viruses infect the first sector, the boot sector of the diskettes. They optionally infect the hard-disk boot sectors, as well. There are more known infection techniques to infect boot sectors than there are to infect MBRs.

4.1.2.1 Standard Boot Infection Technique

One of the most frequently used boot infection techniques was developed in viruses like Stoned. Stoned infects a diskette's boot sector by replacing the 512-byte boot sector with its own copy and saving the original to the end of the root directory.

In practice, this technique is safe most of the time, but accidental damage to the content of the diskette can happen if there are too many filenames stored in the diskette's directory. In such a case, the original sector's content might overwrite the content of the directory; as a result, only some garbage is displayed on-screen via a DIR command.
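
A short calculation shows when the damage occurs. The sketch below assumes the standard FAT12 parameters of a 360KB diskette (1 reserved sector, 2 FAT copies of 2 sectors each, 112 root directory entries of 32 bytes) and assumes that "the end of the root directory" means its last sector. The function names are illustrative.

```python
DIR_ENTRY_SIZE = 32   # bytes per FAT directory entry


def root_directory_span(reserved_sectors: int, fat_count: int, sectors_per_fat: int,
                        root_entries: int, bytes_per_sector: int = 512):
    """Return (first, last) zero-based sector numbers of the root directory."""
    first = reserved_sectors + fat_count * sectors_per_fat
    count = -(-root_entries * DIR_ENTRY_SIZE // bytes_per_sector)  # ceiling division
    return first, first + count - 1


def entries_before_last_sector(root_entries: int, bytes_per_sector: int = 512) -> int:
    """Number of directory entries that fit before the last root directory sector."""
    per_sector = bytes_per_sector // DIR_ENTRY_SIZE
    sector_count = -(-root_entries * DIR_ENTRY_SIZE // bytes_per_sector)
    return (sector_count - 1) * per_sector


# Assumed parameters of a standard 360KB diskette.
FLOPPY_360K = {"reserved_sectors": 1, "fat_count": 2,
               "sectors_per_fat": 2, "root_entries": 112}

if __name__ == "__main__":
    first, last = root_directory_span(**FLOPPY_360K)
    safe = entries_before_last_sector(FLOPPY_360K["root_entries"])
    print(f"root directory occupies sectors {first}..{last}")
    print(f"entries beyond the first {safe} live in the last sector")
```

Under these assumptions, a diskette with more directory entries than the computed limit already uses the last root directory sector, so a sector saved there replaces live directory data.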

4.1.2.2 Boot Viruses That Format Extra Sectors

Some boot viruses are simply too large to fit in a single sector. Most diskettes can be formatted to store more data than their actual formatted size. Not all floppy disk drives support the formatting of extra sectors, but many do. For example, my first PC clone's diskette drive did not support the access to these areas of diskettes. As a result, some copy-protected software simply did not work properly on my system.

Copy-protection software often takes advantage of specially formatted “extra” diskette sectors placed outside of normal ranges. As a result, normal diskette copying tools, such as DISKCOPY, fail to make an identical copy of such diskettes.

Some viruses specially format a set of extra diskette sectors to make it more difficult for the antivirus program to access the original copy during repair. However, the typical use of extra sectors is to make more space for a larger virus body.

The Indonesian virus, Denzuko, is an example that uses this technique. Denzuko was released during the spring of 1988. Unlike with most other viruses, the author of this virus is known. It was written by Denny Yanuar Ramdhani. The nickname of the virus writer is Denny Zuko, which comes from “Danny Zuko,” the character in the popular musical movie Grease played by John Travolta [5]. This boot virus was among the first to implement a counterattack against another computer virus. Denzuko killed the Brain virus whenever it encountered it on a computer.

Denzuko also displayed the graphical payload shown in Figure 4.2 for a fraction of a second when Ctrl-Alt-Del was pressed. Then the computer appeared to reboot, but the virus stayed in memory [6].

Figure 4.2. Payload of the Denzuko virus.

The extremely complex and dangerous Hungarian stealth BOOT/MBR virus, Töltögetö (also known as Filler), uses this technique as well. This virus was written by a computer student at a technical high school in Székesfehérvár, Hungary, in 1991. Filler has formatting records for both 360KB and 1.2MB diskettes and formats sectors on track 40 or 80 on these, respectively. These areas of the diskette are not formatted normally.

A benefit of such an infection technique is the possibility of reviving dead virus code. Reviving attempts were first seen in computer viruses in the early '90s. For example, some COM infector viruses would attempt to load to the very end of the disk, outside of normally formatted areas, and give control to the loaded sector. Many early antivirus solutions did not overwrite the virus code everywhere on the disk during cleanup. The boot sector of the disk was often fixed, and the virus code was considered dead in the diskettes' “out of reach” areas. Unfortunately, this provided the advantage of allowing virus writers to revive such dead virus instances easily, using another virus.

4.1.2.3 Boot Viruses That Mark Sectors as BAD

An interesting method of viruses to infect boot sectors is to replace the original boot sector with the virus code and save the original sector, or additional parts of the virus body, in an unused cluster marked as BAD in the DOS FAT. An example of this kind of virus is the rather dangerous Disk Killer, written in April 1989 [7].
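
When examining a diskette, it can help to list the clusters that the FAT marks as BAD so that their contents can be inspected. The following is a small sketch for FAT12, the format used on diskettes, where entries are 12 bits wide and the BAD marker is 0xFF7. The function names and the sample table are made up for illustration.

```python
FAT12_BAD = 0xFF7   # FAT12 value that marks a cluster as bad


def fat12_entry(fat: bytes, cluster: int) -> int:
    """Decode the 12-bit FAT entry of the given cluster."""
    offset = cluster * 3 // 2                  # two entries share three bytes
    pair = fat[offset] | (fat[offset + 1] << 8)
    return pair >> 4 if cluster & 1 else pair & 0x0FFF


def bad_clusters(fat: bytes, cluster_count: int) -> list:
    """Return the data clusters marked BAD; data clusters are numbered from 2."""
    return [cluster for cluster in range(2, cluster_count + 2)
            if fat12_entry(fat, cluster) == FAT12_BAD]


if __name__ == "__main__":
    # Synthetic FAT in which clusters 5 and 6 are marked bad and 7 ends a chain.
    sample = bytearray(32)
    sample[7:12] = b"\x70\xff\xf7\xff\xff"
    print(bad_clusters(bytes(sample), 10))
```

BAD marks also appear on media with genuine defects, so the result is only a list of candidates; the sectors behind each cluster still have to be read and checked.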

4.1.2.4 Boot Viruses That Do Not Store the Original Boot Sector

Some boot sector viruses do not save the diskette's original boot sector anywhere. Instead, they simply infect the active boot sector or the MBR of the hard disk and give control to saved boot sectors on the hard disk. Thus the diskette infection cannot be repaired with standard techniques because the virus does not need to store the original sector anywhere. Because the boot sector is operating system–specific, this task is not as simple as replacing the MBR code; there are too many different OS boot sectors to choose from. Not surprisingly, the most common antivirus solution to this problem has been to overwrite the virus code with a generic boot sector code that displays a message asking the user to boot from the hard disk instead. As a result, a system diskette cannot be repaired properly.

A second, less common method is to overwrite the diskette boot sector with the virus code, which will infect the MBR or the boot sector of the hard disk. The virus then displays a false error message, such as “Non-system disk or disk error,” and lets the user load the virus from the hard disk. The Strike virus is an example that uses this technique.

A further method to infect the boot sector of diskettes without saving is to mimic the original boot sector functionality and attempt to load some system files. Obviously, this method will only work if the virus code matches the system files on the diskette. The Lucifer virus is an example of this technique.

4.1.2.5 Boot Viruses That Store at the End of Disks

A class of boot viruses replaces the original boot sector by overwriting it and saving it at the end of the hard disk, like MBR viruses, which also do this occasionally. The infamous Form virus uses this method. It saves the original boot sector at the very end of the disk. Form hopes that this sector will be used infrequently, or not at all, and thus the stored boot sector will stay on the disk without too much risk of being modified. Thus the virus does not mark this sector in any way; neither does it reduce the size of the partition that contains the saved sector.

Another class of boot viruses also saves the boot sector at the end of the active partition and makes the partition shorter in the partition table to be certain that this sector is not going to be “free” for other programs to use. Occasionally, the boot sector's data area is modified for the same reason.

4.1.3 Boot Viruses That Work While Windows 95 Is Active

Several boot viruses, typically the multipartite kind, attack the new floppy disk driver of Windows 95 systems stored in SYSTEM\IOSUBSYS\HSFLOP.PDR. The technique appeared in the Slovenian virus family called Hare (also known as Krishna) in May of 1996, written by virus writer Demon Emperor.

Viruses delete this file to get access to the real-mode BIOS INT 13h interrupt handler while Windows 95 is active on the system. Without this trick, other boot viruses cannot infect diskettes using INT 13h because it is not available for them to use.

4.1.4 Possible Boot Image Attacks in Network Environments

Diskless workstations boot using a file image from the server. On Novell NetWare file servers, for instance, the command DOSGEN.EXE can create an image of a bootable diskette, called NET$DOS.SYS, for the use of terminals. The terminals have a special PROM chip installed that searches for the boot images over the network.

This provides two obvious possibilities for the attacker. The first is to infect or replace the NET$DOS.SYS file on the server whenever access is available to it. The second is to simulate the functionality of the server code and host fake virtual servers via virus code on the network with images that contain virus code.

No such viruses are known. However, the NET$DOS.SYS image file is often infected, which is ignored by many virus scanners. This exposes the “dumb terminals” to virus attacks.

4.2 File Infection Techniques

In this section, you will learn about the common virus infection strategies that virus writers [8] have used over the years to invade new host systems.

4.2.1 Overwriting Viruses

Some viruses simply locate another file on the disk and overwrite it with their own copy. Of course, this is a very primitive technique, but it is certainly the easiest approach of all. Such simple viruses can do major damage when they overwrite files on the entire disk.

Overwriting viruses cannot be disinfected from a system. Infected files must be deleted from the disk and restored from backups. Figure 4.3 shows how the content of the host program changes when an overwriting virus attacks it.

Figure 4.3. An overwriting virus infection that changes host size.

Normally, overwriting viruses are not very successful threats because the obvious side effects of the infections are easily discovered by users. However, such viruses have better potential when this technique is combined with network-based propagation. For instance, the VBS/LoveLetter.A@mm virus mass mails itself to other systems. When executed, it will overwrite with its own copy any local files with the following extensions:

.vbs, .vbe, .js, .jse, .css, .wsh, .sct, .hta, .jpg, .jpeg, .wav, .txt, .gif, .doc, .htm, .html, .xls, .ini, .bat, .com, .avi, .qt, .mpg, .mpeg, .cpp, .c, .h, .swd, .psd, .wri, .mp3, and .mp2

Another overwriting virus infection method is used by the so-called tiny viruses. A classic family of this type is the Trivial family on DOS. During the early 1990s, many virus writers attempted to write the shortest possible binary virus. Not surprisingly, there are many variants of Trivial. Some of the viruses are as short as 22 bytes (Trivial.22).

The algorithm for such viruses is simple:

  1. Search for any (*.*) new host file in the current directory.
  2. Open the file for writing.
  3. Write the virus code on top of the host program.

The shortest viruses are often unable to infect more than a single host program in the same directory in which the virus was executed. This is because finding the next host file would be “as expensive” as a couple of bytes of extra code. Such viruses are not advanced enough to attack a file marked read-only because that would take a couple of extra instructions.

Often the virus code is optimized to take advantage of the content of the registers during program execution as they are passed in by the operating system. Thus the virus code itself does not need to initialize registers that have known content set by the system loader. By using this condition, virus writers can make their creation even shorter.

Such optimization, however, can cause fatal errors when the virus code is executed on the wrong platform, which did not initialize the registers in the way that the virus expected.

Some tricky overwriting viruses also use BIOS disk writes instead of DOS file functions to infect new files. A very primitive form of such a virus was implemented in 15 bytes. The virus overwrites each sector on the disk with itself. Evidently, the system corruption is so major that such viruses kill the host system very quickly, keeping the virus from spreading any further.

Figure 4.4 illustrates an overwriting virus that simply overwrites the beginning of the host but does not change its size.

Figure 4.4. An overwriting virus that does not change the size of the host.

4.2.2 Random Overwriting Viruses

Another rare variation of the overwriting method does not change the code of the program at the top of the host file. Instead, the virus seeks to a random location in the host program and overwrites the file with itself at that location. Evidently, the virus code might not even get control during execution of the host. In both cases, the host program is lost during the virus's attack and often crashes before the virus code can execute. An example of this virus is the Russian virus Omud [9], as shown in Figure 4.5.

Figure 4.5. A random overwriter virus.

To improve performance by reducing the disk I/O, modern antivirus scanners are optimized to find viruses at “well-known” locations of the file whenever possible. Thus random overwriting viruses are often problematic for scanners to find because a scanner would need to scan the contents of the complete host program for the virus code, which is too I/O expensive.

4.2.3 Appending Viruses

A very typical DOS COM file infection technique is called normal COM. In this technique, a jump (JMP) instruction is inserted at the front of the host to point to the end of the original host. A typical example of this virus is Vienna, which was published in Ralf Burger's computer virus book in a slightly modified form with its source code. This was back in 1986–1987.

The technique gets its name from the location of the virus body, which is appended to the end of the file. (It is interesting to note that some viruses infect EXE files as COM by first converting the EXE file to a COM file. The Vacsina virus family uses this technique.)

The jump instruction is sometimes replaced with equally functional instructions, such as the following:


a.) CALL start_of_virus

b.) PUSH offset start_of_virus
    RET

The first three overwritten bytes at the top of the host program (sometimes 4–16) are stored in the virus body. When the virus-infected program is executed, the virus loads in memory with the actual infected host. The jump instruction directs control to the virus body, and then the virus typically replicates itself by locating new host programs on the disk or by executing some sort of activation routine (also called a trigger). Finally, the virus virtually cleans the program in memory by copying the original bytes to offset CS:0x100 (the location where the COM files are loaded) and executes the original program by jumping back to CS:0x100. The COM files are loaded to CS:0x100 because the program segment prefix (PSP) is placed at CS:0–CS:0xFF.
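
From the scanner's side, this layout suggests a simple heuristic: decode the jump at the top of a COM image and see whether it lands close to the end of the file. The sketch below handles only the three-byte near jump (opcode 0xE9 followed by a 16-bit displacement); the tail_window threshold and the function names are assumptions, and many clean COM programs also start with a jump, so a hit is only a hint.

```python
import struct

COM_LOAD_OFFSET = 0x100   # COM images are loaded right after the PSP


def jump_target_offset(image: bytes):
    """Return the file offset targeted by a leading near JMP, or None.

    Only the form 0xE9 followed by a 16-bit displacement is decoded here.
    """
    if len(image) < 3 or image[0] != 0xE9:
        return None
    (displacement,) = struct.unpack_from("<H", image, 1)
    # The displacement is relative to the next instruction and wraps at 64KB.
    target = (COM_LOAD_OFFSET + 3 + displacement) & 0xFFFF
    offset = target - COM_LOAD_OFFSET
    return offset if 0 <= offset < len(image) else None


def jumps_into_tail(image: bytes, tail_window: int = 4096) -> bool:
    """True if the leading jump lands within the last tail_window bytes."""
    offset = jump_target_offset(image)
    return offset is not None and len(image) - offset <= tail_window


if __name__ == "__main__":
    # Synthetic image: a jump over 997 filler bytes to a 50-byte tail of NOPs.
    sample = b"\xe9" + struct.pack("<H", 997) + bytes(997) + b"\x90" * 50
    print(jump_target_offset(sample), jumps_into_tail(sample, tail_window=100))
```

A scanner would typically use such a check only to decide where to look for a known pattern, not as a verdict by itself.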

Figure 4.6 shows how a DOS COM appender virus infects a host program.

Figure 4.6. A typical DOS COM appender virus.

Obviously the appender technique can be implemented for any other type of executable file, such as EXE, NE, PE, and ELF formats, and so on. Such files have a header section that stores the address of the main entry point, which, in most cases, will be replaced with a new entry point to the start of the virus code appended to the end of the file.

Section 4.3 is dedicated to Win32 infection techniques to demonstrate the principles of file infection techniques in modern file formats. These formats often have complicated internal structures—offering many more opportunities to attackers.

4.2.4 Prepending Viruses

A common virus infection technique uses the principle of inserting virus code at the front of host programs. Such viruses are called prepending viruses. This is a simple kind of infection, and it is often very successful. Virus writers have implemented it on various operating systems, causing major virus outbreaks in many.

An example of a COM prepender virus is the Hungarian virus Polimer.512.A, which prepends itself, 512 bytes long, at the front of the executable and shifts the original program content to follow itself.

Let's take a look at the front of the Polimer virus in DOS DEBUG. Polimer is a good example to study because the top of the virus code is a completely harmless data area with a message that is displayed onscreen during execution of infected programs.

image

The virus body is loaded to offset 0x100 in memory. The virus code starts with a jump (0xe9) instruction to give control to the virus code after its own data area. Because Polimer is 512 bytes (0x200) long, at the front of COM executable, offset 0x300 in memory should be the original host program (0x100+0x200=0x300). Indeed, in this example, the actual infected host is the Free Memory Query Program. Prepending COM viruses can easily start their host programs by copying the original programs' content to offset 0x100 and giving it control.

image

Figure 4.7 illustrates how a prepender virus is inserted at the front of a host program.

Figure 4.7. A typical prepender virus.

Prepender viruses are often implemented in high-level languages such as C, Pascal, or Delphi. Depending on the actual structure of the executable, the execution of the original program might not be as trivial a task as it is for COM files. This is exactly why a generic solution involves creation of a new temporary file on the disk to hold the content of the original host program. Then a function, such as system(), is used to execute the original program in the temporary file. Such viruses typically pass command-line parameters of the infected host to the host program stored in the temporary file. Thus the functionality of the application will not break because of missing parameters.

4.2.5 Classic Parasitic Viruses

A variation of the prepender technique is known as the classic parasitic infection, as shown in Figure 4.8. Such viruses overwrite the top of the host with their own code and save the top of the original host program to the very end of the host, usually virus-size long. The first such virus was Virdem, written by Ralf Burger. In fact, Virdem is one of the first examples of a file virus ever seen; Burger's book did not even contain information about any other kinds of computer viruses but file viruses. Burger distributed his creation at the Chaos Computer Club conference in December 1986.

Figure 4.8. A classic parasitic virus.

Often when such viruses are repaired, a common problem occurs. In many cases, the repair definition directs the disinfector to copy N bytes to the front of the file, counting backward from the end of the infected program. Then the file is truncated at FILESIZE-N, where N is typically the size of the virus. However, the size of the file can change. The most common reason for this is a multiple infection, when the file is infected more than once.

In other cases, the file has some extra data appended, such as inoculation information placed there by some other antivirus program. For instance, the Jerusalem virus uses the MsDos marker at the end of the infected file to “recognize” files that are already infected. Some early antivirus programs appended the string to the end of all COM and EXE files to inoculate files from recurrent Jerusalem infections. Although it might sound like a great idea, the extra modification of the files can easily cause trouble for disinfectors. This happens when the inoculated file is already infected with another parasitic virus. When the FILESIZE-N calculation is used, the repair routine will seek to an incorrect location 5 bytes after the top of the original program content. This repair will result in a garbage host program that will crash when executed. This kind of disinfection is often called a half-cooked repair [10].
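
The FILESIZE-N problem is easy to reproduce on synthetic data. In the sketch below, inert filler bytes stand in for the viral code, and the buffer follows the layout described above: N overwritten bytes at the top, the rest of the host, and the saved top of the host at the very end. The function names are hypothetical, and the marker-aware variant is only one possible way to make the calculation more robust.

```python
def repair_parasitic(infected: bytes, n: int) -> bytes:
    """Naive repair: copy the last n bytes to the front, truncate at FILESIZE-n."""
    if len(infected) < 2 * n:
        raise ValueError("file is too short for this repair definition")
    return infected[-n:] + infected[n:-n]


def repair_with_marker(infected: bytes, n: int, marker: bytes = b"MsDos") -> bytes:
    """Strip a known trailing inoculation marker before the naive repair."""
    if infected.endswith(marker):
        infected = infected[:-len(marker)]
    return repair_parasitic(infected, n)


# Synthetic example: filler bytes play the role of the viral code.
N = 16
host = bytes(range(64))
infected = b"\xaa" * N + host[N:] + host[:N]
inoculated = infected + b"MsDos"

if __name__ == "__main__":
    print(repair_parasitic(infected, N) == host)      # clean layout repairs fine
    print(repair_parasitic(inoculated, N) == host)    # marker shifts the offsets
    print(repair_with_marker(inoculated, N) == host)  # marker-aware repair
```

In the failing case, the bytes copied to the front start 5 bytes into the saved top of the host and end with the marker itself, which is the half-cooked result described above.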

Some special parasitic infectors do not save the top of the host to the end of the host program. Instead they use a temporary file to store this information outside of the file, sometimes with hidden attributes. For example, the Hungarian DOS virus, Qpa, uses this technique and saves 333 bytes (the size of the virus) to an extra file. Some members of the infamous W32/Klez family use this technique to store the entire host program in a new file.

4.2.6 Cavity Viruses

Cavity viruses (as shown in Figure 4.9) typically do not increase the size of the object they infect. Instead they overwrite a part of the file that can be used to store the virus code safely. Cavity infectors typically overwrite areas of files that contain zeros in binary files. However, other areas also can be overwritten, such as 0xCC-filled blocks that C compilers often use for instruction alignment. Other viruses overwrite areas that contain spaces (0x20).
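
For analysis and repair work, it is useful to know where such filler areas are in a clean copy of a file, because these are the regions to compare first when a cavity infection is suspected. The sketch below lists runs of a single filler byte; the function name and the min_length parameter are illustrative, and the default filler set simply mirrors the three byte values mentioned above.

```python
def find_filler_runs(data: bytes, min_length: int, fillers=(0x00, 0xCC, 0x20)) -> list:
    """Return (offset, length, byte value) for each run of one filler byte
    that is at least min_length bytes long."""
    runs = []
    start = 0
    while start < len(data):
        value = data[start]
        end = start + 1
        while end < len(data) and data[end] == value:
            end += 1
        if value in fillers and end - start >= min_length:
            runs.append((start, end - start, value))
        start = end
    return runs


if __name__ == "__main__":
    sample = b"AB" + bytes(10) + b"CD" + b"\xcc" * 4 + b"E" + b" " * 8
    for offset, length, value in find_filler_runs(sample, min_length=8):
        print(f"offset {offset}: {length} bytes of 0x{value:02X}")
```

Comparing the runs of a known-clean copy with those of a suspect copy quickly shows which filler region, if any, has been replaced with other content.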

Figure 4.9. A cavity virus injects itself into a cave of the host.

The first known virus to use this technique was Lehigh, in 1987. Lehigh was a fairly unsuccessful virus. However, Ken van Wyk created a lot of publicity about the virus and eventually set up the VIRUS-L newsgroup on Usenet to discuss his findings.

Cavity infectors are usually slow spreaders on DOS systems. The Bulgarian Darth_Vader viruses, for instance, never caused major outbreaks. This was also due to the fact that Darth_Vader was a slow infector virus. It waited for a program to be written, and only then did it infect the program using a cavity of the host.

The W2K/Installer virus (written by virus writers Benny and Darkman) uses the cavity infection technique to infect Win32 PE on Windows 2000 without increasing the file's size.

A special kind of cavity virus infection relies on PE programs' relocation sections. Relocations of most executables are not used in normal situations. Modern linker versions can be configured to produce PE executable files without a relocation table to make them shorter. Relocation cavity viruses overwrite the relocation section when there are relocations in the host. When the relocation section is longer than the virus, the virus does not increase the file size. Such viruses make sure that the relocation section is the last section or that it has sufficient length. Otherwise, the file gets corrupted easily during infection. For example, the W32/CTX and W95/Vulcano virus families use this technique.

4.2.7 Fractionated Cavity Viruses

A few Windows 95 viruses implement the cavity infection technique extremely successfully. The W95/CIH virus implements a variation of cavity infection called the fractionated cavity technique. In this case, the virus code is split between a loader routine and N number of sections that contain section slack space. First the loader (HEAD) routine of the virus locates the snippets of the virus code and reads them into a continuous area of memory, using an offset table kept in the HEAD part of the virus code. During infection, the virus locates the section slack gaps of portable executable (PE) files and injects its code into as many section slack holes as necessary.

A new viral entry point will be presented in the header of the file to point to the start of the virus code, usually inside the header section of the host applications. Some shorter cavity infectors, such as Murkry, use this area to infect files in a single step. However, CIH is longer and needs to split its code into snippets. Eventually, the virus executes the original host program from the stored entry point (EP). The advantage of the technique is that the virus only needs to “remember” the original EP of the host and simply jump there to execute the loaded program in memory.

Figure 4.10 represents the state of the host program before and after the infection of a fractionated cavity virus. The host would normally start at its entry point (EP) defined in its header section. The virus replaces that EP value with VEP, the viral entry point. The VEP points to the loader of the virus snippets. If there is not enough slack space in the file to present the loader in a single snippet, the file cannot get infected.

Figure 4.10. A fractionated cavity virus.

The section slacks are typically presented in modern file formats such as PE, and they can be easily located using the section header information of such files.
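
An analysis tool can compute the slack areas from the section table to know which file regions to inspect. The sketch below assumes the usual reading of "section slack" as the gap between a section's virtual size and its file-aligned raw size, and it parses the standard 40-byte PE section header. The function name and the sample headers are illustrative.

```python
import struct

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics (40 bytes in total).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def section_slack(section_table: bytes) -> list:
    """Return (name, file offset of slack, slack size) for each section
    whose raw size on disk exceeds its virtual size."""
    result = []
    last_start = len(section_table) - SECTION_HEADER.size
    for pos in range(0, last_start + 1, SECTION_HEADER.size):
        name, virtual_size, _, raw_size, raw_pointer, *_ = \
            SECTION_HEADER.unpack_from(section_table, pos)
        if raw_size > virtual_size:
            label = name.rstrip(b"\0").decode("ascii", "replace")
            result.append((label, raw_pointer + virtual_size, raw_size - virtual_size))
    return result


if __name__ == "__main__":
    # Two synthetic section headers with made-up sizes.
    table = (SECTION_HEADER.pack(b".text", 0x1234, 0x1000, 0x1400, 0x400, 0, 0, 0, 0, 0)
             + SECTION_HEADER.pack(b".data", 0x2000, 0x3000, 0x200, 0x1800, 0, 0, 0, 0, 0))
    for label, offset, size in section_slack(table):
        print(f"{label}: {size} slack bytes at file offset 0x{offset:X}")
```

In a clean file these regions normally hold padding, so unexpected content there is a reason to look more closely at the header and the entry point.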

One of the special problems of cavity virus repair is that the content of overwritten areas cannot be restored 100%. This happens when the virus overwrites areas of files that usually contain zeros, but in other cases contain some other pattern. Thus the cryptographic checksums of files after repair will be often different from the original program's content. Furthermore, exact identification of such viruses is complicated because the virus snippets need to be pieced together.

Detection of the virus code, however, is simple because it can be based on the content of the HEAD routine, which must be placed in a single snippet of code.

4.2.8 Compressing Viruses

A special virus infection technique uses the approach of compressing the content of the host program. Sometimes this technique is used to hide the host program's size increase after the infection by packing the host program sufficiently with a binary packing algorithm. Compressor viruses are sometimes called “beneficial” because such viruses might compress the infected program to a much shorter size, saving disk space. (Runtime binary packers, such as PKLITE, LZEXE, UPX, or ASPACK, are extremely popular programs. Many of these have been used independently by attackers to pack the content of Trojan horses, viruses, and computer worms to make them obfuscated and shorter.)

The DOS virus, Cruncher, was among the first to use the compression technique. Some of the 32-bit Windows viruses that use this technique include W32/HybrisF (a file infector plug-in of the Hybris worm), written by the virus writer, Vecna. Another infamous example is W32/Aldebera, which combines the infection method with polymorphism. Aldebera attempts to compress the host in such a way that the host remains equivalent in size to the original file. This virus was written by the virus writer, B0/S0 (Bozo) of the IKX virus writing group, in 1999.

The W32/Redemption virus of the virus writer, Jacky Qwerty, also uses the compression technique to infect 32-bit PE files on Windows systems. Figure 4.11 shows how a compressor virus attacks a file.

Figure 4.11. A compressor virus.

image

4.2.9 Amoeba Infection Technique

A rarely seen virus infection technique, Amoeba, embeds the host program inside the virus body. This is done by prepending the head part of the virus to the front of the file and appending the tail part to the very end of the host file. The head has access to the tail and is loaded later. The original host program is reconstructed as a new file on the disk for proper execution afterwards. For example, W32/Sand.12300, written by the virus writer, Alcopaul, uses this technique to infect PE files on Windows systems. Sand is written in Visual Basic.

Figure 4.12 shows the host program before and after infection by a virus that uses the Amoeba infection technique.

Figure 4.12. The Amoeba infection method.

image

4.2.10 Embedded Decryptor Technique

Some crafty viruses inject their decryptors into the executable's code. The entry point of the host is modified to point to the decryptor code. The location of the decryptor is randomly selected, and the decryptor is split into many parts. The overwritten blocks are stored inside the virus code for proper execution of the host program after infection.

When the infected application starts, the decryptor is executed. The decryptor of the virus decrypts the encrypted virus body and gives it control. The Slovakian polymorphic virus, One_Half, used this method to infect DOS COM and EXE files in May 1994. Evidently, the proper infection of EXE files with this technique is a more complicated task. If relocations are applied to parts of the file that are overwritten with pieces of the virus decryptor, the decryptor might get corrupted in memory. This can result in problems in executing host programs properly.

Figure 4.13 shows the “Swiss cheese” layout of infected program content. The detection of such viruses made scanning code more complicated. The scanner needed either to detect decryptor blocks split into many parts or to include some more advanced scanning technique, such as code emulation, to resolve detection easily. (These techniques are discussed in Chapter 11.)

Figure 4.13. A “Swiss cheese” infection.

image

The easiest way to analyze such virus code is based on the use of special goat (decoy) files filled with a constant pattern, such as 0x41 (“A”) characters. After the test infection, the overwritten parts stand out in the infected test program the following way:

image

Note the 0xE9 (JMP) and 0xEB (JMP short) patterns in the previous dump in two pieces of One_Half's decryptor. These are the pointers to the next decryptor block. In the past, several antivirus products would put together the pieces of the decryptor by following these offsets to decrypt the virus quickly and identify it properly.
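The goat-file comparison described above can be sketched as follows. This is an illustrative helper, not the tooling used for the original analysis; the function name modified_runs and its parameters are assumptions. It reports every run of bytes that differs from the constant fill pattern, which is how the overwritten decryptor blocks "stand out" in a dump.

```python
def modified_runs(data: bytes, fill: int = 0x41, start: int = 0):
    """Return (offset, length) for each run in data[start:] that is not fill."""
    runs = []
    run_start = None
    for offset in range(start, len(data)):
        if data[offset] != fill:
            if run_start is None:
                run_start = offset          # a modified block begins here
        elif run_start is not None:
            runs.append((run_start, offset - run_start))
            run_start = None
    if run_start is not None:               # block runs to the end of the file
        runs.append((run_start, len(data) - run_start))
    return runs
```

Note that a virus byte that happens to equal the fill value splits a block into two runs, so an analyst would normally repeat the test with a second goat file that uses a different pattern.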

4.2.11 Embedded Decryptor and Virus Body Technique

A more sophisticated infection technique was used by the Bulgarian virus, Commander_Bomber, written by Dark Avenger as one of his last known viruses in late 1993. The virus was named after the string that can be found in the virus body: COMMANDER BOMBER WAS HERE.

The Commander_Bomber virus body is split into several parts, which are placed at random positions on the host program, overwriting original content of the host. The head of the virus code starts in the front of the file and gives control to the next piece of the virus code, and so on. These pieces overwrite the host program in a way similar to the One_Half virus. The overwritten parts are stored at the end of the file, and a table is used to describe their locations.

Figure 4.14 shows the sophistication of the virus code's location within the host program. Scanners must follow the spiral path of the control flow from block to block until they find the main virus body.

Figure 4.14. Commander_Bomber-style infection.

image

The control blocks are polymorphic, generated by the DAME (Dark Avenger Mutation Engine) of the virus. This makes the blocks especially difficult to read because they contain a lot of garbage code with obfuscated ways to give control to the next block, until the nonencrypted virus body is reached. Eventually, the control arrives at the main virus body, which can be practically anywhere in the file, not at its very end. This is a major advantage for such viruses, because scanners need to locate where the main body of the virus is stored. Back in 1993, this technique was extremely sophisticated, and only a few scanners were able to detect such viruses effectively. The host program is reconstructed by the virus in runtime.

4.2.12 Obfuscated Tricky Jump Technique

W32/Donut, the first virus to infect .NET executables, was not dependent on JIT compilation as discussed in Chapter 3. This is because first editions of the .NET executable format can be attacked at its entry-point code, which is still architecture-dependent. (In later versions of Windows, this platform-dependent code will be eliminated by moving the functionality to the system loader itself.)

Donut gets control immediately upon executing an infected .NET PE file. The virus uses the simplest possible infection technique to infect .NET images. In fact, Donut turns .NET executables to regular-looking PE files. This is because the virus nullifies the data directory entry of the CLR header when it infects a .NET application.

The six-byte-long jump to the _CorExeMain() import at the entry point of .NET files is replaced by Donut with a jump to the virus entry point. The _CorExeMain() function is used to fire up the CLR execution of the MSIL code. The entry point in the header is not changed by the virus. This technique is called an obfuscated tricky jump. Evidently, this method can fool some heuristic scanners.

The actual jump at the entry point will be replaced with a 0xE9 (JMP) opcode, followed by an offset to the start of the virus body in the first physical byte of the relocation section, as shown in Figure 4.15.

Figure 4.15. An applied obfuscated tricky jump technique.

image

The obfuscated tricky jump is a common technique to avoid changing the original entry point of the file. One of the first viruses that used this trick was the DOS COM infector, Leapfrog, which followed the jump instruction at the front of the host and inserted its own jump to the actual entry point instead, as Figure 4.15 demonstrates.

The first documented Win32 virus, W32/Cabanas11, used this technique as an antiheuristic feature to infect regular PE files on Windows 95 and Windows NT.

When activated, W32/Donut displays the message box shown in Figure 4.16.

Figure 4.16. The message box of the W32/Donut virus.

image

Note

The virus writer wanted to call this creation “.dotNet,” but because this is a platform name, it cannot be the name of the virus. For obvious reasons, viruses are not called “DOS,” “Windows,” and so on. So I decided to name the virus something that sounds similar to “dotNET,” calling it Donut instead.

4.2.13 Entry-Point Obscuring (EPO) Viruses

Entry-point obscuring viruses do not change the entry point of the application to infect it; neither do they change the code at the entry point. Instead, they change the program code somewhere in such a way that the virus gets control randomly.

4.2.13.1 Basic EPO Techniques on DOS

Several viruses use the EPO strategy on DOS to avoid easy detection with fast scanners that scan the file near its entry-point code. For example, in early 1997 the Olivia virus12 infected DOS EXE and COM files using this method. This technique became increasingly popular among virus writers to defeat heuristic analyzer programs after 1995.

Olivia infects COM and EXE files as they are run or renamed or as their attributes are changed. First, the virus clears the attributes of the file, and then it opens the file to analyze its structure.

Figure 4.17 demonstrates the simplified look of an EPO virus-infected program.

Figure 4.17. A typical encrypted DOS EPO virus.

image

If the victim has a COM extension, Olivia uses a special function that reads four bytes in a loop from the beginning of the victim and checks for E9h (JMP), EBh (JMP short), 90h (NOP), F8h (CLC), F9h (STC), FAh (CLI), FBh (STI), FCh (CLD), and FDh (STD) each time. If one of the previous instructions is found, the virus seeks the place of the next such instruction. If that position is not in the last 64 bytes of the host, the virus modifies the host program at the location where the previous instruction sequence was detected.

Olivia uses the 0x68 (Intel 286 PUSH) opcode to push a word value to the stack. This is followed by a 0xC3 (RET) instruction, which gives control to the virus code by popping the pushed offset to the decryptor of the virus.


(0x68) PUSH offset DECRYPTOR
(0xC3) RET

In Figure 4.17, a jump instruction is shown to transfer control to a decryptor located at the end of the file, followed by the encrypted virus body. Other viruses often use a CALL instruction or similar trampoline to transfer control to the start of the virus body.
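From a scanner's perspective, the PUSH/RET trampoline shown above leaves a recognizable trace. The following is a minimal heuristic sketch, not a description of any real product's detection logic. It assumes a COM image (loaded at offset 0x100) and flags PUSH imm16 / RET pairs whose pushed offset lands near the end of the file; the function name and the tail threshold are hypothetical choices for illustration.

```python
import struct

COM_LOAD_BASE = 0x100  # DOS loads a COM image at offset 0x100 of its segment


def find_push_ret(image: bytes, tail: int = 4096):
    """Return (file_offset, target_file_offset) for each 68 xx xx C3 sequence
    whose target falls within the last `tail` bytes of the image."""
    hits = []
    for off in range(len(image) - 3):
        if image[off] == 0x68 and image[off + 3] == 0xC3:
            (target,) = struct.unpack_from("<H", image, off + 1)
            file_target = target - COM_LOAD_BASE
            # Assumption: appended virus code lives in the file's tail.
            if max(0, len(image) - tail) <= file_target < len(image):
                hits.append((off, file_target))
    return hits
```

Because the byte pattern can also occur inside data or other instructions, a hit is only a hint that the target area deserves closer examination, for example by emulation.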

Figure 4.18 shows the happy birthday message displayed by Olivia upon activation.

Figure 4.18. The payload of the Olivia virus.

image

4.2.13.2 Advanced EPO Techniques on DOS

The Nexiv_Der virus13 (shown in Figure 4.19) is polymorphic in COM files, and it also infects the disk's boot sector (DBS). The most interesting technique of this virus, however, is the special EPO technique that it uses to infect files. Nexiv_Der was named after a backward string contained in its encrypted body: “Nexiv_Der takes on your files.”

Figure 4.19. A polymorphic EPO virus.

image

This virus traces the execution of a program as an application debugger does. Then it patches the code at a randomly selected location to a CALL instruction. This CALL instruction points to the polymorphic decryptor of the virus.

The execution path through a program depends on many parameters, including command-line arguments passed to the program and DOS version number. Depending on the same parameters, an infected victim program will most likely run the virus code upon normal execution each time. However, the virus might not run at all on a different version of DOS because the virus code cannot take control. This generates a major problem for even sophisticated heuristic scanners that use a virtual machine to simulate the execution of programs because it is difficult to emulate all of the system calls and the execution path of the victim.

The major idea of the Nexiv_Der virus is based on its hook of the INT 1 handler (TRACE) under DOS. This handler is the real infection routine. It starts to trace the host program for at least 256 instructions and stops at the maximum of 2,048 iterations. If the last instruction of the trace happens to be an E8h, E9h, or 80C0..CF opcode (CALL, JMP, ADD AL,byte .. OR BH,byte), then Nexiv_Der replaces it with a CALL instruction, which starts the virus at the end of the file. Figure 4.19 shows a high-level look at a Nexiv_Der-infected executable.

The main advantage of this technique is the increased likelihood of virus code execution in a similar host-system environment. This technique, however, is too complicated and thus encountered very rarely.

4.2.13.3 EPO Viruses on 16-Bit Windows

One of the first EPO viruses in the wild was the Tentacle_II family14 on Windows 3.x systems. This virus does not change the original entry point of the NE header, which is the obvious choice of typical 16-bit Windows viruses. How can it take control, then? The virus takes advantage of the NE file structure. Although the NE on-disk structure is more complicated to parse, it provides many more possibilities for an attacker to inject code reliably into the execution flow. Tentacle_II takes advantage of the module reference table of NE files to find common function calls that are expected to be executed among the first function calls made by the host program.

Tentacle_II checks for the KERNEL and VBRUN300 module names in the module reference table. The virus picks the module number of the found module name and reads the segment relocation records of every segment. It looks for the relocation record 91 (INITTASK) in the case of KERNEL or 100 (THUNKMAIN) in the case that VBRUN300 has been found previously. Both of these relocation records point to standard initialization code that must be called at the beginning of a Windows application. For example, the original KEYVIEW.EXE (a standard Windows application) has a relocation entry for KERNEL.91 for its first segment as follows:

image

When KEYVIEW.EXE gets infected, the virus patches this record to point to a new segment, the VIRUS_SEGMENT.

Segment relocation records:

image

image

Thus the infected file starts as it would before the infection, but when the application calls one of the preceding initialization functions, control is passed to the address where the virus starts.

The VIRUS_SEGMENT has three relocation records. One of these will point to the original initialization procedure, KERNEL.91 or VBRUN300.100. In this way, the virus is able to start the host program after itself. This infection technique is an NE entry point–obscuring infection technique, which makes Tentacle_II an antiheuristic Windows virus.

The preceding analysis was made with the help of Borland's TDUMP (Turbo Dump) utility. In the analysis techniques and tools sections, I will give a longer introduction to such tools and their role in virus analysis.

The payload of Tentacle_II is shown in Figure 4.20. The virus creates a TENTACLE.GIF file on the disk, which will be displayed each time a GIF image is viewed on the infected system.

Figure 4.20. The payload of the Tentacle_II virus.

image

4.2.13.4 API-Hooking Technique on Win32

On Win32 systems, EPO techniques became highly advanced. The PE file format15 can be attacked in different ways. One of the most common EPO techniques is based on the hooks of an instruction pattern in the program's code section. A typical Win32 application makes a lot of calls to APIs (application program interfaces). Many Win32 EPO viruses take advantage of API CALL points and change these pointers to their own start code.

For example, the W32/CTX and W32/Dengue viruses of GriYo locate a CALL instruction in the host program's code section that points to the import directory. In this way, the virus can reliably identify byte patterns that belong to a function call. After that, the CALL instruction is modified in such a way that it will point to the start of the virus code located elsewhere, typically appended to the end of the file. Such viruses typically search for one or both API call implementations:

• Microsoft API Implementation

CALL DWORD PTR []

• Borland API Implementation

JMP DWORD PTR []

This kind of virus also makes its selection for an API hook location totally at random; in some cases, the virus might not even get control each time a host program is executed. Some families of computer viruses make sure that the virus will execute from the file most of the time.
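An analyst can enumerate the same call sites to check whether any of them has been redirected. The sketch below is an assumption-laden illustration: it relies on the standard x86 encodings FF 15 (indirect CALL through a 32-bit absolute address) and FF 25 (indirect JMP through a 32-bit absolute address), and it assumes that the caller already knows the virtual address range of the import address table. The function name and parameters are hypothetical.

```python
import struct


def find_import_calls(code: bytes, iat_start: int, iat_end: int):
    """Return (offset, kind, slot_address) for FF 15 / FF 25 instructions
    whose absolute operand lies inside [iat_start, iat_end)."""
    kinds = {0x15: "CALL", 0x25: "JMP"}
    hits = []
    for off in range(len(code) - 5):
        if code[off] == 0xFF and code[off + 1] in kinds:
            (slot,) = struct.unpack_from("<I", code, off + 2)
            # Requiring the operand to point into the import table filters
            # out most byte pairs that only resemble the instruction.
            if iat_start <= slot < iat_end:
                hits.append((off, kinds[code[off + 1]], slot))
    return hits
```

Comparing the list from a suspect file with the list from a known-clean copy of the same program shows which API call point, if any, has disappeared.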

Viruses can hook an API that is called whenever the application exits back to the system. In this case, most programs call the ExitProcess() API. By replacing the call to ExitProcess() with the call to the virus body, a virus can trigger its infection routine more reliably whenever the application exits. To make antivirus detection more difficult, viruses often combine EPO techniques with code obfuscation techniques, such as encryption or polymorphism.

Figure 4.21 illustrates a Win32 EPO virus that replaces a CALL to ExitProcess() API with a CALL to the virus code. After the virus takes control, it will eventually run the original code (C) by fixing the code in memory and giving control to the fixed block.

Figure 4.21. An EPO virus that hooks API calls of the host.

image

Normally disk activity increases whenever the application exits. This happens for several different reasons. For example, if an application has used a lot of virtual memory, the operating system will need to do a lot of paging, which increases disk activity. Thus it is likely that viruses like this remain unnoticed for a long time.

4.2.13.5 Function Call Hooking on Win32

Another common technique of EPO viruses is to reliably locate, in the application's code section, a function call to a subroutine of the program. Because the byte pattern of a CALL instruction could be part of another instruction's data, the virus would not be able to identify the instruction boundaries properly by looking for the CALL opcode alone.

To solve this problem, viruses often check to see whether the CALL instruction points to a pattern that appears to be the start of a typical subroutine call, similar to the following:


CALL Foobar

Foobar:


PUSH EBP                       ; opcode 0x55
MOV  EBP, ESP                  ; opcode 0x89E5

Figure 4.22 illustrates the replacement of a function call to Foobar() with a call to the start of virus code. The Foobar() function starts with the 0x55 0x89 0xE5 sequence, so it is easily identified as a function entry point. A similar opcode sequence is 0x55 0x8B 0xEC, which also translates to the same assembly. This virus technique is used by variants of W32/RainSong (created by the virus writer, Bumblebee).
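The prologue check described above amounts to a small piece of arithmetic: the target of a relative CALL (opcode 0xE8) is the address of the next instruction plus the signed 32-bit displacement. The following sketch shows that calculation on a flat code buffer; the function names are hypothetical, and treating buffer offsets as addresses is a simplifying assumption.

```python
import struct

# Both encodings of PUSH EBP / MOV EBP, ESP mentioned in the text.
PROLOGUES = (b"\x55\x89\xE5", b"\x55\x8B\xEC")


def call_target(code: bytes, off: int):
    """Return the in-buffer target of an E8 rel32 CALL at off, or None."""
    if off + 5 > len(code) or code[off] != 0xE8:
        return None
    (rel,) = struct.unpack_from("<i", code, off + 1)
    target = off + 5 + rel  # displacement is relative to the next instruction
    return target if 0 <= target < len(code) else None


def looks_like_function_call(code: bytes, off: int) -> bool:
    """True if the CALL at off lands on a typical stack-frame prologue."""
    target = call_target(code, off)
    return target is not None and code[target:target + 3] in PROLOGUES
```

A scanner can apply the same test in reverse: a CALL in the code section whose target no longer lands on a plausible function start, or lands outside the code section, is worth a closer look.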

Figure 4.22. Function call–hooking EPO virus.

image

Note

The Russian virus, Zhengxi, uses a checksum of the preceding patterns, among others, to obfuscate the virus code further. Zhengxi uses the pattern to infect DOS EXE files using the EPO technique.

4.2.13.6 Import Table Replacing on Win32

Newer Win32 viruses infect Win32 executables in such a way that they do not need to modify the original code of the program to take control. Instead, such EPO viruses work somewhat similarly to the 16-bit Windows virus Tentacle_II.

To get control, the virus simply changes the import address table entries of the PE host in such a way that each API call of the application via the import address directory will run the virus code instead. In turn, the activated virus code presents a new import table in the memory image of the program. As a result, subsequent API CALLs run proper, original entry-point code via the fixed import table.

This technique is used by the W32/Idele family of computer viruses, written by the virus writer, Doxtor L, as shown in Figure 4.23. W32/Idele overwrites the section slack area of the code section with a small routine that allocates memory, decrypts the virus code into the allocated block, and then executes it. Thus Idele avoids creating import entries with addresses that do not point to the code section.

Figure 4.23. Import table-replacing EPO virus.

image

4.2.13.7 Instruction Tracing Technique on Win32

The Nexiv_Der virus inspired modern virus writing on 32-bit Windows systems. In 2003, new viruses started to appear that use EPO, based on the technique that was pioneered on DOS. For example, the W32/Perenast16 family of viruses is capable of tracing host programs before infection by running the host as a hidden debug process using standard Windows debug APIs.

4.2.13.8 Use of “Unknown” Entry Points

Another technique to execute virus code in a semi-EPO manner involves code execution via non-well-known entry points of applications. The Win32 PE file format is commonly known to execute applications from the MAIN entry point stored in the PE.OptionalHeader.AddressOfEntryPoint field of the executable's header structure. Thus it is common knowledge that such programs always start wherever this field points.

It might come as a surprise that this is not necessarily the first entry point in a PE file that the system loader executes. On Windows NT systems and above, the system loader looks for the thread local storage (TLS) data directory in the PE file's header first. If it finds TLS entry points, it executes these first. Only afterward will it run the MAIN entry-point code.
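A scanner that wants to account for this can read both values from the header. The sketch below is a minimal illustration under stated assumptions: it parses only the fields it needs, it assumes the documented PE layout (data directories at offset 96 of a PE32 optional header and 112 of a PE32+ header, with the TLS entry at index 9), and the function name is hypothetical. Resolving the callback array itself requires mapping RVAs to file offsets and is omitted.

```python
import struct

TLS_INDEX = 9  # IMAGE_DIRECTORY_ENTRY_TLS


def pe_entry_info(data: bytes):
    """Return (address_of_entry_point, tls_rva, tls_size) from a PE header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    opt = pe_off + 4 + 20            # skip the signature and COFF file header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:               # PE32
        dirs = opt + 96
    elif magic == 0x20B:             # PE32+
        dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)
    (count,) = struct.unpack_from("<I", data, dirs - 4)  # NumberOfRvaAndSizes
    if count <= TLS_INDEX:
        return entry_point, 0, 0
    tls_rva, tls_size = struct.unpack_from("<II", data, dirs + 8 * TLS_INDEX)
    return entry_point, tls_rva, tls_size
```

A nonzero TLS entry does not prove that callbacks exist, because the directory may describe only thread-local data; it does tell the scanner that code may run before the MAIN entry point and that the TLS structure should be examined.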

The following two message boxes are printed by a TLSDEMO program of Peter Ferrie. The demo was created when he discovered the TLS entry-point trick at Symantec during heuristic analysis research in 2000.

When the application is executed, it prints a message box from both the TLS and the MAIN entry points of the applications.

First, it prints the message box from the TLS, as shown in Figure 4.24.

Figure 4.24. The TLS entry point is executed first.

image

When you click on the OK button, you arrive at the real main entry point, as shown in Figure 4.25.

Figure 4.25. The main entry point is executed next.

image

Initially we did not talk about this trick because it could be used to develop even trickier viruses. However, the virus writer, roy g biv, discovered this undocumented trick and has already used it successfully in some of his W32/Chiton17 viruses in 2003.

4.2.13.9 Code Integration–Based EPO Viruses

A very sophisticated virus infection technique is called code integration. A virus using this technique inserts its own code into the execution flow of the host program using standard EPO techniques and merges its code with the host program's code. This is a complicated process that requires complete disassembling and reassembling of the host. Fortunately, it is extremely complicated to develop such viruses. Disassembling the host program is a fairly CPU-intensive operation that requires a lot of memory. Such viruses need to update the host program's content with proper relocations for code and data sections of the host. The W95/Zmist virus, by the Russian virus writer, Zombie, uses this approach. Because of its high sophistication, this technique is detailed in Chapter 7.

Figure 4.26 shows a typical layout of a file infected with a sophisticated code-integration EPO virus.

Figure 4.26. A poly and metamorphic code-integration virus.

image

Code integration is a major challenge for scanners and computer virus analysts. The entire file must be examined to find the virus. The virus is camouflaged in the code section of the infected host program, and it is very difficult to locate the instruction that transfers control to the start of the virus. In the case of W95/Zmist, the decryptor of the virus code is not in one piece but is split in a manner similar to the One_Half or the Commander_Bomber virus.

4.2.14 Possible Future Infection Techniques: Code Builders

After reading the previous sections, you might wonder what could get more complicated and sophisticated than the code-integration EPO technique. This section provides an example of a technique whose sophistication has not yet been seen, even in the most complex known computer viruses. The closest example is the W95/Zmist virus. The Zmist virus makes use of the host program's content in a manner that is similar to the Code Builder technique. Zmist calls into the host program's code to execute an RET (Return) instruction from it. Thus, the virus code flows into the host program's code and back. The author of the virus probably intended to extend this approach to build the entire virus body on the fly, using the content of the host program. Consider the code-builder virus shown in Figure 4.27.

Figure 4.27. A code-builder virus.

image

The idea is based on the fact that any program might contain another set of programs in it as instructions or instruction sequences. A virus might be able to analyze the host program's code in such a sophisticated way that these strings of instruction could be used as the virus itself. It might be difficult to find code that would transfer control properly with accurate register state. However, to demonstrate the idea, imagine a simple code-builder virus that would find the letters V, I, R, U, and S in the host program's code. The builder of the virus would copy these pieces together into memory. The builder itself would look like a generic sequence of code, which could be easy to vary based on metamorphic techniques. The builder would be integrated into the code of the host program itself.

Fortunately, such a virus would be rather complicated to write, but it would certainly be very challenging to detect in files. (A few members of the W95/Henky family use an approach similar to this, except that the viruses are not EPO, which simplifies their detection.)

4.3 An In-Depth Look at Win32 Viruses

The world of computer antivirus research has changed drastically since Windows 95 appeared on the market18. One reason this happened was that a certain number of DOS viruses became incompatible with Windows 95. In particular, the tricky viruses that used stealth techniques and undocumented DOS features failed to replicate under the new system. Many simple viruses remained compatible with Windows 95, such as Yankee Doodle, a very successful old Bulgarian virus. Regardless of this, virus writers felt that the new challenge was to investigate the new operating system, to create new DOS executable viruses and boot viruses with special attention to Windows 95 compatibility. Because most virus writers did not have enough in-depth knowledge of the internal mechanisms of Windows 95, they looked for shortcuts to enable them to write viruses for the new platform. They quickly found the first one: macro viruses, which are generally not dependent on the operating system or on hardware differences.

Some young virus writers are still happy with macro viruses and develop them endlessly. After writing a few successful macro viruses, however, most grow bored and stop developing them. You may think this is fortunate, but the truth is otherwise. Virus writers are looking for other challenges, and they usually find new and different ways to infect systems.

The first Windows 95 virus, W95/Boza, appeared in the same year that Windows 95 was introduced. Boza was written by a member of the Australian VLAD virus-writing group. It took a long time for other virus writers to understand the workings of the system but, during 1997, new Windows 95 viruses appeared, some of them in the wild.

At the end of 1997, the first Win32, Windows NT–compatible virus, Cabanas, was written by the same young virus writer (Jacky Qwerty/29A) who wrote the infamous WM/Cap.A virus. Cabanas is compatible with Windows 9x, Windows NT, and Win32s. (It is also compatible with Windows 98 and Windows 2000, even though the virus code was never tested on these systems by the virus writer because these systems appeared later than the actual virus.) Cabanas turned Microsoft's Win32 compatibility dream into a nightmare.

Although it used to be difficult to write such viruses, we suspected that file-infecting DOS viruses from the early years of computer viruses would eventually be replaced by Win32 creations.

This transition in computer virus writing was completed by 2004. Even macro viruses are now very rare; virus writers currently focus on 32-bit and 64-bit Windows viruses.

4.3.1 The Win32 API and Platforms That Support It

In 1995, Windows 95 was introduced by Microsoft as a new major operating system platform. The Windows 95 system is strongly based on Windows 3.x and DOS technologies, but it gives real meaning to the term Win32.

What is Win32? Originally, programmers did not even understand the difference between Win32 and Windows NT. Win32 is the name of an API, no more, no less. The set of system functions available to be called from a 32-bit Windows application is contained in the Win32 API. The Win32 API is implemented on several platforms, one of them being Windows NT, the most important Win32 platform. Besides DOS programs, Windows NT is also capable of executing 16-bit Windows programs and OS/2 1.x character applications (and, with some extensions, even Presentation Manager–based 1.3 programs, with some limitations). In addition, Windows NT introduced the new portable executable (PE) file format (a format very similar to, if not based on, the UNIX COFF format) used for Win32 applications (which call functions in the Win32 API set). As the word portable indicates, this format is supposed to be an easily portable file format, which is actually the most common and important one to run on Windows NT.

Other platforms are also capable of running Win32 applications. In fact, one of them was shipped before Windows NT. This platform is called Win32s. Anyone who has ever tried to develop software for Win32s knows that it was a very unstable solution.

Because Windows NT is a robust system that needs strong hardware on which to run, Win32 technology did not take the market position Microsoft wanted quickly enough. That process ended up with the development of Windows 95, which supported the new PE format by default. Therefore, it supports a special set of Win32 APIs. Windows 95 is a much better implementation of the Win32 APIs than Win32s. However, Windows 95 does not contain the full implementation of the Win32 APIs found in Windows NT.

Until Windows NT gained more momentum, Windows 9x was Microsoft's Win32 platform. After Windows NT, Windows 2000 and Windows 98/Me gained popularity and were replaced by Windows XP and the more secure Windows 2003 server editions, which support the .NET extension by default. On the horizon, Microsoft is talking about the next new Windows release, codenamed Longhorn. All of these systems will support a form of Win32 API that, in most cases, provides binary compatibility among all of these systems.

Last but not least, the Win32 API and the PE format are supported by Windows CE (Windows Mobile edition), which is used primarily by handheld PCs. The main hardware requirement for a Windows CE platform is a 486 or later Intel or AMD processor. However, current implementations seem to use SH3, ARM, and Intel XScale processors.

Now we get to the issue of CPUs. Both Windows NT and Windows CE are capable of running on machines that have different CPUs. The same PE file format is used on the different machines, but the actual executed code contains the compiled binary for the actual processor, and the PE header contains information about the actual processor type needed to execute the image. All of these platforms contain different implementations of Win32 functions. Most functions are available in all implementations. Thus a program can call them regardless of the actual platform on which it is running. Most of the API differences are related to the actual operating system capabilities and available hardware resources. For instance, CreateThread() simply returns NULL when called under Win32s. The Windows CE API set consists of several hundred functions, but it does not support trivial functions such as GetWindowsDirectory() at all because the Windows CE KERNEL is designed to be placed in ROM of the handheld PC. Due to the hardware's severe restrictions (Windows CE must run on machines with 2 or 4MB of RAM without disk storage), Microsoft was forced to create a new operating system that had a smaller footprint than either Windows NT or Windows 95.
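The processor type mentioned above is stored in the Machine field of the COFF file header, which immediately follows the PE signature. The following sketch reads that field; the function name is hypothetical, and the table lists only a handful of the machine constants documented for the PE format.

```python
import struct

# A small subset of the documented IMAGE_FILE_MACHINE_* values.
MACHINE_NAMES = {
    0x014C: "Intel 386 and compatible",
    0x01A2: "Hitachi SH3",
    0x01C0: "ARM little-endian",
    0x0200: "Intel Itanium",
    0x8664: "x64 (AMD64)",
}


def pe_machine(data: bytes) -> str:
    """Return a readable name for the CPU type a PE image was built for."""
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[:2] != b"MZ" or data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    (machine,) = struct.unpack_from("<H", data, pe_off + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")
```

A scanner can use this value to decide which disassembler or emulator, if any, applies to the code sections of the file.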

Although several manifestations of the Win32 API implement some of the Win32 APIs differently or not at all, in general it is feasible to write a single program that will work on any platform that supports Win32 APIs. Virus writers already understand this fact very well. Their first such virus creation attacked Windows 95 specifically, but virus writers slowly improved the infection methods to attack the PE file format in such a way that the actual infected program remains compatible and also executes correctly under Windows NT/2000/XP systems.

Most Windows 95 viruses depend on Windows 95 system behavior and functionality, such as features related to VxD (virtual device driver) and VMM (virtual machine manager), but some of them contain only a certain number of bugs and need only slight fixes to be able to run under more than one Win32 platform, such as Windows 95/Windows NT.

Detection and disinfection of such viruses is not a trivial task. In particular, the disinfection can be difficult to implement. This is because, so far, the PE structure is much more complicated than any other executable file format used by DOS or Windows 3.x. However, it is also a fact that the PE format is a much nicer design than, for example, NE.

Unfortunately, over the period from 1995 to 2004, virus writers utilized these platforms aggressively, resulting in the appearance of more than 16,000 variants of 32-bit Windows viruses. However, the principles of these viruses have not changed much. In the next section, you will find details about infection techniques of the PE file format from the perspective of an attacker.

Note

Win64 is almost the same as Win32, but for 64-bit Windows architectures. There are a couple of minor modifications in Win64 to accommodate the platform differences.

4.3.2 Infection Techniques on 32-Bit Windows

This section describes the different ways in which a 32-bit Windows virus can infect different kinds of executable programs used by Windows 95/Windows NT. Because the most common file format is the PE format, most of the infection methods are related to that. The PE format makes it possible for viruses to jump easily from one 32-bit Windows platform to another. We shall concentrate on infection techniques that attack this particular format because these viruses have a strong chance of remaining relevant in the future.

Early Windows 95 viruses have a VxD part, which is dropped by other infected objects such as DOS EXE and COM executables or a PE application. Some of these infection methods are not related to Win32 platforms on the API level. For instance, VxDs are only supported by Windows 9x and Windows 3.x, not by Windows NT. VxDs have their own 32-bit, undocumented, linear executable (LE) file format. It is interesting to note that this format was 32-bit even at the time of 16-bit Windows. Microsoft could not drop the support of VxDs from Windows 95 because of the many third-party drivers developed to handle special hardware components. The LE file format remained undocumented by Microsoft, but there are already several viruses, such as Navrhar, that infect this format correctly. I will describe these infection techniques briefly to explain the evolution of Win32 viruses.

4.3.2.1 Introduction to the Portable Executable File Format

In the following section, I will provide an introductory tour of the PE file format that Microsoft designed for use on all its Win32 operating systems (Windows NT, Windows 95, Win32s, and Windows CE). There are several good descriptions of the format on the Microsoft Developer Network CD-ROM, as well as in many other Windows 95–related books, so I'll describe the PE format from the point of view of known virus infection techniques. To understand how Win32 viruses work, you need to understand the PE format. It is that simple.

The PE file format will play a key role in all of Microsoft's operating systems for the foreseeable future. It is common knowledge that Windows NT has a VAX VMS and UNIX heritage. As I mentioned earlier, the PE format is very similar to COFF (common object file format), but it is an updated version. It is called portable because the same file format is used under various platforms.

The most important thing to know about PE files is that the executable code on disk is very similar to what the module looks like after Windows has loaded it for execution. This makes the system loader's job much simpler. In 16-bit Windows, the loader must spend a long time preparing the code for execution. This is because in 16-bit Windows applications, all the functions that call out to a DLL (dynamic-link library) must be relocated. Some huge applications can have thousands of relocations for API calls, which have to be patched by the system loader while reading the file in portions and allocating memory for its structures one by one. PE applications do not need relocation for library calls anymore. Instead, a special area of the PE file, the import address table (IAT), is used for that functionality by the system loader. The IAT plays a key role in Win32 viruses, and I shall describe it later in detail.

For Win32, all the memory used by the module for code, data, resources, import tables, and export tables is in one continuous range of linear address space. The only thing that an application knows is the address where the loader mapped the executable file into memory. When the base address is known, the various pieces of the module can easily be found by following pointers stored as part of the image.

Another idea you should become familiar with is the relative virtual address, or RVA. Many fields in PE files are specified in terms of RVAs. An RVA is simply the offset of an item relative to the address where the file is mapped. For instance, the Windows loader might map a PE application into memory starting at address 0x400000 (the most common base address) in the virtual address space. If a certain item of the image starts at address 0x401234, then the item's RVA is 0x1234.
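The arithmetic behind an RVA can be written down in a few lines. The following is a minimal sketch; the function names are hypothetical and chosen only for illustration.

```python
def va_to_rva(virtual_address: int, image_base: int) -> int:
    """Return the relative virtual address of an item in a mapped image."""
    if virtual_address < image_base:
        raise ValueError("address lies below the image base")
    return virtual_address - image_base


def rva_to_va(rva: int, image_base: int) -> int:
    """Return the absolute address of an RVA for a given load address."""
    return image_base + rva


# The example from the text: image mapped at 0x400000, item at 0x401234.
print(hex(va_to_rva(0x401234, 0x400000)))  # prints 0x1234
```

Because only the base address changes when an image is loaded elsewhere, the same RVA remains valid for any load address.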

Another concept to be familiar with when investigating PE files and the viruses that infect them is the section. A section in a PE file is roughly equivalent to a segment in a 16-bit NE file. Sections contain either code or data (and occasionally a mixture of both). Some sections contain code or data declared by the actual application, whereas other data sections contain important information for the operating system. Before jumping into the important details of the PE file, examine Figure 4.28, which shows the overall structure of a PE file.

Figure 4.28. A high-level view of the PE file image.

image

4.3.2.1.1 The PE Header

The first important part of the PE format is the PE header. Just like all the other Microsoft executable file formats, the PE file has a header area with a collection of fields at an easy-to-find location. The PE header describes vital pieces of the portable executable image. It is not at the very beginning of the file; rather, the old DOS stub program is located there.

The DOS stub is just a minimal DOS EXE program that displays an error message (usually “This program cannot be run in DOS mode”). Because this DOS header is present at the beginning of the file, some DOS viruses can infect PE images correctly at their DOS stub. However, Windows 95 and Windows NT's system loaders execute PE applications correctly as 32-bit images, and the DOS stub program remains only for compatibility with DOS and 16-bit Windows systems.

The loader picks up the PE header's file address from the e_lfanew field of the DOS header (at file offset 0x3C). The PE header starts with an important magic value, the 4-byte signature “PE” followed by two zero bytes. After that is the image file header structure, followed by the image optional header.
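From a scanner's point of view, locating the PE header can be sketched as follows. This is a minimal illustration that assumes the whole file is already in memory; find_pe_header() and make_demo_image() are hypothetical helper names, and the demo buffer is a synthetic fragment, not a loadable program.

```python
import struct


def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The PE header offset is a 32-bit little-endian value at offset 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    return pe_offset


def make_demo_image() -> bytes:
    """Build a tiny synthetic buffer with a DOS header and a PE signature."""
    dos = bytearray(0x80)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x80)  # PE header placed at 0x80
    return bytes(dos) + b"PE\0\0"


print(hex(find_pe_header(make_demo_image())))  # prints 0x80
```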

From now on, I will describe only the important fields of the PE header that are involved with Windows 9x/Win32 viruses. The fields are in order, but I will concentrate on the most commonly used values—so several will be missing from the list.

Figure 4.28 shows the high-level structure of a PE file image.

The following paragraphs list important fields of the image file header.

WORD Machine

Indicates the CPU for which this file is intended. Many Windows 9x viruses check this field by looking for the Intel i386 magic value before actual infection. However, some bogus viruses do not check the machine type and infect PE files for other platforms and cause such files to crash when the virus code is executed on the wrong platform. There is a certain risk that we will see viruses with multiprocessor support in the future. For example, the same viruses could target ARM as well as IA64 and regular X86 PE files.

WORD NumberOfSections

The number of sections in the EXE (DLL). This field is used by viruses for many different reasons. For instance, the NumberOfSections field is incremented by viruses that add a new section to the PE image and place the virus body in that section. (When this field is changed by the virus code, the section table is patched at the same time.) Windows NT–based systems accept up to 96 sections in a PE file. Windows 95–based systems do not inspect the number of sections.

WORD Characteristics

The flags with information about the file. Most viruses check these flags to be sure that the executable image is not a DLL but a program. (Some Windows 9x viruses infect KERNEL32.DLL. If so, the field is used to make sure that the executable is a DLL.) This field is not usually changed by viruses.
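An analysis tool reads the same three fields. The sketch below decodes them from the 20-byte image file header that follows the PE signature. The machine values and the DLL flag are the published PE constants; the FileHeader class and parse_file_header() are hypothetical names used only for this illustration.

```python
import struct
from typing import NamedTuple

# Machine values from the published PE constants.
MACHINE_NAMES = {
    0x014C: "Intel i386",
    0x01C0: "ARM",
    0x0200: "Intel IA64",
    0x8664: "AMD64",
}
IMAGE_FILE_DLL = 0x2000  # Characteristics bit: the image is a DLL


class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    characteristics: int

    @property
    def machine_name(self) -> str:
        return MACHINE_NAMES.get(self.machine, f"unknown (0x{self.machine:04X})")

    @property
    def is_dll(self) -> bool:
        return bool(self.characteristics & IMAGE_FILE_DLL)


def parse_file_header(data: bytes, pe_offset: int) -> FileHeader:
    """Decode Machine, NumberOfSections, and Characteristics."""
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Layout: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes).
    fields = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return FileHeader(fields[0], fields[1], fields[6])
```

A scanner would typically use machine_name to choose the right emulator or disassembler and is_dll to decide which entry points to examine.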

Important fields of the image optional header follow.

WORD Magic

The optional header starts with a “magic” field. The value of the field is checked by some viruses to make sure that the actual program is a normal executable and not a ROM image or something else.

DWORD SizeOfCode

This field describes the rounded-up size of all executable sections. Usually viruses do not fix the value when adding a new code section to the host program. However, some future viruses might change this value.

DWORD AddressOfEntryPoint

The address where the execution of the image begins. This value is an RVA that normally points to the .text (or CODE) section. This is a crucial field for most Windows 9x/Win32 viruses. The field is changed by most of the known virus infection types to point to the actual entry point of the virus code.

DWORD ImageBase

When the linker creates a PE executable, it assumes that the image will be mapped to a specific memory location. That address is stored in this field. If the image can be loaded to the specified address (currently 0x400000 in Microsoft programs), then the image does not need relocation patches by the loader. This field is used by most viruses before infection to calculate the actual address of certain items, but it is not usually changed.

DWORD SectionAlignment

When the executable is mapped into memory, each section must start at a virtual address that is a multiple of this value. The minimum value of this field is 0x1000 (4096 bytes), but linkers from Borland use much bigger defaults, such as 0x10000 (64KB). Most Win32 viruses use this field to calculate the correct location for the virus body but do not change the field.

DWORD FileAlignment

In the PE file, the raw data starts at a multiple of this value. Viruses do not change this value but use it in a similar way to SectionAlignment.

DWORD SizeOfImage

When the linker creates the image, it calculates the total size of the portions of the image that the loader has to load. This includes the size of the region starting at the image base up through the end of the last section. The end of the last section is rounded up to the nearest multiple of section alignment. Almost every PE infection method uses and changes the SizeOfImage value of the PE header.

Not surprisingly, many viruses calculate this field incorrectly, which makes image execution impossible under Windows NT. The bug goes unnoticed by their authors because the Windows 9x loader does not bother to check this value when executing the image. Usually (and fortunately) virus writers do not test their creations for long, if at all. Most Windows 95 viruses contain this bug. Some antivirus software used to calculate this field incorrectly when disinfecting files. This causes a side effect: A Windows NT–compatible Win32 program will not be executed by Windows NT but only by Windows 9x, even when the application has been disinfected.
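A disinfector or a consistency checker can recompute the value from the section table. The following sketch applies the rounding rule described above; the function names and the sample section values are hypothetical and serve only to show the arithmetic.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def expected_size_of_image(sections, section_alignment: int) -> int:
    """Compute SizeOfImage from (virtual_address, virtual_size) pairs.

    The result is the end of the highest section, rounded up to the
    section alignment.
    """
    if not sections:
        raise ValueError("at least one section is required")
    end = max(va + vsize for va, vsize in sections)
    return align_up(end, section_alignment)


def size_of_image_is_consistent(header_value, sections, section_alignment):
    """Return True when the header agrees with the section table."""
    return header_value == expected_size_of_image(sections, section_alignment)


# Hypothetical section layout: (VirtualAddress, VirtualSize).
demo_sections = [(0x1000, 0x9A3C), (0xB000, 0x1200), (0xD000, 0x0F58)]
print(hex(expected_size_of_image(demo_sections, 0x1000)))  # prints 0xe000
```

After cleaning a file, a disinfector would write the recomputed value back so that the image loads under Windows NT as well as Windows 9x.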

DWORD Checksum

This is a checksum of the file. Most executables contain 0 in this field. All DLLs and drivers, however, must have a checksum. Windows 95's loader simply skips checking this field before loading DLLs, which makes it possible for some Windows 95 viruses to infect KERNEL32.DLL very easily. This field is used by some viruses to represent an infection marker to avoid double infections. Another set of viruses recalculates it to hide an infection even better.

4.3.2.1.2 The Section Table and Commonly Encountered Sections

Between the PE header and the raw data for the image's sections lies the section table. The section table contains information about each section of the actual PE image. (See the following dumps that I made with the PEDUMP tool.)

Basically, sections are used to separate different functioning modules from each other, such as executable code, data, global data, debug information, relocation, and so on. The section table modification is important for viruses to specify their own code section or to patch an already existing section to fit actual virus code into it. Each section in the image has a section header in the section table. These headers describe the name of each section (.text, … .reloc) as well as its actual, virtual, and raw data locations and sizes. First-generation viruses, like Boza, patch a new section header into the section table. (Boza adds its own .vlad section, which describes the location and size of the virus section.)

Sometimes there is no place for a section header in the file, and the patch cannot take its place easily. Therefore, viruses today (such as W95/Anxiety19 variants) attack the last existing section header and modify its fields to fit the virus code in that section. This makes the virus code section less visible and the infection method less risky.

Listing 4.1 is the section table example of CALC.EXE (the Windows Calculator).

Listing 4.1. Looking at the Section Table of CALC.EXE with PEDUMP

image

The name of the section can be anything. It could even contain just zeros; the loader does not seem to worry about the name. In general, however, the name field describes the actual functionality of the section.

There is a chance for confusion here because the actual code is placed into a .text section of the PE files. This is the traditional name, the same as in the old COFF format. The linker concentrates all the .text sections of the various OBJ files to one big .text section and places this in the first position of the section table. As I will describe later, the .text section contains not only code, but an additional jump table for DLL library calls. The Borland linker calls the .text section CODE, which is not a traditional name (but not one beyond normal understanding).

Another common section name is .data, where the initialized data goes. The .bss section contains uninitialized static and global variables. The .rsrc contains and stores the resources for the application.

The .idata section contains the import table, a very important part of the PE format for viruses. (Note that sections are only used as logical separators in the file image. Because nothing is mandatory, the “.idata” section's content might be merged into any other section, or not be present at all.)

The .edata section is also very important for viruses because it lists all the APIs that the actual module exports for other executables.

The .reloc section stores the base relocation table. Some viruses take special care of relocation entries of the executables; however, this section seems to disappear from most Windows 98 executables from Microsoft. The .reloc section appears to reflect a design problem of the early PE format. Because the actual program is loaded before its DLLs, and the application is executed in its own virtual address space, it can normally be loaded at its preferred base address, so there seems to be no real need for relocations in an EXE.

Last but not least, there is a common section name, the .debug section, which holds the debug information of the executable (if there is any). This is not important for viruses, although they could take advantage of it for infections.

Because the name of the section can be specified by the programmer, some executables contain all kinds of special names by default.

Three of the section table header's fields are very important for most viruses: VirtualSize (which holds the virtual size of the section), SizeOfRawData (which holds the size of the section after it has been rounded up to the nearest file alignment), and the Characteristics field.

The Characteristics field holds a set of flags that indicate the section's attributes (code, data, readable, writable, executable, and so on). The code section has an executable flag but does not need writable attributes because the data are separated. This is not the same with appended virus code, which must keep its data area somewhere in its code. Therefore viruses must check for and change the Characteristics field of the section in which their code will be placed.
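The three fields named above can be read from a 40-byte section header as sketched below. The flag values are the published PE constants. The helper names are hypothetical, and treating a section that is both writable and executable as suspicious is only a heuristic assumption, because some legitimate programs (packed executables, for example) use such sections too.

```python
import struct

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics (40 bytes in total).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def parse_section_header(raw: bytes, offset: int = 0) -> dict:
    """Decode one section header into a dictionary of the key fields."""
    fields = SECTION_HEADER.unpack_from(raw, offset)
    return {
        "name": fields[0].rstrip(b"\0").decode("ascii", "replace"),
        "virtual_size": fields[1],
        "virtual_address": fields[2],
        "size_of_raw_data": fields[3],
        "pointer_to_raw_data": fields[4],
        "characteristics": fields[9],
    }


def describe_flags(characteristics: int) -> list:
    """List the attribute names set in a Characteristics value."""
    names = []
    if characteristics & IMAGE_SCN_CNT_CODE:
        names.append("code")
    if characteristics & IMAGE_SCN_MEM_EXECUTE:
        names.append("executable")
    if characteristics & IMAGE_SCN_MEM_READ:
        names.append("readable")
    if characteristics & IMAGE_SCN_MEM_WRITE:
        names.append("writable")
    return names


def is_writable_and_executable(characteristics: int) -> bool:
    """Heuristic: code sections rarely need the writable attribute."""
    mask = IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_WRITE
    return characteristics & mask == mask
```

A typical linker-generated code section carries the value 0x60000020 (code, executable, readable), so the heuristic does not fire on it.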

All of this indicates that the actual disinfection of a 32-bit virus can be more complicated than that of a normal DOS EXE virus. The infection itself is not trivial in most methods, but so many sources are available on various Internet locations that virus writers have all the necessary support to write new virus variants easily.

4.3.2.1.3 PE File Imports: How Are DLLs Linked to Executables?

Most of the Windows 9x and Windows NT viruses are based heavily on the understanding of the import table, which is a very important part of the PE structure. In Win32 environments, DLLs are linked through the PE file's import table to the application that uses them. The import table holds the names of the imported DLLs and also the names of the imported functions from those DLLs. Consider the following examples:

image

The executable code is located in the .text section of PE files (or in the CODE section, as the Borland linker calls it). When the application calls a function that is in a DLL, the actual CALL instruction does not call the DLL directly. Instead, it goes first to a jump (JMP DWORD PTR [XXXXXXXX]) instruction somewhere in the executable's .text section (or in the CODE section in the case of Borland linkers).

The address that the jump instruction looks up is stored in the .idata section (or sometimes in .text) and is called an entry within the IAT (Import Address Table). The jump instruction transfers control to that address pointed by the IAT entry, which is the intended target address. Thus, the DWORD in the .idata section contains the real address of the function entry point, as shown in the following dump. In Listing 4.2, an application calls FindFirstFileA() in KERNEL32.DLL.

Listing 4.2. Function Imports

image

The calls are implemented in this way to make the loader's job easier and faster. By thunking all calls to a given DLL function through one location, there is no longer the need for the loader to patch every instruction that calls a DLL. All the PE loader has to do is patch the correct addresses into the list of DWORDs in the .idata section for each imported function.
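The effect of thunking can be shown with a toy model. The sketch below is not a loader; it only imitates the idea that every call site goes through one slot per imported function and that the loader patches each slot once. ToyImage and toy_loader() are hypothetical names, and the stand-in function is invented for the demonstration.

```python
class ToyImage:
    """A toy model of an image with one import slot per imported function."""

    def __init__(self, imports):
        # Before loading, each slot is empty; only the names are known.
        self.iat = {name: None for name in imports}

    def call(self, name, *args):
        # Every call site is indirect, like JMP DWORD PTR [slot].
        target = self.iat[name]
        if target is None:
            raise RuntimeError(f"import {name} has not been resolved")
        return target(*args)


def toy_loader(image, exports):
    """Patch each slot exactly once; the call sites are never touched."""
    for name in image.iat:
        if name not in exports:
            raise LookupError(f"cannot locate {name}")
        image.iat[name] = exports[name]


# A stand-in for a DLL's exports (hypothetical behavior).
fake_kernel32 = {"FindFirstFileA": lambda pattern: f"handle for {pattern}"}

image = ToyImage(["FindFirstFileA"])
toy_loader(image, fake_kernel32)
print(image.call("FindFirstFileA", "*.EXE"))  # prints handle for *.EXE
```

However many times the program calls the function, the loader performs a single write per import, which is the saving over the 16-bit scheme of patching every call instruction.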

The import table is very useful for modern 32-bit Windows viruses. Because the system loader has to patch the addresses of all the APIs that a Win32 program uses by importing, viruses can easily get the address of an API they need to call by looking into the host program's import table.

With traditional DOS viruses, this problem does not exist. When a DOS virus wants to access a system service function, it simply calls a particular interrupt with the corresponding function number. The actual address of the interrupt is placed in the interrupt vector table and is picked up automatically during the execution of the program. The interrupt vector table is not protected from the running programs; all applications can read and write into it because there are no privilege levels in DOS. The OS and all applications share the same available memory with equivalent rights. Therefore access to a particular system function does not cause problems for a DOS virus. It has access to everything it needs by default, regardless of the infection method used.

A Windows 95 virus must call APIs or system services to operate correctly. Most 32-bit applications use the import table, which the linker prepares for them. However, there are a couple of ways to avoid imports. Avoiding imports is often necessary for compatibility reasons. When an application is linked to a DLL, the actual program cannot be executed if the system loader cannot load all the DLLs specified in the import table. Moreover, the system loader checks all the necessary API calls and patches their addresses into the import table. If the loader is unable to locate a particular API by its name or ordinal value, the application cannot be executed.

Some applications must overcome this problem. For instance, if a Win32 program wants to list by name all the running processes under both Windows 95 and NT, it must use system DLLs and API calls under Windows 95 that are different from those under Windows NT. In such a case, the application is not linked directly to all the DLLs it wants to access because then the program could not be executed on either system. Instead, the LoadLibrary() function is used to load the necessary DLLs, and GetProcAddress() is used to get the API's address. The actual program can access the API address of LoadLibrary() and GetProcAddress() from its import table. This solves the chicken-and-egg problem of how to call an API without knowing its address when an API call is itself needed to obtain that address.

As we will see later, Boza solves the problem by using hard-coded API addresses. Modern Win32 viruses, however, are capable of searching the import table during infection time and saving pointers to the .idata section's important entries. Whenever the application has imports for a particular API, the attached virus will be able to call it.

Note

One of the important differences in 64-bit and 32-bit PE files is their handling of import and export entries. The IA64 PE files use a PLABEL_DESCRIPTOR structure in place of any IAT entries. (This structure is detailed in Chapter 12.)

4.3.2.1.4 PE File Exports

The opposite of importing a function is exporting a function for use by EXEs or other DLLs. A PE file stores information about its exported functions in the .edata section. Consider the following dump, which lists a few exports of KERNEL32.DLL:

image

KERNEL32.DLL's export table consists of an IMAGE_EXPORT_DIRECTORY structure, which has pointers to three different lists: the function address table, the function name table, and the function ordinal table. Modern Windows 95/NT viruses search for the “GetProcAddress” string in the function name table to be able to retrieve the API function entry-point value.

When this value is added to the ImageBase, it gives back the 32-bit address of the API in the DLL. In fact, this is almost the same algorithm that the real GetProcAddress() from KERNEL32.DLL follows internally. This function is one of the most important for Windows 95 viruses that want to be compatible with more than one Win32-based system. When the address of GetProcAddress() is available, the virus can get all the API addresses it wants to use.
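The relationship among the three lists can be illustrated on plain Python lists instead of a real image. This is a simplified sketch: resolve_export() is a hypothetical name, a real implementation can use a binary search because the name table is sorted, and the table contents below are invented except for the GetCurrentDirectoryA address, which matches the English Windows 95 figures quoted later in this chapter.

```python
def resolve_export(name, names, ordinals, function_rvas, image_base):
    """Resolve an exported name to its address in the loaded module.

    names[i] pairs with ordinals[i], and that ordinal is an index into
    function_rvas. Adding the image base turns the RVA into an address.
    """
    try:
        position = names.index(name)
    except ValueError:
        raise LookupError(f"{name} is not exported") from None
    rva = function_rvas[ordinals[position]]
    return image_base + rva


# Hypothetical miniature export table of a module loaded at 0xBFF70000.
demo_names = ["GetCurrentDirectoryA", "GetProcAddress"]
demo_ordinals = [1, 0]
demo_function_rvas = [0x6C18, 0x7744]  # 0x6C18 is an invented value

address = resolve_export(
    "GetCurrentDirectoryA", demo_names, demo_ordinals,
    demo_function_rvas, 0xBFF70000,
)
print(hex(address))  # prints 0xbff77744
```

The indirection through the ordinal table is what lets two releases of the same DLL export the same name at different positions and addresses.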

4.3.2.2 First-Generation Windows 95 Viruses

The first Windows 95 virus, known as W95/Boza.A, was introduced in the VLAD virus writer magazine. Boza's authors obviously wanted to be the first with their creation, and they had to find a Windows 95 beta version very quickly to do so. Pioneer viruses used to be very buggy, and Boza was no exception. Basically, the virus cannot work on more than two Windows 95 versions: a beta release and the final version. Even on those two Windows 95 releases, the virus causes many general protection faults during replication. Infected files are often badly corrupted.

Boza is a typical appending virus that infects PE applications. The virus body is placed in a new section called .vlad. First the .vlad section header is patched into the section table as the last entry, and the number of sections field is incremented in the PE header. The body of the virus is appended to the end of the original host program, and the PE header's entry point is modified to point to the new entry point in the virus section.

Boza uses hard-coded addresses for all the APIs it has to call. That approach is the easiest, but, fortunately, it is not very successful. The authors of the virus worked on a beta version of Windows 95 first and used addresses hard-coded for that particular implementation of KERNEL32.DLL. Later they noticed that the actual virus did not remain compatible with the final release of Windows 95. This happened because Microsoft did not have to provide the same ordinal values and addresses for all the APIs for every system DLL in all releases. This would be impossible. Different Windows 95 implementations—betas, language versions, OSR2 releases—do not share the same API addresses. For instance, the first API call in Boza happens to be GetCurrentDirectoryA(). Figure 4.29 shows that the ordinal values and entry points of GetCurrentDirectoryA are different in the English version of Windows 95 and in the Hungarian OSR2 Windows 95 release of KERNEL32.DLL.

Figure 4.29. The ordinal references on two releases of Windows 95.

image

ImageBase is 0xBFF70000 in both KERNEL32.DLL releases, but the procedure address of GetCurrentDirectoryA() is 0xBFF77744 in the English release and 0xBFF7774C in the Hungarian OSR2 version. When Boza wants to replicate on the Hungarian version of Windows 95, it calls an incorrect address and, obviously, fails to replicate. Therefore, Boza cannot be called a real Windows 95–compatible virus. It turns out that Boza is incompatible with most Windows 95 releases.

Regardless of these facts, many viruses try to operate with hard-coded API addresses. Most of these Windows 95 viruses never make it into the wild. Virus writers seem to understand Win32 systems much better already, creating viruses that are compatible not only with all Windows 95 releases but also with Windows 98 and Windows NT versions.

4.3.2.2.1 Header Infection

This type of Windows 95 virus inserts itself between the end of the PE header (after the section table) and the beginning of the first section. It modifies the AddressOfEntryPoint field in the PE header to point to the entry point of the virus instead. The first known virus to use this technique was W95/Murkry.

The virus code must be very short in Windows 95 header infections. Because sections must start at an offset that is a multiple of the FileAlignment, the maximum available place to overwrite cannot reach much more than the FileAlignment value. When the application contains too many sections and the FileAlignment is 512 bytes, there is no place for the virus code. The AddressOfEntryPoint field is an RVA; however, the virus code is not placed in any of the sections and, therefore, the actual RVA is the real physical offset in the file that the virus must place in the header. It is interesting to note that the entry point does not point into any code section but, regardless of that fact, Windows 95's loader happily executes the infected program.

There is a chance that a scanner will fail to detect the second generation of such viruses. This happens when the scanner is only tested on first-generation samples. In first-generation samples, the AddressOfEntryPoint points to a valid section. When the scanner looks for the entry point of the program, it must check all the section headers and whether the AddressOfEntryPoint points to any of them. There is a chance that this function is not implemented to handle those cases in which the entry point does not point to any of the sections. Some scanners may skip the file instead of scanning it from the real entry point, thereby failing to detect the infection in second-generation samples.
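A scanner can avoid this trap by treating the header area as a valid location for the entry point. The sketch below maps AddressOfEntryPoint to a file offset and falls back to the header case described above, where the RVA equals the physical offset. The function name and the tuple layout are hypothetical; a production scanner would also need to validate the section table itself.

```python
def entry_point_file_offset(entry_rva, sections, size_of_headers):
    """Map an entry-point RVA to (section_index_or_None, file_offset).

    sections is a list of tuples:
    (virtual_address, virtual_size, pointer_to_raw_data, size_of_raw_data)
    """
    for index, (va, vsize, raw_ptr, raw_size) in enumerate(sections):
        span = max(vsize, raw_size)
        if va <= entry_rva < va + span:
            return index, raw_ptr + (entry_rva - va)
    # No section contains the entry point. Inside the header area the
    # image is mapped one-to-one, so the RVA is also the file offset.
    if entry_rva < size_of_headers:
        return None, entry_rva
    raise ValueError("entry point lies outside the headers and all sections")


# Hypothetical layout: one section at RVA 0x1000 stored at file offset 0x400.
demo_layout = [(0x1000, 0x2000, 0x400, 0x2000)]

print(entry_point_file_offset(0x1234, demo_layout, 0x400))  # prints (0, 1588)
print(entry_point_file_offset(0x2C0, demo_layout, 0x400))   # prints (None, 704)
```

A result of None for the section index is worth reporting in its own right, because linkers do not normally place the entry point in the header area.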

4.3.2.2.2 Prepending Viruses

The easiest way to infect PE files is to overwrite their beginning. Some DOS viruses infect PE files this way, but none of the known Windows 95 viruses use this infection method. Of course, the application will not work correctly after such an infection, so overwriting viruses are discovered almost immediately. For this reason, virus writers who do not want to handle the complicated PE file format use the prepending method instead. Such viruses are usually written in a high-level language (HLL) such as C or even Delphi. This method consists of prepending the virus code to the PE file. The infected program starts with the EXE header of the virus. When the virus wants to transfer control to the original program code, it has to extract it to a temporary file and execute it from there.

Disinfection of such viruses is easy. The original header information is available at the very end of the infected program in a nonencrypted format. Virus writers will recognize that and will encrypt the original header information later on. This will make disinfection more complicated.

4.3.2.3 Appending Viruses That Do Not Add a New Section Header

A more advanced appending method is used by the W95/Anxiety virus. Anxiety is very similar to Boza in its infection mechanism, but its code is more related to the somewhat bogus W95/Harry virus.

The Anxiety virus does not add a new section header at the end of the section table. Rather, it patches the last section's section header to fit into that section. In this way, the virus can infect all PE EXE files easily. There is no need to worry that the actual section header does not fit into the section table.

By modifying the VirtualSize and SizeOfRawData fields, the virus code can be placed at the end of the executable. In this way, the NumberOfSections field of the PE header does not need to be modified. The AddressOfEntryPoint field is changed to point to the virus body, and the SizeOfImage is recalculated to represent the new size of the program. Listing 4.3 is the last section of CALC.EXE before and after the W95/Anxiety.1358 infection.

Listing 4.3. The Section Modification of W95/Anxiety.1358

image

The Characteristics field of the last section header is changed to have writable/executable attributes. The writable characteristic is enough in itself to execute self-modifying code from any section, but many virus writers initially did not realize that.

Viruses like W32/Zelly use two or more infection strategies. In basic infection mode Zelly adds two sections to the host program. In advanced infection mode, it merges all sections of the host into a single section, and appends the virus to the end of the image. This integrates the virus body tighter into the host program.

4.3.2.4 Appending Viruses That Do Not Modify the Entry Point

Some Windows 95 and Win32 viruses do not modify the AddressOfEntryPoint field of the infected program. The virus appends its code to the PE file, but it gets control in a more sophisticated way. It calculates where the original AddressOfEntryPoint points to and places a JMP instruction there that points to the virus body. Fortunately, it is very difficult to write such viruses.

This is because the virus must take care of the relocation entries that point to the overwritten part of the code. The W32/Cabanas virus masks out the relocation entries that point to that area. W95/Marburg does not place a JMP instruction at the entry point if it finds relocations for that area; instead, it modifies the AddressOfEntryPoint field. The JMP instruction should not be the first instruction in the program. W95/Marburg shows this by placing the JMP instruction after a random garbage block of code when no relocations are present in the first 256 bytes of entry-point code. In this way, it is not obvious to scanners and integrity checkers how to figure out the entry point of the virus code.

4.3.2.5 KERNEL32.DLL Infection

Most Windows 95 viruses attack the PE format, but some of them also infect DOS COM, EXE programs, VxDs, Word documents, and 16-bit Windows new executables (NE). Others may infect DLLs accidentally because these are linked in PE (or NE) formats, but the infection is not able to spread further because the standard entry point of the DLLs is not called by the system loader. Instead, the DLL's execution normally starts at its specified DLL entry point.

KERNEL32.DLL infectors do not attack the entry point. Instead, this type of virus must gain control differently. PE files have many other entry points that are useful for viruses, especially DLLs, which export APIs (their entry points) by nature. Therefore, the easiest way to attack KERNEL32.DLL is to patch the export RVA of one of the APIs (for instance, GetFileAttributesA) to point to the virus code at the end of the DLL image. W95/Lorez [20] uses this approach. Viruses like this are able to go “resident” easily. The system loads the infected DLL during the system initialization period. After that, every program that imports from KERNEL32.DLL will be attached to this infected DLL. Whenever the application calls the API to which the virus code has been attached, the virus code gets control.
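
A patched export of this kind stands out because its RVA no longer falls inside the code section. As a rough sketch of that check, the function below takes an already-parsed export table and section list (both hypothetical inputs; the sample names and values are illustrative only) and reports exports that land in the last section of the image.

```python
def exports_in_last_section(export_rvas, sections):
    """Return the names of exports whose RVA falls inside the last section.

    export_rvas: dict mapping API name -> RVA
    sections:    list of (name, virtual_address, virtual_size), ordered by address
    """
    _name, start, size = sections[-1]
    return sorted(api for api, rva in export_rvas.items()
                  if start <= rva < start + size)


# Illustrative values only; they are not taken from a real KERNEL32.DLL.
sample_sections = [(".text", 0x1000, 0x50000), (".reloc", 0x60000, 0x4000)]
sample_exports = {"GetFileAttributesA": 0x61200, "CreateFileA": 0x2340}
print(exports_in_last_section(sample_exports, sample_sections))
```

In a clean system DLL the result would normally be empty, because the last section usually holds relocations or resources rather than exported code.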

All the system DLLs contain a precalculated checksum in their PE header, placed there by the linker. Unlike Windows 95, Windows NT recalculates this checksum before it loads the DLL. If the calculated checksum is not the same as in the header of the DLL, the system loader stops with an error message during the blue screen boot-up period. However, this does not mean that such a virus cannot be implemented for Windows NT—it just makes implementation a bit more complicated. Although the checksum algorithm is not documented by Microsoft, there are APIs available in IMAGEHLP.DLL for these purposes—like CheckSumMappedFile()—which are efficient enough to calculate a new, correct checksum after the actual infection is done. This is not enough, however, for Windows NT's loader. There are several other steps to take, but there is no doubt that virus writers will be able to solve these questions soon. There is a need for virus scanners to check the consistency of a KERNEL32.DLL by recalculating the PE header checksum, especially if the scanner is a Win32 application itself and is attached to an infected KERNEL32.DLL.
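
The consistency check suggested above can be sketched as follows. This is an assumption-laden illustration: the summing routine is the widely circulated reimplementation of the header checksum (a 16-bit sum with carry folding over the file, with the CheckSum field treated as zero, plus the file length), not something confirmed by the source, and the function names are hypothetical. A production scanner would more likely rely on CheckSumMappedFile() itself.

```python
import struct


def checksum_field_offset(data):
    """File offset of the optional header's CheckSum field."""
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    # 4-byte signature + 20-byte file header + 64 bytes into the optional header
    return e_lfanew + 4 + 20 + 64


def compute_pe_checksum(data):
    """Recompute the header checksum (assumed algorithm, see the note above)."""
    field = checksum_field_offset(data)
    buf = bytearray(data)
    buf[field:field + 4] = b"\0\0\0\0"   # the stored value is excluded from the sum
    if len(buf) % 2:
        buf.append(0)                    # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(buf), 2):
        total += struct.unpack_from("<H", buf, i)[0]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return (total + len(data)) & 0xFFFFFFFF


def checksum_matches(data):
    """True if the stored CheckSum equals the recomputed one."""
    stored = struct.unpack_from("<I", data, checksum_field_offset(data))[0]
    return stored == compute_pe_checksum(data)
```

A mismatch on a system DLL is worth reporting, although a match proves little, because an infector can recompute the field after modifying the file.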

4.3.2.6 Companion Infection

Companion viruses are not very common. Nevertheless, some virus writers do develop Windows 95 companion viruses. A path companion virus depends on the fact that the operating system always executes files with a COM extension in preference to files with an EXE extension if the names of two files in the same directory differ only in their extensions. These viruses simply look for a PE application with an EXE extension and then copy themselves into the same directory (or somewhere on the path) under the host's name with a COM extension. W95/Spawn.4096 uses this technique. This functionality is implemented by using the FindFirstFileA() and FindNextFileA() APIs for the search, CopyFileA() to copy the virus code, and CreateProcessA() to execute the original host program.
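
Because the technique leaves two files whose names differ only in their extensions, a directory can be audited for such pairs with a few lines of code. The sketch below is a hypothetical helper and only reports candidates; a COM/EXE pair can also be legitimate, so each hit still needs to be inspected.

```python
from pathlib import Path


def find_companion_pairs(directory):
    """Return base names that exist with both a .COM and an .EXE extension."""
    extensions_by_stem = {}
    for entry in Path(directory).iterdir():
        if entry.is_file():
            stem = entry.stem.lower()
            extensions_by_stem.setdefault(stem, set()).add(entry.suffix.lower())
    return sorted(stem for stem, exts in extensions_by_stem.items()
                  if {".com", ".exe"} <= exts)
```

The comparison is done in lowercase because file names on the affected systems are not case sensitive.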

4.3.2.7 Fractionated Cavity Infection

I originally predicted this infection technique as one that would possibly be developed in the future. However, the W95/CIH virus had already introduced this technique before my first lecture on Win32 viruses.

There is slack space between most sections, which is usually filled with zeros (or 0xCC) by the linker. This is because the sections have to start at the file alignment, as described in the PE header's FileAlignment field. The actual virtual size each section uses is usually different from the raw data representation. Usually, the virtual size is a smaller value. In most cases, Microsoft's Link program generates PE files like that. The difference between the raw data size of the section and the virtual size is the actual alignment area, which is filled by zeros and not loaded when the program is mapped into its own address space.

Because the default value of FileAlignment is 512 bytes (the usual sector size), the usual slack area is smaller than 512 bytes. When I first considered this kind of infection method, I thought that no such viruses would be developed because less than 512 bytes is not big enough for an average PE infector virus of that kind. However, two minutes later I had to recognize that this simple problem would not stop virus writers from developing such viruses. The only thing that has to be done by the virus is to split its body into several parts and place them into as many section alignment areas as are available. The loader code for these blocks can be very short: it first moves each separated code block to an allocated memory area, one by one. This code itself fits into a big enough section alignment area.

This is the precise method used by the W95/CIH virus. This makes the job of the scanner and the disinfector much harder. The virus changes the virtual size of the section to be the same as the raw data size in each section header, into which it injects a part of its virus body. The exact identification of such viruses is more difficult than for normal viruses because the virus body must be fetched from different areas of the PE image first.
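
The header change described here suggests a simple heuristic. The sketch below computes the slack of each section as SizeOfRawData minus VirtualSize and lists the sections in which no slack is left even though the raw size is a whole multiple of FileAlignment. The function names and sample values are hypothetical, and a clean file can legitimately contain a section that exactly fills its alignment, so the result is a hint rather than a detection.

```python
def section_slack(sections):
    """Return (name, slack_bytes) for each (name, virtual_size, raw_size) tuple."""
    return [(name, max(raw_size - vsize, 0)) for name, vsize, raw_size in sections]


def fully_used_sections(sections, file_alignment=512):
    """Names of sections whose VirtualSize equals an aligned SizeOfRawData."""
    return [name for name, vsize, raw_size in sections
            if raw_size and vsize == raw_size and raw_size % file_alignment == 0]


# Illustrative values only.
clean = [(".text", 0x1234, 0x1400), (".data", 0x0321, 0x0400)]
patched = [(".text", 0x1400, 0x1400), (".data", 0x0400, 0x0400)]
print(section_slack(clean))
print(fully_used_sections(patched))
```

Several fully used sections in one image are more suspicious than a single one, so a scanner might weight the count rather than treat any hit as conclusive.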

W95/CIH uses the header infection method at the same time and infects Microsoft Linker–created images without any problem. The fragmented cavity infection technique has a very important advantage from the virus's point of view. The infected file does not get bigger after the infection; its size remains the same. This makes noticing the virus much harder. The identification must be done very carefully because a virus like that may split its body at any offset, which might also separate the actual search string into several parts. This fact shows that it is very important to analyze new Windows 95 viruses with extreme care. Otherwise, the scanner might not find all generations of the same virus code.

4.3.2.8 Modification of the lfanew Field in an Old EXE Header

This is the second infection method that I originally intended to describe as one that has not yet been developed. However, as with the fragmented cavity infection method (discussed in the previous section), this technique appeared in a virus during the time I was writing about it. This infection method is one of the simplest to implement and therefore is used in many viruses. The first known virus to use this method was W95/Cerebrus. The method itself works on Windows NT, but a trivial bug in the virus prevents it from running there. Basically, this infection method is an appending type: the virus body is attached to the very end of the original program.

The important difference is that the virus code itself contains its own PE header. When the virus infects a PE application, it modifies the lfanew field (at offset 0x3C) in the old EXE header. As described earlier, the lfanew field holds the file address of the PE header. Because this field now points to a new PE header, the program is executed as if it contained only the virus code. The virus functions like a normal Win32 application. It has its own imports and can easily access any APIs it wants to call. When the replication is done, the virus creates a temporary file with a copy of the infected program. In this file, the lfanew field points correctly to the original PE header. Thus, the original program is functional again when the virus executes the temporary file.
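
Since the original PE header is left in place, an infected file contains two PE headers, and lfanew points to the later one. A minimal sketch of a check built on that observation follows. It assumes 32-bit Intel images (machine type 0x014C) and uses hypothetical function names; files that merely embed another executable still have lfanew pointing to the first header, so they are not flagged.

```python
import struct

PE_I386 = b"PE\0\0" + struct.pack("<H", 0x014C)  # signature + i386 machine type


def read_lfanew(data):
    """Read the PE header offset stored at 0x3C of the old EXE header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    return struct.unpack_from("<I", data, 0x3C)[0]


def find_pe_headers(data):
    """Offsets of every byte sequence that looks like an i386 PE header."""
    hits = []
    pos = data.find(PE_I386)
    while pos != -1:
        hits.append(pos)
        pos = data.find(PE_I386, pos + 1)
    return hits


def lfanew_skips_earlier_header(data):
    """True if a PE header candidate exists before the one lfanew selects."""
    target = read_lfanew(data)
    return any(offset < target for offset in find_pe_headers(data))
```

The byte search can match inside ordinary data, so a real implementation would validate each candidate header before reporting the file.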

4.3.2.9 VxD-Based Windows 95 Viruses

Most Windows 95 viruses are direct-action infectors. Virus writers recognized the importance of fast infection and tried to look for solutions to implement Windows 95 resident viruses. Though not the easiest, the evident solution was to write a VxD virus. One of the first VxD-based viruses was W95/Memorial. It infects DOS COM, EXE, and PE applications. The virus does not replicate without Windows 95. The infected programs use a dropping mechanism to extract the real virus code, a VxD, into the root directory of drive C: as CLINT.VXD.

When the VxD is loaded, the virus code is executed on ring 0, thus the virus can do anything it wants. VxDs can hook the file system easily, and that is exactly what most VxD viruses want to do. They simply hook the installable file system (IFS) with one simple VxD service routine. After that, the virus can monitor access to all files. The VxD code has to be extracted, and the dropper code needs different implementation for each and every format that the virus wants to infect. This makes the virus code very complicated and relatively big (12,413 bytes). Therefore, it is very unlikely that many viruses like this will be developed in the future.

4.3.2.10 PE Viruses That Operate as VxDs

A much easier solution has been introduced by the W95/Harry and W95/Anxiety viruses. These viruses can overcome complications by patching their code into the VMM (virtual machine manager) of Windows 95.

When an infected PE program is executed, the virus code takes control. Programs are executed on the application level, which is why they cannot call system-level functions (VxD calls) normally. These viruses bypass the system by installing their code into the VMM, which runs on ring 0. The installation routine of such a virus searches for a big enough hole in the VMM's code area after the 0C0001000h address.

If a large enough area, consisting of only 0FFh bytes, is detected, the virus looks for the VMM header at 0C000157Fh and checks this area by comparing it to the string “VMM”. If this is detected, the virus picks up the Schedule_VM_Event system function's address from the VMM and saves it for later use. Then it copies its code into the VMM by overwriting the previously located hole and changes the original Schedule_VM_Event's address to point to a new function. Finally, it executes the original host program by jumping to the original entry point. All of this is possible because Microsoft is unable to protect that area from changes while keeping backward compatibility with old Windows 3.x VxDs. The full VMM area is available for read and write access to application-level programs.

Before the host program can be executed, the VMM will call Schedule_VM_Event, which is now replaced by the initialization routine of the virus. This code is executed on ring 0 already, which enables it to call VxD functions. Anxiety hooks the IFS by calling IFSMgr_InstallFileSystemApiHook from there. This installs the new hook API of the virus.

The virus replication code needs special care. When VxD code is executed, VxD calls are patched by the VMM. The VMM turns the 0CDh, 20h, DWORD function ID (INT 20H, DWORD ID) [21] into FAR CALLs. Some of the VxD functions consist of a single instruction. In this case, the VMM patches the six bytes with this single instruction, which fits there. The VMM does this dynamically with all the executed VxDs to speed up their execution.

When the virus code is executed, the VxD functions in the virus body are patched by the VMM, and the virus therefore cannot copy this image immediately to files again because the virus code would not work in a different Windows 95 environment. These viruses contain a function that patches all their VxD functions back to their normal format first and only after that replicates the code into the host program. Even if this technique looks very complicated, it is not very difficult for virus writers. W95/Anxiety variants used to be in the wild in many countries.
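
The unpatched form of these calls is easy to recognize in a file image: the two bytes 0CDh, 20h followed by a DWORD identifier. As a rough sketch for an analyst's tool, the function below scans a buffer for that six-byte pattern. It assumes the identifier packs the device ID in the high word and the service number in the low word; that layout and the sample value are assumptions for illustration, and because this is a plain byte scan rather than a disassembly, false matches are possible.

```python
import struct


def find_unpatched_vxd_calls(code):
    """Return (offset, device_id, service) for each INT 20h + DWORD pattern."""
    calls = []
    i = 0
    while i + 6 <= len(code):
        if code[i] == 0xCD and code[i + 1] == 0x20:
            (ident,) = struct.unpack_from("<I", code, i + 2)
            calls.append((i, ident >> 16, ident & 0xFFFF))
            i += 6          # skip the whole six-byte call
        else:
            i += 1
    return calls


# NOP, one sample call with an illustrative identifier, then RET.
sample = b"\x90" + b"\xCD\x20" + struct.pack("<I", 0x00400067) + b"\xC3"
print(find_unpatched_vxd_calls(sample))
```

A PE application that contains several such patterns is unusual, because ordinary ring 3 programs have no reason to carry VxD service calls.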

There is no doubt that several viruses will try to overcome the ring 3 to ring 0 problem using similar methods even on Windows NT–based systems. W95/CIH uses instructions that are available only from Intel 386 processors and above. It is interesting to note that the interrupt descriptor table is available to write under Windows 95 (because it is part of the VMM). W95/CIH uses the SIDT (store IDT) instruction to get a pointer to the IDT (this technique is detailed in Chapter 6). In this way, the virus can modify the gate descriptor of INT 3 (breakpoint interrupt) in the IDT and allocate memory by using VxD services. The INT 3 routine will be executed as a ring 0 interrupt from its PE virus body. This trick shows how easy it is for virus writers to overcome the ring 3 to ring 0 problem. Similar methods will be discovered by Windows 95 virus writers in the near future, resulting in an even simpler method.

4.3.2.11 VxD Infection

A few viruses, such as Navrhar, infect Windows Virtual Device Drivers (VxDs). Navrhar also infects Word documents that are in the OLE2 format and some standard system VxDs. The virus does not infect unknown VxDs, but only known system VxDs that are listed in its PE dropper. When an infected Word document is opened, the virus extracts its PE dropper, which is attached to the very end of the document. Therefore, the only way to access this code is to use Win32 APIs, which is why the virus imports KERNEL32.DLL APIs in its macro code. When the dropper's code is extracted from the document, the dropper is executed, checking for the listed VxDs and infecting them one by one. When the system is rebooted, one of the infected VxDs will be loaded by Windows 95. The virus takes control from the infected VxD, hooks the file system, and checks for Word document access.

Navrhar illustrates that, unlike DOC files, PE applications are not so commonly exchanged by users—not to mention VxDs, which are not normally exchanged at all. This is why modern Win32 viruses use some form of worm propagation mechanism instead (see Chapters 9 and 10).

4.3.2.12 DLL Load Insertion Technique

This particular infection technique is based on manipulation of PE files in such a way that when the host application is loaded, it will load an extra DLL, which is the virus code.

For example, W32/Initx loads a DLL with the name INITX.DAT via a single LoadLibrary() call inserted into the host program. This extra code is inserted into a slack space of the code section of the host, and the entry point of the host is modified to point to the inserted code. On execution of the host program and whenever the INITX.DAT file is available, the virus code is launched before the host program's code. After this, control is given to the original host entry-point code.

4.3.3 Win32 and Win64 Viruses: Designed for Microsoft Windows?

Microsoft's strategy is clear. The Designed for Microsoft Windows logo program's important requirement is that every application in your product must be a Microsoft Win32 program compiled with a 32-bit compiler that generates an executable file of the PE format. Not surprisingly, the number of Win32 programs developed by third parties has grown intensively during the last few years. People exchange and download more PE programs.

The main reason that Windows 95 and Win32 viruses did not cause big problems for a long time was that virus writers had to learn a lot to “support” the new systems. Young virus writers understand Microsoft's message: “Windows everywhere!” Their answer seems to be “Windows viruses everywhere!” These young guys will not waste their time with DOS viruses anymore but will continuously explore Win32 and Win64 platforms instead.

There is no longer any point in attackers' writing DOS viruses. Virus scanners are much weaker in handling Windows viruses generically and heuristically—detection and disinfection are not that easy. Vendors must learn and understand the new 64-bit file formats and spend a reasonable amount of time researching and designing new scanning technology.

Because Windows 95 and Windows NT are more complicated systems, it is natural that the first period of such viruses took more time than it did for DOS viruses. However, the number of Win32 viruses surpassed 10,000 in 2004. It took about 10 years for DOS viruses to reach 10,000 known variants, but only 9 years for Win32 threats. This indicates that, although virus writing slows down as new platforms appear (replacing older ones), eventually the growth rate of any virus type will be exponential.

In the following section, I will describe some important issues that make a Windows 95 virus incompatible with Windows NT. This specifies the differences between the Windows 95 and Win32 prefixes that scanners use to identify 32-bit Windows viruses.

4.3.3.1 Important Windows 95 and NT System Loader Differences

Before I understood W32/Cabanas, I had a different picture of Windows NT from the security point of view because I had drawn incorrect conclusions about the level of system security when the first Windows 95 virus, Boza, appeared. Most antivirus researchers immediately performed some tests with Boza on Windows NT. The result looked reassuring: Windows NT did not even try to execute the infected image, as shown in Figure 4.30.

Figure 4.30. An error message is displayed when executing W95/Boza on Windows NT.


What is good for Windows 95's loader is not good for Windows NT. Why? I answered this question myself by patching PE files.

The PE file format was designed by Microsoft for use by all its Win32 operating systems (Windows NT/2000/XP/2003, Windows 95/98/Me, Win32s, and Windows CE). (Later, the PE file format was extended to PE+ to accommodate the needs of 64-bit platforms.) That is why all the system loaders in Win32 systems have to understand this executable structure. However, the implementation of the loader is different from one system to another. Windows NT's loader simply checks more things in the PE file before it executes the image than Windows 95's loader does. Thus Windows NT finds the Boza-infected file suspicious. This happens because one field in the .vlad section header (which is patched into the section table of the host program) is not precisely calculated by the virus. In other words, correctly calculated sections and section headers can be added to a PE file without any problem. Thus Windows NT's loader does not have any superior virus detection, as some may assume.
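
To illustrate the kind of consistency a stricter loader can insist on, the sketch below validates a section table against SectionAlignment and SizeOfImage. The particular rules (aligned virtual addresses, adjacent sections, and a SizeOfImage that ends exactly at the aligned end of the last section) are assumptions chosen for the example; the text does not say which field Boza miscalculated, and the function names are hypothetical.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def section_table_problems(sections, section_alignment, size_of_image):
    """Return a list of inconsistencies found in (name, vaddr, vsize) tuples."""
    problems = []
    expected = None
    for name, vaddr, vsize in sections:
        if vaddr % section_alignment:
            problems.append(f"{name}: VirtualAddress is not aligned")
        if expected is not None and vaddr != expected:
            problems.append(f"{name}: not adjacent to the previous section")
        expected = align_up(vaddr + vsize, section_alignment)
    if expected is not None and size_of_image != expected:
        problems.append("SizeOfImage does not match the end of the last section")
    return problems


# Illustrative values only.
good = [(".text", 0x1000, 0x1800), (".data", 0x3000, 0x0200)]
bad = good + [(".vlad", 0x4100, 0x1000)]
print(section_table_problems(good, 0x1000, 0x4000))
print(section_table_problems(bad, 0x1000, 0x4000))
```

An added section that passes checks of this kind loads normally, which is the point made above: the loader validates structure, not intent.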

If this problem were fixed in Boza, the virus would be capable of starting the host program even on a Windows NT platform. However, the virus would still not be able to replicate. This is because of another incompatibility problem, from which all the initial Windows 95 viruses have suffered. Every Windows 95 virus must overcome a specific problem: It must be able to call two Win32 KERNEL APIs: GetModuleHandle() and GetProcAddress(). Because those APIs are in KERNEL32.DLL, Windows 95 viruses could access those functions from KERNEL32.DLL directly with a hack. Most Windows 95 viruses have hard-coded pointers to GetModuleHandle() and GetProcAddress() KERNEL APIs. By using GetProcAddress(), the virus can access all the APIs it wants to call. (Alternatively, some viruses use LoadLibrary() to get a module handle to KERNEL32.DLL, but this method is less common. This is because most applications already map the KERNEL32 API in their process address space.)

When the linker creates an executable, it assumes that the file will be mapped to a specific location in memory. In the PE file header, there is a field called ImageBase holding this address. For executables, this address is usually 0x400000 by default. In the case of Windows 95, the KERNEL32.DLL's ImageBase address is 0xBFF70000. Thus, the address of GetModuleHandle() and GetProcAddress() will be at a certain fixed location in the same release of KERNEL32.DLL. However, this address can be different in a new release, which makes Windows 95 viruses incompatible even with other Windows 95 releases. This ImageBase address is 0x77F00000 in Windows NT as the default. Thus Windows 95 viruses that operate with a Windows 95–specific base address cannot work on Windows NT. (Interestingly enough, first-generation exploit code often suffers from similar problems and is only able to work on a single platform.)
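
The arithmetic behind this incompatibility is short enough to show directly. In the sketch below, the two base addresses are the defaults quoted above, while the RVA and the module size are hypothetical values chosen only for illustration. An address that was hard-coded for one base does not fall inside the module loaded at the other base.

```python
WIN95_KERNEL32_BASE = 0xBFF70000   # default ImageBase quoted for Windows 95
WINNT_KERNEL32_BASE = 0x77F00000   # default ImageBase quoted for Windows NT
HYPOTHETICAL_API_RVA = 0x6C18      # illustrative RVA, not a real export
HYPOTHETICAL_IMAGE_SIZE = 0x60000  # illustrative module size


def virtual_address(image_base, rva):
    """Absolute address of an RVA when the module loads at image_base."""
    return (image_base + rva) & 0xFFFFFFFF


def inside_module(address, image_base, image_size):
    """True if address lies within the module's mapped range."""
    return image_base <= address < image_base + image_size


hard_coded = virtual_address(WIN95_KERNEL32_BASE, HYPOTHETICAL_API_RVA)
print(hex(hard_coded))
print(inside_module(hard_coded, WINNT_KERNEL32_BASE, HYPOTHETICAL_IMAGE_SIZE))
```

Resolving the API through the import table or the export directory at run time avoids the dependency on a fixed base, which is why well-behaved programs never hard-code such pointers.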

The third reason for incompatibility is obvious: Windows NT does not support VxDs. Viruses such as Memorial cannot operate on Windows NT because such viruses are VxD-based. They should have included different infection algorithms at the driver level for Windows NT and Windows 95 to operate on both systems, which would make them complicated.

If a Windows 95 virus can overcome the preceding incompatibility and implementation problems, it will eventually work on Windows NT/2000/XP/2003 as well. Such viruses might have Unicode support, but it is not mandatory. W32/Cabanas supports all of these features, being able to trespass the OS barrier imposed by early Windows 95 creations.

Both Boza and Cabanas are 32-bit Win32 programs. Cabanas infects files under Windows 95/98/Me (and any other localized versions) and under all major Windows NT–based systems releases, such as 3.51, 4.0, 5.0 (Windows 2000), and 5.1 (Windows XP). Boza replicates only under the English Windows 95 release. Therefore, the prefix part of the virus name is Win32 for Cabanas and Win95 for Boza.

4.4 Conclusion

This chapter has presented a great deal about computer virus infection techniques in files and other objects. It is important to be familiar with these techniques because they have a great impact on the design of antivirus engines. Even more importantly, they affect the analysis process for both manual and automated methods, which will be demonstrated in Chapter 15.

References

1. Adam Petho, ROM BIOS, 1989, ISBN: 963-553-129-X (Paperback).

2. Fridrik Skulason, “Azusa—Complicating the Recovery Process,” Virus Bulletin, April 1991, p. 23.

3. Jakub Kaminski, “Rainbow: To Envy or to Hate,” Virus Bulletin, September 1995, pp. 2-7.

4. Mike Lambert, “Circular Extended Partitions: Round and Round with DOS,” Virus Bulletin, September 1995, p. 14.

5. Fridrik Skulason, “Investigation: The Search for Den Zuk,” Virus Bulletin, 1991, pp. 6-7.

6. Mikko Hypponen, “Virus Activation Routines,” EICAR, 1995, pp. T3 1-11.

7. Fridrik Skulason, “Disk Killer,” Virus Bulletin, January 1990, pp. 12-13.

8. Jan Hruska, “Virus Writers and Distributors,” Virus Bulletin, July 1990, pp. 12-14.

9. Dr. Vesselin Bontchev, private communication, 1996.

10. Peter Morley, personal communication, 1999.

11. Peter Szor, “Coping with Cabanas,” Virus Bulletin, November 1997, pp. 10-12.

12. Peter Szor, “Olivia,” Virus Bulletin, June 1997, pp. 11-12.

13. Peter Szor, “Nexiv_Der: Tracing the Vixen,” Virus Bulletin, April 1996, pp. 11-12.

14. Peter Szor, “Shelling Out,” Virus Bulletin, February 1997, pp. 6-7.

15. Matt Pietrek, Windows Internals, Addison-Wesley, 1993, ISBN: 0-201-62217-3 (Paperback).

16. Adrian Marinescu, “Russian Doll,” Virus Bulletin, August 2003, pp. 7-9.

17. Peter Ferrie, “Unexpected Resutls [sic],” Virus Bulletin, June 2002, pp. 4-5.

18. Peter Szor, “Attacks on Win32,” Virus Bulletin Conference, 1998.

19. Peter Szor, “High Anxiety,” Virus Bulletin, January 1998, pp. 7-8.

20. Peter Szor, “Breaking the Lorez,” Virus Bulletin, October 1998, pp. 11-13.

21. Andrew Schulman, Unauthorized Windows 95, IDG Books, 1994, ISBN: 1-568-84305-4.
