Chapter 22. Malicious Logic

 

TITUS ANDRONICUS: Ah, wherefore dost thou urge the name of hands?
To bid Aeneas tell the tale twice o'er,
How Troy was burnt and he made miserable?

 --The Tragedy of Titus Andronicus, III, ii, 26–28.

Computer viruses, worms, and Trojan horses are effective tools with which to attack computer systems. Because they assume an authorized user's identity, they render most traditional access controls useless. This chapter presents several types of malicious logic, focusing on Trojan horses and computer viruses, and discusses defenses.

Introduction

Odysseus, of Trojan War fame, found the most effective way to breach a hitherto-impregnable fortress was to have people inside bring him in without knowing they were doing so [482, 1016]. The same approach works for computer systems.

  • Definition 22–1. Malicious logic is a set of instructions that cause a site's security policy to be violated.

Trojan Horses

A critical observation is the notion of being “tricked.” Suppose the user root unintentionally executed a script containing the two commands below (for example, by typing “ls” in a directory where an attacker had planted the script under the name ls). This would be a violation of the security policy. However, if root deliberately typed

cp /bin/sh /tmp/.xxsh
chmod u+s,o+x /tmp/.xxsh

the security policy would not be violated. This distinction is the crux of the problem of malicious logic. The system cannot determine whether the instructions being executed by a process are ones the user intends to execute or a set of instructions that the user does not intend to execute. The next definition makes this distinction explicit.

  • Definition 22–2. A Trojan horse is a program with an overt (documented or known) effect and a covert (undocumented or unexpected) effect.

Dan Edwards was the first to use this term [26]. Trojan horses are often used in conjunction with other tools to attack systems.

Trojan horses can make copies of themselves. One of the earliest Trojan horses was a version of the game animal. When this game was played, it created an extra copy of itself. These copies spread, taking up much room. The program was modified to delete one copy of the earlier version and create two copies of the modified program. Because it spread even more rapidly than the earlier version, the modified version of animal soon completely supplanted the earlier version. After a preset date, each copy of the later version deleted itself after it was played [290].

  • Definition 22–3. A propagating Trojan horse (also called a replicating Trojan horse) is a Trojan horse that creates a copy of itself.

Karger and Schell [552], and later Thompson [995], examined detection of Trojan horses. They constructed a Trojan horse that propagated itself slowly and in a manner that was difficult to detect. The central idea is that the Trojan horse modifies the compiler to insert itself into specific programs (in Thompson's example, the login program) and into future versions of the compiler itself.

Computer Viruses

This type of Trojan horse propagates itself only through specific programs (in the preceding example, the login program and the compiler). When the Trojan horse can propagate freely and insert a copy of itself into another file, it becomes a computer virus.

  • Definition 22–4. A computer virus is a program that inserts itself into one or more files and then performs some (possibly null) action.

The first phase, in which the virus inserts itself into a file, is called the insertion phase. The second phase, in which it performs some action, is called the execution phase. The following pseudocode fragment shows how a simple computer virus works.

beginvirus:
    if spread-condition then begin
        for some set of target files do begin
            if target is not infected then begin
                determine where to place virus instructions
                copy instructions from beginvirus to endvirus into target
                alter target to execute added instructions
            end;
        end;
    end;
    perform some action(s)
    goto beginning of infected program
endvirus:

As this code indicates, the insertion phase must be present but need not always be executed. For example, the Lehigh virus [470] would check for an uninfected boot file (the spread-condition mentioned in the pseudocode) and, if one was found, would infect that file (the set of target files). Then it would increment a counter and test to see if the counter was at 4. If so, it would erase the disk. These operations were the action(s).

Authorities differ on whether or not a computer virus is a type of Trojan horse. Most equate the purpose of the infected program with the overt action and consider the insertion and execution phases to be the covert action. To them, a computer virus is a Trojan horse [310, 516]. However, others argue that a computer virus has no covert purpose. Its overt purpose is to infect and execute. To these authorities, it is not a Trojan horse [204, 739]. In some sense this disagreement is semantic. In any case, defenses against a Trojan horse inhibit computer viruses.

According to Ferbrache [346], programmers wrote the first computer viruses on Apple II computers. A virus developed for research purposes in 1980 wrote itself to the disk boot sectors when the catalogue command was executed. Another, released by friends of its author before it was fully debugged, infected many copies of the game “Congo,” causing them to stop working. The author rewrote the virus to replace existing copies of itself with a fully debugged version; released into the wild, the new version rapidly supplanted the buggy copies.

In 1983, Fred Cohen was a graduate student at the University of Southern California. During a seminar on computer security, he described a type of Trojan horse that the teacher, Len Adleman, christened a computer virus [205]. To demonstrate the effectiveness of the proposed attack, Cohen designed a computer virus to acquire privileges on a VAX-11/750 running the UNIX operating system. He obtained all system rights in half an hour on average; the longest time was an hour, and the shortest was under five minutes. Because the virus did not noticeably degrade response time, most users never knew the system was under attack.

In 1984, an experiment involving a UNIVAC 1108 showed that viruses could spread throughout that system, too. Unlike the UNIX system, the UNIVAC partially implemented the Bell-LaPadula Model, using mandatory protection mechanisms.[2] Cohen's experiments indicated that security mechanisms that did not use mandatory access controls to inhibit writing did little, if anything, to inhibit computer virus propagation [204, 205].

The Brain (or Pakistani) virus, written for IBM PCs, is thought to have been created in early 1986 [346] but was first reported in the United States in October 1987. It alters the boot sectors of floppy disks, possibly corrupting files in the process. It also spreads to any uninfected floppy disks inserted into the system. Since then, numerous variations of this virus have been reported [471].

In 1987, computer viruses infected Macintosh, Amiga, and other computers. The MacMag Peace virus would print a “universal message of peace” on March 2, 1988, and then delete itself [355]. This computer virus infected copies of the Aldus FreeHand program, which were recalled by their manufacturer [346].

In 1987, Tom Duff experimented on UNIX systems with a small virus that copied itself into executable files. The virus was not particularly virulent, but when Duff placed 48 infected programs on the most heavily used machine in the computing center, the virus spread to 46 different systems and infected 466 files, including at least one system program on each computer system, within 8 days. Duff did not violate the security mechanisms in any way when he seeded the original 48 programs [312]. He also wrote a virus as a Bourne shell script that could attach itself to any UNIX program, demonstrating that computer viruses are not intrinsically machine-dependent and can spread among systems of varying architectures.

In 1989, Dr. Harold Joseph Highland developed a virus for Lotus 1-2-3 [471]. This virus, stored as a set of commands for that spreadsheet, was loaded automatically when a file was opened. Because the virus was intended for a demonstration only, it changed the value in a specific row and column and then spread to other files. This demonstrated that macros for office-type programs on personal computers could contain viruses.

Several types of computer viruses have been identified.

Boot Sector Infectors

The boot sector is the part of a disk used to bootstrap the system or mount a disk. Code in that sector is executed when the system “sees” the disk for the first time. When the system boots, or the disk is mounted, any virus in that sector is executed. (The actual boot code is moved to another place, possibly another sector.)

  • Definition 22–5. A boot sector infector is a virus that inserts itself into the boot sector of a disk.

Executable Infectors

  • Definition 22–6. An executable infector is a virus that infects executable programs.

The PC varieties of executable infectors are called COM and EXE viruses because they infect programs with those extensions. Figure 22-1 illustrates how infection can occur. The virus can prepend itself to the executable (as shown in the figure) or append itself.

Figure 22-1. How an executable infector works. It inserts itself into the program so that the virus code will be executed before the application code. In this example, the virus is 100 words long and prepends itself to the executable code.

Multipartite Viruses

  • Definition 22–7. A multipartite virus is one that can infect either boot sectors or applications.

Such a virus typically has two parts, one for each type. When it infects an executable, it acts as an executable infector; when it infects a boot sector, it works as a boot sector infector.

TSR Viruses

  • Definition 22–8. A terminate and stay resident (TSR) virus is one that stays active (resident) in memory after the application (or bootstrapping, or disk mounting) has terminated.

TSR viruses can be boot sector infectors or executable infectors. Both the Brain and Jerusalem viruses are TSR viruses.

Viruses that are not TSR execute only when the host application is executed (or the disk containing the infected boot sector is mounted). An example is the Encroacher virus, which appends itself to the ends of executables.

Stealth Viruses

  • Definition 22–9. Stealth viruses are viruses that conceal the infection of files.

These viruses intercept calls to the operating system that access files. If the call is to obtain file attributes, the original attributes of the file are returned. If the call is to read the file, the file is disinfected as its data is returned. But if the call is to execute the file, the infected file is executed.

Encrypted Viruses

Computer virus detectors often look for known sequences of code to identify computer viruses (see Section 22.7.4). To conceal these sequences, some viruses encipher most of the virus code, leaving only a small decryption routine and a random cryptographic key in the clear. Figure 22-2 summarizes this technique.

Figure 22-2. An encrypted virus. The ordinary virus code is at the left. The encrypted virus, plus encapsulating decryption information, is at the right.

  • Definition 22–10. An encrypted virus is one that enciphers all of the virus code except for a small decryption routine.
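
As a small illustration (ours, not the chapter's) of why enciphering defeats signature scanning, consider a toy one-byte XOR cipher: the same body enciphered under two different keys shares no fixed byte sequence for a scanner to match, yet the deciphering routine itself never changes. The placeholder "body" below stands in for the virus code; nothing depends on its contents.

import os

def xor_cipher(data: bytes, key: int) -> bytes:
    """Encipher or decipher data with a one-byte XOR key (a toy cipher)."""
    return bytes(b ^ key for b in data)

body = b"...placeholder bytes standing in for the virus body..."

k1 = os.urandom(1)[0] | 0x01    # random nonzero key
k2 = k1 ^ 0x80                  # a second, necessarily different key

# The two enciphered bodies differ, so a scanner looking for a fixed
# byte sequence in the body cannot match both instances.
print(xor_cipher(body, k1) != xor_cipher(body, k2))    # True

# Deciphering with the correct key restores the body exactly.
print(xor_cipher(xor_cipher(body, k1), k1) == body)    # True

The constancy of the deciphering routine is precisely the weakness that polymorphic viruses, discussed next, are designed to remove.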

Polymorphic Viruses

  • Definition 22–11. A polymorphic virus is a virus that changes its form each time it inserts itself into another program.

Consider an encrypted virus. The body of the virus varies depending on the key chosen, so detecting known sequences of instructions will not detect the virus. However, the decryption algorithm can be detected. Polymorphic viruses were designed to prevent this. They change the instructions in the virus to something equivalent but different. In particular, the deciphering code is the segment of the virus that is changed. In some sense, they are successors to the encrypting viruses and are often used in conjunction with them.

Consider polymorphism at the instruction level. All of the instructions

add 0 to operand
or 0 with operand
no operation
subtract 0 from operand

have exactly the same effect, but they are represented as different bit patterns on most architectures. A polymorphic virus substitutes such equivalent instructions for one another in the deciphering segment of code, changing the segment's appearance without changing its effect.

The production of polymorphic viruses at the instruction level has been automated. At least two tool kits, the Mutation Engine (MtE) and the Trident Polymorphic Engine (TPE), were available in 1992 [1064].

Polymorphism can exist at many levels. For example, a deciphering algorithm may have two completely different implementations, or two different algorithms may produce the same result. In these cases, the polymorphism is at a higher level and is more difficult to detect.

Macro Viruses

  • Definition 22–12. A macro virus is a virus composed of a sequence of instructions that is interpreted, rather than executed directly.

Conceptually, macro viruses are no different from ordinary computer viruses. Like Duff's sh computer virus, they can execute on any system that can interpret the instructions. For example, a spreadsheet virus executes when the spreadsheet interprets these instructions. If the macro language allows the macro to access files or other systems, the virus can access them, too.

A macro virus can infect either executables or data files (the latter leads to the name data virus). If it infects executable files, it must arrange to be interpreted at some point. Duff's experiments did this by wrapping the executables with shell scripts. The resulting executables invoked the Bourne shell, which interpreted the virus code before invoking the usual executable.

Macro viruses are not bound by machine architecture, because they are tied to specific applications: a macro virus targeted at Microsoft Word will work on any system running Microsoft Word. The effects may differ, however. Most Macintosh users did not use the particular mail program that the Melissa virus invoked, so although Macintosh Word files could be infected, and the infection could spread, the virus did not mail itself to other users. On Windows systems, where most users did use that mail program, the infection also spread by mail.

Computer Worms

A computer virus infects other programs. A variant is a program that spreads from computer to computer, spawning copies of itself on each one.

  • Definition 22–13. A computer worm is a program that copies itself from one computer to another.

Research into computer worms began in the mid-1970s. Shoch and Hupp [889] developed distributed programs to do computer animations, broadcast messages, and perform other computations. These programs probed workstations; if a workstation was idle, the worm copied a segment onto that system. The segment was given data to process and communicated with the worm's controller. When any activity other than the segment's own began on the workstation, the segment shut down.

Since then, there have been several incidents involving worms. The Father Christmas worm was interesting because it was a form of macro worm.

Other Forms of Malicious Logic

Malicious logic can have other effects, alone or in combination with the effects discussed in Sections 22.2 to 22.4.

Rabbits and Bacteria

Some malicious logic multiplies so rapidly that resources become exhausted, creating a denial-of-service attack.

  • Definition 22–14. A bacterium or a rabbit is a program that absorbs all of some class of resource.

A bacterium need not absorb all resources on the system. Exhausting a specific class of resource, such as file descriptors or process table slots, may not affect currently running processes, but it will prevent new processes from starting.

Logic Bombs

Some malicious logic triggers on an external event, such as a user logging in or the arrival of midnight on Friday the 13th.

  • Definition 22–15. A logic bomb is a program that performs an action that violates the security policy when some external event occurs.

Disaffected employees who plant Trojan horses in systems often use logic bombs. The triggering events are typically tied to the employee's grievance, such as deleting the payroll roster when the employee's own name is removed from it.

Theory of Malicious Logic

The types of malicious logic discussed so far are not distinct. Computer viruses are a form of Trojan horse. Computer viruses may contain logic bombs, as may computer worms. Some worms and viruses are bacteria because they absorb all the resources of some type.

An obvious question is whether a universal detector can be written to detect malicious logic. We consider the narrower question of whether there is an algorithm that can determine if an arbitrary program contains replicating code (this covers both replicating Trojan horses and computer viruses).

Theory of Computer Viruses

Cohen [207] asked if a single algorithm could detect computer viruses precisely. He demonstrated that the virus detection problem, like the safety problem (see Theorem 3–2), is undecidable.

  • Definition 22–16. [207] Let T be a Turing machine and let V be a sequence of symbols on the machine tape. Let sv be a distinguished state of T. For every v ∈ V, when T lies at the beginning of v in tape square k, suppose that, after some number of instructions are executed, a sequence v' ∈ V lies on the tape beginning at location k', where either k + |v| ≤ k' or k' + |v'| ≤ k. Then (T, V) is a viral set and the elements of V are computer viruses.

Figure 22-3 illustrates this definition. The virus v may copy another element of V either before or after itself but may not overwrite itself. Both possibilities are shown. If v' precedes v, then k' + |v'| ≤ k; otherwise, v precedes v', and k + |v| ≤ k'. Definition 22–16 is a formal version of Definition 22–4. It focuses on the replication (copying) aspect of computer viruses but includes the execution phase as a component of v that need not be copied. In this case, v' would be the infection part of v, and the actions other than infection would be the remainder of v.

Figure 22-3. Illustration of Cohen's definition of a viral set. Here, v, v', k, and k' are as in Definition 22–16, and |v| = j. The Turing machine can make copies of v either before or after the tape squares containing v but does not overwrite any part of v. Each diagram shows a possible position for v' with respect to v on the tape.

Cohen established the undecidability of detecting generic computer viruses by showing that, if such a decision procedure existed, it would solve the halting problem. Consider an arbitrary Turing machine T and an arbitrary sequence S of symbols on tape. Construct a second Turing machine T' and tape V such that, when T halts on S, V and T' create a copy of S on the tape. Then T' replicates S if and only if T halts on S. By Definition 22–16, a replicating program is a computer virus. So, there is a procedure that decides if (T', V) is a viral set if and only if there is a procedure that determines if T halts on S—that is, if there is a procedure that will solve the halting problem. Because the latter does not exist, neither can the former.
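
The shape of this reduction can be sketched in a few lines of Python (our sketch, not Cohen's notation; is_viral_set, simulate, and replicate_tape are hypothetical names standing in for the constructions just described):

def is_viral_set(program: str) -> bool:
    """Hypothesized perfect detector of replicating code.
    Theorem 22-1 shows that no such algorithm can exist."""
    raise NotImplementedError("no such decision procedure exists")

def halts(machine: str, tape: str) -> bool:
    """If is_viral_set existed, this function would decide the halting problem."""
    # Build T': first simulate `machine` on `tape`; only if that simulation
    # finishes does T' copy the tape contents (that is, replicate).
    t_prime = (
        f"simulate({machine!r}, {tape!r})\n"    # runs forever if T does not halt
        "replicate_tape()\n"                    # reached if and only if T halts
    )
    # T' replicates exactly when T halts, so deciding whether T' is viral
    # decides halting, contradicting the undecidability of the halting problem.
    return is_viral_set(t_prime)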

  • Theorem 22–1. [207] It is undecidable whether an arbitrary program contains a computer virus.

Proof. Let T and V define a Turing machine and a sequence of tape symbols, respectively. We construct a second Turing machine T' and sequence V' such that T' reproduces V if and only if running T on V halts.

Let A and B be tape symbols, so A, B ∈ M. Let qi, i ≥ 1, be states of the Turing machine, so qi ∈ K for i ≥ 1. Let a, b, i, and j be non-negative integers. We also redefine the function δ as δ: K × M → K × M × {L, R, –}, where “–” refers to no motion. This function is equivalent to the δ function in Section 3.2 (see Exercise 13).

We will find it convenient to abbreviate arguments and values of δ as follows. Let x, y, z, u, and si, i ≥ 1, represent values drawn from the set of tape symbols M. We can then write

  • δ(qa, y) = (qa, y, L) when y ≠ A

to represent all definitions of δ where the first argument to δ is qa and the second argument to δ is an element of M other than A.

Three actions recur in our construction of T'. We define abbreviations to simplify the description of that Turing machine. For any symbol x ∈ M, LS(qa, x, qb) represents the sequence

  • δ(qa, x) = (qb, x, –)

  • δ(qa, y) = (qa, y, L) when y ≠ x

This sequence takes effect when the Turing machine is in state qa. It moves the head to the left, skipping over tape squares, until the machine encounters a square containing the symbol x. At that point, the Turing machine enters state qb, and the head remains over the square containing x.

The abbreviation RS(qa, x, qb) is defined similarly, but for motion to the right:

  • δ(qa, x) = (qb, x, –)

  • δ(qa, y) = (qa, y, R) when y ≠ x

This sequence moves the head to the right until a square containing x is found. The head stops at that square.

The third abbreviation, COPY(qa, x, y, z, qb), means that the Turing machine's head moves right to the next square containing the symbol x and copies the symbols on the tape until the next square with the symbol y is encountered. The copy is placed after the first symbol z following the symbol y. Once the copying is completed, the Turing machine enters state qb.

The following sequence captures this. The part of each line following the semicolon is a comment, for exposition purposes only. We assume that the symbols A and B do not occur on the tape. If necessary, we augment the set M with two symbols and use them for A and B.

RS(qa, x, qa+i)                                ; move the head over the next x
δ(qa+i, x) = (qa+i+1, A, –)                    ; replace x with symbol A
RS(qa+i+1, y, qa+i+2)                          ; skip to the end of the segment to copy
RS(qa+i+2, z, qa+i+3)                          ; skip to the location to copy it to (which
δ(qa+i+3, z) = (qa+i+4, z, R)                  ; is the square after the one containing z)
δ(qa+i+4, u) = (qa+i+5, B, –) for any u ∈ M    ; mark it with B
LS(qa+i+5, A, qa+i+6)                          ; move the head back to where x was
δ(qa+i+6, A) = (qa+i+7, x, –)                  ; put x back
δ(qa+i+7, sj) = (qa+i+5j+10, A, R) for sj ≠ y  ; overwrite the first uncopied symbol
δ(qa+i+7, y) = (qa+i+8, y, R)                  ; for the terminal one, go to cleanup
RS(qa+i+5j+10, B, qa+i+5j+11)                  ; move to location to copy symbol to
δ(qa+i+5j+11, B) = (qa+i+5j+12, sj, R)         ; put it down
δ(qa+i+5j+12, u) = (qa+i+5j+13, B, –)          ; mark where the next symbol goes
LS(qa+i+5j+13, A, qa+i+5j+14)                  ; go back to where the original was
δ(qa+i+5j+14, A) = (qa+i+7, sj, R)             ; copy it back
RS(qa+i+8, B, qa+i+9)                          ; last symbol—move to where it goes
δ(qa+i+9, B) = (qb, y, –)                      ; write it and enter terminal state

We proceed to construct T' and V'. Define the set of symbols in T' to be

  • M' = { A, B, C, D } ∪ M

where A, B, C, D ∉ M, and the set of states to be

  • K' = { qa, qb, qc, qd, qe, qf, qg, qh, qH } ∪ K

where qa, qb, qc, qd, qe, qf, qg, qh, qH ∉ K. The initial state of T' is qa, and the halting state of T' is qH. The state qf of T' corresponds to the initial state of T, and the state qh simulates the halting state of T. We abbreviate the execution of T on the tape, with the head at the current position, as SIMULATE(qf, T, qh).

Let V' = (A, B, V, C, D). Then the transition function δ for T' is:

δ(qa, A) = (qb, A, –)              ; check beginning of tape
δ(qa, y) = (qH, y, –) for y ≠ A    ; halting state
COPY(qb, B, C, D, qc)              ; copy V after D
LS(qc, A, qd)                      ; head moves to D (T executes the copy
RS(qd, D, qe)                      ; of V, not the original one)
δ(qe, D) = (qe, D, R)              ; move over the D
δ(qe, B) = (qf, B, R)              ; enter the initial state of T with the
                                   ; head at the beginning of V
SIMULATE(qf, T, qh)                ; simulate T running on V
LS(qh, A, qg)                      ; T terminated—go to beginning of T''s tape
COPY(qg, A, D, D, qH)              ; copy initial contents over results of
                                   ; running T on V (reproduction)

The Turing machine T' first makes a copy of V. It then simulates T running on the copy of V. The original V is to the left of the copy (see Figure 22-4), so the simulation of T cannot alter it. If the simulation halts, T' enters state qh, in which the original copy of V is recopied. This satisfies Definition 22–16. On the other hand, if the simulation never halts, V is never recopied, and Definition 22–16 is never satisfied. This establishes the desired result.

Figure 22-4. The tape V' at state qf. The head is positioned over the tape for T. Note that, when T is being simulated, the head can never move left over B because T cannot move to the left of the (simulated) tape.

Adleman used a completely different approach to obtain a generalization of this result, which we state without proof.

  • Theorem 22–2. [9] It is undecidable whether an arbitrary program contains malicious logic.

These results mean that there is no generic technique for detecting all malicious logic, or even all computer viruses. Hence, defenses must focus on particular aspects of malicious logic. Furthermore, multiple defenses are needed. We turn to these defenses now.

Defenses

Defending against malicious logic takes advantage of several characteristics of malicious logic to detect it or to block its execution. The mechanisms are imprecise: they may allow malicious logic that does not exhibit the targeted characteristic to proceed, and they may prevent programs that are not malicious but do exhibit that characteristic from proceeding.

Malicious Logic Acting as Both Data and Instructions

Some malicious logic acts as both data and instructions. A computer virus inserts code into another program. During the writing, the object being written into the file (the set of virus instructions) is data. When the virus then executes itself, the instructions it executes are the same ones it has just written; now the object is treated as an executable set of instructions. Protection mechanisms based on this property treat all programs as type “data” until some certifying authority changes the type to “executable” (instructions). Both new systems designed to meet strong security policies and enhancements of existing systems use these methods (see Section 15.3.1).
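
A minimal sketch of this certification scheme follows (our illustration, using an in-memory type table and a single hypothetical authority named "certifier"; real systems such as LOCK enforce the types below the application layer):

from enum import Enum

class FType(Enum):
    DATA = "data"
    EXECUTABLE = "executable"

type_table: dict[str, FType] = {}    # object name -> current type

def create(path: str) -> None:
    """New objects are typed as data; nothing starts out executable."""
    type_table[path] = FType.DATA

def certify(path: str, authority: str) -> None:
    """Only the certifying authority may relabel data as executable."""
    if authority != "certifier":
        raise PermissionError("only the certifying authority may relabel")
    type_table[path] = FType.EXECUTABLE

def execute(path: str) -> None:
    """The loader refuses to run anything still typed as data."""
    if type_table.get(path) is not FType.EXECUTABLE:
        raise PermissionError(path + " is typed as data; execution denied")
    print("running " + path)

# A virus that writes itself into a new file has created data, not
# instructions; until a certifier relabels the file, execute() rejects it.
create("/tmp/payload")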

Both the LOCK scheme and Duff's proposal trust that the administrators will never certify a program containing malicious logic (either by accident or deliberately) and that the tools used in the certification process are not themselves corrupt.

Malicious Logic Assuming the Identity of a User

Because a user (unknowingly) executes malicious logic, that code can access and affect objects within the user's protection domain. So, limiting the objects accessible to a given process run by the user is an obvious protection technique. This draws on the mechanisms for confining information (see Chapter 17, “Confinement Problem”).

Information Flow Metrics

Cohen suggests an approach that limits the distance a virus can spread [206].

  • Definition 22–17. Define the flow distance metric fd(x) for some information x as follows. Initially, all information has fd(x) = 0. Whenever x is shared, fd(x) increases by 1. Whenever x is used as input to a computation, the flow distance of the output is the maximum of the flow distances of the inputs.

Information is accessible only while its flow distance is less than some particular value.
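
A sketch of how the metric might be tracked (Definition 22–17 prescribes the metric, not an implementation; the class, the bound of 2, and the user names below are our own illustration):

from dataclasses import dataclass

BOUND = 2    # arbitrary bounding constant for this illustration

@dataclass
class Info:
    data: bytes
    fd: int = 0    # flow distance; 0 when the information is created

def share(x: Info) -> Info:
    """Sharing information increments the flow distance of the copy."""
    return Info(x.data, x.fd + 1)

def compute(*inputs: Info) -> Info:
    """A computation's output carries the maximum input flow distance."""
    return Info(b"".join(i.data for i in inputs), max(i.fd for i in inputs))

def access(x: Info) -> bytes:
    """Information is accessible only while fd is below the bound."""
    if x.fd >= BOUND:
        raise PermissionError("flow distance bound exceeded")
    return x.data

anne = Info(b"program text")    # Anne writes a program (fd = 0)
bill = share(anne)              # Bill copies it (fd = 1): still accessible
cathy = share(bill)             # Cathy copies Bill's copy (fd = 2): access() raises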

This example also shows the problem with the flow distance policy (which constrains sharing based on the flow distance metric). Although Cathy cannot be infected by viruses that Bill has acquired, she can be infected by viruses that Bill has written. (For example, had Cathy run Anne's dovirus program, her files would have been infected.) The bounding constant limits the transitivity of trust, so it should be low. If it is 1, Cathy trusts only the people from whom she copies files directly, and no one that they trust.

This mechanism raises interesting implementation issues. The metric is associated with information and not objects. Rather than tagging specific information in files, systems implementing this policy would most likely tag objects, treating the composition of different information as having the maximum flow distance of the information. This will inhibit sharing.

Ultimately, the only way to use this policy is to make the bounding constant 0. This isolates each user into his or her own protection domain and allows no sharing. Cohen points out that this defeats the main purpose of scientific or development environments, in which users build on the work of others.

Reducing the Rights

The user can reduce her associated protection domain when running a suspect program. This follows from the principle of least privilege (see Section 13.2.1). Wiseman discusses one approach [1055], and Juni and Ponto present another idea in the context of a medical database [532].

Although effective, this approach raises the question of how to determine which entries should be in the authorization denial subsets. Karger suggests basing access on the program being executed and some characteristic of the file being accessed.

A related approach is to base access to files on some characteristic of the command or program [206], possibly including subject authorizations as well [204].

An alternative mechanism is interception of requests to open files. The “watchdog” or “guardian” then performs a check to determine if the access is to be allowed. This effectively redefines the system calls involved. The issues of determining how to write watchdogs to meet the desired goals and allowing users to specify semantics for file accesses [88, 259] may prove useful in some contexts—for example, in protecting a limited set of files.
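
The flavor of a watchdog can be sketched in Python by interposing on the built-in open function (a toy stand-in for redefining the system call; the protected file names are hypothetical):

import builtins

_real_open = builtins.open
PROTECTED = {"/etc/passwd", "payroll.db"}    # hypothetical guarded files

def guarded_open(path, mode="r", *args, **kwargs):
    """Watchdog: veto writes to protected files before the real open runs."""
    wants_write = any(flag in mode for flag in ("w", "a", "x", "+"))
    if wants_write and str(path) in PROTECTED:
        raise PermissionError("watchdog: write access to " + str(path) + " denied")
    return _real_open(path, mode, *args, **kwargs)

builtins.open = guarded_open    # every later open() call is checked first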

All such mechanisms (1) trust the users to take explicit actions to limit their protection domains sufficiently, (2) trust tables to describe the programs' expected actions sufficiently for the mechanisms to apply those descriptions and to handle commands with no corresponding table entries effectively, or (3) trust specific programs and the kernel when they would be the first programs malicious logic would attack.

Sandboxing

Sandboxes and virtual machines (see Section 17.2) implicitly restrict process rights. A common implementation of this approach is to restrict the program by modifying it. Usually, special instructions inserted into the object code cause traps whenever an instruction violates the security policy. If the executable dynamically loads libraries, special libraries with the desired restrictions replace the standard libraries.
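
As a concrete illustration (ours; the specific limits are arbitrary), POSIX resource limits can shrink the rights of a suspect child process on a Unix-like system:

import resource
import subprocess

def run_sandboxed(cmd: list[str]) -> int:
    """Run cmd as a child process with tightened resource limits."""
    def limit() -> None:
        # Runs in the child after fork and before exec, so only the
        # suspect program is constrained, not the invoking process.
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))              # 5 s of CPU
        resource.setrlimit(resource.RLIMIT_AS, (64 * 2**20,) * 2)    # 64 MiB of memory
        resource.setrlimit(resource.RLIMIT_NOFILE, (32, 32))         # few open files
    return subprocess.run(cmd, preexec_fn=limit).returncode

# Example: run_sandboxed(["./suspect-program"])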

Malicious Logic Crossing Protection Domain Boundaries by Sharing

Inhibiting users in different protection domains from sharing programs or data will inhibit malicious logic from spreading among those domains. This takes advantage of the separation implicit in integrity policies (see Chapter 6).

A more general proposal [1066] suggests that programs to be protected be placed at the lowest possible level of an implementation of a multilevel security policy. Because the mandatory access controls prevent processes at higher levels from writing to objects at lower levels, any process can read the programs but none can alter them. To prevent both disclosure and file corruption, such a scheme would have to be combined with an integrity model.

Carrying this idea to its extreme would result in isolation of each domain. Because sharing would not be possible, no viruses could propagate. Unfortunately, the usefulness of such systems would be minimal.

Malicious Logic Altering Files

Mechanisms using manipulation detection codes (MDCs) apply some function to a file to obtain a set of bits called the signature block and then protect that block. If the recomputed signature block differs from the stored one, the file has changed, possibly as a result of malicious logic altering it. This mechanism relies on selection of good cryptographic checksums (see Section 9.4).
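
A minimal sketch of such a mechanism, assuming a keyed checksum (HMAC-SHA256 here) so that a valid signature block cannot be recomputed without the secret key:

import hashlib
import hmac

def signature_block(path: str, key: bytes) -> str:
    """Compute a keyed manipulation detection code over the file's contents."""
    with open(path, "rb") as f:
        return hmac.new(key, f.read(), hashlib.sha256).hexdigest()

def build_database(paths: list[str], key: bytes) -> dict[str, str]:
    """Record signature blocks while the files are known to be clean."""
    return {p: signature_block(p, key) for p in paths}

def verify(db: dict[str, str], key: bytes) -> list[str]:
    """Return the files whose recomputed block differs from the stored one."""
    return [p for p, sig in db.items()
            if not hmac.compare_digest(sig, signature_block(p, key))]

# Build the database when the system is trusted, keep it (and the key)
# offline, and rerun verify() later; any name returned has been altered.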

An assumption is that the signed file does not contain malicious logic before it is signed. Page [793] has suggested expansion of Boebert and Kain's model [126] to include the software development process (in effect, limiting execution domains for each development tool and user) to ensure that software is not contaminated during development.

All integrity-based schemes rely on software that, if infected, may fail to report tampering. Performance suffers because encrypting the file or computing the signature block may take a significant amount of time. The key must also be kept secret; if it is not, malicious logic can alter a signed file and recompute a valid signature block, so the change goes undetected.

Antivirus scanners check files for specific viruses and, if a virus is present, either warn the user or attempt to “cure” the infection by removing the virus. Many such agents exist for personal computers, but because each agent must look for a particular virus or set of viruses, they are very specific tools and, because of the undecidability results stated earlier, cannot deal with viruses not yet analyzed.
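
At its core, such a scanner searches files for known byte sequences. A sketch, with a deliberately fictitious signature table:

SIGNATURES = {
    "example-virus-a": bytes.fromhex("deadbeef90904c05"),    # made-up pattern
    "example-virus-b": b"\x4d\x5a\x13\x37\x13\x37",          # made-up pattern
}

def scan(path: str) -> list[str]:
    """Return the names of any known signatures found in the file."""
    with open(path, "rb") as f:
        data = f.read()
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

# As the undecidability results predict, only viruses already in the table
# are detected; a new or polymorphic virus slips past unmatched.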

Malicious Logic Performing Actions Beyond Specification

Fault-tolerant techniques keep systems functioning correctly when the software or hardware fails to perform to specifications. Joseph and Avizienis have suggested treating the infection and execution phases of a virus as errors. The first such proposal [529, 530] breaks programs into sequences of nonbranching instructions and checksums each sequence, storing the results in encrypted form. When the program is run, the processor recomputes checksums, and at each branch a coprocessor compares the computed checksum with the encrypted checksum; if they differ, an error (which may be an infection) has occurred. Later proposals advocate checking of each instruction [260]. These schemes raise issues of key management and protection, as well as the degree to which the software that manages the keys, transmits the control flow graph to the coprocessor, and implements the recovery mechanism can be trusted.

A proposal based on N-version programming [48] requires implementation of several different versions of an algorithm, running them concurrently and periodically checking their intermediate results against each other. If they disagree, the value assumed to be correct is the intermediate value that a majority of the programs have obtained, and the programs with different values are malfunctioning (possibly owing to malicious logic). This requires that a majority of the programs are not infected and that the underlying operating system is secure. Also, Knight and Leveson [574] question the efficacy of N-version programming. Detecting the spread of a virus would require voting on each file system access. To achieve this level of comparison, the programs would all have to implement the same algorithm, which would defeat the purpose of using N-version programming [575].
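
The voting step itself is simple; in the sketch below (ours), three lambdas stand in for independently developed implementations of one specification:

from collections import Counter
from typing import Callable, Sequence

def vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on the same input and accept the majority result."""
    results = [v(x) for v in versions]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(versions) // 2:
        raise RuntimeError("no majority; the faulty versions cannot be masked")
    for i, r in enumerate(results):
        if r != value:
            # Disagreeing versions are malfunctioning (possibly infected).
            print("version", i, "disagrees with the majority")
    return value

# Three nominally independent implementations of "double the input";
# the third is faulty (or infected).
print(vote([lambda x: 2 * x, lambda x: x + x, lambda x: 2 * x + 1], 21))    # 42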

Proof-Carrying Code

Necula has proposed a technique that combines specification and integrity checking [762]. His method, called proof-carrying code (PCC), requires a “code consumer” (user) to specify a safety requirement. The “code producer” (author) generates a proof that the code meets the desired safety property and integrates that proof with the executable code. This produces a PCC binary. The binary is delivered (through the network or other means) to the consumer. The consumer then validates the safety proof and, if it is correct, can execute the code knowing that it honors that policy. The key idea is that the proof consists of elements drawn from the native code. If the native code is changed in a way that violates the safety policy, the proof is invalidated and will be rejected.

Malicious Logic Altering Statistical Characteristics

Like human languages, programs have specific statistical characteristics that malicious logic might alter. Detection of such changes may lead to detection of malicious logic.

Comparison of object and source may reveal that the object file contains conditionals not corresponding to any in the source. In this case, the object may be infected [385]. Similar proposals suggest examination of the appearance of programs for identical sequences of instructions or byte patterns [516, 1066]. The disadvantage of such comparisons is that they require large numbers of comparisons and need to take into account the reuse of common library routines or of code [564].

Another proposal suggests that a filter be designed to detect, analyze, and classify all modifications that a program makes as ordinary or suspicious [247]. Along the same lines, Dorothy Denning suggests the use of an intrusion-detection expert system[6] to detect viruses by looking for increases in file size, increases in the frequency of writing to executable files, or alterations in the frequency of execution of a specific program in ways that do not match the profiles of users who are spreading the infection [270].

The Notion of Trust

The effectiveness of any security mechanism depends on the security of the underlying base on which the mechanism is implemented and on the correctness of the implementation. If the trust in the base or in the implementation is misplaced, the mechanism will not be secure. Thus, “secure,” like “trust,” is a relative notion, and the design of any mechanism for enhancing computer security must balance the cost of the mechanism against the level of security desired and the degree of trust in the base that the site accepts as reasonable. Research dealing with malicious logic assumes that the interface, software, and/or hardware used to implement the proposed scheme will perform exactly as desired; that is, the trust is placed in the underlying computing base, the implementation, and (if done) the verification.

Summary

Malicious logic is a perplexing problem. It highlights the impotence of standard access controls, because authorized users are requesting authorized actions. The security controls cannot determine if the user knows about such actions.

The most exciting idea is the separation of data from instructions. It unites notions of strong typing with security. In addition to blocking much malicious logic, it has applications for security in general (see Chapter 23, “Vulnerability Analysis,” for examples).

Currently, file scanners are the most popular defensive mechanism. Both integrity scanners and antivirus scanners look for changes in files. Antivirus scanners (which also check for some nonvirus Trojan horses) use a database of virus signatures. New dictionaries of these signatures are released periodically, or in the event of a major virus attack. For example, updated virus dictionaries were released within hours after Melissa's discovery.

Integrity scanners check for changes in files, but without determining their causes. If the contents of a file have changed since the last scan, the integrity checker reports this fact, but another agency (user, program) must determine the reason for the change.

Research Issues

Malicious logic is a fertile ground for study, because the problem is simple but defies easy solution. The key observation is that any solution must distinguish between the actions that users knowingly perform and those same actions when users unknowingly perform them. Humans have a difficult time determining if the actions of others are deliberate, and so how can computers be endowed with such powers of discrimination? This raises three issues for research: human interaction, integrity checking, and analysis of actions.

Effective procedural mechanisms will prevent users from downloading suspect programs, but how can users be persuaded to abide by these rules, and how can the effects of violating these rules be ameliorated? The notion of “sandboxing,” or restriction of privileges (as discussed in Section 22.7.2.3), is intuitively appealing but difficult to put into practice. One issue is how to define the sandbox. The system on which the program is to be run can define the domain of execution (as some Web browsers do) or can be constrained through a combination of the system and of the program itself. In the latter case, the program carries credentials and the receiving system checks them. Both the credentials and the way in which they are checked influence the effectiveness of the reduced domain of execution.

Integrity checking is another area of active research. Cryptographic checksums have been discussed in Section 9.4, and integrity models in Chapter 6. The application of integrity models and the protection and updating of checksums are central to system security. Networks complicate the problem.

Analysis of actions for anomalies is the basis for one form of intrusion detection. Among the issues is how to characterize the expected behavior of a program precisely enough that the anomalies a virus introduces can be distinguished from normal variation. Because computer viruses typically increase the number of writes (during the infection phase and possibly during the execution phase), examining this number may be fruitful, but other behaviors, such as transitions between localities within the program, are also affected. Could these behaviors be detected?

Further Reading

Fites, Johnston, and Kratz [355], Hruska [495], and Levin [624] present overviews of computer viruses and their effects. The National Institute of Standards and Technology Special Publication 500-166 [1025] discusses management techniques for minimizing the threats of computer viruses. Spafford, Heaphy, and Ferbrache's book [956] is well written and gives a good exposition of the state of the art in the late 1980s. Arnold [39] and Ludwig [645] describe how to write computer viruses; Arnold's book includes sample code for UNIX systems. Cohen's short course on computer viruses [208] is an excellent technical survey. McIlroy's essay [679] presents a wonderful overview of computer viruses.

Denning's essay [281] presents the nomenclature for malicious logic used in this chapter. His anthology [282], and that of Hoffman [476], collect many of the seminal, and most interesting, papers in the study of malicious logic. Parker [799], Whiteside [1041], and others describe attacks on systems using various forms of malicious logic in a more informal (and enjoyable) manner.

Appel and Felty [35] discuss a semantic model for proof-carrying code.

Exercises

1:

Tripwire does not encipher the signature blocks. What precautions must installers take to ensure the integrity of the database?

2:

Consider how a system with capabilities as its access control mechanism could deal with Trojan horses.

  1. In general, do capabilities offer more or less protection against Trojan horses than do access control lists? Justify your answer in light of the theoretical equivalence of ACLs and C-Lists.

  2. Consider now the inheritance properties of new processes. If the creator controls which capabilities the created process is given initially, how could the creator limit the damage that a Trojan horse could do?

  3. Can capabilities protect against all Trojan horses? Either show that they can or describe a Trojan horse process that C-Lists cannot protect against.

3:

Describe in detail how an executable infecting computer virus might append itself to an executable. What changes must it make to the executable, and why?

4:

A computer system provides protection using the Bell-LaPadula policy. How would a virus spread if:

  1. the virus were placed on the system at system low (the compartment that all other compartments dominate)?

  2. the virus were placed on the system at system high (the compartment that dominates all other compartments)?

5:

A computer system provides protection using the Biba integrity model. How would a virus spread if:

  1. the virus were placed on the system at system low (the compartment that all other compartments dominate)?

  2. the virus were placed on the system at system high (the compartment that dominates all other compartments)?

6:

A computer system provides protection using the Chinese Wall model. How would a virus spread throughout the system if it were placed within a company dataset? Assume that it is a macro virus.

7:

Discuss controls that would prevent Dennis Ritchie's bacterium (see Section 22.5.1) from absorbing all system resources and causing a system crash.

8:

How could Thompson's rigged compiler be detected?

9:

Place the SAT/LOCK mechanism of treating instructions and data as separate types into the framework of the Clark-Wilson model. In particular, what are the constrained data objects, the transaction procedures, and the certification and enforcement rules?

10:

Critique Lai and Gray's virus prevention mechanism described in Section 22.7.2.2. In particular, how realistic is its assessment of the set of programs to be trusted? Are there programs that they omitted or that they should have omitted?

11:

Design a signature detection scheme to detect polymorphic viruses, assuming that no encipherment of virus code was used.

12:

Assume that the Clark-Wilson model is implemented on a computer system. Could a computer virus that scrambled constrained data items be introduced into the system? Why or why not? Specifically, if not, identify the precise control that would prevent the virus from being introduced, and explain why it would prevent the virus from being introduced; if yes, identify the specific control or controls that would allow the virus to be introduced and explain why they fail to keep it out.

13:

Prove that the δ function defined in Section 22.6.1 is equivalent to the δ function in Section 3.2.



[1] See [995], p. 763.

[2] Specifically, it implemented the simple security condition but not the *-property [516].

[3] According to Compulit, as cited in [471], “[t]he author of the virus apparently forgot to set the signature during .EXE file infection. This will cause multiple infections of .EXE files” (p. 47). Analysts at the Hebrew University of Jerusalem found that the size of a .COM file increased only one time, but the size of a .EXE file increased every time the file was executed.

[4] See [346], p. 75.

[5] We use the conventional terminology of calling this program a “computer worm” because its dominant method of propagation was from computer system to computer system. Others, notably Eichin and Rochlis [322], have labeled it a “computer virus.”

[6] Chapter 25, “Intrusion Detection,” discusses this system in more detail.
