Chapter 14. Representing Identity

 

AEMELIA: Most mighty duke, behold a man much wrong'd.
[All gather to see them.]
ADRIANA: I see two husbands, or mine eyes deceive me!
DUKE SOLINUS: One of these men is Genius to the other;
And so of these, which is the natural man,
And which the spirit? Who deciphers them?
DROMIO OF SYRACUSE: I, sir, am Dromio: command him away.
DROMIO OF EPHESUS: I, sir, am Dromio: pray, let me stay.

 
 --The Comedy of Errors, V, i, 332–338

The theme of identity runs throughout humanity's experience, and computers are no exception. In computer science, an identity is the basis for assignment of privileges and is integral in the designation of a protection domain. This chapter discusses the many different types of identity and the contexts in which they arise. It begins with the identity of a principal on a system, first singly and then as defined by function. Designation of identity for certificates follows, as does identity on a network with respect to both individual processes and individual hosts. The chapter concludes with the notion of an anonymous user.

What Is Identity?

Identity is simply a computer's representation of an entity.

  • Definition 14–1. A principal is a unique entity. An identity specifies a principal.

Authentication binds a principal to a representation of identity internal to the computer. Each system has its own way of expressing this representation, but all decisions of access and resource allocation assume that the binding is correct.

Identities are used for several purposes. The two main ones are for accountability and for access control. Accountability requires an identity that tracks principals across actions and changes of other identities, so that the principal taking any action can be unambiguously identified. Access control requires an identity that the access control mechanisms can use to determine if a specific access (or type of access) should be allowed.

Accountability is tied to logging and auditing. It requires an unambiguous identification of the principal involved. On many systems, this is not possible. Instead, the logged identity maps to a user account, to a group, or to a role.

Most systems base access rights on the identity of the principal executing the process. That is, all processes executed by user bishop have some set of rights. All processes executed by user holly have a set of rights that may differ from those that bishop's processes have. However, a process may have fewer rights than the principal executing it, and in fact there are substantial reasons to reduce privileges. Chapter 15, “Access Control Mechanisms,” discusses this topic in more depth.

Files and Objects

The identity of a file or other entity (here called an “object”) depends on the system that contains the object.

Local systems identify objects by assigning names. The name may be intended for human use (such as a file name), for process use (such as a file descriptor or handle), or for kernel use (such as a file allocation table entry). Each name may have different semantics.

If the object resides on a different system, the name must encode the location of the object.

One file may have multiple names. The semantics of the system determine the effects of each name. For example, some systems define “deleting a file” to mean removing the given file name. The file object itself will not be deleted until all its names (or all names meeting certain conditions) have been deleted. Section 28.3.1.3, “File Deletion,” discusses this issue further.
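On POSIX-style systems this deletion semantics is visible through hard links: a file object persists until its last name is removed. A minimal Python sketch (assuming a POSIX filesystem; the file names are arbitrary):

```python
import os
import tempfile

# POSIX deletion semantics: removing one name does not delete the file
# object; the object persists until its link count drops to zero.
d = tempfile.mkdtemp()
name1 = os.path.join(d, "report.txt")
name2 = os.path.join(d, "report-link.txt")

with open(name1, "w") as f:
    f.write("data")

os.link(name1, name2)              # a second name for the same object
links = os.stat(name1).st_nlink    # 2: two names, one file object

os.unlink(name1)                   # "deleting the file" removes one name
survives = os.path.exists(name2)   # True: the object has another name
content = open(name2).read()       # "data"

os.unlink(name2)                   # last name removed; object now deleted
print(links, survives, content)
```
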

Users

In general, a user is an identity tied to a single entity. Specific systems may impose additional constraints. Systems represent user identity in a number of different ways; indeed, the same system may use different representations of identity in different contexts.

The same principal may have many different identities. Typically each identity serves a particular function.

Groups and Roles

The “entity” may be a set of entities referred to by a single identifier. The members of the set must be distinguishable, but the set may have an identity separate from any of its elements.

Principals often need to share access to files. Most systems allow principals to be grouped into sets called, logically enough, groups. Groups are essentially a shorthand tool for assigning rights to a set of principals simultaneously.

Two implementations of groups provide different abilities and therefore are based on different models. The first simply uses a group as an alias for a set of principals. Principals are assigned to groups, and they stay in those groups for the lifetimes of their sessions. The second model allows principals to change from one group to another. After each change, the rights belonging to the principal as a member of the previous group are discarded and the rights of the new group are added.

The difference lies in the representations of identity. In the former model, the identity assigned to a principal is static: it consists of the principal's identity and the identities of all groups to which the principal belongs, and it does not change throughout the lifetime of the session. In the latter model, the identity of the principal consists of the identity of the user and the identities of the groups of which the principal is currently a member. It is dynamic: should the principal change from one group to another, the identity of that principal also changes.
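The two models can be contrasted in a short sketch; the group names and rights sets here are illustrative, not drawn from any real system:

```python
# Illustrative rights for two hypothetical groups.
GROUP_RIGHTS = {
    "staff":  {"read"},
    "admins": {"read", "write", "shutdown"},
}

class StaticIdentity:
    """Model 1: group memberships are fixed for the session."""
    def __init__(self, user, groups):
        self.user = user
        self.groups = frozenset(groups)   # immutable for the session
    def rights(self):
        r = set()
        for g in self.groups:             # union over all groups
            r |= GROUP_RIGHTS[g]
        return r

class DynamicIdentity:
    """Model 2: the principal occupies one group at a time; changing
    groups discards the old group's rights and adds the new group's."""
    def __init__(self, user, group):
        self.user = user
        self.group = group
    def change_group(self, group):
        self.group = group                # the identity itself changes
    def rights(self):
        return set(GROUP_RIGHTS[self.group])

s = StaticIdentity("holly", ["staff", "admins"])
print(sorted(s.rights()))    # union of both groups' rights

d = DynamicIdentity("holly", "staff")
print(sorted(d.rights()))    # only staff rights
d.change_group("admins")
print(sorted(d.rights()))    # staff rights discarded, admin rights added
```
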

In practice, one discusses “user identity” and “group identity.”

A role is a type of group that ties membership to function. When a principal assumes a role, the principal is given certain rights that belong to that role. When the principal leaves the role, those rights are removed. The rights given are consistent with the functionality that the principal needs to perform the tasks expected of members of the role.

Naming and Certificates

Chapter 10 described certificates as a mechanism for binding cryptographic keys to identifiers. The identifier corresponds to a principal. The identifier must uniquely identify the principal to avoid confusion.

Suppose the principals are people. The identifiers cannot be names, because many different people may have the same name. (How many people named “John Smith” or “Pierre LeBlanc” are there?) The identifiers must include ancillary information to distinguish the “Matt Bishop” who teaches at UC Davis from the “Matt Bishop” who works at Microsoft Corporation.

Certification authorities (CAs) vouch, at some level, for the identity of the principal to which the certificate is issued. Every CA has two policies controlling how it issues certificates.

  • Definition 14–2. A CA authentication policy describes the level of authentication required to identify the principal to whom the certificate is to be issued.

  • Definition 14–3. A CA issuance policy describes the principals to whom the CA will issue certificates.

The difference between these two policies is that the first simply establishes the level of proof of identity needed for the CA to accept the principal's claim of identity, whereas the second answers the question, “Given the identity of the principal, will the CA issue a certificate?”

CAs can issue certificates to other organizations. The certificate-based key management architecture for the Internet [563] demonstrates how such an organization can lead to a simple hierarchical structure of policies.

The principals need not be people or organizations; they can be roles.

The identifiers in a certificate need not be formal Distinguished Names. The certificates used with PGP, for example, allow the subject to select any identifier he or she wishes. The convention is to use an electronic mail address, but this permits a high level of ambiguity, especially when mail addresses change frequently. This leads directly to conflicts; how can a CA ensure that the certificate it issues does not conflict with another?

Conflicts

Both X.509 and PGP are silent about certificate conflicts. They assume that the CAs will prevent conflicts. The CA's Distinguished Name is in the certificate, so if no two CAs have the same Distinguished Name and each CA requires that principals be identified uniquely among the set of principals certified by that CA, no conflicts will arise.

The Internet certification hierarchy uses the same approach: the IPRA requires that each PCA have a unique Distinguished Name, and no PCA may certify two CAs with the same Distinguished Name. But in practice, there may be conflicts. For example, suppose John A. Smith and John B. Smith, Jr. both live at the same address. John B. Smith, Jr. applies for a certificate, based on his residence, from the post office, which issues one.

/C=US/SP=Maine/L=Portland/PA=1 First Ave./CN=John Smith/

His father, John A. Smith, applies to the Quick Certificate Company for a residential certificate. His Distinguished Name would be identical to his son's, but the Quick Certificate Company would have no way to know this because there is no central repository of certificates.

The Internet infrastructure deals with this problem in two ways. First, it requires that all CA Distinguished Names be “superior” to the Distinguished Name of the principal.

This works for organizational certificates, since each organization can be its own CA, or can empower subordinate units to be their own CAs. However, it is unrealistic to expect that only one entity will issue residential certificates. This immediately leads to a conflict.

The Internet infrastructure contains an explicit exception that allows multiple residential CAs to have the same Distinguished Name. But this issue also arises when the same CA wishes to issue certificates under two different policies, and hence under two different PCAs. Because the CA uses the same Distinguished Name for all its certificates, how does one determine under which policy a certificate was issued?

The Internet infrastructure handles these conflicts with a Distinguished Name conflict detection database. Before a PCA may issue a certificate to a CA, it must determine if a conflict exists. It sends a query to the database containing the following information.

  1. A hash value computed on a canonical representation of the CA's Distinguished Name

  2. The CA's public key in the certificate

  3. The Distinguished Name of the PCA

If the first two fields conflict with any other entry in the database, the IPRA returns the conflicting entry. (The two PCAs must then resolve the conflict.) Otherwise, the information is entered into a new record and a timestamp is added.
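The query processing might be sketched as follows. The canonicalization, record format, and return values are assumptions for illustration; the actual IPRA database protocol differs in detail:

```python
import hashlib

class ConflictDB:
    """Sketch of a Distinguished Name conflict-detection database.
    The wire format and canonical DN form are invented for illustration."""
    def __init__(self):
        # (dn_hash, public_key) -> (pca_dn, timestamp)
        self.records = {}

    def query(self, ca_dn, ca_public_key, pca_dn, timestamp):
        # Field 1: hash of a canonical form of the CA's Distinguished Name
        dn_hash = hashlib.sha256(ca_dn.strip().lower().encode()).hexdigest()
        # A conflict: the same DN hash already registered under a different key
        for (h, pk), entry in self.records.items():
            if h == dn_hash and pk != ca_public_key:
                return ("conflict", entry)   # the PCAs must resolve this
        # Otherwise, record the (DN hash, key) pair with a timestamp
        self.records[(dn_hash, ca_public_key)] = (pca_dn, timestamp)
        return ("registered", None)

db = ConflictDB()
r1 = db.query("/C=US/CN=John Smith/", "key-A", "/C=US/O=PCA-1/", 1000)
r2 = db.query("/C=US/CN=John Smith/", "key-B", "/C=US/O=PCA-2/", 1001)
print(r1)   # ('registered', None)
print(r2)   # ('conflict', ('/C=US/O=PCA-1/', 1000))
```

Note that uniqueness is enforced on the (Distinguished Name, public key) pair: the same name under a different key produces a conflict report rather than a silent registration.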

This mechanism does not ensure uniqueness of Distinguished Names. It does ensure uniqueness of the pair (Distinguished Name, public key), and therein lies the answer to the above-mentioned conflicts. In the residential certificate example, the post office and the Quick Certificate Company have different public keys, so the CA for the certificates could be determined at validation time. In the University of Valmont example, the different public keys used to sign the certificate would indicate under which policy the university issued the certificate.

The Meaning of the Identity

The authentication policy defines the way in which principals prove their identities. Each CA has its own requirements (although they may be constrained by contractual requirements, such as with PCAs). All rely on nonelectronic proofs of identity, such as biometrics (fingerprints), documents (driver's licenses, passports), or personal knowledge. If any of these means can be compromised, the CA may issue the certificate in good faith to the wrong person.

This hearkens back to the issue of trust. Ignoring the trust required for cryptography to work, the certificate is the binding of an external identity to a cryptographic key and a Distinguished Name. If the issuer can be fooled, all who rely on that certificate may also be fooled.

With the erosion of privacy in many societies comes the need for anonymity. This conflicts with the notion of a certificate binding an identity to a Distinguished Name and a public key. The conflict arises when the anonymous principal needs to send a set of integrity-checked, confidential electronic messages to a recipient and to ensure that the recipient realizes that all of the messages have come from the same source (but the recipient cannot know what the source is).

Anonymous, or persona, certificates supply the requisite anonymity. A CA issues a persona certificate under a policy that makes the Distinguished Name of the principal meaningless. For example, a persona certificate with a principal Distinguished Name of

/C=US/O=Microsoft Corp./CN=John Smith/

does not imply that the certificate was issued to someone named John Smith. PGP certificates can have any name to identify the principal, and can innately provide anonymity in this sense.

Trust

The goal of certificates is to bind the correct identity to the public key. When a user obtains a certificate, the issuer of that certificate is vouching, to some degree of certainty, that the identity corresponds to the principal owning the public key. The critical question is the degree of that assurance.

X.509v3, and the PEM certification hierarchy, define the degree of certainty in the policy of the CA that issues the certificate. If a CA requires a passport as identification, then the degree of certainty is high; if it requires an unsworn statement of identity, the degree of certainty is low. But even high-assurance CAs can be fooled. In the case of the passport, passports can be stolen or forged. So the level of trust in an identity is not quantifiable. Rather, it is an estimate based on the policy of the CA, the rigor with which that policy is followed, and the assumptions that the policy makes.

PGP certificates include a series of signature fields (see Section 10.4.2.2), each of which contains a level of trust.[6] The OpenPGP specification defines four levels [167].

  1. Generic certification of a user name and a public key; this makes no assertions.

  2. Persona certification of a user name and a public key; the signer has done no verification that the user name correctly identifies the principal.

  3. Casual certification of a user name and a public key; the signer has done some verification that the user name correctly identifies the principal.

  4. Positive certification of a user name and a public key; the signer has done substantial verification that the user name correctly identifies the principal.

Even here, though, the trust is not quantifiable. What exactly do “some verification” and “substantial verification” mean? The OpenPGP specification does not define them, preferring to leave their definitions to the signer, but the same terms can imply different levels of assurance to different signers.

The point is that knowing the policy, or the trust level with which the certificate is signed, is not enough to evaluate how likely it is that the identity identifies the correct principal. Other knowledge, about how the CA or signer interprets the policy and enforces its requirements, is needed.

Identity on the Web

Certificates are not ubiquitous on the Internet. Several other means attach identity to information, even though the binding may be very transient.

The Internet requires every host to have an address. The address may be fixed or may change, and without cryptography the binding is weak. Many servers send information about the state of the client's interaction, so that when the client reconnects, the server can resume the transaction or glean information about previous transactions.

Host Identity

Host identity is intimately bound to networking. A host not connected to any network can have any name, because the name is used only locally. A host connected to a network can have many names or one name, depending on how the interface to the network is structured and the context in which the name is used.

The ISO/OSI model [990] provides a context for the issue of naming. Recall that the ISO/OSI model is composed of a series of layers (see Figure 11-2). Conceptually, each host has a principal at each layer, and that principal communicates with principals at the same layer on other hosts. Each principal on an individual host can have a different name (also called an “address”) at each layer. All names identify the same host, but each one refers to a particular context in which the host functions.

Shoch [918] suggests that a “name” identifies a principal and an “address” identifies where that principal is located. In the context of host identification, the “address” indicates where on a network (and, sometimes, the specific network) the host is located. A “name” indicates in what domain the host resides, and corresponds to a particular address. Although Shoch's terminology is instructive in many contexts, in this context a location identifies a principal just as well as a name. We do not distinguish between the two in the context of identification.

If an attacker can spoof the identity of another host, every protocol that relies on that identity rests on a faulty premise and can itself be subverted. When a host has a sequence of names, each derived from the preceding one, an attacker who spoofs the first identity can compromise all the others. For example, the host name is based on the IP identity, and the IP identity in turn is based on the Ethernet identity. If an attacker can alter entries in the databases that map a lower-level identity to a higher-level identity, the attacker can spoof one host by routing traffic to another.

Static and Dynamic Identifiers

An identifier can be either static or dynamic. A static identifier does not change over time; a dynamic identifier changes either as a result of an event (such as a connection to a network) or over time.

Databases contain mappings between different names. The best known of these is the Domain Name Service (DNS) [722, 723], which associates host names and IP addresses. In the absence of cryptographic authentication of hosts, the consistency of the DNS is used to provide weak authentication.

The belief in the trustworthiness of the host name in this case relies on the integrity of the DNS database. Section 14.6.1.2, “Security Issues with the Domain Name Service,” examines this issue.

Floating identifiers are assigned to principals for a limited time. Typically, a server maintains a pool of identifiers. A client contacts the server using an identifier agreed on between the two (the local identifier). The server transmits an identifier that the client can use in other contexts (the global identifier) and notifies any intermediate hosts (such as gateways) of the association between the local and global identifiers.

A gateway can translate between a local address and a global address.

In the absence of cryptography, authentication using dynamic naming is different from authentication using static naming. The primary problem is that the association of the identity with a principal varies over time, so any authentication based on the name must also account for the time. For example, if the DNS record entries corresponding to the dynamic name are not updated whenever the name is reassigned, the reverse domain lookup method of authentication fails.[7]

The contrast between static and dynamic naming in authentication is worth noting in light of the different properties described in Chapter 12, “Authentication.” The reverse domain lookup technique of authentication corresponds to checking a property of a principal (what it is) with static naming, because the name is bound permanently to the principal. But that technique corresponds to checking a possession of a principal (what it has) with dynamic naming, because the principal will relinquish that name at some point.

Security Issues with the Domain Name Service

Understanding the centrality of trust in the databases that record associations of identity with principals is critical to understanding the accuracy of the identity. The DNS provides an example of this. The belief in the trustworthiness of the host name in this case relies on the integrity of the DNS database. If the association between a host name and an IP address can be corrupted, the identifier in question will be associated with the wrong host.

Bellovin [73] and Schuba [892] discuss several attacks on the DNS. The goal of these attacks is to cause a victim to associate incorrectly a particular IP address with a host name. They assume the attacker is able to control the responses from an authoritative domain name server. “Control” means that the attacker has control over the name server or can intercept queries to that server and return its own responses.

The attacker can change the records associating the IP address with the host name, so that a query for one returns an incorrect answer for the other.

A second technique, known as “cache poisoning,” relies on the ability of a server to add extra DNS records to the answer to a query. In this case, the extra records give incorrect association information. Schuba uses this to demonstrate how the reverse name lookup can be compromised. The attacker connects to the victim. The victim queries the DNS for the host name associated with the IP address. The attacker ensures that two records are returned: a record with the bogus host name associated with the IP address, and the reverse record. The DNS protocol allows this piggybacking to enable the client to cache records; the cache is checked before any records are requested from the server, which may save a network request.

The third technique (“ask me”) is similar: the attacker prepares a request that the victim must resolve by querying the attacker. When the victim queries the attacker, the attacker returns the answer, along with two records for the mapping that he is trying to spoof (one for the forward mapping, one for the reverse).
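One mitigation for the reverse-lookup compromise is to confirm the reverse answer with a forward query. In this sketch, dictionaries stand in for DNS responses and the host names and addresses are illustrative. Note that the check defeats only simple reverse-record poisoning; in the “ask me” attack the attacker supplies the forward mapping as well, so the confirmation also succeeds:

```python
# "Forward-confirmed" reverse lookup: after mapping an IP address to a
# host name, resolve that name forward and require it to map back to
# the same IP address.
def authenticate_peer(ip, reverse_table, forward_table):
    name = reverse_table.get(ip)           # PTR-style (reverse) lookup
    if name is None:
        return None
    if forward_table.get(name) != ip:      # A-record (forward) confirmation
        return None                        # mismatch: possible spoofing
    return name

forward = {"trusted.example.com": "10.0.0.5"}
# The attacker has poisoned the reverse mapping for its own address:
reverse = {"10.0.0.5": "trusted.example.com",
           "192.0.2.9": "trusted.example.com"}

print(authenticate_peer("10.0.0.5", reverse, forward))   # trusted.example.com
print(authenticate_peer("192.0.2.9", reverse, forward))  # None: forward check fails
```
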

Judicious use of cryptographically based techniques coupled with careful administration of DNS servers can effectively limit the ability of attackers to use these attacks. Supporting infrastructure is under design and development (for example, see [314, 315, 316, 317, 318]).

State and Cookies

Many Internet applications require that the client or server maintain state to simplify the transaction process [597].

  • Definition 14–4. A cookie is a token that contains information about the state of a transaction on a network.

Although the transaction can be any client-server interaction, the term “cookie” is most widely used in reference to interactions between Web browsers and Web servers. These cookies minimize the storage requirements of the servers and put the burden of maintaining required information on the client. A cookie consists of several fields.

  1. The name and value are encoded into the cookie and represent the state. The interpretation is that the name has an associated value.

  2. The expires field indicates when the cookie is valid. Expired cookies are discarded; they are not to be given out. If this field is not present, the cookie will be deleted at the end of the session.

  3. The domain states the domain for which the cookie is intended. It consists of the last n fields of the domain name of a server. The cookie will be sent to servers in that domain. For example, domain=.adv.com specifies that the cookie is to be sent to any requesting server in the adv.com domain. A domain field must have at least one embedded “.” in it; this prevents a server from sending over a cookie ending in “.com” and then requesting all cookies for the domain “.com.”

    There is no requirement that a cookie be sent from a host in the domain. This can be used to track certain types of accesses, as discussed below.

  4. The path further restricts the dissemination of the cookie. When a Web server requests a cookie, it provides a domain (its own). Cookies that match that domain may be sent to the server. If the server specifies a path, the path must be the leading substring of the path specified in the cookie.

  5. If the secure field is set, the cookie will be sent only over secured connections (that is, to “https” or “http” over SSL).

The restriction of sending cookies to hosts in the cookie's domain prevents one Web server from requesting cookies sent by a second Web server. However, a Web server can send cookies marked for the domain of a second server. When the user accesses the second Web server, that server can request the cookies marked for its domain but sent by the first server.
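The domain and path rules above can be sketched as a matching function. This follows the original Netscape-style rules described here, not the later RFC 6265 algorithm, and the example names are invented:

```python
# Decide whether a cookie should be sent to a requesting server, using
# the embedded-dot, domain-suffix, and path-prefix rules described above.
def cookie_matches(cookie_domain, cookie_path, server_host, request_path):
    # The domain must have at least one embedded "." (".adv.com" is
    # acceptable; ".com" is not), preventing harvesting of ".com" cookies.
    if cookie_domain.strip(".").count(".") < 1:
        return False
    # The server's host name must fall within the cookie's domain.
    if not (server_host == cookie_domain.lstrip(".")
            or server_host.endswith(cookie_domain)):
        return False
    # The cookie's path must be a leading substring of the request path.
    return request_path.startswith(cookie_path)

print(cookie_matches(".adv.com", "/", "www.adv.com", "/index.html"))  # True
print(cookie_matches(".adv.com", "/", "www.other.com", "/"))          # False
print(cookie_matches(".com", "/", "www.adv.com", "/"))                # False: no embedded dot
```
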

Cookies can contain authentication information, both user-related and host-related. Using cookies for authentication treats them as tokens supplied by the browser to validate (or state and validate) an identity. Depending on the sensitivity of the interactions with the server, protecting the confidentiality of these cookies may be critical. Exercise 1 explores this topic in more detail.

Anonymity on the Web

Identification on the Internet arises from associating a particular host with a connection or message. The recipient can determine the origin from the incoming packet. If only one person is using the originating host, and the address is not spoofed, someone could guess the identity of the sender with a high degree of accuracy.

An anonymizer is a site that hides the origins of connections. It functions as a proxy server—that is, it operates on behalf of another entity. A user connects to the anonymizer and tells it the destination. The anonymizer makes the connection, so the destination host sees only the anonymizer. The anonymizer forwards traffic in both directions.

The destination believes it is communicating with the anonymizer because all traffic carries the anonymizer's address. In reality, the anonymizer is a go-between that simply passes information between the origin and the destination.

Anonymizers work primarily on electronic mail and http traffic, although the same principles apply to any type of network messages. In what follows, we focus on electronic mail, because electronic mail anonymizers are conceptually simple and demonstrate the techniques used and the privacy issues that arise. The story of the Finnish anonymizer anon.penet.fi is worth recounting, because it was the first widely used anonymizer. Its demise points out the problems in both using and running anonymizers.

This exchange is not truly anonymous. Although the end parties do not know each other's identities, the anonymizer knows both.

  • Definition 14–5. A pseudo-anonymous (or pseudonymous) remailer is a remailer that replaces the originating electronic mail addresses (and associated data) of messages it receives before it forwards them, but keeps mappings of the anonymous identities and the associated origins.

The problem is that the binding between the anonymous address and the real address is known somewhere. If that point can be made to reveal the association, anonymity ceases to exist.

The association can be obscured by using a sequence of pseudo-anonymous remailers. Tracing the origin then requires the trackers to obtain information from several sites. But the chain must exist if replies are to be sent back to the original sender. Eliminating that requirement allows true anonymity.

  • Definition 14–6. [330] A Cypherpunk (or type 1) remailer is a remailer that deletes the header of an incoming message and forwards the remainder to its destination.

Unlike a pseudo-anonymous remailer, no record of the association between the originating address and the remailer address is kept. Thus, one cannot trace the message by mapping the remailer's user name to an electronic mail address.

Cypherpunk remailers are typically used in a chain, and messages sent through them are always enciphered [427]. Figure 14-1 shows how this works. Bob composes a message to Alice and then uses PGP to encipher it twice. The first encipherment is for the destination “remailer 2.” The resulting message is then enciphered for delivery to remailer 1. Bob then mails the message to remailer 1. It deciphers the message, sees that it is to be sent to remailer 2, and forwards it. Remailer 2 receives the message, deciphers it, and forwards the message to Alice. Because there is no record of who sent the message to remailer 1, it cannot be tied back to Bob's electronic mail address. Because remailer 2 received the message from remailer 1, it cannot associate any real electronic mail address with the destination address (Alice). This illustrates the reason for using chains of Cypherpunk remailers. Were only one remailer used, it could associate the real sender with the real recipients. Although two remailers, or any number of remailers, could cooperate to do the same thing, in practice such cooperation is very difficult to achieve. Again, the issue of trust in the remailers is central to the success of Cypherpunk remailers.
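The layering can be sketched with nested envelopes. Base64 wrapping stands in for PGP encipherment purely to show the structure; it provides no secrecy, and the addresses are illustrative:

```python
import json
import base64

# Wrap a payload in an envelope addressed to the next hop. In a real
# chain this step would be a PGP encipherment for that hop's key.
def envelope(next_hop, payload):
    inner = json.dumps({"to": next_hop, "body": payload})
    return base64.b64encode(inner.encode()).decode()

def remailer_process(message):
    """A remailer opens one layer and forwards the body to the next hop,
    keeping no record of where the message came from."""
    inner = json.loads(base64.b64decode(message))
    return inner["to"], inner["body"]

# Bob wraps the message for remailer 2 first, then for remailer 1.
msg = envelope("remailer2", envelope("alice@example.com", "hello, Alice"))

hop, msg = remailer_process(msg)    # remailer 1 opens the outer layer
print(hop)                          # remailer2
hop, msg = remailer_process(msg)    # remailer 2 opens the inner layer
print(hop, msg)                     # alice@example.com hello, Alice
```

Each remailer sees only the previous and next hops, which is why a single honest remailer in the chain suffices to break the association between sender and recipient.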

A message sent to a Cypherpunk remailer. Remailer 1 forwards the message to remailer 2, and remailer 2 sends it to Alice.

Figure 14-1. A message sent to a Cypherpunk remailer. Remailer 1 forwards the message to remailer 2, and remailer 2 sends it to Alice.

But there is still a weakness. Suppose an attacker could monitor all traffic between the source and the destination but the remailers themselves remained uncompromised. Then the attacker could view traffic into and out of a remailer but could not see the association of incoming traffic with outgoing traffic. The goal of the attacker would be to reconstruct this association [238, 427].

Obviously, reconstructing this association from cleartext messages is simple: just compare the bodies of incoming messages with those of outgoing messages. The envelope for the current remailer will be deleted; otherwise, the bodies will be the same. This is the reason to encipher all messages going through a Cypherpunk remailer. In the following discussion, we assume that all such messages are enciphered. The attacks all involve traffic analysis.

If a remailer immediately forwards a message after receiving it, and before any other message arrives (or if processing is guaranteed to occur in order of arrival), then the attacker can determine the association. One approach to obscuring this is to hold messages for random intervals of time; however, unless the interval is greater than the average interarrival time, the delay does not help. (Some remailers allow the sender to specify the length of the interval.)

A second approach is to randomize the order of processing of the incoming messages; implicit in this approach is a delay to allow such reordering. Cypherpunk remailers that do this keep a pool of incoming messages. No messages are sent out until the pool contains a fixed number, call it n, of messages. When the nth message arrives, one of the messages in the pool is selected and sent. This protects the associations against passive attacks. However, an active attacker can send enough messages to the remailer so that all n – 1 messages in the pool are sent. (See Exercise 2.)
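The pooling discipline can be sketched as follows (the pool size and messages are illustrative). The active attack is visible in this model: an attacker who injects enough messages of its own can flush the pool and isolate the one genuine message:

```python
import random

class PoolRemailer:
    """Holds incoming messages in a pool; once the pool holds n messages,
    one message chosen at random is released per arrival."""
    def __init__(self, n, seed=42):
        self.n = n
        self.pool = []
        self.rng = random.Random(seed)   # seeded only for reproducibility

    def receive(self, message):
        self.pool.append(message)
        if len(self.pool) >= self.n:
            # An observer cannot tell which arrival this departure matches.
            return self.pool.pop(self.rng.randrange(len(self.pool)))
        return None   # held until the pool fills

r = PoolRemailer(n=3)
out = [r.receive(m) for m in ["m1", "m2", "m3", "m4"]]
print(out)   # the first two arrivals are held; later arrivals trigger releases
```
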

A third approach deals with message size. As a message moves through its chain of remailers, each remailer strips off an outside envelope. Thus, the size of the message decreases. The attacker can use this by recording the sizes of messages entering and leaving the remailer. No outbound message can be associated with an inbound message of lesser or equal size. Furthermore, the size of the envelope can be estimated well enough to estimate how much the message would shrink by, thus eliminating more possible associations. To limit this threat, some remailers allow users to append junk to the message and instruct the remailer to delete it. Again, this reduces message size; it does not increase it.

The final attack is also active. The attacker replays the messages many times to the first remailer, which forwards them. The attacker monitors the outbound traffic and looks for a bump in the amount of traffic from the remailer corresponding to the messages sent into the remailer. This associates the outbound path with the inbound path. To prevent this attack, remailers cannot forward the same message more than once.

A second type of remailer, based on ideas from Chaum's paper [184] (which uses the term “mix” to describe the obscuring of information), does not suffer from these problems.

  • Definition 14–7. [237] A Mixmaster (or type 2) remailer is a Cypherpunk remailer that handles only enciphered messages and that pads or fragments messages to a fixed size before sending them.

This hinders the attacks described above. The contents of the incoming and outgoing messages cannot be matched, because everything is enciphered. Traffic analysis based on size is not possible, because all messages (incoming and outgoing) are of the same size. All messages are uniquely numbered, so replay attacks are not possible. Message fragments are not reassembled until the message reaches the last remailer in the chain, so reordering attacks are more difficult. Figure 14-2 shows what a Mixmaster message looks like. Special software is used to construct the messages, whereas Cypherpunk remailers can accept messages constructed by hand.

Figure 14-2. A Mixmaster message. This is a fragment of a multipart message sent through two remailers. Messages are enciphered using both RSA and Triple DES, and random garbage is added as well as padding. The recipient's address is visible only to the last remailer.
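The padding-and-fragmentation step can be sketched as follows. The packet size and function name are illustrative only, not the actual Mixmaster format; real Mixmaster packets carry headers, identifiers, and layered encipherment omitted here.

```python
import os

def fragment(message: bytes, body_size: int):
    """Split a message into fixed-size packets, padding the final
    fragment with random bytes so that every packet leaving the
    remailer has an identical length. (The true payload length would
    travel inside the enciphered packet header.)"""
    packets = []
    for off in range(0, len(message), body_size):
        chunk = message[off:off + body_size]
        pad = os.urandom(body_size - len(chunk))   # random, not zeros
        packets.append(chunk + pad)
    return packets

pkts = fragment(b"x" * 25000, body_size=10240)
print([len(p) for p in pkts])    # [10240, 10240, 10240]: uniform size
```

Because every packet on the wire is the same length, recording sizes at a remailer's input and output yields no information about which inbound packet became which outbound packet.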

In practice, messages sent through Mixmaster remailers are untraceable unless the remailers themselves are compromised. In that case, one could track packet and message IDs and make associations as desired. The point is that anonymity assumes that the remailers can be trusted not to disclose associations. The Mixmaster technique minimizes the threat of compromised remailers, because all remailers must track origin, packet, and message IDs, and the final remailer must also track destination address, packet, and message IDs for the sender to be associated with a received message. This technique is not foolproof; if only one message is sent over the network, an attacker can easily determine the sender and receiver, for example. But it substantially adds to the difficulty of matching an anonymous letter to a sender.

The Mixmaster remailer BABEL [427] adds the ability to reply without knowing the identity of, or even the actual e-mail address of, the sender (see Exercise 3).

Anonymity for Better or Worse

Anonymity provides a shield to protect people from having to associate their identities with some data. Is this desirable?

The easiest way to answer this is to ask what the purpose of anonymity is. Anonymity is power, because it allows one to make statements without fear of reprisals. One can even deny having made the statements when questioned, and with true anonymity, the denial cannot be disproved.

Anonymity allows one to shape the course of debate by implication. Alexander Hamilton, James Madison, and John Jay deliberately used the name “Publius” to hide their authorship of the Federalist Papers. Aside from hiding the authors' identity, the “Publius” pseudonym was chosen because the Roman Publius was seen as a model governor. The pseudonym implied that the authors stood for responsible political philosophy and legislation [439]. The discussion of the Federalist Papers focused on their content, not on the personalities of their authors.

Anonymity allows whistleblowers considerable protection. Those who criticize the powerholders often fall into disfavor, even when their criticism is valid, and the powerholders take action. Galileo promulgated the theory that the earth circles the sun and was brought before the Inquisition [464]. Ernest Fitzgerald exposed cost overruns on the U.S. Air Force C-5A airplane and was removed from his position. After several court victories, he was reinstated [164]. Contrast this with the anonymous sources that spoke with Bernstein and Woodward during the Watergate scandal. The reporters combined those anonymous sources (especially one called “Deep Throat”) with public records to uncover a pattern of activity that ultimately led to impeachment charges against President Richard Nixon, his resignation, and criminal indictments and convictions of many government officials. No action could be taken against the sources, because their identities were unknown (and, as of this writing, the identity of “Deep Throat” has not been revealed) [85, 86].

Whether these are benefits or drawbacks depends on whether one is the powerholder under attack or the person attacking the powerholder. In many societies, questioning of authority is considered desirable and beneficial to the society, and in such cases the need for anonymity outweighs the problems, especially when the powerholders will strike back at the critics. In other societies, those who hold power are considered to be more experienced and knowledgeable and are trusted to act in the best interests of the society. In those societies, anonymous criticism would be considered destabilizing and inimical to the best interests of the social order. The reader must decide how anonymity affects the society of which he or she is a part.

Just as anonymity is a tool with which powerholders can be attacked, the powerholders can use it to attack those they consider to be adversaries. Franz Kafka's book The Trial [534], which describes a trial in which the accused does not know the (anonymous) judges, is considered a masterpiece of existential literature. However, as dissidents in many countries have found, anonymous judges are not always fictional. In the United States during the period when Martin Dies and Joseph McCarthy held sway, anonymous accusers cost many people their livelihoods, and in some cases their lives (see, for example, Donner [308] and Nizer [778]).

Anonymity also protects privacy. From this perspective, as we move through a society, parts of that society gather information about us. Grocery stores can record what we purchase, bookstores can record what books we buy, and libraries can record what books we read. Individually, each datum seems unimportant, but when the data is correlated, the conclusions that can be drawn are frighteningly complete. Credit bureaus do this to a degree already, by obtaining information from a variety of credit sources and amalgamating them into a single credit report that includes income, loans, and revolving credit accounts such as credit cards.

This poses three risks to individuals. First, incorrect conclusions can come from data interpreted incorrectly. For example, suppose one visits Web sites looking for information on a proscribed narcotic. One conclusion is that the individual is looking for information on making or obtaining such a drug for illicit purposes, but this conclusion could be wrong. The individual could be a high school student assigned to write a report on dangerous drugs. The individual could be a doctor seeking information on the effects of the use of the drug, for treating a patient. Or the individual could simply be curious. There is insufficient information to draw any of these conclusions.

Second, erroneous information can cause great harm. The best examples of this are the increasingly common cases of “identity theft,” in which one person impersonates another, using a faked driver's license, Social Security card, or passport to obtain credit in another's name [271]. The credit reporting agencies will amalgamate the information under the real person's records, and when the thief defaults, the victim will have to clear himself.

Third, the right to privacy inherent in many societies includes what Warren and Brandeis called the “right to be let alone—the most comprehensive of rights and the right most valued by civilized men” [1034]. Anonymity serves as a shield behind which one can go about one's business and be let alone. No authority, central or distributed, can tie information obtained about an anonymous entity back to an individual. Without the right to anonymity, protecting one's privacy becomes problematic. Stalkers can locate people and harass them; indeed, in one case a stalker murdered an actress [50]. On the Web, one may have to accept cookies that can be used to construct a profile of the visitor. Organizations that use cookies for this purpose generally adopt an “opt-out” approach, in which a user must request that no information be gathered, rather than an “opt-in” approach, in which a user must expressly give permission for the information to be gathered. If the user is anonymous, no meaningful profile can be constructed. Furthermore, the information gathered cannot be matched with information in credit records and other data banks. The ability to prevent others from gathering information about you without your consent is an example of the right to privacy.

Anonymity for personal protection has its disadvantages, too. Jeremy Bentham's panopticon introduced the notion of perpetual and complete monitoring to prevent crime and protect citizens. The idea that governments should be able to detect crimes as they happen and intervene, or establish that a crime has been committed and act to apprehend the perpetrators, is attractive because of the sense of security it gives citizens. But many, including the Founding Fathers of the United States, regarded this as too high a price to be paid. As Benjamin Franklin wrote, “They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety” [62].

Perhaps the only conclusion one can draw is that, like all freedoms and all powers, anonymity can be used for good or for evil. The right to remain anonymous entails a responsibility to use that right wisely.

Summary

Every access control mechanism is based on an identity of some sort. An identity may have many different representations (for example, as an integer and as a string). A principal may have many different identities. One certificate may identify the principal by its role, another by its job, and a third by its address. A host on the Internet has multiple addresses, each of which is an identity.

Identities are bound to principals, and the strength and accuracy of that binding determines how systems act when presented with the identity. Unfortunately, trust cannot be measured in absolute terms except for complete trust and no trust. Reality dictates a continuum, not discrete values. Understanding how an identity is bound to a principal provides insight into the trustworthiness of that identity.

Anonymity allows a principal to interact with others without revealing his or her true identity. Anonymity comes in two forms: pseudo-anonymity, in which an intermediary knows the true identity (and can relay messages without revealing that identity); and true anonymity, in which no one knows the true identity. The use of anonymity entails a responsibility to use it wisely.

Research Issues

Identification is an area of ongoing research, both in the sense of determining who a principal is and in the sense of determining how to identify that principal uniquely. The ubiquity of the World Wide Web complicates this issue, because different organizations may have the same name. If so, which one can use its name for its identifier?

The issue of naming—in particular, how to represent relationships among principals—is a deep one. One goal is to enable an observer to draw conclusions about relationships from the identities; the PEM hierarchy's use of subordinate Distinguished Names is an example of this. Another issue is delegation—in particular, how one can identify a principal acting on behalf of another. How can one use naming mechanisms to describe such a relationship, and how can one affirm that the claimed relationship is correct?

Anonymity is another important area of research. Designing remailers and other tools to anonymize the senders of messages, and to prevent messages from being traced back to their origins, is of interest.

Anonymity is also an important factor in payment of money over the Internet. Digital cash is analogous to physical cash; once spent, there is nothing that ties it to a particular individual. As commercial firms and organizations sell products over the Internet, digital cash provides a simple way for individuals to purchase items just as they would purchase items from a grocery store. The protocols involved must deal with the need for untraceability, as well as preventing the digital cash from being spent twice (thereby defrauding the repository that issued the cash). Implementing protocols that handle all situations correctly is another area of research.

Further Reading

Representation of identity varies from system to system. The use of roles is becoming a widely studied topic. Bishop [111] discusses implementation of role accounts using standard UNIX account mechanisms. McNutt [687] presents requirements and procedures for implementing roles to manage UNIX systems. Sandhu and Ahn [876] extend the UNIX group semantics to include hierarchies.

Ellison explores methods of identifying a principal through relationships to others [327] and the meaning of a name [328]. Saltzer [866] lucidly discusses the issues and principles that affect naming on the Internet. Several RFCs discuss schemes for naming hosts and other principals on the Internet [41, 65, 445, 446, 706, 1029].

Several cryptographic protocols allow information to be broadcast anonymously. The best known such algorithm is Chaum's “Dining Cryptographers Problem” [186], in which the goal is to determine if one of the dining cryptographers paid for the meal (without revealing which one), or someone else did. Waidner and Pfitzmann [1030] point out that Chaum's solution could be disrupted if one of the cryptographers lies, and present an algorithm (called “The Dining Cryptographers In the Disco”) to detect it.

Chaum [185] first described digital cash. Okamoto and Ohta [786] list desirable properties for digital cash systems and present a protocol that meets them. Other protocols include Brands' protocol [144], electronic checks [187, 189], CAFE [130], and NetCash [693]. Smart cards can carry digital cash [29, 188, 190], and some European banks are using this technology [388, 661]. Von Solms and Naccache note that the untraceability of digital cash makes solving certain crimes more difficult [946].

Bacard [51] discusses the basics of remailers. Mazières and Kaashoek [668] describe a type 1 remailer in operation. Cottrell [237] cites the Cypherpunk remailers, and a discussion on the Cypherpunk mailing list, as the inspiration for the development of Mixmaster remailers. His discussion of attacking Mixmaster and remailer sites [238] is perceptive. Engelfriet (also known as “Galactus”) [330] presents technical details of anonymity on the Web.

Exercises

1:

The Web site www.widget.com requires users to supply a user name and a password. This information is encoded into a cookie and sent back to the browser. Whenever the user connects to the Web server, the cookie is sent. This means that the user need only supply a password at the beginning of the session. Whenever the server requests reauthentication, the client simply sends the cookie. The name of the cookie is “identif.”

  1. Assume that the password is kept in the clear in the cookie. What should the settings of the secure and expires fields be, and why?

  2. Assume that the name and password are hashed and that the hash is stored in the cookie. What information must the server store to determine the user name associated with the cookie?

  3. Is the cookie storing state or acting as an authentication token, or both? Justify your answer.

2:

Assume that a Cypherpunk remailer reorders messages. It has a pool of n – 1 messages at all times. When the nth message arrives, one of the n messages is selected at random and forwarded. An attacker floods the server with enough messages to force the n – 1 messages in the original pool to be sent.

  1. Assuming that the message to be sent is chosen according to a uniform random distribution, what is the expected number of messages that the attacker would have to send to achieve this goal?

  2. How can the attacker determine when all the messages originally in the pool have been sent?

3:

Consider a scheme that allows a recipient to reply to a message from a chain of Cypherpunk remailers. Assume that encipherment is used throughout the chain.

  1. Bob selects a chain of remailers for the return path. He creates a set of keys and enciphers them so that only the key for the current remailer is visible to that remailer. Design a technique by which he could accomplish this. Describe how he would include this data in his message.

  2. How should Alice's mailer handle the processing of the return address information?

  3. When Bob receives the reply, what does it contain? How can he obtain the cleartext reply?

4:

Give reasons why root should not be able to change the audit UID on a UNIX system, and give reasons why it should. Which reasons sound more persuasive to you?



[1] If the path is an absolute path name, the first directory in the path is the root directory, which has a well-known inode number (typically 0, 1, or 2). If the path is a relative path name, the first directory has the same inode number as the directory in which the process executes.

[2] Interestingly, some systems allow root to change the audit UID after assignment.

[3] When compiled into a binary format, in many cases the key is implied by the data structure.

[4] Actually, a single CA issues multiple types of certificates. Conceptually, the single organization is acting as though it were multiple CAs.

[5] Passport photographs are notoriously poor, making visual identification questionable unless conditions are optimal.

[6] This is encoded in the signature type field of the signature.

[7] This failure does not necessarily mean that the DNS has been compromised. Some systems store the forward and reverse lookup information in separate files. Updating the forward lookup information file does not change the reverse lookup information file. Unless the latter is updated also, the stated problem occurs.
