Appendix A
Three Types of Redundancy

One thing that characterizes the Internet, and to some extent perhaps the power grid, is the level of redundancy. This is a tricky concept. I have held that redundancy that is designed in, and of which designers are thus aware, is essential and safe, but that redundancies added later, as safety devices, are often the source of system failures (Perrow 1999). Their potential interaction with other failures is not foreseen by designers and not comprehensible to operators. Scott Sagan, however, goes further and persuasively argues that even designed-in redundancies are a major source of accidents. He applies this to the nuclear power plant security issue (Sagan 1996; Sagan 2003). Aside from increasing the interactive complexity of the system (always a risk), redundancies produce a false sense of security. Tentatively, I would propose that this is true for most systems, but not for heavily networked ones. In these, nodes can be connected through multiple, and thus redundant, paths; new paths can grow quickly; and malfunctioning units that could threaten the whole can be quickly, even instantaneously, isolated and severed from the network, and, if desired, only temporarily. This is true of both the power grids and the Internet.

Furthermore, there is more than one type of redundancy. (I am indebted to Alessandro Narduzzo of the University of Bologna for suggesting the two additional types of redundancy and providing me with the following map of the Internet. The responsibility for errors, however, is solely mine.) We normally think of two identical components, one being redundant and used only if the other fails, or of two different power sources, such as the usual connection to the power grid and a standby diesel generator. This could be called replacement redundancy. But the Internet appears to have redundancy in two additional forms, involving the replication of two parts of the system’s structure.

A computer sends a message that includes an address (tier four, the lowest level). The message goes to an Internet router (tier three), which notes the address and sends it to a local service provider, such as a university, corporation, or commercial service provider (tier two). At tier one, the top level, called the Internet “backbone,” the tier-two service providers are linked. This is a hierarchical structure, since there are four levels, but no other system of this enormous size could get by with only four. Furthermore, growth of the system is not produced by adding levels of hierarchy to direct, coordinate, and service the additional users, as happens in most organizations. Instead, using the advantages of distributed systems, growth is achieved by replicating the three tiers above the final user and introducing links between the nodes within each level, called peering links. Servers at tier two and routers at tier three can be added, but no new levels.
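The growth pattern described here, replication within tiers rather than added levels, can be shown in a minimal sketch; the tier names and the `add_peer` helper are illustrative inventions, not part of any real Internet protocol:

```python
# Toy model of the four-tier hierarchy: growth adds nodes *within*
# a tier (replication) rather than adding new tiers of hierarchy.
tiers = {
    1: ["backbone-A"],       # tier one: the Internet "backbone"
    2: ["university-ISP"],   # tier two: local service providers
    3: ["router-1"],         # tier three: Internet routers
    4: ["user-PC"],          # tier four: end-user computers
}

def add_peer(tier_number, node_name):
    """Grow the system by replicating a node inside an existing tier."""
    tiers[tier_number].append(node_name)

# The system scales by adding servers (tier 2) and routers (tier 3)...
add_peer(2, "corporate-ISP")
add_peer(3, "router-2")

# ...but the number of hierarchical levels never changes.
assert len(tiers) == 4
```

The design point the sketch makes is that scale comes from widening the middle tiers, not deepening the hierarchy.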

Duplicating the connections between peer routers at the lowest coordinating and routing level does two things: it provides greater point-to-point bandwidth by multiplying the paths, and it also introduces greater reliability, though only as a secondary consequence of adding new potential paths. This reliability is not achieved through replacement redundancy, since all the paths are always active and available to all users, but through redundancy that replicates the contents of servers and routers. Let us call this form replication/link redundancy, to capture both the replication aspect and the multiple links available between “peers.” A similar replication appears in RAID, the redundant array of inexpensive disks (sometimes called the redundant array of independent disks), which replicates data across disks in case one of them breaks down. But the reliability of link redundancy emerges as a property of a distributed system (semi-independent, self-activating units or functions that are distributed throughout the system, capable of performing tasks that otherwise would be restricted to a central unit). The architecture is replicated in units that are always active. But even more important than the replication of the architecture, which allows multiple pathways for packets to travel, is the replication of the packets and their addresses.
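The key property, that every path is live rather than held in standby, can be sketched with a toy peering graph; the router names and the breadth-first `find_path` routine are illustrative assumptions, not an actual routing protocol:

```python
from collections import deque

# Peering links between routers: every link is always active, so the
# redundant paths carry traffic (bandwidth) as well as providing backup.
links = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}

def find_path(src, dst, failed_links=frozenset()):
    """Breadth-first search that simply routes around any failed link."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in links[node]:
            if nxt not in seen and frozenset((node, nxt)) not in failed_links:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# A reaches D through B or C; if the B-D link fails, traffic flows over a
# replicated path at once, with no standby unit that must be switched in.
assert find_path("A", "D") is not None
assert find_path("A", "D", failed_links={frozenset(("B", "D"))}) is not None
```

Nothing here "fails over" in the replacement-redundancy sense; the alternate route was already carrying traffic before the link broke.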

This constitutes an additional form of redundancy beyond replacement redundancy. But there is a third form. “Link redundancy,” Narduzzo writes, “is the most evident effect (especially because it is analogous to other systems, such as power grids, which are based upon connections), but it is not the most critical one. By replicating its architecture, the Internet also duplicates some information, in particular, the memory of its own structure.” (Personal communication) This is called mirroring, so we will label it mirroring redundancy. It goes beyond duplicating connections, that is, duplicating the architecture, because the content is mirrored, rather than just the connection. For example, messages are broken up into many packets, which routers send by different routes. If a packet is lost along the way, the server that reconstitutes the message will know it is missing, because the “header” of each packet that arrives carries information locating it within the whole message, the mirroring aspect, and this shows which packets are missing and where they came from. The server then tells the computer that sent the packets to send the missing one again.
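The detect-and-retransmit logic can be sketched as follows; the header field names are illustrative, and real IP/TCP headers differ in detail:

```python
# Minimal sketch of splitting a message into sequence-numbered packets
# and detecting a lost one at the receiver so it can be re-requested.

def packetize(message, size=4):
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    total = len(chunks)
    # each header locates its packet within the whole message
    return [{"seq": i, "total": total, "data": c} for i, c in enumerate(chunks)]

def missing_packets(received):
    """From any surviving packets, work out which sequence numbers are absent."""
    total = received[0]["total"]
    have = {p["seq"] for p in received}
    return sorted(set(range(total)) - have)

packets = packetize("redundancy is safety")           # 5 packets, seq 0-4
arrived = [p for p in packets if p["seq"] != 2]       # packet 2 goes astray
assert missing_packets(arrived) == [2]                # receiver knows what to re-request
```

Because every surviving header carries the total count, any single packet that arrives is enough to tell the receiver what should have come, which is the mirrored "memory of the structure" doing its work.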

Domain name system (DNS) servers are essentially databases that store the updated lists of Internet addresses (“IP numbers”) that the routers need. This information is replicated at each node of the network. Two DNS servers may be serving quite separate streams of data that are physically independent, but they can mirror each other such that their internal states are identical; both have all the addresses. This mirror redundancy means that if one server is disrupted, or a packet it is serving is faulty, the data streams to it can be processed by the other server without error, and virtually instantaneously.
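A minimal sketch of this mirror behavior follows; the host names and addresses are invented for illustration (the addresses use the reserved documentation range), and real DNS resolution is considerably more elaborate:

```python
# Two DNS mirrors with identical internal state: a lookup that cannot
# reach one server is answered by the other, with the same result.

dns_mirror_1 = {"example.edu": "192.0.2.10", "example.org": "192.0.2.20"}
dns_mirror_2 = dict(dns_mirror_1)   # identical internal state (the mirror)

def resolve(name, mirrors, down=frozenset()):
    """Try each mirror in turn; a disrupted mirror is simply skipped."""
    for i, server in enumerate(mirrors):
        if i in down:
            continue                # this mirror is disrupted
        if name in server:
            return server[name]
    return None

# Normal operation and operation with mirror 0 disrupted give the
# same answer, virtually instantaneously.
assert resolve("example.edu", [dns_mirror_1, dns_mirror_2]) == "192.0.2.10"
assert resolve("example.edu", [dns_mirror_1, dns_mirror_2], down={0}) == "192.0.2.10"
```

The point of the sketch is that nothing is switched in or warmed up; both mirrors were already serving traffic when one of them dropped out.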

The mirrored servers and the DNS servers are always active and physically dispersed in the network, so they can be accessed from the closest site and thus be more quickly available. When a part of the network fails, as it often momentarily does, the mirrored information is still available, even if it takes a bit more time to find it. Reliability is a secondary effect; the primary purpose of this aspect of the architecture is speed, since distributed information is more quickly accessed than centralized information.
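The speed-first, reliability-second logic can be put in a few lines; the mirror names and latency figures are invented for illustration:

```python
# Dispersed, always-active mirrors are queried from the closest site;
# losing one simply means answering from the next-closest copy.
mirrors = {"mirror-east": 12.0, "mirror-west": 85.0, "mirror-eu": 140.0}  # ms

def closest(available):
    """Pick the lowest-latency mirror that is still reachable."""
    return min(available, key=available.get)

assert closest(mirrors) == "mirror-east"

# When part of the network fails, the same data is still available,
# just a bit farther away.
remaining = {k: v for k, v in mirrors.items() if k != "mirror-east"}
assert closest(remaining) == "mirror-west"
```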

These two forms of redundancy, replication/link and mirroring, do not appear to introduce the hazards that I, and especially Sagan, have detected in the normal form of replacement redundancy. They are always online, and thus do not decay or need testing; they do not give the operator a false sense of security or induce inattentiveness; and it may be that the additional complexity they contribute to the system is not a source of unexpected interactions of failures. (One troubling aspect of this system, however, has already been noted: there is a subtle additional hierarchy in the Internet, the thirteen root servers and the ten top-level domain servers. A hacker attack on these, perhaps by a terrorist, timed to coincide with an attack on another infrastructure, such as telephone service, a banking center, or explosive or toxic concentrations, could magnify the damage and prevent recovery. Network organizations are not immune to attacks of this sort, despite the high reliability and flexibility they achieve while being enormous in size and highly efficient.)
