The Concept

So, just what is “peer-to-peer” (P2P, p2p, peer2peer) anyway?

Unfortunately, the answer has been obfuscated by all the recent hype. “P2P” became a marketing label, or for some just a way to signal being attuned to the latest trends. To have an intelligent discussion about p2p and its real and potential uses, we need to be clear about what we mean—a definition of sorts. Otherwise this book will be no better than the ephemeral hype found elsewhere.

Bit 1.1 Some p2p technology is more about marketing than product.

A particular implementation or technology may or may not be in the p2p space at any given time, depending upon what mood the marketing people are in.


The term “peer-to-peer” was in fact used as far back as the mid-1980s by local area network (LAN) vendors to describe their connectivity architecture. The concept as applied to computers predates this by another couple of decades.

In essence, peer-to-peer simply means equal communicating with equal. That’s pretty basic. So what was all the fuss about?

Peer-to-peer became a “buzz” concept in 2000, so much so that the label was (and often still is) gratuitously applied to anything that smacks of connecting or sharing. Suddenly p2p was the hottest subject around. That such a popular and controversial service as Napster took the public limelight was both a help and a hindrance to spreading the word, and popular perception of the basic concept was distorted as a result. Music label corporations were driven to bring lawsuits, intense debate raged on both sides over the right to copy and the state of copyright, and all of it was due to this free p2p thing.

By early 2001, Napster claimed over 60 million registered users busy swapping terabytes of music files. Clearly, something significant had happened to bring at least one aspect of p2p to general public awareness—everybody knew about Napster p2p. Napster then attempted a skin change to become a legitimate commercial music outlet instead and began to filter the file sharing.

But by then, it seemed that nobody knew what p2p was any more. The thing is, p2p as a technology had developed from so many sources and for so many different reasons that there was little consensus about what it was supposed to be. Without a suitable, dominant definition, it’s little wonder the term can seem hijacked.

Bit 1.2 There’s far more to p2p than popularized MP3 file swapping.

As a well-known implementation of peer technology, Napster is used in several places to illustrate some p2p concepts, but this book is not about how to swap MP3 files in the post-free-Napster era.


Anyway, the buzz and hype are already receding—even some p2p conferences have been renamed to something with more perceived buzz appeal for 2002, perhaps a good thing. Fortunately, what’s left are the practical implementations, the solutions and how people use them.

Beyond the hype, perhaps p2p is best described in terms of the intent of its supporters, as a set of technologies targeted at better utilizing networked resources. However, this definition is likely too broad for the purposes of this book, so we’ll seek one with more practical focus in the following sections.

The Killer P2P Application

Hyped or not, and however unfocused the term, “p2p” quickly became much more than just one computer talking to another. Exactly what it became has a lot to do with the available infrastructure, the context in which we apply the technology, and ultimately what people want to do with it.

Swapping music files turned out to be a killer application for p2p largely because that’s what many people wanted to do. The coincidental availability of MP3 compression made this practical even for users with low-bandwidth connectivity, who were the majority of users at the time. Napster came to prominence by providing easy-to-use clients that could rapidly reach critical mass in terms of a user base large enough to be interesting. That Napster came to dominate the public conception of p2p is perhaps just a random conjunction of opportunity and timing.

An earlier large-scale file-sharing solution for general content, Scour, had already shut down under intense litigation pressure before the p2p hype peaked. At its height, the Scour servers indexed around 40 TB of content—all kinds of files, not just MP3s. It was a huge community, second only to Napster in number of users, and to this day one can see nostalgic postings about “the good old days of Scour”.

It’s important to realize that although Napster quickly came to dominate the popular mind, it was only one p2p implementation among many. Like most such implementations, it was incomplete, idiosyncratic, and tightly focused on the particular service it provided—often to the exclusion of other, perhaps more interesting functionality. Napster achieved high visibility due to the controversial nature of freely swapping (copying) commercial music files. It was successful because of the way it could leverage the new, widely available music compression technology, thus ensuring popularity among a broad class of new users with relatively low bandwidth. Numerous clone clients quickly arose, many of which remained compatible with the Napster protocol to increase client viability and so indirectly promote their own competing networks.

Another p2p implementation, instant messaging (IM), is at least as widespread as file sharing—but without all the buzz, perhaps because it became established before the controversy. However, IM exhibits a so-far enduring split into several proprietary and incompatible networks, mainly due to active resistance to interoperability by the currently dominant actor, America Online (AOL).

IM’s fragmentation means that further incompatible implementations might yet arise and gain a significant market share. Through some combination of external factors, one of these implementations might reach a critical threshold in number of users and come to overshadow the existing systems. A proprietary IM client bundled with a common operating system might be one possible way this could happen.

This issue of critical threshold, very relevant to what becomes a killer application, can be seen as a result of Metcalfe’s Law.

Bit 1.3 Metcalfe’s Law: Network value rises with the square of the number of terminals.

Given the choice between joining a large existing network with many users or an incompatible new one with few users, new users will almost always decide that the bigger one is far more valuable.


This relationship is an important factor in the viability of any new technology, not just networks. The growth is quadratic, not linear, ensuring a runaway effect—the technology with the larger number of users soon comes to dominate. Metcalfe’s Law is often used to explain the phenomenal growth of the Internet, often recast as: The value of a network grows by the square of the processing power of all the computers attached to it. Every new computer adds resources to the Net in a feedback spiral of ever-increasing value and choice.
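As a rough, back-of-envelope illustration of that runaway effect, the following sketch compares the nominal square-law value of two competing, incompatible networks; the user counts are invented purely for illustration.

```python
# Illustrative sketch of Metcalfe's Law: take network value as
# proportional to the square of the number of users. The user
# counts below are invented, chosen only to show how quickly the
# larger network pulls away once newcomers keep joining it.

def metcalfe_value(users: int) -> int:
    """Nominal network value, proportional to users squared."""
    return users * users

big, small = 1_000_000, 10_000   # two hypothetical incompatible networks

for newcomers in (0, 100_000, 500_000):
    # Assume every newcomer joins the already larger network.
    ratio = metcalfe_value(big + newcomers) / metcalfe_value(small)
    print(f"after {newcomers:>7,} new users, value ratio = {ratio:,.0f}x")
```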

With Internet traffic increasing at least a thousandfold every five years, any user at any given time is confronting just one tenth of one percent of the potential expected in the network only half a decade on. This kind of growth is absolutely phenomenal.

The growth effect actually has little to do with how “good” or “smart” the technology is. Paramount is how many users it gains compared to other solutions. The more it gains, the more it will gain in future—it’s that simple. This is a good reason to include compatibility with established standards in new technologies, or risk forever being at best just a marginal niche player.

One can quibble about the exact value factor and how to measure this, but the principle is fundamental—in both directions.

Bit 1.4 Fragmenting a network dramatically reduces the perceived value.

Divisive effects that break up a larger network make the value of the parts taken together far less than the value of the whole.


Speaking of directions, we should note that the value of fully bidirectional information flows is greater than when data flows in only one direction. Not only does it seem reasonable to factor the bidirectionality of connections into the value relationships between nodes, we might even state that Metcalfe’s Law only applies to bidirectionally connected nodes. This qualification is especially relevant when considering the added value that new nodes bring to a network.

We can briefly examine some known connectivity factors on the Internet that would seem to confirm the divisive effect of unidirectional data flow:

  • Transient nodes and dynamic IP numbers. Such nodes can reach the network but are usually unreachable by other nodes and so add little to the network’s aggregate value.

  • Intranet nodes and network address translation (NAT). The computers on an intranet LAN behind a NAT are generally not directly reachable from outside, thus they add little value to the larger Internet.

  • Firewalls. Computers behind firewalls usually cannot be contacted from outside, so as far as the outside network is concerned, they don’t exist.

In all these cases, although the computers can see and use the Internet’s resources using the normal protocols, the data flow is predominantly unidirectional, inwards. The net effect of adding them to larger networks is to fragment the whole rather than add value to it. Therefore, Metcalfe’s Law as applied to the Internet should be seen as valid only when adding “open” nodes, freely reachable from other nodes.
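To make the qualification concrete, here is a minimal sketch, with invented node counts, that applies the square-law value only to “open” nodes and treats NATed, firewalled, and transient nodes as contributing essentially nothing:

```python
# Minimal sketch: count Metcalfe-style value only for "open" nodes,
# i.e., nodes freely reachable by other nodes. Nodes behind NAT or
# firewalls, or on transient dial-up connections, are treated here
# as contributing nothing to the aggregate value. Numbers invented.

nodes = {
    "open":      4_000,   # publicly reachable, bidirectional
    "nat":       3_000,   # can reach out, cannot be reached
    "firewall":  2_000,   # likewise effectively unreachable
    "transient": 1_000,   # dynamic IP, rarely reachable
}

total = sum(nodes.values())
open_only = nodes["open"]

print(f"naive value  (all {total} nodes): {total ** 2:,}")
print(f"adjusted value ({open_only} open nodes): {open_only ** 2:,}")
```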

For example, it’s estimated that Web search engines index an ever smaller fraction of the total information held on the Internet as a whole, perhaps by now less than a tenth. At least part of the reason this huge information store remains unmapped is that not all Internet-connected sites allow themselves to be indexed or even contacted directly from outside their particular barrier.

On the other hand, because p2p has at its core the concept of peers always being able to contact other peers freely, networks based on p2p will follow the value-added law. As later chapters show, p2p technologies are designed to actively circumvent the unidirectional barriers mentioned earlier, allowing otherwise insular nodes to participate freely in two-way conversations with other peers.

The Bandwidth Factor

Any discussion of killer applications—past, present, or future—must also consider the issue of average available bandwidth. The kind of application that is interesting and practical for a large user base critically depends on both its bandwidth requirements and the average bandwidth available to the majority of potential users.

Bit 1.5 Gilder's Law: Bandwidth grows at least three times faster than computer power (both with regard to total network capacity).

This is a rough empirical average. Applying Moore’s law for computer power (said to double roughly every 18 months), it means that network bandwidth can be expected on average to quadruple every year.


Gilder’s Law is especially important during the threshold period when the technology is introduced. Thus it makes sense that the early popular p2p applications were the low-bandwidth e-mail and IM clients. File sharing in general might have been interesting, but large files were too costly in terms of the bandwidth (and online costs) this represented for the average user.
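The quadrupling figure in Bit 1.5 follows from a quick back-of-envelope calculation, sketched below under the assumption that “three times faster” means bandwidth doubles three times for every single doubling of computer power.

```python
# Back-of-envelope check of the quadrupling claim in Bit 1.5.
# Assumption: "three times faster" is read as bandwidth doubling
# three times for each single doubling of computer power.

moore_doubling_months = 18                            # computer power doubles every 18 months
bandwidth_doubling_months = moore_doubling_months / 3 # so bandwidth doubles every 6 months

doublings_per_year = 12 / bandwidth_doubling_months   # 2 doublings per year
growth_per_year = 2 ** doublings_per_year             # 2^2 = 4x per year

print(f"bandwidth doubles roughly every {bandwidth_doubling_months:.0f} months")
print(f"annual bandwidth growth factor: about {growth_per_year:.0f}x")
```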

Later, when higher bandwidth became more common and compression brought the size of a typical CD-quality music track down to a manageable average of about 4MB, the viability of general p2p file-sharing networks became manifest for the average user. This development was the window of opportunity that opened for Napster, along with the perceived added value of multi-peer access.
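To see why the roughly 4MB track size mattered, consider the approximate transfer times at typical connection speeds of the period; the sketch below uses nominal line rates and ignores protocol overhead, so real-world times were somewhat longer.

```python
# Rough transfer-time estimates for a ~4 MB compressed music track.
# Connection speeds are nominal line rates of the era, ignoring
# protocol overhead and congestion, so real times were longer.

track_bytes = 4 * 1024 * 1024        # ~4 MB MP3 track
track_bits = track_bytes * 8

for label, bits_per_second in [("56 kbps dial-up", 56_000),
                               ("128 kbps ISDN",   128_000),
                               ("512 kbps DSL",    512_000)]:
    minutes = track_bits / bits_per_second / 60
    print(f"{label:>16}: about {minutes:.1f} minutes")
```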

Further increases in bandwidth mean ever-lower costs for transporting ever-larger amounts of data. This situation leads to new opportunities for other kinds of applications, services on demand, and distributed computing—many of which we can’t even begin to imagine yet. The trend increasingly favors p2p solutions.

Bit 1.6 The Black Box Law: Networks evolve towards high-bandwidth, dumb pipes, with intelligence spread to the machines at their peripheries.

As the cost of transporting information decreases with higher bandwidth, the optimal configuration of network resources evolves towards a distributed, p2p one.


The Distribution Factor

Two problems turn up with the way the Web is currently organized around content servers; both are related to the demand for a particular piece of information.

  • The more popular any information is, the less available it becomes as the demand saturates the capacity of the servers and network paths that provide it.

  • Each user who accesses information will inefficiently consume bandwidth corresponding to the entire content, often duplicating concurrent transfers of the very same information to other nearby users.

What p2p technology enables in this context is a combination of (often dynamically) distributed storage to meet peak demands without saturation and the replication of frequently requested information in locations nearer larger groups of users. Thus distribution is an important, albeit not necessarily defining characteristic of p2p technology. The implementation chapters describe some distribution examples.
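As a rough illustration of how replication near groups of users relieves the origin server, the sketch below compares total origin traffic with and without nearby replicas; the content size, request count, and local-hit fraction are all invented numbers.

```python
# Invented-numbers sketch of the duplication problem: compare the
# load on a single origin server with a scheme where most requests
# are served by nearby replicas (peers) that already hold a copy.

content_mb = 100          # size of one popular piece of content
requests = 10_000         # users requesting it in some time window
served_locally = 0.90     # fraction assumed served from nearby replicas

central_only = requests * content_mb
with_replication = requests * (1 - served_locally) * content_mb

print(f"origin traffic, central model:    {central_only:,.0f} MB")
print(f"origin traffic, with replication: {with_replication:,.0f} MB")
```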

A Common Denominator

The best place to begin the quest for a p2p definition is to find a common denominator for the p2p concept, even though some have come to question the relevance of a term that seems to have such a catch-all scope in common usage.

In that light, it’s useful to temporarily reinterpret the term p2p as “person-to-person communication” and try to visualize the essential core of this concept and how it plays out. Imagine, therefore, a room full of people. The thing most people want to do, most of the time, is communicate.

A simple diagram, such as that in Figure 1.1, can illustrate the situation. This trivial analogy is more useful than it may at first seem, because it moves our focus away from technological details and reminds us of some of the basic functionality that a p2p implementation must provide to support conversations. Consider: We generally prefer a face-to-face conversational mode without intermediaries, so that we are aware of the other person’s presence and current state of receptivity. We approach or face that person to “establish a connection”, using particular social cues or handshaking protocols to start the conversation. Seen this way, many of the technical issues in p2p clients examined later take on an easy-to-grasp immediacy.

Figure 1.1. P2P interpreted as “person-to-person”. Person A in a room full of people wishes to communicate with B. A therefore locates and approaches B and establishes a direct connection.


Bit 1.7 People talk—with each other, directly.

Conversation is pretty basic behavior. Almost quintessentially human. And very p2p.


Conversations Between Equals

One of the distinctive characteristics of any p2p system is what is often called direct end-to-end connectivity between equals.

This one-on-one property is distinctive, not exclusive; it does not preclude one-to-many broadcasts, nor have we at this stage specified what the end-points represent—people, machines, or software. Although not sufficient in itself, one-to-one connectivity can still be considered an essential hallmark of “pure” p2p.

Bit 1.8 P2P assumes some form of end-to-end connectivity.

Whether this connection is entirely unmediated, or partially assisted by centralized services, is irrelevant. At root, it’s a “private” communication channel in a distributed network context.


In that light, early computer networks were eminently p2p at the machine level. For machine A to communicate with machine B, it established a direct connection. The connectivity technology used was secondary, either LAN or modem. The machines in the precursors to the Internet connected to each other by dial-up modem to exchange information. Later, workstations used LAN technology to maintain persistent local networking and again the initial model was p2p.

Today, the technology is different, and the physical connections with global reach are not fully p2p in themselves. Peer connectivity is instead accomplished in a different abstraction layer: a protocol suite known as TCP/IP that can connect many separate networks of different machines and operating systems into a seemingly seamless whole—the Internet. We return to the Internet model in later discussions, because the experiences gained there are still surprisingly relevant to the “new” peer technologies.
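As a minimal sketch of what direct end-to-end connectivity looks like at the TCP/IP level today, the toy peer below both listens for an incoming connection and opens an outgoing one; the loopback address and port number are arbitrary, and a real p2p client would of course add peer discovery, error handling, and a message protocol.

```python
# Toy illustration of direct peer connectivity over TCP/IP.
# Each peer can both listen for incoming connections and open
# outgoing ones; the loopback address and port are arbitrary.

import socket
import threading
import time

def serve(port: int) -> None:
    """Accept one incoming connection and answer a single message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", port))
        srv.listen(1)
        conn, addr = srv.accept()
        with conn:
            msg = conn.recv(1024).decode()
            conn.sendall(f"hello {addr[0]}, you said: {msg}".encode())

def connect(port: int, text: str) -> str:
    """Open a direct connection to another peer and exchange one message."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(text.encode())
        return sock.recv(1024).decode()

if __name__ == "__main__":
    # Run both roles in one process purely for demonstration.
    threading.Thread(target=serve, args=(9470,), daemon=True).start()
    time.sleep(0.2)                  # give the listener a moment to start
    print(connect(9470, "ping"))
```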

Conversations Are Dynamic

When we see networks today, we commonly think of them as persistent connections, machine to machine. However, human conversations are dynamic, transient, and changing. These characteristics are true of modern p2p networks as well; their connectivity is constantly changing as nodes come and go.

Bit 1.9 Modern p2p connectivity is usually transient, not permanent.

Each p2p node “lives in the moment” and has an essentially random selection of other nodes as its neighbors at any given time. This topology may change slowly or rapidly, depending on other factors.


Mutable connectivity is easy to realize in the abstracted network: although the physical connectivity is fixed, the protocol level of communication can assume any topology at all, as long as abstract connections can be formed between arbitrary member nodes. This factor may have been a major consideration in making the communications field so expansive. By removing the requirement for end-to-end wiring for each connection, as was the case with the telegraph, a whole new kind of infrastructure could be created out of virtual connections. The cost of establishing new connections becomes essentially zero.

In our simple person-to-person network example, there are many possible connections, as shown in Figure 1.2. In general, the number of possible connections for n nodes is (n² - n)/2. The math shows that the number of possible direct connections increases quadratically as the network grows. When connections are physical, such as in computer networks, it quickly becomes unrealistic to provide for all of them. Instead, we use the concept of addressing and routing over a much smaller, finite set of connections, such as that seen in Figure 1.3 for the simple model. The full connectivity is then abstracted to some form of protocol layer.
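The formula is easy to tabulate, and a few values show how quickly full pairwise connectivity becomes impractical as the node count grows:

```python
# Tabulate the number of distinct pairwise connections, (n² - n)/2,
# for a few network sizes to show the quadratic growth.

def pairwise_connections(n: int) -> int:
    return (n * n - n) // 2

for n in (5, 10, 100, 1_000, 10_000):
    print(f"{n:>6,} nodes -> {pairwise_connections(n):>12,} possible connections")
```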

Figure 1.2. Basic p2p connectivity implies each node can connect directly to any other. The number of possible connections increases rapidly with the number of nodes.


Figure 1.3. In practical networks, direct p2p connectivity is usually consigned to an abstract routing layer, so that the physical connections (solid arrows) can be made much simpler and more tractable.


Many topologies are possible for networks. Good design balances simplicity, redundant physical paths, avoidance of single points of failure, and the given constraints. This subject is revisited later, especially in the context of the different implementations in Part II of this book, because it forms one of the critical factors for issues of scalability, performance, and reliability. Each design must assign priorities and make appropriate trade-offs to become practical and affordable.

It must be noted that for the Internet, any talk of a “direct” connection is a virtual construct only, conceptual rather than factual, because the basis for Internet connectivity is packet-switching router technology. Thus, a data stream sent from A to, say, E is chopped up into many discrete packets, each of which can be sent a physically different route whenever alternative choices exist along the way. The packets are reassembled in the correct order at the destination, possibly individually requested again and resent if errors are detected or some packets never arrive.
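A highly simplified sketch of that chop-and-reassemble idea follows; it ignores real-world details such as acknowledgements, retransmission timers, and checksums, and simulates out-of-order arrival with a shuffle.

```python
# Toy model of packet switching: split a message into numbered
# packets, deliver them out of order, and reassemble by sequence
# number. Real TCP/IP adds acknowledgements, retransmission, and
# checksums, none of which are modelled here.

import random

def packetize(data: bytes, size: int = 8):
    return [(seq, data[i:i + size]) for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets):
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"a data stream sent from A to E"
packets = packetize(message)
random.shuffle(packets)              # packets may arrive via different routes
assert reassemble(packets) == message
print(reassemble(packets).decode())
```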

Network Identities

So far, the focus has been on physical and addressing connectivity, in a word: infrastructure. There is also the issue of addressing even in the simple person-to-person model; we use personal names. Named identities are crucial to conversations. That’s probably why names were invented in languages (the human-to-human protocol) in the first place: it’s easier and more consistent to call someone “Joe” than “you over there by the table”, or to explicitly specify a coordinate set (x, y, z).

The perceptive reader may realize that representational addressing raises the issue of directory services to translate between naming and actual location. Internet IP addressing is fine for physical connections that persist. Internet domain addressing is better as a human-readable abstraction to represent such addresses, and it also enables a certain kind of relocation to occur behind the scenes.

However, even with the addition of dynamic Internet services to perform on-the-fly translation based on more transient connectivity, this kind of machine-oriented endpoint addressing is not well suited to the needs of p2p conversations. Keep this in mind, because it crops up again in the outline of different p2p models in Chapter 2.
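A toy name-to-endpoint directory, with invented names and addresses, makes the translation step explicit and hints at why static, machine-oriented addressing fits transient peers poorly:

```python
# Toy directory mapping human-friendly peer names to their current
# network endpoints. Names and addresses are invented; a real p2p
# system must also cope with peers that go offline, change address
# frequently, or misrepresent their identity.

directory = {}   # name -> (host, port)

def register(name, host, port):
    """A peer announces (or re-announces) its current endpoint."""
    directory[name] = (host, port)

def locate(name):
    """Translate a name into whatever endpoint was last registered."""
    return directory.get(name)

register("joe", "203.0.113.17", 6881)
register("joe", "198.51.100.4", 6881)   # Joe reconnected with a new address
print(locate("joe"))                    # -> ('198.51.100.4', 6881)
print(locate("ann"))                    # -> None: not currently registered
```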

If “direct” connectivity were all there was to peer networking (as it used to be known), we would be hard pressed to explain the subsequent buzz. However, most things go through cycles as they develop, various characteristics waxing and waning in importance. Only in the full historical context do we understand that some developments came as a reaction to something that happened earlier.

For our purposes, it’s therefore useful to briefly discuss some of the historical points in the development of early networks that had peer aspects.
