Usage Cases

In the application-specific chapters, the focus was mainly on the actual technology. In practical deployment, other issues tend to come to the fore. Some are inherent in the user community that builds around the application, while others depend on the network hosting the application.

Either way, they can come to dominate the practical deployment situation for any peer technology, at best rendering relative merits between competing implementations irrelevant by allowing only one solution, at worst making any p2p deployment impossible. So, although the basis for the following is in individual deployment cases, I have tried to generalize according to the patterns identified.

Realize that all networks in existence today started out small. Over time, some grew, gaining more users and resources, until a critical mass of users, content, or services was reached to make it genuinely useful (and popular). Somewhere along the way, critical problems were identified and addressed, and the simple designs evolved into more complex, and sometimes less p2p ones. Nevertheless, the simple solutions can be more than adequate in particular situations, as long as one is aware of the potential problems when (rather than if) the deployed network grows.

One useful by-product related to usage patterns is when existing peer implementations spin off entire libraries of peer-related code and best practice, which encourages both interoperability and modular design. FastTrack is an example of this, explicitly supporting Kazaa, Grokster and Morpheus, but with an easily licensed code base for rapidly creating other implementations. JXTA is another, even if its p2p aspect is only one component in a larger vision. This kind of architecture and open protocol benefits not only the applications that utilize the common infrastructure, but also the infrastructure itself, by increasing the rate of adoption, improving effectiveness, and providing more resources and services in the form of the new applications.

Usage Patterns and Problems

Each class of p2p application generally shows patterns and problems specific to that usage, so it can be valuable to examine them from this perspective rather than as individual applications deployed in a particular environment.

We can take up some general scenarios to illustrate the concept, roughly grouped according to the peer-technology categories already defined. What’s particularly interesting is the way each has given rise to a specific usage pattern.

Instant Messaging

IM clients are commonly deployed in environments where people are online for extended periods of time, usually over LAN or broadband. This doesn’t mean that transient users, such as those on dial-up, are rare by any means, just that their participation in live conversations is comparatively less frequent. In some sense, they form a subset category and rely heavily on any store-and-forward relaying feature provided by the particular technology deployed.

The usage pattern for IM thus divides into roughly three types:

  • Extended conversations, one on one, where both individuals are devoting more or less constant attention to reading and typing. It can reach the point where the users grow weary of the delays and move to a voice channel, either phone or Internet. It’s notable that the newer IM clients anticipate this and make it easy to place calls from within the interface. While video connections are often found intriguing, the small and jerky webcam images together with the perceived out-of-sync voice channel can prove distracting to serious conversations unless visual support is really needed.

  • Short messages separated by extended periods when the session window is ignored. This frequently happens when attention at either end is mainly elsewhere. This exchange is more immediate than e-mail and conveys some important presence cues, yet is undemanding about relative timing of responses. This context may well be where IM functions best, and it’s frequently associated with file transfer or other ancillary activities.

  • Relayed messaging, when the users are rarely online at the same time. The usage is similar to the previous type, except that it lacks presence cues, and it naturally requires a client-server model that implements store-and-forward, as sketched below. Extensions to this are message relays to other text-based technologies, such as SMS in cellular phones.
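
A minimal sketch of such server-side store-and-forward relaying might look like the following. The class and method names are invented for the illustration and do not correspond to any real IM protocol; the point is only that live delivery and queued delivery share the same relay.

    from collections import defaultdict

    class RelayServer:
        """Central relay: deliver live when possible, otherwise store and forward."""

        def __init__(self):
            self.online = set()               # users currently connected
            self.pending = defaultdict(list)  # user -> queued messages

        def send(self, sender: str, recipient: str, text: str) -> None:
            message = (sender, text)
            if recipient in self.online:
                self.deliver(recipient, message)         # live delivery
            else:
                self.pending[recipient].append(message)  # store for later

        def login(self, user: str) -> None:
            self.online.add(user)
            for message in self.pending.pop(user, []):   # flush any stored queue
                self.deliver(user, message)

        def deliver(self, user: str, message: tuple) -> None:
            sender, text = message
            print(f"to {user}: <{sender}> {text}")

    if __name__ == "__main__":
        relay = RelayServer()
        relay.send("alice", "bob", "call me when you get in")   # bob offline: queued
        relay.login("bob")                                      # queued message delivered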

Connectivity is usually mediated through central servers, and is in the virtual sense one-to-one. IRC-like chatrooms and one-to-many modes might be available.

The identifiable practical problems with the popular clients are dealt with in Chapter 6 and include messages not being encrypted, dependency on a client-specific central server (external to corporate firewall), advertising relayed from third-party servers, unauthorized user profiling, proprietary protocol, and perhaps distracting features as the clients strive to become some kind of personal information manager.

A big issue is the lack of interoperability, where the so-far largest block of users, those using ICQ-AIM, are jealously walled in by AOL. The solution for the user is either to install multiple IM clients, or to hope that some multiprotocol client can keep in step. In either case, you need to define separate identities for each, and when searching for another user be prepared to fail due to the multiple directories and inconsistent identity mapping.

One thing to consider is whether the individual situation makes it meaningful to use a typed-message client at all. Even if motivated by other factors, the geographical spread of some companies can mean that the most likely beneficiaries of an IM system can’t use it effectively because of time zone differences. A skew of a few hours can be enough that the working hours during which live messaging is likely rarely coincide, and in that case, e-mail is the better medium.

File Sharing

The history of the Gnutella network provides much material for studies of both usage patterns and network behavior under varying circumstances, but we must also look to the others for a more complete picture. These networks demonstrate some of the evolutionary aspects in making content sharing more efficient and reliable.

The usage pattern for file-sharing p2p is quite different from IM. For one thing, the other peers tend to be perceived as anonymous content repositories, so unless you specifically know who’s at a particular node and want to chat, you don’t really care whether anyone is even sitting at the keyboard. File sharing is asynchronous from the user’s point of view and frequently left unattended for long periods.

There’s generally a core of constantly online nodes, but some networks have a built-in drift in topology (as in Gnutella) so your system won’t keep in touch with them indefinitely in any case. Other implementations encourage more permanent neighbor contact for various reasons, perhaps because of transaction rules and reputation systems. Local subnets will tend to be pretty static and always online, with largely equal client-LAN bandwidth, unlike many public nets that have a significant transient user base with widely varying bandwidth.

On public sharing systems, the “freeloader” issue always seems to come up sooner or later, and with it come various desired changes to the original open-to-all network behavior. In network terms, this means that the common resources are often heavily loaded by users who contribute nothing to the network, instead only causing congestion and problems for other users.

One popular modification in Gnutella was to deny access to Web-based Gnutella clients, in other words users who were not running a sharing application. Many clients are also configurable to drop connections or not share files with peers who themselves do not share or have further connectivity. A more sophisticated evaluation and filtering of peers is possible when reputation and trust systems are implemented.

Another strategy seen is to implement connection profiles to favor higher-bandwidth connections over slower modem connections. The result is that slow users are pushed to the outer edges of the network and no longer present a bottleneck to the core network maintained by a second tier of super-peers. An extension to this adds reputation tracking and other logic to evaluate peers. Another is the “reflector” principle of letting high-capacity nodes act as relay servers for slower or transient ones—similar to the common ISP-server-to-client hierarchy.
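
A minimal sketch of such a client-side peer-acceptance policy, combining a freeloader filter with a preference for better-connected peers, might look like the following. The thresholds and field names are invented for the illustration and are not drawn from any actual Gnutella client.

    from dataclasses import dataclass

    MIN_SHARED_FILES = 1   # refuse pure freeloaders
    MIN_KBPS = 128         # push very slow links toward the network edge

    @dataclass
    class PeerInfo:
        address: str
        shared_files: int    # as advertised by the peer
        reported_kbps: int   # advertised link speed
        open_slots: int      # further connectivity the peer offers

    def accept_peer(peer: PeerInfo) -> bool:
        """Return True if we are willing to keep a connection to this peer."""
        if peer.shared_files < MIN_SHARED_FILES:
            return False   # freeloader: shares nothing with the network
        if peer.open_slots == 0:
            return False   # dead end: offers no further connectivity
        return peer.reported_kbps >= MIN_KBPS

    def rank_peers(peers: list) -> list:
        """Prefer better-connected, higher-bandwidth peers when slots are scarce."""
        return sorted(
            (p for p in peers if accept_peer(p)),
            key=lambda p: (p.reported_kbps, p.shared_files),
            reverse=True,
        )

    if __name__ == "__main__":
        candidates = [
            PeerInfo("10.0.0.5", shared_files=120, reported_kbps=2048, open_slots=4),
            PeerInfo("10.0.0.9", shared_files=0, reported_kbps=512, open_slots=2),
            PeerInfo("10.0.0.7", shared_files=30, reported_kbps=56, open_slots=1),
        ]
        for peer in rank_peers(candidates):
            print("keeping", peer.address)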

With these kinds of improvements, simple Gnutella-like networks can provide adequate performance in many situations, despite their intrinsic scalability problems. In more controlled environments, the usability focus is probably more on the search functionality, and how to optimize it for the particular content of interest. The related issue of storage strategy is also important—atomistic or distributed.

Some networks use dedicated central servers to index content while others use a hybrid architecture based on a large number of super-peers for this purpose. In this way, search performance can be made much faster than atomistic query broadcast or query route methods. In a similar way, distributed content-publishing systems also often rely on peer-cluster services to index and track content, and these services can be made more effective if they are located on super-peers. Otherwise, fully distributed searches are characterized by relatively long waits for search returns. In the public sharing networks, it seems that super-peer clusters provide a popular middle way between Napster-style architecture and the fully atomistic model, although there is some concern about the added vulnerability (legal as well as technical) that a super-peer model demonstrates compared to the latter.
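
As a rough illustration of why an index held at a super-peer (or central server) answers searches faster than broadcasting a query across the mesh, consider the following sketch. The class and method names are invented for the example and do not describe any particular protocol.

    from collections import defaultdict

    class SuperPeer:
        """Keeps an in-memory index of which leaf peers hold which keywords."""

        def __init__(self):
            self.index = defaultdict(set)   # keyword -> set of peer addresses

        def register(self, peer_addr: str, titles: list) -> None:
            """Called by a leaf peer on connect to advertise its shared content."""
            for title in titles:
                for word in title.lower().split():
                    self.index[word].add(peer_addr)

        def search(self, keyword: str) -> set:
            """One local lookup replaces broadcasting the query across the mesh."""
            return self.index.get(keyword.lower(), set())

    if __name__ == "__main__":
        sp = SuperPeer()
        sp.register("peer-a", ["annual report 2001", "budget spreadsheet"])
        sp.register("peer-b", ["budget presentation"])
        print(sp.search("budget"))   # both peers found with a single lookup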

File sharing can assert some unusual network demands. It’s common to note in educational settings that uplinks are almost always oversubscribed: the sum of client bandwidth requirements is greater than the Internet access bandwidth. Interpreted: lots of network node overhead, plus all those music, movie, and software transfers going on. Upgrade the access lines, and the users or clients increase their transfer throttles accordingly. Statistics of dorm and lab usage at university campuses suggest that a few to 10 percent of the users account for more than half of the bandwidth requirements even under tightly regulated conditions, in some cases all the way to saturation. Similar if not as extreme usage patterns might apply to file-sharing clients used within a corporate setting that are allowed access to external networks.

Adaptive bandwidth management might be required, along with more targeted measures to stem inappropriate use or abuse. In practice, most campus and corporate systems were designed around, and rely on, statistically multiplexed “real life” loading expectations. These expectations build on the usual client-server usage patterns of Web surfing and FTP downloads. Introducing a significant amount of p2p traffic can seriously skew this loading, much as Internet dial-up for a time seriously skewed the traditional telephony loading of POTS switching.
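
As a rough, hypothetical illustration of how p2p traffic skews statistically multiplexed capacity planning, consider a network of 500 clients behind a 155 Mbps uplink. All of the figures below, including the duty cycles and the fraction of p2p users, are assumptions chosen only to make the arithmetic concrete.

    clients = 500
    client_link_mbps = 10
    uplink_mbps = 155

    web_duty_cycle = 0.02   # assumed: a browsing client transfers about 2% of the time
    p2p_duty_cycle = 0.60   # assumed: a file-sharing client transfers most of the time
    p2p_share = 0.10        # assumed: one client in ten runs a p2p application

    web_load = clients * client_link_mbps * web_duty_cycle
    p2p_load = clients * p2p_share * client_link_mbps * p2p_duty_cycle

    print(f"expected Web/FTP load: {web_load:.0f} Mbps of a {uplink_mbps} Mbps uplink")
    print(f"added p2p load:        {p2p_load:.0f} Mbps")
    print(f"combined:              {web_load + p2p_load:.0f} Mbps, well past saturation")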

Bit 12.1 Granting free access to large and powerful unused network resources, even as a quasi-regulated p2p resource, is not without its risks.

As the adage goes, nature abhors a vacuum. Free Internet resources have a way of attracting usage to saturation much faster than the average system administrator can envision, and also of attracting misuse and abuse.


Bandwidth Management Issues

It might seem a little off topic, but practical p2p must factor in bandwidth management and the different ways that usage can be regulated. The observant user will note, for example, that this kind of awareness exists in the design of many peer technologies, especially atomistic file sharing, where clients have user-configured limits for upload, download, and general bandwidth usage, the sum of which is generally less than the user’s total available bandwidth. Without this kind of overall control, it turns out that the p2p network as a whole suffers. In practical terms, this means that the p2p user can treat the application as a background process and continue to use other, less demanding clients (such as Web browsing or IM) much as before.
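
A minimal sketch of the kind of user-configured upload cap such clients apply might look like the following token-bucket throttle. It is illustrative only, with an assumed 64 KB/s limit, and is not drawn from any particular client’s code.

    import time

    class UploadThrottle:
        """Simple token bucket: never send faster than the configured byte rate."""

        def __init__(self, max_bytes_per_sec: int):
            self.rate = max_bytes_per_sec
            self.tokens = float(max_bytes_per_sec)   # start with a full bucket
            self.last = time.monotonic()

        def wait_for(self, nbytes: int) -> None:
            """Block until nbytes may be sent; assumes nbytes <= max_bytes_per_sec."""
            while True:
                now = time.monotonic()
                self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return
                time.sleep((nbytes - self.tokens) / self.rate)

    if __name__ == "__main__":
        throttle = UploadThrottle(max_bytes_per_sec=64 * 1024)   # hypothetical 64 KB/s cap
        for chunk_no in range(4):
            throttle.wait_for(32 * 1024)   # pretend to send one 32 KB chunk
            print("sent chunk", chunk_no)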

From the administrative point of view, the following are various ways a host network’s bandwidth and usage might be controlled. Doing nothing generally means that either the network saturates to denial-of-service levels or the budget breaks under the pressure of bandwidth procurement to satisfy the essential services.

  • Rate-limiting or blocking specific ports suspected of causing, or having the potential to cause, overloading. This is in effect the firewall approach, but directed inward at the LAN users. Any good firewall should filter and selectively block in both directions; the defining question is what to block.

  • Rate-limiting total user traffic, which is difficult to do sensibly, even if scheduled or carefully rules-managed. It will assuredly prove a pain to some legitimate usage and to unforeseen requirements over time. There are router-based solutions for rate-limiting (for example, from Cisco), with a view to providing assured QoS levels to particular services or user groups.

  • Category-limiting user traffic, which might for example give priority to particular services (for example a main server, or NetBIOS sharing). The solution thus lets everyone else use what bandwidth is left over, first-come, first-served. (A converse, less crippling approach is to enhance only some traffic with accelerated content-delivery technology.)

  • Policy enforcement, which we see lately with broadband providers who actively police their customer networks for infringements on no-server or banned-application rules. In other words, they analyze traffic for specific patterns that identify applications which load the uplink (LAN to WAN) channel in unusual ways.

  • Manual or automatic traffic monitoring, which more generally is meant simply to identify unusually “hot” users. Administrators then take measures based on what the analysis reveals about consistent loaders. So-called tail-trimming is probably quite common as an automated response here, meaning that the high-demand users will consistently see their traffic clipped first when bandwidth capacity saturates.

  • Redefined network services. It’s not unheard of for network providers to reconfigure critical services such as local DNS to deny (or “blackhole”) requests that might trigger high-demand bandwidth usage.

  • Aggressive caching for some kinds of traffic. While this lowers demand for outside bandwidth by relying on the usually higher LAN bandwidth to carry some repetitive loading, it can seriously cripple certain clients.

  • Natural filtering, which is an ironic way of describing the policy of just letting the existing bottlenecks in the network see to it that runaway bandwidth hogs choke. In rare cases, administrators might even retrofit lower-bandwidth components in critical paths. More commonly, a multitier transport is implemented so that some users (as determined by usage patterns or payment plans) end up with cheaper and less reliable routing options, or the available bandwidth is partitioned in some way.

Chances are that one or more of these measures will be encountered by unsuspecting deployers of a new peer technology, discovered when the clients trigger network provider reprimands—or perhaps more typically, work sporadically or not at all.

It might be noted that there is an awareness in universities that p2p is something that’s here to stay. In this view, campus provisioning of connectivity, particularly in the context of Internet2 for American universities, must consider various peering strategies in order to meet the bandwidth demand from p2p applications. The awareness extends to the realization that while legal enforcement measures can reduce p2p client deployment on campus in the short term, the real solution is not to ban the applications, but to make the network work better and more cost-effectively at accommodating this kind of loading.

Publishing and Retrieval

The publishing to peers situation is sort of the reverse of file sharing (retrieval), although it can easily be combined with such applications. The main issue is whether publishing is from a single or centralized source, or whether it is a matter of arbitrary peers publishing to peers.

Implementations in earlier chapters address both sides of this issue.

Commercial distribution solutions tend either to be of the single-source variety along the lines of Swarmcast (although they cost far more) or to rely on Akamai-style outsourced, adaptive replication of server-hosted content. Which approach is chosen depends on the usage pattern. Either way, publication tends to be to a central store and replicated on demand.

If the demand for given material peaks on publication, the Swarmcast model works well because most clients will be requesting the same documents at the same time, so publication becomes largely identical to distribution. If heavy usage is a more random demand, like at a typical Web site, then aggressive replication across many content servers with adaptive caching will be best.
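
A back-of-envelope calculation suggests why the swarm model copes so well with a demand peak at publication. The figures and the reshare fraction below are assumptions for illustration, not measured Swarmcast data.

    file_mb = 100            # assumed size of the published file
    clients = 1000           # assumed simultaneous requesters at publication time
    reshare_fraction = 0.9   # assumed share of chunk traffic that peers serve to each other

    central_only = file_mb * clients   # every byte comes from the central source
    swarm_source = file_mb * clients * (1 - reshare_fraction) + file_mb
    # the source still seeds at least one full copy; the peers carry the rest

    print(f"central-only source load: {central_only:,} MB")
    print(f"swarm source load:        {swarm_source:,.0f} MB")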

Fully distributed publishing to a decentralized store seems rare as a commercial solution. However, the Freenet-based Karma purports to do so, and it seemed as if Mojo Nation was also trying to market its version of swarm storage. Encrypted decentralized storage is in theory an attractive revenue-generating solution for corporations with much unused disk space, but management is no doubt very hesitant to begin selling storage in that way. Likely customers might first be other, affiliated companies looking to spread the costs of reliable storage and backup. On the other hand, companies in the market for outsourced storage are more likely to opt for something like the Akamai solution.

The other factor to consider is geographical spread and overall connectivity. Akamai and others specialize in serving highly distributed customers in the global perspective, ensuring high-capacity replication and near-to-client adaptivity. Some of these performance benefits can be hard to duplicate without the same dedicated external resources. Home-grown solutions tend to work best within local LANs, where the company has direct control of all aspects of connectivity. Then, distributed search might also make better sense, because it’s more likely that the entire network will be reachable with queries or that indexing mechanisms will work as intended.

It can be critical to see how publishing and retrieval meet in the area of search functionality. Search of a central store can be indexed and made both fast and efficient, but it is vulnerable to single points of congestion and failure. Distributed storage and search may have advantages, but they are unfortunately often quite slow and may suffer from significant scope constraints. Most corporate settings probably choose just to leverage existing centralized storage as the least disruptive and demanding change. There’s some indication that only new infrastructures seriously consider a fully p2p architecture for their connectivity and content needs.

The requirements for persistent storage and content management in general, including governance, can vary immensely. Swarmcast peer distribution is essentially transient because the peer meshes form on demand, although once downloaded to peers, content can remain available for subsequent requests. However, no peer is required to ensure that content remains stored or available; the central server is the reference (and final permissions arbiter). Fully decentralized storage can be harder to manage and requires some redundancy replication to ensure that content is always available even if some peers are not online. Governance is then also harder to implement, unless carefully built into the meta- and content-tracking services.
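
A quick, illustrative estimate shows why such replication is needed and how fast modest redundancy helps, assuming (hypothetically) that the peers holding replicas are online independently of one another.

    # assumed probability that the peer holding any single replica is online
    p_online = 0.3

    for replicas in (1, 3, 5, 10):
        availability = 1 - (1 - p_online) ** replicas
        print(f"{replicas:2d} replicas -> content reachable about {availability:.0%} of the time")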

Libraries and public archives are potential adopters of this technology.
