Swarmcast

An implementation with a more explicit focus on the swarm distribution of content split into many parts is Swarmcast (www.opencola.com, with a developer site at opencola.org since August 2001). It relies on central server mediation to administer content for a network of distributed clients.

Swarmcast is promoted as a high-speed content distribution system for large files, rather than as a file-sharing system. Publishers of very large content files with expected great demand can efficiently and reliably distribute these files to many users at low bandwidth cost. Users find that they can download large files faster and more reliably because the swarming technology adapts to demand, both by scaling storage allocation and by replicating content closer to the points of demand.

The Swarmcast solution has two key components:

  • Swarmcast Gateway, which is a commercial piece of server software. It in effect publishes content as parts to swarms (or “meshes”) of nodes, and later distributes each received user download request to the nodes available to serve up all the parts of the requested file.

  • Swarmcast Client, which is the software that end users can download free of charge. They install it on their local machines to enable download of swarmcasted files.

The technology can be customized to work with other applications, such as download managers, software updaters and, as it happens, file-sharing applications.

It’s implicit in much of the available documentation around Swarmcast that the primary market focus is on specific “content providers” who need to distribute information to many distributed users. This focus is evident in a number of design decisions, including the heavy reliance on a central gateway server.

This reliance clearly provides the commercial hook for the venture, although from the technology point of view it also contributes a number of performance and control benefits—more properly seen as tradeoffs—as long as one is not overly concerned with, for example, the issue of a central point of failure.

An open-access version of Swarmcast was run by the OpenCola company, which offered a trial server gateway at no charge to demonstrate what it is like to swarmcast files. The intent was to show how much bandwidth (and money) the technology can save. The company has lately been reported as moribund, but the Swarmcast software is available as open source from www.sourceforge.net/projects/swarmcast.

Table 8.8. Swarmcast component summary

Component | Specific forms | Comments
Identity | Internal representation based on IP. | Many-to-many peer connectivity is mediated through the central gateway server.
Presence | Not applicable, although the gateway tracks online nodes, their requests, and stage of download. | Users go online only to download content, and their clients then form informal meshes based on common content download.
Roster | Internal, maintained by the gateway based on current content requests. | Gateway directs online nodes with applicable packets to send them to other requesting peers.
Agency | Gateway, client. | Functionality can be tweaked by supplier.
Browsing | Central content list based on content provider publishing to central server. | There is no permanent distributed peer storage as such; sharing occurs at packet level during download.
Architecture | Distributed, transient swarms of a single kind of client software. | Content-centric. Clients request server-published content using unique hashed keys.
Protocol | HTTP, transport-layer agnostic. | Similar to streaming media; little negotiation between nodes.

The component summary for Swarmcast is in Table 8.8. Like Mojo Nation, Swarmcast is essentially a monoclient architecture because of the gateway server, and the summary closely corresponds to the actual network characteristics. Client extensions and customizations are possible for different environments.

How It Works

Swarmcast is based on how Web clients normally request files from HTTP Web servers. It provides a content-centric architecture, where content originally is published and resides on a central server. Each file is assigned a unique hashed key (SHA-1), which ensures data integrity and name independence, and allows authentication and privacy functionality. The chosen key system incidentally also allows integration with other key-based distributed networks such as Freenet.
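The idea of a unique hashed key can be sketched briefly. Assuming, for illustration, that the key is simply the SHA-1 digest of the file's bytes (Swarmcast's actual key derivation may differ in detail), content is identified by what it is rather than by what it is called:

```python
import hashlib

def content_key(data: bytes) -> str:
    """Return a name-independent key for a piece of content.

    Assumption for illustration: the key is the plain SHA-1 digest
    of the file bytes; the real derivation may differ.
    """
    return hashlib.sha1(data).hexdigest()

# The same bytes always yield the same key, regardless of file name,
# and any tampering with the bytes changes the key entirely.
key_a = content_key(b"example content")
key_b = content_key(b"example content")
assert key_a == key_b and len(key_a) == 40  # SHA-1 is 160 bits = 40 hex chars
```

Because the key depends only on the data, a client can verify the integrity of a completed download by recomputing the hash and comparing it with the requested key.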

The requirement for fast and reliable data transfer to many users is addressed in two main ways: providing multiple p2p paths to reduce bandwidth demand on each, and an advanced redundancy encoding of the parts being transferred. The intent is to aggressively push demanded content out into a local network of interested peers, freeing central bandwidth and minimizing the data path for most nodes.

In its basic implementation, Swarmcast technology doesn’t really maintain a persistent p2p network at all. It instead assumes a normal, central server as the ultimate source for all content. This central source is all that users are aware of, and it can be in a remote location, or perhaps mirrored in traditional ways.

Users browse and request particular files through their respective clients, which then connect to the server through the gateway to download. Initially, then, it's the central content server that provides the download. The file is split into identifiable encoded packets by the gateway and sent to the clients. A single request for a particular file is therefore not much different from a normal client-server download.

Leveraged Bandwidth

Things get more interesting when many users request the same file. The gateway then cycles through the file’s packets, distributing them randomly among the requesting clients. Each node thus receives only a portion of the original packets, but the client software is also made aware of some of the other nodes receiving packets from the same file. As these nodes receive packets, they also rebroadcast them to each other. File packets are rapidly swapped back and forth between nodes in the mesh, at full LAN and client capacity, unconstrained by server bandwidth.
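A toy simulation illustrates the gateway's random distribution step. The function and names here are invented for illustration; the real gateway cycles packets continuously and also tells clients about each other, which this sketch omits:

```python
import random

def distribute(packets, clients, rng=random):
    """Toy model of the gateway: hand each packet of the file to a
    randomly chosen requesting client, so each node ends up holding
    only a portion of the original packets."""
    received = {c: [] for c in clients}
    for pkt in packets:
        received[rng.choice(clients)].append(pkt)
    return received

packets = [f"pkt{i}" for i in range(12)]
clients = ["alice", "bob", "carol"]
received = distribute(packets, clients)

# No client holds everything, but collectively the mesh holds the
# whole file, so peers can finish by swapping packets with each other.
assert sum(len(v) for v in received.values()) == len(packets)
```

In the real system the clients then rebroadcast their portions to one another, which is what moves the transfer load off the central server.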

Bit 8.10 Swarmcast forms transient networks (meshes) based on demand.

As users connect to the gateway server for their own downloads, their clients are drafted into temporary p2p meshes to serve received parts of the same content to other users who want to download this data at the same time.


Recall that the distribution was made random by the gateway. Nodes check received packets to see if they are useful in reconstructing the requested file. The gateway’s distribution makes it very likely that swapped packets help complete the download. The packet-encoding scheme ensures that packets can be received in any order, and when enough useful packets have been received, the file is decoded.

Once a node has reassembled its requested file, it may leave the mesh. This is optional, however, and users can keep their computers as part of the mesh for some time after they’ve finished downloading. By staying in the mesh, these users make it easier for others to download the same content, because their complete copies of downloaded file packets are all available for immediate rebroadcast.

This file-swapping behavior between nodes progressively offloads the central server as demand increases and moves the bulk of transfer loading out into the mesh of clients interested in the same content. What the gateway server does, therefore, is leverage the bandwidth of other clients that are still downloading or have just finished downloading content.

In cases of high demand for the same files, and thus many nodes forming the download mesh, the growing availability of other nodes that can supply parts of the same content will eventually saturate any single client’s download capacity. At this point, once it has served a complete copy of the content, the central server no longer needs to answer any requests because they are fulfilled by the mesh. The network’s evolution towards peer serving is indicated by Figure 8.7.

Figure 8.7. Requesting clients form informal swarms or meshes based on common content. Within each mesh, clients rebroadcast packets to other peers requesting the same information, thus offloading the main server.


Load Adaptation

Dynamic meshing is the basis for the scalability and load adaptation in Swarmcast. As demand increases, so too does the distributed availability of this content. Given enough clients, they eventually all download from parallel sources at their respective peak throughput. Meanwhile the server is free to handle requests for other content.

New meshes automatically form around nodes that download other content, again progressively offloading the server. Fallback behavior occurs as the mesh contracts with a decreasing number of connected clients, until they can no longer fill the demand, after which the gateway once again passes requests to the central server.

A side effect of this solution is that the same content server can be used to fulfill requests from both Swarmcast users and non-Swarmcast ones at the same time. The former’s clients connect to the gateway and reap the benefits this allows, while the latter use their traditional clients to connect directly to the main server as before.

Bit 8.11 Swarmcast is a plug-in solution.

The technology is implemented as a self-contained interaction between the two components—gateway and client—that provide an alternative, parallel data path.


In this respect, Swarmcast is a plug-in p2p technology that doesn’t require retrofit measures to existing servers and clients. It simply adds another, more efficient distribution channel that offloads the central server. As a side effect, this makes content distribution more efficient even for the old-technology clients, because the new or migrated clients won’t be competing for the same bandwidth.

Redundancy Encoding

The second important feature implemented in this technology is the use of Forward Error Correction (FEC) encoding to boost reliability in the transfer. The encoding has the additional feature of making Swarmcast largely independent of network protocol.

Many network applications use some form of FEC to provide fault tolerance. Figure 8.8 shows the block-encoding principle employed in Swarmcast, Redundant FEC (RFEC). Recovery is possible even when a large number of packets are lost during transfer. It doesn’t matter which packets are received, only that a minimum number of correctly decodable ones arrive, in any order.

Figure 8.8. Redundant forward error correction encoding works by allowing recovery of original data as long as a sufficient number of redundancy encoded packets are received and correctly decoded.
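The core idea, that redundant packets let a receiver reconstruct data it never received, can be shown with a minimal sketch. This is not Swarmcast's actual RFEC scheme (which tolerates the loss of many packets); it is the simplest possible case, a single XOR parity block that survives the loss of any one block in a group:

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def encode(blocks):
    """Append one XOR parity block to the data blocks."""
    return blocks + [xor_blocks(blocks)]

def recover(received):
    """Rebuild the single missing block by XOR-ing everything that arrived,
    since each byte position XORs to zero across the full encoded set."""
    return xor_blocks(received)

data = [b"AAAA", b"BBBB", b"CCCC"]
encoded = encode(data)

# Lose block 1 in transit; the remaining three blocks suffice to rebuild it.
received = [blk for i, blk in enumerate(encoded) if i != 1]
assert recover(received) == b"BBBB"
```

Real RFEC/erasure codes generalize this principle so that any sufficiently large subset of encoded packets, in any order, reconstructs the original data.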


Although the redundancy introduced by such encoding would, on the face of it, increase bandwidth demand for the same content, this isn’t so in practice. First, the method allows a more lax transfer protocol with fewer messages between nodes. Download can be treated almost like streaming media—declared complete as soon as the client has received enough verified correct data packets to reconstruct the file. For this reason, there is usually no need to send all the encoded data, only a sufficient subset—that is to say, until the client says “Enough!”

Reliable transfer without redundancy and error correction otherwise normally means that packets need to be individually acknowledged, and all must be received correctly. If any packets are missed or damaged, the client must ask for them to be resent, perhaps many times if the network is heavily loaded. In this context, deteriorating network performance tends to rapidly spiral into even worse conditions as the number of resend requests escalates with falling performance.

An analysis using some textbook numbers for a single-source multicast situation can illustrate the bandwidth gains from using FEC. Suppose that 10,000 users are receiving a multicast transmission and that the average rate of packet loss is 10 percent. In traditional reliable protocols (such as TCP), all of the packets are broadcast—sent and resent—until all recipients have acknowledged each and every one. Assuming an independent loss spread for the different users in this example, each packet ends up being transmitted on average approximately five times.

Compare this bandwidth-consuming situation with one where the transmission is arbitrarily partitioned into 100,000 packets and an FEC is used to add 25,000 redundant packets to the transmission (this is a configurable relationship). All packets are now sent just once. Then it’s sufficient that no user loses more than 25,000 packets (in other words, at most about twice the average loss rate) for all users to receive enough of the encoding to reliably decode the message.
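The "five transmissions per packet" figure and the resulting saving can be checked numerically. Modeling each of the 10,000 users as independently losing any given transmission with probability 0.1, the number of sends a packet needs is the maximum of 10,000 geometric random variables, whose expectation is a short sum:

```python
# Expected number of transmissions until all N users receive a packet,
# assuming independent loss with probability p per user per transmission.
N, p = 10_000, 0.1

# E[max of N geometric(1 - p) variables] = sum over j >= 0 of 1 - (1 - p^j)^N
expected = sum(1 - (1 - p**j)**N for j in range(60))

# Roughly five transmissions per packet, as the text states.
assert 4.5 < expected < 5.0

# FEC alternative: 125,000 packets, each sent exactly once, versus about
# expected * 100,000 sends in the acknowledge-and-resend scheme.
ratio = (expected * 100_000) / 125_000
assert 3.5 < ratio < 4.0  # the factor-of-four bandwidth saving claimed below
```

The computed expectation comes out near 4.7, which rounds to the "approximately five times" in the text, and the bandwidth ratio lands just under four.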

Let’s restate the two fundamental facts about FEC transfer for emphasis:

  • Packet reception does not have to be acknowledged!

  • No resends are necessary!

Total bandwidth usage at the same loss rate as the original retransmission situation is here a factor of four smaller, without even considering the packet-handshaking overhead that can be saved by using a much simpler protocol.

The simple multicast example was based on an ideal FEC encoding, which turns out to be too slow in practical implementation for larger files. A randomized, “irregular” FEC (IFEC) encoding scheme was therefore devised by Michael Luby and Michael Mitzenmacher to allow faster implementations.

IFEC allows encoding and decoding in a time proportional to the length of the encoding, multiplied by a small constant (typical value around 5), which is independent of the number of redundant packets. Encoding and decoding times of standard FEC codes by contrast are proportional to the length of the encoding multiplied by the number of redundant packets—clearly a much larger factor (by many orders of magnitude) for all but the smallest files.
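Plugging in the earlier example numbers, purely for illustration (these are not published benchmarks), makes the gap between the two proportionalities concrete:

```python
# Illustrative proportional operation counts only; real implementations
# have different constants, so only the ratio between the two is meaningful.
n = 125_000   # length of the encoding, in packets
r = 25_000    # number of redundant packets
c = 5         # small constant typical of IFEC schemes

standard_fec_cost = n * r   # proportional to encoding length times redundancy
ifec_cost = n * c           # proportional to encoding length times a constant

# For these numbers IFEC is r / c = 5,000 times cheaper, and the gap
# widens further as file size (and hence r) grows.
assert standard_fec_cost // ifec_cost == r // c
```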

In general, the irregular algorithm can be designed for any chosen trade-off between reliability margins and acceptable size/time overhead. Typical IFEC decoding times for multimegabyte-size files are less than a second with current processor speeds.

IFEC Applied to Swarmcast

The mesh of Swarmcast nodes available to send arbitrary subsets of received packets can be seen in one sense as a variably space/time-distributed multicast source. Hence, the IFEC solution is directly applicable to the situation. It doesn’t matter from which nodes individual packets come, or in what order; the IFEC encoding ensures that correct reassembly is possible at the client, regardless.

In fact, no explicit interaction at all between the downloading clients and the packet-sending nodes is even required—which is why FEC-based encoding is so frequently used for streaming media and true multicast situations. Variations also turn up in many p2p solutions to avoid loading bandwidth with heavy handshaking protocols that verify reception of each packet at each recipient.

The Swarmcast gateway can orchestrate a cyclic broadcast of a file’s constituent packets from a server to an arbitrary and changing mesh of receiving nodes, until all requests for that file are satisfied.

Reliability in Swarmcast is actually improved as demand increases, because more nodes are available to send more parts of the file in parallel. Whenever possible, Swarmcast design avoids reliance on a single source for download—a design principle that goes well with transient, distributed networks.

Minimal Knowledge Solution

Swarmcast is an example of the minimal knowledge agent approach. The gateway has minimal knowledge of the contents of the packets, of where they’re going, or of the state of the network in general. This approach turns out to be good for efficiency, and some of the reasons why are explained in the following list:

  • Randomization introduces deliberate chaos to combat the chaos of unstable, rapidly changing network conditions. An easier way of saying this is that if you’re not attempting to enforce any particular order, then external changes often won’t matter—disorder can be ignored.

  • Minimal state-aware logic makes the technology workable in situations with rapidly changing state. Simple systems are often self-organizing—in other words, you don’t need to explicitly care about the details.

  • No feedback is required, which promotes stability. Out-of-sync feedback, caused by disorder and delays, worsens the conditions it’s supposed to help. Hence, even high-latency networks are not a problem.

  • Ignorance of global state means not trying to negotiate between hosts, thus saving sometimes considerable communications overhead and delay.

  • Ignorance of state allows a small and fast implementation. There’s less to worry about, so there’s less code to worry about. Implementations focus on the primary task, moving data.

The bottom line then for Swarmcast technology: an aggressive “send-and-forget” attitude that proves surprisingly robust even in adverse conditions. In most cases, performance actually improves with loading conditions that would bring traditional distribution methods to a standstill.

What Swarmcast in particular doesn’t try to address, however, is dependence on a central content server. This puts the technology slightly on the margins of this p2p survey, because the peer clients, from the user point of view, are merely recipients of centrally published content, and users really see only this server, not the peers. I chose to include it because of the way its simple-minded focus on distribution allows an undistracted analysis of distributed swarming technology. A similar swarmcasting strategy is in fact employed in a popular file-sharing client, e-Donkey2K.

In the next chapter, Freenet shows another, in some ways similar approach, this time designed for adaptive distribution of persistent content published by anyone, without any central server, with the added bonus of secure anonymity.
