Freenet

Freenet (www.freenetproject.org) is defined as an open, democratic content storage-and-retrieval system that cannot be controlled by anyone, not even its creators. Called a “censorship-proof” network, by design no one can know the location of a specific piece of information. Consequently, no person or operator can be held accountable for storage of any particular piece of content.

Originally conceived by Ian Clarke as an information publication system similar to the World Wide Web, the Freenet Protocol (FNP) improves on the Web server model in that content is inherently distributed, decentralized, and encrypted. Its self-organizing behavior automatically replicates and moves information nearer to points of high demand, thus distributing load as well. FNP is open, allowing anyone to implement software for it.

Freenet is often called one of the “big three” of p2p in name recognition (along with Gnutella and Napster), which is rather remarkable considering that it’s still essentially a prototype network for debugging basic functionality issues.

While in many respects much of the public attention has been more about the concept than any real implementation, an evolving functional network has existed in the wild since March 2000. The open source (Java-based) implementation is being rapidly developed by a growing number of volunteers; the code base was at version 0.4x in late 2001, with release 0.5 likely out by the time this book is published.

Freenet information, sources, compiled software, community, and discussions are currently found at Sourceforge (freenet.sourceforge.net), where Ian Clarke is the project coordinator. Although Java implementations are preferred for cross-platform compatibility, any development language is accepted as long as the software remains compliant with the protocol. The basic Freenet software requires an installed recent (v1.1) Java runtime environment (a JRE from Sun, IBM, Kaffe, etc.).

The summary component list is provided in Table 9.1.

Concept of Freedom

The concept of Freenet has to do with absolute freedom of speech, among other issues, because the technology is based on the idea that published content must be impossible to remove or censor, by anyone, ever.

Ian Clarke says that early thinking about the Freenet design, around 1998, working towards his thesis in 1999, was based on a kind of philosophical interest in addition to the technical interest. He says he felt concern about increasing moves to impose censorship on the Internet, moves that were then nowhere near as serious a threat to the free flow of information as those he has seen in the past few years. He also found it curious that while nature’s designs are invariably decentralized and damage resistant, humans almost always design highly centralized and very vulnerable systems.

Table 9.1. Component summary for Freenet.
Component Specific forms Comments
Identity Unique message descriptor ID derived from current IP number and port, content keys from hashed descriptors. No active support or direct naming scheme for active nodes, but only tracking of message ID and content keys. Users may be anonymous yet verifiable with digital signatures.
Presence Not applicable for user, assumed 24/7 for nodes. Client/server infrastructure, not tied to personal presence.
Roster N/A Nodes know only of their nearest connection neighbors.
Agency Publishing and retrieval, other extended services. Network nodes act on behalf of requesting user via client, who can’t directly access distributed content or services.
Browsing Optional, then usually of Web-type pages with hyperlinks. Client-bundled plug-in allows browsing content with Web browser. Application gateway gives access from Web.
Architecture Atomistic, unmapped, with persistent information identity. Reach bounded by hop count. Topology and storage locations change with demand.
Protocol Open, HTTP, encrypted. Hashed key identification of content used for retrieval, public-key for signature.

Therefore, a core thrust to the effort to implement Freenet is an active decision to deploy technology that would withstand the increased efforts to both monitor and censor the Internet in dubious ways that simply wouldn’t be tolerated if applied to more conventional means of communication, such as the postal service or the telephone networks. Hence also the decision to make encryption an integral part of the system, which has several benefits.

Besides safeguarding data from manipulation, encryption (and associated digital signatures) also allows a kind of trust system to evolve. Thus, anyone can prove from the signature that particular documents originate from particular, verifiable sources, even if these choose to remain anonymous or to use a (possibly collective) pseudonym.

The key issues that Freenet development addresses are summarized as follows:

  • There is no centralized control or administration of the network. It’s by design not even possible.

  • Anyone can easily publish information, even if they lack permanent Internet connectivity or identity.

  • Both publishing authors and readers may at their discretion remain anonymous. More accurately, they are anonymous until they explicitly move to divulge their real identity in some way.

  • Despite full author anonymity, documents can be verified as originating from a particular source.

  • In practice, it’s impossible to forcibly remove information from a Freenet network. In fact, it is difficult to even localize particular information—even operators can’t determine what content is stored on their local server.

  • Availability of information should increase proportionately with demand, move closer to the demand, and decrease to release resources when demand falls.

Applications of this technology are varied. The obvious one is publishing information in a way that can’t be censored. It’s not feasible in Freenet to discover the true origin, storage locations, destination, or content of files passing through the network, and thus it’s difficult for node operators to determine or be held accountable for the actual physical contents of their respective nodes.

Freenet additionally enables anyone to have a free Web site, without space restrictions or advertising banners, even without owning a computer. Needless to say, this kind of viewable content is protected in the same way as stored files.

Less obvious perhaps is how the adaptive caching and availability behavior in practical terms translates as increased bandwidth for high-demand content, making it easier to reliably distribute software updates for Linux, for example. Although not a dedicated distribution technology, as for example Swarmcast (discussed earlier, in Chapter 8), the two share some functional characteristics in their “swarm” adaptability to supply content demand.

How It Works

It can take a while to fully grasp how Freenet works and the power of the design. This is partly because Freenet is actually several different concepts tightly integrated into a synergetic whole that is more than just the sum of its parts.

The primary concept is to be a fully decentralized network, where each Freenet node acts freely and independently. The Freenet system operates at the application layer and requires an existing secure transport layer. It provides anonymity only for Freenet file transactions, not for general network usage.

Consider a request for information by some user. The information has a persistent identity in the network: the document key we use to retrieve it. The document must obviously be stored somewhere, and while storage is location-based after a fashion, we can’t know in advance which location.

The request is initially made to a local node—or more specifically, to a node the user knows and trusts. In practice, this often means a copy of Freenet server software running on the user’s own computer, but it could also be an arbitrary Freenet node contacted over the Internet. If this node already has the information, then the relevant document is retrieved, decoded, and presented to the user. End of story.

When the information is not available locally, the node forwards the request to another node that it considers more likely to have that particular piece of information. This determination of “most likely” hinges on a stack model that stores information about recently requested documents and the immediate address they came from.

Bit 9.1 Only the first contacted node knows the user’s identity and location.

Subsequent forwarding of requests to other nodes does not pass on any user information, nor the identity of the node where the request originally came from.


To resolve request routing when a new node starts up, and therefore doesn’t have any transfer history on which to base its request routing decision, dummy entries with random keys are put on the stack for each other node it knows about. Subsequent transfers allow the node to make better routing decisions later.

When each document is originally published, it is assigned a generated identity key, which in some meaningful, rather complex way, reflects a concept of “closeness” based on how its content relates to other “similar” documents that are then stored in close proximity in the network. In practical terms, the routing decision is made on the basis of the key numerically closest to the requested document’s key, and the request is forwarded to its associated node address.

As documents pass through any node in the network, they leave behind on the stack this minimum trace: the document key and the address it just came from. For a transient time, the actual document content itself will also linger on the stack—this kind of “stickiness” is often called “lazy” replication. On the stack, it rises or sinks depending on weighted factors such as its size (that is, resource consumption), its request frequency, and its source (proven by trust-signature), until it is displaced by more recent documents. Keys and addresses, on the other hand, expire less rapidly and so have a greater “depth” and persistence in the stack. They then act as pointers to “previous” nodes where the content might still be stored.
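To make the routing rule concrete, here is a minimal sketch (in Python, not the actual Java implementation) of picking a forwarding address by numeric key closeness; the stack contents and addresses are invented placeholders.

def closest_node(stack, requested_key):
    # Pick the stack entry whose key is numerically closest to the
    # requested key and return its associated node address.
    best_key = min(stack, key=lambda k: abs(k - requested_key))
    return stack[best_key]

# A node's stack maps recently seen keys to the addresses they came from.
stack = {0x1A2B: "tcp/10.0.0.5:9113", 0x9F00: "tcp/10.0.0.7:9113"}
print(closest_node(stack, 0x1B00))   # -> tcp/10.0.0.5:9113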

Bit 9.2 Freenet’s routing choices and performance improve over time.

This hypothesis is confirmed by actual network behavior. Successful requests align upstream pointers so that subsequent requests can find both related content quicker and the same content cached at closer locations.


The immediate stack-determined destination might not be where the document is stored, but it’s likely to be “closer”, and presumably that node in turn will be able to forward the request to a node even closer, and so on, until the request finally reaches a node that has a copy of the document. Requests are assigned unique identity numbers for tracking purposes and to allow suppression of copies in loops, and they leave stack traces in order to allow responses to be passed back along the chain.

If for any reason a node can’t forward a request to its preferred downstream node, the node having the second-nearest key is tried, then the third-nearest, and so on. For practical reasons common to all query-forwarding systems, requests are assigned a hops-to-live (HTL) value, typically 20 to 30 hops. This value is always decremented for each try to ensure that the request eventually dies no matter what. This value may seem high, but unlike in the TTL discussion in Chapter 7, Freenet queries are routed. A node that runs out of paths to try reports failure back to its upstream neighbor, which will then try its second choice, etc. This is known as a steepest-ascent hill-climbing search with backtracking.

If the HTL limit is exceeded, no further nodes are tried at any level. Instead, a failure result is propagated back to the original requestor. Nodes may unilaterally clamp excessive HTL values to reduce network load. They may also time out pending requests to keep message memory free.
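A rough sketch of this search strategy, with hypothetical node objects standing in for real Freenet peers (loop suppression and failure reporting are simplified):

def request(node, key, htl, visited=frozenset()):
    # Steepest-ascent hill-climbing with backtracking: try neighbors
    # in order of key closeness; report failure upstream when HTL or
    # paths run out.
    if key in node.store:
        return node.store[key]            # local hit
    if htl <= 1:
        return None                       # HTL exhausted: request dies
    for neighbor in node.neighbors_by_closeness(key):
        if neighbor in visited:
            continue                      # suppress loops (request IDs in FNP)
        found = request(neighbor, key, htl - 1, visited | {neighbor})
        if found is not None:
            return found                  # success propagates back upstream
    return None                           # all paths failed: backtrack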

A document that fulfills a request is returned to the node from which the request came, and it is forwarded back along the chain until it reaches the request origin node. For each step, this document’s key, previous node address, and content are copied onto the forwarding node’s stack.

Bit 9.3 Freenet requests and hence users are in practical terms anonymous.

Because each node knows only the address of the previous forwarding node, it is very difficult (albeit not totally impossible) to identify the original requesting node. This proxy chaining suffices to guarantee the anonymity of Freenet users, even though it’s really only a form of obscurity granted by an indeterminate number of proxies.


Successful requests for information thus automatically result in it being propagated across the network. In addition, the information moves, at least for a while, nearer the requests. As noted, the request passes through a number of computers in order to reach a computer that stores a copy of the document, and when that is passed back, further copies are stored on all the computers that participated in the request.

Bit 9.4 Nearness to a node is totally unrelated to geographical proximity.

It’s easy to forget at times that the notion of distance in a p2p network relates to the number of connection hops between computers essentially distributed at random across the globe, not any real geographical distance.


The overall result of demand propagation is that the more requests for a given piece of information, the more widely distributed it becomes for the duration of those requests. Conversely, as the frequency of requests goes down, then the content copies gradually expire and the distribution contracts, freeing network resources.

Another important result behind the scenes is that, as nodes process requests, they create new routing table entries for previously unknown nodes that supply files, increasing overall useful connectivity. This helps new nodes to discover more of the network through direct links to data sources, subsequently bypassing the intermediate nodes used on first requests.

Bit 9.5 Freenet connectivity rewards successful content supply nodes.

Nodes that successfully supply data will gain routing table entries throughout the network and be contacted more often than nodes that do not.


The Storage Model

Somewhere, there must be an ultimate source cluster for each published document. However, clustering is an automatic side-effect of the caching described earlier, working in conjunction with the closeness algorithm for keys. Perhaps surprisingly to some readers, there is no a priori assignment of content distribution.

In a newly started Freenet, the random seeding of each new node’s stack, described earlier, means that the initially stored documents will have a fairly random distribution, because key routing will be random. As more documents are inserted, stored information begins to cluster based on “nearness” to other keys, through the gradual alignment of node traces and caching of content—it becomes self-organizing.

Routine node interactions therefore spontaneously let certain nodes emerge as what we could call authoritative sources for particular ranges of key values. This is because they will simply be referenced most often for data with keys close to documents they initially store and have fulfilled requests for. Requests for close documents that aren’t currently stored simply end up fetching copies from elsewhere. Frequent requests subsequently maintain cached copies of the content in this closeness range, or keyspace, while copies elsewhere just tend to expire sooner.

There is actually a balance in force between this clustering effect and the way a frequently requested document is replicated across larger segments of the network. The latter spread tends to break up tight clusters and promote the formation of similar clusters in different parts of the network. Because the “closeness” relation that defines a cluster is based on a hash with no real correlation to semantic meaning, and hence user popularity, clustering is also unrelated to possible request demand for semantically related data. This is one of the mechanisms that reduces the risk for bottlenecks in the system, because it’s highly unlikely that there will be demand peaks for many documents from the same cluster at the same time.

This emerging keyspace specialization depends on random references held by other nodes and is thus impossible for a node to determine for itself. It evolves from the overall behavior of the network, might change over time, and is as noted earlier completely decoupled from any notion of geographical location or clustering of nodes. Were that not the case, then a given node could be subverted to specialize in a particular keyspace and subsequently manipulated to deny access to that content.

Hashing the namespace into numeric keys resolves the problem of namespace management, which otherwise would easily compromise the fully distributed model because administration implies some form of centralization. However, hashing does introduce a few problems of its own, as the current mechanism handles document names on a first-come-first-served basis. Human-readable document names have a strong tendency to cluster around particular ranges, which results in an unnecessarily high risk for key contention between different documents. A little bit of democracy is implemented in the hashing assignment to prevent what the developers call DNS-style abuse of the mechanism, but solutions for better implemented “management” are being discussed, and expected to be deployed in perhaps a 0.5x release.

Leveraged Retrieval

The routing concept of closeness also enhances retrieval functionality, because requests are handled more effectively than linear search (that is, visit all nodes) and much more efficiently than broadcast search (request in all directions). It’s a node-directed, single-path search that’s always improving as more requests get fulfilled.

In general, numerous studies of request propagation as routed in Freenet all confirm that in the average case, a request for information will require on the order of log(n) hops to retrieve information from a network of size n. This is reasonable efficiency even in very large networks—the relative advantage over linear or broadcast search only grows with size. Add to this the way Freenet storage adapts to successful request history.
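As a rough illustration of the scaling claim (the logarithm base depends on topology, so base 2 here is an assumption for the sake of the example):

import math

for n in (10**3, 10**6, 10**9):
    print(f"{n:>10} nodes -> ~{math.log2(n):.0f} hops")
# 1000 nodes -> ~10 hops; a million -> ~20; a billion -> ~30.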

The secret ingredient in closeness efficiency is the self-organizing structure of this kind of adaptive content storage. To begin with, content replicates closer to the nodes where it is requested, leaving a trace on intervening nodes, which both reduces retrieval latency and improves the efficiency of future searches. A second important effect is that insertion of new content occurs at nodes that are “closest” in terms of key comparison, which by itself optimizes future search and retrieval for even the very first attempt to find new content.

One beneficial consequence is that network message load remains relatively low and adaptively well balanced, and Freenet therefore scales better both for increased content and for more nodes. Such automatic load balancing incidentally makes the network as a whole quite resistant to common kinds of DoS or “flooding” attacks. The nodes act to concentrate request fulfilment to the nodes closest to such an attack, thereby localizing the damaging effects to just a few nodes.

Note that true search based on content or actual names as understood in other p2p storage models is not yet implemented in Freenet. Published content must be advertised or circulated outside the system (e-mail lists, newsgroups, Web sites, special servers), with potentially detrimental effects on user anonymity. Users have created external databases or lists that associate particular names, content keywords, or content extracts with assigned Freenet keys. Such compilations are assuredly searchable but often out of date. Retrieval in Freenet is strictly by known numeric key—a user must first obtain or calculate a file’s hashed binary key, then send a request message to a trusted node specifying that key and a suitable HTL value.

Granted, search capability can easily be added to Freenet just by running an ordinary hypertext spider like those used to search the Web—content is browsable. Although this solution might seem attractive, it conflicts with the design goal of avoiding centralization. Instead, users are encouraged to create their own special-interest index compilations within Freenet, like the original indexing of the Web.

To preserve anonymity and security, Freenet’s protocol recently defined In-Freenet Key Indices, a system that provides a way for data publishers to advertise inserted keys within the system so that other people can get to the data. There’s a bootstrap problem here, however, which always returns in various guises in any fully decentralized system. In this case: How do you find the index?

The answer to that conundrum is yet to be implemented. For now, the developers can only recommend following the Freenet mailing lists, existing Web key lists, or Freenet Web sites to spot references to any node described there as an index server and to note down its associated Key Index Identifier. You can then use it with the normal Freenet tools to publish your keys to the list and search for others. Users are also encouraged to set up their own (probably special-interest) index keys, start collecting keys, and that way help build up the system.

A major and well-publicized source for off-Freenet index-related information is www.thalassocracy.org—in particular the section for keyindex. You can both browse and search for Freenet content and learn the keys associated with it.

Some clients try to incorporate their own indexing system. For example, the Windows client Frost (found at jtcfrost.sourceforge.net) maintains a compressed key-index that is shared among all online Frost clients. In contrast to the in-Freenet-indexes often used by other clients, Frost stores additional information: file size, date of last access, and checksum, along with a useful “key-score” to automatically delete “dead” keys. This last avoids creating huge indexes that might contain only a few valid keys. A sample search result in Frost is shown in Figure 9.1.

Figure 9.1. Screenshot from the Freenet Frost client showing a list of files matching a search pattern, based on the built-in index shared among all active Frost clients. Note the hash key for each file, which is the identity needed to retrieve the file from the distributed storage.


Besides key-retrieval, the user can also use FProxy, which is a Freenet plug-in included with the Freenet software and switched on by default. The plug-in allows the user to access Freenet through the host system’s normal Web browser. In addition, gateways such as the Freenet CGI Request Client (FCRC), also included in the Freenet distribution, can allow access to Freenet content from anywhere on the Web.

Bit 9.6 Browsing and accessing content on Freenet is slow.

Compared to the Web, Freenet’s usual latency—minutes rather than seconds—is the price paid for anonymity and encryption in the current implementation.


In practical terms, these solutions mean that one can surf Freenet in much the same way one might surf the Web, which is a convenient way to explore content that has been published as hyperlinked documents. Because of this visual similarity to normal Web browsing, links from Freenet pages to the normal Web are formatted in a special way to ensure that the user is made aware of the transition into “insecure” Web space. Following a link to an address located outside Freenet thus invokes a gateway-generated click-through page, as seen in the capture detail in Figure 9.2, reminding the user that Freenet anonymity ends here.

Figure 9.2. Freenet links to the normal Web are formulated in a special way and invoke this click-through warning page, here seen from a local gateway.


Can content ever be removed? No, and this is a core design issue. But it can “expire” and thus automatically disappear from the network.

Content Expiration

Content can—and will—in Freenet’s design expire through lack of interest: when nobody wants to retrieve it over a period of time. If nobody wants it, then a reasonable question is: Why spend resources to store it?

A further constraint is due to the finite storage allocated by each node. When this allocated space is filled, further storage of more current content will displace least accessed content on the node’s stack. This displacement is, of course, not fatal for a distributed document requested reasonably often, but it is a real source of attrition that becomes a factor in expiration for rarely requested content.

It’s also inevitable with TTL and timeout constraints that some rarely requested content won’t be found even when it exists somewhere. If some document is consistently not requested or found, Freenet’s stack caching model ensures that sooner or later the last copy will automatically expire, no matter where it is.

Is this a bad thing? No, not necessarily.

Bit 9.7 Removal of expired documents is automatic, intrinsic node behavior.

Because removal depends on internal stack timers, maintained independently on individual nodes, the culling process cannot be controlled or manipulated based on file identity, and culled files are never identified when they are removed.


Freenet’s stance is bluntly pragmatic in this context: As a network, it is not intended to be an eternal archive, nor is it going to try to assign content priorities. Instead, it just lets popular demand transparently and democratically determine what will remain available. Because frequently requested content is replicated to optimize search, availability, and bandwidth utilization, its retention is almost certain.

Most storage systems implement some form of culling policy to remove “useless” data, either automatic or manual, based on various requirements and assessments. In many situations, it’s a reasonable step to assume that content that is never requested is not (or no longer) relevant and should not continue to consume valuable finite network resources.

The distinction that Freenet very carefully preserves is to remove unpopular data, not unwanted data. If people aren’t at all interested in some piece of information, that’s one thing, but if people dislike it and actively remove it, that’s a form of censorship even if the decision is based on a majority. This philosophical and political stance doesn’t sit well with everyone, but it’s valid nonetheless. And it leaves the choice of continued availability up to any single individual.

Bit 9.8 Fulfilled requests ensure continued document storage.

Anyone can easily and anonymously ensure that a particular document is not removed from storage by simply requesting it every so often. Failing that, the content can be reinserted by anyone who has a copy.


You might wonder: if explicit removal is not an option, what about updates?

Trying to publish revised content under the same name generates the same key as the old version, and the insert is therefore denied, because published content was immutable as the protocol was originally implemented. The denied attempt accesses the existing file in the same way as a normal request, delaying its expiration, and therefore actually serves to propagate the old version instead.

Later development modified this total immutability by allowing updates using the same digital signature as the original document. The application of this to publishing revised versions of content is discussed in detail in the later section about Freenet key types. An alternative or supplementary solution perhaps would be to implement some form of secure versioning system.

Publishing to Freenet

Publishing content to a Freenet network is somewhat involved at present, although a number of tools make the process more transparent to the user. However, it is not yet as easy for the user as Web-content publishing has become.

There is no requirement to be a member node of the network in order to publish. It is sufficient to be able to access an active node in some way: by client, gateway, or some other user interface to the Web.

One example of a publishing tool is the Frost client, mentioned earlier, which normally interfaces to a “localhost” instance of a Freenet server. The composite view in Figure 9.3 shows the tab views for both upload (or publish) and download (retrieve) sessions. The client enables a user to simply browse the local hard disk for a file to publish, then handles the details automatically. The index search facility, described earlier, makes selection for retrieval much easier than handling the long and opaque hashed keys that are the URI access mechanism for Freenet.

Figure 9.3. Composite views from the Freenet Frost client showing both upload and download sessions. Note the hash key for each file, which is the identity needed to retrieve the file from the distributed storage.


An increasingly popular Web-based publishing tool (only on Windows at present) is FreeWeb (found at freeweb.sourceforge.net), by David McNab—who also authored Psst, described in Chapter 6. Like many other Freenet applications, FreeWeb is strongly dependent on the current Freenet version.

Anyone can publish their own Web site content, for example, to the Freenet network—such sites are often called “freesites”. This route avoids the usual problems associated with free Web hosts: content control, storage limits, and third-party advertising. Although freesites must have unique names, this naming scheme is internal to Freenet, unrelated to the usual domains or DNS services.

Unlike with traditional Web servers, a freesite originally had to be updated on a daily basis; otherwise, it would vanish. Newer client agents have eliminated this requirement, which was a consequence of the cached storage model’s key-data immutability and of how the agent located and updated the published site. Changed freesite files are uploaded under different Freenet keys, as is a new “map” that defines the layout of the site.

Chances are, you wouldn’t even notice when browsing the Web that you might have followed a hyperlink through a gateway system and were viewing published content stored as a freesite on Freenet. Only two things would be apparent on inspection: a marked and consistent latency in serving new page content and a URL that is somewhat more involved than your usual HTTP address. Neither characteristic is really that unusual even for normal Web sites, and server delays are far more common than users would like. Greater latency is a distinctive trait for Freenet content due to the overhead of encryption, however, and because of how content is stored and accessed. Mitigating this, popular content is served progressively faster as content gets automatically replicated nearer the requesting user.

Other Freenet Client Software

The documentation specifies a special subset of FNP for clients called Freenet Client Protocol (FCP), which is designed to abstract out the essentials of FNP so that client developers do not have to track the ever-evolving main protocol in all its gory details.

The intent is that FCP should embody the bare bones of FNP only—for example, metadata handling is not currently included in FCP. On the other hand, this subset protocol is never meant to go across a network but intended only for the loopback to a localhost server, so it doesn’t need all the features of FNP. Server nodes therefore are designed to refuse FCP connections from hosts other than localhost by default.

This leads to some common characteristics of Freenet clients. They generally assume the existence of a local Freenet server and attempt to connect with it on a localhost port in the 8000 range. The clients may also assume the existence of a JRE, because it is a normal requirement of current server software.

FreeWeb author David McNab also provides FCPTools, a set of command-line tools that allow convenient insertion into Freenet of files (fcpput) or entire Web sites (fcpputsite). Also included in the set is FCPproxy, which is a small proxy that acts as a Freenet gateway for any URL starting with “http://free/” and will enable a normal Web browser to access known freesites. In addition, it performs useful filtering functions even when browsing the Web. As initially configured, the proxy attempts to connect with the local Freenet server on localhost port 8481; it listens on port 8000 for incoming messages. The configuration file explains other options.

Freesite publishers often want an anonymous way to allow feedback from readers. One of the most popular solutions is Frost, a tool with a Usenet-like mechanism that operates anonymously over Freenet.

Freenet recently implemented a remote procedure call interface (XML-RPC) to a node, which is a simple and light protocol. Client software authors therefore no longer have to implement the complicated encryption and encoding needed to speak full FNP. The client doesn’t even have to parse the protocol because it remotely accesses the node system’s local API calls by way of a plug-in running on the node. Libraries to call methods using XML-RPC are already available from www.xmlrpc.com for many languages (such as Java, C, Python, Perl, PHP, Delphi, REBOL, Dylan, Tcl, ASP, COM, AppleScript, Ruby, Shell script, and C++).

Another area of application development for Freenet is collected under the Everything Over Freenet (EOF) project (at eof.sourceforge.net). The main headings found at this site include

  • Apt-get, which is a Freenet version of the framework used for distributing Debian Linux packages over the Internet. An installed Debian system can thus automatically search and fetch updates from Freenet.

  • Mail, which is a prototype of both the general e-mail transport using Freenet infrastructure and a server to provide the mailbox service. Not without some problems, but reported working. Stock e-mail clients communicate with a special gateway account.

  • News, which is related to Mail, defines a prototype of the news infrastructure and a server for this purpose. Stock newsreaders are directed to a gateway port (as localhost:1119), if possible.

  • Chat, which is a working prototype of a chat infrastructure. As yet no spiffy front-end interfaces, but only a test client. Freenet chat is said to be quite slow (painfully so) even for just two local users, with considerable startup lag. Consider this proof-of-concept only—the developers cheerfully admit to being insane to even try this application.

  • Gaming, which is restricted to a generic gaming framework for turn-based games because of Freenet latency. It implements a secure, anonymous transport layer for asynchronous moves. Chess is the chosen prototype.

  • DNS, which has the goal to implement Freenet Naming Service (FNS), an alternative method to map human-readable names to IP numbers. It restructures the hierarchical DNS or domain model into a peer model with arbitrary strings, removing the technical limitation in DNS that allows authoritarian control of namespace administration. The aim is that the FNS implementation can be a plug-in replacement for any network.

In other words, basically everything that today goes over the ordinary Internet is seen as potentially workable in a Freenet setting, conferring the privacy and persistence advantages of this network.

However, because of fundamental differences between the quasi-hierarchical Internet (which is very server-centric) and the server-agnostic Freenet, some of these services are implemented in significantly different ways. As indicated, some might not even be viable, while others, perhaps as yet unknown, will emerge.

Trust and Content Veracity

Freenet has the basic approach that individual nodes are inherently untrusted. The main issue at stake here is that a node must not be allowed to return false data.

Why is this so important? Look again at the caching process described earlier. Were a node able to pass on a bogus document, that false content would be cached by all nodes participating in the request fulfilment—it would spread “like a cancer”. All content would by implication be suspect. Note that by “false content”, we mean manipulated away from the content actually published; it’s not a value judgement about the content as created by the original publisher of the document.

One kind of “falsehood” is allowed in Freenet, however. Because maintaining a table of data sources is a potential security concern, any node along the way may unilaterally decide to change reply message headers to claim itself or another arbitrarily chosen node as the data source. Such deliberate obfuscation of real sender identity strengthens proxy-chain obscurity.

Freenet Keys

Freenet keys provide the mechanism for ensuring true data and rejecting damaged or bogus data. A node can use the keys to validate that a document or message sent from another node is correct, and if it isn’t, it will simply stop accepting traffic from that node—in principle forever if it’s a signature failure. The request that generated the invalid response is then restarted to other nodes. State is signaled by control flags (usually CB_OK and CB_RESTARTED) in keys forwarded downstream.

The basic key types are supported by Freenet as URIs, with a format given as freenet:keytype@data. Keys can be chained through document metadata references to take advantage of several different key types.

Bit 9.9 Everything in Freenet is stored in terms of key-data pairs.

Ask Freenet for a key, and it will return any data mapped to that key. Provide a key and data to publish, and Freenet will store the data (any data) under that key.


The current key-exchange system is Diffie-Hellman. Interested readers are referred to sources dealing with cryptography, and the public key algorithms in particular. A short summary of the supported key types follows.

  • Content hash key (CHK), which is formed from a numeric hash (160-bit SHA-1) of the data. CHK is used to verify the integrity of the (document) data. A node would apply the same known hash algorithm to any data it transfers and compare the result with the CHK that follows the data (see the sketch after this list).

  • Keyword signed key (KSK), which is derived from the descriptive text string. KSKs are similar to paths in a normal filesystem—subject/subtheme/documentname. Despite this appearance, a KSK string is merely a human-readable identifier; it has nothing whatsoever to do with any hierarchical storage model in the network. The generated public/private key pair is used to hash a file key (public) or digitally sign the file (private).

  • Signature verification key (SVK), which (as type 0x0201) is similar to a KSK (type 0x0202), except that it is a purely numeric key to begin with. The purpose of an SVK is to generate a key pair, the private component of which remains with the originating client and provides a way for the publisher of a document to update it. Ownership (and trust assignment) resides with the bearer of a private SVK key.
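As a minimal sketch of the CHK hashing step only (real Freenet keys add encoding and metadata beyond the bare hash shown here):

import hashlib

document = b"Example freesite content"
chk = hashlib.sha1(document).hexdigest()   # 160-bit SHA-1, as for a CHK
print("CHK-style key:", chk)

# Any relaying node can recompute the hash and verify the data:
assert hashlib.sha1(document).hexdigest() == chk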

Keys and File Management

CHKs are unique and tamper-proof, and are the primary storage key used for Freenet data. A variant for validating large documents is Progressive CHK, which enables the document to be checked in stages, blocks of data at a time. Interestingly, the CHK also prevents the same document from being inserted into the network more than once, because this would generate identical keys and hence clash.

As for KSK strings, the client transforms them into a binary type using a one-way transformation process. It is therefore impractical to attempt recovery of the text string from an intercepted binary version of it. In order to regenerate a valid KSK document, you need to know the original KSK string, and this one step prevents a node from substituting other content for a binary KSK key.

In practical terms, knowing a KSK string, any user can have a node hash it and use the public key to retrieve the file. The KSK lock is the weakest of the keys used and has a number of issues that are being worked on in the ongoing Freenet development process. One issue is the globally flat namespace with risk for name clustering.

Namespace structure is partially addressed with the SVK subspace key (SSK), which is a client-side representation of an SVK with a document name. SSKs allow the user to create a simple, personal-name subspace with some control over insertion. The trade-off due to this specific and controlled clustering is guessable keys. Using digital signatures and SSKs, published documents are clearly associated with the same source, and names won’t collide with global ones.

Files are also encrypted by a randomly generated encryption key. To allow others to retrieve the file, the user publishes somewhere the CHK together with the decryption key. Note that the decryption key is never stored with the file, because to do so would provide a means for node operators to determine the content of stored files. The decryption key is instead only published with the file key.
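A hedged sketch of that publish sequence, using the third-party cryptography package’s Fernet cipher purely as a stand-in for Freenet’s own encryption:

import hashlib
from cryptography.fernet import Fernet   # stand-in cipher, not Freenet's own

decryption_key = Fernet.generate_key()               # random per-file key
ciphertext = Fernet(decryption_key).encrypt(b"the plaintext document")
chk = hashlib.sha1(ciphertext).hexdigest()           # storage key for the node

# The node stores only (chk -> ciphertext) and never sees decryption_key;
# the publisher announces the (chk, decryption_key) pair out of band.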

Indirection and Updating Files

CHKs are most useful in an indirection mechanism together with SSKs. To store an updatable file, for instance, a user inserts it under its own CHK. An indirect file containing the relevant CHK is then inserted under an SSK. Other users are able to retrieve this content in two steps, using first the SSK, then the retrieved CHK.

Updating this content is also a two-step procedure. The owner first inserts a new version under a different CHK. The new indirect file pointing to the updated version is inserted under the original SSK, however. A key collision therefore occurs when this insert reaches a node that possesses the old version. If the signature on the new version is both valid and more recent, the node replaces the old indirection with the new. The SSK indirection therefore always leads to the most recent version of the file. Note that old versions can still be accessed directly by using the CHK—if not requested, these old versions eventually lapse from the network.
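In outline, with freenet_get standing in as a hypothetical client request helper:

def fetch_current_version(ssk_uri):
    # Step 1: the SSK yields a small indirect file containing a CHK.
    chk_uri = freenet_get(ssk_uri).decode().strip()
    # Step 2: the CHK yields the document data itself.
    return freenet_get(chk_uri)

# To update: insert the new version under a fresh CHK, then insert a new
# indirect file (pointing at that CHK) under the *same* SSK; the valid,
# more recent signature wins the resulting key collision.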

This same indirection method can be used to manage directories. Another use is to split large files into multiple parts, which can be desirable because of storage and bandwidth limitations. Splitting even medium-sized files into standard-sized parts also has advantages in combating traffic analysis. Each part is inserted separately under a CHK, with SSK indirection of one or more levels to point to the parts.

Keys figure prominently in the protocol analysis that follows.

Protocol Details

It’s important to remember that FNP is under constant development, and the whole project remains at what is in effect an early prototyping stage. Significant changes can occur between major versions, and several essentially different Freenets might be deployed concurrently—hence the common practice of specifying the version in discussions, for example Freenet 0.3 or Freenet 0.4.

Bit 9.10 Freenet 0.3 and Freenet 0.4 are incompatible protocols.

Nodes running under one version can’t communicate with nodes running under the other. Hence, content stored in one is not directly accessible in the other, except possibly through gateways or from the Web.


Freenet version distinctions are noted in the following only when relevant.

As mentioned earlier, clients generally use the subset FCP to communicate with a local server instance over the localhost loopback. Nodes then use FNP in their further communication with other nodes. A connection (that is, a session) is established and torn down for each transaction. FNP is packet oriented and doesn’t care what underlying transport layer protocol is used for messages. Persistent protocols such as TCP allow multiple messages to be pipelined.

Each session is started with a four-byte identifier: two bytes for session ID, two for presentation ID. Currently fixed at (0,0,0,2), these values may vary in the future depending on encryption status or alternate syntax formats. The identifier is followed by an initiating message, and the transaction is completed by the fulfilment response.

A timeout condition is implemented so that clients and nodes don’t wait indefinitely. The timeout is a function of the HTL of the message, defined in seconds as (mean * hops) + (1.28 * sd * sqrt(hops)). Mean and sd are set to 12, which results in typical timeouts on the order of a few minutes at most.
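For example, plugging typical values into the formula (a sketch of the arithmetic only):

import math

def timeout_seconds(hops, mean=12, sd=12):
    # (mean * hops) + (1.28 * sd * sqrt(hops)), per the text
    return mean * hops + 1.28 * sd * math.sqrt(hops)

print(timeout_seconds(20))   # ~308.7 seconds, i.e. about five minutes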

Message Formats

A transaction message consists of a sequence of end-of-line delimited values (UTF-8 text) in either LF or CRLF format. Messages are assumed to be passed over a “clean channel”, which means that content must not be modified in any way. Implementations may simulate clean channels through encoding, such as the base64 scheme used to preserve keys, digital signatures, and binary file attachments in e-mail.

Header 
[Field1=Value1] 
.. 
[FieldN=ValueN] 
EndMessage 

The header value defines the kind of message. EndMessage does not appear in messages that end with a data field and trailing data. Table 9.2 shows the currently defined message types and their expected responses.
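A minimal parser sketch for this line-oriented format (it ignores the trailing-data case for simplicity):

def parse_message(raw):
    lines = raw.splitlines()
    header, fields = lines[0], {}
    for line in lines[1:]:
        if line == "EndMessage":
            break
        name, _, value = line.partition("=")
        fields[name] = value
    return header, fields

print(parse_message("DataFound\nDataLength=1000\nEndMessage"))
# -> ('DataFound', {'DataLength': '1000'})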

Looking in somewhat more detail at some of these message types, we can see that after a request, the client waits either for a terminating response (possibly an error condition) or for a success. A successful content request results in DataFound, here shown in simplified FCP form, without some of the more esoteric fields for UniqueID, source, transport, hops, and so on that are used between nodes in the full FNP.

DataFound 
DataLength=<number> 
[MetadataLength=<number>] 
EndMessage 

The DataLength value is the total number of bytes for data and metadata together. The MetadataLength specifier is optional and defaults to zero. A sequence of DataChunk messages then follows to transfer the content to the client.

DataChunk 
Length=<number> 
Data 
<Sequence of Length bytes of the data> 

Table 9.2. Summary of defined message header types and expected response types in client (FCP) and node (FNP) communication with a Freenet server node. Exact header names vary between different documentations.
Message type Possible responses Comments
ClientHello RequestHandshake NodeHello, ReplyHandshake (terminates connection) Optional handshake, never forwarded (HTL=1). The response provides protocol and node (version) information.
ClientGet RequestData URIError, DataNotFound, RouteNotFound, Restarted, DataFound, DataChunk ReplyNotFound, ReplyRestart, RequestContinue, SendData, ReplyInsert Frames a request for a particular document, as identified by its fully specified Freenet URI using its KSK.
ClientPut RequestInsert URIError, Restarted, RouteNotFound, KeyCollision, Success ReplyNotFound, ReplyRestart, RequestContinue, SendData, ReplyInsert Frames a request to publish a particular document under a hashed key generated from its name.
GenerateCHK Success Requests node to generate the hashed key based on a text string.
GenerateSVKPair Success Requests node to generate a public signed key pair.
(any) Failed, TimedOut (terminates session) The transaction could not be completed because of a fault in the node. A descriptive text in the response can indicate why.
(any) FormatError (terminates session) The client message could not be parsed as a valid message type. A descriptive text in the response can provide diagnostic help.

DataChunk messages have a trailing data field of the length specified, and the node continues sending chunks until the transfer is done. There is no explicit EndMessage or other termination message: the client already knows the total length of the transfer from the DataFound message and can determine completion on its own; the connection simply dies after the last chunk.
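Client-side chunk handling then reduces to a simple length-driven loop; read_chunk below is a hypothetical helper that returns the trailing data field of the next DataChunk message:

def receive_document(connection, data_length):
    # Accumulate DataChunk payloads until the DataLength promised by
    # DataFound has arrived; no terminating message is expected.
    received = b""
    while len(received) < data_length:
        received += read_chunk(connection)
    return received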

Latency for longer messages (such as document transfers) is handled by tunneling between nodes, so that individual chunks are passed on downstream as soon as each is received, instead of waiting for the entire document to be received. Progressive CHKs and control flags, explained earlier, are implemented to allow rapid validation and containment of invalid data.

If chunk data at any time fails to verify at a node, the node may send a Restarted message, indicating that the transfer will restart from the beginning (which implies that the client should simply discard all previously received chunks). Alternatively, the error might be fatal in terms of this retrieval, in which case another suitable error message is sent to terminate the connection.

In the case of insertion, the message format is:

ClientPut 
HopsToLive=<number> 
URI=<string> 
DataLength=<number> 
[MetadataLength=<number>] 
Data 
<Sequence of DataLength number of bytes> 

The URI is a fully specified Freenet KSK string, same as used for ClientGet. If the client is inserting a CHK or SVK, the URI may be abbreviated to just “CHK@” or “SVK@”, respectively. In the former case, the node will calculate the CHK, and in the latter, the node will generate a new key pair.

Length specifiers are the same as for DataFound. However, in this case, the specified data field must contain the entire content in one go; it can’t be chunked. The node must receive the entire trailing field before it can start the insert into Freenet.

In the case of a KeyCollision response, insertion was refused, and the message returns a URI field with the Freenet URI of the document that already occupied the requested key slot. Non-CHK key types have an upper limit of 32KB on stored data, which explains the SizeError response; CHK-keyed content has no such size limit.

On the other hand, successful insertion returns the Success message with the Freenet URI of the new document. If the inserted document was an SVK, it returns a private/public key pair. The format is

Success 
URI=<string> 
[PublicKey=<string: Public key>] 
[PrivateKey=<string: Private key>] 
EndMessage 
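Putting the pieces together, here is a hedged sketch of a raw FCP insert from Python, assuming a local node accepting FCP on localhost port 8481 (the port FCPproxy is configured for, as mentioned earlier) and writing numeric values in hexadecimal per Table 9.4; error handling is omitted:

import socket

data = b"hello freenet"
message = (
    b"ClientPut\n"
    b"HopsToLive=14\n"                     # 0x14 = 20 hops
    b"URI=freenet:KSK@example/hello\n"
    b"DataLength=%x\n" % len(data) +
    b"Data\n" + data
)

with socket.create_connection(("localhost", 8481)) as s:
    s.sendall(bytes([0, 0, 0, 2]))         # session/presentation identifier
    s.sendall(message)
    print(s.recv(4096).decode())           # expect Success or KeyCollision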

The subject of key types and their generation was discussed earlier. The special key-generation requests affect only the immediate node, not the rest of the network, unlike documents specifically inserted with the previous request options. Creating a CHK from an arbitrary string uses this message format:

GenerateCHK 
DataLength=<number> 
[MetadataLength=<number>] 
Data 
<Sequence of DataLength number of bytes (data+metadata)> 

Success simply returns the URI string. GenerateSVKPair has no extra fields but is followed only by EndMessage. Success there returns the public and private keys as message fields, in that order, Freenet-base64 encoded.

These key strings can subsequently be used to insert or request signed documents by including the appropriate one in the URI specification:

freenet:SSK@<PrivateKey>/docname -- (insertion) 
freenet:SSK@<PublicKey>/docname -- (request) 

Looking instead at messages between nodes, Table 9.3 summarizes the main message types encountered.

RequestData messages propagate downstream along a routed chain, generating responses as might be expected. Successful requests result in some node responding with a SendData and the data. If HTL expires, a ReplyNotFound is passed back. On the other hand, if the last node in the chain runs out of paths to try while HTL is still valid, the response is instead RequestContinue with the remaining value of HTL. It is then the responsibility of the upstream node to try another, less-close key-path, and send a ReplyRestart to its upstream node. This process of trying alternate paths iterates up the chain as necessary, until either the data is found along some other path or a RequestContinue comes back to the requestor. The latter, having exhausted all possible paths, may then conclude that the data is not available within the request horizon defined by current topology and HTL.

Table 9.3. Summary of defined message header types for (FNP) communications between Freenet server nodes. Exact header names vary between different documentations.
Message type Possible responses Comments
RequestHandshake ReplyHandshake (terminates connection) Optional handshake, never forwarded (HTL=1). The response provides protocol and node (version) information.
RequestData URIError, DataChunk ReplyNotFound, ReplyRestart, RequestContinue, SendData, ReplyInsert Frames a request for a particular document, as identified by its fully specified Freenet URI using its KSK.
RequestInsert SendInsert URIError, Success ReplyNotFound, ReplyRestart, RequestContinue, SendData, ReplyInsert Frames a request to publish a particular document under a hashed key generated from its name.

Bit 9.11 In Freenet, “not found” is not the same as “not stored anywhere”.

Constrained searches are not exhaustive. While “found” responses are conclusive, a “not found” response depends on numerous variables and is relative and indeterminate.


In practice, the success-reinforced “learning” behavior of the routing tables, and their inherent p2p adaptability, make search results for existing content converge towards successful retrieval for most requests at some typically small average path length. Connectivity in the tables is aligned towards found content, and successful retrieval replicates along request paths, further increasing chances of success.

Insertion attempts might generate a variety of responses. SendData implies a key collision with an existing file. ReplyNotFound also implies a collision, because routing table information was found, but a node with the content could not be contacted within allowed HTL and timeout constraints. A RequestContinue is also considered a failure in this context, because it is interpreted as meaning the request could not be extended to the required number of hops. However, if the insert request expires without encountering a collision, the last remote node in the chain replies with a ReplyInsert, indicating that the insert can proceed. As the inserted data is fed into the network using SendInsert, nodes store the data locally and pass it along downstream to the key-determined location.

Bit 9.12 Insertion occurs at locations where requests are likely to be routed.

The method of using CHK routing tables optimizes the match between initial storage location and subsequent request routing, without requiring that the data remain in any single location indefinitely.


Message Header

In the interests of completeness, Table 9.4 takes up the message header fields.

Freenet is a message-based protocol. Therefore, nodes are free in principle to close idle connections and connect back to the source later when responding. The Source header provides this reconnection information in the form of a return node address, the last immediate sender. This header is stacked at each hop, the forwarding node substituting its own address for the next hop.

Node addresses consist of a transport method plus a transport-specific identifier such as an IP address and port number (for example, tcp/192.168.10.1:9113). A node that changes IP addresses frequently may instead use a virtual address stored under an address-resolution key (ARK), which is an SSK regularly updated to contain the current real address.

The source field should be omitted for a node that doesn’t wish to accept incoming connections of this nature—or can’t because it’s behind a firewall. The node should instead just keep the idle connection open for responses. The last return source address going back up a request chain is normally the address of the requesting node/client or possibly the address where this requestor wants the result delivered.

In the special case of a document being sent back to a requestor, it’s allowed for a node to arbitrarily change the source pointer of these messages to any random address to obfuscate the real source of the content.

Because the transaction identifier UniqueID is just a random albeit large number, it’s not guaranteed to be unique. On the other hand, the probability of a value collision occurring during a transaction lifetime, among the limited set of nodes that it sees, is exceedingly low.

Table 9.4. Summary of defined message header field types in Freenet messages. Numeric values are expressed in hexadecimal.
Field name Value Comments
UniqueID 64-bit numeric transaction identifier Identifies related messages. It is set to a random value by the originator of a message.
HopsToLive Current hops to live (enforced <= 100) Decremented at each hop (always) until it reaches 1, after which the message is discarded.
Depth Number of hops made Incremented each time the message is (successfully) forwarded.
KeepAlive Boolean (default to True) Informs node whether the connection should be kept alive or closed after forwarding.
Source Transport protocol address (currently only tcp address and port) Identifies the immediate sender of message. Is the basis for stacked return path.
Storable String, free content (for example keys) Nodes caching documents must also cache Storable fields and include them in any responses.

HTL is set by the originator of a message and is decremented at each hop to prevent messages being forwarded indefinitely. Actually, messages do not always terminate in Freenet after HTL reaches 1 but can sometimes be forwarded once again (with HTL still at 1). This ruse is simply to reduce the information that an attacker might gain from an intercepted message and HTL value.

HTL is also coupled to message timeout. A node sending or forwarding a message starts an associated local timer set for an expected maximum duration of time it should take for the message to be relayed through this number of nodes and return a response, after which it will assume failure. While the request is being processed, a remote node may periodically send back ReplyRestart messages indicating that a message is stalled, perhaps waiting on network timeouts. In this case, the sending node knows to extend its timer.

The purpose of hop-tracking Depth is to allow a replying node to set its response HTL just high enough to reach the requestor. Requestors on their part should initialize Depth to a small random value to obscure their location. Corresponding to the HTL value ruse, a depth of 1 is not always incremented, but with finite probability, it might be passed unchanged to the next node.
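
The two ruses can be captured in a few lines. The following is an illustrative sketch only; the forwarding probability is an assumption for demonstration, not a value taken from the Freenet source.

import java.util.Random;

final class HopCounters {
    private static final Random rnd = new Random();
    private static final double PASS_UNCHANGED = 0.25; // assumed probability, for illustration

    /** Returns the HTL to forward with, or -1 if the message should terminate here. */
    static int nextHtl(int htl) {
        if (htl > 1) return htl - 1;
        // HTL is already 1: occasionally forward once more, unchanged, so an
        // intercepted HTL value reveals less about the remaining distance.
        return (rnd.nextDouble() < PASS_UNCHANGED) ? 1 : -1;
    }

    /** Returns the Depth to forward with; a depth of 1 is not always incremented. */
    static int nextDepth(int depth) {
        if (depth == 1 && rnd.nextDouble() < PASS_UNCHANGED) return 1;
        return depth + 1;
    }
}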

Node Discovery

Joining the network is simply a matter of connecting to a number of existing nodes and starting to pass messages. Node discovery, however, is something that tends to be glossed over in most descriptions of Freenet. How does any user client or local server find a Freenet node with which to connect? You'll recall this fundamental bootstrap problem from Gnutella (discussed in Chapter 7), and it's common to all atomistic p2p implementations.

As it turns out, the issue is glossed over in Freenet documentation as well; the discussions assume already functional nodes with content stacked and a message exchange history with other nodes. Although node discovery has been an often-discussed subject on the Freenet developer mailing lists and the Freenet IRC channel, implementation of any solution has been decidedly ad hoc, described only as relying on “out-of-band” means—that is to say, on methods external to Freenet.

Actually, two different methods have been used so far.

  • In Freenet 0.3 (the previous major version), a designated central server collects active node IP numbers. A newly started node can therefore request a list of active nodes from this server and try to connect to them. However, the existence of any centralized service exposes a vulnerable point in the network and is, of course, foreign to the basic design principles of Freenet.

  • In Freenet 0.4 (current at the time of writing), the Freenet developers prefer to use distributed reference files that contain the “seed” addresses of other nodes. While more flexible and less vulnerable, this alternative presents problems similar to those of content indexes: How are these lists to be updated, distributed, and accessed? How much of the address information will be outdated by the time a new node tries to use it?

At present, v0.4 nodes can be configured to use a mix of methods—manual entries as well as seed reference files. Like many other critical design points in this prototyping evolution, node discovery is very much an ongoing work.

Once connected, further node discovery is easier. Given at least one known address of an active node, the joining client begins to send messages. The request mechanism automatically enables the new node to learn about more of the network over time, as illustrated by the sketch below. However, because new nodes aren't normally in a position to successfully respond to requests and, in any case, won't yet appear in any routing table to receive them, existing nodes won't discover the new nodes on their own. Recall that Freenet nodes do not broadcast requests; they selectively route them.
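
A toy illustration of this passive learning, under loose assumptions: the class and method names are hypothetical, and lexicographic proximity stands in for Freenet's actual key-closeness metric.

import java.util.Map;
import java.util.TreeMap;

final class RoutingTable {
    private final TreeMap<String, String> byKey = new TreeMap<>();

    /** Called when a reply passes through: remember who answered for this key. */
    void noteReply(String key, String sourceAddress) {
        byKey.put(key, sourceAddress);
    }

    /** Route a request toward whoever answered for the nearest known key. */
    String nextHop(String key) {
        Map.Entry<String, String> floor = byKey.floorEntry(key);
        Map.Entry<String, String> ceil = byKey.ceilingEntry(key);
        if (floor == null) return (ceil == null) ? null : ceil.getValue();
        if (ceil == null) return floor.getValue();
        // Either lexicographic neighbor is "close"; the real metric differs.
        return floor.getValue();
    }
}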

New-Node Announcements

The solution for new nodes to gain recognition with other nodes is to somehow announce their presence. Such a solution unfortunately is complicated by two somewhat conflicting requirements.

On one hand, to promote efficient routing, all existing nodes should be consistent in deciding which keys to send a new node (and thus assign in their routing tables). On the other hand, it would cause a security problem if any one node could choose the routing key. This concern therefore rules out the most straightforward way of achieving consistency.

A cryptographic protocol was devised to satisfy both of these requirements. A new node chooses a random seed and sends an announcement message containing its address and the hash of that seed to some existing node. Whenever a node receives a new-node announcement, it generates a random seed, XORs that with the hash it received, and hashes the result again to create a “commitment”. It then forwards the new hash to some node chosen randomly from its routing table.

This process continues until the HTL of the announcement runs out. The last node to receive the announcement just generates a seed. Next, all nodes in the resulting chain reveal their seeds. The key for the new node is assigned as the XOR of all the seeds. Checking the commitments enables each node to confirm that everyone revealed their seeds truthfully. This seemingly convoluted process yields a consistent random key that cannot be influenced by a malicious participant. Each node therefore can safely add an entry in its routing table under that key for the new node.
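
The whole exchange can be simulated in a few lines. The following toy version assumes SHA-1 as the hash and 20-byte seeds; both are illustrative choices, not necessarily those of the implementation.

import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.SecureRandom;

final class AnnouncementDemo {
    static byte[] sha1(byte[] in) throws Exception {
        return MessageDigest.getInstance("SHA-1").digest(in);
    }

    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) out[i] = (byte) (a[i] ^ b[i]);
        return out;
    }

    public static void main(String[] args) throws Exception {
        SecureRandom rnd = new SecureRandom();
        int hops = 5;                         // new node plus four existing nodes in the chain
        byte[][] seeds = new byte[hops][20];  // 20 bytes each, matching SHA-1 output
        for (byte[] s : seeds) rnd.nextBytes(s);

        // Forward pass: the new node sends H(seed0); each node XORs in its own
        // seed and hashes again, committing to its contribution before seeing
        // anyone else's seed.
        byte[] commitment = sha1(seeds[0]);
        for (int i = 1; i < hops; i++) commitment = sha1(xor(commitment, seeds[i]));

        // Reveal pass: once all seeds are disclosed (and checked against the
        // commitments), the new node's routing key is the XOR of all seeds,
        // which no single participant could have steered.
        byte[] key = seeds[0];
        for (int i = 1; i < hops; i++) key = xor(key, seeds[i]);
        System.out.println("Assigned key: " + new BigInteger(1, key).toString(16));
    }
}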

Malicious Nodes

The possibility of malicious nodes joining the network is the most difficult problem that a distributed network must face and has been addressed in various ways.

One solution attempt, often seen in networked gaming clients, is to keep the protocol and software code proprietary and closed. This approach provides only short-term protection, and in any case, it doesn't usually address the issue of detecting malicious nodes that do manage to break through security measures.

Freenet philosophy is opposed to closed solutions, so it must seek answers elsewhere. The focus instead is on managing communication at the node level, so that requests are passed to a neighboring node (and data accepted from it) when it provides evidence that it is functioning well, and routed away from it when it does not. The reasoning is that a malicious node can do little harm if other nodes refuse to communicate with it.

A full practical implementation of such a scheme remains at the discussion level, because various criteria must be evaluated to determine whether or not a particular node is “functioning well enough” to be accepted. These criteria include trust metrics, reputation tracking, democratic node “votes”, message index hashes to detect spoofing, and so on. The issue is an example of the ongoing work that the development teams regularly report on.

Modification of requested files by a malicious node in a request chain is an important threat, and only in part because of possible corruption of file content. Routing tables are based on replies to requests, so a node might attempt to steer traffic toward itself by returning fictitious data as bogus successful retrievals.

The use of both CHKs and SSKs addresses this threat, because other nodes can always detect invalid data unless a node successfully forges a cryptographic signature or finds a hash collision. Signatures based only on KSKs, on the other hand, can be created by anyone in possession of the original descriptive string, which renders KSK signatures vulnerable to dictionary attacks, given the somewhat predictable nature of human-readable descriptor strings.
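
The core of the CHK defense is simple enough to show in a few lines. This is a sketch under assumptions: SHA-1 stands in for the actual hash, and a real CHK encodes more than a bare content hash.

import java.security.MessageDigest;
import java.util.Arrays;

final class ChkCheck {
    /** True if the retrieved data actually hashes to the content hash key. */
    static boolean verify(byte[] retrievedData, byte[] contentHashKey) throws Exception {
        byte[] actual = MessageDigest.getInstance("SHA-1").digest(retrievedData);
        return Arrays.equals(actual, contentHashKey);  // mismatch: forged or corrupt reply
    }
}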

Existing files could potentially be displaced by inserting alternate versions under the same keys, but this is prevented by the immutability of storage unless an update is authorized by a valid CHK or SSK. File displacement using a KSK attack may result in both versions coexisting in the network. The normal node reaction to insert collisions is to return the original version, as described earlier, and this behavior is intended to make such attacks more difficult. Thus, the more corrupt copies an attacker attempts to circulate, the greater the chance of key collisions and a consequent increase in the number of genuine copies replicated across the network.

Finally, various DoS attack schemes might be devised as attempts to disrupt the network. The most significant DoS threat is probably that of trying to fill all of the network's storage capacity by inserting a large number of junk files. Various countermeasures have been suggested: “Hash Cash” to slow attacks by imposing a computational “payment” for insertion, dividing the data store into separate sections for new inserts (which can be displaced) and proven requested files (which can't), and others. The respective pros and cons of these measures are under constant developer review. Depending on the deployment environment for a Freenet-type network, different approaches would be deemed appropriate.
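
To make the Hash Cash idea concrete, here is a minimal proof-of-work sketch: the inserter must find a nonce such that hashing the insert key together with the nonce yields a given number of leading zero bits. The hash choice, encoding, and difficulty parameter are all assumptions for illustration; the chapter does not specify how Freenet would parameterize such a scheme.

import java.security.MessageDigest;

final class HashCash {
    /** Searches for a nonce whose hash together with the key has enough zero bits. */
    static long mint(byte[] insertKey, int zeroBits) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        for (long nonce = 0; ; nonce++) {
            md.reset();
            md.update(insertKey);
            md.update(Long.toString(nonce).getBytes("US-ASCII"));
            if (leadingZeroBits(md.digest()) >= zeroBits) return nonce;  // "payment" found
        }
    }

    static int leadingZeroBits(byte[] h) {
        int bits = 0;
        for (byte b : h) {
            if (b == 0) { bits += 8; continue; }
            int v = b & 0xff;
            while ((v & 0x80) == 0) { bits++; v <<= 1; }
            return bits;
        }
        return bits;
    }
}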

Security by Obscurity

The other aspect of malicious node management is to simply limit how much useful information they can collect by just being part of the network. As explained earlier, Freenet nodes work in a relatively isolated way and know little beyond the IP identities of their nearest neighbors.

The message and key analysis sections show how this relative obscurity is further enhanced by allowing nodes to arbitrarily provide false source identities and by not consistently updating the depth and HTL values. This makes it difficult to create reliable maps of the active network or determine where particular content is stored. While it's trivially true that a successful request guarantees that the content is stored (for a time) on the neighboring node, there is no good way to localize where the “authoritative” node for that content is at any given time. (Unless of course you are in a position to continuously monitor and analyze traffic to and from all possible nodes—but then any form of anonymity and security becomes highly unlikely.)

The real identities of senders and requestors are similarly obscured. Freenet communication is not directed towards specific receivers, however, so receiver anonymity is more accurately viewed as key anonymity—hiding the key that is being requested or inserted. Strict key anonymity is not possible in the basic Freenet scheme, because routing depends on knowledge of the key, yet some measure of obscurity against casual eavesdropping is given by the use of hashes as keys. A residual vulnerability to dictionary attack remains, because the unhashed versions of keys must be widely known in order to be useful. Sender anonymity is preserved even against a collaboration of malicious nodes, because no node in a request path can determine whether its upstream neighbor initiated the request or merely forwarded it.

Bit 9.13 All security and anonymity measures make some assumptions.

Freenet assumes that nobody can monitor what is going on inside your computer, which is why the only truly “trusted” node for a client is one on your own machine.


The first node that a user client contacts is a weak link in that it can potentially act as a local eavesdropper, and no message protection is implemented against this. This vulnerability is why it's recommended that users connect only to server nodes running on their own machines, as a trusted first point of entry into the Freenet network. Messages between nodes are encrypted against local eavesdropping, although traffic analysis of these nodes might still determine the probable point of origin.

Stronger sender and key anonymity in this context can be achieved by adding so-called “prerouting” of encrypted messages, where a succession of public key encryptions overrides the normal routing mechanism to determine the route that a message follows. Nodes along this route are unable to determine either message content, request key, or originator. When a message reaches the end of its prerouting path, it's injected into the normal Freenet network and subsequently behaves as though the preroute endpoint were the originator.
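
The layering principle behind prerouting can be demonstrated compactly. The demo below is purely conceptual: AES session keys stand in for each hop's public key encryption, ECB mode is used only to keep the example short, and none of this reflects the actual Freenet wire format.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

final class PreRouteDemo {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        SecretKey[] hopKeys = { kg.generateKey(), kg.generateKey(), kg.generateKey() };

        // Sender wraps the payload in layers, innermost first: the last hop's
        // key goes on first, the first hop's key on last.
        byte[] packet = "request for key X".getBytes("UTF-8");
        for (int i = hopKeys.length - 1; i >= 0; i--)
            packet = crypt(Cipher.ENCRYPT_MODE, hopKeys[i], packet);

        // Each hop peels exactly one layer; only the last sees the plaintext.
        for (SecretKey k : hopKeys)
            packet = crypt(Cipher.DECRYPT_MODE, k, packet);
        System.out.println(new String(packet, "UTF-8"));
    }

    static byte[] crypt(int mode, SecretKey k, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding"); // demo only; no IV handling
        c.init(mode, k);
        return c.doFinal(data);
    }
}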

The fact that a node is listed as the data source for a particular key does not necessarily mean that it actually supplied that data or was even contacted in the course of the request. This is because the source field is occasionally changed by a node in the chain passing along the file. It’s not possible to tell whether the downstream node provided the file or forwarded a reply sent by someone else.

In either case, a copy of the file remains on the downstream node, so a subsequent inspection of that node (for example, with a request probe with HTL=1) on suspicion reveals nothing about the prior state of affairs (an HTL=1 probe might be forwarded to another node regardless). This provides plausible legal ground that the data was not there until the act of investigation placed it there. The success of a large number of requests for related files, on the other hand, could conceivably provide grounds for suspicion that those files were being stored there previously.

Scalability and Stability

In the real world, scalability and stability are a matter of empirical study and not always well understood. Freenet is still experimental and changing. Nevertheless, some conclusions can be drawn both from theory and initial deployment.

Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong studied numerous aspects of theoretical, simulated, and real network behavior to test some of the essential characteristics of the Freenet architecture. Their “Freenet: A Distributed Anonymous Information Storage and Retrieval System”, published in June 2000 and revised in December, provides a rich source of information. It and Clarke's original thesis are available from www.freenetproject.org/cgi-bin/twiki/view/Main/Papers.

The main simulations they performed were

  • Network convergence, which tested the adaptivity of the network routing

  • Scalability, which looked for any inherent constraints to growth

  • Fault tolerance, which tested how resistant the network was to lost nodes

Convergence

Inserts of random keys were sent to random nodes in a simulated test network of 1,000 nodes, interspersed randomly with requests for randomly chosen keys known to have been previously inserted, using an HTL of 20 for both. Every 100 time steps, a snapshot of the network was taken and its performance measured using a set of probe requests. Each probe consisted of 300 random requests for previously inserted keys, using a large HTL (500).

Initially, measured path lengths were very high. Most requests probably didn’t succeed at all—failure by the test probes resulted in a measured value of the max HTL. However, path lengths decreased rapidly over time as the routing tables adapted to the actual distribution of keys. As the network converged, the median path length for requests dropped to a low value (6).

Scalability

The team started with a small network of 20 nodes and added new nodes over time, one every five time steps. As before, they inserted new keys at random and measured the change in mean path length for random requests.

They found that the network scaled approximately logarithmically, which held up to a size of about 40,000 nodes, probably determined by the size of the routing table (250 entries), after which path length increased more rapidly. Nevertheless, the network appeared to continue to scale reasonably to about a million nodes, with the average path length reaching only 30, despite no pauses in growth to allow a steady-state convergence process. Varying node bandwidth is ignored in this study. Real-world nodes could easily maintain routing tables with thousands of entries, with correspondingly greater potential scalability.

Fault Tolerance

The team grew a network to 1,000 nodes using the previous method, then removed randomly chosen nodes progressively from the network to simulate node failures.

The network proved surprisingly robust against quite large failures. The median path length in their examples remained below 20 even when up to a third of the nodes were removed. Such results bode well for full-scale performance of real networks. The team explained overall performance in terms of a “small-world” model, in which the majority of nodes have only relatively few, local connections to other nodes, while a small number of randomly dispersed nodes have large, wide-ranging sets of connections. Small-world networks permit efficient short paths between arbitrary points because of the shortcuts provided by the well-connected nodes.

The distribution of links within a Freenet network closely approximates a power law, which the team took as a sufficient property to qualify as a small-world model. Thus, random node failures are most likely to affect the majority that possess only a small number of connections. Losing poorly connected nodes will not affect routing very much in the network. Only when the number of random failures becomes high enough to disable a significant number of well-connected nodes does routing performance become significantly affected.

Practical Installation

Discussion of practical installation comes late in this chapter for several reasons. One is that in a prototype system like Freenet, much can change between time of writing and when you read this, easily making detailed installation information obsolete. Another is that much of the same user functionality can be realized through a gateway access from the Web without any particular user installations at all.

In Figure 9.4, we see the two main ways a user might access Freenet content—or for that matter, publish content to the network. Because access is always indirect, either by way of one’s own “trusted” node on the local machine or by way of a remote gateway system, the degree of indirection matters little to the network, although the latter method can compromise the user’s anonymity.

Figure 9.4. Two ways to access (or publish) content on Freenet. At left, the user runs a trusted node of their own with connections to other Freenet nodes and uses Freenet client software to access it. The user at right uses a normal Web browser to access a gateway on a remote machine.


It should be noted in this context that participation as a node in Freenet carries a number of caveats, and it's recommended to have the intention of remaining online 24/7, or at least for long periods. Although the network has a proven ability to tolerate a certain fraction of transient nodes in the system, the design is such that overall performance and stability are better with more “permanent” nodes.

Pragmatically, continuous connectivity is a good thing for the user as well—it's simply in your own interest to stay connected. Recall from earlier discussions how the network adapts storage and routing. This particular kind of adaptive single-path routing puts new nodes at a distinct disadvantage when it comes to finding and retrieving content. It's not unusual for a user who has just installed the node software, joined the network, and uncovered a number of content keys to have great difficulty finding anything at all, even with maximum HTL. The first key requests will be sent out in random directions, there being no stack history of successful requests on the local node to indicate a best routing. Therefore, they generally fail.

As the node remains online for a longer time, it eventually acquires a better routing table: in part from the successful requests that occasionally do occur, and in part from participating in passing along results from other nodes' searches that sometimes take paths that include it. This routing improvement is clearly demonstrated when a new user successfully finds a freesite—associated content is suddenly also available, and finding related content becomes much more probable.

Node Installation

The first question for the user is which version of Freenet to join. Normally, you might be inclined to install the most recent node software, but in the case of Freenet 0.3 and 0.4, these versions define two separate and incompatible networks. If you are on the cusp of a new major version being deployed—a v0.4 to v0.5 transition seems likely before this book ships—you might want to check on possible incompatibility between versions and perhaps set up the previous version instead. If the new version defines a new and incompatible network, then the previous one is where most of the content initially resides. As the new version becomes established over time, it acquires more nodes and more available content. Then again, perhaps you prefer to be at the forefront and publish your content on the new network.

The other aspect of version choice depends on what kind of client software you wish to use. Much of this software has version dependencies, or at least might require configuration tweaks to work with a version other than the one it was originally intended for. This is admittedly a tough call before you've used any clients at all, but the developer sites often have feature lists and screenshots. The following is mainly about v0.4.

The first step for any Freenet node installation at present is to have a JRE installed on your system, whatever your operating system. A JRE can be obtained in many ways, from many sources. In some cases, it might already be present, installed by some other Java-enabled application—Web browsers such as Opera (www.opera.com) come in both plain and “j” versions; the latter will install a working v1.1 JRE that can also be used by Freenet software.

The second step is to decide whether you want to install a precompiled binary Freenet package or to compile your own from the sources. For Windows users, a self-installing binary is usually the best choice unless you’re dedicated to keeping up with the bleeding-edge developer code daily snapshots. This decision is less of an issue for Linux users, because they normally have both the required compilation tools present and more experience in handling source-distributed software.

Installing the binary node software is rather undramatic. The main configuration question encountered concerns the type of connectivity, and it can be changed later. It's recommended to start with “transient” regardless, as this has no harmful effects, and move to permanent only when you're really sure you can reliably run 24/7. The 24/7 setting appears to be crucial for some esoteric internals that might affect network stability, or your own ability to reconnect, if you are offline too much or too often.

The default install process makes some reasonable assumptions about storage allocations based on connectivity and current free space on your machine’s hard disk. Such detail can be tweaked later, so don’t worry about it.

Bit 9.14 Go with the defaults and recommendations for node installation and configuration, unless you really, really know what you’re doing.

Many Advanced and Serious Geeks Only settings (yes, the dialog tab does say this!) are available in Configuration, most of which can seriously disrupt your Freenet connectivity if set incorrectly.


The other main user setting is a “security” one: how to acquire “seed nodes” when starting. This is the bootstrap mechanism for node discovery, discussed earlier. The normal method is to go with the default, which makes the node go to a predefined Web address and retrieve a seed.ref file with a selection of known nodes. Someone more concerned about security can choose more circumspect methods of acquiring and selecting which nodes to initiate contact with. Because this network is still a prototype, there seems little reason to avoid the default method.

Assuming that your Internet connection is up and the node runs immediately, the Windows version stakes out a taskbar space for its status icon and retreats to the background. Right-click to reach the menu, shown in Figure 9.5, and from there select, for example, Configure to reach the configuration dialog. Other options are to stop (and later start) the node or to stop-and-restart it. Importing and exporting “refs” is a way of manually managing seed lists as local files, perhaps exchanging them, encrypted, with other users through external channels.

Figure 9.5. The Freenet node popup menu from the taskbar icon in Windows.


But essentially, once it's running, you can forget about the node software, as it manages node connectivity completely and unobtrusively in the background, with, for the most part, no measurable system load.

OK, it’s running. Now what? How do I reach Freenet content?

Through the local machine’s Freenet node, of course.

Node Access

To communicate with the local node and, by extension, the rest of Freenet, you need a local client. Fortunately, the node is Web aware through a proxy component, so you can use your default Web browser.

Although you can manually type in the URL (localhost:port), where the port number depends on version and configuration, it's more convenient to simply use the taskbar icon's popup menu. Select Open Gateway to automatically invoke your system Web browser with the proper URL.

Figure 9.6 shows the Gateway-generated default page with forms for both document retrieval and file publishing (insertion). To request content from here, you must know the Freenet key for the document and either type it in manually (No thanks!) or paste it in from somewhere else. This page also provides useful tips on where to find lists of content keys.

Figure 9.6. The Freenet 0.4 node gateway’s default page accessed with stock Web browser from the installation’s localhost proxy. Note the forms for retrieval requests and browse-insertion of files from the local system.


Further down on the page, not visible in the screen capture, are some examples of normal-seeming Web page hyperlinks that in fact point to a number of Freenet resources, helpfully provided as examples to get the new user started without having to seek externally maintained content lists. The links hide URI addresses similar to the following (for a freesite):

http://127.0.0.1:8888/SSK%40npfV5XQijFkF6sXZvuO0o%7EkG4wEPAgM/homepage//

This kind of key complexity can thankfully be hidden behind a plain text, descriptive anchor, which greatly simplifies publishing Freenet keys. Just click—and wait a goodly while since you’ve just joined the network—and with luck, the browser will suddenly display Freenet-published content when a successful retrieve reaches your local node. Like all Web interfaces, this one has both advantages and disadvantages compared to an application GUI. On the one hand, Web browsing is a familiar metaphor to the user, but application interfaces can make dealing with keys more transparent so that the user is never explicitly confronted with the hash.
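
Because the gateway speaks plain HTTP, any client capable of a Web request can retrieve keyed content through it. A minimal sketch follows, reusing the example key and the default port from above; both depend on version and configuration.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

final class GatewayFetch {
    public static void main(String[] args) throws Exception {
        // Example freesite key from the text; the port is the v0.4 default.
        String key = "SSK%40npfV5XQijFkF6sXZvuO0o%7EkG4wEPAgM/homepage//";
        URL url = new URL("http://127.0.0.1:8888/" + key);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) System.out.println(line);
        }
    }
}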

Once you retrieve a document, it remains in your local cache (so revisits will load the page immediately) and on the stacks of the nodes that routed the result to you, until it expires. After that, you need to request a new copy, but the routing tables (retaining pointers longer than content) now easily direct your request to the source.

Readers familiar with Web browser caching, and how the browser detects and fetches updated server content on the normal Web, might wonder how it deals with Freenet caching. The short answer is that it doesn’t. The browser can compare only with the local, node-cached copy because that’s all it knows about.

The practical matter is that freesite updates currently “roll over” at midnight UTC (the common Internet time) and only then propagate, no matter when published. Before then, any updates remain invisible, even to the publisher. This has to do with the date-stamp SVK name-space indirection that was chosen to simplify keeping a freesite coherently available during selective updates. On the next midnight, therefore, your cached copy simply expires—in fact, all cached copies, everywhere, expire. A new read request by the browser after this will generate a new search for the content on the network. If the content publisher has updated during this time, you get back the new content; if not, you get another, “renewed” copy of the old, assuming the publisher (or publishing tool) is maintaining the site.

Bit 9.15 Freesite content updates at midnight UTC—and only then.

Coordinated Universal Time (UTC) is the international time standard previously known as GMT and is used as the Freenet clock, as it is for the Internet as a whole.
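
For the practically minded, the rollover moment is simply the next midnight UTC. A small sketch of computing it follows; the class name is, of course, illustrative.

import java.util.Calendar;
import java.util.TimeZone;

final class FreesiteRollover {
    /** The next midnight UTC after the given instant: when cached freesite copies expire. */
    static long nextMidnightUtc(long nowMillis) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(nowMillis);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        cal.add(Calendar.DAY_OF_MONTH, 1);
        return cal.getTimeInMillis();
    }
}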


However, don’t be discouraged if you don’t immediately find any of the example documents or freesites—remember, a new node has great difficulty finding anything until the routing tables evolve from their initial random values. This is why a broadcast request system like Gnutella performs better for transient nodes that have just connected, because search performance there depends only on the number of reachable nodes at any time.

Other Clients

A number of other clients that can communicate with a Freenet node are mentioned earlier in this chapter, Frost and FreeWeb to name two of the better known, not to mention the command-line clients bundled with the standard distribution.

The advantage of using some of the more developed clients comes from having a better interface (that is, a GUI), more features, and perhaps some optimization that the stock form requests can’t provide. For example, Figure 9.7 shows one such extension: Frost’s anonymous messaging system, the News Boards, which is a kind of newsgroup discussion securely passed between clients over Freenet.

Figure 9.7. Frost implements News Boards, which is an encrypted and anonymized form of newsgroup discussions internal to Freenet. The drop-down shown activated lists the current boards detected.


Frost’s basic functionality is discussed and shown in the earlier section on retrieving and publishing content on Freenet because it provides a convenient interface for download, upload, and messaging, plus an index-enhanced search. FreeWeb, also mentioned there as a popular publishing tool, is shown with its Site menu visible in Figure 9.8, just to give an indication of how one can easily manage freesite content using it.

Figure 9.8. FreeWeb client in Windows is for easy management of Freenet-published Web sites. The Site menu shows typical useful commands.


Ongoing Work

Interested readers can track the ongoing development at the Freenet Wiki (at www.freenetproject.org/wiki/index.php)—anyone wondering what Wiki is should read The Wiki Way: Collaboration on the Web (Addison-Wesley, April 2001).

As noted earlier, a number of important Freenet issues are still being addressed, or sometimes they’re still at the stage of being formulated. For example, there is a plan to implement a form of trusted nodes. Underlying mechanisms are partially in place, such as digital signatures, but it’s unclear as yet how exactly such a trust system will work. Eventually, node-to-node communication will be fully encrypted, but the current prototype network still uses open messages in the interests of debugging.

True anonymity (that is, strong protection) would currently require connecting to a Freenet node by way of an external anonymizer that has full control of routing and encryption. Gateway solutions to this kind of strong privacy protection are under consideration.

Another issue that might be resolved in future versions of Freenet is that the current network is not especially tolerant of transient nodes; functionality deteriorates if too many (usually dial-up) nodes join only for shorter periods of time. This degradation relates to how stacked paths and cached content get disrupted and to the resulting increase in query traffic. Nodes with permanent connections are preferred, cable or DSL being deemed adequate. As long as these nodes form the majority, a smaller group of transient nodes doesn’t impact performance too much.

Proposals for adding safe searching and indexing capabilities to Freenet are being discussed for the future—for instance, indexable hyperlinks, lists of keywords, or other readable metadata distributed through the network. As things stand, such extra data for searches must reside elsewhere and is vulnerable.

As mentioned earlier in the context of content expiration and removal, more flexible options for updating documents are being considered. In addition, the current expiry model, in which a small document can displace a large one without regard to the size difference, is under review.

Because of the anonymous nature of the Freenet system, it is impossible to tell exactly how many users are in a deployed network, or how well the insert and request mechanisms are working. However, anecdotal evidence from the prototype and growing Freenet is so far very positive.

Business Solutions

A Freenet-derivative technology that’s aimed at business users is KARMA (Key Accessed Redundant Memory Architecture), which is just another way of designating the Freenet storage and retrieval model.

KARMA technology is promoted by Uprizer (www.uprizer.com), a company cofounded by Rob Kramer and Freenet creator Ian Clarke in August 2000. They have developed a content distribution product line that addresses the problems of traditional client-server architectures by aggregating unused and wireless network resources into a single, large, intelligent computer operating system.

With the stated goal of building the next Internet, often now referred to as Internet 3.0, Uprizer describes itself as a peer-to-peer technology company designed to create a new category of distributed computing software for enterprises, content providers, service providers, mobile operators, and application developers.

How KARMA Works

The system maintains secure control over a small portion of unused disk space on each node in a network. Nodes can be any network-aware device: PCs, mobile phones, and PDAs. The storage and bandwidth pool acts like distributed RAM, and the company refers to this pool as a KARMA Drive.

The idea is that applications can address this virtual resource like any other local storage device. In effect then, KARMA puts a common driver software interface on top of a Freenet-style network architecture. Looking at the more detailed descriptions, most of the component parts of Freenet, described earlier, are easily recognized—specifically, adaptive replication and key encryption.

The KARMA client (actually the node application) is very small and uses Freenet’s heuristic routing method for requests, although here it’s called a Whispercast process. One difference is that only dormant network resources are used to retrieve and replicate information, so client performance suffers no degradation. What implications this solution has for routing is unclear.

Like Freenet, KARMA assumes that all hardware nodes on the network and their communication links are potentially insecure. Public key encryption and digital signatures prevent unwanted content, such as viruses and worms, from being arbitrarily inserted into the network, and also ensure that received content has not been modified, either in transit or in storage.

Data is stored in encrypted chunks and retrieved using a specially designed self-correcting UDP protocol to minimize latency and network traffic. This technique seems to be a refinement of the original Freenet protocol but not radically different. Uprizer points to Freenet as the practical proof-of-concept for KARMA, further confirming that the differences between the two are minimal.

Related Work

Espra (www.espra.net) deserves mention as a media file-sharing technology, like those discussed in Chapter 7, that uses Freenet as its network infrastructure. It has metadata and rating functionality, a system to reward content creators who publish to the system, and the inherent anonymity that Freenet provides. While the project is nowhere close to mature, the underlying Freenet indexing concepts are interesting.

The Eternity Service project is an alternative server concept with aggressive encrypted distribution, where the goal is that published data is never lost. The defining paper is found at www.cl.cam.ac.uk/~rja14/eternity/eternity.html.
