Chapter 16. Parasitic Computing, or How Pennies Add Up

Where the old truth that having an army of minions is better than doing the job yourself is once again confirmed

I hope you’ve enjoyed the ride so far. I’ve discussed a number of fancy problems that affect the security and privacy of information from its input at the keyboard to its ultimate destination hundreds or thousands of miles away. But it is too early for either of us to throw a party; something is missing from the picture—something far bigger than what we have discussed so far. The dark matter.

The problem with our story so far is simple: communications do not occur in a void. Although the process of exchanging data is usually limited to two systems and a dozen or so intermediate ones, the grand context of all events simply cannot be ignored; the properties of the surrounding environment can shape the reality of a chitchat between endpoints in profound ways. We cannot ignore the relevance of systems that are not directly involved in communications or the importance of all the tiny, seemingly isolated bits of individually trivial events that data meets along its path. It can be fatal to focus only on what appears relevant to a specific application or a particular case, as I hope this book has shown you thus far.

Rather than fall into this shortsighted trap, I’ve chosen to embrace the grand scheme of things in all its glory. Thus, the fourth and last part of this book focuses exclusively on the security of networking as a whole and discusses the Internet as an ecosystem, instead of a collection of systems accomplishing specific tasks. We pay tribute to the seemingly inert matter that binds the world together.

This part of the book begins with an analysis of a concept that appears to be the most appropriate way to make the transition. For many computer geeks, this concept, called parasitic computing, has revolutionized the way we think of the Internet.

Nibbling at the CPU

A humble research paper published in letters to Nature by Albert-Laszlo Barabasz, Vincent W. Freeh, Hawoong Jeong, and Jay B. Brochman in 2001[108] could easily have gone unnoticed. At first glance, this letter did not seem worthy of much attention; in fact, it posed a seemingly laughable proposition. The authors suggest that traffic could be created within well-established network protocols such as TCP/IP that would pose (as a message) a trivial arithmetic challenge—a problem to be solved—to a remote computer; the remote system would unwittingly solve the problem while parsing the message and preparing a response. But why would anyone waste time casting riddles at emotionless machines? What could one gain from this? Wouldn’t it be as much fun to solve them yourself? Of course, the answer is quite interesting.

First, there is a business to solving puzzles with a computer: much of today’s cryptography is based on the relative difficulty of solving a set of so-called non-polynomial[33] (NP) problems. NP-complete problems seem to take pleasure in crashing every codebreaker’s party at the least opportune times. The ability to solve them efficiently—whether with enormous computing power, clever algorithms, or both—would likely take a lucky inventor one step closer to world domination. There’s the incentive, then, but how would one do it?

The method proposed in the research is quite novel. The paper first states that many NP problems in mathematics can be easily expressed in terms of Boolean satisfiability (SAT) equations. SAT equations represent these problems as Boolean logic operations, effectively constructing a sequence of parameters and variables (a Boolean formula). A classic example of an SAT formula might be

P = (x1 XOR x2) AND (~x2 AND x3)

Here, P is the formula (problem) itself, and x1to x3 are binary inputs, or parameters.

Although there are 23 possible combinations of values for x1, x2, and x3, only one of them makes P true: x1 = 1, x2 = 0, x3 = 1. Hence, we say that only this triplet is a solution to P. Finding solutions to SAT problems boils down to determining a set of values for all variables in the equation, for which the whole formula that incorporates those variables has a logic value of truth. Although trivial SAT problems like the one shown earlier are easy to solve, even without invoking any solving mechanism other than trial and error, more complex multivariable cases are indeed NP complete, and, consequently, other NP problems can be reduced to SAT problems in polynomial (meaning sane) time.

And here lies the problem. We can formulate a hard NP problem in terms of SAT, but this does not buy us much. As of this writing, when it comes to a non-trivial equation, even the best SAT-solving algorithms known aren’t much more effective than a brute-force search whereby all possibilities are tried, and the value of the formula is evaluated for each possibility. This means that if we have a SAT problem and enough computing power to even consider approaching it, attempting a solution using brute force is not such an insane approach, and we would not get much further by with a more sophisticated one. Anyway, there’s not much to lose by trying.

And here’s the revelation that binds SAT problems and TCP/IP networking. The basic observation made by the researchers is fairly obvious (or should be, if you subscribe to Nature): the checksumming algorithm of TCP (or IP), as discussed in Chapter 9, although in principle designed for a wholly different purpose than solving equations, is nothing more than a set of Boolean operations subsequently performed on bits of the input message. After all, at the low level, the algorithm boils down to pure Boolean logic carried out on words of the transmitted packet. They conclude that, by providing specific contents of the packet (“input”), the remote system can thus be forced to carry out a set of arithmetic operations and then evaluate its correctness—its agreement with the checksum declared in the TCP or IP header.

Although the operation performed by the remote system during the checksumming process is in every single iteration exactly the same, it has a functionality sufficient to serve as a universal logic gate, a mechanism we remember from Chapter 2. By interleaving the actual tested input with carefully chosen “control” words that invert or otherwise alter the partial checksum computed thus far, it is possible to carry out any Boolean operation.

This, in turn, means that SAT logic can be easily re-created using a specific sequence control and “input” bits in a packet once the data is exposed to a checksumming algorithm; equation variables (chosen this or the other way) are interleaved with fixed words that are used to transmogrify the current checksum value so that the outcome of the next operation mimics a specific Boolean operator. The final result—the value to which a packet sums—denotes the final outcome: the logic value of a formula to be evaluated.

Thus, the satisfiability test is quite accidentally carried out by the remote recipient when, upon arrival, it attempts to validate the checksum. If the checksum comes out as 1 (or as some other value that in our SAT computation system corresponds to an SAT statement evaluating true), it passes the satisfiability test for the variable values chosen for this particular packet (and the traffic is passed to higher layers and acted upon). If the checksum fails, the formula has not been satisfied, and the packet is dropped silently. In other words, if our input bits denoted a specific hypothesis, the recipient had either verified it or proved it wrong, taking different actions depending on the outcome.

Further, a party wanting to solve an SAT problem quickly can prepare a set of all possible combinations of variable values (inputs) for a given formula, interleave it with information that causes the inputs to combine with others in the most desirable way, stuff this information into TCP packets, and send them out (nearly in parallel) to a large number of hosts around the globe. The checksum for a packet would be set manually to a value we know the “hypothesis” would produce if proven true, instead of actually calculating it. Only hosts that receive packets with variable values for which the formula evaluates to the desired value would respond to the traffic; other systems would simply disregard such traffic as corrupted due to the checksum mismatch. The sender can thus determine the correct solution without performing massive computations and can simply look up the set of values used in packets sent to those hosts that replied to a request.

The research goes further and reports on a successful attempt to solve an NP problem using real-world hosts across the globe, thus providing not only theoretical background, but also actual confirmation of the approach.

The impact of this technique is quite subtle, but also important: it proves that it is possible to effectively “outsource” computations to unaware and unwilling remote parties on the network, including sets of operations needed to solve real-world computing problems, without actually attacking these systems, taking them over, installing malicious software, or otherwise interfering with legitimate tasks. One person can thus, effectively, divide a specific computational task among a large number of systems. In the process, they can consume only a tiny and negligible fraction of a system’s computing power that could nevertheless add up to the equivalent of a decent supercomputer, when millions of systems work on a problem together.

World domination at hand? Not so fast.



[33] In complexity theory, polynomial problems can be solved by a Turing machine in time that is polynomially proportional to input length (number or size of variables for which the answer must be found). This means that the time needed to solve a polynomial problem corresponds directly to the input length raised to a constant exponent, which can be zero (causing the time not to depend on input length at all, as with testing for parity). Non-polynomial (NP) problems have no known solutions of this nature and may require dramatically more time to solve as the input length increases, exhibiting, for example, exponential dependency. A subset of NP problems, known as NP complete, are proven to have no polynomial time solutions. NP problems are generally regarded as “hard” for nontrivial inputs, whereas P problems are less expensive to solve.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.141.6