Chapter 2. Encryption

2.1. Pure White

In the early years of the 21st century, Pinnacle Paint was purchased by the MegaGoth marketing corporation in a desperate attempt to squeeze the last bit of synergy from the world. The executives of MegaGoth, who were frantic with the need to buy something they didn't already own so they could justify their existence, found themselves arguing that the small, privately owned paint company fit nicely into their marketing strategy for dominating the entertainment world.

Although some might argue that people choose colors with their eyes, the executives quickly began operating under the assumption that people purchased paint that would identify them with something. People wanted to be part of a larger movement. They weren't choosing a color for a room, they were buying into a lifestyle—how dare they choose any lifestyle without licensing one from a conglomerate? The executives didn't believe this, but they were embarrassed to discover that their two previous acquisition targets were already owned by MegaGoth. Luckily, their boss didn't know this either when he gave the green light to those projects. Only the quick thinking of a paralegal saved them from the disaster of buying something they already owned and paying all of that tax.

One of the first plans for MegaGoth/Pinnacle Paints was to take the standard white paint and rebottle it in new and different product lines to target different demographic groups. Here are some of MegaGoth's plans:

  • Moron and Moosehead's Creative Juice: What would the two lovable animated characters paint if they were forced to expand their creativity in art class? Moron might choose a white cow giving milk in the Arctic for his subject. Moosehead would probably try to paint a little lost snowflake in a cloud buffeted by the wind and unable to find its way to its final destination: Earth.
  • Empathic White: White is every color. The crew of “Star Trek: They Keep Breeding More Generations” will welcome Bob, the “empath,” to the crew next season. His job is to let other people project their feelings onto him. Empathic White will serve the same function for the homeowner as the mixing base for many colors. Are you blue? Bob the Empath could accept that feeling and validate it. Do you want your living room to be blue? That calls for Empathic White. Are you green with jealousy? Empathic White at your service.
  • Fright White: MegaGoth took three British subjects and let them watch two blood-draining horror movies from the upcoming MegaGoth season. At the end, they copied the color of the subjects' skin and produced the purest white known to the world.
  • Snow White: A cross-licensing product with the MegaGoth/Disney division ensures that kids in their nursery won't feel alone for a minute. Those white walls will be just another way to experience the magic of a movie produced long ago when Disney was a distinct corporation.
  • White Dwarf White: The crew of “Star Trek” discovers a White Dwarf star and spends an entire episode orbiting it. But surprise! The show isn't about White Dwarf stars qua White Dwarfs, it's really using their super-strong gravitational fields as a metaphor for human attraction. Now, everyone can wrap themselves in the same metaphor by painting their walls with White Dwarf White.

2.2. Encryption and White Noise

Hiding information is a tricky business. Although the rest of this book will revolve around camouflaging information by actually making the bits look like something else, it is a good idea to begin by examining basic encryption.

Standard encryption functions like AES or RSA hide data by making it incomprehensible. They take information and convert it into total randomness or white noise. This effect might not be a good way to divert attention from a file, but it is still an important tool. Many of the algorithms and approaches described later in the book perform best when they have a perfectly random source of data.

Encrypting a file before applying any of the other approaches is a good beginning, but it doesn't complete the picture. Sometimes too much randomness can stick out like a sore thumb. Chapter 17 describes several algorithms that can flag images with hidden information by relying on statistical tests that measure, often indirectly, the amount of randomness in the noise. A file that seems too random stands out because the noise generated by many digital cameras isn't as random as it might seem.

The trick is to use some extra processing to add a bit of statistical color to the data before it is introduced. Chapters 6 and 7 describe some solutions. Others involve mixing in the hidden message in a way that doesn't distort the statistical profile of the data.

The world of cryptography began attempting to produce perfect white noise during World War II. This is because Claude E. Shannon, a mathematician then working for Bell Labs, developed the foundations of information theory, which offered an ideal framework for actually measuring information.

Most people who use computers have a rough idea about just how much information there is in a particular file. A word processing document, for instance, has some overhead and about one byte for each character – a simple equation that doesn't seem to capture the essence of the problem. If the number of bytes in a computer file were an accurate measurement of the information in it, then there would be no way that a compression program could squeeze files to a fraction of the original size. Real estate can't be squeezed and diamonds can't be smooshed, but potato chips always seem to come in a bag filled with air. That's why they're sold by weight, not volume. The success of compression programs like PKZIP or Stuffit means that measuring a file by the number of bytes is like selling potato chips by volume.

Compression is discussed in Chapter 5.

Shannon's method of measuring information “by weight” rests on probability. He felt a message had plenty of information if you couldn't anticipate the contents, but it had little information if the contents were easy to predict. A weather forecast in Los Angeles doesn't contain much information because it is often sunny and 72 degrees Fahrenheit. A weather forecast in the Caribbean during hurricane season, though, has plenty of potential information about coming storms that might be steaming in.

Shannon measured information by totaling up the probabilities. A byte has 8 bits and 256 different possible values between 00000000 and 11111111 in base 2. If all of these possible values occur with the same probability, then there are said to be 8 bits of information in this byte. On the other hand, if only two values like 00101110 and 10010111 happen to appear in a message, then there is only one bit of information in each byte. The two values could be replaced with just a 0 and a 1 and the entire file would be reduced to one-eighth the size. The number of bits of information in a file is called, in this context, its entropy.

Shannon also provided a precise formula for measuring the size of information, a topic found later in Section 2.3. This measurement of information offered some important insights to cryptographers. Mathematicians who break codes rely on deep statistical analysis to ferret out patterns in files. In English, the letter “q” is often followed by the letter “u” and this pattern is a weak point that might be exploited by attackers trying to get at the underlying message. A good encryption program would leave no such patterns in the final file. Every one of the 256 possible values of a byte would occur with equal probability. It would seem to be filled chock-full of information.

One-time pads are an encryption system that is a good example of the basic structure behind information theory. The one-time pad received its name because spies often carried pads of random numbers that served as the encryption key. They would use each sheet once and then dispose of it.

A secret can be split into parts using an extension of one-time pads described on page 58.

A one-time pad can be built by using a standard method of encryption. Assume for the moment that a key is just a number like 5 and a message consists of all uppercase letters. To encrypt a letter like “C” with a key number like 5, count over five letters to get “H”. If the counting goes past “Z” at the end of the alphabet, simply go back to “A” and keep going. The letter “Y” encrypted with the key number 6 would produce “E”. To decrypt, work backward.
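This shift scheme can be sketched in a few lines of Python (the language and function names are our choice; the book itself gives no code). It reproduces the examples in the text, including the key 9, 0, 2, 1, 0 and the ciphertext “QENMO” discussed below:

```python
def encrypt(message, pad):
    # Shift each letter by its key number, wrapping from "Z" back to "A".
    return "".join(
        chr((ord(c) - ord("A") + k) % 26 + ord("A"))
        for c, k in zip(message, pad)
    )

def decrypt(ciphertext, pad):
    # Decrypting just counts backward by the same key numbers.
    return encrypt(ciphertext, [-k for k in pad])

print(encrypt("C", [5]))                   # H
print(encrypt("Y", [6]))                   # E
print(encrypt("HELLO", [9, 0, 2, 1, 0]))   # QENMO
```

Note that Python's `%` operator keeps the result in range even for the negative key numbers used when decrypting.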

Here is a sample encryption:

    Message:  H  E  L  L  O
    Key:      9  0  2  1  0
    Result:   Q  E  N  M  O

In this case, the key is the five numbers 9, 0, 2, 1, and 0. They would constitute the one-time pad that encrypted this message. In practice, the values should be as random as possible. A human choosing them might reveal some hidden short circuits in the brain.[1]

1 Or the limitations of creativity brought on by too much television.

Shannon proved that a one-time pad is an unbreakable cipher because the information in the final file is equal to the information in the key. An easy way to see why this is true is to try to break the message, “QENMO” from above. Any five-letter word could be the underlying message because any key is possible. The name, “BRUNO”, for instance, would have generated “QENMO” if the key numbers were 15, 13, 19, 25, and 0. If all possibilities are available, then the attacker can't use any of the information about English or the message itself to rule out solutions. For this to hold, the entropy of the key must be greater than or equal to the entropy of the message itself. This is certainly the case here because each byte of the message could be any value between 0 and 255 and so could the key. In practice, the entropy of the key would be even greater because the distribution of the values in the message would depend on the vagaries of language while the key can be chosen at random.

A real one-time pad would not be restricted to uppercase characters. You could use a slightly different encryption process that employed all 256 possible values of a byte. One popular method is to use the operation known as exclusive-or (XOR), which is just addition in the world of bits. (0 + 0 = 0, 0 + 1 = 1, and 1 + 1 = 0 because it wraps around.) If the one-time pad consists of bytes with values between 0 and 255 and these values are evenly distributed in all possible ways, then the result will be secure. It is important that the pad is not used again because statistical analysis of the underlying message can reveal the key. The United States was able to read some crucial correspondence between Russia and its spies in the United States during the early Cold War because the same one-time pad was reused. [Age95] The number of bits in the key was now less than the number of bits of information in the message, and Shannon's proof that the one-time pad is a perfect encryption no longer holds.
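A byte-level XOR pad takes only a few lines of Python. This is a minimal sketch; `os.urandom` stands in here for a source of truly random pad bytes, which a real one-time pad would require:

```python
import os

def xor_pad(data, pad):
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"ATTACK AT DAWN"
pad = os.urandom(len(message))   # one pad byte per message byte, used once

ciphertext = xor_pad(message, pad)
assert xor_pad(ciphertext, pad) == message   # the same pad recovers the message
```

Because each pad byte is used exactly once and every value 0 to 255 is equally likely, the ciphertext carries no exploitable statistics.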

The one-time pad is an excellent encryption system, but it's also very impractical. Two people who want to communicate in secret must arrange to securely exchange one-time pads long before they need to start sending messages. It would not be possible, for instance, for someone to use their WWW browser to encrypt the credit card numbers being sent to a merchant without exchanging a one-time pad in person. Often, the sheer bulk of the pad makes it too large to be practical.

Many people have tried to make this process more efficient by using the same part of the pad over and over again. If they were encrypting a long message, they might use the key 90210 over and over again. This makes the key small enough to be easily remembered, but it introduces dangerous repetition. If the attackers are able to guess the length of the key, they can exploit this pattern. They would know in this case that every fifth letter would be shifted by the same amount. Finding the right amount is often trivial and it can be as easy as solving a crossword puzzle or playing Hangman.
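The repetition weakness is easy to demonstrate. In this sketch (our own illustration, not from the book), a message is encrypted with the key 90210 reused over and over; the per-position shifts then simply repeat the key, which is the pattern an attacker exploits after guessing the key length:

```python
def shift_encrypt(text, key):
    # Shift each letter by the key digit for its position;
    # the key repeats every len(key) characters.
    return "".join(
        chr((ord(c) - ord("A") + key[i % len(key)]) % 26 + ord("A"))
        for i, c in enumerate(text)
    )

plain = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG"
key = [9, 0, 2, 1, 0]   # "90210", reused again and again
cipher = shift_encrypt(plain, key)

# Every fifth letter is shifted by the same amount.
shifts = [(ord(c) - ord(p)) % 26 for c, p in zip(cipher, plain)]
print(shifts[:10])   # [9, 0, 2, 1, 0, 9, 0, 2, 1, 0]
```

Once the attacker knows that positions 0, 5, 10, … share one shift, each position class reduces to a simple Caesar cipher that frequency analysis cracks quickly.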

2.2.1. DES and Modern Ciphers

There are many different encryption functions that do a good job of scrambling information into white noise. One algorithm that was once considered both practical and secure, and is still in use today, is the Data Encryption Standard (DES) developed by IBM in the 1970s. The system uses only 56 bits of key information to encrypt 64-bit blocks of data. Today, the number of bits in the key is considered too small because some computer scientists have assembled computers that can try all 2^56 possible keys in about 48 hours. [Fou98] Newer machines can search all of the keys even faster.

One of the newest and most efficient replacements for DES is the Advanced Encryption Standard, an algorithm chosen by the U.S. government after a long, open contest. The algorithm, Rijndael, came from Joan Daemen and Vincent Rijmen, and narrowly defeated four other highly qualified finalists.[2] [DR00, DR01]

2 Daemen and Rijmen suggest pronouncing the name: “Reign Dahl”, “Rain Doll”, or “Rhine Dahl”.

The basic design of most modern ciphers like DES and Rijndael was inspired, in part, by some other work of Claude Shannon in which he proposed that encryption consists of two different and complementary actions: confusion and diffusion. Confusion consists of scrambling up a message or modifying it in some non-linear way. The one-time pad system above confuses each letter. Diffusion involves taking one part of the message and modifying another part so that each part of the final message depends on many other parts of the message. There is no diffusion in the one-time pad example because the total randomness of the key made it unnecessary.

DES consists of sixteen alternating rounds of confusion and diffusion. There are 64 bits that are encrypted in each block of data. These are split into two 32-bit halves. First, one half is confused by passing it through what is called an “S-box.” This is really just a random function that is preset to scramble the data in an optimal way. Then these results are combined with the key bits and used to scramble the other half. This is the diffusion because one half of the data is affecting the other half. This pattern of alternating rounds is often called a Feistel network.

The alternating rounds would not be necessary if a different S-box were used for each 64-bit block of the message. Then the cipher would be the equivalent of a one-time pad. But that would be inefficient because a large file would need a correspondingly large set of S-boxes. The alternating rounds are a compromise designed to securely scramble the message with only 64 bits.
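The alternating pattern described above can be sketched as a toy Feistel network. This is a sketch only: the round function and subkeys below are invented for illustration and provide no real security, but the structure — and the fact that running the rounds backward decrypts exactly — mirrors DES:

```python
def round_fn(half, subkey):
    # Stand-in for the S-box step: a small, keyed, nonlinear scramble.
    return ((half * 31 + subkey) % 2**32) ^ (half >> 3)

def feistel(left, right, subkeys):
    # Each round: the old right half becomes the new left half, and the
    # scrambled right half is mixed into the old left half (diffusion).
    for k in subkeys:
        left, right = right, left ^ round_fn(right, k)
    return left, right

def unfeistel(left, right, subkeys):
    # Running the rounds in reverse order undoes the cipher exactly.
    for k in reversed(subkeys):
        left, right = right ^ round_fn(left, k), left
    return left, right

keys = [3, 141, 59, 26, 535, 897, 932, 384]   # arbitrary toy subkeys
l, r = feistel(0xDEADBEEF, 0x12345678, keys)
assert unfeistel(l, r, keys) == (0xDEADBEEF, 0x12345678)
```

Notice that the round function itself is never inverted; the Feistel structure guarantees decryption works even if `round_fn` is not reversible.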

The confusion and diffusion functions were designed differently. Confusion was deliberately constructed to be as nonlinear as possible. Linear functions, straight lines, are notoriously easy to predict; a nonlinear function's results don't even come close to falling along a line.

Creating a nonlinear S-box is not an easy process. The original technique was classified, leading many to suspect that the U.S. government had installed a trap door or secret weakness in the design. The recent work of two Israeli cryptographers, Eli Biham and Adi Shamir, however, showed how almost linear tendencies in S-boxes could be exploited to break a cipher like DES. Although the technique was very powerful and successful against DES-like systems, Biham and Shamir discovered that DES itself was optimally designed to resist this attack.

The diffusion function, on the other hand, was limited by technology. Ideally, every bit of the 64-bit block will affect the encryption of any other bit. If one bit at the beginning of the block is changed, then every other bit in the block may turn out differently. This instability ensures that those attacking the cipher won't be able to localize their effort. Each bit affects the others.

Figure 2.1 shows how one half of the data encrypts the other half. Alternating which half scrambles the other is a good way to ensure that the contents of one half affect the other. The diffusion in DES is even more subtle. Although the information in one half would affect the other after only one round, the bits inside the halves wouldn't affect each other quite as quickly. This part of the book does not go into the design of the S-boxes in detail, but the amount of scrambling was limited by the technology available in the mid-1970s when the cipher was designed. It takes several rounds of this process to diffuse the information thoroughly.

Figure 2.1. A schematic view of one round of DES. 64 bits enter and are split into two 32-bit halves. The left half is scrambled up with the key using the S-boxes. This result is then mixed in with the right half and the result of adding these two together becomes the new left half. The new right half is just a copy of the old left half.

Figure 2.2 shows one of the eight S-boxes from DES. It is simply a table. If the input to the S-box is 000000 then the output is 1110. This is the most basic form of scrambling and it is fairly easy to reverse. The S-box takes 6 bits as input to implement diffusion. The 32 bits of one half are split into eight 4-bit blocks. Each of the 4-bit blocks then grabs one bit from the block to the left and one bit from the block to the right. That means that each 4-bit block influences the processing of the adjacent 4-bit block. This is how the bits inside each of the halves affect each other.

Figure 2.2. This table shows how the first DES S-box converts 6-bit values into 4-bit ones. Note that a change in one input bit will generally change two output bits. The function is also nonlinear and difficult to approximate with linear functions.
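The lookup can be sketched in a few lines. The table values below are the first S-box from the published DES standard (reproduced here from the standard, not from Figure 2.2); the outer two bits of the 6-bit input select the row and the middle four bits select the column:

```python
# DES's first S-box as a 4 x 16 table, from the published standard.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [ 0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [ 4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox1(six_bits):
    row = ((six_bits >> 4) & 2) | (six_bits & 1)   # outer two bits
    col = (six_bits >> 1) & 0xF                    # middle four bits
    return S1[row][col]

print(bin(sbox1(0b000000)))   # 0b1110, matching the text
```

The outer bits come from the neighboring 4-bit blocks, which is how each block influences the processing of its neighbors.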

This is already too much detail for this part of the book. The rest of DES is really of more interest to programmers who actually need to implement the cipher. The important lesson is how the designers of DES chose to interleave some confusion functions with some diffusion functions to produce incomprehensible results.

The best way to judge the strength of an encryption system like DES is to try to break it. Talking about highly technical things like code breaking at a high level can be futile because the important details can often be so subtle that the hand-waving metaphors end up flying right over the salient fact. Still, a quick sketch of an attack on the alternating layers of confusion and diffusion in DES can give at least an intuitive feel for why the system is effective.

Imagine that you're going to break one round of DES. You have the 64 bits produced by one step of confusion and one step of diffusion. You want to reconstruct the 64 bits from the beginning and determine the 56 key bits that were entered. Since only one round has finished, you can immediately discover one half of the bits. The main advantage that you have is that not much diffusion has taken place. Thirty-two bits are always unchanged by each round. This makes it easier to determine if the other half could come from the same file. Plus, these 32 bits were also the ones that fed into the confusion function. If the confusion process is not too complicated, then it may be possible to run it in reverse. The DES confusion process is pretty basic, and it is fairly straightforward to go backward. It's just a table lookup. If you can guess the key or the structure of the input, then it is simple.

Now imagine doing the same thing after 16 rounds of confusion and diffusion. Although you can work backward, you'll quickly discover that the confusion is harder to run in reverse. After only one round, you could recover the 32 bits of the left half that entered the function. But you can't get 32 bits of the original message after 16 rounds. If you try to work backward, you'll quickly discover that everything is dependent on everything else. The diffusion has forced everything to affect everything else. You can't localize your search to one 4-bit block or another because all of the input bits have affected all of the other bits in the process of the 16 rounds. The changes have percolated throughout the process.

Rijndael is similar in theme to DES, but much more efficient for modern CPUs. The S-boxes from DES are relatively simple to implement on custom chips, but they are still complicated to simulate with the general purpose CPUs used in most computers. The confusion in AES is accomplished by multiplying by a polynomial and the diffusion occurs when the subblocks of the message block are scrambled. This math is much more basic than the complex S-boxes because the general-purpose CPUs are designed to handle basic arithmetic.

The other four AES finalists can also be shoehorned into this model of alternating rounds of confusion and diffusion. All of them are considered to be quite secure, which means they all do a thorough job of randomizing the data.

2.2.2. Public-Key Encryption

Public-key encryption systems are quite different from the popular private-key encryption systems like DES. They rely on a substantially different branch of mathematics that still generates nice, random white noise. Even though these foundations are different, the results are still the same.

The most popular public-key encryption system is the RSA algorithm, developed by Ron Rivest, Adi Shamir, and Len Adleman when they were at MIT during the late 1970s. The system uses two keys. If one key encrypts the data, then only the other key can decrypt it. After the encryption, the first key becomes worthless: it can't decrypt the data. This is not a bug, but a feature. Each person can create a pair of keys and publicize one of the pair, perhaps by listing it in some electronic phone book. The other key is kept secret. If someone wants to send a message to you, they look up your public key and use it to encrypt the message. Only the other key can decrypt this message now, and only you have a copy of it.

In a very abstract sense, the RSA algorithm works by arranging the set of all possible messages in a long, long loop in an abstract mathematical space. The circumference of this loop, call it n, is kept a secret. You might think of this as a long necklace of pearls or beads. Each bead represents a possible message. There are billions of billions of billions of them in the loop. You send a message by giving someone a pointer to a bead.

The public key is just a relatively large number, call it k. A message is encrypted by finding its position in the loop and stepping around the loop k steps. The encrypted message is the number at this position. The secret key is the circumference of the loop minus k. A message is decrypted by starting at the number marking the encrypted message and marching along n − k steps. Because the numbers are arranged in a loop, this will bring you back to where everything began – the original message.

Two properties about this string of pearls or beads make it possible to use it for encryption. The first is that given a bead, it is hard to know its exact position on the string. If there is some special first bead that serves as the reference location like on a rosary, then you would need to count through all of the beads to determine the exact location of one of the beads. This same effect happens in the mathematics. You would need to multiply numbers again and again to determine if a particular number is the one you want.

The second property of the string of beads in this metaphor does not make as much sense, but it can still be easily explained. If you want to move along the string k beads, then you can jump there almost instantaneously. You don't need to count each of the k beads along the way. This allows you to encrypt and decrypt messages using the public-key system.

The two special features may seem to contradict each other, but they do not. The second says that it is easy to jump an arbitrary number of beads. The first says it's hard to count the number of beads between the first bead and any particular bead. If you knew the count, then you could use the second feature. But you don't, so you have to count by hand.

The combination of these two features makes it possible to encrypt and decrypt messages by jumping over large numbers of beads. But it also makes it impossible for someone to break the system because they can't determine the number of steps in the jump without counting.
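In the arithmetic behind the metaphor, “jumping” is modular exponentiation, which the square-and-multiply method performs in a number of multiplications proportional to the number of bits in k, while “counting” takes one multiplication per bead. The sketch below (our illustration, with a toy modulus) contrasts the two:

```python
def jump(x, k, n):
    # Square-and-multiply: the fast jump. Equivalent to Python's pow(x, k, n).
    result = 1
    base = x % n
    while k:
        if k & 1:
            result = result * base % n
        base = base * base % n
        k >>= 1
    return result

def count(x, k, n):
    # The slow way: step one bead at a time, k multiplications in all.
    result = 1
    for _ in range(k):
        result = result * x % n
    return result

n = 3233   # 61 * 53, a toy modulus far too small for real use
assert jump(7, 1000, n) == count(7, 1000, n) == pow(7, 1000, n)
```

For a real 2048-bit exponent, `jump` needs a few thousand multiplications while `count` would need more multiplications than there are atoms in the universe, which is exactly the asymmetry RSA relies on.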

This metaphor is not exactly correct, but it captures the spirit of the system. Figure 2.3 illustrates it. Mathematically, the loop is constructed by computing the powers of a number modulo some other number. That is, the first element in the loop is the number. The second is the square of the number, the third is the cube of the number, and so on. In reality, the loop is more than one-dimensional, but the theme is consistent.

Figure 2.3. RSA encryption works by arranging the possible messages in a loop with a secret circumference. Encryption is accomplished by moving a random amount, k, down the loop. Only the owners know the circumference, n, so they can move n − k steps down the loop and recover the original message.

2.2.3. How Random Is the Noise?

How random is the output of an encryption function like DES or RSA? Unfortunately, the best answer to that question is the philosophical response, “What do you mean by random?” Mathematics is very good at producing consistent results from well-defined questions, but it has trouble accommodating capricious behavior.

At the highest level, the best approach is indirect. If there was a black box that could look at the first n bits of a file and predict the next set of bits with any reliability, then it is clear that the file is not completely random. Is there such a black box that can attack a file encrypted with DES or AES? The best answer is that no one knows of any black box that will do the job in any reasonable amount of time. A brute-force attack is possible, but this requires a large machine and some insight into the structure of the encrypted file. So we could argue that the results of DES or AES should appear random because we can't predict them successfully.[Way92, Fou98]

The same arguments also hold for RSA. If there was some black box that could take a number and tell you where it stood in the loop, then you would be able to break RSA. If the input doesn't fall in a pattern, then the output should be very random. If there was some way of predicting it, then that could be used to break RSA. Of course, the bits coming out of a stream of RSA-encrypted values are not perfectly random, at least at the level of bits. The values in the output are all computed modulo n so they are all less than n. Since n is not a power of 2, some bits are a little less likely.

Even if the values can't be predicted, they still might not be as random looking as we might want. For instance, an encryption routine might produce a result that is uncrackable but filled with only two numbers like 7 and 11. The pattern might be incomprehensible and unpredictable, but you still wouldn't want to use the source as the random number generator for your digital craps game. One immediate clue is that if the 7 and the 11 occur with equal probability, then the entropy of such a file is clearly 1 bit per number.

It is easy to construct a high-level argument that this problem will not occur with DES. All possible output values should be produced with equal probability. Why? Because DES can be decoded successfully. 64 bits go into DES and 64 bits go out. Each possible output can have only one matching input and vice versa. Therefore each possible output can be produced.

The same argument also holds for RSA. The loop contains a number for each of all possible messages and these numbers are distributed around the loop in a way that we can't invert. Therefore, each output value has practically the same probability of emerging from the function.

Although these two arguments don't prove that the output from an encryption function is random, they do suggest that DES and RSA will pass any test that you can throw at them. If a test is good enough to detect a pattern, then it would be a good lever for breaking the code. In practice, the simple tests support these results. The output of DES is quite random.[3] Many tests show that it is a good way to “whiten” a random number source to make it more intractable. For instance, some people experiment with using a random physical process like counting cosmic rays to create random numbers. However, there might be a pattern caused by the physics of the detector. A good way to remove this possibility is to use DES to encrypt the random data and produce the whitest noise possible.

3 The level of randomness depends on the input file if there is no key feedback mechanism being used. In some versions of DES, the results of one block are XORed with the inputs for the next block so that there will be diffusion across the blocks. If this is not used, someone could input a file with a pattern and get out a file with a pattern as long as the pattern repeats in an even multiple of 8 bytes.

2.3. Measuring and Encrypting Information

Information is a slippery notion. Just how big is a fact? How much data must be accumulated before you have a full-fledged concept? None of these questions are easy to answer, but there are approximations that help with digital data. Shannon's measure of information is closely tied to probability and randomness. In a sense, information is defined by how much randomness it can remove. Our goal is to harness randomness and replace it with a hidden message. Knowing the size, length, depth or breadth of our target is a good beginning.

Let an information stream be composed of n characters x0 through xn−1, where character xi occurs in the stream with probability ρ(xi). Shannon's measure of the entropy in the information stream, that is, the number of bits per character, can be written:

    H = −Σi ρ(xi) log ρ(xi)

The log is taken base two.

If a stream is made up of bytes with values between 0 and 255 and every byte value occurs with equal probability of 1/256, then the entropy of the stream is 8 bits per byte. If only two bytes, say 43 and 95, each occur half of the time and the other 254 bytes don't occur at all, the entropy of this stream is only 1 bit per byte. In this basic example, it should be obvious how the bit stream can be compressed by a factor of 8 to 1 bit per character. In more complex examples, the entropy is still a good rough measure of how well a basic compression algorithm will do.
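Shannon's formula takes only a few lines of Python, and the function below (a minimal sketch, names ours) reproduces both of these examples:

```python
from collections import Counter
from math import log2

def entropy(stream):
    # Shannon entropy in bits per symbol: -sum of p * log2(p).
    counts = Counter(stream)
    total = len(stream)
    return -sum(c / total * log2(c / total) for c in counts.values())

# All 256 byte values, equally likely: 8 bits per byte.
print(entropy(bytes(range(256))))       # 8.0

# Only two values, each half the time: 1 bit per byte.
print(entropy(bytes([43, 95] * 100)))   # 1.0
```

Running the same function over a file before and after compression is a quick way to see the "air" being squeezed out of the bag.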

The limitations of Shannon's measure of information are pretty obvious. An information stream that repeats the bytes 0, 1, 2,…, 254, 255, 0, 1… ad infinitum would appear to contain 8 bits of information per byte. But, there really isn't that much information being conveyed. You could write a short two-line program in most computer languages that would duplicate the result. This computer program could stand in for this stream of information and it would be substantially cheaper to ship this program across the network than it would be to pay for the cost of sending an endless repeat stream of bytes.

In a sense, this repeating record computer program is a good compressed form of the information. If the data was potato chips, you would hope that it was measured by the number of lines in a computer program that could generate it, not the Shannon entropy. There is another measure of information known as the Kolmogorov complexity that attempts to measure the information by determining the size of the smallest program that could generate the data. This is a great theoretical tool for analyzing algorithms, but it is entirely impractical. Finding the smallest program is both theoretically and practically impossible because no one can test all possible programs. It might be a short program in C, but how do we know the length in Pascal, Smalltalk, or a language that no one has written yet?

The Shannon measure of information can be made more complicated by including the relationship between adjacent characters:

    H = −Σi,j ρ(xj) ρ(xi|xj) log ρ(xi|xj)

ρ(xi|xj) means the probability that xi follows xj in the information stream. The sum is computed over all possible combinations. This measure does a good job of picking up some of the nature of the English language. The occurrence of a letter varies significantly with its context: “h” is common after a “t” but not after a “q”. This measure would also pick up the pattern in the example of 0, 1, 2,…, 255, 0, 1,…

But there are many slightly more complicated patterns that could be generated by a short computer program yet confound this second-order entropy calculation. Shannon defined the entropy of a stream to include all orders up to infinity. Counting this high may not be possible, but the higher-order terms can usually be safely ignored. While it may be practical to compute the first- or second-order entropy of an information stream, the amount of space devoted to the calculation quickly becomes overwhelming at higher orders: the number of terms in the summation grows exponentially with the order. Shannon devised several experimental ways of estimating the entropy, but the limits of the model are still clear.

2.3.1. RSA Encryption

The section “Encryption and White Noise” on page 20 described RSA encryption with the metaphor of a long circle of beads. Here are the equations. The system begins with two prime numbers p and q. Multiplying p and q together is easy, but no one knows of an efficient way to factor n = pq into its components p and q if the numbers are large (i.e., about 1024 to 2048 bits).

This is the basis of the security of the system. If you take a number x and compute the successive powers of x, then x^(ϕ(pq)+1) mod pq = x.[4] That is, if you keep multiplying x by itself modulo pq, then the sequence returns to x after ϕ(pq) steps.

[4] x mod y means the remainder after x is divided by y. So 9 mod 7 is 2, and 9 mod 3 is 0.
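As a quick numeric check of this circle of powers, here is a toy example of my own (not from the text), with p = 3 and q = 5, so pq = 15 and ϕ(15) = 8:

```python
# Toy example: p = 3, q = 5, so pq = 15 and phi(15) = 8.
x, n, phi = 2, 15, 8
print(pow(x, phi, n))       # 1: x^phi(pq) mod pq comes back to 1
print(pow(x, phi + 1, n))   # 2: one more multiplication by x returns to x
# The actual cycle of the powers of 2 modulo 15 has length 4, which divides
# phi(15) = 8, so the walk around the "string of beads" still closes up:
print([pow(x, k, n) for k in range(1, 10)])   # [2, 4, 8, 1, 2, 4, 8, 1, 2]
```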

A message is encrypted by treating it as the number x. The sender encrypts the number x by multiplying it by itself e times, that is, computing x^e mod pq. The receiver decrypts the message by multiplying it by itself d times, that is, computing (x^e)^d mod pq. If d × e = 1 mod ϕ(pq), then the result will be x.

This ϕ(n) is called the Euler totient function, and it is the number of integers less than n that are relatively prime to n. If n is a prime number, then ϕ(n) is n − 1 because all of the integers less than n are relatively prime to it. The function is multiplicative, so ϕ(pq) = ϕ(p)ϕ(q). This means that ϕ(pq) = (p − 1)(q − 1) = pq − p − q + 1. For example, ϕ(15) = 8. The numbers 1, 2, 4, 7, 8, 11, 13, and 14 are relatively prime to 15; the values 3, 5, 6, 9, 10, and 12 are not.

Calculating the value of ϕ(pq) is easy if you know both p and q, but no one knows an efficient way to do it if you don't. This is the basis for the RSA algorithm. The circumference of this string of pearls or beads is ϕ(pq). Moving one pearl or bead along the string is the equivalent of multiplying by x.

Neal Koblitz's book, [Kob87], gives a good introduction to finding this inverse.

The two keys for RSA are chosen so that they multiply together to give 1 modulo ϕ(pq). One is chosen at random and the other is calculated by finding its inverse. Call these e and d, where de = 1 mod ϕ(pq). This means that:

x^(de) mod pq = x

This can be converted into an encryption system very easily. To encrypt with this public key, calculate x^e mod pq. To decrypt, raise this answer to the dth power. That is, compute:

(x^e)^d mod pq = x^(de) mod pq = x

This fulfills all of the promises of the public-key encryption system. There is one key, e, that can be made public. Anyone can encrypt a message using this value. No one can decrypt it, however, unless they know d. This value is kept private.
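To make the whole round trip concrete, here is a worked example with textbook-sized numbers. The choices p = 61, q = 53, and e = 17 are my own; real keys use primes of 512 bits or more and pad the message first. Python's built-in `pow(e, -1, phi)` (available since 3.8) computes the modular inverse:

```python
p, q = 61, 53
n = p * q                      # 3233, the public modulus pq
phi = (p - 1) * (q - 1)        # 3120, the totient of pq
e = 17                         # public exponent, relatively prime to phi
d = pow(e, -1, phi)            # private exponent: the inverse of e mod phi
assert (e * d) % phi == 1      # d = 2753

x = 65                         # the message, treated as a number less than pq
ciphertext = pow(x, e, n)      # encrypt: x^e mod pq
recovered = pow(ciphertext, d, n)   # decrypt: (x^e)^d mod pq
print(ciphertext, recovered)   # 2790 65
```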

The most direct attack on RSA is to find the value of ϕ(pq). This can be done if you can factor pq into p and q.

Actually implementing RSA for encryption requires attention to a number of details. Here are some of the most important ones in no particular order:

  • Converting Messages into Numbers Data is normally stored as bytes. RSA can encrypt any integer that is less than pq. So there needs to be a solid method of converting a collection of bytes into and out of integers less than pq. The easiest solution is to glue together bytes until the string of bytes is a number that is greater than pq. Then remove one byte and replace it with random bits so that the value is just less than pq. To convert back to bytes, simply remove this padding. The equations here make it easy to describe RSA, but they aren't enough to make it easy to build a working implementation. Dan Boneh, Antoine Joux, and Phong Q. Nguyen found major weaknesses in naive solutions for converting a message into a number. [BJN00]
  • Fast Modulo Computation Computing x^e mod pq does not require multiplying x together e times. This would be prohibitive because e could be quite large. An easier solution is to compute x, x^2 mod pq, x^4 mod pq, x^8 mod pq,… That is, keep squaring x. Then choose the right subset of these values to multiply together to get x^e mod pq. This subset is easy to determine: if the ith bit of the binary expansion of e is 1, then multiply x^(2^i) mod pq into the final answer. [BFHMV84], [Bri82], [Mon85], and [QC82] discuss efficient multiplication algorithms.
  • Finding Large Prime Numbers The security of the RSA system depends on how hard it is to factor pq. If both p and q are large prime numbers, then this is difficult. Identifying large prime numbers, as luck would have it, is pretty easy to do. There are a number of tests for primality that work quite well. The solution is to choose a large, odd number at random and test it to see if it is prime. If it isn't, choose another. The number of candidates that must be tested before finding a prime near an integer x is roughly proportional to the number of bits in x. The Lehman test [Leh82] is a good way to determine whether n is prime. To use it, choose a random number a and compute a^((n−1)/2) mod n. If this value is not 1 or −1, then n is not prime. Each value of a has at least a 50% chance of exposing a nonprime number. If n passes this test m times, then there is at most a 1 in 2^m chance that n is not prime and we simply haven't found an a that would prove it. Making m = 100 is a good starting point. It is not absolute proof, but it is good enough.
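The last two bullets can be sketched together. This is a minimal illustration of my own (the function names and the choice of m = 25 rounds for speed are assumptions, not from the text): `modexp` implements the repeated-squaring trick, and `is_probably_prime` applies the a^((n−1)/2) test described above.

```python
import random

def modexp(x: int, e: int, n: int) -> int:
    """Compute x^e mod n by repeated squaring: multiply in x^(2^i) mod n
    whenever the ith bit of e is 1."""
    result = 1
    square = x % n
    while e > 0:
        if e & 1:
            result = (result * square) % n
        square = (square * square) % n
        e >>= 1
    return result

def is_probably_prime(n: int, m: int = 25) -> bool:
    """Lehman-style test: for random a, a^((n-1)/2) mod n must be 1 or n-1
    when n is prime, and at least one a should give n-1."""
    if n in (2, 3):
        return True
    if n < 2 or n % 2 == 0:
        return False
    saw_minus_one = False
    for _ in range(m):
        a = random.randrange(2, n - 1)
        r = modexp(a, (n - 1) // 2, n)
        if r not in (1, n - 1):
            return False        # a is a witness: n is definitely composite
        if r == n - 1:
            saw_minus_one = True
    return saw_minus_one        # all 1's without a single -1 is suspicious

def find_prime(bits: int) -> int:
    """Choose random odd candidates with the top bit set until one passes."""
    while True:
        candidate = random.getrandbits(bits) | 1 | (1 << (bits - 1))
        if is_probably_prime(candidate):
            return candidate

print(find_prime(64))   # a 64-bit probable prime
```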

RSA encryption is a very popular algorithm used for public-key encryption, but there are a large number of other algorithms available. A discussion of these variants is beyond the scope of this book. Both Bruce Schneier's book [Sch94] and Gus Simmons' book [ed.92] offer good surveys.

2.4. Summary

Pure encryption algorithms are the best way to convert data into white noise. This alone is a good way to hide the information in the data. Some scientists, for instance, encrypt random data to make it even more random. Encryption is also the basis for all of the other algorithms used in steganography. The algorithms that take a block of data and hide it in the noise of an image or sound file need data that is as close to random as possible. This lowers the chance that it can be detected.

Of course, nothing is perfect. Sometimes data that is too random can stick out too. Chapter 17 describes how to find hidden information by looking for values that are more random than they should be.

  • The Disguise Good encryption turns data into white noise that appears random. This is a good beginning for many algorithms that use the data as a random source to imitate the world.
  • How Secure Is It? The best new encryption algorithms like Rijndael and the other four AES finalists have no practical attack known to the public. These algorithms are designed and evaluated on their ability to resist attack. DES is no longer very secure for serious applications.
  • How to Use It? Encryption code can be downloaded from a number of places on the Net.

Further Reading

  • Applied Cryptography by Bruce Schneier is a good general introduction to the subject. It includes good summaries of the major algorithms. [Sch94]
  • Handbook of Applied Cryptography by Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone is a good technical discussion of many of the most important algorithms as well as their mathematical foundations. [MvV97]
  • The proceedings from the various conferences sponsored by the International Association of Cryptologic Research (IACR) offer some of the most timely insights into the best open research in the field. See iacr.org.
  • It's impossible to summarize all of the hard work that cryptographers have done to create linear approximations of non-linear encryption functions. Adi Shamir did a good job and extended the techniques developed by many others in his talk given at Crypto 2008. He describes a sophisticated algebraic technique that can be applied to many of the most common algorithms used today.[Sha08]