Chapter 19. Synchronization

19.1. Stealing Baseball's Signs

To: Newly Signed Recruits

From: Third-Base Coach

Here are the signs for tonight's game. As usual, I will string together a sequence of bogus signs in front of the two trigger signals, alpha and beta, which follow each other in immediate sequence. After them, I will deliver the one sign that isn't fake. Pay attention to that one. Then I'll add in another three to seven fake signals to close out the set. Remember: Ignore all signs that don't follow the alpha and beta signs in sequence. If you see another sign between the alpha and beta, ignore them all. If you see the beta before the alpha, ignore it. Use the alpha and beta to synchronize yourself on the sign that matters.

Sign              Meaning
Touch Nose        Drug tester wants to check your steroid level. Drink lots of fluids.
Grab Left Ear     Someone dinged your new Escalade in the parking lot.
Grab Right Ear    alpha
Grab Left Ear     Your broker called with bad news about that investment in pork bellies.
Cover Eyes        beta
Slap Left Hip     Watch your mouth. We could lipread that last epithet.
Kick Left Foot    The owner is calling your bluff. He said “No” to your salary demands.
Muss Hair         We should dock your pay for that last one.
Hitch up Pants    The team doctor says I should cut back on the steak dinners.

19.2. Getting In Sync

Most of the algorithms in this book depend, in one form or other, on getting the complete file. The message may be hidden in a small fraction of the file, but everything is available for extraction. If the hidden bits are supposed to be located 14 pixels from the left and 212 from the bottom, that spot can be found because the entire frame is available.

There are many cases when the entire file isn't around. Photos are routinely cropped before being reused. Only a short clip of a sound file or a movie may be available. Anyone turning on a radio at a particular moment may start receiving the signal at any arbitrary place in the stream.

Unfortunately, some of the more extreme algorithms in this book rely heavily on absolute synchronization to function correctly. Most effective encryption algorithms change their process throughout the file, encrypting the nth byte differently from the (n + 1)th byte. Some of the algorithms for hiding information use a subset of the data and rely heavily on the boundaries to define those pixels.

Synchronizing a bit stream between a sender and a receiver is not hard to do, but it is a bit tricky to do in an efficient manner. The simplest trick is to introduce a special synchronization character and only use it at the beginning or end of a word. The easiest example is unary or base one encoding, where a 1 is used as the boundary and the number of 0s encodes the value. So a message of 2,3,2 would look like 1001000100.
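A quick sketch of this unary scheme in Python (the function names are my own invention):

```python
def unary_encode(values):
    # A 1 marks each boundary; the run of 0s after it encodes the value.
    return "1" + "1".join("0" * v for v in values)

def unary_decode(bits):
    # Splitting on the boundary 1s recovers the runs of 0s.
    # The first piece is empty because the stream starts with a boundary.
    return [len(run) for run in bits.split("1")[1:]]

print(unary_encode([2, 3, 2]))     # 1001000100, the example from the text
print(unary_decode("1001000100"))  # [2, 3, 2]
```

A reader who tunes in mid-stream only needs to wait for the next 1 to regain synchronization, which is the whole appeal of the scheme.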

Morse code uses two different-sized pauses in the signal stream. Short pauses break up the dots and the dashes, while pauses that are three times as long break up the letters. The gap between the three dots in an S (· · ·) should be one-third the size of the gap between the last dot and the beginning of the next letter. The factor of 3 is just a convention, and many operators probably deviate from it.
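The timing can be sketched as on/off units (a hypothetical rendering; only the 1:3 gap ratio comes from the text):

```python
CODES = {"S": "...", "O": "---"}  # just the letters this sketch needs

def keyed(message):
    # A dot is one on-unit and a dash is three; gaps inside a letter are
    # one off-unit, and gaps between letters are three (the factor of 3).
    letters = []
    for ch in message:
        marks = ["1" if s == "." else "111" for s in CODES[ch]]
        letters.append("0".join(marks))
    return "000".join(letters)

print(keyed("SOS"))  # 101010001110111011100010101
```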

Dedicating an entire character like 1 to synchronization in an algorithm is expensive. Can you imagine transmitting the yearly budget of the United States government (currently about $3.1 trillion) in unary? While the approach is basic, it's not easy to extend in the digital world, where there are only two real characters. Other characters are just composed of blocks of 0s or 1s and it's not simple to find the beginning and end of the block.

19.3. Extending Other Tools

Section 5.2.1 describes Huffman coding, a compression algorithm that will produce variable-length codes for characters and compress the bit stream by giving shorter codes to more common letters and longer codes to the rarer ones. The standard algorithm may be very good at squeezing out the extra entropy from a file, but it is extremely fragile. If only one bit is lost, the rest of the stream can be compromised.

Adding a synchronization character to a Huffman code is not as easy as creating another character and adding it to the mix. Figure 19.1 is a modified version of Figure 5.1 created by adding another character, ‘*’, to the alphabet to act as a synchronizer. This seems to work. Adding the asterisk at the beginning of a word or a paragraph would be a signal. The word ‘ate’ would be encoded as ‘1111011101110’. Anyone who needs to synchronize a stream of bits could look for the four 1s in a row and then start decoding from that point.

Figure 19.1. An imperfect modification of the Huffman tree from Figure 5.1. The extra synchronization character, ‘*’, is not unique.

This would seem to work. There are four 1s in the asterisk's code block, ‘1111’, and because of the shape of the tree, four 1s only occur in this asterisk. But there are three problems:

  • The rest of the tree is not guaranteed to be as short as it is in this example. This can be guaranteed by assigning the synchronization character the lowest probability, something that ensures it will be the longest code in the tree.
  • Two letters together can produce the synchronization code. The code for ‘at’ is ‘011110’. Is this the place where we should start decoding?
  • It's not clear where the code itself begins. The word ‘*tea’ would be encoded: ‘1111110111001’. Where do the four 1s end and the real message begin?
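The second problem can be demonstrated directly. In this sketch, only the strings ‘1111’ (the asterisk) and ‘011110’ (the code for ‘at’) come from the figure:

```python
SYNC = "1111"   # the asterisk's code in Figure 19.1
AT = "011110"   # the code for 'at' quoted in the text

# Scanning the stream for the synchronization pattern finds a spurious
# match inside 'at': the four 1s straddle the boundary between letters.
print(AT.find(SYNC))  # 1, a false synchronization point
```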

The last two problems can be solved by choosing the synchronization code carefully and tweaking the tree by adding extra values or nodes. This will decrease the efficiency but not as much as converting to unary.

The key to avoiding the third problem is choosing a synchronization code that is not susceptible to prefix inversion. That is, it's not possible to break the code block into two separate parts, A and B, such that A + B = B + A where + stands for concatenation. The block 1111 is one of the worst candidates for this because there are three different ways to break it into A and B that satisfy the equation. Setting A = 111 and B = 1 is just one of them. A better choice is something like 0111. Figure 19.2 shows a tree redrawn to avoid this problem.
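Checking a candidate code for prefix inversion takes only a few lines (a sketch; `inversion_splits` is my own name for it):

```python
def inversion_splits(code):
    # All ways to break code into nonempty A and B with A + B == B + A,
    # i.e. swapping the two halves reproduces the same block.
    return [(code[:i], code[i:]) for i in range(1, len(code))
            if code[i:] + code[:i] == code]

print(inversion_splits("1111"))  # three splits, the worst case from the text
print(inversion_splits("0111"))  # [] -- a safe choice
```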

Figure 19.2. Figure 19.1 after the tree is rearranged to make the synchronization code be 0111. These techniques are also used in Section 7.3.3 to scramble the tree to add more confusion to the code.

Avoiding the second problem is a bit trickier. If two adjacent characters can combine to produce the synchronization code, spurious synchronization points will appear in the stream. The solution is to add extra bits to the codes to eliminate this possibility, a process that will hurt the code's efficiency. The price, however, may be fairly low if the codes are long. Adding an extra bit to a 12-bit code isn't as bad as adding one to a 3-bit code.

The trick is to ensure that the code for one character can't be broken into parts A and B such that B is a prefix for the synchronization code. So if the synchronization code is 0111, then a code that ends with 01 would be bad because it is a prefix of the synchronization code.
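A small helper can flag such dangerous endings. In this sketch, only the synchronization code 0111 and the bad suffix 01 come from the text; the other strings are illustrative:

```python
def dangerous_suffixes(code, sync):
    # Proper suffixes of code that are proper prefixes of the sync code:
    # if such a suffix ends one character, the next character's bits may
    # complete a false copy of the synchronization code.
    return [code[-k:] for k in range(1, len(code))
            if sync.startswith(code[-k:]) and k < len(sync)]

print(dangerous_suffixes("1101", "0111"))  # ['01'] -- the bad ending
print(dangerous_suffixes("1110", "0111"))  # ['0']
```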

The codes produced by the Huffman algorithm can be made longer by adding extra bits. It's not easy to add them to the beginning or the middle, but it is easy to add them to the end. These are wasted bits that serve no purpose except to prevent the synchronization code from appearing.

Figure 19.3 shows Figure 19.2 after the code for A has been rewritten to use an extra bit, 110. This creates a code, 111, that is never used when compressing the data. This inefficiency, though, ensures that none of the other letters will combine to form the synchronization code. There is no code that will produce three 1s in a row except the synchronization code.

Figure 19.3. Figure 19.2 after the code for ‘A’ is changed from 11 to 110. Notice that there's a blank node for the code 111. This shouldn't be used.

The construction in Figure 19.3 is more of an accident produced by the fact that the tree is small and there is only one letter that ends in a 1. There are a number of similar constructions that can produce fairly efficient codes.

Imagine, for instance, that the tree will have height 2n, meaning that the longest codes will be 2n bits long. Let one of these long codes be the synchronization code created from n 0s followed by n 1s. So if n = 4, then the synchronization code would be 00001111. Let's represent this as 0^4 1^4 or, in general, 0^n 1^n. Using a long code for synchronization may be a bit inefficient if you intend to include it often, but it simplifies the process of ensuring that no two letters can combine to form the synchronization character.

We can fix the codes for the remaining characters by adding the right suffixes:

  • If a code for a character ends with more than n 0s, add 101 to the code.
  • If a code for a character ends with fewer than n 0s, add 1 to the code.
  • If a code for a character ends with more than n 1s, add nothing.
  • If a code for a character ends with fewer than n 1s, add 01 to the code.

This construction may not be optimal in many cases, but it should suffice to prevent n 0s from appearing before n 1s except in the synchronization character.
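A sketch of these rules in Python with n = 2 (the function name and the sample codes are mine; codes ending in exactly n 0s or 1s, which the text leaves unspecified, are folded into the "fewer than" cases):

```python
def pad(code, n):
    # Apply the four suffix rules from the text.
    zeros = len(code) - len(code.rstrip("0"))
    if zeros > n:
        return code + "101"   # more than n trailing 0s
    if zeros > 0:
        return code + "1"     # fewer trailing 0s: cap the run with a 1
    ones = len(code) - len(code.rstrip("1"))
    if ones > n:
        return code           # more than n trailing 1s: nothing needed
    return code + "01"        # fewer trailing 1s

n = 2
sync = "0" * n + "1" * n                       # 0011
padded = [pad(c, n) for c in ["000", "0", "10", "110", "111"]]
# No pair of padded codes manufactures the sync pattern across a boundary:
clashes = [a + b for a in padded for b in padded if sync in a + b]
print(padded)   # ['000101', '01', '101', '1101', '111']
print(clashes)  # []
```

Every padded code here ends in a 1, so a run of n 0s can never straddle a code boundary, which is exactly what the construction needs.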

There are other techniques that rearrange the tree and provide more efficient codes.

19.4. Summary

Being able to pick up a steganographic trail in the middle of a stream is a useful technique for adding watermarks and other messages when the cover file may be cropped or truncated along the way. When users cut and paste snippets of songs or parts of an image, the watermark can still survive if it's possible to resynchronize with the stream.

The algorithms given here are just a small selection. Synchronization is a common problem in data communications because noise can corrupt signals. Many radio devices like cell phones must be able to synchronize themselves with a network when they turn on.

  • The Disguise The algorithms given here don't add any cover to the process themselves. They just make it simpler to recover the hidden message when the disguise is chopped or truncated. A snippet can still reveal enough information.
  • How Secure Is It? This construction doesn't add any security itself, but it does offer it indirectly by making it possible to recover information from partial fragments.
  • How to Use It? If you have a short message and a long cover file, repeat the message throughout the cover file numerous times. This only works with methods that don't base the encoding on the absolute position in the file. Random walks that scatter the steganographic bits in position-dependent places won't survive. The algorithms should use some regular hiding pattern that will be possible to pick up even after cropping.

Further Reading

  • S. Manoharan describes how self-synchronizing codes can solve issues created by cropping and sampling. [Man03]