Chapter 15. Synthetic Worlds

15.1. Slam Dunks

The Play By Play Man and the Color Man work through a way of encoding messages in an athletic contest.

PBPM: Things are looking a bit difficult for the Montana Shot Shooters. They had a lead of 14 points at the halftime, but now the Idaho Passmakers have hit two three-point shots in a row. Whammo! They're back in the game. The Shot Shooters have called a time out to regroup.
CM: Putting six points on the board that way really sends a message. They made it look easy.
PBPM: Those two swishes announced, “We're still here. You can't beat us that easily. We've got pride, intestinal fortitude and pluck.” The emphasis is on pluck.
CM: And composure too. They whipped the ball around the key. They signaled, “We can move the rock and then send it home. We can pass the pill and force you to swallow it whole. We know basketball. This is our game too.”
PBPM: There was even a startling subtext to the message. I believe the Passmakers were telling the Shot Shooters that this game was different from the last. Yes, the Shot Shooters beat them at home by 22 points, but that was two months ago. Now, Jimmy D's leg is better. He's quicker. The injury he sustained while drinking too much in the vicinity of a slippery pool deck is behind him. The nasty, gold-digging girlfriend is history. The Passmakers are reminding the Shot Shooters that a bit of cortisone is a time-proven solution for knee problems, but moving on with your life and putting bad relationships behind you is an even better cure for the human heart. That's the message I think is encoded in those three-point shots.
CM: We're back from the time out now. Let's see what the Shot Shooters can do.
PBPM: The Shot Shooters put the ball in play. Carter pauses and then passes the ball over halfcourt to Martin. He fakes left, goes right. It's a wide open lane. He's up and bam, bam, bam. That's quite a dunk. The Passmakers didn't even have a defense.
CM: Whoa. That sends a message right there. A dunk like that just screams, “You think three-point shots scare me? You think I care about your prissy little passing and your bouncy jump shots? There was no question where this ball was going. Nobody in the stands held their breath to see if it would go in. There was no pregnant pause, no hush sweeping the crowd, and no dramatic tension. This ball's destiny was the net and there was no question about it.” He's not being steganographic at all.

15.2. Created Worlds

Many of the algorithms for sound and image files revolve around hiding information in the noise. Digitized versions of the real world often have some extra entropy waiting for a signal. But advances in computer graphics and synthesis mean that the images and sound often began life in the computer itself. They were not born of the real world and all of the natural entropy constantly oozing from the plants, the light, the animals, the decay, the growth, the erosion, the wind, the rain and who knows what else. Synthetic worlds are, by definition, perfect.

At first glance, perfection is not good for hiding information. Purely synthetic images began as mathematics and this means that a mathematician can find equations to model that world. A synthetic image of a ball in the light has a perfect gradient with none of the distortions that might be found in an image of an imperfect ball made by worn machinery and lit by a mass-produced bulb powered by an overtaxed electrical system.

These regularities make it easy for steganalysis to identify images with extra, hidden information. Even slight changes to the least significant bit become detectable. The only advantage is that the increasing complexity of the models means that any detection process must become increasingly complex too. This does provide plenty of practical cover. Better computer graphics technology is evolving faster than any algorithm for detecting the flaws. More complicated models are coming faster than we can suss them out.

If the practical limitations aren't good enough, the models for synthesizing worlds can be deputized to carry additional information. Instead of hiding the extra information in the final image or sound file, the information can be encoded during the synthesis.

There are many opportunities to hide information. Many computer graphics algorithms use random number generators to add a few bits of imperfection and the realism that comes along with them. Any of these random number streams can be hijacked to carry data.
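A minimal sketch of the idea, under stated assumptions: a toy "renderer" draws a perfect one-row gradient and adds one bit of imperfection per pixel from whatever bit stream it is handed. The function names (ideal_gradient, render, message_bits, recover) are hypothetical and exist only for this illustration. A receiver who knows the model subtracts the perfect ramp to get the stream back.

```python
import random

def ideal_gradient(width):
    """The 'perfect' synthetic row: a linear ramp from 0 to 254."""
    return [(x * 254) // (width - 1) for x in range(width)]

def render(width, noise_source):
    """Add one bit of 'imperfection' per pixel, drawn from noise_source."""
    return [v + next(noise_source) for v in ideal_gradient(width)]

def random_bits(seed):
    """The stream an ordinary renderer would use."""
    rng = random.Random(seed)
    while True:
        yield rng.getrandbits(1)

def message_bits(data, filler_seed=0):
    """Yield the bits of data (most significant first), then random filler."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1
    yield from random_bits(filler_seed)

def recover(row, n_bytes):
    """Subtract the ideal ramp to get the noise bits back, then repack bytes."""
    bits = [p - v for p, v in zip(row, ideal_gradient(len(row)))]
    out = bytearray()
    for i in range(0, n_bytes * 8, 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

innocent = render(64, random_bits(1))      # ordinary noisy gradient
loaded = render(64, message_bits(b"hi"))   # same renderer, hijacked stream
print(recover(loaded, 2))
```

The message here is inserted as plain bytes to keep the sketch short. A message that has been compressed or encrypted first would look much more like the random stream it replaces.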

The program MandelSteg, developed by Henry Hastur, hides information in the least significant bit of an image of the Mandelbrot Set. This synthetic image is computed to seven bits of accuracy and then the message is hidden in the eighth. See Section 15.3.2.

Another source can be found in tweaking the data used to drive the synthesis, perhaps by changing the least significant bits of the data. One version of an image may put the ball at coordinates (1414,221) and another version may put it at (1413,220). A watermarked version of a movie may encode the true owner's name in the position of one of the characters and the moment they start talking. Each version of the film will have slightly different values for these items. The rightful owner could be extracted from these subtle changes.

Wolfgang Funk worked through some of the details for perturbing three-dimensional models described with polynomial curves in the NURBS standard. [Fun06] The polynomials describing the objects are specified by points along the surface known as control points. New control points can be added without significantly changing the description of the surface if they're placed in the right spots. Old control points can usually be moved a slight amount so the shape changes a bit. Hao-Tian Wu and Yiu-ming Cheung suggest using a secret key to traverse the points describing the object and then modifying their position relative to the centroid of other points. Moving one direction by a small amount encodes a 0 and moving in the other direction encodes a 1. [WmC06]
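The centroid idea can be illustrated with a short sketch. This is a simplified, non-blind version written for this example and is not the scheme as published: it works on two-dimensional points, the decoder is assumed to hold the original model, and the names (centroid_of_others, traversal, embed, extract) and the step size eps are hypothetical. A key seeds the order in which points are visited, and each visited point is nudged away from or toward the centroid of the other points.

```python
import math
import random

def centroid_of_others(points, i):
    """Centroid of every point except points[i]."""
    others = points[:i] + points[i + 1:]
    n = len(others)
    return (sum(p[0] for p in others) / n, sum(p[1] for p in others) / n)

def traversal(n, key):
    """Key-dependent order in which the points are visited."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order

def embed(points, bits, key, eps=0.001):
    """Move a point slightly away from the centroid for 1, toward it for 0."""
    marked = list(points)
    for bit, i in zip(bits, traversal(len(points), key)):
        cx, cy = centroid_of_others(points, i)
        scale = 1 + eps if bit else 1 - eps
        x, y = points[i]
        marked[i] = (cx + (x - cx) * scale, cy + (y - cy) * scale)
    return marked

def extract(original, marked, n_bits, key):
    """Compare each visited point's distance from the centroid to the original."""
    bits = []
    for i in traversal(len(original), key)[:n_bits]:
        c = centroid_of_others(original, i)
        moved_out = math.dist(marked[i], c) > math.dist(original[i], c)
        bits.append(1 if moved_out else 0)
    return bits

model = [(1414.0, 221.0), (900.0, 640.0), (300.0, 120.0),
         (50.0, 700.0), (1200.0, 900.0)]
bits = [1, 0, 1, 1]
marked = embed(model, bits, key=42)
print(extract(model, marked, len(bits), key=42))
```

With these values no point moves by more than about one unit, which is in the same range as the (1414,221) versus (1413,220) example above.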

An even more complicated location to hide the information can be found by changing the physics of the model. The acoustical characteristics of the room are easy to change slightly. The music may sound exactly the same. The musicians may start playing at exactly the same time. But the size and characteristics of the echoes may change just a bit.

There are many ways that the parameters used to model the physics can be changed throughout a file. The most important challenge is guaranteeing that the changes will be detectable in the image or sound file. This is not as much of a problem as it can be for other approaches. Many of the compression algorithms are tuned to save space by removing extraneous information. The locations of objects and the physical qualities of the room, however, are not extraneous. The timbre of the instruments and the acoustical character of the recording studio are also not extraneous. Compression algorithms that blurred these distinctions would be avoided, at least by serious users of music and image files.

Markus Kuhn and Ross Anderson suggest that tweaking the video signal sent to the monitor can send messages because the electron guns in the monitor emit so much electro-magnetic “noise”. [KA98]

Designing steganographic algorithms that use these techniques can be something of an art. There are so many places to hide extra bits that the challenge is arranging for them to be found. The changes should be large enough to be detected by an algorithm but small enough to escape casual detection by a human.

15.3. Text Position Encoding and OCR

Chapters 6 and 8 show how to create synthetic textual descriptions and hide information in the process. A simpler technique for hiding information in text documents is to fiddle with the letters themselves.

Matthew Kwan developed a program called Snow which hides three bits at the end of each text line by adding between 0 and 7 spaces.
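A minimal sketch of trailing-whitespace encoding, assuming the simplest possible layout: the number of spaces appended to a line, from 0 to 7, is read as a three-bit value. This illustrates the idea only and does not reproduce the actual format used by Snow; the function names hide and reveal are hypothetical.

```python
def hide(lines, data):
    """Append 0-7 trailing spaces to each line; each count carries three bits."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 3)                  # pad to a multiple of 3
    groups = [int(bits[i:i + 3], 2) for i in range(0, len(bits), 3)]
    if len(groups) > len(lines):
        raise ValueError("cover text has too few lines")
    groups += [0] * (len(lines) - len(groups))      # unused lines carry 000
    return [line.rstrip(" ") + " " * g for line, g in zip(lines, groups)]

def reveal(lines, n_bytes):
    """Count the trailing spaces on each line and repack the bits into bytes."""
    bits = "".join(f"{len(l) - len(l.rstrip(' ')):03b}" for l in lines)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))

cover = [
    "Things are looking a bit difficult",
    "for the Montana Shot Shooters.",
    "They had a lead of 14 points",
    "at the halftime, but now",
    "the Idaho Passmakers have hit",
    "two three-point shots in a row.",
]
stego = hide(cover, b"OK")
print(reveal(stego, 2))
```

Six lines carry 18 bits, enough for two bytes. Any editor or mail system that trims trailing whitespace would destroy the message.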

One of the easiest solutions is to encode a signal by switching between characters that appear to be close to each other, if not identical. The number zero, ‘0’, and the capital O are close to each other in many basic fonts. The number ‘1’ and the lower-case l are also often indistinguishable. The front cover of the Pre-proceedings of the 4th Information Hiding Workshop carried a message from John McHugh. [McH01]

If the fonts are similar, information can be encoded by swapping the two versions. Detecting the difference in printed versions can be complicated because OCR programs often use context to distinguish between the two. If the number one is found in the middle of a word made up of alphanumeric characters, the programs often will fix the perceived mistake.

If the fonts are identical, the swap can still be useful for hiding information when the data is kept in electronic form. Anyone reading the file will not notice the difference, but the data will still be extractable.
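The swap can be sketched for an electronic file as follows. The assumptions are loud ones: the cover text contains no legitimate digits 0 or 1, each capital O or lower-case l is a carrier, and a swapped carrier means a one. The names SWAP, hide and reveal are hypothetical.

```python
SWAP = {"O": "0", "l": "1"}           # letter -> look-alike digit
DIGITS = set(SWAP.values())

def hide(cover, bits):
    """Swap a carrier letter for its look-alike digit to encode a 1."""
    out, pending = [], list(bits)
    for ch in cover:
        if ch in DIGITS:
            raise ValueError("cover must not already contain 0 or 1")
        if ch in SWAP and pending:
            ch = SWAP[ch] if pending.pop(0) else ch
        out.append(ch)
    if pending:
        raise ValueError("cover has too few carrier characters")
    return "".join(out)

def reveal(stego, n_bits):
    """Read carriers in order: a digit means 1, an unswapped letter means 0."""
    carriers = [ch for ch in stego if ch in SWAP or ch in DIGITS]
    return [1 if ch in DIGITS else 0 for ch in carriers[:n_bits]]

cover = "Only idle fools follow old rules blindly"
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = hide(cover, bits)
print(stego)
print(reveal(stego, len(bits)))
```

The capacity is one bit per carrier character, so the cover text must be chosen or written with enough of them.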

This is often taken to extremes by some members of the hacker subculture who deliberately swap vaguely similar characters. The number four (‘4’) bears some resemblance to the capital A, the number three (‘3’) looks like a reversed capital E. This technique can elude keyword searches and automated text analysis programs, at least until the spelling becomes standardized and well known. Then a document with the phrase “3L33t h4XOR5” starts to look suspicious.

Humans also provide some error correction for basic spelling and grammatical errors. A hidden message can be encoded by introducing seemingly random misspellings from time to time.[1]

1 Some might be tempted to blame me and the proofreader for any errors that crept into the text. But perhaps I was sending a secret message.

15.3.1. Positioning

Another possible solution is to simply adjust the positions of letters, words, lines and paragraphs. Typesetting is as much of an art as a job and much care can be devoted to algorithms for arranging these letters on a page. Information can always be hidden when making this decision.

The LaTeX and TeX typesetting systems used in creating this book justify lines by inserting more white space after a punctuation mark than after a word. American typesetters usually put three times as much space after punctuation. The French, on the other hand, avoid this distinction and set both the same. This mechanism is easy to customize and it is possible to change the “stretchability” of white space following any character. This paragraph was typeset so the whitespace after words ending in ‘e’ or ‘r’ received three times as much space with the \sfcode macro.

Changing these values throughout a document to smaller values is relatively simple to do. In many cases, detecting these changes is also relatively simple. Many commercial OCR programs continue to make minor errors on a page, but steganographic systems using white space can often be more accurate. Detecting the size of the whitespace is often easier than sussing out the differences between the ink marks.

Jack Brassil, Steve Low, Nicholas Maxemchuk and Larry O'Gorman experimented with many techniques for introducing small shifts to the typesetting algorithms. [BO96, LMBO95, BLMO95, BLMO94] This can be easy to do with open source tools like TeX that also include many hooks for modifying the algorithms.

One of their most successful techniques is moving the individual lines of text. They successfully show that entire lines can be moved up or down one or two six-hundredths of an inch. Moving a line is easy to detect if any skew can be eliminated from the image. As long as the documents are close to horizontal alignment, the distance between the individual lines can be measured with enough precision to identify the shifted lines.

Dima Pröfrock, Mathias Schlauweg and Erika Müller suggest that actual objects in digitized video can be moved slightly to encode watermarks. [PSM06]

The simplest mechanism for measuring a line is to “flatten” it into one dimension. That is, count the number of pixels with ink in each row. Figure 15.2 shows the result of summing the intensity of each pixel in the row. The maximum value of each pixel is 255 if it is completely white. For this reason, the second row has a much smaller valley because it is shorter and made up of more white space.

Figure 15.2. A graph of the sums of the rows in Figure 15.1. White is usually assigned 255, so the short line in the middle is less pronounced.

Figure 15.1. Three lines of printed text scanned in at 400 pixels per inch.
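The flattening step can be sketched on a toy image. The page below is a hypothetical 20-pixel-wide scan built for this example, with white stored as 255 and ink as 0; the function names and the ink_fraction threshold are assumptions, not values from the experiments.

```python
WHITE, INK = 255, 0

def row_profile(image):
    """Sum the intensity of every pixel in each row (white = 255, ink = 0)."""
    return [sum(row) for row in image]

def find_text_lines(profile, width, ink_fraction=0.05):
    """Return (top, bottom) row ranges whose sums dip below the white level."""
    threshold = 255 * width * (1 - ink_fraction)
    lines, start = [], None
    for y, total in enumerate(profile):
        if total < threshold and start is None:
            start = y                          # entering a valley
        elif total >= threshold and start is not None:
            lines.append((start, y - 1))       # leaving a valley
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines

def text_row(width, inked):
    """A row with `inked` dark pixels followed by white space."""
    return [INK] * inked + [WHITE] * (width - inked)

# Two white rows, a long line of text, a gap, a short line, two white rows.
page = ([text_row(20, 0)] * 2 + [text_row(20, 12)] * 3 +
        [text_row(20, 0)] * 3 + [text_row(20, 4)] * 2 +
        [text_row(20, 0)] * 2)

profile = row_profile(page)
print(find_text_lines(profile, width=20))
```

The short line produces a shallower valley than the long one because most of its pixels are white, which is the effect described for the middle line of the scanned sample.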

The structure of the peaks and valleys also varies with the words. The third row has more capital letters, so there is a more pronounced valley at the beginning corresponding to the horizontal lines in the capital letters. The choice of the font also changes the shape of this graph and can even be used to identify the font. [WH94] Generally, fonts with serifs make it easier to identify the baselines than sans-serif fonts, but this is not guaranteed.

More bits can be packed into each line by shifting individual words up or down one three-hundredth of an inch. The detection process becomes more complicated because there is less information to use to measure the vertical shift of the baseline. Short words are not as desirable as longer ones. In practice, it may make sense to group multiple small words together and shift them in a block.

Experiments by Jack Brassil and Larry O'Gorman show that the location of the baseline and any information encoded in it can be regularly extracted from text images even after repeated photocopying. Moving the individual lines up or down by one or two six-hundredths of an inch is usually sufficient to be detected. They do note that their results require a well-oriented document where the baselines of the text are aligned closely with the raster lines. Presumably, a more sophisticated algorithm could compensate for the error by modeling the antialiasing, but it is probably simpler to just line up the paper correctly in the first place.
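A sketch of line shifting on baseline positions alone, measured in six-hundredths of an inch with larger values lower on the page. It assumes, as a simplification chosen for this example, that the original leading is uniform and that every other line is left unshifted to act as a reference; the function names are hypothetical.

```python
def shift_lines(baselines, bits, delta=1):
    """Move every other line up (bit 1) or down (bit 0) by delta units."""
    out = list(baselines)
    carriers = range(1, len(baselines) - 1, 2)
    for bit, i in zip(bits, carriers):
        out[i] += -delta if bit else delta
    return out

def read_shifts(baselines, n_bits):
    """Compare each carrier's gap to the line above with the gap below."""
    bits = []
    for i in list(range(1, len(baselines) - 1, 2))[:n_bits]:
        above = baselines[i] - baselines[i - 1]
        below = baselines[i + 1] - baselines[i]
        bits.append(1 if above < below else 0)
    return bits

# Nine lines set with 12-point leading: 100 units at 600 per inch.
original = [100 * k for k in range(1, 10)]
bits = [1, 0, 0, 1]
marked = shift_lines(original, bits)
print(read_shifts(marked, len(bits)))
```

Because only the gaps between neighboring lines are compared, sliding the whole page up or down on the scanner leaves the decoded bits unchanged. Skew is a different matter, as noted above.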

15.3.2. MandelSteg and Secrets

Any image is a candidate for hiding information, but some are better than others. Ordinarily, images with plenty of variation seem perfect. If the neighboring pixels are different colors, then the eye doesn't detect subtle changes in the individual pixels. This led Henry Hastur to create a program that flips the least significant bits of a Mandelbrot set. These images are quite popular and well-known throughout the mathematics community. This program, known as MandelSteg, is available with source code from the Cypherpunks archive (ftp://ftp.csua.berkeley.edu/pub/cypherpunks/steganography/).

The manual notes that there are several weaknesses in the system. First, someone can simply run the data recovery program, GifExtract, to remove the bits. Although there are several different settings, one will work. For this reason, the author suggests using Stealth, a program that will strip away the framing text from a PGP message, leaving only noise.

There are other weaknesses. The Mandelbrot image acts as a one-time pad for the data. As with any encoding method, the information can be extracted if someone can find a pattern in the key data. The Mandelbrot set might look very random and chaotic, but there is still plenty of structure. Each pixel represents the number of iterations before a simple equation (f(z) = z^2 + c) diverges. Adjacent pixels often take a different number of iterations, but they are still linked by their common generating equation. For this reason, I think it may be quite possible to study the most significant bits of a fractal image and determine the location from where it came. This would allow someone to recalculate the least significant bits and extract the answer.[2]
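The structure is easy to see in a sketch. The code below is not MandelSteg and does not reproduce its file format; it assumes a seven-bit iteration count stored in the upper bits of each byte with the message in the remaining low bit, and the names and the region parameters are hypothetical. Anyone who can guess the region can regenerate the clean image and check it against the upper seven bits.

```python
def escape_count(c, max_iter=127):
    """Iterations of z -> z*z + c before |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n
    return max_iter

def render(x0, y0, step, size):
    """Seven-bit escape counts, stored in the upper seven bits of each byte."""
    return [[escape_count(complex(x0 + i * step, y0 + j * step)) << 1
             for i in range(size)] for j in range(size)]

def embed(image, bits):
    """Write one message bit into the empty low bit of each leading pixel."""
    flat = [p for row in image for p in row]
    for k, bit in enumerate(bits):
        flat[k] |= bit
    size = len(image[0])
    return [flat[r * size:(r + 1) * size] for r in range(len(image))]

def extract(image, n_bits):
    """Read the low bit of the first n_bits pixels, row by row."""
    return [p & 1 for row in image for p in row][:n_bits]

bits = [1, 0, 1, 1, 0, 1, 0, 0]
clean = render(-2.0, -1.5, 0.1, 16)
stego = embed(clean, bits)

# An observer who guesses the region can regenerate the upper seven bits.
guess = render(-2.0, -1.5, 0.1, 16)
matches = all((s >> 1) == (g >> 1)
              for srow, grow in zip(stego, guess)
              for s, g in zip(srow, grow))
print(extract(stego, len(bits)), matches)
```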

2 David Joyce offers a Mandelbrot image generator on the Web (http://aleph0.clarku.edu/djoyce/julia/explorer.html).

15.4. Echo Hiding

Hiding information in the noise of sound files is a good solution, but the information may be erased by good compression algorithms. Daniel Gruhl, Anthony Lu, and Walter Bender suggest tweaking the basic acoustics of the room to hide information. While this can still be obscured by sufficiently strong compression, it is often more likely to withstand standard algorithms. Echoes are part of recordings and sophisticated listeners with trained ears can detect small changes in them. Good recording engineers and their compressionists try to avoid eliminating the echoes in an effort to provide as much verisimilitude as possible. [GLB96]

Many recording software programs already include the ability to add (or subtract) echoes from a recording. They can also change the character of the echo by twiddling with the strength of the echo and the speed at which it vanishes.

Information can be included by changing either the strength or the length of the decay. Gruhl, Lu and Bender report success with encoding a single bit by changing the length of time before the echo begins. A one gets a short wait (about .001 seconds) and a zero gets a slightly longer wait (about .0013 seconds). More than one bit is encoded by splitting up the signal and encoding one bit in each segment.

The signal is detected by autocorrelation. If the audio signal is represented by f(t), then the bit is extracted by measuring how strongly f(t) correlates with f(t + .001) and with f(t + .0013) over the segment. Segments carrying a one will generally produce a higher correlation at a delay of .001 seconds and segments carrying a zero will produce a higher correlation at a delay of .0013 seconds.
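A sketch of the encoding and the autocorrelation test, under idealized assumptions: the carrier is seeded white noise rather than music, the sampling rate is 44,100 samples per second, and the echo strength of 0.5 is far louder than a real watermark would use. None of these values comes from the published experiments, and the function names are hypothetical.

```python
import random

RATE = 44100                       # samples per second (assumed)
D_ONE = round(0.001 * RATE)        # 44 samples: delay that encodes a one
D_ZERO = round(0.0013 * RATE)      # 57 samples: delay that encodes a zero
SEG = RATE // 16                   # one-sixteenth of a second per bit

def add_echoes(signal, bits, strength=0.5):
    """Add a delayed copy of the signal; the delay in each segment is the bit."""
    out = list(signal)
    for k, bit in enumerate(bits):
        delay = D_ONE if bit else D_ZERO
        for i in range(k * SEG, (k + 1) * SEG):
            if i - delay >= 0:
                out[i] += strength * signal[i - delay]
    return out

def autocorr(samples, start, stop, lag):
    """Sum of products of each sample with the one `lag` samples earlier."""
    return sum(samples[i] * samples[i - lag]
               for i in range(max(start, lag), stop))

def read_echoes(samples, n_bits):
    """Pick, for each segment, the delay with the stronger autocorrelation."""
    bits = []
    for k in range(n_bits):
        start, stop = k * SEG, (k + 1) * SEG
        one = autocorr(samples, start, stop, D_ONE)
        zero = autocorr(samples, start, stop, D_ZERO)
        bits.append(1 if one > zero else 0)
    return bits

rng = random.Random(7)
bits = [1, 0, 0, 1, 1, 0, 1, 0]
dry = [rng.uniform(-1, 1) for _ in range(SEG * len(bits))]
wet = add_echoes(dry, bits)
print(read_echoes(wet, len(bits)))
```

Real recordings have correlations of their own at these delays, so a practical detector needs more care than this comparison of two sums.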

The bandwidth available depends on the sampling rate and, to a lesser extent, on the audio file itself. Higher-frequency sounds and higher sampling rates can provide accurate results with shorter segments, both alone and in combination. Gruhl, Lu and Bender report success with segments lasting one-sixteenth of a second.

The success of this algorithm depends, to a large extent, on the ears listening to it. Some humans are born with good hearing, some train their ears to hear better, and some do both. The music industry continues to experiment with techniques like echo hiding to add a watermark to recordings. The results are often quite good, but still problematic. In many cases, the average human can't detect the additional echo. Many of those who do detect it think the sound is richer. Still, some of the best artists in the business often reject any change to their perfect sound. At times, this debate can ring with irony. Garage bands devoted to making noisy, feedback-rich music sometimes complain about a tiny bit of echo added as a watermark. This process still continues to require an artist's touch.

15.5. Summary

There is no reason to stop with just moving lines of text or adding echoes. Any synthetic file can be tweaked during the construction. The real challenge is creating detection algorithms that will detect and extract the changes from the files. In some cases, the data is readily available. An animated presentation developed in Macromedia's Flash format, for instance, could encode information in the position and timing of the items. This data is easy to extract from the files using the publicly distributed information about the file format.

If the data can't be extracted from the file, many of the techniques developed by the artificial intelligentsia for image and audio analysis can be quite useful. Machine vision algorithms, for instance, can extract the position and orientation of animated characters in a movie. Echo detection and elimination tools used by audio engineers can also help locate echoes carrying hidden information.

  • The Disguise. Any synthetic file, be it text, sound, light, or maybe one day smell, can carry information by tweaking the parameters during synthesis.
  • How Secure Is It? The security depends, to a large extent, on the nature of the files and the skill of the artist. High-quality graphics have many places where a slight twist of the head could carry several bits of information without anyone noticing.
  • How to Use It. Any of the steganographic techniques for tweaking the least significant bits or adding in signals can be used on the raw data used to guide the synthesis. To a large extent, all of the techniques for steganography are just applied at an earlier step in the process. For instance, an animator may choose the position of a character's limbs and then feed this data to a rendering engine. The steganography is done just after the animator chooses the position but before the rendering.

Further Reading

  • Siwei Lyu and Hany Farid analyzed the statistical structure of the wavelet decomposition of images and found subtle but useful differences between the decomposition of natural scenes and the scenes produced by computer graphics. They were able to train a statistical quantifier to distinguish between the two. [Lyu05, LF05]