Chapter 16. Watermarks

16.1. A Patent for Watermarking Humans

Why should watermarks be limited to digital files and pieces of paper? Can any way of leaving a trace act like a watermark? We offer the claims to an unfiled patent application for “implanting a memetic watermark through humorous banter”.

Claims:

  1. A method for implanting and detecting a watermark in a human subject byidentifying a funny sentence selected from a group of witticisms, whimsical retorts, riddles, puns, limericks, jokes, droll patter, parodistic remarks, and satirical levity;offering said funny sentence to said human subject in a manner designed to attract their attention and implant said funny sentence in their brain and create an instance of a memetic watermark;detecting the presence of said memetic watermark by repeating said funny sentence in order to analyze the response of said human subject, who will either laugh or announce that the joke was not new to them.
  2. The method of claim (1) where said funny sentence is repeated and repeated until it is firmly implanted in said human subject's neural patterns.
  3. The method of claim (2) where said funny sentence is repeated again and again in order to increase the reaction time and volume of the response of said human subject in order to increase the detection of said watermark.
  4. The method of claim (3) where said funny sentence is repeated several more times, increasing the sensitivity in said human subject enough to prompt them to throttle the neck of the next person to repeat said funny sentence increasing further the ability to detect said watermark.

16.2. Tagging Digital Documents

One of the most demanding applications for the algorithms that hide information is protecting copyrighted information. The job requires the hidden information to somehow identify the rightful owner of the file in question and, after identifying it, prevent it from being used in unauthorized ways. This is a tall order because the content industry has great dreams for digital books, music, movies and other multimedia presentations. Putting a computer in the loop means that content producers can experiment with as many odd mechanisms for making money as they can imagine. Some suggest giving away the first n − 1 chapters of a murder mystery and charging only for the last one with the identity of the murderers. Others propose giving people a cut when they recommend a movie to a friend and the friend buys a copy. All of these schemes depend on some form of secure copy protection and many of the dreams include hidden information and steganography.

Hiding information to protect text, music, movies, and art is usually called watermarking, a reference to the light image of the manufacturer's logo pressed into the paper when it was made. The term is apt because steganography can hide information about the creat or of a document as well as information spelling out who can use it and when. Ideally, the computer displaying the document will interpret the hidden information correctly and do the right thing by the creators.

Treating the document as the creators demand is not an easy challenge. All of the algorithms in this book can hide arbitrarly complex instructions for what can and can't be done with the document carrying the hidden information. Some copy protection schemes use as few as 70 bits, a number that can fit comfortably in almost any document.

Just inserting the information is not good enough because watermarks face different threats. Most standard steganographic algorithms fight against discovery by blending in as well as possible to avoid detection. Watermarks also try to hide— but usually to stay out of the way, not to avoid being discovered. Most consumers and pirates will know the watermark is there soon after they try to make a copy. The real challenge is keeping the consumer or pirate from making a copy and removing the watermark.

Digital Watermarking by Ingemar J. Cox, Matthew L. Miller, and Jeffrey A. Bloom is a good survey of a quickly growing field. [CMB01]

This too is not easy. The ideal watermark will stick with a document even after editing, cropping, compression, rotation, or any of the basic forms of distortion. Alas, there are no ideal watermarks out there to date, although many offer some form of resistance to basic distortions.

Defending against basic copying is easy. A digital copy of a document will be exact and carry any watermark along with it. But not all copies are exact. Artists often crop or rotate an image. Compression algorithms for sound or image files add subtle distortions by reproducing only the most significant parts of the information stream. Pirates seek to reproduce all of the salient information while leaving the hidden information behind. Defending against all of the possible threats is practically impossible.

This shouldn't come as a surprise. Making a copy of a document means duplicating all of the sensations detectable by a human. If the sky is a clear, bright blue in the document, then it should be a clear, bright blue in the copy as well. If a bell rings in the document, then it should ring with close to the same timbre in the copy. But if some part of a document can't be perceived, then there's no reason to make a copy of that part.

The watermark creator faces a tough conundrum. Very-well-hidden information is imperceptable to humans and thus easy to leave behind during copying. The best techniques for general steganography are often untenable for watermarks. Compression algorithms and nonexact copying solutions will strip the watermarks away.

But information that's readily apparent to human eyes and ears isn't artistically desirable. Distortions to the music and the images can ruin them or scar them, especially in an industry that often pushes the quality of the reproduction in marketing.

If the ideal watermark can't be created, there's no reason why a practical one can't solve some of the problems. Adobe Photoshop, for instance, comes with a tool for embedding a watermark designed by Digimarc. The software can insert a numerical tag into a photo that can then be used to find the rightful owner in a database. The solution that uses some of the wavelet-encoding techniques from Chapter 14 can resist many basic distortions and changes introduced, perhaps ironically, by Photoshop. The technique is not perfect, however, and preliminary tests show that rotating an image by 45 degrees before blurring and sharpening the image will destroy the watermark.

All of the watermarking solutions have some weakness to tricks like that. Measuring the amount of resistance is hard to do. The StirMark suite is a collection of basic distortions that bend, fold and mutilate an image while testing to see if the watermark survives. This collection is a good beginning, but the range of distortion is almost infinite and difficult to model or define.

16.2.1. A Watermarking Taxonomy

At this point, watermark creators are still exploring the limits of the science and trying to define what can and can't be done to resist the real and somewhat imagined threats. Toward this end, they've created a kind of taxonomy of watermarks that describes the different kinds and their usefulness. Here is a list of the different ways to evaluate them:

  • Fragility Some watermarks disappear if one bit of the image is changed. Hiding information in the least significant bit (Chapter 9) is usually not a robust watermark because one flipped bit can make it impossible to recover all of the information. Even error correction and redundancy can add only so much strength. Fragile watermarks, though, are not always useless. Some propose inserting watermarks that break immediately as a technique to detect any kind of tampering. If the watermark includes some digital signature of the document, then it offers assurance that the file is unaltered.
  • Continuity Some watermarks resist a wide range of distortions by disappearing gradually as the changes grow larger. Larger and larger distortions produce weaker and weaker indications that a watermark is present. This continuity is often found in some of the wavelet-encoding solutions described in Chapter 14. The watermark itself is a vector of coefficients describing the image. Small changes in the image produce small changes of the coefficients. A vector matching algorithm finds the watermark by finding the best match. In many cases, the strength of the watermark is a trade off with the amount of information in the watermark itself. A large number of distinct watermarks requires a small distance between different watermarks. A small distance means that only a small distortion could convert one watermark into another.
  • Watermark Size How many “bits” of information are available? Some watermarks simply hide bits of information. Counting the number of bits stored in the document is easy. Other watermarking schemes don't hide bits per se. They add distortions in such a way that the shape and location of the distortions indicate who owns the document. Hiding lots of information means having many different and distinct patterns of distortions. In some cases, packing many different patterns is not easy because the size, shape and interaction with the cover document are not easy to model or describe.
  • Blind Detection Some watermarks require providing some extra data to the detector. This might be the original unwatermarked image or sound file, or it could be a key. The best solutions offer blind detection, which provides as little information as possible to the algorithm that looks for a watermark. The ideal detector will examine the document, check for a watermark and then enforce the restrictions carried by the watermark. Blind detection is a requirement for many schemes for content protection. Providing a clean, unwatermarked copy to the computers of the users defeats the purpose. But this doesn't mean that nonblind schemes are worthless. Some imagine situations where the watermark is only extracted after the fact, perhaps as evidence. One solution is to embedded the Id number of the rightful owner of a document in a watermark. If the document later appears in open circulation, perhaps on the Internet, the owners could use a nonblind scheme to extract the watermark and track down the source of the file. They could still hold the original clean copy without releasing it.
  • Resistance to Multiple Watermarks Storing more hidden information is one of the easiest attacks to launch against a document with hidden information. Using the same algorithm often guarantees that the same hidden spots will be altered to carry the new message. An ideal watermark will carry multiple messages from multiple parties, who can insert their data and retrieve it without any coordination. Some of these least significant bit schemes from Chapter 9 offer this kind of resistance by using a key to choose where to hide the data. Many can carry multiple messages without any problem, especially if error correction handles occasional collisions. Unfortunately, these ideal solutions are often fragile and thus undesirable for other reasons. This is another trade off. Localizing the information in the watermark reduces the chance that another random watermark will alter or destroy it, but it also increases the chance that a small change will ruin it.
  • Accuracy Many watermarking schemes achieve some robustness to distortion by sacrificing accuracy. Many rely on finding the best possible match and thus risk finding the wrong match if the distortion is large enough. These algorithms sacrifice accuracy in a strange way. Small changes still produce the right answer, but large enough changes can produce a dramatically wrong answer.
  • Fidelity One of the hardest effects of watermarks to measure is the amount of distortion introduced by the watermarking process itself. Invisible or inaudible distortions may be desirable, but they're usually easy to defeat by compression algorithms that strip away all of the unnecessary data. The best schemes introduce distortions that are small enough to be missed by most casual observers. These often succeed by changing the relative strength or position of important details. One classic solution is to alter the acoustics of the recording room by subtly changing the echos. The ear usually doesn't care if the echoes indicate a 8 × 8 room or a 20 × 20 room. At some point, this approach fails and the art is finding the right way to modulate the accoustics without disturbing the greatest number of listeners. Many of the wavelet-encoding techniques from Chapter 14 succeed by changing the relative strength of the largest coefficients assembled to describe the image. The smallest coefficients are easy to ignore or strip away, but the largest can't be removed without distorting the image beyond recognition. The solution is to change the relative strengths until they conform to some pattern.
  • Resistance to Framing One potential use for watermarks is to identify the rightful owner of each distinct copy. Someone who might want to leak or pirate the document will try to remove that name. One of the easiest techniques is to buy multiple copies and then average them together. If one pixel comes from one version and another pixel comes from another, then there's a good chance that neither watermark will survive. Some schemes deliberately try to avoid this kind of attack by embedding multiple copies of the signature and creating identification codes that can survive this averaging.
  • Keying Is a key required to read the watermark? Is a key required to insert it? Some algorithms use keys to control who inserts the data to prevent unauthorized people from faking documents or creating faked watermarks. Others use keys to ensure that only the right people can extract the watermark and glean the information. Chapter 12 describes some of the approaches.

No algorithm offers the ideal combination of these features, in part because there's often no way to have one feature without sacrificing the other. The good news is that often watermarks that fail one task can find use in another form.

16.3. A Basic Watermark

Here is a basic technique for watermarking that blends together many of the different solutions proposed in recent years. This description is a bit abstract, which obscures the challenges of actually producing a working version that implements the technique. Here's the algorithm:

  1. Begin with a document in a standard form.
  2. Choose a mechanism for decomposing the document into important components. One solution is to use the discrete cosine transform to model the signal as the sum of a collection of cosine functions multiplied by some coefficients. Let {c0,…,cn} be the set of coefficients.
  3. The coefficients measure the size of the different components. Ideally, the model will guarantee that large coefficients have a large effect on the document and small coefficients have a small effect. If this is the case, find a way to exclude the small coefficients. They're not important and likely to be changed dramatically by small changes in the document itself.
  4. Quantize the coefficents by finding the closest replacement from a small set of values. One of the simplest quantization schemes for a value, ci, is to find the integer ki such that kiQ is closest to x. The value of Q is often called the quanta. Exponential or logarithmic schemes may be appropriate for some cases.
  5. Let {b0,…,bn} be a watermark. Insert a watermark by tweaking each coefficient. Each value of ci lies between an odd and an even integer multiple of Q. That is, kiQci ≤ (ki + 1)Q. To encode bi = 0 at coefficient ci, set ci to the even multiple of Q. To encode bi = 1, set ci to be the odd multiple of Q.
  6. Use a reverse transform to reconstruct the original document from the new values of {c0,…,cn}. If the understanding of the decomposition process is correct, the changes will not alter the image dramatically. Of course, some experimentation with the value of Q may be necessary.

This scheme for encoding a watermark can be used with many models for deconstructing images and sound files. The greatest challenge is setting the value of Q correctly. A large Q adds robustness at the cost of introducing greater distortion. The cost of a large Q should be apparent by this point. The value can be seen by examining the algorithm for extracting the watermark:

  1. To extract a watermark, begin by applying the same deconstructive technique that models the document as a series of coefficients: {c'0,…,c'n}.
  2. Find the integer ki such that |kiQc'i| is minimized. If there's been no distortion in the image, then c'i = kiQ. If our model of the document is good, small changes in the document should correspond to small changes in the coefficients. Small changes should still result in the same values of ki.
  3. If ki is odd, then bi = 1. If ki is even, then bi = 0. This is the watermark.

Many watermarking schemes add an additional layer of protection by including some error correction bits to the watermark bits, {b0,…,bn}. (See Chapter 3.) Another solution is to compare the watermarking bits to a known set of watermarks and find the best match. To some extent, these are the same techniques. Choosing the size of Q and the amount of error correction lets you determine the amount of robustness available.

This solution is a very good approach that relies heavily on the decomposition algorithm. The discrete cosine transform is a good solution, but it has weaknesses. Even slight rotations can introduce big changes in the coefficients produced by the transform. Some researchers combat this with a polar transform that produces the same coefficients in all orientations of the document. This solution, though, often breaks if the document is cropped thus changing the center. Every model has strengths and weaknesses.

16.3.1. Choosing the Coefficients

Another challenge is choosing the coefficients to change. Some suggest changing only the largest and most salient. In one of the first papers to propose a watermark scheme like this, Ingemar Cox, Joe Kilian, Tom Leighton, and Talal Shamoon suggested choosing the largest coefficients from a discrete cosine transform of the image. The size guaranteed that these coefficients contributed more to the final image than the small ones. Concentrating the message in this part of the image made it more likely that the message would survive compression or change. [CKLS96]

Others suggest concentrating in a particular range for perceptual reasons. Choosing the right range of discrete cosine coefficients can introduce some resistance to cropping. The function cos(2πx), for instance, repeats every unit while

repeats every 1000 units. A watermark that uses smaller, shorter waves is more likely to resist cropping than one that relies on larger ones. These shorter waves also introduce smaller, more localized distortions during the creation of the watermark.

16.4. An Averaging Watermark

Cropping is one of the problems confronting image watermark creators. Artists frequently borrow photographs and crop them as needed. A watermark designed to corral these artists must withstand cropping.

An easy solution is to repeat the watermark a number of times throughout the image. The first solution in this chapter accomplishes this to some extent by using decomposition techniques like the discrete cosine transform. Many of the same coefficents usually emerge even after cropping.

This example is a more ordinary approach to repeating the watermark. It is not elegant or mathematically sophisticated, but the simplicity has some advantages.

A watermark consists of an m × n block of small integers. For the sake of simplicity, let's assume the block is 4 × 4 and constructed of values from the set {−2, −1,0,1,2}. There are 516 = 152587890625 possible watermarks in this example, although it will not always be practical to tell the difference between them all. Let these values be represented by wi,j|, where 0 ≤ i < m and 0 ≤ j < n. In this case, m = n = 4.

The watermark is inserted by breaking the image into 4 × 4 blocks and adding the values into the pixels. Pixel pi,j is replaced with pi,j + wi mod m,j mod n. If the value is too large or too small, it is replaced with either the maximum value or zero, usually 255 and 0.

How is the watermark recovered? By averaging pixels. Let w'a,b be the average intensity of all pixels, pi,j, such that i mod m = a and j mod n = b. In the 4 × 4 example, w'0,1 is the average of pixels like p0,1, p4,1, p8,1, p0,5, p4,5, p8,9, etc.

The success of this step assumes that the patterns of the image do not fall into the same m × n rhythm as the watermark. That is, the average value of all of the pixels will be the same. A light picture may have a high average value while a dark picture may have a low average value. Ideally, these numbers balance out. If this average value is S, then the goal is to find the best watermark that matches w’ – S.

This is easy if the image has not been cropped or changed. w’ – S should be close to, if not the same as, the inserted watermark. The only inaccuracy occurs when the watermark value is added to a pixel and the pixel value overflows.

Recovering the watermark is still simple if the image is cropped in the right way. If the new boundaries are an integer multiple of m and n pixels away from the original boundaries then the values of w’ and w will still line up exactly.

This magical event is not likely and any of the mn possible orientations could occur. One solution is to compare the values recovered from the image to a database of known watermarks. This takes kmn steps, where k is the number of known watermarks. This may not be a problem if the system uses only a small number of watermarks, but it could become unwieldy if k grows large.

Another solution is to create a canonical order for the watermark matrix. Let p and q be the canonical offsets. The goal is to find one pair of values for p and q so that we always find the same order for the watermark, no matter what the scheme. This is a bit cumbersome and there are other solutions.

  1. For this example, let zi,j = 54i+j. Five is the number of possible values of the watermark.
  2. Let F(w,p,q) = ∑zi,j (2 + w(i+p) mod 4(j+q) mod 4).
  3. Try all possible values of F(w,p,q) and choose the maximum (or minimum). This is the canonical order of w.

If mn is a reasonable value, then it might make more sense to store mn versions of each watermark in a database. If it is large, then a canonical order might make more sense.

It should become clear at this point that all possible 516 watermarks can't be used. Some of them have identical canonical values. Every watermark has 15 other shifty cousins.

16.4.1. Effects of Distortion

Much of the success of this watermark depends on the way that the averaging balances out any potential changes. If the noise or distortion in the image is uniformly distributed, then the changes should balance out. The averages will cancel out the changes.

Not all distortions are equal. Several of the StirMark changes introduce or delete rows or columns of pixels from the middle of the image. This can throw off the averaging completely and destroy this kind of watermark. While it may be possible to recover it by sampling sections of the image, the process is neither easy nor guaranteed.

16.4.2. Birthday Marks

Here is a short, more basic example loosely based on a number of systems. The phrase birthday marks is borrowed from the birthday paradox, the fact that the odds of any two people having the same birthday in a group of n people increases quadratically, O(n2), as the group grows. In a group of 23 people, there's a 50–50 chance that one pair will share the same birthday and it becomes almost a sure thing, (> 99%) when there are 57 people. Clearly at 367 people, it's guaranteed.

Imagine that you can put mi,j marks in the content by tweaking positions in the file. While the notation suggests a grid because there are two parameters, the marks don't need to be arranged in a grid. Indeed, there shouldn't be any detectable connection between the parameters, i and j, and the locations of the marks. If a file has k locations, you might store the marks at a position computed by encrypting i and j.

The birthday paradox was also an inspiration for the birthday attacks using hash collisions. [Cop85, GCC88]

Although there won't be any connection between the locations and the parameters i and j, let's write these marks in a matrix for clarity. For simplicity, let's imagine there are 16 users so we'll assign 16 unique ids. When user a buys a document, let's put a mark on spots ma,i and mi,a for all 0 ≤ i < 16. For the sake of example, assume a = 3 spots with marks are drawn as ones in this matrix and the unmarked spots are left as zeros:

0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

Now assume that there's another user out there with an Id number of 7. The matrix of marks made in 7's copy of the content will look like this:

0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

Let's assume that 3 and 7 decide to collude to strip out their marks. They don't understand the watermarking algorithm. They don't know how the data is inserted into the files. They just have two versions of the content and they can compare them with each other. If they do this, they'll find a number of places where there are differences: every mi,j, where i or j is either 3 or 7 except for m3,3, m3,7, m7,3 and m7,7.

If 3 and 7 are able to blur away the marks at every place where there's a difference, they'll still leave four marks that identify the pair. The content owners will be able to track the document to these two people.

If 3 and 7 decide to use an attack and flip a coin at every difference they find in their document and keep one or the other version, they'll still leave plenty of marks with their unique identity.

This naive solution does not have the best performance as the number of unique Ids grows larger because it requires O(n2) marks for n unique Ids. This can be avoided to some extent by repeating the process a number of times. Imagine, for instance, that a unique Id number consists of six hexadecimal digits. The previous matrix with 162 = 256 marks could be repeated six times, once to encode each digit. Instead of requiring 2242 = 248 marks, it would only require 6 × 256, a much smaller number.

16.5. Summary

Watermarking is not an easy challenge for steganography. Many of the researchers exploring it have high ideals and want their watermarks to remain intact even after a fairly daunting array of distortions and changes.

This chapter offers two basic algorithms that resist some basic distortions. The discussion avoids much of the complexity involved in determining when a watermark may or may not be present. Finding it is easy if the image is not changed. Finding it afterwards becomes an exercise in informed guessing. Just when is one vector of numbers close enough to another? Which is the best match for a watermark? These are questions of engineering and design not answered in this chapter. Finding the best solution requires building a system and testing it with a collection of sample images and sound files. Much of the real challenge is tweaking the coefficents.

  • The Disguise Watermarks are steganographic solutions designed to withstand more general attacks from a savvy set of users and potential pirates. The ideal watermark will embed information in such a way that the only way to destroy it is to introduce so much noise and distortion that the original image is unusable and perhaps unrecognizable.
  • How Secure Is It? None of the watermarks come close to this ideal, but some are quite useful. Systems like Digimarc's are in wide use thanks to the company's partnership with Adobe. They can be circumvented but they can withstand many casual distortions. Much of their success depends on the individual algorithm and the sample files.
  • How to Use It. Find an image, introduce some changes, and then hope to track down the person violating your copyright. In practice, the legal and logistical headaches may be greater than the problems of making the watermark work. If you get really frustrated, just start suing people as some companies have done.

Further Reading

While any of the algorithms in this book can be used to embed a watermark, most of the common algorithms use mechanisms based on either the discrete cosine transform or the discrete wavelet decomposition. Chapter 14 digs deeply into this area. Some other suggestions are:

  • Digital Watermarking by Ingemar Cox, Matthew L. Miller and Jeffrey A. Bloom is a great introduction to using wavelet techniques to embed watermarks and hide information. The 2007 edition is broader and it includes a good treatment of steganography and steganalysis. [CMB02, CMB07]
  • The proceedings of the International Workshop on Digital Watermarking are invaluable sources that track the development of steganography using wavelets and other functional decomposition. [PK03, KCR04, CKL05, SJ06]
  • A good conference focused on watermarking digital content is the Security, Steganography, and Watermarking of Multimedia Contents. [DW04, DW05]
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.74.87