Chapter 9

Calibration Techniques

One of the key challenges in steganalysis is that most features vary greatly within each class, sometimes more than they do between classes. What if we could calibrate the features by estimating what the feature value would have been for the cover image?

Several such calibration techniques have been proposed in the literature. We will discuss two of the most well-known ones, namely the JPEG calibration of Fridrich et al. (2002) and calibration by downsampling as introduced by Ker (2005b). Both of these techniques aim to estimate the features of the cover image. In Section 9.4, we will discuss a generalisation of calibration, looking beyond cover estimates.

9.1 Calibrated Features

We will start by considering calibration techniques aiming to estimate the features of the cover image, and introduce key terminology and notation.

We view a feature vector as a function $F : \mathcal{I} \to \mathbb{R}^N$, where $\mathcal{I}$ is the image space (e.g. $\mathcal{I} = \{0, 1, \ldots, 255\}^{m \times n}$ for 8-bit grey-scale images). A reference transform is any function $\sigma : \mathcal{I} \to \mathcal{I}$. Given an image $I \in \mathcal{I}$, the transformed image $\sigma(I)$ is called the reference image.

If, for any cover image $c$ and any corresponding steganogram $s$, we have

$$F(\sigma(c)) \approx F(c) \quad\text{and}\quad F(\sigma(s)) \approx F(c), \qquad (9.1)$$

we say that $\sigma$ is a cover estimate with respect to $F$. This clearly leads to a discriminant if additionally $F(s) \not\approx F(c)$. We could then simply compare $F(I)$ and $F(\sigma(I))$ for an intercepted image $I$. If $F(I) \approx F(\sigma(I))$, we can assume that the image is clean; otherwise it must be a steganogram.

The next question is how best to quantify the relationship between $F(I)$ and $F(\sigma(I))$ to get some sort of calibrated features. Often it is useful to take a scalar feature $f$ and use it to construct a scalar calibrated feature as a function of $f(I)$ and $f(\sigma(I))$. In this case, two obvious choices are the difference and the ratio:

$$f^{\mathrm{d}}(I) = f(I) - f(\sigma(I)), \qquad f^{\mathrm{r}}(I) = \frac{f(I)}{f(\sigma(I))},$$

where $f(I)$ is a scalar. Clearly, we would expect $f^{\mathrm{d}}(c) \approx 0$ and $f^{\mathrm{r}}(c) \approx 1$ for a natural image $c$, and some other value for a steganogram. Depending on the feature $f$, we may or may not know whether the calibrated feature of a steganogram is likely to be smaller or greater than that of a natural image.

The definition of $f^{\mathrm{d}}$ clearly extends to a vector function $F$, whereas the ratio has to be taken element-wise for a vector function, i.e.

$$F^{\mathrm{r}}(I) = \left( \frac{F_1(I)}{F_1(\sigma(I))},\ \frac{F_2(I)}{F_2(\sigma(I))},\ \ldots,\ \frac{F_N(I)}{F_N(\sigma(I))} \right).$$

We will refer to $F^{\mathrm{d}}$ as a difference calibrated feature and to $F^{\mathrm{r}}$ as a ratio calibrated feature.
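As a minimal sketch of how these definitions translate into code (assuming NumPy, with the feature extractor `F` and the reference transform `sigma` supplied by the caller; the `eps` guard against division by zero is our addition, not part of the definition):

```python
import numpy as np

def difference_calibrated(F, sigma, image):
    # F^d(I) = F(I) - F(sigma(I))
    return np.asarray(F(image)) - np.asarray(F(sigma(image)))

def ratio_calibrated(F, sigma, image, eps=1e-12):
    # F^r(I): element-wise ratio of the two feature vectors
    ref = np.asarray(F(sigma(image)))
    return np.asarray(F(image)) / (ref + eps)
```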

Figure 9.1 illustrates the relationship we are striving for in a 2-D feature space. The original feature vectors, $F(c)$ and $F(s)$, indicated by dashed arrows, are relatively similar, both in angle and in magnitude. Difference calibration gives us feature vectors $F^{\mathrm{d}}(c)$ and $F^{\mathrm{d}}(s)$, indicated by solid arrows, which lie much further apart. There is a lot of slack in the approximation $F(\sigma(I)) \approx F(c)$ for the purpose of illustration.

Figure 9.1 Suspicious image and (difference) calibrated image in feature space


Calibration can in principle be applied to any feature $F$. However, a reference transform may be a cover estimate for one feature vector $F_1$, and not for another feature vector $F_2$. Hence, a calibration technique cannot be blindly extended to new features; it has to be evaluated for each feature vector considered.

The ratio and the difference have similar properties, but the different scales may very well make a serious difference to the classifier. We are not aware of any systematic comparison of the two in the literature, whether in general or in special cases. Fridrich (2005) used the difference of feature vectors, and Ker (2005b) used the ratio of scalar discriminants.

The calibrated features $F^{\mathrm{d}}$ and $F^{\mathrm{r}}$ are clearly good discriminants if the approximation in (9.1) is good and the underlying feature $F$ has any discriminating power whatsoever. Therefore, it was quite surprising when Kodovský and Fridrich (2009) re-evaluated the features of Pevný and Fridrich (2007) and showed that calibration actually leads to inferior classification. The most plausible explanation is that the reference transform is not a good cover estimate for all images and for all the features considered.

The re-evaluation by Kodovský and Fridrich (2009) led to the third approach to feature calibration, namely the Cartesian calibrated feature. Instead of taking a scalar function of the original and calibrated features, we simply take both, as the Cartesian product $(f(I), f(\sigma(I)))$ for each scalar feature $f$. The beauty of this approach is that no information is discarded, and the learning classifier can use all the information contained in $F(I)$ and $F(\sigma(I))$ for training. Calibration then becomes an implicit part of learning. We will return to this in more detail in Section 9.2.2.
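Continuing the sketch above, Cartesian calibration is a plain concatenation; the comparison of the two halves is left to the learning classifier:

```python
import numpy as np

def cartesian_calibrated(F, sigma, image):
    # (F(I), F(sigma(I))): keep both halves and let the classifier
    # learn the relationship between them
    return np.concatenate([np.asarray(F(image)),
                           np.asarray(F(sigma(image)))])
```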

9.2 JPEG Calibration

The earliest and most well-known form of calibration is JPEG calibration. The idea is to shift the $8 \times 8$ block grid by half a block. The embedding distortion, which follows the original $8 \times 8$ grid, is assumed to affect only features calculated along the original grid. This leads to the following algorithm.


Algorithm 9.2.1
Given a JPEG image $I$, we can obtain the calibrated image $\sigma(I)$ by the following steps:
1. Decompress the image $I$ to get a pixmap $P$.
2. Crop four pixels off each of the four sides of $P$.
3. Recompress $P$ using the quantisation matrices from $I$, to get the calibrated image $\sigma(I)$.
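A hedged sketch of Algorithm 9.2.1 using the Pillow library (the function name and file handling are ours; Pillow exposes the quantisation tables of a JPEG file through the `quantization` attribute and accepts them again via the `qtables` save parameter):

```python
from PIL import Image

def jpeg_calibrate(in_path, out_path):
    img = Image.open(in_path)            # step 1: decompress the JPEG
    qtables = img.quantization           # remember the original quantisation tables
    w, h = img.size
    cropped = img.crop((4, 4, w - 4, h - 4))         # step 2: crop 4 pixels off each side
    cropped.save(out_path, "JPEG", qtables=qtables)  # step 3: recompress with the same tables
    return Image.open(out_path)          # the calibrated image sigma(I)
```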

JPEG calibration was introduced for the blockiness attack, using the 1-norm blockiness as the only feature. The technique was designed to be a cover estimate, and the original experiments confirmed that it is indeed a cover estimate with respect to blockiness. The reason why it works is that blockiness is designed to detect the discontinuities at the block boundaries caused by independent noise in each block. By shifting the grid, we get a compression domain which is independent of the embedding domain, where the discontinuities at the boundaries are expected to be weaker. This logic does not necessarily apply to other features.

9.2.1 The FRI-23 Feature Set

The idea of using JPEG calibration with learning classifiers was introduced by Fridrich (2005), who created a 23-dimensional feature vector which we will call FRI-23. Most of the 23 features are created by taking the 1-norm $\|\cdot\|_1$ of an underlying, multi-dimensional, difference-calibrated feature vector, as follows.


Definition 9.2.2 (Fridrich calibrated feature)
Let $F$ be any feature extraction function operating on JPEG images. The Fridrich calibrated feature derived from $F$ is defined as

$$f(I) = \bigl\| F(I) - F(\sigma(I)) \bigr\|_1,$$

where $\sigma$ is the JPEG calibration transform of Algorithm 9.2.1.

Note that this definition applies to a scalar feature as well, where the 1-norm reduces to the absolute value. The features underlying FRI-23 are the same as for NCPEV-219, which we discussed in Chapter 8. Fridrich used 17 multi-dimensional feature vectors and three scalar features, giving rise to 20 features using Definition 9.2.2. She used a further three features not based on this definition. The 23 features are summarised in Table 9.1. The variation and blockiness features are simply the difference calibrated features obtained from the variation $V$ and the blockiness features $B_1$ and $B_2$ of NCPEV-219.
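A minimal sketch of Definition 9.2.2 (again assuming NumPy, with `F` and the JPEG calibration transform `sigma` supplied by the caller):

```python
import numpy as np

def fridrich_calibrated(F, sigma, image):
    # the 1-norm of the difference-calibrated feature vector
    diff = np.asarray(F(image)) - np.asarray(F(sigma(image)))
    return np.linalg.norm(diff, ord=1)
```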

Table 9.1 Overview of the calibrated features used by Fridrich (2005) and Pevný and Fridrich (2007)


For the histogram features, Definition 9.2.2 allows us to include the complete histogram; taking just the norm keeps the total number of features down. The feature vectors used are:

  • the global histogram $H$;
  • the dual histograms $g^{(d)}$ for $d \in \{-5, \ldots, 5\}$; and
  • the per-frequency AC histograms $h^{(i,j)}$ for five low-frequency AC modes $(i,j)$.

Each of these feature vectors gives one calibrated feature. Note that even though the same underlying features occur both in the local (per-frequency) histograms and in the dual histograms, the resulting Fridrich calibrated features are distinct.

The last three features are extracted from the co-occurrence matrix. These are special in that we use signed differences, instead of the unsigned differences underlying the 1-norm in Definition 9.2.2.


Definition 9.2.3 (Co-occurrence features)
The co-occurrence features $N_{0,0}$, $N_{0,1}$ and $N_{1,1}$ are defined as follows:

$$N_{0,0} = C_{0,0}(I) - C_{0,0}(\sigma(I)),$$
$$N_{0,1} = \sum_{(s,t) \in \{(0,1),(1,0),(0,-1),(-1,0)\}} \bigl( C_{s,t}(I) - C_{s,t}(\sigma(I)) \bigr),$$
$$N_{1,1} = \sum_{(s,t) \in \{(1,1),(1,-1),(-1,1),(-1,-1)\}} \bigl( C_{s,t}(I) - C_{s,t}(\sigma(I)) \bigr),$$

where $C_{s,t}(I)$ denotes entry $(s,t)$ of the co-occurrence matrix of $I$.


According to Fridrich, the co-occurrence matrix tends to be symmetric about $(0,0)$, giving a strong positive correlation between $C_{s,t}$ and $C_{-s,-t}$. Thus, the elements which are added together for $N_{0,1}$ and for $N_{1,1}$ above will tend to reinforce rather than cancel each other, making this a good way to reduce dimensionality.

9.2.2 The Pevný Features and Cartesian Calibration

FRI-23 was an important pioneer in promoting calibration techniques, but it seems to have too few features to be effective. Pevný and Fridrich (2007) therefore used the difference-calibrated feature vector $F^{\mathrm{d}}$ directly, instead of the Fridrich calibrated features, where $F$ is NCPEV-219; the resulting feature vector is PEV-219.

The difference-calibrated features intuitively sound like a very good idea. In practice, however, they are not always as effective as one would expect. Kodovský and Fridrich (2009) compared PEV-219 and NCPEV-219. Only for JP Hide and Seek did PEV-219 outperform NCPEV-219. For YASS, the calibrated features performed significantly worse, and for the other four algorithms tested (nsF5, JSteg, Steghide and MME3) there was no significant difference.

The failure of the difference-calibrated features led Kodovský and Fridrich to propose a Cartesian-calibrated feature vector, CCPEV-438, formed as the Cartesian product of NCPEV-219 calculated from the image $I$ itself and NCPEV-219 calculated from the reference image $\sigma(I)$.

Table 9.2 shows some experimental results with different forms of calibration. The first test is based on Pevný's features, and it is not a very strong case for calibration of any kind. We compare the uncalibrated features (NCPEV-219), the difference-calibrated features (PEV-219) and the Cartesian-calibrated features (CCPEV-438). We have also shown the features calculated only from the reference image $\sigma(I)$, as CPEV-219. We immediately see that the original uncalibrated features are approximately even with Cartesian calibration and better than difference calibration.

Table 9.2 Comparison of accuracies of feature vectors for JPEG steganography


A better case for calibration is found by considering other features, like the conditional probability features CP-27. Calibrated versions improve the accuracy significantly in each of the cases tested. Another case for calibration was offered by Zhang and Zhang (2010), where the accuracy of the 243-D Markov feature vector was improved using Cartesian calibration.

When JPEG calibration does not improve the accuracy for NCPEV-219, the most plausible explanation is that it is not a good cover estimate. This is confirmed by the ability of CPEV-219 to discriminate between clean images and steganograms for long messages, which would have been impossible if $F(\sigma(s)) \approx F(\sigma(c))$. Thus we have confirmed what we hinted at earlier: although JPEG calibration was designed as a cover estimate with respect to blockiness, there is no reason to assume that it will be a cover estimate with respect to other features.

JPEG calibration can also be used as a cover estimate with respect to the histogram (Fridrich et al., 2003b). It is not always very good, possibly because calibration itself introduces new artifacts, but it can be improved. A blurring filter applied to the decompressed image will even out the high-frequency noise caused by the original sub-blocking. Fridrich et al. (2003b) recommended applying the following blurring filter before recompression (between Steps 2 and 3 of Algorithm 9.2.1):

images/c09_I0092.gif

The resulting calibrated image had an AC histogram closely matching that of the original cover image, as desired.


Remark 9.1
The run-length features (Section 6.1.4) can also be seen as an application of Cartesian calibration. The motivation when we combined features from the original image and a quantised image is exactly the same as in the introduction of Cartesian calibration: the quantised image provides a baseline which is less affected by the embedding than the original image is.

9.3 Calibration by Downsampling

Downsampling is the act of reducing the resolution of a digital signal. In the simplest form, a group of adjacent pixels is averaged to form one pixel in the downsampled image. Obviously, high-frequency information is lost, while low-frequency information is preserved. Hence, one may assume that the downsampled version of a steganogram $s$ will be almost equal to the downsampled version of the corresponding cover image $c$, as the high-frequency noise caused by the embedding is lost. Potentially, this gives us a calibration technique, and it has been explored by a number of authors.

Ker (2005b) pioneered calibration based on downsampling. The initial work was based on the HCF-COM feature of Harmsen (2003) (see Section 6.1.1). The problem with HCF-COM is that its variation, even within one class, is enormous, and even though it is statistically greater for natural images than for steganograms, the difference may not be significant. In this situation, calibration may provide a baseline for comparison and thereby improve the discrimination.

Most of the work on downsampling has aimed to identify a single discriminating feature which by itself is able to discriminate between steganograms and clean images. This eliminates the need for a classification algorithm; only a threshold $\tau$ needs to be chosen. If $f$ is the discriminating feature, we predict one class label for $f(I) \le \tau$ and the alternative class for $f(I) > \tau$. Therefore we will not discuss feature vectors in this section. However, there is no reason why one could not combine a number of the proposed statistics, or even intermediate quantities, into feature vectors for machine learning. We have not seen experiments on such feature vectors, and we leave them as an exercise for the reader.

9.3.1 Downsampling as Calibration

Ker (2005b) suggests downsampling by a factor of two in each dimension. Let $\sigma(I)$ denote the downsampled version of an image $I$. Each pixel of $\sigma(I)$ is simply the average of four ($2 \times 2$) pixels of $I$, as shown in Figure 9.2. Mathematically, we write

$$\sigma(I)_{i,j} = \left\lfloor \frac{I_{2i-1,2j-1} + I_{2i-1,2j} + I_{2i,2j-1} + I_{2i,2j}}{4} \right\rfloor.$$

Except for the rounding, this is equivalent to the low-pass component of a (2-D) Haar wavelet decomposition.
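A minimal NumPy sketch of this downsampling (assuming a 2-D array with even dimensions; integer floor division implements the rounding):

```python
import numpy as np

def downsample(img):
    img = np.asarray(img, dtype=np.int64)
    s = (img[0::2, 0::2] + img[0::2, 1::2] +
         img[1::2, 0::2] + img[1::2, 1::2])
    return s // 4  # floor of the average of each 2x2 block
```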

Figure 9.2 Downsampling à la Ker


Downsampling as calibration is based on the assumption that the embedding distortion $e_{i,j} = s_{i,j} - c_{i,j}$ is a random variable, identically and independently distributed for each pixel $(i,j)$. Taking the average of four pixels, we reduce the variance of the distortion. To see this, compare the downsampled pixels of a clean image $c$ and a steganogram $s = c + e$:

$$\sigma(c)_{i,j} = \left\lfloor \frac{c_{2i-1,2j-1} + c_{2i-1,2j} + c_{2i,2j-1} + c_{2i,2j}}{4} \right\rfloor, \qquad (9.2)$$

$$\sigma(s)_{i,j} = \left\lfloor \frac{(c_{2i-1,2j-1} + e_{2i-1,2j-1}) + \cdots + (c_{2i,2j} + e_{2i,2j})}{4} \right\rfloor. \qquad (9.3)$$

If the $e_{i,j}$ are identically and independently distributed, the variance of the average $\frac{1}{4}\sum e_{i,j}$ is a quarter of the variance of a single $e_{i,j}$. If $e_{i,j}$ has zero mean, this translates directly into the distortion power on $\sigma(s)$ being a quarter of the distortion power on $s$.

Intuitively, one would thus expect downsampling to work as a cover estimate with respect to most features $F$. In particular, we would expect that

$$F(\sigma(c)) \approx F(c), \qquad (9.4)$$

$$F(\sigma(s)) \approx F(c). \qquad (9.5)$$

If this holds, a natural image can be recognised by $F(I) \approx F(\sigma(I))$, whereas $F(s) \not\approx F(\sigma(s))$ for steganograms.

We shall see later that this intuition is correct under certain conditions, whereas the rounding (floor function) in (9.3) causes problems in other cases. In order to explore this, we need concrete examples of features using calibration. We start with the HCF-COM.

9.3.2 Calibrated HCF-COM

The initial application of downsampling for calibration (Ker, 2005b) aimed to adapt HCF-COM (Definition 4.3.5) to be effective for grey-scale images. We recall that Harmsen's original application of HCF-COM depended on the correlation between colour channels, and that first-order HCF-COM features are ineffective on grey-scale images. Downsampling provides an alternative to second-order HCF-COM on grey-scale images. The ratio-calibrated HCF-COM feature used by Ker is defined as

$$C^{\mathrm{r}}(I) = \frac{C(I)}{C(\sigma(I))},$$

where $C(X)$ is the HCF-COM of image $X$. Ker's experiments showed that $C^{\mathrm{r}}$ had slightly better accuracy than $C$ against LSB± embedding at 50% of capacity, and significantly better accuracy at 100% of capacity. In Figure 9.3 we show how the HCF-COM features vary with the embedding rate for a number of images. Interestingly, we see that for some, but not all, images, $C^{\mathrm{r}}$ shows a distinct fall around 50–60% embedding.
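The following sketch shows one plausible reading of the calibrated feature (the histogram range, the exclusion of the DC term and the use of the non-redundant half of the DFT are our choices; Definition 4.3.5 is not reproduced here):

```python
import numpy as np

def hcf_com(img, n_bins=256):
    # centre of mass of the histogram characteristic function
    h, _ = np.histogram(img, bins=n_bins, range=(0, n_bins))
    mag = np.abs(np.fft.fft(h))[1 : n_bins // 2 + 1]  # non-redundant half, no DC
    k = np.arange(1, n_bins // 2 + 1)
    return (k * mag).sum() / mag.sum()

def ratio_calibrated_hcf_com(img):
    # C^r(I) = C(I) / C(sigma(I)), with downsample() from the Section 9.3.1 sketch
    return hcf_com(img) / hcf_com(downsample(img))
```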

Figure 9.3 HCF-COM: (a) calibrated for various images; (b) calibrated versus non-calibrated


In Figure 9.3(b), we compare $C(I)$, $C(\sigma(I))$ and $C^{\mathrm{r}}(I)$ for one of the images where $C^{\mathrm{r}}$ shows a clear fall. We observe that neither $C(I)$ nor $C(\sigma(I))$ shows a similar dependency on the embedding rate, confirming the usefulness of calibration. However, we can also see that downsampling does not at all operate as a cover estimate, as $C(\sigma(I))$ can change more with the embedding rate than $C(I)$ does.

Ker (2005b) also considered the adjacency HCF-COM, as discussed in Section 4.3. The calibrated adjacency HCF-COM is defined as the scalar feature

$$A^{\mathrm{r}}(I) = \frac{A(I)}{A(\sigma(I))},$$

where $A(X)$ denotes the adjacency HCF-COM of $X$.

Our experiments with uncompressed images in Figure 9.4 show a more consistent trend than we had with the regular HCF-COM.

Figure 9.4 Calibrated adjacency HCF-COM for various images


9.3.3 The Sum and Difference Images

Downsampling as calibration works well in some situations, where it is a reasonable cover estimate. In other situations it is a poor cover estimate, even with respect to the features $C^{\mathrm{r}}$ and $A^{\mathrm{r}}$. We will have a look at when this happens, and at how the calibration technique can be amended to be more robust.

We define

$$S_{i,j} = c_{2i-1,2j-1} + c_{2i-1,2j} + c_{2i,2j-1} + c_{2i,2j}, \qquad E_{i,j} = e_{2i-1,2j-1} + e_{2i-1,2j} + e_{2i,2j-1} + e_{2i,2j}.$$

Clearly, we can rewrite (9.2) and (9.3), defining the downsampled images as $\sigma(c)_{i,j} = \lfloor S_{i,j}/4 \rfloor$ and $\sigma(s)_{i,j} = \lfloor (S_{i,j} + E_{i,j})/4 \rfloor$.

The critical question is the statistical distribution of $S_{i,j} \bmod 4$. With a uniform distribution, both $C^{\mathrm{r}}$ and $A^{\mathrm{r}}$ will be close to 1 for natural images and significantly less for steganograms. Assuming that $S_{i,j} \bmod 4$ is uniformly distributed, Ker (2005a) was able to prove that

images/c09_I0146.gif

and it was verified empirically that

images/c09_I0147.gif

According to Ker, the uniform distribution is typical for scanned images. However, images decompressed from JPEG tend to have disproportionately many groups for which $S_{i,j} \equiv 0 \pmod{4}$.

We can see that if $S_{i,j} \equiv 0 \pmod{4}$, a positive embedding distortion $0 < E_{i,j} < 4$ will disappear in the floor function in the definition of $\sigma(s)$, while a negative distortion $-4 < E_{i,j} < 0$ will carry through. This means that the embedding distortion on $\sigma(s)$ is biased, with a negative expectation. It can also cause the distortion to be stronger in $\sigma(s)$ than in $s$, and not weaker as we expected. The exact effect has not been quantified in the literature, but the negative implications can be observed for some of the proposed features using calibration: they do not give good classifiers for previously compressed images.

The obvious solution to this problem is to avoid the rounding in the definition of $\sigma(I)$:

$$\sigma'(I)_{i,j} = \frac{I_{2i-1,2j-1} + I_{2i-1,2j} + I_{2i,2j-1} + I_{2i,2j}}{4}.$$

In fact, the only reason to use the rounded definition is to be able to treat the downsampled image as an image of the same colour depth as $I$. However, there is no problem using the sum image

$$\mathcal{S}(I)_{i,j} = I_{2i-1,2j-1} + I_{2i-1,2j} + I_{2i,2j-1} + I_{2i,2j}$$

as the calibrated image. The only difference between $\sigma'(I)$ and $\mathcal{S}(I)$ is that the latter has four times the colour depth, that is, a range $\{0, 1, \ldots, 1020\}$ if $I$ is an 8-bit image.

The increased colour depth makes $\mathcal{S}(I)$ computationally more expensive to use than $\sigma(I)$. If we want to calculate HAR3D-3, using a joint histogram across three colour channels and calculating a 3-D HCF using the 3-D DFT, the computational cost increases by a factor of 64 or more. A good compromise may be to calculate a sum image by adding pairs in one dimension only (Ker, 2005a), that is,

$$\mathcal{S}_1(I)_{i,j} = I_{i,2j-1} + I_{i,2j}.$$

Thus the pixel range is doubled instead of quadrupled, saving some of the added computational cost of the DFT. For instance, when a 3-D DFT is used, the cost factor is 8 instead of 64.

The HCF of $\mathcal{S}_1(I)$ will have twice as many terms as that of $I$, because of the increased pixel range. In order to get comparable statistics for $I$ and for $\mathcal{S}_1(I)$, we can use only the lower half of the frequencies for $\mathcal{S}_1(I)$. This leads to Ker's (2005a) statistic

$$C_{1/2}\bigl(\mathcal{S}_1(I)\bigr) = \frac{\sum_{k=1}^{N} k\,\bigl|\hat{h}_{\mathcal{S}_1(I)}(k)\bigr|}{\sum_{k=1}^{N} \bigl|\hat{h}_{\mathcal{S}_1(I)}(k)\bigr|},$$

with $N = 128$ for an 8-bit image $I$, where $\hat{h}_X$ denotes the DFT of the histogram of $X$. It is the high-frequency components of the HCF of $\mathcal{S}_1(I)$ that are discarded, meaning that we get rid of some high-frequency noise. There is no apparent reason why this statistic would not be applicable to grey-scale images, but we have only seen it used with colour images, as discussed below in Section 9.3.4.

The discussion of 2-D histograms and difference matrices in Chapter 4 indicates that the greatest effect of embedding is seen in the differences between neighbouring pixels, rather than in individual pixels or even pixel pairs. Pixel differences are captured by the high-pass Haar transform. We have already seen the sum image $\mathcal{S}_1(I)$, which is a low-pass Haar transform across one dimension only. The difference image can be defined as

$$\mathcal{D}(I)_{i,j} = I_{i,2j-1} - I_{i,2j},$$

and it is a high-pass Haar transform across one dimension. Li et al. (2008a) suggested using both the difference and the sum images, with the HCF-COMs $C(\mathcal{S}_1(I))$ and $C(\mathcal{D}(I))$ as features. Experimentally, Li et al. show that $C(\mathcal{D}(I))$ is a better detector than both $C(I)$ and $C(\mathcal{S}_1(I))$.
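Both transforms are one-liners in NumPy (assuming an image with an even number of columns):

```python
import numpy as np

def sum_image(img):
    img = np.asarray(img, dtype=np.int64)
    return img[:, 0::2] + img[:, 1::2]   # S_1(I): low-pass Haar, one dimension

def diff_image(img):
    img = np.asarray(img, dtype=np.int64)
    return img[:, 0::2] - img[:, 1::2]   # D(I): high-pass Haar, one dimension
```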

The difference matrix $\mathcal{D}(I)$ differs from the difference matrix of Chapter 4 in two ways. Firstly, the range is adjusted. Secondly, it is downsampled by discarding every other difference: we do not consider the differences $I_{i,2j} - I_{i,2j+1}$. The idea is also very similar to wavelet analysis, except that we make sure to use an integer transform, and we use it only along one dimension.

Since the sum and difference images $\mathcal{S}_1(I)$ and $\mathcal{D}(I)$ correspond, respectively, to the low-pass and high-pass Haar wavelets applied in one dimension only, the discussion above, combined with the principles of Cartesian calibration, may well be used to justify features calculated from the wavelet domain.

9.3.4 Features for Colour Images

In the grey-scale case we saw good results with a two-dimensional adjacency HCF-COM. If we want to use this in the colour case, we can hardly consider the three colour channels jointly, because of the complexity of a 6-D Fourier transform. One alternative would be to deal with each colour channel separately, to get three features.

Most of the literature on HCF-COM has aimed to find a single discriminating feature, and to achieve this Ker (2005a) suggested adding the three colour components together and then taking the adjacency HCF-COM. Given an RGB image $I = (R, G, B)$, this gives us the totalled image

$$T(I)_{i,j} = R_{i,j} + G_{i,j} + B_{i,j}.$$

This can be treated like a grey-scale image with three times the usual pixel range, with the usual features $C(T(I))$ and $A(T(I))$ used as features of $I$.

The final detector recommended by Ker is calculated from the totalled image and calibrated with the sum image:

$$\frac{A\bigl(T(I)\bigr)}{A_N\bigl(\mathcal{S}_1(T(I))\bigr)},$$

where $A_N$ denotes the adjacency HCF-COM restricted to the first $N$ frequencies in each dimension, $N$ being half the pixel range of $\mathcal{S}_1(T(I))$ and thus equal to the pixel range of $T(I)$.

9.3.5 Pixel Selection

The premise of calibrated HCF-COM is that $C(\sigma(c)) \approx C(c)$. The better this approximation is, the better we can hope the feature to be. Downsampling around an edge, i.e. where the $2 \times 2$ pixels being merged have widely different values, may create new colours which were not present in the original image $I$. Li et al. (2008b) improved the feature by selecting only smooth pixel groups from the image.

Define a pixel group as

$$G_{i,j} = \bigl( I_{2i-1,2j-1},\ I_{2i-1,2j},\ I_{2i,2j-1},\ I_{2i,2j} \bigr).$$

Note that the four co-ordinates of $G_{i,j}$ are exactly the four pixels contributing to $\sigma(I)_{i,j}$. We define the 'smoothness' of $G_{i,j}$ as

$$\tau(G_{i,j}) = \max G_{i,j} - \min G_{i,j},$$

and we say that a pixel group $G_{i,j}$ is 'smooth' if $\tau(G_{i,j}) \le T$ for some suitable threshold $T$. Li et al. recommend a specific threshold based on experiments, but they have not published the details.

The pixel selection image is defined (Li et al., 2008b) as the sequence of pixels belonging to smooth groups,

$$I_{\Sigma} = \bigl( G_{i,j} : \tau(G_{i,j}) \le T \bigr).$$

Forming the downsampled pixel selection from $\sigma(I)$, we get

$$\sigma(I)_{\Sigma} = \bigl( \sigma(I)_{i,j} : \tau(G_{i,j}) \le T \bigr).$$

Based on these definitions, we can define the calibrated pixel-selection HCF-COM as

$$C^{\mathrm{r}}_{\Sigma}(I) = \frac{C(I_{\Sigma})}{C(\sigma(I)_{\Sigma})}.$$
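A sketch of pixel selection under our reading of the smoothness measure (the max-minus-min measure and the default threshold value are assumptions; Li et al. did not publish the details):

```python
import numpy as np

def smooth_group_mask(img, T=10):
    # one boolean per 2x2 pixel group: True if max - min <= T
    img = np.asarray(img, dtype=np.int64)
    g = np.stack([img[0::2, 0::2], img[0::2, 1::2],
                  img[1::2, 0::2], img[1::2, 1::2]])
    return g.max(axis=0) - g.min(axis=0) <= T

def calibrated_ps_hcf_com(img, T=10):
    # ratio-calibrated HCF-COM restricted to smooth pixel groups,
    # reusing downsample() and hcf_com() from the earlier sketches
    img = np.asarray(img, dtype=np.int64)
    mask = smooth_group_mask(img, T)
    groups = np.stack([img[0::2, 0::2], img[0::2, 1::2],
                       img[1::2, 0::2], img[1::2, 1::2]])
    selected = groups[:, mask].ravel()   # pixels of the selected groups
    reference = downsample(img)[mask]    # downsampled pixel selection
    return hcf_com(selected) / hcf_com(reference)
```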

It is possible to create second-order statistics using pixel selection, but it requires a twist. The 2-D HCF-COM according to Ker (2005b) considers every adjacent pair, including pairs spanning two pixel groups. After pixel selection, adjacency pairs could be formed from pixels that were nowhere near each other before the selection. In order to make it work, we need to adapt both the pixel selection and the second-order histogram.

The second-order histogram is modified to count only pixel pairs within a single pixel group $G_{i,j}$, i.e.

images/c09_I0217.gif

The pixel selection formula is modified with an extra smoothness criterion on an adjacent pixel group. To avoid ambiguity, we also define the pixel selection as a set $\mathcal{P}$ of adjacency pairs, so that the adjacency histogram can be calculated directly as a standard (1-D) histogram of $\mathcal{P}$.

We define $\mathcal{P}_1$ first, so that both elements of each pair satisfy the smoothness criterion $\tau(G_{i,j}) \le T$. Thus we write

images/c09_I0222.gif

We now want to define $\mathcal{P}_2$ to include all adjacent pixel pairs with elements taken from a pixel group used in $\mathcal{P}_1$. Thus we write

images/c09_I0226.gif

Let $H^{(1)}$ be the histogram of $\mathcal{P}_1$ and $\hat{H}^{(1)}$ be its DFT. Likewise, let $H^{(2)}$ be the histogram of $\mathcal{P}_2$ and $\hat{H}^{(2)}$ its DFT. This allows us to define the pixel selection HCF-COM as follows:

images/c09_I0233.gif

and the calibrated pixel selection HCF-COM is

images/c09_I0234.gif

9.3.6 Other Features Based on Downsampling

Li et al. (2008b) introduced a variation of the calibrated HCF-COM. They applied ratio calibration directly to each element of the HCF, defining

$$c_k = \frac{\bigl|\hat{h}_I(k)\bigr|}{\bigl|\hat{h}_{\sigma(I)}(k)\bigr|}.$$

We can think of this as a sort of calibrated HCF. Assuming that downsampling is a cover estimate with respect to the HCF, we should have $c_k \approx 1$ for a cover image. According to the previous arguments, the HCF should increase for steganograms, so that $c_k > 1$. For this reason, Li et al. capped $c_k$ from below, defining

$$c'_k = \max(c_k, 1),$$

and the new calibrated feature is defined as

$$C'(I) = \sum_{k} w_k\, c'_k,$$

where the $w_k$ are some weighting parameters. Li et al. suggest two different weightings, $w_k = 1$ and $w_k = k$, and decide that the latter gives the better performance.
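A sketch of the $C'$ feature under our reading (per-element ratio of HCF magnitudes, capped below at 1 and weighted by the frequency index; the small `eps` is our guard against empty histogram bins):

```python
import numpy as np

def capped_calibrated_hcf(img, n_bins=256, eps=1e-12):
    def hcf_mag(x, bins):
        h, _ = np.histogram(x, bins=bins, range=(0, bins))
        return np.abs(np.fft.fft(h))[1 : bins // 2 + 1]
    # ratio against the downsampled image, reusing downsample() from Section 9.3.1
    c = hcf_mag(img, n_bins) / (hcf_mag(downsample(img), n_bins) + eps)
    c = np.maximum(c, 1.0)             # cap the ratio from below
    k = np.arange(1, n_bins // 2 + 1)  # weighting w_k = k (our assumption)
    return (k * c).sum()
```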

The very same approach also applies to the adjacency histogram, and we can define an analogous feature as

$$A'(I) = \sum_{k,l} w_{k,l}\, a'_{k,l},$$

where $a'_{k,l}$ denotes the element-wise ratio-calibrated adjacency HCF, capped from below in the same way, and $w_{k,l}$ is a weighting chosen analogously by Li et al.

Both $C'$ and $A'$ can be combined with other techniques to improve performance. Li et al. (2008b) tested pixel selection and noted that it improves performance, as it does for other calibrated HCF-based features. Li et al. (2008a) note that the high-frequency elements of the HCF are more subject to noise, and they show that using only a limited number of low-frequency elements in the sums for $C'$ and $A'$ improves detection. Similar improvements can also be achieved for the original calibrated HCF-COM feature $C^{\mathrm{r}}$.

9.3.7 Evaluation and Notes

All the calibrated HCF-COM features in this section have been proposed and evaluated in the literature as individual discriminants, and not as feature sets for learning classifiers. However, there is no reason not to combine them with other feature sets for use with SVM or other classifiers.

There are many variations of these features as well. There is Ker's original HCF-COM and the $C'$ and $A'$ features of Li et al., each of which comes in a 1-D and a 2-D variant. Each feature can be calculated with or without pixel selection, and as an alternative to using all non-redundant frequencies, one can reduce this to 64 or 128 low-frequency terms. One could also try Cartesian calibration, $(C(I), C(\sigma(I)))$ (see Section 9.4), instead of the ratio $C^{\mathrm{r}}(I)$.

Experimental comparisons of a good range of variations can be found in Li et al. (2008b) and Li et al. (2008a), but for some of the obvious variants, no experiments have yet been reported in the literature. The experiments of Li et al. (2008b) also give inconsistent results, and the relative performance depends on the image set used. Therefore, it is natural to conclude that when fine-tuning a steganalyser, all of the variants should be systematically re-evaluated.

9.4 Calibration in General

So far we have discussed cover estimates only. We will now turn to other classes of reference transforms. The definitions of the ratio- and difference-calibrated features $F^{\mathrm{r}}$ and $F^{\mathrm{d}}$ from Section 9.1 remain valid.

An obvious alternative to the cover estimate would be a stego estimate, where the reference transform aims to approximate a steganogram, so that $F(\sigma(c)) \approx F(s)$. When the embedding replaces cover data, as it does in LSB replacement and JSteg, we can estimate a steganogram by embedding a new random message at 100% of capacity. The resulting transform gives essentially the same steganogram regardless of whether we started with a clean image or one containing a hidden message, as the old hidden message is simply overwritten by the new one. Fridrich et al. (2003a) used this transform in RS steganalysis.
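A minimal sketch of this stego estimate for LSB replacement (the function name is ours; the transform overwrites every LSB with a fresh random bit):

```python
import numpy as np

def stego_estimate_lsb(img, seed=None):
    img = np.asarray(img, dtype=np.uint8)
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=img.shape, dtype=np.uint8)
    return (img & 0xFE) | bits  # embed a random message at 100% of capacity
```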

Stego estimation is much harder when the distortions of repeated embeddings add together, as they do in LSB matching or F5. Double embedding with these stego-systems can cause a distortion of $\pm 2$ in a single coefficient. Thus the transform image would be a much more distorted version than any normal steganogram.

Both cover and stego estimates are intuitive to interpret and use for steganalysis. With cover estimates, $F^{\mathrm{d}}(I) \approx 0$ for covers, so any markedly non-zero value indicates a steganogram. With stego estimates it is the other way around, with $F^{\mathrm{d}}(I) \approx 0$ for steganograms.

Kodovský and Fridrich (2009) also discuss other types of reference transforms. The simplest example is a parallel reference, where

$$F(\sigma(I)) = F(I) + b$$

for some constant vector $b$. This is clearly degenerate, as it is easy to see that

$$F^{\mathrm{d}}(I) = F(I) - F(\sigma(I)) = -b,$$

so that the difference-calibrated feature vector contains no information about whether the image is a steganogram or not.

The eraser transform is in a sense similar to the stego and cover transforms. The eraser aims to estimate some abstract point which represents the cover image, and which is constant regardless of whether the cover is clean or a message has been embedded. This leads to two requirements (Kodovský and Fridrich, 2009):

$$F(\sigma(c)) \approx F(\sigma(s)), \qquad F(\sigma(I)) \text{ `close to' } F(I).$$

The second requirement is a bit loosely defined. The essence of it is that $F(\sigma(I))$ must depend on the image $c$/$s$. Requiring that the feature vector of the transform is 'close' to the feature vectors of $c$ and $s$ is just one way of achieving this.

Both stego and cover estimates result in very simple and intuitive classifiers. One class will have $F^{\mathrm{d}}(I) \approx 0$ (or $F^{\mathrm{r}}(I) \approx 1$), and the other class will have something different. This simple classifier holds regardless of the relationship between $F(c)$ and $F(s)$, as long as they are 'sufficiently' different. This is not the case with the eraser transform. For the eraser to provide a simple classifier, the shift $F(s) - F(c)$ caused by the embedding must be consistent.

The last class of reference transforms identified by Kodovský and Fridrich (2009) is the divergence transform, which serves to boost the difference between clean image and steganogram by pulling $F(\sigma(c))$ and $F(\sigma(s))$ in different directions. In other words, the calibrated feature vectors $F^{\mathrm{d}}(c)$ and $F^{\mathrm{d}}(s)$ end up further apart than $F(c)$ and $F(s)$ are.

It is important to note that the concepts of cover and stego estimates are defined in terms of the feature space, whereas the reference transform operates in the image space. We never assume that $\sigma(s) \approx c$ or $\sigma(c) \approx s$; we are only interested in the relationship between $F(\sigma(I))$ and $F(c)$ or $F(s)$. Therefore the same reference transform may be a good cover estimate with respect to one feature set, but degenerate to a parallel reference with respect to another.

9.5 Progressive Randomisation

Over-embedding with a new message has been used by several authors in non-learning, statistical steganalysis. Rocha (2006) also used it in the context of machine learning; see Rocha and Goldenstein (2010) for the most recent presentation of the work. They used six different calibrated images, obtained by over-embedding at six different embedding rates, namely

images/c09_I0289.gif

Each rate $p_i$ leads to a reference transform $\sigma_i$ by over-embedding with LSB replacement at a rate $p_i$ of capacity, forming a steganogram $\sigma_i(I)$.

Rocha used the reciprocal of the ratio-calibrated feature discussed earlier. That is, for each base feature $f$, the calibrated feature is given as

$$f_i(I) = \frac{f(\sigma_i(I))}{f(I)},$$

where $I$ is the intercepted image.

This calibration method mainly makes sense when we assume that intercepted steganograms will have been created with LSB embedding. In a sense, the reference transform $\sigma_i$ will be a stego estimate with respect to any feature, but the hypothetical embedding rate in the stego estimate will not be constant. If we have an image with a capacity of $n$ bits, of which $m$ bits are used by LSB embedding, and we then over-embed with $m'$ bits, the result is a steganogram with $m''$ embedded bits, where

$$m'' = m + m' - \frac{m m'}{n}. \qquad (9.6)$$

This can be seen because, on average, $m m'/n$ of the new bits will just overwrite parts of the original message, while $m'(1 - m/n)$ bits will use previously unused capacity. Thus the calibration will estimate a steganogram at the embedding rate $m''/n$, but $m''$ depends not only on $m'$, but also on the message length $m$ already embedded in the intercepted image.
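Equation (9.6) is easy to verify at the extremes: with nothing previously embedded ($m = 0$) all $m'$ new bits count, while a fully embedded image ($m = n$) stays at $m'' = n$. A worked sketch:

```python
def effective_message_length(m, m_new, n):
    # equation (9.6): bits carrying message data after over-embedding
    return m + m_new - m * m_new / n

assert effective_message_length(0, 5000, 10000) == 5000       # clean cover
assert effective_message_length(10000, 5000, 10000) == 10000  # fully embedded
```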

Looking at (9.6), we can see that over-embedding makes more of a difference to a natural image than to a steganogram. The difference is particularly significant if the steganogram has been embedded at a high rate, so that the over-embedding will largely touch pixels already used for embedding.

Even though Rocha did not use the terminology of calibration, the concept of progressive randomisation fits well into this framework. We note that the concept of a stego estimate is not as well defined as we assumed in the previous section, because steganograms with different message lengths are very different. Apart from this, progressive randomisation is an example of a stego estimate. No analysis was made to decide which embedding rate $p_i$ gives the best reference transform $\sigma_i$; this problem is instead left to the learning classifier.

We will only give a brief summary of the other elements making up Rocha's feature vector. The image is divided into (possibly overlapping) sub-regions, and features are calculated from each region. Two different underlying features are used. The first is the $\chi^2$ statistic, which we discussed in Chapter 2. The second is a new feature, using Maurer's (1992) method to measure the randomness of a bit string. Maurer's measure is well suited for bit-plane analysis following the ideas from Chapter 5. If we consider the LSB plane as a 1-D vector, it should be a random-looking bit sequence for a steganogram, while it may or may not be random-looking for a clean image; this is what Maurer's test is designed to detect.
