20.3 Representation of Transformed Components for DP

When the data are linearly transformed from the original data space to a new data space, each transformed data sample vector is essentially a linear combination or mixture of data sample vectors in the original space. In order to effectively represent the data in this new linearly transformed data space, it is desirable to find a set of basic constituent elements that can serve as a basis for all the transformed data sample vectors. In this case, each basic constituent element represents a one-dimensional transformed component whose information significance can be measured by an information criterion. This section presents one such approach, which can be considered a generalization of PCA and ICA.

20.3.1 Projection Index-Based PP

In Section 6.5, an approach called projection pursuit (PP) is developed to generalize PCA and ICA to a PP-based component analysis transform in which the PP-transformed components are specified by projection vectors derived from a more general concept called projection index (PI). Such a PI-based PP is referred to as PIPP and its generated transformed components can be ranked by an information measure for DP. Although PIPP is already given in Section 6.5.1 of Chapter 6, we recap its details here for reference.

The term “PP”, first coined by Friedman and Tukey (1974), refers to a technique for exploratory analysis of multivariate data. The idea is to project a high-dimensional data set into a low-dimensional data space while retaining the information of interest, by designing a PI to explore projections of interestingness. We assume that there are N data points {x_n}_{n=1}^N, each with dimensionality K, that X = [x_1 x_2 ··· x_N] is the K × N data matrix, and that w is a K-dimensional column vector that serves as a desired projection. Then w^T X represents an N-dimensional row vector containing the orthogonal projections of all sample data points mapped onto the direction w. Now, if Q(w^T X) is a function measuring the degree of interestingness of the projection w^T X for a fixed data matrix X, a PI is a real-valued function of w, I(w), defined by

(20.1)   I(w) = Q(w^T X)

The PI can be easily extended to multiple directions w_1, w_2, ..., w_J. In this case, W = [w_1 w_2 ··· w_J] is a K × J projection direction matrix and the corresponding projection index is also a real-valued function, I(W), given by

(20.2)   I(W) = Q(W^T X)

The choice of the function Q(·) in (20.1) and (20.2) is application dependent. Its purpose is to reveal interesting structures within data sets, such as clustering. The PP using the PI specified by (20.2) is called PI-based projection pursuit (PIPP). Within the context of PIPP, PCA and ICA can be considered special cases in which PCA uses data variance as a PI to produce eigenvectors, while ICA uses mutual information as a PI to produce statistically independent projection vectors. However, finding an optimal projection matrix W in (20.2) is not a simple matter, since there is no analog of the characteristic polynomial equation that can be used to find eigenvalues and eigenvectors analytically. In this case, the PI is confined to statistics of high orders, such as skewness, kurtosis, entropy, mutual information, information divergence (ID), etc., so that an equation for solving the projection matrix W in (20.2) can be derived as follows.
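As a concrete illustration of evaluating I(w) = Q(w^T X) for a single direction, the following sketch is hypothetical (the function name `projection_index` and the choice of variance and kurtosis as the two PIs are illustrative, not from the text):

```python
import numpy as np

def projection_index(w, X, pi="variance"):
    """Evaluate a projection index I(w) = Q(w^T X) for a K x N data matrix X."""
    y = w @ X                        # w^T X: the N projected sample values
    if pi == "variance":             # PCA-style PI (second-order statistics)
        return np.var(y)
    if pi == "kurtosis":             # high-order PI (sphered data assumed)
        return np.mean(y**4) - 3.0
    raise ValueError(pi)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 1000))   # K = 5 bands, N = 1000 samples
w = np.ones(5) / np.sqrt(5)          # a unit-norm candidate direction
print(projection_index(w, X, "variance"))
```

For Gaussian data the variance PI is close to 1 for any unit direction, while the kurtosis PI is close to 0; an interesting (non-Gaussian) direction would stand out under the latter.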

We assume that the ith PI-projected transformed component is described by a random variable ζ_i whose values are specified by the gray-level value z_{in} of the nth pixel. The original data set is first sphered to remove the mean and to make the covariance matrix an identity matrix. Let {z_n}_{n=1}^N denote the set of sphered data sample vectors. A general form for solving a projection vector w with the PI specified by kth-order statistics, the kth moment, was derived by Wang and Chang (2006a) and Ren et al. (2006) by solving the following eigenproblem for w:

(20.3)   E[(w^T z)^{k−1} z] = λw

Specifically, for k = 3 and k = 4, the forms in (20.3) are called the equations of skewness and kurtosis, respectively. Unlike PCA, which solves for the eigenvalues of a data sample covariance matrix via the characteristic polynomial equation and then uses the obtained eigenvalues to find the eigenvectors that specify its principal components (PCs), PIPP must solve (20.3) directly for the projection matrix W because there is no counterpart of a characteristic polynomial equation in PIPP that can be used to derive W. To do so, a general approach developed by Ren et al. (2006) and Wang and Chang (2006a) implements PIPP to produce components, referred to as projection index components (PICs), in which a PI is used as a criterion to find directions of interestingness in the data to be processed; the data are then represented in the data space specified by these new interesting directions.

Instead of finding the projection matrix W directly, an algorithm developed by Ren et al. (2006) finds a sequence of projection vectors, w_1, w_2, ..., to solve (20.3), and can be described as follows.

Projection-Index Projection Pursuit (PIPP) Algorithm

1. Initial condition: X is the data matrix and a PI is specified.
2. The first projection vector w_1 is found by maximizing the PI.
3. The obtained w_1 is used to generate the first projection image w_1^T X, which represents the first PIC.
4. The orthogonal subspace projector (OSP) specified by P_{w_1}^⊥ is applied to the data set X to produce the first OSP-projected data set, X_1 = P_{w_1}^⊥ X.
5. The data set X_1 is used to find the second projection vector w_2 by maximizing the same PI again.
6. Let P_{w_2}^⊥ be applied to the data set X_1 to produce the second OSP-projected data set, X_2 = P_{w_2}^⊥ X_1, which can be used to produce the third projection vector w_3 by maximizing the same PI again. Equivalently, we can define a projection matrix W_2 = [w_1 w_2] and apply P_{W_2}^⊥ to the data set X to obtain X_2 = P_{W_2}^⊥ X.
7. The procedure of steps 5 and 6 is repeated to produce w_3, w_4, ... until a stopping criterion is met. It should be noted that a stopping criterion can be either a predetermined number of projection vectors to be generated or a predetermined threshold on the difference between two consecutive projection vectors.
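The steps above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the kurtosis PI and the simple fixed-point maximizer stand in for whatever PI and numerical optimizer are actually used, and the function names are invented.

```python
import numpy as np

def osp(w):
    """Orthogonal subspace projector P_w^perp = I - w (w^T w)^-1 w^T."""
    w = w.reshape(-1, 1)
    return np.eye(len(w)) - w @ np.linalg.pinv(w)

def maximize_kurtosis(X, iters=200, seed=0):
    """Find a unit vector w maximizing the kurtosis PI of w^T X
    by a simple fixed-point iteration (a stand-in optimizer)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(iters):
        y = w @ X
        grad = 4.0 * (X * y**3).mean(axis=1)   # gradient of E[(w^T x)^4]
        w = grad / np.linalg.norm(grad)        # renormalize onto unit sphere
    return w

def pipp(X, J):
    """Steps 2-7: generate J projection vectors by PI maximization
    followed by OSP deflation of the data set."""
    Xn, ws = X.copy(), []
    for _ in range(J):
        w = maximize_kurtosis(Xn)   # steps 2/5: maximize the PI
        ws.append(w)
        Xn = osp(w) @ Xn            # steps 4/6: remove the found direction
    return np.array(ws).T           # K x J projection matrix W
```

Because each deflated data set lies in the orthogonal complement of the previously found directions, the resulting projection vectors come out mutually orthogonal.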

20.3.2 Mixed Projection Index-Based PP (M-PIPP)

PIPP, as described in Section 20.3.1, uses the same PI to generate all the PICs. In general, this does not have to be the case. Since different PIs are designed to capture different details of information, it may be more effective to let PIPP adapt its PI as it generates PICs. A similar idea, proposed by Chai et al. (2007), who developed a mixed PCA/ICA component analysis, can also be applied to PIPP; the result is referred to as mixed PIPP (M-PIPP), in which the single PI used in step 5 of the above PIPP algorithm can be replaced and specified by different PIs. For example, the first, second, and third PICs can be produced by different PIs specified by variance, skewness, and kurtosis in order to represent the first variance-specified principal component, the second skewness-specified component, and the third kurtosis-specified component, respectively. The M-PIPP algorithm is exactly the same as the PIPP algorithm with the exception that the single PI implemented for all the components in step 5 can be replaced by various PIs, as specified by users. More details can be found in Safavi and Chang (2008) and Safavi (2010).
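A minimal sketch of the mixed-PI idea follows. This is hypothetical code: the crude random-direction search merely stands in for a real PI optimizer, and the function name `m_pipp` is invented.

```python
import numpy as np

def m_pipp(X, pi_funcs, trials=2000, seed=0):
    """M-PIPP sketch: component j is found by maximizing its own PI,
    pi_funcs[j], over random unit directions (a crude search),
    with OSP deflation of the data between components."""
    rng = np.random.default_rng(seed)
    K = X.shape[0]
    Xn, ws = X.copy(), []
    for pi in pi_funcs:                                # a different PI per PIC
        cand = rng.standard_normal((trials, K))
        cand /= np.linalg.norm(cand, axis=1, keepdims=True)
        scores = [pi(w @ Xn) for w in cand]
        w = cand[int(np.argmax(scores))]
        ws.append(w)
        P = np.eye(K) - np.outer(w, w) / (w @ w)       # OSP deflation
        Xn = P @ Xn
    return np.array(ws).T                              # K x J matrix W

# e.g., variance for PIC 1, skewness for PIC 2, kurtosis for PIC 3:
pis = [np.var,
       lambda y: abs(np.mean(y**3)),
       lambda y: abs(np.mean(y**4) - 3.0)]
```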

20.3.3 Projection Index-Based Prioritized PP (PI-PRPP)

According to the PIPP described in Section 20.3.1, a vector is randomly generated as an initial condition from which PIPP converges to a desired projection vector that is used to generate a PIC. As a consequence, a different randomly generated initial condition may converge to a different projection vector, which in turn results in a different PIC. In other words, if PIPP is performed at different times or by different users, the resulting final PICs will also differ due to the use of different sets of random vectors. In order to correct this problem, this section presents a PI-based prioritized PP (PI-PRPP) that uses a PI as a prioritization criterion to rank PIPP-generated PICs so that all PICs are prioritized in accordance with the priorities measured by the given PI. Such a PI is called the PIC prioritization index. In this case, the PICs will always be ranked and prioritized by the PIC prioritization index in the same order regardless of what initial vectors are used to produce the projection vectors. It must be noted that there is a major distinction between PIPP and PI-PRPP. While PIPP uses a PI as a criterion to produce a desired projection vector for each PIC, PI-PRPP uses a PIC prioritization index to prioritize the PIPP-generated PICs. Therefore, the PIs used in PIPP and PI-PRPP are not necessarily the same. In other words, the PI used as a PIC prioritization index to prioritize PICs can be different from the PI used to generate the PICs; as a matter of fact, on many occasions different PIs are used in applications. In what follows, we describe various criteria that use statistics beyond the second order and can be used to define a PIC prioritization index.

Projection Index (PI)-Based Criteria

1. Sample mean of third-order statistics: skewness for ζj:

(20.4)   κ̂_j^{(3)} = (1/N) Σ_{n=1}^N z_{jn}^3

where κ̂_j^{(3)} is the sample mean of the third-order statistics in PIC_j and z_{jn} is the value of the nth sample of PIC_j.
2. Sample mean of fourth-order statistics: kurtosis for ζj:

(20.5)   κ̂_j^{(4)} = (1/N) Σ_{n=1}^N z_{jn}^4

where κ̂_j^{(4)} is the sample mean of the fourth-order statistics in PIC_j.
3. Sample mean of kth-order statistics: kth central moments for ζj:

(20.6)   κ̂_j^{(k)} = (1/N) Σ_{n=1}^N (z_{jn} − μ_j)^k

where κ̂_j^{(k)} is the sample mean of the kth central moment in PIC_j and μ_j = (1/N) Σ_{n=1}^N z_{jn} is the sample mean of PIC_j.
4. Neg-entropy: combination of third and fourth orders of statistics for ζj:

(20.7)   J(ζ_j) ≈ (1/12) (E[ζ_j^3])^2 + (1/48) (E[ζ_j^4] − 3)^2

It should be noted that (20.7) is taken from (5.35) in Hyvärinen et al. (2001, p. 115), where it is used to approximate the neg-entropy by high-order statistics.
5. Entropy

(20.8)   H(ζ_j) = −Σ_i p_{ji} log p_{ji}

where {p_{ji}} is the probability distribution derived from the image histogram of PIC_j.
6. Information divergence (ID)

(20.9)   ID(ζ_j) = Σ_i p_{ji} log (p_{ji}/g_{ji})

where {p_{ji}} is the probability distribution derived from the image histogram of PIC_j and {g_{ji}} is the Gaussian probability distribution with the mean and variance calculated from PIC_j.
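The criteria above can be sketched in code as follows. This is a hypothetical illustration: the function name `prioritize_pics` and the histogram bin count are invented, and the neg-entropy line uses the Hyvärinen et al. high-order approximation quoted in the text.

```python
import numpy as np

def prioritize_pics(Z, index="negentropy", bins=64):
    """Rank PICs (rows of Z, assumed sphered to zero mean, unit variance)
    by a PIC prioritization index; returns indices, highest priority first."""
    def score(z):
        if index == "skewness":
            return abs(np.mean(z**3))                  # third-order statistics
        if index == "kurtosis":
            return abs(np.mean(z**4) - 3.0)            # fourth-order statistics
        if index == "negentropy":                      # Hyvarinen et al. approx.
            return np.mean(z**3)**2 / 12.0 + (np.mean(z**4) - 3.0)**2 / 48.0
        if index == "entropy":                         # from the image histogram
            p, _ = np.histogram(z, bins=bins)
            p = p[p > 0] / len(z)
            return -np.sum(p * np.log(p))
        raise ValueError(index)
    scores = np.array([score(z) for z in Z])
    return np.argsort(scores)[::-1]
```

Under the neg-entropy or kurtosis index, a strongly non-Gaussian PIC is ranked ahead of a Gaussian-like one, which is exactly the behavior a prioritization index is meant to capture.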

20.3.4 Initialization-Driven PIPP (ID-PIPP)

The PI-PRPP presented in Section 20.3.3 is intended to remedy the issue that PICs can appear in a random order due to the use of randomly generated initial vectors; it allows users to prioritize PICs according to the information significance measured by a specific PIC prioritization index. Although the PICs ranked by PI-PRPP may appear in the same order regardless of which set of random initial conditions is used, they are not necessarily identical, because a slight discrepancy between two corresponding PICs at the same position in the ordering may be caused by the randomness introduced by their initial conditions. Although such a variation may be minor compared to the different orders in which PICs appear without prioritization, the inconsistency may still cause difficulty in data analysis. Therefore, this section further develops a new approach, called initialization-driven PIPP (ID-PIPP), which custom-designs an initialization algorithm to produce a specific set of initial conditions for PIPP so that the same initial condition is always used whenever PIPP is implemented. As a result, ID-PIPP-generated PICs are always identical. When a particular initialization algorithm, say A, is used to produce a specific initial set of vectors for ID-PIPP to generate PICs, the resulting ID-PIPP is referred to as A-ID-PIPP.

One good candidate algorithm for this purpose is the automatic target generation process (ATGP) developed previously by Ren and Chang (2003). It makes use of the orthogonal subspace projector defined by (2.78)

(20.10)   P_U^⊥ = I − UU^#

where U^# = (U^T U)^{−1} U^T is the pseudoinverse of U, in a repetitive manner to find target pixel vectors of interest from the data without prior knowledge, regardless of what types of pixels these targets are. Details of implementing ATGP can be found in Section 8.5.1 and are restated in the following steps.

Automatic Target Generation Process

1. Initial condition:
Let L be the total number of spectral bands.
An initial target pixel vector of interest, denoted by t0, is selected. In order to initialize ATGP without knowing t0, we select the target pixel vector with the maximum length as the initial target, namely, t0 = arg max_r {r^T r}, which has the highest response, that is, the brightest pixel vector in the image scene. Set n = 1 and U0 = [t0].
(It is worth noting that this selection may not necessarily be the best selection. However, according to our experiments, the brightest pixel vector was always extracted later on, provided that it was not selected as the initial target pixel vector in the initialization.)
2. At the nth iteration, P_{U_{n−1}}^⊥ is applied via (20.10) to all image pixel vectors r in the image, and the nth target tn generated at the nth stage is the one with the maximum orthogonal projection, as follows:

(20.11)   tn = arg max_r [(P_{U_{n−1}}^⊥ r)^T (P_{U_{n−1}}^⊥ r)]

where U_{n−1} is the target matrix generated at the (n − 1)st stage.
3. Stopping rule:
If n < L − 1, let Un = [U_{n−1} tn] be the nth target matrix and go to step 2. Otherwise, continue.
4. At this stage, ATGP is terminated. At this point, the target matrix is U_{L−1}, which contains L − 1 target pixel vectors as its column vectors, not including the initial target pixel vector t0.

As a result, the final set of L target pixel vectors produced by ATGP at step 4, S_ATGP = {t0, t1, ..., t_{L−1}}, will be used as the initial set of vectors to produce L PICs, where each of the L target pixel vectors in S_ATGP is used to generate a particular PIC. An ID-PIPP using ATGP as its initialization algorithm is called ATGP-ID-PIPP.
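Under the definitions above, ATGP can be sketched as follows. This is a hypothetical implementation: the stopping rule is simplified to a fixed number of targets, and the function name is invented.

```python
import numpy as np

def atgp(R, num_targets):
    """ATGP sketch: R is an L x N matrix whose columns are pixel vectors.
    Start from the brightest pixel, then repeatedly pick the pixel with the
    maximum orthogonal-projection residual P_U^perp r."""
    L, N = R.shape
    t0 = R[:, np.argmax(np.sum(R**2, axis=0))]   # brightest pixel vector
    U = t0.reshape(L, 1)
    targets = [t0]
    for _ in range(num_targets - 1):
        P = np.eye(L) - U @ np.linalg.pinv(U)    # P_U^perp = I - U U^#
        res = P @ R                              # project out found targets
        t = R[:, np.argmax(np.sum(res**2, axis=0))]
        targets.append(t)
        U = np.column_stack([U, t])
    return np.array(targets).T                   # L x num_targets matrix
```

Because each new target maximizes the residual energy after projecting out all previously found targets, the generated targets are guaranteed to be spectrally distinct from one another.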
