In real applications only a limited number of samples is available for data analysis. In this case, the data sample pool is generally not large enough to yield reliable statistics for characterizing the LRT Λ(r) implemented by a detector. Under such circumstances there is no effective means of producing Λ(r), and the ROC analysis must be carried out with the data samples themselves rather than with the statistics p0(r) and p1(r).
In what follows, we define
N = total number of data samples used for a particular detection method (technique)
Nsignal = total number of data samples with presence of a signal (according to ground truth)
Nno-signal = total number of data samples with absence of a signal (according to ground truth)
ND = total number of data samples with presence of a signal which is actually detected by the method
NF = total number of data samples with absence of a signal, but claimed to have a signal detected by the method
NM = total number of data samples with presence of a signal which is not detected by the method
NTN = total number of data samples with absence of a signal and also claimed to have no signal detected by the method
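False alarm or false-positive rate/probability:
$$P_F = \frac{N_F}{N_{\text{no-signal}}} \qquad (3.10)$$
False negative or miss rate/probability:
$$P_M = \frac{N_M}{N_{\text{signal}}} \qquad (3.11)$$
Detection power/true-positive rate/probability:
$$P_D = \frac{N_D}{N_{\text{signal}}} \qquad (3.12)$$
True-negative rate/probability:
$$P_{TN} = \frac{N_{TN}}{N_{\text{no-signal}}} \qquad (3.13)$$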
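Based on (3.10)–(3.13), the following relationships are true:
$$P_D + P_M = 1, \qquad P_F + P_{TN} = 1$$
with
$$N_{\text{signal}} = N_D + N_M, \qquad N_{\text{no-signal}} = N_F + N_{TN}, \qquad N = N_{\text{signal}} + N_{\text{no-signal}}$$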
So far, Equations (3.10)–(3.16) have been defined based on real samples. Consequently, for a given sample pool used to test a detection technique, only one point (PD, PF) can be generated for the ROC curve of that technique. To produce a complete ROC curve for a specific detection technique (method), an infinite number of sample pools would therefore be required, which is practically impossible. One way to mitigate this difficulty is to assume that the noise in the binary hypothesis decision problem described by (3.1) is zero-mean Gaussian and that the given sample pool is sufficiently large to generate reliable statistics. In this case, we can calculate the sample mean and variance under each hypothesis and take them as the Gaussian mean and variance under that hypothesis. With these fitted Gaussian distributions, finding the ROC curve of a specific detection technique (method) becomes feasible, and the curve can be derived mathematically from standard signal detection theory as follows.
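Before turning to the Gaussian fit, the sample-based quantities of (3.10)–(3.13) can be made concrete with a short sketch. It assumes that the ground truth and the detector decisions are available as two equal-length Boolean sequences; the names `sample_rates` and `SampleRates` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SampleRates:
    p_d: float   # detection power, N_D / N_signal
    p_f: float   # false alarm rate, N_F / N_no-signal
    p_m: float   # miss rate, N_M / N_signal
    p_tn: float  # true-negative rate, N_TN / N_no-signal


def sample_rates(truth: Sequence[bool], decided: Sequence[bool]) -> SampleRates:
    """Count N_D, N_M, N_F, N_TN from ground truth and decisions, then form the rates."""
    if len(truth) != len(decided):
        raise ValueError("truth and decided must have the same length")
    n_d = sum(1 for t, d in zip(truth, decided) if t and d)
    n_m = sum(1 for t, d in zip(truth, decided) if t and not d)
    n_f = sum(1 for t, d in zip(truth, decided) if not t and d)
    n_tn = sum(1 for t, d in zip(truth, decided) if not t and not d)
    n_signal = n_d + n_m
    n_no_signal = n_f + n_tn
    if n_signal == 0 or n_no_signal == 0:
        raise ValueError("both signal and no-signal samples are required")
    return SampleRates(
        p_d=n_d / n_signal,
        p_f=n_f / n_no_signal,
        p_m=n_m / n_signal,
        p_tn=n_tn / n_no_signal,
    )
```

One call yields exactly one (PD, PF) pair, which illustrates why a single sample pool with fixed decisions gives only one point of the ROC curve.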
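Now if we further assume that the probability density functions p0(y) and p1(y) in (3.1) that govern H0 and H1 are Gaussian distributions with means μ0 and μ1 and variances $\sigma_0^2$ and $\sigma_1^2$, respectively, all calculated from a large pool of samples, the Λ(r) in (3.5) becomes the ratio of two Gaussian densities, $\Lambda(r) = \frac{\sigma_0}{\sigma_1}\exp\left[\frac{(r-\mu_0)^2}{2\sigma_0^2} - \frac{(r-\mu_1)^2}{2\sigma_1^2}\right]$. As a result, (3.6) and (3.7) can be further simplified to
$$P_F = \int_{\tau}^{\infty} p_0(y)\,dy = \int_{\tau}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left[-\frac{(y-\mu_0)^2}{2\sigma_0^2}\right] dy \qquad (3.17)$$
$$P_D = \int_{\tau}^{\infty} p_1(y)\,dy = \int_{\tau}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[-\frac{(y-\mu_1)^2}{2\sigma_1^2}\right] dy \qquad (3.18)$$
where p0(y) and p1(y) are two Gaussian distributions with means μ0 and μ1 and variances $\sigma_0^2$ and $\sigma_1^2$, respectively. Furthermore, if both variances $\sigma_0^2$ and $\sigma_1^2$ are set to 1, (3.17) and (3.18) simplify to the most familiar forms:
$$P_F = \int_{\tau}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(y-\mu_0)^2}{2}\right] dy \qquad (3.19)$$
$$P_D = \int_{\tau}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(y-\mu_1)^2}{2}\right] dy \qquad (3.20)$$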
Using Figure 3.3 as an example, a decision can be made by adjusting the threshold τ via (3.19) and (3.20), where the regions corresponding to PD, PF, PM, and PTN are indicated by different shaded areas. For example, PD is the area to the right of the threshold τ under the Gaussian distribution p1(y) when H1 is true, and PF is the area to the right of τ under the Gaussian distribution p0(y) when H0 is true. Conversely, PM is the area to the left of τ under p1(y) when H1 is true, and PTN is the area to the left of τ under p0(y) when H0 is true. It should be noted that the threshold τ is determined by the false alarm rate. If the false alarm rate is upper bounded by β in (3.4) with Gaussian distributions, setting PF = β in (3.17) gives the corresponding τ:
$$\tau = \mu_0 + \sigma_0\,\Phi^{-1}(1-\beta) \qquad (3.21)$$
where Φ(x) is the standard Gaussian cumulative distribution function given by
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{t^2}{2}\right) dt \qquad (3.22)$$
Therefore, the optimal threshold τ for (3.18) is (3.21), which, once the Gaussian parameters are fixed, is determined only by β. We now substitute τ given by (3.21) for τ in (3.18) and obtain the best detection power given by
$$P_D = 1 - \Phi\left(\frac{\tau-\mu_1}{\sigma_1}\right) = 1 - \Phi\left(\frac{\mu_0-\mu_1+\sigma_0\,\Phi^{-1}(1-\beta)}{\sigma_1}\right) \qquad (3.23)$$
Using (3.21) and (3.23) we can plot a Gaussian-fitted ROC curve of PD versus PF = β for real data.
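The Gaussian-fitted procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: Φ is taken to be the standard Gaussian cumulative distribution function, so that PF = β gives τ = μ0 + σ0Φ⁻¹(1 − β); the function name `gaussian_fitted_roc` and its arguments are hypothetical; and the two sample lists stand for detector outputs collected under H0 and H1.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Sequence, Tuple


def gaussian_fitted_roc(
    noise_samples: Sequence[float],
    signal_samples: Sequence[float],
    betas: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (P_F, P_D, tau) triples for each false alarm bound beta in (0, 1)."""
    # Fit one Gaussian per hypothesis from the sample mean and standard deviation.
    mu0, sigma0 = mean(noise_samples), stdev(noise_samples)
    mu1, sigma1 = mean(signal_samples), stdev(signal_samples)
    standard = NormalDist()           # zero mean, unit variance
    under_h1 = NormalDist(mu1, sigma1)

    curve = []
    for beta in betas:
        if not 0.0 < beta < 1.0:
            raise ValueError("each beta must lie strictly between 0 and 1")
        # Threshold that makes the area of p0(y) to the right of tau equal to beta.
        tau = mu0 + sigma0 * standard.inv_cdf(1.0 - beta)
        # Detection power: area of p1(y) to the right of tau.
        p_d = 1.0 - under_h1.cdf(tau)
        curve.append((beta, p_d, tau))
    return curve
```

Sweeping `betas` over a fine grid in (0, 1) traces the fitted curve of PD versus PF = β; the fit is only as good as the Gaussian assumption for the data at hand.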
A major disadvantage of the traditional ROC curve is that both the detector statistic Λ(r) implemented by the Neyman–Pearson detector δNP(r) and the threshold τ are independent parameters that are not specified in the ROC curve of (PD, PF), even though PD and PF actually depend on Λ(r) and τ. Since the value of Λ(r) obtained from a data sample r is generally real valued and represents the detected signal strength present in r, that is, the concentration level of a signal in r, a soft decision must be made directly on Λ(r) by varying the threshold τ instead of using the parameter β imposed on PF. To do so, we introduce a normalized detected signal strength of Λ(r) as
$$\hat{\Lambda}_{\text{normalized}}(r) = \frac{\Lambda(r) - \min_{r}\Lambda(r)}{\max_{r}\Lambda(r) - \min_{r}\Lambda(r)} \qquad (3.24)$$
Using τ as a detection threshold value between 0 and 1 for (3.24), we can further define a normalized Neyman–Pearson detector, denoted by $\delta^{NP}_{\text{normalized}}(r)$, based on $\hat{\Lambda}_{\text{normalized}}(r)$ as follows:
$$\delta^{NP}_{\text{normalized}}(r) = \begin{cases} 1, & \text{if } \hat{\Lambda}_{\text{normalized}}(r) \ge \tau \\ 0, & \text{if } \hat{\Lambda}_{\text{normalized}}(r) < \tau \end{cases} \qquad (3.25)$$
which uses τ as a threshold value to convert the normalized real value of $\hat{\Lambda}_{\text{normalized}}(r)$ to a binary value. Accordingly, a "1" produced by (3.25) indicates that a target is detected; otherwise, there is no target present. By varying τ in (3.25), a family of detectors is generated for target detection, where for each τ the detector produces its own pair of detection rate and false alarm rate, (PD, PF). Therefore, if a third dimension is created to specify the threshold τ that is used to define a detector via (3.25), a 3D ROC curve can be generated and plotted based on three parameters, PD, PF, and τ. With such a 3D ROC curve, three 2D ROC curves can also be generated: the 2D ROC curve of (PD, PF), which is the traditional ROC curve in Figure 3.1, a 2D ROC curve of (PD, τ), and a 2D ROC curve of (PF, τ). To generate a 3D ROC curve for real data, three steps are performed:
(3.28)
Figure 3.6 shows a diagram of relationships among ΩS, ΩSD, ΩM, ΩFA, ΩNS, and ΩNSD.
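The construction of a 3D ROC curve from real data can be sketched as follows. The sketch assumes that the detector outputs are normalized to [0, 1] by min-max scaling, that a sample is declared a target when its normalized value is at least τ, and that ground truth is available per sample. The name `roc_3d` and the evenly spaced threshold grid are illustrative choices, not part of the original procedure.

```python
from typing import List, Sequence, Tuple


def roc_3d(
    scores: Sequence[float],
    truth: Sequence[bool],
    num_thresholds: int = 11,
) -> List[Tuple[float, float, float]]:
    """Return (P_D, P_F, tau) triples for tau evenly spaced on [0, 1]."""
    if len(scores) != len(truth):
        raise ValueError("scores and truth must have the same length")
    if num_thresholds < 2:
        raise ValueError("at least two thresholds are needed")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores must not all be equal")
    # Min-max normalization of the detected signal strength.
    normalized = [(s - lo) / (hi - lo) for s in scores]

    n_signal = sum(1 for t in truth if t)
    n_no_signal = len(truth) - n_signal
    if n_signal == 0 or n_no_signal == 0:
        raise ValueError("both signal and no-signal samples are required")

    points = []
    for k in range(num_thresholds):
        tau = k / (num_thresholds - 1)
        # Samples whose normalized strength reaches tau are declared targets.
        n_d = sum(1 for v, t in zip(normalized, truth) if t and v >= tau)
        n_f = sum(1 for v, t in zip(normalized, truth) if not t and v >= tau)
        points.append((n_d / n_signal, n_f / n_no_signal, tau))
    return points
```

Projecting the returned triples onto each pair of coordinates gives the three 2D curves: (PD, PF), (PD, τ), and (PF, τ).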
In analogy with Section 3.5.2, we can also generate Gaussian fitted 3D ROC curves by the following steps:
(3.31)
(3.32)
(3.33)
(3.34)
The hypothesis testing problem (3.1) considered so far assumes the standard signal detection in noise (SN) model, where hypotheses H0 and H1 represent noise and signal + noise, respectively. There have been studies on extending (3.1) to two scenarios. One is called the signal/background/noise (SBN) model proposed in Thai and Healey (2002), which includes background B as a third signal source described by
A second scenario is called the signal-decomposed interference/noise (SDIN) model suggested in Du and Chang (2004b) and is given by
where the signal source S considered in the SBN model (3.35) is further decomposed into the desired signal source D and the undesired signal source U and the background B considered in the SBN model is included in the interference signal source matrix I.
Using the SDIN model specified by (3.36), we can interpret various commonly used models as follows. When U = ∅, I = ∅, and D = S, the SDIN model reduces to the standard SN model. If D = S and I = B with U = ∅, the SDIN model becomes the SBN model. The SDIN model also allows us to deal with multisignal detection and classification by interpreting the D and U signal sources as a signal source matrix comprising multiple signal sources, with D = d representing the target signal source of interest to be detected and U being the other target signal sources that are of no interest with respect to d.
In order to extend the single-target signal detection-based ROC analysis to the multiple-signal detection model specified by (3.36), we assume that there are p signal sources of interest, $m_1, m_2, \ldots, m_p$. Then the detection rate RD(mj) and false alarm rate RF(mj) for the jth signal source mj are defined by
$$R_D(m_j) = \frac{N_D(m_j)}{N(m_j)} \qquad (3.37)$$
$$R_F(m_j) = \frac{N_F(m_j)}{N - N(m_j)} \qquad (3.38)$$
where ND(mj) is the total number of true pixels that are mj and detected as mj, NF(mj) is the total number of true pixels that are not mj but detected as mj, N(mj) is the total number of pixels that are specified by target signature mj, and N is the total number of pixels in the image. For detection of multiple signal sources, the detection rate/power PD and false alarm rate PF are then replaced by the mean detection rate $\bar{R}_D$ and mean false alarm rate $\bar{R}_F$, respectively, which are defined by taking the mean of RD(mj) and the mean of RF(mj) over all p signal sources as
$$\bar{R}_D = \sum_{j=1}^{p} p(m_j)\,R_D(m_j) \qquad (3.39)$$
$$\bar{R}_F = \sum_{j=1}^{p} p(m_j)\,R_F(m_j) \qquad (3.40)$$
where $p(m_j) = N(m_j) / \sum_{i=1}^{p} N(m_i)$, N(mj) is the total number of pixels that are mj, and $\sum_{i=1}^{p} N(m_i)$ is the total number of all target pixels.
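As a rough illustration of the mean rates, the sketch below assumes that each signal source mj is weighted by its share of the target pixels, N(mj) divided by the total number of target pixels, and that the per-source false alarm rate is normalized by the N − N(mj) pixels that are not mj. Both choices are assumptions made for this example, and the function name `mean_rates` is hypothetical.

```python
from typing import Sequence, Tuple


def mean_rates(
    n_detected: Sequence[int],  # N_D(m_j): pixels that are m_j and detected as m_j
    n_false: Sequence[int],     # N_F(m_j): pixels that are not m_j but detected as m_j
    n_target: Sequence[int],    # N(m_j): pixels specified by signature m_j
    n_total: int,               # N: total number of pixels in the image
) -> Tuple[float, float]:
    """Return (mean detection rate, mean false alarm rate) over all signal sources."""
    if not (len(n_detected) == len(n_false) == len(n_target)):
        raise ValueError("per-source sequences must have the same length")
    total_target = sum(n_target)
    if total_target == 0:
        raise ValueError("at least one target pixel is required")

    mean_rd = 0.0
    mean_rf = 0.0
    for nd, nf, nt in zip(n_detected, n_false, n_target):
        if nt <= 0 or nt >= n_total:
            raise ValueError("each N(m_j) must satisfy 0 < N(m_j) < N")
        weight = nt / total_target           # assumed weight of source m_j
        mean_rd += weight * (nd / nt)        # weighted R_D(m_j)
        mean_rf += weight * (nf / (n_total - nt))  # weighted R_F(m_j)
    return mean_rd, mean_rf
```

With this weighting the mean detection rate reduces to the total number of correctly detected target pixels divided by the total number of target pixels.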
Note that through the normalized detector specified by (3.25) along with (3.39) and (3.40), each of the multiple signals $m_1, m_2, \ldots, m_p$ is detected and classified jointly by one fixed threshold τ shared by all p signal sources, producing a point $(\bar{R}_D, \bar{R}_F)$ in a 2D space, where $\bar{R}_D$ and $\bar{R}_F$ denote the mean detection rate and mean false alarm rate. This is different from the single-signal detection of mj, which uses its own separate threshold τj in (3.25) to produce its own pair (PD, PF). However, such a subtle and crucial difference cannot be seen from the traditional 2D ROC curve of (PD, PF), since the threshold τ is hidden in PD and PF and the curve cannot show its influence on both PD and PF.
Decreasing τ from 1 to 0 in a third dimension results in a 3D mean-ROC curve, which can be used to evaluate the performance of a detector, where the (x,y) coordinate is specified by ($\bar{R}_F$, τ) and the z-axis is specified by $\bar{R}_D$. Using this 3D mean-ROC curve we can further plot three 2D curves for detection performance analysis: a curve of $\bar{R}_D$ versus $\bar{R}_F$, which is the traditional ROC curve, a curve of $\bar{R}_D$ versus τ, and a curve of $\bar{R}_F$ versus τ. Once the 2D ROC curve of ($\bar{R}_D$, $\bar{R}_F$) is generated, the area under the curve is calculated and defined as the detection rate, which can be used to evaluate the effectiveness of a detector. The higher the detection rate, the better the detector.
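The area under a 2D ROC curve can be approximated numerically. The sketch below uses the trapezoidal rule, which is one common choice and not necessarily the one used by the author; it assumes the curve is given as a finite list of (false alarm rate, detection rate) points that includes the end points the caller wants covered, for example (0, 0) and (1, 1). The function name `area_under_roc` is hypothetical.

```python
from typing import Iterable, Tuple


def area_under_roc(points: Iterable[Tuple[float, float]]) -> float:
    """Trapezoidal area under a curve given as (false alarm rate, detection rate) pairs."""
    pts = sorted(points)  # order by false alarm rate, then by detection rate
    if len(pts) < 2:
        raise ValueError("at least two points are required")
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Each segment contributes its width times the average of its two heights.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A detector no better than chance, whose curve is the diagonal from (0, 0) to (1, 1), gives an area of 0.5 under this rule, and larger values indicate better detection.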