without duplication and loss. A distinguishing feature of [14, 13] is that the
authors treat the data as a graph instead of the tree that was assumed in the
previous work [21].
There are some other works that apply generalization and suppression
strategies to tackle the privacy-preserving issue [23, 10]. The basic idea of
[10] is to apply a top-down approach to traverse all the candidate
generalizations. Because this process reverses the bottom-up one, it
decreases privacy and increases the accuracy of the data as the candidates
are traversed. As stated in [10], the method can control the process so as
to obey the k-anonymity rule. In a later paper [23], the authors introduced
several complementary strategies, e.g., bottom-up generalization, to further
improve the overall performance.
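As a rough illustration of the top-down idea, the sketch below specializes a single quasi-identifier step by step as long as k-anonymity still holds. The hierarchy (masking trailing ZIP-code digits), the helper names, and the sample data are all hypothetical, not taken from [10]:

```python
from collections import Counter

def generalize(zipcode: str, level: int) -> str:
    """Mask the last `level` digits of a ZIP code with '*' (toy hierarchy)."""
    if level == 0:
        return zipcode
    return zipcode[:-level] + "*" * level

def is_k_anonymous(values, k):
    """Every generalized value must occur at least k times."""
    return all(c >= k for c in Counter(values).values())

def top_down_specialize(zipcodes, k):
    """Start fully generalized, then specialize (lower the level) as long
    as k-anonymity still holds -- mirroring a top-down traversal of the
    candidate generalizations."""
    level = len(zipcodes[0])                 # fully masked
    while level > 0:
        candidate = [generalize(z, level - 1) for z in zipcodes]
        if is_k_anonymous(candidate, k):
            level -= 1                       # more accurate, still private
        else:
            break
    return [generalize(z, level) for z in zipcodes]

data = ["13053", "13068", "13053", "14850"]
print(top_down_specialize(data, k=2))        # → ['1****', '1****', '1****', '1****']
```

Here the loop stops at the first specialization that would split off a group smaller than k, which is exactly why moving top-down trades privacy for accuracy in a controlled way.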
The essential step in generating the candidate generalizations is to
traverse all the subspaces of the multiple dimensions. As a result,
randomized search methods such as genetic algorithms or simulated annealing
can be used to tackle the issue. Iyengar [11] introduced a genetic-algorithm-based
strategy to transform the original data into a k-anonymous model. In another
work [24], the authors proposed a simulated annealing algorithm to address
the problem.
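To make the search idea concrete, the following is a minimal simulated-annealing sketch, not the actual algorithm of [24]: the state is simply a pair of per-attribute generalization levels, and the cost adds a large penalty whenever the generalized table violates k-anonymity. All records, level hierarchies, and parameters are illustrative assumptions:

```python
import math
import random
from collections import Counter

random.seed(0)

# Hypothetical microdata: (ZipCode, Age) pairs.
RECORDS = [("13053", 28), ("13068", 29), ("13068", 21), ("13053", 23),
           ("14853", 50), ("14853", 55), ("14850", 47), ("14850", 49)]
K = 4
ZIP_MAX, AGE_MAX = 5, 2          # maximum generalization level per attribute
AGE_WIDTHS = (1, 10, 100)        # age bucket width at each level

def generalized(state):
    zl, al = state
    w = AGE_WIDTHS[al]
    return [(z[:len(z) - zl] + "*" * zl, a // w) for z, a in RECORDS]

def cost(state):
    """Information loss (sum of levels), plus a large penalty
    whenever the generalized table is not k-anonymous."""
    groups = Counter(generalized(state))
    penalty = 0 if all(c >= K for c in groups.values()) else 100
    return state[0] + state[1] + penalty

def anneal(steps=500, temp=2.0, cooling=0.99):
    state = (ZIP_MAX, AGE_MAX)               # start fully generalized
    for _ in range(steps):
        i = random.randrange(2)              # pick one attribute
        nxt = list(state)
        nxt[i] = min(max(nxt[i] + random.choice((-1, 1)), 0),
                     (ZIP_MAX, AGE_MAX)[i])
        nxt = tuple(nxt)
        if nxt != state:
            d = cost(nxt) - cost(state)
            if d <= 0 or random.random() < math.exp(-d / temp):
                state = nxt                  # accept downhill, sometimes uphill
        temp *= cooling
    return state

best = anneal()
print(best, cost(best))
```

Starting from the fully generalized (always feasible) table means the walk can only ever trade generalization for accuracy while the penalty keeps it k-anonymous.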
In addition to the commonly used strategies, i.e., generalization and
suppression, some other techniques have been proposed, such as the cluster-based
approaches [8, 2, 3]. The basic idea of these works is that the records
are first clustered and each cluster is represented by some representative
value (e.g., the average value). With the help of these pseudo data, privacy can
be effectively preserved while the aggregate characteristics of the original
data are well preserved. However, how to measure the trade-off between
privacy and the preserved data information remains an open issue.
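A toy sketch of this cluster-and-replace idea follows; the greedy size-based grouping below merely stands in for whatever clustering method the cited works actually use, and the records are made up:

```python
from statistics import mean

# Hypothetical numeric microdata: each record is (age, income).
records = [(23, 41000), (25, 43000), (27, 40000),
           (51, 90000), (55, 98000), (58, 95000)]

def cluster_by_size(sorted_records, size):
    """Greedy clustering: sort the records, then cut them into
    consecutive groups of `size` members (a stand-in for a real
    clustering algorithm)."""
    return [sorted_records[i:i + size]
            for i in range(0, len(sorted_records), size)]

def anonymize(records, size=3):
    clusters = cluster_by_size(sorted(records), size)
    pseudo = []
    for c in clusters:
        rep = tuple(round(mean(col)) for col in zip(*c))  # cluster centroid
        pseudo.extend([rep] * len(c))   # every member reports the centroid
    return pseudo

print(anonymize(records))
# → [(25, 41333), (25, 41333), (25, 41333), (55, 94333), (55, 94333), (55, 94333)]
```

Each published record is now indistinguishable from the others in its cluster, yet per-cluster averages are preserved exactly, which is the aggregation property the text refers to.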
[Figure: two tables over the non-sensitive attributes ZipCode, Age, and
Nationality and the sensitive attribute Condition. Table (a) shows the
original inpatient microdata, with nationalities Russian, American,
Japanese, and Indian and Condition values Heart Disease, Viral Infection,
and Cancer; table (b) shows the anonymous inpatient microdata, in which the
quasi-identifier values are generalized and one equivalence class contains
only Cancer.]
Figure 9.2.1: Example for l-diversity [17]
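The property the figure illustrates can be checked mechanically. The sketch below tests distinct l-diversity (each equivalence class must contain at least l distinct sensitive values) on a toy table mirroring the figure's equivalence classes; the group labels are hypothetical stand-ins for the generalized quasi-identifier tuples:

```python
from collections import defaultdict

# Toy rows: (equivalence-class label, sensitive Condition value),
# mirroring the three 4-record classes of the figure.
table = [("g1", "HeartDisease"), ("g1", "HeartDisease"),
         ("g1", "ViralInfection"), ("g1", "ViralInfection"),
         ("g2", "Cancer"), ("g2", "HeartDisease"),
         ("g2", "ViralInfection"), ("g2", "ViralInfection"),
         ("g3", "Cancer"), ("g3", "Cancer"),
         ("g3", "Cancer"), ("g3", "Cancer")]

def is_l_diverse(rows, l):
    """Distinct l-diversity: every equivalence class must contain
    at least l distinct sensitive values."""
    classes = defaultdict(set)
    for group, sensitive in rows:
        classes[group].add(sensitive)
    return all(len(s) >= l for s in classes.values())

print(is_l_diverse(table, 2))  # → False: class g3 holds only 'Cancer'
```

This is exactly why k-anonymity alone is not enough: the table is 4-anonymous, yet anyone linked to the all-Cancer class has their condition fully disclosed.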
Using views appropriately is another technique for protecting privacy. The
basic idea is to expose only a small part of the views (in which the sensitive
attributes can be controlled) to the public. However, this approach may fail if
we unintentionally publish some important part of the views, which leads to
Privacy Preserving in Data Mining 207