index

A

ABAC (attribute-based access control) 229, 279

access control

implementing fine-grained 229

supporting attribute-level 228

types of 279

accuracy 218–219

additive homomorphic encryption 255

adult dataset 66, 69

age attribute 213

agglomerative hierarchical clustering 168

AI (artificial intelligence) 35

AMI (adjusted mutual information) score 75

anonymization 151–155

beyond k-anonymity 154–155

private information sharing vs. privacy concerns 151–152

using k-anonymity against re-identification attacks 152–154

ARI (adjusted rand index) 75

association rule hiding 20, 217–218

attacks 8–16

challenges of privacy protection in big data analytics 15–16

correlation attacks 16

identification attacks 16

de-anonymization or re-identification attacks 15

membership inference attacks 13–15

model inversion attacks 12–13

on database systems 222–225

targeting data confidentiality 223–224

targeting data privacy 224–225

problem of private data in 9

reconstruction attacks 9–12

attacker's perspective of 10–11

real-world scenario involving 11–12

attribute-based access control (ABAC) 229, 279

attribute-level access control 228

Australian dataset 172

AVERAGE query 286

B

big data analytics 15–16

correlation attacks 16

identification attacks 16

binary mechanism (randomized response) 35–37

binary response 35

binning method 138

bins 138

blinding 258

breast cancer dataset 172

budget, privacy 31–32

C

CA (continuous authentication) applications 256

categorical values 191

cloud-based storage 280–282

CNAE dataset 86, 88

complex statistics 25

composition properties 51–55, 296–298

parallel composition 54–55, 297

sequential composition 51–52, 296–297

compressive privacy 21–22, 237

concrete attacks 224

confidence parameter 217

confidentiality of data 223–224

continuous values 191–193

continuous variables 132

correlation attacks 16, 225

counting queries

Laplace mechanism 38–39

sequential composition of 52–54

CP (compressive privacy) 233–267, 269

mechanisms 237–239

other dimensionality reduction methods 238–239

PCA 237

overview 235–236

privacy-preserving PCA and DCA on horizontally partitioned data 251–266

achieving privacy preservation on horizontally partitioned data 253–254

evaluating efficiency and accuracy of 263–266

how privacy-preserving computation works 258–263

overview of proposed approach 256–257

recapping dimensionality reduction approaches 254–255

using additive homomorphic encryption 255

using for ML applications 239–251

accuracy of utility task 246–248

effect of p' in DCA 249–251

implementing compressive privacy 240–246

create operation 278

CRUD (create, read, update, and delete) operations 286

cryptographic-based approaches 284

CSP (crypto service provider) 253, 255, 257

CSP (crypto service provider)-based trust model 276–277

D

DAC (discretionary access control) 279

data

horizontally partitioned data

privacy-preserving PCA and DCA on 251–266

how data is processed inside ML algorithms 6

problem of private data in clear 9

publishing data 186–200

implementing data sanitization operations in Python 189–193

k-anonymity 193–198

storage with NoSQL database 280–282

synthetic data

evaluating performance of 171–176

generating 169–170

use of private data 6

data flows 231

data generator 169

data management 202–232

database systems 220–231

attacks on 222–225

considerations for designing customizable privacy-preserving database system 228–231

how likely to leak private information 222

SDB systems 225–227

threats and vulnerabilities 221–222

modifying data mining output 216–220

association rule hiding 217–218

inference control in statistical databases 219–220

reducing accuracy of data mining operation 218–219

privacy protection beyond k-anonymity 204–215

implementing privacy models with Python 211–215

l-diversity 205–207

t-closeness 208–211

privacy protection in data processing and mining 203–204

data mining 19–21, 179–201

different approaches to data publishing 20

how to protect privacy on data mining algorithms 20–21

importance of privacy preservation in 180–182

modifying input 185–186

modifying output 216–220

association rule hiding 217–218

inference control in statistical databases 219–220

reducing accuracy of data mining operation 218–219

on privacy-preserving data collection 20

privacy protection in 183–184, 203–204

impact of privacy regulatory requirements 184

what is data mining and how can it help 183

publishing data 186–200

implementing data sanitization operations in Python 189–193

k-anonymity 193–198

data owner 4

data perturbation approach 226

data preprocessing 169

data privacy 224–225

data processing and mining 204

database systems 220–231

attacks on 222–225

targeting data confidentiality 223–224

targeting data privacy 224–225

considerations for designing customizable privacy-preserving database system 228–231

implementing fine-grained access control to data 229

keeping rich set of privacy-related metadata 228

maintaining privacy-preserving information flow 231

protecting against insider attacks 231

supporting attribute-level access control mechanisms 228

how likely to leak private information 222

SDB systems 225–227

threats and vulnerabilities 221–222

data protection schemes currently employed by industry 221

privacy assurance as challenge 221–222

DataHub 268–289

integrating privacy and security technologies into 280–288

data storage with cloud-based secure NoSQL database 280–282

PPML 284–286

privacy-preserving data collection with LDP 282–284

privacy-preserving query processing 286–287

using synthetic data generation 287–288

research collaboration workspace 272–280

architectural design 275–276

blending different trust models 276–278

configuring access control mechanisms 278–280

significance of research data protection and sharing platform 270–272

important features 271–272

motivation behind DataHub 270–271

DCA (discriminant component analysis) 143

effect of p' in 249–251

on horizontally partitioned data 251–266

achieving privacy preservation on horizontally partitioned data 253–254

evaluating efficiency and accuracy of 263–266

how privacy-preserving computation works 258–263

overview of proposed approach 256–257

recapping dimensionality reduction approaches 254–255

using additive homomorphic encryption 255

DE (direct encoding) 104, 140

de-anonymization 15

delete operation 278

diabetes dataset 172

differential privacy. See DP

differentially private distributed PCA (DPDPCA) protocol 85

diffprivlib (IBM's Differential Privacy Library) 41, 67

dimensionality reduction (DR) 238–239, 254–255

dims array 242

direct encoding (DE) 104–110, 140

discrete variables 132

discretionary access control (DAC) 279

discretionary models 279

discriminant component analysis. See DCA

displayImage routine 241

DLPA (distributed Laplace perturbation algorithm) 287

downgrading classifier effectiveness 20

DP (differential privacy) 15, 17–18, 25–55, 234–235, 268, 285, 291–298

composition properties of 296–298

parallel composition DP 297

sequential composition DP 296–297

concept of 27–29

for synthetic data generation 155–167

DP synthetic histogram representation generation 156–159

DP synthetic multi-marginal data generation 162–167

DP synthetic tabular data generation 160–162

formal definition of 291–292

how it works 30–35

formulating solution for private company scenario 32–35

privacy budget 31–32

sensitivity of 30–31

mechanisms 35–48, 292–295

binary mechanism (randomized response) 35–37

exponential mechanism 43–48

Gaussian mechanism 293–294

geometric mechanism 293

Laplace mechanism 38–43

staircase mechanism 294

vector mechanism 295

Wishart mechanism 295

properties of 48–55

composition properties 51–55

group privacy property 50–51

postprocessing property 48–49

DP sanitizer 169

DPDPCA (differentially private distributed PCA) protocol 85, 285

DR (dimensionality reduction) 238239, 254255

E

EDBs (encrypted databases) 224

education-num attribute 213

EFB (Equal Frequency Binning) 138

EMD (earth mover distance) 209

empirical risk 61

epsilon (ϵ) 32

European Union’s GDPR (General Data Protection Regulation) 8

EVD (eigenvalue decomposition) 237, 259

EWB (Equal-Width Binning) 138

EWD (Equal-Width Discretization) 137

exponential mechanism 43–48

F

feature-level clustering 170

feature-level micro-aggregation case study 168–176

evaluating performance of generated synthetic data 171–176

datasets used for experiments 172

performance evaluation and results 172–176

generating synthetic data 169–170

concepts 170

how data preprocessing works 169–170

using hierarchical clustering and micro-aggregation 168–169

FERPA (Family Educational Rights and Privacy Act) 8

fine-grained access control 229

fit command 242

full system compromise 224

G

Gaussian mechanism 293–294

GDPR (General Data Protection Regulation) 8

generalization technique 152

geometric mechanism 293

GEVD (generalized eigenvalue decomposition) 262

GISETTE dataset 86, 88

Glasses dataset 240

grayscale representation scheme 10

group privacy property 50–51

H

hierarchical clustering 168–169

HIPAA (Health Insurance Portability and Accountability Act of 1996) 8

histograms

DP synthetic histogram representation generation 156–159

encoding 110–117

SHE 113–114

THE 114–117

queries 40–41

HMAC (hash-based message authentication code) 281

horizontally partitioned data

privacy-preserving PCA and DCA on 251–266

achieving privacy preservation on horizontally partitioned data 253–254

evaluating efficiency and accuracy of 263–266

how privacy-preserving computation works 258–263

overview of proposed approach 256–257

recapping dimensionality reduction approaches 254–255

using additive homomorphic encryption 255

I

identification attacks 16, 225

inference control 219–220

inferring membership 14–15

information

maintaining privacy-preserving information flow 231

private information sharing vs. privacy concerns 151–152

injection attacks 223

input modification 185186

insider attacks 231

ISOLET dataset 86

K

k-anonymity 18–19, 193–198, 204–215, 287

anonymization beyond 154–155

does not always work 195–198

implementing in Python 198–200

implementing privacy models with Python 211–215

l-diversity 205–207

t-closeness 208–211

using against re-identification attacks 152–154

what is k and how to apply 194–195

k-means++ initialization 76

KMeans function 76

Kogan, Aleksandr 4

L

l-diversity 18, 205–207, 287

Laplace mechanism 38–43

counting queries 38–39

histogram queries 40–41

LDA (linear discriminant analysis) 59, 255

LDP (local differential privacy) 18, 95–122

concept of 97–101

in detail 98–100

scenario with survey 100–101

mechanisms 104–121

direct encoding 104–110

histogram encoding 110–117

unary encoding 117–121

privacy-preserving data collection with 282–284

randomized response for 101–104

leaking information

database systems 222

l-diversity 206–207

LIBSVM dataset repository 86

linear discriminant analysis (LDA) 59, 255

load_digits dataset 75

M

MAC (mandatory access control) 279

machine learning. See ML

MAX query 219

MCDO (multiple-class data owners) 260–261

MDA (multiple discriminant analysis) 59

MDAV (maximum distance to average record) 169

MDR (multiclass discriminant ratio) 239, 255

mean 65

mean squared error (MSE) 172

membership inference attacks 13–15

metadata, privacy-related 228

MGGM (multivariate Gaussian generative model) 287

micro-aggregation 168169

minutiae representation scheme 10

ML (machine learning) 3–25, 124, 146, 269

PPML 16–22

compressive privacy 21–22

DP 17–18

LDP 18

privacy-preserving data mining techniques 19–21

privacy-preserving synthetic data generation 18–19

privacy complications in AI era 4–5

threat of learning beyond intended purpose 5–8

how data is processed inside ML algorithms 6

importance of privacy protection in ML 7

regulatory requirements and utility vs. privacy tradeoff 7–8

use of private data 6

threats and attacks for 8–16

challenges of privacy protection in big data analytics 15–16

de-anonymization or re-identification attacks 15

membership inference attacks 13–15

model inversion attacks 12–13

problem of private data in clear 9

reconstruction attacks 9–12

using CP for 239–251

accuracy of utility task 246–248

effect of p' in DCA 249–251

implementing 240–246

MLaaS (Machine Learning as a Service) 4

model inversion attacks 12–13

models.GaussianNB module 67

MSE (mean squared error) 172

multi-marginal data generation, DP synthetic 162–167

multiple-class data owners (MCDO) 260–261

multivariate Gaussian generative model (MGGM) 287

mydca object 242

mypca object 242

N

noise addition techniques 226

noise information 238

noise matrix S_W 260

non-CSP-based trust model 277–278

NoSQL database 280–282

ntests variable 246

O

object 278

OLAP (online analytical processing) 225

Olivetti faces dataset 240

operation 278

ORDER BY query 286

OSDC (Open Science Data Cloud) 272

OT (oblivious transfer) techniques 254

OUE (optimal unary encoding) 117, 140

output perturbation 226

P

p' 249–251

parallel composition 54–55, 297

PCA (principal component analysis) 143, 237, 284

on horizontally partitioned data 251–266

achieving privacy preservation on horizontally partitioned data 253–254

evaluating efficiency and accuracy of 263–266

how privacy-preserving computation works 258–263

overview of proposed approach 256–257

recapping dimensionality reduction approaches 254–255

using additive homomorphic encryption 255

PCI DSS (Payment Card Industry Data Security Standard) 8

perturbation-based approaches 285–286

phase representation scheme 10

pip command 211

platform sharing 270–272

important features 271–272

motivation behind DataHub 270–271

plausible deniability 19

postprocessing property 48–49

PPDM (privacy-preserving data mining) 19

PPE (property-preserving encryption) 224

PPML (privacy-preserving machine learning) 16–22, 284–286

compressive privacy 21–22

cryptographic-based approaches 284

DP 17–18

LDP 18

perturbation-based approaches 285–286

privacy-preserving data mining techniques 19–21

different approaches to data publishing 20

how to protect privacy on data mining algorithms 20–21

techniques on privacy-preserving data collection 20

privacy-preserving synthetic data generation 18–19

preprocessing data 169–170

principal component analysis. See PCA

principal components 237

privacy 3–24

PPML 16–22

compressive privacy 21–22

DP 17–18

LDP 18

privacy-preserving data mining techniques 19–21

privacy-preserving synthetic data generation 18–19

privacy complications in AI era 4–5

threat of learning beyond intended purpose 5–8

how data is processed inside ML algorithms 6

importance of privacy protection in ML 7

regulatory requirements and utility vs. privacy tradeoff 7–8

use of private data 6

threats and attacks for ML systems 8–16

challenges of privacy protection in big data analytics 15–16

de-anonymization or re-identification attacks 15

membership inference attacks 13–15

model inversion attacks 12–13

problem of private data in clear 9

reconstruction attacks 9–12

privacy budget 30, 32

processing, data 203–204

proximity matrix 168

proxy 78

publishing data 186–200

implementing data sanitization operations in Python 189–193

working with categorical values 191

working with continuous values 191–193

k-anonymity 193–198

implementing in Python 198–200

k-anonymity does not always work 195–198

what is k and how to apply 194–195

Python

data sanitization operations in 189–193

implementing privacy models with 211–215

k-anonymity implementation in 198–200

Q

queries

counting

Laplace mechanism 38–39

sequential composition of 52–54

privacy-preserving query processing 286–287

query (or data) restriction technique 227

query auditing and restriction 21

R

random initializations 76

randomized response (RR) 18, 101–104

randrange function 241

RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response) 5, 97

RBAC (role-based access control) 228, 279–280

re-identification attacks 152

overview 15

using k-anonymity against 152–154

read operation 278

reconstruction (leakage-abuse) attacks 224

reconstruction attacks 9–12

attacker's perspective of 10–11

real-world scenario involving 11–12

regulatory requirements 7–8, 184

research collaboration workspace 272–280

architectural design 275–276

blending different trust models 276–278

CSP-based trust model 276–277

non-CSP-based trust model 277–278

configuring access control mechanisms 278–280

research data protection 270–272

important features 271–272

motivation behind DataHub 270–271

rho parameter 242

rho_p parameter 242, 249

role-based access control (RBAC) 228, 280

RR (randomized response) 18

run operation 278

S

sample space 62

sample-level clustering 170

sanitization operations 189193

working with categorical values 191

working with continuous values 191193

SCDO (single-class data owner) 260–261

scikit-learn load_digits dataset 75

score function 44

SDB (statistical database) systems

inference control in 219–220

privacy-preserving techniques in 225–227

sensitivity, DP 30–31

sequential composition 51–52, 296–297

SHE (summation with histogram encoding) 113–114, 140

signal information 238

signal matrix S_B 260

skeleton representation scheme 10

skewness attack 207

snapshot leaks 224

spectral decomposition of the center-adjusted scatter matrix 237

staircase mechanism 294

statistic extraction 169

subject 278

SUE (symmetric unary encoding) 117, 121, 140

SUM query 219, 286

support parameter 217

suppression technique 152–153

SVMs (support vector machines) 62, 172

synthetic data generation 18–19, 146–176

application aspects of using for privacy preservation 149–150

assuring privacy via data anonymization 151–155

anonymization beyond k-anonymity 154–155

private information sharing vs. privacy concerns 151–152

using k-anonymity against re-identification attacks 152–154

DP for 155–167

DP synthetic histogram representation generation 156–159

DP synthetic multi-marginal data generation 162–167

DP synthetic tabular data generation 160–162

importance of 148–149

in DataHub platform 287–288

private synthetic data release via feature-level micro-aggregation case study 168–176

evaluating performance of generated synthetic data 171–176

generating synthetic data 169–170

using hierarchical clustering and micro-aggregation 168–169

process of 150–151

synthetic multi-marginal data 163

T

t-closeness 18, 208–211, 287

tabular data generation, DP synthetic 160–162

TDE (Transparent Data Encryption) 221

THE (thresholding with histogram encoding) 114–117, 140

“This Is Your Digital Life” quiz (Kogan) 4

threats, database systems 221–222

TLS (Transport Layer Security) 221

trust models 276–278

CSP-based trust model 276–277

non-CSP-based trust model 277–278

U

UCI ML repository 86, 141

unary encoding 117–121

update operation 278

utility function 44

utility task feature space 236

V

variance 65

vector mechanism 295

view 229–230

VM image leakage attacks 224

VMs (virtual machines) 222, 224, 271

VoIP (voice over IP) 221

VPNs (virtual private networks) 221

vulnerabilities, database systems 221–222

W

W3C’s (World Wide Web Consortium’s) P3P (Platform for Privacy Preferences Project) 228

Wishart mechanism 295

Y

Yale Face Database 240
