A
Access control
content-based, 478
manager, 399
and policy ontology modeling, 326
system, 408
Access control policies, 15, 16
attribute-based access control, 19
authorization-based access control policies, 16–18
role-based access control, 18–19
usage control, 19
Access Token List (AT-list), 283
Access Tokens (AT), 282
Access Token Tuples (ATT), 282
Accuracy-weighted classifier ensembles (AWEs), 343
Actual data, 357
ADABOOST.PL algorithms, 193
ADCi, see Aggregated dissimilarity count
“Added error”, 117
Administration policies, 20; see also Access control policies
Advanced CPT system, 382–383, 384
Advanced Encryption Standard (AES), 331, 333, 336
Adversaries, 403
adversarial data miner, 400
Aerosol optical depth (AOD), 435
AES, see Advanced Encryption Standard
AFOSR, see Air Force Office of Scientific Research
Aggregated dissimilarity count (ADCi), 153
Aggregate object, 521
Airavat, 440
Aircraft equipment problem, 161
Air Force Office of Scientific Research (AFOSR), 308
Air quality data, 435
AIS, see Assured information sharing
ALKH12b approach, 193
Amazon, 332
Amazon S3, 331
integrating blackbook with Amazon S3, 331–335
server, 336
Amazon Web Services (AWS), 83
DynamoDB Accelerator, 83
Android applications, 418
Android WindowManager, 424
ANNs, see Artificial neural networks
Anomalies, 189
anomalous data generation process, 228
Anomaly detection, 47, 190, 223–224, 227, 414; see also Graph-based anomaly detection (GBAD)
over IoT network traffic, 411
in social network and author attribution, 252–253
AOD, see Aerosol optical depth
Apache
Accumulo, 440
Apache-distributed file system, 309–310
Apache Pig, 80
Flink, 80
HDFS, 51
Kafka, 80
Mahout, 83
APIs, see Application program interfaces
Apple’s official scrutinization process, 418
Application program interfaces (APIs), 81, 324, 415, 447
Android, 418
Cyber security, 46
framework relationship, 502
independent integrity constraints, 515
of SNOD, 300
App Store, 421
AprioriTid, 35
Arbitrary model, 221
Area under ROC curve (AUC), 163
ARM, see Association rule mining
ARMA process, 409
Artificial drift, 228
Artificial neural networks (ANNs), 27, 28–31
ASRS, see Aviation Safety Reporting Systems
Association rule mining (ARM), 27, 35–37
Association rules, 35
Assured information sharing (AIS), 307
algorithm, 35
prototypes, 324
AT, see Access Tokens
AT-list, see Access Token List
Attack; see also Cyber security (CyS)
BDMA for preventing cyber attacks, 480
collusion, 252
computer, 47
covert channel, 420
on critical infrastructures, 45–46
data, 403
host-based, 47
network-based, 47
types, 142
AT&T Network (ATT), 403
Attribute-based access control, 15, 19
AUC, see Area under ROC curve
Audio signal, 409
Authorization-based access control policies, 16
conflict resolution, 17
consistency and completeness of rules, 18
negative authorization, 17
positive authorization, 17
propagation of authorization rules, 17
strong and weak authorization, 17
Authorization rules propagation, 17
Automatic image annotation, 39–40
Automatic vehicle location techniques, 405
Autonomy, 520
Availability, 47, 56, 79, 83, 193, 379, 380
Aviation Safety Reporting Systems (ASRS), 100, 161–162, 173
AWEs, see Accuracy-weighted classifier ensembles
AWS, see Amazon Web Services
B
Background
check score computation, 295
Backup and recovery, 517
Baseline
frameworks, 279
Basic graph patterns (BGPs), 64, 312
Basic Security Mode auditing program (BSM auditing program), 207
Batch learning
algorithms, 94
techniques, 106
BDMA, see Big data management and analytics
BDSP, see Big data security and privacy
Behavior(al)
analysis, 414
detection mechanisms, 414
detector, 414
signatures, 414
Behavioral feature extraction and analysis, 415
classification model development, 417
evolving data stream classification, 416–417
graph-based behavior analysis, 415–416
sequence-based behavior analysis, 416
Bestplan problem, 275
BGPs, see Basic graph patterns
Big data, 1, 173, 331, 394–396, 470
access control and privacy policy challenges in, 474
big dataset for insider threat detection, 244–245
business intelligence meets, 473
CPT within context of big data and social networks, 388–390
extensions for big data-based social media applications, 326–327
formal methods for preserving privacy while loading, 473
integrity for, 396
issues, 184
problem, 494
results for big data set relating to insider threat detection, 245
securing big data in cloud, 473
stream mining as big data mining problem, 253
techniques for scalability, 192–193
technologies, 15
big data and cloud for malware detection, 340
binary n-grams, 351
design and implementation of system, 344
EMPC, 352
empirical error reduction and time complexity, 345
ensemble construction and updating, 344
error reduction analysis, 344–345
experiments, 349
Hadoop/MapReduce framework, 345–347
for insider threat detection, 454
malicious code detection, 347–349
privacy aware, 473
related work, 303–304, 342–344
security, and privacy, 5
for security applications, 472
and security layer, 9
for social media applications, 290
techniques, 181
Big data management
and cloud for assured information sharing, 308
commercial developments, 326
experiments, 336
extensions for big data-based social media applications, 326–327
formal policy analysis, 321
implementation approach, 321
integrating blackbook with Amazon S3, 331–335
overall related research, 324–326
related work, 321
system design, 309
Big data management and analytics (BDMA), 1, 9, 13, 79, 261, 339, 373, 377, 453, 469, 470–471, 483, 485
Apache Cassandra, 82
Apache CouchDB, 82
Apache HBase, 82
Apache Hive, 81
Apache Mahout, 83
curriculum development, 455–457
educational program and experimental infrastructure, 454
education and infrastructure program, 455
experimental BDMA systems, 4
Google BigQuery, 81
Google BigTable, 82
infrastructure tools to host BDMA systems, 79–80
layered framework, 486
MongoDB, 82
NoSQL database, 81
supporting technologies for, 2–3
systems and tools, 80
technologies, 173
Weka, 83
Big data security and privacy (BDSP), 1, 9, 13, 261, 339, 373, 377, 453, 469, 485
big data analytics for security applications, 472
capstone course on, 461
community building, 472
educational program and experimental infrastructure, 454
experimental BDSP systems, 4
issues in, 469
layered framework, 486
philosophy for, 475
research issues, 470
supporting technologies for, 2–3
Big data systems, 379
integrity for big data, 396
integrity management, 391, 394–396
BigQuery, 81
BigSecret system, 440
Big streaming analytics framework, 409
BigTable, 193
Binary classification, 31
Binary code analysis, 427, 455, 461
Binary signatures, 414
BioMANTA project, 193
Biometrics techniques, 20
BitMat, 267
Bit string, 476
Blackbook integration with Amazon S3, 331–335
BLS, see U.S. Bureau of Labor and Statistics
BOAT, see Bootstrapped optimistic decision tree
Bootstrapped optimistic decision tree (BOAT), 106
Botnet, 122
Bots, 350
Breaking ties by summary statistics, 277–278
BSM auditing program, see Basic Security Mode auditing program
Buffer (buf), 134
C
CAISS, see Cloud-based information sharing system
CAISS++, see Ideal cloud-based assured information sharing system
Calgary dataset, 227
CapEx, see Capital expenditure
Capital expenditure (CapEx), 332
Capstone BDMA course, 456
Capstone course
on BDSP, 461
on secure mobile computing, 428
Cassandra, Apache, 57, 82, 436, 438
CBIR, see Content-based image retrieval
CCA, see Computer Corporation of America
Centralized architecture, 509
Centroid (µ), 133
Chief Information Officer (CIO), 307
Chi square statistic, 72
Chronic obstructive pulmonary disease (COPD), 435
Chunking, 439
“chunk-based” approaches, 417
CIE, see Confidentiality inference engine
CIO, see Chief Information Officer
Classes, 59
analysis and discussion, 137
deviation between approximate and exact q-NSC computation, 138–140
justification of novel class detection algorithm, 137–138
with novel class detection, 133, 134–137
problem, 27
time and space complexity, 140–141
Classifier-based data mining technique, 171
Classify(M,xj,buf) algorithm, 133–134
Class/subclass hierarchy, 521
Client–server approach, 518
Client–server architectures, 493, 496
Cloud-based information sharing system (CAISS), 309–311, 373, 489
Cloud, 409
cloud-centric policy manager, 308
cloud-design of InXite to handle big data, 301–302
cloud-enabled NoSQL systems, 56
data systems, 379
deployment models, 53
development and security, 428
provider, 55
storage and data management, 54–55
Cloudant, 84
Cloud computing, 51, 173, 237, 263, 307, 331, 332
cloud storage and data management, 54–55
components, 52
framework, 173
frameworks based on semantic web technologies, 63–65
for malware detection, 341
technologies, 52
Cloud platforms, 83
Amazon Web Services’ DynamoDB, 83
Google’s cloud-based big data solutions, 84
IBM’s cloud-based big data solutions, 84
Microsoft Azure’s Cosmos DB, 83–84
Cloud query processing system for big data management
approach, 264
cloud computing, 263
contributions, 265
experimental setup, 264, 279–280
results, 279
for integrity management, 394
models, 54
algorithm, 39
cluster-impurity, 109
techniques, 28
CM, see Compression method
CMRJ, see Conflicting MapReduceJoins
CNSIL, see Computer Networks and Security Instructional Lab
Collusion attack, 252
Command sequences (cseq), 244
Communication
data, 410
devices, 405
energy-efficient, 410
small communication frames, 407
wireless communication networks, 404
Community building, 472
Complete elimination, 275
Completely labeled training data, 94
Complexity
analysis, 224
of Bestplan, 276
of inference engine, 365
Compound impurity-measure, 109–110
Compressed/quantized dictionary construction, 251–252
Compression-based techniques, 203
Compression method (CM), 221
Compression/quantization using MR, 243
Computer attacks, 47
Computer Corporation of America (CCA), 495
Computer Networks and Security Instructional Lab (CNSIL), 427
Concept-adapting very fast decision tree learner (CVFDT), 106
Concept-drifting data streams, 127, 171
classification with novel class detection, 133–141
datasets and experimental setup, 122
ensemble development, 115
error reduction using MPC training, 116–121
evaluation approach, 143
performance study, 122–125, 143
Concept-drifting synthetic dataset (SynD), 161
Concept-evolving synthetic dataset (SynDE), 161, 166
Concept drift, 93–95, 141, 160, 253, 340, 373, 410
issues, 416
in sequence stream, 238
SynDE, 161
synthetic data with, 99
Concept evolution, 93, 95–97, 410
synthetic data with, 99
Concept instantiation, 60
Concept satisfiability, 60
Concept subsumption, 60
Confidentiality, 379
approach to confidentiality management, 384–385
Confidentiality inference engine (CIE), 382–383
Confidentiality, privacy, and trust (CPT), 379, 380, 483, 489
approach to confidentiality management, 384–385
big data systems, 379
within context of big data and social networks, 388–390
framework, 381
integrated system, 387–388, 389
privacy for social media systems, 385–387
trust for social networks, 387
trust, privacy, and confidentiality, 379–381, 383–384
Conflicting MapReduceJoins (CMRJ), 271
resolution, 17
Consistency and completeness of rules, 18
Constraints, 24
constraint-based approaches, 109
Content-based access control, 478
Content-based image retrieval (CBIR), 38
Content-based score computation, 294
Control processing units, 405
Control systems, 405
Conventional data mining, 476
Conventional relational database management system, 436
COPD, see Chronic obstructive pulmonary disease
CoreNLP, 458
Cost estimation for query processing, 270–274
Covert channel attack in mobile apps, 420
CPS, see Cyber-physical systems
CPT, see Confidentiality, privacy, and trust
Credit card fraud, 45
Critical infrastructures, 43
attacks on, 45
security for, 428
CRM, see Customer relationship management
Cryptographic approaches, 476, 477
Cryptographic commitment, 476
cseq, see Command sequences
Curriculum development, 426, 455–457, 460
capstone course on BDSP, 461
capstone course on secure mobile computing, 428
extensions to existing courses, 426–428, 460–461
integration of study modules with existing courses, 460
“Curse of dimensionality”, 39
Customer relationship management (CRM), 332
Cutset networks, 447
CVFDT, see Concept-adapting very fast decision tree learner
Cyber attacks, BDMA for preventing, 480
Cybercrime datasets, 190
Cyber-defense framework, 403
Cyber-physical systems (CPS), 405, 428
security, 455
Cyber-Provenance Infrastructure for Sensor-based Data-Intensive Research (CY-DIR), 454
Cyber security (CyS), 1–2, 43, 459
applications, 46
data mining for, 43
data mining services for cyber security, 47
Cyber signals, 409
CY-DIR, see Cyber-Provenance Infrastructure for Sensor-based Data-Intensive Research
CyS, see Cyber security
D
DaaS, see Data as a Service
DAC, see Discretionary access control
DAG, see Directed acyclic graph
Data; see also Information; Security
accuracy, 393
and applications security, 427, 460
authenticity, 393
classification methods, 447
collection, 363
completeness, 393
confidentiality, 478
gathering, 419
generation and storage, 267–268
lifecycle framework, 479
ownership, 479
points, 217
publication, 479
quality policy, 395
recovery, 393
reduction techniques, 471
reverse engineering of Smartphone applications, 419
sanitization approaches, 476
services, 52
sources, 436
virtualization, 54
warehousing, 523
Data analytics, 436; see also Big data analytics
system, 408
techniques, 471
Data as a Service (DaaS), 53
Database
design process, 511
virtualization, 54
Database administrator (DBA), 20, 511
Database management systems (DBMS), 507, 522–523
autonomy, 520
centralized architecture, 509
database administration, 511–512
distributed databases, 517–518
entity-relationship data models, 507, 508–509
extensible, 511
functional architecture, 510
heterogeneous and federated data management, 518–520
relational data models, 507, 508
three-schema architecture, 510
Database system, 21
technology, 470
Data management systems, 407, 471, 493
building information systems from framework, 500–502
comprehensive view, 496
developments in database systems, 494–497
relationship between texts, 502–504
status, vision, and issues, 497
3D view, 499
Data mining
algorithms, 471
answering queries using Hadoop MapReduce, 74
artificial neural networks, 28–31
challenges, related work, and approach, 68–69
for cyber security, 43
data mining-based malware detectors, 342
data mining applications, 74–75
data mining services for cyber security, 47
feature extraction and compact representation, 70–72
and insider threat detection, 68
for insider threat detection, 69
outcomes, 27
RDF repository architecture, 72–73
support vector machines, 31–32
Data-oblivious learning mechanisms, 465
Data-obliviousness, 464
multiobjective optimization framework for, 476–477
Data security, 16
policy enforcement and related issues, 21–24
security impact on database functions, 25
Dataset(s), 160–162, 279, 349–350
Data-sharing policies, 408
Data stream classification, 93, 149, 172
approach to data stream classification, 105–106
comparison with baseline methods, 163–165
ensemble classification, 107–108, 156–160
experiments, 99–100, 160, 162–163
with limited labeled data, 109–110
MPC ensemble approach, 171–172
network intrusion detection using, 106
novel class detection, 108
and novel class detection in data streams, 172
novelty detection, 108
problems and proposed solutions, 94
running times, scalability, and memory requirement, 165–166
with scarcely labeled data, 172
sensitivity to parameters, 166–168
single-model classification, 106–107
task, 127
training with limited labeled data, 152–156
Data streams, 3, 93, 127, 173, 410, 446, 457, 463
classification and novel class detection in, 172
classifiers, 417
constructing LZW Dictionary by selecting patterns, 221–222
DBA, see Database administrator
DBD, see Duplicate big data
DBMS, see Database management systems
DCS, see Distributed control systems
DDBMS, see Distributed database management system
DDTS, see Distributed Database Testbed System
Decentralized CAISS++, 314–315, 316
Deductive database systems, see Next-generation database systems
Deep learning, 494
Demand management, 404
Demographics-based score computation, 294
Department of Defense (DoD), 307
Description length (DL), 204, 415, 416
Description logics (DL), 59–60, 310, 358
Descriptive tasks, 27
Detecting anomalies, 27
DGSOT, 47
Dictionary construction and compression using single MR, 243–244
Digital Equipment Corporation, 495
Digital forensics, 427
BDMA for, 480
DIM, see Distributed Integrity Manager
Directed acyclic graph (DAG), 362
Discretionary access control (DAC), 323–324
policies, 15
Dissimilarity count, 153
Distance-based techniques, 109
Distributed control systems (DCS), 405
Distributed database management system (DDBMS), 517–518
Distributed database systems, 264
Distributed Database Testbed System (DDTS), 495
Distributed feature extraction and selection, 348–349
Distributed Integrity Manager (DIM), 517
Distributed Metadata Manager (DMM), 517
Distributed processing of SPARQL, 319–320
Distributed processor (DP), 517, 518
Distributed Query Processor (DQP), 517, 518
Distributed reasoners (DRs), 312–313
Distributed reasoning, 325–326
Distributed Security Manager (DSP), 517
Distributed system, 264
Distributed Transaction Manager (DTM), 517
Diverse computing systems, 403
DL, see Description length; Description logics
DLL, see Dynamic-Link Library
DMM, see Distributed Metadata Manager
DoD, see Department of Defense
DP, see Distributed processor
DQP, see Distributed Query Processor
DroidDream, 413
DRs, see Distributed reasoners
DSP, see Distributed Security Manager
DTM, see Distributed Transaction Manager
Duplicate big data (DBD), 245
Dynamic-Link Library (DLL), 342
Dynamic analysis, 421
Dynamic chunk size, 173
Dynamic feature vector, 173
Dynamo, 193
E
E-count(v), 275
E-M technique, see Expectation-maximization technique
Early elimination heuristic, 277
EC, see Explicit content
ECSMiner, see Enhanced Classifier for Data Streams with novel class Miner
Efficiency, 391
Electronic patient record (EPR), 355–356
ElephantSQL, 84
Embedded systems, 405
Emergency room (ER), 435
EMPC, see Extended, multipartition, multichunk
Empirical error reduction and time complexity, 345
Encapsulation, 521
Enclave Page Cache (EPC), 462
Encoded sensing (ES), 410
Energy-efficient communication, 410
Enhanced Classifier for Data Streams with novel class Miner (ECSMiner), 95, 96, 100–101, 108, 109, 127, 129, 141, 142, 172
creating decision boundary during training, 132–133
nearest neighborhood rule, 129–130
novel class and properties, 130–131
Enhanced policy engine, 310
Enhanced SPARQL query processor, 310
Ensemble
construction and updating, 344
size, 173
for supervised learning, 200–201
techniques, 417
training process, 160
for unsupervised learning, 199–200
Ensemble-based insider threat detection, 197
ensemble for supervised learning, 200–201
ensemble for unsupervised learning, 199–200
Ensemble-based learning, 183
algorithms, 203
approach, 190
Ensemble-based stream mining, 76
Ensemble-based techniques, 177, 207
Ensemble-based USSL, 220
Ensemble classification, 107–108, 156
classification overview, 156
ensemble update, 160
time complexity, 160
Entity
entity-relationship data models, 507, 508–509
Entropy, 132
EPC, see Enclave Page Cache
EPR, see Electronic patient record
ER, see Emergency room
Erlang, 82
Error rates (ERR), 143, 145, 146
Error reduction
using MPC training, 116
time complexity of MPC, 121
ES, see Encoded sensing
ETL, see Extract-transfer-load
Evaluation approach, 143
Expectation-maximization technique (E-M technique), 109, 131, 150
optimizing objective function with, 154–155
Experimental activities, 419
covert channel attack in mobile apps, 420
large scale, automated detection of SSL/TLS, 421
location spoofing detecting in mobile apps, 420
Experimental program, 457, 461
association between big data management and case studies, 457
coding for political event data, 458
geospatial data processing on GDELT, 458
programming projects to supporting lab, 462–465
timely health indicator, 459
layer, 9
Expert systems support, 300–301
Explicit content (EC), 70
Explicit type information of object, split using, 269
Extended, multipartition, multichunk (EMPC), 344, 352, 489
Extended relational database systems, 521
eXtensible Access Control Markup Language (XACML), 307, 440
eXtensible Markup Language (XML), 15, 57, 58, 485
layer, 58
schemas, 61
security, 62
Extensions for big data-based social media applications, 326–327
Extensions to existing courses, 426, 460–461
big data analytics and management, 428
Critical Infrastructure Security, 428
data and applications security, 427
developing and securing cloud, 428
digital forensics, 427
integration of study modules with existing courses, 426
language-based security, 428
network security, 427
systems security and binary code analysis, 427
External threat detection, 189, 190
Extract-transfer-load (ETL), 56
F
Fading factor, 199
False detection, 197
False negatives (FN), 190, 197, 212, 230, 251
False positive rates (FPR), 183, 230
False positives (FP), 186, 190, 197, 212, 230, 251
Farthest-first traversal heuristic, 155
Fast classification model, 174
Fault
detection, 95
fault-tolerant computing, 24
FDP, see Federated data processor
Feature weighting, 175
Federated data management, 518–520
Federated data processor (FDP), 519
Field actuation mechanisms, 404
predicate object split, 74
Filtered outlier (F outliers), 97, 134–135
Firewalls, 407
First-order logic formulas and inference, 443
First-order Markov model, 34
Five Vs, see Volume, velocity, variety, veracity, and value
FN, see False negatives
Forecasting, 409
Forest cover dataset, 100
from UCI repository, 142
Formal policy analysis, 321, 324
Forming associations, 27
Foursquare, 289
F outliers, see Filtered outlier
FP, see False positives
FPR, see False positive rates
Framework design, 437
mixed continuous and discrete domains, 444–446
offline scalable statistical analytics, 442–444
privacy and security aware data management for scientific data, 440–442
real-time stream analytics, 446–448
storing and retrieving multiple types of scientific data, 437–440
Framework integration, 320
Frequency, 221
Frequent itemset graph, 36, 37
“Friends-smokers” social network domain, 443, 444
Functional architecture, 510
Functional database systems, 522–523
Functionality, 415
Future system, 439–442, 444, 446
online structure learning methods for stream classification, 447–448
semisupervised classification/prediction, 446–447
G
Gaussian distribution, 141, 163, 204
GBAD, see Graph-based anomaly detection
GDELT, see Global Database of Event, Language, and Tone
Generating and populating knowledge base, 366
Generic problems, 456
Genetic algorithms, 109
Geospatial data processing on GDELT, 458
GFS, see Google File System
Gibbs sampling, 444
Gini index, 132
Global big data security and privacy controller, 400–401
Global data-mining models, 408
Global Database of Event, Language, and Tone (GDELT), 458
geospatial data processing on, 458
Google, 266
BigTable, 82
Calendar, 405
cloud-based big data solutions, 84
Compute Engine, 409
Google+, 289
Monkey tool, 423
Google File System (GFS), 82, 193, 438
GPS-equipped vehicles techniques, 405
Graph
analysis, 70
graph-based behavior analysis, 415–416
mining techniques, 69
rewriting, 361
transformation, 361
Graph-based anomaly detection (GBAD), 183–184, 190, 197, 203–204, 251; see also Anomaly detection
GBAD-MDL, 204
GBAD-MPS, 205
models, 488
Graphical models and rewriting, 361
Graphical user interface (GUI), 421
GREE88 dataset, 227
Guest machine, 54
Guests, 54
GUI, see Graphical user interface
H
cluster, 244
distributed system setup, 351
storage architecture, 312, 318, 325
Hadoop distributed file system (HDFS), 51, 70, 79, 173, 174, 184, 237, 265, 312, 322
Hadoop/MapReduce, 438
technologies, 373
HAN, see Home area network
HAQU13a approach, 193
HAQU13b approach, 193
Hard subspace clustering, 71
hardware-assisted security, 406
hardware-level security, 406
services, 52
virtualization, 54
Hardware security modules (HSMs), 406
HDFS, see Hadoop distributed file system
HDP, see Heterogeneous data processor
Healthcare, 1
architecture of methodologies, 437
for big data analytics and security, 433
Health Insurance Portability and Accountability Act (HIPAA), 356
Heart rate monitor, 407
Heterogeneity, 410
issue, 69
Heterogeneous components, 403
Heterogeneous data(base)
interoperability, 501
systems, 496
types, 517
Heterogeneous data processor (HDP), 518–519
Heterogeneous IoT environment, 409
Hewlett Packard Company, 495
Open Cirrus Testbed, 51
Hijacked kernel function pointers, 455
HIPAA, see Health Insurance Portability and Accountability Act
Hive-based assured cloud query processing, 322
HiveQL, 81
HMLNs, see Hybrid MLNs
Home area network (HAN), 405
Homomorphic encryption schemes, 463
Host-based attacks, 47
Host BDMA systems, infrastructure tools to, 79–80
Host machine, 54
HSMs, see Hardware security modules
HTML, see Hypertext Markup Language
Hybrid cloud, 53
Hybrid high-order Markov chain models, 189
Hybrid layout, 319
Hybrid MLNs (HMLNs), 444
Hyperplane technique, 161
Hypertext Markup Language (HTML), 263
Hypervisor, see Virtual machine monitor
I
IaaS, see Infrastructure as a Service
IARPA, see Intelligence Advanced Research Project Activity
IBM
cloud-based big data solutions, 84
System R, 494
IBM, see International Business Machine Corporation
ICD, see International Classification of Diseases
ICDE, see International Conference on Data Engineering
ICE, see Immigration and Customs Enforcement
Ideal cloud-based assured information sharing system (CAISS++), 309, 312, 489
framework integration, 320
hybrid layout, 319
limitations, 312
naming conventions, 318
policy specification and enforcement, 320–321
Identity
management, 51
theft, 45
IDS, see Intrusion detection systems
IG, see Information gain
Image mining, 38
automatic image annotation, 39–40
feature selection, 39
goal, 39
image classification, 40
IME, see Input method editor
IME/Update app, 425
Immigration and Customs Enforcement (ICE), 424
Implicit type information of object, split using, 269
Impurity measurement, 153
IMS, see Information management system
In-line reference monitor (IRM), 76
INAN12, 477
Incident management, 404
Incremental learning, 106, 183, 190, 191, 218, 219
Incremental probabilistic action modeling (IPAM), 191
Index, 208
Inference, 355
web, 365
domains and provenance, 362–363
inference controller with two users, 363–364
through query modification, 361
SPARQL query modification, 364–365
Inference controller, 355, 360, 365, 400
approach, 365
background generator module, 366–367
generating and populating knowledge base, 366
implementation of medical domain, 365–366
complexity, 365
Infinite length, 93–95, 340, 410
Infinite sequences, 217
Information; see also Data
sharing manager, 399
systems from data management systems framework, 500–502
Information engine, 291
information integration, 293
Information gain (IG), 71
Information management system (IMS), 81
Information Resource Dictionary System (IRDS), 495
Information technology (IT), 339, 405
Informix Corporation, 495
Infrastructure as a Service (IaaS), 53, 332
Infrastructure development, 421, 455
curriculum development, 426–428
virtual laboratory development, 421–426
project at University of California at Berkeley, 23
Input events generation, 424
Input files selection, 270
Input method editor (IME), 424
Insider threat detection, 51, 67–68, 189–191, 209, 251; see also Malware detection; Security policies
additional experiments, 252
anomaly detection in social network and author attribution, 252–253
big data analytics for, 454
big data issues, 184
challenges, related work, and approach, 68–69
collusion attack, 252
comprehensive framework, 75–76
feature extraction and compact representation, 70–72
incorporate user feedback, 252
RDF repository architecture, 72–73
sequence stream data, 184
stream data analytics applications for, 3–4
stream mining as big data mining problem, 253
as stream mining problem, 183, 184
SVMs, 251
Insider threats, 43–44, 67, 197, 203
analysis, 46
Instrumental behavior analysis, 415
Integrated system, 387–388, 389
Integration framework, 310–311
for big data, 396
of data, 380
Intellidimension RDF Gateway, 385
Intelligence Advanced Research Project Activity (IARPA), 331
Intelligent fuzzer for automatic Android GUI application testing, 423
Intelligent transportation systems, 404
Intel SGX-enabled machine, 461
SDK and SGX driver, 462
Interface manager, 358
International Business Machine Corporation (IBM), 494
International Classification of Diseases (ICD), 439
International Conference on Data Engineering (ICDE), 472
Internet of Things (IoT), 2, 377, 403–404, 433, 485
layered framework for securing, 406–407
scalable analytics for IoT security applications, 408–411
of heterogeneous database systems, 518
Interuser parallelization, 244
Intrusion detection systems (IDS), 27, 414
application of SNOD, 300
cloud-based system, 289
cloud-design of InXite to handle big data, 301–302
expert systems support, 300–301
implementation, 302
InXite-Law, 302
InXite-Marketing, 302
InXite-Security, 302
plug-and-play approach, 291
threat detection and prediction, 298–300
InXite POI
profile generation and analysis, 293–294
IoT, see Internet of Things
IPAM, see Incremental probabilistic action modeling
IRDS, see Information Resource Dictionary System
IRM, see In-line reference monitor
IT, see Information technology
Iterative conditional mode algorithm (ICM algorithm), 155
J
Jena (Java application programming package), 266, 385
Job JB, 271
JobTracker, 79
Joining variable, 275
K
Kafka, 448
KDD cup 1999 intrusion detection dataset (KDD99), 100, 141–142, 160–161
KEND98 dataset, 207
Keynote presentations, 473
access control and privacy policy challenges in big data, 474
additional presentations, 474
authenticity of digital images in social media, 473
big data analytics, 473
business intelligence meets big data, 473
final thoughts, 474
formal methods for preserving privacy while loading big data, 473
privacy in world of mobile devices, 474
securing big data in cloud, 473
timely health indicators using remote sensing and innovation for validity of environment, 474
toward privacy aware big data analytics, 473
K-means clustering, 28
K-means clustering with cluster-impurity minimization (MCI-K means), 152–154
K models, 209
k-nearest neighbor algorithm (KNN algorithm), 40, 149, 342
classification model, 131
k-NN-based approach, 108
KNN algorithm, see k-nearest neighbor algorithm
Knowledge base, 282
Knowledge representation (KR), 59
L
K-means clustering with cluster-impurity minimization, 152–154
optimizing objective function with E-M, 154–155
problem description, 152
storing classification model, 155–156
training with limited, 152
unsupervised K-means clustering, 152
Labeled points, 155
Language-based security, 428
Large scale, automated detection of SSL/TLS, 421
Layered framework for secure IoT, 406–407
Layered security framework, 403
LBAC, see Location based access control
Learning classes
supervised learning, 203
unsupervised learning, 203–205
Learning models, 183
Lehigh University Benchmark (LUBM), 314
Lempel–Ziv–Welch algorithm (LZW algorithm), 220, 224, 237
constructing LZW Dictionary by selecting patterns, 221–222
dictionary construction using MR, 241–242
scalable LZW and QD construction using MR job, 238–244
Leveraging randomized response-based differential-privacy technique, 408
LIBSVM, 209
Lifted learning and approximations of pseudolikelihood, 445
Lightweight IP-based network stacks, 407
Lincoln Laboratory Intrusion Detection dataset, 207, 210–211
“Lineage”, 394
Link analysis, 28
LinkedIn, 289
L-model, 158
Location based access control (LBAC), 359, 398
Location spoofing detecting in mobile apps, 420
Logic database systems, see Next-generation database systems
LOGITBOOST.PL algorithms, 193
Loop detectors, 404
Lossy compression process, 221
6LoWPAN, 407
LUBM, see Lehigh University Benchmark
LZW algorithm, see Lempel–Ziv–Welch algorithm
M
Machine learning, 409
algorithms, 83
Mahout, 193
Major mechanical problem, 98
Malicious applications, 418
Malicious code detection, 347
distributed feature extraction and selection, 348–349
nondistributed feature extraction and selection, 347–348
Malicious insiders, 3
Malicious intrusions, 45
behavior modeling, 415
dataset, 350
Malware detection, 46, 95, 340–342, 414–419; see also Insider threat detection
application to Smartphones, 418–419
behavioral feature extraction and analysis, 415–417
cloud computing for, 341
as data stream classification problem, 340–341
experimental activities, 419–421
infrastructure development, 421–426
reverse engineering methods, 417
in Smartphones, big data analytics for, 413, 414
Mandatory security policies, 15
Manual labeling of data, 149
Map input phase (MI phase), 272
Map keys (MKey), 346
Map output phase (MO phase), 272
Mappings, 509
MapReduce framework (MR framework), 51, 56, 70, 79, 184, 193, 237, 265–266, 269, 348, 428, 438, 456
breaking ties by summary statistics, 277–278
compression/quantization, 243
cost estimation for query processing, 270–274
input files selection, 270
LZW dictionary construction, 241–242
paradigm, 458
processes, 265
query plan generation, 274–277
scalable LZW and QD construction, 238–244
technology, 193
MapReduceJoin (MRJ), 271
Map values (MVal), 346
Markov logic, 442
Markov logic networks (MLNs), 443
Markov network, 443
Masquerade detection, 189, 190, 191
Maximum likelihood tree, 447
MaxWalksat, 444
MCI-K means, see K-means clustering with cluster-impurity minimization
MDL approach, see Minimum description length approach
Mean distance (μd), 133
Medical domain implementation, 365–366
Mermaid, 495
Metadata, 391
controller, 398
Meteorological data, 446
Mica2 nodes running TinyDB applications, 410
Microlevel location mining, 296
Microsoft Azure’s Cosmos DB, 83–84
Minimum cost plan generation problem, 275
Minimum description length approach (MDL approach), 69, 190, 204
Minimum support (minsup), 35
Minor mechanical problem, 98
Minor weather problem, 98
minsup, see Minimum support
MI phase, see Map input phase
Misapprehension, 197
Mixed continuous and discrete domains, 444
approximate compilation for online inference knowledge, 445–446
lifted learning and approximations of pseudolikelihood, 445
MKey, see Map keys
MLNs, see Markov logic networks
Mobile devices, privacy in world of, 474
Mobile interfaces, 428
Mobile OS, 420
Mobile sensors, 405
Model update, 416
Modern transportation algorithms, 404
MO phase, see Map output phase
Motivation, 433
air quality data, 435
system architecture, 434
MPC, see Multipartition and multichunk; Multiple partition and multiple chunk
MQTT, 410
MR framework, see MapReduce framework
MRJ, see MapReduceJoin
1MRJ approach, see Single map reduce job approach
2MRJ, see Two MapReduce jobs
Multichunk ensemble approach, 343
Multiclass novelty detection technique, 108
Multidisciplinary approaches, 477–480
Multidisciplinary University Research Initiative (MURI), 324
Multilabel classification problem, 173
Multilabel instances, 173
Multimedia database systems, 522
Multimedia data management for collaboration, 500
Multiobjective optimization framework for data privacy, 476–477
Multipartition and multichunk (MPC), 94
Multiple partition and multiple chunk (MPC), 91, 115, 122, 123, 125, 171, 177
ensemble approach, 100, 107, 116, 171–172, 487
ensemble built on, 115
ensemble updating algorithm, 115–116
error reduction using MPC training, 116–121
Multiple shards in cluster, 83
Multiple video signals, 409
Multisource derivation, 442
Multistep Markovian model, 189
MURI, see Multidisciplinary University Research Initiative
“Muslim-brotherhood”, 290
Mutual information, 447
MVal, see Map values
MyHealtheVet Decision Support Tool, 434–435
N
Naïve Bayes (NB), 342
Naming conventions, 318
National Institute of Standards and Technology (NIST), 52
National Science Foundation (NSF), 4, 469
SATC funded project CNS-1228198, 440
SATC funded project CNS-1237235, 440
National Security Agency (NSA), 290, 307
Natural language processing (NLP), 295, 455–456
NB, see Naïve Bayes
NCMRJ, see Nonconflicting MapReduceJoins
Nearest neighbor classification (NN classification), 150
Nearest neighborhood rule, 129–130
Negative authorization, 17
Network
intrusion detection, 95
network-based attacks, 47
types, 403
Networking and Information Technology Research and Development (NITRD), 469
Next-generation database systems, 495–496, 522
Neyman–Pearson theory, 409
NIST, see National Institute of Standards and Technology
NITRD, see Networking and Information Technology Research and Development
NLP, see Natural language processing
NN classification, see Nearest neighbor classification
Noise, 189
Non-SQL (NoSQL), 81
Nonconflicting MapReduceJoins (NCMRJ), 271
Nondistributed feature extraction and selection, 347–348
Nonrelational high performance database, 81
Nonsequence data, 207; see also Sequence data
experimental setup, 209
results, 210
stream data, 251
supervised learning, 209–210, 210–212
unsupervised learning, 210, 212–214
Normative patterns, 220
Normative substructures, 197, 204
NoSQL, see Non-SQL
Novel class and properties, 130–131
Novel class detection, 3, 27, 51, 96, 134–137
analysis and discussion, 137
in data streams, 172
deviation between approximate and exact q-NSC computation, 138–140
justification of algorithm, 137–138
time and space complexity, 140–141
Novel success control models, 407
Novelty detection, 108
Novice programmer, 183
NSA, see National Security Agency
NSF, see National Science Foundation
N-Triples, 72
Number of hops concept, 35
O
class/subclass hierarchy, 521
object-relational data model, 521–522
objects and classes, 520
Objects and classes, 520
OCSVM, see One-class support vector machine
OD, see Original data
Offline scalable statistical analytics, 442
current systems and limitations, 443–444
future system, 444
problem and challenges, 442–443
OLAP models, see On-line analytical processing models
On-line analytical processing models (OLAP models), 523
On Demand Stream approach (OnDS approach), 162–163
One-class classifiers, 108
One-class support vector machine (OCSVM), 183, 191, 197, 200, 203, 207, 209
algorithm, 190
OCSVM models, 488
One-pass learning paradigm, 94, 416
One time password (OTP), 331
One-vs-all approach, 38
Onion routing techniques, 407
Online inference knowledge, approximate compilation for, 445–446
Online reputation-based score computation, 295
Online structure learning methods for stream classification, 447–448
Ontologies, 487
security and, 63
Open provenance model (OPM), 361
Operating systems (OS), 53, 403, 419
level virtualization, 54
Operational expenditure (OpEx), 332
OpEx, see Operational expenditure
OPM, see Open provenance model
Optimizing objective function with E-M, 154–155
Oracle Corporation, 495
Original data (OD), 245
OS, see Operating systems
OTP, see One time password
OWL, see Web Ontology Language
P
PaaS, see Platform as a Service
PAD algorithm, see Probabilistic anomaly detection algorithm
PANG04 techniques, 129
Parallel boosting algorithms, 193
Parallel database systems, 522
Parameter
reduction, 174
sensitivity, 146
Partial elimination, 275
Partially labeled data, 94
Particulate matter (PM), 433
Partitioner, 237
PARV12a approach, 192
PCA, see Principal component analysis
PCS systems, see Process control systems
PDP, see Policy Decision Point
Pedigree, 394
Peer effect, 303
Peer-to-peer (P2P), 100, 122, 350
PEP, see Policy Enforcement Point
Person of interest (POI), 293
analysis, 293
InXite POI profile generation and analysis, 293–294
InXite POI threat analysis, 294–296
InXite psychosocial analysis, 296
PEs, see Portable Executables
PET, see Privacy-enhancing symposium
PETRARCH, 458
Physical system stream data, 409
PIE, see Privacy inference engine
Pig Latin, 80
Pig query language, 438
Platform as a Service (PaaS), 53, 332
Platform for Privacy Preferences (P3P), 380
PLCs, see Programmable logic controllers
Plug-and-play approach, 291
PM, see Particulate matter
POI, see Person of interest
Policy Decision Point (PDP), 334
Policy enforcement and related issues, 21
discretionary security and database functions, 23–24
policy specification, 23
query modification, 23
SQL extensions for security, 22–23
Policy Enforcement Point (PEP), 334
Policy manager, 357–358, 360, 398–399
Policy specification and enforcement, 320–321
Political event data, coding for, 458
Portable Executables (PEs), 350
POS, see Predicate Object Split
Positive authorization, 17
P2P, see Peer-to-peer
P3P, see Platform for Privacy Preferences
Predicate Object Split (POS), 267
Predicate object split, 74
Predicate split (PS), 73–74, 267, 269
Prediction, 409
Predictive tasks, 27
Preliminaries in cloud computing, 52
cloud deployment models, 53
service models, 53
Preprocessing, 409
Principal component analysis (PCA), 40
Privacy
policy, 380
privacy-enhancing techniques, 475–476
“privacy-sensitive” tuples, 441
for social media systems, 385–387
Privacy-enhancing symposium (PET), 475
Privacy-preserving
biometric authentication, 476
collaborative data mining, 476
data correlation techniques, 478–479
data management, 407
data matching, 476
record matching problem, 477
Privacy and security aware data management, 440
current systems and limitations, 440–441
problem and challenges, 440
Privacy inference engine (PIE), 382–383
Private cloud, 53
PRM, see Processor reserved memory
Probabilistic anomaly detection algorithm (PAD algorithm), 189
Probabilistic theorem proving (PTP), 445
Probability of state, 443
Process control systems (PCS systems), 405
Processor reserved memory (PRM), 462
Program analysis, 421
Programmable logic controllers (PLCs), 405
Programming projects to support lab, 462
proposed architecture, 464
secure data storage and retrieval in cloud, 462
secure encrypted stream data processing, 463–465
systematic performance study of TEE, 462–463
Propositional algorithms, 444
Proprietary protocols, 425
Provenance controller, 359–360, 398
PS, see Predicate split
Pseudocode
for entity extraction, 293
for information integration, 293
Pseudolikelihood, lifted learning and approximations of, 445
“Pseudopoint”, 132
Psychological score computation, 294
Psychosocial analysis, InXite, 296
PTP, see Probabilistic theorem proving
Q
QD, see Quantized dictionary
QEs, see Query engines
q-nearest neighborhood rule (q-NH rule), 130, 138
q-neighborhood silhouette coefficient (q-NSC), 135–140
q-NH rule, see q-nearest neighborhood rule
q-NSC, see q-neighborhood silhouette coefficient
QS, see Quantified self
Quantified self (QS), 1
movement, 453
Quantized dictionary (QD), 184, 218, 221–224, 237
scalable LZW and QD construction using MR job, 238–244
Query engines (QEs), 307
Query execution and optimization, 323
Query manager, 399
Query modification, 23
algorithm, 24
Query operation, 512
Query optimization, 23–24, 512
Query plan generation, 274–277
Query processing, 437, 512–513
module, 359
system, 264
Query processor, 513
Query transformation, 512
R
RabbitMQ, 448
Radial basis function (RBF), 209
Radius (R), 133
RAMP, see Reduce and map provenance
Raspberry Pi, 409
Raw outlier, 97
RBAC, see Role-based access control
RBF, see Radial basis function
RDD, see Resilient distributed dataset
RDF-S, see RDF schema
RDF, see Resource description framework
RDFKB, see RDF Knowledge Base
RDF Knowledge Base (RDFKB), 267
RDFQL, see RDF Query Language
RDF Query Language (RDFQL), 385
RDF schema (RDF-S), 59
Real dataset-ASRS, 161
Realistic Data Stream Classifier (ReaSC), 149–151
Real-time
classification, 174
database systems, 522
threat, 43
traveler information systems, 404
Real-time stream analytics, 446
current systems and limitations, 446
problem and challenges, 446
Real-world problems, 494
ReaSC, see Realistic Data Stream Classifier
ReaSC, 98, 101, 109, 110, 163, 168, 172
Receiver operating characteristic curves (ROC curves), 163
Recovery, 513
Redaction manager, 399
Reduce and map provenance (RAMP), 64
Reduce input phase (RI phase), 272
Reduce output phase (RO phase), 272
Relational databases, 264, 508
systems, 496
Relational data models, 507, 508
Relational learning, 456
Relaxed Bestplan problem, 276–277
Research and infrastructure activities in BDMA and BDSP, 454
big data analytics for insider threat detection, 454
binary code analysis, 455
CPS security, 455
infrastructure development, 455
secure cloud computing, 454–455
secure data provenance, 454
TEE, 455
Resilient distributed dataset (RDD), 80
Resource description framework (RDF), 3, 15, 57, 58, 263, 290, 308, 364, 373, 438, 487, 488
data manager, 308
Gateway, 385
graphs, 69
processing engines, 326
RDF-3X, 267
RDF-based policy engine, 325, 367
repository architecture, 72–73
Reverse engineering methods, 417
RI phase, see Reduce input phase
Risk analyzer, 399
Risk models, 479
Robotium (ROBO), 423
ROC curves, see Receiver operating characteristic curves
Role-based access control (RBAC), 15, 18–19, 331, 359, 398, 442
Role hierarchy, 19
RO phase, see Reduce output phase
Routing protocols, 407
Rule-combining algorithms, 335
S
SaaS, see Software as a Service
Sanitization
task output derivation, 441
tasks, 441
techniques, 477
Satellite AOD data, 446
SCADA systems, see Supervisory control and data acquisition systems
Scalability, 69, 184, 186, 391, 410
big dataset for insider threat detection, 244–245
big data techniques for, 192–193
experimental setup and results, 244
Hadoop cluster, 244
Hadoop MapReduce platform, 237–238
issues, 447
results for big data set relating to insider threat detection, 245–248
scalable analytics for IoT security applications, 408–411
scalable LZW and QD construction using MR job, 238–244
test, 147
Scalable, high-performance, robust and distributed (SHARD), 266, 325
Scalable LZW and QD construction using MR job, 238–244
Schema, 509
multidimensional array data model, 436
Scientific data
privacy and security aware data management, 440–442
storing and retrieving multiple types, 437–440
SDB, see SPARQL database
SDC, see System Development Corporation
SDN, see Software-defined networking
Search space size, 276
Second-order Markov model, 34
Secret sharing-based techniques, 408
Secure big data management and analytics, unified framework for, 392
global big data security and privacy controller, 400–401
integrity management and data provenance for big data systems, 391–396
Secure cloud computing, 454–455, 461
Secure cyber-physical systems, 461
Secure data
integration framework, 339
provenance, 454
storage and retrieval in cloud, 322, 324–325, 462
Secure encrypted stream data processing, 463–465
SecureMR, 440
Secure multiparty computation (SMC), 476
Secure SPARQL query processing on cloud, 322–323
Security, 516
labels, 441
and ontologies, 63
query and rules processing, 63
semantic web and, 61
XML, 62
Security and privacy for big data, 459
curriculum development, 460–461
Security applications
data mining for cyber security, 43–47
Security extensions, 281
access token assignment, 283–284
Security policies, 15, 16; see also Insider threat detection
access control policies, 16–19
administration policies, 20
auditing, 21
discretionary security policies, 16
views for security, 21
SELinux, 440
Semantic gap, 38
Semantic web-based inference controller for provenance big data
architecture for inference controller, 356–360
big data management and inference control, 367–368
implementing inference controller, 365–367
inference control through query modification, 361–365
cloud computing frameworks based on technologies, 63–65
graphical models and rewriting, 361
OWL, 59
preliminaries in, 52
RDF, 58
semantic web-based models, 360–361
semantic web-based security policy engines, 326
SWRL, 61
technologies, 52, 263, 360, 396
technology stack for, 57
XML, 58
Semantic Web Rules Language (SWRL), 58, 61, 309, 358–359, 387
Semisupervised classification/prediction, 446–447
Semisupervised clustering
stream classification algorithm, 172
Sensing infrastructure, 404
Sensor signal, 409
Sequence-based behavior analysis, 416
Sequence data, 217; see also Nonsequence data
choice of ensemble size, 233–235
complexity analysis, 224
concept drift in training set, 228–230
experiments and results for, 227
insider threat detection for, 217
NB-INC vs. USSL-GG for various drift values, 231–232
results, 230
Serializability, 513
Service models, 53
SETM algorithm, 35
SGX hardware, 463
SHARD, see Scalable, high-performance, robust and distributed
Signature(s), 47
database, 342
detection, 339
signature-based malware detectors, 342
Silver Lining, 440
Simple Protocol and RDF Query Language (SPARQL), 58–59, 69, 263, 269, 488
Single-chunk approach, 171
Single-partition, single-chunk approach (SPC approach), 115, 340, 344
ensemble approach, 116
Single map reduce job approach (1MRJ approach), 238, 241–244
Single model approach, 94
incremental approaches, 417
Single pass algorithm, 220
Single source derivation, 441
Singular value decomposition (SVD), 40
Small communication frames, 407
Smart home, 405
Smart meters, 408
Smartphones application, 418
classification model, 418
data gathering, 419
data reverse engineering, 419
malware detection, 419
SMC, see Secure multiparty computation
SMM, see System management mode
SNOD, see Stream-based novel class detection
Social factor-based technique, 297
Social graph-based score computation, 295
Social media
authenticity of digital images in, 473
sites, 291
community, 263
trust for, 387
Soft subspace clustering, 71
Software, 280
Software as a Service (SaaS), 53, 307, 332
Software-defined networking (SDN), 407
SOWT, see Special operations weather specialists
emerge, 490
running, 409
SPARQL, see Simple Protocol and RDF Query Language
SPARQL database (SDB), 321
SpatialHadoop, 458
Spatiotemporal Database Systems, 522
SPC approach, see Single-partition, single-chunk approach
Special operations weather specialists (SOWT), 459
Split using explicit type information of object, 269
SQL, see Structured Query Language
SSL/TLS, large scale, automated detection, 421
SSO, see System security officer
Stand-alone systems, 497
Stanford framework, 458
State-of-the-art stream classification techniques, 127, 149, 171
Static analysis, 421
Static GBAD approaches, 190
Static learning, 190
Statistical models, 410
Status, 497
Sticky policies, 478
Storage management, 514
Storage services, 52
Storage virtualization, 54
Storing and retrieving multiple types of scientific data, 437
current systems and limitations, 438–439
problem and challenges, 437–438
Storm (data system), 442
Stream, 197
analytics, 171
classification techniques, 150
sequence data, see Infinite sequences
Stream-based novel class detection (SNOD), 289
application, 300
SNOD++, 300
classification, see Data stream classification
mining, 181
applications for insider threat detection, 3–4
for insider threat applications layer, 6–7
for insider threat detection, 4
layer, 6
big data issues, 184
as big data mining problem, 253
insider threat detection as stream mining problem, 183, 184
sequence stream data, 184
techniques, 207
Strong authorization, 17
Structured Query Language (SQL), 15, 55, 69, 485, 495, 512
extensions for security, 22–23
Supervised approach, 197
Supervised ensemble classification updating, 200
Supervised learning, 68, 190, 203, 209–212; see also Unsupervised learning
Supervised methods, 191
Supervised microclustering technique, 110
Supervised model, 191
Supervised testing algorithm, 200
Supervised/unsupervised learning, 456
Supervisory control and data acquisition systems (SCADA systems), 405
Supporting technologies, 2–3; see also Big data management and analytics (BDMA); Big data security and privacy (BDSP)
Support vector machines (SVMs), 27, 31–32, 47, 68, 183, 185, 207–209, 251, 342
Support vectors, 32
SVD, see Singular value decomposition
SVMs, see Support vector machines
SWRL, see Semantic Web Rules Language
Sybase Inc., 495
Symposium on Access Control Models and Technologies, 18
SynC, see Synthetic Data with only Concept Drift
SynCN, see Synthetic Data with Concept Drift and Novel Class
SynD, see Concept-drifting synthetic dataset
SynDE, see Concept-evolving synthetic dataset
Synthetic datasets, 99, 160, 349–350
Synthetic data with concept drift and concept evolution, 99
Synthetic Data with Concept Drift and Novel Class (SynCN), 141
Synthetic Data with only Concept Drift (SynC), 141
Synthetic data with only concept drift, 99
Systematic performance study of TEE, 462–463
System Development Corporation (SDC), 495
System management mode (SMM), 462
System(s)
services, 52
System security officer (SSO), 20, 511
T
TABARI software, 458
Tag, 442
TaintDroid, 425
TEE, see Trusted execution environments
Temporary buffer, 129
Text(s)
classification approaches, 189
Third-party IME, 424
Threat
assessment, 295
data, 403
Three-schema architecture, 510
TIE, see Trust inference engine
Time based access control (TRBAC), 359
Time complexity, 121, 140–141, 160
Timely health indicators, 459, 474
Time role-based access control (TRBAC), 398
TM, see Translation model
TMP36 sensors, 409
TNs, see True negatives
Token, 207
subgraph, 208
Toy problems, 494
TPJ, see Triple Pattern Join
TPR, see True positive rate
TPs, see Triple patterns; True positives
Trace Files, 227
Traditional data stream classification techniques, 127, 416
Traditional machine-learning tools, 409
Traditional static supervised method, 183
Traffic flow control, 404
Transactional approach, mitigating data leakage in mobile apps using, 424–425
Transaction management, 513–514
Translation model (TM), 40
Traveler information, 404
TRBAC, see Time based access control; Time role-based access control
Triple Pattern Join (TPJ), 271
Triple patterns (TPs), 264, 271
Triples, 72
True negatives (TNs), 197, 230
True positive rate (TPR), 186, 230
True positives (TPs), 197, 230
“Truncated” UNIX shell commands, 189, 191
probabilities, 387
for social networks, 387
Trusted execution environments (TEE), 454, 455, 459
systematic performance study, 462–463
Trust inference engine (TIE), 382–383
Trust, privacy, and confidentiality, 379
current successes and potential failures, 380–381
motivation for framework, 381
TrustZone security, 406
Twitter, 289
Two MapReduce jobs (2MRJ), 238
Two-phase commit, 513
Type sink, 417
U
UAV could, 409
UCON, see Usage control
UI, see User interface
Unbounded data stream, 221
Unified framework
global big data security and privacy controller, 400–401
integrity management and data provenance for big data systems, 391–396
learning framework, 409
for secure big data management and analytics, 392
Uniform resource identifiers (URIs), 58, 74, 269, 318, 331
UNIX shell commands, 189
Unsupervised ensemble classification and updating, 198
clustering, 152
Unsupervised learning, 191, 203, 210, 212–214, 415; see also Supervised learning
GBAD-MDL, 204
GBAD-MPS, 205
Unsupervised method, 183
Unsupervised stream-based sequence learning (USSL), 184, 185, 218, 219, 220, 230
constructing LZW Dictionary, 221–222
URIs, see Uniform resource identifiers
Usage control (UCON), 19
U.S. Bureau of Labor Statistics (BLS), 1
User demographics-based, 297
User feedback, 252
User-level applications, 189
U.S. Homeland Security, 67
USSL, see Unsupervised stream-based sequence learning
V
VA, see Veterans Administration
Vector representation of content (VRC), 70–71
Vertically partitioned layout, 318–319
Very Fast Decision Trees (VFDTs), 106, 340
Veterans Administration (VA), 433, 434
decision support tools, 436
Personal Health Record system, 434
VFDTs, see Very Fast Decision Trees
Victim selection, 220
Video signal, 409
View management, 517
ViewServer, 424
Vigiles, 441
Virtual laboratory development, 421
architectural diagram for virtual lab and integration, 422
input events generation, 424
intelligent fuzzer for automatic Android GUI application testing, 423
mitigating data leakage in mobile apps, 424–425
policy engine, 426
problem statement, 423
programming projects to support virtual lab, 423
technical challenges, 425
Virtual machine manager (VMM), 462
Virtual machines (VM), 244
image, 55
monitor, 54
Vision, 497
VM, see Virtual machines
VMM, see Virtual machine manager
VMware, 54
Volume, velocity, variety, veracity, and value (Five Vs), 1
Voting, 409
VRC, see Vector representation of content
W
WA, see Weighted average
W3C, see World Wide Web Consortium
WCE, see Weighted classifier ensemble
WCOP, see Web rules, credentials, ontologies, and policies
Weak authorization, 17
Web-based interface, 421
Web Ontology Language (OWL), 58, 59, 263, 309, 355, 364, 487
OWL 2 specification, 400
Web rules, credentials, ontologies, and policies (WCOP), 388
Weighted average (WA), 199
Weighted classifier ensemble (WCE), 142
Weight learning, 443
Weka (machine learning open source package), 83, 122
Whitepages, 366
WHO, see World Health Organization
Wireless communication networks, 404
Wireless sensor networks (WSN), 410
Workgroups, 474
Workshop discussions, 474
BDMA for cyber security, 480–481
examples of privacy-enhancing techniques, 475–476
multiobjective optimization framework for data privacy, 476–477
philosophy for BDSP, 475
research challenges and multidisciplinary approaches, 477–480
workgroups, 474
Workshop presentations
keynote presentations, 473–474
World Health Organization (WHO), 433
World Wide Web, 20, 24, 53, 57, 365, 462
World Wide Web Consortium (W3C), 57, 380
Wrapper-based simultaneous feature weighting, 39
WSN, see Wireless sensor networks
X
XACML, see eXtensible Access Control Markup Language
XEN, 54
XML, see eXtensible Markup Language
XQuery, 23
Y
Yahoo!, 266
Yellowpages, 366
Z
Zero-knowledge proof of knowledge protocols (ZKPK protocols), 476