Index
Note: Page numbers followed by f indicate figures.
A
12 lines of Python code
31–34
natural language autocoders
25–26
nomenclature coding
24–25
B
data structure and content
back-of-envelope analyses
estimation-only analyses
266
complete and representative data
188
approximate/local solutions unacceptable
327
data objects identification and classification
187
normal/Gaussian distribution
191–192
annotate with metadata
193
data within data object
195
membership in defined class
196
large files, view and search
198–200
sample number/dimension dichotomy
187
query output adequacy
330
identification errors
426
motor vehicle accidents
427
reformulated questions
330
ontologists and classification experts
429
data reduction specialists
433
free-lance Big Data consultants
434
generalist problem solver
432
scientists with minimal programming skills
433
results and conclusions
335
security policy/restricted data
197–198
self-descriptive information
188
word frequency distributions
260–264
cancel-out hypothesis
308
ambiguity of system elements
313
statistical method bias
313
Burrows-Wheeler transform (BWT)
36–50
C
Cancel-out hypothesis
308
Cancer Biomedical Informatics Grid (CaBig™)
339–344
CODIS (Combined DNA Index System)
368–369
medical error/counting errors
212–213
systematic counting error
211
D
addition and multiplication
238
cryptographic programs, beware of
241–242
pseudorandom number generator
240–241
random access to files
237
speed and scalability issues
high-speed programming languages
232
iterative loops, system calls within
234
look-up tables and pre-computed pointers
235
software testing on data subset
233
unpredictable software
236
identifier system, properties of
55–58
social security number
62
life science identifiers
64
additional analyses and updating results
356
clarification and improved earlier studies
355
data and data documentation errors
353
data misinterpretation
353
extending original study
356
scientific misconduct
354
CODIS (Combined DNA Index System)
368–369
novel data sets creation
365
original research performance
364
Plate Boundary Observatory data
369–370
public/private key cryptography
signature and authentication
383–384
data compartmentalization
379
data misinterpretation
374
limited access to responsible professionals
375
universal data standards
375
Labeled-Release data on life on Mars
387–388
Digital Millennium Copyright Act of 1998 (DMCA)
399
E
F
Cancer Biomedical Informatics Grid
339–344
data management principles
326
National Biological Information Infrastructure
337, 338f
legacy data, preserving
339
Frequency distribution of words
G
H
Havasupai Tribe v. Arizona Board of Regents
413–416
I
Immutability and identifiers
blockchains and distributed ledgers
176–179
immortal data objects
173
reconciliation across institutions
174–175
zero-knowledge reconciliation
179–183
object oriented programming
J
L
biases by consent process
408
confidential consent status
407
divert responsibility
410
legally valid consent form
405
train staff on consent-related issues
408
unmerited revenue source
409
resources, right to create, use and share
data managers, suggestions for
399–400
Digital Millennium Copyright Act of 1998
399
No Electronic Theft Act of 1997
399
intellectual property
401
Life science identifiers (LSID)
64
M
medical error/counting errors
212–213
systematic counting error
211
normalizing and transforming data
converting interval data set
217, 218f
population difference, adjusting
216
rendering data values dimensionless
216, 217f
Message digest version 5 (md5) algorithm
74–75
N
Natural language autocoders
25–26
No Electronic Theft Act of 1997 (NET Act)
399
Nomenclature coding
24–25
O
Object by relationships
97–101
Object oriented programming
116
data object, assigning
107
multiclass inheritance
107
class blending (noisy class)
110–111
data objects hierarchy
102
vs. identification system
104
object oriented programming
106–107
Python/Perl programming languages
106
Ruby programming language
106
simple classification
109
class relationships visualization
classification of human neoplasms
121, 122f
corrupted classification
122, 123f
classes and properties
113
miscellaneous classes
112
P
Plate Boundary Observatory data
369–370
Pseudorandom number generator
240–241
Python/Perl programming languages
106–107
R
frequency of unlikely occurrences
293–294
pseudorandom number generator
Resource Description Framework (RDF) Schema
S
Big Brother hypothesis
420
Borg invasion hypothesis
420
Egghead heaven hypothesis
421
reduced cost and increased productivity
422–424
Scavenger hunt hypothesis
421
filtering-out process
156
Suggested Upper Merged Ontology (SUMO)
114–115
T
U
life science identifiers
64
Universally unique identifier (UUID)
V
W
World Intellectual Property Organization (WIPO)
401
X
XML (eXtensible Markup Language)
Z
Zero-knowledge reconciliation
179–183