Index

4GL technology, 3–4

A

access. See data access
access query, 340
active data warehouse, 14–16
active metadata repository, 99–100
ad hoc transactions, 340
Address data, 196
administration
architectural, 348–351
Archival Sector, 348–349
budget, 334, 358–359
data base, 352–353
data models, 347–348
management, of DW 2.0 environment, 358–361
managing consultants, 359–361
metadata, 41, 351–352
Near Line Sector, 349–351
prioritization/conflicts, 358
resource allocation, 359
scheduling and milestones, 359
stewardship, 353–355
systems/technology, 355–358
aggregate data
data models, 159
granularity managers, 235
airline metadata example, 106–109
analysis
creating new, 273–277
creating new, from DW 2.0 data, 276–277
analytical productivity and response time, 243–244
analytical response time, 241
applications
active data warehouse, 14–16
from application to corporate data, 216, 219–221
evolution of data warehousing, 2–3, 5–6, 9–10
Interactive Sector, 29
monitoring, 172
OLAP, 20
transaction monitor and response time, 171–172
transaction processing. See OLTP
useful, 51–52
architecture
architectural administration, 348–351
building “real” data warehouse, 21–22
creating new analysis, 273–277
flow of data through, 203
new paradigm of DW 2.0, 24, 25
Archival Sector, 76–86
adding, 264–265
architectural administration, 348–349
data access, 33–34, 80–81, 84–85
flow of data, 48–49, 209
life cycle of data, 27–30
metadata, 31–33, 105–106
passive indexes, 81–83, 344
performance, 80–81
processes, 343–344
reasons for, 30–31
searching, 80–85
time-variant data, 192–193, 199–200
volumes of data, 50–51, 80, 85
workload, 80
archiving data. See also Archival Sector
business perspective, 26
with metadata, 32
ASCII/EBCDIC conversions, 223
assets and ROI, 128
ATM transactions, 28
attacks on data, sensing, 185–187
audience and star schemas, 18–19
audit trail
correcting/resetting data, 330–331
ETL processing, 223–224

B

backflow of data
from exploration facilities, 152–154
from Integrated Sector, 212–213
bad data
balancing entry for, 330
introduction, 329–330
making corrections, 330–331
resetting values, 330
balancing entry for bad data, 330
bank account information, 116
banking transactions, 193, 217
barrier security, 172
Basel II taxonomy, 104, 109
batch mode
ETL processing, 217–218
exception-based flow of data, 212
beverage sales data, 143–144
BI (business intelligence) universe, 97–98
big bang approach, 8–9
blather, 38
brittleness and star schemas, 18
Brooks, Fred, 115
budget, 334, 358–359
business
BI universe, 97–98
changing requirements, 47–48, 114–115
corporate data and Integrated Sector, 62
corporate data model, 162–163
corporate information factory, 12, 13, 134
enterprise knowledge coordination stream, 129–133
impact of data warehousing, 11
metadata, 44, 102–103, 352
representation and data models, 157–158
“Business Intelligence Road Map” (Moss), 125–126
business perspective
cost justification, 282
data models, 166
DW 2.0 environment, 24, 26, 90–92
ETL processing, 227–228
evolution of data warehousing, 5–6, 14
flow of data, 213–214
granularity managers, 238
metadata, 109–110
migration, 269–270
monitoring DW 2.0 environment, 178
performance, 258–259
security, 187–188
statistical processing, 155–156
system of record, 319–320
technology infrastructure, 121–122
time-variant data, 200
unstructured data, 310

C

caching data, 54, 71, 356
calculations and data mapping, 317
capacity/disk storage
active data warehouse, 15
costs, 335–336
DBMS activities, 2
evolution to DW 2.0 environment, 10
history of data warehousing, 1
Near Line Sector, 73
optimization, 336
planning and performance, 247–248, 356–357
captured text, 87
CDC (changed data capture), 226
changes
in business requirements, 47–48, 114–115
CDC, 226
growth of data, 334
infrastructure. See technology infrastructure
mitigating business change, 119
propagation of, 20
rapid business changes, 114
states of data, 215, 219–221
treadmill of change, 114–115
click-stream data, 232
CMSM (cross-media storage manager), 74–75, 211–212
Coca Cola sales data, 143–144
code
checking automatically generated, 257–258
ETL, 225
concepts data models, 291
consultants, managing, 359–361
continuity of data, 198
continuous time span data
beginning/ending sequence of records, 197–198
features of, 194–196
nonoverlapping records, 197
overview, 63
sequence of records, 196
conversions, data, 62, 221–223
corporate data
from application data to, 216, 219–221
enterprise knowledge coordination stream, 129–133
Integrated Sector, 62
model, 162–163
corporate information factory, 12, 13, 134
correcting/resetting data, 330–331
costs/cost justification
active data warehouse, 16
business perspective, 24
creating new analysis, 273–277
creating new analysis from DW 2.0 data, 276–277
DW 2.0 implementation, 271, 273
economics and evolution of data warehousing, 10–11
factoring cost of DW 2.0, 277–278
factors affecting, for new analysis, 276
first-generation vs. DW 2.0, 281–282
historical information, 280–281
macro-level justification, 271–272
micro-level justification, 272–273
Near Line Sector, 72–73
perspective of business users, 282
real economics of DW 2.0, 279
reality of information, 278
storage, 335–336
time value of information, 279–280
value of integration, 280
credit card data, 11, 295
cross-media storage manager (CMSM), 74–75, 211–212
currency conversions, 62, 221–223
customer metadata, 101–102
customer profiles, 11

D

data access
Archival Sector, 33–34, 80–81, 84–85
business perspective, 26
in DW 2.0 environment, 33–34
Integrated Sector, 66–67, 69–70
Interactive Sector, 59
Near Line Sector, 74
probability of, 30–31, 209–210
security, 181
unstructured data, 89–90
volumes. See volumes of data
data base
administration, 352–353
DBMS, 1–2, 223, 332–337
relational, 309
data correction stream, 133
data flow. See flow of data
data integration
Integrated Sector, 58
introduction, 7–8
data item set (dis) level, 159–160
data mapping, 219, 223, 316–319
data marts
convenience of, 324–325
as data warehouse, 15, 20–21
described, 13
vs. exploration facilities, 152
moving data, 327–328
performance, 251
source data, 323
system of record, 319
transforming data from, 325
data models
business representation, 157–158
concepts, 291
corporate, 162–163
corporate, and seven streams approach, 131
data item set level, 159–160
ERD level, 159–160
granular vs. summarized data, 159
intellectual road map, 157
Interactive Sector, 161–162
levels of, 159–161
logical, 291–294
perspective of business users, 166
physical model, 159–160, 290–292
scope of integration, 158–159
top-down modeling, 294–296
transformation, 163–164
types used in DW 2.0, 289–294
unstructured data, 164–166
data profiling
enterprise knowledge coordination stream, 129–133
inconsistencies, 294–296
and mapping stream, 133
tools and reverse-engineered data model, 288–289
data quality
checking, 170–171, 174–175, 224
data model types, 289–294
data profiling inconsistencies, 294–296
data profiling tools, 288–289
DW 2.0 environment, 285–286
reverse-engineered data model, 288–289
TIQM, 134–137
tool set, 287–288
TQdM, 134
data quality editor, 63
data quality monitor, 170–171, 174–175, 224
data warehouse
active approach, 14–16
bad data, 329–330
building “real” vs. variations, 21–22
business appeal of DW 2.0, 24
business impact, 11
business perspective, 26, 90–92
changing business requirements, 47–48, 114–115
compared to data mart, 21
data mart. See data marts
defined, 7
different development approach, 8–9
diversity of, 40–41
DW 2.0 technology foundation, 45–46
environment. See environment, DW 2.0
exploration warehouse, 13
federated approach, 14–15, 16–18
first-generation. See history of data warehousing
house/city analogy, 261–262
integrating data, 7–8
new paradigm of DW 2.0, 24, 25
shaping factors of DW 2.0, 23–24
speed of data movement into/through, 331
star schema approach, 15, 18–19
suboptimal, 19
useful applications, 51–52
variations, 14–15
volumes of data, 8, 50–51
data warehouse monitor
falling probability of data access, 209–210
features of, 176–177, 326–327
overview, 171
security, 185
data warehouse utility (DWU), 332–337
Dataupia, 332, 336
date formats, 62, 222
Date of Birth data, 196
DBMS (data base management systems)
conversions, 223
data warehouse utilities, 332–337
purpose of, 1–2
DDL (data definition language), 290
Decision Support Systems (DSS) processing, 41, 70–71
default values and ETL processing, 223
demographics, 51
development
data warehouse approach, 8–9
ETL programs, 224–225
PCs and 4GL technology, 3–4
devolution, 19
dis (data item set) level, 159–160
discontinuity of data, 197
discrete data
continuity, 198
time-variant data, 194
disk storage. See capacity/disk storage
diversity and metadata, 41
dividing data and technology infrastructure, 121
domain checking, 63, 174, 224
domains in data quality tool set, 287
dormant data
monitoring, 176–177
removing, 245–246
Dow Jones Industrial average, 194
DSS (Decision Support Systems) processing, 41, 70–71
dump, data, 184–185
DW 2.0. See data warehouse; environment, DW 2.0
DWU (data warehouse utility), 332–337

E

EBCDIC/ASCII conversions, 223
edited text, 87, 302
ELT (extract/load/transform) processing
described, 226–227
Interactive Sector, 58
managing, 351
perspective of business users, 227–228
email
blather, 38
as unstructured data, 299
view of customer, 51
encryption
features of, 181–182
limiting, 184
end-user perspective. See business perspective
English, Larry, 134, 137
enterprise knowledge coordination stream, 129–133
enterprise reference model stream, 130
enterprise-wide metadata
features of, 101–102
local metadata, 43–45, 97–98, 103
metadata in DW 2.0 environment, 97–98
environment, DW 2.0
access of data, 33–34
components of, 11–13
cost justification, 271
data access, 33–34
data flow, 48–49
data warehouse, 6
data warehouse monitor, 171
DW 2.0 landscape, 290
ETL processing, 215–216
evolution, 9–11
management administration, 358–361
metadata, 31–33, 40–44, 96–99
migration to unstructured, 267–269
monitoring, 169, 246–247
performance, 239
preparing unstructured data, 38–40
referential integrity, 52
reporting, 53, 206
responding to business changes, 47–48
spider web, 4–5, 6
structured/unstructured data, 34–35, 86–90
technology foundation, 45–46
textual data, 34–35
transaction monitor, 169–170
transaction monitor and response time, 171–172
volumes of data, 50–51
ERD (entity relationship level), 159–160
ETL (extract/transform/load) processing
from application to corporate data, 216, 219–221
audit trail, 223–224
batch mode, 217–218
CDC, 226
changing states of data, 215, 219–221
code creation/parametrically driven, 225
compared to ELT, 226–227
complex transformations, 221
creating programs, 224–225
data flow in DW 2.0, 48, 205
data quality monitor, 170–171, 174–175, 224
default values, 223
domain checking, 174, 224
in DW 2.0 environment, 215–216
Integrated Sector, 29, 63, 67–68
Interactive Sector, 27–29, 58–59
introduction, 12
mapping, 219
metadata, 223
migration shock absorber, 267
online mode, 216–217
real-time processing, 218
rejected data, 225–226
source/target, 218–219
system of record, 218
technology to prepare data, 308
throughput, 222–223
unstructured processing, 87–88
evolution of data warehousing. See history of data warehousing
exception-based flow of data, 210–213
exploration facilities
backflow of data, 152–154
data marts compared to, 152
features of, 147
frequency of analysis, 147
project-based data, 13, 150–151
refreshing exploration data, 149
sources for exploration processing, 149
using data internally, 155
exploration processing, 146
exploration warehouse, 13, 24
extensibility
nonextensibility and data marts, 20
star schemas, 18
external taxonomies, 104–105, 304–305
extract/load/transform (ELT) processing, 226–227
extracts
ETL. See ETL (extract/transform/load) processing
proliferation and data marts, 20

F

federated data warehouse
described, 16–18
variations of data warehouses, 14–15
filtering data, 232–234
Find it domain, 287
firewalls, 182
first-generation data warehousing. See history of data warehousing
Fix it domain, 287
flow of data
Archival Sector, 48–49, 209
bulk batch mode, 212
in DW 2.0 environment, 48–49
exception-based, 210–213
falling probability of data access, 209–210
Integrated Sector, 48–49, 205–207
Interactive Sector, 48–49, 203–205
Near Line Sector, 48–49, 207–209
performance, 241–242
perspective of business users, 213–214
role of ETL, 205
staging request, 213–214
throughout architecture, 203
triggers, 206–207
foreign keys, 174–175
freezing data, 145–146
frequency of analysis, 147
frequent flyer programs, 11

G

gender data, 63, 174, 196, 223
general/specific text, 39–40
glossaries, 304
granularity
in data warehouse, 7
devolution, 19
in federated data warehouse, 17–18
granular vs. summarized data, 159
Integrated Sector, 65, 70–71
Interactive Sector, 60
in star schemas, 19
granularity managers
aggregate data, 235
compared to ELT, 231–232
eliminating data, 234
filtering data, 232–234
functions of, 234–236
home-grown vs. third-party, 236
metadata as by-product, 237–238
parallelizing, 237
perspective of business users, 238
raising level of granularity, 232
recast data, 235
summarizing data, 234
growth of data, 334

H

hardware/software selection, 256
heuristics
analysis and statistical processing, 145–146
freezing data, 145–146
processing, 243, 341
highway analogy for workload, 64, 66
historical data
data warehouse, 7
federated data warehouse, 17
Integrated Sector, 65, 67, 70
Interactive Sector, 60
value of information, 280–281
historical record, 120
history of data warehousing
from business perspective, 5–6, 14
capacity/disk storage, 1
data warehouse environment, 6
DBMS, 1–2
DW 2.0 compared to first-generation, 23–24
early progression of systems, 2
forces shaping, 9–11
master files, 5
online applications, 2–3
PCs and 4GL technology, 3–4
spider web environment, 4–5, 6
homographs
resolution, 303–304
taxonomies, 105
house/city analogy, 261–262

I

Improving Data Warehouse and Business Information Quality (English), 137
indexing
passive indexes for archival data, 81–83, 344
performance, 245
information factory development stream, 133
infrastructure
stream, 133
technology. See technology infrastructure
integrated data
in data warehouse, 7–8
evolution of data warehousing, 10
federated data warehouse, 17
integrating text, 301–307
scope of, and data models, 158–159
value of integration, 280
Integrated Sector, 62–71
changes to data, 67
continuous time span data, 63
corporate data, 62
data access, 66–67, 69–70
data key reconciliation, 62
data quality editor, 63
DSS processing, 70–71
ETL processing, 29, 63, 67–68
flow of data, 48–49, 205–207
granularity, 65, 70–71
historical data, 65, 67, 70
life cycle of data, 27–30
performance, 65–66
processes, 341–342
profile data, 63
queries/searches, 65–67
reasons for, 30–31
referential integrity, 68–69
subject-oriented detailed data, 62–63
summary data, 63
time relativity, 192–193
transactions, and time-variant data, 193–194
volumes of data, 50–51, 65
workload, 64
integrity of data
active data warehouse, 15
comparisons, 144–145
referential, 52
star schemas, 19
statistical comparison, 144–145
Interactive Sector, 55–61
access of data, 33–34
data access, 59
data models, 161–162
ETL processing, 58
flow of data, 48–49, 203–205
granularity, 60
historical data, 60
life cycle of data, 27–30
metadata, 31–33
performance, 57, 254–255
protecting, 254–255
reasons for, 30–31
referential integrity, 58
searches, 58
time relativity, 192
volumes of data, 50–51, 57
workload, 56
internal taxonomies, 104–105
intersector/intrasector referential integrity, 52
inventory management, 11, 127–128
IT (information technology)
reducing IT response time, 115
technology infrastructure, 112–113

K

Kalido, 121
keys
reconciliation, 62
restructure/creation, 223

L

LDM (logical data model), 291–294
legacy data. See also ETL (extract/transform/load) processing
creating new analysis, 273–275
as data source, 315
federated data warehouse approach, 16–18
life cycle of data, 27–30
operational/legacy systems environment, 313
licenses and DWU, 336
life cycle of data
reasons for sectors, 30–31
sectors, described, 27–30
linkages, 87, 309
local metadata, 43–45, 97–98, 103
logic
data mapping, 318
LDM, 291–294
transactions, 339–341
logical data model (LDM), 291–294

M

macro-level cost justification, 271–272
maintaining metadata, 106
management administration of DW 2.0 environment, 358–361
mangled characters, monitoring, 175
mapping, data, 219, 223, 316–319
maps, level of detail, 159–160
master files, 5
meltdowns, 173
metadata
active/passive repositories, 99–100
administration, 41, 351–352
in Archival Sector, 31–33, 105–106
building infrastructure, 266
business, 44, 102–103, 352
business perspective, 26
as by-product of granularity manager, 237–238
card catalog analogy, 95
creating enterprise, 265–266
in DW 2.0 environment, 31–33, 40–44, 96–99
end-user perspective, 109–110
enterprise-wide, 101–102
ETL processing, 223
infrastructure and performance, 248
in Interactive Sector, 31–33
local, 43–45, 97–98, 103
maintaining, 106
repositories, 98–99
reusability of data and analysis, 96, 249
stop words, 105, 302
structure of, 96–97
system of record, 102–103
taxonomies, 104–105
technical, 44, 102–103, 352
transformation process, 341
unstructured data, 41, 103, 104–105
using, airline example, 106–109
methodology
seven streams approach, 129–139
spiral, 123–129, 137–139
waterfall, 123–126
micro-level cost justification, 272–273
migration
adding Archival Sector, 264–265
adding components incrementally, 262–264
building metadata infrastructure, 266
creating enterprise metadata, 265–266
ETL as shock absorber, 267
in perfect world, 262
perspective of business users, 269–270
swallowing source systems, 266–267
to unstructured environment, 267–269
milestones and scheduling, 359
money data, 17, 162, 220–221
Monitor it and report it domain, 287
monitoring DW 2.0 environment
application monitoring, 172
by architectural administrator, 350
data quality, 170–171, 174–175, 224
data warehouse monitor, 171, 176–177, 185, 326–327
domain checking, 174, 224
dormant data, 176–177, 245–246
mangled characters, 175
meltdowns, 173
null values, 175
outlying range, 175
overview, 169
peak-period processing, 172–174
performance, 246–247
perspective of business users, 178
queue monitoring, 171
sniffing, 176
transaction monitor, 169–170
transaction monitor and response time, 171–172
transaction queue monitoring, 171
transaction record monitoring, 172
unmatched foreign keys, 174–175
Moss, Larissa, and spiral methodology, 125–128
Move it domain, 287
Mythical Man Month, The (Brooks), 115

N

Name data, 196
NAME data, 308
Near Line Sector, 71–76
architectural administration, 349–351
CMSM, 74–75, 211–212
cost and performance, 72–73
data access, 74
data storage, 73
flow of data, 48–49, 207–209
life cycle of data, 27–30
processes, 342–343
reasons for, 30–31
security, 187
time relativity, 192–193
volumes of data, 50–51, 76
workload, 73–74
nonextensibility and data marts, 20
nonoverlapping records, 197
normalization of textual data, 38–40
normalized data, 38
null values, monitoring, 175

O

ODS (operational data store), 13
offline data and security, 182–184
OLAP (online analytical processing), 20
OLTP (online transaction processing)
DWU, 336
federated data warehouse, 16–17
long queue time, 243
ODS, 13
performance, 239
SLA, 254
O’Neill, Bonnie, 287
online applications/processing
active data warehouse, 14–16
evolution of data warehousing, 9–10
history of data warehousing, 2–3, 5–6
OLAP, 20
transaction processing. See OLTP
online mode for ETL, 216–217
online response performance, 239–241
operational application systems environment, 218, 313–314, 316
operational/legacy systems environment, 313
organizational charts, 116–117, 120
outlying range, 175

P

paradigm of DW 2.0, 24, 25
parallelization
batch, and performance, 249
granularity managers, 237
transaction processing, 249–250
partitioning data, 255–256
passive indexes for archival data, 81–83, 344
passive metadata repository, 99–100
password flooding attacks, 186
patient’s records, 51, 52
PCs and 4GL technology, 3–4
peak-period processing, 172–174
performance
analytical productivity and response time, 243–244
analytical response time, 241
Archival Sector, 80–81
batch parallelization, 249
capacity planning, 247–248
checking automatically generated code, 257–258
data marts, 251
data models and Interactive Sector, 161–162
in DW 2.0 environment, 239
end-user education, 246
exploration facilities, 252
facets to, 244–245
federated data warehouse, 16–17
flow of data, 241–242
hardware/software selection, 256
heuristic processing, 243
indexing, 245
Integrated Sector, 65–66
Interactive Sector, 57, 254–255
metadata infrastructure, 248–249
monitoring environment, 246–247
Near Line Sector, 72–73
OLTP, 239
online response time, 240–241
parallelization for transaction processing, 249–250
partitioning data, 255–256
perspective of business users, 258–259
physically grouping data, 257
protecting Interactive Sector, 254–255
queues, 242–243
reducing IT response time, 115
removing dormant data, 245–246
separating farmers/explorers, 256–257
separation of transactions into classes, 253–254
service level agreements, 254
transaction monitor and response time, 171–172
unstructured data, 88–89
workload management, 250–251
physical model, 159–160
physically grouping data, 257
pointers and unstructured processing, 87
preprogrammed complex transactions, 340
prioritization/conflicts and administration, 358
probability of data access
for different sectors, 30–31
elevated, 210
falling probability, 209–210
processing in DW 2.0 environment, 339–344
in DW 2.0 sectors, 341–344
transaction types, 339–341
profile data, 63, 131
project-based data, 13, 150–151
proliferation and star schemas, 19
protecting Interactive Sector, 254–255

Q

quality monitor, 170–171, 174–175, 224. See also data quality
queries. See also data access; searches
access, 340
ad hoc, 340
Interactive Sector, 58
nonreplicable, in federated data warehouse, 17
statistical processing, 141–143
queue monitoring, 171
queues, 242–243

R

random data access, 33–34
range checking, 63, 175, 224
rationalization of textual data, 39–40
reading text for analytical processing, 299–300
reality of information, 278
real-time ETL processing, 218
reasonability checking, 224
recast data, 235
reconciliation
data marts, 20
encoded values for data mapping, 318
keys, 62
referential integrity
described, 52
Integrated Sector, 68–69
Interactive Sector, 58
refreshing exploration data, 149
rejected data, 225–226
relational data base, 309
reports, 53, 206, 278
repositories for metadata, 98–100
resetting data values, 330
resource allocation, 359
response time
analytical productivity, 243–244
IT, reducing, 115
online response time, 240–241
transaction monitor, 171–172
transaction monitoring, 171–172
return on investment (ROI), 128
reusability
data and analysis, 96
metadata, 249
spiral methodology, 127–128
revenue metadata, 101–103
reverse-engineered data model, 288–289
road maps
“Business Intelligence Road Map” (Moss), 125–126
DW/BI project road map, 137–138
intellectual road map, 157

S

sales data
in Integrated Sector, 62–63
semantically stable data, 117
sales territories, 116–117
SAN technology, 333, 334
SAP technology, 333
Sarbanes Oxley taxonomy, 109, 306–307
scheduling and milestones, 359
scope creep, 348
screening data, 38
SDLC (systems development life cycle), 123
searches. See also data access; queries
Archival Sector, 80–85
direct/indirect, 306–307
indexing, 81–83, 245, 344
Integrated Sector, 65–67
Interactive Sector, 58
sectors. See also Archival Sector; Integrated Sector; Interactive Sector; Near Line Sector
data access, 33–34
metadata, 31–33
reasons for different sectors, 30–31
types of sectors, 27–30, 55
security
barrier security, 172
data access, 172
data warehouse monitor, 185
direct dump of data, 184–185
drawbacks, 182
encryption, 181–182, 184
firewall, 182
moving data offline, 182–184
Near Line Sector, 187
password flooding attacks, 186
perspective of business users, 187–188
protected unstructured data, 187
sensing attacks, 185–187
Self Organizing Map (SOM), 165
semantic relationships
enterprise metadata, 101
mitigating business change, 119
mixing stable/unstable data, 118
separating stable/unstable data, 118
stable data, 117
temporal data, 116–117
temporal/static data, 115–116
semantically temporal/static data, 116–117
semistructured data/value, 307–308
sequence of records, 196
sequential data access, 33–34
service level agreements (SLAs), 254, 259, 349–350
settling data, 331
seven streams approach
data correction stream, 133
data profiling and mapping stream, 133
DW/BI project road map, 137–138
enterprise knowledge coordination stream, 129–133
enterprise reference model stream, 129
information factory development stream, 133
infrastructure stream, 133
overview, 129
summary, 137–139
total information quality management stream, 134–137
shared data mart data, 327–328
SLAs (service level agreements), 254, 259, 349–350
slivers and spiral methodology, 127
snapshots of data, 119–120
sniffing and data warehouse monitor, 176
software
creating ETL, 224–225
disruption and DWU, 336
performance, 256
SOM (Self Organizing Map), 165
source data system of records, 316
sources
best source data from operational environment, 316
data mapping, 219, 223, 316–319
data marts, 323
ETL processing, 218–219
exploration processing, 149
swallowing source systems, 266–267
system of record, 313–319
specific/general text, 39–40
speed of data movement, 331
spellings, alternate, 105, 305
spider web environment
history of data warehousing, 4–5
transition to data warehouse environment, 6
spiral methodology, 123–129, 137–139
stable/unstable data, semantically
features of, 117
mixing/separating, 118
staging request, 213–214
star schemas, 15, 18–19
states of data, changing, 215, 219–221
static/temporal data, 115–117
statistical analysis, 141–143
statistical comparison, 144–145
statistical processing
active data warehouse, 16, 141
backflow of data, 152–154
data marts and exploration facilities, 152
in DW 2.0 environment, 141–142, 341
exploration facilities, 147
exploration processing, 146
freezing data, 145–146
frequency of analysis, 147
heuristic analysis, 145–146, 341
integrity of comparisons, 144–145
perspective of business analyst, 155–156
project-based data, 150–151
queries, 141–143
refreshing exploration data, 149
sources for exploration processing, 149
using exploration data internally, 155
using statistical analysis, 143
stemming, 301, 305
stewardship, 353–355
stop words, 105, 302
storage. See capacity/disk storage
Strauss, Derek, 287
structured data
data flow, 48–49
linkages, 87, 309
metadata, 41
vs. unstructured data, 34–35
volumes of data, 50–51
subject-oriented detailed data, 62–63
subjects
subject area definitions, 101–102
system of record, 102–103
summary data
granular vs. summarized data, 159
granularity manager, 234
in Integrated Sector, 63
synonyms
replacement/concatenation, 303
taxonomies, 105
system of record
best source data from operational environment, 316
data mapping, 316–319
data marts, 319
ETL processing, 218
metadata, 102–103
operational/legacy systems environment, 313
perspective of business users, 319–320
systems development life cycle (SDLC), 123
systems/technology administration, 355–358

T

targets
data mapping, 219, 223, 316–319
ETL processing, 218–219
taxonomies
external, 104–105, 304–305
features of, 104–105
stop words, 105, 302
synonyms, 105
unstructured processing, 87
technical metadata, 44, 102–103, 352
technology
administration, 355–358
for different sectors, 34
evolution of data warehousing, 9
federated data warehouse, 17
responding to business changes, 47–48
seven streams approach, 129–139
spiral methodology, 123–129, 137–139
technology infrastructure
creating snapshots of data, 119–120
dividing data, 121
end-user perspective, 121–122
features of, 112–113
getting off treadmill, 115
historical record, 120
mitigating business change, 119
mixing semantically stable/unstable data, 118
overview, 111–112
rapid business changes, 114
reducing IT response time, 115
semantically stable data, 117
semantically temporal data, 116–117
semantically temporal/static data, 115–116
separating semantically stable/unstable data, 118
treadmill of change, 114–115
temporal/static data, 115–117
terminology
handling text, 307
normalization of, 38–40
text across languages, 305
textual analytical processing, 300–301
textual data
alternate spellings, 105, 305
analytics, 35
data access, 89–90
direct searches, 306
in DW 2.0 environment, 34–35
ETL technology, 308
evolution of data warehousing, 10
external glossaries/taxonomies, 304–305
homographic resolution, 303–304
indirect searches, 306–307
integrating text, 301–307
NAME data, 308
normalization, 38–40
performance, 88–89
perspective of business users, 310
relational data base, 309
semistructured data/value, 307–308
simple editing, 87, 302
specific/general, 39–40
stemming, 305
stop words, 105, 302
synonym replacement/concatenation, 303
synonyms, 105
terminology, 307
text across languages, 305
themes, 104, 165–166, 304
unstructured processing, 86–90
workload, 88
themes, 104, 165–166, 304
throughput, ETL, 222–223
time capsules, 79
time, performance. See performance
time value of information, 279–280
time-variant data
Archival Sector, 192–193, 199–200
beginning/ending sequence of records, 197–198
continuity, 198
continuous time span data, 63, 194–196
discrete data, 194, 198
end-user perspective, 200
key structure, 192
nonoverlapping records, 197
sequence of records, 196
structure of DW 2.0 data, 191–192
time relativity in Interactive Sector, 192
time-collapsed data, 198–199
transactions in Integrated Sector, 193–194
TIQM (total information quality management) stream, 134–137
top-down modeling, 294–296
total information quality management (TIQM) stream, 134–137
total quality data management (TQdM), 134
TQdM (total quality data management), 134
transaction monitor
application monitoring, 172
overview, 169–170
queue monitoring, 171
record monitoring, 172
response time, 171–172
transaction processing. See also OLTP
ad hoc transactions, 340
in DW 2.0 environment, 339–344
Interactive Sector, 56–57
life cycle of data, 27–30
logic, 339–341
parallelization and performance, 249–250
performance, 253–254
preprogrammed transactions, 340
simple/complex transactions, 339–340
transaction types, 141–142
transformation
of data. See ETL (extract/transform/load) processing
of data mart data, 325
of data models, 163–164
process, 340–341
transparency
CMSM, 74–75
data base, 336
triggers for data flow, 206–207

U

unstable/stable data, semantically
features of stable data, 117
mixing/separating, 118
unstructured data
alternate spellings, 105, 305
data access, 89–90
data flow, 48–49
data models, 164–166
direct searches, 306
DW 2.0 environment, 299
ETL technology, 308
evolution of data warehousing, 10
external glossaries/taxonomies, 304–305
homographic resolution, 303–304
indirect searches, 306–307
integrating text, 301
linkages, 87, 309
metadata, 41, 103, 104–105
migration to unstructured environment, 267–269
NAME data, 308
performance, 88–89
perspective of business users, 310
preparing, for DW 2.0 environment, 38–40
processing, 86–90
reading text, 299–300
relational data base, 309
screening for blather, 38
semistructured data/value, 307–308
simple editing, 87, 302
stemming, 305
stop words, 105, 302
vs. structured data, 34–35
synonym replacement/concatenation, 303
terminology, 307
text across languages, 305
textual analytical processing, 300–301
themes, 104, 165–166, 304
volumes of data, 50–51
workload, 88

V

volumes of data
Archival Sector, 50–51, 80, 85
business perspective, 26
for different sectors, 31
in DW 2.0 environment, 8
filtering data, 232–234
Integrated Sector, 50–51, 65
Interactive Sector, 50–51, 57
Near Line Sector, 50–51, 76

W

waterfall methodology, 123–126
workload
Archival Sector, 80
highway analogy, 64, 66
Integrated Sector, 64
Interactive Sector, 56
Near Line Sector, 73–74
unstructured data, 88
workload management, 250–251

Z

Zachman framework, 131–133, 289–292