Index

A

ADD method 53, 57–58, 61, 67, 147, 151, 165, 170, 173, 189, 199, 200, 212, 236, 243, 274, 276, 281, 365

Adelson-Volsky and Landis (AVL) 11

algorithm, hash table 13

APPEND procedure 161

argument tag mode 53–54

ARRAY statement 239, 308

arrays

about 279

associative 14, 271

sorting 279–282

assigned CHECK calls 50

associative array 14, 271

AtBats (Bizarro Ball) 118–121

auto-formatting 343–345

AVL (Adelson-Volsky and Landis) 11

AVL binary tree 271

B

balanced binary tree 11

base memory 14

"big O" notation 11

binary search 10

Bizarro Ball sample data

about 117–118, 125

AtBats 118–121

descriptions 118–125

Games 121

Leagues 122

Pitches 122–123

Player_Candidates 123–124

Runs 124–125

Teams 125

Bizarro Ball star scheme data warehouse

creating 182–190

defining fact and dimension hash tables 183–185

defining tables 182–183

BY statement 153, 215, 236–237, 269, 270–271, 291

C

CALL MISSING function 195, 200, 353

Carpenter's Complete Guide to the SAS REPORT Procedure (Carpenter) 368

case studies

Researching Alternative Pitching 349–359

What If the Count Is 0-2 After the First Two Pitches 361–369

CATS function 226, 238, 341

CATX function 147, 309, 329, 336, 338, 340–341, 342–343, 344–345, 345–346

CHAR function 26

character expression 20–21, 53–54

CHECK calls

assigned 50

unassigned 50–51

CHECK method 49, 51, 65, 80, 84, 111, 131, 163, 170, 293

CLASS statement 219

CLEAR method 36–37, 114, 147, 149, 205, 321, 330

CLEAR operation

about 36

CLEAR method 36–37

DELETE operation vs. 37

hash tools 37

joining data 298

sorted input 145–146

collisions (MD5 function) 339–340

compile phase 22

compile time statement 95

compile-time only directive 17

composite keys 5, 72, 279

compound key 6

concatenation

delimiters and endpoints 341–343

length sizing 340–341

order and consistency 345–346

CONTENTS procedure 24

CONTINUE statement 364

counts, adding 132–134, 252–255

Create, Retrieve, Update, Delete (CRUD) 8

CREATE operation

about 16–17

combining declaration and instantiation 19–20

creating hash object instances 18–19

declaring hash objects 17–18

defining hash table variables 20–21

defining hash variables as non-literal expressions 28–29

defining hash variables dynamically one at a time 29

defining hash variables using metadata 29–31

defining multiple hash variables 27–28

dynamic parameter type matching 23–25

ensuring single instance usage 33–34

finishing 21–22

handling multiple instances 34–35

hash tools 35

multiple instances issue 31–32

omitting DEFINEDATA method 21

parameter type matching 22–26

PDV host variables 22–23

unsorted input 151

CRUD (Create, Retrieve, Update, Delete) 8

D

data

combining 134–144

grouping 153–156

joining 294–299, 308–312

ordering 153–156

splitting 144–152

subsetting 128–134

testing for grouping 269–271

data aggregates

about 191–193

calculating mean 209–213

calculating medians 201–205, 209–213

calculating mode 201–202, 207–213

calculating percentiles 201–202, 205–207, 209–213

calculating unique counts 198–201

creating 191–228

creating multi-way 219–228

defining using parameter files 223–228

determining distribution of consecutive events 213–219

getting variables from tables 194–198

key order 289–292

MD5 hash key reduction 302–305

merging joins with 137–138

uniform MD5 split on the fly 328–330

via key byte split 321–325

via partial key split 316–321

data portion 5

data portion offload

about 310

joining data 310–312

selective unduplication 312–315

DATA statement 39–40, 145, 146, 357

data tables

getting variables from 194–198

operations 8–9

data tasks 127–157

data types, of expressions 50

data unduplication

key order 292–294

MD5 hash key reduction 305–308

DATASET argument 42–43, 44

DATASETS procedure 186

DCL statement 32, 219

declaration, combining with instantiation 19–20

DECLARE HASH statement 233

DECLARE statement 17–20, 22, 31, 35, 69, 96–97, 114, 214

DEFINEDATA method 20–22, 27–31, 35, 41–43, 136, 184, 185, 205, 215, 219, 225, 286, 315

DEFINEDONE method 21–23, 26, 35, 136, 170, 185, 225

DEFINEKEY method 20–22, 27–31, 35, 42–43, 184, 185, 205, 225, 286, 296

DELETE ALL operation

about 61–62

hash tools 63

hash-PDV interaction 63

implementing 62–63

item locking and 63

DELETE method 35–36, 89–90, 149, 205

DELETE operation

about 35, 88–89

CLEAR operation vs. 37

DELETE method 35–36

details for 36

hash tools 36

implementing stacks 275

multiple items 90–92

UPDATE operation vs. 94

DELETE statement 33

delimiters, concatenation 341–343

dequeue 273

DESCRIBE operation

about 44

hash tools 46

ITEM_SIZE attribute 45–46

NUM_ITEMS attribute 44–45, 46

DIM function 242

Direct Retrieve 64–65

direct-addressing table look-up 7

disk, memory vs. 284–288

DISTINCT option 226

DO loop 29, 30, 31, 33–34, 45, 69, 80, 81, 91, 103–104, 133, 140, 278, 281, 293, 296, 311

DO statement 147

DO UNTIL loop 33

DO WHILE loop 86, 136, 137–138, 152, 170, 337, 364

DO_OVER method 81–82, 84, 87–88, 133–134, 136, 139–140, 337

DoW loop 246, 270, 278, 291, 298, 337

DROP= option 24, 42

Duplicate Insert 56–57, 58–59

dynamic data structures, hash objects as

See hash object

dynamic parameter type matching 23–25

E

END= option 33

end points

concatenation of 341–343

enumerating from 103–105

END statement 147

ENUMERATE ALL operation

about 95

combining data 134

creating HoH tables 233

creating iterator object 97–98

DELETE ALL method 63

direct iterator access 99–102

enumerating from end points 103–105

enumerating sequentially from key-items 108–109

FIRST method 105–106

harvesting same-key items from key-items 109–110

hash iterator 110–111

hash iterator object 95–97

hash iterator pointer 98–99

hash tools 114

hash-PDV interaction 114

implementing stacks 274–275

item locking 110–111

iterator priming 105

keeping iterator in table 106–108

LAST method 105–106

linking iterator object 97–98

locking 111–112

locking hash tables 113–114

locking same-key item groups 113

NEXT method 105–106

PREV method 105–106

sequential access 103

sorting arrays 281

unlocking 111–112

ENUMERATE operation 59, 64, 69, 134

enumeration

See also ENUMERATE ALL operation; KEYNUMERATE operation

about 76

of hash tables 76–77

as a process 76

environment 8

events, consecutive 245–247, 277–279

execution phase 22

explicit formatting 343–345

Explicit Insert

about 54–55

Implicit Insert vs. 56

rules for 55–56

explicit search 49

expressions, data types of 50

F

fact tables

creating and updating 161–163

defining via metadata 183–185

updating 186–190

FIFO (first-in-first-out) 272

file reference

dynamic parameter type matching via 23–24

dynamic parameter type matching via forced 25

parameter type matching by default 26

FIND + FIND_NEXT method 139–140

FIND method 65–66, 82, 84, 86, 87, 94, 113, 134, 143, 151, 167, 174, 176, 181, 193, 195, 197, 219, 242, 251, 254, 259, 314, 352, 364, 365

FIND_NEXT method 80, 81, 83–84, 87–88, 90, 92, 94, 113, 170, 173, 180, 365

FIND_PREV method 82–84

FIRST method 99–101, 103, 105, 111, 113, 114, 202, 212

NEXT/PREV methods vs. 105–106

first-in-first-out (FIFO) 272

FORMAT procedure 192, 319

formatting

auto-formatting 343–345

explicit 343–345

FREQ procedure 213–214

frontal attack 292, 294, 299, 305, 308, 313

FULLSTIMER option 291, 300

functions

See specific functions

G

Games (Bizarro Ball) 121

general data table 8

GENMAX= option 41–42

grouped data, non-grouped data vs. 269

H

hash function 12–13

hash index 310

See also data portion offload

hash iterator 63, 110–111

hash iterator object 4, 95–97

hash iterator pointer 98–99

hash memory

See hash objects

hash object attributes 44

hash objects

about 1, 3–4, 261–262, 284

combining memory-saving techniques 335–338

creating instances 18–19

creating method calls using macros 184–185

data portion offload 310–315

declaring 17–18

MD5 argument concatenation 338–346

MD5 hash key reduction 299–309

memory vs. disk 284–288

nomenclature 10

stable unduplication 262–268

syntax 9–10

testing data for grouping 269–271

tool classifications 9

tools 8–10

uniform input split 315–327

uniform MD5 split on the fly 327–332

uniform split using SAS index 332–335

using existing key order 288–299

Hash of Hashes (HoH)

about 150

consecutive events 245–247

multiple splits 247–260

Hash of Hashes (HoH) table

about 231–232

calculating medians 236–244

calculating mode 239–244

calculating percentiles 234–236, 239–244

creating 232–234

hash search algorithm 6

hash table entry 5

hash tables

about 4–5, 127–128, 157

algorithm 13

checking for grouping using 269–271

combining data 134–144

data tasks using 127–157

defined 4

defining variables 20–21

enumerating 76–77

grouping data 153–156

indexed SAS data files vs. 7

internal structure 11

keys 6

locking 113–114

lookup organization 6–7

operations 8–10

order principle 72

ordering data 153–156

organization 10–11

as other data structures 271–279

properties of 5–6

sorting arrays using 279–282

splitting data 144–152

structure 13

subsetting data 128–134

hash tools

CLEAR operation 36–37, 145–146, 298

CREATE operation 16–35, 151

DELETE ALL operation 61–63

DELETE operation 35–37, 88–92, 275

DESCRIBE operation 44–46

ENUMERATE ALL operation 63, 77–78, 95–114, 134, 233, 274–275, 281

INSERT operation 51–61, 151, 161, 165, 233, 281

KEYNUMERATE operation 68, 79–94, 134, 138–141, 298, 309, 311

ORDER operation 68–73

OUTPUT operation 37–44, 72

RETRIEVE operation 64–66, 134, 138, 140, 143, 167, 176, 193, 194, 232, 310

UPDATE ALL operation 66–68

hash variables

defining as non-literal expressions 28–29

defining dynamically one at a time 29

defining multiple 27–28

defining using metadata 29–31

role enforcement of 5–6

HASHEXP 13–14

hashing algorithm 4

hashing scheme 11–12

HAS_NEXT method 83–84, 91, 92

HAS_PREV method 83–84

HoH

See Hash of Hashes (HoH) table

host variables 5, 6

I

IF statement 141

IFN function 31

IF-THEN-ELSE statement 163

IMPLICIT INSERT operation 56, 185

implicit search 48–49

INDEX= option 42, 180

indexed SAS data files, hash tables vs. 7

inner joins 139

INPUT statement 43

INSERT operation

about 51

ADD method 53

argument tag mode 53–54

creating and updating fact tables 161

creating hash of hash (HOH) tables 233

Duplicate Insert 56–57, 58–59

dynamic memory acquisition 51–52

handling Type 0 dimension tables 165

hash data split 151

hash tools 61

hash-PDV interaction 61

Implicit Insert 52–54

insertion order 59–61

METHOD call mode 52–53

REF method 53

REPLACE method 53

sorting arrays 281

Unique Insert 56–58

instances

handling multiple 34–35

multiple 31–32

single usage of 33–34

instantiation, combining with declaration 19–20

interleaving files 294, 297

intrinsic data grouping 155–156

item locking

DELETE ALL operation and 63

hash iterator and 110–111

item-level operations

about 48, 76

DELETE ALL operation 61–63

ENUMERATE ALL operation 95–114

enumeration 76–78

INSERT operation 51–61

KEYNUMERATE operation 79–94

ORDER operation 68–73

RETRIEVE operation 64–66

SEARCH operation 48–51

UPDATE ALL operation 66–68

ITEM_SIZE attribute 45–46

iterator object

creating 97–98

direct access 99–102

keeping in tables 106–108

linking 97–98

priming 105

sequential access 103

J

joins

inner 139

left 135–137

merging with aggregates 137–138

right 135–137

uniform MD5 split on the fly 330–332

unique-key 140–144

via key byte split 325–327

K

KEEP= option 24, 42

KEY argument tag 49

key order

about 288–289

data aggregation 289–292

data unduplication 292–294

joining data 294–299

key portion 5

key variables 6

key-items

enumerating sequentially from 108–109

harvesting same-key items from 109–110

KEYNUMERATE operation

about 79

backward 82–83

coding styles 81

combining data 134

DO_OVER method 81–82

DO_OVER vs, FIND + FIND_NEXT 139

FIND_NEXT 80

harvesting hash items 85–88

hash tools 94

hash-PDV interaction 94

HAS_NEXT 84

HAS_PREV 84

joining data 298, 309, 311

mechanics of 79–80

merging joins 138

selective DELETE operations 88–92, 94

selective UPDATE operation 68

selective UPDATE operations 92–94

staying within items lists 83–84

unique-key joins 140–141

L

LAG function 163

LAST method 101, 105–106, 114, 275

last-in-first-out (LIFO) 272

Leagues (Bizarro Ball) 122

left joins 135–137

LENGTH statement 23

%LET statement 281, 356

LIFO (last-in-first-out) 272

LINK statement 189, 190, 197, 203, 205, 222, 226, 227, 231, 365

locking

about 111–112

hash tables 113–114

same-key item groups 113

M

macros, creating hash object method calls using 184–185

MD5 function

about 288

argument concatenation 338–346

auto-formatting 343–345

collisions 339–340

concatenation delimiters and endpoints 341–343

concatenation length sizing 340–341

concatenation order and consistency 345–346

explicit formatting 343–345

SHA256 function 339–340

MD5 hash key reduction

about 299–300

data aggregation 302–305

data unduplication 305–308

general concept of 300–301

joining data 308–309

in sample data 301–302

mean, calculating 209–213

MEANS procedure 198, 219

medians, calculating 201–205, 209–213, 236–244

member types, output data set 40

memory, disk vs. 284–288

MERGE statement 17, 24, 165

merging joins with aggregates 137–138

metadata

defining fact and dimension hash tables via 183–185

defining hash variables using 29–31

METHOD call mode 52–53

methods

See specific methods

metrics

adding 353–354

creating 191–228

MOD function 318, 323

mode, calculating 201–202, 207–213, 239–244

MODIFY statement 17, 24, 165

MPRINT option 329

N

name 14

NEXT method 103–106, 110, 111, 112, 114, 207, 208, 211–212, 236

nomenclature

hash object 10

for hash object tools 9

non-grouped data, grouped data vs. 269

non-literal expressions

DATASET argument as 42–43

defining hash variables as 28–29

NOTSORTED option 72, 269–271, 278, 291

numeric expression 50

NUM_ITEMS attribute 44–45, 46, 61, 106–108

O

O(1) notation 7, 11

object-dot syntax 9

one-to-one correspondence 9

open file conflicts, avoiding 39–40

Open-Write-Close cycle 39

operations

data table 8–9

hash tables 8–10

ORDER operation

about 68–69, 153

hash items vs. hash item groups 70–71

hash table order principle 72

hash tools 73

hash-PDV interaction 73

invoking 69

ORDERED argument tag plasticity 69–70

ordering by composite keys 72

OUTPUT operation effects 72

setting SORTEDBY= option 73

ORDERED argument tag 70–71

organization, of hash tables 10–11

output data, order of 43–44

output data set

creating and overwriting 41–42

options for 42

OUTPUT method 38–40, 42–44, 73, 78, 85, 147, 152, 154, 156, 165, 166, 174, 176, 177, 189, 192, 212, 227, 243, 251, 313, 314, 357, 365

OUTPUT operation

about 37–38

avoiding open file conflicts 39–40

creating output data set 41–42

data set member types 40

DATASET argument as non-literal expression 42–43

effects of 72

hash tools 44

Open-Write-Close cycle 39

output data order 43–44

output data set options 42

OUTPUT method 38

overwriting output data set 41–42

OUTPUT statement 107–108, 136, 139, 140

outputs, reordering split 153–155

P

parameter files, defining aggregates using 223–228

parameter type matching 6, 22–26

PDV (program data vector) 5

PDV host variables 22–23

percentiles, calculating 201–202, 205–207, 209–213, 234–236, 239–244

Pitches (Bizarro Ball) 122–123

Player_Candidates (Bizarro Ball) 123–124

PoC (Proof of Concept) 117, 202

PREV method 105–106, 112, 114, 276

procedures

See specific procedures

program data vector (PDV) 5, 6

Proof of Concept (PoC) 117, 202

properties

of hash tables 5–6

key 316

PUT function 240

PUT statement 111, 233–234, 325, 330

Q

queues

about 272–273

implementing 275–276

R

RANK function 321–322

record order 305

REF method 53, 58, 61, 63, 80, 84, 215, 219

REMOVE method 62–63, 65, 111, 163, 189, 278

REMOVEDUP method 89–92, 93, 113

RENAME statement 42, 165, 174, 190

REPLACE method 53, 58, 59, 61, 63, 67, 68, 88, 167, 171, 173, 174, 176, 190, 193, 208, 211–212, 219, 281, 313, 315

REPLACEDUP method 89, 92, 93–94, 171, 178, 180

Researching Alternative Pitching case study

about 349, 359

adding metrics 353–354

output data set 354–358

sample program 350–353

residence, of hash tables 5

RETAIN statement 23, 43, 315

RETRIEVE operation

about 64

combining data 134

creating aggregates 193, 194

creating HoH tables 232

data portion offload 310

Direct Retrieve 64–65

DO_OVER vs. FIND = FIND_NEXT 140

hash tools 66

hash-PDV interaction 66

implicit vs. explicit FIND calls 65–66

including facts into SCD tables 176

merging joins 138

performing table lookups 167

unique-key joins 143

RETURN statement 189, 205

right joins 135–137

run time directive 31

Runs (Bizarro Ball) 124–125

run-time assignment statement 95

S

SAS index, uniform split using 332–335

satellite data 5

SCAN function 28, 296, 299

SCD tables

See slowly changing dimension (SCD) tables

SEARCH operation

about 48, 134

argument tag type match 50

assigned CHECK calls 50

explicit search 49

hash tools 51

hash-PDV interaction 51

implicit search 48–49

unassigned CHECK calls 50–51

SELECT statement 145

selective unduplication, data portion offload 312–315

SELECTIVE UPDATE operation 67, 68, 170–171, 178

semicolon (;) 226

SET statement 17, 24, 25, 27–28, 33, 143, 147, 165, 166, 170, 185, 207, 212, 234, 285, 290

SETCUR method 102, 108–109, 112, 114

SHA256 function 339–340

simple key 6

slashline hash table 194

slowly changing dimension (SCD) tables

creating and updating 163–182

defining via metadata 183–185

Type 0 165–167

Type 1 167–168

Type 2 168–173

Type 3 173–176

Type 4 177

Type 6 177–182

updating 186–190

SORT procedure 193, 279–282, 306, 312

SORTC procedure 279–282

SORTEDBY= option 73, 154

SORTN procedure 279–282

splits, multiple 247–260

splitting data

about 144–145

sorted input 145–149

unsorted input 150–152

SQL procedure 161, 163, 193, 224–225, 239, 240, 306

stacks

about 272–273

implementing 273–275

using to find consecutive events 277–279

stand-alone call 51

standard table operations 8

star schema data warehouse

about 159–160

creating Bizarro Ball 182–190

creating fact tables 161–163

creating initial data structure for 185–186

creating slowly changing dimension (SCD) tables 163–182

updating fact tables 161–163

updating slowly changing dimension (SCD) tables 163–182

statements

See specific statements

STOP statement 25, 166, 207, 212, 234, 270

structure, of hash tables 13

subsetting, of data

about 128

adding simple counts 132–134

methods of 128–129

reasons for using hash tables 132

via hash tables 129–132

SUM method 215, 217, 219, 337

SUMINC function 216

SUMMARY procedure 198, 219

syntactic form, for hash object tools 9

syntax, hash object 9–10

%SYSFUNC function 299

T

table lookups, performing 166–167, 171–173, 176, 177, 180–182

table-level operations

about 16

CLEAR operations 36–37

CREATE operation 16–35

DELETE operation 35–36

DESCRIBE operation 44–46

OUTPUT operation 37–44

tables

See data tables

tasks 8

Teams (Bizarro Ball) 125

temporal separation 23

tools

classifications 9

hash object 8–10

TRANWRD function 299

Type 0 slowly changing dimension (SCD) tables 165–167

Type 1 slowly changing dimension (SCD) tables 167–168

Type 2 slowly changing dimension (SCD) tables 168–173

Type 3 slowly changing dimension (SCD) tables 173–176

Type 4 slowly changing dimension (SCD) tables 177

Type 6 slowly changing dimension (SCD) tables 177–182

TYPES statement 219

U

unassigned CHECK calls 50–51

underscore (_) 222

uniform input split

about 315

aggregation via key byte split 321–325

aggregation via partial key split 316–321

joining via key byte split 325–327

using key properties 316

using SAS index 332–335

uniform MD5 split on the fly

about 327–328

aggregation 328–330

joins 330–332

unindexed key search 10–11

Unique Insert 56–57, 57–58

unique-key joins 140–144

unique-key mode 59–60

UNIVARIATE procedure 201

unlocking 111–112

UPDATE ALL operation

about 66–67

assigned vs. unassigned REPLACE calls 68

hash tools 68

hash-PDV interaction 68

implementing 67

implicit vs. explicit REPLACE calls 68

selective 68

UPDATE operation

about 88–89, 92–94

DELETE operation vs. 94

UPDATE statement 17, 24, 165

UUIDGEN function 177

V

variables

defining hash table 20–21

getting from tables 194–198

PDV host 22–23

VNAME function 242

volatility, of hash tables 5

VVALUE function 243

W

WAYS statement 219

What If the Count Is 0-2 After the First Two Pitches case study

about 361, 369

sample program 361–368

WHERE= option 42

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.190.211