Home Page Icon
Home Page
Table of Contents for
Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study
Close
Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study
by Don Henderson, Paul Dorfman
Data Management Solutions Using SAS Hash Table Operations
About This Book
About These Authors
Acknowledgments
Part One—The HOW of the SAS Hash Object
Chapter 1: Hash Object Essentials
1.1 Introduction
1.2 Hash Object in a Nutshell
1.3 Hash Table
1.4 Hash Table Properties
1.4.1 Residence and Volatility
1.4.2 Hash Variables Role Enforcement
1.4.3 Key Variables
1.4.4 Program Data Vector (PDV) Host Variables
1.5 Hash Table Lookup Organization
1.5.1 Hash Table Versus Indexed SAS Data File
1.6 Table Operations and Hash Object Tools
1.6.1 Tasks, Operations, Environment, and Tools Hierarchy
1.6.2 General Data Table Operations
1.6.3 Hash Object Tools Classification
1.6.4 Hash Object Syntax
1.6.5 Hash Object Nomenclature
1.7 Peek Under the Hood
1.7.1 Table Organization and Unindexed Key Search
1.7.2 Internal Hash Table Structure
1.7.3 Hashing Scheme
1.7.4 Hash Function
1.7.5 Hash Table Structure and Algorithm in Tandem
1.7.6 The HASHEXP Effect
1.7.7 What Is in the Name?
Chapter 2: Table-Level Operations
2.1 Introduction
2.2 CREATE Operation
2.2.1 Declaring a Hash Object
2.2.2 Creating a Hash Object Instance
2.2.3 Combining Declaration and Instantiation
2.2.4 Defining Hash Table Variables
2.2.5 Omitting the DEFINEDATA Method
2.2.6 Wrapping Up the Create Operation
2.2.7 PDV Host Variables and Parameter Type Matching
2.2.8 Other Ways of Hard-Coded Parameter Type Matching
2.2.9 Dynamic Parameter Type Matching via File Reference
2.2.10 Parameter Type Matching by Forced File Reference
2.2.11 Parameter Type Matching by Default File Reference
2.2.12 Defining Multiple Hash Variables
2.2.13 Defining Hash Variables as Non-Literal Expressions
2.2.14 Defining Hash Variables Dynamically One at a Time
2.2.15 Defining Hash Variables Using Metadata
2.2.16 Multiple Instances Issue
2.2.17 Ensuring Single Instance Usage
2.2.18 Handling Multiple Instances
2.2.19 Create Operation Hash Tools
2.3 DELETE (Table) Operation
2.3.1 The DELETE Method
2.3.2 DELETE Operation Details
2.3.3 Delete (Table) Operation Hash Tools
2.4 CLEAR Operation
2.4.1 The CLEAR Method
2.4.2 Clear Operation vs Delete (Table) Operation
2.4.3 CLEAR Operation Hash Tools
2.5 OUTPUT Operation
2.5.1 The OUTPUT Method
2.5.2 Open-Write-Close Cycle
2.5.3 Open-Write-Close Cycle Encapsulation
2.5.4 Avoiding Open File Conflicts
2.5.5 Output Data Set Member Types
2.5.6 Creating and Overwriting Output Data Set
2.5.7 Using Output Data Set Options
2.5.8 DATASET Argument as Non-Literal Expression
2.5.9 Output Data Order
2.5.10 Output Operation Hash Tools
2.6 DESCRIBE Operation
2.6.1 The NUM_ITEMS Attribute
2.6.2 The ITEM_SIZE Attribute
2.6.3 Describe Operation Hash Tools
Chapter 3: Item-Level Operations: Direct Access
3.1 Introduction
3.2 SEARCH (Pure LookUp) Operation
3.2.1 Implicit Search: No Arguments
3.2.2 Explicit Search: Using the KEY Argument Tag
3.2.3 Argument Tag Type Match
3.2.4 Assigned CHECK Calls
3.2.5 Unassigned CHECK Calls
3.2.6 Search Operation Hash Tools
3.2.7 Search Operation Hash-PDV Interaction
3.3 INSERT Operation
3.3.1 Dynamic Memory Acquisition
3.3.2 Implicit INSERT
3.3.3 Implicit INSERT: Method Call Mode
3.3.4 Implicit INSERT: Methods Other Than ADD
3.3.5 Implicit INSERT: Argument Tag Mode
3.3.6 Explicit INSERT
3.3.7 Explicit INSERT Rules
3.3.8 Implicit vs Explicit INSERT
3.3.9 Unique Key and Duplicate Key INSERT
3.3.10 Unique INSERT
3.3.11 Duplicate INSERT
3.3.12 Insertion Order
3.3.13 Insert Operation Hash Tools
3.3.14 INSERT Operation Hash-PDV Interaction
3.4 DELETE ALL Operation
3.4.1 DELETE ALL Implementation
3.4.2 DELETE ALL and Item Locking
3.4.3 DELETE ALL Operation Hash Tools
3.4.4 DELETE ALL Operation Hash-PDV Interaction
3.5 RETRIEVE Operation
3.5.1 Direct RETRIEVE
3.5.2 Successful Direct RETRIEVE
3.5.3 Unsuccessful Direct RETRIEVE
3.5.4 Implicit vs Explicit FIND Calls
3.5.5 RETRIEVE Operation Hash Tools
3.5.6 RETRIEVE Operation Hash-PDV Interaction
3.6 UPDATE ALL Operation
3.6.1 UPDATE ALL Implementation
3.6.2 Assigned vs Unassigned REPLACE Calls
3.6.3 Implicit vs Explicit REPLACE Calls
3.6.4 Selective UPDATE Operation Note
3.6.5 UPDATE ALL Operation Hash Tools
3.6.6 UPDATE ALL Operation Hash-PDV Interaction
3.7 ORDER Operation
3.7.1 ORDER Operation Invocation
3.7.2 ORDERED Argument Tag Plasticity
3.7.3 Hash Items vs Hash Item Groups
3.7.4 OUTPUT Operation Effects
3.7.5 General Hash Table Order Principle
3.7.6 Ordering by Composite Keys
3.7.7 Setting the SORTEDBY= Option
3.7.8 ORDER Operation Hash Tools
3.7.9 ORDER Operation Hash-PDV Interaction
Chapter 4: Item-Level Operations: Enumeration
4.1 Introduction
4.2 Enumeration: Basics and Classification
4.2.1 Enumeration as a Process
4.2.2 Enumerating a Hash Table
4.2.3 KeyNumerate (Key Enumerate) Operation
4.2.4 Enumerate All Operation
4.3 KEYNUMERATE Operation
4.3.1 KeyNumerate Operation Mechanics
4.3.2 FIND_NEXT: Implicit vs Explicit
4.3.3 Other KeyNumerate Coding Styles
4.3.4 Version 9.4 Add-On: DO_OVER
4.3.5 Forward and Backward, In and Out
4.3.6 Staying within the Item List (Keeping It Set)
4.3.7 HAS_NEXT and HAS_PREV Peculiarities
4.3.8 Harvesting Hash Items
4.3.9 Harvesting Hash Items via Explicit Calls
4.3.10 Selective DELETE and UPDATE Operations
4.3.11 Selective DELETE: Single Item
4.3.12 Selective Delete: Multiple Items
4.3.13 Selective UPDATE
4.3.14 Selective DELETE vs Selective UPDATE
4.3.15 KeyNumerate Operation Hash Tools
4.3.16 KeyNumerate Operation Hash-PDV Interaction
4.4 ENUMERATE ALL Operation
4.4.1 The Hash Iterator Object
4.4.2 Creating and Linking the Iterator Object
4.4.3 Hash Iterator Pointer
4.4.4 Direct Iterator Access: First Item
4.4.5 Direct Iterator Access: Last Item
4.4.6 Direct Iterator Access: Key-Item
4.4.7 Sequential Access
4.4.8 Enumerating from the End Points
4.4.9 Iterator Priming Using NEXT and PREV
4.4.10 FIRST/LAST vs NEXT/PREV
4.4.11 Keeping the Iterator in the Table
4.4.12 Enumerating Sequentially from a Key-Item
4.4.13 Harvesting Same-Key Items from a Key-Item
4.4.14 The Hash Iterator and Item Locking
4.4.15 Locking and Unlocking
4.4.16 Locking Same-Key Item Groups
4.4.17 Locking the Entire Hash Table
4.4.18 ENUMERATE ALL Operation Hash Tools
4.4.19 Hash-PDV Interaction
Part Two—The WHAT and the WHY of the SAS Hash Object
Chapter 5: Bizarro Ball Sample Data
5.1 Introduction
5.2 Sample Data Descriptions
5.2.1 AtBats
5.2.2 Games
5.2.3 Leagues
5.2.4 Pitches
5.2.5 Player_Candidates
5.2.6 Runs
5.2.7 Teams
5.3 Summary
Chapter 6: Data Tasks Using Hash Table Operations
6.1 Introduction
6.2 Subsetting Data
6.2.1 Two Principal Methods of Subsetting
6.2.2 Simple Data File Subsetting via a Hash Table
6.2.3 Why a Hash Table and Not SQL?
6.2.4 Subsetting with a Twist: Adding a Simple Count
6.3 Combining Data
6.3.1 Left / Right Joins
6.3.2 Merging a Join with an Aggregate
6.3.3 Inner Joins
6.3.4 DO_OVER Versus FIND + FIND_NEXT
6.3.5 Unique-Key Joins
6.4 Splitting Data
6.4.1 Hash Data Split - Sorted Input
6.4.2 Hash Data Split - Unsorted Input
6.5 Ordering and Grouping Data
6.5.1 Reordering Split Outputs
6.5.2 Intrinsic Data Grouping
6.6 Summary
Chapter 7: Supporting Data Warehouse Star Schemas
7.1 Introduction
7.2 Creating and Updating Fact Tables
7.3 Creating and Updating Slowly Changing Dimension Tables
7.3.1 Handling Type 0 Dimension Tables
7.3.2 Handling Type 1 Dimension Tables
7.3.3 Handling Type 2 Dimension Tables
7.3.4 Handling Type 3 Dimension Tables
7.3.5 Handling Type 4 Dimension Tables
7.3.6 Handling Type 6 Dimension Tables
7.4 Creating a Bizarro Ball Star Schema Data Warehouse
7.4.1 Defining the Data Warehouse Tables
7.4.2 Defining the Fact and Dimension Hash Tables via Metadata
7.4.3 Creating the Initial Data Structures for a Star Schema
7.4.4 Updating the Fact and Dimension Tables
7.5 Summary
Chapter 8: Creating Data Aggregates and Metrics
8.1 Overview
8.2 Creating Simple Aggregates
8.2.1 Getting Variables from Other Tables
8.2.2 Calculating Unique Counts
8.2.3 Calculating Medians, Percentiles, Mode, and More
8.3 Creating Multi-Way Aggregates
8.3.1 Using Parameter Files to Define Aggregates
8.4 Summary
Part Three—Expanding the WHAT and the WHY, along with the HOW of the SAS Hash Object
Chapter 9: Hash of Hashes – Looping Through SAS Hash Objects
9.1 Overview
9.2 Creating a Hash of Hashes (HoH) Table – Simple Example
9.3 Calculating Percentiles, Mode, Mean, and More
9.3.1 Percentiles
9.3.2 Multiple Medians
9.3.3 Percentiles, Mode, Median, and More
9.4 Consecutive Events
9.5 Multiple Splits
9.5.1 Adding a Unique Count
9.5.2 Multiple Split Calculations
9.6 Summary
Chapter 10: The Hash Object as a Dynamic Data Structure
10.1 Introduction
10.2 Stable Unduplication
10.2.1 Basic Stable Unduplication
10.2.2 Selective Unduplication
10.3 Testing Data for Grouping
10.3.1 Grouped vs Non-Grouped
10.3.2 Using a Hash Table to Check for Grouping
10.4 Hash Tables as Other Data Structures
10.4.1 Stacks and Queues
10.4.2 Implementing a Stack
10.4.3 Implementing a Queue
10.4.4 Using a Hash Stack to Find Consecutive Events
10.5 Array Sorting
10.5.1 Using a Hash Table to Sort Arrays
10.6 Summary
Chapter 11: Hash Object Memory Management
11.1 Introduction
11.2 Memory vs. Disk Trade-Off
11.2.1 General Considerations
11.2.2 Hash Memory Overload Scenarios and Solutions
11.3 Making Use of Existing Key Order
11.3.1 Data Aggregation
11.3.2 Data Unduplication
11.3.3 Joining Data
11.4 MD5 Hash Key Reduction
11.4.1 The General Concept
11.4.2 MD5 Key Reduction in Sample Data
11.4.3 Data Aggregation
11.4.4 Data Unduplication
11.4.5 Joining Data
11.5 Data Portion Offload (Hash Index)
11.5.1 Joining Data
11.5.2 Selective Unduplication
11.6 Uniform Input Split
11.6.1 Uniform Split Using Key Properties
11.6.2 Aggregation via Partial Key Split
11.6.3 Aggregation via Key Byte Split
11.6.4 Joining via Key Byte Split
11.7 Uniform MD5 Split On the Fly
11.7.1 MD5 Split Aggregation On the Fly
11.7.2 MD5 Split Join On the Fly
11.8 Uniform Split Using a SAS Index
11.9 Combining Hash Memory-Saving Techniques
11.10 MD5 Argument Concatenation Ins and Outs
11.10.1 MD5 Collisions and SHA256
11.10.2 Concatenation Length Sizing
11.10.4 Concatenation Delimiters and Endpoints
11.10.5 Auto-Formatting and Explicit Formatting
11.10.6 Concatenation Order and Consistency
11.11 Summary
Part Four—Wrapping up: Two Case Studies
Chapter 12: Researching Alternative Pitching Metrics
12.1 Overview
12.2 The Sample Program
12.2.1 Adding More Metrics
12.2.2 One Output Data Set with All the Splits Results
12.3 Summary
Chapter 13: What If the Count Is 0-2 After the First Two Pitches
13.1 Overview
13.2 The Sample Program
13.3 Summary
Index
Search in book...
Toggle Font Controls
Playlists
Add To
Create new playlist
Name your new playlist
Playlist description (optional)
Cancel
Create playlist
Sign In
Email address
Password
Forgot Password?
Create account
Login
or
Continue with Facebook
Continue with Google
Sign Up
Full Name
Email address
Confirm Email Address
Password
Login
Create account
or
Continue with Facebook
Continue with Google
Next
Next Chapter
Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study
Add Highlight
No Comment
..................Content has been hidden....................
You can't read the all page of ebook, please click
here
login for view all page.
Day Mode
Cloud Mode
Night Mode
Reset