Checking Values of Character Variables
by Ronald P. Cody
Cody’s Data Cleaning Techniques Using SAS® Software
Copyright
Introduction
Acknowledgments
Checking Values of Character Variables
Introduction
Using PROC FREQ to List Values
Description of the Raw Data File PATIENTS.TXT
Using a DATA Step to Check for Invalid Values
Using PROC PRINT with a WHERE Statement to List Invalid Values
Using Formats to Check for Invalid Values
Using Informats to Check for Invalid Values
Checking Values of Numeric Variables
Introduction
Using PROC MEANS, PROC TABULATE, and PROC UNIVARIATE to Look for Outliers
Using PROC PRINT with a WHERE Statement to List Invalid Data Values
Using a DATA Step to Check for Invalid Values
Creating a Macro for Range Checking
Using Formats to Check for Invalid Values
Using Informats to Check for Invalid Values
Using PROC UNIVARIATE to Look for Highest and Lowest Values by Percentage
Using PROC RANK to Look for Highest and Lowest Values by Percentage
Extending PROC RANK to Look for Highest and Lowest “n” Values
Finding Another Way to Determine Highest and Lowest Values
Checking a Range Using an Algorithm Based on Standard Deviation
Macros Based on the Two Methods of Outlier Detection
Checking a Range Based on the Interquartile Range
Checking Ranges for Several Variables
Checking for Missing Values
Introduction
Inspecting the SAS Log
Using PROC MEANS and PROC FREQ to Count Missing Values
Using DATA Step Approaches to Identify and Count Missing Values
Using PROC TABULATE to Count Missing and Nonmissing Values for Numeric Variables
Using PROC TABULATE to Count Missing and Nonmissing Values for Character Variables
Creating a General Purpose Macro to Count Missing and Nonmissing Values for Both Numeric and Character Variables
Searching for a Specific Numeric Value
Working with Dates
Introduction
Checking Ranges for Dates (Using a DATA Step)
Checking Ranges for Dates (Using PROC PRINT)
Checking for Invalid Dates
Working with Dates in Nonstandard Form
Creating a SAS Date When the Day of the Month Is Missing
Suspending Error Checking for Known Invalid Dates
Looking for Duplicates and “n” Observations per Subject
Introduction
Eliminating Duplicates by Using PROC SORT
Detecting Duplicates by Using DATA Step Approaches
Using PROC FREQ to Detect Duplicate ID’s
Selecting Patients with Duplicate Observations by Using a Macro List and SQL
Identifying Subjects with “n” Observations Each (DATA Step Approach)
Identifying Subjects with “n” Observations Each (Using PROC FREQ)
Working with Multiple Files
Introduction
Checking for an ID in Each of Two Files
Checking for an ID in Each of “n” Files
A Simple Macro to Check ID’s in Multiple Files
A More Complicated Multi-File Macro for ID Checking
More Complicated Multi-File Rules
Checking That the Dates Are in the Proper Order
Double Entry and Verification (PROC COMPARE)
Introduction
Conducting a Simple Comparison of Two Data Sets without an ID Variable
Using PROC COMPARE with an ID Variable
Using PROC COMPARE with Two Data Sets That Have an Unequal Number of Observations
Comparing Two Data Sets When Some Variables Are Not in Both Data Sets
Some SQL Solutions to Data Cleaning
Introduction
A Quick Review of PROC SQL
Checking for Invalid Character Values
Checking for Outliers
Checking a Range Using an Algorithm Based on the Standard Deviation
Checking for Missing Values
Range Checking for Dates
Checking for Duplicates
Identifying Subjects with “n” Observations Each
Checking for an ID in Each of Two Files
More Complicated Multi-File Rules
Using Validation Data Sets
Introduction
A Simple Example of a Validation Data Set
Making the Program More Flexible and Converting It to a Macro
Validating Character Data
Converting Program 9-7 into a General Purpose Macro
Extending the Validation Macro to Include Valid Character Ranges
Combining Numeric and Character Validity Checks in a Single Macro with a Single Validation Data Set
Introducing SAS Integrity Constraints (Versions 7 and Later)
Listing of Raw Data Files and SAS Programs
Description of the Raw Data File PATIENTS.TXT
Layout for the Data File PATIENTS.TXT
Listing of Raw Data File PATIENTS.TXT
Program to Create the SAS Data Set PATIENTS
Listing of Raw Data File PATIENTS2.TXT
Program to Create the SAS Data Set PATIENTS2
Program to Create the SAS Data Set AE (Adverse Events)
Program to Create the SAS Data Set LAB_TEST
Books Available from SAS® Press
JMP® Books
Index
1. Checking Values of Character Variables
Introduction
Using PROC FREQ to List Values
Description of the File PATIENTS.TXT
Using a DATA Step to Check for Invalid Values
Using PROC PRINT with a WHERE Statement to List Invalid Values
Using Formats to Check for Invalid Values
Using Informats to Check for Invalid Values