SAS Data Sets

Overview of Data Sets

A SAS data set is a file that consists of two parts: a descriptor portion and a data portion. Sometimes a SAS data set also points to one or more indexes, which enable SAS to locate rows in the data set more efficiently. (The data sets that are shown in this chapter do not contain indexes.) Extended attributes are user-defined attributes that further define a SAS data set.
Figure 2.6 Parts of a SAS Data Set

Descriptor Portion

The descriptor portion of a SAS data set contains information about the data set, including the following:
  • the name of the data set
  • the date and time that the data set was created
  • the number of observations
  • the number of variables
The table below lists part of the descriptor portion of the data set Cert.Insure, which contains insurance information for patients who are admitted to a wellness clinic.
Table 2.3 Descriptor Portion of Attributes in a SAS Data Set
Data Set Name: CERT.INSURE
Member Type: DATA
Engine: V9
Created: 07/03/2018 10:53:05
Observations: 21
Variables: 7
Indexes: 0
Observation Length: 64

SAS Variable Attributes

The descriptor portion of a SAS data set contains information about the properties of each variable in the data set. The properties information includes the variable's name, type, length, format, informat, and label.
When you write SAS programs, it is important to understand the attributes of the variables that you use. For example, you might need to combine SAS data sets that contain same-named variables. In this case, the variables must be the same type (character or numeric). If the same-named variables are both character variables, you still need to check that the variable lengths are the same. Otherwise, some values might be truncated.
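The truncation risk described above can be sketched with two small data sets. The names work.short, work.long, work.combined, and work.combined_fixed are hypothetical and are not part of the Cert library. The sketch relies on the fact that, when data sets are concatenated with a SET statement, a same-named variable takes its length from the first data set in which SAS encounters it.

```sas
/* Hypothetical data sets for illustration only */
data work.short;
   length Name $ 5;
   Name = 'Smith';
run;

data work.long;
   length Name $ 10;
   Name = 'Fitzgerald';
run;

/* Name is defined with length 5 from work.short, so the longer
   value from work.long does not fit and is truncated */
data work.combined;
   set work.short work.long;
run;

/* Declaring the length before the SET statement avoids the truncation */
data work.combined_fixed;
   length Name $ 10;
   set work.short work.long;
run;

proc print data=work.combined;
run;

proc print data=work.combined_fixed;
run;
```

Comparing the two listings shows why it is worth checking variable lengths before combining data sets. SAS usually also writes a warning about multiple lengths to the log in the first case.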
The following table uses Cert.Insure data and the VALIDVARNAME=ANY system option. Each SAS variable has several attributes, which are listed here:
Table 2.4 Variable Attributes
Each attribute below is listed with its definition, an example, and its possible values. Where three examples are given, they refer, in order, to the variables Policy, Total, and Name.

Name
  • Definition: identifies a variable. A variable name must conform to SAS naming rules. See Rules for SAS Names for the naming rules.
  • Example: Policy, Total, Name
  • Possible values: any valid SAS name.

Type
  • Definition: identifies a variable as numeric or character. Character variables can contain any values. Numeric variables can contain only numeric values (the numerals 0 through 9, +, -, ., and E for scientific notation).
  • Example: Char, Num, Char
  • Possible values: numeric and character.

Length
  • Definition: refers to the number of bytes used to store each of the variable's values in a SAS data set. Character variables can be up to 32,767 bytes long. All numeric variables have a default length of 8 bytes. Numeric values are stored as floating-point numbers in 8 bytes of storage.
  • Example: 5, 8, 14
  • Possible values: 2 to 8 bytes for numeric; 1 to 32,767 bytes for character.

Format
  • Definition: affects how data values are written. Formats do not change the stored value in any way; they merely control how that value is displayed. SAS offers a variety of character, numeric, and date and time formats.
  • Example: $98.64
  • Possible values: any SAS format. If no format is specified, the default format is BEST12. for a numeric variable, and $w. for a character variable.

Informat
  • Definition: reads data values in certain forms into standard SAS values. Informats determine how data values are read into a SAS data set. You must use informats to read numeric values that contain letters or other special characters.
  • Example: 99
  • Possible values: any SAS informat. The default informat for numeric is w.d and for character is $w.

Label
  • Definition: refers to a descriptive label up to 256 characters long. A variable label, which can be printed by some SAS procedures, is useful in report writing.
  • Example: Policy Number, Total Balance, Patient Name
  • Possible values: up to 256 characters.
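As a minimal sketch of how these attributes can be assigned, the DATA step below builds a one-row data set. The name work.attr_demo is hypothetical, and the values only loosely follow the Cert.Insure examples. The COMMA8. informat and DOLLAR8.2 format are choices made for this illustration, not necessarily the ones used in Cert.Insure.

```sas
/* Hypothetical data set for illustration only */
data work.attr_demo;
   /* Type and length: $ marks a character variable */
   length Policy $ 5 Name $ 14 Total 8;

   /* Format: controls display only, not the stored value */
   format Total dollar8.2;

   /* Labels: descriptive text that some procedures can print */
   label Policy = 'Policy Number'
         Total  = 'Total Balance'
         Name   = 'Patient Name';

   Policy = '32668';
   Name   = 'Murray W';

   /* Informat: COMMA8. reads a value that contains a dollar sign */
   Total  = input('$98.64', comma8.);
run;

/* The LABEL option prints labels as column headings */
proc print data=work.attr_demo label;
run;

/* Lists each variable's type, length, format, and label */
proc contents data=work.attr_demo;
run;
```

Total is stored as the number 98.64. The dollar sign appears in the listing only because of the format.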
The following output is the descriptor portion of Cert.Insure.
Output 2.2 Descriptor Portion of Cert.Insure
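A descriptor report of this kind is typically produced with PROC CONTENTS. The sketch below assumes that the libref Cert is already assigned to the library that contains the Insure data set.

```sas
/* Assumes the libref Cert is already assigned */
proc contents data=cert.insure;
run;

/* VARNUM lists variables in creation order instead of alphabetically */
proc contents data=cert.insure varnum;
run;
```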

Data Portion

Data Portion Overview

The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table. In the example below, the company MUTUALITY is a data value, Policy 32668 is a data value, and so on.
Figure 2.7 Parts of a SAS Data Set: Data Portion

Observations (Rows)

Observations (also called rows) in the data set are collections of data values that usually relate to a single object. The values 2458, Murray W, 32668, MUTUALITY, 100, 98.64, and 0.00 make up a single observation in the data set shown below.
Figure 2.8 Parts of a SAS Data Set: Observations
This data set has 21 observations, each containing information about an individual. To view the descriptor portion of this data set, see Table 2.3, Descriptor Portion of Attributes in a SAS Data Set. A SAS data set can store any number of observations.

Variables (Columns)

Variables (also called columns) in the data set are collections of values that describe a particular characteristic. The values 2458, 2462, 2501, and 2523 belong to the variable ID in the data set shown below.
Figure 2.9 Parts of a SAS Data Set: Variables
This data set contains seven variables: ID, Name, Policy, Company, PctInsured, Total, and BalanceDue. A SAS data set can store thousands of variables.

Missing Values

Every variable and observation in a SAS data set must have a value. If a data value is unknown for a particular observation, a missing value is recorded in the SAS data set. A period ( . ) is the default value for a missing numeric value, and a blank space is the default value for a missing character value.
Figure 2.10 Parts of a SAS Data Set: Missing Data Values
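The defaults for missing values can be seen in a small sketch. The data set work.missing_demo and the values assigned in it are hypothetical. The second observation has a missing character value (a blank) and a missing numeric value (a period).

```sas
/* Hypothetical data set for illustration only */
data work.missing_demo;
   length Name $ 14;
   ID = 2458; Name = 'Murray W'; Total = 98.64; output;
   ID = 2462; Name = ' ';        Total = .;     output;
run;

proc print data=work.missing_demo;
run;

/* MISSING() is true for a missing numeric or character value */
proc print data=work.missing_demo;
   where missing(Total) or missing(Name);
run;
```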

SAS Indexes

An index is a separate file that you can create for a SAS data file in order to provide direct access to a specific observation. The index file has the same name as its data file and a member type of INDEX. Indexes can provide faster access to specific observations, particularly when you have a large data set. The purpose of SAS indexes is to optimize WHERE expressions and to facilitate BY-group processing. For more information, see Specifying WHERE Expressions and see Group Processing Using the BY Statement.
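One way to create a simple index is the INDEX CREATE statement in PROC DATASETS, sketched below. The data set work.patients and its values are hypothetical. Whether SAS actually uses the index for a given WHERE expression is its own decision, so the sketch sets MSGLEVEL=I to request a note in the log when an index is selected.

```sas
/* Hypothetical data set for illustration only */
data work.patients;
   do ID = 1 to 1000;
      Total = ID * 1.5;
      output;
   end;
run;

/* Create a simple index on ID */
proc datasets library=work nolist;
   modify patients;
   index create ID;
quit;

/* Ask SAS to report index usage in the log */
options msglevel=i;

proc print data=work.patients;
   where ID = 500;
run;
```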

Extended Attributes

Extended attributes are user-defined metadata that is defined for a data set or for a variable (column). Extended attributes are represented as name-value pairs.
Tip
You can use PROC CONTENTS to display data set and variable extended attributes.
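As a sketch under stated assumptions, extended attributes can be added with the XATTR statement in PROC DATASETS, which is available in SAS 9.4. The data set work.xattr_demo and the attribute names Source and Unit are hypothetical choices for this illustration.

```sas
/* Hypothetical data set for illustration only */
data work.xattr_demo;
   ID = 2458;
   Total = 98.64;
run;

proc datasets library=work nolist;
   modify xattr_demo;
   /* Data set level name-value pair */
   xattr add ds Source="Wellness clinic billing";
   /* Variable level name-value pair for Total */
   xattr add var Total (Unit="USD");
quit;

/* PROC CONTENTS reports the extended attributes with the descriptor */
proc contents data=work.xattr_demo;
run;
```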
Last updated: August 23, 2018