SAS Data Sets

Overview of Data Sets

A SAS data set is a file that consists of two parts: a descriptor portion and a data portion. Sometimes a SAS data set also points to one or more indexes, which enable SAS to locate rows in the data set more efficiently. (The data sets that are shown in this chapter do not contain indexes.) Extended attributes are user-defined attributes that further define a SAS data set.
Figure 2.6 Parts of a SAS Data Set

Descriptor Portion

The descriptor portion of a SAS data set contains information about the data set, including the following:
  • the name of the data set
  • the date and time that the data set was created
  • the number of observations
  • the number of variables
The table below lists part of the descriptor portion of the data set sasuser.insure, which contains insurance information for patients who are admitted to a wellness clinic.
Table 2.4 Descriptor Portion of Attributes in a SAS Data Set
Data Set Name: SASUSER.INSURE
Member Type: DATA
Engine: V9
Created: 10:05 Thursday, February 16, 2017
Observations: 21
Variables: 7
Indexes: 0
Observation Length: 64
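To make the split between the two portions concrete, the following Python sketch models a data set as descriptor metadata plus a list of rows. It is an illustration only, not how SAS lays out a file: the class and field names are assumptions, and only three of the seven sasuser.insure variables are modeled, filled with the values from the observation for Murray W. The observation and variable counts that the descriptor reports are derived here from the data.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One variable's entry in the modeled descriptor portion."""
    name: str
    type: str    # "Char" or "Num"
    length: int  # storage length in bytes


@dataclass
class DataSet:
    """Toy model of a SAS data set: descriptor metadata plus a data portion."""
    name: str
    variables: list                            # descriptor: one Variable per column
    rows: list = field(default_factory=list)   # data portion: one tuple per observation

    @property
    def observations(self) -> int:
        # The descriptor's observation count matches the number of rows.
        return len(self.rows)

    @property
    def variable_count(self) -> int:
        # The descriptor's variable count matches the number of columns described.
        return len(self.variables)


# Three of the seven sasuser.insure variables, with the Murray W observation's values.
demo = DataSet(
    name="DEMO",
    variables=[Variable("Policy", "Char", 8),
               Variable("Total", "Num", 8),
               Variable("Name", "Char", 20)],
    rows=[("32668", 98.64, "Murray W")],
)
print(demo.observations, demo.variable_count)  # 1 3
```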

Data Portion

Data Portion Overview

The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table. In the example below, the company MUTUALITY is a data value, Policy 32668 is a data value, and so on.
Figure 2.7 Parts of a SAS Data Set: Data Portion

Observations (Rows)

Observations (also called rows) in the data set are collections of data values that usually relate to a single object. The values 2458, Murray W, 32668, MUTUALITY, 100, 98.64, and 0.00 comprise a single observation in the data set shown below.
Figure 2.8 Parts of a SAS Data Set: Observations
This data set has 21 observations, each containing information about an individual. To view part of the descriptor portion of this data set, see Table 2.4 Descriptor Portion of Attributes in a SAS Data Set. A SAS data set can store any number of observations.

Variables (Columns)

Variables (also called columns) in the data set are collections of values that describe a particular characteristic. The values 2458, 2462, 2501, and 2523 comprise the variable ID in the data set shown below.
Figure 2.9 Parts of a SAS Data Set: Variables
This data set contains seven variables: ID, Name, Policy, Company, PctInsured, Total, and BalanceDue. A SAS data set can store thousands of variables.

Missing Values

Every variable and observation in a SAS data set must have a value. If a data value is unknown for a particular observation, a missing value is recorded in the SAS data set. A period ( . ) is the default value for a missing numeric value, and a blank space is the default value for a missing character value.
Figure 2.10 Parts of a SAS Data Set: Missing Data Values
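The following Python sketch mimics how the default missing values look when a row is displayed. It is a model, not SAS code: the helper name display_value is hypothetical, and Python's None (or NaN for numbers) stands in for a SAS missing value.

```python
import math


def display_value(value, var_type: str) -> str:
    """Return the text shown for a value, using the default missing-value display.

    None (or NaN for numeric values) stands in for a SAS missing value.
    """
    if var_type == "Num":
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return "."   # default for a missing numeric value
        return str(value)
    if var_type == "Char":
        if value is None:
            return " "   # default for a missing character value: a blank
        return value
    raise ValueError(f"unknown variable type: {var_type!r}")


row = {"Name": ("Murray W", "Char"), "Total": (None, "Num"), "Policy": (None, "Char")}
print({name: display_value(v, t) for name, (v, t) in row.items()})
# {'Name': 'Murray W', 'Total': '.', 'Policy': ' '}
```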

Variable Attributes

In addition to general information about the data set, the descriptor portion contains information about the properties of each variable in the data set. The properties information includes the variable's name, type, length, format, informat, and label.
When you write SAS programs, it is important to understand the attributes of the variables that you use. For example, you might need to combine SAS data sets that contain same-named variables. In this case, the variables must be the same type (character or numeric). If the same-named variables are both character variables, you still need to check that the variable lengths are the same. Otherwise, some values might be truncated.
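Before combining data sets, you can compare the attributes of same-named variables. This hypothetical Python helper takes each data set's attributes as a dictionary that maps a name to (type, length) and reports type mismatches and differing character lengths. It matches names without regard to case, because SAS treats names that differ only in case as the same variable.

```python
def check_combine(attrs_a: dict, attrs_b: dict) -> list:
    """List problems with same-named variables in two data sets.

    Each argument maps a variable name to (type, length), for example {"Name": ("Char", 20)}.
    Names are matched case-insensitively.
    """
    by_upper_b = {name.upper(): attrs for name, attrs in attrs_b.items()}
    problems = []
    for name, (type_a, len_a) in attrs_a.items():
        match = by_upper_b.get(name.upper())
        if match is None:
            continue  # not a same-named variable, so nothing to compare
        type_b, len_b = match
        if type_a != type_b:
            problems.append(f"{name}: type mismatch ({type_a} vs {type_b})")
        elif type_a == "Char" and len_a != len_b:
            problems.append(f"{name}: lengths differ ({len_a} vs {len_b}); "
                            "values might be truncated")
    return problems


first = {"Name": ("Char", 20), "Total": ("Num", 8)}
second = {"NAME": ("Char", 12), "Total": ("Char", 8)}
for problem in check_combine(first, second):
    print(problem)
```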
Here is a partial listing of the attribute information in the descriptor portion of the SAS data set sasuser.insure.
Table 2.5 Variable Attributes in the Descriptor Portion of a SAS Data Set sasuser.insure
Variable  Type  Length  Format     Informat  Label
Policy    Char  8                            Policy Number
Total     Num   8       DOLLAR9.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name

Variable Names

Rules for Variable Names

Each variable has a name that conforms to SAS naming conventions. Variable names follow the same rules as SAS data set names.
  • They can be 1 to 32 characters long.
  • They must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_).
  • They can continue with any combination of numbers, letters, or underscores.
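The three rules translate directly into a regular expression. This Python check is a sketch for screening candidate names before you use them in a program, not SAS's own validator. Note that it accepts names such as _N_, which are valid in form but are reserved for automatic variables.

```python
import re

# First character: a letter (A-Z, a-z) or an underscore.
# Remaining 0-31 characters: letters, digits, or underscores (1-32 characters in total).
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_name(name: str) -> bool:
    """Return True if `name` follows the three naming rules listed above."""
    return _NAME_PATTERN.fullmatch(name) is not None


candidates = ["PctInsured", "_temp1", "2ndPolicy", "Policy Number", "x" * 33]
print([is_valid_name(n) for n in candidates])
# [True, True, False, False, False]
```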
Table 2.6 Variable Name Attributes
Variable  Type  Length  Format     Informat  Label
Policy    Char  8                            Policy Number
Total     Num   8       DOLLAR9.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name

VALIDVARNAME= System Option

Note: If you use characters other than the ones that are valid when the VALIDVARNAME system option is set to V7 (letters of the Latin alphabet, numerals, or underscores), then you must express the variable name as a name literal and you must set VALIDVARNAME=ANY. If the name includes either a percent sign (%) or an ampersand (&), then you must use single quotation marks in the name literal in order to avoid interaction with the SAS macro facility.
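If you generate SAS code from another program, a small helper can produce the name literal text for you. This Python function is hypothetical. It encloses the name in single quotation marks, as the note recommends, and assumes the usual SAS convention of writing an embedded single quotation mark twice. The generated literal is usable only when VALIDVARNAME=ANY is in effect.

```python
def name_literal(name: str) -> str:
    """Return SAS name-literal text such as 'Total Balance'n for a variable name.

    Single quotation marks keep % and & from being processed by the macro facility.
    An embedded single quotation mark is written twice.
    """
    return "'" + name.replace("'", "''") + "'n"


print(name_literal("Profit & Loss"))   # 'Profit & Loss'n
print(name_literal("Jan's Total"))     # 'Jan''s Total'n
```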
VALIDVARNAME specifies the rules for valid SAS variable names that can be created and processed during a SAS session.
Syntax, VALIDVARNAME=
VALIDVARNAME= V7|UPCASE|ANY
V7 specifies that variable names must follow these rules:
  • SAS variable names can be up to 32 characters long.
  • The name must begin with a letter of the Latin alphabet (A - Z, either uppercase or lowercase) or an underscore (_). Subsequent characters can be letters of the Latin alphabet, numerals, or underscores.
  • Trailing blanks are ignored. The variable name alignment is left-justified.
  • A variable name cannot contain blanks or special characters except for an underscore.
  • A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable. However, when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables. For example, cat, Cat, and CAT all represent the same variable.
  • Do not assign variables the names of special SAS automatic variables (such as _N_ and _ERROR_) or variable list names (such as _NUMERIC_, _CHARACTER_, and _ALL_).
UPCASE specifies that the variable name follows the same rules as V7, except that the variable name is uppercase, as in earlier versions of SAS.
ANY specifies that SAS variable names must follow these rules:
  • The name can begin with or contain any characters, including blanks, national characters, special characters, and multi-byte characters.
  • The name can be up to 32 bytes long.
  • The name cannot contain any null bytes.
  • Leading blanks are preserved, but trailing blanks are ignored.
  • The name must contain at least one character. A name with all blanks is not permitted.
  • A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable. However, when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables. For example, cat, Cat, and CAT all represent the same variable.
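The ANY rules can be sketched in Python as follows. The function names are hypothetical, and the 32-byte limit is checked against a UTF-8 encoding, which is an assumption about the session encoding. The second function models the case-insensitive matching described in the last rule.

```python
def is_valid_any_name(name: str, encoding: str = "utf-8") -> bool:
    """Check a name against the VALIDVARNAME=ANY rules listed above (a sketch)."""
    stored = name.rstrip(" ")            # trailing blanks are ignored; leading blanks are kept
    if not stored:                       # empty or all blanks: not permitted
        return False
    if "\x00" in stored:                 # null bytes are not allowed
        return False
    return len(stored.encode(encoding)) <= 32   # at most 32 bytes


def same_variable(a: str, b: str) -> bool:
    """Names that differ only in case (cat, Cat, CAT) refer to the same variable."""
    return a.rstrip(" ").upper() == b.rstrip(" ").upper()


print(is_valid_any_name("Profit & Loss"),   # blanks and special characters are allowed
      is_valid_any_name("é" * 17),          # 34 bytes in UTF-8: too long
      same_variable("cat", "CAT"))
# True False True
```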
CAUTION:
Throughout SAS, using the name literal syntax with SAS member names that exceed the 32-byte limit or have excessive embedded quotation marks might cause unexpected results.
The VALIDVARNAME=ANY system option enables compatibility with other DBMS variable (column) naming conventions, such as allowing embedded blanks and national characters.

Type

A variable's type is either character or numeric.
  • Character variables, such as Name (shown below), can contain any values.
  • Numeric variables, such as Total (shown below), can contain only numeric values (the numerals 0 through 9, +, -, ., and E for scientific notation).
Table 2.7 Type Attribute
Variable  Type  Length  Format     Informat  Label
Policy    Char  8                            Policy Number
Total     Num   8       DOLLAR9.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name

Length

A variable's length (the number of bytes used to store it) is related to its type.
  • Character variables can be up to 32,767 bytes long. In the example below, Name has a length of 20 characters and uses 20 bytes of storage.
  • All numeric variables have a default length of 8 bytes. Numeric values are stored as floating-point numbers in 8 bytes of storage.
Table 2.8 Length Attribute
Variable  Type  Length  Format     Informat  Label
Policy    Char  8                            Policy Number
Total     Num   8       DOLLAR9.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name
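The storage rules can be modeled in Python. In this sketch, the hypothetical helper store_char cuts a character value off at the variable's length or pads it with trailing blanks, assuming one byte per character. Python's 8-byte IEEE 754 double serves as an analogy for a numeric value stored at the default length; the 21-character name is a made-up input.

```python
import struct


def store_char(value: str, length: int) -> str:
    """Model fixed-length character storage (one byte per character assumed).

    Values longer than `length` are truncated on the right; shorter values
    are padded with trailing blanks.
    """
    return value[:length].ljust(length)


# Name has a length of 20.
print(repr(store_char("Murray W", 20)))               # padded to 20 characters
print(repr(store_char("Christopher Lancaster", 20)))  # 21 characters: the last one is lost

# A numeric value at the default length occupies 8 bytes as a double.
print(len(struct.pack("<d", 98.64)))  # 8
```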

Format

Formats are variable attributes that affect how data values are written. Formats do not change the stored value in any way; they merely control how that value is displayed. SAS software offers a variety of character, numeric, and date and time formats. You can also create and store your own formats. To write values out using a particular form, you select the appropriate format.
For example, to display the value 1234 as $1,234.00 in a report, you can use the DOLLAR9.2 format, as shown for Total below.
Table 2.9 Format Attribute
Variable  Type  Length  Format     Informat  Label
Policy    Char  8                            Policy Number
Total     Num   8       DOLLAR9.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name
Usually you have to specify the maximum width (w) of the value to be written. Depending on the particular format, you might also need to specify the number of decimal places (d) to be written. For example, to display the value 5678 as 5,678.00 in a report, you can use the COMMA8.2 format, which specifies a width of 8 including 2 decimal places.
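The effect of w and d can be illustrated with a rough Python emulation of the DOLLARw.d and COMMAw.d formats. The function names are hypothetical and the behavior is simplified: when a value does not fit in w columns, the sketch raises an error instead of reproducing how SAS adjusts the output.

```python
def _with_commas(value: float, w: int, d: int, prefix: str) -> str:
    """Shared logic: sign, optional prefix, comma separators, d decimals, right-aligned in w."""
    sign = "-" if value < 0 else ""
    text = f"{sign}{prefix}{abs(value):,.{d}f}"
    if len(text) > w:
        # Simplification: SAS adjusts values that do not fit; this sketch just refuses.
        raise ValueError(f"{text!r} does not fit in width {w}")
    return text.rjust(w)


def dollar_fmt(value: float, w: int, d: int) -> str:
    """Rough emulation of DOLLARw.d."""
    return _with_commas(value, w, d, "$")


def comma_fmt(value: float, w: int, d: int) -> str:
    """Rough emulation of COMMAw.d."""
    return _with_commas(value, w, d, "")


print(dollar_fmt(1234, 9, 2))  # $1,234.00
print(comma_fmt(5678, 8, 2))   # 5,678.00
```

The width counts every character written, including the dollar sign, commas, and decimal point, which is why $1,234.00 needs a width of 9.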
Tip
You can permanently assign a format to a variable in a SAS data set, or you can temporarily specify a format in a PROC step to determine how the data values appear in output.

Informat

Whereas formats write values out using some particular form, informats read data values in certain forms into standard SAS values. Informats determine how data values are read into a SAS data set. You must use informats to read numeric values that contain letters or other special characters.
For example, the numeric value $12,345.00 contains two special characters, a dollar sign ($) and a comma (,). You can use an informat to read the value while removing the dollar sign and comma, and then store the resulting value as a standard numeric value. For Total below, the COMMA10. informat is specified.
Table 2.10 Informat Attribute
Variable  Type  Length  Format     Informat  Label
Policy    Char  8                            Policy Number
Total     Num   8       DOLLAR9.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name

Label

A variable can have a label, which consists of descriptive text up to 256 characters long. By default, many reports identify variables by their names. You might want to replace the name with more descriptive information about the variable by assigning a label to the variable.
For example, you can label Policy as Policy Number, Total as Total Balance, and Name as Patient Name to display these labels in reports.
Table 2.11 Label Attribute
Variable  Type  Length  Format     Informat  Label
Policy    Char  8                            Policy Number
Total     Num   8       DOLLAR9.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name
You can use labels to shorten long variable names in your reports.
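The choice between a label and a name in a report header can be sketched as follows. This Python helper is hypothetical: it uses a variable's label when one is assigned, falls back to the name otherwise, and rejects labels longer than the 256-character limit.

```python
def column_headers(variables: list, labels: dict, use_labels: bool = True) -> list:
    """Return the header text for each column: its label if assigned, else its name."""
    headers = []
    for name in variables:
        label = labels.get(name)
        if label is not None and len(label) > 256:
            raise ValueError(f"label for {name} exceeds 256 characters")
        headers.append(label if use_labels and label else name)
    return headers


labels = {"Policy": "Policy Number", "Total": "Total Balance", "Name": "Patient Name"}
print(column_headers(["ID", "Name", "Policy", "Total"], labels))
# ['ID', 'Patient Name', 'Policy Number', 'Total Balance']
print(column_headers(["ID", "Name"], labels, use_labels=False))
# ['ID', 'Name']
```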

SAS Indexes

An index is a separate file that you can create for a SAS data file in order to provide direct access to a specific observation. The index file has the same name as its data file and a member type of INDEX. Indexes can provide faster access to specific observations, particularly when you have a large data set. The purpose of SAS indexes is to optimize WHERE expressions and to facilitate BY-group processing. For more information, see Specifying WHERE Expressions and Group Processing Using the BY Statement.
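As an analogy for how an index gives direct access, the Python sketch below maps each key value to the observation numbers that contain it, so an equality lookup like a simple WHERE expression fetches matching rows without scanning the whole table. This is not the SAS index file format, and the rows (including the Plan column) are made up for illustration.

```python
from collections import defaultdict


def build_index(rows: list, key: str) -> dict:
    """Map each value of `key` to the 1-based observation numbers that hold it."""
    index = defaultdict(list)
    for obs_number, row in enumerate(rows, start=1):
        index[row[key]].append(obs_number)
    return dict(index)


def where_equals(rows: list, index: dict, value) -> list:
    """Fetch matching observations directly through the index instead of scanning all rows."""
    return [rows[n - 1] for n in index.get(value, [])]


# Made-up rows for illustration only (not the actual sasuser.insure values).
rows = [{"ID": "2458", "Plan": "A"}, {"ID": "2462", "Plan": "B"},
        {"ID": "2501", "Plan": "A"}, {"ID": "2523", "Plan": "C"}]
idx = build_index(rows, "Plan")
print([r["ID"] for r in where_equals(rows, idx, "A")])  # ['2458', '2501']
```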

Extended Attributes

Extended attributes are user-defined metadata that is defined on a data set or on a variable (column). Extended attributes are represented as name-value pairs and are created using the DATASETS procedure.
Tip
You can use PROC CONTENTS to display data set and variable extended attributes.
Last updated: January 10, 2018