;
proc datasets library=work nolist;
modify Singers1;
ic create primary key (FirstName LastName); 1
run;
modify Singers2;
ic create foreign key (FirstName LastName) references Singers1
on delete restrict on update restrict; 2
run;
modify Singers2;
ic create primary key (LastName FirstName); 3
run;
modify Singers1;
ic create foreign key (LastName FirstName) references Singers2
on delete restrict on update restrict; 4
run;
quit;
1
Defines a primary key constraint for data set Singers1, for variables FirstName and
LastName.
2
Defines a foreign key constraint for data set Singers2 for variables FirstName and
LastName that references the primary key defined in Step 1. Because the intention is
to define a primary key using the same variables, the foreign key update and delete
referential actions must both be RESTRICT.
3
Defines a primary key constraint for data set Singers2 for variables LastName and
FirstName. Because those exact same variables are already defined as a foreign key,
the order must be different.
4
Defines a foreign key constraint for data set Singers1 for variables LastName and
FirstName that references the primary key defined in Step 3. Because those exact
same variables are already defined as a primary key, the order must be different.
Because a primary key is already defined using the same variables, the foreign key's
update and delete referential actions must both be RESTRICT.
Understanding SAS Indexes
Definition of SAS Indexes
An index is an optional file that you can create for a SAS data file in order to provide
direct access to specific observations. The index stores values in ascending value order
for a specific variable or variables and includes information as to the location of those
values within observations in the data file. In other words, an index enables you to locate
an observation by value.
For example, suppose that you want the observation with SSN (Social Security number)
equal to 123-45-6789:
Without an index, SAS accesses observations sequentially in the order in which they
are stored in the data file. SAS reads each observation, looking for
SSN=123-45-6789 until all observations are read.
638 Chapter 26 SAS Data Files
With an index on variable SSN, SAS accesses the observation directly. SAS satisfies
the condition using the index and goes straight to the observation that contains the
value without having to read each observation.
You can either create an index when you create a data file or create an index for an
existing data file. The data file can be either compressed or uncompressed. For each data
file, you can create one or multiple indexes. Once an index exists, SAS treats it as part of
the data file. That is, if you add or delete observations or modify values, the index is
automatically updated.
Benefits of an Index
In general, SAS can use an index to improve performance in the following situations:
For WHERE processing, an index can provide faster and more efficient access to a
subset of data. To process a WHERE expression, SAS by default decides whether to
use an index or to read the data file sequentially.
For BY processing, an index returns observations in the index order, which is in
ascending value order, without using the SORT procedure even when the data file is
not stored in that order.
Note: If you use the SORT procedure, the index is not used.
For the SET and MODIFY statements, the KEY= option enables you to specify an
index in a DATA step to retrieve particular observations in a data file.
In addition, an index can benefit other areas of SAS. In SCL (SAS Component
Language), an index improves the performance of table lookup operations. For the SQL
procedure, an index enables the software to process certain classes of queries more
efficiently (for example, join queries). For the SAS/IML software, you can explicitly
specify that an index be used for read, delete, list, or Append operations.
Even though an index can reduce the time required to locate a set of observations,
especially for a large data file, there are costs associated with creating, storing, and
maintaining the index. When deciding whether to create an index, you must consider
increased resource usage, along with the performance improvement.
Note: An index is never used for the subsetting IF statement in a DATA step, or for the
FIND and SEARCH commands in the FSEDIT procedure.
The Index File
The index file is a SAS file that has the same name as its associated data file, and that
has a member type of INDEX. There is only one index file per data file. That is, all
indexes for a data file are stored in a single file.
The index file might be a separate file, or be part of the data file, depending on the
operating environment. In any case, the index file is stored in the same SAS library as its
data file.
The index file consists of entries that are organized hierarchically and connected by
pointers, all of which are maintained by SAS. The lowest level in the index file hierarchy
consists of entries that represent each distinct value for an indexed variable, in ascending
value order. Each entry contains this information:
a distinct value
Understanding SAS Indexes 639
one or more unique record identifiers (referred to as a RID) that identifies each
observation containing the value. (Think of the RID as an internal observation
number.)
That is, in an index file, each value is followed by one or more RIDs, which identify the
observations in the data file that contains the value. (Multiple RIDs result from multiple
occurrences of the same value.) For example, the following represents index file entries
for the variable LastName:
Table 26.8 Index File Entries
Value Record Identifier
Avery 10
Brown 6, 22, 43
Craig 5, 50
Dunn 1
When an index is used to process a request, such as a WHERE expression, SAS
performs a binary search on the index file and positions the index to the first entry that
contains a qualified value. SAS then uses the value's RID to read the observation that
contains the value. If a value has more than one RID (such as in the value for Brown in
the previous example), SAS reads the observation that is pointed to by the next RID in
the list. The result is that SAS can quickly locate the observations that are associated
with a value or range of values.
For example, using an index to process the WHERE expression, SAS positions the index
to the index entry for the first value greater than 20 and uses the value's RID or RIDs to
read the observation or observations where age > 20 and age < 35;. SAS then
moves sequentially through the index entries reading observations until it reaches the
index entry for the value that is equal to or greater than 35.
SAS automatically keeps the index file balanced as updates are made, which means that
it ensures a uniform cost to access any index entry, and all space that is occupied by
deleted values is recovered and reused.
Types of Indexes
Simple and Composite Indexes
When you create an index, you designate which variable or variables to index. An
indexed variable is called a key variable. You can create two types of indexes:
a simple index, which consists of the values of one variable
a composite index, which consists of the values of more than one variable, with the
values concatenated to form a single value
In addition to deciding whether you want a simple index or a composite index, you can
also limit an index (and its data file) to unique values and exclude from the index
missing values.
640 Chapter 26 SAS Data Files
Simple Index
The most common index is a simple index, which is an index of values for one key
variable. The variable can be numeric or character. When you create a simple index,
SAS assigns to the index the name of the key variable.
The following example shows the DATASETS procedure statements that are used to
create two simple indexes for variables Class and Major in data file College.Survey:
proc datasets library=college;
modify survey;
index create class;
index create major;
run;
To process a WHERE expression using an index, SAS uses only one index. When the
WHERE expression has multiple conditions using multiple key variables, SAS
determines which condition qualifies the smallest subset. For example, suppose that
College.Survey contains the following data:
42,000 observations contain class=12
6,000 observations contain major='Biology'
350 observations contain both class=12 and major='Biology'
With simple indexes on Class and Major, SAS would select Major to process the
following WHERE expression.
where class=12 and major='Biology';
Composite Index
A composite index is an index of two or more key variables with their values
concatenated to form a single value. The variables can be numeric, character, or a
combination. An example is a composite index for the variables LastName and
FirstName. A value for this index consists of the value for LastName immediately
followed by the value for FirstName from the same observation. When you create a
composite index, you must specify a unique index name.
The following example shows the DATASETS procedure statements that are used to
create a composite index for the data file College.MailList, specifying two key variables:
ZipCode and SchoolId.
proc datasets library=college;
modify maillist;
index create zipid=(zipcode schoolid);
run;
Often, only the first variable of a composite index is used. For example, for a composite
index on ZipCode and SchoolId, the following WHERE expression can use the
composite index for the variable ZipCode because it is the first key variable in the
composite index:
where zipcode = 78753;
However, you can take advantage of all key variables in a composite index by how you
construct the WHERE expression, which is referred to as compound optimization.
Compound optimization is the process of optimizing multiple WHERE expression
conditions using a single composite index. If you issue the following WHERE
expression, the composite index is used to find all occurrences where the ZIP code is
78753 and the school identification number is 55. In this way, all of the conditions are
satisfied with a single search of the index:
Understanding SAS Indexes 641
where zipcode = 78753 and schoolid = 55;
When you are deciding whether to create a simple index or a composite index, consider
how you will access the data. If you often access data for a single variable, a simple
index will do. But if you frequently access data for multiple variables, a composite index
could be beneficial.
Unique Values
Often it is important to require that values for a variable be unique, like Social Security
number and employee number. You can declare unique values for a variable by creating
an index for the variable and including the UNIQUE option. A unique index guarantees
that values for one variable or the combination of a composite group of variables remain
unique for every observation in the data file. If an update tries to add a duplicate value to
that variable, the update is rejected.
The following example creates a simple index for the variable IdNum and requires that
all values for IdNum be unique:
proc datasets library=college;
modify student;
index create idnum / unique;
run;
Missing Values
If a variable has a large number of missing values, it might be desirable to keep them
from using space in the index. Therefore, when you create an index, you can include the
NOMISS option to specify that missing values are not maintained by the index.
The following example creates a simple index for the variable Religion and specifies that
the index does not maintain missing values for the variable:
proc datasets library=college;
modify student;
index create religion / nomiss;
run;
In contrast to the UNIQUE option, observations with missing values for the key variable
can be added to the data file, even though the missing values are not added to the index.
SAS does not use an index that was created with the NOMISS option to process a BY
statement or to process a WHERE expression that qualifies observations that contain
missing values. If no missing values are present, SAS considers using the index in
processing the BY statement or WHERE expression.
In the following example, the index Age was created with the NOMISS option and
observations exist that contain missing values for the variable Age. In this case, SAS
does not use the index:
proc print data=mydata.employee;
where age < 35;
run;
Deciding Whether to Create an Index
Costs of an Index
An index exists to improve performance. However, an index conserves some resources
at the expense of others. Therefore, you must consider costs associated with creating,
642 Chapter 26 SAS Data Files
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.119.170