Modifying Character Values with Functions

SCAN Function

Overview of the SCAN Function

The SCAN function returns the nth word from a character string. The SCAN function enables you to separate a character value into words and to return a specified word.
Figure 14.16 SCAN Function
Scan Function
Note: In SAS 9.4 or later, in a DATA step, the SCAN function returns a value to a variable that has not yet been given a length, that variable is given the length of the first argument. In SAS 9.3 or earlier, by default, the variable is assigned a length of 200.
Syntax, SCAN function:
SCAN(argument,n<,<delimiters>>)
  • argument specifies the character variable or expression to scan.
  • n specifies which word to return.
  • delimiters are special characters that must be enclosed in single quotation marks (' '). If you do not specify delimiters, default delimiters are used.

Details

  • Leading delimiters before the first word in the character string do not affect the SCAN function.
  • If there are two or more contiguous delimiters, the SCAN function treats them as one.
  • If n is greater than the number of words in the character string, the SCAN function returns a blank value.
  • If n is negative, the SCAN function selects the word in the character string starting from the end of the string.

Example: Create New Name Variables

Use the SCAN function to create your new name variables for Hrd.Temp. First, examine the values of the existing Name variable to determine which characters separate the names in the values.
LastName=scan(name,1);
Notice that blanks and commas appear between the names and that the employee's last name appears first, then the first name, and then the middle name.
To create the LastName variable to store the employee's last name, you write an assignment statement that contains the following SCAN function:
LastName=scan(name,1,' ,');
Note that a blank and a comma are specified as delimiters. You can also write the function without listing delimiters, because the blank and comma are default delimiters.
LastName=scan(name,1);
Here is the complete DATA step that is needed to create LastName, FirstName, and MiddleName. Notice that the original Name variable is dropped from the new data set.
data hrd.newtemp(drop=name); 
   set hrd.temp; 
   LastName=scan(name,1); 
   FirstName=scan(name,2); 
   MiddleName=scan(name,3); 
run;

Specifying Delimiters

The SCAN function uses delimiters, which are characters that are specified as word separators, to separate a character string into words. For example, if you are working with the character string below and you specify the comma as a delimiter, the SCAN function separates the string into three words.
Specifying Delimiters
Then the function returns the word that you specify. In this example, if you specify the third word, the SCAN function returns the word HIGH.
Here is another example that uses the comma as a delimiter, and specifies that the third word be returned.
209 RADCLIFFE ROAD, CENTER CITY, NY, 92716
In this example, if you specify the third word, the word returned by the SCAN function is NY ( NY contains a leading blank).

Specifying Multiple Delimiters

When using the SCAN function, you can specify as many delimiters as needed to correctly separate the character expression. When you specify multiple delimiters, SAS uses any of the delimiters, singly or in any combination, as word separators. For example, if you specify both the slash and the hyphen as delimiters, the SCAN function separates the following text string into three words:
Specifying Multiple Delimiters
The SCAN function treats two or more contiguous delimiters, such as the parenthesis and slash below, as one delimiter. Also, leading delimiters have no effect.
Specifying Multiple Delimiters

Default Delimiters

If you do not specify delimiters when using the SCAN function, default delimiters are used. Here are the default delimiters:
blank . < ( + | & ! $ * ) ; ^ - / , %

Specifying Variable Length

If a variable is not assigned a length before it is specified in the SCAN function, the variable is given the length of the first argument. This length could be too small or too large for the remaining variables.
You can add a LENGTH statement to the DATA step, and specify an appropriate length for all three variables. Because SAS sets the length of a new character variable the first time it is encountered in the DATA step, the LENGTH statement is placed before the assignment statements that contain the SCAN function.
data hrd.newtemp(drop=name); 
   set hrd.temp; 
   length LastName FirstName MiddleName $ 10; 
   lastname=scan(name,1); 
   firstname=scan(name,2); 
   middlename=scan(name,3); 
run;

SUBSTR Function

Overview of the SUBSTR Function

The SUBSTR function extracts a substring from an argument, starting at a specific position in the string.
Figure 14.17 SUBSTR Function
SUBSTR Function
The SUBSTR function can be used on either the right or left of the equal sign to replace character value constants.
Syntax, SUBSTR function:
SUBSTR(argument, position <,n>)
  • argument specifies the character variable or expression to scan.
  • position is the character position to start from.
  • n specifies the number of characters to extract. If n is omitted, all remaining characters are included in the substring.

Example: SUBSTR Function

This example begins with the task of extracting a portion of a value. In the data set Hrd.Newtemp, the names of temporary employees are stored in three name variables: LastName, FirstName, and MiddleName.
SAS Data Set Hrd.Newtemp
However, suppose you want to modify the data set to store only the middle initial instead of the full middle name. To do so, you must extract the first letter of the middle name values and assign these values to the new variable MiddleInitial.
SAS Data Set Work.Newtemp
Using the SUBSTR function, you can extract the first letter of the MiddleName value to create the new variable MiddleInitial.
SAS Data Set Hrd.Newtemp
You write the SUBSTR function as the following:
substr(middlename,1,1)
This function extracts a character string from the value of MiddleName. The string to be extracted begins in position 1 and contains one character. This function is placed in an assignment statement in the DATA step.
data work.newtemp(drop=middlename); 
   set hrd.newtemp; 
   length MiddleInitial $ 1;
   MiddleInitial=substr(middlename,1,1); 
run;
The new MiddleInitial variable is given the same length as MiddleName. The MiddleName variable is then dropped from the new data set.
SAS Data Set Work.Newtemp
You can use the SUBSTR function to extract a substring from any character value if you know the position of the value.

Replacing Text Using SUBSTR

There is a second use for the SUBSTR function. This function can also be used to replace the contents of a character variable. For example, suppose the local phone exchange 622 was replaced by the exchange 433. You need to update the character variable Phone in Hrd.Temp to reflect this change.
SAS Data Set Hrd.Newtemp
You can use the SUBSTR function to complete this modification. The syntax of the SUBSTR function, when used to replace a variable's values, is identical to the syntax for extracting a substring.
SUBSTR(argument,position,n)
However, in this case, note the following:
  • The first argument specifies the character variable whose values are to be modified.
  • The second argument specifies the position at which the replacement is to begin.
  • The third argument specifies the number of characters to replace. If n is omitted, all remaining characters are replaced.

Positioning the SUBSTR Function

SAS uses the SUBSTR function to extract a substring or to modify a variable's values, depending on the position of the function in the assignment statement.
When the function is on the right side of an assignment statement, the function returns the requested string.
MiddleInitial=substr(middlename,1,1);
But if you place the SUBSTR function on the left side of an assignment statement, the function is used to modify variable values.
substr(region,1,3)='NNW';
When the SUBSTR function modifies variable values, the right side of the assignment statement must specify the value to place into the variable. For example, to replace the fourth and fifth characters of a variable named Test with the value 92, you write the following assignment statement:
substr(test,4,2)='92'; 

Test             Test

S7381K2S7392K2
S7381K7S7392K7
It is possible to use the SUBSTR function to replace the 622 exchange in the variable Phone. First, write this assignment statement:
data hrd.temp2; 
   set hrd.temp; 
   substr(phone,1,3)='433'; 
run;
This statement specifies that the new exchange 433 should be placed in the variable Phone, starting at character position 1 and replacing three characters.
However, executing this DATA step places the value 433 into all values of Phone. You need to replace only the values of Phone that contain the 622 exchange. To extract the exchange from Phone, add an assignment statement to the DATA step. Notice that the SUBSTR function is used on the right side of the assignment statement.
data hrd.temp2(drop=exchange); 
   set hrd.temp; 
   Exchange=substr(phone,1,3); 
   substr(phone,1,3)='433'; 
run;
Now the DATA step needs an IF-THEN statement to verify the value of the variable Exchange. If the exchange is 622, the assignment statement executes to replace the value of Phone.
data hrd.temp2(drop=exchange); 
   set hrd.temp; 
   Exchange=substr(phone,1,3); 
   if exchange='622' then substr(phone,1,3)='433'; 
run;
After the DATA step is executed, the appropriate values of Phone contain the new exchange.
SAS Data Set Hrd.Temp2
Recall that when the SUBSTR function is on the right side of an assignment statement, the function extracts a substring.
MiddleInitial=substr(middlename,1,1);
When the SUBSTR function is on the left side of an assignment statement, the function replaces the contents of a character variable.
 substr(region,1,3)='NNW';

SCAN versus SUBSTR Functions

The SUBSTR function is similar to the SCAN function. Here is a brief comparison. Both the SCAN and SUBSTR functions can extract a substring from a character value:
  • SCAN extracts words within a value that is marked by delimiters.
  • SUBSTR extracts a portion of a value by starting at a specified location.
The SUBSTR function is best used when you know the exact position of the string that you want to extract from the character value. It is unnecessary to mark the string by delimiters. For example, the first two characters of the variable ID identify the class level of college students. The position of these characters does not vary within the values of ID.
The SUBSTR function is the best choice to extract class level information from ID. By contrast, the SCAN function is best used during the following actions:
  • You know the order of the words in the character value.
  • The starting position of the words varies.
  • The words are marked by some delimiter.

LEFT and RIGHT Functions

Overview of the LEFT and RIGHT Functions

  • The LEFT function left-aligns a character expression.
    LEFT returns an argument with leading blanks moved to the end of the value.
  • The RIGHT function right-aligns a character expression.
    RIGHT returns an argument with trailing blanks moved to the start of the value.
Syntax, LEFT and RIGHT function:
LEFT(argument)
RIGHT(argument)
argument specifies a character constant, variable, or expression.

Example: LEFT Function

The following SAS statements produce these results:
SAS Statement
Result
----+----1----+
a='  
DUE DATE';
b=left(a);
put b;
 
DUE DATE

Example: RIGHT Function

The following SAS statements produce these results:
SAS Statement
Result
----+----1----+
a='Due Date  ';
b=right(a);
put a $10.;
put b $10.;
 
Due Date
  Due Date

Concatenation Operator

The concatenation operator concatenates character values. The operator can be expressed as || (two vertical bars), ¦ ¦ (two broken vertical bars), or !!( two exclamation points).
FullName = First || Middle || Last;
The length of the resulting variable is the sum of the lengths of each variable or constant in the concatenation operation, unless you use a LENGTH statement to specify a different length for the new variable.
The concatenation operator does not trim leading or trailing blanks. If variables are padded with trailing blanks, use the TRIM function to trim trailing blanks from values before concatenating them.

TRIM Function

Overview of the TRIM Function

The TRIM function removes trailing blanks from character expressions and returns one blank if the expression contains missing values.
FullName = trim(First) || trim(Middle) || Last;
The TRIM function is useful for concatenating because the concatenation operator does not remove trailing blanks.
If the TRIM function returns a value to a variable that was not yet assigned a length, by default, the variable length is determined by the length of the argument.
Syntax, TRIM function:
TRIM(argument)
argument can be any character expression. Here are examples:
  • a character variable: trim(address)
  • another character function: trim(left(id))

Example: TRIM Function

data work.nametrim;
   length Name $ 20 First Middle Last $ 10;
   Name= 'Jones, Mary Ann, Sue';
   First = left(scan(Name, 2, ','));
   Middle = left(scan(Name, 3, ','));
   Last = scan(name, 1, ',');
   FullName = trim(First) || trim(Middle) ||Last;
   drop Name;
run;

proc print data=work.nametrim;
run;
Figure 14.18 TRIM Function
TRIM Function

CATX Function

Overview of the CATX Function

The CATX function enables you to concatenate character strings, remove leading and trailing blanks, and insert separators. The CATX function returns a value to a variable, or returns a value to a temporary buffer. The results of the CATX function are usually equivalent to those that are produced by a combination of the concatenation operator and the TRIM and LEFT functions.
In the DATA step, if the CATX function returns a value to a variable that has not previously been assigned a length, then the variable is given the length of 32,767. To save storage space, you can add a LENGTH statement to your DATA step, and specify an appropriate length for your variable. Because SAS sets the length of a new character variable the first time it is encountered in the DATA step, be sure to place the LENGTH statement before the assignment statements that contain the CATX function.
If the concatenation operator (||) returns a value to a variable that has not previously been assigned a length, then that variable is given a length that is the sum of the lengths of the values that are being concatenated.
Remember that you learned to use the TRIM function along with the concatenation operator to create one address variable that contains the values of the three variables Address, City, and Zip, and to remove extra blanks from the new values. You used the DATA step shown below.
data hrd.newtemp(drop=address city state zip); 
   set hrd.temp; 
   NewAddress=trim(address)||', '||trim(city)||', '||zip; 
run;
You can accomplish the same concatenation using only the CATX function.
Syntax, CATX function:
CATX(separator,string-1 <,...string-n>)
  • separator specifies the character string that is used as a separator between concatenated strings
  • string specifies a SAS character string.

Example: Create New Variable Using CATX Function

You want to create the new variable NewAddress by concatenating the values of the Address, City, and Zip variables from the data set Hrd.Temp. You want to strip excess blanks from the old variables' values and separate the variable values with a comma and a space. The DATA step below uses the CATX function to create NewAddress.
data hrd.newtemp(drop=address city state zip); 
   set hrd.temp; 
   NewAddress=catx(', ',address,city,zip); 
run;
The revised DATA step creates the values that you would expect for NewAddress.
SAS Data Set Hrd.Newtemp

INDEX Function

Overview of the INDEX Function

The INDEX function enables you to search a character value for a specified string. The INDEX function searches values from left to right, looking for the first occurrence of the string. It returns the position of the string's first character. If the string is not found, it returns a value of 0.
Syntax, INDEX function:
INDEX(source,excerpt)
  • source specifies the character variable or expression to search.
  • excerpt specifies a character string that is enclosed in quotation marks ('').

Example: Search for Occurrences of a Phrase

Suppose you want to search the values of the variable Job, which lists job skills. You want to create a data set that contains the names of all temporary employees who have word processing experience. The following figure shows a partial list of observations in the Hrd.Temp data set.
To search for the occurrences of the phrase “word processing” in the values of the variable Job, you write the INDEX function as shown below. Note that the character string is enclosed in quotation marks.
index(job,'word processing')
To create the new data set, include the INDEX function in a subsetting IF statement. Only those observations in which the function locates the string and returns a value greater than 0 are written to the data set.
data hrd.datapool; 
   set hrd.temp; 
   where index(job,'word processing') > 0; 
run;
Here is the data set that shows the temporary employees who have word processing experience. The program processed all of the observations in the Hrd.Temp data set.
Figure 14.19 SAS Data Set Hrd.Datapool (partial output)
SAS Data Set Hrd.Datapool
Note that the INDEX function is case sensitive, so the character string that you search for must be specified exactly as it is recorded in the data set. For example, the INDEX function shown below would not locate any employees who have word-processing experience.
index(job,'WORD PROCESSING')

Finding a String Regardless of Case

To ensure that all occurrences of a character string are found, you can use the UPCASE or LOWCASE function with the INDEX function. The UPCASE and LOWCASE functions enable you to convert variable values to uppercase or lowercase letters. You can then specify the character string in the INDEX function accordingly.
index(upcase(job),'WORD PROCESSING') 

index(lowcase(job),'word processing')

FIND Function

Overview of the FIND Function

The FIND function enables you to search for a specific substring of characters within a specified character string.
  • The FIND function searches the string, from left to right, for the first occurrence of the substring, and returns the position in the string of the substring’s first character.
  • If the substring is not found in the string, the FIND function returns a value of 0.
  • If there are multiple occurrences of the substring, the FIND function returns only the position of the first occurrence.
Syntax, FIND function:
FIND(string,substring<,modifiers><,startpos> )
  • string specifies a character constant, variable, or expression that is searched for substrings.
  • substring is a character constant, variable, or expression that specifies the substring of characters to search for in string.
  • modifiers is a character constant, variable, or expression that specifies one or more modifiers.
  • startpos is an integer that specifies the position at which the search should start and the direction of the search. The default value for startpos is 1.
Note: If string or substring is a character literal, you must enclose it in quotation marks.

Details

The modifiers argument enables you to specify one or more modifiers for the function, as listed below.
  • The modifier i causes the FIND function to ignore character case during the search. If this modifier is not specified, FIND searches for character substrings with the same case as the characters in substring.
  • The modifier t trims trailing blanks from string and substring.
Here are several facts about modifiers and constants.
  • If the modifier is a constant, enclose it in quotation marks.
  • Specify multiple constants in a single set of quotation marks.
  • Modifier values are not case sensitive.
If startpos is not specified, FIND starts the search at the beginning of the string and searches the string from left to right. If startpos is specified, the absolute value of startpos determines the position at which to start the search. The sign of startpos determines the direction of the search. That is, when startpos is positive, FIND searches from startpos to the right; when startpos is negative, FIND searches from startpos to the left.

Example: Find Word Processing Jobs in a Data Set

The values of the variable Job are all lowercase. Therefore, to search for the occurrence of word processing in the values of the variable Job, you write the FIND function as shown below. Note that the character substring is enclosed in quotation marks.
find(job,'word processing')
To create the new data set, include the FIND function in a subsetting IF statement. Only those observations in which the function locates the string and returns a value greater than 0 are written to the data set.
data hrd.datapool; 
   set hrd.temp; 
   where find(job,'word processing') > 0; 
run;
proc print data=hrd.datapool;
run;
Figure 14.20 HTML Output: FIND function (partial output)
The FIND function is embedded in an IF statement and finds those values in which the string has a value of greater than 0.

UPCASE Function

The UPCASE function converts all letters in a character expression to uppercase.
Syntax, UPCASE function:
UPCASE(argument)
argument can be any SAS character expression, such as a character variable or constant.
In this example, the function is placed in an assignment statement in a DATA step. You can change the values of the variable Job in place.
data hrd.newtemp; 
   set hrd.temp; 
   Job=upcase(job); 
run;
proc print data=hrd.newtemp;
run;
The new data set contains the converted values of Job.
Figure 14.21 HTML Output: UPCASE Function (partial output)
The UPCASE function converted all titles from the Job variable to uppercase.

LOWCASE Function

The LOWCASE function converts all letters in a character expression to lowercase.
Syntax, LOWCASE function:
LOWCASE(argument)
argument can be any SAS character expression, such as a character variable or constant.
In this example, the function converts the values of the variable Contact to lowercase letters.
data hrd.newtemp; 
   set hrd.temp; 
   Contact=lowcase(contact); 
run;
proc print data=hrd.newtemp;
run;
The new data set is created.
Figure 14.22 HTML Output: LOWCASE Function (partial output)
The assignment statement in this DATA step uses the LOWCASE function to convert the values of the variable Contact to lowercase.

PROPCASE Function

The PROPCASE function converts all words in an argument to proper case (so that the first letter in each word is capitalized).
Syntax, PROPCASE function:
PROPCASE(argument<,delimiter(s)>)
  • argument can be any SAS expression, such as a character variable or constant.
  • delimiter(s) specifies one or more delimiters that are enclosed in quotation marks. The default delimiters are blank, forward slash, hyphen, open parenthesis, period, and tab.
Note: If you specify delimiter(s), then the default delimiters are no longer in effect.
  • The PROPCASE function first converts all letters to lowercase letters and then converts the first character of words to uppercase.
  • The first character of a word is the first letter of a string or any letter preceded by a default list of delimiters.
    Default delimiter List: blank / — ( . tab
Tip
Delimiters can be specified as a second argument, instead of using the default list.
In this example, the function converts the values of the variable named Contact to proper case and uses the default delimiters.
data hrd.newtemp; 
   set hrd.temp;
   Contact=propcase(contact); 
run;
proc print data=hrd.newtemp;
run;
After the DATA step executes, the new data set is created.
Figure 14.23 HTML Output: PROPCASE Function
PROPCASE function converts the values in the variable Contact to proper case and uses the default delimiters.

TRANWRD Function

Overview of the TRANWRD Function

The TRANWRD function replaces or removes all occurrences of a word in a character string. The translated characters can be located anywhere in the string.
Figure 14.24 TRANWRD Function
TRANWRD function
Syntax, TRANWRD function:
TRANWRD(source,target,replacement)
  • source specifies the source string that you want to translate.
  • target specifies the string that SAS searches for in source.
  • replacement specifies the string that replaces target.
Note: target and replacement can be specified as variables or as character strings. If you specify character strings, be sure to enclose the strings in quotation marks (' ' or " ").
In a DATA step, if the TRANWRD function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes. To save storage space, you can add a LENGTH statement to the DATA step and specify an appropriate length for the variable. SAS sets the length of a new character variable the first time it is encountered in the DATA step. Be sure to place the LENGTH statement before the assignment statements that contain the TRANWRD function.

Example: Update Variables in Place Using TRANWRD Function

You can use TRANWRD function to update variables in place. In this example, the function updates the values of Name by changing every occurrence of the string Monroe to Manson.
name=tranwrd(name,'Monroe','Manson')
Another example of the TRANWRD function is shown below. In this case, two assignment statements use the TRANWRD function to change all occurrences of Miss or Mrs. to Ms.
data hrd.after; 
   set hrd.before; 
   name=tranwrd(name,'Miss','Ms.'); 
   name=tranwrd(name,'Mrs.','Ms.'); 
run;
proc print data=work.after;
run;
The new data set is created. The TRANWRD function changes all occurrences of Miss or Mrs. to Ms.
Figure 14.25 HTML Output: TRANWRD Function
TRANWRD function

COMPBL Function

The COMPBL function removes multiple blanks from a character string by translating each occurrence of two or more consecutive blanks into a single blank.
Syntax, COMPBL function:
COMPBL(source)
  • source specifies a character constant, variable, or expression to compress.
In a DATA step, if the COMPBL function returns a value to a variable that has not previously been assigned a length, then the length of that variable defaults to the length of the first argument.
The following SAS statements produce these results:
SAS Statement
Result
----+----1----+-—-2--
string='Hey
 Diddle  Diddle';
string=compbl(string);
put string;
Hey Diddle Diddle
string='125    E Main St';
length address $10;
address=compbl(string);
put address;
125 E Main

COMPRESS Function

Overview of the COMPRESS Function

The COMPRESS function returns a character string with specified characters removed from the original string. Null arguments are allowed and treated as a string with a length of zero.
The COMPRESS function returns a character string with specified characters removed from the original string.
Syntax, COMPRESS function:
COMPRESS(source<, characters> <, modifier(s)>)
  • source specifies a character constant, variable, or expression from which specified characters are removed.
  • characters specifies a character constant, variable, or expression that initializes a list of characters.
    By default, the characters in this list are removed from the source argument. If you specify the K modifier in the third argument, then only the characters in this list are kept in the result.
    Note: You can add more characters to this list by using other modifiers in the third argument.
    Note: Enclose a literal string of characters in quotation marks.
  • modifier specifies a character constant, variable, or expression in which each non-blank character modifies the action of the COMPRESS function. Blanks are ignored.
Based on the number of arguments, the COMPRESS functions works as follows:
Number of Arguments
Result
only the first argument, source
All blanks have been removed from the argument. If the argument is completely blank, then the result is a string with a length of zero. If you assign the result to a character variable with a fixed length, then the value of that variable will be padded with blanks to fill its defined length.
the first two arguments, source and chars
All characters that appear in the second argument are removed from the result.
three arguments, source, chars, and modifier(s)
The K modifier (specified in the third argument) determines whether the characters in the second argument are kept or removed from the result.
The COMPRESS function compiles a list of characters to keep or remove, comprising the characters in the second argument plus any types of characters that are specified by the modifiers. For example, the D modifier specifies digits. Both of the following function calls remove digits from the result:
compress(source, "1234567890");
compress(source, , "d");
To remove digits and plus or minus signs, you can use either of the following function calls:
compress(source, "1234567890+-");
compress(source, "+-", "d");

Example: Compress a Character String

data _null_;
   a='A B C D';
   b=compress(a);
   put b=;
run;
The following SAS statements produce this result:
Table 14.12 Compressing Blanks
SAS Statement
Result
----+----1
a='A B C D ';
b=compress(a);
put b;
 
ABCD
Last updated: January 10, 2018
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.188.201