There
are many situations in business statistics where you wish to create
a new variable that is, in effect, a transformation of an existing
variable’s data. Here are some initial examples:
-
Creating an index such as a financial
ratio (such as creating a price/earnings ratio from two columns containing
price and earnings data, respectively).
-
Creating mathematical transformations
of variables, such as a new variable that is the square root or log
of another variable.
-
Using the birthdates of people
to create a new column that, on a consistently updating basis, calculates
their ages.
In addition, you can
change and manipulate existing variables in SAS.
In our main textbook
example, so far we have two major types of such tasks:
-
Creating new variables that reverse the data
of reverse-worded survey questions. Specifically,
Satisfaction04 is a reverse-worded survey item (see Chapter 4 and
Chapter 9 for more on this), which required us to create a new variable
that reverses its data.
-
Creating two new factor variables, which are
the aggregation of multi-item scales. Trust and
satisfaction ultimately needed to be created as factors which are
an average of the individual multi-item scores. (Of course, we can’t
do this step without having assessed internal reliability. Again,
see Chapters 4 and 9).
One of the many things
SAS is brilliant at is data manipulation. You can manipulate data
by using the SAS point-and-click interfaces like SAS Enterprise Guide,
but it is quicker and easier to use code in programs like SAS 9 or
SAS Studio. The DATA step in SAS not only creates new datasets or
edits existing ones, but manipulates data columns or rows.
Figure 6.5 Example of creating new variables in the SAS DATA step shows a sample
SAS data step in which the new dataset is created based on an existing
dataset (specifically, we create a dataset called “Transformed”
in the Work library because no library is specified, and we copy and
paste everything from the Textbook.Data02_Cleaned dataset using the
SET statement).
Then, each subsequent
line creates a new variable:
-
We create a new variable called
“Rev_Satisfaction04” that takes the data from the existing
variable Satisfaction04 and reverses it using the principles discussed
in Chapter 4.
-
We create new variables called
“Trust” and “Satisfaction” that are averages
of some of the individual currently existing multi-item scale columns.
Note the way the average works. Also, note here that I have only averaged
the values for Satisfaction01-Satisfaction03; see Chapter 9 a little
later for why.
-
We create two new mathematical
transformations of the Sales variable, one the natural log and one
for the square (each Sales number to the power of two).
-
We create several conditional variables
using the IF-THEN concept, where the new variable only takes on a
certain value if a given condition is true. In the first of these,
we create a new variable called “Premium” that will
have the value 1 whenever the currently existing License variable
contains the value “Premium” in a row, and takes the
value 0 for all rows where License is not “Premium.”
Take note of the following
programming notes about this sort of programming:
-
Take another look at the IF-THEN
statements in
Figure 6.5 Example of creating new variables in the SAS DATA step.
Note here that this is the only situation in which capitalization
counts in SAS. Take the example of the
if License
= “Premium” section of the code.
Here, we are asking SAS to go look in the dataset for all rows where
this exact condition is true
including the exact
capitalization of “Premium,” and
then apply the result only in those rows. If there are also entries
in the License column spelled “premium” then the above
condition will not identify these rows. So, be careful of capitalization
in these situations only.
-
As always, note that all statements
are separated by semicolons and the entire set ends with a “Run;”
statement.
You could do so much
more. For instance, you could create a new variable that is the sum
of other variables (replace MEAN in the above code with SUM). You
can identify rows to delete based on certain rules. SAS has an almost
endless set of possible variable manipulations – see the SAS
helpfiles (notably SAS/STAT 13.2 User’s Guide) for more.
Once you have told SAS
what you want to do, submit the code using the Run button as seen
above. Once you have done so, always check the log for errors and
always open the new dataset to check that it is right. (And then close
it: an open dataset in SAS cannot be replaced).
You can see the code
from this section in the textbook resources files, under “Code06
Manipulating data example.”