Reading Microsoft Excel Data

Decide How to Reference Your Data

Use the DATA step to create a SAS data set whether the input data source is a SAS data set, a raw data file, or a file from another application. The difference between reading these various types of input is in how you reference the data. You can use one of the following methods to read data:
  • SAS/ACCESS LIBNAME statement
  • PROC IMPORT
The Base SAS LIBNAME statement associates a SAS name (libref) with a SAS data library by pointing to its physical location. But, the SAS/ACCESS LIBNAME statement associates a SAS name with an Excel workbook file by pointing to its location.
In doing so, the Excel workbook becomes a new library in SAS, and the worksheets in the workbook become the individual SAS data sets in that library.
The figure below illustrates the difference between how the two LIBNAME statements treat the data.
Figure 6.15 Comparing LIBNAME Statements
Comparing LIBNAME Statements
The next figure shows how the DATA step is used with three types of input data.
Figure 6.16 Using the DATA Step with Different Types of Output
Using the DATA step with Different Types of Output
Notice how the INFILE and INPUT statements are used in the DATA step for reading raw data, but the SET statement is used in the DATA step for reading in the Excel worksheets.

Running SAS with Microsoft Excel

  • You must have licensed SAS/ACCESS Interface to PC Files to use a SAS/ACCESS LIBNAME statement that references an Excel workbook.
  • If you are running SAS 9.1 or earlier and want to read in Microsoft Excel data, you must use Microsoft Excel 2003 or earlier.
  • To read Microsoft Excel 2007 or later data, you must be running SAS 9.2 or later.
  • The examples in this section are based on SAS 9.4 running with Microsoft Excel 2016.

Steps for Reading Excel Data

To read the Excel workbook file, SAS must receive the following information in the DATA step:
  • a libref to reference the Excel workbook to be read
  • the name and location (using another libref) of the new SAS data set
  • the name of the Excel worksheet that is to be read
The table below outlines the basic statements that are used in a program that reads Excel data and creates a SAS data set from an Excel worksheet. The PROC CONTENTS and PROC PRINT statements are not requirements for reading in Excel data and creating a SAS data set. However, these statements are useful for confirming that your Excel data has successfully been read into SAS.
Table 6.6 Basic Steps for Reading Excel Data into a SAS Data Set
Task
Statement
Example
Reference an Excel workbook file
SAS/ACCESS LIBNAME statement
LIBNAME results
'c:certdataexercise.xlsx';
Write out the contents of the SAS Library
PROC CONTENTS
proc contents data=results._all_;
Execute the PROC CONTENTS statement
RUN statement
run;
Name and create a new SAS data set
DATA statement
data work.stress;
Read in an Excel worksheet (as the input data for the new SAS data set)
SET statement
set results.'ActLevel$'n;
Execute the DATA step
RUN statement
run;
View the contents of a particular data set
PROC PRINT
proc print data=stress;
Execute the PROC PRINT statement
RUN statement
run;

The SAS/ACCESS LIBNAME Statement

To extend the SAS global LIBNAME statement and assign a libref to a relational DBMS, use the SAS/ACCESS LIBNAME statement. LIBNAME associates a SAS libref with a DBMS database, schema, server, or a group of tables and views.
Syntax, SAS/ACCESS LIBNAME statement:
LIBNAME <libref>XLSX <'physical-path-and-filename.xls'><options>;
  • libref is a name that you associate with an Excel workbook.
  • XLSX is the name of the SAS/ACCESS engine.
  • 'physical-path-and-filename.xls' is the physical location of the Excel workbook.
Example:
libname results XLSX 'c:certdataexercise.xlsx';
Note: The EXCEL engine (also called the XLSX engine) requires single quotation marks for physical-path-and-filename.xls.

Referencing an Excel Workbook

Overview

This example uses data similar to the scenario used for the raw data in the previous section. The data shows the readings from exercise stress tests that have been performed on patients at a health clinic.
The stress test data is located in an Excel workbook named exercise.xlsx (shown below), which is stored in the location c:users.
Figure 6.17 Excel Workbook
Excel Workbook
In the sample worksheet above, the date column is defined in Excel as dates. If you right-click the cells and select Format Cells, the cells have a category of Date. SAS reads this data just as it is stored in Excel. If the date had been stored as text in Excel, then SAS would have read it as a character string.
To read in this workbook, create a libref to point to the workbook's location:
libname results XLSX 'c:certdataexercise.xlsx';
The LIBNAME statement creates the libref Results, which points to the Excel workbook exercise.xlsx. The workbook contains two worksheets, Tests and Adv, which are now available in the new SAS library (Results) as data sets.

Referencing an Excel Workbook in the DATA Step

SET Statement

Use the SET statement to indicate which worksheet in the Excel file you want to read.
data work.stress;
   set results.'ActivityLevels'n;
run;
In this example, the DATA statement tells SAS to name the new data set, Stress, and store it in the temporary library Work. The SET statement in the DATA step specifies the libref (the reference to the Excel file) and the worksheet name as the input data.
You can use several statements in the DATA step to subset your data as needed. Here, the WHERE statement is used with a variable to include only those participants whose activity level is HIGH.
data work.stress;
   set results.'ActivityLevels'n;
   where ActLevel='HIGH';
run;
The figure below shows the partial output for this DATA step in table format.
Figure 6.18 DATA Step Output (partial output)
DATA Step Output

Name Literals

The LIBNAME statement created a permanent library, Results, which is the libref for the workbook file and its location. The new library contains two SAS data sets, which access the data from the Excel worksheets.
The Excel worksheet names have the special character ($) at the end. All Excel worksheets are designated this way. But remember, special characters such as these are not allowed in SAS data set names by default. So, in order for SAS to allow this character to be included in the data set name, you must assign a name literal to the data set name. A SAS name literal is a name token that is expressed as a string within quotation marks, followed by the uppercase or lowercase letter n. The name literal tells SAS to allow the special character ($) in the data set name.
Figure 6.19 Name Literal
Name Literal

Named Ranges

A named range is a range of cells within a worksheet that you define in Excel and assign a name to. In the example below, the worksheet contains a named range, tests_week_1, which SAS recognizes as a data set.
The named range, tests_week_1, and its parent worksheet, tests, appear in the SAS windowing environment as separate data sets, except that the data set that was created from the named range has no dollar sign ($) appended to its name.
For more information about named ranges, see your Microsoft Excel documentation.
Figure 6.20 Named Range
Named Range
Tip
Named ranges are available as data sets even though they do not appear in PROC CONTENTS.

Printing an Excel Worksheet as a SAS Data Set

After using the DATA step to read in the Excel data and create a SAS data set, you can use PROC PRINT to produce a report that displays the data set values.
You can also use the PRINT procedure to refer to a specific worksheet. Remember to use the name literal when referring to a specific Excel worksheet. In the example below, the first PRINT statement displays the data values for the new data set that was created in the DATA step. The second PRINT statement displays the contents of the Excel worksheet that was referenced by the LIBNAME statement.
proc print data=work.stress;
run;
proc print data=results.ActivityLevels'n; 
run;

Using PROC CONTENTS

In the example below, PROC CONTENTS displays a description of the SAS data set. The example also includes a LABEL= data set option that gives a label for the entire data set while the LABEL statement assigns labels to individual labels. The INFORMAT and FORMAT statements associate informats or formats with variables. When an INFORMAT or FORMAT statement is specified in a DATA step, the name of that statement can be used in PROC steps to customize your report.
data funnies (LABEL= 'Comics Character Data');
	input Id Name $ Height Weight DoB MMDDYY8. @@;
	Label Id = 'Identification no.';
		  Height = 'Height in inches';
		  Weight = 'Weight in pounds';
		  DoB = 'Date of Birth';
		INFORMAT DoB MMDDYY8.;
		FORMAT DoB WORDDATE18.;
Datalines;
53 Susie 42 41 07-11-93 	
54 Charlie 46 55 10-26-90
55 Calvin 40 35 01-10-91	
56 Lucy 46 52 01-13-95
;
proc contents data=funnies;
run;
Tip
Use _ALL_ keyword to produce information about a data library and its contents.
Figure 6.21 CONTENTS Procedure Output
CONTENTS Procedure Output
Note: This example demonstrates reading data that is not aligned in columns. To learn about reading data that is not aligned, see Reading and Creating Raw Data in Free-Format.

Disassociating a Libref

If SAS has assigned a libref to an Excel workbook, the workbook cannot be opened in Excel. To disassociate a libref, use a LIBNAME statement, specify the libref, and use the CLEAR option.
libname results XLSX "c:usersexercise.xlsx";
   proc print data=results.'tests'n;
run;

libname results clear;
SAS disconnects from the data source and closes any resources that are associated with that libref’s connection.
Last updated: January 10, 2018
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.225.235.144