5

Importing Data into SAS

Introduction

Now that you have learned how to perform operations on built-in SASHELP data sets, it’s time to see how to import your own data into a SAS library.

SAS data sets contain two parts: the first part is called the data descriptor, also known as metadata. Metadata is a fancy word for data about your data. In the case of a SAS data set, this portion of the data set contains such information as the number of rows and columns in the table, the column names, the data type for each column (SAS has only two data types—character and numeric), and other information about when the data set was created.

The second part of a SAS data set contains the actual data values. If you tried to examine a SAS data set using another program such as Word or Notebook, it would show up as nonsense. Only SAS can read, write, and analyze data in a SAS data set. If you have data in Excel workbooks or text files, you need to convert that data into a SAS data set before you can use SAS to modify or analyze the data.

In this chapter, you will see how easy it is to import your own data from Excel workbooks, CSV files, and many other file formats such as Microsoft Access and SPSS, and create SAS data sets.

Uploading Data from Your Local Computer to SAS Studio

Before you start uploading data files and programs to the cloud, you need to create a folder to hold the files.  This is similar to folders on a Windows platform. If you open up the Server Files and Folders tab in the navigation pane, you will see the following.

Figure 5.1: Server Files and Folders Tab

Next, click the New icon as shown circled in Figure 5.2.

Figure 5.2: Click the New Icon

From the pull-down list, select Folder (Figure 5.3).

Figure 5.3: Select Folder in the Pull-Down List

You can now name your new folder.  In Figure 5.4, the new folder is called Statistics Data.

Figure 5.4: Naming Your Folder

Click Save to finish creating your folder.  Your folder now shows up in the list of folders as shown in Figure 5.5.

Figure 5.5: Your New Folder is Now Included in the List of Files

Notice that there is a small triangle to the left of the folder name.  You can expand a list by clicking this triangle.  When you expand the contents of a folder, the triangle points downward and you see the list of files.  This action is a “toggle,” that is, it will either expand or collapse the lists.

For this example, we are going to upload an Excel workbook called Grades.xlsx to the Statistics Data location.  In Figure 5.6, you see a copy of the worksheet that contains student names, ID numbers, some Quiz and exam grades.  The first row of the worksheet contains variable names (also known as column names).  The remaining rows contain data on three students (yes, it was a very small class).  The worksheet name was not changed, so it has the default name Sheet1.

Figure 5.6: Contents of the Spreadsheet Grades.xlsx

In order to permit SAS Studio to access data files stored on your hard drive, you first need to upload the files to SAS Studio.  As shown in Figure 5.7, you click the Upload icon.

Figure 5.7: Click the Upload Icon to Upload One or More Files

This brings up a screen where you can choose which files you want to upload to SAS Studio.

Figure 5.8: Choose Files Anywhere on Your Hard Drive

Click the Choose Files box and then you can choose a single file or multiple files.  There are two ways to choose multiple files, using methods familiar to Windows users.  One way is to hold down the control key and click each file that you want to upload.  The other is to click one file, hold down the Shift key, and then click another file.  All files from the first to the last will be selected.  In Figure 5.9 this latter method was used.

Figure 5.9: Choose Your File(s)

For this example, we are uploading all the files that will be used in examples in this book.  Notice the Excel workbook Grades.xlsx is included in this group of files.  Once you have selected the files that you want to upload, click the box labeled Upload.

Although the maximum file space on the SAS Studio cloud is 5 megabytes, you cannot upload more than 1 megabyte at a time.

Figure 5.10: Getting Ready to Upload Selected Files

In Figure 5.11, you see all the selected files in the Statistics Data folder. Notice that the triangle to the left of Statistics Data points downward because you clicked it to reveal the individual files.

Figure 5.11: Files Uploaded to Statistics Data

You can now use SAS Studio to analyze any of these files or write a SAS program of your own using one or more of these files.

Before you can work with any data such as an Excel workbook or a CSV file, you need to convert it into a SAS data set.  To accomplish this, you select the Import task that can be found under Tasks and Utilities task.  (See Figure 5.12 below.)

Figure 5.12: Selecting the Tasks and Utilities Task

Double click Import Data to begin converting the Excel workbook Grades.xlsx into a SAS data set.

Figure 5.13: Selecting a File to Import

You have two ways to select which file you want to import.  One is to click the Select File box on the right side of the screen—the other method is to click the Server Files and Folders tab in the navigation pane (on the left), find the file, and drag it to the drag and drop area.

Clicking Select File brings up a window where you can select a file to import.  Here it is.

Figure 5.14: Selecting the Grades Workbook

Click Grades.xlsx and then click Open.  This brings up a split window showing options for the import on the top half and the SAS program that will be generated by these options in the bottom half.

Figure 5.15: File Information

You might need to place your cursor at the bottom of this window to expand the window, enabling you see the selections for naming your output data set. There is an alternative. At the very top of the window are three tabs, labeled Settings, Code/Results, and Split.  If you click the Settings tab, you do not have to expand the window manually.

Because you only have one worksheet, you do not have to enter a worksheet name.  You probably want to change the name of the output (SAS) data set from the default name (IMPORT) to something more meaningful.  You might also want to change the library where the SAS data set will be stored from the default WORK library. Clicking the Change button brings up a list of SAS libraries (below) and enables you to do both.

Figure 5.16: Selecting a File Name and a SAS Library

For this example, you are naming the file Grades and placing it in the default WORK library.  This library is a temporary library whose contents are deleted when you exit your SAS session.  Later in this chapter you will see how to create a permanent SAS library where your files will not be deleted when you close your SAS session.

Notice that you did not have to tell SAS Studio that you were importing an Excel workbook—it figured it out by the file extension (XLSX).  SAS Studio will typically use the file extension to figure out how to import data.  Because the first row of the spreadsheet contains variable names, leave the check on the Generate SAS variable names option.  This tells the import utility to use the first row of the worksheet to generate variable names.

If you have a nonstandard file extension or if you prefer to manually select a file format, then you can use the drop-down list displayed in Figure 5.17 to instruct SAS how to convert your file.

Figure 5.17: List of Supported File Types

When all is ready, click the Run icon (Figure 5.18).

Figure 5.18: The Run Icon

SAS Studio will now list information about the Grades data set.  Selections of the output are shown in Figure 5.19.

Figure 5.19: Edited Output from the Import Task

At the bottom of this output, you see a list of the variable names, whether they are stored as numeric or character, along with some other information that we don’t need at this time.  Notice that the import utility correctly read Name as character and the other variables as numeric.

Listing the SAS Data Set

A quick way to see a listing of the Grades data set is to select the Libraries tab in the navigation pane and select My Libraries.

Figure 5.20: The Libraries Tab

Expand the Work library and double-click Grades.  It looks like Figure 5.21.

Figure 5.21: Data Set Grades in the Work Library

You can use your mouse to scroll to the right to see the rest of the table.

In this book, as well as in SAS Studio, you will see the terms SAS data set and table used interchangeably as well as these other equivalent terms: variables are also called columns and observations are called rows.

To create a nicer looking report, click the Tasks and Utilities tab of the navigation pane and select Data followed by List Data, as in Figure 5.22.

Figure 5.22: The List Data Task

Double-click List Data.  The screen that appears resembles Figure 5.23.

Figure 5.23: List Data Task

Click the icon at the far right side of the box labeled DATA (circled in the figure below).

Figure 5.24: Select Table
Figure 5.25: Select Grades in the WORK Library

Click OK.  Your next task is to select which variables you want in the listing.  Under ROLES, you see four boxes. At this time, you are only interested in the first two: List variables and Group analysis by.  Click the plus sign next to List Variables.  A list of variables from the Grades data set will show up (Figure 5.26).

Figure 5.26: Select Variables

Select the variables that you want to see in your listing.  If you want to see all of the variables, first click Name.  Then hold down the Shift key and click Final.  Click OK.  Your screen now should look like Figure 5.27.

Figure 5.27: Tables and Variables Selected

Click the Run icon and the listing of the Grades data set appears as shown in Figure 5.28.

Figure 5.28: Listing of Data Set Grades

Importing Data from a CSV File

CSV (comma-separated values) files are a popular format for external data files.  As the name implies, CSV files use commas as data delimiters.  Many websites enable you to download data as CSV files.  As with Excel workbooks, your CSV file might or might not contain variable names at the beginning of the file.  If the file does contain variable names, be sure the Generate SAS variable names options box is checked; if not, uncheck this option.

For example, take a look at the CSV file called Grades.csv in Figure 5.29.

Figure 5.29: CSV File Grades.csv

This CSV file contains the same data as the Excel workbook Grades.xlsx.  Notice that variable names are included in the file.  You can import this file and create a SAS data set, using the same steps that you used to import the Excel workbook.  The import facility will automatically use the correct code to import this data file because of the CSV file extension.  The resulting SAS data set is identical to the one shown in Figure 5.28.

Conclusion

You have seen how easy it is to import data in a variety of formats and create SAS data sets.  Even experienced programmers (at least the ones this author knows) would prefer to use the import data utility to convert external data to SAS data sets rather than writing their own code.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.207.129