Importing data is the first step in any analysis, and it only pays off if the data is reliable and relevant. The computer processes exactly what you give it, so if the imported data is faulty, every result computed from it will also be erroneous and misleading.
This principle is commonly known as GIGO (Garbage In, Garbage Out), and it makes the input step one of the most important steps in the data science pipeline. Both R and SAS can read data from files or from data connections. R imports datasets with functions, whereas SAS does the same with procedures.
In R, data is imported with functions, many of which live in add-on packages that must be installed before they can be used.
A package is installed with the command install.packages("package_name").
After installation, you must load the package before using it. Loading a package makes it active in the current session. To load a package, use library(package_name).
Installed packages are updated with update.packages().
Note that we install a package only once, update it occasionally, and load it every time we begin an R session. To unload a package, we use detach("package:package_name", unload = TRUE).
To uninstall a package, we use remove.packages("package_name").
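The package life cycle described above can be sketched roughly as follows. The install, update, and uninstall calls are left as comments because they need internet access and change the local library, and readxl is only an example package name. Loading and unloading are shown with splines, a package that ships with R and is used here purely for illustration.

```r
# One-time or occasional steps, shown as comments (they need internet access).
# "readxl" is only an example package name.
# install.packages("readxl")   # install once
# update.packages()            # update installed packages occasionally
# remove.packages("readxl")    # uninstall

# Load a package for the current session; 'splines' ships with R.
library(splines)
loaded_after_library <- "package:splines" %in% search()

# Unload it again.
detach("package:splines", unload = TRUE)
loaded_after_detach <- "package:splines" %in% search()

print(c(loaded = loaded_after_library, after_detach = loaded_after_detach))
```

search() lists the packages attached to the current session, which is a simple way to confirm whether a package is loaded.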
Here we study multiple ways to input data.
In SAS, you can save disk space by placing the following statement at the beginning of your program: options compress = yes;
Entering data manually in SAS is straightforward, and a few examples are enough to learn how it is done.
The INPUT statement reads raw data from instream data lines or external files into a SAS dataset. Data input is the first step of every analysis; without data there can be no analysis of any kind. Data can be entered in various forms. Let’s look at a few examples of data input in SAS.
In the examples given below, we input ordinary numerical data, strings, names, etc.
The code above creates a dataset named first.
Import code in SAS looks different from import code in R: R, an object-oriented language, uses functions, whereas SAS generally calls procedures.
Using the Import Wizard is an easy and straightforward way to import existing data with well-behaved formatting into SAS. There are other methods for importing data into SAS, such as PROC IMPORT, or entering raw observations directly into SAS to create a new dataset.
These methods of importing or creating data can give you greater control over how to read variables (the informats), how to write the variables (the formats), how to parse the data (delimited, aligned, repetition, etc.), and more.
Here we use a PROC IMPORT step to import a raw data file and save it as a SAS dataset. The DBMS= option specifies the file type, e.g., CSV or XLS.
GETNAMES = YES specifies that the first row contains column names.
Note: The type of dataset created (temporary or permanent) depends on the name you specify in the OUT= option.
A permanent dataset has to be referenced by a two-level name, library_name.data_set_name, whereas a temporary dataset has just a one-level name.
There are a number of ways to import data into R, and several formats are available:
https://rforanalytics.wordpress.com/useful-links-for-r/odbc-databases-for-r
Let us explore some of the ways to import data in R.
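There are three functions which can be used to import CSV files in R: read.csv() from base R, fread() from the data.table package, and read_csv() from the readr package.
fread() and read_csv() are typically the fastest of these, especially on large files.
You can use the system.time() function to verify this by timing each call, e.g., system.time(read.csv("file.csv")).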
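A minimal timing sketch along these lines is shown below. It assumes nothing about your data: it writes a throwaway CSV file to a temporary location, and it times fread() and read_csv() only if data.table and readr happen to be installed. The file size and column names are arbitrary choices for illustration, and the timings will vary from machine to machine.

```r
# Build a throwaway CSV file so the sketch is self-contained.
set.seed(1)
csv_path <- tempfile(fileext = ".csv")
sample_data <- data.frame(id = 1:20000, value = rnorm(20000))
write.csv(sample_data, csv_path, row.names = FALSE)

# Base R reader: always available.
t_base <- system.time(df_base <- read.csv(csv_path))
print(t_base)

# data.table::fread() is timed only if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  t_fread <- system.time(df_fread <- data.table::fread(csv_path))
  print(t_fread)
}

# readr::read_csv() is timed only if readr is installed.
if (requireNamespace("readr", quietly = TRUE)) {
  t_readr <- system.time(
    df_readr <- suppressMessages(readr::read_csv(csv_path))
  )
  print(t_readr)
}
```

The "elapsed" entry of each result is the wall-clock time in seconds; differences tend to become clearer as the file grows.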
We need to install the readxl package and use its read_excel() function to import .xls or .xlsx files.
Example: To import sheet 1 of an Excel file with the first row as column names, use read_excel("file.xlsx", sheet = 1, col_names = TRUE).
We can also specify the sheet by its name, put within double quotes, instead of by its number.
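The two ways of choosing a sheet can be sketched as follows. This assumes the readxl package is installed and uses the example workbook that readxl bundles (located with readxl_example()), so that no file of your own is needed; substitute your own path in practice.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook bundled with readxl; replace with your own file path.
  xlsx_path <- readxl::readxl_example("datasets.xlsx")

  # Sheet selected by position, first row used as column names.
  by_number <- readxl::read_excel(xlsx_path, sheet = 1, col_names = TRUE)

  # Same sheet selected by name; excel_sheets() lists the available names.
  first_name <- readxl::excel_sheets(xlsx_path)[1]
  by_name <- readxl::read_excel(xlsx_path, sheet = first_name)

  print(head(by_number))
} else {
  message("readxl is not installed; run install.packages(\"readxl\") first.")
}
```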
read.sas7bdat() from the sas7bdat package is used to import .sas7bdat files.
We use the read.spss() and read.dta() functions from the foreign package to import SPSS and Stata files, respectively.
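As a small illustration of these readers, the sketch below writes a tiny Stata file with foreign::write.dta() and reads it back with read.dta(), so it runs without any external data. The SPSS and SAS calls are left as comments because their file paths (survey.sav and sales.sas7bdat) are hypothetical.

```r
if (requireNamespace("foreign", quietly = TRUE)) {
  # Write a small Stata file first so there is something to import.
  dta_path <- tempfile(fileext = ".dta")
  original <- data.frame(id = c(1, 2, 3), score = c(10.5, 20.25, 30))
  foreign::write.dta(original, dta_path)

  # Import the Stata file back into R.
  stata_data <- foreign::read.dta(dta_path)
  print(stata_data)
}

# The other readers follow the same pattern (paths below are hypothetical):
# spss_data <- foreign::read.spss("survey.sav", to.data.frame = TRUE)
# sas_data  <- sas7bdat::read.sas7bdat("sales.sas7bdat")
```

Note that read.spss() returns a list unless to.data.frame = TRUE is given.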
Assignment in R has the following syntax: object_name <- value
For example, mydata <- read.csv("file.csv") assigns the imported file to the object mydata.
Similarly, data read using other functions can be assigned to R objects.
Note: Each of the import functions discussed above takes additional parameters that control how the data is formatted during import.
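A short sketch of assigning imported data to an object follows. The CSV file and its contents are made up for the example and written to a temporary location so that the code is self-contained.

```r
# Create a small CSV file to import (a stand-in for a real data file).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Asha,31", "Ben,45"), csv_path)

# Assign the imported data to an object with the <- operator.
mydata <- read.csv(csv_path)

# The object can now be reused by name.
print(mydata)
```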
To input a value manually, we assign it directly to an object, e.g., x <- 10.
We can do the same for other types of data, except that string values must be put in quotes (e.g., "ten").
We can also create vectors, matrices, and datasets from values that we enter ourselves.
We can input numerical, date, and string values as follows:
NA in R signifies a missing value (in SAS, a missing numeric value is denoted by a single period).
The is.na() function is used to detect missing values in a vector.
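The following sketch shows one way to enter numeric, date, and string values by hand and to detect a missing value. All variable names and values are invented for illustration.

```r
# Numeric values, with one missing entry.
ages <- c(23, 35, NA, 41)

# Date values are created from strings with as.Date().
visit_dates <- as.Date(c("2020-01-15", "2020-02-20"))

# String values go in quotes.
number_words <- c("ten", "twenty")

# is.na() returns TRUE at each missing position.
missing_flags <- is.na(ages)
print(missing_flags)
```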
This creates a data frame with two columns as follows:
or, we can create a matrix using:
This code makes a matrix with the values in c() arranged in three rows and two columns, filled column-wise. Note: a vector or a matrix must hold values of a single type, but a data frame can hold columns of different types.
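A minimal sketch of both structures is given below. The names and values are invented; the matrix uses six values so that they fill three rows and two columns.

```r
# A data frame with two columns of different types.
people <- data.frame(name = c("Asha", "Ben", "Chen"),
                     age = c(31, 45, 28))

# A matrix filled column-wise: 1, 2, 3 go into column 1; 4, 5, 6 into column 2.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)

print(people)
print(m)
```

Passing byrow = TRUE to matrix() would fill the values row by row instead.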
Entering data manually in SAS is straightforward, and a few examples are enough to learn how it is done.
The INPUT statement reads raw data from instream data lines or external files into a SAS dataset. Data input is the first step of every analysis; without data there can be no analysis of any kind. Data can be entered in various forms. Let’s look at a few examples of data input in SAS.
In the examples given below, we input ordinary numerical data, strings, names, etc.
This code creates a dataset named first.
The $ sign specifies that the variable it follows is a character variable.
The MISSOVER option prevents the DATA step from going to the next line if it does not find values for all variables in the INPUT statement in the current record. Here the DSD option is used to treat commas as separator characters.
After importing the data, the next important step is to print it and look at the kind of data you now have to analyze.
Printing a dataset in SAS involves calling the PRINT procedure. The following code prints the whole dataset named ajaydat: proc print data = ajaydat; run;
The following code prints the first five observations of the dataset named ajaydat: proc print data = ajaydat (obs = 5); run;
The following code prints observations 10 to 20 of the dataset ajaydat: proc print data = ajaydat (firstobs = 10 obs = 20); run;
In R, printing data does not need any special function or package. You simply type the dataset name and run it.
If you read data into mydata and type the dataset name, mydata,
the whole of mydata is printed to the console.
With head(mydata, n = 1), only the first observation of mydata is printed to the console. The default value of n is 6.
With mydata[10:20, ], the observations ranging from 10 to 20 are displayed.
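These printing options can be sketched as follows. The built-in mtcars dataset stands in for an imported dataset here; that choice is an assumption made only so the example runs as-is.

```r
# mtcars ships with R and stands in for an imported dataset.
mydata <- mtcars

# Same effect as typing mydata at the console: prints the whole dataset.
print(mydata)

first_row <- head(mydata, n = 1)    # first observation only
first_six <- head(mydata)           # n defaults to 6
rows_10_to_20 <- mydata[10:20, ]    # observations 10 through 20

print(first_row)
print(rows_10_to_20)
```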
Importing data in R requires different functions for different types of files, whereas SAS uses PROC IMPORT with different options or parameters to import files of many types. Manual data input is done with the c() function in R and with a DATA step and an INPUT statement in SAS. In R, printing a dataset only requires typing the name of the dataset and running it, whereas SAS uses PROC PRINT to print any dataset.