4
Handling Dates, Strings, Numbers

So far, we have learnt how to import data and create our own data in R and SAS, along with data inspection and cleaning. Here, we will learn how to work with different kinds of data, such as dates, strings and numbers, and how to convert one data format to another in both R and SAS. This includes handling numeric data, manipulating string/character variables (e.g., extracting a substring of a string variable), handling different date formats and performing numeric calculations with dates (e.g., the difference between two dates), as well as working with categorical data.

4.1 Working with Numeric Data

Integral types represent only whole numbers (positive, negative, and zero), and nonintegral types represent numbers with both integer and fractional parts.

4.1.1 Handling Numbers in SAS

Assigning Numeric Values to Variables

We use the input function to convert a character variable to a numeric one; in SAS we must also specify the informat that matches the way the number is written.

COMMA ELIMINATION USING INFORMAT

DOLLAR AND COMMA ELIMINATION USING INFORMAT

NUMERIC TO CHARACTER USING PUT

4.1.2 Numeric Data in R

Using R As Calculator

Assignment Of Numeric Values

Arithmetic with Numeric Variables

Numeric Vector

Naming A Vector
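As a minimal sketch of the two steps above, a numeric vector can be built with c() and labelled with names(). The object name sales and its values are made up for illustration.

```r
# Create a numeric vector with c(); the values are illustrative only.
sales <- c(120.5, 98, 143.25)

# Attach a name to each element with names().
names(sales) <- c("Jan", "Feb", "Mar")

# Elements can now be looked up by name as well as by position.
sales["Feb"]
sales[2]

# Arithmetic functions work on the whole vector.
total <- sum(sales)
total
```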

Convert Numeric to String
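One way to do this is with as.character, sketched below. The variable names are hypothetical.

```r
# A numeric value (illustrative).
x <- 42.5
class(x)            # "numeric"

# as.character() returns the text representation of the number.
x_chr <- as.character(x)
x_chr               # "42.5"
class(x_chr)        # "character"

# The character version cannot be used in arithmetic until it is
# converted back with as.numeric().
back <- as.numeric(x_chr) + 1
back                # 43.5
```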

4.2 Working with Date Data

Dates are a special case of numeric data: they are stored as numbers but can be written in many formats, which is why date variables can pose a challenge in data management. Date data are critical in industries such as finance, telecom and sales. R and SAS provide several options for dealing with date and datetime data.

4.2.1 Handling Dates in SAS

A SAS date value is a number that represents the number of days between 1 January 1960 and a specified date. Dates before 1 January 1960 are negative numbers; dates after it are positive numbers. Various SAS language elements handle SAS date values: functions, formats and informats.
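The day count described above is plain date arithmetic. The sketch below reproduces it in R purely as an illustration of the idea; it is not SAS code, and the dates are arbitrary.

```r
# The SAS origin, written as an R Date.
sas_origin <- as.Date("1960-01-01")

# Days from the origin to 17 July 2018: the number SAS would store.
after_origin <- as.numeric(as.Date("2018-07-17") - sas_origin)
after_origin    # 21382

# A date before the origin gives a negative number.
before_origin <- as.numeric(as.Date("1959-12-31") - sas_origin)
before_origin   # -1
```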

READING DATE

The informat used here is date9., which reads dates written as a two-digit day, the first three letters of the month and a four-digit year (for example, 17jul2018), making nine characters in total.

Here DATE9. is used as an INFORMAT. In an INPUT statement, it tells SAS how to translate your text into the right number, which will then look like a date once the right FORMAT is applied.

INFORMAT tells the compiler how to read data while FORMAT tells the compiler how to write data. FORMATs are just visible representations of numbers (or characters).

Changing Date Formats

By default, the ddmmyy. format writes eight characters, with a two-digit year (for example, 17/07/18). To show the year with century, use ddmmyy10.

Calculating Differences Between Dates

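Here, today() in SAS plays the role of Sys.Date() in R. In SAS we simply subtract one date from another (instead of using difftime, as in R).

Using the INTCK Function

We can use datepart to extract the date from a datetime value, and intck to obtain the difference between two dates in a chosen interval such as days or months.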

4.2.2 Handling Dates in R

Having your dates in the proper format allows R to know that they are dates, and what calculations it should perform on them.

The built-in as.Date function handles dates (without times). It accepts a variety of input formats through the format = argument.

The default format is a four‐digit year, followed by a month, then a day, separated by either dashes or slashes.
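A short sketch of both cases follows: strings in the default layout need no format, while other layouts need a format = string. The dates and object names are illustrative.

```r
# Default layouts: year-month-day with dashes or slashes.
d1 <- as.Date("2018-07-17")
d2 <- as.Date("2018/07/17")
d1 == d2    # TRUE

# Other layouts need format =.
# %d = day, %m = month number, %Y = four-digit year, %y = two-digit year.
d3 <- as.Date("17/07/2018", format = "%d/%m/%Y")
d4 <- as.Date("07-17-18", format = "%m-%d-%y")

# All four objects hold the same date.
d3 == d1
d4 == d1

# The result is a Date object, not a character string.
class(d3)   # "Date"
```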

Consider another example for the usage of format option in as.Date function.

The tz option is used to specify a time zone.

CONVERTING DATE TO NUMERIC VALUE

R stores dates using 1 January 1970 as the origin. When R looks at dates as integers it calculates the days passed since 1 January 1970.
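The day count can be inspected with as.numeric, as in this small sketch (the dates are arbitrary).

```r
# The origin itself is day 0.
origin_num <- as.numeric(as.Date("1970-01-01"))
origin_num   # 0

# A later date is the number of days since the origin.
d <- as.Date("2018-07-17")
d_num <- as.numeric(d)
d_num        # 17729

# Going the other way: a day count plus an origin gives back a Date.
d_back <- as.Date(d_num, origin = "1970-01-01")
d_back == d  # TRUE
```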

CONVERTING DATE TO CHARACTER VALUE
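A minimal sketch, assuming the date is already a Date object: as.character gives the default text form, and format lets you choose the layout.

```r
d <- as.Date("2018-07-17")

# Default text form: year-month-day.
d_chr <- as.character(d)
d_chr        # "2018-07-17"

# format() writes the date in a layout of your choice.
d_uk <- format(d, "%d/%m/%Y")
d_uk         # "17/07/2018"

# Both results are character strings, no longer dates.
class(d_uk)  # "character"
```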

USING Sys.Date FUNCTION

To get the current date, the Sys.Date function will return a Date object.
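For example (the comparison date below is arbitrary, and the printed values depend on the day the code is run):

```r
# Today's date as a Date object.
today <- Sys.Date()
today
class(today)   # "Date"

# Because it is a Date, it can be used in date arithmetic.
elapsed <- today - as.Date("2018-07-17")
elapsed
```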

Using the as.POSIXct Function

Dates with times are stored in the POSIXct class. Without a format, as.POSIXct parses only standard layouts such as YYYY-MM-DD HH:MM:SS; strptime is used for other layouts.

strptime deals with variations in the formatting and order of the date pieces, so that the result can be converted with as.POSIXct.
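A sketch of both routes is shown below. The time stamp, the dotted input layout and the choice of the UTC time zone are assumptions made for illustration.

```r
# A string in the standard layout is parsed directly.
t1 <- as.POSIXct("2018-07-17 14:30:00", tz = "UTC")

# A string in another layout is first described to strptime()...
parsed <- strptime("17.07.2018 14:30:00",
                   format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# ...and the result is then converted to POSIXct.
t2 <- as.POSIXct(parsed)

# Both objects represent the same instant.
t1 == t2         # TRUE

# Internally, POSIXct is the number of seconds since 1 January 1970.
as.numeric(t1)   # 1531837800
```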

Giving Time Differences Between Two Dates using difftime
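A minimal sketch with two arbitrary dates; the units argument controls how the difference is reported.

```r
start <- as.Date("2018-01-01")
end   <- as.Date("2018-07-17")

# Difference in days.
gap_days <- difftime(end, start, units = "days")
gap_days               # Time difference of 197 days

# The same difference expressed in weeks.
gap_weeks <- difftime(end, start, units = "weeks")
gap_weeks

# Plain subtraction of two Date objects also gives the gap in days.
end - start

# as.numeric() drops the units and keeps the number.
as.numeric(gap_days)   # 197
```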

Lubridate Package

Date handling in R is greatly simplified and augmented by the lubridate package. It has many functions, but the main parsers are named after the order of the date pieces in the input: dmy, mdy and ymd.
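The sketch below parses the same date written in three different orders. It assumes lubridate has been installed (for example with install.packages("lubridate")), so the calls are wrapped in a check and are skipped if the package is missing.

```r
# lubridate is an add-on package; run the example only if it is available.
if (requireNamespace("lubridate", quietly = TRUE)) {
  # The function name states the order of the pieces in the input.
  d1 <- lubridate::dmy("17-07-2018")   # day, month, year
  d2 <- lubridate::mdy("07/17/2018")   # month, day, year
  d3 <- lubridate::ymd("20180717")     # year, month, day

  # All three calls return the same Date.
  print(d1)
  print(d1 == d2 && d2 == d3)
}
```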

Giving Time Differences using Lubridate Package

4.3 Handling Strings Data

A vast amount of data is in the form of text, particularly emails, documents and the Internet. The ability to manipulate string data types is critical to a data scientist.

A “string” is a collection of characters that make up one element of a vector. You can recognize a string because it is usually enclosed in (double) quotation marks.

4.3.1 Handling Strings Data in SAS

Assigning String Values to Variables

Using the SUBSTRN Function

Using the TRIMN Function

CONCATENATION OF STRING

TO CONVERT CHARACTER TO NUMERIC

Replace One Value of String with Another

Here, the translate function replaces a with &.

Eliminating Whitespace using compress

4.3.2 Handling Strings Data in R

In R, a piece of text is represented as a sequence of characters (letters, numbers, and symbols). The data type R provides for storing sequences of characters is character. Formally, the mode of an object that holds character strings in R is “character”.

You express character strings by surrounding text within double quotes or single quotes.
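A short sketch of both quoting styles follows; the object names are hypothetical.

```r
# Double and single quotes create the same string.
s1 <- "R and SAS"
s2 <- 'R and SAS'
identical(s1, s2)   # TRUE

# Using one kind of quote outside lets the other kind appear inside.
quoted <- 'She said "hello"'
cat(quoted, "\n")

# The object holds character data.
class(s1)           # "character"
mode(s1)            # "character"

# nchar() counts the characters, spaces included.
nchar(s1)           # 9
```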

ASSIGNING STRING VALUE TO AN OBJECT:

SPECIAL CHARACTERS IN STRINGS

PRINT FORMATTED STRING

COUNTING CHARACTERS IN STRING

VECTOR STRINGS

USING EXTRACTION AND ASSIGNMENT OPERATORS

CONVERTING NUMERIC TO STRINGS

PARSING STRINGS TOGETHER
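One common way to join strings is paste, sketched here with made-up inputs. The sep argument sets the separator between the pieces and collapse joins the elements of a single vector.

```r
# paste() joins its arguments with a space by default.
a <- paste("Hello", "World")
a    # "Hello World"

# sep = changes the separator.
b <- paste("Hello", "World", sep = "-")
b    # "Hello-World"

# paste0() joins with no separator at all.
p0 <- paste0("Hello", "World")
p0   # "HelloWorld"

# collapse = joins the elements of one vector into a single string.
j <- paste(c("a", "b", "c"), collapse = "+")
j    # "a+b+c"
```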

BREAKING A STRING AT A DELIMITER
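A minimal sketch using strsplit, with illustrative inputs. The function returns a list with one element per input string, so [[1]] extracts the pieces of the first string.

```r
# Split a date-like string at the dashes.
parts <- strsplit("2018-07-17", split = "-")
parts[[1]]   # "2018" "07" "17"

# The delimiter is treated as a regular expression by default;
# fixed = TRUE makes a character such as "." match literally.
dotted <- strsplit("a.b.c", split = ".", fixed = TRUE)[[1]]
dotted       # "a" "b" "c"
```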

USING DescTools, trimws IN R ELIMINATES WHITE SPACE
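The sketch below uses only trimws and gsub, both of which are available in base R, so it runs without loading any package. The padded string is made up for illustration.

```r
raw <- "   R and SAS   "

# trimws() removes leading and trailing white space.
both <- trimws(raw)
both         # "R and SAS"

# which = restricts trimming to one side.
left_only <- trimws(raw, which = "left")
left_only    # "R and SAS   "

# gsub() can remove every space, including those inside the string.
no_spaces <- gsub(" ", "", raw)
no_spaces    # "RandSAS"
```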

STRING TO NUMERIC
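A minimal sketch with as.numeric; the input strings are illustrative. Text that cannot be read as a number becomes NA (with a warning, suppressed here).

```r
# Numbers stored as text.
txt <- c("42.5", "7", "1e3")
num <- as.numeric(txt)
num        # 42.5 7 1000
class(num) # "numeric"

# The converted values can be used in arithmetic.
sum(num)   # 1049.5

# Text that is not a number is converted to NA.
bad <- suppressWarnings(as.numeric("twelve"))
is.na(bad) # TRUE
```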

Note that R has another data type, called factor, which is used for categorical variables. Factors may look like string data, but they usually have only a few levels and are identified as factor variables in R. In SAS they would be character variables and would be analyzed using Proc Freq.
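A small sketch of a factor, using made-up categories. The table function gives a frequency count, which is roughly the kind of summary Proc Freq produces in SAS.

```r
# Character data with only a few distinct values.
grade_chr <- c("low", "high", "medium", "low", "high")

# Convert to a factor, stating the order of the levels.
grade <- factor(grade_chr, levels = c("low", "medium", "high"))

class(grade)        # "factor"
levels(grade)       # "low" "medium" "high"

# Internally each value is stored as the integer code of its level.
as.integer(grade)   # 1 3 2 1 3

# Frequency count per level.
counts <- table(grade)
counts
```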

4.4 Quiz Questions

  1. How do you create a vector in R?
  2. Give the function used to name the vector elements.
  3. Name the functions used to convert numeric data type to character data type in R.
  4. How can you simplify date handling in R?
  5. Give the name of the function used to calculate the difference between two dates in R.
  6. Give the name of the function used to calculate the difference between two dates in SAS.
  7. Give three different types of date format and their syntax in SAS.
  8. Count the characters in the string “R and SAS” in R.
  9. Concatenate the strings “Hello” and “World” in SAS and R.
  10. What is the command used to eliminate white space in R?
  11. Name a package in R that handles string data.

Quiz Answers

  1. Vectors are created using c() command.
  2. names(vector_name) command is used to name the elements of a vector.
  3. In R: as.character.
  4. Date handling can be simplified using the lubridate package.
  5. In R: difftime.
  6. In SAS: intck.
  7. 17-07-18 ddmmyy8., 07-17-18 mmddyy8., 17jul2018 date9.
  8. nchar("R and SAS"), which returns 9.
  9. In R: paste("Hello", "World", sep = " ") In SAS:
  10. The trimws command eliminates leading and trailing whitespace in R; it is part of base R, so it also works without loading DescTools. We can also use gsub to remove all spaces: gsub(" ", "", dataframe$variable).
  11. The stringr package.