Categorical or nominal data is information that is classified into nonnumeric levels. Two pertinent columns in our battle history dataset, and subsequently our head to head combat subset, are represented by categorical data. These are the SuccessfullyExecuted (categorized as Y or N) and Result (categorized as Victory or Defeat) columns. A major benefit of categorical data is that it represents information in a very practical and understandable manner. However, categorical data is not well-suited for quantitative data analysis. Fortunately, R is able to recode categorical data in numeric form, thus allowing us to analyze it quantitatively.
Let us proceed through the steps required to recode our SuccessfullyExecuted and Result columns and save them as numeric variables:
1. Recode the SuccessfullyExecuted column using as.numeric(data), as can be seen in the following:
> #represent categorical data numerically using as.numeric(data)
> #recode the SuccessfullyExecuted column into N = 1 and Y = 2
> numericExecutionHeadToHead <- as.numeric(subsetHeadToHead$SuccessfullyExecuted)
> #display the contents of numericExecutionHeadToHead
> numericExecutionHeadToHead
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2. Note that if you prefer your categorical variables to begin with a value of zero, as in N = 0 and Y = 1, then you should subtract one from the statement in step 1.
3. Recode the SuccessfullyExecuted column so it begins with a value of zero:
> #recode the SuccessfullyExecuted column into N = 0 and Y = 1
> #by default, R recodes variables alphabetically from 1 to n, so subtract one to offset the coding to run from 0 to n - 1
> numericExecutionHeadToHead <- as.numeric(subsetHeadToHead$SuccessfullyExecuted) - 1
> #display the contents of numericExecutionHeadToHead
> numericExecutionHeadToHead
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4. Recode the Result column using as.numeric(data):
> #recode the Result column into Defeat = 0 and Victory = 1
> numericResultHeadToHead <- as.numeric(subsetHeadToHead$Result) - 1
> #display the contents of numericResultHeadToHead
> numericResultHeadToHead
[1] 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 1
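The steps above can be tried without the battle history file. The following is a minimal sketch that assumes a small, made-up data frame named toyBattles (hypothetical, not part of the original dataset) whose columns are stored as factors. It relies on two facts: as.numeric() returns the level codes of a factor, and factor levels are ordered alphabetically by default.

```r
# toyBattles is a hypothetical stand-in for subsetHeadToHead
toyBattles <- data.frame(
  SuccessfullyExecuted = factor(c("Y", "N", "Y", "Y")),
  Result = factor(c("Victory", "Defeat", "Defeat", "Victory"))
)

# levels are sorted alphabetically, so N = 1 and Y = 2
numericExecutionToy <- as.numeric(toyBattles$SuccessfullyExecuted)
numericExecutionToy
# [1] 2 1 2 2

# subtract one so the coding starts at zero: Defeat = 0 and Victory = 1
numericResultToy <- as.numeric(toyBattles$Result) - 1
numericResultToy
# [1] 1 0 0 1
```

The variable names numericExecutionToy and numericResultToy simply mirror the naming convention used in the steps; any names would work.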
You have represented your categorical columns (SuccessfullyExecuted and Result) from the head to head combat dataset as numeric variables, thereby preparing them for quantitative analysis. During this process, you encountered the as.numeric(data) function and exercised your ability to overwrite variables.
The as.numeric(data) function is used to represent nonnumeric data in numeric terms. For example, we used as.numeric(data) to convert our N and Y text values from the SuccessfullyExecuted column into the numbers 0 and 1 respectively, using the following:
> numericExecutionHeadToHead <- as.numeric(subsetHeadToHead$SuccessfullyExecuted) - 1
Similarly, we used as.numeric(data) to code our Result column text of Defeat and Victory into the numbers 0 and 1:
> numericResultHeadToHead <- as.numeric(subsetHeadToHead$Result) - 1
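One point worth checking before relying on this approach: as.numeric() only yields level codes when the column is a factor. If the column was read in as plain text (the default for read.csv() since R 4.0.0), as.numeric() returns NA with a warning instead. The sketch below uses a hypothetical vector named executedText, not a column from the dataset, to show how the levels can be inspected and how wrapping the data in factor() first restores the expected coding.

```r
# executedText is a hypothetical character vector, not a column from the dataset
executedText <- c("Y", "N", "Y")

# coercing text directly fails: every element becomes NA (with a warning)
directCoercion <- suppressWarnings(as.numeric(executedText))
directCoercion
# [1] NA NA NA

# converting to a factor first gives alphabetically ordered levels
executedFactor <- factor(executedText)
levels(executedFactor)
# [1] "N" "Y"

# the level codes minus one give N = 0 and Y = 1
zeroBasedCodes <- as.numeric(executedFactor) - 1
zeroBasedCodes
# [1] 1 0 1
```

Calling levels() on a column is a quick way to confirm which category will receive which number before recoding.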
In step 1 of our activity, we originally recoded our SuccessfullyExecuted column using values of N as 1 and Y as 2 and saved the results into a variable called numericExecutionHeadToHead. This was done by the following command:
> numericExecutionHeadToHead <- as.numeric(subsetHeadToHead$SuccessfullyExecuted)
Then, in step 3, we recoded the column using values of N as 0 and Y as 1 and then saved the results into a variable with the same name, numericExecutionHeadToHead:
> numericExecutionHeadToHead <- as.numeric(subsetHeadToHead$SuccessfullyExecuted) - 1
While this was a seamless process that occurred without interruption, it demonstrates an important property of R variables. That is, R variables can be reassigned to new values. When a variable is overwritten in this manner, it assumes a new value and abandons its previous one. So, after step 3, our numericExecutionHeadToHead variable represented N and Y as 0 and 1 and ceased to depict the values as we had defined them in step 1.
To demonstrate this point, consider variable A, which has yet to be assigned a value. Once we execute the line:
> A <- 1
variable A will take on a value of 1. If we were then to enter the line:
> A <- 2
variable A would take on a value of 2. Its previous contents would be overwritten and therefore forgotten.
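The same behavior can be confirmed in a few lines. This sketch keeps a copy of the first value in a second, hypothetical variable named firstValueOfA, since the earlier value cannot be recovered from A once it has been overwritten.

```r
# assign A, then keep a copy of its current value
A <- 1
firstValueOfA <- A

# reassigning A replaces its contents
A <- 2

A
# [1] 2
firstValueOfA
# [1] 1
```

Saving results under a new name, rather than reusing an existing one, is the simplest way to keep both versions available.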
What would be the values of N and Y in the SuccessfullyExecuted column if it were recoded using the following line?
> as.numeric(subsetHeadToHead$SuccessfullyExecuted) + 5
a. N = 0 and Y = 1
b. N = 1 and Y = 2
c. N = 5 and Y = 6
d. N = 6 and Y = 7
What would be the value of A after the following lines were executed in the R console?
> A <- 0
> A <- 1
> A <- 2
> A <- 3
a. 3
b. 2
c. 1
d. 0
Now that you have quantified your first categorical variables, proceed to recode the SuccessfullyExecuted and Result columns for each of the remaining battle methods: surround, ambush, and fire. Follow a console structure and naming convention similar to the ones we used with our head to head combat data. For example, you should create the following variables with your ambush data:
numericExecutionAmbush
numericResultAmbush