Time for action - creating a subset from a large dataset

We will start by assessing the feasibility of head to head combat with the Wei army. Since we have past data related directly to head to head battles, we should specifically target this information in order to best address the method's prospects. Currently, those data are part of a large set that also contains information on other methods. However, we can use the multi-argument function subset(data, ...) to isolate our head to head combat data and simplify our analysis of this strategy:

  1. Create a subset of data using the subset(data, ...) function and save it to a new variable named subsetHeadToHead:
    > #use the subset(data, ...) function to create a subset from a larger dataset
    > #create a subset that isolates our head to head combat data
    > subsetHeadToHead <- subset(battleHistory, battleHistory$Method == "headToHead")
    
  2. Verify the contents of the new subset. Note that the console should return thirty rows, all of which contain headToHead in the Method column:
    > #display the contents of the head to head subset
    > subsetHeadToHead
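
Printing every row works, but a couple of quick checks can confirm the same thing at a glance. The following is a minimal sketch that uses a small, made-up data frame named toyHistory (its values are invented for illustration) as a stand-in for battleHistory, whose full contents are not reproduced here:

```r
# toyHistory is a hypothetical stand-in for battleHistory; its values are invented
toyHistory <- data.frame(
  Method = c("headToHead", "ambush", "headToHead", "fire", "surround"),
  Result = c("Victory", "Defeat", "Defeat", "Victory", "Victory"),
  stringsAsFactors = FALSE
)

# create the subset in the same way as in step 1
toySubset <- subset(toyHistory, toyHistory$Method == "headToHead")

# count the rows in the subset
nrow(toySubset)

# list the distinct values that remain in the Method column
unique(toySubset$Method)
```

Applied to the real data, nrow(subsetHeadToHead) should report 30, and the Method column should contain headToHead as its only value.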
    

What just happened?

In the one console line that it took to create a subset of our data, you encountered your first multi-argument (and variable-argument) function in the R language.

Multi-argument functions

You were first introduced to functions in Chapter 2. There, the date() function received no arguments and output the current date and time in the R console. Shortly after, you used setwd(dir) and getwd() to set and retrieve your R working directory. The setwd(dir) function received a single argument, whereas getwd(), like date(), received none. With subset(data, ...), you have used your first multi-argument function. Further, subset(data, ...) is a variable-argument function, meaning that the number of arguments it receives can differ depending on the circumstance. In our example, we used two arguments. However, we could have used more arguments to further specify our subset. For instance, we could have added a third argument to our subset(data, ...) function that told R to include only certain columns in its output.
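
To illustrate the column-limiting case, here is a minimal sketch that passes the select argument of subset() as a third argument. The toyHistory data frame and its Duration column are hypothetical, invented only for this example:

```r
# toyHistory is a hypothetical stand-in for battleHistory; its values are invented
toyHistory <- data.frame(
  Method = c("headToHead", "ambush", "headToHead", "fire"),
  Result = c("Victory", "Defeat", "Defeat", "Victory"),
  Duration = c(3, 1, 5, 2),  # hypothetical column, not part of the original dataset
  stringsAsFactors = FALSE
)

# the third argument, select, names the columns to keep in the output
toyColumns <- subset(toyHistory, toyHistory$Method == "headToHead",
                     select = c(Method, Result))

# display the result: two rows, with the Duration column left out
toyColumns
```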

Variable-argument functions

Any time that you see an ellipsis (...) in an R function definition, you know that it accepts a variable number of arguments. In contrast, some multi-argument functions, such as cor(x, y, use, method) for correlations, accept only a fixed set of named arguments, although arguments that have default values may be omitted. However, many others, such as plot(x, y, ...) for scatterplots, can accept relatively few or many arguments, depending on the situation.
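
One way to see the ellipsis at work is to write a small function that uses it. The countArguments function below is a hypothetical helper created only for this illustration; it simply reports how many arguments it was given:

```r
# countArguments is a hypothetical helper written for this illustration
countArguments <- function(...) {
  # list(...) gathers whatever was passed in; length() counts the items
  length(list(...))
}

# the same function can be called with no arguments, one argument, or several
noArguments <- countArguments()
oneArgument <- countArguments("headToHead")
threeArguments <- countArguments("surround", "ambush", "fire")

# display the three counts
c(noArguments, oneArgument, threeArguments)
```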

Equivalency operators

In the second argument of our subset(data, ...) function, we employed the equivalency operator. It is formed by two consecutive equals signs (==). This operator compares two values, the one to its left and the one to its right. If the values are equal, then the comparison evaluates to TRUE. If not, it evaluates to FALSE.

Conversely, the non-equivalency operator, which is formed by an exclamation point joined with a single equals sign (!=), tests whether two values are not equal. If they are indeed not equal, then the comparison evaluates to TRUE; otherwise, it evaluates to FALSE.
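
Both operators can be tried directly on a few values. In the short sketch below, the variable names are made up for illustration; note that comparing a whole vector against a single value returns one result per element:

```r
# compare two single values with the equivalency operator
sameMethod <- "headToHead" == "headToHead"   # TRUE
differentMethod <- "headToHead" == "ambush"  # FALSE

# compare two single values with the non-equivalency operator
notVictory <- "Defeat" != "Victory"          # TRUE

# compare every element of a vector against one value
toyMethods <- c("headToHead", "ambush", "headToHead")
isHeadToHead <- toyMethods == "headToHead"   # TRUE FALSE TRUE

# display the element-by-element result
isHeadToHead
```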

subset(data, ...)

Our implementation of the subset(data, ...) function made use of two arguments. The first referred to our data source, the battleHistory variable. The second specified the exact data that we wanted to pull from that source.

> subsetHeadToHead <- subset(battleHistory, battleHistory$Method == "headToHead")

In our case, we wanted to include battles only if they employed the head to head combat method. To clarify this operation, let us dissect the second argument.

battleHistory$Method == "headToHead"

You should already be familiar with the left-hand segment, which selects the Method column from the battleHistory dataset. By using the equivalency operator (==) and "headToHead", we are telling our function to select only the rows in the Method column that contain a value of headToHead. In words, this argument can be read as "in the battleHistory dataset, select rows from the Method column only if they have a value of headToHead." Hence, our resulting subset yielded only the 30 rows from our original dataset that contained the head to head combat method.
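
It can help to look at what the second argument produces by itself. The sketch below again uses a hypothetical toyHistory data frame with invented values. The comparison yields one TRUE or FALSE per row, and subset() keeps the rows marked TRUE; bracket indexing with the same vector is shown as a comparison, on the assumption that the column contains no missing values:

```r
# toyHistory is a hypothetical stand-in for battleHistory; its values are invented
toyHistory <- data.frame(
  Method = c("headToHead", "ambush", "headToHead", "fire"),
  Result = c("Victory", "Defeat", "Defeat", "Victory"),
  stringsAsFactors = FALSE
)

# evaluate the second argument separately: one TRUE or FALSE per row
keepRow <- toyHistory$Method == "headToHead"
keepRow

# subset() keeps only the rows where the vector is TRUE
viaSubset <- subset(toyHistory, keepRow)

# bracket indexing with the same logical vector selects the same rows here
viaBrackets <- toyHistory[keepRow, ]
```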

Pop quiz

  1. What does an ellipsis (...) mean when encountered inside an R function definition?

    a. The function accepts a single argument.

    b. The function accepts multiple arguments.

    c. The function accepts a specific number of arguments.

    d. The function accepts a variable number of arguments.

  2. Interpret the following argument of the subset(data, ...) function in words: battleHistory$Result != "Victory"

    a. In the battleHistory dataset, select rows from column Result only if they do not have a value of Victory.

    b. In the battleHistory dataset, select rows from column Result only if they have a value of Victory.

    c. In the battleHistory dataset, select cells from column Result only if they do not have a value of Victory.

    d. In the battleHistory dataset, select cells from column Result only if they have a value of Victory.

Have a go hero

Now that you are familiar with extracting information from large datasets, use the subset(data, ...) function to create subsets for each of the remaining battle methods: surround, ambush, and fire. Save each of these subsets into new variables named subsetSurround, subsetAmbush, and subsetFire, respectively.
