Time for action - manipulating variable data

Being able to access the information stored in a variable is the initial step towards manipulating its data. Variables and their data can be used in the same way that we used numbers to perform calculations in Chapter 2. They can be used in mathematical formulas as well as in function arguments.

  1. Use your hanzhongResources variable to calculate the amount of resources that the Shu army would have remaining if a flood were to destroy 75% of each resource:
    > #if a flood destroyed 75% of the Shu resources at Hanzhong, how much of each resource would remain?
    > #multiply the hanzhongResources variable by 0.25 to represent the remaining 25% of the original resources
    > hanzhongResources * 0.25
    
  2. R will display the result of the calculation.
  3. Now assume that the hypothetical flood only affected the provisions at Hanzhong, while all of the other resources remained unharmed. Here, you must perform a calculation only on the Provisions column of the hanzhongResources variable:
    > #if a flood destroyed 75% of the Provisions at Hanzhong, how much would remain?
    > #multiply the Provisions column by 0.25 to represent the remaining 25% of the original resources
    > hanzhongResources$Provisions * 0.25
    
  4. R will display the results of the calculation. Note that calculations can be applied in the same fashion to rows, columns, and cells.
  5. Variable data can also be used in function arguments. On a less disastrous note, use your soldiersByCity variable to calculate the mean (average) number of soldiers stationed in a Shu city:
    > #use the mean(data) function to calculate the average number of soldiers stationed in a Shu city
    > #on average, a Shu city has this many soldiers:
    > mean(soldiersByCity$Soldiers)
    
  6. R will display the results of the calculation. Note that functions can also be applied to row, column, or cell data, or to entire datasets, as long as the function accepts that kind of data.
  7. Moreover, calculation results can be saved into new variables for use at a later time. This time, save the calculation from step 5 into a new variable named meanSoldiersByCity:
    > #save the mean number of soldiers per city into a new variable named meanSoldiersByCity
    > meanSoldiersByCity <- mean(soldiersByCity$Soldiers)
    
  8. R will not return any output. Verify the contents of meanSoldiersByCity by entering it into the R console:
    > #display the contents of meanSoldiersByCity
    > meanSoldiersByCity
    
  9. R will display the contents of the meanSoldiersByCity variable.
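The steps above can be reproduced end to end in a single sketch. Because the chapter's actual hanzhongResources and soldiersByCity data are not shown in this section, the sketch builds hypothetical stand-ins: the Weapons and Horses columns and every number are made-up assumptions for illustration, not the chapter's data.

```r
# Hypothetical stand-ins for the chapter's variables (made-up values)
hanzhongResources <- data.frame(Provisions = 1000, Weapons = 400, Horses = 200)
soldiersByCity <- data.frame(
  Kingdom = c("Shu", "Shu"),
  City = c("Guanghan", "Baxi"),
  Soldiers = c(12000, 8000),
  stringsAsFactors = FALSE
)

# Steps 1-2: keep 25% of every resource after the hypothetical flood
afterFlood <- hanzhongResources * 0.25
print(afterFlood)

# Steps 3-4: apply the same calculation to the Provisions column only
provisionsAfterFlood <- hanzhongResources$Provisions * 0.25
print(provisionsAfterFlood)

# Steps 5-9: pass a column to mean(), save the result, then display it
meanSoldiersByCity <- mean(soldiersByCity$Soldiers)
print(meanSoldiersByCity)
```

Note that the flood calculations in steps 1 and 3 only display their results; the sketch assigns them to new variables so they can be inspected afterward.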

What just happened?

In just a few lines of code, you have experienced the range of variable manipulations that you will use on a regular basis in R. Let us explore each one individually.

Performing a calculation on an entire dataset

When you used your hanzhongResources variable to calculate the consequences of a flood across each resource, you saw that a calculation applied to a variable is applied to all of its underlying data.

For demonstration, consider the following table with the cell values of 1, 2, 3, and 4 in columns a, b, c, and d respectively:

a   b   c   d
1   2   3   4

Suppose that this table is saved in an R variable named lettersAndNumbers. We could add one to the lettersAndNumbers variable with the following command:

> lettersAndNumbers + 1

The resulting table would contain each cell's original value plus one, as follows:

a   b   c   d
2   3   4   5
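The lettersAndNumbers example can be tried directly. This sketch assumes the table is stored as a one-row data frame; arithmetic on a data frame is applied column by column, so every cell gains one. It also shows that the command returns a new table and leaves the original variable unchanged unless the result is assigned.

```r
# Build the one-row table with columns a, b, c, and d
lettersAndNumbers <- data.frame(a = 1, b = 2, c = 3, d = 4)

# Adding one to the variable adds one to every cell
lettersAndNumbersPlusOne <- lettersAndNumbers + 1
print(lettersAndNumbersPlusOne)
#   a b c d
# 1 2 3 4 5

# The original variable still holds 1, 2, 3, and 4
print(lettersAndNumbers)
```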

As you can see, R attempts to apply a calculation made on a dataset to each of its values. However, R will not always be able to complete the calculation successfully for every cell in a dataset.

For instance, if we tried to make a numeric calculation on the Kingdom and City columns of our soldiersByCity variable, R would return a warning along with NA (not available) values. This is because our Kingdom and City columns contain text, and it does not make sense to manipulate text numerically. To see this warning in action, enter the following lines into the R console:

> #what happens if we try to make a numeric calculation on nonnumeric data?
> #we receive a warning, because it does not make sense to manipulate text mathematically
> soldiersByCity * 5

R will display the result of the calculation along with a warning message.

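Here, the Soldiers column contains numeric values, and therefore each value within it is successfully multiplied by five. However, the text in the Kingdom and City columns cannot be multiplied, so a warning message is returned. This warning-and-NA behavior applies when the text columns are stored as factors, which was the default for data frames created with data.frame() or imported with read.csv() before R 4.0.0. If the columns are stored as character vectors instead, R stops with the error "non-numeric argument to binary operator". To avoid deriving meaningless values and upsetting the R console, it is important to be aware of your data and apply appropriate calculations to them.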
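One way to follow this advice is to select only the numeric columns before calculating. The sketch below uses a small hypothetical soldiersByCity stand-in (made-up numbers, with text stored as character vectors) and base R's sapply() with is.numeric() to find those columns. It wraps the whole-dataset multiplication in tryCatch() so the error raised by the character columns is captured instead of stopping the script.

```r
# Hypothetical stand-in data (made-up values, not the chapter's data)
soldiersByCity <- data.frame(
  Kingdom = c("Shu", "Shu"),
  City = c("Guanghan", "Baxi"),
  Soldiers = c(12000, 8000),
  stringsAsFactors = FALSE
)

# Multiplying the whole data frame fails on the character columns
result <- tryCatch(
  soldiersByCity * 5,
  error = function(e) conditionMessage(e)
)
print(result)

# Find the numeric columns, then multiply only those
numericColumns <- sapply(soldiersByCity, is.numeric)
soldiersTimesFive <- soldiersByCity[, numericColumns, drop = FALSE] * 5
print(soldiersTimesFive)
```

Using drop = FALSE keeps the selection as a data frame even when only one column is numeric, so the result still prints as a table.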

Performing a calculation on a row, column, or cell

Manipulating row, column, or cell data is identical to manipulating an entire dataset contained within a variable. The difference is not in the calculation, but rather in what you choose to perform the calculation on. Depending on whether you aim to manipulate row, column, or cell data, you will need to access the values in the appropriate manner. See the Accessing data within variables section of this chapter for a review of these methods.
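To illustrate, the following sketch applies the same multiplication to a row, a column, and a single cell. The table exampleTable and its values are hypothetical, chosen only to make the results easy to check.

```r
# A small hypothetical table with two rows and two columns
exampleTable <- data.frame(a = c(1, 5), b = c(2, 6))

# Row: select row 1 (all columns), then multiply
rowResult <- exampleTable[1, ] * 10

# Column: select column b by name, then multiply
columnResult <- exampleTable$b * 10

# Cell: select the value in row 2, column a, then multiply
cellResult <- exampleTable[2, "a"] * 10

print(rowResult)
print(columnResult)
print(cellResult)
```

The multiplication is written the same way in all three cases; only the indexing in front of it changes.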

Using variable data in function arguments

A variable's data, be it from the entire set or a specific subset (row, column, or cell), can be used in function arguments. Our preceding activity used the mean(data) function to calculate the average number of soldiers among the Shu cities listed in our soldiersByCity variable. Other subsets can be passed in the same way, but the function must accept that kind of data. For example, mean() expects a numeric vector: an individual numeric cell works, while the entire soldiersByCity dataset or one of its rows (each of which is a data frame) returns NA with a warning. The best method for using variable data in arguments will depend on the goal of the manipulation and the specific function being employed.
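The sketch below, again using a hypothetical soldiersByCity stand-in with made-up numbers, passes a column and a single cell to mean(). It also shows the caveat: given an entire data frame, mean() warns and returns NA. The warning is caught with tryCatch() so it can be reported, and base R's colMeans() is shown as one alternative for averaging numeric columns.

```r
# Hypothetical stand-in data (made-up values, not the chapter's data)
soldiersByCity <- data.frame(
  Kingdom = c("Shu", "Shu"),
  City = c("Guanghan", "Baxi"),
  Soldiers = c(12000, 8000),
  stringsAsFactors = FALSE
)

# Column: a numeric vector, so mean() returns its average
columnMean <- mean(soldiersByCity$Soldiers)

# Cell: a single number, so its mean is the value itself
cellMean <- mean(soldiersByCity[2, "Soldiers"])

# Entire dataset: a data frame is not numeric, so mean() warns and returns NA
wholeMean <- tryCatch(
  mean(soldiersByCity),
  warning = function(w) {
    message("mean() warned: ", conditionMessage(w))
    NA_real_
  }
)

# colMeans() averages each column of an all-numeric data frame instead
numericMeans <- colMeans(soldiersByCity[, "Soldiers", drop = FALSE])

print(c(columnMean = columnMean, cellMean = cellMean, wholeMean = wholeMean))
print(numericMeans)
```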

Saving a variable calculation into a new variable

Do not forget that a variable's purpose is to store and organize your information. Quite often, we will need to store the results of a calculation or function into a new variable for subsequent manipulation. The body of variables and other objects that we amass throughout our work is stored in the R workspace, which is the topic of our next section.
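As a brief sketch of this pattern, the code below saves a calculation into a new variable and then reuses that variable as a function argument. The stand-in data and the reinforcement of 1000 soldiers per city are hypothetical. The base R function ls() lists the names of the objects currently stored in the workspace.

```r
# Hypothetical stand-in data (made-up values, not the chapter's data)
soldiersByCity <- data.frame(
  City = c("Guanghan", "Baxi"),
  Soldiers = c(12000, 8000),
  stringsAsFactors = FALSE
)

# Save the result of a calculation into a new variable
soldiersAfterReinforcement <- soldiersByCity$Soldiers + 1000

# Reuse the saved variable as a function argument later on
meanAfterReinforcement <- mean(soldiersAfterReinforcement)
print(meanAfterReinforcement)

# List the names of the objects stored in the current workspace
print(ls())
```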

Pop quiz

The table myTable contains two rows, three columns, and six cells with the numbers one through six. Use this table to answer questions 1 and 2.

myTable

1   2   3
4   5   6

  1. Consider the following line of code:
    > myTable * 10
    

    If this code were applied to myTable, what would be the result? Write the appropriate values in the blank cells of myTableAfterManipulation1:

    myTableAfterManipulation1

    ___   ___   ___
    ___   ___   ___
  2. Consider the following line of code:
    > myTable[1,2] + 10
    

    If this code were applied to myTable, what would be the result? Write the appropriate values in the blank cells of myTableAfterManipulation2:

    myTableAfterManipulation2

    ___   ___   ___
    ___   ___   ___
  3. Interpret the following R console line in words:
    > myVariable <- mean(myData$myColumn)
    

    a. Calculate the mean of myColumn and then set myVariable equal to the result.

    b. Calculate the mean of myData and then set myVariable equal to the result.

    c. In myData, select myColumn, calculate its mean, and then set myVariable equal to the result.

    d. Set myVariable equal to the contents of myData and then calculate its mean.

Have a go hero

To practice the variety of methods that we have covered for manipulating variables, use your resource data and knowledge of R to complete the following tasks:

  1. Suppose you are concerned with the potential of flooding to damage your resources. Calculate the amount of resources that would remain if a flood destroyed half of each resource stored in your hanzhongResources variable. Save the results into a single variable named hanzhongResourcesAfterFlood.
  2. To account for a recent relocation of 5000 soldiers from Guanghan to Baxi, subtract 5000 from the cell representing the number of Guanghan soldiers and add 5000 to the cell representing the number of Baxi soldiers in the soldiersByCity variable. Save each of these calculations into a new variable. The variables should be named guanghanSoldiersAfterRelocation and baxiSoldiersAfterRelocation respectively.
  3. Use the min(data) and max(data) functions and your soldiersByCity variable to calculate the minimum and maximum number of soldiers stationed in a single city across both armies. Save the results as variables named minSoldiersByCity and maxSoldiersByCity respectively.
  4. Use the sum(data) function and your soldiersByCity variable to calculate the total number of soldiers in the Shu and Wei armies. Then, save the result as a variable named totalSoldiers.

If you encounter a warning or error during any of these tasks, think about how you can be more specific about which data you want to apply your calculation or function to. For detailed information on handling these occurrences, refer back to the Performing a calculation on an entire dataset section of this chapter.
