Time for action - accessing data within variables

Both our hanzhongResources and soldiersByCity variables contain a complete set of values (as opposed to a single value). We already know that typing a variable's name into R will output all of its contents in the console. However, we often need to access the columns, rows, and cells within a dataset to perform calculations.

We will start by exploring two methods for accessing the columns in our soldiersByCity variable:

  1. First, we will access the Soldiers column from our soldiersByCity variable through R's variable$column notation:
    > #isolate a single column within a dataset using the variable$column notation
    > #display the contents of the Soldiers column from the soldiersByCity variable
    > soldiersByCity$Soldiers
    
  2. R will display the contents of the Soldiers column, and the result is shown in the following screenshot:
    [Screenshot: R console output of the Soldiers column]
  3. This time, let us use the attach(variable) function to simplify our operation.
    > #isolate a single column within a dataset using the attach(variable) function and simplified notation
    > #attach the soldiersByCity variable
    > attach(soldiersByCity)
    > #display the contents of the Soldiers column from the soldiersByCity variable
    > Soldiers
    
  4. R will display the contents of the Soldiers column:
    [Screenshot: R console output of the Soldiers column]

    Next, we will access a single row within the soldiersByCity variable:

  5. Use the variable[row, column] matrix notation to display the contents of the tenth row in our soldiersByCity variable:
    > #isolate a single row within a dataset using the variable[row, column] matrix notation
    > #display the contents of the tenth row in the soldiersByCity variable
    > soldiersByCity[10,]
    
  6. R will display the contents of the tenth row in our soldiersByCity dataset:
    [Screenshot: R console output of the tenth row of soldiersByCity]
  7. Similarly, we can use matrix notation to access a single cell within our dataset.

    Use matrix notation to display the contents of cell [5,3] in our soldiersByCity variable:

    > #isolate a single cell within a dataset using the variable[row, column] matrix notation
    > #display the contents of cell [5,3] in the soldiersByCity variable
    > soldiersByCity[5,3]
    
  8. R will display the contents of cell [5,3], as shown:
    [Screenshot: R console output of cell [5,3] of soldiersByCity]

What just happened?

You have just practiced accessing data within a variable from each possible angle, that is, by columns, rows, and individual cells. Let us take a closer look at how variable data is accessed in R.

variable$column notation

Individual columns within a dataset can be accessed via the variable$column notation. Think of the dollar sign ($) as the letter S, as in the word "select." In this way, the notation can be read in words. For example, the line > A$B can be read as "from variable A, select column B." During our activity, we selected the Soldiers column from the soldiersByCity variable by typing the following code in the R console:

> soldiersByCity$Soldiers
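
As a minimal sketch of the same notation, the following example builds a small stand-in dataset and selects one of its columns. The demoSoldiers variable and all of its values are invented for illustration; they are not the book's soldiersByCity data.

```r
# Hypothetical stand-in for soldiersByCity; the city names and
# numbers below are invented for illustration only.
demoSoldiers <- data.frame(
  City = c("CityA", "CityB", "CityC"),
  Soldiers = c(1000, 2500, 400),
  stringsAsFactors = FALSE
)

# "From variable demoSoldiers, select column Soldiers"
soldierCounts <- demoSoldiers$Soldiers
print(soldierCounts)

# The selected column is an ordinary vector, so it can be used
# directly in a calculation
totalSoldiers <- sum(demoSoldiers$Soldiers)
print(totalSoldiers)
```

Because the selected column behaves like any other vector, it can be passed straight to functions such as sum().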

attach(variable) function

The attach(variable) function is a convenient way to relieve ourselves of lengthy notation in some, but not all, cases. When a variable is attached in the R console, its columns can be referred to by name, without the need to identify the variable. For example, after we attached soldiersByCity, we could display the contents of the Soldiers column by simply typing > Soldiers in the console.

A caveat with the attach(variable) function is that attached variables conflict with one another when they share column names. For instance, if we were to attach both our hanzhongResources and soldiersByCity variables at the same time, we would run into a problem regarding the Soldiers column. Since both of these variables contain such a column, typing Soldiers refers only to the most recently attached version, and the earlier one is masked. Accessing the other would require the use of variable$column notation. In fact, R will notify you if you attach two variables that share a common column name. The following message appears when the soldiersByCity variable is attached, followed by hanzhongResources:

[Screenshot: R console message displayed after attaching soldiersByCity and then hanzhongResources]

On the other hand, attaching a variable can be useful and efficient when you are working with a single, large dataset. If you are only manipulating data from one variable, then you will not run into the conflict described above. Furthermore, you can always have one variable attached, even if you are working with datasets that have identical column names. Of course, if your variables do not have columns in common, then attaching them all is an option. In any case, you can always refer to columns using variable$column notation, which we will do throughout the remainder of this book.

Note that should you ever need to detach a variable, you can use the detach(variable) function. This will return the variable to its prior status in the console, as if it had never been attached in the first place.
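
The attach and detach behavior described in this section can be sketched with two small stand-in datasets that share a Soldiers column. The demoSoldiers and demoResources variables, their columns, and their values are assumptions made up for this example; they are not the book's datasets.

```r
# Two hypothetical data frames that share a column name (Soldiers);
# all names and values are invented for illustration.
demoSoldiers <- data.frame(City = c("CityA", "CityB"),
                           Soldiers = c(1000, 2500))
demoResources <- data.frame(Resource = c("Gold", "Rice"),
                            Soldiers = c(10, 20))

attach(demoSoldiers)
# Attaching a second data frame with a Soldiers column makes R
# report that the earlier Soldiers column is masked
attach(demoResources)

# The bare name now refers to the most recently attached data frame
maskedLookup <- Soldiers

# variable$column notation stays unambiguous
explicitLookup <- demoSoldiers$Soldiers

# Detach the most recent data frame; the bare name then refers
# to demoSoldiers again
detach(demoResources)
afterDetach <- Soldiers

# Detach the remaining data frame and confirm that it is no longer
# on R's search path
detach(demoSoldiers)
stillAttached <- "demoSoldiers" %in% search()
print(stillAttached)
```

In this sketch, the bare name Soldiers changes meaning depending on what is attached, whereas demoSoldiers$Soldiers always refers to the same column.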

variable[row, column] notation

When referring to row data or individual cells, the variable[row, column] notation should be used. For rows, such as when we accessed the tenth row in soldiersByCity via > soldiersByCity[10,], the column portion of the notation is omitted. This tells R to retrieve all of the columns in the row.

To isolate an individual cell, both a row and column value must be specified. When we accessed cell [5,3] from soldiersByCity via > soldiersByCity[5,3], the 5 represented the cell's row, whereas the 3 defined the cell's column. This is similar to selecting a single point from a graph using its x-y coordinates, except the graph in our case is a matrix of data values.
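
Row and cell access can be tried on a small stand-in dataset, as in the following sketch. The demoSoldiers variable, its Generals column, and all of its values are hypothetical and serve only to illustrate the notation.

```r
# Hypothetical three-column dataset; names and values are invented.
demoSoldiers <- data.frame(
  City = c("CityA", "CityB", "CityC"),
  Soldiers = c(1000, 2500, 400),
  Generals = c(2, 5, 1),
  stringsAsFactors = FALSE
)

# Row 2, with the column portion left blank: every column in that row
secondRow <- demoSoldiers[2, ]
print(secondRow)

# Row 3, column 2: a single cell
oneCell <- demoSoldiers[3, 2]
print(oneCell)
```

Here the row request returns a one-row dataset containing all three columns, while the cell request returns a single value.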

On a side note, you may have noticed that variable[row,column] notation can also be used to refer to columns. This can be accomplished by leaving the row portion of the notation blank. For example, to access the City column in soldiersByCity, we could use the code soldiersByCity[,1]. This tells R to retrieve every row within the City column.
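
To check that the two column notations agree, the sketch below compares them on a small stand-in dataset. The demoSoldiers variable and its values are invented for this example.

```r
# Hypothetical dataset whose first column is City; values are invented.
demoSoldiers <- data.frame(
  City = c("CityA", "CityB", "CityC"),
  Soldiers = c(1000, 2500, 400),
  stringsAsFactors = FALSE
)

# Leave the row portion blank to retrieve every row of column 1
firstColumn <- demoSoldiers[, 1]
print(firstColumn)

# Compare the result with variable$column notation
sameAsDollar <- identical(firstColumn, demoSoldiers$City)
print(sameAsDollar)
```

Selecting by position depends on the column order, so the name-based variable$column notation is usually easier to read.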

Pop quiz

  1. Interpret the following R console line in words:
    > myVariable$myColumn
    

    a. Multiply the data within myVariable by the data within myColumn.

    b. Divide the data within myVariable by the data within myColumn.

    c. In variable myColumn, select column myVariable.

    d. In variable myVariable, select column myColumn.

  2. Under which of the following circumstances is it best not to attach dataset variables in the R console?

    a. You are working with a single dataset.

    b. You are working with multiple datasets that contain identical column names.

    c. You are working with multiple datasets that contain identical column names, but want to attach only one of them.

    d. You are working with multiple datasets that do not contain identical column names.

  3. The variable[row,column] notation can be used to access data from which of the following locations?

    a. Rows.

    b. Columns.

    c. Cells.

    d. All of the above.
