Time for action - creating a box plot

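A box plot is a useful way to convey a collection of summary statistics for a dataset. In a single diagram, it depicts the dataset's lower quartile, median, and upper quartile as a box, with whiskers that extend toward its minimum and maximum. By default, R ends each whisker at the most extreme value lying within 1.5 times the box's length (the interquartile range) and draws any values beyond that as individual outlier points. Let us look at how box plots are created in R: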

  1. Use the boxplot(...) function to create a box plot.
    > #create a box plot that depicts the number of soldiers required to launch a fire attack
    > #get the data to be used in the plot
    > boxplotFireShuSoldiersData <- subsetFire$ShuSoldiers
    > #customize the plot
    > boxPlotFireShuSoldiersLabelMain <- "Number of Soldiers Required to Launch a Fire Attack"
    > boxPlotFireShuSoldiersLabelX <- "Fire Attack Method"
    > boxPlotFireShuSoldiersLabelY <- "Number of Soldiers"
    > #use boxplot(...) to create and display the box plot
    > boxplot(x = boxplotFireShuSoldiersData,
    main = boxPlotFireShuSoldiersLabelMain,
    xlab = boxPlotFireShuSoldiersLabelX,
    ylab = boxPlotFireShuSoldiersLabelY)
    
  2. Your plot will be displayed in the graphics window, as shown in the following figure:
    [Figure: box plot titled "Number of Soldiers Required to Launch a Fire Attack"]
  3. Use the boxplot(...) function to create a box plot that compares several groups of data side by side.
    > #create a box plot that compares the number of soldiers required across the battle methods
    > #get the data formula to be used in the plot
    > boxplotAllMethodsShuSoldiersData <- battleHistory$ShuSoldiers ~ battleHistory$Method
    > #customize the plot
    > boxPlotAllMethodsShuSoldiersLabelMain <- "Number of Soldiers Required by Battle Method"
    > boxPlotAllMethodsShuSoldiersLabelX <- "Battle Method"
    > boxPlotAllMethodsShuSoldiersLabelY <- "Number of Soldiers"
    > #use boxplot(...) to create and display the box plot
    > boxplot(formula = boxplotAllMethodsShuSoldiersData,
    main = boxPlotAllMethodsShuSoldiersLabelMain,
    xlab = boxPlotAllMethodsShuSoldiersLabelX,
    ylab = boxPlotAllMethodsShuSoldiersLabelY)
    
  4. Your plot will be displayed in the graphics window, as shown in the following figure:
    [Figure: box plot titled "Number of Soldiers Required by Battle Method", with one box per battle method]
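As a quick consistency check between the two plots, the fire box from the multiple box plot should summarize the same numbers as the single box plot, provided subsetFire holds exactly the fire-method rows of battleHistory (an assumption; this section does not show how subsetFire was built). The sketch below tests that with an invented data frame, demoBattles, and uses subset() as a hypothetical way to build the fire subset. Both calls pass boxplot's plot = FALSE argument, which returns the computed statistics instead of drawing.

```r
# Hypothetical stand-in for battleHistory; the values are invented
demoBattles <- data.frame(
  Method = c("fire", "ambush", "fire", "surround", "fire", "ambush"),
  ShuSoldiers = c(60, 120, 80, 400, 70, 150)
)

# Assumed equivalent of subsetFire: only the rows that used the fire method
demoFire <- subset(demoBattles, Method == "fire")

# Statistics for the single box (step 1) and for the grouped boxes (step 3)
singleFire <- boxplot(x = demoFire$ShuSoldiers, plot = FALSE)
allMethods <- boxplot(formula = demoBattles$ShuSoldiers ~ demoBattles$Method,
                      plot = FALSE)

# Locate the "fire" box among the grouped boxes and compare its statistics
fireColumn <- which(allMethods$names == "fire")
print(all.equal(singleFire$stats[, 1], allMethods$stats[, fireColumn]))  # TRUE
```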

What just happened?

We just created two box plots using R's boxplot(...) function, one with a single box and one with multiple boxes.

boxplot(...)

We started by generating a single box plot composed of a dataset, a main title, and x and y axis labels. The basic format for a single box plot is as follows:

boxplot(x = dataset)

The x argument contains the data to be plotted. Technically, only x is required to create a box plot, although you will often include additional arguments. Our boxplot(...) call used the main, xlab, and ylab arguments to display a title and axis labels on the plot, as shown:

> boxplot(x = boxplotFireShuSoldiersData,
main = boxPlotFireShuSoldiersLabelMain,
xlab = boxPlotFireShuSoldiersLabelX,
ylab = boxPlotFireShuSoldiersLabelY)
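To see exactly which statistics a single box encodes, the following sketch asks boxplot(...) for its numbers instead of a drawing. It relies on boxplot's plot = FALSE argument; the soldiers vector is invented for illustration and is not the book's subsetFire data.

```r
# Hypothetical soldier counts (invented, not the book's data); 200 is
# deliberately far from the rest so that it shows up as an outlier
soldiers <- c(10, 20, 25, 30, 40, 45, 50, 60, 200)

# plot = FALSE makes boxplot() return its statistics without drawing anything
singleBox <- boxplot(x = soldiers, plot = FALSE)

# Rows of singleBox$stats: lower whisker, lower hinge, median, upper hinge,
# upper whisker; here they are 10, 25, 40, 50, 60
print(singleBox$stats)

# Values more than 1.5 times the box length beyond the box; here only 200
print(singleBox$out)
```

Note that the upper whisker stops at 60 rather than at the maximum of 200, because 200 lies beyond the default whisker range and is drawn as a separate point instead.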

Next, we created a multiple box plot that compared the number of Shu soldiers deployed by each battle method. The main, xlab, and ylab arguments carried over from our single box plot; however, our multiple box plot used the formula argument instead of x. The formula breaks a dataset down into separate groups, and boxplot(...) draws one box per group.

The basic format for a multiple box plot is as follows:

boxplot(formula = dataset ~ group)
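As a sketch of how the formula splits one dataset into groups, the example below builds two invented vectors named after the placeholders in the format above (dataset and group are hypothetical) and uses plot = FALSE to inspect the result rather than draw it.

```r
# Hypothetical data: 'dataset' holds the values, 'group' says which box
# each value belongs to
dataset <- c(100, 150, 50, 80, 60, 300, 250)
group <- c("a", "a", "b", "b", "b", "c", "c")

# The formula dataset ~ group splits dataset by the values of group
groupedBoxes <- boxplot(formula = dataset ~ group, plot = FALSE)

print(groupedBoxes$names)       # one box per distinct group: "a" "b" "c"
print(groupedBoxes$n)           # values in each box: 2 3 2
print(groupedBoxes$stats[3, ])  # median of each box: 125 60 275
```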

In our case, we took our entire Shu soldier dataset (battleHistory$ShuSoldiers) and separated it by battle method (battleHistory$Method):

> boxplotAllMethodsShuSoldiersData <- battleHistory$ShuSoldiers ~
battleHistory$Method

Once passed to the boxplot(...) function, this formula produced a plot containing four distinct boxes, one for each battle method (ambush, fire, head to head, and surround):

> boxplot(formula = boxplotAllMethodsShuSoldiersData,
main = boxPlotAllMethodsShuSoldiersLabelMain,
xlab = boxPlotAllMethodsShuSoldiersLabelX,
ylab = boxPlotAllMethodsShuSoldiersLabelY)
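The sketch below mirrors that workflow with a small invented data frame, demoBattles (hypothetical; only its column names follow battleHistory). It shows that the tilde simply builds a formula object that can be stored in a variable, and that boxplot(...) draws one box per distinct method, in sorted order.

```r
# Hypothetical stand-in for battleHistory; the values are invented
demoBattles <- data.frame(
  Method = c("surround", "fire", "ambush", "head to head",
             "fire", "ambush", "surround", "head to head"),
  ShuSoldiers = c(400, 60, 120, 250, 80, 150, 350, 300)
)

# The tilde only builds a formula object; nothing is split or plotted yet
soldiersByMethod <- demoBattles$ShuSoldiers ~ demoBattles$Method
print(class(soldiersByMethod))  # "formula"

# boxplot() evaluates the stored formula when it is called
methodBoxes <- boxplot(formula = soldiersByMethod, plot = FALSE)

# One box per distinct method, ordered alphabetically
print(methodBoxes$names)        # "ambush" "fire" "head to head" "surround"
print(ncol(methodBoxes$stats))  # 4
```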

Pop quiz

  1. Which of the following best describes the result of the following code?
    > boxplot(x = a)
    

    a. A single box plot of the a dataset.

    b. A single box plot of the x dataset.

    c. A multiple box plot of the a dataset that is grouped by x.

    d. A multiple box plot of the x dataset that is grouped by a.

  2. Which of the following best describes the result of the following code?
    > boxplot(formula = a ~ b)
    

    a. A single box plot of the a dataset.

    b. A single box plot of the b dataset.

    c. A multiple box plot of the a dataset that is grouped by b.

    d. A multiple box plot of the b dataset that is grouped by a.
