Time for action - creating a scatterplot

A scatterplot is a fundamental statistical graphic that helps us understand the relationships between the variables in a dataset. Like descriptive statistics and correlations, scatterplots are especially useful as a precursor to more extensive data analyses, such as linear regression modeling. We can use R to generate a scatterplot that depicts a single relationship between two variables, or a set of scatterplots that depicts the relationships between all of the variables in a dataset. We will practice both of these methods:

  1. Use the plot(...) function to create a scatterplot depicting a single relationship between two variables:
    > #create a scatterplot that depicts the relationship between
    > #the number of Shu and Wei soldiers engaged in past fire attacks
    > #get the data to be used in the plot
    > scatterplotFireWeiSoldiersData <- subsetFire$WeiSoldiers
    > scatterplotFireShuSoldiersData <- subsetFire$ShuSoldiers
    > #customize the plot
    > scatterplotFireSoldiersLabelMain <-
    "Soldiers Engaged in Past Fire Attacks"
    > scatterplotFireSoldiersLabelX <- "Wei"
    > scatterplotFireSoldiersLabelY <- "Shu"
    > #use plot(...) to create and display the scatterplot
    > plot(x = scatterplotFireWeiSoldiersData,
    y = scatterplotFireShuSoldiersData,
    main = scatterplotFireSoldiersLabelMain,
    xlab = scatterplotFireSoldiersLabelX,
    ylab = scatterplotFireSoldiersLabelY)
    
  2. Your plot will be displayed in the graphic window, as shown in the following:
    [Figure: scatterplot titled "Soldiers Engaged in Past Fire Attacks", with Wei soldiers on the x axis and Shu soldiers on the y axis]
  3. Use the plot(...) function to simultaneously depict the relationships between all of the variables in the dataset:
    > #create a scatterplot that depicts the relationships between
    > #all of the variables in our fire attack dataset
    > plot(x = subsetFire)
    
  4. A grouping of several plots will be displayed in the graphic window:
    [Figure: grid of scatterplots, one for each pair of variables in subsetFire]

What just happened?

We created two scatterplots using R's plot(...) function, one portraying a single relationship and one displaying all of the relationships in our dataset.

Single scatterplot

To plot a single relationship between two variables, use R's plot(...) function. The primary arguments for plot(...) are:

  • x: the variable to be plotted on the x axis
  • y: the variable to be plotted on the y axis

Thus, the simplest form of plot(...) contains only the x and y arguments:

plot(x = xVariable, y = yVariable)
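To try the simplest form on its own, the following minimal sketch can be used. The values assigned to xVariable and yVariable are invented purely for illustration and are not part of our fire attack data.

```r
# Hypothetical example data; these values are made up for illustration
xVariable <- c(1, 2, 3, 4, 5)
yVariable <- c(2, 4, 5, 4, 6)

# Simplest form: only the x and y arguments are supplied
plot(x = xVariable, y = yVariable)
```

Both vectors must have the same length, because each point on the plot pairs one x value with one y value.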

We used the plot(...) function to visualize the relationship between the number of Shu and Wei soldiers involved in past fire attacks. To add relevant text to our graphic, we included the main, xlab, and ylab arguments:

> plot(scatterplotFireWeiSoldiersData,
scatterplotFireShuSoldiersData,
main = scatterplotFireSoldiersLabelMain,
xlab = scatterplotFireSoldiersLabelX,
ylab = scatterplotFireSoldiersLabelY)
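If you do not have subsetFire loaded, the same call can be tried with a small stand-in dataset. In this sketch, exampleFire and all of its values are hypothetical; only the column names and the label text are taken from the example above.

```r
# Hypothetical stand-in for subsetFire; the values are invented for illustration
exampleFire <- data.frame(
  WeiSoldiers = c(10000, 25000, 40000, 15000, 30000),
  ShuSoldiers = c(8000, 20000, 35000, 12000, 28000)
)

# main sets the plot title; xlab and ylab set the axis labels
plot(x = exampleFire$WeiSoldiers,
     y = exampleFire$ShuSoldiers,
     main = "Soldiers Engaged in Past Fire Attacks",
     xlab = "Wei",
     ylab = "Shu")
```

Without xlab and ylab, R would label the axes with the expressions passed as x and y, which is why short custom labels usually make the graphic easier to read.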

Multiple scatterplots

We also used the plot(...) function to simultaneously explore all of the relationships within our dataset. This yielded a graphic that contained a scatterplot for every variable pair. The format for creating this type of scatterplot is:

plot(x = dataset)

Here, dataset is a set of data containing multiple variables. In our case, we passed our fire attack data as the dataset:

> plot(x = subsetFire)

The resulting plot allowed us to visualize all of the relationships between our variables in a single graphic.
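The sketch below shows the same technique on a small stand-in dataset. The exampleFire data frame, its values, and the DurationInDays column are all hypothetical and exist only to give the plot three variables to pair up.

```r
# Hypothetical stand-in for subsetFire; values and the third column are invented
exampleFire <- data.frame(
  WeiSoldiers = c(10000, 25000, 40000, 15000, 30000),
  ShuSoldiers = c(8000, 20000, 35000, 12000, 28000),
  DurationInDays = c(3, 7, 12, 4, 9)
)

# Passing the whole data frame draws one scatterplot per ordered variable pair
plot(x = exampleFire)

# Each pair appears twice (once with the axes swapped), so k variables
# yield k * (k - 1) scatterplot panels around the diagonal of variable names
variableCount <- ncol(exampleFire)
panelCount <- variableCount * (variableCount - 1)
```

A grid like this appears when the data frame has three or more columns. In that case the result is the same kind of graphic that R's pairs(...) function produces.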

Pop quiz

  1. Assume that a and b are data variables. Which of the following best describes the graphic that would result from the following line of code?
    > plot(x = a, y = b)
    

    a. A scatterplot with a on the x axis and b on the y axis.

    b. A scatterplot with b on the x axis and a on the y axis.

    c. A scatterplot containing all of the relationships in the dataset.

    d. A scatterplot containing none of the relationships in the dataset.

  2. Assume that a is a dataset. Which of the following best describes the graphic that would result from the following line of code?
    > plot(x = a)
    

    a. A scatterplot with a on the x axis.

    b. A scatterplot with a on the y axis.

    c. A scatterplot containing all of the relationships in the dataset.

    d. A scatterplot containing none of the relationships in the dataset.
