9 Data Visualization

Data visualization is an important part of the data science process. In this chapter we learn how to create common graphs in both SAS and R.

9.1 Importance of Data Visualization

The Anscombe dataset illustrates the importance of data visualization. It contains four sets of x and y values whose summary statistics are nearly identical, yet the sets look very different when plotted. All four sets share the following properties:

Property                                                 Value
Mean of x                                                9
Sample variance of x                                     11
Mean of y                                                7.50
Sample variance of y                                     4.125
Correlation between x and y                              0.816
Linear regression line                                   y = 3.00 + 0.500x
Coefficient of determination of the linear regression    0.67

But the graphs are quite different.

Anscombe dataset in R consists of 4 graphs for y1 vs x1 (top left), y2 vs. x2 (top right), y3 vs. x3 (bottom left), and y4 vs. x4 (bottom right) displaying circle markers with various patterns.

Figure 9.1 Anscombe Dataset in R.
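
The summary properties listed above can be checked directly in R. The following is a minimal sketch, assuming the built-in anscombe data frame from the datasets package (columns x1 to x4 and y1 to y4). The variable name stats and the 2 x 2 panel layout are illustrative choices, not taken from the text.

```r
# Built-in data frame with columns x1..x4 and y1..y4
data(anscombe)

# Compute the same summary properties for each of the four x/y pairs
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x    = mean(x),
    var_x     = var(x),
    mean_y    = mean(y),
    var_y     = var(y),
    cor_xy    = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
})
colnames(stats) <- paste0("set", 1:4)
print(round(stats, 3))

# Plot the four pairs side by side with their fitted regression lines
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, xlab = paste0("x", i), ylab = paste0("y", i),
       xlim = c(3, 19), ylim = c(3, 13))
  abline(lm(y ~ x))
}
par(op)
```

The printed columns agree to about two decimal places, while the four panels show clearly different patterns.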

This chapter covers the following graphs in both SAS and R:

  • Bar Plot: A bar chart represents data as vertical bars, with the height of each bar proportional to the value of the variable.
  • Bar‐Line Plot: A combination of Bar Plots with Line Graphs, with one quantity being represented in a Bar Plot and the other in a Line Graph.
  • Box Plot: A plot in which a rectangle spans the first to third quartiles (the interquartile range), usually with a line inside to indicate the median value.
  • Bubble Plot: A bubble chart is a type of chart that displays three dimensions of data. Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that expresses two of the vi values through the disk's xy location and the third through its size.
  • Heat Map: A plot in which data values are represented as colors.
  • Histogram: A plot of the frequencies of a numeric variable grouped into bins (breaks).
  • Line Chart: A graph that connects a series of points by drawing lines between them.
  • Mosaic Plot: A graphical display of the cell frequencies of a contingency table in which the areas of the boxes are proportional to the cell frequencies.
  • Pie Chart: A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate numerical proportion.
  • Scatter Plot: A graph in which the values of two variables are plotted along two axes, the pattern of the resulting points revealing any correlation present.

9.2 Data Visualization in SAS

The Tasks and Utilities section in SAS Studio makes it easy to create graphs and also generates the corresponding code. Because of print constraints, only partial output is shown here.

Snipped image displaying data visualization options in SAS. Options such as bar chart, bar–line chart, box plot, bubble plot, heat map, histogram, line chart, mosaic plot, pie chart, etc. are listed under graph.

Figure 9.2 Data Visualization Options in SAS.

Bar Plot:

Bar chart in SAS displaying 3 clusters of bars for three Iris species: setosa (left), versicolor (middle), and virginica (right). Legend box at the bottom displays the shades for various sepal lengths.

Figure 9.3 Bar Plot in SAS.

Bar‐Line Plot:

Bar–line chart in SAS displaying three bars arranged in ascending order representing the sepal lengths for setosa (left), versicolor (middle), and virginica (right). An ascending line represents the petal length.

Figure 9.4 Bar‐Line Plot in SAS.

Box Plot:

Box plot in SAS illustrating the sepal length for setosa (left), versicolor (middle), and virginica (right).

Figure 9.5 Box Plot in SAS.

Bubble Plot:

Bubble plot in SAS illustrating the sepal width for setosa, versicolor, and virginica depicted by three clusters of dark-gray, light-gray, and gray circles, respectively.

Figure 9.6 Bubble Plot in SAS.

Heat Map:

Heat map in SAS in which the left vertical axis indicates petal length, the right vertical axis indicates petal width, and the horizontal axis indicates sepal length.

Figure 9.7 Heat Map in SAS.

Histogram:

Histogram in SAS displaying 8 adjacent vertical bars between sepal length of 40 and 80 mm. The histogram also displays a superimposed normal curve (bell-shaped).

Figure 9.8 Histogram in SAS.

Line Chart:

Line chart in SAS displaying a fluctuating wave form representing frequency.

Figure 9.9 Line Plot in SAS.

Mosaic Plot:

Mosaic plot in SAS illustrating the distribution of cylinders by type. Columns of stacked bars are displayed for hybrid, SUV, sedan, sports, truck, and wagon (left–right).

Figure 9.10 Mosaic Plot in SAS.

Pie Chart:

Pie chart in SAS displaying a circle divided into 4 unequal segments. Each segment has corresponding numbers indicated.

Figure 9.11 Pie Plot in SAS.

Scatter Plot:

Scatter plot in SAS displaying horizontally aligned circle markers along the horizontal lines for hybrid, truck, wagon, sports, sedan, and SUV.

Figure 9.12 Scatter Plot in SAS.

9.3 Data Visualization in R

Bar Plot:
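
A minimal sketch of a bar plot in base R, assuming the built-in iris data set and the mean sepal length per species as the plotted value. Both choices are illustrative and do not come from the text.

```r
data(iris)

# Mean sepal length (cm) for each species
mean_sepal <- tapply(iris$Sepal.Length, iris$Species, mean)

# barplot() draws one bar per species and returns the bar midpoints
mids <- barplot(mean_sepal,
                xlab = "Species",
                ylab = "Mean sepal length (cm)",
                main = "Bar plot of mean sepal length")
```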

Bar‐Line Plot:
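
One way to combine bars and a line in base R is to draw the bars first and then overlay the line at the bar midpoints. This is a sketch under assumptions: the built-in iris data, mean sepal length as the bar quantity, and mean petal length as the line quantity are illustrative choices.

```r
data(iris)

# Two quantities per species: one for the bars, one for the line
sepal <- tapply(iris$Sepal.Length, iris$Species, mean)
petal <- tapply(iris$Petal.Length, iris$Species, mean)

# Draw the bars and keep their midpoints so the line can be aligned
mids <- barplot(sepal, ylim = c(0, 8),
                xlab = "Species", ylab = "Mean length (cm)",
                main = "Bar-line plot")

# Overlay the second quantity as a line with point markers
lines(mids, petal, type = "b", pch = 19)
legend("topleft",
       legend = c("Sepal length (bars)", "Petal length (line)"),
       pch = c(15, 19), lty = c(NA, 1))
```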

Box Plot:
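
A minimal sketch of a box plot using the base R boxplot() function with its formula interface. The built-in iris data and the choice of sepal length by species are assumptions made for illustration.

```r
data(iris)

# One box per species; boxplot() invisibly returns the plotted statistics
bp <- boxplot(Sepal.Length ~ Species, data = iris,
              xlab = "Species", ylab = "Sepal length (cm)",
              main = "Box plot of sepal length by species")

# Row 3 of bp$stats holds the median of each group
print(bp$stats[3, ])
```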

Bubble Plot:
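
Base R can draw a bubble plot with symbols(). The sketch below assumes the built-in iris data, with sepal length and petal length as the x and y positions and sepal width as the bubble size. The gray shades and variable names are illustrative.

```r
data(iris)

# Scale the radius so that bubble AREA is proportional to sepal width
radius <- sqrt(iris$Sepal.Width / pi)

# One gray shade per species
shades <- c("gray20", "gray85", "gray55")
fill <- shades[as.integer(iris$Species)]

symbols(iris$Sepal.Length, iris$Petal.Length,
        circles = radius, inches = 0.12,
        bg = fill, fg = "black",
        xlab = "Sepal length (cm)", ylab = "Petal length (cm)",
        main = "Bubble plot (size = sepal width)")
legend("topleft", legend = levels(iris$Species),
       pch = 21, pt.bg = shades)
```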

Heat Map:
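
A minimal heat map sketch using the base R heatmap() function. It assumes the built-in iris data and plots the correlation matrix of the four measurements, which is an illustrative choice and differs from the binned display in the SAS figure.

```r
data(iris)

# Correlation matrix of the four numeric measurements
m <- cor(iris[, 1:4])

# Each cell is colored by its correlation value
heatmap(m, symm = TRUE, scale = "none",
        col = gray.colors(12),
        main = "Heat map of iris correlations")
```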

Histogram:
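
A minimal histogram sketch in base R with a superimposed normal curve, similar in spirit to the SAS figure. The built-in iris data (measured in cm in R) and the suggested number of breaks are assumptions.

```r
data(iris)
x <- iris$Sepal.Length

# freq = FALSE plots densities so the normal curve is on the same scale
h <- hist(x, breaks = 8, freq = FALSE,
          xlab = "Sepal length (cm)",
          main = "Histogram of sepal length")

# Overlay a normal density with the sample mean and standard deviation
curve(dnorm(x, mean = mean(iris$Sepal.Length), sd = sd(iris$Sepal.Length)),
      add = TRUE)
```

Note that breaks = 8 is only a suggestion to hist(), which may choose a nearby number of bins to get round break points.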

Line Chart:
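
A minimal line chart sketch in base R using plot() with type = "l". It assumes the built-in iris data and plots how often each sepal length value occurs, an illustrative choice.

```r
data(iris)

# Frequency of each distinct sepal length value
tab <- table(iris$Sepal.Length)
len <- as.numeric(names(tab))
freq <- as.vector(tab)

# type = "l" connects consecutive points with line segments
plot(len, freq, type = "l",
     xlab = "Sepal length (cm)", ylab = "Frequency",
     main = "Line chart of sepal length frequency")
```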

Mosaic Plot:
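
A minimal mosaic plot sketch using the base R mosaicplot() function on a contingency table. The SAS figure uses a cars data set; as an assumption, this sketch substitutes the built-in mtcars data and tabulates cylinders against gears.

```r
data(mtcars)

# Contingency table of cylinders by number of gears
tab <- table(Cylinders = mtcars$cyl, Gears = mtcars$gear)
print(tab)

# Box areas are proportional to the cell frequencies
mosaicplot(tab, color = TRUE,
           main = "Mosaic plot of cylinders by gears")
```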

Pie Chart:
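
A minimal pie chart sketch using the base R pie() function. The built-in mtcars data and the count of cars per cylinder class are assumptions chosen for illustration.

```r
data(mtcars)

# Number of cars in each cylinder class
counts <- table(mtcars$cyl)

# Label each slice with its class and count
slice_labels <- paste0(names(counts), " cylinders (", as.vector(counts), ")")
pie(counts, labels = slice_labels,
    col = gray.colors(length(counts)),
    main = "Pie chart of cars by cylinders")
```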

Scatter Plot:
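
A minimal scatter plot sketch in base R, assuming the built-in iris data with sepal length on the x axis and petal length on the y axis. The marker choice per species is illustrative.

```r
data(iris)

# One marker shape per species
marker <- c(1, 2, 3)[as.integer(iris$Species)]

plot(iris$Sepal.Length, iris$Petal.Length, pch = marker,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)",
     main = "Scatter plot of petal length vs sepal length")
legend("topleft", legend = levels(iris$Species), pch = c(1, 2, 3))

# The correlation coefficient quantifies the pattern seen in the plot
r <- cor(iris$Sepal.Length, iris$Petal.Length)
print(round(r, 3))
```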

9.4 Quiz Questions

  1. What type of plot shows the interquartile range and median?
  2. What type of plot shows relative numerical quantities in terms of height?
  3. Which type of plot connects points with lines?
  4. Which type of plot shows relative frequency of two or more categorical variables through area?
  5. Which type of graph shows frequencies as grouped in breaks?
  6. Which type of graph shows color intensity as a measure of variable quantity?
  7. Which type of graph can show three and even four quantities?
  8. Which type of graph shows relative frequencies as a circle?
  9. Where do graphs appear in SAS Studio?
  10. What does the Anscombe dataset demonstrate?

Quiz Answers

  1. Box Plot
  2. Bar Plot
  3. Line Plot
  4. Mosaic Plot
  5. Histogram
  6. Heat Map
  7. Bubble Plot
  8. Pie Chart
  9. Tasks and Utilities in the left pane
  10. Pure numerical statistics can be deceptive without data visualization.