Plotting 3D data

Many data analysis packages (R, Python, and so on) have significant data visualization capabilities. One interesting capability is displaying data in three dimensions. Plotting three attributes at once often reveals patterns that were not expected.

For this example, we are using the Auto MPG car dataset from the UCI Machine Learning Repository (https://uci.edu/). It is a widely used dataset with several attributes for each vehicle, such as mpg, weight, and acceleration. What if we plot three of those attributes together and see whether we can recognize any apparent rules?

The coding involved is as follows:

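%matplotlib inline
# import tools we are using
import pandas as pd
import numpy as np
# Axes3D registers the '3d' projection (needed on older matplotlib versions)
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
# read in the car 'table': the file is whitespace-delimited with no
# header row, so we need to supply the column names ourselves
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin', 'name']
# sep=r"\s+" splits on runs of spaces/tabs, index_col=0 makes mpg the index,
# and na_values='?' turns the '?' placeholders for missing horsepower into NaN
df = pd.read_table('http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data',
                   sep=r"\s+", index_col=0, header=None, names=column_names, na_values='?')
print(df.head())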
The output is as follows:

      cylinders  displacement  horsepower  weight  acceleration  year  origin
mpg
18.0          8         307.0       130.0  3504.0          12.0    70       1
15.0          8         350.0       165.0  3693.0          11.5    70       1
18.0          8         318.0       150.0  3436.0          11.0    70       1
16.0          8         304.0       150.0  3433.0          12.0    70       1
17.0          8         302.0       140.0  3449.0          10.5    70       1

                           name
mpg
18.0  chevrolet chevelle malibu
15.0          buick skylark 320
18.0         plymouth satellite
16.0              amc rebel sst
17.0                ford torino
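Because the download above needs network access, the parsing options can also be checked offline. The following is a minimal sketch that feeds a few lines in the same whitespace-delimited layout to pandas through io.StringIO. The first two rows use the values from the output above; the third row is made up for illustration, with '?' standing in for a missing horsepower value.

```python
import io

import pandas as pd

column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
                'acceleration', 'year', 'origin', 'name']

# Whitespace-separated fields with a double-quoted, multi-word name last.
# Rows 1-2 use values from the head() output above; row 3 is illustrative only.
sample = (
    '18.0  8  307.0  130.0  3504.  12.0  70  1\t"chevrolet chevelle malibu"\n'
    '15.0  8  350.0  165.0  3693.  11.5  70  1\t"buick skylark 320"\n'
    '25.0  4  98.0   ?      2046.  19.0  71  1\t"illustrative car"\n'
)

# sep=r"\s+" treats any run of spaces/tabs as one separator, the quotes keep
# each name in a single field, and na_values='?' turns '?' into NaN.
df = pd.read_table(io.StringIO(sample), sep=r"\s+", header=None,
                   names=column_names, index_col=0, na_values='?')

print(df[['cylinders', 'horsepower', 'name']])
print(df.dtypes)
```

With the misspelled separator "s+" (no backslash), pandas would split on the literal letter s instead of on whitespace, which is why the backslash matters here.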

In the following code, we plot the data along three axes that appear to be significant factors (weight, miles per gallon, and the number of engine cylinders):

# start out plotting (uses a subplot, as that can be 3D)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# pull out the 3 columns that we want
xs = df['weight']
ys = df.index  # read_table used the first column (mpg) as the index
zs = df['cylinders']
# based on our data, set the extents of the axes
ax.set_xlim(xs.min(), xs.max())
ax.set_ylim(ys.min(), ys.max())
ax.set_zlim(zs.min(), zs.max())
# standard scatter diagram (except it is 3D)
ax.scatter(xs, ys, zs)
ax.set_xlabel('Weight')
ax.set_ylabel('MPG')
ax.set_zlabel('Cylinders')
plt.show()
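To try other attribute triples, as the question at the start of this section suggests, the plotting steps can be wrapped in a function. This is a minimal sketch: scatter_3d is a hypothetical helper, not part of pandas or matplotlib. It leaves the axis limits to matplotlib's autoscaling instead of setting them by hand, and it accepts None to mean "use the DataFrame index" (mpg, in this section). The demo frame reuses the five rows printed by head().

```python
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers '3d' on older matplotlib)


def scatter_3d(df, x, y, z, labels=None):
    """Hypothetical helper: 3D scatter of three columns of df.

    Pass None for x, y or z to plot the DataFrame index on that axis."""
    def values(col):
        return df.index.to_numpy() if col is None else df[col].to_numpy()

    xs, ys, zs = values(x), values(y), values(z)
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(xs, ys, zs)  # axis limits are left to matplotlib's autoscaling
    # Default labels: the column names, or the index name where None was passed
    if labels is None:
        labels = [df.index.name if c is None else c for c in (x, y, z)]
    ax.set_xlabel(labels[0])
    ax.set_ylabel(labels[1])
    ax.set_zlabel(labels[2])
    return fig, ax


# Demo on the five rows printed by head() above
demo = pd.DataFrame(
    {'weight': [3504.0, 3693.0, 3436.0, 3433.0, 3449.0],
     'displacement': [307.0, 350.0, 318.0, 304.0, 302.0],
     'cylinders': [8, 8, 8, 8, 8]},
    index=pd.Index([18.0, 15.0, 18.0, 16.0, 17.0], name='mpg'),
)
fig, ax = scatter_3d(demo, 'weight', None, 'displacement')
```

With the full df loaded earlier, scatter_3d(df, 'weight', None, 'cylinders', labels=['Weight', 'MPG', 'Cylinders']) draws the same three columns as the plot above, and scatter_3d(df, 'weight', None, 'acceleration') tries a different combination. In a notebook, call plt.show() afterwards.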

Unexpectedly, there appear to be three levels, visible as three apparent lines of data points, regardless of weight:

Six-cylinder vehicles with higher mpg
Four-cylinder vehicles with lower mpg
Four-cylinder vehicles with higher mpg
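The levels read off the plot can be cross-checked numerically. Here is a minimal sketch, assuming df is laid out as above (mpg as the index, a cylinders column): it counts the vehicles at each cylinder count and summarizes their mpg. The helper name mpg_by_cylinders is hypothetical, and the demo values are made up so that the example runs without the download.

```python
import pandas as pd


def mpg_by_cylinders(df):
    """Hypothetical helper: vehicle count and mpg spread per cylinder count.

    Assumes mpg is the DataFrame index, as produced by index_col=0 above."""
    data = df.reset_index()  # moves the 'mpg' index back into an ordinary column
    return data.groupby('cylinders')['mpg'].agg(['count', 'min', 'median', 'max'])


# Illustrative values only, not taken from the dataset
demo = pd.DataFrame(
    {'cylinders': [4, 4, 4, 6, 6, 8],
     'weight': [2100.0, 2300.0, 2600.0, 3000.0, 3200.0, 4300.0]},
    index=pd.Index([32.0, 28.0, 24.0, 21.0, 19.0, 14.0], name='mpg'),
)
print(mpg_by_cylinders(demo))
```

Running mpg_by_cylinders(df) on the full dataset shows how many vehicles fall at each cylinder count and how wide the mpg range is at each level seen in the plot.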

I would have expected weight to have a bigger effect.
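One way to put a number on the effect of weight is the Pearson correlation between weight and mpg, both across all vehicles and within each cylinder count. This is a sketch under the same assumptions as before (mpg as the index). The helper name weight_mpg_correlation is hypothetical, and the demo values are made up for illustration.

```python
import pandas as pd


def weight_mpg_correlation(df):
    """Hypothetical helper: Pearson correlation of weight with mpg.

    Returns the overall value and a Series with one value per cylinder
    count (NaN for groups with fewer than two vehicles)."""
    data = df.reset_index()  # 'mpg' index becomes an ordinary column
    overall = data['weight'].corr(data['mpg'])  # Series.corr defaults to Pearson
    within = pd.Series(
        {cyl: (group['weight'].corr(group['mpg']) if len(group) > 1 else float('nan'))
         for cyl, group in data.groupby('cylinders')},
        name='weight_mpg_corr',
    )
    return overall, within


# Illustrative values only, not taken from the dataset
demo = pd.DataFrame(
    {'cylinders': [4, 4, 4, 6, 6, 8],
     'weight': [2000.0, 2500.0, 3000.0, 3000.0, 3400.0, 4300.0]},
    index=pd.Index([30.0, 25.0, 20.0, 21.0, 19.0, 14.0], name='mpg'),
)
overall, within = weight_mpg_correlation(demo)
print(round(overall, 3))
print(within)
```

On the full dataset, comparing overall with the per-group values in within gives a rough sense of how much of the weight effect remains once the cylinder count is held fixed.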
