Plotting 3D data

Many data analysis environments (R, Python, and so on) have significant data visualization capabilities. One interesting capability is displaying data in three dimensions. Often, when three dimensions are used, unexpected patterns appear.

For this example, we are using the auto-mpg car dataset from UCI (https://uci.edu/). It is a well-used dataset with several attributes for each vehicle, for example, mpg, weight, and acceleration. What if we were to plot three of those attributes together and see whether we can recognize any apparent patterns?

The code is as follows:

%matplotlib inline
# import tools we are using
import pandas as pd
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
# read in the car 'table' – not a csv, so we need
# to add in the column names
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin', 'name']
df = pd.read_table('http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data',
                   sep=r"\s+", index_col=0, header=None, names=column_names)
print(df.head())
      cylinders  displacement  horsepower  weight  acceleration  year  origin
mpg
18.0          8         307.0       130.0  3504.0          12.0    70       1
15.0          8         350.0       165.0  3693.0          11.5    70       1
18.0          8         318.0       150.0  3436.0          11.0    70       1
16.0          8         304.0       150.0  3433.0          12.0    70       1
17.0          8         302.0       140.0  3449.0          10.5    70       1

                           name
mpg
18.0  chevrolet chevelle malibu
15.0          buick skylark 320
18.0         plymouth satellite
16.0              amc rebel sst
17.0                ford torino
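To experiment with the parsing step without downloading the file, the following minimal sketch feeds the five rows shown above to pandas from an in-memory string. It assumes the raw file separates fields with runs of whitespace and wraps the car name in double quotes; the sample_text and sample names are ours, purely for illustration.

```python
import io

import pandas as pd

column_names = ['mpg', 'cylinders', 'displacement', 'horsepower',
                'weight', 'acceleration', 'year', 'origin', 'name']

# Five rows copied from the output above. The layout (whitespace-separated
# fields, double-quoted name) is an assumption about the raw file.
sample_text = '''\
18.0 8 307.0 130.0 3504.0 12.0 70 1 "chevrolet chevelle malibu"
15.0 8 350.0 165.0 3693.0 11.5 70 1 "buick skylark 320"
18.0 8 318.0 150.0 3436.0 11.0 70 1 "plymouth satellite"
16.0 8 304.0 150.0 3433.0 12.0 70 1 "amc rebel sst"
17.0 8 302.0 140.0 3449.0 10.5 70 1 "ford torino"
'''

# sep=r"\s+" splits on one or more whitespace characters;
# index_col=0 turns the first column (mpg) into the index.
sample = pd.read_csv(io.StringIO(sample_text), sep=r"\s+", index_col=0,
                     header=None, names=column_names)
print(sample[['cylinders', 'weight', 'name']])
```

Note that the backslash in the separator matters: without it, the pattern matches the literal letter s rather than whitespace.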

In the following code, we plot three attributes that appear to be significant factors (weight, miles per gallon, and the number of cylinders in the engine), one per axis:

#start out plotting (uses a subplot as that can be 3d)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# pull out the 3 columns that we want
xs = []
ys = []
zs = []
for index, row in df.iterrows():
    xs.append(row['weight'])
    ys.append(index)  # index_col=0 made mpg the index
    zs.append(row['cylinders'])
# based on our data, set the extents of the axes
ax.set_xlim(min(xs), max(xs))
ax.set_ylim(min(ys), max(ys))
ax.set_zlim(min(zs), max(zs))
# standard scatter diagram (except it is 3d)
ax.scatter(xs, ys, zs)
ax.set_xlabel('Weight')
ax.set_ylabel('MPG')
ax.set_zlabel('Cylinders')
plt.show()
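The row-by-row loop above can also be written by taking whole columns at once. The following is a minimal sketch of that alternative, not a replacement for the code above: the scatter_3d function name is hypothetical, the Agg backend is chosen only so the sketch runs without a display, and the axis limits are left to matplotlib's autoscaling. The small frame reuses the five rows printed earlier.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older matplotlib)


def scatter_3d(df):
    """Plot weight (x), mpg (y, taken from the index), and cylinders (z)."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    xs = df['weight'].to_numpy()
    ys = df.index.to_numpy()      # mpg is the index because of index_col=0
    zs = df['cylinders'].to_numpy()
    ax.scatter(xs, ys, zs)
    ax.set_xlabel('Weight')
    ax.set_ylabel('MPG')
    ax.set_zlabel('Cylinders')
    return fig, ax


# The five rows shown in the head() output, with mpg as the index.
sample = pd.DataFrame(
    {'weight': [3504.0, 3693.0, 3436.0, 3433.0, 3449.0],
     'cylinders': [8, 8, 8, 8, 8]},
    index=pd.Index([18.0, 15.0, 18.0, 16.0, 17.0], name='mpg'),
)
fig, ax = scatter_3d(sample)
```

In a notebook, calling scatter_3d(df) on the full frame and then plt.show() should give a plot comparable to the one produced by the loop.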

Unexpectedly, there appear to be three levels, visible as three apparent lines of data points, regardless of weight:

  • Six-cylinder vehicles with higher mpg
  • Four-cylinder vehicles with lower mpg
  • Four-cylinder vehicles with higher mpg

I would have expected the weight to have a bigger effect.
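A visual impression like this can be checked numerically. The following minimal sketch groups the cars by cylinder count and computes the correlation between weight and mpg. The summarize_by_cylinders function name is hypothetical, and the toy frame uses made-up numbers (not taken from the dataset) only to show the shape of the result; run it on the real df to test the observation.

```python
import pandas as pd


def summarize_by_cylinders(df):
    """Return per-cylinder counts and means, plus the weight/mpg correlation."""
    tidy = df.reset_index()  # move mpg from the index back into a column
    summary = tidy.groupby('cylinders').agg(
        cars=('mpg', 'size'),
        mean_mpg=('mpg', 'mean'),
        mean_weight=('weight', 'mean'),
    )
    corr = tidy['weight'].corr(tidy['mpg'])  # Pearson correlation
    return summary, corr


# Toy frame with MADE-UP values, for illustration only.
toy = pd.DataFrame(
    {'cylinders': [4, 4, 6, 8],
     'weight': [2000.0, 2200.0, 3000.0, 4000.0]},
    index=pd.Index([30.0, 28.0, 20.0, 14.0], name='mpg'),
)
summary, corr = summarize_by_cylinders(toy)
print(summary)
print(corr)
```

A correlation close to -1 on the real data would suggest that weight matters more than the plot makes it look.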
