Chapter 7. Driving Visual Analyses with Automobile Data (Python)

In this chapter, we will cover:

  • Getting started with IPython
  • Exploring IPython Notebook
  • Preparing to analyze automobile fuel efficiencies
  • Exploring and describing fuel efficiency data with Python
  • Analyzing automobile fuel efficiency over time with Python
  • Investigating the makes and models of automobiles with Python

Introduction

In the first chapter on R (Chapter 2, Driving Visual Analysis with Automobile Data (R)), we walked through an analysis project that examined automobile fuel economy data using the R statistical programming language. This dataset, available at http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip, contains fuel efficiency performance metrics over time for all makes and models of automobiles in the United States of America. It also contains numerous features and attributes of the automobile models beyond fuel economy, which gives us an opportunity to summarize and group the data and identify interesting trends and relationships.

Unlike the first chapter on R, we will perform the entire analysis using Python. However, we will ask the same questions and follow the same sequence of steps as before, again following the data science pipeline. Comparing the two chapters side by side will let you see the similarities and differences between the two languages for a nearly identical analysis.

In Chapter 6, Creating Application-oriented Analyses Using Tax Data (Python), we used mostly pure Python with some help from NumPy and SciPy, either straight from the Python command line (also known as the Read-Eval-Print Loop, or REPL) or from executable script files. In this chapter, we will take a very different approach, using Python as a scripting language in an interactive fashion that is more similar to R. We will introduce the unofficial interactive environment of Python, IPython, along with the IPython notebook, and show how to produce readable and well-documented analysis scripts. Further, we will leverage the data analysis capabilities of the relatively new but powerful pandas library and the invaluable data frame data type that it offers. pandas often allows us to complete complex tasks with fewer lines of code. The drawback is that, although you don't have to reinvent the wheel for common data manipulation tasks, you do have to learn the API of an entirely separate package, pandas.
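
To make the "fewer lines of code" point concrete, here is a minimal sketch that computes an average fuel efficiency per model year twice: once in pure Python and once with a pandas data frame. The records and the column names year and mpg are invented for illustration; they are not taken from vehicles.csv, and this is not the analysis carried out in the recipes of this chapter.

```python
from collections import defaultdict

import pandas as pd

# Toy records invented for illustration only; NOT values from vehicles.csv.
# The column names "year" and "mpg" are hypothetical as well.
records = [
    {"year": 1984, "mpg": 17.0},
    {"year": 1984, "mpg": 21.0},
    {"year": 1985, "mpg": 20.0},
    {"year": 1985, "mpg": 24.0},
]

# Pure Python: collect the values for each year by hand, then average them.
by_year = defaultdict(list)
for row in records:
    by_year[row["year"]].append(row["mpg"])
pure_python_means = {year: sum(vals) / len(vals) for year, vals in by_year.items()}

# pandas: build a data frame, then group and average in a single expression.
df = pd.DataFrame(records)
pandas_means = df.groupby("year")["mpg"].mean().to_dict()

print(pure_python_means)
print(pandas_means)
```

Both approaches give the same per-year averages. The pandas version is shorter, but it only reads easily once you know what groupby and mean do, which is the API learning cost mentioned above.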

The goal of this chapter is not to guide you through an analysis project that you have already completed but to show you how that project can be completed in another language. More importantly, we want to get you, the reader, to become more introspective with your own code and analysis. Think not only about how something is done but why something is done that way in that particular language. How does the language shape the analysis?
