How Python and pandas fit into the data analytics mix

The Python programming language is one of the fastest-growing languages in the emerging field of data science and analytics. Python was created by Guido van Rossum and first released in 1991. Its key features include the following:

  • Interpreted rather than compiled
  • Dynamic type system
  • Pass by value with object references
  • Modular capability
  • Comprehensive libraries
  • Extensibility with respect to other languages
  • Object orientation
  • Support for most of the major programming paradigms: procedural, object-oriented, and, to a lesser extent, functional
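
Two of the features listed above, the dynamic type system and passing by object reference, can be illustrated with a minimal sketch. The names value, append_item, rebind, and numbers are made up for this illustration:

```python
# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
first_type = type(value).__name__    # type name while value holds an integer
value = "forty-two"
second_type = type(value).__name__   # type name after rebinding to a string


def append_item(items):
    """Mutate the list object that the caller passed in."""
    items.append(99)


def rebind(items):
    """Rebind the local name only; the caller's list is left untouched."""
    items = [0]
    return items


numbers = [1, 2, 3]
append_item(numbers)  # the caller sees the appended element
rebind(numbers)       # the caller's list is unchanged by the rebinding

print(first_type, second_type, numbers)
```

The function receives a reference to the same list object, so in-place changes are visible to the caller, whereas assigning a new object to the parameter name affects only the local scope.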

Note

For more information, refer to the Wikipedia page on Python at http://en.wikipedia.org/wiki/Python_%28programming_language%29.

Among the characteristics that make Python popular for data science are its user-friendly (human-readable) syntax, the fact that it is interpreted rather than compiled (leading to faster development time), and its comprehensive libraries for parsing and analyzing data and for numerical and statistical computation. Together, these libraries provide a complete toolkit for data science and analysis. The major ones are as follows:

  • NumPy: General-purpose array functionality with an emphasis on numeric computation
  • SciPy: Scientific and numerical computing routines built on NumPy
  • Matplotlib: Graphics
  • pandas: Series and data frames (1D and 2D array-like types)
  • Scikit-Learn: Machine learning
  • NLTK: Natural language processing
  • statsmodels: Statistical analysis

In this book, we will focus on the fourth library in the preceding list, pandas.
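
To give a first impression of the Series and data frame types mentioned in the list, here is a minimal sketch. It assumes pandas is installed and imported under the conventional alias pd; the fruit names, prices, and quantities are invented purely for illustration:

```python
import pandas as pd

# A Series is a labeled one-dimensional array.
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])

# A DataFrame is a labeled two-dimensional table; each column is a Series.
inventory = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "quantity": [10, 4, 8]},
    index=["apple", "banana", "cherry"],
)

# Column arithmetic is applied element by element, aligned on the index.
inventory["value"] = inventory["price"] * inventory["quantity"]
total_value = inventory["value"].sum()

print(prices["banana"])
print(inventory)
print(total_value)
```

Both types carry row labels (the index), which is what distinguishes them from plain NumPy arrays and allows values to be looked up by label rather than by position.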
