What is pandas?

The pandas we are going to obsess over in this book are not the cute and lazy animals that also do kung fu when needed.

pandas is a high-performance open source library for data analysis in Python, which Wes McKinney began developing in 2008. The name is derived from panel data, an econometrics term for multidimensional structured datasets, and reflects the tabular format in which the library processes data. It is available for free and is distributed under the 3-Clause BSD License, an open source license approved by the Open Source Initiative.

Over the years, it has become the de facto standard library for data analysis using Python. The tool has been widely adopted and has a large community behind it (1,200+ contributors, 17,000+ commits, 23 versions, and 15,000+ stars), so iteration is rapid and features and enhancements are added continuously.

Some key features of pandas include the following:

  • It can process a variety of datasets in different formats: time series, heterogeneous tabular data, and matrix data.
  • It facilitates loading/importing data from varied sources, such as CSV files and SQL databases.
  • It can handle myriad operations on datasets: subsetting, slicing, filtering, merging, grouping (groupby), re-ordering, and re-shaping.
  • It can deal with missing data according to rules defined by the user/developer, such as ignore, convert to 0, and so on.
  • It can be used for parsing and munging (conversion) of data as well as modeling and statistical analysis.
  • It integrates well with other Python libraries such as statsmodels, SciPy, and scikit-learn.
  • It delivers fast performance and can be sped up even more by making use of Cython (a language for writing C extensions for Python).
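
Several of the features listed above can be seen in a few lines of code. The following is a minimal sketch rather than a definitive recipe: the sales figures, the city and region names, and all variable names are hypothetical and were invented for illustration. It loads CSV data, applies two possible missing-data rules, and then filters, groups, merges, and re-shapes the result.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the example needs no files.
# The Austin/Feb row deliberately has no sales value.
csv_text = """city,month,sales
Austin,Jan,100
Austin,Feb,
Boston,Jan,80
Boston,Feb,120
"""

# Loading: read tabular data from a CSV source.
sales = pd.read_csv(io.StringIO(csv_text))

# Missing data: the empty cell is read as NaN.
# Rule 1: convert missing sales to 0. Rule 2: ignore (drop) incomplete rows.
sales_filled = sales.fillna({"sales": 0})
sales_dropped = sales.dropna(subset=["sales"])

# Filtering: keep only rows with sales of at least 100.
big_sales = sales_filled[sales_filled["sales"] >= 100]

# Grouping: total sales per city.
totals = sales_filled.groupby("city")["sales"].sum()

# Merging: attach a hypothetical region lookup table on the city column.
regions = pd.DataFrame(
    {"city": ["Austin", "Boston"], "region": ["South", "Northeast"]}
)
merged = sales_filled.merge(regions, on="city", how="left")

# Re-shaping: one row per city, one column per month.
wide = sales_filled.pivot(index="city", columns="month", values="sales")

print(totals)
print(wide)
```

Which missing-data rule is appropriate depends on the analysis: filling with 0 keeps every row but changes totals and averages, while dropping rows keeps the remaining values untouched but shrinks the dataset.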

For more information, go through the official pandas documentation at http://pandas.pydata.org/pandas-docs/stable/.
