What is pandas?

The pandas we are going to obsess over in this book are not the cute and lazy animals that also do kung fu when needed.

pandas is a high-performance open source library for data analysis in Python, first developed by Wes McKinney in 2008. The name derives from "panel data", an econometrics term for multidimensional structured datasets, and reflects the tabular form in which the library handles data. It is free to use and is distributed under the 3-Clause BSD License, which is approved by the Open Source Initiative.

Over the years, it has become the de facto standard library for data analysis in Python. It has been widely adopted and has a large community behind it (1,200+ contributors, 17,000+ commits, 23 versions, and 15,000+ stars), so features and enhancements arrive continuously and iteration is rapid.

Some key features of pandas include the following:

It can process a variety of datasets in different formats: time series, heterogeneous tabular data, and matrix data.
It facilitates loading/importing data from varied sources, such as CSV files and SQL databases.
It can handle myriad operations on datasets: subsetting, slicing, filtering, merging, grouping (groupby), reordering, and reshaping.
It can deal with missing data according to rules defined by the user/developer, such as ignoring it or converting it to 0.
It can be used for parsing and munging (conversion) of data as well as modeling and statistical analysis.
It integrates well with other Python libraries such as statsmodels, SciPy, and scikit-learn.
It delivers fast performance and can be sped up even more by making use of Cython (C extensions to Python).
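
To make a few of these features concrete, here is a minimal sketch that loads CSV data, applies two missing-data rules, then filters, groups, and merges. The sample data, the column names (city, year, sales, region), and the variable names are all made up for illustration; they do not come from any real dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the example is self-contained.
# Note the missing sales value for Austin in 2021.
CSV_TEXT = """city,year,sales
Austin,2020,100
Austin,2021,
Boston,2020,80
Boston,2021,120
"""

# Loading: read tabular data from a CSV source.
sales = pd.read_csv(io.StringIO(CSV_TEXT))

# Missing data, rule 1: ignore (drop) rows whose sales value is missing.
sales_dropped = sales.dropna(subset=["sales"])

# Missing data, rule 2: convert missing sales values to 0.
sales_filled = sales.fillna({"sales": 0})

# Filtering: keep only the rows for 2021.
sales_2021 = sales_filled[sales_filled["year"] == 2021]

# Grouping: total sales per city.
totals = sales_filled.groupby("city")["sales"].sum()

# Merging: attach a hypothetical region lookup table by city.
regions = pd.DataFrame(
    {"city": ["Austin", "Boston"], "region": ["South", "Northeast"]}
)
merged = sales_filled.merge(regions, on="city", how="left")

print(totals)
```

Which missing-data rule is appropriate depends on the analysis: dropping rows discards information, while filling with 0 changes sums and averages, so the choice should be made deliberately.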

For more information, go through the official pandas documentation at http://pandas.pydata.org/pandas-docs/stable/.
