How it works...

Step 1 gives basic information on the size of the dataset. The shape attribute returns a two-element tuple of the number of rows and columns. The size attribute returns the total number of elements in the DataFrame, which is just the product of the number of rows and columns. The ndim attribute returns the number of dimensions, which is two for all DataFrames. Pandas defines the built-in len function to return the number of rows.
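As a minimal sketch of these four size measures, the following uses a small made-up DataFrame (the column names and values are illustrative only and are not the dataset from the recipe):

```python
import pandas as pd

# Hypothetical 3-row, 2-column DataFrame used only for illustration
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

shape = df.shape   # tuple of (rows, columns)
size = df.size     # total number of elements, rows * columns
ndim = df.ndim     # number of dimensions, always 2 for a DataFrame
n_rows = len(df)   # the built-in len returns the number of rows

print(shape, size, ndim, n_rows)
```

Here size equals shape[0] * shape[1], and len(df) equals shape[0].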

The methods in step 2 and step 3 aggregate each column down to a single number. Each column name is now the index label in a Series with its aggregated result as the corresponding value.
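The shape of the result can be seen on a small example. This is a sketch on an all-numeric, made-up DataFrame rather than the recipe's dataset, and it assumes count and max as representative aggregation methods:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with one missing value
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

counts = df.count()   # number of non-missing values per column
maxima = df.max()     # largest value per column

# Both results are Series whose index labels are the original column names
print(type(counts).__name__, list(counts.index))
print(maxima)
```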

If you look closely, you will notice that the output from step 3 is missing all the object columns from step 2. The reason for this is that there are missing values in the object columns and pandas does not know how to compare a string value with a missing value. It silently drops all of the columns for which it is unable to compute a minimum.

In this context, silently means that no error was raised and no warning was issued. This is somewhat dangerous, because a user who is not familiar with this behavior may not notice that columns are missing from the result.
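One way to avoid relying on this implicit behavior is to state the column selection explicitly. The sketch below is not part of the recipe; it assumes a made-up DataFrame and uses select_dtypes and the numeric_only parameter to restrict the aggregation to numeric columns on purpose:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one object column and two numeric columns, with missing values
df = pd.DataFrame({
    "city": ["Oslo", np.nan, "Lima"],
    "population": [0.7, 1.2, np.nan],
    "area": [454.0, 181.0, 2672.0],
})

# Option 1: select the numeric columns first, then aggregate
numeric_min = df.select_dtypes(include="number").min()

# Option 2: ask the aggregation method to consider numeric columns only
same_min = df.min(numeric_only=True)

print(numeric_min)
```

Either form makes it clear to a reader of the code that the object columns were left out deliberately.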

The numeric columns have missing values as well but have a result returned. By default, pandas handles missing values in numeric columns by skipping them. It is possible to change this behavior by setting the skipna parameter to False. This will cause pandas to return NaN for all these aggregation methods if there exists at least a single missing value.
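The effect of skipna can be sketched on a small made-up DataFrame (the column names and values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data: column "a" has a missing value, column "b" does not
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

default_min = df.min()              # missing values are skipped by default
strict_min = df.min(skipna=False)   # any missing value makes the result NaN

print(default_min)
print(strict_min)
```

With skipna=False, only column "a" becomes NaN; column "b" has no missing values, so its result is unchanged.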

The describe method displays the main summary statistics all at once. It can include more quantiles in its summary if you pass a list of numbers between 0 and 1 to the percentiles parameter. By default, describe shows information on just the numeric columns. See the Developing a data analysis routine recipe for more on the describe method.
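As a brief sketch of the percentiles parameter, the following requests the 10th and 90th percentiles on a made-up DataFrame (the data and column names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: one numeric column and one object column
df = pd.DataFrame({
    "score": [10, 20, 30, 40, 50],
    "label": ["a", "b", "c", "d", "e"],
})

# Request the 10th and 90th percentiles in the summary
summary = df.describe(percentiles=[0.1, 0.9])

print(summary)
```

The requested quantiles appear as rows labeled 10% and 90%, and the object column label is left out of the default summary.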
