How to do it...

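Read in the movie dataset and inspect its basic descriptive attributes, shape, size, and ndim, then call the built-in len function on it. The shape is a (rows, columns) tuple, size is the total number of cells, ndim is the number of dimensions, and len returns the number of rows: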

>>> movie = pd.read_csv('data/movie.csv')
>>> movie.shape
(4916, 28)

>>> movie.size
137648

>>> movie.ndim
2

>>> len(movie)
4916
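
The relationship between these four values can be checked on any DataFrame. The following is a minimal sketch on a small made-up DataFrame (the column names and values are hypothetical stand-ins, not the movie dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the movie dataset
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "duration": [100.0, np.nan, 95.0],
    "score": [7.1, 6.4, 8.0],
})

n_rows, n_cols = df.shape

# size counts every cell, whether or not it holds a missing value
assert df.size == n_rows * n_cols
# ndim is the length of the shape tuple (2 for any DataFrame)
assert df.ndim == len(df.shape)
# len reports the number of rows only
assert len(df) == n_rows

print(df.shape, df.size, df.ndim, len(df))
```

The same identity holds for the movie output above: 4916 rows times 28 columns gives a size of 137648.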

Use the count method to find the number of non-missing values in each column. The output is a Series whose index holds the original column names:

>>> movie.count()
color                     4897
director_name             4814
num_critic_for_reviews    4867
duration                  4901
                          ... 
actor_2_facebook_likes    4903
imdb_score                4916
aspect_ratio              4590
movie_facebook_likes      4916
Length: 28, dtype: int64
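
Because count reports non-missing values, subtracting it from the number of rows gives the number of missing values per column. A minimal sketch on made-up data (the columns here are hypothetical, not taken from the movie dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing entries
df = pd.DataFrame({
    "director": ["Ann", None, "Cy", "Di"],
    "duration": [100.0, np.nan, 95.0, 120.0],
    "score": [7.1, 6.4, 8.0, 5.5],
})

non_missing = df.count()          # Series indexed by column name
missing = len(df) - non_missing   # rows minus non-missing values

# The same numbers can be obtained by summing the boolean mask from isna
assert (missing == df.isna().sum()).all()

print(non_missing)
print(missing)
```

In the movie output above, imdb_score has a count of 4916, equal to the number of rows, so that column has no missing values.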

The other methods that compute summary statistics, such as min, max, mean, median, and std, all return a similar Series, with column names in the index and the computed results as the values. Note that the output below covers only the 16 numeric columns:

>>> movie.min()
num_critic_for_reviews     1.00
duration                   7.00
director_facebook_likes    0.00
actor_3_facebook_likes     0.00
                           ... 
actor_2_facebook_likes     0.00
imdb_score                 1.60
aspect_ratio               1.18
movie_facebook_likes       0.00
Length: 16, dtype: float64
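
The output above contains only numeric columns. How a bare call such as min() treats string columns can differ between pandas versions, so the sketch below passes numeric_only=True explicitly to restrict the result to numeric columns. The data is made up for illustration:

```python
import pandas as pd

# Hypothetical toy data: one string column and two numeric columns
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "duration": [100.0, 90.0, 95.0, 120.0],
    "score": [7.0, 6.0, 8.0, 5.0],
})

# Each call returns a Series indexed by the numeric column names
mins = df.min(numeric_only=True)
means = df.mean(numeric_only=True)

print(mins)
print(means)
```

Both results are indexed by duration and score only; the title column is left out because it is not numeric.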

The describe method computes the descriptive statistics from the preceding steps, together with the quartiles, in a single call. The result is a DataFrame with the names of the statistics as its index:

>>> movie.describe()

It is possible to specify exact quantiles in the describe method using the percentiles parameter:

>>> movie.describe(percentiles=[.01, .3, .99])
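
To see what the percentiles parameter changes, the following sketch compares the row labels of the two calls on a small made-up DataFrame (hypothetical data, not the movie dataset). By default, describe summarizes only the numeric columns of a mixed-type DataFrame:

```python
import pandas as pd

# Hypothetical toy data: one string column and one numeric column
df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "duration": [80.0, 90.0, 100.0, 110.0, 120.0],
})

default_summary = df.describe()
custom_summary = df.describe(percentiles=[.1, .9])

# The index holds the statistic names; each requested percentile
# appears as a row labelled with a percent sign, such as "10%"
print(default_summary.index.tolist())
print(custom_summary.index.tolist())

# Individual statistics are retrieved with .loc[row_label, column]
print(custom_summary.loc["mean", "duration"])
```

The requested percentiles replace the default quartile rows, while count, mean, std, min, and max are reported in both cases.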
