How to do it...

  1. Read in the movie dataset and retrieve its basic descriptive attributes (shape, size, and ndim), along with the result of the len function:
>>> import pandas as pd
>>> movie = pd.read_csv('data/movie.csv')
>>> movie.shape
(4916, 28)

>>> movie.size
137648

>>> movie.ndim
2

>>> len(movie)
4916
  2. Use the count method to find the number of non-missing values in each column. The output is a Series with the original column names as its index:
>>> movie.count()
color                     4897
director_name             4814
num_critic_for_reviews    4867
duration                  4901
                          ... 
actor_2_facebook_likes    4903
imdb_score                4916
aspect_ratio              4590
movie_facebook_likes      4916
Length: 28, dtype: int64
  3. Other methods that compute summary statistics, such as min, max, mean, median, and std, return similar Series, with the column names in the index and the computed results as the values:
>>> movie.min()
num_critic_for_reviews     1.00
duration                   7.00
director_facebook_likes    0.00
actor_3_facebook_likes     0.00
                           ... 
actor_2_facebook_likes     0.00
imdb_score                 1.60
aspect_ratio               1.18
movie_facebook_likes       0.00
Length: 16, dtype: float64
  4. The describe method computes the descriptive statistics from the preceding steps, plus the quartiles, in a single call. The result is a DataFrame with the descriptive statistics as its index:
>>> movie.describe()
  5. You can request exact quantiles from the describe method with its percentiles parameter:
>>> movie.describe(percentiles=[.01, .3, .99])
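The relationships among shape, size, ndim, len, and count can be checked on a small DataFrame. The sketch below uses a hypothetical four-row `df` with made-up values (not the real movie data) to show that `size` equals rows times columns, that `len` returns the row count, and that subtracting `count()` from `len()` gives the number of missing values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movie data; values are invented for illustration
df = pd.DataFrame({
    "color": ["Color", None, "Color", "Black and White"],
    "duration": [178.0, 169.0, np.nan, 95.0],
    "imdb_score": [7.9, 7.1, 6.8, 8.1],
})

rows, cols = df.shape           # shape is a (rows, columns) tuple
assert df.size == rows * cols   # size counts every cell, missing or not
assert len(df) == rows          # len returns the number of rows
assert df.ndim == 2             # a DataFrame is always two-dimensional

non_missing = df.count()         # Series of non-missing counts, indexed by column name
missing = len(df) - non_missing  # missing values per column
print(missing)
```

Because `count` skips missing values while `size` and `len` do not, comparing them is a quick way to spot incomplete columns.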
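The `movie.min()` output above lists only numeric columns. As a sketch, assuming a hypothetical mixed-type `df`, passing `numeric_only=True` explicitly restricts a reduction to numeric columns, and `agg` collects several statistics into one DataFrame with one row per statistic:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data; not the real movie dataset
df = pd.DataFrame({
    "director_name": ["A", "B", None, "C"],
    "duration": [178.0, 169.0, np.nan, 95.0],
    "imdb_score": [7.9, 7.1, 6.8, 8.1],
})

# Each reduction returns a Series indexed by the numeric column names
mins = df.min(numeric_only=True)
medians = df.median(numeric_only=True)

# Select the numeric columns, then compute several statistics at once
stats = df.select_dtypes("number").agg(["min", "max", "mean", "median", "std"])
print(stats)
```

Missing values are skipped by default, so the `duration` statistics are computed from three values rather than four.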
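To see how describe relates to the individual methods, the sketch below runs it on a small hypothetical all-numeric `df` (invented values). It checks that the `count`, `mean`, and `50%` rows match `count()`, `mean()`, and `median()`, and shows that custom percentiles replace the default quartiles while the median is still reported:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with one missing value
df = pd.DataFrame({
    "duration": [95.0, 120.0, 150.0, 178.0, np.nan],
    "imdb_score": [6.5, 7.0, 7.5, 8.0, 8.5],
})

summary = df.describe()                             # default: 25%, 50%, 75% quartiles
custom = df.describe(percentiles=[.01, .3, .99])    # user-chosen percentiles

# The describe rows agree with the standalone methods (missing values skipped)
assert (summary.loc["count"] == df.count()).all()
assert np.allclose(summary.loc["mean"], df.mean())
assert np.allclose(summary.loc["50%"], df.median())

print(list(summary.index))
print(list(custom.index))
```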