How to do it...

  1. After reading in the movies dataset, let's select two Series with different data types. The director_name column contains strings, formally an object data type, and the column actor_1_facebook_likes contains numerical data, formally float64:
>>> movie = pd.read_csv('data/movie.csv')
>>> director = movie['director_name']
>>> actor_1_fb_likes = movie['actor_1_facebook_likes']
  1. Inspect the head of each Series:
>>> director.head()
0 James Cameron 1 Gore Verbinski 2 Sam Mendes 3 Christopher Nolan 4 Doug Walker Name: director_name, dtype: object

>>> actor_1_fb_likes.head()
0 1000.0 1 40000.0 2 11000.0 3 27000.0 4 131.0 Name: actor_1_facebook_likes, dtype: float64
  1. The data type of the Series usually determines which of the methods will be the most useful. For instance, one of the most useful methods for the object data type Series is value_counts, which counts all the occurrences of each unique value:
>>> director.value_counts()
Steven Spielberg 26 Woody Allen 22 Martin Scorsese 20 Clint Eastwood 20 .. Fatih Akin 1 Analeine Cal y Mayor 1 Andrew Douglas 1 Scott Speer 1 Name: director_name, Length: 2397, dtype: int64
  1. The value_counts method is typically more useful for Series with object data types but can occasionally provide insight into numeric Series as well. Used with actor_1_fb_likes, it appears that higher numbers have been rounded to the nearest thousand as it is unlikely that so many movies received exactly 1,000 likes:
>>> actor_1_fb_likes.value_counts()
1000.0 436 11000.0 206 2000.0 189 3000.0 150 ... 216.0 1 859.0 1 225.0 1 334.0 1 Name: actor_1_facebook_likes, Length: 877, dtype: int64
  1. Counting the number of elements in the Series may be done with the size or shape parameter or the len function:
>>> director.size
4916
>>> director.shape
(4916,)
>>> len(director)
4916
  1. Additionally, there is the useful but confusing count method that returns the number of non-missing values:
>>> director.count()
4814
>>> actor_1_fb_likes.count()
4909
  1. Basic summary statistics may be yielded with the min, max, mean, median, std, and sum methods:
>>> actor_1_fb_likes.min(), actor_1_fb_likes.max(), 
actor_1_fb_likes.mean(), actor_1_fb_likes.median(),
actor_1_fb_likes.std(), actor_1_fb_likes.sum()
(0.0, 640000.0, 6494.488490527602, 982.0, 15106.98, 31881444.0)
  1. To simplify step 7, you may use the describe method to return both the summary statistics and a few of the quantiles at once. When describe is used with an object data type column, a completely different output is returned:
>>> actor_1_fb_likes.describe()
count 4909.000000 mean 6494.488491 std 15106.986884 min 0.000000 25% 607.000000 50% 982.000000 75% 11000.000000 max 640000.000000 Name: actor_1_facebook_likes, dtype: float64

>>> director.describe()
count 4814
unique 2397
top Steven Spielberg
freq 26
Name: director_name, dtype: object
  1. The quantile method exists to calculate an exact quantile of numeric data:
>>> actor_1_fb_likes.quantile(.2)
510

>>> actor_1_fb_likes.quantile([.1, .2, .3, .4, .5,
.6, .7, .8, .9])
0.1 240.0 0.2 510.0 0.3 694.0 0.4 854.0 ... 0.6 1000.0 0.7 8000.0 0.8 13000.0 0.9 18000.0 Name: actor_1_facebook_likes, Length: 9, dtype: float64
  1. Since the count method in step 6 returned a value less than the total number of Series elements found in step 5, we know that there are missing values in each Series. The isnull method may be used to determine whether each individual value is missing or not. The result will be a Series of booleans the same length as the original Series:
>>> director.isnull()
0 False 1 False 2 False 3 False ... 4912 True 4913 False 4914 False 4915 False Name: director_name, Length: 4916, dtype: bool
  1. It is possible to replace all missing values within a Series with the fillna method:
>>> actor_1_fb_likes_filled = actor_1_fb_likes.fillna(0)
>>> actor_1_fb_likes_filled.count()
4916
  1. To remove the Series elements with missing values, use dropna:
>>> actor_1_fb_likes_dropped = actor_1_fb_likes.dropna()
>>> actor_1_fb_likes_dropped.size
4909
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.165.70