How to do it...

  1. Read in the movie dataset, set the index as the title, and then create a boolean Series matching all movies with a content rating of G and an IMDB score less than 4:
>>> import numpy as np
>>> import pandas as pd
>>> movie = pd.read_csv('data/movie.csv', index_col='movie_title')
>>> c1 = movie['content_rating'] == 'G'
>>> c2 = movie['imdb_score'] < 4
>>> criteria = c1 & c2
  2. Let's first pass these criteria to the .loc indexer to filter the rows:
>>> movie_loc = movie.loc[criteria]
>>> movie_loc.head()
  3. Let's check whether this DataFrame is exactly equal to the one generated directly from the indexing operator:
>>> movie_loc.equals(movie[criteria])
True
  4. Now let's attempt the same boolean indexing with the .iloc indexer:
>>> movie_iloc = movie.iloc[criteria]
ValueError: iLocation based boolean indexing cannot use an indexable as a mask
  5. It turns out that we cannot pass a Series of booleans directly to .iloc, because the Series carries an index. We can, however, use an ndarray of booleans. To extract the array, use the values attribute:
>>> movie_iloc = movie.iloc[criteria.values]
>>> movie_iloc.equals(movie_loc)
True
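The contrast between the two indexers in steps 2 through 5 can be reproduced on a small made-up DataFrame. The following is a minimal sketch: the name toy, the titles, and the values are hypothetical and are not taken from the movie dataset. The exception raised by .iloc is assumed to be a ValueError, as in step 4. NotImplementedError is caught as well, since pandas raises it for a boolean Series with an integer index.

```python
import pandas as pd

# Hypothetical toy data standing in for the movie dataset
toy = pd.DataFrame(
    {"content_rating": ["G", "PG", "G", "R"],
     "imdb_score": [3.5, 6.1, 7.2, 2.9]},
    index=["Movie A", "Movie B", "Movie C", "Movie D"],
)

# Boolean Series: rated G and scored below 4
mask = (toy["content_rating"] == "G") & (toy["imdb_score"] < 4)

# .loc accepts the boolean Series directly
by_loc = toy.loc[mask]

# .iloc rejects a boolean Series that carries an index
try:
    toy.iloc[mask]
    iloc_accepts_series = True
except (ValueError, NotImplementedError):
    iloc_accepts_series = False

# .iloc accepts the underlying ndarray of booleans
by_iloc = toy.iloc[mask.values]
same = by_loc.equals(by_iloc)
```

Only "Movie A" satisfies both conditions here, so by_loc and by_iloc each hold that single row.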
  6. Although not very common, it is possible to use boolean indexing to select particular columns. Here, we select all the columns whose data type is a 64-bit integer:
>>> criteria_col = movie.dtypes == np.int64
>>> criteria_col.head()
color                      False
director_name              False
num_critic_for_reviews     False
duration                   False
director_facebook_likes    False
dtype: bool

>>> movie.loc[:, criteria_col].head()
  7. As criteria_col is a Series, which always has an index, you must use the underlying ndarray to make it work with .iloc. The following produces the same result as step 6:
>>> movie.iloc[:, criteria_col.values].head()
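Column selection with a boolean mask can be tried in isolation on a small frame. This is a sketch under stated assumptions: toy and its column values are invented for illustration, and the integer columns are built explicitly as int64 so that the result does not depend on the platform's default integer type.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; integer columns are forced to int64 explicitly
toy = pd.DataFrame(
    {"title_year": np.array([2001, 2005, 2010], dtype=np.int64),
     "imdb_score": [6.5, 7.1, 3.2],
     "num_votes": np.array([100, 250, 40], dtype=np.int64),
     "color": ["Color", "Color", "Black and White"]}
)

# Boolean Series indexed by column name: True where the dtype is int64
is_int64 = toy.dtypes == np.int64

# .loc takes the boolean Series; .iloc needs the underlying ndarray
int_cols_loc = toy.loc[:, is_int64]
int_cols_iloc = toy.iloc[:, is_int64.values]
```

Both selections keep only the title_year and num_votes columns.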
  8. A boolean Series may be used to select rows and then simultaneously select columns with either integers or labels. Remember, you need to put a comma between the row and column selections. Let's keep the row criteria and select content_rating, imdb_score, title_year, and gross:
>>> cols = ['content_rating', 'imdb_score', 'title_year', 'gross']
>>> movie.loc[criteria, cols].sort_values('imdb_score')
  9. This same operation may be replicated with .iloc, but you need to get the integer location of all the columns:
>>> col_index = [movie.columns.get_loc(col) for col in cols]
>>> col_index
[20, 24, 22, 8]

>>> movie.iloc[criteria.values, col_index]
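The combined row and column selection from the last two steps can be checked end to end on a small frame. This is a minimal sketch with hypothetical data (toy and its three columns are made up), showing that the label-based and position-based forms return the same result once the column labels are translated with get_loc.

```python
import pandas as pd

# Hypothetical toy data; column order differs from the order in cols
toy = pd.DataFrame(
    {"gross": [1.0e6, 5.0e7, 2.0e5],
     "content_rating": ["G", "PG", "G"],
     "imdb_score": [3.1, 6.8, 2.4]},
    index=["Movie A", "Movie B", "Movie C"],
)

mask = (toy["content_rating"] == "G") & (toy["imdb_score"] < 4)
cols = ["content_rating", "imdb_score", "gross"]

# Translate each column label into its integer position
col_index = [toy.columns.get_loc(col) for col in cols]

# Rows by boolean mask, columns by label (.loc) or by position (.iloc)
by_label = toy.loc[mask, cols]
by_position = toy.iloc[mask.values, col_index]
```

Here col_index is [1, 2, 0], and both selections return the rows for "Movie A" and "Movie C" with the columns in the order given by cols.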