How to do it...

  1. Read in the movie dataset and slim it down to just the three columns we care about: movie_title, title_year, and imdb_score:
>>> import pandas as pd
>>> movie = pd.read_csv('data/movie.csv')
>>> movie2 = movie[['movie_title', 'title_year', 'imdb_score']]
  2. Use the sort_values method to sort the DataFrame by title_year. By default, values are sorted from smallest to largest. To invert this behavior, set the ascending parameter to False:
>>> movie2.sort_values('title_year', ascending=False).head()
  3. Notice how only the year was sorted. To sort by multiple columns at once, pass a list. Let's sort by both year and score:
>>> movie3 = movie2.sort_values(['title_year', 'imdb_score'],
...                             ascending=False)
>>> movie3.head()
  4. Now, use the drop_duplicates method to keep only the first row of every year. Because the rows within each year are sorted by score from highest to lowest, that first row is the highest-rated movie of the year:
>>> movie_top_year = movie3.drop_duplicates(subset='title_year')
>>> movie_top_year.head()
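The steps above can also be tried without data/movie.csv. The following is a minimal sketch, assuming a small hand-made DataFrame; the titles, years, and scores are invented for illustration and do not come from the movie dataset.

```python
import pandas as pd

# Hypothetical stand-in for the three-column movie2 DataFrame (values are made up)
movie2 = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D', 'E'],
    'title_year': [2015.0, 2016.0, 2015.0, 2016.0, 2014.0],
    'imdb_score': [7.1, 6.5, 8.2, 6.9, 5.0],
})

# Sort by year first, then by score; ascending=False applies to both columns
movie3 = movie2.sort_values(['title_year', 'imdb_score'], ascending=False)

# Keep only the first row seen for each year (keep='first' is the default)
movie_top_year = movie3.drop_duplicates(subset='title_year')

print(movie_top_year)
```

Since the sort places the highest score at the top of each year, the row that drop_duplicates keeps for each year is that year's highest-rated title.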