Sorting a pandas DataFrame

In this section, we will learn about the pandas sort_values method. We will use it to sort a pandas DataFrame in several ways, and we will also learn how to sort a pandas Series object.

We will start by importing the pandas module and reading the dataset of house prices from zillow.com into the Jupyter Notebook. Let's begin with the simplest type of sorting, using pandas' sort_values method. For example, imagine that we want to sort the data by the Metro column. We pass the column name, 'Metro', as an argument to the sort_values method, and call the method on the DataFrame as follows:

zillow.sort_values('Metro') 

This shows that the data has been sorted by the Metro column, as shown in the following screenshot:

Notice that, by default, the Metro column is sorted in ascending order. We can reverse the sorting order by setting the ascending parameter to False, as shown in the following code block:

sorted_zillow = zillow.sort_values('Metro', ascending=False) 
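The Zillow file itself is not reproduced here, so the following is only a minimal sketch of the same two calls on a small made-up DataFrame. The name zillow_sample and all of its values are assumptions for illustration; only the Metro, County, and Zhvi column names come from the text.

```python
import pandas as pd

# Hypothetical stand-in for the Zillow data; every value here is made up.
zillow_sample = pd.DataFrame({
    "Metro": ["New York", "Chicago", "Los Angeles", "Chicago"],
    "County": ["Queens", "Cook", "Los Angeles", "DuPage"],
    "Zhvi": [500000, 250000, 650000, 300000],
})

# Default behavior: ascending=True, so Metro is sorted A to Z.
ascending_metros = zillow_sample.sort_values("Metro")["Metro"].tolist()

# ascending=False reverses the order, so Metro is sorted Z to A.
descending_metros = (
    zillow_sample.sort_values("Metro", ascending=False)["Metro"].tolist()
)

# sort_values returns a new DataFrame; the original row order is untouched.
original_metros = zillow_sample["Metro"].tolist()

print(ascending_metros)
print(descending_metros)
print(original_metros)
```

Because sort_values returns a new DataFrame rather than reordering the one it is called on, the result has to be assigned to a variable (or displayed directly) to be of any use.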

The ascending parameter is optional, and when not passed, it is set to True by default. Now, we will look at how to sort data by more than one column. To do this, we pass the list of columns by which we want our data to be sorted to the by parameter of the sort_values method, as follows:

sorted_zillow = zillow.sort_values(by=['Metro', 'County']) 

The data has now been sorted by the Metro column first, and then by the County column; that is, in the same order in which we passed them to the sort_values method. We can take the multiple-column sort further and introduce a mixed sort order. For example, we can sort by three columns, Metro, County, and Zhvi (the price column), as follows:

sorted_zillow = zillow.sort_values(by=['Metro', 'County', 'Zhvi'], ascending=[True, True, False]) 
sorted_zillow.head() 
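To see how a mixed sort order resolves ties, here is a hedged sketch on a small made-up DataFrame. The name sample and its values are illustrative assumptions, not rows from the Zillow dataset; two rows deliberately share the same Metro and County so that the descending Zhvi order becomes visible.

```python
import pandas as pd

# Hypothetical data; two rows share Metro and County to show tie-breaking.
sample = pd.DataFrame({
    "Metro": ["Chicago", "Chicago", "Chicago", "Austin"],
    "County": ["Cook", "Cook", "DuPage", "Travis"],
    "Zhvi": [200000, 350000, 300000, 400000],
})

# One Boolean per column in `by`: Metro and County ascending, Zhvi descending.
mixed = sample.sort_values(
    by=["Metro", "County", "Zhvi"],
    ascending=[True, True, False],
)

# Original index labels in their new order, and the rows as plain tuples.
mixed_index = mixed.index.tolist()
mixed_rows = list(mixed.itertuples(index=False, name=None))

print(mixed)
```

Within the Chicago and Cook group, the row with the higher Zhvi comes first, while the Metro and County columns themselves still run in ascending order.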

Notice that we are passing a list of three Boolean values to the ascending parameter, one for each column in the by list. This sets the sort order to ascending for Metro and County, and descending for the last column, which is Zhvi:

Next, we will see how to sort a Series object. First, let's create a series by selecting the RegionID column from our dataset, as follows:

regions = zillow.RegionID 
type(regions) 

Before we sort it, let's look at the original series by using regions.head(). The output would be as follows:

Now, let's sort it by calling the sort_values method on it. Since a series holds only one column of values, we don't need to pass any column name. Hence, the code to sort the data would be regions.sort_values().head(), and the output would be as follows:
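The actual RegionID values are not available here, so as a minimal sketch, the same call on a made-up series behaves as shown below. The name regions_sample and its numbers are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical RegionID-like values; not taken from the Zillow dataset.
regions_sample = pd.Series([84654, 61639, 91982, 84616], name="RegionID")

# No column name is needed: a Series has a single column of values.
sorted_regions = regions_sample.sort_values()

sorted_values = sorted_regions.tolist()
sorted_index = sorted_regions.index.tolist()

print(sorted_regions.head())
```

Each value keeps its original index label after sorting, so the index of the result is no longer in sequential order.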
