Sneak-peek at the data types

Let's now look at our data types:

df.dtypes 

The preceding code results in the following output:

This is what we want to see. Notice that because a unit can have a half bath, the bath count is stored as a float rather than an integer.
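To make the point about half baths concrete, here is a minimal sketch on a made-up frame. The `toy` data and the `beds` and `baths` column names are assumptions for illustration; only `neighborhood` and `rent` appear in the code in this section.

```python
import pandas as pd

# Hypothetical listings; 'beds' and 'baths' are assumed column names.
toy = pd.DataFrame({
    "neighborhood": ["Upper East Side", "Chelsea", "Harlem"],
    "rent": [3200, 4100, 2500],
    "beds": [1, 2, 1],
    "baths": [1.0, 1.5, 1.0],
})

# Whole-number columns are inferred as integers, the bath count as a float.
print(toy.dtypes)

# Casting to an integer would truncate the half bath (1.5 becomes 1).
truncated = toy["baths"].astype(int)
print(truncated.tolist())
```

Keeping the column as a float preserves the half-bath information that an integer cast would drop.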

Next, let's inspect the data, starting with a count of the number of units in each neighborhood:

(df.groupby('neighborhood')['rent'].count().to_frame('count')
   .sort_values(by='count', ascending=False))

The preceding code generates the following output:

It looks like most of the units are in Manhattan, which is what we might expect. Let's make sure that our neighborhood strings are clean. One way to check is to filter for a single neighborhood name and count each distinct spelling that matches:

df[df['neighborhood'].str.contains('Upper East Side')]['neighborhood'].value_counts() 

The preceding code generates the following output:

It looks like we have some issues with leading and possibly trailing spaces. Let's clean that up with the following code:

df['neighborhood'] = df['neighborhood'].str.strip()

That should clear it up. Let's validate that:

df[df['neighborhood'].str.contains('Upper East Side')]['neighborhood'].value_counts() 

The preceding code results in the following output:
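As a self-contained illustration of the same check, the sketch below runs the before-and-after comparison on a made-up Series. The labels are invented for the example, and it uses `.str.strip()`, which has the same effect on plain strings as mapping `x.strip()` over the column.

```python
import pandas as pd

# Made-up labels with stray spaces, standing in for the real column.
s = pd.Series([
    "Upper East Side",
    " Upper East Side",
    "Upper East Side ",
    "Chelsea",
])

# Before cleaning, each whitespace variant is counted as its own label.
before = s[s.str.contains("Upper East Side")].value_counts()

# Strip leading and trailing whitespace, then repeat the same check.
cleaned = s.str.strip()
after = cleaned[cleaned.str.contains("Upper East Side")].value_counts()

print(len(before))  # number of distinct spellings before stripping
print(after)
```

After stripping, the three variants collapse into a single label.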

Perfect. Exactly what we want to see. At this point, we can do a few more inspections. Let's just take a look at the mean rent by neighborhood:

(df.groupby('neighborhood')['rent'].mean().to_frame('mean')
   .sort_values(by='mean', ascending=False))

The preceding code results in the following output:

The Lincoln Square area appears to have the highest rent on average. At this point, we could continue querying the data for interesting patterns, but let's move on to visualizing the data.
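One possible follow-up query, sketched here on invented numbers, is to compute the count and the mean in a single pass so that an average based on only a few listings is easy to spot. The `toy` frame and its values are assumptions for illustration, not the chapter's data.

```python
import pandas as pd

# Invented rents; only the column names match the ones used in this section.
toy = pd.DataFrame({
    "neighborhood": ["Lincoln Square", "Lincoln Square",
                     "Harlem", "Harlem", "Harlem"],
    "rent": [5000, 6000, 2000, 2500, 3000],
})

# Aggregate both statistics at once and sort by the mean rent.
summary = (
    toy.groupby("neighborhood")["rent"]
    .agg(["count", "mean"])
    .sort_values(by="mean", ascending=False)
)
print(summary)
```

Reading the two columns side by side shows how many units stand behind each neighborhood's average.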
