Forecasting

Let's say we've decided from our prior analysis that we are interested in three particular ZIP codes: 10002, 10003, and 10009. How can we use our model to determine what we should pay for a given apartment? Let's now take a look.

First, we need to know what the inputs into the model looked like so that we know how to enter a new set of values. Let's take a look at our X matrix:

X.head() 

The preceding code generates the following output:

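What we see is that our input is coded with what are called dummy variables. Because a ZIP code is a category rather than a numerical quantity, it is represented with dummy coding: each ZIP code gets its own 0/1 column. If the apartment is in 10003, then the 10003 column is coded as 1, while all the other ZIP code columns are coded as 0. (One ZIP code serves as the baseline and has no column of its own; it is represented by every ZIP column being 0.) Beds, on the other hand, are numerical, so they are coded with the actual number of bedrooms. So let's now create our own input row to predict: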

import numpy as np
import pandas as pd

# One zero per feature, labeled with X's column names
to_pred_idx = X.iloc[0].index
to_pred_zeros = np.zeros(len(to_pred_idx))
tpdf = pd.DataFrame(to_pred_zeros, index=to_pred_idx, columns=['value'])

tpdf

The preceding code generates the following output:

We have just used the index from the X matrix and filled in the data with all zeros. Let's now fill in our values. We are going to price a one-bedroom apartment in the 10009 ZIP code:

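tpdf.loc['Intercept'] = 1      # constant term
tpdf.loc['beds'] = 1           # one bedroom
tpdf.loc['zip[T.10009]'] = 1   # dummy for ZIP 10009; every other ZIP dummy stays 0

tpdf

The Intercept value must always be set to 1. In the design matrix, the intercept is a constant column of ones, so multiplying it by the fitted intercept coefficient adds that coefficient to every prediction; leaving it at 0 would silently drop the intercept and return a wrong prediction.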

The preceding code generates the following output:

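We have set our features to the appropriate values, so let's now use our model to return a prediction. Because tpdf['value'] is a single column (a Series), we convert it to a DataFrame with to_frame() and then transpose it with .T, which gives one row with one column per feature, the layout the model expects. We do this as follows: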

results.predict(tpdf['value'].to_frame().T) 

The preceding code generates the following output:

You will recall that results was the variable name we saved our model to. That model object has a .predict() method, which we call with our input values. And, as you can see, the model returns a predicted value.
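To see what predict() is doing with that row, here is a minimal sketch. It assumes results is a statsmodels OLS fit on the design matrix X, in which case a prediction is the input row multiplied term by term with the fitted coefficients (results.params) and summed. The coefficient values below are made up purely for illustration; they are not the values from our model.

```python
import numpy as np
import pandas as pd

# Hypothetical fitted coefficients (NOT the book's model); with the real
# model these would come from results.params.
params = pd.Series({"Intercept": 2000.0, "beds": 800.0,
                    "zip[T.10003]": 500.0, "zip[T.10009]": 300.0})

# Build the input the same way as tpdf: all zeros, then set the features.
tpdf = pd.DataFrame(np.zeros(len(params)), index=params.index, columns=["value"])
tpdf.loc["Intercept"] = 1
tpdf.loc["beds"] = 1
tpdf.loc["zip[T.10009]"] = 1

# to_frame().T turns the single column into a single row (1 x n_features):
# one row per apartment, one column per feature.
row = tpdf["value"].to_frame().T
print(row.shape)

# For a linear model, the prediction is the dot product of the input row
# with the coefficient vector: Intercept + beds * 1 + zip[T.10009] * 1.
prediction = row.values @ params.values
print(prediction)
```

Because the ZIP dummies are 0 everywhere except the chosen ZIP code, only that one ZIP coefficient contributes to the sum.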

What if we want to add another bedroom? We can do it as follows:

  1. Let's change our inputs and see:
tpdf['value'] = 0  # reset every feature before setting the new values
tpdf.loc['Intercept'] = 1 
tpdf.loc['beds'] = 2 
tpdf.loc['zip[T.10009]'] = 1 
  2. Then we'll run the prediction again:
results.predict(tpdf['value'].to_frame().T) 

The preceding code generates the following output:

  3. It looks like that extra bedroom will cost us about $800 more a month. But what if we choose 10069 instead? Let's change our input and see:
tpdf['value'] = 0  # reset, which also clears the zip[T.10009] dummy set earlier
tpdf.loc['Intercept'] = 1 
tpdf.loc['beds'] = 2 
tpdf.loc['zip[T.10069]'] = 1 
 
results.predict(tpdf['value'].to_frame().T) 

The preceding code generates the following output:

According to our model, a two-bedroom apartment in the Lincoln Center area (10069) is going to cost a pretty penny compared to one in the East Village (10009).
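Each of the what-if steps above repeats the same reset-and-fill pattern, so it can be folded into a small helper. The sketch below assumes X's columns follow the Intercept / beds / zip[T.xxxxx] naming shown earlier. make_input_row, StubOLS, and all coefficient values are hypothetical stand-ins so the example runs on its own; with the real model you would pass X.columns and call results.predict instead.

```python
import pandas as pd


def make_input_row(columns, beds, zip_code=None):
    """Hypothetical helper: build a one-row input in the same layout as X.

    Sets Intercept to 1, beds to the given count, and a 1 in the
    zip[T.<zip_code>] dummy column. zip_code=None leaves every ZIP dummy at 0,
    which selects the baseline ZIP absorbed by the intercept.
    """
    row = pd.Series(0.0, index=columns)
    row["Intercept"] = 1
    row["beds"] = beds
    if zip_code is not None:
        col = f"zip[T.{zip_code}]"
        if col not in row.index:
            raise ValueError(f"no dummy column {col!r}; was {zip_code} in the training data?")
        row[col] = 1
    # to_frame().T: one row per apartment, one column per feature
    return row.to_frame().T


class StubOLS:
    """Stand-in for a fitted OLS results object, with made-up coefficients."""

    def __init__(self, params):
        self.params = params

    def predict(self, exog):
        # Linear prediction: each row dotted with the coefficient vector
        return exog[self.params.index].to_numpy() @ self.params.to_numpy()


params = pd.Series({"Intercept": 2000.0, "beds": 800.0,
                    "zip[T.10009]": 300.0, "zip[T.10069]": 3000.0})
model = StubOLS(params)
cols = params.index

scenarios = [(1, "10009"), (2, "10009"), (2, "10069")]
prices = {s: model.predict(make_input_row(cols, *s))[0] for s in scenarios}
for (beds, zip_code), price in prices.items():
    print(f"{beds} bed(s) in {zip_code}: ${price:,.0f}")

# With no interaction terms, one extra bedroom shifts the prediction by
# exactly the beds coefficient, whatever the ZIP code.
print(prices[(2, "10009")] - prices[(1, "10009")] == params["beds"])
```

Raising an error for an unknown ZIP code is a deliberate choice: silently leaving all dummies at 0 would price a typo as the baseline ZIP code. The last line also shows why the "extra bedroom" difference is a single number in a model like this one: if there are no interaction terms, it is simply the fitted beds coefficient.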
