Linear regression to predict the temperature of a city

In this section, we are going to use another file. The file is found in the directory for Chapter 14, Modeling Weather Data Points with Python. The file is called GlobalLandTemperaturesByMajorCity.csv.

Let's import the file into our notebook:

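import pandas as pd
import matplotlib.pyplot as plt

#reading the file and keeping only the date and the average temperature
dframe = pd.read_csv('GlobalLandTemperaturesByMajorCity.csv')
df_ny = dframe[dframe['City'] == 'New York']
#keep the first two columns (dt and AverageTemperature); copy so later edits do not touch a view
df_ny = df_ny.iloc[:, :2].copy()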

It is always a good idea to examine the first few entries to see how our dataset looks. This can be done using the head method:

df_ny.head(10)

The output should look like the following:

Screenshot 14.8: First 10 entries from the file

Now, we can extract only the year from the timestamp and group the data by year, taking the mean temperature for each year. This can be done using the following snippet:

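#extract the year from the date string and average the temperature per year
a = df_ny['dt'].apply(lambda x: int(x[0:4]))
#numeric_only=True skips the string column 'dt', which cannot be averaged
grouped = df_ny.groupby(a).mean(numeric_only=True)
grouped.head(10)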
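
To make the grouping step easier to follow, here is a minimal sketch on a tiny, made-up DataFrame. The values and the names toy, years, and yearly are assumptions for illustration only; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'dt' and 'AverageTemperature' columns
toy = pd.DataFrame({
    'dt': ['1900-01-01', '1900-07-01', '1901-01-01', '1901-07-01'],
    'AverageTemperature': [-2.0, 24.0, -1.0, 25.0],
})

# The first four characters of each date string hold the year
years = toy['dt'].str[:4].astype(int)

# Group the rows by year and average the temperature within each group
yearly = toy.groupby(years)['AverageTemperature'].mean()
print(yearly)
```

Each year collapses to a single row whose value is the mean of that year's readings, which is what the grouped DataFrame holds for New York.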

And let's plot the temperature by year:

#plotting the data
plt.plot(grouped['AverageTemperature'])
plt.show()

The graph should look like the following:

Screenshot 14.9: Temperature of New York City by year

We can see there are several gaps in the plot, caused by NaN values in the data. We can close the gaps by filling each NaN with the value that precedes it:

#forward-fill the missing values, then plot the fixed data
df_ny['AverageTemperature'] = df_ny['AverageTemperature'].ffill()
grouped = df_ny.groupby(a).mean(numeric_only=True)
plt.plot(grouped['AverageTemperature'])
plt.xlabel('year')
plt.ylabel('temperature in degree celsius')
plt.title('New York average temperature versus year')
plt.show()

It can be seen from the following chart that AverageTemperature trends upward for the most part:

Screenshot 14.10: Temperature of New York City with preprocessed data
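
As a small illustration of what forward filling does, the sketch below applies it to a made-up series. The numbers and the names temps, filled, and leading are assumptions, not values from the file.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps in the middle and at the end
temps = pd.Series([10.0, np.nan, np.nan, 12.0, np.nan])

# Each NaN takes the most recent valid value that precedes it
filled = temps.ffill()
print(filled.tolist())

# A NaN at the very start has nothing before it, so it stays NaN
leading = pd.Series([np.nan, 5.0]).ffill()
print(leading.tolist())
```

Note that with monthly readings, a forward fill carries a value over from an earlier month, so a long run of missing values can skew that year's mean. For this dataset it is a simple way to close the gaps, but interpolation may be worth considering when the gaps are long.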

Now that the data is fixed, let's use linear regression to predict the temperature of New York City in the future. The first step is to import the library:

from sklearn.linear_model import LinearRegression as LinReg

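We also need to reshape the data, because scikit-learn expects the features as a two-dimensional array with one row per sample and one column per feature: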

#Reshape the index of 'grouped' i.e. years
x= grouped.index.values.reshape(-1,1)
#obtaining values of temperature
y = grouped['AverageTemperature'].values
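
The effect of reshape(-1, 1) can be checked on a small array. This is a sketch with made-up years; the names years and X are used only for illustration.

```python
import numpy as np

# A flat array of years, similar in shape to grouped.index.values
years = np.array([1900, 1901, 1902])
print(years.shape)

# -1 lets NumPy infer the number of rows; 1 fixes a single column
X = years.reshape(-1, 1)
print(X.shape)
print(X)
```

The flat array of three years becomes a column with three rows, which is the layout the fit and predict methods expect for a single feature.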

Now, let's create the model and compute its score:

#fitting the linear regression and computing its R^2 score on the training data
reg = LinReg()
reg.fit(x, y)
y_preds = reg.predict(x)
r2 = reg.score(x, y)
print(r2)

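The score printed in this case is 0.24223324942541424. For a regression model, the score method returns the coefficient of determination (R²) rather than a classification accuracy, so the straight line explains roughly 24% of the variance in the yearly averages. We can also plot the temperature data against the years, together with the fitted model: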

#plotting data along with regression
plt.scatter(x=x, y=y_preds)
plt.scatter(x=x,y=y, c='r')
plt.ylabel('Average Temperature in degree celsius')
plt.xlabel('year')
plt.show()

The output of the code should appear as follows:

Screenshot 14.11: Linear regression model generated from the dataset
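
To see where the fitted line and its score come from, the following is a minimal pure-Python sketch of ordinary least squares for a single feature, together with the R² calculation. It is an illustration under stated assumptions, not scikit-learn's implementation: the functions fit_line and r_squared and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical yearly averages, not taken from the dataset
xs = [2000, 2001, 2002, 2003]
ys = [10.0, 10.5, 10.2, 11.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(slope, intercept, r2)
```

A value of R² close to 1 means the line accounts for most of the year-to-year variation, while a value close to 0 means it accounts for very little.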

Now, let's use the model to predict future temperature values:

#finding future values of temperature
#predict expects a 2D array: one row per sample, one column per feature
reg.predict([[2048]])

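The output is array([10.70528732]), that is, a predicted average temperature of about 10.7 degrees Celsius for the year 2048. Given the low score reported earlier, this should be read as a rough extrapolation of the trend rather than a precise forecast. Similarly, we can use other types of algorithms to build models for analyzing the data.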
