This section uses a different file, GlobalLandTemperaturesByMajorCity.csv. It is found in the directory for Chapter 14, Modeling Weather Data Points with Python.
Let's import the file into our notebook:
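import pandas as pd
import matplotlib.pyplot as plt
#reading the file and keeping only the New York rows
dframe = pd.read_csv('GlobalLandTemperaturesByMajorCity.csv')
df_ny = dframe[dframe['City']=='New York']
#keeping only the date and the average temperature; copy() avoids modifying a view of dframe
df_ny = df_ny[['dt', 'AverageTemperature']].copy()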
It is always a good idea to examine the first few entries to see how our dataset looks. This can be done using the head method:
df_ny.head(10)
The output should look like the following:
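Now, we can extract the year from each timestamp and compute the mean temperature for each year. This can be done using the following snippet:
#extracting the year (the first four characters of the date string)
a = df_ny['dt'].apply(lambda x: int(x[0:4]))
#averaging the numeric columns per year; numeric_only skips the 'dt' strings
grouped = df_ny.groupby(a).mean(numeric_only=True)
grouped.head(10)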
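To make the grouping step concrete, here is a minimal sketch on a made-up four-row table. The sample values are illustrative assumptions, not rows from the dataset; only the column names dt and AverageTemperature come from the text above.

```python
import pandas as pd

# Hypothetical miniature of the New York slice: date strings plus a temperature.
sample = pd.DataFrame({
    "dt": ["1900-01-01", "1900-07-01", "1901-01-01", "1901-07-01"],
    "AverageTemperature": [-2.0, 24.0, -1.0, 25.0],
})

# The first four characters of each date string are the year.
years = sample["dt"].str[:4].astype(int)

# Group the temperature column by year and average within each group.
yearly_mean = sample["AverageTemperature"].groupby(years).mean()
print(yearly_mean)
```

Each year collapses to a single row holding the mean of its monthly readings, which is the shape the plotting code expects.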
Next, let's plot the mean temperature by year:
#plotting the data
plt.plot(grouped['AverageTemperature'])
plt.show()
The graph should look like the following:
We can see that there are several gaps in the line, caused by NaN values in the data. We can fill the gaps by replacing each NaN with the value that precedes it (a forward fill):
#forward-filling the missing values, then plotting the fixed data
df_ny['AverageTemperature'] = df_ny['AverageTemperature'].ffill()
grouped = df_ny.groupby(a).mean(numeric_only=True)
plt.plot(grouped['AverageTemperature'])
plt.xlabel('year')
plt.ylabel('temperature in degrees Celsius')
plt.title('New York average temperature versus year')
plt.show()
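The effect of a forward fill is easiest to see on a tiny series. The following sketch uses made-up values (an assumption for illustration only) and shows that each NaN takes the last valid value before it, while a NaN at the very start has nothing to copy from and stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps, including one at the very start.
temps = pd.Series([np.nan, 10.0, np.nan, np.nan, 12.5, np.nan])

# Propagate the last valid observation forward into each gap.
filled = temps.ffill()
print(filled.tolist())
```

If the leading gap matters for an analysis, it has to be handled separately, for example by dropping those rows.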
The following chart shows that the yearly average temperature trends upward for the most part:
Now that the data is fixed, let's use linear regression to predict the temperature of New York City in the future. The first step is to import the library:
from sklearn.linear_model import LinearRegression as LinReg
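scikit-learn expects the input features as a two-dimensional array, so we need to reshape the years into a single column:
#reshaping the index of 'grouped' (the years) into a column of shape (n, 1)
x = grouped.index.values.reshape(-1,1)
#obtaining the temperature values as the target
y = grouped['AverageTemperature'].values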
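To see what the reshape does, here is a minimal sketch on a made-up array of three years (the values are illustrative, not taken from the dataset). The -1 asks NumPy to infer the number of rows, and the 1 fixes the number of columns.

```python
import numpy as np

# Hypothetical one-dimensional array of years.
years = np.array([1900, 1901, 1902])
print(years.shape)   # one axis of length 3

# Turn it into a single column: one row per sample, one feature per row.
column = years.reshape(-1, 1)
print(column.shape)  # three rows, one column
```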
Now, let's create the model and compute its R² score (the coefficient of determination):
#fitting a linear regression and scoring it on the training data
reg = LinReg()
reg.fit(x,y)
y_preds = reg.predict(x)
r2 = reg.score(x,y)
print(r2)
The score printed in this case is 0.24223324942541424. This is not an accuracy in the classification sense: it means the fitted line explains roughly 24% of the variance in the yearly averages. We can also plot the yearly temperatures together with the fitted values:
#plotting data along with regression
plt.scatter(x=x, y=y_preds)
plt.scatter(x=x,y=y, c='r')
plt.ylabel('Average temperature in degrees Celsius')
plt.xlabel('year')
plt.show()
The output of the code should appear as follows:
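For reference, the value returned by reg.score for a linear regression is the coefficient of determination, R². The following minimal sketch computes it by hand with NumPy on made-up numbers. The arrays and the use of np.polyfit are illustrative assumptions, not the chapter's data or method.

```python
import numpy as np

# Hypothetical yearly averages (illustrative values only).
x = np.array([1900.0, 1950.0, 2000.0, 2010.0])
y = np.array([9.0, 9.5, 10.5, 10.0])

# Least-squares straight line: y is approximated by slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(r_squared)
```

A value near 1 means the line accounts for most of the variation in y, while a value near 0 means it does little better than predicting the mean.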
Now, let's use the model to predict future temperature values:
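#finding future values of temperature; predict expects a 2D array of shape (n_samples, n_features)
reg.predict([[2048]])
The output is array([10.70528732]), that is, a predicted yearly average of about 10.7 degrees Celsius for 2048. Given that the model's score is only about 0.24, this extrapolation should be treated with caution. Similarly, we can use other types of algorithms to build models for analyzing the data.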