Exploring streaming sensor data from a weather station

Let's get our hands dirty with streaming data. Assuming that your virtual machine is running and ready to use, follow these steps:

  1. Open a Terminal shell and change into the sensor directory. This example assumes that you downloaded the files into the Downloads folder:
cd Downloads/HandsOnBigData/CH09/sensor
  1. View the streaming weather station data. Run stream-data.py to see the streaming data from the weather station:
./stream-data.py

Running the preceding script prints the measurements as the weather station produces them. The timestamps show that data arrives about once per second. Different measurement types are also produced at different frequencies: for example, R1 is measured every second, but R2 is measured less often. The following code snippet connects to the weather station and reads the streaming data:

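#!/usr/bin/python
import socket

# Open a TCP connection to the weather station's data feed.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('rtd.hpwren.ucsd.edu', 12020))

# Read 60 chunks and print the payload after the first two space-separated fields.
for i in range(0, 60):
    d = s.recv(1024).decode('utf-8')  # recv returns bytes on Python 3
    p = d.split(' ', 2)
    print("{0}: {1}".format(i, p[2]))

s.close()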
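One caveat with the loop above: recv(1024) returns whatever bytes are available at that moment, so a single call may hold part of a record or several records at once. The following is a minimal sketch of one way to handle this, assuming that the feed terminates each record with a newline (an assumption, not something the text confirms). The split_records helper and the sample chunks are hypothetical and exist only for illustration:

```python
def split_records(buffer):
    """Split buffered text into complete records plus the unfinished tail.

    Assumes newline-terminated records (an assumption about the feed).
    """
    lines = buffer.split('\n')
    # The last element is '' if the buffer ended on a newline,
    # otherwise it is a partial record that needs more data.
    tail = lines.pop()
    records = [line.rstrip('\r') for line in lines if line.strip()]
    return records, tail


# Made-up chunks, shaped like what two successive recv() calls might return.
buffer = ''
received = []
for chunk in ['a b Sm=1.0M\na b Sm', '=1.2M\n']:
    buffer += chunk
    records, buffer = split_records(buffer)
    received.extend(records)
```

In a real client, each decoded recv() result would be appended to the buffer in the same way, and only the complete records would be split and printed.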
  1. Create a plot of the streaming data. We can plot the streaming data by running stream-plot-data.py, as follows:
./stream-plot-data.py Sm

The code to plot the streaming data is as follows:

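#!/usr/bin/python

import sys
import re
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
from pytz import timezone

# As written, sys.argv[1] is a data file and sys.argv[2] is a measurement name (for example, Sm).
x = []  # timestamps (seconds since the epoch)
y = []  # measurement values

with open(sys.argv[1], 'r') as data_file:
    for line in data_file:
        # Each line holds a timestamp, whitespace, then comma-separated name=value fields.
        parts = re.split(r"\s+", line)
        data = parts[1].split(",")

        for field in data:
            # Keep only the requested measurement and capture its decimal value.
            match = re.match(sys.argv[2] + r'=(\d+\.\d+).*', field)
            if match:
                x.append(float(parts[0]))
                y.append(float(match.group(1)))

fig = plt.figure()
ax = fig.add_subplot(111)

# Convert epoch seconds to Matplotlib date numbers.
# epoch2num and plot_date are legacy calls; recent Matplotlib releases deprecate or remove them.
secs = mdate.epoch2num(x)

ax.plot_date(secs, y)

plt.xlabel('time')
plt.ylabel(sys.argv[2])

# Show the time axis in US/Pacific time.
date_formatter = mdate.DateFormatter('%H:%M.%S', tz=timezone('US/Pacific'))
ax.xaxis.set_major_formatter(date_formatter)
fig.autofmt_xdate()

plt.show()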

From this plot, we can see that it is updated less frequently than the first plot, because air temperature measurements are produced less frequently. To plot the graph, we used the matplotlib library. To format the time and date, we used the timezone function from the pytz library together with matplotlib's DateFormatter.

The preceding plot shows the average wind speed (Sm) and is updated every time a new measurement is generated. Here, that is about once per second, as new data arrives.
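The observation that some measurements arrive less often than others can also be checked numerically. The following is a rough sketch, assuming the same line layout that the plotting script parses (a timestamp, whitespace, then comma-separated name=value fields). The helpers parse_line and mean_intervals are hypothetical, and the sample lines, including the Ta name and the unit suffixes, are made up for illustration rather than taken from the station:

```python
import re

# A field looks like name=decimal, optionally followed by a unit suffix.
FIELD_RE = re.compile(r'(\w+)=(\d+\.\d+)')


def parse_line(line):
    """Return (timestamp, {name: value}) for one 'timestamp fields' line."""
    parts = line.split()
    timestamp = float(parts[0])
    values = {}
    for field in parts[1].split(','):
        match = FIELD_RE.match(field)
        if match:
            values[match.group(1)] = float(match.group(2))
    return timestamp, values


def mean_intervals(lines):
    """Average number of seconds between successive readings of each measurement."""
    seen = {}  # measurement name -> list of timestamps at which it appeared
    for line in lines:
        timestamp, values = parse_line(line)
        for name in values:
            seen.setdefault(name, []).append(timestamp)
    return {name: (ts[-1] - ts[0]) / (len(ts) - 1)
            for name, ts in seen.items() if len(ts) > 1}


# Made-up sample lines, not real station output.
sample = [
    '100.0 Sm=1.5M,Ta=20.1C',
    '101.0 Sm=1.7M',
    '102.0 Sm=1.6M,Ta=20.3C',
]
intervals = mean_intervals(sample)
```

With this sample, Sm appears on every line and Ta on every other line, so the computed interval for Ta is twice that of Sm.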
