First, we will read the close price data. Second, we will split the prices according to the day of the week. Third, for each weekday, we will calculate the average price. Finally, we will find out which day of the week has the highest average and which has the lowest average. A health warning before we commence – you might be tempted to use the result to buy stock on one day and sell on the other. However, we don't have enough data to make this kind of decision. Please consult a professional statistician first!
Coders hate dates because they are so complicated! NumPy is very much oriented towards floating point operations. For that reason, we need to take extra effort to process dates. Try it out yourself; put the following code in a script or use the one that comes with the book:
dates, close=np.loadtxt('data.csv', delimiter=',', usecols=(1,6), unpack=True)
Execute the script and the following error will appear:
ValueError: invalid literal for float(): 28-01-2011
Now perform the following steps to deal with dates:
loadtxt
function has a special parameter for this purpose. The parameter is called converters
and is a dictionary that links columns with so-called converter functions. It is our responsibility to write the converter function.Let's write the function down:
# Monday 0 # Tuesday 1 # Wednesday 2 # Thursday 3 # Friday 4 # Saturday 5 # Sunday 6 def datestr2num(s): return datetime.datetime.strptime(s, "%d-%m-%Y").date().weekday()
We give the datestr2num
function dates as a string, such as "28-01-2011". The string is first turned into a datetime
object using a specified format "%d-%m-%Y"
. This is, by the way, standard Python and is not related to NumPy itself. Second, the datetime
object is turned into a day. Finally the weekday
method is called on the date to return a number. As you can read in the comments, the number is between 0 and 6. 0 is for instance Monday and 6 is Sunday. The actual number, of course, is not important for our algorithm; it is only used as identification.
dates, close=np.loadtxt('data.csv', delimiter=',', usecols=(1,6), converters={1: datestr2num}, unpack=True) print "Dates =", dates
This prints the following output:
Dates = [ 4. 0. 1. 2. 3. 4. 0. 1. 2. 3. 4. 0. 1. 2. 3. 4. 1. 2. 4. 0. 1. 2. 3. 4. 0. 1. 2. 3. 4.]
No Saturdays and Sundays, as you can see. Exchanges are closed over the weekend.
averages = np.zeros(5)
This array will hold the averages for each weekday.
where
function that returns indices of the array for elements that conform to a specified condition. The take
function can use these indices and takes the values of the corresponding array items. We will use the take
function to get the close prices for each weekday. In the following loop we go through the date values 0 to 4, better known as Monday to Friday. We get the indices with the where
function for each day and store it in the indices
array. Then, we retrieve the values corresponding to the indices, using the take
function. Finally, we compute an average for each weekday and store it in the averages
array, as follows:for i in range(5): indices = np.where(dates == i) prices = np.take(close, indices) avg = np.mean(prices) print "Day", i, "prices", prices, "Average", avg averages[i] = avg
The loop prints the following output:
Day 0 prices [[ 339.32 351.88 359.18 353.21 355.36]] Average 351.79 Day 1 prices [[ 345.03 355.2 359.9 338.61 349.31 355.76]] Average 350.635 Day 2 prices [[ 344.32 358.16 363.13 342.62 352.12 352.47]] Average 352.136666667 Day 3 prices [[ 343.44 354.54 358.3 342.88 359.56 346.67]] Average 350.898333333 Day 4 prices [[ 336.1 346.5 356.85 350.56 348.16 360. 351.99]] Average 350.022857143
max
and min
functions, as shown next:top = np.max(averages) print "Highest average", top print "Top day of the week", np.argmax(averages) bottom = np.min(averages) print "Lowest average", bottom print "Bottom day of the week", np.argmin(averages)
The output is as follows:
Highest average 352.136666667 Top day of the week 2 Lowest average 350.022857143 Bottom day of the week 4
The argmin
function
returned the index of the lowest value in the averages
array. The index returned was 4
, which corresponds to Friday. The argmax
function returned the index of the highest value in the averages
array. The index returned was 2
, which corresponds to Wednesday (see weekdays.py
).
import numpy as np from datetime import datetime # Monday 0 # Tuesday 1 # Wednesday 2 # Thursday 3 # Friday 4 # Saturday 5 # Sunday 6 def datestr2num(s): return datetime.strptime(s, "%d-%m-%Y").date().weekday() dates, close=np.loadtxt('data.csv', delimiter=',', usecols=(1,6), converters={1: datestr2num}, unpack=True) print "Dates =", dates averages = np.zeros(5) for i in range(5): indices = np.where(dates == i) prices = np.take(close, indices) avg = np.mean(prices) print "Day", i, "prices", prices, "Average", avg averages[i] = avg top = np.max(averages) print "Highest average", top print "Top day of the week", np.argmax(averages) bottom = np.min(averages) print "Lowest average", bottom print "Bottom day of the week", np.argmin(averages
Hey, that was fun! For the sample data, it appears that Friday is the cheapest day and Wednesday is the day when your Apple stock will be worth the most. Ignoring the fact that we have very little data, is there a better method to compute the averages? Shouldn't we involve volume data as well? Maybe it makes more sense to you to do a time-weighted average. Give it a go! Calculate the VWAP and TWAP. You can find some hints on how to go about doing this at the beginning of this chapter.
18.217.5.86