Time for action – dealing with dates

First, we will read the close price data. Second, we will split the prices according to the day of the week. Third, for each weekday, we will calculate the average price. Finally, we will find out which day of the week has the highest average and which has the lowest average. A health warning before we commence – you might be tempted to use the result to buy stock on one day and sell on the other. However, we don't have enough data to make this kind of decision. Please consult a professional statistician first!

Coders hate dates because they are so complicated! NumPy is very much oriented towards floating-point operations. For that reason, we need to make an extra effort to process dates. Try it out for yourself; put the following code in a script or use the one that comes with the book:

dates, close=np.loadtxt('data.csv', delimiter=',',
  usecols=(1,6), unpack=True)

Execute the script and the following error will appear:

ValueError: invalid literal for float(): 28-01-2011

Now perform the following steps to deal with dates:

  1. Obviously, NumPy tried to convert the dates into floats. What we have to do is explicitly tell NumPy how to convert the dates. The loadtxt function has a special parameter for this purpose. The parameter is called converters and is a dictionary that links columns with so-called converter functions. It is our responsibility to write the converter function.

    Let's write the function down:

    from datetime import datetime

    # Monday 0
    # Tuesday 1
    # Wednesday 2
    # Thursday 3
    # Friday 4
    # Saturday 5
    # Sunday 6
    def datestr2num(s):
        return datetime.strptime(s, "%d-%m-%Y").date().weekday()

    We give the datestr2num function a date as a string, such as "28-01-2011". The string is first turned into a datetime object using the specified format "%d-%m-%Y". This is, by the way, standard Python and is not related to NumPy itself. Second, the datetime object is turned into a date. Finally, the weekday method is called on the date to return a number. As you can read in the comments, the number is between 0 and 6: 0 corresponds to Monday, for instance, and 6 to Sunday. The actual number, of course, is not important for our algorithm; it is only used as identification.

  2. Now we will hook up our date converter function to load the data.
    dates, close=np.loadtxt('data.csv', delimiter=',', usecols=(1,6), converters={1: datestr2num}, unpack=True)
    print "Dates =", dates

    This prints the following output:

    Dates = [ 4.  0.  1.  2.  3.  4.  0.  1.  2.  3.  4.  0.  1.  2.  3.  4.  1.  2.  4.  0.  1.  2.  3.  4.  0.  1.  2.  3.  4.]

    No Saturdays and Sundays, as you can see. Exchanges are closed over the weekend.

  3. We will now make an array with five elements, one for each day of the week. The values of the array will be initialized to 0.
    averages = np.zeros(5)

    This array will hold the averages for each weekday.

  4. We already learned about the where function, which returns the indices of the array elements that conform to a specified condition. The take function can use these indices to pick out the values of the corresponding array items. We will use the take function to get the close prices for each weekday. In the following loop, we go through the date values 0 to 4, better known as Monday to Friday. We get the indices for each day with the where function and store them in the indices variable. Then, we retrieve the values corresponding to those indices using the take function. Finally, we compute an average for each weekday and store it in the averages array, as follows (a short standalone example right after these steps illustrates what where and take return):
    for i in range(5):
        indices = np.where(dates == i) 
        prices = np.take(close, indices)
        avg = np.mean(prices)
        print "Day", i, "prices", prices, "Average", avg
        averages[i] = avg

    The loop prints the following output:

    Day 0 prices [[ 339.32  351.88  359.18  353.21  355.36]] Average 351.79
    Day 1 prices [[ 345.03  355.2   359.9   338.61  349.31  355.76]] Average 350.635
    Day 2 prices [[ 344.32  358.16  363.13  342.62  352.12  352.47]] Average 352.136666667
    Day 3 prices [[ 343.44  354.54  358.3   342.88  359.56  346.67]] Average 350.898333333
    Day 4 prices [[ 336.1   346.5   356.85  350.56  348.16  360.    351.99]] Average 350.022857143
  5. If you want, you can go ahead and work out by hand which day has the highest average and which has the lowest. However, it is just as easy to find this out with the max and min functions, as shown next:
    top = np.max(averages)
    print "Highest average", top
    print "Top day of the week",  np.argmax(averages)
    bottom = np.min(averages)
    print "Lowest average", bottom
    print "Bottom day of the week",  np.argmin(averages)

    The output is as follows:

    Highest average 352.136666667
    Top day of the week 2
    Lowest average 350.022857143
    Bottom day of the week 4
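
In the output of step 4, you may have noticed the double brackets around the prices. The following short, standalone snippet reuses a handful of the dates and close prices printed earlier to illustrate why: where returns a tuple of index arrays, and passing that tuple to take yields a two-dimensional result. The mean function is not bothered by the extra dimension.

import numpy as np

# A few of the dates and close prices printed above, just for illustration.
dates = np.array([4., 0., 1., 2., 3., 4., 0.])
close = np.array([336.1, 339.32, 345.03, 344.32, 343.44, 346.5, 351.88])

indices = np.where(dates == 0)    # a tuple of index arrays: (array([1, 6]),)
prices = np.take(close, indices)  # 2-D because of the tuple: [[ 339.32  351.88]]
avg = np.mean(prices)             # the extra dimension does not matter: 345.6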

What just happened?

The argmin function returned the index of the lowest value in the averages array. The index returned was 4, which corresponds to Friday. The argmax function returned the index of the highest value in the averages array. The index returned was 2, which corresponds to Wednesday (see weekdays.py).

import numpy as np
from datetime import datetime

# Monday 0
# Tuesday 1
# Wednesday 2
# Thursday 3
# Friday 4
# Saturday 5
# Sunday 6
def datestr2num(s):
    return datetime.strptime(s, "%d-%m-%Y").date().weekday()

dates, close=np.loadtxt('data.csv', delimiter=',', usecols=(1,6), converters={1: datestr2num}, unpack=True)
print "Dates =", dates

averages = np.zeros(5)

for i in range(5):
    indices = np.where(dates == i) 
    prices = np.take(close, indices)
    avg = np.mean(prices)
    print "Day", i, "prices", prices, "Average", avg
    averages[i] = avg


top = np.max(averages)
print "Highest average", top
print "Top day of the week", np.argmax(averages)

bottom = np.min(averages)
print "Lowest average", bottom
print "Bottom day of the week", np.argmin(averages

Have a go hero – looking at VWAP and TWAP

Hey, that was fun! For the sample data, it appears that Friday is the cheapest day and Wednesday is the day when your Apple stock will be worth the most. Ignoring the fact that we have very little data, is there a better method to compute the averages? Shouldn't we involve volume data as well? Maybe it makes more sense to you to do a time-weighted average. Give it a go! Calculate the VWAP and TWAP. You can find some hints on how to go about doing this at the beginning of this chapter.
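
If you want something to check your attempt against, here is a minimal sketch. It assumes, as in the examples at the beginning of this chapter, that data.csv holds the close price in column 6 and the volume in column 7; np.average accepts a weights argument, which does most of the work.

import numpy as np

# A possible solution sketch; the column numbers are assumptions based on
# the data.csv layout used earlier in this chapter.
c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6, 7), unpack=True)

# Volume-weighted average price: weight each close price by its volume.
vwap = np.average(c, weights=v)

# A simple time-weighted average price: the weights grow with time, so more
# recent prices count more heavily; other weighting schemes are possible.
t = np.arange(len(c))
twap = np.average(c, weights=t)

print("VWAP = %s" % vwap)
print("TWAP = %s" % twap)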
