Every analysis of this kind has weaknesses that we need to question, and this chapter is no exception.
The main weakness of this project was that it was carried out on far too little data. This cuts in several ways:
For all of these, there are reasons we didn't address the issues in this chapter. However, if you plan to take this further, you'd need to figure out some way around these.
There are several ways to look at the results, too. On the day we looked at, the predictions all clustered close to zero. In fact, this stock is relatively stable, so a model that always predicted little change would still achieve a fairly low SSE. Large changes seem to happen only occasionally, and the error from failing to predict them has a low impact on the SSE.
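One way to check whether a low SSE means anything is to compare the model against a naive baseline that always predicts zero change. The following is a minimal sketch of that comparison; the `changes` and `model` values are hypothetical numbers chosen for illustration, not data or results from this chapter, and the helper names `sse` and `zero_baseline_sse` are likewise illustrative.

```python
def sse(actual, predicted):
    """Sum of squared errors between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def zero_baseline_sse(actual):
    """SSE of a naive model that always predicts no change (0.0)."""
    return sse(actual, [0.0] * len(actual))


# Hypothetical daily price changes: mostly small, with one larger move.
changes = [0.1, -0.2, 0.0, 0.1, 0.6, -0.1]
# Hypothetical model predictions, all clustered near zero.
model = [0.05, -0.05, 0.0, 0.05, 0.1, 0.0]

print("model SSE:", round(sse(changes, model), 4))
print("baseline SSE:", round(zero_baseline_sse(changes), 4))
```

If the model's SSE is only modestly lower than the always-zero baseline's, as with these toy values, a low SSE on its own says little about whether the model has learned anything useful.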
Second, and more importantly, simply putting some stock data into a jar with some machine learning and shaking it is a risky endeavor. This isn't a get-rich-quick scheme, and by approaching it so naively, you're asking for trouble. In this case, that means losing money.
For one thing, there's a lot of noise in news articles, and the relationship between their content and stock prices is tenuous. In general, stock prices may not be predictable from news reports in the first place, whatever results we achieve in this study, particularly given how small it is.
Really, to do this well, you need to understand at least two things:
With this knowledge, you should be able to formulate a better model of how the stock prices change and which prices you should pay attention to.
But keep in mind that André Christoffer Andersen and Stian Mikelsen published a master's thesis in 2012 showing that it's very, very difficult to do better than buying and holding index funds (http://blog.andersen.im/wp-content/uploads/2012/12/ANovelAlgorithmicTradingFramework.pdf). So, if you do try this route, you have a hard, hard task in front of you.