One of my most trafficked posts is How to Download Historical Stock Data into Matlab. But truth be told, I haven’t opened Matlab in several months. I’m now a zealous devotee of R.

Let’s bring things up to date with instructions on how to download stock information into R. So, assuming you’ve got your ticker symbol stored in a variable…

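# Ticker symbol to look up
ticker = 'BUD'

# Build the Yahoo CSV URL for daily data (g=d) and read it straight into a data frame
prices <- read.csv(paste("http://ichart.finance.yahoo.com/table.csv?s=", ticker, "&g=d&ignore=.csv", sep = ""))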

I mean, that's so easy I don't even need to give you a script. Maybe you want to turn the prices into returns:
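Here is a minimal sketch of that step, not necessarily the exact line I used. It assumes the table comes back newest-first with an adjusted-close column that read.csv names Adj.Close. The toy data frame (made-up prices) stands in for the download so the snippet runs on its own.

```r
# Toy stand-in for the downloaded table (hypothetical values), newest row first.
prices <- data.frame(
  Date      = c("2012-01-05", "2012-01-04", "2012-01-03"),
  Adj.Close = c(110, 100, 80)
)

n <- nrow(prices)

# Simple daily return: today's adjusted close over yesterday's, minus 1.
# Rows run newest-first, so row i is "today" and row i + 1 is "yesterday".
returns <- prices$Adj.Close[-n] / prices$Adj.Close[-1] - 1
print(returns)
```

With the real download you would skip the toy data frame and run only the last few lines. If your data is sorted oldest-first, swap the two index expressions.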

Done!

Might as well do something with the data, I guess. How about a multiple regression to see if we can predict today's return (y) from the prior 3 days' (x1, x2, x3)? Let's use the most recent 100 days.

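# Assumes returns is ordered newest first, so index k + 1 is the return k days before index 1
y = returns[1:100]   # today's return
x1 = returns[2:101]  # return 1 day prior
x2 = returns[3:102]  # return 2 days prior
x3 = returns[4:103]  # return 3 days prior
model = lm(y ~ x1 + x2 + x3)
print(summary(model))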

Ahhh... unfortunately none of our predictor variables have coefficients that are statistically significant. And that's a pretty dismal R-squared. What should we do next: add more days? Add prior returns of other stocks? Simplify things and just try to predict price direction instead? Replace linear regression with a neural network? Random forest?
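If you'd rather pull those two numbers out of the fit than eyeball the printout, something like the sketch below should work. It runs on simulated returns (random noise with a hypothetical seed), not the BUD data, so the values it prints say nothing about the real stock.

```r
set.seed(1)                                   # hypothetical seed, only for repeatability
returns <- rnorm(103, mean = 0, sd = 0.01)    # simulated daily returns, newest first

# Same lag setup as the regression above
y  <- returns[1:100]
x1 <- returns[2:101]
x2 <- returns[3:102]
x3 <- returns[4:103]
model <- lm(y ~ x1 + x2 + x3)

fit <- summary(model)
p_values  <- coef(fit)[, "Pr(>|t|)"]   # one p-value per term, intercept included
r_squared <- fit$r.squared             # share of variance in y explained by the lags

print(p_values)
print(r_squared)
```

A p-value above your chosen threshold (0.05 is the usual convention) means that lag's coefficient can't be told apart from zero.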

Oh man, if 7 years ago Matlab had made statistical learning as easy and free as Stanford and R do today, I would have backtested myself silly.

But there's something I heard a great investor (guesses?) say that gave me a new perspective on data mining. To paraphrase, Rule #1 of quantitative investing is to respect the fact that you live in a non-stationary world.

If you find a pattern in historical stock data, then before you start putting money to work through it, you also need a thesis for both why it's there and why it will persist. Otherwise, when you enter the inevitable period where the algorithm has been losing money, you really have no idea whether something has changed and the pattern no longer works, or whether you should stick to your guns or even double down.

And that, of course, is the beauty of statistical value strategies like lowest-decile P/B. Their outperformance is a pattern that appears across time and markets. Behavioral finance has convincing theories as to why it has been there and always will be (loss aversion bias, preference for lotteries, stationarity of human nature). There are even ways to argue mathematically that these strategies are less risky and deliver higher risk-adjusted returns.

My laptop still adds an extra couple of degrees to any room it's in, relentlessly mining historical data for patterns. Only these days, it's all for Kaggle.