Note: Please read the disclaimer. The author is not providing professional investing advice or recommendations.
If you’ve arrived at this link just wanting to download a copy of my Matlab function that retrieves historical stock data and normalizes for splits & dividends, click here: get_hist_stock_data.m
If you want to understand a little more of the particulars, read on…
Many of us in the engineering field use Matlab for our daily university or professional work. It’s a fast and easy tool for algorithm development and simulations of all sorts. And often, once you’ve become proficient in Matlab and used it to solve many difficult problems, the Big Idea™ dawns. It goes something like this:
With all my background and experience in mathematics, programming, and problem solving, is there any reason that this skill set I’m drawing on right now to design a robust and efficient wireless communication system couldn’t also be used to develop a robust and efficient stock trading algorithm?
After all, isn’t a lot of engineering about recognizing and exploiting patterns or trends – or extracting meaningful information from signals corrupted with noise?
You can see it all now. Code up a trading algorithm, simulate it over historical stock data, tweak, adjust, and repeat until you have a system that would have historically delivered huge returns. Then unleash it upon an unsuspecting Wall Street…
… and pray that the future tracks the past…
So the first question is: where do you get historical stock data for backtesting? I had seen Compustat mentioned as the data source in many investing books, so I contacted them for a quote. Let’s just say that after paying for the license I wouldn’t have had any money left over to actually invest. As S&P kindly told me, it’s really for institutions, not private investors.
But it turns out there is a fairly good free database via Yahoo! Finance’s Historical Portal. It only has open, high, low, close, adjusted close, and volume, and it only covers businesses that are still alive (survivorship bias), but it’s at least a step in the right direction!
And depending on the algorithm you have in mind, these historical prices and volumes may be all you’re really after anyway. So the next question is how to get the data into Matlab easily.
The steps for downloading historical stock data from Yahoo! Finance are already described on the MathWorks site, but I’ll expand on them a bit.
Reading the data into Matlab is easy thanks to Matlab’s Java interface, so if you’d rather use Java than Matlab, the steps are very similar.
Step 1 is to create the proper URL name for the stock and time period you are after. This takes a little deciphering of the URL syntax used by Yahoo! for historical data – which you can do by clicking on various historical data options and noting what appears in the URL address. Here’s an example URL for some historical data (daily) in CSV format for Apple Computer, from November 15, 2005 through February 17, 2006:
http://ichart.finance.yahoo.com/table.csv?s=AAPL&a=10&b=15&c=2005&d=01&e=17&f=2006&g=d&ignore=.csv
You’ll find that s= is followed by the ticker symbol, a= by the start month (minus 1, so January is 00), b= by the start day of the month, and c= by the start year; d=, e=, and f= give the end month (minus 1), day, and year in the same way. The g= parameter lets you choose between daily, weekly, or monthly historical data.
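Rather than typing the URL by hand for every ticker and date range, you can assemble it with sprintf. Below is a minimal sketch that assumes the parameter layout described above: months are zero-based and zero-padded, as in the example’s d=01, and days are not padded. The codes 'w' and 'm' for weekly and monthly data are assumptions, since only 'd' appears in the example URL.

```matlab
% Sketch: build the Yahoo! historical-data URL from a ticker and two dates.
% Assumes the layout shown above (zero-based months, e.g. a=10 for November).
ticker     = 'AAPL';
start_date = datenum(2005, 11, 15);   % November 15, 2005
end_date   = datenum(2006, 2, 17);    % February 17, 2006
freq       = 'd';                     % 'd' daily; 'w'/'m' assumed for weekly/monthly

sv = datevec(start_date);             % [year month day hour minute second]
ev = datevec(end_date);

url_name = sprintf(['http://ichart.finance.yahoo.com/table.csv?s=%s' ...
    '&a=%02d&b=%d&c=%d&d=%02d&e=%d&f=%d&g=%s&ignore=.csv'], ...
    ticker, sv(2)-1, sv(3), sv(1), ev(2)-1, ev(3), ev(1), freq);
disp(url_name)
```

With these inputs the string matches the Apple example URL above character for character. That makes it easy to check the sketch before you loop it over a list of tickers.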
So the first step is to create a character string with the desired URL name as described above, saved in a variable called, say, url_name:
url_name = ['http://ichart.finance.yahoo.com/table.csv?' ...
    's=AAPL&a=10&b=15&c=2005&d=01&e=17&f=2006&g=d&ignore=.csv'];
Then we set up a buffer for reading from the URL:
buff_reader = java.io.BufferedReader(...
    java.io.InputStreamReader(openStream(java.net.URL(url_name))));
We first use this buffer to read the first line of the file (the header) and discard it:
dummy = readLine(buff_reader);
Now set up a while loop that uses the command below to read the file one line at a time into a character string:
char_string = char(readLine(buff_reader));
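Here is a minimal sketch of that while loop. So that it runs offline, it wraps a java.io.StringReader around a few made-up CSV lines instead of the URL stream. The numbers are illustrative only, and the column order (Date, Open, High, Low, Close, Volume, Adj Close) is an assumption about Yahoo!’s file. For real data, swap the StringReader for the openStream reader built above. At the end of the stream, Java’s readLine returns null, which Matlab hands back as an empty array, so the loop stops on isempty.

```matlab
% Sketch: read a CSV stream line by line with a Java BufferedReader.
nl = char(10);                                   % newline character
csv_text = ['Date,Open,High,Low,Close,Volume,Adj Close' nl ...
            '2006-02-17,50.00,52.00,49.00,51.00,1000000,25.50' nl ...
            '2006-02-16,48.00,50.50,47.50,50.00,1200000,25.00' nl ...
            '2006-02-15,47.00,48.50,46.00,48.00,900000,24.00'];   % made-up rows

% Offline stand-in for the URL stream
buff_reader = java.io.BufferedReader(java.io.StringReader(csv_text));
dummy = readLine(buff_reader);                   % read and discard the header line

rows = {};                                       % one raw CSV line per cell
char_string = char(readLine(buff_reader));
while ~isempty(char_string)                      % null at end of stream comes back empty
    rows{end+1, 1} = char_string;
    char_string = char(readLine(buff_reader));
end
buff_reader.close();
fprintf('Read %d data lines\n', numel(rows));
```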
You’ll need to parse char_string after each read to extract the stock’s open, high, low, close, etc., and use str2num to convert the price strings to doubles. Voilà, you’re ready to develop your first simulation! Just don’t forget to normalize your data first by scaling each day’s prices by the ratio of Adjusted Close to Close. Otherwise a 2:1 split could look to your algorithm like a 50% price drop, and so on.
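The sketch below parses one line and applies the Adjusted Close to Close ratio. It uses a made-up line and assumes the column order Date, Open, High, Low, Close, Volume, Adj Close. It calls str2double rather than str2num for two reasons: str2double converts a whole cell array of strings in one call, and it does not evaluate its input as code. Either function works for plain numbers.

```matlab
% Sketch: parse one CSV line and normalize its prices for splits/dividends.
% Assumed column order: Date,Open,High,Low,Close,Volume,Adj Close (made-up values)
char_string = '2006-02-17,50.00,52.00,49.00,51.00,1000000,25.50';

fields    = strsplit(char_string, ',');   % 1x7 cell array of strings
hist_date = fields{1};                     % keep the date as a string
vals      = str2double(fields(2:end));     % [open high low close volume adj_close]

hist_open  = vals(1);
hist_high  = vals(2);
hist_low   = vals(3);
hist_close = vals(4);
hist_vol   = vals(5);
adj_close  = vals(6);

% Scale each price by Adjusted Close / Close so splits and dividends do not
% appear as price jumps (a ratio of 0.5, as here, is what a later 2:1 split gives)
ratio    = adj_close / hist_close;
adj_open = hist_open * ratio;
adj_high = hist_high * ratio;
adj_low  = hist_low  * ratio;
fprintf('%s  O=%.2f H=%.2f L=%.2f C=%.2f\n', hist_date, adj_open, adj_high, adj_low, adj_close);
```

Applying the same ratio to open, high, and low keeps each day’s bar internally consistent with the adjusted close.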
And in case you missed it at the beginning of the article, here is a link again to my own function that puts all this together: get_hist_stock_data.m.
Never forget that not only is past performance no guarantee of future results, it might have no correlation whatsoever! But if you do develop something that appears to work year in and year out, through expansion and recession, good times and bad, such as Joel Greenblatt’s Magic Formula, you might just be on to something.
Hi,
I have modified this Matlab code to download some ocean wave data from the web. Works very nice.
Thanks.
Great!
I read somewhere that researchers found a correlation between stock prices and the production levels of butter in Bangladesh. Maybe we should also incorporate ocean wave and sunspot data to build a better trading algorithm…
Hi,
Coool! Nice tips on getting data from the net into Matlab.
I am also an engineer with interests in finance. Your willingness to share your CFA efforts along the way is amazing! Keep it up! If you are interested, let’s keep in touch, as I feel we have a lot in common in terms of approach and thinking (from what I read in your blog).
Anyways, good luck buddy.
Google Finance has historical stock data.
Thanks for sharing your code.
I wonder how best to build historical portfolio data. I observed that one stock may have gaps or duplicate rows (the latter with Google data), so cleaning becomes an issue. My flow would be like this:
- get the historical data from symbols in a list
- set up empty database with working days
- fill each day for every symbol
- look for anomalies such as spikes and gaps, and fill them
Any ideas?
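Here is one way to sketch Zuio’s flow in Matlab, using made-up dates and prices. It builds a weekday calendar, drops duplicate rows with unique, maps the symbol onto the calendar with ismember, and leaves gaps as NaN so they can be inspected or filled afterwards. Exchange holidays are not removed, because that would need a holiday list, which this sketch does not assume.

```matlab
% Sketch: place one symbol's history on a master weekday calendar.
cal = (datenum(2010, 10, 4):datenum(2010, 10, 15))';   % hypothetical window
cal = cal(~ismember(weekday(cal), [1 7]));             % drop Sundays (1) and Saturdays (7)

% Made-up download with a duplicated 2010-10-05 row and a missing 2010-10-07
sym_dates = datenum(2010, 10, [4; 5; 5; 6; 8; 11; 12; 13; 14; 15]);
sym_close = [10; 11; 11; 12; 13; 14; 15; 16; 17; 18];

[sym_dates, keep] = unique(sym_dates);   % one row per date, sorted ascending
sym_close = sym_close(keep);

filled = NaN(size(cal));                 % NaN marks days with no data
[tf, loc] = ismember(cal, sym_dates);
filled(tf) = sym_close(loc(tf));

gaps = cal(isnan(filled));               % calendar days the symbol is missing
fprintf('%d gap(s): %s\n', numel(gaps), datestr(gaps(1), 'yyyy-mm-dd'));
```

Repeating the ismember step for every symbol gives a matrix whose columns all share the same date rows. Spikes and gaps can then be checked column by column.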
Hi Zuio,
You could look for anomalies and adjust for them but so far I’ve had good luck just using Yahoo’s database rather than Google’s b/c it has an “adjusted close” column that allows you to automate the accounting for splits & dividends (using ratio of “adjusted close” to “close”).
- Lumi
Hi Lumi,
I am interested in setting up a simple Matlab code for a portfolio strategy. The proper alignment of cleaned time series is then mandatory. Any help is welcome and I would like to share results.
Hi Zuio,
Here is a post where I talk about this:
http://luminouslogic.com/how-to-normalize-historical-data-for-splits-dividends-etc.htm
I haven’t done any time-series stuff in a while (hope to get back into it one day) but I wish you luck!
- Lumi
Hi, could you help me download data into Matlab from a URL such as this one: http://stooq.com/q/d/l/?s=zn.f&i=d
I would like to use Matlab to save the CSV file, but the Matlab function urlwrite does not work on it.
Thanks
Istvan
Istvan,
I have very little time at the moment – I’ll put this on my To Do list and try to get back to you on this soon. It looks like we can use a similar script for stooq as was used for yahoo, but leaving out volume.
Hope to be back in touch with you soon,
-lumi
UPDATE: I tried but it does not appear possible to access the CSV info from Stooq in Matlab. Sorry.
do u know if there is a tool to download in Matlab … Google Finance Data ?
I know there is (or was) a toolbox you could buy from Mathworks to download stock info from yahoo – maybe they have one for google.
However, what particular elements are you after (historical prices, P/E, etc.)? I have a few different Matlab scripts I use to grab info from various financial sites. I could probably modify one to get whatever you want from google.
thanks a lot Lumilog.
I need Hist Prices …
the other info are not a must at the moment.
thx
Pacca
just did a quick post with a link to the code here:
http://luminouslogic.com/how-download-historical-stock-data-google-matlab.htm
happy hunting!
-lumi
Hey, this is neat. I’m trying to exploit matlab for backtesting but I’m just now beginning. Are you rich and famous yet from your endeavors?
yes! and you can buy my system for 3 easy payments of $999999.99!
in all seriousness the CFA Program showed some serious flaws in my previous models so i’m starting again from scratch. i currently use matlab only to automate my own version of getting a current snapshot of my asset allocation profile (sort of like morningstar’s portfolio x-ray, but with a few extra bells and whistles).
but soon i will unleash algorithm 2.0 and take over the world. you might be eligible for a position in my cabinet!
Hi Lumi
Great posts. I am a grad student and i am working exactly on this… my background is control engineering.
I wanted to know how to obtain Historical Data for
1) Companies: P/E and other fundamental data, such as the data listed on the Key Statistics page on Yahoo.
2) How to download other macro economic data such as the FED interest rate, unemployment rate… for this i guess we need to scrape the data from the web page..
can you help me on this..
thanks
Divakar
Hi Div – thanks for writing.
If you can find the metrics on the web, you can easily grab them with Matlab. Of course the trick is finding what you’re after.
For most backtesting involving metrics beyond just price high, low, open, close, and volume, I’ve never been able to go back beyond about 10 years b/c that’s about all I can find for free.
As I mentioned in the post, I once inquired about purchasing Compustat in order to have more than just historical price. But it’s outrageously expensive for a private individual. Perhaps your university business department has access?
Hi Lumi
Thanks for the reply.
Yeah, I found out that my university’s business department has access to Compustat. However, since I am an engineering student, I won’t have access to it. I will find a way to work around that. Thanks for pointing it out. I have a few more questions. If you are OK with it, can you share your email address so that I can write to you rather than posting on your blog? I know you are busy, but your help would be of immense use to me.
Thanks
div – sohamm@gmail.com
Hi Lumilog,
I tried to use your code to download IBM historical data. I ran into an error I couldn’t understand.
I typed get_hist_stock_data('IBM') in the command window, and it created a 1530x1 cell array of date strings from 2004-09-27 to date.
Could you please tell me where I have gone wrong?
Hi Sunil – you just need to call the function in a different way, otherwise you only get the first return argument and not all of them. Try:
[hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist_stock_data('IBM');
It works now. Thank you so much.
It was a silly mistake and proves that I am getting rusty using Matlab.
Hi Lumilog,
I was looking for a website to download stock time-series data going back 5 years when I came upon this great blog… Not long ago I did fMRI research analyzing time-series data, and while doing it I realized that I could probably apply the same or similar statistical analyses to stock time-series data for forecasting. I never got hold of MATLAB to do my analyses, but I wonder where I could download a free copy of the latest version and then try your script for downloading time-series data for technology-sector stocks from Yahoo! Finance.
In any case, it would be great to communicate with you through email to find out more about data pre-processing steps. For fMRI data, there are a series of steps I go through before analyzing the time series–I’m not sure if the assumptions (e.g. normal distribution, homogeneity of variance, etc.) for analysis of stock time series data are different from these and if they require different pre-processing steps as a result (detrending, removing outliers, etc.)
I appreciate your blog and look forward to hearing from you soon, as your help will go a long way for me.
Best,
Lee
Thanks for writing, Lee. Unfortunately I know of no free Matlab versions. Early in my research, when I didn’t want to pay for Matlab, I used Java with the free NetBeans IDE instead. It worked but was a little cumbersome, since I didn’t have much experience with Java. Ultimately, though, I decided to bite the bullet and buy my own personal Matlab license to save time.
The preprocessing steps you mention for fMRI sound almost identical to those for analyzing market time series, so I guess they are common to all linear regression and autoregressive models. Other common steps include differencing the data, working with % change instead of $ change, and taking natural logs to convert exponential growth rates into linear ones for proper regression.
Please feel free to email me at any time. See my “About” page link at the top of this page to get my email address (I’m trying to avoid spam by not posting it here directly).
Best,
Lumi
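The preprocessing steps mentioned in the reply above (differencing, % change instead of $ change, and natural logs for regression) take only a few lines in Matlab. The sketch below uses made-up prices that grow exactly 10% per period, so each step’s result is easy to check by eye. polyfit fits an ordinary least-squares line; a real series would of course not sit exactly on that line.

```matlab
% Sketch: common preprocessing for a price series (made-up data, +10% per period).
close_px = [100; 110; 121; 133.1];

pct_change = diff(close_px) ./ close_px(1:end-1);   % % change instead of $ change
log_ret    = diff(log(close_px));                    % differenced log prices

% Natural logs turn exponential growth into a straight line, so a
% least-squares line fit to log prices recovers the per-period growth rate
t = (0:numel(close_px) - 1)';
p = polyfit(t, log(close_px), 1);                    % p(1) is the slope per period
growth_rate = exp(p(1)) - 1;                         % back to a % rate (about 0.10)
fprintf('Estimated growth per period: %.4f\n', growth_rate);
```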
Could you please help with how to load fMRI data on my computer using the SPM toolbox in Matlab?
Thank you for your short tutorial. I can’t wait to get off work now and mess around with Matlab. I had no idea (but I’m not surprised) that Matlab can use Java so easily. My class was given a simple problem to convert different world currencies into US dollars, and with this method I can easily get the data from the web instead of relying upon a static index. I don’t know why this sounds like so much fun.
Yesterday, I found out about sptool, today I find out about Java, tomorrow… maybe a winning lotto number function?
Thanks again.
karthik – no experience with spm toolbox. maybe lee will check back with us and respond…
john – thanks so much for stopping by and leaving a comment. maybe you can fuse the signal processing with currency trading and make a bundle. i’m sure there is more than one way to use an FFT. keep us posted!
-lumi
I felt I had to leave a comment, since I’m taking advantage of your great work. This is an excellent tool that I will make good use of in my Risk Management course at my university. I’ll work on a tool myself that takes two time series and matches the dates between them, and post a link to it here later.
For example, I want to compare ^BVSP with ^DJI. Brazil has different bank holidays than the US, so I’ll compare the two date vectors line by line and fill in the missing dates on the other.
An example. Change this:
^DJI               ^BVSP
2010-10-13, 55     2010-10-13, 66
2010-10-12, 44     2010-10-11, 55
2010-10-11, 33     2010-10-08, 44
2010-10-08, 22     2010-10-07, 33
to this:
^DJI               ^BVSP
2010-10-13, 55     2010-10-13, 66
2010-10-12, 44     2010-10-12, 55
2010-10-11, 33     2010-10-11, 55
2010-10-08, 22     2010-10-08, 44
I realized it wasn’t necessary to write such a script. I just used
todaily(fints(BVSP_date,BVSP_close))
from the Financial Toolbox, and it simply put NaN as the data on the missing days. All the financial time series functions then ignore the NaNs.
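For anyone without the Financial Toolbox, here is a minimal plain-Matlab sketch of the fill-forward alignment shown in the ^DJI/^BVSP example, using its dates and ^BVSP values. Each ^DJI date takes the most recent ^BVSP close on or before that date. This differs from the todaily approach, which leaves NaN on missing days; the sketch instead carries the previous value forward, as in the example table.

```matlab
% Sketch: align ^BVSP to ^DJI trading dates, carrying the last value forward.
dji_dates  = datenum(2010, 10, [13; 12; 11; 8]);   % target dates (from the example)
bvsp_dates = datenum(2010, 10, [13; 11; 8; 7]);    % source dates (from the example)
bvsp_vals  = [66; 55; 44; 33];

[src_dates, order] = sort(bvsp_dates);             % ascending, so 'last' means most recent
src_vals = bvsp_vals(order);

aligned = NaN(size(dji_dates));                    % stays NaN if no earlier value exists
for k = 1:numel(dji_dates)
    idx = find(src_dates <= dji_dates(k), 1, 'last');   % latest ^BVSP date on or before
    if ~isempty(idx)
        aligned(k) = src_vals(idx);
    end
end
disp([dji_dates - datenum(2010, 10, 0), aligned])  % day of October, aligned ^BVSP
```

The result reproduces the ^BVSP column of the example’s “after” table: 66, 55, 55, 44.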
Simonize – thanks so much for taking the time to leave some comments as well as your code. Let me know if your Risk Mgmt course appears in iTunes U, would be fun to check out.
lumi
Just to say… GREAT and THANK YOU!!!
Hi Lumi.
Found this blog and I must say, thank you! This is a good script. However, I have a question for you. I’m using your function to screen multiple stocks. I ran into a problem. When I tried retrieving the ticker “AGII”, I received an error message:
Java exception occurred:
ice.net.URLNotFoundException: Document not found on server
at ice.net.HttpURLConnection.getInputStream(OEAB)
at java.net.URL.openStream(Unknown Source)
When I looked at this ticker in Yahoo! Finance, there was no historical data. No wonder it kicked me out. Is there a way to check whether the site has historical data for a ticker and, if not, skip it and move on to the other tickers? I’m not quite an expert on Matlab (or Java, for that matter), and I’m actually using your scripts to learn. Any guidance will help.
Thanks!
Clarence
I think I figured it out. Thanks anyways…
Tsaiko – Welcome!
Clarence – sorry late getting back to you, but glad you figured it out. I sometimes get an error similar to yours. It’s usually b/c I’m trying to look up a ticker that no longer exists, or has been moved to the pink sheets. Sometimes when the Wi-Fi is spotty I get that message too.
All best,
Lumi
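Java exceptions like the one in Clarence’s message surface in Matlab as ordinary errors, so one way to skip such tickers is to put a try/catch around each download. Below is a minimal sketch of that idea. So that it runs offline, the failure for 'AGII' is simulated with a made-up error, and the real get_hist_stock_data call appears only in a comment.

```matlab
% Sketch: screen a list of tickers, skipping any whose download fails.
tickers = {'AAPL', 'AGII', 'IBM'};
ok = {};
skipped = {};
for k = 1:numel(tickers)
    t = tickers{k};
    try
        % Real code would download here, e.g.
        % [hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist_stock_data(t);
        % Offline stand-in: pretend 'AGII' has no historical data on the server
        if strcmp(t, 'AGII')
            error('demo:notFound', 'No historical data for %s', t);
        end
        ok{end+1} = t;
    catch err
        skipped{end+1} = t;
        fprintf('Skipping %s: %s\n', t, err.message);
    end
end
```

Because the catch block only logs the failure and moves on, one bad ticker (or a dropped Wi-Fi connection) no longer stops the whole screen.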
Nice script. Unfortunately I get this error:
??? Undefined function or method ‘get_hist_stock_data’ for input arguments of type ‘char’.
Any idea on what could be the problem?
Baz – could you show us exactly what you’re typing on the command line to call the function?
Lumilog, I have wanted something like this for a few years and was glad to find your script. Thank you for sharing it.
I was wondering though, ‘How did you discover this?’
And, ‘How can I also get the historical market capitalization for a stock?’
Best Regards,
Motes
Does using the adjusted close compensate for market cap?
Hi Motes – maybe a bit tricky to get historical market cap. If I were after that I’d probably use the unadjusted closing price from this script and then multiply by the nearest date Shares Outstanding from MSN money as you can see in the example link below.
http://moneycentral.msn.com/investor/invsub/results/statemnt.aspx?lstStatement=10YearSummary&Symbol=US%3aEXC&stmtView=Qtr
Once you get comfortable doing this type of data retrieval via Matlab, you could quite easily automate the process by writing a script to retrieve the 10-year shares outstanding history also, and then even do a little linear interpolation to estimate what Shares Outstanding was on any individual date before multiplying by price per share to get market cap.
Not perfect but maybe better than nothing. Hope that helps!
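The interpolation step described in the reply above can be sketched with interp1. The sketch uses two made-up shares-outstanding snapshots and a made-up unadjusted close. The unadjusted price is used because a reported share count reflects the shares outstanding at that time, whereas the adjusted close is rescaled by later splits.

```matlab
% Sketch: estimate market cap on a trading date between two share-count snapshots.
so_dates  = [datenum(2010, 1, 1); datenum(2010, 1, 11)];   % made-up snapshot dates
so_shares = [100e6; 200e6];                                % made-up shares outstanding

px_date  = datenum(2010, 1, 6);   % a trading date between the snapshots
px_close = 2.00;                  % its *unadjusted* close (made up)

shares_est = interp1(so_dates, so_shares, px_date, 'linear');   % linear interpolation
mkt_cap    = shares_est * px_close;
fprintf('Estimated shares: %.0f, market cap: %.0f\n', shares_est, mkt_cap);
```

With a full history, pass a vector of trading dates as px_date to get a market-cap series in one call. Note that interp1 returns NaN for dates outside the snapshot range unless you ask it to extrapolate.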
Lumi,
I thought I left a comment here a couple of days ago, but somehow it is not here, so I will try again. I downloaded your Matlab program; it works great and I really like it. I am wondering whether you have tried to get option data as well. I am interested in downloading daily option data and don’t know where to start yet. Can you share your experience, if you have any?
Thanks a lot, great blog and Matlab program
It worked this time, even though my original message was much nicer
Anyway, it is a great tool that works really well, and I am wondering whether I can use it to download option data. I have worked with Matlab and FFTs before and want to do the same thing on my own financial freedom journey.
Hi Baixiao – I think your previous comment was left on a different post, because I remember responding to it. ?!
Anyway, I don’t know of any free source for historical options prices – I wish I did. Let me know if you come across any – and thanks for taking the time to write. Now get those wavelet transforms and FFT engines humming and make us proud!
-lumi
Thanks for sharing excellent code!
Do you have any experience with other software like JMP (a front end for SAS)? I expect I could do the same thing you are trying. I’m just starting. I have done regression with JMP and PCA/PCR with another program called Unscrambler for spectral data.
To start, I think I would like to use the algorithms to write covered calls and/or hedge a portfolio in a down market.
I would think it would be good to be able to run scripts on databases on their server and have subsets of data returned. Once you find specific correlations and probabilities, you only need to access those data sets or use a service for that, correct?
Jaime, you’re welcome!
Bill – the only other language I’ve used to do this is Java (via Netbeans). The one problem you might run into is I know of no free site to get historical option data from. But I’ve certainly done my share of correlation-style backtesting to try out things like pairs trading. Best of luck to you!
Since my post, I have read your blog from the start of your CFA trek. I too came up with a pairs trading concept on my own. I started out with nonlinear models and at least 3 components: the stock price and commodity prices tied to the business. Why overcomplicate things, particularly if you have no idea what you are doing? For an oil stock, you could use the oil price and something tied to the dollar. Harmonic oscillators are fascinating!
I’m contemplating the CFA exam. How old will I be in a few years if I don’t do it? I’m looking forward to reading about the rest of your journey. You have shared some invaluable insights with me!
On Matlab, I was wondering whether it was worth learning. The scripting language in JMP is like C (I never formally learned C). The Matlab scripts you posted look more like VB (I do a lot of VBA with Excel). Code examples abound for VB; I feel I spend too much time trying to figure out how to script something in JMP that would be easy to do in VB. For now I think I can use Excel and internet queries and then dump the data into JMP. I think I can convert some of your scripts.
I’ll look into the paid options sites…recommendations?
Guys give stookle.com a shot
Hey!
Is it possible to tweak your MATLAB code to always use the IPO date as the start date? Your code gives an error if you set a start date before a stock’s IPO. I am currently trying to solve this problem and will post a code fix if I manage to correct it myself.
Cheers!
P.S. great blog!
/Nemo
Hi,
I need historical data from the Brazilian stock exchange.
Yahoo and Google don’t have that data for several stocks.
'advfn' provides the historical data I need.
How can I modify this script to read historical data from 'advfn'?
advfn: http://www.advfn.com.br
thanks a lot.