How to Download Historical Stock Data into Matlab

Greetings! You’ve arrived here most likely because you’ve just realized that you’ve been Wasting Your Talents at your job using Matlab to research new medicines, design fuel-efficient engines, or develop next-generation wireless communication technologies when – duh! – you could put those Matlab skills to much better, MUCH better use turning the stock market into your own ATM. From your boat!

Well here it is – the gateway drug – a few simple lines of code that will let you download Yahoo’s free historical stock price data right into Matlab. Then you’re on your own as to what to do with it, but I’ll bet you’ve learned quite a bit of signal processing mojo over the years that you’re salivating to put to good use: particle filters, neural networks, wavelet transforms, …maybe even Viterbi decoders!

Assuming you already speak Matlab well, there are really just a handful of new things to learn.

First, it might help to know exactly where to download from. Yahoo Finance provides a convenient historical data CSV file for each stock currently trading. Type in any ticker, click on Historical Prices, and scroll down to the bottom…

Tricky Move #1 is to crack the URL genome so that you can generate your own URL on the fly. The URLs look like this:

http://ichart.finance.yahoo.com/table.csv?s=AAPL&a=10&b=15&c=2005&d=01&e=17&f=2006&g=d&ignore=.csv

With a little (VERY little) trial & error, you might come to discover that after s= goes the ticker symbol, after a= the start month (minus 1), after b= the start day, and after c= the start year. The d=, e=, and f= parameters do the same for the end month (minus 1), end day, and end year. The g= parameter lets you choose between getting historical stock information on a daily, weekly, or monthly basis. Construct your desired URL as a Matlab character string (let’s call it url_string).
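Here is a minimal sketch of that URL construction. The variable names (ticker, start_month, and so on) are my own placeholders, and the only g= value confirmed by the sample URL is 'd' for daily; the weekly and monthly codes are presumably 'w' and 'm', but treat that as an assumption.

```matlab
% Placeholder inputs; these reproduce the sample URL shown above.
ticker      = 'aapl';
start_year  = 2005;  start_month = 11;  start_day = 15;
end_year    = 2006;  end_month   = 2;   end_day   = 17;
interval    = 'd';   % 'd' = daily (from the sample URL)

% Months are zero-based in the URL, hence the "- 1".
url_string = sprintf(['http://ichart.finance.yahoo.com/table.csv?s=%s' ...
    '&a=%02d&b=%d&c=%d&d=%02d&e=%d&f=%d&g=%s&ignore=.csv'], ...
    upper(ticker), start_month - 1, start_day, start_year, ...
    end_month - 1, end_day, end_year, interval);

disp(url_string);
```

Using one sprintf call keeps all the parameters visible in a single format string, which makes it easier to compare against a URL copied from the browser.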

Tricky Move #2 is to tunnel your way directly to a webpage from within Matlab, no browser required! This is done via Matlab’s Java interface and your URL string like so:
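A minimal sketch of that step, using the standard java.net.URL class. This is not necessarily the exact code in get_hist_stock_data.m, and the try/catch is my own addition so the snippet fails gracefully if the server does not answer.

```matlab
url_string = 'http://ichart.finance.yahoo.com/table.csv?s=AAPL&a=10&b=15&c=2005&d=01&e=17&f=2006&g=d&ignore=.csv';

% Wrap the string in a Java URL object (this only parses it, no network yet).
url = java.net.URL(url_string);

% Opening the stream is the step that actually contacts the server.
try
    stream = openStream(url);
catch err
    stream = [];
    warning('Could not open %s: %s', url_string, err.message);
end
```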

Tricky Move #3 is to use that connection to read in the individual lines of the webpage’s source code into a buffer.
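A sketch of the read step, again using standard Java classes rather than quoting the downloadable file. So that it runs without a network connection, it feeds a java.io.StringReader with placeholder text (dummy values, not real quotes); with a live connection you would wrap the stream in java.io.InputStreamReader instead. The header line assumes Yahoo's usual CSV column order.

```matlab
% Placeholder page content standing in for the live download.
sample_text = sprintf(['Date,Open,High,Low,Close,Volume,Adj Close\n' ...
                       '2006-02-17,1,2,0.5,1.5,100,1.5\n']);

% Live case: source = java.io.InputStreamReader(stream);
source  = java.io.StringReader(sample_text);
breader = java.io.BufferedReader(source);

% Each readLine call returns the next line of the page source.
buffer = char(readLine(breader));
close(breader);

disp(buffer);
```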

It’s really no harder than that – just put that last tidbit in a while loop and parse the buffer each time around to grab the data and store it in a matrix. It’s especially easy here since we’re working with CSV. Parsing HTML from other websites is a little trickier, but I’ll bet you’ll graduate quickly (gateway drug, I tell you).
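The loop might look something like the sketch below. The rows are dummy values standing in for the download, the column order is assumed to be Date, Open, High, Low, Close, Volume, Adj Close, and the names hist_date and hist_data are just illustrative.

```matlab
% Dummy CSV text in place of the live stream (see the note above).
sample_text = sprintf(['Date,Open,High,Low,Close,Volume,Adj Close\n' ...
                       '2006-02-17,1,2,0.5,1.5,100,1.5\n' ...
                       '2006-02-16,3,4,2.5,3.5,200,3.5\n']);
breader = java.io.BufferedReader(java.io.StringReader(sample_text));

readLine(breader);             % skip the header row
hist_date = {};                % date strings, one per row
hist_data = zeros(0, 6);       % the six numeric columns

buffer = readLine(breader);
while ~isempty(buffer)         % readLine returns [] at end of input
    fields = strsplit(char(buffer), ',');
    hist_date{end+1, 1}  = fields{1};
    hist_data(end+1, :)  = str2double(fields(2:7));
    buffer = readLine(breader);
end
close(breader);

disp(hist_data);
```

Note that the loop also stops at a blank line, which is fine for a CSV download but worth remembering if you adapt it to other pages.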

Still sound too hard? OK, just click below to download my source code for free. See you in St. Maarten!

get_hist_stock_data.m

Just Florida. I’m still tweaking my algorithms…

122 thoughts on “How to Download Historical Stock Data into Matlab”

  1. I have read your blog from the start of your CFA trek since my post. I too came up with a pairs trading concept on my own. I started in with nonlinear models and at least 3 components–the stock price and commodity prices tied to the business. Why overcomplicate things, particularly if you have no idea what you are doing? For an oil stock, you could use oil price and something tied to the dollar. Harmonic oscillators are fascinating!

    I’m contemplating the CFA exam. How old will I be in a few years if I don’t do it? I’m looking forward to reading about the rest of your journey. You have related some invaluable insights for me!

    On Matlab, I was wondering if it was worth it to learn it. The scripting language in JMP is like C (I never formally learned C). The Matlab scripts you posted look more like VB (I do a lot of VBA with Excel). Code examples abound for VB; I feel I spend too much time trying to figure out how to script something in JMP that would be easy to do in VB. For now I think I can use Excel and internet queries and then dump the data into JMP. I think I can convert some of your scripts.

    I’ll look into the paid options sites…recommendations?

  2. Hey!

    Is it possible to tweak your MATLAB code to always use the IPO date as the start date? Your code gives an error if you set a start date prior to the IPO of a stock. I am currently trying to solve this problem; I will post a code fix if I manage to correct it myself.

    Cheers!
    P.S. great blog!

    /Nemo

  3. Hi,

    I need historical data from the Brazilian stock exchange.

    Yahoo and Google don’t have that data for several stocks.

    ‘advfn’ provides the historical data I need.

    How can I modify this script to read historical data from ‘advfn’?

    advfn: http://www.advfn.com.br

    thanks a lot.

  4. Hello all,
    first of all I’d like to thank Lumilog for this very interesting code he shared.
    Also I wanted to ask if someone could explain a bit the part where he says that there must be a normalization between the close and the adjusted close. What is the relationship between them? And can you explain why?

    Also does anyone have experience with time series correlations? Can you suggest any metrics to use??
    Thanks a lot in advance..
    Regards

  5. Thanks for stopping by Lefteris. The normalization accounts for 2 things: stock splits and dividends. If you don’t normalize then stock splits could be misinterpreted by any quant algorithm you develop as 50% overnight losses for example. Similarly, if you don’t include the effect of dividends any computation of rate of return will be price moves only – not total returns. Plus stocks (theoretically) drop in price by the amount of the dividend before it’s paid. Hope that helps!

    lumi
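
    One common way to apply such a normalization, sketched under assumptions: scale each day's prices by the ratio of adjusted close to close. The numbers are a made-up two-day series around a 2-for-1 split, and hist_adj_close is a hypothetical name for the CSV's adjusted close column; the script's actual method may differ.

    ```matlab
    % Hypothetical two-day series (oldest first) straddling a 2-for-1 split.
    hist_open      = [98; 51];
    hist_close     = [100; 50];
    hist_adj_close = [50; 50];

    % Per-day scale factor: 0.5 before the split, 1 after it.
    adj_factor    = hist_adj_close ./ hist_close;
    hist_open_adj = hist_open .* adj_factor;    % opens on the adjusted scale

    % Unadjusted prices show a fake 50% overnight loss; adjusted ones do not.
    raw_return = hist_close(2) / hist_close(1) - 1;
    adj_return = hist_adj_close(2) / hist_adj_close(1) - 1;
    ```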

  6. Hi again Lumi. First of all thanks a lot for your instant response. Yes I understood exactly what happens. I am an undergraduate computer engineer and now researching the area of stock market and prediction using large datasets. I have some questions concerning the correlation metrics used in order to determine if two time series are correlated or not. As the first step of my essay I am trying to build a Dynamic Bayesian Network that will map those metrics as probabilities that Xt-1 can determine Xt. I am thinking to use auto correlation but I have other thoughts too.. As I see you are a software engineer with knowledge of finance..so just what I need 🙂
    Is there any way you can spend some time chatting with me so I can ask you for your guidance? Do you have skype or another means to reach you?
    Thanks once more for your will to help.
    Yours,
    L.

  7. Hi Lumi,

    I had a problem and was just wondering if I am using a different Matlab version than you.

    I put in the link for the url_string. However the line right below that where it should gather the stock symbol after the s, it gives me an error.

    url_string = strcat(url_string, ‘&s=’, upper(stock_symbol) );

    It is this line of code. Should this line execute without a problem?

  8. Hi Michael. Are you passing the stock_symbol as a char string too? For example, the argument should be passed as ‘AAPL’, not AAPL.

  9. Hi Lumi,

    Yep that was my problem. Sorry for such a trivial question. I am fairly new to anything past matrices in Matlab!

    Thanks again,

    -Michael

  10. Hi Lumi,

    Thank you very much for sharing this beautiful script with us! As a newbie on MATLAB, may I ask you how to merge one or more columns into one and only variable? I tried the ‘Result = [hist_high hist_close]’ command. This works in the general case, but it doesn’t when date is involved as an argument (i.e. hist_date). In that case I just get an error message. Any hint?

    Thank you very very much in advance,
    Euangelos.

  11. Hi again,

    I just figured it out! The command is called ‘dataset’.
    Thanks again! Keep up the nice work!

    Euangelos.
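
    An alternative that needs no extra toolbox, sketched under the assumption that hist_date comes back as a cell array of 'yyyy-mm-dd' strings (the thread does not say which type it is): convert the dates to serial date numbers with datenum so they fit in an ordinary numeric matrix. The values below are placeholders.

    ```matlab
    % Placeholder data; the hist_date format is an assumption.
    hist_date  = {'2006-02-17'; '2006-02-16'};
    hist_high  = [2; 4];
    hist_close = [1.5; 3.5];

    % Serial date numbers are plain doubles, so concatenation works.
    date_num = datenum(hist_date, 'yyyy-mm-dd');
    Result   = [date_num hist_high hist_close];

    % Convert back to text when a readable date is needed.
    first_date = datestr(Result(1, 1), 'yyyy-mm-dd');
    ```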

  12. Hi,
    I’m still having the same problem as an earlier poster where it is only returning dates, my function is:

    [hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist_stock_data(stock_symbol)

    I’ve also tried it as (for example):

    [hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist_stock_data(‘IMB’)

    However when I do this it tells me I have an invalid syntax. Where am I going wrong here?

    Cheers

  13. I think I see the problem. When I cut-and-pasted what you typed in above (fixing the ticker to IBM) it didn’t work for me either. The problem is that the font formatting of the blog changes ' to ‘ and ’ around the ticker name. When I changed that it worked fine.

    Note that when I use the function, I also put a ; on the end to suppress the output. If you do a Matlab who command afterwards, you can see that all the data arrays are in the workspace.

    I’m going to send you an email so that I can cut-and-paste what I typed into Matlab. When I tried to cut-and-paste it here, the apostrophes get changed by WordPress.

  14. Worked right out of the box.
    Added a few %#ok because I am OC plus start-year in the parameter string and I am a very happy guy.
    Thanks.

  15. Hi Lumi,

    I accidentally stopped by your blog and can’t leave it. I’m using your get_hist_stock_data.m function. It’s awesome and saves me a lot of time.

    But in case I have two stocks or more and I want to put date, stock1 and stock2, … in a matrix, I’m wondering how Matlab can align the dates in order to avoid date mismatch between stocks.

    Any suggestion?

    Many thanks.

  16. thanks for stopping by. the way i always handle multiple stocks w/ different records is to read them into different variables (or cells), find the one with minimum length, and cut all others down to that minimum length so that you can put them in a matrix. have fun!

  17. That’s what I usually do, but sometimes even if you have 2 stocks with the same length, it doesn’t mean all the dates are the same. For example, some markets closed the day Sandy arrived, others did not.

    To solve it, I use Stata to merge multiple stocks date by date and put them into Matlab, but it’s a waste of time.

    Anyway, thank you for your quick reply.
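
    The date-by-date merge can also be done inside Matlab with intersect, which returns the common values plus the row indices into each input. A minimal sketch with made-up serial date numbers and prices (all names are illustrative):

    ```matlab
    % Two hypothetical stocks with partly different trading days.
    dates_a = [732725; 732726; 732727];
    close_a = [10; 11; 12];
    dates_b = [732725; 732727; 732728];
    close_b = [20; 22; 23];

    % Keep only the dates present in both records.
    [common_dates, ia, ib] = intersect(dates_a, dates_b);

    % One row per shared date: [date, stock A close, stock B close].
    aligned = [common_dates, close_a(ia), close_b(ib)];
    ```

    For more than two stocks, the same call can be repeated against the running set of common dates.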

  18. Awesome resource. I am totally new to Matlab, so excuse me if this is ridiculous. When I run the code I am just getting an output of dates but none of the actual stock data. What’s going wrong here?

    Thanks!

  19. This is nuts. I am an undergrad Engineering major and I have to take a course on MATLAB. Since my first day in the class I have been working on making a stock picker with it. (I am actually sitting in the class as I write this). Thank you so much for posting this code!

  20. Hey Lumi my friend, I’m just trying to get this thing up and operational, and I seem to be able to download some of the data from Yahoo! However, I’m finding that the output is only the consecutive dates, meaning that the price etc. is not included. Where is the user choking up?

  21. Curtis – sounds like you’re calling the function w/o the output arguments (left hand side of equal sign). Call it like this:

    [hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist_stock_data(stock_symbol);

    where stock_symbol is a string with the ticker

  22. Presently, my function is as follows:

    [hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist(stock_symbol)

    However, I’m still getting an output of solely dates.

  23. did you include the “;” at the end? i think if you don’t it prints out the days to the screen. however, your data should still be in those left-hand side arrays.

    let me know,
    lumi

  24. Yes sir, I put the “;” after [hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist(goog)

  25. I’m also having trouble getting the starting and ending dates to change, despite the fact that I am showing the start date to be 2/3/XXXX and end date to be (3-1)/24/2013

  26. Are you using the single apostrophes around GOOG? I’ve had a problem on the blog here before with cut and paste b/c the font scheme changes ' to ‘ or ’.

    I just re-downloaded my file from the link above, and entered the following in Matlab:

    [hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist_stock_data('GOOG');

    Then when it finishes, I typed

    >> hist_high(1:10)

    and got

    ans =

    104.0600
    109.0800
    113.4800
    111.6000
    108.0000
    107.9500
    108.6200
    105.4900
    103.7100
    102.9700

    So the data is there and it’s working for me. Do you have an old version of Matlab? Mine is circa 2009.

  27. Whenever I put;

    [hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist_stock_data(‘GOOG’)

    in the same script, my URL string looks like;

    url_string = ‘http://ichart.finance.yahoo.com/table.csv?’;
    url_string = strcat(url_string, ‘&s=’, ‘GOOG’ );
    url_string = strcat(url_string, ‘&d=’, num2str(this_month-1) );
    url_string = strcat(url_string, ‘&e=’, num2str(this_day) );
    url_string = strcat(url_string, ‘&f=’, num2str(this_year) );
    url_string = strcat(url_string, ‘&g=d&a=0&b=1&c=’, start_year);
    url_string = strcat(url_string, ‘&ignore.csv’);

    I get an error saying that the apostrophe is an unexpected MATLAB expression.

    However, if I don’t put the apostrophes surrounding goog on the output of the function, it will run but still post a matrix that is XX by 1, and when I type in “>>hist_high(1:10)” it gives me a message saying “undefined function or method ‘hist_high’ for input arguments of type ‘double’”.

    *It should be apparent now that I’m clueless on the matter, so I really appreciate you working with me here.

  28. Curtis – I see what you’re doing wrong now. The script itself is a Matlab function and isn’t supposed to be edited.

    (1) Save my script unmodified to some directory on your hard drive.
    (2) Start Matlab and change directories to that directory from (1)
    (3) Call the function from the Matlab command line by typing in:
    [hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol] = get_hist_stock_data('GOOG');

    at the >> command line prompt in Matlab.

    IMPORTANT! If you cut-and-paste the line I wrote in (3) above and Matlab complains, delete and retype the two ' apostrophes around GOOG. My blog font tends to change them to ‘ and ’ when they should both be a plain '.

  29. Correct. I think I figured out how to work it as is. However, I’m trying to add some functionality where I can show a matrix of a select list of dates and their corresponding values for, let’s say, hist_close and hist_vol.

  30. Hi Lumilog, I read all the above threads and finally figured out how the code works! (I am dumb and new to Matlab)

    I have a few questions now:
    1) Is there any way to change the code a little bit so that the “ans” can be saved to an Excel file? (I need the data for further analysis.)
    2) If 1) works, can all of hist_date, hist_high, hist_low, hist_open, hist_close, hist_vol go into one file at a time?

    Arghhh, I hope you understand what I’m asking. This is urgent for me, and I really appreciate your effort. You’re saving my life on my final-year project!

  31. Hi Charise – if I understand what you’re asking you don’t need Matlab at all. Go back to my first picture in the post. See the yellow “Download to spreadsheet” link in the photo from Yahoo Finance. Just click on that (or right-click as the case may be) and you’ll have your Excel file. It’s actually CSV but Excel will open it just fine.

    – lumi
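
    If the arrays are already in the Matlab workspace, they can also be written out as a CSV file that Excel opens directly. A minimal sketch with placeholder values, assuming hist_date is a cell array of date strings; the file name and column selection are arbitrary choices.

    ```matlab
    % Placeholder arrays standing in for the function's outputs.
    hist_date  = {'2006-02-17'; '2006-02-16'};
    hist_close = [1.5; 3.5];
    hist_vol   = [100; 200];

    % Write one header row, then one line per date.
    out_file = [tempname '.csv'];
    fid = fopen(out_file, 'w');
    fprintf(fid, 'Date,Close,Volume\n');
    for k = 1:numel(hist_date)
        fprintf(fid, '%s,%.4f,%d\n', hist_date{k}, hist_close(k), hist_vol(k));
    end
    fclose(fid);
    ```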

  32. Dear Lumi

    Love your code and it’s great that you help us users with problems.

    I’m baffled as to why &s= ‘DJI’ – which should access the Dow Jones Index – gets me a message that there is no such web page, where &s=’FTSE’ and just about any other stock symbol I try does work.

    Specifically, this works:

    http://ichart.finance.yahoo.com/table.csv?&s=^FTSE&d=3&e=10&f=2013&g=d&a=0&b=1&c=&ignore.csv

    while this does not:

    http://ichart.finance.yahoo.com/table.csv?&s=^DJI&d=3&e=10&f=2013&g=d&a=0&b=1&c=&ignore.csv

    Finally, Yahoo seems (nearly always) to give the open of one day as the close of the previous day, which is wrong. Not your fault really, but any ideas on how I can get the actual historic opening prices?

    Thanks

    Don

  33. Hey thanks for the comments!

    DJI (or ^DJI) is a special case at Yahoo Finance (I just noticed this a couple weeks ago) in that there is no historical data for it available for download. I think you can get around it by using DIA instead (ETF which tracks the DJI).

    Hope that helps!
    lumi

  34. PS I see I left out the start_year in the examples. That’s not the problem, just stick in ‘2012’ or anything else.

    Don

  35. Thanks, there is data there at s=’DIA’ – scaled by .1 and high by 2 points before the scaling – must be the spread.

    But I’m still a bit baffled because interactively you can download the DJI. I don’t know how to crack the url for that but I’ll report back if I get it.

    Don

  36. Great. Yes I tend to use the ETF proxies instead b/c you get the dividends too. ^GSPC has no dividend history for the S&P 500, but the SPY ETF does. Good luck! – lumi

  37. Hi!

    I just looked at this script, and I would appreciate some help.

    When I run it, it only returns ans as an n x 1 array of dates. I do not get the prices?

    Regards
