Retrieving Historical Stock Data in Python via Yahoo Finance

Retrieving historical stock data for analysis can be a chore. Many APIs that provide it require a membership, an account, or even a fee before you can access the data. Fortunately, Yahoo Finance offers historical stock data free of charge on its site, with the option to download it. However, using the site becomes a hassle when you want to retrieve new data on the fly or need data for many different companies, ETFs, index funds, etc. In this post, I will show you how to use Python to create a class that can retrieve data for any ticker Yahoo has data for, on the fly and for any number of tickers. In subsequent posts, I will use this “API wrapper” for projects on stock price/time series simulation and analysis.

The API

I’m not familiar with any official APIs offered by Yahoo Finance, but many people take advantage of the fact that historic data can be downloaded directly from the site. When you download this data you are essentially “navigating” to a different URL, but rather than leaving the current page, the URL just downloads a file. We can take advantage of this using the Pandas library for Python.

The great thing about Pandas, besides almost everything, is that its read_csv function accepts anything that resolves to a CSV file, including URLs that download CSVs. The following API takes advantage of this.
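As a minimal sketch of that idea, pd.read_csv accepts a file path, a URL string, or any file-like object. The example below uses an in-memory io.StringIO buffer in place of a download URL so that it runs offline. The two rows are copied from the MSFT output shown later in this post and restricted to three columns.

```python
import io

import pandas as pd

# Stand-in for the CSV text that a download URL would return.
csv_text = (
    "Date,Open,High\n"
    "2020-01-02,158.779999,160.729996\n"
    "2020-01-03,158.320007,159.949997\n"
)

# read_csv takes a path, a URL string, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Swapping the buffer for a URL string is the only change needed to read a remote CSV.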

The Class

We define a class to be used in other Python projects for the API and include a few libraries:

import datetime       # for start and end periods
import time           # to convert datetimes to epoch seconds
import pandas as pd   # to return a data frame with the stock data

class YahooAPI(object):
    def __init__(self, interval="1d"):
        pass

    def __build_url(self, ticker, start_date, end_date):
        pass

    def get_ticker_data(self, ticker, start_date, end_date):
        pass

As seen above, this API is going to be very simple and straightforward; its only intent is to retrieve historic stock data. The class’s __init__ function takes one optional parameter that specifies the interval at which we want the data returned. Valid options are 1d (one day), 1wk (one week), and 1mo (one month); note that 1m means one minute, not one month. By default we’ll just pull daily data.
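One optional safeguard, which is not part of the original class, is to reject an unsupported interval before any request is made. The sketch below assumes the interval codes are 1d, 1wk, and 1mo; the names VALID_INTERVALS and check_interval are hypothetical.

```python
# Assumed set of interval codes accepted by the download URL.
VALID_INTERVALS = {"1d", "1wk", "1mo"}


def check_interval(interval):
    """Return the interval unchanged, or raise if it is not recognised."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"unsupported interval: {interval!r}")
    return interval


print(check_interval("1wk"))
```

Calling check_interval(interval) at the top of __init__ would surface a mistyped interval immediately instead of as a failed download.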

The Functions

Now we’ll fill in the functions:

__init__(…)

    def __init__(self, interval="1d"):
        # URL template; the placeholders are filled in by __build_url
        self.base_url = "https://query1.finance.yahoo.com/v7/finance/download/{ticker}?period1={start_time}&period2={end_time}&interval={interval}&events=history"
        self.interval = interval

The setup is simple: define the base URL and store the desired interval.

__build_url(…)

    def __build_url(self, ticker, start_date, end_date):
        # start_date and end_date are Unix timestamps (seconds since the epoch)
        return self.base_url.format(ticker=ticker, start_time=start_date, end_time=end_date, interval=self.interval)

The private method __build_url(…) takes the ticker symbol of the stock we’re interested in, a start date, and an end date (both already converted to Unix timestamps) and builds the URL that can be used to get the stock data.
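To make the result concrete, here is a standalone sketch of the same string formatting. The function name build_url and the constant BASE_URL are illustrative stand-ins for the method and attribute above, and the two timestamps correspond to midnight UTC on 2020-01-01 and 2020-06-28.

```python
# Same URL template as the class, split across lines for readability.
BASE_URL = (
    "https://query1.finance.yahoo.com/v7/finance/download/{ticker}"
    "?period1={start_time}&period2={end_time}"
    "&interval={interval}&events=history"
)


def build_url(ticker, start_time, end_time, interval="1d"):
    """Fill the template with a ticker, two epoch timestamps, and an interval."""
    return BASE_URL.format(
        ticker=ticker,
        start_time=start_time,
        end_time=end_time,
        interval=interval,
    )


url = build_url("msft", 1577836800, 1593302400)
print(url)
# https://query1.finance.yahoo.com/v7/finance/download/msft?period1=1577836800&period2=1593302400&interval=1d&events=history
```

Inside the class, the leading double underscore makes Python mangle the method name to _YahooAPI__build_url, which discourages outside callers but does not strictly prevent them.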

get_ticker_data(…)

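    def get_ticker_data(self, ticker, start_date, end_date):
        # start_date and end_date must be datetime objects;
        # time.mktime interprets them in the machine's local time zone
        epoch_start = int(time.mktime(start_date.timetuple()))
        epoch_end = int(time.mktime(end_date.timetuple()))

        # read_csv downloads and parses the CSV that the URL points to
        return pd.read_csv(self.__build_url(ticker, epoch_start, epoch_end))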

The get_ticker_data(…) function is the developer’s access point to the API. It takes the ticker symbol and the start and end dates as Python datetime objects. The dates are converted into the format Yahoo expects (Unix timestamps, i.e. seconds since the Unix epoch). The call returns a Pandas data frame containing the historic stock data for that range. As seen here, creating the data frame is as easy as passing the URL of the CSV file to pd.read_csv.
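One detail worth knowing: time.mktime treats its input as local time, so the same datetime produces different timestamps on machines in different time zones. If reproducible timestamps matter, one alternative is to interpret the dates as UTC. The helper name to_epoch_utc below is hypothetical and is not part of the class above.

```python
import datetime


def to_epoch_utc(d):
    """Convert a naive datetime, interpreted as UTC, to whole epoch seconds."""
    return int(d.replace(tzinfo=datetime.timezone.utc).timestamp())


print(to_epoch_utc(datetime.datetime(2020, 1, 1)))   # 1577836800
print(to_epoch_utc(datetime.datetime(2020, 6, 28)))  # 1593302400
```

Either approach works for daily data; the UTC version simply removes the dependence on where the script runs.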

Test

if __name__ == '__main__':
    dh = YahooAPI()
    now = datetime.datetime(2020, 6, 28)    # get data up to 6/28/2020
    then = datetime.datetime(2020, 1, 1)        # get data from 01/01/2020
    df = dh.get_ticker_data("msft", then, now)
    print(df)

Finally, the snippet above tests the API. For those unfamiliar, the code under if __name__ == '__main__': executes only when this file is the main entry point of the program. That is, if I run this file standalone the logic will execute; if the file is imported into another file, it will not. For a quick-and-dirty test, I’ve used this if statement to verify that I am able to fetch data for Microsoft stock (MSFT). Running this logic produces the following results:

           Date        Open        High  ...       Close   Adj Close    Volume
0    2020-01-02  158.779999  160.729996  ...  160.619995  159.737595  22622100
1    2020-01-03  158.320007  159.949997  ...  158.619995  157.748581  21116200
2    2020-01-06  157.080002  159.100006  ...  159.029999  158.156342  20813700
3    2020-01-07  159.320007  159.669998  ...  157.580002  156.714310  21634100
4    2020-01-08  158.929993  160.800003  ...  160.089996  159.210495  27746500
..          ...         ...         ...  ...         ...         ...       ...
118  2020-06-22  195.789993  200.759995  ...  200.570007  200.570007  32818900
119  2020-06-23  202.089996  203.949997  ...  201.910004  201.910004  30917400
120  2020-06-24  201.600006  203.250000  ...  197.839996  197.839996  36740600
121  2020-06-25  197.800003  200.610001  ...  200.339996  200.339996  27803900
122  2020-06-26  199.729996  199.889999  ...  196.330002  196.330002  54649200

[123 rows x 7 columns]
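Note that the Date column arrives as plain strings. A common next step, also suggested in the comments below, is to parse it and use it as the index. The sketch below runs offline on the first two rows of the output above, using only the columns visible there.

```python
import io

import pandas as pd

# First two rows of the MSFT output above; the elided columns are left out.
sample = (
    "Date,Open,High,Close,Adj Close,Volume\n"
    "2020-01-02,158.779999,160.729996,160.619995,159.737595,22622100\n"
    "2020-01-03,158.320007,159.949997,158.619995,157.748581,21116200\n"
)

df = pd.read_csv(io.StringIO(sample))

# Parse the date strings and index the frame by date.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
df = df.set_index("Date")
print(df)
```

With a DatetimeIndex in place, date-based slicing and resampling work directly on the frame.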

Conclusion

As seen above, retrieving data from Yahoo Finance is very straightforward in Python. In under 20 lines of code we’ve developed the ability to get daily, weekly, or monthly data for any ticker symbol listed on Yahoo Finance. This data can be used for a plethora of applications, including stock data analysis, training and testing machine learning algorithms, and developing stock trading bots. In future posts, I will use this API to build up portfolios of securities and work with some ideas in modern portfolio theory and stochastic process modeling.
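Since the class handles one ticker per call, pulling several tickers is just a loop. The sketch below is one way to do it, not part of the original class: fetch_many is a hypothetical helper that accepts any function with the same signature as get_ticker_data, and fake_fetch is an offline stand-in that returns placeholder rows rather than market data.

```python
import pandas as pd


def fetch_many(fetch, tickers, start_date, end_date):
    """Call fetch(ticker, start_date, end_date) per ticker and stack the results."""
    frames = []
    for ticker in tickers:
        frame = fetch(ticker, start_date, end_date).copy()
        frame["Ticker"] = ticker.upper()  # remember which rows belong to which ticker
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


def fake_fetch(ticker, start_date, end_date):
    # Offline stand-in for YahooAPI().get_ticker_data; placeholder values only.
    return pd.DataFrame({"Date": ["2020-01-02"], "Close": [0.0]})


combined = fetch_many(fake_fetch, ["msft", "aapl"], None, None)
print(combined)
```

With the real class, the call would look like fetch_many(YahooAPI().get_ticker_data, ["msft", "aapl"], then, now).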

Full Code

import datetime       # for start and end periods
import time           # to convert datetimes to epoch seconds
import pandas as pd   # to return a data frame with the stock data

class YahooAPI(object):
    def __init__(self, interval="1d"):
        self.base_url = "https://query1.finance.yahoo.com/v7/finance/download/{ticker}?period1={start_time}&period2={end_time}&interval={interval}&events=history"
        self.interval = interval

    def __build_url(self, ticker, start_date, end_date):
        return self.base_url.format(ticker=ticker, start_time=start_date, end_time=end_date, interval=self.interval)

    def get_ticker_data(self, ticker, start_date, end_date):
        # must pass datetime objects into this function, not strings
        epoch_start = int(time.mktime(start_date.timetuple()))
        epoch_end = int(time.mktime(end_date.timetuple()))

        return pd.read_csv(self.__build_url(ticker, epoch_start, epoch_end))

if __name__ == '__main__':
    dh = YahooAPI()
    then = datetime.datetime(2020, 1, 1)    # get data from 01/01/2020
    now = datetime.datetime(2020, 1, 31)    # get data up to 01/31/2020
    df = dh.get_ticker_data("msft", then, now)
    print(df)

3 thoughts on “Retrieving Historical Stock Data in Python via Yahoo Finance”

  1. Thanks for putting this together. I was on the verge of writing something similar myself after discovering that pandas_datareader.data.DataReader pulls in the entire Yahoo finance page much like a web browser, which is a massive overhead, and then extracts the data from it. In contrast, this targets the exact data point.
    With this now, overheads gone. Latency and throughput improved, traffic reduced. Well done.
    I expanded your code a little bit to include support for requests.Session, which (in theory) allows some connection pooling for multiple requests and for proxy servers to be specified.
    Thanks again!

  2. One last thing (sorry!), to return the exact same results as pandas DataReader does (which a few people use), I’d include the datetime parsing and indexing in the get_ticker_data method:
    data['Date'] = pd.to_datetime(data['Date'], format='%Y-%m-%d')
    data.set_index('Date', inplace=True)

    1. Thanks for the comments.

      That’s an excellent point. I’ve found myself using the API a few times now and having to convert the column to a datetime in the other scripts rather than the API. I’m actually not familiar with the DataReader in Pandas, although I have heard of it, so I wasn’t focused on being consistent with that but I think that would add a lot of convenience to my implementation.

      Thanks again!
