Creating a Database with an API (Stock Price Data)

NOTE: because I live in Montana (and was too lazy to deal with time zones) the data stored in this database (and, therefore, the dates and times passed to the API endpoints) are recorded in Mountain Standard Time (Denver, CO’s time zone). I apologize in advance for forcing the conversion and inconvenience onto the user.

Typically it’s difficult to get high-fidelity stock price and volume data via API. Most stock data services offer daily, weekly, monthly, and/or annual information. For example, in a previous post, I created a wrapper around the Yahoo Finance data download functionality to create an API that downloads daily, weekly, and monthly data. For some applications, such as developing high-frequency trading algorithms or complex time-series forecasting research, more granular data at one-, three-, or five-minute intervals is useful, but the end-user usually has to pay a monthly fee to access data at these frequencies.

Because of this, I’ve decided to create a database of my own storing stock price and volume data at 3-minute intervals. Three minutes was chosen as the interval to avoid bloating the database (and angering my database hosting service) and because it’s a good middle ground between the 1- and 5-minute intervals. The data will be viewable and downloadable on my website and accessible programmatically via an API. Below I outline the steps taken in creating these three components: the database, the website, and the API.

The focus of this post is creating a database of stock price and volume information collected at 3-minute intervals. However, the process implemented here could be used to collect any data and create an API to access it. The only things that would really change are the database structure, the SQL statements, and the way in which the data is gathered.
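For reference, here is a minimal sketch of the table this post works against. The column names come from the INSERT statement shown later; the types and the index are my assumptions:

import mysql.connector

# a sketch of the stock_data table; the column names are taken from the
# INSERT statement later in the post, the types and index are assumptions
create_table_sql = """
CREATE TABLE IF NOT EXISTS stock_data (
    ticker            VARCHAR(10),
    last_trade_volume DOUBLE,
    cumulative_volume DOUBLE,
    last_trade_price  DOUBLE,
    bid_price         DOUBLE,
    ask_price         DOUBLE,
    datetime          DATETIME,
    INDEX (ticker, datetime)  -- speeds up the API's ticker/date-range queries
)"""

database = mysql.connector.connect(host='database_server', user='db_user',
                                   passwd='db_usr_passwd', database='database_name')
cursor = database.cursor()
cursor.execute(create_table_sql)
database.commit()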

Collecting the Stock Data

In order to create an API to interact with some data, we’ll first need to gather the data. This is easily done with the AllyInvestPy library that I created and discussed in a previous post. This library is a wrapper around the Ally Invest API and currently only supports fetching data from Ally Invest (order placing and other functionality is in development).

To begin we’ll need a few Python libraries,

import pandas as pd                       # working with CSV files
from ally import *                        # AllyInvestPy
import datetime
import mysql.connector                    # execute SQL queries to store the data
import time
import pandas_market_calendars as mcal    # market calendars

these can be installed via

pip3 install pandas AllyInvestPy mysql-connector-python pandas_market_calendars

Access to the Ally Invest API is required to query the stock information. To access the API you need to generate and copy the OAuth keys from Ally; more details are given in their API documentation. I include these in the Python script as

CONSUMER_KEY = "CONSUMER_KEY"
OAUTH_TOKEN = "OAUTH_TOKEN"
OAUTH_SECRET = "OAUTH_SECRET"

With that setup, we can get into gathering and storing the data in Python. First, the pandas_market_calendars library is used to determine whether the market is open. Since the hours for the three big US exchanges (NASDAQ, NYSE, and AMEX) are all the same (or at least similar, I believe), only the NYSE trading schedule is considered. This check (and some logging) is done in the logic below.

with open('log.dat', 'a') as f:
    f.write("Starting API script on {}\n".format(str(datetime.datetime.now().replace(microsecond=0))))
    nyse = mcal.get_calendar("NYSE")
    today = datetime.datetime.strftime(datetime.datetime.today(), '%Y-%m-%d')
    sched = nyse.schedule(start_date=today, end_date=today)
    now = datetime.datetime.now()
    # market hours in the script's local time (MST): 7:30 AM-2:00 PM,
    # i.e. 9:30 AM-4:00 PM Eastern
    m_open = now.replace(hour=7, minute=30, second=0, microsecond=0)
    m_close = now.replace(hour=14, minute=0, second=0, microsecond=0)
    if sched.shape[0] == 0 or now < m_open or now > m_close: # check if open for day
        f.write("\tThe market is closed, exiting...\n\n")
        exit()

There were a few issues when I began developing the script. For starters, Ally Invest only allows 60 API calls every minute, so gathering data for ~2,500 stocks every 3 minutes would be impossible one ticker at a time. Fortunately, the API allows multiple tickers in a single request. After some trial and error, I found you can get data for about 495 tickers at a time (before the URL gets too long). To be safe, I only query 400 tickers per request, as seen below.

# create the object to use the Ally Invest API (AllyInvestPy library)
ally = AllyAPI(OAUTH_SECRET, OAUTH_TOKEN, CONSUMER_KEY, response_format="json")
# read in the ticker symbols for the Russell 2000 and S&P 500
snp = pd.read_csv('sp500tickers.csv')['symbol']
russel = pd.read_csv('russel2000.csv')['symbol']

# multiple tickers count as one request! how many can we do at once?
# the limit is ~495 tickers per request; we'll use 400 to be safe
tickers_per_request = 400
tickers = snp.tolist()
tickers.extend(russel.tolist())
tickers = list(dict.fromkeys(tickers))
ticker_size = len(tickers)
with open('log.dat', 'a') as f:
    f.write('\tLoading {} tickers.\n'.format(ticker_size))
# create a list of lists for the 'chunks' of tickers to be queried together 
# (400 tickers each)
tickers = [tickers[x:x+tickers_per_request] for x in range(0, len(tickers), tickers_per_request)]

# save all the data first so the fetches are as close to the minute as possible
# and aren't slowed down by database writes
data_to_enter = []
# for each chunk of 400 tickers
for ticker_list in tickers:
    # create a QuoteRequest for the 400 tickers (AllyInvestPy library)
    quote_request = QuotesRequest(symbols=ticker_list)
    response = quote_request.execute(ally)
    fetch_time = datetime.datetime.now().replace(microsecond=0)
    for quote in response.get_quotes():    # response data from API call 
        # ticker, volume of last trade, cumulative volume, last trade price, 
        # bid price, ask price
        if quote.symbol != 'na':    # is the ticker valid
            incr_vl = float(quote.incr_vl) if quote.incr_vl != '' else 0
            vl = float(quote.vl) if quote.vl != '' else 0
            data_to_enter.append((quote.symbol, incr_vl, vl, float(quote.last), float(quote.bid), float(quote.ask), str(fetch_time)))

With the data gathered from the Ally Invest API and stored in the data_to_enter list, it’s time to write the data to the database, which brought up another issue. Initially, I was inserting the data one row at a time; doing this took ~3.25 minutes per run of the script. Such a long database write would force me to change the interval to four or five minutes. Fortunately, mysql-connector offers functionality for batch executions (cursor.executemany(…)). Using this function took the database write time from ~163 seconds to ~0.89 seconds! (Words of advice: batch write to the database whenever possible.) The final code for this section, which writes the data to the database, is provided below.

database = mysql.connector.connect(
    host='database_server',
    user='db_user',
    passwd='db_usr_passwd',
    database='database_name'
)
cursor = database.cursor()

with open('log.dat', 'a') as f:
    f.write('\tWriting data for {} tickers to the database.\n'.format(len(data_to_enter)))
    f.write('\tMissing data for {} tickers.\n'.format(ticker_size - len(data_to_enter)))

stock_data_insert_sql = \
    "INSERT INTO table_name (ticker, last_trade_volume, cumulative_volume, \
    last_trade_price, bid_price, ask_price, datetime) \
    VALUES (%s, %s, %s, %s, %s, %s, %s);"

# cut the database time from ~163 seconds to ~0.89 seconds
# entire script runs in 11.019s
try:
    cursor.executemany(stock_data_insert_sql, data_to_enter)
    database.commit()
except Exception as e:
    with open('log.dat', 'a') as f:
        f.write("\tError writing to database:\n")
        f.write('\t' + str(e))
        f.write('\n')
    database.rollback()

# (the timing code was removed from the logic above; the timing variables
# are included in the Full Code at the end of the post)
with open('log.dat', 'a') as f:
    f.write("\tScript Completed. Timing: {}\tAPI Time: {}\tDB Time \
        {}\n\n".format((db_end - start), (api_end - start), (db_end - db_start)))

The first day data is available is 11/19/2020 since the project wasn’t ready to be deployed until then (data gathering started in the late morning due to a software bug). The script is run via a cron job on a Raspberry Pi 3 B+ I had lying around, which is easily capable of running uninterrupted for long periods of time. It collects data for the tickers in the S&P 500 and the Russell 2000 index, removing any overlapping tickers (data for 2,471 tickers is inserted into the database every three minutes).
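The post doesn’t show the exact crontab entry, but something along these lines would do the job (the script name is an assumption; the path appears in the full code below):

# run every 3 minutes during (roughly) market hours on weekdays; the script's
# own market-hours check handles holidays and the 7:30 AM open
*/3 7-14 * * 1-5 python3 /home/pi/stock_data_api/collect_data.py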

With data now being collected, an API and website can be built to interact with the database.

Building the API

At first I wanted to use a modern web development language/framework, Node.js/Express.js in particular. However, after spending an entire evening unable to get Node running on my DreamHost shared server (I read all of their setup documentation and plenty of other material online), I decided to go with PHP. I would have used Python with Flask but was a little tired of writing Python code for the web after developing my projects’ website and another project of mine (BookMotifs.com) with Django. I’ve only ever done web development for personal projects and have no professional or academic experience with it, so instead of trying to decipher obscure Phusion Passenger errors in the web server logs, I went with the tried, true, and easy-to-use PHP.

The API request endpoint is

http://stockdata.anthonymorast.com/api/v1/

and currently offers only three functions, all of which are GET requests:

  • fetch – retrieve stock data between two datetimes for a list of tickers
  • tickers – retrieve a list of tickers currently in the database
  • dates – determine the start and end dates for which data is available

The PHP code for all three functions is very similar; the only things that really change are the SQL statements, the JSON array, and the CSV headers. Because of this, I will exclude the logic for the tickers and dates functions and only include the logic for the fetch endpoint (as it’s slightly more complex than the other two).
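Before walking through each endpoint, here is a minimal Python sketch (mine, not from the original post) of calling all three and parsing the JSON responses with the requests library:

import requests

BASE = "http://stockdata.anthonymorast.com/api/v1"

# list of tickers available in the database
tickers = requests.get(BASE + "/tickers/").json()

# first and last datetimes for which data is available
dates = requests.get(BASE + "/dates/").json()

# 3-minute bars for AAPL over a one-hour window
bars = requests.get(BASE + "/fetch/", params={
    "start_date": "2020-12-02 12:00:00",
    "end_date": "2020-12-02 13:00:00",
    "tickers": "aapl",
}).json()

print(dates)  # e.g. [{"min_date": "...", "max_date": "..."}]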

tickers

This API call is used to determine which stock ticker symbols are available in the database. The endpoint for this request is

http://stockdata.anthonymorast.com/api/v1/tickers/

There is one optional parameter, download, which, if specified, will force the API to return a CSV file filled with the available tickers; by default JSON is returned. That is, the endpoint

http://stockdata.anthonymorast.com/api/v1/tickers?download

will return a CSV file with one column containing all of the tickers available in the database, while the previous call will return JSON. Below are sample outputs of the call. It should be noted that the download parameter is available for all of the function calls. Although JSON can typically be translated to CSV fairly easily (or just kept as JSON), the real utility of the parameter is seen in the section on the fetch endpoint.

CSV
ticker
A
AA
AAL
AAN
AAOI
AAON
AAP
AAPL
AAT
AAWW
ABBV
ABC
ABCB
ABEO
JSON
[{"ticker":"A"},{"ticker":"AA"},{"ticker":"AAL"},{"ticker":"AAN"},{"ticker":"AAOI"},{"ticker":"AAON"},{"ticker":"AAP"},{"ticker":"AAPL"},{"ticker":"AAT"},{"ticker":"AAWW"},{"ticker":"ABBV"},{"ticker":"ABC"},{"ticker":"ABCB"},{"ticker":"ABEO"},{"ticker":"ABG"},{"ticker":"ABM"},{"ticker":"ABMD"},{"ticker":"ABR"},{"ticker":"ABT"},{"ticker":"ABTX"},{"ticker":"AC"},{"ticker":"ACA"},{"ticker":"ACBI"},{"ticker":"ACCO"},{"ticker":"ACEL"},
....,
{"ticker":"YEXT"},{"ticker":"YMAB"},{"ticker":"YORW"},{"ticker":"YUM"},{"ticker":"ZBH"},{"ticker":"ZBRA"},{"ticker":"ZEUS"},{"ticker":"ZGNX"},{"ticker":"ZION"},{"ticker":"ZIOP"},{"ticker":"ZIXI"},{"ticker":"ZNTL"},{"ticker":"ZUMZ"},{"ticker":"ZUO"},{"ticker":"ZYXI"}]

dates

The dates method is just as simple as the tickers method. The function returns the first date/time that data was recorded (MIN_DATE) and the most recent date/time that data was recorded (MAX_DATE). As with tickers, a download parameter can be specified. The request endpoints are shown below, with and without the download parameter.

http://stockdata.anthonymorast.com/api/v1/dates

http://stockdata.anthonymorast.com/api/v1/dates?download

Example Output (JSON)
[{"min_date":"2020-11-19 10:09:36", "max_date":"2020-12-03 13:57:35"}]

fetch

The most useful endpoint is the fetch method, which allows querying the database. It has three required GET parameters:

  1. start_date – the first date/time for which data should be queried
    • Format: yyyy-mm-dd hh:mm:ss
      • The hours, minutes, and seconds are optional if you don’t need that level of fidelity in your query
  2. end_date – the last date/time for which data should be queried
    • Format: yyyy-mm-dd hh:mm:ss
      • As above, the hours, minutes, and seconds are optional if you don’t need that level of fidelity in your query
  3. tickers – a comma-separated list of tickers to get data for
    • Example: …?tickers=aapl, msft, intc, amd,tsla

As with the previous methods, there is an optional download parameter that changes the return data to CSV format. The parameters can be specified in any order in the URL, and spaces in the ticker list are stripped server-side, so either spacing style in the example above works.

The base endpoint is

http://www.stockdata.anthonymorast.com/api/v1/fetch/

Examples

The following query will return all the stock and volume data for Apple between 12 PM and 1 PM on December 2nd. The data will be returned as a CSV.

http://www.stockdata.anthonymorast.com/api/v1/fetch/?start_date=2020-12-02%2012:00:00&end_date=2020-12-02%2013:00:00&tickers=aapl&download

Results

ticker,datetime,last_trade_price,bid_price,ask_price,cumulative_volume,last_trade_volume
AAPL,"2020-12-02 12:00:20",122.96,122.96,122.97,56656757,249
AAPL,"2020-12-02 12:03:18",123.01,122.99,123.01,57065969,1500
AAPL,"2020-12-02 12:06:21",122.99,122.98,122.99,57364938,300
AAPL,"2020-12-02 12:09:14",122.89,122.89,122.90,57750025,100
AAPL,"2020-12-02 12:12:17",122.88,122.88,122.89,58037924,100
AAPL,"2020-12-02 12:15:14",122.80,122.80,122.81,58328696,100
AAPL,"2020-12-02 12:18:19",122.98,122.97,122.98,58688903,232
AAPL,"2020-12-02 12:21:19",123.05,123.04,123.05,59199234,100
AAPL,"2020-12-02 12:24:15",122.99,122.99,123.00,59528437,100
AAPL,"2020-12-02 12:27:14",122.98,122.97,122.98,59961261,190
AAPL,"2020-12-02 12:30:10",123.01,123.00,123.01,60276434,500
AAPL,"2020-12-02 12:33:15",122.98,122.98,122.99,60828573,800
AAPL,"2020-12-02 12:36:12",122.89,122.89,122.90,61246634,100
AAPL,"2020-12-02 12:39:18",122.82,122.81,122.82,61632041,200
AAPL,"2020-12-02 12:42:19",122.79,122.78,122.79,62128875,100
AAPL,"2020-12-02 12:45:20",122.88,122.87,122.88,62511260,250
AAPL,"2020-12-02 12:48:11",122.88,122.87,122.88,62782834,100
AAPL,"2020-12-02 12:51:22",122.88,122.87,122.88,63101577,300
AAPL,"2020-12-02 12:54:18",122.95,122.94,122.95,63511453,200
AAPL,"2020-12-02 12:57:13",122.82,122.81,122.82,63842111,100

The following URL will fetch 2 days’ worth of data for Intel (INTC) and AMD (AMD). The data will not be downloaded as a CSV but will be returned as JSON.

http://www.stockdata.anthonymorast.com/api/v1/fetch/?start_date=2020-11-19&end_date=2020-11-20&tickers=intc,amd

The results are too long to show here but you can navigate to the URL yourself to view them. This example was primarily used to demonstrate the exclusion of the hours, minutes, and seconds from the start and end dates.

To demonstrate the JSON return value, 10 minutes of data for AMD, Intel, and Xilinx (XLNX) will be queried via the URL below.

http://www.stockdata.anthonymorast.com/api/v1/fetch/?start_date=2020-11-19 11:00:00&end_date=2020-11-19 11:10:00&tickers=intc,amd,xlnx

Results

{"AMD":[{"last_trade_volume":"100","cumulative_volume":"30430637","last_trade_price":"84.98","bid_price":"84.96","ask_price":"84.97","datetime":"2020-11-19 11:00:10","ticker":"AMD"},{"last_trade_volume":"100","cumulative_volume":"30627289","last_trade_price":"84.95","bid_price":"84.95","ask_price":"84.96","datetime":"2020-11-19 11:03:12","ticker":"AMD"},{"last_trade_volume":"100","cumulative_volume":"30900181","last_trade_price":"84.98","bid_price":"84.97","ask_price":"84.98","datetime":"2020-11-19 11:06:14","ticker":"AMD"}],
"INTC":[{"last_trade_volume":"200","cumulative_volume":"11660382","last_trade_price":"45.11","bid_price":"45.10","ask_price":"45.11","datetime":"2020-11-19 11:00:13","ticker":"INTC"},{"last_trade_volume":"100","cumulative_volume":"11735738","last_trade_price":"45.14","bid_price":"45.14","ask_price":"45.15","datetime":"2020-11-19 11:03:14","ticker":"INTC"},{"last_trade_volume":"1000","cumulative_volume":"11820830","last_trade_price":"45.13","bid_price":"45.12","ask_price":"45.13","datetime":"2020-11-19 11:06:09","ticker":"INTC"}],
"XLNX":[{"last_trade_volume":"100","cumulative_volume":"1345392","last_trade_price":"132.60","bid_price":"132.60","ask_price":"132.71","datetime":"2020-11-19 11:00:10","ticker":"XLNX"},{"last_trade_volume":"100","cumulative_volume":"1350235","last_trade_price":"132.56","bid_price":"132.48","ask_price":"132.55","datetime":"2020-11-19 11:03:09","ticker":"XLNX"},{"last_trade_volume":"100","cumulative_volume":"1361990","last_trade_price":"132.54","bid_price":"132.52","ask_price":"132.57","datetime":"2020-11-19 11:06:15","ticker":"XLNX"}]}

PHP Code

The PHP logic for the API is short and sweet. The first thing we need is the database.php file, which houses the database connection parameters and connects to the database.

<?php
class Database {
    private $host = "database_hostname";
    private $db_name = "database_name";
    private $username = "database_username";
    private $password = "database_user_password";
    private $conn;

    public function getConnection() {
        $this->conn = null;

        $this->conn = new mysqli($this->host, $this->username, $this->password, $this->db_name);
        if($this->conn->connect_error) {
            die("Connection failed.".$this->conn->connect_error);
        }
        return $this->conn;
    }
}
?>

Little explanation is needed for the code above. An attempt is made to connect to the database; if it’s successful the connection object is returned, otherwise the script errors out.

With that, we’re able to start performing database queries. Note that the logic below is almost the same for all of the API methods. The only changes are the preprocessing, the output format, and the SQL queries.

One other thing to note is that all of the PHP scripts are named index.php. This allows us to navigate to the directory, i.e.

http://www.stockdata.anthonymorast.com/api/v1/fetch/

rather than having to actually specify the PHP file’s name, i.e.

http://www.stockdata.anthonymorast.com/api/v1/fetch/index.php

To me, this looks way cleaner (and it helps hide the fact that I’m using PHP :D). Therefore, each API endpoint is a subdirectory of the http://stockdata.anthonymorast.com/api/v1 directory with an index.php file in it. When navigating to a particular subdirectory on the web server, the index.php file is ‘called’, allowing the API to do its job.
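Concretely, the layout on the server looks something like this (inferred from the include '../database.php' path in the scripts):

api/
└── v1/
    ├── database.php
    ├── dates/
    │   └── index.php
    ├── fetch/
    │   └── index.php
    └── tickers/
        └── index.php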

To get the PHP code started, the GET parameters are checked and stored as variables if they are available.

<?php
include '../database.php';

$start = isset($_GET["start_date"]) ? $_GET["start_date"] : null;
$end = isset($_GET["end_date"]) ? $_GET["end_date"] : null;
$tickers = isset($_GET["tickers"]) ? $_GET["tickers"] : null;
$download = isset($_GET["download"]);

I don’t actually error-check the parameters since, if they are incorrect or unavailable, the SQL will either return no results or error out. Either way, an error message is returned from the PHP script.

All of the preprocessing done in the script is on the tickers list. First, the spaces are removed from the list. Then, the list is translated into the quoted, comma-separated string expected by MySQL’s IN clause.

$tickers = str_replace(" ", "", $tickers);
$tickers = explode(",", $tickers);

$str = '';
for($i = 0; $i < count($tickers)-1; $i++) {
    $str = $str."'".$tickers[$i]."',";
}
$str = $str."'".$tickers[count($tickers)-1]."'";

Now we can format the SQL query, query the database and return the results in the format determined by the download parameter.

$db = new Database();
$conn = $db->getConnection();

$query = "select ticker, datetime, last_trade_price, bid_price, ask_price, cumulative_volume, last_trade_volume ".
         "from stock_data ".
         "where ticker in (".$str.") and datetime > '".$start."' and datetime < '".$end."';";
$result = $conn->query($query);

$result_array = array();
$tmp_filename = "tmp".round(microtime(true)*1000).".csv";
$csv_file = fopen($tmp_filename, "w+");
fputcsv($csv_file, array("ticker", "datetime", "last_trade_price", "bid_price", "ask_price", "cumulative_volume", "last_trade_volume"));
if($result->num_rows > 0) {
    while($row = $result->fetch_assoc()) {
        $result_array[strtoupper($row["ticker"])][] = array("last_trade_volume" => 
                    $row["last_trade_volume"],
                    "cumulative_volume" => $row["cumulative_volume"], 
                    "last_trade_price" => $row["last_trade_price"],
                    "bid_price" => $row["bid_price"], "ask_price"=>$row["ask_price"], 
                    "datetime"=>$row["datetime"], "ticker"=>$row["ticker"]);
        fputcsv($csv_file, $row);
    }
} else {
    // no rows matched; return an error message and clean up the temp file
    fclose($csv_file);
    unlink($tmp_filename);
    echo json_encode(array("ERROR"=>"No results"));
    return;
}

if (!$download) {
    // return the JSON
    echo json_encode($result_array);
} else {
    // download the CSV file
    fseek($csv_file, 0);
    header('Content-Type: text/csv');
    header('Content-Disposition: attachment; filename="'.$tmp_filename.'";');
    fpassthru($csv_file);
}
// clean up the temporary CSV file left on the server
fclose($csv_file);
unlink($tmp_filename);

You might notice that both the CSV and JSON objects are created regardless of the download parameter. This might waste a small amount of processing time but doesn’t seem to greatly affect the performance of the API. In my opinion, this looks cleaner than scattering “if($download)” statements everywhere and potentially duplicating code.

The full code for the example index.php file is provided below; the database.php file was already presented above in its entirety.

Easily Importing Data to Python/Pandas and R

In this section, the utility of the download parameter becomes apparent. As mentioned above, the parameter can at first glance appear pretty useless since the translation from JSON to CSV can be done fairly easily. However, many tabular data structures (R and Pandas data frames) can be created directly from a CSV file, and, fortunately, both libraries also accept a URL that resolves to a CSV file. Therefore, by specifying the download parameter, Pandas and R data frames can quickly and easily be created from the results of the API’s fetch call.

Python

Using the Python REPL, we can create a quick example of using this API with Pandas. A data frame containing the 3-minute stock data can be created in one line of code (after importing Pandas).

>>> import pandas as pd
>>> data = pd.read_csv('http://stockdata.anthonymorast.com/api/v1/fetch/?tickers=aapl,gme&download&start_date=2020-11-01&end_date=2020-12-01')

Viewing the data frame gives the output below.

>>> data
     ticker             datetime  last_trade_price  bid_price  ask_price  cumulative_volume  last_trade_volume
0      AAPL  2020-11-19 10:09:42            118.26     118.26     118.27           35592894               1200
1      AAPL  2020-11-19 10:15:18            118.13     118.13     118.14           36178822                100
2      AAPL  2020-11-19 10:18:14            118.05     118.04     118.05           36824213                101
3      AAPL  2020-11-19 10:21:12            118.05     118.04     118.05           37864577                200
4      AAPL  2020-11-19 10:24:14            117.87     117.87     117.88           38334278                100
...     ...                  ...               ...        ...        ...                ...                ...
1699    GME  2020-11-30 13:45:21             17.02      17.00      17.03           27892467                125
1700    GME  2020-11-30 13:48:22             16.91      16.91      16.94           28050085                185
1701    GME  2020-11-30 13:51:13             16.83      16.82      16.84           28335777               1600
1702    GME  2020-11-30 13:54:15             16.86      16.85      16.87           28549672                100
1703    GME  2020-11-30 13:57:20             16.74      16.73      16.74           28864732                100

[1704 rows x 7 columns]
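From here the usual pandas tooling applies. For example (an illustration of mine, not from the post), parsing the timestamps and computing each ticker’s hourly average trade price:

>>> data['datetime'] = pd.to_datetime(data['datetime'])
>>> hourly = (data.set_index('datetime')          # index by timestamp
...               .groupby('ticker')['last_trade_price']
...               .resample('1H')                 # hourly buckets per ticker
...               .mean())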

R

The process in R is even more straightforward (although essentially the same) since no extra libraries are needed to use data frames. The R code below loads stock data for Apple, Intel, and Wal-Mart into a data frame in one line of code.

data <- read.csv('http://stockdata.anthonymorast.com/api/v1/fetch?download&start_date=2020-11-01&end_date=2020-12-01&tickers=aapl,wmt,intc')

Stock Data Download/Viewing Site

Sometimes, we don’t want to use APIs and programming languages but still have a use for this data. For that reason, I’ve created a website where the stock data can be viewed and downloaded as a CSV file after filling out a simple form. I won’t provide the code for the website since HTML, CSS, and JavaScript can be a little boring and the post is already pretty long. However, below are some screenshots of the website and its URL.

http://www.stockdata.anthonymorast.com/

The home page for the website: http://stockdata.anthonymorast.com
Clicking the ‘View Data’ button creates a table listing the data for the parameters specified in the form.

As seen here, the website is not very pretty (it’s extremely basic), but hopefully offers some useful functionality nonetheless. Also seen in the screenshots above is a ‘Download Data’ button. Clicking this button will download a CSV file containing the data corresponding to the specified parameters (similar to what is done when specifying the download parameter for the API endpoints).

PHP and Python Code

The HTML/CSS and JavaScript code used to create the website and handle the API calls was omitted since it isn’t very interesting.

fetch Method

<?php
    ini_set('display_errors', 1);
    include '../database.php';

    $start = isset($_GET["start_date"]) ? $_GET["start_date"] : null;
    $end = isset($_GET["end_date"]) ? $_GET["end_date"] : null;
    $tickers = isset($_GET["tickers"]) ? $_GET["tickers"] : null;
    $download = isset($_GET["download"]);

    $tickers = str_replace(" ", "", $tickers);
    $tickers = explode(",", $tickers);

    $str = '';
    for($i = 0; $i < count($tickers)-1; $i++) {
        $str = $str."'".$tickers[$i]."',";
    }
    $str = $str."'".$tickers[count($tickers)-1]."'";

    $db = new Database();
    $conn = $db->getConnection();

    $query = "select ticker, datetime, last_trade_price, bid_price, ask_price, cumulative_volume, last_trade_volume ".
             "from stock_data ".
             "where ticker in (".$str.") and datetime > '".$start."' and datetime < '".$end."';";
    $result = $conn->query($query);

    $result_array = array();
    $tmp_filename = "tmp".round(microtime(true)*1000).".csv";
    $csv_file = fopen($tmp_filename, "w+");
    fputcsv($csv_file, array("ticker", "datetime","last_trade_price","bid_price","ask_price","cumulative_volume","last_trade_volume"));
    if($result->num_rows > 0) {
        while($row = $result->fetch_assoc()) {
            $result_array[strtoupper($row["ticker"])][] = array("last_trade_volume"=>$row["last_trade_volume"],
                        "cumulative_volume" => $row["cumulative_volume"], "last_trade_price" => $row["last_trade_price"],
                        "bid_price" => $row["bid_price"], "ask_price"=>$row["ask_price"], "datetime"=>$row["datetime"], "ticker"=>$row["ticker"]);
            fputcsv($csv_file, $row);
        }
    } else {
        // no rows matched; return an error message and clean up the temp file
        fclose($csv_file);
        unlink($tmp_filename);
        echo json_encode(array("ERROR"=>"No results"));
        return;
    }

    if (!$download) {
        echo json_encode($result_array);
    } else {
        fseek($csv_file, 0);
        header('Content-Type: text/csv');
        header('Content-Disposition: attachment; filename="'.$tmp_filename.'";');
        fpassthru($csv_file);
    }
    // clean up the temporary CSV file left on the server
    fclose($csv_file);
    unlink($tmp_filename);
?>

Python Data Capture

import pandas as pd
from ally import *
import datetime
import mysql.connector
import time
import pandas_market_calendars as mcal

CONSUMER_KEY = "..."
CONSUMER_SECRET = "..."
OAUTH_TOKEN = "..."
OAUTH_SECRET = "..."

if __name__ == '__main__':
    # markets open: 9:30 AM ET (7:30 AM MST)
    # markets close: 4:00 PM ET (2:00 PM MST)
    # just going to assume all three exchanges have the same trading days

    with open('log.dat', 'a') as f:
        f.write("Starting API script on {}\n".format(str(datetime.datetime.now().replace(microsecond=0))))
        nyse = mcal.get_calendar("NYSE")
        today = datetime.datetime.strftime(datetime.datetime.today(), '%Y-%m-%d')
        sched = nyse.schedule(start_date=today, end_date=today)
        now = datetime.datetime.now()
        m_open = now.replace(hour=7, minute=30, second=0, microsecond=0)
        m_close = now.replace(hour=14, minute=0, second=0, microsecond=0)
        if sched.shape[0] == 0 or now < m_open or now > m_close: # check if open for day
            f.write("\tThe market is closed, exiting...\n\n")
            exit()

    ally = AllyAPI(OAUTH_SECRET, OAUTH_TOKEN, CONSUMER_KEY, response_format="json")
    snp = pd.read_csv('/home/pi/stock_data_api/sp500tickers.csv')['symbol']
    russel = pd.read_csv('/home/pi/stock_data_api/russel2000.csv')['symbol']

    # multiple tickers count as one request! how many can we do at once?
    # the limit is ~495 tickers per request; we'll use 400 to be safe
    tickers_per_request = 400
    tickers = snp.tolist()
    tickers.extend(russel.tolist())
    tickers = list(dict.fromkeys(tickers))
    ticker_size = len(tickers)
    tickers = [tickers[x:x+tickers_per_request] for x in range(0, len(tickers), tickers_per_request)]

    # save all the data first so the fetches are as close to the minute as possible
    # and aren't slowed down by database writes
    start = time.time()
    data_to_enter = []
    for ticker_list in tickers:
        quote_request = QuotesRequest(symbols=ticker_list)
        response = quote_request.execute(ally)
        fetch_time = datetime.datetime.now().replace(microsecond=0)
        for quote in response.get_quotes():
            # ticker, volume of last trade, cumulative volume, last trade price, bid price, ask price
            if quote.symbol != 'na':
                incr_vl = float(quote.incr_vl) if quote.incr_vl != '' else 0
                vl = float(quote.vl) if quote.vl != '' else 0
                data_to_enter.append((quote.symbol, incr_vl, vl, float(quote.last), float(quote.bid), float(quote.ask), str(fetch_time)))
    api_end = time.time()

    db_start = time.time()
    database = mysql.connector.connect(
        ...
    )
    cursor = database.cursor()

    # inserting the rows one at a time took ~163 seconds to complete, so we
    # tried to find faster ways to perform the insert
    stock_data_insert_sql = \
        "INSERT INTO stock_data (ticker, last_trade_volume, cumulative_volume, last_trade_price, bid_price, ask_price, datetime) \
        VALUES (%s, %s, %s, %s, %s, %s, %s);"

    # cut the database time from ~163 seconds to ~0.89 seconds
    # entire script runs in 11.019s
    try:
        cursor.executemany(stock_data_insert_sql, data_to_enter)
        database.commit()
    except Exception as e:
        with open('/home/pi/stock_data_api/log.dat', 'a') as f:
            f.write("\tError writing to database:\n")
            f.write('\t' + str(e))
            f.write('\n')
        database.rollback()

    db_end = time.time()

    with open('/home/pi/stock_data_api/log.dat', 'a') as f:
        f.write("\tScript Completed. Timing: {}\tAPI Time: {}\tDB Time: {}\n\n".format((db_end - start), (api_end - start), (db_end - db_start)))
