Capturing Stock Market Index Prices into a Local Database

 

Goal

I’m in what will be a very long process of building my own personal database of stock prices and related data. One of the first pieces of data I aim to collect is daily price data for some of the major indices, such as the Dow Jones and the Nasdaq.

This post will go over the first steps of how I'm capturing that data: the API service being used, the SQL Server table structure, and the loading of data from the API call into the database using Python.


Tools and Services Used



API Service Chosen

The data provider I’ve chosen for capturing the market data is hosted on Rapid API. Unfortunately, a lot of stock price providers don't offer index data.

https://rapidapi.com/asepscareer/api/yfinance-stock-market-data/

 

API Call Response Structure 

The call I'm going to be using is the PriceCustomDate method. It takes a symbol, a start date, and an end date defining the range to select. What I want to do is create a table in the SQL database that will store the data from the API call.
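To make the call shape concrete, here is a rough sketch of the request (the URL, payload format, and headers come from the loader script later in this post; the dates and the subscriber key are placeholders):

    import requests as req

    # a sketch of the PriceCustomDate call; %5E is the URL-encoded '^' in ^DJI
    response = req.post(
        "https://yfinance-stock-market-data.p.rapidapi.com/price-customdate",
        data = "symbol=%5EDJI&end=2022-01-10&start=2022-01-01",
        headers = {
            "content-type": "application/x-www-form-urlencoded",
            "X-RapidAPI-Key": "{subscriber_key}",
            "X-RapidAPI-Host": "yfinance-stock-market-data.p.rapidapi.com"
        })
    print(response.status_code)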

 

The response contains a status message, a status code, and a data array. Each of the 82 items in the array holds 7 keys whose values, matching the column order used in the Python script below, are the adjusted close, close, date (a Unix timestamp in milliseconds), high, low, open, and volume:

    "Success"
    200
    [] 82 items
      {} 7 keys
        180.9597320557   (adjusted close)
        182.0099945068   (close)
        1641168000000    (date, Unix ms)
        182.8800048828   (high)
        177.7100067139   (low)
        177.8300018311   (open)
        104487900        (volume)



    Database Table Structure

    For the purposes of simply loading the data, the table uses string column types for all of the incoming data points. The other columns are metadata used for traceability.


    CREATE TABLE [CODE_STORM].[Daily_Price_Load]
    (
        DailyPriceId BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        [index] NVARCHAR(250),
        Ticker NVARCHAR(250),
        Adjusted_Close NVARCHAR(250),
        Close_ NVARCHAR(250),
        Date_ NVARCHAR(250),
        High_ NVARCHAR(250),
        Low_ NVARCHAR(250),
        Open_ NVARCHAR(250),
        Volume_ NVARCHAR(250),
        Price_Date NVARCHAR(250),
        EntityId BIGINT NOT NULL,
        Load_Date DATE NOT NULL,
        Load_Time DATETIME2 NOT NULL,
        Source_ NVARCHAR(500) NOT NULL,
        Source_Dataset NVARCHAR(255) NOT NULL,
        Source_API_Call NVARCHAR(2000) NOT NULL
    )


    Some things to take note of:

    • The schema CODE_STORM is created because that is the name of the service the data comes from. Grouping all tables and procedures associated with that service under the same schema is a simple naming convention that makes those database objects easy to identify.
    • All the incoming data from the API call is stored as strings, which reduces the chance of running into an error when loading the data into the database.
    • This is only the first step of loading the data. It will be converted into the actual data types, like dates and decimals, in another step; a rough sketch of that conversion follows this list.
    • The other columns are for keeping metadata about the data being loaded.
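    To make that later step concrete, here is a minimal sketch of what the type conversion could look like on the Python side, assuming the staged rows have been read back into a pandas dataframe (the function name is illustrative, and the real conversion in the next post may happen in SQL instead):

    import pandas as pd

    def convert_staged_types(staged_frame):
        # illustrative only: cast the staged string columns to real types
        typed = staged_frame.copy()
        for col in ["Adjusted_Close", "Close_", "High_", "Low_", "Open_", "Volume_"]:
            # errors='coerce' turns any bad string into NaN instead of raising
            typed[col] = pd.to_numeric(typed[col], errors = "coerce")
        # Price_Date was staged as a 'YYYY-MM-DD' string
        typed["Price_Date"] = pd.to_datetime(typed["Price_Date"], errors = "coerce")
        return typed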


    Loading Data with Python

    Using Python is a quick and easy way to set up a script that pulls data from the API call and loads it into a SQL database.


    First, I created a Python script to handle the connection to the database. This lives in its own file.


    import sqlalchemy as sa
    import urllib.parse as ur

    def get_sql_engine(db):
        conn_str = ""

        if db == "StockTradingConfig":
            # adjacent string literals are wrapped in parentheses so they
            # concatenate into a single connection string
            conn_str = ("DRIVER={ODBC Driver 17 for SQL Server};"
                        "SERVER={Computer_Name}\\SQLSERVER2019;"
                        "DATABASE=StockTradingConfig;UID={User_Id};PWD={User_Password}")
        elif db == "StockTradingStage":
            conn_str = ("DRIVER={ODBC Driver 17 for SQL Server};"
                        "SERVER={Computer_Name}\\SQLSERVER2019;"
                        "DATABASE=StockTradingStage;UID={User_Id};PWD={User_Password}")

        params = ur.quote_plus(conn_str)
        return sa.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(params))
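
    As a quick sanity check, the engine factory can be exercised on its own, assuming the file above is saved as sql_db.py (the name the loader script below imports) and the connection-string placeholders have been filled in:

    import sql_db as db

    # build an engine for the staging database and confirm it constructs
    engine = db.get_sql_engine("StockTradingStage")
    print(engine)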


    Next is the Python file that makes the API call, does some slight transformation, and loads the result into the SQL database.

    import requests as req
    import pandas as pd
    import sql_db as db
    import datetime
    from datetime import datetime as date
    import time

    # database names
    stage_database = "StockTradingStage"
    config_database = "StockTradingConfig"
    source = "Code Storm"

    # column names that the pandas columns need
    # to be renamed to so they match the database
    transformed_columns = [
        "Adjusted_Close", "Close_", "Date_",
        "High_", "Low_", "Open_",
        "Volume_"
    ]

    # gets market index info (tickers) from the database
    def get_market_index_frame():
        engine = db.get_sql_engine(config_database)
        cmd = "SELECT * FROM MARKET.Index_;"
        frame = pd.read_sql(cmd, con = engine)

        return frame    

    # creates the API call for a ticker and
    # a selection date range
    def get_api_url(ticker, startDate, endDate):
        ticker = ticker.replace('^', '%5E')
        startDate = startDate.strftime('%Y-%m-%d')
        endDate = endDate.strftime('%Y-%m-%d')

        return {
            "url": "https://yfinance-stock-market-data.p.rapidapi.com/price-customdate",
            "payload":f"symbol={ticker}&end={endDate}&start={startDate}",
            "headers": {
                "content-type": "application/x-www-form-urlencoded",
                "X-RapidAPI-Key": "{subscriber_key}",
                "X-RapidAPI-Host": "yfinance-stock-market-data.p.rapidapi.com"            
            }
        }

    # loads index data for all indices in the database
    def load_daily_market_index_data(startDate, endDate):
        index_frame = get_market_index_frame()

        for i, row in index_frame.iterrows():
            entity_id = row['EntityId']
            ticker = row['Ticker']
            print(f"Loading Index Price Data for Ticker: {ticker}")
            load_daily_price_data(entity_id, ticker, startDate, endDate, False)
            # since calls are limited to 5 per minute, the sleep makes sure
            # no extra calls are made
            time.sleep(30)

        return None

    # makes the API call for a specific index and date range,
    # transforms the data, and uploads it into the SQL database
    def load_daily_price_data(entity_id, ticker, startDate, endDate, display = False):
        # get sql database connection
        engine = db.get_sql_engine(stage_database)
        # create api call
        call = get_api_url(ticker, startDate, endDate)

        if display:
            print(f"DB Engine: {engine} \n")
            print(f"API Call: {call} \n")  

        # do api call
        response = req.post(call['url'],
            data = call['payload'], headers = call['headers'])

        if display:
            print(f"Status Code: {response.status_code} \n")

        # transform the json response into a dataframe
        json = response.json()
        frame = pd.DataFrame(json['data'])

        frame.columns = transformed_columns
        frame.insert(0, 'Ticker', f'{ticker}')
        # need to convert unix time (milliseconds) to a date; the first
        # 10 digits of the timestamp are the seconds
        frame['Price_Date'] = frame['Date_'].apply(
            lambda x: datetime.datetime.fromtimestamp(
                int(str(x)[0:10])).strftime('%Y-%m-%d'))
        # these are all meta data for the index data
        frame['EntityId'] = entity_id
        frame['Load_Date'] = date.now()
        frame['Load_Time'] = date.now()
        frame['Source_'] = source
        frame['Source_Dataset'] = 'Price Custom Date'
        frame['Source_API_Call'] = str(call)

        # load into the sql database; the dataframe's index is written
        # to the [index] column of the table
        frame.to_sql(name = 'Daily_Price_Load', con = engine,
            schema = 'CODE_STORM', if_exists = 'append')

        return frame, response

     

    There is one method, 'get_market_index_frame', that is not covered as part of this post. Before I created the database for the stock price data, I created a database for tickers and other configuration information; I will go over that in a future series. For the moment, instead of iterating over the index frame in the 'load_daily_market_index_data' method, it can be changed to iterate over an array of tickers, and the entity_id can be dropped. The entity_id is part of the configuration database and will be covered later. A rough sketch of that simplified version follows.
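    Here is a minimal sketch of that simplified loop, with a hard-coded ticker list standing in for the configuration table (the tickers shown and the placeholder entity_id of 0 are illustrative):

    # illustrative stand-in for load_daily_market_index_data that skips
    # the configuration database entirely
    def load_daily_market_index_data_simple(startDate, endDate):
        for ticker in ['^DJI', '^IXIC', '^GSPC']:  # hypothetical ticker list
            print(f"Loading Index Price Data for Ticker: {ticker}")
            # 0 stands in for entity_id until the config database is covered
            load_daily_price_data(0, ticker, startDate, endDate, False)
            # stay under the 5-calls-per-minute limit
            time.sleep(30)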


    Testing Logic with a JupyterLab Notebook

    JupyterLab is a fantastic tool for data science. It's also great simply for testing Python scripts and classes to get them working.


    First, a section is created for loading in the libraries.

    import importlib as imp
    import pandas as pd
    import CS_Daily_Price_Load as dl
    import datetime
    from datetime import datetime as date
    imp.reload(dl)  # reload the module so script edits are picked up without restarting the kernel


    Next, create the variables needed to pass into the method.

    entity_id, ticker = 1, '^DJI'
    today = date.today()
    end_date = date(today.year, today.month, today.day)
    start_date = end_date - datetime.timedelta(days = 10)


    Next is another section with the method call.

    frame, response = dl.load_daily_price_data(entity_id, ticker,
        start_date, end_date, True)


    Then the call result can be viewed by calling this method on the pandas dataframe:

    frame.head()





    Create an Automated Task to Do the Upload

    The next thing I want to do is create a task that will run the Python script and do the data load automatically.


    First, I'm creating a Python file that will run the logic to do the data load.

    import CS_Daily_Price_Load as pl
    import datetime
    from datetime import datetime as date

    today = date.today()
    end_date = date(today.year, today.month, today.day)
    start_date = end_date - datetime.timedelta(days = 10)

    pl.load_daily_market_index_data(start_date, end_date)


    Next, a .bat file needs to be created to execute the Python script above. Notepad can be used; the file just needs to be saved with a .bat extension. It needs the full file path of where python.exe is and the full path of the Python file that is to be run.

    "C:/Users/{UserName}/AppData/Local/Microsoft/WindowsApps/python3.9.exe" "A:\GIT\Finance\Code_Storm\CS_Price_Run.py"




    Next, in Windows, the Task Scheduler is used for the automation.




    I've set the schedule to daily.




    Then the time it runs. 



    Next, the action "Start a program" is selected.



    Finally, the Program/Script field just needs to be the full file path of the .bat file.
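
    For anyone who prefers the command line over the GUI, the same daily task can be registered with the built-in schtasks utility (a sketch, using the .bat path above and an arbitrary 6:00 PM run time):

    schtasks /Create /TN "CS_Daily_Price_Load" /SC DAILY /ST 18:00 /TR "A:\GIT\Finance\Code_Storm\CS_Price_Run.bat"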





    Summary

    This is one of the first steps in loading stock market index data into a local database that can be used for personal research; my goal is eventually to tie many different sources of data together. This post covers purely uploading the raw data; in the next post I will go over the transformation into the actual data types, for example converting the strings of prices into decimals. There are a few reasons for splitting up the steps: it makes it easier to single out where an error is occurring, and it creates smaller, manageable pieces of logic instead of trying to do everything in one big file.


    Part 2, transforming the loaded data, can be found here.







