Capturing Stock Market Index Prices into a Local Database

 

Goal

I’m in what will be a very long process of building my own personal database of stock prices and related data. One of the first pieces of data I aim to collect is daily price data for some of the major indices, such as the Dow Jones and the Nasdaq.

This post will go over the first steps of how I'm capturing that data: the API service being used, the SQL Server table structure, and the loading of data from the API call into the database using Python.


Tools and Services Used



API Service Chosen

The data provider I’ve chosen for capturing the market data is hosted on Rapid API. Unfortunately, a lot of stock price providers don't offer index data.

https://rapidapi.com/asepscareer/api/yfinance-stock-market-data/

 

API Call Response Structure 

The call I'm going to be using is the PriceCustomDate method. It takes a symbol, a start date, and an end date defining the range to select. What I want to do is create a table in the SQL database that will store the data from the API call.
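To make the call shape concrete, here is a rough sketch of the request (the URL, payload format, and headers come from the loader script later in this post; the dates and the subscriber key are placeholders):

    import requests as req

    # a sketch of the PriceCustomDate call; %5E is the URL-encoded '^' in ^DJI
    response = req.post(
        "https://yfinance-stock-market-data.p.rapidapi.com/price-customdate",
        data = "symbol=%5EDJI&end=2022-01-10&start=2022-01-01",
        headers = {
            "content-type": "application/x-www-form-urlencoded",
            "X-RapidAPI-Key": "{subscriber_key}",
            "X-RapidAPI-Host": "yfinance-stock-market-data.p.rapidapi.com"
        })
    print(response.status_code)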

 

The response contains a status message, a status code, and a data array. Each of the 82 items in the array holds 7 keys whose values, matching the column order used in the Python script below, are the adjusted close, close, date (a Unix timestamp in milliseconds), high, low, open, and volume:

    "Success"
    200
    [] 82 items
      {} 7 keys
        180.9597320557   (adjusted close)
        182.0099945068   (close)
        1641168000000    (date, Unix ms)
        182.8800048828   (high)
        177.7100067139   (low)
        177.8300018311   (open)
        104487900        (volume)



    Database Table Structure

    For the purposes of simply loading the data, the table uses string column types for all of the incoming data points. The other columns are metadata used for traceability.


    CREATE TABLE [CODE_STORM].[Daily_Price_Load]
    (
        DailyPriceId BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        [index] NVARCHAR(250),
        Ticker NVARCHAR(250),
        Adjusted_Close NVARCHAR(250),
        Close_ NVARCHAR(250),
        Date_ NVARCHAR(250),
        High_ NVARCHAR(250),
        Low_ NVARCHAR(250),
        Open_ NVARCHAR(250),
        Volume_ NVARCHAR(250),
        Price_Date NVARCHAR(250),
        EntityId BIGINT NOT NULL,
        Load_Date DATE NOT NULL,
        Load_Time DATETIME2 NOT NULL,
        Source_ NVARCHAR(500) NOT NULL,
        Source_Dataset NVARCHAR(255) NOT NULL,
        Source_API_Call NVARCHAR(2000) NOT NULL
    )


    Some things to take note of:

    • The schema CODE_STORM is created because that is the name of the service the data comes from. Grouping all tables and procedures associated with that service under the same schema is a simple naming convention that makes those database objects easy to identify.
    • All the incoming data from the API call is stored as strings, which reduces the chance of running into an error when loading the data into the database.
    • This is only the first step of loading the data. It will be converted into the actual data types, like dates and decimals, in another step; a rough sketch of that conversion follows this list.
    • The other columns are for keeping metadata about the data being loaded.
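    To make that later step concrete, here is a minimal sketch of what the type conversion could look like on the Python side, assuming the staged rows have been read back into a pandas dataframe (the function name is illustrative, and the real conversion in the next post may happen in SQL instead):

    import pandas as pd

    def convert_staged_types(staged_frame):
        # illustrative only: cast the staged string columns to real types
        typed = staged_frame.copy()
        for col in ["Adjusted_Close", "Close_", "High_", "Low_", "Open_", "Volume_"]:
            # errors='coerce' turns any bad string into NaN instead of raising
            typed[col] = pd.to_numeric(typed[col], errors = "coerce")
        # Price_Date was staged as a 'YYYY-MM-DD' string
        typed["Price_Date"] = pd.to_datetime(typed["Price_Date"], errors = "coerce")
        return typed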


    Loading Data with Python

    Using Python is a quick and easy way to set up a script that pulls data from the API call and loads it into a SQL database.


    First, I created a Python script to handle the connection to the database. This lives in its own file.


    import sqlalchemy as sa
    import urllib.parse as ur

    def get_sql_engine(db):
        conn_str = ""

        if db == "StockTradingConfig":
            # adjacent string literals are wrapped in parentheses so they
            # concatenate into a single connection string
            conn_str = ("DRIVER={ODBC Driver 17 for SQL Server};"
                        "SERVER={Computer_Name}\\SQLSERVER2019;"
                        "DATABASE=StockTradingConfig;UID={User_Id};PWD={User_Password}")
        elif db == "StockTradingStage":
            conn_str = ("DRIVER={ODBC Driver 17 for SQL Server};"
                        "SERVER={Computer_Name}\\SQLSERVER2019;"
                        "DATABASE=StockTradingStage;UID={User_Id};PWD={User_Password}")

        params = ur.quote_plus(conn_str)
        return sa.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(params))
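
    As a quick sanity check, the engine factory can be exercised on its own, assuming the file above is saved as sql_db.py (the name the loader script below imports) and the connection-string placeholders have been filled in:

    import sql_db as db

    # build an engine for the staging database and confirm it constructs
    engine = db.get_sql_engine("StockTradingStage")
    print(engine)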


    Next is the Python file that makes the API call, does some slight transformation, and loads the result into the SQL database.

    import requests as req
    import pandas as pd
    import sql_db as db
    import datetime
    from datetime import datetime as date
    import time

    # database names
    stage_database = "StockTradingStage"
    config_database = "StockTradingConfig"
    source = "Code Storm"

    # column names that the pandas columns need
    # to be renamed to so they match the database
    transformed_columns = [
        "Adjusted_Close", "Close_", "Date_",
        "High_", "Low_", "Open_",
        "Volume_"
    ]

    # gets market index info (tickers) from the database
    def get_market_index_frame():
        engine = db.get_sql_engine(config_database)
        cmd = "SELECT * FROM MARKET.Index_;"
        frame = pd.read_sql(cmd, con = engine)

        return frame    

    # creates the API call for a ticker and
    # a selection date range
    def get_api_url(ticker, startDate, endDate):
        ticker = ticker.replace('^', '%5E')
        startDate = startDate.strftime('%Y-%m-%d')
        endDate = endDate.strftime('%Y-%m-%d')

        return {
            "url": "https://yfinance-stock-market-data.p.rapidapi.com/price-customdate",
            "payload":f"symbol={ticker}&end={endDate}&start={startDate}",
            "headers": {
                "content-type": "application/x-www-form-urlencoded",
                "X-RapidAPI-Key": "{subscriber_key}",
                "X-RapidAPI-Host": "yfinance-stock-market-data.p.rapidapi.com"            
            }
        }

    # loads index data for all indices in the database
    def load_daily_market_index_data(startDate, endDate):
        index_frame = get_market_index_frame()

        for i, row in index_frame.iterrows():
            entity_id = row['EntityId']
            ticker = row['Ticker']
            print(f"Loading Index Price Data for Ticker: {ticker}")
            load_daily_price_data(entity_id, ticker, startDate, endDate, False)
            # since calls are limited to 5 per minute, the sleep makes sure
            # no extra calls are made
            time.sleep(30)

        return None

    # makes the API call for a specific index and date range,
    # transforms the data, and uploads it into the SQL database
    def load_daily_price_data(entity_id, ticker, startDate, endDate, display = False):
        # get sql database connection
        engine = db.get_sql_engine(stage_database)
        # create api call
        call = get_api_url(ticker, startDate, endDate)

        if display:
            print(f"DB Engine: {engine} \n")
            print(f"API Call: {call} \n")  

        # do api call
        response = req.post(call['url'],
            data = call['payload'], headers = call['headers'])

        if display:
            print(f"Status Code: {response.status_code} \n")

        # transform the json response into a dataframe
        json = response.json()
        frame = pd.DataFrame(json['data'])

        frame.columns = transformed_columns
        frame.insert(0, 'Ticker', f'{ticker}')
        # need to convert unix time (milliseconds) to a date; the first
        # 10 digits of the timestamp are the seconds
        frame['Price_Date'] = frame['Date_'].apply(
            lambda x: datetime.datetime.fromtimestamp(
                int(str(x)[0:10])).strftime('%Y-%m-%d'))
        # these are all meta data for the index data
        frame['EntityId'] = entity_id
        frame['Load_Date'] = date.now()
        frame['Load_Time'] = date.now()
        frame['Source_'] = source
        frame['Source_Dataset'] = 'Price Custom Date'
        frame['Source_API_Call'] = str(call)

        # load into the sql database; the dataframe's index is written
        # to the [index] column of the table
        frame.to_sql(name = 'Daily_Price_Load', con = engine,
            schema = 'CODE_STORM', if_exists = 'append')

        return frame, response

     

    There is one method, 'get_market_index_frame', that is not covered as part of this post. Before I created the database for the stock price data, I created a database for tickers and other configuration information; I will go over that in a future series. For the moment, instead of iterating over the index frame in the 'load_daily_market_index_data' method, it can be changed to iterate over an array of tickers, and the entity_id can be dropped. The entity_id is part of the configuration database and will be covered later. A rough sketch of that simplified version follows.
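    Here is a minimal sketch of that simplified loop, with a hard-coded ticker list standing in for the configuration table (the tickers shown and the placeholder entity_id of 0 are illustrative):

    # illustrative stand-in for load_daily_market_index_data that skips
    # the configuration database entirely
    def load_daily_market_index_data_simple(startDate, endDate):
        for ticker in ['^DJI', '^IXIC', '^GSPC']:  # hypothetical ticker list
            print(f"Loading Index Price Data for Ticker: {ticker}")
            # 0 stands in for entity_id until the config database is covered
            load_daily_price_data(0, ticker, startDate, endDate, False)
            # stay under the 5-calls-per-minute limit
            time.sleep(30)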


    Testing Logic with a JupyterLab Notebook

    JupyterLab is a fantastic tool for data science. It's also great simply for testing Python scripts and classes to get them working.


    First, a section is created for loading in the libraries.

    import importlib as imp
    import pandas as pd
    import CS_Daily_Price_Load as dl
    import datetime
    from datetime import datetime as date
    imp.reload(dl)  # reload the module so script edits are picked up without restarting the kernel


    Next, create the variables needed to pass into the method.

    entity_id, ticker = 1, '^DJI'
    today = date.today()
    end_date = date(today.year, today.month, today.day)
    start_date = end_date - datetime.timedelta(days = 10)


    Next is another section with the method call.

    frame, response = dl.load_daily_price_data(entity_id, ticker,
        start_date, end_date, True)


    Then the call result can be viewed by calling this method on the pandas dataframe:

    frame.head()





    Create an Automated Task to Do the Upload

    The next thing I want to do is create a task that will run the Python script and do the data load automatically.


    First, I'm creating a Python file that will run the logic to do the data load.

    import CS_Daily_Price_Load as pl
    import datetime
    from datetime import datetime as date

    today = date.today()
    end_date = date(today.year, today.month, today.day)
    start_date = end_date - datetime.timedelta(days = 10)

    pl.load_daily_market_index_data(start_date, end_date)


    Next, a .bat file needs to be created to execute the Python script above. Notepad can be used; the file just needs to be saved with a .bat extension. It needs the full file path of where python.exe is and the full path of the Python file that is to be run.

    "C:/Users/{UserName}/AppData/Local/Microsoft/WindowsApps/python3.9.exe" "A:\GIT\Finance\Code_Storm\CS_Price_Run.py"




    Next, in Windows, the Task Scheduler is used for the automation.




    I've set the schedule to daily.




    Then the time it runs. 



    Next, the action "Start a program" is selected.



    Finally, the Program/Script field just needs to be the full file path of the .bat file.
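
    For anyone who prefers the command line over the GUI, the same daily task can be registered with the built-in schtasks utility (a sketch, using the .bat path above and an arbitrary 6:00 PM run time):

    schtasks /Create /TN "CS_Daily_Price_Load" /SC DAILY /ST 18:00 /TR "A:\GIT\Finance\Code_Storm\CS_Price_Run.bat"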





    Summary

    This is one of the first steps in loading stock market index data into a local database that can be used for personal research; my goal is eventually to tie many different sources of data together. This post covers purely uploading the raw data; in the next post I will go over the transformation into the actual data types, for example converting the strings of prices into decimals. There are a few reasons for splitting up the steps: it makes it easier to single out where an error is occurring, and it creates smaller, manageable pieces of logic instead of trying to do everything in one big file.


    Part 2, transforming the loaded data, can be found here.







