Transforming Captured Stock Market Index Data
Goal
In the first part, located here, stock index data was uploaded into a SQL Server database. All of the incoming data captured from the API call was stored as strings. In this update, that data is transformed into its proper data types.
Tools Used
- SQL Server 2019
- SQL Server Database Project
- SQL Server Management Studio 18 (SSMS)
Loaded Data Table
Below is the table definition for the API call result. All of the incoming data is stored as strings, with the exception of the metadata columns. The 'Load' suffix at the end of the table name is a convention that makes the table's purpose easily identifiable.
CREATE TABLE [CODE_STORM].[Daily_Price_Load]
(
DailyPriceId BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
[index] NVARCHAR(250),
Ticker NVARCHAR(250),
Adjusted_Close NVARCHAR(250),
Close_ NVARCHAR(250),
Date_ NVARCHAR(250),
High_ NVARCHAR(250),
Low_ NVARCHAR(250),
Open_ NVARCHAR(250),
Volume_ NVARCHAR(250),
Price_Date NVARCHAR(250),
EntityId BIGINT NOT NULL,
Load_Date DATE NOT NULL,
Load_Time DATETIME2 NOT NULL,
Source_ NVARCHAR(500) NOT NULL,
Source_Dataset NVARCHAR(255) NOT NULL,
Source_API_Call NVARCHAR(2000) NOT NULL
)
Source Data Table
Below is the definition of the table the loaded data will be transformed into. The 'Source' suffix at the end of the table name is a convention that makes the table's purpose easily identifiable.
CREATE TABLE [CODE_STORM].[Daily_Price_Source]
(
DailyPriceSourceId BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
EntityId BIGINT NOT NULL,
HashKey VARBINARY(64) NOT NULL,
Ticker NVARCHAR(10) NOT NULL,
Price_Date DATE NOT NULL,
Unix_DateTime BIGINT NOT NULL,
Open_ DECIMAL(30,8) NOT NULL,
High_ DECIMAL(30,8) NOT NULL,
Low_ DECIMAL(30,8) NOT NULL,
Close_ DECIMAL(30,8) NOT NULL,
Adjusted_Close DECIMAL(30,8) NOT NULL,
Volume_ BIGINT NOT NULL,
Load_Date DATE NOT NULL,
Load_Time DATETIME2 NOT NULL,
Modified_Date DATE NOT NULL,
Modified_Time DATETIME2 NOT NULL,
Source_ NVARCHAR(500) NOT NULL,
Source_Dataset NVARCHAR(255) NOT NULL,
Source_API_Call NVARCHAR(2000) NOT NULL,
CONSTRAINT uq_cs_daily_price UNIQUE(Ticker, Price_Date)
)
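Once the transformation below is in place and the source table is populated, the data can be queried directly for analysis. As a minimal sketch, using only the columns defined above, the query below pulls the most recent price row for each ticker:
-- Latest close and adjusted close for each ticker
SELECT S.Ticker, S.Price_Date, S.Close_, S.Adjusted_Close, S.Volume_
FROM CODE_STORM.Daily_Price_Source S
INNER JOIN (
    SELECT Ticker, MAX(Price_Date) AS Max_Price_Date
    FROM CODE_STORM.Daily_Price_Source
    GROUP BY Ticker
) M ON M.Ticker = S.Ticker
   AND M.Max_Price_Date = S.Price_Date;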
Transformation Procedure
The transformation does five things.
- Because loaded data is appended, duplicates are likely. The first step deletes every duplicate record except the most recently loaded one (by load time).
- The next step creates a temp table to store the loaded data.
- The third step transforms the column data to match the source table column data types: prices become decimals, the price date becomes a date, and so on.
- The fourth step either inserts new data into the destination source table or updates the existing record if one is found.
- The final step deletes the processed records from the load table. The source table holds the data that needs to be kept; once transformed, the load data no longer needs to be stored.
CREATE PROCEDURE [CODE_STORM].[sp_Daily_Price_Transform]
@display BIT = 0
AS
BEGIN
SET NOCOUNT ON;
-- DATES
DECLARE @date_ DATE = GETDATE();
DECLARE @time_ DATETIME2 = GETDATE();
/* DELETE DUPES */
DROP TABLE IF EXISTS #dupes;
CREATE TABLE #dupes(
Ticker NVARCHAR(250),
Price_Date NVARCHAR(250),
Count_ INT,
IDX INT IDENTITY(1,1)
);
INSERT INTO #dupes(
Ticker, Price_Date, Count_
)
SELECT
DPL.Ticker, DPL.Price_Date, COUNT(DPL.DailyPriceId)
FROM CODE_STORM.Daily_Price_Load DPL
GROUP BY DPL.Ticker, DPL.Price_Date
HAVING COUNT(DPL.DailyPriceId) > 1;
IF @display = 1
BEGIN
SELECT *
FROM #dupes;
END
DECLARE @row_count INT = (SELECT MAX(IDX) FROM #dupes);
DECLARE @curr INT = 1;
DECLARE @a_row_count INT, @a_curr INT;
DECLARE @ticker NVARCHAR(250), @price_date DATE, @count_ INT, @load_id BIGINT;
WHILE @curr <= @row_count
BEGIN
SELECT
@ticker = D.Ticker,
@price_date = D.Price_Date,
@count_ = D.Count_
FROM #dupes D
WHERE D.IDX = @curr;
SET @a_curr = 1;
WHILE @a_curr < @count_
BEGIN
SELECT TOP 1
@load_id = DPL.DailyPriceId
FROM CODE_STORM.Daily_Price_Load DPL
WHERE DPL.Ticker = @ticker
AND DPL.Price_Date = @price_date
ORDER BY DPL.Load_Time ASC;
DELETE
FROM CODE_STORM.Daily_Price_Load
WHERE DailyPriceId = @load_id;
SET @a_curr += 1;
END
SET @curr += 1;
END
/* END DELETE DUPES */
/* GET LOADED DATA */
DROP TABLE IF EXISTS #load_data;
CREATE TABLE #load_data(
DailyPriceId BIGINT,
[index] NVARCHAR(250),
Ticker NVARCHAR(250),
Adjusted_Close NVARCHAR(250),
Close_ NVARCHAR(250),
Date_ NVARCHAR(250),
High_ NVARCHAR(250),
Low_ NVARCHAR(250),
Open_ NVARCHAR(250),
Volume_ NVARCHAR(250),
Price_Date NVARCHAR(250),
EntityId BIGINT NOT NULL,
Load_Date DATE NOT NULL,
Load_Time DATETIME2 NOT NULL,
Source_ NVARCHAR(500) NOT NULL,
Source_Dataset NVARCHAR(255) NOT NULL,
Source_API_Call NVARCHAR(2000) NOT NULL
);
INSERT INTO #load_data(
DailyPriceId, [index], Ticker,
Adjusted_Close, Close_,
Date_, High_, Low_,
Open_, Volume_, Price_Date,
EntityId, Load_Date, Load_Time,
Source_, Source_Dataset, Source_API_Call
)
SELECT
DPL.DailyPriceId, DPL.[index], DPL.Ticker,
DPL.Adjusted_Close, DPL.Close_,
DPL.Date_, DPL.High_, DPL.Low_,
DPL.Open_, DPL.Volume_, DPL.Price_Date,
DPL.EntityId, DPL.Load_Date, DPL.Load_Time,
DPL.Source_, DPL.Source_Dataset, DPL.Source_API_Call
FROM CODE_STORM.Daily_Price_Load DPL;
IF @display = 1
BEGIN
SELECT *
FROM #load_data;
END
/* END LOADED DATA */
/* TRANSFORM DATA */
DROP TABLE IF EXISTS #load_transform;
CREATE TABLE #load_transform(
EntityId BIGINT NOT NULL,
HashKey VARBINARY(64) NOT NULL,
Ticker NVARCHAR(10) NOT NULL,
Price_Date DATE NOT NULL,
Unix_DateTime BIGINT NOT NULL,
Open_ DECIMAL(30,8) NOT NULL,
High_ DECIMAL(30,8) NOT NULL,
Low_ DECIMAL(30,8) NOT NULL,
Close_ DECIMAL(30,8) NOT NULL,
Adjusted_Close DECIMAL(30,8) NOT NULL,
Volume_ BIGINT NOT NULL,
Load_Date DATE NOT NULL,
Load_Time DATETIME2 NOT NULL,
Modified_Date DATE NOT NULL,
Modified_Time DATETIME2 NOT NULL,
Source_ NVARCHAR(500) NOT NULL,
Source_Dataset NVARCHAR(255) NOT NULL,
Source_API_Call NVARCHAR(2000) NOT NULL
);
INSERT INTO #load_transform(
EntityId, HashKey, Ticker,
Price_Date, Unix_DateTime, Open_,
High_, Low_, Close_,
Adjusted_Close, Volume_,
Load_Date, Load_Time,
Modified_Date, Modified_Time,
Source_, Source_Dataset, Source_API_Call
)
SELECT
LD.EntityId, 0x0, LD.Ticker,
CASE
WHEN TRY_CAST(LD.Price_Date AS DATE) IS NOT NULL THEN CONVERT(DATE, LD.Price_Date)
ELSE '0001-01-01'
END,
CASE
WHEN TRY_CAST(LD.Date_ AS BIGINT) IS NOT NULL THEN CONVERT(BIGINT, SUBSTRING(LD.Date_, 1, 10))
ELSE 0
END,
CASE
WHEN TRY_CAST(LD.Open_ AS DECIMAL) IS NOT NULL THEN CONVERT(DECIMAL(30,8), LD.Open_)
ELSE 0
END,
CASE
WHEN TRY_CAST(LD.High_ AS DECIMAL) IS NOT NULL THEN CONVERT(DECIMAL(30,8), LD.High_)
ELSE 0
END,
CASE
WHEN TRY_CAST(LD.Low_ AS DECIMAL) IS NOT NULL THEN CONVERT(DECIMAL(30,8), LD.Low_)
ELSE 0
END,
CASE
WHEN TRY_CAST(LD.Close_ AS DECIMAL) IS NOT NULL THEN CONVERT(DECIMAL(30,8), LD.Close_)
ELSE 0
END,
CASE
WHEN TRY_CAST(LD.Adjusted_Close AS DECIMAL) IS NOT NULL THEN CONVERT(DECIMAL(30,8), LD.Adjusted_Close)
ELSE 0
END,
CASE
WHEN TRY_CAST(LD.Volume_ AS BIGINT) IS NOT NULL THEN CONVERT(BIGINT, LD.Volume_)
ELSE 0
END,
LD.Load_Date, LD.Load_Time,
@date_, @time_,
LD.Source_, LD.Source_Dataset, LD.Source_API_Call
FROM #load_data LD;
-- CREATE HASHKEY
UPDATE #load_transform
SET HashKey = HASHBYTES('SHA2_256', CONCAT(
EntityId, Ticker, Price_Date, Unix_DateTime, Open_, High_, Low_,
Close_, Adjusted_Close, Volume_
));
IF @display = 1
BEGIN
SELECT *
FROM #load_transform;
END
/* END TRANSFORM */
/* LOAD INTO SOURCE TABLE */
MERGE CODE_STORM.Daily_Price_Source AS tgt
USING #load_transform AS src ON (
tgt.Ticker = src.Ticker
AND tgt.Price_Date = src.Price_Date
)
WHEN MATCHED AND tgt.HashKey != src.HashKey THEN UPDATE SET
tgt.EntityId = src.EntityId,
tgt.HashKey = src.HashKey,
tgt.Open_ = src.Open_,
tgt.High_ = src.High_,
tgt.Low_ = src.Low_,
tgt.Close_ = src.Close_,
tgt.Adjusted_Close = src.Adjusted_Close,
tgt.Volume_ = src.Volume_,
tgt.Modified_Date = src.Modified_Date,
tgt.Modified_Time = src.Modified_Time
WHEN NOT MATCHED THEN INSERT(
EntityId, HashKey, Ticker,
Price_Date, Unix_DateTime, Open_,
High_, Low_, Close_,
Adjusted_Close, Volume_,
Load_Date, Load_Time,
Modified_Date, Modified_Time,
Source_, Source_Dataset, Source_API_Call
)
VALUES (
src.EntityId, src.HashKey, src.Ticker,
src.Price_Date, src.Unix_DateTime, src.Open_,
src.High_, src.Low_, src.Close_,
src.Adjusted_Close, src.Volume_,
src.Load_Date, src.Load_Time,
src.Modified_Date, src.Modified_Time,
src.Source_, src.Source_Dataset, src.Source_API_Call
);
/* END LOAD INTO SOURCE TABLE */
-- DELETE PROCESSED RECORDS FROM LOAD TABLE
DELETE DPL
FROM #load_data LD
INNER JOIN CODE_STORM.Daily_Price_Load DPL ON DPL.DailyPriceId = LD.DailyPriceId;
IF @display = 1
BEGIN
SELECT S.*
FROM CODE_STORM.Daily_Price_Source S
INNER JOIN #load_transform T ON T.HashKey = S.HashKey;
END
-- CLEAN UP
DROP TABLE IF EXISTS #dupes;
DROP TABLE IF EXISTS #load_data;
DROP TABLE IF EXISTS #load_transform;
END
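To run the transformation manually and preview the intermediate temp table results, execute the procedure with the @display flag turned on:
EXEC [CODE_STORM].[sp_Daily_Price_Transform] @display = 1;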
Automate Transformation with SQL Job
The last thing to do is automate the transformation.
In SQL Server Management Studio (SSMS), the Object Explorer pane on the left contains the SQL Server Agent node. Expand it to find the 'Jobs' folder, then right-click it and create a new job.
On the Steps page, click New to create a new job step. Give the step a name and add the procedure call in the command window.
Finally, select Schedules and create a new schedule. Schedules can be set up daily, weekly, and so on; the SQL Server Agent job system has plenty of options.
Click OK to close out of everything and the job is done.
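The same job can also be created with T-SQL instead of the SSMS dialogs. The sketch below uses the msdb job procedures; the job name, database name, and schedule time are assumptions, so adjust them to match your environment:
USE msdb;
GO

-- Create the job (the name here is an assumption; use whatever fits your naming convention)
EXEC dbo.sp_add_job
    @job_name = N'CODE_STORM Daily Price Transform';

-- Single step that calls the transformation procedure
-- (@database_name is a placeholder for the database holding the CODE_STORM schema)
EXEC dbo.sp_add_jobstep
    @job_name = N'CODE_STORM Daily Price Transform',
    @step_name = N'Run Daily Price Transform',
    @subsystem = N'TSQL',
    @database_name = N'StockData',
    @command = N'EXEC [CODE_STORM].[sp_Daily_Price_Transform];';

-- Run every day at 6:00 PM (freq_type 4 = daily, freq_interval 1 = every day)
EXEC dbo.sp_add_jobschedule
    @job_name = N'CODE_STORM Daily Price Transform',
    @name = N'Daily Transform Schedule',
    @freq_type = 4,
    @freq_interval = 1,
    @active_start_time = 180000;

-- Target the local server so the job actually runs
EXEC dbo.sp_add_jobserver
    @job_name = N'CODE_STORM Daily Price Transform',
    @server_name = N'(LOCAL)';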
Summary
With part one and now this part in place, stock market index data is being automatically loaded and transformed into a state where analysis can begin. This is one of the first pieces I have planned. Next are ETFs, stock prices, related data, and more.