In the coding world we often get a fun request. Capture
changes to a table into a log table during bulk ETL updates. When doing
this in SQL the examples I have seen of people doing this can often be clunky,
slow, and inefficient. Recently I had to tackle one of these problems and
the example code that I was given had multiple select statements comparing the
old data and the new data directly from the source and the destination using
complicated joins per column which it then inserted.
This looked incredibly complicated, and
required that any transformations that were done in the SQL be re-applied
during the data comparison. It was not a good method to approach the
issue, and was very slow because for each column it had to do a join against
old data.
Example:
Old Data vs New Data
Select
mycolum,
mynewcolumndata,
from old as current
join new as fromsource
on key = key
Then they inserted all of this into a log
table, and then did the update the table.
This is very slow as every single column
required its own insert and update statement. We don't want to do this.
So what I have done is crafted some code using the OUTPUT clause in SQL
Server to give the ability to capture the direct differences between two
tables.
MSDN OUTPUT CLAUSE:
https://msdn.microsoft.com/en-us/library/ms177564.aspx
Here
is the example that I created.
/*
Developer: Christopher Harsch
Definition: This is an example of how to do change data capture
using unpivot.
This uses a sample table, and updates a record and shows the
before and after results
of the query that could then be later inserted into a per column
logging table.
*/
IF Object_id('tempdb..#tmp_OUTPUT_CHG_DATA_CAPTURE_EXAMPLE') IS NOT NULL
BEGIN
DROP TABLE #tmp_output_chg_data_capture_example
END
-- Create a table
to put some sample data
CREATE TABLE #tmp_output_chg_data_capture_example
(
id INT
IDENTITY(1, 1)
, firstname
VARCHAR(255)
, lastname
VARCHAR(255)
, numberofanimals
INT
, awesomeindicator
CHAR(1)
, awesomeind_type_cd VARCHAR(255)
)
-- Insert some
sample data
INSERT INTO #tmp_output_chg_data_capture_example
(firstname
, lastname
, numberofanimals
, awesomeindicator
, awesomeind_type_cd)
SELECT 'Chris' AS FirstName
, 'Harsch' AS LastName
, '0' AS NUmberOfAnimals
, 'N' AS AwesomeIndicator
, 'NotAwesome' AS AwesomeInd_Type_CD
UNION
SELECT 'Buzz' AS FirstName
, 'LightYear' AS LastName
, '-1' AS NUmberOfAnimals
, 'Y' AS AwesomeIndicator
, 'Awesome' AS AwesomeInd_Type_CD
UNION
SELECT 'Woody' AS FirstName
, 'AndysToy' AS LastName
, '-1' AS NUmberOfAnimals
, 'Y' AS AwesomeIndicator
, 'Awesome' AS AwesomeInd_Type_CD
IF Object_id('tempdb..#tmp_chng_capture')
IS NOT NULL
BEGIN
DROP TABLE #tmp_chng_capture
END
-- See what is in
the sample data.
select * from #tmp_OUTPUT_CHG_DATA_CAPTURE_EXAMPLE
-- Add the columns I want to show are being changed
-- These are the ones you are bringing in, don't capture things
changing if you are
-- not changing them.
-- Note, for this part I am turning things into a static data type
as my method of
-- before
-- and after comparisons require all the columns to have the same
data type,
-- varchar(size) is a good method for this
-- in this case I set the size to 255
CREATE TABLE #tmp_chng_capture
(
id INT
NOT NULL
, numberofanimals
VARCHAR(255)
, numberofanimals_old
VARCHAR(255)
, awesomeindicator
VARCHAR(255)
, awesomeindicator_old
VARCHAR(255)
, awesomeind_type_cd
VARCHAR(255)
, awesomeind_type_cd_old VARCHAR(255)
)
-- Run the update,
and use the OUTPUT clause to get the inserted/updated values
UPDATE
tocdce
SET tocdce.numberofanimals = '0'
, awesomeindicator = 'Y'
, awesomeind_type_cd = 'Awesome'
output inserted.id
, inserted.numberofanimals
, deleted.numberofanimals
, inserted.awesomeindicator
, deleted.awesomeindicator
, inserted.awesomeind_type_cd
, deleted.awesomeind_type_cd
INTO #tmp_chng_capture ( id, numberofanimals, numberofanimals_old,
awesomeindicator,
awesomeindicator_old,
awesomeind_type_cd,
awesomeind_type_cd_old )
FROM #tmp_output_chg_data_capture_example
TOCDCE
WHERE lastname = 'Harsch'
-- Make sure we know woody has a dog (slinky dog)
UPDATE
tocdce
SET tocdce.numberofanimals = '1'
output inserted.id
, inserted.numberofanimals
, deleted.numberofanimals
, inserted.awesomeindicator
, deleted.awesomeindicator
, inserted.awesomeind_type_cd
, deleted.awesomeind_type_cd
INTO #tmp_chng_capture ( id, numberofanimals, numberofanimals_old,
awesomeindicator,
awesomeindicator_old,
awesomeind_type_cd,
awesomeind_type_cd_old )
FROM #tmp_output_chg_data_capture_example
TOCDCE
WHERE lastname = 'AndysToy'
-- lets look at the output of the data changes
SELECT *
FROM #tmp_chng_capture
-- now we want to do the unpivot magic. This unpivot is doing a
double unpivot.
-- This uses a CTE but could easily have used selects into
multiple temp tables for
-- each step
-- Step 1: Get only the data that we care about from the table,
this should only hold
-- columns that we care
-- about unpivoting, everything else is extraneous.
-- Step 2: unpivot the new columns and capture the column names
and the ID numbers
-- for those columns.
-- Step 3: unpivot the old columns and capture the column names
and the id numbers
-- for those columns.
-- You will notice that in the old column set, I made the naming
convention useful so
-- that the columns
-- are just the name of the "new" columns, but with the
name "_old" applied.
-- Step 4: Do a join on ID and column name, and remove the name
"Old" from the old
-- column names so that
-- it joins to the old columns.
This also does a comparison to see if the columns
-- are actually different
-- and it also checks for the NULL condition, IE two null columns
are the same, but
-- will not be equivalent
-- if you do a straight comparison, so this does an AND NOT (BOTH
NULL) clause.
; WITH cte_values_to_unpivot
AS (SELECT id
,
numberofanimals
,
numberofanimals_old
,
awesomeindicator
,
awesomeindicator_old
,
awesomeind_type_cd
,
awesomeind_type_cd_old
FROM #tmp_chng_capture),
cte_unpivoted_new
AS (SELECT id
,
[columnvalue] AS
[New_Value]
,
columnname
AS columnname
FROM cte_values_to_unpivot
UNPIVOT ( columnvalue
FOR columnname IN ( numberofanimals
, awesomeindicator
, awesomeind_type_cd ) ) AS up1),
cte_unpivoted_old
AS (SELECT id
,
[columnvalue] AS
[Old_Value]
,
columnname
AS columnname
FROM cte_values_to_unpivot
UNPIVOT ( columnvalue
FOR columnname IN ( numberofanimals_old
, awesomeindicator_old
, awesomeind_type_cd_old
) ) AS up1)
SELECT cun.id
, cun.columnname
, cun.new_value
, cuo.old_value
FROM cte_unpivoted_new
cun
JOIN cte_unpivoted_old cuo
ON cun.id = cuo.id
AND
cun.columnname = Replace(cuo.columnname, '_old', '') -- Make the column names match
WHERE cun.new_value <> cuo.old_value
AND NOT ( cun.new_value IS NULL
AND
cuo.old_value IS NULL )
-- Catches for if both values are null and wont evaluate