The Mobile Field Base: SQL Tutorial: Capturing changes using update statements using OUTPUT clauses

In the coding world we often get a fun request. Capture changes to a table into a log table during bulk ETL updates. When doing this in SQL the examples I have seen of people doing this can often be clunky, slow, and inefficient. Recently I had to tackle one of these problems and the example code that I was given had multiple select statements comparing the old data and the new data directly from the source and the destination using complicated joins per column which it then inserted.

This looked incredibly complicated, and required that any transformations that were done in the SQL be re-applied during the data comparison. It was not a good method to approach the issue, and was very slow because for each column it had to do a join against old data.

Example:

Old Data vs New Data

Select

mycolum,

mynewcolumndata,

from old as current

join new as fromsource

on key = key

Then they inserted all of this into a log table, and then did the update the table.

This is very slow as every single column required its own insert and update statement. We don't want to do this. So what I have done is crafted some code using the OUTPUT clause in SQL Server to give the ability to capture the direct differences between two tables.

MSDN OUTPUT CLAUSE: https://msdn.microsoft.com/en-us/library/ms177564.aspx

Here is the example that I created.

Developer: Christopher Harsch

Definition: This is an example of how to do change data capture using unpivot.

This uses a sample table, and updates a record and shows the before and after results

of the query that could then be later inserted into a per column logging table.

IF Object_id('tempdb..#tmp_OUTPUT_CHG_DATA_CAPTURE_EXAMPLE') IS NOT NULL

BEGIN

DROP TABLE #tmp_output_chg_data_capture_example

END

-- Create a table to put some sample data

CREATE TABLE #tmp_output_chg_data_capture_example

(

id INT IDENTITY(1, 1)

, firstname VARCHAR(255)

, lastname VARCHAR(255)

, numberofanimals INT

, awesomeindicator CHAR(1)

, awesomeind_type_cd VARCHAR(255)

)

-- Insert some sample data

INSERT INTO #tmp_output_chg_data_capture_example

(firstname

, lastname

, numberofanimals

, awesomeindicator

, awesomeind_type_cd)

SELECT 'Chris' AS FirstName

, 'Harsch' AS LastName

, '0' AS NUmberOfAnimals

, 'N' AS AwesomeIndicator

, 'NotAwesome' AS AwesomeInd_Type_CD

UNION

SELECT 'Buzz' AS FirstName

, 'LightYear' AS LastName

, '-1' AS NUmberOfAnimals

, 'Y' AS AwesomeIndicator

, 'Awesome' AS AwesomeInd_Type_CD

UNION

SELECT 'Woody' AS FirstName

, 'AndysToy' AS LastName

, '-1' AS NUmberOfAnimals

, 'Y' AS AwesomeIndicator

, 'Awesome' AS AwesomeInd_Type_CD

IF Object_id('tempdb..#tmp_chng_capture') IS NOT NULL

BEGIN

DROP TABLE #tmp_chng_capture

END

-- See what is in the sample data.

select * from #tmp_OUTPUT_CHG_DATA_CAPTURE_EXAMPLE

-- Add the columns I want to show are being changed

-- These are the ones you are bringing in, don't capture things changing if you are

-- not changing them.

-- Note, for this part I am turning things into a static data type as my method of

-- before

-- and after comparisons require all the columns to have the same data type,

-- varchar(size) is a good method for this

-- in this case I set the size to 255

CREATE TABLE #tmp_chng_capture

(

id INT NOT NULL

, numberofanimals VARCHAR(255)

, numberofanimals_old VARCHAR(255)

, awesomeindicator VARCHAR(255)

, awesomeindicator_old VARCHAR(255)

, awesomeind_type_cd VARCHAR(255)

, awesomeind_type_cd_old VARCHAR(255)

)

-- Run the update, and use the OUTPUT clause to get the inserted/updated values

UPDATE tocdce

SET tocdce.numberofanimals = '0'

, awesomeindicator = 'Y'

, awesomeind_type_cd = 'Awesome'

output inserted.id

, inserted.numberofanimals

, deleted.numberofanimals

, inserted.awesomeindicator

, deleted.awesomeindicator

, inserted.awesomeind_type_cd

, deleted.awesomeind_type_cd

INTO #tmp_chng_capture ( id, numberofanimals, numberofanimals_old,

awesomeindicator, awesomeindicator_old, awesomeind_type_cd,

awesomeind_type_cd_old )

FROM #tmp_output_chg_data_capture_example TOCDCE

WHERE lastname = 'Harsch'

-- Make sure we know woody has a dog (slinky dog)

UPDATE tocdce

SET tocdce.numberofanimals = '1'

output inserted.id

, inserted.numberofanimals

, deleted.numberofanimals

, inserted.awesomeindicator

, deleted.awesomeindicator

, inserted.awesomeind_type_cd

, deleted.awesomeind_type_cd

INTO #tmp_chng_capture ( id, numberofanimals, numberofanimals_old,

awesomeindicator, awesomeindicator_old, awesomeind_type_cd,

awesomeind_type_cd_old )

FROM #tmp_output_chg_data_capture_example TOCDCE

WHERE lastname = 'AndysToy'

-- lets look at the output of the data changes

SELECT *

FROM #tmp_chng_capture

-- now we want to do the unpivot magic. This unpivot is doing a double unpivot.

-- This uses a CTE but could easily have used selects into multiple temp tables for

-- each step

-- Step 1: Get only the data that we care about from the table, this should only hold

-- columns that we care

-- about unpivoting, everything else is extraneous.

-- Step 2: unpivot the new columns and capture the column names and the ID numbers

-- for those columns.

-- Step 3: unpivot the old columns and capture the column names and the id numbers

-- for those columns.

-- You will notice that in the old column set, I made the naming convention useful so

-- that the columns

-- are just the name of the "new" columns, but with the name "_old" applied.

-- Step 4: Do a join on ID and column name, and remove the name "Old" from the old

-- column names so that

-- it joins to the old columns. This also does a comparison to see if the columns

-- are actually different

-- and it also checks for the NULL condition, IE two null columns are the same, but

-- will not be equivalent

-- if you do a straight comparison, so this does an AND NOT (BOTH NULL) clause.

; WITH cte_values_to_unpivot

AS (SELECT id

, numberofanimals

, numberofanimals_old

, awesomeindicator

, awesomeindicator_old

, awesomeind_type_cd

, awesomeind_type_cd_old

FROM #tmp_chng_capture),

cte_unpivoted_new

AS (SELECT id

, [columnvalue] AS [New_Value]

, columnname AS columnname

FROM cte_values_to_unpivot

UNPIVOT ( columnvalue

FOR columnname IN ( numberofanimals

, awesomeindicator

, awesomeind_type_cd ) ) AS up1),

cte_unpivoted_old

AS (SELECT id

, [columnvalue] AS [Old_Value]

, columnname AS columnname

FROM cte_values_to_unpivot

UNPIVOT ( columnvalue

FOR columnname IN ( numberofanimals_old

, awesomeindicator_old

, awesomeind_type_cd_old ) ) AS up1)

SELECT cun.id

, cun.columnname

, cun.new_value

, cuo.old_value

FROM cte_unpivoted_new cun

JOIN cte_unpivoted_old cuo

ON cun.id = cuo.id

AND cun.columnname = Replace(cuo.columnname, '_old', '') -- Make the column names match

WHERE cun.new_value <> cuo.old_value

AND NOT ( cun.new_value IS NULL

AND cuo.old_value IS NULL )

-- Catches for if both values are null and wont evaluate

The Mobile Field Base

Wednesday, March 9, 2016

SQL Tutorial: Capturing changes using update statements using OUTPUT clauses

No comments:

Post a Comment