Thursday, March 31, 2016

SQL 2012 - SUM syntax in 2012. IE: I need to do more research on the latest SQL Server versions.

So I have written SQL for a long time, and what happens sometimes is that you get entrenched in a specific way of doing things without realizing that new, better, methods can exist.  Not only are these methods new and better, they are also much faster as well (not just easier to write).

I had this happen today when someone mentioned to me that the method of doing month over month sums by year that I was doing were archaic.  This piqued my interest because that is not something I would normally have expected to hear.  

Basically, the problem is this, I have a bunch of data being summed and I want to make sure that for any specific date I have the rolling number of the previous months sums.  So if I had 10 dollars in January, 10 dollars in February, and 10 dollars in March, then by March I have a total of 30 bucks, February 20 bucks, and January is just 10.  Seems pretty straight forward, and this is a totally common task.

What I didn't realize is that in SQL Server 2012 and beyond, the code to accomplish this goal is less obtuse because functionality was added to the SUM syntax.

So I created a small example that you can review:

IF Object_id('tempdb..#tmp') IS NOT NULL 
  
BEGIN 
      
DROP TABLE #tmp 
  
END 

CREATE TABLE #tmp 
  
( 
     
id      INT IDENTITY(1, 1), 
     
mth_nb  INT, 
     
year_nb INT, 
     
value   DECIMAL(16, 2) 
  
) 

INSERT INTO #tmp 
            
(mth_nb, 
             
year_nb, 
             
value) 
SELECT '1'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '2'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '3'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '4'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '5'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '1'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 
UNION 
SELECT '2'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 
UNION 
SELECT '3'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 
UNION 
SELECT '4'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 
UNION 
SELECT '5'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 

-- The sql server 2008 method 
SELECT ta.*, 
       
tart.runningtotal 
FROM   #tmp ta 
       
OUTER apply (SELECT year_nb, 
                           
Sum(value) AS RunningTotal 
                    
FROM   #tmp tsub 
                    
WHERE  ta.year_nb = tsub.year_nb 
                           
AND ta.mth_nb >= tsub.mth_nb 
                    
GROUP  BY year_nb) tart 
ORDER  BY ta.year_nb, 
          
mth_nb 

-- sql server 2012 method 
SELECT *, 
       
Sum(value) 
         
OVER( 
           
partition BY year_nb 
           
ORDER BY mth_nb) RunningTotal 
FROM   #tmp

The output of both is the correct answer!  But the SQL Server 2012 method is far faster!

id
MTH_NB
year_nb
value
RunningTotal
1
1
2015
1
1
3
2
2015
1
2
5
3
2015
1
3
7
4
2015
1
4
9
5
2015
1
5
2
1
2016
10
10
4
2
2016
10
20
6
3
2016
10
30
8
4
2016
10
40
10
5
2016
10
50


Here is the MSDN article on SUM.

https://msdn.microsoft.com/en-us/library/ms187810.aspx

NOTE: Even though the MSDN article says it starts in 2008, I try to run the new syntax on my  2008 instance and it doesn't work.

Wednesday, March 9, 2016

SQL Tutorial: Capturing changes using update statements using OUTPUT clauses

In the coding world we often get a fun request.  Capture changes to a table into a log table during bulk ETL updates.  When doing this in SQL the examples I have seen of people doing this can often be clunky, slow, and inefficient.  Recently I had to tackle one of these problems and the example code that I was given had multiple select statements comparing the old data and the new data directly from the source and the destination using complicated joins per column which it then inserted. 

This looked incredibly complicated, and required that any transformations that were done in the SQL be re-applied during the data comparison.  It was not a good method to approach the issue, and was very slow because for each column it had to do a join against old data.

Example:

Old Data vs New Data

Select
       mycolum,
       mynewcolumndata,
from old as current
join new as fromsource
 on key = key

Then they inserted all of this into a log table, and then did the update the table.

This is very slow as every single column required its own insert and update statement.  We don't want to do this.  So what I have done is crafted some code using the OUTPUT clause in SQL Server to give the ability to capture the direct differences between two tables.

MSDN OUTPUT CLAUSE: https://msdn.microsoft.com/en-us/library/ms177564.aspx

Here is the example that I created.
/*
Developer: Christopher Harsch
Definition: This is an example of how to do change data capture using unpivot.

This uses a sample table, and updates a record and shows the before and after results 
of the query that could then be later inserted into a per column logging table.

*/
IF Object_id('tempdb..#tmp_OUTPUT_CHG_DATA_CAPTURE_EXAMPLE') IS NOT NULL
  BEGIN
      DROP TABLE #tmp_output_chg_data_capture_example
  END

  -- Create a table to put some sample data
CREATE TABLE #tmp_output_chg_data_capture_example
  (
     id                   INT IDENTITY(1, 1)
     , firstname          VARCHAR(255)
     , lastname           VARCHAR(255)
     , numberofanimals    INT
     , awesomeindicator   CHAR(1)
     , awesomeind_type_cd VARCHAR(255)
  )

  -- Insert some sample data
INSERT INTO #tmp_output_chg_data_capture_example
            (firstname
             , lastname
             , numberofanimals
             , awesomeindicator
             , awesomeind_type_cd)
SELECT 'Chris'        AS FirstName
       , 'Harsch'     AS LastName
       , '0'          AS NUmberOfAnimals
       , 'N'          AS AwesomeIndicator
       , 'NotAwesome' AS AwesomeInd_Type_CD
UNION
SELECT 'Buzz'     AS FirstName
       , 'LightYear'   AS LastName
       , '-1'      AS NUmberOfAnimals
       , 'Y'       AS AwesomeIndicator
       , 'Awesome' AS AwesomeInd_Type_CD
UNION
SELECT 'Woody'    AS FirstName
       , 'AndysToy' AS LastName
       , '-1'       AS NUmberOfAnimals
       , 'Y'        AS AwesomeIndicator
       , 'Awesome'  AS AwesomeInd_Type_CD

IF Object_id('tempdb..#tmp_chng_capture') IS NOT NULL
  BEGIN
      DROP TABLE #tmp_chng_capture
  END

  -- See what is in the sample data.
select * from #tmp_OUTPUT_CHG_DATA_CAPTURE_EXAMPLE


-- Add the columns I want to show are being changed
-- These are the ones you are bringing in, don't capture things changing if you are  
-- not changing them.
-- Note, for this part I am turning things into a static data type as my method of 
-- before
-- and after comparisons require all the columns to have the same data type, 
-- varchar(size) is a good method for this
-- in this case I set the size to 255
CREATE TABLE #tmp_chng_capture
  (
     id                       INT NOT NULL
     , numberofanimals        VARCHAR(255)
     , numberofanimals_old    VARCHAR(255)
     , awesomeindicator       VARCHAR(255)
     , awesomeindicator_old   VARCHAR(255)
     , awesomeind_type_cd     VARCHAR(255)
     , awesomeind_type_cd_old VARCHAR(255)
  )

  -- Run the update, and use the OUTPUT clause to get the inserted/updated values
UPDATE tocdce
SET    tocdce.numberofanimals = '0'
       , awesomeindicator = 'Y'
       , awesomeind_type_cd = 'Awesome'
output inserted.id
       , inserted.numberofanimals
       , deleted.numberofanimals
       , inserted.awesomeindicator
       , deleted.awesomeindicator
       , inserted.awesomeind_type_cd
       , deleted.awesomeind_type_cd
INTO #tmp_chng_capture ( id, numberofanimals, numberofanimals_old,
awesomeindicator, awesomeindicator_old, awesomeind_type_cd,
awesomeind_type_cd_old )
FROM   #tmp_output_chg_data_capture_example TOCDCE
WHERE  lastname = 'Harsch'

-- Make sure we know woody has a dog (slinky dog)
UPDATE tocdce
SET    tocdce.numberofanimals = '1'
output inserted.id
       , inserted.numberofanimals
       , deleted.numberofanimals
       , inserted.awesomeindicator
       , deleted.awesomeindicator
       , inserted.awesomeind_type_cd
       , deleted.awesomeind_type_cd
INTO #tmp_chng_capture ( id, numberofanimals, numberofanimals_old,
awesomeindicator, awesomeindicator_old, awesomeind_type_cd,
awesomeind_type_cd_old )
FROM   #tmp_output_chg_data_capture_example TOCDCE
WHERE  lastname = 'AndysToy'

-- lets look at the output of the data changes
SELECT *
FROM   #tmp_chng_capture

-- now we want to do the unpivot magic. This unpivot is doing a double unpivot.
-- This uses a CTE but could easily have used selects into multiple temp tables for 
-- each step
-- Step 1: Get only the data that we care about from the table, this should only hold 
-- columns that we care
-- about unpivoting, everything else is extraneous.
-- Step 2: unpivot the new columns and capture the column names and the ID numbers 
-- for those columns.
-- Step 3: unpivot the old columns and capture the column names and the id numbers 
-- for those columns.
-- You will notice that in the old column set, I made the naming convention useful so 
-- that the columns
-- are just the name of the "new" columns, but with the name "_old" applied.
-- Step 4: Do a join on ID and column name, and remove the name "Old" from the old 
-- column names so that
-- it joins to the old columns.  This also does a comparison to see if the columns  
-- are actually different
-- and it also checks for the NULL condition, IE two null columns are the same, but 
-- will not be equivalent
-- if you do a straight comparison, so this does an AND NOT (BOTH NULL) clause.
; WITH cte_values_to_unpivot
     AS (SELECT id
                , numberofanimals
                , numberofanimals_old
                , awesomeindicator
                , awesomeindicator_old
                , awesomeind_type_cd
                , awesomeind_type_cd_old
         FROM   #tmp_chng_capture),
     cte_unpivoted_new
     AS (SELECT id
                , [columnvalue] AS [New_Value]
                , columnname    AS columnname
         FROM   cte_values_to_unpivot
                UNPIVOT ( columnvalue
                        FOR columnname IN ( numberofanimals
                                            , awesomeindicator
                                            , awesomeind_type_cd ) ) AS up1),
     cte_unpivoted_old
     AS (SELECT id
                , [columnvalue] AS [Old_Value]
                , columnname    AS columnname
         FROM   cte_values_to_unpivot
                UNPIVOT ( columnvalue
                        FOR columnname IN ( numberofanimals_old
                                            , awesomeindicator_old
                                            , awesomeind_type_cd_old ) ) AS up1)
SELECT cun.id
       , cun.columnname
       , cun.new_value
       , cuo.old_value
FROM   cte_unpivoted_new cun
       JOIN cte_unpivoted_old cuo
         ON cun.id = cuo.id
            AND cun.columnname = Replace(cuo.columnname, '_old', '') -- Make the column names match
WHERE  cun.new_value <> cuo.old_value
       AND NOT ( cun.new_value IS NULL
                 AND cuo.old_value IS NULL ) 
-- Catches for if both values are null and wont evaluate