Thursday, April 28, 2016

SSIS - Script Component Transform to TrimToNull Generically.

I have finally hit a problem that I had 10 years ago in SSIS.  The need for something to trim and then null any empty strings generically in SSIS.  Back then we used a compiled C# component, and the heavy use of that component caused headache for years.

Eventually we were able to use a vb script based script task to do the work, but that too has become unavailable in recent versions of SSIS (at least the method we were using).

So I had to whip one up yesterday.  Why? Why does someone feel that they need to do this instead of using a derived column task or manually taking all input columns and managing them individually?

Well, it takes a long time to do that for 100 columns, and I had about 30 minutes to make a fix and not all columns were string etc... etc... (excuses!).

So I did a little research and here is where I have come to this solution:


 //Include   
 using Microsoft.SqlServer.Dts.Pipeline;  
 .......  
 public override void ProcessInput(int InputID, PipelineBuffer Buffer)  
   {  
     while (Buffer.NextRow())  
     {  
       for (int columnIndex = 0;  
    columnIndex < Buffer.ColumnCount;  
    columnIndex++)  
       {  
         //string columnData = null;  
         if (Buffer.IsNull(columnIndex))  
         {  
           //columnData = "is NULL";  
         }  
         else  
         {  
           BufferColumn columnInfo = Buffer.GetColumnInfo(columnIndex);  
           switch (columnInfo.DataType)  
           {  
             case DataType.DT_BOOL:  
               //columnData = Buffer.GetBoolean(columnIndex).ToString();  
               break;  
             case DataType.DT_WSTR:  
               Buffer.SetString(columnIndex, Buffer.GetString(columnIndex).Trim());  
               if (Buffer.GetString(columnIndex) == "")  
               {  
                 Buffer.SetNull(columnIndex);  
               }  
               break;  
             case DataType.DT_STR:  
               //columnData += Buffer.GetString(columnIndex);  
               Buffer.SetString(columnIndex, Buffer.GetString(columnIndex).Trim());  
               if (Buffer.GetString(columnIndex) == "")  
               {  
                 Buffer.SetNull(columnIndex);  
               }  
               break;  
             // add code to support more data types here  
             default:  
              // columnData = "";  
               break;  
           }  
         }  
       }  
     }  
     base.ProcessInput(InputID, Buffer);  
   }  


So how does this code get into your script task?  Basically, you drop in a script component, connect your columns you want to clean as input, and put this script in the script component.

This code takes every single input that is a DT_STR or DT_WSTR and trims the data. If it is empty, it produces a NULL.

Not the best written code, I could have done a bit more, but it gives the mechanics of it.  The implementation requires adding a single include and pasting code into the buffer.  I got the base code a while ago from another source, but I don't remember the source.  I will add it if someone gives me a heads up.

Tuesday, April 12, 2016

Quick Reference For Ranking and Sums with Ranking

/*
Developer: Christopher Harsch

A quick and simple example of ranking and summing functions against a rank function.

Ranking Functions in SQL (some syntax only applies to 2012 sql server)

https://msdn.microsoft.com/en-us/library/ms189798.aspx

-- This shows the use of over clauses in sql which are used below.

https://msdn.microsoft.com/en-us/library/ms189461.aspx

*/

IF Object_id('tempdb..#tmp_rank') IS NOT NULL
  BEGIN
      DROP TABLE #tmp_rank
  END

CREATE TABLE #tmp_rank
  (
     myid   INT,
     value1 VARCHAR(255),
     value2 VARCHAR(255),
        DecimalValue decimal(18,2)
  )

INSERT INTO #tmp_rank
            (myid,
             value1,
             value2,
                      DecimalValue)
SELECT 1,
       'A',
       'B',
          10
UNION ALL
SELECT 1,
       'A',
       'C' ,
          10
UNION ALL
SELECT 1,
       'A',
       'D' ,
          10
UNION ALL
SELECT 1,
       'A',
       'B' ,
          10
UNION ALL
SELECT 2,
       'B',
       'B' ,
          10
UNION ALL
SELECT 2,
       'B',
       'B' ,
          10
UNION ALL
SELECT 1,
       'C',
       'D' ,
          10
UNION ALL
SELECT 1,
       'D',
       'B' ,
          10
UNION ALL
SELECT 1,
       'C',
       'A' ,
          10

IF object_id('tempdb..#tmp_ranks') IS NOT NULL
BEGIN
   DROP TABLE #tmp_ranks
END

SELECT *,
       Rank()
         OVER (
           partition BY tr.value1
           ORDER BY value2) AS myRank        -- Ranks with numbering being numerically continuous IE 1,1,3,4
       ,
       Dense_rank()
         OVER (
           partition BY tr.value1
           ORDER BY value2) AS mydenserank        -- This provides contiguous rank numbering within ranked sets, 1,1,2,3
       ,
       Row_number()
         OVER (
           partition BY tr.value1
           ORDER BY value2) AS myrownum -- Individually numbers
       ,
       Dense_rank()
         OVER (
           partition BY 1
           ORDER BY value1) AS myDenseRankgroupnumber        -- by partitioning by a static number you can use it as a group number.
       ,
       Rank()
         OVER (
           partition BY 1
           ORDER BY value1) AS myRankGroupNumber        -- Ranks with numbering being numerically continuous IE 1,1,3,4 not the best for group numbers
       , sum(DecimalValue) over(partition by myid) as SumForAllInId  -- Lets you sum by a grouping
       , sum(DecimalValue) over(partition by myid,value1) as SumForAllValue1InID -- Lets you sum within groups and sub groups
into #tmp_ranks
-- Individually numbers within a partitioned set.
FROM   #tmp_rank tr
-- This shows how you can use a two step ranking, then summing process.
-- In later versions of SQL this can be moved upward into the above query.
select
*
, SUM(1) over (Partition By myDenseRankgroupnumber) as  TotalInDenseGroup -- Count total records in a group that was created by the dense rank
, SUM(DecimalValue) over (Partition By myDenseRankgroupnumber) as  SumTotalInDenseGroup -- Sum records in a group that was created by the dense rank
from #tmp_ranks


Friday, April 8, 2016

SQL Server 2012 not properly opening .sql files and setting it in default files doesn't work.

Exactly as the title says. Sometimes I like to post things just to get their search rankings higher.  This is the solution I had for the problem where something went wrong in my installation of sql server 2012 and it _WOULD NOT_ properly open sql files by clicking on them directly, even using regular windows assignments.  This post on stack overflow has the best top answer for this issue and it fixed my problem.

http://stackoverflow.com/questions/14564469/opening-sql-files-in-ssms-2012-as-default-program-and-with-existing-instance

Thursday, March 31, 2016

SQL 2012 - SUM syntax in 2012. IE: I need to do more research on the latest SQL Server versions.

So I have written SQL for a long time, and what happens sometimes is that you get entrenched in a specific way of doing things without realizing that new, better, methods can exist.  Not only are these methods new and better, they are also much faster as well (not just easier to write).

I had this happen today when someone mentioned to me that the method of doing month over month sums by year that I was doing were archaic.  This piqued my interest because that is not something I would normally have expected to hear.  

Basically, the problem is this, I have a bunch of data being summed and I want to make sure that for any specific date I have the rolling number of the previous months sums.  So if I had 10 dollars in January, 10 dollars in February, and 10 dollars in March, then by March I have a total of 30 bucks, February 20 bucks, and January is just 10.  Seems pretty straight forward, and this is a totally common task.

What I didn't realize is that in SQL Server 2012 and beyond, the code to accomplish this goal is less obtuse because functionality was added to the SUM syntax.

So I created a small example that you can review:

IF Object_id('tempdb..#tmp') IS NOT NULL 
  
BEGIN 
      
DROP TABLE #tmp 
  
END 

CREATE TABLE #tmp 
  
( 
     
id      INT IDENTITY(1, 1), 
     
mth_nb  INT, 
     
year_nb INT, 
     
value   DECIMAL(16, 2) 
  
) 

INSERT INTO #tmp 
            
(mth_nb, 
             
year_nb, 
             
value) 
SELECT '1'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '2'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '3'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '4'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '5'  MTH_NB, 
       2016 
year_nb, 
       
'10' AS value 
UNION 
SELECT '1'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 
UNION 
SELECT '2'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 
UNION 
SELECT '3'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 
UNION 
SELECT '4'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 
UNION 
SELECT '5'  MTH_NB, 
       2015 
year_nb, 
       
'1'  AS value 

-- The sql server 2008 method 
SELECT ta.*, 
       
tart.runningtotal 
FROM   #tmp ta 
       
OUTER apply (SELECT year_nb, 
                           
Sum(value) AS RunningTotal 
                    
FROM   #tmp tsub 
                    
WHERE  ta.year_nb = tsub.year_nb 
                           
AND ta.mth_nb >= tsub.mth_nb 
                    
GROUP  BY year_nb) tart 
ORDER  BY ta.year_nb, 
          
mth_nb 

-- sql server 2012 method 
SELECT *, 
       
Sum(value) 
         
OVER( 
           
partition BY year_nb 
           
ORDER BY mth_nb) RunningTotal 
FROM   #tmp

The output of both is the correct answer!  But the SQL Server 2012 method is far faster!

id
MTH_NB
year_nb
value
RunningTotal
1
1
2015
1
1
3
2
2015
1
2
5
3
2015
1
3
7
4
2015
1
4
9
5
2015
1
5
2
1
2016
10
10
4
2
2016
10
20
6
3
2016
10
30
8
4
2016
10
40
10
5
2016
10
50


Here is the MSDN article on SUM.

https://msdn.microsoft.com/en-us/library/ms187810.aspx

NOTE: Even though the MSDN article says it starts in 2008, I try to run the new syntax on my  2008 instance and it doesn't work.