Thursday, April 28, 2016

SSIS - Script Component Transform to TrimToNull Generically.

I have finally hit a problem that I had 10 years ago in SSIS.  The need for something to trim and then null any empty strings generically in SSIS.  Back then we used a compiled C# component, and the heavy use of that component caused headache for years.

Eventually we were able to use a vb script based script task to do the work, but that too has become unavailable in recent versions of SSIS (at least the method we were using).

So I had to whip one up yesterday.  Why? Why does someone feel that they need to do this instead of using a derived column task or manually taking all input columns and managing them individually?

Well, it takes a long time to do that for 100 columns, and I had about 30 minutes to make a fix and not all columns were string etc... etc... (excuses!).

So I did a little research and here is where I have come to this solution:


 //Include   
 using Microsoft.SqlServer.Dts.Pipeline;  
 .......  
 public override void ProcessInput(int InputID, PipelineBuffer Buffer)  
   {  
     while (Buffer.NextRow())  
     {  
       for (int columnIndex = 0;  
    columnIndex < Buffer.ColumnCount;  
    columnIndex++)  
       {  
         //string columnData = null;  
         if (Buffer.IsNull(columnIndex))  
         {  
           //columnData = "is NULL";  
         }  
         else  
         {  
           BufferColumn columnInfo = Buffer.GetColumnInfo(columnIndex);  
           switch (columnInfo.DataType)  
           {  
             case DataType.DT_BOOL:  
               //columnData = Buffer.GetBoolean(columnIndex).ToString();  
               break;  
             case DataType.DT_WSTR:  
               Buffer.SetString(columnIndex, Buffer.GetString(columnIndex).Trim());  
               if (Buffer.GetString(columnIndex) == "")  
               {  
                 Buffer.SetNull(columnIndex);  
               }  
               break;  
             case DataType.DT_STR:  
               //columnData += Buffer.GetString(columnIndex);  
               Buffer.SetString(columnIndex, Buffer.GetString(columnIndex).Trim());  
               if (Buffer.GetString(columnIndex) == "")  
               {  
                 Buffer.SetNull(columnIndex);  
               }  
               break;  
             // add code to support more data types here  
             default:  
              // columnData = "";  
               break;  
           }  
         }  
       }  
     }  
     base.ProcessInput(InputID, Buffer);  
   }  


So how does this code get into your script task?  Basically, you drop in a script component, connect your columns you want to clean as input, and put this script in the script component.

This code takes every single input that is a DT_STR or DT_WSTR and trims the data. If it is empty, it produces a NULL.

Not the best written code, I could have done a bit more, but it gives the mechanics of it.  The implementation requires adding a single include and pasting code into the buffer.  I got the base code a while ago from another source, but I don't remember the source.  I will add it if someone gives me a heads up.

Tuesday, April 12, 2016

Quick Reference For Ranking and Sums with Ranking

/*
Developer: Christopher Harsch

A quick and simple example of ranking and summing functions against a rank function.

Ranking Functions in SQL (some syntax only applies to 2012 sql server)

https://msdn.microsoft.com/en-us/library/ms189798.aspx

-- This shows the use of over clauses in sql which are used below.

https://msdn.microsoft.com/en-us/library/ms189461.aspx

*/

IF Object_id('tempdb..#tmp_rank') IS NOT NULL
  BEGIN
      DROP TABLE #tmp_rank
  END

CREATE TABLE #tmp_rank
  (
     myid   INT,
     value1 VARCHAR(255),
     value2 VARCHAR(255),
        DecimalValue decimal(18,2)
  )

INSERT INTO #tmp_rank
            (myid,
             value1,
             value2,
                      DecimalValue)
SELECT 1,
       'A',
       'B',
          10
UNION ALL
SELECT 1,
       'A',
       'C' ,
          10
UNION ALL
SELECT 1,
       'A',
       'D' ,
          10
UNION ALL
SELECT 1,
       'A',
       'B' ,
          10
UNION ALL
SELECT 2,
       'B',
       'B' ,
          10
UNION ALL
SELECT 2,
       'B',
       'B' ,
          10
UNION ALL
SELECT 1,
       'C',
       'D' ,
          10
UNION ALL
SELECT 1,
       'D',
       'B' ,
          10
UNION ALL
SELECT 1,
       'C',
       'A' ,
          10

IF object_id('tempdb..#tmp_ranks') IS NOT NULL
BEGIN
   DROP TABLE #tmp_ranks
END

SELECT *,
       Rank()
         OVER (
           partition BY tr.value1
           ORDER BY value2) AS myRank        -- Ranks with numbering being numerically continuous IE 1,1,3,4
       ,
       Dense_rank()
         OVER (
           partition BY tr.value1
           ORDER BY value2) AS mydenserank        -- This provides contiguous rank numbering within ranked sets, 1,1,2,3
       ,
       Row_number()
         OVER (
           partition BY tr.value1
           ORDER BY value2) AS myrownum -- Individually numbers
       ,
       Dense_rank()
         OVER (
           partition BY 1
           ORDER BY value1) AS myDenseRankgroupnumber        -- by partitioning by a static number you can use it as a group number.
       ,
       Rank()
         OVER (
           partition BY 1
           ORDER BY value1) AS myRankGroupNumber        -- Ranks with numbering being numerically continuous IE 1,1,3,4 not the best for group numbers
       , sum(DecimalValue) over(partition by myid) as SumForAllInId  -- Lets you sum by a grouping
       , sum(DecimalValue) over(partition by myid,value1) as SumForAllValue1InID -- Lets you sum within groups and sub groups
into #tmp_ranks
-- Individually numbers within a partitioned set.
FROM   #tmp_rank tr
-- This shows how you can use a two step ranking, then summing process.
-- In later versions of SQL this can be moved upward into the above query.
select
*
, SUM(1) over (Partition By myDenseRankgroupnumber) as  TotalInDenseGroup -- Count total records in a group that was created by the dense rank
, SUM(DecimalValue) over (Partition By myDenseRankgroupnumber) as  SumTotalInDenseGroup -- Sum records in a group that was created by the dense rank
from #tmp_ranks


Friday, April 8, 2016

SQL Server 2012 not properly opening .sql files and setting it in default files doesn't work.

Exactly as the title says. Sometimes I like to post things just to get their search rankings higher.  This is the solution I had for the problem where something went wrong in my installation of sql server 2012 and it _WOULD NOT_ properly open sql files by clicking on them directly, even using regular windows assignments.  This post on stack overflow has the best top answer for this issue and it fixed my problem.

http://stackoverflow.com/questions/14564469/opening-sql-files-in-ssms-2012-as-default-program-and-with-existing-instance