Thursday, April 28, 2016

SSIS - Script Component Transform to TrimToNull Generically.

I have finally hit a problem that I had 10 years ago in SSIS.  The need for something to trim and then null any empty strings generically in SSIS.  Back then we used a compiled C# component, and the heavy use of that component caused headache for years.

Eventually we were able to use a vb script based script task to do the work, but that too has become unavailable in recent versions of SSIS (at least the method we were using).

So I had to whip one up yesterday.  Why? Why does someone feel that they need to do this instead of using a derived column task or manually taking all input columns and managing them individually?

Well, it takes a long time to do that for 100 columns, and I had about 30 minutes to make a fix and not all columns were string etc... etc... (excuses!).

So I did a little research and here is where I have come to this solution:


 //Include   
 using Microsoft.SqlServer.Dts.Pipeline;  
 .......  
 public override void ProcessInput(int InputID, PipelineBuffer Buffer)  
   {  
     while (Buffer.NextRow())  
     {  
       for (int columnIndex = 0;  
    columnIndex < Buffer.ColumnCount;  
    columnIndex++)  
       {  
         //string columnData = null;  
         if (Buffer.IsNull(columnIndex))  
         {  
           //columnData = "is NULL";  
         }  
         else  
         {  
           BufferColumn columnInfo = Buffer.GetColumnInfo(columnIndex);  
           switch (columnInfo.DataType)  
           {  
             case DataType.DT_BOOL:  
               //columnData = Buffer.GetBoolean(columnIndex).ToString();  
               break;  
             case DataType.DT_WSTR:  
               Buffer.SetString(columnIndex, Buffer.GetString(columnIndex).Trim());  
               if (Buffer.GetString(columnIndex) == "")  
               {  
                 Buffer.SetNull(columnIndex);  
               }  
               break;  
             case DataType.DT_STR:  
               //columnData += Buffer.GetString(columnIndex);  
               Buffer.SetString(columnIndex, Buffer.GetString(columnIndex).Trim());  
               if (Buffer.GetString(columnIndex) == "")  
               {  
                 Buffer.SetNull(columnIndex);  
               }  
               break;  
             // add code to support more data types here  
             default:  
              // columnData = "";  
               break;  
           }  
         }  
       }  
     }  
     base.ProcessInput(InputID, Buffer);  
   }  


So how does this code get into your script task?  Basically, you drop in a script component, connect your columns you want to clean as input, and put this script in the script component.

This code takes every single input that is a DT_STR or DT_WSTR and trims the data. If it is empty, it produces a NULL.

Not the best written code, I could have done a bit more, but it gives the mechanics of it.  The implementation requires adding a single include and pasting code into the buffer.  I got the base code a while ago from another source, but I don't remember the source.  I will add it if someone gives me a heads up.