Tuesday, March 21, 2017

Count vs Count distinct in SQL

Had a new coder ask me how count works, vs count distinct. So I made a small example:

create table #tmp_example
(
       id int identity(1,1)
       , value varchar(255)
)

insert into #tmp_example
(
       value
)
select '1'
UNION select '2'
UNION select '3'
UNION select NULL


select
count(*)
, count(distinct value)
, count(distinct case when value = 1 then null else value end)
from #tmp_example


In this example it returns 4, 3, 2

4 because there are 4 rows (*) in the system.
3 because there are 3 distinct values where NULL is not counted a value in count distinct.
2 because there are 2 distinct values when 1 is cast as null during the counting process.

Saturday, January 14, 2017

Google Chrome does not work but Internet Explorer does

This is for those of you who have gone through the effort of checking the internet explorer->internet options-> and UNCHECK the proxy server, and it didn't work for you.

Another possible solution is to flush the DNS cache of your computer. Sometimes Google just outright loses the ip it needs to connect to and won't work on getting a new one.

Go to the command prompt by going to the start menu, and in the command box tying "cmd" this will bring up a dos box.

In the dos box type "Ipconfig /flushdns" and see if that works for you.  I have seen people think I am "amazing" for knowing how to do this, but this used to be quite a common problem in older versions of windows, and the trick still works sometimes with common connectivity problems.  Now why it allows one browser to work when the other won't... that still confuses me, but there you go.

Monday, November 21, 2016

Advanced Debugging in SSIS no Script Task Related

http://www.techbrothersit.com/2013/12/ssis-how-to-watch-value-of-variables.html

I wanted to just have this out there.  More of a re-blog because it needs to be pushed up.  This is how you do this properly when you are having an issue and are unsure of the exact value of a record. I also tend to log every variable using a script task as well during run time so that I can review the ssis logs for it.

Thursday, November 3, 2016

In Place Updates of Two Fields To Each Other in SQL

Not really sure how to classify this, but there are cases when you want to do an update of data in your system where you update one field to another field and set the other field to NULL.  Those of us that have had to approach this problem before know what the answer is, but I decided to create an example.

So the situation is this:  I have a table with two columns, and I want the first column to be overridden by the second column, and the second column to be set to NULL, if and ONLY if the second column is NOT null.  



The final output I want to be is this:

As you can see, Four was not overridden by Null, but One and Two in Column UserOne were updated by the values in the field UserTwo.

The reason I thought this was interesting as a problem was because of the question I was asked by a junior developer asking how he would track the update step to make sure that he updated the fields to NULL and I realized that I had never really thought about it because SQL Server handles this situation quite elegantly.  You don't HAVE to track what was updated, when doing the update it will use the values that are available at time of update, it does not do in order updates at all.

In reality the only statement you have to use is provided as a single update in the example below.  Order of the columns being set doesn't matter either.

IF object_id('tempdb..#tmp_testexample') IS NOT NULL
BEGIN
   DROP TABLE #tmp_testexample
END

create table #tmp_testexample
(
       ID int identity(1,1)
       ,UserOne varchar(255)
       ,UserTwo varchar(255)
)

insert into #tmp_testexample
(

       UserOne,UserTwo
)
SELECT
'One','Two'
UNION
Select 'Two','Three'

UNION SELECT 'Four',null

SELECT * FROM #tmp_testexample

select
*
from #tmp_testexample
where UserTwo is not null
       and UserOne <> UserTwo

update tta
set tta.UserOne = tta.UserTwo
, tta.UserTwo = NULL
from #tmp_testexample tta
where UserTwo is not null
       and UserOne <> UserTwo


select
*
from #tmp_testexample

This produces the output:

Original Data Set:

ID
UserOne
UserTwo
1
Four
NULL
2
One
Two
3
Two
Three
Example of what I want to update:

ID
UserOne
UserTwo
2
One
Two
3
Two
Three
Final Result after single update statement:

ID
UserOne
UserTwo
1
Four
NULL
2
Two
NULL
3
Three
NULL

Thursday, April 28, 2016

SSIS - Script Component Transform to TrimToNull Generically.

I have finally hit a problem that I had 10 years ago in SSIS.  The need for something to trim and then null any empty strings generically in SSIS.  Back then we used a compiled C# component, and the heavy use of that component caused headache for years.

Eventually we were able to use a vb script based script task to do the work, but that too has become unavailable in recent versions of SSIS (at least the method we were using).

So I had to whip one up yesterday.  Why? Why does someone feel that they need to do this instead of using a derived column task or manually taking all input columns and managing them individually?

Well, it takes a long time to do that for 100 columns, and I had about 30 minutes to make a fix and not all columns were string etc... etc... (excuses!).

So I did a little research and here is where I have come to this solution:


 //Include   
 using Microsoft.SqlServer.Dts.Pipeline;  
 .......  
 public override void ProcessInput(int InputID, PipelineBuffer Buffer)  
   {  
     while (Buffer.NextRow())  
     {  
       for (int columnIndex = 0;  
    columnIndex < Buffer.ColumnCount;  
    columnIndex++)  
       {  
         //string columnData = null;  
         if (Buffer.IsNull(columnIndex))  
         {  
           //columnData = "is NULL";  
         }  
         else  
         {  
           BufferColumn columnInfo = Buffer.GetColumnInfo(columnIndex);  
           switch (columnInfo.DataType)  
           {  
             case DataType.DT_BOOL:  
               //columnData = Buffer.GetBoolean(columnIndex).ToString();  
               break;  
             case DataType.DT_WSTR:  
               Buffer.SetString(columnIndex, Buffer.GetString(columnIndex).Trim());  
               if (Buffer.GetString(columnIndex) == "")  
               {  
                 Buffer.SetNull(columnIndex);  
               }  
               break;  
             case DataType.DT_STR:  
               //columnData += Buffer.GetString(columnIndex);  
               Buffer.SetString(columnIndex, Buffer.GetString(columnIndex).Trim());  
               if (Buffer.GetString(columnIndex) == "")  
               {  
                 Buffer.SetNull(columnIndex);  
               }  
               break;  
             // add code to support more data types here  
             default:  
              // columnData = "";  
               break;  
           }  
         }  
       }  
     }  
     base.ProcessInput(InputID, Buffer);  
   }  


So how does this code get into your script task?  Basically, you drop in a script component, connect your columns you want to clean as input, and put this script in the script component.

This code takes every single input that is a DT_STR or DT_WSTR and trims the data. If it is empty, it produces a NULL.

Not the best written code, I could have done a bit more, but it gives the mechanics of it.  The implementation requires adding a single include and pasting code into the buffer.  I got the base code a while ago from another source, but I don't remember the source.  I will add it if someone gives me a heads up.

Tuesday, April 12, 2016

Quick Reference For Ranking and Sums with Ranking

/*
Developer: Christopher Harsch

A quick and simple example of ranking and summing functions against a rank function.

Ranking Functions in SQL (some syntax only applies to 2012 sql server)

https://msdn.microsoft.com/en-us/library/ms189798.aspx

-- This shows the use of over clauses in sql which are used below.

https://msdn.microsoft.com/en-us/library/ms189461.aspx

*/

IF Object_id('tempdb..#tmp_rank') IS NOT NULL
  BEGIN
      DROP TABLE #tmp_rank
  END

CREATE TABLE #tmp_rank
  (
     myid   INT,
     value1 VARCHAR(255),
     value2 VARCHAR(255),
        DecimalValue decimal(18,2)
  )

INSERT INTO #tmp_rank
            (myid,
             value1,
             value2,
                      DecimalValue)
SELECT 1,
       'A',
       'B',
          10
UNION ALL
SELECT 1,
       'A',
       'C' ,
          10
UNION ALL
SELECT 1,
       'A',
       'D' ,
          10
UNION ALL
SELECT 1,
       'A',
       'B' ,
          10
UNION ALL
SELECT 2,
       'B',
       'B' ,
          10
UNION ALL
SELECT 2,
       'B',
       'B' ,
          10
UNION ALL
SELECT 1,
       'C',
       'D' ,
          10
UNION ALL
SELECT 1,
       'D',
       'B' ,
          10
UNION ALL
SELECT 1,
       'C',
       'A' ,
          10

IF object_id('tempdb..#tmp_ranks') IS NOT NULL
BEGIN
   DROP TABLE #tmp_ranks
END

SELECT *,
       Rank()
         OVER (
           partition BY tr.value1
           ORDER BY value2) AS myRank        -- Ranks with numbering being numerically continuous IE 1,1,3,4
       ,
       Dense_rank()
         OVER (
           partition BY tr.value1
           ORDER BY value2) AS mydenserank        -- This provides contiguous rank numbering within ranked sets, 1,1,2,3
       ,
       Row_number()
         OVER (
           partition BY tr.value1
           ORDER BY value2) AS myrownum -- Individually numbers
       ,
       Dense_rank()
         OVER (
           partition BY 1
           ORDER BY value1) AS myDenseRankgroupnumber        -- by partitioning by a static number you can use it as a group number.
       ,
       Rank()
         OVER (
           partition BY 1
           ORDER BY value1) AS myRankGroupNumber        -- Ranks with numbering being numerically continuous IE 1,1,3,4 not the best for group numbers
       , sum(DecimalValue) over(partition by myid) as SumForAllInId  -- Lets you sum by a grouping
       , sum(DecimalValue) over(partition by myid,value1) as SumForAllValue1InID -- Lets you sum within groups and sub groups
into #tmp_ranks
-- Individually numbers within a partitioned set.
FROM   #tmp_rank tr
-- This shows how you can use a two step ranking, then summing process.
-- In later versions of SQL this can be moved upward into the above query.
select
*
, SUM(1) over (Partition By myDenseRankgroupnumber) as  TotalInDenseGroup -- Count total records in a group that was created by the dense rank
, SUM(DecimalValue) over (Partition By myDenseRankgroupnumber) as  SumTotalInDenseGroup -- Sum records in a group that was created by the dense rank
from #tmp_ranks