Tuesday, April 9, 2013

Sifting through SSIS logs

When debugging an SSIS package, depending on how the logging is set up, it can be difficult to track down issues that come up.  The most common thing you will find on the internet is the basic query against the log.  The basic query involves knowing the package name, or the source id, and looks something like this:


DECLARE @sourceid varchar(255)
set @sourceid = '{97841A9B-1E49-4B89-AD84-677D090B2AD0}' -- your package sourceid here.
select
*
from sysdb.dbo.sysssislog
where sourceid = sourceid
and starttime > '01/01/2013'
      and event like '%error%'




One of the issues with this basic query, is that a package can often fail spectacularly, and it can also fail in multiple ways, down multiple paths, and can also fail during post execute events.  This can make a log that is often confusing as to which error was the root error in a package.

For example.  Let us say that a package has a post execute event, the package is running with the distributed transaction manager enabled, running within a parent package, and has a sequence container containing a for loop over a data flow.  This may not seem likely, but is not as uncommon as one would think.  If the post execute event does an event which requires going against a sequence container, and is not constrained to fail if the package fails, the package can have many "important looking" error messages.

4-5 error messages if the data flow fails due to truncation
since the distributed transaction is going to rollback, any failures would then fail the post execute and look like a sql connection error, so you would get 4-5 messages for that percolating up the container chain as well.

Because there are tons of errors that can be caused, it may be easy to start trying to hunt down the first message error you will see in the log using a traditional example, that of the sql connection failing.

But the real error would have been the truncation error in the data flow, so how do you figure this out?

You could add a sort to the code from above, and sort by ID asc, but then you have a jumble of messages from multiple containers, and you have to sift through the incidental failure messages that piled up from later messages.

So the below code is what I have written to narrow down the scope of errors.  It isn't perfect, because it doesn't make any attempt to verify that any specific error message is related, but it will narrow down what errors are important, and the order of importance to view those errors.

The code groups together all of the messages that were returned, and when sort by best is set to 1, it orders the errors as having the highest priority in the result set.  It also returns the source chain of the errors, so you can see how the error percolated up through each container in the system.  There are situations in control flows that have error handling paths (red precedence constraints), but those are less likely to happen.  If I hit that situation myself in a debugging environment, I will create a post with the solution at a later date.



/*
-- First we want to find the execution ID, the following gives a method of that


-- If you have the source id that the package uses which can be found as the ID in the properties
-- of the package, use the following, just insert the start time you know it executed around
-- and get the execution id for the package execution instance.
select
*
from sysdb.dbo.sysssislog
where sourceid = '{97841A9B-1E49-4B89-AD84-677D090B2AD0}'
and starttime > '01/01/2013'
      and event = 'PackageStart'
order by id desc


-- If you want to locate the idea based on a package that failed...

DECLARE @sourceid varchar(255)
set @sourceid = '{97841A9B-1E49-4B89-AD84-677D090B2AD0}' -- your package sourceid here.
select
*
from sysdb.dbo.sysssislog
where sourceid = sourceid
and starttime > '01/01/2013'
      and event like '%error%'
order by id desc

--- If you only know the name of the source, but don't know how to get the sourceid

select
*
from sysdb.dbo.sysssislog
where source like '%mypackagename%'
and starttime > '01/01/2013' 
order by id desc
*/

/*
--- The debugging script takes a bit called sortbesterror
--- in this code I made it so that when 1, it sorts the result set
--- in a way that priorities error messages and other important message types,
--- and prioritizes those errors in a way that should have the most likely
--- error that is causing the problem show up first in the result set.

1 = best error messages
anything else just does linear reporting
*/
DECLARE @sortbesterror BIT

SET @sortbesterror = 1

DECLARE @executionid VARCHAR(255)

SET @executionid = 'B9F6C441-82D6-45A0-AE59-1015374C939B';

WITH ssis_messages
     AS (SELECT *,
                Replace(event, 'User:', '') AS best_event_message,
                Row_number()
                  OVER(
                    partition BY executionid, Replace(event, 'User:', ''),
                  message
                    ORDER BY id ASC)        row_num,
                Dense_rank()
                  OVER(
                    partition BY executionid
                    ORDER BY
                  --Determine Best Sorting
                  CASE WHEN @sortbesterror = 1 THEN ( CASE Replace(event,
                  'User:', ''
                  )
                  WHEN
                  'OnError' THEN 1 WHEN 'OnTaskFailed' THEN 2 WHEN
                  'OnInformation'
                  THEN
                  3 ELSE 4
                  END ) ELSE starttime END ASC, CASE WHEN @sortbesterror = 1
                  THEN 0
                  ELSE
                  id END
                  ASC, message ASC )        AS group_importance
         FROM   sysdb.dbo.sysssislog
         WHERE  executionid = @executionid),
     concatenations
     AS (SELECT group_importance,
                CONVERT(NVARCHAR(max), source) source,
                row_num
         FROM   ssis_messages
         WHERE  row_num = 1
         UNION ALL
         SELECT c.group_importance,
                ( c.source + ', ' + l.source ) source, -- concatonated sources
                l.row_num
         FROM   concatenations c -- this is a recursion! 
                INNER JOIN ssis_messages l
                        ON l.group_importance = c.group_importance  -- makes sure the same message group rank
                           AND l.row_num = c.row_num +-- concats in order
                           and c.source not like '%' + l.source + '%'  --- cleans out duplicate messaging
                           ),
     ranked_concats
     AS (SELECT *,
                Row_number()
                  OVER(
                    partition BY group_importance
                    ORDER BY row_num DESC) row_num_concats
         FROM   concatenations),
     sources
     AS (SELECT *
         FROM   ranked_concats
         WHERE  row_num_concats = 1)
SELECT sm.best_event_message AS debug_event,
         sm.event as debug_original_event_type,
       sm.group_importance,
       sm.starttime          AS debug_error_date,
       sm.message            AS debug_message,
       sm.sourceid           AS debug_best_source_id,
       sm.source             AS debug_best_source_name,
       s.source              AS debug_component_reporting_chain
FROM   ssis_messages sm
       JOIN sources s
         ON sm.group_importance = s.group_importance
            AND sm.row_num = 1
ORDER  BY s.group_importance ASC

No comments:

Post a Comment