Wednesday, April 10, 2013

Package Meta Data from MSDB

During release phases there are often instances where the package currently in release is not the package currently in production.  So how do you tease out what is in production when you don't have the ability to get the file directly via SQL Server Management Studio, but you DO have SELECT access to msdb and your packages are stored in MSDB?

The following code grabs the package folder and verbuild information from the SSIS metadata stored in MSDB for a specific package, and then displays that information.  It returns the package name, the msdb folder path, the version information, and other details as needed.  This lets a user quickly verify whether the expected build is in MSDB, without running a package or relying on any kind of logging hints.


/*
Code originally taken, slightly modified from

Bill Fellows of http://billfellows.blogspot.com

http://billfellows.blogspot.com/search?q=sysssispackagefolders

This modification adds a packagename variable to make the code a bit more straightforward to use.

*/


DECLARE @packagename varchar(255)
set @packagename = 'Your package name here' -- leave NULL for all packages

;WITH FOLDERS AS
(
    -- Capture root node
    SELECT
        cast(PF.foldername AS varchar(max)) AS FolderPath
    ,   PF.folderid
    ,   PF.parentfolderid
    ,   PF.foldername
    FROM
        msdb.dbo.sysssispackagefolders PF
    WHERE
        PF.parentfolderid IS NULL

    -- build recursive hierarchy
    UNION ALL
    SELECT
        cast(F.FolderPath + '\' + PF.foldername AS varchar(max)) AS FolderPath
    ,   PF.folderid
    ,   PF.parentfolderid
    ,   PF.foldername
    FROM
        msdb.dbo.sysssispackagefolders PF
        INNER JOIN
            FOLDERS F
            ON F.folderid = PF.parentfolderid
)
,   PACKAGES AS
(
    -- pull information about stored SSIS packages
    SELECT
        P.name AS PackageName
    ,   P.id AS PackageId
    ,   P.description as PackageDescription
    ,   P.folderid
    ,   P.packageFormat
    ,   P.packageType
    ,   P.vermajor
    ,   P.verminor
    ,   P.verbuild
    ,   suser_sname(P.ownersid) AS ownername
    FROM
        msdb.dbo.sysssispackages P
)
SELECT
    F.FolderPath
,   P.PackageName
,   F.FolderPath + '\' + P.PackageName AS PackagePath
,   P.verbuild
,   P.ownername
,   P.packageFormat
,   P.packageType
,   P.vermajor
,   P.verminor

,   P.PackageId
FROM
    FOLDERS F
    INNER JOIN
        PACKAGES P
        ON P.folderid = F.folderid
WHERE
    (
        P.PackageName LIKE '%' + @packagename + '%'
        OR @packagename IS NULL
    )
    AND F.FolderPath <> '\Data Collector'

Related post:

http://themobilefieldbase.blogspot.com/2013/03/export-ssis-from-msdb-when-all-else.html

Tuesday, April 9, 2013

Sifting through SSIS logs

When debugging an SSIS package, depending on how the logging is set up, it can be difficult to track down the issues that come up.  The most common thing you will find on the internet is the basic query against the log, which requires knowing the package name or the source id, and looks something like this:


DECLARE @sourceid varchar(255)
set @sourceid = '{97841A9B-1E49-4B89-AD84-677D090B2AD0}' -- your package sourceid here.
select
*
from msdb.dbo.sysssislog
where sourceid = @sourceid
and starttime > '01/01/2013'
      and event like '%error%'




One of the issues with this basic query is that a package can often fail spectacularly: it can fail in multiple ways, down multiple paths, and during post execute events.  The resulting log is often confusing as to which error was the root error in the package.

For example, let us say that a package has a post execute event, is running with the distributed transaction coordinator enabled, runs within a parent package, and has a sequence container containing a for loop over a data flow.  This may not seem likely, but it is not as uncommon as one would think.  If the post execute event does work that depends on the sequence container, and is not constrained to fail if the package fails, the package can log many "important looking" error messages:

- 4-5 error messages if the data flow fails due to truncation.
- Since the distributed transaction is going to roll back, any post execute failures would then look like a SQL connection error, so you would get another 4-5 messages for that percolating up the container chain as well.

Because so many errors can be generated, it is easy to start hunting down the first error message you see in the log using a traditional example, that of the SQL connection failing.

But the real error would have been the truncation error in the data flow, so how do you figure this out?

You could add a sort to the code above, ordering by id ascending, but then you have a jumble of messages from multiple containers, and you have to sift through the incidental failure messages that piled up after the root error.
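As a minimal sketch of that sorted variant (reusing the placeholder sourceid from above, and assuming logging is configured to the default msdb.dbo.sysssislog table), it might look like:

```sql
-- Basic error query sorted by log id, so messages appear in the order
-- they were written.  The root error is usually near the top, but
-- incidental failures from other containers are still mixed in.
DECLARE @sourceid varchar(255)
set @sourceid = '{97841A9B-1E49-4B89-AD84-677D090B2AD0}' -- your package sourceid here.
select *
from msdb.dbo.sysssislog
where sourceid = @sourceid
  and starttime > '01/01/2013'
  and event like '%error%'
order by id asc
```

This is the jumble the rest of this post tries to improve on.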

So the code below is what I have written to narrow down the scope of errors.  It isn't perfect, because it makes no attempt to verify that any specific error message is related, but it will narrow down which errors are important, and the order of importance in which to view those errors.

The code groups together all of the messages that were returned, and when @sortbesterror is set to 1, it orders the errors with the highest priority first in the result set.  It also returns the source chain of the errors, so you can see how each error percolated up through the containers in the package.  There are situations in control flows with error handling paths (red precedence constraints) that this does not cover, but those are less likely to happen.  If I hit that situation myself in a debugging environment, I will create a post with the solution at a later date.



/*
-- First we want to find the execution ID; the following gives a few methods for that.


-- If you have the source id that the package uses which can be found as the ID in the properties
-- of the package, use the following, just insert the start time you know it executed around
-- and get the execution id for the package execution instance.
select
*
from msdb.dbo.sysssislog
where sourceid = '{97841A9B-1E49-4B89-AD84-677D090B2AD0}'
and starttime > '01/01/2013'
      and event = 'PackageStart'
order by id desc


-- If you want to locate the id based on a package that failed...

DECLARE @sourceid varchar(255)
set @sourceid = '{97841A9B-1E49-4B89-AD84-677D090B2AD0}' -- your package sourceid here.
select
*
from msdb.dbo.sysssislog
where sourceid = @sourceid
and starttime > '01/01/2013'
      and event like '%error%'
order by id desc

--- If you only know the name of the source, but don't know how to get the sourceid

select
*
from msdb.dbo.sysssislog
where source like '%mypackagename%'
and starttime > '01/01/2013' 
order by id desc
*/

/*
--- The debugging script takes a bit called @sortbesterror.
--- In this code I made it so that when it is 1, it sorts the result set
--- in a way that prioritizes error messages and other important message types,
--- so that the error most likely to be causing the problem
--- shows up first in the result set.

1 = best error messages
anything else just does linear reporting
*/
DECLARE @sortbesterror BIT

SET @sortbesterror = 1

DECLARE @executionid VARCHAR(255)

SET @executionid = 'B9F6C441-82D6-45A0-AE59-1015374C939B';

WITH ssis_messages
     AS (SELECT *,
                Replace(event, 'User:', '') AS best_event_message,
                Row_number()
                  OVER(
                    partition BY executionid, Replace(event, 'User:', ''), message
                    ORDER BY id ASC) AS row_num,
                Dense_rank()
                  OVER(
                    partition BY executionid
                    ORDER BY
                      -- Determine best sorting: errors first, then task
                      -- failures, then informational events, then the rest.
                      CASE WHEN @sortbesterror = 1
                           THEN ( CASE Replace(event, 'User:', '')
                                    WHEN 'OnError' THEN 1
                                    WHEN 'OnTaskFailed' THEN 2
                                    WHEN 'OnInformation' THEN 3
                                    ELSE 4
                                  END )
                           ELSE starttime END ASC,
                      CASE WHEN @sortbesterror = 1 THEN 0 ELSE id END ASC,
                      message ASC) AS group_importance
         FROM   msdb.dbo.sysssislog
         WHERE  executionid = @executionid),
     concatenations
     AS (SELECT group_importance,
                CONVERT(NVARCHAR(max), source) source,
                row_num
         FROM   ssis_messages
         WHERE  row_num = 1
         UNION ALL
         SELECT c.group_importance,
                ( c.source + ', ' + l.source ) source, -- concatenated sources
                l.row_num
         FROM   concatenations c -- this is a recursion!
                INNER JOIN ssis_messages l
                        ON l.group_importance = c.group_importance -- makes sure the same message group rank
                           AND l.row_num = c.row_num + 1 -- concatenates in order
                           AND c.source NOT LIKE '%' + l.source + '%' -- cleans out duplicate messaging
        ),
     ranked_concats
     AS (SELECT *,
                Row_number()
                  OVER(
                    partition BY group_importance
                    ORDER BY row_num DESC) row_num_concats
         FROM   concatenations),
     sources
     AS (SELECT *
         FROM   ranked_concats
         WHERE  row_num_concats = 1)
SELECT sm.best_event_message AS debug_event,
       sm.event              AS debug_original_event_type,
       sm.group_importance,
       sm.starttime          AS debug_error_date,
       sm.message            AS debug_message,
       sm.sourceid           AS debug_best_source_id,
       sm.source             AS debug_best_source_name,
       s.source              AS debug_component_reporting_chain
FROM   ssis_messages sm
       JOIN sources s
         ON sm.group_importance = s.group_importance
            AND sm.row_num = 1
ORDER  BY s.group_importance ASC

Wednesday, April 3, 2013

Simple method to make debugging stored procedures easier.

The scenario: 

The client has come back with a problem: the web site is not returning the right data.  You have the latest release code for the stored procedure, but you cannot update the procedure in production, and getting the database backed up into a dev environment would take longer than the needed fix.  You know you can do selects against the database, so you crack open the code for the stored procedure and you begin editing away.

You figure out the problem and send a release to production... but you forgot to test the proc in dev, and what would be the point, right?  You didn't have the data to test the race condition there anyway.  The proc fails to compile: you left your testing variables in the proc!  You now have to go back, remove your testing code, and re-submit, taking precious time, and you look bad.

This is actually a pretty common scenario in software development, especially in database development where fast response is required and a high-demand client or business user is breathing down your neck to get something done.  So how do you mitigate this problem?  In this case the developer coded an actual fix: he was able to select from the database and get the right results, but he forgot to go through the grinding steps of properly cleaning up his work for a production release.

I have found that the best method for this kind of thing is what I call a test harness (if anyone corrects my term I will update this article).  I am not sure if this is the proper term, but it is one I have seen used.  Because of the way that MS SQL Server compiles its code, you can often add code to a stored proc that lets you run the procedure's contents in place by selecting a subset of the script, without having to worry that the stored procedure will fail when the entire script is run by someone trying to deploy the code.

Let's take the following simple example:

CREATE PROCEDURE dbo.MyExampleFailProc (@Input1 VARCHAR(255), 
                                        @Input2 VARCHAR(255)) 
AS 
    SELECT @Input2 AS Input1, 
           @Input1 AS Input2 

go 

EXEC dbo.MyExampleFailProc
  1, 
  2 

The client comes back and says that 1 and 2 are reversed, and we are going to pretend that this is a MUCH larger proc that has a lot of debugging required.

In the end the developer has this code sitting in his Management Studio instance

ALTER PROCEDURE dbo.MyExampleFailProc (@Input1 VARCHAR(255), 
                                        @Input2 VARCHAR(255)) 
AS 
    DECLARE @Input1 VARCHAR(255), 
            @Input2 VARCHAR(255) 

    SET @Input1 = 1 
    SET @Input2 = 2 

    SELECT @Input1 AS Input1, 
           @Input2 AS Input2

As you can see, the underlying logic of the select statement is correct, but he then sends this code off to be deployed (again, pretend it is a bigger problem).

The developer gets back the following error from the deployment:

Msg 134, Level 15, State 1, Procedure MyExampleFailProc, Line 12
The variable name '@Input1' has already been declared. Variable names must be unique within a query batch or stored procedure.

The developer then goes back and releases the correct code:

ALTER PROCEDURE dbo.MyExampleFailProc(@Input1 VARCHAR(255), 
                                        @Input2 VARCHAR(255)) 
AS 
    SELECT @Input1 AS Input1, 
           @Input2 AS Input2 

So what is an easy way to avoid this kind of issue?

The answer is the test harness.  When creating your stored procedure, assume that the procedure will likely be wrong at some point and need a quick turnaround, then code to mitigate that contingency.  Instead of pretending that every coding environment is pristine, that everyone does the right thing the first time, and that nothing bad will ever happen, always assume that something bad will happen and will need to be fixed.  The test harness method allows you to do this.  Basically it assumes that at some point you will not have access to change a stored proc, that you WILL have access to execute and select, and that the data you have available in dev will not match that in production and will be hard to get to.

What can be done is to use a small bit of knowledge about how SQL Server 2008 (and thus Management Studio) parses comment blocks, so that you can debug stored procedure contents much faster.

To do this, you add the following to the stored procedure, just after the "AS" statement.

/*
--Put your declarations and set operations here:

--*/

By doing this, you can select your declaration and set operations along with the rest of the procedure body for testing, and your main code will still deploy.  Here is an example using the situation defined above:

ALTER procedure dbo.MyExampleFailProc
(
      @Input1 varchar(255)
      ,@Input2 varchar(255)
     
)
AS
/*
--Put your declarations and set operations here:
DECLARE @Input1 varchar(255)
      ,@Input2 varchar(255)

SET  @Input1 = 1
SET  @Input2 = 2
--*/


SELECT
       @Input1 as Input1
      ,@Input2 as Input2

Now if the developer selects the code from the DECLARE statement down, it will execute:

DECLARE @Input1 varchar(255)
      ,@Input2 varchar(255)

SET  @Input1 = 1
SET  @Input2 = 2
--*/


SELECT
       @Input1 as Input1
      ,@Input2 as Input2



And return valid test results:

Input1 Input2

1       2
(1 row(s) affected)

This works because when you select and run the script from the DECLARE statement down, "--*/" is treated as an ordinary single-line comment, but when the script is executed in total, SQL Server sees the "*/" as closing the comment block opened by "/*" and ignores the declare/set statements that were being used for debugging.

I have used this many times, and I have found it to be helpful for testing.  It violates the formatting principles that some people hold, but when speed trumps careful coding, the time savings can be a real help.