Todd McDermid's Blog: Finding the Column Name for an ErrorColumn Code in SSIS (prior to 2016)

Fantastic news from the SSIS team at Microsoft. After a decade of work, they've been able to address all the bugs and features that somehow ranked higher than this. This problem is fixed! They've made it easy to tell what column an error is about in an SSIS 2016 data flow. And just like them, I'm late to the party too. I've finally found time to tell you how to get the exact same thing from versions earlier than 2016. That's right, read on to find out how to get SSIS to decode the ErrorColumn value that's on the Error output with in-the-box stuff only. Without add-on tools. Without custom components. Without pre-populated lookup tables made by utilities. (No judgment on the preceeding - use what you've got!)

What's the Problem?

Now, you're certainly reading this because you've run into an error in your code with SSIS. Seriously, nobody is good enough to code a data flow error free the first time. (And if you are, you're taking too much time to code - lean on the tool a little bit!) Given that we're all fallible, we'll all have run into the problem (especially in production) where a component in the data flow gets invalid input, routes a row to the error output, and... can't tell you anything useful about it.
The typical example is a Derived Column component. You've written an expression trying to construct a new column using a couple input columns. The problem is that the incoming data isn't what you expected when you wrote the expression, and is so unexpected that it causes an error in the evaluation. Perhaps you were attempting to get a fraction (ratio) between values in two columns, but the denominator column contained a zero. Or you were attempting to reduce the size of a column, or type cast a column to a smaller data type... but the incoming data exceeds the capacity of your new column.

At Design Time

Handling these kind of problems is easy... at design time. When a component fails, you have tools at your disposal to figure out why. Putting a data viewer on the input is the easiest and least intrusive - even though it requires you to think about how the incoming data interacts with your transformation.
With a little more work, you could attach a garbage component (Derived Column, Union All, ...) to the error output of the failing component, and configure a data viewer there. This will show you exactly which rows are causing the failure, and together with the Output window, you'll get a description of what the error is.
Unfortunately, there are still two problems (pre-SQL 2016):

We still don't know which column/expression is causing the component to fail. If you have two, ten, or twenty columns undergoing transformations in a component at the same time, it would be helpful to know which one of them is the problem. At design-time, you have the output window that reports the column having the problem - and this text can also be found in the sysssislog for production packages. But those aren't accessible at runtime, in production, to alter the package's behaviour, or report better information. The error output provides an ErrorColumn value to allegedly identify the column in error. However, this number has no discernible relationship with anything you can see in the data flow.
We can't handle the error reliably at runtime. Even if we understand what exactly is causing the component to fail, proactive error handling can't be coded reliably. There's an ErrorCode column to identify the kind of error (partially helpful) - but most often when error handling, you would really like to provide some business resource with usable information on how to fix the data that originated in their system. Because as we all (should) know, data quality is NOT an IT problem.

The Half-Answer

Well, fairly early on after SSIS 2008, we found out that it wasn't too hard to get one half of the answer - the actual error message. We just had to add a script component to the error output and call a function to decode the error number into a text string. However, there was no similar facility to determine what transformation caused the problem. To cope with this in some cases, I've split transformations individually into Derived Columns. Yes, one transform per component - all so that I could accurately and informatively report back to SMEs in production, in an automated fashion, which data elements of theirs needed a little attention.

The Second Half

Good news now that the new SSIS 2016 would finally add the option of getting a plain-English column name where the error occurred. It took me a while... but then I ran into my own need to get this information again... and recalled that I hadn't figured out a workaround for it in pre-2016 SSIS! What an unhelpful expert I am. Well - now you'll be glad to know that I took some time to solve the problem, and you don't need a separate tool to download, or store a table of metadata about your packages. This is all self-contained in the data flow - but even I can't say it's pretty, or particularly easy.

First - What is ErrorColumn?

The root of the issue is figuring out what the heck the number in ErrorColumn is. Yes, it's some indicator of a column in the data flow... but where else do we see that number? Nowhere in the UI, I can tell you that. But try this out in a failing data flow of yours: Write down the ErrorColumn, then open your DTSX with NotePad and search for that number (make it easy on yourself, and search for that number surrounded by quotation marks). What do you find? You find that the ErrorColumn refers to a LineageID. I've talked a lot about LineageIDs before - they're integral to how a data flow in SSIS works, and how SSIS keeps track of which columns are which. But what column does that LineageID reported by ErrorColumn refer to?
If we go back to our example of the Derived Column that fails, it's the LineageID of the column you're creating (or transforming). When I first tried to make sense of that ErrorColumn, I thought it would refer to the incoming column that had the "bad value" that was causing my expression to fail. But then I realized (of course) that SSIS has no idea what a "bad value" is... it only knows that an expression failed! So there's no "bad value", there's just a "bad expression". As such, the column that has the error isn't one of the inputs to the Derived Column - it's one of the output columns!
And to add insult to injury... that (new) column is NOT on the error output of the Derived Column. Because of course it's not - the error output doesn't include that column because it wouldn't have anything meaningful in it anyway. Facepalm moment. OK - now my intelligence/skills/experience has caught up to where the SSIS team was back in 2005. That took a while, I know, but bear with me - I'm attempting to redeem myself here.

Great - ErrorColumn is on the Error Output, but the Column Itself is on the Regular Output - Now What?

Great question.
And your next question should be... even so, how do I figure out which column on the regular output is associated with that LineageID that's nowhere to be found in the UI?
Script - of course! (Yay.) Now is the time to download the sample code that accompanies this article. You don't really need to read and understand the script - because the good news is that you won't have to edit it at all - it's completely generic and will work with any data flow with NO CHANGES. You can copy and paste the component (not just the code) into your data flow, or you can build your own by:

Go to the "Inputs and Outputs" page, and select the "Output 0" node. Change the "SynchronousInputID" property to "None". (This changes the script from synchronous to asynchronous.)
On the same page, open the "Output 0" node and select the "Output Columns" folder. Press the "Add Column" button. Change the "Name" property of this new column to "LineageID".
Press the "Add Column" button again, and change the "DataType" property to "Unicode string [DT_WSTR]", and change the "Name" property to "ColumnName".
Go to the "Script" page, and press the "Edit Script" button. Copy and paste this code into the ScriptMain class (you can delete all other method stubs):

    public override void CreateNewOutputRows()
    {
        IDTSInput100 input = this.ComponentMetaData.InputCollection[0];
        if (input != null)
        {
            IDTSVirtualInput100 vInput = input.GetVirtualInput();
            if (vInput != null)
            {
                foreach (IDTSVirtualInputColumn100 vInputColumn in vInput.VirtualInputColumnCollection)
                {
                    Output0Buffer.AddRow();
                    Output0Buffer.LineageID = vInputColumn.LineageID;
                    Output0Buffer.ColumnName = vInputColumn.Name;
                }
            }
        }
    }

Feel free to attach a dummy output to that script, with a data viewer, and see what you get. From here, it's "standard engineering" for you ETL gurus. Simply merge join the error output of the failing component with this metadata, and you'll be able to transform the ErrorColumn number into a meaningful column name.

But for those of you that do want to understand what the above script is doing:

It's getting the "first" (and only) input attached to the script component.
It's getting the virtual input related to the input. The "input" is what the script can actually "see" on the input - and since we didn't mark any columns as being "ReadOnly" or "ReadWrite"... that means the input has NO columns. However, the "virtual input" has the complete list of every column that exists, whether or not we've said we're "using" it.
We then loop over all of the "virtual columns" on this virtual input, and for each one...
Get the LineageID and column name, and push them out as a new row on our asynchronous script.

So now you can see that this script can be used in the middle of any data flow, and it will output the columns "as rows". You could even get more clever and start outputting columns with data types and other metadata related to the column too.

3 comments:

UnknownApril 22, 2016 at 3:18 AM
Very nice. I was sure myself that something like this would be possible, and tried to develop it, but didn't get as far as you did, partly because of the terrible documentation of the DTS object model as used in Scripts. Congratulations!
Arya HafshejaniMay 2, 2017 at 7:12 PM
Hi Todd,
Thanks for this. I am trying to implement this on a 2008 R2 solution where I have multiple ole Db error row redirects. I am not getting a match because the ErrorColumn of the last error ole db destination does not match the LineageId of the original column in the source ole DB connection as The script task gets the Lineage ID for the column of the very first OLE DB Source. Any advice on how to sort this out?
Thanks,
A
Todd McDermidMay 2, 2017 at 7:26 PM
You can't count on matching lineageIDs for your source to lineageIDs for your destination. Anything in between could cause a new "shape" for your buffer, which will result in new lineageIDs being generated. It's not really possible to use this technique as-is with a destination component (of any kind) because there's no output for it to connect to the "Copy metadata to error-decoding flow" conditional split. However, it might work if you put that split right in front of the destination instead of after it...

Todd McDermid's Blog

Pages

Tuesday, April 5, 2016

Finding the Column Name for an ErrorColumn Code in SSIS (prior to 2016)