Refer to Part One so that you know where we left off on this topic. In this instalment, I'll cover standard input and output validation and reinitialization. This post should clearly show - as I had previously claimed - that the Validate and ReinitializeMetaData methods go hand-in-hand.
A quick note here - all the snippets here refer to the SQL 2005 object model (IDTSInput90, ...) but the entire process remains unchanged for SQL 2008. The only changes you need to make are to change the "90" to "100" on the types.
I/O, Flow Attachment, and Virtual I/O
There is one thing that must be made clear before we continue - or else what follows will hopelessly confuse you. The SSIS terminology around inputs is confusing - I think primarily because most of you out there (yours truly included) are primarily working with SSIS as package designers, not as component developers.
When you're a package designer, you will naturally think that an "input" for a component is the arrow-connector that you've dragged from an upstream component and attached to a downstream component, and that's the whole truth. Not so when you're inside the component looking out. In component development, you have to understand that inputs and outputs have three fundamental apects that work together to make the component's "input" work.
When you work with inputs and outputs in your custom component, you'll see classes/interfaces like IDTSInput90 (or IDTSInput100 for SQL 2008) and IDTSVirtualInput90 - or IDTSInputColumnCollection90 and IDTSVirtualInputColumnCollection90. What's the difference?
The short version: the "virtual input" is the information that describes all of the information that's being passed down the attached data flow to your component's input. The virtual input essentially equals the output of the component upstream from your component. The "input" is the information you and/or the package designer (through your interface or the "Advanced Editor" only) has decided that you're "interested in".
What this means is that you have complete control over the "input" and "output" concept - but you share control with the package designer around the "virtual" partners to those. The package designer will alter the metadata of the columns presented to your component (the "virtual input"), and together with the package designer, you'll determine which columns will be used (and how) in your component.
Your component can "choose" which virtual columns get put on the input by using the SetUsageType method.
Input Validation - Flow Attachment
As I recommended earlier, it's best to move from simple validation to more complex validation. In this case, before you move forward with checking whether or not the package designer has added or removed columns from the upstream data flow, it's best to check whether or not they have disconnected or reconnected a flow to or from your component. (Detecting if your outputs are connected to anything downstream is not possible.)
Typically, all of the inputs that are defined by you - the component creator - are required to be attached to an upstream component by the package designer. Think of standard components like all of the Destinations, and a vast majority of the Transforms. Every one of the Destinations shows a validation error if there isn't an upstream component connected. Transforms like Sort, Derived Component, and Merge Join work the same way - they fail validation if the predefined inputs aren't attached.
In some cases, some of the inputs are optional. Think of the Union All transform. It requires at least one input, but additional inputs are not required. That may not seem obvious until you think it through - as soon as a package designer places the Union All transform on the design surface, it gets created with one input. That one input requires attachment. But as soon as an attachment is made to that input, the component defines another input - it has to, or BIDS would refuse to allow the package designer to drag another data flow onto it. You can attach or detach inputs to the Union All as much as you like - it won't destroy any of its inputs, but as soon as you attach enough arrows to fill up it's inputs, it will make more.
The way you check for input attachment is quite simple - given the typical requirement that all of your defined inputs need to be attached to upstream components - simply loop through the input collection and verify that they are.
foreach (IDTSInput90 input in ComponentMetaData.InputCollection)
{
if (!input.IsAttached)
{
ComponentMetaData.FireError(0, ComponentMetaData.Name, "All inputs need to be attached.", "", 0, out cancel);
return DTSValidationStatus.VS_ISBROKEN;
}
}
It's recommended that you set the status to VS_ISBROKEN, which indicates that the package designer is responsible for fixing the problem.
There is no related section in ReinitializeMetaData for repairing attachments. But for added automation, you may want to look at the OnInputAttached and OnInputDetached methods in order to prompt automated "repairs" to your component.
Input Validation - Virtual Column MetaData
After you've validated that there is a data flow connected to your input, you'll want to get more detailed. One of the first things I recommend checking is if the columns being delivered to you are those that you expect. Your component will very likely "break" if some parts of the metadata change - such as specific columns existing (because your component operates on them in some fashion), or certain columns being a particular data type or size. You can completely skip this part, and go right to checking the "input" metadata (as opposed to virtual input metadata) - since if a column that you're "not interested in" changes, then you're "not interested". I like to have my components check (and then repair), as I frequently maintain internal lists that reference "unused" columns for editing and other purposes.
I recommend simply checking counts first, then whether columns have been added or removed, and finally, data types for the columns.
Check Column Counts
To check column counts, compare a variable you've persisted in your component against input.GetVirtualInput().VirtualInputColumnCollection.Count, like this:
IDTSInput90 myInput = ComponentMetaData.InputCollection[0] // change this to retrieve your input by ID or property - you can never rely on SSIS maintaining your ordering for inputs
if (myInput.GetVirtualInput().VirtualInputColumnCollection.Count != mySavedInputColumnLineageIDs.Count)
{
ComponentMetaData.FireError(0, ComponentMetaData.Name, "Column metadata has changed - columns have been added or removed.", "", 0, out cancel);
return DTSValidationStatus.VS_NEEDSNEWMETADATA;
}
Check LineageIDs
To check for columns being added or removed, you're going to have to have saved (persisted) some kind of list of lineageIDs (because column names can't be counted on - lineage IDs are the column keys). Compare that list of saved lineageIDs to each virtual column's lineageID, like this:
foreach (IDTSVirtualInputColumn90 vColumn in myInput.GetVirtualInput().VirtualInputColumnCollection)
{
if (!mySavedInputColumnLineageIDs.Contains(vColumn.LineageID))
{
ComponentMetaData.FireError(0, ComponentMetaData.Name, "Column metadata has changed - columns have been removed.", "", 0, out cancel);
return DTSValidationStatus.VS_NEEDSNEWMETADATA;
}
}
Check Data Types
It's always a good idea to check column data types, especially if your component does something with them. You may not want to check every column's type - you might only want to validate those columns you use. You can do that this way, or against the Input columns, or only against a subset of the Virtual Input or Input columns. To ensure that column data types and lengths haven't changed, you'll have to store what you "saw last time" in your component's persistent storage as well for comparison. You need to loop through your virtual columns again, comparing the relevant properties of the virtual column (such as DataType, Length, etc...) to your stored expected values. If that check uncovers some differences, return VS_NEEDSNEWMETADATA. To check all of the Virtual Input column data types:
foreach (IDTSVirtualInputColumn90 vColumn in myInput.GetVirtualInput().VirtualInputColumnCollection)
{
// You could save data types in a Hashtable with the lineageID as the key
if (mySavedDataTypesByLineageID.ContainsKey(vColumn.LineageID))
{
if (vColumn.DataType != (DataType)mySavedDataTypesByLineageID[vColumn.LineageID])
{
ComponentMetaData.FireError(0, ComponentMetaData.Name, "Column '" + vColumn.Name + "' has changed type.", "", 0, out cancel);
return DTSValidationStatus.VS_NEEDSNEWMETADATA;
}
}
}
You will simply put more checking in for vColumn.Length and other properties to continue data type checking - as rigorously as you decide. Keep in mind that I've shown above that my code returns VS_NEEDSNEWMETADATA if the data type changed. You may not want to do that. You may have had the package designer identify some "special" columns that your component is designed to transform in a certain way. Perhaps your requirement for those "special" columns is that they're supposed to be a certain data type - an integer type for example. If the package designer has identified a non-integer typed column as one of your "special" columns, you as the component designer can't repair that. The component can't change the data type of the input column, and also can't arbitrarily select another column to use as your "special" column. Only the package designer can fix that problem. In a case like that, you'll want to return VS_ISBROKEN instead, like in this code:
// mySpecialColumnLineageIDs contains the lineage IDs the package designer identified as being your "special" integer-only columns
foreach (IDTSVirtualInputColumn90 in myInput.GetVirtualInput().VirtualInputColumnCollection)
{
if (mySpecialColumnLineageIDs.Contains(vColumn.LineageID))
{
if ((vColumn.DataType != DataType.DT_I8)
|| (vColumn.DataType != DataType.DT_I4)
|| (vColumn.DataType != DataType.DT_I2)
|| (vColumn.DataType != DataType.DT_I1)
|| (vColumn.DataType != DataType.DT_UI8)
|| (vColumn.DataType != DataType.DT_UI4)
|| (vColumn.DataType != DataType.DT_UI2)
|| (vColumn.DataType != DataType.DT_UI1))
{
ComponentMetaData.FireError(0, ComponentMetaData.Name, "Column '" + vColumn.Name + "' is required to be an integer type.", "", 0, out cancel);
return DTSValidationStatus.VS_ISBROKEN;
}
}
}
Input Repair - Virtual Column MetaData
If you find something wrong with the Virtual Input columns' metadata, you've likely told BIDS that you wanted it to call ReinitializeMetaData so that you could "fix" it. Keep in mind - you can't repair the columns themselves, you can only repair your own information. This means that you can't remove columns you don't like - you can only store somewhere in your component that you want to ignore them. You can't change a column's data type, you can only change the data type your component expects, so it's ready at runtime to handle it. If you have a problem with something the package designer did - for example, changing the data type on a column from an integer type to a string type - then don't return VS_NEEDSNEWMETADATA, return VS_ISBROKEN, as was done in the previous section.
So what can we repair? Perhaps your component stores the columns that have been identified as the sort order of the flow, because it does something with them. If the sort order changes - that's fine, it's not an error, but perhaps you want to record that information, or change some other internal properties because of it. This is where you make that happen. The code to do this is highly dependent on what your component is trying to accomplish, so I can't really give you any small snippets of code. I encourage you to look at sample components - like the Microsoft SQL Server Community Samples: Integration Services on CodePlex, or any of the other open source custom components you can find on the SSIS Community Tasks and Components project.
Input Validation - Column MetaData
As we've stated before, the difference between the Input and Virtual Input column collections is only the fact that the Input collection is a subset of all columns available that your component has decided to use. You may have skipped checking the Virtual Input to get here, that's fine - but don't skip the checking I explained there. Check your Input the same way I checked the Virtual Input - column counts, lineages, and data types. You can read all of the sample code in the Virtual Input section - just replace "IDTSVirtualInputColumn90" with "IDTSInputColumn90" and "GetVirtualInput().VirtualInputColumnCollection" with "InputColumnCollection".
Do exactly the same kind of validation here as we did in the Virtual Input checking. First, you should check your column counts, then match actual columns to expected ones via lineageID, and then check data types.
Input Repair - Column MetaData
The same comments apply here as I made in the Input Validation for Column MetaData and Input Repair for Virtual Column MetaData sections. You can only fix your own interpretation of the incoming columns here - you can't change the columns themselves. Take any useful snippets from the Input Repair for Virtual Column MetaData and replace the Virtual references with Input references.
For Next Time
I'm going to leave the topic for the next part open - this was a long post. I may talk about validating and repairing outputs, relinking your "internal" storage of special columns to the inputs, or something else entirely. But it will have something to do with using Validate and ReinitializeMetaData! See you then...
Hello Todd,
ReplyDeleteI have written code inside Script task to create a package dynamically. This is expected to load data for more than 1 table i.e. inside a dataflowtask there should be parallel source & destinations created dynamically.
I have used forloop to achieve this.
The logic runs successfully for 1st iteration of loop but fails on 2nd iteration on ReinitializeMetadata() right after i assigned Accessmode for destination.
I have written details in SSIS forum with code.
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/daa4dcc8-7c61-4137-94e0-fb725c45c0f7
It would be very helpful if you could help me.
Thank you.
I'm not a fan of constructing SSIS packages inside of Scripts. Use a C# app instead. I'll take a peek at your thread - but I'm not familiar with creating packages from code. This post is about handling those methods within custom objects.
ReplyDeleteYes Todd. I agree with you and not in favor of creating package dynamically in Script Task. The intent of putting this code in script task was "quick debugging". I would be using this code in one of custom object that i would be creating in future.
ReplyDeleteHi Todd,
ReplyDeleteI am encountering a strange issue. I have created a 'custom destination component'. The custom destination component can automatically identify the columns passed to the from the source component during the 'Design time'. I have overidden the 'OnOutputAttached' method. The problem is if the source component adds an extra column 'X' to the output the custom destination component is not automatically selecting it, but the new column 'X' in available in the 'custom destination component'. Once I manually check the column 'X' it is added and loaded to the table.
I would like the 'custom destination component' to automatically add (check) the new column 'X' during the 'Run Time'. Can you please help?
Regards
Bidyut
I have implemented the solution to map the new columns automatically from source to destination.
ReplyDelete