Thursday, October 30, 2008

Using Validate and ReinitializeMetaData (Part One)

This is my second blog post about the internals of creating an SQL Server Integration Services custom data flow component. The first was about using the PerformUpgrade method to seamlessly upgrade your component in existing packages when you change your code enough that the saved metadata has to change to support your new features.
One of the key elements of creating a custom component in SQL Server Integration Services that can create a great user experience - or cause disaster - is how you implement the Validate and ReinitializeMetaData methods. If used properly, those methods can report meaningful information to the package designer, and seamlessly and automatically react to upstream changes in the Data Flow.
As always, taking a look at some working examples is a great idea. Check out the SSIS Community Samples on CodePlex, Alberto Ferrari's TableDifference, and my Kimball SCD Component on CodePlex.
The Broad View
First, we need to sit back and think about what the designers of SSIS intended Validate and ReinitializeMetaData to do. Once you understand that, you'll understand what all the code inside them is for.
The Validate method is intended to check all of the properties of your component to see if they're consistent and valid for runtime use. SSIS calls this method in many different situations, but the bottom line is that this method is responsible for letting SSIS know if your component is ready to go or not. Validate has some "grey area" in it - it doesn't just return a True/False value of whether the component is ready. Validate lets SSIS know just "how broken" it is, if it's not ready. If the component is only slightly broken (we'll talk about what that means later) then ReinitializeMetaData will be called to try and fix it automatically - without the package designer having to get involved.
When creating your component, these two methods help you accomplish these goals:

  • Find and report every inconsistency or problem with your component's configuration. Every time an unacceptable configuration can be foreseen, or a problem situation occurs at runtime, your first thought needs to be "how can I add/change code in Validate to detect this?"
  • Do everything possible to automatically configure your component, or streamline that process for the package designer - without having to open the editor.
Basic Validation
There are a few things to keep in mind when writing your Validate method:

  1. Don't change your configuration inside Validate. Validate is a "read-only" look at your component; you're not allowed to "fix" anything in here. The reason for this is that it's called at runtime as well as design-time. It might be fine to fix something in here at design-time (conceptually), but at runtime, you really don't want to be assuming anything - the package developer had better have set everything up correctly when they had the chance.
  2. Check EVERYTHING you can think of. We'll run through a basic list later, but this point really applies to your "custom" properties.
  3. Try to fail/pass as quickly as possible. Validate gets called a lot, so you really don't want the package designer to be waiting... waiting... waiting... for your component to report what they already know is wrong. Do your best to find the "most important" thing that's incorrect with the component, then stop checking.
  4. Use the "FireError" and "FireWarning" methods of ComponentMetaData to provide verbose information to the package designer about what needs to be fixed.
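To make points 1, 3, and 4 concrete, here is a minimal sketch of a Validate that only reads metadata, stops at the first serious problem, and reports through FireError and FireWarning. The class name and messages are hypothetical, and the sketch assumes the SSIS 2008 wrapper interfaces (the "100" types) with references to the SSIS pipeline assemblies; the registration attribute is left out for brevity.

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

// Hypothetical component, for illustration only.
public class ReportingSampleComponent : PipelineComponent
{
    public override DTSValidationStatus Validate()
    {
        // Read-only: inspect metadata and report problems, never modify anything here.
        bool cancel = false;

        // Fail fast on the most serious problem and tell the designer what it is.
        if (ComponentMetaData.InputCollection.Count == 0)
        {
            ComponentMetaData.FireError(0, ComponentMetaData.Name,
                "No input is defined for this component.", "", 0, out cancel);
            return DTSValidationStatus.VS_ISCORRUPT;
        }

        // A warning informs the designer without failing validation.
        if (ComponentMetaData.InputCollection[0].InputColumnCollection.Count == 0)
        {
            ComponentMetaData.FireWarning(0, ComponentMetaData.Name,
                "No input columns are selected, so there is nothing to process.", "", 0);
        }

        return DTSValidationStatus.VS_ISVALID;
    }
}
```

Whether a given condition deserves an error or only a warning is a judgment call for your component; the split shown here is just one possibility.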
Down to the nitty gritty. Your Validate method has to return one of these values, which I'm listing in order of "most broken" to "ready": VS_ISCORRUPT, VS_ISBROKEN, VS_NEEDSNEWMETADATA, and VS_ISVALID. The layman's description of each:

  • VS_ISCORRUPT - "The component's configuration is so inconsistent, broken, and/or non-existent that it's not possible for me (the developer) to "fix", or even the package designer to "fix". Tell the package developer to start over by deleting this component and placing another one on the design surface."
  • VS_ISBROKEN - "The component's configuration is broken in a way that I (the developer) can't automatically "fix", or the package designer just hasn't specified enough basic information in the configuration for me to actually validate the component. Tell the package designer he has to edit me, he can fix this."
  • VS_NEEDSNEWMETADATA - "The component's configuration is broken or inconsistent, but I (the developer) think I can fix it. Run the component's ReinitializeMetaData method, please."
  • VS_ISVALID - "Sweet. All ready to process data - let's go!"
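As a rough illustration of how VS_NEEDSNEWMETADATA hands control to ReinitializeMetaData, the sketch below uses the AreInputColumnsValid property and the RemoveInvalidInputColumns method of ComponentMetaData. The class name is hypothetical, and the SSIS 2008 wrapper interfaces are assumed; a real component would usually check much more than this.

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

// Hypothetical component, for illustration only.
public class StatusSampleComponent : PipelineComponent
{
    public override DTSValidationStatus Validate()
    {
        // Let the base class judge the bare component first.
        DTSValidationStatus status = base.Validate();
        if (status != DTSValidationStatus.VS_ISVALID)
        {
            return status;
        }

        // The selected input columns no longer match what upstream provides.
        // That is something ReinitializeMetaData can repair on its own.
        if (!ComponentMetaData.AreInputColumnsValid)
        {
            return DTSValidationStatus.VS_NEEDSNEWMETADATA;
        }

        return DTSValidationStatus.VS_ISVALID;
    }

    public override void ReinitializeMetaData()
    {
        // Drop input columns whose upstream counterpart is gone or has changed.
        ComponentMetaData.RemoveInvalidInputColumns();
        base.ReinitializeMetaData();
    }
}
```

Note that Validate only detects the problem; the actual change to the metadata happens in ReinitializeMetaData.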
To start with, there are a multitude of things that Validate should check in your component, and because of some of the guidelines above (particularly #3) there is no rule as to what order the checks should be done in. As well, depending on your component, some of these checks simply won't apply because they don't make sense. So read through, and pick what you need. Keep in mind, every check you do here probably has a "fix" in ReinitializeMetaData - do both!

Base Class Validation
One thing I always do as the first call in my Validate method is to call the base class' Validate. I'm not exactly sure what the base class checks - but if it thinks a "bare" component is not valid, I believe it. If the base class returns with VS_ISVALID, then I check what I can - perhaps overlapping exactly what it checks, but again, I don't know what that is.
DTSValidationStatus status = base.Validate();
if (status != DTSValidationStatus.VS_ISVALID)
{
    return status;
}
Meta Data Existence Validation
As I said earlier, you want to check the easiest and most "incorrect" things first, so that you can exit out of the Validate method without wasting clock cycles. One of the first things you should check is for the sheer existence of your metadata. This is a very easy thing to check for - because you have a checklist built for you, in your own ProvideComponentProperties method.
ProvideComponentProperties tells SSIS what things it has to store for you in order to describe your component - your metadata. That method is called only once, when your component is first placed on the design surface. The metadata it creates should not change, except under your control in the PerformUpgrade method. If the metadata elements don't exist, then something has gone extremely wrong with the metadata - something that neither the package designer nor you will likely be able to fix.
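For reference, a ProvideComponentProperties that would serve as such a checklist might look something like the sketch below. The input, output, and property names ("Left Input", "Right Input", "Output", "Threshold") are made up for illustration, and the SSIS 2008 wrapper interfaces are assumed.

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

// Hypothetical component, for illustration only.
public class ChecklistSampleComponent : PipelineComponent
{
    public override void ProvideComponentProperties()
    {
        // Start from a clean slate before defining the metadata.
        base.RemoveAllInputsOutputsAndCustomProperties();

        // Two inputs: Validate should later confirm that exactly two exist.
        IDTSInput100 left = ComponentMetaData.InputCollection.New();
        left.Name = "Left Input";
        IDTSInput100 right = ComponentMetaData.InputCollection.New();
        right.Name = "Right Input";

        // One output: Validate should confirm that it exists too.
        IDTSOutput100 output = ComponentMetaData.OutputCollection.New();
        output.Name = "Output";

        // One custom property: Validate should confirm its existence and type.
        IDTSCustomProperty100 threshold = ComponentMetaData.CustomPropertyCollection.New();
        threshold.Name = "Threshold";
        threshold.Value = 0;
    }
}
```

Every object created here is one line on the checklist that Validate has to tick off.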
Run through your ProvideComponentProperties method - you likely define inputs, outputs, and custom properties. Your Validate should check that all those exist. At first, in a very simple way:
if (ComponentMetaData.InputCollection.Count != 2)
{
    // Report the actual count, since it could be anything other than two.
    bool cancel = false;
    ComponentMetaData.FireError(0, ComponentMetaData.Name, "Component expects two inputs, but " + ComponentMetaData.InputCollection.Count + " are defined!", "", 0, out cancel);
    return DTSValidationStatus.VS_ISCORRUPT;
}
Now - please note that we are NOT checking to see if the package designer has connected two data flows into our component. We are checking whether our metadata tells SSIS that our component wants two inputs made available to the package designer. Your input count will vary - for example: the Sort component defines one input, the Merge Join two, and the Union All component defines (at least) one more input in the InputCollection than the number of inputs currently attached (so that the package designer can connect more). The Sort and Merge Join components would have a very big problem with having more or fewer than one and two inputs defined, respectively - but the Union All component will allow (by not returning VS_ISCORRUPT) one or more. As shown, I would recommend you fire an error event, and return VS_ISCORRUPT should the input count be unexpected.
If your inputs are named specifically, and used by name elsewhere in your code, you should also loop through the InputCollection to check the properties of each input. Perform the same level of validation on your outputs. Again - we are not checking if or how the package designer has attached flows to the outputs - we are checking whether SSIS has correct information on whether those outputs exist.
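A sketch of that name check could look like the following. The names "Left Input", "Right Input", and "Output" are hypothetical stand-ins for whatever your ProvideComponentProperties defines, and the SSIS 2008 wrapper interfaces are assumed.

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

// Hypothetical component, for illustration only.
public class NamedInputsSampleComponent : PipelineComponent
{
    private bool HasInput(string name)
    {
        foreach (IDTSInput100 input in ComponentMetaData.InputCollection)
        {
            if (input.Name == name) return true;
        }
        return false;
    }

    private bool HasOutput(string name)
    {
        foreach (IDTSOutput100 output in ComponentMetaData.OutputCollection)
        {
            if (output.Name == name) return true;
        }
        return false;
    }

    public override DTSValidationStatus Validate()
    {
        DTSValidationStatus status = base.Validate();
        if (status != DTSValidationStatus.VS_ISVALID) return status;

        // Only the existence of the metadata is checked here,
        // not whether the designer has attached any paths.
        if (!HasInput("Left Input") || !HasInput("Right Input") || !HasOutput("Output"))
        {
            bool cancel = false;
            ComponentMetaData.FireError(0, ComponentMetaData.Name,
                "An expected input or output is missing from the component metadata.",
                "", 0, out cancel);
            return DTSValidationStatus.VS_ISCORRUPT;
        }

        return DTSValidationStatus.VS_ISVALID;
    }
}
```

In a real component you would probably name the missing input or output in the error message so the package designer knows exactly what is wrong.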
The third basic metadata existence check you should make is for your custom properties. Most components will define some in ProvideComponentProperties. Check their existence, and possibly their data type here. More rigorous "consistency checking" to ensure that the values stored in the custom properties actually make sense within the context of all of the other configuration information should be done later.
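As a sketch of that existence and type check, the example below looks for a hypothetical integer custom property named "Threshold". The property name, its type, and the class name are assumptions for illustration; the SSIS 2008 wrapper interfaces are assumed as well.

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

// Hypothetical component, for illustration only.
public class CustomPropertySampleComponent : PipelineComponent
{
    // Returns the named custom property, or null when it does not exist.
    private IDTSCustomProperty100 FindCustomProperty(string name)
    {
        foreach (IDTSCustomProperty100 property in ComponentMetaData.CustomPropertyCollection)
        {
            if (property.Name == name) return property;
        }
        return null;
    }

    public override DTSValidationStatus Validate()
    {
        DTSValidationStatus status = base.Validate();
        if (status != DTSValidationStatus.VS_ISVALID) return status;

        bool cancel = false;
        IDTSCustomProperty100 threshold = FindCustomProperty("Threshold");

        // Existence check: the property should always be there.
        if (threshold == null)
        {
            ComponentMetaData.FireError(0, ComponentMetaData.Name,
                "The Threshold custom property is missing.", "", 0, out cancel);
            return DTSValidationStatus.VS_ISCORRUPT;
        }

        // Type check: this sketch assumes the value is stored as an Int32.
        if (!(threshold.Value is int))
        {
            ComponentMetaData.FireError(0, ComponentMetaData.Name,
                "The Threshold custom property does not hold an integer.", "", 0, out cancel);
            return DTSValidationStatus.VS_ISCORRUPT;
        }

        return DTSValidationStatus.VS_ISVALID;
    }
}
```

Checking whether the value itself makes sense (a valid range, for example) belongs to the later consistency checks.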
Just about every problem described above should get a response of VS_ISCORRUPT - because if the metadata is screwed up badly enough to be missing that basic stuff, it's probably not worth trying to reconstruct. However, if you feel adventurous, you can definitely return a VS_NEEDSNEWMETADATA. Doing that will cause SSIS to call your ReinitializeMetaData method, where you can (just like in ProvideComponentProperties) add Inputs, Outputs, and custom properties.
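For the adventurous route, a sketch might pair a Validate that asks for new metadata with a ReinitializeMetaData that puts the missing piece back. The "Threshold" property, its default of 0, and the class name are hypothetical, and the SSIS 2008 wrapper interfaces are assumed.

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

// Hypothetical component, for illustration only.
public class RepairSampleComponent : PipelineComponent
{
    private const string ThresholdName = "Threshold"; // hypothetical property name

    private bool HasThreshold()
    {
        foreach (IDTSCustomProperty100 property in ComponentMetaData.CustomPropertyCollection)
        {
            if (property.Name == ThresholdName) return true;
        }
        return false;
    }

    public override DTSValidationStatus Validate()
    {
        DTSValidationStatus status = base.Validate();
        if (status != DTSValidationStatus.VS_ISVALID) return status;

        // Instead of giving up with VS_ISCORRUPT, ask SSIS for a repair attempt.
        if (!HasThreshold())
        {
            return DTSValidationStatus.VS_NEEDSNEWMETADATA;
        }

        return DTSValidationStatus.VS_ISVALID;
    }

    public override void ReinitializeMetaData()
    {
        base.ReinitializeMetaData();

        // Recreate the property the same way ProvideComponentProperties would.
        if (!HasThreshold())
        {
            IDTSCustomProperty100 threshold = ComponentMetaData.CustomPropertyCollection.New();
            threshold.Name = ThresholdName;
            threshold.Value = 0;
        }
    }
}
```

The recreated property only gets its default value back, so whatever the package designer had configured is still lost; that is part of why VS_ISCORRUPT is often the more honest answer.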
Basic Reinitialization
Keep in mind that your ReinitializeMetaData method is the only place where you can rescue the package designer from mindless drudgery. If you hate how you have to edit a Sort component, and uncheck and check a pass-through column simply to have your change of a column name be propagated downstream, now you know what the SSIS Team "forgot" to handle. (No slight on them, they had the rest of the product to deliver - we're just making components!)
Base Class Reinitialization
Just like the Validate method, you should call the base class' method to handle the basic stuff:
base.ReinitializeMetaData();
... To Be Continued ...
Next post, I'll get into some more areas around validating and reconstructing metadata for components. That should show you very clearly that these two methods are very strongly related - almost like two sides of a coin.
