Thursday, October 29, 2009

PASS Summit 2009: MVP Birds of a Feather Lunch

The PASS Summit next week has a very compelling event on Tuesday at lunch.  They asked the SQL Server MVPs attending the summit to host a lunch table on the topic of their choice.  This isn't a one-way presentation - it's a round table - an opportunity for you to talk to an MVP and other attendees about a topic they (and you) are passionate about, want to find out more, or that you simply have a question or two about.  The topics are varied and the hosts are very prominent MVPs (your truly excepted).
I took advantage of the opportunity - and I'm glad I did so quickly, because the tables were swiftly picked up, to the point of a significant waiting list.  For me, it's bittersweet - I won't get to go sit down at the other tables.  But you can jump around as much as you want!
Why I Chose "SSIS Scripting and Custom Objects"
Simple answer - I like to do this stuff, and it's my table.  I mean, seriously - if you've read my blog at all, this is a no-brainer.
Long answer - First, I think that the scripting capabilities and custom extensibility in Integration Services set it apart from other solutions in the market, and give Microsoft a distinct advantage in the ETL game.  Some say that it simply exposes what holes there are in an immature product.  There is truth to that, definitely - it gets used as duct tape frequently.  However, it certainly provides a way to get anything done in SSIS.  And I mean anything.
Second, I believe that for some package designers, it's become an overused crutch.  If you've comfortable using C#/VB.Net, you can begin to solve every problem with a Script - even when you could use standard parts.  If you do that, I think you risk short term gain for long term benefit.  Microsoft can't make your Script run any differently in SQL 11, 12, or 13.  But they can (and will) improve the standard parts.  Using Scripts can lock you out of that benefit.  Anyone tried to upgrade a DTS ActiveX script to SSIS?
And third, for those that tend to overuse scripting and have gotten themselves into a cut-and-paste maintenance nightmare - I say "go big or go home".  Convert your scripts into custom objects, for pete's sake.  You already know how to code in C#/VB.Net - take the next step to build a reusable and easily configurable custom object - it's really not that much further, honest!
What I Plan To Talk About
They asked the "hosts" to prep a little something to talk about to get people in the spirit of the table.  I say "asked" because it's really about the discussion, not another presentation - and it's perfectly acceptable for everyone (except the hosts) to bounce around tables and participate in more than one group.
If those people that happen to sit down at my table at the opening bell want to hear something (and please, let there be somebody!), I'll tell them about all the different parts of Integration Services that you can mould, automate, and customize to your will - and why that can be a very useful thing indeed.
What I'd Like You To Talk About
It would be great to hear from fellow SQL pros (yes - that's you!) about:
  • The times you have "had to" use a Script to get something done.  Why did you do it?  Is it something you think (reasonably) should have been an out-of-the-box capability of SSIS?
  • Your experiences using Scripting in SSIS.  Was it hard or easy?  If it was hard or confusing, what made it that way?
  • How repetitive are your Scripts?  Do you have the same ones in every package?
  • Have you made any custom objects?  Have you used any third party ones?  Why did you chose to do that?  Did it work out?  Do you want to share your custom-built object or relate the usability and value of the third party one with other DB pros?
Resources I'll Talk About
If you drop by and join the discussion, you'll probably want to know more about how to use Scripts and how to build Custom Objects for use in SSIS.  Here are some of the resources I'm sure to mention.  I'll likely be adding more over time as I stumble across them again, or if you ask a specific question at the table!

Sunday, October 25, 2009

PASS Summit 2009 - Next Week!

I'll be in Seattle filling my brain with all things SQL Server next week.  I can hardly contain myself - because for me, this is no ordinary conference!
It will be my first conference participating as an MVP.  This is pretty significant, as I'll be getting to meet with lots of other MVPs - icons in the field as far as I'm concerned.  I'll also get to have really interesting conversations with the Integration Services dev team, and get some gory details as to what they're working on.
It will also be the first time I've volunteered at a conference.  My awareness of this came from being an MVP - but there's nothing stopping non-MVPs from volunteering - so please do!  (You might get a free pass.)  I'll be assisting in the Hands-on Labs on Monday morning and Friday afternoon.  I'll also be at the Birds of a Feather MVP lunch, hosting a table on (of course) using scripts and custom objects in Integration Services.
But as interesting as all that will be, I might just be most interested in what happens by being a square in Quest Software's Twitter Bingo.  It's an opportunity to win prizes from Quest Software and SQLServerPedia, and meet people that you've only ever heard of or talked to online.
And of course - there will be too many sessions to attend.  I still haven't nailed down what I'm going to see - and I usually have that all figured out a day or two after the schedule is released.
Take part in the fun if you're going to be at PASS Summit 2009 - don't sit alone at lunch, and don't sit in your hotel room in the evening.  Say hi to somebody during the day, and ask if they have any plans.  There's thousands of us there - there just aren't enough restaurant tables in Seattle for us all to sit alone!
You can follow what I'm up to on Twitter, as I'm being encouraged to tweet about where I am and what's going on for the Bingo game.  It's a great opportunity to talk to experts in the field that are literally begging you to come talk to them.  I certainly will be taking advantage and trying to black out my card.  Come say hi, and learn my code word (which unfortunately isn't as creative as I hear other "squares" words are) for your chance to win prizes.

Wednesday, October 21, 2009

The Possible Business Value of Twitter

I had this post drafted a while back, waiting for a slot to schedule it.  (Taking Steve Jones' advice from 24 Hours of PASS.)  Well, Microsoft just dropped the bomb I was speculating on, so I had to get this one out there.
I am a cynic. People who know me might say I'm not. People who really know me know they don't have to be polite and will admit that I am. With that frame of reference, I looked upon Twitter as a colossal waste of time. Why would I shout out into the void of the internet that I'm brushing my teeth. Or walking the dog? Twitter = Bad reality TV.  The short answer is that I don't intend to shout that meaningless stuff out and am expecting some other value to become apparent. I already see the value that I expected, and expect more in the future - specifically because of the deal Microsoft just made with Twitter and Facebook.
The Global Coffee House
It's fair to say that more that half of the tweets I follow are junk - probably 75% to 90% of them.  Why do I consider them junk?  It's the nature of Twitter.  I liken Twitter to a coffee house - like Starbucks.  You wander in, order your coffee... and can't help but overhear some conversations going on.  In fact, some people actually go to these places to find conversation in addition to coffee.  (Neither appeals to me - I don't think I'm a good conversationalist, and I don't like the taste of coffee.)
But regardless why you're there - that's what you can do - eavesdrop.  And even more so than in a coffee house, this normally taboo behaviour is somewhat expected.  People want you to join in on their conversations... to a degree.  I still think there are some social lines set to be drawn about what is appropriate to be shared on Twitter and what's not - as more people discover that however much it might feel like a "private conversation" amongst friends, it really isn't.
The Actual Professional Value
So what specific value is there today, for someone like me, in a medium where I acknowledge that up to 90% of the traffic is junk?
Learning Opportunities via Word of Mouth
I suspect this isn't unique to the circle of people I follow; Twitter is a fantastic source of "word of mouth" information.  I tend to follow people in my line of work, and when somebody finds a "gem" out there in the vast wasteland of the internet, they may tweet on it.  Unfortunately, given the 140 character limit, the summary of the link they pass isn't great - but the content on the other end is usually fantastic.  Not always what I'm interested in at the time, but very good for "rounding out" knowledge, much like subscribing to a newsletter, except the editors are people you "know".
On-tap Instant Problem Solvers
This is a marginal increase on the value I already find in the MSDN forums.  For the technologies that I'm interested in, they're the place to go - much more so than StackOverflow or ExpertsExchange.  However, there are times when sending a tweet out will reach a different audience of experts than posting on a forum.  The second benefit is that I (have the ability to) pre-screen the audience I have on Twitter (by locking out those I don't want to follow me), so I know those who will receive my tweet aren't morons.  (I haven't actually done that.)  There's no such filter on the forums - opinions are like you-know-what, everyone has one.  It doesn't happen often, but when someone with a helpful intent, but complete lack of helpful content posts a response on a forum, the experts may just skip right over it, assuming that someone else is "taking care" of the issue.  Doesn't happen in Twitter.
The (Limited) Power of Twitter Search
The converse to the above is also true.  I've set up some Twitter searches in TweetDeck to find me instances where a conversation touches on SSIS.  There are false positives (Sassy Sisters In Sales, Brandeis' Student Sexuality Information Service) - but fortunately they're pretty quiet.  Despite the occasional distraction, I use that search to find out about people in my profession who I'm not following, events, and blog postings.  But most of all, I look for people asking for help.  The idea there isn't to look for more work to not get paid for - but to help promote my own "brand" on the internet.  I have to temper that enthusiasm to assist and get my name known, because it can be kind of creepy for the "other guy" to realize that his scream into the darkness of the internet actually results in some guy he doesn't know turning a flashlight on his face, saying "need a hand with that?"
I also have a search up for discussions that touch on slowly changing dimensions - but that doesn't net nearly as much traffic - specifically because I can't search on "SCD"... because that's apparently a South American TV show or something that has tons of twits talking about it.  (Sorry, "tweeps"...  OK, I'm not sorry.)  I watch that search for times I might be able to jump in and suggest some helpful advice - usually centred around using my custom component in SSIS.
The Forseeable Business Value
For a business, there isn't much there yet.  Sure, you can do the same kind of "conversation trolling" that I've done, using your own search terms.  And the Twitter search does allow geographic narrowing of tweets - so it would be possible to eavesdrop on all the conversations about "need" and "roof" within 100 miles of your roofing business.
What will start the real business value is Microsoft and/or Google being able to license the Twitter feed.  Twitter obviously doesn't have the technical chops or marketing muscle to push a brand-new aspect of internet searching - but those two do.  Where do I see the value in that?  Imagine our coffee shop again - but fill it with a million people all talking about different things.  Some of them are just shouting into the void - but most are having conversations with those in their circles.  Those conversations will invariably reference products - TV shows, books, technology, services, you name it.  I'm not in marketing, but even I can see the value in market research right there.  On top of that - businesses can take "market research" to the next level and become "intrusive".  Just like I've done with my "slowly changing dimension" search feed - businesses can have customer service departments and salespeople being fed with tweets mentioning their product or terms that relate to their product.  That business can then swoop in and save the day for their customer or potential customer - exactly what BBGeeks intentionally did, starting a business from "nothing".
Currently, Twitter search is pretty "dumb".  It's not all that sophisticated.  But throw in Google or Microsoft's search engines into the mix - and the potential for searching tweets for meaningful conversations is increased dramatically.  Add to that that the search is "retroactive".  Pay attention here - because this is an immensely critical point that differentiates Twitter from the coffee house eavesdropping metaphor.  Going back to the coffee house - imagine somebody mentioning that they were unhappy with their Chrysler dealer because they felt they had been hosed on a brake service.  If you're a mechanic who happened to overhear that comment, you could inject yourself into the conversation and try and get some business.  But enter the power of Twitter - and something that will be very difficult for tweeps to get their heads around.  Using Twitter, you - the mechanic - don't have to jump right in to the conversation cold.  You can easily research your potential customer just by reviewing the conversation they just had.  You can even research all their past conversations!  By doing so, you have the ability to avoid spending time/money attracting a customer if they seem to be "unreasonable".  You can inject yourself into the conversation more "naturally" by using terminology and phraseology consistent with the person having the problem, making them more comfortable and receptive to your pitch.
It's Inevitable
Crazy exciting and downright scary all at the same time.  It takes the "be careful what you say on the internet - it will live forever" advice to the next level.
But mark my words... it's coming - from Google, Twitter, Microsoft, Facebook, somebody.  It's just a matter of time.

Saturday, October 17, 2009

Quick Reference: SSIS in 32- and 64-bits

There are quite a few misconceptions flying about out there regarding SQL Server Integration Services and 64 bitness.  I've had to set more than a few people on the right path in the forums - mostly on one particular setting inside the SSIS packages that is getting misinterpreted.  Of course, this information only applies to 64-bit architectures - if you are running a Windows 32-bit OS, you have no choice - your packages will always run in 32-bit mode. 
Why would you want to run in 32-bit mode if you have a 64-bit system?  Drivers, mostly.  I'm not referring to hardware drivers, but data providers.  As an example, Excel and some versions of Oracle don't have 64-bit providers.  So for the ever-increasing base of 64-bit users, here's the skinny on executing Integration Services packages in your choice of 64-bit or 32-bit mode.
Background
I Can Design Fine, Why Won't It Run?
A little background to start.  Business Intelligence Design Studio (BIDS) is a 32-bit application.  When you're designing your package, you're using 32-bit facilities - and have no choice in the matter.  When you execute your package using DTExec, you have the option of 32-bit or 64-bit operation - but the default on a 64-bit installation is to use 64-bit mode (obviously).  However, some commonly used objects in SSIS don't have 64-bit counterparts, and will therefore cause your packages to fail.
Unfortunately, it usually doesn't say anywhere in these messages that the fault lies with 32 vs 64 bits.  It's usually something like:
  • 0x80040154 ... Class Not Registered
  • The AcquireConnection method call to the connection manager XXX failed with error code 0xC0202009
  • 0xC00F9304 ... SSIS Error Code DTS_E_OLEDB_EXCEL_NOT_SUPPORTED
  • The OLE DB provider "Microsoft.Jet.OLEDB.4.0" has not been registered
(I include those sample errors here in the hope that those searching the web may find this article!)
Why Do I Want 32-bit Mode?
The most common reason to want 32-bit mode in an executing SSIS package is the Excel Provider.  It's currently not available for 64 bits, and will cause your package to crash.  (Office 14 (2010) is reported to have 64-bit support - even though it's not supported side-by-side with 32-bit.)  This applies to the other Office providers as well - Access, specifically - and to several other third party drivers and providers (like Oracle).  They simply will not work in a 64-bit environment (pre-2010).  You may also wish to run Execute DTS 2000 Package Tasks - and those can only run in 32-bit mode as well.
It Depends How You're Executing Your Package
There are many ways to execute an SSIS package - and this is the primary determiner of whether you're running it in 64-bit or 32-bit mode.  So pick your execution environment from the list below, and read up on how to force the bitness you desire.
Choosing Bitness Inside Business Intelligence Development Studio (BIDS)
If you're running your package inside BIDS, the setup is simple unless you're using the Execute Package Task or Execute Process Task to run child packages.
The package you currently have open will (by default) run in 64 bit mode.  The setting that controls this is a property on the project called Run64BitRuntime.  To access this property, right-click on the Integration Services project in your solution explorer and select Properties.  Then select the Debugging node in the editor.  The default here is "true", which means all the packages in this project will run in 64-bit mode.  If you change this to "false", all the packages will be run in 32-bit mode.
Special Note: Execute Package Task
Any child packages executed via the Execute Package Task will run in the same mode as the parent, regardless of the Run64BitRuntime setting of the project that the child package belongs to, regardless of the setting of ExecuteOutOfProcess.  This means that even if your child package has Run64BitRuntime set to false in the project you designed it in, it will be executed in 64-bit mode within BIDS if your parent package's Run64BitRuntime property is true.
Special Note: Execute Process Task
The Execute Process Task can allow you to choose 32-bit mode independently of the settings in the parent package, at the expense of running the child package in another process.  As with the SQL Agent methods described later, you can specifically identify the 32-bit DTExec to run SSIS child packages in 32-bit mode (see below).
Choosing Bitness With SQL Agent
Instructing SQL Agent what environment you want your packages to run in is simple in Integration Services 2008.  SSIS 2005 makes you jump through a few more hoops.
Integration Services 2008
In the Agent Job Step Properties, you'll be using the SQL Server Integration Services Package type of step.  If you go to the Execution Options tab, you'll see an option to "Use 32 bit runtime" down at the bottom.
Integration Services 2005
With SQL 2005, you can not use the Integration Services Package type of job step to run an SSIS package in 32-bit mode.  Your recourse is to use the Operating System type of job step, and refer to the 32-bit version of DTExec specifically in the command line that you use, and manually specify arguments to DTExec.
Hurdle #1 - Finding the 32-bit DTExec
Finding the executable shouldn't be difficult.  In a standard 64-bit installation, the 32-bit DTExec.EXE should be located in the "\Program Files (x86)\Microsoft SQL Server\90\DTS\Binn" folder.  It's plainly called "DTExec.EXE", and doesn't identify itself in any way as a 32-bit app - you'll have to "know" it is by it being located in the 32-bit folder.  (Or you could try to execute it and watch Task Manager.)  If you've installed SQL to a non-standard location, you may have to hunt a little.  If you can't find it at all, you may not have installed the 32-bit components on your 64-bit machine.  During the install of SQL Server, if you only selected "Integration Services" and didn't install "Business Intelligence Development Studio" OR "Management Tools - Complete", then you won't have the 32-bit DTExec installed.  You'll have to run SQL Setup, and install one of those options.
Hurdle #2 - Determining the Command Line Arguments
Next, you need to determine the command line parameters you need to operate DTExec from the command line.  You could read through the documentation and attempt to determine the arguments and settings by yourself, but I would recommend you use the power of the included GUI tools.  Both the IS Job Step in SQL Agent, and the DTExecUI tool provide a GUI to configure an SSIS package run.  On the last page of the GUI, it very helpfully places the exact command line arguments needed to run DTExec, based on all of the configuration options you've chosen on the ten or so other tabs of the GUI.  Leverage that!  Set up your package execution using the GUI, then copy the arguments off that last page.
Precompiled Scripts
This is only an issue in Integration Services 2005 - the dev team completely fixed this issue in SSIS 2008.  There is a "Precompile" option on Script Tasks, which is set to "true" by default.  If this has somehow been set to "false", your packages may not execute in a 64-bit environment.
32-bit ODBC Drivers
There is also one other oddity with using 32-bit ODBC drivers in Windows - at least in Server 2003, 2008, Vista, and Windows 7 64-bit OSes.  The first step to using those drivers is to set up a DSN to handle storing the server name and other particulars.  In my experience, the natural first place to start is by opening the "Data Sources" applet in the Control Panel.  That's mistake #1 - because that applet only manages 64-bit drivers.  You won't see yours listed at all.  The next step is to poke around and realize that there's a "Data Sources (32-bit)" applet there in the Control Panel as well.  That's mistake #2 - but not your mistake.  In my experience, this icon leads to some odd hybrid 32/64 bit management utility.  It definitely did NOT manage my 32-bit sources.  If you fire it up, then look at the processes tab in Task Manager, you'll see a process labeled "odbcad32.exe"... but you'll notice that it doesn't have the "*32" after it denoting that it's a 32-bit app.  Whatever it's attempting to manage, it's not the 32-bit ODBC drivers.  What you need to do is navigate to another odbcad32.exe that's sitting in your SYSWOW64 folder.  That ODBC data source administrator truly does manage 32-bit drivers, in a 32-bit process.
References/Resources
Most (if not all) of this information is also distilled in an MSDN article: 64-bit Considerations for Integration Services.  If you find other useful articles, pointers, or mistakes in the above, please post a comment.
Some other very useful articles:
How To: Run a Package, MSDN
64-bit references within an SSIS Script Component by Michael Entin, Microsoft SSIS Dev.
64-bit Considerations for SQL Server Integration Services by Douglas Laudenschlager, Microsoft SSIS Dev.
Where is my app.config for SSIS? by Darren Green, SQL Server MVP.
Oracle Driver Configuration in a 64-bit environment by Rob Kerr
Importing data from 64-bit Excel in SSIS by Hrvoje Piasevoli
32- and 64-Bit Connectivity from the Same Machine by SQLCAT

Wednesday, October 7, 2009

More on Dealing with NULLs in Integration Services

In an earlier post, I talked about how having NULLs in your Data Warehouse is probably not the best thing - in dimension tables or in fact tables.  That post went over how to detect and cleanse the NULL values out of your source data before inserting into your Data Warehouse.
This post will discuss some of the issues that NULL values cause inside SQL Server Integration Services when you're not proactively handling them.  The typical indication of a rampant NULL is the dreaded DTS_E_INDUCEDTRANSFORMFAILUREONERROR message.  It's human nature to assume that the information you're retrieving from a text file, an Excel spreadsheet, or even a database is actually information - and not NULL.  It's the kind of thing that will end up biting you in the nether regions sooner or later, after having caused you to run around trying to find out where this cryptic error from SSIS is coming from.
Unnatural NULL Behaviour
The root of the problems with NULL - on top of usually being "unexpected" when you should be expecting it - is that it behaves completely unlike other values.  Let me give you a quick rundown of facts that you'll have to force into your brain (eventually) and accept as truth:
  • NULL is not equal to zero or blank
  • NULL is not equal to NULL
  • NULL is not "not equal" to NULL
  • Expressions with NULL in them typically evaluate to NULL, and eventually cause errors
Ah, I see your brain hurts already.
Why Is NULL Not Equal To Zero or Blank?
NULL is an "unknown" value.  Because it specifically represents "unknown", you can't compare it to a known value such as zero or a blank string.  Since we don't know what value NULL represents, we can't be sure if it is or isn't equivalent to zero or a blank string.  The result of evaluating an expression comparing NULL to a real value is... NULL.  (Read later for why.)
So an expression of:
[Column1] == [Column2]
Can cause you real problems in a Conditional Split component when either of Column1 or Column2 happens to be NULL for a particular row.
(I've described comparing NULL to zero or a blank string here - two different data types - because that's a common issue. In SSIS, NULL is actually typed. A NULL-valued integer column is different from a NULL-valued string column.  This is kindof irrelevant, because you typically don't compare a NULL-valued integer to a NULL-valued string.)
Why Is NULL Not Equal To NULL?
Can you hear the Twilight Zone music?  This is where it really starts to get weird.  If you accept the fact above that you can't detect NULL values by comparing them to zero or an empty string... your next logical option could be to try comparing it to NULL itself.  In SSIS, you could try an expression like this:
[Column1] == NULL(DT_I4) ? "Column1 is NULL!" : "Column1 is not NULL"
Except that won't work.  Why not?  Because NULL != NULL... which seems to make no sense!  0 == 0, right?  1 == 1.  Why does NULL != NULL?  If you remember, NULL represents "unknown."  Taken that way, the value could be anything... and since it could be anything, we can't say that one "value of something we don't know" is equal to another "value of something we don't know" because those two "somethings" could be different.  We just don't know.  And that leads us to...
Why Is NULL Not "Not Equal" To NULL?
Wait a minute.  I just told you that NULL == NULL was incorrect.  Isn't there some mathematical theorem that says that must mean NULL is not equal to NULL then?  You could be experiencing a full-on aneurysm... or we could be talking about NULLs.  To put you on solid ground - yes, there is such a theorem.  But no, it doesn't apply here - because I didn't tell you that NULL == NULL was false... I just told you they weren't equal.  The same twisted logic applies to trying to evaluate NULL != NULL.  It's not true, any more than NULL == NULL was.  With the same logic, we just can't know that two "unknowns" aren't equivalent.  And that leads us to...
Expressions With NULL In Them Typically Evaluate To NULL, And Eventually Cause Errors
All of the following statements will evaluate to NULL in Integration Services if Column1 is NULL-valued:
[Column1] == [Column2]
[Column1] != [Column2]
[Column1] + [Column2]
SUBSTRING([Column1], 1, 1)
FINDSTRING([Column1], "x", 1)
DATEDIFF("day", [Column1], GETDATE())
...etc...
The nasty problem here is that you probably have complex expressions like this:
SUBSTRING([Column2], FINDSTRING([Column1], "abc", 1), 10 - [Column3])
Which will pass a syntax check at design time just fine, but will fail miserably if Column3 is NULL in your data.  The tricky thing is (in the above example) if Column1 and/or Column2 is NULL also... SSIS won't throw an error - it will return NULL for the expression.  (It's able to do that because the NULL math in the SUBSTRING argument doesn't need to get executed, because the result of taking any SUBSTRING of NULL is... NULL!)
Dealing With NULL - "Properly"
The net effect of this is that if you have a NULL value somewhere in an expression, that NULL can get "bubbled up" from one nested function/operator into another, into another.  It can be difficult to track down where the problem is coming from.
Prevention
An ounce is worth a pound of cure, right?  So take your medicine and check for NULL values as soon as you can in the data flow.  If you don't expect certain columns to contain NULL at runtime, then use a Conditional Split to route those rows to your logging/audit mechanism to alert you of the bad data.  You do have a logging mechanism that you monitor, right?  You can (and probably should) be overzealous here - unless you know for sure that you may get NULL values in a certain column and you've checked that your data flow works as you expect with NULL values in it - detect rows with NULLs in that column.  If you find out later that a row comes in with that column NULL, and in your data quality investigation ("did you want this data?") they tell you they need it, and how to interpret it, you can then modify your flow.  Of course, in the perfect world, you would have the answers to all this up-front.  But then again, those responsible for data sources have been known to be mistaken...
The same caution should be taken after Merge Join components - if a left join is used, it's possible for rows to be emitted that have NULL values in them.  It costs very, very little to ensure you're data's clean by using a Conditional Split component to detect NULLs.  You only have to check one column on the output that came from the "right" input of the Merge Join to make sure.
Diagnosis
Well - you're likely to have a report from SSIS about exactly which component failed due to the NULL value sneaking through your flow.  Again, the problem here could be that the DTS_E_INDUCEDTRANSFORMFAILUREONERROR reported by a specific component could just be the "last straw" in the chain.  Since there are lots of cases where having a NULL value can slip through expressions undetected, showing up later in the data flow than the problem crept in.
The best way to debug your flow is with Data Viewers.  If it's not obvious to you what column the offending NULL appeared in, this feature will help you.  Run the flow, and the Data Viewer will "pause" the data flow for each buffer (set of rows) that passes through it.  You can click on the column headers to sort the buffer by the column - which will nicely bring the NULLs to the top of the list.  (Even though NULL means we don't know what the value is... it's "less" than any other value with respect to this sort.)
Once you've detected which column is at fault, you can start to find out where the NULL was introduced.  If just knowing the column that has the NULL doesn't inform you of where it came from, then place Data Viewers higher and higher upstream until you find the component that spawned them.
Treatment
OK - so you've found the problem - now what?  It depends, of course.  The NULL may be a result of an incorrect query, a misunderstood Merge Join - you should be able to spot the problem if that's it, and fix it by fixing the query, or executing the Merge Join properly.
But if it's simply a case of the NULL being a part of source data, or an inevitable result of a Merge Join that you can't avoid or fix by altering the Join, you'll need to deal with that NULL and change it. The way you do that is with a Derived Column component.  The two tools you'll use in that component are the ISNULL function, and the Conditional Operator (?).  You can replace the contents of the column with an expression like this:
ISNULL([Column1]) ? <null replacement> : [Column1]
Where you'd place a literal value you wanted to replace your NULL value with.  Here are some common examples.
To replace a NULL-valued string column with a blank (empty) string:
ISNULL([Column1]) ? "" : [Column1]
To replace a NULL-valued string column with a specific string value, in this example "n/a":
ISNULL([Column1]) ? "n/a" : [Column1]
To replace a NULL-valued integer column with a zero:
ISNULL([Column1]) ? 0 : [Column1]
To replace a NULL-valued date with a specific date, in this example January 1, 1900:
ISNULL([Column1]) ? (DT_DBTIMESTAMP)"1900-01-01" : [Column1]
Good luck tracking down and zapping those NULLs!

Thursday, October 1, 2009

SSIS Custom Object: Send HTML Mail Task

My ETL process sends mail when it's done - yours probably does too.  I like to send a little additional information along in that email - not just "I'm done" or "I failed".  It's informative to send information like how long it took, how many records of particularly sensitive tables were processed, what the errors were - that kind of thing.  I find that it's read best in a table format - particularly with a little subtle color thrown in to highlight issues.
The Send Mail Task Works...
Sure it does... but unfortunately I'm not big into making tables with those funky characters like we all used to do in the 80s.  Especially not when we have HTML today.  And despite the incredible engine that the SSIS Team provided for us - they weren't quite able to stock the pantry with everything you need for day to day cooking, if you know what I mean.
The Send HTML Mail Task
Almost the same as the Send Mail Task - but so much prettier, if I do say so myself.  All the options of the Send Mail Task, plus you're able to specify a "reply to" address, and set the Task to send HTML mail.  (It will also send plain-text mail as well.)
The Task uses code largely identical to what you'll find in Script Tasks that have been helpfully posted around the internet - this one being an example.
Head on over to CodePlex to download the installer and/or source code!