It's hard being a software giant. There's so many people to try to make happy with your product, and with a bank account that dwarfs a large percentage of the world's governments... you may actually think you should deliver on trying to make those users happy - or at least not make them unhappy.
One of the things that tends to happen in those cases is that backwards compatibility is given a large amount of sway in product development. Because removing functionality from a product - regardless of how poorly implemented or misused it is - will probably adversely affect a large number of customers. Unfortunately, that very thing is what most products need from time to time. A little house cleaning - and maybe because of the springtime mood, the SQL Server blogger community has taken on a "five things SQL Server should drop" meme, started by Paul Randal (blog|twitter). Jamie Thomson (blog|twitter) tweaked the meme towards our mutual niche, and I'll follow his lead with my list of five things that Integration Services should do away with, in no particular order.
1. The Properties Window
This area contains some extremely important settings for some of the constructs in Integration Services. Unfortunately, it's in a completely unrelated UI element - a pane or floating window - that changes contents based on what object is currently selected. That might not be horrible in itself - except that it's also cluttered with a multitude of properties that you shouldn't touch, and lots of properties you can't edit. Please - take the relevant, important properties and put them in a comprehensive UI with the object you're editing. Whether that's in the variables pane, where you should be able to edit the variable's expression, or the Task UIs where you should be able to specify the transaction behaviour. (The list is endless.)
2. The Data Profiling Task
Yes, I do think this task is useful. No, I don't think the idea is bad, or that the tool itself should go away. But does it really belong in a product who's purpose is automated data extraction, transformation, and loading? The key word from the above is "automated". The Data Profiling Task is never part of an automated solution (without massive hackery that's better left to a Script) - it should always be part of the analysis done before package design is ever started. That functionality should be inside SSMS or some other data quality/exploration toolset.
3. Shrink Database Task
This one is a cop-out on my part. Such an easy target based solely on the arguments from multiple experts - don't automate shrink operations. (Just see the rest of the "5 things to drop from SQL Server" posts for proof.)
4. Lineage IDs
As a developer and dimensional modeler, I can completely understand why a "surrogate key" lineage ID was used instead of the "business key" of the column name. But the trouble in transferring that excellent argument from those worlds to SSIS package design is that this key isn't hidden from the user. In dimensional modeling and development (code or DB), it's a good thing to use surrogate identifiers to refer to objects in case the (dumb) user decides to change how they refer to the object. It makes internals nice and clean. Unfortunately for SSIS, the lineage ID isn't buried as far as it should have been - and at this point I don't actually think you can bury it far enough unless you pick up a shovel. I might be wrong on the internals (I don't get to see the source) but the utility expected from abstracting a column's identification just hasn't been there.
5. The "Advanced" Editor
Fairly similar to my dislike of the Properties window, the "Advanced" editor (a most inappropriate name) is used fairly frequently to make critical changes to a Data Flow component. Changes like informing SSIS how the data is ordered when delivered by a source component. Again, there is more than enough capacity in the "basic" (?) UI to handle that kind of configuration. And again, the "advanced" editor is cluttered up with tons of properties you shouldn't touch, and a lot you can't (because they're read-only). In my opinion, it's a crutch for not taking the time to develop a completely usable UI.
Any other SSIS users out there have a part of Integration Services they'd like to march to the gallows?
The .dtsx file format. It is effectively unreadable and un-diffable, rendering code-reviews almost pointless and version control a joke. If the UI wrote C# code (and parsed C# code), it'd be great. Otherwise, it's so handy you can't ignore it, but the results are so unmaintainable you can't endure them.
ReplyDeleteGotta disagree with part of #2. I have a number of issues with the Data Profiling task, and I'd love to see it implemented differently, but not because I don't think data profiling should be part of an automated solution. Data profiling on a regular basis in your ETL can be an effective means of monitoring the data in your warehouse, changing business trends, and, in some advanced scenarios, allows the ETL to "learn" as it goes. In addition, in some scenarios, you might be dealing with lots of data from different sources, that are not entirely under your control. Doing an automated data profile on that data can prevent you from loading complete junk into your database.
ReplyDeleteI'd really like to see data profiling implemented as an inline component for the data flow, that passes the information through while gathering statistics on it. That woud open up some much more interesting possibilities.
Andrew - I feel your pain. But I would have to say it's not the DTSX (XML within a file) that's the problem, it's the way they've written the XML contents. I'd be fine with a DTSX that was structured better - and I know nothing about XML!
ReplyDeleteJohn - I completely agree with your sentiment there. I would say the same applies to most of my list - I don't necessarily want the feature's benefits to go away, it's just how they're implemented. I'd also like to see a component gather and record statistics, comparing the current run against the last "n" runs for suspicious contents based on definable thresholds. Now there's an idea...
If you don't like the .DSTX format for XML (and who came blame you), you might want to take a look at some of what we're doing at Varigence. Hadron is an XML language for defining BI solutions, but we're trying to keep it human-readable. And we generate SSIS packages from it. :)
ReplyDeleteSSIS Data Profiling , we have made some progress in SSIS data Profiling see Video creating a Metadata Mart via #SSIS DataQuality , plus Data Profiling Analytics via #Excel http://youtu.be/3FQNO8AQCAY Melissa Data
ReplyDeletewww.melissadata.com/.../data-quality-components-for-ssis.htm