Sunday, July 26, 2009

Presenting at New Jersey's PASS Chapter in August

I'm off to see family in New Jersey - my brother is a professor at Montclair State - so I thought I'd see if there were any SQL usergroups in the area.  You know - have a real vacation by getting in some good work-related content.  (My wife still loves me.  I think.)  I figured there had to be - heck there are two in my "hometown" on the left coast.  The first place I looked was on the PASS website, and I found the New Jersey SQL Server User Group, run by Melissa Demsak - the SQLDiva.
More than a few emails went back and forth, and I talked my way into presenting at the August meeting.  I had a great experience at DevTeach that sparked some inspiration, so it was relatively easy to brainstorm a good list of topics.  Melissa ran that by some of her members, and they decided on a combination of a few ideas around typical Data Warehouse processing requirements - avoiding bad performance in IS, processing SCDs, and loading fact tables.
If you're in the area, and are interested in a little pizza, Data Warehousing, and Integration Services - please mark your calendar!
What
Loading a Data Warehouse using SSIS should be pretty easy – all the tools you’ll need are in the box. However, there are a few things you should avoid and non-obvious best practices to use to make that ETL faster and more reliable. Assuming attendees have a basic level of familiarity with Kimball Data Warehousing methodology, this talk will explain how to accomplish Kimball style ETL using SQL Server Integration Services. We'll review the SCD Wizard and how it works for processing your dimension tables - what it's good for, what it's not, and what alternatives you have to it. We'll also examine fact table loading using a surrogate key replacement pipeline, especially using the Lookup component efficiently. Finally, we’ll look at specific and easy performance tweaks, techniques and best practices. Depending on time constraints, that can include reviewing which Data Flow components to avoid and how to still get your work done without them, tuning inserts and updates, or issues from the audience.
That's a lot to cover in one presentation, so there won't be a great amount of depth in any one area - there are simply too many tips and best practices!  However, it ought to give everyone a very good understanding of how SSIS implements Kimball style Data Warehouse loading.  A good portion of the performance portion of the presentation will be open to the floor - I have plenty to talk about, but specific questions are welcome.
When
The monthly NJSQL meeting will be on Thursday, August 13th.  Festivities start at 6pm and the presentation should start at around 6:30.
Where
The NJSQL user group meets in space graciously provided by SetFocus in Parsippany at 4 Century Drive.  I'd love to see you there!  (I'll even have some door prizes to add to the pot, if I can find some space to pack them.)

Friday, July 24, 2009

Vancouver PASS July: Presentation Slides and Demo Material

A huge thank you to Richard Baumet, Scott Stauffer, and the Vancouver PASS chapter for a great reception at the July meeting.  A big thanks to Sierra Systems for hosting, and Idera for sponsoring. 
I started off with the planned presentation - but it was apparent that the subject was a little too deep for the interest level of the audience, so we delved into other aspects of SSIS.  I did a little explanation of the SSIS architecture - Control Flow versus Data Flow - and what "things" were accomplished with each.  I explained (briefly) how the SSIS Data Flow was an in-memory "assembly line" as opposed to T-SQL "set" operations.  That topic led to questions about what kind of "real world" scenarios SSIS was intended for, and which scenarios should prompt you to look for some other tool.  That was followed by an explanation of why some operations (like sorting) were best left to other tools when possible - as those operations cause that "assembly line" to become inefficent and bottlenecked.  We covered a lot of ground - I hope it was an informational session for you!
A special shout-out to Sanjeev for "bringing your own code" to convert from a Script to a Custom SSIS object - I don't think you made it to the meeting, but send a message to Scott, Richard, or myself - I've got a DLL ready and waiting with your name on it...
Scott should have the meeting recording online soon at the Vancouver PASS website.  The (planned) slide deck about converting Tasks, and the conversion templates are all on my SkyDrive - feel free to download them.  For those of you interested in the "intro" part of the presentation, click though to my wrap-up post from April - the slides and a demo are there.
As I said at the meeting - I'll be trying out Idera's toolset for SharePoint to backup, move, and restore some SharePoint sites within the next week.  I'll try to post a comment here with info on how that went - but please bug me if you don't see that!
If you have any questions or comments about the process of converting Scripts, refer to my three part series for details and samples, and please don't hesitate to ask in the comments below, or on the MSDN SSIS forums.
Thanks for coming by for some Pizza and Integration Services - I'm sure I'll see you soon at a future Vancouver PASS meeting - hopefully listening to Richard talk about Filestreams and PHN BBQs...

Saturday, July 18, 2009

A Pause Task for SSIS

This is a relatively simple thing to accomplish in a Script, or even using an Execute SQL Task.  In fact, Jamie Thomson, with comments from Darren Gosbell, Marco Russo (I believe) and many others describes all three methods in a years old blog post - SSIS: Put a package to sleep.
Those are just fine, but they all have slight issues.  Jamie's first suggestion spikes CPU during your wait.  Using an Execute SQL Task to WAITFOR DELAY assumes you have a SQL Server available, you want to manage a connect to it, and you know they syntax of WAITFOR.  (I'm always having to look things like that up.)  Using a script assumes you know how to use .Net's Sleep and/or TimeSpan structure(one of the reasons that Jamie avoided the Script).
Another (Simpler?) Alternative
They really are minor issues, and I'm definitely not suggesting that Integration Services should add such a capability out of the box... but it does come in handy.  I typically use the WAITFOR method... no real reason, really - although I suppose I consider it a little more "transparent" than the Script method.  However, I recently had a desire not to use a SQL connection.  Note that it wasn't a "need", it was a desire - I didn't want the management overhead for a connection I wouldn't otherwise use in the package.  That, and I wanted to test the empty framework I had for converting Script Tasks to Custom Tasks.  (Yes, by doing so I did find and fix a bug in that!)

So here's a "Pause" task that allows you to specify a duration in milliseconds, or a time of day to pause until.  The binaries (2005 and 2008) are on my SkyDrive.  When I get some time, I'll post the project on CodePlex[1] so you can see how "minimal" the source really is.  Of course, all I've provided in that download is the DLL - so you'll have to install it yourself:
  • Copy it to the Program Files\Microsoft SQL Server\100\DTS\Tasks folder (or "90" for 2005)
  • Copy it (again) to the equivalent x86 Program Files folder if you're on a 64-bit system
  • Register it in the GAC with GACUtil
It has it's own simple editor to allow you to select if you want a duration to wait, or wait until a specific time of day.  Those properties are exposed in the properties window as well (as all Tasks do) - and they're usable with property expressions as well.  That way, you can provide a string value for a time of day inside an expression, perhaps retrieved into an SSIS variable - pretty much the same way you'd do it with a T-SQL WAITFOR.

For those of you who are coming to the PASS meeting this Thursday, this is one of the Tasks we could look at - just ask.  Although I'd rather convert your Scripts...

[1] UPDATE 2009-07-29: The Pause Task is now on CodePlex - installer included!

Wednesday, July 15, 2009

General Quick Reference: Batch File Tips

This will be kind of out-of-place here... but every once in a while I need to use batch files to get something done.  These are some of the more complicated things I tend to use, and as such, I'm constantly forgetting them and having to look for old batch files that do the same thing.  That's hard.  This is easier.  As always, I'll update this as I find other things that I need to find faster :) (Am I a dinosaur?  Has Microsoft replaced this venerable technology with something else baked in to Windows? Let me know...)
Resources
These are sites that I usually run into when searching the web for help for this kind of stuff.
Computer Hope (Batch Files)
Rob van der Woude's Site
Substring Equivalent
To get partial contents of a variable at a known position and length, use:
%[variable]:~[start],[length]%
Where [start] is a zero-based index into the [variable], and [length] is a non-zero length.
Examples
To reformat the contents of the DATE environment variable (which on my system is DD/MM/YYYY) to YYYY-MM-DD format:
Echo %DATE:~6,4%-%DATE:~3,2%-%DATE:~0,2%
or
SET REFORMATTED_DATE=%DATE:~6,4%-%DATE:~3,2%-%DATE:~0,2%
Replace Equivalent
To find and replace a specific character in a variable, use:
%[variable]:[string to find]=[replacement string]%
Where [replacement string] can be empty.
Examples
To strip quotations off of a variable (I just found this one - it was causing me grief for long path names passed in as arguments to the batch file):
SET LONG_PATH_NAME="C:\Program Files\Microsoft SQL Server\"
Echo %LONG_PATH_NAME:"=%

Using FOR Loops
Complete reference at Computer Hope.
FOR %%C IN ("filespec here") DO command here
Common replacement terms for the resulting file argument:
%%~nC uses just the filename itself, no pathing
Waiting A Specific Length of Time
Complete reference at Rob van der Woude's blog.
PING 127.0.0.1 -n
Example that waits five seconds:
PING 127.0.0.1 -n 6

Tuesday, July 14, 2009

Life in the Jungle

It's hot, it's muddy, there's all kinds of buzzing noises and itches that need scratching, and I'm sweating in places I didn't know I had.  No - supporting a "Dashboard" in MS Access is not the kind of vacation they advertised.  Yes, this is an update on my Guerilla BI strategy.
I've barely gotten any time to work on my Data Warehouse in the last six months - maybe 5% of my time.  But there has been some progress.  The last time I'd mentioned by Guerilla BI tactics, I had my nightly ETL process 95% completed - with respect to extracting the scope of data that I'd intended for a "first step".  As with all warehouses and BI solutions, and any software solution for that matter, the scope grows organically as you complete phases in your project.  Since then, I've tweaked the ETL a little, and fleshed out the SSAS cube.  Some of my calculated measures were very tricky to implement - and I'm not entirely sure why.  Now that I think of it (hindsight) I think I'll post a comment over at the SSAS forums for the experts in AS/MDX to look over.  They may have some suggestions to improve my implementation, or some questions about my sanity for constructing my members the way I did.  Regardless, I have my cube set with a good set of dimensions, "regular" measures (like COGS, margin, units sold), and calculated measures (like Turns and GMROI).  In order to get the Turns and GMROI, I had to add an inventory snapshot fact table to my warehouse and use some conformed dimensions.  I still have to work on the logging aspect of my ETL, but I haven't gotten around to that yet because I don't yet have a server provisioned for me.  Yes, I'm running my warehouse entirely on my own dev machine.  But then, I'm happy with that at this point, because there's no significant pressure from the potential user base yet - and it's a work in progress.
Now that we're talking about the user base... There are about ten people in the company who are aware of my "subversive" activity... and yes, one of them is my boss.  He's been encouraging me to spend that 5% of my time, knowing full well that it will (eventually) return the effort in spades.  Every time he gets a complex reporting request from the other executives, he tries to pass it my way to fill.  In part because he should, he's really too busy to be extracting data manually and constructing reports (although he's eminently qualified to do so) - and in part to "test" my cube.  Those tests have been very beneficial - they've required some tweaking and extending of the architecture, all well in advance of it really being viewed by "users".  On the last such inquiry for information, I was able to provide them information in minutes that they simply could not get from anywhere else in less than 8 person-hours.  We let the two execs requesting that information know where it was sourced from.
So I'm partially out in the open - the foliage in the jungle is thinning.  Whether that will mean true BI will emerge into the flowered meadow to take over from the bloated and diseased Access system, or just be exposed to some cost-cutting carpet-bombing remains to be seen.
On the technical side, I still need to get my server provisioned (it's in the pipeline), set up my deployment architecture (I have a rough plan), and add budget information to my warehouse.  On the "people" side, I need to get one or two mid-high level managers involved to understand and critique the Excel pivot-table experience of using the cube - and that will hopefully start the viral "win their hearts and minds" war.

Thursday, July 9, 2009

Presenting at Vancouver's PASS Chapter in July

A few weeks back Richard Baumet, the leader of my local PASS chapter, asked if I was interested in presenting something on SSIS in October.  I was just putting together some thoughts on topics when he realized his scheduled presentation to the chapter in July conflicted with another engagement.  He put out the emergency call, and since that night happened to be free for me, I offered to present.  Fortunately, Richard allowed me to do my presentation on SSIS - not FILESTREAM as he had been planning.  A win-win in my books - as I know nothing of it!  Hopefully the chapter thinks so as well, and I'm not left talking to crickets.
The Topic
I'll be doing a variation on a topic I presented at DevTeach/SQLTeach in June, and that I just happened to blog about recently.  It was well-received at the conference, but I still think I tried to cover too much - talking about custom tasks and data flow components.  (That does seem to be a problem for me!)  Since I already have those resources under my belt, I thought I'd stick my neck out a little bit further on this one..
Are you developing and supporting Integration Services packages? Five bucks says you have almost identical Script Tasks littered among your packages – they’re the “quick and easy” solution to a lot of common file handling, auditing, and other needs. But just like most “quick and easy” solutions, maintenance can quickly get out of hand – there’s no nice way to version Scripts, know which Scripts are “up to date”, or even know where a particular Script is being used so that you can update it. Every time you create a new package, you find yourself copying and pasting that same Script – and then modifying code and variable access to it. There is an alternative to that mess – Custom Tasks – and it’s really not as difficult as you might think.
Todd will lead you through the process of creating a Custom Task - turning your copy-paste-then-edit-script drudgery into drag-and-drop bliss. It’s no harder than the copy-paste-edit cycle you already go through – except you only have to do it once. In addition to finding out exactly how to do it yourself, Todd is inviting attendees (actual, virtual, or otherwise) to send in their favourite Script Task to be converted during the presentation! You could walk in to work tomorrow with a drag and drop solution for your Script maintenance nightmare...
What a deal!  You come for pizza, learn some stuff, AND go away with something you can take to the (time)bank!  I'll be trying to keep the opening slides to under ten minutes - hopefully that leaves enough time to get two or more custom Script Tasks built for attendees.  They'll all be available after the presentation - definitely on my SkyDrive, and maybe even on CodePlex in the handy SSIS Custom Tasks and Components directory.
When and Where
If you're in Vancouver on July 23rd, RSVP to Richard (rbaumet "@t" shaw.ca), and come to Sierra Systems' offices at 5:30.  The raucous party is on the 25th floor of 1177 West Hastings.  Pizza and drinks are being graciously provided by Idera and Sierra Systems.  (That reminds me that I have to try Idera's SharePoint toolset - SharePoint's built in backup/restore bites.)
Special Note!
I've just received the first snippet of code from a PASS member through Richard... and it's from a Script Component (a transformation), not a Script Task.  This presentation is about converting Script Tasks (on the Control Flow) to Custom Tasks.  Converting Script Components that work with columns in the Data Flow is a little more involved - I think too much for a short (if you think 1.5 hrs is short) presentation to do justice to.  So please - send Richard code from your Script Tasks...
Not to fear Sanjeev - I'll make some time before or after the "show" to step through converting your Script Component to a Custom Component.  You'll have a drag and drop component ready to go Friday morning...

Saturday, July 4, 2009

Using DtsMappingSurfaceControl in your Custom SSIS Component

As one of the "wishes" I'd had when creating my custom Kimball Method SCD component for SSIS, I'd wanted to give myself the same kind of column mapping interface that Microsoft had given other components - like the Merge Join or the OLE DB Destination components.  It's a relatively elegant interface (even if somewhat prone to drag and drop misses) providing a decent view of the columns involved, and an easy way to link them up.
I had posted a question in the forums about how to go about doing that, and hadn't gotten much response until KimYda did a little more digging than I had, and came up with a gem.  The control was "hidden" away in a Microsoft assembly they used internally.  Kim didn't provide any other details, so I had to plug away at it a bit until I felt satisfied that I could reasonably use the control.  Well, I can only hope I'm doing some extra legwork because I don't have access to internal Microsoft documentation and/or object model - because if not, I truly pity the developers on the SSIS team that have to wire up this control manually.  It's way too involved, and I wrote a wrapper even though I only use it in two places!  (You'll find the link below.)
The Short Version
You'll need to add six assembly references, add the mapping control to the toolbox, add the control to your design surface, and add a tooltip control to your UI.  In the code, you'll have to add drag and drop events and handling code, and should add handling for tooltips for the all important metadata.  Last but not least is translating "mappings" to and from the control so you can persist them in your component.
If you're looking for a shortcut, you can download two C# source files from my SkyDrive.  I've extracted the wrapper from my Kimball Method SCD project, dumping some XmlSerialization code out.  The SkyDrive version ought to work, but the "latest" bugfixed/new featured version will always be in the Kimball Method SCD project source - it's in there in the "SSIS Custom Component Utilities" project, in "DtsMappingSurfaceControlWrapper.cs" and "ColumnInformation.cs" source files.  (At least, it will be once I release v1.5...)  It's quite involved (read on to see why), and you'll have to remove some code if you don't want to use my property persistence framework.
The Details
Reference The Assemblies
KimYda noted that the "Microsoft.DataWarehouse" assembly contained the controls I was looking for.  Over the course of actually making this thing work, I found five other assemblies which are essential.  The six assemblies you'll need to add as references are:
  • Microsoft.DataTransformationServices.Controls* (in the GAC)
  • Microsoft.DataTransformationServices.Design (in \Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\)
  • Microsoft.DataWarehouse (in \Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\)
  • Microsoft.DataWarehouse.Interfaces (in \Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\)
  • Microsoft.SqlServer.DlgGrid* (in the GAC)
  • Microsoft.SqlServer.GridControl  (in \Program Files (x86)\Microsoft SQL Server\90\Tools\Publishing\1.2\)
Four of those (non-starred) you should be able to find via the standard "add references" method using the .Net or Browse tabs.  Two of them you'll have to go to extra lengths with.  Keep in mind that I'm running Windows 7 x64 - that's why there's references in the paths above to Program Files (x86).  If you're running a 32-bit Windows, just refer to Program Files, or use your favourite file search facility.  If you can't find those DLLs where I did, first look in the GAC, then do a search.  You may have to do the following workaround with some of the assemblies (I did on a SQL 2005 x86 setup).
Microsoft.DataTransformationServices.Controls is in the GAC... but you can't see it (at least I couldn't) using Windows Explorer or Windows 7's integrated file search.  You'll have to fire up an (Administrator-privileged) command line, manually navigate to C:\Windows\Assembly, and down into GAC_MSIL.  There, you'll see a folder with the name you're looking for - now you just have to (painfully) drill down in there until you uncover the DLL.  Now you'll have to copy the DLL out of that folder into another location and reference it from there - because even though you've located it, the Add Reference browse won't see subfolders in the Assembly folder like the command line will.  I copied it right out to my source folder so I wouldn't have to look very far in the future.
The same song and dance occurs with Microsoft.SqlServer.DlgGrid - it's in the GAC, but inaccessible to the Add Reference wizard.  The oddity here is that it's in the "90" subfolder, even though I don't have SQL 2005 installed.  Well, I'm sure there's a reason - but I have no clue why it's there.
Add the DtsMappingSurfaceControl To Toolbox
Open up your VStudio toolbox, right-click and "Choose Items..."  Scroll around and find DtsMappingSurfaceControl and check the box to the left.  You'll get an inspired looking gear icon in your toolbox.  (They never intended these tools for public consumption, so why waste time on icons?)
Add a DtsMappingSurfaceControl To Your UI
Drag that new control onto your design surface.  It doesn't look like much - but you should see the a scrollable pane with a "subwindow" in it titled "Available Input Columns".  Excellent - we're almost 10% of the way there... ;)
Add a Tooltip Control
I recommend you add a tooltip control to your form to supply the metadata info (column data type, length) to the user when they float the mouse over them.  It's optional, but remember how irritating the lack of that feature is on the Derived Column component!
Initialize the DtsMappingSurfaceControl
Time to get into code!  From here on, there will be a lot of use of the words "first" and "second" when referring to the column lists and parts of the surface mapping control.  The "first" table/control/grid refers to the one that shows up to the left of the surface mapping control, and the "second" table/control/grid refers to the one that shows up on the right hand side of the control.  Well, initially anyway - the user can move them around wherever they like - that's probably why the SSIS team refers to them as "first" and "second". 
When I'm exposing what code to use, I'll be mentioning "FirstMovableGridControl" and "SecondMovableGridControl" a lot.  To keep things short, when possible, I'll refer to "XMovableGridControl" when what I'm talking to applies to either of those properties/objects.
Step one, set up some properties on the control.  For my purposes, I wanted to show two tables to link columns between them, so I set:
  • TwoTables property to true
  • XMovableGridControl.GridLayout to Check_Name
  • XMovableGridControl.Text to the first and second table names, respectively.
You'll have to supply the control with the column names for the first and second tables by constructing an array of ColumnInfo structures, and passing those arrays to InitializeFirstTable and InitializeSecondTable.  It's unfortunate that the ColumnInfo structure isn't more comprehensive and contain data type information - if it did, then the control could handle tooltip events itself.
Add Drag/Drop Events and Handlers
Next, you'll have to set up the drag and drop event ecosystem.  This was the difficult part (for me) as I'd never used drag and drop extensively, AND the sub-controls involved in this composite control were not consistent.  I'll try to highlight the inconsistency as we go. 
In broad strokes, you'll need to enable dragging and dropping, add a MouseDown, MouseUp, and MouseMove for each "side" to handle the "start" of the drag, and add a DragOver and DragDrop for each side to handle the "drop" part of the drag.  Almost all of those event handlers get placed on the GridControl of the XMovableGridControl, like this:
dtsMappingSurfaceControl.FirstMovableGridControl.GridControl.MouseDown += new MouseEventHandler(YourEventHandlerHere)
I say "almost all" because the DragOver and DragDrop event handlers for the FirstMovableGridControl are tied directly to that control, not to the GridControl property of that control.  To restate, add event handlers for all the above mentioned events to the XMovableGridControl.GridControl objects except for FirstMovableGridControl's DragOver and DragDrop.  Add those event handlers directly to FirstMovableGridControl, not to its GridControl property.
Now, in order to determine if the user is dragging instead of just moving the mouse around, we'll need to keep track of where the user was when they initially clicked the mouse.  So in the MouseDown event handler, store the mouse location (supplied in the MouseEventArgs) into a class-level variable.  The MouseUp event handler should clear that variable (by setting it to Point.Empty).
The (first) complicated part occurs in the MouseMove event handler.  Here, you'll have to first determine if the mouse has moved sufficiently to initiate a drag/drop operation.  You'd do that by comparing the location you stored in the MouseDown event to the location passed in the MouseEventArgs of the MouseMove, and seeing if that wanders outside the box prescribed by SystemInformation.DragSize.
If you have moved enough with the mouse button down to start dragging, you need to know what you're dragging.  Call the XMovableGridControl.GridControl.HitTest() method to get back a HitTestInfo structure.  One of the properties on that structure is the RowIndex.  If that property is zero or greater, the drag started on a valid column (row? :) ).  If it's a negative value, the user tried to drag the column header or other part of the control.  Use the RowIndex to retrieve the column name by using the XMovableGridControl.GridControl.GetCellString method, passing the RowIndex as well as the column index for the column name.  Use XMovableGridControl.ColumnNameIndex to supply that index.  Doing all that should have retrieved the column name that the user's started to drag.  Next you need to call XMovableGridControl.DoDragDrop() to start the drag operation.
Great!  You've got the drag started, so it's time to code the event(s) to handle the drop.  The DragOver event changes the icon to the appropriate "can't drop here" or "drop here" icon.  I've chosen to set the DragEventArgs.Effect to DragDropEffects.None when being dragged over anything but the destination control.  (Watch out here for the inconsistency between First and Second.)
The important part - and second complex part in this drag/drop dance - is the destination control's DragDrop event handler.  First, extract the source column name from the drag/drop operation by calling DragEventArgs.Data.GetData(typeof(System.String)).  Second, we have to find out where the user's decided to drop the link.  Do that by interpreting the drop location using XMovableGridControl.PointToClient.  (We haven't had to interpret locations like this anywhere else before - so don't think I missed this step elsewhere.)  Then perform the same HitTest method call as you did in the MouseMove event handler - except on the receiving XMovableGridControl.  The HitTestInfo structure returned will tell you your destination column name.
Now that you have the source column name and the destination, you can instruct the control to draw the "mapping" line between those two items.  You can do that by using the DtsMappingSurfaceControl's AddConnector method.  There are two things to note with this:
  1. You need to supply the arguments to the AddConnector method in the "first", "second" order of the tables, not the "source", "destination" order of the drag/drop operation.  To restate, if the user dragged a column from the "right" to the "left", you would have to reverse the arguments to the AddConnector method.
  2. If the second table's column is involved in any other connection, the AddConnection call will (silently) fail.  You need to call the DeleteConnectors method first - but to do that, you need to know what "first" column the second column is connected to, which means you need to find it in the Connections collection.
Whew.  That was a moutful - and a handful of work for you to do.
Add Tooltip Handlers
I've set up a tooltip object in my wrapper class.  If you choose to implement tooltips for your control, there are a few issues you'll have to deal with:
  • You'll need to dynamically set the tooltip message in your MouseMove event.
  • You'll have to trigger the tooltip to display "manually", because it only displays itself automatically when you move from control to control... which doesn't include moving from row to row (column to column?) within the surface mapping control.
  • You'll have to supply the metadata information, because the surface mapping control doesn't have it.
Add Mapping Code
At this point, you should have something you can "play" with at design-time in SSIS - but you don't have any way to set the connections before you show the control, nor retrieve the connections the user has set.
To set up the connections before the control is shown, call the control's AddConnector method, passing it column names.
To retrieve the columns involved in connections after the user has used the control, iterate over the DtsSurfaceConnection objects in the Connections collection of the control.  You're interested in the SourceConnectorObject and DestinationConnectorObject.  The "source" and "destination" here do not refer to any directionality in the mapping (to my knowledge) - the "source" refers to the "first" table, and the "destination" refers to the "second".
I'm Done!
Wow - you read all of that and built your own "mapping" interface?  Congratulations are in order - or maybe we can commiserate with each other on how much punishment we're willing to inflict on ourselves in order to get this far.  You could have always taken the short-cut of downloading the source from my SkyDrive or CodePlex... I wish I could have!
Using this, I was able to turn two tabs in my component editor that looked like this:
 To something I think looks (and operates) a lot nicer:
Have fun with your component!