Friday, February 25, 2011

Changes to the SSIS Slowly Changing Dimension Component

Probably not the kind of change you're expecting.  I've recently been contacted by the Kimball Group, and they've asked me to stop using the word "Kimball" to identify the component.  They have every right to make sure nobody is confused about who they are and what they do, or try to capitalize unfairly on their reputation.  I never intended to do anything of the sort, and certainly hope I've never given that impression.  Work is (slowly) underway to strip it out of the project materials on CodePlex.
Lemonade, Anyone?
So if I can't identify the component by the name of the methodology it follows... I've got to find something else for it's title - because "SCD component" is just boring, and will easily be confused by the component of the same name (but definitely inferior capability IMO) included within SQL Server Integration Services.
I could come up with a label, but it would probably be really lame.  Anything I can think of to tack on to "SCD Component" seems trite and overused in the world of 3rd party objects.  Your typical adjectives of "ultimate", "fast", "improved", "enhanced" don't seem to be able to capture what I'm looking for.
I Need Your Help
This component turned into a real community experience for me - 50,000 site visits,13,000 downloads, 600 discussion posts, and 100 bug reports and suggestions.  Best of all (for me) were 4 four-star and 10 five-star reviews (nothing lower) and the expressions of support.
So that's why I'm turning to you - the people who know what this component does better than anyone!  I'd like to run a little contest to come up with a new name for the project - something that speaks to the impact that it has for you in processing incremental dimension loads in SSIS.
Think Out of the Box
Don't restrict yourselves to looking through the thesaurus to find an adjective to tack on to "SCD Component" like I did.  Think unique.  A descriptor that's funny, insightful, or just plain accurate will do - but not something you'd hear everyday.  Send me your idea for the full name of the component, as well as something shorter if you get out of breath saying the full name.  (An obvious nickname would be the acronym... but you can do better, right?) 
Leave your ideas (and lobby for your favourites) in the comments, send an email to toddmcdermid at live.ca, tweet to @Todd_McDermid, or tell me in-person at SQL Saturday #65.
The Incentive
Seriously - you need an incentive other than contributing to the community?  Of course you don't - but I have one for you anyway.  The winner - chosen at my sole discretion - will get a boxed Windows 7 Ultimate x64.  You'll probably have to patch it to SP1 by yourself... but you guys develop SSIS ETL packages, so I know you're used to a little manual hackery.  (I may also have a little something for the funniest entry - I'm talkin' shooting-milk-out-the-nose funny...)
The Time Limit
All good things must come to an end - and this contest will too.  You've got one week - until midnight Pacific Time on Friday, March 4th to send me the moniker that should emblazon the best SCD option for SSIS.  Stay tuned for the results!

Sunday, February 20, 2011

Windows Phone 7 and OneNote

I'd never used OneNote before I picked up my Windows Phone 7 - but it's extremely handy, even more so than the Notes or Tasks that I'd used for the same purposes on my BlackBerry.  The only downside was that they (still) don't have notifications attached, which means OneNote notebooks aren't a replacement for Outlook Tasks.  That said, the ability to synchonize to the cloud allows better access - from any computer anywhere - rather than just from computers that I have my Outlook client installed on.  In addition, the formatting capabilities of OneNote are very useful compared to Outlook notes or tasks.  I find myself recording more information in my notebooks - because it lends itself to better organization, and it's easier to get at.
What's Possible Now
The Windows Phone how-to pages include information on how to set up a "default" synchronizing notebook called "Personal (Web)" on your SkyDrive.  This (inexplicably) shows up on the phone with a section called "Unfiled Notes" and you're warned against changing it.  This prevents a useful organizational feature of OneNote - the ability to file notes into sections, and even into different notebooks.
If you happen to have an internet-accessible corporate SharePoint 2010 server, you can "keep" a copy of those notebooks on the phone and synchronize them with the server.  I don't have a SharePoint 2010 server architecture lying about, and my organization is probably not looking to spend a ton to get one.  I really wanted to segregate information and realize some of the benefits of multiple sections and notebooks.
Officially, That's It
As far as Microsoft's documentation goes, that's all you can do.  They don't mention anything about using more than one notebook.  In fact, in all the news I've read on the internet about what's coming in a future update (the Mango, 2011H2 update) is "better" Office integration.
So you're stuck, right?
Pushing the Boundaries
It turns out you can put any SkyDrive notebook on the phone - although Microsoft doesn't explain how, or tout this capability.  It's relatively easy to do, but there are limitations, which is why you aren't hearing this from Microsoft.  It's so easy that I didn't realize how seamless it was until I broke it and re-enabled it by accident.
Quite simply, create any OneNote notebook on your SkyDrive, then use the phone's Internet Explorer to navigate to your Live account to find the document in SkyDrive, and open on it.  If you did this on your desktop's IE, it would open in OneNote's WebApp.  But on the phone, it switches immediately to the Office Hub, opening the notebook on the phone, and syncs the sections.
But wait - I said I "broke" this and then fixed it.  How did I do that?  I changed my phone's Internet Explorer settings to look at the pages in "desktop" mode, not mobile.  (Some sites have way too limited "mobile" versions, don't they?)  Seems that if you surf to your OneNote notebook in "desktop" mode, it'll fire up the OneNote WebApp, but if you hit it in "mobile" mode, it knows to download and sync it with the Office Hub.  Who knew?
Getting Rid of "Personal (Web)"
Great!  You can put any OneNote notebook on the phone... so why settle for a hobbled notebook called "Personal (Web)" with an "Unfiled Notes" section that you can't change?
Only one thing - if you create a new note (page), it needs a notebook to get saved in, and that's the "Unfiled Notes" section of the "Personal (Web)" notebook.  Again, the documents don't say this - but you can change that default.  Open the Office Hub and tap the "all" button.  Tap the "notebooks" pivot heading.  Try to tap the garbage can icon, then pick the "Personal (Web)" notebook - the phone will complain, and give you instructions on how to change the default.  Just to save you from having to do that, press "OK" and the back button to get back to the notebook screen, and tap and hold the notebook section (not the notebook) you want to set as the default.
Life Isn't Perfect Yet
But of course, there are some issues.  It seems like the phone reads different "section" information than the WebApp.  When you add any notebook to the phone using what I've described, the first section in the notebook is always "Untitled".  No matter what you do, you can't rename that.  In fact, any section renaming you do on the web using the WebApp doesn't EVER get reflected on the phone.  Moreover, if you try to change page names on the phone, you get a sync error.  You can view the sync errors on the web, but your only option is to copy the data out of the changed page, because the only resolution you have to sync errors is to delete the version the phone made.
It'll Do Until Mango
I could (and will) ask for more, but this will do.  I can now have whatever notebook pages I want on my phone, synced so I can work on them offline, in very high-fidelity. I can structure my notebooks' sections and pages any way I want... as long as I do it right the first time, and delete the automatically added first section.  I can even add voice notes, pictures, and video to the pages from the phone.  On top of that, with a little tap-and-hold, I can pin any notebook page right to the start menu.
Personally, I don't see how the one limitation here - section and page naming and syncing - is taking a year to fix.
Recap
Create your notebook "properly" using the WebApp
  1. Create the notebook
  2. Add a new section
  3. Delete the original "Untitled Section"
  4. Construct your notebook sections and pages as you wish - named perfectly on creation.
Link the notebook on the phone
  1. Surf in IE "mobile" mode to the SkyDrive
  2. Open the notebook
Limitations
  • Notebook, section, and page names can NOT be changed.  If you want to rename a section or page, create a new one, move the data in the pages, and delete the original (old) named section/page.
To the cloud!

Wednesday, February 16, 2011

How To: Load Dimension Tables with Integration Services - Introduction

When I first started down the business intelligence and data warehousing road, I needed to learn the foundations of dimensional design.  Ralph Kimball's methods seemed appropriate at the time, and they still are for me.  The next step was learning how to make Microsoft's BI tool stack follow that methodology, and after that, making the technology work as easily, reliably, and as fast as possible.
There Are Knowledge Gaps
Over the last year, I've noticed that although there's a lot of interest in the last step in that process - making things go fast - there seems to still be a lot of confusion over the first steps.  Quite a few interactions I've had with the community lead me to believe that there's a significant number of people skipping over understanding the dimensional model, and how ETL (of any brand or style) should be used to manage loading one.
I see questions in the MSDN forums asking why the OLE DB Command isn't inserting rows into their dimension table... which clearly demonstrates two problems the asker has.  First, they don't understand that the OLE DB Command doesn't insert rows - that's not its purpose.  But the second and more fundamental issue is that they don't understand why it doesn't.  They aren't grasping the concept of new row versions and updating old rows.  If they understood that, they wouldn't wonder why the OLE DB Command didn't insert rows, because they'd know adding rows to their dimension table shouldn't happen at that point.
I Sucked Too
This also came up quite clearly in my side-project of the Kimball Method SCD component.  I assumed (and you know what that means) that people would know what to do with the "New" and "Expired" outputs I'd labeled on the component.  I held on to that belief so fundamentally that I created a video intending to demonstrate the performance of the component, but completely ignoring the dimensional modeling accuracy of the demo.  In fact, the video showed exactly what not to do when inserting/updating rows in the dimension table.
But Then, So Did AdventureWorks
I tried to find a fairly simple example of dimension loading that clearly showed "data in" and "data out" and how that mapped to the SSIS SCD Wizard or any other tool.  I expected to find that in some Microsoft samples - like the AdventureWorks Integration Services samples.  Unfortunately, the writers of this example also assumed too much about the readers' knowledge of dimensional modeling.  The AdventureWorks samples contain an SSIS package specifically named AWDataWarehouseRefresh - implying that it can be used to "refresh" AdventureWorks dimension tables.  But it loads dimension tables into SQL Server from a CSV file.  A completely populated dimension table - a snapshot in time.  No use of the SCD Wizard.  No incremental loading.  I really can't understand why they thought it would be an accurate demonstration of loading a table in a data warehouse specifically - fact tables OR dimensions.  That example applies to any kind of table, data warehouse or not, and provides no help at all for the specific problem of loading dimension tables on an incremental basis.
A New Step-By-Step
So here's my second attempt at doing it better the first time.  I've made a few videos and other resources to try to better explain how to incrementally load dimension tables with SSIS, and I'll be posting them in a series here on my blog.  The videos were originally intended to better explain my Kimball Method SCD component and are already available there, but they seem to fit this bill as well.  Please do comment on specifics that make them easy or hard to understand.  I've made comments through my channels to Microsoft about improving the information available for this kind of thing - but your comments can help me confirm to Microsoft that this isn't all a figment of my imagination.
The next post in this series will provide and describe the sample data stored in our data warehouse, and the source system that updates will come from.

Saturday, February 5, 2011

Speed Up SSIS - February Talks

Improving the performance of packages is always a focus of ETL developers.  It should go without saying that your first duty is to make the package work... after which you can then make it work fast.  In an effort to help you out with the latter, I've got two presentations I'm set to deliver this month.  The first is to Colin Stasiuk's (blog|twitter) Edmonton PASS chapter, and the second is at SQL Saturday #65 in Vancouver.
Diagnosing and Addressing Performance in SSIS
On February 16th, the Edmonton PASS chapter is hosting me virtually to talk on the subject of optimizing SSIS packages.  You too can attend this webcast even if you're not there - I can guarantee that, because if it's not broadcast, I won't be able to deliver it, because I'm not going to be there in person either.
I'll be covering one of the essential steps that too often seems to get overlooked in trying to make SSIS packages "go fast"... and that's knowing how to find out where they're actually "going slowly."  If you don't know why this is important, or don't know there are techniques that can help you do this, you should probably make time to sign up.
I'll have to make a dig or two at Colin's expense during the webcast.  He trimmed my bio down on his site... he must not appreciate my humour.  Please feel free to feed me some fodder in the comments.
Data Warehouse ETL with Integration Services
On February 26th, the famous SQL Saturday franchise will make its first stop in Vancouver at the Pinnacle Mariott.  If you're anywhere close to the area and touch SQL Server, you need to be here.  The following week is the Microsoft MVP Summit in Redmond, which the organizers of Vancouver's SQL Saturday exploited to the fullest.  A ton of SQL MVPs will arrive a little sooner than normal to the Summit just to be able to make the short trip up to Vancouver.  Some of them will be presenting - for example, SQL rock stars like Brent Ozar and Brad McGehee.  Others couldn't get a speaking slot, but they're coming anyway, and they're all open to conversation.
I was fortunate enough to leverage my "local" credentials to get a session on performing data warehouse ETL with SSIS.  I'll start by describing the multiple ways to fill dimension tables with SSIS, leaving you with solid guidelines to pick a technique that performs in your scenario.  I'll finish up with the relatively simple techniques involved in fact table population, as well as some optimization techniques to speed the whole process up.