Friday, February 12, 2010

My CodePlex Wishlist

Now that I've finally gotten version 1.5 of the Kimball Method Slowly Changing Dimension custom component for SQL Server Integration Services published, there will surely be some bugs and requests that need addressing.
The project is hosted on Microsoft's open source project site, CodePlex.  Now, I'm not an OSS veteran, so I can't compare CodePlex to SourceForge or any other OSS hosting site, because I've never used them.  To date, I'm very pleased with the CodePlex experience - the project pages are easy to set up and edit, the discussions and issues work well, the releases (downloads) function effectively, the source control was pretty easy to set up, and the statistics are neat.  That said, there's always room for improvement, and I'm glad that the CodePlex team has a feedback mechanism set up to process it, and follows an agile process to work through the suggestions.  For example, they've just added Mercurial support to CodePlex - which looks like a very cool thing for larger development teams.
Even so, there are a few "project management" things that I wish were easier for me to accomplish, so I could spend more time coding, and less time managing the project.  I've filed a few of them with into the CodePlex issues list - and if you would like to help me (and I think other developers that use CodePlex too) then please sign on to CodePlex and vote up my following suggestions.  The CodePlex team has recently knocked off some "big" items that had a lot of votes - so much so that my requests are very near the top of the pile (when ordered by votes) - help me to push them to the top!
Release Management and Issue Reporting
To me, Issues and Releases go hand-in-hand.  Issues are bugs and feature requests that occur in one particular release and are addressed in another (or a revised) release.
Two things drive me bananas with bug reports in particular.  First, there's no facility for the person reporting the issue to indicate which version they observed the issue in, aside from the comments.  And we all know that users (yes, even I) don't always remember to include this essential piece of information in their bug report.  As you all probably know, it's a little hard to diagnose a problem when you don't know what version of the application the user is having an issue with!  The second thing is that I would really like to ask the user which version of SSIS, Windows, and bitness they're running.  Again, very key information that I would like to prompt from the user without having to hope they remember it being important.
The other side of the coin is my ability to report progress back to the people who filed the reports, or others who may be watching them.  There's the ability to indicate the issue is "fixed", and even to "close" the issue.  Fantastic.  But does that tell anybody anything useful?  What release did the fix make it into?  Does the release page show this issue as being addressed?  Does the release page show open issues related to this version?  No such luck - all that extremely valuable information has to be manually maintained by me. 
It would be great if the issue system would record a little more information (such as the "issue applies to this version", and "the issue is addressed in this version"), and then automatically marked up the releases page of both of those releases (which may be the same release!) with a "known issues" and "resolved issues" section.  CodePlex did just release the ability to tie source code changesets to releases - but that applies more to devs trying to figure things out - and it still doesn't tell them if an issue is actually fixed in the code or not.
If you like those ideas - or just want to make my life easier - see these issues on CodePlex:
Add More Properties to Issues - "Applies To Release"
Automatically Annotate Releases Based on Linked Issues
Discussion Management
One part of CodePlex that I really do like is the facility for users to post up discussions on possible problems, usage techniques, or anything else that relates to the project.  At the moment, though, it's quite unmanaged and can become a rats nest of confusing and worthless babble.  I'm not saying that user comments aren't wanted - but I would like to keep them organized an on topic so that people other than the original poster (or additional posters) can get some value out of the process.
My wishes for the discussions are heavily influenced by what I can do as a moderator on the MSDN SSIS forums.  I'm not asking for anything quite so sophisticated (although others are) - but a few more manageability features to keep things clean and organized would be great:
Add Ability to "Sticky" or "Pin" a Discussion
Allow Discussion Threads to be "Split"
Thanks to the CodePlex Team
Despite those minor changes I wish for, CodePlex has been a great place to find samples, working code, and to share what I've made with the community.  A hearty thank you to Sara Ford (blog|twitter) for steering the ship to where it is today (she's just announced she's moving on) and to the CodePlex team who've worked very hard to make it as successful as it is!  Follow all their announcements about new features on their blog, and on Twitter.  I'll be seeing you at your session at the MVP Summit shortly...
My Projects On CodePlex
Just in case you're interested, I currently have six projects open on CodePlex - all custom objects for Integration Services.  In order of importance...
  1. SSIS Community Tasks and Components - This "project" actually hosts John Welch's (blog|twitter) Batch Update Destination for SSIS.  But we've made it do double-duty to really show off the community that's grown to extend Integration Services.  Over 100 addons and extensions are listed - most of them free or open-source.
  2. Kimball Method Slowly Changing Dimension - Frustrated with the SCD Wizard in SSIS?  You can roll your own, use Script, the T-SQL MERGE command, or the very useful TableDifference component.  But I believe the KSCD beats them all!
  3. File Properties Task - Something that should have been "in-the-box" from the get-go, it's a task that lets you see if an expected file exists, and more.
  4. Send HTML Mail Task - Allows you to send HTML formatted email, something you can't do with the stock Send Mail Task.
  5. HTML Table Destination - Takes a rowset in a Data Flow and stuffs it into an SSIS string variable as an HTML marked-up table.  (I use this a lot with the Send HTML Mail Task.)
  6. Pause Task - Something you may need from time to time to assist with coordinating processes in your ETL or DI, this task allows you to wait for a certain number of milliseconds, or until a specific time of day.

Saturday, February 6, 2010

SSAS Calculated Measures for Business (III): Average Inventory

This is a third post in a series about implementing business oriented calculations in SQL Server Analysis Services. See the first and second posts for more information about where this is coming from, and leading up to.  This post is going to investigate how I calculated an average inventory level that I'll be using in my Inventory Turns calculation.
According to my organization (as well as Wikipedia), inventory turns are a measurement of how often you "cycle through" or "turn over" your inventory.  A simple example can be shown by assuming you sell $100 of product over a certain period, like a year.  If you had to keep an average of $50 worth of product on your store shelves for that year, then you have "turned over" your inventory twice.  If you were able to get by with only having $25 worth of inventory on the shelf, then you acheived four turns.  The higher the turns you achieve, the better you're managing your inventory while maintaining sales.
A Wrench in the Average
One of the keys to this calculation is determing what your average inventory was.  In my case, I have a periodic snapshot table for inventory, which has inventory levels taken at specific times.  The curve ball here is that the period isn't consistent.  During the time I've been collecting data from our live business system, I've been able to collect inventory information daily.  I'm choosing to only store weekly snapshots in my fact table, so I discard a lot of that daily information - but I still need to extract it from my source system, because it's intertwined with information I need to populate my dimension tables daily.  However, I've only been accumulating this weekly information from partway through the year.  I have been able to extract older inventory information from backup snapshots - but it's on a monthly basis, not a weekly one.  Therein lies the rub.
A simple average aggregate will definitely NOT work when the periodicity of the inventory levels is inconsistent.  I can't sum all of the inventory measurements within a time period and divide by the number of measurements.  If I do, any sub-period within that period that happened to measure weekly levels will be weighted much more heavily (about 4 times greater) than the sub-period that only captured monthly inventory levels.  I have to compute a weighted average.
Forcing Weekly Periodicity?
One way to weight an average is to make sure that the snapshots all occur at the same periodicity.  If you can do that, then you might be able to actually use a simple average aggregation to calculate average inventory.  I say "might be" because there's a caveat here.  Remember that the SSAS average is the total divided by the number of rows.  That means that for every item you might want to calculate an average for - you have to have one row for each and every period - even periods before the item existed or after it "expires".  That's because your crafty business users may (unintentionally, of course) pick a period that extends beyond the life of your item... resulting in too few "rows" to divide the total by.
In order to get the correct average, I'm going to have to be able to figure out a weekly number even for those periods where I have only monthly information.  Luckily, this is easy to do with SSAS, and is actually a side-effect of something you're going to have to do with non-additive measures like inventory levels.  (You can't be summing them up, can you?)  When configuring a measure like inventory level, you're going to use a non-default non-additive aggregation method - in my case, LastNonEmpty.  That aggregation type will give me the last specified inventory level regardless of whether there's an inventory record for the specific date I ask for.
Calculating the Correct Average
Now that you have a measure that can supply a value for any date you specify, you'll need to get an average by specifying a specific periodicity.  Specifying a particular periodicity - weekly in my case - will result in the average function getting the right denominator, and therefore dividing by the right number.  The calculated measure I'm defining will do both those things:
Avg(

  Descendants([Date].[Fiscal Calendar].CurrentMember, [Date].[Fiscal Calendar].[Fiscal Week], SELF),
  [Current Amount Material Cost On Hand])
The Descendants function returns the set of descendants (children) at the level that I've specified.  In the above, I've told the function to return all of the Fiscal Week descendants included within the current date slice.  The Avg function then looks at the [Current Amount Material Cost On Hand] measure - which I've defined with the LastNonEmpty aggregate - for each Fiscal Week returned.  In months where I just have the one snapshot, it add up the prior month's inventory level, one time for each week, then divide that by the number of weeks.  Exactly what I'm looking for.
AlmostThere!
I believe I'll have just one more post after this one to arrive at my desired calculated measures: GMROI and Inventory Turns.  This post resulted in a suitable calculation for average inventory, and prior posts solved the annualized sales figure issue.  I have all the parts - now it's time to put them together.