Monday, May 31, 2010

Using the Data Viewer Intelligently in SSIS Data Flows

If you've ever needed to debug a fairly complex Data Flow, you've probably wanted to take a peek at what data is moving through the flow at various points.  The data viewer in Integration Services is how you do that - but it has a major drawback.  You may be processing thousands or millions of rows - and you'll have to page through each buffer with a plain data viewer.  If you're trying to debug such a flow, one of the best ways to do so is by narrowing the scope of what you're looking at.  But doing so with a data viewer means you need to restructure or filter your data flow, which means you definitely need to remember to remove that filtering before it goes to the next step of development.
Here's a little tip that should help you weed through data that you don't want to see, as well as move your "viewport" from one place to another in the data flow as easily as possible.  Almost as easily as just removing and adding a new data viewer.
The Parts
You'll have to add a Multicast, Conditional Split, Sort, and Derived Column to your data flow.  They each serve a purpose - and depending on your flow, you may or may not need each one, or you may be able to exchange it for another component.
The Multicast
This is the key to the "mobility" of using this technique.  Wherever you want to see what's passing by - the place where you would otherwise place a data viewer - simply disconnect the flow, and insert the Multicast in between:
The Conditional Split
Next, since you don't want to see a million rows, or have to press that "play" button a thousand times, hoping to see relevant data, place a Conditional Split after the Multicast:
Inside the Conditional Split, place whatever condition will filter out only the records you're interested in seeing.  In my case, I only want to see rows that have a customer key of 451:
The Sort
This is an optional part of the technique, but can be quite useful when the rows you're interested in are scattered throughout the rows passing by in the data flow.  If you've researched the optimizations that SSIS uses to process data, you should know that the Multicast doesn't actually copy memory and the Conditional Split doesn't actually split rows either.  The Multicast is simply a visual aid for you to design packages with - the pipeline engine looks at it as a license to multi-thread the data flow.  The Conditional Split isn't really moving rows down two paths, it's simply marking each row with an "output name", and only allows relevant rows to be operated on by transforms attached to the various outputs.
If you simply attached a data viewer to the output of the Conditional Split, you may have to page through tens or hundreds of buffers (pages) by pressing the "play" button... while only seeing one row on each page.  Using the Sort component forces the pipeline engine to copy and reformat the buffer.  You can think of it as "buffer defragmentation."
The Derived Column
Virtually any transform will do here - but the Derived Column works well in that you can simply drop it on the surface without configuration and it will work.  It's purpose here is simply to provide an endpoint, so that we can place a data viewer between it and the previous transform:
Using the Technique
The first time you set up this sub-flow, it could look something like this:
Each part of the sub-flow I've described here would have correct metadata, including the data viewer.  If you run the flow, you'll see your data viewer pop up with only the filtered results you were looking for.  After you're confident that the flow is dealing with the data correctly up to that point, you may want to move it to examine what the data looks like after the next component in the main flow.  To do that, you need to:
  1. Detach the incoming flow to the "Data Viewer Multicast" as well as the outgoing flow from it, and reattach the main flow back together.
  2. Detach the main flow where you want to insert the "intelligent" data viewer sub-flow, and attach the "Data Viewer Multicast" in between the two components.
  3. Resolve any metadata issues (red circles with the white X) in the sub-flow.
  4. Remove and re-add the data viewer on the flow entering the "Data Viewer Endpoint" if your column metadata's changed.  (You won't get an error or warning about this!)
You would end up with something like this:
Wrapping It Up
No, this isn't the easiest or quickest way to monitor what's going through your pipeline - but it does provide more flexibility and usability than the plain data viewer for complex flows that need debugging.  Especially ones like the one I'm working on now, where I'm moving this down the flow... then up... then back down again...

Thursday, May 27, 2010

The SSIS Data Flow is Like... an Automotive Assembly Line

When I describe (or rather, attempt to describe) technical subjects to people that aren't familiar with them, I tend to fall back on automotive metaphors.  No, I'm not a gearhead, and no, it doesn't work every time, but it does seem to be the first thing that pops in my head.  In the case of the SQL Server Integration Services Data Flow, I think it works - and I'd like your thoughts on it.
The General Idea
I tend to describe the data flow pipeline as an automotive assembly line.  Most people are familiar with the general physical constraints of an assembly line, and a lot of the problems that happen in assembly lines happen in the data flow as well - it's just that they're easier to describe because you're talking about physical things.  In particular, the concept of data buffers "moving past" the components in the data flow is very analogous to car chassis' moving past assembly stations in the line.  This is typically a very difficult concept to convey, but becomes simpler in this metaphor.  A common example I give is that you can think of the Derived Column component like a station on the assembly line that bolts on doors to the car.  (That would be a new column.)  Or a Lookup component having to attach a customer's desired "options list" to the car, knowing only the car's ID on the assembly line.
Describing Performance Problems
With the assembly line metaphor, you can fairly easily describe the common performance problems that affect a data flow.  The difference between blocking and non-blocking components typically means the audience has to try to hold numbers and concepts in their heads, but gets easier with cars.  For example, you can describe the Sort component as being like a "limited edition" badge for the cars.  This badge is required to have a "1 of 1000" type imprint on it... except it needs to be accurate and know how many cars (and badges) need to be created before it can affix the first one to the first car.  Therefore, it has to wait until ALL the cars have reached that point in the assembly line so that it can count them and correctly generate the set of badges.
Once everyone hears that explanation, it's fairly easy to get them to understand that with a "production run" of ten cars, it might not be too bad to have all the chassis wait on the line itself.  But when you get to one thousand cars, you need to take floor space inside the building to "hold" the cars.  Get to a million, and you realize that you'll have to park the cars outside of the plant, because there just isn't enough floor space (RAM) to hold on to all of them.  Keeping some of the cars inside the plant doesn't help at all, and actually reduces the space you have left over for all the other stations (transforms) on the line.
You can even describe the EngineThreads property as actual workers.  You have ten, and they can work at any station - but you only have ten...
Does This Work For You?
Of course, the metaphor breaks down handily when you get to the Multicast component... but so far it's seemed fairly robust in getting people to understand how the pipeline can concurrently operate multiple transforms on the data in parallel, even when the flow looks very serial in the designer.
I'd like your feedback on this metaphor - does it work for you?  Where else does it break down - and is that breakdown at an acceptable point in how knowledgeable someone would have to be in SSIS in order to spot it?
If this idea doesn't go down in flames, I have some follow on thoughts to take it further...

Tuesday, May 25, 2010

SQL Saturday #27 Recap

What a fantastic event - and extremely well-run by Arnie Rowland (blog|twitter) and Stuart Celarier (blog|twitter) and their army of volunteers.  From the minute I got to the Friday dinner until the last glass of beer was drained at one of the unofficial decompression events, it was spectacular.  The University of Portland facilities were exceptional - the rooms were well equipped to allow dual-monitor and projector use, and the seating allowed for lots of room and sightlines.  If you were there, you know exactly what I mean.  Thanks to the sponsors, volunteers, and especially the attendees - you were all very interested and didn't nod off, which is very gratifying for a presenter.
Apparently over 850 people attended - probably helped by the rainy day that crossed off "yard work" from a great many Honey-Do lists.  My schedule only allowed me to sit in on Denny Cherry's (blog|twitter) SQL Service Broker session - and it was a great session to catch.  With my background, I guess I'd never realized that SQL devs didn't really think asynchronously, and hadn't had a built-in asynchronous processing mechanism until it came along.
Thanks for the reminders about the slide decks and demo material.  Feel free to use the content - but do use your own imagination to tell the story in your own words.  Here are the links, and to the SpeakerRate evaluation pages for those sessions:
Business Intelligence and Data Warehousing Primer - Slides, evaluation.
Dimension Processing with SSIS: from Simple to Complex - Slides, demo, evaluation.
UPDATE 2010-06-10: Apologies to those who've tried to download the demo and run it.  I mistakenly developed that with an alpha version of the KSCD (v1.6) which I haven't yet uploaded to CodePlex.  Therefore, it won't work for you.  I'll do my best to get that updated and "downgraded" to the currently released version so you can use it.
Deep Dive on Integration Services - Slides, evaluation.
Please do take the time to click through to evaluate the sessions on SpeakerRate and let me know if I could do anything to improve your experience.
I'm looking forward to next month in Redmond!

Tuesday, May 18, 2010

SQL Saturday #27 - Portland!

I'm looking forward to this Saturday with a very high level of anticipation - it will be my first SQL Saturday.  Arnie Rowland (blog|twitter) has coordinated SQL Saturday #27 alongside the Portland Code Camp, which is providing talks on tons of other technologies, from regular code-hacker Java/.Net/Ruby sessions to Mobile apps (iPhone/WM7), and gaming.
Some of the great people I've met at prior SQL conferences and user groups will be there to present.  Denny Cherry (blog|twitter) has a few sessions on Service Broker,virtualization, and indexing.  Tim Ford (blog|twitter) is talking about HA/DR and DMVs.  And who could leave out Buck Woody (blog|twitter) who's sure to talk up SQL 2008 R2 while lambasting Oracle ("Write down 'Larry'... again").  I know I'll meet some more great people too.
I'll be doing some talks on data warehousing and Integration Services.  I'll open the morning with my DW/BI Primer, explaining why there's such a big desire out there to put data in charts and process billions of rows on a netbook and what makes that hard.  In the afternoon, a deeper technical discussion of processing dimensions using SSIS, with a little walkthrough and performance comparison, including my Kimball Method SCD component.  Then after dinner (for those that can still function), I'll do a deep dive into Integration Services, doing my best to show all it's capabilities and functionality within the ETL space.
That's a lot for me to get through - but I loaded the plate up because Portland is a little ways to get to, and it might as well be worth it.  Next up, I hope to present at SQL Saturday #43 in Redmond.  I've only put one talk up there, with the intention of spending more time listening to other people's great ideas.

Tuesday, May 11, 2010

Five Things SSIS Should Drop

It's hard being a software giant.  There's so many people to try to make happy with your product, and with a bank account that dwarfs a large percentage of the world's governments... you may actually think you should deliver on trying to make those users happy - or at least not make them unhappy.
One of the things that tends to happen in those cases is that backwards compatibility is given a large amount of sway in product development.  Because removing functionality from a product - regardless of how poorly implemented or misused it is - will probably adversely affect a large number of customers.  Unfortunately, that very thing is what most products need from time to time.  A little house cleaning - and maybe because of the springtime mood, the SQL Server blogger community has taken on a "five things SQL Server should drop" meme, started by Paul Randal (blog|twitter).  Jamie Thomson (blog|twitter) tweaked the meme towards our mutual niche, and I'll follow his lead with my list of five things that Integration Services should do away with, in no particular order.
1. The Properties Window
This area contains some extremely important settings for some of the constructs in Integration Services.  Unfortunately, it's in a completely unrelated UI element - a pane or floating window - that changes contents based on what object is currently selected.  That might not be horrible in itself - except that it's also cluttered with a multitude of properties that you shouldn't touch, and lots of properties you can't edit.  Please - take the relevant, important properties and put them in a comprehensive UI with the object you're editing.  Whether that's in the variables pane, where you should be able to edit the variable's expression, or the Task UIs where you should be able to specify the transaction behaviour.  (The list is endless.)
2. The Data Profiling Task
Yes, I do think this task is useful.  No, I don't think the idea is bad, or that the tool itself should go away.  But does it really belong in a product who's purpose is automated data extraction, transformation, and loading?  The key word from the above is "automated".  The Data Profiling Task is never part of an automated solution (without massive hackery that's better left to a Script) - it should always be part of the analysis done before package design is ever started.  That functionality should be inside SSMS or some other data quality/exploration toolset.
3. Shrink Database Task
This one is a cop-out on my part.  Such an easy target based solely on the arguments from multiple experts - don't automate shrink operations.  (Just see the rest of the "5 things to drop from SQL Server" posts for proof.)
4. Lineage IDs
As a developer and dimensional modeler, I can completely understand why a "surrogate key" lineage ID was used instead of the "business key" of the column name.  But the trouble in transferring that excellent argument from those worlds to SSIS package design is that this key isn't hidden from the user.  In dimensional modeling and development (code or DB), it's a good thing to use surrogate identifiers to refer to objects in case the (dumb) user decides to change how they refer to the object.  It makes internals nice and clean.  Unfortunately for SSIS, the lineage ID isn't buried as far as it should have been - and at this point I don't actually think you can bury it far enough unless you pick up a shovel.  I might be wrong on the internals (I don't get to see the source) but the utility expected from abstracting a column's identification just hasn't been there.
5. The "Advanced" Editor
Fairly similar to my dislike of the Properties window, the "Advanced" editor (a most inappropriate name) is used fairly frequently to make critical changes to a Data Flow component.  Changes like informing SSIS how the data is ordered when delivered by a source component.  Again, there is more than enough capacity in the "basic" (?) UI to handle that kind of configuration.  And again, the "advanced" editor is cluttered up with tons of properties you shouldn't touch, and a lot you can't (because they're read-only).  In my opinion, it's a crutch for not taking the time to develop a completely usable UI.

Any other SSIS users out there have a part of Integration Services they'd like to march to the gallows?

Monday, May 3, 2010

Kimball SCD Internals: A High-Performance Ordered List

In order to satisfy some of the performance desires that I had for the Kimball Method Slowly Changing Dimension component, I had to implement some code constructs to support the processing. This post describes one of those constructs - what I call an OrderedHashList.  See Kimball SCD Internals labeled posts for more information.
The Problem
Much like a Merge Join in SSIS, the Kimball Method SCD component matches incoming rows from different inputs to each other (where possible).  One of the restrictions of participating in the SSIS data flow pipeline is that the component itself can't control which of the inputs is delivered rows.  This means that even when the rows are sorted - like the way the Merge Join requires - the probability exists that you'll get a bunch of rows from one of the inputs and have to hold on to them before you get matching rows from the other input.  In essence, the component has to "cache" incoming rows.
Unlike the Merge Join component, the Kimball SCD component doesn't require sorted inputs; you can deliver rows to the Kimball SCD in unsorted order.  The benefit of requiring sorted inputs is that you can imply that a row from one input will have no possible match on the other input if rows have sorted column values that occur "later" in the sort.  For example, you can guarantee that a row with a join key of "B" will never match a row coming in the other input if you've seen a "C" on that input.
The Kimball SCD component does this kind of optimization as well - but it also has to do the more mundane process of simply cachine everything until it finds a match.  But as I said, adding to a generic cache is relatively mundane.  It's all the other aspects of caching - retrieval, for example - that require more thought.  Specifically, these properties were must-haves for my caching mechanism:
  1. It must be thread-safe for readers and writers. There will be many threads attempting to add and remove items from a single cache construct.
  2. Key/value/arrival order triples are to be stored.  (Arrival order being the ordinal position of the row from the input stream - i.e.: "this is the fifth row received.")
  3. It must permit items to be interrogated or removed by key.
  4. It must permit items to be interrogated or removed by their arrival order.
  5. It must return subsets by specification of a range of arrival orders.  (i.e.: "give me any rows that arrived after the fifth row, but before the tenth.")
  6. All operations should be O(1), or O(log n) at worst - no O(n). The collections could potentially contain millions of items, and O(n) operations cost dearly at that scale.
The Possibilities
Not very many .Net collections could represent those requirements alone - especially #2.  Dictionaries and Hashtables have key and value "pair" concepts - but nothing has a "triplet" concept.  It gets even more difficult when the last item needs to be taken care of - a great deal of the internal collections have O(n) addition, removal and/or interrogation penalties to pay.
The Plan
This would be a little more complicated than a simple combination of stock collections. 
In order to store the triplets, I created an "entry" construct that kept the triplet as an atomic unit.  My collection would store those "entries".  In order to be able to retrieve any entry quickly by it's "key", I stored those entry constructs in a Hashtable, keyed by the "key."  In order to retrieve any entry quickly by its arrival order, I stored the keys in a Hashtable - the "key" was the value, and the arrival order was the key.  A double lookup allowed retrieval of a "value" by the arrival order.  In order to be able to retrieve a set of entries quickly by a range of arrival orders, I kept a List of keys in the order they arrived.  To mitigate the performance hit of removing items from the List, I didn't remove items from that List immediately.  Instead, I added them to a Hashtable of "deleted keys" that I could use to modify the results of other operations in an O(1) manner.  I then intersected the List and Hashtable of deleted items to rebuild the List - an infrequent O(n) operation - when I needed to retrieve a range of results.  I think there's still room for improvement here.
The Result
What I ended up with was a very well performing construct for the requirements. It's very specific to my intended use, and most common operations achieve O(1) performance.
The OrderedHashList:
  • Has O(1) behaviour on adds
  • Has O(1) behaviour on inquiries by key or arrival order
  • Has O(log n) behaviour on removes
  • Has O(n) behaviour on some range retrievals, O(log n) on others
  • Encapsulates all required thread safety - no external locking via a public SyncRoot-type property is required.
    #region CLASS: OrderedHashList<K, V>
    
/// <summary>
    /// The OrderedHashList provides a HashTable-like collection of key-value pairs, but extends
    /// that capability to allow an "order" to be associated with eash key-value pair.  The resulting
    /// key-value-order pairs can be retrieved by key or order, and can be enumerated like a List.
    /// </summary>
    
public class OrderedHashList<K, V>
    
{
        #region CLASS: OrderedHashListEntry
        
private class OrderedHashListEntry
        {
            #region
Private Variables
            
private K _key;
            
private V _value;
            
private int _arrivalOrder;
            
#endregion

            #region Constructor
            
public OrderedHashListEntry(K key, V value, int arrivalOrder)
            
{
                
this._key = key;
                
this._value = value;
                
this._arrivalOrder = arrivalOrder;
            
}
            #endregion

            #region
Public Properties
            
public K Key
            {
                get {
return this._key; }
            }

            
public V Value
            {
                get {
return this._value; }
            }

            
public int ArrivalOrder
            {
                get {
return this._arrivalOrder; }
            }
            #endregion

            
public override string ToString()
            
{
                
return "[" + this._key.ToString() + "].[" + this._arrivalOrder.ToString() + "]"
                    
+ ".[" + this._value.ToString() + "]";
            
}
        }
        #endregion

        #region
Private Variables
        
private readonly object _lock = new object();
        
private Hashtable _entriesByKey;
        
private Hashtable _keysByArrivalOrder;
        
private List<K> _keysInArrivalOrder;
        
private Hashtable _deletedKeysInArrivalOrder;
        
#endregion

        #region Constructor
        
public OrderedHashList()
        
{
            
this._entriesByKey = new Hashtable();
            
this._keysByArrivalOrder = new Hashtable();
            
this._keysInArrivalOrder = new List<K>();
            
this._deletedKeysInArrivalOrder = new Hashtable();
        
}
        #endregion

        #region
Public Properties
        
/// <summary>Returns the number of OrderedHashListEntries contained
        /// in the collection.  This is an O(1) operation.</summary>
        
public int Count
        {
            get {
lock (this._lock) { return this._entriesByKey.Count; } }
        }

        
/// <summary>Returns a complete list of values contained in the
        /// collection.  This is an O(n) operation.</summary>
        
public List<V> Values
        {
            get
            {
                List
<V> values = new List<V>();
                
lock (this._lock)
                
{
                    
foreach (OrderedHashListEntry entry in this._entriesByKey.Values)
                    
{
                        values.Add
(entry.Value);
                    
}
                }
                
return values;
            
}
        }
        #endregion

        #region
Public Methods
        #region Clear
        
/// <summary>Empties the OrderedHashList.  This is an O(1) operation.</summary>
        
public void Clear()
        
{
            
lock (this._lock)
            
{
                
this._entriesByKey.Clear();
                
this._keysByArrivalOrder.Clear();
                
this._keysInArrivalOrder.Clear();
                
this._deletedKeysInArrivalOrder.Clear();
            
}
        }
        #endregion

        #region Add
        
/// <summary>Adds a key-value-order to the OrderedHashList.  An exception
        /// will be thrown if the key or arrivalOrder is already present in the collection.
        /// This is an O(1) operation.</summary>
        /// <param name="key"></param>
        /// <param name="value"></param>
        //// <param name="arrivalOrder"></param>
        
public void Add(K key, V value, int arrivalOrder)
        
{
            OrderedHashListEntry entry
= new OrderedHashList<K, V>.OrderedHashListEntry(key, value, arrivalOrder);

            
lock (this._lock)
            
{
                
if (this._entriesByKey.Contains(key))
                
{
                    #region Report key already exists
                    
if (System.Diagnostics.Debugger.IsAttached)
                    
{
                        System.Diagnostics.Debugger.
Break();
                    
}
                    
throw new InvalidOperationException("Key (" + key.ToString() + ") already exists in the collection");
                    
#endregion
                }
                
if (this._keysByArrivalOrder.ContainsKey(arrivalOrder))
                
{
                    #region Report order already exists
                    
if (System.Diagnostics.Debugger.IsAttached)
                    
{
                        System.Diagnostics.Debugger.
Break();
                    
}
                    
throw new InvalidOperationException("Arrival order (" + arrivalOrder.ToString()
                        +
") already exists in the collection");
                    
#endregion
                }
                
try
                
{
                    
this._entriesByKey.Add(key, entry);
                    
this._keysByArrivalOrder.Add(entry.ArrivalOrder, entry.Key);
                    
this._keysInArrivalOrder.Add(entry.Key);
                
}
                #region
catch ...
                
catch (Exception ex)
                
{
                    
if (System.Diagnostics.Debugger.IsAttached)
                    
{
                        System.Diagnostics.Debugger.
Break();
                    
}
                    
throw new ApplicationException("Unexpected exception in OrderedHashList.Add: "
                        
+ ex.Message, ex);
                
}
                #endregion
            }
        }
        #endregion

        #region Remove
        
/// <summary>Removes the entry with the specified key from the collection.
        /// This is an O(log n) operation.</summary>
        /// <param name="key"></param>
        
public void Remove(K key)
        
{
            
lock (this._lock)
            
{
                
if (this._entriesByKey.ContainsKey(key))
                
{
                    OrderedHashListEntry entry
;
                    
entry = (OrderedHashListEntry)this._entriesByKey[key];
                    
if (entry != null)
                    
{
                        
try
                        
{
                            
if (this._entriesByKey.ContainsKey(entry.Key))
                            
{
                                
this._entriesByKey.Remove(entry.Key);
                            
}
                            
else
                            
{
                                #region Report
internal corruption
                                
if (System.Diagnostics.Debugger.IsAttached)
                                
{
                                    System.Diagnostics.Debugger.
Break();
                                
}
                                
throw new ApplicationException("Internal collection corruption detected in "
                                    
+ "OrderedHashList.Remove. Key (" + key.ToString() + ") was not found.");
                                
#endregion
                            }
                            
if (this._keysByArrivalOrder.ContainsKey(entry.ArrivalOrder))
                            
{
                                
this._keysByArrivalOrder.Remove(entry.ArrivalOrder);
                            
}
                            
else
                            
{
                                #region Report
internal corruption
                                
if (System.Diagnostics.Debugger.IsAttached)
                                
{
                                    System.Diagnostics.Debugger.
Break();
                                
}
                                
throw new ApplicationException("Internal collection corruption detected in "
                                    
+ "OrderedHashList.Remove. Arrival order " + entry.ArrivalOrder.ToString()
                                    +
" was not found (for key " + key.ToString() + ").");
                                
#endregion
                            }
                            
this._deletedKeysInArrivalOrder.Add(entry.Key, true);
                        
}
                        #region
catch ...
                        
catch (Exception ex)
                        
{
                            
if (System.Diagnostics.Debugger.IsAttached)
                            
{
                                System.Diagnostics.Debugger.
Break();
                            
}
                            
throw new ApplicationException("Unexpected exception in OrderedHashList.Remove "
                                
+ "removing entry from internal structures: " + ex.Message, ex);
                        
}
                        #endregion
                    }
                    
else
                    
{
                        #region Report
internal corruption
                        
if (System.Diagnostics.Debugger.IsAttached)
                        
{
                            System.Diagnostics.Debugger.
Break();
                        
}
                        
throw new ApplicationException("Internal collection corruption detected in "
                            
+ "OrderedHashList.Remove. Key (" + key.ToString() + ") was found, but "
                            
+ "the associated entry was null.");
                        
#endregion
                    }
                }
            }
        }
        #endregion

        #region ContainsKey
        
/// <summary>Determines whether the key is in the OrderedHashList.
        /// This is an O(1) operation.</summary>
        /// <param name="key"></param>
        /// <returns>true if the key can be found in the collection.</returns>
        
public bool ContainsKey(K key)
        
{
            
lock (this._lock)
            
{
                
return this._entriesByKey.ContainsKey(key);
            
}
        }
        #endregion

        #region GetValueAtArrivalOrder
        
/// <summary>Returns only the "value" portion of the OrderedHashListEntry at the
        /// specified ordinal position.  This is an O(1) operation.</summary>
        /// <param name="arrivalOrder">The arrival order associated with an entry.</param>
        /// <returns>The value with the specified arrival order, or null/zero if that
        /// arrival order doesn't exist in the collection.</returns>
        
public V GetValueAtArrivalOrder(int arrivalOrder)
        
{
            
lock (this._lock)
            
{
                
if (this._keysByArrivalOrder.ContainsKey(arrivalOrder))
                
{
                    K key
= (K)this._keysByArrivalOrder[arrivalOrder];
                    
if (this._entriesByKey.ContainsKey(key))
                    
{
                        OrderedHashListEntry entry
= (OrderedHashListEntry)this._entriesByKey[key];
                        
if (entry != null)
                        
{
                            
return entry.Value;
                        
}
                        
else
                        
{
                            #region Report
internal corruption
                            
if (System.Diagnostics.Debugger.IsAttached)
                            
{
                                System.Diagnostics.Debugger.
Break();
                            
}
                            
throw new ApplicationException("Internal collection corruption detected in "
                                
+ "OrderedHashList.GetValueAtOrder.  Located entry (for arrival order "
                                
+ arrivalOrder.ToString() + " -> key " + key.ToString() + ") was null.");
                            
#endregion
                        }
                    }
                    
else
                    
{
                        #region Report
internal corruption
                        
if (System.Diagnostics.Debugger.IsAttached)
                        
{
                            System.Diagnostics.Debugger.
Break();
                        
}
                        
throw new ApplicationException("Internal collection corruption detected in "
                            
+"OrderedHashList.GetValueAtOrder.  Arrival order (" + arrivalOrder.ToString() + ") was "
                            
+"found, but the associated key was not.");
                        
#endregion
                    }
                }
                
else
                
{
                    
return default(V);
                
}
            }
        }
        #endregion

        #region GetKeyAtArrivalOrder
        
/// <summary>Returns the "key" portion of the OrderedHashListEntry at the
        /// specified arrival order position.  This is an O(1) operation.</summary>
        /// <param name="arrivalOrder">The arrival order associated with an entry.</param>
        /// <returns>The key with the specified arrival order, or null/zero if that order
        /// doesn't exist in the collection.</returns>
        
public K GetKeyAtArrivalOrder(int arrivalOrder)
        
{
            
lock (this._lock)
            
{
                
if (this._keysByArrivalOrder.ContainsKey(arrivalOrder))
                
{
                    
return (K)this._keysByArrivalOrder[arrivalOrder];
                
}
                
else
                
{
                    
return default(K);
                
}
            }
        }
        #endregion

        #region GetArrivalOrderOfKey
        
/// <summary>Returns the arrival order portion of OrderedHashListEntry at the
        /// specified ordinal position.  This is an O(1) operation.</summary>
        /// <param name="key">The key associated with an entry.</param>
        /// <returns>The arrival order of the specified key, or -1 if that order
        /// doesn't exist in the collection.</returns>
        
public int GetArrivalOrderOfKey(K key)
        
{
            
lock (this._lock)
            
{
                
if (this._entriesByKey.ContainsKey(key))
                
{
                    OrderedHashListEntry entry
= (OrderedHashListEntry)this._entriesByKey[key];
                    
if (entry != null)
                    
{
                        
return entry.ArrivalOrder;
                    
}
                    
else
                    
{
                        #region Report
internal corruption
                        
if (System.Diagnostics.Debugger.IsAttached)
                        
{
                            System.Diagnostics.Debugger.
Break();
                        
}
                        
throw new ApplicationException("Internal collection corruption detected in "
                            
+ "OrderedHashList.GetValueAtArrivalOrder.  Located entry (for key "
                            
+ key.ToString() + ") was null.");
                        
#endregion
                    }
                }
                
else
                
{
                    
return -1;
                
}
            }
        }
        #endregion

        #region GetKeys
        
/// <summary>Returns a list of keys that have arrival orders between to numbers.
        /// This is an O(log n) operation.</summary>
        /// <param name="fromArrivalOrder">The arrival order to start from (inclusive).</param>
        /// <param name="toArrivalOrder">The arrival order to end at (inclusive).</param>
        /// <returns>A List of keys whose order values fall within the given range.</returns>
        
public List<K> GetKeys(int fromArrivalOrder, int toArrivalOrder)
        
{
            List
<K> keys = new List<K>();

            
lock (this._lock)
            
{
                
try
                
{
                    
if (this._entriesByKey.Count > 0)
                    
{
                        
int fromIndex;
                        
int toIndex;
                        
#region Find approximate start and end (may not be exact)
                        
try
                        
{
                            fromIndex
= Math.Max(0, this.IndexOfArrivalOrder(fromArrivalOrder, true) - 1);
                            
toIndex = Math.Min(this.IndexOfArrivalOrder(toArrivalOrder, true) + 1,
                                
this._keysInArrivalOrder.Count - 1);
                        
}
                        #region
catch ...
                        
catch (Exception ex)
                        
{
                            
if (System.Diagnostics.Debugger.IsAttached)
                            
{
                                System.Diagnostics.Debugger.
Break();
                            
}
                            
throw new ApplicationException("Unexpected exception in OrderedHashList.GetKeys "
                                
+ "determining start and end positions to retrieve: " + ex.Message, ex);
                        
}
                        #endregion
                        #endregion

                        #region Refine start and end indexes to be inclusive
                        
try
                        
{
                            
bool indexReady = false;
                            
while ((!indexReady) &amp;&amp; (fromIndex < this._keysInArrivalOrder.Count))
                            
{
                                K key
= this._keysInArrivalOrder[fromIndex];
                                
OrderedHashListEntry entry = (OrderedHashListEntry)this._entriesByKey[key];
                                
if (entry.ArrivalOrder >= fromArrivalOrder)
                                
{
                                    indexReady
= true;
                                
}
                                
else
                                
{
                                    fromIndex
++;
                                
}
                            }
                            indexReady
= false;
                            
while ((!indexReady) &amp;&amp; (toIndex >= 0))
                            
{
                                K key
= this._keysInArrivalOrder[toIndex];
                                
OrderedHashListEntry entry = (OrderedHashListEntry)this._entriesByKey[key];
                                
if (entry.ArrivalOrder <= toArrivalOrder)
                                
{
                                    indexReady
= true;
                                
}
                                
else
                                
{
                                  toIndex
--;
                                
}
                            }
                            toIndex
= Math.Max(toIndex, fromIndex);
                        
}
                        #region
catch ...
                        
catch (Exception ex)
                        
{
                            
if (System.Diagnostics.Debugger.IsAttached)
                            
{
                                System.Diagnostics.Debugger.
Break();
                            
}
                            
throw new ApplicationException("Unexpected exception in OrderedHashList.GetKeys "
                                
+ "refining start and end positions to retrieve: " + ex.Message, ex);
                        
}
                        #endregion
                        #endregion

                        #region Retrieve set
                        
try
                        
{
                            
if (fromIndex < this._keysInArrivalOrder.Count)
                            
{
                                keys.AddRange
(this._keysInArrivalOrder.GetRange(fromIndex, toIndex - fromIndex + 1));
                            
}
                        }
                        #region
catch ...
                        
catch (Exception ex)
                        
{
                            
if (System.Diagnostics.Debugger.IsAttached)
                            
{
                                System.Diagnostics.Debugger.
Break();
                            
}
                            
throw new ApplicationException("Unexpected exception in OrderedHashList.GetKeys "
                                
+ "retrieving keyset: " + ex.Message, ex);
                        
}
                        #endregion
                        #endregion
                    }
                }
                #region
catch ...
                
catch (Exception ex)
                
{
                    
if (System.Diagnostics.Debugger.IsAttached)
                    
{
                        System.Diagnostics.Debugger.
Break();
                    
}
                    
throw ex;
                
}
                #endregion
            }

            
return keys;
        
}
        #endregion

        #region Indexer
        
/// <summary>Indexer that returns the value of the OrderedHashListEntry
        /// specified by the given key.  This is an O(1) operation.</summary>
        /// <param name="key"></param>
        /// <returns></returns>
        
public V this[K key]
        {
            get
            {
                
lock (this._lock)
                
{
                    
if (this._entriesByKey.ContainsKey(key))
                    
{
                        OrderedHashListEntry entry
= (OrderedHashListEntry)this._entriesByKey[key];
                        
if (entry != null)
                        
{
                            
return entry.Value;
                        
}
                    
else
                        
{
                            #region Report
internal corruption
                            
if (System.Diagnostics.Debugger.IsAttached)
                            
{
                                System.Diagnostics.Debugger.
Break();
                            
}
                            
throw new ApplicationException("Internal collection corruption detected in "
                                
+ "OrderedHashList indexer. Key (" + key.ToString() + ") was found, but "
                                
+ "the associated entry was null.");
                            
#endregion
                        }
                    }
                    
else
                    
{
                        
return default(V);
                    
}
                }
            }
        }
        #endregion
        #endregion

        #region
Private Helpers
        #region IndexOfArrivalOrder
(and helper)
        
private int IndexOfArrivalOrder(int arrivalOrder, bool closeEnough)
        
{
            
// already locked by caller
            
if ((arrivalOrder == 0) &amp;&amp; (closeEnough))
            
{
                
return 0;
            
}
            
else if (this._keysInArrivalOrder.Count == 0)
            
{
                
return -1;
            
}
            
else
            
{
                
this.PurgeDeletedKeysInArrivalOrder();

                
int depth = 0;
                
int index = -1;
                
index = this.IndexOfArrivalOrder_Helper(depth, arrivalOrder, closeEnough, 0,
                    
this._keysInArrivalOrder.Count - 1);
                
return index;
            
}
        }

        
private int IndexOfArrivalOrder_Helper(int depth, int arrivalOrder, bool closeEnough,
            
int fromIndex, int toIndex)
        
{
            
// already locked by caller

            
#region Detect stack overflows
            
if (depth > 32)
            
{
                
if (System.Diagnostics.Debugger.IsAttached)
                
{
                    System.Diagnostics.Debugger.
Break();
                
}
                
throw new ApplicationException("Stack overflow condition detected in "
                    
+ "OrderedHashList.IndexOfArrivalOrder_Helper.");
            
}
            #endregion

            
int testIndex = (fromIndex + toIndex) / 2;
            
K testKey;
            
try
            
{
                testKey
= this._keysInArrivalOrder[testIndex];
            
}
            #region
catch ...
            
catch (Exception ex)
            
{
                
if (System.Diagnostics.Debugger.IsAttached)
                
{
                    System.Diagnostics.Debugger.
Break();
                
}
                
throw new ApplicationException("OrderedHashList.IndexOfArrivalOrder_Helper had an exception "
                    
+ "retrieving index " + testIndex.ToString() + " from key list (count: "
                    
+is._keysInArrivalOrder.Count.ToString() + "): " + ex.Message, ex);
            
}
            #endregion

            
if (this._entriesByKey.ContainsKey(testKey))
            
{
                OrderedHashListEntry entry
= (OrderedHashListEntry)this._entriesByKey[testKey];
                
if (entry != null)
                
{
                    
if (entry.ArrivalOrder == arrivalOrder)
                    
{
                        
return testIndex;
                    
}
                    
else if (fromIndex == toIndex)
                    
{
                        
if (closeEnough)
                        
{
                            
return fromIndex;
                        
}
                        
else
                        
{
                            
return -1;
                        
}
                    }
                    
else
                    
{
                        
if (entry.ArrivalOrder > arrivalOrder)
                        
{
                            
return this.IndexOfArrivalOrder_Helper(depth + 1, arrivalOrder, closeEnough,
                                
fromIndex, Math.Max(fromIndex, testIndex - 1));
                        
}
                        
else
                        
{
                            
return this.IndexOfArrivalOrder_Helper(depth + 1, arrivalOrder, closeEnough,
                                
Math.Min(testIndex + 1, toIndex), toIndex);
                        
}
                    }
                }
                
else
                
#region Report internal corruption
                {
                    
if (System.Diagnostics.Debugger.IsAttached)
                    
{
                        System.Diagnostics.Debugger.
Break();
                    
}
                    
throw new ApplicationException("Internal collection corruption detected in "
                        
+ "OrderedHashList.IndexOfArrivalOrder_Helper.  Key (" + testKey.ToString()
                        +
") was found, but the associated entry was null.");
                
}
                #endregion
            }
            
else
            
{
                #region Report
internal corruption
                
bool brute = false;
                
foreach (DictionaryEntry entry in this._entriesByKey)
                
{
                    
if (entry.Key.Equals(testKey))
                    
{
                        brute
= true;
                        
break;
                    
}
                }
                
bool second = this._entriesByKey.ContainsKey(testKey);

                
if (System.Diagnostics.Debugger.IsAttached)
                
{
                    System.Diagnostics.Debugger.
Break();
                
}
                
throw new ApplicationException("Internal collection corruption detected in "
                    
+ "OrderedHashList.IndexOfArrivalOrder_Helper.  No entry for key (" + testKey.ToString() + ")."
                    
+ "brute: " + brute.ToString() + ", second: " + second.ToString());
                
#endregion
            }
        }
        #endregion

        #region PurgeDeletedKeysInArrivalOrder
        
private void PurgeDeletedKeysInArrivalOrder()
        
{
            
// Already locked by caller
            
if (this._deletedKeysInArrivalOrder.Count > 0)
            
{
                List
<K> purgedKeysInOrder = new List<K>();
                
for (int index = 0; index < this._keysInArrivalOrder.Count; index++)
                
{
                    
if (!this._deletedKeysInArrivalOrder.ContainsKey(this._keysInArrivalOrder[index]))
                    
{
                        purgedKeysInOrder.Add
(this._keysInArrivalOrder[index]);
                    
}
                }
                
this._keysInArrivalOrder = purgedKeysInOrder;
                
this._deletedKeysInArrivalOrder.Clear();
            
}
        }
        #endregion
        #endregion
    }
    #endregion


For the latest source of this construct, visit the Kimball Method Slowly Changing Dimension component on CodePlex, and retrieve the OrderedHashList.cs file from the source.