Tuesday, May 31, 2011

Use Connections Properly in an SSIS Script Task

There are lots of scenarios in SSIS where you'd want to use a connection within a Script Task.  Unfortunately, the closest help documentation - the comments inside the Script Task itself - give you exceptionally poor advice:
To use the connections collection use something like the following:
ConnectionManager cm = Dts.Connections.Add("OLEDB");
cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";

Why is this advice bad?  Let me count the ways:
  1. The connection string is hardcoded inside the script.
  2. The connection can't be used with Package Configurations.
  3. The connection string (and name) imply that OLEDB connections can be used.
  4. As presented, the connection won't participate in MSDTC transactions.
(And seriously, "use something like the following"?  Hey - I write the vague blog posts here.  They're supposed to be writing documentation. ;) )
The poor advice boils down to the fact that it appears to create a "managed" connection, but doesn't truly do so.  In fact, if you attempt to use code in the comment as is - you're only halfway there.  How do you actually use this "ConnectionManager" object anyway?  There's no "Open" method, or Command object - just weird methods and properties that would be entirely unfamiliar to you even if you know .Net data access!  And to add insult to injury, even if you did know where to go from here, the example provided will only cause an error, and will never work.
What is a Managed Connection?
Lets start with describing what a "managed" connection is (in the context of SQL Server Integration Services).  It is simply a connection that's defined and controlled by an entry in the "Connection Managers" window at the bottom of the design surface in BIDS.
Managed connections have the following benefits:
  • Their properties (primarily the connection string) can be manipulated with Package Configurations.
  • Their properties (again, the connection string) can be manipulated with Property Expressions.
  • They can participate in SSIS (MSDTC) transactions.
  • The connection is tested by default when the package is loaded, before any (potentially unrecoverable) work is started.
But perhaps most importantly, using a managed connection within a script (properly) informs someone maintaining the package that a connection to an external resource is being used.
How is the Advice Wrong?
The comment in the Script Task counsels you to use Dts.Connections.Add to get a reference to a ConnectionManager object.  This is the correct type of object - it's a managed connection.  Unfortunately, using that syntax, you're creating a new one, not using one of the managed connections you've already defined in the package.  Since it's an entirely new connection, you have to set up the connection string.  You could pass the connection string in to the script using an SSIS variable, or read it from a file - but these are both non-standard practices.  Another ETL developer or administrator familiar with SSIS won't anticipate your decision to implement a connection that way.
Additionally, since it's a brand new object, none of the regular design time facilities are available to you; Package Configurations, Property Expressions, the TransactionOption property, and runtime validation don't work.
The Right Way to Use Managed Connections
Ignore the comments in the Script Task - use managed connections properly by following these steps.
1. Create an ADO.Net Connection
Add a Connection Manager to your package the same way you'd make one for use in a Data Flow or Execute SQL Task.  Right-click in the Connection Managers pane and add a new ADO.Net connection.  You can use OLEDB type connections and other connection types - but they're more difficult to use and have fewer capabilities inside a Script.
2. Retrieve a Reference to the Connection
Inside your Script Task, instead of using Dts.Connections.Add, retrieve a reference to an existing connection manager by using the Dts.Connections indexer.  For example, if you happen to have a connection manager named "(local).master", then use this code to retrieve a reference to the ConnectionManager object:
ConnectionManager cm = Dts.Connections["(local).master"];
3. For Connections with Providers, Acquire a Connection
(A "provider" means any connection that has code supporting and managing the connection - like an ADO, OLEDB, or Excel connection, but unlike a Flat File connection.)
Even though you now have a reference to the ConnectionManager object, you need to acquire an actual connection to the resource, and request that connection be established in a way that SSIS can manage it.  This step is necessary to ensure that your work with the connection participates in transactions and connection pooling (or lack of it).
The specific details for each connection type vary, but the general method is consistent.
// For an ADO.Net ConnectionManager using a SqlClient provider
System.Data.SqlClient.SqlConnection sqlConn = (System.Data.SqlClient.SqlConnection)cm.AcquireConnection(Dts.Transaction);
// For an ADO.Net ConnectionManager using an ODBC provider
System.Data.Odbc.OdbcConnection odbcConn = (System.Data.Odbc.OdbcConnection)cm.AcquireConnection(Dts.Transaction);
Be aware that it is NOT necessary (or advisable) to use the Open method on the connection object you just retrieved... it's already open.  (Check the connection's State property if you don't believe me.)
4. Use the Connection
At this point, you can use the connection to retrieve datasets, execute commands, or do just about anything else you'd want to do with your connection.
5. Release the Connection Properly
Once you've finished your work, you need to clean up after yourself - but NOT by using the Close method on the connection!  You'll need to use the ReleaseConnection method of the ConnectionManager instead.
How Hard Was That?
Hopefully not so hard as to make you want to go the other direction and use completely unmanaged connections in your scripts!  To recap, here's a complete code listing of how to correctly use an ADO.Net managed connection for a SqlClient provider within a Script Task:
bool fireAgain = true;
ConnectionManager cm;
System.Data.SqlClient.SqlConnection sqlConn;
System.Data.SqlClient.SqlCommand sqlComm;
int rowsAffected;

// Retrieve the reference to the managed connection
cm = Dts.Connections["(local).master"];
// Request an open connection
sqlConn = (System.Data.SqlClient.SqlConnection)cm.AcquireConnection(Dts.Transaction);
Dts.Events.FireInformation(0, "", "Connection is: " + sqlConn.State.ToString(), "", 0, ref fireAgain);
// Do your work
sqlComm = new System.Data.SqlClient.SqlCommand("UPDATE YourTable SET YourColumn = 'SomeValue'", sqlConn);
rowsAffected = sqlComm.ExecuteNonQuery();
// Inform SSIS you're done your work
cm.ReleaseConnection(sqlConn);
           
Dts.Events.FireInformation(0, "", rowsAffected.ToString() + " rows updated.", "", 0, ref fireAgain);
           
Dts.TaskResult = (int)ScriptResults.Success;

Tuesday, May 24, 2011

Idiot's Guide to SSAS Attribute Relationships

Yes, that's me - and this is a reference post because frankly, this stuff is too complicated to keep in my head for how frequently I use it.
The Problem

The SSAS Idiot
When you get the "Errors in the OLAP storage engine: The attribute key cannot be found when processing." error returned when you're trying to process an SSAS cube, what does that really mean?  I have a previous post where I went into some more technical reasons that can cause this error, but there's usually a pretty simple cause related to how you've structured your dimension in the designer... your attribute relationships. Attribute Relationships
Specifying good attribute relationships are something that the experts say are a very basic and necessary step to configuring a cube that performs well.  Attribute relationships tell SSAS that it can take some shortcuts when processing your cube.  What kind of shortcuts?  Something like telling it that since "British Columbia", "Alberta", "Saskatchewan", and all the other provinces are always in "Canada", if you (or a user) asks for a total sales number for Canada, it doesn't have to add up all the invoice line items for Canada - it can go add up the numbers it's stored already for total sales in each province.
So yes, it's very good to specify attribute relationships!  It can turn a million row operation into a ten row operation, which results in better query performance, reduced storage requirements, faster cube processing, and more.  It's a best practice to relate attributes that you've placed in a dimension Hierarchy - because they're going to be used in aggregations very often.
How Are Attribute Relationships Related to the Problem? (Pun Intended)
In the example of provinces and countries I provided, it seems like you'd never have a problem - it's straightforward and simple, right?  Not so fast.  You could have bad data... or not such a simple relationship.  We'll look at both in turn.
What do I mean by "bad data"?  I mean that your data warehouse could have business entities that have bad address information delivered to it by your OLTP system(s).  (You're shocked, I know.)  Bad how?  Perhaps address information is incomplete - it only has the country, they didn't bother to fill in the street, city, or province.  Bad, bad, OLTP data entry validation!  But it happens... all too often.  This means that a "blank" province could be located in multiple countries.
What do I mean by "not a simple relationship"?  Let's take another example - cities.  There's only one Vancouver, right?  The one in British Columbia, Canada.  Not so fast... Vancouver, WA would be having words about that assertion.  Other city names are much more prevalent, and easily demonstrate this particular issue.
Deciphering SSAS's Error Message
OK - so you understand the problem when I explain it with my example... but how do you decipher SSAS's message to figure out how your attribute relationships and your data are causing problems?  Here's the long explanation - I'll cover it again more briefly at the conclusion of the post.  My error message looks like this:
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_vDIM_CUSTOMER', Column: 'ship_to_city', Value: 'Greenland'. The attribute is 'Ship To City'.
To genericize it, I'll replace the actual values with placeholder names that I'll use going forward:
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'DimensionTable', Column: 'AttributeColumn', Value: 'AttributeValue'. The attribute is 'AttributeName'.
Find the "Source Attribute"
Breaking it down, this message means that the attribute in your SSAS dimension called AttributeName, and that's where the problem is.  We'll refer to this as the Source Attribute.
Find the "Related Attribute"
Open BIDS open the dimension that didn't process, and switch to the Attribute Relationships tab - the one with the graph that looks like this (based on my example):
(Now, why the "Ship To City" attribute is in a separate box from the "Dim Customer Key" attribute, but the "Ship To State Province Code" isn't is not something I can explain, so I won't.)  Looking at the list view (down below), I see this:
The important things to notice is that the Source Attribute (in my case) was Ship To City, and it's being "rolled up" or "is contained in" the Ship To State Province Code.  In your case, you need to find the Source Attribute in your relationship graph or list, and find out what other attributes it's related to.  (There might be more than one relationship where your Source Attribute is on the left-hand-side of an arrow.  You'll have to investigate each relationship individually.)
Find the "Source Column" in the Dimension Table
Your Source Attribute came from somewhere in your underlying dimensional database - find out where.  You're given a hint in the error message - the AttributeColumn.
Find the "Related Column" in the Dimension Table
Just like the Source Attribute, the Related Attribute comes from a dimensional database table - find out where.  You'll likely have to switch to the Dimension Structure tab of the dimension editor, select the Related Attribute, open the Properties window (F4), and look at the KeyColumns property.
Now you're ready to query your data.
Querying Your Data
You now have three critical pieces of information: the Source Column, Related Column, and AttributeValue.  In my case, the Source Column was "ship_to_city" in my vDIM_CUSTOMER view, the Related Column was "ship_to_state_province_code" in the same view, and the offending AttributeValue was 'Greenland'.  To find out why SSAS is having a problem, I need to run this query:
SELECT ship_to_city, ship_to_state_province_code, COUNT(*)
FROM vDIM_CUSTOMER
WHERE ship_to_city = 'Greenland'
GROUP BY ship_to_city, ship_to_state_province_code

To genericize the query and relate it to those placeholder names we've been using, it should look like this:
SELECT [Source Attribute], [Related Attribute], COUNT(*)
FROM [DimensionTable]
WHERE [Source Attribute] = AttributeValue
GROUP BY [Source Attribute], [Related Attribute]

When I run that, I got 2 rows back - it shows quite clearly what my problem was:
Yep - OLTP addresses where sometimes the city was specified on a customer record, but the state or province wasn't.  In your case, you should get at LEAST two rows from your query.  If you don't - then this isn't your problem... maybe you have one of these problems instead.
Knowing Is Half The Battle
Fantastic, now that you know where the problem is, you can fix it.  How?  One of two ways:
  1. Decide it's a "data problem" and rectify the underlying data.  Tell a business user to fix the offending address, tell the OLTP developer to add validation to customer address entry, and/or change your ETL process to clean the data.
  2. Decide the data will never get "clean" - because nobody will "own" the dirt, or because it's a valid part of the business.  This happens very often.
In my case, the data you see is dirty.  But the offending business data is very old, and not worth cleaning up.  If I did spend the time to clean up the blanks, I would still be unable to process my cube, because I would find "Springfield" in multiple states, etc...  So even though it is dirty, it is also a business reality that cities don't uniquely belong to state/provinces.
The Other Half - Fixing The Attributes
So if I can't clean the data because it is (mostly) how it's supposed to be... how can I model it in SSAS?  I recognize that certain cities are related to multiple states and provinces... but I still want to (and should) help SSAS optimize it's processing.
I'm sure there are several ways (there always are) - but the following works for me.  If an SSAS expert would like to chime in, please do!
Recognize That "AttributeColumn" Isn't Unique
We have to recognize that the Source Attribute doesn't uniquely identify the Reference Attribute.  Just because I say "Vancouver" doesn't always mean I'm referring to the city in British Columbia.  I'm going to have to uniquely identify cities in order to roll them up into states and provinces, just like you'll have to uniquely identify Source Attributes in order to roll them up into Reference Attributes.
How do we do that?
Inside BIDS, open the Dimension and select the Dimension Structure tab.  Select the Source Attribute from the list and open the Properties window (F4). 
Select the KeyColumn property, and click the ellipsis button (...) to open the Key Column selection dialog.
Pick another column that together with the Source Attribute will uniquely identify a Reference Attribute.  I've chosen to select the Reference Attribute column itself.  You may choose to select another column or columns that will uniquely identify the Reference Attribute.
Since we've just identified two columns to be the key for this attribute, SSAS can't use the default "name" for attribute members (the KeyColumn itself).  It doesn't know how to display specific members of this attribute - you've got to tell it by editing the NameColumn property - just under the KeyColumns property we just changed.  You may decide to simply choose the Source Attribute column (as I did).
The dimension will now successfully process... unless you've got another "duplicate attribute"!
Success! Let's Recap
The short version of what we just did was:
We deconstructed the error message to identify the Source Attribute
We examined the Attribute Relationship tab of the dimension to identify the Reference Attribute
We verified that there was a many-to-one relationship between the Source and Reference Attribute
We made the Source Attribute map one-to-one to the Reference Attribute by including the reference column in the KeyColumns for the Source Attribute.
What I don't know is whether this is the "best" general advice to give, or if there's a better rule of thumb for resolving this issue.  Until I do know, I'll be referring back to this step-by-step quite often, I'm sure...

Wednesday, May 18, 2011

How To: Use the Same Data Flow Column Twice As a Parameter in SSIS

Every once in a while, you'll have a slightly more complex UPDATE statement in an OLE DB Command or Destination.  You'll use an UPDATE statement or a destination table that needsto use data from one column in your data flow several times.  A typical example is a range update:
UPDATE table1
  
SET column1 = ?
  WHERE datecolumn1 <= ?
    AND datecolumn1 > ?

Yes, best practices say that you should probably avoid using the OLE DB Command entirely, but if you aren't updating many records and you don't mind the poor interface, it does get the job done.
What You Can't Do
The OLE DB Command interface asks for the statement (like above) that has question marks as placeholders for the values you're going to pass in.  The OLE DB and ADO.Net Destinations allow you to identify tables, or craft a SELECT statement to identify a table.  They also have a interface for associating those numbered parameters or destination columns to the columns you have available in the data flow.  One of the many faults of this interface is that it doesn't permit two parameters/destination columns to be associated with one column in the data flow - you just can't do it.
Another Splice Won't Hurt...
What You Can Do So if the interface only allows 1:1 associations of parameters in your statement to columns in the data flow... then you just need to duplicate a column in the data flow.  Add a Copy Column transform, or a Derived Column transform to the design surface just prior to the OLE DB Command or Destination transform to create a second column that contains the same data.  Now you've got what you need to configure your OLE DB Command or Destination properly.

Tuesday, May 10, 2011

Runtime Interaction with SSIS Packages - Confirming Actions

In developing my data warehouse ETL processes, there are some packages and SQL Agent jobs that are both necessary to the process of deploying a new version of the ETL system - and yet are extremely dangerous.  These packages "reset" certain control infomation to a known state, which is a destructive operation I want to make sure I think about twice before executing - especially in a QA or production environment.  If I executed this kind of package in development on my local machine, I could use Script Tasks to pop up MessageBoxes to ask for this kind of confirmation.
Are You Sure?

Of course, you can't do that when you're executing packages on a server!  There are also use cases beyond initial data loads.  You may allow your SSIS packages to be callable by a wide range of people or systems... but you may want some oversight or confirmation before those packages are actually run.  Here's one way to balance convenience of access and execution with some protection against unintended execution.
The Design
My basic thought process in finding an alternative to "confirmation" message boxes was:
  • I can't have a popup... how else can the server talk to me?
  • Ah - it can send me an email.  Bonus - it doesn't matter who kicks off the job, I can hardcode an email address in there so I get the message, or I can look up an email in a "responsibility" table so the email gets sent to a designated "process owner."
  • Now how do I talk back to the server?  I can't reply to the email - it doesn't have a mailbox.
  • Ah - it can watch a share for a file to appear that contains a confirmation to continue.
  • So how do I make sure that I don't confuse the package, inadvertently acknowledging confirmation for "request #2" when I intended to confirm "request #1"?  You know how that happens every once in a while - you're editing a document, and some other program "pops up" to ask you a question... but you don't stop typing in time, and your keystrokes end up selecting something and dismissing the dialog!
  • Ah - I can require the file contents to be something... special... and unique to that package execution.  Some kind of code that would be unlikely to mistype unless I spent some serious brain power on it.  Like a GUID.
  • But I don't want to have to navigate to the share, create a new text file, and hand-transcribe a GUID.  I'll probably type it in wrong and have to restart the process again.  (Faulty humans!)
  • Ah - I can attach the file to the email, so all I have to do is save it in the required location.
Yeah - OK already.  So it takes me a while to figure this stuff out...
The Implementation
I'd put up a sample package - but I'm too lazy busy to strip out the sensitive and custom stuff from one of my packages, so I'll just describe it.
Task 1: Add a Share
Find a location on your network and create a share that you give permission to both you (as the package execution confirmer) as well as the account you're going to be executing the SSIS package under.
Task 2: Script - Generate the Confirmation File Contents
Add an SSIS variable called "Confirmation File Name" that contains the UNC path (not driver letter path) to where you want the confirmation file placed.
Pass that variable into the Script Task, along with the MachineName and ExecutionInstanceGUID system variables.  I've chosen the ExecutionInstanceGUID as my "confirmation code" - you can choose something else entirely: a passphrase, or another GUID generated by Guid.NewGuid.  (It's more secure if you generate another GUID rather than use the execution instance - someone could guess your implementation and subvert the email step if they have access to the sysssislog table.)
Paste in code somewhat like this:

System.IO.StreamWriter confirmationFile
 
    (string)Dts.Variables["User::ConfirmationFilePathname"].Value);
confirmationFile.WriteLine(
  Dts.Variables["System::ExecutionInstanceGUID"].Value);
confirmationFile.Close();
Dts.TaskResult = (int)ScriptResults.Success;
= System.IO.File.CreateText(


As you'll see in a bit, I've built the file in the exact spot that I intend to look for it... we'll take care of that.
Task 3: Send the Confirmation
Slap a Send Mail Task down, set it up to send you an email, and attach the file using the ConfirmationFilePathname variable.
Task 4: Delete the Confirmation File
As you pointed out earlier, if I "waited" for you to place that file in the location we want, it wouldn't work - because the file is already there!  So I need to delete it, making sure that I've got a clean spot for you to place your confirmation.
(You could have created the file in a separate location so this isn't an issue - but deleting it was simpler to set up.)
Task 5: Wait... and Watch
You can use a Script to do this entirely - I happen to have used my File Properties Task instead.  It's up to you, but in this step, you'll be waiting for the file to reappear.  How long you wait, and how often you check is also up to you - but I suggest you don't wait indefinitely.  Have your package fail if a certain timeout is reached.  If you use a Script to watch for the file, you'll be using System.IO.File.Exists and System.Threading.Thread.Sleep to do that.
Task 6: Read the File
If you receive your email, decide to confirm the action, and save it to the right share, your package will arrive at this step.  It's now got to open the file and check the contents with another Script Task to make sure you're responding to the proper request.  Here's one with an appropriate level of error handling:

bool fireAgain = true;
string confirmationFileName =
  (string)Dts.Variables["User::ConfirmationFilePathname"].Value;
Dts.TaskResult = (int)ScriptResults.Failure;string contents = "";
try
{
  System.IO.StreamReader confirmationFile;

  confirmationFile = System.IO.File.OpenText(confirmationFileName);
  
Dts.Events.FireInformation(0, "",

    "Confirmation file opened.",
    "", 0, ref fireAgain);
  
contents = confirmationFile.ReadLine();
  
Dts.Events.FireInformation(0, "",

    "Confirmation file contents read.",
    "", 0, ref fireAgain);
  
confirmationFile.Close();

}
catch (Exception ex)
{
  Dts.Events.FireError
(0, "",

    "Unable to open '" + confirmationFileName + "' to "
    
+ "check contents: " + ex.Message,

    "", 0);
}

if (contents == (string)Dts.Variables["System::ExecutionInstanceGUID"].Value)

{
  Dts.Events.FireInformation
(0, "",

    "Confirmation file contents validated.",
    "", 0, ref fireAgain);
  
Dts.TaskResult = (int)ScriptResults.Success;

}
else
{
  Dts.Events.FireError
(0, "",

    "Confirmation file contents failed to "
    + "match expected code.", "", 0);
}

If the confirmation file contains the right code, it will return success, otherwise it will fail with an appropriate message.
Task 7+: Do Your Dirty Work
Your package just got the green light to trash the place (or do whatever critical action was requested)!

Tuesday, May 3, 2011

Iterating Over Columns in the SSIS Script Component

Just recently a question was posed in the MSDN SSIS Forums about how to create a "generic" script component that would read all columns for each row passed through it in the Data Flow.  The standard impetus for this kind of behaviour is a logging or auditing step in the Data Flow.  Now - my first reaction is that you should probably be looking to code a custom component so that you aren't copying and pasting script code everywhere...
But I know that not everyone is comfortable coding a custom component from scratch - you have to be a code addict like me to find joy in that.  So here's a little "gateway drug" for those of you who are itching for the capability.  Perhaps a little taste of it will expose you to the script's shortcomings, and lead you to build your own lab to cook up a custom component to do this job properly.
Location, Location, Location
The big key here, as mentioned by Darren Green in an earlier post on the subject, is where to put code to do this.  It's not in one of the automatically generated stubs - you'll have to tread into slightly less familiar territory - the ProcessInput method override.  The standard stubs (PreExecute, PostExecute, and Input0_ProcessInputRow) are all useless for our purposes - you can safely delete them or ignore them.  Why?  PreExecute doesn't yet have any information about the buffer, and PostExecute is too late (obviously).  Input0_ProcessInputRow is called once per row... but with a carefully wrapped row object that presents the columns in nicely named properties.  There's no way to access the collection of columns.  Only ProcessInput has access to the buffer in a condition that allows iterating over the columns.
Paste This Code
public override void ProcessInput(int InputID, PipelineBuffer Buffer)
{
  
bool fireAgain = true;
  
ComponentMetaData.FireInformation(0, "",
    
Buffer.ColumnCount.ToString() + " columns",
    
"", 0, ref fireAgain);

  
while (Buffer.NextRow())
  
{
    
for (int columnIndex = 0;
      
columnIndex < Buffer.ColumnCount;
      
columnIndex++)
    
{
      
string columnData = null;
      
if (Buffer.IsNull(columnIndex))
      
{
        columnData
= "is NULL";
      
}
      
else
      
{
        BufferColumn columnInfo
= Buffer.GetColumnInfo(columnIndex);
        
switch (columnInfo.DataType)
        
{
          
case DataType.DT_BOOL:
            columnData
= Buffer.GetBoolean(columnIndex).ToString();
            
break;

          
case DataType.DT_WSTR:
            columnData
+= Buffer.GetString(columnIndex);
            
break;

          
// add code to support more data types here

          
default:
            columnData
= "";
            
break;
        
}
      }
      ComponentMetaData.FireInformation
(0, "",
        
"Column " + columnIndex.ToString() + ": " + columnData,
        
"", 0, ref fireAgain);
    
}
  }
  base.ProcessInput
(InputID, Buffer);
}

What's Happening In There?
It doesn't take much explanation - but every little bit helps. 
Initially, a loop has to be constructed to iterate over all the rows we're given.  Do remember that this isn't ALL the rows coming through your Data Flow - it's just one buffer's worth.  This method will get called several times (unless you have very few rows in your flow).
The column then gets checked to see if it's NULL... because NULLs cause quite a problem when you don't expect to see them.
After that, a little inquiry is made to ask for some information about the column.  This bit of code can definitely be optimized out of the loop - it is a burden on the system to ask for it for each and every row!
The reason for getting a little information about the column is apparent in the next block - the switch statement that handles different datatypes differently.  You can extend the code to handle the data types you expect.
Finally, after the loop, don't forget to call the base ProcessInput method.  Why?  Press F12 on that call and you'll see - the base code handles marking the processing as being complete when you've finished seeing all the rows.
Variations On This Theme
The above code isn't the only way to get this done.  You can do away with manually detecting the column data type and simply call ToString on the buffer's indexer - as in:
string columnData = Buffer[columnIndex].ToString();
But do still beware of nulls and other odd results.  Using the "Get" methods specific to the data type do perform faster, and are safer in the long run.