I promise that the second installment of Using Validate and ReinitializeMetaData is coming soon - my past weeks have been hairy, and I hope to have things smoothed out soon.  That topic is quite involved, so I need to have a clear mind to tackle it...
In the meantime, it seems like this gets asked every day in the MSDN SSIS forums.  OK - maybe not every day, but at least every week.  There's plenty of information about it out there, but it's apparently not getting read.  Here's my two cents, and a hope it gets placed high up in Google searching so that people can find it.  I've also phrased it as simply as I can, also in the hope this is the last place people need to look.  I talk about rules and tools, and finally, there are PLENTY of examples at the end.  If you're looking for how to convert numbers to dates, look here.
The Rules
This is taken straight from Books Online in regards to casting strings to dates: "When a string is cast to a DT_DATE, or vice versa, the locale of the transformation is used. However, the date is in the ISO format of YYYY-MM-DD, regardless of whether the locale preference uses the ISO format."[1]
Not exactly crystal clear - but my experience translates this into: "When a string is cast to a DT_DATE, or vice versa, the locale of the transformation is used.  However, conversions from string to date will correctly use the ISO format of YYYY-MM-DD regardless of the locale."
So, you can rely on the vagaries (uncertainty) of your development machines and servers' locales, or you can do the smart thing by making everything clear, and specifically, deliberately, reformat your strings into ISO format before you cast to a date type.
Oh, and to be clear, the Derived Column component is the place to do this reformatting - NOT the Convert Data component!  Although the Convert Data component works just fine and performs better, it doesn't give you any of the flexibility of the Derived Column transformation for dealing with the crazy formats you can get dates in - it expects an unambiguous format, and if it doesn't get one, it will guess.
Basic Tools for Reformatting Strings to ISO Format
So - it now becomes clear that converting strings to dates really has very little to do with date formats or date types, and everything to do with string manipulation.  If you're converting from numeric types, say an integer value of 20081231 (for Dec 31, 2008), read on - it's coming next - or see this post.  Here are the tools you're going to use for string manipulation:
SUBSTRING (on MSDN)
This is by far the most common tool for reformatting strings to a date format.  It's most easy to use when you know that your incoming date string is of a specific fixed-length format, but mixed with FINDSTRING and LEN, it's the tool to use with variable length strings as well. 
The function expects you to give it a string in the first argument (called "character expression"), a starting point in the second, and a length in the third.  The first argument can't be NULL - you need to protect against that if it's a possibility in your data flow (see ISNULL later).  You also need to protect against string values that are too "short" - because if your second argument to the SUBSTRING function refers to a position past the end of the string, you'll get a runtime error (see LEN later). 
The "start" argument is a ones-based index, so to get the month out of an ISO formatted date (YYYY-MM-DD), the "start" argument should be 6. 
The "length" argument must be specified, and be greater than zero (if you happen to use a formula with FINDSTRING to get it).
SUBSTRING will return a string starting at the start index, with the specified number of characters (which includes that start character).
Padding With Spaces or Zeroes
Regardless of the incoming format of the date, in order to get it into a date, you have to match the YYYY-MM-DD ISO format.  This means that a format of YYYY-M-D won't cut it.  You have to pad out the string value you're casting to the fixed-length format.  Contrary to what you may think, you do not have to pad it with zeroes - it's quite capable of transforming "2008-1 -10" into January 10th, 2008.
With that in mind, how do you "pad" with the SSIS expression language?  Most of the time, it's not necessary.  Frequently, your input data will have fixed length strings that you're just rearranging.  Occasionally, your source will have number values, and those pad themselves when cast to a string - for example if you have "month" as an integer column, including (DT_WSTR, 2)[month] in an expression will result in a two-character string, padded with spaces.
Your only need to pad comes with variable-length input strings, and all you need to do is use string concatenation and the RIGHT function.  For example, use "2008-1-10" as a sample variable length input referring to January 10th, 2008.  When you extract the "month" component (see FINDSTRING later), you'll get "1".  To turn that into a two-character string whether it's a single or double character result, simply use RIGHT("0" + [month string], 2).  I used a zero to left pad that, as it's more visible in your code than a space - but either will work fine.
FINDSTRING (on MSDN)
Used for those times when your input string is of variable length, most typically seen when an input string is in a specific format (like ISO) - except that it isn't "padded" with zeroes.  For example: "2008-1-10" for January tenth, 2008.
FINDSTRING expects an input string as the first argument - which will most likely be your raw input column.  Again, just like SUBSTRING you have to protect against NULLs (see ISNULL later), and "too short" strings (see LEN later).
The second argument is the substring you're "looking for" - typically a delimiter of some sort, and also typically a single character.
The third argument is optional - the "occurrence" you're looking for.  In the above example ("2008-1-10"), we could use this argument to get the position of the second dash, by passing "2" as the "occurrence" argument when looking for "-".
FINDSTRING will return the character index (ones-based) of the found character, or zero if it isn't found.  Keep in mind that you'll typically be using FINDSTRING to provide SUBSTRING's second or third arguments (start and length) - and neither of those accept a zero value.
LEN (on MSDN)
When you use FINDSTRING, you'll also typically have to drag out LEN to help - and vice versa.  LEN also comes in handy for handling bad data.  LEN, as with all the string functions, can't handle NULLs, so you have to check for that first (see ISNULL later).
First, I'll describe using LEN in variable length data with FINDSTRING.  If we use the same example as in FINDSTRING ("2008-1-10"), you'll soon see that in order to pull out the "10" value, we use FINDSTRING to find out where it starts (the second dash), but we also need to supply the length of substring we want to retrieve to SUBSTRING.  In the example, this is NOT strictly necessary.  We could simply pass "2" to SUBSTRING, even if our example was "2008-1-1", and we'd get a legal (and working) expression.  However, there are cases where there may be dummy characters at the end of your string, and then LEN can become necessary.
Second, we may need to do some quality checking of the incoming data to reject or otherwise "handle" invalid date strings.  LEN works very well for this - either in detecting blanks (LEN([col]) = 0), or obviously invalid values (LEN([col]) < minimum length of a valid date string).
The only argument to LEN is the string whose length you need to know, and the return value is the string's length.
REPLACE (on MSDN)
Another common issue in converting strings to dates is with junk characters, or "named" months.
If your incoming date string contains some characters that sometimes occur, but that you want or need to discard, a common technique is to use REPLACE to replace them with nothing (rather than use FINDSTRING and SUBSTRING to detect and replace them).
To handle month names, you may be more comfortable with nesting REPLACE functions instead of having a rather large nested conditional operator statement (see Conditional Operator later).
Other Tools and Techniques
For those of you who are converting other types into dates (numbers or month names, for example), or who have unreliable incoming data, these other expression functions and techniques will help.
ISNULL (on MSDN)
This is a common function when dealing with unreliable data - as it weeds out values that will cause errors in the string functions you want to use for valid data.  You can use the ISNULL function to detect NULLs in a Conditional Split component before you pass the flow to your Derived Column component.  Or, in conjunction with the Conditional Operator expression, you can use it inside your Derived Column expression to supply a default value for NULLs.  The ISNULL function simply takes an expression and returns true if the expression evaluates to NULL, and false otherwise.
Conditional Operator (on MSDN)
The Conditional Operator ("?") is the SSIS way of expressing "IF".  It's actually extremely familiar to VB users if they think in terms of an IIF, or C# users' ? ternary operator.  There are three "arguments" (parts, really) to a Conditional Operator expression - and I recommend enclosing each part in parentheses, as well as the whole thing in parentheses to separate everything clearly.
The operator is used in this format: (condition) ? (true expression) : (false expression). 
The "condition" argument must evaluate to true or false.  Typically, the condition is going to be ISNULL([date string]), or a comparison to a literal string such as [date string] = "October".
The "true expression" and "false expression" can evaluate to any data type you want - but in our situation, that's most likely to be some kind of string expression that will later get converted into a date type.
Nesting Functions
It is quite common to have to nest the above functions and techniques deeply - to the point of being shocking if you don't have to at all.  "Nesting" means using the results of some of the above functions as arguments for more functions.  A common occurrence is using the results of FINDSTRING and LEN to supply the second and third arguments to SUBSTRING.  Since it's quite a common requirement, here are some tips to use nesting reasonably:
* Use parentheses.  It's maddeningly unclear to me - especially in Conditional Operator use - if parentheses are not used to group up the parts.
* Don't nest too far.  The editor for expressions in the Derived Column component isn't great - it's a single line scroller.  If your expression is complex and nested deep, then debugging it will be very hard.  It's a judgement call on your part as to how complicated you let it get.  What's your alternative?  Cascading.
Cascading Derived Column Components
Instead of having a thousand character expression to decode a single column, or repeating the same SUBSTRING or FINDSTRING functions three or four times, you might want to cascade Derived Column components.
When I say cascading, I'm referring to placing two (or more) Derived Column transformations on your data flow one after the other.  The output from the first Derived Column transformation is input into the second, and so on.  With this technique, you can decompose a VERY complex reformatting problem into a very simple step by step approach. 
If you are reformatting a variable length string, this can be quite effective.  Use the first Derived Column transform to create columns that contain the "answers" to your FINDSTRING questions.  For our running example of "2008-1-10", you can create a column called "month starts at index" that gets filled with FINDSTRING([date string], "-") + 1, and a column called "month ends at" that gets filled with FINDSTRING([date string], "-", 2) - 1.  Your second Derived Column transform would then have a very easy expression for SUBSTRING to retrieve the month value.
If you have to reformat month names into numbers, this technique can also help quite a lot.  Have your first Derived Column transform to the conversion of month names to numbers, using nested REPLACEs or Conditional Operators, and have the second Derived Column transform use the results to reformat the date string.  The net effect doesn't reduce the size of the total decoding expression, but it does separate the month name decoding into one chunk which can be easily debugged independently of the rest of the reformatting logic.
Error Redirection
In some cases, you'll have to deal with really cruddy data.  As always, you should gracefully handle problems that your reformatting doesn't anticipate.  Use the error output to log or otherwise handle your errors - it's the smart thing to do.
Regular Expressions
This tool is for particularly nasty or very dirty/untrustworthy data.  Unfortunately, the Derived Column component doesn't have any regular expression support in it.  But luckily, there are several other custom components that do.  The SSIS Community Tasks and Components project at CodePlex has links to several regex components.  As of this updated post, there's one from Microsoft and two from SQLIS.com (aka Konesans) that are free to use.
The Last Step - Conversion
After all that work to reformat your string (or number(s)) into ISO format, the last step is to convert that string into a date.  Any date type in SSIS will work, but I typically use DT_DBTIMESTAMP, as that's equivalent to the standard MS SQL Server "DateTime" datatype.  Perform that conversion like this:
(DT_DBTIMESTAMP)[ISO Format String Column]
Conversion Examples
The given examples all assume your date is coming in a single string-type column called "date string", unless otherwise specified.  All the examples also convert the incoming values into DT_DBTIMESTAMP - although you can interchange that to use any SSIS date type.
Consistent "YYYYMMDD" format
Easy:
(DT_DBTIMESTAMP)(SUBSTRING([date string], 1, 4) + "-" SUBSTRING([date string], 5, 2) + "-" + SUBSTRING([date string], 7, 2))
Consistent "MM/DD/YYYY" format
(DT_DBTIMESTAMP)(SUBSTRING([date string], 7, 4) + "-" SUBSTRING([date string], 1, 2) + "-" + SUBSTRING([date string], 4, 2))
"MM/DD/YYYY" format, sometimes NULL
(ISNULL([date string])) ? NULL(DT_DBTIMESTAMP) : (DT_DBTIMESTAMP)(SUBSTRING([date string], 7, 4) + "-" SUBSTRING([date string], 1, 2) + "-" + SUBSTRING([date string], 4, 2))
"MM/DD/YYYY" format, sometimes NULL, sometimes empty or truncated
(ISNULL([date string])) ? NULL(DT_DBTIMESTAMP) : (LEN([date string]) < 10) ? (DT_DBTIMESTAMP)"1900-01-01" : (DT_DBTIMESTAMP)(SUBSTRING([date string], 7, 4) + "-" SUBSTRING([date string], 1, 2) + "-" + SUBSTRING([date string], 4, 2))
Variable length "MM/DD/YYYY" format
(DT_DBTIMESTAMP)(SUBSTRING([date string], FINDSTRING([date string], "/", 2) + 1, 4) + "-" + RIGHT("0" + SUBSTRING([date string], 1, FINDSTRING([date string], "/") - 1), 2) + "-" + RIGHT("0" + SUBSTRING([date string], FINDSTRING([date string], "/") + 1, FINDSTRING([date string], "/", 2) - FINDSTRING([date string], "/") - 1), 2))
"MMM DD, YYYY" format
Nested Conditional Operator solution:
(DT_DBTMESTAMP)(SUBSTRING([date string], 9, 4) + "-" + (UPPER(SUBSTRING([date string], 1, 3)) = "JAN" ? "01" : UPPER(SUBSTRING([date string], 1, 3)) = "FEB" ? "02" : UPPER(SUBSTRING([date string], 1, 3)) = "MAR" ? "03" : UPPER(SUBSTRING([date string], 1, 3)) = "APR" ? "04" : UPPER(SUBSTRING([date string], 1, 3)) = "MAY" ? "05" : UPPER(SUBSTRING([date string], 1, 3)) = "JUN" ? "06" : UPPER(SUBSTRING([date string], 1, 3)) = "JUL" ? "07" : UPPER(SUBSTRING([date string], 1, 3)) = "AUG" ? "08" : UPPER(SUBSTRING([date string], 1, 3)) = "SEP" ? "09" : UPPER(SUBSTRING([date string], 1, 3)) = "OCT" ? "10" : UPPER(SUBSTRING([date string], 1, 3)) = "NOV" ? "11" : UPPER(SUBSTRING([date string], 1, 3)) = "DEC" ? "12" : "00") "-" + SUBSTRING([date string], 5, 2))
Cascading Derived Column transform solution (taken to an extreme - you don't have to go this far):
Derived Column transform #1
create a "year" column from this expression: SUBSTRING([date string], 9, 4)
create a "month name" column from this expression: UPPER(SUBSTRING([date string], 1, 3))
create a "day" column from this expression: SUBSTRING([date string], 5, 2)
Derived Column transform #2
create a "month number" column from this expression: ([month name] == "JAN") ? "01" : ([month name] == "FEB") ? "02" : ([month name] == "MAR") ? "03" : ([month name] == "APR") ? "04" : ([month name] == "MAY") ? "05" : ([month name] == "JUN") ? "06" : ([month name] == "JUL") ? "07" : ([month name] == "AUG") ? "08" : ([month name] == "SEP") ? "09" : ([month name] == "OCT") ? "10" : ([month name] == "NOV") ? "11" : ([month name] == "DEC") ? "12" : "00"
Derived Column transform #3
(DT_DBTIMESTAMP)([year] + "-" + [month number] + "-" + [day])
Numeric Date YYYYMMDD (One Column) format
These solutions assume the single column containing the integer date is called "numeric date"
Nested function solution:
(DT_DBTIMESTAMP)((DT_WSTR, 4)ROUND([numeric date] / 10000, 0) + "-" + (DT_WSTR, 2)(ROUND([numeric date] / 100, 0) - (ROUND([numeric date] / 10000, 0) * 100)) + "-" + (DT_WSTR, 2)([numeric date] - (ROUND([numeric date] / 100, 0) * 100)))
Conversion to string solution:
(DT_DBTIMESTAMP)(SUBSTRING((DT_WSTR, 8)[numeric date], 1, 4) + "-" + SUBSTRING((DT_WSTR, 8)[numeric date], 5, 2) + "-" + SUBSTRING((DT_WSTR, 8)[numeric date], 7, 2))
Cascading Derived Column solution (taken to an extreme, agan - you don't have to go this far):
Derived Column transform #1
create a "year" column from this expression: ROUND([numeric date] / 10000, 0)
Derived Column transform #2
create a "month" column from this expression: ROUND([numeric date] / 100, 0) - ([year] * 100)
Derived Column transform #3
create a "day" column from this expression: [numeric date] - ([year] * 10000) - ([month] * 100)
Derived Column transform #4
(DT_DBTIMESTAMP)((DT_WSTR, 4)[year] + "-" + (DT_WSTR, 2)[month] + "-" + (DT_WSTR, 2)[day])
Questions?
If you have any questions about how to convert your particular data into a date, please post a comment here, or over at the SSIS forums.  If you have any other techniques for date conversion, suggest them in a comment, and I'll try to fit them in to the above.  I hope this helps you guys, and reduces the post frequency about this topic in the forums!
References
[1] Taken from Cast (SSIS), about halfway down, just below the table that's below the big legal/illegal conversion chart.
UPDATE 2008-11-18: Clarified that the Derived Component transform is typically preferable to the Convert Data transform.
UPDATE 2009-01-23: Added Regular Expression tool information
UPDATE 2009-12-08: See Jamie Thomson's blogs on ambiguous date/time formats (original, repackaged w/ SQL 2008 info)
UPDATE 2010-08-04: The SSIS Team posted another tidbit on this subject - see how your choice of data type can also mess you up.
44

View comments

Loading