Pages

Thursday, November 13, 2008

Converting Strings to Dates in the Derived Column Component

I promise that the second installment of Using Validate and ReinitializeMetaData is coming soon - my past weeks have been hairy, and I hope to have things smoothed out soon.  That topic is quite involved, so I need to have a clear mind to tackle it...
In the meantime, it seems like this gets asked every day in the MSDN SSIS forums.  OK - maybe not every day, but at least every week.  There's plenty of information about it out there, but it's apparently not getting read.  Here's my two cents, and a hope it gets placed high up in Google searching so that people can find it.  I've also phrased it as simply as I can, also in the hope this is the last place people need to look.  I talk about rules and tools, and finally, there are PLENTY of examples at the end.  If you're looking for how to convert numbers to dates, look here.
The Rules
This is taken straight from Books Online in regards to casting strings to dates: "When a string is cast to a DT_DATE, or vice versa, the locale of the transformation is used. However, the date is in the ISO format of YYYY-MM-DD, regardless of whether the locale preference uses the ISO format."[1]
Not exactly crystal clear - but my experience translates this into: "When a string is cast to a DT_DATE, or vice versa, the locale of the transformation is used.  However, conversions from string to date will correctly use the ISO format of YYYY-MM-DD regardless of the locale."
So, you can rely on the vagaries (uncertainty) of your development machines and servers' locales, or you can do the smart thing by making everything clear, and specifically, deliberately, reformat your strings into ISO format before you cast to a date type.
Oh, and to be clear, the Derived Column component is the place to do this reformatting - NOT the Convert Data component!  Although the Convert Data component works just fine and performs better, it doesn't give you any of the flexibility of the Derived Column transformation for dealing with the crazy formats you can get dates in - it expects an unambiguous format, and if it doesn't get one, it will guess.
Basic Tools for Reformatting Strings to ISO Format
So - it now becomes clear that converting strings to dates really has very little to do with date formats or date types, and everything to do with string manipulation.  If you're converting from numeric types, say an integer value of 20081231 (for Dec 31, 2008), read on - it's coming next - or see this post.  Here are the tools you're going to use for string manipulation:
SUBSTRING (on MSDN)
This is by far the most common tool for reformatting strings to a date format.  It's most easy to use when you know that your incoming date string is of a specific fixed-length format, but mixed with FINDSTRING and LEN, it's the tool to use with variable length strings as well. 
The function expects you to give it a string in the first argument (called "character expression"), a starting point in the second, and a length in the third.  The first argument can't be NULL - you need to protect against that if it's a possibility in your data flow (see ISNULL later).  You also need to protect against string values that are too "short" - because if your second argument to the SUBSTRING function refers to a position past the end of the string, you'll get a runtime error (see LEN later). 
The "start" argument is a ones-based index, so to get the month out of an ISO formatted date (YYYY-MM-DD), the "start" argument should be 6. 
The "length" argument must be specified, and be greater than zero (if you happen to use a formula with FINDSTRING to get it).
SUBSTRING will return a string starting at the start index, with the specified number of characters (which includes that start character).
Padding With Spaces or Zeroes
Regardless of the incoming format of the date, in order to get it into a date, you have to match the YYYY-MM-DD ISO format.  This means that a format of YYYY-M-D won't cut it.  You have to pad out the string value you're casting to the fixed-length format.  Contrary to what you may think, you do not have to pad it with zeroes - it's quite capable of transforming "2008-1 -10" into January 10th, 2008.
With that in mind, how do you "pad" with the SSIS expression language?  Most of the time, it's not necessary.  Frequently, your input data will have fixed length strings that you're just rearranging.  Occasionally, your source will have number values, and those pad themselves when cast to a string - for example if you have "month" as an integer column, including (DT_WSTR, 2)[month] in an expression will result in a two-character string, padded with spaces.
Your only need to pad comes with variable-length input strings, and all you need to do is use string concatenation and the RIGHT function.  For example, use "2008-1-10" as a sample variable length input referring to January 10th, 2008.  When you extract the "month" component (see FINDSTRING later), you'll get "1".  To turn that into a two-character string whether it's a single or double character result, simply use RIGHT("0" + [month string], 2).  I used a zero to left pad that, as it's more visible in your code than a space - but either will work fine.
FINDSTRING (on MSDN)
Used for those times when your input string is of variable length, most typically seen when an input string is in a specific format (like ISO) - except that it isn't "padded" with zeroes.  For example: "2008-1-10" for January tenth, 2008.
FINDSTRING expects an input string as the first argument - which will most likely be your raw input column.  Again, just like SUBSTRING you have to protect against NULLs (see ISNULL later), and "too short" strings (see LEN later).
The second argument is the substring you're "looking for" - typically a delimiter of some sort, and also typically a single character.
The third argument is optional - the "occurrence" you're looking for.  In the above example ("2008-1-10"), we could use this argument to get the position of the second dash, by passing "2" as the "occurrence" argument when looking for "-".
FINDSTRING will return the character index (ones-based) of the found character, or zero if it isn't found.  Keep in mind that you'll typically be using FINDSTRING to provide SUBSTRING's second or third arguments (start and length) - and neither of those accept a zero value.
LEN (on MSDN)
When you use FINDSTRING, you'll also typically have to drag out LEN to help - and vice versa.  LEN also comes in handy for handling bad data.  LEN, as with all the string functions, can't handle NULLs, so you have to check for that first (see ISNULL later).
First, I'll describe using LEN in variable length data with FINDSTRING.  If we use the same example as in FINDSTRING ("2008-1-10"), you'll soon see that in order to pull out the "10" value, we use FINDSTRING to find out where it starts (the second dash), but we also need to supply the length of substring we want to retrieve to SUBSTRING.  In the example, this is NOT strictly necessary.  We could simply pass "2" to SUBSTRING, even if our example was "2008-1-1", and we'd get a legal (and working) expression.  However, there are cases where there may be dummy characters at the end of your string, and then LEN can become necessary.
Second, we may need to do some quality checking of the incoming data to reject or otherwise "handle" invalid date strings.  LEN works very well for this - either in detecting blanks (LEN([col]) = 0), or obviously invalid values (LEN([col]) < minimum length of a valid date string).
The only argument to LEN is the string whose length you need to know, and the return value is the string's length.
REPLACE (on MSDN)
Another common issue in converting strings to dates is with junk characters, or "named" months.
If your incoming date string contains some characters that sometimes occur, but that you want or need to discard, a common technique is to use REPLACE to replace them with nothing (rather than use FINDSTRING and SUBSTRING to detect and replace them).
To handle month names, you may be more comfortable with nesting REPLACE functions instead of having a rather large nested conditional operator statement (see Conditional Operator later).
Other Tools and Techniques
For those of you who are converting other types into dates (numbers or month names, for example), or who have unreliable incoming data, these other expression functions and techniques will help.
ISNULL (on MSDN)
This is a common function when dealing with unreliable data - as it weeds out values that will cause errors in the string functions you want to use for valid data.  You can use the ISNULL function to detect NULLs in a Conditional Split component before you pass the flow to your Derived Column component.  Or, in conjunction with the Conditional Operator expression, you can use it inside your Derived Column expression to supply a default value for NULLs.  The ISNULL function simply takes an expression and returns true if the expression evaluates to NULL, and false otherwise.
Conditional Operator (on MSDN)
The Conditional Operator ("?") is the SSIS way of expressing "IF".  It's actually extremely familiar to VB users if they think in terms of an IIF, or C# users' ? ternary operator.  There are three "arguments" (parts, really) to a Conditional Operator expression - and I recommend enclosing each part in parentheses, as well as the whole thing in parentheses to separate everything clearly.
The operator is used in this format: (condition) ? (true expression) : (false expression). 
The "condition" argument must evaluate to true or false.  Typically, the condition is going to be ISNULL([date string]), or a comparison to a literal string such as [date string] = "October".
The "true expression" and "false expression" can evaluate to any data type you want - but in our situation, that's most likely to be some kind of string expression that will later get converted into a date type.
Nesting Functions
It is quite common to have to nest the above functions and techniques deeply - to the point of being shocking if you don't have to at all.  "Nesting" means using the results of some of the above functions as arguments for more functions.  A common occurrence is using the results of FINDSTRING and LEN to supply the second and third arguments to SUBSTRING.  Since it's quite a common requirement, here are some tips to use nesting reasonably:
* Use parentheses.  It's maddeningly unclear to me - especially in Conditional Operator use - if parentheses are not used to group up the parts.
* Don't nest too far.  The editor for expressions in the Derived Column component isn't great - it's a single line scroller.  If your expression is complex and nested deep, then debugging it will be very hard.  It's a judgement call on your part as to how complicated you let it get.  What's your alternative?  Cascading.
Cascading Derived Column Components
Instead of having a thousand character expression to decode a single column, or repeating the same SUBSTRING or FINDSTRING functions three or four times, you might want to cascade Derived Column components.
When I say cascading, I'm referring to placing two (or more) Derived Column transformations on your data flow one after the other.  The output from the first Derived Column transformation is input into the second, and so on.  With this technique, you can decompose a VERY complex reformatting problem into a very simple step by step approach. 
If you are reformatting a variable length string, this can be quite effective.  Use the first Derived Column transform to create columns that contain the "answers" to your FINDSTRING questions.  For our running example of "2008-1-10", you can create a column called "month starts at index" that gets filled with FINDSTRING([date string], "-") + 1, and a column called "month ends at" that gets filled with FINDSTRING([date string], "-", 2) - 1.  Your second Derived Column transform would then have a very easy expression for SUBSTRING to retrieve the month value.
If you have to reformat month names into numbers, this technique can also help quite a lot.  Have your first Derived Column transform to the conversion of month names to numbers, using nested REPLACEs or Conditional Operators, and have the second Derived Column transform use the results to reformat the date string.  The net effect doesn't reduce the size of the total decoding expression, but it does separate the month name decoding into one chunk which can be easily debugged independently of the rest of the reformatting logic.
Error Redirection
In some cases, you'll have to deal with really cruddy data.  As always, you should gracefully handle problems that your reformatting doesn't anticipate.  Use the error output to log or otherwise handle your errors - it's the smart thing to do.
Regular Expressions
This tool is for particularly nasty or very dirty/untrustworthy data.  Unfortunately, the Derived Column component doesn't have any regular expression support in it.  But luckily, there are several other custom components that do.  The SSIS Community Tasks and Components project at CodePlex has links to several regex components.  As of this updated post, there's one from Microsoft and two from SQLIS.com (aka Konesans) that are free to use.
The Last Step - Conversion
After all that work to reformat your string (or number(s)) into ISO format, the last step is to convert that string into a date.  Any date type in SSIS will work, but I typically use DT_DBTIMESTAMP, as that's equivalent to the standard MS SQL Server "DateTime" datatype.  Perform that conversion like this:
(DT_DBTIMESTAMP)[ISO Format String Column]
Conversion Examples
The given examples all assume your date is coming in a single string-type column called "date string", unless otherwise specified.  All the examples also convert the incoming values into DT_DBTIMESTAMP - although you can interchange that to use any SSIS date type.
Consistent "YYYYMMDD" format
Easy:
(DT_DBTIMESTAMP)(SUBSTRING([date string], 1, 4) + "-" SUBSTRING([date string], 5, 2) + "-" + SUBSTRING([date string], 7, 2))
Consistent "MM/DD/YYYY" format
(DT_DBTIMESTAMP)(SUBSTRING([date string], 7, 4) + "-" SUBSTRING([date string], 1, 2) + "-" + SUBSTRING([date string], 4, 2))
"MM/DD/YYYY" format, sometimes NULL
(ISNULL([date string])) ? NULL(DT_DBTIMESTAMP) : (DT_DBTIMESTAMP)(SUBSTRING([date string], 7, 4) + "-" SUBSTRING([date string], 1, 2) + "-" + SUBSTRING([date string], 4, 2))
"MM/DD/YYYY" format, sometimes NULL, sometimes empty or truncated
(ISNULL([date string])) ? NULL(DT_DBTIMESTAMP) : (LEN([date string]) < 10) ? (DT_DBTIMESTAMP)"1900-01-01" : (DT_DBTIMESTAMP)(SUBSTRING([date string], 7, 4) + "-" SUBSTRING([date string], 1, 2) + "-" + SUBSTRING([date string], 4, 2))
Variable length "MM/DD/YYYY" format
(DT_DBTIMESTAMP)(SUBSTRING([date string], FINDSTRING([date string], "/", 2) + 1, 4) + "-" + RIGHT("0" + SUBSTRING([date string], 1, FINDSTRING([date string], "/") - 1), 2) + "-" + RIGHT("0" + SUBSTRING([date string], FINDSTRING([date string], "/") + 1, FINDSTRING([date string], "/", 2) - FINDSTRING([date string], "/") - 1), 2))
"MMM DD, YYYY" format
Nested Conditional Operator solution:
(DT_DBTMESTAMP)(SUBSTRING([date string], 9, 4) + "-" + (UPPER(SUBSTRING([date string], 1, 3)) = "JAN" ? "01" : UPPER(SUBSTRING([date string], 1, 3)) = "FEB" ? "02" : UPPER(SUBSTRING([date string], 1, 3)) = "MAR" ? "03" : UPPER(SUBSTRING([date string], 1, 3)) = "APR" ? "04" : UPPER(SUBSTRING([date string], 1, 3)) = "MAY" ? "05" : UPPER(SUBSTRING([date string], 1, 3)) = "JUN" ? "06" : UPPER(SUBSTRING([date string], 1, 3)) = "JUL" ? "07" : UPPER(SUBSTRING([date string], 1, 3)) = "AUG" ? "08" : UPPER(SUBSTRING([date string], 1, 3)) = "SEP" ? "09" : UPPER(SUBSTRING([date string], 1, 3)) = "OCT" ? "10" : UPPER(SUBSTRING([date string], 1, 3)) = "NOV" ? "11" : UPPER(SUBSTRING([date string], 1, 3)) = "DEC" ? "12" : "00") "-" + SUBSTRING([date string], 5, 2))
Cascading Derived Column transform solution (taken to an extreme - you don't have to go this far):
Derived Column transform #1
create a "year" column from this expression: SUBSTRING([date string], 9, 4)
create a "month name" column from this expression: UPPER(SUBSTRING([date string], 1, 3))
create a "day" column from this expression: SUBSTRING([date string], 5, 2)
Derived Column transform #2
create a "month number" column from this expression: ([month name] == "JAN") ? "01" : ([month name] == "FEB") ? "02" : ([month name] == "MAR") ? "03" : ([month name] == "APR") ? "04" : ([month name] == "MAY") ? "05" : ([month name] == "JUN") ? "06" : ([month name] == "JUL") ? "07" : ([month name] == "AUG") ? "08" : ([month name] == "SEP") ? "09" : ([month name] == "OCT") ? "10" : ([month name] == "NOV") ? "11" : ([month name] == "DEC") ? "12" : "00"
Derived Column transform #3
(DT_DBTIMESTAMP)([year] + "-" + [month number] + "-" + [day])
Numeric Date YYYYMMDD (One Column) format
These solutions assume the single column containing the integer date is called "numeric date"
Nested function solution:
(DT_DBTIMESTAMP)((DT_WSTR, 4)ROUND([numeric date] / 10000, 0) + "-" + (DT_WSTR, 2)(ROUND([numeric date] / 100, 0) - (ROUND([numeric date] / 10000, 0) * 100)) + "-" + (DT_WSTR, 2)([numeric date] - (ROUND([numeric date] / 100, 0) * 100)))
Conversion to string solution:
(DT_DBTIMESTAMP)(SUBSTRING((DT_WSTR, 8)[numeric date], 1, 4) + "-" + SUBSTRING((DT_WSTR, 8)[numeric date], 5, 2) + "-" + SUBSTRING((DT_WSTR, 8)[numeric date], 7, 2))
Cascading Derived Column solution (taken to an extreme, agan - you don't have to go this far):
Derived Column transform #1
create a "year" column from this expression: ROUND([numeric date] / 10000, 0)
Derived Column transform #2
create a "month" column from this expression: ROUND([numeric date] / 100, 0) - ([year] * 100)
Derived Column transform #3
create a "day" column from this expression: [numeric date] - ([year] * 10000) - ([month] * 100)
Derived Column transform #4
(DT_DBTIMESTAMP)((DT_WSTR, 4)[year] + "-" + (DT_WSTR, 2)[month] + "-" + (DT_WSTR, 2)[day])
Questions?
If you have any questions about how to convert your particular data into a date, please post a comment here, or over at the SSIS forums.  If you have any other techniques for date conversion, suggest them in a comment, and I'll try to fit them in to the above.  I hope this helps you guys, and reduces the post frequency about this topic in the forums!
References
[1] Taken from Cast (SSIS), about halfway down, just below the table that's below the big legal/illegal conversion chart.
UPDATE 2008-11-18: Clarified that the Derived Component transform is typically preferable to the Convert Data transform.
UPDATE 2009-01-23: Added Regular Expression tool information
UPDATE 2009-12-08: See Jamie Thomson's blogs on ambiguous date/time formats (original, repackaged w/ SQL 2008 info)
UPDATE 2010-08-04: The SSIS Team posted another tidbit on this subject - see how your choice of data type can also mess you up.

57 comments:

  1. Fantastic post and blog!!!
    Good work!!!
    regards!
    Pedro

    ReplyDelete
  2. In what format is it expecting the time?

    ReplyDelete
  3. SQL will correctly interpret a time given in a hh:mm:ss.fff format - where the hours (hh) are in a 24 hour clock format. The milliseconds (.fff) are optional. For more info on "standard" date/time types, check out the MSDN library article on ISO 8601 format here: http://msdn.microsoft.com/en-us/library/ms190977(SQL.90).aspx, or the CAST and CONVERT page here: http://msdn.microsoft.com/en-us/library/ms187928(SQL.90).aspx

    The ISO standard and the ODBC standard are extremely close in format - you ought to be safe with either against any RDBMS including MS SQL Server.

    ReplyDelete
  4. Cool, thanks, worked a treat.

    ReplyDelete
  5. How can I force a column to NULL using a ternary conditional operator? Something like: PIPPO==" " ? NULL : PIPPO

    Many thank
    pf

    ReplyDelete
  6. @pf

    You can assign a NULL value to a column in the Derived Column component by using the NULL() functions. Using your example, this should work (you may have to change the data type):

    PIPPO==" " ? NULL(DT_WSTR) : PIPPO

    ReplyDelete
  7. "MM/DD/YYYY" format, sometimes NULL This is the Expression
    (ISNULL([date string])) ? NULL(DT_DBTIMESTAMP) : (DT_DBTIMESTAMP)(SUBSTRING([date string], 7, 4) + "-" SUBSTRING([date string], 1, 2) + "-" + SUBSTRING([date string], 4, 2))

    Suppose sometimes some rows Zero what is the expression.

    ReplyDelete
  8. You just need to add another "if" construct, specifying a "default" value if you run into a zero (I chose Jan 1, 1900):

    (ISNULL([date string])) ? NULL(DT_DBTIMESTAMP) : ([date string]== "0") ? (DT_DBTIMESTAMP)"1900-01-01" : (DT_DBTIMESTAMP)(SUBSTRING([date string], 7, 4) + "-" SUBSTRING([date string], 1, 2) + "-" + SUBSTRING([date string], 4, 2))

    ReplyDelete
  9. Really good post Todd; haven't found something as comprehensive to this kind of conversion anywhere.
    Good reference for future projects too.

    Thanks,
    John

    ReplyDelete
  10. how do we convert a date in the format yymmdd into db_date? Any help?

    ReplyDelete
  11. You need to assume the century information by asking "is this date in the 20th or 21st century"? Traditionally, that's done by using the Conditional operator and SUBSTRING, and picking a "cutoff" year. For example, to assume every year less than 50 belongs to the 21st century, use this:
    (((DT_I4)SUBSTRING([DateColumn], 1, 2) < 50) ? "20" : "19") + [DateColumn]
    From there, you can use the YYYYMMDD example from above.

    ReplyDelete
  12. Todd thank you very much for your very detailed analysis. This is great to me because I'm going to avoid a lot of mistakes.

    I think the lack of a proper cast or convert function in SSIS is a big mistake, why I need to write all this code when I can do it in SQL Server with CONVERT( DATETIME , Receipt_Date , 104) against all the code with substring function.

    Regards
    Alcides

    ReplyDelete
  13. thank you for your.my question is about converting french string date to date.
    in my excel file i have this:
    01/09/2009
    13/12/2009
    12-13-2009

    after conversion to date i got this
    09/01/2009
    12/13/2009
    12/13/2009

    how can i force ssis to not accept the conversion of the third date cause it's an invalid french date?
    can i convert string in this format 'dd/mm/yyyy' to date in the same format

    regards

    ReplyDelete
  14. As I said in the article, the only way to make SSIS interpret dates properly is to change the format of them to "yyyy-MM-dd". Use the SUBSTRING function to do that. For example:
    SUBSTRING([column], 7, 4) + "-" + SUBSTRING([column], 4, 2) + "-" + SUBSTRING([column], 1, 2)
    Be warned - running the data you provided through a Derived Column with the above expression will cause an error - you'll need to configure the error output of the Derived Column component to ignore or re-route those error rows.

    ReplyDelete
  15. thank you todd for your answer.
    if i have a french version of sqlserver. should i change the format of date to a french format before inserting in the database?

    regards

    ReplyDelete
  16. Please remember that dates don't have formats. Date types are types. Once you convert your string typed data to a date, format isn't necessary any more - date typed data is language-agnostic.

    ReplyDelete
  17. Hi Todd,

    Excellent blog. I am trying to replace null values in "ToDate" which has datatype datetimestamp, with blank. Can you please advise how can I do that?

    Right now I am doing in following way.

    ISNULL(ToDate) ? (DT_DBTIMESTAMP)"1753-01-01 00:00:00.000" : (ToDate)

    but I dont want )"1753-01-01 00:00:00.000" to be display. rather I want " " string. Please help me.

    ReplyDelete
  18. You can not accomplish what you want. There is no such date as "blank". Dates are dates.

    ReplyDelete
  19. Hi Todd,

    Thanks for your quick response.

    ReplyDelete
  20. Hi Todd,

    Could you please advise where am I wrong here in below exp where RepStatus is two byte signed integer.


    ISNULL((DT_STR,50,1252)RepStatus) ? (DT_STR,50,1252)" " : ((DT_STR,50,1252)RepStatus)

    You help will be appreciated.

    Thanks,
    Anuja

    ReplyDelete
  21. Don't do that first conversion. Just do an ISNULL(RepStatus).

    Anuja, if you have any more questions, post them over at the SSIS forum on MSDN.

    ReplyDelete
  22. Thanks Todd,

    Sure I will post my questions on MSDN. Thanks for answering my question.

    ReplyDelete
  23. Hi Todd,

    please ignore my previous comment and use this instead.

    can you please help me how to write a code in the derived column task? here are the steps which i follow,

    I have to do the following steps,

    1. I have date and time in different columns in excel
    (Run_Date is in this format 12/30/2009 and Run_time 3:54:19 PM)

    2. While Loading them to SQL Server table, First I have to concatenate them

    3. While concatenating i should convert it to universal format YYYY-MM-DD HH:MM:SS.

    4. Once converted to Universal format, i need to load them back into the database table in DateTime format. I want my date and time to be the same as 12/30/2009 and 3:54:19 PM respectively when i load them in the table.

    My table output should look as

    Run_DateTime
    ------------------

    12/30/2009 3:54:19 PM

    Please help me with the code that i can write it in Derived column task or any other ways to do this.

    Many Thanks,
    Pradip

    ReplyDelete
  24. @Pradip - I'm not quite sure what else I can do to help you. You have a good plan to convert your data, this post gives you instructions and samples on how to use the included functions and transformations, and there are lots of links to BOL articles. Given all of that, you should be able to do this yourself!

    ReplyDelete
  25. Todd,

    Thanks for the great post. I am trying to get this expression working in the derived column, but for some reason it just does not get past the error -

    (ISNULL([company symbol])) ? NULL(DT_WSTR) : TRIM(REPLACE([company symbol],"'",""))

    Company Symbol is my column with datatype varchar (256) but usually data is not more than 7 chars.

    Any help is appreciated.

    Thanks,
    S

    ReplyDelete
  26. The error is probably with the NULL(DT_WSTR) part. It should have a length in there as well, like NULL(DT_WSTR, 256)

    ReplyDelete
  27. Thanks a ton Todd! It worked like a charm!

    Regards,
    S

    ReplyDelete
  28. Todd,

    I have an input coming in from flat file with column1 = NA and I need to evaluate it to NULL.

    I have the below derived column transformation to handle it. But during update I see that the value is not changed to NULL, it remains as "NA" itself.

    ([column1] =="NA") || ([column1] =="@NA") ? NULL(DT_WSTR,5) : TRIM(REPLACE([column1],",",""))

    Column1 is of data type decimal(3,0), length of 5. I did not get any error in the derived column transformation, that worked fine, but upon update I don't see NULL being inserted.

    Please let me know your thoughts. Appreciate your help.

    Thanks,
    S

    ReplyDelete
  29. Todd,

    Just to add more details to the below issue -

    ([column1] =="NA") || ([column1] =="@NA") ? NULL(DT_WSTR,5) : TRIM(REPLACE([column1],",",""))

    When I check the data view post Derived column transformation the column1 value is not changed to NULL , even if the input value was "NA". The derived trans does not error, but it does not do anything. It just passes the input value as is - "NA".

    Thanks for your help,
    S

    ReplyDelete
  30. @S - Your first comment is contradictory... [column1] can not possibly be a Decimal column if it contains string values.
    The most likely reason your expression isn't working is that [column1] contains spaces. Try using (TRIM([column1]) == "NA") instead of ([column1] == "NA")

    ReplyDelete
  31. Todd,

    Thank you very much! Yes indeed there were spaces and thus it was not working. Column1 is defined as a decimal column in SQL, but we get the data from a text file ,for e.g the values are like 1.12 etc.. , but occasionally we are sent strings like NA or @NA , and during this time we need to NULL it before we persist the values to the DB.

    Thanks again for your help!
    S

    ReplyDelete
  32. Todd,

    We currently receive dates from a flat file in the format of YYYYMM , every month ( ex 201109 for Sept 201110 for Oct etc).

    I need to persist this to SQL DB as 20110901 for Sept / 20111001 for Oct and so on.

    I have currently used a derived column transformation like below -

    TRIM(SUBSTRING(date,1,4)) + TRIM(SUBSTRING(date,5,2)) + TRIM(SUBSTRING(date,2,2)). But I don't think this is the most efficient way to do it, because let's say when we hit 2020 - all dates are going to come out in the format of YYYYMM02 ( 02 - DD where in I need them to be 01).

    Please let me know if there is a better way to handle this.

    Thanks,
    S

    ReplyDelete
  33. The easiest way to handle that is to not use a SUBSTRING in your third part - just use a constant of "01". With a few tweaks, this is probably what you want:
    SUBSTRING(date,1,4) + "-" + SUBSTRING(date,5,2) + "-01"

    ReplyDelete
  34. Hi Todd,

    I had a question on BULK delete. I will need to delete 9000 rows from a table in SQL 2008 - The delete query is on these lines -

    Delete from tableA where name = 'ABCD' and sector !='EFGH'.
    I will have approximately 9000 such queries, since the name and sector to delete are different. So after the delete the one's that remain on DB are the "names" tied correctly to their "sectors" with the unwanted name/sector combination removed.

    Just wanted to know if there is an easy way to do this rather than writing 9000 delete queries.

    Thanks for your help on this.

    ReplyDelete
  35. Not a question I'd expect on this post :)
    I wouldn't write 9000 Execute SQL Tasks either. Make a table with your condition values in it, then use a data flow to read it in, and an OLE DB Command component to execute the DELETE statements. Or better yet, just execute a joined DELETE T-SQL statement and do it all on the server.
    Next time you have a question like this, post it on the MSDN SSIS forums!

    ReplyDelete
  36. Hey Todd,

    Appreciate that you're still keeping up with this string. I'm relatively new to VS SSIS and am having an awful time taking my string in format "YYYY-MM-DD" (including the ""'s) and converting it to Date. I tried your examples above, but to no avail. Any advice would be greatly appreciated.

    ReplyDelete
  37. If your actual data (seen in a Data Viewer) includes the quotations, then it won't convert. You'll have to strip those off (try REPLACE or SUBSTRING).
    If that doesn't work, please ask your question at the MSDN SSIS forums. Post sample data and what you've tried.

    ReplyDelete
  38. Variable length "MM/DD/YYYY" format -- I get the following error trying to use it:

    Error at Data Flow Task [Derived Column [151]]: The function "FINDSTRING" requires 3 parameters, not 2 parameters. The function name was recognized, but the number of parameters is not valid.

    ReplyDelete
    Replies
    1. I got the same error. He specified 2 where he wanted specific occurrences so just put 1's in the places he didn't specify an occurrence.

      Delete
  39. Hi,

    I'm trying to transform the date in SSIS through Derived Column ,the source format is MM/DD/YYYY i want the destination to be in the format of MM-DD-YYYY format.
    For instance (3/2/2005) i want to append 0 infront of 3(03) and 2(02) . So the Destination format will be (03-02-2005).
    I tried
    (DT_DBDATE)(((RIGHT("0" + (DT_WSTR,2)Month(ExpDate),2)) + "-"+ RIGHT("0" + (DT_WSTR,2)Day(ExpDate),2) +"-" + (DT_WSTR,4)YEAR(ExpDate)))
    Didnot transform to desired format .The dataType on EXPDate is date[DT_Date].

    ReplyDelete
  40. @Anonymous - of course it failed.
    Please read this next sentences very carefully, because they're absolutely true: Dates have no "format." You can NOT "reformat" a date. You CAN reformat a string that contains a representation of a date.
    So - it appears that you have a date-typed column called EXPDate. That column contains a date - NOT a formatted representation of a date. If you want a MM-DD-YYYY formatted representation of a date, you'll have to cast that column as a DT_WSTR or DT_STR. Don't forget that you can't use a Derived Column to change a column's data type, you'll have to make a new column as a DT_WSTR/DT_STR.

    ReplyDelete
  41. Hi,

    I am trying to transform data from ODBC to SQL. In this process, the date column is having data like '0000-00-00'. Need to convert it using derived column. I tried different ways. Even though I am getting errors.

    I want to insert '0000-00-00' into SQL table of Date type column.
    Below is the expression I have used in the derived column:

    (DT_DATE)(DAY(Datecolumn) == 0 ? NULL(DT_DATE) : (DT_DATE)(SUBSTRING((DT_WSTR,8)Datecolumn,1,4) + "-" + SUBSTRING((DT_WSTR,8)Datecolumn,6,2) + "-" + SUBSTRING((DT_WSTR,8)Datecolumn,9,2)))

    Kindly help me here.

    Thanks,
    Naveen

    ReplyDelete
    Replies
    1. Unfortunately, that's not possible Naveen. SQL Server can't store that date, and SSIS doesn't understand it either. SSIS's DT_DATE starts at 1899-12-30, see Integration Services Data Types.

      Delete
  42. Hello Todd,
    Can you please tell me how i can handle null Values ( if my excel sheet date column ( in my case Birth Date) has no value ) . i am using
    (DT_DBTIMESTAMP)(SUBSTRING([Copy of Birth Date],1,4) + "-" + SUBSTRING([Copy of Birth Date],5,2) + "-" + SUBSTRING([Copy of Birth Date],7,2))
    Thanks for a nice discussion.

    ReplyDelete
  43. Faheem, it's not the prettiest, but you have to decorate your expression even more... In general, everywhere you have [Copy of Birth Date] you can replace it with (ISNULL([Copy of Birth Date]) ? "1753-01-01" : [Copy of Birth Date]). That would result in a rather large expression for you - so specifically for your case, you only need to use ISNULL once, like this:
    (DT_DBTIMESTAMP)(ISNULL([Copy of Birth Date]) ? "1753-01-01" : SUBSTRING([Copy of Birth Date],1,4) + "-" + SUBSTRING([Copy of Birth Date],5,2) + "-" + SUBSTRING([Copy of Birth Date],7,2))

    ReplyDelete
    Replies
    1. Now i want to replace Copy of Birth Date with blank where Copy of Birth Date="1753-01-01" before inserting in my OLEDB Source ?

      Delete
    2. What is "blank"?

      There's no such value... do you mean an empty string, all spaces, a NULL? Be wary of data types too - is [Copy of Birth Date] as string? Is it a date type? What about [Birth Date]?

      Delete
    3. sorry, i mean empty string?

      Delete
    4. Add a new Derived Column with an expression like this:
      ([Copy of Birth Date] == "1753-01-01") ? "" : [Copy of Birth Date]

      Learn that unary ? operator!

      Delete
  44. Hello everyone! If you are involved in projects where you need to translate strings, I recommend trying an online localization management tool like https://poeditor.com/.

    ReplyDelete
  45. Good work.....good blog for date datatype

    ReplyDelete
  46. Hi Todd, I have a variable yyyymmdd format of string as input. Ssis will pad the dates accordingly as per above logic, how do I cause the derived column to reroute the 7 length dates as invalid dates intstead of ssis intrinsically padding them?

    ReplyDelete
    Replies
    1. Not exactly sure how SSIS would be automatically padding them for you, but a relatively simple Conditional Split that looks for LEN(RTRIM([your date column])) < 8 should spot those "bad" dates.

      Delete
  47. Have you seen an issue where dates are being brought in differing formats, from Excel, in the same column, based on if they are typed as dates (dd/MM/YYYY) or formula based(MM/dd/YYYY)? This is driving me mad!

    ReplyDelete
    Replies
    1. If you're reading Excel dates inside SSIS, that's a whole mess in and of itself. What you SHOULD see is NUMBERS - regardless of the date format in Excel. Excel stores dates as numbers, not strings, or dates. (See the section on this page starting with "Serial Date Numbers from Excel".)

      Delete