![]() |
| Square peg... |
The OLE DB Command in Integration Services is evil - and for more than the typically cited reason of poor row-by-row performance. Here's another example of where it can cause difficulties, and where a set-based update can save the day.
I recently had need to perform some mathematical calculations on a fact table to populate values retroactively. I'm not a fan of having to do that, because it means I need to fully process the fact table after each update - but I'll work on optimizing this process later. The important part is the math.
I have sale amounts in this fact table - gross sale amounts. I have a need to also provide a calculated value on each row for the dollar value that the sale was discounted. Some discounts can be known exactly at the time of sale, but others may not be known for weeks or months later. The reason for this is national contracts with larger customers where volume discounts are promised based on the actual volume of orders that were placed. At the time the individual orders are being processed, there's no way to know if the customer will reach a specific discount level, so it's not possible to record an exact discount amount. At the end of the contract period we can calculate the total discount the customer should receive, and it helps the business users if we can push that discount back down to each line item on a proportional basis, so they can see the "true" value of each sale.
A simple way of looking at this is with an example of a customer we promise a $1000 discount should they spend more than $50,000 with us. If, at the end of the contract period, they spent $65,000 - we owe them $1000. This means our net revenue was actually $64,000 - and we'd like individual transactions to reflect that reduction. Since I wanted the reduction to be applied equally to each transaction - and in this case "equally" meant proportional to the amount sold - populating that "discount" column with a percentage of the transaction sale amount would do fine. It works even better that way, because I wouldn't have to apply a specific T-SQL command for each row, I could apply a calculation to a set of rows.
In our example, I wanted to populate each transaction's discount amount with 1/65th of the sale amount. My data flow calculated the ratio I wanted to apply - in this case, a sufficiently high precision value of 0.01538. I set up the OLE DB Command to issue an UPDATE command, and then surveyed the results of my work. A quick SELECT SUM(discount) arrived at a total applied discount of $650. Nowhere near the expected $1000.
I won't bore you with the hours of investigation that led me to the real problem. If you're eagle-eyed, you may see that the $650 actually applied corresponds to a factor of 0.01, rather than the 0.01538 factor I listed earlier. I'll protect my ego by saying that this example is contrived to be simple - and this correlation wasn't obvious in the real data. This is in fact the root of the problem - and it's exposed by the OLE DB Command component. I used a command of "UPDATE FACT_TABLE SET amount_discount = amount_sold * ?". The data type of the "amount_discount" column in SQL is a DECIMAL(10,2). The OLE DB Command component therefore assumed that the parameter placeholder should represent a value of that data type, and so constrained the value it passed to SQL. So instead of issuing a command of "UPDATE FACT_TABLE SET amount_discount = amount_sold * 0.01538", Integration Services issued a command of "UPDATE FACT_TABLE SET amount_discount = amount_sold * 0.15".
Not exactly what I was looking for - but then, I'm not quite sure what else I would expect from the OLE DB Command and its parameter replacement. But that's what you get when you try to fit a square peg into a round hole.

My experience of the same a couple of years back: http://www.bimonkey.com/2009/01/ssis-ole-db-command-misinterpreting-function-parameter-data-types/
ReplyDelete