tag:blogger.com,1999:blog-52693881412248505802024-03-11T21:51:35.277-07:00Todd McDermid's BlogPosts about SQL Server, Data Warehousing, Integration Services, .Net Development, and other tools of the tradeTodd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.comBlogger169125tag:blogger.com,1999:blog-5269388141224850580.post-48033045960892706172016-12-01T04:00:00.000-08:002016-12-01T04:00:16.548-08:00SSIS Data Flow "Sequence Containers" are Here!<h2>
Wooo!</h2>
OK - maybe too much of a celebration... and I do have meatier topics in the article pipeline. However, I just stumbled across a feature in SSIS that I didn't realize had been included - due to a <a href="https://connect.microsoft.com/SQLServer/feedback/details/425788/ssis-data-flow-sequence-container-needed" target="_blank">Connect item</a> I'd raised a long time ago (in early 2009). How come I didn't know that the SSIS team had fixed this in 2010? Because I was stuck on 2008R2! But this is something I've been wishing for forever - and all of you should be using this!<br />
<h2>
Data Flow Sequence Containers</h2>
OK - so what is a "Data Flow Sequence Container"? Back when I started using SSIS 2008R2, I'd just come from a coding background, and really appreciated the existence of #region/#endregion in C#. <br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://fhcmhw-ch3301.files.1drv.com/y3mCOufV4UrVz8Gy5WUITzwSC1IWzbEoqs-QxbrsafzlfzTIMcrd-tyG84fZnKBe0qlIHUFOOqhEiIa6_n__4CXYFnWSyk1JqJGHHkD2Zr82qoVdR3R799Y2bTFv5d5E3b8sSxuMDgiprT89ZxuIKEGGgvXW4AI8CiKS5PX2GmYpik?width=341&height=65&cropmode=none" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://fhcmhw-ch3301.files.1drv.com/y3mCOufV4UrVz8Gy5WUITzwSC1IWzbEoqs-QxbrsafzlfzTIMcrd-tyG84fZnKBe0qlIHUFOOqhEiIa6_n__4CXYFnWSyk1JqJGHHkD2Zr82qoVdR3R799Y2bTFv5d5E3b8sSxuMDgiprT89ZxuIKEGGgvXW4AI8CiKS5PX2GmYpik?width=341&height=65&cropmode=none" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Summary level region</td></tr>
</tbody></table>
I used regions before I started actually coding something to write pseudo-code as the regions, drill down into each "step", perhaps generate sub-step regions, until I got to "code level". This use of regions basically replaced my use of code comments (I still do use some). It was so much easier to describe what and why a chunk of code was doing, while hiding the code. Regions permitted much clearer exposition of what was going on in the code at a high-level, and letting the developer dig in as much or as little as they wanted.
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://crcehw-ch3301.files.1drv.com/y3m9k3O0fOm6fD9dRON2IXs7TOGfYJ7jZx6b8idIk19JKxHj9gDjyeuU1a8Id3RTahniqUWX-QY7g8LCJJm3Q3QGfrSSKgWmhaFXl6c6ONcFBs7XPxiLEBu509YiB2gMH8qoJY6dnGR1zQS9qv21YpCfKV7a_hyM8p9_ZmDusxoJzQ?width=395&height=162&cropmode=none" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://crcehw-ch3301.files.1drv.com/y3m9k3O0fOm6fD9dRON2IXs7TOGfYJ7jZx6b8idIk19JKxHj9gDjyeuU1a8Id3RTahniqUWX-QY7g8LCJJm3Q3QGfrSSKgWmhaFXl6c6ONcFBs7XPxiLEBu509YiB2gMH8qoJY6dnGR1zQS9qv21YpCfKV7a_hyM8p9_ZmDusxoJzQ?width=395&height=162&cropmode=none" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">One level lower detail regions</td></tr>
</tbody></table>
It facilitated debugging and code understanding by walking the unfamiliar developer through the thought process behind the code, instead of having the developer try to surmise what and why code was there (with comments). With only comments, the developer has to try to keep multiple ideas/steps in his head while scrolling through code.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://fhclhw-ch3301.files.1drv.com/y3m3zuVZy1OSRVNWPVm1LQ1T5WxQO0XTd9kJtF6ADdEsS7ffiYANPym16GQZUcQo-4OQKk8Ry_68YGB28m09HcT_seVof5nTQ2Ovd9SobT2BEGsxQ7xc8qrgAKfd4ueQQZ2eUps9dPRg6iKS7fxENKw2P8sHh2m1sduXQY_5lNuouU?width=653&height=227&cropmode=none" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://fhclhw-ch3301.files.1drv.com/y3m3zuVZy1OSRVNWPVm1LQ1T5WxQO0XTd9kJtF6ADdEsS7ffiYANPym16GQZUcQo-4OQKk8Ry_68YGB28m09HcT_seVof5nTQ2Ovd9SobT2BEGsxQ7xc8qrgAKfd4ueQQZ2eUps9dPRg6iKS7fxENKw2P8sHh2m1sduXQY_5lNuouU?width=653&height=227&cropmode=none" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">How are regions <i>not</i> awesome?</td></tr>
</tbody></table>
If you're not using these in SSIS Scripts, you SHOULD BE.
<br />
What I asked for in "Data Flow Sequence Containers" is the SSIS equivalent of this, and it's here. (Actually must have been here since SSIS 2012. Hi. I'm Todd, and I'm new here.)<br />
<h2>
Data Flow Groups</h2>
<div>
Data Flow Groups is what they're calling it, and it's deceptively simple to use. One of the reasons I'm sure I (and SSIS people I talk to who DID NOT LET ME KNOW IT WAS THERE) missed it is because I was expecting it to be a component in the toolbox. Not so.</div>
<div>
Code up your Data Flow as you normally would. Then go and select the components that you want to group together - via clicking and dragging a selection window, or click-selecting components. Any component combinations you want. Then right-click and select Group.</div>
<h2>
Collapse Groups</h2>
<div>
Just like code regions, you can collapse groups to make it easier to see the "big picture" in a sea of components.</div>
<div>
<h2>
Name Your Groups</h2>
<div>
To make your groups easier to understand, name them. Doing so is better than annotations (at least for short comments) because the name is visible when the group is expanded or collapsed. If you have more to say, put an annotation inside the Group - and it will be invisible when collapsed, decluttering the data flow, until you need it.</div>
</div>
<h2>
Nesting is Supported</h2>
<div>
Just like my lovely C# code regions, you can nest groups in SSIS! Take developers from the highest-level of abstraction to the nitty-gritty details in however many steps makes sense.</div>
Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com1tag:blogger.com,1999:blog-5269388141224850580.post-41910687515126446742016-10-24T04:00:00.000-07:002016-10-24T04:00:11.890-07:00Reading a SharePoint 2007 List from SSIS 2016It's been a while, but since I now get to play with SQL 2016 full time, I have new things to talk about! One of major difficulties that any technology pro has to deal with is integrating different systems. Vendors really do try to make this as easy as they can - but the main pain points they usually focus on is <a href="http://grammarist.com/usage/inter-intra/">intra-generational versus inter-generational</a>. And sometimes when the vendor falls down (looking at you Microsoft), some enterprising people craft solutions - but even then, it's usually intra-, not inter-generational.<br />
<h3>
You Can't Get There From Here</h3>
Specifically, you can connect to SharePoint 2007 with SQL Integration Services 2008R2 using <a href="http://sqlsrvintegrationsrv.codeplex.com/releases/view/17652">Kevin Idzi's SharePoint List Adapter</a> components provided by Microsoft as "samples". I've used that many times, and even though it's rough around the edges, what wasn't about 2008R2? It got the job done - and at the end of the day, that's what we're looking for, right? (And I'm pre-excusing the roughness of what's coming... :))<br />
Microsoft improved things with the OData support in SharePoint 2010+, and built-in <a href="https://msdn.microsoft.com/en-us/library/dn584133.aspx" target="_blank">OData</a> source in SSIS 2016. But again... that's intra-generational. How does one connect to a long-in-the-tooth SharePoint instance that just hasn't yet been migrated off of with a brand-spanking-new BI stack? Tough nut to crack, as the SharePoint List Adapter is in beta for 2014, and my hacks at getting it shoehorned into a 2016 environment didn't get me far.<br />
<h3>
Scripts... To The Rescue Again</h3>
Well, if you can't use something pre-packaged, you turn to the script component, right? It's easy to "resort" to the script, because the script also solves for a few problems that custom objects don't: allowing 3rd party (black box) code to run inside your systems. At least you can code review (hah!) this stuff.<br />
What I've got for you today is a semi-generic script - you will have some editing and setup to do - that connects (I think?) any version of SSIS to any version of SharePoint (until they deprecate WSDL). Of course, all those claims are unwarranted and untested - please do you own validation in your environment. There is a fair bit of code behind the scenes - unrepentantly cribbed from the aforementioned SharePoint List Adapter code on CodePlex, and other internet sources.<br />
Please read through the commentary at the top of the script - you will have to set up the script to work for your server and list - adding some code and service references, and setting up the columns you want to retrieve.<br />
Find the code <a href="https://1drv.ms/t/s!ACFZMNvLw3YMipZz" target="_blank">here</a>. All you'll need to do is:<br />
<ul>
<li>Drop a Script Component in your Data Flow, and say it's a "source"</li>
<li>Set up some output columns</li>
<li>Identify an HTML connection manager</li>
<li>Alter the "Setup" part of the script as described in the script comments.</li>
</ul>
<h3>
Issues Overcome</h3>
<div>
There were several issues that presented themselves that required a little searching of the internet, and trial-and-error hacking to overcome:</div>
<h4>
Understanding "Name" in SharePoint</h4>
<div>
Which isn't a "name", it's a GUID - of course. There are some hoops to jump through to translate the human-readable list and view name into the GUIDs that the rest of the SharePoint web service needs as arguments.</div>
<h4>
Reflection - Accessing Script Columns Generically</h4>
<div>
When you have a script you want to make generic, you don't want to have the "user" of the script have to dig into your code and edit a bunch of assignment code. (This means assigning a script column called "ID" the value from the SharePoint column "ID".) It would be nicer if you could just have them define the column mappings, and have the code figure out what buffer properties to set. That takes use of the System.Reflection namespace in .Net to dynamically find the properties of the BufferManager class that represent the columns you want to set.</div>
<h4>
Seeing ALL of the SharePoint Columns</h4>
<div>
Argh. So frustrating. I was calling GetListItems with the correct view and (allegedly good) defaults of "null" for query modifiers. But I wasn't seeing values from all of the columns... and there seemed to be no reason for why not. Until I stumbled over someone else's description of a similar problem, where they suggested passing an empty XML node instead of null as the "list of columns to get". Go figure.</div>
<h4>
Using Blobs for Really Wide Columns</h4>
<div>
I swear SharePoint is a massive key-value-pair store because of how it abuses SQL :). Anyway - you can stuff quite a few characters into SharePoint strings, more than SSIS likes to handle (comfortably), and so you may need to use DT_NTEXT (in SSIS) and NVARCHAR(MAX) (in SQL) to handle those. This issue wasn't too difficult to overcome - there was a little more work to do in reflecting the BufferWrapper properties to see if a column was DT_NTEXT (in C#, it's a BlobColumn), then in assigning the value to the buffer.</div>
<h3>
Issues Left</h3>
<div>
I tried (a little) to make the script completely copy-paste-able... but gave up because of...</div>
<h4>
Dynamic Reference to Web Services</h4>
<div>
The script, as written, requires you - the script "user" - to add two "Service References" and two "References" to the project in VSTA in order for it to work. From what I can see, it is possible to do this only from code, but didn't make time to go this far. If you can extend the script to do so and wish to share it, please add a comment to where you publish the extension!</div>
Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-39015988943362726512016-10-17T04:00:00.000-07:002016-10-17T04:00:07.854-07:00Log SSIS Variable Values When You Want ToHere's a quick hit for a Monday morning...<br />
Integration Services logging levels can definitely be set up such that you can record when variables change value. It's not the simplest, and involves setting <a href="http://www.sqlerudition.com/how-to-log-ssis-variable-values-during-execution-in-the-event-log/" target="_blank">properties on the variable, configuring logging, and adding an event handler</a>.<br />
If you ask me, that's a little too complicated - and doesn't accomplish what I (<a href="http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=97304" target="_blank">and</a> <a href="http://www.sqlservercentral.com/Forums/Topic1097274-364-1.aspx" target="_blank">several</a> <a href="https://social.msdn.microsoft.com/Forums/en-US/c63dafe2-d367-47ec-912b-d831b46c45f6/log-variable-values?forum=sqlintegrationservices" target="_blank">other</a> <a href="http://www.sqlservercentral.com/Forums/Topic606873-148-1.aspx" target="_blank">people</a>) seem to want - a record of variable values at specific times in package execution. Times such as "at package startup", or "before task X", or "each iteration of a loop."<br />
<h2>
A Straightforward Option</h2>
<a href="http://www.urbandictionary.com/define.php?term=ymmv" target="_blank">YMMV</a> of course.<br />
<ol>
<li>Drop a Script Task in your Control Flow where you want it.</li>
<li>Add the variables you want to log to the "ReadOnly Variables" list in the script editor UI.</li>
<li>Open the script itself and replace the Main the following:</li>
</ol>
<div>
<span style="font-family: "Courier New", Courier, monospace; font-size: x-small;"><b> public void Main()<br /> {<br /> bool fireAgain = true;<br /> Dts.Events.FireInformation(0, "", "Script - Log Variables - v1.0", "", 0, ref fireAgain);</b></span></div>
<div>
<span style="font-family: "Courier New", Courier, monospace; font-size: x-small;"><b><br /></b></span></div>
<div>
<span style="font-family: "Courier New", Courier, monospace; font-size: x-small;"><b> // Log input argument values<br /> foreach (Variable inputVariable in Dts.Variables)<br /> {<br /> switch (inputVariable.DataType)<br /> {<br /> case TypeCode.String:<br /> Dts.Events.FireInformation(0, "", " " + inputVariable.Name + ": " + (string)inputVariable.Value, "", 0, ref fireAgain);<br /> break;<br /> case TypeCode.Int32:<br /> Dts.Events.FireInformation(0, "", " " + inputVariable.Name + ": " + ((int)inputVariable.Value).ToString(), "", 0, ref fireAgain);<br /> break;<br /> case TypeCode.DateTime:<br /> Dts.Events.FireInformation(0, "", " " + inputVariable.Name + ": " + ((DateTime)inputVariable.Value).ToString(), "", 0, ref fireAgain);<br /> break;<br /> case TypeCode.Boolean:<br /> Dts.Events.FireInformation(0, "", " " + inputVariable.Name + ": " + ((bool)inputVariable.Value).ToString(), "", 0, ref fireAgain);<br /> break;<br /> default:<br /> Dts.Events.FireInformation(0, "", " " + inputVariable.Name + ": (unhandled type " + inputVariable.DataType.ToString() + ")", "", 0, ref fireAgain);<br /> break;<br /> }<br /> }</b></span></div>
<div>
<span style="font-family: "Courier New", Courier, monospace; font-size: x-small;"><b><br /></b></span></div>
<div>
<span style="font-family: "Courier New", Courier, monospace; font-size: x-small;"><b> Dts.TaskResult = (int)ScriptResults.Success;<br /> }</b></span></div>
<div>
<br /></div>
<div>
The script automatically iterates over the list of variables you put in the script editor UI and outputs them to the log at runtime. Copy and paste the whole script task to another package, change the list of variables in the UI, and you're done - no code edits required.</div>
<ol>
</ol>
Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-52730164438968531262016-10-14T04:00:00.000-07:002016-10-26T17:21:35.724-07:00Control Flow Package Parts in SSIS 2016<a href="https://msdn.microsoft.com/en-us/library/mt620034.aspx" target="_blank">Control Flow Package Parts</a> are a new feature in Microsoft SQL Server 2016 that attempts to enable code reuse within SQL Server Integration Services packages. The intent is great, but like with many first iterations, there are limited use cases, and rough edges. I'll provide an overview of those here, and in subsequent posts I'll detail how I'm currently using package parts in a data warehousing environment.<br />
<h2>
Why Were Control Flow Package Parts Created?</h2>
<div>
In Integration Services, there are pain points. </div>
<div>
<br /></div>
<div>
<This space left intentionally blank (for you to crack your own jokes about the volume and variety of SSIS pain points)></div>
<div>
<br /></div>
<div>
One of those difficulties only really becomes apparent after you use SSIS extensively - as part of a non-trivial data warehousing project or synchronization system, for example. In those cases, you're typically going to have multiple packages, development time is going to span months, and you're going to learn things along the way. Demands for broad progress monitoring, standardizing error handling, or execution management are likely to arise. Most of those scenarios (and many others) typically result in extremely similar plumbing being added to all packages involved in the project - typically by copy and paste. And when the "standard" implementation changes - because it always does - it's a burden to re-copy and paste (and adjust) each package to follow the new and improved system.</div>
<div>
Microsoft created Control Flow Package Parts to help this situation.</div>
<h2>
How Do Control Flow Package Parts Help Code Reuse?</h2>
<div>
The basic idea behind package parts makes complete sense to a coder - they're macros. You take code you've used in several places, put it in a separate file that you then include and "expand" in multiple other files.</div>
<div>
If you have multiple packages with parts of the Control Flow that are identical - setting up a database in a certain way, sending emails, calling a set of stored procedures, ... - then Control Flow Package Parts can help.</div>
<div>
The assistance isn't just limited to the initial coding, either. Yes - creating a new package with the "duplicate" code is much easier. But the real gain of Control Flow Package Parts is when your "standard" code needs changes. Instead of having to edit multiple packages to address the modifications - you only have to alter the package part. Deploying the project(s) that depend on this part automatically incorporates those improvements.</div>
<h2>
The Scorecard</h2>
<h3>
Good!</h3>
<div>
To recap, the positives of Control Flow Package Parts are that you can now reuse code in SSIS. If you've used Integration Services at all in the past ten years, you should be happy about that. It's great that you can bundle up a series of tasks, preconfigured to operate in a specific way, and just drop that "algorithm" into multiple packages. Moreover, it's looking fairly robust in its implementation - what's there does work reliably.</div>
<div>
The package parts aren't "just" macros - they step one level higher than that - you can (lightly) configure them by setting values on the exposed properties of the single top-level task in the part. This means you can (in limited fashion) have a tool encoded in a package part that you can apply to slightly different ends in different packages, or even multiple instances within the same package.</div>
<h3>
Bad...</h3>
<div>
Unsurprisingly for a "v1" feature, the initial glimmers of awesomeness are dulled by the relatively small set of scenarios that are enabled with 2016's Control Flow Package Parts. There are a few minor annoyances here:</div>
<h4>
Invisible Annotations</h4>
<div>
One of the first things I naturally did when making my first package part was to place an annotation in it so that "users" of the part could be informed about the purpose, side effects, and how to use the part. Unfortunately, when that part is placed inside a host package... annotations aren't visible.</div>
<div>
I've <a href="https://connect.microsoft.com/SQLServer/feedback/details/3106624" target="_blank">posted a Connect case</a> on that as a bug.</div>
<h4>
Conditional Precedence Gets Reset</h4>
<div>
It wouldn't surprise me if I was told I was pushing this new feature too far... but one of my goals was to have Control Flow Package Parts participate fully in the host package's control flow. One of those necessary capabilities is for the package part to participate in precedence constraints with other tasks in the Control Flow. They do... except if you have a conditional precedence constraint from the package part to another task. When the package part is edited and saved (independently from the host package), and then the "macro" of the package part is "expanded" inside the host package... the condition on the precedence constraint disappears.</div>
I've <a href="https://connect.microsoft.com/SQLServer/feedback/details/3106623" target="_blank">posted a Connect case</a> on that as well. This bug has the potential to be extremely vicious - edits from one user (the package part editor) destroy code in a package they didn't touch... silently.<br />
<h4>
Data Flow Layouts and Annotations Are Discarded</h4>
<div>
Update 2016-10-26: Just found this out and <a href="https://connect.microsoft.com/SQLServer/feedback/details/3109241/package-parts-discard-data-flow-layouts-and-annotations" target="_blank">submitted a Connect bug</a>: as soon as you add a package part to a package, you lose all manual control over the positioning of data flow elements, and all of your annotations disappear...</div>
<h3>
Ugly?</h3>
<div>
I have to confess that I'm probably taking the feature outside the design envelope - but I feel that it's the natural place to take it. I described Control Flow Package Parts as a "macro" earlier - but what I really want (and I'm sure you want it too) is "Control Flow Package Functions". I want those "parts" to be first-class citizens in the host package. Since they're essentially macros - the code from the package part appears to be literally automatically copied and pasted into the host - there seems to be no reason why not. </div>
<div>
I want a managed way to interact with the part: an API if you will. And no, altering the list of top-level task properties isn't an API. Nice try. If one of the strengths of a package part is to have multiple tasks act as "one unit", then by definition, none of the settings of those included tasks is accessible to configure. Secondly, altering properties on a task is one of the cruder methods of "configuring" execution dynamically.</div>
<h4>
Shared Connection Managers</h4>
<div>
I want the Package Part to be able to "inherit" Connection Managers from the host, and/or be able to "map" Connection Managers from the host package to Connection Managers inside the package part. I've <a href="https://connect.microsoft.com/SQLServer/feedback/details/3106629" target="_blank">filed a Connect item</a> for that request, which should enable far more code re-use. Imagine a simple package part built to test the health of a database and record it in a table, perhaps as a first "validation" step in a standard ETL process you conduct in many packages. This part would need to use two Connection Managers - one for the database under test, and another for the logging table. The logging table could be static (bad idea, nothing should be "unconfigurable"), but the database under test would naturally want to be specified by the host package. Unfortunately, there's no documented* way to "pass in" the connection manager you want.</div>
<h4>
Shared Variables</h4>
<div>
Probably more problematic is the inability to communicate values "in" to the package part and get results "out." One of the best ways I can see using package parts is to distill a complex operation to a simple one - but that commonly requires input and output to be passed back and forth. Take our previous example about testing the health of a database and recording it in a table. In addition to wanting the Connection Manager be configurable, you could also want a table name to be passed in, and the results of the test (perhaps a "last updated date") returned.</div>
<div>
There's <a href="https://connect.microsoft.com/SQLServer/feedback/details/3106628" target="_blank">no documented* way to do this</a>, but think of the code reuse it would enable (without going to the greater length of calling a child package).</div>
<h2>
What's Next?</h2>
<div>
It's a great start - but Control Flow Package Parts are of limited use. As-is (without hacks), they are VERY static and unconfigurable, making them not much better than simply dropping a regular task on the design surface.</div>
<div>
I may be asking for too much with my vision for "Control Flow Package Functions"... maybe I should be simply using child packages... but there must be a happy middle ground that doesn't require full execution of a child package with all the coding, setup and teardown that entails.</div>
<div>
As I said at the top, I really haven't shown you how to use Control Flow Package Parts in this article. I intend to do that next, so I can show you the strengths and weaknesses firsthand. I've also hinted* at "hacking" the Control Flow Package Parts system to accomplish more... and I have a series of how-to posts lined up to demonstrate exactly that... so stay tuned!</div>
<div>
By the way - many thanks to my new environment and colleagues at <a href="https://www.absolute.com/" target="_blank">Absolute Software</a>. Not to belittle the frequent positive feedback I get all the time about my past posts about earlier SQL releases, but it's really nice to get my hands on newer releases. Without direct practical usage of a product or technique, it's really hard to spot the strengths and issues, or hack away at interesting aspects. I'm really fortunate to be in such a great situation with Absolute! Of course, all the opinions I express here are my own, and not those of Absolute...</div>
<br />
<div>
<br /></div>
<div>
<br /></div>
Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com4tag:blogger.com,1999:blog-5269388141224850580.post-28018858530005843902016-06-21T06:00:00.000-07:002016-10-21T13:45:28.775-07:00AlwaysEncrypted in SQL 2016This isn't my usual stomping grounds - I'm your typical developer: security isn't a focus, and if I'm the one doing it, I wouldn't trust it either. Get a real security professional involved if you're serious about doing encryption right! In fact, I'm trying to do the same - if you happen to know a developer that breathes security, I'd love to talk to them to develop a presentation that doesn't just fake security, it's the real deal.<br />
But until that happens, I can only describe what looks to be an excellent new feature in SQL 2016 - AlwaysEncrypted. What makes it excellent?<br />
<ul>
<li>Encryption (that appears to be) done right</li>
<li>Very easy to set up</li>
<li>Does what it says on the tin</li>
<li>Minimal, yet very understandable impacts on functionality</li>
</ul>
<h2>
Starting At The Beginning - Why Encrypt?</h2>
It's a question of trust - always has been and always will be. Whom do you trust with the information your business collects? Today, the answer for most businesses will be "way too many people." It's not that employees are inherently untrustworthy - if they were, your business probably wouldn't have hired them. The reason you'd get this answer is a relatively straightforward one: "Do you really need to see X to do your job (well)?" - and the answer to that one is usually "no."<br />
A standard example of this is HR information. HR systems (front-end UIs) are typically locked down to only HR departments - that's pretty straightforward and accepted. But does a DBA or a network admin need full access to these systems? Almost surely not, except that it's easier to configure access broadly that includes the permissions they need, rather than tailoring access to just what's required. AlwaysEncrypted makes it a little easier to help tailor that access - or more accurately, adding security at a different layer that means granting complete system access doesn't expose anything.<br />
<h2>
How Easy Is It?</h2>
For a database guy? Pretty darn easy - and we're not always the sharpest tools in the drawer. Of course, being a duller implement may make my claims of setting up AlwaysEncrypted "being easy" be completely undermined by my limited understanding of proper security. I may be providing a false sense that it's actually effective, when it might not be. (Again, if you happen to know a developer who focuses on security, I'd love to talk to them.)<br />
It only takes a couple steps to set up - there's a <a href="https://channel9.msdn.com/shows/data-exposed/getting-started-with-always-encrypted-with-ssms">nice Channel 9 video here</a> - but there are some key points to be very aware of:<br />
<ul>
<li>Don't perform the configuration of AlwaysEncrypted columns on an "untrusted" machine. Sensitive encryption keys are generated during setup, and it's best to have those keys never touch a system that is - or will be - part of the "untrusted" group. Like your DB server itself.</li>
<li>You can't fully test your system's effectiveness only from SQL Server Management Studio. You WILL need the help of a developer - either an SSIS, SSRS, or code monkey.</li>
</ul>
<h2>
What Does The Tin Say?</h2>
AlwaysEncrypted.<br />
<h3>
Duh. So Does It Do What's On The Tin?</h3>
Sure seems to - and all the descriptions of how it works reinforce that belief. The bottom line is that once you've configured AlwaysEncrypted on a column in a table, the data is only ever seen in plaintext form at the application client. It's encrypted by the SQL client libraries on the client (be that a line of business application, SSIS, or SSRS), and stays encrypted (as cyphertext) everywhere else:<br />
<ul>
<li>In any non-client process space on the client machine (caches, etc)</li>
<li>In the client OS as it prepares to send queries over the wire</li>
<li>Over the wire (internal network or internet)</li>
<li>On the server OS as it receives the query and routes it to SQL</li>
<li>Inside SQL memory (the query optimizer, in-memory data pages)</li>
<li>On disk (SQL MDF and LDF files)</li>
</ul>
It simply boils down to only the client having the key. If only the client has the key, and the encryption is industry-standard "really, really difficult to brute force in the forseeable future" <a href="https://msdn.microsoft.com/en-us/library/mt653971.aspx">256-bit process</a>, there's no wiggle room. Nobody else can see the plaintext.<br />
<h2>
What Can't I Do With An AlwaysEncrypted Column?</h2>
There's a <a href="https://msdn.microsoft.com/en-us/library/mt163865.aspx">full list of constraints halfway down this page</a> - but in general, it falls into two categories:<br />
<ol>
<li>Less-used data types and scenarios that Microsoft will in all likelihood address in "v2" - such as image data types.</li>
<li>Operations that require access to plaintext to perform properly - such as check constraints.</li>
</ol>
So yeah - complain about #1 while still being understanding. But think about what you're asking for and/or complaining about first to make sure it doesn't fall into category #2.<br />
<h2>
AlwaysEncrypted - Start Using It</h2>
The reasons should be overwhelming:<br />
<ul>
<li>You're dealing with known-sensitive data</li>
<li>Rolling your own encryption is <strong>HARD</strong> to get right - harder than getting your SAN set up right or getting execs to walk the talk that data quality isn't a technical problem</li>
<li>Transparent Data Encryption (TDE) only protects on-disk data</li>
<li>Wouldn't you feel better as a DBA/DBDev being protected from even the hint of responsibility for protecting this data?</li>
</ul>
Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-22843619583025108122016-04-05T05:00:00.000-07:002016-10-21T13:46:27.515-07:00Finding the Column Name for an ErrorColumn Code in SSIS (prior to 2016)<a href="https://blogs.msdn.microsoft.com/ssis/2015/11/27/error-column-improvements-for-ssis-data-flow-updated-for-ctp3-3/">Fantastic news</a> from the SSIS team at Microsoft. After a decade of work, they've been able to address all the bugs and features that somehow ranked higher than this. This <a href="http://connect.microsoft.com/SQLServer/feedbackdetail/view/168857/ssis-cant-get-name-from-errorcolumn-property-in-data-flow-tasks">problem is fixed</a>! They've made it easy to tell what column an error is about in an SSIS 2016 data flow. And just like them, I'm late to the party too. I've finally found time to tell you how to get the exact same thing from versions earlier than 2016. That's right, read on to find out how to get SSIS to decode the ErrorColumn value that's on the Error output <a href="https://onedrive.live.com/redir?resid=C76C3CBDB305921!114099&authkey=!AIuo_6WxD2Wd8II&ithint=folder%2cdtsx">with in-the-box stuff only</a>. Without <a href="http://www.cozyroc.com/ssis/data-flow-task">add-on tools</a>. Without <a href="https://naseermuhammed.wordpress.com/tips-tricks/getting-error-column-name-in-ssis/">custom components</a>. Without <a href="http://www.techbrothersit.com/2013/12/ssis-how-to-get-error-column-name-in.html">pre-populated lookup tables</a> <a href="https://bennyaustin.wordpress.com/2011/02/04/ssis-name-of-errorcolumn/">made by utilities</a>. (No judgment on the preceeding - use what you've got!)<br />
<h3>
What's the Problem?</h3>
Now, you're certainly reading this because you've run into an error in your code with SSIS. Seriously, nobody is good enough to code a data flow error free the first time. (And if you are, you're taking too much time to code - lean on the tool a little bit!) Given that we're all fallible, we'll all have run into the problem (especially in production) where a component in the data flow gets invalid input, routes a row to the error output, and... can't tell you anything useful about it.<br />
The typical example is a Derived Column component. You've written an expression trying to construct a new column using a couple input columns. The problem is that the incoming data isn't what you expected when you wrote the expression, and is so unexpected that it causes an error in the evaluation. Perhaps you were attempting to get a fraction (ratio) between values in two columns, but the denominator column contained a zero. Or you were attempting to reduce the size of a column, or type cast a column to a smaller data type... but the incoming data exceeds the capacity of your new column.<br />
<h3>
At Design Time</h3>
Handling these kind of problems is easy... at design time. When a component fails, you have tools at your disposal to figure out why. Putting a data viewer on the input is the easiest and least intrusive - even though it requires you to think about how the incoming data interacts with your transformation.<br />
<a href="https://msdn.microsoft.com/en-us/library/ms141679(v=sql.105).aspx">With a little more work</a>, you could attach a garbage component (Derived Column, Union All, ...) to the error output of the failing component, and configure a data viewer there. This will show you exactly which rows are causing the failure, and together with the Output window, you'll get a description of what the error is.<br />
Unfortunately, there are still two problems (pre-SQL 2016):<br />
<ol>
<li>We still don't know which column/expression is causing the component to fail. If you have two, ten, or twenty columns undergoing transformations in a component at the same time, it would be helpful to know which one of them is the problem. At design-time, you have the output window that reports the column having the problem - and this text can also be found in the sysssislog for production packages. But those aren't accessible at runtime, in production, to alter the package's behaviour, or report better information. The error output provides an ErrorColumn value to allegedly identify the column in error. However, this number has no discernible relationship with anything you can see in the data flow.</li>
<li>We can't handle the error reliably at runtime. Even if we understand what exactly is causing the component to fail, proactive error handling can't be coded reliably. There's an ErrorCode column to identify the kind of error (partially helpful) - but most often when error handling, you would really like to provide some business resource with usable information on how to fix the data that originated in their system. Because as we all (should) know, data quality is NOT an IT problem.</li>
</ol>
<h3>
The Half-Answer</h3>
Well, fairly early on after SSIS 2008, we found out that it wasn't too hard to get one half of the answer - <a href="https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.wrapper.idtscomponentmetadata100.geterrordescription(v=sql.105).aspx">the actual error message</a>. We just had to add a script component to the error output and call a function to decode the error number into a text string. However, there was no similar facility to determine what transformation caused the problem. To cope with this in some cases, I've split transformations individually into Derived Columns. Yes, one transform per component - all so that I could accurately and informatively report back to SMEs in production, in an automated fashion, which data elements of theirs needed a little attention.<br />
<h3>
The Second Half</h3>
Good news now that the new SSIS 2016 would finally add the option of getting <a href="http://microsoft-ssis.blogspot.ca/2015/09/sql-server-2016-ctp-23-get-error.html">a plain-English column name where the error occurred</a>. It took me a while... but then I ran into my own need to get this information again... and recalled that I hadn't figured out a workaround for it in pre-2016 SSIS! What an unhelpful expert I am. Well - now you'll be glad to know that I took some time to solve the problem, and you don't need a separate tool to download, or store a table of metadata about your packages. This is all self-contained in the data flow - but even I can't say it's pretty, or particularly easy.<br />
<h3>
First - What is ErrorColumn?</h3>
The root of the issue is figuring out what the heck the number in ErrorColumn is. Yes, it's some indicator of a column in the data flow... but where else do we see that number? Nowhere in the UI, I can tell you that. But try this out in a failing data flow of yours: Write down the ErrorColumn, then open your DTSX with NotePad and search for that number (make it easy on yourself, and search for that number surrounded by quotation marks). What do you find? You find that the ErrorColumn refers to a LineageID. I've talked a lot about LineageIDs before - they're integral to how a data flow in SSIS works, and how SSIS keeps track of which columns are which. But what column does that LineageID reported by ErrorColumn refer to?<br />
If we go back to our example of the Derived Column that fails, it's the LineageID of the column you're creating (or transforming). When I first tried to make sense of that ErrorColumn, I thought it would refer to the incoming column that had the "bad value" that was causing my expression to fail. But then I realized (of course) that SSIS has no idea what a "bad value" is... it only knows that an expression failed! So there's no "bad value", there's just a "bad expression". As such, the column that has the error isn't one of the inputs to the Derived Column - it's one of the output columns!<br />
And to add insult to injury... that (new) column is NOT on the error output of the Derived Column. Because of course it's not - the error output doesn't include that column because <i>it wouldn't have anything meaningful in it anyway</i>. Facepalm moment. OK - now my intelligence/skills/experience has caught up to where the SSIS team was back in 2005. That took a while, I know, but bear with me - I'm attempting to redeem myself here.<br />
<h3>
Great - ErrorColumn is on the Error Output, but the Column Itself is on the Regular Output - Now What?</h3>
Great question.<br />
And your next question should be... even so, how do I figure out which column on the regular output is associated with that LineageID that's nowhere to be found in the UI?<br />
Script - of course! (Yay.) Now is the time to <a href="https://onedrive.live.com/redir?resid=C76C3CBDB305921!114099&authkey=!AIuo_6WxD2Wd8II&ithint=folder%2cdtsx">download the sample code</a> that accompanies this article. You don't really need to read and understand the script - because the good news is that you won't have to edit it at all - it's completely generic and will work with any data flow with NO CHANGES. You can copy and paste the component (not just the code) into your data flow, or you can build your own by:<br />
<ol>
<li>Go to the "Inputs and Outputs" page, and select the "Output 0" node. Change the "SynchronousInputID" property to "None". (This changes the script from synchronous to asynchronous.)</li>
<li>On the same page, open the "Output 0" node and select the "Output Columns" folder. Press the "Add Column" button. Change the "Name" property of this new column to "LineageID".</li>
<li>Press the "Add Column" button again, and change the "DataType" property to "Unicode string [DT_WSTR]", and change the "Name" property to "ColumnName".</li>
<li>Go to the "Script" page, and press the "Edit Script" button. Copy and paste this code into the ScriptMain class (you can delete all other method stubs):</li>
</ol>
<span style="background-color: white; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b> public override void CreateNewOutputRows()<br /> {<br /> IDTSInput100 input = this.ComponentMetaData.InputCollection[0];<br /> if (input != null)<br /> {<br /> IDTSVirtualInput100 vInput = input.GetVirtualInput();<br /> if (vInput != null)<br /> {<br /> foreach (IDTSVirtualInputColumn100 vInputColumn in vInput.VirtualInputColumnCollection)<br /> {<br /> Output0Buffer.AddRow();<br /> Output0Buffer.LineageID = vInputColumn.LineageID;<br /> Output0Buffer.ColumnName = vInputColumn.Name;<br /> }<br /> }<br /> }<br /> }</b></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpFAbJzpzylc9KgeEC5onOr4GkGhr0MHD5tE5dI-W_2iXAH_lOmroVJkuvbggMj028sCB_9XwCwYxlTCnkLsk4ublFTRl-8XdLXO2yiU9X1g4l6r-UmqprwCvMqLtfHc_UzPjHlBD_DSU/s1600/ErrorColumnName.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpFAbJzpzylc9KgeEC5onOr4GkGhr0MHD5tE5dI-W_2iXAH_lOmroVJkuvbggMj028sCB_9XwCwYxlTCnkLsk4ublFTRl-8XdLXO2yiU9X1g4l6r-UmqprwCvMqLtfHc_UzPjHlBD_DSU/s1600/ErrorColumnName.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpFAbJzpzylc9KgeEC5onOr4GkGhr0MHD5tE5dI-W_2iXAH_lOmroVJkuvbggMj028sCB_9XwCwYxlTCnkLsk4ublFTRl-8XdLXO2yiU9X1g4l6r-UmqprwCvMqLtfHc_UzPjHlBD_DSU/s320/ErrorColumnName.jpg" width="320" /></a></div>
Feel free to attach a dummy output to that script, with a data viewer, and see what you get. From here, it's "standard engineering" for you ETL gurus. Simply merge join the error output of the failing component with this metadata, and you'll be able to transform the ErrorColumn number into a meaningful column name.<br />
<br />
But for those of you that do want to understand what the above script is doing:<br />
<ol>
<li>It's getting the "first" (and only) input attached to the script component.</li>
<li>It's getting the virtual input related to the input. The "input" is what the script can actually "see" on the input - and since we didn't mark any columns as being "ReadOnly" or "ReadWrite"... that means the input has NO columns. However, the "virtual input" has the complete list of every column that exists, whether or not we've said we're "using" it.</li>
<li>We then loop over all of the "virtual columns" on this virtual input, and for each one...</li>
<li>Get the LineageID and column name, and push them out as a new row on our asynchronous script.</li>
</ol>
So now you can see that this script can be used in the middle of any data flow, and it will output the columns "as rows". You could even get more clever and start outputting columns with data types and other metadata related to the column too.<br />
<div>
<br /></div>
Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com3tag:blogger.com,1999:blog-5269388141224850580.post-89490623548367705532015-08-03T05:00:00.000-07:002015-08-03T05:00:12.269-07:00Consolidating Multiple Data MartsOne of the challenges that I've worked on in the past months involved adapting our technical architecture to address issues exposed by business re-engineering. The BI group at CHC used to assign specific people to specific business units in order to deliver solutions to business needs. The upside of that arrangement was that each business unit developed a strong relationship with a member of the BI team, and always felt forward motion on their issues. The downside was that the BI team was fractured internally, technical implementations varied wildly, larger projects weren't staffed, and high-priority projects didn't attract the team's attention away from lower priority projects.<br />
Our internal BI delivery model has changed to become more of a single team knocking down projects together, and as such, the disconnected architecture of the existing solutions becomes glaringly apparent. The most obvious way it's visible is that it's very difficult to re-use work already done, and integrate "better" parts of one solution with the others. One aspect of that is how the relational data is stored and (not) shared.<br />
<h3>
The Debt</h3>
Typically in a multi-purpose data warehouse (Kimball style), you'll have a bus matrix that identifies the subject areas (facts) and the business entities that apply to each of them (dimensions). Also quite typically, the business entity tables (dimensions) are shared between the subject areas - conformed. This can be a tricky part of the project, getting two parts of your organization to agree (precisely!) on a common (or compatible) definition of a shared entity - but assuming you can get that done, the next step is to technically build and maintain only one table.<br />
Unfortunately, given the fractured nature of the past team, this wasn't the case. One of the ETL developers here counted eighteen (18) different date dimension tables. Eighteen. Dates. Yeah. Even with a pretty dysfunctional organization, you'd think you could come to agreement that a day is a day - even if you might not agree on which attributes are in common between subject areas or what they look like. Even if you couldn't do that, you'd think there wouldn't be 18 ways to represent dates. And you'd be right.<br />
This problem needed addressing - and it is a problem, because maintaining 18 different date tables is, quite frankly, ridiculous. But trying to collapse eighteen tables into one isn't easy. It's not quite as simple as making one common table and modifying all places that touch that table to refer to the common table. We could make sweeping changes all at once, or work one BI application at a time. If we're making changes to 18 BI applications at once, we could break them all.<br />
If we attempt to implement incrementally we could just be adding a 19th date table, and not really get rid of any existing ones.<br />
Of course, in a real-world scenario, things aren't even as simple as I've described. Sometimes a dimension is constructed by application A, but application B also uses it. So if we go and change application A to point at a new common dimension, we'll end up breaking B - perhaps silently. It's this silent dependency that really paralyzes attempts to clean things up.<br />
<h3>
The Goals</h3>
So all I've exposed so far is how unhappy and hard to please that I am. Now it's time to figure out what I'd be happy with! Here's the goals for cleaning up:<br />
<ul>
<li>Remove duplication of work by having one (1) dimension table per entity - where the business thinks it's feasible</li>
<li>Allow two or more dimension tables about the same entity in cases where the business can't/won't come to agreement</li>
<li>Make dependencies clear </li>
<li>Make breaking dependencies obvious as soon as possible</li>
<li>Permit changes to a central dimension table without breaking dependent entities</li>
<li>Design in the ability to start and stop refactoring at any time without having made the situation worse</li>
</ul>
Not a short, easy list of goals, is it?<br />
<h3>
Views to the Rescue</h3>
<div>
Views are a common weapon in the fight to add "abstraction layers" in systems. These layers don't stop changes from breaking things, but they can help make fixing them faster. They allow this by essentially making a "translation layer" that is easier and faster to reconfigure than the larger parts that they connect. Of course, one of the main traps to avoid in using views in this way is making them contain too much translation, turning them into part of the ETL, and a shadow code base.</div>
<div>
What we're going to do is:</div>
<ul>
<li>Take the ETL out of application A and/or B, and make a new ETL for the shared, conformed dimension. We'll call this "C" for now.</li>
<li>Replace A's dimension table with a view to C's table</li>
<li>Replace B's dimension table with a view to C's table</li>
</ul>
Now A and B are really referring to C. This achieves the "remove duplication of work" goal.<br />
To achieve the goal of making dependencies clear and remove the possibility of altering them unintentionally, we're going to use a special clause when we make the view: WITH SCHEMABINDING. <a href="https://msdn.microsoft.com/en-us/library/ms187956.aspx">Read up in MSDN</a> (look for SCHEMABINDING) - I'll wait. To translate that into our specific situation, this means that C can't change it's table definition if it would break the view in A or B. Dependency clarity.<br />
<h3>
A Specific Example (with Identical Tables)</h3>
Application A and B have "date" tables. They're both defined exactly the same, and have a set of identical columns. Critically, the KEY of both tables is defined the same. A's table has some columns that B doesn't have, and vice versa. In code (briefly - you'd have PKs, NOT NULL set appropriately... right?):<br />
CREATE TABLE DimDate_A (SK_Date INT, RealDate DATE, CalendarYear INT)<br />
CREATE TABLE DimDate_B (SK_Date INT, RealDate DATE, FiscalYear INT)<br />
Step 1 is to create a new table for the conformed date dimension:<br />
CREATE TABLE DimDate_C (SK_Date INT, RealDate DATE, CalendarYear INT, FiscalYear INT)<br />
Step 2 is to combine the ETL for A and B into a process for C - use your tool of choice (such as SSIS)<br />
Step 3 is to remove the ETL from A and B, drop A and B's tables, and create views instead:<br />
DROP TABLE DimDate_A<br />
DROP TABLE DimDate_B<br />
CREATE VIEW Dim_Date_A WITH SCHEMABINDING AS SELECT SK_Date, RealDate, CalendarYear FROM DimDate_C<br />
CREATE VIEW Dim_Date_B WITH SCHEMABINDING AS SELECT SK_Date, RealDate, FiscalYear FROM DimDate_C<br />
<h3>
More Complex Examples</h3>
This technique does work for more complex examples - I showed that at a <a href="https://onedrive.live.com/redir?resid=C76C3CBDB305921!66525&authkey=!AJuxHOH-h9RXR-4&ithint=folder%2csql">presentation <span id="goog_1741554239"></span>at last month's SQL Saturday</a><span id="goog_1741554240"></span>. If you missed SQL Saturday, Scott Stauffer is setting up a PASS chapter meeting in early August where I'll present it again, and have time for more interaction.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-20423951417892460112015-06-19T05:00:00.000-07:002015-06-19T05:00:08.653-07:00SQL Saturday Vancouver - June 27, 2015<div class="separator" style="clear: both; text-align: center;">
<a href="https://7ub9vg-ch3301.files.1drv.com/y2m8hJJ8rb2iBrn3vwnJxZTEKlHWuWo8MjnO0NJp-ZRNf6qXPqYIA5bzV8IE8-O4fKz8vpnURxz93iRDtkktvIOUFMmNpwOsiUntIJMyk4Mqfj5wcCBgxYs2SxmE6chrEpH2zVoy4dnpCfnBdrbrAyfRg/SQLSAT407_web.jpg?download&psid=1" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://7ub9vg-ch3301.files.1drv.com/y2m8hJJ8rb2iBrn3vwnJxZTEKlHWuWo8MjnO0NJp-ZRNf6qXPqYIA5bzV8IE8-O4fKz8vpnURxz93iRDtkktvIOUFMmNpwOsiUntIJMyk4Mqfj5wcCBgxYs2SxmE6chrEpH2zVoy4dnpCfnBdrbrAyfRg/SQLSAT407_web.jpg?download&psid=1" /></a></div>
A great learning and networking event comes back to Vancouver just in time to regroup about SQL Server 2016. Don't pass up the opportunity to hear from some of the best expert speakers from around the world - they do this because they love doing it, so you know they're dedicated. All it costs is... well... nothing but your time on a Saturday. (OK - $15 for lunch so you don't have to hunt around and find something.)<br />
Register <a href="http://www.sqlsaturday.com/407/EventHome.aspx">TODAY</a> - it's only one week away.<br />
I'll be there too, helping out to get you to the sessions you want, and making sure everything is running as smoothly as Scott Stauffer, Rich Baumet, and Melody Zacharias want it to!<br />
I'll also be <a href="http://www.sqlsaturday.com/407/Sessions/Details.aspx?sid=36927">presenting a technique</a> I've learned to deal with gradual merging of multiple data marts into a larger, consistent data warehouse.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-11868920282022598752015-06-17T05:00:00.000-07:002015-06-17T05:00:12.964-07:00Adventures in SSAS WritebackI should probably forward this to the <a href="http://thedailywtf.com/">Daily WTF</a> (I'm a dedicated reader) - but we'll see how well the story tells here, with an audience that has a passing knowledge of SQL Server.<br />
As part of my new gig, we have a lot of existing "BI applications" (typically some ETL, an SSAS cube, and a set of SSRS reports) that are built to... a different standard than I'm used to. I have no desire to imply that the people that made these applications are not good people. It's quite likely they either didn't have the right technical knowledge to build things properly (IMO), or were under other stressors that caused or contributed to poor solution construction.<br />
With that caveat out of the way, what I'm about to describe is definitely the worst use of writeback I’ve ever seen. But I have limited history. I’d be willing to wager it’s quite possibly the worst usage of writeback <em>ever</em> - let me know if you agree, or top it in the comments. Either way, don’t ever let me find you doing this…<br />
<h4>
Background on Writeback</h4>
First, some background (from an admitted SSAS noob) so we’re all on the same page.<br />
BI applications that involve SSAS usually mean multidimensional cubes are involved. That’s where the “cube” name comes from, and it is typically the target of end-user queries. The cube is built from regular database tables (or views, or queries) – typically designed a little differently, in star schema usually, but still regular tables. Those tables are normally filled by an ETL process that reads data from source systems, transforming it into useful information that the cube effectively summarizes and presents.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://5kzuva-ch3301.files.1drv.com/y2mBC_OuM8JQEDnm1ucDi-h1uCl-qVyoxQsJRO0nxw02G6sk3OAbqv3C2oce4sOiwRs2hpQ-TjtES7YebyPtlO_Lb8FZg5_rQMY2GmJir_us3YCy5CQDnIuSKy4lrzQWmamI_XFWmYl4k3Qc9-YirOZWw/Regular%20BI%20Application.jpg?download&psid=1" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="120" src="https://5kzuva-ch3301.files.1drv.com/y2mBC_OuM8JQEDnm1ucDi-h1uCl-qVyoxQsJRO0nxw02G6sk3OAbqv3C2oce4sOiwRs2hpQ-TjtES7YebyPtlO_Lb8FZg5_rQMY2GmJir_us3YCy5CQDnIuSKy4lrzQWmamI_XFWmYl4k3Qc9-YirOZWw/Regular%20BI%20Application.jpg?download&psid=1" width="640" /></a></div>
From time to time, there are requirements from users that they would like to add or modify information in that cube in ways the source systems don’t (want to) support. One common request is “what-if” scenarios. Other times its wanting to enter or adjust budget or target data. Sometimes it’s “corrections” or “adjustments” to computed values. One way to accomplish this requirement is to have those users fill in a form/database/textfile somewhere that the regular ETL consumes, eventually including those numbers in the finished cube. This method requires that the ETL and cube processing execute after adjustments are made to incorporate this new information in order to see the result of their addition/change/adjustment.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://7ub9vg-ch3301.files.1drv.com/y2mn6GPSU0BPV-twDCByUq1_io_3EwHPMW0kVeHXRHsQqP9gmPCiiinpBehXyP-G2kj5iAH03oeDLvOvGfnli5U4bYP1eTkAjyttjEF97WkS279vZyjH72bTkqz-TYKV7COqZUVNi5vIZ-JVzHTHDHD6w/ETL%20Adjusted%20BI%20Application.jpg?download&psid=1" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="173" src="https://7ub9vg-ch3301.files.1drv.com/y2mn6GPSU0BPV-twDCByUq1_io_3EwHPMW0kVeHXRHsQqP9gmPCiiinpBehXyP-G2kj5iAH03oeDLvOvGfnli5U4bYP1eTkAjyttjEF97WkS279vZyjH72bTkqz-TYKV7COqZUVNi5vIZ-JVzHTHDHD6w/ETL%20Adjusted%20BI%20Application.jpg?download&psid=1" width="640" /></a></div>
Cube writeback shortens this loop (if implemented as designed). Instead of users filling in a form which simply becomes just another source for the ETL, the users can edit the contents of the cube "directly", and SSAS maintains a SQL table filled with those "adjustments". No re-run of the ETL or cube processing is required. Those "adjustments" are persisted and made available (almost) immediately to all users. That instant feedback is a definite advantage of writeback, but it comes at a cost of limited auditability, and it doesn't always apply adjustments like you expect.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://7ub9vg-ch3301.files.1drv.com/y2m-q1UNWniCMxJGrgqGKhaLSHtoKsCdgYzawH5tcr36uNucaOu7Z9wIwXYtNP63UAxBRuaqGc91bbX_1cvgaUfhAIPk_Gjsc9YY9Wz8plrOEt9bwtjUjcVmaNirZrQHyTP7ofIa1QWYb9tZTzeRhggzw/Writeback%20BI%20Application.jpg?download&psid=1" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="210" src="https://7ub9vg-ch3301.files.1drv.com/y2m-q1UNWniCMxJGrgqGKhaLSHtoKsCdgYzawH5tcr36uNucaOu7Z9wIwXYtNP63UAxBRuaqGc91bbX_1cvgaUfhAIPk_Gjsc9YY9Wz8plrOEt9bwtjUjcVmaNirZrQHyTP7ofIa1QWYb9tZTzeRhggzw/Writeback%20BI%20Application.jpg?download&psid=1" width="640" /></a></div>
<h4>
Now on to the crazy…</h4>
We're in the process of rebuilding, or at least reviewing, every BI application in the business. Next on the list for me was a cube built for reporting KPIs, and I was told that it implemented writeback to allow for targets to be entered. Being a new guy to this sort of thing, I just nodded my head and carried on. Our standard process involves renaming all parts of the application - primarily in order to enforce internal consistency, but also to allow side-by-side comparison before launch. As we got into the project, we had to make sure the writeback hooked up and adjusted our new cube appropriately, just like the old one did. And that’s when we found… that writeback was NOT turned on in this cube. That’s right – we go to the writeback properties in the cube, and Visual Studio says that writeback is NOT POSSIBLE due to other settings in the cube. Double, triple checked. Cube in production is the same. No writeback.<br />
But wait, I said – I saw a table labeled “writeback” in the ETL processes… we’re reading the writeback information… where is it coming from? Unfortunately, being a noob, that was the WRONG question. The right question would have occurred to me much earlier – when I actually saw the “writeback” table included in the ETL. Going back to the backgrounder on writeback, just seeing that table should have been a red flag that something wasn’t right… the ETL shouldn’t even be involved in writeback at all, because it should be fully contained in the SSAS cube layer.<br />
Anyway, we’re still left with the question – and now it’s even more pressing – where is this writeback data coming from? Examining a spreadsheet that an end-user uses to fill in adjustments, it becomes clear. There’s a SECOND cube, in a SECOND Visual Studio solution, modeled on the same regular tables, WITHOUT the settings that are incompatible with writeback. The user updates the spreadsheet, which updates that second cube with adjustments. The second cube – using standard writeback configuration – writes data to a SQL table. Any query hitting the second cube will automatically, and immediately, show adjusted information. However, the primary (real) KPI cube will not… until the ETL is run that reads the second cube’s (supposed to be internally used only) writeback table. Unfortunately, this second cube data is also incorrect – because it doesn’t have the right processing configuration (the settings that prevent writeback from working). So we’re left with the first cube showing “good” but unadjusted information (until ETL runs again), and a second cube that shows “bad” but adjusted information.<br />
<h4>
High Facepalms All Around</h4>
At the end of the day, what we have is an overly complex implementation of end-user data entry that requires two cubes, roughly double the maintenance attention, and double the processing power of just about any other style of end-user data entry system. And to add insult to injury, it doesn’t even deliver the only positive differentiator of SSAS writeback compared to other systems: instant incorporation of adjustments into the cube.<br />
Awesomesauce.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-6221474690154601452015-05-07T04:52:00.000-07:002015-05-07T04:52:00.128-07:00The Next Chapter<div class="OutlineElement Ltr SCX12033601" style="margin-left: 0px;">
<div class="Paragraph SCX12033601" style="background-color: transparent; color: windowtext; font-family: "Segoe UI",Tahoma,Verdana,"Sans-Serif"; font-size: 6pt; font-style: normal; font-weight: normal; text-align: left; vertical-align: baseline;" xml:lang="EN-CA">
<span class="TextRun SCX12033601" style="background-color: transparent; color: windowtext; font-family: Calibri,Sans-Serif; font-size: 11pt; font-style: normal; font-weight: normal; line-height: 17px; text-decoration: none;" xml:lang="EN-CA"><span class="NormalTextRun SCX12033601" style="background-color: inherit;">It hasn't happened often with me - I've just changed employers. Farewell to Metrie - I spent a long time there, worked with fantastic colleagues, and as I hope this blog has shown... I learned a lot. The company started with absolutely nothing analytic - except for a ten user license of Cognos that went completely unused. It took some time and lobbying to arrive at a solid base for a data warehouse that could deliver them real value. My small team actually accomplished what I think most BI groups endeavour to do - integrated data from multiple systems, transformed and cleansed the data, fed quality and reliability information back to stewards, presented a business-oriented data model to quality presentation tools, and provided best practices for evolving the business and technology to meet the future. Unfortunately, the roadmap felt lackluster at that point - so I made the difficult decision to leave a solid and comfortable place with great relationships.</span></span><span class="EOP SCX12033601" style="font-family: Calibri,Sans-Serif; font-size: 11pt; line-height: 17px;"> </span></div>
</div>
<div class="OutlineElement Ltr SCX12033601" style="margin-left: 0px;">
<div class="Paragraph SCX12033601" style="background-color: transparent; color: windowtext; font-family: "Segoe UI",Tahoma,Verdana,"Sans-Serif"; font-size: 6pt; font-style: normal; font-weight: normal; text-align: left; vertical-align: baseline;" xml:lang="EN-CA">
<span class="TextRun SCX12033601" style="background-color: transparent; color: windowtext; font-family: Calibri,Sans-Serif; font-size: 11pt; font-style: normal; font-weight: normal; line-height: 17px; text-decoration: none;" xml:lang="EN-CA"><span class="NormalTextRun SCX12033601" style="background-color: inherit;">Now it's on to the next adventure, working with a completely different industry on their business intelligence challenges. CHC Helicopters is a world leader in aviation - of the rotary wing kind. They had an existing BI group that certainly seemed intent on the same goals I mentioned above, but was unable to solve some serious issues. They ran head-first into some of the design, implementation, and organizational problems I had the benefit of seeing second-hand through this great community. Unfortunately, the technical debt grew too large, so CHC decided to completely rebuild their BI group, and I'm excited to be part of that. Four months ago, I joined a group of four with a grand total of two years as part of this team. That's not to say they're greenhorns - far from it. They each bring a very different, and very valuable background to the process of retooling the platform, architecture, and practices of CHC's BI.</span></span></div>
</div>
<div class="OutlineElement Ltr SCX12033601" style="margin-left: 0px;">
<div class="Paragraph SCX12033601" style="background-color: transparent; color: windowtext; font-family: "Segoe UI",Tahoma,Verdana,"Sans-Serif"; font-size: 6pt; font-style: normal; font-weight: normal; text-align: left; vertical-align: baseline;" xml:lang="EN-CA">
<span class="TextRun SCX12033601" style="background-color: transparent; color: windowtext; font-family: Calibri,Sans-Serif; font-size: 11pt; font-style: normal; font-weight: normal; line-height: 17px; text-decoration: none;" xml:lang="EN-CA"><span class="NormalTextRun SCX12033601" style="background-color: inherit;">Over the last year I've been fairly quiet as I wrestled with career decisions and couldn't scrape together much to share in the way of technical posts. It's hard to feel useful posting tips and tricks for 2008R2 - around five years old by now, and without much change since 2008. CHC isn't any different - yet - they still run 2008R2 as well, with a clear intent on moving to a current release. But before that - and all the technical goodness that I'm sure will fall out of it for me - I have some very good lessons I've learned from reworking CHC's systems. So for </span><span class="SpellingError SCX12033601" style="background-color: inherit;">longtime</span><span class="NormalTextRun SCX12033601" style="background-color: inherit;"> readers, the good news is that I'm posting again. The bad news is that it won't be centered on the most frequent topic I've blogged about so far. Overall, I think this will be a good change - I'll get to share how I've solved some design problems that I haven't run into before, or that I thought were common knowledge but obviously weren't. I hope you enjoy the ride reading about it as much as I have had working it over the last few months.</span></span></div>
</div>
<br />Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-60030609693755864782014-08-04T00:08:00.000-07:002014-08-04T00:08:36.463-07:00Professional Update - Mid 2014It's been too quiet around here...<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://w0sjeg.bn1.livefilestore.com/y2piTjPbySBLdyZlIUG9QgW2tDVdQrZKs0CZXOnxiU8ECodPQMBucdEvQ30oGlzxTDJv8XevOHH4NCUonvu4RuJHWFsneWzq_vlymoNqlh4E2I/Vewy%20Quiet.jpg?psid=1" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://w0sjeg.bn1.livefilestore.com/y2piTjPbySBLdyZlIUG9QgW2tDVdQrZKs0CZXOnxiU8ECodPQMBucdEvQ30oGlzxTDJv8XevOHH4NCUonvu4RuJHWFsneWzq_vlymoNqlh4E2I/Vewy%20Quiet.jpg?psid=1" /></a></div>
And that's all on me. You see, I've been hunting rabbits - the elusive kind - a specific breed called "analytic culture" which requires a ton of focus, and can be downright frustrating.<br />
Over the past year, I've been working diligently to get the right sponsorship from my company's management to support a BI program. It's hard work, and we're not there yet. Unfortunately, like Mr. Fudd, I tend to shoot myself in the foot too. Remember "Guerilla BI"? Good idea (for me) to learn the (technical) ropes and see what's going on in the craft - but I think having done that gave management the impression this is a technical problem to solve, not a business issue. I dug my hole, was happy to do it in order to figure out what all this BI and DW stuff was about (and would still recommend you get your hands dirty as well). But I do wish I'd handled the transition of sticking my head up and getting attention for the program a little better than I did. Despite that, I'm not entirely certain there's a good understanding of what analytics need from the organization - and hunting that rabbit may turn into my white whale...<br />
We've had some other changes in the organization recently, and more big ones are on the way. It's looking like we'll be implementing an SAP ERP over the next couple of years, and I might get to peek at the inside of HANA. That should be interesting, and will certainly cause some architectural shifts in the construction of our data warehouse! That's without mentioning consolidating multiple systems into SAP, which the current data warehouse will assist with, but will also remove one of the perceived drivers to have a data warehouse. Who knows, I could get converted over to the "dark side" of Informatica, SAP BDW and BO...Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com3tag:blogger.com,1999:blog-5269388141224850580.post-2811473451050497292014-08-03T23:22:00.005-07:002014-08-03T23:35:36.631-07:00Visualizing Data on the WebHere's a non-NDA tidbit from the Microsoft MVP Summit - I think you'll find it quite useful. I'm going to go through my old posts to see if I can find places to retrofit with this cool little extra. If I can't find any, I'll probably try to contrive a few new posts (yes, they've been lacking too) to demonstrate the utility of it.<br />
There are times when you include HTML tables in your webpages or blog posts. If one of your visitors wants to capture that information, there are a couple of options available to them (and you). First, they can select, then paste the data into Excel. This assumes they have Excel locally - which isn't always a roadblock because they can fire up an Excel sheet in their free <a href="http://onedrive.com/">OneDrive account</a>. A second option is for you to provide that data in an Excel spreadsheet, and place a download link somewhere on your page.<br />
I think we can all agree that copy and paste isn't the best experience for your visitors. We can also surely agree that it's a nuisance to have to duplicate the data, save it in a spreadsheet on your own OneDrive, set up a public link, etc...<br />
What's easier than all that, and delivers a surprisingly excellent interface for exploring the data? It's what the Excel team calls "<a href="http://www.excelmashup.com/">Excel Interactive View</a>". I have no doubt that the first time you click on the button below, you'll be as shocked (pleasantly) as I was when I saw it first. Don't take my word for it, explore yourself:<br />
<a href="#" name="MicrosoftExcelButton" data-xl-tableTitle="" data-xl-buttonStyle="Standard" data-xl-fileName="Book1" data-xl-attribution="" ></a>
<br />
<table border="1">
<tbody>
<tr><th>Year</th><th>Main Conference Attendees</th><th>Precon Attendees</th><th>Source</th></tr>
<tr><td>2012</td><td>3894</td><td>1717</td><td>http://www.businesswire.com/news/home/20121107006865/en/PASS-Summit-2012-Kicks-Record-5600-Registrations</td></tr>
<tr><td>2011</td><td>3492</td><td>1616</td><td>http://www.sqlpass.org/summit/2011/Live/News/Partner25.aspx</td></tr>
<tr><td>2010</td><td>2814</td><td>993</td><td>http://www.businesswire.com/news/home/20101109005927/en/PASS-Summit-2010-Opens-Record-Registrations</td></tr>
<tr><td>2009</td><td>2200</td><td>798</td><td>http://www.sqlservercentral.com/blogs/steve_jones/2009/11/03/pass-summit-2009-wayne-snyder/</td></tr>
</tbody></table>
<script type="text/javascript" src="http://r.office.microsoft.com/r/rlidExcelButton?v=1&kip=1"></script>
Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-29040130635935128842014-08-03T23:18:00.000-07:002014-08-03T23:18:06.075-07:00Imposing Sort Order on Already Sorted Rows in SSISThere are times when you know a data flow is sorted in Integration Services, but SSIS doesn't know this fact. For example, you may have data coming out of a Derived Column or synchronous Script where you can guarantee a certain sort order based on how you're specifically processing the rows. (Synchronous components modify data in the rows without changing row order, so they're not permitted to change the IsSorted or SortKeyPosition properties.) Without having SSIS know how the data is sorted, there are limitations to what you can do with it.<br />
<h4>
What Limitations?</h4>
The most common impact is that you can't use a Merge Join (or the lesser-used Merge). SSIS will require that you throw a Sort component into the flow in order to use a Merge Join. Of course, doing that is worse than useless... the data is already sorted, and placing a Sort component in the flow causes all of the rows to be accumulated and stored in-memory before the flow can continue! You're not exactly going to get great performance out of that arrangement.<br />
<h4>
So How Do I Deal With That?</h4>
What can you do to optimize your data flow? Typical suggestions include pushing the rows into SQL Server (or other persistent location) and reading it again. Since all Source components are asynchronous, they allow you inform SSIS of a sort order by using the Advanced Editor. The drawback of that advice is the expense (network or disk I/O) of persisting the data.<br />
<h4>
Is there another option? </h4>
In fact, there is. We can use an asynchronous Script to our advantage with much lower performance impact than either a Sort component or persisting data. Yes, there will be a performance hit - not so much because of the "asynchronous" attribute of the Script, but because of the COM interop and memory copying.<br />
<h4>
The Details</h4>
Unfortunately, you'll have to <a href="http://msdn.microsoft.com/en-us/library/ms136133.aspx">set up an asynchronous script component</a>. This means you'll have to define columns on the Output that match what's on your Input (because you want exactly the same columns out as come in). Unfortunately, this means your metadata suffers a break here, and if any column definitions change, you have to update this component manually.<br />
In order to create an asynchronous script, you have to change the SynchronousInputID of Output0 on the Inputs and Outputs tab of the script component editor to "None". Once you do that, you'll see a new property available called IsSorted. Set this to true.<br />
We're not done, but the meat of the script is really quite simple. Select all of the columns on the Input Columns tab with checkmarks. Then open the script code editor. Inside the Input0_ProcessInputRow method, use code like this:<br />
<span style="font-family: "Courier New", Courier, monospace;"><strong>while (Row.NextRow())<br />{<br /> Output0Buffer.AddRow();<br /> Output0Buffer.Column1 = Row.Column1;<br /> Output0Buffer.Column2 = Row.Column2;<br /> Output0Buffer.Column3 = Row.Column3;<br /> // ...<br />}</strong></span><br />
Of course, I'm certain that your column names will be more imaginative, and practical, than mine.<br />
<h4>
One Further Note on Performance</h4>
I did say that this will have a performance hit because it's jumping over the COM interop layer. The net effect of this (if your data flow is otherwise fast) will be to pin a single core, as SSIS will allocate one thread to perform all the duties of this script. If you have cores to spare, you may want to try the Balanced Data Distributor, or my poor-man's BDD in order to drive more parallelism...Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-31650204725272562502014-08-03T22:45:00.000-07:002014-08-03T22:45:54.155-07:00Poor Man's Balanced Data DistributorIf you're looking to increase the performance of your SSIS packages like I am, (and you should be,) then you'll know about the <a href="http://www.microsoft.com/en-us/download/details.aspx?id=4123">Balanced Data Distributor</a> component that Microsoft released a couple years ago. <br />
The Balanced Data Distributor is a component that works much like a blend between a Multicast and a Conditional Split. It splits the incoming data into several outputs, roughly equally, and with a very efficient internal method. This allows you to copy a part of your data flow once, twice, or more times - and have the BDD help you parallelize the execution. One of the downsides - as with any custom component - is that you need to install it on each machine that will run the package. That can be a little problematic in some environments.<br />
<h4>
Simple Doesn't Work</h4>
You can achieve something that looks similar by using a Row Numbering type of transform and a Conditional Split. The Conditional Split would divide the rows by some function of the row number, allowing different threads to process the rows. The problem with doing so is that the division of rows isn't reliably made on a "buffer boundary" - even if you think you've crafted it that way. Buffer sizes can't be specified by you, they're dynamically chosen by the execution engine. Your Conditional Split doesn't allow you to phrase an expression that will let you tell it to pass a complete buffer out output #1, and the next complete buffer out output #2. The result of using a Row Number plus Conditional Split might be a slight increase in parallelism - but not an efficient one.<br />
<h4>
A Little More Work Does</h4>
Here's how to make that split much more efficient - perfectly along buffer boundaries - and get almost all of the benefits of the BDD without having to install it on all your systems.<br />
The "poor man's" BDD has 5 parts:<br />
<ol>
<li>A Derived Column - to create our "split" column (called BufferNumber)</li>
<li>A Script - to fill the "split" column</li>
<li>A Conditional Split - to split the buffers</li>
<li>Space for your parallel parts</li>
<li>A Union All (or series of Merges) - to serialize the flow again</li>
</ol>
It's steps 2 and 3 that need the explanation. Step 1 is pretty self-explanatory, especially if you know my aversion to creating columns in a Script (the interface is so opaque). Step 2 should be a "transform" script, as you'd expect - but we will not be using the pre-supplied ProcessInputRow method. Instead, paste this code inside the ScriptMain class:<br />
<br />
<span style="font-family: "Courier New", Courier, monospace;"><strong>private int _bufferCounter = 0;<br />public override void Input0_ProcessInput(Input0Buffer Buffer)<br />{<br /> while (Buffer.NextRow())<br /> {<br /> Buffer.BufferNumber = this._bufferCounter;<br /> }<br /> this._bufferCounter++;<br />}</strong></span><br />
Step 3's contents depend on how many parallel flows you're making. And, unlike the BDD, we have to coordinate the contents of the Conditional Split with the parallel flows - it doesn't do the split automatically. So, let's say you want three (3) parallel flows. You'll need three outputs from the Conditional Split, which means two expressions (leaving the remainder to flow out the default). The first expression should be "(BufferNumber % 3) == 1". The second should be "(BufferNumber % 3) == 2". Your Conditional Split should now have three outputs for you to hook up to identical copies of a particular data flow you're looking to parallelize. Then use step 5 to join the flows back together.<br />
<h4>
Step 2 Explained</h4>
Why use a script this way? What's that method you pasted in, and why use it instead of ProcessInputRow? Because of how the internals of SSIS, and the script component work. Typically, you're presented with ProcessInputRow - a nice, useful method for doing things with rows one by one. And really, that's pretty much all you are ever concerned with in a script - or any SSIS component - because trying to do things across multiple rows isn't particularly what it's designed for. However, inside the script component, if you look at the generated code behind what you're presented with, you'll see that there's a method called Input0_ProcessInput that calls ProcessInput in a loop, once for each row that it's presented. However, you have to know some SSIS internals to know that Input0_ProcessInput is called once per buffer. Now you know why we override that default implementation with our own. We're labeling each buffer with a number, perfectly identifying which rows belong to which buffers, so that we can route rows in buffer-sized chunks, whatever that size may be.<br />
<h4>
Step 3 Explained</h4>
This is an explainer for those that don't know what modulus is. If you do, never mind. In SSIS' expression language, like many other languages, the "percent" sign means "modulus." This essentially means that you want to divide the two numbers, but instead of getting the quotient, you want the remainder. So for our example, since we wanted three parallel flows, we wanted to know what the remainder of the BufferNumber divided by 3 was. That answer in the expression will cause one, and only one of the outputs to be true. If the BufferNumber is evenly divisible, the modulus will be zero - nothing left over - in which case the rows will go out the default output.<br />
<br />
<br />Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-85084076172334210762013-04-12T04:00:00.000-07:002013-04-12T04:00:20.646-07:00Data Explorer Rocks - Let's Get an Enterprise StoryFirst - it's been a while since I've posted. I have quite a few articles in the pipe, but they're "heavier", so take longer to flesh out and publish. I thought I'd start the ball rolling by commenting on the newly released <a href="http://www.microsoft.com/en-us/download/details.aspx?id=36803" target="_blank">Data Explorer Preview for Excel</a>. If you are an "information worker" that is copying and pasting data into Excel, then cleaning it up, <strong>download and install this now</strong>. It definitely has the potential to save you tons of time redoing that work, or remembering what you did, or where you got the data later. There are plenty of resources to help you get started with Data Explorer, with walkthroughs and samples - Bingle to your heart's content. I'll wait.<br />
<span style="color: #e69138; font-size: large;">The Commentary</span><br />
As most of you know, I'm a DW and SSIS guy, so I'm interested in "enterprise" tools for ETL - things that can be managed, scheduled, scaled, and monitored. So why am I commenting on Data Explorer? First, it is very cool. But my real intention is to question and to elicit some reaction as to the future of Data Explorer. Data Explorer is currently a "self-service" BI tool just like PowerPivot was, but PowerPivot has matured somewhat since v1. PowerPivot workbooks can be transferred (by two methods) from the wild west of self-service to the Victorian rigour of IT governance. <br />
The first method is pretty slick, transparent to business users, and pretty effective. IT simply has to set up their SharePoint farm appropriately with PowerPivot for SharePoint. Business users then simply take advantage of SharePoint's document repositories to collaborate and share. They'll do this of their own accord, because it's much more convenient than email attachments, and because you've berated them often enough about clogging your Exchange server with dozens of copies of a very large workbook distributed to a large number of recipients. What happens in the background is that SharePoint strips out the PowerPivot model from the Excel workbook and loads it into the PowerPivot for Excel shared service. In addition to the sharing benefits (not needing Excel or PowerPivot addin installed), IT immediately gets visibility and manageability of the underlying SSAS tabular model.<br />
The second method is a manual, deliberate process. An IT resource can open the workbook in Visual Studio, extract the PowerPivot tabular model (just like happens in SharePoint), work with it (to extend and/or validate it), then deploy it to an IT-managed SSAS server.<br />
<span style="color: #e69138; font-size: large;">Data Explorer Growth Options</span><br />
What's the story for Data Explorer? Right now, there doesn't appear to be one, but I'm sure hoping (and asking) Microsoft to think about one. The benefits are quite similar to those achieved by the PowerPivot example. If we can "grow up" the Data Explorer transforms, IT can validate them, stabilize them, and manage data lineage concerns.<br />
Similarly to PowerPivot, I can see a couple options here. First, they could simply create a "Data Explorer for SharePoint" complement to the "Data Explorer for Excel" shared service. When you uploaded an Excel workbook that uses Data Explorer, SharePoint would extract the DE part, store it in some kind of new DE repository/server, and data would be funneled through that server when the workbook was refreshed as an Excel Service. Monitoring and connection management are natural SharePoint extensions. Being able to view and/or edit the transform in a "management portal"... less likely - just open the workbook. Share a DE implementation between workbooks, or surface the DE result as a business data connection? I can see the appeal.<br />
But that first option is likely to be unappealing to a good portion of IT shops, for much the same reason that the first PowerPivot option is. It's not using tooling they're already familiar with and skilled up for. It's not "enterprise-grade". And it's baked into SharePoint, which has a particularly bad stink to SQL Server professionals.<br />
What I'd like to see is the ability to extract the Data Explorer bits from a workbook into SSIS. What would this look like? I can see two options - both of which would have to be a ways off in development, simply because Data Explorer isn't done yet.<br />
First, I could see a "translation" layer, where the DE "steps" are converted into native SSIS data flow components. Some of these components may not exist now (depending on what DE steps ultimately get shipped). There would be some development required on the SSIS side to address that.<br />
Second, I could see new DE components created for SSIS - possibly replacing or duplicating current components. That's a bigger job (I think), but may give Microsoft the opportunity to rationalize both tools at the same time. I don't know if they're interested in doing that.<br />
<span style="color: #e69138; font-size: large;">Final Thoughts</span><br />
One thing that all of the above still doesn't solve - and this is a little off-topic - is custom transforms. I don't currently see a method in DE to create your own transform. That's probably a little beyond your average information worker. But it's not beyond "the ecosystem". I can definitely see third parties wanting to create "address parsing" transforms and the like. I think this is a great opportunity to add that kind of curated "store" concept to DE/SSIS, to take over from amateurish efforts like mine.<br />
So when you see a demo of Data Explorer given, raise your hand as a data professional and ask the presenter about the enterprise story. That feedback will make it back to the product team, and make us all happier.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com1tag:blogger.com,1999:blog-5269388141224850580.post-44069824933645193222012-10-22T05:00:00.000-07:002012-10-22T10:23:22.504-07:00Data Flow Optimization - Collapse Sparse Buffers<div class="separator" style="clear: both; text-align: left;">
<a href="https://skydrive.live.com/#cid=0C76C3CBDB305921&id=C76C3CBDB305921%212291" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="248" src="https://vx0hqg.sn2.livefilestore.com/y1pAQv4j_CIXUwyzzJ6Kas-5nZOSHhMQ_XTC36IlISc566tqRNTGkDncgRsxvjLYthAnN4yA2TdfNkQb6rDGOBOarbNVjzgEX8A/Windows%20Memory%20Warning.jpg?psid=1" width="320" /></a>I'll warn you straight off that this is an advanced technique for a fairly small use case - it is NOT a "try this for all data flow performance problems" type of technique. (Much like the typical "adjust your buffer size" useless advice.) I ran into errors and warnings in my data flow about "A buffer failed while allocating X bytes" followed by "The system reports Y percent memory load...". The second error message itself reports very little memory load, and took some digging to get an idea as to what was going on. If you find yourself in this position, AND have a data flow similar to the one I describe, perhaps this can help.</div>
<h2>
<span style="color: #e69138;">How Small A Use Case Is It?</span></h2>
For myself - pretty darn small. I can only assume it's also fairly "edge" in general out there as well. In my specific situation, I'm attempting to load 22 million rows into a fact table. At the point in the data flow where this problem crops up, the row is 88 bytes wide (not counting overhead). I'm running this on my laptop with 8GB of RAM, over 4GB "free". My data flow should only take about 2GB of RAM to process (again - not counting overhead, and being VERY simplistic). I should have enough headroom.<br />
However, I am doing something fairly interesting in the data flow that I've described in a partially-finished blog entry that's been unfinished for months and months. I'm allocating a total quantity over thousands of rows, and want to make sure I have no rounding errors in the allocation. In other words, I'm splitting a total dollar amount amongst a lot of rows according to a proportional formula, and I want to make sure it all adds back up to the original amount.<br />
I'll leave it to that unfinished blog post I'll publish in the future to describe in detail how and why I go about doing it the way I do. But I have to describe it briefly to demonstrate the issue. It involves using a conditional split to funnel off a small percentage of rows (about 3%) from the main flow. The intent is to add/subtract whatever "remainder" or rounding error that occurred in the allocation calculation(s) to be adjusted on those rows. It also involves a Sort, Aggregate, and Merge Join - a lot of blocking and partially blocking components - just the kind of thing that usually clogs up memory - but doesn't in the way you'd think in this case.<br />
<h2>
<span style="color: #e69138;">More Diagnostic Details</span></h2>
Well - the errors you (and I) get seem to indicate that SSIS is failing for no reason. Here are verbatim errors from my output window:<br />
<blockquote class="tr_bq">
Error: 0xC0047012 at Calculate Facts and Record Years: A buffer failed while allocating 79576 bytes.<br />
Error: 0xC0047011 at Calculate Facts and Record Years: The system reports 68 percent memory load. There are 8577282048 bytes of physical memory with 2730938368 bytes free. There are 8796092891136 bytes of virtual memory with 8790742302720 bytes free. The paging file has 9099673600 bytes with 11153408 bytes free.</blockquote>
"68%" usage on an 8GB system, and it's complaining about not having enough memory? Seriously? 2.7GB physical memory "free", and there's still a buffer allocation failure?<br />
I'm definitely no expert on analyzing what's going on, but I did break out the few tools I fumble around with. First, Task Manager. When I watch Task Manager's processes tab during a BIDS execution, I see that DTSDebugHost is (of course) doing all the work. The default columns shown in Task Manager include a "Memory" column - but it never gets very large - not even close to the 2GB I think should see if my pipeline is stuffed with 22 million rows. If you run the second tool, "perfmon /res" (I highly recommend it!) and watch memory usage as the data flow runs, you'll see a TON of "free memory" - even as SSIS (and probably windows itself, with little popup messageboxes like at the top of this article) is telling you that you're running out of memory. A little discomforting, to say the least...<br />
<div style="clear: both; text-align: left;">
But on that same Perfmon screen, look over to the right-hand side at the graphs. One of them shows the "Commit Charge" - and you'll see this one climbing steadily until it hits 100%. Then (sometimes) all hell breaks loose and your flow stops in its tracks with what looks like a bogus memory error.</div>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://skydrive.live.com/#cid=0C76C3CBDB305921&id=C76C3CBDB305921%212290" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="233" src="https://ugidiw.sn2.livefilestore.com/y1pzFG1xDozKva15Bl2_5sDS2oGNU775YCL_SmZrXC3-scSsxrnoGd_coLCwhl6Bj0pphQHjhufiv-QF37vg1AJRmoOy_EJl1Cy/High%20Commit%20Charge%20with%20Free%20Memory.jpg?psid=1" width="400" /></a>The problem with the Task Manager memory number (and the "free" numbers seen in Perfmon) seems to be that this "memory" column isn't exactly the best one to be looking at. It's for the process' "Private Working Set", which (if <a href="http://windows.microsoft.com/en-US/windows-vista/What-do-the-Task-Manager-memory-columns-mean">I understand correctly</a>) is the memory that the process is actively using exclusively. Taking the hint from Perfmon's graphs, the memory that really counts in this case is the "Commit Charge." This appears to be memory that the process has allocated to it, perhaps that it has already used, but hasn't been released (by Windows) back to the big usable pool yet.</div>
If you change the columns you're looking at in Task Manager to add the "Memory (Commit Charge)" one - you'll now see a more accurate picture that matches up with the complaints you're getting from Windows and SSIS about memory pressure. In my flow execution, the Private Working Set doesn't exceed 500MB, but the Commit Charge climbs above 4GB!<br />
<h2>
<span style="color: #e69138;">What I Believe Is Happening</span></h2>
There's got to be a reason for this "excessive" commit charge, while actual memory "usage" is quite low, doesn't there? I can't explain it exactly, but my hunch is that it's due to memory fragmentation. Your package may not be actively holding on to that many bytes in memory, but if those bytes are sparsely used in a larger block of memory, then Windows may not be able to reclaim it for other processes (or itself).<br />
It was that thought of "sparse usage" that made me look more closely at the Conditional Split I was using. For those of you that understand the Data Flow a little deeper than most - and you should if you read my blog - the Conditional Split doesn't make copies of data, or move data into two separate buffer stacks. It handles "splits" much more efficiently than that. Well, I should say efficiently with respect to CPU cycles and memory copying... because in our case, this efficiency is part of the problem. The Conditional Split adds a column to your data flow that only the data flow engine can see. (You can't - don't try.) Simplistically, this column is filled with the name of the output that row is supposed to flow out of. Components attached to Conditional Split outputs get ALL the buffers with ALL the rows that entered the Conditional Split - but they only operate on those rows with the right output indicator in that special hidden column.<br />
The net effect of this behaviour, in combination with how the Sort component works, is an accumulation in memory of a set of very sparse data flow buffers. In my flow, one of my Conditional Split outputs - the one with 97% of the rows - is processed in a part of the flow where they get "consumed" by a Merge Join. The buffers that contain those rows ought to get released from memory, because those rows are no longer used. Except... for the 3% that went out the other output from the Conditional Split.<br />
The 3% flow gets "consumed" by a Sort component. Except that I don't really think they do (right away). I believe that the Sort simply queues up the buffers, waiting for all of them to be given to it. It's doing what it's been designed to do - acting like a fully blocking component that needs to see all the buffers/rows before it emits any. However, the Sort is acting particularly poorly (in this case) because it's keeping the incoming rows inside the incoming buffers. Normally this isn't bad, and would actually be very smart. In this edge case of ours, it would make more sense to copy the incoming rows to another internal memory structure to release the 97% of rows that are "empty" in those buffers. But in normal cases, where you're sorting all (or almost all of) the rows inside an incoming buffer, it would be a waste of CPU and memory to make a copy of the incoming rows... that you're just going to discard anyway once you emit the sorted rows.<br />
Bottom line - the Sort ends up holding on to (almost) EVERY ONE of my sparsely populated buffers. Buffers sized to hold 22 million rows, but only holding 1 million rows marked for its use by the Conditional Split.<br />
<h2>
<span style="color: #e69138;">The Solution - Defragment Your Buffers!</span></h2>
For those of you that scrolled to the bottom to get the TL;DR solution... you shouldn't have, because this probably won't work for you. It'll only work for those people that understand the problem they're having.<br />
That said, it's fairly harmless to try this solution out. But do record baseline performance metrics on what your data flow's memory, CPU, disk, and time usage looks like so that you can determine if there's an improvement or not.<br />
How do you "defragment" those sparsely populated buffers? Use a Union All.<br />
The Union All is another asynchronous component just like the Sort - but it doesn't block like the Sort does. The Sort component won't process <em>any</em> inbound row until it has accumulated all of them. The Union All processes <em>every</em> incoming row as soon as it sees it... it just doesn't emit output buffers until it has full ones (or it's reached the end of the data).<br />
We're going to use that behaviour to our advantage here. If you simply drop a Union All in your data flow between your Conditional Split and Sort, you'll be telling SSIS to behave in a much better manner for your (and my) unusual data flow. The "sparse" buffers that get emitted by the Conditional Split will get "consumed" by the Union All. (In actuality, I believe those rows get marked as "dead" by the data flow engine, and when all rows in a buffer are marked "dead", the engine can dispose of the buffer.) New buffers emitted by the Union All are dense, requiring much less "commit charge" to hold.<br />
It worked wonders for me on this particular data flow. My Commit Charge was only a few hundred KB higher than my Private Working Set for the package execution, and I received no memory warnings or errors on multiple executions. If you know you're manufacturing "sparse" buffers in your Data Flow - a Union All might be able to help you out.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com1tag:blogger.com,1999:blog-5269388141224850580.post-43392621076320976132012-10-15T05:00:00.000-07:002012-10-15T05:00:01.136-07:00SSIS Raw File Viewer (2008/2008R2)Back in 2007, Simon Sabin <a href="http://sqlblogcasts.com/blogs/simons/archive/2007/01/11/SSIS-Rawfile-viewer---now-available.aspx">published a file viewer for SSIS RAW files</a>. Of course, that was for Integration Services 2005. Simon Trigona, one of my colleagues at Moulding and Millwork, recently built up a new 2008/2008R2 RAW file viewer from source that <a href="http://social.msdn.microsoft.com/Forums/pl-PL/sqlintegrationservices/thread/e6288076-a23c-4b86-8836-24955434a577">was published on the MSDN SSIS forums</a>. He's built a <a href="http://rawviewer.codeplex.com/">CodePlex project for it</a>, so you can see the internals and modify it as you see fit.<br />
Of course, I don't know if this will work for SSIS 2012 - my bet is no... I haven't yet had the time to migrate, but hopefully that'll come soon. When it does, we'll be making any necessary adjustments.<br />
Thanks to Simon Sabin and Mitulkumar Brahmbhatt.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com1tag:blogger.com,1999:blog-5269388141224850580.post-67208094339073810132012-09-12T06:00:00.000-07:002012-09-12T06:00:04.204-07:00The Only Thing Constant is ChangeThe season is about to change for all of us, but this past Summer has been quite an eventful one for me. I won't bore you all with the details, but there are a few things I think will be of interest to readers here.<br />
Scott Stauffer, whom I lured to come work with me full-time at Moulding and Millwork couldn't bear my authoritarian rule, and is back on the freelance circuit helping more people than just me. (I jest about the authoritarian comment - I'm sure I'm not that bad to work with, although I <em>am</em> sure our organization was too slow for Scott's faster pace.) Good wishes to Scott - I'll still see him very often, as I'll be one of his clients now. Hey! No Dilbert-esque "rehired at a higher rate as a contractor" comments, please ;)<br />
Due in no small part to Scott's efforts, we now have a true Data Warehouse team and program here at M&M. That's even better than a "project" that implies some kind of time limit, which would have been better than the skunkworks guerilla work that I'd been pursuing before Scott arrived. I must say that the conventional wisdom about getting Data Warehousing going in an organization is completely correct in my case as well - if you don't have an executive sponsor that's fully committed, you're going to have a tough road to hoe.<br />
All of the above activities, plus a LOT more internal projects here at work have cut down significantly on my output here at the blog, as well as at the MSDN forums. I'm lucky I was able to keep up some community work by presenting at DevTeach and New York's SQL Saturday. However, I do believe I've gotten on top of managing this change, so things should get closer to the "old normal." Here's hoping the reduced focus I felt I've placed on the community doesn't hurt my chances of getting renewed as a SQL MVP this October too badly. I really do appreciate the opportunity to communicate directly to the product team, and once I get deep into SSIS 2012 and related products, I'm sure I'll be wasting a bunch of their time!<br />
I still really enjoy writing here, on the forums, and speaking to user groups even though I've been relatively quiet of late. I suppose that if priorities didn't ebb and flow, life would be quite stale. Now that I think I've figured out how to organize myself somewhat, I'll be back at it right here - and hopefully find time to complete version 2 of my BC Ferries app for Windows Phone (and Windows 8) as well.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-63187080257668369852012-07-31T06:53:00.000-07:002012-07-31T06:53:18.662-07:00Reading Excel into SSIS via OpenXML ScriptingReading Excel data isn't easy. It should be, but there's a lot of unfortunate barriers in the way that I've discussed before. They boil down to one main reason: Excel wasn't created for IT, it was created for business users. We're quite obviously an afterthought. That said, we're much more capable (technically) than business users, so even if the product wasn't designed with us in mind, we do have some options to bend it to our will.<br />
<span style="color: #e69138; font-size: large;">OpenXML Revealed</span><br />
One of those options is using the OpenXML API. Excel has supported OpenXML since Excel 2007 changed file formats from xls to xlsx. The extra "x" at the end of the file extension refers to the "XML" in OpenXML. The other Office 2007 and later app file formats are similar - they're all OpenXML too. The big secret about these file formats is that they're mostly not binary - they're plain-text XML... but compressed. To see this in action, take any xlsx (or other Office 2007 or later file), and change the file extension to .zip - then open it. Viola, you can now see a whole folder structure with a bunch of xml files in it. Careful what you touch - they're all inter-related.<br />
<span style="color: #e69138; font-size: large;">What's the API do?</span><br />
How is the OpenXML API different from the "regular" Excel COM API? In one big way: the COM API actually starts an instance of Excel and talks to it to read and write to Excel files. Instead of doing that, the OpenXML API understands the file formats, and allows you to read and write to the file directly, without knowing how to render, calculate, or otherwise "run" an Excel spreadsheet.<br />
Of course, I'm not the first one to discover the OpenXML API... there are (of course) the Office-focused geeks that developed it, use it, live it, and breathe it. However, much of what they seem interested in is creating Excel files from applications, rather than consuming them. Documentation and examples of how to read SpreadsheetML (the spreadsheet "flavour" of OpenXML) is pretty damn poor.<br />
<span style="color: #e69138; font-size: large;">So What's an SSIS Geek To Do?</span><br />
Take matters into their own hands, that's what! I worked on this for a while, because I'm sure I'm not alone in having Excel hold a huge amount of organizational data. This critical information ranges from customer metadata (customer groupings and other attributes) to external economic indicators (housing starts, GDP, ...).<br />
I've worked up a fairly robust script that reads named columns from a named table in an Excel spreadsheet, and maps them to the columns defined in the script editor. I've developed it such that I can copy and paste the vast majority of the script code - everything except the column mapping. Sooner or later I might turn it into a custom component... but maybe you can do that for me (wink, wink).<br />
Here's The Code... With Training!<br />
First, install the OpenXML API. Second, configure your script to output the columns (with data types) you want to read from the Excel table. Third, add a reference to "DocumentFormat.OpenXML" in your Script. Fourth, paste the code (you should probably only paste the contents of the class, not replace the entire class definition - I've included it here so you have context). Last, but not least, modify the first method in the script to set up your column mapping.<br />
If you want some hands-on training, come to SQLSaturday #158 New York. I've got a session where we'll talk about what makes Excel so hard to use, and why using OpenXML is a pretty good solution.<br />
<br />
<code style="font-size: 12px;"><span style="color: black;"><br /></span><span style="color: green;">/* Microsoft SQL Server Integration Services Script Component<br />* Write scripts using Microsoft Visual C# 2008.<br />* ScriptMain is the entry point class of the script.*/<br /></span><span style="color: blue;">using </span><span style="color: black;">System</span><span style="color: grey;">;</span><span style="color: blue;">using </span><span style="color: black;">System.Collections.Generic</span><span style="color: grey;">;</span><span style="color: blue;">using </span><span style="color: black;">System.Data</span><span style="color: grey;">;</span><span style="color: blue;">using </span><span style="color: black;">System.Linq</span><span style="color: grey;">;</span><span style="color: blue;">using </span><span style="color: black;">Microsoft.SqlServer.Dts.Pipeline.Wrapper</span><span style="color: grey;">;</span><span style="color: blue;">using </span><span style="color: black;">Microsoft.SqlServer.Dts.Runtime.Wrapper</span><span style="color: grey;">;</span><span style="color: blue;">using </span><span style="color: black;">DocumentFormat.OpenXml.Packaging</span><span style="color: grey;">;</span><span style="color: blue;">using </span><span style="color: black;">DocumentFormat.OpenXml.Spreadsheet</span><span style="color: grey;">;</span><span style="color: blue;">using </span><span style="color: black;">DocumentFormat.OpenXml</span><span style="color: grey;">;<br /></span><span style="color: black;">[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]</span><span style="color: blue;">public class </span><span style="color: black;">ScriptMain </span><span style="color: grey;">: </span><span style="color: black;">UserComponent<br />{<br /> #region Script Setup<br /> </span><span style="color: green;">// Ensure you've set up output columns and supplied an Excel Connection Manager in the Script Editor UI<br /> /// <summary><br /> /// Set the following variable value to the name of the table in Excel. To find or set the name of the table in Excel,<br /> /// go to the Design ribbon in the Table Tools group. The table name is shown on the left side.<br /> /// </summary><br /> </span><span style="color: blue;">private static readonly string </span><span style="color: black;">ExcelTableName </span><span style="color: blue;">= </span><span style="color: darkred;">"CompanyDefs"</span><span style="color: grey;">;<br /> </span><span style="color: green;">/// <summary><br /> /// Set this variable to true if you want copious reporting done (for debugging)<br /> /// </summary><br /> </span><span style="color: blue;">private static readonly bool </span><span style="color: black;">VerboseLogging </span><span style="color: blue;">= false</span><span style="color: grey;">;<br /> </span><span style="color: green;">/// <summary><br /> /// Fill this method with calls to MapColumn to map Excel column names to SSIS output columns, and provide data types.<br /> /// </summary><br /> </span><span style="color: blue;">private void </span><span style="color: black;">MapColumns</span><span style="color: grey;">()<br /> </span><span style="color: black;">{<br /> </span><span style="color: green;">// sample:<br /> // this.MapColumn("Excel Column Header", "SSIS Column Name", typeof(string));<br /> </span><span style="color: blue;">this</span><span style="color: black;">.MapColumn</span><span style="color: grey;">(</span><span style="color: darkred;">"Sample Excel Column"</span><span style="color: grey;">, </span><span style="color: darkred;">"SSIS Column"</span><span style="color: grey;">, </span><span style="color: blue;">typeof</span><span style="color: grey;">(</span><span style="color: blue;">string</span><span style="color: grey;">), </span><span style="color: blue;">true</span><span style="color: grey;">);</span><span style="color: grey;"> </span><span style="color: black;">}<br /> #endregion<br /><br /> #region Code You Don't Touch<br /> </span><span style="color: green;">// The following code is configured based on the information you supplied in the section above,<br /> // and what columns and connection in the Script Editor UI.<br /> </span><span style="color: blue;">private const string </span><span style="color: black;">SCRIPT_NAME </span><span style="color: blue;">= </span><span style="color: darkred;">"OpenXML API Script Source for SpreadsheetML"</span><span style="color: grey;">;<br /> </span><span style="color: blue;">private const string </span><span style="color: black;">LAST_UPDATED </span><span style="color: blue;">= </span><span style="color: darkred;">"2012-06-06 23:14"</span><span style="color: grey;">;<br /> </span><span style="color: blue;">private static bool </span><span style="color: black;">__script_last_updated_logged </span><span style="color: blue;">= false</span><span style="color: grey;">;<br /> </span><span style="color: blue;">private static </span><span style="color: black;">IDTSComponentMetaData100 __metadata</span><span style="color: grey;">;<br /> </span><span style="color: green;">/// <summary><br /> /// The list of Excel to SSIS column maps<br /> /// </summary><br /> </span><span style="color: blue;">private readonly </span><span style="color: black;">List</span><span style="color: grey;"><</span><span style="color: black;">ColumnMapping</span><span style="color: grey;">> </span><span style="color: black;">_columnMappings </span><span style="color: blue;">= new </span><span style="color: black;">List</span><span style="color: grey;"><</span><span style="color: black;">ColumnMapping</span><span style="color: grey;">>();<br /> </span><span style="color: black;">#region CLASS: ColumnMapping<br /> </span><span style="color: blue;">private class </span><span style="color: black;">ColumnMapping<br /> {<br /> #region Property Setting Delegates<br /> </span><span style="color: blue;">public delegate void </span><span style="color: black;">NullSetter</span><span style="color: grey;">(</span><span style="color: blue;">bool </span><span style="color: black;">isNull</span><span style="color: grey;">);<br /> </span><span style="color: blue;">public delegate void </span><span style="color: black;">StringSetter</span><span style="color: grey;">(</span><span style="color: blue;">string </span><span style="color: black;">value</span><span style="color: grey;">);<br /> </span><span style="color: blue;">public delegate void </span><span style="color: black;">Int32Setter</span><span style="color: grey;">(</span><span style="color: black;">Int32 value</span><span style="color: grey;">);<br /> </span><span style="color: blue;">public delegate void </span><span style="color: black;">DateTimeSetter</span><span style="color: grey;">(</span><span style="color: black;">DateTime value</span><span style="color: grey;">);<br /> </span><span style="color: blue;">public delegate void </span><span style="color: black;">BooleanSetter</span><span style="color: grey;">(</span><span style="color: blue;">bool </span><span style="color: black;">value</span><span style="color: grey;">);<br /> </span><span style="color: green;">//ADD_DATATYPES_HERE<br /> </span><span style="color: black;">#endregion<br /><br /> #region </span><span style="color: blue;">Private </span><span style="color: black;">Variables<br /> </span><span style="color: blue;">private readonly string </span><span style="color: black;">_excelColumnName</span><span style="color: grey;">;<br /> </span><span style="color: blue;">private int </span><span style="color: black;">_excelColumnOffset</span><span style="color: grey;">;<br /> </span><span style="color: blue;">private readonly string </span><span style="color: black;">_ssisColumnName</span><span style="color: grey;">;<br /> </span><span style="color: blue;">private readonly </span><span style="color: black;">System.Type _dataType</span><span style="color: grey;">;<br /> </span><span style="color: blue;">private readonly bool </span><span style="color: black;">_treatBlanksAsNulls</span><span style="color: grey;">;<br /> </span><span style="color: black;">#endregion<br /><br /> </span><span style="color: blue;">public </span><span style="color: black;">NullSetteer SetNull</span><span style="color: grey;">;<br /> </span><span style="color: blue;">public </span><span style="color: black;">StringSetter SetString</span><span style="color: grey;">;<br /> </span><span style="color: blue;">public </span><span style="color: black;">Int32Setter SetInt</span><span style="color: grey;">;<br /> </span><span style="color: blue;">public </span><span style="color: black;">DateTimeSetter SetDateTime</span><span style="color: grey;">;<br /> </span><span style="color: blue;">public </span><span style="color: black;">BooleanSetter SetBoolean</span><span style="color: grey;">;<br /> </span><span style="color: green;">//ADD_DATATYPES_HERE<br /><br /> </span><span style="color: black;">#region Constructor<br /> </span><span style="color: blue;">public </span><span style="color: black;">ColumnMapping</span><span style="color: grey;">(</span><span style="color: blue;">string </span><span style="color: black;">excelColumnName</span><span style="color: grey;">, </span><span style="color: blue;">string </span><span style="color: black;">ssisColumnName</span><span style="color: grey;">, </span><span style="color: black;">System.Type dataType</span><span style="color: grey;">, </span><span style="color: blue;">bool </span><span style="color: black;">treatBlanksAsNulls</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">this</span><span style="color: black;">._excelColumnName </span><span style="color: blue;">= </span><span style="color: black;">excelColumnName</span><span style="color: grey;">;<br /> </span><span style="color: blue;">this</span><span style="color: black;">._excelColumnOffset </span><span style="color: blue;">= </span><span style="color: grey;">-</span><span style="color: black;">1</span><span style="color: grey;">;<br /> </span><span style="color: blue;">this</span><span style="color: black;">._ssisColumnName </span><span style="color: blue;">= </span><span style="color: black;">ssisColumnName</span><span style="color: grey;">;<br /> </span><span style="color: blue;">this</span><span style="color: black;">._dataType </span><span style="color: blue;">= </span><span style="color: black;">dataType</span><span style="color: grey;">;<br /> </span><span style="color: blue;">this</span><span style="color: black;">._treatBlanksAsNulls </span><span style="color: blue;">= </span><span style="color: black;">treatBlanksAsNulls</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> #endregion<br /><br /> #region </span><span style="color: blue;">Public </span><span style="color: black;">Properties<br /> </span><span style="color: blue;">public </span><span style="color: black;">System.Type DataType<br /> {<br /> get { </span><span style="color: blue;">return this</span><span style="color: black;">._dataType</span><span style="color: grey;">; </span><span style="color: black;">}<br /> }<br /><br /> </span><span style="color: blue;">public string </span><span style="color: black;">ExcelColumnName<br /> {<br /> get { </span><span style="color: blue;">return this</span><span style="color: black;">._excelColumnName</span><span style="color: grey;">; </span><span style="color: black;">}<br /> }<br /><br /> </span><span style="color: blue;">public string </span><span style="color: black;">SSISColumnName<br /> {<br /> get { </span><span style="color: blue;">return this</span><span style="color: black;">._ssisColumnName</span><span style="color: grey;">; </span><span style="color: black;">}<br /> }<br /><br /> </span><span style="color: blue;">public int </span><span style="color: black;">ExcelColumnOffset<br /> {<br /> get { </span><span style="color: blue;">return this</span><span style="color: black;">._excelColumnOffset</span><span style="color: grey;">; </span><span style="color: black;">}<br /> set { </span><span style="color: blue;">this</span><span style="color: black;">._excelColumnOffset </span><span style="color: blue;">= </span><span style="color: black;">value</span><span style="color: grey;">; </span><span style="color: black;">}<br /> }<br /><br /> </span><span style="color: blue;">public bool </span><span style="color: black;">ExcelColumnFound<br /> {<br /> get { </span><span style="color: blue;">return </span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">._excelColumnOffset </span><span style="color: grey;">>= </span><span style="color: black;">0</span><span style="color: grey;">); </span><span style="color: black;">}<br /> }<br /><br /> </span><span style="color: blue;">public bool </span><span style="color: black;">TreatBlanksAsNulls<br /> {<br /> get { </span><span style="color: blue;">return this</span><span style="color: black;">._treatBlanksAsNulls</span><span style="color: grey;">; </span><span style="color: black;">}<br /> }<br /> #endregion<br /><br /> #region SSIS Buffer Setter<br /> </span><span style="color: blue;">public void </span><span style="color: black;">SetSSISBuffer</span><span style="color: grey;">(</span><span style="color: blue;">string </span><span style="color: black;">value</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> #region </span><span style="color: blue;">String<br /> if </span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">._dataType </span><span style="color: blue;">== typeof</span><span style="color: grey;">(</span><span style="color: blue;">string</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">this</span><span style="color: black;">.SetString</span><span style="color: grey;">(</span><span style="color: black;">value</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #region </span><span style="color: blue;">catch</span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Error encountered setting SSIS column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._ssisColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' with string value '" </span><span style="color: grey;">+ </span><span style="color: black;">value </span><span style="color: grey;">+ </span><span style="color: darkred;">"': " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Set SSIS column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._ssisColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' with string value '" </span><span style="color: grey;">+ </span><span style="color: black;">value </span><span style="color: grey;">+ </span><span style="color: darkred;">"'"</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> #region </span><span style="color: blue;">Int<br /> else if </span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">._dataType </span><span style="color: blue;">== typeof</span><span style="color: grey;">(</span><span style="color: blue;">int</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">int </span><span style="color: black;">intValue </span><span style="color: blue;">= </span><span style="color: black;">0</span><span style="color: grey;">;<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> intValue </span><span style="color: blue;">= </span><span style="color: black;">Convert.ToInt32</span><span style="color: grey;">(</span><span style="color: black;">value</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #region </span><span style="color: blue;">catch</span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Error encountered converting Excel column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._excelColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' value '" </span><span style="color: grey;">+ </span><span style="color: black;">value </span><span style="color: grey;">+ </span><span style="color: darkred;">"' to integer: " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">this</span><span style="color: black;">.SetInt</span><span style="color: grey;">(</span><span style="color: black;">intValue</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #region </span><span style="color: blue;">catch</span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Error encountered setting SSIS column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._ssisColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' with int value '" </span><span style="color: grey;">+ </span><span style="color: black;">intValue.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">"': " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Set SSIS column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._ssisColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' with int value '" </span><span style="color: grey;">+ </span><span style="color: black;">intValue.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">"'"</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> #region DateTime<br /> </span><span style="color: blue;">else if </span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">._dataType </span><span style="color: blue;">== typeof</span><span style="color: grey;">(</span><span style="color: black;">DateTime</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> DateTime dateValue </span><span style="color: blue;">= new </span><span style="color: black;">DateTime</span><span style="color: grey;">(</span><span style="color: black;">1900</span><span style="color: grey;">, </span><span style="color: black;">1</span><span style="color: grey;">, </span><span style="color: black;">1</span><span style="color: grey;">);<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> dateValue </span><span style="color: blue;">= </span><span style="color: black;">dateValue.AddDays</span><span style="color: grey;">(</span><span style="color: black;">Convert.ToDouble</span><span style="color: grey;">(</span><span style="color: black;">value</span><span style="color: grey;">) - </span><span style="color: black;">2</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #region </span><span style="color: blue;">catch</span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Error encountered converting Excel column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._excelColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' value '" </span><span style="color: grey;">+ </span><span style="color: black;">value </span><span style="color: grey;">+ </span><span style="color: darkred;">"' to DateTime: " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">this</span><span style="color: black;">.SetDateTime</span><span style="color: grey;">(</span><span style="color: black;">dateValue</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #region </span><span style="color: blue;">catch</span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Error encountered setting SSIS column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._ssisColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' with DateTime value '" </span><span style="color: grey;">+ </span><span style="color: black;">dateValue.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">"': " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Set SSIS column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._ssisColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' with DateTime value '" </span><span style="color: grey;">+ </span><span style="color: black;">dateValue.ToString</span><span style="color: grey;">(</span><span style="color: darkred;">"yyyy-MM-dd hh:mm:ss"</span><span style="color: grey;">) + </span><span style="color: darkred;">"'"</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> #region Boolean<br /> </span><span style="color: blue;">else if </span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">._dataType </span><span style="color: blue;">== typeof</span><span style="color: grey;">(</span><span style="color: blue;">bool</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">bool </span><span style="color: black;">boolValue </span><span style="color: blue;">= false</span><span style="color: grey;">;<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">((</span><span style="color: black;">value.ToUpper</span><span style="color: grey;">()</span><span style="color: black;">.Trim</span><span style="color: grey;">() </span><span style="color: blue;">== </span><span style="color: darkred;">"YES"</span><span style="color: grey;">)<br /> || (</span><span style="color: black;">value.ToUpper</span><span style="color: grey;">()</span><span style="color: black;">.Trim</span><span style="color: grey;">() </span><span style="color: blue;">== </span><span style="color: darkred;">"Y"</span><span style="color: grey;">)<br /> || (</span><span style="color: black;">value.ToUpper</span><span style="color: grey;">()</span><span style="color: black;">.Trim</span><span style="color: grey;">() </span><span style="color: blue;">== </span><span style="color: darkred;">"TRUE"</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> boolValue </span><span style="color: blue;">= true</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else if </span><span style="color: grey;">((</span><span style="color: black;">value.ToUpper</span><span style="color: grey;">()</span><span style="color: black;">.Trim</span><span style="color: grey;">() </span><span style="color: blue;">== </span><span style="color: darkred;">"NO"</span><span style="color: grey;">)<br /> || (</span><span style="color: black;">value.ToUpper</span><span style="color: grey;">()</span><span style="color: black;">.Trim</span><span style="color: grey;">() </span><span style="color: blue;">== </span><span style="color: darkred;">"N"</span><span style="color: grey;">)<br /> || (</span><span style="color: black;">value.ToUpper</span><span style="color: grey;">()</span><span style="color: black;">.Trim</span><span style="color: grey;">() </span><span style="color: blue;">== </span><span style="color: darkred;">"FALSE"</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> boolValue </span><span style="color: blue;">= false</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Invalid boolean value in column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._excelColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"': '" </span><span style="color: grey;">+ </span><span style="color: black;">value </span><span style="color: grey;">+ </span><span style="color: darkred;">"'"</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> }<br /> #region </span><span style="color: blue;">catch</span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Error encountered converting Excel column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._excelColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' value '" </span><span style="color: grey;">+ </span><span style="color: black;">value </span><span style="color: grey;">+ </span><span style="color: darkred;">"' to boolean: " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">this</span><span style="color: black;">.SetBoolean</span><span style="color: grey;">(</span><span style="color: black;">boolValue</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #region </span><span style="color: blue;">catch</span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Error encountered setting SSIS column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._ssisColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' with boolean value '" </span><span style="color: grey;">+ </span><span style="color: black;">boolValue.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">"': " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Set SSIS column '" </span><span style="color: grey;">+ </span><span style="color: blue;">this</span><span style="color: black;">._ssisColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' with Boolean value '" </span><span style="color: grey;">+ </span><span style="color: black;">boolValue.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">"'"</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> </span><span style="color: green;">//ADD_DATATYPES_HERE<br /> </span><span style="color: blue;">else<br /> </span><span style="color: black;">{<br /> ReportUnhandledDataTypeError</span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">._dataType</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> }<br /> #endregion<br /> }<br /> #endregion<br /><br /> #region Sets up map from Excel column to an SSIS column<br /> </span><span style="color: blue;">private void </span><span style="color: black;">MapColumn</span><span style="color: grey;">(</span><span style="color: blue;">string </span><span style="color: black;">excelColumnName</span><span style="color: grey;">, </span><span style="color: blue;">string </span><span style="color: black;">ssisColumnName</span><span style="color: grey;">, </span><span style="color: black;">System.Type dataType</span><span style="color: grey;">, </span><span style="color: blue;">bool </span><span style="color: black;">treatBlanksAsNulls</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">string </span><span style="color: black;">methodName </span><span style="color: blue;">= </span><span style="color: darkred;">"set_" </span><span style="color: grey;">+ </span><span style="color: black;">ssisColumnName.Replace</span><span style="color: grey;">(</span><span style="color: darkred;">" "</span><span style="color: grey;">, </span><span style="color: darkred;">""</span><span style="color: grey;">);<br /> </span><span style="color: black;">VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Creating " </span><span style="color: grey;">+ </span><span style="color: black;">dataType.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">" mapping from '" </span><span style="color: grey;">+ </span><span style="color: black;">excelColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' to '" </span><span style="color: grey;">+ </span><span style="color: black;">ssisColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' via " </span><span style="color: grey;">+ </span><span style="color: black;">methodName</span><span style="color: grey;">);<br /> </span><span style="color: black;">ColumnMapping mapping </span><span style="color: blue;">= new </span><span style="color: black;">ColumnMapping</span><span style="color: grey;">(</span><span style="color: black;">excelColumnName</span><span style="color: grey;">, </span><span style="color: black;">ssisColumnName</span><span style="color: grey;">, </span><span style="color: black;">dataType</span><span style="color: grey;">, </span><span style="color: black;">treatBlanksAsNulls</span><span style="color: grey;">);<br /> </span><span style="color: black;">#region Code to create delegates I'd have liked to have inside the ColumnMapping class itself if I could pass Output0Buffer...<br /> mapping.SetNull </span><span style="color: blue;">= </span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.NullSetter</span><span style="color: grey;">)</span><span style="color: blue;">Delegate</span><span style="color: black;">.CreateDelegate</span><span style="color: grey;">(</span><span style="color: blue;">typeof</span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.NullSetter</span><span style="color: grey;">), </span><span style="color: black;">Output0Buffer</span><span style="color: grey;">, </span><span style="color: black;">methodName </span><span style="color: grey;">+ </span><span style="color: darkred;">"_IsNull"</span><span style="color: grey;">);<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">dataType </span><span style="color: blue;">== typeof</span><span style="color: grey;">(</span><span style="color: blue;">string</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> mapping.SetString </span><span style="color: blue;">= </span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.StringSetter</span><span style="color: grey;">)</span><span style="color: blue;">Delegate</span><span style="color: black;">.CreateDelegate</span><span style="color: grey;">(</span><span style="color: blue;">typeof</span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.StringSetter</span><span style="color: grey;">), </span><span style="color: black;">Output0Buffer</span><span style="color: grey;">, </span><span style="color: black;">methodName</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else if </span><span style="color: grey;">(</span><span style="color: black;">dataType </span><span style="color: blue;">== typeof</span><span style="color: grey;">(</span><span style="color: blue;">int</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> mapping.SetInt </span><span style="color: blue;">= </span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.Int32Setter</span><span style="color: grey;">)</span><span style="color: blue;">Delegate</span><span style="color: black;">.CreateDelegate</span><span style="color: grey;">(</span><span style="color: blue;">typeof</span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.Int32Setter</span><span style="color: grey;">), </span><span style="color: black;">Output0Buffer</span><span style="color: grey;">, </span><span style="color: black;">methodName</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else if </span><span style="color: grey;">(</span><span style="color: black;">dataType </span><span style="color: blue;">== typeof</span><span style="color: grey;">(</span><span style="color: black;">DateTime</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> mapping.SetDateTime </span><span style="color: blue;">= </span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.DateTimeSetter</span><span style="color: grey;">)</span><span style="color: blue;">Delegate</span><span style="color: black;">.CreateDelegate</span><span style="color: grey;">(</span><span style="color: blue;">typeof</span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.DateTimeSetter</span><span style="color: grey;">), </span><span style="color: black;">Output0Buffer</span><span style="color: grey;">, </span><span style="color: black;">methodName</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else if </span><span style="color: grey;">(</span><span style="color: black;">dataType </span><span style="color: blue;">== typeof</span><span style="color: grey;">(</span><span style="color: blue;">bool</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> mapping.SetBoolean </span><span style="color: blue;">= </span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.BooleanSetter</span><span style="color: grey;">)</span><span style="color: blue;">Delegate</span><span style="color: black;">.CreateDelegate</span><span style="color: grey;">(</span><span style="color: blue;">typeof</span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping.BooleanSetter</span><span style="color: grey;">), </span><span style="color: black;">Output0Buffer</span><span style="color: grey;">, </span><span style="color: black;">methodName</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> </span><span style="color: green;">//ADD_DATATYPES_HERE<br /> </span><span style="color: blue;">else<br /> </span><span style="color: black;">{<br /> ReportUnhandledDataTypeError</span><span style="color: grey;">(</span><span style="color: black;">dataType</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">this</span><span style="color: black;">._columnMappings.Add</span><span style="color: grey;">(</span><span style="color: black;">mapping</span><span style="color: grey;">);<br /> </span><span style="color: black;">#endregion<br /> }<br /> #endregion<br /><br /> #region CreateNewOutputRows </span><span style="color: grey;">- </span><span style="color: black;">the only method called from SSIS</span><span style="color: grey;">, </span><span style="color: blue;">this is </span><span style="color: black;">the </span><span style="color: darkred;">"entry point"<br /> </span><span style="color: blue;">public override void </span><span style="color: black;">CreateNewOutputRows</span><span style="color: grey;">()<br /> </span><span style="color: black;">{<br /> #region Set up verbose logging<br /> __metadata </span><span style="color: blue;">= </span><span style="color: black;">ComponentMetaData</span><span style="color: grey;">;<br /> </span><span style="color: black;">#endregion<br /><br /> </span><span style="color: green;">// See http://blogs.msdn.com/b/brian_jones/archive/2008/11/10/reading-data-from-spreadsheetml.aspx<br /> // http://openxmldeveloper.org/discussions/formats/f/14/p/5029/157797.aspx<br /> // http://blogs.msdn.com/b/ericwhite/archive/2010/07/21/table-markup-in-open-xml-spreadsheetml.aspx<br /><br /> </span><span style="color: black;">#region Configure Column Mapping<br /> </span><span style="color: blue;">this</span><span style="color: black;">.MapColumns</span><span style="color: grey;">();<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">._columnMappings.Count </span><span style="color: grey;">!= </span><span style="color: black;">ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">string </span><span style="color: black;">message </span><span style="color: blue;">= this</span><span style="color: black;">._columnMappings.Count.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">" column relationships have been set up, but the Script Source has "<br /> </span><span style="color: grey;">+ </span><span style="color: black;">ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">" output columns defined."</span><span style="color: grey;">;<br /> </span><span style="color: black;">ReportError</span><span style="color: grey;">(</span><span style="color: black;">message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: blue;">throw new </span><span style="color: black;">ArgumentException</span><span style="color: grey;">(</span><span style="color: black;">message</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">._columnMappings.Count.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">" column mappings defined."</span><span style="color: grey;">);<br /> </span><span style="color: black;">#endregion<br /><br /> #region Extract Excel file name from connection manager<br /> </span><span style="color: blue;">string </span><span style="color: black;">workbookFileName </span><span style="color: blue;">= null</span><span style="color: grey;">;<br /> </span><span style="color: black;">SpreadsheetDocument document </span><span style="color: blue;">= null</span><span style="color: grey;">;<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Extracting Excel file name from connection manager."</span><span style="color: grey;">);<br /> </span><span style="color: blue;">string </span><span style="color: black;">connectionString </span><span style="color: blue;">= </span><span style="color: black;">ComponentMetaData.RuntimeConnectionCollection[0].ConnectionManager.ConnectionString</span><span style="color: grey;">;<br /> </span><span style="color: blue;">string</span><span style="color: black;">[] connectionStringParts </span><span style="color: blue;">= </span><span style="color: black;">connectionString.Split</span><span style="color: grey;">(</span><span style="color: black;">';');<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: blue;">string </span><span style="color: black;">connectionStringPart </span><span style="color: blue;">in </span><span style="color: black;">connectionStringParts</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">string</span><span style="color: black;">[] pair </span><span style="color: blue;">= </span><span style="color: black;">connectionStringPart.Split</span><span style="color: grey;">(</span><span style="color: black;">'=');<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">pair[0] </span><span style="color: blue;">== </span><span style="color: darkred;">"Data Source"</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> workbookFileName </span><span style="color: blue;">= </span><span style="color: black;">pair[1]</span><span style="color: grey;">;<br /> </span><span style="color: black;">VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"File name of '" </span><span style="color: grey;">+ </span><span style="color: black;">workbookFileName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' identified in connection manager."</span><span style="color: grey;">);<br /> </span><span style="color: blue;">break</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> }<br /> }<br /> #region </span><span style="color: blue;">catch </span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Unable to parse connection string: " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> #endregion<br /> #region Opening Excel file<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">workbookFileName </span><span style="color: grey;">!= </span><span style="color: blue;">null</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Attempting to open Excel file."</span><span style="color: grey;">);<br /> </span><span style="color: black;">document </span><span style="color: blue;">= </span><span style="color: black;">SpreadsheetDocument.Open</span><span style="color: grey;">(</span><span style="color: black;">workbookFileName</span><span style="color: grey;">, </span><span style="color: blue;">false</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #region </span><span style="color: blue;">catch </span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Unable to open '" </span><span style="color: grey;">+ </span><span style="color: black;">workbookFileName </span><span style="color: grey;">+ </span><span style="color: darkred;">"': " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> }<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Excel file opened."</span><span style="color: grey;">);<br /> </span><span style="color: black;">#endregion<br /> </span><span style="color: blue;">try<br /> </span><span style="color: black;">{<br /> WorkbookPart workbook </span><span style="color: blue;">= </span><span style="color: black;">document.WorkbookPart</span><span style="color: grey;">;<br /> </span><span style="color: black;">SharedStringTablePart sharedStringTablePart </span><span style="color: blue;">= </span><span style="color: black;">workbook.SharedStringTablePart</span><span style="color: grey;">;<br /> </span><span style="color: black;">#region Unused code </span><span style="color: blue;">for </span><span style="color: black;">finding ranges<br /> </span><span style="color: green;">//ComponentMetaData.FireInformation(0, "", "Got WorkbookPart", "", 0, ref fireAgain);<br /> //#region Look at Ranges<br /> //bool foundRange = false;<br /> //RangeDef rangeDef = new RangeDef();<br /> //foreach (DefinedName name in workbook.Workbook.GetFirstChild<DefinedNames>())<br /> //{<br /> // ComponentMetaData.FireInformation(0, "", "Looking at defined name '" + name.Name + "'", "", 0, ref fireAgain);<br /> // if (name.Name == this._rangeName)<br /> // {<br /> // ComponentMetaData.FireInformation(0, "", "Saving def", "", 0, ref fireAgain);<br /> // rangeDef.Name = name.Name;<br /> // string reference = name.InnerText;<br /> // ComponentMetaData.FireInformation(0, "", " reference: " + reference, "", 0, ref fireAgain);<br /> // rangeDef.Sheet = reference.Split('!')[0].Trim('\'');<br /> // string[] rangeArray = reference.Split('!')[1].Split('$');<br /> // rangeDef.StartCol = rangeArray[1];<br /> // rangeDef.StartRow = rangeArray[2].TrimEnd(':');<br /> // rangeDef.EndCol = rangeArray[3];<br /> // rangeDef.EndRow = rangeArray[4];<br /> // foundRange = true;<br /> // break;<br /> // }<br /> //}<br /> //ComponentMetaData.FireInformation(0, "", "Done looking for defined names", "", 0, ref fireAgain);<br /> //if (foundRange)<br /> //{<br /> // string rangeID = workbook.Workbook.Descendants<Sheet>().Where(r => r.Name.Equals(rangeDef.Sheet)).First().Id;<br /> // ComponentMetaData.FireInformation(0, "", "Got rangeID " + rangeID, "", 0, ref fireAgain);<br /> // WorksheetPart range = (WorksheetPart)workbook.GetPartById(rangeID);<br /> // ComponentMetaData.FireInformation(0, "", "Got Range", "", 0, ref fireAgain);<br /> //}<br /> //#endregion<br /> </span><span style="color: black;">#endregion<br /> #region Iterate over sheets to find table<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Searching sheets for table '" </span><span style="color: grey;">+ </span><span style="color: black;">ExcelTableName </span><span style="color: grey;">+ </span><span style="color: darkred;">"'."</span><span style="color: grey;">);<br /> </span><span style="color: black;">Table table </span><span style="color: blue;">= null</span><span style="color: grey;">;<br /> </span><span style="color: black;">Worksheet worksheet </span><span style="color: blue;">= null</span><span style="color: grey;">;<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">Sheet sheet </span><span style="color: blue;">in </span><span style="color: black;">workbook.Workbook.Sheets</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Examining sheet '" </span><span style="color: grey;">+ </span><span style="color: black;">sheet.Name </span><span style="color: grey;">+ </span><span style="color: darkred;">"'."</span><span style="color: grey;">);<br /> </span><span style="color: black;">WorksheetPart worksheetPart </span><span style="color: blue;">= </span><span style="color: grey;">(</span><span style="color: black;">WorksheetPart</span><span style="color: grey;">)</span><span style="color: black;">document.WorkbookPart.GetPartById</span><span style="color: grey;">(</span><span style="color: black;">sheet.Id</span><span style="color: grey;">);<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">TableDefinitionPart tableDefinitionPart </span><span style="color: blue;">in </span><span style="color: black;">worksheetPart.TableDefinitionParts</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Sheet contains table '" </span><span style="color: grey;">+ </span><span style="color: black;">tableDefinitionPart.Table.DisplayName </span><span style="color: grey;">+ </span><span style="color: darkred;">"'."</span><span style="color: grey;">);<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">tableDefinitionPart.Table.DisplayName </span><span style="color: blue;">== </span><span style="color: black;">ExcelTableName</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> worksheet </span><span style="color: blue;">= </span><span style="color: black;">worksheetPart.Worksheet</span><span style="color: grey;">;<br /> </span><span style="color: black;">table </span><span style="color: blue;">= </span><span style="color: black;">tableDefinitionPart.Table</span><span style="color: grey;">;<br /> </span><span style="color: black;">VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Sheet and table found."</span><span style="color: grey;">);<br /> </span><span style="color: blue;">break</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> }<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">table </span><span style="color: grey;">!= </span><span style="color: blue;">null</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">break</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> }<br /> #endregion<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">table </span><span style="color: blue;">== null</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Table '" </span><span style="color: grey;">+ </span><span style="color: black;">ExcelTableName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' wasn't found in '" </span><span style="color: grey;">+ </span><span style="color: black;">workbookFileName </span><span style="color: grey;">+ </span><span style="color: darkred;">"'."</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">string </span><span style="color: black;">firstColumnHeader </span><span style="color: blue;">= </span><span style="color: darkred;">""</span><span style="color: grey;">;<br /> </span><span style="color: black;">#region Find Excel Column Offsets<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Collecting column offsets for mapped columns."</span><span style="color: grey;">);<br /> </span><span style="color: blue;">int </span><span style="color: black;">columnIndex </span><span style="color: blue;">= </span><span style="color: black;">1</span><span style="color: grey;">;<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">TableColumn tableColumn </span><span style="color: blue;">in </span><span style="color: black;">table.TableColumns</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">columnIndex </span><span style="color: blue;">== </span><span style="color: black;">1</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> firstColumnHeader </span><span style="color: blue;">= </span><span style="color: black;">tableColumn.Name</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping columnRelationship </span><span style="color: blue;">in this</span><span style="color: black;">._columnMappings</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">tableColumn.Name </span><span style="color: blue;">== </span><span style="color: black;">columnRelationship.ExcelColumnName</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Found Excel column " </span><span style="color: grey;">+ </span><span style="color: black;">tableColumn.Name </span><span style="color: grey;">+ </span><span style="color: darkred;">" at offset " </span><span style="color: grey;">+ </span><span style="color: black;">columnIndex.ToString</span><span style="color: grey;">());<br /> </span><span style="color: black;">columnRelationship.ExcelColumnOffset </span><span style="color: blue;">= </span><span style="color: black;">columnIndex</span><span style="color: grey;">;<br /> </span><span style="color: blue;">break</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> }<br /> columnIndex</span><span style="color: grey;">++;<br /> </span><span style="color: black;">}<br /> #region </span><span style="color: blue;">Throw </span><span style="color: black;">an error </span><span style="color: blue;">if </span><span style="color: black;">not all columns were found<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping columnRelationship </span><span style="color: blue;">in this</span><span style="color: black;">._columnMappings</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(!</span><span style="color: black;">columnRelationship.ExcelColumnFound</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">string </span><span style="color: black;">message </span><span style="color: blue;">= </span><span style="color: darkred;">"Unable to locate column '" </span><span style="color: grey;">+ </span><span style="color: black;">columnRelationship.ExcelColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' in table '" </span><span style="color: grey;">+ </span><span style="color: black;">ExcelTableName </span><span style="color: grey;">+ </span><span style="color: darkred;">"'."</span><span style="color: grey;">;<br /> </span><span style="color: black;">ReportError</span><span style="color: grey;">(</span><span style="color: black;">message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> }<br /> #endregion<br /> #endregion<br /> #region Read spreadsheet data into SSIS output buffer<br /> SheetData sheetData </span><span style="color: blue;">= </span><span style="color: black;">worksheet.GetFirstChild</span><span style="color: grey;"><</span><span style="color: black;">SheetData</span><span style="color: grey;">>();<br /> </span><span style="color: black;">IEnumerable</span><span style="color: grey;"><</span><span style="color: black;">Row</span><span style="color: grey;">> </span><span style="color: black;">rows </span><span style="color: blue;">= </span><span style="color: black;">sheetData.Elements</span><span style="color: grey;"><</span><span style="color: black;">Row</span><span style="color: grey;">>();<br /> </span><span style="color: black;">#region Find First Row<br /> UInt32 firstRow </span><span style="color: blue;">= </span><span style="color: black;">0</span><span style="color: grey;">;<br /> </span><span style="color: black;">VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Finding first row of table."</span><span style="color: grey;">);<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">Row row </span><span style="color: blue;">in </span><span style="color: black;">rows</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">Cell cell </span><span style="color: blue;">in </span><span style="color: black;">row.Elements</span><span style="color: grey;"><</span><span style="color: black;">Cell</span><span style="color: grey;">>())<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">.CellReferenceToCoordinates</span><span style="color: grey;">(</span><span style="color: black;">cell.CellReference</span><span style="color: grey;">)</span><span style="color: black;">[0] </span><span style="color: blue;">== </span><span style="color: black;">1</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">.GetCellValue</span><span style="color: grey;">(</span><span style="color: black;">cell</span><span style="color: grey;">, </span><span style="color: black;">sharedStringTablePart</span><span style="color: grey;">) </span><span style="color: blue;">== </span><span style="color: black;">firstColumnHeader</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> firstRow </span><span style="color: blue;">= </span><span style="color: black;">row.RowIndex </span><span style="color: grey;">+ </span><span style="color: black;">1</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> }<br /> }<br /> }<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"First row of table is on row " </span><span style="color: grey;">+ </span><span style="color: black;">firstRow.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">"."</span><span style="color: grey;">);<br /> </span><span style="color: black;">#endregion<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Preparing to read " </span><span style="color: grey;">+ </span><span style="color: black;">rows.Count</span><span style="color: grey;"><</span><span style="color: black;">Row</span><span style="color: grey;">>()</span><span style="color: black;">.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">" table rows from Excel."</span><span style="color: grey;">);<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">Row row </span><span style="color: blue;">in </span><span style="color: black;">rows</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Reading row " </span><span style="color: grey;">+ </span><span style="color: black;">row.RowIndex.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">"."</span><span style="color: grey;">);<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">row.RowIndex </span><span style="color: grey;">< </span><span style="color: black;">firstRow</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Skipping non-table or header row."</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else<br /> </span><span style="color: black;">{<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Reading data row " </span><span style="color: grey;">+ (</span><span style="color: black;">row.RowIndex </span><span style="color: grey;">- </span><span style="color: black;">1</span><span style="color: grey;">)</span><span style="color: black;">.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">"."</span><span style="color: grey;">);<br /> </span><span style="color: blue;">bool </span><span style="color: black;">rowAdded </span><span style="color: blue;">= false</span><span style="color: grey;">;<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">Cell cell </span><span style="color: blue;">in </span><span style="color: black;">row.Elements</span><span style="color: grey;"><</span><span style="color: black;">Cell</span><span style="color: grey;">>())<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">foreach </span><span style="color: grey;">(</span><span style="color: black;">ColumnMapping columnRelationship </span><span style="color: blue;">in this</span><span style="color: black;">._columnMappings</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">.CellReferenceToCoordinates</span><span style="color: grey;">(</span><span style="color: black;">cell.CellReference</span><span style="color: grey;">)</span><span style="color: black;">[0] </span><span style="color: blue;">== </span><span style="color: black;">columnRelationship.ExcelColumnOffset</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">string </span><span style="color: black;">cellValue </span><span style="color: blue;">= this</span><span style="color: black;">.GetCellValue</span><span style="color: grey;">(</span><span style="color: black;">cell</span><span style="color: grey;">, </span><span style="color: black;">sharedStringTablePart</span><span style="color: grey;">);<br /> </span><span style="color: blue;">if </span><span style="color: grey;">((</span><span style="color: black;">cellValue </span><span style="color: blue;">== null</span><span style="color: grey;">)<br /> || ((</span><span style="color: black;">cellValue </span><span style="color: blue;">== </span><span style="color: darkred;">""</span><span style="color: grey;">) &</span><span style="color: black;">amp</span><span style="color: grey;">;&</span><span style="color: black;">amp</span><span style="color: grey;">; </span><span style="color: black;">columnRelationship.TreatBlanksAsNulls</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> </span><span style="color: green;">// do nothing<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(!</span><span style="color: black;">rowAdded</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> Output0Buffer.AddRow</span><span style="color: grey;">();<br /> </span><span style="color: black;">rowAdded </span><span style="color: blue;">= true</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Excel column '" </span><span style="color: grey;">+ </span><span style="color: black;">columnRelationship.ExcelColumnName </span><span style="color: grey;">+ </span><span style="color: darkred;">"' contains '" </span><span style="color: grey;">+ </span><span style="color: black;">cellValue </span><span style="color: grey;">+ </span><span style="color: darkred;">"'."</span><span style="color: grey;">);<br /> </span><span style="color: black;">columnRelationship.SetSSISBuffer</span><span style="color: grey;">(</span><span style="color: blue;">this</span><span style="color: black;">.GetCellValue</span><span style="color: grey;">(</span><span style="color: black;">cell</span><span style="color: grey;">, </span><span style="color: black;">sharedStringTablePart</span><span style="color: grey;">));<br /> </span><span style="color: black;">}<br /> }<br /> }<br /> }<br /> }<br /> }<br /> #endregion<br /> }<br /> }<br /> #region </span><span style="color: blue;">catch </span><span style="color: black;">...<br /> </span><span style="color: blue;">catch </span><span style="color: grey;">(</span><span style="color: black;">Exception ex</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"Unable to open Excel file using OpenXML API: " </span><span style="color: grey;">+ </span><span style="color: black;">ex.Message</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> VerboseLog</span><span style="color: grey;">(</span><span style="color: darkred;">"Closing Excel file"</span><span style="color: grey;">);<br /> </span><span style="color: black;">document.Close</span><span style="color: grey;">();<br /> </span><span style="color: black;">}<br /> #endregion<br /><br /> #region Helper functions to change Excel ranges to numeric coordinates</span><span style="color: grey;">/</span><span style="color: black;">offsets<br /> </span><span style="color: blue;">private int</span><span style="color: black;">[] RangeReferenceToCoordinates</span><span style="color: grey;">(</span><span style="color: blue;">string </span><span style="color: black;">rangeReference</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">int</span><span style="color: black;">[] coordinates </span><span style="color: blue;">= new int</span><span style="color: black;">[4]</span><span style="color: grey;">;<br /><br /> </span><span style="color: blue;">string</span><span style="color: black;">[] cellReferences </span><span style="color: blue;">= </span><span style="color: black;">rangeReference.Split</span><span style="color: grey;">(</span><span style="color: black;">':');<br /> </span><span style="color: blue;">int</span><span style="color: black;">[] startCellReference </span><span style="color: blue;">= this</span><span style="color: black;">.CellReferenceToCoordinates</span><span style="color: grey;">(</span><span style="color: black;">cellReferences[0]</span><span style="color: grey;">);<br /> </span><span style="color: blue;">int</span><span style="color: black;">[] endCellReference </span><span style="color: blue;">= this</span><span style="color: black;">.CellReferenceToCoordinates</span><span style="color: grey;">(</span><span style="color: black;">cellReferences[1]</span><span style="color: grey;">);<br /><br /> </span><span style="color: black;">coordinates[0] </span><span style="color: blue;">= </span><span style="color: black;">startCellReference[0]</span><span style="color: grey;">;<br /> </span><span style="color: black;">coordinates[1] </span><span style="color: blue;">= </span><span style="color: black;">startCellReference[1]</span><span style="color: grey;">;<br /> </span><span style="color: black;">coordinates[2] </span><span style="color: blue;">= </span><span style="color: black;">endCellReference[0]</span><span style="color: grey;">;<br /> </span><span style="color: black;">coordinates[3] </span><span style="color: blue;">= </span><span style="color: black;">endCellReference[1]</span><span style="color: grey;">;<br /><br /> </span><span style="color: blue;">return </span><span style="color: black;">coordinates</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /><br /> </span><span style="color: blue;">private int</span><span style="color: black;">[] CellReferenceToCoordinates</span><span style="color: grey;">(</span><span style="color: blue;">string </span><span style="color: black;">cellReference</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: green;">//bool fireAgain = true;<br /> </span><span style="color: black;">t[] coordinates </span><span style="color: blue;">= new int</span><span style="color: black;">[2]</span><span style="color: grey;">;<br /><br /> </span><span style="color: green;">//ComponentMetaData.FireInformation(0, "", "CellRef: [" + cellReference + "]", "", 0, ref fireAgain);<br /> </span><span style="color: blue;">int </span><span style="color: black;">index</span><span style="color: grey;">;<br /> </span><span style="color: black;">cellReference </span><span style="color: blue;">= </span><span style="color: black;">cellReference.Replace</span><span style="color: grey;">(</span><span style="color: darkred;">"$"</span><span style="color: grey;">, </span><span style="color: darkred;">""</span><span style="color: grey;">)</span><span style="color: black;">.Trim</span><span style="color: grey;">();<br /> </span><span style="color: black;">#region Collect column letters </span><span style="color: grey;">-> </span><span style="color: black;">column<br /> </span><span style="color: blue;">string </span><span style="color: black;">column </span><span style="color: blue;">= </span><span style="color: darkred;">""</span><span style="color: grey;">;<br /> </span><span style="color: blue;">for </span><span style="color: grey;">(</span><span style="color: black;">index </span><span style="color: blue;">= </span><span style="color: black;">0</span><span style="color: grey;">; </span><span style="color: black;">index </span><span style="color: grey;">< </span><span style="color: black;">cellReference.Length</span><span style="color: grey;">; </span><span style="color: black;">index</span><span style="color: grey;">++)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: darkred;">"ABCDEFGHIJKLMNOPQRSTUVWXYZ"</span><span style="color: black;">.Contains</span><span style="color: grey;">(</span><span style="color: black;">cellReference[index]</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> column </span><span style="color: grey;">+</span><span style="color: blue;">= </span><span style="color: black;">cellReference[index]</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">break</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> }<br /> #endregion<br /> </span><span style="color: green;">//ComponentMetaData.FireInformation(0, "", "column: [" + column + "]", "", 0, ref fireAgain);<br /> </span><span style="color: black;">#region Convert column into number </span><span style="color: grey;">-> </span><span style="color: black;">coordinates[0]<br /> </span><span style="color: blue;">for </span><span style="color: grey;">(</span><span style="color: blue;">int </span><span style="color: black;">power </span><span style="color: blue;">= </span><span style="color: black;">0</span><span style="color: grey;">; </span><span style="color: black;">power </span><span style="color: grey;">< </span><span style="color: black;">index</span><span style="color: grey;">; </span><span style="color: black;">power</span><span style="color: grey;">++)<br /> </span><span style="color: black;">{<br /> coordinates[0] </span><span style="color: grey;">+</span><span style="color: blue;">= </span><span style="color: grey;">(</span><span style="color: darkred;">"ABCDEFGHIJKLMNOPQRSTUVWXYZ"</span><span style="color: black;">.IndexOf</span><span style="color: grey;">(</span><span style="color: black;">column[column.Length - power - 1]</span><span style="color: grey;">) + </span><span style="color: black;">1</span><span style="color: grey;">) * (</span><span style="color: blue;">int</span><span style="color: grey;">)</span><span style="color: black;">Math.Pow</span><span style="color: grey;">(</span><span style="color: black;">26</span><span style="color: grey;">, </span><span style="color: black;">power</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> </span><span style="color: green;">//ComponentMetaData.FireInformation(0, "", "col coords: [" + coordinates[0].ToString() + "]", "", 0, ref fireAgain);<br /> </span><span style="color: black;">#region Convert row into number </span><span style="color: grey;">-> </span><span style="color: black;">coordinates[1]<br /> coordinates[1] </span><span style="color: blue;">= </span><span style="color: black;">Convert.ToInt32</span><span style="color: grey;">(</span><span style="color: black;">cellReference.Substring</span><span style="color: grey;">(</span><span style="color: black;">index</span><span style="color: grey;">));<br /> </span><span style="color: black;">#endregion<br /> </span><span style="color: green;">//ComponentMetaData.FireInformation(0, "", "row coords: [" + coordinates[1].ToString() + "]", "", 0, ref fireAgain);<br /><br /> </span><span style="color: blue;">return </span><span style="color: black;">coordinates</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> #endregion<br /><br /> #region Helper function to read Excel cell values<br /> </span><span style="color: blue;">private string </span><span style="color: black;">GetCellValue</span><span style="color: grey;">(</span><span style="color: black;">Cell cell</span><span style="color: grey;">, </span><span style="color: black;">SharedStringTablePart sharedStringTablePart</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">cell.ChildElements.Count </span><span style="color: blue;">== </span><span style="color: black;">0</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">return null</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">if </span><span style="color: grey;">((</span><span style="color: black;">cell.DataType </span><span style="color: grey;">!= </span><span style="color: blue;">null</span><span style="color: grey;">) &</span><span style="color: black;">amp</span><span style="color: grey;">;&</span><span style="color: black;">amp</span><span style="color: grey;">; (</span><span style="color: black;">cell.DataType </span><span style="color: blue;">== </span><span style="color: black;">CellValues.SharedString</span><span style="color: grey;">))<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">return </span><span style="color: black;">sharedStringTablePart.SharedStringTable.ChildElements[Int32.Parse(cell.CellValue.InnerText)].InnerText</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">else<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">return </span><span style="color: black;">cell.CellValue.InnerText</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> }<br /> #endregion<br /><br /> #region Logging functions<br /> </span><span style="color: blue;">private static void </span><span style="color: black;">VerboseLog</span><span style="color: grey;">(</span><span style="color: blue;">string </span><span style="color: black;">message</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">bool </span><span style="color: black;">pbFireAgain </span><span style="color: blue;">= true</span><span style="color: grey;">;<br /><br /> </span><span style="color: blue;">if </span><span style="color: grey;">(!</span><span style="color: black;">__script_last_updated_logged</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> __metadata.FireInformation</span><span style="color: grey;">(</span><span style="color: black;">0</span><span style="color: grey;">, </span><span style="color: black;">__metadata.Name</span><span style="color: grey;">, </span><span style="color: darkred;">"Excel OpenXML API Source Script (" </span><span style="color: grey;">+ </span><span style="color: black;">LAST_UPDATED </span><span style="color: grey;">+ </span><span style="color: darkred;">") running."</span><span style="color: grey;">, </span><span style="color: darkred;">""</span><span style="color: grey;">, </span><span style="color: black;">0</span><span style="color: grey;">, </span><span style="color: blue;">ref </span><span style="color: black;">pbFireAgain</span><span style="color: grey;">);<br /> </span><span style="color: black;">__script_last_updated_logged </span><span style="color: blue;">= true</span><span style="color: grey;">;<br /> </span><span style="color: black;">}<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">VerboseLogging</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> __metadata.FireInformation</span><span style="color: grey;">(</span><span style="color: black;">0</span><span style="color: grey;">, </span><span style="color: black;">__metadata.Name</span><span style="color: grey;">, </span><span style="color: black;">message</span><span style="color: grey;">, </span><span style="color: darkred;">""</span><span style="color: grey;">, </span><span style="color: black;">0</span><span style="color: grey;">, </span><span style="color: blue;">ref </span><span style="color: black;">pbFireAgain</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> }<br /><br /> </span><span style="color: blue;">private static void </span><span style="color: black;">ReportError</span><span style="color: grey;">(</span><span style="color: blue;">string </span><span style="color: black;">message</span><span style="color: grey;">, </span><span style="color: blue;">bool </span><span style="color: black;">fatal</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">bool </span><span style="color: black;">pbCancel</span><span style="color: grey;">;<br /> </span><span style="color: black;">__metadata.FireError</span><span style="color: grey;">(</span><span style="color: black;">0</span><span style="color: grey;">, </span><span style="color: black;">__metadata.Name</span><span style="color: grey;">, </span><span style="color: black;">message</span><span style="color: grey;">, </span><span style="color: darkred;">""</span><span style="color: grey;">, </span><span style="color: black;">0</span><span style="color: grey;">, </span><span style="color: black;">out pbCancel</span><span style="color: grey;">);<br /> </span><span style="color: blue;">if </span><span style="color: grey;">(</span><span style="color: black;">fatal</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: blue;">throw new </span><span style="color: black;">ApplicationException</span><span style="color: grey;">(</span><span style="color: black;">SCRIPT_NAME </span><span style="color: grey;">+ </span><span style="color: darkred;">" had a fatal error."</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> }<br /><br /> </span><span style="color: blue;">private static void </span><span style="color: black;">ReportUnhandledDataTypeError</span><span style="color: grey;">(</span><span style="color: black;">System.Type dataType</span><span style="color: grey;">)<br /> </span><span style="color: black;">{<br /> </span><span style="color: green;">// Need to add a Data Type? Search for ADD_DATATYPES_HERE<br /> </span><span style="color: black;">ReportError</span><span style="color: grey;">(</span><span style="color: darkred;">"This script can't handle " </span><span style="color: grey;">+ </span><span style="color: black;">dataType.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">" types."</span><span style="color: grey;">, </span><span style="color: blue;">true</span><span style="color: grey;">);<br /> </span><span style="color: blue;">throw new </span><span style="color: black;">ArgumentException</span><span style="color: grey;">(</span><span style="color: darkred;">"This script can't handle " </span><span style="color: grey;">+ </span><span style="color: black;">dataType.ToString</span><span style="color: grey;">() + </span><span style="color: darkred;">" types."</span><span style="color: grey;">);<br /> </span><span style="color: black;">}<br /> #endregion<br /> #endregion<br />}</span></code>Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com11tag:blogger.com,1999:blog-5269388141224850580.post-80996188181189754852012-05-29T06:00:00.000-07:002013-11-15T11:15:11.326-08:00Presenting at Vancouver DevTeach<div class="separator" style="clear: both; text-align: left;">
<a href="https://public.sn2.livefilestore.com/y1pQ5EcEfhd4SZo0eXLe7r9U0ToRESejj0Q0iKtAmSoZi9HKImilfA3a_AlZgzUJumou7zvO-uqH1MenGVfq_EFEA/DevTeach%20Banner.jpg?download&psid=1" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://public.sn2.livefilestore.com/y1pQ5EcEfhd4SZo0eXLe7r9U0ToRESejj0Q0iKtAmSoZi9HKImilfA3a_AlZgzUJumou7zvO-uqH1MenGVfq_EFEA/DevTeach%20Banner.jpg?download&psid=1" /></a>This post should have been made a couple months ago so that my legions of rabid fans could have lined up to... yeah, OK - I'm not fooling anybody, I know. But for those that <strong>are</strong> signed up to DevTeach in Vancouver, you should definitely put my sessions in your schedule. I mean, if you read my blog, I know you're interested in Integration Services, and that's what I'll be talking about.</div>
I'll give you the first day to get warmed up to the conference experience. I'll be hanging around from midday onwards, picking up a few sessions and hanging around in between poking at my computer. <br />
But first thing on the second day, come bright-eyed and bushy-tailed to hear some good solutions for common application pattern problems in SSIS. I mentioned in the abstract that we'll look at processing files in date order, and upsert patterns. I've got five other topics I can cover - flat files with headers and footers, deduplicating data, emailing resultsets, calculating running totals, inserting parent/child rows, and zipping files. The good part is that the audience decides what we spend our time on! If nobody there cares about some of those topics, we don't have to waste time listening to it.<br />
On the third conference day, if you touch Excel spreadsheets with Integration Services, you can't miss my afternoon session. Scratch that - if you use SSIS, you obviously have spreadsheets as a source. Be there. There aren't any foolproof techniques, but there's a great big bag of excellent ones that should get you through the minefield of Excel flexibility and user inconsistency. The big reveal will be a sample of using OpenXML to read data from a spreadsheet. I don't think I'm a fan of OpenXML yet - but it'll be hard to resist if it continues to work this well. (I'll be blogging about that soon.)<br />
DevTeach is a very cozy and informal conference - I have no doubt that those of you who are attending will love it. Don't hesitate to take advantage of that environment - the speakers are all willing to sit down with you about any topic. I know I am, so if you have an interesting SSIS problem, or just want to moan about the weather, post a comment here, or mail me at <myfullnamewithnospaces>@live.ca. Look forward to seeing you all there.<br />
<strong>UPDATE</strong>: Presentation materials <a href="https://skydrive.live.com/redir?resid=C76C3CBDB305921!6131&authkey=!AOmkWXiW8AEzJos">on SkyDrive</a>.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com2tag:blogger.com,1999:blog-5269388141224850580.post-24719296747796940682012-05-22T05:00:00.000-07:002012-05-26T09:46:48.322-07:00Basic SSIS Equivalents to T-SQL's LIKEThere are some things that we do in one SQL Server tool, but can't seem to figure out how to do in another. One of those for you might be the <a href="http://msdn.microsoft.com/en-us/library/ms179859.aspx">LIKE keyword</a> in T-SQL. LIKE is fairly powerful in T-SQL - it does rudimentary regular expressions. How do you do that in SSIS? Well, you can't get too fancy, but here are some basics.<br />
<span style="color: #e69138; font-size: large;">Columns That Contain a Value</span><br />
This applies to cases where your T-SQL query might include a clause like this:<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New", Courier, monospace;">col LIKE '%value%'</span></blockquote>
The SSIS expression equivalent involves the function <a href="http://msdn.microsoft.com/en-us/library/ms141748.aspx">FINDSTRING</a>. FINDSTRING looks in any string expression (including a column) for an occurrence of a particular substring, and which occurrence to look for.<br />
The equivalent expression in SSIS to the example above is:<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New", Courier, monospace;">FINDSTRING(col, "value", 1) > 0</span></blockquote>
<span style="color: #e69138; font-size: large;">Columns That Start With a Value</span><br />
This applies to cases where your T-SQL query might include a clause like this (where X is a number):<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New", Courier, monospace;">col LIKE 'value%'</span></blockquote>
<br />
or<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New", Courier, monospace;">LEFT(col, X) = 'value'</span></blockquote>
This is a fairly rudimentary case - and there are multiple methods for developing an equivalent SSIS expression with FINDSTRING, or <a href="http://msdn.microsoft.com/en-us/library/hh231081.aspx">LEFT</a>, such as:<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New", Courier, monospace;">FINDSTRING(col, "value", 1) == 1</span></blockquote>
or<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New", Courier, monospace;">LEFT(col, X) == "value"</span></blockquote>
<span style="color: #e69138; font-size: large;">Columns That End With a Value</span><br />
This one's a little trickier (in SSIS2008R2 and below), and has prompted a few questions on the <a href="http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/threads">MSDN SSIS forums</a>. The T-SQL case would look something like:<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New", Courier, monospace;">RIGHT(col, X) = 'value'</span></blockquote>
There's a LEFT in SSIS' expression language... but no RIGHT prior to SSIS 2012, so what can you do? How about using REVERSE and LEFT, like this:<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New", Courier, monospace;">REVERSE(LEFT(REVERSE(col), X)) == "value"</span></blockquote>
<span style="color: #e69138; font-size: large;">More Complicated Demands?</span><br />
T-SQL's LIKE can service more complicated pattern searching - look for future posts where I'll try to investigate some. If you have any particular requests, post a comment...Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com5tag:blogger.com,1999:blog-5269388141224850580.post-61740527329241083462012-05-14T05:00:00.000-07:002012-05-14T05:00:13.968-07:00The Effective Permissions Tool is HandyAn recurring issue with Integration Services packages that read or write to file systems is permissions issues. It's quite difficult to have a package run properly when it can't see the source files, or can't write results to a particular network share. But it's insanely hard to get a package working if you don't think permissions are set up properly but your sysadmins do. I'll rehash what many of you already know (the typical problem and typical solution) in order to provide some background, then I'll call out a tool I've used when that just doesn't work... and you need to <em>persuade</em> your sysadmins to look a little harder.<br />
<span style="color: #e69138; font-size: large;">SQL Agent Jobs Fail...</span><br />
These errors typically surface when the job is put into production as a SQL Agent Job Step, where it's executing under an account you didn't develop it with. You'll typically get an error in the job history that looks like this:<br />
<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New", Courier, monospace;">Executed as user: DOMAIN\MACHINE$. Microsoft (R) SQL Server Execute Package Utility Version 10.0.5500.0 for 64-bit Copyright (C) Microsoft Corp 1984-2005. All rights reserved. Started: 3:42:58 PM Error: 2012-05-07 15:42:58.76 Code: 0xC001401E Source: Flat File Read Connection manager "Test" Description: The file name "\\Share\Folder\Testing.txt" specified in the connection was not valid. End Error</span></blockquote>
<br />
<br />
Your first reaction is "I can see that file - of course it's there!"<br />
It's not hard to overlook the fact that the account the job is executing under is the SQL Agent account - not yours, and it needs the same permissions your account did in order to do this job. (Well - hopefully not exactly the same permissions - just those few that are absolutely necessary.)<br />
<span style="color: #e69138; font-size: large;">... Because They Don't Have Permission ...</span><br />
You can see that in the first part of the message - "Executed as user:" (I've replaced real data with fake names to protect the innocent.) The message tells me that the job is executing under a system account (my fictitious computer is named "machine"). The folder "Folder" on the network share "Share" doesn't permit that account access, so the job fails.<br />
You have a few options to deal with this - some are better than others:<br />
<ol>
<li>You can grant access to \\Share\Folder to DOMAIN\MACHINE.</li>
<li>You can create a Proxy to execute the package with your account credentials (since you have permissions).</li>
<li>You can create a new domain account specifically for this purpose, grant it appropriate rights to the network share, create a Proxy for it, then configure the job to use that Proxy.</li>
</ol>
I always choose (3). No, I don't ask our sysadmins to create a new account for every Agent job. I create "service accounts" that I can use to manage distinct "systems." I've got one that handles ETL work for my data warehouse. I've got another that handles some import work for a third-party system. Each of them is used in a few places, but the key is that their permissions are very low (no interactive login, etc) and they only get rights as necessary to complete their tasks.<br />
However you do it, managing proxies usually gets you to your goal... until it doesn't.<br />
<span style="color: #e69138; font-size: large;">... Even After They Should!?</span><br />
I've run into scenarios where I'd swear a certain account ought to have access to a resource, and yet it doesn't seem to. In the case of file systems there's one particular tool that has saved my addled brain a couple times. After all, computers are dumb machines (no offense intended) and only do what we've told them to do. So if they're refusing access... it must be because we've instructed them to do so.<br />
Given that we've narrowed the problem down to our own inability to configure security, we need to confirm that diagnosis, and possibly figure out why. The tool I'm talking about is the Effective Permissions tab of the security settings dialog. To get there, find the file system location or object you're trying to access and open its properties. Switch to the security tab (which you <strong>swear</strong> shows what it should). Hit the Advanced button. This (permissions) view alone may highlight the mistake you've made - you may think certain accounts inherit things in a certain manner, but don't. You may think certain permissions apply further down the folder tree, but don't.<br />
In case you're still not enlightened as to why things aren't working, and need proof one way or the other that permissions really are the issue, click over to the Effective Permissions tab. Type or select the account your proxy is configured for, and you'll see exactly how the operating system resolves the permissions. <br />
If you set up the security settings, I hope this cuts down on the possible problems and highlights just a few. If you've asked someone else to set up permissions, this tool should show you whether the job was done correctly or not.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0tag:blogger.com,1999:blog-5269388141224850580.post-33158656638719114872012-05-08T05:00:00.000-07:002012-05-08T05:00:05.016-07:00How To Use T-SQL TIMESTAMP (ROWVERSION) in SSISIn some database designs, you may encounter SQL Server columns with a TIMESTAMP or ROWVERSION data type. What are these, and how do you deal with them in SQL Server Integration Services?<br />
<span style="color: #e69138; font-size: large;">Time? Version?</span><br />
First, let's talk semantics. There are several unfortunately bad aspects to these names. First, the name "TIMESTAMP" is deprecated - newer versions of SQL Server use the name ROWVERSION instead. I can imagine that there was confusion that the data type somehow represented an actual time value (it does not), or matched the ISO standard definiton of a TIMESTAMP data type (it does not). Unfortunately the ROWVERSION name doesn't describe it that well to me either, as there is no relation to the primary key, or how <em>many</em> updates have been made to a particular row.<br />
A better description would probably be to call the data type SEQUENCE... but <a href="http://msdn.microsoft.com/en-us/library/ff878091.aspx">that's already a keyword </a>used for another construct in SQL Server. A column with a TIMESTAMP or ROWVERSION data type is an auto-incrementing, unique value for the entire database. (Perhaps it's implemented internally as a <a href="http://msdn.microsoft.com/en-us/library/ff878091.aspx">SEQUENCE</a>... ;) ) Another wrinkle is that this column gets updated whenever any data in the row is changed by an UPDATE. So it's almost like an IDENTITY column for inserts and updates.<br />
<span style="color: #e69138; font-size: large;">What's it Used For?</span><br />
As highlighted in the opening of the MSDN page for ROWVERSION, this data type is (allegedly) typically used to version-stamp table rows. In my (limited) experience, it can be equally (mis)used similarly to a GUID column - a unique identifier for a row in a table, or a hash-like "has this been modified" value.<br />
However it's used in the construction of the table, it usually enters into the Integration Services world as a column you have to compare against another table's column - typically to detect changes.<br />
Some database developers may take advantage of the documented (but not guaranteed in my reading) "incrementing" behaviour, and use it to help determine which row is the newest when comparing two rows.<br />
Careful reading of the MSDN page also shows that <strong>duplicate rowversion values</strong> are possible if SELECT INTO statements are used improperly. Something to watch out for there.<br />
<span style="color: #e69138; font-size: large;">How Does it Appear to SSIS?</span><br />
When you extract data from a table that has a ROWVERSION column, SSIS will see it as a DT_BYTES column of 8 bytes. This makes sense, as the MSDN page describes it as semantically equivalent to a <span><span class="input">binary(8) or varbinary(8).</span></span><br />
<span><span class="input">Since it's a byte stream, that greatly limits what you'll be able to do with this data in SSIS - but then, the intent of this data type isn't to contain any more data than some kind of version stamp. Your most likely action inside SSIS is to compare this value with the value from another table to see if they're different...</span></span><br />
<span><span class="input"><span style="color: #e69138; font-size: large;">Performing Comparisons</span></span></span><br />
<span><span class="input">If you try the straightforward comparison in a Conditional Split or Derived Column... you'll get a syntax error. Despite <a href="http://msdn.microsoft.com/en-us/library/ms141714.aspx">claims to the contrary</a>, SSIS can't use any comparison operators on DT_BYTES columns. (Except the ISNULL function.)</span></span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://public.sn2.livefilestore.com/y1poKoV3pVfr3NV-uKZwsJF_KCaFblkFUsOdurhrgiPQuLBByBMxSmGGXoCeFpNIhLg_tzEOYwC_C1Az-y1ns63VQ/DT_BYTES%20comparison%20syntax%20error.jpg?psid=1" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="124" src="https://public.sn2.livefilestore.com/y1poKoV3pVfr3NV-uKZwsJF_KCaFblkFUsOdurhrgiPQuLBByBMxSmGGXoCeFpNIhLg_tzEOYwC_C1Az-y1ns63VQ/DT_BYTES%20comparison%20syntax%20error.jpg?psid=1" width="320" /></a></div>
What to do? Convert the DT_BYTES to a data type that SSIS can compare - DT_WSTR. There's a tricky spot here though - you can't convert the DT_BYTES(8) to a DT_WSTR(8). The conversion code sometimes generates more character bytes - so double the space to a DT_WSTR(16), <span><span class="input">and you should be off to the races.</span></span>Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com6tag:blogger.com,1999:blog-5269388141224850580.post-87657285579852410762012-04-09T05:30:00.000-07:002012-04-13T10:34:57.653-07:00Quick Reference: How Do I Read an Excel XSLM (Macro-Enabled) File in SSIS?Excel 2010 (and presumably the next version of Excel) have taken some security precautions with regards to macros in Excel files. If a spreadsheet has macros in it, you can't save it as an XLSX. Excel forces you to save it with an extension of XLSM, indicating that the file has macros in it. The assumption being, I suppose, to make it a little harder to sneak a malicious macro under a user's nose when they're not expecting one.<br />
<span style="color: #e69138; font-size: large;">The Problem</span><br />
This causes problems with SQL Server Integration Services 2005 and 2008, because the developers didn't take macro-enabled files into consideration when they designed the Excel Connection Manager. If you attempt to use a macro-enabled Excel file name in the Connection Manager, it will report that the path "contains invalid Excel File" (bad English too). It recommends you identify a file with an XLS or XLSX extension.<br />
<span style="color: #e69138; font-size: large;">The Solution</span><br />
Point the Connection Manager to a valid Excel XLSX file - any one will do, and save it.<br />
Edit the Connection Manager properties (press F4) and change the ExcelFilePath property to your actual macro-enabled Excel file. Consider SSIS fooled.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com3tag:blogger.com,1999:blog-5269388141224850580.post-77127456022034314712012-04-02T05:30:00.000-07:002012-04-02T05:30:01.521-07:00Quick Tip: Custom Component PerformanceAs regular readers will know, I've got a few custom components for SQL Server Integration Services under my belt, some of which I've published at <a href="http://www.codeplex.com/site/users/view/toddmcdermid">CodePlex</a>. As I was working with one of the components I haven't published, I got a little frustrated with the length of time it was taking the UI to appear when I'd double-clicked it, as well as the length of time it took to return to the design surface after accepting my changes.<br />
After a little investigation, the excessive time could be traced to a single type of operation that occurs several times in this component inside the Validate and ReinitializeMetaData methods. It's quite a common thing to do - so if you happen to have created any components for SSIS, I suggest you read on .<br />
<span style="color: #e69138; font-size: large;">What's Slow?</span><br />
<div class="separator" style="clear: both; text-align: left;">
<a href="http://www.flickr.com/photos/redbettyblack/11316204" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="176" src="https://public.sn2.livefilestore.com/y1p67ZhMGkn9RL4vqkN0TCmYz6X-Uf5jCC540OLXUOETleWAVDJUIxcD7Q0XeuFGMPQMtur7-Kev_Alkn9vd3fgKQ/20120402%20Roundabout.jpg?psid=1" width="200" /></a>The performance drag? Iterating over IDTSInputColumnCollections and IDTSOutputColumnCollections. Probably a no-brainer for you knowledgeable COM interop experts out there - but iterating over those collections is <strong>slow</strong>.</div>
<span style="color: #e69138; font-size: large;">Where's the Nitrous?</span><br />
To avoid those iterations, I constructed my own "temporary" collection(s) with the same elements to iterate over instead. In some cases I used a Hashtable (keyed by column name or lineage ID), and in some cases a simple List was just fine. I still incurred one performance hit in building my temporary collection - because I had to iterate over the COM collection once. After that, smooth sailing and a huge performance increase.<br />
<span style="color: #e69138; font-size: large;">Warning...</span><br />
Do be a little careful though! Creating a temporary collection to "mirror" the built-in one is perfectly safe when in the Validation context, because nothing is permitted to change. However, you have to watch your own code and the "staleness" of your collection in the ReinitializeMetaData context. Both you and SSIS can be altering the input and output collections at this time. So if you're playing with the UsageType on inputs, or adding or removing columns from asynchronous outputs, you'll have to make similar adjustments to your temporary collections or they'll get out of sync.Todd McDermidhttp://www.blogger.com/profile/10444379572904639267noreply@blogger.com0