The Misconception
One of those techniques is getting at SSIS variables from within a Script Task or Script component. You'll find a lot of posts out there about the LockOneForRead, LockOneForWrite, LockForRead, LockForWrite and GetVariables methods. Here's a small sample of posts - but go ahead and Bing/Google yourself:
- SSIS: Reading and Writing to Variables in Script Task - Arie Jones (blog)
- SSIS: Writing to a variable from a script task - Jamie Thomson (blog|twitter)
- SSIS - Variable lists for the script task are case sensitive - Simon Sabin (blog|twitter)
- CODE TUTORIAL: SSIS Packages, Dynamic Parent/Child Variables - Justin Tubbs (blog)
- A deadlock was detected while trying to lock variables in SSIS - Darren Green (blog)
The way those posts are phrased, it sounds like those methods are what you should use to get at SSIS variables when you're using a Script. In my opinion, they're not.
The truth is that 99.9% of the time, getting dirty with those method calls isn't necessary. Using them requires you to understand what those methods do, what sequence they should be called in, and the fact you need to call Unlock after you're done. (Oh, yeah - that wasn't in the list of VariableDispenser methods, was it? Easy to miss...)
The reason those posts are out there is because the experts who've posted them know that most of the time, the simple way works perfectly, but they've run into a situation where it doesn't. They're helpfully posting how to use the advanced technique - since the advanced technique isn't typically documented well. But by doing so - and being very enthusiastic about the advanced technique, they (we) tend to lead others astray. Additionally, they're confident in their abilities to use the advanced technique correctly every time.
The Simple Method
SSIS provides a really simple way to get at variables, and it doesn't take code. See these two properties on the Script Task?
Accessing Variables in the Script Task
Type the fully qualified variable name - or select it off the list exposed by the ellipsis (...) button - into either the ReadOnlyVariables property or ReadWriteVariables property. Separate multiple variables using a comma (,).
Inside the code, those variables are now available in the Dts.Variables collection. To read a variable called "StringVar", do the following:// C# code to read an SSIS variable
string scriptVariable;
scriptVariable = (string)Dts.Variables["User::StringVar"].Value;
' VB code to read an SSIS variable
Dim scriptVariable As String
scriptVariable = Dts.Variables("User::StringVar").ValueIf you have a variable you've placed in the ReadWriteVariables property, you can do the above to read the value out of it so you can use it in your script. You can also push values into the SSIS variable, like this:
// C# code to write to an SSIS variable
Dts.Variables["User::StringVar"].Value = "New Value";
' VB code to write to an SSIS variable
Dts.Variables("User::StringVar").Value = "New Value"Accessing Variables in the Script Component
Identically to the Script Task, type the fully qualified variable name - or select it off the list exposed by the ellipsis (...) button - into either the ReadOnlyVariables property or ReadWriteVariables property. Separate multiple variables using a comma (,).
The difference here is that accessing the variables is much more user-friendly... but where you're allowed to access them is not and I'll explain that next. So first, to read SSIS variables in a script component, use code like this:
// C# code for reading an SSIS variable
string scriptVariable;
scriptVariable = Variables.StringVar;
' VB code for reading an SSIS variable
Dim scriptVariable As String
scriptVariable = Variables.StringVarTo write to a variable inside a Script component, do this:
// C# code for writing to an SSIS variable
Variables.StringVar = "New Value";
' VB code for writing to an SSIS variable
Variables.StringVar = "New Value"Now for the "tricky" part in Script components... you can't use the above code just anywhere (nor should you). The ReadOnly variables can be read from anywhere in your script - the PreExecute method is the most common place to read them, but you can also get at them in ProcessInputRow and PostExecute. The ReadWrite variables can only be used in the PostExecute method - your script will report an error at runtime of "The collection of variables locked for read and write access is not available outside of PostExecute." You can't even read them in the PreExecute method - all access to ReadWrite variables is limited to PostExecute. If you need to read a variable's value in the script, do the work, then update the value, you'll need to use two SSIS variables to get that job done.
The Rationale for the Advanced Method
You've heard about the "simple" technique, and yet there are these fancy methods that other articles and posts may have said "work better." It's absolutely true - but for a very small fraction of scenarios. Here they are:
Tighter Control Over Locking (Concurrency)
When you place variables in the ReadOnlyVariables and ReadWriteVariables properties, SSIS writes code for you that you never see. This code calls all those Lock* methods in the proper sequence and makes sure to Unlock them after you're done. The code to lock the variables is run in the PreExecute phase of the task, and the unlocking is done in the PostExecute. This means that for a long-running task (either a Script Task or a Data Flow Task containing a script) your variables are locked for a long time. Is this bad? Only if you have other tasks running at the same time that want to read or update those same variables. They simply won't be able to do so, and SSIS will report errors.
Using the Lock* and Unlock methods allows you to reduce that window to only the portion of time that you need it.
This is usually a horrible rationale. The fact that SSIS is reporting concurrent access problems with a resource should be drawing your attention to the fact that your package is improperly architected. You have resources that two tasks are using. One of those tasks will depend on settings made by the other task... but you want to run them in parallel? Resolve the contention issue by serializing the tasks. If you code the variable locking to a smaller area of code, all you'll be doing is reducing the risk of having a lock contention issue - you will not be eliminating it. Your package will eventually fail in production - and you probably won't be able to replicate the failure to fix it. If your intent is to have two script components in one data flow use the same variable - that's just bad architecture on your part. There's absolutely no guarantee that one of those scripts will finish before the other starts - even if the data flow looks like it should happen that way. Use the advanced technique in these situations at your own risk!
Reading From ReadWriteVariables In PreExecute
As you now know, placing a variable in a Script component's ReadWriteVariables property means you can only read and write to it in the PostExecute method. Sometimes you may have a script process - such as a row numbering operation - that may need to read an "initial" value from a variable, run through the rows, then update the variable with the "final" value for later use elsewhere in your package. In order to do this with the "simple" technique, you need to use two variables.
Using the advanced technique, you only need one. In the PreExecute method, call LockForRead and GetVariables, or call LockOneForRead. In the PostExecute method, call LockForWrite and GetVariables, or LockOneForWrite. Don't forget to Unlock the variable as well.
This rationale is understandable. The alternative is maintaining two variables in SSIS in order to accomplish a simple task. The downside is that you now have to understand what those methods do, and your use of the variables is not exposed to anyone who might support your package unless they read your script line by line. Using the ReadWriteVariables property exposes your use of those variables very nicely to everyone else. Using the advanced technique here makes for a cleaner set of variables in the package, so I can't say it's a horrible use case.
Best Practices for Reading and Writing Variables in Scripts
Given all those facilities and restrictions on using variables, and the freedom the advanced technique gives you - what should you do? Here are my best practices for using variables. They won't apply to every situation you run into, but you should try to adhere to them for the reasons I mention with them.
- Use the ReadOnlyVariables and ReadWriteVariables properties instead of the Lock* methods.
* Advertises to other package designers which variables you're using in your Scripts.* Guarantees that the variable locking is handled properly. - In Script components, read variables in PreExecute - not in ProcessInputRow.
* Reading variables crosses the COM/.Net barrier and that's expensive. If you need the value in the variable when you're processing rows, read it in the PreExecute method into a module-level variable, and use that. - In Script Tasks, read or write to variables once and only once.
* The same reasoning as #2 - crossing that COM barrier is expensive. Pull the value out and store it locally for use. When writing, don't use the SSIS variable as an intermediate storage location - use a local variable and only assign the final value once. - In Script components, if you need to read a variable, process data, then write back to the same variable... don't. Use two variables instead.
* See the reasons for #1 - they apply here just as well.* Your SSIS variables will be labeled appropriately for "before" and "after" states.
I hope you did - I know it took me quite a while to figure all that stuff out. Some of the blame lies with guys like me - we tend to broadcast the "neat" and "difficult" solutions, and forget about the simple ones that took a long time to learn.



Very nice Todd. Not over complicated and clearly explained some basic but important design techniques and things to keep in mind when desigining script tasks.
ReplyDeleteWhy did I run into this only today? Thank you Todd nice article
ReplyDeleteThe debate is raised again
ReplyDeletehttp://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/4b2b7eea-f8dd-490e-a162-309d4ac01585?prof=required
:)
Todd, I really appreciate all that you are saying. I think I have a weird situation where a previous programmer put two script tasks in a row. The first script pulls the lastkey.value in the PreExecute and then using the process rows goes through the number of records creating a collection of say 100 new key values.
ReplyDeleteThen in the next script component, he assigned to a variable in the process rows the new key values and then assigned each of them to the lastKey.value so the last one would be the last value. Seems redundant to use two components and if I stop using the processInputRow how will it go through each value?
collection of say 100 records that has to keep track of the last key value and increment it by one.
That's interesting, BJ.
ReplyDeleteYou could use two separate script components (not tasks - be precise!) or just one in your scenario. Keep in mind that if you use two, BOTH PreExecutes get run before either's ProcessInputRow, and BOTH PostExecutes get run after both ProcessInputRows.
It is perfectly legal to read an SSIS variable in script component #1's PreExecute, use it as a seed value to write to a column in script #1's ProcessInputRow, then read values from that same column in script #2's ProcessInputRow, then write the last value to the same SSIS variable in script #2's PostExecute... but ONLY if you don't use ReadOnlyVariables (in script #1) or ReadWriteVariables (in script #2). You have to use the "code" method to do this, as you'd get a locking problem if you used the properties.
However, if you use only one script component, you can simplify a lot. Put the variable in ReadWriteVariables. Read it in PreExecute, storing the value in a code variable. Use the code variable in ProcessInputRows. Write the code variable back to the SSIS variable in PostExecute.
Hope that gives you a usable blueprint.
Interesting read - helped a lot as we are going through a refactoring effort as we migrate older packages to 2008 and deciding on some "best practices". The article helped with variables that are used in normal workflow script tasks, but I do get the "variable locked" deadlock issue when trying to read the variables in an event handler. We have one older package that looks for various errors in the Error event handler to decide how to handle the error (who to email). You mentioned how to handle this in the "PreExecute" method ... only I'm not a coder so don't really know where to go about finding or creating (or modifying) a PreExecute method. Can you shed any light here? I did read your article on data flow performance where you do define a PreExecute method however this didn't work so well for me (kept running into errors (didn't like the "public override void PreExecute()" syntax as nothing to override?).
ReplyDeleteDo you mind positing a really brief example that reads the value of a variable?
As I mentioned I can read the value in a normal script task but not with an Error Event Handler.
Thanks!
alex
You should read through this post on the one exception to my "rule" I think has merit. Darren Green has code samples there that should address your problem!
ReplyDeleteThanks Todd, worked perfectly - manually unlocking and locking the variable was required because a failure during the script task left the variable locked while the "OnError" event was triggered. This makes sense now I have read more about the variable locking scenarios within SSIS.
ReplyDeleteLinked photos are dead :(
ReplyDeleteLooks like they're still working to me... but they don't show too much that's interesting anyway :)
ReplyDeleteTodd,
ReplyDeleteGreat post. Just wondering if I'll run into the locking issue on the following scenario. I have a flat file to import, that has variable record lengths, so I use a conditional split to gather the various record types into separate paths and ultimately, different OLE DB destinations (SQL tables). Part of the process is to take a certain record type that contains a batch identifier, and write that value to a package variable, and also use that variable as part of a derived column transformation for one of the other record types. Because I've set the properties for the data flow task to use multiple threads, might I run across this due to parallelism, or is the use in a different path going to prevent the locking?
If I understand correctly, your current solution involves just one Data Flow, where a value is read from the flow from one "row type" (R1), and written to other row types (R2, R3) in order to associate "batches" of rows of different row types that need to know they are related to each other.
ReplyDeleteIf that's the case, and you're reading and writing using SSIS variables, then yes - you'll have this blocking issue.
In general, there is usually a better way to handle this without using variables at all. I'm going to make assumptions that may not apply in your situation! I'm going to assume the rows in a "batch" appear sequentially within your file, like: R1, R2, R3, R4, R5, R1, R2, R3, R4, R5. (Where there are five row types, and two "batches" in that example.) I'm also going to assume the batch identifier is in R1. To tie the batches together, start with the concept of a row numbering script. But don't start at zero - read the "number" (could be a alpha code) from the R1 row itself. When you see a non-R1 row, write the batch number to the row - but don't "increment" the number like you would in a row numbering script. When you see an R1 type row, read in the batch number. Your situation is probably more complicated - post at the MSDN forums to explain how...
Yes, I have a header record for a given group of records, for which a "bacth" identifier is present in the header, but not in the individual records. There can be multiple header records in the flat file, but probably not more than 10 and most likely, somewhere in the 5 to 6 range, out of millions of rows of data. I'm not sure I understand what you mean with regard to how to handle this. Right now, the script component that updates the variable is triggered by every occurrence of the header record, so the variable is ONLY updated when a header record appears, and at no other time. The script component appears as the final destination for a Derived Column Transformation (aka DCT) that receives the header records coming out of the Conditional Split. Only one other record type needs to have the header records batch identifier associated with it, and that record type is handled by a separate destination of the Conditional Split. Initially, I read your reply's suggested method as being what I'm already doing, but somehow, I suspect the written word might be getting in the way... or my brain just isn't interpreting what you typed in the right way. When I "write the batch number to the row", I do so for the record type that needs it (let's use R5 as in your example) by adding another column to an existing DCT for that record type, where it simply type casts the package variable's value appropriately. Have I done a better job of explaining what I'm doing?
ReplyDeleteOK - you've explained more, and I think it confirms that your scenario is in line with what I think. My suggestions:
ReplyDelete1. Get rid of the SSIS variable entirely. It's not necessary. Remove your current scripts.
2. Early in your flow, use a DCT to add a "batch" column with a NULL value (with the appropriate data type).
3. Add a single script after the DCT, mark the "batch" column as ReadWrite, and the column that may contain the batch number as ReadOnly. Add a module-level variable to store the mostRecentBatchSeen - default it to a non-null value in PreExecute. In Input0_ProcessInputRow, decode your row to determine if it holds the batch - if it does, store it in mostRecentBatchSeen. Next, still in Input0_ProcessInputRow, assign mostRecentBatchSeen to Row.batch.
I think the rest will be easy to do after you do that.
This sounds like a great idea. I already have a script task in place that derives the row number from the flat file (file itself does NOT contain it, so a class-level variable is incremented during Input0_ProcessInputRow, immediately after it's value is assigned to an Output Column created for that purpose, and that variable is initialized to 1 at declaration. I shouldn't even need the DCT to create the extra column, as the Script Task can do that for me. I'm looking forward to testing this, as I already have the row number part working.
ReplyDeleteSSIS is a somewhat recently acquired skillset, and while I've now worked with it for about a year, I just haven't had a lot of exposure to the various ways to accomplish things, and this idea is definitely helping me start to think outside the typical box that SSIS often wants to paint you in a corner of. I'll have to let you know the testing results, but that may not happen until next week, as I might not actually have the header records handy until then, and it could be later than that, but I will definitely post the results when I have them. Thanks again!
Turns out it works perfectly. Many thanks!
ReplyDeleteI appreciate the time and effort spent in explaining the various advantages and disadvantages of variable usage between using the component's properties and scripting. I've used the scripting method since the initial releases of SSIS, and never ran into any of the issues eloquently delineated above. It's good to have such a reference to improve coding practices. Kudos to you Todd.
ReplyDeleteRan into this one again today in SQL/SSIS/VS2019. Wanted to use a variable to begin numbering rows, number the rows, then update the variable with the last number used. I even tried to use two script components to do it the "easy" way - reading the variable and writing to rows in the first one, then reading the rows and writing to the variable with the second one. Unfortunately, the first script locks the variable for the whole flow - and SSIS hangs (with no messaging!) on PostExecute as the second script waits for the variable to become available to write to. Had to use LockOneForRead/LockOneForWrite - but only needed one script component.
ReplyDelete