(Again, this a quick reference post for me, but I'm sure this information will be very valuable for you out there as well.)Is Office Required to Read or Write to Excel or Access?
With Office 2007, you may receive a message such as "The 'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine."
No - a full Office install or installation of the specific product is NOT necessary. You do need the Data Providers though. For Office 2007, those are here: 2007 Office System Driver: Data Connectivity Components. If you're using Office 2010 (beta), the providers are here: 2010 Office System Driver Beta: Data Connectivity Components. (I will attempt to revisit and relink when the RTM components are released.)
It Runs Fine in BIDS, But Not On My Server!
This might be a manifestation of the prior issue - not many network admins like Office installed on servers (I know mine don't) - but it could be that your servers are 64-bit. If they are, check out this post for the answers: Quick Reference: SSIS in 32- and 64-bits.
It Worked With My XLS When Testing, but Fails in Production!
If it's not one of the above, you might be getting "metadata errors" in validation, or NULLs showing up in columns in your Data Flow where you see perfectly good data in the source spreadsheet.
The likely cause of this kind of problem is with the Excel data provider that actually reads the data out of the Excel spreadsheet and gives it to SSIS. The problem originates with the simple fact that Excel isn't a database, and doesn't require that all the values in a particular column meet "rules" for a specific data type. You can put text, numbers, dates - whatever you want - anywhere you want in an Excel spreadsheet. Very powerful for doing what spreadsheets are good at - but reason #1 that any DBA will tell you not to to use Excel like a database. Since there are no data type "rules" for Excel to follow, it guesses. You heard right - it completely ignores any formatting you have set on the cells, and by default only takes a look at the first eight rows to guess at what kind of data to pass on to Integration Services. If the first eight rows make it think the column contains numbers, then any string value it runs into magically gets imported as NULL.
So how do you get around it? It's unfortunately fairly complicated - but there is one spectacular resource out there to help: Douglas Laudenschlager's blog posts tagged "Excel". Pure gold for this and any other Excel issue you happen to run into.
How Do I Read A Section Of A Spreadsheet?
As in "how do I read columns 5 to 10?" Use a SELECT statement, using the standard Excel column notation. Such as "SELECT F, G, H, I, J FROM SheetName"
But what about "how do I read columns 5 through 10, between rows 4 and 1500?" For that, you need Excel's named ranges to help out. Define a named range in Excel, then use a SELECT statement FROM that named range. (I suggest NOT calling your named range the same name as a worksheet.)
Can I Read Formulas Out Of A Spreadsheet?
Simple! No. The Excel provider only provides data, not how the Excel spreadsheet arrived at it. However, you may use the Office object model to manipulate a spreadsheet and extract the formulas that way. But I don't recommend it! See the "macros" issue below.
Can I Write Formulas Into A Spreadsheet?
Trivial answer! No. See above.
How Do I Run Macros On An Office Document?
Not a good idea... But you can (but shouldn't) use a Script Task to use the Office COM API. You'll probably regret it.
Why am I so down on this idea? Because Excel, Word, Access, and all other Office apps were designed to be run by an end-user, not the server. "Oh, get off your high horse - client-side apps won't bring down your server." No, they won't - they just won't work reliably for you. I'm not denigrating the quality of Office applications - I'm talking actual architecture here. Office applications depend on having access to a user's registry for mundane little things like retrieving the user's initials. Yes - you remember that, right? The first thing any Office app asks you after installation. Guess what happens when a headless server process like SQL Agent starts up an Integration Services package? No user registry is loaded - and no, it doesn't help to use an actual user account as a proxy - the user registry still isn't loaded. So what does Excel do? Ask for the user's initials! The net effect is that this hangs your package waiting for user input that will never come.
Review Considerations for server-side Automation of Office for specific details and unsavory options.
Bottom line - if you can accomplish the task you're asking about any other way - do it the other way. Even closing Excel reliably using the API with an interactive user session is a problem. (Grist for another blog post on how I've gotten as close as humanly possible.)
Where Do I Get The Office COM API?
If you absolutely must, and are willing to accept the risks stated above... You should use the Office Primary Interop Assemblies to do this. If you do, you won't have to install a full Office on your server.
For Office XP: Office XP PIAs
For Office 2003: Office 2003 Update: Redistributable Primary Interop Assemblies
For Office 2007: 2007 Microsoft Office System Update: Redistributable Primary Interop Assemblies
Info on how to install and use them: Installing and Using the Office 2003 Primary Interop Assemblies
Links Restated
Douglas Laudenschlager's Excel Posts
Considerations for server-side Automation of Office
Using Excel 2007 XSLX Files as Destinations
Thank you for the detailed guide!
ReplyDeleteI want to comment that there is already a third-party commercial solution from CozyRoc, which solves many of the described issues. Here is point by point review related to the guide:
1. No Office or any other third-party data components are required to work with Excel workbooks other than the CozyRoc's SSIS+ installation. This makes the deployment hassles less painful.
2. Supports both 32bit and 64bit execution.
3. Doesn't try to guess the format of the cells like the default Excel components. You have a choice between reading the formatted cell value or the raw cell value.
4. You can specify start/end column and start/end row.
5. The CozyRoc Excel Source component allows reading of formulas from Excel worksheet.
6. The CozyRoc Excel Destination allows writing of formulas to Excel worksheet.
Hi Todd,
ReplyDeletei just love the way you write. even in the forum ur answers are to the point. your blogs are exhaustive and well researched. Nice to read your blogs. An yes this one did help me I have started working on Win7 64 bit so needed this. Thanks
Thank you Todd. This is brilliant. It gives better understanding how the Office drivers will work with SSIS. You saved my time. Thanks agian
ReplyDelete