Pages

Monday, June 6, 2011

How Do You Deal with Excel in SSIS?

This Friday June 10th, I'll be presenting to the Vancouver BI PASS chapter about Excel.  I've got a TON of problems, solutions, resources and tips to compile together - Excel can be a real finicky beast to deal with from an automation standpoint.
Given all the problems with connectivity, data types, formatting, and so on... why don't ETL developers just proclaim that they won't take data in the form of Excel spreadsheets?  Because then we'd have no work to do!  It's pretty uncommon to hear of a data integration project that doesn't involve Excel in part of it.  The source OLTP system probably isn't a set of spreadsheets - but quite a lot of supporting metadata, reference information, or budgets and forecasts come from the ubiquitous Excel.  It's the business' most comfortable vehicle for storing data - I constantly hear it referred to as a "database" (by business folks).  Well, we can't snigger from on high in our ivory tower about how rediculous it is to call Excel spreadsheets databases - we have to deal with this data!
The presentation will be in the Microsoft offices in downtown Vancouver - 1111 West Georgia, 11th floor at 10:30 PDT.  Come early to chat - I'll be there around 10am, depending on how long a conference call goes.  I hope to see you there if you're a local - but if not, you can still tune in virtually.  Scott will have the RSVP and LiveMeeting links up on the Vancouver BI PASS website shortly.
And for those that are really paying attention - if there's a specific issue you're interested in solving between SSIS and Excel, ask away during the presentation - or be nice to me and leave a comment here prior to Friday so I come prepared!
UPDATE 2011-06-22: The recording, samples, and slide deck are now available on SkyDrive.

5 comments:

  1. I'm really looking forward to this talk (virtually). I'm going through the pains of loading from excel and it is heLL. As requested, it would be very very useful if you also could cover:

    - how to stop excel sometimes changes the data type of a column
    - show a real world practical example of using IMEX=1 on the excel connection string
    - Is there an alternative to using a Row-By-Row type operation (e.g. ole db command) to insert the data into a table from excel.

    Much appreciated
    Jag

    ReplyDelete
  2. I watched the live meeting recording and got some useful advice, thanks!

    wrt stopping the problem of excel guessing the data types and not giving the data to the SSIS data flow in the expected data type - The only way I consistently managed to do this is inserting 8 dummy rows in the sheet.

    What's your experience with this approach and how could in SSIS, e.g. via a script task or something.

    Any help is much appreciated
    Jag

    ReplyDelete
  3. One thing I have to test is specifying the TypeGuessRows property in the connection string - I've seen information to suggest it's possible to change this value on a per-connection basis without altering the registry, but I have to try it myself to be sure.
    If that's possible (or even if it's not) you could change the TypeGuessRows to 1 and not have to use eight dummy rows. Don't forget you can also use zero which will cause a full scan of all rows for type determination (but that can still result in unexpected behaviour).
    I don't recommend using Scripts at all due to server instability.
    If you have to take it a step further, you'll need to use the OpenXML API...

    ReplyDelete
  4. Any suggestions how I could insert the dummy row into the sheet in excel from SSIS?

    thanks

    ReplyDelete
  5. You can "generate" a row by using an OLE DB Source with a simple SELECT statement with no "FROM" clause. You could also use a script source.

    ReplyDelete