Saturday, May 31, 2008

Checking SSIS Data Flow Metadata

The SSIS Data Flow is heavily dependent on metadata. In order for it to do anything meaningful, it needs to have access to the metadata for all of the sources and destinations with which it interacts - the column names and data types being the most important thing. And because of the strongly-typed nature of the memory buffers that the SSIS Data Flow uses to deliver its incredible runtime performance, it tends to be very, very picky about its metadata.

On the SSIS Forums on MSDN there are often posts with questions or problems related to Data Flow metadata. Some of them are related to data type mismatches - why can't I load an ASCII string into a Unicode string column, or join an ASCII string to a Unicode string in a Lookup transformation? Some of them are related to perceived data type changes - I'm loading in a CHAR field, but the data is being treated as NUMERIC!

In the end, it all comes down to the metadata. But how does an SSIS developer see that metadata? The answer is very easy, but not always obvious. Many people will suggest using a Data Viewer - but this only shows the data, and not the metadata that describes it, and it's not always obvious what is the underlying type of a field from the way that it is displayed. Some data flow transformations will show the metadata of their inputs and outputs, but usually this information is hidden deep in the Advanced Editor and can only be viewed one column at a time, which is something of a pain.

Fortunately, there is a very simple technique that works every time: Edit the data flow path arrows.

Consider this simple data flow:

metadata01

The OLE DB Source component is configured to use this query:

metadata02

But what is the metadata coming out of the source component? If you right-click on the green data flow path arrow and choose Edit[1], on the Metadata tab of the Data Flow Path Editor dialog box you can see the metadata for all columns in that data flow path in one place:

metadata03

Note that in addition to the "obvious" metadata such as the data types and sizes, you can also view information about the sort order for each column and the data flow component that added that column to the data flow.

Now, for a trivial data flow like this one there isn't a lot of use to digging into the metadata like this. But picture a real-world data flow where you have multiple data sources, or where you're using the Derived Column or Data Conversion transformations to change the data types of columns. In that sort of scenario, it's easy to "lose track" of what's going on where. And when you are running into problems where you're saying "I know it shouldn't be doing this - it doesn't make sense! Why is SSIS giving me this error?" then the best thing to do is to look at the metadata that SSIS is using to make its decisions.

This simple technique has saved me many, many hours on SSIS projects over the years. I can't believe it's taken me this long to blog about it...

[1] You can also double-click on the arrow, but if you're like me and get shaky hands after a pot or two of coffee, right-clicking may be more effective.

1 comment:

readerjane said...

Your post just saved my sanity.

Thank you VERY much.