DataFlow profiling

What is the DataFlow profiling feature that DTLoggedExec provides to you? It's the ability to know, for each component in the data flow, how many rows has been sent to that component in any of its inputs, to know how many rows has been sent by the component to any of its outputs, and, of course, in how much time.

With all of this data you can then have a precise idea of the number of rows processed and moved from sources to destinations and the performance of your dataflow.

If you have complex packages where you use more than one dataflow or you have a dataflow inside a loop container, don't fear, DTLoggedExec can take care eve of this complex situations.

This is what you can profile:


All these data are saved into a CSV file that will contain a row for each processed buffer in the dataflow. The CSV has an header that contains data about the package in which the dataflow has been profiled. All the dataflow present in a package will have their profiled data sent to the same CSV file so it will be easier to import it into a DB for further processing.

This is a sample header:

#Package: Test Package 4
#Version: 1.0.19
#ProfileDateTime: 20080807 15:32:37


and these all the columns you have in the produced CSV:

DataFlowIdThe GUID that identifies the DataFlow inside the package
DataFlowNameThe name of the dataflow component (ex: "Data Flow Task 1")
ExecutionNumberThe number of the times the dataflow has been executed
EventNameName of the event that has produced the profiled data (Actually only "OnPipelineRowsSent" is available)
EventDateDate at which the profiled data has been taken
EventTimeTime at which the profiled data has been taken
SourceIdId of the source of component's processed data. This is the Output Buffer of a component.
SourceNameName of the source of component's output buffer.
ComponentIdId of the component that processes data
ComponentNameName of the component that processes data
DestinationIdId of the component that will receive processed data. This is the Input Buffer of a component.
DestinationNameName of the input buffer of the component that will received processed data.
RowCountNumber of rows processed in this buffer.

The values are separate by two colons. This is a sample line that show the profiled data:

{3A642E70-91EE-4C6C-B218-5904C0A95533}}::Data Flow Task 1::1::OnPipelineRowsSent::20080807::15:32:39.7812500::57::OLE DB Source Output::16::Script Component::18::Input 0::1365

The line allow us to understand that:

In the Dataflow named "Data Flow Task 1" the buffer processed 1365 at 15:32:39 of 2008-08-07. This buffer was used by the transformation component named "Script Component" and receives rows from the previous component's output named "OLED DB Source Output" into its input named "Input 0" for processing.

You can verify this also looking at the DataFlow:

Data Flow

Data Flow


Of course the input and the ouput buffers of the shown components are not visibile, since you can see how many input and ouput a component has only by accessing its properties.

DTLoggedExec is also capable of producing detailed information even if the same Dataflow is executed more than once. Let's take a Package where you have a Dataflow put into a For-Each loop:

Image

Thanks to the value present in the "ExecutionNumber" column, you can known to which iteraction that line of profiled data belongs to. A sample of the data profiling output for such package can be found here:

TestPackage4_20080806_113130.dtsDataFlowProfile

If you're now interested in understading how you can enable this dataflow profiling feature go here:

Enabling DataFlow Profiling

and if you want to understand how you can consume that data, just go there:

Consuming profiled data