What is the DataFlow profiling feature that DTLoggedExec provides to you? It's the ability to know, for each component in the data flow, how many rows has been sent to that component in any of its inputs, to know how many rows has been sent by the component to any of its outputs, and, of course, in how much time.
With all of this data you can then have a precise idea of the number of rows processed and moved from sources to destinations and the performance of your dataflow.
If you have complex packages where you use more than one dataflow or you have a dataflow inside a loop container, don't fear, DTLoggedExec can take care eve of this complex situations.
This is what you can profile:
- how many rows has been sent to input
- how many rows has benn sent to output
- how many rows has benn processed by a component
- how many times a dataflow component has been executed (idea when you have a dataflow inside a loop and you're interested in profiling separately every execution of the dataflow)
All these data are saved into a CSV file that will contain a row for each processed buffer in the dataflow. The CSV has an header that contains data about the package in which the dataflow has been profiled. All the dataflow present in a package will have their profiled data sent to the same CSV file so it will be easier to import it into a DB for further processing.
This is a sample header:
#Package: Test Package 4
#Version: 1.0.19
#ProfileDateTime: 20080807 15:32:37
and these all the columns you have in the produced CSV:
| DataFlowId | The GUID that identifies the DataFlow inside the package |
| DataFlowName | The name of the dataflow component (ex: "Data Flow Task 1") |
| ExecutionNumber | The number of the times the dataflow has been executed |
| EventName | Name of the event that has produced the profiled data (Actually only "OnPipelineRowsSent" is available) |
| EventDate | Date at which the profiled data has been taken |
| EventTime | Time at which the profiled data has been taken |
| SourceId | Id of the source of component's processed data. This is the Output Buffer of a component. |
| SourceName | Name of the source of component's output buffer. |
| ComponentId | Id of the component that processes data |
| ComponentName | Name of the component that processes data |
| DestinationId | Id of the component that will receive processed data. This is the Input Buffer of a component. |
| DestinationName | Name of the input buffer of the component that will received processed data. |
| RowCount | Number of rows processed in this buffer. |
The values are separate by two colons. This is a sample line that show the profiled data:
{3A642E70-91EE-4C6C-B218-5904C0A95533}}::Data Flow Task 1::1::OnPipelineRowsSent::20080807::15:32:39.7812500::57::OLE DB Source Output::16::Script Component::18::Input 0::1365The line allow us to understand that:
In the Dataflow named "Data Flow Task 1" the buffer processed 1365 at 15:32:39 of 2008-08-07. This buffer was used by the transformation component named "Script Component" and receives rows from the previous component's output named "OLED DB Source Output" into its input named "Input 0" for processing.
You can verify this also looking at the DataFlow:
 Data Flow |
Of course the input and the ouput buffers of the shown components are not visibile, since you can see how many input and ouput a component has only by accessing its properties.
DTLoggedExec is also capable of producing detailed information even if the same Dataflow is executed more than once. Let's take a Package where you have a Dataflow put into a For-Each loop:
Thanks to the value present in the "ExecutionNumber" column, you can known to which iteraction that line of profiled data belongs to. A sample of the data profiling output for such package can be found here:
TestPackage4_20080806_113130.dtsDataFlowProfileIf you're now interested in understading how you can enable this dataflow profiling feature go here:
Enabling DataFlow Profilingand if you want to understand how you can consume that data, just go there:
Consuming profiled data