Welcome Guest, you are in: Login

DTLoggedExec - Instrumenting SSIS Packages

RSS RSS

Navigation





Search the wiki
»

PoweredBy

Consuming profiled data

RSS
Modified on 2009/01/02 16:54 by dmauri Categorized as Database
Once you have a ".dtsDataFlowProfile" file you'll need to import it into a SQL Server database so that you can analyze DataFlow profiled data with the aid of T-SQL or Excel or even Analysis Services.

Create the DTLoggedExecDB database

In the zipped package you can find a dtloggedexec_db folder under DataFlowProfile. Here you have all the scripts you need to setup a database that can hold your profiled data.

The first script to execute is the one which creates the DTLoggedExecDB database. The script is create-database.sql. It uses some SQLCMD functionalities to make the change of the script easier, so please be aware to activate it if you're running it from SQL Server Management Studio.

If you're using SQL Server 2008 to store DTLoggedExecDB database, you may want to enable compression in order to optimize performance and space used. You can run the script create-database-for-sql2008.sql after having created the database to do this.

In the folder dtloggedexec_db folder you'll also find the import-profile-data.sql. This file is used to make import of .dtsDataFlowProfile data easier and should no be used manully unless you really know what you're doing.

Import .dtsDataFlowProfile data

Importing profiled dataflow data is very easy. All you have to do take you .dtsDataFlowProfile data and drop it on the import_dataflowprofile.bat file you can find in dtloggedexec_db folder. If you don't like drag'n'drop or prefer to use console command, you just have to pass the file contining profiled data to the import_dataflowprofile.bat batch:

import_dataflowprofile.bat TestPackage6_20080807_155635.dtsDataFlowProfile

The import phase will extract header and detail informations from profiled data and put everything in the db using the fast BULK INSERT statement for optimum performance. The import process will also check that the data carried by the file has not already been loaded previously in order to avoid duplicate data.

By default the batch will target the localhost database server. If you want to import data to a different database server or you're using named instances, you have to edit import_dataflowprofile.bat file replacing the localhost setting with your own.

If you want to customize the way in which data is imported, you have to change the import-profile-data.sql file, located in the dtloggedexec_db folder.

Querying DTLoggedExecDB database

Tables

The DTLoggedExecDB is actually made of tables:

dbo.version

Contains information on installed database schema and database data version.

dbo.packages

dbo.packages

Contains information on the Packages from which the imported dataflows data comes from. It stores Package Name and Package Version, so that it is possibile to see how the performance of dataflows changes across package versions, allowing to understand the impact of changes made to dataflows.

dbo.dataflow_profiling

dbo.dataflow_profiling

Contains a row for each dataflow execution. Rows are related to the package where they come from through the package_id column.

dbo.dataflow_profiling_dataflows_data

dbo.dataflow_profiling_dataflows_data

This table contains aggregated dataflow data, one row for each dataflow execution, obtained from dbo.dataflow_profiling_buffers_data. This means that if a dataflow present in a package is executed more than one time (for example because is contained into a Loop Container) here a row for each execution will be stored. The column execution_count will hold the value that represent the iteration number for that execution. The column processed_rows contains sum of all the rows processed by all components present in the dataflow.

The table has no FK to dbo.dataflow_profiling to allow usage of TRUNCATE TABLE statement.

dbo.dataflow_profiling_components_data

dbo.dataflow_profiling_components_data

This table contains aggregated data, one row for each dataflow component obtained from dbo.dataflow_profiling_buffers_data. As for the table above, the column execution_count will hold the value that represent the iteration number for the dataflow execution, if the dataflow is executed more than once within the same package execution. The column processed_rows contains the number of all the rows processed by the component. Since components can have more than one input or output, the name of inputs and outputs is also stored in this table, so that is it possible to keep track of the source and the destination of processed rows.

The table has no FK to dbo.dataflow_profiling to allow usage of TRUNCATE TABLE statement.

dbo.dataflow_profiling_buffers_data

dbo.dataflow_profiling_buffers_data

This table contains a row for each buffer processed in the dataflow. This represent the most detailed data you can have, since dataflow handle data on per-buffer base. For each row the name of the input and output buffer is stored, along with the name of the component that as processed that buffer.

Views

Views has been created in order to access to profiled data more easily, and they simply wraps access to table, avoiding the final user to deal with ids and joins:

  • vw_dataflow_profiled
  • vw_dataflow_profiled_dataflows_data
  • vw_dataflow_profiled_components_data
  • vw_dataflow_profiled_buffers_data

Some samples of report built using gathered data can be found here: Report Samples

ScrewTurn Wiki version 3.0.0.333. Some of the icons created by FamFamFam.