Consuming profiled data
Once you have a ".dtsDataFlowProfile" file you'll need to import it into a SQL Server database so that you can analyze DataFlow profiled data with the aid of T-SQL or Excel or even Analysis Services.
Create the DTLoggedExecDB database
¶
In the zipped package you can find a
dtloggedexec_db
folder under
DataFlowProfile
. Here you have all the scripts you need to setup a database that can hold your profiled data.
The first script to execute is the one which creates the DTLoggedExecDB database. The script is
create-database.sql
. It uses some SQLCMD functionalities to make the change of the script easier, so please be aware to activate it if you're running it from SQL Server Management Studio.
If you're using SQL Server 2008 to store DTLoggedExecDB database, you may want to enable compression in order to optimize performance and space used. You can run the script
create-database-for-sql2008.sql
after having created the database to do this.
In the folder
dtloggedexec_db
folder you'll also find the
import-profile-data.sql
. This file is used to make import of .dtsDataFlowProfile data easier and should no be used manully unless you really know what you're doing.
Import .dtsDataFlowProfile data
¶
Importing profiled dataflow data is very easy. All you have to do take you .dtsDataFlowProfile data and drop it on the
import_dataflowprofile.bat
file you can find in
dtloggedexec_db
folder. If you don't like drag'n'drop or prefer to use console command, you just have to pass the file contining profiled data to the
import_dataflowprofile.bat
batch:
import_dataflowprofile.bat TestPackage6_20080807_155635.dtsDataFlowProfile
The import phase will extract header and detail informations from profiled data and put everything in the db using the fast BULK INSERT statement for optimum performance. The import process will also check that the data carried by the file has not already been loaded previously in order to avoid duplicate data.
By default the batch will target the
localhost
database server. If you want to import data to a different database server or you're using named instances, you have to edit
import_dataflowprofile.bat
file replacing the
localhost
setting with your own.
If you want to customize the way in which data is imported, you have to change the
import-profile-data.sql
file, located in the
dtloggedexec_db
folder.
Querying DTLoggedExecDB database
¶
Tables
¶
The DTLoggedExecDB is actually made of tables:
dbo.version
¶
Contains information on installed database schema and database data version.
dbo.packages
¶
Contains information on the Packages from which the imported dataflows data comes from. It stores Package Name and Package Version, so that it is possibile to see how the performance of dataflows changes across package versions, allowing to understand the impact of changes made to dataflows.
dbo.dataflow_profiling
¶
Contains a row for each dataflow execution. Rows are related to the package where they come from through the
package_id
column.
dbo.dataflow_profiling_dataflows_data
¶
This table contains aggregated dataflow data, one row for each dataflow execution, obtained from
dbo.dataflow_profiling_buffers_data
. This means that if a dataflow present in a package is executed more than one time (for example because is contained into a Loop Container) here a row for each execution will be stored. The column
execution_count
will hold the value that represent the iteration number for that execution. The column
processed_rows
contains sum of all the rows processed by all components present in the dataflow.
The table has no FK to
dbo.dataflow_profiling
to allow usage of
TRUNCATE TABLE
statement.
dbo.dataflow_profiling_components_data
¶
This table contains aggregated data, one row for each dataflow component obtained from
dbo.dataflow_profiling_buffers_data
. As for the table above, the column
execution_count
will hold the value that represent the iteration number for the dataflow execution, if the dataflow is executed more than once within the same package execution. The column
processed_rows
contains the number of all the rows processed by the component. Since components can have more than one input or output, the name of inputs and outputs is also stored in this table, so that is it possible to keep track of the source and the destination of processed rows.
The table has no FK to
dbo.dataflow_profiling
to allow usage of
TRUNCATE TABLE
statement.
dbo.dataflow_profiling_buffers_data
¶
This table contains a row for each buffer processed in the dataflow. This represent the most detailed data you can have, since dataflow handle data on per-buffer base. For each row the name of the input and output buffer is stored, along with the name of the component that as processed that buffer.
Views
¶
Views has been created in order to access to profiled data more easily, and they simply wraps access to table, avoiding the final user to deal with ids and joins:
vw_dataflow_profiled
vw_dataflow_profiled_dataflows_data
vw_dataflow_profiled_components_data
vw_dataflow_profiled_buffers_data
Some samples of report built using gathered data can be found here:
Report Samples