Techbubbles

Using Data Profiling Task in SSIS

 

Data Profiling is important in data quality processing and transformation. It provides a way to investigate large datasets and to identify incorrectly formatted data. This post explains doing data analysis using Data Profiling task.

Data Profiling task helps you to

  • Identify the problems with your data. Example: Finding the products in your data  which are not following the naming pattern.
  • Identify the missing functional dependencies between columns. Example: State and Zip code columns. It helps you identify wrong ZIP code that has been entered for a state or city.

Setting up Data Profile Task

1. Drag the Data Profile task to designer and Specify the output file name. You can specify the output filename either through file connection or variable. There  is only one output filename for all profiling tasks.

image

Double click the task and set the below properties in general tab

image

If you set OverwiteDestination property to false then package will fail in subsequent runs.

2. Select the  Profile type

image

Column Statistics Profile Request – It helps you to calculate minimum, maximum and average values of a column. It helps you to find invalid or outbound values of a column. Example: Invalid Dates.

Column Value distribution Request – You can use this profile request to identify repetitive or duplicate values.

Column Length Distribution Request – You can use this profile request to identify column values that would not fit in the destination column.

Column Pattern Profile Request – Using this request you can identify how consistent the column values relative to each other.

Candidate Key Profile Request – Allows you to identify whether the column is suitable for key column in the table.

Functional Dependency Profile- Helps you to identify which one column values are dependent on another column.

Value Inclusion Profile Request – Helps you to identify the suitability one column as foreign key for another table.

Select the desired profile request from the above list. You can set all profile actions on the desired table in your database.

3. Configure the Connection Manager in SSIS designer. The Data Profiling task requires ADO.NET connection manager.

image

4. Assign the connection manager to profile request , select the table or view from the list as shown below

image

Depending on the profile type that you selected, you may have to select one or more columns from the table.

5. The Quick Profile option- if you want to quick profile a table , click the quick profile button then you will receive the below dialogue window

image

After you complete all your settings and click ok. Run the package, it should generate the results to output file that you configured in the profile task.

Analyzing the data using Data Profile Viewer

To open the Data Profile Viewer, go to the SQL Server 2008 start menu

image

image

Data Profiling Task is a powerful tool to interrogate the data and to understand the cleanliness , over all structure and relationship between data in tables.


Share this post :


Related Posts:

%d bloggers like this: