|Data Profiling is important in data quality processing and transformation. It provides a way to investigate large datasets and to identify incorrectly formatted data. This post explains doing data analysis using Data Profiling task.|
Data Profiling task helps you to
- Identify the problems with your data. Example: Finding the products in your data which are not following the naming pattern.
- Identify the missing functional dependencies between columns. Example: State and Zip code columns. It helps you identify wrong ZIP code that has been entered for a state or city.
Setting up Data Profile Task
1. Drag the Data Profile task to designer and Specify the output file name. You can specify the output filename either through file connection or variable. There is only one output filename for all profiling tasks.
Double click the task and set the below properties in general tab
If you set OverwiteDestination property to false then package will fail in subsequent runs.
2. Select the Profile type
Column Statistics Profile Request – It helps you to calculate minimum, maximum and average values of a column. It helps you to find invalid or outbound values of a column. Example: Invalid Dates.
Column Value distribution Request – You can use this profile request to identify repetitive or duplicate values.
Column Length Distribution Request – You can use this profile request to identify column values that would not fit in the destination column.
Column Pattern Profile Request – Using this request you can identify how consistent the column values relative to each other.
Candidate Key Profile Request – Allows you to identify whether the column is suitable for key column in the table.
Functional Dependency Profile- Helps you to identify which one column values are dependent on another column.
Value Inclusion Profile Request – Helps you to identify the suitability one column as foreign key for another table.
Select the desired profile request from the above list. You can set all profile actions on the desired table in your database.
3. Configure the Connection Manager in SSIS designer. The Data Profiling task requires ADO.NET connection manager.
4. Assign the connection manager to profile request , select the table or view from the list as shown below
Depending on the profile type that you selected, you may have to select one or more columns from the table.
5. The Quick Profile option- if you want to quick profile a table , click the quick profile button then you will receive the below dialogue window
After you complete all your settings and click ok. Run the package, it should generate the results to output file that you configured in the profile task.
Analyzing the data using Data Profile Viewer
To open the Data Profile Viewer, go to the SQL Server 2008 start menu
Data Profiling Task is a powerful tool to interrogate the data and to understand the cleanliness , over all structure and relationship between data in tables.