Saturday 16 August 2014

Basic Data Validations In ETL Testing For Beginners



Introduction
As per Bill Inmon, a Data Warehouse is defined as “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process”. Data warehouse testing is one of the more challenging kinds of testing, since each field requires thorough verification against highly complex business rules and transformations of values. ETL is used to bring the data together: it is Extracted, Transformed and Loaded (ETL) into one single destination. This document contains the common validations which need to be tested while performing ETL testing. This BOK helps a beginner in ETL testing to know about the various validations which they will encounter during their projects.
ETL can be defined as below:
Extract: The source data is extracted from many heterogeneous systems.
Transform: Transformations are applied to the extracted data to produce the structures and types that follow the business rules of the data warehouse.
Load: The transformed (cleansed) data is loaded into the data warehouse structures for data analysis.
Some of the validations in ETL testing, with examples, are discussed below.
Metadata Testing:
ETL mapping documents contain the mapping rules between the source and target columns, the data transformations and the data types. We need to validate the source and target file or table structures against the corresponding mapping document provided by the client.
Data Type Check: Verification that the data types of the source and target columns match is done during the data type check. Sometimes the transformation has to be reviewed as well, since data types can change during transformations. Ensure that all values for a specific field are stored the same way in the data warehouse regardless of how they were stored in the source system.
Example 1: The source column data type is number, so the target column data type should also be number.
Example 2: If one source system stores “Off” or “On” in its status field and another source system stores “0” or “1” in its status field, then a data type conversion transformation converts the content of one or both of the fields to a specified common value such as “OFF” or “ON”.
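A minimal way to automate the data type check is to compare the dictionary metadata of the source and target tables. The sketch below reuses the table names from the later examples and assumes the database exposes INFORMATION_SCHEMA (e.g. SQL Server/MySQL); on Oracle the equivalent view is USER_TAB_COLUMNS.
          -- Columns whose data types differ between source and target
          SELECT s.COLUMN_NAME, s.DATA_TYPE AS SOURCE_TYPE, t.DATA_TYPE AS TARGET_TYPE
          FROM INFORMATION_SCHEMA.COLUMNS s
          JOIN INFORMATION_SCHEMA.COLUMNS t ON t.COLUMN_NAME = s.COLUMN_NAME
          WHERE s.TABLE_NAME = 'CAG_LOOK_UP'
            AND t.TABLE_NAME = 'CAG_LOOK_UP_TARGET'
            AND s.DATA_TYPE <> t.DATA_TYPE;
          -- Any rows returned indicate a mismatch to be checked against the mapping document.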
Data Length Check: Verify that the length of the target column's data type is greater than or equal to that of the source column. Transformations need to be considered during the data length check as well.
           Example: The source table has First Name and Last Name columns with a length of 100 each. After applying the Expression transformation, the target table should have the column SubscriberName with a length of 200, as shown in the tables below.
Source Table A:

SSN    First Name    Last Name
001    Nidhi         Sharma
002    Vijay         Kumar

Target Table B (after the Expression transformation):

SSN    Subscriber Name
001    NidhiSharma
002    VijayKumar
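A simple metadata-level query for the example above might look like the following; the target table name here is illustrative, and INFORMATION_SCHEMA is an assumption that varies by database (Oracle exposes the length through USER_TAB_COLUMNS.DATA_LENGTH).
          -- Length of the concatenated target column; it should be >= 200 (100 + 100 from the source)
          SELECT COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH
          FROM INFORMATION_SCHEMA.COLUMNS
          WHERE TABLE_NAME = 'TARGET_TABLE_B'
            AND COLUMN_NAME = 'SUBSCRIBERNAME';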



Index/Constraint Check: Whether the proper constraints and indexes are defined on the target tables, as per the design document specifications, is verified during the index/constraint check. Some of the key checks are UNIQUE, NULL, NOT NULL, Primary Key, Foreign Key and DEFAULT.
Example 1: Verify that the columns that cannot be null have the 'NOT NULL' constraint.

Example 2: Verify that the Primary Key and Natural Key columns are indexed
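These checks can be scripted against the database dictionary and backed up with data-level queries. The sketch below assumes INFORMATION_SCHEMA is available (Oracle uses USER_CONSTRAINTS and USER_INDEXES) and reuses the target table and column names from the completeness examples further down.
          -- List the constraints defined on the target table and compare them with the design document
          SELECT CONSTRAINT_NAME, CONSTRAINT_TYPE
          FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
          WHERE TABLE_NAME = 'CAG_LOOK_UP_TARGET';

          -- Data-level check for Example 1: a mandatory column should contain no NULLs
          SELECT count (*) FROM Cag_Look_up_target WHERE MEMBER_ID IS NULL;
          -- Expected result: 0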
Attribute Check: Verify that all the attributes of the source table are present in the target table as per the mapping document.
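One way to automate the attribute check is to list source columns that are missing from the target. The sketch assumes INFORMATION_SCHEMA and the ANSI EXCEPT operator (Oracle uses USER_TAB_COLUMNS and MINUS) and reuses the table names from the later examples.
          -- Source columns that do not exist in the target table
          SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'CAG_LOOK_UP'
          EXCEPT
          SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'CAG_LOOK_UP_TARGET';
          -- An empty result means every source attribute exists in the target; columns renamed
          -- by the mapping document still need to be verified against that document.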
Data Completeness checks:
The main purpose of the data completeness check is to verify that all the expected data is loaded into the target from the source. Performing completeness checks on transformed columns is a bit tricky, but it can be done in most cases by understanding the transformation rules and comparing the counts of the expected results.
Data completeness checks can be done by comparing and validating the record counts and aggregates (min, max, sum, avg) of source and target columns, with or without transformations.
Count Validation: During count validation the record counts are compared between source and target to check for any rejected records.
Source:
          SELECT count (*) FROM Cag_Look_up


Target:
          SELECT count (*) FROM Cag_Look_up_target
If both queries return the same count (i.e. Source Count = Target Count), then there are no rejected records.
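When the counts do not match, the specific missing records can be listed with a set difference on a key column; EXCEPT is ANSI SQL (Oracle uses MINUS), and MEMBER_ID is used here only as an illustrative key.
          SELECT MEMBER_ID FROM Cag_Look_up
          EXCEPT
          SELECT MEMBER_ID FROM Cag_Look_up_target;
          -- Any rows returned are source records that were rejected or never loaded.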
Data Profile Validation: Aggregate functions like count, avg, min, sum and max (where applicable) are compared between source and target columns.
 Example:
Source:
SELECT count (MEMBER_ID), count (F_NAME), count (L_NAME), avg (PHARMACY_COST) FROM Cag_Look_up
Target:
SELECT count (MEMBER_ID), count (F_NAME), count (L_NAME), avg (PHARMACY_COST) FROM Cag_Look_up_target
Duplicate Check: As per the business requirements, any column or combination of columns that needs to be unique is verified during the duplicate check.
Example:
Select F_NAME, L_NAME, FAMILY_ID, count (*) from MEMBER_PERSISTANT_TABLE group by F_NAME, L_NAME, FAMILY_ID having count (*)>1
If the above query returns any rows, then there is duplicate data in the columns F_NAME, L_NAME and FAMILY_ID.
Data Accuracy Testing:
Data accuracy testing ensures that the data from the source is accurately transferred to the target according to the business logic.

Value Comparison: Columns in the source with minimal or no transformation are compared with the target columns. The Source Qualifier transformation and the Expression transformation are typically involved in the value comparison check.
Example:
In ETL testing, while performing the value comparison between source and target data, the simple queries below can be used.
Source data query:
 Select count (*) from cag_look_up where CARRIER_ID is not null
Target table query:
Select count (*) from cag_look_up C, Member_persistant M where
C.CARRIER_ID = M.CARRIER_ID and M.CARRIER_ID is not null
If the counts from query 1 and query 2 match, then we can conclude that the data of the CARRIER_ID column has been successfully transferred from source to target.
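For columns moved with no transformation, a fuller value comparison can be done with a set difference rather than a count; EXCEPT is ANSI SQL (Oracle uses MINUS), and the column and table names follow the queries above.
          SELECT CARRIER_ID FROM cag_look_up
          EXCEPT
          SELECT CARRIER_ID FROM Member_persistant;
          -- Rows returned here are CARRIER_ID values present in the source but missing or altered in the target.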
Data Quality Check:
Number Check:
Example 1: A column value in the source system starts with 0; after loading it into the target system, an extra 0 should not be appended. This type of business functionality can be validated by a number check.
Example 2: If the source numbering format of a column is aa_30 but the target should hold only 30, then the value has to be loaded without the prefix (aa_).
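A query of the following shape can validate Example 2; all names used here (source_table, target_table, SRC_CODE, TGT_CODE, BUSINESS_KEY) are purely illustrative, and REPLACE is used to strip the assumed prefix.
          -- Rows where the aa_ prefix was not stripped correctly during the load
          SELECT s.SRC_CODE, t.TGT_CODE
          FROM source_table s
          JOIN target_table t ON t.BUSINESS_KEY = s.BUSINESS_KEY
          WHERE REPLACE (s.SRC_CODE, 'aa_', '') <> t.TGT_CODE;
          -- Expected result: no rows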
Date Check: Date values have to follow the expected date format, and the format should be the same across all the records.
Example:
The standard or default format of a date should be something like yyyy-mm-dd. Sometimes we can also validate rules such as FROM_DATE should not be greater than TO_DATE.
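The FROM_DATE/TO_DATE rule can be verified with a query like the one below; the column names come from the example, while the table name is illustrative.
          -- Records where the date range is inverted
          SELECT count (*) FROM target_table WHERE FROM_DATE > TO_DATE;
          -- Expected result: 0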
Precision Check: The precision of some of the numeric columns in the target should be rounded as per the business logic.
Example:
The value of the price column in the source is 28.123789, but in the target it should be displayed as 28.20 (rounded value).
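A sketch for this check, assuming the business logic rounds the price to two decimals and that rows can be joined on a business key; all names here (source_table, target_table, BUSINESS_KEY, PRICE) are illustrative.
          -- Target prices that do not match the rounded source prices
          SELECT s.PRICE AS SOURCE_PRICE, t.PRICE AS TARGET_PRICE
          FROM source_table s
          JOIN target_table t ON t.BUSINESS_KEY = s.BUSINESS_KEY
          WHERE ROUND (s.PRICE, 2) <> t.PRICE;
          -- Expected result: no rows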

Data Check: Some of the records from source to target need to be filtered out based on certain business rules.
Example: Only records with date_of_service > 2012 and batch_id != 101 should enter the target table.
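The filter rule in the example can be checked with a negative query against the target; the table name is illustrative, and EXTRACT is used on the assumption that date_of_service is a DATE column (SQL Server would use YEAR() instead).
          -- Records that should have been filtered out but reached the target
          SELECT count (*) FROM target_table
          WHERE EXTRACT (YEAR FROM date_of_service) <= 2012 OR batch_id = 101;
          -- Expected result: 0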
Null Check: Based on the business logic, some of the columns should have a NULL value.
Example: The Termination Date column should display a NULL value unless the Active Status column is “T”.
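A query of the following shape can validate the Termination Date rule; the table and exact column names (target_table, TERMINATION_DATE, ACTIVE_STATUS) are illustrative.
          -- Records where Termination Date is populated even though Active Status is not 'T'
          SELECT count (*) FROM target_table
          WHERE TERMINATION_DATE IS NOT NULL AND ACTIVE_STATUS <> 'T';
          -- Expected result: 0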

