Introduction
As per Bill Inmon, a Data Warehouse is defined as "a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". Data warehouse testing is one of the more challenging kinds of testing, as each field requires thorough validation against highly complex business rules and business transformations of values. ETL is used to bring the data together: it is Extracted, Transformed and Loaded (ETL) into one single destination. This document contains the common validations which need to be performed during ETL testing. This BOK helps a beginner in ETL testing to learn about the various validations they will encounter during their projects.
ETL can be defined as below:
Extract: The source data is extracted from many heterogeneous systems.
Transform: Transformations are applied to the extracted data so that it fits the structures and types that follow the business rules of the data warehouse.
Load: The transformed (cleansed) data is loaded into the data warehouse structures for data analysis.
Some of the validations in ETL testing are discussed below with examples.
Metadata testing:
ETL mapping documents contain the mapping rules between the source and target columns, the data transformations and the data types. We need to validate the source and target file or table structures against the corresponding mapping document provided by the client.
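For example, on databases that expose the ANSI INFORMATION_SCHEMA views, a query like the one below can pull the structure of the target table for comparison against the mapping document (the table name reuses the target table from the later examples):
-- List the target table's columns, types, lengths and nullability
-- so they can be checked against the mapping document
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Cag_Look_up_target'
ORDER BY ORDINAL_POSITION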
Data Type Check: Verification that the data types of the source and target columns match is done during the data type check. Sometimes the transformation has to be reviewed as well, since data types can change during transformations. Ensure that all values for a specific field are stored the same way in the data warehouse regardless of how they were stored in the source system.
Example 1: If the source column data type is number, then the target column data type should also be number.
Example 2: If one source system stores "Off" or "On" in its status field and another source system stores "0" or "1" in its status field, then a data type conversion transformation converts the content of one or both of the fields to a specified common value such as "OFF" or "ON".
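A minimal sketch of how Example 2 can be verified (the table and column names here are assumptions): after the load, the target status field should contain only the agreed common values, so any row returned by the query below is a conversion failure.
-- Values that escaped the Off/On -> OFF/ON conversion
SELECT DISTINCT STATUS
FROM Member_target
WHERE STATUS NOT IN ('OFF', 'ON')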
Data Length Check: Verify that the length of the target column data type is greater than or equal to that of the source column data type. Transformations need to be considered during the data length check as well.
Example: The source table has First Name and Last Name columns with a length of 100 each. After applying the Expression transformation, the target table should have the column SubscriberName with a length of 200 (a sample truncation query is shown after the tables below).
Source Table A:
SSN | First Name | Last Name
001 | Nidhi | Sharma
002 | Vijay | Kumar
Target Table B:
SSN | Subscriber Name
001 | NidhiSharma
002 | VijayKumar
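A minimal truncation check for this example, assuming the table and column names shown above: any row returned would not fit into the 200-character SubscriberName column (LEN is SQL Server syntax; use LENGTH on Oracle/MySQL).
-- Rows whose combined name would overflow SubscriberName(200)
SELECT SSN
FROM Source_Table_A
WHERE LEN(First_Name) + LEN(Last_Name) > 200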
Index/Constraint Check: Whether the proper constraints and indexes are defined on the target tables, as per the design document specifications, is verified during the Index/Constraint Check. Some of the key checks are UNIQUE, NULL, NOT NULL, Primary Key, Foreign Key and DEFAULT.
Example 1: Verify that the columns that cannot be null have the 'NOT NULL' constraint.
Example 2: Verify that the Primary Key and Natural Key columns are indexed.
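On databases with INFORMATION_SCHEMA support, the constraints actually defined on a target table can be listed and compared against the design document (the table name is illustrative):
-- Constraints defined on the target table
SELECT CONSTRAINT_NAME, CONSTRAINT_TYPE
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE TABLE_NAME = 'Cag_Look_up_target'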
Attribute Check: To verify whether all the attributes of the source table are present in the target table as per the mapping document.
Data Completeness checks:
The main purpose of the Data Completeness check is to verify that all the expected data is loaded into the target from the source. Performing completeness checks on transformed columns is a bit tricky, but it can be done in most cases by understanding the transformation rules and comparing the counts of the expected results.
Data completeness checks can be done by comparing and validating the record counts and aggregates (min, max, sum, avg) of the source and target columns, with or without transformations.
Count Validation: During count validation, the record counts are compared between source and target to check for any rejected records.
Source:
SELECT count (*) FROM Cag_Look_up
Target:
SELECT count (*) FROM Cag_Look_up_target
If both queries return the same count (i.e. Source Count = Target Count), then there are no rejected records.
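When the counts differ, a set-difference query can identify exactly which records were rejected, assuming a common key such as MEMBER_ID (EXCEPT works on SQL Server/PostgreSQL; use MINUS on Oracle):
-- Records present in the source but missing from the target
SELECT MEMBER_ID FROM Cag_Look_up
EXCEPT
SELECT MEMBER_ID FROM Cag_Look_up_target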
Data Profile Validation: Aggregate functions like count, avg, min, sum and max (where applicable) are compared between source and target columns.
Example:
Source:
SELECT count (MEMBER_ID), count (F_NAME), count (L_NAME), avg (PHARMACY_COST) FROM Cag_Look_up
Target:
SELECT count (MEMBER_ID), count (F_NAME), count (L_NAME), avg (PHARMACY_COST) FROM Cag_Look_up_target
Duplicate Check: As per the business requirements, any column or combination of columns that needs to be unique is verified during the duplicate check.
Example:
Select F_NAME, L_NAME, FAMILY_ID, count (*) from MEMBER_PERSISTANT_TABLE group by F_NAME, L_NAME, FAMILY_ID having count (*)>1
If the above query returns any rows, then there is duplicate data in the columns F_NAME, L_NAME and FAMILY_ID.
Data Accuracy Testing:
Data Accuracy testing ensures that the data from the source is accurately transferred to the target according to the business logic.
Value Comparison: The columns in the source with minimal or no transformation are compared with the target columns. The Source Qualifier transformation and the Expression transformation are used during the value comparison check.
Example:
In ETL testing, while performing the value comparison between source and target data, the simple queries below can be used.
The source data query is:
Select count (*) from cag_look_up where CARRIER_ID is not null
And the target table query is:
Select count (*) from cag_look_up C, Member_persistant M where C.CARRIER_ID = M.CARRIER_ID and M.CARRIER_ID is not null
If the counts from query 1 and query 2 match, then we can conclude that the data of the CARRIER_ID column has been successfully transferred from source to target.
Data Quality Check:
Number check:
Example 1: If one of the columns in the source system starts with 0, then after loading it into the target system the 0 should not be appended. This type of business functionality can be validated by the number check.
Example 2: If the source numbering format of a column is aa_30 but the target expects only 30, then the value has to be loaded without the prefix (aa_).
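A sketch of how Example 2 can be checked (the table name and the column name NUMBER_CODE are assumptions; the underscore is escaped because it is a LIKE wildcard):
-- Rows that still carry the aa_ prefix the load should have stripped
SELECT count (*)
FROM Member_target
WHERE NUMBER_CODE LIKE 'aa\_%' ESCAPE '\'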
Date Check: Dates have to follow the agreed date format, and it should be the same across all the records.
Example:
The standard or default format of a date should be like yyyy-mm-dd. Sometimes we can also validate rules such as FROM_DATE should not be greater than TO_DATE.
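The FROM_DATE/TO_DATE rule mentioned above can be verified with a simple query (the table name is an assumption); any count greater than zero is a violation:
-- Records whose date range is inverted
SELECT count (*)
FROM Member_target
WHERE FROM_DATE > TO_DATE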
Precision Check: The precision of some of the numeric columns in the target should be rounded as per the business logic.
Example:
The value of the price column in the source is 28.123789, but in the target it should be displayed as 28.12 (the rounded value).
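A sketch of the precision validation, assuming a shared MEMBER_ID key, a two-decimal rounding rule and illustrative table names:
-- Target prices that do not match the source price rounded to 2 decimals
SELECT S.MEMBER_ID, S.PRICE, T.PRICE
FROM Source_Price S, Target_Price T
WHERE S.MEMBER_ID = T.MEMBER_ID and T.PRICE <> ROUND (S.PRICE, 2)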
Data Check: Some of the records from the source need to be filtered out of the target based on certain business rules.
Example: Only records with date_of_service > 2012 and batch_id != 101 should enter the target table.
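This filter rule can be verified negatively (the table name is assumed, and the comparison follows the notation of the example): count the target rows that violate it and expect zero.
-- Should return 0: no filtered-out records may reach the target
SELECT count (*)
FROM Member_target
WHERE date_of_service <= 2012 or batch_id = 101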
Null Check: Based on the business logic, some of the columns should have a NULL value.
Example: Display a null value in the Termination Date column unless the "Active status" column is "T".
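A sketch of this null check (the table and column names are assumptions); a non-zero count indicates records that violate the rule:
-- Should return 0: Termination_Date must be null unless Active_Status = 'T'
SELECT count (*)
FROM Member_target
WHERE Termination_Date is not null and Active_Status <> 'T'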