Saturday, 16 August 2014

Transformation Involved In ETL Testing



Introduction
This document contains the common transformation which is encountered while doing ETL testing. We are sure readers would be coming across these types of transformation during their projects but they won’t be aware about what transformation they actually test. This BOK helps a novice ETL tester to know about the various transformations which they encounter during their projects. The document also uses some terms related to Healthcare Insurance while depicting the examples. Following are the terms used:
 1. Member: Any person covered under a health insurance plan, an enrollee or eligible dependent.
2. Provider: A term commonly used by health insurance companies to designate any healthcare provider, whether a doctor or nurse, a hospital or clinic.
3. Subscriber: A term used in Health Insurance which refers to the person who pays for the Health Insurance.
The document is prepared based on the project experiences.



Transformations In ETL Testing:
A transformation is a repository object which reads, modifies and passes the data. A set of Transformations are available (specific to the tools used) using which different kinds of operations can be performed on the data to customize the data as per our requirements. The document lists out various transformations available for data processing.
A Transformation can be classified as active or passive.
Active Transformation:
A transformation can be called as an active transformation if it changes the number of rows when the records pass from source to target. For E.g. some rows might be dropped while moving from source to target which do not meet the requirements.
Passive Transformation:
A transformation can be called as a passive transformation if there are no changes in the number of rows when data passes from source to target, i.e. all the rows pass through the transformation.

Various types of Transformations :-
The following transformations can be active or passive depending on the function they perform.

1. Router Transformation:
 Router is an Active transformation. It is applied to data present in a single source (for e.g. a Flat File or a single table) which needs to be routed to different targets (e.g. multiple tables or multiple files) based on certain conditions.

The data present in the source can be routed to different targets using any unique routing code based on which data can be split into different targets. This routing code can be a unique number or any unique identification key.

E.g.:Say a system A sends authorizations to two claim engines B and C which need to process the claims. The data coming from the system A has to be split into the two systems based on a route code .Authorizations having route code in the range 100-200 have to be sent to system B and Authorizations having route code in the range 300-400 have to be sent to system C. Any Authorization not lying in the above mentioned ranges has to be written to an error file. Such kind of data transformation requirement is handled using Router Transformation where a data coming from a single source is being split into multiple targets based on a unique code.

Flow diagram to demonstrate Router Transformation:
The Fig 1(a) demonstrates how router transformation can split the data into different targets .
                                                          Fig1 (a)

2. Filter transformation:
The filter transformation is an active transformation. A condition is specified in this type of transformation where only the data in the source meeting the specified condition moves to the target.
E.g. Consider a source table containing Member related data who avail services from a Provider. The table has both active members and terminated members .As per the requirement only the members which are active from a particular effective date (say 01-jan-2013) and have the termination date as open end date(say 31-dec-2039) need to be sent to the target database or file. In this scenario the filter condition is to pass all the members which are active between the mentioned Effective and Termination date to the target table and reject the data which does not meet the condition.

Flow diagram to demonstrate Filter Transformation:
The Fig 2(a) demonstrates how filter transformation passes only the data which meets the given condition.

                                                            Fig 2(a)
NOTE:-
Difference between Filter and Router Transformation.
In filter transformation we eliminate the data that does not meet the condition whereas router transformation has an option to capture the data that does not meet the condition.
Considering the above example if we require to capture the details of all the inactive members into an error file we would use router transformation instead of filter transformation where all members which do not comply with the filter criteria will be routed to the error file as shown in the below diagram.

                                                      Fig 3(a)
3. Aggregator Transformation: Aggregator transformation is an active transformation used to perform calculations such as sums, averages, counts, etc. on groups of data.
    Following aggregate functions can be tested under Aggregator transformation,
1.    Avg
2.    Count
3.    Sum
4.    Last
5.    First
6.    Min
7.    Max
8.    Variance
9.    Median
10. Percentile
11. STDDEV(Standard Deviation)
Example for the function MAX: Consider the below scenario:
A member avails a service from a provider and has to be admitted for 10 days to undergo a treatment prescribed by the provider. The admission date and the discharge date is recorded by the provider in a table. After 9 days the provider recommends that the member should stay for another 15 days for complete recovery. In this scenario the member will have a new entry for the discharge date in the table and therefore the member will have multiple entries for the discharge date in the database.
Member ID
Admission Date
Discharge Date
001
01-JAN-2013
10-JAN-2013
001
01-Jan-2013
25-JAN-2013

Since the table has multiple entries for the member, in order to calculate the final discharge date for the Inpatient, the function MAX can be applied to the  discharge date field i.e MAX(discharge date) and using this aggregate function the required date can be retrieved from the database.

Example for the function SUM: Consider the below scenario:
A member has undergone a surgery during which he avails medical facilities from the provider such as diagnostics, anesthesia, pharmacy, etc. The total cost for the services taken up by the member will include all the availed facilities cost. The following table depicts the data of the Member.
Member Id
Anesthesia Cost
Lab diagnostics Cost
Pharmacy Cost
002
1000
2000
3000

Hence, the Treatment cost would be a total of all the costs incurred by the member. i.e.
SUM (Anesthesia Cost, Lab Diagnostics Cost, Pharmacy Cost)
Example for the function COUNT: Consider the below scenario:
Suppose an input file contains member information i.e both subscriber and dependent information .As per the requirement only those members should be inserted into the target table  who has an age as  of  or greater than 18 years .In source file we have a total of  100 members whereas the members satisfying the above conditions are only 20 .Here count functions can be used in order to know how many members have actually passed the given condition and are coming into the target.
Example for the function MIN: Consider the below scenario:
Suppose there is an input file which contains medical message details .As per the requirement a unique Family Id’s should be retrieved for those records whose Batch Id’s are min.
In Source file we have in all 150 records where as the family Ids satisfying the above condition are only 10.Here the min is used to know how many family Ids have passed the given condition and are coming into the target file.
Source File
Batch Id
Family Id
001
58043615200001
002
58043615200001
003
58043615200001
002
3HZN0039585001
005
3HZN0039585001
001
3HZN0039585002
003
3HZN0039585002

After Filter Condition:-
Batch Id
Family Id
Batch Id
001
58043615200001
001
002
3HZN0039585001
002
001
3HZN0039585002
001

4. Expression Transformation
Expression transformation is a passive transformation used to calculate values on a single row. Example of an expression transformation would be concatenating the first and the last name, etc
Example:Consider the below scenario
A source has three separate fields for First Name, Middle Name and Last Name of a subscriber. As per the requirement the target should contain the name of the subscriber in one field in the below Format:
First Name   + Middle Name +    Last name.
Consider the following tables which depict the data:
Source Table:
Subscriber Id
First Name
Middle Name
Last name
0031
RAJ
K
SHARMA

Target Table:
Subscriber Id
Subscriber Name
0031
RAJKSHARMA

In the above example, expression transformation is used to concatenate the Names present in different fields of the table and populating a single name in the target table.
5. Lookup transformation
A lookup transformation is used to look up data in a flat file, table or view which can return a single or multiple rows. It can be a passive or active transformation.
The lookup transformation can be used in a scenario where we already have a source value and based on that source value we need to retrieve the corresponding information from a different source. Say we have to look up the City Name (present in a different table) for a particular STD Code given in a table containing contact information. This scenario would use a lookup transformation. The following example further demonstrates the usage of lookup transformation.
The values retrieved via lookup transformations can be a single row or multiple rows and can also be used for calculations.
Example:Consider the below scenario
A source has three fields: a unique social security number (SSN),First Name and Last name. As per the requirement the target table should have an additional field of Member Id along with the SSN, First Name and Last Name. But the information for the Member Id is stored in a separate table along with the associated SSN. Hence the Member Id can be retrieved from the separate table using the SSN which is common in both the tables. Such kind of reference tables are known as Lookup tables and the process is known as lookup transformation. Consider the below tables which depict the scenario.
Below diagram demonstrates the usage of expression transformation

Source Table:



Target Table:
SSN
Member ID
First Name
Last Name
1234567
ABCD1234567
Ajay
Sharma


6. Sequence Generator Transformation
The sequence generator transformation is used to generate unique numeric values in sequential order. The sequence number generated can be used as a primary key to identify the data.

Example:
The source which is a flat file has data related to a member, e.g. member name, age, sex, provider Name, etc as a part of the details as shown below:

Data in the source file:
NIDHI SHARMA 25 F CISCO
 NEHA DUTT 45 M INFOSYS
 BRIAN LARA 32 M XYZHEALTHCARE
The data is to be populated to a target file which should have a unique key to identify the record easily. This kind of transformation is known as sequence generator transformation where a unique number is generated for each record which can be used as a primary key to identify records.

Target file with sequence numbers generated:
01  NIDHI SHARMA 25 F CISCO
02 NEHA DUTT 45 M INFOSYS
03 BRIAN LARA 32 M XYZHEALTHCARE

7. SORTER TRANSFORMATION
Sorter transformation is a connected and an active transformation. It allows sorting data either in ascending or descending order according to a specified field or set of fields.
Example: Consider an input file which contains group related data. The data currently present in the file is not in a sorted order. As per requirement data in the output file should be presentin a sorted order based on the combination of group number ,sub group number and effective date  fields. Consider the below tables which depict the scenario.




Source data (Group related information)




0 comments :

Post a Comment