Introduction
This
document contains the common transformation which is encountered while doing
ETL testing. We are sure readers would be coming across these types of
transformation during their projects but they won’t be aware about what
transformation they actually test. This BOK helps a novice ETL tester to know
about the various transformations which they encounter during their projects.
The document also uses some terms related to Healthcare Insurance while depicting
the examples. Following are the terms used:
1. Member: Any person covered under a health
insurance plan, an enrollee or eligible dependent.
2.
Provider: A term commonly used by health insurance companies to designate any
healthcare provider, whether a doctor or nurse, a hospital or clinic.
3.
Subscriber: A term used in Health Insurance which refers to the person who pays
for the Health Insurance.
The
document is prepared based on the project experiences.
Transformations
In ETL Testing:
A
transformation is a repository object which reads, modifies and passes the
data. A set of Transformations are available (specific to the tools used) using
which different kinds of operations can be performed on the data to customize
the data as per our requirements. The document lists out various
transformations available for data processing.
A
Transformation can be classified as active or passive.
Active
Transformation:
A
transformation can be called as an active
transformation if it changes
the number of rows when the records pass from source to target. For E.g. some
rows might be dropped while moving from source to target which do not meet the
requirements.
Passive
Transformation:
A
transformation can be called as a passive
transformation if there are no
changes in the number of rows when data passes from source to target, i.e. all
the rows pass through the transformation.
Various
types of Transformations :-
The following transformations can be active or passive depending on the
function they perform.
1.
Router Transformation:
Router is an Active transformation. It is
applied to data present in a single source (for e.g. a Flat File or a single
table) which needs to be routed to different targets (e.g. multiple tables or
multiple files) based on certain conditions.
The
data present in the source can be routed to different targets using any unique
routing code based on which data can be split into different targets. This
routing code can be a unique number or any unique identification key.
E.g.:Say a system A sends authorizations to two claim
engines B and C which need to process the claims. The data coming from the
system A has to be split into the two systems based on a route code
.Authorizations having route code in the range 100-200 have to be sent to system
B and Authorizations having route code in the range 300-400 have to be sent to
system C. Any Authorization not lying in the above mentioned ranges has to be
written to an error file. Such kind of data transformation requirement is
handled using Router Transformation where a data coming from a single source is
being split into multiple targets based on a unique code.
Flow diagram to demonstrate Router
Transformation:
The
Fig 1(a) demonstrates how router transformation can split the data into
different targets .
Fig1
(a)
2.
Filter transformation:
The
filter transformation is an active transformation. A condition is specified in
this type of transformation where only the data in the source meeting the
specified condition moves to the target.
E.g. Consider a
source table containing Member related data who avail services from a Provider.
The table has both active members and terminated members .As per the
requirement only the members which are active from a particular effective date
(say 01-jan-2013) and have the termination date as open end date(say
31-dec-2039) need to be sent to the target database or file. In this scenario
the filter condition is to pass all the members which are active between the
mentioned Effective and Termination date to the target table and reject the
data which does not meet the condition.
Flow diagram to demonstrate Filter
Transformation:
The
Fig 2(a) demonstrates how filter transformation passes only the data which
meets the given condition.
Fig 2(a)
NOTE:-
Difference
between Filter and Router Transformation.
In
filter transformation we eliminate the data that does not meet the condition
whereas router transformation has an option to capture the data that does not
meet the condition.
Considering
the above example if we require to capture the details of all the inactive
members into an error file we would use router transformation instead of filter
transformation where all members which do not comply with the filter criteria
will be routed to the error file as shown in the below diagram.
Fig
3(a)
3. Aggregator Transformation: Aggregator transformation is an active transformation used to perform
calculations such as sums, averages, counts, etc. on groups of data.
Following aggregate functions can be tested
under Aggregator transformation,
1.
Avg
2.
Count
3.
Sum
4.
Last
5.
First
6.
Min
7.
Max
8.
Variance
9.
Median
10. Percentile
11. STDDEV(Standard Deviation)
Example
for the function MAX: Consider the below scenario:
A member avails a service from a provider and has
to be admitted for 10 days to undergo a treatment prescribed by the provider.
The admission date and the discharge date is recorded by the provider in a
table. After 9 days the provider recommends that the member should stay for
another 15 days for complete recovery. In this scenario the member will have a
new entry for the discharge date in the table and therefore the member will
have multiple entries for the discharge date in the database.
Member ID
|
Admission Date
|
Discharge Date
|
001
|
01-JAN-2013
|
10-JAN-2013
|
001
|
01-Jan-2013
|
25-JAN-2013
|
Since the table has multiple entries for the
member, in order to calculate the final discharge date for the Inpatient, the
function MAX can be applied to the
discharge date field i.e MAX(discharge
date) and using this aggregate function the required date can be retrieved
from the database.
Example
for the function SUM: Consider the below scenario:
A member has undergone a surgery during which he
avails medical facilities from the provider such as diagnostics, anesthesia,
pharmacy, etc. The total cost for the services taken up by the member will
include all the availed facilities cost. The following table depicts the data
of the Member.
Member Id
|
Anesthesia Cost
|
Lab diagnostics Cost
|
Pharmacy Cost
|
002
|
1000
|
2000
|
3000
|
Hence, the Treatment cost would be a total of all
the costs incurred by the member. i.e.
SUM
(Anesthesia Cost, Lab Diagnostics Cost, Pharmacy Cost)
Example
for the function COUNT: Consider the below scenario:
Suppose an input file contains member information
i.e both subscriber and dependent information .As per the requirement only
those members should be inserted into the target table who has an age as of or
greater than 18 years .In source file we have a total of 100 members whereas the members satisfying
the above conditions are only 20 .Here count functions can be used in order to
know how many members have actually passed the given condition and are coming
into the target.
Example for the function MIN: Consider the
below scenario:
Suppose there is an input file which contains
medical message details .As per the requirement a unique Family Id’s should be
retrieved for those records whose Batch Id’s are min.
In Source file we have in all 150 records where as
the family Ids satisfying the above condition are only 10.Here the min is used
to know how many family Ids have passed the given condition and are coming into
the target file.
Source File
Batch Id
|
Family Id
|
001
|
58043615200001
|
002
|
58043615200001
|
003
|
58043615200001
|
002
|
3HZN0039585001
|
005
|
3HZN0039585001
|
001
|
3HZN0039585002
|
003
|
3HZN0039585002
|
After Filter Condition:-
Batch
Id
|
Family
Id
|
Batch
Id
|
001
|
58043615200001
|
001
|
002
|
3HZN0039585001
|
002
|
001
|
3HZN0039585002
|
001
|
4.
Expression Transformation
Expression
transformation is a passive transformation used to calculate values on a single
row. Example of an expression transformation would be concatenating the first
and the last name, etc
Example:Consider the
below scenario
A source has three separate fields for First Name,
Middle Name and Last Name of a subscriber. As per the requirement the target
should contain the name of the subscriber in one field in the below Format:
First Name
+ Middle Name + Last name.
Consider the following tables which depict the
data:
Source
Table:
Subscriber
Id
|
First
Name
|
Middle
Name
|
Last
name
|
0031
|
RAJ
|
K
|
SHARMA
|
Target
Table:
Subscriber
Id
|
Subscriber
Name
|
0031
|
RAJKSHARMA
|
In
the above example, expression transformation is used to concatenate the Names
present in different fields of the table and populating a single name in the
target table.
5.
Lookup transformation
A
lookup transformation is used to look up data in a flat file, table or view
which can return a single or multiple rows. It can be a passive or active
transformation.
The
lookup transformation can be used in a scenario where we already have a source
value and based on that source value we need to retrieve the corresponding
information from a different source. Say we have to look up the City Name
(present in a different table) for a particular STD Code given in a table
containing contact information. This scenario would use a lookup
transformation. The following example further demonstrates the usage of lookup
transformation.
The
values retrieved via lookup transformations can be a single row or multiple
rows and can also be used for calculations.
Example:Consider the
below scenario
A source has three fields: a unique social security
number (SSN),First Name and Last name. As per the requirement the target table
should have an additional field of Member Id along with the SSN, First Name and
Last Name. But the information for the Member Id is stored in a separate table
along with the associated SSN. Hence the Member Id can be retrieved from the
separate table using the SSN which is common in both the tables. Such kind of
reference tables are known as Lookup tables and the process is known as lookup
transformation. Consider the below tables which depict the scenario.
Below
diagram demonstrates the usage of expression transformation
Source
Table:
Target
Table:
SSN
|
Member
ID
|
First
Name
|
Last
Name
|
1234567
|
ABCD1234567
|
Ajay
|
Sharma
|
6.
Sequence Generator Transformation
The
sequence generator transformation is used to generate unique numeric values in sequential
order. The sequence number generated can be used as a primary key to identify
the data.
Example:
The source which is a flat file has data related to
a member, e.g. member name, age, sex, provider Name, etc as a part of the
details as shown below:
Data in
the source file:
NIDHI SHARMA 25 F CISCO
NEHA DUTT 45
M INFOSYS
BRIAN LARA
32 M XYZHEALTHCARE
The data is to be populated to a target file which
should have a unique key to identify the record easily. This kind of
transformation is known as sequence generator transformation where a unique
number is generated for each record which can be used as a primary key to
identify records.
Target
file with sequence numbers generated:
01 NIDHI SHARMA 25 F CISCO
02 NEHA DUTT 45 M INFOSYS
03 BRIAN LARA 32 M XYZHEALTHCARE
7.
SORTER TRANSFORMATION
Sorter
transformation is a connected and an active transformation. It allows sorting
data either in ascending or descending order according to a specified field or
set of fields.
Example:
Consider
an input file which contains group related data. The data currently present in
the file is not in a sorted order. As per requirement data in the output file
should be presentin a sorted order based on the combination of group number
,sub group number and effective date
fields. Consider the below tables which depict the scenario.
Source
data (Group related information)
0 comments :
Post a Comment