ETL stands for extract, transform, and load. The main purpose of ETL is to collect data that is distributed among different sources. The source data may be scattered across different systems belonging to different departments, and with the help of ETL we consolidate that data and handle it safely. Many interfaces use this method: they extract the data, transform it according to their requirements, and load it into the target tables. For example, in a retail organization the store data may sit in one department and the sales data in a different system; ETL collects and bundles this data into one set and loads it into the target tables, storing it in a database or a data warehouse.
The ETL Process
Extract
Extracts data from the source
↓
Transform
Transforms the data according to the requirement
↓
Load
Loads the data to the target
This three-stage process helps us load data from a confirmed source and gives us a straightforward way to populate our target tables.
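To make the three stages concrete, here is a minimal sketch of the flow using the retail example above. It assumes hypothetical CSV sources (store.csv, sales.csv), hypothetical column names, and an illustrative sqlite3 target table; a real interface would of course use its own sources and warehouse.

```python
# Minimal ETL sketch: extract from two hypothetical CSV sources,
# join/reshape them, and load into an illustrative sqlite3 target table.
import csv
import sqlite3

def extract(path):
    # Extract: read the raw rows from a delimited source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(store_rows, sales_rows):
    # Transform: join store and sales records on a shared key and
    # reshape them into the layout expected by the target table.
    stores = {r["store_id"]: r["store_name"] for r in store_rows}
    return [(r["store_id"], stores.get(r["store_id"], "UNKNOWN"), float(r["amount"]))
            for r in sales_rows]

def load(rows, db_path="warehouse.db"):
    # Load: write the transformed rows into the target table.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales_target "
                "(store_id TEXT, store_name TEXT, amount REAL)")
    con.executemany("INSERT INTO sales_target VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

load(transform(extract("store.csv"), extract("sales.csv")))
```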
This process can be initiated by converting each script into a mainframe job. The scripts that perform the extraction, transformation, and loading are kept in a particular path with the corresponding names, and we invoke each script through a mainframe job. We prepare the events, and in each event there is a job that invokes a particular script so that the intended work happens. After the extraction job completes we have to confirm that it did its work, because the target of the extraction becomes the source of the transformation job. Similarly, for the loading job the target of the transformation job becomes its source. Hence, after the completion of each job we have to make sure that the data has actually been loaded.
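The chaining idea can be illustrated with a small sketch. This is not mainframe JCL or ESP syntax; it is just a Python outline of the same pattern, with hypothetical script names and paths, where each job's output is checked before the next job (which uses it as input) is started.

```python
# Run the extract, transform, and load scripts in sequence and verify
# after each job that its output exists and is non-empty, since the
# target of one job is the source of the next.
# Script names and paths are hypothetical placeholders.
import os
import subprocess
import sys

JOBS = [
    ("extract.sh",   "/data/stage/extracted.dat"),
    ("transform.sh", "/data/stage/transformed.dat"),
    ("load.sh",      "/data/stage/load_complete.flag"),
]

for script, expected_output in JOBS:
    result = subprocess.run(["/opt/etl/scripts/" + script])
    if result.returncode != 0:
        sys.exit(f"{script} failed with return code {result.returncode}")
    # Confirm the job's target was actually produced before moving on.
    if not os.path.exists(expected_output) or os.path.getsize(expected_output) == 0:
        sys.exit(f"{script} finished but {expected_output} is missing or empty")

print("All ETL jobs completed and verified")
```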
Steps Included in an ETL Process
The ETL process involves the steps mentioned below:
1) Prepare the source file (for extraction we should make sure that the source has data, or that the source file is present in the expected path).
2) Complete all the data setup (the reference tables should contain suitable data).
3) Run the extract job.
4) Validate that the extraction was successful.
5) Perform the needed transformations after the necessary lookups on the reference tables.
6) Load the data into the necessary staging or load table.
7) Publish reports if needed.
8) Perform audit reconciliation, i.e. check whether the input record count matches the output record count (a simple count check is sketched after this list).
9) After a successful run of the interface, archive the intermediate files.
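As a sketch of step 8, the reconciliation can be as simple as comparing record counts on both ends. The database file, table, and column names below are hypothetical.

```python
# Simple record-count reconciliation: the number of records extracted
# from the source should match the number loaded into the target.
import sqlite3

def record_count(db_path, table):
    con = sqlite3.connect(db_path)
    (count,) = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    con.close()
    return count

source_count = record_count("source.db", "sales_source")
target_count = record_count("warehouse.db", "sales_target")

if source_count == target_count:
    print(f"Audit OK: {source_count} records in, {target_count} records out")
else:
    print(f"Audit mismatch: {source_count} in vs {target_count} out")
```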
Scope of Validation
1) Whether the data file loads the source tables from the source system (the source tables should contain valid data).
2) The ETL job should be present, the extraction from the source tables should happen correctly, and the needed data should be loaded into the staging tables (Transform).
3) Validation in the staging tables to check whether all the data mapping/transformation has happened and whether the rules are followed.
4) Validation in the target tables to ensure the data is present in the required format and there is no data loss from the source to the target tables (a sketch of such checks follows this list).
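The staging and target validations in points 3 and 4 can be scripted along these lines. The mapping rule (country codes expanded to country names) and the table and column names are hypothetical, chosen only to show the shape of the checks.

```python
# Validation sketch: check a transformation rule in the staging table and
# check for data loss between source and target (hypothetical schema).
import sqlite3

COUNTRY_MAP = {"IN": "India", "US": "United States", "UK": "United Kingdom"}

con = sqlite3.connect("warehouse.db")

# Rule check: every staging row must follow the country mapping rule.
bad_rows = [
    row for row in con.execute("SELECT country_code, country_name FROM customer_staging")
    if COUNTRY_MAP.get(row[0]) != row[1]
]
print(f"{len(bad_rows)} staging rows violate the country mapping rule")

# Data-loss check: every source key should still be present in the target.
source_ids = {r[0] for r in con.execute("SELECT customer_id FROM customer_source")}
target_ids = {r[0] for r in con.execute("SELECT customer_id FROM customer_target")}
print(f"{len(source_ids - target_ids)} source records missing from the target")
con.close()
```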
Challenges
1) Unavailability of data in the source. This varies from interface to interface: in some cases inadequate data is the problem, in others junk data in the tables. Validation can also be a challenge, since for certain scenarios we may need to do the data setup ourselves because the required data is not available in the database.
2) Job failure is another major challenge. There may be a problem in the script or in the ESP event, which we need to debug and rerun for a successful run of the interface.
3) Access to the database is another major issue that testers may face.
4) Mapping the columns correctly from the source to the target as expected.
5) A large volume of data can also be a challenge to overcome (a chunked-processing sketch follows this list).
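One common way to cope with the volume problem in challenge 5 is to process the source in fixed-size chunks rather than reading everything into memory at once. The file name and chunk size below are illustrative.

```python
# Read a large source file in chunks so memory use stays bounded.
import csv
import itertools

def chunked_rows(path, chunk_size=10000):
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(itertools.islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk

total = 0
for chunk in chunked_rows("big_source.csv"):
    total += len(chunk)   # a real test would validate each chunk here
print(f"Processed {total} records in chunks")
```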
Nowadays there are different tools with which we can make ETL testing more effective and simpler. Some of the available tools are:
Commercial ETL Tools:
· IBM InfoSphere DataStage
· Informatica PowerCenter
· Oracle Warehouse Builder (OWB)
· Oracle Data Integrator (ODI)
· SAS ETL Studio
· Business Objects Data Integrator (BODI)
· Microsoft SQL Server Integration Services (SSIS)
· Ab Initio
Freeware, open source ETL tools:
· Pentaho Data Integration (Kettle)
· Talend Integrator Suite
· CloverETL
· Jasper ETL