Saturday 16 August 2014

Basics of ETL Testing


ETL stands for extract, transform, and load. The main purpose of ETL is to collect data that is distributed among different sources. The source data may be scattered across different systems belonging to different departments, and with the help of ETL we consolidate that data and handle it safely.
Many interfaces use this method: they extract data, transform it according to their requirements, and load it into the target tables. For example, in a retail organization the store data may sit with one department and the sales data in a different system; ETL collects and bundles the data into one set and loads it into the target tables, storing it in a database or a data warehouse.
The ETL process
Extract - extracts data from the source
Transform - transforms the data according to the requirement
Load - loads the data into the target

This three-stage process lets us load data from a pre-confirmed source and greatly simplifies populating our target tables.
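As a minimal sketch of the three stages (every file, table, and column name below is invented purely for illustration), the snippet reads rows from a CSV source, applies a simple transformation, and loads the result into a target table:

    import csv
    import sqlite3

    SOURCE_FILE = "sales_source.csv"  # hypothetical source extract

    def extract(path):
        # Extract: read the raw rows from the source file.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # Transform: apply a simple business rule -- normalise the store
        # code and convert the amount from text to a number.
        return [(r["store_code"].upper(), float(r["amount"])) for r in rows]

    def load(rows, conn):
        # Load: insert the transformed rows into the target table.
        conn.execute("CREATE TABLE IF NOT EXISTS sales_target "
                     "(store_code TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales_target VALUES (?, ?)", rows)
        conn.commit()

    conn = sqlite3.connect("warehouse.db")
    load(transform(extract(SOURCE_FILE)), conn)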
This process can be initiated by wrapping each script in a mainframe job. The scripts that do the extraction, transformation, and loading are kept under a particular path with corresponding names, and the mainframe jobs invoke them. We prepare the events, and within each event there is a job that invokes a particular script so that the intended work happens. After the extraction job completes we have to make sure it actually did its work, because the target of the extraction becomes the source of the transformation job; similarly, the target of the transformation job becomes the source of the loading job. Hence after the completion of each job we have to confirm that the data has been loaded.
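A hedged sketch of that per-stage check, with invented script and output paths: each job runs in order, and the chain stops unless the job's output actually contains data, since that output is the next stage's source.

    import os
    import subprocess

    # Invented paths: in practice these are the scripts that the scheduled
    # (e.g. ESP) events invoke, in extract -> transform -> load order.
    STAGES = [
        ("/etl/scripts/extract.sh",   "/etl/out/extracted.dat"),
        ("/etl/scripts/transform.sh", "/etl/out/transformed.dat"),
        ("/etl/scripts/load.sh",      "/etl/out/load_report.dat"),
    ]

    for script, output in STAGES:
        subprocess.run([script], check=True)  # abort the chain if a job fails
        # The target of one stage is the source of the next, so confirm the
        # output file exists and is non-empty before the next stage runs.
        if not os.path.exists(output) or os.path.getsize(output) == 0:
            raise RuntimeError(f"{script} finished but left no data in {output}")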
Steps Included in an ETL Process

The ETL process involves the steps below:
1) Prepare the source file (for extraction, make sure the source has data and the source file is present in the expected path)
2) Complete all the data setup (the reference tables should contain suitable data)
3) Run the extract job
4) Validate that the extraction was successful
5) Perform the needed transformations after the necessary lookups on the reference tables
6) Load the data into the required staging or load table
7) Publish reports if needed
8) Perform audit reconciliation, i.e. check whether the input record count matches the output record count (a sketch of such a check follows this list)
9) After a successful run of the interface, archive the intermediate files
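For step 8, the reconciliation can be as simple as a count comparison. A minimal sketch, assuming the hypothetical source_staging and sales_target tables in a SQLite warehouse:

    import sqlite3

    conn = sqlite3.connect("warehouse.db")  # hypothetical warehouse database

    # Audit reconciliation: the record count fed into the interface should
    # match the record count that reached the target table.
    src_count = conn.execute("SELECT COUNT(*) FROM source_staging").fetchone()[0]
    tgt_count = conn.execute("SELECT COUNT(*) FROM sales_target").fetchone()[0]

    if src_count != tgt_count:
        raise AssertionError(
            f"reconciliation failed: source={src_count}, target={tgt_count}")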

Scope of Validation
1) Whether the data file from the source system loads the source tables correctly (the source tables should contain valid data)
2) Whether the ETL job is present, the extraction from the source tables happens correctly, and the needed data is loaded into the staging tables (transform)
3) Validation in the staging tables to check whether all the data mapping/transformation has happened and the rules have been followed
4) Validation in the target tables to ensure the data is present in the required format and there is no data loss from the source to the target tables (a minimal data-loss check is sketched below)
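A minimal sketch of the point-4 data-loss check, again using the hypothetical tables from the earlier sketches: a set-difference query surfaces any source rows that never reached the target.

    import sqlite3

    conn = sqlite3.connect("warehouse.db")  # hypothetical database and tables

    # Rows present in the source/staging table but missing from the target
    # indicate data loss between the stages.
    missing = conn.execute(
        "SELECT store_code, amount FROM source_staging "
        "EXCEPT "
        "SELECT store_code, amount FROM sales_target").fetchall()

    if missing:
        print(f"{len(missing)} source rows never reached the target, "
              f"for example: {missing[:5]}")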



Challenges
1) Unavailability of data in the source. This may differ from interface to interface: in some cases inadequate data is the problem, in others junk data in the tables. Validation also brings challenges, since for certain scenarios we may need to do the data setup ourselves because the required data won't be available in the database.
2) Job failure is another major challenge; there may be a problem in the script or in the ESP event, which we need to debug and then rerun for a successful run of the interface.
3) Access to the database is another major issue that testers may face.
4) Mapping the columns correctly from the source to the expected columns in the target.
5) A large volume of data can also be a challenge; one way to cope is to compare per-group aggregates instead of full rows (see the sketch below).
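A sketch of that aggregate approach, with the same hypothetical tables as above; only the groups that disagree then need a detailed row-level investigation.

    import sqlite3

    conn = sqlite3.connect("warehouse.db")  # same hypothetical tables as above

    # Compare cheap per-group aggregates instead of diffing every row.
    query = ("SELECT store_code, COUNT(*), ROUND(SUM(amount), 2) "
             "FROM {} GROUP BY store_code")
    src = {row[0]: row[1:] for row in conn.execute(query.format("source_staging"))}
    tgt = {row[0]: row[1:] for row in conn.execute(query.format("sales_target"))}

    for store in sorted(set(src) | set(tgt)):
        if src.get(store) != tgt.get(store):
            print(f"store {store}: source={src.get(store)}, target={tgt.get(store)}")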

Nowadays there are different tools that make ETL testing simpler and more effective. Some of the tools available are:
Commercial ETL tools:
· IBM InfoSphere DataStage
· Informatica PowerCenter
· Oracle Warehouse Builder (OWB)
· Oracle Data Integrator (ODI)
· SAS ETL Studio
· Business Objects Data Integrator (BODI)
· Microsoft SQL Server Integration Services (SSIS)
· Ab Initio
Freeware, open source ETL tools:
· Pentaho Data Integration (Kettle)
· Talend Integrator Suite
· CloverETL
· Jasper ETL





