Saturday 16 August 2014

ETL Testing



Extract, transform and load (ETL) is a method in database convention and particularly in data warehousing. It encompasses:
  • Mining data from external sources
  • Altering it to fit operational requirements (which can comprise quality levels)
  • Loading it into the final target (database or data warehouse)
The 3-stage ETL process can be depicted as follows:
Loading:  Converting data into a destination BW
Transforming: Data to fit the demanded standards
Extracting: Data brought up from outside sources
 





The phase by phase process of validating the ETL processes developed in a typical Data Warehousing model. Is ETL Testing . It is basically to test the inbound transformed data from diverse sources when it gets loaded to the target.
ETL testing comprises authorizing the data and the functionality of the application as per the defined requirements. It is not just about testing and validating the data but also about checking the integrity of data. A typical ETL testing work model would involve testing the data against the requirements, testing the data completeness, testing the data correctness and testing the integrity of the transformed data when loaded in the target.
It is essential to validate that data is altered appropriately conferring to various business requirements and guidelines. Entire predictable data should be laden into the data warehouse deprived of any data loss and being truncated. It is necessary to make it definite that this application suitably discards swaps with default values and reports data that is not valid.
ETL also go through the parallel stage as any other independent testing technique.
1.    Understanding of business rules and the requirements
2.    Authenticating them
3.    Making the test estimate
4.    The inputs from test estimation and basic understanding of business requirement become the source for test planning.
5.    After that, the next step is to formulate the test cases and test scenarios from all the accessible feedbacks.
6.    After  test cases being completed and accepted officially, next testing team progress to implement pre-execution check and preparation of test data for testing
  1. In the end, implementation is done till exit standards are accomplished
  2. After the completion has been successful, a summary report is prepared and closure procedure is completed.
Outlining test strategy is essential which should be jointly accepted by stakeholders beforehand actual testing. A definite test strategy will make certain that right tactic has been followed meeting the testing aspiration.
ETL software can be depicted as:
Product database
Sales database
Customer database
Operational Applications

ETL
ETL
ETL
 


                           

Data Warehouse staging area
Data Warehouse
ETL
ETL
 














Data Marts




3.  Goals of testing an ETL Application

Data Correctness
Data Completeness
Environment Integration
Performance
Data Validation
 













The general goals of testing a typical ETL application can be broken down as follows:

v  Data Completeness
v  Data Quality
v  Data Transformation
v  Integration Testing
v  Performance & Scalability
v  User-acceptance Testing
v  Regression Testing



It is one of the most elementary and foremost steps to be achieved when testing an ETL application and it confirms that all the desired data is wholly loaded into the target. This would involve testing the data completeness with respect to the no. of rows loaded, no. of columns loaded for each record, the contents loaded into each field. 


3.2         Data Quality

Data Quality is principally to check how the data denial, rectification, modification, substitution is handled. There are various forms in which the data quality can be upheld and checked.


Data transformation involves testing and validating the business rules have been applied correctly or not. The distinctive process to validate and test the transformation of data is to select some sample records from the source and then verify/validate them against the target data keeping in mind the business logic. This involves a comprehensive manual testing.

3.4         Integration Testing

A standard system testing would involve testing within the ETL application only.
It shows how the application functions into the entire system flow along with all the other upstream and downstream applications. While building the integration test scenarios it necessities to reflect as to in what manner the overall process can be broken and also focus on the points/gaps between the different applications rather than one application. There arises a need for checking how process failures at every stage would be handled and how data would be deleted or recovered if essential.


The primary goal of performance testing is to recognize the weaknesses in the ETL architecture Performance is one essential aspect that necessarily be considered with respect to a typical Data Warehousing application as the data in a Data Warehouse becomes voluminous; ETL load times would keep increasing and then the performance decreases. A decent ETL architecture and design diminishes the consequence on performance.

3.6         Regression Testing

Regression testing is reassuring and checking the prevailing functionality of the application that whether introduction of new code has not disturbed the existing functionality. While scheming the regression test scenarios, it needs to be kept in observance that these cases would be implemented multiple times as new releases are shaped due to defect fixes, enhancements or upstream system changes. In order to perform the regression testing the approach which is considered as faster, smooth and best is test automation.



The chief objective of building a Data Warehouse application is to make the data accessible to the business users. The user-acceptance testing does not test that how the ETL application would function. It characteristically test the data in the Data Warehouse application.




The general points that an ETL Tester keeps in mind can be classified as follows:

v  Do the mappings stand by the development criteria and naming conventions?
v  Do the mappings perform as per the technical design defined?
v  Do the mappings work appropriately in relation to other processes?


v  Test Planning as per the Requirements and Technical Design.
v  Test ETL software
v  Test ETL data warehouse components
v  Executing backend data-driven test.
v  Formulating the Test Strategy.
v  Test Case/Scenario preparation.
v  Test Data Management.
v  Test Execution.
v  Recording the Test Results.
v  Categorize, troubleshoot and deliver solutions for potential matters.
v  Approve requirements and design stipulations.
v  Defect Maintenance.
v  Performing Test Audit and formulating Test Audit Report.


0 comments :

Post a Comment