Extract, transform and load (ETL) is a method in database convention and
particularly in data
warehousing. It encompasses:
- Mining
data from external sources
- Altering it
to fit operational requirements (which can comprise quality levels)
- Loading it into the final
target (database or data warehouse)
The 3-stage ETL process can be depicted as follows:
Loading: Converting data into a destination BW
|
Transforming:
Data to fit the demanded standards
|
Extracting:
Data brought up from outside sources
|
The
phase by phase process of validating the ETL
processes developed in a typical Data Warehousing model. Is ETL Testing . It is
basically to test the inbound transformed data from diverse sources when it
gets loaded to the target.
ETL testing comprises authorizing the data and the
functionality of the application as per the defined requirements. It is not
just about testing and validating the data but also about checking the
integrity of data. A typical ETL testing work model would involve testing the
data against the requirements, testing the data completeness, testing the data
correctness and testing the integrity of the transformed data when loaded in
the target.
It is
essential to validate that data is altered appropriately conferring to various
business requirements and guidelines. Entire predictable data should be laden
into the data warehouse deprived of any data loss and being truncated. It is
necessary to make it definite that this application suitably discards swaps
with default values and reports data that is not valid.
ETL also
go through the parallel stage as any other independent testing technique.
1.
Understanding
of business rules and the requirements
2.
Authenticating
them
3.
Making
the test estimate
4.
The
inputs from test estimation and basic understanding of business requirement
become the source for test planning.
5.
After
that, the next step is to
formulate the test cases and test scenarios from all the accessible feedbacks.
6.
After test cases being completed and accepted
officially, next testing team progress to implement pre-execution check and preparation of test data for testing
- In the end, implementation
is done till exit standards are accomplished
- After the completion has
been successful, a summary report is prepared and closure procedure is
completed.
Outlining test strategy is essential which should be jointly accepted by
stakeholders beforehand actual testing. A definite test strategy will make
certain that right tactic has been followed meeting the testing aspiration.
ETL software can be depicted as:
Product database
|
Sales database
|
Customer database
|
ETL
|
ETL
|
ETL
|
Data Warehouse staging area
|
Data Warehouse
|
ETL
|
ETL
|
Data Marts
3. Goals of
testing an ETL Application
Data Correctness
|
Data Completeness
|
Environment
Integration
|
Performance
|
Data Validation
|
The general goals of testing a typical ETL
application can be broken down as follows:
v Data
Completeness
v Data
Quality
v Data
Transformation
v Integration
Testing
v Performance
& Scalability
v User-acceptance
Testing
v Regression
Testing
It is one of the most elementary and foremost steps
to be achieved when testing an ETL application and it confirms that all the
desired data is wholly loaded into the target. This would involve testing the
data completeness with respect to the no. of rows loaded, no. of columns loaded
for each record, the contents loaded into each field.
3.2
Data Quality
Data Quality is
principally to check how the data denial, rectification, modification,
substitution is handled. There are various forms in which the data quality can
be upheld and checked.
Data
transformation involves testing and validating the business rules have been
applied correctly or not. The distinctive process to validate and test the
transformation of data is to select some sample records from the source and
then verify/validate them against the target data keeping in mind the business
logic. This involves a comprehensive manual testing.
3.4
Integration Testing
A
standard system testing would involve testing within the ETL application only.
It
shows how the application functions into the entire system flow along with all
the other upstream and downstream applications. While building the integration
test scenarios it necessities to reflect as to in what manner the overall
process can be broken and also focus on the points/gaps between the different
applications rather than one application. There arises a need for checking how
process failures at every stage would be handled and how data would be deleted
or recovered if essential.
The
primary goal of performance testing is to recognize the weaknesses in the ETL
architecture Performance is one essential aspect that necessarily be considered
with respect to a typical Data Warehousing application as the data in a Data
Warehouse becomes voluminous; ETL load times would keep increasing and then the
performance decreases. A decent ETL architecture and design diminishes the
consequence on performance.
3.6
Regression Testing
Regression
testing is reassuring and checking the prevailing functionality of the
application that whether introduction of new code has not disturbed the
existing functionality. While scheming the regression test scenarios, it needs
to be kept in observance that these cases would be implemented multiple times
as new releases are shaped due to defect fixes, enhancements or upstream system
changes. In order to perform the regression testing the approach which is
considered as faster, smooth and best is test automation.
The
chief objective of building a Data Warehouse application is to make the data
accessible to the business users. The user-acceptance testing does not test
that how the ETL application would function. It characteristically test the
data in the Data Warehouse application.
The general points that an ETL Tester keeps in mind
can be classified as follows:
v Do the
mappings stand by the development criteria and naming conventions?
v Do the
mappings perform as per the technical design defined?
v Do the
mappings work appropriately in relation to other processes?
v Test
Planning as per the Requirements and Technical Design.
v Test
ETL software
v Test
ETL data warehouse components
v Executing backend data-driven test.
v Formulating
the Test Strategy.
v Test
Case/Scenario preparation.
v Test
Data Management.
v Test
Execution.
v Recording
the Test Results.
v Categorize, troubleshoot and deliver solutions for potential
matters.
v Approve requirements and design stipulations.
v Defect
Maintenance.
v Performing
Test Audit and formulating Test Audit Report.
0 comments :
Post a Comment