1.0
Introduction:
Data Warehouse is a
central repository of data that receives data from one or more source systems.
It is used to store both historical and current data and is mainly used for
reporting and data analysis such as annual and quarterly Comparisons. To build
a data warehouse, ETL tool is used which works as an integrator. The name
ETL represents Extraction, Transformation and Loading.
ETL tool extracts the data from different source systems,
transforms the data in the desired format based on the business transformation
rules and load the data in the database which is named as Data Warehouse
1.1
Difference between Operational database and Data warehouse
Operational
Database: It is primarily designed for
supporting day to day operations.
Data Warehouse: It is primarily designed for supporting
strategic decision making
1.2
Characteristics of Data warehouse
Subject Oriented: Data in the Data Warehouse is organized in such a way where all
the data elements are linked together which is related to the same real-world
object.It is mainly used to analyze a particular subject area.
Integrated:Data Warehouse integrates data from different source systems. For example consider 3 source
systems named X, Y and Z and these 3 systems have different ways of identifying
a product. But coming to Data warehouse there will be not be many, only one way
of identifying a product.
Time-Variant:Data warehouse contains not only the current data but also the
historical data. For example, we can retrieve the data based on time-variance,
like 6 months or 1 year or data even older than 1 year from a data warehouse.
Non-volatile:The data will not be changed once it is loaded into the Data
warehouse, never be overwritten or deleted.The historical data in a Data
warehouse is static, so it never be alerted or modified but retained the data
for future reporting.
2.0
Data Warehouse Testing
Data warehouse testing is a process used to verify the data is
extracted, transformed and loaded correctly in the Data warehouse and also
ensures that there will be not be any data loss .The data remains complete, it should
be retrievable and is made available for reporting and analysis.
2.1 Comparison between Data Warehouse
Testing and Database Testing
Database Testing
|
Data Warehouse Testing
|
Normally DB Testing is
performed using smaller scale of data
Example: OLTP (Online transaction processing)
databases
|
DW testing is
performed with huge volume of data
Example: OLAP (online analytical processing) databases.
|
Data will be received
from uniform sources which are consistent
|
Data will be received
from multiple sources which are inconsistent
|
Create, Read, Update and Delete operations
are performed in database testing
|
Only Select operation is performed in data
warehouse testing
|
2.2 Data Warehouse Architecture with a Staging Area and Data Marts
Data Sources
DataMarts
User
Reports
|
ETL ETL
Process Process
From the above Architecture diagram
1. Data Sources may be operational systems or Flat
Files
2. Staging area is the one where data gets
processed before the data gets loaded in the data warehouse
3. Data warehouse contains raw data, metadata, and
summary data
4. Data marts are the access layers of the Data
warehouse, and are usually oriented to a specific business line or team such as
Sales, purchasing etc.
5. The data from the data marts is used for
generating the reports and for analysis
2.3 Issues encountered while building
a Data warehouse
The different types of
issues that we normally encounter when we are building a Data warehouse are
- Data may be invalid
- Data may be inconsistent
- Data may be inaccurate
In order to overcome
this we need to go through the following processeswhich is used to present a
clear and consistent view of data to the users
- Data mapping : Once the data gets loaded into the data
warehouse field to field mapping and record count should be verified to
ensure the data is loaded correctly in to the database based on the ETL
logic
§ Data
Cleansing: The process of removing incorrect and incomplete data, improperly formatted
and duplicate data in the database is defined as Data Cleansing or Data
Scrubbing. Data Scrubbing tool is used for examining data flaws by using
certain rules, algorithms and look-up tables
2.4 Data WarehouseValidations (ETL Testing)
- Verify the logic for Data Extraction
- Verify
that data is transformed correctly as per logic or business transformation
rules. Transformation Logic should be validated
- Verify
all the database fields and field data is loaded correctly in the
warehouse without any truncation
- Verify
the data and record counts are matching in the source and target tables.
- Verify
the data is loaded within expected time frames in the Data warehouse. This
ensures improved performance and scalability.
- Verify proper error
logs are generated for the rejected data that should consists of all
necessary details
- Verify Data
integrity and also ensure duplicate data is not loaded in the data
warehouse
- Verify the
fields that contain NULL values
2.5 Report Testing
For any DataWarehouse end result will be the
Reports and the main purpose for which Data warehouse is build.
2.6
Issues encountered during Reporting
The different types of
issues that we normally encounter when we are accessing the data that is used for
reporting and analysisare
- Data can’t be
accessible
- Data may be
blocked by security
- Data maybe in
wrong format
To overcome this we need
to go though a process like Summarizing the data, Performing calculations
and Reorganizing the data before the
data is made available for ‘ Reporting’ .
2.7 Report Validations
- Validating the layout format, prompts , filter
attributes that appears on the report
- Validating the sorting and export functions of the
reports in the web environment
- Validating of data in the reports containing derived
metrics such as aggregates and subtotals
3.0 Conclusion
Data
warehouse testing is challenging task as data is very huge. It requires proper
planning and End to End testing.Due to evolving needs of business,and due to
changes in the data sources, there will be continuous changesthat occur in the
data warehouse schema and the data being loaded. Hence, it is very much
important to clearly define the development and End to End testing processes,
and that should be followed by strong alignment and Impact Analysis between the
different teams like development, operations and finally business.
1 comments :
Thanks for sharing valuable information and its very useful. Keep posting.
etl testing course in hyderabad
etl testing certification
Post a Comment