Saturday 16 August 2014

Data Warehouse Testing -ETL Testing



1.0 Introduction:

Data Warehouse is a central repository of data that receives data from one or more source systems. It is used to store both historical and current data and is mainly used for reporting and data analysis such as annual and quarterly Comparisons. To build a data warehouse, ETL tool is used which works as an integrator. The name ETL represents Extraction, Transformation and Loading.

ETL tool extracts the data from different source systems, transforms the data in the desired format based on the business transformation rules and load the data in the database which is named as Data Warehouse
1.1 Difference between Operational database and Data warehouse

Operational Database:  It is primarily designed for supporting day to day opera­tions.

Data Warehouse: It is primarily designed for supporting strategic decision making
1.2 Characteristics of Data warehouse
Subject Oriented: Data in the Data Warehouse is organized in such a way where all the data elements are linked together which is related to the same real-world object.It is mainly used to analyze a particular subject area.
Integrated:Data Warehouse integrates data from different source systems.           For example consider 3 source systems named X, Y and Z and these 3 systems have different ways of identifying a product. But coming to Data warehouse there will be not be many, only one way of identifying a product.
Time-Variant:Data warehouse contains not only the current data but also the historical data. For example, we can retrieve the data based on time-variance, like 6 months or 1 year or data even older than 1 year from a data warehouse.
Non-volatile:The data will not be changed once it is loaded into the Data warehouse, never be overwritten or deleted.The historical data in a Data warehouse is static, so it never be alerted or modified but retained the data for future reporting.
2.0 Data Warehouse Testing

Data warehouse testing is a process used to verify the data is extracted, transformed and loaded correctly in the Data warehouse and also ensures that there will be not be any data loss .The data remains complete, it should be retrievable and is made available for reporting and analysis.








2.1 Comparison between Data Warehouse Testing and Database Testing


         Database Testing

        Data Warehouse Testing
Normally DB Testing is performed using smaller scale of data
Example:  OLTP (Online transaction processing) databases
DW testing is performed with huge volume of data
Example: OLAP (online analytical processing) databases.

Data will be received from uniform sources which are consistent
Data will be received from multiple sources which are inconsistent
Create, Read, Update and Delete operations are performed in database testing
Only Select operation is performed in data warehouse testing


2.2 Data Warehouse Architecture with a Staging Area and Data Marts

Data Sources                                                         DataMarts
Flowchart: Magnetic Disk:   Data Warehouse
 




Flowchart: Magnetic Disk:  Staging   
   Area
User
Flowchart: Magnetic Disk: DM22@Flowchart: Magnetic Disk: DS2                                                            Reports
Reporting
 
                  ETL                              ETL 
                  Process                        Process     
Flowchart: Magnetic Disk: DM3M                                                           
Flowchart: Magnetic Disk: DS3
 






   From the above Architecture diagram

1.    Data Sources may be operational systems or Flat Files
2.    Staging area is the one where data gets processed before the data gets loaded in the data warehouse
3.    Data warehouse contains raw data, metadata, and summary data
4.    Data marts are the access layers of the Data warehouse, and are usually oriented to a specific business line or team such as Sales, purchasing etc.
5.    The data from the data marts is used for generating the reports and for analysis 





2.3 Issues encountered while building a Data warehouse

The different types of issues that we normally encounter when we are building a Data warehouse are

  • Data may be invalid
  • Data may be inconsistent
  • Data may be inaccurate

In order to overcome this we need to go through the following processeswhich is used to present a clear and consistent view of data to the users

  • Data mapping : Once the data gets loaded into the data warehouse field to field mapping and record count should be verified to ensure the data is loaded correctly in to the database based on the ETL logic

§  Data Cleansing: The process of removing incorrect and incomplete data, improperly formatted and duplicate data in the database is defined as Data Cleansing or Data Scrubbing. Data Scrubbing tool is used for examining data flaws by using certain rules, algorithms and look-up tables
2.4 Data WarehouseValidations (ETL Testing)
  • Verify the logic for Data Extraction
  • Verify that data is transformed correctly as per logic or business transformation rules. Transformation Logic should be validated
  • Verify all the database fields and field data is loaded correctly in the warehouse without any truncation
  • Verify the data and record counts are matching in the source and target tables.
  • Verify the data is loaded within expected time frames in the Data warehouse. This ensures improved performance and scalability.
  • Verify proper error logs are generated for the rejected data that should consists of all necessary details
  • Verify Data integrity and also ensure duplicate data is not loaded in the data warehouse
  • Verify the fields that contain NULL values

2.5 Report Testing
For any DataWarehouse end result will be the Reports and the main purpose for which Data warehouse is build.
2.6 Issues encountered during Reporting
The different types of issues that we normally encounter when we are accessing                the data that is used for reporting and analysisare

  •  Data can’t be accessible
  •  Data may be blocked by security
  •  Data maybe in wrong format

To overcome this we need to go though a process like Summarizing the data, Performing calculations and  Reorganizing the data before the data is made available for ‘ Reporting’ .

2.7 Report Validations

  • Validating the layout format, prompts , filter attributes that appears on the report
  • Validating the sorting and export functions of the reports in the web environment
  • Validating of data in the reports containing derived metrics such as aggregates and subtotals


3.0 Conclusion

Data warehouse testing is challenging task as data is very huge. It requires proper planning and End to End testing.Due to evolving needs of business,and due to changes in the data sources, there will be continuous changesthat occur in the data warehouse schema and the data being loaded. Hence, it is very much important to clearly define the development and End to End testing processes, and that should be followed by strong alignment and Impact Analysis between the different teams like development, operations and finally business.





1 comments :

Thanks for sharing valuable information and its very useful. Keep posting.

etl testing course in hyderabad
etl testing certification

Post a Comment