Saturday 16 August 2014

Data Warehouse Testing, ETL, and MicroStrategy Reports


Introduction to Data warehouse

 

  • A data warehouse is a relational database designed for query and analysis rather than for insert/update transaction processing.
  • It contains synthesized data derived from operational data pulled from the organization's multiple transactional data sources. It supports strategic decision making and enables an organization to consolidate data from several relational and non-relational transactional sources.
  • A typical data warehouse environment contains a relational database to hold the data; an extraction, transformation, and loading (ETL) engine to synthesize the data; an online analytical processing (OLAP) engine; end-user analysis and reporting tools; and other custom-built applications that manage the process of gathering data and delivering it to business users.
  • The data modeling technique used in a warehouse follows a de-normalized (dimensional) approach rather than the normalized approach used in building traditional OLTP systems (a minimal star-schema sketch follows below).
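To make the dimensional approach concrete, here is a minimal star-schema sketch using Python's built-in sqlite3 module; every table and column name is invented for illustration.

```python
import sqlite3

# A minimal star schema: one fact table surrounded by dimension tables.
# All names here are hypothetical, not from any particular warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT,
        city          TEXT              -- de-normalized: stored inline
    );
    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  TEXT,
        category      TEXT              -- de-normalized: stored inline
    );
    CREATE TABLE fact_sales (
        customer_key  INTEGER REFERENCES dim_customer(customer_key),
        product_key   INTEGER REFERENCES dim_product(product_key),
        sale_date     TEXT,
        quantity      INTEGER,          -- measures (facts)
        amount        REAL
    );
""")
```

Each fact row carries measures plus foreign keys to the dimensions; descriptive attributes such as city and category stay on the dimension rows instead of being split into further lookup tables, trading redundancy for simpler, faster analytical queries.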

1.1 Typical Differences between DWH and OLTP


DWH
  • Data warehouse database
  • Designed for analysis of business metrics by subjects and attributes
  • Optimized for bulk loads and complex, unpredictable queries that access many rows per table
  • Loaded with pre-verified, valid data; usually not validated in real time
  • Supports few concurrent users relative to OLTP

OLTP
  • OLTP database
  • Designed for real-time business operations
  • Optimized for a common set of transactions, usually inserting, updating, or retrieving a few rows at a time per table
  • Designed to validate incoming data during transactions; uses validation data tables
  • Supports thousands of concurrent users

 

1.2 Data Warehouse Architecture (Basic)

In the basic architecture, end users access data loaded from the source systems directly through the warehouse.

1.3 Data Warehouse Architecture (with a Staging Area)

A staging area is added so that operational data can be cleaned and processed before it is loaded into the warehouse.

1.4 Data Warehouse Architecture (with a Staging Area and Data Marts)

Data marts add subject-specific layers (e.g., sales or inventory) on top of the warehouse so that particular lines of business can be served with customized views.

2. Different Types of Tables in a Data Warehouse


2.1 Dimension table


Dimension tables provide the descriptive perspectives of a data warehouse, e.g. Customer, Product, Store, Discount. Using dimension tables, one can describe the facts in a more meaningful way.

2.2 Fact Table


This table captures the data that measures the organization's business operations. A fact table typically has two types of columns: those that contain facts (measures) and those that are foreign keys to dimension tables. Fact tables usually contain large numbers of rows.

2.3 Aggregation Table


Aggregate tables are also called summary facts or summary tables. Usually these tables are aggregated from fact tables. For example, daily sales fact data can be aggregated for a week and stored in a weekly aggregate table; similarly, one can have monthly, quarterly, and yearly summary tables (a validation sketch follows).
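A typical check on such a table is to re-derive the summary from the base facts and compare. The sketch below does this with Python and sqlite3 over toy data; the table names (fact_sales_daily, agg_sales_weekly) are invented.

```python
import sqlite3

# Re-derive weekly totals from the daily fact table and compare them
# with what is stored in the weekly aggregate table (toy data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales_daily (sale_date TEXT, amount REAL);
    INSERT INTO fact_sales_daily VALUES
        ('2014-08-11', 100.0), ('2014-08-12', 50.0), ('2014-08-18', 75.0);
    CREATE TABLE agg_sales_weekly (week TEXT, total_amount REAL);
    INSERT INTO agg_sales_weekly VALUES ('2014-32', 150.0), ('2014-33', 75.0);
""")

derived = dict(conn.execute("""
    SELECT strftime('%Y-%W', sale_date) AS week, SUM(amount)
    FROM fact_sales_daily GROUP BY week
""").fetchall())
stored = dict(conn.execute("SELECT week, total_amount FROM agg_sales_weekly"))

assert derived == stored, "weekly aggregate out of sync with daily facts"
```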




3. Data Warehouse Testing


3.1 Data Warehouse Life Cycle


 

 

3.2 Importance of Data Warehouse Testing


  • Data warehouses exist to support the analysis of data.
  • A single data warehouse contains many tables and schemas, in which millions of records are transformed as per the business requirements.
  • There is every possibility that a minute change in the value of one field can affect thousands of records.
  • Testing ensures correct data movement across all the tables and schemas.
  • Highly complex business rules and the business/syntactic transformation of values in each field require thorough testing.

 

3.3 Challenges in Data Warehouse Testing


  • Data warehouse testing must deal with large volumes of data.
  • Complex transformation logic requires high skill levels to design tests with optimal coverage.
  • The cost of quality is higher because the impact of defect slippage is high.
  • Well-defined data warehouse test strategies are often lacking.
  • Setting up the test environment and test data is difficult.

3.4 Need for Data Warehouse Testing


Data warehouse testing ensures that all the records in the underlying tables are exercised. Covering every record of a table through UI testing alone is rarely feasible and is time-consuming. Database-level testing also unearths data defects at an earlier stage, making them easier to fix.

4. Testing Strategies for a Data Warehouse Application

A.   Replicate Business (Transformation) Logic



[Figure: Source data is fed in parallel to the DWH application (Process 1, Process 2, ... producing the final data) and to the test setup, a DWH testing tool in which Process 1, Process 2, etc. are replicated at a high level; the two final data sets are then compared.]
Highlights 
           
  • The DWH application is not required for testing.
  • A replica of the source data is fed into the DWH test setup.
  • The target data is compared with that generated by the application.
  • Data transformation at individual points is not checked; only the start and end points are compared (see the sketch below).
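A minimal sketch of this strategy, assuming a made-up business rule (net amount = gross minus discount) standing in for real transformation logic:

```python
# Strategy A in miniature: the test harness re-implements the business
# rule independently and compares only the end points.

def expected_transform(row):
    # Hypothetical rule replicated from the requirements, not taken
    # from any real application.
    return round(row["gross"] - row["discount"], 2)

source_rows = [{"id": 1, "gross": 100.0, "discount": 12.5},
               {"id": 2, "gross": 80.0,  "discount": 0.0}]

# Final data as produced by the DWH application (stubbed here).
application_output = {1: 87.5, 2: 80.0}

for row in source_rows:
    assert application_output[row["id"]] == expected_transform(row)
```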


B.  Accessing the DWH Application



[Figure: Source data flows through the DWH application's Process 1, Process 2, ... to the final data; the DWH testing tool, in which the processes are replicated at a high level, taps in at these points and at the end.]



Highlights 

  • The DWH application is required for testing.
  • The source data need not be replicated.
  • The final data is compared with baselined test data.
  • Transformation at each individual point can be checked by pointing the test setup to different points within the application (see the sketch below).
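A minimal sketch, assuming hypothetical staging tables (stage_process_1, stage_process_2) standing in for the application's intermediate points:

```python
import sqlite3

# Strategy B in miniature: with access to the running application, each
# intermediate stage can be compared against baselined test data.
conn = sqlite3.connect(":memory:")  # stands in for the application's DB
conn.executescript("""
    CREATE TABLE stage_process_1 (id INTEGER, status TEXT);
    CREATE TABLE stage_process_2 (id INTEGER, status TEXT);
    INSERT INTO stage_process_1 VALUES (1, 'EXTRACTED'), (2, 'EXTRACTED');
    INSERT INTO stage_process_2 VALUES (1, 'CLEANSED'),  (2, 'CLEANSED');
""")

baselines = {  # baselined test data per intermediate point
    "stage_process_1": [(1, "EXTRACTED"), (2, "EXTRACTED")],
    "stage_process_2": [(1, "CLEANSED"), (2, "CLEANSED")],
}

for table, expected in baselines.items():
    actual = conn.execute(
        f"SELECT id, status FROM {table} ORDER BY id").fetchall()
    assert actual == expected, f"transformation mismatch at {table}"
```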

5. Tools for DWH Testing


DWH testing cannot be purely manual, simply because of the volume of data involved. The choice of tool for data warehouse testing depends mainly on the environment and on the tool used to implement the data warehouse. Listed below are some of the common tools that can be used for data warehouse testing.

  • MicroStrategy
  • TOAD (Quest Software)
  • Clearware (Infosys internal tool)

5.1 MicroStrategy


MicroStrategy is a business intelligence (BI), enterprise reporting, and OLAP (online analytical processing) software vendor.
The MicroStrategy tool is capable of reporting on and analyzing data stored in
a) a relational database,
b) a multidimensional database, or
c) a flat data file.
MicroStrategy describes its core reporting software as having a "ROLAP" or "Relational OLAP" architecture. The company's most recent software suite is MicroStrategy 9, released in March 2009.
Common products are MicroStrategy Intelligence Server, MicroStrategy OLAP Services, and the MicroStrategy Developer Kit.

5.2 TOAD (Quest Software)


TOAD is the market leader for Oracle database analysis, development, and testing. Toad for Data Analysts helps users easily query, understand, and document data from almost any source.
With a single tool, you can connect directly to data for querying and reporting, compare and synchronize data without hours of manual work, and automate and schedule frequent, repetitive tasks.

5.3 Clearware


Clearware is Infosys's data warehousing testing solution: a web-based test solution exclusively for data warehouse validations. It provides end-to-end test life cycle management and traceability, and supports reusability.
Clearware's Business Rule Builder supports conditional, parameterized, and nested rule building.
Clearware comes integrated with a test management tool and is scalable to integrate with ETL and test data management tools.
It has reporting capabilities (multiple formats) and also depicts traceability.
Clearware supports heterogeneous databases, viz. Teradata, Oracle, IBM DB2, and Microsoft SQL Server. It also supports Excel, flat files, XML, and fixed-length files.

6. ETL [data management and integration]


ETL is an acronym for 'Extract, Transform, Load' and covers the software typically used as a data integration tool for building data warehouses or data-warehouse-like environments. The process consists of:
 1) Extraction of data from one or more sources.
 2) Transformation of the data [e.g., cleansing, reformatting, standardization, aggregation, or the application of any number of business rules] to business needs.
 3) Loading of the resulting data set into specified target systems or file formats (a toy run of all three steps is sketched below).
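As a toy illustration of the three steps, the Python sketch below extracts rows from an in-memory CSV source, applies a couple of cleansing rules, and loads the result into a SQLite target; all file, table, and column names are invented.

```python
import csv, io, sqlite3

# Source data; in practice this would be a file or source system.
source_file = io.StringIO("order_id,customer,country,amount\n"
                          "1,  jane doe ,us,19.99\n"
                          "2,john smith,gb,\n")       # bad row: no amount

# 1) Extract: read raw rows from the source.
raw_rows = list(csv.DictReader(source_file))

# 2) Transform: cleanse and standardize; drop rows failing a basic rule.
clean_rows = [
    (r["order_id"], r["customer"].strip(), r["country"].upper(),
     float(r["amount"]))
    for r in raw_rows if r["amount"]
]

# 3) Load: insert the result set into the target table.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders_clean (order_id, customer, country, amount)")
target.executemany("INSERT INTO orders_clean VALUES (?, ?, ?, ?)", clean_rows)
target.commit()
```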

6.1 Extract

The first part of an ETL process is to extract data from the source systems. Extraction may involve high volumes of data from a large number of heterogeneous sources such as relational, hierarchical, and object databases; files with structured or unstructured data; XML documents; and web services.

6.2 Transform

Transformation ranges from simple format or type conversions to complex integration and logic operations on the extracted data. The transform stage applies a series of rules or functions to the data extracted from the source in order to load it into the end target.
Depending on the system, the source data may or may not require manipulation on its way to the target; the ETL tool applies these transformations automatically, without any human interaction.
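One common way to express this is as a chain of small rules applied in order; the rules below are illustrative stand-ins for real business logic.

```python
# Each rule is a small function from row to row; the chain applies
# them in sequence. All field names and rules are hypothetical.
RULES = [
    lambda r: {**r, "name": r["name"].strip().title()},       # format fix
    lambda r: {**r, "amount": float(r["amount"])},            # type cast
    lambda r: {**r, "currency": r.get("currency") or "USD"},  # default value
]

def transform(row):
    for rule in RULES:
        row = rule(row)
    return row

print(transform({"name": "  jane DOE ", "amount": "42.50", "currency": None}))
# -> {'name': 'Jane Doe', 'amount': 42.5, 'currency': 'USD'}
```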

6.3 Load

In this phase the processed data is loaded into the target application; usually the target system is the data warehouse. The target is not necessarily a database: it may include XML documents, flat files, and desktop productivity applications such as Excel.

ETL sub-processes:


  • Data Extraction: the process used to capture data from the source system and move it to the staging database.
  • Data Verification: the extracted data is compared against the business rules specified in the business requirements document. Data quality rules are checked at this stage, which is the responsibility of the DB administrator or the source system owner. Data that does not meet the data quality rules is handled in different ways (a sketch of one such way, routing to an error table, follows this list).
  • Data Cleansing: the data is made more precise in the cleansing stage, using operations such as find-and-replace, case conversion, merging data, NULL-value handling, data type conversion, splitting files into multiple files, removing duplicates, re-indexing existing data, and streamlining the formatting of information merged from different parent databases.
  • Data Integration: the phase in which data from the different source systems is merged into a unified data interface. It involves creating new tables and columns and using joins to tie together data from different sources.
  • Aggregation: when all the tables are ready to be loaded into the data warehouse, perform summary calculations and store the results so that queries run quickly.
  • Loading: the stage in which the transformed data is loaded into the data warehouse. Factors such as table size and the proportion of updates to inserts influence loading.
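The error-table handling mentioned under Data Verification might look like the following sketch; the error code and field names are invented placeholders.

```python
# Rows failing a quality rule are routed to an error structure with a
# reason code instead of being loaded into the target.
rows = [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "abc"}]

loaded, errors = [], []
for row in rows:
    try:
        row["amount"] = float(row["amount"])  # quality rule: numeric amount
        loaded.append(row)
    except ValueError:
        errors.append({**row, "error_code": "ERR_NUM_FMT",
                       "reason": "amount is not numeric"})

print(len(loaded), "loaded;", len(errors), "routed to error table")
```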

7. ETL TESTING


  • Compare the row count from source to target, i.e., check whether all the data has been loaded into the database (a count-reconciliation sketch follows this list).
  • If the source and target counts do not match, the remaining rows should be loaded into the corresponding error tables with the associated error codes and reasons.
  • Identify which tables and fields are affected when a transformation takes place.
  • Verify data movement that involves no transformation.
  • Verify the transformation of source data to target according to the business rules.
  • Verify that data is available in the lookup table before the transformation is applied.
  • Verify that proper values are passed in parameters wherever required.
  • Verify the session log to understand the number of records loaded, the number of records rejected, error details, and the start and end times for each table.
  • Verify that data is not truncated, i.e., that it is loaded completely.
  • Verify that data arrives in the proper (expected) format.
  • Data should not be transformed if an error occurs.
  • Verify that proper error codes (ideally specified clearly in the business requirements document, or at least provided and approved by the client) are populated wherever an error occurs.
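The first two checks can be automated as a simple count reconciliation: every source row must land either in the target table or in an error table. The sketch below uses sqlite3 over toy data; connection targets and table names are illustrative.

```python
import sqlite3

src = sqlite3.connect(":memory:")          # stands in for the source system
src.execute("CREATE TABLE orders (id INTEGER)")
src.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])

tgt = sqlite3.connect(":memory:")          # stands in for the warehouse
tgt.executescript("""
    CREATE TABLE fact_orders (id INTEGER);
    INSERT INTO fact_orders VALUES (1), (2);
    CREATE TABLE etl_errors (id INTEGER, error_code TEXT);
    INSERT INTO etl_errors VALUES (3, 'ERR_FK_LOOKUP');
""")

source_count = src.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
target_count = tgt.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
error_count  = tgt.execute("SELECT COUNT(*) FROM etl_errors").fetchone()[0]

assert source_count == target_count + error_count, "rows unaccounted for"
```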

 

 



7.1 ETL Tools:


  • Oracle Warehouse Builder
  • Oracle Data Integrator (Sunopsis ETL)
  • WisdomForce FastReader
  • iWay Software DataMigrator
  • SAS DataFlux
  • IBM WebSphere DataStage
  • Ab Initio
  • Informatica
  • Business Objects Data Integrator
  • Sun Microsystems ETL Integrator
  • Microsoft SSIS
  • IBM Cognos DecisionStream

8. Report Testing (Database to Report)


  • The title of each report is displayed correctly.
  • Field labels are displayed correctly.
  • Proper tool tips are displayed wherever applicable.
  • Data is displayed correctly in the columns according to the defined logic (see the sketch below).
  • Data in the columns is displayed in the specified format.
  • The proper data is displayed in the corresponding columns.
  • Reports are in a printable format.
  • There are no spelling mistakes in the reports.
  • The number of records displayed per page is as specified.
  • Page navigation behaves as expected.
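Checking column data against the defined logic usually means re-running that logic directly on the warehouse and comparing with what the report shows. A minimal sketch, with the report's displayed figures stubbed as a dict and all names invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")         # stands in for the warehouse
conn.executescript("""
    CREATE TABLE fact_sales (region TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('EMEA', 700.0), ('EMEA', 500.0), ('APAC', 950.0);
""")

# Re-run the report's defined logic directly against the warehouse.
expected = dict(conn.execute(
    "SELECT region, SUM(amount) FROM fact_sales GROUP BY region"))

# Figures as displayed on the report (captured manually or via an API).
report_values = {"APAC": 950.0, "EMEA": 1200.0}

assert report_values == expected, "report figures disagree with warehouse"
```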



