Introduction to Data Warehouse
- A data warehouse is a relational database intended for query and analysis rather than for insert/update transaction processing.
- It contains synthesized data derived from operational data, pulled from the organization's multiple transactional data sources. It helps an organization build a strategic decision-making system and enables it to consolidate data from several relational and non-relational transactional sources.
- A typical data warehouse environment contains a relational database to hold the data; an extraction, transformation, and loading (ETL) engine to synthesize the data; an online analytical processing (OLAP) engine; end-user analysis and reporting tools; and other custom-built applications that manage the process of gathering data and delivering it to business users.
- The data modeling technique used in a warehouse follows a more de-normalized approach, unlike the one used in building traditional OLTP systems.
1.1 Typical Differences between DWH and OLTP
DWH
- Data warehouse database
- Designed for analysis of business metrics by subjects and attributes
- Optimized for large loads and complex, unpredictable queries that access many rows per table
- Loaded with pre-verified, valid data; usually not validated in real time
- Supports few concurrent users relative to OLTP
OLTP
- OLTP database
- Designed for real-time business operations
- Optimized for a common set of transactions, usually inserting, modifying, or retrieving a few rows at a time per table
- Designed for validation of incoming data during transactions; uses validation data tables
- Supports thousands of concurrent users
1.2 Data Warehouse Architecture (Basic)
1.3 Data Warehouse Architecture (with a Staging Area)
1.4 Data Warehouse Architecture (with a Staging Area and Data Marts)
2. Different Types of Tables in Data Warehouse
2.1 Dimension Table
Dimension tables hold the descriptive context (the "perspective") of a data warehouse. Ex: Customer, Product, Stores, Discounts. Using dimension tables, one can explain the facts more descriptively.
2.2 Fact Table
This table captures the data that measures the organization's business operations. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. Fact tables usually contain large numbers of rows.
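As an illustration, here is a minimal star-schema sketch in Python with SQLite; all table and column names are hypothetical, chosen only for this example:

```python
import sqlite3

# Hypothetical star schema: two dimension tables plus one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    city          TEXT
);
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category      TEXT
);
-- Fact table: foreign keys into the dimensions plus numeric measures.
CREATE TABLE fact_sales (
    sale_id      INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    sale_date    TEXT,
    quantity     INTEGER,
    amount       REAL
);
""")
```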
2.3 Aggregation Table
Aggregate tables are also called summary fact tables or summary tables. Usually these tables are aggregated from fact tables. For example, daily sales fact data can be aggregated for a week and stored in a weekly aggregate table; similarly, one can have monthly, quarterly, and yearly summary tables.
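A rough sketch of that weekly roll-up, again with hypothetical names, using SQLite's date functions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (product_key INTEGER, sale_date TEXT,
                         quantity INTEGER, amount REAL);
INSERT INTO fact_sales VALUES (1, '2009-03-02', 2, 20.0),
                              (1, '2009-03-03', 1, 10.0);

-- Weekly summary table: one row per product per week of the year.
CREATE TABLE agg_sales_weekly AS
SELECT product_key,
       strftime('%Y-%W', sale_date) AS sale_week,
       SUM(quantity) AS total_quantity,
       SUM(amount)   AS total_amount
FROM fact_sales
GROUP BY product_key, sale_week;
""")
print(conn.execute("SELECT * FROM agg_sales_weekly").fetchall())
```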
3. Data Warehouse Testing
3.1 Data Warehouse Life Cycle
3.2 Importance of Data Warehouse Testing
- Data warehouses are designed to analyze data.
- A single data warehouse contains multiple tables and schemas in which millions of records are transformed as per the business requirements.
- There is every possibility that a minute change in the value of one field can affect thousands of records.
- Testing ensures correct data movement across all the tables and schemas.
- The high complexity of business rules, and the business and syntactical transformation of values in each field, require thorough testing.
3.3 Challenges in Data Warehouse Testing
- Data warehouse testing needs to deal with large volumes of data.
- Complex transformation logic necessitates high skill levels to design tests with optimal coverage.
- Higher cost of quality, as the impact of defect slippage is high.
- Lack of well-defined data warehouse test strategies.
- Test environment and test data setup.
3.4 Need for Data Warehouse Testing
The goal is to ensure that all the records in the underlying tables are tested. Covering every similar record in a table through UI testing may not be feasible and is time consuming. Database testing also helps in finding data defects, and unearths defects at an earlier stage, making them easier to fix.
4. Testing Strategies for a Data Warehouse Application
A. Replicate Business (Transformation) Logic
Application Test Setup
Highlights
- The DWH application is not required for testing.
- A replica of the source data is fed into the DWH test setup.
- The target data is compared with that generated by the application.
- Data transformation at individual points is not checked; only the start and end points are considered. (A sketch of this strategy follows the list.)
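A minimal sketch of this strategy, assuming a hypothetical transformation rule (trim customer names and convert dollar amounts to integer cents): the test harness re-implements the rule independently and compares its output on the source replica with the rows the application produced.

```python
import sqlite3

def expected_transform(rows):
    """The test harness's own copy of the (hypothetical) business rule:
    trim customer names and convert dollar amounts to integer cents."""
    return [(name.strip(), round(amount * 100)) for name, amount in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (customer TEXT, amount REAL);
CREATE TABLE tgt_orders (customer TEXT, amount_cents INTEGER);
INSERT INTO src_orders VALUES (' alice ', 12.5);
-- Stand-in for the row the DWH application actually loaded.
INSERT INTO tgt_orders VALUES ('alice', 1250);
""")

source = conn.execute("SELECT customer, amount FROM src_orders").fetchall()
target = conn.execute("SELECT customer, amount_cents FROM tgt_orders").fetchall()

# Only the start and end points are compared, not intermediate steps.
assert sorted(expected_transform(source)) == sorted(target), "transform mismatch"
print("end-to-end transformation check passed")
```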
B. Accessing the DWH Application
Highlights
- The DWH application is required for testing.
- Source data need not be replicated.
- Final data is compared with baselined test data.
- Transformation at individual points can be checked by pointing the test setup to different points within the application. (A baseline-comparison sketch follows the list.)
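A sketch of the baseline comparison, assuming the baselined test data has been stored as a CSV file next to the test suite (the file name, table, and columns are hypothetical):

```python
import csv
import sqlite3

# Stand-in for a table the DWH application has already loaded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tgt_sales (region TEXT, total REAL);
INSERT INTO tgt_sales VALUES ('EMEA', 100.0), ('APAC', 250.0);
""")

# Baselined test data, previously reviewed and checked in with the tests.
with open("baseline_sales.csv", newline="") as f:
    baseline = {(row["region"], float(row["total"])) for row in csv.DictReader(f)}

actual = set(conn.execute("SELECT region, total FROM tgt_sales").fetchall())
assert actual == baseline, f"rows differ from baseline: {actual ^ baseline}"
```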
5. Tools for DWH Testing
DWH testing cannot be a purely manual effort, simply because of the volume of data involved. Which tool to use for data warehouse testing mainly depends on the environment and on the tool used in implementing the data warehouse. Listed below are some of the common tools that can be used for data warehouse testing.
- MicroStrategy
- TOAD (Quest Software)
- Clearware (Infosys internal tool)
5.1 MicroStrategy
MicroStrategy is a business intelligence (BI), enterprise reporting, and OLAP (online analytical processing) software vendor.
The MicroStrategy tool is capable of reporting and analysis of data stored in:
a) Relational database
b) Multidimensional database
c) Flat data file
MicroStrategy describes its core reporting software as having a "ROLAP" or "Relational OLAP" architecture. The company's most recent software suite is MicroStrategy 9, released in March 2009.
Common products are MicroStrategy Intelligence Server, MicroStrategy OLAP Services, and MicroStrategy Developer Kit.
5.2 TOAD (Quest Software)
TOAD is the market leader for Oracle database analysis, development, and testing. Toad for Data Analysts helps to easily query, understand, and document data from almost any source. With a single tool, you can connect directly to data for querying and reporting, compare and sync data without hours of manual work, and automate and schedule frequent, repetitive tasks.
5.3 Clearware
Clearware is an Infosys data warehousing testing solution. It is a web-based test solution exclusively for data warehouse validations. It provides end-to-end test life cycle management and traceability, and supports reusability.
Clearware's Business Rule Builder supports conditional, parameterized, and nested rule building.
Clearware comes integrated with a test management tool and is scalable to integrate with ETL and test data management tools.
It has reporting capabilities (in multiple formats) and also depicts traceability.
Clearware supports heterogeneous databases, viz. Teradata, Oracle, IBM DB2, and Microsoft SQL Server. It also supports Excel, flat files, XML, and fixed-length files.
6. ETL [Data Management and Integration]
ETL is an acronym that stands for 'Extract, Transform, Load' and covers the software that is typically used as a data integration tool for building data warehouses or data-warehouse-like environments. It covers:
1) Extraction of data from one or more sources.
2) Transformation of the data [e.g. cleansing, reformatting, standardization, aggregation, or the application of any number of business rules] to business needs.
3) Loading of the resulting data set into specified target systems or file formats.
6.1 Extract
The first part of an ETL process is to extract data from the source systems. Extraction can involve high volumes of data from a large number of heterogeneous sources such as relational, hierarchical, and object databases; files with structured or unstructured data; XML documents; and web services.
6.2 Transform
Transformation ranges from simple format or type conversions to complex integration and logic operations on the extracted data. The transform stage applies a series of rules or functions to the data extracted from the source so it can be loaded to the end target.
While transforming data from the source to the target system, some systems will require manipulation of the source data and some will not. The ETL process transforms the data automatically, without any human interaction needed to manipulate the source data.
6.3 Load
In this phase the processed data is loaded into the target application; usually the target system is a data warehouse. The target is not necessarily just a database, but may include XML documents, flat files, and desktop productivity applications like Excel.
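To make the three phases concrete, here is a minimal extract-transform-load sketch under assumed inputs: a hypothetical source CSV with "customer" and "amount" columns is extracted, values are standardized during the transform, and the result is loaded into a warehouse table.

```python
import csv
import sqlite3

def extract(path):
    """Extract: read the raw rows from a (hypothetical) source CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: cleanse and standardize, e.g. trim text, parse amounts."""
    return [(r["customer"].strip().title(), round(float(r["amount"]), 2))
            for r in rows]

def load(conn, rows):
    """Load: write the transformed rows into the warehouse target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                 "(customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect("warehouse.db")
load(warehouse, transform(extract("orders_source.csv")))
```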
ETL sub-processes:
- Data Extraction: the process used to capture data from the source system and move it to the staging database.
- Data Verification: the extracted data is compared with the business rules specified in the business requirements document. The data quality rules are checked in this stage, and this is the responsibility of the DB administrator or the source system owner. Data that does not meet the data quality rules is handled in different ways.
- Data Cleansing: the data is made more precise in the data cleaning or cleansing stage. This stage uses operations such as find-and-replace, case conversion, merging data, NULL-value handling, data type conversion, splitting files into multiple files, removing duplicates, reindexing existing data, and streamlining the formatting of information merged from different parent databases. (A cleansing sketch follows this list.)
- Data Integration: the phase where data from the different source systems is merged into a unified data interface. It involves creating new tables and new columns, and using joins, etc., to tie together data from different sources.
- Aggregation: when all the tables are ready to be loaded into the data warehouse, perform the summary calculations and store this data so that queries run quickly.
- Loading: the stage where the transformed data is loaded into the data warehouse. Factors like the size of tables and the proportion of updates to inserts influence loading.
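A few of the cleansing operations from the list above in isolation, sketched over hypothetical staged records:

```python
# Hypothetical rows pulled from a staging area, before cleansing.
staged = [
    {"name": "  ALICE  ", "phone": None, "city": "ny"},
    {"name": "  ALICE  ", "phone": None, "city": "ny"},   # duplicate
    {"name": "bob", "phone": "555-0100", "city": "LA"},
]

cleansed, seen = [], set()
for row in staged:
    rec = (
        row["name"].strip().title(),   # trim whitespace + convert case
        row["phone"] or "UNKNOWN",     # NULL-value handling
        row["city"].upper(),           # streamline formatting
    )
    if rec not in seen:                # remove duplicates
        seen.add(rec)
        cleansed.append(rec)
print(cleansed)  # [('Alice', 'UNKNOWN', 'NY'), ('Bob', '555-0100', 'LA')]
```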
7. ETL Testing
- Count the number of rows from source to target, i.e., check whether all the data is loaded into the database. (A row-count reconciliation sketch follows this list.)
- If the counts from source to target do not match, the remaining rows should get loaded into the corresponding error tables with the associated error codes and reasons.
- What tables and fields are affected when a transformation takes place?
- Data movement without any transformation is verified.
- Transformation of source data to target data happens according to the business rules.
- Data is available in the lookup table before the transformation is applied.
- Proper values are passed in the parameters wherever required.
- Verify the session log to understand the number of records loaded, the number of records rejected, error details, and the start and end times for the tables.
- Data is not truncated, i.e., is the data loaded completely or not?
- Data arrives in the proper (expected) format.
- Data should not be transformed if an error occurs.
- Proper error codes (mentioned clearly in the business requirements document, or at least provided and approved by the client) are populated wherever an error occurs.
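A sketch of the first two checks, row-count reconciliation between source, target, and error tables (all table names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (id INTEGER);
CREATE TABLE tgt_orders (id INTEGER);
CREATE TABLE err_orders (id INTEGER, error_code TEXT, reason TEXT);
INSERT INTO src_orders VALUES (1), (2), (3);
INSERT INTO tgt_orders VALUES (1), (2);
INSERT INTO err_orders VALUES (3, 'E042', 'lookup value missing');
""")

def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

src, tgt, err = count("src_orders"), count("tgt_orders"), count("err_orders")
# Every source row must land in the target or in the error table with a reason.
assert src == tgt + err, f"unaccounted rows: {src - tgt - err}"
print(f"reconciled: {src} source = {tgt} loaded + {err} rejected")
```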
7.1 ETL Tools:
- Oracle Data Warehouse Builder
- WisdomForce FastReader
- iWay Software Data Migrator
- Business Objects Data Integrator
- Microsoft SSIS and DecisionStream
8. Report Testing (Database to Report)
- The titles of the reports are displayed correctly.
- Field labels are displayed correctly.
- Proper tool tips are displayed wherever applicable.
- Data is displayed correctly in the columns according to the defined logic.
- Data in the columns is displayed in the specified format. (A sketch of this check follows the list.)
- Proper data is displayed in the corresponding columns.
- Reports are in a printable format.
- There are no spelling mistakes in the reports.
- The number of records displayed per page is verified.
- Page navigation works properly, as expected.
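One way to automate the format and row-count checks is to validate a report export against the defining query; a small sketch, assuming the reporting tool can export to CSV (the file, table, and column names are hypothetical):

```python
import csv
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (region TEXT, sale_date TEXT, amount REAL);
INSERT INTO fact_sales VALUES ('EMEA', '2009-03-02', 12.5);
""")

# Hypothetical report export; in practice this comes from the reporting tool.
with open("sales_report.csv", newline="") as f:
    report_rows = list(csv.DictReader(f))

db_rows = conn.execute("SELECT region, sale_date, amount FROM fact_sales").fetchall()

# Check row counts, and that dates follow the specified YYYY-MM-DD format.
assert len(report_rows) == len(db_rows), "report row count differs from database"
assert all(re.fullmatch(r"\d{4}-\d{2}-\d{2}", r["sale_date"]) for r in report_rows)
```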