1.
Introduction:
SAS Enterprise Guide is used for
publishing result by analyzing and manipulating the data. It is a point and click tool which enable
business user to perform task with the help of menu and wizard. It also has
programming capability which generates code to speed up the productivity for
analyses and forecasting the in real time data. It has other feature as
centralized and role-based security. It allows easy access to enterprise data
sources and any external file.
Based on your need and requirement,
features of SAS EG can be explored. We have explored the features of the tool
to perform ETL Testing.
a.
SAS can validate data from various
heterogeneous sources such as Oracle, MS SQL, DB2, Teradata and Flat File.
b.
SAS provides record level comparison
report.
c.
Delta load can be validated as we
store previous snapshot of the target database in Dataset.
d.
SAS has in-build Application
automation framework (AOF) which can be used to automate and schedule
execution.
e.
SAS scripts are portable. Base SAS
script written can be re-used across different environment.
Extract, Transform and Load (ETL)
involves below steps:
1) Extract data from outside sources
2) Transform extracted data to fit
business/operational needs
3) Load transformed data to the end
target (Eg. An operational data store or a data mart or a extract or data
warehouse)
ETL Testing aims at validating the ETL process. So in ETL Testing involves:
1)
Extracting source data from outside sources
2)
Transforming it to fit operational needs (which can include quality levels)
3)
Importing the Target data after success ETL.
4)
Comparison between Transformed source data and Target Data to find any
discrepancies.
SAS EG is basically used as:
1)
Analytic and Reporting tool.
2)
ETL tool and
3)
Forecasting tool.
We are using SAS tool for testing ETL process.
In Enterprise Guide we have a
mechanism to organize the work and can also group the sequence of tasks, data
items and results into Projects. All the code and tasks that are executed
within an Enterprise Guide session are available in the Active Project (the project
in which the code is being executed) and each of them are displayed as separate
nodes within Process Flow view and Project Tree view.
Useful Terms:
The files which are
understandable by SAS are called as SAS files. All SAS file will reside in a
SAS data library. There are three types of SAS files – SAS dataset, SAS
catalogue and SAS stored program.
i. SAS Dataset:
A SAS Dataset contains the
data in a structured format that SAS can process.
ii. SAS Stored Program:
SAS Stored program contains
the compiled code that a user creates and saves for repeated use.
SAS files can be External
File (ASCII or text file and files generated by other application apart from
SAS) or Internal File (Created by SAS program). All internal SAS file is stored
in a SAS library, which is a collection of SAS files. Based on operating
environments, a library can be physical or logical collection of files.
By default, SAS defines several libraries:
i.
SAS helpis a permanent library that contains sample data and other files that
control how SAS works at your site, this is a read-only library
ii.
SAS useris a permanent library that contains SAS files in the Profile catalog that
store user’s personal settings. This is also a convenient place to store your own
files. Permanent SAS libraries are available to you during subsequent SAS
sessions as well. To store files permanently in a SAS data library, you specify
a library name other than the default library name Work.
iii.
Work is a temporary library for files that do not need to be saved from session
to session. Temporary SAS libraries last only for the current SAS session. If
you don't specify a library name when you create a file (or if you specify the
library name Work), the file is stored in the temporary SAS data library.
i.
Data Step:
SAS Dataset can be created using DATA step.
Dataset is a collection of data and data dictionary. Data dictionary defines
the variable and their properties. SAS procedures can read data which is in the
form of a SAS dataset.
ii.
PROC Step:
PROC is abbreviation of PROCEDURE. A
PROC step calls an in-build SAS procedure for analyzing or processing a SAS
dataset.
Features:
·
Predefined routines to
analyze and process the data in a SAS data set.
·
Present data in the form of
report
o
create a report that lists
the data
o
produce descriptive
statistics
o
create a summary report
o
produce plots and charts
·
Create a new data set for
storing the result of procedure.
·
OPERATIONS: list, sort,
summarize data
For example: PROC SQL; PROC PRINT; PROC MEANS;
PROC CONTENT;
External Files are those
files whose structure of data is not recognized by SAS. Raw data, SAS
program statements or procedure output can be the present
in External Files.
The below are few useful
System options provided by SAS. These system options can be used in DATA Step.
PAGESIZE option specifies
the maximum number of lines to print on each page of the output.
a. SKIP:
SKIP option skips a
specified number of lines in the SAS log.
b. CENTER:
CENTER option verifies
whether the output is centered. By default titles and procedure output are
centered by SAS.
c. LABEL:
This system option is used
to assign descriptive portion to the SAS procedures.
d. TITLE:
This system option prints
titles at the top of each output page. By default, SAS prints the titles: “The
SAS system”. TITLE system option can be used to replace the default title or
specify other descriptive titles for SAS programs. NULL TITLE system option can
be used to suppress a TITLE statement.
a. PROC PRINT:
This Proc prints out a data set .The title statement is optional.
b. PROC IMPORT:
PROC IMPORT is a very useful tool for converting external files (text,
excel, xml etc.) to SAS data sets.
c. PROC SQL:
It is the SAS implementation of the Standard Query Language (SQL) standard
to work with databases and is used in Teradata, Oracle, Microsoft SQL Server,
Microsoft Access and other database systems. PROC SQL is a powerful language
that allows sophisticated merges. When you merge datasets with PROC SQL you do
not need to sort them. QUIT statement is used to end PROC SQL. The SQL query is between PROC SQL and QUIT
(as a very long statement).
d. PROC
Sort:
This Proc statement is used to sort the observations in the datasets.
e. PROC
Compare:
This Proc statement is used to compare two datasets, one of the two
datasets involved in the comparison will be a Source dataset and the other
dataset will be the target dataset.
f. PROC PLOT:
This proc plots two variables. The plot command species the variables to be
plotted.
g. PROC UNIVARIATE and PROC MEANS:
Both these compute descriptive statistics (means, variances, other
quantities). UNIVARIATE computes more descriptive statistics. MEANS gives more
compact output and is easier to use to produce a new data set. BY command is
used for descriptive statistics.
i.
To reference a permanent SAS data set in your SAS programs, you use a
two-level name : libref.filename
ii.
To reference a temporary SAS data set in
your SAS programs, you use a one-level name, the default libref Work is
assumed.
i.
For many procedures and for some DATA step statements, data must be in the
form of a SAS data set to be processed.
ii.
Conceptually, a SAS data set is a file that consists of two parts
§ A descriptor portion: The descriptor portion of a SAS data set contains
information about the data set, which include name of the data set, date and
time of data set creation, the number of observations(rows), the number of variables
§ A
data portion: The data portion contains the actual data arranged in a rectangular
tabular format.
i.
Sometimes a SAS
data set also points to one or more indexes
Each time a step is
executed, SAS generates a log of the processing activities and the results of
the processing. The SAS log collects messages about the processing of SAS
programs and about any errors that occur. You get separate sets of messages for
each step in the program. All SAS programs produce log messages, but some SAS
programs produce only log messages.
You can specify result
formats to create your output as:
a.
SAS
Report
b.
HTML
document
c.
RTF
document
d.
PDF
e.
Text
Output
f.
Below framework can be
followed to validate target using ETL tool:
1.
Importing data from External file as
SAS Dataset: Any external file such as delimited flat file (flat file, fixed length
file, XML file) can be imported into SAS environment by using the “IMPORT”
option form the FILE Menu. Below figure shows the process of importing a file
into SAS EG.

2. Importing data
from relational database asSAS Dataset:
Importing
target data from relational database can be done in two methods.
Method 1(By using “CONNECT TO” SAS
keyword):
Below are the
steps to import data into SAS EG by connecting to a relational database by
using “CONNECT TO” keyword provided by SAS EG.
STEP1: Mention
the database name like Oracle/Teradata/DB2 as Source.
STEP2: Enter
the login Credentials of the database that needs to be connected.
STEP3: Provide
the SERVER name and the DATABASE name of the database, from where the data is
to be fetched.
STEP4: Embed
the above written SQL query within the “PROC SQL” and “Quit” statements.
Method 2(By using “LIBNAME” statement of
SAS EG):
Below are the
steps to import data into SAS EG by connecting to a relational database by
using “LIBNAME” keyword provided by SAS EG.
STEP1: Mention
the database name like Oracle/Teradata/DB2.
STEP2: Enter
the login Credentials of the database that needs to be connected.
STEP3: Provide
the SERVER name and the DATABASE name of the database, from where the data is
to be fetched.
STEP4: Assign
a reference variable to this LIBNAME statement. This reference variable is used
by SAS EG for connecting to a database.
3.
Creating scripts for Applying
Transformation & Mapping rules: Using SAS Language element (Procedures and DATA Steps)
scripts can be created to implement the mapping logics and the transformation
rules provided in the mapping document.
4.
Execution
of Scripts:
Scripts written in above step is executed to generate Transformed dataset from
source dataset. Refer the log to see the execution result. Remove any
syntax/logical error while execution.
5.
Comparing the Transformed Dataset with
target Dataset: Compare the above generated datasets by making using ‘PROC COMPARE’
procedure statement.
6.
Stored
the comparison result in ODS: Comparison result can be stored into ODS for future
processing and analysis.
This execution can be applicable for both Inbound and Outbound process in
data Warehouse:
•
Building
Block:
§ Relative SAS Server
paths
§ Environment variables
§ Schema details of
source and target(if applicable)
Follow the below
process for each extract:
•
Import
source and target data in SAS data sets
§ Create Load sas script to fetch source/target data in SAS (if
source/target is extract file) (E.g. Asset_Load_ExtractName)
OR
Create setup sas script to
fetch source/target data in SAS (If source/target is table) (E.g.
Asset_Setup_ExtractName)
•
Control
and balance
§ Row Count: Write Check Row Count script for checking Control and
balance. (E.g. Asset_CheckRowCount )
•
Straight
Move:
§ Write One To One Move script for Straight Move. (E.g. Asset_OneToOneMove)
•
Transformed
Move:
§ Create Mapping script for each transformation move. As per the test case
document execute script corresponding to each transformation move. (E.g.
Asset_ExtractName_Mapping)
•
Column
Comparison
§ Create Compare Column script for column to column comparison in source
and target data.(E.g. Asset_CompareColumn )
SAS EG is a versatile tool with lot of capabilities. SAS EG is available
to a wide range of users with a variety of skill levels. It includes
easy-to-use interface and programmable capabilities. It has basic methods of
data exploration with descriptive statistics. SAS Enterprise Guide enables you
to manage and manipulate data effectively and efficiently to prepare it for
analysis.
9 comments :
I must confess that it is very promising,if you are interested visit below link
http://www.tekclasses.com/
Nice list of Citations.
http://www.tekclasses.com/
Well,Certainly a treasured Article.if you are interested visit below link
http://www.tekclasses.com/
Your post about technology was very helpful to me. Very clear step-by-step instructions. I appreciate your hard work and thanks for sharing.
SAS Training in Chennai
SAS Course in Chennai
Your post is extremely awesome. This blog is extremely instructive and valuable. A debt of gratitude is in order for sharing this data.
Article Submission sites | Latest Updates | Technology
Much appreciated for clarifying for all intents and purposes. Incredible Post! Get more data.
Education | Article Submission sites | Technology
So glad I stumbled on this blog post! I'm going to post a link to you on my blog because I think this is such a great tutorial JAVA Training in Chennai |
JAVA Course in Chennai |
Best JAVA Training in Chennai
Great website, looks very clean and organized. Keep up the good work!
SAS Course in Chennai |
SAS Training Institutes in Chennai |
SAS Institute in Chennai
Thanks for sharing valuable information and very well explained. keep posting.
etl testing online
etl testing online training
Post a Comment