Sunday, March 30, 2008

BEST PRACTICES FOR INFORMATICA DEVELOPMENT

FOLDERS:

Folder names need to be chosen carefully, and a naming standard needs to be agreed upon before any folder is created. The goal of a folder name is to support development and execution in a logical, efficient manner. The following are different methods for naming folders.

Folders by Subject (Target) Area

The “Subject Area Division” method provides a solid infrastructure for larger data warehouse or data mart developments by organizing work by key business area. This strategy is particularly suitable for large projects populating numerous target tables. Example folder names: SALES, DISTRIBUTION, etc.

Folder Division by Environment

This method is easier to establish and maintain than Folders by Subject Area, but is suitable only for small development teams working with a minimal number of mappings. As each developer completes unit tests in his/her individual work folders, the mappings or objects are consolidated as they are migrated to test or QA. Migration to production is simplified significantly, with the maximum number of required folder copies limited to the number of environments. Eventually, however, the number of mappings in a single folder may become too large to maintain easily. Example folder names: DEV1, DEV2, DEV3, TEST, QA, etc.

Folder Division by Source Area

The “Source Area Division” method is attractive to some development teams, particularly if development is centralized around the source systems. In these situations, the promotion and deployment process can be quite complex depending on the load strategy. Example folder names: ERP, BILLING, etc.

Migration Considerations

If the migration approach adopted by the Technical Architect involves migrating from a development repository to another repository (test or production), it may make sense for the “target” repository to mirror the folder structure within the development repository. This simplifies the repository-to-repository migration procedures. Another possible approach is to assign the same names to corresponding database connections in both the “source” and “target” repositories. This is particularly useful when performing folder copies from one environment to another because it eliminates the need to change database connection settings after the folder copy has been completed.

The following is the recommended naming standard for Folders in the Repository:

Folder Names

Development:

Project folders: DEV_PROJECT_PROCESS_BUSINESSUNIT

Shared Folders: DEV_SHR_GLOBAL/LOCAL_DESCRIPTION

QA/Production:
Project folders: PROJECT_PROCESS_BUSINESSUNIT

Shared Folders: SHR_GLOBAL/LOCAL_DESCRIPTION
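The standard above can be checked mechanically. The following is a minimal sketch, assuming each placeholder segment (PROJECT, PROCESS, etc.) is an uppercase alphanumeric token; the patterns and function are illustrative only, not part of any Informatica tooling:

```python
import re
from typing import Optional

# Assumed patterns for the folder naming standard described above.
# Shared-folder patterns are listed first because the project patterns
# would also match shared names like DEV_SHR_GLOBAL_X.
FOLDER_PATTERNS = {
    "dev_shared":   re.compile(r"^DEV_SHR_(GLOBAL|LOCAL)_[A-Z0-9]+$"),
    "dev_project":  re.compile(r"^DEV_[A-Z0-9]+_[A-Z0-9]+_[A-Z0-9]+$"),
    "prod_shared":  re.compile(r"^SHR_(GLOBAL|LOCAL)_[A-Z0-9]+$"),
    "prod_project": re.compile(r"^[A-Z0-9]+_[A-Z0-9]+_[A-Z0-9]+$"),
}

def classify_folder(name: str) -> Optional[str]:
    """Return the folder type the name conforms to, or None if non-standard."""
    for kind, pattern in FOLDER_PATTERNS.items():
        if pattern.match(name):
            return kind
    return None
```

A check like this could run as part of a migration review to catch folders that drift from the standard.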

SECURITY:

Determining an optimal security configuration for a PowerMart environment requires a thorough understanding of the business requirements, end users’ access requirements, and the data content, as well as the security facilities inherent in the Informatica product suite.

Users

Security should be set up to allow for easy maintenance and scalability. There will be four types of users:

• Administrators - These users will have access to all folders and will have the ability to perform maintenance on the repository including migration, security, and folder control.

• Developers - These users will have access to all the folders that reside within the developer’s group. All developers will belong to at least two groups: Project_Development and Global_Development.

• Global Developers - These users will have access to all the folders that reside within the Global Developers’ group. They are the developers who can edit the shared objects as well as edit mappings and objects within the project folders. All Global Developers will belong to at least two groups: Project_Development and Global_Development.

• QA - These users will have the ability to BROWSE the repository but not change anything. They will belong to all the groups that pertain to QA testing, but security set at the user level will prevent them from writing to the repository.

The following are the USER-level security settings and GROUP memberships for each type of user:

Developers

Privileges:

Session Operator, Use Designer, Browse Repository, Create Sessions and Batches

Groups:

Global_Development, Project_Development

QA

Privileges:

Browse Repository

Groups:

Administration, Global_Development, Project_Development, Private

Administrators

Privileges:

Session Operator, Use Designer, Browse Repository, Create Sessions and Batches, Administer Repository, Administer Server, Super User

Groups:

Administration, Global_Development, Project_Development, Private

Groups -

There will be four GROUPS within the repository. The groups are defined as follows:

• Project_Development - This group will contain developers working on a specific project and will be used to organize folders by project.

• Global_Development - This group will contain all the developers and will be used to identify the folders that allow access across the repository. All developers will belong to this group.

• Private - This group will be used to organize folders that are not in active use but need to remain in the repository.

• Administration - This group will contain all the Administrators and will be used to restrict access to folders that only Administrators may access.
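The user/group/privilege matrix above can be summarized as a simple data structure. This is an illustrative sketch only: the dictionaries transcribe the lists in this section, and `can_write` is a hypothetical helper showing why QA users stay read-only despite their broad group memberships:

```python
# Group memberships per user type, as described above.
GROUP_MEMBERSHIP = {
    "developer":     {"Global_Development", "Project_Development"},
    "qa":            {"Administration", "Global_Development",
                      "Project_Development", "Private"},
    "administrator": {"Administration", "Global_Development",
                      "Project_Development", "Private"},
}

# User-level privileges per user type, as described above.
PRIVILEGES = {
    "developer":     {"Session Operator", "Use Designer", "Browse Repository",
                      "Create Sessions and Batches"},
    "qa":            {"Browse Repository"},
    "administrator": {"Session Operator", "Use Designer", "Browse Repository",
                      "Create Sessions and Batches", "Administer Repository",
                      "Administer Server", "Super User"},
}

def can_write(user_type: str) -> bool:
    # Groups grant visibility into folders, but writing requires some
    # privilege beyond Browse Repository; QA holds none, so QA is read-only.
    return bool(PRIVILEGES[user_type] - {"Browse Repository"})
```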

ERROR HANDLING:

• In general, there are three methods for handling data errors detected in the loading process. The Reject Critical strategy is recommended.

1. Reject All – the simplest to implement because it rejects every row containing an error. Both dimensional and fact data are rejected when errors are encountered.

2. Reject None – all data is loaded intact, but data integrity is not guaranteed.

3. Reject Critical – this approach requires categorizing the data as either Key Elements vs. Attributes or Inserts vs. Updates. Key elements are required fields.

• All the dimension tables should be implemented as unconnected lookups. When populating the fact table, every dimension should be looked up; based on the business rules, each row should be accepted or rejected, and rejected rows placed in an error table with a proper error message. Users should be given a provision to correct the errors so that all the data is accounted for in the data mart. m_risk_basefact_load is one mapping that can be modified to implement this best practice.
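The Reject Critical routing described above can be sketched as follows. The field names, row layout, and error-table structure are assumptions for illustration, not taken from m_risk_basefact_load:

```python
# Sketch of "Reject Critical": rows missing a key element are routed to an
# error table with a message; missing attributes are tolerated.
KEY_ELEMENTS = ["customer_key", "product_key"]   # required dimension keys
ATTRIBUTES   = ["discount_pct"]                  # optional fields

def route_row(row: dict, error_table: list) -> bool:
    """Return True if the row is accepted; otherwise log it and reject."""
    missing = [f for f in KEY_ELEMENTS if row.get(f) is None]
    if missing:
        # Preserve the original row so users can correct and reload it.
        error_table.append({**row,
                            "error_msg": "missing key element(s): "
                                         + ", ".join(missing)})
        return False
    return True

errors = []
accepted = [r for r in [
    {"customer_key": 1, "product_key": 7, "discount_pct": None},  # attribute missing: accept
    {"customer_key": None, "product_key": 7},                     # key missing: reject
] if route_row(r, errors)]
```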

NAMING CONVENTIONS FOR TRANSFORMATION OBJECTS:

The following guidelines suggest naming conventions for objects within the repository:

Transformation Objects

Naming Convention

Expression Transform:

exp_TargetTableName(s) that leverage the expression, or a name that describes the processing being done.

Lookup Transform:

lkp_LookupTableName.

Source Qualifier Transform:

sq_SourceTable1_SourceTable2.

Aggregator Transform:

agg_TargetTableName(s) that leverage the expression, or a name that describes the processing being done.

Filter Transform:

fltr_TargetTableName(s) that leverage the expression, or a name that describes the processing being done.

Update Strategy Transform:

upd_TargetTableName(s) that leverage the expression, or a name that describes the processing being done.

Normalizer Transform:

norm_TargetTableName(s) that leverage the expression, or a name that describes the processing being done.

Rank Transform:

rnk_TargetTableName(s) that leverage the expression, or a name that describes the processing being done.

Stored Procedure Transform:

sp_StoredProcedureName

External Procedure Transform:

ext_ProcedureName

Joiner Transform:

jnr_SourceTable/FileName1_SourceTable/FileName2

Mapping Name:

m_TargetTable1_TargetTable2

Session Name:

s_MappingName

Batch Names:

bs_BatchName for a sequential batch and bc_BatchName for a concurrent batch.

Folder Name:

A name to logically group similar data and transformations. It is often the Subject Areas that make up the data mart or the names of the individual data marts.
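The prefix conventions above lend themselves to an automated check. Below is a minimal sketch with the prefix table transcribed from this list; `check_name` is a hypothetical helper, not an Informatica API:

```python
# Prefix per object type, transcribed from the naming conventions above.
PREFIXES = {
    "Expression": "exp_", "Lookup": "lkp_", "Source Qualifier": "sq_",
    "Aggregator": "agg_", "Filter": "fltr_", "Update Strategy": "upd_",
    "Normalizer": "norm_", "Rank": "rnk_", "Stored Procedure": "sp_",
    "External Procedure": "ext_", "Joiner": "jnr_",
    "Mapping": "m_", "Session": "s_",
    "Sequential Batch": "bs_", "Concurrent Batch": "bc_",
}

def check_name(object_type: str, name: str) -> bool:
    """True if the repository object name carries the prefix for its type."""
    return name.startswith(PREFIXES[object_type])
```

A script like this could walk a repository export and flag non-conforming object names before migration.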

MAPPLETS:

A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can contain as many transformations as necessary. Create a mapplet when you want to use a standardized set of transformation logic in several mappings. For example, if you have several fact tables that require a series of dimension keys, you can create a mapplet containing a series of Lookup transformations to find each dimension key, then use that mapplet in each mapping.
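Conceptually, a mapplet behaves like a reusable function applied to each row. The sketch below mimics the fact-table example above, with plain dictionaries standing in for the Lookup transformations; all names and data are hypothetical:

```python
# Stand-ins for dimension lookup tables (natural key -> surrogate key).
DIM_CUSTOMER = {"ACME": 101}
DIM_PRODUCT  = {"WIDGET": 202}

def mplt_resolve_dim_keys(row: dict) -> dict:
    """Reusable 'mapplet': a chain of lookups producing dimension keys.

    Any mapping loading a fact table can call this instead of repeating
    the lookup logic; unknown natural keys resolve to None, which the
    error-handling strategy can then reject.
    """
    return {**row,
            "customer_key": DIM_CUSTOMER.get(row["customer_name"]),
            "product_key":  DIM_PRODUCT.get(row["product_name"])}

fact_row = mplt_resolve_dim_keys(
    {"customer_name": "ACME", "product_name": "WIDGET", "amount": 9.99})
```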