Top 143 Informatica Questions and Answers for Job Interview
1. What are the advantages of using Informatica?
Answer: The use of Informatica has a number of advantages which have been listed below:
- Faster than most comparable data integration platforms.
- Jobs can be easily monitored with the Informatica Workflow Monitor.
- Makes data validation, iteration and project development easier.
- Failed jobs can be easily identified and recovered.
2. Mention some real world situations where Informatica can be used.
Answer: The following are the areas where Informatica is used extensively:
- Application migration.
- Data migration.
- Data warehousing.
3. Tell us about some of the examples of Informatica ETL programs.
Answer: These are some of the basic Informatica ETL programs:
- Mappings: A mapping is provided within the Designer. It is used to define all the ETL processes. Data is read from the original sources by mappings before transformation logic is applied to the read data. The transformed data is then written to corresponding targets.
- Workflows: A workflow is defined as a collection of different tasks that are required for executing the runtime ETL processes. Workflows are created within the Workflow Manager.
- Task: A task is defined as a set of actions, commands, or functions that can be executed. A sequence of different tasks is used to define the behavior of an ETL process.
4. Tell us about the development components of Informatica which have the highest usage.
Answer: The popular development components of Informatica are listed below:
- Expression: This is used to transform data with functions.
- Lookups: They are used to join data.
- Sorter and Aggregator: As the name suggests, they are used to sort and aggregate data.
- Java transformation: This is used whenever the user wants to use variables, Java methods, third-party APIs and built-in Java packages.
- Source qualifiers: This is used for conversion of the source data types to equivalent Informatica data types.
- Transaction control: This is used for creating transactions and gives full control over rollbacks and commits.
5. What are the various uses of ETL tools?
Answer: ETL tools are extensively used for a number of processes:
- To load data into a data warehouse (the target).
- To extract data from different sources such as database tables or files.
- To transform the extracted data into an organized form.
6. Tell us about Informatica and everything you know about it.
Answer: Informatica is basically a tool used to support all the steps of extraction, transformation and the loading process of data. Informatica has the following features:
- It is also used as an Integration tool.
- It is quite popular because it is very easy to use.
- The visual interface is very simple. The user can easily drag and drop different objects (known as transformations) and design the process flow for data extraction, transformation and loading. All these process flow diagrams are called mappings. After the completion of a mapping, it can be used more than once.
- The Informatica server takes care of data fetching from source, transforming the same and loading it into the target systems/databases.
7. What do you know about Informatica?
Answer: Informatica is a software development company which provides its users with a number of data integration products. It is quite popular for the following products:
- Data masking
- Data quality
- Data replica
- Data virtualization
- Master data management
Informatica PowerCenter, a data integration (ETL) tool, is its most popular product. Whenever anyone says Informatica, he/she is usually referring to the Informatica PowerCenter ETL tool.
Informatica PowerCenter allows users to connect to and fetch data from various heterogeneous sources and process that data. Informatica PowerCenter 9.6.0 is the most recent version. The various editions of Informatica PowerCenter are:
- Standard edition
- Advanced edition
- Premium edition
The U.S. Air Force, Allianz, Fannie Mae, ING, and Samsung are some of the institutions that use Informatica PowerCenter for data integration. IBM DataStage, Oracle OWB, Microsoft SSIS and Ab Initio are some of the other data integration tools.
8. Mention the reasons for which Informatica needs to be used.
Answer: Informatica is usually used to clean data, modify data, and perform some other operations on the basis of a set of rules or to load a huge volume of data from one system to another. It also provides a distinct set of features:
- Operations at row level on data
- Integration of data from multiple structured, semi-structured or unstructured systems, scheduling of data operation
- Preservation of information about process and data operations
9. Tell us about the advantages of using Informatica.
Answer: Informatica has a number of advantages due to which it is preferred by users for data integration:
- Considerably faster than most comparable platforms.
- Easy monitoring of jobs by using the Informatica Workflow Monitor.
- Easy data validation, iteration and project development.
- Easy identification of failed jobs and recovery from the same.
- Coding in its GUI tool is faster than the hand-coded scripting offered by other data integration tools.
- Easy and lossless communication with all major data sources, such as mainframe, RDBMS, flat files, XML, VSAM, SAP, etc.
- Effective handling of large data.
- Quick adaptation of mappings, extraction rules, cleansing rules, transformation rules, aggregation logic and loading rules, since they are stored as separate objects in the ETL tool; a change in one object is reflected wherever it is used.
- Users can reuse previously created objects, such as transformation rules.
- Data can be extracted from packaged ERP applications by using different “adapters”.
- Informatica can be easily run in both Windows and UNIX environments.
10. Tell us about some examples of Informatica ETL programs.
Answer: These are some of the basic Informatica ETL programs:
- Mappings: A mapping is provided within the Designer. It is used to define all the ETL processes. Data is read from the original sources by mappings before transformation logic is applied to the read data. The transformed data is then written to corresponding targets.
- Workflows: A workflow is defined as a collection of different tasks that are required for executing the runtime ETL processes. Workflows are created within the Workflow Manager.
- Task: A task is defined as a set of actions, commands, or functions that can be executed. A sequence of different tasks is used to define the behavior of an ETL process.
11. Differentiate between Informatica and DataStage.
Answer: Informatica has the following features:
- The tools used for GUI development and monitoring are PowerDesigner, Repository Manager, Workflow Designer and Workflow Manager.
- It provides a step-by-step solution for data integration problems.
- It has good Data Transformation tools.
DataStage has the following features:
- The tools used for GUI development and Monitoring are DataStage Designer, Job Sequence Designer and Director.
- It provides a project-based integration solution for data integration problems.
- It has excellent Data Transformation tools.
12. What is ‘Enterprise Data Warehousing’?
Answer: Enterprise Data Warehousing is the process of developing an organization's data at a single point of access, so that the entire organization's data is available from one central warehouse.
13. Tell us the differences between a database and a data warehouse.
Answer: A data warehouse contains every kind of data, both useful and not currently useful, and the required data is extracted from it as per the customer's needs. A database, on the other hand, stores a smaller group of useful, operational information.
14. Define the term ‘domain’.
Answer: A domain represents all the interlinked relationships and nodes that are administered as a single organizational unit.
15. What is the difference between a repository server and a powerhouse?
Answer: A repository server guarantees the reliability and consistency of the repository.
The powerhouse server handles the execution of the various procedures between the factors of the server's database repository.
16. How many repositories can be created in the Informatica WorkFlow Manager?
Answer: The number of repositories that can be created in the Informatica WorkFlow Manager depends upon the number of ports required. There is no limit to the number of repositories in the Informatica WorkFlow Manager, however.
17. Tell us about the advantages of partitioning a session.
Answer: When the user partitions a session, he/she achieves better server performance and efficiency. He/she is also able to implement the solo sequences within the session.
18. How can the user create indexes after completing the full load process?
Answer: A command task at the session level is used to create indexes immediately after the load procedure. The concerned sessions are defined within Informatica ETL. A session can be defined as a set of instructions that is required for moving data from the source to a target.
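As a hedged illustration (the table and index names below are hypothetical), the post-load step might simply issue index-creation SQL against the target database:

```sql
-- Hypothetical post-load index creation, e.g. issued from a command task
-- or post-session SQL after the target table has been fully loaded
CREATE INDEX idx_sales_fact_cust ON sales_fact (customer_id);
CREATE INDEX idx_sales_fact_date ON sales_fact (sale_date);
```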
19. What is the maximum number of sessions a user can have in one group?
Answer: There is no limit to the number of sessions in a group. However, a smaller number of sessions per batch is generally recommended because it makes migration easier.
20. What are the differences between the mapping parameter and the mapping variable?
Answer: A mapping variable is the name given to a value that can change during the session run, while a mapping parameter is the name given to a value that does not change during the session run.
21. Tell us about some of the features of complex mapping.
Answer: These are some of the features of Complex mapping:
- It has a number of transformations.
- It is quite tricky because it requires a good understanding of complex business logic.
22. What are the means of identifying whether the mapping is correct or not, without using the connecting session?
Answer: Identifying the correctness of the mapping without making use of the connecting session has been made easier with the help of the debugging option that is built within Informatica.
23. Can the user make use of mapping parameter or variables which have been developed in one mapping into any other reusable transformation?
Answer: Yes, the user can make use of a mapping parameter or variable developed in one mapping in any other reusable transformation, because a reusable transformation does not contain any mapplet or mapping.
24. Why is the aggregator cache file used?
Answer: The aggregator stores intermediate (transitional) values in the local buffer memory and creates extra cache files whenever additional space is required to hold the transformation values.
25. Define ‘Lookup Transformation’.
Answer: The Lookup Transformation provides access to data in an RDBMS; it is used to look up data in a relational table or view.
26. Define the term ‘Role Playing’ dimension.
Answer: Role Playing Dimensions are those dimensions which can be used to play diversified roles whilst remaining within the same database domain.
27. How can the user access repository reports without using SQL or any other transformations?
Answer: The metadata reporter can be used to access repository reports. SQL or other transformations are not required because the metadata reporter is a web application.
28. Mention the various types of metadata that are stored in repositories.
Answer: The different metadata types which are present within the repository are:
- Target definition
- Source definition
- Mapplet
- Mappings
- Transformations
29. What do you know about code page compatibility?
Answer: Code page compatibility means that when data is transferred from one code page to another, both code pages should have the same character sets; this ensures that no data loss occurs.
30. How can the user confirm all mappings in the repository?
Answer: All mappings cannot be validated simultaneously, because the user can validate only one mapping at a time.
31. What is ‘Aggregator Transformation’?
Answer: The Aggregator Transformation differs from the Expression Transformation in that aggregate calculations over a set of rows, such as averages and sums, can be performed in the former.
32. Define ‘Expression transformation’.
Answer: Expression Transformation is used to perform non aggregated calculations and the user can test conditional statements well before the output results are moved to the target tables.
33. What is ‘Filter Transformation’?
Answer: The Filter transformation is used to filter rows in a mapping. It has all input/output ports, and only the rows that match the filter condition pass through it.
34. What is ‘Joiner transformation’?
Answer: The Joiner Transformation is used to combine two associated heterogeneous sources located in different places, while a Source Qualifier transformation is used to combine data coming from a common source.
35. Define ‘Lookup Transformation’.
Answer: Lookup transformation is used to maintain data in a relational table by means of a mapping. Multiple lookup transformation can also be used in a mapping.
36. How can the user make use of Union Transformation?
Answer: The Union Transformation is a multiple input group transformation that is used to combine data from a number of different sources.
37. What is ‘Incremental Aggregation’?
Answer: Incremental Aggregation is an option applied to a session created for a mapping that performs aggregation; the Integration Service then uses the historical cache and aggregates only the new or changed source data.
38. Mention the differences between a connected look up and an unconnected look up.
Answer: Whenever inputs are taken directly from other transformations in the pipeline, the lookup is called a connected lookup. An unconnected lookup does not take its input directly from the pipeline; instead, it can be used in any transformation and invoked as a function using the :LKP expression.
Connected Lookup:
- They take part in dataflow and receive input directly from the pipeline.
- They can use either a dynamic or a static cache.
- They can return more than one column value through their output ports.
- Connected lookups cache all lookup columns.
- Connected lookups support all user-defined default values.
Unconnected Lookup:
- These receive input values from the result of a LKP expression in another transformation.
- The cache of an unconnected lookup is static; it can never be dynamic.
- They can return only one column value through the output port.
- Unconnected lookups cache only the lookup output ports used in the lookup condition and the return port.
- Unconnected lookups do not support user-defined default values.
39. What is a ‘Mapplet’?
Answer: A reusable object that is created using the Mapplet Designer is known as a Mapplet.
40. Define ‘Reusable Transformation’.
Answer: A Reusable Transformation is a transformation that can be used in multiple mappings. Unlike other transformations, it is stored as metadata that is separate from any mapping that uses it.
41. Define an ‘Update Strategy’.
Answer: The Update Strategy is used when a row has to be updated or inserted on the basis of some sequence. However, the condition that determines whether the processed row is updated or inserted should be specified beforehand.
42. When is the server of Informatica forced to reject files?
Answer: Whenever Informatica encounters DD_Reject in an Update Strategy transformation, it sends the server a message to reject those files (rows).
43. Define a Surrogate Key.
Answer: The Surrogate Key is a substitute commonly used for the natural primary key. It is a means of uniquely identifying each row in the table.
44. What are the prerequisite tasks required to achieve a session partition?
Answer: To perform a session partition, the user needs to configure the session to partition the corresponding source data and then install the Informatica server on a machine with multiple CPUs.
45. Which files are created during the session in the server of Informatica?
Answer: The following are the files created during the session in the server of Informatica:
- Errors log
- Bad file
- Workflow log
- Session log
46. What is a ‘Session Task’?
Answer: A Session Task is a set of instructions that guides the PowerCenter server about how and when to move data from the sources to the targets.
47. What is a ‘Command Task’?
Answer: The ‘Command Task’ permits one or more shell commands (in UNIX) or DOS commands (in Windows) to run during the workflow.
48. What do you mean by ‘Standalone Command Task’?
Answer: The Standalone Command Task is generally used by the user anywhere in the workflow to execute the shell commands.
49. Define pre and post session shell command.
Answer: A Command task can be used as a pre- or post-session shell command for any Session task. It can be executed as a pre-session command, a post-session success command, or a post-session failure command.
50. Define ‘Predefined event’.
Answer: A predefined event is a general file-watch event. Such an event waits for a specific file to arrive at a specific location.
51. What is a user-defined event?
Answer: A user-defined event is a flow of tasks within the workflow. Such events can be created and raised as the need arises.
52. What do you mean by ‘Work Flow’?
Answer: Work Flow is the term given to a group of instructions that communicates with the server about how to implement tasks.
53. What are the different tools present within the workflow manager?
Answer: These are the different tools that are present within workflow manager:
- Task Developer
- Worklet Designer
- Workflow Designer
54. What other tools can be used for scheduling, apart from the Workflow Manager and pmcmd?
Answer: ‘CONTROL-M’, a third-party scheduling tool, is generally used for scheduling purposes instead of the Workflow Manager.
55. What do you know about the term OLAP?
Answer: OLAP (Online Analytical Processing) is the process by which multi-dimensional analysis is performed.
56. What are the different types of OLAP?
Answer: The different types of OLAP are ROLAP, HOLAP, and DOLAP.
57. What do you mean by the term ‘Worklet’?
Answer: The term ‘Worklet’ describes a group of related workflow tasks collected together. Such tasks include the Timer, Decision, Command, Event Wait, etc.
58. Why is the target designer used?
Answer: The Target Designer is used whenever the user has to create target definitions.
59. Where is the throughput option located in Informatica?
Answer: The throughput option is found within the workflow monitor. To achieve this, one needs to follow the given steps:
- a) Right-click on the session.
- b) Click on ‘Get Run Properties’.
- c) The throughput option can be found under the source/target statistics.
60. What does the term ‘Target Load Order’ mean?
Answer: The ‘Target Load Order’ is specified on the basis of the source qualifiers in a mapping. Whenever there are a number of source qualifiers attached to multiple targets, the user can specify the order in which Informatica loads data into the targets.
61. How can the performance of Informatica Aggregator Transformation be improved?
Answer: Aggregator performance improves dramatically if the records are sorted before they are passed to the aggregator and the ‘sorted input’ option under the Aggregator properties is checked. The record set must be sorted on the columns used in the ‘Group By’ operation. It is often a good idea to sort the record set at the database level, e.g., inside the Source Qualifier transformation, unless there is a chance that the already sorted records coming from the source qualifier can again become unsorted before reaching the Aggregator.
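As a hedged sketch (the table and column names are hypothetical), a SQL override in the Source Qualifier can sort on the same columns, in the same order, as the Aggregator's Group By ports:

```sql
-- Hypothetical Source Qualifier override: the ORDER BY matches the Aggregator's
-- Group By ports (customer_id, order_date), so the 'sorted input' option can be used safely
SELECT customer_id,
       order_date,
       order_amount
FROM   orders
ORDER BY customer_id, order_date;
```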
62. Mention the different varieties of lookup cache(s)?
Answer: Informatica Lookups can be either cached or un-cached (no cache). A cached lookup can use either a static or a dynamic cache.
A static cache cannot be modified after it has been built; it remains the same while the session is being executed. A dynamic cache, on the other hand, is refreshed during the session run by inserting or updating records in the cache according to the incoming source data. An Informatica lookup cache is static by default.
A lookup cache can further be classified as persistent or non-persistent, based on whether Informatica retains the cache files after the session run completes or deletes them.
63. How can the user successfully update a record in target table without using the Update strategy?
Answer: A target table can be easily updated without actually using the ‘Update Strategy’. One must follow the given steps:
1) Define the key within the target table at the Informatica level.
2) Connect the key and the field that has to be updated in the mapping target. At the session level, the target property has to be set to “Update as Update” and the “Update” check-box has to be checked.
3) If the session properties are set correctly, the mapping will update the target (for example, a customer address field for all matching customer IDs).
64. Mention some of the new features of the Informatica 9.x Developer.
Answer: These are the new features in Informatica 9.x that have caused a lot of positive reviews:
- Lookup can be configured as an active transformation and can return multiple rows on a successful match.
- The user can write an SQL override on an un-cached lookup as well (see the sketch after this list).
- The size of the session log can be controlled by the user. In a real-time environment the session log file size or rollover time can be controlled.
- Database deadlock resilience ensures that the session does not fail if a database deadlock is encountered; the operation is retried instead. The number of retry attempts can be controlled by the user.
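For the lookup SQL override feature mentioned above, a minimal sketch might look like the following (the table and columns are hypothetical):

```sql
-- Hypothetical lookup SQL override: restrict the lookup source to active customers only
SELECT cust_id,
       cust_name,
       cust_segment
FROM   customer_dim
WHERE  status = 'ACTIVE';
```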
65. Mention some of the advantages of using Informatica as an ETL tool over the typical Teradata.
Answer:
- Informatica is a data integration tool, while Teradata is an MPP-based database with scripting (BTEQ) and fast data-movement capabilities (mLoad, FastLoad, Parallel Transporter).
- Informatica jobs (sessions) can be arranged logically into worklets and workflows in folders, which results in an ecosystem that is easier to maintain and quicker for architects and analysts to analyse and enhance.
- Jobs can be easily monitored using the Informatica Workflow Monitor, and failed or slow-running jobs can be easily identified and recovered; Informatica can also restart from the failure row/step.
- The Informatica MarketPlace is a one-stop shop for lots of tools and accelerators that make the SDLC faster and improve application support.
- There are plenty of Informatica developers in the market with varying skill levels and expertise, and Informatica has lots of connectors to various databases, including support for Teradata mLoad, tPump, FastLoad and Parallel Transporter in addition to ODBC drivers. Some ‘exotic’ connectors may need to be procured separately and hence could cost extra.
- Surrogate key generation is achieved through the shared sequence generators inside Informatica and it could be faster than generating them inside the database.
- Informatica provides pushdown optimization, which can be used to process the data within the database. The user also gets the ability to code the ETL such that the processing load is balanced between the ETL server and the database box. This is useful if the database box is ageing, or if the ETL server has a fast disk or a large memory and CPU that can outperform the database in certain tasks.
- Informatica also has the ability to publish processes as web services which Teradata lacks.
Teradata over Informatica
- The initial cost of Teradata is less than that of Informatica, as there are no ETL tool license costs and the OPEX costs are lower.
- Teradata is a great choice if all the data to be loaded is available only in the form of structured files, which can be processed inside the database after an initial stage load.
- Teradata is a better choice for a lower-complexity ecosystem compared to Informatica.
- Teradata developers or resources with good ANSI/Teradata SQL / BTEQ knowledge are enough for building and enhancing the system.
66. What do you know about the Informatica ETL Tool?
Answer: The Informatica ETL tool is considered one of the best software products for data integration and data quality services, and its ETL and EAI capabilities make it one of the most widely used tools in this space. The ETL tool is mainly used for extraction, transformation and loading. Informatica's data integration tools differ from other software platforms and languages in that there are no inbuilt features for building a user interface where the end user can see the transformed data. The Informatica ETL tool is capable of managing, integrating and migrating enterprise data.
67. Tell us something about Informatica PowerCenter.
Answer: Informatica PowerCenter is an Enterprise Data Integration product developed by Informatica Corporation. It is an ETL tool used to extract data from the source, transform it, and load it into the target.
Extraction involves understanding, analysis and cleansing of source data.
Transformation involves cleansing of the data precisely and modifying it according to the requirements.
Loading involves assigning the dimensional keys and loading them into the warehouse.
68. Tell us something about an Expression Transformation in Informatica and how is it achieved.
Answer: An expression transformation is a common PowerCenter mapping transformation which is typically used for transforming data passed through it, one record at a time. It is passive and connected.
Data can be manipulated, variables can be created, and output ports can be generated within an expression. Conditional statements can be written within output ports or variables can be created for transforming data according to requirements.
69. Why does a user need an ETL tool?
Answer: ETL tools save users from writing complex code for connecting multiple sources and handling errors. ETL tools have been developed precisely to avoid such complex hand-coding.
70. Define the terms ‘Active Transformation’ and ‘Passive Transformation’.
Answer: An Active Transformation is the one that performs the given functions:
- Changes the number of rows between the transformation input and output, such as the Filter transformation.
- Changes the transaction boundary by defining commit or rollback points, such as the Transaction Control transformation.
- Changes the row type; for example, the Update Strategy is active as it flags rows for insert, delete, update or reject.
A Passive Transformation is that type of Transformation which does not change the number of rows that pass through it, such as an Expression transformation.
71. Differentiate between Router and Filter.
Answer: Router has the following features:
- This is a type of Transformation which divides the incoming records into multiple groups based on a previously mentioned condition. These groups are mutually inclusive, i.e., different groups may contain the same record.
- This Transformation does not block any record. If any record does not match any of the routing conditions, the record is routed to the default group
- The Router acts like the CASE ... WHEN statement in SQL or the switch ... case statement in C, C++, Java, etc.
Filter has the following features:
- This type of transformation blocks the incoming record set based on a predefined condition.
- It does not have a default group. If a record does not match the filter condition, the record is blocked (dropped).
- The Filter acts like the WHERE condition in SQL (see the sketch after this list).
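The SQL analogies can be sketched as follows (hypothetical table and columns); the CASE expression mirrors how a Router assigns rows to groups, while the WHERE clause mirrors how a Filter drops non-matching rows:

```sql
-- Router analogy: rows are routed to groups based on conditions; unmatched rows
-- go to the default group (note: a real Router can send one row to several groups)
SELECT order_id,
       CASE
         WHEN amount >= 1000 THEN 'HIGH_VALUE'
         WHEN amount >= 100  THEN 'MEDIUM_VALUE'
         ELSE 'DEFAULT_GROUP'
       END AS route_group
FROM   orders;

-- Filter analogy: rows failing the condition are simply dropped
SELECT order_id, amount
FROM   orders
WHERE  amount >= 1000;
```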
72. When does the aggregator fail within the session while selecting the Sorted Input?
Answer: The session fails if the input data is not sorted correctly. The session can fail even when the input data is properly sorted, if the ports used for the sort order and the Group By ports of the Aggregator are not in the same order.
73. Why is the Sorter known as an Active Transformation?
Answer: The Sorter is called an active transformation because the user can select the “Distinct” option in the sorter properties. When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key, and the Integration Service discards duplicate rows during the sort operation. The number of input rows can therefore differ from the number of output rows; thus, the Sorter is considered an active transformation.
74. Differentiate between the Static and Dynamic Lookup Cache.
Answer: A Lookup transformation builds its cache on the basis of the underlying lookup table.
The Static or Read-only Lookup cache of the Integration Service caches the lookup table right at the beginning of the session and does not update the lookup cache while it processes the Lookup transformation.
The Dynamic Lookup Cache of the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is coordinated with the target.
75. Differentiate between the STOP and the ABORT options within the Workflow Monitor.
Answer: When the user issues the STOP command on an executing session task, the Integration Service stops reading data from the source. It continues processing, writing and committing the data that has already been read to the targets.
If the Integration Service does not complete processing and committing data, the user can issue the ABORT command. This command has a timeout period of 60 seconds. If the Integration Service does not complete processing and committing data within the timeout period, the DTM process is killed and the session is terminated.
76. How can the user successfully delete duplicate row while working with Informatica?
Answer: There are a number of ways of successfully deleting duplicate rows which varies according to the exact situation. These are the given scenarios with the exact method to be used.
Scenario 1: Duplicate rows are present within the relational database
If there are many duplicate records in the source system and only unique records need to be loaded into the target system, the following method must be used:
The user should check the ‘Distinct’ option in the Source Qualifier of the source table and then load the target accordingly.
Scenario 2: Deletion of duplicate rows / selection of distinct rows for FLAT FILE sources
In this case the source system is a flat file, so the ‘Distinct’ option in the Source Qualifier will be disabled. Here, the user must apply a Sorter Transformation and check its ‘Distinct’ option. When the ‘Distinct’ option is selected, all the columns are selected as keys, in ascending order, by default.
Duplicate records in a source batch run can also be handled with the help of the Aggregator Transformation, by using the ‘Group By’ checkbox on the ports that contain duplicate data.
77. How can the user successfully implement the Aggregation operation without actually using an Aggregator Transformation?
Answer: The Expression Transformation can be used to achieve this task, since the user can access the previous row's data as well as the currently processed row's data within an Expression Transformation (using variable ports). The Sorter, Expression and Filter transformations can together be used to perform aggregation at the Informatica level.
78. Define ‘Source Qualifier’. Why is it used in Informatica? Why is it also known as ACTIVE transformation?
Answer: A Source Qualifier is defined as an Active and Connected Informatica transformation that is able to read the rows from a relational database or even a Flat File source.
The SQ can be configured to perform joins (both INNER and OUTER) on data originating from the same source database. In addition:
- A source filter can be used to reduce the number of rows the Integration Service queries.
- A number of sorted ports can be specified, and the Integration Service then adds an ORDER BY clause to the default SQL query.
- The ‘Select Distinct’ option can be used for relational databases, and the Integration Service then adds a SELECT DISTINCT clause to the default SQL query.
- A custom/user-defined SQL query can also be written by the user, which overrides the default query generated by the SQ.
- A user can also write the Pre and the Post SQL statements to be executed before and after the SQ query in the source database.
The Source Qualifier has built-in properties such as Select Distinct. When the Integration Service adds a SELECT DISTINCT clause to the default SQL query, the number of rows returned by the database to the Integration Service changes. Hence, the SQ qualifies as an active transformation.
79. What is the effect on a mapping if the datatypes are modified between the Source and the concerned Source Qualifier?
Answer: The Source Qualifier transformation is used to display the transformation datatypes. The transformation datatypes, on the other hand, are used to determine how the source database binds data when the Integration Service reads it. If the datatypes are explicitly modified by the user in the Source Qualifier transformation or if the datatypes in the source definition and the Source Qualifier transformation do not match, the Designer marks the mapping as invalid when the project is finally saved.
80. Consider a situation: All the Select Distinct and the number of sorted ports property in the SQ have been utilized. Here, Custom SQL Query is added. What changes will occur?
Answer: Whenever the user adds the Custom SQL or the SQL override query, the User-Defined Join, Source Filter, Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation are overridden. Thus, only the user-defined SQL Query is finally executed in the database and all other options are implicitly ignored.
81. Where can we use the Source Filter, Select Distinct and number of sorted ports properties of the Source Qualifier transformation?
Answer: The Source Filter option is generally used for causing a reduction in the number of rows for the Integration Service queries in order to improve performance.
The Select Distinct option makes the Integration Service select only unique values from the source, filtering out unnecessary data early in the data flow in order to improve performance.
The Number Of Sorted Ports option is used to sort the data from the source so as to use it in transformations like Aggregator or Joiner, as sorted input shows an improvement in the performance.
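As a hedged sketch of the kind of default query the Integration Service generates when these three properties are set (the table and columns are hypothetical):

```sql
-- Roughly the query generated when Source Filter, Select Distinct and
-- Number of Sorted Ports (2) are all used on a hypothetical CUSTOMERS source
SELECT DISTINCT
       customers.cust_id,
       customers.cust_name,
       customers.country
FROM   customers
WHERE  customers.status = 'ACTIVE'                 -- Source Filter
ORDER BY customers.cust_id, customers.cust_name;   -- first two sorted ports
```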
82. What happens if the SELECT list columns in the custom override SQL query and the output ports order in the SQ transformation are dissimilar?
Answer: Any mismatch or change in the order of the selected columns with respect to the connected transformation output ports usually causes a session failure.
83. Tell us about some situations where the Joiner transformation is used instead of the Source Qualifier transformation.
Answer: The Joiner transformation is generally used when the user needs to join source data from heterogeneous sources or to join flat files. The Joiner transformation can be used when the following types of sources have to be joined:
- Join data from different Flat Files.
- Join relational sources and flat files.
- Join data from different Relational Databases.
84. Tell us about the maximum number that can be used within the number of sorted ports for the Sybase source system.
Answer: Sybase supports a maximum of 16 columns in an ORDER BY clause. Hence, for a Sybase source, no more than 16 sorted ports should be used.
85. Consider a situation where there are two Source Qualifier transformations, SQ1 and SQ2, connected to target tables TGT1 and TGT2, respectively. How can the user ensure that TGT2 is loaded only after TGT1 has been loaded?
Answer: If there are multiple Source Qualifier transformations connected to multiple targets, the user can decide the order in which the Integration Service loads data into the targets. In the Mapping Designer, the Target Load Plan, which is based on the Source Qualifier transformations in the mapping, is used to specify the required loading order.
86. Consider a situation where a single Source Qualifier transformation populates two target tables. How can the user ensure that TGT2 is loaded only after TGT1 has been loaded?
Answer: In the Workflow Manager, constraint-based load ordering can be configured for the session. The Integration Service then orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first into the primary key table, and then into the foreign key table. Constraint-based load ordering should be used when one Source Qualifier transformation provides data for multiple target tables that have primary and foreign key relationships.
87. Define a Filter Transformation. Why is a Filter Transformation active?
Answer: A Filter transformation is an Active and connected transformation that is generally used for filtering rows in a mapping. The rows that meet the Filter Condition are the only rows that are allowed to pass through the Filter transformation to the next transformation in the pipeline.
TRUE and FALSE are the implicit return values of any filter condition set by the user. If the filter condition evaluates to NULL, the row is treated as FALSE. FALSE is equal to zero and any non-zero value is equal to TRUE. Because the Filter transformation is an active transformation, the number of rows passing through it can change. The filter condition returns TRUE or FALSE for each row, depending on whether the row meets the specified condition; only rows that return TRUE pass through the transformation. The discarded rows do not appear in the session log or reject files.
88. Differentiate between the Source Qualifier transformations Source Filter and the Filter transformation.
Answer: Source Qualifier Transformation has the following features:
- This filters rows while reading from a particular source.
- This can only filter rows from concerned Relational Sources.
- This limits the row set extracted from a particular source.
- It reduces the number of rows used in the mapping and hence provides better performance.
- It uses only standard SQL, as it is executed within the database.
Filter Transformation has the following features:
- This transformation filters rows from within a mapping.
- This filters rows coming from any type of source system at the mapping level.
- This limits the row set sent to a target.
- The Filter transformation should be placed close to the sources in the mapping in order to filter out unwanted data early in the flow of data from the source to the target.
- This type of transformation can define a condition using a statement or a transformation function that returns a TRUE or a FALSE value.
89. Define a ‘Joiner Transformation’. Why is it active in nature?
Answer: A Joiner is an active and connected transformation that is used to join source data from two related heterogeneous sources residing in different locations or file systems. The Joiner transformation joins sources that have at least one matching column, using a condition that matches one or more pairs of columns between the two sources.
The two input pipelines comprise a master pipeline and a detail pipeline (a master and a detail branch). The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target. The user must configure the Join Condition, the Join Type and, to improve Integration Service performance, the Sorted Input option within the Joiner transformation. The join condition comprises ports from both input sources that must match for the Integration Service to join two rows. Depending on the selected join type, the Integration Service either adds a row to the result set or discards it. Because the result set produced depends on the join type, condition and input data, the number of rows can change; thus the Joiner transformation is considered an active transformation.
90. What are the limitations on Joiner due to which they cannot be used within the mapping pipeline?
Answer: The Joiner transformation allows inputs from almost all transformations. These are the given limitations for the same:
- It cannot be used if the input pipeline contains an Update Strategy transformation.
- It cannot be used if a Sequence Generator transformation is connected directly before the Joiner transformation.
91. Which among the two input pipelines of a joiner should be used as a master pipeline?
Answer: During any session run, the Integration Service compares each row of the master source with the detail source. The master and detail sources need to be configured for maximum possible performance.
To improve the performance of an unsorted Joiner transformation, the source with fewer rows should be used as the master source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds up the join process.
When the Integration Service processes an unsorted Joiner transformation, it reads all master rows before the detail rows. It blocks the detail source while it caches rows from the master source; once it has finished reading and caching all master rows, it unblocks the detail source and reads the detail rows.
To improve the performance of a sorted Joiner transformation, the source with fewer duplicate key values should be used as the master source.
When the Integration Service processes a sorted Joiner transformation, it may block data based on the mapping configuration and store fewer rows in the cache, which increases performance. Blocking logic is possible only if the master and detail inputs to the Joiner transformation originate from different sources; otherwise, the Integration Service does not block data and stores more rows in the cache instead.
92. Mention the various types of Joins that have been provided within Joiner Transformation.
Answer: A join in SQL is basically a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar, except that the data can originate from different types of sources. The following types of joins are supported by the Joiner transformation:
- Normal
- Master Outer
- Detail Outer
- Full Outer
93. Define the various Join Types of Joiner Transformation.
Answer:
- In a Normal Join, the Integration Service is used for discarding all rows of data from the master and detail source that do not match on the basis of the join condition.
- A Master Outer Join keeps all rows of data from the detail source and the matching rows from the master source; the unmatched rows from the master source are discarded.
- A Detail Outer Join keeps all rows of data from the master source and the matching rows from the detail source; the unmatched rows from the detail source are discarded.
- A Full Outer Join keeps all rows of data from both the master and detail sources, as illustrated in the SQL sketch below.
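A hedged SQL sketch of the four join types, taking the detail source as the left table and the master source as the right table (the tables and join column are hypothetical):

```sql
-- Normal join: only rows that match in both sources
SELECT * FROM detail_src d INNER JOIN master_src m ON d.emp_id = m.emp_id;

-- Master Outer join: all detail rows, plus matching master rows
SELECT * FROM detail_src d LEFT OUTER JOIN master_src m ON d.emp_id = m.emp_id;

-- Detail Outer join: all master rows, plus matching detail rows
SELECT * FROM detail_src d RIGHT OUTER JOIN master_src m ON d.emp_id = m.emp_id;

-- Full Outer join: all rows from both sources
SELECT * FROM detail_src d FULL OUTER JOIN master_src m ON d.emp_id = m.emp_id;
```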
94. What is the effect of the number of join conditions and the join order in a Joiner Transformation?
Answer: The join condition contains one or more equality conditions between ports of the specified master and detail sources. The ports in a condition must have the same data types; if the data types do not match, they must be converted explicitly.
The Designer validates the datatypes in a join condition. Additional ports in the join condition increase the time required to join the two sources. The order of the ports in the join condition can also affect the performance of the Joiner transformation: if there are multiple ports in the join condition, the Integration Service compares the ports in the specified order.
95. Explain the working of the Joiner transformation with the NULL value matching.
Answer: The Joiner transformation is unable to match NULL values.
For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows.
To join rows with null values, the NULL input can be replaced with default values in the Ports tab of the Joiner transformation, and the join is then performed on the default values.
If a result set includes fields that contain no data from either source, the Joiner transformation populates the empty fields with NULL values. If a field will return NULL and NULL values are not required in the target, a default value can be set on the Ports tab for the corresponding port.
96. Mention the transformations that cannot be placed between the sort origin and the Joiner transformation, so that the input sort order is not lost.
Answer: The Joiner transformation should be placed directly after the sort origin to maintain the sorted data. The following transformations should not be used between the sort origin and the Joiner transformation:
- Normalizer
- Custom
- Rank
- Unsorted Aggregator
- Union transformation
- XML Generator transformation
- XML Parser transformation
- Mapplet
97. Define the term ‘Sequence Generator Transformation’.
Answer: A Sequence Generator Transformation can be defined as a passive and connected transformation that is used to generate numeric values. It can be used for creating unique primary key values, replacing missing primary keys, or cycling through a sequential range of numbers. By default, this transformation contains only two output ports: CURRVAL and NEXTVAL. These ports can neither be edited nor deleted, and no other ports can be added to the transformation. It can create about two billion unique numeric values, with the widest range being from 1 to 2147483647.
98. Tell us about the Properties that are provided within the Sequence Generator transformation.
Answer:
- Start Value: The start value of the generated sequence that the Integration Service is supposed to use if the Cycle option is used. If the Cycle is selected, the Integration Service cycles back to this value when it reaches the end value. The default value is 0.
- Increment By: The difference between two consecutive values from the NEXTVAL port. The default value is 1.
- End Value: This is the maximum value generated by the SeqGen. After reaching this value the session will fail if the sequence generator is not configured to the cycle. The default value is 2147483647.
- Current Value: It stores the current value of the sequence. The user enters the value that the Integration Service should use as the first value in the sequence. The default value is 1.
- Cycle: If selected, when the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
- Number of Cached Values: It stores the number of sequential values the Integration Service caches at a time. The default value for a standard Sequence Generator is 0. The default value for a reusable Sequence Generator is 1,000.
- Reset: This is used to restart the sequence at the current value each time a session runs. This option is disabled for reusable Sequence Generator transformations.
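These properties map loosely onto the options of a database sequence. A hedged sketch in ANSI-style SQL (exact syntax varies by database; the sequence name is hypothetical):

```sql
-- Loose database-sequence analogy for the Sequence Generator properties
CREATE SEQUENCE order_key_seq
  START WITH   1            -- Current/Start Value
  INCREMENT BY 1            -- Increment By
  MAXVALUE     2147483647   -- End Value
  CYCLE                     -- Cycle option
  CACHE        1000;        -- Number of Cached Values
```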
99. Define the term ‘Aggregator Transformation’.
Answer: An aggregator is defined as an Active, Connected transformation which is used to perform aggregate calculations such as
- AVG
- COUNT
- FIRST
- LAST
- MAX
- MEDIAN
- MIN
- PERCENTILE
- STDDEV
- SUM
- VARIANCE
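Conceptually, an Aggregator grouped on one or more ports behaves like SQL aggregate functions combined with GROUP BY. A hedged sketch (hypothetical table and columns):

```sql
-- Rough SQL counterpart of an Aggregator grouped on customer_id
SELECT customer_id,
       COUNT(*)        AS order_count,
       SUM(amount)     AS total_amount,
       AVG(amount)     AS avg_amount,
       MAX(order_date) AS last_order_date
FROM   orders
GROUP BY customer_id;
```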
100. Differentiate between the Expression Transformation and Aggregator Transformation.
Answer: An Expression Transformation is used to perform calculation on a row-by-row basis while an Aggregator Transformation is used to perform calculations on groups.
101. Does the Informatica Aggregator transformation support only aggregate expressions?
Answer: No. Apart from aggregate expressions, the Informatica Aggregator also supports non-aggregate expressions and conditional clauses.
102. Mention the way in which the Aggregator Transformation treats NULL values.
Answer: The aggregator transformation treats null values as NULL in aggregate functions by default. The treatment of the null values in aggregate functions as NULL or zero can be specified explicitly as well.
103. Define the term ‘Incremental Aggregation’.
Answer: The Incremental Aggregation is a session operation that is used for a session that includes an Aggregator Transformation. When the Integration Service performs the incremental aggregation, the changed source data passes through the mapping and the historical cache data is used to perform aggregate calculations incrementally.
104. Mention the factors that are to be considered while working with the Aggregator Transformation.
Answer: These are the factors that are to be considered while working with the Aggregator Transformation:
- The unnecessary data should be filtered before aggregating it. A Filter transformation should be placed in the mapping before the Aggregator transformation in order to cause a reduction in the unnecessary aggregation.
- The performance can be improved by connecting only the necessary input/output ports to subsequent transformations and hence, the size of the data cache is reduced.
- The sorted input should be used as it reduces the amount of data cached and improves the session performance.
105. Explain the working of the Sorted Input for Aggregator Transformation.
Answer: The Integration Service creates the index and data cache files in memory in order to process the Aggregator transformation. If the Integration Service requires more space than is allocated for the index and data cache sizes in the transformation properties, the overflow values are stored in cache files, i.e., it pages to disk. Session performance can be improved by increasing the index and data cache sizes in the transformation properties.
106. Mention the cases where the selection of the Sorted Input in Aggregator will not enhance the performance of the session.
Answer: These are the cases where the selection of the Sorted Input in Aggregator will not enhance the performance of the session:
- The Incremental Aggregation option is enabled for the session.
- The aggregate expression contains nested aggregate functions.
- The source data is data driven.
107. Mention those cases where the selection of the Sorted Input in Aggregator might cause a failure in the session.
Answer: These are the cases where the selection of the Sorted Input in Aggregator might cause a failure in the session:
- If the input data is not sorted correctly, the session will fail.
- Even if the input data is properly sorted, the session may fail if the sort-order ports and the Group By ports of the aggregator are not in the same order.
108. What value is returned for a column in an Aggregator transformation that is neither part of an aggregate expression nor a group-by port?
Answer: The Integration Service produces one row for each group based on the group-by ports. For columns that are neither part of the group key nor part of an aggregate expression, the value of the last record of the group is returned. If the user explicitly specifies the FIRST function, the Integration Service returns the value of the first row of the group instead; by default the behavior corresponds to the LAST function.
109. Define the term ‘Rank Transform’.
Answer: The Rank Transform is an Active Connected Informatica transformation that is specifically used for selecting a set of top or bottom values of data.
110. Tell us the difference between the Rank Transform and the Aggregator Transform functions MAX and MIN.
Answer: The Rank transformation lets the user group information just like the Aggregator transformation. However, the Rank transform allows the user to select a group of top or bottom values, whereas the Aggregator's MAX and MIN functions return only a single value.
111. Define the terms ‘RANK port’ and ‘RANKINDEX’.
Answer: The ‘Rank port’ is used as an input/output port for specifying the column for which the source values have to be ranked.
Informatica creates an output port ‘RANKINDEX’ for each Rank transformation by default; it is used for storing the ranking position of each row in a group.
112. How can the user successfully achieve ranks on the basis of different groups?
Answer: The Rank transformation lets the user group information. One of the input/output ports can be designated as a group-by port; for each unique value in the group port, the transformation creates a group of rows falling within the rank definition.
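Conceptually, this is similar to the top-N-per-group pattern with SQL window functions. A hedged sketch (hypothetical schema):

```sql
-- SQL analogy: top 3 orders by amount within each customer group
SELECT customer_id, order_id, amount, rank_index
FROM (
    SELECT customer_id,
           order_id,
           amount,
           RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rank_index
    FROM   orders
) ranked
WHERE rank_index <= 3;
```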
113. Tell us about the condition when the values of two rank are similar.
Answer: When two rank values match, they receive the same value in the rank index and the transformation skips the next value.
114. Mention the restrictions imposed on Rank Transformation.
Answer: These are the restrictions imposed on the Rank Transformation:
- The ports can be connected from only one transformation to the Rank transformation.
- Only the top or bottom rank can be selected.
- The Number of records in each rank can be selected.
- Only one Rank port in a Rank transformation can be designated.
115. Tell us about the working of the Rank Cache.
Answer: During a session, the Integration Service compares an input row with the rows in the data cache. If the input row out-ranks a cached row, the Integration Service replaces the cached row with the input row. If the Rank transformation is configured to rank across different groups, the Integration Service ranks incrementally for each group it finds. The Integration Service creates an index cache to store the group information and a data cache for the row data.
116. Tell us about the way Rank transformation deals with different string values.
Answer: Rank transformation returns the strings at the top or the bottom of a session sort order. When the Integration Service is executed in the Unicode mode, the character data is sorted in the session using the selected sort order linked to the Code Page of IS which may be French, German, etc. When the Integration Service is executed in the ASCII mode, this setting is ignored and binary sort order is used to sort character data.
117. Define the term ‘Sorter Transformation’.
Answer: The Sorter Transformation can be defined as an active, connected Informatica transformation that is used for sorting data in ascending or descending order according to specified sort keys. It contains only input/output ports.
118. Tell us why the Sorter Transformation is active in nature.
Answer: When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key. The Integration Service then discards duplicate rows during the sort operation. The number of input rows can therefore differ from the number of output rows; hence, the Sorter Transformation is considered an active transformation.
119. Explain the working of the Sorter with Case Sensitive sorting.
Answer: The Case Sensitive property determines whether the Integration Service considers case when sorting data. When the Case Sensitive property is enabled, the Integration Service sorts uppercase characters higher than lowercase characters.
120. Explain the working of the Sorter with NULL values.
Answer: When the Null Treated Low property is enabled, null values are treated as lower than any other value during the sort operation. If the Integration Service should treat null values as higher than any other value, this property can be disabled.
121. Explain the working of the Sorter Cache.
Answer: The Integration Service passes all incoming data into the Sorter cache before the Sorter transformation performs the sort operation. The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. If it cannot allocate enough memory, the Integration Service fails the session.
To improve overall performance, the Sorter cache size should be configured with a value less than or equal to the amount of physical RAM available on the Integration Service machine. If the amount of incoming data is greater than the Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. When data has to be stored in the work directory, the Integration Service requires disk space of at least twice the amount of incoming data.
122. Define the term ‘Union Transformation’.
Answer: The Union transformation is an active, connected, non-blocking, multiple-input-group transformation used to merge data from multiple pipelines or sources into a single pipeline branch. Similar to the UNION ALL SQL statement, the Union transformation does not remove duplicate rows.
123. Mention the restrictions imposed on the Union Transformation.
Answer: Every input group and the output group must have matching ports. The precision, datatype, and scale should be identical across all groups. Multiple input groups can be created, but there is only one output group.
The Union transformation does not remove duplicate rows. A Sequence Generator or an Update Strategy transformation cannot be used upstream from a Union transformation. The Union transformation also does not generate transactions.
124. Define the term ‘Persistent Lookup Cache’.
Answer: Lookups are cached in Informatica by default. The lookup cache can be either non-persistent or persistent. The Integration Service saves or deletes the lookup cache files after a successful session run, depending on whether the lookup cache is configured as persistent or not.
125. Differentiate between Reusable transformation and a Mapplet.
Answer: A ‘Reusable Transformation’ is any Informatica transformation created in the Transformation Developer, or a non-reusable transformation promoted to a reusable transformation from the Mapping Designer, which can then be used in multiple mappings.
When a reusable transformation is added to a mapping, an instance of the transformation is added. The instance of a reusable transformation is a pointer to that transformation, so when the user changes the transformation in the Transformation Developer, all instances reflect those changes.
A Mapplet is a reusable object created in the Mapplet Designer that contains a set of transformations and allows the user to reuse that transformation logic in multiple mappings. It can contain any number of transformations according to the user’s needs.
Similar to a reusable transformation, when a Mapplet is used in a mapping, an instance of the Mapplet is used. Any change made to the Mapplet is inherited by all instances of the Mapplet.
126. Mention all those objects and transformations that cannot be included in a Mapplet.
Answer: The following objects and transformations cannot be included in a Mapplet:
- Normalizer
- COBOL sources
- XML sources
- XML Source Qualifier transformations
- Target definitions
- Pre-Session and Post-Session Stored Procedures
127. List out the ERROR tables provided within Informatica.
Answer: The following are the ERROR tables provided within Informatica:
- PMERR_TRANS– This table is used to store metadata about the source as well as the transformation ports when a transformation error occurs.
- PMERR_DATA– This table stores data as well as metadata about any transformation row error and the corresponding source row.
- PMERR_SESS– This table is used to store metadata about the session.
- PMERR_MSG– This table stores metadata about an error and the corresponding error message.
128. Differentiate between the STOP and ABORT commands.
Answer: The STOP command stops the Integration Service from reading data from the concerned source. The Integration Service continues processing, writing and committing the data it has already read to the targets. If the Integration Service cannot finish processing and committing data, the ABORT command needs to be used to stop the process and terminate the session.
The ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within this period, the DTM process is killed and the session is terminated.
129. Can a session be copied to a new folder or a new repository?
Answer: Yes, a session can be copied to a new folder or a new repository, provided the corresponding mapping is already present within it.
130. What are the types of join supported by Lookup?
Answer: A Lookup behaves like a SQL LEFT OUTER JOIN, and the left outer join is the only type of join it supports.
131. What are the types of groups provided within the Router Transformation?
Answer: The Router transformation provides an input group and two types of output groups: user-defined groups and the default group.
132. Define the term ‘Dimensional Tables’.
Answer: Dimensional tables are an important part of Informatica. They are designed to help segment and describe the company’s data in a hierarchical manner for easier access and comprehension.
133. What are the various methods used to implement parallel processing in Informatica?
Answer: There are a number of methods that can be used to implement parallel processing in Informatica, and the choice of method depends on the user and the context of the situation. Some of the methods used to implement parallel processing are pass-through, database and round-robin partitioning. All these methods are based on data partitioning.
134. How can distinct values be obtained while working with a mapping?
Answer: The user is required to choose the fields that need to be distinct. After accomplishing this task, the user needs to add an Aggregator transformation that groups by those specific fields before loading the data.
135. Why are certain rows rejected by Informatica?
Answer: Informatica uses DD_REJECT in the Update Strategy transformation to reject all those rows which do not fulfill the specified condition.
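For illustration only, a minimal sketch of an Update Strategy expression that uses DD_REJECT is shown below; the port name ORDER_AMOUNT is hypothetical and not part of the original answer:
IIF (ORDER_AMOUNT > 0, DD_INSERT, DD_REJECT)
Rows flagged with DD_REJECT do not reach the target; depending on the session settings, they are dropped or written to the session reject file.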
136. Why is an ETL tool required while working with Informatica?
Answer: With an ETL tool such as Informatica, the user does not need to write code to connect to multiple sources or handle errors manually. The ETL tool does not require complex hand-written code because these features are built in.
137. What are the differences between Joiner and Lookup Transformation?
Answer:
- In the Lookup transformation, the lookup query can be overridden, but the same cannot be achieved in the Joiner transformation.
- In a Lookup, operators such as >, <, >=, <= and != can be used in the lookup condition, but in a Joiner only the equal-to (=) operator can be used.
- In a Lookup, the number of rows read from a relational table can be restricted using the lookup override, but in a Joiner the number of rows cannot be restricted while reading.
- In a Joiner, tables can be joined using a Normal, Master Outer, Detail Outer or Full Outer join, but this is not possible in a Lookup. A Lookup behaves like the left outer join of a database.
138. Define the term ‘Lookup Transformation’. Explain the types of Lookup transformation.
Answer: Lookup transformation in a mapping is used for looking up data in a flat file, relational table, or view. A lookup definition can also be created from a source qualifier.
These are the different types of Lookup:
- Relational or flat file lookup which is used to perform a Lookup on a Flat File or a Relational Table.
- Pipeline lookup, which is used to perform a lookup on application sources such as JMS or MSMQ.
- Connected or Unconnected lookup:
o A Connected Lookup transformation receives source data, performs a lookup, and returns data to the pipeline.
o An Unconnected Lookup transformation is not connected to a source or target. A transformation in the pipeline calls the Lookup transformation with a :LKP expression, and the Unconnected Lookup transformation returns one column to the calling transformation (a minimal call sketch follows this list).
- Cached or un-cached lookup: the Lookup transformation can be configured to cache the lookup data or to query the lookup source directly every time the lookup is called. If the lookup source is a flat file, the lookup is always cached.
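As a minimal illustration of the unconnected case, the call below assumes a hypothetical unconnected lookup named LKP_GET_CUST_NAME with an input port for the customer ID and a designated return port; the names are illustrative only:
:LKP.LKP_GET_CUST_NAME(CUST_ID)
This expression would typically be placed in an output or variable port of an Expression transformation, and the single value returned comes from the port marked as the return port in the Lookup transformation.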
139. Tell us about the different types of Caches provided within Lookup.
Answer: There are two types of Lookup Caches on the basis of the configuration done at Lookup Transformation/ Session Property Level:
- Un-cached lookup – The Lookup transformation does not create a cache. For every record, the lookup source is accessed, the lookup is performed and the corresponding value is returned. So for 10K rows, the lookup source will be accessed 10K times to get the related values.
- Cached lookup – The Lookup transformation can be configured to create a cache, which reduces the to-and-fro communication between the Informatica server and the lookup source. The entire data from the lookup source is cached and all lookups are performed against this cache.
Based on the type of cache configured, there are two types of caches: static and dynamic. The Integration Service behaves differently depending on which lookup cache has been configured. A lookup cache can also be persistent or shared:
1) Persistent Cache: By default, the lookup caches are deleted after the successful completion of the respective sessions. However, the user can configure the caches to be preserved in order to reuse them the next time.
2) Shared Cache: The lookup cache can be shared between multiple transformations. An unnamed cache can be shared between transformations in the same mapping, and a named cache can be shared between transformations in the same or different mappings.
140. Tell us about the reason for which both Update Strategy and Union Transformations are considered active.
Answer: The Update Strategy transformation is generally used to change row types, as it assigns the row type based on the expression created to evaluate the concerned rows. For example,
IIF (ISNULL (CUST_DIM_KEY), DD_INSERT, DD_UPDATE).
This expression changes the row type to insert for rows in which CUST_DIM_KEY is NULL and to update for rows in which CUST_DIM_KEY is NOT NULL. The Update Strategy can also be used to reject rows, and the user can filter out some rows by using proper conditions. As a result, the number of input rows might not be equal to the number of output rows. For example,
IIF (ISNULL (CUST_DIM_KEY), DD_INSERT,
IIF (SRC_CUST_ID != TGT_CUST_ID, DD_UPDATE, DD_REJECT))
Here, the expression first checks whether CUST_DIM_KEY is NULL; if it is, the row is flagged for insert. Otherwise, if SRC_CUST_ID is not equal to TGT_CUST_ID, the row is flagged for update; if the two values are equal, no action is required and the row is rejected.
In the Union transformation, the total number of rows passing into the Union is the same as the total number of rows passing out of it. However, the positions of the rows are not preserved, i.e., row number 1 from input stream 1 might not be row number 1 in the output stream. Since the Union does not guarantee that the output is repeatable, it is considered an active transformation.
141. How can the user successfully load alternate records into different tables using the mapping flow?
Answer: The basic idea behind achieving this is to add a sequence number to the records and then divide each sequence number by 2. Records whose sequence number is divisible by 2 are routed to one target, and the remaining records are routed to another target (a sketch of the expressions follows the steps below).
- The source needs to be dragged in and connected to an Expression transformation.
- The NEXTVAL port of a Sequence Generator should be connected to the Expression transformation.
- In the Expression transformation, two output ports need to be added, one for odd and one for even sequence numbers.
- A Router transformation should be connected to the Expression transformation.
- Two groups should be created within the Router, one for even and one for odd records.
- Finally, the two groups need to be connected to the different targets.
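A minimal sketch of the expressions involved, assuming the Sequence Generator port is named NEXTVAL and using illustrative port names that are not prescribed by the original answer:
O_EVEN = IIF(MOD(NEXTVAL, 2) = 0, 1, 0)
O_ODD = IIF(MOD(NEXTVAL, 2) = 1, 1, 0)
Router EVEN group filter condition: O_EVEN = 1
Router ODD group filter condition: O_ODD = 1
The Router then sends the EVEN group to one target and the ODD group to the other.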
142. Explain the process of loading only the first and the last record into the target table. Tell us about the method used to achieve this.
Answer: Only the first and the last record can be loaded into the target table by following the given steps:
- The user should drag and drop the ports from the source qualifier to two Rank transformations.
- A reusable Sequence Generator must be created with a start value of 1, and its next value should be connected to both Rank transformations.
- The newly added sequence port should be chosen as the Rank port; one Rank transformation should be set to return the top rank and the other the bottom rank, each with the number of ranks set to 1.
- Two instances of the target should be created.
- The output ports should be connected to the targets.
143. Consider a situation where there are two source tables with different structures, but the user wants to load them into a single target table. Explain the process through which this can be achieved.
Answer: The following approaches can be used when the user wants to load two source tables with different structures into a single target table:
- A Joiner transformation can be used to join the data sources, using a matching column to join the tables.
- A Union transformation can also be used if the tables have some common columns and the data needs to be joined vertically. One Union transformation needs to be created, the matching ports from the two sources need to be connected to two different input groups, and the single output group can then be sent to the target.
Using either of these approaches, the user can move the data from two sources to a single target.
Informatica is a very popular and leading data warehouse tool. A majority of organizations across the globe use Informatica for their data warehouse applications. Due to its high reliability and excellent user interface, Informatica has been consistently gaining popularity. Though several other tools have come into the market, Informatica is still the leader in the data warehouse segment. Hence, the demand for Informatica is expected to stay, and there will be consistent job opportunities for Informatica resources. The 143 top Informatica interview questions given above will help you succeed in your job interview.