David Aldridge, April 10, 2006 - 5:21 pm UTC
A pure Informatica process is mostly unfeasible -- you need to call stored procedures to disable and rebuild indexes,validate dimensions and whatnot.
The one problem that irritates me most with Informatica is its insistence that 2.5*1.3=3.24999999... (for example), which can ruin your day when you need to compare three data values to see if a*b=c. In those cases I've always reverted to embedding the logic in a decode statement as part of a view or a SQL override on the source qualifier. Once you start doing that sort of thing its a slippery slope ....
a*b=c
Kirtan, March 28, 2007 - 3:33 pm UTC
I stumblled across this question while browsing through asktom.
David,
Just an FYI for future, you could turn high precision on in Informatica to do that sort of things. I have been using informatica for a while now and I find the combination of PL/SQL and Informatica running side by side quite prodcutive.
Many Instances of Wrong
A reader, August 14, 2007 - 9:17 am UTC
I am frustrated with my company's insistance on using Informatica for simple data movements. For example, we are pulling data from a remote database via a single SQL query and DB LINK. There are no calculations being performed, there are very few rows being returned. In short, there's no ETL at all. Just a remote DB query.
As we were having our production meeting, the suggestion is made that we change those queries to use Informatica because "we don't want hits on the database". Which, to me, sounds nonsensical since Informatica has to hit the database as well.
It also sounds to me as though they are trying to create work since it takes several weeks to productionize even the most simple workflows.
A database link takes 5 minutes to create and use. An Informatica workflow takes several weeks.
There's no comparison in that case.
To me, if the purpose is for E T L, then Informatica is the right tool. If youre trying to create work, then go to a different company...
thoughts?
August 20, 2007 - 11:04 am UTC
my thoughts....
I don't get their argument about "we don't want hits on the database". For your stated reason (informatica will actually HIT THE DATABASE HARDER than your current database link + single sql statement), as well as the fact that - well, why did you buy a database if you are afraid to use it?
Need an expert review of a statement from book: "The datawarehouse ETL Toolkit"
Giri, June 12, 2008 - 2:52 am UTC
Hi Tom,
In this context of PL/SQL vs ETL, here is a statement from the book: "The datawarehouse ETL Toolkit". Author: Ralph Kimball. (Page:300, Chapter:7, Published:2004, ISBN:0-764-56757-8)
"ETL Products are specifically designed to extract, transform, and load massive amounts like no other nondata wareousing solution. With minimal exception, most databases are designed to support transactional and operational applications. Database-procedural programming is good for supporting data-entry applications but is not optimal for processing large data sets at once. The use of cursors-where each record is analyzed individually before moving on to the next-is notoriously slow and usually results in unacceptable prefromance while processing very large data sets. Instead of using procedures stored within the databses, it's beneficial to utilize the ETL engine for manipulating and managing the data".
If i read these sentences carefully, i have following observations.
1) He means to imply that incorrectly implemented cursors like row by row, slow by slow ones, but not referring to correct implementations like ref cursors and bulk fetch/collect.
2) Some of the statements in this paragraph needs to be taken with a grain of salt and it all boils down to the fact that "it depends" and needs to be bench marked.
3) I some how don't agree with this statement. Probably Kimball went a little overboard against db and more biased towards an ETL Tool.
Database-procedural programming is good for supporting data-entry applications but is not optimal for processing large data sets at once.
4) Don't believe every thing written in books even by gurus, and probably some of the statements apply for very specific scenarios, most of the time incorrectly implemented scenarios.
It would be great if you could provide your expert observation on the ETL tool.
June 12, 2008 - 8:05 am UTC
1) he means "non-bulk things", yes, if you do all of your ETL slow by slow (which, in my experience, most of these external ETL engines do!!!! They are notorious for slow by slow processing - they are not designed "in bulk" as he suggests - they should be, but are not), it'll be very slow.
If you can do it in a single sql statement - there you go, best ever. That is the MOST optimal way to process large data sets (never let it OUT of the database, "etl engines" tend to a) extract (hence the E), b) write to files on disk, c) transform (hence the T), d) write to files on disk, and then e) load (hence the L).
I'd rather just T. Skip E and L.
2) correct, as with everything here
3) I concurr. There is a bit of database independence going on there as well - which if you get caught up in, you'll end up doing things in a manner that gets poor performance in one database (the one you initially develop on) and horrible performance on the rest (those to which you "port")
PLSQL is just a language, SQL is a language, Java, C, fortran - they are all languages. the ETL engine is written in *some* language and that language would ultimately be comparable to PLSQL as far as programming constructs. You can write bad plsql code, you can write bad C code. It is in the implementation - your algorithms.
Look at the ETL engine that is OWB (Oracle Warehouse Builder). It generates............ PLSQL for everything (it skips the E and L steps whenever possible). It has the ability to generate slow by slow, bulk process, or single SQL statement implementations - you choose for the most part.
A language should not be damned for what it is generally used for.
4) perfect.
There should be some kind of framework for ETL
Reader, July 28, 2008 - 6:58 am UTC
I have used Informatica and when it came to performance we had a hard time to get the data in-time before business hours.
I am not sure if the latest version has improved, not using it anymore.
I think if we have data in Oracle DB itself we can very well use SQL, PL/SQL which might give better performance.
For the above solution to be more productive (in terms of development time) I think there should be some kind of framework (built in Oracle or on top of it by 3rd party) using PL/SQL which can take care of common task involved in ETL, just like Java has so many framework for specific areas. I am not sure if something similar already exists.
It should just provide a common framework which can be easily used and extended. It should not end up doing everything and become like a ETL tool :)
What do you think?
PL/SQL Vs ETL !!!
Ankit, January 20, 2012 - 12:38 am UTC
Hi Tom
Its wonderfull to read your Oracle words of wisdom. Thanks for all that :))
I am currently working on a DWH application (build on Oracle 10g R2) which uses PL/SQL procedures for extracting data via a DB link into intermediate STAGING tables, then use PL/SQL procedures to transform it a bit and load into MAIN tables, majorly using MERGE statements. Currently the load takes Max 3.5 hrs to finish. Indexes and hints are in place, a job runs weekly for statistics collection on schema. There are around 550 jobs and 450 tables to be filled up. The guy who developed it said "what's the need to use some ETL tool when everything at-last has to get converted into PL/SQL only".
I need to propose value addition things for this application.
I want to know :
1) Will it be beneficial to use some ETL tool instead of above given approach for this application ? If yes, what benefits are there with ETL tools ? Which tool do you propose, OWB or informatica ?
2) How much true is the statement "what's the need to use some ETL tool when everything atlast has to get converted into PL/SQL only" ?
3) Would you suggest anything other than MERGE statement to improve efficiency ?
Any other suggestion from you will be highly appreciated.
Thanks.
January 20, 2012 - 9:47 am UTC
Indexes and hints are in place,
that scares me. Don't like the idea of hints and if your ETL uses a ton of indexes, you must be doing slow by slow processing :(
1) see the original answer above, starting with the "to me - the primary reason..".
2) I would say it is not entirely accurate. I would like it to be converted into SQL only with a tiny bit of PLSQL if necessary.
3) avoid conventional path insert (insert without append), avoid update, avoid delete, avoid merge. Use create table as select (CTAS) or insert /*+ APPEND */ to load massive amounts of information. Instead of update or merge - do the operation as a create table as select and get rid of the old information (this can happen at the partition level or whatever).
insert (without append), update, delete, merge - invented mostly for OLTP, not for doing millions of rows.
CTAS, direct path load (insert /*+ APPEND */) invented for warehousing - it skips undo, can skip redo, and allows the resulting data to be compressed to the max.
Difference between ETL & ELT
Waran, September 03, 2013 - 7:58 am UTC
What is the difference between ETL and ELT
September 04, 2013 - 6:58 pm UTC
it is much harder and less efficient in general to do the T (transformation) OUTSIDE of the database. the database was born to transform data with big set based direct path operations.
so extracting to disk, transforming to disk, loading back into a database
is in my experience much more tedious and much slower than
extract and load into the database, transform
or even
extract and (load & transform) into the database
http://www.youtube.com/watch?v=2jWq-VUeOGs
get over the wrong concepts
Amir Riaz, September 08, 2013 - 7:57 am UTC
A lot of wrong concepts have been projected in this thread.
in reality there is no pure ETL and ELT loading. A well designed data loading is a mixture of both. That is why Tools like Informatica gives both.
With ELT loading, the biggest problem is to transfer the data between heterogeneous systems. for example between teradata and oracle. while Oracle provides gateways and other technique most of them are time consuming and unrealistic in nature.
Oracle ELT tool ODI implements the data movement between heterogeneous systems by JDBC, though File transfer. none of them are good and they show the real weakness of that tool
Oracle after so many years still not able to understand the requirements of a true ETL or ELT tool, because most of them are either DBAs or PL/SQL programmers.
September 10, 2013 - 9:40 am UTC
huh???
elt = extract, load and then transform (forget any tools, if you have hundreds of gb's or tb's of data - you'll be doing this down to the wire, not with pretty pictures and push buttons)
etl = extract, transform and then load - without using the database to transform
elt = going light speed
etl = going by boat
Amir Riaz, September 14, 2013 - 12:03 pm UTC
Dear Tom,
yes, you are right and I understand that but did you ever evaluated ODI against some other tools. They are trying to fix it from last 6 years and still there are no where.
OBIEE and Exadata are wonderful products but once you look for ELT tool from oracle, ODI really presents poor choice. with poor interfaces and illogical descriptive style.
just describe how do you move data from teradata to oracle if we cannot use oracle gateways, the data is huge and require parallel processing.
A few weeks back i have this problem and only solution was to use Oracle gateways, image what its consequence will be, if you everything you have to go to DBA.
ETL
Dragutin Jastrebic from France, September 15, 2013 - 7:16 pm UTC
I have worked on several datawarehouse projects,and we have always used the staging area (extract, transform) inside the database.
But there are many different situations.
For exemple, in his book "The datawarehouse ETL Toolkit" (Chapter 2: ETL data structures) , Ralph Kimball gives arguments in favor of database tables but also arguments in favor of flat files
Dragutin
ETL
David Aldridge, September 17, 2013 - 6:56 am UTC
Just as background information the last time I worked with Informatica Powercenter, about 5 years ago, there was an option to push code execution down to the database rather than running it through the Powercenter server. So, if you had a simple transformation within a single database it could be rewritten to a SQL statement -- an update, for example -- and executed on the database server.
This was a performance enhancement for which an extra license fee was payable.
So just to clarify, if you had paid for Powercenter but still needed higher performance you could pay extra to not use Powercenter.
Alexander, April 21, 2014 - 4:29 pm UTC
Hi Tom.
I have a strange problem that I'm hoping to get any ideas you may have. We use an Oracle product called OBIEE. It has an ETL and recently the support team had a problem in both production and test, where a query with no where clause was not returning all the rows it should have been. For example, the query was supposed to return about 100k rows from a table but was only getting about 80-90k. For some reason, dropping and recreating the table resolved the issue, and after it returned all the rows.
Can you think of anything that could possibly cause this? I don't even know where to begin it's such a strange problem. I'm trying to rule out the database as a cause for the post-mortem analysis.
This is the query just so you can see it:
SELECT EMPLOYEE_NUMBER,
TASK_NAME,
RESULTS_ACHIEVED,
POTENTIAL,
RESULTS_POTENTIAL_RAW,
RESULTS_POTENTIAL,
READINESS_2011,
READINESS,
IC_READY_TO_BE_AN_FLM,
PARA_READY_TO_BE_A_SUPERVISOR,
ARE_YOU_WILLING_TO_RELOCATE,
RETENTION_RISK,
IMPACT_OF_LOSS,
POTENTIAL_NEXT_ROLES,
READY_FOR_IMMED_MOVE_FLG
FROM
WC_ELC_TALENT_MSTR_ALL