Pauline, July 03, 2001 - 11:33 am UTC
Peter
Pierre Bilodeau, December 19, 2001 - 9:06 am UTC
Thanks for your tips
Safely fixing the duplicates
Steve Reich, May 08, 2002 - 10:46 am UTC
Be careful deleting the duplicate constraints. Some of the duplicate constraints are the not null check constraints automatically created when you define a column as not nullable. If you drop the original check constraint then the nullable field in user_tab_columns will be set to 'Y', even if you preserve a duplicate constraint, so be sure not to drop the original.
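A quick way to see the behavior Steve describes (just a sketch -- the table name is made up, and SYS_C001234 stands in for whatever system-generated name your query actually returns):

create table demo_nn ( x number not null );

-- the NOT NULL shows up as a system-named check constraint
select constraint_name, constraint_type, search_condition
  from user_constraints
 where table_name = 'DEMO_NN';

-- dropping that system-generated constraint makes the column nullable again
-- (user_tab_columns.nullable goes back to 'Y'), even if you have created a
-- duplicate "X IS NOT NULL" check constraint of your own
alter table demo_nn drop constraint SYS_C001234;

select column_name, nullable
  from user_tab_columns
 where table_name = 'DEMO_NN';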
what constraints are created ?
A reader, December 30, 2002 - 2:11 pm UTC
Hi tom,
Can you please tell me, or point me to the link (website) where I can look for:
1.) how many and what constraints are created when I
create a primary key
2.) how many and what constraints are created when I
create a foreign key
3.) and how to name those implicit constraints?
December 30, 2002 - 2:58 pm UTC
It is the Oracle SQL Reference guide.
ops$tkyte@ORA920> create table t ( x int , constraint t_pk primary key(x) )
2 /
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> select count(*) from user_constraints;
COUNT(*)
----------
1
A constraint creates ONE constraint. a primary key is a constraint, a foreign key is a constraint. one constraint each.
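As for naming them yourself -- you use the constraint keyword, the same way t_pk was named above. A minimal sketch (the table and constraint names here are only examples):

create table dept
( deptno   number,
  constraint dept_pk primary key (deptno)
);

create table emp
( empno    number,
  deptno   number,
  constraint emp_pk primary key (empno),
  constraint emp_dept_fk foreign key (deptno) references dept(deptno)
);

If you leave out the constraint clause, Oracle generates a SYS_C... name for you.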
Duplicate checker
PJ, August 24, 2004 - 2:58 pm UTC
Hi Tom,
Hope you are doing fine.
We have a requirement here.
We are expecting an input flat file with 5 million rows every day, for 30 days,
and we have to compare each row of this file with the previous 30 days' records.
So any suggestions for incorporating this "Duplicate checking" functionality?
I was thinking of a sqlldr direct path load of the file, with an additional last_modified_date field. Then on the 2nd day, load the file again -- it will put the duplicate records in the BAD file -- and then extract the rows from the table with last_modified_date = sysdate.
But I was concerned about how sqlldr will behave when we reach 150 million rows.
You must have a unique (YOUR) way of doing this.
Please help as usual.
August 24, 2004 - 3:44 pm UTC
why not just have a unique constraint on the columns to check?
sqlldr will write failed records to a bad file.
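A sketch of what that could look like (table, column, and file names are made up, and this assumes a conventional-path load -- a direct-path load does not reject unique-constraint violations row by row):

create table daily_load
( key1  varchar2(30),
  key2  varchar2(30),
  constraint daily_load_uk unique (key1, key2)
);

-- control file (daily.ctl)
load data
infile 'daily.dat'
badfile 'daily.bad'
append
into table daily_load
fields terminated by ','
( key1, key2 )

Rows that duplicate an existing (key1, key2) fail the insert and end up in daily.bad.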
we want GOOD records back in file
PJ, August 25, 2004 - 4:35 am UTC
Hi Tom,
I guess I was not clear in describing my requirement. Sorry for that.
We are expecting a flat file with ~5M records every day. We have to remove the duplicates from that file (comparing against a rolling window of 30 days... don't know whether it's a good idea to store this in the database or keep it in a file only) and then pass the file on after removing the duplicates. So duplicate checking is just one part of the operation.
We have yet to decide whether to load these records into the DB and compare, or to create some hash value of the unique key (6-7 columns make it unique).
Is there any way sqlldr can create a file of GOOD records, i.e. the opposite of the BAD file, so that after loading we can simply forward that file?
I hope I'm clear with this, please let me know your views on the various approaches.
Many thanks,
August 25, 2004 - 7:40 am UTC
sqlldr only knows how to put good data into the database.
SO DB is not a good option?
PJ, August 25, 2004 - 7:47 am UTC
Tom,
So do you mean to say putting data into database in this case is not a good option?
Because if we load the data, the good records will be put into the DB, but we then want to extract today's (~5M) records back into a file to process further.
So it will be a round trip to Oracle (one load, one extract) just to check the duplicates.
Can we create some hash key using a set of columns, store only that in a different table, and then compare -- or something like that?
Thanks for your continuous help.
August 25, 2004 - 7:52 am UTC
if you want the data outside of the database, not sure why you are putting it into the database.
seems once I put data IN, i keep it IN, and i do my "further processing" IN the database myself.
so I would be looking for ways to turn the problem around -- rather than "put it in, take it out", I'd be looking to "put it in, keep it in, process it in"
but sure, you can put in "good" and "unload good" -- that should be "obvious"?
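One way to read "put in good, unload good" (just a sketch with made-up table names; it assumes each day's file is first loaded into a staging table via sqlldr or an external table):

-- keep only the rows not already seen in the rolling history
insert into history ( key1, key2, load_date )
select key1, key2, sysdate
  from stage s
 where not exists ( select null
                      from history h
                     where h.key1 = s.key1
                       and h.key2 = s.key2 );

-- today's non-duplicate rows can then be spooled back to a flat file
-- from sqlplus, e.g.:
--   set heading off pagesize 0 feedback off trimspool on
--   spool good_records.dat
select key1 || ',' || key2
  from history
 where load_date >= trunc(sysdate);
--   spool off

-- and the 30-day window is kept by purging old rows
delete from history where load_date < sysdate - 30;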
PJ, August 25, 2004 - 7:59 am UTC
Thanks for such a quick reply Tom.
You are right. The data is actually outside of the DB, but we might need to store 6 months' worth of data in the future, either in a flat file or in a table (not yet decided). So I was asked for my opinion by my boss, and I suggested that if we are going to store it in tables, why not load the data into a table -- the duplicate rows won't get inserted -- and then unload the data again every day (~5 million rows) to proceed further.
August 25, 2004 - 8:11 am UTC
why "unload" -- that is, and will be my question.
once it gets into the database, where it is safe, usable, secure, recoverable, etc -- why take it back out.
Application requires so.
PJ, August 25, 2004 - 9:28 am UTC
It's a typical telecom application where we need to further enrich this non-duplicate data.
These records in the file need to be processed further.
So we need it back in a flat file. But if we don't load it into the DB in the first place, then we'll need to write something in C or some other language to find the duplicate rows (5M every day compared against 150M).
I hope I'm clear this time.
Thanks as always Tom.
August 25, 2004 - 10:54 am UTC
well, i'll just keep coming back to "there is absolutely no better place to further process DATA than in a DATAbase"
if you don't want to use the database -- you'll need to sort of figure out what works best for you. me, i use the DATAbase to process DATA.
so -- it is totally up to you where and how you do this, i've no comment really. at a loss to understand how a "typical telecom application" would not take advantage of a database (since the telco applications I've seen sort of *do* use the database)