Hi Tom
It looks simple but I am unable to do this efficiently.
CUST_LEGACY table has duplicate records on BRAND. So basically, I want to group by PRODUCT only.
create table customer(cust_id char(4), product varchar2(50), make varchar2(50), qty number);
insert into customer (cust_id) values ('ABR1');
insert into customer (cust_id) values ('ABR2');
insert into customer (cust_id) values ('CD01');
insert into customer (cust_id) values ('CD08');
create table cust_legacy(cust_id char(4), product varchar2(50), brand varchar2(50), qty number);
insert into cust_legacy values ('ABR1', 'DVD', 'Sony', 5);
insert into cust_legacy values ('ABR1', 'DVD', 'Canon',6);
insert into cust_legacy values ('ABR2', 'Camera','Sony',8);
insert into cust_legacy values ('ABR2', 'Camera', 'Nikon',23);
insert into cust_legacy values ('ABR1', 'Laptop', 'Dell', 2);
Now, I have to update the customer table based on CUST_ID and irrespective of the brand, we have to group by product only from cust_legacy and where there is no duplicate in cust_legacy.
update customer a set (product, make, qty) =
(select product, brand, qty from cust_legacy b where
product= (select product from cust_legacy c where a.cust_id=b.cust_id group by
product having count(*) =1)
);
SQL> select * from customer;
CUST PRODUCT MAKE QTY
---- -------- -------- ----------
ABR1 Laptop Dell 2
ABR2
CD01
CD08
As cust_legacy table can have about 12 million records for updation in customer table, I am looking for an efficient SQL and I came up with this below but as you can see, it throws "invalid identifier" error.
UPDATE
CUSTOMER A
SET
(A.PRODUCT, A.MAKE, A.QTY) =
( SELECT
PRODUCT, BRAND, QTY FROM
(SELECT PRODUCT, BRAND, QTY, COUNT(*) OVER (PARTITION BY PRODUCT) CNT FROM CUST_LEGACY B
WHERE A.CUST_ID = B.CUST_ID)
) WHERE CNT=1
);
This throws A.CUST_ID as invalid identifier. I understand that table CUSTOMER is not visible inside the second subquery. How do I resolve this and also at the same time, efficiently update the CUSTOMER table?
Thanks,
VBala
You can change the count(*) to include the cust_id in the partition. This enables you to find those customers and products with a single row. This allows you to reference cust_legacy just once. You can then join on cust_id in the subquery:
update customer a
set ( a.product, a.make, a.qty ) =
(select product, brand, qty
from
(select cust_id, product, brand,
qty, count ( * ) over ( partition by cust_id, product ) cnt
from cust_legacy b
) b
where cnt =1
and a.cust_id = b.cust_id
) ;
4 rows updated.
select * from customer;
CUST PRODUCT MAKE QTY
---- ---------- ---------- ----------
ABR1 Laptop Dell 2
ABR2
CD01
CD08I don't fully understand the logic though. Is it guaranteed that a customer will only have one product that doesn't have duplicates?
If not, you'll hit errors:
insert into cust_legacy values ('ABR1', 'Camera', 'Nikon',23);
1 row inserted.
update customer a
set ( a.product, a.make, a.qty ) =
(select product, brand, qty
from
(select cust_id, product, brand,
qty, count ( * ) over ( partition by cust_id, product ) cnt
from cust_legacy b
) b
where cnt = 1
and a.cust_id = b.cust_id
) ;
SQL Error: ORA-01427: single-row subquery returns more than one row