drop table test purge;
create table test (ssn int, address varchar2(10), tax_id varchar2(10), identify int);
insert into test values(100, '','', null);
insert into test values(111, '','', null);
insert into test values(111, 'xyz','', null);
insert into test values(null, 'xyz','', null);
insert into test values(111, '','222', null);
insert into test values(null, '','222', null);
insert into test values(122, '','', null);
commit;
--Question--
Could you please help me write a better query to assign a unique id for IDENTIFY field,
so as to associate the same ID having same SSN or ADDRESS or TAX_ID assuming the person to be same.
where; "-" can be any value or null.
SSN ADDRESS TAX_ID IDENTIFY
100 - - - --> 1 -- a new id
111 - - - --> 2 -- a new id
111 xyz - - --> 2 -- ssn same
- xyz - - --> 2 -- address same
111 - 222 - --> 2 -- ssn same
- - 222 - --> 2 -- tax_id same
122 - - - --> 3 -- a new id
7 rows selected.
So you want to assign an id, based on the SSN? But some rows have no SSN defined?
If so, you can find an SSN for the rows without one by joining to the table again. Look for rows where the address or tax_id matches the current row:
select t.*,
case
when ssn is null then
(select min(ssn) from test t2
where (t2.address = t.address) or
(t2.tax_id = t.tax_id)
)
else ssn
end ssn_v2
from test t;
SSN ADDRESS TAX_ID IDENTIFY SSN_V2
100 100
111 111
111 xyz 111
xyz 111
111 222 111
222 111
122 122
You can then use dense_rank, ordering by this expression, to assign the IDs:
select t.*,
dense_rank() over (order by case
when ssn is null then
(select min(ssn) from test t2
where (t2.address = t.address) or
(t2.tax_id = t.tax_id)
)
else ssn
end) id
from test t;
SSN ADDRESS TAX_ID IDENTIFY ID
100 1
111 xyz 2
111 2
xyz 2
111 222 2
222 2
122 3
Note: for this to work, you need to be sure that the columns ADDRESS and TAX_ID uniquely identify someone. TAX_ID probably does, but ADDRESS almost certainly doesn't!
Most houses have multiple living in them. So if you have just an address with no SSN or TAX_ID you could incorrectly merge people...
For example, if you have another person at address xyz:
insert into test values(133, 'xyz','', null);
Which SSN does the row with only xyz set belong to? You don't know!