Skip to Main Content
  • Questions
  • delete based on two columns and property of another table

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Robert.

Asked: October 24, 2011 - 12:49 pm UTC

Last updated: October 25, 2011 - 9:26 am UTC

Version: 10g xe

Viewed 10K+ times! This question is

You Asked

create table hwsets_v4 ( id number(38), qlist CLOB, primary key(id) );

insert into hwsets_v4 values (3129, '1@2386:1@2387:1@2388:1@2389:1@2391:1@2392:1@2393:1@2394/2395;2396/2877');
insert into hwsets_v4 values (3130, '1@2664:1@2672:1@2673:1@2674:1@2675:1@2677:1@2678:1@3425:1@3426:1@3706:1@6058');

create table responses_v5 ( user_id varchar2(20), hw_id number(38), pb_id number(38), tries number(38), grade number(3,2), response CLOB, primary key(user_id, hw_id, pb_id, tries) );

insert into responses_v5 values ( 'bob', 3129, 2386, 1, 1.0, 'response1' );
insert into responses_v5 values ( 'bob', 3129, 2390, 1, 1.0, 'response2' );
insert into responses_v5 values ( 'bob', 3129, 2394, 1, 1.0, 'response3' );
insert into responses_v5 values ( 'bob', 3129, 2395, 1, 1.0, 'response4' );
insert into responses_v5 values ( 'bob', 3129, 2877, 1, 1.0, 'response5' );
insert into responses_v5 values ( 'refoyl', 3129, 2386, 1, 1.0, 'response6' );
insert into responses_v5 values ( 'refoyl', 3129, 2390, 1, 0.0, 'response7' );
insert into responses_v5 values ( 'refoyl', 3129, 2390, 2, 1.0, 'response8' );
insert into responses_v5 values ( 'refoyl', 3129, 2391, 1, 0.0, 'response9' );
insert into responses_v5 values ( 'refoyl', 3129, 2391, 2, 1.0, 'response10' );
insert into responses_v5 values ( 'refoyl', 3130, 2391, 1, 1.0, 'response11' );
insert into responses_v5 values ( 'refoyl', 3130, 2673, 1, 1.0, 'response12' );


I want to delete all rows from responses_v5 in which responses_v5.pb_id no longer appears in hwsets_v4.qlist corresponding to hwsets_v4.hw_id = responses_v5.hw_id. In the example above, that means the rows with responses 2, 7, 8, and 11.

Seems simple enough, but the simplest query that seems reasonable is not valid:

SQL> delete from responses_v5 where responses_v5.hw_id = hwsets_v4.hw_id and REGEXP_INSTR(hwsets_v4.qlist, '[@/;]' || responses_v5.pb_id || '([/;:]|$)') <= 0;
                                                                                     *
ERROR at line 1:
ORA-00904: "HWSETS_V4"."QLIST": invalid identifier


This works:

delete from responses_v5 where hw_id || ' ' || pb_id in (select unique responses_v5.hw_id || ' ' || responses_v5.pb_id from responses_v5, hwsets_v4 where responses_v5.hw_id = hwsets_v4.hw_id and REGEXP_INSTR(hwsets_v4.qlist, '[@/;]' ||  responses_v5.pb_id || '([/;:]|$)') <= 0);


but I'm sure there's a way to do it that doesn't rely on concatenation.

It would be easy if there were only one primary key in responses_v5, but the fact that there are two keys that both need to be considered, plus necessary information in a second table, is what's throwing me for a loop.


and Tom said...

Well, a couple of methods pop into mind.

First of all - you don't need concatenation at all - when using IN or NOT IN, you can do this:

where (a,b,c) in (select d,e,f from ....)

You can use multiple columns - so instead of "where hw_id||' '||pb_id in (select esponses_v5.hw_id || ' ' || responses_v5.pb_id" - you would just code

"where (hw_id,pb_id) in (select responses_v5.hw_id, responses_v5.pb_id ..."

Note that I did not use UNIQUE, you never need the unique in that set operation, IN takes care of that - the only thing the UNIQUE could be doing it impacting runtime query performance negatively.

ops$tkyte%ORA11GR2> delete from responses_v5
  2   where (hw_id, pb_id) in (select responses_v5.hw_id, responses_v5.pb_id
  3                              from responses_v5, hwsets_v4
  4                             where responses_v5.hw_id = hwsets_v4.id
  5                                                       and REGEXP_INSTR(hwsets_v4.qlist,
  6                                   '[@/;]' ||  responses_v5.pb_id || '([/;:]|$)') <= 0);

4 rows deleted.




Here is another approach:

ops$tkyte%ORA11GR2> delete from responses_v5
  2   where not exists (select null
  3                       from hwsets_v4
  4                      where responses_v5.hw_id = hwsets_v4.id
  5                        and REGEXP_INSTR(hwsets_v4.qlist, '[@/;]' || responses_v5.pb_id || '([/;:]|$)') > 0);

4 rows deleted.



Here is yet another, based on your second query attempt:

ops$tkyte%ORA11GR2> delete from
  2   (select null
  3      from responses_v5, hwsets_v4
  4     where responses_v5.hw_id = hwsets_v4.id
  5       and REGEXP_INSTR(hwsets_v4.qlist,
  6           '[@/;]' ||  responses_v5.pb_id || '([/;:]|$)') <= 0);

4 rows deleted.





I must say, I'm not entirely thrilled with your "data model". That table hwsets_v4 goes against just about every principle involved in relational database design... In other words - that table is just plain "broke".

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, October 25, 2011 - 8:54 am UTC

Thanks for the feedback. I didn't know that "where (a, b) in (select (c, d) ...)" was proper SQL. As for the table being broke, can you explain further? BTW, there are a lot more fields in both hwsets_v4 and responses_v5 that I didn't include because they weren't relevant to the problem at hand. Here are the full descriptions of the three relevant tables:

<p align="center">hwsets_v4, assignment
description</p>
<table summary="">
<tr><th>Field</th><th>Description</th><th>Type</th></tr>
<tr><td>hw_id</td>
<td>unique identifier of this assignment
</td>
<td>number(38)</td></tr>
<tr><td>course_id</td>
<td>course &rarr; cw_courses_v3.id</td>
<td>number(38)</td></tr>
<tr><td>serial_no</td>
<td>serial number of assignment in course</td>
<td>number(38)</td></tr>
<tr><td>name</td>
<td>assignment name</td>
<td>CLOB</td></tr>
<tr><td>remarks</td>
<td>comments about this assignment, displayed to students</td>
<td>CLOB</td></tr>
<tr><td>date_created </td>
<td>date and time that assignment was created</td>
<td>varchar2(50)</td></tr>
<tr><td>date_due</td>
<td>due date and time of assignment</td>
<td>varchar2(50)</td></tr>
<tr><td>tries</td>
<td>how many tries students are allowed (&ndash;1 if no limit)</td>
<td>number(38)</td></tr>
<tr><td>question_points</td>
<td>number of points each question is worth; format is p1/p2/p3...</td>
<td>CLOB</td></tr>
<tr><td>attempt_grading</td>
<td>attempt-dependent grading parameters; format is
x<sub>1</sub>/y<sub>1</sub>/x<sub>2</sub>/y<sub>2</sub>/...//y<sub>n</sub>
where x is the number of attempts and y is the corresponding
multiplier
</td>
<td>CLOB</td></tr>
<tr><td>time_grading</td>
<td>time-dependent grading parameters; format is
x<sub>1</sub>/y<sub>1</sub>/x<sub>2</sub>/y<sub>2</sub>/...//y<sub>n</sub>
where x is the number of days past due and y is the corresponding
multiplier
</td>
<td>CLOB</td></tr>
<tr><td>dependencies</td>
<td>m<sub>1</sub>:n<sub>1</sub>;m<sub>2</sub>:n<sub>2</sub>...
where students can see each question m only after
correctly answering each question n</td>
<td>CLOB</td></tr>
<tr><td>extensionees</td>
<td>students who have received an extension, and the number of days of each extension</td>
<td>CLOB</td></tr>
<tr><td>flags</td>
<td>bit 0: visible to students
<br/>bit 1: record work in gradebook after due date
<br/>bit 2: this assignment is an exam
<br/>bit 3: save all tries, not just the most recent one
<br/>bit 4: delay grading and feedback until instructor prompts it
</td>
<td>number(38)</td></tr>
<tr><td>rxn_cond_id</td>
<td>colon-separated list of reaction conditions that ACE makes
available to students for synthesis questions in this assignment
&rarr; reaction_conditions_v3.rxn_cond_id
</td>
<td>CLOB</td></tr>
<tr><td>qlist</td>
<td>list of questions in assignment, for example,
2@4713/312;71/4560;4/2219:1@3856:1@3857
<br/>more generally,
a colon-separated list of
pick@qId[/qId]<sub>m</sub>[;qId[/qId]<sub>m</sub>]<sub>n</sub>
where pick is the number of question bundles to assign from
the subsequent semicolon-separated list of n + 1 bundles, each
of which contains m + 1 slash-separated question ID numbers
&rarr; questions_v3.id
</td>
<td>CLOB</td></tr>
</table>

<p align="center">assignment_questions_v2,
questions in a student's instantiated assignment</p>
<table summary="">
<tr><th>Field</th><th>Description</th><th>Type</th></tr>
<tr><td>user_id</td>
<td>student's user ID &rarr; cw_user_v2.user_id</td>
<td>varchar2(40)</td></tr>
<tr><td>assignment_id</td>
<td>unique assignment ID &rarr; hwsets_v4.id</td>
<td>number(38)</td></tr>
<tr><td>qlist</td>
<td>question ID numbers as colon-separated list; instantiated from
hwsets_v4.qlist</td>
<td>varchar(2000)</td></tr>
</table>


<p align="center">responses_v5, student response
to a question</p>
<table summary="">
<tr><td>hw_id</td>
<td>assignment number
&rarr; hwsets_v4.hw_id</td>
<td>number(38)</td>
</tr>
<tr><td>student_id</td>
<td>student
&rarr; cw_user_v2.user_id</td>
<td>varchar2(50)</td>
</tr>
<tr><td>pb_id</td>
<td>unique question ID number
&rarr; questions_v3.id
or user_questions_v3.id</td>
<td>number(38)</td>
</tr>
<tr><td>tries</td>
<td>number of times the student has responded
to this question</td>
<td>number(38)</td>
</tr>
<tr><td>grade</td>
<td>0.0 to 1.0 inclusive</td>
<td>number(3,2)</td>
</tr>
<tr><td>response</td>
<td>student response</td>
<td>CLOB</td>
</tr>
<tr><td>r_groups</td>
<td>shortcut groups selected to substitute R groups in
this instance of the question</td>
<td>CLOB</td>
</tr>
<tr><td>date_time</td>
<td>time of student's response</td>
<td>varchar2(25)</td>
</tr>
<tr><td>status</td>
<td>whether response is initialized, saved, evaluated, or requires human
grading</td>
<td>char(1)</td>
</tr>
<tr><td>instructor_comment</td>
<td>comment by instructor on student's response</td>
<td>CLOB</td>
</tr>
<tr><td>ip_address</td>
<td>IP address from which student submitted response</td>
<td>varchar2(16)</td>
</tr>
</table>

Tom Kyte
October 25, 2011 - 9:26 am UTC

As for the table being broke, can you explain further?

you are storing multiple attributes in a single column - encoded in a string - which has to be parsed over and over and over again.

It should be normalized out into rows and columns.

the first string looks like it should be stored as:

col1      col2
------    ------
1         2386
1         2387
....
    
and so on

A reader, October 25, 2011 - 9:42 am UTC

I see, thanks. Those strings used to be a lot simpler, but we kept extending them without reconsidering the data structures we were using to store them. And regular expressions make it possible (though slow) to parse them on the fly. Thanks for pointing out what should have been obvious to me.

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here