Skip to Main Content
  • Questions
  • Removing all lines starting with a particular text

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Ankit.

Asked: March 16, 2023 - 5:19 am UTC

Last updated: March 21, 2023 - 5:37 pm UTC

Version: 20.2.0

Viewed 10K+ times! This question is

You Asked

Hi Tom,

I have a text column with clob datatype. (Below is how data looks in the column).

I want to have only the customer input information in the column and remove all private notes.

Text Example:

Customer Input 04.12.2022 here is sentence 1 ||Private Notes This is sentence 2||Private Notes This is sentence 3
||Customer Input this is sentence 4||Private Notes This is sentence 5 and needs to be removed||Customer Input this is sentence 6


Expected Output:
Customer Input 04.12.2022 here is sentence 1 ||Customer Input this is sentence 4||Customer Input this is sentence 6


and Chris said...

Updated following comment from mathguy

Here's one way with regular expressions:

Search for

\|\|Private Notes[^|]*


This finds all the text starting "||Private Notes" up to the next pipe. The backslashes are necessary to escape | at the start, which regular expressions use as the OR operator.

[^|]* matches everything after "||Private Notes" that is not "|". Inside the square brackets is a character class, so the escaping is unnecessary there.

Here it is in action:

select regexp_replace (
  'Customer Input 04.12.2022 here is sentence 1 ||Private Notes This is sentence 2||Private Notes This is sentence 3||Customer Input this is sentence 4||Private Notes This is sentence 5 and needs to be removed||Customer Input this is sentence 6',
  '\|\|Private Notes[^|]*'
       ) removed
from   dual;

REMOVED                                                                                                            
-------------------------------------------------------------------------------------------------------------------
Customer Input 04.12.2022 here is sentence 1 ||Customer Input this is sentence 4||Customer Input this is sentence 6

Rating

  (3 ratings)

We're not taking comments currently, so please try again later if you want to add a comment.

Comments

Negated character class

mathguy, March 19, 2023 - 6:20 pm UTC

[^(\|\|)]* matches everything after "||Private Notes" that is not "||".

That is incorrect. That regular expression matches every character that is not a left parenthesis, a backslash, a pipe character or a right parenthesis. There are no escape sequences in a character class (either a standard one or a negated one); and character classes (including negated character classes) can only match one character at a time, there is no grouping in character classes.

Some regexp dialects (but not the one chosen by Oracle) support escape sequences in character classes. No dialect supports grouping of multiple characters within a character class, negated or otherwise.

Before suggesting another solution to the OP, one would need to understand what he meant by the || sequence. He is talking about "lines" in the title; perhaps he showed us || as a placeholder for "newline", where the CLOB has actual newline characters, chr(10)? Or are those double-pipes really hard-coded in the CLOB, as a line separator?
Chris Saxon
March 20, 2023 - 2:08 pm UTC

Well you know what they say about regular expressions!

You're right, I've updated the answer. Clarity on the requirements from OP would be desirable; I doubt "||" is newline though because there's a newline before "sentence 4".

mathguy, March 20, 2023 - 3:39 pm UTC

The updated answer is better, but still not perfect. There is still the possibility that some text that must be removed will contain a single pipe (not two consecutive pipes), and such characters will stop the deletion of text. For example:

||Private Notes This note contains |, a pipe character


Deletion will stop at the single pipe character in the middle of the sentence; the pipe, the comma, the space, and the rest of the text will not be deleted.

One might wonder why single-pipes would be present in text; but if they aren't, one might equally well wonder why TWO pipes indicate the end of a line, instead of a single one.

It is possible to write a 100% correct solution, that will not trip over single-pipes. Another thing that must be fixed - what if the very first "line" must be removed? It begins with the start-of-string anchor, rather than with two pipes. (Easy to fix.) I would still wait for clarification from the OP though, whether he really did mean that those two-consecutive-pipes fragments exist as written in the CLOB.
Chris Saxon
March 20, 2023 - 3:43 pm UTC

True, but like you say there are still lots of unknowns here. This should be enough to get the OP going though

Specific Text Extraction

Ankit B, March 21, 2023 - 10:36 am UTC

Hi Tom,

Thanks for the answer that really helps. But just wanted to understand

Example text:
Customer Input (25/04/22): What is your name? Problem Summary: Identify text. +++ Customer Input: My name is XYZ.

1. I want to identify, how many times customer input is present in the text
2.Any customer input before problem summary needs to be excluded.
Expected output:

Text | Counter(No of times customer input present)
--------------------------------------------------------------------
Customer Input: My name is XYZ, 2


Chris Saxon
March 21, 2023 - 5:37 pm UTC

1. You can use regexp_count to count the number of times a string appears
2. With regexp_replace you can remove everything before the last "Customer Input"

e.g.:

with rws as (
  select 'Customer Input (25/04/22): What is your name? Problem Summary: Identify text. +++ Customer Input: My name is XYZ.' str 
  from   dual
)
  select regexp_count ( str, 'Customer Input' ) num,
         regexp_replace ( str, '(.*)(Customer Input)', '\2' ) last
  from   rws;
  
       NUM LAST                           
---------- -------------------------------
         2 Customer Input: My name is XYZ.

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.