Skip to Main Content
  • Questions
  • Oracle SQL Repeated words in the String

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Venkat.

Asked: January 12, 2017 - 3:34 pm UTC

Last updated: January 12, 2017 - 4:40 pm UTC

Version: 11.2.0.4.0

Viewed 1000+ times

You Asked

I need your suggestions/inputs on of the following task. I have the following table
ID ID_NAME
1 TOM HANKS TOM JR
2 PETER PATROL PETER JOHN PETER
3 SAM LIVING
4 JOHNSON & JOHNSON INC
5 DUHGT LLC
6 THE POST OF THE OFFICE
7 TURNING REP WEST
8 GEORGE JOHN

I need SQL to find a repetitive word for every ID if exist what's the count of the word it repeated for instance in ID 2 PETER repeated for 3 times and in ID 1 TOM repeated for 2 times. so I need the output something like this
ID ID_NAME COUNT
1 TOM HANKS TOM JR 2
2 PETER PATROL PETER JOHN PETER 3
3 SAM LIVING
4 JOHNSON & JOHNSON INC 2
5 DUHGT LLC
6 THE POST OF THE OFFICE 2
7 TURNING REP WEST
8 GEORGE JOHN

The table has 560K rows

I tried the below and it didn't work and it literally is looking for every single word.
SELECT RESULT, COUNT(*)
FROM (SELECT
REGEXP_SUBSTR(COL_NAME, '[^ ]+', 1, COLUMN_VALUE) RESULT
FROM TABLE_NAME T ,
TABLE(CAST(MULTISET(SELECT DISTINCT LEVEL
FROM TABLE_NAME X
CONNECT BY LEVEL <= LENGTH(X.COL_NAME) -
LENGTH(REPLACE(X.COL_NAME, ' ', '')) + 1
) AS SYS.ODCINUMBERLIST)) T1
)
WHERE RESULT IS NOT NULL
GROUP BY RESULT
ORDER BY 1;

and Chris said...

You also need to group by the original string:

with table_name as (
  select 'TOM TOM' col_name from dual union all
  select 'THE POST OF THE OFFICE' col_name from dual union all
  select 'PETER PATROL PETER JOHN PETER' col_name from dual union all
  select 'GEORGE JOHN' col_name from dual 
)
select col_name, result, count ( * )
from
  (select regexp_substr ( col_name, '[^ ]+', 1, column_value ) result, col_name
  from table_name t , 
    table ( cast ( multiset
    ( select distinct level
    from table_name x
    connect by level <= length ( x.col_name ) - length ( replace ( x.col_name, ' ', '' ) ) + 1
    ) as sys.odcinumberlist ) ) t1
  )
where result is not null
group by col_name, result
order by 1;

COL_NAME                       RESULT  COUNT(*)  
GEORGE JOHN                    GEORGE  1         
GEORGE JOHN                    JOHN    1         
PETER PATROL PETER JOHN PETER  JOHN    1         
PETER PATROL PETER JOHN PETER  PATROL  1         
PETER PATROL PETER JOHN PETER  PETER   3         
THE POST OF THE OFFICE         OF      1         
THE POST OF THE OFFICE         OFFICE  1         
THE POST OF THE OFFICE         POST    1         
THE POST OF THE OFFICE         THE     2         
TOM TOM                        TOM     2 


Not a complete solution, but should get you on your way...

Is this answer out of date? If it is, please let us know via a Comment