Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Gianna.

Asked: January 20, 2006 - 11:36 am UTC

Last updated: January 07, 2011 - 3:58 pm UTC

Version: 9.2.0

Viewed 10K+ times! This question is

You Asked

Hi Tom,

I have to count occurrences of characters into a column. The solution I found works with few records, but it bombs with a bigger table:


bsgd9@ORABE> create table aaa (nome varchar2(100));

Table created.

bsgd9@ORABE> insert into aaa values ('AAAA');

1 row created.

bsgd9@ORABE> insert into aaa values ('BBBB');

1 row created.

bsgd9@ORABE> insert into aaa values ('CCCC');

1 row created.

bsgd9@ORABE> insert into aaa values ('DDDD');

1 row created.

bsgd9@ORABE> select count(*), b.item ITEM
2 from ( select a.ITEM
3 from (select distinct level, substr( nome,level,1 ) ITEM from aaa connect by level < length(nome) + 1 ) a ) b
4 group by b.item
5 order by b.item;

COUNT(*) I
---------- -
4 A
4 B
4 C
4 D


I try to clarify: solution I found works fine with a single string:

bsgd9@ORABE> var X varchar2(100)

bsgd9@ORABE> exec :X := 'JOHN ALLEN PAULOS'
PL/SQL procedure successfully completed.

bsgd9@ORABE> select count(*), b.item ITEM
2 from ( select a.ITEM
3 from (select substr( :X,level,1 ) ITEM
4 from dual connect by level < length(:X) + 1
5 ) a
6 ) b
7 group by b.item
8 order by b.item;

COUNT(*) I
---------- -
2
2 A
1 E
1 H
1 J
3 L
2 N
2 O
1 P
1 S
1 U

11 rows selected.


But it returns a wrong result-set when it is applied to a table:

bsgd9@ORABE> create table aaa (nome varchar2(100));
Table created.

bsgd9@ORABE> insert into aaa values ('JOHN ALLEN PAULOS');
1 row created.

bsgd9@ORABE> insert into aaa values ('THEODOR SMITH');
1 row created.

bsgd9@ORABE> select * from aaa;

NOME
--------------------------------
JOHN ALLEN PAULOS
THEODOR SMITH

bsgd9@ORABE> select count(*), b.item ITEM
2 from ( select a.ITEM
3 from (select substr( nome,level,1 ) ITEM
4 from aaa connect by level < length(nome) + 1
5 ) a
6 ) b
7 group by b.item
8 order by b.item;

COUNT(*) I
---------- -
1168
4128 A
16 D
260 E
4102 H
1024 I
1 J
8384 L
512 M
520 N
8234 O
2048 P
64 R
8448 S
2049 T
8192 U

16 rows selected.

Something strange seems to be performed.
I expected to have the count of all occurrences of each character in the table field "nome", similarly to the single string case.

It is possible to resolve the issue in SQL?


Thanks
Gianna


and Tom said...

we need a "table" with as many rows as the longest string - 100 in your case. We can join to that to make the nome field be output into the result set once for each character in it. Then, using substr - get the i'th character.

Simple group by and we are done:

ops$tkyte@ORA10GR2> with data
2 as
3 (select level r from dual connect by level <= 100)
4 select substr( nome, r, 1 ), count(*)
5 from aaa, data
6 where data.r <= length(aaa.nome)
7 group by substr(nome,r,1)
8 order by substr(nome,r,1)
9 /

S COUNT(*)
- ----------
3
A 2
D 1
E 2
H 3
I 1
J 1
L 3
M 1
N 2
O 4
P 1
R 1
S 2
T 2
U 1

16 rows selected.


Rating

  (16 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excellent !

Parag J Patankar, January 30, 2006 - 5:49 am UTC

Hi Tom,

I do not have words to express your brilliant answer to this question.

Very simple and powerful solution !!!!!!

Hats off !!!

best regards
pjp

Excellent++

Andrea, January 30, 2006 - 12:59 pm UTC

Excellent++

It's amazing what sql can do...
A chapter in your (upcoming) new book should be entitled "things you could not believe sql could do" (or something more english...)

Finding out the positions and count of a particular character within a string

a reader, February 05, 2007 - 8:49 am UTC

Please suggest how to get the positions and count(for each time) of a particular character or a string in a given text.

for eg:

say i have a line: ASDF****BB**NHYT*****JJ

i want to find out the positions of ' * ' and the count of stars for each position, so my result should be:
5th position 4stars
11th position 2stars
17th position 5stars
thanks
Tom Kyte
February 05, 2007 - 9:41 am UTC

I would use instr and substr, but only because I like them

you could today use the regex (regular expression) functions as well

all are documented in the sql reference guide online at otn.oracle.com

Re:Finding out the positions and count of a particular character within a string

Frank Zhou, February 05, 2007 - 2:58 pm UTC

Here is a 10G solution using regular expression .

Thanks,

Frank

SQL> select str, char_start , char_end - char_start num_char
2 from (
3 select str,
4 REGEXP_INSTR(st,'[[:alnum:]]#+',1,LEVEL) +1 AS char_start,
5 REGEXP_INSTR(st,'#[[:alnum:]]',1,LEVEL) +1 AS char_end
6 FROM
7 (SELECT str,
8 REGEXP_REPLACE(str,'[^[:alnum:]]','#') as st
9 FROM (select 'ASDF****BB**NHYT*****JJ' as str from dual)
10 )
11 CONNECT BY PRIOR str = str
12 AND REGEXP_INSTR (st, '[[:alnum:]]+#+', 1, LEVEL) > 0
13 AND PRIOR dbms_random.string ('p', 10) IS NOT NULL
14 );

STR CHAR_START NUM_CHAR
----------------------- ---------- ----------
ASDF****BB**NHYT*****JJ 5 4
ASDF****BB**NHYT*****JJ 11 2
ASDF****BB**NHYT*****JJ 17 5

SQL>

Re:Finding out the positions and count of a particular character within a string

Frank Zhou, February 05, 2007 - 4:11 pm UTC

The query is updated to take care some special cases
For example :

**ASDF****BB**NHYT*****JJ**

Frank


Test 1 : ASDF****BB**NHYT*****JJ

SQL>
SQL> select str, min(num) char_start,
2 max(num) - min(num) +1 as num_char
3 from (
4 select str, max(grp) over ( order by num) grp, num
5 from (select str , num ,
6 case when nvl(lag(num) over (order by num),num) != num-1
7 then num
8 end grp
9 FROM
10 (select str , INSTR(st,'#',1,LEVEL) AS num
11 FROM
12 (SELECT str, REGEXP_REPLACE(str,'[^[:alnum:]]','#') as st
13 FROM (select 'ASDF****BB**NHYT*****JJ' as str from dual)
14 )
15 CONNECT BY PRIOR str = str
16 AND INSTR (st, '#', 1, LEVEL) > 0
17 AND PRIOR dbms_random.string ('p', 10) IS NOT NULL
18 )
19 )
20 )
21 group by str, grp;

STR CHAR_START NUM_CHAR
----------------------- ---------- ----------
ASDF****BB**NHYT*****JJ 5 4
ASDF****BB**NHYT*****JJ 11 2
ASDF****BB**NHYT*****JJ 17 5


Test 2 : **ASDF****BB**NHYT*****JJ**

SQL> select str, min(num) char_start,
2 max(num) - min(num) +1 as num_char
3 from (
4 select str, max(grp) over ( order by num) grp, num
5 from (select str , num ,
6 case when nvl(lag(num) over (order by num),num) != num-1
7 then num
8 end grp
9 FROM
10 (select str , INSTR(st,'#',1,LEVEL) AS num
11 FROM
12 (SELECT str, REGEXP_REPLACE(str,'[^[:alnum:]]','#') as st
13 FROM (select '**ASDF****BB**NHYT*****JJ**' as str from dual)
14 )
15 CONNECT BY PRIOR str = str
16 AND INSTR (st, '#', 1, LEVEL) > 0
17 AND PRIOR dbms_random.string ('p', 10) IS NOT NULL
18 )
19 )
20 )
21 group by str, grp;

STR CHAR_START NUM_CHAR
--------------------------- ---------- ----------
**ASDF****BB**NHYT*****JJ** 1 2
**ASDF****BB**NHYT*****JJ** 7 4
**ASDF****BB**NHYT*****JJ** 13 2
**ASDF****BB**NHYT*****JJ** 19 5
**ASDF****BB**NHYT*****JJ** 26 2

SQL>
SQL> spool off;

count characters occurrences

dharanidhar, February 06, 2007 - 2:29 am UTC

excellent
thank you

with... what?

Harrison Picot, February 07, 2007 - 7:12 am UTC

with, as in

ops$tkyte@ORA10GR2> with data
2 as
3 (select level r from dual connect by level <= 100)
4 select substr( nome, r, 1 ), count(*)
5 from aaa, data

seems to work, but what is it doing here and where is it defined? Does it have any data in it, or is it just dummy? And for the sub-genius class (not me, of course, but a lot of my friends), could you flesh the rest of the logic out a bit? It is not immediately obvious and connect be seems to be used effectively by you but a lot of "my friends" just copy the examples without grasping what it is doing (even I have troubles, on admittedly rare occasions).
Tom Kyte
February 07, 2007 - 6:43 pm UTC

if you

select level r from dual connect by level <= 100


that creates 100 rows. WITH subquery factoring (search docs for subquery factoring) was new with 9i, works sort of like an inline view...

An alternative SQL solution for the original question !

Frank Zhou, February 07, 2007 - 3:53 pm UTC

Hi Tom,

Here is an alternative solution for Gianna's original question ""count characters occurrences"

Thanks,

Frank

SQL> SELECT SUBSTR(nome , LEVEL , 1), count(*)
2 FROM (SELECT nome FROM aaa)
3 CONNECT BY PRIOR nome = nome
4 AND LEVEL <= LENGTH(nome)
5 AND PRIOR dbms_random.string ('p', 10) IS NOT NULL
6 GROUP BY SUBSTR(nome , LEVEL , 1)
7 ORDER BY SUBSTR(nome , LEVEL , 1);

S COUNT(*)
- ----------
3
A 2
D 1
E 2
H 3
I 1
J 1
L 3
M 1
N 2
O 4
P 1
R 1
S 2
T 2
U 1

16 rows selected.

SQL> spool off;

Oracle unnesting tables property

Mihail Bratu, August 11, 2008 - 10:13 am UTC

Hi Tom

Here is a solution based on Oracle unnesting tables property for Gianna's original question 'count characters occurrences'

Thanks,

CREATE OR REPLACE
TYPE list_varchar100 AS TABLE OF VARCHAR2(100)
/
insert /*+append*/ into aaa
select object_name
from all_objects
/
column item format a5

select
t.column_value item ,count(*) n
from aaa, TABLE(CAST(MULTISET(
select substr(nome,level,1) from dual connect by level <= length(nome)
) as list_varchar2000 )) t
group by t.column_value
order by item
/

ITEM N
----- ----------
# 38
$ 9332
+ 56
- 34
. 531
/ 65195
0 17327
1 21940
2 20528
...

select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for Linux: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

regexp_count

A reader, January 06, 2010 - 4:58 am UTC

Hi Tom

is there any alternate of regexp_count function of 11g in 10g.

thanks

Tom Kyte
January 06, 2010 - 8:31 am UTC

why?

A reader, January 06, 2010 - 5:25 am UTC

Hi Tom

I need to serach the number of occurenece of a character '|' in a clob data type in one of my tables.
database version is oracle 10g R2. 

how can I do this, in the examples above in this thread nowhere we have tried clob data type.

in 11g i know there is a regexp_count function which can do this.

create table and sample data

SQL> create table t (data1 clob) ;

Table created.

SQL> 
SQL> 
SQL> 



SQL> insert into t values ('A|A') ;

1 row created.

SQL> insert into t values ('A|A|A') ;

1 row created.

SQL> insert into t values ('A|A|A|A') ;

1 row created.

SQL> insert into t values ('A\A') ;

1 row created.

SQL> insert into t values ('A') ;

1 row created.

SQL> commit ;

Commit complete.

thanks

Tom Kyte
January 06, 2010 - 8:33 am UTC

ops$tkyte%ORA10GR2> select data1, length(data1)-length(replace(data1,'|','')) from t;

DATA1                LENGTH(DATA1)-LENGTH(REPLACE(DATA1,'|',''))
-------------------- -------------------------------------------
A|A                                                            1
A|A|A                                                          2
A|A|A|A                                                        3
A\A                                                            0
A                                                              0


extremely helpful

A reader, January 07, 2010 - 12:33 am UTC

thanks Tom, exactly what i needed.

thanks indeed

A reader, January 07, 2011 - 2:55 pm UTC

Thant was nice.. however I have a different issue.
I want to count the number of '*'s in the string
Abdfyim***bnko,***nghfb*nhgb*** using sql.

Please help.

Thanks,
SG
Tom Kyte
January 07, 2011 - 3:58 pm UTC

ops$tkyte%ORA11GR2> select length(s)-length(replace(s,'*','')) from
  2  (select 'Abdfyim***bnko,***nghfb*nhgb***' s from dual);

LENGTH(S)-LENGTH(REPLACE(S,'*',''))
-----------------------------------
                                 10


samita, January 10, 2011 - 10:24 am UTC

Thanks Tom..!! It works perferctly....

count occurance only to a point

Mani, November 25, 2013 - 5:12 pm UTC

i have a table with data as

x
--
xxYYxx
xxxxllaa
zxxx

the count should return only those starting with x and the count of the same till encounters other char.

so in effect it should return

2
4
0

for the above data. i tried regexp_count but simple 'x' search counts all 'x' and '^x' returns 1 for first 2 because they start with x. i thought we can do a data to rows and then do a search for change using next_value kinda. is there a better and easier way to do it.

got it!

Mani, November 27, 2013 - 10:08 am UTC

Hi Tom,

here is the code i used to fix the issue.

select regexp_replace('xxAAxxA','[^x]',' '), decode(substr(regexp_replace('xxAAxxA','[^x]',' '),1,1),' ',0,length(trim(regexp_replace('xxAAAA','[^x]',' ')))) x_cnt from dual;

please share any better n easier alternate there.

thanks
Mani

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library