Skip to Main Content

Breadcrumb

Question and Answer

Sergiusz Wolicki

Thanks for the question, Alex.

Asked: January 06, 2021 - 1:23 pm UTC

Last updated: January 11, 2021 - 2:03 am UTC

Version: 19.8.0.0

Viewed 100+ times

You Asked

Hello TOM and Happy New Year!

We have a table with a VARCHAR2(200) column that stores `names` with European accents: Valérie, Véronique, and so on.

We would like to perform a CI/AI (case-insensitive, accent-insensitive search) on those values, i.e. people searching for "valerie"/"VALERIE" will still fetch "Valérie" and so on.

I am thinking of several options:
(1)
- UTL_RAW.CAST_TO_VARCHAR2(NLSSORT(name , 'nls_sort=binary_ai')) = UTL_RAW.CAST_TO_VARCHAR2(NLSSORT(input_name , 'nls_sort=binary_ai'))
But this is awfully slow in SQL (query response time 3-5x slower than without).
(2) Call this
EXECUTE IMMEDIATE('ALTER SESSION SET NLS_COMP = LINGUISTIC');
EXECUTE IMMEDIATE('ALTER SESSION SET NLS_SORT = BINARY_AI');

But this is bringing me back to (1).

(3)
Create an invisible column called `curated_name` that is generated always as
UTL_RAW.CAST_TO_VARCHAR2(NLSSORT(name , 'nls_sort=binary_ai'))
But Oracle complains function is not deterministic (which is true, as it depends on the language).

(4)
Create a column with collation BINARY_AI, but I've just discovered a new Oracle bug: 32335770 - COLLATION BINARY_AI NOT RETURNING ALL IDENTICAL ROWS

(5) Oracle text, but this requires careful maintenance.


And (other) ideas ? I'm inclined to use (4), if only the bug is solved by Oracle.

Thanks,

and we said...


Bug 32335770 affects all your solutions that reference BINARY_AI. I suspect Oracle Text may be affected as well, though one would need to check. I do not know/remember whether Oracle Text uses the same base letter mapping table as BINARY_AI.

With compatible>=12.2.0 and max_string_size=extended, you can define the column as BINARY_AI. However, this will use this collation for all SQL comparisons of the column values, unless overridden. Otherwise, since Oracle Database 12.2, without the parameters set, you can just say:

name COLLATE BINARY_AI = input_name


in those specific conditions in which you really want accent-insensitive comparison. For now, bug 32335770 will prevent some specific characters to compare properly, though most should work fine. We are working on finding a solution to this bug. To work around it, you would need to explicitly map problematic characters to their non-accented counterparts using the TRANSLATE function and then compare with COLLATE BINARY_AI. Note, the bug specifically talks about LATIN CAPITAL LETTER L WITH STROKE, which is used in Polish only, but we have not yet checked whether more characters are affected.

Note that by design, the BINARY_AI collation does not equivalence certain letters that you would normally want to be compared as equal. For example, it will not compare German eszet 'ß' as 'ss' and it will not compare the ligature 'œ' as 'oe'. For this, you may need to use one of the more expensive but more versatile collations such as GENERIC_M_AI or UCA0700_ORAROOT_AI.



Rating

  (2 ratings)

Comments

Feedback

Alex, January 08, 2021 - 4:17 am UTC

Thanks Sergiusz.
Speaking of Bug 32335770 (around word ZESPÓŁ):
I'm not a Polish speaker, would "Ł" be closest converted to "L" in Latin/English ? As "Ó" is to "O" ?
The other way put, if someone enters "zespol" as sought string, should we bring also "ZESPÓŁ" (if it exists in our records) in the results list (for accent-insensitive search, using BINARY_AI column collation)?

Thanks,

Solution

Alex, January 10, 2021 - 5:10 pm UTC

For those having a similar problem, I invite you to check `UCA0700_DUCET_AI`collation.
https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/linguistic-sorting-and-matching.html#GUID-5C93ABE8-13D6-4F34-8855-B416BB810851

It solved my issue (when I was using `BINARY_AI` collation).
Now, "Ł" is reduced to "l" (lowercase L).
Connor McDonald
January 11, 2021 - 2:03 am UTC

Thanks for getting back to us.