lucene-java-user mailing list archives

From Dawid Weiss <>
Subject Re: Difference in behaviour between LowerCaseFilter and String.toLowerCase()
Date Sat, 01 Dec 2012 09:02:51 GMT
Iterating character by character is different from considering the
entire string at once, so your observation is correct: that's how it's
supposed to work. In particular, note this in the String#toLowerCase
documentation:

"Since case mappings are not always 1:1 char mappings, the resulting
String may be a different length than the original String."

So lowercasing the whole string simply cannot always give the same
result as lowercasing it char by char.
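
A minimal sketch of the difference, assuming a plain JDK and no Lucene
on the classpath. The perChar helper is hypothetical and only stands in
for "iterate and call Character.toLowerCase()"; it is not
LowerCaseFilter's actual code. The input is the dotted capital I
(U+0130) mentioned in the thread, lowercased with Locale.ROOT so the
Turkish rules do not apply.

```java
import java.util.Locale;

public class LowerCaseDemo {
    // Hypothetical helper: lowercases one UTF-16 char at a time.
    // The output always has the same length as the input.
    static String perChar(String s) {
        char[] out = s.toCharArray();
        for (int i = 0; i < out.length; i++) {
            out[i] = Character.toLowerCase(out[i]);
        }
        return new String(out);
    }

    public static void main(String[] args) {
        String s = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Whole-string lowercasing may apply a 1:N mapping:
        // here it yields 'i' followed by U+0307 (combining dot above).
        String whole = s.toLowerCase(Locale.ROOT);

        // Per-char lowercasing can only map one char to one char.
        String each = perChar(s);

        System.out.println(whole.length());     // 2
        System.out.println(each.length());      // 1
        System.out.println(whole.equals(each)); // false
    }
}
```

With a Turkish locale, String.toLowerCase() maps the same character to
a plain "i" instead, which is the locale dependence Ian mentioned.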


On Sat, Dec 1, 2012 at 6:32 AM, Trejkaz <> wrote:
> On Fri, Nov 30, 2012 at 8:22 PM, Ian Lea <> wrote:
>> Sounds like a side effect of possibly different, locale-dependent,
>> results of using String.toLowerCase() and/or Character.toLowerCase().
>> specifically mentions Turkish.
>> A Google search for "Character.toLowerCase() turkish" gets hits which
>> sound relevant.
> Certainly Turkish has special rules because of that uppercase I with
> dot. I was more wondering whether LowerCaseFilter was intentionally
> doing it differently to String.toLowerCase() or whether it was some
> kind of unintentional side-effect of using Character.toLowerCase()
> iteratively.
> TX