commons-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (LANG-1406) StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
Date Wed, 05 Sep 2018 15:33:00 GMT


ASF GitHub Bot commented on LANG-1406:

Github user HiuKwok commented on the issue:
    For anyone interested in this issue, here are some findings from the past month of
working on it.
     - The exception happens when a String containing certain Unicode characters is passed in.
To do a case-insensitive replacement, the internal logic first converts both `text` and
`searchString` to lower case for comparison.
    - However, if you dig into the source of `.toLowerCase()`, you will see that certain
Unicode characters get decomposed, so the resulting String can be longer than the original.
Example: ![image](
    Because the index is computed on the transformed String, the out-of-bounds exception happens
when that index is used on the original String at a position that does not exist (length 3
after lower-casing vs 2 before, in this case).
     - So my first thought was: why not decompose both `text` and `searchString` before doing
the case-insensitive comparison? That way the String length stays the same whether you call
`toLowerCase()` or `toUpperCase()` (3 -> 3). However, another problem then shows up: as you
may have noticed, when the String mentioned above is decomposed, it turns into an uppercase I
followed by a dot sign.
    - So what happens if the search pattern occurs inside the decomposed text? Say I am
trying to match an uppercase [I]. It would then match the I that came from the decomposition,
and I believe that is certainly not the desired behavior of this method.
    BTW, I drafted a simple main method to demonstrate how the mismatch happens.
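
The main method mentioned above is not included in this archived message. As a stand-in, here is a minimal Java sketch of the mismatch described in the comment. It assumes NFD decomposition via `java.text.Normalizer` and lower-casing with `Locale.ROOT`; the class `DecomposedMatchDemo` and its `decompose` helper are hypothetical names, not part of Commons Lang and not the commenter's code.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical demo class; not part of Commons Lang.
class DecomposedMatchDemo {

    // Decomposes a string into NFD (base characters followed by combining marks).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String text = "\u0130x";          // capital I with dot above, then 'x'
        String nfd = decompose(text);      // 'I', combining dot above (U+0307), 'x'

        // After decomposition, lower-casing no longer changes the length.
        System.out.println(nfd.length());                            // 3
        System.out.println(nfd.toLowerCase(Locale.ROOT).length());   // 3

        // But a plain "I" now matches inside the decomposed text,
        // although the original text never contained one.
        System.out.println(text.contains("I"));   // false
        System.out.println(nfd.contains("I"));    // true

        // Removing that "I" leaves a stray combining dot in front of 'x'.
        String replaced = nfd.replace("I", "");
        System.out.println(replaced.equals("\u0307x"));   // true
    }
}
```

So decomposing first avoids the length change, but it trades the exception for a false match on the base letter.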

> StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
> ----------------------------------------------------------------
>                 Key: LANG-1406
>                 URL:
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>            Reporter: Michael Ryan
>            Priority: Major
> {code}
> StringUtils.replaceIgnoreCase("\u0130x", "x", "")
> {code}
> EXPECTED: "\u0130" is returned.
> ACTUAL: StringIndexOutOfBoundsException
> This happens because the replace method is assuming that text.length() == text.toLowerCase().length(), which is not true for certain characters.
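
The length assumption in the report can be checked directly. The sketch below is only an illustration, assuming `Locale.ROOT` for lower-casing; `LowerCaseLengthDemo` and `lengthGrowth` are hypothetical names, and this is not the code of `StringUtils.replaceIgnoreCase` itself.

```java
import java.util.Locale;

// Hypothetical demo class; not part of Commons Lang.
class LowerCaseLengthDemo {

    // Returns how many chars the string gains when lower-cased with Locale.ROOT.
    static int lengthGrowth(String s) {
        return s.toLowerCase(Locale.ROOT).length() - s.length();
    }

    public static void main(String[] args) {
        String text = "\u0130x";                        // input from the report
        String lower = text.toLowerCase(Locale.ROOT);   // 'i', combining dot above, 'x'

        System.out.println(text.length());    // 2
        System.out.println(lower.length());   // 3

        // An index found in the lower-cased copy may not exist in the original.
        int idx = lower.indexOf("x");
        System.out.println(idx);                   // 2
        System.out.println(idx < text.length());   // false
    }
}
```

Any code that searches in the lower-cased copy and then indexes into the original with the result can therefore run past the end of the original String.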

This message was sent by Atlassian JIRA
