directory-api mailing list archives

From Emmanuel Lécharny
Subject Re: Prepare String
Date Wed, 06 Apr 2016 11:41:12 GMT
On 06/04/16 09:35, Emmanuel Lécharny wrote:
> On 06/04/16 08:47, Stefan Seelmann wrote:
>> On 04/06/2016 01:05 AM, Emmanuel Lécharny wrote:
>>> So for the record, after a couple of hours working on it tonight, I got
>>> the DeepTrimToLowerNormalizer() working fine, with tests passing.
>>> I was also able to improve the performance of the beast: from 20
>>> seconds to normalize 10,000,000 Strings like "xs crvtbynU 
>>> Jikl7897790", down to 4.3s. I just assumed that most of the time, we
>>> will deal with chars between 0x00 and 0x7F, and wrote a specific
>>> function for that. If we have chars above 0x7F, then an exception is
>>> thrown and we fall back to the complex process, which will then take
>>> 47s instead of 20s.
>>> So this is a balance :
>>> - we have an implementation that covers all the chars, and takes 20s for
>>> 10M Strings
>>> - we have an implementation that tries to process the String if chars
>>> are in [0x00, 0x7F] and takes 4.3s for 10M Strings, but takes 47
>>> seconds if we have a char outside this range.
>>> Besides the obvious gain, there is another reason why I wanted to do that:
>>> processing IA5String values will benefit from this separation, and
>>> that covers numerous AttributeTypes (like mail, homeDirectory,
>>> memberUid, krb5principalname, krb5Realmname, and a lot more).
>>> wdyt ? Going for an average of 20s no matter what, or accepting a huge
>>> penalty when the String does not contain ASCII chars ?
>> I'd go for the 2nd optimized way.
>> Is the cause of the penalty only the exception-throw-catch? 
> It's part of it. Changing the code to throw a static Exception
> instead of creating a new exception every time saves 20s.
> This is probably the way to go: we benefit from a huge improvement when
> the String is pure ASCII, and the penalty is just the time spent in this
> phase if this is not the case. Here are the new numbers:
> - pure ASCII String : 4s
> - non ASCII String : 24.8s
> - catch-all solution (ie, no ASCII optimisation) : 20s
> Way better than the previous solution, by simply adding:
>     /** An exception used to get out of the map method quickly */
>     private static final ArrayIndexOutOfBoundsException AIOOBE = new ArrayIndexOutOfBoundsException();
> and throwing AIOOBE in the ascii method...
> Otherwise, there are other parts that can be improved: we always
> process a String in the map(), normalize(), checkProhibited() and
> insignifiantSpacesString() methods. That means we get the char[] out of
> the String and create a new String each time. We could most certainly do
> it only once, at least for the last 2 methods, which are run
> consecutively (the normalize() method uses a Java method that expects a
> String).
> I'll check that tonight.
> Thanks for the feedback !
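
The pattern discussed in the quoted message (an ASCII-only fast path that bails out through one shared, preallocated exception, then a fallback to the general path) could look roughly like the sketch below. This is only an illustration under assumptions: the class AsciiFastPath and its methods are hypothetical names, and plain lower-casing stands in for the real normalization steps, which are not shown in this thread.

```java
import java.util.Locale;

/** Hypothetical sketch; names are illustrative, not the project's actual API. */
final class AsciiFastPath {
    /** Created once and reused, so no new exception (and stack trace) is built per fallback. */
    private static final ArrayIndexOutOfBoundsException AIOOBE =
        new ArrayIndexOutOfBoundsException();

    /** Lower-cases a pure ASCII String; throws the shared exception on any char above 0x7F. */
    static String lowerAscii(String value) {
        char[] chars = value.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c > 0x7F) {
                // Leave the fast path immediately; the caller falls back.
                throw AIOOBE;
            }
            if (c >= 'A' && c <= 'Z') {
                chars[i] = (char) (c + ('a' - 'A'));
            }
        }
        return new String(chars);
    }

    /** Stand-in for the slower catch-all path that handles every char. */
    static String lowerGeneral(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Tries the ASCII path first and falls back to the general one. */
    static String lower(String value) {
        try {
            return lowerAscii(value);
        } catch (ArrayIndexOutOfBoundsException e) {
            return lowerGeneral(value);
        }
    }
}
```

The stack trace carried by a shared exception instance is the one captured when the constant was created, so it is of no use for debugging; that is acceptable here only because the exception is used purely as a control-flow signal.
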
Quick update: I spent 30 minutes during lunch improving the ASCII part:
down below 4s (instead of 4.3s) by using char[] instead of String for
the checkProhibited() and insignifiantSpacesString() methods. Small gain,
but still: this is a 10% improvement :)
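
As a rough illustration of the char[] change described above, the sketch below extracts the char[] once, runs two consecutive steps on it in place, and builds a single String at the end. Everything in it is an assumption made for the example: CharArrayPipeline, collapseSpaces() and prepare() are hypothetical names, and both steps are simplified stand-ins, not the real checkProhibited() and insignifiantSpacesString() logic.

```java
/** Hypothetical sketch; both steps are simplified stand-ins for the real methods. */
final class CharArrayPipeline {
    /** Stand-in check: rejects ASCII control chars in the first 'length' chars. */
    static void checkProhibited(char[] chars, int length) {
        for (int i = 0; i < length; i++) {
            if (chars[i] < 0x20 || chars[i] == 0x7F) {
                throw new IllegalArgumentException("Prohibited char at index " + i);
            }
        }
    }

    /** Stand-in step: trims and collapses runs of spaces in place; returns the new length. */
    static int collapseSpaces(char[] chars, int length) {
        int out = 0;
        boolean pendingSpace = false;
        for (int i = 0; i < length; i++) {
            char c = chars[i];
            if (c == ' ') {
                // Remember the space only if something was already written (drops leading spaces).
                pendingSpace = out > 0;
            } else {
                if (pendingSpace) {
                    chars[out++] = ' ';
                    pendingSpace = false;
                }
                chars[out++] = c;
            }
        }
        return out;
    }

    /** One char[] extraction at the start, one String creation at the end. */
    static String prepare(String value) {
        char[] chars = value.toCharArray();
        checkProhibited(chars, chars.length);
        int length = collapseSpaces(chars, chars.length);
        return new String(chars, 0, length);
    }
}
```

With String-based steps, each step would call toCharArray() and then new String(...) again; passing the array and a length between steps avoids those intermediate copies.
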
