directory-api mailing list archives

From Emmanuel Lécharny
Subject Re: Prepare String
Date Wed, 06 Apr 2016 07:35:54 GMT
On 06/04/16 08:47, Stefan Seelmann wrote:
> On 04/06/2016 01:05 AM, Emmanuel Lécharny wrote:
>> So for the record, after a couple of hours working on it tonight, I got
>> the DeepTrimToLowerNormalizer() working fine, with tests passing.
>> I was also able to improve the performance of the beast: from 20
>> seconds to normalize 10 000 000 Strings like "xs crvtbynU 
>> Jikl7897790", down to 4.3s. I just assumed that most of the time, we
>> will deal with chars between 0x00 and 0x7F, and wrote a specific
>> function for that. If we have chars above 0x7F, then an exception is
>> thrown and we fall back to the complex process, which will then take
>> 47s instead of 20s.
>> So this is a trade-off:
>> - we have an implementation that covers all the chars, and takes 20s for
>> 10M Strings
>> - we have an implementation that tries to process the String if chars
>> are in [0x00, 0x7F] and takes 4.3s for 10M Strings, but takes 47
>> seconds if we have a char outside this range.
>> Besides the obvious gain, there is another reason why I wanted to do
>> that: processing IA5String values will benefit from this separation, and
>> that covers numerous AttributeTypes (like mail, homeDirectory,
>> memberUid, krb5principalname, krb5Realmname, and a lot more).
>> wdyt? Going for an average of 20s no matter what, or accepting a huge
>> penalty when the String contains non-ASCII chars?
> I'd go for the 2nd optimized way.
> Is the cause of the penalty only the exception-throw-catch? 

It's part of it. Changing the code to throw a static, preallocated
Exception instead of creating a new exception every time saves 20s.

This is probably the way to go: we benefit from a huge improvement when
the String is pure ASCII, and the penalty is just the time spent in this
phase if this is not the case. Here are the new numbers:

- pure ASCII String : 4s
- non-ASCII String : 24.8s
- catch-all solution (i.e., no ASCII optimisation) : 20s

Way better than the previous solution, simply by adding:

    /** An exception used to get out of the map method quickly */
    private static final ArrayIndexOutOfBoundsException AIOOBE =
        new ArrayIndexOutOfBoundsException();

and throwing AIOOBE in the ASCII method...
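
To make the idea concrete, here is a minimal sketch of the pattern, not the actual code from the project. The class and method names (AsciiFastPathSketch, normalizeAscii, normalizeAny) are hypothetical, and the slow path is only a stand-in for the full process. A preallocated exception most likely helps because the stack trace is captured once, when the exception is constructed, rather than on every throw.

```java
import java.util.Locale;

/** Hypothetical sketch of an ASCII fast path with a fallback; not the project's code. */
final class AsciiFastPathSketch {
    /** Preallocated exception: its stack trace is captured only once, at class init. */
    private static final ArrayIndexOutOfBoundsException AIOOBE =
        new ArrayIndexOutOfBoundsException();

    /** Fast path: trims, collapses spaces and lowercases. ASCII input only. */
    static String normalizeAscii(String value) {
        char[] in = value.toCharArray();
        char[] out = new char[in.length];
        int pos = 0;
        boolean pendingSpace = false;

        for (char c : in) {
            if (c > 0x7F) {
                throw AIOOBE; // bail out, the caller falls back to the slow path
            }
            if (c == ' ') {
                pendingSpace = pos > 0; // drop leading spaces, remember inner ones
            } else {
                if (pendingSpace) {
                    out[pos++] = ' ';
                    pendingSpace = false;
                }
                out[pos++] = (c >= 'A' && c <= 'Z') ? (char) (c + 32) : c;
            }
        }
        return new String(out, 0, pos); // trailing spaces were never written
    }

    /** Stand-in for the complete process that handles every char (assumption). */
    static String normalizeAny(String value) {
        return value.trim().replaceAll(" +", " ").toLowerCase(Locale.ROOT);
    }

    /** Tries the fast path first and pays the fallback cost only when needed. */
    static String normalize(String value) {
        try {
            return normalizeAscii(value);
        } catch (ArrayIndexOutOfBoundsException e) {
            return normalizeAny(value);
        }
    }
}
```

One caveat with this design: the stack trace carried by a shared exception instance is meaningless, and the catch block would also swallow a genuine indexing bug in the fast path, so the exception should stay strictly private to this control flow.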

Otherwise, there are other parts that can be improved: we always
process a String in the map(), normalize(), checkProhibited() and
insignifiantSpacesString() methods. That means that in each of them we
get the char[] out of the String and create a new String. We could most
certainly do it only once, at least for the last two methods, which run
consecutively (the normalize() method uses a Java method that expects a
String).
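
As a rough illustration of doing the extraction only once for the last two steps, here is a sketch in which both steps work on the same char[]. Everything in it is an assumption: the class name, the method signatures, the single prohibited char that is checked (U+FFFD), and the simplified space handling do not come from the real methods.

```java
/** Hypothetical sketch: two consecutive steps sharing one char[] buffer. */
final class SharedBufferSketch {
    /** Illustrative check only: rejects the replacement character U+FFFD. */
    static void checkProhibited(char[] chars, int length) {
        for (int i = 0; i < length; i++) {
            if (chars[i] == '\uFFFD') {
                throw new IllegalArgumentException("prohibited char at index " + i);
            }
        }
    }

    /** Trims and collapses spaces in place, returns the new logical length. */
    static int collapseSpaces(char[] chars, int length) {
        int pos = 0;
        boolean pendingSpace = false;
        for (int i = 0; i < length; i++) {
            char c = chars[i];
            if (c == ' ') {
                pendingSpace = pos > 0; // ignore leading spaces
            } else {
                if (pendingSpace) {
                    chars[pos++] = ' '; // the write index never passes the read index
                    pendingSpace = false;
                }
                chars[pos++] = c;
            }
        }
        return pos;
    }

    /** Runs both steps with one toCharArray() call and one new String. */
    static String lastTwoSteps(String normalized) {
        char[] chars = normalized.toCharArray();
        checkProhibited(chars, chars.length);
        int length = collapseSpaces(chars, chars.length);
        return new String(chars, 0, length);
    }
}
```

Compared with String-in, String-out methods, this saves one char[] copy and one String allocation per value for these two steps.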

I'll check that tonight.

Thanks for the feedback !
