lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keith Watson <>
Subject Re: Memory Usage
Date Fri, 04 Jul 2008 06:45:46 GMT

Thanks very much for this; I'll give it a shot.


On 4 Jul 2008, at 00:02, Paul Smith wrote:

>> (there are around 6,000,000 posts on the message board database)
>> Date encoded as yyMMdd: appears to be using around 30M
>> Date encoded as yyMMddHHmmss:  appears to be using more than 400M!
>> I guess I would have understood if I was seeing the usage double  
>> for sure, or even a little more; no idea how you guys encode the  
>> indexes, if at all, but it's gone up over tenfold, which I can't  
>> explain.
> Sort memory cost is based on the total # of unique terms for the  
> given field (multiplied by the number of locale's involved if you  
> have to do that too! but in temporal sorting you don't).
> This is easier than you think, just use 2 fields (date, time) and  
> sort by both.  This means the Date field's unique term count grows  
> only 1 term per day.  The Time field can be set to minutes (if you  
> can get away with that) meaning that you only have fairly  
> insignificant total term count for the time field.  We use this at  
> Aconex,  and have indexes with millions of records (weekly 'work'  
> searcher refreshed every 5 seconds, archive searcher is held in  
> memory, with a Multisearcher done over the 2) and it works a treat.   
> We regularly need to return million+ results from a search (don't  
> ask) using this sort of sorting and the overall search time is only  
> a few seconds.
> On a related note, work hard not to need to use Locale sensitive  
> sorting if you can for any other fields, for large results the CPU  
> penalty is horrific (even once you get past the synchronization  
> bottleneck in the CollationKey stuff).
> cheers,
> Paul Smith
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message