lucene-java-user mailing list archives

From Christopher Condit
Subject RE: Best practices for searcher memory usage?
Date Thu, 15 Jul 2010 18:53:02 GMT
> [Toke: No frequent updates]
> So everything is rebuilt from scratch each time? Or do you mean that you're
> only adding new documents, not changing old ones?

Everything is reindexed from scratch - indexing speed is not essential to us...

> Either way, optimizing to a single 140GB segment is heavy. Even ignoring the
> relatively light processing of the data, merging must at the very minimum
> read and write the full 140GB. Even if you can read and write 100MB/sec,
> that still takes about an hour. This is of course not that relevant if
> you're fine with a nightly batch job.
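
As a quick check of the estimate quoted above, here is a minimal sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB), a sustained 100 MB/sec, and that the read and the write do not overlap; the class and method names are made up for illustration.

```java
class MergeIoEstimate {
    /** Seconds to read and then write sizeGb at a sustained mbPerSec. */
    static double mergeSeconds(double sizeGb, double mbPerSec) {
        double sizeMb = sizeGb * 1000.0;        // assumption: decimal GB -> MB
        double readSeconds = sizeMb / mbPerSec;  // one full read of the index
        double writeSeconds = sizeMb / mbPerSec; // one full write of the merged segment
        return readSeconds + writeSeconds;       // assumption: no overlap of read and write
    }

    public static void main(String[] args) {
        double seconds = mergeSeconds(140.0, 100.0);
        // prints "2800 s (~47 min)"
        System.out.printf("%.0f s (~%.0f min)%n", seconds, seconds / 60.0);
    }
}
```

Under those assumptions the floor is a little under an hour, before any merge processing is counted.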

Sorry, I wasn't clear here. The total index size ends up being 140 GB, but to try to improve
performance we build 50 separate indexes (each a bit under 3 GB) and then open them with a
ParallelMultiSearcher. The only reason I tried the multisearcher approach was to experiment
with Katta, which ended up not working out for us. I can also deploy it as a RemoteSearchable
(although I'm not sure whether that is deprecated).

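
For readers unfamiliar with the idea, the fan-out-and-merge pattern behind searching many indexes in parallel can be sketched roughly as below. This is a conceptual, standard-library-only illustration and not Lucene's implementation: ShardSearcher, Hit and searchAll are hypothetical names, and a real multi-index searcher also has to map per-index document ids and keep scores comparable across indexes, both of which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShardFanOut {
    /** A scored hit; shard and doc ids are illustrative only. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;
        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a searcher over one of the separate indexes. */
    interface ShardSearcher {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Queries every shard concurrently, then keeps the topN hits by descending score. */
    static List<Hit> searchAll(List<ShardSearcher> shards, final String query, final int topN)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (final ShardSearcher shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, topN);
                pending.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> result : pending) {
                merged.addAll(result.get()); // blocks until that shard has answered
            }
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<ShardSearcher> shards = new ArrayList<>();
        // Two fake shards returning canned hits.
        shards.add((q, n) -> Arrays.asList(new Hit(0, 7, 0.9f), new Hit(0, 3, 0.2f)));
        shards.add((q, n) -> Arrays.asList(new Hit(1, 4, 0.5f)));
        for (Hit hit : searchAll(shards, "lucene", 2)) {
            System.out.println("shard=" + hit.shard + " doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```

The point of the sketch is that every query pays for one task per index plus a merge step, so with 50 indexes that per-query overhead is worth measuring against a single index with more segments.
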
> > By more segments do you mean not call optimize() at index time?
> Either that or calling it with maxNumSegments 10, where 10 is just a wild
> guess. Your mileage will vary:
> xWriter.html#optimize%28int%29

Is that preferred (in terms of performance) to the approach above (splitting into multiple indexes)?

> As Erick Erickson recently wrote: "Since it doesn't make sense to me, that
> must mean I don't understand the problem very thoroughly".

Not yet! I've added some benchmarking code to track performance as I make these changes.
Do you happen to know if the Lucene benchmark package is still in use, or a good thing
to toy around with?
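
For illustration, the kind of benchmarking code meant here could be as small as the sketch below. It is a standalone timing harness and not the Lucene benchmark package; QueryTimer, time and percentile are made-up names, and the Runnable stands in for whatever search call is being measured.

```java
import java.util.Arrays;

class QueryTimer {
    /** Runs task warmup times untimed, then runs times timed; returns latencies in ns, ascending. */
    static long[] time(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // let the JIT and the OS file cache settle before measuring
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Nearest-rank percentile over an ascending array; p is expected in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the real search call under test.
        Runnable fakeQuery = () -> Math.sqrt(System.nanoTime());
        long[] latencies = time(fakeQuery, 100, 1000);
        System.out.println("median ns: " + percentile(latencies, 50));
        System.out.println("p95 ns:    " + percentile(latencies, 95));
    }
}
```

Reporting a median and a high percentile rather than a single average makes it easier to see whether a change (fewer indexes, more segments) helps typical queries, slow queries, or both.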

Thanks for all your suggestions,
