lucene-java-user mailing list archives

From Ian Lea <>
Subject Re: "batch-update"-pattern, NoMergeScheduler?
Date Tue, 23 Dec 2014 10:53:30 GMT

I can't give an exact answer to your question, but my experience has
been that it's best to leave all the merge/buffer/etc. settings alone.
If you are doing a bulk update of a large number of docs, then it's no
surprise that you are seeing a heavy IO load.  If you can, it's likely
to be worth giving Lucene a dedicated disk, or at least making sure
there's as little contention as possible; that's just general advice
for any workload.  There is always going to be a limiting factor.
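
To illustrate "leave the settings alone", here is a minimal sketch assuming a current Lucene release (9.x), where IndexWriterConfig takes only an Analyzer (the 4.x constructor in the question also took a Version). The class name DefaultBulkConfig is hypothetical. It builds a config without calling setMergePolicy, setMergeScheduler or setRAMBufferSizeMB, so Lucene's defaults (TieredMergePolicy, ConcurrentMergeScheduler, a 16 MB RAM buffer) stay in effect, and prints them so you can confirm nothing was overridden.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;

// Hypothetical helper class; assumes lucene-core 9.x on the classpath.
public class DefaultBulkConfig {

    // Leaves merge policy, merge scheduler and RAM buffer at Lucene's defaults:
    // there are deliberately no setMergePolicy / setMergeScheduler /
    // setRAMBufferSizeMB calls here.
    public static IndexWriterConfig bulkLoadConfig() {
        return new IndexWriterConfig(new StandardAnalyzer());
    }

    public static void main(String[] args) {
        IndexWriterConfig config = bulkLoadConfig();
        // Print the effective settings to confirm they are the defaults.
        System.out.println("merge policy:    " + config.getMergePolicy().getClass().getSimpleName());
        System.out.println("merge scheduler: " + config.getMergeScheduler().getClass().getSimpleName());
        System.out.println("RAM buffer (MB): " + config.getRAMBufferSizeMB());
    }
}
```

By contrast, installing NoMergePolicy as in the question means every flush leaves its own segment behind, so a load of 2 million documents is likely to end with many small segments that searches then have to visit.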

You could also experiment with multiple threads, or multiple jobs
writing to separate indexes with a standalone merge at the end.  In my
experience these have generally been more trouble than they're worth,
but the occasions when I do bulk loads of large numbers of docs are
sufficiently rare that I'm not too bothered how long it takes.
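
The "separate indexes with a standalone merge at the end" option could look roughly like the sketch below, assuming Lucene 9.x. Each partition is written by its own thread into its own Directory, and the partial indexes are then folded into one target with IndexWriter.addIndexes. The class PartitionedBuild, the single "id" field and the use of in-memory ByteBuffersDirectory (to keep the sketch self-contained; on disk you would use FSDirectory.open(Path)) are all illustrative assumptions, not a recommended setup. As noted above, this kind of arrangement is often more trouble than it's worth.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical example class; assumes lucene-core 9.x.
public class PartitionedBuild {

    // Builds one partial index per partition in parallel, merges them into a
    // target index, and returns the number of live docs in the result.
    public static int buildAndMerge(List<List<String>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        List<Directory> parts = new ArrayList<>();
        try (Directory target = new ByteBuffersDirectory()) {
            List<Future<Directory>> futures = new ArrayList<>();
            for (List<String> ids : partitions) {
                futures.add(pool.submit(() -> buildPartial(ids)));
            }
            for (Future<Directory> f : futures) {
                parts.add(f.get()); // surfaces any exception thrown by a worker
            }
            try (IndexWriter writer = new IndexWriter(target, new IndexWriterConfig(new StandardAnalyzer()))) {
                // The standalone merge step: copy every partial index into the target.
                writer.addIndexes(parts.toArray(new Directory[0]));
                writer.forceMerge(1); // optional, and expensive on large indexes
                writer.commit();
            }
            try (DirectoryReader reader = DirectoryReader.open(target)) {
                return reader.numDocs();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            for (Directory d : parts) {
                try { d.close(); } catch (IOException ignored) { }
            }
        }
    }

    // Writes one partition into its own Directory; the writer is closed
    // before returning, as addIndexes requires.
    private static Directory buildPartial(List<String> ids) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String id : ids) {
                Document doc = new Document();
                doc.add(new StringField("id", id, Field.Store.YES));
                writer.addDocument(doc);
            }
            writer.commit();
        }
        return dir;
    }
}
```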



On Mon, Dec 22, 2014 at 9:45 AM, Clemens Wyss DEV <> wrote:
> One of our indexes is updated completely quite frequently -> "batch update" or "re-index".
> When that happens, more than 2 million documents are added to or updated in the index. This creates
> an immense IO load on our system. Does it make sense to set the merge scheduler to NoMergeScheduler
> (and/or the MergePolicy to NoMergePolicy)? Or is merging "not relevant" since the commit is done
> only at the very end?
> Context information:
> At the moment the writer's config consists only of setRAMBufferSizeMB:
> IndexWriterConfig config = new IndexWriterConfig( IndexManager.CURRENT_LUCENE_VERSION, analyzer );
> config.setMergePolicy( NoMergePolicy.NO_COMPOUND_FILES );
> //config.setMergeScheduler( NoMergeScheduler.INSTANCE );
> config.setRAMBufferSizeMB( 20 );
> The update logic is as follows:
> indexWriter.deleteAll();
> ...
> for all elements do {
> ...
> indexWriter.updateDocument( term, doc ); // in order to omit "duplicate entries"
> ...
> }
> indexWriter.commit();
> What is the proposed way to perform such a batch update?
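
A runnable version of the update loop quoted above might look like the following sketch, assuming Lucene 9.x. The class BatchReindex, the id/body field names and the id-to-text records are hypothetical stand-ins for the elided parts of the loop. Because nothing is committed until the end, readers keep seeing the previous commit during the rebuild; if anything fails, IndexWriter.rollback() discards all uncommitted changes, including the deleteAll(). Within one batch, updateDocument on the same id term replaces the document added earlier, which is what keeps duplicates out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

// Hypothetical example class; assumes lucene-core 9.x.
public class BatchReindex {

    // Rebuilds the whole index from (id, body) records in a single commit.
    public static void reindex(Directory dir, Iterable<Map.Entry<String, String>> records) {
        try {
            IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
            try {
                writer.deleteAll();
                for (Map.Entry<String, String> r : records) {
                    Document doc = new Document();
                    doc.add(new StringField("id", r.getKey(), Field.Store.YES));
                    doc.add(new TextField("body", r.getValue(), Field.Store.NO));
                    // Replaces any doc with the same id added earlier in this batch.
                    writer.updateDocument(new Term("id", r.getKey()), doc);
                }
                writer.commit(); // the new contents become visible here, all at once
                writer.close();
            } catch (IOException | RuntimeException e) {
                // Revert to the last commit (undoing deleteAll) and close the writer.
                writer.rollback();
                throw e;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of live documents in the last commit.
    public static int countDocs(Directory dir) {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```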
