lucene-java-user mailing list archives

From Michael McCandless
Subject Re: BufferedUpdateStreams breaks high performance indexing
Date Thu, 28 Jul 2016 13:35:10 GMT
Hmm not good.

If you are really only adding documents, you should be using
IndexWriter.addDocument, which won't buffer any delete terms, so
applyDeletesAndUpdates should be a no-op.  It also makes flushes more
efficient, since all of your indexing buffer goes to the added documents
rather than to buffered delete terms.  Are you using updateDocument?
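
To make the difference concrete, here is a minimal sketch against the
4.10.x API (newer releases construct IndexWriterConfig and open
directories differently).  The class name, the "id" field name, the
RAMDirectory and the document count are placeholders for illustration,
not taken from your setup, and it needs the Lucene core and
analyzers-common jars on the classpath:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class AddOnlySketch {
    public static void main(String[] args) throws IOException {
        // In-memory directory purely for illustration; a real index
        // would normally live in an FSDirectory.
        Directory dir = new RAMDirectory();

        IndexWriterConfig cfg =
            new IndexWriterConfig(Version.LUCENE_4_10_4, new StandardAnalyzer());
        cfg.setRAMBufferSizeMB(1024);                                  // flush by RAM usage
        cfg.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);  // same as -1

        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            for (int i = 0; i < 1000; i++) {
                String id = "doc-" + i;  // placeholder unique id
                Document doc = new Document();
                doc.add(new StringField("id", id, Field.Store.YES));

                // Add-only path: no delete term is buffered for this document.
                writer.addDocument(doc);

                // By contrast, this would buffer a delete for the id term
                // before adding, even if no matching document exists:
                // writer.updateDocument(new Term("id", id), doc);
            }
            writer.commit();
        }
    }
}
```

If the ids really are unique and the index starts empty, the two calls
produce the same index contents; only the add-only path avoids buffering
delete terms.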

Can you reproduce this slowness on a newer release?  There have been
performance issues fixed in newer releases in this method, e.g

Have you changed any IndexWriterConfig settings from defaults?

What are your unique id fields like?  How many bytes in length?

Mike McCandless

On Thu, Jul 28, 2016 at 5:01 AM, Bernd Fehling wrote:

> While trying to get higher performance for indexing it turned out that
> BufferedUpdateStreams is breaking indexing performance.
> public synchronized ApplyDeletesResult applyDeletesAndUpdates(...)
>
> At IndexWriterConfig I have setRAMBufferSizeMB=1024 and the Lucene 4.10.4
> API states:
> "Determines the amount of RAM that may be used for buffering added
> documents and deletions before they are flushed to the Directory.
> Generally for faster indexing performance it's best to flush by RAM
> usage instead of document count and use as large a RAM buffer as you can."
> Also setMaxBufferedDocs=-1 and setMaxBufferedDeleteTerms=-1.
>
> BD 0 [Wed Jul 27 13:42:03 GMT+01:00 2016; Thread-27890]: applyDeletes:
> infos=...
> BD 0 [Wed Jul 27 14:38:55 GMT+01:00 2016; Thread-27890]: applyDeletes took
> 3411845 msec
>
> About 56 minutes no indexing and only applying deletes.
> What is it deleting?
>
> If the index gets bigger the time gets longer, currently 2.5 hours of
> waiting.
> I'm adding 96 million docs with uniq id, no duplicates, only add, no
> deletes.
>
> Any suggestions which config is _really_ going for high performance
> indexing?
>
> Best regards,
> Bernd

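As a quick sanity check on the quoted infoStream output, the reported
"applyDeletes took 3411845 msec" can be compared with the gap between
the two log timestamps (13:42:03 and 14:38:55).  This is only a small
arithmetic sketch using the JDK; the class and method names are made up
for illustration:

```java
import java.time.Duration;
import java.time.LocalTime;

public class ApplyDeletesDuration {
    /** Wall-clock gap between two HH:mm:ss timestamps on the same day. */
    public static Duration between(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end));
    }

    /** Formats a millisecond count as "Xh Ym Zs", dropping the sub-second part. */
    public static String format(long millis) {
        Duration d = Duration.ofMillis(millis);
        long hours = d.toHours();
        long minutes = d.toMinutes() % 60;
        long seconds = d.getSeconds() % 60;
        return hours + "h " + minutes + "m " + seconds + "s";
    }

    public static void main(String[] args) {
        // Duration reported by the "applyDeletes took" log line.
        System.out.println(format(3411845L));
        // Seconds between the two log timestamps.
        System.out.println(between("13:42:03", "14:38:55").getSeconds());
    }
}
```

The two figures agree to within about a second, at just under 57
minutes, so essentially the whole interval between the two log lines
was spent inside applyDeletes.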