lucene-java-user mailing list archives

From: Kireet Reddy <>
Subject: small segment sizes
Date: Sun, 06 Jul 2014 18:47:14 GMT
I am trying to understand why I am seeing very small segment sizes during indexing. I am using
Elasticsearch, and one node sees heavy merge activity. After enabling infoStream logging, it
appears that this node does more, smaller merges than the other nodes. In the TMP
(TieredMergePolicy) logs, I see many merges of segments much smaller than the floor size, some
only a few KB. After some research, it seems that Lucene writes segments per IndexWriter, so
small segments could come about when there are many writers that each receive little data. This
could definitely happen in my setup: some indices don't take many writes, but the writer is
flushed every 30 seconds to make those writes available for search.
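
A back-of-envelope way to see how periodic flushing alone yields tiny segments is to count flushes. This is a rough model, not Lucene code: it assumes each flush of an index that received writes produces one new segment (Lucene can produce more than one per flush), and the index count and data volume in the example are hypothetical. The floor rounding mirrors the documented behavior of TieredMergePolicy's floorSegmentMB setting (default 2 MB), which treats smaller segments as if they were the floor size when choosing merges.

```java
/** Back-of-envelope model of segments created by periodic flushing (hypothetical inputs). */
final class FlushSegmentEstimate {
    private FlushSegmentEstimate() {}

    /** Flushes per hour for one index, given the flush interval in seconds. */
    static long flushesPerHour(long flushIntervalSeconds) {
        if (flushIntervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return 3600 / flushIntervalSeconds;
    }

    /** Assumption: each flush of an index that received writes yields exactly one segment. */
    static long newSegmentsPerHour(int activeIndices, long flushIntervalSeconds) {
        return activeIndices * flushesPerHour(flushIntervalSeconds);
    }

    /** Average flushed-segment size in KB if bytesPerHour of new data is spread evenly. */
    static double avgSegmentKB(long bytesPerHour, long flushIntervalSeconds) {
        return bytesPerHour / 1024.0 / flushesPerHour(flushIntervalSeconds);
    }

    /** Mirrors TieredMergePolicy's floor rounding: sizes below the floor count as the floor. */
    static double flooredSizeMB(double sizeMB, double floorSegmentMB) {
        return Math.max(sizeMB, floorSegmentMB);
    }

    public static void main(String[] args) {
        // Hypothetical: 50 lightly written indices, each flushed every 30 seconds.
        System.out.println(newSegmentsPerHour(50, 30));       // 6000
        // Hypothetical: one index receiving 1,200 KB of new data per hour.
        System.out.println(avgSegmentKB(1_200 * 1024L, 30));  // 10.0
        // A 10 KB segment is treated like a 2 MB one for merge selection.
        System.out.println(flooredSizeMB(0.01, 2.0));         // 2.0
    }
}
```

Because every segment under the floor looks the same size to the merge policy, a steady stream of few-KB flushes keeps producing merge candidates regardless of how little data they hold.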

What puzzles me is that there seems to be a governor somewhere. The machine's CPU is at about 50%
user time and I/O usage is low, whereas I would have expected resource utilization to max out.
If I can understand what is limiting throughput, perhaps I can raise that limit. Otherwise, the
only other thing I can think of trying is flushing less frequently.
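
One kind of governor that keeps CPU and I/O below saturation is a fixed byte-rate throttle: work waits on the throttle rather than on the hardware. The sketch below is a hypothetical illustration of that idea, not a Lucene or Elasticsearch class, and the 20 MB/s limit is an invented parameter. It computes how long a writer must pause so that cumulative bytes never exceed the configured rate.

```java
/** Hypothetical byte-rate throttle: computes the pauses needed to cap write throughput. */
final class ByteRateThrottle {
    private final double bytesPerSecond;
    private double allowedAtSeconds = 0.0; // earliest time the next write may start

    ByteRateThrottle(double bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
    }

    /**
     * Records a write of {@code bytes} requested at {@code nowSeconds} and returns how many
     * seconds the caller must wait before starting it so the rate stays at or below the limit.
     */
    double pauseFor(long bytes, double nowSeconds) {
        double start = Math.max(nowSeconds, allowedAtSeconds);
        double pause = start - nowSeconds;
        allowedAtSeconds = start + bytes / bytesPerSecond; // next write waits for this one
        return pause;
    }

    public static void main(String[] args) {
        // Hypothetical limit of 20 MB/s; three 10 MB writes requested at the same instant.
        ByteRateThrottle t = new ByteRateThrottle(20.0 * 1024 * 1024);
        long tenMB = 10L * 1024 * 1024;
        System.out.println(t.pauseFor(tenMB, 0.0)); // 0.0
        System.out.println(t.pauseFor(tenMB, 0.0)); // 0.5
        System.out.println(t.pauseFor(tenMB, 0.0)); // 1.0
    }
}
```

Under a limiter like this, throughput plateaus at the configured rate even while CPU and disk have headroom, which is the shape of the symptom described above.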

I do see this message much more often (4x) on the problematic node:

 DW: DocumentsWriter has queued dwpt; will hijack this thread to flush pending segment(s)

Is that something to worry about?
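
The log line's own wording says the indexing thread is borrowed to flush queued segments. The following is a simplified, single-threaded, hypothetical model of that behavior, not Lucene's DocumentsWriter code; all class and method names are invented for illustration. Before an indexing call adds its document, it first drains any pending flushes, so flush work is charged to indexing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified, hypothetical model of an indexing thread "hijacked" to flush queued segments. */
final class HijackModel {
    private final Deque<String> pendingFlushes = new ArrayDeque<>();
    private int flushedSegments = 0;
    private int indexedDocs = 0;

    /** Queues an in-memory segment that is ready to be written out. */
    void queueFlush(String segmentName) {
        pendingFlushes.addLast(segmentName);
    }

    /**
     * Indexes one document. If flushes are queued, the caller first writes them out itself;
     * returns how many pending segments this call flushed.
     */
    int indexDocument() {
        int hijacked = 0;
        while (!pendingFlushes.isEmpty()) {
            pendingFlushes.removeFirst(); // stand-in for writing the segment to disk
            flushedSegments++;
            hijacked++;
        }
        indexedDocs++;
        return hijacked;
    }

    int flushedSegments() { return flushedSegments; }

    int indexedDocs() { return indexedDocs; }

    public static void main(String[] args) {
        HijackModel m = new HijackModel();
        m.queueFlush("_0"); // hypothetical segment names
        m.queueFlush("_1");
        System.out.println(m.indexDocument()); // 2: this call flushed both queued segments first
        System.out.println(m.indexDocument()); // 0: nothing left to flush
    }
}
```

In this model, the more often flushes are queued, the more indexing calls pay for flushing before doing their own work.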