lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: flushRamSegments() is "over merging"?
Date Wed, 16 Aug 2006 22:53:42 GMT
On 8/16/06, Doron Cohen <DORONC@il.ibm.com> wrote:
> Under-merging would hurt search, unless optimize is called explicitly, but
> the index should "behave" without requiring the user to call optimize. 388
> deals with this.

Depends on what you mean by "behave" :-)
More segments than expected can cause failure because of file
descriptor exhaustion.  It's nice to have a calculable cap on the
number of segments. It also depends on exactly what one thinks the
index invariants should be w.r.t. mergeFactor.
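That cap is calculable from the merge parameters. A back-of-the-envelope sketch, assuming the invariant stated below (at most mergeFactor segments per level, with a new level every factor-of-mergeFactor growth beyond maxBufferedDocs); the class and method names are invented for illustration and are not part of the Lucene API:

```java
// Hypothetical helper, not Lucene code: worst-case segment count for an
// index of numDocs documents, assuming at most mergeFactor segments may
// exist at each level.
public class SegmentCap {
    static int maxSegments(long numDocs, int maxBufferedDocs, int mergeFactor) {
        // Count the levels: level 0 holds segments of up to maxBufferedDocs
        // docs; each further level holds segments mergeFactor times larger.
        int levels = 1;
        long levelSize = maxBufferedDocs;
        while (levelSize * mergeFactor <= numDocs) {
            levelSize *= mergeFactor;
            levels++;
        }
        // Each level contributes at most mergeFactor segments.
        return levels * mergeFactor;
    }

    public static void main(String[] args) {
        // 1M docs with the defaults maxBufferedDocs=10, mergeFactor=10:
        // 6 levels, so at most 60 segments.
        System.out.println(maxSegments(1_000_000, 10, 10));
    }
}
```

With multi-file segments, each segment can hold several open file descriptors, so the practical descriptor bound is a small multiple of this number.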

> Over-merging - in current flushRamSegments() code - would merge at most
> merge-factor documents prematurely.

Right.

>  Since merge-factor is usually not very
> large, this might be a minor issue - but still, if an index is growing by
> small doses, does it make sense to re-merge with the last disk segment each
> time the index is closed? Why not let it simply be controlled by
> maybeMergeSegments?

I personally see mergeFactor as the maximum number of segments at any
level in the index, with level defined by
docsInSegment/maxBufferedDocs.
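The level of a segment under that definition can be sketched as follows; this is an illustrative reading of the mail, with invented names, not the actual IndexWriter code:

```java
// Hypothetical sketch: the "level" of a segment, per the definition above.
// Level 0 holds freshly flushed segments of up to maxBufferedDocs docs;
// merging mergeFactor segments promotes the result one level up, so each
// level spans a factor-of-mergeFactor range of segment sizes.
public class Levels {
    static int level(int docsInSegment, int maxBufferedDocs, int mergeFactor) {
        int lvl = 0;
        long size = maxBufferedDocs;
        while (docsInSegment > size) {
            size *= mergeFactor;
            lvl++;
        }
        return lvl;
    }

    public static void main(String[] args) {
        // With maxBufferedDocs=10, mergeFactor=10: a 10-doc segment is
        // level 0, a 100-doc segment level 1, a 1000-doc segment level 2.
        System.out.println(level(10, 10, 10));
        System.out.println(level(100, 10, 10));
        System.out.println(level(1000, 10, 10));
    }
}
```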

maybeMergeSegments doesn't enforce this in the presence of partially
filled segments because it counts documents and not segments.  Since
partially filled segments aren't written in a single IndexWriter
session, this only needs to be checked for on a close().
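The discrepancy between the two counting schemes can be made concrete. A sketch with invented names, assuming many small open/add/close cycles each leave a partially filled level-0 segment:

```java
import java.util.Arrays;

// Hypothetical illustration: counting documents at a level understates how
// many segments exist when those segments are partially filled.
public class PartialSegments {
    // The number of "full segment equivalents" a document count suggests.
    static int docCountView(int[] segmentSizes, int maxBufferedDocs) {
        return Arrays.stream(segmentSizes).sum() / maxBufferedDocs;
    }

    public static void main(String[] args) {
        int mergeFactor = 10, maxBufferedDocs = 10;
        // Ten partially filled level-0 segments left by small sessions.
        int[] segmentSizes = {3, 4, 2, 5, 3, 4, 2, 5, 3, 4};
        // Document-count view: 35 docs look like only 3 full segments,
        // so a document-count trigger does not fire...
        System.out.println(docCountView(segmentSizes, maxBufferedDocs));
        // ...even though the segment count has already reached mergeFactor,
        // which is why the invariant needs a re-check on close().
        System.out.println(segmentSizes.length >= mergeFactor);
    }
}
```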

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


