lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Static index, fastest way to do forceMerge
Date Fri, 02 Nov 2018 16:56:46 GMT
The merge process is rather tricky, and there's nothing that I know of
that will use all resources available. In fact the merge code is
written to _not_ use up all the possible resources on the theory that
there should be some left over to handle queries etc.

Yeah, the situation you describe is indeed one of the few where
merging down to 1 segment makes sense. Out of curiosity, what kind of
performance gains do you see?

This applies to the default TieredMergePolicy (TMP):

1> there is a limit to the number of segments that can be merged at
once, so sometimes it can take more than one pass. If you have more
than 30 segments, it'll be multi-pass. You can try (and I haven't done
this personally) setting maxMergeAtOnceExplicit in your solrconfig.xml
to see if it helps. That only takes effect when you forceMerge.
There's a tricky bit of reflection that handles this, see the very end
of TieredMergePolicy.java for the parameters you can set.

2> As of Solr 7.5 (see LUCENE-7976) the default behavior has changed
from automatically merging down to 1 segment to respecting
"maxMergedSegmentMB" (default 5G). You will have to explicitly pass
maxSegments=1 to get the old behavior.
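
To make the multi-pass arithmetic in point 1 concrete, here is a rough
Java sketch. It is only a simplified model, not Lucene's actual merge
selection code: it assumes every pass merges the remaining segments in
groups of at most the configured limit. The class and method names are
hypothetical; 30 is the limit mentioned above.

```java
/**
 * Simplified model (an assumption, not TieredMergePolicy's real logic):
 * each pass merges the remaining segments in groups of at most
 * maxMergeAtOnce, and passes repeat until one segment is left.
 */
final class ForceMergePassModel {

    /** Returns the number of passes needed to get down to one segment. */
    static int passesToOneSegment(int segmentCount, int maxMergeAtOnce) {
        if (segmentCount < 1) {
            throw new IllegalArgumentException("segmentCount must be >= 1");
        }
        if (maxMergeAtOnce < 2) {
            throw new IllegalArgumentException("maxMergeAtOnce must be >= 2");
        }
        int passes = 0;
        int remaining = segmentCount;
        while (remaining > 1) {
            // Ceiling division: a partial group still produces one merged segment.
            remaining = (remaining + maxMergeAtOnce - 1) / maxMergeAtOnce;
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        int limit = 30; // the per-merge limit quoted in point 1
        for (int n : new int[] {10, 30, 31, 900, 901}) {
            System.out.println(n + " segments -> "
                + passesToOneSegment(n, limit) + " pass(es)");
        }
    }
}
```

Under this model, raising the limit above the current segment count is
what turns a multi-pass forceMerge into a single pass, which is the
reason to experiment with maxMergeAtOnceExplicit.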

Best,
Erick
On Fri, Nov 2, 2018 at 3:13 AM Jerven Bolleman
<jerven.bolleman@sib.swiss> wrote:
>
> Dear Lucene Devs and Users,
>
> First of all thank you for this wonderful library and API.
>
> forceMerges are normally not recommended, but we fall into one of the few
> use cases where it makes sense.
>
> In our use case we have a large index (3 actually) and we don't update
> them ever after indexing. i.e. we index all the documents and then never
> ever add another document to the index, nor are any deleted.
>
> It has proven beneficial for search performance to always forceMerge down
> to one segment. However, this takes significant time. Are there any
> suggestions on what kind of merge scheduler/policy settings will utilize
> the most of the available IO, CPU and RAM capacity? Currently we end up
> being single thread bound, leaving lots of potential cpu and bandwidth
> not used during the merge.
>
> e.g. we are looking for a "merge everything, use all hardware" policy and
> scheduler.
>
> We are currently on lucene 7.4 but nothing is stopping us from upgrading.
>
> Regards,
> Jerven
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


