lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5705) ConcurrentMergeScheduler/maxMergeCount default is too low
Date Sat, 24 May 2014 15:36:01 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008145#comment-14008145
] 

Michael McCandless commented on LUCENE-5705:
--------------------------------------------

The purpose of maxMergeCount is to put back-pressure on ongoing indexing when merges are falling
behind.  It's very bad when merges fall behind because you get too many segments in the index:
searching slows down, PK (id) lookups slow down, NRT readers hold too many file handles open,
etc.

The current default maxMergeCount (2) means that if 2 merges are already needed (one is running)
and a 3rd merge shows up, then the incoming thread is stalled until the merges can catch up.
Maybe we can increase it to 3, but I don't think we should go higher than that by default.
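
To make the stall rule concrete, here is a minimal sketch of the accounting.  It is a toy
model, not Lucene's ConcurrentMergeScheduler code: the class MergeBackPressure and its methods
are hypothetical names, and it only tracks a counter instead of blocking a real indexing thread.

```java
/** Toy model of the maxMergeCount stall rule; hypothetical, not Lucene's implementation. */
final class MergeBackPressure {
    private final int maxMergeCount;
    private int pendingMerges; // merges that are needed: running plus waiting

    MergeBackPressure(int maxMergeCount) {
        if (maxMergeCount < 1) {
            throw new IllegalArgumentException("maxMergeCount must be >= 1");
        }
        this.maxMergeCount = maxMergeCount;
    }

    /** Records a newly needed merge; returns true if the incoming thread would stall. */
    synchronized boolean mergeRequested() {
        pendingMerges++;
        return mustStall();
    }

    /** Records that one merge completed, which may release the stall. */
    synchronized void mergeFinished() {
        if (pendingMerges == 0) {
            throw new IllegalStateException("no merge is pending");
        }
        pendingMerges--;
    }

    /** True while more merges are needed than maxMergeCount allows. */
    synchronized boolean mustStall() {
        return pendingMerges > maxMergeCount;
    }
}
```

With maxMergeCount set to 2, the first two mergeRequested() calls return false and the third
returns true; the stall clears once mergeFinished() is called.  In Lucene 4.8 the real setting
is, I believe, changed through ConcurrentMergeScheduler.setMaxMergesAndThreads(maxMergeCount,
maxThreadCount).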

Maybe Solr can increase this limit temporarily while importing from JDBC?  Or maybe we need
a less "harsh" way to apply back-pressure, e.g. in Elasticsearch we force indexing to be single-threaded
(not outright stopped) when merges can't keep up.
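
As a rough sketch of that softer approach: while a "merges are behind" flag is set, every
indexing operation goes through one shared lock, so indexing continues but only one thread at
a time.  This is an assumption-laden illustration, not Elasticsearch's actual code; the class
IndexThrottle and its method names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical throttle: serializes indexing while merges are behind. */
final class IndexThrottle {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile boolean throttled; // set by whatever monitors the merge backlog

    void setThrottled(boolean throttled) {
        this.throttled = throttled;
    }

    boolean isThrottled() {
        return throttled;
    }

    /** True if the calling thread currently holds the throttle lock. */
    boolean heldByCurrentThread() {
        return lock.isHeldByCurrentThread();
    }

    /** Runs one indexing operation, serialized only while throttling is active. */
    <T> T run(Supplier<T> indexingOp) {
        if (!throttled) {
            return indexingOp.get(); // normal case: full concurrency, no lock taken
        }
        lock.lock(); // throttled: one indexing operation at a time
        try {
            return indexingOp.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Operations that started before the flag was set are not interrupted; they simply finish
without the lock, which seems acceptable for back-pressure that is meant to be gentle.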

Do you know why merges can't keep up in your use case?  E.g. are you throttling the merge
IO?

> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
>                 Key: LUCENE-5705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5705
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: LUCENE-5705.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This causes problems
> for Solr's dataimport handler when very large imports are done from a JDBC source.
>
> What happens is that when three merge tiers are scheduled at the same time, the add/update
> thread will stop for several minutes while the largest merge finishes.  In the meantime, the
> dataimporter JDBC connection to the database will time out, and when the add/update thread
> resumes, the import will fail because the ResultSet throws an exception.  Setting maxMergeCount
> to 6 eliminates this issue for virtually any size import -- although it is theoretically possible
> to have that many simultaneous merge tiers, I've never seen it.
>
> As long as maxThreads is properly set (the default value of 1 is appropriate for most
> installations), I cannot think of a really good reason that the default for maxMergeCount
> should be so low.  If someone does need to strictly control the number of threads that get
> created, they can reduce the number.  Perhaps someone with more experience knows of a really
> good reason to make this default low?
>
> I'm not sure what the new default number should be, but I'd like to avoid bikeshedding.
> I don't think it should be Integer.MAX_VALUE.




