lucene-dev mailing list archives

From "Shawn Heisey (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5705) ConcurrentMergeScheduler/maxMergeCount default is too low
Date Sun, 25 May 2014 17:15:02 GMT


Shawn Heisey commented on LUCENE-5705:

The infostream that I'm currently capturing does show merges completing out of order,
with preference given to small merges.

IW 4 [Sun May 25 09:43:57 MDT 2014; Lucene Merge Thread #11]: merge time 47224 msec for 563274
IW 4 [Sun May 25 09:52:39 MDT 2014; Lucene Merge Thread #13]: merge time 8761 msec for 68640
IW 4 [Sun May 25 09:53:44 MDT 2014; Lucene Merge Thread #12]: merge time 266527 msec for 4227876
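
For anyone who wants to check their own infostream for the same pattern, here is a minimal
sketch that pulls the merge thread number, elapsed time, and trailing count out of lines like
the three above. The class name MergeTimes and the regular expression are my own illustration,
not anything shipped with Lucene, and the pattern assumes the lines look exactly like the ones
quoted here. The trailing number is treated as an opaque size because the log lines above are
cut off after it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MergeTimes {
    // Hypothetical pattern, written against the three log lines quoted above.
    private static final Pattern LINE = Pattern.compile(
        "Lucene Merge Thread #(\\d+)\\]: merge time (\\d+) msec for (\\d+)");

    /** Returns {threadNumber, elapsedMsec, size} for each matching line, in log order. */
    static List<long[]> parse(List<String> lines) {
        List<long[]> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                out.add(new long[] {
                    Long.parseLong(m.group(1)),
                    Long.parseLong(m.group(2)),
                    Long.parseLong(m.group(3))});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "IW 4 [Sun May 25 09:43:57 MDT 2014; Lucene Merge Thread #11]: merge time 47224 msec for 563274",
            "IW 4 [Sun May 25 09:52:39 MDT 2014; Lucene Merge Thread #13]: merge time 8761 msec for 68640",
            "IW 4 [Sun May 25 09:53:44 MDT 2014; Lucene Merge Thread #12]: merge time 266527 msec for 4227876");
        // Lines appear in completion order; compare it with the thread numbers.
        for (long[] r : parse(log)) {
            System.out.println("thread #" + r[0] + ": " + r[1] + " msec, size " + r[2]);
        }
    }
}
```

If the thread number reflects the order in which merges were started (an assumption on my
part), then thread #13 finishing its small merge before thread #12 finishes the large one is
the out-of-order behavior described above.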

When I was having the problem I described (which was admittedly a long time ago, Solr 1.4.0
most likely), I was using the old default, LogByteSizeMergePolicy.  Would that have been using
CMS, or a different scheduler?  When no scheduler is configured in Solr 4.x, does it choose
CMS?  I would think that it does.

I have seen others have this problem very recently on the mailing list and IRC.  I'm reasonably
sure that at least one of them was on a 4.x release.  Bumping up maxMergeCount has fixed it
for those people, just like it did for me.

> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>                 Key: LUCENE-5705
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>         Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This causes problems
> for Solr's dataimport handler when very large imports are done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time, the add/update
> thread will stop for several minutes while the largest merge finishes.  In the meantime, the
> dataimporter JDBC connection to the database will time out, and when the add/update thread
> resumes, the import will fail because the ResultSet throws an exception.  Setting maxMergeCount
> to 6 eliminates this issue for virtually any size import -- although it is theoretically possible
> to have that many simultaneous merge tiers, I've never seen it.
> As long as maxThreads is properly set (the default value of 1 is appropriate for most
> installations), I cannot think of a really good reason that the default for maxMergeCount
> should be so low.  If someone does need to strictly control the number of threads that get
> created, they can reduce the number.  Perhaps someone with more experience knows of a really
> good reason to make this default low?
> I'm not sure what the new default number should be, but I'd like to avoid bikeshedding.
> I don't think it should be Integer.MAX_VALUE.
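
The stall condition in the description can be written down as a one-line rule. This is a
deliberately simplified model of the behavior described above, not Lucene's actual scheduler
code: the class MergeStallModel and the method indexingStalls are hypothetical names, and the
rule assumes the add/update thread waits whenever more merges are scheduled than maxMergeCount
allows.

```java
public class MergeStallModel {
    /**
     * Simplified model (an assumption, not Lucene source): the add/update
     * thread stalls when the number of merges scheduled at the same time
     * exceeds maxMergeCount.
     */
    static boolean indexingStalls(int scheduledMerges, int maxMergeCount) {
        return scheduledMerges > maxMergeCount;
    }

    public static void main(String[] args) {
        int[] limits = {2, 6}; // the current default and the value suggested in the issue
        for (int maxMergeCount : limits) {
            for (int scheduled = 1; scheduled <= 3; scheduled++) {
                System.out.println("maxMergeCount=" + maxMergeCount
                    + " scheduled=" + scheduled
                    + " stalls=" + indexingStalls(scheduled, maxMergeCount));
            }
        }
    }
}
```

Under this model, three simultaneous merge tiers stall indexing with the default of 2 but not
with 6, which matches what the description reports.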
