lucene-dev mailing list archives

From "Shawn Heisey (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5705) ConcurrentMergeScheduler/maxMergeCount default is too low
Date Sun, 25 May 2014 13:53:01 GMT


Shawn Heisey commented on LUCENE-5705:

The javadoc changes that I made do need to change again if we don't also make the code changes.

I think the new javadoc needs to be the following:

   * Sets the maximum number of merge threads and simultaneous merges allowed.
   * @param maxMergeCount the max # simultaneous merges that are allowed.
   *       If a merge is necessary yet we already have this many
   *       threads running, the incoming thread (that is calling
   *       add/updateDocument) will block until a merge thread
   *       has completed.  If index data is coming from a source that is
   *       sensitive to inactivity timeouts (like JDBC), it is advisable to
   *       set this value higher than default so that the incoming thread
   *       never stops.  Note that we will only run the smallest
   *       <code>maxThreadCount</code> merges at a time.
   * @param maxThreadCount the max # simultaneous merge threads that should
   *       be running at once.  This must be &lt;= <code>maxMergeCount</code>.
   *       Most setups should use the default value of 1 here.
   *       If the index is on Solid State Disk and there are
   *       plenty of CPU cores available, it is usually safe to
   *       run more threads simultaneously.
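
A minimal sketch of the contract this javadoc describes, assuming a stall happens whenever one more merge is needed and the number of merges already in flight has reached maxMergeCount. This is a toy model, not Lucene code: the class MergeLimits and its method are hypothetical names. In Lucene 4.x the real values are, I believe, set with ConcurrentMergeScheduler.setMaxMergesAndThreads(maxMergeCount, maxThreadCount).

```java
// Toy model of the two limits; hypothetical names, not part of Lucene.
final class MergeLimits {
    final int maxMergeCount;   // merges allowed in flight before indexing blocks
    final int maxThreadCount;  // merges allowed to actually run at once

    MergeLimits(int maxMergeCount, int maxThreadCount) {
        if (maxThreadCount < 1) {
            throw new IllegalArgumentException("maxThreadCount must be >= 1");
        }
        if (maxMergeCount < 1) {
            throw new IllegalArgumentException("maxMergeCount must be >= 1");
        }
        // The javadoc requires maxThreadCount <= maxMergeCount.
        if (maxThreadCount > maxMergeCount) {
            throw new IllegalArgumentException("maxThreadCount must be <= maxMergeCount");
        }
        this.maxMergeCount = maxMergeCount;
        this.maxThreadCount = maxThreadCount;
    }

    // Assumption: the add/updateDocument thread blocks when another merge
    // is needed and this many merges are already in flight.
    boolean incomingThreadStalls(int mergesInFlight) {
        return mergesInFlight >= maxMergeCount;
    }

    public static void main(String[] args) {
        MergeLimits defaults = new MergeLimits(2, 1); // defaults named in the issue
        MergeLimits raised = new MergeLimits(6, 1);   // value suggested in the issue
        // Two merges in flight and a third one needed:
        System.out.println("default stalls: " + defaults.incomingThreadStalls(2));
        System.out.println("raised stalls:  " + raised.incomingThreadStalls(2));
    }
}
```

Under this model the defaults (2 merges, 1 thread) stall the indexing thread as soon as a third merge is needed, while a maxMergeCount of 6 leaves headroom without adding any merge threads.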

I did notice the following comment in the 4x branch, but this has not been my experience with
Solr.  Older versions seemed to prefer running the largest merge to completion before doing
the smaller ones.  The behavior described here would be preferable.  If the comment is accurate,
does anyone know when it changed?  I originally ran into my problem back on Solr 1.4.1 (Lucene
2.9), and I am pretty sure that some of the people I've helped on the mailing list and IRC
were running some 4.x version.

  // Max number of merge threads allowed to be running at
  // once.  When there are more merges then this, we
  // forcefully pause the larger ones, letting the smaller
  // ones run, up until maxMergeCount merges at which point
  // we forcefully pause incoming threads (that presumably
  // are the ones causing so much merging).
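
A small sketch of the policy that comment describes, assuming merges are ranked purely by size: the smallest maxThreadCount merges run and the larger ones are paused. The class MergeThrottleSketch and its methods are hypothetical and only illustrate the described behavior; whether a given Lucene version actually behaves this way is the open question here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy illustration of "pause the larger merges, let the smaller ones run".
// Hypothetical names; not Lucene code.
final class MergeThrottleSketch {

    // Sizes of the merges allowed to run: the smallest maxThreadCount of them.
    static List<Long> running(List<Long> pendingSizes, int maxThreadCount) {
        List<Long> sorted = new ArrayList<>(pendingSizes);
        Collections.sort(sorted); // ascending, so the smallest merges come first
        int cut = Math.min(maxThreadCount, sorted.size());
        return new ArrayList<>(sorted.subList(0, cut));
    }

    // Sizes of the merges that would be paused: everything beyond the cut.
    static List<Long> paused(List<Long> pendingSizes, int maxThreadCount) {
        List<Long> sorted = new ArrayList<>(pendingSizes);
        Collections.sort(sorted);
        int cut = Math.min(maxThreadCount, sorted.size());
        return new ArrayList<>(sorted.subList(cut, sorted.size()));
    }

    public static void main(String[] args) {
        List<Long> pending = new ArrayList<>();
        pending.add(5000L); // large merge, e.g. the top tier
        pending.add(40L);
        pending.add(700L);
        System.out.println("running: " + running(pending, 1));
        System.out.println("paused:  " + paused(pending, 1));
    }
}
```

With one merge thread and three pending merges, only the smallest merge runs in this model. The behavior reported for older versions would be the opposite choice: the largest merge runs to completion first.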

> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>                 Key: LUCENE-5705
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>         Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This causes problems
> for Solr's dataimport handler when very large imports are done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time, the add/update
> thread will stop for several minutes while the largest merge finishes.  In the meantime, the
> dataimporter JDBC connection to the database will time out, and when the add/update thread
> resumes, the import will fail because the ResultSet throws an exception.  Setting maxMergeCount
> to 6 eliminates this issue for virtually any size import -- although it is theoretically possible
> to have that many simultaneous merge tiers, I've never seen it.
> As long as maxThreads is properly set (the default value of 1 is appropriate for most
> installations), I cannot think of a really good reason that the default for maxMergeCount
> should be so low.  If someone does need to strictly control the number of threads that get
> created, they can reduce the number.  Perhaps someone with more experience knows of a really
> good reason to make this default low?
> I'm not sure what the new default number should be, but I'd like to avoid bikeshedding.
> I don't think it should be Integer.MAX_VALUE.

This message was sent by Atlassian JIRA
