lucene-dev mailing list archives

From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5705) ConcurrentMergeScheduler/maxMergeCount default is too low
Date Sun, 25 May 2014 18:47:01 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008402#comment-14008402 ]

Shawn Heisey commented on LUCENE-5705:
--------------------------------------

bq. disabling merges while you import data will improve latency in that respect.

If I were writing a Lucene program, turning off merging would likely be a very simple thing
to do.  With Solr, is it possible to change that without modifying the filesystem
(solrconfig.xml), and without restarting Solr or reloading cores?  If it is, I could do an
optimize as the last step of a full rebuild.  Skipping merges during the rebuild and then
optimizing at the end would probably be faster than what happens now.  If I have to change
the config and restart/reload, then this is not something I can implement -- anyone who has
access can currently kick off a rebuild simply by changing an entry in a MySQL database
table.  The SolrJ program notices this and starts all the dataimport handlers in the build
cores.  Managing filesystem changes from a Java program across multiple machines is not
something I want to try.  If I switched to SolrCloud, config changes would be relatively
easy using the zkCli API, but switching to SolrCloud would actually lead to a loss of
functionality in my index.

Once the index is built, my SolrJ program does a full optimize on one cold shard per day,
so it takes six days to cover the whole index.  The hot shard is optimized once an hour --
that only takes about 30 seconds.
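
For illustration only, the cold-shard rotation described above could be sketched like
this.  It assumes six cold shards (inferred from "six days"), and every name in it is
hypothetical; none of it is taken from the actual SolrJ program.

```java
import java.time.LocalDate;

// Hypothetical helper: these names do not come from the real SolrJ program.
final class OptimizeRotation {
    // Assumption: six cold shards, inferred from "six days" for the whole index.
    static final int COLD_SHARD_COUNT = 6;

    // Maps a day number (days since 1970-01-01) to the index of the cold shard
    // that would be optimized on that day, cycling through all shards in order.
    static int coldShardForDay(long epochDay, int coldShardCount) {
        if (coldShardCount <= 0) {
            throw new IllegalArgumentException("coldShardCount must be positive");
        }
        // floorMod keeps the result non-negative even for dates before 1970.
        return (int) Math.floorMod(epochDay, (long) coldShardCount);
    }

    public static void main(String[] args) {
        long today = LocalDate.now().toEpochDay();
        System.out.println("Cold shard to optimize today: "
                + coldShardForDay(today, COLD_SHARD_COUNT));
    }
}
```

Any six consecutive days hit each cold shard exactly once, which matches the six-day
cycle; the hourly hot-shard optimize would be scheduled separately.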


> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
>                 Key: LUCENE-5705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5705
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This causes problems
> for Solr's dataimport handler when very large imports are done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time, the add/update
> thread will stop for several minutes while the largest merge finishes.  In the meantime, the
> dataimporter JDBC connection to the database will time out, and when the add/update thread
> resumes, the import will fail because the ResultSet throws an exception.  Setting maxMergeCount
> to 6 eliminates this issue for virtually any size import -- although it is theoretically possible
> to have that many simultaneous merge tiers, I've never seen it.
> As long as maxThreads is properly set (the default value of 1 is appropriate for most
> installations), I cannot think of a really good reason that the default for maxMergeCount
> should be so low.  If someone does need to strictly control the number of threads that get
> created, they can reduce the number.  Perhaps someone with more experience knows of a really
> good reason to make this default low?
> I'm not sure what the new default number should be, but I'd like to avoid bikeshedding.
> I don't think it should be Integer.MAX_VALUE.
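
The stall condition in the issue description can be restated as a toy model.  This is a
deliberately simplified sketch of the behavior as described there, not Lucene's actual
scheduling code, and the class and method names are hypothetical.

```java
// Toy model of the behavior described in the issue; not Lucene's implementation.
final class MergeStallModel {
    // Per the description: with maxMergeCount = 2, three simultaneous merge tiers
    // stop the add/update thread, so the thread stalls once the number of
    // simultaneous merges exceeds maxMergeCount.
    static boolean indexingStalls(int simultaneousMerges, int maxMergeCount) {
        if (simultaneousMerges < 0 || maxMergeCount < 1) {
            throw new IllegalArgumentException("invalid merge counts");
        }
        return simultaneousMerges > maxMergeCount;
    }

    public static void main(String[] args) {
        int tiers = 3; // three merge tiers scheduled at the same time
        System.out.println("maxMergeCount=2 stalls: " + indexingStalls(tiers, 2));
        System.out.println("maxMergeCount=6 stalls: " + indexingStalls(tiers, 6));
    }
}
```

Under this model a limit of 6 only stalls indexing once a seventh simultaneous merge is
needed, which is why the larger value avoids the JDBC timeout in practice.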



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
