It looks like the disk check is the problem. I am no Java developer, but the attached patch skips the check when you are using the link method for splitting. It is based off of the 7.7.2 commit, d4c30fc285. The modified version only has to run on the overseer machine, so there is that at least.
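The gist of the change, sketched in Python for illustration only (the real patch is a few lines of Java in SplitShardCmd, around the checkDiskSpace() call shown in the stack trace below; nothing here is the actual code):

```python
# Illustrative sketch only: the real change is in Java, in
# SplitShardCmd.split() around the checkDiskSpace() call.
# The idea: a 'link' split hard-links index files, so the
# rewrite-sized free-space estimate does not apply.

def should_check_disk_space(split_method: str) -> bool:
    """Return True if the pre-split free-disk check should run."""
    return split_method.lower() != 'link'

# The overseer would then only run the check for rewrite splits:
for method in ('rewrite', 'link'):
    print(method, should_check_disk_space(method))
```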


From: Andrew Kettmann
Sent: Tuesday, June 18, 2019 11:32:43 AM
To: solr-user@lucene.apache.org
Subject: Solr 7.7.2 - SolrCloud - SPLITSHARD - Using LINK method fails on disk usage checks
 

Using the Solr 7.7.2 Docker image, I have been testing some of the new autoscale features (huge fan so far). Splitting a 2GB core with the link method took less than 1MB of additional space. After filling the core quite a bit larger, to 12GB of a 20GB PVC, however, splitting the shard fails with the following error message on my overseer:


2019-06-18 16:27:41.754 ERROR (OverseerThreadFactory-49-thread-5-processing-n:10.0.192.74:8983_solr) [c:test_autoscale s:shard1  ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test_autoscale operation: splitshard failed:org.apache.solr.common.SolrException: not enough free disk space to perform index split on node 10.0.193.23:8983_solr, required: 23.35038321465254, available: 7.811378479003906
    at org.apache.solr.cloud.api.collections.SplitShardCmd.checkDiskSpace(SplitShardCmd.java:567)
    at org.apache.solr.cloud.api.collections.SplitShardCmd.split(SplitShardCmd.java:138)
    at org.apache.solr.cloud.api.collections.SplitShardCmd.call(SplitShardCmd.java:94)
    at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:294)
    at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)


I also attempted sending the request directly to the node itself to see if it behaved any differently, but no luck. My parameters are (note the Python formatting, as that is my language of choice):



splitparams = {'action':'SPLITSHARD',
               'collection':'test_autoscale',
               'shard':'shard1',
               'splitMethod':'link',
               'timing':'true',
               'async':'shardsplitasync'}
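Roughly, the request is built like this (a sketch using only the standard library; the host, port, and /admin/collections path are assumptions taken from the node's log line below, and actually sending it requires a reachable Solr node):

```python
from urllib.parse import urlencode

splitparams = {'action': 'SPLITSHARD',
               'collection': 'test_autoscale',
               'shard': 'shard1',
               'splitMethod': 'link',
               'timing': 'true',
               'async': 'shardsplitasync'}

# Collections API endpoint; host/port assumed from the log line below.
url = 'http://10.0.193.23:8983/solr/admin/collections?' + urlencode(splitparams)
print(url)
```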


And this is confirmed by the log message from the node itself:


2019-06-18 16:27:41.730 INFO  (qtp1107530534-16) [c:test_autoscale   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={async=shardsplitasync&timing=true&action=SPLITSHARD&collection=test_autoscale&shard=shard1&splitMethod=link} status=0 QTime=20

While it is true that I would not have enough space if I were using the rewrite method, the link method used less than 1MB of additional space on a 2GB core. Is there something I am missing here? Is there an option I need to pass to disable the disk space check? I can't find anything in the documentation at this point.



Andrew Kettmann
DevOps Engineer
P: 1.314.596.2836

