lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joe Obernberger <>
Subject Re: Recovery Issue - Solr 6.6.1 and HDFS
Date Tue, 21 Nov 2017 18:20:23 GMT
Erick - thank you very much for the reply.  I'm still working through 
restarting the nodes one by one.

I'm stopping the nodes with the script, but yes - they are being killed 
forcefully because they are in this recovery, failed, retry loop.  I 
could increase the timeout, but they never seem to recover.

The largest tlog file that I see currently is 222MBytes. Autocommit is 
set to 1800000 and autoSoftCommit is set to 120000. Errors when they are 
in the long recovery are things like:

2017-11-20 21:41:29.755 ERROR 
x:UNCLASS_shard37_replica1 s:shard37 c:UNCLASS r:core_node196) 
[c:UNCLASS s:shard37 r:core_node196 x:UNCLASS_shard37_replica1] 
o.a.s.h.IndexFetcher Error closing file: _8dmn.cfs
org.apache.hadoop.ipc.RemoteException( File 
could only be replicated to 0 nodes instead of minReplication (=1).  
There are 39 datanode(s) running and no node(s) are excluded in this 

Complete log is here for one of the shards that was forcefully stopped.

As to what is in the logs when it is recovering for several hours, it's 
many WARN messages from the DFSClient such as:
Excluding datanode 

or from the IndexFetcher:

File _a96y.cfe did not match. expected checksum is 3502268220 and actual 
is checksum 2563579651. expected length is 405 and actual length is 405

Unfortunately, I'm not getting errors from some of the nodes (still 
working through restarting them) about zookeeper:

org.apache.solr.common.SolrException: Could not get leader props
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
     at org.apache.zookeeper.KeeperException.create(
     at org.apache.zookeeper.ZooKeeper.getData(

Any idea what those could be?  Those shards are not coming back up. 
Sorry so many questions!


On 11/21/2017 12:11 PM, Erick Erickson wrote:
> How are you stopping Solr? Nodes should not go into recovery on
> startup unless Solr was killed un-gracefully (i.e. kill -9 or the
> like). If you use the bin/solr script to stop Solr and see a message
> about "killing XXX forcefully" then you can lengthen out the time the
> script waits for shutdown (there's a sysvar you can set, look in the
> script).
> Actually I'll correct myself a bit. Shards _do_ go into recovery but
> it should be very short in the graceful shutdown case. Basically
> shards temporarily go into recovery as part of normal processing just
> long enough to see there's no recovery necessary, but that should be
> measured in a few seconds.
> What it sounds like from this "The shards go into recovery and start
> to utilize nearly all of their network" is that your nodes go into
> "full recovery" where the entire index is copied down because the
> replica thinks it's "too far" out of date. That indicates something
> weird about the state when the Solr nodes stopped.
> wild-shot-in-the-dark question. How big are your tlogs? If you don't
> hard commit very often, the tlogs can replay at startup for a very
> long time.
> This makes no sense to me, I'm surely missing something:
> The process at this point is to start one node, find out the lock
> files, wait for it to come up completely (hours), stop it, delete the
> write.lock files, and restart.  Usually this second restart is faster,
> but it still can take 20-60 minutes.
> When you start one node it may take a few minutes for leader electing
> to kick in (the default is 180 seconds) but absent replication it
> should just be there. Taking hours totally violates my expectations.
> What does Solr _think_ it's doing? What's in the logs at that point?
> And if you stop solr gracefully, there shouldn't be a problem with
> write.lock.
> You could also try increasing the timeouts, and the HDFS directory
> factory has some parameters to tweak that are a mystery to me...
> All in all, this is behavior that I find mystifying.
> Best,
> Erick
> On Tue, Nov 21, 2017 at 5:07 AM, Joe Obernberger
> <> wrote:
>> Hi All - we have a system with 45 physical boxes running solr 6.6.1 using
>> HDFS as the index.  The current index size is about 31TBytes.  With 3x
>> replication that takes up 93TBytes of disk. Our main collection is split
>> across 100 shards with 3 replicas each.  The issue that we're running into
>> is when restarting the solr6 cluster.  The shards go into recovery and start
>> to utilize nearly all of their network interfaces.  If we start too many of
>> the nodes at once, the shards will go into a recovery, fail, and retry loop
>> and never come up.  The errors are related to HDFS not responding fast
>> enough and warnings from the DFSClient.  If we stop a node when this is
>> happening, the script will force a stop (180 second timeout) and upon
>> restart, we have lock files (write.lock) inside of HDFS.
>> The process at this point is to start one node, find out the lock files,
>> wait for it to come up completely (hours), stop it, delete the write.lock
>> files, and restart.  Usually this second restart is faster, but it still can
>> take 20-60 minutes.
>> The smaller indexes recover much faster (less than 5 minutes). Should we
>> have not used so many replicas with HDFS?  Is there a better way we should
>> have built the solr6 cluster?
>> Thank you for any insight!
>> -Joe
> ---
> This email has been checked for viruses by AVG.

View raw message