lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joe Obernberger <>
Subject Re: Recovery Issue - Solr 6.6.1 and HDFS
Date Wed, 22 Nov 2017 13:18:45 GMT
Hi Hendrick - I was halting a replica and then restarting it, waited, 
then restarted another one.  That didn't work, but when I halted all 
three, and then restarted those one by one, the shard finally elected a 
leader and came up.  Phew!  I too noticed the lock files in 
index.<timestamp> folders.  Usually what I do is:
hadoop fs -ls -R /solr6.6.0 | grep write.lock > out.txt
cat out.txt | cut --bytes 57-
to get a list of files to delete

Glad these shards have come up!  Thanks very much.


On 11/22/2017 5:20 AM, Hendrik Haddorp wrote:
> Hi Joe,
> sorry, I have not seen that problem. I would normally not delete a 
> replica if the shard is down but only if there is an active shard. 
> Without an active leader the replica should not be able to recover. I 
> also just had a case where all replicas of a shard stayed in down 
> state and restarts didn't help. This was however also caused by lock 
> files. Once I cleaned them up and restarted all Solr instances that 
> had a replica they recovered.
> For the lock files I discovered that the index is not always in the 
> "index" folder but can also be in an index.<timestamp> folder. There 
> can be an "" file in the "data" directory in HDFS and 
> this contains the correct index folder name.
> If you are really desperate you could also delete all but one replica 
> so that the leader election is quite trivial. But this does of course 
> increase the risk of finally loosing the data quite a bit. So I would 
> try looking into the code and figure out what the problem is here and 
> maybe compare the state in HDFS and ZK with a shard that works.
> regards,
> Hendrik
> On 21.11.2017 23:57, Joe Obernberger wrote:
>> Hi Hendrick - the shards in question have three replicas.  I tried 
>> restarting each one (one by one) - no luck.  No leader is found. I 
>> deleted one of the replicas and added a new one, and the new one also 
>> shows as 'down'.  I also tried the FORCELEADER call, but that had no 
>> effect.  I checked the OVERSEERSTATUS, but there is nothing unusual 
>> there.  I don't see anything useful in the logs except the error:
>> org.apache.solr.common.SolrException: Error getting leader from zk 
>> for shard shard21
>>     at 
>>     at 
>>     at 
>>     at 
>> org.apache.solr.core.ZkContainer.lambda$registerInZk$0(
>>     at 
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(
>>     at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(
>>     at 
>> java.util.concurrent.ThreadPoolExecutor$
>>     at
>> Caused by: org.apache.solr.common.SolrException: Could not get leader 
>> props
>>     at 
>>     at 
>>     at 
>>     ... 7 more
>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
>> KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
>>     at 
>> org.apache.zookeeper.KeeperException.create(
>>     at 
>> org.apache.zookeeper.KeeperException.create(
>>     at org.apache.zookeeper.ZooKeeper.getData(
>>     at 
>>     at 
>>     at 
>>     at 
>>     at 
>>     ... 9 more
>> Can I modify zookeeper to force a leader?  Is there any other way to 
>> recover from this?  Thanks very much!
>> -Joe
>> On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
>>> We sometimes also have replicas not recovering. If one replica is 
>>> left active the easiest is to then to delete the replica and create 
>>> a new one. When all replicas are down it helps most of the time to 
>>> restart one of the nodes that contains a replica in down state. If 
>>> that also doesn't get the replica to recover I would check the logs 
>>> of the node and also that of the overseer node. I have seen the same 
>>> issue on Solr using local storage. The main HDFS related issues we 
>>> had so far was those lock files and if you delete and recreate 
>>> collections/cores and it sometimes happens that the data was not 
>>> cleaned up in HDFS and then causes a conflict.
>>> Hendrik
>>> On 21.11.2017 21:07, Joe Obernberger wrote:
>>>> We've never run an index this size in anything but HDFS, so I have 
>>>> no comparison.  What we've been doing is keeping two main 
>>>> collections - all data, and the last 30 days of data.  Then we 
>>>> handle queries based on date range. The 30 day index is 
>>>> significantly faster.
>>>> My main concern right now is that 6 of the 100 shards are not 
>>>> coming back because of no leader.  I've never seen this error 
>>>> before.  Any ideas?  ClusterStatus shows all three replicas with 
>>>> state 'down'.
>>>> Thanks!
>>>> -joe
>>>> On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
>>>>> We actually also have some performance issue with HDFS at the 
>>>>> moment. We are doing lots of soft commits for NRT search. Those 
>>>>> seem to be slower then with local storage. The investigation is 
>>>>> however not really far yet.
>>>>> We have a setup with 2000 collections, with one shard each and a 
>>>>> replication factor of 2 or 3. When we restart nodes too fast that 
>>>>> causes problems with the overseer queue, which can lead to the 
>>>>> queue getting out of control and Solr pretty much dying. We are 
>>>>> still on Solr 6.3. 6.6 has some improvements and should handle 
>>>>> these actions faster. I would check what you see for 
>>>>> "/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The 
>>>>> critical part is the "overseer_queue_size" value. If this goes up 
>>>>> to about 10000 it is pretty much game over on our setup. In that 
>>>>> case it seems to be best to stop all nodes, clear the queue in ZK 
>>>>> and then restart the nodes one by one with a gap of like 5min. 
>>>>> That normally recovers pretty well.
>>>>> regards,
>>>>> Hendrik
>>>>> On 21.11.2017 20:12, Joe Obernberger wrote:
>>>>>> We set the hard commit time long because we were having 
>>>>>> performance issues with HDFS, and thought that since the block 
>>>>>> size is 128M, having a longer hard commit made sense.  That was

>>>>>> our hypothesis anyway. Happy to switch it back and see what happens.
>>>>>> I don't know what caused the cluster to go into recovery in the 
>>>>>> first place.  We had a server die over the weekend, but it's just

>>>>>> one out of ~50.  Every shard is 3x replicated (and 3x replicated

>>>>>> in 9 copies).  It was at this point that we noticed 
>>>>>> lots of network activity, and most of the shards in this 
>>>>>> recovery, fail, retry loop.  That is when we decided to shut it

>>>>>> down resulting in zombie lock files.
>>>>>> I tried using the FORCELEADER call, which completed, but doesn't

>>>>>> seem to have any effect on the shards that have no leader. Kinda

>>>>>> out of ideas for that problem.  If I can get the cluster back up,

>>>>>> I'll try a lower hard commit time. Thanks again Erick!
>>>>>> -Joe
>>>>>> On 11/21/2017 2:00 PM, Erick Erickson wrote:
>>>>>>> Frankly with HDFS I'm a bit out of my depth so listen to Hendrik

>>>>>>> ;)...
>>>>>>> I need to back up a bit. Once nodes are in this state it's not
>>>>>>> surprising that they need to be forcefully killed. I was more

>>>>>>> thinking
>>>>>>> about how they got in this situation in the first place. 
>>>>>>> _Before_ you
>>>>>>> get into the nasty state how are the Solr nodes shut down? 
>>>>>>> Forcefully?
>>>>>>> Your hard commit is far longer than it needs to be, resulting
>>>>>>> much
>>>>>>> larger tlog files etc. I usually set this at 15-60 seconds with

>>>>>>> local
>>>>>>> disks, not quite sure whether longer intervals are helpful on
>>>>>>> What this means is that you can spend up to 30 minutes when you
>>>>>>> restart solr _replaying the tlogs_! If Solr is killed, it may

>>>>>>> not have
>>>>>>> had a chance to fsync the segments and may have to replay on

>>>>>>> startup.
>>>>>>> If you have openSearcher set to false, the hard commit operation
>>>>>>> not horribly expensive, it just fsync's the current segments
>>>>>>> opens
>>>>>>> new ones. It won't be a total cure, but I bet reducing this 
>>>>>>> interval
>>>>>>> would help a lot.
>>>>>>> Also, if you stop indexing there's no need to wait 30 minutes
>>>>>>> you
>>>>>>> issue a manual commit, something like
>>>>>>> .../collection/update?commit=true. Just reducing the hard commit
>>>>>>> interval will make the wait between stopping indexing and 
>>>>>>> restarting
>>>>>>> shorter all by itself if you don't want to issue the manual commit.
>>>>>>> Best,
>>>>>>> Erick
>>>>>>> On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
>>>>>>> <> wrote:
>>>>>>>> Hi,
>>>>>>>> the write.lock issue I see as well when Solr is not been

>>>>>>>> stopped gracefully.
>>>>>>>> The write.lock files are then left in the HDFS as they do
>>>>>>>> get removed
>>>>>>>> automatically when the client disconnects like a ephemeral
node in
>>>>>>>> ZooKeeper. Unfortunately Solr does also not realize that
>>>>>>>> should be owning
>>>>>>>> the lock as it is marked in the state stored in ZooKeeper
>>>>>>>> the owner and
>>>>>>>> is also not willing to retry, which is why you need to restart

>>>>>>>> the whole
>>>>>>>> Solr instance after the cleanup. I added some logic to my
>>>>>>>> start up
>>>>>>>> script which scans the log files in HDFS and compares that
>>>>>>>> the state in
>>>>>>>> ZooKeeper and then delete all lock files that belong to the

>>>>>>>> node that I'm
>>>>>>>> starting.
>>>>>>>> regards,
>>>>>>>> Hendrik
>>>>>>>> On 21.11.2017 14:07, Joe Obernberger wrote:
>>>>>>>>> Hi All - we have a system with 45 physical boxes running
>>>>>>>>> 6.6.1 using
>>>>>>>>> HDFS as the index.  The current index size is about
>>>>>>>>> With 3x
>>>>>>>>> replication that takes up 93TBytes of disk. Our main

>>>>>>>>> collection is split
>>>>>>>>> across 100 shards with 3 replicas each.  The issue that
>>>>>>>>> running into
>>>>>>>>> is when restarting the solr6 cluster.  The shards go
>>>>>>>>> recovery and start
>>>>>>>>> to utilize nearly all of their network interfaces. If
we start 
>>>>>>>>> too many of
>>>>>>>>> the nodes at once, the shards will go into a recovery,
>>>>>>>>> and retry loop
>>>>>>>>> and never come up.  The errors are related to HDFS not

>>>>>>>>> responding fast
>>>>>>>>> enough and warnings from the DFSClient.  If we stop
a node 
>>>>>>>>> when this is
>>>>>>>>> happening, the script will force a stop (180 second timeout)

>>>>>>>>> and upon
>>>>>>>>> restart, we have lock files (write.lock) inside of HDFS.
>>>>>>>>> The process at this point is to start one node, find
out the 
>>>>>>>>> lock files,
>>>>>>>>> wait for it to come up completely (hours), stop it, delete
>>>>>>>>> write.lock
>>>>>>>>> files, and restart.  Usually this second restart is
>>>>>>>>> but it still can
>>>>>>>>> take 20-60 minutes.
>>>>>>>>> The smaller indexes recover much faster (less than 5
>>>>>>>>> Should we
>>>>>>>>> have not used so many replicas with HDFS?  Is there
a better 
>>>>>>>>> way we should
>>>>>>>>> have built the solr6 cluster?
>>>>>>>>> Thank you for any insight!
>>>>>>>>> -Joe
>>>>>>> ---
>>>>>>> This email has been checked for viruses by AVG.

View raw message