lucene-solr-user mailing list archives

From Hendrik Haddorp <>
Subject Re: Recovery Issue - Solr 6.6.1 and HDFS
Date Tue, 21 Nov 2017 20:24:38 GMT
We sometimes also have replicas not recovering. If one replica is left 
active, the easiest fix is to delete the failed replica and create a new 
one. When all replicas are down, it helps most of the time to restart 
one of the nodes that contains a replica in the down state. If that also 
doesn't get the replica to recover, I would check the logs of that node 
and also those of the overseer node. I have seen the same issue on Solr 
using local storage. The main HDFS-related issues we have had so far are 
the leftover lock files, and that deleting and recreating 
collections/cores sometimes leaves data behind in HDFS, which then 
causes a conflict.
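
A minimal sketch of that delete-and-recreate step via the Collections 
API, assuming stock Solr HTTP endpoints (host, collection, shard, and 
replica names are placeholders):

    import json
    import urllib.parse
    import urllib.request

    SOLR = "http://localhost:8983/solr"  # placeholder host

    def collections_api(params):
        # Call the Solr Collections API and return the parsed JSON response.
        url = SOLR + "/admin/collections?wt=json&" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    # Delete the stuck replica, then add a fresh one in its place
    # (collection/shard/replica names are placeholders).
    collections_api({"action": "DELETEREPLICA", "collection": "mycollection",
                     "shard": "shard1", "replica": "core_node3"})
    collections_api({"action": "ADDREPLICA", "collection": "mycollection",
                     "shard": "shard1"})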


On 21.11.2017 21:07, Joe Obernberger wrote:
> We've never run an index this size in anything but HDFS, so I have no 
> comparison.  What we've been doing is keeping two main collections - 
> all data, and the last 30 days of data.  Then we handle queries based 
> on date range.  The 30 day index is significantly faster.
> My main concern right now is that 6 of the 100 shards are not coming 
> back because they have no leader.  I've never seen this error before.  
> Any ideas?  ClusterStatus shows all three replicas with state 'down'.
> Thanks!
> -joe
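
For anyone reproducing that check, a small sketch of reading replica 
states from the CLUSTERSTATUS action (host and collection name are 
placeholders):

    import json
    import urllib.request

    url = ("http://localhost:8983/solr/admin/collections"
           "?action=CLUSTERSTATUS&collection=mycollection&wt=json")
    with urllib.request.urlopen(url) as resp:
        cluster = json.load(resp)

    # Walk shard -> replica entries and print each replica's state
    # ('active', 'down', 'recovering', ...).
    shards = cluster["cluster"]["collections"]["mycollection"]["shards"]
    for shard, info in shards.items():
        for name, replica in info["replicas"].items():
            print(shard, name, replica["state"])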
> On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
>> We actually also have some performance issues with HDFS at the moment. 
>> We are doing lots of soft commits for NRT search. Those seem to be 
>> slower than with local storage, but our investigation hasn't gotten 
>> very far yet.
>> We have a setup with 2000 collections, with one shard each and a 
>> replication factor of 2 or 3. When we restart nodes too quickly, that 
>> causes problems with the overseer queue, which can lead to the queue 
>> getting out of control and Solr pretty much dying. We are still on 
>> Solr 6.3; 6.6 has some improvements and should handle these actions 
>> faster. I would check what you see for 
>> faster. I would check what you see for 
>> "/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical 
>> part is the "overseer_queue_size" value. If this goes up to about 
>> 10000 it is pretty much game over on our setup. In that case it seems 
>> to be best to stop all nodes, clear the queue in ZK and then restart 
>> the nodes one by one with a gap of like 5min. That normally recovers 
>> pretty well.
>> regards,
>> Hendrik
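
A small sketch of watching that value over HTTP (host is a placeholder; 
the ZK cleanup step is sketched in the comments and would only be safe 
with all Solr nodes stopped):

    import json
    import urllib.request

    url = ("http://localhost:8983/solr/admin/collections"
           "?action=OVERSEERSTATUS&wt=json")  # host is a placeholder
    with urllib.request.urlopen(url) as resp:
        status = json.load(resp)

    # The value Hendrik describes; around 10000 meant trouble on his setup.
    print(status["overseer_queue_size"])

    # Clearing the queue in ZK (the "clear the queue" step) could be done
    # with e.g. kazoo; /overseer/queue is the Overseer's work queue znode:
    #   from kazoo.client import KazooClient
    #   zk = KazooClient(hosts="zk1:2181"); zk.start()
    #   zk.delete("/overseer/queue", recursive=True)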
>> On 21.11.2017 20:12, Joe Obernberger wrote:
>>> We set the hard commit time long because we were having performance 
>>> issues with HDFS, and thought that since the block size is 128M, 
>>> having a longer hard commit made sense.  That was our hypothesis 
>>> anyway.  Happy to switch it back and see what happens.
>>> I don't know what caused the cluster to go into recovery in the 
>>> first place.  We had a server die over the weekend, but it's just 
>>> one out of ~50.  Every shard is replicated 3x in Solr (and 3x again 
>>> in HDFS, so 9 copies).  It was at this point that we noticed lots of 
>>> network activity, and most of the shards stuck in this recovery, 
>>> fail, retry loop.  That is when we decided to shut it down, 
>>> resulting in zombie lock files.
>>> I tried using the FORCELEADER call, which completed, but it doesn't 
>>> seem to have had any effect on the shards that have no leader. Kinda 
>>> out of ideas for that problem.  If I can get the cluster back up, 
>>> I'll try a lower hard commit time.  Thanks again, Erick!
>>> -Joe
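
For reference, the FORCELEADER call mentioned above is a Collections 
API action; a minimal sketch (host, collection, and shard names are 
placeholders):

    import urllib.request

    # FORCELEADER asks Solr to force a leader election on a shard whose
    # replicas are all in a non-leader state.
    urllib.request.urlopen(
        "http://localhost:8983/solr/admin/collections"
        "?action=FORCELEADER&collection=mycollection&shard=shard42&wt=json"
    ).read()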
>>> On 11/21/2017 2:00 PM, Erick Erickson wrote:
>>>> Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...
>>>> I need to back up a bit. Once nodes are in this state it's not
>>>> surprising that they need to be forcefully killed. I was more thinking
>>>> about how they got in this situation in the first place. _Before_ you
>>>> get into the nasty state how are the Solr nodes shut down? Forcefully?
>>>> Your hard commit interval is far longer than it needs to be, resulting
>>>> in much larger tlog files etc. I usually set this at 15-60 seconds with
>>>> local disks; I'm not sure whether longer intervals are helpful on HDFS.
>>>> What this means is that you can spend up to 30 minutes when you
>>>> restart solr _replaying the tlogs_! If Solr is killed, it may not have
>>>> had a chance to fsync the segments and may have to replay on startup.
>>>> If you have openSearcher set to false, the hard commit operation is
>>>> not horribly expensive, it just fsync's the current segments and opens
>>>> new ones. It won't be a total cure, but I bet reducing this interval
>>>> would help a lot.
>>>> Also, if you stop indexing there's no need to wait 30 minutes if you
>>>> issue a manual commit, something like
>>>> .../collection/update?commit=true. Just reducing the hard commit
>>>> interval will make the wait between stopping indexing and restarting
>>>> shorter all by itself if you don't want to issue the manual commit.
>>>> Best,
>>>> Erick
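
A minimal sketch of the manual commit Erick describes (host and 
collection name are placeholders):

    import urllib.request

    # Explicit hard commit before shutdown, so a restart has little or
    # no tlog to replay; collection name is a placeholder.
    urllib.request.urlopen(
        "http://localhost:8983/solr/mycollection/update?commit=true"
    ).read()

The 15-60 second hard commit itself is configured via the <autoCommit> 
maxTime (in milliseconds) in solrconfig.xml, together with 
openSearcher=false as Erick notes.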
>>>> On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
>>>> <> wrote:
>>>>> Hi,
>>>>> the write.lock issue I see as well when Solr has not been stopped 
>>>>> gracefully. The write.lock files are then left in HDFS, as they do 
>>>>> not get removed automatically when the client disconnects, the way 
>>>>> an ephemeral node in ZooKeeper would be. Unfortunately, Solr also 
>>>>> does not realize that it should own the lock, as it is marked as 
>>>>> the owner in the state stored in ZooKeeper, and it is not willing 
>>>>> to retry, which is why you need to restart the whole Solr instance 
>>>>> after the cleanup. I added some logic to my Solr start-up script 
>>>>> which scans for lock files in HDFS, compares them with the state 
>>>>> in ZooKeeper, and then deletes all lock files that belong to the 
>>>>> node that I'm starting.
>>>>> regards,
>>>>> Hendrik
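
Hendrik's script isn't shown, but a rough sketch of the idea (the 
comparison against ZooKeeper state is elided, and the HDFS index root 
is a placeholder):

    import subprocess

    INDEX_ROOT = "/solr"  # placeholder HDFS path holding the Solr indexes

    # Recursively list everything under the index root and keep the
    # paths of write.lock files.
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", INDEX_ROOT],
        capture_output=True, text=True, check=True).stdout
    locks = [line.split()[-1] for line in listing.splitlines()
             if line.endswith("write.lock")]

    # Before deleting, filter this list against the ZooKeeper state so
    # only locks belonging to the node being started are removed (elided).
    for path in locks:
        subprocess.run(["hdfs", "dfs", "-rm", path], check=True)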
>>>>> On 21.11.2017 14:07, Joe Obernberger wrote:
>>>>>> Hi All - we have a system with 45 physical boxes running Solr 
>>>>>> 6.6.1 using HDFS for the index.  The current index size is about 
>>>>>> 31TBytes.  With 3x replication that takes up 93TBytes of disk.  
>>>>>> Our main collection is split across 100 shards with 3 replicas 
>>>>>> each.  The issue that we're running into is when restarting the 
>>>>>> solr6 cluster.  The shards go into recovery and start to utilize 
>>>>>> nearly all of their network interfaces.  If we start too many of 
>>>>>> the nodes at once, the shards will go into a recovery, fail, and 
>>>>>> retry loop and never come up.  The errors are related to HDFS not 
>>>>>> responding fast enough and warnings from the DFSClient.  If we 
>>>>>> stop a node when this is happening, the script will force a stop 
>>>>>> (180 second timeout), and upon restart we have lock files 
>>>>>> (write.lock) inside of HDFS.
>>>>>> The process at this point is to start one node, find the lock 
>>>>>> files, wait for it to come up completely (hours), stop it, delete 
>>>>>> the write.lock files, and restart.  Usually this second restart 
>>>>>> is faster, but it can still take 20-60 minutes.
>>>>>> The smaller indexes recover much faster (less than 5 minutes).  
>>>>>> Should we have not used so many replicas with HDFS?  Is there a 
>>>>>> better way we should have built the solr6 cluster?
>>>>>> Thank you for any insight!
>>>>>> -Joe