lucene-solr-user mailing list archives

From Joe Obernberger <>
Subject Re: Recovery Issue - Solr 6.6.1 and HDFS
Date Tue, 21 Nov 2017 21:11:34 GMT
Thank you Erick.  I've set RAMBufferSizeMB to 1G (snippet below); perhaps 
higher would be beneficial.  One more data point: if I restart a node, more 
often than not it goes into recovery, beats up the network for a while, 
and then goes green.  This happens even if I do no indexing between 
restarts, and sometimes it takes longer than 20 minutes.  Is that expected 
when no new data has been added to the index?
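
For reference, the change is just this element in solrconfig.xml (a sketch 
of the relevant element only; 1024 MB is the 1G buffer):

    <indexConfig>
      <!-- flush in-memory index data to a new segment once it exceeds ~1GB -->
      <ramBufferSizeMB>1024</ramBufferSizeMB>
    </indexConfig>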


On 11/21/2017 3:43 PM, Erick Erickson wrote:
> bq: We are doing lots of soft commits for NRT search...
> It's not surprising that this is slower than local storage, especially
> if you have any autowarming going on. Opening new searchers will need
> to read data from disk for the new segments, and HDFS may be slower
> here.
> As far as the commit interval, an under-appreciated event is that when
> RAMBufferSizeMB is exceeded (default 100M last I knew) new segments
> are written _anyway_, they're just a little invisible. That is, the
> segments_n file isn't updated even though they're closed, IIUC at
> least. So I don't think that very long interval is helping with that
> problem.... Evidence to the contrary trumps my understanding, of course.
> About starting all these collections up at once and the Overseer
> queue: I've seen this in similar situations. There are a _lot_ of
> messages flying back and forth for each replica on startup, and the
> Overseer processing was historically very inefficient, so that queue
> could get into the 100s of K; I've seen some pathological situations
> where it's over 1M, and at that point bringing up Solr took a very
> long time. SOLR-10524 (which went into Solr 6.6) made this a lot
> better. There are still a lot of messages written in a case like
> yours, but at least the Overseer has a much better chance to keep up.
> Erick
> On Tue, Nov 21, 2017 at 12:24 PM, Hendrik Haddorp
> <> wrote:
>> We sometimes also have replicas not recovering. If one replica is still
>> active, the easiest fix is to delete the stuck replica and create a new
>> one (example below). When all replicas are down, it helps most of the
>> time to restart one of the nodes that contains a replica in the down
>> state. If that also doesn't get the replica to recover, I would check
>> the logs of that node and also those of the overseer node. I have seen
>> the same issue on Solr using local storage. The main HDFS-related issues
>> we have had so far were those lock files, and that when you delete and
>> recreate collections/cores, the data is sometimes not cleaned up in HDFS
>> and then causes a conflict.
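>>
>> Roughly like this with the Collections API (host, collection, shard, and
>> replica names are placeholders to adjust):
>>
>>   # drop the stuck replica
>>   curl 'http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node5'
>>   # add a fresh one, which then syncs from the active replica
>>   curl 'http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1'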
>> Hendrik
>> On 21.11.2017 21:07, Joe Obernberger wrote:
>>> We've never run an index this size in anything but HDFS, so I have no
>>> comparison.  What we've been doing is keeping two main collections - all
>>> data, and the last 30 days of data.  Then we handle queries based on date
>>> range.  The 30 day index is significantly faster.
>>> My main concern right now is that 6 of the 100 shards are not coming back
>>> because they have no leader.  I've never seen this error before.  Any ideas?
>>> ClusterStatus shows all three replicas with state 'down'.
>>> Thanks!
>>> -joe
>>> On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
>>>> We actually also have some performance issues with HDFS at the moment. We
>>>> are doing lots of soft commits for NRT search. Those seem to be slower than
>>>> with local storage. The investigation is however not very far along yet.
>>>> We have a setup with 2000 collections, each with one shard and a
>>>> replication factor of 2 or 3. When we restart nodes too fast, that causes
>>>> problems with the overseer queue, which can lead to the queue getting out
>>>> of control and Solr pretty much dying. We are still on Solr 6.3; 6.6 has
>>>> some improvements and should handle these actions faster. I would check
>>>> what you see for "/solr/admin/collections?action=OVERSEERSTATUS&wt=json".
>>>> The critical part is the "overseer_queue_size" value. If this goes up to
>>>> about 10000, it is pretty much game over on our setup. In that case it
>>>> seems best to stop all nodes, clear the queue in ZK, and then restart the
>>>> nodes one by one with a gap of about 5 min. That normally recovers pretty
>>>> well.
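>>>>
>>>> Concretely, something like this (host and ZK address are placeholders;
>>>> the jq filter is just one way to pull out the value):
>>>>
>>>>   curl 'http://host:8983/solr/admin/collections?action=OVERSEERSTATUS&wt=json' | jq '.overseer_queue_size'
>>>>
>>>> and, with all Solr nodes stopped, clearing the queue from the ZooKeeper
>>>> CLI (rmr is the ZK 3.4.x command; newer ZK versions call it deleteall):
>>>>
>>>>   zkCli.sh -server zk1:2181 rmr /overseer/queue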
>>>> regards,
>>>> Hendrik
>>>> On 21.11.2017 20:12, Joe Obernberger wrote:
>>>>> We set the hard commit time long because we were having performance
>>>>> issues with HDFS, and thought that since the block size is 128M, having a
>>>>> longer hard commit made sense.  That was our hypothesis anyway.  Happy to
>>>>> switch it back and see what happens.
>>>>> I don't know what caused the cluster to go into recovery in the first
>>>>> place.  We had a server die over the weekend, but that's just one node.
>>>>> Every shard is 3x replicated (and 3x replicated again by HDFS, so 9
>>>>> copies).  It was at this point that we noticed lots of network activity,
>>>>> and most of the shards were in this recovery, fail, retry loop.  That is
>>>>> when we decided to shut it down, resulting in zombie lock files.
>>>>> I tried using the FORCELEADER call (exact call below), which completed,
>>>>> but doesn't seem to have any effect on the shards that have no leader.
>>>>> Kinda out of ideas on that problem.  If I can get the cluster back up,
>>>>> I'll try a lower hard commit time.  Thanks again Erick!
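>>>>>
>>>>> For the record, the call was of this form, once per leaderless shard
>>>>> (host, collection, and shard names here are examples):
>>>>>
>>>>>   curl 'http://host:8983/solr/admin/collections?action=FORCELEADER&collection=mycoll&shard=shard1'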
>>>>> -Joe
>>>>> On 11/21/2017 2:00 PM, Erick Erickson wrote:
>>>>>> Frankly, with HDFS I'm a bit out of my depth, so listen to Hendrik.
>>>>>> I need to back up a bit. Once nodes are in this state it's not
>>>>>> surprising that they need to be forcefully killed. I was more thinking
>>>>>> about how they got into this situation in the first place. _Before_
>>>>>> they get into the nasty state, how are the Solr nodes shut down?
>>>>>> Forcefully?
>>>>>> Your hard commit is far longer than it needs to be, resulting in
>>>>>> larger tlog files etc. I usually set this at 15-60 seconds with local
>>>>>> disks; I'm not quite sure whether longer intervals are helpful on HDFS.
>>>>>> What this means is that you can spend up to 30 minutes when you
>>>>>> restart Solr _replaying the tlogs_! If Solr is killed, it may not have
>>>>>> had a chance to fsync the segments and may have to replay on startup.
>>>>>> If you have openSearcher set to false, the hard commit operation is
>>>>>> not horribly expensive; it just fsync's the current segments and opens
>>>>>> new ones. It won't be a total cure, but I bet reducing this interval
>>>>>> would help a lot.
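>>>>>>
>>>>>> In solrconfig.xml terms, something like this (15 seconds is just an
>>>>>> example value to tune):
>>>>>>
>>>>>>   <autoCommit>
>>>>>>     <!-- hard commit every 15s: fsyncs segments and truncates the tlog -->
>>>>>>     <maxTime>15000</maxTime>
>>>>>>     <!-- keep the commit cheap; soft commits handle NRT visibility -->
>>>>>>     <openSearcher>false</openSearcher>
>>>>>>   </autoCommit>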
>>>>>> Also, if you stop indexing there's no need to wait 30 minutes if you
>>>>>> issue a manual commit, something like
>>>>>> .../collection/update?commit=true. Just reducing the hard commit
>>>>>> interval will make the wait between stopping indexing and restarting
>>>>>> shorter all by itself if you don't want to issue the manual commit.
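>>>>>>
>>>>>> Spelled out, the manual commit is just (host and collection name are
>>>>>> placeholders):
>>>>>>
>>>>>>   curl 'http://host:8983/solr/mycollection/update?commit=true'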
>>>>>> Best,
>>>>>> Erick
>>>>>> On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
>>>>>> <> wrote:
>>>>>>> Hi,
>>>>>>> the write.lock issue I see as well when Solr has not been stopped
>>>>>>> gracefully. The write.lock files are then left in HDFS, as they do not
>>>>>>> get removed automatically when the client disconnects the way an
>>>>>>> ephemeral node in ZooKeeper would be. Unfortunately Solr also does not
>>>>>>> realize that it should be owning the lock, as it is marked in the state
>>>>>>> stored in ZooKeeper as the owner, and it is also not willing to retry,
>>>>>>> which is why you need to restart the whole Solr instance after the
>>>>>>> cleanup. I added some logic to my Solr start-up script which scans the
>>>>>>> lock files in HDFS, compares that with the state in ZooKeeper, and then
>>>>>>> deletes all lock files that belong to the node I'm starting.
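>>>>>>>
>>>>>>> A simplified sketch of the HDFS side of that logic (the /solr root and
>>>>>>> the CORES list are placeholders; in the real script the list of cores
>>>>>>> owned by the starting node is derived from the state in ZooKeeper):
>>>>>>>
>>>>>>>   # cores that the node being started owns, per the ZooKeeper state
>>>>>>>   CORES="mycoll_shard1_replica1 mycoll_shard2_replica1"
>>>>>>>   for core in $CORES; do
>>>>>>>     # remove the stale lock left behind by an ungraceful stop
>>>>>>>     hdfs dfs -rm -f "/solr/$core/data/index/write.lock"
>>>>>>>   done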
>>>>>>> regards,
>>>>>>> Hendrik
>>>>>>> On 21.11.2017 14:07, Joe Obernberger wrote:
>>>>>>>> Hi All - we have a system with 45 physical boxes running Solr 6.6.1
>>>>>>>> using HDFS as the index.  The current index size is about 31TBytes.
>>>>>>>> With 3x replication that takes up 93TBytes of disk. Our main collection
>>>>>>>> is split across 100 shards with 3 replicas each.  The issue that we're
>>>>>>>> running into is when restarting the solr6 cluster.  The shards go into
>>>>>>>> recovery and start to utilize nearly all of their network interfaces.
>>>>>>>> If we start too many of the nodes at once, the shards will go into a
>>>>>>>> recovery, fail, retry loop and never come up.  The errors are related
>>>>>>>> to HDFS not responding fast enough, plus warnings from the DFSClient.
>>>>>>>> If we stop a node when this is happening, the script will force a stop
>>>>>>>> (180 second timeout) and upon restart, we have lock files (write.lock)
>>>>>>>> inside of HDFS.
>>>>>>>> The process at this point is to start one node, find the lock files,
>>>>>>>> wait for it to come up completely (hours), stop it, delete the
>>>>>>>> write.lock files, and restart.  Usually this second restart is faster,
>>>>>>>> but it still can take 20-60 minutes.
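>>>>>>>>
>>>>>>>> Finding and deleting them manually looks roughly like this (/solr is a
>>>>>>>> stand-in for wherever solr.hdfs.home points; the core path is an
>>>>>>>> example):
>>>>>>>>
>>>>>>>>   # list the leftover lock files across the index
>>>>>>>>   hdfs dfs -ls -R /solr | grep write.lock
>>>>>>>>   # with the node stopped, remove each one
>>>>>>>>   hdfs dfs -rm /solr/mycoll/core_node1/data/index/write.lock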
>>>>>>>> The smaller indexes recover much faster (less than 5 minutes).  Should
>>>>>>>> we have not used so many replicas with HDFS?  Is there a better way we
>>>>>>>> should have built the solr6 cluster?
>>>>>>>> Thank you for any insight!
>>>>>>>> -Joe
