lucene-solr-user mailing list archives

From Joe Obernberger
Subject Re: Recovery Issue - Solr 6.6.1 and HDFS
Date Tue, 21 Nov 2017 19:12:53 GMT
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, having 
a longer hard commit made sense.  That was our hypothesis anyway.  Happy 
to switch it back and see what happens.

I don't know what caused the cluster to go into recovery in the first 
place.  We had a server die over the weekend, but it's just one out of 
~50.  Every shard is 3x replicated (and 3x replicated again in HDFS, so 9 
copies).  It was at this point that we noticed lots of network activity, 
and most of the shards in this recovery, fail, retry loop.  That is when 
we decided to shut it down, resulting in zombie lock files.

I tried using the FORCELEADER call, which completed, but doesn't seem to 
have any effect on the shards that have no leader.  Kinda out of ideas 
for that problem.  If I can get the cluster back up, I'll try a lower 
hard commit time.  Thanks again Erick!


On 11/21/2017 2:00 PM, Erick Erickson wrote:
> Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...
> I need to back up a bit. Once nodes are in this state it's not
> surprising that they need to be forcefully killed. I was more thinking
> about how they got in this situation in the first place. _Before_ you
> get into the nasty state how are the Solr nodes shut down? Forcefully?
> Your hard commit is far longer than it needs to be, resulting in much
> larger tlog files etc. I usually set this at 15-60 seconds with local
> disks, not quite sure whether longer intervals are helpful on HDFS.
> What this means is that you can spend up to 30 minutes when you
> restart solr _replaying the tlogs_! If Solr is killed, it may not have
> had a chance to fsync the segments and may have to replay on startup.
> If you have openSearcher set to false, the hard commit operation is
> not horribly expensive, it just fsync's the current segments and opens
> new ones. It won't be a total cure, but I bet reducing this interval
> would help a lot.
> Also, if you stop indexing there's no need to wait 30 minutes if you
> issue a manual commit, something like
> .../collection/update?commit=true. Just reducing the hard commit
> interval will make the wait between stopping indexing and restarting
> shorter all by itself if you don't want to issue the manual commit.
> Best,
> Erick
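
The two suggestions above (an explicit commit after indexing stops, and a shorter hard commit interval) could be scripted roughly as follows. This is only a sketch under assumptions: the base URL and collection name are placeholders, the helper names are made up for illustration, and the second helper assumes the Solr Config API is available and that autoCommit may be overridden that way on your cluster. Nothing here is sent over the network; the functions only build the URL and the request.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def commit_url(solr_base, collection):
    """URL for an explicit hard commit on one collection (placeholder names)."""
    return f"{solr_base.rstrip('/')}/{quote(collection)}/update?commit=true"


def autocommit_request(solr_base, collection, max_time_ms=15000):
    """Build (but do not send) a Config API request that shortens autoCommit.

    Assumption: the collection's config can be changed through the Config API.
    15000 ms is the low end of the 15-60 second range mentioned above.
    """
    body = {"set-property": {"updateHandler.autoCommit.maxTime": max_time_ms}}
    return Request(
        f"{solr_base.rstrip('/')}/{quote(collection)}/config",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually apply either one, the URL or request would be passed to urllib.request.urlopen against a live node; whether 15 seconds is a good value on HDFS is exactly the open question in this thread.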
> On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp wrote:
>> Hi,
>> the write.lock issue I see as well when Solr has not been stopped gracefully.
>> The write.lock files are then left in HDFS as they do not get removed
>> automatically when the client disconnects like an ephemeral node in
>> ZooKeeper. Unfortunately Solr also does not realize that it should be owning
>> the lock as it is marked in the state stored in ZooKeeper as the owner and
>> is also not willing to retry, which is why you need to restart the whole
>> Solr instance after the cleanup. I added some logic to my Solr start up
>> script which scans the lock files in HDFS, compares them with the state in
>> ZooKeeper, and then deletes all lock files that belong to the node that I'm
>> starting.
>> regards,
>> Hendrik
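
The matching step of the cleanup described above might look something like the sketch below. It is not Hendrik's actual script. It assumes that the collection state read from ZooKeeper lists a node_name and a dataDir for each replica, and that each lock sits at <dataDir>/<index directory>/write.lock; the function name and sample paths are hypothetical. Listing the lock files in HDFS, reading ZooKeeper, and deleting the files are deliberately left out.

```python
from posixpath import dirname


def stale_locks_for_node(lock_paths, collection_state, node_name):
    """Return the write.lock paths whose index belongs to node_name.

    collection_state is assumed to be the per-collection part of the cluster
    state, shaped like {"shards": {name: {"replicas": {name: {...}}}}}.
    """
    owned_data_dirs = set()
    for shard in collection_state.get("shards", {}).values():
        for replica in shard.get("replicas", {}).values():
            if replica.get("node_name") == node_name and "dataDir" in replica:
                owned_data_dirs.add(replica["dataDir"].rstrip("/"))

    stale = []
    for path in lock_paths:
        # <dataDir>/<index dir>/write.lock -> strip two path components
        data_dir = dirname(dirname(path))
        if data_dir in owned_data_dirs:
            stale.append(path)
    return stale
```

Only locks owned by the node being started are returned, so locks held by replicas that are running on other nodes are left alone.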
>> On 21.11.2017 14:07, Joe Obernberger wrote:
>>> Hi All - we have a system with 45 physical boxes running solr 6.6.1 using
>>> HDFS as the index.  The current index size is about 31TBytes. With 3x
>>> replication that takes up 93TBytes of disk. Our main collection is split
>>> across 100 shards with 3 replicas each.  The issue that we're running into
>>> is when restarting the solr6 cluster.  The shards go into recovery and start
>>> to utilize nearly all of their network interfaces.  If we start too many of
>>> the nodes at once, the shards will go into a recovery, fail, and retry loop
>>> and never come up.  The errors are related to HDFS not responding fast
>>> enough and warnings from the DFSClient.  If we stop a node when this is
>>> happening, the script will force a stop (180 second timeout) and upon
>>> restart, we have lock files (write.lock) inside of HDFS.
>>> The process at this point is to start one node, find the lock files,
>>> wait for it to come up completely (hours), stop it, delete the write.lock
>>> files, and restart.  Usually this second restart is faster, but it still can
>>> take 20-60 minutes.
>>> The smaller indexes recover much faster (less than 5 minutes). Should we
>>> not have used so many replicas with HDFS?  Is there a better way we should
>>> have built the solr6 cluster?
>>> Thank you for any insight!
>>> -Joe