lucene-solr-user mailing list archives

From: Erick Erickson <erickerick...@gmail.com>
Subject: Re: create collection gets stuck on node restart
Date: Tue, 03 Jan 2017 21:32:48 GMT

NOTE: Your problem is perfectly valid, this is something of a
side issue.

You shouldn't have a populated clusterstate.json in 6.x, or in 5.x
either for that matter. More accurately, it should be at most an
empty node.

Instead, each collection should have its own "state.json" node.
There are a couple of reasons this matters:

1> if you have many collections, ZooKeeper only has to send
updates to those nodes that host replicas for any specific collection
(avoids the "thundering herd" problem).

2> A populated clusterstate.json can imply that you have "legacyCloud"
set in your ZooKeeper. In this mode collections can "come back" under
special circumstances, and that can be kind of confusing. Actually,
this may be tangentially related to your problem.

So assuming that your reference to "clusterstate.json" isn't just a
typo, I'd recommend you migrate your ZK state by using the Collections
API MIGRATESTATEFORMAT command.
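
For what it's worth, here is a minimal sketch of how that call could be
put together. The helper name, the host and port (localhost:8983) and
the collection name "mycollection" are placeholders I made up for
illustration; substitute your own. It only builds the URL, so nothing
is sent to Solr until you issue the request yourself (with curl, a
browser, or whatever HTTP client you prefer), once per collection.

```python
from urllib.parse import urlencode


def migrate_state_format_url(solr_base_url, collection):
    """Build the Collections API URL that asks Solr to move one
    collection out of the shared clusterstate.json into its own
    state.json.

    `solr_base_url` and `collection` are caller-supplied; this helper
    is hypothetical and not part of Solr itself.
    """
    params = urlencode({
        "action": "MIGRATESTATEFORMAT",
        "collection": collection,
    })
    # Tolerate a trailing slash on the base URL.
    return f"{solr_base_url.rstrip('/')}/admin/collections?{params}"


if __name__ == "__main__":
    # Placeholder host, port and collection name.
    print(migrate_state_format_url("http://localhost:8983/solr", "mycollection"))
```

I'd run it only while all nodes are up, and check in the admin UI
afterwards that the collection shows up under its own state.json.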

Best,
Erick

On Tue, Jan 3, 2017 at 6:34 AM, Shawn Heisey <apache@elyograg.org> wrote:
> On 1/3/2017 2:59 AM, Hendrik Haddorp wrote:
>> I have a SolrCloud setup with 5 nodes and am creating collections with
>> a replication factor of 3. If I kill and restart nodes at the "right"
>> time during the creation process the creation seems to get stuck.
>> Collection data is left in the clusterstate.json file in ZooKeeper and
>> no collections can be created anymore until this entry gets removed. I
>> can reproduce this on Solr 6.2.1 and 6.3, while 6.3 seems to be
>> somewhat less likely to get stuck. Is Solr supposed to recover from
>> data being stuck in the clusterstate.json at some point? I had one
>> instance where it looked like data was removed again but normally the
>> data does not seem to get cleaned up automatically and just blocks any
>> further collection creations.
>>
>> I did not find anything like this in Jira. Just SOLR-7198 sounds a bit
>> similar even though it is about deleting collections.
>
> Don't restart your nodes at the same time you're trying to do
> maintenance of any kind on your collections.  Try to only do maintenance
> when they are all working, or you'll get unexpected results.
>
> The most recent development goal is to make it so that collection deletion
> can be done even if the creation was partial.  The idea is that if
> something goes wrong, you can delete the bad collection and then be free
> to try to create it again.  I see that you've started another thread
> about deletion not fully eliminating everything in HDFS.  That does
> sound like a bug.  I have no experience with HDFS at all, so I can't be
> helpful with that.
>
> Thanks,
> Shawn
>
