lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-13396) SolrCloud will delete the core data for any core that is not referenced in the clusterstate
Date Fri, 12 Apr 2019 15:16:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816343 ]

Erick Erickson commented on SOLR-13396:
---------------------------------------

This is a sticky wicket. Let's say I have a 200-node cluster hosting 1,000 collections.
Keeping track of all the cores that aren't _really_ part of a collection and manually cleaning
them up is an onerous task.

Yet it's pretty horrible that one mistake (someone edits the startup script, messes up the
ZK parameter, pushes it out to all the Solr nodes, and restarts the cluster) could delete
everything everywhere.

More thinking out loud, and I have no clue how it'd interact with autoscaling. It seems odd,
but we _could_ use ZooKeeper to keep a list of potential cores to delete (a rough sketch
follows the list below) and have

1> a way to view/list them

2> a button to push, a collections API command to issue, or some such, to say "delete them".

3> some kind of very visible warning that this list is not empty.
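
Purely as a sketch of 1>, 2>, and 3>: none of the paths, class names, or helpers below exist
in Solr; the znode layout under /solr/pending-core-deletes and the CoreDeleter hook are
assumptions, shown against the plain ZooKeeper client rather than any real Solr API.

{code:java}
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class PendingCoreDeletes {
  // Assumed znode layout: one child per orphaned core, e.g. /solr/pending-core-deletes/<coreName>,
  // created by the node that found the core instead of deleting it outright.
  private static final String PENDING_PATH = "/solr/pending-core-deletes";

  private final ZooKeeper zk;

  public PendingCoreDeletes(ZooKeeper zk) {
    this.zk = zk;
  }

  /** 1> view/list: the core names currently queued for deletion. */
  public List<String> list() throws Exception {
    return zk.getChildren(PENDING_PATH, false);
  }

  /** 3> something an admin UI banner or startup log could key a visible warning off of. */
  public boolean hasPendingDeletes() throws Exception {
    return !list().isEmpty();
  }

  /** 2> the explicit "delete them" action, e.g. behind a collections API command or button. */
  public void purge(CoreDeleter deleter) throws Exception {
    for (String coreName : list()) {
      deleter.deleteCoreData(coreName);              // actually remove the core's data
      zk.delete(PENDING_PATH + "/" + coreName, -1);  // then drop the queue entry
    }
  }

  /** Stand-in for whatever would really remove a core's data directory. */
  public interface CoreDeleter {
    void deleteCoreData(String coreName) throws Exception;
  }
}
{code}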

"But wait!!" you cry, The whole problem is that you can't get to ZooKeeper in the first place!"
Which is perfectly fine, since we're presupposing a bogus ZK address anyway. That way the
nodes to delete would be tied to the proper ZK instance. When the ZK address was corrected,
there wouldn't be anything in the queue. I think I like this a little better than some sort
of scheduled-in-the-future event, for people who cared a cron job that issued the collections
API call could be done. One could even attach a date to the znode for the potential core to
delete with an expiration date.
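
Continuing the hypothetical sketch above, the expiration date could simply be the znode's data;
a cron job issuing the collections API call would then purge only entries whose date has passed.
Again, the path and method names here are assumptions, not existing Solr code.

{code:java}
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class PendingDeleteExpiry {
  private static final String PENDING_PATH = "/solr/pending-core-deletes"; // assumed path

  /** Queue a core for deletion, recording the date after which it may actually be purged. */
  public static void enqueue(ZooKeeper zk, String coreName, Instant expiresAt) throws Exception {
    byte[] data = expiresAt.toString().getBytes(StandardCharsets.UTF_8); // ISO-8601 timestamp
    zk.create(PENDING_PATH + "/" + coreName, data,
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
  }

  /** A cron-driven purge could delete only the entries whose recorded date has passed. */
  public static boolean isExpired(ZooKeeper zk, String coreName) throws Exception {
    byte[] data = zk.getData(PENDING_PATH + "/" + coreName, false, null);
    Instant expiresAt = Instant.parse(new String(data, StandardCharsets.UTF_8));
    return Instant.now().isAfter(expiresAt);
  }
}
{code}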

 

> SolrCloud will delete the core data for any core that is not referenced in the clusterstate
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13396
>                 URL: https://issues.apache.org/jira/browse/SOLR-13396
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 7.3.1, 8.0
>            Reporter: Shawn Heisey
>            Priority: Major
>
> SOLR-12066 is an improvement designed to delete core data for replicas that were deleted
> while the node was down -- better cleanup.
> In practice, that change causes SolrCloud to delete all core data for cores that are
> not referenced in the ZK clusterstate.  If all the ZK data gets deleted or the Solr instance
> is pointed at a ZK ensemble with no data, it will proceed to delete all of the cores in the
> solr home, with no possibility of recovery.
> I do not think that Solr should ever delete core data unless an explicit DELETE action
> has been made and the node is operational at the time of the request.  If a core exists during
> startup that cannot be found in the ZK clusterstate, it should be ignored (not started) and
> a helpful message should be logged.  I think that message should probably be at WARN so that
> it shows up in the admin UI logging tab with default settings.
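
For illustration only, a minimal sketch of the ignore-and-warn behavior described above;
ClusterStateView, CoreStarter, and the rest are stand-ins, not Solr's actual startup code.

{code:java}
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrphanCoreStartupCheck {
  private static final Logger log = LoggerFactory.getLogger(OrphanCoreStartupCheck.class);

  /** Minimal stand-in for asking the ZK clusterstate whether a core is referenced anywhere. */
  public interface ClusterStateView {
    boolean containsCore(String coreName);
  }

  /** Stand-in for whatever actually loads a core at startup. */
  public interface CoreStarter {
    void start(String coreName);
  }

  /** Start only the cores the clusterstate knows about; never delete anything here. */
  public static void startCores(List<String> coresOnDisk, ClusterStateView clusterState,
                                CoreStarter starter) {
    for (String coreName : coresOnDisk) {
      if (clusterState.containsCore(coreName)) {
        starter.start(coreName);
      } else {
        // WARN so the message shows up in the admin UI logging tab with default settings.
        log.warn("Core {} exists on disk but is not referenced in the ZK clusterstate; "
            + "ignoring it (not started, not deleted). Issue an explicit DELETE if it is truly unwanted.",
            coreName);
      }
    }
  }
}
{code}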



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

