lucene-dev mailing list archives

From "Kevin Risden (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-13396) SolrCloud will delete the core data for any core that is not referenced in the clusterstate
Date Fri, 12 Apr 2019 15:35:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816367#comment-16816367 ]

Kevin Risden commented on SOLR-13396:
-------------------------------------

I agree that arbitrarily deleting data is bad. The other issue is how you clean up if you
JUST have the error/warn. It would be nice for the message to say what you need to do, in
addition to saying that there was a problem.

I will caveat this by saying I have no idea how this works today, but when I read this
I thought it would make sense for each node responsible for a shard/collection to have
to "ack" that the operation was complete. If a node was down at the time, then when it comes
back up it should know that it needs to do "xyz" and finish the operation.

Again not sure of the ZK details, but some rough ideas:
* Create a znode for each node with a list of operations it needs to complete - this would be
written to by the leader?
* Keep track of which operations each node has completed on the existing list before deleting
them? - I think this could be hard since the leader could change?
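
As a rough illustration of the first idea above, here is a minimal in-memory sketch of a
per-node operation list with acks. It is not how Solr works today and it does not talk to
ZooKeeper: the map stands in for the per-node znodes, and the class, method, and operation
names (PendingOpsRegistry, enqueue, ack, "delete-replica:core_a") are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * In-memory model of the "per-node operation list" idea. In a real cluster the
 * map below would be znodes in ZooKeeper; every name here is hypothetical.
 */
class PendingOpsRegistry {
    // node name -> operations that node has not yet acknowledged, in order
    private final Map<String, List<String>> pending = new LinkedHashMap<>();

    /** Leader side: record an operation that the given node must complete. */
    synchronized void enqueue(String node, String operation) {
        pending.computeIfAbsent(node, n -> new ArrayList<>()).add(operation);
    }

    /** Node side (for example at startup): which operations are still outstanding? */
    synchronized List<String> outstanding(String node) {
        return new ArrayList<>(pending.getOrDefault(node, List.of()));
    }

    /** Node side: acknowledge one operation; returns false if it was not pending. */
    synchronized boolean ack(String node, String operation) {
        List<String> ops = pending.get(node);
        if (ops == null || !ops.remove(operation)) {
            return false;
        }
        if (ops.isEmpty()) {
            pending.remove(node); // nothing left for this node
        }
        return true;
    }

    /** Leader side: an operation is done only when no node still lists it. */
    synchronized boolean isComplete(String operation) {
        return pending.values().stream().noneMatch(ops -> ops.contains(operation));
    }
}
```

In this model a node that was down simply calls outstanding() with its own name when it
comes back up, performs each operation, and acks it. The leader-change problem raised in
the second bullet is not addressed here.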

One concern would be the added load on ZK from reading and writing these operation lists.

The above may already have been considered when SolrCloud was built, so it might be a
nonstarter.

> SolrCloud will delete the core data for any core that is not referenced in the clusterstate
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13396
>                 URL: https://issues.apache.org/jira/browse/SOLR-13396
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 7.3.1, 8.0
>            Reporter: Shawn Heisey
>            Priority: Major
>
> SOLR-12066 is an improvement designed to delete core data for replicas that were deleted
> while the node was down -- better cleanup.
> In practice, that change causes SolrCloud to delete all core data for cores that are
> not referenced in the ZK clusterstate.  If all the ZK data gets deleted or the Solr instance
> is pointed at a ZK ensemble with no data, it will proceed to delete all of the cores in the
> solr home, with no possibility of recovery.
> I do not think that Solr should ever delete core data unless an explicit DELETE action
> has been made and the node is operational at the time of the request.  If a core exists during
> startup that cannot be found in the ZK clusterstate, it should be ignored (not started) and
> a helpful message should be logged.  I think that message should probably be at WARN so that
> it shows up in the admin UI logging tab with default settings.
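
The startup behaviour proposed in the issue description can be sketched roughly as follows.
This is an illustration only, not Solr code: StartupCoreCheck and coresToStart are
hypothetical names, core names are plain strings, and java.util.logging's WARNING level
stands in for WARN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical startup check: skip unknown cores, warn, and never delete. */
class StartupCoreCheck {
    private static final Logger LOG = Logger.getLogger(StartupCoreCheck.class.getName());

    /**
     * Returns the cores that should be started. Cores missing from the cluster
     * state are skipped and logged at WARNING; nothing on disk is touched.
     */
    static List<String> coresToStart(List<String> coresOnDisk, Set<String> coresInClusterState) {
        List<String> toStart = new ArrayList<>();
        for (String core : coresOnDisk) {
            if (coresInClusterState.contains(core)) {
                toStart.add(core);
            } else {
                LOG.warning("Core '" + core + "' is not referenced in the cluster state; "
                        + "leaving its data in place and not starting it.");
            }
        }
        return toStart;
    }
}
```

With an empty cluster state (for example, a node pointed at a ZK ensemble with no data),
this returns an empty list and logs one warning per core, leaving all core data intact.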



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


