lucene-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-6591) Cluster state updates can be lost on exception in main queue loop
Date Mon, 03 Nov 2014 16:53:34 GMT

     [ https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-6591:
----------------------------------------
    Attachment: SOLR-6591-ignore-no-collection-path.patch

{quote}
A rapid create+delete loop for collections with state format > 1 causes the above exception
to happen. This is because the updateZkState method assumes that the collection exists and
it tries to write to /collections/collection_name/state.json directly without verifying whether
the /collections/collection_name zk node exists
{quote}

This patch ignores state messages which are trying to create new collections when the parent
zk path doesn't exist. I've added the following comment in the code to explain the situation:
{quote}
                  // if the /collections/collection_name path doesn't exist then it means that
                  // 1) the user invoked a DELETE collection API and the OverseerCollectionProcessor has deleted
                  // this zk path.
                  // 2) these are most likely old "state" messages which are only being processed now because
                  // if they were new "state" messages then in legacy mode, a new collection would have been
                  // created with stateFormat = 1 (which is the default state format)
                  // 3) these can't be new "state" messages created for a new collection because
                  // otherwise the OverseerCollectionProcessor would have already created this path
                  // as part of the create collection API call -- which is the only way in which a collection
                  // with stateFormat > 1 can possibly be created
{quote}
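
For readers who want the rule in executable form, here is a minimal sketch of the check described above. It is not the code from the attached patch: ZooKeeper is modelled as a plain set of existing znode paths, and the class and method names (StateMessageGuard, collectionPath, shouldWriteState, applyStateMessage) are hypothetical names chosen for illustration. The only behaviour taken from the description is that a "state" message for a collection with stateFormat > 1 is ignored when /collections/collection_name does not exist.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: models ZooKeeper as a set of existing znode paths. */
final class StateMessageGuard {

    /** Hypothetical helper: the parent znode for a collection. */
    static String collectionPath(String collection) {
        return "/collections/" + collection;
    }

    /**
     * Returns true if a "state" message may be written. For stateFormat > 1 the
     * parent znode must already exist; otherwise the message is treated as stale.
     */
    static boolean shouldWriteState(Set<String> znodes, String collection, int stateFormat) {
        if (stateFormat <= 1) {
            // This sketch only guards the per-collection state.json case.
            return true;
        }
        return znodes.contains(collectionPath(collection));
    }

    /** Writes state.json when allowed; returns false when the message is ignored. */
    static boolean applyStateMessage(Set<String> znodes, String collection, int stateFormat) {
        if (!shouldWriteState(znodes, collection, stateFormat)) {
            return false; // parent path is gone (for example after a DELETE): drop the message
        }
        if (stateFormat > 1) {
            znodes.add(collectionPath(collection) + "/state.json");
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> znodes = new HashSet<>();
        znodes.add(collectionPath("live")); // created by a (simulated) create collection call

        System.out.println("live applied: " + applyStateMessage(znodes, "live", 2));
        System.out.println("deleted applied: " + applyStateMessage(znodes, "deleted", 2));
    }
}
```

The point of the design is that the stale message is skipped rather than allowed to throw, so the remaining messages in the queue can still be processed.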



> Cluster state updates can be lost on exception in main queue loop
> -----------------------------------------------------------------
>
>                 Key: SOLR-6591
>                 URL: https://issues.apache.org/jira/browse/SOLR-6591
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: Trunk
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>             Fix For: Trunk
>
>         Attachments: SOLR-6591-constructStateFix.patch, SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch, SOLR-6591.patch
>
>
> I found this bug while going through this failure on Jenkins:
> https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/
> {code}
> 2 tests failed.
> REGRESSION:  org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch
> Error Message:
> Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1
> Stack Trace:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
>         at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583)
>         at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
>         at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

