lucene-dev mailing list archives

From "Amrit Sarkar (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6
Date Fri, 01 Sep 2017 06:38:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150114#comment-16150114 ]

Amrit Sarkar edited comment on SOLR-11278 at 9/1/17 6:37 AM:
-------------------------------------------------------------

Rambling again:

What is bootstrapFuture essentially used for? To get the status of the current operation, right?

In CdcrRequestHandler.java (there are some custom log lines; ignore them):

{code}
    Runnable runnable = () -> {
      Lock recoveryLock = req.getCore().getSolrCoreState().getRecoveryLock();
      boolean locked = recoveryLock.tryLock();
      SolrCoreState coreState = core.getSolrCoreState();
      try {
        if (!locked)  {
          log.info("we reached this point :: CANCEL BOOTSTRAP, locked :: " + locked);
          handleCancelBootstrap(req, rsp);
        } else if (leaderStateManager.amILeader())  {
          coreState.setCdcrBootstrapRunning(true);
          //running.set(true);
          String masterUrl = req.getParams().get(ReplicationHandler.MASTER_URL);
          BootstrapCallable bootstrapCallable = new BootstrapCallable(masterUrl, core);
          coreState.setCdcrBootstrapCallable(bootstrapCallable);
          Future<Boolean> bootstrapFuture = core.getCoreContainer().getUpdateShardHandler().getRecoveryExecutor()
              .submit(bootstrapCallable);
          try {
            log.info("we reached this point :: all good, bootstrapFuture.get :: " + bootstrapFuture.get());
          } catch (Exception e) {
            log.error("bootstrapFuture.get :: ",e);
          }
          coreState.setCdcrBootstrapFuture(bootstrapFuture);
          try {
            bootstrapFuture.get();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            log.warn("Bootstrap was interrupted", e);
          } catch (ExecutionException e) {
            log.error("Bootstrap operation failed", e);
          }
        } else  {
          log.error("Action {} sent to non-leader replica @ {}:{}. Aborting bootstrap.", CdcrParams.CdcrAction.BOOTSTRAP,
              collectionName, shard);
        }
      } finally {
        if (locked) {
          coreState.setCdcrBootstrapRunning(false);
          recoveryLock.unlock();
        }
      }
    };
{code}
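To the opening question: a stored {{Future}} is mostly useful to other threads, e.g. a status or cancel request that calls {{isDone()}} or {{cancel(true)}} on it. Note that with the custom debug line in place, {{bootstrapFuture.get()}} blocks before {{coreState.setCdcrBootstrapFuture(bootstrapFuture)}} runs, so the future is only published after the bootstrap has already finished and a concurrent cancel cannot reach it. Below is a minimal sketch of the publish-then-wait ordering; the class, the {{registry}} field (a hypothetical stand-in for the slot behind {{setCdcrBootstrapFuture}}) and the simulated cancel request are illustrative assumptions, not Solr code:

{code:java}
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicReference;

public class PublishThenWait {
  // Hypothetical stand-in for the slot filled by coreState.setCdcrBootstrapFuture(...)
  static final AtomicReference<Future<Boolean>> registry = new AtomicReference<>();

  /** Submits the task, publishes its Future, and only then blocks on it. */
  static boolean runBootstrap(ExecutorService executor, Callable<Boolean> task)
      throws InterruptedException {
    Future<Boolean> future = executor.submit(task);
    registry.set(future); // visible to status/cancel requests before we block
    try {
      return future.get();
    } catch (CancellationException e) {
      return false; // another thread called cancel(true) on the published future
    } catch (ExecutionException e) {
      return false; // e.getCause() is whatever the task threw
    }
  }

  /** Returns whether the "bootstrap" completed normally (false if it was cancelled). */
  static boolean demo() throws InterruptedException {
    registry.set(null);
    ExecutorService executor = Executors.newFixedThreadPool(2);
    CountDownLatch started = new CountDownLatch(1);
    Callable<Boolean> slowBootstrap = () -> {
      started.countDown();
      Thread.sleep(60_000); // stands in for a long-running replication
      return true;
    };
    // Simulated cancel request arriving on another thread while runBootstrap() is blocked
    executor.submit(() -> {
      started.await();
      Future<Boolean> f;
      while ((f = registry.get()) == null) {
        Thread.yield(); // wait until the future has been published
      }
      f.cancel(true);
      return null;
    });
    try {
      return runBootstrap(executor, slowBootstrap);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("bootstrap completed: " + demo());
  }
}
{code}

Running {{main}} should print {{bootstrap completed: false}}, because the simulated cancel request reaches the published future while the caller is still blocked in {{get()}}.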

*bootstrapFuture.get()* throws:

{quote}
  [beaster]   2> 43072 ERROR (updateExecutor-39-thread-1-processing-n:127.0.0.1:41488_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:41488_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler Bootstrap operation failed
  [beaster]   2> java.util.concurrent.ExecutionException: java.lang.AssertionError
  [beaster]   2> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  [beaster]   2> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  [beaster]   2> 	at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:653)
  [beaster]   2> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
  [beaster]   2> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  [beaster]   2> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  [beaster]   2> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
  [beaster]   2> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  [beaster]   2> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  [beaster]   2> 	at java.lang.Thread.run(Thread.java:748)
  [beaster]   2> Caused by: java.lang.AssertionError
  [beaster]   2> 	at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:804)
  [beaster]   2> 	at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:723)
  [beaster]   2> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
  [beaster]   2> 	... 5 more
{quote}

and the bootstrap operation fails.
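This is standard {{FutureTask}} behaviour: whatever the callable throws, including an {{Error}} such as {{AssertionError}}, is caught and stored, and {{get()}} rethrows it wrapped in an {{ExecutionException}}. The frame worth reading is therefore the "Caused by" one, not the {{get()}} call. A minimal standalone reproduction (the class name and assertion message are illustrative, not from Solr):

{code:java}
import java.util.concurrent.*;

public class WrappedAssertion {
  /** Runs a task that fails with an AssertionError and returns what get() reports as the cause. */
  static Throwable causeOfFailedTask() throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Thrown explicitly so the demo does not depend on running with -ea
      Future<Boolean> future = executor.submit((Callable<Boolean>) () -> {
        throw new AssertionError("dropped == false");
      });
      future.get(); // rethrows the task's failure wrapped in an ExecutionException
      return null;
    } catch (ExecutionException e) {
      return e.getCause(); // the original AssertionError thrown inside the task
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Throwable cause = causeOfFailedTask();
    System.out.println(cause.getClass().getName() + ": " + cause.getMessage());
  }
}
{code}

Running {{main}} prints {{java.lang.AssertionError: dropped == false}}; the worker thread survives, since {{FutureTask.run}} stores the error instead of letting it propagate.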

In FutureTask.java, {{get()}} delegates to {{report()}}, which is where the {{ExecutionException}} is thrown:
{code}
    /**
     * Returns result or throws exception for completed task.
     * @param s completed state value
     */
    @SuppressWarnings("unchecked")
    private V report(int s) throws ExecutionException {
        Object x = outcome;
        if (s == NORMAL)
            return (V)x;
        if (s >= CANCELLED)
            throw new CancellationException();
        throw new ExecutionException((Throwable)x);
    }
{code}
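{{report()}} has exactly three outcomes: a normal completion returns the value, a cancelled task throws {{CancellationException}}, and anything else wraps the stored {{Throwable}} in an {{ExecutionException}}. Since the trace shows an {{ExecutionException}}, the bootstrap was not cancelled; the callable itself failed. A small sketch that drives {{FutureTask}} into each state through its public API (class and helper names are illustrative):

{code:java}
import java.util.concurrent.*;

public class ReportOutcomes {
  /** Describes how FutureTask.get() surfaces a completed task's state. */
  static String outcome(FutureTask<String> task) throws InterruptedException {
    try {
      return "NORMAL -> " + task.get();
    } catch (CancellationException e) {
      return "CANCELLED -> CancellationException";
    } catch (ExecutionException e) {
      return "EXCEPTIONAL -> ExecutionException caused by "
          + e.getCause().getClass().getSimpleName();
    }
  }

  static String[] demo() throws InterruptedException {
    FutureTask<String> ok = new FutureTask<>(() -> "done");
    ok.run(); // completes normally

    FutureTask<String> failed = new FutureTask<>(() -> {
      throw new IllegalStateException("boom");
    });
    failed.run(); // completes exceptionally; run() itself does not throw

    FutureTask<String> cancelled = new FutureTask<>(() -> "never");
    cancelled.cancel(false); // cancelled before it ever ran

    return new String[] { outcome(ok), outcome(failed), outcome(cancelled) };
  }

  public static void main(String[] args) throws InterruptedException {
    for (String line : demo()) {
      System.out.println(line);
    }
  }
}
{code}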

The {{AssertionError}} itself is raised in the {{finally}} block of the same function, {{BootstrapCallable.call}} (the "Caused by" frame above):
{code}
        if (closed || !success) {
          // we cannot apply the buffer in this case because it will introduce newer versions in the
          // update log and then the source cluster will get those versions via collectioncheckpoint
          // causing the versions in between to be completely missed
          boolean dropped = ulog.dropBufferedUpdates();
          assert dropped;
        }
{code}

{{dropped}} is false, so the buffered updates are not cleared/dropped? I understand it is calling its own function, but it is difficult to follow who is calling what and what is being returned.
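One way to untangle the {{dropped}} question, stated as an assumption to verify against the branch_6_6 source: {{UpdateLog.dropBufferedUpdates()}} returns {{false}}, without changing anything, when the update log is not in its buffering state. Under that assumption the assertion fires whenever the {{finally}} block runs on a path where buffering was never started or was already ended by something else. The toy model below is entirely hypothetical ({{ToyUpdateLog}} and its methods are not Solr's API) and only illustrates that control flow; it returns the message an informative {{assert dropped : ...}} would carry instead of a bare {{AssertionError}}. Also note that {{assert}} statements only execute when the JVM runs with {{-ea}}, which the beast command in the issue description appears to request via {{-Dtests.asserts=true}}.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of the buffering contract; this is NOT Solr's UpdateLog. */
public class ToyUpdateLog {
  enum State { ACTIVE, BUFFERING }

  State state = State.ACTIVE;
  final List<String> buffered = new ArrayList<>();

  void bufferUpdates() {
    state = State.BUFFERING;
  }

  void add(String update) {
    if (state == State.BUFFERING) {
      buffered.add(update);
    }
  }

  /** Assumed contract: returns false, and changes nothing, unless currently BUFFERING. */
  boolean dropBufferedUpdates() {
    if (state != State.BUFFERING) {
      return false;
    }
    buffered.clear();
    state = State.ACTIVE;
    return true;
  }

  /**
   * Same shape as the bootstrap's finally block. Returns null if the drop succeeded, otherwise
   * the message a more informative "assert dropped : ..." would have reported.
   */
  static String finallyBlock(ToyUpdateLog ulog, boolean closed, boolean success) {
    if (closed || !success) {
      State before = ulog.state;
      boolean dropped = ulog.dropBufferedUpdates();
      if (!dropped) {
        return "dropBufferedUpdates() returned false; ulog state was " + before;
      }
    }
    return null;
  }

  static String[] demo() {
    ToyUpdateLog buffering = new ToyUpdateLog();
    buffering.bufferUpdates();
    buffering.add("doc-1");

    ToyUpdateLog neverBuffered = new ToyUpdateLog(); // buffering never started

    ToyUpdateLog alreadyDropped = new ToyUpdateLog();
    alreadyDropped.bufferUpdates();
    alreadyDropped.dropBufferedUpdates(); // hypothetically, another code path dropped first

    return new String[] {
      finallyBlock(buffering, false, false),
      finallyBlock(neverBuffered, false, false),
      finallyBlock(alreadyDropped, true, false)
    };
  }

  public static void main(String[] args) {
    for (String result : demo()) {
      System.out.println(result == null ? "dropped" : result);
    }
  }
}
{code}

Running {{main}} prints {{dropped}} for the log that was buffering and the "returned false; ulog state was ACTIVE" message for the other two cases; logging the state like this in the real {{finally}} block would show which path the failing beast iteration took.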



> CdcrBootstrapTest failing in branch_6_6
> ---------------------------------------
>
>                 Key: SOLR-11278
>                 URL: https://issues.apache.org/jira/browse/SOLR-11278
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>            Reporter: Amrit Sarkar
>            Assignee: Varun Thacker
>         Attachments: SOLR-11278-cancel-bootstrap-on-stop.patch, SOLR-11278.patch, test_results
>
>
> I ran beast for 10 rounds:
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=vi -Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> and I am seeing the following failure:
> {code}
>   [beaster] [01:37:16.282] FAILURE  153s | CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
>   [beaster]    > Throwable #1: java.lang.AssertionError: Document mismatch on target after sync expected:<2000> but was:<1000>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
