spark-issues mailing list archives

From "Tathagata Das (JIRA)" <>
Subject [jira] [Created] (SPARK-24453) Fix error recovering from the failure in a no-data batch
Date Fri, 01 Jun 2018 23:02:00 GMT
Tathagata Das created SPARK-24453:

             Summary: Fix error recovering from the failure in a no-data batch
                 Key: SPARK-24453
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.4.0
            Reporter: Tathagata Das
            Assignee: Tathagata Das

java.lang.AssertionError: assertion failed: Concurrent update to the log. Multiple streaming jobs detected for 159897

The error occurs when recovering from a failure in a no-data batch (say X) that has been
planned (i.e. written to the offset log) but not executed (i.e. not written to the commit log).
Upon recovery, the following sequence of events happens.

- `MicroBatchExecution.populateStartOffsets` sets `currentBatchId` to X. Since there was no
data in the batch, `availableOffsets` is the same as `committedOffsets`, so `isNewDataAvailable`
is false.
- When `MicroBatchExecution.constructNextBatch` is called, ideally it should immediately return
true because the next batch has already been constructed. However, the check for whether the
batch had been constructed was `if (isNewDataAvailable) return true`. Since the planned batch
is a no-data batch, it escapes this check and proceeds to plan the same batch X once again.
If new data has arrived since the failure, it plans a new batch, tries to write the new
offsets to the `offsetLog` as batchId X, and fails with the above error.
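
The sequence above can be reproduced with a small toy model. This is a minimal Python sketch, not Spark's actual code: `OffsetLog`, `ToyExecution`, `consult_offset_log`, and the integer offsets are all hypothetical names and values chosen for illustration. It only mirrors the described behaviour: the guard `if (isNewDataAvailable) return true` lets a recovered no-data batch be planned a second time. The alternative guard, which compares `currentBatchId` with the latest entry in the offset log, is an assumption about what such a check could look like in this toy setting.

```python
class OffsetLog:
    """Toy stand-in for the offset log: batch id -> planned offsets."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def latest_batch_id(self):
        return max(self.entries) if self.entries else None

    def add(self, batch_id, offsets):
        # Mirrors the assertion message quoted in the report.
        if batch_id in self.entries:
            raise AssertionError(
                "assertion failed: Concurrent update to the log. "
                f"Multiple streaming jobs detected for {batch_id}"
            )
        self.entries[batch_id] = offsets


class ToyExecution:
    """Hypothetical, heavily simplified model of the recovery path."""

    def __init__(self, offset_log, committed_offsets, consult_offset_log):
        self.offset_log = offset_log
        self.consult_offset_log = consult_offset_log
        # Recovery step: resume at the last planned batch X.
        self.current_batch_id = offset_log.latest_batch_id()
        self.available_offsets = offset_log.entries[self.current_batch_id]
        self.committed_offsets = committed_offsets

    def is_new_data_available(self):
        return self.available_offsets != self.committed_offsets

    def construct_next_batch(self, source_offset):
        if self.consult_offset_log:
            # Assumed alternative guard: batch X is already in the offset log.
            constructed = self.offset_log.latest_batch_id() == self.current_batch_id
        else:
            # Guard described in the report; false for a no-data batch.
            constructed = self.is_new_data_available()
        if constructed:
            return True
        # Otherwise plan the batch (again) from the source's current offset.
        self.available_offsets = source_offset
        if self.is_new_data_available():
            self.offset_log.add(self.current_batch_id, self.available_offsets)
            return True
        return False


def recovered_state():
    # Batch 0 was committed at offset 10; batch 1 (X) was planned with the
    # same offset (no data) and never committed.
    return OffsetLog({0: 10, 1: 10}), 10


# Faulty guard: new data (offset 15) arrived after the failure.
log, committed = recovered_state()
try:
    ToyExecution(log, committed, consult_offset_log=False).construct_next_batch(15)
    faulty_outcome = "no error"
except AssertionError as err:
    faulty_outcome = str(err)
print(faulty_outcome)

# Guard that consults the offset log: batch X is not planned a second time.
log, committed = recovered_state()
fixed = ToyExecution(log, committed, consult_offset_log=True)
fixed_result = fixed.construct_next_batch(15)
print(fixed_result, log.entries)
```

In this sketch the faulty guard reaches `OffsetLog.add` with an id that is already present, which is where the assertion fires. With the alternative guard the method returns before any write, so the entry for batch X is left untouched.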

The correct solution is to check the offset log to determine whether the `currentBatchId` is the latest or
