spark-issues mailing list archives

From "Tathagata Das (JIRA)" <>
Subject [jira] [Updated] (SPARK-7139) Allow received block metadata to be saved to WAL and recovered on driver failure
Date Sat, 01 Aug 2015 05:57:04 GMT


Tathagata Das updated SPARK-7139:
    Issue Type: Sub-task  (was: Improvement)
        Parent: SPARK-9215

> Allow received block metadata to be saved to WAL and recovered on driver failure
> --------------------------------------------------------------------------------
>                 Key: SPARK-7139
>                 URL:
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Streaming
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Blocker
>             Fix For: 1.4.0
> The receiver API allows arbitrary metadata to be added for each block. However, that information
is not saved in the WAL as part of the block information in the driver.
> To fix this, the following needs to be done. 
> 1. Forward the metadata to the ReceivedBlockTracker in the driver.
> 2. ReceivedBlockTracker saves the metadata and recovers it on restart. 
> However, there is one tricky thing. The ReceivedBlockTracker WAL is enabled only when
`spark.streaming.receiver.writeAheadLog.enable = true`. This means that the driver WAL is
enabled only when the receiver WAL is enabled. This is not desired, as one may want to save
and recover block metadata (especially information like Kafka offsets or Kinesis
sequence numbers) that can be used to recover data without actually saving the data to the
receiver WAL. So we have to always enable the tracker WAL.
> 3. Always enable the ReceivedBlockTracker WAL. However, make sure that the WriteAheadLogBackedBlockRDD
skips block lookup after restart as the blocks are obviously gone.
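
The steps above can be illustrated with a minimal sketch. It is not Spark's implementation: `BlockTrackerSketch`, `ReceivedBlockInfoSketch`, and the JSON-lines log file are hypothetical stand-ins for the ReceivedBlockTracker and its WAL. It assumes only that each block's metadata (for example, Kafka offsets) is written to the log before in-memory state is updated, and replayed when the tracker is constructed again after a driver restart.

```python
import json
import os
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ReceivedBlockInfoSketch:
    """Hypothetical per-block record; not Spark's ReceivedBlockInfo."""
    stream_id: int
    block_id: str
    # Arbitrary metadata attached by the receiver, e.g. Kafka offsets.
    metadata: Optional[Dict[str, Any]] = None


class BlockTrackerSketch:
    """Toy driver-side tracker whose log is always on (step 3)."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.blocks: List[ReceivedBlockInfoSketch] = []
        self._recover()

    def _recover(self) -> None:
        # Replay the log, if one exists, to rebuild block info and metadata.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as log:
            for line in log:
                line = line.strip()
                if line:
                    self.blocks.append(ReceivedBlockInfoSketch(**json.loads(line)))

    def add_block(self, info: ReceivedBlockInfoSketch) -> None:
        # Write ahead: persist the record before updating in-memory state.
        with open(self.wal_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(asdict(info)) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.blocks.append(info)
```

Constructing a second `BlockTrackerSketch` on the same path simulates a driver restart: the recovered records carry the metadata even though the block data itself was never written to any receiver-side log.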

