spark-issues mailing list archives

From "Tyson Condie (JIRA)" <>
Subject [jira] [Created] (SPARK-18790) Keep a general offset history of stream batches
Date Thu, 08 Dec 2016 20:47:58 GMT
Tyson Condie created SPARK-18790:

             Summary: Keep a general offset history of stream batches
                 Key: SPARK-18790
             Project: Spark
          Issue Type: Improvement
            Reporter: Tyson Condie

Instead of keeping only the minimum number of offsets, we should keep enough information
to roll back n batches and re-execute the stream from a given point. In particular, we should
add a config to SQLConf, spark.sql.streaming.retainedBatches, that defaults to 100, and ensure
that we keep enough log files in the following places to roll back the specified number of
batches:

* the offsets that are present in each batch
* versions of the state store
* the file lists stored for the FileStreamSource
* the metadata log stored by the FileStreamSink
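
The retention rule described above could be sketched roughly as follows. This is an
illustration only, not Spark's implementation: the function name batches_to_purge and the
constant RETAINED_BATCHES_DEFAULT are hypothetical, and the sketch assumes that each log
entry is identified by an integer batch id and that "retaining n batches" means keeping the
n newest ids. Whether the real boundary should be n or n + 1 entries is left open here.

```python
from typing import Iterable, List

# Default proposed in this issue for spark.sql.streaming.retainedBatches.
RETAINED_BATCHES_DEFAULT = 100


def batches_to_purge(
    batch_ids: Iterable[int],
    retained_batches: int = RETAINED_BATCHES_DEFAULT,
) -> List[int]:
    """Return the batch ids that may be deleted from a log.

    Hypothetical helper: keeps the `retained_batches` newest ids
    (relative to the highest id seen) and reports the rest as purgeable.
    """
    if retained_batches < 1:
        raise ValueError("retained_batches must be at least 1")

    ids = sorted(set(batch_ids))
    if not ids:
        return []

    latest = ids[-1]
    # Oldest batch id that must still be kept to allow the rollback.
    oldest_to_keep = latest - retained_batches + 1
    return [b for b in ids if b < oldest_to_keep]


if __name__ == "__main__":
    # With batches 0..104 and the default of 100, only the five oldest go.
    print(batches_to_purge(range(105)))
```

Under these assumptions, the same rule would be applied independently to each of the four
locations listed above, so that all of them can be rolled back to the same batch.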

