spark-issues mailing list archives

From "Hari Shreedharan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-4707) Reliable Kafka Receiver can lose data if the block generator fails to store data
Date Wed, 10 Dec 2014 01:47:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240496#comment-14240496
] 

Hari Shreedharan commented on SPARK-4707:
-----------------------------------------

TD and I discussed this and decided that the second option can be implemented with only a
limited number of retries, which keeps the implementation readable and avoids extra complexity.
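The bounded-retry idea can be sketched roughly as follows. This is a hypothetical illustration, not the actual patch; the class name, the retry limit, and the `storeAction` callback are all assumptions:

```java
import java.util.function.BooleanSupplier;

// Sketch of a bounded-retry store: attempt the store a fixed number of
// times, and only treat the data as safe to commit if a store succeeds.
final class BoundedRetryStore {
    // Assumed retry limit for illustration; the real patch may differ.
    static final int MAX_RETRIES = 3;

    /**
     * Tries the store action up to MAX_RETRIES times.
     * Returns true on the first success (offsets may then be committed);
     * returns false if every attempt fails, in which case the caller
     * should stop the receiver rather than silently advance offsets.
     */
    static boolean storeWithRetries(BooleanSupplier storeAction) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            if (storeAction.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }
}
```

The key point is the failure branch: after exhausting the retries the receiver must not continue from the current offset, because the uncommitted data was never stored.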

> Reliable Kafka Receiver can lose data if the block generator fails to store data
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-4707
>                 URL: https://issues.apache.org/jira/browse/SPARK-4707
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.0
>            Reporter: Hari Shreedharan
>            Priority: Critical
>
> The Reliable Kafka Receiver commits offsets only when events are actually stored, which
ensures that on restart we actually start where we left off. But if the failure happens
in the store() call and the block generator reports an error, the receiver does nothing:
it continues reading from the current offset rather than from the last committed offset.
As a result, messages between the last commit and the current offset are lost.
> I will send a PR for this soon - I have a patch that needs some minor fixes and testing.
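The loss window described above can be shown with a toy offset model. This is purely illustrative (hypothetical class and method names, not Spark code): the receiver's read position always advances, but the committed offset should advance only on a successful store; if a failed store is ignored, everything between the two is dropped on restart.

```java
// Toy model of the offset-tracking bug described in the issue.
final class OffsetTracker {
    private long committedOffset = 0; // last offset whose data was durably stored
    private long readOffset = 0;      // next offset the receiver will consume

    /** Simulates consuming a batch and attempting to store it. */
    void consumeBatch(long batchSize, boolean storeSucceeded) {
        readOffset += batchSize;          // the receiver always advances its read position
        if (storeSucceeded) {
            committedOffset = readOffset; // commit only after a successful store
        }
        // Bug being modeled: on failure the receiver keeps reading from
        // readOffset anyway, so offsets in (committedOffset, readOffset]
        // are never stored and never re-read.
    }

    /** Messages lost if the receiver continues from readOffset after a failure. */
    long lostOnFailure() {
        return readOffset - committedOffset;
    }
}
```

In this model, a batch whose store fails leaves `lostOnFailure()` nonzero, which is exactly the window the issue says must not be skipped over.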



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

