spark-issues mailing list archives

From "bharath kumar avusherla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25052) Is there any possibility that spark structured streaming generate duplicates in the output?
Date Wed, 08 Aug 2018 03:18:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572608#comment-16572608 ]

bharath kumar avusherla commented on SPARK-25052:
-------------------------------------------------

I also thought about that, which is why I created it as a question. In any case, I will
send the question to the mailing list.

> Is there any possibility that spark structured streaming generate duplicates in the output?
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25052
>                 URL: https://issues.apache.org/jira/browse/SPARK-25052
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: bharath kumar avusherla
>            Priority: Minor
>
> We recently observed that Spark Structured Streaming generated duplicates in the output
> when reading from a Kafka topic and writing the output to S3 (with checkpointing also in
> S3). We ran into this issue twice, but it is not reproducible. Has anyone faced this kind
> of issue before? Could it be caused by S3 eventual consistency?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


