spark-issues mailing list archives

From "vincent ye (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-4964) Exactly-once + WAL-free Kafka Support in Spark Streaming
Date Tue, 27 Jan 2015 18:12:35 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293899#comment-14293899
] 

vincent ye commented on SPARK-4964:
-----------------------------------

I have pretty much the same idea as the one in Tathagata's design doc. I prototyped it on
top of the tresata/spark-kafka project. Here is the code:
https://github.com/vincentye38/spark-kafka/tree/InputDStream_updateStateByKey.
I use a StateDStream to checkpoint the offsets, since the generatedRDDs member variable and the clearMetadata()
method of DStream are not accessible from its subclasses.
I have run it in my company's staging environment for a week, and it can recover from restarts.
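The offset-checkpointing idea can be illustrated with a plain-Scala sketch of an updateStateByKey-style update function, run outside Spark for clarity. The names here (OffsetState, updateOffsets) are illustrative, not taken from the prototype; the point is that per-partition offsets become DStream state that gets checkpointed along with everything else, instead of being protected by a write-ahead log.

```scala
// Per-partition state: the next Kafka offset to consume, keyed elsewhere
// by (topic, partition). In the real prototype this would live inside a
// StateDStream; here it is plain Scala to show the update semantics.
case class OffsetState(nextOffset: Long)

// updateStateByKey-style function: given the (fromOffset, untilOffset)
// ranges seen for one key in the current batch, advance the stored state.
// Taking the max makes replayed batches after a restart idempotent, which
// is what gives the exactly-once effect without a WAL.
def updateOffsets(
    batchRanges: Seq[(Long, Long)],
    state: Option[OffsetState]): Option[OffsetState] = {
  val current = state.getOrElse(OffsetState(0L)).nextOffset
  val advanced = (current +: batchRanges.map(_._2)).max
  Some(OffsetState(advanced))
}

// A fresh partition that consumed two ranges ends at offset 200.
val first = updateOffsets(Seq((0L, 100L), (100L, 200L)), None)

// Replaying an already-consumed range after a restart is a no-op.
val replayed = updateOffsets(Seq((0L, 100L)), Some(OffsetState(200L)))
```

On recovery, the checkpointed state holds the highest committed offset per partition, so the stream can resume consuming from exactly that point.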

> Exactly-once + WAL-free Kafka Support in Spark Streaming
> --------------------------------------------------------
>
>                 Key: SPARK-4964
>                 URL: https://issues.apache.org/jira/browse/SPARK-4964
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Cody Koeninger
>
> There are two issues with the current Kafka support:
>  - Use of write-ahead logs in Spark Streaming to ensure no data is lost causes data replication in both Kafka AND Spark Streaming.
>  - Lack of exactly-once semantics. For background, see http://apache-spark-developers-list.1001551.n3.nabble.com/Which-committers-care-about-Kafka-td9827.html
> We want to solve both of these problems in this JIRA. Please see the following design doc for the proposed solution:
> https://docs.google.com/a/databricks.com/document/d/1IuvZhg9cOueTf1mq4qwc1fhPb5FVcaRLcyjrtG4XU1k/edit#heading=h.itproy77j3p



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

