spark-issues mailing list archives

From "vincent ye (JIRA)" <>
Subject [jira] [Commented] (SPARK-4964) Exactly-once + WAL-free Kafka Support in Spark Streaming
Date Tue, 27 Jan 2015 18:12:35 GMT


vincent ye commented on SPARK-4964:

I have pretty much the same idea as described in Tathagata's design doc. I prototyped it on
top of the tresata/spark-kafka project. Here is the code
I use a StateDStream to checkpoint the offsets, since the generatedRDDs member variable and the
clearMetadata() method of DStream are not accessible from its subclasses.
I have run it in my company's staging environment for a week. It can recover after a restart.
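
To illustrate the general idea of checkpointing consumed offsets so that a restarted job resumes where it left off, here is a minimal sketch in plain Python. It is not the prototype's code and does not use Spark or Kafka APIs: the in-memory `log` dictionary stands in for Kafka partitions, and `load_offsets`, `save_offsets`, and `run_batch` are hypothetical names chosen for this example.

```python
import json
import os


def load_offsets(path):
    """Return the checkpointed {partition: next_offset} map, or {} if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        # JSON object keys are strings, so convert them back to int partition ids.
        return {int(k): v for k, v in json.load(f).items()}


def save_offsets(path, offsets):
    """Write offsets atomically: write a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(offsets, f)
    os.replace(tmp, path)


def run_batch(log, checkpoint_path, max_per_partition):
    """Read the range [start, until) from each partition, then checkpoint `until`.

    `log` is an assumed stand-in for Kafka: {partition_id: [message, ...]}.
    """
    offsets = load_offsets(checkpoint_path)
    batch = []
    for partition, messages in log.items():
        start = offsets.get(partition, 0)
        until = min(start + max_per_partition, len(messages))
        batch.extend(messages[start:until])
        offsets[partition] = until
    save_offsets(checkpoint_path, offsets)
    return batch
```

Because each call reloads the offsets from the checkpoint file, a "restart" is simply another call to `run_batch` with the same path. Note that this sketch only covers offset tracking: if a job crashes after producing output but before the checkpoint is written, the same range is read again, so exactly-once results would additionally require idempotent or transactional output.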

> Exactly-once + WAL-free Kafka Support in Spark Streaming
> --------------------------------------------------------
>                 Key: SPARK-4964
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Cody Koeninger
> There are two issues with the current Kafka support:
>  - Use of Write Ahead Logs in Spark Streaming to ensure no data is lost. This causes data
>    replication in both Kafka AND Spark Streaming.
>  - Lack of exactly-once semantics. For background, see
> We want to solve both of these problems in this JIRA. Please see the following design doc for
> the solution.

