spark-issues mailing list archives

From "Hari Shreedharan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3146) Improve the flexibility of Spark Streaming Kafka API to offer user the ability to process message before storing into BM
Date Thu, 18 Dec 2014 21:35:14 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252319#comment-14252319 ]

Hari Shreedharan commented on SPARK-3146:
-----------------------------------------

If you look at the addDataWithCallback method, we pass (data, metadata) to the BlockGenerator,
where data is the (K, V) pair and metadata is (topicAndPartition, msgAndMetadata.offset). Would
passing data and metadata to an intercept method be useful?
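
To make the question concrete, here is a minimal sketch in Python (not Spark's actual Scala code) of what passing (data, metadata) to an intercept hook could look like. The class SketchBlockGenerator, the intercept parameter, and the tuple layouts are assumptions made for illustration; only the names addDataWithCallback, topicAndPartition, and offset come from the comment above.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical shapes, chosen for illustration only.
Data = Tuple[Any, Any]                    # (key, value)
Metadata = Tuple[Tuple[str, int], int]    # ((topic, partition), offset)
Intercept = Callable[[Data, Metadata], Any]


class SketchBlockGenerator:
    """Toy stand-in for a block generator; not Spark's BlockGenerator."""

    def __init__(self, intercept: Optional[Intercept] = None) -> None:
        self._intercept = intercept
        self.buffer: List[Any] = []
        self.last_offsets: Dict[Tuple[str, int], int] = {}

    def add_data_with_callback(self, data: Data, metadata: Metadata) -> None:
        # Let the user transform the record while the metadata is still available.
        record = self._intercept(data, metadata) if self._intercept else data
        self.buffer.append(record)
        # Callback side: remember the latest offset per (topic, partition).
        topic_and_partition, offset = metadata
        self.last_offsets[topic_and_partition] = offset


def keep_offset(data: Data, metadata: Metadata) -> Dict[str, Any]:
    """Example intercept: attach topic, partition, and offset to the record."""
    (topic, partition), offset = metadata
    key, value = data
    return {"topic": topic, "partition": partition, "offset": offset,
            "key": key, "value": value}


# With an intercept, the stored record carries its Kafka position.
gen = SketchBlockGenerator(intercept=keep_offset)
gen.add_data_with_callback(("k1", "v1"), (("events", 0), 42))

# Without one, the (key, value) pair is stored unchanged.
plain = SketchBlockGenerator()
plain.add_data_with_callback(("k1", "v1"), (("events", 0), 42))
```

In this sketch the hook is optional, so callers that do not supply one see the same stored records as before.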

> Improve the flexibility of Spark Streaming Kafka API to offer user the ability to process message before storing into BM
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3146
>                 URL: https://issues.apache.org/jira/browse/SPARK-3146
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Saisai Shao
>
> Currently the Spark Streaming Kafka API stores the key and value of each message in the BM
> for processing, which may be too inflexible for some requirements:
> 1. Currently, topic/partition/offset information for each message is discarded by KafkaInputDStream.
> In some scenarios people may need this information to better filter messages, as described
> in SPARK-2388.
> 2. People may need to add a timestamp to each message when feeding it into Spark Streaming,
> so that system latency can be measured better.
> 3. Checkpointing the partition/offsets, or other uses.
> So here we add a messageHandler to the interface to give people the flexibility to preprocess
> each message before it is stored in the BM. At the same time, this improvement stays compatible
> with the current API.
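
The messageHandler idea in the description can be sketched as follows. This is a hedged Python illustration, not the Scala API that was proposed: the MessageAndMetadata record shape, default_handler, make_timestamping_handler, and preprocess are all hypothetical names introduced here. The default handler keeps only key and value, which is how the sketch models compatibility with the current behaviour.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MessageAndMetadata:
    """Hypothetical record shape: a Kafka message plus where it came from."""
    topic: str
    partition: int
    offset: int
    key: Any
    message: Any


Handler = Callable[[MessageAndMetadata], Any]


def default_handler(m: MessageAndMetadata) -> Tuple[Any, Any]:
    # Models the current behaviour: keep only key and value.
    return (m.key, m.message)


def make_timestamping_handler(clock: Callable[[], float] = time.time) -> Handler:
    """Build a handler that keeps topic/partition/offset and adds a timestamp."""
    def handler(m: MessageAndMetadata) -> Dict[str, Any]:
        return {"topic": m.topic, "partition": m.partition, "offset": m.offset,
                "key": m.key, "value": m.message, "received_at": clock()}
    return handler


def preprocess(messages: Iterable[MessageAndMetadata],
               handler: Handler = default_handler) -> List[Any]:
    # Apply the handler to each message before it would be stored.
    return [handler(m) for m in messages]


incoming = [MessageAndMetadata("events", 0, 7, "k", "v")]
unchanged = preprocess(incoming)                           # default: (key, value)
enriched = preprocess(incoming, make_timestamping_handler())
```

Injecting the clock as a parameter is a choice made for this sketch so that the timestamp can be fixed in tests.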



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
