spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cody Koeninger <c...@koeninger.org>
Subject Re: Kafka Direct Stream
Date Thu, 01 Oct 2015 14:06:07 GMT
You can get the topic for a given partition from the offset range.  You can
either filter using that; or just have a single rdd and match on topic when
doing mapPartitions or foreachPartition (which I think is a better idea)

http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers

On Wed, Sep 30, 2015 at 5:02 PM, Udit Mehta <umehta@groupon.com> wrote:

> Hi,
>
> I am using spark direct stream to consume from multiple topics in Kafka. I
> am able to consume fine but I am stuck at how to separate the data for each
> topic since I need to process data differently depending on the topic.
> I basically want to split the RDD consisting on N topics into N RDD's each
> having 1 topic.
>
> Any help would be appreciated.
>
> Thanks in advance,
> Udit
>

Mime
View raw message