spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Latest 200 messages per topic
Date Tue, 19 Jul 2016 15:14:35 GMT
Unless you're using only 1 partition per topic, there's no reasonable
way of doing this.  Offsets for one topicpartition do not necessarily
have anything to do with offsets for another topicpartition.  You
could do the last (200 / number of partitions) messages per
topicpartition, but you have no guarantee as to the time those events
represent, especially if your producers are misbehaving.  To be
perfectly clear, this is a consequence of the Kafka data model, and
has nothing to do with Spark.

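So, given that it's a bad idea and doesn't really do what you're
asking... you can do this using KafkaUtils.createRDD, which takes an
explicit list of offset ranges (so it also covers fetching a specific
offset range for batch processing).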
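
As a rough sketch of the "last (200 / number of partitions) messages per
topicpartition" arithmetic, the following plain Python computes one
(topic, partition, fromOffset, untilOffset) tuple per partition. The
function name and the `earliest` / `latest` dictionaries are hypothetical:
it assumes you have already looked up the earliest and latest offsets for
each partition from Kafka by some other means, and it makes no attempt to
say anything about the time those messages represent.

```python
from typing import Dict, List, Tuple

# (topic, partition, fromOffset, untilOffset); untilOffset is exclusive,
# matching the convention of Spark's OffsetRange.
Range = Tuple[str, int, int, int]


def last_n_offset_ranges(
    topic: str,
    earliest: Dict[int, int],
    latest: Dict[int, int],
    total: int = 200,
) -> List[Range]:
    """Split `total` messages evenly across the partitions of one topic.

    `earliest` and `latest` map partition id -> offset and are assumed
    to have been fetched from Kafka beforehand (hypothetical inputs).
    """
    if not latest:
        return []
    # Integer division: any remainder is dropped, so fewer than `total`
    # messages may be covered when total is not a multiple of the
    # partition count.
    per_partition = total // len(latest)
    ranges = []
    for partition in sorted(latest):
        until = latest[partition]
        # Never start before the earliest offset still retained.
        start = max(earliest.get(partition, 0), until - per_partition)
        ranges.append((topic, partition, start, until))
    return ranges


# Example with a made-up topic that has 4 partitions of uneven size.
example = last_n_offset_ranges(
    "device-42",
    earliest={0: 0, 1: 0, 2: 480, 3: 0},
    latest={0: 1000, 1: 30, 2: 500, 3: 0},
)
```

Each tuple would then become one OffsetRange in the list handed to
KafkaUtils.createRDD. Note that partitions holding fewer than
(200 / number of partitions) messages simply contribute less, so the
total can come out under 200.
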

On Sat, Jul 16, 2016 at 10:43 AM, Rabin Banerjee
<dev.rabin.banerjee@gmail.com> wrote:
> Just to add,
>
>   I want to read the MAX_OFFSET of a topic, then read MAX_OFFSET-200,
> every time.
>
> Also, I want to know: if I want to fetch a specific offset range for batch
> processing, is there any option for doing that?
>
>
>
>
> On Sat, Jul 16, 2016 at 9:08 PM, Rabin Banerjee
> <dev.rabin.banerjee@gmail.com> wrote:
>>
>> Hi all,
>>
>>    I have 1000 Kafka topics, each storing messages for different devices.
>> I want to use the direct approach for connecting to Kafka from Spark, and
>> I am only interested in the latest 200 messages in Kafka.
>>
>> How do I do that?
>>
>> Thanks.
>
>
