spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Spark Streaming Specify Kafka Partition
Date Tue, 01 Dec 2015 16:12:12 GMT
I actually haven't tried that, since I tend to do the offset lookups if
necessary.

It's possible that it will work, try it and let me know.

Be aware that if you're doing a count() or take() operation directly on the
rdd it'll definitely give you the wrong result if you're using -1 for one
of the offsets.
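For reference, the `createDirectStream` overload with `fromOffsets` discussed below can be sketched as follows. This is an illustrative example against the Spark 1.x Kafka direct API; the broker address, topic name, and offset values are placeholders, not anything from this thread:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("direct-kafka-from-offsets")
val ssc = new StreamingContext(conf, Seconds(10))

// Placeholder broker list -- substitute your own.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

// Explicitly pick which partitions to consume and where to start:
// partition 0 of topic "events" from offset 12345, partition 1 from the beginning.
val fromOffsets = Map(
  TopicAndPartition("events", 0) -> 12345L,
  TopicAndPartition("events", 1) -> 0L
)

// Decide what to extract from each Kafka message.
val messageHandler =
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

val stream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets, messageHandler)
```

Only the partitions present in `fromOffsets` are consumed, which is what lets you target an exact partition id rather than subscribing to a whole topic.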



On Tue, Dec 1, 2015 at 9:58 AM, Alan Braithwaite <alan@cloudflare.com>
wrote:

> Neat, thanks.  If I specify something like -1 as the offset, will it
> consume from the latest offset or do I have to instrument that manually?
>
> - Alan
>
> On Tue, Dec 1, 2015 at 6:43 AM, Cody Koeninger <cody@koeninger.org> wrote:
>
>> Yes, there is a version of createDirectStream that lets you specify
>> fromOffsets: Map[TopicAndPartition, Long]
>>
>> On Mon, Nov 30, 2015 at 7:43 PM, Alan Braithwaite <alan@cloudflare.com>
>> wrote:
>>
>>> Is there any mechanism in the kafka streaming source to specify the
>>> exact partition id that we want a streaming job to consume from?
>>>
>>> If not, is there a workaround besides writing a custom receiver?
>>>
>>> Thanks,
>>> - Alan
>>>
>>
>>
>
