spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shushant Arora <shushantaror...@gmail.com>
Subject Re: spark streaming 1.3 with kafka
Date Tue, 01 Sep 2015 15:16:49 GMT
What if I use custom checkpointing. So that I can take care of offsets
being checkpointed at end of each batch.

Will it be possible then to reset the offset.

On Tue, Sep 1, 2015 at 8:42 PM, Cody Koeninger <cody@koeninger.org> wrote:

> No, if you start arbitrarily messing around with offset ranges after
> compute is called, things are going to get out of whack.
>
> e.g. checkpoints are no longer going to correspond to what you're actually
> processing
>
> On Tue, Sep 1, 2015 at 10:04 AM, Shushant Arora <shushantarora09@gmail.com
> > wrote:
>
>> can I reset the range based on some condition - before calling
>> transformations on the stream.
>>
>> Say -
>> before calling :
>>  directKafkaStream.foreachRDD(new Function<JavaRDD<byte[][]>, Void>()
{
>>
>> @Override
>> public Void call(JavaRDD<byte[][]> v1) throws Exception {
>> v1.foreachPartition(new  VoidFunction<Iterator<byte[][]>>{
>> @Override
>> public void call(Iterator<byte[][]> t) throws Exception {
>> }});}});
>>
>> change directKafkaStream's RDD's offset range.(fromOffset).
>>
>> I can't do this in compute method since compute would have been called at
>> current batch queue time - but condition is set at previous batch run time.
>>
>>
>> On Tue, Sep 1, 2015 at 7:09 PM, Cody Koeninger <cody@koeninger.org>
>> wrote:
>>
>>> It's at the time compute() gets called, which should be near the time
>>> the batch should have been queued.
>>>
>>> On Tue, Sep 1, 2015 at 8:02 AM, Shushant Arora <
>>> shushantarora09@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> In spark streaming 1.3 with kafka- when does driver bring latest
>>>> offsets of this run - at start of each batch or at time when  batch gets
>>>> queued ?
>>>>
>>>> Say few of my batches take longer time to complete than their batch
>>>> interval. So some of batches will go in queue. Will driver waits for
>>>>  queued batches to get started or just brings the latest offsets before
>>>> they even actually started. And when they start running they will work on
>>>> old offsets brought at time when they were queued.
>>>>
>>>>
>>>
>>
>

Mime
View raw message