spark-user mailing list archives

From Shushant Arora <shushantaror...@gmail.com>
Subject Re: spark streaming 1.3 with kafka
Date Tue, 01 Sep 2015 15:04:58 GMT
Can I reset the range, based on some condition, before calling
transformations on the stream?

Say, before calling:
directKafkaStream.foreachRDD(new Function<JavaRDD<byte[][]>, Void>() {
  @Override
  public Void call(JavaRDD<byte[][]> v1) throws Exception {
    v1.foreachPartition(new VoidFunction<Iterator<byte[][]>>() {
      @Override
      public void call(Iterator<byte[][]> t) throws Exception {
        // process the partition's messages
      }
    });
    return null;
  }
});

I want to change the offset range (fromOffset) of directKafkaStream's RDD.

I can't do this in the compute method, since compute would have been called
at the current batch's queue time, but the condition is only set at the
previous batch's run time.
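To illustrate the kind of reset I mean, here is a plain-Java sketch with no Spark dependencies (the method and map names are hypothetical): each partition's stored fromOffset is moved forward to a known-valid earliest offset when the stored value is stale.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: clamp per-partition fromOffsets against a
// known-valid earliest offset before they are used to build the next range.
public class OffsetReset {

    // For each partition, keep the stored offset unless it has fallen behind
    // the earliest valid offset, in which case reset it forward.
    static Map<Integer, Long> resetOffsets(Map<Integer, Long> stored,
                                           Map<Integer, Long> earliestValid) {
        Map<Integer, Long> result = new HashMap<>();
        for (Map.Entry<Integer, Long> e : stored.entrySet()) {
            long earliest = earliestValid.getOrDefault(e.getKey(), 0L);
            result.put(e.getKey(), Math.max(e.getValue(), earliest));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Long> stored = new HashMap<>();
        stored.put(0, 100L);  // partition 0: still valid
        stored.put(1, 5L);    // partition 1: stale, behind earliest valid
        Map<Integer, Long> earliest = new HashMap<>();
        earliest.put(0, 50L);
        earliest.put(1, 20L);
        System.out.println(resetOffsets(stored, earliest)); // prints {0=100, 1=20}
    }
}
```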


On Tue, Sep 1, 2015 at 7:09 PM, Cody Koeninger <cody@koeninger.org> wrote:

> It's at the time compute() gets called, which should be near the time the
> batch should have been queued.
>
> On Tue, Sep 1, 2015 at 8:02 AM, Shushant Arora <shushantarora09@gmail.com>
> wrote:
>
>> Hi
>>
>> In Spark Streaming 1.3 with Kafka, when does the driver fetch the latest
>> offsets for a run - at the start of each batch, or at the time the batch
>> gets queued?
>>
>> Say a few of my batches take longer to complete than their batch
>> interval, so some batches will go into the queue. Will the driver wait
>> for queued batches to start, or will it fetch the latest offsets before
>> they have actually started? And when they start running, will they work
>> on old offsets fetched at the time they were queued?
>>
>>
>
