spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Direct Kafka input stream and window(…) function
Date Tue, 22 Mar 2016 17:24:28 GMT
I definitely have direct stream jobs that use window() without
problems... Can you post a minimal code example that reproduces the
problem?

Using print() will confuse the issue, since print() only takes the first
few records and so will typically compute just the first partition.

Use foreachRDD { rdd => rdd.foreach(println) }

or something comparable
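
For example, a minimal sketch in Java against the Spark 1.6 / Kafka 0.8
direct stream API. The app name, master, broker address, and topic name
are illustrative assumptions, not taken from this thread:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    // Hypothetical local setup with a 5-second batch interval.
    SparkConf conf = new SparkConf().setAppName("window-repro").setMaster("local[2]");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

    HashMap<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "localhost:9092");

    JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, new HashSet<>(Arrays.asList("events")));

    // Overlapping window (10s length, 5s slide). foreachRDD + foreach
    // touches every partition, unlike print(), which stops after the
    // first few records.
    stream.window(Durations.seconds(10), Durations.seconds(5))
          .foreachRDD(rdd -> rdd.foreach(record -> System.out.println(record)));

    jssc.start();
    jssc.awaitTermination();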

On Tue, Mar 22, 2016 at 10:14 AM, Martin Soch <Martin.Soch@oracle.com> wrote:
> Hi all,
>
> I am using the direct Kafka input stream in my Spark app. When I use the
> window(...) function in the chain, the processing pipeline stops: when I
> open the Spark UI I can see that the streaming batches are being queued
> while the pipeline reports that it is still processing one of the first
> batches.
>
> To be more precise: the issue happens only when the windows overlap (i.e.
> when sliding_interval < window_length). Otherwise the system behaves as
> expected.
>
> Derivatives of the window(..) function - like reduceByKeyAndWindow(..),
> etc. - also work as expected: the pipeline doesn't stop. The same applies
> when using a different type of stream.
>
> Is this a known limitation of the window(..) function when used with the
> direct Kafka input stream?
>
> Java pseudo code:
> JavaPairInputDStream<String, String> s = KafkaUtils.createDirectStream(...);  // direct Kafka stream
> s.window(Durations.seconds(10)).print();  // the pipeline will stop
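>
> A sketch of the two cases (illustrative; assuming a 5-second batch
> interval, the one-argument window above slides every batch):
>
> s.window(Durations.seconds(10), Durations.seconds(10)).print();  // slide == length: works
> s.window(Durations.seconds(10), Durations.seconds(5)).print();   // slide < length: batches queue up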
>
> Thanks
> Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

