spark-user mailing list archives

From Ali Gouta <ali.go...@gmail.com>
Subject Re: Spark Streaming - print accumulators value every period as logs
Date Fri, 25 Dec 2015 08:47:53 GMT
Something like

stream.foreachRDD { rdd =>
  rdd.foreach(record => accum += 1L)  // accumulator updates run on the executors
  println(accum.value)                // this runs on the driver, once per batch
}

Should answer your question: the body of foreachRDD runs on the driver once per batch interval, so the value gets printed every period.
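
A fuller sketch of the same pattern (using the Spark 1.x API current as of this thread; the socket source, batch interval, master, and names are illustrative assumptions, not taken from your pipeline), which also resets the counter every period as you ask:

import org.apache.log4j.Logger
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BatchCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("batch-counts").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))
    val log  = Logger.getLogger("batch-counts")

    // One accumulator, created on the driver, updated on the executors.
    val accum = ssc.sparkContext.accumulator(0L, "records this batch")

    val stream = ssc.socketTextStream("localhost", 9999)  // any DStream works here
    stream.foreachRDD { rdd =>
      rdd.foreach(_ => accum += 1L)                    // executor-side updates
      log.info(s"records this batch: ${accum.value}")  // driver-side, each batch
      accum.setValue(0L)                               // reset so each period counts alone
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Reading accum.value outside foreachRDD, by contrast, happens only once, while the DStream graph is being built and before any data has arrived, which is why it showed nothing in your streaming driver.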

Ali Gouta
On 25 Dec 2015 at 04:22, "Roberto Coluccio" <roberto.coluccio@gmail.com>
wrote:

> Hello,
>
> I have a batch driver and a streaming driver using the same functions
> (Scala). I use accumulators (passed to the functions' constructors) to
> count stuff.
>
> In the batch driver, doing so at the right point in the pipeline, I'm able
> to retrieve the accumulator value and print it as a log4j log.
>
> In the streaming driver, doing the same yields nothing. That's probably
> because accumulators in the streaming driver are created empty, and the
> code that prints them is executed only once, on the driver (while they
> are still empty), when the StreamingContext is started and the DAG is
> created.
>
> I'm looking for a way to log, at every batch period of my Spark Streaming
> driver, the current value of my accumulators. Indeed, I wish to reset
> these accumulators at each period so as to have only the counts related
> to that period.
>
> Any advice would be really appreciated.
>
> Thanks,
> Roberto
>
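
One caveat on the pattern above: Spark only guarantees that accumulator updates made inside actions (foreach, in the sketch) are applied exactly once; updates made inside transformations such as map or filter can be re-applied when a task is retried or runs speculatively. If your shared functions update the accumulators from transformations, the per-period counts may overcount. A minimal contrast, reusing the hypothetical names from the sketch:

  // Transformation: updates may be re-applied on task retry or speculation.
  val tagged = rdd.map { record => accum += 1L; record }

  // Action: each task's update is applied exactly once per successful job.
  rdd.foreach(record => accum += 1L)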
