spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject Do I need to do .collect inside forEachRDD
Date Tue, 05 Dec 2017 20:35:44 GMT
Hi All,

I have a simple stateless transformation using DStreams (stuck with the old
API for one of the applications). The pseudocode is roughly like this:

dstream.map().reduce().foreachRDD(rdd -> {
    rdd.collect().forEach(...); // Is this necessary? It does execute fine, but a bit slow
});

I understand collect() brings the results back to the driver, but is that
necessary? Can I just do something like the code below? I believe I tried both,
and somehow the code below didn't output any results (it could be an issue with
my env, I am not entirely sure), but I would like some clarification on
.collect() since it seems to slow things down for me.

dstream.map().reduce().foreachRDD(rdd -> {
    rdd.foreach(x -> { ... }); // no collect() here
});
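
To make the two variants concrete, here is a fuller, self-contained sketch of
what I mean (the class name, the JavaDStream<Integer> called counts, and the
println action are just placeholders for illustration):

import org.apache.spark.streaming.api.java.JavaDStream;

public class ForeachRDDVariants {

    // Variant 1: collect() pulls the whole batch back to the driver and
    // iterates there. Works, but every element travels over the network,
    // so it gets slower as the batch grows.
    static void printViaCollect(JavaDStream<Integer> counts) {
        counts.foreachRDD(rdd -> {
            rdd.collect().forEach(v -> System.out.println(v));
        });
    }

    // Variant 2: foreach() runs on the executors, no driver round trip.
    // Note: System.out.println here writes to the executor stdout/logs,
    // not the driver console.
    static void printViaForeach(JavaDStream<Integer> counts) {
        counts.foreachRDD(rdd -> {
            rdd.foreach(v -> System.out.println(v));
        });
    }
}

My understanding (possibly wrong) is that in the second variant the println
happens on the executors, so nothing shows up on the driver console, which
might explain the missing output I saw; that is part of what I'd like
confirmed.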

Thanks!
