spark-user mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: spark streaming kafka best practices?
Date Wed, 17 Dec 2014 18:08:45 GMT
Using foreach is slightly more efficient because Spark doesn't bother trying
to collect results from each task, since it's understood there is no return
value. I think the difference is very marginal, though; it's mostly
stylistic. Typically you use foreach for something that is intended to
produce a side effect and map for something that will return a new
dataset.
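A minimal sketch of the difference, assuming Spark 2.x or later in local mode (the longAccumulator API used here did not exist in the 1.x releases current when this thread was written). The process handler and the sample events are hypothetical stand-ins for the code in the quoted question. foreach runs the side effect and returns nothing to the driver, while map is lazy, needs an action such as collect() to run at all, and then ships one value per element back.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ForeachVsMap {
  // Returns (events handled via foreach, number of values collect() shipped back).
  def run(): (Long, Int) = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("foreach-vs-map"))
    try {
      val events = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)
      // Hypothetical side-effecting handler standing in for process(event).
      val process: String => Unit = _ => ()

      // foreach: an action returning Unit, so no per-element results reach the driver.
      // The accumulator (Spark 2.x+ API) only counts how many events were handled.
      val handled = sc.longAccumulator("handled")
      events.foreach { e => process(e); handled.add(1L) }

      // map + collect: map alone does nothing until an action runs; collect()
      // then returns one (here Unit) value per event to the driver.
      val shipped: Array[Unit] = events.map(process).collect()

      (handled.sum, shipped.length)
    } finally {
      sc.stop()
    }
  }
}
```

Both variants execute the handler on the executors; the extra cost of the map version is the round trip of the (useless) Unit results, which is why the difference is described as marginal.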

On Wed, Dec 17, 2014 at 5:43 AM, Gerard Maas <gerard.maas@gmail.com> wrote:
> Patrick,
>
> I was wondering why one would choose rdd.map over rdd.foreach to execute a
> side-effecting function on an RDD.
>
> -kr, Gerard.
>
> On Sat, Dec 6, 2014 at 12:57 AM, Patrick Wendell <pwendell@gmail.com> wrote:
>>
>> The second choice is better. Once you call collect(), you are pulling
>> all of the data onto a single node, whereas you want to do most of the
>> processing in parallel on the cluster, which is what map() will do.
>> Ideally you'd try to summarize or reduce the data before calling
>> collect().
>>
>> On Fri, Dec 5, 2014 at 5:26 AM, david <david4it@free.fr> wrote:
>> > Hi,
>> >
>> >   What is the best way to process a batch window in Spark Streaming:
>> >
>> >     kafkaStream.foreachRDD(rdd => {
>> >       rdd.collect().foreach(event => {
>> >         // process the event
>> >         process(event)
>> >       })
>> >     })
>> >
>> >
>> > Or
>> >
>> >     kafkaStream.foreachRDD(rdd => {
>> >       rdd.map(event => {
>> >         // process the event
>> >         process(event)
>> >       }).collect()
>> >     })
>> >
>> >
>> > Thanks
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafa-best-practices-tp20470.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> > For additional commands, e-mail: user-help@spark.apache.org
>> >
>>
>
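The advice above (keep the processing on the cluster and only collect() a summary) can be sketched as a per-batch handler, assuming Spark 2.x or later. The names BatchHandling, handleBatch and demo, the event strings, and the idea that the Kafka stream has already been mapped to strings are all hypothetical; the intent is to call it as kafkaStream.foreachRDD(rdd => handleBatch(rdd, process)). Using foreachPartition rather than foreach is a swapped-in choice, following the foreachRDD design patterns in the Spark Streaming guide, so that a per-partition resource such as a connection could be opened once per task.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BatchHandling {
  // Hypothetical handler for one batch of events.
  def handleBatch(rdd: RDD[String], process: String => Unit): Map[String, Long] = {
    // 1. Side effects run on the executors, one task per partition.
    //    A per-partition resource (e.g. a connection) could be opened here.
    rdd.foreachPartition { events => events.foreach(process) }

    // 2. Reduce before collecting: only per-event-type counts reach the driver.
    rdd.map(e => (e, 1L)).reduceByKey(_ + _).collect().toMap
  }

  // Local-mode demo with a no-op handler and made-up events.
  def demo(): Map[String, Long] = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("batch-handling"))
    try {
      val batch = sc.parallelize(Seq("click", "view", "click"), numSlices = 2)
      handleBatch(batch, _ => ())
    } finally {
      sc.stop()
    }
  }
}
```

Because the batch RDD is traversed twice in this sketch, calling rdd.cache() before the two actions may avoid recomputing it; for a Kafka-backed batch, recomputation may mean fetching the data again.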
