spark-user mailing list archives

From Shushant Arora <shushantaror...@gmail.com>
Subject Re: writing to kafka using spark streaming
Date Mon, 06 Jul 2015 19:09:29 GMT
What's the difference between foreachPartition and mapPartitions for a
DStream? Both work at partition granularity.

One is a transformation and the other is an action, but if I also call an
operation after mapPartitions, which one is more efficient and recommended?

On Tue, Jul 7, 2015 at 12:21 AM, Tathagata Das <tdas@databricks.com> wrote:

> Yeah, creating a new producer at the granularity of partitions may not be
> that costly.
>
> On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger <cody@koeninger.org> wrote:
>
>> Use foreachPartition, and allocate whatever the costly resource is once
>> per partition.
>>
>> On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora <shushantarora09@gmail.com
>> > wrote:
>>
>>> I have a requirement to write to a Kafka queue from a Spark Streaming
>>> application.
>>>
>>> I am using Spark Streaming 1.2. Since different executors are allocated
>>> on each run, instantiating a new Kafka producer on each run seems like a
>>> costly operation. Is there a way to reuse objects in the processing
>>> executors (not in the receivers)?
>>>
>>>
>>>
>>
>
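
For reference, the "allocate the costly resource once per partition" advice quoted above can be sketched roughly as follows. This is only an illustration in plain Python, not code from the thread: `FakeProducer`, `send_partition`, and the topic name are hypothetical, and a real job would pass in a factory that builds an actual Kafka producer.

```python
from typing import Callable, Iterable, List, Tuple


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer; it only records what it is sent."""

    instances_created = 0  # counts how often the "costly" resource is built

    def __init__(self) -> None:
        FakeProducer.instances_created += 1
        self.sent: List[Tuple[str, str]] = []
        self.closed = False

    def send(self, topic: str, value: str) -> None:
        self.sent.append((topic, value))

    def close(self) -> None:
        self.closed = True


def send_partition(
    records: Iterable[str],
    make_producer: Callable[[], FakeProducer],
    topic: str,
) -> int:
    """Create one producer, send every record of one partition, then close it."""
    producer = make_producer()  # built once per partition, not once per record
    count = 0
    try:
        for record in records:
            producer.send(topic, record)
            count += 1
    finally:
        producer.close()  # released even if a send fails
    return count


# Assumed wiring in a PySpark Streaming job (needs a running Spark context,
# so it is left as a comment):
#   stream.foreachRDD(
#       lambda rdd: rdd.foreachPartition(
#           lambda it: send_partition(it, FakeProducer, "events")))
```

On the foreachPartition vs mapPartitions question: mapPartitions is lazy and only runs once some action is applied to its result, whereas foreachPartition is itself an action, which is why it is the usual fit for side effects such as writing to an external system.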
