spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: writing to kafka using spark streaming
Date Mon, 06 Jul 2015 19:14:58 GMT
Both have the same efficiency. The primary difference is that one is a
transformation (hence lazy, and requires an action to actually
execute), and the other is an action.
But it is generally slightly better design to keep "transformations"
purely functional (that is, free of external side effects) and put all
side-effecting work in "actions" (e.g., saveAsHadoopFile is an action).
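
For concreteness, a minimal sketch of the distinction in Scala, where
"stream" is an assumed DStream[String] and send() is a placeholder for
the actual write:

    // mapPartitions is a transformation: by itself this schedules nothing,
    // so the side effect only runs once an output action forces it.
    val mapped = stream.mapPartitions { iter =>
      iter.map { msg => send(msg); msg }  // side effect hidden in a transformation
    }
    mapped.print()  // an action is still needed to trigger execution

    // foreachRDD + foreachPartition is an output action: the side effect
    // runs as part of the action itself, which is the cleaner design.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { iter =>
        iter.foreach(send)
      }
    }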


On Mon, Jul 6, 2015 at 12:09 PM, Shushant Arora <shushantarora09@gmail.com>
wrote:

> What's the difference between foreachPartition and mapPartitions for a
> DStream? Both work at partition granularity.
>
> One is a transformation and the other is an action, but if I call an
> action after mapPartitions as well, which one is more efficient and
> recommended?
>
> On Tue, Jul 7, 2015 at 12:21 AM, Tathagata Das <tdas@databricks.com>
> wrote:
>
>> Yeah, creating a new producer at the granularity of partitions may not be
>> that costly.
>>
>> On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger <cody@koeninger.org>
>> wrote:
>>
>>> Use foreachPartition, and allocate whatever the costly resource is once
>>> per partition (a sketch of this pattern follows the quoted thread).
>>>
>>> On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora <
>>> shushantarora09@gmail.com> wrote:
>>>
>>>> I have a requirement to write to a Kafka queue from a Spark Streaming
>>>> application.
>>>>
>>>> I am using Spark Streaming 1.2. Since Spark may allocate different
>>>> executors on each run, instantiating a new Kafka producer on every run
>>>> seems like a costly operation. Is there a way to reuse objects in the
>>>> processing executors (not in the receivers)?
>>>>
>>>>
>>>>
>>>
>>
>
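
A minimal sketch of the per-partition pattern Cody describes, assuming
the kafka-clients producer API, a DStream[String] named "stream", and
placeholder broker and topic names:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { iter =>
        val props = new Properties()
        props.put("bootstrap.servers", "broker:9092")  // placeholder broker
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        // One producer per partition per batch, rather than one per record.
        val producer = new KafkaProducer[String, String](props)
        try {
          iter.foreach(msg =>
            producer.send(new ProducerRecord[String, String]("out", msg)))
        } finally {
          producer.close()
        }
      }
    }

If even per-partition creation proves expensive, a producer held in a
lazily initialized singleton object on each executor JVM would be reused
across batches as well.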
