Both have the same efficiency. The primary difference is that one is a transformation (hence lazy, and requires a subsequent action to actually execute), and the other is an action.
That said, it is generally better design to keep "transformations" purely functional (that is, no external side effects) and put all non-functional work in "actions" (e.g., saveAsHadoopFile is an action).
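Roughly, the distinction can be sketched like this (a sketch only: `stream` is assumed to be a `DStream[String]`, and `sendToKafka` is a hypothetical side-effecting helper):

```scala
// mapPartitions is a transformation: it is lazy and nothing executes
// until some action is called on the result.
val transformed = stream.mapPartitions { iter =>
  iter.map(record => record.toUpperCase) // purely functional transform
}
transformed.print() // an action is still required to trigger execution

// foreachPartition (via foreachRDD on a DStream) is an action:
// it runs for every batch and is the natural home for side effects.
stream.foreachRDD { rdd =>
  rdd.foreachPartition { iter =>
    iter.foreach(record => sendToKafka(record)) // hypothetical helper
  }
}
```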


On Mon, Jul 6, 2015 at 12:09 PM, Shushant Arora <shushantarora09@gmail.com> wrote:
What's the difference between foreachPartition and mapPartitions for a DStream? Both work at partition granularity.

One is a transformation and the other is an action, but if I call an action after mapPartitions anyway, which one is more efficient and recommended?

On Tue, Jul 7, 2015 at 12:21 AM, Tathagata Das <tdas@databricks.com> wrote:
Yeah, creating a new producer at the granularity of partitions may not be that costly.

On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger <cody@koeninger.org> wrote:
Use foreachPartition, and allocate whatever the costly resource is once per partition.
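A sketch of that pattern, assuming the Kafka 0.8-era producer API that was current with Spark 1.2 (broker address and topic name are placeholders):

```scala
import java.util.Properties
import kafka.producer.{KeyedMessage, ProducerConfig}
import kafka.javaapi.producer.Producer

stream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // Create the costly resource ONCE per partition, not once per record.
    val props = new Properties()
    props.put("metadata.broker.list", "broker1:9092") // placeholder broker
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    val producer = new Producer[String, String](new ProducerConfig(props))

    partition.foreach { record =>
      producer.send(new KeyedMessage[String, String]("my-topic", record))
    }
    producer.close() // release the resource when the partition is done
  }
}
```

The producer setup cost is then amortized over all records in the partition rather than paid per record.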

On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora <shushantarora09@gmail.com> wrote:
I have a requirement to write to a Kafka queue from a Spark Streaming application.

I am using Spark 1.2 streaming. Since Spark may allocate different executors on each run, instantiating a new Kafka producer on each run seems like a costly operation. Is there a way to reuse objects in the processing executors (not in the receivers)?