spark-user mailing list archives

From Aniket Bhatnagar <>
Subject Re: Execute function once on each node
Date Mon, 18 Jul 2016 22:43:34 GMT
You can't assume that the number of nodes will be constant, as some may
fail, hence you can't guarantee that a function will execute at most once
or at least once on a node. Can you explain your use case in a bit more
detail?
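
Since neither guarantee is available from the scheduler, one possible
workaround for the "at most once" half is to make the function itself
idempotent per host, so that extra tasks landing on the same node become
no-ops. Below is a minimal sketch in Python, assuming PySpark and a local
temp directory that all tasks on one host share. The names
run_once_per_host, job_id, and marker_dir are hypothetical and are not part
of Spark. It does not address the "at least once" half: a node that never
receives a task will still never run the function.

```python
import os
import socket
import tempfile


def run_once_per_host(job_id, action, marker_dir=None):
    """Run action() only if no earlier task on this host claimed job_id.

    Returns (hostname, ran), where ran is True only for the task that
    actually executed action(). Hypothetical helper, not a Spark API.
    """
    marker_dir = marker_dir or tempfile.gettempdir()
    host = socket.gethostname()
    marker = os.path.join(marker_dir, "once-%s-%s.marker" % (job_id, host))
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one caller on
        # this host succeeds, every later caller gets FileExistsError.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return host, False
    os.close(fd)
    # Limitation: if action() raises, the marker stays behind and the
    # work is not retried on this host unless the marker is removed.
    action()
    return host, True
```

From a driver this could be called inside mapPartitions over an RDD with
more partitions than nodes, for example
sc.parallelize(range(p), p).mapPartitions(lambda _: [run_once_per_host("load", read_local_dir)]).collect(),
where p and read_local_dir are placeholders. The collected (hostname, ran)
pairs show which hosts actually ran the function, so missing nodes can at
least be detected.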

On Mon, Jul 18, 2016, 10:57 PM joshuata <> wrote:

> I am working on a Spark application that requires the ability to run a
> function on each node in the cluster. This is used to read data from a
> directory that is not globally accessible to the cluster. I have tried
> creating an RDD with n elements and n partitions so that it is evenly
> distributed among the n nodes, and then mapping a function over the RDD.
> However, the runtime makes no guarantees that each partition will be stored
> on a separate node. This means that the code will run multiple times on the
> same node while never running on another.
> I have looked through the documentation and source code for both RDDs and
> the scheduler, but I haven't found anything that will do what I need. Does
> anybody know of a solution I could use?
