spark-user mailing list archives

From Ted Yu <>
Subject Re: Spark driver assigning splits to incorrect workers
Date Fri, 01 Jul 2016 22:03:44 GMT
I guess you extended some InputFormat to provide locality information.

Can you share a code snippet?

Which non-distributed file system are you using?
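
For reference, the kind of snippet being asked for usually looks like the sketch below: a split that extends Hadoop's `FileSplit` (from the `mapreduce` API), carries one extra field, and reports its preferred hosts via `getLocations()`. The class name and the extra field are illustrative, not from the thread; note that any custom field must also be serialized in `write`/`readFields`, or it will be lost when the split is shipped to an executor.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Illustrative subclass: one extra field beyond what FileSplit provides.
public class LocalFileSplit extends FileSplit {
    private String extraInfo = "";          // hypothetical extra metadata

    public LocalFileSplit() {}              // no-arg ctor needed for deserialization

    public LocalFileSplit(Path file, long start, long length,
                          String[] hosts, String extraInfo) {
        super(file, start, length, hosts);  // hosts back getLocations()
        this.extraInfo = extraInfo;
    }

    @Override
    public String[] getLocations() throws IOException {
        // FileSplit already returns the hosts passed to the constructor.
        // These strings must match how Spark identifies its executors
        // (hostname vs. IP), or locality matching will silently fail.
        return super.getLocations();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        super.write(out);
        Text.writeString(out, extraInfo);   // serialize the custom field too
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        super.readFields(in);
        extraInfo = Text.readString(in);
    }
}
```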


On Fri, Jul 1, 2016 at 2:46 PM, Raajen <> wrote:

> I would like to use Spark on a non-distributed file system but am having
> trouble getting the driver to assign tasks to the workers that are local
> to the files. I have extended InputSplit to create my own version of
> FileSplit, so that each worker gets a bit more information than the
> default FileSplit provides. I thought that the driver would assign splits
> based on their locality, but I have found that the driver sends these
> splits to workers seemingly at random -- even the very first split will
> go to a node with a different IP than the split specifies. I can see that
> I am providing the right node address via getLocations(). I also set
> spark.locality.wait to a high value, but the same misassignment keeps
> happening.
>
> I am using newAPIHadoopFile to create my RDD. My InputFormat is creating
> the required splits, but not all splits refer to the same file or the
> same worker IP.
>
> What else can I check, or change, to force the driver to send these
> tasks to the right workers?
> Thanks!
> --
> Sent from the Apache Spark User List mailing list archive at
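
The behavior the question expects can be modeled outside Spark: a locality-aware scheduler should place each split on a worker from that split's preferred-location list whenever such a worker exists, and fall back to an arbitrary worker only when none does. The toy sketch below (hostnames and the greedy rule are simplified assumptions, not Spark's actual scheduler) shows that expectation concretely.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LocalityDemo {
    // Greedy model: assign each split to the first preferred host that is
    // actually a live worker; otherwise fall back to any worker.
    static Map<String, String> assign(Map<String, List<String>> splits,
                                      Set<String> workers) {
        Map<String, String> placement = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : splits.entrySet()) {
            String chosen = e.getValue().stream()
                    .filter(workers::contains)
                    .findFirst()
                    .orElse(workers.iterator().next()); // non-local fallback
            placement.put(e.getKey(), chosen);
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, List<String>> splits = new LinkedHashMap<>();
        splits.put("split-0", List.of("node-a"));
        splits.put("split-1", List.of("node-b"));
        splits.put("split-2", List.of("node-z")); // no matching worker
        Set<String> workers = new LinkedHashSet<>(List.of("node-a", "node-b"));

        // split-0 and split-1 land on their preferred hosts; split-2 has
        // no local worker, so any placement is acceptable for it.
        System.out.println(assign(splits, workers));
    }
}
```

If real assignments diverge from this model even for splits whose preferred host is a registered worker, the usual suspect is a mismatch between the strings returned by `getLocations()` and the names the executors registered under.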
