spark-user mailing list archives

From Tsai Li Ming <mailingl...@ltsai.com>
Subject Re: Configuring shuffle write directory
Date Fri, 28 Mar 2014 06:29:52 GMT

Hi,

Thanks! I found out that I wasn’t setting SPARK_JAVA_OPTS correctly.

I took a look at the process table and saw that the “org.apache.spark.executor.CoarseGrainedExecutorBackend” process didn’t have -Dspark.local.dir set.
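For anyone hitting the same issue, one quick way to confirm whether an executor JVM actually received the flag is to grep its command line out of the process table. This is a minimal sketch; the sample command line below is hypothetical (the thread doesn't show one), and on a live worker you would pipe `ps` output through the same grep instead:

```shell
# Hypothetical executor command line, as it might appear in `ps` output.
sample='java -cp /opt/spark/lib/* -Dspark.local.dir=/data/spark-tmp org.apache.spark.executor.CoarseGrainedExecutorBackend'

# Extract the spark.local.dir setting, if present; prints nothing when the flag is missing.
echo "$sample" | grep -o 'spark.local.dir=[^ ]*'
# prints: spark.local.dir=/data/spark-tmp

# On a live worker, the equivalent check would be:
#   ps -ef | grep CoarseGrainedExecutorBackend | grep -o 'spark.local.dir=[^ ]*'
```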




On 28 Mar, 2014, at 1:05 pm, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> I see, are you sure that was in spark-env.sh instead of spark-env.sh.template? You need to copy it to just a .sh file. Also make sure the file is executable.
> 
> Try doing println(sc.getConf.toDebugString) in your driver program and seeing what properties it prints. As far as I can tell, spark.local.dir should *not* be set there, so workers should get it from their spark-env.sh. It’s true that if you set spark.local.dir in the driver it would pass that on to the workers for that job.
> 
> Matei
> 
> On Mar 27, 2014, at 9:57 PM, Tsai Li Ming <mailinglist@ltsai.com> wrote:
> 
>> Yes, I have tried that by adding it to the Worker. I can see the "app-20140328124540-000" in the local spark directory of the worker.
>> 
>> But the “spark-local” directories are always written to /tmp. Is this because the default spark.local.dir is taken from java.io.tmpdir?
>> 
>> 
>> 
>> On 28 Mar, 2014, at 12:42 pm, Matei Zaharia <matei.zaharia@gmail.com> wrote:
>> 
>>> Yes, the problem is that the driver program is overriding it. Have you set it manually in the driver? Or how did you try setting it in workers? You should set it by adding
>>> 
>>> export SPARK_JAVA_OPTS="-Dspark.local.dir=whatever"
>>> 
>>> to conf/spark-env.sh on those workers.
>>> 
>>> Matei
>>> 
>>> On Mar 27, 2014, at 9:04 PM, Tsai Li Ming <mailinglist@ltsai.com> wrote:
>>> 
>>>> Can anyone help?
>>>> 
>>>> How can I configure a different spark.local.dir for each executor?
>>>> 
>>>> 
>>>> On 23 Mar, 2014, at 12:11 am, Tsai Li Ming <mailinglist@ltsai.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Each of my worker nodes has its own unique spark.local.dir.
>>>>> 
>>>>> However, when I run spark-shell, the shuffle writes are always written to /tmp despite spark.local.dir being set when the worker node is started.
>>>>> 
>>>>> Specifying spark.local.dir in the driver program seems to override the executors' setting. Is there a way to define it properly on the worker node?
>>>>> 
>>>>> Thanks!
>>>> 
>>> 
>> 
> 
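Summing up the fix Matei describes, the per-worker setup can be sketched as the following steps (the local directory path is a placeholder, not taken from the thread):

```shell
# On each worker, from the Spark install directory:
# 1. Copy the template to a real spark-env.sh and make it executable.
cp conf/spark-env.sh.template conf/spark-env.sh
chmod +x conf/spark-env.sh

# 2. Add the JVM option so executors pick up the local dir
#    (/data/spark-tmp is a placeholder path).
echo 'export SPARK_JAVA_OPTS="-Dspark.local.dir=/data/spark-tmp"' >> conf/spark-env.sh
```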

