spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: configure to run multiple tasks on a core
Date Thu, 27 Nov 2014 02:11:48 GMT
Instead of SPARK_WORKER_INSTANCES, you can also set SPARK_WORKER_CORES to have one worker that
thinks it has more cores.
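As a concrete illustration of Matei's suggestion, a spark-env.sh fragment might look like this (the 2x figure is purely illustrative; pick a multiple that matches how long the external app blocks on I/O):

```shell
# spark-env.sh on the master (illustrative values): advertise twice the
# physical core count so the standalone scheduler will run two tasks
# per real core on this node.
export SPARK_WORKER_CORES=16   # e.g. on an 8-core node
```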

Matei

> On Nov 26, 2014, at 5:01 PM, Yotto Koga <yotto.koga@autodesk.com> wrote:
> 
> Thanks Sean. That worked out well.
> 
> For anyone who happens onto this post and wants to do the same, these are the steps I took to do what Sean suggested...
> 
> (Note this is for a stand alone cluster)
> 
> login to the master
> 
> ~/spark/sbin/stop-all.sh
> 
> edit ~/spark/conf/spark-env.sh
> 
> modify the line
> export SPARK_WORKER_INSTANCES=1
> to the multiple you want (e.g. 2)
> 
> I also added
> export SPARK_WORKER_MEMORY=some reasonable value, so that the combined memory of all workers on a node stays within the memory available on the node (e.g. 2g)
> 
> ~/spark-ec2/copy-dir /root/spark/conf
> 
> ~/spark/sbin/start-all.sh
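Put together, the steps above can be sketched as the following session on the master (paths assume the standard spark-ec2 layout from the post; the 2/2g values are illustrative):

```shell
# Run on the master of a standalone spark-ec2 cluster.
~/spark/sbin/stop-all.sh

# Two workers per node, each capped at 2g (illustrative values; make
# sure instances x memory fits within the node's physical RAM).
echo 'export SPARK_WORKER_INSTANCES=2' >> ~/spark/conf/spark-env.sh
echo 'export SPARK_WORKER_MEMORY=2g'   >> ~/spark/conf/spark-env.sh

# Push the edited config to every slave, then restart the cluster.
~/spark-ec2/copy-dir /root/spark/conf
~/spark/sbin/start-all.sh
```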
> 
> 
> ________________________________________
> From: Sean Owen [sowen@cloudera.com]
> Sent: Wednesday, November 26, 2014 12:14 AM
> To: Yotto Koga
> Cc: user@spark.apache.org
> Subject: Re: configure to run multiple tasks on a core
> 
> What about running, say, 2 executors per machine, each of which thinks
> it should use all cores?
> 
> You can also multi-thread your map function manually, directly, within
> your code, with careful use of a java.util.concurrent.Executor.
> 
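Sean's java.util.concurrent suggestion might look roughly like the sketch below. This is a standalone illustration, not the actual map function: the class name, the `process` helper, and the `Thread.sleep` standing in for the download/upload latency of the external C++ app are all assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: run several copies of an I/O-bound operation concurrently
// inside a single task, so the CPU is not idle while one call waits.
public class ParallelMap {
    static int process(int x) throws InterruptedException {
        Thread.sleep(50);          // stand-in for download/upload latency
        return x * 2;              // stand-in for the real computation
    }

    public static void main(String[] args) throws Exception {
        List<Integer> inputs = List.of(1, 2, 3, 4);
        // The thread count is a manual tuning knob, not Spark-managed:
        // pick enough threads to cover the time spent waiting on I/O.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (int x : inputs) {
            jobs.add(() -> process(x));
        }
        int sum = 0;
        for (Future<Integer> f : pool.invokeAll(jobs)) {
            sum += f.get();        // invokeAll preserves input order
        }
        pool.shutdown();
        System.out.println(sum);   // 2 + 4 + 6 + 8 = 20
    }
}
```

In a real job, this pattern would typically live inside a mapPartitions-style function, with the pool shut down before the function returns.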
> On Wed, Nov 26, 2014 at 6:57 AM, yotto <yotto.koga@autodesk.com> wrote:
>> I'm running a spark-ec2 cluster.
>> 
>> I have a map task that calls a specialized C++ external app. The app doesn't
>> fully utilize the core as it needs to download/upload data as part of the
>> task. Looking at the worker nodes, it appears that there is one task with my
>> app running per core.
>> 
>> I'd like to better utilize the cpu resources with the hope of increasing
>> throughput by running multiple tasks (with my app) per core in parallel.
>> 
>> I see there is a spark.task.cpus config setting with a default value of 1.
>> It appears though that this is used to go the other way than what I am
>> looking for.
>> 
>> Is there a way where I can specify multiple tasks per core rather than
>> multiple cores per task?
>> 
>> thanks for any help.
>> 
>> 
>> 
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/configure-to-run-multiple-tasks-on-a-core-tp19834.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>> 
> 
> 



