spark-user mailing list archives

From Yotto Koga <yotto.k...@autodesk.com>
Subject RE: configure to run multiple tasks on a core
Date Thu, 27 Nov 2014 03:47:16 GMT
Indeed. That's nice.

Thanks!

yotto
________________________________________
From: Matei Zaharia [matei.zaharia@gmail.com]
Sent: Wednesday, November 26, 2014 6:11 PM
To: Yotto Koga
Cc: Sean Owen; user@spark.apache.org
Subject: Re: configure to run multiple tasks on a core

Instead of SPARK_WORKER_INSTANCES, you can also set SPARK_WORKER_CORES to have one
worker that thinks it has more cores than the machine really does.

Matei

> On Nov 26, 2014, at 5:01 PM, Yotto Koga <yotto.koga@autodesk.com> wrote:
>
> Thanks Sean. That worked out well.
>
> For anyone who happens onto this post and wants to do the same, these are the steps I
> took to follow Sean's suggestion...
>
> (Note: this is for a standalone cluster.)
>
> login to the master
>
> ~/spark/sbin/stop-all.sh
>
> edit ~/spark/conf/spark-env.sh
>
> modify the line
> export SPARK_WORKER_INSTANCES=1
> changing the value to the number of worker instances you want per node (e.g. 2)
>
> I also added
> export SPARK_WORKER_MEMORY=<a reasonable value, e.g. 2g>
> so that the total memory used by all the workers on a node stays within the memory available on that node.
>
> ~/spark-ec2/copy-dir /root/spark/conf
>
> ~/spark/sbin/start-all.sh
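
One quick way to sanity-check the new topology after the restart (just a sketch, run
from spark-shell on the master; note that getExecutorMemoryStatus also counts the
driver's own block manager, and the exact numbers depend on your cluster):

    // Rough check that the extra workers/cores registered after start-all.sh.
    println(s"Default parallelism (roughly total cores seen): ${sc.defaultParallelism}")
    println(s"Registered block managers (executors + driver): ${sc.getExecutorMemoryStatus.size}")

The standalone master's web UI (port 8080 by default) shows the same per-worker cores
and memory.
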
>
>
> ________________________________________
> From: Sean Owen [sowen@cloudera.com]
> Sent: Wednesday, November 26, 2014 12:14 AM
> To: Yotto Koga
> Cc: user@spark.apache.org
> Subject: Re: configure to run multiple tasks on a core
>
> What about running, say, 2 executors per machine, each of which thinks
> it should use all cores?
>
> You can also multi-thread your map function directly within your code, with
> careful use of a java.util.concurrent.Executor.
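
For anyone who wants to try that second approach, here is a rough sketch of what it
could look like in Scala (spark-shell style). It is only an illustration: the toy RDD,
the thread count, and the /path/to/my_cpp_app command line are placeholders, not
anything from this thread. The idea is to handle each partition with a small
fixed-size pool so several invocations of the external app can overlap their
download/upload time.

    import java.util.concurrent.{Callable, Executors, TimeUnit}
    import scala.sys.process._

    // Toy input so the snippet runs as-is in spark-shell; replace with your real RDD.
    val inputs = sc.parallelize(Seq("job1", "job2", "job3", "job4"))

    val threadsPerTask = 4  // concurrent external-app invocations per Spark task

    val results = inputs.mapPartitions { items =>
      // One small thread pool per partition, created on the executor.
      val pool = Executors.newFixedThreadPool(threadsPerTask)
      try {
        // Submit one Callable per element; each shells out to the external app,
        // so several downloads/uploads can be in flight at once.
        val futures = items.map { item =>
          pool.submit(new Callable[String] {
            override def call(): String =
              Seq("/path/to/my_cpp_app", item).!!.trim  // placeholder command line
          })
        }.toList  // force all submissions before waiting on any result
        futures.map(_.get()).iterator
      } finally {
        pool.shutdown()
        pool.awaitTermination(1, TimeUnit.HOURS)
      }
    }

    results.collect().foreach(println)
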
>
> On Wed, Nov 26, 2014 at 6:57 AM, yotto <yotto.koga@autodesk.com> wrote:
>> I'm running a spark-ec2 cluster.
>>
>> I have a map task that calls a specialized C++ external app. The app doesn't
>> fully utilize the core as it needs to download/upload data as part of the
>> task. Looking at the worker nodes, it appears that there is one task with my
>> app running per core.
>>
>> I'd like to better utilize the cpu resources with the hope of increasing
>> throughput by running multiple tasks (with my app) per core in parallel.
>>
>> I see there is a spark.task.cpus config setting with a default value of 1.
>> It appears, though, that this goes in the opposite direction from what I am
>> looking for.
>>
>> Is there a way where I can specify multiple tasks per core rather than
>> multiple cores per task?
>>
>> thanks for any help.
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/configure-to-run-multiple-tasks-on-a-core-tp19834.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>



