spark-user mailing list archives

From edward cui <edwardcu...@gmail.com>
Subject Re: number of executors
Date Mon, 18 May 2015 16:55:16 GMT
Oh, BTW, it's Spark 1.3.1 on Hadoop 2.4, EMR AMI 3.6.

Sorry for leaving out this information.

I'd appreciate any help!

Ed

2015-05-18 12:53 GMT-04:00 edward cui <edwardcuims@gmail.com>:

> I actually have the same problem, but I am not sure whether it is a Spark
> problem or a YARN problem.
>
> I set up a five-node cluster on AWS EMR and started the YARN daemon on the
> master as well (by default the NodeManager is not started on the master, and
> I don't want to waste any resources since I am paying for them). Then I
> submitted the Spark job in yarn-cluster mode. The command is:
> ./spark/bin/spark-submit --master yarn-cluster --num-executors 5
> --executor-cores 4 --properties-file spark-application.conf myapp.py
>
> But the YARN ResourceManager created only 4 containers on 4 nodes, and
> one node was completely idle.
>
> More details about the setup:
> EMR nodes:
> m3.xlarge: 16 GB RAM, 4 cores, 40 GB SSD (HDFS on EBS?)
>
> yarn-site.xml:
> yarn.scheduler.maximum-allocation-mb=11520
> yarn.nodemanager.resource.memory-mb=11520
>
> Spark conf:
>
> spark.executor.memory           10g
> spark.storage.memoryFraction    0.2
> spark.python.worker.memory      1500m
> spark.akka.frameSize            200
> spark.shuffle.memoryFraction    0.1
> spark.driver.memory             10g
>
>
> Observed Hadoop behavior:
> YARN created 4 containers on four nodes, including the EMR master, while one
> EMR slave stayed idle (memory usage around 2 GB and 0% CPU).
> Spark used one container on an EMR slave node for the driver (which makes
> sense, since I requested that much memory) and used the other three nodes
> for computing the tasks.
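A rough sizing sketch of why fewer containers appeared than requested. It assumes Spark 1.3's default YARN memory overhead of max(384 MB, 7% of the heap) for both the executor and the driver, and assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb (1024 MB is the Hadoop default; the value actually configured on this EMR AMI is not stated in the thread). The helpers container_mb and containers_per_node are hypothetical names for illustration only.

```python
import math

# Assumed Spark 1.3 defaults for spark.yarn.executor.memoryOverhead and
# spark.yarn.driver.memoryOverhead: max(384 MB, 7% of the requested heap).
OVERHEAD_FRACTION = 0.07
OVERHEAD_MIN_MB = 384


def container_mb(heap_mb: int, min_allocation_mb: int = 1024) -> int:
    """Memory YARN would reserve for one Spark JVM with `heap_mb` of heap.

    Adds the assumed overhead, then rounds up to a multiple of the assumed
    yarn.scheduler.minimum-allocation-mb, the way the CapacityScheduler
    normalizes container requests.
    """
    overhead = max(OVERHEAD_MIN_MB, int(heap_mb * OVERHEAD_FRACTION))
    return math.ceil((heap_mb + overhead) / min_allocation_mb) * min_allocation_mb


def containers_per_node(node_mb: int, heap_mb: int) -> int:
    """Number of such containers that fit in one NodeManager's memory."""
    return node_mb // container_mb(heap_mb)


NODE_MB = 11520       # yarn.nodemanager.resource.memory-mb from the thread
HEAP_MB = 10 * 1024   # spark.executor.memory and spark.driver.memory = 10g

print(container_mb(HEAP_MB))                  # 11264
print(containers_per_node(NODE_MB, HEAP_MB))  # 1
```

Under these assumptions each 10g JVM needs an 11264 MB container, which is below yarn.scheduler.maximum-allocation-mb (11520) but leaves room for only one container per NodeManager. In yarn-cluster mode the driver occupies one of those slots, which is consistent with the driver plus three executors described above. Smaller executors would fit more per node: for example, containers_per_node(11520, 4608) gives 2.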
>
>
> If YARN can't use all the nodes while I still have to pay for them, it's
> just a big waste. :p
>
>
> Any thoughts on this?
>
>
> Many thanks,
>
> Ed
>
>
>
> 2015-05-18 12:07 GMT-04:00 Sandy Ryza <sandy.ryza@cloudera.com>:
>
> *All
>>
>> On Mon, May 18, 2015 at 9:07 AM, Sandy Ryza <sandy.ryza@cloudera.com>
>> wrote:
>>
>>> Hi Xiaohe,
>>>
>>> All Spark options must go before the jar, or they won't take effect.
>>>
>>> -Sandy
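A simplified model of the ordering rule Sandy describes, written as a hypothetical Python helper (split_submit_args is an illustrative name, not part of Spark). spark-submit treats the first non-option token (the application jar or .py file) as the primary resource, and every token after it is passed to the application rather than parsed by spark-submit. The sketch assumes each option takes exactly one value, which holds for the options used in this thread but not for flags such as --verbose. Applied to Xiaohe's command, --class, --num-executors and --executor-cores become application arguments, so YARN mode falls back to its default of 2 executors, matching the two executors seen in the Spark UI. The corrected order is: spark-submit --master yarn --class scala.SimpleApp --num-executors 5 --executor-cores 4 target/scala-2.10/simple-project_2.10-1.0.jar

```python
import shlex
from typing import Dict, List, Tuple


def split_submit_args(command: str) -> Tuple[Dict[str, str], str, List[str]]:
    """Split a spark-submit command line by the "options before the jar" rule.

    Simplified, hypothetical model: every "--" token before the primary
    resource is an option that takes exactly one value (true for --master,
    --class, --num-executors and --executor-cores, not for flags like
    --verbose). The first other token is the primary resource; everything
    after it is passed to the application instead of to spark-submit.
    """
    tokens = shlex.split(command)
    if tokens and tokens[0].endswith("spark-submit"):
        tokens = tokens[1:]  # drop the launcher itself
    options: Dict[str, str] = {}
    i = 0
    while i < len(tokens) and tokens[i].startswith("--"):
        if i + 1 >= len(tokens):
            raise ValueError(f"option {tokens[i]} is missing its value")
        options[tokens[i]] = tokens[i + 1]
        i += 2
    if i >= len(tokens):
        raise ValueError("no primary resource (application jar or .py) found")
    return options, tokens[i], tokens[i + 1:]


JAR = "target/scala-2.10/simple-project_2.10-1.0.jar"

# Ordering from the original question: Spark options placed after the jar.
bad = f"spark-submit --master yarn {JAR} --class scala.SimpleApp --num-executors 5 --executor-cores 4"
# Corrected ordering: every Spark option before the jar.
good = f"spark-submit --master yarn --class scala.SimpleApp --num-executors 5 --executor-cores 4 {JAR}"

for cmd in (bad, good):
    opts, resource, app_args = split_submit_args(cmd)
    print("spark-submit options:", opts)
    print("application args:   ", app_args)
```

With the original ordering only --master reaches spark-submit; with the corrected ordering all four options do, and the application receives no arguments.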
>>>
>>> On Sun, May 17, 2015 at 8:59 AM, xiaohe lan <zombiexcoder@gmail.com>
>>> wrote:
>>>
>>>> Sorry, both executors are actually assigned tasks.
>>>>
>>>> Aggregated Metrics by Executor
>>>> Executor ID | Address     | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Input Size / Records | Shuffle Write Size / Records | Shuffle Spill (Memory) | Shuffle Spill (Disk)
>>>> 1           | host1:6184  | 1.7 min   | 5           | 0            | 5               | 640.0 MB / 12318400  | 382.3 MB / 12100770          | 1630.4 MB              | 295.4 MB
>>>> 2           | host2:62072 | 1.7 min   | 5           | 0            | 5               | 640.0 MB / 12014510  | 386.0 MB / 10926912          | 1646.6 MB              | 304.8 MB
>>>>
>>>> On Sun, May 17, 2015 at 11:50 PM, xiaohe lan <zombiexcoder@gmail.com>
>>>> wrote:
>>>>
>>>>> bash-4.1$ ps aux | grep SparkSubmit
>>>>> xilan     1704 13.2  1.2 5275520 380244 pts/0  Sl+  08:39   0:13
>>>>> /scratch/xilan/jdk1.8.0_45/bin/java -cp
>>>>> /scratch/xilan/spark/conf:/scratch/xilan/spark/lib/spark-assembly-1.3.1-hadoop2.4.0.jar:/scratch/xilan/spark/lib/datanucleus-core-3.2.10.jar:/scratch/xilan/spark/lib/datanucleus-api-jdo-3.2.6.jar:/scratch/xilan/spark/lib/datanucleus-rdbms-3.2.9.jar:/scratch/xilan/hadoop/etc/hadoop
>>>>> -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --master yarn
>>>>> target/scala-2.10/simple-project_2.10-1.0.jar --class scala.SimpleApp
>>>>> --num-executors 5 --executor-cores 4
>>>>> xilan     1949  0.0  0.0 103292   800 pts/1    S+   08:40   0:00 grep
>>>>> --color SparkSubmit
>>>>>
>>>>>
>>>>> When I look at the Spark UI, I see the following:
>>>>> Aggregated Metrics by Executor
>>>>> Executor ID | Address     | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Read Size / Records
>>>>> 1           | host1:30483 | 6 s       | 1           | 0            | 1               | 127.1 MB / 2808978
>>>>> 2           | host2:4997  | 0 ms      | 0           | 0            | 0               | 63.4 MB / 1810945
>>>>>
>>>>> So executor 2 is not even assigned a task? Maybe something is wrong in
>>>>> my settings, but I don't know which settings I might have set wrong or
>>>>> left unset.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Xiaohe
>>>>>
>>>>> On Sun, May 17, 2015 at 11:16 PM, Akhil Das <
>>>>> akhil@sigmoidanalytics.com> wrote:
>>>>>
>>>>>> Did you try the --executor-cores parameter? While the job is being
>>>>>> submitted, run ps aux | grep spark-submit and check the exact command
>>>>>> parameters.
>>>>>>
>>>>>> Thanks
>>>>>> Best Regards
>>>>>>
>>>>>> On Sat, May 16, 2015 at 12:31 PM, xiaohe lan <zombiexcoder@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a 5-node YARN cluster, and I used spark-submit to submit a
>>>>>>> simple app.
>>>>>>>
>>>>>>>  spark-submit --master yarn
>>>>>>> target/scala-2.10/simple-project_2.10-1.0.jar --class scala.SimpleApp
>>>>>>> --num-executors 5
>>>>>>>
>>>>>>> I set the number of executors to 5, but in the Spark UI I could see
>>>>>>> only two executors, and the job ran very slowly. What did I miss?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Xiaohe
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
