spark-user mailing list archives

From akshay naidu <akshaynaid...@gmail.com>
Subject Re: Run Multiple Spark jobs. Reduce Execution time.
Date Wed, 14 Feb 2018 10:44:41 GMT
**********************************************************************************************************************
yarn-site.xml


<property>
<name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
<value>0.8</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>3584</value>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>10752</value>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10752</value>
</property>

******************************************************************************************************************************
spark-defaults.conf

spark.master                       yarn
spark.driver.memory                9g
spark.executor.memory              1024m
spark.yarn.executor.memoryOverhead 1024m
spark.eventLog.enabled  true
spark.eventLog.dir hdfs://tech-master:54310/spark-logs

spark.history.provider            org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory     hdfs://tech-master:54310/spark-logs
spark.history.fs.update.interval  10s
spark.history.ui.port             18080

spark.ui.enabled                true
spark.ui.port                   4040
spark.ui.killEnabled            true
spark.ui.retainedDeadExecutors  100

spark.scheduler.mode            FAIR
spark.scheduler.allocation.file /usr/local/spark/current/conf/fairscheduler.xml

#spark.submit.deployMode         cluster
spark.default.parallelism        30

SPARK_WORKER_MEMORY 10g
SPARK_WORKER_INSTANCES 1
SPARK_WORKER_CORES 5

SPARK_DRIVER_MEMORY 9g
SPARK_DRIVER_CORES 5

SPARK_MASTER_IP Tech-master
SPARK_MASTER_PORT 7077
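
For reference, here is a rough sketch of how the memory settings above translate into YARN containers. It assumes that YARN rounds each request up to an increment (1024 MB is assumed here) and then raises it to yarn.scheduler.minimum-allocation-mb, and that only the 6 slaves run a NodeManager. The helper normalize_mb and the constant names are made up for illustration; the exact rounding rule depends on which scheduler is in use, so treat the numbers as an estimate.

```python
import math


def normalize_mb(request_mb, minimum_mb, increment_mb):
    """Round a request up to the increment, then apply the scheduler minimum."""
    rounded = math.ceil(request_mb / increment_mb) * increment_mb
    return max(rounded, minimum_mb)


# Values copied from the configuration above.
EXECUTOR_MEMORY_MB = 1024      # spark.executor.memory
EXECUTOR_OVERHEAD_MB = 1024    # spark.yarn.executor.memoryOverhead
YARN_MIN_ALLOC_MB = 3584       # yarn.scheduler.minimum-allocation-mb
NODE_MEMORY_MB = 10752         # yarn.nodemanager.resource.memory-mb

# Assumptions, not taken from the post.
INCREMENT_MB = 1024            # assumed rounding step for container requests
NODE_MANAGERS = 6              # assumes only the 6 slaves host containers

# Each executor asks YARN for heap plus overhead.
request_mb = EXECUTOR_MEMORY_MB + EXECUTOR_OVERHEAD_MB
container_mb = normalize_mb(request_mb, YARN_MIN_ALLOC_MB, INCREMENT_MB)

# Whole containers that fit on one node, and on the cluster.
containers_per_node = NODE_MEMORY_MB // container_mb
cluster_containers = containers_per_node * NODE_MANAGERS

print(request_mb, container_mb, containers_per_node, cluster_containers)
```

Under these assumptions, each 2048 MB executor request occupies a 3584 MB container, so a node fits 3 containers and the cluster about 18 in total, which all running jobs (including their application masters) have to share.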

On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu <akshaynaidu.9@gmail.com>
wrote:

> Hello,
> I'm trying to run multiple Spark jobs on a cluster running on YARN.
> The master is a 24GB server, with 6 slaves of 12GB each.
>
> fairscheduler.xml settings are -
> <pool name="default">
>     <schedulingMode>FAIR</schedulingMode>
>     <weight>10</weight>
>     <minShare>2</minShare>
> </pool>
>
> I am running 8 jobs simultaneously. The jobs run in parallel, but not all
> of them: only 7 run at a time, while the 8th sits in the queue
> WAITING for a job to finish.
>
> Also, of the 7 running jobs, 4 run much faster than the
> remaining three (maybe resources are not distributed properly).
>
> I want to run n jobs at a time and make them run faster. Right
> now, one job takes more than three minutes to process at most
> 1GB of data.
>
> Kindly assist me. What am I missing?
>
> Thanks.
>
