spark-user mailing list archives

From akshay naidu <akshaynaid...@gmail.com>
Subject Re: Run Multiple Spark jobs. Reduce Execution time.
Date Wed, 14 Feb 2018 11:47:29 GMT
Hello Siva,
Thanks for your reply.

Actually, I'm trying to generate online reports for my clients. For this I
want the jobs to execute quickly, without any job being put on the QUEUE,
irrespective of the number of jobs different clients are running from
different locations.
Currently, a job processing 17 GB of data takes more than 20 minutes to
execute. Also, only 6 jobs run simultaneously, and the remaining ones are in
the WAITING stage.

Thanks
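
For reference, the separate-pool setup suggested in the reply below might look
roughly like this in fairscheduler.xml. This is only a sketch: the "reports"
pool name and its weight/minShare values are hypothetical, not taken from the
thread.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Existing default pool, as quoted in the thread below -->
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>10</weight>
    <minShare>2</minShare>
  </pool>
  <!-- Hypothetical pool for slow, long-running jobs -->
  <pool name="reports">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
</allocations>
```

A job can then opt into a pool before submitting its stages with
sc.setLocalProperty("spark.scheduler.pool", "reports"); jobs that set no pool
stay in "default".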

On Wed, Feb 14, 2018 at 4:32 PM, Siva Gudavalli <gudavalli.siva@yahoo.com>
wrote:

>
> Hello Akshay,
>
> I see there are 6 slaves * 1 Spark instance each * 5 cores per
> instance => 30 cores in total.
> Do you have any other pools configured? 8 jobs should be triggered
> in parallel with the number of cores you have.
>
> For your long-running job, did you have a chance to look at the tasks that
> are being triggered?
>
> I would recommend configuring the slow-running job in a separate pool.
>
> Regards
> Shiv
>
> On Feb 14, 2018, at 5:44 AM, akshay naidu <akshaynaidu.9@gmail.com> wrote:
>
> ************************************************************
> yarn-site.xml
>
>
> <property>
>   <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
>   <value>0.8</value>
> </property>
>
> <property>
> <name>yarn.scheduler.minimum-allocation-mb</name>
> <value>3584</value>
> </property>
>
> <property>
> <name>yarn.scheduler.maximum-allocation-mb</name>
> <value>10752</value>
> </property>
>
> <property>
> <name>yarn.nodemanager.resource.memory-mb</name>
> <value>10752</value>
> </property>
>
> ************************************************************
> spark-defaults.conf
>
> spark.master                       yarn
> spark.driver.memory                9g
> spark.executor.memory              1024m
> spark.yarn.executor.memoryOverhead 1024m
> spark.eventLog.enabled  true
> spark.eventLog.dir hdfs://tech-master:54310/spark-logs
>
> spark.history.provider            org.apache.spark.deploy.history.FsHistoryProvider
> spark.history.fs.logDirectory     hdfs://tech-master:54310/spark-logs
> spark.history.fs.update.interval  10s
> spark.history.ui.port             18080
>
> spark.ui.enabled                true
> spark.ui.port                   4040
> spark.ui.killEnabled            true
> spark.ui.retainedDeadExecutors  100
>
> spark.scheduler.mode            FAIR
> spark.scheduler.allocation.file /usr/local/spark/current/conf/fairscheduler.xml
>
> #spark.submit.deployMode         cluster
> spark.default.parallelism        30
>
> SPARK_WORKER_MEMORY 10g
> SPARK_WORKER_INSTANCES 1
> SPARK_WORKER_CORES 5
>
> SPARK_DRIVER_MEMORY 9g
> SPARK_DRIVER_CORES 5
>
> SPARK_MASTER_IP Tech-master
> SPARK_MASTER_PORT 7077
>
> On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu <akshaynaidu.9@gmail.com>
> wrote:
>
>> Hello,
>> I'm trying to run multiple Spark jobs on a cluster running on YARN.
>> The master is a 24 GB server, with 6 slaves of 12 GB each.
>>
>> fairscheduler.xml settings are -
>> <pool name="default">
>>     <schedulingMode>FAIR</schedulingMode>
>>     <weight>10</weight>
>>     <minShare>2</minShare>
>> </pool>
>>
>> I am running 8 jobs simultaneously; the jobs run in parallel, but not
>> all of them.
>> At a time, only 7 of them run simultaneously, while the 8th is in the
>> queue, WAITING for a job to stop.
>>
>> Also, out of the 7 running jobs, 4 run considerably faster than the
>> remaining three (maybe resources are not distributed properly).
>>
>> I want to run n jobs at a time and make them run faster. Right
>> now, one job takes more than three minutes to process at most
>> 1 GB of data.
>>
>> Kindly assist me. What am I missing?
>>
>> Thanks.
>>
>
>
>
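
The container arithmetic implied by the configs quoted above can be sketched
as follows. The numbers come from the yarn-site.xml and spark-defaults.conf in
the thread; the rounding behavior is an assumption about YARN's default
resource calculator, which rounds container requests up to a multiple of the
minimum allocation.

```python
import math

# Numbers taken from the configs quoted above.
min_alloc_mb = 3584    # yarn.scheduler.minimum-allocation-mb
node_mem_mb = 10752    # yarn.nodemanager.resource.memory-mb
nodes = 6

# Each executor asks for spark.executor.memory + spark.yarn.executor.memoryOverhead.
executor_req_mb = 1024 + 1024

# Assumption: YARN rounds every container request up to a
# multiple of the minimum allocation.
container_mb = math.ceil(executor_req_mb / min_alloc_mb) * min_alloc_mb

executors_per_node = node_mem_mb // container_mb
total_executors = executors_per_node * nodes

print(container_mb, executors_per_node, total_executors)  # → 3584 3 18
```

Under these assumptions a 2 GB executor request is rounded up to a full
3584 MB container, so each node holds 3 executors (18 cluster-wide), and a
9 GB driver, if run in cluster mode, would occupy an entire node by itself.
Whether this explains the queued jobs depends on where the drivers actually
run.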
