spark-user mailing list archives

From akshay naidu <akshaynaid...@gmail.com>
Subject Re: Run Multiple Spark jobs. Reduce Execution time.
Date Thu, 15 Feb 2018 05:38:14 GMT
A small hint would be very helpful.

On Wed, Feb 14, 2018 at 5:17 PM, akshay naidu <akshaynaidu.9@gmail.com>
wrote:

> Hello Siva,
> Thanks for your reply.
>
> Actually, I'm trying to generate online reports for my clients. For this I
> want the jobs to execute faster, without putting any job in the QUEUE,
> irrespective of the number of jobs different clients are executing from
> different locations.
> Currently, a job processing 17GB of data takes more than 20 minutes to
> execute. Also, only 6 jobs run simultaneously and the remaining ones are in
> the WAITING stage.
>
> Thanks
>
> On Wed, Feb 14, 2018 at 4:32 PM, Siva Gudavalli <gudavalli.siva@yahoo.com>
> wrote:
>
>>
>> Hello Akshay,
>>
>> I see there are 6 slaves, with 1 Spark instance each and 5 cores per
>> instance => 30 cores in total.
>> Do you have any other pools configured? With the number of cores you
>> have, all 8 jobs should be triggered in parallel.
>>
>> For your long-running job, did you have a chance to look at the tasks
>> that are being triggered?
>>
>> I would recommend configuring the slow-running job in a separate pool.
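
The recommendation above could be sketched in fairscheduler.xml roughly as follows. This is only an illustration: the pool name "slow" and the weight/minShare values are assumptions, not values from this thread.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- existing pool for regular jobs (values from the thread below) -->
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>10</weight>
    <minShare>2</minShare>
  </pool>
  <!-- hypothetical separate pool for the slow-running job -->
  <pool name="slow">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
</allocations>
```

A job is then assigned to a pool at runtime by setting the spark.scheduler.pool local property on the SparkContext, e.g. sc.setLocalProperty("spark.scheduler.pool", "slow"), before triggering the job. Note that fair-scheduler pools divide resources between jobs inside one Spark application; separate applications compete for resources at the YARN level instead.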
>>
>> Regards
>> Shiv
>>
>> On Feb 14, 2018, at 5:44 AM, akshay naidu <akshaynaidu.9@gmail.com>
>> wrote:
>>
>> ************************************************************
>> **********************************************************
>> yarn-site.xml
>>
>>
>>  <property>
>>     <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
>>     <value>0.8</value>
>>   </property>
>>
>> <property>
>> <name>yarn.scheduler.minimum-allocation-mb</name>
>> <value>3584</value>
>> </property>
>>
>> <property>
>> <name>yarn.scheduler.maximum-allocation-mb</name>
>> <value>10752</value>
>> </property>
>>
>> <property>
>> <name>yarn.nodemanager.resource.memory-mb</name>
>> <value>10752</value>
>> </property>
>>
>> ************************************************************
>> ******************************************************************
>> spark-defaults.conf
>>
>> spark.master                       yarn
>> spark.driver.memory                9g
>> spark.executor.memory              1024m
>> spark.yarn.executor.memoryOverhead 1024m
>> spark.eventLog.enabled  true
>> spark.eventLog.dir hdfs://tech-master:54310/spark-logs
>>
>> spark.history.provider            org.apache.spark.deploy.history.FsHistoryProvider
>> spark.history.fs.logDirectory     hdfs://tech-master:54310/spark-logs
>> spark.history.fs.update.interval  10s
>> spark.history.ui.port             18080
>>
>> spark.ui.enabled                true
>> spark.ui.port                   4040
>> spark.ui.killEnabled            true
>> spark.ui.retainedDeadExecutors  100
>>
>> spark.scheduler.mode            FAIR
>> spark.scheduler.allocation.file /usr/local/spark/current/conf/fairscheduler.xml
>>
>> #spark.submit.deployMode         cluster
>> spark.default.parallelism        30
>>
>> SPARK_WORKER_MEMORY 10g
>> SPARK_WORKER_INSTANCES 1
>> SPARK_WORKER_CORES 5
>>
>> SPARK_DRIVER_MEMORY 9g
>> SPARK_DRIVER_CORES 5
>>
>> SPARK_MASTER_IP Tech-master
>> SPARK_MASTER_PORT 7077
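
A rough capacity estimate from the settings above may explain the queueing. It assumes YARN rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, which is its usual behaviour; the numbers come from the yarn-site.xml and spark-defaults.conf quoted in this thread.

```python
# Back-of-the-envelope container math from the configs in this thread.
# Assumption: YARN rounds each container request up to a multiple of
# yarn.scheduler.minimum-allocation-mb.
min_alloc_mb = 3584            # yarn.scheduler.minimum-allocation-mb
node_mb = 10752                # yarn.nodemanager.resource.memory-mb
executor_req_mb = 1024 + 1024  # spark.executor.memory + memoryOverhead

# Ceiling-divide the request to whole minimum allocations.
container_mb = -(-executor_req_mb // min_alloc_mb) * min_alloc_mb
containers_per_node = node_mb // container_mb
total_containers = containers_per_node * 6   # 6 slave nodes

print(container_mb)         # 3584
print(containers_per_node)  # 3
print(total_containers)     # 18
```

Since each concurrently running application also needs an ApplicationMaster container out of that same budget, 8 simultaneous jobs would leave at most 10 of the 18 containers for executors, which is one plausible reason some jobs sit in WAITING.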
>>
>> On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu <akshaynaidu.9@gmail.com>
>> wrote:
>>
>>> Hello,
>>> I'm trying to run multiple Spark jobs on a cluster running on YARN.
>>> The master is a 24GB server with 6 slaves of 12GB each.
>>>
>>> fairscheduler.xml settings are -
>>> <pool name="default">
>>>     <schedulingMode>FAIR</schedulingMode>
>>>     <weight>10</weight>
>>>     <minShare>2</minShare>
>>> </pool>
>>>
>>> I am running 8 jobs simultaneously. The jobs run in parallel, but not
>>> all of them: at a time only 7 run simultaneously, while the 8th is in
>>> the queue, WAITING for a job to stop.
>>>
>>> Also, out of the 7 running jobs, 4 run comparatively much faster than
>>> the remaining three (maybe resources are not distributed properly).
>>>
>>> I want to run n jobs at a time and make them run faster. Right now,
>>> one job takes more than three minutes to process at most 1GB of data.
>>>
>>> Kindly assist me. What am I missing?
>>>
>>> Thanks.
>>>
>>
>>
>>
>
