spark-user mailing list archives

From Aakash Basu <aakash.spark....@gmail.com>
Subject Re: [Spark Optimization] Why is one node getting all the pressure?
Date Mon, 11 Jun 2018 10:22:47 GMT
Jorn - The code is a series of feature engineering and model-tuning
operations; it is too big to share here. Yes, the data volume is very low
(a few KBs) - I just wanted to experiment with a small dataset before going
for a large one.
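
Also, since the dataset is only a few KBs, it very likely sits in a single
partition, which by itself would keep one executor busy while the others
stay idle. A rough sketch of what I am checking/trying (the DataFrame name
df and the partition count 12 are just illustrative, not from the actual
job) -

    # check how the data is currently split
    print(df.rdd.getNumPartitions())

    # spread the rows over roughly as many partitions as total executor cores
    df = df.repartition(12)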

Akshay - I ran with your suggested Spark configuration and I get this (the
node changed, but the problem persists) -

[Spark UI screenshot not preserved in the archive]
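
For reference, the submit command with those settings would look roughly
like this (same master URL and script path as in my original mail) -

    spark-submit --master spark://192.168.49.37:7077 \
      --num-executors 3 --executor-cores 4 --executor-memory 2G \
      --conf spark.scheduler.mode=FAIR \
      /appdata/bblite-codebase/prima_diabetes_indians.py
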
On Mon, Jun 11, 2018 at 3:16 PM, akshay naidu <akshaynaidu.9@gmail.com>
wrote:

> try
>  --num-executors 3 --executor-cores 4 --executor-memory 2G --conf
> spark.scheduler.mode=FAIR
>
> On Mon, Jun 11, 2018 at 2:43 PM, Aakash Basu <aakash.spark.raj@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have submitted a job on a 4-node cluster, where I see most of the
>> operations happening on one of the worker nodes while the other two are
>> simply sitting idle.
>>
>> The picture below sheds light on that -
>>
>> [Spark UI screenshot not preserved in the archive]
>>
>> How to properly distribute the load?
>>
>> My cluster configuration (4-node cluster: 1 driver, 3 slaves) -
>>
>> Cores - 6
>> RAM - 12 GB
>> HDD - 60 GB
>>
>> My Spark Submit command is as follows -
>>
>> spark-submit --master spark://192.168.49.37:7077 --num-executors 3
>> --executor-cores 5 --executor-memory 4G
>> /appdata/bblite-codebase/prima_diabetes_indians.py
>>
>> What should I do?
>>
>> Thanks,
>> Aakash.
>>
>
>
