spark-user mailing list archives

From Pierre Borckmans <pierre.borckm...@realimpactanalytics.com>
Subject Re: Changing number of workers for benchmarking purposes
Date Thu, 13 Mar 2014 09:00:14 GMT
Thanks Patrick.

I could try that.

But the idea was to be able to write a fully automated benchmark, varying the dataset size,
the number of workers, the memory, … without having to stop/start the cluster each time.

I was thinking something like SparkConf.set("spark.max_number_workers", n) would be useful
in this context, but maybe it's too specific to be implemented.

Thanks anyway,

Cheers

Pierre



On 12 Mar 2014, at 22:50, Patrick Wendell <pwendell@gmail.com> wrote:

> Hey Pierre,
> 
> Currently modifying the "slaves" file is the best way to do this
> because in general we expect that users will want to launch workers on
> any slave.
> 
> I think you could hack something together pretty easily to allow this.
> For instance if you modify the line in slaves.sh from this:
> 
>  for slave in `cat "$HOSTLIST"|sed  "s/#.*$//;/^$/d"`; do
> 
> to this
> 
>  for slave in `cat "$HOSTLIST"| head -n $NUM_SLAVES | sed "s/#.*$//;/^$/d"`; do
> 
> Then you could just set NUM_SLAVES before you stop/start. Not sure if
> this helps much but maybe it's a bit faster.
> 
> - Patrick
> 
> On Wed, Mar 12, 2014 at 10:18 AM, Pierre Borckmans
> <pierre.borckmans@realimpactanalytics.com> wrote:
>> Hi there!
>> 
>> I was performing some tests for benchmarking purposes, among other things to observe
>> how performance evolves with the number of workers.
>> 
>> In that context, I was wondering if there is any easy way to choose the number of
>> workers to be used in standalone mode, without having to change the "slaves" file,
>> dispatch it, and restart the cluster?
>> 
>> 
>> Cheers,
>> 
>> Pierre
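
(For what it's worth, a rough driver along the lines of Patrick's suggestion might look like the sketch below. It is untested and makes a few assumptions: slaves.sh has been patched as quoted above to honour NUM_SLAVES, SPARK_HOME points at the standalone installation with its usual sbin/stop-all.sh and sbin/start-all.sh scripts, and run_benchmark plus the parameter values are placeholders for the actual job.)

    #!/usr/bin/env bash
    # Sketch of an automated benchmark driver; assumes the patched slaves.sh
    # above and a SPARK_HOME with the standard standalone sbin/ scripts.

    for num_slaves in 1 2 4 8; do
      # Stop everything first; use a large NUM_SLAVES so the patched slaves.sh
      # still reaches every host listed in conf/slaves.
      export NUM_SLAVES=9999
      "$SPARK_HOME"/sbin/stop-all.sh

      # Restart the master plus only the first $num_slaves hosts.
      export NUM_SLAVES=$num_slaves
      "$SPARK_HOME"/sbin/start-all.sh

      for dataset_size in small medium large; do
        # Placeholder command for the actual benchmark run on the resized cluster.
        run_benchmark --workers "$num_slaves" --dataset "$dataset_size"
      done
    done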

