spark-user mailing list archives

From Deep Pradhan <pradhandeep1...@gmail.com>
Subject Re: Worker and Nodes
Date Sat, 21 Feb 2015 15:37:38 GMT
So, if I increase the degree of parallelism along with the number of worker
instances, will it make any difference?
I can use this model the other way round too, right? That is, I can always
predict how the performance of an app deteriorates as the number of worker
instances increases, right?

Thank You

On Sat, Feb 21, 2015 at 8:52 PM, Deep Pradhan <pradhandeep1991@gmail.com>
wrote:

> Yes, I have decreased the executor memory.
> But if I have to do this, then I have to tweak the code for each
> configuration, right?
>
> On Sat, Feb 21, 2015 at 8:47 PM, Sean Owen <sowen@cloudera.com> wrote:
>
>> "Workers" has a specific meaning in Spark. You are running many on one
>> machine? That's possible but not usual.
>>
>> Each worker's executors have access to a fraction of your machine's
>> resources then. If you're not increasing parallelism, maybe you're not
>> actually using the additional workers, so you are using fewer resources
>> for your problem.
>>
>> Or because the resulting executors are smaller, maybe you're hitting
>> GC thrashing in these executors with smaller heaps.
>>
>> Or if you're not actually configuring the executors to use less
>> memory, maybe you're over-committing your RAM and swapping?
>>
>> Bottom line, you wouldn't use multiple workers on one small standalone
>> node. This isn't a good way to estimate performance on a distributed
>> cluster either.
>>
>> On Sat, Feb 21, 2015 at 3:11 PM, Deep Pradhan <pradhandeep1991@gmail.com>
>> wrote:
>> > No, I just have a single node standalone cluster.
>> >
>> > I am not tweaking around with the code to increase parallelism. I am just
>> > running SparkKMeans that is there in Spark-1.0.0
>> > I just wanted to know, if this behavior is natural. And if so, what causes
>> > this?
>> >
>> > Thank you
>> >
>> > On Sat, Feb 21, 2015 at 8:32 PM, Sean Owen <sowen@cloudera.com> wrote:
>> >>
>> >> What's your storage like? Are you adding worker machines that are
>> >> remote from where the data lives? I wonder if it just means you are
>> >> spending more and more time sending the data over the network as you
>> >> try to ship more of it to more remote workers.
>> >>
>> >> To answer your question: no, in general more workers means more
>> >> parallelism and therefore faster execution. But that depends on a lot
>> >> of things. For example, if your process isn't parallelized to use all
>> >> available execution slots, adding more slots doesn't do anything.
>> >>
>> >> On Sat, Feb 21, 2015 at 2:51 PM, Deep Pradhan <pradhandeep1991@gmail.com>
>> >> wrote:
>> >> > Yes, I am talking about standalone single node cluster.
>> >> >
>> >> > No, I am not increasing parallelism. I just wanted to know if it is
>> >> > natural.
>> >> > Does message passing across the workers account for what is happening?
>> >> >
>> >> > I am running SparkKMeans, just to validate one prediction model. I am
>> >> > using several data sets. I am running in standalone mode. I am varying
>> >> > the workers from 1 to 16.
>> >> >
>> >> > On Sat, Feb 21, 2015 at 8:14 PM, Sean Owen <sowen@cloudera.com> wrote:
>> >> >>
>> >> >> I can imagine a few reasons. Adding workers might cause fewer tasks to
>> >> >> execute locally (?), so you may be executing more remotely.
>> >> >>
>> >> >> Are you increasing parallelism? For trivial jobs, chopping them up
>> >> >> further may cause you to pay more overhead for managing so many small
>> >> >> tasks, for no speedup in execution time.
>> >> >>
>> >> >> Can you provide any more specifics though? You haven't said what
>> >> >> you're running, what mode, how many workers, how long it takes, etc.
>> >> >>
>> >> >> On Sat, Feb 21, 2015 at 2:37 PM, Deep Pradhan
>> >> >> <pradhandeep1991@gmail.com>
>> >> >> wrote:
>> >> >> > Hi,
>> >> >> > I have been running some jobs in my local single-node standalone
>> >> >> > cluster. I am varying the worker instances for the same job, and the
>> >> >> > time taken for the job to complete increases with the number of
>> >> >> > workers. I repeated some experiments varying the number of nodes in a
>> >> >> > cluster too, and the same behavior is seen.
>> >> >> > Can the idea of worker instances be extrapolated to the nodes in a
>> >> >> > cluster?
>> >> >> >
>> >> >> > Thank You
>> >> >
>> >> >
>> >
>> >
>>
>
>
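
A minimal sketch of the point made in the thread that extra execution slots
do nothing once a job has fewer tasks than slots. This is a toy model in
plain Python, not Spark code: the function name estimated_job_seconds, the
4-task job, and the 10-second task duration are all hypothetical. It ignores
the GC, swapping, and data-locality effects discussed above, so it only
shows why time stops improving, not why it gets worse.

```python
import math


def estimated_job_seconds(num_tasks, slots, task_seconds):
    """Toy estimate: tasks run in waves of at most `slots` at a time.

    Assumes every task takes the same time and scheduling is free,
    which is a simplification and not how Spark accounts for time.
    """
    if num_tasks <= 0 or slots <= 0:
        raise ValueError("num_tasks and slots must be positive")
    waves = math.ceil(num_tasks / slots)
    return waves * task_seconds


# Hypothetical job: parallelism fixed at 4 tasks of 10 seconds each,
# while the number of execution slots grows from 1 to 16.
for slots in (1, 2, 4, 8, 16):
    print(slots, estimated_job_seconds(4, slots, 10.0))
```

With the task count held at 4, the estimate stops improving once 4 slots
are available. In this model, only raising the number of tasks (the degree
of parallelism) lets additional slots do any work.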
