spark-user mailing list archives

From Deep Pradhan <pradhandeep1...@gmail.com>
Subject Re: Worker and Nodes
Date Sat, 21 Feb 2015 16:19:50 GMT
So, if I keep the number of instances constant and increase the degree of
parallelism in steps, can I expect the performance to increase?
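
For concreteness, a minimal sketch of that kind of stepped experiment, assuming
a Spark 1.x Scala application; the object name, input path, and partition
counts below are illustrative, not taken from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    // Time the same simple job at increasing degrees of parallelism while the
    // number of worker instances stays fixed.
    object ParallelismSweep {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ParallelismSweep"))
        val lines = sc.textFile("data/points.txt").cache()  // placeholder input path
        lines.count()  // materialize the cache before timing

        for (numPartitions <- Seq(2, 4, 8, 16)) {
          val start = System.nanoTime()
          lines.repartition(numPartitions).map(_.length.toLong).reduce(_ + _)
          println(s"partitions=$numPartitions took ${(System.nanoTime() - start) / 1e9} s")
        }
        sc.stop()
      }
    }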

Thank You

On Sat, Feb 21, 2015 at 9:07 PM, Deep Pradhan <pradhandeep1991@gmail.com>
wrote:

> So, with the increase in the number of worker instances, if I also
> increase the degree of parallelism, will it make any difference?
> I can use this model the other way round too, right? That is, I can
> always predict the deterioration in an app's performance as the number of
> worker instances increases, right?
>
> Thank You
>
> On Sat, Feb 21, 2015 at 8:52 PM, Deep Pradhan <pradhandeep1991@gmail.com>
> wrote:
>
>> Yes, I have decreased the executor memory.
>> But, if I have to do this, then I have to tweak the code for each
>> configuration, right?
>>
>> On Sat, Feb 21, 2015 at 8:47 PM, Sean Owen <sowen@cloudera.com> wrote:
>>
>>> "Workers" has a specific meaning in Spark. You are running many on one
>>> machine? that's possible but not usual.
>>>
>>> Each worker's executors then have access to only a fraction of your
>>> machine's resources. If you're not increasing parallelism, maybe you're
>>> not actually using the additional workers, so you are using fewer
>>> resources for your problem.
>>>
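
One way to check whether a run is actually spreading work across the extra
workers is to compare the RDD's partition count with the parallelism the
cluster offers. A minimal sketch, assuming sc is the application's
SparkContext and the input path is a placeholder:

    // In standalone mode, defaultParallelism reflects the total cores offered
    // to the application; if the RDD has fewer partitions than that, some
    // execution slots sit idle no matter how many workers are running.
    val slots = sc.defaultParallelism
    val data = sc.textFile("data/points.txt")  // placeholder input
    val spread = if (data.partitions.length < slots) data.repartition(slots) else data
    println(s"slots=$slots, partitions=${spread.partitions.length}")
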
>>> Or because the resulting executors are smaller, maybe you're hitting
>>> GC thrashing in these executors with smaller heaps.
>>>
>>> Or if you're not actually configuring the executors to use less
>>> memory, maybe you're over-committing your RAM and swapping?
>>>
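
On sizing, a rough sketch only: the instance count and memory figure below are
placeholders, and spark.executor.memory can just as well be set at submit time
rather than in code. The point is that total executor memory across all worker
instances should stay under the machine's physical RAM:

    import org.apache.spark.SparkConf

    // With N worker instances on one machine, give each executor roughly
    // usableRam / N, so the instances neither over-commit RAM (and swap) nor
    // end up with heaps so small that GC thrashes.
    val workerInstances = 4         // e.g. SPARK_WORKER_INSTANCES=4 (placeholder)
    val usableRamMb = 12 * 1024     // placeholder: RAM left after the OS and driver
    val conf = new SparkConf()
      .setAppName("SparkKMeans")
      .set("spark.executor.memory", s"${usableRamMb / workerInstances}m")
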
>>> Bottom line, you wouldn't use multiple workers on one small standalone
>>> node. This isn't a good way to estimate performance on a distributed
>>> cluster either.
>>>
>>> On Sat, Feb 21, 2015 at 3:11 PM, Deep Pradhan <pradhandeep1991@gmail.com>
>>> wrote:
>>> > No, I just have a single node standalone cluster.
>>> >
>>> > I am not tweaking around with the code to increase parallelism. I am
>>> > just running the SparkKMeans example that comes with Spark-1.0.0.
>>> > I just wanted to know if this behavior is natural, and if so, what
>>> > causes it?
>>> >
>>> > Thank you
>>> >
>>> > On Sat, Feb 21, 2015 at 8:32 PM, Sean Owen <sowen@cloudera.com> wrote:
>>> >>
>>> >> What's your storage like? Are you adding worker machines that are
>>> >> remote from where the data lives? I wonder if it just means you are
>>> >> spending more and more time sending the data over the network as you
>>> >> try to ship more of it to more remote workers.
>>> >>
>>> >> To answer your question: no, in general more workers mean more
>>> >> parallelism and therefore faster execution. But that depends on a lot
>>> >> of things. For example, if your process isn't parallelized to use all
>>> >> available execution slots, adding more slots doesn't do anything.
>>> >>
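
A common reason a job does not fill its slots is that the input was read with
too few partitions in the first place. A small sketch, with an illustrative
path and multiplier, of asking for more input splits up front instead of
repartitioning afterwards:

    // Request a minimum number of input partitions when reading, so the job
    // starts with enough tasks to occupy every execution slot.
    val minPartitions = sc.defaultParallelism * 2  // a common rule of thumb
    val points = sc.textFile("data/points.txt", minPartitions)  // placeholder path
    println(s"input partitions: ${points.partitions.length}")
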
>>> >> On Sat, Feb 21, 2015 at 2:51 PM, Deep Pradhan <pradhandeep1991@gmail.com>
>>> >> wrote:
>>> >> > Yes, I am talking about a standalone single-node cluster.
>>> >> >
>>> >> > No, I am not increasing parallelism. I just wanted to know if it
>>> >> > is natural.
>>> >> > Does message passing across the workers account for what is happening?
>>> >> >
>>> >> > I am running SparkKMeans, just to validate one prediction model. I am
>>> >> > using several data sets. I am using standalone mode, and I am varying
>>> >> > the workers from 1 to 16.
>>> >> >
>>> >> > On Sat, Feb 21, 2015 at 8:14 PM, Sean Owen <sowen@cloudera.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> I can imagine a few reasons. Adding workers might cause fewer tasks
>>> >> >> to execute locally (?), so you may be executing more remotely.
>>> >> >>
>>> >> >> Are you increasing parallelism? For trivial jobs, chopping them up
>>> >> >> further may cause you to pay more overhead managing so many small
>>> >> >> tasks, for no speed-up in execution time.
>>> >> >>
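
That overhead is easy to see in isolation. A small sketch (the data size and
partition counts are arbitrary) that runs a trivial job chopped into
progressively more, smaller tasks:

    // Once each task has almost no work to do, per-task scheduling overhead
    // dominates and adding partitions only slows the job down.
    val nums = sc.parallelize(1 to 1000000)
    for (slices <- Seq(8, 64, 512, 4096)) {
      val start = System.nanoTime()
      nums.repartition(slices).map(_ * 2L).reduce(_ + _)
      println(s"slices=$slices took ${(System.nanoTime() - start) / 1e9} s")
    }
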
>>> >> >> Can you provide any more specifics, though? You haven't said what
>>> >> >> you're running, what mode, how many workers, how long it takes, etc.
>>> >> >>
>>> >> >> On Sat, Feb 21, 2015 at 2:37 PM, Deep Pradhan
>>> >> >> <pradhandeep1991@gmail.com>
>>> >> >> wrote:
>>> >> >> > Hi,
>>> >> >> > I have been running some jobs on my local single-node standalone
>>> >> >> > cluster. I am varying the number of worker instances for the same
>>> >> >> > job, and the time taken for the job to complete increases as the
>>> >> >> > number of workers increases. I repeated some experiments varying
>>> >> >> > the number of nodes in a cluster too, and the same behavior is seen.
>>> >> >> > Can the idea of worker instances be extrapolated to the nodes in a
>>> >> >> > cluster?
>>> >> >> >
>>> >> >> > Thank You
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>
