spark-dev mailing list archives

From Alexander Pivovarov <apivova...@gmail.com>
Subject Re: spark on yarn wastes one box (or 1 GB on each box) for am container
Date Tue, 09 Feb 2016 08:35:53 GMT
If I add an additional small box to the cluster, can I configure YARN to
select that small box to run the AM container?


On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen <sowen@cloudera.com> wrote:

> Typically YARN is there because you're mediating resource requests
> from things besides Spark, so yeah using every bit of the cluster is a
> little bit of a corner case. There's not a good answer if all your
> nodes are the same size.
>
> I think you can let YARN over-commit RAM though, and allocate more
> memory than it actually has. It may be beneficial to let them all
> think they have an extra GB, and let one node running the AM
> technically be overcommitted, a state which won't hurt at all unless
> you're really really tight on memory, in which case something might
> get killed.
>
> On Tue, Feb 9, 2016 at 6:49 AM, Jonathan Kelly <jonathakamzn@gmail.com>
> wrote:
> > Alex,
> >
> > That's a very good question that I've been trying to answer myself recently
> > too. Since you've mentioned before that you're using EMR, I assume you're
> > asking this because you've noticed this behavior on emr-4.3.0.
> >
> > In this release, we made some changes to maximizeResourceAllocation
> > (which you may or may not be using, but either way this issue is present),
> > including accidentally introducing a bug that makes it not reserve any
> > space for the AM, which ultimately results in one of the nodes being
> > utilized only by the AM and not by an executor.
> >
> > However, as you point out, the only viable fix seems to be to reserve enough
> > memory for the AM on *every single node*, which in some cases might actually
> > be worse than wasting a lot of memory on a single node.
> >
> > So yeah, I also don't like either option. Is this just the price you pay for
> > running on YARN?
> >
> >
> > ~ Jonathan
> >
> > On Mon, Feb 8, 2016 at 9:03 PM Alexander Pivovarov <apivovarov@gmail.com>
> > wrote:
> >>
> >> Let's say that YARN has 53GB of memory available on each slave.
> >>
> >> The Spark AM container needs 896MB (512 + 384).
> >>
> >> I see two options to configure Spark:
> >>
> >> 1. Configure Spark executors to use 52GB and leave 1GB free on each box.
> >> One box will also run the AM container, so 1GB of memory will go unused
> >> on all slaves but one.
> >>
> >> 2. Configure Spark to use all 53GB and add an additional 53GB box which
> >> will run only the AM container. So, 52GB on this additional box will do
> >> nothing.
> >>
> >> I do not like either option. Is there a better way to configure YARN/Spark?
> >>
> >>
> >> Alex
>
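For reference, a rough sketch of the arithmetic behind the two options in the
original question. The 53GB, 1GB and 896MB figures come from the thread; the
function names and the node counts passed in are hypothetical, and the sketch
ignores any rounding YARN may apply to container sizes.

```python
# Figures quoted in the thread (MB). Everything else here is illustrative.
NODE_MEM_MB = 53 * 1024   # memory YARN offers on each slave
AM_MEM_MB = 512 + 384     # AM container size: 896 MB
RESERVE_MB = 1024         # 1 GB left free on each box in option 1


def wasted_option1(num_nodes: int) -> int:
    """Option 1: executors use 52 GB per node, the AM lands in one node's spare 1 GB."""
    unused_reserves = (num_nodes - 1) * RESERVE_MB  # spare 1 GB on every other node
    slack_beside_am = RESERVE_MB - AM_MEM_MB        # what is left next to the AM
    return unused_reserves + slack_beside_am


def wasted_option2() -> int:
    """Option 2: executors use all 53 GB, one extra identical node runs only the AM."""
    return NODE_MEM_MB - AM_MEM_MB


if __name__ == "__main__":
    for nodes in (10, 53, 100):  # hypothetical cluster sizes
        print(nodes, wasted_option1(nodes), wasted_option2())
```

Under these assumptions the two options waste the same amount of memory at 53
executor nodes. Below that, reserving 1GB on every box wastes less. Above it,
the dedicated AM box wastes less.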
