spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcin Tustin <>
Subject Re: YARN Shuffle service and its compatibility
Date Mon, 18 Apr 2016 20:53:50 GMT
I'm good with option B at least until it blocks something utterly wonderful
(like shuffles are 10x faster).

On Mon, Apr 18, 2016 at 4:51 PM, Mark Grover <> wrote:

> Hi all,
> If you don't use Spark on YARN, you probably don't need to read further.
> Here's the *user scenario*:
> There are going to be folks who may be interested in running two versions
> of Spark (say Spark 1.6.x and Spark 2.x) on the same YARN cluster.
> And, here's the *problem*:
> That's all fine, should work well. However, there's one problem that
> relates to the YARN shuffle service
> <>.
> This service is run by the YARN Node Managers on all nodes of the cluster
> that have YARN NMs as an auxillary service
> <>
> .
> The key question here is -
> Option A:  Should the user be running 2 shuffle services - one for Spark
> 1.6.x and one for Spark 2.x?
> OR
> Option B: Should the user be running only 1 shuffle service that services
> both the Spark 1.6.x and Spark 2.x installs? This will likely have to be
> the Spark 1.6.x shuffle service (while ensuring it's forward compatible
> with Spark 2.x).
> *Discussion of above options:*
> A few things to note about the shuffle service:
> 1. Looking at the commit history, there aren't a whole of lot of changes
> that go into the shuffle service, rarely ones that are incompatible.
> There's only one incompatible change
> <> that's been made to
> the shuffle service, as far as I can tell, and that too, seems fairly
> cosmetic.
> 2. Shuffle services for 1.6.x and 2.x serve very similar purpose (to
> provide shuffle blocks) and can easily be just one service that does it,
> even on a YARN cluster that runs both Spark 1.x and Spark 2.x.
> 3. The shuffle service is not version-spaced. This means that, the way the
> code is currently, if we were to drop the jars for Spark1 and Spark2's
> shuffle service in YARN NM's classpath, YARN NM won't be able to start both
> services. It would arbitrarily pick one service to start (based on what
> appears on the classpath first). Also, the service name is hardcoded
> <>
> in Spark code and that name is also not version-spaced.
> Option A is arguably cleaner but it's more operational overhead and some
> code relocation/shading/version-spacing/name-spacing to make it work (due
> to #3 above), potentially to not a whole lot of value (given #2 above).
> Option B is simpler, lean and more operationally efficient. However, that
> requires that we as a community, keep Spark 1's shuffle service forward
> compatible with Spark 2 i.e. don't break compatibility between Spark1's and
> Spark2's shuffle service. We could even add a test (mima?) to assert that
> during the life time of Spark2. If we do go down that way, we should revert
> SPARK-12130 <> - the
> only backwards incompatible change made to Spark2 shuffle service so far.
> My personal vote goes towards Option B and I think reverting SPARK-12130
> is ok. What do others think?
> Thanks!
> Mark

Want to work at Handy? Check out our culture deck and open roles 
Latest news <> at Handy
Handy just raised $50m 
by Fidelity

View raw message