spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Suggestion for SPARK-1825
Date Fri, 25 Jul 2014 22:35:18 GMT
Actually reflection is probably a better, lighter-weight approach for this.
An extra project brings more overhead for something simple.
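
For illustration, something along these lines (an untested sketch; the helper
is hypothetical, not code from either patch):

  import java.lang.reflect.Method

  // Hypothetical helper: invoke a method that exists only on newer Hadoop
  // versions, falling back gracefully when it is absent at runtime.
  def invokeIfPresent(target: AnyRef, methodName: String): Option[AnyRef] = {
    try {
      val m: Method = target.getClass.getMethod(methodName)
      Some(m.invoke(target))
    } catch {
      case _: NoSuchMethodException => None // older Hadoop: method not there
    }
  }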

On Fri, Jul 25, 2014 at 3:09 PM, Colin McCabe <cmccabe@alumni.cmu.edu>
wrote:

> So, I'm leaning more towards using reflection for this.  Maven profiles
> could work, but it's tough since we have new stuff coming in 2.4, 2.5,
> etc., and the number of profiles will multiply quickly if we have to do it
> that way.  Reflection is the approach HBase took in a similar situation.
>
> best,
> Colin
>
>
> On Fri, Jul 25, 2014 at 11:23 AM, Colin McCabe <cmccabe@alumni.cmu.edu>
> wrote:
>
> > I have a similar issue with SPARK-1767.  There are basically three ways
> > to resolve the issue:
> >
> > 1. Use reflection to access classes newer than 0.21 (or whatever the
> > oldest version of Hadoop is that Spark supports)
> > 2. Add a build variant (in Maven this would be a profile) that deals with
> > this.
> > 3. Auto-detect which classes are available and use those.
> >
> > #1 is the easiest for end-users, but it can lead to some ugly code.
> >
> > #2 makes the code look nicer, but requires some effort on the part of
> > people building Spark.  This can also lead to headaches for IDEs, if people
> > don't remember to select the new profile.  (For example, in IntelliJ, you
> > can't see any of the YARN classes when you import the project from Maven
> > without the YARN profile selected.)
> >
> > #3 is something that... I don't know how to do in sbt or Maven.  I've been
> > told that an antrun task might work here, but it seems like it could get
> > really tricky.
> >
> > Overall, I'd lean more towards #2 here.
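> >
> > For what it's worth, a rough sketch of what such a build variant could
> > look like in sbt (untested, and the directory names are made up):
> >
> >   // Hypothetical fragment of project/SparkBuild.scala: compile an extra
> >   // source directory chosen by the hadoop.version system property.
> >   unmanagedSourceDirectories in Compile += {
> >     val hadoopVersion = sys.props.getOrElse("hadoop.version", "2.2.0")
> >     // Naive string comparison; good enough to illustrate the idea.
> >     val dir = if (hadoopVersion >= "2.4") "hadoop-2.4" else "hadoop-pre-2.4"
> >     baseDirectory.value / "src" / "main" / dir
> >   }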
> >
> > best,
> > Colin
> >
> >
> > On Tue, Jul 22, 2014 at 12:47 AM, innowireless TaeYun Kim <
> > taeyun.kim@innowireless.co.kr> wrote:
> >
> >> (I'm resending this mail since it seems that it was not sent. Sorry if
> >> this was already sent.)
> >>
> >> Hi,
> >>
> >>
> >>
> >> A couple of months ago, I made a pull request to fix
> >> https://issues.apache.org/jira/browse/SPARK-1825.
> >>
> >> My pull request is here: https://github.com/apache/spark/pull/899
> >>
> >>
> >>
> >> But that pull request has problems:
> >>
> >> - It is Hadoop 2.4.0+ only; it won't compile on earlier versions.
> >>
> >> - The related Hadoop API is marked as '@Unstable'.
> >>
> >>
> >>
> >> Here is an idea to remedy the problems: a new Spark configuration
> >> variable.
> >>
> >> Maybe it could be named "spark.yarn.submit.crossplatform".
> >>
> >> If it is set to "true" (the default is false), the related Spark code can
> >> use hard-coded strings identical to those the Hadoop API provides, thus
> >> avoiding compile errors on Hadoop versions below 2.4.0.
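> >>
> >> For illustration, a rough Scala sketch of the idea (untested; the helper
> >> name and the literal forms are my guesses, not code from the pull
> >> request):
> >>
> >>   import org.apache.spark.SparkConf
> >>
> >>   // Hypothetical helper: gate cross-platform variable expansion on the
> >>   // proposed flag so no Hadoop-2.4-only API is referenced at compile time.
> >>   def expandEnvVar(conf: SparkConf, name: String): String = {
> >>     if (conf.getBoolean("spark.yarn.submit.crossplatform", false)) {
> >>       // Same literal form that Hadoop 2.4's Environment.$$() produces;
> >>       // the NodeManager expands {{...}} using the cluster's own syntax.
> >>       "{{" + name + "}}"
> >>     } else {
> >>       // Old behavior: expand using the client platform's syntax.
> >>       if (sys.props("os.name").startsWith("Windows")) "%" + name + "%"
> >>       else "$" + name
> >>     }
> >>   }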
> >>
> >>
> >>
> >> Can someone implement this feature, if this idea is acceptable?
> >>
> >> Currently my knowledge of the Spark source code and Scala is too limited
> >> to implement it myself.
> >>
> >> For the right person, the modification should be trivial.
> >>
> >> You can refer to the source code changes of my pull request.
> >>
> >>
> >>
> >> Thanks.
> >>
> >
>
