spark-dev mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))
Date Sun, 08 Mar 2015 21:56:31 GMT
Ah, I misunderstood: Matei was referring to the Scala 2.11 tarball
at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
Maven artifacts.

Patrick, I see you just commented on SPARK-5134 and will follow up
there. Sounds like this may accidentally not be a problem.

On binary tarball releases, I wonder if anyone has a view on my
position that these shouldn't be distributed for specific Hadoop
*distributions* to begin with. (I won't repeat the argument here yet.)
That would resolve this n x m explosion too.

Vendors already provide their own distributions; that's their job.
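
To make the n x m point concrete, here is a rough sketch of how the number of binary tarballs grows with each build axis. The Scala versions are the two named in this thread; the Hadoop build labels are hypothetical placeholders, not the actual list of published packages, and `binary_matrix` is an illustrative helper rather than anything in the Spark build.

```python
from itertools import product

# Scala versions named in the thread.
scala_versions = ["2.10", "2.11"]

# Hypothetical Hadoop build labels, for illustration only;
# this is NOT the real list of published Spark binaries.
hadoop_builds = ["hadoop1", "hadoop2.4", "vendor-a", "vendor-b"]


def binary_matrix(scala, hadoop):
    """Return every (Scala, Hadoop) pair that would need its own tarball."""
    return list(product(scala, hadoop))


# Cross building every Scala version against every Hadoop build.
full = binary_matrix(scala_versions, hadoop_builds)

# Assumed alternative: one "Hadoop provided" build per Scala version,
# with distribution-specific packaging left to vendors.
reduced = binary_matrix(scala_versions, ["hadoop-provided"])

print(len(full), len(reduced))
```

Under these assumed inputs, every additional Hadoop build adds one tarball per Scala version, whereas a single "Hadoop provided" build keeps the total equal to the number of Scala versions.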


On Sun, Mar 8, 2015 at 9:42 PM, Krishna Sankar <ksankar42@gmail.com> wrote:
> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
> Distributions X ...
>
> Maybe one option is to have a minimum basic set (which I know is what we
> are discussing) and move the rest to spark-packages.org. There the vendors
> can add the latest downloads - for example, when 1.4 is released, HDP can
> build a release of an HDP Spark 1.4 bundle.
>
> Cheers
> <k/>
>
> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell <pwendell@gmail.com> wrote:
>>
>> We probably want to revisit the way we do binaries in general for
>> 1.4+. IMO, something worth forking a separate thread for.
>>
>> I've been hesitating to add new binaries because people
>> (understandably) complain if you ever stop packaging older ones, but
>> on the other hand the ASF has complained that we have too many
>> binaries already and that we need to pare it down because of the large
>> volume of files. Doubling the number of binaries we produce for Scala
>> 2.11 seemed like it would be too much.
>>
>> One solution potentially is to actually package "Hadoop provided"
>> binaries and encourage users to use these by simply setting
>> HADOOP_HOME, or have instructions for specific distros. I've heard
>> that our existing packages don't work well on HDP for instance, since
>> there are some configuration quirks that differ from the upstream
>> Hadoop.
>>
>> If we cut down on the cross building for Hadoop versions, then it is
>> more tenable to cross build for Scala versions without exploding the
>> number of binaries.
>>
>> - Patrick
>>
>> On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen <sowen@cloudera.com> wrote:
>> > Yeah, interesting question of what is the better default for the
>> > single set of artifacts published to Maven. I think there's an
>> > argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
>> > and cons discussed more at
>> >
>> > https://issues.apache.org/jira/browse/SPARK-5134
>> > https://github.com/apache/spark/pull/3917
>> >
>> > On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia <matei.zaharia@gmail.com>
>> > wrote:
>> >> +1
>> >>
>> >> Tested it on Mac OS X.
>> >>
>> >> One small issue I noticed is that the Scala 2.11 build is using Hadoop
>> >> 1 without Hive, which is kind of weird because people will more likely want
>> >> Hadoop 2 with Hive. So it would be good to publish a build for that
>> >> configuration instead. We can do it if we do a new RC, or it might be that
>> >> binary builds may not need to be voted on (I forgot the details there).
>> >>
>> >> Matei
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>


