spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1518) Spark master doesn't compile against hadoop-common trunk
Date Thu, 29 May 2014 00:00:08 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011851#comment-14011851 ]

Patrick Wendell commented on SPARK-1518:
----------------------------------------

bq. In practice it looks like one generic Hadoop 1, Hadoop 2, and CDH 4 release is produced,
and one set of Maven artifacts. (PS: again, I am not sure Spark should contain a CDH-specific
distribution, realizing it's really a proxy for a particular Hadoop combo. The same goes for
a MapR profile, which is really for vendors to maintain.) That means right now you can't build
a Spark app for anything but Hadoop 1.x with Maven without installing it yourself, and there's
no official distro for anything but two major Hadoop versions. Support for niche versions isn't
really there or promised anyway, and fleshing out "support" may make doing so pretty burdensome.

We need to update the list of binary builds for Spark; some are getting outdated. The workflow
for people building Spark apps is that they write their app against the Spark APIs in Maven
Central (they can do this no matter which cluster they want to run on). To run the app locally,
they can use spark-submit from any compiled package of Spark, or have their build tool run it
directly. If they want to submit it to a cluster, they need a Spark package compiled for the
Hadoop version on that cluster. Because of this we distribute pre-compiled builds, so that
people can avoid ever having to compile Spark.
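For illustration, here is a minimal sketch of that workflow as an sbt build. The artifact
coordinates are the real Maven Central ones, but the project name and version numbers are
just examples, not a recommendation:

{code:scala}
// build.sbt -- compile an app against the Spark API from Maven Central.
// The cluster provides Spark at runtime, so the dependency is marked
// "provided" to keep Spark's jars out of the application assembly.
name := "my-spark-app"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"
{code}

The resulting jar can then be handed to spark-submit on whichever cluster build of Spark
matches the target Hadoop version, without rebuilding the app itself.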

In terms of vendor-specific builds, we've done this because users asked for it. It's useful
if, e.g., a user wants to submit a Spark job to a CDH or MapR cluster, or run spark-shell
locally and read data from a CDH HDFS cluster. That's the main use case we want to support.

I don't know what it means that you "can't build a Spark app" for Hadoop 2.x. Building a Spark
app is intentionally decoupled from the process of submitting it to a cluster. We want users
to be able to build Spark apps that they can run on, e.g., different versions of Hadoop.

> Spark master doesn't compile against hadoop-common trunk
> --------------------------------------------------------
>
>                 Key: SPARK-1518
>                 URL: https://issues.apache.org/jira/browse/SPARK-1518
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Marcelo Vanzin
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>
> FSDataOutputStream::sync() has disappeared from trunk in Hadoop; FileLogger.scala is calling it.
> I've changed it locally to hsync() so I can compile the code, but haven't checked yet
> whether those are equivalent. hsync() seems to have been there forever, so it hopefully works
> with all versions Spark cares about.
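For context, a minimal sketch of the change Marcelo describes, assuming the Hadoop 2.x
FSDataOutputStream API; this is not the actual FileLogger.scala code, and the path is
hypothetical:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataOutputStream, FileSystem, Path}

object HsyncSketch {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val out: FSDataOutputStream = fs.create(new Path("/tmp/spark-event-log"))
    out.writeBytes("event\n")
    // out.sync()  // removed from hadoop-common trunk; no longer compiles
    out.hsync()    // flushes and syncs buffered data out to the datanodes
    out.close()
    fs.close()
  }
}
{code}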



--
This message was sent by Atlassian JIRA
(v6.2#6252)
