spark-dev mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: Important: Changes to Spark's build system on master branch
Date Wed, 21 Aug 2013 18:22:00 GMT
For what it's worth, guys: the hadoop2 profile's contents are misleading. CDH isn't
Hadoop 2: it has 1354 patches on top of a Hadoop 2 alpha.

What is called hadoop2-yarn is actually hadoop2. Perhaps, while we are at it,
the profiles should be renamed. I can supply a patch if the community is OK
with it.

Cos
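
To make the proposal concrete, the rename would presumably change the profile
flags along these lines (the post-rename names are hypothetical, not an agreed
convention):

```shell
# Today: "-Phadoop2-yarn" selects what is, per the message above, stock Hadoop 2
mvn -Phadoop2-yarn clean verify

# After a rename it might read, e.g. (hypothetical profile names):
#   mvn -Phadoop2 clean verify   # stock Hadoop 2 (YARN)
#   mvn -Pcdh4 clean verify      # CDH's patched distribution
```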
 
On Tue, Aug 20, 2013 at 11:36 PM, Andy Konwinski wrote:
> Hey Jey,
> 
> I'd just like to add that you can also run hadoop2 without modifying the
> pom.xml file by passing the hadoop.version property at the command line
> like this:
> 
> mvn -Dhadoop.version=2.0.0-mr1-cdh4.1.2 clean verify
> 
> Also, when you mentioned building with Maven in your instructions, I think
> you forgot to finish writing out the example for activating the YARN
> profile, which I think would be something like:
> 
> mvn -Phadoop2-yarn clean verify
> 
> ...right?
> 
> BTW, I've set up the AMPLab Jenkins Spark Maven Hadoop2 project to build
> using the new options
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-Hadoop2/
> 
> Andy
> 
> On Tue, Aug 20, 2013 at 8:39 PM, Jey Kottalam <jey@cs.berkeley.edu> wrote:
> 
> > The master branch of Spark has been updated with PR #838, which
> > changes aspects of Spark's interface to Hadoop. This also involved
> > making changes to Spark's build system, as documented below. The
> > documentation will be updated with this information shortly.
> >
> > Please feel free to reply to this thread with any questions or if you
> > encounter any problems.
> >
> > -Jey
> >
> >
> >
> > When Building Spark
> > ===================
> >
> > - General: The default version of Hadoop has been updated to 1.2.1 from
> > 1.0.4.
> >
> > - General: You will probably need to perform an "sbt clean" or "mvn
> > clean" to remove old build files. SBT users may also need to perform a
> > "clean" when changing Hadoop versions (or at least delete the
> > lib_managed directory).
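
Concretely, the clean-up described above amounts to the following (the
lib_managed path is relative to the Spark source root):

```shell
# SBT users: clean stale build output, especially after changing Hadoop versions
./sbt/sbt clean
# ...or, at minimum, drop the cached managed dependencies
rm -rf lib_managed

# Maven users
mvn clean
```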
> >
> > - SBT users: The version of Hadoop used can be specified by setting
> > the SPARK_HADOOP_VERSION environment variable when invoking sbt, and
> > YARN-enabled builds can be created by setting SPARK_WITH_YARN=true.
> > Example:
> >
> >     # Using Hadoop 1.1.0 (a version of Hadoop without YARN)
> >     SPARK_HADOOP_VERSION=1.1.0 ./sbt/sbt package assembly
> >
> >     # Using Hadoop 2.0.5-alpha (which is a YARN-based version of Hadoop)
> >     SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_WITH_YARN=true \
> >       ./sbt/sbt package assembly
> >
> > - Maven users: Set the Hadoop version to build against by editing the
> > "pom.xml" file in the root directory and changing the "hadoop.version"
> > property (and the "yarn.version" property, if applicable). If you are
> > building with YARN disabled, you no longer need to enable any Maven
> > profiles (i.e. "-P" flags). To build with YARN enabled, use the
> > "hadoop2-yarn" Maven profile. Example:
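
The example command appears to have been cut off here; judging from the
profile name above (and Andy's reply upthread), the intended invocation is
presumably:

```shell
# Activate the hadoop2-yarn profile when building with Maven
mvn -Phadoop2-yarn clean verify
```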
> >
> > - The "make-distribution.sh" script has been updated to take
> > additional parameters to select the Hadoop version and enable YARN.
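
As a sketch, assuming the script's new parameters are named --hadoop and
--with-yarn (the flag names are not confirmed in this thread), usage might
look like:

```shell
# Hypothetical flags; check ./make-distribution.sh --help for the actual options
./make-distribution.sh --hadoop 2.0.5-alpha --with-yarn
```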
> >
> >
> >
> > When Writing Spark Applications
> > ===============================
> >
> >
> > - Non-YARN users: If you wish to use HDFS, you will need to add the
> > appropriate version of the "hadoop-client" artifact from the
> > "org.apache.hadoop" group to your project.
> >
> >     SBT example:
> >         // "force()" is required because "1.1.0" is less than
> >         // Spark's default of "1.2.1"
> >         "org.apache.hadoop" % "hadoop-client" % "1.1.0" force()
> >
> >     Maven example:
> >         <dependency>
> >           <groupId>org.apache.hadoop</groupId>
> >           <artifactId>hadoop-client</artifactId>
> >           <!-- the brackets are needed to tell Maven that this is
> >                a hard dependency on version "1.1.0" exactly -->
> >           <version>[1.1.0]</version>
> >         </dependency>
> >
> >
> > - YARN users: You will now need to set SPARK_JAR to point to the
> > spark-yarn assembly instead of the spark-core assembly previously
> > used.
> >
> >   SBT Example:
> >        SPARK_JAR=$PWD/yarn/target/spark-yarn-assembly-0.8.0-SNAPSHOT.jar \
> >          ./run spark.deploy.yarn.Client \
> >            --jar $PWD/examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar \
> >            --class spark.examples.SparkPi --args yarn-standalone \
> >            --num-workers 3 --worker-memory 2g --master-memory 2g \
> >            --worker-cores 1
> >
