spark-dev mailing list archives

From Mridul Muralidharan <mri...@gmail.com>
Subject Re: Important: Changes to Spark's build system on master branch
Date Wed, 21 Aug 2013 20:04:20 GMT
hadoop2, in this context, means running Spark on a Hadoop cluster without
YARN but against the hadoop2 interfaces.
hadoop2-yarn uses the YARN ResourceManager to launch a Spark job (and, of
course, also uses the hadoop2 interfaces).

Regards,
Mridul

On Wed, Aug 21, 2013 at 11:52 PM, Konstantin Boudnik <cos@apache.org> wrote:
> For what it's worth, guys - the hadoop2 profile content is misleading: CDH isn't
> Hadoop2: it has 1354 patches on top of the Hadoop2 alpha.
>
> What is called hadoop2-yarn is actually hadoop2. Perhaps, while we are at it,
> the profiles should be renamed. I can supply the patch if the community is OK
> with it.
>
> Cos
>
> On Tue, Aug 20, 2013 at 11:36PM, Andy Konwinski wrote:
>> Hey Jey,
>>
>> I'd just like to add that you can also run hadoop2 without modifying the
>> pom.xml file by passing the hadoop.version property at the command line
>> like this:
>>
>> mvn -Dhadoop.version=2.0.0-mr1-cdh4.1.2 clean verify
>>
>> Also, when you mentioned building with Maven in your instructions, I think
>> you forgot to finish writing out your example for activating the YARN
>> profile, which would presumably be something like:
>>
>> mvn -Phadoop2-yarn clean verify
>>
>> ...right?
>>
>> BTW, I've set up the AMPLab Jenkins Spark Maven Hadoop2 project to build
>> using the new options
>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-Hadoop2/
>>
>> Andy
>>
>> On Tue, Aug 20, 2013 at 8:39 PM, Jey Kottalam <jey@cs.berkeley.edu> wrote:
>>
>> > The master branch of Spark has been updated with PR #838, which
>> > changes aspects of Spark's interface to Hadoop. This also involved
>> > changes to Spark's build system, as documented below. The
>> > documentation will be updated with this information shortly.
>> >
>> > Please feel free to reply to this thread with any questions or if you
>> > encounter any problems.
>> >
>> > -Jey
>> >
>> >
>> >
>> > When Building Spark
>> > ===============
>> >
>> > - General: The default version of Hadoop has been updated to 1.2.1 from
>> > 1.0.4.
>> >
>> > - General: You will probably need to perform an "sbt clean" or "mvn
>> > clean" to remove old build files. SBT users may also need to perform a
>> > "clean" when changing Hadoop versions (or at least delete the
>> > lib_managed directory).
>> >
>> > - SBT users: The version of Hadoop used can be specified by setting
>> > the SPARK_HADOOP_VERSION environment variable when invoking sbt, and
>> > YARN-enabled builds can be created by setting SPARK_WITH_YARN=true.
>> > Example:
>> >
>> >     # Using Hadoop 1.1.0 (a version of Hadoop without YARN)
>> >     SPARK_HADOOP_VERSION=1.1.0 ./sbt/sbt package assembly
>> >
>> >     # Using Hadoop 2.0.5-alpha (which is a YARN-based version of Hadoop)
>> >     SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_WITH_YARN=true ./sbt/sbt package assembly
>> >
>> > - Maven users: Set the Hadoop version to build against by editing the
>> > "pom.xml" file in the root directory and changing the "hadoop.version"
>> > property (and the "yarn.version" property, if applicable). If you are
>> > building with YARN disabled, you no longer need to enable any Maven
>> > profiles (i.e. "-P" flags). To build with YARN enabled, use the
>> > "hadoop2-yarn" Maven profile. Example:
>> >
>> > - The "make-distribution.sh" script has been updated to take
>> > additional parameters to select the Hadoop version and enable YARN.
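>> > Example (the flag names below are an assumption, not taken from the
>> > message above; check the script's own usage output for the actual
>> > parameters):

```shell
# assumed invocation of the updated make-distribution.sh: pick a Hadoop
# version and enable YARN support; run the script with no arguments (or
# consult its header comment) to confirm the real flag names
./make-distribution.sh --hadoop 2.0.5-alpha --with-yarn
```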
>> >
>> >
>> >
>> > When Writing Spark Applications
>> > ========================
>> >
>> >
>> > - Non-YARN users: If you wish to use HDFS, you will need to add the
>> > appropriate version of the "hadoop-client" artifact from the
>> > "org.apache.hadoop" group to your project.
>> >
>> >     SBT example:
>> >         // "force()" is required because "1.1.0" is less than Spark's default of "1.2.1"
>> >         "org.apache.hadoop" % "hadoop-client" % "1.1.0" force()
>> >
>> >     Maven example:
>> >         <dependency>
>> >           <groupId>org.apache.hadoop</groupId>
>> >           <artifactId>hadoop-client</artifactId>
>> >           <!-- the brackets tell Maven that this is a hard dependency
>> >                on version "1.1.0" exactly -->
>> >           <version>[1.1.0]</version>
>> >         </dependency>
>> >
>> >
>> > - YARN users: You will now need to set SPARK_JAR to point to the
>> > spark-yarn assembly instead of the spark-core assembly used
>> > previously.
>> >
>> >   SBT Example:
>> >        SPARK_JAR=$PWD/yarn/target/spark-yarn-assembly-0.8.0-SNAPSHOT.jar \
>> >         ./run spark.deploy.yarn.Client \
>> >           --jar $PWD/examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar \
>> >           --class spark.examples.SparkPi --args yarn-standalone \
>> >           --num-workers 3 --worker-memory 2g --master-memory 2g --worker-cores 1
>> >
