spark-dev mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Important: Changes to Spark's build system on master branch
Date Wed, 21 Aug 2013 23:59:40 GMT
I understand this, Cos, but Jey's patch actually removes the idea of "hadoop2". You only set
SPARK_HADOOP_VERSION (which can be 1.0.x, 2.0.0-cdh4, 2.0.5-alpha, etc.) and additionally
SPARK_WITH_YARN=true if you want to run on YARN.
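
For example (this is exactly Jey's YARN example from below):

    SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_WITH_YARN=true ./sbt/sbt package assembly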

Matei

On Aug 21, 2013, at 4:50 PM, Konstantin Boudnik <cos@apache.org> wrote:

> I hear you guys - and I am well aware of the differences between the two.
> However, actual Hadoop2 doesn't even have such a thing as MR1 - which is why
> the profile naming is misleading. What you see under the current 'hadoop2'
> profile is essentially a commercial hack that doesn't exist anywhere beyond
> CDH artifacts (and even there, not for long).
> 
> Besides, YARN != MR2 :) YARN is a resource manager that, among other things,
> provides for running MR applications on it.
> 
> We can argue about semantics till we're blue in the face, but the reality is simple:
> the current 'hadoop2' profile doesn't reflect Hadoop2 facts. That's my only point.
> 
> Cos
> 
> On Wed, Aug 21, 2013 at 01:20PM, Jey Kottalam wrote:
>> As Mridul points out, the old "hadoop1" and "hadoop2" terminology
>> referred to the versions of certain interfaces and classes within
>> Hadoop. With these latest changes we have unified the handling of both
>> hadoop1 and hadoop2 interfaces so that the build is agnostic to the
>> exact Hadoop version available at runtime.
>> 
>> However, the distinction between YARN-enabled and non-YARN builds does
>> still exist. I propose that we retroactively reinterpret
>> "hadoop2-yarn" as shorthand for "Hadoop MapReduce v2 (aka YARN)".
>> 
>> -Jey
>> 
>> On Wed, Aug 21, 2013 at 1:04 PM, Mridul Muralidharan <mridul@gmail.com> wrote:
>>> hadoop2, in this context, means using spark on a hadoop cluster without
>>> yarn but with hadoop2 interfaces.
>>> hadoop2-yarn uses the yarn RM to launch a spark job (and obviously also
>>> uses hadoop2 interfaces).
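>>> 
>>> Under the old profile scheme that meant, for example:
>>> 
>>>     mvn -Phadoop2 clean verify        # hadoop2 interfaces, no yarn
>>>     mvn -Phadoop2-yarn clean verify   # hadoop2 interfaces + yarn RM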
>>> 
>>> Regards,
>>> Mridul
>>> 
>>> On Wed, Aug 21, 2013 at 11:52 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>> For what it's worth, guys - the hadoop2 profile content is misleading: CDH isn't
>>>> Hadoop2; it has 1354 patches on top of the Hadoop2 alpha.
>>>> 
>>>> What is called hadoop2-yarn is actually hadoop2. Perhaps, while we are at
>>>> it, the profiles need to be renamed. I can supply the patch if the
>>>> community is ok with it.
>>>> 
>>>> Cos
>>>> 
>>>> On Tue, Aug 20, 2013 at 11:36PM, Andy Konwinski wrote:
>>>>> Hey Jey,
>>>>> 
>>>>> I'd just like to add that you can also build against hadoop2 without
>>>>> modifying the pom.xml file by passing the hadoop.version property at the
>>>>> command line like this:
>>>>> 
>>>>> mvn -Dhadoop.version=2.0.0-mr1-cdh4.1.2 clean verify
>>>>> 
>>>>> Also, when you mentioned building with Maven in your instructions, I think
>>>>> you forgot to finish writing out your example for activating the yarn
>>>>> profile, which I think would be something like:
>>>>> 
>>>>> mvn -Phadoop2-yarn clean verify
>>>>> 
>>>>> ...right?
>>>>> 
>>>>> BTW, I've set up the AMPLab Jenkins Spark Maven Hadoop2 project to build
>>>>> using the new options:
>>>>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-Hadoop2/
>>>>> 
>>>>> Andy
>>>>> 
>>>>> On Tue, Aug 20, 2013 at 8:39 PM, Jey Kottalam <jey@cs.berkeley.edu> wrote:
>>>>> 
>>>>>> The master branch of Spark has been updated with PR #838, which
>>>>>> changes aspects of Spark's interface to Hadoop. This also involved
>>>>>> making changes to Spark's build system, as documented below. The
>>>>>> documentation will be updated with this information shortly.
>>>>>> 
>>>>>> Please feel free to reply to this thread with any questions or if you
>>>>>> encounter any problems.
>>>>>> 
>>>>>> -Jey
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> When Building Spark
>>>>>> ===================
>>>>>> 
>>>>>> - General: The default version of Hadoop has been updated to 1.2.1
>>>>>> from 1.0.4.
>>>>>> 
>>>>>> - General: You will probably need to perform an "sbt clean" or "mvn
>>>>>> clean" to remove old build files. SBT users may also need to perform a
>>>>>> "clean" when changing Hadoop versions (or at least delete the
>>>>>> lib_managed directory). For example:
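>>>>>> 
>>>>>>    ./sbt/sbt clean      # or "mvn clean" for Maven builds
>>>>>>    rm -rf lib_managed   # lighter option: just drop SBT's cached jars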
>>>>>> 
>>>>>> - SBT users: The version of Hadoop used can be specified by setting
>>>>>> the SPARK_HADOOP_VERSION environment variable when invoking sbt, and
>>>>>> YARN-enabled builds can be created by setting SPARK_WITH_YARN=true.
>>>>>> Example:
>>>>>> 
>>>>>>    # Using Hadoop 1.1.0 (a version of Hadoop without YARN)
>>>>>>    SPARK_HADOOP_VERSION=1.1.0 ./sbt/sbt package assembly
>>>>>> 
>>>>>>    # Using Hadoop 2.0.5-alpha (which is a YARN-based version of Hadoop)
>>>>>>    SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_WITH_YARN=true ./sbt/sbt package assembly
>>>>>> 
>>>>>> - Maven users: Set the Hadoop version to build against by editing the
>>>>>> "pom.xml" file in the root directory and changing the "hadoop.version"
>>>>>> property (and the "yarn.version" property, if applicable). If you are
>>>>>> building with YARN disabled, you no longer need to enable any Maven
>>>>>> profiles (i.e. "-P" flags). To build with YARN enabled, use the
>>>>>> "hadoop2-yarn" Maven profile. Example:
>>>>>> 
>>>>>> - The "make-distribution.sh" script has been updated to take
>>>>>> additional parameters to select the Hadoop version and enable YARN.
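>>>>>> 
>>>>>> For example, something like the following (check the script's usage
>>>>>> output for the exact flag names):
>>>>>> 
>>>>>>    ./make-distribution.sh --hadoop 2.0.5-alpha --with-yarn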
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> When Writing Spark Applications
>>>>>> ===============================
>>>>>> 
>>>>>> 
>>>>>> - Non-YARN users: If you wish to use HDFS, you will need to add the
>>>>>> appropriate version of the "hadoop-client" artifact from the
>>>>>> "org.apache.hadoop" group to your project.
>>>>>> 
>>>>>>    SBT example:
>>>>>>        // force() is required because "1.1.0" is lower than Spark's
>>>>>>        // default of "1.2.1" and would otherwise lose conflict resolution
>>>>>>        libraryDependencies += ("org.apache.hadoop" % "hadoop-client" % "1.1.0").force()
>>>>>> 
>>>>>>    Maven example:
>>>>>>        <dependency>
>>>>>>          <groupId>org.apache.hadoop</groupId>
>>>>>>          <artifactId>hadoop-client</artifactId>
>>>>>>          <!-- the brackets are needed to tell Maven that this is a
>>>>>>               hard dependency on version "1.1.0" exactly -->
>>>>>>          <version>[1.1.0]</version>
>>>>>>        </dependency>
>>>>>> 
>>>>>> 
>>>>>> - YARN users: You will now need to set SPARK_JAR to point to the
>>>>>> spark-yarn assembly instead of the previously used spark-core
>>>>>> assembly.
>>>>>> 
>>>>>>    SBT example:
>>>>>>        SPARK_JAR=$PWD/yarn/target/spark-yarn-assembly-0.8.0-SNAPSHOT.jar \
>>>>>>        ./run spark.deploy.yarn.Client \
>>>>>>          --jar $PWD/examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar \
>>>>>>          --class spark.examples.SparkPi --args yarn-standalone \
>>>>>>          --num-workers 3 --worker-memory 2g --master-memory 2g --worker-cores 1
>>>>>> 

