spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sean Owen <so...@cloudera.com>
Subject Re: The default CDH4 build uses avro-mapred hadoop1
Date Fri, 20 Feb 2015 10:34:29 GMT
True, although a number of other little issues make me, personally,
not want to continue down this road:

- There are already a lot of build profiles to try to cover Hadoop versions
- I don't think it's quite right to have vendor-specific builds in
Spark to begin with
- We should be moving to only support Hadoop 2 soon IMHO anyway
- CDH4 is EOL in a few months I think

On Fri, Feb 20, 2015 at 8:30 AM, Mingyu Kim <mkim@palantir.com> wrote:
> Hi all,
>
> Related to https://issues.apache.org/jira/browse/SPARK-3039, the default
> CDH4 build, which is built with "mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0
> -DskipTests clean package", pulls in avro-mapred hadoop1 rather than
> avro-mapred hadoop2. The hadoop1 artifact is compiled against Hadoop 1,
> where TaskAttemptContext is a class; Hadoop 2 changed it to an interface,
> so running the hadoop1 artifact against Hadoop 2 classes fails with the
> same error as in the linked bug (pasted below).
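
As a rough sketch of the mechanism (illustrative, not Spark's actual
pom.xml): avro-mapred publishes its Hadoop 2 binding under a Maven
classifier, so which jar the build resolves depends on whether that
classifier is set. Roughly:

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <!-- ${avro.version} is a placeholder for whatever Avro version
           the build pins -->
      <version>${avro.version}</version>
      <!-- unset/empty resolves the default (hadoop1) jar;
           "hadoop2" resolves the Hadoop 2 binding -->
      <classifier>${avro.mapred.classifier}</classifier>
    </dependency>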
>
> The right solution would be to create a hadoop-2.0 profile that sets
> avro.mapred.classifier to hadoop2, and to build the CDH4 package with the
> "-Phadoop-2.0" option.
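
A rough sketch of what such a profile could look like (the property name
avro.mapred.classifier comes from the proposal above; the surrounding
layout is illustrative, not Spark's actual pom.xml):

    <profile>
      <id>hadoop-2.0</id>
      <properties>
        <!-- select the hadoop2 classifier of avro-mapred -->
        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
      </properties>
    </profile>

The CDH4 build command would then become:

    mvn -Phadoop-2.0 -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package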
>
> What do people think?
>
> Mingyu
>
> ——————————
>
> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>        at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
>        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
>        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>        at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>        at org.apache.spark.scheduler.Task.run(Task.scala:56)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>


