spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: spark 1.2 defaults to MR1 class when calling newAPIHadoopRDD
Date Wed, 07 Jan 2015 10:55:16 GMT
Problems like this are always due to having code compiled for Hadoop 1.x
run against Hadoop 2.x, or vice versa. Here, you compiled for 1.x but at
runtime Hadoop 2.x is used.
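One way to confirm which side of the 1.x/2.x split a particular jar is on is to inspect the JobContext class it ships. The error quoted below says the JVM found an interface where a class was expected: in Hadoop 1.x org.apache.hadoop.mapreduce.JobContext is a class, while in 2.x it is an interface. What follows is a minimal Python sketch, not a Hadoop or Spark tool. It reads the class-file access flags directly, assumes you already know the path of the jar to check, and the helper names (is_interface, jobcontext_kind) are hypothetical.

```python
import struct
import zipfile

ACC_INTERFACE = 0x0200  # class-file access flag marking an interface

# Byte size of each constant-pool entry body, keyed by tag (JVM spec, ch. 4).
# Tag 1 (CONSTANT_Utf8) is variable-length and handled separately.
_CP_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
             12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def is_interface(class_bytes):
    """Return True if the raw .class file declares an interface."""
    magic, _minor, _major, cp_count = struct.unpack_from(">IHHH", class_bytes, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    offset = 10  # first byte after magic, versions and constant_pool_count
    index = 1    # constant-pool indices start at 1
    while index < cp_count:
        (tag,) = struct.unpack_from(">B", class_bytes, offset)
        offset += 1
        if tag == 1:
            (length,) = struct.unpack_from(">H", class_bytes, offset)
            offset += 2 + length
        elif tag in _CP_SIZES:
            offset += _CP_SIZES[tag]
        else:
            raise ValueError("unknown constant pool tag %d" % tag)
        # Long (5) and Double (6) entries occupy two constant-pool slots.
        index += 2 if tag in (5, 6) else 1
    # access_flags immediately follows the constant pool.
    (access_flags,) = struct.unpack_from(">H", class_bytes, offset)
    return bool(access_flags & ACC_INTERFACE)


def jobcontext_kind(jar_path, entry="org/apache/hadoop/mapreduce/JobContext.class"):
    """Return "interface" (Hadoop 2.x style) or "class" (Hadoop 1.x style)
    for the jar's JobContext, or None if the jar does not contain it."""
    with zipfile.ZipFile(jar_path) as jar:
        if entry not in jar.namelist():
            return None
        data = jar.read(entry)
    return "interface" if is_interface(data) else "class"
```

If the Hadoop jar used at runtime reports "interface" while the code calling it (here HBase's TableInputFormatBase, per the trace) was built against Hadoop 1.x, that is the compile/runtime mismatch described above.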

A common cause is actually bundling Spark / Hadoop classes with your app,
when the app should just use the Spark / Hadoop provided by the cluster. It
could also be that you're pairing Spark compiled for Hadoop 1.x with a 2.x
cluster.
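To check whether an application jar bundles Spark or Hadoop classes, you can simply list its entries. This is a minimal sketch, assuming the application is packaged as a single jar (for example an assembly jar); the function names and the PROVIDED_PREFIXES tuple are hypothetical and should be adjusted to whatever the cluster actually provides.

```python
import zipfile

# Package prefixes that the cluster is expected to supply at runtime.
# Assumption: adjust this tuple to match your cluster.
PROVIDED_PREFIXES = ("org/apache/hadoop/", "org/apache/spark/")


def bundled_provided_classes(jar_path, prefixes=PROVIDED_PREFIXES):
    """Return, sorted, the .class entries in an application jar that fall
    under packages the cluster is expected to provide."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.endswith(".class") and name.startswith(prefixes)
        )


def summarize(jar_path):
    """One-line report suitable for a build check or a quick manual look."""
    hits = bundled_provided_classes(jar_path)
    if not hits:
        return "no Spark/Hadoop classes bundled"
    return "%d Spark/Hadoop classes bundled, e.g. %s" % (len(hits), hits[0])
```

If this reports hits, the usual remedy is to mark Spark and Hadoop as provided dependencies in the build so they are left out of the application jar and the cluster's own copies are used.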

On Wed, Jan 7, 2015 at 9:38 AM, Antony Mayi <antonymayi@yahoo.com.invalid>
wrote:

> Hi,
>
> I am using newAPIHadoopRDD to load an RDD from HBase (using PySpark running
> as yarn-client), pretty much the standard case demonstrated in
> hbase_inputformat.py from the examples. The thing is that when I try the very
> same code on Spark 1.2, I get the error below, which, based on similar
> cases on other forums, suggests an incompatibility between MR1 and MR2.
>
> Why would this start happening now? Is it due to some change in how the
> classpath is resolved, so that MR2 jars are now picked up first where
> before it was MR1?
>
> Is there any workaround for this?
>
> thanks,
> Antony.
>
> The error:
>
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
> : java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
>     at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:158)
>     at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>     at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>     at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
>     at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
>     at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202)
>     at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500)
>     at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>     at py4j.Gateway.invoke(Gateway.java:259)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>     at java.lang.Thread.run(Thread.java:745)
>
