spark-user mailing list archives

From Antony Mayi <antonym...@yahoo.com.INVALID>
Subject Re: spark 1.2 defaults to MR1 class when calling newAPIHadoopRDD
Date Wed, 07 Jan 2015 13:46:32 GMT
This is the official Cloudera-compiled stack, CDH 5.3.0. I have not modified anything, and I presume
Cloudera is pretty good at building it, so I still suspect that the classpath now gets resolved
in a different way?
Thanks, Antony.

     On Wednesday, 7 January 2015, 18:55, Sean Owen <sowen@cloudera.com> wrote:
   
 

 Problems like this are always due to having code compiled for Hadoop 1.x run against Hadoop
2.x, or vice versa. Here, you compiled for 1.x but at runtime Hadoop 2.x is used.
A common cause is actually bundling Spark / Hadoop classes with your app, when the app should
just use the Spark / Hadoop provided by the cluster. It could also be that you're pairing
Spark compiled for Hadoop 1.x with a 2.x cluster.
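
One way to check for the mix described above is to scan a classpath string for both MR1 and MR2 jars. The sketch below is only an illustration, not part of the original exchange: the jar-name patterns (`hadoop-core-*.jar` for MR1, `hadoop-mapreduce-client-core-*.jar` for MR2) are assumptions based on common Hadoop/CDH naming, and the function names are hypothetical.

```python
import re

# Assumed jar naming: MR1 ships as hadoop-core-<version>.jar,
# MR2 as hadoop-mapreduce-client-core-<version>.jar.
MR1_PATTERN = re.compile(r"^hadoop-core-.*\.jar$")
MR2_PATTERN = re.compile(r"^hadoop-mapreduce-client-core-.*\.jar$")


def classify_classpath(classpath, sep=":"):
    """Return (position, kind, entry) for every MR1/MR2 jar, in classpath order."""
    hits = []
    for position, entry in enumerate(classpath.split(sep)):
        jar_name = entry.rsplit("/", 1)[-1]  # strip the directory part
        if MR2_PATTERN.match(jar_name):
            hits.append((position, "MR2", entry))
        elif MR1_PATTERN.match(jar_name):
            hits.append((position, "MR1", entry))
    return hits


def has_mixed_mapreduce(classpath, sep=":"):
    """True when both MR1 and MR2 jars appear on the same classpath."""
    kinds = {kind for _, kind, _ in classify_classpath(classpath, sep)}
    return kinds == {"MR1", "MR2"}
```

From a PySpark shell, the driver's classpath could probably be fed in via `sc._jvm.java.lang.System.getProperty("java.class.path")`, though `_jvm` is an internal attribute and entries that are directories or wildcards would not be matched by this simple scan. The position in the result shows which jar comes first, which is what decides which class gets loaded.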
On Wed, Jan 7, 2015 at 9:38 AM, Antony Mayi <antonymayi@yahoo.com.invalid> wrote:

Hi,
I am using newAPIHadoopRDD to load an RDD from HBase (using PySpark running as yarn-client),
pretty much the standard case demonstrated in hbase_inputformat.py from the examples.
The thing is that when trying the very same code on Spark 1.2 I get the error below,
which, based on similar cases on other forums, suggests an incompatibility between MR1 and MR2.
Why would this start happening now? Is it due to some change in how the classpath is resolved,
so that it now picks up the MR2 jars first, whereas before it picked up MR1?
Is there any workaround for this?
Thanks, Antony.
The error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.:
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:158)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
    at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202)
    at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500)
    at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
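
The wording of the IncompatibleClassChangeError already tells which way the mismatch goes: the kind that was "expected" is what the calling code was compiled against, and the kind that was "Found" is what got loaded at runtime. JobContext is a class in Hadoop 1.x and an interface in Hadoop 2.x, so here the caller (TableInputFormatBase) was built against the 1.x/MR1 API while the 2.x/MR2 JobContext was loaded. A minimal sketch that pulls this out of the message (the helper name and the returned keys are hypothetical, chosen for illustration):

```python
import re

# Matches e.g. "Found interface a.b.C, but class was expected".
_ICCE_PATTERN = re.compile(
    r"Found (interface|class) (\S+?),? but (interface|class) was expected"
)


def explain_icce(message):
    """Summarise an IncompatibleClassChangeError message, or return None."""
    match = _ICCE_PATTERN.search(message)
    if match is None:
        return None
    found_kind, type_name, expected_kind = match.groups()
    return {
        "type": type_name,
        "compiled_against": expected_kind,  # what the calling code was built for
        "found_at_runtime": found_kind,     # what the classloader actually provided
    }
```

Applied to the message above, this reports that the caller was compiled against a class but found an interface at runtime, which matches the MR1-compiled-code-on-MR2 direction.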
 



 
   