spark-user mailing list archives

From Nitin kak <nitinkak...@gmail.com>
Subject Re: Is Spark 1.1.0 incompatible with Hive?
Date Mon, 27 Oct 2014 18:21:27 GMT
I am now on CDH 5.2, which has the Hive module packaged in it.

On Mon, Oct 27, 2014 at 2:17 PM, Michael Armbrust <michael@databricks.com>
wrote:

> Which version of CDH are you using?  I believe that Hive is not correctly
> packaged in 5.1, but it should work in 5.2.  Another option people use is
> to deploy the plain Apache version of Spark on CDH YARN.
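That last suggestion can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: the Spark download path, application jar, and config directory are placeholders for whatever your installation actually uses.

```shell
# Point a plain Apache Spark 1.1.0 build at the CDH cluster's Hadoop/YARN
# configuration (both paths below are placeholders).
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf

# Submit with the Apache build's own spark-submit instead of the
# CDH-bundled one.
/opt/spark-1.1.0-bin-hadoop2.4/bin/spark-submit \
  --master yarn-cluster \
  --class HiveContextExample \
  /path/to/hive-context-example.jar
```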
>
> On Mon, Oct 27, 2014 at 11:10 AM, Nitin kak <nitinkak001@gmail.com> wrote:
>
>> Yes, I added all the Hive jars present in the Cloudera distribution of
>> Hadoop. I added them because I was getting a ClassNotFoundException for
>> many required classes (one example stack trace below), and someone in the
>> community suggested including the Hive jars:
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/hadoop/hive/conf/HiveConf
>>         at
>> org.apache.spark.sql.hive.api.java.JavaHiveContext.<init>(JavaHiveContext.scala:30)
>>         at HiveContextExample.main(HiveContextExample.java:57)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at
>> org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:331)
>>         at
>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hive.conf.HiveConf
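For a ClassNotFoundException like this one, the Hive jars are usually handed to spark-submit explicitly rather than copied into the application jar. A hedged sketch, assuming the usual CDH parcel layout (the paths below are placeholders, not taken from this thread):

```shell
# Collect the Hive jars shipped with the CDH installation into a
# comma-separated list, as --jars expects (path is a placeholder).
HIVE_JARS=$(ls /opt/cloudera/parcels/CDH/lib/hive/lib/*.jar | tr '\n' ',')

# Pass them to spark-submit so they reach both driver and executors.
spark-submit \
  --master yarn-cluster \
  --class HiveContextExample \
  --jars "${HIVE_JARS%,}" \
  /path/to/hive-context-example.jar
```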
>>
>> On Mon, Oct 27, 2014 at 1:57 PM, Michael Armbrust <michael@databricks.com
>> > wrote:
>>
>>> A NoSuchMethodError almost always means you are mixing different
>>> versions of the same library on the classpath.  In this case it looks like
>>> you have more than one version of Guava.  Have you added anything to the
>>> classpath?
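One way to check is to ask the JVM where a conflicting class actually comes from. A small diagnostic sketch (the class names used here are illustrative, not from this thread):

```java
import java.security.CodeSource;

public class WhichJar {
    // Report where a class was loaded from; "bootstrap" means it came
    // from the JDK itself rather than a jar on the classpath.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return c.getName() + " <- "
                    + (src == null ? "bootstrap" : src.getLocation());
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        }
    }

    public static void main(String[] args) {
        // On the cluster, point this at the class from the error, e.g.
        // com.google.common.hash.HashFunction, to see which jar wins.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println(locate(name));
    }
}
```

Running it with the fully qualified name from the error message shows which jar's copy of the class is shadowing the others.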
>>>
>>> On Mon, Oct 27, 2014 at 8:36 AM, nitinkak001 <nitinkak001@gmail.com>
>>> wrote:
>>>
>>>> I am working on running the following Hive query from Spark:
>>>>
>>>> "SELECT * FROM spark_poc.<table_name> DISTRIBUTE BY GEO_REGION,
>>>> GEO_COUNTRY
>>>> SORT BY IP_ADDRESS, COOKIE_ID"
>>>>
>>>> It ran into java.lang.NoSuchMethodError:
>>>> com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>>>> (complete stack trace at the bottom). I found a few mentions of this
>>>> issue in the user list. It seems (from the thread linked below) that
>>>> there is a Guava version incompatibility between Spark 1.1.0 and Hive,
>>>> which is probably fixed in 1.2.0:
>>>>
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-td10110.html#a12671
>>>>
>>>> So I wanted to confirm: is Spark SQL 1.1.0 incompatible with Hive, or is
>>>> there a workaround?
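For context, the Java side of the failing program boils down to roughly the following. This is a reconstruction from the stack trace, assuming the Spark 1.1.0 Java API; the table name and application setup are placeholders, not the original source:

```java
import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class HiveContextExample {
    public static void main(String[] args) {
        // Loads master/app settings from system properties set by
        // spark-submit; a SparkConf could be passed explicitly instead.
        JavaSparkContext sc = new JavaSparkContext();

        // Constructing JavaHiveContext is where the earlier
        // NoClassDefFoundError for HiveConf surfaced when the Hive jars
        // were missing from the classpath.
        JavaHiveContext hive = new JavaHiveContext(sc);

        List<Row> rows = hive.sql(
                "SELECT * FROM spark_poc.some_table "  // placeholder table
              + "DISTRIBUTE BY GEO_REGION, GEO_COUNTRY "
              + "SORT BY IP_ADDRESS, COOKIE_ID").collect();
        for (Row row : rows) {
            System.out.println(row);
        }
        sc.stop();
    }
}
```

The collect() call is what triggers planning and the broadcast that fails inside OpenHashSet in the trace below.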
>>>>
>>>>
>>>>
>>>> Exception in thread "Driver"
>>>> java.lang.reflect.InvocationTargetException
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at
>>>>
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>         at
>>>>
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>>         at
>>>>
>>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
>>>> Caused by: java.lang.NoSuchMethodError:
>>>>
>>>> com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>>>>         at
>>>> org.apache.spark.util.collection.OpenHashSet.org
>>>> $apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
>>>>         at
>>>>
>>>> org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
>>>>         at
>>>>
>>>> org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
>>>>         at
>>>>
>>>> org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
>>>>         at
>>>> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>>>>         at
>>>> org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
>>>>         at
>>>>
>>>> org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
>>>>         at
>>>>
>>>> org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
>>>>         at
>>>> org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
>>>>         at
>>>>
>>>> org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
>>>>         at
>>>>
>>>> org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
>>>>         at
>>>>
>>>> org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
>>>>         at
>>>> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:236)
>>>>         at
>>>> org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:126)
>>>>         at
>>>> org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:104)
>>>>         at
>>>> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:750)
>>>>         at
>>>>
>>>> org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:601)
>>>>         at
>>>> org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:872)
>>>>         at
>>>>
>>>> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79)
>>>>         at
>>>>
>>>> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
>>>>         at
>>>>
>>>> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
>>>>         at
>>>>
>>>> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>>>>         at
>>>>
>>>> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>>>>         at
>>>> org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
>>>>         at
>>>> org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:68)
>>>>         at
>>>>
>>>> org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:68)
>>>>         at
>>>>
>>>> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
>>>>         at
>>>>
>>>> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
>>>>         at
>>>>
>>>> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
>>>>         at
>>>>
>>>> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>         at
>>>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>>>>         at
>>>>
>>>> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:292)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>         at
>>>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>>>>         at
>>>>
>>>> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:266)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>         at
>>>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>>         at
>>>>
>>>> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>>>         at
>>>>
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
>>>>         at
>>>>
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
>>>>         at
>>>>
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
>>>>         at
>>>>
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
>>>>         at
>>>>
>>>> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
>>>>         at
>>>>
>>>> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
>>>>         at
>>>> org.apache.spark.sql.SchemaRDD.getDependencies(SchemaRDD.scala:120)
>>>>         at
>>>> org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:191)
>>>>         at
>>>> org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:189)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.dependencies(RDD.scala:189)
>>>>         at org.apache.spark.rdd.RDD.firstParent(RDD.scala:1236)
>>>>         at
>>>> org.apache.spark.sql.SchemaRDD.getPartitions(SchemaRDD.scala:117)
>>>>         at
>>>> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>>>         at
>>>> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>>>         at
>>>> org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>>>>         at
>>>> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>>>         at
>>>> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>>>         at
>>>>
>>>> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>         at
>>>> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>>>         at
>>>> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>>>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
>>>>         at org.apache.spark.rdd.RDD.collect(RDD.scala:774)
>>>>         at
>>>>
>>>> org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:305)
>>>>         at org.apache.spark.api.java.JavaRDD.collect(JavaRDD.scala:32)
>>>>         at HiveContextExample.printRDD(HiveContextExample.java:77)
>>>>         at HiveContextExample.main(HiveContextExample.java:71)
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-1-1-0-incompatible-with-Hive-tp17364.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>>
>>>>
>>>
>>
>
