spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Guava 11 dependency issue in Spark 1.2.0
Date Tue, 06 Jan 2015 10:23:32 GMT
Oh, are you actually bundling Hadoop in your app? That may be the problem.
If you're using standalone mode, why include Hadoop? In any event, Spark
and Hadoop are intended to be 'provided' dependencies in the app you send
to spark-submit, so that the cluster's own copies are used at runtime
rather than copies packaged in your jar.
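
One way to check which Guava actually wins on the classpath is to ask the JVM where a class was loaded from and whether it has the method named in the NoSuchMethodError. The following is only a rough sketch: the class name ClasspathProbe and its helpers locationOf and hasMethod are made up for illustration and are not part of Spark, Hadoop, or Guava. It assumes you run it with the same classpath as the failing app.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or a note if the JVM does not report one. */
    public static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    /** Returns true if the named class is loadable and declares a public method with this signature. */
    public static boolean hasMethod(String className, String methodName, Class<?>... params) {
        try {
            Class.forName(className).getMethod(methodName, params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method taken from the NoSuchMethodError in this thread.
        String name = "com.google.common.hash.HashFunction";
        try {
            Class<?> cls = Class.forName(name);
            System.out.println(name + " loaded from: " + locationOf(cls));
            System.out.println("hashInt(int) present: " + hasMethod(name, "hashInt", int.class));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

If the reported location is a Guava 11 jar pulled in through the Hadoop dependency and hashInt(int) is reported as missing, that would be consistent with the conflict discussed below.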

On Tue, Jan 6, 2015 at 10:15 AM, Niranda Perera <niranda.perera@gmail.com>
wrote:

> Hi Sean,
>
> My mistake; the Guava 11 dependency did indeed come from hadoop-commons.
>
> I'm running the following simple app on a Spark 1.2.0 standalone local
> cluster (2 workers) with Hadoop 1.2.1:
>
> public class AvroSparkTest {
>     public static void main(String[] args) throws Exception {
>         SparkConf sparkConf = new SparkConf()
>                 .setMaster("spark://niranda-ThinkPad-T540p:7077") // ("local[2]")
>                 .setAppName("avro-spark-test");
>
>         JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
>         JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);
>         JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
>                 "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
>         episodes.printSchema();
>         episodes.registerTempTable("avroTable");
>         List<Row> result = sqlContext.sql("SELECT * FROM avroTable").collect();
>
>         for (Row row : result) {
>             System.out.println(row.toString());
>         }
>     }
> }
>
> As you pointed out, this error occurs when the Hadoop dependency is added.
> The app runs without a problem when the Hadoop dependency is removed and
> the master is set to local[].
>
> Cheers
>
> On Tue, Jan 6, 2015 at 3:23 PM, Sean Owen <sowen@cloudera.com> wrote:
>
>> -dev
>>
>> Guava was not downgraded to 11. That PR was not merged. It was part of a
>> discussion about, indeed, what to do about potential Guava version
>> conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.
>>
>> Spark uses 14.0.1 in fact:
>> https://github.com/apache/spark/blob/master/pom.xml#L330
>>
>> This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava
>> 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as
>> well.
>>
>> Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a
>> lot of these problems are solved. As we've seen though, this one is tricky.
>>
>> What's your Spark version, and what are you executing? What mode:
>> standalone, YARN? What Hadoop version?
>>
>>
>> On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera <niranda.perera@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have been running a simple Spark app on a local spark cluster and I
>>> came across this error.
>>>
>>> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>>>     at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
>>>     at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
>>>     at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
>>>     at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
>>>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>>>     at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
>>>     at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
>>>     at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
>>>     at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
>>>     at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
>>>     at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
>>>     at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
>>>     at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
>>>     at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
>>>     at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
>>>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
>>>     at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
>>>     at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
>>>     at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
>>>     at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
>>>     at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>>>     at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>>>     at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>>>     at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
>>>     at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
>>>     at com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
>>>     at com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
>>>     at org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>>>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>>>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>>>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>>>     at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
>>>     at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
>>>
>>>
>>> While looking into this I found out that Guava was downgraded to version
>>> 11 in this PR.
>>> https://github.com/apache/spark/pull/1610
>>>
>>> In this PR, the hashInt call at OpenHashSet.scala:261 has been changed
>>> to hashLong.
>>> But when I actually run my app, the "java.lang.NoSuchMethodError:
>>> com.google.common.hash.HashFunction.hashInt" error occurs,
>>> which is understandable because hashInt is not available before Guava 12.
>>>
>>> So I'm wondering why this occurs.
>>>
>>> Cheers
>>> --
>>> Niranda Perera
>>>
>>>
>>
>
>
> --
> Niranda
>
