spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pat Ferrel <...@occamsmachete.com>
Subject Re: Upgrade to Spark 1.2.1 using Guava
Date Mon, 02 Mar 2015 16:52:03 GMT
Marcelo’s work-around works. So if you are using the itemsimilarity stuff, the CLI has a
way to solve the class not found and I can point out how to do the equivalent if you are using
the library API. Ping me if you care.


On Feb 28, 2015, at 2:27 PM, Erlend Hamnaberg <erlend@hamnaberg.net> wrote:

Yes. I ran into this problem with mahout snapshot and spark 1.2.0 not really trying to figure
out why that was a problem, since there were already too many moving parts in my app. Obviously
there is a classpath issue somewhere.

/Erlend

On 27 Feb 2015 22:30, "Pat Ferrel" <pat@occamsmachete.com <mailto:pat@occamsmachete.com>>
wrote:
@Erlend hah, we were trying to merge your PR and ran into this—small world. You actually
compile the JavaSerializer source in your project?

@Marcelo do you mean by modifying spark.executor.extraClassPath on all workers, that didn’t
seem to work?

On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg <erlend@hamnaberg.net <mailto:erlend@hamnaberg.net>>
wrote:

Hi.

I have had a simliar issue. I had to pull the JavaSerializer source into my own project, just
so I got the classloading of this class under control.

This must be a class loader issue with spark.

-E

On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel <pat@occamsmachete.com <mailto:pat@occamsmachete.com>>
wrote:
I understand that I need to supply Guava to Spark. The HashBiMap is created in the client
and broadcast to the workers. So it is needed in both. To achieve this there is a deps.jar
with Guava (and Scopt but that is only for the client). Scopt is found so I know the jar is
fine for the client.

I pass in the deps.jar to the context creation code. I’ve checked the content of the jar
and have verified that it is used at context creation time.

I register the serializer as follows:

class MyKryoRegistrator extends KryoRegistrator {

  override def registerClasses(kryo: Kryo) = {

    val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
    //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
    log.info <http://log.info/>("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName
+ "\n\n\n") // just to be sure this does indeed get logged
    kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer())
  }
}

The job proceeds up until the broadcast value, a HashBiMap, is deserialized, which is where
I get the following error.

Have I missed a step for deserialization of broadcast values? Odd that serialization happened
but deserialization failed. I’m running on a standalone localhost-only cluster.


15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, 192.168.0.2):
java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
        at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
        at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
        at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
        ... 19 more

======== root eror ==========
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        ...








On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin <vanzin@cloudera.com <mailto:vanzin@cloudera.com>>
wrote:

Guava is not in Spark. (Well, long version: it's in Spark but it's
relocated to a different package except for some special classes
leaked through the public API.)

If your app needs Guava, it needs to package Guava with it (e.g. by
using maven-shade-plugin, or using "--jars" if only executors use
Guava).

On Wed, Feb 25, 2015 at 5:17 PM, Pat Ferrel <pat@occamsmachete.com <mailto:pat@occamsmachete.com>>
wrote:
> The root Spark pom has guava set at a certain version number. It’s very hard
> to read the shading xml. Someone suggested that I try using
> userClassPathFirst but that sounds too heavy handed since I don’t really
> care which version of guava I get, not picky.
>
> When I set my project to use the same version as Spark I get a missing
> classdef, which usually means a version conflict.
>
> At this point I am quite confused about what is actually in Spark as far as
> Guava and how to coexist with it happily.
>
> Let me rephrase my question: Does anyone know how or has anyone used Guava
> in a project? Is there a recommended way to use it in a job?

--
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org <mailto:user-unsubscribe@spark.apache.org>
For additional commands, e-mail: user-help@spark.apache.org <mailto:user-help@spark.apache.org>



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org <mailto:user-unsubscribe@spark.apache.org>
For additional commands, e-mail: user-help@spark.apache.org <mailto:user-help@spark.apache.org>





Mime
View raw message