spark-dev mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow
Date Sat, 02 Aug 2014 05:06:31 GMT
The original version numbers I reported were indeed what we had, so let me
clarify the situation.

Our application had Guava 14 because that's what Spark depends on.  But we
had added an in-house library to both the Hadoop cluster and the Spark
cluster to provide a new FileSystem (think hdfs://, s3n://, etc.), and that
library used Guava 16.  So the Guava 16 from our additional FileSystem
overrode both the Guava 11 jar from the CDH4.4.0 lib directory and the
Guava 14 class files bundled in the Spark assembly jar.  That mismatch
between 16 on the cluster and 14 on the driver caused us problems with
ImmutableLists, which must have changed between 14 and 16 in a way that
isn't binary compatible under Kryo serialization.

At least that's our current understanding of the bug we experienced.
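
For anyone chasing a similar mismatch, here is a rough sketch of how one
might check which jar a class is actually loaded from in a given JVM.  It
uses only standard JDK calls (getProtectionDomain, getCodeSource,
getLocation); the class name ClasspathProbe and the locationOf helpers are
made up for illustration and are not part of Spark, Kryo, or Guava.  The
idea is to run it with the driver's classpath and again with the
executors' classpath, then compare the output.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

// Hypothetical helper, not part of any library mentioned in this thread.
public class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or a marker if unknown. */
    public static String locationOf(Class<?> cls) {
        ProtectionDomain domain = cls.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        // Classes loaded by the bootstrap loader usually report no code source.
        return (location == null) ? "(bootstrap or unknown)" : location.toString();
    }

    /** Looks the class up by name so the probe compiles without that library present. */
    public static String locationOf(String className) {
        try {
            return locationOf(Class.forName(className));
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Defaults to Guava's ImmutableList, the class discussed above.
        String name = args.length > 0
                ? args[0]
                : "com.google.common.collect.ImmutableList";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

If the two JVMs print different jar paths for the same class name, that
points to the kind of conflict described above.  It only shows where a
class came from, not whether two versions are serialization-compatible.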


On Fri, Aug 1, 2014 at 11:13 PM, Patrick Wendell <pwendell@gmail.com> wrote:

> Andrew - I think Spark is using Guava 14... are you using Guava 16 in your
> user app (i.e. you inverted the versions in your earlier e-mail)?
>
> - Patrick
>
>
> On Fri, Aug 1, 2014 at 4:15 PM, Colin McCabe <cmccabe@alumni.cmu.edu>
> wrote:
>
> > On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash <andrew@andrewash.com> wrote:
> > > After several days of debugging, we think the issue is that we have
> > > conflicting versions of Guava.  Our application was running with Guava
> 14
> > > and the Spark services (Master, Workers, Executors) had Guava 16.  We
> had
> > > custom Kryo serializers for Guava's ImmutableLists, and commenting out
> > > those register calls did the trick.
> > >
> > > Have people had issues with Guava version mismatches in the past?
> >
> > There's some discussion about dealing with Guava version issues in
> > Spark in SPARK-2420.
> >
> > best,
> > Colin
> >
> >
> > >
> > > I've found @srowen's Guava 14 -> 11 downgrade PR here
> > > https://github.com/apache/spark/pull/1610 and some extended discussion
> > on
> > > https://issues.apache.org/jira/browse/SPARK-2420 for Hive
> compatibility
> > >
> > >
> > > On Thu, Jul 31, 2014 at 10:47 AM, Andrew Ash <andrew@andrewash.com>
> > wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I'm seeing the below exception coming out of Spark 1.0.1 when I call
> it
> > >> from my application.  I can't share the source to that application,
> but
> > the
> > >> quick gist is that it uses Spark's Java APIs to read from Avro files
> in
> > >> HDFS, do processing, and write back to Avro files.  It does this by
> > >> receiving a REST call, then spinning up a new JVM as the driver
> > application
> > >> that connects to Spark.  I'm using CDH4.4.0 and have enabled Kryo and
> > also
> > >> speculation.  The cluster is running in standalone mode on a 6 node
> > cluster
> > >> in AWS (not using Spark's EC2 scripts though).
> > >>
> > >> The below stacktraces are reliably reproducible on every run of the
> > job.
> > >>  The issue seems to be that on deserialization of a task result on the
> > >> driver, Kryo spits up while reading the ClassManifest.
> > >>
> > >> I've tried swapping in Kryo 2.23.1 rather than 2.21 (2.22 had some
> > >> backcompat issues) but had the same error.
> > >>
> > >> Any ideas on what can be done here?
> > >>
> > >> Thanks!
> > >> Andrew
> > >>
> > >>
> > >>
> > >> In the driver (Kryo exception while deserializing a DirectTaskResult):
> > >>
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 | 20:52:52.667 [Result
> resolver
> > >> thread-0] ERROR o.a.spark.scheduler.TaskResultGetter - Exception while
> > >> getting task result
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |
> > >> com.esotericsoftware.kryo.KryoException: Buffer underflow.
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >> com.esotericsoftware.kryo.io.Input.require(Input.java:156)
> > >> ~[kryo-2.21.jar:na]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >> com.esotericsoftware.kryo.io.Input.readInt(Input.java:337)
> > >> ~[kryo-2.21.jar:na]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >> com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:762)
> > >> ~[kryo-2.21.jar:na]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >> com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:624)
> > ~[kryo-2.21.jar:na]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:26)
> > >> ~[chill_2.10-0.3.6.jar:0.3.6]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:19)
> > >> ~[chill_2.10-0.3.6.jar:0.3.6]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> > >> ~[kryo-2.21.jar:na]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:147)
> > >> ~[spark-core_2.10-1.0.1.jar:1.0.1]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >> org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> > >> ~[spark-core_2.10-1.0.1.jar:1.0.1]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480)
> > >> ~[spark-core_2.10-1.0.1.jar:1.0.1]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316)
> > >> ~[spark-core_2.10-1.0.1.jar:1.0.1]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> > >> [spark-core_2.10-1.0.1.jar:1.0.1]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> > >> [spark-core_2.10-1.0.1.jar:1.0.1]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> > >> [spark-core_2.10-1.0.1.jar:1.0.1]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
> > >> [spark-core_2.10-1.0.1.jar:1.0.1]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> > >> [spark-core_2.10-1.0.1.jar:1.0.1]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >> [na:1.7.0_65]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >> [na:1.7.0_65]
> > >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
> > >> java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> > >>
> > >>
> > >> In the DAGScheduler (job gets aborted):
> > >>
> > >> org.apache.spark.SparkException: Job aborted due to stage failure:
> > >> Exception while getting task result:
> > >> com.esotericsoftware.kryo.KryoException: Buffer underflow.
> > >>     at org.apache.spark.scheduler.DAGScheduler.org
> > >>
> >
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
> > >>     at
> > >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> > >>     at
> > >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
> > >>     at
> > >>
> >
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> > >>     at
> > scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> > >>     at
> > >>
> >
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
> > >>     at
> > >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> > >>     at
> > >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> > >>     at scala.Option.foreach(Option.scala:236)
> > >>     at
> > >>
> >
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
> > >>     at
> > >>
> >
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
> > >>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> > >>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> > >>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> > >>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> > >>     at
> > >>
> >
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> > >>     at
> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> > >>     at
> > >>
> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> > >>     at
> > >>
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> > >>     at
> > >>
> >
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> > >>
> > >>
> > >> In an Executor (running tasks get killed):
> > >>
> > >> 14/07/29 22:57:38 INFO broadcast.HttpBroadcast: Started reading
> > broadcast
> > >> variable 0
> > >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill
> > task
> > >> 153
> > >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill
> > task
> > >> 147
> > >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill
> > task
> > >> 141
> > >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill
> > task
> > >> 135
> > >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill
> > task
> > >> 150
> > >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill
> > task
> > >> 144
> > >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill
> > task
> > >> 138
> > >> 14/07/29 22:57:39 INFO storage.MemoryStore: ensureFreeSpace(241733)
> > called
> > >> with curMem=0, maxMem=30870601728
> > >> 14/07/29 22:57:39 INFO storage.MemoryStore: Block broadcast_0 stored
> as
> > >> values to memory (estimated size 236.1 KB, free 28.8 GB)
> > >> 14/07/29 22:57:39 INFO broadcast.HttpBroadcast: Reading broadcast
> > variable
> > >> 0 took 0.91790748 s
> > >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> > >> locally
> > >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> > >> locally
> > >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> > >> locally
> > >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> > >> locally
> > >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> > >> locally
> > >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0
> > >> locally
> > >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 135
> > >> org.apache.spark.TaskKilledException
> > >>         at
> > >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>         at java.lang.Thread.run(Thread.java:745)
> > >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 144
> > >> org.apache.spark.TaskKilledException
> > >>         at
> > >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>         at java.lang.Thread.run(Thread.java:745)
> > >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 150
> > >> org.apache.spark.TaskKilledException
> > >>         at
> > >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>         at java.lang.Thread.run(Thread.java:745)
> > >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 138
> > >> org.apache.spark.TaskKilledException
> > >>         at
> > >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>         at java.lang.Thread.run(Thread.java:745)
> > >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 141
> > >> org.apache.spark.TaskKilledException
> > >>         at
> > >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>         at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>         at java.lang.Thread.run(Thread.java:745)
> > >>
> >
>
