spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow
Date Sat, 02 Aug 2014 03:13:27 GMT
Andrew, I think Spark is using Guava 14. Are you using Guava 16 in your
user app (i.e., did you invert the versions in your earlier e-mail)?

- Patrick
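One way to settle which Guava each side actually runs, rather than inferring it from build files, is to ask the JVM where a class was loaded from. The sketch below uses only JDK calls (`Class.forName`, `ProtectionDomain.getCodeSource()`); the class name `WhichJar` and its `locate` helper are hypothetical. Calling `locate("com.google.common.collect.ImmutableList")` once in the driver and once from inside a task function (which runs on an executor) would show whether the two JVMs resolved different Guava jars.

```java
import java.security.CodeSource;

public class WhichJar {
    /**
     * Hypothetical helper: returns the location (usually a jar path) the named class
     * was loaded from, "bootstrap" for JDK classes that have no code source, or
     * "not found" if the class is not visible to this class loader.
     */
    static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "com.google.common.collect.ImmutableList";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the driver and an executor print different jar paths for the same Guava class, the version mismatch described below is confirmed.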


On Fri, Aug 1, 2014 at 4:15 PM, Colin McCabe <cmccabe@alumni.cmu.edu> wrote:

> On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash <andrew@andrewash.com> wrote:
> > After several days of debugging, we think the issue is that we have
> > conflicting versions of Guava.  Our application was running with Guava 14
> > and the Spark services (Master, Workers, Executors) had Guava 16.  We had
> > custom Kryo serializers for Guava's ImmutableLists, and commenting out
> > those register calls did the trick.
> >
> > Have people had issues with Guava version mismatches in the past?
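A minimal, hypothetical illustration of one way a library version mismatch can break custom Kryo registrations: Kryo's `register(Class)` hands out the next free integer ID in call order, so if the driver and the executors register different class lists (for example, because a Guava class exists in one version but not the other), the same ID can name different classes on each side. The sketch does not use Kryo itself; `RegistrationDrift`, `Registry`, and the class names are illustrative stand-ins, IDs start at zero for simplicity, and the thread does not confirm that this is the exact failure seen here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegistrationDrift {
    /** Hypothetical Kryo-like registry: a class's ID is its position in registration order. */
    static final class Registry {
        private final List<String> names = new ArrayList<>();

        /** Registers a class name (once) and returns its ID. */
        int register(String className) {
            int existing = names.indexOf(className);
            if (existing >= 0) {
                return existing;
            }
            names.add(className);
            return names.size() - 1;
        }

        /** ID that would be written to the wire for this class, or -1 if unregistered. */
        int idOf(String className) {
            return names.indexOf(className);
        }

        /** Class name the reader would resolve for an ID read off the wire. */
        String classFor(int id) {
            return (id >= 0 && id < names.size()) ? names.get(id) : "<unknown id " + id + ">";
        }
    }

    /** Registers the given class names in order, as a registrator would on one JVM. */
    static Registry build(List<String> classNames) {
        Registry registry = new Registry();
        for (String name : classNames) {
            registry.register(name);
        }
        return registry;
    }

    public static void main(String[] args) {
        // Illustrative lists only: the writer's library version has one extra class.
        Registry writer = build(Arrays.asList("ImmutableList", "SingletonImmutableList", "app.Record", "app.Other"));
        Registry reader = build(Arrays.asList("ImmutableList", "app.Record", "app.Other"));

        int wireId = writer.idOf("app.Record");
        System.out.println("writer encodes app.Record as id " + wireId);
        System.out.println("reader decodes id " + wireId + " as " + reader.classFor(wireId));
    }
}
```

With these lists the writer encodes `app.Record` as ID 2 and the reader resolves ID 2 to `app.Other`, so it would pick the wrong serializer and could consume too few or too many bytes. Registering identical class lists on every JVM, or assigning explicit IDs with Kryo's `register(Class, int)`, avoids this kind of drift.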
>
> There's some discussion about dealing with Guava version issues in
> Spark in SPARK-2420.
>
> best,
> Colin
>
>
> >
> > I've found @srowen's Guava 14 -> 11 downgrade PR here
> > https://github.com/apache/spark/pull/1610 and some extended discussion on
> > https://issues.apache.org/jira/browse/SPARK-2420 for Hive compatibility.
> >
> >
> > On Thu, Jul 31, 2014 at 10:47 AM, Andrew Ash <andrew@andrewash.com> wrote:
> >
> >> Hi everyone,
> >>
> >> I'm seeing the exception below coming out of Spark 1.0.1 when I call it
> >> from my application.  I can't share the source of that application, but
> >> the quick gist is that it uses Spark's Java APIs to read Avro files from
> >> HDFS, do some processing, and write the results back to Avro files.  It
> >> does this by receiving a REST call and then spinning up a new JVM as the
> >> driver application that connects to Spark.  I'm using CDH4.4.0 and have
> >> enabled both Kryo and speculation.  The cluster runs in standalone mode
> >> on 6 nodes in AWS (not using Spark's EC2 scripts, though).
> >>
> >> The stack traces below reproduce reliably on every run of the job.
> >> The issue seems to be that, when the driver deserializes a task result,
> >> Kryo fails while reading the ClassManifest.
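To make "Buffer underflow" concrete: it means the reader asked for more bytes than the writer produced, which is what happens when the two sides disagree about a serialized layout. This sketch uses `java.nio.ByteBuffer` as a stand-in for Kryo's `Input`; the class `UnderflowDemo` and its "v1"/"v2" formats are hypothetical. `ByteBuffer.getInt()` throws `BufferUnderflowException` when fewer than four bytes remain, the JDK analogue of the `Input.require` failure in the driver trace.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class UnderflowDemo {
    // Hypothetical writer format "v1": element count, then the elements.
    static byte[] writeV1(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * values.length);
        buf.putInt(values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Hypothetical reader format "v2": element count, an extra flags field, then the elements.
    static int[] readV2(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        buf.getInt(); // "flags": v1 never wrote this, so every later read is shifted by 4 bytes
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = buf.getInt(); // the last iteration runs past the end of the data
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] wire = writeV1(new int[] {10, 20, 30});
        try {
            readV2(wire);
            System.out.println("decoded without error (unexpected)");
        } catch (BufferUnderflowException e) {
            System.out.println("reader ran out of bytes: " + e);
        }
    }
}
```

Upgrading Kryo alone would not help in a case like this, since the disagreement is between the serializers registered on each side rather than in Kryo's own wire format.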
> >>
> >> I've tried swapping in Kryo 2.23.1 in place of 2.21 (2.22 had some
> >> backward-compatibility issues) but got the same error.
> >>
> >> Any ideas on what can be done here?
> >>
> >> Thanks!
> >> Andrew
> >>
> >>
> >>
> >> In the driver (Kryo exception while deserializing a DirectTaskResult):
> >>
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 | 20:52:52.667 [Result resolver thread-0] ERROR o.a.spark.scheduler.TaskResultGetter - Exception while getting task result
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 | com.esotericsoftware.kryo.KryoException: Buffer underflow.
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.io.Input.require(Input.java:156) ~[kryo-2.21.jar:na]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337) ~[kryo-2.21.jar:na]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:762) ~[kryo-2.21.jar:na]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:624) ~[kryo-2.21.jar:na]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:26) ~[chill_2.10-0.3.6.jar:0.3.6]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:19) ~[chill_2.10-0.3.6.jar:0.3.6]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) ~[kryo-2.21.jar:na]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:147) ~[spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) ~[spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480) ~[spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316) ~[spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) [spark-core_2.10-1.0.1.jar:1.0.1]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
> >> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> >>
> >>
> >> In the DAGScheduler (job gets aborted):
> >>
> >> org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Buffer underflow.
> >>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
> >>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> >>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
> >>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
> >>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> >>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> >>     at scala.Option.foreach(Option.scala:236)
> >>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
> >>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
> >>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>
> >>
> >> In an Executor (running tasks get killed):
> >>
> >> 14/07/29 22:57:38 INFO broadcast.HttpBroadcast: Started reading broadcast variable 0
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 153
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 147
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 141
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 135
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 150
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 144
> >> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 138
> >> 14/07/29 22:57:39 INFO storage.MemoryStore: ensureFreeSpace(241733) called with curMem=0, maxMem=30870601728
> >> 14/07/29 22:57:39 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 236.1 KB, free 28.8 GB)
> >> 14/07/29 22:57:39 INFO broadcast.HttpBroadcast: Reading broadcast variable 0 took 0.91790748 s
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 135
> >> org.apache.spark.TaskKilledException
> >>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>         at java.lang.Thread.run(Thread.java:745)
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 144
> >> org.apache.spark.TaskKilledException
> >>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>         at java.lang.Thread.run(Thread.java:745)
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 150
> >> org.apache.spark.TaskKilledException
> >>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>         at java.lang.Thread.run(Thread.java:745)
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 138
> >> org.apache.spark.TaskKilledException
> >>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>         at java.lang.Thread.run(Thread.java:745)
> >> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 141
> >> org.apache.spark.TaskKilledException
> >>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>         at java.lang.Thread.run(Thread.java:745)
> >>
>
