spark-dev mailing list archives

From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow
Date Fri, 01 Aug 2014 23:15:28 GMT
On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash <andrew@andrewash.com> wrote:
> After several days of debugging, we think the issue is that we have
> conflicting versions of Guava.  Our application was running with Guava 14
> and the Spark services (Master, Workers, Executors) had Guava 16.  We had
> custom Kryo serializers for Guava's ImmutableLists, and commenting out
> those register calls did the trick.
>
> Have people had issues with Guava version mismatches in the past?

There's some discussion about dealing with Guava version issues in
Spark in SPARK-2420.
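
Since the fix came down to a Guava 14 vs. 16 mismatch between the application and the Spark services, one way to confirm that kind of conflict is to print which jar a Guava class is actually loaded from in each JVM. Below is a minimal, stdlib-only Java sketch. The class name ClassOrigin and its describeOrigin helper are hypothetical and not part of Spark or Guava. Run it once with the driver's classpath and once with the classpath the workers and executors use, then compare the reported jars.

```java
import java.security.CodeSource;

public class ClassOrigin {
    // Hypothetical helper: reports where a class was loaded from.
    // Returns the jar/directory URL, "bootstrap" when there is no code source
    // (typically JDK classes from the bootstrap loader), or "not found"
    // when the class is not on the classpath at all.
    static String describeOrigin(String className) {
        try {
            // initialize=false: look the class up without running static initializers
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "bootstrap";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Defaults to the Guava class the custom Kryo serializers were registered for.
        String name = args.length > 0 ? args[0] : "com.google.common.collect.ImmutableList";
        System.out.println(name + " -> " + describeOrigin(name));
    }
}
```

If the two runs report different Guava jars, serializers compiled against one version may not agree with the other side on how ImmutableList instances are written, which is consistent with the failure described in this thread.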

best,
Colin


>
> I've found @srowen's Guava 14 -> 11 downgrade PR here
> https://github.com/apache/spark/pull/1610 and some extended discussion on
> https://issues.apache.org/jira/browse/SPARK-2420 for Hive compatibility
>
>
> On Thu, Jul 31, 2014 at 10:47 AM, Andrew Ash <andrew@andrewash.com> wrote:
>
>> Hi everyone,
>>
>> I'm seeing the exception below coming out of Spark 1.0.1 when I call it
>> from my application.  I can't share the source of that application, but the
>> gist is that it uses Spark's Java APIs to read Avro files from HDFS,
>> process them, and write the results back to Avro files.  It does this by
>> receiving a REST call, then spinning up a new JVM as the driver application
>> that connects to Spark.  I'm using CDH 4.4.0 with Kryo serialization and
>> speculation enabled.  The cluster runs in standalone mode on 6 nodes
>> in AWS (not using Spark's EC2 scripts, though).
>>
>> The stack traces below reproduce reliably on every run of the job.
>> The issue seems to be that when the driver deserializes a task result,
>> Kryo fails while reading the ClassManifest.
>>
>> I've tried swapping in Kryo 2.23.1 instead of 2.21 (2.22 had some
>> backward-compatibility issues) but got the same error.
>>
>> Any ideas on what can be done here?
>>
>> Thanks!
>> Andrew
>>
>>
>>
>> In the driver (Kryo exception while deserializing a DirectTaskResult):
>>
>> INFO   | jvm 1    | 2014/07/30 20:52:52 | 20:52:52.667 [Result resolver thread-0] ERROR o.a.spark.scheduler.TaskResultGetter - Exception while getting task result
>> INFO   | jvm 1    | 2014/07/30 20:52:52 | com.esotericsoftware.kryo.KryoException: Buffer underflow.
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.io.Input.require(Input.java:156) ~[kryo-2.21.jar:na]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337) ~[kryo-2.21.jar:na]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:762) ~[kryo-2.21.jar:na]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:624) ~[kryo-2.21.jar:na]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:26) ~[chill_2.10-0.3.6.jar:0.3.6]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:19) ~[chill_2.10-0.3.6.jar:0.3.6]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) ~[kryo-2.21.jar:na]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:147) ~[spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) ~[spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480) ~[spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316) ~[spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) [spark-core_2.10-1.0.1.jar:1.0.1]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
>> INFO   | jvm 1    | 2014/07/30 20:52:52 |       at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>>
>>
>> In the DAGScheduler (job gets aborted):
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Buffer underflow.
>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>     at scala.Option.foreach(Option.scala:236)
>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>>
>> In an Executor (running tasks get killed):
>>
>> 14/07/29 22:57:38 INFO broadcast.HttpBroadcast: Started reading broadcast variable 0
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 153
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 147
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 141
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 135
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 150
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 144
>> 14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task 138
>> 14/07/29 22:57:39 INFO storage.MemoryStore: ensureFreeSpace(241733) called with curMem=0, maxMem=30870601728
>> 14/07/29 22:57:39 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 236.1 KB, free 28.8 GB)
>> 14/07/29 22:57:39 INFO broadcast.HttpBroadcast: Reading broadcast variable 0 took 0.91790748 s
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 135
>> org.apache.spark.TaskKilledException
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 144
>> org.apache.spark.TaskKilledException
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 150
>> org.apache.spark.TaskKilledException
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 138
>> org.apache.spark.TaskKilledException
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> 14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 141
>> org.apache.spark.TaskKilledException
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
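
On the error itself: the top frame of the driver trace is Kryo's Input.require(), which throws "Buffer underflow." when a read asks for more bytes than the input holds. That is the usual symptom of a writer and a reader that disagree on the byte layout, which fits the Guava/serializer mismatch identified at the top of this thread. The sketch below is a stdlib-only analogy using java.nio.ByteBuffer, not Kryo's actual code path; the class UnderflowDemo and its two length-prefix formats are hypothetical.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class UnderflowDemo {
    // Writer: a 1-byte length prefix followed by the UTF-8 bytes (strings under 256 bytes).
    static byte[] writeWithByteLength(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + data.length);
        buf.put((byte) data.length);
        buf.put(data);
        return buf.array();
    }

    // Matching reader: same 1-byte length prefix, so the round trip works.
    static String readWithByteLength(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int length = buf.get() & 0xFF;
        byte[] data = new byte[length];
        buf.get(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    // Mismatched reader: expects a 4-byte int length prefix instead.
    static String readWithIntLength(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int length = buf.getInt(); // consumes 4 bytes; itself underflows on inputs shorter than 4 bytes
        // Analogous to Kryo's Input.require(): refuse to read more bytes than remain.
        if (length < 0 || buf.remaining() < length) {
            throw new BufferUnderflowException();
        }
        byte[] data = new byte[length];
        buf.get(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = writeWithByteLength("hello");
        System.out.println(readWithByteLength(wire)); // prints: hello
        try {
            readWithIntLength(wire);
        } catch (BufferUnderflowException e) {
            System.out.println("mismatched reader: buffer underflow");
        }
    }
}
```

Running main prints "hello" and then the underflow message: the bytes are identical in both reads, and only the reader's assumption about the format differs.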
