spark-user mailing list archives

From Stoney Vintson <ston...@gmail.com>
Subject Re: Loss was due to com.esotericsoftware.kryo.KryoException: Buffer overflow.
Date Sun, 06 Oct 2013 00:22:51 GMT
Ultimately, what do you want to do with the 10 GB of results? Do you want
to write it back to HDFS, HBase, etc.? Are you going to store it in Tachyon
and query it from Shark? Why turn a distributed IO write into a
non-distributed one that has to travel across the slower network?
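
For what it's worth, a minimal sketch of keeping the write distributed
(Spark 0.8-era API; the master URL, paths, and transformation are all
placeholders):

    // each executor writes its own partitions to HDFS in parallel,
    // so the results never have to squeeze through one machine
    val sc = new org.apache.spark.SparkContext("spark://master:7077", "example")
    val results = sc.textFile("hdfs://namenode/input").map(_.toUpperCase)
    results.saveAsTextFile("hdfs://namenode/output/results")
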
On Oct 5, 2013 5:06 PM, "Ryan Compton" <compton.ryan@gmail.com> wrote:

> I have 128g for each node
>
> On Sat, Oct 5, 2013 at 4:58 PM, Reynold Xin <rxin@apache.org> wrote:
> > You probably shouldn't be collecting a 10g dataset, because that is
> > going to put all 10g onto the driver node ...
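> >
> > If you only need a bounded slice on the driver, something like this
> > avoids materializing everything there (a sketch; the RDD name and the
> > count are placeholders):
> >
> >     // pull only the first 1000 records to the driver, not the whole dataset
> >     val preview = results.take(1000)
> >     preview.foreach(println)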
> >
> >
> > On Fri, Oct 4, 2013 at 6:53 PM, Ryan Compton <compton.ryan@gmail.com> wrote:
> >>
> >> Some hints: I'm doing collect() on a large (~10g??) dataset. If I
> >> shrink that down, I have no problems. I've tried
> >>
> >> System.setProperty("spark.akka.frameSize", "15420")
> >>
> >> But then I get:
> >>
> >> 13/10/04 18:49:33 ERROR client.Client$ClientActor: Failed to connect to master
> >> org.jboss.netty.channel.ChannelPipelineException: Failed to initialize a pipeline.
> >> at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:209)
> >> at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:183)
> >> at akka.remote.netty.ActiveRemoteClient$$anonfun$connect$1.apply$mcV$sp(Client.scala:173)
> >> at akka.util.Switch.liftedTree1$1(LockUtil.scala:33)
> >> at akka.util.Switch.transcend(LockUtil.scala:32)
> >> at akka.util.Switch.switchOn(LockUtil.scala:55)
> >> at akka.remote.netty.ActiveRemoteClient.connect(Client.scala:158)
> >> at akka.remote.netty.NettyRemoteTransport.send(NettyRemoteSupport.scala:153)
> >> at akka.remote.RemoteActorRef.$bang(RemoteActorRefProvider.scala:247)
> >> at org.apache.spark.deploy.client.Client$ClientActor.preStart(Client.scala:61)
> >> at akka.actor.ActorCell.create$1(ActorCell.scala:508)
> >> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:600)
> >> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:209)
> >> at akka.dispatch.Mailbox.run(Mailbox.scala:178)
> >> at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
> >> at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
> >> at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
> >> at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
> >> at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
> >> Caused by: java.lang.IllegalArgumentException: maxFrameLength must be a positive integer: -1010827264
> >> at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.<init>(LengthFieldBasedFrameDecoder.java:270)
> >> at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.<init>(LengthFieldBasedFrameDecoder.java:236)
> >> at akka.remote.netty.ActiveRemoteClientPipelineFactory.getPipeline(Client.scala:340)
> >> at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:207)
> >> ... 18 more
> >> 13/10/04 18:49:33 ERROR cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster!
> >> 13/10/04 18:49:33 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster
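> >>
> >> (The negative maxFrameLength looks like 32-bit overflow:
> >> spark.akka.frameSize is specified in MB, and 15420 MB expressed in
> >> bytes no longer fits in a signed Int. The arithmetic, checked in a
> >> Scala REPL:)
> >>
> >>     scala> 15420L * 1024 * 1024   // intended byte count, as a Long
> >>     res0: Long = 16169041920
> >>
> >>     scala> 15420 * 1024 * 1024    // same arithmetic in Int: wraps negative
> >>     res1: Int = -1010827264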
> >>
> >> On Fri, Oct 4, 2013 at 5:31 PM, Ryan Compton <compton.ryan@gmail.com> wrote:
> >> > When I turn on Kryo serialization in 0.8, my jobs fail with these
> >> > errors and I don't understand what's going wrong. Any ideas?
> >> >
> >> > I've got these properties:
> >> >
> >> >     // my usual spark props
> >> >     System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> >> >     System.setProperty("spark.kryo.registrator", classOf[OSIKryoRegistrator].getName)
> >> >     System.setProperty("spark.cores.max", "532")
> >> >     System.setProperty("spark.executor.memory", "92g")
> >> >     System.setProperty("spark.default.parallelism", "256")
> >> >     System.setProperty("spark.akka.frameSize", "1024")
> >> >     System.setProperty("spark.kryoserializer.buffer.mb","24")
> >> >
> >> > And these errors:
> >> >
> >> > 13/10/04 17:21:56 INFO cluster.ClusterTaskSetManager: Loss was due to com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 6, required: 8
> >> > Serialization trace:
> >> > longitude (com.hrl.issl.osi.geometry.Location) [duplicate 2]
> >> > 13/10/04 17:21:56 INFO cluster.ClusterTaskSetManager: Starting task 2.0:728 as TID 1490 on executor 17: node25 (PROCESS_LOCAL)
> >> > 13/10/04 17:21:56 INFO cluster.ClusterTaskSetManager: Serialized task 2.0:728 as 1892 bytes in 0 ms
> >> > 13/10/04 17:21:58 INFO cluster.ClusterTaskSetManager: Lost TID 1486 (task 2.0:730)
> >> > 13/10/04 17:21:58 INFO cluster.ClusterTaskSetManager: Loss was due to com.esotericsoftware.kryo.KryoException
> >> > com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 6, required: 8
> >> > Serialization trace:
> >> > latitude (com.hrl.issl.osi.geometry.Location)
> >> >         at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
> >> >         at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:477)
> >> >         at com.esotericsoftware.kryo.io.Output.writeDouble(Output.java:596)
> >> >         at com.esotericsoftware.kryo.serializers.DefaultSerializers$DoubleSerializer.write(DefaultSerializers.java:137)
> >> >         at com.esotericsoftware.kryo.serializers.DefaultSerializers$DoubleSerializer.write(DefaultSerializers.java:131)
> >> >         at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
> >> >         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:576)
> >> >         at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
> >> >         at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> >> >         at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:38)
> >> >         at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:34)
> >> >         at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> >> >         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
> >> >         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
> >> >         at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> >> >         at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:126)
> >> >         at org.apache.spark.scheduler.TaskResult.writeExternal(TaskResult.scala:40)
> >> >         at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1429)
> >> >         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1398)
> >> >         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
> >> >         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
> >> >         at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:27)
> >> >         at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:47)
> >> >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:171)
> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >         at java.lang.Thread.run(Thread.java:662)
> >> > 13/10/04 17:21:58 ERROR cluster.ClusterTaskSetManager: Task 2.0:730 failed more than 4 times; aborting job
> >> > 13/10/04 17:21:58 INFO cluster.ClusterScheduler: Remove TaskSet 2.0 from pool
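> >> >
> >> > (The trace shows the overflow while Kryo serializes a TaskResult,
> >> > which suggests a single serialized result is outgrowing the 24 MB
> >> > cap set by spark.kryoserializer.buffer.mb above. A sketch of the
> >> > obvious knob to try; 256 is an illustrative value, not a
> >> > recommendation:)
> >> >
> >> >     // let Kryo allocate a larger output buffer for big objects
> >> >     // (must be set before the SparkContext is created)
> >> >     System.setProperty("spark.kryoserializer.buffer.mb", "256")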
> >
> >
>
