spark-user mailing list archives

From Anson Abraham <anson.abra...@gmail.com>
Subject Re: spark-shell giving me error of unread block data
Date Wed, 19 Nov 2014 22:13:40 GMT
Yeah, but in this case I'm not building anything. I just deployed the config
files in CDH5.2 and launched a spark-shell to read a file and write the output.
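
For what it's worth, a rough check from that same spark-shell session (just a
sketch, and the probe job may itself die with this same error if the mismatch
is real) is to compare the classpath the driver sees with what an executor
reports:

    // classpath of the driver JVM (the spark-shell process)
    println(System.getProperty("java.class.path"))

    // classpath of an executor JVM, fetched via a one-element job
    sc.parallelize(Seq(1), 1).map { _ =>
      System.getProperty("java.class.path")
    }.collect().foreach(println)

If the two lists point at different Spark or Hadoop jars, that would line up
with the driver/executor mismatch described below.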

On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin <vanzin@cloudera.com> wrote:

> Hi Anson,
>
> We've seen this error when incompatible classes are used in the driver
> and executors (e.g., same class name, but the classes are different
> and thus the serialized data is different). This can happen for
> example if you're including some 3rd party libraries in your app's
> jar, or changing the driver/executor class paths to include these
> conflicting libraries.
>
> Can you clarify whether any of the above apply to your case?
>
> (For example, one easy way to trigger this is to add the
> spark-examples jar shipped with CDH5.2 in the classpath of your
> driver. That's one of the reasons I filed SPARK-4048, but I digress.)
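>
> (If you want a rough way to check for that from spark-shell, here is a
> sketch; the probe job may itself fail if the classes really do differ.
> Ask the driver and an executor which jar a given Spark class was loaded
> from:
>
>   // jar the driver loaded SparkContext from
>   println(classOf[org.apache.spark.SparkContext]
>     .getProtectionDomain.getCodeSource.getLocation)
>
>   // the same question, answered on an executor
>   sc.parallelize(Seq(1), 1).map { _ =>
>     classOf[org.apache.spark.SparkContext]
>       .getProtectionDomain.getCodeSource.getLocation.toString
>   }.collect().foreach(println)
>
> If the two locations point at different jars or different builds, that is
> exactly the kind of conflict I mean.)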
>
>
> On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham <anson.abraham@gmail.com>
> wrote:
> > I'm essentially loading a file and saving output to another location:
> >
> > val source = sc.textFile("/tmp/testfile.txt")
> > source.saveAsTextFile("/tmp/testsparkoutput")
> >
> > When I do so, I'm hitting this error:
> > 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data
> >         java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
> >         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> >         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> >         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >         java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> >         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
> >         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
> >         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
> >         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         java.lang.Thread.run(Thread.java:744)
> > Driver stacktrace:
> > at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
> > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
> > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
> > at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> > at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
> > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
> > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
> > at scala.Option.foreach(Option.scala:236)
> > at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
> > at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
> > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> > at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> > at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> > at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> > at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> > at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> > at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >
> >
> > Can't figure out what the issue is.  I'm running CDH5.2 with Spark 1.1.
> > The file I'm loading is literally just 7 MB.  I thought it was a jar
> > mismatch, but I compared them and they're all identical.  And since they
> > were all installed through CDH parcels, I'm not sure how there could be
> > a version mismatch between the master and the worker nodes.  For
> > context: 1 master node with 2 worker nodes, running standalone, not
> > through YARN.  Just in case, I also copied the jars from the master to
> > the 2 worker nodes, and it's still the same issue.
> > Weird thing is, the first time I installed and tested it, it worked, but
> > now it doesn't.
> >
> > Any help here would be greatly appreciated.
>
>
>
> --
> Marcelo
>
