spark-user mailing list archives

From Anson Abraham <anson.abra...@gmail.com>
Subject Re: spark-shell giving me error of unread block data
Date Wed, 19 Nov 2014 22:41:14 GMT
yeah CDH distribution (1.1).

On Wed Nov 19 2014 at 5:29:39 PM Marcelo Vanzin <vanzin@cloudera.com> wrote:

> On Wed, Nov 19, 2014 at 2:13 PM, Anson Abraham <anson.abraham@gmail.com>
> wrote:
> > yeah but in this case i'm not building any files.  just deployed out
> > config files in CDH5.2 and initiated a spark-shell to just read and
> > output a file.
>
> In that case it is a little bit weird. Just to be sure, you are using
> CDH's version of Spark, not trying to run an Apache Spark release on
> top of CDH, right? (If that's the case, then we could probably move
> this conversation to cdh-users@cloudera.org, since it would be
> CDH-specific.)
>
>
> > On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin <vanzin@cloudera.com>
> wrote:
> >>
> >> Hi Anson,
> >>
> >> We've seen this error when incompatible classes are used in the driver
> >> and executors (e.g., same class name, but the classes are different
> >> and thus the serialized data is different). This can happen for
> >> example if you're including some 3rd party libraries in your app's
> >> jar, or changing the driver/executor class paths to include these
> >> conflicting libraries.
> >>
> >> Can you clarify whether any of the above apply to your case?
> >>
> >> (For example, one easy way to trigger this is to add the
> >> spark-examples jar shipped with CDH5.2 in the classpath of your
> >> driver. That's one of the reasons I filed SPARK-4048, but I digress.)
> >>
> >>
> >> On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham <anson.abraham@gmail.com>
> >> wrote:
> >> > I'm essentially loading a file and saving output to another location:
> >> >
> >> > val source = sc.textFile("/tmp/testfile.txt")
> >> > source.saveAsTextFile("/tmp/testsparkoutput")
> >> >
> >> > when i do so, i'm hitting this error:
> >> > 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
> >> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data
> >> >         java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
> >> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
> >> >         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> >> >         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> >> >         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >> >         java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> >> >         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
> >> >         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
> >> >         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
> >> >         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> >         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> >         java.lang.Thread.run(Thread.java:744)
> >> > Driver stacktrace:
> >> >         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
> >> >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
> >> >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
> >> >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >> >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >> >         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
> >> >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
> >> >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
> >> >         at scala.Option.foreach(Option.scala:236)
> >> >         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
> >> >         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
> >> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >> >         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >> >         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >> >         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >> >         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >> >         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >> >         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >> >         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >> >
> >> >
> >> > Can't figure out what the issue is. I'm running CDH5.2 with Spark 1.1.
> >> > The file I'm loading is literally just 7 MB. I thought it was a jar
> >> > file mismatch, but I did a compare and they're all identical. And
> >> > seeing as they were all installed through CDH parcels, I'm not sure
> >> > how there could be a version mismatch between the worker nodes and
> >> > the master. Oh, and it's 1 master node with 2 worker nodes, running
> >> > standalone, not through YARN. Just in case, I copied the jars from
> >> > the master to the 2 worker nodes anyway, and still hit the same issue.
> >> > Weird thing is, the first time I installed and tested it, it worked,
> >> > but now it doesn't.
> >> >
> >> > Any help here would be greatly appreciated.
> >>
> >>
> >>
> >> --
> >> Marcelo
>
>
>
> --
> Marcelo
>
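[Editor's note] The jar comparison Anson describes doing by hand can be scripted. Below is a minimal sketch of the comparison step; all paths and file names are hypothetical stand-ins for the real jar directories on each node (e.g. the CDH parcel's spark/lib directory, with the worker's copy fetched locally via scp first).

```shell
# Sketch only: compare Spark jar checksums between a master and a worker.
# The two /tmp directories stand in for the jar directories of two nodes;
# paths and file names are hypothetical.
set -e

mkdir -p /tmp/master-jars /tmp/worker-jars
printf 'jar bytes v1' > /tmp/master-jars/spark-assembly.jar
printf 'jar bytes v1' > /tmp/worker-jars/spark-assembly.jar

# Checksum every jar in a directory, sorted so the two lists line up.
checksums() { (cd "$1" && md5sum -- *.jar | sort); }

checksums /tmp/master-jars > /tmp/master.md5
checksums /tmp/worker-jars > /tmp/worker.md5

# Any diff output means the driver and executors may load different
# bytecode for the same class names -- the mismatch Marcelo describes.
if diff /tmp/master.md5 /tmp/worker.md5 > /dev/null; then
  echo "jars identical"
else
  echo "jar mismatch detected"
fi
```

On a real cluster the two checksum lists would be produced on each node (for example over ssh) before diffing; this shows only the comparison step in isolation.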
