spark-user mailing list archives

From Ritesh Kumar Singh <riteshoneinamill...@gmail.com>
Subject Re: spark-shell giving me error of unread block data
Date Thu, 20 Nov 2014 01:06:39 GMT
As Marcelo mentioned, this error usually occurs when incompatible classes are
used by the driver and the executors.  First check whether a simple job
produces output in spark-shell (a quick check like the one below).  If it
does, the problem is most likely in your configuration files, so it would
help if you could paste the contents of the config files you edited.
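
A minimal smoke test to run in spark-shell (just a sketch; the paths are
placeholders, replace them with files that actually exist on your cluster):

  // read a small text file and write it back out; if this completes,
  // the executors can deserialize and run tasks from this driver
  val source = sc.textFile("/tmp/testfile.txt")
  println("lines read: " + source.count())
  source.saveAsTextFile("/tmp/spark-shell-smoketest")

If both count() and saveAsTextFile() succeed, task serialization between the
driver and executors is fine, and the edited config files are the next thing
to look at.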

On Thu, Nov 20, 2014 at 5:45 AM, Anson Abraham <anson.abraham@gmail.com>
wrote:

> Sorry meant cdh 5.2 w/ spark 1.1.
>
> On Wed, Nov 19, 2014, 17:41 Anson Abraham <anson.abraham@gmail.com> wrote:
>
>> yeah CDH distribution (1.1).
>>
>> On Wed Nov 19 2014 at 5:29:39 PM Marcelo Vanzin <vanzin@cloudera.com>
>> wrote:
>>
>>> On Wed, Nov 19, 2014 at 2:13 PM, Anson Abraham <anson.abraham@gmail.com>
>>> wrote:
>>> > yeah but in this case i'm not building any files.  just deployed out
>>> > config files in CDH5.2 and initiated a spark-shell to just read and
>>> > output a file.
>>>
>>> In that case it is a little bit weird. Just to be sure, you are using
>>> CDH's version of Spark, not trying to run an Apache Spark release on
>>> top of CDH, right? (If that's the case, then we could probably move
>>> this conversation to cdh-users@cloudera.org, since it would be
>>> CDH-specific.)
>>>
>>>
>>> > On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin <vanzin@cloudera.com>
>>> wrote:
>>> >>
>>> >> Hi Anson,
>>> >>
>>> >> We've seen this error when incompatible classes are used in the driver
>>> >> and executors (e.g., same class name, but the classes are different
>>> >> and thus the serialized data is different). This can happen for
>>> >> example if you're including some 3rd party libraries in your app's
>>> >> jar, or changing the driver/executor class paths to include these
>>> >> conflicting libraries.
>>> >>
>>> >> Can you clarify whether any of the above apply to your case?
>>> >>
>>> >> (For example, one easy way to trigger this is to add the
>>> >> spark-examples jar shipped with CDH5.2 in the classpath of your
>>> >> driver. That's one of the reasons I filed SPARK-4048, but I digress.)
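>>> >>
>>> >> One way to check for that kind of mismatch (just a sketch, assuming a
>>> >> working SparkContext in spark-shell; JavaSerializer is only used here
>>> >> as an example class to probe) is to compare which jar the class is
>>> >> loaded from on the driver and on the executors:
>>> >>
>>> >>   val probe = "org.apache.spark.serializer.JavaSerializer"
>>> >>   // jar providing the class on the driver
>>> >>   println(Class.forName(probe).getProtectionDomain.getCodeSource.getLocation)
>>> >>   // the same lookup, executed on the executors
>>> >>   val executorJars = sc.parallelize(1 to 2, 2).map(_ =>
>>> >>     Class.forName(probe).getProtectionDomain.getCodeSource.getLocation.toString
>>> >>   ).collect().distinct
>>> >>   executorJars.foreach(println)
>>> >>
>>> >> If the locations differ (or the executor-side lookup fails with the
>>> >> same error), the driver and executors are not loading the same Spark
>>> >> classes.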
>>> >>
>>> >>
>>> >> On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham <anson.abraham@gmail.com>
>>> >> wrote:
>>> >> > I'm essentially loading a file and saving output to another location:
>>> >> >
>>> >> > val source = sc.textFile("/tmp/testfile.txt")
>>> >> > source.saveAsTextFile("/tmp/testsparkoutput")
>>> >> >
>>> >> > When I do so, I'm hitting this error:
>>> >> > 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
>>> >> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
>>> >> > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
>>> >> > (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread
>>> >> > block data
>>> >> >         java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
>>> >> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>> >> >         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>> >> >         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>> >> >         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> >> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> >> >         java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>> >> >         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>> >> >         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>> >> >         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
>>> >> >         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >> >         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >> >         java.lang.Thread.run(Thread.java:744)
>>> >> > Driver stacktrace:
>>> >> >         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>>> >> >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>>> >> >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>>> >> >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> >> >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> >> >         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>>> >> >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>> >> >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>> >> >         at scala.Option.foreach(Option.scala:236)
>>> >> >         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>>> >> >         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>>> >> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> >> >         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> >> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> >> >         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> >> >         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> >> >         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >> >         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> >> >         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >> >         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> >> >
>>> >> >
>>> >> > Can't figure out what the issue is.  I'm running CDH5.2 with Spark
>>> >> > version 1.1.  The file I'm loading is literally just 7 MB.  I thought
>>> >> > it was a jar file mismatch, but I did a compare and they're all
>>> >> > identical.  And since they were all installed through CDH parcels, I'm
>>> >> > not sure how there would be a version mismatch between the nodes and
>>> >> > the master.  Oh yeah, it's 1 master node with 2 worker nodes, running
>>> >> > standalone, not through YARN.  Just in case, I copied the jars from
>>> >> > the master to the 2 worker nodes, and still the same issue.
>>> >> > Weird thing is, the first time I installed and tested it out, it
>>> >> > worked, but now it doesn't.
>>> >> >
>>> >> > Any help here would be greatly appreciated.
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Marcelo
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>
