spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Honey Joshi" <honeyjo...@ideata-analytics.com>
Subject Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition
Date Thu, 03 Jul 2014 13:18:59 GMT
On Wed, July 2, 2014 2:00 am, Mayur Rustagi wrote:
> two job context cannot share data, are you collecting the data to the
> master & then sending it to the other context?
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
>
> On Wed, Jul 2, 2014 at 11:57 AM, Honey Joshi <
> honeyjoshi@ideata-analytics.com> wrote:
>
>> On Wed, July 2, 2014 1:11 am, Mayur Rustagi wrote:
>>
>>> Ideally you should be converting RDD to schemardd ?
>>> You are creating UnionRDD to join across dstream rdd?
>>>
>>>
>>>
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi
>>> <honeyjoshi@ideata-analytics.com
>>>
>>>
>>>> wrote:
>>>>
>>>>
>>>
>>>> Hi,
>>>> I am trying to run a project which takes data as a DStream and dumps
>>>> the data in the Shark table after various operations. I am getting
>>>> the following error :
>>>>
>>>> Exception in thread "main" org.apache.spark.SparkException: Job
>>>> aborted:
>>>> Task 0.0:0 failed 1 times (most recent failure: Exception failure:
>>>> java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition
>>>> cannot be cast to org.apache.spark.rdd.HadoopPartition) at
>>>>
>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$s
>>>> ched uler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>>>> at
>>>>
>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$s
>>>> ched uler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
>>>> at
>>>>
>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArra
>>>> y.sc ala:59)
>>>> at
>>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>> at org.apache.spark.scheduler.DAGScheduler.org
>>>> $apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala
>>>> :102
>>>> 6)
>>>> at
>>>>
>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.ap
>>>> ply( DAGScheduler.scala:619)
>>>> at
>>>>
>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.ap
>>>> ply( DAGScheduler.scala:619)
>>>> at scala.Option.foreach(Option.scala:236) at
>>>>
>>>> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.s
>>>> cala :619)
>>>> at
>>>>
>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$a
>>>> nonf un$receive$1.applyOrElse(DAGScheduler.scala:207)
>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at
>>>> akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at
>>>> akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>> at
>>>>
>>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(Ab
>>>> stra ctDispatcher.scala:386)
>>>> at
>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260
>>>> )
>>>> at
>>>>
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPo
>>>> ol.j ava:1339)
>>>> at
>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:
>>>> 1979
>>>> )
>>>> at
>>>>
>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerTh
>>>> read .java:107)
>>>>
>>>>
>>>>
>>>> Can someone please explain the cause of this error, I am also using
>>>> a Spark Context with the existing Streaming Context.
>>>>
>>>>
>>>>
>>>
>>
>> I am using spark 0.9.0-Incubating, so it doesnt have anything to do
>> with schemaRDD.This error is probably coming when I am trying to use one
>> spark context and one shark context in the same job.Is there any way to
>> incorporate two context in one job? Regards
>>
>>
>> Honey Joshi
>> Ideata-Analytics
>>
>>
>>
>
Both of these contexts are independently executing but they were still
giving us issues, mostly because of the lazy evaluation in scala.This
error is probably coming when I am trying to use one spark context and one
shark context in the same job.Got it resolved by stopping the existing
spark context before calling the shark context. Thanks for your help
Mayur.


Mime
View raw message