spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: count on RDD yields NoClassDefFoundError on 1.0.1
Date Tue, 15 Jul 2014 05:01:09 GMT
I don't believe the spark-ec2 scripts have been updated for 1.0.1, so you
may have to download the release yourself on the master node and rsync it
to the workers (using "~/spark-ec2/copy-dir ~/spark").


On Mon, Jul 14, 2014 at 9:49 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:

> For the record, this same code against the same dataset works fine on a
> 1.0.0 EC2 cluster.
>
>
> On Tue, Jul 15, 2014 at 12:36 AM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>
>> Like this:
>>
>> val tweets = raw.map(_.split('\t')).map(t => Tweet(t(0), t(1), t(2), t(3).toBoolean))
>>
>> raw is just an RDD of tab-delimited strings.
>>
>> scala> raw
>> res35: org.apache.spark.rdd.RDD[String] = MappedRDD[5] at repartition at <console>:23
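>>
>> An RDD like that typically comes from something along these lines (a sketch; the path is made up):
>>
>> val raw = sc.textFile("s3n://some-bucket/tweets.tsv").repartition(24)
>> // hypothetical source path; the partition count is a guess, though
>> // repartition itself matches the lineage shown above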
>>
>> Nick
>>
>>
>> On Tue, Jul 15, 2014 at 12:16 AM, Yin Huai <huaiyin.thu@gmail.com> wrote:
>>
>>> Hi Nick,
>>>
>>> How was tweets generated?
>>>
>>> Thanks,
>>>
>>> Yin
>>>
>>>
>>> On Mon, Jul 14, 2014 at 7:12 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>>>
>>>> Changing the subject since this doesn’t appear to be related to Spark
>>>> SQL specifically. I’m on a 1.0.1 EC2 cluster.
>>>>
>>>> On Mon, Jul 14, 2014 at 12:05 AM, Michael Armbrust <michael@databricks.com> wrote:
>>>>
>>>>> Are you sure the code running on the cluster has been updated?
>>>>
>>>> I’m launching the cluster using spark-ec2, so I’m assuming that’s been
>>>> taken care of.
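>>>>
>>>> A quick way to verify would be to compare the deployed builds directly. A
>>>> sketch, assuming the prebuilt-tarball layout (each Spark directory carries
>>>> a RELEASE file) and spark-ec2's slaves list:
>>>>
>>>> cat ~/spark/RELEASE                  # Spark version on the master
>>>> while read host; do
>>>>   ssh "$host" "cat ~/spark/RELEASE"  # Spark version on each worker
>>>> done < ~/spark-ec2/slaves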
>>>>
>>>>> If the above doesn't fix it, the following would be helpful:
>>>>>  - The full stack trace
>>>>>
>>>> Here’s the stack trace:
>>>>
>>>> scala> tweets
>>>> res13: org.apache.spark.rdd.RDD[Tweet] = MappedRDD[18] at map at <console>:32
>>>>
>>>> scala> tweets.count
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 756 (task 27.0:11)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
>>>> java.lang.NoClassDefFoundError: Could not initialize class $line36.$read$
>>>>     at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>>     at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>     at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
>>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:744)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 749 (task 27.0:4)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 747 (task 27.0:2)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Loss was due to java.lang.ExceptionInInitializerError
>>>> java.lang.ExceptionInInitializerError
>>>>     at $line36.$read$$iwC.<init>(<console>:6)
>>>>     at $line36.$read.<init>(<console>:48)
>>>>     at $line36.$read$.<init>(<console>:52)
>>>>     at $line36.$read$.<clinit>(<console>)
>>>>     at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>>     at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>     at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
>>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:744)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 746 (task 27.0:1)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 753 (task 27.0:8)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 752 (task 27.0:7)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 750 (task 27.0:5)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 755 (task 27.0:10)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 745 (task 27.0:0)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 748 (task 27.0:3)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 751 (task 27.0:6)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 754 (task 27.0:9)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 757 (task 27.0:13)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 758 (task 27.0:11)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 760 (task 27.0:1)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 759 (task 27.0:2)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 761 (task 27.0:12)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 762 (task 27.0:4)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 763 (task 27.0:8)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 764 (task 27.0:10)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 765 (task 27.0:0)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 766 (task 27.0:20)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 771 (task 27.0:13)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 767 (task 27.0:3)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 769 (task 27.0:7)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 768 (task 27.0:9)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 770 (task 27.0:5)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 772 (task 27.0:11)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 774 (task 27.0:14)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 776 (task 27.0:17)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 775 (task 27.0:16)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 777 (task 27.0:6)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 773 (task 27.0:15)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 778 (task 27.0:21)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 781 (task 27.0:4)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 780 (task 27.0:20)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 779 (task 27.0:10)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 782 (task 27.0:0)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 784 (task 27.0:5)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 783 (task 27.0:8)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 786 (task 27.0:14)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 785 (task 27.0:7)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 787 (task 27.0:16)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 788 (task 27.0:9)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 789 (task 27.0:15)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 790 (task 27.0:6)
>>>> 14/07/15 02:04:04 WARN TaskSetManager: Lost TID 791 (task 27.0:4)
>>>> 14/07/15 02:04:04 ERROR TaskSetManager: Task 27.0:4 failed 4 times; aborting job
>>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 27.0:4 failed 4 times, most recent failure: Exception failure in TID 791 on host ip-10-231-146-237.ec2.internal: java.lang.NoClassDefFoundError: Could not initialize class
>>>>         $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>>         $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>         org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
>>>>         org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>>         org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>>         org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>>         org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>         java.lang.Thread.run(Thread.java:744)
>>>> Driver stacktrace:
>>>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>     at scala.Option.foreach(Option.scala:236)
>>>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>
>>>> scala> 14/07/15 02:04:04 ERROR TaskSchedulerImpl: Lost executor 0 on ip-10-237-184-110.ec2.internal: Uncaught exception
>>>> 14/07/15 02:04:04 ERROR TaskSchedulerImpl: Lost executor 1 on ip-10-231-146-237.ec2.internal: Uncaught exception
>>>> 14/07/15 02:04:04 ERROR TaskSchedulerImpl: Lost executor 2 on ip-10-144-192-36.ec2.internal: remote Akka client disassociated
>>>>
>>>> The definition of Tweet is as follows:
>>>>
>>>> case class Tweet(
>>>>   user: String,
>>>>   created_at: String,
>>>>   text: String,
>>>>   is_retweet: Boolean
>>>> )
>>>>
>>>>>  - The queryExecution from the SchemaRDD (i.e. println(sql("SELECT ...").queryExecution))
>>>>
>>>> I was able to reproduce the problem this time without a query.
>>>>
>>>> Nick
>>>>
>>>
>>>
>>
>
