spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shivani Rao <raoshiv...@gmail.com>
Subject Re: Worker dies while submitting a job
Date Fri, 20 Jun 2014 23:43:38 GMT
That error typically means that there is a communication error (wrong
ports) between master and worker. Also check if the worker has "write"
permissions to create the "work" directory. We were getting this error due
one of the above two reasons



On Tue, Jun 17, 2014 at 10:04 AM, Luis Ángel Vicente Sánchez <
langel.groups@gmail.com> wrote:

> I have been able to submit a job successfully but I had to config my spark
> job this way:
>
>   val sparkConf: SparkConf =
>     new SparkConf()
>       .setAppName("TwitterPopularTags")
>       .setMaster("spark://int-spark-master:7077")
>       .setSparkHome("/opt/spark")
>       .setJars(Seq("/tmp/spark-test-0.1-SNAPSHOT.jar"))
>
> Now I'm getting this error on my worker:
>
> 4/06/17 17:03:40 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient memory
>
>
>
>
> 2014-06-17 17:36 GMT+01:00 Luis Ángel Vicente Sánchez <
> langel.groups@gmail.com>:
>
> Ok... I was checking the wrong version of that file yesterday. My worker
>> is sending a DriverStateChanged(_, DriverState.FAILED, _) but there is
>> no case branch for that state and the worker is crashing. I still don't
>> know why I'm getting a FAILED state but I'm sure that should kill the actor
>> due to a scala.MatchError.
>>
>> Usually in scala is a best-practice to use a sealed trait and case
>> classes/objects in a match statement instead of an enumeration (the
>> compiler will complain about missing cases); I think that should be
>> refactored to catch this kind of errors at compile time.
>>
>> Now I need to find why that state changed message is sent... I will
>> continue updating this thread until I found the problem :D
>>
>>
>> 2014-06-16 18:25 GMT+01:00 Luis Ángel Vicente Sánchez <
>> langel.groups@gmail.com>:
>>
>> I'm playing with a modified version of the TwitterPopularTags example and
>>> when I tried to submit the job to my cluster, workers keep dying with this
>>> message:
>>>
>>> 14/06/16 17:11:16 INFO DriverRunner: Launch Command: "java" "-cp"
>>> "/opt/spark-1.0.0-bin-hadoop1/work/driver-20140616171115-0014/spark-test-0.1-SNAPSHOT.jar:::/opt/spark-1.0.0-bin-hadoop1/conf:/opt/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar"
>>> "-XX:MaxPermSize=128m" "-Xms512M" "-Xmx512M"
>>> "org.apache.spark.deploy.worker.DriverWrapper"
>>> "akka.tcp://sparkWorker@int-spark-worker:51676/user/Worker"
>>> "org.apache.spark.examples.streaming.TwitterPopularTags"
>>> 14/06/16 17:11:17 ERROR OneForOneStrategy: FAILED (of class
>>> scala.Enumeration$Val)
>>> scala.MatchError: FAILED (of class scala.Enumeration$Val)
>>> at
>>> org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:317)
>>>  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>  at
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>  at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>  at
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> 14/06/16 17:11:17 INFO Worker: Starting Spark worker
>>> int-spark-app-ie005d6a3.mclabs.io:51676 with 2 cores, 6.5 GB RAM
>>> 14/06/16 17:11:17 INFO Worker: Spark home: /opt/spark-1.0.0-bin-hadoop1
>>> 14/06/16 17:11:17 INFO WorkerWebUI: Started WorkerWebUI at
>>> http://int-spark-app-ie005d6a3.mclabs.io:8081
>>> 14/06/16 17:11:17 INFO Worker: Connecting to master
>>> spark://int-spark-app-ie005d6a3.mclabs.io:7077...
>>> 14/06/16 17:11:17 ERROR Worker: Worker registration failed: Attempted to
>>> re-register worker at same address: akka.tcp://
>>> sparkWorker@int-spark-app-ie005d6a3.mclabs.io:51676
>>>
>>> This happens when the worker receive a DriverStateChanged(driverId,
>>> state, exception) message.
>>>
>>> To deploy the job I copied the jar file to the temporary folder of
>>> master node and execute the following command:
>>>
>>> ./spark-submit \
>>> --class org.apache.spark.examples.streaming.TwitterPopularTags \
>>> --master spark://int-spark-master:7077 \
>>> --deploy-mode cluster \
>>> file:///tmp/spark-test-0.1-SNAPSHOT.jar
>>>
>>> I don't really know what the problem could be as there is a 'case _'
>>> that should avoid that problem :S
>>>
>>
>>
>


-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA

Mime
View raw message