spark-user mailing list archives

From Vipul Pandey <vipan...@gmail.com>
Subject Re: job failing in standalone mode
Date Fri, 13 Sep 2013 18:59:46 GMT
Didn't know about the location of those logs, so thanks for pointing that out. But no, there's
nothing useful in them.

Here's an example:

log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jEventHandler).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Spark Executor Command: "java" "-cp" ":/opt/geo/midas/spark/spark/conf:/opt/geo/midas/spark/spark/assembly/target/scala-2.9.3/spark-assembly-0.8.0-SNAPSHOT-hadoop2.0.0-cdh4.3.0.jar" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@rd..abc.xyz:60874/user/StandaloneScheduler" "3" "rd.abc.xyz.com" "4"


I'm able to write to HDFS from the spark-shell just fine. It's only this separate application
mode that's not working.
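
For anyone following the thread, here is a minimal Python sketch for collecting the per-executor stdout and stderr files that the quoted reply below refers to. It assumes the standalone worker lays them out as work/<app-id>/<executor-id>/stdout and stderr under SPARK_HOME; the function name executor_logs is made up for illustration and is not part of Spark.

```python
from pathlib import Path


def executor_logs(spark_home, app_id):
    """Yield (executor_id, stream_name, text) for each executor log file found.

    Assumed layout: <spark_home>/work/<app_id>/<executor_id>/{stdout,stderr}.
    """
    app_dir = Path(spark_home) / "work" / app_id
    if not app_dir.is_dir():
        # No work directory for this application on this node.
        return
    for exec_dir in sorted(p for p in app_dir.iterdir() if p.is_dir()):
        for name in ("stdout", "stderr"):
            log = exec_dir / name
            if log.is_file():
                # errors="replace" keeps going if a log has odd bytes in it.
                yield exec_dir.name, name, log.read_text(errors="replace")
```

Run on each worker node, something like executor_logs(os.environ["SPARK_HOME"], "app-20130912222022-0010") would return whatever those executors wrote before exiting, which may be nothing beyond the launch command, as in the example above.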





On Sep 13, 2013, at 10:52 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> Have you looked at the stdout and stderr files created for the job on the worker nodes?
> By default they're in the "work" directory under SPARK_HOME.
> 
> In my experience this either means no write permissions to the filesystem, or no Java
> found.
> 
> Matei
> 
> On Sep 12, 2013, at 10:59 PM, Vipul Pandey <vipandey@gmail.com> wrote:
> 
>> - Master Branch
>> - Standalone Mode
>> 
>> I'm able to run some basic commands in the Spark shell. But when I package the
>> same commands in a Scala object in my app and run it separately, it just fails within a second
>> without giving any reason.
>> 
>> This is what I see in the master logs:
>> 
>> 13/09/12 22:20:22 INFO Master: Registering app indexXformation
>> 13/09/12 22:20:22 INFO Master: Registered app indexXformation with ID app-20130912222022-0010
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/0 on worker worker-20130911184654.xyz.abc..com-57772
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/1 on worker worker-20130912180752-abc-vm0105.xyz.com-43175
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/2 on worker worker-20130911184654-abc-vm0108.xyz.com-39247
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/3 on worker worker-20130912183838-abc-vm0105.xyz.com-43175
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/4 on worker worker-20130911184654-abc-vm0109.xyz.com-57730
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/5 on worker worker-20130911184654-abc-vm0105.xyz.com-43175
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/6 on worker worker-20130911184654-abc-vm0106.xyz.com-43044
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/0 because it is FAILED
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/7 on worker worker-20130911184654-abc-vm0107.xyz.com-57772
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/1 because it is FAILED
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/8 on worker worker-20130912180752-abc-vm0105.xyz.com-43175
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/2 because it is FAILED
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/9 on worker worker-20130911184654-abc-vm0108.xyz.com-39247
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/4 because it is FAILED
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/10 on worker worker-20130911184654-abc-vm0109.xyz.com-57730
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/3 because it is FAILED
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/11 on worker worker-20130912183838-abc-vm0105.xyz.com-43175
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/6 because it is FAILED
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/12 on worker worker-20130911184654-abc-vm0106.xyz.com-43044
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/7 because it is FAILED
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/13 on worker worker-20130911184654-abc-vm0107.xyz.com-57772
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/5 because it is FAILED
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/14 on worker worker-20130911184654-abc-vm0105.xyz.com-43175
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/8 because it is FAILED
>> 13/09/12 22:20:22 INFO Master: Launching executor app-20130912222022-0010/15 on worker worker-20130912180752-abc-vm0105.xyz.com-43175
>> 13/09/12 22:20:22 INFO Master: Removing executor app-20130912222022-0010/9 because it is FAILED
>> 13/09/12 22:20:22 ERROR Master: Application indexXformation with ID app-20130912222022-0010 failed 10 times, removing it
>> 13/09/12 22:20:22 INFO Master: Removing app app-20130912222022-0010
>> 
>> 
>> And the slave logs say nothing at all.
>> 
>> I read lines from a file and transform them into a different form. The "transformedRDD".first
>> runs just fine and prints out the first value, but RDD.count just fails without any reason.
>> 
>> I'm unable to find out why. I'm deploying the correct Spark jar file in my project
>> as well.
>> Any clues, anyone?
>> Again, this is the master branch and in standalone mode.
>> 
>> 
>> 
>> 
>> 
> 
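
When a master log looks like the one quoted above, it can help to tally which workers the failed executors were launched on, to see whether the failures are confined to particular nodes. The following is a rough Python sketch that only assumes the "Launching executor ... on worker ..." and "Removing executor ... because it is FAILED" line shapes seen in this thread; the name summarize is hypothetical. It strips ">" quote markers and rejoins hard-wrapped lines, so text pasted from an email works too.

```python
import re
from collections import Counter

# Patterns based only on the log lines quoted in this thread.
LAUNCH = re.compile(r"Launching executor (\S+)/(\d+) on worker\s+(\S+)")
REMOVED = re.compile(r"Removing executor (\S+)/(\d+) because\s+it is (\w+)")


def summarize(log_text):
    """Return (number of executors launched, Counter of failures per worker)."""
    # Drop leading quote markers, then join lines so wrapped entries match.
    lines = (re.sub(r"^[>\s]+", "", line) for line in log_text.splitlines())
    text = " ".join(lines)

    # Map (app id, executor number) to the worker it was launched on.
    launched = {(m.group(1), m.group(2)): m.group(3)
                for m in LAUNCH.finditer(text)}

    failures = Counter()
    for m in REMOVED.finditer(text):
        if m.group(3) == "FAILED":
            worker = launched.get((m.group(1), m.group(2)), "unknown")
            failures[worker] += 1
    return len(launched), failures
```

If the failures are spread evenly across every worker, the cause is more likely something common to all of them (such as how the application is launched) than a single misconfigured node.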

