spark-user mailing list archives

From Manuel Sopena Ballesteros <manuel...@garvan.org.au>
Subject RE: spark-submit can't find python?
Date Tue, 16 Jan 2018 00:02:18 GMT
Apologies, I copied the wrong spark-submit output from the cluster run. Please find below
the correct output for the question asked:

-bash-4.1$ spark-submit --master yarn \
>     --deploy-mode cluster \
>     --driver-memory 4g \
>     --executor-memory 2g \
>     --executor-cores 4 \
>     --queue default \
>     --conf spark.pyspark.virtualenv.enabled=true \
>     --conf spark.pyspark.virtualenv.type=native \
>     --conf spark.pyspark.virtualenv.requirements=/home/mansop/requirements.txt \
>     --conf spark.pyspark.virtualenv.bin.path=/home/mansop/hail-test/python-2.7.2/bin/activate \
>     --jars $HAIL_HOME/build/libs/hail-all-spark.jar \
>     --py-files $HAIL_HOME/build/distributions/hail-python.zip \
>     test.py

18/01/16 10:42:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable
18/01/16 10:42:50 WARN DomainSocketFactory: The short-circuit local reads feature cannot be
used because libhadoop cannot be loaded.
18/01/16 10:42:50 INFO RMProxy: Connecting to ResourceManager at wp-hdp-ctrl03-mlx.mlx/10.0.1.206:8050
18/01/16 10:42:50 INFO Client: Requesting a new application from cluster with 4 NodeManagers
18/01/16 10:42:50 INFO Client: Verifying our application has not requested more than the maximum
memory capability of the cluster (450560 MB per container)
18/01/16 10:42:50 INFO Client: Will allocate AM container, with 4505 MB memory including 409
MB overhead
18/01/16 10:42:50 INFO Client: Setting up container launch context for our AM
18/01/16 10:42:50 INFO Client: Setting up the launch environment for our AM container
18/01/16 10:42:50 INFO Client: Preparing resources for our AM container
18/01/16 10:42:51 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://wp-hdp-ctrl01-mlx.mlx:8020/hdp/apps/2.6.3.0-235/spark2/spark2-hdp-yarn-archive.tar.gz
18/01/16 10:42:51 INFO Client: Source and destination file systems are the same. Not copying
hdfs://wp-hdp-ctrl01-mlx.mlx:8020/hdp/apps/2.6.3.0-235/spark2/spark2-hdp-yarn-archive.tar.gz
18/01/16 10:42:51 INFO Client: Uploading resource file:/home/mansop/hail-test2/hail/build/libs/hail-all-spark.jar
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045/hail-all-spark.jar
18/01/16 10:42:51 INFO Client: Uploading resource file:/home/mansop/requirements.txt ->
hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045/requirements.txt
18/01/16 10:42:51 INFO Client: Uploading resource file:/home/mansop/test.py -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045/test.py
18/01/16 10:42:51 INFO Client: Uploading resource file:/usr/hdp/2.6.3.0-235/spark2/python/lib/pyspark.zip
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045/pyspark.zip
18/01/16 10:42:51 INFO Client: Uploading resource file:/usr/hdp/2.6.3.0-235/spark2/python/lib/py4j-0.10.4-src.zip
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045/py4j-0.10.4-src.zip
18/01/16 10:42:51 INFO Client: Uploading resource file:/home/mansop/hail-test2/hail/build/distributions/hail-python.zip
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045/hail-python.zip
18/01/16 10:42:52 INFO Client: Uploading resource file:/tmp/spark-592e7e0f-6faa-4c3c-ab0f-7dd1cff21d17/__spark_conf__8493747840734310444.zip
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045/__spark_conf__.zip
18/01/16 10:42:52 INFO SecurityManager: Changing view acls to: mansop
18/01/16 10:42:52 INFO SecurityManager: Changing modify acls to: mansop
18/01/16 10:42:52 INFO SecurityManager: Changing view acls groups to:
18/01/16 10:42:52 INFO SecurityManager: Changing modify acls groups to:
18/01/16 10:42:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls
disabled; users  with view permissions: Set(mansop); groups with view permissions: Set();
users  with modify permissions: Set(mansop); groups with modify permissions: Set()
18/01/16 10:42:52 INFO Client: Submitting application application_1512016123441_0045 to ResourceManager
18/01/16 10:42:52 INFO YarnClientImpl: Submitted application application_1512016123441_0045
18/01/16 10:42:53 INFO Client: Application report for application_1512016123441_0045 (state:
ACCEPTED)
18/01/16 10:42:53 INFO Client:
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with
RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1516059772092
         final status: UNDEFINED
         tracking URL: http://wp-hdp-ctrl03-mlx.mlx:8088/proxy/application_1512016123441_0045/
         user: mansop
18/01/16 10:42:54 INFO Client: Application report for application_1512016123441_0045 (state:
ACCEPTED)
18/01/16 10:42:55 INFO Client: Application report for application_1512016123441_0045 (state:
ACCEPTED)
18/01/16 10:42:56 INFO Client: Application report for application_1512016123441_0045 (state:
ACCEPTED)
18/01/16 10:42:57 INFO Client: Application report for application_1512016123441_0045 (state:
ACCEPTED)
18/01/16 10:42:58 INFO Client: Application report for application_1512016123441_0045 (state:
ACCEPTED)
18/01/16 10:42:59 INFO Client: Application report for application_1512016123441_0045 (state:
ACCEPTED)
18/01/16 10:43:00 INFO Client: Application report for application_1512016123441_0045 (state:
ACCEPTED)
18/01/16 10:43:01 INFO Client: Application report for application_1512016123441_0045 (state:
FAILED)
18/01/16 10:43:01 INFO Client:
         client token: N/A
         diagnostics: Application application_1512016123441_0045 failed 2 times due to AM
Container for appattempt_1512016123441_0045_000002 exited with  exitCode: 15
For more detailed output, check the application tracking page: http://wp-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1512016123441_0045
Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1512016123441_0045_02_000001
Exit code: 15

Container exited with a non-zero exit code 15. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
52)
        at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:646)
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 9 more
18/01/16 10:43:00 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason:
User class threw exception: java.io.IOException: Cannot run program "/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_000001/tmp/1516059780057-0/bin/python":
error=2, No such file or directory)
18/01/16 10:43:00 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Cannot run program "/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_000001/tmp/1516059780057-0/bin/python":
error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.spark.api.python.VirtualEnvFactory.execCommand(VirtualEnvFactory.scala:103)
        at org.apache.spark.api.python.VirtualEnvFactory.setupVirtualEnv(VirtualEnvFactory.scala:91)
        at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:52)
        at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:646)
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 9 more
18/01/16 10:43:00 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag
message: User class threw exception: java.io.IOException: Cannot run program "/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_000001/tmp/1516059780057-0/bin/python":
error=2, No such file or directory)
18/01/16 10:43:00 INFO ApplicationMaster: Deleting staging directory hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045
18/01/16 10:43:00 INFO ShutdownHookManager: Shutdown hook called

Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1516059772092
         final status: FAILED
         tracking URL: http://wp-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1512016123441_0045
         user: mansop
Exception in thread "main" org.apache.spark.SparkException: Application application_1512016123441_0045
finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/01/16 10:43:01 INFO ShutdownHookManager: Shutdown hook called
18/01/16 10:43:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-592e7e0f-6faa-4c3c-ab0f-7dd1cff21d17

QUESTION:
Why can't Spark/YARN find the file /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_000001/tmp/1516059780057-0/bin/python?
Who copies it there, and from where? And what do I need to do to make my spark-submit job
run?
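
My (possibly wrong) reading of the VirtualEnvFactory frames in the stack trace above is that
nothing is copied from the submitting host at all: each YARN container tries to build a fresh
virtualenv locally inside its appcache working directory, roughly along the lines of this
sketch (my reconstruction, not Spark's actual code; the pip step is an assumption):

    import os
    import subprocess
    import tempfile

    # value of spark.pyspark.virtualenv.bin.path, as passed to spark-submit above
    virtualenv_bin = "/home/mansop/hail-test/python-2.7.2/bin/activate"

    # scratch dir inside the container's working directory,
    # e.g. .../container_1512016123441_0045_02_000001/tmp/1516059780057-0
    env_dir = tempfile.mkdtemp()

    # step 1: create a fresh virtualenv inside the container
    subprocess.check_call([virtualenv_bin, env_dir])

    # step 2: install the shipped requirements.txt with the new env's
    # interpreter -- the .../bin/python path the error complains about
    python = os.path.join(env_dir, "bin", "python")
    subprocess.check_call([python, "-m", "pip", "install", "-r", "requirements.txt"])

If that reading is right, one suspicious detail is that spark.pyspark.virtualenv.bin.path
points at bin/activate, a script meant to be sourced by a shell rather than an executable
virtualenv binary, so step 1 could fail and the env's bin/python would never be created.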

Thank you

Manuel

From: Manuel Sopena Ballesteros
Sent: Tuesday, January 16, 2018 10:53 AM
To: user@spark.apache.org
Subject: spark-submit can't find python?

Hi all,

I am quite new to Spark and need some help troubleshooting the execution of an application
running on a Spark cluster.

My Spark environment is deployed using Ambari (HDP); YARN is the resource scheduler and HDFS
is the file system.

The application I am trying to run is a Python script (test.py).
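
test.py itself is not reproduced here; a minimal Hail 0.1 script that would produce the
Summary output quoted further down might look like the following sketch (the input path and
exact calls are my assumptions, not the real script):

    from hail import HailContext

    # attaches to the SparkContext created by spark-submit
    hc = HailContext()

    # placeholder input; the real VCF path is not shown in this thread
    vds = hc.import_vcf('sample.vcf')

    # prints Summary(samples=..., variants=..., call_rate=..., ...)
    print(vds.summarize())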

The worker nodes have Python 2.6, so I am asking Spark to spin up a virtual environment based
on Python 2.7.
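
As a sanity check for the interpreter mix, a tiny PySpark job like this sketch (submitted the
same way as test.py) can report which python the driver and each executor actually run:

    import sys
    from pyspark import SparkContext

    sc = SparkContext(appName="which-python")
    # interpreter used by the driver process
    print("driver python: %s" % sys.executable)
    # distinct interpreter paths used by the executor processes
    paths = (sc.parallelize(range(8), 8)
               .map(lambda _: __import__("sys").executable)
               .distinct()
               .collect())
    print("executor pythons: %s" % paths)
    sc.stop()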

I can successfully run this test app on a single node (see below):

-bash-4.1$ spark-submit \
> --conf spark.pyspark.virtualenv.type=native \
> --conf spark.pyspark.virtualenv.requirements=/home/mansop/requirements.txt \
> --conf spark.pyspark.virtualenv.bin.path=/home/mansop/hail-test/python-2.7.2/bin/activate \
> --conf spark.pyspark.python=/home/mansop/hail-test/python-2.7.2/bin/python \
> --jars $HAIL_HOME/build/libs/hail-all-spark.jar \
> --py-files $HAIL_HOME/build/distributions/hail-python.zip \
> test.py
hail: info: SparkUI: http://192.168.10.201:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.1-0320a61
[Stage 2:==================================================>     (91 + 4) / 100]
Summary(samples=3, variants=308, call_rate=1.000000, contigs=['1'], multiallelics=0, snps=308, mnps=0, insertions=0, deletions=0, complex=0, star=0, max_alleles=2)


However, Spark crashes while trying to run my test script on the cluster (error below),
complaining that it cannot run this file:
/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0032/container_1512016123441_0032_02_000001/tmp/1515989862748-0/bin/python

-bash-4.1$ spark-submit --master yarn \
>     --deploy-mode cluster \
>     --driver-memory 4g \
>     --executor-memory 2g \
>     --executor-cores 4 \
>     --queue default \
>     --conf spark.pyspark.virtualenv.type=native \
>     --conf spark.pyspark.virtualenv.requirements=/home/mansop/requirements.txt \
>     --conf spark.pyspark.virtualenv.bin.path=/home/mansop/hail-test/python-2.7.2/bin/activate \
>     --jars $HAIL_HOME/build/libs/hail-all-spark.jar \
>     --py-files $HAIL_HOME/build/distributions/hail-python.zip \
>     test.py
18/01/16 09:55:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable
18/01/16 09:55:18 WARN DomainSocketFactory: The short-circuit local reads feature cannot be
used because libhadoop cannot be loaded.
18/01/16 09:55:18 INFO RMProxy: Connecting to ResourceManager at wp-hdp-ctrl03-mlx.mlx/10.0.1.206:8050
18/01/16 09:55:18 INFO Client: Requesting a new application from cluster with 4 NodeManagers
18/01/16 09:55:18 INFO Client: Verifying our application has not requested more than the maximum
memory capability of the cluster (450560 MB per container)
18/01/16 09:55:18 INFO Client: Will allocate AM container, with 4505 MB memory including 409
MB overhead
18/01/16 09:55:18 INFO Client: Setting up container launch context for our AM
18/01/16 09:55:18 INFO Client: Setting up the launch environment for our AM container
18/01/16 09:55:18 INFO Client: Preparing resources for our AM container
18/01/16 09:55:19 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://wp-hdp-ctrl01-mlx.mlx:8020/hdp/apps/2.6.3.0-235/spark2/spark2-hdp-yarn-archive.tar.gz
18/01/16 09:55:19 INFO Client: Source and destination file systems are the same. Not copying
hdfs://wp-hdp-ctrl01-mlx.mlx:8020/hdp/apps/2.6.3.0-235/spark2/spark2-hdp-yarn-archive.tar.gz
18/01/16 09:55:19 INFO Client: Uploading resource file:/home/mansop/hail-test2/hail/build/libs/hail-all-spark.jar
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/hail-all-spark.jar
18/01/16 09:55:20 INFO Client: Uploading resource file:/home/mansop/requirements.txt ->
hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/requirements.txt
18/01/16 09:55:20 INFO Client: Uploading resource file:/home/mansop/test.py -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/test.py
18/01/16 09:55:20 INFO Client: Uploading resource file:/usr/hdp/2.6.3.0-235/spark2/python/lib/pyspark.zip
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/pyspark.zip
18/01/16 09:55:20 INFO Client: Uploading resource file:/usr/hdp/2.6.3.0-235/spark2/python/lib/py4j-0.10.4-src.zip
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/py4j-0.10.4-src.zip
18/01/16 09:55:20 INFO Client: Uploading resource file:/home/mansop/hail-test2/hail/build/distributions/hail-python.zip
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/hail-python.zip
18/01/16 09:55:20 INFO Client: Uploading resource file:/tmp/spark-888af623-c81d-4ff1-ac8a-15f25112cc4a/__spark_conf__1173722187739681647.zip
-> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/__spark_conf__.zip
18/01/16 09:55:20 INFO SecurityManager: Changing view acls to: mansop
18/01/16 09:55:20 INFO SecurityManager: Changing modify acls to: mansop
18/01/16 09:55:20 INFO SecurityManager: Changing view acls groups to:
18/01/16 09:55:20 INFO SecurityManager: Changing modify acls groups to:
18/01/16 09:55:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls
disabled; users  with view permissions: Set(mansop); groups with view permissions: Set();
users  with modify permissions: Set(mansop); groups with modify permissions: Set()
18/01/16 09:55:20 INFO Client: Submitting application application_1512016123441_0043 to ResourceManager
18/01/16 09:55:20 INFO YarnClientImpl: Submitted application application_1512016123441_0043
18/01/16 09:55:21 INFO Client: Application report for application_1512016123441_0043 (state:
ACCEPTED)
18/01/16 09:55:21 INFO Client:
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with
RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1516056920515
         final status: UNDEFINED
         tracking URL: http://wp-hdp-ctrl03-mlx.mlx:8088/proxy/application_1512016123441_0043/
         user: mansop
18/01/16 09:55:22 INFO Client: Application report for application_1512016123441_0043 (state:
ACCEPTED)
18/01/16 09:55:23 INFO Client: Application report for application_1512016123441_0043 (state:
ACCEPTED)
18/01/16 09:55:24 INFO Client: Application report for application_1512016123441_0043 (state:
ACCEPTED)
18/01/16 09:55:25 INFO Client: Application report for application_1512016123441_0043 (state:
ACCEPTED)
18/01/16 09:55:26 INFO Client: Application report for application_1512016123441_0043 (state:
ACCEPTED)
18/01/16 09:55:27 INFO Client: Application report for application_1512016123441_0043 (state:
ACCEPTED)
18/01/16 09:55:28 INFO Client: Application report for application_1512016123441_0043 (state:
ACCEPTED)
18/01/16 09:55:29 INFO Client: Application report for application_1512016123441_0043 (state:
ACCEPTED)
18/01/16 09:55:30 INFO Client: Application report for application_1512016123441_0043 (state:
FAILED)
18/01/16 09:55:30 INFO Client:
         client token: N/A
         diagnostics: Application application_1512016123441_0043 failed 2 times due to AM
Container for appattempt_1512016123441_0043_000002 exited with  exitCode: 1
For more detailed output, check the application tracking page: http://wp-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1512016123441_0043
Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1512016123441_0043_02_000001
Exit code: 1

Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d1/hadoop/yarn/local/filecache/11/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.3.0-235/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/01/16 09:55:27 INFO SignalUtils: Registered signal handler for TERM
18/01/16 09:55:27 INFO SignalUtils: Registered signal handler for HUP
18/01/16 09:55:27 INFO SignalUtils: Registered signal handler for INT
18/01/16 09:55:28 INFO ApplicationMaster: Preparing Local resources
18/01/16 09:55:28 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1512016123441_0043_000002
18/01/16 09:55:28 INFO SecurityManager: Changing view acls to: yarn,mansop
18/01/16 09:55:28 INFO SecurityManager: Changing modify acls to: yarn,mansop
18/01/16 09:55:28 INFO SecurityManager: Changing view acls groups to:
18/01/16 09:55:28 INFO SecurityManager: Changing modify acls groups to:
18/01/16 09:55:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls
disabled; users  with view permissions: Set(yarn, mansop); groups with view permissions: Set();
users  with modify permissions: Set(yarn, mansop); groups with modify permissions: Set()
18/01/16 09:55:28 INFO ApplicationMaster: Starting the user application in a separate Thread
18/01/16 09:55:28 INFO ApplicationMaster: Waiting for spark context initialization...
18/01/16 09:55:29 ERROR ApplicationMaster: User application exited with status 1
18/01/16 09:55:29 INFO ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason:
User application exited with status 1)
18/01/16 09:55:29 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
        at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:105)
        at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:646)
18/01/16 09:55:29 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag
message: User application exited with status 1)
18/01/16 09:55:29 INFO ApplicationMaster: Deleting staging directory hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043
18/01/16 09:55:29 INFO ShutdownHookManager: Shutdown hook called

Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1516056920515
         final status: FAILED
         tracking URL: http://wp-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1512016123441_0043
         user: mansop
Exception in thread "main" org.apache.spark.SparkException: Application application_1512016123441_0043
finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/01/16 09:55:30 INFO ShutdownHookManager: Shutdown hook called
18/01/16 09:55:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-888af623-c81d-4ff1-ac8a-15f25112cc4a

QUESTION:
Why can't Spark/YARN find the file /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0032/container_1512016123441_0032_02_000001/tmp/1515989862748-0/bin/python?
Who copies it there, and from where? And what do I need to do to make my spark-submit job
run?

Thank you very much


Manuel Sopena Ballesteros | Big Data Engineer
Garvan Institute of Medical Research
The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
T: + 61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E: manuel.sb@garvan.org.au<mailto:manuel.sb@garvan.org.au>

