hadoop-yarn-dev mailing list archives

From "Jack Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-9549) Not able to run pyspark in docker driver container on Yarn3
Date Tue, 14 May 2019 03:36:00 GMT
Jack Zhu created YARN-9549:
------------------------------

             Summary: Not able to run pyspark in docker driver container on Yarn3
                 Key: YARN-9549
                 URL: https://issues.apache.org/jira/browse/YARN-9549
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 3.1.2
         Environment: Hadoop 3.1.1.3.1.0.0-78

spark version 2.3.2.3.1.0.0-78

Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_211

Server: Docker Engine - Community Version:          18.09.6
            Reporter: Jack Zhu
         Attachments: Dockerfile, test.py

I followed [https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html] to
build a Spark Docker image for running PySpark. There is no good document describing how to
spark-submit a PySpark job to a Hadoop 3 cluster, so I used the command below to launch my simple
Python job:

PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.5 spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --executor-memory 1g \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
  --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
  --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  ./test.py

 

test.py simply collects the hostname from each executor and checks whether the
Python job is running in a container or not.
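The attached test.py contains the actual check; the kind of test it describes can be sketched roughly as follows (the function name and the exact detection heuristic here are illustrative assumptions, not taken from the attachment):

```python
import os

def running_in_container():
    """Heuristic check: is this process running inside a Docker container?

    Looks for the /.dockerenv marker file that Docker creates at the
    container root, and for 'docker' in the cgroup hierarchy of PID 1.
    Either signal is treated as sufficient.
    """
    if os.path.exists("/.dockerenv"):
        return True
    try:
        with open("/proc/1/cgroup") as f:
            return "docker" in f.read()
    except OSError:
        # /proc may be unavailable (e.g. non-Linux hosts)
        return False
```

In the Spark job, a function like this would be run both on the driver and inside each executor task (e.g. via an RDD map), which is how the output below distinguishes the driver from the executors.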

I found that the driver always runs directly on the host, not in a container. As a result,
we need to keep the Python version in the Docker image consistent with the one on the NodeManager
host, which defeats the purpose of using Docker to package all the dependencies.

 

The Spark job runs successfully; below is the stdout output:

Log Type: stdout

Log Upload Time: Tue May 14 02:07:06 +0000 2019

Log Length: 141

host.test.com

False ============>going to print all the container names. [True, True, True, True, True,
True, True, True, True]

Please see the attached Dockerfile and test.py.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
