spark-dev mailing list archives

From "Garlapati, Suryanarayana (Nokia - IN/Bangalore)" <>
Subject RE: Python kubernetes spark 2.4 branch
Date Wed, 26 Sep 2018 17:43:55 GMT
Hi Ilan/Yinan,
My observation is as follows:
The dependent files specified with “--py-files” are downloaded and available in the container at “/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/”.
I guess we need to add this path to PYTHONPATH as well, with the following code change in

if [ -n "$PYSPARK_FILES" ]; then


if [ -n "$PYSPARK_FILES" ]; then
    PYTHONPATH="$PYTHONPATH:<directory where the dependent files are downloaded and available in the container, for example /var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/>"
fi
Let me know if this approach is fine.

Please correct me if my understanding is wrong with this approach.
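
To illustrate why the download directory has to be on the Python search path, here is a minimal sketch in Python. It is not Spark code: the temporary directory stands in for the container directory where the --py-files dependencies land, and the module name and the helpers can_import and demo are hypothetical. Appending to sys.path at runtime has the same effect on imports as extending PYTHONPATH before the interpreter starts.

```python
import importlib
import os
import sys
import tempfile
import uuid


def can_import(module_name):
    """Return True if module_name is importable with the current sys.path."""
    importlib.invalidate_caches()  # make sure newly written files are seen
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def demo():
    # Assumption: this temp dir plays the role of the download directory
    # in the container; the real path is chosen by Spark.
    download_dir = tempfile.mkdtemp(prefix="spark-pyfiles-")
    # Hypothetical dependency module with a unique name per call.
    module_name = "dep_" + uuid.uuid4().hex
    with open(os.path.join(download_dir, module_name + ".py"), "w") as f:
        f.write("VALUE = 42\n")

    before = can_import(module_name)  # directory is not on sys.path yet
    sys.path.append(download_dir)     # equivalent of extending PYTHONPATH
    after = can_import(module_name)   # now the module resolves
    return before, after


if __name__ == "__main__":
    print(demo())
```

The first import attempt fails in the same way as the "No module named getNN" error further down in this thread, and the second succeeds once the directory is on the search path.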


From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Wednesday, September 26, 2018 9:14 AM
To: Ilan Filonenko <>;
Cc: Spark dev list <>;
Subject: RE: Python kubernetes spark 2.4 branch

Hi Ilan/ Yinan,
Yes, my test case is also similar to the one described in

My spark-submit is as follows:
./spark-submit --deploy-mode cluster --master k8s://<>
--conf --properties-file /tmp/program_files/spark_py.conf --py-files

Following is the error observed:

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=
--deploy-mode client --properties-file /opt/spark/conf/ --class org.apache.spark.deploy.PythonRunner
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/", line 13, in <module>
from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I am observing the same kind of behaviour as mentioned in
(the file gets downloaded and is available in the pod)

The same happens with local files:

./spark-submit --deploy-mode cluster --master k8s://<>
--conf --properties-file /tmp/program_files/spark_py.conf --py-files
./<> has dependencies from

But the same works in the Spark 2.2 k8s branch.


From: Ilan Filonenko <<>>
Sent: Wednesday, September 26, 2018 2:06 AM
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <<>>;
Spark dev list <<>>;<>
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to: ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <<>> wrote:
Can you give more details on how you ran your app, whether you built your own image, and which
image you are using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) <<>> wrote:
I am trying to run Spark Python test cases on k8s based on the tag spark-2.4-rc1. When the dependent
files are passed through the --py-files option, they are not resolved by the main
Python script. Please let me know if this is a known issue.

