spark-dev mailing list archives

From "Garlapati, Suryanarayana (Nokia - IN/Bangalore)" <suryanarayana.garlap...@nokia.com>
Subject RE: Python kubernetes spark 2.4 branch
Date Wed, 26 Sep 2018 17:43:55 GMT
Hi Ilan/Yinan,
My observation is as follows:
The dependent files specified with “--py-files http://10.75.145.25:80/Spark/getNN.py”
are downloaded and available in the container at “/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/”.
I think we also need to add this path to PYTHONPATH, with the following code change in
entrypoint.sh:


if [ -n "$PYSPARK_FILES" ]; then
    PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
fi

to

if [ -n "$PYSPARK_FILES" ]; then
    PYTHONPATH="$PYTHONPATH:<directory where the dependent files are downloaded and available in the container, for example /var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/>"
fi

Please let me know whether this approach is fine, or correct me if my understanding is wrong.
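
A minimal sketch of how the change could look, assuming the download directory is exposed to entrypoint.sh through a variable. PYSPARK_FILES_DIR and append_pythonpath are hypothetical names used only for illustration; neither exists in entrypoint.sh, and how the directory would be discovered is left open.

```shell
# Hypothetical helper (not part of entrypoint.sh): append a directory to
# PYTHONPATH only if it exists on disk and is not already listed.
append_pythonpath() {
    dir="$1"
    # Skip paths that are not local directories (for example, a URL).
    [ -d "$dir" ] || return 0
    case ":${PYTHONPATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$dir" ;;  # no leading colon when empty
    esac
    export PYTHONPATH
}

# PYSPARK_FILES_DIR is an assumed variable that would hold the directory
# where the --py-files dependencies were downloaded.
if [ -n "${PYSPARK_FILES_DIR:-}" ]; then
    append_pythonpath "$PYSPARK_FILES_DIR"
fi
```

Checking that the directory exists before appending it keeps non-local values off PYTHONPATH, and the duplicate check makes the helper safe to call more than once.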

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Wednesday, September 26, 2018 9:14 AM
To: Ilan Filonenko <if56@cornell.edu>; liyinan926@gmail.com
Cc: Spark dev list <dev@spark.apache.org>; user@spark.apache.org
Subject: RE: Python kubernetes spark 2.4 branch

Hi Ilan/Yinan,
Yes, my test case is similar to the one described in https://issues.apache.org/jira/browse/SPARK-24736

My spark-submit is as follows:
./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443
--conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files
http://10.75.145.25:80/Spark/getNN.py http://10.75.145.25:80/Spark/test.py

The following error is observed:

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=192.168.1.22
--deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner
http://10.75.145.25:80/Spark/test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
  File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in <module>
    from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I observe the same behaviour as described in https://issues.apache.org/jira/browse/SPARK-24736
(the file is downloaded and available in the pod, but the import still fails).
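
One possible contributing factor, assuming PYSPARK_FILES still holds the original URL when entrypoint.sh appends it to PYTHONPATH: Python splits PYTHONPATH on ":" on POSIX systems, so a URL is broken into fragments that do not point to any local directory. The sketch below mimics that split in plain shell. split_pythonpath is a hypothetical helper, and /opt/spark/python is only an illustrative first entry.

```shell
# Hypothetical helper: print the entries obtained by splitting a
# PYTHONPATH-style string on ':' (the POSIX path separator), one per line.
split_pythonpath() {
    rest="$1"
    while [ -n "$rest" ]; do
        # Everything before the first ':' is the next entry.
        printf '%s\n' "${rest%%:*}"
        case "$rest" in
            *:*) rest="${rest#*:}" ;;  # drop the entry and its ':'
            *) rest="" ;;              # last entry consumed
        esac
    done
}

# Illustrative value: a local directory followed by the --py-files URL.
split_pythonpath "/opt/spark/python:http://10.75.145.25:80/Spark/getNN.py"
# prints:
#   /opt/spark/python
#   http
#   //10.75.145.25
#   80/Spark/getNN.py
```

None of the last three fragments names the directory where getNN.py was downloaded, which would be consistent with the ImportError above.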

The same happens with local files:

./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443
--conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files
./getNN.py http://10.75.145.25:80/Spark/test.py

test.py depends on getNN.py.


However, the same setup works on the Spark 2.2 k8s branch.


Regards
Surya

From: Ilan Filonenko <if56@cornell.edu>
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan926@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlapati@nokia.com>;
Spark dev list <dev@spark.apache.org>; user@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <liyinan926@gmail.com> wrote:
Can you give more details on how you ran your app? Did you build your own image, and which
image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlapati@nokia.com> wrote:
Hi,
I am trying to run Spark Python test cases on k8s, based on tag spark-2.4-rc1. When dependent
files are passed through the --py-files option, the main Python script cannot resolve them.
Is this a known issue?

Regards
Surya
