spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-24736) --py-files not functional for non-local URLs. It appears to pass non-local URLs into PYTHONPATH directly.
Date Thu, 14 Feb 2019 22:00:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-24736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24736:
------------------------------------

    Assignee:     (was: Apache Spark)

> --py-files not functional for non-local URLs. It appears to pass non-local URLs into PYTHONPATH directly.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24736
>                 URL: https://issues.apache.org/jira/browse/SPARK-24736
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, PySpark
>    Affects Versions: 2.4.0
>         Environment: Recent 2.4.0 from master branch, submitted on Linux to a KOPS Kubernetes cluster created on AWS.
>  
>            Reporter: Jonathan A Weaver
>            Priority: Minor
>
> My spark-submit invocation:
> bin/spark-submit \
>         --master k8s://https://internal-api-test-k8s-local-7afed8-796273878.us-east-1.elb.amazonaws.com \
>         --deploy-mode cluster \
>         --name pytest \
>         --conf spark.kubernetes.container.image=412834075398.dkr.ecr.us-east-1.amazonaws.com/fids/pyspark-k8s:latest \
>         --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
>         --conf spark.kubernetes.authenticate.submission.caCertFile=cluster.ca \
>         --conf spark.kubernetes.authenticate.submission.oauthToken=$TOK \
>         --conf spark.kubernetes.authenticate.driver.oauthToken=$TOK \
>         --py-files "https://s3.amazonaws.com/maxar-ids-fids/screw.zip" \
>         https://s3.amazonaws.com/maxar-ids-fids/it.py
>  
> *screw.zip is successfully downloaded and placed in SparkFiles.getRootDirectory():*
> 2018-07-01 07:33:43 INFO  SparkContext:54 - Added file https://s3.amazonaws.com/maxar-ids-fids/screw.zip at https://s3.amazonaws.com/maxar-ids-fids/screw.zip with timestamp 1530430423297
> 2018-07-01 07:33:43 INFO  Utils:54 - Fetching https://s3.amazonaws.com/maxar-ids-fids/screw.zip to /var/data/spark-7aba748d-2bba-4015-b388-c2ba9adba81e/spark-0ed5a100-6efa-45ca-ad4c-d1e57af76ffd/userFiles-a053206e-33d9-4245-b587-f8ac26d4c240/fetchFileTemp1549645948768432992.tmp
> *I print out the PYTHONPATH and PYSPARK_FILES environment variables from the driver script:*
>      PYTHONPATH /opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-0.10.7-src.zip:/opt/spark/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-*.zip:*https://s3.amazonaws.com/maxar-ids-fids/screw.zip*
>     PYSPARK_FILES https://s3.amazonaws.com/maxar-ids-fids/screw.zip
>  
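This illustrates the failure mode: the Python interpreter splits PYTHONPATH on os.pathsep (':' on Linux), and the site module then absolutizes relative entries against the working directory, which in this container is /opt/spark/work-dir. A minimal sketch of that mechanism in plain Python (no Spark involved; the cwd value is taken from the sys.path dump below):

```python
import posixpath

# PYTHONPATH as the driver sees it, with the remote URL appended verbatim
# (shortened to two entries for the sketch).
pythonpath = ("/opt/spark/python/lib/pyspark.zip:"
              "https://s3.amazonaws.com/maxar-ids-fids/screw.zip")

# Python splits PYTHONPATH on ':' (os.pathsep on Linux) at startup,
# so the 'https://...' URL is cut apart at its first colon.
entries = pythonpath.split(":")

# site.py later absolutizes relative entries against the working
# directory, /opt/spark/work-dir inside the Spark container.
cwd = "/opt/spark/work-dir"
resolved = [e if e.startswith("/") else posixpath.join(cwd, e)
            for e in entries]

print(resolved)
# ['/opt/spark/python/lib/pyspark.zip',
#  '/opt/spark/work-dir/https',
#  '//s3.amazonaws.com/maxar-ids-fids/screw.zip']
```

The last two entries match the broken sys.path members reported below.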
> *I print out sys.path:*
> ['/tmp/spark-fec3684b-8b63-4f43-91a4-2f2fa41a1914', u'/var/data/spark-7aba748d-2bba-4015-b388-c2ba9adba81e/spark-0ed5a100-6efa-45ca-ad4c-d1e57af76ffd/userFiles-a053206e-33d9-4245-b587-f8ac26d4c240',
> '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.7-src.zip', '/opt/spark/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar',
> '/opt/spark/python/lib/py4j-*.zip', *'/opt/spark/work-dir/https', '//s3.amazonaws.com/maxar-ids-fids/screw.zip',* '/usr/lib/python27.zip',
> '/usr/lib/python2.7', '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old',
> '/usr/lib/python2.7/lib-dynload', '/usr/lib/python2.7/site-packages']
>  
> *The URL from PYSPARK_FILES gets placed into PYTHONPATH verbatim, with the obvious result: the colon in 'https://' splits the URL into two meaningless sys.path entries.*
>  
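Since the Utils:54 log above shows the zip is in fact fetched into the SparkFiles root, one possible driver-side workaround (a sketch, not part of the report) is to put the local copy on sys.path manually before importing from it. `add_fetched_zip` is a hypothetical helper; in a real PySpark driver, `root_dir` would come from `SparkFiles.getRootDirectory()`, passed in here to keep the sketch self-contained:

```python
import os
import sys

def add_fetched_zip(root_dir, name):
    """Prepend the locally fetched copy of a zip to sys.path.

    root_dir: directory Spark fetched files into; in a real driver this
    would be SparkFiles.getRootDirectory().
    name: the zip's file name, e.g. "screw.zip".
    """
    local = os.path.join(root_dir, name)
    if local not in sys.path:
        # zipimport lets Python import modules directly from the zip
        sys.path.insert(0, local)
    return local
```

Calling, say, `add_fetched_zip(SparkFiles.getRootDirectory(), "screw.zip")` at the top of the driver script, before any imports from the zip, would sidestep the broken PYTHONPATH entry.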
> *Dump of spark config from container:*
> Spark config dumped:
> [(u'spark.master', u'k8s://https://internal-api-test-k8s-local-7afed8-796273878.us-east-1.elb.amazonaws.com'),
>  (u'spark.kubernetes.authenticate.submission.oauthToken', u'<present_but_redacted>'),
>  (u'spark.kubernetes.authenticate.driver.oauthToken', u'<present_but_redacted>'),
>  (u'spark.kubernetes.executor.podNamePrefix', u'pytest-1530430411996'),
>  (u'spark.kubernetes.memoryOverheadFactor', u'0.4'),
>  (u'spark.driver.blockManager.port', u'7079'),
>  (u'spark.app.id', u'spark-application-1530430424433'),
>  (u'spark.app.name', u'pytest'),
>  (u'spark.executor.id', u'driver'),
>  (u'spark.driver.host', u'pytest-1530430411996-driver-svc.default.svc'),
>  (u'spark.kubernetes.container.image', u'412834075398.dkr.ecr.us-east-1.amazonaws.com/fids/pyspark-k8s:latest'),
>  (u'spark.driver.port', u'7078'),
>  (u'spark.kubernetes.python.mainAppResource', u'https://s3.amazonaws.com/maxar-ids-fids/it.py'),
>  (u'spark.kubernetes.authenticate.submission.caCertFile', u'cluster.ca'),
>  (u'spark.rdd.compress', u'True'),
>  (u'spark.driver.bindAddress', u'100.120.0.1'),
>  (u'spark.kubernetes.driver.pod.name', u'spark-pi-driver'),
>  (u'spark.serializer.objectStreamReset', u'100'),
>  (u'spark.files', u'https://s3.amazonaws.com/maxar-ids-fids/it.py,https://s3.amazonaws.com/maxar-ids-fids/screw.zip'),
>  (u'spark.kubernetes.python.pyFiles', u'https://s3.amazonaws.com/maxar-ids-fids/screw.zip'),
>  (u'spark.kubernetes.authenticate.driver.mounted.oauthTokenFile', u'/mnt/secrets/spark-kubernetes-credentials/oauth-token'),
>  (u'spark.submit.deployMode', u'client'),
>  (u'spark.kubernetes.submitInDriver', u'true')]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

