spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject import yaml fails with docker or kubernetes but works ok when run with YARN
Date Mon, 19 Jul 2021 10:26:58 GMT
Hi,


My environment is set up OK with the packages PySpark needs, including

PyYAML     version 5.4.1


In YARN or local mode, a simple skeleton test I have set up picks up yaml.
However, with the Docker image, or when that image is used inside Kubernetes, it fails.


This is the code used to test:


import sys
import os

def main():
    print("\n Printing os stuff")
    p = sys.path
    print("\n Printing p")
    print(p)
    # PYTHONPATH may be unset, so fall back to an empty string
    user_paths = os.environ.get('PYTHONPATH', '').split(os.pathsep)
    print("\n Printing user_paths")
    print(user_paths)
    print("checking yaml")
    import yaml
    # spark_context is never created in this skeleton, so its stop() call is left out

if __name__ == "__main__":
    main()


It prints sys.path and the PYTHONPATH entries, then tries to import yaml.
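
As a hedged aside, the import check can be made non-fatal, so that the script reports where a module would be loaded from instead of stopping at the first failure. The sketch below is only an illustration and not part of the original test; locate_module is a hypothetical helper name, and it relies solely on the standard library's importlib.util.find_spec.

```python
import importlib.util


def locate_module(name):
    """Return where a top-level module would be loaded from, or None if it is not found.

    Namespace packages have no single origin, so they also yield None here.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is a file path for ordinary modules and 'built-in' for built-ins
    return spec.origin


if __name__ == "__main__":
    # Hypothetical candidates; any top-level module name can be checked
    for candidate in ("yaml", "pyspark"):
        print(candidate, "->", locate_module(candidate))
```

Run in both the YARN and the container environment, this would show which one actually resolves yaml and from which path.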


With k8s I get


        spark-submit --verbose \
           --master k8s://$K8S_SERVER \
           --conf "spark.yarn.dist.archives"=hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${pyspark_venv}.tar.gz#${pyspark_venv} \
           --deploy-mode cluster \
           --name pytest \
           --conf spark.kubernetes.namespace=spark \
           --conf spark.executor.instances=1 \
           --conf spark.kubernetes.driver.limit.cores=1 \
           --conf spark.executor.cores=1 \
           --conf spark.executor.memory=500m \
           --conf spark.kubernetes.container.image=pytest-repo/spark-py:3.1.1 \
           --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
           --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip \
           hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${APPLICATION}



+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.9 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner hdfs://50.140.197.220:9000/minikube/codes/testyml.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021-07-19 10:20:41,430 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


 Printing p
['/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d',
'/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/DSBQ.zip',
'/opt/spark/python/lib/pyspark.zip',
'/opt/spark/python/lib/py4j-0.10.9-src.zip',
'/opt/spark/jars/spark-core_2.12-3.1.1.jar', '/usr/lib/python37.zip',
'/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload',
'/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']

 Printing user_paths
['/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/DSBQ.zip',
'/opt/spark/python/lib/pyspark.zip',
'/opt/spark/python/lib/py4j-0.10.9-src.zip',
'/opt/spark/jars/spark-core_2.12-3.1.1.jar']
checking yaml
Traceback (most recent call last):
  File "/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/testyml.py", line 17, in <module>
    main()
  File "/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/testyml.py", line 13, in main
    import yaml
ModuleNotFoundError: No module named 'yaml'
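
The sys.path listing printed above can also be checked entry by entry, to see whether any of those directories holds a yaml package at all. This is a rough sketch under stated assumptions: entries_providing is a hypothetical helper, it only looks for a package directory or a single .py file, and it deliberately skips zip archives and jars.

```python
import os
import sys


def entries_providing(module_name, search_path=None):
    """Return the search-path directories that hold module_name as a package dir or .py file."""
    hits = []
    for entry in (sys.path if search_path is None else search_path):
        if not os.path.isdir(entry):
            continue  # zip archives, jars and missing directories are skipped in this sketch
        package_dir = os.path.join(entry, module_name)
        module_file = os.path.join(entry, module_name + ".py")
        if os.path.isdir(package_dir) or os.path.isfile(module_file):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    print(entries_providing("yaml"))
```

An empty list from inside the container would be consistent with the ModuleNotFoundError above, that is, none of the directories on sys.path contains yaml.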


The yaml import is the sticking point here, so I was wondering whether anyone
has seen this before?


Thanks


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
