spark-user mailing list archives

From Mich Talebzadeh <>
Subject import yaml fails with Docker or Kubernetes but works OK when run with YARN
Date Mon, 19 Jul 2021 10:26:58 GMT

My environment is set up OK with the packages PySpark needs, including:

PyYAML version 5.4.1

In YARN or local mode, a simple skeleton test I have set up picks up yaml.
However, with a Docker image, or when that image is used inside Kubernetes, it fails.

This is the code used to test:

import sys
import os

def main():
    print("\n Printing os stuff")
    print(os.environ.get('PYTHONPATH', ''))
    print("\n Printing p")
    print(sys.path)
    user_paths = os.environ['PYTHONPATH'].split(os.pathsep)
    print("\n Printing user_paths")
    print(user_paths)
    print("checking yaml")
    import yaml

if __name__ == "__main__":
    main()

It checks the OS paths and then tries to import yaml.
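A variant that avoids the hard crash may be easier to debug with. This is only a sketch: it uses importlib.util.find_spec to report where Python would load "yaml" from, returning a message instead of raising ModuleNotFoundError.

```python
# Non-crashing probe: report where Python would load a module from.
# find_spec returns None when the module is absent from sys.path.
import importlib.util

def probe(module_name: str) -> str:
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name}: NOT FOUND on sys.path"
    return f"{module_name}: {spec.origin}"

if __name__ == "__main__":
    print(probe("yaml"))  # missing in the container image in this case
    print(probe("os"))    # stdlib, always present
```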

With k8s I get:

        spark-submit --verbose \
           --master k8s://$K8S_SERVER \
           --deploy-mode cluster \
           --name pytest \
           --conf spark.kubernetes.namespace=spark \
           --conf spark.executor.instances=1 \
           --conf spark.kubernetes.driver.limit.cores=1 \
           --conf spark.executor.cores=1 \
           --conf spark.executor.memory=500m \
           --conf spark.kubernetes.container.image=pytest-repo/spark-py:3.1.1 \
           --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/ \
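One possible workaround worth trying, since the image apparently lacks the package: because PyYAML is importable as pure Python when its C extension is absent, the installed "yaml" package directory can be zipped from a machine where it works and shipped via --py-files, so the driver/executor image does not need it preinstalled. This is only a sketch; the site-packages path in the comment is an assumption for your environment.

```python
# Sketch: pack a locally installed pure-Python package (e.g. PyYAML's
# "yaml" directory) into a zip suitable for spark-submit --py-files.
import os
import zipfile

def pack_module(module_dir: str, zip_path: str) -> str:
    """Zip module_dir so the top-level package sits at the zip root."""
    parent = os.path.dirname(module_dir.rstrip(os.sep))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(module_dir):
            for name in files:
                full = os.path.join(root, name)
                # Arcname relative to the parent keeps "yaml/..." importable.
                zf.write(full, os.path.relpath(full, parent))
    return zip_path

# Hypothetical usage (adjust the path to your environment):
# pack_module("/usr/local/lib/python3.7/dist-packages/yaml", "yaml.zip")
# then: spark-submit --py-files yaml.zip ...
```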

+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf
spark.driver.bindAddress= --deploy-mode client --properties-file
/opt/spark/conf/ --class
org.apache.spark.deploy.PythonRunner hdfs://
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform
(file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor
WARNING: Please consider reporting this to the maintainers of
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021-07-19 10:20:41,430 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable

 Printing p
'/opt/spark/jars/spark-core_2.12-3.1.1.jar', '/usr/lib/',
'/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload',
'/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']

 Printing user_paths
checking yaml
Traceback (most recent call last):
  File "/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/", line
17, in <module>
  File "/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/", line
13, in main
    import yaml
ModuleNotFoundError: No module named 'yaml'
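The dist-packages directories do appear on sys.path above, which suggests the package simply is not installed in the image. A quick check one could run inside the container (a sketch, not part of the failing job) is to list the top-level importable modules:

```python
# List top-level importable modules visible on sys.path, to confirm
# whether "yaml" is installed in the image at all, rather than merely
# shadowed by a sys.path problem.
import pkgutil

top_level = sorted(m.name for m in pkgutil.iter_modules())
print("yaml installed:", "yaml" in top_level)
print("total top-level modules:", len(top_level))
```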

Well, yaml is a bit of an issue, so I was wondering if anyone has seen this before.



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
