spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Spark on minikube hanging when doing simple rdd.collect()
Date Sun, 04 Jul 2021 11:56:51 GMT
Hi,

I have Spark running on minikube.

When the Python file is simple and does little more than a print(), it works:

from src.config import config, oracle_url
from pyspark.sql import functions as F
from pyspark.sql.functions import col, round
from pyspark.sql.window import Window
from sparkutils import sparkstuff as s
from othermisc import usedFunctions as uf
import locale
locale.setlocale(locale.LC_ALL, 'en_GB')

def main():
    appName = "testme"
    spark_session = s.spark_session(appName)
    spark_context = s.sparkcontext()
    spark_context.setLogLevel("ERROR")
    print(spark_session)
    print(spark_context)
    print(f"""\n Printing a line from {appName}""")

if __name__ == "__main__":
  main()

It comes back with


<pyspark.sql.session.SparkSession object at 0x7f9aebbdff28>
<SparkContext master=k8s://https://192.168.49.2:8443 appName=testme>

 Printing a line from testme


However, if I create a simple RDD from a range of numbers, as below:


    spark_session = s.spark_session(appName)
    spark_context = s.sparkcontext()
    spark_context.setLogLevel("ERROR")
    print(spark_session)
    print(spark_context)
    rdd = spark_context.parallelize([1,2,3,4,5,6,7,8,9,10])

    print(rdd)
    rdd.collect()

    print(f"""\n Printing a line from {appName}""")

It never gets past rdd.collect(). It prints the RDD:



<pyspark.sql.session.SparkSession object at 0x7f431ee92f28>
<SparkContext master=k8s://https://192.168.49.2:8443 appName=testme>
ParallelCollectionRDD[0] at readRDDFromFile at PythonRDD.scala:274


and hangs
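
To make the hang visible from the driver script rather than waiting indefinitely, the action can be run with a time limit. This is only a minimal sketch: collect_with_timeout is a hypothetical helper name, the 60 second limit is an arbitrary assumption, and the daemon thread is simply abandoned on timeout rather than cancelled, so the Spark job itself keeps waiting in the background.

```python
import threading


def collect_with_timeout(action, timeout_s):
    """Run action() in a daemon thread and wait at most timeout_s seconds.

    Returns (True, result) if the action finished in time, otherwise
    (False, None). Errors raised by the action are re-raised in the caller.
    """
    box = {}

    def target():
        try:
            box["result"] = action()
        except Exception as exc:  # keep the error so the caller can see it
            box["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        # Still blocked: report the stall instead of hanging forever.
        return False, None
    if "error" in box:
        raise box["error"]
    return True, box["result"]
```

In the script above it would be called as finished, rows = collect_with_timeout(rdd.collect, 60), printing a message when finished is False.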


Examining the pods from another session I see:


kubectl get pod -n spark
NAME                              READY   STATUS              RESTARTS   AGE
spark-master                      0/1     Completed           0          69m
testme-b7a8f97a714d9351-exec-33   1/1     Running             0          9s
testme-b7a8f97a714d9351-exec-34   0/1     ContainerCreating   0          9s


The testme executor pods keep failing and being recreated:


 kubectl get pod -n spark
NAME                              READY   STATUS              RESTARTS   AGE
spark-master                      0/1     Completed           0          70m
testme-b7a8f97a714d9351-exec-49   0/1     ContainerCreating   0          2s
testme-b7a8f97a714d9351-exec-50   0/1     ContainerCreating   0          1s
kubectl get pod -n spark
NAME                              READY   STATUS        RESTARTS   AGE
spark-master                      0/1     Completed     0          71m
testme-b7a8f97a714d9351-exec-49   0/1     Terminating   0          8s
testme-b7a8f97a714d9351-exec-50   1/1     Running       0          7s
testme-b7a8f97a714d9351-exec-51   0/1     Pending       0          0s
 kubectl get pod -n spark
NAME                              READY   STATUS              RESTARTS   AGE
spark-master                      0/1     Completed           0          71m
testme-b7a8f97a714d9351-exec-51   0/1     ContainerCreating   0          3s
testme-b7a8f97a714d9351-exec-52   0/1     ContainerCreating   0          2s


So it seems that the testme executor pods keep failing and being recreated,
and they are gone before I can get their logs:
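
The churn can be quantified by parsing successive listings and comparing executor numbers. The sketch below assumes the five-column default layout shown in the listings above (NAME, READY, STATUS, RESTARTS, AGE) and the <app>-exec-<n> naming seen in them; parse_pods and executor_ids are hypothetical helper names.

```python
import re

# Executor pods in the listings are named <app>-exec-<number>.
EXEC_NAME = re.compile(r"^(?P<app>.+)-exec-(?P<id>\d+)$")


def parse_pods(listing):
    """Parse plain `kubectl get pod` text into a list of dicts."""
    pods = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have five fields with a numeric RESTARTS column; this
        # skips blank lines, the header row, and the echoed command.
        if len(fields) != 5 or not fields[3].isdigit():
            continue
        name, ready, status, restarts, age = fields
        pods.append({
            "name": name,
            "ready": ready,
            "status": status,
            "restarts": int(restarts),
            "age": age,
        })
    return pods


def executor_ids(pods):
    """Return the sorted executor numbers found among the pod names."""
    ids = []
    for pod in pods:
        match = EXEC_NAME.match(pod["name"])
        if match:
            ids.append(int(match.group("id")))
    return sorted(ids)
```

Comparing executor_ids() across two listings taken a minute apart shows how many executors were requested in that interval.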


kubectl get pod -n spark
NAME                               READY   STATUS              RESTARTS   AGE
spark-master                       0/1     Completed           0          80m
testme-b7a8f97a714d9351-exec-164   1/1     Running             0          8s
testme-b7a8f97a714d9351-exec-165   0/1     ContainerCreating   0          3s
kubectl logs -n spark testme-b7a8f97a714d9351-exec-165
Error from server (NotFound): pods "testme-b7a8f97a714d9351-exec-165" not found
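
Since the pods vanish within seconds, one option is to request logs for every executor pod the moment it is listed, instead of typing the pod name by hand. This is a rough sketch, assuming kubectl is on the PATH and that executor pod names contain "-exec-" as in the listings above; run_kubectl and snapshot_executor_logs are hypothetical helper names, and there is no guarantee the captured logs will explain the failure.

```python
import subprocess


def run_kubectl(args):
    """Run kubectl with the given arguments; return (returncode, stdout)."""
    proc = subprocess.run(["kubectl", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def snapshot_executor_logs(namespace, run=run_kubectl):
    """Fetch logs for every executor pod that exists right now.

    Returns {pod_name: log_text} for the pods whose logs could still be
    read. Pods that are not yet started or already deleted are skipped.
    """
    # `-o name` prints one "pod/<name>" entry per line.
    code, out = run(["get", "pods", "-n", namespace, "-o", "name"])
    if code != 0:
        return {}

    logs = {}
    for line in out.splitlines():
        name = line.strip().split("/", 1)[-1]
        if "-exec-" not in name:
            continue  # not an executor pod
        code, text = run(["logs", "-n", namespace, name])
        if code == 0:
            logs[name] = text
    return logs
```

Calling snapshot_executor_logs("spark") in a loop with a short sleep while the job hangs, and writing each result to a file, would keep whatever the executors printed before they were removed.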


I created this minikube cluster with 8192 MB of memory and 3 CPUs.


The Spark UI shows that the rdd.collect() job has not started:


[image: image.png]



I would appreciate any advice, as a Google search did not turn up much.



