spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Spark on minikube hanging when doing simple rdd.collect()
Date Sun, 04 Jul 2021 12:19:47 GMT
Further to this, I tried the same through spark-shell, connecting
to k8s with --master k8s://<KS_SERVER>:8443


scala> val r = 1 to 10
r: scala.collection.immutable.Range.Inclusive = Range 1 to 10

scala> val b = sc.parallelize(r)
b: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize
at <console>:26

scala> b.collect
2021-07-04 13:14:45,848 WARN scheduler.TaskSchedulerImpl: Initial job has
not accepted any resources; check your cluster UI to ensure that workers
are registered and have sufficient resources
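For anyone who hits the same WARN: it usually means the executors either never registered with the driver or asked for more memory/cores than the cluster can give. A sketch of launching spark-shell against k8s with modest, explicit executor resources (the container image and service account below are hypothetical placeholders, not values confirmed in this thread; the master URL and namespace are the ones shown here):

```shell
# Sketch only: <your-spark-image> and the "spark" service account are
# placeholders; adjust to your own setup.
spark-shell \
  --master k8s://https://192.168.49.2:8443 \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.executor.instances=1 \
  --conf spark.executor.memory=1g \
  --conf spark.executor.cores=1
```

If the WARN persists even with a single 1g/1-core executor, the problem is more likely executor registration (networking) than resources.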


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sun, 4 Jul 2021 at 12:56, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Hi,
>
> I have a minikube with spark running.
>
> When the py file is pretty simple and does something like print(), it works:
>
> from src.config import config, oracle_url
> from pyspark.sql import functions as F
> from pyspark.sql.functions import col, round
> from pyspark.sql.window import Window
> from sparkutils import sparkstuff as s
> from othermisc import usedFunctions as uf
> import locale
> locale.setlocale(locale.LC_ALL, 'en_GB')
>
> def main():
>     appName = "testme"
>     spark_session = s.spark_session(appName)
>     spark_context = s.sparkcontext()
>     spark_context.setLogLevel("ERROR")
>     print(spark_session)
>     print(spark_context)
>     print(f"""\n Printing a line from {appName}""")
>
> if __name__ == "__main__":
>     main()
>
> It comes back with
>
>
> <pyspark.sql.session.SparkSession object at 0x7f9aebbdff28>
> <SparkContext master=k8s://https://192.168.49.2:8443 appName=testme>
>
>  Printing a line from testme
>
>
> However, if I create a simple range as below
>
>
>     spark_session = s.spark_session(appName)
>     spark_context = s.sparkcontext()
>     spark_context.setLogLevel("ERROR")
>     print(spark_session)
>     print(spark_context)
>     rdd = spark_context.parallelize([1,2,3,4,5,6,7,8,9,10])
>
>     print(rdd)
>     rdd.collect()
>
>     print(f"""\n Printing a line from {appName}""")
>
> It never gets past the collect() call  --> rdd.collect()
>
>
>
> <pyspark.sql.session.SparkSession object at 0x7f431ee92f28>
> <SparkContext master=k8s://https://192.168.49.2:8443 appName=testme>
> ParallelCollectionRDD[0] at readRDDFromFile at PythonRDD.scala:274
>
>
> and hangs
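One thing worth checking when collect() hangs like this in client mode: the executors inside the pods must be able to open connections back to the driver. If they cannot, they die and get recreated, which matches the pod churn shown further down. The relevant settings, sketched with hypothetical values:

```shell
# Sketch: values are hypothetical. In client mode the driver runs outside
# the executor pods, so its host and ports must be reachable from inside them.
--conf spark.driver.host=<ip-reachable-from-the-pods>
--conf spark.driver.port=7078
--conf spark.blockManager.port=7079
```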
>
>
> Examining the pods from another session I see:
>
>
> kubectl get pod -n spark
> NAME                              READY   STATUS              RESTARTS   AGE
> spark-master                      0/1     Completed           0          69m
> testme-b7a8f97a714d9351-exec-33   1/1     Running             0          9s
> testme-b7a8f97a714d9351-exec-34   0/1     ContainerCreating   0          9s
>
>
> It keeps failing and recreating the testme pods:
>
>
> kubectl get pod -n spark
> NAME                              READY   STATUS              RESTARTS   AGE
> spark-master                      0/1     Completed           0          70m
> testme-b7a8f97a714d9351-exec-49   0/1     ContainerCreating   0          2s
> testme-b7a8f97a714d9351-exec-50   0/1     ContainerCreating   0          1s
> kubectl get pod -n spark
> NAME                              READY   STATUS        RESTARTS   AGE
> spark-master                      0/1     Completed     0          71m
> testme-b7a8f97a714d9351-exec-49   0/1     Terminating   0          8s
> testme-b7a8f97a714d9351-exec-50   1/1     Running       0          7s
> testme-b7a8f97a714d9351-exec-51   0/1     Pending       0          0s
> kubectl get pod -n spark
> NAME                              READY   STATUS              RESTARTS   AGE
> spark-master                      0/1     Completed           0          71m
> testme-b7a8f97a714d9351-exec-51   0/1     ContainerCreating   0          3s
> testme-b7a8f97a714d9351-exec-52   0/1     ContainerCreating   0          2s
>
>
> So it seems that the testme.xxx pods keep failing and being recreated, and I
> cannot get the logs!
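Since the executor pods die before kubectl logs can reach them, the following may still surface the reason (a sketch; it assumes the spark-role=executor label that Spark on Kubernetes puts on executor pods by default):

```shell
# Cluster events often record why pods were killed or never scheduled
kubectl get events -n spark --sort-by=.lastTimestamp

# Describe whichever executor pods exist right now
kubectl describe pod -n spark -l spark-role=executor

# Follow logs from executor pods as they come and go
kubectl logs -n spark -l spark-role=executor -f --prefix
```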
>
>
> kubectl get pod -n spark
> NAME                               READY   STATUS              RESTARTS   AGE
> spark-master                       0/1     Completed           0          80m
> testme-b7a8f97a714d9351-exec-164   1/1     Running             0          8s
> testme-b7a8f97a714d9351-exec-165   0/1     ContainerCreating   0          3s
> kubectl logs -n spark testme-b7a8f97a714d9351-exec-165
> Error from server (NotFound): pods "testme-b7a8f97a714d9351-exec-165" not
> found
>
>
> I have created this minikube with 8192 MB of memory and 3 CPUs.
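For reference, a minikube of that size can be created and its advertised capacity checked along these lines (a sketch, not the exact commands used here):

```shell
# Start minikube with the resources mentioned above
minikube start --memory 8192 --cpus 3

# What the node actually offers to the scheduler; executor requests
# must fit inside these Allocatable figures
kubectl describe node minikube | grep -A 5 Allocatable
```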
>
>
> Spark GUI showing rdd collection not starting
>
>
> [image: image.png]
>
>
>
> I'd appreciate any advice, as a Google search did not show much.
>
>
>
>
>
>
